Python: Traverse Directory

By Xah Lee. Date: . Last updated: .

Suppose you want to visit every file in a directory. For example, do find/replace on all HTML files. You can use os.path.walk().

Note: os.path.walk() is deprecated, and removed in Python 3. You should use os.walk() instead, also available in python 2.7.x. For how-to, see: Python 3: Traverse Directory.

# -*- coding: utf-8 -*-
# python 2

# traverse a directory

import os

mydir= "/home/xyz/Documents"

def myfun(s1, s2, s3):
    print "s1 is {}".format(s1)
    print "s2 is {}".format(s2)
    print "s3 is {}".format(s3)
    print "--------------------------"

os.path.walk(mydir, myfun, "xyz")

os.path.walk(base_dir, ƒ, arg) will walk a dir starting at base_dir. When it sees a directory (including base_dir), it will call ƒ(arg, current_dir , list_of_children).


Here's a example of filter some files by file name extension, and for each file we want, call a function on it.

# -*- coding: utf-8 -*-
# python

# traverse a dir, and list only html files

import os

mydir= "/home/xyz/Documents/"

def processThisFile(fpath):
    print "g touched:", fpath

def filterFile(dummy, thisDir, dirChildrenList):
    for child in dirChildrenList:
        if ".html" == os.path.splitext(child)[1] and os.path.isfile(thisDir+"/"+child):

os.path.walk(mydir, filterFile, None)

Note that os.path.splitext() splits a string into two parts, a portion before the last period, and the rest in the second portion. Effectively it is used for getting file suffix. The os.path.isfile() makes sure that this is a actual file and not a dir with “.html” suffix.

Note: before Python 2.4.2: the first arg to os.path.walk must not end in a slash.

for Python 3, see: Python 3: Traverse Directory

10.1. os.path — Common pathname manipulations — Python v2.7.6 documentation #os.path.walk