Python: Traverse Directory


Suppose you want to visit every file in a directory, for example to do find/replace on all HTML files. In Python 2, you can use os.path.walk().

Note: os.path.walk() is deprecated in Python 2 and was removed in Python 3. Use os.walk() instead; it is also available in Python 2.7.x. For how-to, see: Python 3: Traverse Directory.

# -*- coding: utf-8 -*-
# python

# traverse a directory

import os

mydir= "/home/xah/web"

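# called once per directory: s1 is the extra arg ("xyz" here),
# s2 is the current dir, s3 is the list of names in it
def myfun(s1, s2, s3):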
    print "s1 is {}".format(s1)
    print "s2 is {}".format(s2)
    print "s3 is {}".format(s3)
    print "--------------------------"

os.path.walk(mydir, myfun, "xyz")

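os.path.walk(base dir, ƒ, arg) walks the directory tree starting at base dir. For each directory it visits (including base dir itself), it calls ƒ(arg, current dir, list of children).

where:

arg is the third argument given to os.path.walk, passed through unchanged.
current dir is the path of the directory being visited, starting with base dir.
list of children is a list of the names (not full paths) of the files and subdirectories in current dir.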
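For readers on Python 3, where os.path.walk no longer exists, the per-directory visit described above can be approximated with os.walk. The following is a minimal sketch, not code from the original page: the name visit_dirs is hypothetical, and it merges the separate sub-directory and file lists that os.walk produces back into one list of children, to mirror what ƒ receives.

```python
import os

def visit_dirs(base_dir):
    """Return a list of (current_dir, children) pairs, one per directory."""
    visited = []
    for current_dir, subdirs, files in os.walk(base_dir):
        # os.path.walk handed the callback one combined list of names;
        # os.walk splits it in two, so join them back together here.
        children = sorted(subdirs + files)
        visited.append((current_dir, children))
    return visited
```

For a directory holding a file a.txt and an empty subdirectory sub, this gives one pair for the directory itself with children ['a.txt', 'sub'], followed by one pair for sub with an empty list.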

Here's an example that filters files by file name extension and calls a function on each file we want.

# -*- coding: utf-8 -*-
# python

# traverse a dir, and list only html files

import os

mydir= "/home/xah/web/xahlee_info/perl-python/"

def processThisFile(fpath):
    print "g touched:", fpath

def filterFile(dummy, thisDir, dirChildrenList):
    # dummy is the unused third argument given to os.path.walk
    for child in dirChildrenList:
        fpath = os.path.join(thisDir, child)
        if ".html" == os.path.splitext(child)[1] and os.path.isfile(fpath):
            processThisFile(fpath)

os.path.walk(mydir, filterFile, None)

Note that os.path.splitext() splits a string into two parts: the portion before the last period, and the rest, starting at that period. In effect it is used for getting the file suffix. os.path.isfile() makes sure that this is an actual file and not a dir with a “.html” suffix.
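The same filter can be sketched for Python 3 with os.walk, which already separates file names from sub-directory names. This is an illustrative sketch only: find_html_files is a hypothetical name, and it collects the matching paths into a list instead of calling processThisFile on each one.

```python
import os

def find_html_files(base_dir):
    """Return the sorted paths of regular files under base_dir ending in .html."""
    found = []
    for current_dir, _subdirs, files in os.walk(base_dir):
        for name in files:
            # splitext("index.html") gives ("index", ".html")
            if os.path.splitext(name)[1] == ".html":
                path = os.path.join(current_dir, name)
                # skip anything that is not a regular file, such as a broken symlink
                if os.path.isfile(path):
                    found.append(path)
    return sorted(found)
```

A directory whose name happens to end in “.html” is never reported, because os.walk lists it among the sub-directories, but the HTML files inside it are still found.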

One important thing to note: mydir must not end in a slash. One'd think Python'd take care of such trivia, but no. This took me a while to debug. (As of Python 2.4.2, this is fixed.)
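Building child paths with os.path.join, rather than gluing strings together with "/", sidesteps the trailing-slash question altogether. A small sketch, assuming a POSIX system; child_path is a hypothetical helper name, and the directory is just the example path used on this page.

```python
import os

def child_path(parent_dir, child):
    """Join a directory and a child name, with or without a trailing slash."""
    # os.path.join adds a separator only when parent_dir does not end in one
    return os.path.join(parent_dir, child)

with_slash = child_path("/home/xah/web/", "index.html")
without_slash = child_path("/home/xah/web", "index.html")
print(with_slash == without_slash)  # prints True on a POSIX system
```

Both calls give "/home/xah/web/index.html" on a POSIX system, whereas plain string concatenation would put a doubled slash in the first case.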

For Python 3, see: Python 3: Traverse Directory

10.1. os.path — Common pathname manipulations — Python v2.7.6 documentation #os.path.walk
