Python: Traverse Directory


Suppose you want to visit every file in a directory, for example to do find/replace on all HTML files. In Python 2, you can use os.path.walk().

Note: os.path.walk() is deprecated, and was removed in Python 3. You should use os.walk() instead, which is also available in Python 2.7.x. For a how-to, see: Python 3: Traverse Directory.

# -*- coding: utf-8 -*-
# python 2

# traverse a directory

import os

mydir = "/home/xah/web/xahlee_info/perl-python/"

def myfun(s1, s2, s3):
    print "s1 is {}".format(s1)
    print "s2 is {}".format(s2)
    print "s3 is {}".format(s3)
    print "--------------------------"

os.path.walk(mydir, myfun, "xyz")

# # sample output
# s1 is xyz
# s2 is /home/xah/web/xahlee_info/perl-python/
# s3 is ['python_construct_tree_from_edge.html', 'quality_docs.html', …]
# --------------------------
# s1 is xyz
# s2 is /home/xah/web/xahlee_info/perl-python/_p
# s3 is ['476px-Python_logo.svg.png', 'camel.png']
# --------------------------
# s1 is xyz
# s2 is /home/xah/web/xahlee_info/perl-python/python_re-write
# s3 is ['lib']
# --------------------------
# s1 is xyz
# s2 is /home/xah/web/xahlee_info/perl-python/python_re-write/lib
# s3 is ['node111.html', 'module-re.html', 'lib.css', 'match-objects.html', 're-syntax.html', 'regex_flags.html', 're-objects.html']
# --------------------------

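os.path.walk(base dir, ƒ, arg) walks the directory tree starting at base dir. For each directory it visits (including base dir itself), it calls ƒ(arg, current dir, list of children), where list of children holds the names of both the files and the subdirectories in current dir.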
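Since os.path.walk() is gone in Python 3, here is a minimal sketch of the same callback style built on os.walk(). The helper name walk_like_callback and the callback show are hypothetical, made up for this illustration. The order of names inside the children list may differ from what os.path.walk() would have passed.

```python
import os

def walk_like_callback(base_dir, func, arg):
    """Hypothetical helper: call func(arg, current_dir, children) once per directory."""
    for current_dir, subdirs, files in os.walk(base_dir):
        # os.walk reports subdirectories and files separately.
        # The old callback received both in a single list, so join them here.
        func(arg, current_dir, subdirs + files)

def show(arg, current_dir, children):
    # Same idea as myfun in the example above, using the Python 3 print function.
    print("s1 is {}".format(arg))
    print("s2 is {}".format(current_dir))
    print("s3 is {}".format(children))
    print("--------------------------")
```

Calling walk_like_callback(mydir, show, "xyz") should print one block per directory, in the same shape as the sample output above.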


Here's an example that filters files by file name extension and, for each file we want, calls a function on it.

# -*- coding: utf-8 -*-
# python 2

# traverse a dir, and list only html files

import os

mydir = "/home/xah/web/xahlee_info/perl-python/"

def processThisFile(fpath):
    # placeholder for the real per-file work
    print "g touched:", fpath

def filterFile(dummy, thisDir, dirChildrenList):
    # called once per directory; dummy is the unused 3rd arg of os.path.walk
    for child in dirChildrenList:
        fpath = os.path.join(thisDir, child)
        if ".html" == os.path.splitext(child)[1] and os.path.isfile(fpath):
            processThisFile(fpath)

os.path.walk(mydir, filterFile, None)

Note that os.path.splitext() splits a string into two parts: the portion before the last period, and the rest, starting at that period. In effect, it is used for getting the file suffix. The os.path.isfile() check makes sure that this is an actual file and not a dir with a “.html” suffix.
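For comparison, here is a rough sketch of the same filter written with os.walk(), which runs on both Python 2.7 and Python 3. The function name find_html_files is made up for this illustration. Because os.walk() hands back directory names and other names in separate lists, a dir named with a “.html” suffix never shows up among the files, so the os.path.isfile() check is left out here.

```python
import os

def find_html_files(base_dir):
    """Hypothetical helper: return paths under base_dir whose suffix is .html."""
    found = []
    for current_dir, subdirs, files in os.walk(base_dir):
        # files holds only the non-directory entries of current_dir
        for name in files:
            # splitext splits at the last period:
            # "page.v2.html" becomes ("page.v2", ".html")
            if os.path.splitext(name)[1] == ".html":
                found.append(os.path.join(current_dir, name))
    return found
```

The returned list can then be looped over, calling something like processThisFile on each path.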

Note: before Python 2.4.2, the first arg to os.path.walk must not end in a slash.

For Python 3, see: Python 3: Traverse Directory

10.1. os.path — Common pathname manipulations — Python v2.7.6 documentation #os.path.walk
