Python 2: Traverse Directory

By Xah Lee.

Suppose you want to visit every file in a directory, for example to do find/replace on all HTML files.

os.path.walk(base_dir, f, arg)
Walks a directory tree starting at base_dir. For each directory it visits (including base_dir), it calls f(arg, current_dir, list_of_children).

where:

  • current_dir → the path of the current directory, built by joining base_dir with the subdirectory names below it.
  • list_of_children → a list of all children of the current directory, including sub directories. Each item is a string of the file name or directory name.

Note: os.path.walk is deprecated in Python 2 and removed in Python 3. Use os.walk instead; it is also available in Python 2.7.x. [see Python 3: Traverse Directory]

Note: before Python 2.4.2, the first arg to os.path.walk must not end in a slash.

# -*- coding: utf-8 -*-
# python 2

# traverse a directory

import os

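mydir = "/home/xyz/Documents"

# os.path.walk calls this function once for each directory it visits:
# s1 is the extra arg ("xyz" here), s2 is the current directory,
# s3 is the list of names in that directory
def myfun(s1, s2, s3):
    print "s1 is {}".format(s1)
    print "s2 is {}".format(s2)
    print "s3 is {}".format(s3)
    print "--------------------------"

os.path.walk(mydir, myfun, "xyz")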
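
Since the note above recommends os.walk, here is a rough sketch of the same traversal written with it. The function name list_tree is made up for this illustration. Unlike os.path.walk, os.walk takes no callback: you loop over it, and each step gives the current directory plus separate lists of subdirectory names and file names. The code is written so it should run under both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# A sketch only: traverse a directory with os.walk instead of os.path.walk.
from __future__ import print_function

import os


def list_tree(base_dir):
    """Return a list of (current_dir, subdir_names, file_names) tuples.

    list_tree is a hypothetical helper name, not part of the os module.
    """
    visited = []
    for current_dir, subdir_names, file_names in os.walk(base_dir):
        # Store sorted copies, because os.walk does not promise any
        # particular order for the names inside a directory.
        visited.append((current_dir, sorted(subdir_names), sorted(file_names)))
    return visited


if __name__ == "__main__":
    mydir = "/home/xyz/Documents"  # same placeholder path as the example above
    for current_dir, subdir_names, file_names in list_tree(mydir):
        print("current dir is {}".format(current_dir))
        print("sub dirs are {}".format(subdir_names))
        print("files are {}".format(file_names))
        print("--------------------------")
```

If the directory does not exist, os.walk by default yields nothing, so the loop simply prints nothing.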

Here's an example that filters files by file name extension and calls a function on each file we want.

# -*- coding: utf-8 -*-
# python 2

# traverse a dir, and list only html files

import os

mydir = "/home/xyz/Documents"

def processThisFile(fpath):
    print "g touched:", fpath

def filterFile(dummy, thisDir, dirChildrenList):
    for child in dirChildrenList:
        # os.path.join avoids a doubled slash when thisDir already ends in one
        fpath = os.path.join(thisDir, child)
        # keep only actual files whose name ends in .html
        if ".html" == os.path.splitext(child)[1] and os.path.isfile(fpath):
            processThisFile(fpath)

os.path.walk(mydir, filterFile, None)

Note that os.path.splitext() splits a path into two parts: everything before the last period, and the extension, which starts at that period. Here it is used to get the file name suffix. os.path.isfile() makes sure the child is an actual file and not a directory whose name ends in “.html”.
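
To make the splitext behavior concrete, here is a small sketch that prints what it returns for a few names, together with one possible os.walk version of the HTML filter. The name find_html_files is invented for this illustration. Because os.walk already lists file names separately from subdirectory names, this version leaves out the os.path.isfile() check; treat that as an assumption about what you want to match, not as the only way to do it.

```python
# -*- coding: utf-8 -*-
# A sketch only: list .html files under a directory using os.walk.
from __future__ import print_function

import os


def find_html_files(base_dir):
    """Return sorted paths of files under base_dir whose extension is .html.

    find_html_files is a hypothetical helper name for this example.
    """
    matches = []
    for current_dir, _subdir_names, file_names in os.walk(base_dir):
        for name in file_names:
            # The extension returned by splitext keeps its leading period,
            # and the comparison is case-sensitive.
            if os.path.splitext(name)[1] == ".html":
                matches.append(os.path.join(current_dir, name))
    return sorted(matches)


# How splitext treats a few representative names:
print(os.path.splitext("index.html"))      # ('index', '.html')
print(os.path.splitext("archive.tar.gz"))  # ('archive.tar', '.gz')
print(os.path.splitext("README"))          # ('README', '')
```

For example, find_html_files("/home/xyz/Documents") would return the paths that the os.path.walk example above passes to processThisFile, as long as none of the matches are directories.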
