Python: Find Replace by Regex

By Xah Lee.

Here's a Python script to do find/replace by regex, for all files in a dir.

Features:

  1. Can process a whole directory, limit by directory depth, or take a list of files.
  2. Filter files by file extension or regex.
  3. Automatic backup. Can be turned off.
  4. Can have more than one regex/replace pair.
  5. Prints the number of changes for each changed file.
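
Features 4 and 5 come down to running each pair over the text in turn and summing the counts that `re.subn` returns. Here is a minimal sketch on an in-memory string. The helper name `apply_pairs`, the sample pairs, and the sample text are made up for illustration and are not part of the script:

```python
import re

def apply_pairs(text, pairs):
    """Apply each (compiled regex, replacement) pair in order.
    Returns (new_text, total number of replacements)."""
    total = 0
    for regex, repl in pairs:
        text, n = regex.subn(repl, text)
        total += n
    return text, total

# hypothetical sample pairs and text
pairs = [
    (re.compile(r"colour"), "color"),   # spelling fix
    (re.compile(r"[ \t]+$", re.M), ""), # strip trailing whitespace on each line
]

new_text, count = apply_pairs("colour  \ncolor\n", pairs)
print(count)
print(repr(new_text))
```

Later pairs see the output of earlier pairs, so order matters. Note that `re.subn` counts matches, so a match whose replacement is identical to the matched text still adds to the count.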

If you don't want regex, see Python: Find Replace in a Dir

# Python 3

# 2018-08-24

# change all files in a dir by 1 or more regex/replace pairs

# web site: http://xahlee.info/python/findreplace_regex.html

import os, sys, shutil, re
import datetime

# if this list is not empty, then only these files will be processed
file_list = [
    # "/home/xah/web/ergoemacs_org/emacs/emacs.html", # example
]

# must be full path
input_dir = "/Users/xah/xx_manual/"

file_extension_regex = r"\.html$"

min_level = 1 # files and dirs inside input_dir are level 1.
max_level = 9 # inclusive

do_backup = True
backup_suffix = "~~"

# find and replace pairs here. Each is a 2-tuple. first element is regex object, second is replace string
find_replace_list = [
   (re.compile(r'''<meta name="[A-Za-z]+" content="[-.,0-9]+">''', re.U|re.M|re.DOTALL),
    r''),

   # more find and replace pairs here
]

##################################################

def replace_string_in_file(fpath):
   "Replaces all strings by regex in find_replace_list at fpath."

   try:
      # the with block closes the file even when decoding fails
      with open(fpath, "r", encoding="utf-8") as input_file:
         file_content = input_file.read()
   except UnicodeDecodeError:
      # skip files that are not valid UTF-8
      print("UnicodeDecodeError: {:s}".format(fpath))
      return

   num_replaced = 0
   for a_pair in find_replace_list:
      tem_tuple = re.subn(a_pair[0], a_pair[1], file_content)
      output_text = tem_tuple[0]
      num_replaced += tem_tuple[1]
      file_content = output_text

   if num_replaced > 0:
      print("◆ changed {:d} {:s}".format(num_replaced, fpath))

      if do_backup:
         shutil.copy2(fpath, fpath + backup_suffix)

      # open with "r+b" and overwrite in place. we do it this way to preserve file creation date
      with open(fpath, "r+b") as output_file:
         output_file.write(output_text.encode("utf-8"))
         output_file.truncate() # cut off leftover bytes if the new content is shorter

##################################################

print(datetime.datetime.now())
print("Input Dir:", input_dir)
for x in find_replace_list:
   print("Find regex:「{}」".format(x[0]))
   print("Replace pattern:「{}」".format(x[1]))
   print("\n")

if len(file_list) != 0:
   for ff in file_list:
      replace_string_in_file(os.path.normpath(ff))
else:
   # normalize, so that a trailing slash in input_dir does not throw off the level count
   root_dir = os.path.normpath(input_dir)
   for dirPath, subdirList, fileList in os.walk(root_dir):
      curDirLevel = dirPath.count(os.sep) - root_dir.count(os.sep)
      curFileLevel = curDirLevel + 1
      if min_level <= curFileLevel <= max_level:
         for fName in fileList:
            if re.search(file_extension_regex, fName, re.U):
               replace_string_in_file(os.path.join(dirPath, fName))

print("Done.")
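
The level filter in the walk loop works by counting path separators. The same idea can be sketched with `os.path.relpath`, which gives the same answer whether or not the root dir is written with a trailing separator. `file_level` is a hypothetical helper name, not part of the script, and the paths are made up:

```python
import os

def file_level(root, dir_path):
    """Level of the files directly inside dir_path,
    where files directly inside root are level 1."""
    rel = os.path.relpath(dir_path, root)
    if rel == os.curdir:
        return 1
    return len(rel.split(os.sep)) + 1

# hypothetical root, with a trailing separator on purpose
root = os.path.join("top", "manual") + os.sep

print(file_level(root, os.path.join("top", "manual")))
print(file_level(root, os.path.join("top", "manual", "a")))
print(file_level(root, os.path.join("top", "manual", "a", "b")))
```

This only does path arithmetic. It does not touch the file system, so the dirs need not exist.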

Find Replace Scripts

  1. Golang: Find Replace Script
  2. Python: Find Replace in a Dir
  3. Python: Find Replace by Regex
  4. Perl: Find Replace String Pairs in Directory
  5. Elisp: Write grep
  6. Emacs: xah-find.el, Find Replace in Pure Elisp

Python Text Processing

  1. Read/Write File
  2. Walk Directory
  3. Python 3: Walk Directory
  4. Manipulate Path
  5. Process Unicode
  6. Convert File Encoding
  7. Convert File Encoding in a Dir
  8. Find Replace in dir
  9. Find Replace by Regex
  10. Count Word Frequency
