Python: Find Replace by Regex

By Xah Lee. Date: . Last updated: .

Here's a Python script to do find/replace by regex, for all files in a dir.

if you want simple plain text find replace (no regex), see Python: Find Replace in a Dir

# Python 3

# 2018-08-24

# change all files in a dir by 1 or more regex/replace pairs

import os, sys, shutil, re
import datetime

# if this this is not empty, then only these files will be processed
file_list = [
    # "/home/xah/web/ergoemacs_org/emacs/emacs.html", # example
]

# must be full path
input_dir = "/Users/xah/xx_manual/"

file_extension_regex = r"\.html$"

min_level = 1 # files and dirs inside input_dir are level 1.
max_level = 9 # inclusive

do_backup = True
backup_suffix = "~~"

# find and replace pairs here. Each is a 2-tuple. first element is regex object, second is replace string
find_replace_list = [
   (re.compile(r'''<meta name="[A-Za-z]+" content="[-.,0-9]+">''', re.U|re.M|re.DOTALL),
    r''),

   # more find and replace pairs here
]

##################################################

def replace_string_in_file(fpath):
   "Replaces all strings by regex in find_replace_list at fpath."

   input_file = open(fpath, "r", encoding="utf-8")

   try:
      file_content = input_file.read()
   except UnicodeDecodeError:
      print("UnicodeDecodeError:{:s}".format(input_file))
      return

   input_file.close()

   num_replaced = 0
   for a_pair in find_replace_list:
      tem_tuple = re.subn(a_pair[0], a_pair[1], file_content)
      output_text = tem_tuple[0]
      num_replaced += tem_tuple[1]
      file_content = output_text

   if (num_replaced > 0):
      print(("◆ changed %d %s" % (num_replaced, fpath) ))

      if do_backup:
         shutil.copy2(fpath, fpath + backup_suffix)

      output_file = open(fpath, "r+b")
      output_file.read() # we do this way to preserve file creation date
      output_file.seek(0)
      output_file.write(output_text.encode("utf-8"))
      output_file.truncate()
      output_file.close()

##################################################

print(datetime.datetime.now())
print("Input Dir:", input_dir)
for x in find_replace_list:
   print("Find regex:「{}」".format(x[0]))
   print("Replace pattern:「{}」".format(x[1]))
   print("\n")

if (len(file_list) != 0):
   for ff in file_list: replace_string_in_file(os.path.normpath(ff) )
else:
    for dirPath, subdirList, fileList in os.walk(input_dir):
        curDirLevel = dirPath.count( os.sep) - input_dir.count( os.sep)
        curFileLevel = curDirLevel + 1
        if min_level <= curFileLevel <= max_level:
            for fName in fileList:
                if (re.search(file_extension_regex, fName, re.U)):
                    replace_string_in_file(dirPath + os.sep + fName)

print("Done.")

If you have a question, put $5 at patreon and message me.

Python by Example

  1. Python Basics
  2. Print Version String
  3. Builtin Help
  4. Quote String
  5. String Operations
  6. String Methods
  7. Format String
  8. True, False
  9. if then else
  10. for, while, Loops
  11. List Basics
  12. Loop Thru List
  13. Map Function to List
  14. List Comprehension
  15. List Methods
  16. Dictionary
  17. Loop Thru Dict
  18. Dict Methods
  19. Function
  20. Class
  21. List Modules
  22. Write a Module
  23. Unicode 🐍

Regex

  1. Regex Basics
  2. Regex Reference

Text Processing

  1. Read/Write File
  2. Traverse Directory
  3. Manipulate Path
  4. Process Unicode
  5. Convert File Encoding
  6. Find Replace in dir
  7. Find Replace by Regex
  8. Count Word Frequency

Web

  1. Send Email
  2. GET Web Page
  3. Web Crawler
  4. HTTP POST
  5. Check Page Load Size
  6. Thumbnail Generation

Misc

  1. JSON
  2. Find Script Path
  3. Get Env Var
  4. System Call
  5. Decompress Gzip
  6. Complex Numbers

Advanced

  1. Sort
  2. Copy Nested List
  3. Tuple vs List
  4. Sets, Union, Intersection
  5. Closure in Python 2
  6. Decorator
  7. Append String in Loop
  8. Timing f timeit
  9. Keyword Arg Default Value Unstable