Perl: Find Replace String Pairs in Directory

By Xah Lee.

Here is a Perl script that does find and replace on all files in a directory.

Features:

  1. Multiple pairs of find replace strings in one shot.
  2. Regex or plain string search.
  3. Optional backup to a separate folder.
  4. Exclude large files.
  5. Count number of replacements.
# -*- coding: utf-8 -*-
# perl

use utf8;
use strict;

=pod

Description:
This script does find and replace on a given folder recursively.

Features:
• Multiple find and replace string pairs can be given.
• The find/replace strings can be treated as regex or as literal strings.
• Files can be filtered by file name suffix or other criteria.
• Backup copies of the original files are made in a user-specified folder, preserving the folder structure of the original folder.
• A report is generated that shows which files have been changed, how many changes were made in each, and the total number of files changed.
• Files retain their owner/group/permission settings.

usage:
1. edit the settings near the top of the script: $inputDirPath, $backupDirPath, %findReplaceH, $useRegexQ, $fileSizeLimit.
2. edit the subroutine fileFilterQ to set which file will be checked or skipped.

to do:
• in the report, print the strings that are changed, possibly with surrounding lines.
• allow just find without replace.
• add the GNU syntax for unix command prompt.
• Report if backup directory exists already, or provide toggle to overwrite, or some other smarties.

Date created: 2000-02
Last run: 2019-01-13
web site: http://xahlee.info/perl/perl_find_replace_in_dir.html
Author: Xah Lee

=cut

use File::Find;
use File::Path;
use File::Copy;
use Data::Dumper;

# directory to search
my $inputDirPath = q[/Users/xah/web/ergoemacs_org/emacs_manual/];

# backup dir path. If it does not exist, it will be created
my $backupDirPath = q[/Users/xah/backup];

# find/replace string pairs
my %findReplaceH = (
q[find string here 1] => q[replace string here 1],
q[find string here 2] => q[replace string here 2],
);

# $useRegexQ has values 1 or 0. If 1, interprets the find strings (the keys)
# in %findReplaceH as regex. If 0, they are matched literally.
my $useRegexQ = 0;

# in bytes. larger files will be skipped
my $fileSizeLimit = 2000 * 1000;

# --------------------------------------------------
# globals

$inputDirPath =~ s[/$][]; # e.g. '/home/joe/public_html'
$backupDirPath =~ s[/$][]; # e.g. '/tmp/joe_back';

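# [^/]+ also matches directory names that contain a dot or hyphen
$inputDirPath =~ m[/([^/]+)$] or die "cannot parse input dir: $inputDirPath";
my $previousDir = $`;   # e.g. '/home/joe'
my $lastDir = $1;       # e.g. 'public_html'
my $backupRoot = $backupDirPath . '/' . $lastDir; # e.g. '/tmp/joe_back/public_html'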

my $refLargeFiles = [];
my $totalFileChangedCount = 0;

# --------------------------------------------------
# subroutines

# fileFilterQ($fullFilePath) return true if file is desired.
sub fileFilterQ ($) {
        my $fileName = $_[0];

        if ((-s $fileName) > $fileSizeLimit) {
                push (@$refLargeFiles, $fileName);
                return 0;
        };
        if ($fileName =~ m{\.html$}) {
                print "processing: $fileName\n";
                return 1;
        };

##        if (-d $fileName) {return 0;}; # directory
##        if (not (-T $fileName)) {return 0;}; # not text file

        return 0;
};

# called by find() for each file. Does the replacements, and if anything
# changed, backs up the original and writes the new content.
sub processFile {
        my $currentFile = $File::Find::name; # full path
        my $currentDir = $File::Find::dir;
        my $currentFileName = $_;

        if (not fileFilterQ($currentFile)) {
                return 1;
        }

# open file. Read in the whole file.
        open(my $inFH, '<', $currentFile) or die("Error opening file $currentFile: $!");
        my $wholeFileString;
        {local $/ = undef; $wholeFileString = <$inFH>;};
        close($inFH) or die("Error closing file: $!");

# do the replacement.
        my $replaceCount = 0;

        foreach my $key1 (keys %findReplaceH) {
                my $pattern = ($useRegexQ ? $key1 : quotemeta($key1));
                # s///g returns the number of substitutions made
                $replaceCount += ($wholeFileString =~ s/$pattern/$findReplaceH{$key1}/g);
        };

        if ($replaceCount > 0) { # replacement has happened
                $totalFileChangedCount++;
# do backup
                # make a directory in the backup path, make a backup copy.
                # \Q...\E so the dir path is matched literally, ^ so only the leading part is removed
                my $pathAdd = $currentDir; $pathAdd =~ s[^\Q$inputDirPath\E][];
                mkpath("$backupRoot/$pathAdd", 0, 0777);
                copy($currentFile, "$backupRoot/$pathAdd/$currentFileName") or
                    die "error: copying file failed on $currentFile\n$!";

# write to the original
                # get the file mode.
                my ($mode, $uid, $gid) = (stat($currentFile))[2,4,5];

                # write out a new file.
                open(my $outFH, '>', $currentFile) or die("Error opening file: $!");
                print $outFH $wholeFileString;
                close($outFH) or die("Error closing file: $!");

                # set the file mode.
                chmod($mode, $currentFile);
                chown($uid, $gid, $currentFile);

                print "-----out77311-------------------------------\n";
                print "$replaceCount replacements made at\n";
                print "$currentFile\n";
        }

};

# --------------------------------------------------
# main

find(\&processFile, $inputDirPath);

print "--------------------------------------------\n\n\n";
print "Total of $totalFileChangedCount files changed.\n";

if (scalar @$refLargeFiles > 0) {
        print "The following large files are skipped:\n";
        print Dumper($refLargeFiles);
}

__END__

Sample output

processing: /Users/xah/web/ergoemacs_org/emacs_manual/elisp/_0025_002dConstructs.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/elisp/A-Sample-Function-Description.html
-----out77311-------------------------------
4 replacements made at
/Users/xah/web/ergoemacs_org/emacs_manual/elisp/A-Sample-Function-Description.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/elisp/A-Sample-Variable-Description.html
-----out77311-------------------------------
1 replacements made at
/Users/xah/web/ergoemacs_org/emacs_manual/elisp/A-Sample-Variable-Description.html

... hundreds of lines ...

processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Words.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Writing-Calendar-Files.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/X-Resources.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Xref-Commands.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Xref.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Yanking.html
processing: /Users/xah/web/ergoemacs_org/emacs_manual/emacs/Yes-or-No-Prompts.html
--------------------------------------------

Total of 18 files changed.
The following large files are skipped:
$VAR1 = [
          '/Users/xah/web/ergoemacs_org/emacs_manual/elisp/index_index.html'
        ];
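
The replacement counts in the report come from the return value of s///g, and the regex/literal switch comes down to quotemeta. Here is a minimal standalone sketch of that core step. The helper name apply_pairs and the test strings are made up for illustration and are not part of the script above. It also sorts the keys, on the assumption that you want the pairs applied in a repeatable order, since keys on a hash returns them in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): apply every
# find/replace pair to $text, return the new text and the total
# number of replacements.
sub apply_pairs {
    my ($text, $pairs, $use_regex) = @_;
    my $count = 0;
    # Sort the keys so the order of application is repeatable.
    for my $find (sort keys %$pairs) {
        # quotemeta escapes regex metacharacters, so the find
        # string is matched literally.
        my $pattern = $use_regex ? $find : quotemeta($find);
        my $replacement = $pairs->{$find};
        # s///g returns the number of substitutions, or a false
        # value when nothing matched, hence the "|| 0".
        $count += ($text =~ s/$pattern/$replacement/g) || 0;
    }
    return ($text, $count);
}

my %pairs = ('a.c' => 'X');

my ($literal_text, $literal_count) = apply_pairs('a.c abc', \%pairs, 0);
my ($regex_text,   $regex_count)   = apply_pairs('a.c abc', \%pairs, 1);

print "literal: $literal_text ($literal_count)\n"; # literal: X abc (1)
print "regex:   $regex_text ($regex_count)\n";     # regex:   X X (2)
```

In literal mode the dot matches only a dot. In regex mode it matches any character, so "abc" is replaced too. Note that the replacement string is inserted as is. A "$1" inside it is not expanded to a capture group.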

I've been using this script from 2000 to 2005.
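
The backup step mirrors each changed file under the backup folder, keeping the folder structure. Here is a sketch of just that path computation as a standalone function. The name backup_path_for and the example paths are assumptions for illustration, not taken from the script. It uses basename from the core module File::Basename, so directory names with dots or hyphens are handled, and it quotes the directory with \Q...\E so it is never treated as a regex.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (not in the script above): map a file under
# $input_dir to its mirror location under $backup_dir.
sub backup_path_for {
    my ($input_dir, $backup_dir, $file) = @_;
    # Drop trailing slashes so the joins below are predictable.
    $input_dir  =~ s{/+$}{};
    $backup_dir =~ s{/+$}{};
    # Last component of the input dir, e.g. 'public_html.v2'.
    my $last_dir = basename($input_dir);
    my $relative = $file;
    # \Q...\E matches the directory literally, and ^ makes sure
    # only the leading prefix is removed.
    $relative =~ s{^\Q$input_dir\E/}{}
        or die "$file is not under $input_dir\n";
    return "$backup_dir/$last_dir/$relative";
}

my $path = backup_path_for(
    '/home/joe/public_html.v2/',
    '/tmp/joe_back',
    '/home/joe/public_html.v2/docs/index.html',
);
print "$path\n"; # /tmp/joe_back/public_html.v2/docs/index.html
```

The function only computes the path. Creating the folders and copying the file would still be done with mkpath and copy, as in the script.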

Find Replace Scripts

  1. Golang: Find Replace Script
  2. Python: Find Replace in a Dir
  3. Python: Find Replace by Regex
  4. Perl: Find Replace String Pairs in Directory
  5. Elisp: Write grep
  6. Emacs: xah-find.el, Find Replace in Pure Elisp
