ELisp: Batch Script to Validate Matching Brackets

By Xah Lee.

This page shows you how to write an elisp script that checks thousands of files for mismatched brackets.

Problem

Write an emacs lisp script to process 5 thousand files and check for mismatched brackets.

The matching pairs include these: () {} [] “” ‹› «» 〈〉 《》 【】 〖〗 「」 『』. [see Unicode: Brackets, Quotes «»「」【】《》]

The program should be able to check all files in a dir, report any file that has a mismatched bracket, and indicate the line number or position where the mismatch occurs.

Solution

;; -*- coding: utf-8; lexical-binding: t; -*-
;; 2011-07-15 , 2020-04-12
;; spec at http://xahlee.org/comp/validate_matching_brackets.html
;; by Xah Lee.

;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputFile "xx_test_file.txt" ) ; a test file.
(setq inputDir "/Users/xah/web/ergoemacs_org/emacs/") ; must end in slash

(defvar matchPairs '() "An alist. For each pair, the car is the opening char, the cdr is the closing char.")
(setq matchPairs '(
                   ;; ("(" . ")")
                   ;; ("{" . "}")
                   ;; ("[" . "]")
                   ;; ("“" . "”")
                   ;; ("‹" . "›")
                   ;; ("«" . "»")
                   ("【" . "】")
                   ;; ("〖" . "〗")
                   ;; ("〈" . "〉")
                   ;; ("《" . "》")
                   ;; ("「" . "」")
                   ;; ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair)) "|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fPath)
  "Process the file at FPATH"
  (let (myBuffer myStack $char $pos)

    (setq myStack '()) ; each entry is a vector [char position]
    (setq $char "")     ; the current char found

    (when t
      ;; (not (string-match "/xx" fPath)) ; in case you want to skip certain files

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fPath nil nil nil t)

      (goto-char (point-min))
      (while (re-search-forward searchRegex nil t)
        (setq $pos (point))
        (setq $char (buffer-substring-no-properties $pos (- $pos 1)))

        ;; (princ (format "-----------------------------\nfound char: %s\n" $char) )

        (let ((isClosingCharQ nil) (matchedOpeningChar nil))
          (setq isClosingCharQ (rassoc $char matchPairs))
          (when isClosingCharQ (setq matchedOpeningChar (car isClosingCharQ)))

          ;; (princ (format "isClosingCharQ is: %s\n" isClosingCharQ) )
          ;; (princ (format "matchedOpeningChar is: %s\n" matchedOpeningChar) )

          (if
              (and
               (car myStack) ; not empty
               (equal (elt (car myStack) 0) matchedOpeningChar ))
              (progn
                ;; (princ (format "matched this top item on stack: %s\n" (car myStack)) )
                (setq myStack (cdr myStack)))
            (progn
              ;; (princ (format "did not match this top item on stack: %s\n" (car myStack)) )
              (setq myStack (cons (vector $char $pos) myStack)))))
        ;; (princ "current stack: " )
        ;; (princ myStack )
        ;; (terpri )
        )

      (when (not (equal myStack nil))
        (princ "Error file: ")
        (princ fPath)
        (print (car myStack)))
      (kill-buffer myBuffer))))

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    ;; (my-process-file inputFile) ; use this to test one single file
    (mapc 'my-process-file
          (directory-files-recursively inputDir "\\.html$" )) ; do all HTML files
    (princ "Done deal!")))

I added many comments and debug code for easy understanding. If you are not familiar with elisp idioms such as opening files, working with buffers, and printing to output, see: ELisp: How to Write a Command, Text Processing with Emacs Lisp Batch Style.

To run the code, simply open it in emacs. Edit the line at the top for “inputDir”. Then call eval-buffer.

Here's a sample output:

Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter_orig.txt
[")" 3625]
Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter.txt
[")" 2338]
Error file: c:/Users/h3/web/xahlee_org/p/arabian_nights/xx/v1fn.txt
["”" 185795]
Done deal!
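
The entries in the sample output carry a character position, not a line number. If a line number is preferred, one possible approach is the built-in line-number-at-pos. The following is a minimal sketch, not part of the script above. The name my-demo-position-to-line is hypothetical, and it works on a string so it can be tried on its own.

```elisp
;; -*- coding: utf-8; lexical-binding: t; -*-
;; Hypothetical helper, for illustration only.

(defun my-demo-position-to-line (text pos)
  "Return the line number of character position POS in TEXT.
POS is 1-based, like a buffer position."
  (with-temp-buffer
    (insert text)
    ;; line-number-at-pos counts lines from the start of the buffer
    (line-number-at-pos pos)))

;; position 5 is on the second line of this 3-line string
(my-demo-position-to-line "ab\ncd\nef" 5) ; returns 2
```

In the real script, the same call could be made inside the buffer that holds the file, right before the error is printed.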

Code Explanation

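Here's an outline of the steps.

1. Go thru the file and find the next bracket char.
2. If it is a closing char and the top of the stack holds its matching opening char, remove that top entry. Otherwise, push the current char and its position onto the stack.
3. Repeat until there are no more bracket chars in the file.
4. If the stack is not empty at the end, the file has mismatched brackets. Report the file and the top stack entry.
5. Do the above for every file in the dir.

Here are some interesting uses of lisp features to implement the above.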

Define Matching Pair Chars as “alist”

We begin by defining the chars we want to check, as an “association list”. Like this:

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

[see ELisp: Association List]

If you only care to check curly quotes, you can remove the other elements above. This is convenient because some files necessarily have mismatched pairs such as parentheses, because that char is used for many non-bracketing purposes (for example, ASCII smileys).
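
As a small illustration of working with such an alist, the sketch below looks up pairs by opening or closing char and selects a subset of pairs without editing the list by hand. The names my-demo-all-pairs and my-demo-keep-pairs are hypothetical and not part of the script above.

```elisp
;; -*- coding: utf-8; lexical-binding: t; -*-
(require 'seq)

;; Hypothetical names, for illustration only.
(defvar my-demo-all-pairs
  '(("(" . ")") ("“" . "”") ("「" . "」"))
  "Demo alist. The car is the opening char, the cdr is the closing char.")

(defun my-demo-keep-pairs (opening-chars pairs)
  "Return the entries of PAIRS whose opening char is in OPENING-CHARS."
  (seq-filter (lambda (pair) (member (car pair) opening-chars)) pairs))

;; look up by key (opening char)
(assoc "“" my-demo-all-pairs) ; returns ("“" . "”")

;; look up by value (closing char)
(rassoc "」" my-demo-all-pairs) ; returns ("「" . "」")

;; keep only the curly quote pair
(my-demo-keep-pairs '("“") my-demo-all-pairs) ; returns (("“" . "”"))
```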

Generate Regex String from alist

To search for a set of chars in emacs, we can read the buffer char-by-char, or, we can simply use search-forward-regexp. To use that, first we need to generate a regex string from our matchPairs alist. For example, if we want to search “〈〉《》”, then our regex string should be "〈\\|〉\\|《\\|》".

First, we define/declare the string. Not a necessary step, but we do it for clarity.

(setq searchRegex "")

Then we go thru the matchPairs alist. For each pair, we use car and cdr to get the chars and concat them to the string. Like this:

(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair)) "|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

Then we remove the ending |.

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”

Then, change | to \\|. In elisp regex, a plain | is a literal character. The “regex or” operator is \|. Elisp does not have a special regex string syntax; it only understands normal strings. So, to pass \| to the regex engine, you need to escape the backslash. The string for the regex therefore needs to be written as \\|. Here's how we do it:

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

See also: Emacs: Regular Expression.
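
The three steps above (concat with |, trim the last |, replace | with \\|) can also be collapsed into one expression with mapconcat, which joins the pieces with a separator and so leaves no trailing separator to remove. This is an alternative sketch, not the code used in the script. The names my-demo-pairs and my-demo-build-regex are hypothetical.

```elisp
;; -*- coding: utf-8; lexical-binding: t; -*-
;; Hypothetical names, for illustration only.

(defvar my-demo-pairs '(("〈" . "〉") ("《" . "》"))
  "Demo alist. The car is the opening char, the cdr is the closing char.")

(defun my-demo-build-regex (pairs)
  "Return a regex string that matches any opening or closing char in PAIRS."
  (mapconcat
   (lambda (pair)
     ;; each pair contributes: opening \| closing
     (concat (regexp-quote (car pair)) "\\|" (regexp-quote (cdr pair))))
   pairs
   "\\|")) ; the separator between pairs is also the regex “or”

(my-demo-build-regex my-demo-pairs) ; returns "〈\\|〉\\|《\\|》"
```

One small benefit of this form is that a literal | in the alist would not be mistaken for a separator, since no search and replace is done on the finished string.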

Implement Stack Using Lisp List

The stack is implemented as a lisp list. For example: '(1 2 3). The top of the stack is the first element. To push an item onto the stack: (setq mystack (cons newitem mystack)). To pop an item from the stack: (setq mystack (cdr mystack)). The stack starts as an empty list: '().

For each entry in the stack, we put the char and also its position, so that we can report the position if the file does have mismatched pairs.

We use a vector as entries for the stack. Each entry is like this: (vector char pos). [see ELisp: Vector]

Here's how to look up a char in the alist, push to the stack, and pop from the stack.

; check if current char is a closing char and is in our match pairs alist.
; use “rassoc” to check alist's set of “values”.
; It returns the first key/value pair found, or nil
(rassoc char matchPairs)

; add to stack
(setq myStack (cons (vector char pos) myStack) )

; pop stack
(setq myStack (cdr myStack) )
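
Putting the pieces together, the sketch below applies the same stack logic to a string instead of a file, so it can be tried in isolation. It is an illustration under assumptions, not the script above: the names my-demo-stack-pairs and my-demo-check-string are hypothetical, and only two pairs are checked.

```elisp
;; -*- coding: utf-8; lexical-binding: t; -*-
;; Hypothetical names, for illustration only.

(defvar my-demo-stack-pairs '(("(" . ")") ("[" . "]"))
  "Demo alist. The car is the opening char, the cdr is the closing char.")

(defun my-demo-check-string (str)
  "Scan STR for bracket chars and return the leftover stack.
Nil means every bracket was matched. Otherwise each entry is a
vector [char position], most recent first."
  (let ((stack '())
        (regex (mapconcat
                (lambda (pair)
                  (concat (regexp-quote (car pair)) "\\|"
                          (regexp-quote (cdr pair))))
                my-demo-stack-pairs "\\|")))
    (with-temp-buffer
      (insert str)
      (goto-char (point-min))
      (while (re-search-forward regex nil t)
        (let* ((char (match-string 0))
               ;; non-nil only when char is a closing char
               (closing (rassoc char my-demo-stack-pairs)))
          (if (and closing
                   stack
                   (equal (aref (car stack) 0) (car closing)))
              ;; top of stack is the matching opening char: pop
              (setq stack (cdr stack))
            ;; otherwise push the char and the position just after it
            (setq stack (cons (vector char (point)) stack))))))
    stack))
```

With these definitions, (my-demo-check-string "(a [b] c)") returns nil, while (my-demo-check-string "( [)]") returns a stack of 4 entries whose first entry is ["]" 6].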

Advantages of Emacs Lisp

Note that the great advantage of using elisp for text processing, instead of {Perl, Python, Ruby, etc}, is that many things are taken care of by the emacs environment.

I don't need to write code to deal with file encoding (emacs handles it automatically). There is no explicit file reading code. You just “open” or “save” the file. Processing a file is simply moving the cursor thru characters or lines and changing parts of it. No code is needed for safety backups: emacs automatically makes a backup if you change a file, and this can be turned off by setting the built-in variable “make-backup-files” to nil. File paths in the output can be opened by a click or key press. I can add just 2 lines so that clicking on the error char in the output jumps to the location in the file.

Any elisp script you write inside emacs automatically becomes an extension of emacs and can be used in an interactive way. Or, you could run it in a command line shell, for example, emacs --script process_log.el.

This problem was posted to a few comp.lang newsgroups as a fun challenge. See: Programing Exercise, Validate Matching Brackets.