Lisp, Python, Perl, Ruby Code to Validate Matching Brackets

, ,

This is a preliminary report on scripts of several languages to validate matching brackets.

Problem Description

Little Parser Problem Challenge: Matching Pairs Validation

The problem is to write a script that can check a dir of text files (and all subdirs) and reports if a file has any mismatched matching brackets.

Here's a example of mismatched bracket: ([)], (“[”), ((, etc. (and yes, the brackets may be nested. There are usually text between these chars.)

I'll be writing a emacs lisp solution and post in 2 days. Ι welcome other lang implementations. In particular, perl, python, php, ruby, tcl, lua, Haskell, Ocaml. I'll also be able to eval common lisp (clisp) and Scheme lisp (scsh), Java. Other lang such as Clojure, Scala, C, C++, or any others, are all welcome, but i won't be able to eval it. JavaScript implementation will be very interesting too, but please indicate which and where to install the command line version.

I hope you'll find this a interesting “challenge”. This is a parsing problem. I haven't studied parsers except some Wikipedia reading, so my solution will probably be naive. I hope to see and learn from your solution too.

i hope you'll participate. Just post solution here. Thanks.

Solutions

Emacs Lisp

;; -*- coding: utf-8 -*-
;; spec at http://xahlee.org/comp/validate_matching_brackets.html
;; 2011-07-15
;; by Xah Lee.

(setq inputDir "/cygdrive/c/Users/h3/web/xahlee_org/p/time_machine") ; must end in slash

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   ) )

(setq searchRegex "")
(mapc
 (lambda (mypair)
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair)) "|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)
(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”
(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath fpath …"
  (let (myBuffer (myStack '()) (ξchar "") ξpos)
      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fpath nil nil nil t)

      (while (search-forward-regexp searchRegex nil t)
        (setq ξpos (point) )
        (setq ξchar (buffer-substring-no-properties ξpos (- ξpos 1))  )
        (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
          (setq isClosingCharQ (rassoc ξchar matchPairs))
          (when isClosingCharQ (setq matchedOpeningChar (car isClosingCharQ) ) )
          (if (and (car myStack) (equal (elt (car myStack) 0) matchedOpeningChar ) )
              (setq myStack (cdr myStack) )
            (setq myStack (cons (vector ξchar ξpos) myStack) ) )
          ) )
      (when (not (equal myStack nil))
        (princ "Error file: ")
        (princ fpath)
        (print (car myStack) )
        )
      (kill-buffer myBuffer)
    ))

(require 'find-lisp)

(mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))

raw source code validate_brackets_Xah_Lee.el

Detailed explanation at Emacs Lisp: Batch Script to Validate Matching Brackets.

Jon Snader (jcs) wrote 2 versions in elisp. See: 〔Xah's Challenge (Part 2) By Jon Snader. @ irreal.org…〕. Ι haven't studied it yet.

Python

# -*- coding: utf-8 -*-
# python 3
# spec at http://xahlee.org/comp/validate_matching_brackets.html
# 2011-07-17
# by Raymond Hettinger

input_dir = "c:/Users/h3/web/xxst/find_elisp/validate matching brackets/xxdir"
input_dir = "c:/Users/h3/web/xahlee_org/p/time_machine"
input_dir = "c:/Users/h3/web/xahlee_org/comp/validate_matching_brackets/xx"

import os, re

def check_balance(characters):
    '''Return -1 if all delimiters are balanced or
       the char number of the first delimiter mismatch.

    '''
    openers = {
        '(': ')',
        '{': '}',
        '[': ']',
        '“': '”',
        '‹': '›',
        '«': '»',
        '【': '】',
        '〈': '〉',
        '《': '》',
        '「': '」',
        '『': '』',
        }
    closers = set(openers.values())
    stack = []
    for i, c in enumerate(characters, start=1):
        if c in openers:
            stack.append(openers[c])
        elif c in closers:
            if not stack or c != stack.pop():
                return i
    if stack:
        return i
    return -1

def scan(directory, encoding='utf-8'):
    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            fullname = os.path.join(dirpath, filename)
            if re.search(r'\.txt$', filename, re.U) and os.path.isfile(fullname):
                print ( "processing:" + fullname)
                with open(fullname, 'r', encoding=encoding) as f:
                    try:
                        characters = f.read()
                    except UnicodeDecodeError:
                        continue
                position = check_balance(characters)
                if position >= 0:
                    print('{0!r}: {1}'.format(position, fullname))

scan(input_dir)

print ("done")
validate_brackets_Raymond_Hettinger.py

This report is incomplete. So far Raymond Hettinger's python 3 code is the only working code other than elisp. None of the following works on my machine.

For the original post of this problem and the discussion, see: a little parsing challenge ☺ @ groups.google.com….

Thanks to the many who have written code and made helpful comments. I may come back to clean this up, in the coming weeks. If you can correct one of the following programs, please comment.

Pending Solutions

Python

Ruby

Perl

Common Lisp

blog comments powered by Disqus