Elisp: Parsing Expression Grammar (PEG)

By Xah Lee. Date: . Last updated: .
xtodo work in progress

Code Example

;; -*- coding: utf-8; lexical-binding: t; -*-

(require 'peg)

(defun x-peg-fail (xstack)
  "Called by `peg-run' when fail.
Print to messages buffer of the stack.
Created: 2025-10-03
Version: 2025-10-03"
  (message "Parse failed. xstack [%s]" xstack))

(defun x-peg-success (zf)
  "Called by `peg-run' when success.
Print to messages buffer of ...
Created: 2025-10-03
Version: 2025-10-03"
  (message "
--------------------
Parse success.")
  (message "zf is [%s]" zf)
  (if (functionp zf)
      (message "function arg zf is %s" zf)
    (message "functionp nil"))
  (message "funcall zf result is %s" (funcall zf))
  (print "=========="))

(let ((xbuff (get-buffer-create "peg-parse-text")))

  (when (eq 1 (with-current-buffer xbuff (point-max)))
    (let ((xtext "<div><p></p></div>\n"))
      (with-current-buffer xbuff (insert xtext))))

  (with-current-buffer xbuff
    (goto-char (point-min))

    (with-peg-rules
        ((xml-elem `(-- (point)) opening-tag (* xml-elem) closing-tag `(-- (point)))
         (opening-tag "<" tag-name ">")
         (tag-name (+ [a-z]) (* "-") (* [a-z]))
         (closing-tag "</" tag-name ">"))
      (peg-run (peg xml-elem) 'x-peg-fail 'x-peg-success))

    ;; (pop-to-buffer xbuff)
    ;; (view-echo-area-messages)
    ))

;; --------------------
;; Parse success.
;; zf is [#<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_8>]
;; functionp true and is [#<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_8>]
;; funcall zf result is [(19 13 6 1)]

;; "=========="

PEG actions, example

;; -*- coding: utf-8; lexical-binding: t; -*-

(require 'peg)

(defun x-peg-fail (xstack)
  "Called by `peg-run' when fail.
Print to messages buffer of the stack.
Created: 2025-10-03
Version: 2025-10-03"
  (message "Parse failed. xstack [%s]" xstack))

(defun x-peg-success (zf)
  "Called by `peg-run' when success.
Print to messages buffer of ...
Created: 2025-10-03
Version: 2025-10-03"
  (message "
--------------------
Parse success.")
  (message "zf is [%s]" zf)
  (if (functionp zf)
      (message "function arg zf is %s" zf)
    (message "functionp nil"))
  (message "funcall zf result is %s" (funcall zf))
  (print "=========="))

(let ((xbuff (get-buffer-create "xtext")))

  (when (eq 1 (with-current-buffer xbuff (point-max)))
    (let ((xtext "<div><p></p></div>\n"))
      (with-current-buffer xbuff (insert xtext))))

  (with-current-buffer xbuff
    (goto-char (point-min))

    ;; you can have action in a pex
    ;; you can have more than one action in a pex
    ;; you cannot have action between pexs. no syntax error but they are ignored.
    ;; each action has the form
    ;; `(arg1 arg2 etc -- ‹lisp code refering to arg1 arg2 etc›)
    ;; the args before double dash, are taken from stack. the code after double dash, are pushed to stack
    ;; args can be omitted. means no take from stack.
    ;; you must push to stack, else there is nothing on stack
    ;; before you take from stack (those args), there must be something on stack first. e.g. you must have a action before that pushed to stack

    (with-peg-rules
        ((xml-elem opening-tag (* xml-elem) closing-tag)

         (opening-tag "<" `(-- (point)) tag-name `(-- (point)) ">" `(aa bb -- (buffer-substring aa bb)))
         (tag-name (+ [a-z]) (* "-") (* [a-z]))
         (closing-tag "</" tag-name ">")
         (substring xml-elem))
      (peg-run (peg xml-elem) 'x-peg-fail 'x-peg-success))

    ;; (pop-to-buffer xbuff)
    ;; (view-echo-area-messages)
    ))

;; --------------------
;; Parse success.
;; zf is [#<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_8>]
;; function arg zf is #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_8>
;; funcall zf result is (p div)

;; "=========="

Source Code Inline Doc

This package implements Parsing Expression Grammars for Emacs Lisp.

This file implements the macros `define-peg-rule', `with-peg-rules', and `peg-parse' which parses the current buffer according to a PEG.

E.g. we can match integers with:

(with-peg-rules
        ((number sign digit (* digit))
         (sign   (or "+" "-" ""))
         (digit  [0-9]))
      (peg-run (peg number)))

or

(define-peg-rule digit ()
      [0-9])
    (peg-parse (number sign digit (* digit))
               (sign   (or "+" "-" "")))

The syntax for PEX (Parsing Expression) is a follows:

Description		Lisp		Traditional, as in Ford's paper
    ===========		====		===========
    Sequence			(and E1 E2)	e1 e2
    Prioritized Choice	(or E1 E2)	e1 / e2
    Not-predicate		(not E)		!e
    And-predicate		(if E)		&e
    Any character		(any)		.
    Literal string		"abc"		"abc"
    Character C		(char C)	'c'
    Zero-or-more		(* E)		e*
    One-or-more		(+ E)		e+
    Optional			(opt E)		e?
    Non-terminal             SYMBOL		A
    Character range		(range A B)	[a-b]
    Character set		[a-b "+*" ?x]	[a-b+*x]   ;Note: it's a vector
    Character classes	[ascii cntrl]
    Boolean-guard		(guard EXP)
    Syntax-Class		(syntax-class NAME)
    Local definitions	(with RULES PEX...)
    Indirect call            (funcall EXP ARGS...)

and

Empty-string		(null)		ε
    Beginning-of-Buffer	(bob)
    End-of-Buffer		(eob)
    Beginning-of-Line	(bol)
    End-of-Line		(eol)
    Beginning-of-Word	(bow)
    End-of-Word		(eow)
    Beginning-of-Symbol	(bos)
    End-of-Symbol		(eos)

Named rulesets:

You can define a set of rules for later use with:

(define-peg-ruleset myrules
      (sign  () (or "+" "-" ""))
      (digit () [0-9])
      (nat   () digit (* digit))
      (int   () sign digit (* digit))
      (float () int "." nat))

and later refer to it:

(with-peg-rules
        (myrules
         (complex float "+i" float))
      ... (peg-parse nat "," nat "," complex) ...)

Parsing actions:

PEXs also support parsing actions, i.e. Lisp snippets which are executed when a pex matches. This can be used to construct syntax trees or for similar tasks. The most basic form of action is written as:

(action FORM)          ; evaluate FORM for its side-effects

Actions don't consume input, but are executed at the point of match. Another kind of action is called a "stack action", and looks like this:

`(VAR... -- FORM...)   ; stack action

A stack action takes VARs from the "value stack" and pushes the results of evaluating FORMs to that stack.

(with-peg-rules
    ((query (+ term) (eol))
     (term key ":" value (opt (+ [space]))
	   `(k v -- (cons (intern k) v)))
     (key (substring (and (not ":") (+ [word]))))
     (value (or string-value number-value))
     (string-value (substring (+ [alpha])))
     (number-value (substring (+ [digit]))
		   `(val -- (string-to-number val))))
  (peg-run (peg query)))

This invocation of `peg-run' would parse this buffer text:

name:Jane age:30

And return this Elisp sexp:

((age . 30) (name . "Jane"))

Derived Operators:

The following operators are implemented as combinations of primitive expressions:

(substring E)  ; Match E and push the substring for the matched region.
    (region E)     ; Match E and push the start and end positions.
    (replace E RPL); Match E and replace the matched region with RPL.
    (list E)       ; Match E and push a list of the items that E produced.

See `peg-ex-parse-int' in `peg-tests.el' for further examples.

Regexp equivalents:

Here a some examples for regexps and how those could be written as pex. [Most are taken from rx.el]

"^[a-z]*"
    (and (bol) (* [a-z]))

    "\n[^ \t]"
    (and "\n" (not [" \t"]) (any))

    "\\*\\*\\* EOOH \\*\\*\\*\n"
    "*** EOOH ***\n"

    "\\<\\(catch\\|finally\\)\\>[^_]"
    (and (bow) (or "catch" "finally") (eow) (not "_") (any))

    "[ \t\n]*:\\([^:]+\\|$\\)"
    (and (* [" \t\n"]) ":" (or (+ (not ":") (any)) (eol)))

    "^content-transfer-encoding:\\(\n?[\t ]\\)*quoted-printable\\(\n?[\t ]\\)*"
    (and (bol)
         "content-transfer-encoding:"
         (* (opt "\n") ["\t "])
         "quoted-printable"
         (* (opt "\n") ["\t "]))

    "\\$[I]d: [^ ]+ \\([^ ]+\\) "
    (and "$Id: " (+ (not " ") (any)) " " (+ (not " ") (any)) " ")

    "^;;\\s-*\n\\|^\n"
    (or (and (bol) ";;" (* (syntax-class whitespace)) "\n")
        (and (bol) "\n"))

    "\\\\\\\\\\[\\w+"
    (and "\\\\[" (+ (syntax-class word)))

See ";;; Examples" in `peg-tests.el' for other examples.

Rule argument and indirect calls:

Rules can take arguments and those arguments can themselves be PEGs. For example:

(define-peg-rule 2-or-more (peg)
      (funcall peg)
      (funcall peg)
      (* (funcall peg)))

    ... (peg-parse
         ...
         (2-or-more (peg foo))
         ...
         (2-or-more (peg bar))
         ...)

References:

Todo:

News:

Reference