Elisp: Ban Syntax Table

By Xah Lee. Date:

Emacs syntax table is truly, really, greatly annoying. We need to write a parsing lib without syntax table.

Every time I think of using it and relying on it, I run into a bunch of problems.

For example, recently I rewrote “xah-select-text-in-quote” so it's based on syntax table.

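(defun xah-select-text-in-quote ()
  "Select text between ASCII quotes, single or double."
  (interactive)
  (let (p1 p2)
    (if (nth 3 (syntax-ppss))           ; non-nil when point is inside a string
        (progn
          ;; The two string arguments are just non-nil values for the
          ;; ESCAPE-STRINGS and NO-SYNTAX-CROSSING parameters.
          (backward-up-list 1 "ESCAPE-STRINGS" "NO-SYNTAX-CROSSING")
          (setq p1 (point))             ; at the opening quote
          (forward-sexp 1)
          (setq p2 (point))             ; just after the closing quote
          (goto-char (1+ p1))
          (set-mark (1- p2)))
      (error "Cursor not inside quote"))))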

(Note: backward-up-list changed in Emacs 24.x. I'm using 24.4.)

Try the command in nxml-mode. It doesn't work. Why? Probably because nxml-mode makes complex use of the syntax table.

But syntax table is useful, no? No. It's not that Emacs syntax table is useful. It's the underlying builtin Emacs parsing lib that's useful.

For example, the select-text-in-quote command used to be like this, without using syntax table:

(defun select-text-in-quote ()
  "Select text between the nearest left and right delimiters.
Delimiters are paired characters: ()[]{}<>«»“”‘’「」, including \"\"."
  (interactive)
  (let (b1 b2)
    ;; Move back to just after the nearest opening delimiter.
    (skip-chars-backward "^<>(“{[「«\"‘")
    (setq b1 (point))
    ;; Move forward to just before the nearest closing delimiter.
    (skip-chars-forward "^<>)”}]」»\"’")
    (setq b2 (point))
    (set-mark b1)))

The problem was that it couldn't deal with nested quotes or backslash-escaped quotes. Nor could it deal with 'single quotes', as used in {Python, Ruby, HTML, etc}. (If you include single quote as a delimiter, it's a problem because single quote is also used as apostrophe, which happens often (e.g. “it's so!”).) But at least, for any double-quoted string that doesn't contain backslash-escaped quotes, it always works, reliably.

So, the solution is either to add perhaps 50 lines of code to do the parsing, or to rely on the Emacs builtin parser. (See “Parsing Expressions” in the ELISP Manual.)
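To give a rough idea of the hand-written route, here is a minimal sketch of a scanner that finds the double-quoted string around a position without consulting any syntax table. The names my-string-bounds-at and my-select-text-in-double-quote are made up for this illustration, and the rule that a backslash always escapes the next character is an assumption. It handles ASCII double quotes only and always scans from the start of the buffer, so it is a starting point, not a finished parser.

```elisp
;; Hypothetical helper, not from the original command.
(defun my-string-bounds-at (pos)
  "Return (BEG . END) of the double-quoted string containing POS, or nil.
BEG and END are the positions just inside the quotes.  Scan from
the start of the buffer, using no syntax table.  Assumption: a
backslash always escapes the character after it."
  (save-excursion
    (goto-char (point-min))
    (let ((beg nil) (result nil) (done nil))
      ;; Stop at every double quote or backslash.
      (while (and (not done)
                  (re-search-forward "[\"\\]" nil t))
        (cond
         ;; Backslash: skip the escaped character.
         ((eq (char-before) ?\\)
          (unless (eobp) (forward-char 1)))
         ;; Opening quote: if it starts after POS, POS is outside any string.
         ((null beg)
          (if (> (point) pos)
              (setq done t)
            (setq beg (point))))
         ;; Closing quote: either POS was inside, or keep scanning.
         (t
          (let ((end (1- (point))))
            (if (<= pos end)
                (setq result (cons beg end) done t)
              (setq beg nil))))))
      result)))

;; Hypothetical command built on the helper above.
(defun my-select-text-in-double-quote ()
  "Select the text inside the double-quoted string around point."
  (interactive)
  (let ((bounds (my-string-bounds-at (point))))
    (unless bounds (user-error "Cursor not inside quote"))
    (goto-char (car bounds))
    (set-mark (cdr bounds))))
```

Because nothing here looks at the syntax table, the command should behave the same in every major mode, which is the whole point. The price is that it knows nothing about comments or about each language's own quoting rules.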

But if you rely on the Emacs parsing engine (such as the syntax-ppss function), you save the time of writing a parser, but the engine relies on syntax table. That means your command's behavior is unpredictable: it depends on each buffer's (major mode's) syntax table.
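Here is a small sketch of that dependence. The helper my-in-string-p is a made-up name for this illustration, and the two syntax tables are built by hand rather than taken from any real major mode. The same text at the same position gets two different answers from syntax-ppss, depending only on which table is active.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-in-string-p (text pos table)
  "Return non-nil if POS in TEXT is inside a string according to TABLE."
  (with-temp-buffer
    (set-syntax-table table)
    (insert text)
    (nth 3 (syntax-ppss pos))))

(let ((plain (make-syntax-table))
      (quoting (make-syntax-table)))
  ;; In `plain', the apostrophe is ordinary punctuation.
  (modify-syntax-entry ?' "." plain)
  ;; In `quoting', the apostrophe is a string delimiter.
  (modify-syntax-entry ?' "\"" quoting)
  ;; Position 6 is just before the second "s" in "it's so".
  (list (my-in-string-p "it's so" 6 plain)
        (my-in-string-p "it's so" 6 quoting)))
```

With the plain table the first element should be nil. With the quoting table the second element should be non-nil, because the apostrophe in “it's” is taken as the start of a string that never ends.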

Theoretically, the idea of syntax table is useful, because each major mode can have its own concept of quote, suitable for the language the major mode is designed for. Great. But in reality, it doesn't work out that way, as in this example of nxml-mode. (nxml-mode is written by James Clark, the world's top XML expert and also a top Emacs Lisp expert, who also wrote the classic html-mode and xml-mode in the 1990s, all part of Emacs.)

All these HTML modes had complex use of syntax table, because syntax table is not flexible. (E.g. in HTML, the apostrophe isn't normally a quoting character, except when inside a tag (and not already inside a quote).)

<div class='wow' title="it's">complex</div>

Emacs's syntax table is mostly designed for just 3 languages of the 1990s: {C, Lisp, TeX}.

