Nested Syntax: XML vs LISP

By Xah Lee.

XML has nested syntax. There's a competing syntax from lisp. For example:

<div>some <p><b>white</b> rabbit</p></div>
(div some (p (b white) rabbit))

the lispers claim their syntax is simpler and superior.

there's one major problem with the lisp way. If a closing bracket is missing, it's often impossible to tell where it belongs, because a closing bracket carries no info on which opening bracket it is paired with. With XML, the closing tag usually does (unless all your elements are the same element, for example: <span><span><span></span></span></span>).

here's a lisp expression with a missing bracket.

(a (b (d) (c))

it could be either of the following:

(a (b (d) (c)))
(a (b (d)) (c))
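
The ambiguity can be checked mechanically. Here's a minimal sketch in Python (the function names are made up for illustration): insert one ")" at every position and keep the results that form a single balanced list. Both readings above show up among the candidates, and they are not the only ones.

```python
def is_single_list(text):
    """True if text is one balanced parenthesized list and nothing else."""
    if not text.startswith("("):
        return False
    depth = 0
    for index, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
            # Reaching depth 0 before the last character means trailing content.
            if depth == 0 and index != len(text) - 1:
                return False
    return depth == 0


def candidate_repairs(broken):
    """All distinct strings made by inserting one ')' that form a single list."""
    candidates = set()
    for position in range(len(broken) + 1):
        repaired = broken[:position] + ")" + broken[position:]
        if is_single_list(repaired):
            candidates.add(repaired)
    return sorted(candidates)


repairs = candidate_repairs("(a (b (d) (c))")
for repaired in repairs:
    print(repaired)
```

Nothing in the broken text says which candidate the author meant, so the program can only list them.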

so, this is actually a problem. In practice, missing brackets do happen. Suppose a cat jumped onto your keyboard.

so, sad, i think i'll have to admit the lisp way isn't superior.

but can we fix it? and still keep it simple?

one way is to allow multiple types of brackets, e.g. ()[]{}〔〕【】〖〗「」『』〈〉《》‹›«» [see Unicode: Brackets, Quotes «»「」【】《》] This way, things are still simple, and the flaw above is fixed. Also, the user can simply type one type of bracket, then press a button, and the editor will rewrite them to cycle among the different types of brackets. So, as far as fixing a missing bracket goes, this is even more robust than XML.
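
The "press a button" step could look something like the following. This is only a sketch in Python, not an existing editor command: the name cycle_brackets is hypothetical, it cycles among just three bracket types, and it assumes the input is already balanced.

```python
OPENERS = "([{"
CLOSERS = ")]}"


def cycle_brackets(text):
    """Rewrite () so that each nesting depth uses the next bracket type in turn."""
    result = []
    depth = 0
    for ch in text:
        if ch == "(":
            result.append(OPENERS[depth % len(OPENERS)])
            depth += 1
        elif ch == ")":
            depth -= 1
            result.append(CLOSERS[depth % len(CLOSERS)])
        else:
            result.append(ch)
    return "".join(result)


print(cycle_brackets("(a (b (d) (c)))"))  # prints (a [b {d} {c}])
```

With more bracket types in OPENERS and CLOSERS, the same type would repeat less often in deep nesting.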

(it's more robust in the sense that the ending bracket has a better chance of knowing which opening bracket it is paired with, assuming that different types of brackets are used and cycled. It's not necessarily more robust overall, because XML ending tags can take more damage before a complete ending tag is destroyed. For example, in </span>, it takes more than a few deleted chars to destroy it. Though, if the deleted char is the slash, it becomes an opening tag, a new can of worms. So, overall, we'll need a fuller analysis to say which is more robust.)
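
To make "knowing which opening bracket it is paired with" concrete, here's a rough sketch in Python of a scanner for text that mixes () [] {}. The function name is made up, and stray closing brackets are simply ignored. It reports which opening brackets never got their closer. It does not say where the missing closer should go.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def unclosed_openers(text):
    """Return (bracket, index) for every opener that never gets its closer."""
    stack = []      # openers seen so far that are still waiting for a closer
    unclosed = []
    for index, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, index))
        elif ch in CLOSERS:
            # Find the nearest waiting opener of the same bracket type.
            match = None
            for position in range(len(stack) - 1, -1, -1):
                if PAIRS[stack[position][0]] == ch:
                    match = position
                    break
            if match is None:
                continue  # stray closer: ignored in this sketch
            # Openers after the match were skipped over, so they are unclosed.
            unclosed.extend(stack[match + 1:])
            del stack[match:]
    unclosed.extend(stack)
    return sorted(unclosed, key=lambda item: item[1])


print(unclosed_openers("(a [b {d} {c})"))  # prints [('[', 3)]
print(unclosed_openers("(a (b (d) (c))"))  # prints [('(', 0)]
```

In the first call the ")" cannot belong to the "[", so the "[" at index 3 is known to be the unclosed one. In the second call, with only one bracket type, the scan can do no better than blame the outermost "(", which is a guess.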

XML Syntax Induced Human Error

[Nick Alcock https://plus.google.com/115849739354666812574/posts] mentioned an interesting point. With the XML syntax, you have errors such as <a><b></a></b>, which would not be likely to occur in simple bracket syntax.

This point is interesting, because here we have human error induced by syntax. It happens because the XML way makes the nested structure less obvious. The code becomes just running text.

XML Syntax Thwarts Navigating Tree Structure

also, the XML syntax makes it an order of magnitude more difficult for machines to parse the nested structure. [see Language Syntax: Brackets vs Begin/End]

With simple bracket syntax, editors can trivially parse it, and let users navigate the nested tree structure. For example, in emacs, you can color the nested brackets, and also navigate the tree structure by keys. [see How to Edit Lisp Code with Emacs] It is possible in XML too, of course; the point is that it's much more difficult to implement.
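
As a rough illustration of how little is needed, here's a sketch in Python of a "jump to matching bracket" helper. The name matching_close is hypothetical, and the sketch ignores string literals and comments, which a real lisp editor has to handle.

```python
def matching_close(text, open_index):
    """Return the index of the ')' that closes the '(' at open_index, or None."""
    if text[open_index] != "(":
        raise ValueError("open_index must point at '('")
    depth = 0
    for index in range(open_index, len(text)):
        if text[index] == "(":
            depth += 1
        elif text[index] == ")":
            depth -= 1
            if depth == 0:
                return index
    return None  # ran off the end: the bracket is never closed


code = "(div some (p (b white)) rabbit)"
start = code.index("(p")
end = matching_close(code, start)
print(code[start:end + 1])  # prints (p (b white))
```

One counter is the whole parser state. The XML equivalent has to recognize tag names, attributes, and self-closing tags before it can count anything.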

The practical consequence is that fewer editors support structured editing, and fewer people use it. This is also the reason JSON is far more popular.

Racket Lisp Supports Mixed Brackets

Lisp expert [François-René Rideau https://plus.google.com/u/0/108564127390615114635/posts] told me that Racket lisp already supports mixed brackets. That is, any of () [] {} can be used, and they are syntactically equivalent. Super! ([original thread https://plus.google.com/112757647855302148298/posts/acbh1pUD7Qt])

now, i dearly wish emacs lisp supported that. [see Emacs Lisp Tutorial by Example]

am thinking about this issue because am currently writing a delete-tag command in elisp for xah-html-mode. https://code.google.com/p/ergoemacs/source/browse/packages/xah-html-mode.el (if you are using GNU Emacs's default html-mode, it's already there. The command is sgml-delete-tag. (note: i can't just call that command in my mode, because that command for some reason depends on emacs's idiotic system of syntax table. This means, if i want to save time and use it, i'll have to change the syntax table for my mode, or set it temporarily.) )

Now, with this new understanding, i also need to modify my HTML6 spec to allow mixed brackets. [see HTML6, JSON SXML Simplified]

XML Syntax Allows Early Error Detection

XML and its derivates have the verbose closing tags on purpose. Basically, with Lisp or your bracket syntax you have to parse the complete document to find errors whereas with XML you can detect errors on the go.

(from Tassilo Horn At https://plus.google.com/+XahLee/posts/6zFZfvv677w)
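
Here's a small sketch in Python of what "on the go" means. Both scanners and both sample documents are made up for illustration, and the tag pattern is a simplification: it ignores self-closing tags, comments, and attribute values containing ">". Each function returns the offset at which a nesting error first becomes detectable.

```python
import re

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def xml_error_offset(text):
    """Offset where a tag-nesting error is first detectable, or None."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return match.start()  # wrong closing tag: caught mid-stream
    return len(text) if stack else None


def paren_error_offset(text):
    """Same idea for plain parentheses."""
    depth = 0
    for index, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return index  # stray closer
    return len(text) if depth else None


xml_doc = "<div><p>white rabbit</div><div>more text</div>"
lisp_doc = "(div (p white rabbit) (div more text)"
print(xml_error_offset(xml_doc), len(xml_doc))
print(paren_error_offset(lisp_doc), len(lisp_doc))
```

In the XML sample the missing </p> is noticed at the first </div>, well before the end of the text. In the bracket sample the missing ")" is noticed only when the input runs out.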

Best Syntax for Nesting?

what's a nesting syntax that doesn't have all these problems?

how about, bracket with a number to uniquely identify the bracket? Like this:
(n body n)

Example:

(1 div some (2 p (3 b white 3) rabbit 2) 1)
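
Nobody would want to type the numbers by hand, so an editor would have to add them. A minimal sketch in Python of such a command follows. The name number_brackets is hypothetical, it assumes the input is balanced, and it numbers the brackets in the order they open.

```python
def number_brackets(text):
    """Label each ( ... ) pair with a counter, as in (n body n)."""
    result = []
    open_labels = []  # labels of brackets that are still open
    next_label = 1
    for ch in text:
        if ch == "(":
            open_labels.append(next_label)
            result.append(f"({next_label} ")
            next_label += 1
        elif ch == ")":
            result.append(f" {open_labels.pop()})")
        else:
            result.append(ch)
    return "".join(result)


print(number_brackets("(a (b (d) (c)))"))  # prints (1 a (2 b (3 d 3) (4 c 4) 2) 1)
```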

When there is a lot of code, such as html, the number may get long. So, a better way is perhaps to allow letters too.

Letters plus number would be 36 possible characters. If we allow 2 places, we have 36^2 = 1296 possibilities.

If we allow 3 places, we have 36^3 = 46656 possibilities. More than enough in a doc.
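
The arithmetic above, plus one way to turn a counter into such a label, sketched in Python. The function to_label and the choice of digits 0-9 followed by lowercase a-z are assumptions made for illustration.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 characters: 0-9 then a-z


def to_label(number, places):
    """Encode number as a fixed-width base-36 label."""
    if not 0 <= number < len(DIGITS) ** places:
        raise ValueError("number does not fit in the given number of places")
    chars = []
    for _ in range(places):
        number, remainder = divmod(number, len(DIGITS))
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))


print(36 ** 2, 36 ** 3)  # prints 1296 46656
print(to_label(0, 3), to_label(35, 3), to_label(36, 3), to_label(46655, 3))
# prints 000 00z 010 zzz
```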