Nested Syntax Design: XML vs LISP


XML has nested syntax. There's a competing syntax from lisp. For example:

<div>some <p><b>white</b> rabbit</p></div>
(div some (p (b white) rabbit))

the lispers claim their syntax is simpler and superior. I kinda went along (i guess because Wolfram language is like that too and Wolfram language is my favorite language. ⁖ f[g[h[x]]]), but hadn't really put thought into it. (it's my subconscious's doing!)

there's one major problem with the lisp way. If a closing bracket is missing, it's often impossible to tell where it belongs, because a closing bracket contains no info whatsoever on which opening bracket it is paired with. With XML, the closing tag does, usually (unless all your elements are the same element. ⁖ <span><span><span></span></span></span>)

here's a lisp expression with a missing bracket.

(a (b (d) (c))

it could have been either of the following (among other possibilities):

(a (b (d) (c)))
(a (b (d)) (c))
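
To make the ambiguity concrete, here is a minimal Python sketch that tries inserting one closing bracket at every position of the damaged expression and keeps each result that balances. The function names `is_balanced` and `single_bracket_repairs` are made up for this illustration, and "balanced" here only means the bracket counts work out, nothing about what the code means.

```python
def is_balanced(text):
    """Return True if every '(' has a matching ')' and no ')' comes too early."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def single_bracket_repairs(text):
    """Return every distinct balanced string made by inserting one ')'."""
    repairs = set()
    for i in range(len(text) + 1):
        candidate = text[:i] + ")" + text[i:]
        if is_balanced(candidate):
            repairs.add(candidate)
    return sorted(repairs)


broken = "(a (b (d) (c))"
repairs = single_bracket_repairs(broken)
for candidate in repairs:
    print(candidate)
```

Both readings shown above turn up in the printed list, together with other balanced candidates. Nothing in the damaged text says which one was meant.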

so, this is actually a problem. In practice, a missing bracket does happen. Suppose a cat jumped onto your keyboard.

so, sad, i think i'll have to admit the lisp way isn't superior.

but can we fix it? and still keep it simple?

one way is to allow multiple types of brackets. ⁖ ()[]{}〔〕【】〖〗「」『』〈〉《》‹›«» 〔➤ Unicode: Brackets & Quotation «» 「」 【】 《》〕 This way, things are still simple, and it does fix the flaw above. Also, the user can simply type one type of bracket, then press a button, and the editor will change them to cycle among different types of brackets. So, as far as fixing a missing bracket goes, it's even more robust than XML.

(it's more robust in the sense that the ending bracket has more chance of knowing which opening bracket it is paired with, assuming that different types of brackets are used and cycled. It's not necessarily more robust, because XML ending tags can take more damage before a complete ending tag is destroyed. ⁖ in </span>, it takes more than a few chars to destroy it. Though, if the deleted char is the slash, then it becomes an opening tag, a new can of worms. So, overall, we'll need a fuller analysis to say which is more robust.)
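
Here is a rough Python sketch of how mixed brackets let a checker point at the damaged spot. It assumes only the three ASCII pairs, and the sample input assumes the brackets were cycled by nesting depth. The name `find_unclosed` is made up for this illustration.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def find_unclosed(text):
    """Return (index, opener) of the first opener shown to be unclosed, else None.

    An opener is shown to be unclosed when a closer of another type arrives
    while it is innermost, or when the input ends with it still open.
    Stray closers are a different kind of damage and are ignored here.
    """
    stack = []  # (index, opener) pairs still waiting for their closer
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((i, ch))
        elif ch in CLOSERS and stack:
            index, opener = stack[-1]
            if PAIRS[opener] != ch:
                return (index, opener)
            stack.pop()
    return stack[-1] if stack else None


print(find_unclosed("(a [b {d} {c})"))   # mixed brackets, the ] is missing
print(find_unclosed("(a (b (d) (c))"))   # plain parens, one ) is missing
```

With mixed brackets the checker blames the `[` at index 3 as soon as the `)` arrives. With plain parens it only notices at end of input and blames the outermost `(` at index 0, which says nothing about where the damage is.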

XML Syntax Induced Human Error

Nick Alcock mentioned an interesting point. With the XML syntax, you get errors such as <a><b></a></b>, which would not be likely to occur in simple bracket syntax.

This point is interesting, because here we have human error induced by syntax. It happens because the XML way makes the code less obvious as nested structure. It becomes just running text.
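
For illustration, here is a small Python sketch that catches this kind of mis-nesting with a stack of open tag names. It is not a real XML parser: the regex is a simplification that ignores comments, CDATA, and a `>` inside attribute values, and the name `first_nesting_error` is made up for this example.

```python
import re

# Simplified tag pattern: optional leading slash, a name, optional trailing slash.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def first_nesting_error(markup):
    """Return a message for the first mis-nested or unclosed tag, or None."""
    stack = []  # names of the tags that are currently open
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> style tags open and close at once
        if not closing:
            stack.append(name)
        elif not stack:
            return f"</{name}> has no open tag"
        elif stack[-1] != name:
            return f"</{name}> found while <{stack[-1]}> is still open"
        else:
            stack.pop()
    if stack:
        return f"<{stack[-1]}> is never closed"
    return None


print(first_nesting_error("<a><b></a></b>"))
print(first_nesting_error("<div>some <p><b>white</b> rabbit</p></div>"))
```

The first call reports the `</a>` that arrives while `<b>` is still open, and the second returns None. The error class only exists because closing tags carry names. With a single anonymous closing bracket there is nothing to get out of order.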

XML Syntax Thwarts Navigating Tree Structure

also, the XML syntax makes it an order of magnitude more difficult for machines to parse the nested structure. 〔➤ Language Syntax: Brackets vs Begin/End〕

With simple bracket syntax, editors can trivially parse it, and let users navigate the nested tree structure. For example, in emacs, you can color the nested brackets, and also navigate the tree structure by keys. 〔➤ How to Edit Lisp Code with Emacs〕 It is possible in XML too of course, the point is that it's much more difficult to implement.

The practical consequence is that fewer editors support structured editing, and fewer people use it. This is arguably also a reason JSON is far more popular.
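
To give a sense of how little code the bracket syntax needs, here is a minimal Python sketch that turns a bracket expression into nested lists. It handles only round brackets and whitespace-separated atoms (no strings, comments, or quoting), and the name `parse` is just a label for this illustration.

```python
def parse(text):
    """Parse a bracket expression into nested Python lists of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    root = []       # holds the top-level expressions
    stack = [root]  # stack[-1] is the list currently being filled
    for token in tokens:
        if token == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return root


tree = parse("(a (b (d) (c)))")
print(tree)  # [['a', ['b', ['d'], ['c']]]]
```

Once the text is a tree like this, moving to a parent, child, or sibling is just list indexing, which is roughly what an editor's structural navigation needs.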

Racket Lisp Supports Mixed Brackets

Lisp expert François-René Rideau told me that Racket lisp already supports mixed brackets. That is, any of () [] {} can be used, and they are syntactically equivalent. Super! (original thread)

now, i dearly wish emacs lisp supported that. 〔➤ Emacs Lisp Basics by Example〕

am thinking about this issue because am currently writing a delete-tag command in elisp for xah-html-mode. https://code.google.com/p/ergoemacs/source/browse/packages/xah-html-mode.el (if you are using GNU Emacs's default html-mode, it's already there. The command is sgml-delete-tag. Note: i can't just call that command in my mode, because that command for some reason depends on emacs's idiotic system of syntax table. This means, if i want to save time and use it, i'll have to change the syntax table for my mode, or set it temporarily.)

Now, with this new understanding, i also need to modify my HTML6 spec 〔➤ HTML6: Your JSON & SXML Simplified〕 to allow mixed brackets.
