Fundamental Problems of Lisp, Syntax Irregularity

By Xah Lee. Date: 2008-07-30. Last updated: 2024-07-02.

Two problems are rooted in lisp itself. One is the irregularity in its often-cited regular syntax. The other is the language's use of “cons” for lists.

Syntax Irregularity

The Lisp family of languages, in particular Common Lisp, Scheme Lisp, and Emacs Lisp, is well known for the regularity of its syntax, namely, “everything” is of the form (f x1 x2 …). However, it is seldom discussed that the syntax has several irregularities. Here are some examples.

The comment syntax of semicolon to end of line. e.g. ; comment.
The dotted notation for cons cell. e.g. (1 . 2).
The single quote syntax used to create a list and hold evaluation of elements. e.g. '(1 2 3).
The backquote and comma are used as part of syntax to evaluate only parts of expression. e.g. (setq x 3) (setq name-value-pair `(x ,x)).
The ,@ for inserting a list as elements into another list. e.g. (setq x (list 1 2)) (setq y (list 3 4)) (setq xy `(,@ x ,@ y))
The square brackets used for the vector datatype in elisp. e.g. [1 2 3]. Scheme instead uses the apostrophe and number sign. e.g. '#(1 2 3).
There are various others in Common Lisp or Scheme Lisp. e.g. the char # and #|. Scheme's R6RS introduced a few new ones.

elisp cal-tex 2024-07-02 — elisp syntax irregularity, showing apostrophe, dot, semicolon.

elisp dolist 2024-07-02 — elisp syntax irregularity, showing the apostrophe, dot, grave accent, comma, comma at sign.

In the following, I detail how these irregularities hamper the power of regular syntax, and some powerful features and language developments that lisp has missed, possibly because of them.

Confusing

Lisp's irregular syntax is confusing in practice. For example, the difference between

(list 1 2 3)
'(1 2 3)
(quote (list 1 2 3))

is hard to understand. The uses of ` , ,@ are esoteric. If all these semantics used the regular syntactical form (f a b c), then much confusion would be reduced and people would understand and use these features better.

For example:

;; bad
(a . b)

;; good
(. a b)

;; better, with word
(cons a b)

;; bad
'(1 2 3)

;; good
(' 1 2 3)

;; better, with word
(quote 1 2 3)

;; bad
(setq aa `(,@ bb ,@ cc))

;; good
(setq aa (` (,@ bb) (,@ cc)))

;; better
(setq aa (eval-parts (splice bb) (splice cc)))
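
The regular forms above, such as (. a b) and (eval-parts …), are proposals, not existing syntax. As a rough sketch of what the irregular forms mean today, here is each one paired with an ordinary function call that returns an equal value in Emacs Lisp. The functions cons, quote, list, append are standard. The variables x, bb, cc are just sample data, and other lisps may behave a little differently.

```elisp
;; The three forms that are easy to confuse.
(list 1 2 3)            ; function call, returns the list (1 2 3)
'(1 2 3)                ; shorthand for (quote (1 2 3)), returns (1 2 3)
(quote (list 1 2 3))    ; returns the 4-element list (list 1 2 3), unevaluated

;; Sample data for the comparisons below.
(setq x 3)
(setq bb (list 1 2))
(setq cc (list 3 4))

;; Each irregular form next to a regular (f a b) form of equal value.
(equal '(a . b) (cons 'a 'b))           ; t
(equal '(1 2 3) (quote (1 2 3)))        ; t
(equal `(x ,x) (list 'x x))             ; t
(equal `(,@bb ,@cc) (append bb cc))     ; t
```

So the semantics is already expressible in the (f a b c) form. The special characters add a second notation for it.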

Syntax-Semantics Correspondence

A regular nested syntax has a one-to-one correspondence to the language's abstract syntax tree, and to a large extent the syntax has some correspondence to the language's semantics. The irregularities in syntax break this correspondence.

For example, a programer can tell what a piece of source code (f x1 x2) does just by reading the name that appears as the first element in the paren. By contrast, in syntax soup languages such as Java and Perl, the programer must be familiar with each of the tens of historically evolved ad hoc syntactical forms.

Ad hoc syntax of C derived langs:

if (…) {…}
for (…; …; …) {…}
(A ? B : C)
x++
myList = [1, 2, 3]

If lisp's '(1 2 3) were instead written (' 1 2 3) or (literal-list 1 2 3), then the syntax-semantics correspondence would be kept intact.
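
One way to see the broken correspondence is to ask the lisp reader what it makes of the text. The following is a small sketch for Emacs Lisp only, using the standard function read on a string. It suggests that the reader turns the special characters back into a regular nested list headed by a symbol, which means the regular form exists internally but is not what the programer types. The symbol names shown for backquote and comma are what Emacs uses. Other lisps name them differently.

```elisp
;; `read' parses text into a lisp object without evaluating it.
(read "'(1 2 3)")        ; the list (quote (1 2 3)), may print as '(1 2 3)
(car (read "'(1 2 3)"))  ; the symbol quote
(equal (read "'(1 2 3)") (list 'quote (list 1 2 3)))    ; t

;; Backquote and comma are read the same way: as nested lists
;; whose first element is a symbol named "`" or ",".
(setq form (read "`(x ,x)"))
(symbol-name (car form))                        ; "`"
(symbol-name (car (car (cdr (car (cdr form))))))  ; ","
```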

Source Code Transformation

Lisp relies on a regular nested syntax. Because the syntax is so regular, source code can be transformed by a simple lexical scan. This has powerful ramifications. (lisp macros are one example.) For example, since the syntax is regular, one could easily have alternative, easier-to-read syntaxes as a layer on top. (the concept is known in lisp history as M-expression.) Wolfram Language takes advantage of this, so that you have an easy-to-read syntax, yet fully retain the advantages of the regular form.

In lisp history, such a layer has been tried here and there in various forms or langs [see LISP Infix Syntax Survey], but never caught on, largely for social reasons. Part of the reason is political: lispers identify with the nested syntax and are sensitive to criticism of it.

In lisp communities, it is widely recognized that lisp's regular syntax has the property that “code is data; data is code”. However, what that means exactly is usually not clearly understood in the lisp community. Here is its defining characteristic:

A regular nested syntax makes it possible to do source code transformations trivially.
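
As an illustration of how little machinery such a transformation needs, here is a sketch in Emacs Lisp. The function name my-rename-symbol is made up for this example. It walks a piece of code, held as a nested list, and returns a copy with one symbol renamed. It only descends into lists, so symbols inside vectors such as [1 2 3] are left alone, which is one small place where the irregular syntax shows up.

```elisp
(defun my-rename-symbol (form old new)
  "Return a copy of FORM with every symbol OLD replaced by NEW.
FORM is lisp code as a nested list. Vectors are not descended into."
  (cond
   ;; The symbol itself: replace it.
   ((eq form old) new)
   ;; A cons cell: transform both halves recursively.
   ((consp form)
    (cons (my-rename-symbol (car form) old new)
          (my-rename-symbol (cdr form) old new)))
   ;; Anything else (numbers, strings, other symbols): keep as is.
   (t form)))

(my-rename-symbol '(setq x (+ x 1)) 'x 'y)
;; returns (setq y (+ y 1))
```

This is a blind rename with no notion of scope. A real refactoring tool would need more care, but the tree walk itself stays this short because every construct has the same shape.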

The benefits of a regular syntax have become widely recognized since ~2005, through the XML language. Due to its strict syntactical regularity, XML has grown into many technologies with a life of their own, such as XSLT, XQuery, STX, etc, all relying on this lexical transformation property.

Automatic, Uniform, Universal, Source Code Display

One of the advantages of regular nested syntax is that a programer should never need to format his source code manually (i.e. pressing tabs, returns). This saves the hundreds of hours of labor, guides, tutorials, and editor tools that are part of what's known as “coding style convention”, because the editor can reformat the source code on the fly based on a trivial lexical scan. (as an example, such “coding style convention” almost never appears in XML, because there are plenty of tools to automatically format it, due to the regularity of its syntax.)

Because lisp's syntax has lots of nested parentheses, formatting source code when coding lisp is much more labor-intensive than in syntax soup languages such as Perl, even when using a dedicated lisp editor such as emacs, which contains a large number of editing commands for nested syntax. [see How to Edit Lisp Code with Emacs]

The lisp community established a particular way of formatting lisp code, as exhibited in emacs's lisp modes and written guides of conventions. The recognition of such a convention further erodes any possibility and awareness of automatic, uniform, universal formatting.

As an example, the Mathematica language features a pure nested syntax similar to lisp but without irregularities. So, in that language, since version 3 released in 1996, the source code in its editor is automatically formatted on the fly as the programer types, much in the same way paragraphs have been automatically wrapped in word processors since the early 1990s. (In fact, Mathematica features automatic rendering of source code into 2-dimensional mathematical notations.)

Note the phrase “automatic, uniform, universal, source code display”.

By automatic, it means that any text editor can format your code on the fly or by request, and this feature can be TRIVIALLY implemented.
By uniform, it means there is one SIMPLE and MECHANICAL heuristic to determine a canonical way to format any regularly nested code for human-readable display.
By universal, it means that all programers will recognize and be habituated to this one canonical way, as a standard. (they can of course set a preference in their editor to display it in other ways)

The “uniform” and “universal” aspects are a well-known property of Python lang's source code. The reason Python's source code has such uniform and universal display formatting is that the formatting is worked into the language's semantics. That is: the semantics of the code depends on the “formatting”. But also note, Python's source code is not and cannot be automatically formatted, precisely because the semantics and formatting are tied together. A strictly regular nested syntax, such as Mathematica's, can be, and has been since 1996.

Lisp, despite its syntax irregularities, can still have automatic formatting, at least to a large, practical extent. Once lisp has automatic on-the-fly formatting, lisp code will achieve uniform and universal source code display. (In emacs, this feature is similar to how fill-paragraph and auto-fill-mode work, and might be called fill-sexp or auto-fill-sexp-mode. See: A Simple Lisp Code Formatter.)
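
As a rough approximation of the idea, and not the formatter linked above, Emacs already ships a pretty printer in the pp library. The sketch below wraps the standard functions read and pp-to-string. The name my-reformat-sexp-string is invented for this example. It reads one expression from text, however badly laid out, and returns it laid out mechanically. Note one cost: the comment syntax is discarded by the reader, so comments do not survive the round trip.

```elisp
(require 'pp)

(defun my-reformat-sexp-string (text)
  "Read one lisp expression from TEXT and return it pretty-printed.
Comments in TEXT are lost, because the reader discards them."
  (pp-to-string (read text)))

;; Badly laid out input. The exact line breaks in the result are
;; decided by pp and may differ between emacs versions.
(my-reformat-sexp-string
 "(defun f (x)   (if x
 (message \"yes\")      (message \"no\")))")
```

Whatever layout pp picks, reading the result back gives the same expression as reading the input, which is the property that makes mechanical formatting safe.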

The advantage of having an automatic, uniform, universal source code display for a language is that it gets rid of the hundreds of hours spent on the labor, tools, guides, and arguments about how one should format his code. (this is partly the situation of Python already) But more importantly, having such properties will actually have an impact on how programers code in the language. i.e. what kind of idioms they choose to use, what type of comments they put in code, and where. This further influences the evolution of the language, i.e. what kind of functions or features are added to the lang. For some detail on this aspect, see: The Harm of Manual Code Formating.

Syntax as Markup Language

One of the powers of a uniform nested syntax is that you could build up layers on top of it, so that the source code can function as markup of conventional mathematical notations (For example, MathML) and/or as a word-processor format that can contain headers, colored text, links, images, videos, version log, yet lose practically nothing. (For example, Microsoft Office Open XML)

This was done in Mathematica in 1996 with the release of Mathematica version 3. (For example, think of XML, its uniform nested syntax, and its diverse use as a markup lang. Some people are now adding computational semantics to it (i.e. a computer language with the syntax of XML. e.g. [ http://www.o-xml.org/ ]). You can think of Mathematica as going the other way: starting with a computer lang with a regular nested syntax, then adding new but inert keywords to it with markup semantics. The compiler treats these inert keywords like comment syntax when doing computation. When the source code is read by an editor, the editor takes the markup keywords for structural or stylistic representation, with {title, heading, tables, images, animations, hyperlinks}, and typeset math expressions (For example, think of MathML) etc. Expressions with non-markup keywords are shown as plain text just like normal source code.)

For example, HTML has the “h1” tag for heading:

<h1>Some Title</h1>

Lisp could likewise have a function “h1” that acts as title markup:

(h1 "Some Title")

The compiler would treat “h1” similarly to the “format” function. When the source code is displayed in a lisp editor, only the formatted output would show.
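
No such “h1” exists in lisp. As a hypothetical sketch of the idea in Emacs Lisp, it could be defined as an ordinary function that returns its text unchanged as far as computation goes, but tagged with display information an editor can use. The standard function propertize attaches that information. The particular face values are arbitrary choices for this example.

```elisp
(defun h1 (text)
  "Return TEXT marked as a top-level heading.
Hypothetical markup function: the characters are unchanged,
only a display property is attached for an editor to render."
  (propertize text 'face '(:weight bold :height 1.5)))

;; To the computation this is still the string "Some Title".
;; Inserted into an emacs buffer, it can show as large bold text.
(h1 "Some Title")
```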

Frequently Asked Questions

You say that lisp syntax irregularities “reduce such syntax's power”. What do you mean by “syntax's power”?

What Are Good Qualities of Computer Language Syntax?

Many of lisp's syntactic sugars are designed to reduce nested parens. Why use a more consistent, but more annoying, sugar syntax?

Many of lisp's irregular syntaxes could have a form that is regular yet does not require extra typing. e.g. '(a b c) could be (' a b c), and (a . b) could be (. a b).

But ultimately, you have to ask why lispers advocate nested syntax in the first place.

If lispers love the nested syntax, then the argument that there should not be irregularities has merit. If lispers think that some irregularities are good for convenience, then there's the question of how many, and in what form. You might as well introduce ++i for (setq i (+ i 1)).

A Text Editor Feature: Extend Selection by Semantic Unit

The Cons Problem

Fundamental Problems of LISP, the Cons Cell