Programing Language: Fundamental Problems of Lisp

, ,

In this article, i discuss 2 problems that are rooted in lisp. One is the irregularity in its often cited regular syntax. The other is the language's use of “cons” for list.

Syntax Irregularities

Lisp family of languages, in particular, Common Lisp, Scheme Lisp, Emacs Lisp, are well known for its syntax's regularity, namely, “everything” is of the form (f x1 x2 …). However, it is little talked about that there are several irregularities in its syntax. Here are some examples of the syntax irregularity.

In the following, i detail how these irregularities hamper the power of regular syntax, and some powerful features and language developments that lisp have missed that may be due to it.

Confusing

Lisp's irregular syntax are practically confusing. For example, the difference between (list 1 2 3), '(1 2 3), (quote (1 2 3)) is a frequently asked question. The use of ` , ,@ are esoteric. If all these semantics use the regular syntactical form (f args), then much confusion will be reduced and people will understand and use these features better. For example:

(a . b) ; bad

(. a b) ; good
'(1 2 3) ; bad

(' 1 2 3) ; good
; or
(list-literal 1 2 3) ; good
(setq myListXY `(,@ myListX ,@ myListY)) ; bad

(setq myListXY (` (,@ myListX) (,@ myListY))) ; good
; or
(setq myListXY (eval-parts (splice myListX) (splice myListY))) ; good

Syntax-Semantics Correspondence

A regular nested syntax has a one-to-one correspondence to the language's abstract syntax tree, and to a large extent the syntax has some correspondence to the language's semantics. The irregularities in syntax break this correspondence.

For example, programers can tell what piece of source code (f x1 x2 x3 …) do by just reading the name that appears as first element in the paren. As a contrast, in syntax soup languages such as Java, Perl, the programer must be familiar with each of its tens of historically evolved ad hoc syntactical forms. (⁖ if (…) {…}, for (…; …; …) {…}, (A ? B : C), x++, myList = [1, 2, 3], ….) As a example, if lisp's '(1 2 3) is actually (' 1 2 3) or (literal-list 1 2 3), then the syntax-semantic correspondence is kept intact.

Source Code Transformation

Lisp relies on a regular nested syntax. Because of such regularity of the syntax, it allows transformation of the source code by a simple lexical scan. This has powerful ramification. (lisp's macros is one example) For example, since the syntax is regular, one could easily have alternative, easier-to-read syntaxes as a layer. (the concept is known in lisp history as M-expression) Mathematica took this advantage, so that you have easy-to-read syntax, yet fully retain the advantages of regular form. In lisp history, such layer has been done and tried here and there in various forms or langs 〔➤ LISP Sans Parenthesis, 1958 to 2013, a Survey〕 , but never caught on due to largely social reasons. Part of the reason is political and lisper's sensitivity to criticism of nested syntax identity.

In lisp communities, it is widely recognized that lisp's regular syntax has the property that “code is data; data is code”. However, what does that mean exactly is usually not clearly understood in the lisp community. Here is its defining characteristics:

A regular nested syntax, makes it possible to do source code transformations trivially with a lexical scan.

The benefits of a regular syntax has become widely recognized since ≈2005, by the XML language. The XML language, due to its strict syntactical regularity, has developed into many technologies on a life of their own, such XSLT, XQuery, STX etc, due to this lexical transformation property.

Addendum: in recent years, the term “homoiconicity” is used to describe this property. Wikipedia at Homoiconicity lists langs having that property:

Note Mathematcia, tcl, XSLT, and assembly!

Automatic, Uniform, Universal, Source Code Display

One of the advantage of regular nested syntax is that a programer should never need to format his source code manually (i.e. pressing tabs, returns), and save the hundreds hours of labor, guides, tutorials, editor tools, that are part of what's known as “coding style convention”, because the editor can reformat the source code on the fly based on a trivial lexical scan. (as a example, such “coding style convention” almost never appear in XML, because there are plenty tools to automatically format it, due to the regularity in syntax.)

Because lisp's syntax has lots of nested parenthesis, so when coding lisp, the source code formatting is much more labor-intensive than syntax soup languages such as Perl, even when using a dedicated lisp editor such as emacs that contain large number of editing commands on nested syntax. 〔➤ How to Edit Lisp Code with Emacs

The lisp community, established a particular way of formatting lisp code as exhibited in emacs's lisp modes and written guides of conventions. The recognition of such convention further erode any possibility and awareness of automatic, uniform, universal, formatting.

As a example, the Mathematica language features a pure nested syntax similar to lisp but without irregularities. So, in that language, since version 3 released in 1996, the source code in its editor are automatically formatted on the fly as programer types, much in the same way paragraphs are automatically wrapped in a word processor since early 1990s. (In fact, with Mathematica, it features automatic rendering of source code into 2-dimensional Mathematical notations.)

Note the phrase “automatic, uniform, universal, source code display”.

The “uniform” and “universal” aspect is a well-known property of Python lang's source code. The reason Python's source code has such uniform and universal display formatting is because it is worked into the language's semantics. That is: the semantics of the code depends on the “formatting”. But also note, Python's source code is not and cannot be automatically formatted, precisely because the semantics and formatting are tied together. A strictly regular nested syntax, such as Mathematica's, can, and is done, since 1996.

Lisp, despite its syntax irregularities, can still have a automatic formatting at least to a large, practical, extent. Once lisp has automatic on-the-fly formatting, then lisp code will achieve uniform and universal source code formatting display. (In emacs, this feature is similar to how fill-paragraph, auto-fill-mode works, and might be called fill-sexp or auto-fill-sexp-mode. See: A Simple Lisp Code Formatter.)

The advantage of having a automatic, uniform, universal, source code display for a language is that it gets rids of the hundreds of hours on the labor, tools, guides, arguments, about how one should format his code. (this is partly the situation of Python already) But more importantly, by having such properties, it will actually have a impact on how programer code in the language. i.e. what kind of idioms they choose to use, what type of comments they put in code, and where. This, further influences the evolution of the language, i.e. what kind of functions or features are added to the lang. For some detail on this aspect, see: The Harm of Manual Code Formating.

Syntax as Markup Language

One of the power of a uniform nested syntax is that you could build up layers on top of it, so that the source code can function as markup of conventional mathematical notations (⁖ MathML) and or as a word-processor format that can contain headers, colored text, links, images, videos, version log, yet lose practical nothing. (⁖ Microsoft Office Open XML)

This is done in Mathematica in 1996 with release of Mathematica version 3. (⁖ think of XML, its uniform nested syntax, its diverse use as a markup lang, then, some people are adding computational semantics to it now (i.e. a computer language with syntax of XML. ⁖ www.o-xml.org). You can think of Mathematica going the other way, by starting with a computer lang with a regular nested syntax, then add new but inert keywords to it with markup semantics. The compiler will treat these inert keywords like comment syntax when doing computation. When the source code is read by a editor, the editor takes the markup keywords for structural or stylistic representation, with {title, heading, tables, images, animations, hyperlinks}, and typeset math expression (⁖ think of MathML) etc. Expressions with non-mark-up keywords are shown as plain text just like normal source code.)

For example, HTML has the “h1” tag for heading:

<h1>Some Title</h1>

In lisp, it could also have a function “h1” that acts as a title markup:

(h1 "Some Title")

The compiler will treat the “h1” similar to the “format” function. When the source code is displayed in a lisp editor, only formatted output would show. 〔➤ HTML6: Your JSON & SXML Simplified

Frequently Asked Questions

You say that lisp syntax irregularities “reduce such syntax's power”. What you mean by “syntax's power”?

See: What Are Good Qualities of Computer Language Syntax?.

Many of lisp's sugar syntax are designed to reduce nested paren. Why using a more consistent, but more annoying sugar syntax?

Many lisp's irregular syntax could have a form that is regular yet does not require extra typing. For example, '(a b c) could be (' a b c), and (a . b) could be (. a b).

But ultimately, you have to ask why lisper advocate nested syntax in the first place.

If lispers love the nested syntax, then, the argument that there should not be irregularities, has merit. If lispers think that some irregularities is good for convenience, then there's the question of how many, or what form. You might as well introduce ++i for (setq i (+ i 1)).

Further readings:

The Cons Business

The above are some of the damages of irregularity in lisp's syntax. The other fundamental problem in the language is its cons cells as its list construction primitive.

Lisp at core is based on functional programing on lists. This is a powerful paradigm. However, for historical reasons, lisp's list is based on the hardware concept of “cons” cell. From a mathematical point of view, what this means is that lisp's list is limited to a max of 2 elements. If you want a longer list, you must nest it and interpret it in a special way. (i.e. effectively creating a mini-protocol, known as “proper list”.) The cons fundamentally crippled the development of list processing. (info "(elisp) Building Lists")

Lisp being historically based the cons for about 4 decades. The cons and the associated {car cdr cadr caadr caddr caaar cdddr …} are fundamentally rooted in the lisp langs, is thus not something that can be easily mended. This is unfortunate. However, this situation could be improved. By, for example:

One of the myth that is quickly injected into budding lispers, is that cons is powerful. It is powerful in the sense assembly lang is powerful.

Lisp's cons is perhaps the greatest f��� up in the history of computer languages.

For some concrete examples of problems caused by the cons, see:

Frequently Asked Questions

If you don't like cons, lisp has arrays and hashmaps, too.

Suppose there's a lang called gisp. In gisp, there's cons but also fons. Fons is just like cons except it has 3 cells with 3 accessors: car, cbr, cdr. Now, gisp is a old lang, the fons are deeply rooted in the lang. Every some 100 lines of code you'll see a use of fons with its extra accessor cbr, or any one of the cbaar, cdabr, cbbar, cbbbar, etc. You got annoyed by this. You complain that fons is bad. But then some gisp fanatics retorts: “If you don't like fons, gisp has cons, too!”.

You see, by “having something too”, does not solve the problem of pollution.

I like the cons concept. Even in functional languages like Haskell it is popular, ⁖ when matching in the form of (x:xs), which is the same like car/cdr in Lisp.

Languages that have a list data type implemented as linked list, with First, Rest accessors, do not expose it as API in the same way lisp does with explicit cons.

Lisp's cons forces programer to think of list in a low-level nested of 2-item construction, with explicit functions like “cddr”, “cadaar” etc, and even a special notation “(A . B)”.

This hinders the development tree functions. For example, one might write a function that extracts the leafs of a binary tree. But due to cons, there is no single data structure to implement a binary tree. You can have the whole structure made of cons, or cons only for ending branches, or whole structure made of “proper list”. Consequently, what's considered a leaf depends on your implementation, and the code to traverse the tree also depends on how you used cons.

Worse, any proper list can have improper list as elements. So, you can have a list of cons, or cons of lists, cons of cons, list of lists, or any mix. The overall effect of the cons is that it prevents lisp to have a uniform high level treatment of lists, with the result that development of functions that work on tree are inconsistent and few.

(Addendum : there was actually a lang with “fons” of 3 cells. {Haines, E. C., The TREET List Processing Language. M.S. Thesis, MIT, Cambridge, Mass., August 1964. Also SR-133, The MITRE Corporation, Bedford, Mass., April 1965.} Thanks to Richard Fateman for alerting me to this.)

Why Lisp's Cons Problem Never Got Fixed?

Now, a little speculation.

We might wonder, why lisp has the cons problem and was never addressed in Common Lisp or Scheme Lisp?

Historical Necessity

I guess at the time, 1960s and 1970s, the very fact that you could have a concept like a list in a lang and manipulate it as a unit, is a very advanced concept. The list, being built up by hardware cons, is just something that is necessary for implementation for practical considerations. (think for a moment what computers are like in the 1960s.)

Having data as a single manipulable object (list) didn't really become popular until the 1990s. (notably due to Perl) And today, it is THE most important feature of high-level languages. (⁖ Perl, Python, Ruby, PHP, JavaScript .)

Deep Nesting is Rare

The lisp's cons, as a underlying primitive that builds lists, even though a bit cumbersome, but works just fine when data structure used is simple. Even today, with all the Perl, Python, PHP, JavaScript etc langs that deal with lists, vast majority of list usage is just simple flat list, sometimes 2 level of nesting (list of list, list of hash, hash of list). 3 levels of nesting is seldom used, unless it is 3D matrices used mostly in computer graphics or linear algebra applications. Greater than 3 level is rarely seen. Systematic manipulation and exploitation of nested list, such as mapping to leafs, to particular level, transposition by permutation on level, or list structure pattern matching in today's functional langs, etc is hardly ever to be seen. (These are common idioms in so-called array languages. ⁖ APL, MATLAB, Mathematica.)

So, in general, when you just deal with simple lists, the cumbersomeness of using {cons, car, cdr, caardr, …} for list doesn't really surface. Further, the cons is fundamentally rooted in the language. It's not something that can be easily changed except creating a new language. When there is a specific need in a application, there is a haphazard collection of functions that deal with lists at a higher level.

Today, with list manipulation being mainstream, especially with the uprising of many functional langs, and with JavaScript arose JSON, the lisp's cons historical baggage becomes more apparent and painful.

Summary of Reasons:

blog comments powered by Disqus