Lisp List Problem

By Xah Lee. Date: 2007.

Lisp at core is based on functional programing on lists. This is comparatively a powerful paradigm. However, for historical reasons, lisp's list is based on the hardware concept of “cons” cell. From a mathematical point of view, what this means is that lisp's lists is limited to a max of 2 elements. If you want a longer list, you must nest it and interpret it in a special way. (i.e. effectively creating a mini-protocol of nesting lists, known as “proper list”.) The cons fundamentally crippled the development of list processing.

The following are some collected online forum writings about my opinion on Lisp's list problem.

2007-12-30

hi Daniel,

It is my honor to reply to you!

About the cons business… i meant to say that the lisp's ways of list is historical and drags in the implementation details. Namely, that lists are made of con cells, with car and cdr primitives, then with a layer of abstraction on top.

it is understandable due to lisp's 1960's birth that the language necessarily have some implementation concept in the language. Nevertheless, it's a flaw considered today.

in today's high-level languages, one do not deal with memory allocation, linked list, types, etc implementation oriented concepts. For example, in Python, a list is like this [a,b,c,…] and if one wants arbitrarily nested list (such as a matrix, tree, associative list, hash, dictionary, key'd-list, vector, array, map, sequence … however one wants to call them), the programer simply write [a,b,c,…] where each of the a,b,c can be another list of the form […]. Similarly in JavaScript.

But in lisp, it is at the low level exposed to programers the “cons” sequenced in a particular way, and accessed with car, cdr, caard, etc. A lisper may argue that a programer can simply use higher-level constructs like “nth” or “list” to deal with lists. However, one really cannot pretend that cons doesn't exist in lisp because the language takes the con cell as its fundamental primitive. In short, a programer cannot program in lisp in a real world situation without having a good understanding of the cons. (partly due to the language itself, partly due to (i think vast majority) of existing code all directly deals with cons)

all this may seem shallow and does not constitute a problem for any serious, professional programer. But i think it is a damnation to lisp and the major (unfixable) cause of preventing lisp from becoming widely used in the future (despite that lisp had been somewhat mainstream in the 1980s, which i wasn't a programer then). Because, the reason for high-level language being what they are is from all these little details. In general, high-level language being what they are ( Python, PHP, JavaScript, Mathematica) is because they are more and more abstract from the hardware/implementation/compiler concepts/issues.

am rambling a bit… but let me try another example. For example, in Mathematica, a list has the form {a,b,c,…}. There is no concept whatsoever of traditional computer scientist's terms of vectors, arrays, hashes, linked lists, circular lists, sets, matrixes, trees… But what if one wants a vector or array or sequence or any of these ideas? Basically the concept behind all these variations lies in their algorithmic properties and certain restriction in the list's shape. For example, for most languages, a vector or array is simply a list with unchanging length (from the interface perspective). A hash is simply a list of element where each element is a list of 2 items, and further with the property that one have a function that quickly determines whether a key exist etc. (i.e. so-called constant time access). A set is just a list that does not have identical elements. A matrix is just a list with element where all elements are lists of the same length. A tree is simply a arbitrarily nested list… and so on. So, from a interface point of view, all these, to the programer who just type out characters on the screen, are all the same, of the form {a,b,c,…}. Thus Mathematica, and many other high level languages, don't have all these other various computer scientist's concept of various sequence/list/array/aggregate “types”. (Perl, Python, PHP, still have hash besides list)

But, one may ask, so if one wants a hash-table in Mathematica (or other high-level lang) that just have a list, how can one do it? For example, its a list of pairs like this {{k1,v1},{k2,v2},…} but how can i know that it'll have constant access time when i use the language's function to gets the elements of a list? Does it mean the programer has to implement it himself? The answer is that the compiler should automatically figure out and decide this transparently. In case the compiler is theoretically not able to do so, then the language could have some construct such as:

myhash={{k1,v1},{k2,v2},…};
AssignListProperty(myhash,hash-like);

Assume here that the “AssignListProperty” function is a method in the language to tell the compiler certain usage example for the list, so that the compiler can optimize it. So that the list myhash automatically achieve what typical mid-level language such as Java would traditionally have a hash type. (In the case of Java, it's the “Interface” “Collection”, in Perl and Python both are a hash type, in C, a generation before Java, hashes must be implemented by the programer)

In lisp, although much of these list-like things are already implemented… but the bottom line is that the concept of a implementation/hardware thing the con/car/cdr is exposed to the language in a way that it is not possible to practically program in lisp without fully understand the cons. Also, as a consequence of the cons thing, lisp creates many various ways to code a tree of a given shape. (i.e. the distinction of “proper list”) This is one of the complexity with cons, which is the reason i think new generation of coders will not put up with. (they will simply not understand it, and not bother to understand it, or bothered to understand it and decided it's a drag and not use lisp)

2008-01-30

[The following is a excerpt from a comp.lang.lisp post.]

The cons business in lisp is really a pain in the ass. car, cdr, caar, cddar!! I am a expert of Mathematica language since about 1995, and consider myself perhaps the world's top 100. The Mathematica language is, in a way, a modern lisp. (although, i think to mention lisp with Mathematica would be thought of like a insult to its creator Stephen Wolfram) In Mathematica, the language heavily uses lists and nested lists. In fact, the whole source code of any source code is one single, deeply nested list. Since i code in Mathematica for over a decade, i'm thoroughly familiar and have deep knowledge of the nested syntax and its abstract tree concept. Since Mathematica source code are all nested lists, the language provide many facilities to manipulate such a form. I can, for example, get all the nodes at level n. Map a function to level n. Map a function to just leafs (which is considered level -1 in Mathematica). Or, get a particular leaf (e.g. lisp's (caar myList) would be Part[myList,1,1]). Or, getting several elements at a particular level. (Emacs Lisp cannot do this, nor Common Lisp or Scheme Lisp in any simple and standardized way) I can also get any part(s) at any level(s) that match a particular type. (this aspect, concerning pattern matching, Mathematica is far complete and flexible than any computer language that exist out there.)

Despite being a expert in trees, the lisp's cons business is truely a pain to deal with. A large part of my time spent on elisp programing is on the debugging the cons business.

In this aspect, it is similar to Perl's list business. In Perl, to work with nested list is a extreme pain in the ass, due to its very screwed up syntax that are in part spurn from its semantics for nested lists. (by what's so-called “references”. [see Perl: List, Array] [see Perl: Hash Table (Key/Value List)])

To work with anything slightly nested, is a pain in Perl. And, it is why, in Perl programs, there is not much of nested data structures except maybe at just 2 or 3 levels deep. I think lisp's cons business severely puts a limit on what people actually use it for. Kinda a joke consider the origin of the name LISt Processing.

Maciej Katafiasz wrote:

[ http://reference.wolfram.com/mathematica/ref/Cases.html ]

Looking at that reference, it seems to me that the most interesting things happen not when you add levels, which I understand should be fairly trivial, but when you employ patterns.

The power of Mathematica's pattern matching facilities is quite beyond any language out there. So, let's ignore Mathematica's patterns for the moment because otherwise we'd be talking about implementing Mathematica.

(note here: my main interest in this thread, as well as many others recently, is just about adding Mathematica's list manipulating functions to lisp, which is doable and practical. Adding Mathematica's pattern matching to lisp would be too big a topic.)

The level spec in Mathematica is actually quite interesting.

When reading the doc site: [ http://reference.wolfram.com/mathematica/ref/Cases.html ]

click on the “More Information” yellow box. It gives you details. Here's few excerpt:

• Cases uses standard level specifications:

   n               levels 1 through n
   Infinity        levels 1 through Infinity
   {n}             level n only
   {n1,n2}         levels n1 through n2

• The default value for levelspec in Cases is {1}.

• A positive level n consists of all parts of expr specified by n indices.

• A negative level -n consists of all parts of expr with depth n.

• Level -1 consists of numbers, symbols and other objects that do not have subparts.

• Level 0 corresponds to the whole expression.

Note that, this levelspec is basically either a integer or symbol Infinity, or a list of 1 or 2 elements whose elements are integers or Infinity. (Examples: {1} means just level 1. {3,5} means levels 3 and 4 and 5. {2,-2} means level 2, 3,… up to any branch that has a depth of 2.)

This levelspec is a optional argument for any functions that take a list as argument. Now, for the moment let's assume that mapping to levels or manipulating lists with specified levels is useful. But adding this to ALL list related functions, as to achieve a consistent and uniform way to deal with trees, is quite another level of power.

This levelspec, in its full generality as specified above, is not easy to implement. You can try implement it for lisp's map. I bet its gonna be few hundred lines.

So, if you gonna add such a option to all lisp's functions that deals with lists, you don't implement for each one. Instead, you write a general code that deals with levels, and have all your list functions call it.

(i've studied and implement this in some depth, in ~1998, at a time i'm a naive, total blank about computer science. I think basically you need some internal data structure for trees. The way i did it was basically use a flat list to represent tree. Each element has 2 parts, one being the node's index, and the other being the node's item. So, in essence, i decompose a tree into this flat structure, then manipulate any levelspec or other tree-transformation by manipulating the indexes, then use the result to reconstruct a new tree. I think this method is the mathematical essence of any implementation of dealing with manipulating trees in the general way. see the Mathematica notebook here: Tree Functions in Perl, Python, Java )

Now, when this levelspec is used, let's say {2,5}, meaning level 2 to 5 inclusive, and when this is used in a context of pattern matching as in Mathematica's Cases[], it gets rather quite complex. Because, there's the question of which level is gonna be pattern matched first, because Mathematica patterns can match just about any shape of nested list. So this why the last sentence in the spec says “Cases traverses the parts of expr in a depth-first order, with leaves visited before roots.”.

In practice, using patterns with levelspec that's more than one level, is rather complex. To give a picture of this…: unless you have programed in Mathematica daily for few years, you generally don't use it and don't need to use it. (it becomes very powerful when you are thoroughly familiar with tree structure and Mathematica's pattern spec) (in analogy, this is somewhat similar to regex. Some of the regex features you generally never use, but are critical when you need it)

Imagine someone accustomed to the above high-level flexibility of dealing with trees, then is exposed to lisp's cons business with car, cdr, cddar! That is the shock i got when i was learning purportedly the most elegant language the Scheme lisp back in 1998. [see My Lisp experience]

As i have expressed before, due to the cons surfacing to the language level masquerading as a list syntax, a programer cannot practically write programs and pretending it doesn't exist. I could, in my own emacs lisp program, avoid cons and always construct only proper lists and use higher level list functions on them. But as soon as i read other's lisp code, i'll see cons. This is not just a syntactical, perpetual inconvenience. Because, i could not use the high-level list function on arbitrary lists in other people's code, because there's the proper list and improper list distinction due to the fact that lists longer than 2 elements are made of cons in some specific way. In essence, the power of the concept of trees, hampered by coding it on top of binary trees.

This is why, in lisps, there are never much advancement to manipulating lists in a more general and powerful way. The cons stops it.

The reason lisp have cons is really due to a particular hardware. (see for example, CAR and CDR) This is a instance of damage of not clearly understanding or separating issues related to hardware/speed/compilation with issues related to the mathematical specification of computation. Similar damage includes the so-called “types” of int/long/float/double/, the concept of “point/reference” in many langs, lisp's concept of “objects”, and in general the concept of a language's data types, etc [see Jargons and High Level Languages]

For a more coherent summary of lisp's list problem, see: Fundamental Problems of Lisp. For a example of the effect of the cons problem, see: Why Lisp Do Not Have A Generic Copy-List Function