Concepts & Confusions of {Prefix, Infix, Postfix, Fully Nested} Notations

, ,

In LISP languages, they use a notation like (+ 1 2) to mean 1+2. Likewise, they write (if test this that) to mean if (test) {this} else {that}. LISP code have the form (a b c …), where the a b c themselves may also be of that form. There is a wide misunderstanding that this notation being “prefix notation”. In this article, i'll give some general overview of the meanings of Algebraic Notation and prefix, infix, postfix notations, and explain how LISP notation is a Functional Notation and calling it “prefix notation” is misleading and misconception at a fundamental level.

The math notation we encounter in school, such as 1+2, is called Infix Algebraic Notation. Algebraic notations have the concept of operators, meaning, symbols placed around arguments. In algebraic infix notation, different symbols have different stickiness levels defined for them. ⁖ 3+2*5>7 means (3+(2*5))>7, not ((3+2)*5)>7, because the operator “*” has a higher stickness than the operator “+”, and “+” has a higher stickness than the “>” operator. The stickiness of operator symbols is normally called “Operator Precedence”. It is done by giving a order specification for the symbols, or equivalently, give each symbol a integer index, so that for example if we have a△b▲c, we can unambiguously understand it to mean one of (a△b)▲c or a△(b▲c). (besides a order for operators, each operator must also specify a left/right priority, to resolve cases like this: a△b△c to mean (a△b)△c or a△(b△c), important when the function associated with the operator is not associative. ⁖ exponential: 3^4^5.)

In a algebraic postfix notation known as Reverse Polish Notation, there needs not to have the concept of Operator Precedence. For example, the infix notation (3+(2*5))>7 is written as 3 2 5 * + 7 >, where the operation simply evaluates from left to right. Similarly, for a prefix notation syntax, the evaluation goes from right to left, as in > 7 + * 5 2 3.

While functional notations, do not employ the concept of Operators, because there is no operators. Everything is a syntactically a “function”, written as f(a,b,c…). For example, the same expression above is written as >( +(3, *(2,5)), 7) or greaterThan( plus(3, times(2,5)), 7).

For lisps in particular, their fully functional notation is historically termed sexp (short for S-Expression, where S stands for Symbolic). It is sometimes known as Fully Parenthesized Notation. For example, in lisp the above example is written as: (> (+ 3 (* 2 5)) 7).

The common concepts of “prefix, postfix, infix” are notions in algebraic notations only. Because in Full Functional Notation, there are no operators, therefore no positioning to talk about. A Function's arguments are simply explicitly written out inside a pair of brackets.

The reason for the misconception that lisp notation is “prefix” is because the “head” appears as the first element in the enclosed parenthesis. Such use of the term “prefix” is a confusion engenderer because the significance of the term lies in algebraic notation systems that involves the concept of operators and order.

A side note: the terminology “Algebraic” Notation is a misnomer. It seems to imply that such notations have something to do with the branch of math called algebra while other notation systems do not. The reason the name Algebraic Notation is used because when the science of algebra was young, around 1700s mathematicians are dealing with equations using symbols like + × = written out similar to the way we use them today. This is before the activities of systematic investigation into notation systems as necessitated in the studies of logic in 1800s or computer languages in 1900s. So, when notation systems are explored and invented, the conventional way of infixing + × = became known as algebraic because that's what people think of when seeing them.

See also: What's Function, What's Operator?.

Prefix, Infix, Postfix notations in Mathematica

The Head of Expressions

Lisp's nested parenthesis syntax is a Functional Notation. It has the general form of (f a b …) where any of the symbols inside the matching parenthesis may again be that form. For example, here's a code snippet in Emacs Lisp.

;; Recursively apply (f x i), where i is the ith element in the list li.
;; For example, (fold f x '(1 2)) computes (f (f x 1) 2)
(defun fold (f x li)
  (let ((li2 li) (ele) (x2 x))
    (while (setq ele (pop li2))
      (setq x2 (funcall f x2 ele))
    )
    x2
  )
)

Vast majority of computer languages, interpret source code in a one-dimensional linear nature. Namely, from left to right, line by line, as in written text. (Examples of computer language's source code that are not linear in nature, are spreadsheet, cellular automata, visual programing languages) For languages that interpret source code linearly, there is almost always a hierarchical (tree) lexical structure. Lisp's notation is the most effective in visually showing the tree structure of the syntax. This is because, a function and its arguments, are simply laid out inside matching brackets. The level of nesting corresponds to the order of evaluating expressions.

The first element inside the matching parenthesis, is called the “head” of the expression. For example, in (f a b), the “f” is the head. The head is a function, and the rest of the symbols inside the matching parenthesis are its arguments.

The head of lisp's notation needs not to be defined as the first element inside the parenthesis. For example, we can define the “head” to be the last element inside the parenthesis. So, we write (arg1 arg2 … f) instead of the usual (f arg1 arg2 …) and its syntactical structure remains unchanged. Like wise, you can move the head outside of the parenthesis.

In Mathematica, the head is placed in front of the parenthesis, and square brackets are used instead of parenthesis for the enclosing delimiter. For example, lisp's (f a b c) is Mathematica's f[a,b,c]. Other examples: (sin θ) vs Sin[θ], (map f list) vs Map[f,list]. Placing the head in front of the matching bracket makes the notation more familiar, because it is the same how math are traditionally written.

However, there is a disadvantage in moving the head of a expression from inside the matching bracket to outside. Namely: The evaluation order no longer completely corresponds to nesting level of the text in source code, WHEN THE HEAD IS ITSELF A COMPOUND EXPRESSION.

For example, suppose Deriv is a function that takes a function f and returns a function g (the derivative of f), and we want to apply g to a variable x. In lisp, we would write ((Deriv f) x). In Mathematica, we would write Deriv[f][x].

In lisp's version, the nesting level corresponds to the order of the evaluation. In the Mathematica's form, it is no longer so, because now you have forms like this: a[b][c], and there's no nesting that indicates whether it means (a[b])[c] or a([b][c]).

For a more detailed example of Mathematica syntax, see Intro to Mathematica Pattern Matching for Lisp Programers.

A Systematic System for Prefix, Postfix, Infix Notations

A prefix notation in Mathematica has the form f@arg. For example: f@a@b@c is equivalent to f[a[b[c]]] or lisp's (f (a (b c))). Mathematica also offers a postfix notation using the operator “//”. For example, c//b//a//f is syntactically equivalent to f[a[b[c]]]. Unix shell's pipe “|” syntax, is a form of postfix notation. ⁖ c | b | a | f.

For example, Sin[List[1,2,3]] can be written as Sin@List[1,2,3] or List[1,2,3]//Sin. These are all SYNTACTICALLY equivalent. For infix notation, the function symbol is placed between its arguments. In Mathematica, the generic form for infix notation is by sandwiching the tilde symbol around the function name. ⁖ Join[List[1,2],List[3,4]] is syntactically equivalent to List[1,2] ~Join~ List[3,4].

Note: Infix notation, is inherently limited to functions with 2 parameters. This is purely a syntactical issue, due to the fact that the source code is written linearly in a one-dimensional line. There are only 2 spots, the left and the right, to the infix operator. In a similar way, prefix and postfix notations, are inherently limited to functions with one parameter. You might think that's not true because we can use parenthesis to enclose function's arguments. Once you let in a matching delimiter for function's parameters, you have a syntax for nesting of functions, and it is no longer the simple prefix, postfix, or infix notation, you have functional notation.

Systematic System of Operators

Besides a systematic prefix, infix, postfix notations, used together with the function's names as operators. Mathematica introduces familiar operators in a systematic way. For example, The “Plus” in Plus[a,b] is given a operator of “+”, so Plus[1,2] can be written as 1+2. Other arithmetic functions are all given a operator in a similar way. For example, Times[a,b] can be written as a*b. Divide[a,b] is a/b. Power[2,8] can be written as 2^8. The “List” in List[1,2,3] is given a match-fix bracket operator “{}”, so that, List[1,2,3] can be more easily written as {1,2,3}. The boolean “And” function is given the operator &&, so that And[a,b] can be written with the more familiar and convenient a && b. The Map function in Map[f, List[1,2,3]] is given the operator “/@”, so the example would be written as f /@ List[1,2,3].

The gist being that most commonly used functions are mapped to special symbols as operators to emulate the irregular but nevertheless well-understood conventional notations. Also, it reduces the use of deep nesting that is difficult to type and manage. Combining all these types of systematic syntax variations, it can make the source code easier to read than a purely nested structure. For example, common math expressions such as 3+2*5>7 don't have to be written as Greater[Plus[3,Times[2,5]],7] or the lispy (> (+ 3 (* 2 5)) 7).

Note: it can also make the source code harder to read if the notations are mixed up unreasonably or intentionally as in obfuscated code.

C and Perl

When we say that C uses infix notation, the term “infix notation” is used loosely for convenience of description. C and other language's syntaxes derived from it (⁖ Java, Perl, JavaScript, …) are not based on a notation system, but takes the approach of a ad hoc syntax soup. Things like {i++, ++i, for(;;){}, while(){}, 0x123, expr1 ? expr2 : expr3, sprint(…%s…,…), …} are syntax whimsies. They lack a formal basis.

Perl mongers are proud of their slogan of There's More Than One Way To Do It in its variegated syntax sugars. In languages based on a systematic notation system, such as Mathematica, the constructs in source code are much more varied yet remain formally simple.

How Purely Nested Notation Limits the Language's Utility

There is a common complain by programers about lisp's notation, of nested parenthesis, being unnatural or difficult to read. Long time lisp programers, often counter, that it is a matter of conditioning, and or blaming the use of “inferior” text editors that are not designed to display nested notations. In the following, i describe how lisp notation is actually a problem, in several levels.

① Some 99% of programers are not used to the nested parenthesis syntax. This is a practical problem. On this aspect alone, lisp's syntax can be considered a problem.

② Arguably, the pure nested syntax is not natural for human to read. Long time lispers may disagree on this point.

③ Most importantly, a pure nested syntax discourages frequent or advanced use of function sequencing or compositions. This aspect is the most devastating.

The first issue, that most programers are not comfortable with nested notation, is well known. It is not a technical issue. Whether it is considered a problem of the lisp language is a matter of philosophical disposition.

The second issue, about nested parenthesis not being natural for human to read, may be debatable. I do think, that deep nesting is a problem to the programer. Here's a example of 2 blocks of code that are syntactically equivalent in the Mathematica language:

vectorAngle[{a1_, a2_}] := Module[{x, y},
    {x, y} = {a1, a2}/Sqrt[a1^2 + a2^2] // N;
    If[x == 0, If[Sign@y === 1, π/2, -π/2],
      If[y == 0, If[Sign@x === 1, 0, π],
        If[Sign@y === 1, ArcCos@x, 2 π - ArcCos@x]
        ]
      ]
    ]
SetDelayed[vectorAngle[List[Pattern[a1,Blank[]],Pattern[a2,Blank[]]]],
    Module[List[x,y],
      CompoundExpression[
        Set[List[x,y],
          N[Times[List[a1,a2],
              Power[Sqrt[Plus[Power[a1,2],Power[a2,2]]],-1]]]],
        If[Equal[x,0],
          If[SameQ[Sign[y],1],Times[π,Power[2,-1]],
            Times[Times[-1,π],Power[2,-1]]],
          If[Equal[y,0],If[SameQ[Sign[x],1],0,π],
            If[SameQ[Sign[y],1],ArcCos[x],
              Plus[Times[2,π],Times[-1,ArcCos[x]]]]]]]]]

In the latter, it uses a full nested form (called FullForm in Mathematica). This form is isomorphic to lisp's nested parenthesis syntax, token for token (i.e. lisp's (f a b) is Mathematica's f[a,b]). As you can see, this form, by the sheer number of nested brackets, is in practice problematic to read and type. In Mathematica, nobody really program using this syntax. (The FullForm syntax is there, for the same reason of language design principle shared with lisp of “consistency and simplicity”, or the commonly touted lisp advantage of “data is program; program is data”.)

The following shows the same code with tokens transformed into the lisp style.

(SetDelayed
 (vectorAngle (list (Pattern a1 (Blank)) (Pattern a2 (Blank))))
 (let (list x y)
   (progn
    (setq (list x y)
          (N (* (list a1 a2)
                (exp (sqrt (+ (exp a1 2) (exp a2 2))) -1))))
    (if (eq x 0)
        (if (equal (signum y) 1) (* π (exp 2 -1))
          (* (* -1 π) (exp 2 -1)))
      (if (eq y 0) (if (equal (signum x) 1) 0 π)
        (if (equal (signum y) 1) (acos x)
          (+ (* 2 π) (* -1 (acos x)))))))))

Note: The steps to transform the sample Mathematica code in FullForm to lisp's sexp is done by these operations (for those curious):

① Move the head inside. (using this regex \([A-Za-z]+\)\[[\1)

② Replace bracket by parens. []()

③ Replace function names to lisp styled names:

The third issue, about how nested syntax seriously discourages frequent use of inline function sequencing, is the most important.

One practical way to see how this is so, is by considering unix's shell syntax. You all know, how convenient and powerful is the unix's pipes. Here are some practical example: ls -al | grep xyz, or cat a b c | grep xyz | sort | uniq.

Now suppose, we get rid of the unix's pipe notation, instead, replace it with a pure functional notation: ⁖ (uniq (sort (grep xyz (cat a b c)))), or enrich it with a composition function and a pure function construct, so this example can be written as: ((composition uniq sort (lambda (x) (grep xyz x))) (cat a b c)).

You see, how this change, although syntactically equivalent to the pipe “|” (or semantically equivalent in the example using function compositions), but due to the cumbersome nesting of parenthesis, will force a change in the nature of the language by the code programer produces. Namely, the frequency of inline sequencing of functions on the fly will probably be reduced, instead, there will be more code that define functions with temp variables and apply it just once as with traditional languages.

A language's syntax or notation system, has major impact on what kind of code or style or thinking pattern on the language's users. This is a well-known fact for those acquainted with the history of math notations.

The sequential prefix notation such as Haskell's f g h x, Mathematica's f@g@h@x, or postfix as unix's x|h|g|f, Ruby's x.h.g.f, Mathematica's x//h//g//f, are much more convenient and easier to decipher, than (f (g (h x))) or ((composition f g h) x). In real world source code, any of the f, g, h will likely be full of nested parens themselves, because often they are functions constructed inline (aka lambda).

Lisp, by sticking with a almost uniform nested parenthesis notation, it immediately reduces the pattern of sequencing functions, simply because the syntax does not readily lend the programer to it. For programers who are aware of the coding pattern of sequencing functions (aka function chaining, filtering), now either need to think in terms of a separate “composition” construct, and or subject to the much problematic typing and deciphering of nested parenthesis. (in practice, it's mostly done by writing the inline functions as a auxiliary function definitions on their own, then another code block sequence them together. ⁖ (defun f …) (defun g …) (defun h …) (f (g (h x))) )

Note: Lisp syntax is actually not a pure nested form. It has ad hoc syntax equivalents such as the quote construct '(a b c), dotted notation for cons ⁖ (a . b) for (cons a b), special syntax for quoted vector [1 2 3], splice and partial hold eval ⁖ `(,@ x ,@ y ,z 4)), weirded comment syntax #|something|#. Mathematica's FullForm, is actually a version of unadulterated nested notation. For a full discussion, see: Fundamental Problems of Lisp.

For a practical example of this problem, see: Programing Language: LISP Syntax Problem of Piping Functions.

Thanks to Andrew Haley for a correction.

Lisp Syntax Readable?

lisp syntax is really unreadable. 10 years ago, i thought it's more of a joke for those uninitiated. But then surely the basic fact of uniformity is a problem for reading (because in nature, things are not uniform). But now having coded lisp for ≈5 years, i do find it comparatively unreadable.

here's sample code from GNU Emacs am currently reading.

(defun kill-new (string &optional replace yank-handler)
  "Make STRING the latest kill in the kill ring.
…"
  (if (> (length string) 0)
      (if yank-handler
          (put-text-property 0 (length string)
                             'yank-handler yank-handler string))
    (if yank-handler
        (signal 'args-out-of-range
                (list string "yank-handler specified for empty string"))))
  (unless (and kill-do-not-save-duplicates
               ;; Due to text properties such as 'yank-handler that
               ;; can alter the contents to yank, comparison using
               ;; `equal' is unsafe.
               (equal-including-properties string (car kill-ring)))
    (if (fboundp 'menu-bar-update-yank-menu)
        (menu-bar-update-yank-menu string (and replace (car kill-ring)))))
  (when save-interprogram-paste-before-kill
    (let ((interprogram-paste (and interprogram-paste-function
                                   (funcall interprogram-paste-function))))
      (when interprogram-paste
        (dolist (s (if (listp interprogram-paste)
                       (nreverse interprogram-paste)
                     (list interprogram-paste)))
          (unless (and kill-do-not-save-duplicates
                       (equal-including-properties s (car kill-ring)))
            (push s kill-ring))))))
  (unless (and kill-do-not-save-duplicates
               (equal-including-properties string (car kill-ring)))
    (if (and replace kill-ring)
        (setcar kill-ring string)
      (push string kill-ring)
      (if (> (length kill-ring) kill-ring-max)
          (setcdr (nthcdr (1- kill-ring-max) kill-ring) nil))))
  (setq kill-ring-yank-pointer kill-ring)
  (if interprogram-cut-function
      (funcall interprogram-cut-function string)))

Even though that Perl is a syntax soup, but once a programer is familiar with the language and a piece of code didn't abused the syntax, it's much more readable than lisp.

blog comments powered by Disqus