parser notes

By Xah Lee.

warning: work in progress.

Syntax-directed translation

Syntax-directed translation refers to a method of compiler implementation where the source language translation is completely driven by the parser.

A common method of syntax-directed translation is translating a string into a sequence of actions by attaching one such action to each rule of a grammar.[1] Thus, parsing a string of the grammar produces a sequence of rule applications. SDT provides a simple way to attach semantics to any such syntax.
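For concreteness, here is a minimal sketch in Python, using a hypothetical toy grammar (sums of single digits); each rule carries an action, and evaluation happens as a side effect of parsing.

# toy grammar, with the action attached to each rule:
#   expr → term { "+" term }     action: add the term values
#   term → digit                 action: the digit's integer value

def parse_term(s, i):
    # term → digit
    return int(s[i]), i + 1

def parse_expr(s, i=0):
    # expr → term { "+" term }
    val, i = parse_term(s, i)
    while i < len(s) and s[i] == "+":
        rhs, i = parse_term(s, i + 1)
        val += rhs                # the action for the "+" rule
    return val, i

print(parse_expr("1+2+3")[0])     # prints 6

Each action fires as its rule is recognized, so the translation (here, evaluation to a number) is driven entirely by the parse.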


top down parser


grammar inference

grammar inference is the reverse problem of parsing: given a set of strings, figure out a grammar that generates them. Also called grammar induction. For example, given the strings {ab, aabb, aaabbb}, a plausible inferred grammar is S → a S b | a b.


parsing methods can be classified as top-down or bottom-up, and as directional or non-directional; the directional methods can be further distinguished into deterministic and non-deterministic.

—Dick Grune book

3.4.2 Type 0 and Type 1 Grammars

It is a remarkable result in formal linguistics that the recognition problem for an arbitrary Type 0 grammar is unsolvable. This means that there cannot be an algorithm that accepts an arbitrary Type 0 grammar and an arbitrary string and tells us in finite time whether the grammar can produce the string or not. This statement can be proven, but the proof is very intimidating and, what is worse, it does not provide any insight into the cause of the phenomenon. It is a proof by contradiction: we can prove that, if such an algorithm existed, we could construct a second algorithm of which we can prove that it only terminates if it never terminates. Since this is a logical impossibility, and since all other premises that went into the intermediate proof are logically sound, we are forced to conclude that our initial premise, the existence of a recognizer for Type 0 grammars, is a logical impossibility. Convincing, but not food for the soul.
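The shape of that contradiction is the same self-reference as in the halting problem. Here is a sketch in Python; decides_halting is the assumed decider (a hypothetical name), and its stub raises by design, since no such function can exist.

def decides_halting(f, arg):
    # assumed: returns True if f(arg) terminates, False otherwise.
    # no total function can do this; the stub only marks the assumption.
    raise NotImplementedError("no such decider can exist")

def paradox(f):
    # do the opposite of whatever the decider predicts about f(f):
    if decides_halting(f, f):
        while True:               # predicted to halt, so loop forever
            pass
    else:
        return                    # predicted to loop, so halt at once

# if decides_halting existed, paradox(paradox) would halt exactly when
# it does not halt: the "terminates only if it never terminates" above.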


defining qualities of terminal and non-terminal?

in formal grammar, is there a difference in the defining qualities of terminal and non-terminal symbols? That is, is it possible to define a formal grammar without this distinction? Is the distinction merely a convenience?

I think the distinction is merely a convenience. The essence of a terminal is that no rule rewrites it: it never appears alone on the left side of any rule. So the two kinds of symbols can be recovered from the rules themselves.
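A quick check of that idea in Python, for context-free rules (single-symbol left sides), using a hypothetical toy grammar: given only the rules, the terminals fall out as the symbols that never occur on a left side.

rules = [
    ("S", ["a", "S", "b"]),   # S → a S b
    ("S", ["a", "b"]),        # S → a b
]

lefts = {lhs for lhs, _ in rules}
symbols = lefts | {sym for _, rhs in rules for sym in rhs}

nonterminals = lefts              # symbols that some rule rewrites
terminals = symbols - lefts       # symbols that no rule rewrites

print(nonterminals)   # {'S'}
print(terminals)      # {'a', 'b'}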


Pumping lemma for context-free languages

Funarg problem


In computer science, the funarg problem refers to the difficulty in implementing first-class functions (functions as first-class objects) in programming language implementations so as to use stack-based memory allocation of the functions.

The difficulty only arises if the body of a nested function refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call.[1] To summarize the discussion below, two standard resolutions are to either forbid such references or to create closures.[2]
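A minimal sketch of the (upward) funarg in Python: the nested function refers to n in its defining environment, not via argument passing, and it outlives the call that created it, so n cannot live in a stack frame. The closure resolution keeps n alive.

def make_counter():
    n = 0                 # local to make_counter
    def count():
        nonlocal n        # a free variable from the defining environment
        n += 1
        return n
    return count          # count escapes make_counter's activation

c = make_counter()        # make_counter's frame is long gone, yet...
print(c(), c(), c())      # prints: 1 2 3; n survives in the closure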

Static single assignment form


In compiler design, static single assignment (SSA) form is a property of an intermediate representation (IR), which requires that each variable is assigned exactly once, and every variable is defined before it is used. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element.

SSA was developed by Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck, researchers at IBM in the 1980s.[1]

One might expect to find SSA in a compiler for Fortran or C, whereas in functional language compilers, such as those for Scheme, ML and Haskell, continuation-passing style (CPS) is generally used. SSA is formally equivalent to a well-behaved subset of CPS excluding non-local control flow, which does not occur when CPS is used as intermediate representation. So optimizations and transformations formulated in terms of one immediately apply to the other.
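A before/after illustration of the versioning, with hypothetical variable names, written as runnable Python; x4 plays the role of a φ-function at the join point after the branch.

cond = True

# original: x is assigned three times
x = 1
x = x + 2
if cond:
    x = x * 3
y = x

# SSA form: each version is assigned exactly once
x1 = 1
x2 = x1 + 2
x3 = x2 * 3                 # the definition from the branch
x4 = x3 if cond else x2     # φ(x3, x2): picks the version by path taken
y1 = x4

print(y, y1)   # prints: 9 9; same result, one assignment per version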

Primitive recursive function


In computability theory, primitive recursive functions are a class of functions that are defined using primitive recursion and composition as central operations and are a strict subset of the total µ-recursive functions (µ-recursive functions are also called partial recursive). Primitive recursive functions form an important building block on the way to a full formalization of computability. These functions are also important in proof theory.
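For example, addition and multiplication arise by primitive recursion from the successor function. A sketch in Python, with the defining equations as comments:

def succ(n):
    return n + 1

# add(x, 0) = x;  add(x, n+1) = succ(add(x, n))
def add(x, n):
    if n == 0:
        return x
    return succ(add(x, n - 1))

# mul(x, 0) = 0;  mul(x, n+1) = add(x, mul(x, n))
def mul(x, n):
    if n == 0:
        return 0
    return add(x, mul(x, n - 1))

print(add(3, 4), mul(3, 4))   # prints: 7 12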

Ackermann function


In computability theory, the Ackermann function is one of the simplest and earliest-discovered examples of a total computable function that is not primitive recursive. All primitive recursive functions are total and computable, but the Ackermann function illustrates that not all total computable functions are primitive recursive.
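The standard three-case definition, directly in Python. It is total (always terminates), but its growth outpaces every primitive recursive function.

def ackermann(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))   # prints 9
print(ackermann(3, 3))   # prints 61
# ackermann(4, 2) already has 19729 decimal digits; don't try it naively.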


Some links to check out:

Discovered several languages, TXL and Colm, and the concept of program transformation.

There seem to be quite a few such languages. Here's a big list: http://www.program-transformation.org/Transform/TransformationSystems .

TXL (programming language) (home at http://www.txl.ca/) seems to be a popular transformation language. A newer, supposedly improved, one is Colm, at http://www.complang.org/colm/ . On the Colm site, there is a PhD thesis in PDF that gives a nice overview of such systems.

proof: the set of languages is uncountably infinite

The diagonalization technique is described more formally in most books on theoretical computer science; see e.g. Rayward-Smith [393, pp. 5-6], or Sudkamp [397, Section 1.4]. [Dick Grune, section 2.1.3.4]
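The argument in brief: the strings over a finite alphabet can be enumerated as s1, s2, s3, …, and so can all grammars (each grammar is itself a finite string), giving a list L1, L2, L3, … of all languages that have a grammar. The diagonal language D = { si : si is not in Li } differs from every Li on the string si, so D has no grammar. Hence the set of all languages is uncountable, while the set of grammars is only countably infinite.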

formal grammar (phrase structure grammar) is as powerful as any known generative method

This [phrase structure grammar] is a framework of impressive frugality and the question immediately rises: Is it sufficient? That is hard to say, but if it is not, we do not have anything more expressive.

Strange as it may sound, all other methods known to mankind for generating sets have been proved to be equivalent to or less powerful than a phrase structure grammar. One obvious method for generating a set is, of course, to write a program generating it, but it has been proved that any set that can be generated by a program can be generated by a phrase structure grammar. There are even more arcane methods, but all of them have been proved not to be more expressive.

On the other hand there is no proof that no such stronger method can exist. But in view of the fact that many quite different methods all turn out to halt at the same barrier, it is highly unlikely that a stronger method will ever be found. See, e.g., Révész [394, pp 100-102].

[Dick Grune, section 2.2.3]

compiler examples


TypeScript https://github.com/Microsoft/TypeScript

CoffeeScript https://github.com/jashkenas/coffeescript

Jison. A parser generator similar to Bison, but for JavaScript. https://github.com/zaach/jison

toy JavaScript, in C https://github.com/zooxyt/js

toy regex, in C. https://github.com/zooxyt/qre

toy language, APLisp, in C. https://github.com/tnovelli/aplisp

JavaScript in Common Lisp http://mxr.mozilla.org/mozilla/source/js2/semantics/

https://github.com/lu1s/dragon-book-source-code

lex man page, flex info manual

yacc man page, bison info manual

the 2 or 3 ES6 compilers written in JavaScript

The asteroid to kill this dinosaur is still in orbit. —lex man page of Plan 9

dragon book code example. toy parser, in Java. http://dragonbook.stanford.edu/