Parser

By Xah Lee. Date: 2014-12-29. Last updated: 2016-07-21.

warning: work in progress.

Parsing

Top-down parsers:

Some of the parsers that use top-down parsing include:

Recursive descent parser
LL parser (Left-to-right, Leftmost derivation)
Earley parser

Bottom-up parsers:

Some of the parsers that use bottom-up parsing include:

    Precedence parser
        Operator-precedence parser
        Simple precedence parser
    BC (bounded context) parsing
    LR parser (Left-to-right, Rightmost derivation)
        Simple LR (SLR) parser
        LALR parser
        Canonical LR (LR(1)) parser
        GLR parser
    CYK parser
    Recursive ascent parser
    Shift-Reduce Parser

Bottom-up parsing

Shift-reduce parser

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods. The parsing methods most commonly used today, LR parsing and its variations, are shift-reduce methods. The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions. The outward actions of an LR parser are best understood by ignoring the arcane mathematical details of how LR parser tables are generated, and instead looking at the parser as just some generic shift-reduce method.

Operator-precedence parser

Operator-precedence grammar

operator precedence grammar is a kind of grammar for formal languages.

Technically, an operator precedence grammar is a context-free grammar that has the property (among others[1]) that no production has either an empty right-hand side or two adjacent nonterminals in its right-hand side. These properties allow precedence relations to be defined between the terminals of the grammar. A parser that exploits these relations is considerably simpler than more general-purpose parsers such as LALR parsers. Operator-precedence parsers can be constructed for a large class of context-free grammars.

Shift-reduce parser

A shift-reduce parser works by doing some combination of Shift steps and Reduce steps, hence the name.

A Shift step advances in the input stream by one symbol. That shifted symbol becomes a new single-node parse tree.
A Reduce step applies a completed grammar rule to some of the recent parse trees, joining them together as one tree with a new root symbol.

A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used today, LR parsing and its variations, are shift-reduce methods.[1] The precedence parsers used before the invention of LR parsing are also shift-reduce methods. All shift-reduce parsers have similar outward effects, in the incremental order in which they build a parse tree or call specific output actions.

LL parser

LL parser is a top-down parser for a subset of context-free languages. It parses the input from Left to right, performing Leftmost derivation of the sentence.

An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking then it is called an LL(k) grammar.

LR parser

LR parsers are a type of bottom-up parsers that efficiently handle deterministic context-free languages in guaranteed linear time. The LALR parsers and the SLR parsers are common variants of LR parsers. LR parsers are often mechanically generated from a formal grammar for the language by a parser generator tool. They are very widely used for the processing of computer languages, more than other kinds of generated parsers.

The name LR is an acronym. The L means that the parser reads input text in one direction without backing up; that direction is typically Left to right within each line, and top to bottom across the lines of the full input file. (This is true for most parsers.) The R means that the parser produces a reversed Rightmost derivation; it does a bottom-up parse, not a top-down LL parse or ad-hoc parse. The name LR is often followed by a numeric qualifier, as in LR(1) or sometimes LR(k). To avoid backtracking or guessing, the LR parser is allowed to peek ahead at k lookahead input symbols before deciding how to parse earlier symbols. Typically k is 1 and is not mentioned. The name LR is often preceded by other qualifiers, as in SLR and LALR.

SLR (simple LR) parser

SLR parser

In computer science, a Simple LR or SLR parser is a type of LR parser with small parse tables and a relatively simple parser generator algorithm. As with other types of LR(1) parser, an SLR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, without guesswork or backtracking. The parser is mechanically generated from a formal grammar for the language.

Canonical LR parser

canonical LR parser or LR(1) parser is an LR(k) parser for k=1, i.e. with a single lookahead terminal.

The special attribute of this parser is that all LR(k) parsers with k≻1 can be transformed into a LR(1) parser.

It can handle all deterministic context-free languages.

In the past this LR(k) parser has been avoided because of its huge memory requirements in favor of less powerful alternatives such as the LALR and the LL(1) parser. Recently, however, a “minimal LR(1) parser” whose space requirements are close to LALR parsers, is being offered by several parser generators.

Like most parsers, the LR(1) parser is automatically generated by compiler compilers like GNU Bison, MSTA, Menhir,[2] HYACC,[3] and LRSTAR.[4]

LALR parser

LALR parser[a] (aka Look-Ahead left-to-right parser) is a simplified version of a canonical LR parser