What is the Difference Between BNF, EBNF, ABNF?
BNF is the original and simplest notation. It is mostly used in academic papers and theoretical contexts, for communicating with humans (as opposed to being fed to a compiler or parser). There is no single exact specification of BNF.
EBNF means Extended BNF. There is no single EBNF: each author or program defines its own slightly different variant.
ABNF (Augmented BNF) looks quite different from BNF, but is more standardized: it is defined in RFC 5234 and used throughout IETF protocol specifications. It is harder to read, but is often used as input to parsers.
In terms of power, they are all equivalent.
They are just syntactical differences.
For example, in traditional BNF the separator between the left-hand side and the right-hand side is ::= (books often use → instead). In EBNF and ABNF it is =.
Another example: in traditional BNF, nonterminals are written with angle brackets around them, such as <EXPR>, and terminals are just plain characters. In ABNF, nonterminals are plain and terminals are enclosed in double quotes, like this: "+".
In BNF, the symbol for alternatives is a vertical line |. In ABNF, it is a slash /.
EBNF and ABNF also feature shortcut syntax, such as specifying zero or more repetitions of a nonterminal or terminal. To translate these shortcuts into BNF, you need to introduce additional rules and nonterminals.
In general, BNF notation is good for teaching, explanation, and theoretical discussion. It is simple. EBNF and especially ABNF are more often used to actually implement a grammar and to be read by parsers.
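To make the translation concrete, here is a minimal Python sketch that rewrites EBNF-style repetition and option into plain BNF rules. The data layout and the names `desugar`, `to_bnf`, `<opt-N>` and `<rep-N>` are made up for this illustration, not part of any standard or library. It assumes a rule is a list of alternatives, each alternative is a list of items, and an item is either a symbol or a `("rep", items)` / `("opt", items)` pair.

```python
from itertools import count


def desugar(rules):
    """Rewrite ("rep", ...) and ("opt", ...) items into extra BNF rules."""
    out = {}
    fresh = count(1)  # numbers the helper nonterminals

    def flatten(items):
        result = []
        for item in items:
            if isinstance(item, str):
                result.append(item)
                continue
            kind, body = item
            name = f"<{kind}-{next(fresh)}>"
            inner = flatten(body)
            if kind == "rep":
                # zero or more:  X ::= inner X | ""
                out[name] = [inner + [name], []]
            else:
                # zero or one:   X ::= inner | ""
                out[name] = [inner, []]
            result.append(name)
        return result

    for lhs, alternatives in rules.items():
        out[lhs] = [flatten(alt) for alt in alternatives]
    return out


def to_bnf(rules):
    """Print rules in BNF style, writing the empty alternative as ""."""
    lines = []
    for lhs, alternatives in rules.items():
        rhs = " | ".join(" ".join(alt) if alt else '""' for alt in alternatives)
        lines.append(f"{lhs} ::= {rhs}")
    return "\n".join(lines)


# EBNF:  number = [ "-" ], digit, { digit } ;
ebnf = {"<number>": [[("opt", ['"-"']), "<digit>", ("rep", ["<digit>"])]]}
bnf = desugar(ebnf)
print(to_bnf(bnf))
```

One EBNF rule becomes three BNF rules here: the option and the repetition each get their own helper nonterminal, and the repetition is expressed by recursion.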
Example BNF
postal-address ::= name-part street-address zip-part
name-part ::= personal-part last-name opt-suffix-part EOL
| personal-part name-part
personal-part ::= first-name | initial "."
street-address ::= house-num street-name opt-apt-num EOL
zip-part ::= town-name "," state-code ZIP-code EOL
opt-suffix-part ::= "Sr." | "Jr." | roman-numeral | ""
opt-apt-num ::= apt-num | ""
Note: this example is incomplete. For example, first-name is not defined.
example from Backus–Naur Form
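The example grammar leaves several symbols without a rule of their own. A rough way to list them is sketched below in Python. The function name `undefined_symbols` is hypothetical, and the sketch assumes the plain style used on this page: no angle brackets, terminals in double quotes, and names made of letters, digits and hyphens. Some of the reported names, such as EOL, may be meant as predefined tokens rather than missing rules.

```python
import re

BNF = '''
postal-address ::= name-part street-address zip-part
name-part ::= personal-part last-name opt-suffix-part EOL
            | personal-part name-part
personal-part ::= first-name | initial "."
street-address ::= house-num street-name opt-apt-num EOL
zip-part ::= town-name "," state-code ZIP-code EOL
opt-suffix-part ::= "Sr." | "Jr." | roman-numeral | ""
opt-apt-num ::= apt-num | ""
'''


def undefined_symbols(bnf_text):
    """Return names used on a right-hand side that never appear before ::=."""
    lhs_pattern = r"^\s*([\w-]+)\s*::="
    defined = set(re.findall(lhs_pattern, bnf_text, flags=re.MULTILINE))
    # Remove quoted terminals, then the left-hand sides, so only
    # right-hand-side names remain.
    body = re.sub(r'"[^"]*"', " ", bnf_text)
    body = re.sub(lhs_pattern, " ", body, flags=re.MULTILINE)
    used = set(re.findall(r"[\w-]+", body))
    return sorted(used - defined)


missing = undefined_symbols(BNF)
print(missing)
```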
Note:
- Non-terminal symbols are bracketed by less-than and greater-than signs < >. For example, <zip-code>. (On this page the brackets are omitted, but non-terminals are shown colored and slanted.)
- Many characters are used for the notation itself. For example, < > : = |.
Extended Backus Naur Form
There are several notations called “Extended Backus Naur Form”. Here's an example describing a simplified Pascal syntax:
(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM', white space, identifier, white space,
          'BEGIN', white space,
          { assignment, ";", white space },
          'END.' ;
identifier = alphabetic character, { alphabetic character | digit } ;
number = [ "-" ], digit, { digit } ;
string = '"' , { all characters - '"' }, '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                     | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                     | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                     | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
Here's Pascal code described by it:
PROGRAM DEMO1
BEGIN
A:=3;
B:=45;
H:=-100023;
C:=A;
D123:=B34A;
BABOON:=GIRAFFE;
TEXT:="Hello world!";
END.
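Since none of the rules in this small grammar refers back to itself, it can be transcribed almost mechanically into a regular expression. The Python sketch below does that to check whether a text matches the grammar. It is only an illustration under two assumptions that the grammar leaves open: "white space" is read as one or more whitespace characters, and "all characters" as anything except a double quote. The name `is_program` is made up for this example.

```python
import re

WS = r"\s+"                # assumption: one or more whitespace characters
IDENT = r"[A-Z][A-Z0-9]*"  # alphabetic character, { alphabetic character | digit }
NUMBER = r"-?[0-9]+"       # [ "-" ], digit, { digit }
STRING = r'"[^"]*"'        # assumption: any character except the double quote
ASSIGNMENT = rf"{IDENT}:=(?:{NUMBER}|{IDENT}|{STRING})"
PROGRAM = re.compile(
    rf"PROGRAM{WS}{IDENT}{WS}BEGIN{WS}(?:{ASSIGNMENT};{WS})*END\."
)


def is_program(text):
    """True if the whole text matches the 'program' rule."""
    return PROGRAM.fullmatch(text) is not None


sample = """PROGRAM DEMO1
BEGIN
  A:=3;
  B:=45;
  H:=-100023;
  C:=A;
  D123:=B34A;
  BABOON:=GIRAFFE;
  TEXT:="Hello world!";
END."""

print(is_program(sample))
print(is_program("PROGRAM X BEGIN A:=3 END."))  # the ";" is missing
```

A grammar with recursive rules, such as nested expressions, could not be handled this way and would need a real parser.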
Usage | Notation |
---|---|
definition | = |
concatenation | , |
termination | ; |
alternation | \| |
option | [ ... ] |
repetition | { ... } |
grouping | ( ... ) |
terminal string | " ... " |
terminal string | ' ... ' |
comment | (* ... *) |
special sequence | ? ... ? |
exception | - |
Note:
Terminals are enclosed in a pair of double quotes " or single quotes '. Everything else is a non-terminal (except the special symbols).
Some characters and their meanings:
- * repetition-symbol
- - except-symbol
- , concatenate-symbol
- | definition-separator-symbol
- = defining-symbol
- ; terminator-symbol
Advantages over BNF
Any grammar defined in EBNF can also be represented in BNF though representations in the latter are generally lengthier. e.g., options and repetitions cannot be directly expressed in BNF and require the use of an intermediate rule or alternative production defined to be either nothing or the optional production for option, or either the repeated production of itself, recursively, for repetition. The same constructs can still be used in EBNF.
The BNF uses the symbols (<, >, |, ::=) for itself, but does not include quotes around terminal strings. This prevents these characters from being used in the languages, and requires a special symbol for the empty string. In EBNF, terminals are strictly enclosed within quotation marks (“…” or ‘…’). The angle brackets (“<…>”) for nonterminals can be omitted.
BNF syntax can only represent a rule in one line, whereas in EBNF a terminating character, the semicolon, marks the end of a rule.
Furthermore, EBNF includes mechanisms for enhancements, defining the number of repetitions, excluding alternatives, comments, etc.
from Extended Backus–Naur Form
Note: most so-called EBNF grammars are sloppy, written for communication with humans. They contain many ambiguities and are not well defined.
Augmented Backus–Naur Form
This is the worst of the lot. ABNF is not really human-friendly. Example:
telephone-uri = "tel:" telephone-subscriber
telephone-subscriber = global-number / local-number
global-number = global-number-digits *par
local-number = local-number-digits *par context *par
par = parameter / extension / isdn-subaddress
isdn-subaddress = ";isub=" 1*uric
extension = ";ext=" 1*phonedigit
context = ";phone-context=" descriptor
descriptor = domainname / global-number-digits
global-number-digits = "+" *phonedigit DIGIT *phonedigit
local-number-digits = *phonedigit-hex (HEXDIG / "*" / "#") *phonedigit-hex
domainname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum / alphanum *( alphanum / "-" ) alphanum
toplabel = ALPHA / ALPHA *( alphanum / "-" ) alphanum
parameter = ";" pname ["=" pvalue ]
pname = 1*( alphanum / "-" )
pvalue = 1*paramchar
paramchar = param-unreserved / unreserved / pct-encoded
unreserved = alphanum / mark
mark = "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")"
pct-encoded = "%" HEXDIG HEXDIG
param-unreserved = "[" / "]" / "/" / ":" / "&" / "+" / "$"
phonedigit = DIGIT / [ visual-separator ]
phonedigit-hex = HEXDIG / "*" / "#" / [ visual-separator ]
visual-separator = "-" / "." / "(" / ")"
alphanum = ALPHA / DIGIT
reserved = ";" / "/" / "?" / ":" / "@" / "&" / "=" / "+" / "$" / ","
uric = reserved / unreserved / pct-encoded

postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]
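The repetition prefixes are probably the least familiar part of ABNF: `*X` means zero or more, `1*X` one or more, `1*4X` between one and four, and `5X` exactly five. As a reading aid, here is a small Python sketch that maps such a prefix to a regex quantifier and uses it for three of the postal-address rules. The helper `repeat` is a made-up name for this illustration. DIGIT and ALPHA follow the core rules of RFC 5234 (digits 0-9, letters A-Z and a-z).

```python
import re


def repeat(spec, element):
    """Apply an ABNF repeat prefix ("*", "1*", "1*4", "5") to a regex."""
    if "*" in spec:
        low, _, high = spec.partition("*")
        # A missing lower bound means 0; a missing upper bound means unbounded.
        quantifier = "{" + (low or "0") + "," + high + "}"
    else:
        quantifier = "{" + spec + "}"  # exact count
    return f"(?:{element}){quantifier}"


DIGIT = "[0-9]"
ALPHA = "[A-Za-z]"

# apt      = 1*4DIGIT
apt = repeat("1*4", DIGIT)
# state    = 2ALPHA
state = repeat("2", ALPHA)
# zip-code = 5DIGIT ["-" 4DIGIT]
zip_code = repeat("5", DIGIT) + "(?:-" + repeat("4", DIGIT) + ")?"

print(bool(re.fullmatch(zip_code, "12345-6789")))
print(bool(re.fullmatch(apt, "12345")))  # five digits is one too many
```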
see also [2014-11-22 https://github.com/Engelberg/instaparse/blob/master/docs/ABNF.md ], by Mark Engelberg.