11 ECMAScript Language: Lexical Grammar

The source text of an ECMAScript Script or Module is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.

There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar. The InputElementRegExpOrTemplateTail goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted. The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted. The InputElementTemplateTail goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted. In all other contexts, InputElementDiv is used as the lexical goal symbol.

NOTE The use of multiple lexical goals ensures that there are no lexical ambiguities that would affect automatic semicolon insertion. For example, there are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 11.9); in examples such as the following:

a = b
/hi/g.exec(c).map(d);

where the first non-whitespace, non-comment code point after a LineTerminator is U+002F (SOLIDUS) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, the above example is interpreted in the same way as:

a = b / hi / g.exec(c).map(d);

Syntax

InputElementDiv ::
WhiteSpace
LineTerminator
Comment
CommonToken
DivPunctuator
RightBracePunctuator
InputElementRegExp ::
WhiteSpace
LineTerminator
Comment
CommonToken
RightBracePunctuator
RegularExpressionLiteral
InputElementRegExpOrTemplateTail ::
WhiteSpace
LineTerminator
Comment
CommonToken
RegularExpressionLiteral
TemplateSubstitutionTail
InputElementTemplateTail ::
WhiteSpace
LineTerminator
Comment
CommonToken
DivPunctuator
TemplateSubstitutionTail

11.1 Unicode Format-Control Characters

The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.

U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text these code points may also be used in an IdentifierName (see 11.6.1) after the first character.

U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 11.2).

The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in Table 31.

Table 31 — Format-Control Code Point Usage
Code Point Name Abbreviation Usage
U+200C ZERO WIDTH NON-JOINER <ZWNJ> IdentifierPart
U+200D ZERO WIDTH JOINER <ZWJ> IdentifierPart
U+FEFF ZERO WIDTH NO-BREAK SPACE <ZWNBSP> WhiteSpace

11.2 White Space

White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a StringLiteral, a RegularExpressionLiteral, a Template, or a TemplateSubstitutionTail where they are considered significant code points forming part of a literal value. They may also occur within a Comment, but cannot appear within any other kind of token.

The ECMAScript white space code points are listed in Table 32.

Table 32 — White Space Code Points
Code Point Name Abbreviation
U+0009 CHARACTER TABULATION <TAB>
U+000B LINE TABULATION <VT>
U+000C FORM FEED (FF) <FF>
U+0020 SPACE <SP>
U+00A0 NO-BREAK SPACE <NBSP>
U+FEFF ZERO WIDTH NO-BREAK SPACE <ZWNBSP>
Other category “Zs” Any other Unicode “Separator, space” code point <USP>

ECMAScript implementations must recognize as WhiteSpace code points listed in the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations may also recognize as WhiteSpace additional category Zs code points from subsequent editions of the Unicode Standard.

NOTE Other than for the code points listed in Table 32, ECMAScript WhiteSpace intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in category “Zs”.

Syntax

WhiteSpace ::
<TAB>
<VT>
<FF>
<SP>
<NBSP>
<ZWNBSP>
<USP>

11.3 Line Terminators

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (11.9). A line terminator cannot occur within any token except a StringLiteral, Template, or TemplateSubstitutionTail. Line terminators may only occur within a StringLiteral token as part of a LineContinuation.

A line terminator can occur within a MultiLineComment (11.4) but cannot occur within a SingleLineComment.

Line terminators are included in the set of white space code points that are matched by the \s class in regular expressions.

The ECMAScript line terminator code points are listed in Table 33.

Table 33 — Line Terminator Code Points
Code Point Unicode Name Abbreviation
U+000A LINE FEED (LF) <LF>
U+000D CARRIAGE RETURN (CR) <CR>
U+2028 LINE SEPARATOR <LS>
U+2029 PARAGRAPH SEPARATOR <PS>

Only the Unicode code points in Table 33 are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they meet the requirements listed in Table 32. The sequence <CR><LF> is commonly used as a line terminator. It should be considered a single SourceCharacter for the purpose of reporting line numbers.

Syntax

LineTerminator ::
<LF>
<CR>
<LS>
<PS>
LineTerminatorSequence ::
<LF>
<CR> [lookahead ≠ <LF> ]
<LS>
<PS>
<CR> <LF>

11.4 Comments

Comments can be either single or multi-line. Multi-line comments cannot nest.

Because a single-line comment can contain any Unicode code point except a LineTerminator code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the // marker to the end of the line. However, the LineTerminator at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see 11.9).

Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator code point, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.

Syntax

Comment ::
MultiLineComment
SingleLineComment
MultiLineComment ::
/* MultiLineCommentCharsopt */
MultiLineCommentChars ::
MultiLineNotAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
PostAsteriskCommentChars ::
MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
MultiLineNotAsteriskChar ::
SourceCharacter but not *
MultiLineNotForwardSlashOrAsteriskChar ::
SourceCharacter but not one of / or *
SingleLineComment ::
// SingleLineCommentCharsopt
SingleLineCommentChars ::
SingleLineCommentChar SingleLineCommentCharsopt
SingleLineCommentChar ::
SourceCharacter but not LineTerminator

11.5 Tokens

Syntax

CommonToken ::
IdentifierName
Punctuator
NumericLiteral
StringLiteral
Template

NOTE The DivPunctuator, RegularExpressionLiteral, RightBracePunctuator, and TemplateSubstitutionTail productions derive additional tokens that are not included in the CommonToken production.

11.6 Names and Keywords

IdentifierName and ReservedWord are tokens that are interpreted according to the Default Identifier Syntax given in Unicode Standard Annex #31, Identifier and Pattern Syntax, with some small modifications. ReservedWord is an enumerated subset of IdentifierName. The syntactic grammar defines Identifier as an IdentifierName that is not a ReservedWord (see 11.6.2). The Unicode identifier grammar is based on character properties specified by the Unicode Standard. The Unicode code points in the specified categories in version 5.1.0 of the Unicode standard must be treated as in those categories by all conforming ECMAScript implementations. ECMAScript implementations may recognize identifier code points defined in later editions of the Unicode Standard.

NOTE This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an IdentifierName, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an IdentifierName.

Unicode escape sequences are permitted in an IdentifierName, where they contribute a single Unicode code point to the IdentifierName. The code point is expressed by the HexDigits of the UnicodeEscapeSequence (see 11.8.4). The \ preceding the UnicodeEscapeSequence and the u and { } code units, if they appear, do not contribute code points to the IdentifierName. A UnicodeEscapeSequence cannot be used to put a code point into an IdentifierName that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by the SourceCharacter it contributes, the result must still be a valid IdentifierName that has the exact same sequence of SourceCharacter elements as the original IdentifierName. All interpretations of IdentifierName within this specification are based upon their actual code points regardless of whether or not an escape sequence was used to contribute any particular code point.

Two IdentifierName that are canonically equivalent according to the Unicode standard are not equal unless, after replacement of each UnicodeEscapeSequence, they are represented by the exact same sequence of code points.

Syntax

IdentifierName ::
IdentifierStart
IdentifierName IdentifierPart
IdentifierStart ::
UnicodeIDStart
$
_
\ UnicodeEscapeSequence
IdentifierPart ::
UnicodeIDContinue
$
_
\ UnicodeEscapeSequence
<ZWNJ>
<ZWJ>
UnicodeIDStart ::
any Unicode code point with the Unicode property “ID_Start”
UnicodeIDContinue ::
any Unicode code point with the Unicode property “ID_Continue”

The definitions of the nonterminal UnicodeEscapeSequence is given in 11.8.4.

NOTE The sets of code points with Unicode properties “ID_Start” and “ID_Continue” include, respectively, the code points with Unicode properties “Other_ID_Start” and “Other_ID_Continue”.

11.6.1 Identifier Names

11.6.1.1 Static Semantics: Early Errors

IdentifierStart :: \ UnicodeEscapeSequence
  • It is a Syntax Error if SV(UnicodeEscapeSequence) is none of "$", or "_", or the UTF16Encoding (10.1.1) of a code point matched by the UnicodeIDStart lexical grammar production.

IdentifierPart :: \ UnicodeEscapeSequence
  • It is a Syntax Error if SV(UnicodeEscapeSequence) is none of "$", or "_", or the UTF16Encoding (10.1.1) of either <ZWNJ> or <ZWJ>, or the UTF16Encoding of a Unicode code point that would be matched by the UnicodeIDContinue lexical grammar production.

11.6.1.2 Static Semantics: StringValue

See also: 11.8.4.2, 12.1.4.

IdentifierName ::
IdentifierStart
IdentifierName IdentifierPart
  1. Return the String value consisting of the sequence of code units corresponding to IdentifierName. In determining the sequence any occurrences of \ UnicodeEscapeSequence are first replaced with the code point represented by the UnicodeEscapeSequence and then the code points of the entire IdentifierName are converted to code units by UTF16Encoding (10.1.1) each code point.

11.6.2 Reserved Words

A reserved word is an IdentifierName that cannot be used as an Identifier.

Syntax

ReservedWord ::
Keyword
FutureReservedWord
NullLiteral
BooleanLiteral

NOTE The ReservedWord definitions are specified as literal sequences of specific SourceCharacter elements. A code point in a ReservedWord cannot be expressed by a \ UnicodeEscapeSequence.

11.6.2.1 Keywords

The following tokens are ECMAScript keywords and may not be used as Identifiers in ECMAScript programs.

Syntax

Keyword :: one of
break do in typeof
case else instanceof var
catch export new void
class extends return while
const finally super with
continue for switch yield
debugger function this
default if throw
delete import try

NOTE In some contexts yield is given the semantics of an Identifier. See 12.1.1. In strict mode code, let and static are treated as reserved keywords through static semantic restrictions (see 12.1.1, 13.3.1.1, 13.7.5.1, and 14.5.1) rather than the lexical grammar.

11.6.2.2 Future Reserved Words

The following tokens are reserved for used as keywords in future language extensions.

Syntax

FutureReservedWord ::
enum
await

await is only treated as a FutureReservedWord when Module is the goal symbol of the syntactic grammar.

NOTE Use of the following tokens within strict mode code (see 10.2.1) is also reserved. That usage is restricted using static semantic restrictions (see 12.1.1) rather than the lexical grammar:

implements package protected
interface private public

11.7 Punctuators

Syntax

Punctuator :: one of
{ ( ) [ ] .
... ; , < > <=
>= == != === !==
+ - * % ++ --
<< >> >>> & | ^
! ~ && || ? :
= += -= *= %= <<=
>>= >>>= &= |= ^= =>
DivPunctuator ::
/
/=
RightBracePunctuator ::
}

11.8 Literals

11.8.1 Null Literals

Syntax

NullLiteral ::
null

11.8.2 Boolean Literals

Syntax

BooleanLiteral ::
true
false

11.8.3 Numeric Literals

Syntax

NumericLiteral ::
DecimalLiteral
BinaryIntegerLiteral
OctalIntegerLiteral
HexIntegerLiteral
DecimalLiteral ::
DecimalIntegerLiteral . DecimalDigitsopt ExponentPartopt
. DecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
DecimalIntegerLiteral ::
0
NonZeroDigit DecimalDigitsopt
DecimalDigits ::
DecimalDigit
DecimalDigits DecimalDigit
DecimalDigit :: one of
0 1 2 3 4 5 6 7 8 9
NonZeroDigit :: one of
1 2 3 4 5 6 7 8 9
ExponentPart ::
ExponentIndicator SignedInteger
ExponentIndicator :: one of
e E
SignedInteger ::
DecimalDigits
+ DecimalDigits
- DecimalDigits
BinaryIntegerLiteral ::
0b BinaryDigits
0B BinaryDigits
BinaryDigits ::
BinaryDigit
BinaryDigits BinaryDigit
BinaryDigit :: one of
0 1
OctalIntegerLiteral ::
0o OctalDigits
0O OctalDigits
OctalDigits ::
OctalDigit
OctalDigits OctalDigit
OctalDigit :: one of
0 1 2 3 4 5 6 7
HexIntegerLiteral ::
0x HexDigits
0X HexDigits
HexDigits ::
HexDigit
HexDigits HexDigit
HexDigit :: one of
0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

The SourceCharacter immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.

NOTE For example:
      3in
is an error and not the two input elements 3 and in.

A conforming implementation, when processing strict mode code (see 10.2.1), must not extend, as described in B.1.1, the syntax of NumericLiteral to include LegacyOctalIntegerLiteral, nor extend the syntax of DecimalIntegerLiteral to include NonOctalDecimalIntegerLiteral.

11.8.3.1 Static Semantics: MV

A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a mathematical value (MV) is derived from the literal; second, this mathematical value is rounded as described below.

  • The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral.

  • The MV of NumericLiteral :: BinaryIntegerLiteral is the MV of BinaryIntegerLiteral.

  • The MV of NumericLiteral :: OctalIntegerLiteral is the MV of OctalIntegerLiteral.

  • The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of DecimalIntegerLiteral.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10n), where n is the number of code points in DecimalDigits.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral × 10e, where e is the MV of ExponentPart.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10n)) × 10e, where n is the number of code points in DecimalDigits and e is the MV of ExponentPart.

  • The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits × 10n, where n is the number of code points in DecimalDigits.

  • The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits × 10en, where n is the number of code points in DecimalDigits and e is the MV of ExponentPart.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral.

  • The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral × 10e, where e is the MV of ExponentPart.

  • The MV of DecimalIntegerLiteral :: 0 is 0.

  • The MV of DecimalIntegerLiteral :: NonZeroDigit is the MV of NonZeroDigit.

  • The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit × 10n) plus the MV of DecimalDigits, where n is the number of code points in DecimalDigits.

  • The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit.

  • The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.

  • The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.

  • The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits.

  • The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits.

  • The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.

  • The MV of DecimalDigit :: 0 or of HexDigit :: 0 or of OctalDigit :: 0 or of BinaryDigit :: 0 is 0.

  • The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 or of OctalDigit :: 1 or
    of BinaryDigit :: 1 is 1.

  • The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 or of OctalDigit :: 2 is 2.

  • The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 or of OctalDigit :: 3 is 3.

  • The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 or of OctalDigit :: 4 is 4.

  • The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 or of OctalDigit :: 5 is 5.

  • The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 or of OctalDigit :: 6 is 6.

  • The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 or of OctalDigit :: 7 is 7.

  • The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of HexDigit :: 8 is 8.

  • The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of HexDigit :: 9 is 9.

  • The MV of HexDigit :: a or of HexDigit :: A is 10.

  • The MV of HexDigit :: b or of HexDigit :: B is 11.

  • The MV of HexDigit :: c or of HexDigit :: C is 12.

  • The MV of HexDigit :: d or of HexDigit :: D is 13.

  • The MV of HexDigit :: e or of HexDigit :: E is 14.

  • The MV of HexDigit :: f or of HexDigit :: F is 15.

  • The MV of BinaryIntegerLiteral :: 0b BinaryDigits is the MV of BinaryDigits.

  • The MV of BinaryIntegerLiteral :: 0B BinaryDigits is the MV of BinaryDigits.

  • The MV of BinaryDigits :: BinaryDigit is the MV of BinaryDigit.

  • The MV of BinaryDigits :: BinaryDigits BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.

  • The MV of OctalIntegerLiteral :: 0o OctalDigits is the MV of OctalDigits.

  • The MV of OctalIntegerLiteral :: 0O OctalDigits is the MV of OctalDigits.

  • The MV of OctalDigits :: OctalDigit is the MV of OctalDigit.

  • The MV of OctalDigits :: OctalDigits OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.

  • The MV of HexIntegerLiteral :: 0x HexDigits is the MV of HexDigits.

  • The MV of HexIntegerLiteral :: 0X HexDigits is the MV of HexDigits.

  • The MV of HexDigits :: HexDigit is the MV of HexDigit.

  • The MV of HexDigits :: HexDigits HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.

Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the MV is 0, then the rounded value is +0; otherwise, the rounded value must be the Number value for the MV (as specified in 6.1.6), unless the literal is a DecimalLiteral and the literal has more than 20 significant digits, in which case the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th significant digit position. A digit is significant if it is not part of an ExponentPart and

  • it is not 0; or
  • there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart, to its right.

11.8.4 String Literals

NOTE 1 A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in 10.1.1. Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.

Syntax

StringLiteral ::
" DoubleStringCharactersopt "
' SingleStringCharactersopt '
DoubleStringCharacters ::
DoubleStringCharacter DoubleStringCharactersopt
SingleStringCharacters ::
SingleStringCharacter SingleStringCharactersopt
DoubleStringCharacter ::
SourceCharacter but not one of " or \ or LineTerminator
\ EscapeSequence
LineContinuation
SingleStringCharacter ::
SourceCharacter but not one of ' or \ or LineTerminator
\ EscapeSequence
LineContinuation
LineContinuation ::
\ LineTerminatorSequence
EscapeSequence ::
CharacterEscapeSequence
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
UnicodeEscapeSequence

A conforming implementation, when processing strict mode code (see 10.2.1), must not extend the syntax of EscapeSequence to include LegacyOctalEscapeSequence as described in B.1.2.

CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
SingleEscapeCharacter :: one of
' " \ b f n r t v
NonEscapeCharacter ::
SourceCharacter but not one of EscapeCharacter or LineTerminator
EscapeCharacter ::
SingleEscapeCharacter
DecimalDigit
x
u
HexEscapeSequence ::
x HexDigit HexDigit
UnicodeEscapeSequence ::
u Hex4Digits
u{ HexDigits }
Hex4Digits ::
HexDigit HexDigit HexDigit HexDigit

The definition of the nonterminal HexDigit is given in 11.8.3. SourceCharacter is defined in 10.1.

NOTE 2 A line terminator code point cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence. The proper way to cause a line terminator code point to be part of the String value of a string literal is to use an escape sequence such as \n or \u000A.

11.8.4.1 Static Semantics: Early Errors

UnicodeEscapeSequence :: u{ HexDigits }
  • It is a Syntax Error if the MV of HexDigits > 1114111.

11.8.4.2 Static Semantics: StringValue

See also: 11.6.1.2, 12.1.4.

StringLiteral ::
" DoubleStringCharactersopt "
' SingleStringCharactersopt '
  1. Return the String value whose elements are the SV of this StringLiteral.

11.8.4.3 Static Semantics: SV

A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of code unit values contributed by the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value (MV), as described below or in 11.8.3.

  • The SV of StringLiteral :: "" is the empty code unit sequence.

  • The SV of StringLiteral :: '' is the empty code unit sequence.

  • The SV of StringLiteral :: " DoubleStringCharacters " is the SV of DoubleStringCharacters.

  • The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of SingleStringCharacters.

  • The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one or two code units that is the SV of DoubleStringCharacter.

  • The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a sequence of one or two code units that is the SV of DoubleStringCharacter followed by all the code units in the SV of DoubleStringCharacters in order.

  • The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one or two code units that is the SV of SingleStringCharacter.

  • The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of one or two code units that is the SV of SingleStringCharacter followed by all the code units in the SV of SingleStringCharacters in order.

  • The SV of DoubleStringCharacter :: SourceCharacter but not one of " or \ or LineTerminator is the UTF16Encoding (10.1.1) of the code point value of SourceCharacter.

  • The SV of DoubleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence.

  • The SV of DoubleStringCharacter :: LineContinuation is the empty code unit sequence.

  • The SV of SingleStringCharacter :: SourceCharacter but not one of ' or \ or LineTerminator is the UTF16Encoding (10.1.1) of the code point value of SourceCharacter.

  • The SV of SingleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence.

  • The SV of SingleStringCharacter :: LineContinuation is the empty code unit sequence.

  • The SV of EscapeSequence :: CharacterEscapeSequence is the SV of the CharacterEscapeSequence.

  • The SV of EscapeSequence :: 0 is the code unit value 0.

  • The SV of EscapeSequence :: HexEscapeSequence is the SV of the HexEscapeSequence.

  • The SV of EscapeSequence :: UnicodeEscapeSequence is the SV of the UnicodeEscapeSequence.

  • The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the code unit whose value is determined by the SingleEscapeCharacter according to { REF _Ref365803173 \h }Table 34.

Table 34 — String Single Character Escape Sequences
Escape Sequence Code Unit Value Unicode Character Name Symbol
\b 0x0008 BACKSPACE <BS>
\t 0x0009 CHARACTER TABULATION <HT>
\n 0x000A LINE FEED (LF) <LF>
\v 0x000B LINE TABULATION <VT>
\f 0x000C FORM FEED (FF) <FF>
\r 0x000D CARRIAGE RETURN (CR) <CR>
\" 0x0022 QUOTATION MARK "
\' 0x0027 APOSTROPHE '
\\ 0x005C REVERSE SOLIDUS \
  • The SV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of the NonEscapeCharacter.

  • The SV of NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator is the UTF16Encoding (10.1.1) of the code point value of SourceCharacter.

  • The SV of HexEscapeSequence :: x HexDigit HexDigit is the code unit value that is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.

  • The SV of UnicodeEscapeSequence :: u Hex4Digits is the SV of Hex4Digits.

  • The SV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the code unit value that is (4096 times the MV of the first HexDigit) plus (256 times the MV of the second HexDigit) plus (16 times the MV of the third HexDigit) plus the MV of the fourth HexDigit.

  • The SV of UnicodeEscapeSequence :: u{ HexDigits } is the UTF16Encoding (10.1.1) of the MV of HexDigits.

11.8.5 Regular Expression Literals

NOTE 1 A regular expression literal is an input element that is converted to a RegExp object (see 21.2) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp or calling the RegExp constructor as a function (see 21.2.3).

The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the RegularExpressionBody and the RegularExpressionFlags are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar (21.2.1).

An implementation may extend the ECMAScript Regular Expression grammar defined in 21.2.1, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.

Syntax

RegularExpressionLiteral ::
/ RegularExpressionBody / RegularExpressionFlags
RegularExpressionBody ::
RegularExpressionFirstChar RegularExpressionChars
RegularExpressionChars ::
[empty]
RegularExpressionChars RegularExpressionChar
RegularExpressionFirstChar ::
RegularExpressionNonTerminator but not one of * or \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionChar ::
RegularExpressionNonTerminator but not one of \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionBackslashSequence ::
\ RegularExpressionNonTerminator
RegularExpressionNonTerminator ::
SourceCharacter but not LineTerminator
RegularExpressionClass ::
[ RegularExpressionClassChars ]
RegularExpressionClassChars ::
[empty]
RegularExpressionClassChars RegularExpressionClassChar
RegularExpressionClassChar ::
RegularExpressionNonTerminator but not one of ] or \
RegularExpressionBackslashSequence
RegularExpressionFlags ::
[empty]
RegularExpressionFlags IdentifierPart

NOTE 2 Regular expression literals may not be empty; instead of representing an empty regular expression literal, the code unit sequence // starts a single-line comment. To specify an empty regular expression, use: /(?:)/.

11.8.5.1 Static Semantics: Early Errors

RegularExpressionFlags :: RegularExpressionFlags IdentifierPart
  • It is a Syntax Error if IdentifierPart contains a Unicode escape sequence.

11.8.5.2 Static Semantics: BodyText

RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
  1. Return the source text that was recognized as RegularExpressionBody.

11.8.5.3 Static Semantics: FlagText

RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
  1. Return the source text that was recognized as RegularExpressionFlags.

11.8.6 Template Literal Lexical Components

Syntax

Template ::
NoSubstitutionTemplate
TemplateHead
NoSubstitutionTemplate ::
` TemplateCharactersopt `
TemplateHead ::
` TemplateCharactersopt ${
TemplateSubstitutionTail ::
TemplateMiddle
TemplateTail
TemplateMiddle ::
} TemplateCharactersopt ${
TemplateTail ::
} TemplateCharactersopt `
TemplateCharacters ::
TemplateCharacter TemplateCharactersopt
TemplateCharacter ::
$ [lookahead ≠ { ]
\ EscapeSequence
LineContinuation
LineTerminatorSequence
SourceCharacter but not one of ` or \ or $ or LineTerminator

A conforming implementation must not use the extended definition of EscapeSequence described in B.1.2 when parsing a TemplateCharacter.

NOTE TemplateSubstitutionTail is used by the InputElementTemplateTail alternative lexical goal.

11.8.6.1 Static Semantics: TV and TRV

A template literal component is interpreted as a sequence of Unicode code points. The Template Value (TV) of a literal component is described in terms of code unit values (SV, 11.8.4) contributed by the various parts of the template literal component. As part of this process, some Unicode code points within the template component are interpreted as having a mathematical value (MV, 11.8.3). In determining a TV, escape sequences are replaced by the UTF-16 code unit(s) of the Unicode code point represented by the escape sequence. The Template Raw Value (TRV) is similar to a Template Value with the difference that in TRVs escape sequences are interpreted literally.

  • The TV and TRV of NoSubstitutionTemplate :: `` is the empty code unit sequence.

  • The TV and TRV of TemplateHead :: `${ is the empty code unit sequence.

  • The TV and TRV of TemplateMiddle :: }${ is the empty code unit sequence.

  • The TV and TRV of TemplateTail :: }` is the empty code unit sequence.

  • The TV of NoSubstitutionTemplate :: ` TemplateCharacters ` is the TV of TemplateCharacters.

  • The TV of TemplateHead :: ` TemplateCharacters ${ is the TV of TemplateCharacters.

  • The TV of TemplateMiddle :: } TemplateCharacters ${ is the TV of TemplateCharacters.

  • The TV of TemplateTail :: } TemplateCharacters ` is the TV of TemplateCharacters.

  • The TV of TemplateCharacters :: TemplateCharacter is the TV of TemplateCharacter.

  • The TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TV of TemplateCharacter followed by all the code units in the TV of TemplateCharacters in order.

  • The TV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the UTF16Encoding (10.1.1) of the code point value of SourceCharacter.

  • The TV of TemplateCharacter :: $ is the code unit value 0x0024.

  • The TV of TemplateCharacter :: \ EscapeSequence is the SV of EscapeSequence.

  • The TV of TemplateCharacter :: LineContinuation is the TV of LineContinuation.

  • The TV of TemplateCharacter :: LineTerminatorSequence is the TRV of LineTerminatorSequence.

  • The TV of LineContinuation :: \ LineTerminatorSequence is the empty code unit sequence.

  • The TRV of NoSubstitutionTemplate :: ` TemplateCharacters ` is the TRV of TemplateCharacters.

  • The TRV of TemplateHead :: ` TemplateCharacters ${ is the TRV of TemplateCharacters.

  • The TRV of TemplateMiddle :: } TemplateCharacters ${ is the TRV of TemplateCharacters.

  • The TRV of TemplateTail :: } TemplateCharacters ` is the TRV of TemplateCharacters.

  • The TRV of TemplateCharacters :: TemplateCharacter is the TRV of TemplateCharacter.

  • The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TRV of TemplateCharacter followed by all the code units in the TRV of TemplateCharacters, in order.

  • The TRV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the UTF16Encoding (10.1.1) of the code point value of SourceCharacter.

  • The TRV of TemplateCharacter :: $ is the code unit value 0x0024.

  • The TRV of TemplateCharacter :: \ EscapeSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of EscapeSequence.

  • The TRV of TemplateCharacter :: LineContinuation is the TRV of LineContinuation.

  • The TRV of TemplateCharacter :: LineTerminatorSequence is the TRV of LineTerminatorSequence.

  • The TRV of EscapeSequence :: CharacterEscapeSequence is the TRV of the CharacterEscapeSequence.

  • The TRV of EscapeSequence :: 0 is the code unit value 0x0030.

  • The TRV of EscapeSequence :: HexEscapeSequence is the TRV of the HexEscapeSequence.

  • The TRV of EscapeSequence :: UnicodeEscapeSequence is the TRV of the UnicodeEscapeSequence.

  • The TRV of CharacterEscapeSequence :: SingleEscapeCharacter is the TRV of the SingleEscapeCharacter.

  • The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of the NonEscapeCharacter.

  • The TRV of SingleEscapeCharacter :: one of ' " \ b f n r t v is the SV of the SourceCharacter that is that single code point.

  • The TRV of HexEscapeSequence :: x HexDigit HexDigit is the sequence consisting of code unit value 0x0078 followed by TRV of the first HexDigit followed by the TRV of the second HexDigit.

  • The TRV of UnicodeEscapeSequence :: u Hex4Digits is the sequence consisting of code unit value 0x0075 followed by TRV of Hex4Digits.

  • The TRV of UnicodeEscapeSequence :: u{ HexDigits } is the sequence consisting of code unit value 0x0075 followed by code unit value 0x007B followed by TRV of HexDigits followed by code unit value 0x007D.

  • The TRV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the sequence consisting of the TRV of the first HexDigit followed by the TRV of the second HexDigit followed by the TRV of the third HexDigit followed by the TRV of the fourth HexDigit.

  • The TRV of HexDigits :: HexDigit is the TRV of HexDigit.

  • The TRV of HexDigits :: HexDigits HexDigit is the sequence consisting of TRV of HexDigits followed by TRV of HexDigit.

  • The TRV of a HexDigit is the SV of the SourceCharacter that is that HexDigit.

  • The TRV of LineContinuation :: \ LineTerminatorSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of LineTerminatorSequence.

  • The TRV of LineTerminatorSequence :: <LF> is the code unit value 0x000A.

  • The TRV of LineTerminatorSequence :: <CR> is the code unit value 0x000A.

  • The TRV of LineTerminatorSequence :: <LS> is the code unit value 0x2028.

  • The TRV of LineTerminatorSequence :: <PS> is the code unit value 0x2029.

  • The TRV of LineTerminatorSequence :: <CR><LF> is the sequence consisting of the code unit value 0x000A.

NOTE TV excludes the code units of LineContinuation while TRV includes them. <CR><LF> and <CR> LineTerminatorSequences are normalized to <LF> for both TV and TRV. An explicit EscapeSequence is needed to include a <CR> or <CR><LF> sequence.

11.9 Automatic Semicolon Insertion

Certain ECMAScript statements (empty statement, let, const, import, and export declarations, variable statement, expression statement, debugger statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.

11.9.1 Rules of Automatic Semicolon Insertion

In the following rules, “token” means the actual recognized lexical token determined using the current lexical goal symbol as described in clause 11.

There are three basic rules of semicolon insertion:

  1. When, as a Script or Module is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
    • The offending token is separated from the previous token by at least one LineTerminator.

    • The offending token is }.

    • The previous token is ) and the inserted semicolon would then be parsed as the terminating semicolon of a do-while statement (13.7.2).

  2. When, as the Script or Module is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Script or Module, then a semicolon is automatically inserted at the end of the input stream.
  3. When, as the Script or Module is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation [no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 13.7.4).

NOTE The following are the only restricted productions in the grammar:

PostfixExpression[Yield] :
LeftHandSideExpression[?Yield] [no LineTerminator here] ++
LeftHandSideExpression[?Yield] [no LineTerminator here] --
ContinueStatement[Yield] :
continue;
continue [no LineTerminator here] LabelIdentifier[?Yield] ;
BreakStatement[Yield] :
break ;
break [no LineTerminator here] LabelIdentifier[?Yield] ;
ReturnStatement[Yield] :
return [no LineTerminator here] Expression ;
return [no LineTerminator here] Expression[In, ?Yield] ;
ThrowStatement[Yield] :
throw [no LineTerminator here] Expression[In, ?Yield] ;
ArrowFunction[In, Yield] :
ArrowParameters[?Yield] [no LineTerminator here] => ConciseBody[?In]
YieldExpression[In] :
yield [no LineTerminator here] * AssignmentExpression[?In, Yield]
yield [no LineTerminator here] AssignmentExpression[?In, Yield]

The practical effect of these restricted productions is as follows:

  • When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.

  • When a continue, break, return, throw, or yield token is encountered and a LineTerminator is encountered before the next token, a semicolon is automatically inserted after the continue, break, return, throw, or yield token.

The resulting practical advice to ECMAScript programmers is:

  • A postfix ++ or -- operator should appear on the same line as its operand.

  • An Expression in a return or throw statement or an AssignmentExpression in a yield expression should start on the same line as the return, throw, or yield token.

  • An IdentifierReference in a break or continue statement should be on the same line as the break or continue token.

11.9.2 Examples of Automatic Semicolon Insertion

The source

{ 1 2 } 3

is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source

{ 1
2 } 3

is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:

{ 1
;2 ;} 3;

which is a valid ECMAScript sentence.

The source

for (a; b
)

is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two semicolons in the header of a for statement.

The source

return
a + b

is transformed by automatic semicolon insertion into the following:

return;
a + b;

NOTE 1 The expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return.

The source

a = b
++c

is transformed by automatic semicolon insertion into the following:

a = b;
++c;

NOTE 2 The token ++ is not treated as a postfix operator applying to the variable b, because a LineTerminator occurs between b and ++.

The source

if (a > b)
else c = d

is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.

The source

a = b + c
(d + e).print()

is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the second line can be interpreted as an argument list for a function call:

a = b + c(d + e).print()

In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.