21 Text Processing

21.1 String Objects

21.1.1 The String Constructor

The String constructor is the %String% intrinsic object and the initial value of the String property of the global object. When called as a constructor it creates and initializes a new String object. When String is called as a function rather than as a constructor, it performs a type conversion.

The String constructor is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
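
As an informal illustration of the subclassing requirement (not part of the normative text), the sketch below defines a hypothetical subclass whose constructor makes the required super call. The class name Label and its kind property are assumptions introduced only for this example.

```javascript
// Hypothetical subclass; the name Label and the property kind are illustrative only.
class Label extends String {
  constructor(text, kind) {
    super(text);      // the String constructor creates the instance with [[StringData]]
    this.kind = kind; // ordinary own property added by the subclass
  }
}

const label = new Label("abc", "note");
const labelValue = label.valueOf();              // "abc"
const labelLength = label.length;                // 3
const isStringObject = label instanceof String;  // true
const usesSubclassProto = Object.getPrototypeOf(label) === Label.prototype; // true
```

Because the instance carries a [[StringData]] internal slot, methods that are not generic, such as valueOf, accept it as their this value.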

21.1.1.1 String ( value )

When String is called with argument value, the following steps are taken:

  1. If no arguments were passed to this function invocation, let s be "".
  2. Else,
    1. If NewTarget is undefined and Type(value) is Symbol, return SymbolDescriptiveString(value).
    2. Let s be ToString(value).
  3. ReturnIfAbrupt(s).
  4. If NewTarget is undefined, return s.
  5. Return StringCreate(s, GetPrototypeFromConstructor(NewTarget, "%StringPrototype%")).

The length property of the String function is 1.
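
The difference between calling String as a function and as a constructor can be observed informally as follows. This is a sketch for illustration; the variable names are arbitrary.

```javascript
const fromNumber = String(42);           // ToString(42) is "42", returned as a primitive
const noArgument = String();             // no arguments were passed, so the result is ""
const fromSymbol = String(Symbol("id")); // SymbolDescriptiveString gives "Symbol(id)"
const wrapper = new String(42);          // NewTarget is defined, so a String object is created

const primitiveType = typeof fromNumber; // "string"
const wrapperType = typeof wrapper;      // "object"

// With new, the Symbol special case does not apply, so ToString throws a TypeError.
let symbolRejected = false;
try {
  new String(Symbol("id"));
} catch (e) {
  symbolRejected = e instanceof TypeError;
}
```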

21.1.2 Properties of the String Constructor

The value of the [[Prototype]] internal slot of the String constructor is the intrinsic object %FunctionPrototype% (19.2.3).

Besides the length property (whose value is 1), the String constructor has the following properties:

21.1.2.1 String.fromCharCode ( ...codeUnits )

The String.fromCharCode function may be called with any number of arguments which form the rest parameter codeUnits. The following steps are taken:

  1. Let codeUnits be a List containing the arguments passed to this function.
  2. Let length be the number of elements in codeUnits.
  3. Let elements be a new List.
  4. Let nextIndex be 0.
  5. Repeat while nextIndex < length
    1. Let next be codeUnits[nextIndex].
    2. Let nextCU be ToUint16(next).
    3. ReturnIfAbrupt(nextCU).
    4. Append nextCU to the end of elements.
    5. Let nextIndex be nextIndex + 1.
  6. Return the String value whose elements are, in order, the elements in the List elements. If length is 0, the empty string is returned.

The length property of the fromCharCode function is 1.
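
A brief, informal illustration of the ToUint16 conversion applied to each argument (the variable names are arbitrary):

```javascript
const greeting = String.fromCharCode(72, 105);  // "Hi"
const wrapped = String.fromCharCode(0x10041);   // ToUint16(0x10041) is 0x0041, so "A"
const fromText = String.fromCharCode("0x41");   // ToUint16 applies ToNumber first, so "A"
const none = String.fromCharCode();             // no arguments, so ""
```

Values outside the range 0 to 0xFFFF are reduced modulo 2^16 rather than rejected, which is why 0x10041 yields "A".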

21.1.2.2 String.fromCodePoint ( ...codePoints )

The String.fromCodePoint function may be called with any number of arguments which form the rest parameter codePoints. The following steps are taken:

  1. Let codePoints be a List containing the arguments passed to this function.
  2. Let length be the number of elements in codePoints.
  3. Let elements be a new List.
  4. Let nextIndex be 0.
  5. Repeat while nextIndex < length
    1. Let next be codePoints[nextIndex].
    2. Let nextCP be ToNumber(next).
    3. ReturnIfAbrupt(nextCP).
    4. If SameValue(nextCP, ToInteger(nextCP)) is false, throw a RangeError exception.
    5. If nextCP < 0 or nextCP > 0x10FFFF, throw a RangeError exception.
    6. Append the elements of the UTF16Encoding (10.1.1) of nextCP to the end of elements.
    7. Let nextIndex be nextIndex + 1.
  6. Return the String value whose elements are, in order, the elements in the List elements. If length is 0, the empty string is returned.

The length property of the fromCodePoint function is 1.
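
In contrast to fromCharCode, out-of-range or non-integer arguments are rejected. The following sketch shows this informally; throwsRangeError is a hypothetical helper defined only for the example.

```javascript
const bmp = String.fromCodePoint(0x41);        // "A"
const astral = String.fromCodePoint(0x1F600);  // UTF-16 encoded as "\uD83D\uDE00"
const astralLength = astral.length;            // 2 code units

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
    return false;
  } catch (e) {
    return e instanceof RangeError;
  }
}

const rejectsFraction = throwsRangeError(() => String.fromCodePoint(3.5));     // true
const rejectsNegative = throwsRangeError(() => String.fromCodePoint(-1));      // true
const rejectsTooLarge = throwsRangeError(() => String.fromCodePoint(0x110000)); // true
```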

21.1.2.3 String.prototype

The initial value of String.prototype is the intrinsic object %StringPrototype% (21.1.3).

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

21.1.2.4 String.raw ( template , ...substitutions )

The String.raw function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions. The following steps are taken:

  1. Let substitutions be a List consisting of all of the arguments passed to this function, starting with the second argument. If fewer than two arguments were passed, the List is empty.
  2. Let numberOfSubstitutions be the number of elements in substitutions.
  3. Let cooked be ToObject(template).
  4. ReturnIfAbrupt(cooked).
  5. Let raw be ToObject(Get(cooked, "raw")).
  6. ReturnIfAbrupt(raw).
  7. Let literalSegments be ToLength(Get(raw, "length")).
  8. ReturnIfAbrupt(literalSegments).
  9. If literalSegments ≤ 0, return the empty string.
  10. Let stringElements be a new List.
  11. Let nextIndex be 0.
  12. Repeat
    1. Let nextKey be ToString(nextIndex).
    2. Let nextSeg be ToString(Get(raw, nextKey)).
    3. ReturnIfAbrupt(nextSeg).
    4. Append in order the code unit elements of nextSeg to the end of stringElements.
    5. If nextIndex + 1 = literalSegments, then
      1. Return the String value whose code units are, in order, the elements in the List stringElements. If stringElements has no elements, the empty string is returned.
    6. If nextIndex < numberOfSubstitutions, let next be substitutions[nextIndex].
    7. Else, let next be the empty String.
    8. Let nextSub be ToString(next).
    9. ReturnIfAbrupt(nextSub).
    10. Append in order the code unit elements of nextSub to the end of stringElements.
    11. Let nextIndex be nextIndex + 1.

The length property of the raw function is 1.

NOTE String.raw is intended for use as a tag function of a Tagged Template (12.3.7). When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.
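
The following informal sketch shows String.raw used as a tag function and called directly with an object that has a raw property. The direct calls are for illustration only; the object literals are assumptions of the example, not well formed template objects.

```javascript
// As a tag function: the raw segments are "a\\n" and "b", with one substitution.
const tagged = String.raw`a\n${1 + 1}b`;                    // the 5 code units a \ n 2 b

// Called directly: segments and substitutions are interleaved.
const direct = String.raw({ raw: ["x", "y", "z"] }, 1, 2);  // "x1y2z"

// Missing substitutions are treated as the empty String.
const short = String.raw({ raw: ["a", "b", "c"] }, 1);      // "a1bc"

// Substitutions beyond the last literal segment are ignored.
const extra = String.raw({ raw: ["a", "b"] }, 1, 2, 3);     // "a1b"

// No literal segments: the empty String is returned.
const emptyRaw = String.raw({ raw: [] });                   // ""
```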

21.1.3 Properties of the String Prototype Object

The String prototype object is the intrinsic object %StringPrototype%. The String prototype object is itself an ordinary object. It is not a String instance and does not have a [[StringData]] internal slot.

The value of the [[Prototype]] internal slot of the String prototype object is the intrinsic object %ObjectPrototype% (19.1.3).

Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.

The abstract operation thisStringValue(value) performs the following steps:

  1. If Type(value) is String, return value.
  2. If Type(value) is Object and value has a [[StringData]] internal slot, then
    1. Assert: value's [[StringData]] internal slot is a String value.
    2. Return the value of value’s [[StringData]] internal slot.
  3. Throw a TypeError exception.

The phrase “this String value” within the specification of a method refers to the result returned by calling the abstract operation thisStringValue with the this value of the method invocation passed as the argument.
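
The effect of thisStringValue can be observed through a method that is not generic. The sketch below assumes String.prototype.valueOf, which is specified elsewhere in this clause and not shown in this excerpt, as such a method.

```javascript
// A String value is returned unchanged.
const fromPrimitive = String.prototype.valueOf.call("abc");            // "abc"

// A String object yields the value of its [[StringData]] internal slot.
const fromWrapper = String.prototype.valueOf.call(new String("abc"));  // "abc"

// Any other this value results in a TypeError, even if it looks string-like.
let rejected = false;
try {
  String.prototype.valueOf.call({ length: 3 });
} catch (e) {
  rejected = e instanceof TypeError;
}
```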

21.1.3.1 String.prototype.charAt ( pos )

NOTE 1 Returns a single element String containing the code unit at index pos in the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.

If pos is a value of Number type that is an integer, then the result of x.charAt(pos) is equal to the result of x.substring(pos, pos+1).

When the charAt method is called with one argument pos, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let position be ToInteger(pos).
  5. ReturnIfAbrupt(position).
  6. Let size be the number of elements in S.
  7. If position < 0 or position ≥ size, return the empty String.
  8. Return a String of length 1, containing one code unit from S, namely the code unit at index position.

NOTE 2 The charAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
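
A short informal example of the charAt steps, including the out-of-range cases and a generic call on a non-String this value:

```javascript
const text = "abc";
const middle = text.charAt(1);        // "b"
const pastEnd = text.charAt(3);       // position ≥ size, so ""
const negative = text.charAt(-1);     // position < 0, so ""
const truncated = text.charAt(1.9);   // ToInteger(1.9) is 1, so "b"
const defaulted = text.charAt();      // ToInteger(undefined) is 0, so "a"

// Generic use: the this value 12345 is converted to the String "12345".
const fromNumber = String.prototype.charAt.call(12345, 2);  // "3"

// For an integer pos, charAt(pos) agrees with substring(pos, pos + 1).
const agrees = text.charAt(1) === text.substring(1, 2);     // true
```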

21.1.3.2 String.prototype.charCodeAt ( pos )

NOTE 1 Returns a Number (a nonnegative integer less than 2^16) that is the code unit value of the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.

When the charCodeAt method is called with one argument pos, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let position be ToInteger(pos).
  5. ReturnIfAbrupt(position).
  6. Let size be the number of elements in S.
  7. If position < 0 or position ≥ size, return NaN.
  8. Return a value of Number type, whose value is the code unit value of the element at index position in the String S.

NOTE 2 The charCodeAt function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.

21.1.3.3 String.prototype.codePointAt ( pos )

NOTE 1 Returns a nonnegative integer Number less than 1114112 (0x110000) that is the code point value of the UTF-16 encoded code point (6.1.4) starting at the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.

When the codePointAt method is called with one argument pos, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let position be ToInteger(pos).
  5. ReturnIfAbrupt(position).
  6. Let size be the number of elements in S.
  7. If position < 0 or position ≥ size, return undefined.
  8. Let first be the code unit value of the element at index position in the String S.
  9. If first < 0xD800 or first > 0xDBFF or position+1 = size, return first.
  10. Let second be the code unit value of the element at index position+1 in the String S.
  11. If second < 0xDC00 or second > 0xDFFF, return first.
  12. Return UTF16Decode(first, second).

NOTE 2 The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
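
The surrogate pair handling in steps 8 to 12 can be sketched in ordinary code as follows. This is an informal approximation, not the normative algorithm: the function name codePointAtSketch is hypothetical, it takes the String as an explicit parameter instead of a this value, and ToInteger is approximated with Math.trunc.

```javascript
// Hypothetical helper mirroring the codePointAt steps for a String s.
function codePointAtSketch(s, pos) {
  const size = s.length;
  const position = Math.trunc(Number(pos)) || 0;  // approximates ToInteger (NaN becomes 0)
  if (position < 0 || position >= size) return undefined;
  const first = s.charCodeAt(position);
  // Not a lead surrogate, or nothing follows it: return the code unit itself.
  if (first < 0xD800 || first > 0xDBFF || position + 1 === size) return first;
  const second = s.charCodeAt(position + 1);
  // Lead surrogate without a trail surrogate: return the lone lead surrogate.
  if (second < 0xDC00 || second > 0xDFFF) return first;
  // Combine the pair, as UTF16Decode does.
  return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
}

const sample = "a\uD83D\uDE00";                   // "a" followed by U+1F600
const atPair = codePointAtSketch(sample, 1);      // 0x1F600
const atTrail = codePointAtSketch(sample, 2);     // 0xDE00, the trail surrogate alone
const unitAtPair = sample.charCodeAt(1);          // 0xD83D, charCodeAt never combines pairs
const outOfRange = codePointAtSketch(sample, 3);  // undefined
```

Note that charCodeAt returns NaN for an out-of-range index, whereas codePointAt returns undefined.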

21.1.3.4 String.prototype.concat ( ...args )

NOTE 1 When the concat method is called it returns a String consisting of the code units of the this object (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.

When the concat method is called with zero or more arguments the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let args be a List whose elements are the arguments passed to this function.
  5. Let R be S.
  6. Repeat, while args is not empty
    1. Remove the first element from args and let next be the value of that element.
    2. Let nextString be ToString(next).
    3. ReturnIfAbrupt(nextString).
    4. Let R be the String value consisting of the code units of the previous value of R followed by the code units of nextString.
  7. Return R.

The length property of the concat method is 1.

NOTE 2 The concat function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
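
An informal example showing that every argument is converted with ToString before it is appended:

```javascript
const joined = "ab".concat("cd", 1, null, undefined);  // "abcd1nullundefined"
const unchanged = "ab".concat();                       // no arguments, so "ab"

// Generic use: the this value 1 is converted to "1" first.
const fromNumbers = String.prototype.concat.call(1, 2); // "12"
```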

21.1.3.5 String.prototype.constructor

The initial value of String.prototype.constructor is the intrinsic object %String%.

21.1.3.6 String.prototype.endsWith ( searchString [ , endPosition] )

The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let isRegExp be IsRegExp(searchString).
  5. ReturnIfAbrupt(isRegExp).
  6. If isRegExp is true, throw a TypeError exception.
  7. Let searchStr be ToString(searchString).
  8. ReturnIfAbrupt(searchStr).
  9. Let len be the number of elements in S.
  10. If endPosition is undefined, let pos be len, else let pos be ToInteger(endPosition).
  11. ReturnIfAbrupt(pos).
  12. Let end be min(max(pos, 0), len).
  13. Let searchLength be the number of elements in searchStr.
  14. Let start be end - searchLength.
  15. If start is less than 0, return false.
  16. If the sequence of elements of S starting at start of length searchLength is the same as the full element sequence of searchStr, return true.
  17. Otherwise, return false.

The length property of the endsWith method is 1.

NOTE 1 Returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at endPosition - length(searchString). Otherwise returns false.

NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

NOTE 3 The endsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
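
The following informal example exercises the endPosition clamping and the RegExp check:

```javascript
const subject = "hello world";
const plain = subject.endsWith("world");        // true
const bounded = subject.endsWith("hello", 5);   // true: only the first 5 code units are considered
const tooLong = "abc".endsWith("abcd");         // false: start would be less than 0
const emptyAtZero = "abc".endsWith("", 0);      // true: an empty sequence always matches

// A RegExp as the first argument is rejected with a TypeError.
let regExpRejected = false;
try {
  subject.endsWith(/world/);
} catch (e) {
  regExpRejected = e instanceof TypeError;
}
```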

21.1.3.7 String.prototype.includes ( searchString [ , position ] )

The includes method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let isRegExp be IsRegExp(searchString).
  5. ReturnIfAbrupt(isRegExp).
  6. If isRegExp is true, throw a TypeError exception.
  7. Let searchStr be ToString(searchString).
  8. ReturnIfAbrupt(searchStr).
  9. Let pos be ToInteger(position). (If position is undefined, this step produces the value 0).
  10. ReturnIfAbrupt(pos).
  11. Let len be the number of elements in S.
  12. Let start be min(max(pos, 0), len).
  13. Let searchLen be the number of elements in searchStr.
  14. If there exists any integer k not smaller than start such that k + searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr, return true; but if there is no such integer k, return false.

The length property of the includes method is 1.

NOTE 1 If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, this method returns true; otherwise, it returns false. If position is undefined, 0 is assumed, so as to search all of the String.

NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

NOTE 3 The includes function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
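
An informal example of the position argument and the RegExp check:

```javascript
const letters = "abcdef";
const found = letters.includes("cd");          // true
const fromTwo = letters.includes("cd", 2);     // true: a match starts at index 2
const fromThree = letters.includes("cd", 3);   // false: no match at index 3 or later
const emptyPastEnd = letters.includes("", 100); // true: start is clamped to 6, where "" matches

// A RegExp as the first argument is rejected with a TypeError.
let regExpRejected = false;
try {
  letters.includes(/cd/);
} catch (e) {
  regExpRejected = e instanceof TypeError;
}
```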

21.1.3.8 String.prototype.indexOf ( searchString [ , position ] )

NOTE 1 If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, ‑1 is returned. If position is undefined, 0 is assumed, so as to search all of the String.

The indexOf method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let searchStr be ToString(searchString).
  5. ReturnIfAbrupt(searchStr).
  6. Let pos be ToInteger(position). (If position is undefined, this step produces the value 0).
  7. ReturnIfAbrupt(pos).
  8. Let len be the number of elements in S.
  9. Let start be min(max(pos, 0), len).
  10. Let searchLen be the number of elements in searchStr.
  11. Return the smallest possible integer k not smaller than start such that k + searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr; but if there is no such integer k, return the value -1.

The length property of the indexOf method is 1.

NOTE 2 The indexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
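
An informal example of how start is clamped to the range 0 to len before the search:

```javascript
const word = "banana";
const first = word.indexOf("an");            // 1
const afterTwo = word.indexOf("an", 2);      // 3
const negativeStart = word.indexOf("an", -5); // start is clamped to 0, so 1
const emptyPastEnd = word.indexOf("", 10);   // start is clamped to 6, and "" matches there, so 6
const missing = word.indexOf("x");           // -1
```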

21.1.3.9 String.prototype.lastIndexOf ( searchString [ , position ] )

NOTE 1 If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, ‑1 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.

The lastIndexOf method takes two arguments, searchString and position, and performs the following steps:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let searchStr be ToString(searchString).
  5. ReturnIfAbrupt(searchStr).
  6. Let numPos be ToNumber(position). (If position is undefined, this step produces the value NaN).
  7. ReturnIfAbrupt(numPos).
  8. If numPos is NaN, let pos be +∞; otherwise, let pos be ToInteger(numPos).
  9. Let len be the number of elements in S.
  10. Let start be min(max(pos, 0), len).
  11. Let searchLen be the number of elements in searchStr.
  12. Return the largest possible nonnegative integer k not larger than start such that k + searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr; but if there is no such integer k, return the value -1.

The length property of the lastIndexOf method is 1.

NOTE 2 The lastIndexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
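
An informal example of the backward search, including the NaN case that selects the end of the String:

```javascript
const word = "banana";
const last = word.lastIndexOf("an");            // 3
const upToTwo = word.lastIndexOf("an", 2);      // 1: matches may start no later than index 2
const upToZero = word.lastIndexOf("an", 0);     // -1: "an" does not start at index 0
const nanPosition = word.lastIndexOf("an", NaN); // NaN is treated as +∞, so 3
const negativeStart = word.lastIndexOf("b", -5); // start is clamped to 0, so 0
const emptyPastEnd = word.lastIndexOf("", 100); // start is clamped to 6, so 6
```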

21.1.3.10 String.prototype.localeCompare ( that [, reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the localeCompare method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the localeCompare method is used.

When the localeCompare method is called with argument that, it returns a Number other than NaN that represents the result of a locale-sensitive String comparison of the this value (converted to a String) with that (converted to a String). The two Strings are S and That. The two Strings are compared in an implementation-defined fashion. The result is intended to order String values in the sort order specified by a host default locale, and will be negative, zero, or positive, depending on whether S comes before That in the sort order, the Strings are equal, or S comes after That in the sort order, respectively.

Before performing the comparisons, the following steps are performed to prepare the Strings:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let That be ToString(that).
  5. ReturnIfAbrupt(That).

The meaning of the optional second and third parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.

The localeCompare method, if considered as a function of two arguments this and that, is a consistent comparison function (as defined in 22.1.3.24) on the set of all Strings.

The actual return values are implementation-defined to permit implementers to encode additional information in the value, but the function is required to define a total ordering on all Strings. This function must treat Strings that are canonically equivalent according to the Unicode standard as identical and must return 0 when comparing Strings that are considered canonically equivalent.

The length property of the localeCompare method is 1.

NOTE 1 The localeCompare method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.

NOTE 2 This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the rules of the host environment's current locale. However, regardless of the host provided comparison capabilities, this function must treat Strings that are canonically equivalent according to the Unicode standard as identical. It is recommended that this function should not honour Unicode compatibility equivalences or decompositions. For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms (http://www.unicode.org/reports/tr15/) and Unicode Technical Note #5, Canonical Equivalence in Applications (http://www.unicode.org/notes/tn5/). Also see Unicode Technical Standard #10, Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).

NOTE 3 The localeCompare function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
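
As NOTE 1 indicates, localeCompare has to be wrapped in a function of two arguments before it can be used with Array.prototype.sort. The sketch below is informal; byLocale is a hypothetical name, and the resulting order depends on the host default locale, so the sample uses only lowercase ASCII letters.

```javascript
// Hypothetical comparator: adapts localeCompare to the two-argument form sort expects.
function byLocale(x, y) {
  return String(x).localeCompare(String(y));
}

const sorted = ["b", "c", "a"].sort(byLocale);

// Only the sign of the result is meaningful; the magnitude is implementation-defined.
const sign = Math.sign("a".localeCompare("b"));
const same = "a".localeCompare("a");

console.log(sorted.join(","), sign, same);
```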

21.1.3.11 String.prototype.match ( regexp )

When the match method is called with argument regexp, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. ReturnIfAbrupt(O).
  3. If regexp is neither undefined nor null, then
    1. Let matcher be GetMethod(regexp, @@match).
    2. ReturnIfAbrupt(matcher).
    3. If matcher is not undefined, then
      1. Return Call(matcher, regexp, «O»).
  4. Let S be ToString(O).
  5. ReturnIfAbrupt(S).
  6. Let rx be RegExpCreate(regexp, undefined) (see 21.2.3.2.3).
  7. ReturnIfAbrupt(rx).
  8. Return Invoke(rx, @@match, «S»).

NOTE The match function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
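
The two paths through the match steps can be illustrated informally. In the sketch below, lengthMatcher is a hypothetical object whose @@match method is invoked directly, while a String argument has no @@match method and is therefore used to create a regular expression.

```javascript
// Hypothetical object with an @@match method; it receives the this value of the call.
const lengthMatcher = {
  [Symbol.match](subject) {
    return subject.length;
  }
};
const delegated = "abcd".match(lengthMatcher);  // 4, the result of the @@match method

// A String pattern is passed to RegExpCreate, so "\\d" behaves like /\d/.
const viaString = "a1b2".match("\\d");
const matchedText = viaString[0];               // "1"
const matchedIndex = viaString.index;           // 1

const noMatch = "abc".match("\\d");             // null
```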

21.1.3.12 String.prototype.normalize ( [ form ] )

When the normalize method is called with one argument form, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. If form is not provided or form is undefined, let form be "NFC".
  5. Let f be ToString(form).
  6. ReturnIfAbrupt(f).
  7. If f is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
  8. Let ns be the String value that is the result of normalizing S into the normalization form named by f as specified in http://www.unicode.org/reports/tr15/tr15-29.html.
  9. Return ns.

The length property of the normalize method is 0.

NOTE The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
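
An informal example of the four accepted form names and the RangeError for any other value:

```javascript
const composed = "\u00e9";     // LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT

const toNFD = composed.normalize("NFD");      // "e\u0301"
const toNFC = decomposed.normalize();         // form defaults to "NFC", so "\u00e9"
const ligatureNFC = "\ufb01".normalize("NFC");   // unchanged: canonical forms keep the ligature
const ligatureNFKC = "\ufb01".normalize("NFKC"); // "fi": compatibility forms expand it

// Form names are matched exactly, so "nfc" is rejected.
let badFormRejected = false;
try {
  composed.normalize("nfc");
} catch (e) {
  badFormRejected = e instanceof RangeError;
}
```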

21.1.3.13 String.prototype.repeat ( count )

The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let n be ToInteger(count).
  5. ReturnIfAbrupt(n).
  6. If n < 0, throw a RangeError exception.
  7. If n is +∞, throw a RangeError exception.
  8. Let T be a String value that is made from n copies of S appended together. If n is 0, T is the empty String.
  9. Return T.

NOTE 1 This method creates a String consisting of the code units of the this object (converted to String) repeated count times.

NOTE 2 The repeat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
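
An informal example of the count conversion and the two RangeError cases; throwsRangeError is a hypothetical helper defined only for the example.

```javascript
const tripled = "ab".repeat(3);      // "ababab"
const none = "ab".repeat(0);         // ""
const truncated = "ab".repeat(2.9);  // ToInteger(2.9) is 2, so "abab"

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
    return false;
  } catch (e) {
    return e instanceof RangeError;
  }
}

const rejectsNegative = throwsRangeError(() => "ab".repeat(-1));        // true
const rejectsInfinity = throwsRangeError(() => "ab".repeat(Infinity));  // true
```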

21.1.3.14 String.prototype.replace (searchValue, replaceValue )

When the replace method is called with arguments searchValue and replaceValue the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. ReturnIfAbrupt(O).
  3. If searchValue is neither undefined nor null, then
    1. Let replacer be GetMethod(searchValue, @@replace).
    2. ReturnIfAbrupt(replacer).
    3. If replacer is not undefined, then
      1. Return Call(replacer, searchValue, «O, replaceValue»).
  4. Let string be ToString(O).
  5. ReturnIfAbrupt(string).
  6. Let searchString be ToString(searchValue).
  7. ReturnIfAbrupt(searchString).
  8. Let functionalReplace be IsCallable(replaceValue).
  9. If functionalReplace is false, then
    1. Let replaceValue be ToString(replaceValue).
    2. ReturnIfAbrupt(replaceValue).
  10. Search string for the first occurrence of searchString and let pos be the index within string of the first code unit of the matched substring and let matched be searchString. If no occurrences of searchString were found, return string.
  11. If functionalReplace is true, then
    1. Let replValue be Call(replaceValue, undefined, «matched, pos, string»).
    2. Let replStr be ToString(replValue).
    3. ReturnIfAbrupt(replStr).
  12. Else,
    1. Let captures be an empty List.
    2. Let replStr be GetSubstitution(matched, string, pos, captures, replaceValue).
  13. Let tailPos be pos + the number of code units in matched.
  14. Let newString be the String formed by concatenating the first pos code units of string, replStr, and the trailing substring of string starting at index tailPos. If pos is 0, the first element of the concatenation will be the empty String.
  15. Return newString.

NOTE The replace function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
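
The following informal example shows that a String searchValue replaces only the first occurrence, that a callable replaceValue receives matched, pos, and string, and that an object with an @@replace method takes over the whole operation. The object prefixer is hypothetical.

```javascript
const firstOnly = "a-b-c".replace("-", "+");  // "a+b-c"

// Functional replace: the callback is called with (matched, pos, string).
const viaFunction = "a-b-c".replace("-", (matched, pos, whole) =>
  "[" + matched + "@" + pos + "/" + whole.length + "]");  // "a[-@1/5]b-c"

// A replaceValue that is not callable is converted with ToString.
const viaNumber = "a-b".replace("-", 1);      // "a1b"

// No occurrence: the original String is returned.
const untouched = "abc".replace("x", "y");    // "abc"

// Hypothetical object with an @@replace method; it receives (O, replaceValue).
const prefixer = {
  [Symbol.replace](subject, replacement) {
    return replacement + subject;
  }
};
const delegated = "abc".replace(prefixer, ">"); // ">abc"
```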

21.1.3.14.1 Runtime Semantics: GetSubstitution(matched, str, position, captures, replacement)

The abstract operation GetSubstitution performs the following steps:

  1. Assert: Type(matched) is String.
  2. Let matchLength be the number of code units in matched.
  3. Assert: Type(str) is String.
  4. Let stringLength be the number of code units in str.
  5. Assert: position is a nonnegative integer.
  6. Assert: position ≤ stringLength.
  7. Assert: captures is a possibly empty List of Strings.
  8. Assert: Type(replacement) is String.
  9. Let tailPos be position + matchLength.
  10. Let m be the number of elements in captures.
  11. Let result be a String value derived from replacement by copying code unit elements from replacement to result while performing replacements as specified in Table 45. These $ replacements are done left-to-right, and, once such a replacement is performed, the new replacement text is not subject to further replacements.
  12. Return result.

Table 45 — Replacement Text Symbol Substitutions

| Code units | Unicode Characters | Replacement text |
|---|---|---|
| 0x0024, 0x0024 | $$ | $ |
| 0x0024, 0x0026 | $& | matched |
| 0x0024, 0x0060 | $` | If position is 0, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index 0 and whose last code unit is at index position - 1. |
| 0x0024, 0x0027 | $' | If tailPos ≥ stringLength, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index tailPos and continues to the end of str. |
| 0x0024, N where 0x0031 ≤ N ≤ 0x0039 | $n where n is one of 1 2 3 4 5 6 7 8 9 and $n is not followed by a decimal digit | The nth element of captures, where n is a single digit in the range 1 to 9. If n ≤ m and the nth element of captures is undefined, use the empty String instead. If n > m, the result is implementation-defined. |
| 0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | $nn where n is one of 0 1 2 3 4 5 6 7 8 9 | The nnth element of captures, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth element of captures is undefined, use the empty String instead. If nn is 00 or nn > m, the result is implementation-defined. |
| 0x0024 | $ in any context that does not match any of the above. | $ |
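
The substitutions in Table 45 can be observed informally through String.prototype.replace. The capture example below assumes that replacement with a regular expression, specified outside this excerpt, also uses GetSubstitution.

```javascript
// In each String search below, matched is "b", position is 1, and str is "abc".
const wholeMatch = "abc".replace("b", "[$&]"); // "a[b]c"
const before = "abc".replace("b", "$`");       // text before the match is "a", so "aac"
const after = "abc".replace("b", "$'");        // text after the match is "c", so "acc"
const dollar = "abc".replace("b", "$$");       // "a$c"
const literal = "abc".replace("b", "$x");      // "$x" matches no pattern, so "a$xc"

// With capturing groups, $1 and $2 refer to the elements of captures.
const swapped = "2024-05".replace(/(\d+)-(\d+)/, "$2/$1");  // "05/2024"
```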

21.1.3.15 String.prototype.search ( regexp )

When the search method is called with argument regexp, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. ReturnIfAbrupt(O).
  3. If regexp is neither undefined nor null, then
    1. Let searcher be GetMethod(regexp, @@search).
    2. ReturnIfAbrupt(searcher).
    3. If searcher is not undefined, then
      1. Return Call(searcher, regexp, «O»).
  4. Let string be ToString(O).
  5. ReturnIfAbrupt(string).
  6. Let rx be RegExpCreate(regexp, undefined) (see 21.2.3.2.3).
  7. ReturnIfAbrupt(rx).
  8. Return Invoke(rx, @@search, «string»).

NOTE The search function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
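
An informal example of the search steps. Because a String argument is passed to RegExpCreate, pattern characters in it keep their regular expression meaning. The object firstB is hypothetical.

```javascript
const digitAt = "abc123".search(/\d/);  // 3
const missing = "abc".search("z");      // -1

// "." is treated as a pattern that matches any character, not as a literal dot.
const dotPattern = "a.b".search(".");   // 0

// Hypothetical object with an @@search method; it receives the this value of the call.
const firstB = {
  [Symbol.search](subject) {
    return subject.indexOf("b");
  }
};
const delegated = "abc".search(firstB); // 1
```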

21.1.3.16 String.prototype.slice ( start, end )

The slice method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength+start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength+end where sourceLength is the length of the String. The result is a String value, not a String object. The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let len be the number of elements in S.
  5. Let intStart be ToInteger(start).
  6. ReturnIfAbrupt(intStart).
  7. If end is undefined, let intEnd be len; else let intEnd be ToInteger(end).
  8. ReturnIfAbrupt(intEnd).
  9. If intStart < 0, let from be max(len + intStart,0); otherwise let from be min(intStart, len).
  10. If intEnd < 0, let to be max(len + intEnd,0); otherwise let to be min(intEnd, len).
  11. Let span be max(to - from,0).
  12. Return a String value containing span consecutive elements from S beginning with the element at index from.

The length property of the slice method is 2.

NOTE The slice function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
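
An informal example of how negative and out-of-range arguments are resolved by the steps above:

```javascript
const source = "abcdef";                  // len is 6
const middle = source.slice(1, 4);        // "bcd"
const lastTwo = source.slice(-2);         // from is max(6 - 2, 0) = 4, so "ef"
const dropLast = source.slice(2, -1);     // to is max(6 - 1, 0) = 5, so "cde"
const reversed = source.slice(4, 2);      // span is max(2 - 4, 0) = 0, so ""
const clamped = source.slice(-100, 2);    // from is max(6 - 100, 0) = 0, so "ab"
const whole = source.slice();             // start converts to 0 and end is undefined, so "abcdef"
```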

21.1.3.17 String.prototype.split ( separator, limit )

Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any substring in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.

When the split method is called, the following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. ReturnIfAbrupt(O).
  3. If separator is neither undefined nor null, then
    1. Let splitter be GetMethod(separator, @@split).
    2. ReturnIfAbrupt(splitter).
    3. If splitter is not undefined, then
      1. Return Call(splitter, separator, «O, limit»).
  4. Let S be ToString(O).
  5. ReturnIfAbrupt(S).
  6. Let A be ArrayCreate(0).
  7. Let lengthA be 0.
  8. If limit is undefined, let lim = 2^53 - 1; else let lim = ToLength(limit).
  9. ReturnIfAbrupt(lim).
  10. Let s be the number of elements in S.
  11. Let p = 0.
  12. Let R be ToString(separator).
  13. ReturnIfAbrupt(R).
  14. If lim = 0, return A.
  15. If separator is undefined, then
    1. Perform CreateDataProperty(A, "0", S).
    2. Assert: The above call will never result in an abrupt completion.
    3. Return A.
  16. If s = 0, then
    1. Let z be SplitMatch(S, 0, R).
    2. If z is not false, return A.
    3. Perform CreateDataProperty(A, "0", S).
    4. Assert: The above call will never result in an abrupt completion.
    5. Return A.
  17. Let q = p.
  18. Repeat, while q < s
    1. Let e be SplitMatch(S, q, R).
    2. If e is false, let q = q+1.
    3. Else e is an integer index into S,
      1. If e = p, let q = q+1.
      2. Else e ≠ p,
        1. Let T be a String value equal to the substring of S consisting of the code units at indices p (inclusive) through q (exclusive).
        2. Perform CreateDataProperty(A, ToString(lengthA), T).
        3. Assert: The above call will never result in an abrupt completion.
        4. Increment lengthA by 1.
        5. If lengthA = lim, return A.
        6. Let p = e.
        7. Let q = p.
  19. Let T be a String value equal to the substring of S consisting of the code units at indices p (inclusive) through s (exclusive).
  20. Perform CreateDataProperty(A, ToString(lengthA), T).
  21. Assert: The above call will never result in an abrupt completion.
  22. Return A.

The length property of the split method is 2.

NOTE 1 The value of separator may be an empty String, an empty regular expression, or a regular expression that can match an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) If separator is a regular expression, only the first match at a given index of the this String is considered, even if backtracking could yield a non-empty-substring match at that index. (For example, "ab".split(/a*?/) evaluates to the array ["a","b"], while "ab".split(/a*/) evaluates to the array ["","b"].)

If the this object is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,

      "A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/)

evaluates to the array:

      ["A", undefined, "B", "bold", "/", "B", "and", undefined,
      "CODE", "coded", "/", "CODE", ""]

If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.

NOTE 2 The split function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
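
The cases described in NOTE 1 can be checked directly in an implementation. The following is a small illustrative sketch, not part of the algorithm; the variable names are chosen here for the example only.

```javascript
// Empty-String separator: one element per code unit.
const perUnit = "abc".split("");                 // ["a", "b", "c"]

// Only the first match at a given index is considered.
const lazy = "ab".split(/a*?/);                  // ["a", "b"]
const greedy = "ab".split(/a*/);                 // ["", "b"]

// Empty input: the result depends on whether separator can match "".
const emptyBoth = "".split("");                  // []
const emptyInput = "".split(",");                // [""]

// An undefined separator yields the whole String as the only element.
const whole = "a,b,c".split(undefined);          // ["a,b,c"]

// limit truncates the result; a limit of 0 returns an empty array.
const limited = "a,b,c".split(",", 2);           // ["a", "b"]
const none = "a,b,c".split(",", 0);              // []

// Capturing parentheses are spliced into the output, including undefined.
const tags = "A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/);
// tags has 13 elements; tags[1] is undefined and tags[4] is "/".
```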

21.1.3.17.1 Runtime Semantics: SplitMatch ( S, q, R )

The abstract operation SplitMatch takes three parameters, a String S, an integer q, and a String R, and performs the following steps in order to return either false or the end index of a match:

  1. Assert: Type(R) is String.
  2. Let r be the number of code units in R.
  3. Let s be the number of code units in S.
  4. If q+r > s, return false.
  5. If there exists an integer i between 0 (inclusive) and r (exclusive) such that the code unit at index q+i of S is different from the code unit at index i of R, return false.
  6. Return q+r.
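
As an informal illustration, SplitMatch and the String-separator portion of the split algorithm (steps 6 through 22) can be sketched in ECMAScript itself. The names splitMatch and splitByString are hypothetical and exist only for this example. The sketch assumes that S and R are already Strings and that the separator was not undefined; it does not perform the coercions or the @@split dispatch of steps 1 through 5.

```javascript
// Sketch of SplitMatch(S, q, R): returns false or the end index of a match.
function splitMatch(S, q, R) {
  const r = R.length;
  const s = S.length;
  if (q + r > s) return false;
  for (let i = 0; i < r; i++) {
    if (S.charCodeAt(q + i) !== R.charCodeAt(i)) return false;
  }
  return q + r;
}

// Sketch of split steps 6-22 for a String separator R (assumed not undefined).
function splitByString(S, R, lim = 2 ** 53 - 1) {
  const A = [];
  if (lim === 0) return A;
  const s = S.length;
  if (s === 0) {
    // An empty input yields [] if R matches the empty String, else [""].
    if (splitMatch(S, 0, R) !== false) return A;
    A.push(S);
    return A;
  }
  let p = 0;
  let q = p;
  while (q < s) {
    const e = splitMatch(S, q, R);
    if (e === false || e === p) {
      q = q + 1;
    } else {
      A.push(S.substring(p, q));
      if (A.length === lim) return A;
      p = e;
      q = p;
    }
  }
  A.push(S.substring(p, s));
  return A;
}
```

For String separators the sketch is expected to agree with the built-in method, for example splitByString("a,b,,c", ",") and "a,b,,c".split(",") both produce ["a", "b", "", "c"].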

21.1.3.18 String.prototype.startsWith ( searchString [, position ] )

The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let isRegExp be IsRegExp(searchString).
  5. ReturnIfAbrupt(isRegExp).
  6. If isRegExp is true, throw a TypeError exception.
  7. Let searchStr be ToString(searchString).
  8. ReturnIfAbrupt(searchStr).
  9. Let pos be ToInteger(position). (If position is undefined, this step produces the value 0).
  10. ReturnIfAbrupt(pos).
  11. Let len be the number of elements in S.
  12. Let start be min(max(pos, 0), len).
  13. Let searchLength be the number of elements in searchStr.
  14. If searchLength+start is greater than len, return false.
  15. If the sequence of elements of S starting at start of length searchLength is the same as the full element sequence of searchStr, return true.
  16. Otherwise, return false.

The length property of the startsWith method is 1.

NOTE 1 This method returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at index position. Otherwise it returns false.

NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

NOTE 3 The startsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
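
A brief illustration of the clamping of position and of the RegExp check follows. It is a sketch for orientation only; the variable names are chosen for this example.

```javascript
const s = "abcdef";

const atStart = s.startsWith("abc");         // true
const atOffset = s.startsWith("cd", 2);      // true
const clampedLow = s.startsWith("abc", -5);  // true: start is clamped to 0
const tooLong = s.startsWith("ef", 5);       // false: searchLength+start exceeds len
const emptyAtEnd = s.startsWith("", 99);     // true: start is clamped to len

// A RegExp argument is rejected with a TypeError (step 6).
let regexpRejected = false;
try {
  s.startsWith(/abc/);
} catch (err) {
  regexpRejected = err instanceof TypeError;
}
```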

21.1.3.19 String.prototype.substring ( start, end )

The substring method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.

If either argument is NaN or negative, it is replaced with zero; if either argument is larger than the length of the String, it is replaced with the length of the String.

If start is larger than end, they are swapped.

The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let len be the number of elements in S.
  5. Let intStart be ToInteger(start).
  6. ReturnIfAbrupt(intStart).
  7. If end is undefined, let intEnd be len; else let intEnd be ToInteger(end).
  8. ReturnIfAbrupt(intEnd).
  9. Let finalStart be min(max(intStart, 0), len).
  10. Let finalEnd be min(max(intEnd, 0), len).
  11. Let from be min(finalStart, finalEnd).
  12. Let to be max(finalStart, finalEnd).
  13. Return a String whose length is to − from, containing code units from S, namely the code units with indices from through to − 1, in ascending order.

The length property of the substring method is 2.

NOTE The substring function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
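
The clamping and swapping rules can be seen in a few calls. This is an illustrative sketch; the variable names are chosen for this example.

```javascript
const text = "abcdef";

const swapped = text.substring(4, 1);     // "bcd": start > end, so they are swapped
const negative = text.substring(-3, 2);   // "ab": negative start is replaced with 0
const nanStart = text.substring(NaN, 3);  // "abc": NaN is replaced with 0
const toEnd = text.substring(2);          // "cdef": end is undefined, so len is used
const clamped = text.substring(2, 100);   // "cdef": end is clamped to len

// The result is a String value, not a String object.
const resultType = typeof text.substring(0, 1);  // "string"
```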

21.1.3.20 String.prototype.toLocaleLowerCase ( [ reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleLowerCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleLowerCase method is used.

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The length property of the toLocaleLowerCase method is 0.

The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

NOTE The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

21.1.3.21 String.prototype.toLocaleUpperCase ([ reserved1 [ , reserved2 ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleUpperCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleUpperCase method is used.

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function works exactly the same as toUpperCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The length property of the toLocaleUpperCase method is 0.

The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

NOTE The toLocaleUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

21.1.3.22 String.prototype.toLowerCase ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4. The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let cpList be a List containing in order the code points as defined in 6.1.4 of S, starting at the first element of S.
  5. For each code point c in cpList, if the Unicode Character Database provides a language insensitive lower case equivalent of c then replace c in cpList with that equivalent code point(s).
  6. Let cuList be a new List.
  7. For each code point c in cpList, in order, append to cuList the elements of the UTF16Encoding (10.1.1) of c.
  8. Let L be a String whose elements are, in order, the elements of cuList.
  9. Return L.

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasing.txt file that accompanies it).

NOTE 1 The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().

NOTE 2 The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
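
The effects described in NOTE 1 can be observed with code points that have multi-code-point mappings in the Unicode Character Database. The following sketch uses U+00DF and U+0130 as examples; the choice of these two code points is an assumption made for illustration and is not taken from this specification.

```javascript
const sharpS = "\u00DF";                 // LATIN SMALL LETTER SHARP S

// The uppercase mapping of U+00DF is two code points, so the length changes.
const upper = sharpS.toUpperCase();      // "SS"

// The functions are not symmetrical.
const roundTrip = upper.toLowerCase();   // "ss"
const direct = sharpS.toLowerCase();     // "\u00DF"

// U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
const dottedI = "\u0130".toLowerCase();  // "i\u0307"
```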

21.1.3.23 String.prototype.toString ( )

When the toString method is called, the following steps are taken:

  1. Let s be thisStringValue(this value).
  2. Return s.

NOTE For a String object, the toString method happens to return the same thing as the valueOf method.

21.1.3.24 String.prototype.toUpperCase ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

This function behaves in exactly the same way as String.prototype.toLowerCase, except that code points are mapped to their uppercase equivalents as specified in the Unicode Character Database.

NOTE The toUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

21.1.3.25 String.prototype.trim ( )

This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Let T be a String value that is a copy of S with both leading and trailing white space removed. The definition of white space is the union of WhiteSpace and LineTerminator. When determining whether a Unicode code point is in Unicode general category “Zs”, code unit sequences are interpreted as UTF-16 encoded code point sequences as specified in 6.1.4.
  5. Return T.

NOTE The trim function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
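
As an illustration, the following sketch trims a String whose ends contain a mixture of WhiteSpace and LineTerminator code points, and then applies the function to a non-String this value. The variable names are chosen for this example.

```javascript
// U+FEFF, U+00A0, TAB and SPACE are WhiteSpace; CR, LF and U+2028 are LineTerminators.
const padded = "\uFEFF\u00A0\t  two words \r\n\u2028";
const trimmed = padded.trim();           // "two words": interior space is kept

// The function is generic: the this value is converted with ToString.
const fromObject = String.prototype.trim.call({
  toString() { return "  x  "; }
});                                      // "x"
```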

21.1.3.26 String.prototype.valueOf ( )

When the valueOf method is called, the following steps are taken:

  1. Let s be thisStringValue(this value).
  2. Return s.
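
Unlike the generic functions above, toString and valueOf go through thisStringValue and therefore accept only String values and String objects. A short sketch, with variable names chosen for this example:

```javascript
const wrapped = new String("abc");
const viaToString = wrapped.toString();  // "abc"
const viaValueOf = wrapped.valueOf();    // "abc"

// A this value that is neither a String nor a String object is rejected.
let nonStringRejected = false;
try {
  String.prototype.valueOf.call(123);
} catch (err) {
  nonStringRejected = err instanceof TypeError;
}
```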

21.1.3.27 String.prototype [ @@iterator ]( )

When the @@iterator method is called it returns an Iterator object (25.1.1.2) that iterates over the code points of a String value, returning each code point as a String value. The following steps are taken:

  1. Let O be RequireObjectCoercible(this value).
  2. Let S be ToString(O).
  3. ReturnIfAbrupt(S).
  4. Return CreateStringIterator(S).

The value of the name property of this function is "[Symbol.iterator]".
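
For illustration, iterating a String that contains a surrogate pair yields one String per code point rather than one per code unit. The sketch below uses U+1D11E; the variable names are chosen for this example.

```javascript
const clef = "a\uD834\uDD1Eb";           // "a", U+1D11E, "b"

const units = clef.length;               // 4 code units
const points = [...clef];                // 3 elements: "a", "\uD834\uDD1E", "b"

const fnName = String.prototype[Symbol.iterator].name;  // "[Symbol.iterator]"
```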

21.1.4 Properties of String Instances

String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot.

String instances have a length property, and a set of enumerable properties with integer indexed names.

21.1.4.1 length

The number of elements in the String value represented by this String object.

Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
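
The properties of a String instance can be inspected with reflection. This is an illustrative sketch; the variable names are chosen for this example.

```javascript
const obj = new String("ab");

// The integer-indexed properties are enumerable; length is not.
const keys = Object.keys(obj);           // ["0", "1"]

const lengthDesc = Object.getOwnPropertyDescriptor(obj, "length");
// { value: 2, writable: false, enumerable: false, configurable: false }

// Because length is not writable, an attempt to set it does not succeed.
const changed = Reflect.set(obj, "length", 10);  // false
```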

21.1.5 String Iterator Objects

A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named constructor for String Iterator objects. Instead, String iterator objects are created by calling certain methods of String instance objects.

21.1.5.1 CreateStringIterator Abstract Operation

Several methods of String objects return Iterator objects. The abstract operation CreateStringIterator with argument string is used to create such iterator objects. It performs the following steps:

  1. Assert: Type(string) is String.
  2. Let iterator be ObjectCreate(%StringIteratorPrototype%, «[[IteratedString]], [[StringIteratorNextIndex]] »).
  3. Set iterator’s [[IteratedString]] internal slot to string.
  4. Set iterator’s [[StringIteratorNextIndex]] internal slot to 0.
  5. Return iterator.

21.1.5.2 The %StringIteratorPrototype% Object

All String Iterator Objects inherit properties from the %StringIteratorPrototype% intrinsic object. The %StringIteratorPrototype% object is an ordinary object and its [[Prototype]] internal slot is the %IteratorPrototype% intrinsic object (25.1.2). In addition, %StringIteratorPrototype% has the following properties:

21.1.5.2.1 %StringIteratorPrototype%.next ( )

  1. Let O be the this value.
  2. If Type(O) is not Object, throw a TypeError exception.
  3. If O does not have all of the internal slots of a String Iterator Instance (21.1.5.3), throw a TypeError exception.
  4. Let s be the value of the [[IteratedString]] internal slot of O.
  5. If s is undefined, return CreateIterResultObject(undefined, true).
  6. Let position be the value of the [[StringIteratorNextIndex]] internal slot of O.
  7. Let len be the number of elements in s.
  8. If position ≥ len, then
    1. Set the value of the [[IteratedString]] internal slot of O to undefined.
    2. Return CreateIterResultObject(undefined, true).
  9. Let first be the code unit value at index position in s.
  10. If first < 0xD800 or first > 0xDBFF or position+1 = len, let resultString be the string consisting of the single code unit first.
  11. Else,
    1. Let second be the code unit value at index position+1 in s.
    2. If second < 0xDC00 or second > 0xDFFF, let resultString be the string consisting of the single code unit first.
    3. Else, let resultString be the string consisting of the code unit first followed by the code unit second.
  12. Let resultSize be the number of code units in resultString.
  13. Set the value of the [[StringIteratorNextIndex]] internal slot of O to position+resultSize.
  14. Return CreateIterResultObject(resultString, false).
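
The steps above can be sketched in ECMAScript, using closure variables in place of the internal slots. The function name createStringIteratorSketch is hypothetical, and the type checks of steps 1 through 3 are omitted because the closure cannot be applied to a foreign object.

```javascript
function createStringIteratorSketch(string) {
  let iterated = string;   // stands in for [[IteratedString]]
  let nextIndex = 0;       // stands in for [[StringIteratorNextIndex]]
  return {
    next() {
      if (iterated === undefined) return { value: undefined, done: true };
      const position = nextIndex;
      const len = iterated.length;
      if (position >= len) {
        iterated = undefined;
        return { value: undefined, done: true };
      }
      const first = iterated.charCodeAt(position);
      let resultString;
      if (first < 0xD800 || first > 0xDBFF || position + 1 === len) {
        // Not a lead surrogate, or nothing follows it: one code unit.
        resultString = iterated.charAt(position);
      } else {
        const second = iterated.charCodeAt(position + 1);
        if (second < 0xDC00 || second > 0xDFFF) {
          // Unpaired lead surrogate: one code unit.
          resultString = iterated.charAt(position);
        } else {
          // Well-formed surrogate pair: two code units.
          resultString = iterated.substring(position, position + 2);
        }
      }
      nextIndex = position + resultString.length;
      return { value: resultString, done: false };
    },
    // Lets the sketch be consumed by spread and for-of.
    [Symbol.iterator]() { return this; }
  };
}
```

Under these assumptions, spreading createStringIteratorSketch(str) is expected to give the same elements as spreading str itself, including for Strings that contain unpaired surrogates.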

21.1.5.2.2 %StringIteratorPrototype% [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
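
As an illustration, the tag is visible through Object.prototype.toString, and the property can be inspected on the prototype of any String iterator. The variable names are chosen for this example.

```javascript
const iter = "abc"[Symbol.iterator]();

const tag = Object.prototype.toString.call(iter);  // "[object String Iterator]"

const proto = Object.getPrototypeOf(iter);         // %StringIteratorPrototype%
const tagDesc = Object.getOwnPropertyDescriptor(proto, Symbol.toStringTag);
// { value: "String Iterator", writable: false, enumerable: false, configurable: true }
```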

21.1.5.3 Properties of String Iterator Instances

String Iterator instances are ordinary objects that inherit properties from the %StringIteratorPrototype% intrinsic object. String Iterator instances are initially created with the internal slots listed in Table 46.

Table 46 — Internal Slots of String Iterator Instances
Internal Slot                  Description
[[IteratedString]]             The String value whose elements are being iterated.
[[StringIteratorNextIndex]]    The integer index of the next string index to be examined by this iteration.

21.2 RegExp (Regular Expression) Objects

A RegExp object contains a regular expression and the associated flags.

NOTE The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.

21.2.1 Patterns

The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.

Syntax

Pattern[U] ::
  Disjunction[?U]

Disjunction[U] ::
  Alternative[?U]
  Alternative[?U] | Disjunction[?U]

Alternative[U] ::
  [empty]
  Alternative[?U] Term[?U]

Term[U] ::
  Assertion[?U]
  Atom[?U]
  Atom[?U] Quantifier

Assertion[U] ::
  ^
  $
  \ b
  \ B
  ( ? = Disjunction[?U] )
  ( ? ! Disjunction[?U] )

Quantifier ::
  QuantifierPrefix
  QuantifierPrefix ?

QuantifierPrefix ::
  *
  +
  ?
  { DecimalDigits }
  { DecimalDigits , }
  { DecimalDigits , DecimalDigits }

Atom[U] ::
  PatternCharacter
  .
  \ AtomEscape[?U]
  CharacterClass[?U]
  ( Disjunction[?U] )
  ( ? : Disjunction[?U] )

SyntaxCharacter :: one of
  ^ $ \ . * + ? ( ) [ ] { } |

PatternCharacter ::
  SourceCharacter but not SyntaxCharacter

AtomEscape[U] ::
  DecimalEscape
  CharacterEscape[?U]
  CharacterClassEscape

CharacterEscape[U] ::
  ControlEscape
  c ControlLetter
  HexEscapeSequence
  RegExpUnicodeEscapeSequence[?U]
  IdentityEscape[?U]

ControlEscape :: one of
  f n r t v

ControlLetter :: one of
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

RegExpUnicodeEscapeSequence[U] ::
  [+U] u LeadSurrogate \u TrailSurrogate
  [+U] u LeadSurrogate
  [+U] u TrailSurrogate
  [+U] u NonSurrogate
  [~U] u Hex4Digits
  [+U] u{ HexDigits }

Each \u TrailSurrogate for which the choice of associated u LeadSurrogate is ambiguous shall be associated with the nearest possible u LeadSurrogate that would otherwise have no corresponding \u TrailSurrogate.

LeadSurrogate ::
  Hex4Digits [match only if the SV of Hex4Digits is in the inclusive range 0xD800 to 0xDBFF]

TrailSurrogate ::
  Hex4Digits [match only if the SV of Hex4Digits is in the inclusive range 0xDC00 to 0xDFFF]

NonSurrogate ::
  Hex4Digits [match only if the SV of Hex4Digits is not in the inclusive range 0xD800 to 0xDFFF]

IdentityEscape[U] ::
  [+U] SyntaxCharacter
  [+U] /
  [~U] SourceCharacter but not UnicodeIDContinue

DecimalEscape ::
  DecimalIntegerLiteral [lookahead ∉ DecimalDigit]

CharacterClassEscape :: one of
  d D s S w W

CharacterClass[U] ::
  [ [lookahead ∉ {^}] ClassRanges[?U] ]
  [ ^ ClassRanges[?U] ]

ClassRanges[U] ::
  [empty]
  NonemptyClassRanges[?U]

NonemptyClassRanges[U] ::
  ClassAtom[?U]
  ClassAtom[?U] NonemptyClassRangesNoDash[?U]
  ClassAtom[?U] - ClassAtom[?U] ClassRanges[?U]

NonemptyClassRangesNoDash[U] ::
  ClassAtom[?U]
  ClassAtomNoDash[?U] NonemptyClassRangesNoDash[?U]
  ClassAtomNoDash[?U] - ClassAtom[?U] ClassRanges[?U]

ClassAtom[U] ::
  -
  ClassAtomNoDash[?U]

ClassAtomNoDash[U] ::
  SourceCharacter but not one of \ or ] or -
  \ ClassEscape[?U]

ClassEscape[U] ::
  DecimalEscape
  b
  [+U] -
  CharacterEscape[?U]
  CharacterClassEscape

21.2.1.1 Static Semantics: Early Errors

RegExpUnicodeEscapeSequence :: u{ HexDigits }
  • It is a Syntax Error if the MV of HexDigits > 1114111.
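
The limit 1114111 is 0x10FFFF, the largest Unicode code point. The following sketch shows the rule at the boundary; the helper name compiles is hypothetical and is used only in this example.

```javascript
// Returns true if the pattern compiles, false if it raises a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const maxOk = compiles("\\u{10FFFF}", "u");   // true: MV is exactly 1114111
const tooBig = compiles("\\u{110000}", "u");  // false: MV exceeds 1114111
```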

21.2.2 Pattern Semantics

A regular expression pattern is converted into an internal procedure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The internal procedure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

A Pattern is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a "u". A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (6.1.4). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

The syntax and semantics of Pattern are defined as if the source code for the Pattern was a List of SourceCharacter values where each SourceCharacter corresponds to a Unicode code point. If a BMP pattern contains a non-BMP SourceCharacter, the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.

NOTE For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point 0x1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.

Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements are the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern, UTF16Decode (see 10.1.2) must be used in producing a List consisting of a single pattern character, the code point U+1D11E.

An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
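
The difference between the two kinds of pattern is observable when matching a non-BMP character. In the following illustrative sketch the same input is one character to a Unicode pattern and two characters to a BMP pattern; the variable names are chosen for this example.

```javascript
const gClef = "\uD834\uDD1E";             // U+1D11E as two code units

const bmpOne = /^.$/.test(gClef);         // false: a BMP pattern sees two characters
const bmpTwo = /^..$/.test(gClef);        // true
const unicodeOne = /^.$/u.test(gClef);    // true: a Unicode pattern sees one character
```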

21.2.2.1 Notation

The descriptions below use the following variables:

  • Input is a List consisting of all of the characters, in order, of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation Input[n] means the nth character of Input, where n can range between 0 (inclusive) and InputLength (exclusive).

  • InputLength is the number of characters in Input.

  • NcapturingParens is the total number of left capturing parentheses (i.e. the total number of times the Atom :: ( Disjunction ) production is expanded) in the pattern. A left capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( Disjunction ) production.

  • IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.

  • Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.

  • Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.

Furthermore, the descriptions below use the following internal data structures:

  • A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the Unicode flag. “All characters” means either all code unit values or all code point values, also depending upon the state of Unicode.

  • A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a List that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn’t been reached yet. Due to backtracking, many States may be in use at any time during the matching process.

  • A MatchResult is either a State or the special token failure that indicates that the match failed.

  • A Continuation procedure is an internal closure (i.e. an internal procedure with some arguments already bound to values) that takes one State argument and returns a MatchResult result. If an internal closure references variables which are bound in the function that creates the closure, the closure uses the values that these variables had at the time the closure was created. The Continuation attempts to match the remaining portion (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.

  • A Matcher procedure is an internal closure that takes two arguments — a State and a Continuation — and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.

  • An AssertionTester procedure is an internal closure that takes a State argument and returns a Boolean result. The assertion tester tests a specific condition (specified by the closure's already-bound arguments) against the current place in Input and returns true if the condition matched or false if not.

  • An EscapeValue is either a character or an integer. An EscapeValue is used to denote the interpretation of a DecimalEscape escape sequence: a character ch means that the escape sequence is interpreted as the character ch, while an integer n means that the escape sequence is interpreted as a backreference to the nth set of capturing parentheses.

21.2.2.2 Pattern

The production Pattern :: Disjunction evaluates as follows:

  1. Evaluate Disjunction to obtain a Matcher m.
  2. Return an internal closure that takes two arguments, a String str and an integer index, and performs the following steps:
    1. If Unicode is true, let Input be a List consisting of the sequence of code points of str interpreted as a UTF-16 encoded (6.1.4) Unicode string. Otherwise, let Input be a List consisting of the sequence of code units that are the elements of str. Input will be used throughout the algorithms in 21.2.2. Each element of Input is considered to be a character.
    2. Let listIndex be the index into Input of the character that was obtained from element index of str.
    3. Let InputLength be the number of characters contained in Input. This variable will be used throughout the algorithms in 21.2.2.
    4. Let c be a Continuation that always returns its State argument as a successful MatchResult.
    5. Let cap be a List of NcapturingParens undefined values, indexed 1 through NcapturingParens.
    6. Let x be the State (listIndex, cap).
    7. Call m(x, c) and return its result.

NOTE A Pattern evaluates (“compiles”) to an internal procedure value. RegExp.prototype.exec and other methods can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 21.2.2 are designed so that compiling a pattern may throw a SyntaxError exception; on the other hand, once the pattern is successfully compiled, applying the resulting internal procedure to find a match in a String cannot throw an exception (except for any host-defined exceptions that can occur anywhere such as out-of-memory).

21.2.2.3 Disjunction

The production Disjunction :: Alternative evaluates by evaluating Alternative to obtain a Matcher and returning that Matcher.

The production Disjunction :: Alternative | Disjunction evaluates as follows:

  1. Evaluate Alternative to obtain a Matcher m1.
  2. Evaluate Disjunction to obtain a Matcher m2.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Call m1(x, c) and let r be its result.
    2. If r is not failure, return r.
    3. Call m2(x, c) and return its result.

NOTE The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,

/a|ab/.exec("abc")

returns the result "a" and not "ab". Moreover,

/((a)|(ab))((c)|(bc))/.exec("abc")

returns the array

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]

21.2.2.4 Alternative

The production Alternative :: [empty] evaluates by returning a Matcher that takes two arguments, a State x and a Continuation c, and returns the result of calling c(x).

The production Alternative :: Alternative Term evaluates as follows:

  1. Evaluate Alternative to obtain a Matcher m1.
  2. Evaluate Term to obtain a Matcher m2.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Create a Continuation d that takes a State argument y and returns the result of calling m2(y, c).
    2. Call m1(x, d) and return its result.

NOTE Consecutive Terms try to simultaneously match consecutive portions of Input. If the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative.
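
The Matcher and Continuation closures of 21.2.2.2 through 21.2.2.4 can be modelled informally in ECMAScript. The sketch below is a simplification under stated assumptions: only BMP patterns are handled, captures are carried but never filled in, and the helper char stands in for a single-character Atom. All names (failure, char, empty, seq, alt, pattern) are hypothetical and chosen for this example.

```javascript
const failure = Symbol("failure");  // the special token failure
let Input = [];                     // set by the closure that pattern() returns

// Stand-in for an Atom that matches one literal character.
const char = (ch) => (x, c) =>
  x.endIndex < Input.length && Input[x.endIndex] === ch
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : failure;

// Alternative :: [empty]
const empty = (x, c) => c(x);

// Alternative :: Alternative Term
const seq = (m1, m2) => (x, c) => {
  const d = (y) => m2(y, c);        // Continuation d
  return m1(x, d);
};

// Disjunction :: Alternative | Disjunction
const alt = (m1, m2) => (x, c) => {
  const r = m1(x, c);
  if (r !== failure) return r;
  return m2(x, c);
};

// Pattern :: Disjunction (BMP case: Input is the list of code units)
const pattern = (m) => (str, index) => {
  Input = str.split("");
  const c = (x) => x;               // always succeeds with its State argument
  return m({ endIndex: index, captures: [] }, c);
};

const ab = seq(char("a"), char("b"));
const bc = seq(char("b"), char("c"));

// Models /a|ab/ applied to "abc": the left Alternative wins.
const first = pattern(alt(char("a"), ab))("abc", 0);                  // endIndex 1

// Models /(?:a|ab)(?:c|bc)/: "a" then "bc" is found before "ab" then "c".
const both = pattern(seq(alt(char("a"), ab), alt(char("c"), bc)))("abc", 0);  // endIndex 3

// Models /(?:a|ab)c/: the sequel fails after "a", so "ab" is tried.
const backtracked = pattern(seq(alt(char("a"), ab), char("c")))("abc", 0);    // endIndex 3

const viaEmpty = pattern(alt(char("x"), empty))("abc", 0);            // endIndex 0
const noMatch = pattern(ab)("xab", 0);                                // failure
```

Passing the rest of the pattern as a Continuation is what lets a Matcher resume at an earlier choice point when the sequel fails, as the third example shows.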

21.2.2.5 Term

The production Term :: Assertion evaluates by returning an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:

  1. Evaluate Assertion to obtain an AssertionTester t.
  2. Call t(x) and let r be the resulting Boolean value.
  3. If r is false, return failure.
  4. Call c(x) and return its result.

The production Term :: Atom evaluates as follows:

  1. Return the Matcher that is the result of evaluating Atom.

The production Term :: Atom Quantifier evaluates as follows:

  1. Evaluate Atom to obtain a Matcher m.
  2. Evaluate Quantifier to obtain the three results: an integer min, an integer (or ∞) max, and Boolean greedy.
  3. If max is finite and less than min, throw a SyntaxError exception.
  4. Let parenIndex be the number of left capturing parentheses in the entire regular expression that occur to the left of this production expansion's Term. This is the total number of times the Atom :: ( Disjunction ) production is expanded prior to this production's Term plus the total number of Atom :: ( Disjunction ) productions enclosing this Term.
  5. Let parenCount be the number of left capturing parentheses in the expansion of this production's Atom. This is the total number of Atom :: ( Disjunction ) productions enclosed by this production's Atom.
  6. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Call RepeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) and return its result.

21.2.2.5.1 Runtime Semantics: RepeatMatcher Abstract Operation

The abstract operation RepeatMatcher takes eight parameters, a Matcher m, an integer min, an integer (or ∞) max, a Boolean greedy, a State x, a Continuation c, an integer parenIndex, and an integer parenCount, and performs the following steps:

  1. If max is zero, return c(x).
  2. Create an internal Continuation closure d that takes one State argument y and performs the following steps when evaluated:
    1. If min is zero and y's endIndex is equal to x's endIndex, return failure.
    2. If min is zero, let min2 be zero; otherwise let min2 be min–1.
    3. If max is ∞, let max2 be ∞; otherwise let max2 be max–1.
    4. Call RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount) and return its result.
  3. Let cap be a fresh copy of x's captures List.
  4. For every integer k that satisfies parenIndex < k and k ≤ parenIndex+parenCount, set cap[k] to undefined.
  5. Let e be x's endIndex.
  6. Let xr be the State (e, cap).
  7. If min is not zero, return m(xr, d).
  8. If greedy is false, then
    1. Call c(x) and let z be its result.
    2. If z is not failure, return z.
    3. Call m(xr, d) and return its result.
  9. Call m(xr, d) and let z be its result.
  10. If z is not failure, return z.
  11. Call c(x) and return its result.
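The steps above can be transcribed almost line for line. The following sketch is informal: the State shape { endIndex, captures }, the FAILURE sentinel, and the lower Matcher used in the usage lines are assumptions for illustration, not definitions from this specification.

```javascript
const FAILURE = null; // assumed sentinel standing in for "failure"

// Sketch of RepeatMatcher; m is a Matcher taking (State, Continuation).
function repeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) {
  if (max === 0) return c(x);
  const d = (y) => {
    // Once min is satisfied, an iteration that consumed nothing is rejected.
    if (min === 0 && y.endIndex === x.endIndex) return FAILURE;
    const min2 = min === 0 ? 0 : min - 1;
    const max2 = max === Infinity ? Infinity : max - 1;
    return repeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount);
  };
  // Clear the captures that belong to the quantified Atom.
  const cap = x.captures.slice();
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) cap[k] = undefined;
  const xr = { endIndex: x.endIndex, captures: cap };
  if (min !== 0) return m(xr, d);
  if (!greedy) {
    const z = c(x);
    if (z !== FAILURE) return z;
    return m(xr, d);
  }
  const z = m(xr, d);
  if (z !== FAILURE) return z;
  return c(x);
}

// Usage: model [a-z]{2,4} and [a-z]{2,4}? starting at index 1 of "abcdefghi".
const input = "abcdefghi";
const lower = (x, c) =>
  x.endIndex < input.length && /[a-z]/.test(input[x.endIndex])
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;
const start = { endIndex: 1, captures: [] };
const accept = (y) => y;
const greedyEnd = repeatMatcher(lower, 2, 4, true, start, accept, 0, 0).endIndex;
const lazyEnd = repeatMatcher(lower, 2, 4, false, start, accept, 0, 0).endIndex;
console.log(greedyEnd, lazyEnd); // 5 3
```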

NOTE 1 An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.

NOTE 2 If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n–1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n–1)st repetition of Atom and so on.

Compare

/a[a-z]{2,4}/.exec("abcdefghi")

which returns "abcde" with

/a[a-z]{2,4}?/.exec("abcdefghi")

which returns "abc".

Consider also

/(aa|aabaac|ba|b|c)*/.exec("aabaac")

which, by the choice point ordering above, returns the array

["aaba", "ba"]

and not any of:

["aabaac", "aabaac"]
["aabaac", "c"]

The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:

"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/,"$1")

which returns the gcd in unary notation "aaaaa".
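The same pattern can be wrapped in a small function to experiment with other inputs. This is a sketch only; gcdUnary is a hypothetical helper name, and the approach relies on backtracking, so it is practical only for small positive integers.

```javascript
// Hypothetical helper: gcd of two positive integers via the pattern above.
function gcdUnary(a, b) {
  const unary = "a".repeat(a) + "," + "a".repeat(b);
  // The greedy (a+) shrinks until the same run divides both sides exactly.
  return unary.replace(/^(a+)\1*,\1+$/, "$1").length;
}

console.log(gcdUnary(10, 15)); // 5
console.log(gcdUnary(12, 18)); // 6
console.log(gcdUnary(7, 3));   // 1
```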

NOTE 3 Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression

/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")

which returns the array

["zaacbbbcac", "z", "ac", "a", undefined, "c"]

and not

["zaacbbbcac", "z", "ac", "a", "bbb", "c"]

because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.

NOTE 4 Step 1 of the RepeatMatcher's d closure states that, once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:

/(a*)*/.exec("b")

or the slightly more complicated:

/(a*)b\1+/.exec("baaaac")

which returns the array

["b", ""]

21.2.2.6 Assertion

The production Assertion :: ^ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. If e is zero, return true.
  3. If Multiline is false, return false.
  4. If the character Input[e–1] is one of LineTerminator, return true.
  5. Return false.

NOTE Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.

The production Assertion :: $ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. If e is equal to InputLength, return true.
  3. If Multiline is false, return false.
  4. If the character Input[e] is one of LineTerminator, return true.
  5. Return false.
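The effect of the Multiline flag on ^ and $, and the interaction of ^ with the y flag described in the NOTE above, can be observed as follows. The sample string and variable names are illustrative.

```javascript
const text = "one\ntwo";

console.log(/^two/.test(text));  // false: "two" does not start at index 0
console.log(/^two/m.test(text)); // true: Input[3] is a LineTerminator
console.log(/one$/.test(text));  // false: index 3 is not InputLength
console.log(/one$/m.test(text)); // true: Input[3] is a LineTerminator

// With the y flag matching starts at lastIndex, but ^ still tests e against zero.
const sticky = /^two/y;
sticky.lastIndex = 4;
const stickyResult = sticky.test(text);
console.log(stickyResult); // false

const stickyMultiline = /^two/my;
stickyMultiline.lastIndex = 4;
const stickyMultilineResult = stickyMultiline.test(text);
console.log(stickyMultilineResult); // true
```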

The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. Call IsWordChar(e–1) and let a be the Boolean result.
  3. Call IsWordChar(e) and let b be the Boolean result.
  4. If a is true and b is false, return true.
  5. If a is false and b is true, return true.
  6. Return false.

The production Assertion :: \ B evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. Call IsWordChar(e–1) and let a be the Boolean result.
  3. Call IsWordChar(e) and let b be the Boolean result.
  4. If a is true and b is false, return false.
  5. If a is false and b is true, return false.
  6. Return true.

The production Assertion :: ( ? = Disjunction ) evaluates as follows:

  1. Evaluate Disjunction to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is failure, return failure.
    4. Let y be r's State.
    5. Let cap be y's captures List.
    6. Let xe be x's endIndex.
    7. Let z be the State (xe, cap).
    8. Call c(z) and return its result.

The production Assertion :: ( ? ! Disjunction ) evaluates as follows:

  1. Evaluate Disjunction to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is not failure, return failure.
    4. Call c(x) and return its result.

21.2.2.6.1 Runtime Semantics: IsWordChar Abstract Operation

The abstract operation IsWordChar takes an integer parameter e and performs the following steps:

  1. If e is –1 or e is InputLength, return false.
  2. Let c be the character Input[e].
  3. If c is one of the sixty-three characters below, return true.
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9 _
  4. Return false.
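IsWordChar and the \b AssertionTester built on it can be sketched as below. The function names and the explicit input parameter are assumptions for illustration; in the specification Input is ambient state of the matcher.

```javascript
// Sketch of IsWordChar over an explicit input string.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return /^[A-Za-z0-9_]$/.test(input[e]);
}

// \b succeeds exactly when the two neighbouring positions disagree.
function isWordBoundary(input, e) {
  return isWordChar(input, e - 1) !== isWordChar(input, e);
}

const sample = "hi, \u00E9"; // U+00E9 is not one of the sixty-three characters
const boundaries = [];
for (let e = 0; e <= sample.length; e++) {
  if (isWordBoundary(sample, e)) boundaries.push(e);
}
console.log(boundaries); // [0, 2]
```

The \B tester is the negation of isWordBoundary.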

21.2.2.7 Quantifier

The production Quantifier :: QuantifierPrefix evaluates as follows:

  1. Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or ∞) max.
  2. Return the three results min, max, and true.

The production Quantifier :: QuantifierPrefix ? evaluates as follows:

  1. Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or ∞) max.
  2. Return the three results min, max, and false.

The production QuantifierPrefix :: * evaluates as follows:

  1. Return the two results 0 and ∞.

The production QuantifierPrefix :: + evaluates as follows:

  1. Return the two results 1 and ∞.

The production QuantifierPrefix :: ? evaluates as follows:

  1. Return the two results 0 and 1.

The production QuantifierPrefix :: { DecimalDigits } evaluates as follows:

  1. Let i be the MV of DecimalDigits (see 11.8.3).
  2. Return the two results i and i.

The production QuantifierPrefix :: { DecimalDigits , } evaluates as follows:

  1. Let i be the MV of DecimalDigits.
  2. Return the two results i and ∞.

The production QuantifierPrefix :: { DecimalDigits , DecimalDigits } evaluates as follows:

  1. Let i be the MV of the first DecimalDigits.
  2. Let j be the MV of the second DecimalDigits.
  3. Return the two results i and j.
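The QuantifierPrefix productions map each syntactic form to a (min, max) pair. The following sketch makes the mapping explicit; quantifierBounds is a hypothetical helper that works on the prefix text, whereas the specification works on the parse tree.

```javascript
// Hypothetical helper mirroring the QuantifierPrefix productions: returns [min, max].
function quantifierBounds(prefix) {
  if (prefix === "*") return [0, Infinity];
  if (prefix === "+") return [1, Infinity];
  if (prefix === "?") return [0, 1];
  const m = /^\{(\d+)(,(\d*))?\}$/.exec(prefix);
  if (m === null) throw new SyntaxError("not a QuantifierPrefix: " + prefix);
  const i = Number(m[1]);
  if (m[2] === undefined) return [i, i];  // { DecimalDigits }
  if (m[3] === "") return [i, Infinity];  // { DecimalDigits , }
  return [i, Number(m[3])];               // { DecimalDigits , DecimalDigits }
}

console.log(quantifierBounds("{2,4}")); // [2, 4]
console.log(quantifierBounds("{3,}"));  // [3, Infinity]

// A finite max below min is rejected (step 3 of Term :: Atom Quantifier).
let outOfOrder = false;
try {
  new RegExp("a{3,2}");
} catch (err) {
  outOfOrder = err instanceof SyntaxError;
}
console.log(outOfOrder); // true
```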

21.2.2.8 Atom

The production Atom :: PatternCharacter evaluates as follows:

  1. Let ch be the character matched by PatternCharacter.
  2. Let A be a one-element CharSet containing the character ch.
  3. Call CharacterSetMatcher(A, false) and return its Matcher result.

The production Atom :: . evaluates as follows:

  1. Let A be the set of all characters except LineTerminator.
  2. Call CharacterSetMatcher(A, false) and return its Matcher result.

The production Atom :: \ AtomEscape evaluates as follows:

  1. Return the Matcher that is the result of evaluating AtomEscape.

The production Atom :: CharacterClass evaluates as follows:

  1. Evaluate CharacterClass to obtain a CharSet A and a Boolean invert.
  2. Call CharacterSetMatcher(A, invert) and return its Matcher result.

The production Atom :: ( Disjunction ) evaluates as follows:

  1. Evaluate Disjunction to obtain a Matcher m.
  2. Let parenIndex be the number of left capturing parentheses in the entire regular expression that occur to the left of this production expansion's initial left parenthesis. This is the total number of times the Atom :: ( Disjunction ) production is expanded prior to this production's Atom plus the total number of Atom :: ( Disjunction ) productions enclosing this Atom.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Create an internal Continuation closure d that takes one State argument y and performs the following steps:
      1. Let cap be a fresh copy of y's captures List.
      2. Let xe be x's endIndex.
      3. Let ye be y's endIndex.
      4. Let s be a fresh List whose characters are the characters of Input at indices xe (inclusive) through ye (exclusive).
      5. Set cap[parenIndex+1] to s.
      6. Let z be the State (ye, cap).
      7. Call c(z) and return its result.
    2. Call m(x, d) and return its result.

The production Atom :: ( ? : Disjunction ) evaluates as follows:

  1. Return the Matcher that is the result of evaluating Disjunction.

21.2.2.8.1 Runtime Semantics: CharacterSetMatcher Abstract Operation

The abstract operation CharacterSetMatcher takes two arguments, a CharSet A and a Boolean flag invert, and performs the following steps:

  1. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Let e be x's endIndex.
    2. If e is InputLength, return failure.
    3. Let ch be the character Input[e].
    4. Let cc be Canonicalize(ch).
    5. If invert is false, then
      1. If there does not exist a member a of set A such that Canonicalize(a) is cc, return failure.
    6. Else invert is true,
      1. If there exists a member a of set A such that Canonicalize(a) is cc, return failure.
    7. Let cap be x's captures List.
    8. Let y be the State (e+1, cap).
    9. Call c(y) and return its result.

21.2.2.8.2 Runtime Semantics: Canonicalize ( ch )

The abstract operation Canonicalize takes a character parameter ch and performs the following steps:

  1. If IgnoreCase is false, return ch.
  2. If Unicode is true,
    1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
    2. Else, return ch.
  3. Else,
    1. Assert: ch is a UTF-16 code unit.
    2. Let s be the ECMAScript String value consisting of the single code unit ch.
    3. Let u be the same result produced as if by performing the algorithm for String.prototype.toUpperCase using s as the this value.
    4. Assert: u is a String value.
    5. If u does not consist of a single code unit, return ch.
    6. Let cu be u's single code unit element.
    7. If ch's code unit value ≥ 128 and cu's code unit value < 128, return ch.
    8. Return cu.
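For the case where Unicode is false, the steps above can be sketched with String.prototype.toUpperCase. The explicit ignoreCase parameter is an assumption standing in for the IgnoreCase flag of the pattern.

```javascript
// Sketch of Canonicalize when Unicode is false; ch is a one-code-unit String.
function canonicalize(ch, ignoreCase) {
  if (!ignoreCase) return ch;
  const u = ch.toUpperCase();
  if (u.length !== 1) return ch; // e.g. U+00DF upper-cases to "SS"
  // A non-ASCII code unit is never mapped into the ASCII range.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

console.log(canonicalize("a", true));                   // "A"
console.log(canonicalize("a", false));                  // "a"
console.log(canonicalize("\u00DF", true) === "\u00DF"); // true
console.log(canonicalize("\u017F", true) === "\u017F"); // true: "S" is rejected
```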

NOTE 1 Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a nonzero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching internal procedure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.

NOTE 2 The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.

For example,

/(?=(a+))/.exec("baaabac")

matches the empty String immediately after the first b and therefore returns the array:

["", "aaa"]

To illustrate the lack of backtracking into the lookahead, consider:

/(?=(a+))a*b\1/.exec("baaabac")

This expression returns

["aba", "a"]

and not:

["aaaba", "a"]

NOTE 3 The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,

/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")

looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:

["baaabaac", "ba", undefined, "abaac"]

NOTE 4 In case-insignificant matches when Unicode is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, "ß" (U+00DF) to "SS". It may however map a code point outside the Basic Latin range to a character within, for example, "ſ" (U+017F) to "s". Such characters are not mapped if Unicode is false. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as /[a-z]/i, but they will match /[a-z]/ui.
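The difference described in NOTE 4 can be observed directly. The variable names are illustrative.

```javascript
const longS = "\u017F";  // LATIN SMALL LETTER LONG S
const kelvin = "\u212A"; // KELVIN SIGN

// Without the u flag neither code point is mapped into the Basic Latin range.
console.log(/[a-z]/i.test(longS), /[a-z]/i.test(kelvin));   // false false

// With the u flag simple case folding maps them to "s" and "k".
console.log(/[a-z]/iu.test(longS), /[a-z]/iu.test(kelvin)); // true true

// Simple folding never produces two code points, so U+00DF does not match "SS".
console.log(/\u00DF/iu.test("SS")); // false
```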

21.2.2.9 AtomEscape

The production AtomEscape :: DecimalEscape evaluates as follows:

  1. Evaluate DecimalEscape to obtain an EscapeValue E.
  2. If E is a character, then
    1. Let ch be E's character.
    2. Let A be a one-element CharSet containing the character ch.
    3. Call CharacterSetMatcher(A, false) and return its Matcher result.
  3. Assert: E must be an integer.
  4. Let n be that integer.
  5. If n=0 or n>NcapturingParens, throw a SyntaxError exception.
  6. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let cap be x's captures List.
    2. Let s be cap[n].
    3. If s is undefined, return c(x).
    4. Let e be x's endIndex.
    5. Let len be s's length.
    6. Let f be e+len.
    7. If f>InputLength, return failure.
    8. If there exists an integer i between 0 (inclusive) and len (exclusive) such that Canonicalize(s[i]) is not the same character value as Canonicalize(Input[e+i]), return failure.
    9. Let y be the State (f, cap).
    10. Call c(y) and return its result.

The production AtomEscape :: CharacterEscape evaluates as follows:

  1. Evaluate CharacterEscape to obtain a character ch.
  2. Let A be a one-element CharSet containing the character ch.
  3. Call CharacterSetMatcher(A, false) and return its Matcher result.

The production AtomEscape :: CharacterClassEscape evaluates as follows:

  1. Evaluate CharacterClassEscape to obtain a CharSet A.
  2. Call CharacterSetMatcher(A, false) and return its Matcher result.

NOTE An escape sequence of the form \ followed by a nonzero decimal number n matches the result of the nth set of capturing parentheses (see 21.2.2.11). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.
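The backreference behaviour described above can be illustrated with three short cases. The u flag in the last case is a deliberate choice for this sketch: some implementations otherwise accept an out-of-range \2 as a legacy escape rather than reporting the error.

```javascript
// Group 1 does not participate, so \1 is undefined and the backreference succeeds.
const unset = /(a)?\1b/.exec("b");
console.log(Array.from(unset)); // ["b", undefined]

// With the i flag both sides of the comparison go through Canonicalize.
const folded = /(a)\1/i.exec("aA");
console.log(folded[0]); // "aA"

// A reference to a group that does not exist is an error.
let threw = false;
try {
  new RegExp("(a)\\2", "u");
} catch (err) {
  threw = err instanceof SyntaxError;
}
console.log(threw); // true
```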

21.2.2.10 CharacterEscape

The production CharacterEscape :: ControlEscape evaluates by returning the character according to Table 47.

Table 47 — ControlEscape Character Values
ControlEscape | Character Value | Code Point | Unicode Name         | Symbol
t             | 9               | U+0009     | CHARACTER TABULATION | <HT>
n             | 10              | U+000A     | LINE FEED (LF)       | <LF>
v             | 11              | U+000B     | LINE TABULATION      | <VT>
f             | 12              | U+000C     | FORM FEED (FF)       | <FF>
r             | 13              | U+000D     | CARRIAGE RETURN (CR) | <CR>

The production CharacterEscape :: c ControlLetter evaluates as follows:

  1. Let ch be the character matched by ControlLetter.
  2. Let i be ch's character value.
  3. Let j be the remainder of dividing i by 32.
  4. Return the character whose character value is j.
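The remainder computation means that upper-case and lower-case control letters denote the same character. A minimal sketch, in which controlEscape is a hypothetical helper name:

```javascript
// Hypothetical helper: the character denoted by \c followed by `letter`.
function controlEscape(letter) {
  return String.fromCharCode(letter.charCodeAt(0) % 32);
}

console.log(controlEscape("J") === "\n"); // true: 74 modulo 32 is 10
console.log(controlEscape("j") === "\n"); // true: 106 modulo 32 is 10
console.log(controlEscape("I") === "\t"); // true: 73 modulo 32 is 9
console.log(/\cJ/.test("\n"));            // true
```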

The production CharacterEscape :: HexEscapeSequence evaluates as follows:

  1. Return the character whose code is the SV of HexEscapeSequence.

The production CharacterEscape :: RegExpUnicodeEscapeSequence evaluates as follows:

  1. Return the result of evaluating RegExpUnicodeEscapeSequence.

The production CharacterEscape :: IdentityEscape evaluates as follows:

  1. Return the character matched by IdentityEscape.

The production RegExpUnicodeEscapeSequence :: u LeadSurrogate \u TrailSurrogate evaluates as follows:

  1. Let lead be the result of evaluating LeadSurrogate.
  2. Let trail be the result of evaluating TrailSurrogate.
  3. Let cp be UTF16Decode(lead, trail).
  4. Return the character whose character value is cp.
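The UTF16Decode operation used in step 3 combines a lead and a trail surrogate into one code point. The arithmetic can be sketched as follows; utf16Decode is written out here for illustration and takes numeric code unit values.

```javascript
// Combine a lead surrogate (0xD800..0xDBFF) and a trail surrogate (0xDC00..0xDFFF).
function utf16Decode(lead, trail) {
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

const cp = utf16Decode(0xD83D, 0xDE00);
console.log(cp.toString(16)); // "1f600"

// With the u flag the escaped pair denotes that single code point.
const ch = String.fromCodePoint(cp);
console.log(/^\uD83D\uDE00$/u.test(ch)); // true
console.log(/^\u{1F600}$/u.test(ch));    // true
```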

The production RegExpUnicodeEscapeSequence :: u LeadSurrogate evaluates as follows:

  1. Return the character whose code is the result of evaluating LeadSurrogate.

The production RegExpUnicodeEscapeSequence :: u TrailSurrogate evaluates as follows:

  1. Return the character whose code is the result of evaluating TrailSurrogate.

The production RegExpUnicodeEscapeSequence :: u NonSurrogate evaluates as follows:

  1. Return the character whose code is the result of evaluating NonSurrogate.

The production RegExpUnicodeEscapeSequence :: u Hex4Digits evaluates as follows:

  1. Return the character whose code is the SV of Hex4Digits.

The production RegExpUnicodeEscapeSequence :: u{ HexDigits } evaluates as follows:

  1. Return the character whose code is the MV of HexDigits.

The production LeadSurrogate :: Hex4Digits evaluates as follows:

  1. Return the character whose code is the SV of Hex4Digits.

The production TrailSurrogate :: Hex4Digits evaluates as follows:

  1. Return the character whose code is the SV of Hex4Digits.

The production NonSurrogate :: Hex4Digits evaluates as follows:

  1. Return the character whose code is the SV of Hex4Digits.

21.2.2.11 DecimalEscape

The production DecimalEscape :: DecimalIntegerLiteral evaluates as follows:

  1. Let i be the MV of DecimalIntegerLiteral.
  2. If i is zero, return the EscapeValue consisting of the character U+0000 (NULL).
  3. Return the EscapeValue consisting of the integer i.

The definition of “the MV of DecimalIntegerLiteral” is in 11.8.3.

NOTE If \ is followed by a decimal number n whose first digit is not 0, then the escape sequence is considered to be a backreference. It is an error if n is greater than the total number of left capturing parentheses in the entire regular expression. \0 represents the <NUL> character and cannot be followed by a decimal digit.

21.2.2.12 CharacterClassEscape

The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the characters 0 through 9 inclusive.

The production CharacterClassEscape :: D evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: d .

The production CharacterClassEscape :: s evaluates by returning the set of characters containing the characters that are on the right-hand side of the WhiteSpace (11.2) or LineTerminator (11.3) productions.

The production CharacterClassEscape :: S evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: s .

The production CharacterClassEscape :: w evaluates by returning the set of characters containing the sixty-three characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 _

The production CharacterClassEscape :: W evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: w .
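The sets defined above are narrower than the corresponding Unicode categories for \d and \w, and wider than ASCII for \s. A few illustrative checks:

```javascript
// \d is exactly 0 through 9; other decimal digits are excluded.
console.log(/\d/.test("7"), /\d/.test("\u0663"));      // true false

// \s covers WhiteSpace and LineTerminator, including non-ASCII members.
console.log(/\s/.test("\u00A0"), /\s/.test("\u2028")); // true true

// \w is the sixty-three listed characters only.
console.log(/\w/.test("_"), /\w/.test("\u00E9"));      // true false
console.log(/^\W$/.test("\u00E9"));                    // true
```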

21.2.2.13 CharacterClass

The production CharacterClass :: [ ClassRanges ] evaluates by evaluating ClassRanges to obtain a CharSet and returning that CharSet and the Boolean false.

The production CharacterClass :: [ ^ ClassRanges ] evaluates by evaluating ClassRanges to obtain a CharSet and returning that CharSet and the Boolean true.

21.2.2.14 ClassRanges

The production ClassRanges :: [empty] evaluates by returning the empty CharSet.

The production ClassRanges :: NonemptyClassRanges evaluates by evaluating NonemptyClassRanges to obtain a CharSet and returning that CharSet.

21.2.2.15 NonemptyClassRanges

The production NonemptyClassRanges :: ClassAtom evaluates as follows:

  1. Return the CharSet that is the result of evaluating ClassAtom.

The production NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash evaluates as follows:

  1. Evaluate ClassAtom to obtain a CharSet A.
  2. Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
  3. Return the union of CharSets A and B.

The production NonemptyClassRanges :: ClassAtom - ClassAtom ClassRanges evaluates as follows:

  1. Evaluate the first ClassAtom to obtain a CharSet A.
  2. Evaluate the second ClassAtom to obtain a CharSet B.
  3. Evaluate ClassRanges to obtain a CharSet C.
  4. Call CharacterRange(A, B) and let D be the resulting CharSet.
  5. Return the union of CharSets D and C.

21.2.2.15.1 Runtime Semantics: CharacterRange Abstract Operation

The abstract operation CharacterRange takes two CharSet parameters A and B and performs the following steps:

  1. If A does not contain exactly one character or B does not contain exactly one character, throw a SyntaxError exception.
  2. Let a be the one character in CharSet A.
  3. Let b be the one character in CharSet B.
  4. Let i be the character value of character a.
  5. Let j be the character value of character b.
  6. If i > j, throw a SyntaxError exception.
  7. Return the set containing all characters numbered i through j, inclusive.
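CharacterRange can be sketched with CharSets modelled as Set objects of one-code-unit Strings. That representation is an assumption made for this example; the specification treats CharSet as a mathematical set.

```javascript
// Sketch of CharacterRange over Sets of single-code-unit Strings.
function characterRange(A, B) {
  if (A.size !== 1 || B.size !== 1) {
    throw new SyntaxError("range endpoints must be single characters");
  }
  const [a] = A;
  const [b] = B;
  const i = a.charCodeAt(0);
  const j = b.charCodeAt(0);
  if (i > j) throw new SyntaxError("range out of order");
  const result = new Set();
  for (let k = i; k <= j; k++) result.add(String.fromCharCode(k));
  return result;
}

const range = characterRange(new Set(["a"]), new Set(["e"]));
console.log([...range].join("")); // "abcde"
```

The out-of-order case corresponds to the SyntaxError reported for a pattern such as [z-a].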

21.2.2.16 NonemptyClassRangesNoDash

The production NonemptyClassRangesNoDash :: ClassAtom evaluates as follows:

  1. Return the CharSet that is the result of evaluating ClassAtom.

The production NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash evaluates as follows:

  1. Evaluate ClassAtomNoDash to obtain a CharSet A.
  2. Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
  3. Return the union of CharSets A and B.

The production NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassRanges evaluates as follows:

  1. Evaluate ClassAtomNoDash to obtain a CharSet A.
  2. Evaluate ClassAtom to obtain a CharSet B.
  3. Evaluate ClassRanges to obtain a CharSet C.
  4. Call CharacterRange(A, B) and let D be the resulting CharSet.
  5. Return the union of CharSets D and C.

NOTE 1 ClassRanges can expand into single ClassAtoms and/or ranges of two ClassAtoms separated by dashes. In the latter case the ClassRanges includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is greater than the second ClassAtom's character value.

NOTE 2 Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.

NOTE 3 A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification.
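The cases described in NOTE 2 and NOTE 3 can be checked as follows. The patterns beyond /[E-F]/i and /[E-f]/i are additional illustrations chosen for this sketch.

```javascript
// NOTE 2: the ends of the range are taken literally, then matching ignores case.
console.log(/[E-F]/i.test("e"), /[E-F]/i.test("g")); // true false
console.log(/[E-f]/i.test("z"), /[E-f]/i.test("^")); // true true

// NOTE 3: a dash that is first, last, or directly after a range is literal.
console.log(/[-a]/.test("-"), /[a-]/.test("-"));       // true true
console.log(/[a-c-e]/.test("-"), /[a-c-e]/.test("d")); // true false
```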

21.2.2.17 ClassAtom

The production ClassAtom :: - evaluates by returning the CharSet containing the one character -.

The production ClassAtom :: ClassAtomNoDash evaluates by evaluating ClassAtomNoDash to obtain a CharSet and returning that CharSet.

21.2.2.18 ClassAtomNoDash

The production ClassAtomNoDash :: SourceCharacter but not one of \ or ] or - evaluates as follows:

  1. Return the CharSet containing the character matched by SourceCharacter.

The production ClassAtomNoDash :: \ ClassEscape evaluates as follows:

  1. Return the CharSet that is the result of evaluating ClassEscape.

21.2.2.19 ClassEscape

The production ClassEscape :: DecimalEscape evaluates as follows:

  1. Evaluate DecimalEscape to obtain an EscapeValue E.
  2. If E is not a character, throw a SyntaxError exception.
  3. Let ch be E's character.
  4. Return the one-element CharSet containing the character ch.

The production ClassEscape :: b evaluates as follows:

  1. Return the CharSet containing the single character <BS> U+0008 (BACKSPACE).

The production ClassEscape :: - evaluates as follows:

  1. Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).

The production ClassEscape :: CharacterEscape evaluates as follows:

  1. Return the CharSet containing the single character that is the result of evaluating CharacterEscape.

The production ClassEscape :: CharacterClassEscape evaluates as follows:

  1. Return the CharSet that is the result of evaluating CharacterClassEscape.

NOTE A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors.

21.2.3 The RegExp Constructor

The RegExp constructor is the %RegExp% intrinsic object and the initial value of the RegExp property of the global object. When RegExp is called as a function rather than as a constructor, it creates and initializes a new RegExp object. Thus the function call RegExp(…) is equivalent to the object creation expression new RegExp(…) with the same arguments, except that when pattern is already a RegExp and flags is undefined the function call may return pattern itself (see 21.2.3.1).

The RegExp constructor is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.

21.2.3.1 RegExp ( pattern, flags )

The following steps are taken:

  1. Let patternIsRegExp be IsRegExp(pattern).
  2. ReturnIfAbrupt(patternIsRegExp).
  3. If NewTarget is not undefined, let newTarget be NewTarget.
  4. Else,
    1. Let newTarget be the active function object.
    2. If patternIsRegExp is true and flags is undefined, then
      1. Let patternConstructor be Get(pattern, "constructor").
      2. ReturnIfAbrupt(patternConstructor).
      3. If SameValue(newTarget, patternConstructor) is true, return pattern.
  5. If Type(pattern) is Object and pattern has a [[RegExpMatcher]] internal slot, then
    1. Let P be the value of pattern’s [[OriginalSource]] internal slot.
    2. If flags is undefined, let F be the value of pattern’s [[OriginalFlags]] internal slot.
    3. Else, let F be flags.
  6. Else if patternIsRegExp is true, then
    1. Let P be Get(pattern, "source").
    2. ReturnIfAbrupt(P).
    3. If flags is undefined, then
      1. Let F be Get(pattern, "flags").
      2. ReturnIfAbrupt(F).
    4. Else, let F be flags.
  7. Else,
    1. Let P be pattern.
    2. Let F be flags.
  8. Let O be RegExpAlloc(newTarget).
  9. ReturnIfAbrupt(O).
  10. Return RegExpInitialize(O, P, F).

NOTE If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognized by RegExp, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.
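The branches of the algorithm above, and the escaping requirement described in the NOTE, can be observed as follows. The variable names are illustrative.

```javascript
const re = /a+/g;

// Called as a function with flags undefined, the same object is returned.
console.log(RegExp(re) === re);     // true

// Called as a constructor, a new object is always created.
console.log(new RegExp(re) === re); // false

// When flags is supplied it replaces the flags of the original pattern.
const copy = new RegExp(re, "i");
console.log(copy.source, copy.flags); // "a+" "i"

// In a StringLiteral each backslash must be doubled to reach RegExp.
console.log(new RegExp("\\d+").test("42")); // true
console.log(new RegExp("\d+").test("42"));  // false: the pattern received is "d+"
```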

21.2.3.2 Abstract Operations for the RegExp Constructor

21.2.3.2.1 Runtime Semantics: RegExpAlloc ( newTarget )

When the abstract operation RegExpAlloc with argument newTarget is called, the following steps are taken:

  1. Let obj be OrdinaryCreateFromConstructor(newTarget, "%RegExpPrototype%", «[[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]]»).
  2. ReturnIfAbrupt(obj).
  3. Let status be DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor {[[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false}).
  4. Assert: status is not an abrupt completion.
  5. Return obj.

21.2.3.2.2 Runtime Semantics: RegExpInitialize ( obj, pattern, flags )

When the abstract operation RegExpInitialize with arguments obj, pattern, and flags is called, the following steps are taken:

  1. If pattern is undefined, let P be the empty String.
  2. Else, let P be ToString(pattern).
  3. ReturnIfAbrupt(P).
  4. If flags is undefined, let F be the empty String.
  5. Else, let F be ToString(flags).
  6. ReturnIfAbrupt(F).
  7. If F contains any code unit other than "g", "i", "m", "u", or "y" or if it contains the same code unit more than once, throw a SyntaxError exception.
  8. If F contains "u", let BMP be false; else let BMP be true.
  9. If BMP is true, then
    1. Parse P using the grammars in 21.2.1 and interpreting each of its 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements. The goal symbol for the parse is Pattern. Throw a SyntaxError exception if P did not conform to the grammar, if any elements of P were not matched by the parse, or if any Early Error conditions exist.
    2. Let patternCharacters be a List whose elements are the code unit elements of P.
  10. Else
    1. Parse P using the grammars in 21.2.1 and interpreting P as UTF-16 encoded Unicode code points (6.1.4). The goal symbol for the parse is Pattern[U]. Throw a SyntaxError exception if P did not conform to the grammar, if any elements of P were not matched by the parse, or if any Early Error conditions exist.
    2. Let patternCharacters be a List whose elements are the code points resulting from applying UTF-16 decoding to P's sequence of elements.
  11. Set the value of obj’s [[OriginalSource]] internal slot to P.
  12. Set the value of obj’s [[OriginalFlags]] internal slot to F.
  13. Set obj’s [[RegExpMatcher]] internal slot to the internal procedure that evaluates the above parse of P by applying the semantics provided in 21.2.2 using patternCharacters as the pattern's List of SourceCharacter values and F as the flag parameters.
  14. Let setStatus be Set(obj, "lastIndex", 0, true).
  15. ReturnIfAbrupt(setStatus).
  16. Return obj.
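
As an informal illustration of the steps above, the sketch below probes flag validation, the treatment of an undefined pattern, and the effect of the "u" flag on how the pattern and input are interpreted. The helper errorNameOf is a hypothetical convenience and not part of this specification.

```javascript
// Hypothetical helper: returns the name of the error thrown by fn, or null.
function errorNameOf(fn) {
  try {
    fn();
    return null;
  } catch (e) {
    return e.name;
  }
}

// Step 7: repeated or unknown flag code units are rejected.
const duplicateFlag = errorNameOf(() => new RegExp("a", "gg"));
const unknownFlag = errorNameOf(() => new RegExp("a", "x"));

// Steps 9 and 10: a pattern that does not conform to the grammar is rejected.
const badPattern = errorNameOf(() => new RegExp("(", ""));

// Step 1: an undefined pattern is treated as the empty String.
const emptyMatches = new RegExp(undefined).test("anything");

// Steps 8 to 10: with "u", a surrogate pair is one pattern character.
const astral = "\u{1F600}"; // two UTF-16 code units
const withoutU = new RegExp("^.$").test(astral);
const withU = new RegExp("^.$", "u").test(astral);

// Step 14: lastIndex starts at 0.
const freshLastIndex = new RegExp("a", "g").lastIndex;

console.log(duplicateFlag, unknownFlag, badPattern);
console.log(emptyMatches, withoutU, withU, freshLastIndex);
```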

21.2.3.2.3 Runtime Semantics: RegExpCreate ( P, F )

When the abstract operation RegExpCreate with arguments P and F is called, the following steps are taken:

  1. Let obj be RegExpAlloc(%RegExp%).
  2. ReturnIfAbrupt(obj).
  3. Return RegExpInitialize(obj, P, F).

21.2.3.2.4 Runtime Semantics: EscapeRegExpPattern ( P, F )

When the abstract operation EscapeRegExpPattern with arguments P and F is called, the following occurs:

  1. Let S be a String in the form of a Pattern (Pattern[U] if F contains "u") equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not be identical to P; however, the internal procedure that would result from evaluating S as a Pattern (Pattern[U] if F contains "u") must behave identically to the internal procedure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
  2. The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the String value formed by concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
  3. Return S.
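
Because the exact escaped form is left to the implementation, the informal sketch below checks only the properties the steps above require: the escaped source differs from the raw pattern where escaping is needed, and it round-trips to a regular expression with the same behaviour. The variable names are illustrative.

```javascript
// Illustrative names only; the exact text of .source may vary between implementations.
const slash = new RegExp("/");
const emptyPattern = new RegExp("");
const newline = new RegExp("\n");

// The source must be usable between two "/" characters.
const literalText = "/" + slash.source + "/" + slash.flags;

// Round trip: building a RegExp from the escaped source keeps the behaviour.
const slashAgain = new RegExp(slash.source);
const newlineAgain = new RegExp(newline.source);

console.log(literalText, JSON.stringify(emptyPattern.source));
console.log(slashAgain.test("a/b"), newlineAgain.test("line1\nline2"));
```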

21.2.4 Properties of the RegExp Constructor

The value of the [[Prototype]] internal slot of the RegExp constructor is the intrinsic object %FunctionPrototype% (19.2.3).

Besides the length property (whose value is 2), the RegExp constructor has the following properties:

21.2.4.1 RegExp.prototype

The initial value of RegExp.prototype is the intrinsic object %RegExpPrototype% (21.2.5).

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

21.2.4.2 get RegExp [ @@species ]

RegExp[@@species] is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Return the this value.

The value of the name property of this function is "get [Symbol.species]".

NOTE RegExp prototype methods normally use their this object's constructor to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its @@species property.
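
The sketch below is an informal illustration of this NOTE. It assumes two hypothetical subclasses, CountingRegExp and PlainSpeciesRegExp, and counts subclass constructions to observe which constructor @@split uses to create its derived object.

```javascript
// Hypothetical subclasses, used only to observe which constructor is invoked.
let subclassConstructions = 0;

class CountingRegExp extends RegExp {
  constructor(pattern, flags) {
    super(pattern, flags);
    subclassConstructions += 1;
  }
}

class PlainSpeciesRegExp extends CountingRegExp {
  // Redefine @@species so derived objects are plain RegExp instances.
  static get [Symbol.species]() {
    return RegExp;
  }
}

// The default getter returns its this value.
const defaultSpeciesIsSelf = CountingRegExp[Symbol.species] === CountingRegExp;

const a = new CountingRegExp(",");      // one subclass construction
const partsA = a[Symbol.split]("a,b");  // the splitter is built via CountingRegExp
const afterDefault = subclassConstructions;

const b = new PlainSpeciesRegExp(",");  // one more subclass construction
const partsB = b[Symbol.split]("a,b");  // the splitter is built via RegExp
const afterOverride = subclassConstructions;

console.log(defaultSpeciesIsSelf, afterDefault, afterOverride, partsA, partsB);
```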

21.2.5 Properties of the RegExp Prototype Object

The RegExp prototype object is the intrinsic object %RegExpPrototype%. The RegExp prototype object is an ordinary object. It is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.

The value of the [[Prototype]] internal slot of the RegExp prototype object is the intrinsic object %ObjectPrototype% (19.1.3).

NOTE The RegExp prototype object does not have a valueOf property of its own; however, it inherits the valueOf property from the Object prototype object.

21.2.5.1 RegExp.prototype.constructor

The initial value of RegExp.prototype.constructor is the intrinsic object %RegExp%.

21.2.5.2 RegExp.prototype.exec ( string )

Performs a regular expression match of string against the regular expression and returns an Array object containing the results of the match, or null if string did not match.

The String ToString(string) is searched for an occurrence of the regular expression pattern as follows:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception.
  4. Let S be ToString(string).
  5. ReturnIfAbrupt(S).
  6. Return RegExpBuiltinExec(R, S).

21.2.5.2.1 Runtime Semantics: RegExpExec ( R, S )

The abstract operation RegExpExec with arguments R and S performs the following steps:

  1. Assert: Type(R) is Object.
  2. Assert: Type(S) is String.
  3. Let exec be Get(R, "exec").
  4. ReturnIfAbrupt(exec).
  5. If IsCallable(exec) is true, then
    1. Let result be Call(exec, R, «S»).
    2. ReturnIfAbrupt(result).
    3. If Type(result) is neither Object nor Null, throw a TypeError exception.
    4. Return result.
  6. If R does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception.
  7. Return RegExpBuiltinExec(R, S).

NOTE If a callable exec property is not found this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of exec.
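
As an informal illustration, the sketch below passes ordinary objects (not RegExp instances) to RegExp.prototype.test, which reaches RegExpExec. The objects fake, bad, and noExec are hypothetical and exist only to exercise steps 5 and 6.

```javascript
// Hypothetical objects; none of them has a [[RegExpMatcher]] internal slot.
const fake = {
  // A callable exec that returns an Object or null is accepted (step 5).
  exec(s) {
    return s.length > 3 ? { 0: s, index: 0 } : null;
  }
};
const longEnough = RegExp.prototype.test.call(fake, "abcdef");
const tooShort = RegExp.prototype.test.call(fake, "abc");

// Step 5.3: a result that is neither an Object nor null is rejected.
const bad = { exec() { return 42; } };
let badResultError = null;
try {
  RegExp.prototype.test.call(bad, "x");
} catch (e) {
  badResultError = e.name;
}

// Step 6: no callable exec and no [[RegExpMatcher]] internal slot.
const noExec = {};
let noExecError = null;
try {
  RegExp.prototype.test.call(noExec, "x");
} catch (e) {
  noExecError = e.name;
}

console.log(longEnough, tooShort, badResultError, noExecError);
```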

21.2.5.2.2 Runtime Semantics: RegExpBuiltinExec ( R, S )

The abstract operation RegExpBuiltinExec with arguments R and S performs the following steps:

  1. Assert: R is an initialized RegExp instance.
  2. Assert: Type(S) is String.
  3. Let length be the number of code units in S.
  4. Let lastIndex be ToLength(Get(R,"lastIndex")).
  5. ReturnIfAbrupt(lastIndex).
  6. Let global be ToBoolean(Get(R, "global")).
  7. ReturnIfAbrupt(global).
  8. Let sticky be ToBoolean(Get(R, "sticky")).
  9. ReturnIfAbrupt(sticky).
  10. If global is false and sticky is false, let lastIndex be 0.
  11. Let matcher be the value of R’s [[RegExpMatcher]] internal slot.
  12. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  13. If flags contains "u", let fullUnicode be true, else let fullUnicode be false.
  14. Let matchSucceeded be false.
  15. Repeat, while matchSucceeded is false
    1. If lastIndex > length, then
      1. Let setStatus be Set(R, "lastIndex", 0, true).
      2. ReturnIfAbrupt(setStatus).
      3. Return null.
    2. Let r be matcher(S, lastIndex).
    3. If r is failure, then
      1. If sticky is true, then
        1. Let setStatus be Set(R, "lastIndex", 0, true).
        2. ReturnIfAbrupt(setStatus).
        3. Return null.
      2. Let lastIndex be AdvanceStringIndex(S, lastIndex, fullUnicode).
    4. Else,
      1. Assert: r is a State.
      2. Set matchSucceeded to true.
  16. Let e be r's endIndex value.
  17. If fullUnicode is true, then
    1. e is an index into the Input character list, derived from S, matched by matcher. Let eUTF be the smallest index into S that corresponds to the character at element e of Input. If e is greater than or equal to the length of Input, then eUTF is the number of code units in S.
    2. Let e be eUTF.
  18. If global is true or sticky is true,
    1. Let setStatus be Set(R, "lastIndex", e, true).
    2. ReturnIfAbrupt(setStatus).
  19. Let n be the length of r's captures List. (This is the same value as 21.2.2.1's NcapturingParens.)
  20. Let A be ArrayCreate(n + 1).
  21. Assert: The value of A's "length" property is n + 1.
  22. Let matchIndex be lastIndex.
  23. Assert: The following CreateDataProperty calls will not result in an abrupt completion.
  24. Perform CreateDataProperty(A, "index", matchIndex).
  25. Perform CreateDataProperty(A, "input", S).
  26. Let matchedSubstr be the matched substring (i.e. the portion of S between offset lastIndex inclusive and offset e exclusive).
  27. Perform CreateDataProperty(A, "0", matchedSubstr).
  28. For each integer i such that i > 0 and i ≤ n
    1. Let captureI be ith element of r's captures List.
    2. If captureI is undefined, let capturedValue be undefined.
    3. Else if fullUnicode is true,
      1. Assert: captureI is a List of code points.
      2. Let capturedValue be a string whose code units are the UTF16Encoding (10.1.1) of the code points of captureI.
    4. Else, fullUnicode is false,
      1. Assert: captureI is a List of code units.
      2. Let capturedValue be a string consisting of the code units of captureI.
    5. Perform CreateDataProperty(A, ToString(i) , capturedValue).
  29. Return A.
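
The following informal sketch traces the observable effects of the steps above: the shape of the result array, the use of lastIndex under the global flag, and the reset of lastIndex when a sticky match fails. The variable names are illustrative.

```javascript
// Illustrative names only.
const input = "a1b2";
const globalRe = /(\d)(x)?/g;

const first = globalRe.exec(input);   // matches "1" at index 1
const lastIndexAfterFirst = globalRe.lastIndex;

const second = globalRe.exec(input);  // matches "2" at index 3
const lastIndexAfterSecond = globalRe.lastIndex;

const third = globalRe.exec(input);   // no further match
const lastIndexAfterThird = globalRe.lastIndex;

// Sticky: the match must begin exactly at lastIndex.
const stickyRe = /b/y;
const stickyMiss = stickyRe.exec("ab"); // "a" is at index 0, so this fails
stickyRe.lastIndex = 1;
const stickyHit = stickyRe.exec("ab");  // "b" is at index 1

console.log(first, lastIndexAfterFirst);
console.log(second.index, lastIndexAfterSecond, third, lastIndexAfterThird);
console.log(stickyMiss, stickyHit.index, stickyRe.lastIndex);
```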

21.2.5.2.3 AdvanceStringIndex ( S, index, unicode )

The abstract operation AdvanceStringIndex with arguments S, index, and unicode performs the following steps:

  1. Assert: Type(S) is String.
  2. Assert: index is an integer such that 0 ≤ index ≤ 2^53−1.
  3. Assert: Type(unicode) is Boolean.
  4. If unicode is false, return index+1.
  5. Let length be the number of code units in S.
  6. If index+1 ≥ length, return index+1.
  7. Let first be the code unit value at index index in S.
  8. If first < 0xD800 or first > 0xDBFF, return index+1.
  9. Let second be the code unit value at index index+1 in S.
  10. If second < 0xDC00 or second > 0xDFFF, return index+1.
  11. Return index+2.
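
The steps above can be sketched informally in ECMAScript code as follows. The function name advanceStringIndex is an assumption made for this illustration; the abstract operation itself is not exposed to programs, and the assertions of steps 1 to 3 are omitted.

```javascript
// Informal transcription of steps 4 to 11; not an exposed API.
function advanceStringIndex(S, index, unicode) {
  if (!unicode) return index + 1;                          // step 4
  const length = S.length;                                 // step 5
  if (index + 1 >= length) return index + 1;               // step 6
  const first = S.charCodeAt(index);                       // step 7
  if (first < 0xD800 || first > 0xDBFF) return index + 1;  // step 8: not a lead surrogate
  const second = S.charCodeAt(index + 1);                  // step 9
  if (second < 0xDC00 || second > 0xDFFF) return index + 1; // step 10: not a trail surrogate
  return index + 2;                                        // step 11: skip the whole pair
}

const sample = "\u{1F600}a"; // a surrogate pair followed by "a"
console.log(advanceStringIndex(sample, 0, false)); // 1
console.log(advanceStringIndex(sample, 0, true));  // 2
console.log(advanceStringIndex(sample, 1, true));  // 2
console.log(advanceStringIndex(sample, 2, true));  // 3
```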

21.2.5.3 get RegExp.prototype.flags

RegExp.prototype.flags is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let result be the empty String.
  4. Let global be ToBoolean(Get(R, "global")).
  5. ReturnIfAbrupt(global).
  6. If global is true, append "g" as the last code unit of result.
  7. Let ignoreCase be ToBoolean(Get(R, "ignoreCase")).
  8. ReturnIfAbrupt(ignoreCase).
  9. If ignoreCase is true, append "i" as the last code unit of result.
  10. Let multiline be ToBoolean(Get(R, "multiline")).
  11. ReturnIfAbrupt(multiline).
  12. If multiline is true, append "m" as the last code unit of result.
  13. Let unicode be ToBoolean(Get(R, "unicode")).
  14. ReturnIfAbrupt(unicode).
  15. If unicode is true, append "u" as the last code unit of result.
  16. Let sticky be ToBoolean(Get(R, "sticky")).
  17. ReturnIfAbrupt(sticky).
  18. If sticky is true, append "y" as the last code unit of result.
  19. Return result.
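
As an informal illustration, the sketch below shows that the result is assembled in a fixed order regardless of the order in which flags were supplied, and that the getter reads ordinary properties of any object. The name flagsGetter is illustrative.

```javascript
// Flags supplied out of order are reported in the fixed order of the steps above.
const reordered = new RegExp("a", "ymg").flags;

// The getter is generic: it only reads properties and converts them with ToBoolean.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const fromPlainObject = flagsGetter.call({ global: 1, sticky: "yes", multiline: 0 });

// Step 2: a this value that is not an Object is rejected.
let primitiveError = null;
try {
  flagsGetter.call(1);
} catch (e) {
  primitiveError = e.name;
}

console.log(reordered, fromPlainObject, primitiveError);
```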

21.2.5.4 get RegExp.prototype.global

RegExp.prototype.global is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  4. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  5. If flags contains the code unit "g", return true.
  6. Return false.
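
The sketch below informally illustrates that global is an accessor on the RegExp prototype object rather than an own property of instances, and that the getter rejects an ordinary object lacking an [[OriginalFlags]] internal slot. The variable names are illustrative.

```javascript
// Illustrative names only.
const re = /a/g;

// Instances do not carry an own "global" property.
const ownGlobal = Object.prototype.hasOwnProperty.call(re, "global");

// The accessor lives on the prototype and has no setter.
const descriptor = Object.getOwnPropertyDescriptor(RegExp.prototype, "global");

// Step 3: an ordinary object has no [[OriginalFlags]] internal slot.
let plainObjectError = null;
try {
  descriptor.get.call({});
} catch (e) {
  plainObjectError = e.name;
}

console.log(ownGlobal, typeof descriptor.get, descriptor.set, re.global, plainObjectError);
```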

21.2.5.5 get RegExp.prototype.ignoreCase

RegExp.prototype.ignoreCase is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  4. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  5. If flags contains the code unit "i", return true.
  6. Return false.

21.2.5.6 RegExp.prototype [ @@match ] ( string )

When the @@match method is called with argument string, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ToString(string).
  4. ReturnIfAbrupt(S).
  5. Let global be ToBoolean(Get(rx, "global")).
  6. ReturnIfAbrupt(global).
  7. If global is false, then
    1. Return RegExpExec(rx, S).
  8. Else global is true,
    1. Let fullUnicode be ToBoolean(Get(rx, "unicode")).
    2. ReturnIfAbrupt(fullUnicode).
    3. Let setStatus be Set(rx, "lastIndex", 0, true).
    4. ReturnIfAbrupt(setStatus).
    5. Let A be ArrayCreate(0).
    6. Let n be 0.
    7. Repeat,
      1. Let result be RegExpExec(rx, S).
      2. ReturnIfAbrupt(result).
      3. If result is null, then
        1. If n=0, return null.
        2. Else, return A.
      4. Else result is not null,
        1. Let matchStr be ToString(Get(result, "0")).
        2. ReturnIfAbrupt(matchStr).
        3. Let status be CreateDataProperty(A, ToString(n), matchStr).
        4. Assert: status is true.
        5. If matchStr is the empty String, then
          1. Let thisIndex be ToLength(Get(rx, "lastIndex")).
          2. ReturnIfAbrupt(thisIndex).
          3. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          4. Let setStatus be Set(rx, "lastIndex", nextIndex, true).
          5. ReturnIfAbrupt(setStatus).
        6. Increment n.

The value of the name property of this function is "[Symbol.match]".

NOTE The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.
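
The following informal sketch contrasts the non-global and global branches of the algorithm above, including the advance past an empty match in step 8.7.4.5. The variable names are illustrative.

```javascript
// Non-global: the result is whatever RegExpExec returns.
const nonGlobal = /a(b)/[Symbol.match]("xabab");

// Global: only the full match strings are collected.
const allMatches = /a(b)/g[Symbol.match]("xabab");

// Empty matches advance lastIndex so the loop always terminates.
const withEmpty = /a*/g[Symbol.match]("baac");

// No match at all yields null rather than an empty array.
const none = /z/g[Symbol.match]("abc");

console.log(nonGlobal[0], nonGlobal[1], nonGlobal.index);
console.log(allMatches, withEmpty, none);
```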

21.2.5.7 get RegExp.prototype.multiline

RegExp.prototype.multiline is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  4. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  5. If flags contains the code unit "m", return true.
  6. Return false.

21.2.5.8 RegExp.prototype [ @@replace ] ( string, replaceValue )

When the @@replace method is called with arguments string and replaceValue the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ToString(string).
  4. ReturnIfAbrupt(S).
  5. Let lengthS be the number of code unit elements in S.
  6. Let functionalReplace be IsCallable(replaceValue).
  7. If functionalReplace is false, then
    1. Let replaceValue be ToString(replaceValue).
    2. ReturnIfAbrupt(replaceValue).
  8. Let global be ToBoolean(Get(rx, "global")).
  9. ReturnIfAbrupt(global).
  10. If global is true, then
    1. Let fullUnicode be ToBoolean(Get(rx, "unicode")).
    2. ReturnIfAbrupt(fullUnicode).
    3. Let setStatus be Set(rx, "lastIndex", 0, true).
    4. ReturnIfAbrupt(setStatus).
  11. Let results be a new empty List.
  12. Let done be false.
  13. Repeat, while done is false
    1. Let result be RegExpExec(rx, S).
    2. ReturnIfAbrupt(result).
    3. If result is null, set done to true.
    4. Else result is not null,
      1. Append result to the end of results.
      2. If global is false, set done to true.
      3. Else,
        1. Let matchStr be ToString(Get(result, "0")).
        2. ReturnIfAbrupt(matchStr).
        3. If matchStr is the empty String, then
          1. Let thisIndex be ToLength(Get(rx, "lastIndex")).
          2. ReturnIfAbrupt(thisIndex).
          3. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
          4. Let setStatus be Set(rx, "lastIndex", nextIndex, true).
          5. ReturnIfAbrupt(setStatus).
  14. Let accumulatedResult be the empty String value.
  15. Let nextSourcePosition be 0.
  16. Repeat, for each result in results,
    1. Let nCaptures be ToLength(Get(result, "length")).
    2. ReturnIfAbrupt(nCaptures).
    3. Let nCaptures be max(nCaptures − 1, 0).
    4. Let matched be ToString(Get(result, "0")).
    5. ReturnIfAbrupt(matched).
    6. Let matchLength be the number of code units in matched.
    7. Let position be ToInteger(Get(result, "index")).
    8. ReturnIfAbrupt(position).
    9. Let position be max(min(position, lengthS), 0).
    10. Let n be 1.
    11. Let captures be an empty List.
    12. Repeat while n ≤ nCaptures
      1. Let capN be Get(result, ToString(n)).
      2. ReturnIfAbrupt(capN).
      3. If capN is not undefined, then
        1. Let capN be ToString(capN).
        2. ReturnIfAbrupt(capN).
      4. Append capN as the last element of captures.
      5. Let n be n+1.
    13. If functionalReplace is true, then
      1. Let replacerArgs be «matched».
      2. Append in list order the elements of captures to the end of the List replacerArgs.
      3. Append position and S as the last two elements of replacerArgs.
      4. Let replValue be Call(replaceValue, undefined, replacerArgs).
      5. Let replacement be ToString(replValue).
    14. Else,
      1. Let replacement be GetSubstitution(matched, S, position, captures, replaceValue).
    15. ReturnIfAbrupt(replacement).
    16. If position ≥ nextSourcePosition, then
      1. NOTE position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
      2. Let accumulatedResult be the String formed by concatenating the code units of the current value of accumulatedResult with the substring of S consisting of the code units from nextSourcePosition (inclusive) up to position (exclusive) and with the code units of replacement.
      3. Let nextSourcePosition be position + matchLength.
  17. If nextSourcePosition ≥ lengthS, return accumulatedResult.
  18. Return the String formed by concatenating the code units of accumulatedResult with the substring of S consisting of the code units from nextSourcePosition (inclusive) up through the final code unit of S (inclusive).

The value of the name property of this function is "[Symbol.replace]".
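
As an informal illustration of the algorithm above, the sketch below records the arguments passed to a replacer function (the matched text, the captures, the position, and the whole string) and shows a template replacement and a global replacement over empty matches. The names seen and swapped are illustrative, and the pattern contains no named groups.

```javascript
// Functional replace: the callback receives matched, captures..., position, S.
const seen = [];
const swapped = /(\d+)-(\d+)/[Symbol.replace]("ids: 10-20", (...args) => {
  seen.push(args);
  const [matched, a, b] = args;
  return b + "-" + a;
});

// String replace: substitution patterns are expanded by GetSubstitution.
const templated = /(\d+)-(\d+)/g[Symbol.replace]("1-2 3-4", "$2-$1");

// Global replace with empty matches inserts at every position.
const emptyGlobal = /x*/g[Symbol.replace]("ab", "-");

console.log(swapped, seen[0]);
console.log(templated, emptyGlobal);
```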

21.2.5.9 RegExp.prototype [ @@search ] ( string )

When the @@search method is called with argument string, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ToString(string).
  4. ReturnIfAbrupt(S).
  5. Let previousLastIndex be Get(rx, "lastIndex").
  6. ReturnIfAbrupt(previousLastIndex).
  7. Let status be Set(rx, "lastIndex", 0, true).
  8. ReturnIfAbrupt(status).
  9. Let result be RegExpExec(rx, S).
  10. ReturnIfAbrupt(result).
  11. Let status be Set(rx, "lastIndex", previousLastIndex, true).
  12. ReturnIfAbrupt(status).
  13. If result is null, return –1.
  14. Return Get(result, "index").

The value of the name property of this function is "[Symbol.search]".

NOTE The lastIndex and global properties of this RegExp object are ignored when performing the search. The lastIndex property is left unchanged.
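
The behaviour described in this NOTE can be observed with the informal sketch below; the variable names are illustrative.

```javascript
// Illustrative names only.
const re = /b/g;
re.lastIndex = 3;

// The search starts from index 0 regardless of lastIndex.
const found = re[Symbol.search]("abcabc");

// lastIndex is restored to its previous value afterwards.
const lastIndexAfter = re.lastIndex;

// A failed search returns -1.
const missing = /z/[Symbol.search]("abc");

console.log(found, lastIndexAfter, missing);
```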

21.2.5.10 get RegExp.prototype.source

RegExp.prototype.source is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalSource]] internal slot, throw a TypeError exception.
  4. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  5. Let src be the value of R’s [[OriginalSource]] internal slot.
  6. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  7. Return EscapeRegExpPattern(src, flags).

21.2.5.11 RegExp.prototype [ @@split ] ( string, limit )

NOTE 1 Returns an Array object into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the this value regular expression; these occurrences are not part of any substring in the returned array, but serve to divide up the String value.

The this value may be an empty regular expression or a regular expression that can match an empty String. In this case, the regular expression does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if the regular expression matches the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) Only the first match at a given index of the this String is considered, even if backtracking could yield a non-empty-substring match at that index. (For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a","b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)

If the string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,

    /<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")

evaluates to the array

    ["A",undefined,"B","bold","/","B","and",undefined,"CODE","coded","/","CODE",""]

If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.

When the @@split method is called, the following steps are taken:

  1. Let rx be the this value.
  2. If Type(rx) is not Object, throw a TypeError exception.
  3. Let S be ToString(string).
  4. ReturnIfAbrupt(S).
  5. Let C be SpeciesConstructor(rx, %RegExp%).
  6. ReturnIfAbrupt(C).
  7. Let flags be ToString(Get(rx, "flags")).
  8. ReturnIfAbrupt(flags).
  9. If flags contains "u", let unicodeMatching be true.
  10. Else, let unicodeMatching be false.
  11. If flags contains "y", let newFlags be flags.
  12. Else, let newFlags be the string that is the concatenation of flags and "y".
  13. Let splitter be Construct(C, «rx, newFlags»).
  14. ReturnIfAbrupt(splitter).
  15. Let A be ArrayCreate(0).
  16. Let lengthA be 0.
  17. If limit is undefined, let lim be 2^53–1; else let lim be ToLength(limit).
  18. ReturnIfAbrupt(lim).
  19. Let size be the number of elements in S.
  20. Let p be 0.
  21. If lim = 0, return A.
  22. If size = 0, then
    1. Let z be RegExpExec(splitter, S).
    2. ReturnIfAbrupt(z).
    3. If z is not null, return A.
    4. Assert: The following call will never result in an abrupt completion.
    5. Perform CreateDataProperty(A, "0", S).
    6. Return A.
  23. Let q be p.
  24. Repeat, while q < size
    1. Let setStatus be Set(splitter, "lastIndex", q, true).
    2. ReturnIfAbrupt(setStatus).
    3. Let z be RegExpExec(splitter, S).
    4. ReturnIfAbrupt(z).
    5. If z is null, let q be AdvanceStringIndex(S, q, unicodeMatching).
    6. Else z is not null,
      1. Let e be ToLength(Get(splitter, "lastIndex")).
      2. ReturnIfAbrupt(e).
      3. If e = p, let q be AdvanceStringIndex(S, q, unicodeMatching).
      4. Else e ≠ p,
        1. Let T be a String value equal to the substring of S consisting of the elements at indices p (inclusive) through q (exclusive).
        2. Assert: The following call will never result in an abrupt completion.
        3. Perform CreateDataProperty(A, ToString(lengthA), T).
        4. Let lengthA be lengthA +1.
        5. If lengthA = lim, return A.
        6. Let p be e.
        7. Let numberOfCaptures be ToLength(Get(z, "length")).
        8. ReturnIfAbrupt(numberOfCaptures).
        9. Let numberOfCaptures be max(numberOfCaptures-1, 0).
        10. Let i be 1.
        11. Repeat, while i ≤ numberOfCaptures
          1. Let nextCapture be Get(z, ToString(i)).
          2. ReturnIfAbrupt(nextCapture).
          3. Perform CreateDataProperty(A, ToString(lengthA), nextCapture).
          4. Let i be i +1.
          5. Let lengthA be lengthA +1.
          6. If lengthA = lim, return A.
        12. Let q be p.
  25. Let T be a String value equal to the substring of S consisting of the elements at indices p (inclusive) through size (exclusive).
  26. Assert: The following call will never result in an abrupt completion.
  27. Perform CreateDataProperty(A, ToString(lengthA), T ).
  28. Return A.

The length property of the @@split method is 2.

The value of the name property of this function is "[Symbol.split]".

NOTE 2 The @@split method ignores the value of the global and sticky properties of this RegExp object.
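
The following informal sketch exercises several branches of the algorithm above: the limit argument, spliced captures, an empty input String, and a sticky regular expression. The variable names are illustrative.

```javascript
// Illustrative names only.
const basic = /,/[Symbol.split]("a,b,c");
const limited = /,/[Symbol.split]("a,b,c", 2);     // step 24.6.4.5: stop at lim elements
const withCaptures = /(-)/[Symbol.split]("a-b");   // captures are spliced into the output
const perCodeUnit = /(?:)/[Symbol.split]("abc");   // an empty match splits into code units

// Step 22: empty input depends on whether the pattern can match the empty String.
const emptyCanMatch = /(?:)/[Symbol.split]("");
const emptyCannotMatch = /,/[Symbol.split]("");

// NOTE 2: the sticky flag of the this value does not change the result.
const stickyIgnored = /b/y[Symbol.split]("abc");

console.log(basic, limited, withCaptures, perCodeUnit);
console.log(emptyCanMatch, emptyCannotMatch, stickyIgnored);
```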

21.2.5.12 get RegExp.prototype.sticky

RegExp.prototype.sticky is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  4. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  5. If flags contains the code unit "y", return true.
  6. Return false.

21.2.5.13 RegExp.prototype.test( S )

The following steps are taken:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let string be ToString(S).
  4. ReturnIfAbrupt(string).
  5. Let match be RegExpExec(R, string).
  6. ReturnIfAbrupt(match).
  7. If match is not null, return true; else return false.
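
Because test delegates to RegExpExec, a regular expression with the global flag carries lastIndex from one call to the next. The informal sketch below shows the resulting alternation; the variable names are illustrative.

```javascript
// Illustrative names only.
const globalRe = /a/g;
const globalResults = [
  globalRe.test("a"), // matches at 0, lastIndex becomes 1
  globalRe.test("a"), // starts at 1, fails, lastIndex resets to 0
  globalRe.test("a")  // matches again from 0
];

// Without the global or sticky flag, every call starts from index 0.
const plainRe = /a/;
const plainResults = [plainRe.test("a"), plainRe.test("a")];

console.log(globalResults, plainResults);
```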

21.2.5.14 RegExp.prototype.toString ( )

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. Let pattern be ToString(Get(R, "source")).
  4. ReturnIfAbrupt(pattern).
  5. Let flags be ToString(Get(R, "flags")).
  6. ReturnIfAbrupt(flags).
  7. Let result be the String value formed by concatenating "/", pattern, "/", and flags.
  8. Return result.

NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
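
As an informal illustration, the sketch below shows the literal form for a RegExp instance and the generic behaviour of the method, which only reads the source and flags properties of its this value. The variable names are illustrative.

```javascript
// A RegExp instance: the result has the form of a RegularExpressionLiteral.
const literalForm = /a\/b/gi.toString();

// The method is generic: any object with source and flags properties works.
const generic = RegExp.prototype.toString.call({ source: "x+", flags: "g" });

console.log(literalForm, generic);
```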

21.2.5.15 get RegExp.prototype.unicode

RegExp.prototype.unicode is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:

  1. Let R be the this value.
  2. If Type(R) is not Object, throw a TypeError exception.
  3. If R does not have an [[OriginalFlags]] internal slot, throw a TypeError exception.
  4. Let flags be the value of R’s [[OriginalFlags]] internal slot.
  5. If flags contains the code unit "u", return true.
  6. Return false.

21.2.6 Properties of RegExp Instances

RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[RegExpMatcher]], [[OriginalSource]], and [[OriginalFlags]]. The value of the [[RegExpMatcher]] internal slot is an implementation dependent representation of the Pattern of the RegExp object.

NOTE Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties source, global, ignoreCase, and multiline. Those properties are now specified as accessor properties of RegExp.prototype.

RegExp instances also have the following property:

21.2.6.1 lastIndex

The value of the lastIndex property specifies the String index at which to start the next match. It is coerced to an integer when used (see 21.2.5.2.2). This property shall have the attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.
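
The sketch below informally illustrates the attributes of lastIndex and its coercion when used. The variable names are illustrative.

```javascript
// Illustrative names only.
const re = /a/g;
const descriptor = Object.getOwnPropertyDescriptor(re, "lastIndex");

// The property stores whatever is assigned; coercion happens when it is used.
re.lastIndex = "2";
const storedType = typeof re.lastIndex;

const match = re.exec("aaaa"); // the search starts at index 2
const typeAfterExec = typeof re.lastIndex;

console.log(descriptor, storedType, match.index, re.lastIndex, typeAfterExec);
```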