The String constructor is the %String% intrinsic object and the initial value of the String property of the global object. When called as a constructor it creates and initializes a new String object. When String is called as a function rather than as a constructor, it performs a type conversion.
The String constructor is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
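As a rough illustration of the two calling modes and of subclassing, the following sketch may help. The class name Label and its shout method are hypothetical names introduced only for this example.

```javascript
// Called as a function, String performs a type conversion and returns a primitive.
const converted = String(42);

// Called as a constructor, it creates a String object wrapping the converted value.
const wrapped = new String(42);

// Hypothetical subclass: the super() call is what creates and initializes
// the instance, giving it its [[StringData]] internal slot.
class Label extends String {
  constructor(text) {
    super(text);
  }
  shout() {
    return this.toUpperCase() + "!";
  }
}

const label = new Label("note");
console.log(typeof converted, typeof wrapped); // string object
console.log(label.length, label.shout());      // 4 NOTE!
```

Without the super call, the subclass constructor would not obtain an initialized instance, so inherited String methods would have no String value to operate on.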
When String is called with argument value, the following steps are taken:
The length property of the String function is 1.
The value of the [[Prototype]] internal slot of the String constructor is the intrinsic object %FunctionPrototype% (19.2.3).
Besides the length property (whose value is 1), the String constructor has the following properties:
The String.fromCharCode function may be called with any number of arguments which form the rest parameter codeUnits. The following steps are taken:
The length property of the fromCharCode function is 1.
The String.fromCodePoint function may be called with any number of arguments which form the rest parameter codePoints. The following steps are taken:
The length property of the fromCodePoint function is 1.
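The difference between the two functions can be sketched as follows. The example assumes the standard ECMAScript 2015 behaviour: fromCharCode reduces each argument to a 16-bit code unit, while fromCodePoint accepts full code points and rejects values outside the Unicode range.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) built from its two UTF-16 code units...
const viaUnits = String.fromCharCode(0xD834, 0xDD1E);
// ...and from the code point itself.
const viaPoint = String.fromCodePoint(0x1D11E);

// fromCharCode keeps only the low 16 bits of each argument.
const truncated = String.fromCharCode(0x1D11E);

// fromCodePoint throws for values that are not valid code points.
let rangeError = null;
try {
  String.fromCodePoint(0x110000);
} catch (e) {
  rangeError = e;
}

console.log(viaUnits === viaPoint);                 // true
console.log(truncated.charCodeAt(0).toString(16));  // d11e
console.log(rangeError instanceof RangeError);      // true
```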
The initial value of String.prototype is the intrinsic object %StringPrototype% (21.1.3).
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
The String.raw function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions. The following steps are taken:
The length property of the raw function is 1.
NOTE String.raw is intended for use as a tag function of a Tagged Template (12.3.7). When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.
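A short sketch of both uses of String.raw, as a tag function and as a direct call. The variable names are illustrative only; the object passed in the direct call is a hand-built stand-in for a template object and only supplies the raw property that the function reads.

```javascript
const who = "world";

// Ordinary template literal: \n is interpreted as a line feed.
const cooked = `hello\n${who}`;

// Tagged with String.raw: the backslash and the n are kept as two characters.
const raw = String.raw`hello\n${who}`;

// Direct call: raw segments are interleaved with the substitutions.
const direct = String.raw({ raw: ["a", "b", "c"] }, 1, 2);

console.log(cooked.length, raw.length); // 11 12
console.log(direct);                    // a1b2c
```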
The String prototype object is the intrinsic object %StringPrototype%. The String prototype object is itself an ordinary object. It is not a String instance and does not have a [[StringData]] internal slot.
The value of the [[Prototype]] internal slot of the String prototype object is the intrinsic object %ObjectPrototype% (19.1.3).
Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.
The abstract operation thisStringValue(value) performs the following steps:
The phrase “this String value” within the specification of a method refers to the result returned by calling the abstract operation thisStringValue with the this value of the method invocation passed as the argument.
NOTE 1 Returns a single element String containing the code unit at index pos in the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.
If pos is a value of Number type that is an integer, then the result of x.charAt(pos) is equal to the result of x.substring(pos, pos+1).
When the charAt method is called with one argument pos, the following steps are taken:
NOTE 2 The charAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
NOTE 1 Returns a Number (a nonnegative integer less than 2^16) that is the code unit value of the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.
When the charCodeAt method is called with one argument pos, the following steps are taken:
NOTE 2 The charCodeAt function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
NOTE 1 Returns a nonnegative integer Number less than 1114112 (0x110000) that is the code point value of the UTF-16 encoded code point (6.1.4) starting at the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.
When the codePointAt method is called with one argument pos, the following steps are taken:
NOTE 2 The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
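The three accessors can be compared on a String that contains a surrogate pair. This is a minimal sketch; the sample String and variable names are chosen only for illustration. The last line shows the generic behaviour by applying charAt to a Number.

```javascript
// "a" followed by U+1D11E, which is stored as the code units 0xD834 0xDD1E.
const sample = "a\u{1D11E}";

const firstChar = sample.charAt(0);          // "a"
const missingChar = sample.charAt(5);        // "" (no element at that index)
const leadUnit = sample.charCodeAt(1);       // 0xD834, a single code unit
const missingUnit = sample.charCodeAt(5);    // NaN
const wholePoint = sample.codePointAt(1);    // 0x1D11E, the pair decoded
const loneTrail = sample.codePointAt(2);     // 0xDD1E, no pair begins here
const missingPoint = sample.codePointAt(5);  // undefined

// For integer positions charAt(pos) equals substring(pos, pos + 1).
const sameAsSubstring = sample.charAt(1) === sample.substring(1, 2);

// Generic use: the this value is converted to a String first.
const fromNumber = String.prototype.charAt.call(12345, 2); // "3"
```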
NOTE 1 When the concat method is called it returns a String consisting of the code units of the this object (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.
When the concat method is called with zero or more arguments the following steps are taken:
The length property of the concat method is 1.
NOTE 2 The concat function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
The initial value of String.prototype.constructor is the intrinsic object %String%.
The following steps are taken:
The length property of the endsWith method is 1.
NOTE 1 Returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at endPosition – length(this). Otherwise returns false.
NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
NOTE 3 The endsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
The includes method takes two arguments, searchString and position, and performs the following steps:
The length property of the includes method is 1.
NOTE 1 If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, returns true; otherwise, returns false. If position is undefined, 0 is assumed, so as to search all of the String.
NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
NOTE 3 The includes function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
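The behaviour described in the notes for endsWith and includes can be sketched as follows. The sample text is arbitrary; the final part relies on the rule from NOTE 2 that a RegExp first argument causes an exception, which in ECMAScript 2015 is a TypeError.

```javascript
const text = "To be, or not to be";

const endsPlain = text.endsWith("be");          // true
const endsAtFive = text.endsWith("To be", 5);   // true: only the first 5 elements are considered
const hasOr = text.includes("or");              // true
const hasToLater = text.includes("To", 1);      // false: search starts at index 1

// A RegExp as the search string is rejected.
let regexpError = null;
try {
  text.includes(/be/);
} catch (e) {
  regexpError = e;
}
console.log(regexpError instanceof TypeError);  // true
```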
NOTE 1 If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, -1 is returned. If position is undefined, 0 is assumed, so as to search all of the String.
The indexOf method takes two arguments, searchString and position, and performs the following steps:
The length property of the indexOf method is 1.
NOTE 2 The indexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
NOTE 1 If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, -1 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.
The lastIndexOf method takes two arguments, searchString and position, and performs the following steps:
The length property of the lastIndexOf method is 1.
NOTE 2 The lastIndexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
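A brief sketch of how position bounds the search in each direction, using an arbitrary sample word:

```javascript
const word = "canal"; // "a" occurs at indices 1 and 3

const firstA = word.indexOf("a");             // 1: smallest index >= 0
const firstAFrom2 = word.indexOf("a", 2);     // 3: smallest index >= 2
const lastA = word.lastIndexOf("a");          // 3: greatest index overall
const lastAUpTo2 = word.lastIndexOf("a", 2);  // 1: greatest index <= 2
const absent = word.indexOf("x");             // -1: no occurrence
```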
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the localeCompare method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the localeCompare method is used.
When the localeCompare method is called with argument that, it returns a Number other than NaN that represents the result of a locale-sensitive String comparison of the this value (converted to a String) with that (converted to a String). The two Strings are S and That.
The two Strings are compared in an implementation-defined fashion. The result is intended to order String values in the sort order specified by a host default locale, and will be negative, zero, or positive, depending on whether S comes before That in the sort order, the Strings are equal, or S comes after That in the sort order, respectively.
Before performing the comparisons, the following steps are performed to prepare the Strings:
The meanings of the optional second and third parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.
The localeCompare method, if considered as a function of two arguments this and that, is a consistent comparison function (as defined in 22.1.3.24) on the set of all Strings.
The actual return values are implementation-defined to permit implementers to encode additional information in the value, but the function is required to define a total ordering on all Strings. This function must treat Strings that are canonically equivalent according to the Unicode standard as identical and must return 0 when comparing Strings that are considered canonically equivalent.
The length property of the localeCompare method is 1.
NOTE 1 The localeCompare method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.
NOTE 2 This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the rules of the host environment's current locale. However, regardless of the host provided comparison capabilities, this function must treat Strings that are canonically equivalent according to the Unicode standard as identical. It is recommended that this function should not honour Unicode compatibility equivalences or decompositions. For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms (http://www.unicode.org/reports/tr15/) and Unicode Technical Note #5, Canonical Equivalence in Applications (http://www.unicode.org/notes/tn5/). Also see Unicode Technical Standard #10, Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).
NOTE 3 The localeCompare function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
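Because localeCompare takes its first operand as the this value, one common way to use it with Array.prototype.sort is to wrap it in a two-argument function. The following is only a sketch; the word list is hypothetical, and the exact ordering of arbitrary Strings remains implementation-defined. Plain lowercase ASCII words are assumed here to sort alphabetically in the host's default locale.

```javascript
const words = ["pear", "apple", "fig"];

// Wrap localeCompare so that sort receives a function of two arguments.
const byLocale = (a, b) => a.localeCompare(b);

// Copy before sorting so the original array is left untouched.
const sorted = words.slice().sort(byLocale);
console.log(sorted.join(", ")); // apple, fig, pear
```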
When the match method is called with argument regexp, the following steps are taken:
NOTE The match function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
When the normalize method is called with one argument form, the following steps are taken:
If form is not provided or is undefined, form defaults to "NFC". If form (converted to a String) is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
The length property of the normalize method is 0.
NOTE The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
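A small sketch of the four forms and of the error case. It assumes a host with full Unicode normalization data; the sample characters are chosen for illustration.

```javascript
const composed = "\u00E9";     // LATIN SMALL LETTER E WITH ACUTE as one code point
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT

const toNfd = composed.normalize("NFD");      // canonical decomposition
const toNfc = decomposed.normalize();         // no argument: "NFC" is used
const ligature = "\uFB01".normalize("NFKC");  // compatibility mapping of the fi ligature

// Any other form name is rejected.
let formError = null;
try {
  composed.normalize("NFX");
} catch (e) {
  formError = e;
}

console.log(toNfd === decomposed, toNfc === composed); // true true
console.log(ligature);                                 // fi
console.log(formError instanceof RangeError);          // true
```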
The following steps are taken:
NOTE 1 This method creates a String consisting of the code units of the this object (converted to String) repeated count times.
NOTE 2 The repeat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
When the replace method is called with arguments searchValue and replaceValue the following steps are taken:
NOTE The replace function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
The abstract operation GetSubstitution performs the following steps:
$ replacements are done left-to-right, and, once such a replacement is performed, the new replacement text is not subject to further replacements.
Code units | Unicode Characters | Replacement text |
---|---|---|
0x0024, 0x0024 | $$ | $ |
0x0024, 0x0026 | $& | matched |
0x0024, 0x0060 | $` | If position is 0, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index 0 and whose last code unit is at index position - 1. |
0x0024, 0x0027 | $' | If tailPos ≥ stringLength, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index tailPos and continues to the end of str. |
0x0024, N where 0x0031 ≤ N ≤ 0x0039 | $n where n is one of 1 2 3 4 5 6 7 8 9 and $n is not followed by a decimal digit | The nth element of captures, where n is a single digit in the range 1 to 9. If n≤m and the nth element of captures is undefined, use the empty String instead. If n>m, the result is implementation-defined. |
0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | $nn where n is one of 0 1 2 3 4 5 6 7 8 9 | The nnth element of captures, where nn is a two-digit decimal number in the range 01 to 99. If nn≤m and the nnth element of captures is undefined, use the empty String instead. If nn is 00 or nn>m, the result is implementation-defined. |
0x0024 | $ in any context that does not match any of the above. | $ |
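The substitution symbols in the table can be observed through String.prototype.replace. The inputs below are arbitrary examples; each comment names the table row being exercised.

```javascript
const dollar = "abc".replace("b", "$$");     // "$$" inserts a literal "$"
const matched = "abc".replace("b", "[$&]");  // "$&" inserts the matched substring
const before = "abc".replace("b", "$`");     // "$`" inserts the text before the match
const after = "abc".replace("b", "$'");      // "$'" inserts the text after the match

// "$1" and "$2" insert the first and second captures.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

console.log(dollar, matched, before, after); // a$c a[b]c aac acc
console.log(swapped);                        // Smith, John
```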
When the search method is called with argument regexp, the following steps are taken:
NOTE The search function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
The slice method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength+start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength+end where sourceLength is the length of the String. The result is a String value, not a String object. The following steps are taken:
The length property of the slice method is 2.
NOTE The slice function is intentionally generic; it does not require that its this value be a String object. Therefore it can be transferred to other kinds of objects for use as a method.
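The treatment of negative arguments can be illustrated with a String of length 6, so that sourceLength is 6 throughout. The sample String is arbitrary.

```javascript
const letters = "abcdef";

const middle = letters.slice(1, 3);      // "bc": indices 1 and 2
const tail = letters.slice(-2);          // "ef": start is treated as 6 + (-2) = 4
const trimmedEnd = letters.slice(2, -1); // "cde": end is treated as 6 + (-1) = 5
const empty = letters.slice(4, 1);       // "": slice does not swap its arguments
```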
Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any substring in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.
When the split method is called, the following steps are taken:
The length property of the split method is 2.
NOTE 1 The value of separator may be an empty String, an empty regular expression, or a regular expression that can match an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) If separator is a regular expression, only the first match at a given index of the this String is considered, even if backtracking could yield a non-empty-substring match at that index. (For example, "ab".split(/a*?/) evaluates to the array ["a","b"], while "ab".split(/a*/) evaluates to the array ["","b"].)
If the this object is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,
"A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/)
evaluates to the array:
["A", undefined, "B", "bold", "/", "B", "and",
undefined,
"CODE", "coded", "/", "CODE", ""]
If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.
NOTE 2 The split function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
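The cases discussed in NOTE 1 can be checked directly. The first two and the last expressions are taken from the note itself; the remaining inputs are illustrative.

```javascript
const lazy = "ab".split(/a*?/);         // ["a", "b"]: the first match at index 0 is empty
const greedy = "ab".split(/a*/);        // ["", "b"]: the first match at index 0 is "a"
const units = "abc".split("");          // ["a", "b", "c"]: one code unit per element
const emptyBoth = "".split("");         // []: separator can match the empty String
const emptyInput = "".split(",");       // [""]: separator cannot match the empty String
const limited = "a,b,c".split(",", 2);  // ["a", "b"]: truncated to limit elements

// Captures, including undefined ones, are spliced into the output array.
const tagged = "A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/);
console.log(tagged.length, tagged[1]);  // 13 undefined
```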
The abstract operation SplitMatch takes three parameters, a String S, an integer q, and a String R, and performs the following steps in order to return either false or the end index of a match:
The following steps are taken:
The length property of the startsWith method is 1.
NOTE 1 This method returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at index position. Otherwise returns false.
NOTE 2 Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
NOTE 3 The startsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
The substring method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.
If either argument is NaN or negative, it is replaced with zero; if either argument is larger than the length of the String, it is replaced with the length of the String.
If start is larger than end, they are swapped.
The following steps are taken:
The length property of the substring method is 2.
NOTE The substring function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
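The clamping and swapping rules for substring can be sketched with a String of length 6. The sample String is arbitrary.

```javascript
const source = "abcdef";

const swapped = source.substring(4, 1);     // "bcd": start > end, so they are swapped
const negative = source.substring(-3, 2);   // "ab": a negative argument is replaced with 0
const notANumber = source.substring(NaN, 3); // "abc": NaN is replaced with 0
const tooLarge = source.substring(2, 100);  // "cdef": 100 is replaced with the length, 6
```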
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleLowerCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleLowerCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
The length property of the toLocaleLowerCase method is 0.
The meanings of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
NOTE The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleUpperCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleUpperCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function works exactly the same as toUpperCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
The length property of the toLocaleUpperCase method is 0.
The meanings of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
NOTE The toLocaleUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4. The following steps are taken:
The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasing.txt file that accompanies it).
NOTE 1 The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
NOTE 2 The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
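The asymmetry mentioned in NOTE 1 can be seen with U+00DF (LATIN SMALL LETTER SHARP S), whose uppercase mapping in SpecialCasing.txt is the two code points "SS". This is one illustrative case, not an exhaustive list.

```javascript
const sharpS = "\u00DF"; // ß

const upper = sharpS.toUpperCase();     // "SS": one code point maps to two
const roundTrip = upper.toLowerCase();  // "ss"
const lowerOnly = sharpS.toLowerCase(); // "ß" is already lowercase

console.log(sharpS.length, upper.length); // 1 2
console.log(roundTrip === lowerOnly);     // false
```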
When the toString method is called, the following steps are taken:
NOTE For a String object, the toString method happens to return the same thing as the valueOf method.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function behaves in exactly the same way as String.prototype.toLowerCase, except that code points are mapped to their uppercase equivalents as specified in the Unicode Character Database.
NOTE The toUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
The following steps are taken:
NOTE The trim function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
When the valueOf method is called, the following steps are taken:
When the @@iterator method is called it returns an Iterator object (25.1.1.2) that iterates over the code points of a String value, returning each code point as a String value. The following steps are taken:
The value of the name property of this function is "[Symbol.iterator]".
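Iteration by code point, as opposed to indexing by code unit, can be sketched as follows. The sample String is arbitrary.

```javascript
// "a", U+1D11E (two code units), "b"
const mixed = "a\u{1D11E}b";

// Spread uses the @@iterator method, so the surrogate pair stays together.
const pieces = [...mixed];

console.log(mixed.length);   // 4 code units
console.log(pieces.length);  // 3 code points
console.log(String.prototype[Symbol.iterator].name); // [Symbol.iterator]
```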
String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot.
String instances have a length property, and a set of enumerable properties with integer indexed names.
The number of elements in the String value represented by this String object.
Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
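The attributes of length and the enumerable integer-indexed properties can be inspected on a String object. This is a small sketch with an arbitrary value.

```javascript
const boxed = new String("abc");

// length is a non-writable, non-enumerable, non-configurable data property.
const lengthDescriptor = Object.getOwnPropertyDescriptor(boxed, "length");

// Only the integer-indexed properties are enumerable.
const indexKeys = Object.keys(boxed);

console.log(lengthDescriptor.value, lengthDescriptor.writable); // 3 false
console.log(indexKeys.join(","));                               // 0,1,2
```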
A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named constructor for String Iterator objects. Instead, String iterator objects are created by calling certain methods of String instance objects.
Several methods of String objects return Iterator objects. The abstract operation CreateStringIterator with argument string is used to create such iterator objects. It performs the following steps:
All String Iterator Objects inherit properties from the %StringIteratorPrototype% intrinsic object. The %StringIteratorPrototype% object is an ordinary object and its [[Prototype]] internal slot is the %IteratorPrototype% intrinsic object (25.1.2). In addition, %StringIteratorPrototype% has the following properties:
The initial value of the @@toStringTag property is the String value "String Iterator".
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
String Iterator instances are ordinary objects that inherit properties from the %StringIteratorPrototype% intrinsic object. String Iterator instances are initially created with the internal slots listed in Table 46.
Internal Slot | Description |
---|---|
[[IteratedString]] | The String value whose elements are being iterated. |
[[StringIteratorNextIndex]] | The integer index of the next string index to be examined by this iteration. |
A RegExp object contains a regular expression and the associated flags.
NOTE The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
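The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.
Pattern[U] ::
    Disjunction[?U]
Disjunction[U] ::
    Alternative[?U]
    Alternative[?U] | Disjunction[?U]
Alternative[U] ::
    [empty]
    Alternative[?U] Term[?U]
Term[U] ::
    Assertion[?U]
    Atom[?U]
    Atom[?U] Quantifier
Assertion[U] ::
    ^
    $
    \ b
    \ B
    ( ? = Disjunction[?U] )
    ( ? ! Disjunction[?U] )
Quantifier ::
    QuantifierPrefix
    QuantifierPrefix ?
QuantifierPrefix ::
    *
    +
    ?
    { DecimalDigits }
    { DecimalDigits , }
    { DecimalDigits , DecimalDigits }
Atom[U] ::
    PatternCharacter
    .
    \ AtomEscape[?U]
    CharacterClass[?U]
    ( Disjunction[?U] )
    ( ? : Disjunction[?U] )
SyntaxCharacter :: one of
    ^ $ \ . * + ? ( ) [ ] { } |
PatternCharacter ::
    SourceCharacter but not SyntaxCharacter
AtomEscape[U] ::
    DecimalEscape
    CharacterEscape[?U]
    CharacterClassEscape
CharacterEscape[U] ::
    ControlEscape
    c ControlLetter
    HexEscapeSequence
    RegExpUnicodeEscapeSequence[?U]
    IdentityEscape[?U]
ControlEscape :: one of
    f n r t v
ControlLetter :: one of
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
RegExpUnicodeEscapeSequence[U] ::
    [+U] u LeadSurrogate \u TrailSurrogate
    [+U] u LeadSurrogate
    [+U] u TrailSurrogate
    [+U] u NonSurrogate
    [~U] u Hex4Digits
    [+U] u{ HexDigits }
Each \u TrailSurrogate for which the choice of associated u LeadSurrogate is ambiguous shall be associated with the nearest possible u LeadSurrogate that would otherwise have no corresponding \u TrailSurrogate.
LeadSurrogate ::
    Hex4Digits [match only if the SV of Hex4Digits is in the inclusive range 0xD800 to 0xDBFF]
TrailSurrogate ::
    Hex4Digits [match only if the SV of Hex4Digits is in the inclusive range 0xDC00 to 0xDFFF]
NonSurrogate ::
    Hex4Digits [match only if the SV of Hex4Digits is not in the inclusive range 0xD800 to 0xDFFF]
IdentityEscape[U] ::
    [+U] SyntaxCharacter
    [+U] /
    [~U] SourceCharacter but not UnicodeIDContinue
DecimalEscape ::
    DecimalIntegerLiteral [lookahead ∉ DecimalDigit]
CharacterClassEscape :: one of
    d D s S w W
CharacterClass[U] ::
    [ [lookahead ∉ {^}] ClassRanges[?U] ]
    [ ^ ClassRanges[?U] ]
ClassRanges[U] ::
    [empty]
    NonemptyClassRanges[?U]
NonemptyClassRanges[U] ::
    ClassAtom[?U]
    ClassAtom[?U] NonemptyClassRangesNoDash[?U]
    ClassAtom[?U] - ClassAtom[?U] ClassRanges[?U]
NonemptyClassRangesNoDash[U] ::
    ClassAtom[?U]
    ClassAtomNoDash[?U] NonemptyClassRangesNoDash[?U]
    ClassAtomNoDash[?U] - ClassAtom[?U] ClassRanges[?U]
ClassAtom[U] ::
    -
    ClassAtomNoDash[?U]
ClassAtomNoDash[U] ::
    SourceCharacter but not one of \ or ] or -
    \ ClassEscape[?U]
ClassEscape[U] ::
    DecimalEscape
    b
    [+U] -
    CharacterEscape[?U]
    CharacterClassEscape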
RegExpUnicodeEscapeSequence :: u{ HexDigits }
It is a Syntax Error if the MV of HexDigits > 1114111.
A regular expression pattern is converted into an internal procedure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The internal procedure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.
A Pattern is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a "u". A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (6.1.4). In either context, “character value” means the numeric value of the corresponding non-encoded code point.
The syntax and semantics of Pattern are defined as if the source code for the Pattern was a List of SourceCharacter values where each SourceCharacter corresponds to a Unicode code point. If a BMP pattern contains a non-BMP SourceCharacter the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.
NOTE For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character) List consisting of the single code point 0x1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two element List consisting of the code units 0xD834 and 0xDD1E.
Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements are the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern UTF16Decode (see 10.1.2) must be used in producing a List consisting of a single pattern character, the code point U+1D11E.
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
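The observable difference between the two kinds of pattern can be sketched with the G CLEF example used above. The pattern . stands for exactly one "character" in the sense defined for each kind of pattern.

```javascript
const clef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF, String length 2

// BMP pattern: each code unit is a character, so one "." cannot cover the String.
const oneBmpChar = /^.$/.test(clef);    // false
const twoBmpChars = /^..$/.test(clef);  // true

// Unicode pattern: the surrogate pair is decoded into a single character.
const oneUnicodeChar = /^.$/u.test(clef); // true

console.log(oneBmpChar, twoBmpChars, oneUnicodeChar); // false true true
```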
The descriptions below use the following variables:
Input is a List consisting of all of the characters, in order, of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation Input[n] means the nth character of Input, where n can range between 0 (inclusive) and InputLength (exclusive).
InputLength is the number of characters in Input.
NcapturingParens is the total number of left capturing parentheses (i.e. the total number of times the Atom :: ( Disjunction ) production is expanded) in the pattern. A left capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( Disjunction ) production.
IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.
Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.
Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.
Furthermore, the descriptions below use the following internal data structures:
A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the Unicode flag. “All characters” means either all code unit values or all code point values, also depending upon the state of Unicode.
A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a List that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn’t been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
A MatchResult is either a State or the special token failure that indicates that the match failed.
A Continuation procedure is an internal closure (i.e. an internal procedure with some arguments already bound to values) that takes one State argument and returns a MatchResult result. If an internal closure references variables which are bound in the function that creates the closure, the closure uses the values that these variables had at the time the closure was created. The Continuation attempts to match the remaining portion (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.
A Matcher procedure is an internal closure that takes two arguments — a State and a Continuation — and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
An AssertionTester procedure is an internal closure that takes a State argument and returns a Boolean result. The assertion tester tests a specific condition (specified by the closure's already-bound arguments) against the current place in Input and returns true if the condition matched or false if not.
An EscapeValue is either a character or an integer. An EscapeValue is used to denote the interpretation of a DecimalEscape escape sequence: a character ch means that the escape sequence is interpreted as the character ch, while an integer n means that the escape sequence is interpreted as a backreference to the nth set of capturing parentheses.
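The Matcher and Continuation closures described above can be sketched in ordinary JavaScript. This is a minimal illustration, not the specification's algorithm: the names `FAILURE`, `makeMatchers`, `matchAt`, `char`, `seq`, and `alt` are hypothetical, a State is modelled as a plain object `{ endIndex, captures }`, and captures are carried along but never filled in.

```javascript
// FAILURE stands in for the specification's "failure" token.
const FAILURE = null;

// Builds Matcher factories over one fixed input string (the spec's Input).
function makeMatchers(input) {
  // Matches one literal character, then passes the advanced State to c.
  const char = (ch) => (x, c) =>
    x.endIndex < input.length && input[x.endIndex] === ch
      ? c({ endIndex: x.endIndex + 1, captures: x.captures })
      : FAILURE;

  // Sequence: m2 followed by c becomes the Continuation handed to m1.
  const seq = (m1, m2) => (x, c) => m1(x, (y) => m2(y, c));

  // Alternation: try m1 with the sequel first; on failure fall back to m2.
  const alt = (m1, m2) => (x, c) => {
    const r = m1(x, c);
    return r !== FAILURE ? r : m2(x, c);
  };

  return { char, seq, alt };
}

// Runs a matcher at a start index; the final Continuation returns its State.
function matchAt(input, build, index) {
  const matcher = build(makeMatchers(input));
  return matcher({ endIndex: index, captures: [] }, (y) => y);
}

// Roughly /(?:a|ab)c/ applied to "abc" at offset 0.
const sketchResult = matchAt(
  "abc",
  ({ char, seq, alt }) => seq(alt(char("a"), seq(char("a"), char("b"))), char("c")),
  0
);
console.log(sketchResult.endIndex); // 3
```

The left alternative `a` matches first, but its Continuation (`c`) fails at `b`, so the alternation falls back to `ab`. Backtracking falls out of the closures simply returning failure to their callers.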
The production Pattern :: Disjunction evaluates as follows:
NOTE A Pattern evaluates (“compiles”) to an internal procedure value. RegExp.prototype.exec
and other methods can then apply this procedure to a
String and an offset within the String to determine whether the pattern would match starting at exactly that offset
within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 21.2.2 are designed so that compiling a pattern may throw a SyntaxError
exception; on the other hand, once the pattern is successfully compiled, applying the resulting internal procedure to
find a match in a String cannot throw an exception (except for any host-defined exceptions that can occur anywhere such
as out-of-memory).
The production Disjunction :: Alternative evaluates by evaluating Alternative to obtain a Matcher and returning that Matcher.
The production Disjunction ::
Alternative |
Disjunction evaluates as
follows:
NOTE The |
regular expression operator separates two alternatives. The pattern
first tries to match the left Alternative (followed by the sequel of the regular expression); if
it fails, it tries to match the right Disjunction (followed by the sequel of the regular
expression). If the left Alternative, the right Disjunction, and the
sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing
parentheses inside a portion of the pattern skipped by |
produce undefined values instead of
Strings. Thus, for example,
/a|ab/.exec("abc")
returns the result "a"
and not "ab"
. Moreover,
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
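The two examples in the NOTE above can be checked directly. The snippet below is only a demonstration; the variable names are illustrative.

```javascript
// The left alternative wins even though the right one would match more.
const leftFirst = /a|ab/.exec("abc");
console.log(leftFirst[0]); // "a"

// Captures inside a skipped alternative are reported as undefined.
const nested = /((a)|(ab))((c)|(bc))/.exec("abc");
console.log(Array.from(nested));
// ["abc", "a", "a", undefined, "bc", undefined, "bc"]
```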
The production Alternative :: [empty] evaluates by returning a Matcher that takes two arguments, a State x and a Continuation c, and returns the result of calling c(x).
The production Alternative :: Alternative Term evaluates as follows:
NOTE Consecutive Terms try to simultaneously match consecutive portions of Input. If the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative.
The production Term :: Assertion evaluates by returning an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
The production Term :: Atom evaluates as follows:
The production Term :: Atom Quantifier evaluates as follows:
( Disjunction ) production is expanded prior to this production's Term plus the total number of Atom :: ( Disjunction ) productions enclosing this Term.
( Disjunction ) productions enclosed by this production's Atom.
The abstract operation RepeatMatcher takes eight parameters, a Matcher m, an integer min, an integer (or ∞) max, a Boolean greedy, a State x, a Continuation c, an integer parenIndex, and an integer parenCount, and performs the following steps:
NOTE 1 An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.
NOTE 2 If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n–1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n-1)st repetition of Atom and so on.
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns "abcde"
with
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns "abc"
.
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/,"$1")
which returns the gcd in unary notation "aaaaa"
.
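The greedy and non-greedy examples, and the unary gcd trick, can be reproduced as follows. The helper `unaryGcd` is a hypothetical wrapper added for illustration; it assumes both arguments are positive integers.

```javascript
// Greedy takes as many repetitions as possible, non-greedy as few as possible.
const greedy = /a[a-z]{2,4}/.exec("abcdefghi")[0];  // "abcde"
const lazy = /a[a-z]{2,4}?/.exec("abcdefghi")[0];   // "abc"

// Choice point ordering decides which alternative each iteration takes.
const ordered = Array.from(/(aa|aabaac|ba|b|c)*/.exec("aabaac")); // ["aaba", "ba"]

// (a+) is tried longest first, so the first length dividing both runs is the gcd.
function unaryGcd(m, n) {
  const unary = "a".repeat(m) + "," + "a".repeat(n);
  return unary.replace(/^(a+)\1*,\1+$/, "$1").length;
}

console.log(greedy, lazy, ordered, unaryGcd(10, 15)); // "abcde" "abc" ["aaba","ba"] 5
```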
NOTE 3 Step 5 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
because each iteration of the outermost *
clears all captured Strings contained in the quantified
Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.
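A short check of the capture-clearing behaviour described in NOTE 3 (variable names are illustrative):

```javascript
// Capture 4, (b+), took part in the second iteration ("bbbc") but not the last
// one ("ac"), so it is reported as undefined rather than "bbb".
const cleared = Array.from(/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac"));
console.log(cleared); // ["zaacbbbcac", "z", "ac", "a", undefined, "c"]
```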
NOTE 4 Step 1 of the RepeatMatcher's d closure states that, once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
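Both patterns from NOTE 4 terminate in practice. The sketch below only confirms that each call returns and shows the matched text.

```javascript
// An empty iteration of (a*) is not repeated, so the match ends immediately.
const emptyLoop = /(a*)*/.exec("b");
console.log(emptyLoop[0]); // ""

// (a*) captures the empty String here, and \1+ then matches empty exactly once.
const emptyBackref = Array.from(/(a*)b\1+/.exec("baaaac"));
console.log(emptyBackref); // ["b", ""]
```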
The production Assertion ::
^
evaluates by returning an internal AssertionTester closure that takes a State argument
x and performs the following steps when evaluated:
NOTE Even when the y
flag is used with a pattern, ^
always
matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.
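The interaction between `^` and the `y` flag can be observed by setting `lastIndex` by hand. The strings below are arbitrary examples.

```javascript
// Sticky matching starts at lastIndex, but ^ still requires the start of Input.
const anchored = /^foo/y;
anchored.lastIndex = 2;
const anchoredAtOffset = anchored.test("..foo"); // false

// Without ^, the same sticky match at offset 2 succeeds.
const unanchored = /foo/y;
unanchored.lastIndex = 2;
const unanchoredAtOffset = unanchored.test("..foo"); // true

// With Multiline, ^ also matches right after a line terminator.
const multiline = /^foo/my;
multiline.lastIndex = 3;
const afterNewline = multiline.test("..\nfoo"); // true
```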
The production Assertion ::
$
evaluates by returning an internal AssertionTester closure that takes a State argument
x and performs the following steps when evaluated:
The production Assertion ::
\
b
evaluates by returning an internal AssertionTester closure that
takes a State argument x and performs the following steps when evaluated:
The production Assertion ::
\
B
evaluates by returning an internal AssertionTester closure that
takes a State argument x and performs the following steps when evaluated:
The production Assertion ::
(
?
=
Disjunction )
evaluates as follows:
The production Assertion ::
(
?
!
Disjunction )
evaluates as follows:
The abstract operation IsWordChar takes an integer parameter e and performs the following steps:
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | _ |
The production Quantifier :: QuantifierPrefix evaluates as follows:
The production Quantifier ::
QuantifierPrefix ?
evaluates as follows:
The production QuantifierPrefix :: *
evaluates as follows:
The production QuantifierPrefix :: +
evaluates as follows:
The production QuantifierPrefix :: ?
evaluates as follows:
The production QuantifierPrefix :: {
DecimalDigits }
evaluates as follows:
The production QuantifierPrefix :: {
DecimalDigits ,
}
evaluates as follows:
The production QuantifierPrefix :: {
DecimalDigits ,
DecimalDigits }
evaluates as follows:
The production Atom :: PatternCharacter evaluates as follows:
The production Atom :: .
evaluates as follows:
The production Atom :: \
AtomEscape evaluates as follows:
The production Atom :: CharacterClass evaluates as follows:
The production Atom :: (
Disjunction )
evaluates as follows:
( Disjunction ) production is expanded prior to this production's Atom plus the total number of Atom :: ( Disjunction ) productions enclosing this Atom.
The production Atom :: ( ? : Disjunction ) evaluates as follows:
The abstract operation CharacterSetMatcher takes two arguments, a CharSet A and a Boolean flag invert, and performs the following steps:
The abstract operation Canonicalize takes a character parameter ch and performs the following steps:
String.prototype.toUpperCase using s as the this value.
NOTE 1 Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a nonzero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching internal procedure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.
NOTE 2 The form (?=
Disjunction )
specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before
matching the sequel. If Disjunction can match at the current position in several ways, only
the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?=
form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction
contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b
and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
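The lookahead examples from NOTE 2 behave as follows when run. This is a plain demonstration of the quoted results.

```javascript
// The lookahead matches without consuming input, but its capture is kept.
const peek = /(?=(a+))/.exec("baaabac");
console.log(peek.index, Array.from(peek)); // 1 ["", "aaa"]

// No backtracking into the lookahead: (a+) is fixed once the lookahead succeeds,
// so the match can only start where \1 fits after the b.
const fixedCapture = /(?=(a+))a*b\1/.exec("baaabac");
console.log(fixedCapture.index, Array.from(fixedCapture)); // 3 ["aba", "a"]
```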
NOTE 3 The form (?!
Disjunction )
specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before
matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them
only make sense from within Disjunction itself. Backreferences to these capturing parentheses
from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern
to succeed. For example,
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a
not immediately followed by some positive number n of a
's, a
b
, another n a
's (specified by the first \2
) and a c
. The second
\2
is outside the negative lookahead, so it matches against undefined and therefore always
succeeds. The whole expression returns the array:
["baaabaac", "ba", undefined, "abaac"]
NOTE 4 In case-insignificant matches when Unicode is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, "ß" (U+00DF) to "SS". It may however map a code point outside the Basic Latin range to a character within, for example, "ſ" (U+017F) to "s". Such characters are not mapped if Unicode is false. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as /[a-z]/i, but they will match /[a-z]/ui.
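The difference between the two canonicalization modes can be observed with the code points mentioned in the NOTE (constant names are illustrative):

```javascript
const longS = "\u017F";      // LATIN SMALL LETTER LONG S
const kelvinSign = "\u212A"; // KELVIN SIGN

// Without the u flag these characters are not mapped into Basic Latin.
const legacyLongS = /[a-z]/i.test(longS);       // false
const legacyKelvin = /[a-z]/i.test(kelvinSign); // false

// With the u flag, simple case folding maps them to "s" and "k".
const unicodeLongS = /[a-z]/iu.test(longS);       // true
const unicodeKelvin = /[a-z]/iu.test(kelvinSign); // true

// Simple folding never maps one code point to several, so ß does not match "SS".
const sharpS = /\u00DF/iu.test("SS"); // false
```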
The production AtomEscape :: DecimalEscape evaluates as follows:
The production AtomEscape :: CharacterEscape evaluates as follows:
The production AtomEscape :: CharacterClassEscape evaluates as follows:
NOTE An escape sequence of the form \
followed by a nonzero decimal number
n matches the result of the nth set of capturing parentheses (see 21.2.2.11). It is an error if the
regular expression has fewer than n capturing parentheses. If the regular expression has n or more
capturing parentheses but the nth one is undefined because it has not captured anything, then the
backreference always succeeds.
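A backreference to a group that did not participate can be seen succeeding in the sketch below. The out-of-range case is shown with the `u` flag because, as an assumption about current engines, non-Unicode patterns may instead treat such escapes as legacy octal escapes for web compatibility.

```javascript
// (a) did not participate, so \1 refers to undefined and matches trivially.
const undefinedRef = Array.from(/(?:(a)|b)\1/.exec("b"));
console.log(undefinedRef); // ["b", undefined]

// Referring to a second group when only one exists is rejected at compile time.
let tooFewGroups = false;
try {
  new RegExp("(a)\\2", "u");
} catch (err) {
  tooFewGroups = err instanceof SyntaxError;
}
console.log(tooFewGroups); // true
```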
The production CharacterEscape :: ControlEscape evaluates by returning the character according to Table 47.
ControlEscape | Character Value | Code Point | Unicode Name | Symbol |
---|---|---|---|---|
t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
n | 10 | U+000A | LINE FEED (LF) | <LF> |
v | 11 | U+000B | LINE TABULATION | <VT> |
f | 12 | U+000C | FORM FEED (FF) | <FF> |
r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
The production CharacterEscape :: c
ControlLetter evaluates as follows:
The production CharacterEscape :: HexEscapeSequence evaluates as follows:
The production CharacterEscape :: RegExpUnicodeEscapeSequence evaluates as follows:
The production CharacterEscape :: IdentityEscape evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u
LeadSurrogate \u
TrailSurrogate evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u
LeadSurrogate evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u
TrailSurrogate evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u
NonSurrogate evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u
Hex4Digits evaluates as follows:
The production RegExpUnicodeEscapeSequence :: u{
HexDigits }
evaluates as follows:
The production LeadSurrogate :: Hex4Digits evaluates as follows:
The production TrailSurrogate :: Hex4Digits evaluates as follows:
The production NonSurrogate :: Hex4Digits evaluates as follows:
The production DecimalEscape :: DecimalIntegerLiteral evaluates as follows:
The definition of “the MV of DecimalIntegerLiteral” is in 11.8.3.
NOTE If \
is followed by a decimal number n whose first digit is not
0
, then the escape sequence is considered to be a backreference. It is an error if n is greater
than the total number of left capturing parentheses in the entire regular expression. \0
represents the
<NUL> character and cannot be followed by a decimal digit.
The production CharacterClassEscape :: d
evaluates by returning the ten-element set of characters containing the characters
0
through 9
inclusive.
The production CharacterClassEscape :: D
evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: d
.
The production CharacterClassEscape :: s
evaluates by returning the set of characters containing the characters that are on the
right-hand side of the WhiteSpace (11.2) or LineTerminator (11.3) productions.
The production CharacterClassEscape :: S
evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: s
.
The production CharacterClassEscape :: w
evaluates by returning the set of characters containing the sixty-three characters:
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z |
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | _ |
The production CharacterClassEscape :: W
evaluates by returning the set of all characters not included in the set returned by CharacterClassEscape :: w
.
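A few probes make the membership of these sets concrete. The test characters are chosen for illustration only.

```javascript
// \d is limited to the ten ASCII digits.
const asciiDigit = /\d/.test("7");           // true
const arabicIndicDigit = /\d/.test("\u0663"); // false (ARABIC-INDIC DIGIT THREE)

// \s covers WhiteSpace and LineTerminator code points, not just ASCII space.
const noBreakSpace = /\s/.test("\u00A0");  // true
const lineSeparator = /\s/.test("\u2028"); // true

// \w is the sixty-three characters listed above; accented letters fall under \W.
const underscore = /\w/.test("_");        // true
const accented = /\w/.test("\u00E9");     // false
const accentedIsNonWord = /^\W$/.test("\u00E9"); // true
```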
The production CharacterClass :: [
ClassRanges ]
evaluates by evaluating ClassRanges to obtain a CharSet and returning that CharSet and the Boolean false.
The production CharacterClass :: [
^
ClassRanges ]
evaluates
by evaluating ClassRanges to obtain a CharSet and returning that CharSet and the Boolean
true.
The production ClassRanges :: [empty] evaluates by returning the empty CharSet.
The production ClassRanges :: NonemptyClassRanges evaluates by evaluating NonemptyClassRanges to obtain a CharSet and returning that CharSet.
The production NonemptyClassRanges :: ClassAtom evaluates as follows:
The production NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash evaluates as follows:
The production NonemptyClassRanges :: ClassAtom -
ClassAtom ClassRanges evaluates as follows:
The abstract operation CharacterRange takes two CharSet parameters A and B and performs the following steps:
The production NonemptyClassRangesNoDash :: ClassAtom evaluates as follows:
The production NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash evaluates as follows:
The production NonemptyClassRangesNoDash :: ClassAtomNoDash -
ClassAtom
ClassRanges evaluates as follows:
NOTE 1 ClassRanges can expand into single ClassAtoms and/or ranges of two ClassAtoms separated by dashes. In the latter case the ClassRanges includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is greater than the second ClassAtom's character value.
NOTE 2 Even if the pattern ignores case, the case of the two ends of a range is significant
in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i
matches only
the letters E
, F
, e
, and f
, while the pattern /[E-f]/i
matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols [
,
\
, ]
, ^
, _
, and `
.
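The effect of range endpoints under the `i` flag, as described in NOTE 2, can be checked with a few probes:

```javascript
// [E-F] with i: only E, F, e, f.
const narrowLower = /[E-F]/i.test("e"); // true
const narrowOther = /[E-F]/i.test("g"); // false
const narrowCaret = /[E-F]/i.test("^"); // false

// [E-f] spans U+0045..U+0066, which includes the symbols between Z and a.
const wideCaret = /[E-f]/i.test("^");    // true
const wideBacktick = /[E-f]/i.test("`"); // true
const wideZ = /[E-f]/i.test("z");        // true, because Z lies inside the range
```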
NOTE 3 A -
character can be treated literally or it can denote a range. It is
treated literally if it is the first or last character of ClassRanges, the beginning or end
limit of a range specification, or immediately follows a range specification.
The production ClassAtom :: -
evaluates by returning the CharSet containing the one character -
.
The production ClassAtom :: ClassAtomNoDash evaluates by evaluating ClassAtomNoDash to obtain a CharSet and returning that CharSet.
The production ClassAtomNoDash :: SourceCharacter but not one of \
or ]
or -
evaluates as follows:
The production ClassAtomNoDash :: \
ClassEscape evaluates as follows:
The production ClassEscape :: DecimalEscape evaluates as follows:
The production ClassEscape ::
b
evaluates as follows:
The production ClassEscape ::
-
evaluates as follows:
The production ClassEscape :: CharacterEscape evaluates as follows:
The production ClassEscape :: CharacterClassEscape evaluates as follows:
NOTE A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors.
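The two meanings of `\b` can be told apart as below. The error case is shown with the `u` flag because, as an assumption about current engines, non-Unicode patterns may accept `[\B]` as an identity escape for web compatibility.

```javascript
// Inside a class, \b is the backspace character (U+0008).
const classBackspace = /[\b]/.test("\b");  // true
const classNoBackspace = /[\b]/.test("a b"); // false

// Outside a class, \b is the word-boundary assertion.
const wordBoundary = /\b/.test("a b"); // true

// \B inside a class is rejected when the pattern is compiled in Unicode mode.
let classBigB = false;
try {
  new RegExp("[\\B]", "u");
} catch (err) {
  classBigB = err instanceof SyntaxError;
}
```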
The RegExp constructor is the %RegExp% intrinsic object and the initial value of the RegExp
property of
the global object. When RegExp
is called as a function rather than as a constructor, it creates and
initializes a new RegExp object. Thus the function call RegExp(…)
is equivalent to the
object creation expression new RegExp(…)
with the same arguments.
The RegExp
constructor is designed to be subclassable. It may be used as the value of an
extends
clause of a class definition. Subclass constructors that intend to inherit the specified
RegExp
behaviour must include a super
call to the RegExp
constructor to create and
initialize subclass instances with the necessary internal slots.
The following steps are taken:
"constructor"
)."source"
)."flags"
).
NOTE If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognized by RegExp, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.
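The escaping pitfall in the NOTE above is easy to reproduce. The variable names are illustrative.

```javascript
// "\d" in a string literal is just "d", so the backslash never reaches RegExp.
const lostEscape = new RegExp("\d+");
console.log(lostEscape.source);     // "d+"
console.log(lostEscape.test("42")); // false

// Doubling the backslash keeps the escape sequence in the pattern text.
const keptEscape = new RegExp("\\d+");
console.log(keptEscape.source);     // "\d+" (a backslash followed by d+)
console.log(keptEscape.test("42")); // true

// String.raw avoids the doubling by leaving backslashes untouched.
const rawEscape = new RegExp(String.raw`\d+`);
console.log(rawEscape.test("42"));  // true
```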
When the abstract operation RegExpAlloc with argument newTarget is called, the following steps are taken:
"%RegExpPrototype%"
, «[[RegExpMatcher]], [[OriginalSource]],
[[OriginalFlags]]»)."lastIndex"
, PropertyDescriptor {[[Writable]]: true, [[Enumerable]]: false,
[[Configurable]]: false}).
When the abstract operation RegExpInitialize with arguments obj, pattern, and flags is called, the following steps are taken:
"g"
, "i"
, "m"
,
"u"
, or "y"
or if it contains the same code unit more than once, throw a SyntaxError exception.
"u", let BMP be false; else let BMP be true.
"lastIndex", 0, true).
When the abstract operation RegExpCreate with arguments P and F is called, the following steps are taken:
When the abstract operation EscapeRegExpPattern with arguments P and F is called, the following occurs:
"u"
) equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not be identical to P; however, the internal procedure that would result from evaluating S as a Pattern (Pattern[U] if F contains "u") must behave identically to the internal procedure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
/ or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the String value formed by concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
The value of the [[Prototype]] internal slot of the RegExp constructor is the intrinsic object %FunctionPrototype% (19.2.3).
Besides the length
property (whose value is 2), the RegExp constructor has the following
properties:
The initial value of RegExp.prototype
is the intrinsic object %RegExpPrototype% (21.2.5).
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
RegExp[@@species]
is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
The value of the name
property of this function is "get [Symbol.species]"
.
NOTE RegExp prototype methods normally use their this
object's constructor
to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its
@@species property.
The RegExp prototype object is the intrinsic object %RegExpPrototype%. The RegExp prototype object is an ordinary object. It is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
The value of the [[Prototype]] internal slot of the RegExp prototype object is the intrinsic object %ObjectPrototype% (19.1.3).
NOTE The RegExp prototype object does not have a valueOf
property of its own;
however, it inherits the valueOf
property from the Object prototype object.
The initial value of RegExp.prototype.constructor
is the intrinsic object %RegExp%.
Performs a regular expression match of string against the regular expression and returns an Array object containing the results of the match, or null if string did not match.
The String ToString(string) is searched for an occurrence of the regular expression pattern as follows:
The abstract operation RegExpExec with arguments R and S performs the following steps:
"exec"
).
NOTE If a callable exec property is not found this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of exec.
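The dynamic lookup of `exec` can be observed with a subclass that counts calls. `CountingRegExp` and its `calls` property are hypothetical names used only for this sketch.

```javascript
// A subclass whose exec override records how often it is invoked.
class CountingRegExp extends RegExp {
  constructor(pattern, flags) {
    super(pattern, flags);
    this.calls = 0;
  }
  exec(str) {
    this.calls += 1;
    return super.exec(str);
  }
}

const counting = new CountingRegExp("b+");
const viaTest = counting.test("abbbc");   // test looks up and calls exec
const viaMatch = "abbbc".match(counting); // so does the @@match method
console.log(viaTest, viaMatch[0], counting.calls); // true "bbb" 2

// If exec is not callable, the built-in matching algorithm is used instead.
const fallback = /a/;
fallback.exec = 42;
console.log(fallback.test("a")); // true
```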
The abstract operation RegExpBuiltinExec with arguments R and S performs the following steps:
"lastIndex"
))."global"
))."sticky"
))."u"
, let fullUnicode be true, else let fullUnicode be
false."lastIndex"
,
0, true)."lastIndex"
, 0, true)."lastIndex"
,
e, true)."length"
property is n + 1."index"
,
matchIndex)."input"
,
S)."0"
,
matchedSubstr).
The abstract operation AdvanceStringIndex with arguments S, index, and unicode performs the following steps:
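The individual steps of AdvanceStringIndex are not reproduced in this text. The function below is a sketch of the behaviour as commonly understood: advance by one code unit, or by two when Unicode matching is on and a surrogate pair starts at index. The name `advanceStringIndex` is a hypothetical stand-in, not an API available to programs.

```javascript
// Returns the index of the next character after `index` in `s`.
function advanceStringIndex(s, index, unicode) {
  if (!unicode) return index + 1;
  if (index + 1 >= s.length) return index + 1;
  const first = s.charCodeAt(index);
  if (first < 0xd800 || first > 0xdbff) return index + 1; // not a lead surrogate
  const second = s.charCodeAt(index + 1);
  if (second < 0xdc00 || second > 0xdfff) return index + 1; // not a trail surrogate
  return index + 2; // skip the whole surrogate pair
}

const sample = "a\u{1F600}b"; // "a", one astral code point (two code units), "b"
console.log(advanceStringIndex(sample, 1, true));  // 3
console.log(advanceStringIndex(sample, 1, false)); // 2

// Observable effect: empty matches step over the pair only with the u flag.
console.log("\u{1F600}".match(/(?:)/gu).length); // 2
console.log("\u{1F600}".match(/(?:)/g).length);  // 3
```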
RegExp.prototype.flags
is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
"global"
))."g"
as the last code unit of result."ignoreCase"
))."i"
as the last code unit of result."multiline"
))."m"
as the last code unit of result."unicode"
))."u"
as the last code unit of result."sticky"
))."y"
as the last code unit of result.
RegExp.prototype.global is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
"g"
, return true.
RegExp.prototype.ignoreCase is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
"i"
, return true.
When the @@match method is called with argument string, the following steps are taken:
"global"
))."unicode"
))."lastIndex"
, 0,
true)."0"
))."lastIndex"
))."lastIndex"
,
nextIndex, true).
The value of the name property of this function is "[Symbol.match]".
NOTE The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.
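The role of @@match as a marker can be seen through `String.prototype.startsWith`, which rejects arguments that IsRegExp identifies as regular expressions. This is a small demonstration; the variable names are illustrative.

```javascript
const marker = /a/;

// A RegExp is rejected because its @@match property is truthy.
let rejected = false;
try {
  "/a/".startsWith(marker);
} catch (err) {
  rejected = err instanceof TypeError;
}

// Shadowing @@match with a falsy value opts the object out, so it is
// converted to the String "/a/" instead.
marker[Symbol.match] = false;
const accepted = "/a/".startsWith(marker);

// Conversely, any object with a truthy @@match is treated as a RegExp.
let lookalikeRejected = false;
try {
  "x".startsWith({ [Symbol.match]: true });
} catch (err) {
  lookalikeRejected = err instanceof TypeError;
}
console.log(rejected, accepted, lookalikeRejected); // true true true
```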
RegExp.prototype.multiline
is an accessor property whose set accessor function is
undefined. Its get accessor function performs the following steps:
"m"
, return true.
When the @@replace method is called with arguments string and replaceValue the following steps are taken:
"global"
))."unicode"
))."lastIndex"
, 0,
true)."0"
))."lastIndex"
))."lastIndex"
,
nextIndex, true)."length"
))."0"
))."index"
)).
The value of the name property of this function is "[Symbol.replace]".
When the @@search method is called with argument string, the following steps are taken:
"lastIndex"
)."lastIndex"
, 0,
true)."lastIndex"
,
previousLastIndex, true)."index"
).
The value of the name property of this function is "[Symbol.search]".
NOTE The lastIndex
and global
properties of this RegExp object are
ignored when performing the search. The lastIndex
property is left unchanged.
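A quick check that searching neither honours nor disturbs `lastIndex` (the values are arbitrary):

```javascript
const searcher = /b/g;
searcher.lastIndex = 2;

// The search starts from index 0 regardless of lastIndex and the g flag.
const position = "abcb".search(searcher);
console.log(position);           // 1
console.log(searcher.lastIndex); // 2, restored after the search
```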
RegExp.prototype.source
is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
NOTE 1 Returns an Array object into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the this value regular expression; these occurrences are not part of any substring in the returned array, but serve to divide up the String value.
The this value may be an empty regular expression or a regular expression that can match an empty String. In this case, the regular expression does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if the regular expression matches the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) Only the first match at a given index of the this String is considered, even if backtracking could yield a non-empty-substring match at that index. (For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a","b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)
If the string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,
/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")
evaluates to the array
["A",undefined,"B","bold","/","B","and",undefined,"CODE","coded","/","CODE",""]
If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.
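The cases described above can be exercised through `String.prototype.split`, which delegates to the @@split method. The inputs are the ones quoted in the text plus two small illustrative ones.

```javascript
// Only the first match at each index is considered.
const lazySplit = "ab".split(/a*?/);  // ["a", "b"]
const greedySplit = "ab".split(/a*/); // ["", "b"]

// Empty input: the result depends on whether the pattern can match "".
const emptyMatchable = "".split(/a*/); // []
const emptyUnmatchable = "".split(/a/); // [""]

// Captures, including undefined ones, are spliced into the output.
const tags = /<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>");

// limit truncates the output array.
const limited = "a,b,c".split(/,/, 2); // ["a", "b"]
```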
When the @@split
method is called, the following steps are taken:
"flags"
))."u"
, let unicodeMatching be true."y"
, let newFlags be flags."y"
."0"
,
S)."lastIndex"
,
q, true)."lastIndex"
))."length"
)).
The length property of the @@split method is 2.
The value of the name property of this function is "[Symbol.split]".
NOTE 2 The @@split
method ignores the value of the global
and
sticky
properties of this RegExp object.
RegExp.prototype.sticky
is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
"y"
, return true.
The following steps are taken:
"source"
))."flags"
))."/"
, pattern, and
"/"
, and flags.
NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
RegExp.prototype.unicode
is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
"u"
, return true.
RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[RegExpMatcher]], [[OriginalSource]], and [[OriginalFlags]]. The value of the [[RegExpMatcher]] internal slot is an implementation dependent representation of the Pattern of the RegExp object.
NOTE Prior to ECMAScript 2015, RegExp
instances were specified as having the own
data properties source
, global
, ignoreCase
, and multiline
. Those
properties are now specified as accessor properties of RegExp.prototype.
RegExp instances also have the following property:
The value of the lastIndex
property specifies the String index at which to start the next match. It is
coerced to an integer when used (see 21.2.5.2.2). This property shall have the
attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.
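The way `lastIndex` drives repeated matching with the `g` flag can be traced as follows. This is a minimal sketch; the input string is arbitrary.

```javascript
const scanner = /a/g;
const text = "aba";
const trace = [];

// Each successful exec records where it matched and where the next one starts.
let hit;
while ((hit = scanner.exec(text)) !== null) {
  trace.push([hit.index, scanner.lastIndex]);
}
console.log(trace);             // [[0, 1], [2, 3]]
console.log(scanner.lastIndex); // 0, reset after the failed match

// The property is writable, and its value is coerced to an integer when used.
scanner.lastIndex = "2";
const resumed = scanner.exec(text);
console.log(resumed.index, scanner.lastIndex); // 2 3
```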