JavaScript: RegExp Syntax
Regular Expression (aka regexp) syntax has 2 parts:
- regex → the text pattern to match.
- flags → used to tweak the regex meaning or how the regex function behaves. For example, ignore letter case.
RegExp Object Syntax
RegExp objects are created in 2 ways:
/regex/flags
- Literal expression. This is convenient.
RegExp(str, flagsStr)
- [see RegExp Constructor] This is more general, and can be used to construct a regex from a string on the fly.
[see RegExp Tutorial]
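A minimal sketch of the two forms, assuming a hypothetical search word held in a variable. The names literalRe, dynamicRe, and word are illustrative only. Note that backslashes must be doubled when the pattern is written as a string.

```javascript
// Literal form: pattern and flags are fixed in the source code.
const literalRe = /\bcat\b/i;

// Constructor form: the pattern is assembled from a string at run time.
// `word` is a hypothetical input, assumed to contain no regex metacharacters.
const word = "cat";
const dynamicRe = new RegExp("\\b" + word + "\\b", "i");

console.log(literalRe.source === dynamicRe.source); // true
console.log(literalRe.flags === dynamicRe.flags); // true
console.log(dynamicRe.test("A Cat sat.")); // true, the i flag ignores letter case
```

The constructor form is the one to reach for when part of the pattern is only known at run time.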
RegExp Flags
RegExp Syntax
Character Class
.
- Any character except newline characters: { \n, \r, \u2028, \u2029 }. If the dotAll flag s is used, it also matches newline characters.
const txt = "B3\nyes?";
console.log(...txt.matchAll(/.+/g));
// [ "B3" ] [ "yes?" ]
// with the dotAll flag s, the dot also matches the newline
console.log(...txt.matchAll(/.+/gs));
// [ "B3\nyes?" ]
[…]
- Any character between the brackets. Can include character classes such as \w.
console.log(..."cat cet cit cot cut".matchAll("c[aou]t"));
// [ "cat" ] [ "cot" ] [ "cut" ]
console.log(..."fire-brand y42".matchAll(/[-\w]+/g));
// [ "fire-brand" ] [ "y42" ]
[^…]
- Any character that is not one of the characters in the brackets.
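A small sketch of a negated character class, using a made-up sample string. The pattern matches runs of characters that are neither a lowercase vowel nor a space.

```javascript
// [^…] matches any single character NOT listed in the brackets.
const nonVowelRuns = "banana split".match(/[^aeiou ]+/g);
console.log(nonVowelRuns); // [ "b", "n", "n", "spl", "t" ]
```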
\w
- Any letter A to Z or a to z, digit 0 to 9, or low line _.
const x = "x2 a-b y_z 中文";
console.log(...x.matchAll(/(\w+)/g));
// [ "x2", "x2" ] [ "a", "a" ] [ "b", "b" ] [ "y_z", "y_z" ]
// \w does not match hyphen or Chinese characters, but does match digits and low line
\W
- Any character that is not \w.
\d
- Any ASCII digit 0 to 9. Example: "xyz123".search(/\d/) returns 3.
\D
- Any character that is not \d.
\s
- Any whitespace character. Whitespace includes space, tab, form feed, line feed, and other Unicode spaces. [see JavaScript: Whitespace Characters]
\S
- Any character that is not \s.
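The negated classes can be sketched on one sample string. The string and variable names below are made up for illustration.

```javascript
const sample = "id: 42\tok";

// \W+ : runs of non-word characters
const nonWord = sample.match(/\W+/g);
console.log(nonWord); // [ ": ", "\t" ]

// \D : delete everything that is not a digit
const digitsOnly = sample.replace(/\D/g, "");
console.log(digitsOnly); // 42

// \s+ : split on runs of whitespace (space and tab here)
const tokens = sample.split(/\s+/);
console.log(tokens); // [ "id:", "42", "ok" ]

// \S : index of the first non-whitespace character
const firstNonSpace = "   x".search(/\S/);
console.log(firstNonSpace); // 3
```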
Character Class by Unicode Property
\p{UnicodeProperty}
- Any character that has the given Unicode property. Requires the RegExp flag u. [see JavaScript: RegExp Unicode Property]
\P{x}
- (Note the uppercase P.) Negation of \p{x}.
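A short sketch of Unicode property classes, using the u flag. The sample string is made up. Script=Greek, Script=Han, and L (letter) are standard Unicode property names.

```javascript
const mixed = "abc αβγ 中文";

// \p{Script=Greek} : characters in the Greek script
const greek = mixed.match(/\p{Script=Greek}+/u)[0];
console.log(greek); // αβγ

// \p{Script=Han} : Chinese characters
const han = mixed.match(/\p{Script=Han}+/u)[0];
console.log(han); // 中文

// \P{L} : anything that is NOT a letter (here, the spaces)
const lettersOnly = mixed.replace(/\P{L}/gu, "");
console.log(lettersOnly); // abcαβγ中文
```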
Boundaries
^
- Beginning of string. If RegExp Flag m (multiline) is set, also matches the beginning of each line.
$
- End of string. If RegExp Flag m (multiline) is set, also matches the end of each line.
\b
- Word boundary. For a literal backspace, use [\b].
console.log("cats cat".search(/cat/) === 0);
console.log("cats cat".search(/\bcat\b/) === 5);
// all true
\B
- Not a word boundary.
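A sketch of how the multiline flag m changes ^ and $, plus one \B case. The sample strings are made up for illustration.

```javascript
const lines = "one\ntwo\nthree";

// Without m, ^ matches only at the start of the whole string.
const firstOnly = lines.match(/^\w+/g);
console.log(firstOnly); // [ "one" ]

// With m, ^ also matches at the start of each line.
const everyLine = lines.match(/^\w+/gm);
console.log(everyLine); // [ "one", "two", "three" ]

// Without m, $ matches only at the end of the whole string.
const lastOnly = lines.match(/\w+$/g);
console.log(lastOnly); // [ "three" ]

// \B: "cat" must be followed by another word character, so only "cats" qualifies.
const notAtBoundary = "cats cat".search(/cat\B/);
console.log(notAtBoundary); // 0
```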
Repetition
*
- Match previous pattern 0 or more times. Same as {0,}.
?
- Match previous pattern 0 or 1 time. Same as {0,1}.
+
- Match previous pattern 1 or more times. Same as {1,}.
console.log("278".search(/\d+/) === 0);
// true
{n}
- Match previous pattern exactly n times.
console.log("eat".search(/e{2}/) === -1);
console.log("feet".search(/e{2}/) === 1);
// all true
{n,}
- Match previous pattern n or more times.
{n,m}
- Match previous pattern at least n times and at most m times (inclusive). Do not put a space after the comma.
Note: these match as much as possible (greedy). For the non-greedy version, add a ? after them.
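A sketch contrasting greedy and non-greedy repetition, on made-up sample strings. The last two lines show that, without the u flag, a space inside the braces makes the braces plain text rather than a quantifier.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ runs to the last ">" in the string.
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // <b>bold</b> and <i>italic</i>

// Non-greedy: .+? stops at the first ">".
const lazy = html.match(/<.+?>/)[0];
console.log(lazy); // <b>

// {2,3} takes 3 when it can; {2,3}? settles for 2.
const greedyRange = "aaaaa".match(/a{2,3}/g);
console.log(greedyRange); // [ "aaa", "aa" ]
const lazyRange = "aaaaa".match(/a{2,3}?/g);
console.log(lazyRange); // [ "aa", "aa" ]

// With a space after the comma, the braces are matched literally.
const spaced = /a{2, 3}/;
console.log(spaced.test("aaa")); // false
console.log(spaced.test("a{2, 3}")); // true
```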
Alternate and Conditions
x|y
- Alternation. Match either x or y.
console.log("wildfire".search(/water|fire/) === 4);
// true
x(?=y)
- Lookahead assertion. Match x only if x is followed by y.
// replace all ab by abc, only if ab is followed by comma or period
console.log("abc, ab, ab.".replace(/ab(?=[\.,])/g, "abc") === "abc, abc, abc.");
// true
x(?!y)
- Negative lookahead assertion. Match x only if x is not followed by y.
(?<=y)x
- (JS2018) Lookbehind assertion. Match x only if y comes before x.
console.log(
  "sometimes somehow".replace(/(?<=some)how/, "one") === "sometimes someone",
);
// true
(?<!y)x
- (JS2018) Negative lookbehind assertion. Match x only if y does not come before x.
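The two negative assertions can be sketched as follows, on made-up sample strings. In both cases the asserted text is checked but is not part of the match.

```javascript
// Negative lookahead: uppercase "ab" only when NOT followed by comma or period.
const upper = "abc, ab, ab.".replace(/ab(?![.,])/g, "AB");
console.log(upper); // ABc, ab, ab.

const prices = "cost $40, qty 12";

// Lookbehind: digits that come right after a dollar sign.
const dollars = prices.match(/(?<=\$)\d+/g);
console.log(dollars); // [ "40" ]

// Negative lookbehind: digits NOT preceded by a dollar sign or another digit.
const plain = prices.match(/(?<![$\d])\d+/g);
console.log(plain); // [ "12" ]
```

The \d inside the negative lookbehind keeps the match from starting in the middle of "40".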
Capture Group, Back Reference
(…)
- Capture. A captured group can later be referenced by \n, where n is a digit. \1 is the first captured group.
console.log(/(\d{4}).+(\d{4})/.exec("born 1899, died 1960"));
// [ "1899, died 1960", "1899", "1960" ]
console.log(
  "born 1899, died 1960".replace(/.+(\d{4}).+(\d{4})/, "$1 to $2") === "1899 to 1960",
);
// true
(?<name>…)
- (JS2018) Named capture group. The group can be referred to by \k<name> in regex or $<name> in replacement string.
// match text where width and height are the same
console.log(
  /width="(?<w>\d+)" height="\k<w>"/.exec('width="300" height="300"'),
);
// [ 'width="300" height="300"', "300" ]
console.log(
  "lived from 1899 to 1960".replace(
    /.+(?<born>\d{4}).+(?<died>\d{4})/,
    "$<born> - $<died>",
  ),
);
// 1899 - 1960
(?:…)
- Group for precedence, without capturing.
\n
- The nth captured group. \1 is the first captured group.
\k<name>
- Refer to the named capture group name.
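A sketch of a numbered back reference, a non-capturing group, and reading a named group from the match result. The sample strings and variable names are made up.

```javascript
// \1 must repeat exactly what group 1 captured: finds a doubled word.
const doubled = /\b(\w+) \1\b/.exec("this is is a test");
console.log(doubled[0]); // is is
console.log(doubled[1]); // is

// (?:…) groups "ab" for the + quantifier but is not counted as a capture.
const grouped = /(?:ab)+(c)/.exec("ababc");
console.log(grouped.length); // 2
console.log(grouped[1]); // c

// Named groups are also available on the .groups property of the result.
const dateMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec("2018-06");
console.log(dateMatch.groups.year); // 2018
console.log(dateMatch.groups.month); // 06
```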
Escapes for Literal Characters
\0
- the NUL character (ASCII 0) [see ASCII Characters]
\t
- horizontal tab (common tab char)
\n
- line feed (unix newline char)
\v
- vertical tab (rarely used)
\f
- form feed (often used in emacs as code section break)
\r
- carriage return (used in Mac OS Classic as newline)
\xxx
- A character with 2-digit hexadecimal code xx. For example, /\x61/ matches the letter “a” (ASCII code 97, hexadecimal 61).
\uxxxx
- A Unicode character with hexadecimal code xxxx. It must be 4 digits. Add 0 in front if not. For example, /\u03b1/ matches “α” (codepoint 945, hexadecimal 3b1).
console.log("α".search(/\u03b1/) === 0);
console.log("α".search(/α/) === 0);
// all true
\cX
- An ASCII control character. For example, /\cJ/ matches the unix newline \n. [see ASCII Characters]
[\b]
- A backspace.
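A quick check of several of these escapes, as a sketch. The escapes inside the string literals are JavaScript string escapes for the same characters.

```javascript
const escapeChecks = [
  /\0/.test("\0"), // NUL character
  /\t/.test("a\tb"), // horizontal tab
  /\x61/.test("a"), // hexadecimal 61 is "a"
  /\cJ/.test("\n"), // control-J is line feed
  /[\b]/.test("\b"), // backspace, only inside brackets
  /\b/.test("\b"), // outside brackets, \b is a word boundary, so this is false
];
console.log(escapeChecks); // [ true, true, true, true, true, false ]
```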