JavaScript: RegExp Syntax

By Xah Lee. Date: . Last updated: .

Regular Expression (aka regex, regexp) syntax has 2 parts: the pattern and the flags.

Syntax

A regex object is created in one of 2 ways:

/pattern/flags
Literal expression. This is convenient.
RegExp(patternStr, flagsStr)
[see RegExp Constructor] This is more general, and can be used to construct regex from string on the fly.
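As a minimal sketch of the two ways to create a regex, the following builds the same pattern from a string and from a literal. The variable names and the sample text are made up for illustration. Note that in the string form, each backslash must be doubled, because the string literal consumes one level of escaping.

```javascript
// Build a regex at run time from a string; `word` is a hypothetical input.
const word = "cat";
const rgxFromStr = new RegExp("\\b" + word + "\\b", "g");

// The equivalent literal form, usable when the pattern is known in advance.
const rgxLiteral = /\bcat\b/g;

const sampleText = "cat concat cat";
const foundFromStr = sampleText.match(rgxFromStr);
const foundLiteral = sampleText.match(rgxLiteral);

console.log(foundFromStr);
// [ "cat", "cat" ]
console.log(rgxFromStr.source === rgxLiteral.source); // true
```

If the string comes from user input, characters such as . or * in it are treated as regex syntax, so they may need escaping first.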

[see RegExp Tutorial]

RegExp Flags

[see RegExp Flags]

RegExp Pattern Syntax

Character Class

.
Any character except newline characters: {\n, \r, \u2028, \u2029}.
If dotAll flag s is used, also match newline character.
const txt = `B3
yes?`;
const rgx = /.+/g;
console.log(...txt.matchAll(rgx));
// [ "B3" ] [ "yes?" ]
// with the dotAll flag s, the dot also matches newline
const txt2 = `B3
yes?`;
const rgx2 = /.+/gs;
console.log(...txt2.matchAll(rgx2));
// [ "B3\nyes?" ]
[]
Any character between the brackets. Can include character class in them such as \w.
console.log(
  ..."cat cet cit cot cut".matchAll(/c[aou]t/g),
);
// [ "cat" ] [ "cot" ] [ "cut" ]
console.log(..."fire-brand y42".matchAll(/[-\w]+/g));
// [ "fire-brand" ] [ "y42" ]
[^]
Any character that is not one of the characters in the brackets.
\w
Any ASCII letter A to Z or a to z, digit 0 to 9, or low line _.
const x = "x2 a-b y_z 中文";
console.log(...x.matchAll(/(\w+)/g));
// [ "x2", "x2" ] [ "a", "a" ] [ "b", "b" ] [ "y_z", "y_z" ]
// \w does not match hyphen or Chinese characters, but does match digits and low line
\W
Any character that is not \w.
\d
Any ASCII digit 0 to 9. Example: "xyz123".search(/\d/) returns 3.
\D
Any character that's not \d.
\s
Any whitespace character. Whitespace includes space, tab, form feed, line feed, and other Unicode spaces. [see JS: Whitespace Characters]
\S
Any character that is not \s.
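The character classes above can be compared side by side on one string. This is a small sketch; the sample string and variable names are made up for illustration.

```javascript
// sample contains letters, a hyphen, digits, a space, a low line, and a tab
const sample = "a-1 b_2\tz";

// [^...] matches any character not listed in the brackets
const notLetters = sample.match(/[^a-z]/g);
console.log(notLetters);
// [ "-", "1", " ", "_", "2", "\t" ]

// \W is the complement of \w (low line counts as a word character)
const nonWord = sample.match(/\W/g);
console.log(nonWord);
// [ "-", " ", "\t" ]

// \d and its complement \D
const digits = sample.match(/\d/g);
const nonDigitRuns = sample.match(/\D+/g);
console.log(digits);
// [ "1", "2" ]
console.log(nonDigitRuns);
// [ "a-", " b_", "\tz" ]

// \s and its complement \S
const spaces = sample.match(/\s/g);
const chunks = sample.match(/\S+/g);
console.log(spaces);
// [ " ", "\t" ]
console.log(chunks);
// [ "a-1", "b_2", "z" ]
```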

Boundaries

^
Beginning of string. If the multiline RegExp Flag m is set, also match beginning of each line.
$
End of string. If the multiline RegExp Flag m is set, also match end of each line.
\b
word boundary. For literal backspace, use [\b].
console.log("cats cat".search(/cat/) === 0);
console.log("cats cat".search(/\bcat\b/) === 5);
// all true
\B
Not word boundary.
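A short sketch of how the multiline flag m changes what ^ and $ match, and of \B. The sample strings and variable names are made up for illustration.

```javascript
const lines = "one\ntwo\nthree";

// Without the m flag, ^ anchors only at the start of the whole string
const startNoM = lines.match(/^\w+/g);
console.log(startNoM);
// [ "one" ]

// With the m flag, ^ also matches at the start of each line
const startWithM = lines.match(/^\w+/gm);
console.log(startWithM);
// [ "one", "two", "three" ]

// With the m flag, $ also matches at the end of each line
const endWithM = lines.match(/\w+$/gm);
console.log(endWithM);
// [ "one", "two", "three" ]

// \B matches where there is no word boundary:
// "cat" preceded by another word character
const insideWord = "concat cat".search(/\Bcat/);
console.log(insideWord === 3); // true
```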

Repetition

*
Match previous pattern 0 or more times. Same as {0,}.
?
Match previous pattern 0 or 1 time. Same as {0,1}.
+
Match previous pattern 1 or more times. Same as {1,}.
console.log("278".search(/\d+/) === 0); // true
{n}
Match previous pattern exactly n times.
console.log("eat".search(/e{2}/) === -1);
console.log("feet".search(/e{2}/)=== 1);
// all true
{n,}
Match previous pattern n or more times.
{n,m}
Match previous pattern at least n times and at most m times (inclusive). There must be no space inside the braces.

Note: these quantifiers are greedy. They match as much as possible. For the non-greedy version, add a ? after them.
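The difference between the greedy and non-greedy forms is easiest to see on a small input. The following is a sketch with made-up sample strings and variable names.

```javascript
// {n,} matches n or more; greedy, so it takes all four
const atLeastTwo = "aaaa".match(/a{2,}/)[0];
console.log(atLeastTwo === "aaaa"); // true

// {n,m} matches between n and m; greedy, so it takes the maximum 3
const twoToThree = "aaaa".match(/a{2,3}/)[0];
console.log(twoToThree === "aaa"); // true

// adding ? makes it non-greedy, so it stops at the minimum 2
const twoToThreeLazy = "aaaa".match(/a{2,3}?/)[0];
console.log(twoToThreeLazy === "aa"); // true

const tagged = "<b>x</b>";

// greedy .+ runs to the last > in the string
const greedyTag = tagged.match(/<.+>/)[0];
console.log(greedyTag === "<b>x</b>"); // true

// non-greedy .+? stops at the first >
const lazyTag = tagged.match(/<.+?>/)[0];
console.log(lazyTag === "<b>"); // true
```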

Alternate and Conditions

x|y
Alternate. Match either x or y.
console.log("wildfire".search(/water|fire/) === 4); // true
x(?=y)
Look ahead assertion. Match only if x is followed by y. The y is not part of the match.
// replace all ab by abc, only if ab is followed by comma or period
console.log("abc, ab, ab.".replace(/ab(?=[\.,])/g, "abc") === "abc, abc, abc."); // true
x(?!y)
Negative look ahead assertion. Match only if x is not followed by y.
(?<=y)x
(JS2018) Look behind assertion.
Match only if y comes before x.
console.log(
  "sometimes somehow".replace(/(?<=some)how/, "one") === "sometimes someone",
); // true
(?<!y)x
(JS2018) Negative Look behind assertion.
Match only if y does not come before x.
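The two negative assertions can be sketched as follows. The sample strings and variable names are made up for illustration.

```javascript
// x(?!y): replace "ab" only when it is NOT followed by a comma or period
const negAhead = "abc, ab, ab.".replace(/ab(?![.,])/g, "AB");
console.log(negAhead === "ABc, ab, ab."); // true

// (?<!y)x: find numbers that are NOT preceded by a dollar sign.
// The \b keeps the regex from matching the tail of "10" or "30".
const negBehind = "$10 20 $30".match(/(?<!\$)\b\d+/g);
console.log(negBehind);
// [ "20" ]
```

In both cases the asserted text is only checked, never consumed, so it does not appear in the match result.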

Capture Group, Back Reference

()
Capture. A captured group can be referenced later in the pattern by \n where n is a digit. \1 is the first captured group.
console.log(/(\d{4}).+(\d{4})/.exec("born 1899, died 1960"));
// [ "1899, died 1960", "1899", "1960" ]
console.log(
  "born 1899, died 1960".replace(/.+(\d{4}).+(\d{4})/, "$1 to $2") ===
    "1899 to 1960",
); // true
(?<name>)
(JS2018) Named capture group. The group can be referred to by \k<name> in regex or $<name> in replacement string.
// match text where width and height are the same
console.log(
  /width="(?<w>\d+)" height="\k<w>"/.exec('width="300" height="300"'),
);
// [ 'width="300" height="300"', "300" ]
console.log(
  "lived from 1899 to 1960".replace(
    /.+(?<born>\d{4}).+(?<died>\d{4})/,
    "$<born> - $<died>",
  ),
);
// 1899 - 1960
(?:)
Non-capturing group. Groups a pattern for priority (precedence), but does not capture it.
\n
The nth captured group. \1 is the first captured group.
\k<name>
Refer to named capture group name.
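A brief sketch of the non-capturing group and the numbered back reference, neither of which has an example above. The sample strings and variable names are made up for illustration.

```javascript
// (?:ab)+ repeats "ab" as a unit but creates no capture,
// so the digit is the first (and only) captured group
const nonCap = /(?:ab)+(\d)/.exec("ababab7");
console.log(nonCap);
// [ "ababab7", "7" ]

// with a plain ( ), the repeated group is captured too
// (it holds the last repetition)
const withCap = /(ab)+(\d)/.exec("ababab7");
console.log(withCap);
// [ "ababab7", "ab", "7" ]

// \1 inside the pattern matches the same text the first group captured;
// here it finds a doubled word
const doubled = /\b(\w+) \1\b/.exec("it is is fine");
console.log(doubled);
// [ "is is", "is" ]
```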

Unicode Property

\p{PropertyValue}
Match a char that has property PropertyValue. Requires the RegExp Flag u (or v). [see RegExp Unicode Property]
\p{PropertyName=PropertyValue}
match a char whose PropertyName is PropertyValue
\p{BinaryPropertyName}
Match a char that has the binary property BinaryPropertyName.
\P{...}
(Note, uppercase P) Negation of \p{...}.
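A minimal sketch of each form, assuming a JavaScript engine with Unicode property escape support (JS2018 or later). The u flag is required for \p and \P to work. The sample strings and variable names are made up for illustration.

```javascript
const mixed = "x2 中文 αβ";

// \p{PropertyValue}: L is a General_Category value meaning "letter"
const letterRuns = mixed.match(/\p{L}+/gu);
console.log(letterRuns);
// [ "x", "中文", "αβ" ]

// \p{PropertyName=PropertyValue}
const hanChars = mixed.match(/\p{Script=Han}/gu);
const greekRun = mixed.match(/\p{Script=Greek}+/gu);
console.log(hanChars);
// [ "中", "文" ]
console.log(greekRun);
// [ "αβ" ]

// \p{BinaryPropertyName}
const upper = "aBc".match(/\p{Uppercase}/gu);
console.log(upper);
// [ "B" ]

// \P{...} is the negation: runs of non-letters
const nonLetterRuns = mixed.match(/\P{L}+/gu);
console.log(nonLetterRuns);
// [ "2 ", " " ]
```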

Escapes for Literal Characters

\0
the NUL character (ASCII 0) [see ASCII Table]
\t
horizontal tab (common tab char)
\n
line feed (unix newline char)
\v
vertical tab (rarely used)
\f
form feed (often used in emacs as code section break)
\r
carriage return (used in Mac OS Classic as newline)
\xxx
An ASCII character of hexadecimal code xx. It must be 2 digits. For example, /\x61/ matches the letter “a” (ASCII code 97, hexadecimal 61)
\uxxxx
a Unicode character with hexadecimal code xxxx. It must be 4 digits. Add 0 in front if not. For example, /\u03b1/ matches “α” (codepoint 945, hexadecimal 3b1).
console.log("α".search(/\u03b1/) === 0);
console.log("α".search(/α/) === 0);
// all true
\cX
An ASCII control character. For example, /\cJ/ matches the unix newline \n. [see ASCII Table]
[\b]
a backspace.
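Several of the escapes above can be checked directly against the characters they stand for. This is a small sketch; the variable names are made up for illustration.

```javascript
// \xhh: hexadecimal 61 is the letter "a"
const hexA = /\x61/.test("a");

// \cJ: control-J is the line feed character
const ctrlJ = /\cJ/.test("\n");

// \t: horizontal tab
const tab = /\t/.test("a\tb");

// \0: the NUL character
const nul = /\0/.test("\u0000");

// [\b]: backspace (U+0008). Outside brackets, \b means word boundary.
const backspace = /[\b]/.test("\u0008");

console.log(hexA, ctrlJ, tab, nul, backspace);
// true true true true true
```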
