JS: RegExp Syntax

By Xah Lee. Date: . Last updated: .

Regular Expression (aka regexp) syntax has 2 parts:

RegExp Object Syntax

RegExp objects are created in 2 ways:

/regex/flags

Literal expression. This is convenient. [see RegExp Tutorial]

RegExp(str, flagsStr)

This is more general, and can be used to construct a regex from a string at runtime. [see RegExp Constructor]

💡 TIP: if your regex contains a slash, use the RegExp(str, flagsStr) form, so you don't need a backslash to escape the slash. e.g. "text containing url".match(RegExp("https?://[a-z]+", "g")). If your regex contains many backslashes, e.g. \d, use the /regex/flags form, so you don't have to escape each backslash inside a string.
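A minimal sketch of the two forms side by side. It shows that inside the string passed to RegExp(str, flagsStr) the backslash itself has to be doubled, and it builds a pattern from text only known at runtime. The helper escapeRegExp is hypothetical, written for this example; it is not a built-in.

```javascript
// Literal form and constructor form of the same pattern.
const lit = /\d+/g;
// Inside a JS string, "\\d" is the two characters \d.
const built = RegExp("\\d+", "g");

console.log("a1 b22 c333".match(lit)); // [ "1", "22", "333" ]
console.log("a1 b22 c333".match(built)); // [ "1", "22", "333" ]
console.log(lit.source === built.source); // true

// Hypothetical helper: put a backslash before each regex special character,
// so the string is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Pattern built at runtime from some user-supplied text.
const userText = "1.5";
const re = RegExp(escapeRegExp(userText), "g");
console.log("1.5 105 1.5".match(re)); // [ "1.5", "1.5" ]
```

Without the escaping, the . in "1.5" would match any character, so "105" would also match.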

RegExp Flags

RegExp Syntax

Character Class

.

Any character except newline characters: {\n, \r, \u2028, \u2029}.

If RegExp Flag dotAll is used, . also matches newline characters.

console.log("abc efg".match(/.+/g));
// [ "abc efg" ]
console.log("abc\nefg".match(/.+/g));
// [ "abc", "efg" ]
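A short sketch of the dotAll flag, written s in a regex literal, on the same text:

```javascript
const text = "abc\nefg";

// Without the s (dotAll) flag, . stops at the newline.
console.log(text.match(/.+/g)); // [ "abc", "efg" ]

// With the s flag, . also matches the newline.
console.log(text.match(/.+/gs)); // [ "abc\nefg" ]
console.log(/./s.dotAll); // true
```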
[…]

Any one character between the brackets. Can include character classes such as \w.

console.log(
  "cat cet cit cot cut".match(/c[aou]t/g),
);
// [ "cat", "cot", "cut" ]
// match hyphen and any char of word class \w
console.log("fire-brand y42".match(/[-\w]+/g));
// [ "fire-brand", "y42" ]
[^…]

Any one character that is not one of the characters in the brackets.
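A few examples of a negated class, reusing the words from the [aou] example above:

```javascript
// Same words as the [aou] example, but negated.
console.log("cat cet cit cot cut".match(/c[^aou]t/g)); // [ "cet", "cit" ]

// Comma-separated fields: runs of anything except a comma.
console.log("a,bb,ccc".match(/[^,]+/g)); // [ "a", "bb", "ccc" ]

// ^ negates only when it comes first; elsewhere it is a literal ^.
console.log("a^b".match(/[b^]/g)); // [ "^", "b" ]
```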

\w

Any of A to Z, a to z, 0 to 9, and low line _.

// word syntax does not match hyphen, nor Chinese, but matches digit and low line

console.log("x2 a-b y_z 中文".match(/(\w+)/g));
// [ "x2", "a", "b", "y_z" ]
\W

Any character that is not \w.

\d

Any ASCII digit 0 to 9. e.g. "xyz123".match(/\d+/g) returns [ "123" ].

\D

Any character that's not \d.

\s

Any whitespace character. Whitespace includes space, tab, form feed, line feed, and other Unicode spaces. [see JS: Whitespace Characters]

\S

Any character that is not \s.
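One string run through each of these classes, to compare what they pick out:

```javascript
const s = "id: 42\tok";

console.log(s.match(/\d+/g)); // [ "42" ]
console.log(s.match(/\D+/g)); // [ "id: ", "\tok" ]
console.log(s.match(/\S+/g)); // [ "id:", "42", "ok" ]
console.log(s.match(/\W/g)); // [ ":", " ", "\t" ]

// Collapse any run of whitespace (here a space or a tab) into one space.
console.log(s.replace(/\s+/g, " ")); // id: 42 ok
```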

Character Class by Unicode Property

\p{UnicodeProperty}

[JS: RegExp Unicode Property]

\P{x}

(Note, uppercase P) Negation of \p{x}.
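\p{…} and \P{…} are only recognized when the u flag (or the newer v flag) is set. A small sketch using the standard property names L (Letter) and Script=Han, on the same text as the \w example:

```javascript
const s = "x2 a-b y_z 中文";

// Letters in any script; \w would miss the Chinese characters.
console.log(s.match(/\p{L}+/gu)); // [ "x", "a", "b", "y", "z", "中文" ]

// Only Han (Chinese) characters.
console.log(s.match(/\p{Script=Han}+/gu)); // [ "中文" ]

// \P{L}: any character that is not a letter.
console.log("a1b2".match(/\P{L}/gu)); // [ "1", "2" ]
```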

Boundaries

^
  • beginning of string.
  • If RegExp Flag multiline is set, also match beginning of lines.
$
  • end of string.
  • If RegExp Flag multiline is set, also match end of lines.
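A sketch of ^ and $ on a 3-line string, with and without the multiline flag (written m in a regex literal):

```javascript
const text = "one\ntwo\nthree";

// Without the m flag, ^ and $ match only at the start and end of the whole string.
console.log(text.match(/^\w+/g)); // [ "one" ]
console.log(text.match(/\w+$/g)); // [ "three" ]

// With the m flag, they also match at the start and end of each line.
console.log(text.match(/^\w+/gm)); // [ "one", "two", "three" ]
console.log(text.match(/\w+$/gm)); // [ "one", "two", "three" ]
```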
\b

word boundary. For literal backspace, use [\b].

console.log("1cats 2cat 3cat_dog".match(/.cat/g));
// [ "1cat", "2cat", "3cat" ]

// with boundary at end
console.log("1cats 2cat 3cat_dog".match(/.cat\b/g));
// [ "2cat" ]
\B

Not word boundary.
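The same string as in the \b example, now with \B, plus the bracketed [\b] form:

```javascript
// \B: "cat" must be followed by another word character.
console.log("1cats 2cat 3cat_dog".match(/.cat\B/g)); // [ "1cat", "3cat" ]

// Inside brackets, \b means the backspace character (code 8), not a boundary.
console.log(/[\b]/.test("a\u0008b")); // true
console.log(/[\b]/.test("a b")); // false
```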

Repetition

*

Match previous pattern 0 or more times. Same as {0,}.

console.log("<b>cat</b> <b >dog</b>".match(RegExp("<b *>[a-z]+</b>", "g")));
// [ "<b>cat</b>", "<b >dog</b>" ]
?

Match previous pattern 0 or 1 time. Same as {0,1}.

console.log("http://abc https://abc".match(RegExp("https?://[a-z]+", "g")));
// [ "http://abc", "https://abc" ]
+

Match previous pattern 1 or more times. Same as {1,}.

console.log("278 091 826".match(/\d+/g));
// [ "278", "091", "826" ]
{n}

Match previous pattern exactly n times.

console.log("eat feet".match(/e{2}/g));
// [ "ee" ]
{n,}

Match previous pattern n or more times.

{n,m}

Match previous pattern at least n times and at most m times (inclusive). Write it without spaces; {n, m} with a space is not treated as a repetition.

Note: these repetitions are greedy: they match as many times as possible. For the non-greedy (lazy) version, add a ? after them, e.g. *?, +?, ??, {n,m}?.
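A few examples of greedy versus non-greedy repetition, and of {n,} and {n,m}:

```javascript
const html = "<b>cat</b> <b>dog</b>";

// Greedy: .* runs as far as possible, to the last </b>.
console.log(html.match(/<b>.*<\/b>/g)); // [ "<b>cat</b> <b>dog</b>" ]

// Non-greedy: .*? stops at the first </b> that lets the match succeed.
console.log(html.match(/<b>.*?<\/b>/g)); // [ "<b>cat</b>", "<b>dog</b>" ]

// {n,}: 3 or more digits.
console.log("1 22 333 4444".match(/\d{3,}/g)); // [ "333", "4444" ]

// {n,m}: the greedy version takes 3, the non-greedy version takes 2.
console.log("aaaa".match(/a{2,3}/g)); // [ "aaa" ]
console.log("aaaa".match(/a{2,3}?/g)); // [ "aa", "aa" ]
```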

Alternate and Conditions

x|y

Alternate. Match either x or y.

console.log("wildfire and lots water".match(/water|fire/g));
// [ "fire", "water" ]
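The | has the lowest precedence of all regex operators, so it splits the whole pattern unless it sits inside a group. A short example:

```javascript
// This means (^cat) or (dog$), so it matches "catfish".
console.log(/^cat|dog$/.test("catfish")); // true

// Group the alternatives so both ^ and $ apply to each one.
console.log(/^(cat|dog)$/.test("catfish")); // false
console.log(/^(cat|dog)$/.test("dog")); // true

// Alternatives are tried left to right; the first one that matches wins.
console.log("fireworks".match(/fire|firework/)[0]); // fire
```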
x(?=y)

Look ahead assertion. Match x only if it is followed by y. The y is not part of the match.

// replace all ab by abc, only if ab is followed by comma or period
console.log("abc, ab, ab.".replace(/ab(?=[\.,])/g, "abc") === "abc, abc, abc."); // true
x(?!y)

Negative look ahead assertion. Match x only if it is not followed by y.
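The negated form of the look ahead example, plus a common trap: a repetition before the assertion can give up characters until the assertion succeeds.

```javascript
// Replace "ab" by "AB" only if it is NOT followed by a comma or period.
console.log("abc, ab, ab.".replace(/ab(?![.,])/g, "AB")); // ABc, ab, ab.

// Backtracking: \d+ gives back a digit, so "10" is followed by "0", not "px".
console.log("100px 200em".match(/\d+(?!px)/g)); // [ "10", "200" ]

// Also excluding a following digit avoids that.
console.log("100px 200em".match(/\d+(?!\d|px)/g)); // [ "200" ]
```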

(?<=y)x

(JS2018) Look behind assertion.

Match only if y comes before x.

console.log(
  "sometimes somehow".replace(/(?<=some)how/, "one") === "sometimes someone",
); // true
(?<!y)x

(JS2018) Negative Look behind assertion.

Match only if y does not come before x.
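The negated form of the look behind example above:

```javascript
// Replace "how" by "way", but not when it comes right after "some".
console.log("sometimes somehow anyhow".replace(/(?<!some)how/g, "way"));
// sometimes somehow anyway
```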

Capture Group, Back Reference

(…)

Capture. A captured group can later be referenced by \n, where n is a digit. \1 is the first captured group. In a replacement string, use $n instead, as in the example below.

console.log(/(\d{4}).+(\d{4})/.exec("born 1899, died 1960"));
// [ "1899, died 1960", "1899", "1960" ]
console.log(
  "born 1899, died 1960".replace(/.+(\d{4}).+(\d{4})/, "$1 to $2") ===
    "1899 to 1960",
); // true
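To get the captured groups of every match, String.prototype.matchAll (JS2020) can be used; it requires the g flag. A short sketch, together with $1 and $2 used to reorder text in a replacement:

```javascript
const text = "born 1899, died 1960";

// Each item is a match array: m[1] is the captured group, m.index its position.
for (const m of text.matchAll(/(\d{4})/g)) {
  console.log(m[1], m.index);
}
// 1899 5
// 1960 16

// Swap two words using the captured groups.
console.log("John Smith".replace(/(\w+) (\w+)/, "$2, $1")); // Smith, John
```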
(?<name>…)

(JS2018) Named capture group.

The group can be referred to by \k<name> in the regex, or by $<name> in a replacement string.

// match text where width and height are the same
console.log(
  /width="(?<w>\d+)" height="\k<w>"/.exec('width="300" height="300"'),
);
// [ 'width="300" height="300"', "300" ]
console.log(
  "lived from 1899 to 1960".replace(
    /.+(?<born>\d{4}).+(?<died>\d{4})/,
    "$<born> - $<died>",
  ),
);
// 1899 - 1960
(?:…)

Non-capturing group. Groups the pattern, e.g. for precedence or to repeat it, but does not capture it.
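A small comparison of a capturing and a non-capturing group on the same input. Note that a repeated capturing group keeps only its last repetition.

```javascript
// Capturing: (ab) is captured, so the result has 3 items.
const a = "ababab7".match(/(ab)+(\d)/);
console.log(a.length, a[1], a[2]); // 3 ab 7

// Non-capturing: (?:ab) is grouped for the +, but not captured.
const b = "ababab7".match(/(?:ab)+(\d)/);
console.log(b.length, b[1]); // 2 7
```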

\n

The nth captured group, where n is a digit. \1 is the first captured group. (Not to be confused with \n for line feed.)

\k<name>

Refer to the named capture group name.
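Back references inside the regex itself, one numbered and one named:

```javascript
// \1 must repeat whatever group 1 matched: find doubled words.
console.log("it is the the best best".match(/\b(\w+) \1\b/g));
// [ "the the", "best best" ]

// \k<q> must repeat the named group q: text inside matching quotes.
console.log(`say "hi" or 'yo'`.match(/(?<q>["']).*?\k<q>/g));
// [ '"hi"', "'yo'" ]
```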

Escapes for Literal Characters

\0

the NUL character (ASCII 0)

\t

horizontal tab (common tab char)

\n

line feed (unix newline char)

\v

vertical tab (rarely used)

\f

form feed (often used in emacs as code section break)

\r

carriage return (used in Mac OS Classic as newline)
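A few of these escapes in use. The split example handles both unix \n and Windows \r\n line endings.

```javascript
// \t matches a tab character.
console.log(/a\tb/.test("a\tb")); // true

// Split lines that end with either \n or \r\n.
console.log("one\r\ntwo\nthree".split(/\r?\n/)); // [ "one", "two", "three" ]

// \0 matches the NUL character. Do not follow \0 with another digit.
console.log(/\0/.test("a\u0000b")); // true
```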

\xxx

A character with hexadecimal code xx (exactly 2 hex digits, 00 to FF). e.g. /\x61/ matches the letter “a” (ASCII code 97, hexadecimal 61).

\uxxxx

A Unicode character with hexadecimal code xxxx. It must be exactly 4 digits; add 0 in front if needed. e.g. /\u03b1/ matches “α” (codepoint 945, hexadecimal 3b1).

console.log("greek α".match(/\u03b1/g));
// [ "α" ]
\cX

An ASCII control character, where X is a letter. For example, /\cJ/ matches the unix newline \n (control-J, code 10).

[\b]

A backspace character (code 8).
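A quick check of the code-based escapes. The \u{…} form is not in the list above; it needs the u flag and accepts any hex code point up to 10FFFF, so it can name characters beyond U+FFFF, such as the emoji 😀 (U+1F600).

```javascript
console.log(/\x61/.test("a")); // true
console.log(/\u03b1/.test("α")); // true

// With the u flag: code point beyond 4 hex digits.
console.log(/\u{1F600}/u.test("😀")); // true

// \cJ is control-J, code 10, the same as \n.
console.log(/\cJ/.test("\n")); // true

// [\b] is backspace, code 8.
console.log(/[\b]/.test("\u0008")); // true
```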
