JS: Regular Expression Syntax

By Xah Lee.

Character Class

.

Any character except newline characters: {\n, \r, \u2028, \u2029}.

If regex flag s is on, it also matches newline characters.

console.log("abc efg".match(/.+/g));
// [ "abc efg" ]
console.log(
`abc
efg`
.match(/.+/g));

// [ "abc", "efg" ]
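A minimal sketch of the effect of flag s, using the same two-line text as the example above. The variable names are only for illustration.

```javascript
// The same two-line text, written with an explicit \n.
const twoLines = "abc\nefg";

// Without flag s, the dot stops at the line feed.
const withoutS = twoLines.match(/.+/g);
console.log(withoutS);
// [ "abc", "efg" ]

// With flag s, the dot also matches the line feed,
// so the whole text is a single match.
const withS = twoLines.match(/.+/gs);
console.log(withS.length);
// 1
```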
[]

Any one of the characters between the brackets. The brackets can also contain a character class such as \w.

console.log(
  "cat cet cit cot cut".match(/c[aou]t/g),
);
// [ "cat", "cot", "cut" ]
// match hyphen and any char of word class \w
console.log("fire-brand y42".match(/[-\w]+/g));
// [ "fire-brand", "y42" ]
[^]

Any character that is not one of the characters in the brackets.
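A short sketch of the negated form, reusing the text from the bracket example above. The variable name is only for illustration.

```javascript
// Match c, then any one character that is not a, o, or u, then t.
const notAou = "cat cet cit cot cut".match(/c[^aou]t/g);
console.log(notAou);
// [ "cet", "cit" ]
```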

\w

Any letter A to Z or a to z, digit 0 to 9, or low line _. Same as [A-Za-z0-9_].

// \w does not match hyphen or Chinese characters, but matches digits and low line

console.log("x2 a-b y_z 中文".match(/(\w+)/g));
// [ "x2", "a", "b", "y_z" ]
\W

Any character that is not \w.

\d

Any digit 0 to 9.

\D

Any character that's not \d.

\s

Any whitespace character. Whitespace includes space, tab, form feed, line feed, and other Unicode spaces. 〔see JS: Whitespace Characters〕

\S

Any character that is not \s.
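The negated classes \W, \D, and \S can be sketched as follows. The sample strings and variable names are made up for illustration.

```javascript
// \W matches what \w does not: here the spaces and the hyphen.
const nonWord = "x2 a-b y_z".match(/\W+/g);
console.log(nonWord);
// [ " ", "-", " " ]

// \D matches runs of non-digits.
const nonDigit = "tel 555-0199".match(/\D+/g);
console.log(nonDigit);
// [ "tel ", "-" ]

// \S matches runs of non-whitespace, so this splits on space, tab, and line feed.
const nonSpace = "a b\tc\nd".match(/\S+/g);
console.log(nonSpace);
// [ "a", "b", "c", "d" ]
```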

Character Class by Unicode Property

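\p{UnicodeProperty}

Any character that has the given Unicode property, e.g. \p{L} for letters or \p{Script=Greek} for Greek script. Requires regex flag u (or v).

\P{x}

(Note, uppercase P) Negation of \p{x}.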
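A minimal sketch of Unicode property classes. It assumes a JavaScript engine that supports ECMAScript 2018 and uses regex flag u, which these classes need. The property names shown (L, Script=Han, Script=Greek) are just a few of the available ones, and the sample string is made up.

```javascript
const mixed = "x2 中文 αβ";

// Characters in the Han script.
const han = mixed.match(/\p{Script=Han}+/gu);
console.log(han);
// [ "中文" ]

// Characters in the Greek script.
const greek = mixed.match(/\p{Script=Greek}+/gu);
console.log(greek);
// [ "αβ" ]

// Any letter, in any script.
const letters = mixed.match(/\p{L}+/gu);
console.log(letters);
// [ "x", "中文", "αβ" ]

// Uppercase P negates: runs of characters that are not letters.
const nonLetters = mixed.match(/\P{L}+/gu);
console.log(nonLetters);
// [ "2 ", " " ]
```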

Boundaries

^ (at the beginning of regex)
  • beginning of string.
  • If regex flag m is on, also match beginning of lines.
console.log("something by someone".match(/^some\w+/g));
// [ "something" ]

console.log("something by someone".match(/some\w+/g));
// [ "something", "someone" ]
$ (at the end of regex)
  • end of string.
  • If regex flag m is on, also match end of lines.
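The effect of flag m on ^ and $ can be sketched like this. The three-line sample text and the variable names are made up for illustration.

```javascript
const lines = "some day\nsomething else\nawesome";

// Without flag m, ^ matches only at the beginning of the string.
const firstOnly = lines.match(/^some\w*/g);
console.log(firstOnly);
// [ "some" ]

// With flag m, ^ also matches at the beginning of each line.
const eachLine = lines.match(/^some\w*/gm);
console.log(eachLine);
// [ "some", "something" ]

// Without flag m, $ matches only at the end of the string.
const endOfString = lines.match(/\w+$/g);
console.log(endOfString);
// [ "awesome" ]

// With flag m, $ also matches at the end of each line.
const endOfLine = lines.match(/\w+$/gm);
console.log(endOfLine);
// [ "day", "else", "awesome" ]
```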
\b

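Word boundary: a position between a \w character and a non-\w character, or between a \w character and the start or end of the string. For a literal backspace, use [\b].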

console.log("Java is not JavaScript".match(/Java\b/g));
// [ "Java" ]

console.log("Java is not JavaScript".match(/Java/g));
// [ "Java", "Java" ]
\B

Not a word boundary: any position where \b does not match.
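A small sketch of \B, using the same text as the \b example above. The variable names are only for illustration.

```javascript
const phrase = "Java is not JavaScript";

// Java\B matches "Java" only when another word character follows it,
// so it skips the standalone word and finds the one inside "JavaScript".
const insideWord = phrase.match(/Java\B/g);
console.log(insideWord);
// [ "Java" ]

// The match starts at the second "Java", not the first.
const position = phrase.search(/Java\B/);
console.log(position);
// 12
```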

Repetition

*

Match previous pattern 0 or more times. Same as {0,}.

console.log("<b>cat</b> <b >dog</b>".match(RegExp("<b *>[a-z]+</b>", "g")));
// [ "<b>cat</b>", "<b >dog</b>" ]
?

Match previous pattern 0 or 1 time. Same as {0,1}.

console.log("http://abc https://abc".match(RegExp("https?://[a-z]+", "g")));
// [ "http://abc", "https://abc" ]
+

Match previous pattern 1 or more times. Same as {1,}.

console.log("278 091 826".match(/\d+/g));
// [ "278", "091", "826" ]
{n}

Match previous pattern exactly n times.

console.log("eat feet".match(/e{2}/g));
// [ "ee" ]
{n,}

Match previous pattern n or more times.

{n,m}

Match previous pattern at least n times and at most m times (inclusive). No space is allowed inside the braces.

Note: these quantifiers are greedy: they match as many repetitions as possible. For the non-greedy version, add a ? after them, e.g. *? or +?.
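A sketch of the counted forms and of greedy versus non-greedy matching. The sample strings and variable names are made up for illustration.

```javascript
const digits = "1 22 333 4444 55555";

// Two or more digits, as a whole word.
const twoOrMore = digits.match(/\b\d{2,}\b/g);
console.log(twoOrMore);
// [ "22", "333", "4444", "55555" ]

// Two to three digits, as a whole word.
const twoToThree = digits.match(/\b\d{2,3}\b/g);
console.log(twoToThree);
// [ "22", "333" ]

const html = "<b>cat</b> <b>dog</b>";

// Greedy: .+ runs to the last </b>.
const greedy = html.match(/<b>.+<\/b>/g);
console.log(greedy);
// [ "<b>cat</b> <b>dog</b>" ]

// Non-greedy: .+? stops at the first </b>.
const lazy = html.match(/<b>.+?<\/b>/g);
console.log(lazy);
// [ "<b>cat</b>", "<b>dog</b>" ]
```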

Alternate

x|y

Alternate. Match either x or y.

console.log("wildfire and lots water".match(/water|fire/g));
// [ "fire", "water" ]

Conditions

x(?=y)

Lookahead assertion. Match x only if it is immediately followed by y. The y part is not included in the match.

// replace all ab by abc, only if ab is followed by comma or period
console.log("abc, ab, ab.".replace(/ab(?=[\.,])/g, "abc") === "abc, abc, abc."); // true
x(?!y)

Negative lookahead assertion. Match x only if it is not immediately followed by y.
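A minimal sketch of negative lookahead. The sample text and the replacement word are made up for illustration.

```javascript
// Replace "Java" only when it is not followed by "Script".
const renamed = "Java is not JavaScript".replace(/Java(?!Script)/g, "Kotlin");
console.log(renamed);
// Kotlin is not JavaScript

// Match file names whose extension is not exactly "js".
const notJs = "file.txt file.js file.json".match(/file\.(?!js\b)\w+/g);
console.log(notJs);
// [ "file.txt", "file.json" ]
```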

(?<=y)x

(new in ECMAScript 2018) Look behind assertion.

Match x only if it is immediately preceded by y.

console.log(
  "sometimes somehow".replace(/(?<=some)how/, "one") === "sometimes someone",
); // true
(?<!y)x

(new in ECMAScript 2018) Negative Look behind assertion.

Match x only if it is not immediately preceded by y.
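A minimal sketch of negative lookbehind, assuming an engine that supports ECMAScript 2018. The sample string is made up for illustration.

```javascript
// Match numbers that are not preceded by a dollar sign.
// The \d in the lookbehind keeps the match from starting
// in the middle of a number such as "$100".
const noDollar = "$100 200 $300".match(/(?<![$\d])\d+/g);
console.log(noDollar);
// [ "200" ]
```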

Capture Group, Back Reference

()

Capture. A captured group can later be referenced in the regex by \n, where n is a positive integer. \1 is the first captured group.

A dollar sign is used in the replacement string to refer to a captured group. 〔see JS: Regex Replace String Dollar Sign〕

console.log("born 1899, died 1960".replace(/born (\d{4}), died (\d{4})/, "$1 to $2"));
// 1899 to 1960
(?:)

Group for priority (precedence) only, without capturing.
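A short sketch of the difference between a capturing and a non-capturing group. The sample strings and variable names are made up for illustration.

```javascript
// Two capturing groups: the result has the full match plus 2 groups.
const withCapture = /(\d+)-(\d+)/.exec("10-20");
console.log([withCapture.length, withCapture[1]]);
// [ 3, "10" ]

// The first group is non-capturing, so the second group becomes group 1.
const withoutCapture = /(?:\d+)-(\d+)/.exec("10-20");
console.log([withoutCapture.length, withoutCapture[1]]);
// [ 2, "20" ]

// A non-capturing group is handy for applying a quantifier to a sequence.
const tripled = "ababab abab".match(/(?:ab){3}/g);
console.log(tripled);
// [ "ababab" ]
```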

\n

Back reference. Matches the same text that the nth captured group matched, where n is a positive integer. \1 is the first captured group.
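A minimal sketch of a back reference inside a regex, here used to find repeated words. The sample sentence is made up for illustration.

```javascript
// (\w+) captures a word, and \1 must then match that same word again.
const doubled = "this is is a test test".match(/\b(\w+) \1\b/g);
console.log(doubled);
// [ "is is", "test test" ]
```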

Named capture group

(?<name>)

(new in ECMAScript 2018)

The group can be referred to by \k<name> in the regex or by $<name> in the replacement string.

// match text where width and height are the same
console.log(
  /width="(?<w>\d+)" height="\k<w>"/.exec('width="300" height="300"'),
);
// [ 'width="300" height="300"', "300" ]
console.log(
  "lived from 1899 to 1960".replace(
    /.+(?<born>\d{4}).+(?<died>\d{4})/,
    "$<born> - $<died>",
  ),
);
// 1899 - 1960
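Named groups are also available on the match result, under its groups property. A minimal sketch, with a regex and variable names made up for illustration:

```javascript
const life = /(?<born>\d{4}) to (?<died>\d{4})/.exec("lived from 1899 to 1960");

// Each named group is a property of life.groups.
console.log(life.groups.born);
// 1899
console.log(life.groups.died);
// 1960

// The captured values are strings, so convert before doing arithmetic.
const { born, died } = life.groups;
const age = Number(died) - Number(born);
console.log(age);
// 61
```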
\k<name>

Back reference to the named capture group name.

Escapes for Literal Characters

\0

the NUL character (ASCII 0)

\t

horizontal tab (common tab char)

\n

line feed (unix newline char)

\v

vertical tab (rarely used)

\f

form feed (often used in emacs as code section break)

\r

carriage return (used in Mac OS Classic as newline)

\xxx

A character with the 2-digit hexadecimal code xx. e.g. /\x61/ matches the letter “a” (ASCII code 97, hexadecimal 61).

\uxxxx

A Unicode character with hexadecimal code xxxx. It must be exactly 4 hexadecimal digits. Pad with leading zeros if the code is shorter. e.g. /\u03b1/ matches “α” (codepoint 945, hexadecimal 3b1).

console.log("greek α".match(/\u03b1/g));
// [ "α" ]
\cX

An ASCII control character, where X is a letter A to Z. For example, /\cJ/ matches the Unix newline \n.

[\b]

a backspace.
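The escapes in this section can be checked with a short sketch. Each test pairs an escape in a regex with the same character written in a string literal. The object and its key names are made up for illustration.

```javascript
const escapeChecks = {
  nul: /\0/.test("\0"),
  tab: /\t/.test("\t"),
  hex: /\x61/.test("a"),
  unicode: /\u03b1/.test("α"),
  control: /\cJ/.test("\n"),
  backspace: /[\b]/.test("\b"),
};
// Every check succeeds.
const allTrue = Object.values(escapeChecks).every((v) => v === true);
console.log(allTrue);
// true

// Outside brackets, \b is a word boundary, not a backspace,
// so it does not match a string that holds only a backspace.
const boundaryOnBackspace = /\b/.test("\b");
console.log(boundaryOnBackspace);
// false
```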
