JS: RegExp Syntax
Regular Expression (aka regexp) syntax has 2 parts:
- regex: specifies a text pattern.
- flags: tweak the meaning of the regex or how the regex function behaves, e.g. ignore letter case.
RegExp Object Syntax
A RegExp object can be created in 2 ways:
/regex/flags
-
Literal expression. This is convenient. [see RegExp Tutorial]
RegExp(str, flagsStr)
-
This is more general, and can be used to construct regex from string at runtime. [see RegExp Constructor]
💡 TIP:
If your regex contains a slash, use the
RegExp(str, flagsStr)
form, so that you do not need a backslash to escape the slash, e.g.
"text containing url".match(RegExp("https?://[a-z]+", "g"))
If your regex contains many backslashes, e.g.
\d
, use the
/regex/flags
form to avoid doubling each backslash inside a string.
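As a minimal sketch of the two forms side by side, the following builds the same pattern both ways. The sample string and variable names are made up for illustration.

```javascript
// Literal form: the slash is escaped, the backslash in \d is written once.
const literalRe = /\d+\/\d+/g;

// Constructor form: the slash needs no escape, but each backslash
// is doubled because the pattern is written as a string.
const constructedRe = RegExp("\\d+/\\d+", "g");

const sample = "ratios 3/4 and 10/25";
const fromLiteral = sample.match(literalRe);
const fromConstructor = sample.match(constructedRe);

console.log(fromLiteral); // [ "3/4", "10/25" ]
console.log(fromConstructor); // [ "3/4", "10/25" ]
```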
RegExp Flags
RegExp Syntax
Character Class
.
-
Any character except the newline characters \n, \r, \u2028, \u2029. If RegExp Flag dotAll is used, it also matches newline characters.
console.log("abc efg".match(/.+/g)); // [ "abc efg" ]
console.log("abc\nefg".match(/.+/g)); // [ "abc", "efg" ]
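A short sketch of the dotAll case, assuming the flag is written as s in the literal form. The variable names are illustrative only.

```javascript
const twoLines = "abc\nefg";

// Without dotAll, the dot stops at the newline.
const withoutDotAll = twoLines.match(/.+/g);
console.log(withoutDotAll); // [ "abc", "efg" ]

// With the dotAll flag (s), the dot also matches the newline.
const withDotAll = twoLines.match(/.+/gs);
console.log(withDotAll); // [ "abc\nefg" ]
```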
[…]
-
Any character between the brackets. The brackets can include a character class such as
\w
.
console.log("cat cet cit cot cut".match(/c[aou]t/g)); // [ "cat", "cot", "cut" ]
// match hyphen and any char of word class \w
console.log("fire-brand y42".match(/[-\w]+/g)); // [ "fire-brand", "y42" ]
[^…]
-
Any character that is not one of the characters in the brackets.
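For illustration, here is the same word list as in the bracket example, with the set negated. This is a sketch and the variable name is made up.

```javascript
// c, then any one character that is not a, o, or u, then t
const negated = "cat cet cit cot cut".match(/c[^aou]t/g);
console.log(negated); // [ "cet", "cit" ]
```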
\w
-
Any ASCII letter A to Z or a to z, any digit 0 to 9, or low line _.
// word class does not match hyphen nor Chinese characters, but matches digits and low line
console.log("x2 a-b y_z 中文".match(/(\w+)/g)); // [ "x2", "a", "b", "y_z" ]
\W
-
Any character that is not
\w
.
\d
-
Any ASCII digit 0 to 9. e.g.
"xyz123".match( /\d+/g )
\D
-
Any character that's not
\d
.
\s
-
Any whitespace character. Whitespace includes space, tab, form feed, line feed, and other Unicode spaces. [see JS: Whitespace Characters]
\S
-
Any character that is not
\s
.
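The negated classes can be compared on one input. This is a small sketch, and the sample string is an arbitrary choice.

```javascript
const input = "id: 42, ok";

const nonWord = input.match(/\W+/g); // runs of characters that are not \w
const nonDigit = input.match(/\D+/g); // runs of characters that are not \d
const nonSpace = input.match(/\S+/g); // runs of characters that are not \s

console.log(nonWord); // [ ": ", ", " ]
console.log(nonDigit); // [ "id: ", ", ok" ]
console.log(nonSpace); // [ "id:", "42,", "ok" ]
```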
Character Class by Unicode Property
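\p{UnicodeProperty}
-
Any character that has the given Unicode property. This works only if the RegExp Flag unicode (u) or unicodeSets (v) is set.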
\P{x}
-
(Note, uppercase P) Negation of
\p{x}
.
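A minimal sketch using the Letter property as an example. It assumes the unicode flag u is set, which these escapes require; the sample string is made up.

```javascript
const mixedText = "x2 a-b 中文 α";

// \p{Letter} matches letters of any script, not just ASCII.
const letters = mixedText.match(/\p{Letter}+/gu);
console.log(letters); // [ "x", "a", "b", "中文", "α" ]

// \P{Letter} (uppercase P) matches everything that is not a letter.
const nonLetters = mixedText.match(/\P{Letter}+/gu);
console.log(nonLetters); // [ "2 ", "-", " ", " " ]
```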
Boundaries
^
-
- beginning of string.
- If RegExp Flag multiline is set, also match beginning of lines.
$
-
- end of string.
- If RegExp Flag multiline is set, also match end of lines.
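The effect of the multiline flag (m) on ^ and $ can be sketched as follows. The g flag here only collects all matches; it does not change what ^ and $ mean. Variable names are illustrative.

```javascript
const threeLines = "one\ntwo\nthree";

// Without the multiline flag, ^ and $ anchor to the whole string.
const startOnly = threeLines.match(/^\w+/g);
const endOnly = threeLines.match(/\w+$/g);
console.log(startOnly); // [ "one" ]
console.log(endOnly); // [ "three" ]

// With the multiline flag, they also anchor to each line.
const eachStart = threeLines.match(/^\w+/gm);
const eachEnd = threeLines.match(/\w+$/gm);
console.log(eachStart); // [ "one", "two", "three" ]
console.log(eachEnd); // [ "one", "two", "three" ]
```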
\b
-
Word boundary. For a literal backspace, use
[\b]
.
console.log("1cats 2cat 3cat_dog".match(/.cat/g)); // [ "1cat", "2cat", "3cat" ]
// with boundary at end
console.log("1cats 2cat 3cat_dog".match(/.cat\b/g)); // [ "2cat" ]
\B
-
Not word boundary.
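A sketch of \B on the same sample string as the \b example, keeping only the matches where "cat" is followed by another word character:

```javascript
// with non-boundary at end
const notAtWordEnd = "1cats 2cat 3cat_dog".match(/.cat\B/g);
console.log(notAtWordEnd); // [ "1cat", "3cat" ]
```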
Repetition
*
-
Match previous pattern 0 or more times. Same as
{0,}
.
console.log("<b>cat</b> <b >dog</b>".match(RegExp("<b *>[a-z]+</b>", "g"))); // [ "<b>cat</b>", "<b >dog</b>" ]
?
-
Match previous pattern 0 or 1 time. Same as
{0,1}
.
console.log("http://abc https://abc".match(RegExp("https?://[a-z]+", "g"))); // [ "http://abc", "https://abc" ]
+
-
Match previous pattern 1 or more times. Same as
{1,}
.
console.log("278 091 826".match(/\d+/g)); // [ "278", "091", "826" ]
{n}
-
Match previous pattern exactly n times.
console.log("eat feet".match(/e{2}/g)); // [ "ee" ]
{n,}
-
Match previous pattern n or more times.
{n,m}
-
Match previous pattern at least n times and at most m times (inclusive). Do not put a space after the comma, or it is not treated as a repetition count.
Note: these are greedy: they match as many repetitions as possible. For the non-greedy version, add a ?
after them, e.g. +? or {n,}?
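The difference between greedy and non-greedy matching, plus a bounded count, can be sketched like this. The sample strings and variable names are made up for illustration.

```javascript
const markup = "<b>cat</b> <b>dog</b>";

// Greedy: .+ runs to the last </b> it can reach.
const greedy = markup.match(/<b>.+<\/b>/g);
console.log(greedy); // [ "<b>cat</b> <b>dog</b>" ]

// Non-greedy: .+? stops at the first </b>.
const lazy = markup.match(/<b>.+?<\/b>/g);
console.log(lazy); // [ "<b>cat</b>", "<b>dog</b>" ]

// {2,4}: whole numbers that are 2 to 4 digits long.
const bounded = "1 22 333 4444 55555".match(/\b\d{2,4}\b/g);
console.log(bounded); // [ "22", "333", "4444" ]
```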
Alternate and Conditions
x|y
-
Alternate. Match either x or y.
console.log("wildfire and lots water".match(/water|fire/g)); // [ "fire", "water" ]
x(?=y)
-
Look ahead assertion. Match x only if x is followed by y.
// replace all ab by abc, only if ab is followed by comma or period
console.log("abc, ab, ab.".replace(/ab(?=[\.,])/g, "abc") === "abc, abc, abc."); // true
x(?!y)
-
Negative look ahead assertion. Match x only if x is not followed by y.
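As a sketch, the negative form can be tried on the same string used for the look ahead example. The replacement text X is arbitrary.

```javascript
// replace ab by X, only if ab is not followed by comma or period
const negativeAhead = "abc, ab, ab.".replace(/ab(?![.,])/g, "X");
console.log(negativeAhead); // "Xc, ab, ab."
```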
(?<=y)x
-
(JS2018) Look behind assertion.
Match only if y comes before x.
console.log( "sometimes somehow".replace(/(?<=some)how/, "one") === "sometimes someone", ); // true
(?<!y)x
-
(JS2018) Negative Look behind assertion.
Match only if y does not come before x.
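A minimal sketch of the negative look behind, with an illustrative input string:

```javascript
// replace how by one, only if how is not preceded by some
const negativeBehind = "somehow anyhow".replace(/(?<!some)how/g, "one");
console.log(negativeBehind); // "somehow anyone"
```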
Capture Group, Back Reference
(…)
-
Capture. A captured group can later be referenced by
\n
where n is a digit.
\1
is the first captured group.
console.log(/(\d{4}).+(\d{4})/.exec("born 1899, died 1960")); // [ "1899, died 1960", "1899", "1960" ]
console.log( "born 1899, died 1960".replace(/.+(\d{4}).+(\d{4})/, "$1 to $2") === "1899 to 1960", ); // true
(?<name>…)
-
(JS2018) Named capture group.
The group can be referred to by
\k<name>
in regex or
$<name>
in replacement string.
// match text where width and height are the same
console.log(/width="(?<w>\d+)" height="\k<w>"/.exec('width="300" height="300"')); // [ 'width="300" height="300"', "300" ]
console.log( "lived from 1899 to 1960".replace( /.+(?<born>\d{4}).+(?<died>\d{4})/, "$<born> - $<died>", ), ); // 1899 - 1960
(?:β¦)
-
Non-capturing group. Groups a pattern for priority (precedence), but does not capture it.
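The following sketch shows both effects: grouping changes what a repetition applies to, and the group is not counted among the captures. Sample strings and names are made up.

```javascript
const letters = "ababab abbb";

// Without a group, + applies only to b.
const ungrouped = letters.match(/ab+/g);
console.log(ungrouped); // [ "ab", "ab", "ab", "abbb" ]

// With a non-capturing group, + applies to the whole ab.
const grouped = letters.match(/(?:ab)+/g);
console.log(grouped); // [ "ababab", "ab" ]

// (?:born|died) is not captured, so (\d+) is group 1.
const result = /(?:born|died) (\d+)/.exec("died 1960");
console.log(result[1]); // "1960"
console.log(result.length); // 2
```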
\n
-
The nth captured group.
\1
is the first captured group.
\k<name>
-
Refer to named capture group name.
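Back references inside a regex can be sketched with two small cases: finding a doubled word with \1, and matching a quoted string whose closing quote must equal its opening quote with a named group. The group name q and the sample strings are hypothetical.

```javascript
// \1 must repeat exactly what group 1 captured
const doubled = "this is is a test test".match(/\b(\w+) \1\b/g);
console.log(doubled); // [ "is is", "test test" ]

// \k<q> must repeat the quote character captured by the group named q
const quoted = `say "hi" or 'bye'`.match(/(?<q>["']).*?\k<q>/g);
console.log(quoted); // [ '"hi"', "'bye'" ]
```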
Escapes for Literal Characters
\0
-
the NUL character (ASCII 0)
\t
-
horizontal tab (common tab char)
\n
-
line feed (unix newline char)
\v
-
vertical tab (rarely used)
\f
-
form feed (often used in emacs as code section break)
\r
-
carriage return (used in Mac OS Classic as newline)
\xxx
-
An ASCII character with hexadecimal code xx. e.g.
/\x61/
matches the letter “a” (ASCII code 97, hexadecimal 61).
\uxxxx
-
A Unicode character with hexadecimal code xxxx. It must be 4 digits. Add 0 in front if not. e.g.
/\u03b1/
matches “α” (codepoint 945, hexadecimal 3b1).
console.log("greek α".match(/\u03b1/g)); // [ "α" ]
\cX
-
An ASCII control character. For example,
/\cJ/
matches the unix newline
\n
.
[\b]
-
a backspace.
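A few of these escapes can be checked with RegExp.prototype.test. This is a sketch; note that "\b" inside a string literal is the backspace character, while \b outside brackets in a regex is the word boundary.

```javascript
const hexEscape = /\x61/.test("a"); // \x61 is the letter a
const controlJ = /\cJ/.test("\n"); // control-J is line feed
const tabEscape = /\t/.test("a\tb"); // horizontal tab
const nulEscape = /\0/.test("\0"); // NUL character
const backspace = /[\b]/.test("\b"); // literal backspace
const boundary = /\b/.test("\b"); // word boundary: no word char here

console.log(hexEscape, controlJ, tabEscape, nulEscape); // true true true true
console.log(backspace); // true
console.log(boundary); // false
```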
JavaScript, Regular Expression
- JS: RegExp Tutorial
- JS: Regex Functions
- JS: RegExp Syntax
- JS: RegExp Flag
- JS: Regex Replace String Dollar Sign
- JS: Regex Replace Function Args
- JS: RegExp Object
- JS: RegExp Constructor
- JS: RegExp.prototype
- JS: String.prototype.search
- JS: String.prototype.match
- JS: String.prototype.matchAll
- JS: String.prototype.replace
- JS: String.prototype.replaceAll
- JS: RegExp.prototype.test
- JS: RegExp.prototype.exec