JS: Regex Unicode Property

By Xah Lee. Date: . Last updated: .

(new in ECMAScript 2018)

Each Unicode character has many properties, such as whether it is a capital letter, a Latin letter, punctuation, a math symbol ( ± ), an emoji, a Chinese character, etc.

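JavaScript regex lets you match characters by their Unicode properties. The regex must have the u flag; without it, \p{…} is not treated as a property escape.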
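To illustrate that one character carries many properties at once, here is a small sketch that tests a single character against a handful of property escapes. PROPERTY_TESTS and propertiesOf are illustrative names chosen for this example, not a built-in API. The last two lines show what happens when the u flag is missing: the pattern silently means something else.

```javascript
// Illustrative table of property escapes (the labels are arbitrary).
// Every pattern has the u flag; \p{…} is only a property escape in u mode.
const PROPERTY_TESTS = {
  "letter (L)": /\p{L}/u,
  "uppercase letter (Lu)": /\p{Lu}/u,
  "punctuation (P)": /\p{P}/u,
  "math symbol (Sm)": /\p{Sm}/u,
  "Latin script": /\p{Script=Latin}/u,
  "Greek script": /\p{Script=Greek}/u,
  "Han script": /\p{Script=Han}/u,
  "emoji presentation": /\p{Emoji_Presentation}/u,
};

// Return the labels of all properties in PROPERTY_TESTS that the character ch has.
function propertiesOf(ch) {
  return Object.keys(PROPERTY_TESTS).filter((label) => PROPERTY_TESTS[label].test(ch));
}

console.log(propertiesOf("Ω")); // [ "letter (L)", "uppercase letter (Lu)", "Greek script" ]
console.log(propertiesOf("±")); // [ "math symbol (Sm)" ]
console.log(propertiesOf("字")); // [ "letter (L)", "Han script" ]
console.log(propertiesOf("😸")); // [ "emoji presentation" ]

// Without the u flag, \p is just an escaped "p" and {L} is literal text.
console.log(/\p{L}/.test("Ω")); // false
console.log(/\p{L}/.test("p{L}")); // true
```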

Syntax:

\p{PropertyValue}

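match a char that has property PropertyValue. This lone form takes either a General_Category value, such as L (letter) or P (punctuation), or a binary property name, such as Emoji_Presentation.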

// get emoji chars
console.log("i ♥ 😸".match(RegExp("\\p{Emoji_Presentation}", "gu")));
// [ "😸" ]
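The emoji-related binary properties are easy to confuse. Here is a sketch comparing Emoji, Emoji_Presentation, and Extended_Pictographic on a made-up string. ♥ (U+2665) has the Emoji property but is shown as text by default, and Emoji also covers "#" and the digits. Extended_Pictographic is a later addition to the spec, so older engines may reject it. All three match one code point at a time, so a multi-code-point emoji (such as a flag) comes back in pieces.

```javascript
// Made-up sample: a letter, a heart suit, a cat emoji, "#" and a digit.
const emojiSample = "i ♥ 😸 #1";

// Emoji: every char with the Emoji property, which includes ♥ (U+2665), "#" and digits.
const emojiChars = emojiSample.match(/\p{Emoji}/gu);
// Emoji_Presentation: only chars displayed as emoji by default.
const presentationChars = emojiSample.match(/\p{Emoji_Presentation}/gu);
// Extended_Pictographic: pictographs, which excludes "#" and digits.
const pictographicChars = emojiSample.match(/\p{Extended_Pictographic}/gu);

console.log(emojiChars); // [ "♥", "😸", "#", "1" ]
console.log(presentationChars); // [ "😸" ]
console.log(pictographicChars); // [ "♥", "😸" ]
```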

\p{PropertyName=PropertyValue}

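match a char whose PropertyName is PropertyValue. Only General_Category (alias gc), Script (sc), and Script_Extensions (scx) can be written this way.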

// get Latin chars
console.log("abc ♥ 😸. $_$".match(RegExp("\\p{Script_Extensions=Latin}", "gu")));
// [ "a", "b", "c" ]
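A sketch of the PropertyName=PropertyValue form using the short names sc, scx, and gc, and a long General_Category value name. The helper compiles is a hypothetical name made up for this example, not part of any library; it reports whether a pattern is accepted with the u flag. In u mode, an unknown value or a binary property written with a value is a SyntaxError, not a silent non-match.

```javascript
// Made-up sample with Latin, Greek, Cyrillic, Hiragana, a symbol, and a currency sign.
const scriptSample = "ä α ж の ♥ $";

// Script (sc) and Script_Extensions (scx) take a script name as the value.
const greek = scriptSample.match(/\p{Script=Greek}/gu);
const cyrillic = scriptSample.match(/\p{sc=Cyrillic}/gu);
const hiragana = scriptSample.match(/\p{scx=Hiragana}/gu);

// General_Category (gc) also accepts this form, with short or long value names.
const currencyLong = scriptSample.match(/\p{General_Category=Currency_Symbol}/gu);
const currencyShort = scriptSample.match(/\p{gc=Sc}/gu);

console.log(greek, cyrillic, hiragana); // [ "α" ] [ "ж" ] [ "の" ]
console.log(currencyLong, currencyShort); // [ "$" ] [ "$" ]

// Hypothetical helper: true if the pattern source compiles with the u flag.
function compiles(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles("\\p{Script=Latin}")); // true
console.log(compiles("\\p{Emoji_Presentation=Yes}")); // false: binary property with a value
console.log(compiles("\\p{Script=Klingon}")); // false: not a Unicode script
```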
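
\p{BinaryPropertyName}

match a char that has the binary (yes/no) property BinaryPropertyName, such as Alphabetic or White_Space. The lone form also takes General_Category values, as in these examples: P (punctuation), Sc (currency symbol), L (letter).
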
// get punctuation chars
console.log("i ♥ 😸. $_$".match(RegExp("\\p{P}", "gu")));
// [ ".", "_" ]
// get currency symbols
console.log("i ♥ 😸. $_$".match(/\p{Sc}/gu));
// [ "$", "$" ]
// get chars that are Unicode letters
console.log("ä α ж の ♥ ⠮ 🦋 ∑ ° ⊕ +".match(/\p{L}/gu));
// [ "ä", "α", "ж", "の" ]
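The one-letter General_Category values above are the major categories; each one has two-letter subcategories, such as Lu (uppercase letter), Ll (lowercase letter), Nd (decimal digit), and No (other number). A sketch on a made-up string, also checking that \p{L} is shorthand for the long form \p{General_Category=Letter}:

```javascript
// Made-up sample: upper/lowercase letters, punctuation, digits, and a fraction.
const mixed = "Hello Ωmega! 42 ½";

const upper = mixed.match(/\p{Lu}/gu); // uppercase letters
const lower = mixed.match(/\p{Ll}/gu); // lowercase letters
const digits = mixed.match(/\p{Nd}/gu); // decimal digits only
const numbers = mixed.match(/\p{N}/gu); // any number, including ½ (No)

console.log(upper); // [ "H", "Ω" ]
console.log(lower); // [ "e", "l", "l", "o", "m", "e", "g", "a" ]
console.log(digits); // [ "4", "2" ]
console.log(numbers); // [ "4", "2", "½" ]

// \p{L} and \p{General_Category=Letter} match the same chars.
const shortForm = mixed.match(/\p{L}/gu).join("");
const longForm = mixed.match(/\p{General_Category=Letter}/gu).join("");
console.log(shortForm === longForm); // true
```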

\P{x}

(Note: uppercase P.) Negation of \p{x}: match a char that does not have the property x.
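A sketch of \P on the letter example string from above: replacing every non-letter with the empty string keeps only the letters. [^\p{L}] is the same set written as a negated character class, and \P also works with the PropertyName=PropertyValue form. Because of the u flag, 🦋 is removed as one code point rather than as two surrogate halves.

```javascript
// Same sample string as the \p{L} example.
const line = "ä α ж の ♥ ⠮ 🦋 ∑ ° ⊕ +";

// \P{L}: any char that is NOT a letter, so this strips all non-letters.
const lettersOnly = line.replace(/\P{L}/gu, "");
// A negated class containing \p{L} matches the same set.
const lettersOnlyClass = line.replace(/[^\p{L}]/gu, "");
// \P with name=value: remove every char that is not in the Latin script.
const latinOnly = line.replace(/\P{Script=Latin}/gu, "");

console.log(lettersOnly); // "äαжの"
console.log(lettersOnlyClass); // "äαжの"
console.log(latinOnly); // "ä"
```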
