JS: RegExp Unicode Property
New in JS2018
Each Unicode character has many properties. For example, whether it is a capital letter, a Latin letter, punctuation, a math symbol ( ± ∑ ∫ ), an emoji, a Chinese character, etc.
JavaScript regex lets you match characters by their Unicode properties. The regex must use the u flag.
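A small sketch of why the u flag matters. Without it, \p is not a property escape. For web compatibility, /\p{L}/ is read as the literal text "p{L}", so it silently matches the wrong thing instead of throwing an error.

```javascript
// Without the u flag, \p{L} is just the literal characters "p{L}"
const withoutU = /\p{L}/;
// With the u flag, \p{L} matches any Unicode letter
const withU = /\p{L}/u;

console.log(withoutU.test("ä"));    // false
console.log(withoutU.test("p{L}")); // true
console.log(withU.test("ä"));       // true
```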
Syntax:
\p{PropertyValue}
-
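match a char that has property PropertyValue. PropertyValue is either a binary property name (such as Emoji_Presentation) or a General_Category value (such as L, P, Sc).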
// get emoji chars
const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Emoji_Presentation}/gu);
console.log(yy.length === 1);
console.log(yy[0] === "😸");
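The example uses Emoji_Presentation, not Emoji, and the difference matters. Emoji is true for every char that can be shown as emoji, including ones that display as plain text by default, such as ♥ (U+2665), and even the ASCII digits 0 to 9, # and *. Emoji_Presentation is true only for chars that display as emoji by default. A quick comparison on the same string:

```javascript
const xx = "i ♥ 😸. $_$";

// Emoji: any char that CAN be an emoji, including text-style ♥
const anyEmoji = xx.match(/\p{Emoji}/gu);
// Emoji_Presentation: only chars shown as emoji by default
const emojiStyle = xx.match(/\p{Emoji_Presentation}/gu);

console.log(anyEmoji);   // [ "♥", "😸" ]
console.log(emojiStyle); // [ "😸" ]

// pitfall: ASCII digits have the Emoji property too
console.log(/\p{Emoji}/u.test("7")); // true
```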
\p{PropertyName=PropertyValue}
-
match a char whose PropertyName is PropertyValue
// get Latin chars
const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Script_Extensions=Latin}+/gu);
console.log(yy.length === 1);
console.log(yy[0] === "i");
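The PropertyName=PropertyValue form can also be built at run time with the RegExp constructor. Below is a sketch of a hypothetical helper, scriptOf (not a built-in), that reports which of a given list of scripts a char belongs to. The list of script names is an assumption chosen for the example; note the double backslash needed inside a JS string.

```javascript
// Hypothetical helper (not a built-in): return the first name in `scripts`
// whose Script_Extensions contains `ch`, or null if none match.
function scriptOf(ch, scripts) {
  for (const name of scripts) {
    // "\\p" in the string becomes \p in the regex; the u flag is required
    const re = new RegExp(`^\\p{Script_Extensions=${name}}$`, "u");
    if (re.test(ch)) return name;
  }
  return null;
}

const scripts = ["Latin", "Greek", "Cyrillic", "Hiragana"];
for (const ch of "äαжの♥") {
  console.log(ch, scriptOf(ch, scripts));
}
// ä Latin
// α Greek
// ж Cyrillic
// の Hiragana
// ♥ null
```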
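\p{GeneralCategoryValue}
-
match a char whose General_Category is GeneralCategoryValue, such as P (punctuation) or Sc (currency symbol). This is shorthand for \p{General_Category=GeneralCategoryValue}.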
// get punctuation chars
const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{P}/gu);
console.log(yy.length === 2);
console.log(yy[0] === ".");
console.log(yy[1] === "_");
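A common use of \p{P} is stripping punctuation from text, including non-ASCII marks such as « ». Here is a minimal sketch with a hypothetical helper name, stripPunctuation. Note that $ is not removed, because its General_Category is Sc (currency symbol), not P.

```javascript
// Hypothetical helper: remove every char in General_Category P
// (connector, dash, open, close, initial, final, other punctuation)
function stripPunctuation(s) {
  return s.replace(/\p{P}/gu, "");
}

console.log(stripPunctuation("Hello, world! «test»")); // "Hello world test"
console.log(stripPunctuation("Price: $5!"));           // "Price $5"
```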
// get currency symbols
const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Sc}/gu);
console.log(yy.length === 2);
console.log(yy[0] === "$");
console.log(yy[1] === "$");
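\p{Sc} combines with ordinary regex syntax. For example, here is a sketch that pulls out amounts written as a currency symbol followed by digits, for any currency symbol rather than only $. The input string is made up for illustration.

```javascript
const prices = "€5, ¥300, $20, £7";
// a currency symbol followed by one or more ASCII digits
// (\d stays ASCII-only even with the u flag)
const found = prices.match(/\p{Sc}\d+/gu);
console.log(found); // [ "€5", "¥300", "$20", "£7" ]
```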
\P{x}
-
(Note: uppercase P) Negation of \p{x}.
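A sketch of \P{x} in use: deleting every char that is not a letter. \P{L} matches the same chars as the class [^\p{L}], so both lines below give the same result. With the u flag, 😃 counts as one char and is removed whole.

```javascript
const xx = "ä α ж の ♥ ⠮ 😃 + 3";

// \P{L}: any char that is NOT a letter
const a = xx.replace(/\P{L}/gu, "");
// same thing written as a negated character class
const b = xx.replace(/[^\p{L}]/gu, "");

console.log(a);       // "äαжの"
console.log(a === b); // true
```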
Check If a Character is a Unicode Letter
// get characters that are Unicode letters
const xx = "ä α ж の ♥ ⠮ 😃 + 3";
const yy = xx.match(/\p{L}/gu);
// [ "ä", "α", "ж", "の" ]
console.log(yy.join("") === "äαжの");
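To test a single character rather than scan a string, anchor the pattern with ^ and $. Below is a minimal sketch using a hypothetical function name, isLetter. It assumes precomposed input: "ä" as one code point (U+00E4) passes, but "a" followed by a combining diaeresis (U+0308) is two chars and fails, since the combining mark is in category M, not L.

```javascript
// Hypothetical helper: true if `ch` is exactly one Unicode letter.
// The u flag makes 😃 count as one char instead of two surrogates.
function isLetter(ch) {
  return /^\p{L}$/u.test(ch);
}

console.log(isLetter("ä"));  // true
console.log(isLetter("の")); // true
console.log(isLetter("😃")); // false
console.log(isLetter("3"));  // false
console.log(isLetter("ab")); // false
```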