JS: RegExp Unicode Property

By Xah Lee. Date: . Last updated: .

New in JS2018

Each Unicode character has many properties. For example, whether it is a capital letter, a Latin letter, punctuation, a math symbol (such as ±), an emoji, a Chinese character, etc.

A JavaScript regex can match characters by their Unicode properties, using the \p{x} syntax. The regex must have the u flag.
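Here is a minimal sketch of building a property regex at run time with the RegExp constructor. The function name hasProperty is hypothetical, not a built-in. It also shows that, with the u flag set, an unknown property name is rejected with a SyntaxError instead of silently matching nothing.

```javascript
// hypothetical helper: true if ch is a single char that has the Unicode property prop.
// prop can be a lone name such as "L" or "Emoji_Presentation",
// or a name=value pair such as "Script=Greek".
function hasProperty(ch, prop) {
  // the u flag is required for \p{…}; ^ and $ make the whole string one code point
  const re = new RegExp(`^\\p{${prop}}$`, "u");
  return re.test(ch);
}

console.log(hasProperty("😸", "Emoji_Presentation")); // true
console.log(hasProperty("♥", "Emoji_Presentation")); // false
console.log(hasProperty("α", "Script=Greek")); // true

// with the u flag, an unknown property name is a SyntaxError
try {
  new RegExp("\\p{NoSuchProperty}", "u");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}
```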


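\p{PropertyValue}: match a char that has property PropertyValue. PropertyValue can be a binary property name (such as Emoji_Presentation) or a General_Category value (such as L for letter).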
// get emoji chars
const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Emoji_Presentation}/gu);
console.log(yy.length === 1);
console.log(yy[0] === "😸");
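\p{Emoji_Presentation} picks only chars that display as emoji by default, which is why ♥ is not matched above. The broader binary property \p{Emoji} also includes chars that merely have an emoji form, such as ♥, and even the ASCII digits (they are used in keycap emoji sequences). A small comparison, on the same string with a "3" appended for illustration:

```javascript
const xx = "i ♥ 😸. $_$ 3";

// Emoji_Presentation: only chars shown as emoji by default
console.log(xx.match(/\p{Emoji_Presentation}/gu)); // [ "😸" ]

// Emoji: also chars that have an emoji form, including ASCII digits
console.log(xx.match(/\p{Emoji}/gu)); // [ "♥", "😸", "3" ]
```

So \p{Emoji} is usually not what you want for "find the emoji in this text".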
\p{PropertyName=PropertyValue}: match a char whose PropertyName is PropertyValue (for example, Script_Extensions=Latin).
// get Latin chars

const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Script_Extensions=Latin}+/gu);

console.log(yy.length === 1);
console.log(yy[0] === "i");
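The same PropertyName=PropertyValue form works for any script. Script is the other script property name, and short aliases such as sc (for Script) and Hani (for Han) are also accepted. A sketch on an illustrative string:

```javascript
const xx = "abc αβγ жзи 中文";

// full names: PropertyName=PropertyValue
console.log(xx.match(/\p{Script=Greek}+/gu)); // [ "αβγ" ]
console.log(xx.match(/\p{Script_Extensions=Cyrillic}+/gu)); // [ "жзи" ]

// short aliases: sc for Script, Hani for Han (Chinese chars)
console.log(xx.match(/\p{sc=Hani}+/gu)); // [ "中文" ]
```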
// get punctuation chars. \p{P} is short for \p{General_Category=Punctuation}

const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{P}/gu);

console.log(yy.length === 2);
console.log(yy[0] === ".");
console.log(yy[1] === "_");
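The General_Category value P can be spelled several ways, and it covers all punctuation subcategories. Here "_" matches because it is Connector_Punctuation (Pc), while "." is Other_Punctuation (Po). A sketch:

```javascript
const xx = "i ♥ 😸. $_$";

// four equivalent spellings of "any punctuation"
const forms = [/\p{P}/gu, /\p{Punctuation}/gu, /\p{General_Category=P}/gu, /\p{gc=Punctuation}/gu];
for (const re of forms) {
  console.log(xx.match(re).join("")); // prints ._ each time
}

// subcategories
console.log(xx.match(/\p{Pc}/gu)); // [ "_" ]  Connector_Punctuation
console.log(xx.match(/\p{Po}/gu)); // [ "." ]  Other_Punctuation
```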
// get currency symbols. \p{Sc} is short for \p{General_Category=Currency_Symbol}

const xx = "i ♥ 😸. $_$";
const yy = xx.match(/\p{Sc}/gu);

console.log(yy.length === 2);
console.log(yy[0] === "$");
console.log(yy[1] === "$");
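\p{Sc} matches every currency symbol, not only $. A sketch on a made-up price list, also pulling out each symbol with the digits after it:

```javascript
const xx = "Prices: $5, €7, ¥900, £12";

// each currency symbol
console.log(xx.match(/\p{Sc}/gu)); // [ "$", "€", "¥", "£" ]

// a currency symbol followed by digits
console.log(xx.match(/\p{Sc}\d+/gu)); // [ "$5", "€7", "¥900", "£12" ]
```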
\P{x} (note the uppercase P): negation of \p{x}. Matches any char that does not have property x.
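For example, \P{L} matches every char that is not a letter, so replacing those with the empty string keeps only the letters. A negated class [^\p{L}] gives the same result. A sketch:

```javascript
const xx = "ä α ж の ♥ ⠮ 😃 + 3";

// remove every non-letter char (spaces, symbols, emoji, digits)
console.log(xx.replace(/\P{L}/gu, "")); // äαжの

// same result with a negated character class
console.log(xx.replace(/[^\p{L}]/gu, "")); // äαжの
```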

Check If a Character is a Unicode Letter

// get characters that are Unicode letters

const xx = "ä α ж の ♥ ⠮ 😃 + 3";

const yy = xx.match(/\p{L}/gu);
// [ "ä", "α", "ж", "の" ]

console.log(yy.join("") === "äαжの");
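A sketch of a hypothetical isLetter function built on \p{L}. One caveat: \p{L} tests single code points. A letter written as a base letter plus a combining mark (here "a" followed by U+0308 COMBINING DIAERESIS, which displays like "ä") matches only the base letter. Adding \p{M} (marks) to the class keeps the pair together, and normalizing to NFC first turns the pair into the single letter U+00E4.

```javascript
// hypothetical helper: true if str is exactly one Unicode letter (one code point)
function isLetter(str) {
  return /^\p{L}$/u.test(str);
}

console.log(isLetter("ж")); // true
console.log(isLetter("😃")); // false
console.log(isLetter("3")); // false

// "a" + U+0308 displays like "ä" but is two code points
const decomposed = "a\u0308";
console.log(decomposed.match(/\p{L}/gu)); // [ "a" ]
console.log(decomposed.match(/[\p{L}\p{M}]+/gu).length); // 1

// NFC normalization composes it into U+00E4, a single letter
console.log(isLetter(decomposed.normalize("NFC"))); // true
```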