JS: RegExp Unicode Property
New in JS2018
Each Unicode character has many properties. For example: whether it is uppercase, whether it is a Latin letter, whether it is a non-letter, whether it is punctuation, whether it is a math symbol, whether it is an emoji, whether it is a Chinese character, etc.
In JavaScript regex, you can match a character by its Unicode properties, provided the regex has the u flag.
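A minimal sketch of what the u flag changes. The variable names are only for illustration. Without the flag, \p is read as a plain letter p, so the pattern matches the literal text p{L} instead of a letter.

```javascript
// With the u flag, \p{L} means "any letter".
const withFlag = /\p{L}/u.test("π");

// Without the u flag, \p is just the letter p,
// so the pattern looks for the literal text "p{L}".
const withoutFlag = /\p{L}/.test("π");
const literalText = /\p{L}/.test("p{L}");

console.log(withFlag, withoutFlag, literalText); // true false true
```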
Syntax:
\p{PropertyValue}
- match a char whose General_Category is PropertyValue, for example \p{L} (letter) or \p{Lu} (uppercase letter)
\p{PropertyName=PropertyValue}
- match a char whose PropertyName is PropertyValue
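\p{BinaryPropertyName}
- match a char for which the yes/no property BinaryPropertyName is true, for example \p{Emoji_Presentation}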
\P{...}
- (Note, uppercase P) Negation of \p{...}.
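The four forms above can be tried side by side. This is a small sketch with a made-up sample string; the variable names are only for illustration.

```javascript
// Sample text: lowercase, uppercase, digit, space, Greek letter, punctuation.
const sample = "aB3 π!";

// \p{PropertyValue}: a General_Category value, here Lu (uppercase letter).
const upper = sample.match(/\p{Lu}/gu);

// \p{PropertyName=PropertyValue}: the same test written out in full.
const upperLong = sample.match(/\p{General_Category=Uppercase_Letter}/gu);

// \p{BinaryPropertyName}: a yes/no property, here Alphabetic.
const alpha = sample.match(/\p{Alphabetic}/gu);

// \P{...}: negation, here every char that is not a letter.
const nonLetters = sample.match(/\P{L}/gu);

console.log(upper);      // [ 'B' ]
console.log(upperLong);  // [ 'B' ]
console.log(alpha);      // [ 'a', 'B', 'π' ]
console.log(nonLetters); // [ '3', ' ', '!' ]
```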
Get emoji:
console.log( "i ♥ 😸. $_$".match(/\p{Emoji_Presentation}/gu) ); // [ '😸' ]
Get Latin characters:
console.log( "i ♥ 😸. $_$".match( /\p{Script_Extensions=Latin}+/gu ) ); // [ 'i' ]
Get letters:
// get letters
console.log( "∑Σπα".match( /\p{L}/gu ) ); // [ 'Σ', 'π', 'α' ]
Get punctuation:
// punctuation
console.log( "i ♥ 😸. $_$".match( /\p{P}/gu ) ); // [ '.', '_' ]
Get currency symbols:
// currency symbols
console.log( "i ♥ 😸. $_$".match( /\p{Sc}/gu ) ); // [ '$', '$' ]
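The property tests above can also be combined to sort the characters of a string into groups. This is only a sketch: the function name classifyChars and the group labels are made up for illustration, and a character goes to the first group whose regex matches it.

```javascript
// Hypothetical helper: group each character of a string by the first
// Unicode property test it passes. Names and labels are illustrative.
function classifyChars(text) {
  const tests = [
    ["letter", /^\p{L}$/u],
    ["punctuation", /^\p{P}$/u],
    ["currency", /^\p{Sc}$/u],
    ["emoji", /^\p{Emoji_Presentation}$/u],
  ];
  const groups = {};
  // for...of walks the string by code point, so 😸 stays one character.
  for (const ch of text) {
    const hit = tests.find(([, re]) => re.test(ch));
    const label = hit ? hit[0] : "other";
    if (!groups[label]) groups[label] = [];
    groups[label].push(ch);
  }
  return groups;
}

const groups = classifyChars("i ♥ 😸. $_$");
console.log(groups.letter);      // [ 'i' ]
console.log(groups.punctuation); // [ '.', '_' ]
console.log(groups.currency);    // [ '$', '$' ]
console.log(groups.emoji);       // [ '😸' ]
```

Spaces and ♥ end up in the "other" group, since they pass none of the four tests.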
See also: Unicode Escape Sequence