JS: RegExp Unicode Property

By Xah Lee. Date: .

Each Unicode character has many properties. For example: whether it is uppercase, whether it is a Latin letter, whether it is a non-letter, whether it is a punctuation mark, whether it is a math symbol, whether it is an emoji, whether it is a Chinese character, etc.

In JavaScript regex, you can match characters by their Unicode properties. The regex must have the u flag (or the v flag).
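A minimal sketch of why the flag matters, using \p{L} (any letter) as the test pattern. The variable names are made up for illustration. Without the u flag, \p is not treated as a property escape, so the pattern matches the literal text p{L} instead.

```javascript
// With the u flag, \p{L} is a Unicode property escape meaning "any letter".
const withFlag = /\p{L}/u.test("é");
console.log(withFlag); // true

// Without the u flag, the same pattern does not match a letter.
const withoutFlag = /\p{L}/.test("é");
console.log(withoutFlag); // false

// Without the u flag, it matches the literal characters p{L}.
const literal = /\p{L}/.test("p{L}");
console.log(literal); // true
```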

Syntax:

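\p{PropertyValue}
match a char whose General_Category is PropertyValue. For example, \p{L} or \p{Letter}.
\p{PropertyName=PropertyValue}
match a char whose PropertyName is PropertyValue. For example, \p{Script=Greek}.
\p{BinaryPropertyName}
match a char for which the binary property BinaryPropertyName is true. For example, \p{Emoji_Presentation}.
\P{...}
(Note the uppercase P.) Negation of \p{...}.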
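The four forms above can be sketched side by side as follows. The sample strings and variable names are chosen here for illustration only.

```javascript
// \p{PropertyValue}: Lu is the General_Category value for uppercase letters
const upper = "aBcD".match(/\p{Lu}/gu);
console.log(upper); // [ 'B', 'D' ]

// \p{PropertyName=PropertyValue}: chars whose Script is Greek
const greek = "abc αβγ".match(/\p{Script=Greek}+/gu);
console.log(greek); // [ 'αβγ' ]

// \p{BinaryPropertyName}: chars for which Alphabetic is true
const alpha = "a1 b2".match(/\p{Alphabetic}/gu);
console.log(alpha); // [ 'a', 'b' ]

// \P{...}: negation, here any char that is not a letter
const nonLetters = "a1 b2".match(/\P{L}/gu);
console.log(nonLetters); // [ '1', ' ', '2' ]
```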

Get emoji:

console.log(
"i ♥ 😸. $_$".match(/\p{Emoji_Presentation}/gu)
);
// [ '😸' ]

Get Latin characters:

console.log(
"i ♥ 😸. $_$".match( /\p{Script_Extensions=Latin}+/gu )
);
// [ 'i' ]

Get letters:

// get letters
console.log(
"∑Σπα".match( /\p{L}/gu )
);
// [ 'Σ', 'π', 'α' ]

Get punctuation:

// punctuation
console.log(
"i ♥ 😸. $_$".match( /\p{P}/gu )
);
// [ '.', '_' ]

Get currency symbols:

// currency symbols
console.log(
"i ♥ 😸. $_$".match( /\p{Sc}/gu )
);
// [ '$', '$' ]
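Property escapes can also go inside a character class, so several categories can be handled in one pass. The following is a rough sketch, not a recommended recipe: it removes all punctuation (P) and symbols (S) from the sample string used above, then tidies the leftover spaces. The variable names are invented for this example.

```javascript
const text = "i ♥ 😸. $_$";

// Remove every char whose General_Category is punctuation (P) or symbol (S).
const stripped = text.replace(/[\p{P}\p{S}]/gu, "");
console.log(JSON.stringify(stripped)); // "i   "

// Collapse the leftover runs of whitespace and trim the ends.
const tidy = stripped.replace(/\s+/g, " ").trim();
console.log(JSON.stringify(tidy)); // "i"
```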

See also: Unicode Escape Sequence
