Elisp: Unicode Escape Sequence
In an Emacs Lisp string, you can represent a Unicode char by its codepoint. This is called an escape sequence.
Note: you can also have Unicode characters directly in a string (e.g. "I ♥ 😸").
"\uxxxx"
A Unicode char. xxxx must be exactly 4 hexadecimal digits, the char's codepoint in hex. Pad with leading 0 if the codepoint has fewer than 4 hexadecimal digits.
(string-equal "\u0061" "a") ;; t

;; ♥ BLACK HEART SUIT
;; codepoint 9829
;; hexadecimal 2665
(string-equal "\u2665" "♥") ;; t
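As a quick check, the escape is read as a single char whose codepoint matches the hex digits. A minimal sketch, using only the built-in functions length, string-to-char, and format:

```elisp
;; "\u2665" is read as one character, not six.
(length "\u2665") ;; 1

;; The character's codepoint equals the hex digits in the escape.
(= (string-to-char "\u2665") #x2665) ;; t
(= (string-to-char "\u2665") 9829)   ;; t

;; Codepoint 97 is hex 61, so it must be padded to 0061.
(format "%04x" 97) ;; "0061"
```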
"\U00xxxxxx"
A Unicode char. xxxxxx must be exactly 6 hexadecimal digits, the char's codepoint in hex. Pad with leading 0 if the codepoint has fewer than 6 hexadecimal digits.
;; 😸 GRINNING CAT FACE WITH SMILING EYES
;; codepoint 128568
;; hexadecimal 1f638
(string-equal "\U0001f638" "😸") ;; t
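To see how the padding works in both forms, here is a sketch that builds the escape sequence text from a char. The function name my-unicode-escape is made up for illustration; it is not part of Emacs. It assumes the short form is wanted whenever the codepoint fits in 4 hexadecimal digits.

```elisp
(defun my-unicode-escape (char)
  "Return the escape sequence text for CHAR, an integer codepoint.
Hypothetical helper for illustration, not a built-in function."
  (if (<= char #xffff)
      ;; Fits in 4 hex digits: short form, zero-padded to 4.
      (format "\\u%04x" char)
    ;; Otherwise long form: zero-padded to 8 hex digits after \U.
    (format "\\U%08x" char)))

(my-unicode-escape ?a)      ;; "\\u0061"
(my-unicode-escape #x2665)  ;; "\\u2665"
(my-unicode-escape #x1f638) ;; "\\U0001f638"
```

The result is the text you would type between the quotes in lisp source code, not the char itself.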
Why is Encoded Unicode Char Useful?
Escape sequences are useful when you want to represent invisible or hard-to-distinguish chars, such as {RIGHT-TO-LEFT MARK, ZERO WIDTH NO-BREAK SPACE, NO-BREAK SPACE}. Written as an escape sequence, the char is clearly visible in the source code. Example:
(defun replace-BOM-mark-etc ()
  "Query replace some invisible Unicode chars.
The chars to be searched are:
 RIGHT-TO-LEFT MARK 8207 x200f
 ZERO WIDTH NO-BREAK SPACE 65279 xfeff

Starts at cursor position and goes to end of buffer."
  (interactive)
  ;; The regex matches either char. Each match is replaced with empty string.
  (query-replace-regexp "\u200f\\|\ufeff" ""))
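The same regex can also be used on a string, without prompting. A minimal sketch, using the built-in replace-regexp-in-string. The function name my-strip-invisible-chars is made up for illustration.

```elisp
(defun my-strip-invisible-chars (str)
  "Return STR with RIGHT-TO-LEFT MARK and ZERO WIDTH NO-BREAK SPACE removed.
Hypothetical helper for illustration, not a built-in function."
  (replace-regexp-in-string "\u200f\\|\ufeff" "" str))

;; The input has 5 chars, two of them invisible. The result is "xyz".
(string-equal (my-strip-invisible-chars "x\u200fy\ufeffz") "xyz") ;; t
```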