Elisp: Unicode Escape Sequence

By Xah Lee.

Unicode Escape Sequence

In an Emacs Lisp string, you can represent a Unicode character by its codepoint, written in hexadecimal.

This notation is called an escape sequence.

💡 TIP: you can have Unicode characters directly (e.g. "I 🧡 😸").

"\uxxxx"
A Unicode character. xxxx must be exactly 4 hexadecimal digits giving the character's codepoint. Pad with leading zeros if the codepoint has fewer than 4 hexadecimal digits.
(string-equal "\u0061" "a")
;; t

;; ♥ BLACK HEART SUIT
;; codepoint 9829
;; hexadecimal 2665
(string-equal "\u2665" "♥")
;; t

"\U00xxxxxx"
A Unicode character. xxxxxx must be exactly 6 hexadecimal digits giving the character's codepoint, so the escape has 8 hexadecimal digits in total after the \U. Pad with leading zeros if the codepoint has fewer than 6 hexadecimal digits.
;; 😸 GRINNING CAT FACE WITH SMILING EYES
;; codepoint 128568
;; hexadecimal 1f638
(string-equal "\U0001f638" "😸")
;; t
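
The padding rules above can be checked from Lisp. The following is a minimal sketch: it confirms that each escape produces a single character whose integer value is the codepoint, then builds the escape text for a given codepoint. The function name my-codepoint-to-escape is a hypothetical helper made up for illustration, not part of Emacs.

```elisp
;; An escape sequence yields exactly one character in the string.
(length "\u2665")
;; 1
(length "\U0001f638")
;; 1

;; That character's integer value is the codepoint.
;; #x2665 is 9829 in decimal, #x1f638 is 128568.
(= (aref "\u2665" 0) #x2665)
;; t
(= (aref "\U0001f638" 0) #x1f638)
;; t

;; Hypothetical helper (not part of Emacs): return the escape
;; sequence text for CODEPOINT, zero-padded as the rules require.
(defun my-codepoint-to-escape (codepoint)
  "Return the escape sequence text for CODEPOINT as a string."
  (if (<= codepoint #xffff)
      (format "\\u%04x" codepoint)     ; 4 hex digits
    (format "\\U%08x" codepoint)))     ; 00 plus 6 hex digits

(my-codepoint-to-escape #x61)
;; "\\u0061"
(my-codepoint-to-escape #x1f638)
;; "\\U0001f638"
```

The results are shown in printed form, so the doubled backslash stands for one literal backslash in the returned string.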

Why is Encoded Unicode Char Useful?

An escape sequence is useful when you want to represent invisible or non-printable characters, such as RIGHT-TO-LEFT MARK, ZERO WIDTH NO-BREAK SPACE, or NO-BREAK SPACE, because the source code then shows unambiguously which character is meant. Example:

(defun replace-BOM-mark-etc ()
  "Query-replace some invisible Unicode characters with the empty string.
The characters searched for are:
 RIGHT-TO-LEFT MARK 8207 x200f
 ZERO WIDTH NO-BREAK SPACE 65279 xfeff

The search starts at the cursor position and runs to the end of the buffer."
  (interactive)
  (query-replace-regexp "\u200f\\|\ufeff" ""))
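
The same regexp also works on a string instead of a buffer. Here is a minimal sketch using the built-in replace-regexp-in-string. The function name my-strip-invisible-chars is a hypothetical helper made up for illustration.

```elisp
;; Hypothetical helper (not part of Emacs): non-interactive variant
;; that cleans a string instead of the current buffer.
(defun my-strip-invisible-chars (text)
  "Return TEXT with RIGHT-TO-LEFT MARK and ZERO WIDTH NO-BREAK SPACE removed."
  (replace-regexp-in-string "\u200f\\|\ufeff" "" text))

;; The input holds 5 characters, two of them invisible.
(length "x\u200fy\ufeffz")
;; 5
(my-strip-invisible-chars "x\u200fy\ufeffz")
;; "xyz"
```

Writing the test input with escapes keeps the invisible characters visible in the source, which is the point made above.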
