Python: Get Unicode Name, Codepoint

By Xah Lee.

How to get a Unicode character's codepoint in decimal?

# -*- coding: utf-8 -*-
# python 2

from unicodedata import *

# get code point of Unicode char (in decimal)
print ord(u"→") # 8594
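
The code above is Python 2. Here's a minimal sketch of the same thing in Python 3, assuming your source file is saved as UTF-8. The variable names are made up for illustration. It also shows the codepoint in hex, and converts it back to the char with chr.

```python
# Python 3: every str is Unicode, so no u"" prefix or coding line is needed.
arrow = "→"

codepoint = ord(arrow)          # codepoint as a decimal integer
print(codepoint)                # 8594
print(hex(codepoint))           # 0x2192
print("U+%04X" % codepoint)     # U+2192

# chr is the inverse of ord
print(chr(codepoint) == arrow)  # True
```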

How to find a character's Unicode name?

# -*- coding: utf-8 -*-
# python 2

from unicodedata import *

# find Unicode char's name
print name(u"→")    # RIGHTWARDS ARROW
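
Here's a Python 3 sketch of the same lookup. Some chars, such as control characters, have no name. For those, name raises ValueError unless you pass a default as the second argument. The "(no name)" string is just a placeholder picked for this example.

```python
# Python 3
from unicodedata import name

print(name("→"))  # RIGHTWARDS ARROW

# a newline is a control character and has no name;
# the second argument is returned instead of raising ValueError
print(name("\n", "(no name)"))  # (no name)
```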

How to get the Unicode char of a given name?

# -*- coding: utf-8 -*-
# python 2

from unicodedata import *

char1 = lookup("GREEK SMALL LETTER ALPHA") # should be UPPER CASE
print char1.encode('utf-8') # α

char2 = lookup("RIGHTWARDS ARROW") # should be UPPER CASE
print char2.encode('utf-8') # →

char3 = lookup("CJK UNIFIED IDEOGRAPH-5929") # doesn't work if not UPPER CASE
print char3.encode('utf-8') # 天
# Unicode name is UPPER CASE by spec
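
Here's a Python 3 sketch of lookup by name. In Python 3 you can print the result directly, without encode. lookup raises KeyError when no char has the given name, so the find_char helper below (a hypothetical name, not part of unicodedata) returns a default instead. The \N{…} string escape is another way to write a char by its name.

```python
# Python 3
from unicodedata import lookup

alpha = lookup("GREEK SMALL LETTER ALPHA")
print(alpha)  # α

# the \N{...} escape inserts a char by name inside a string literal
print("\N{RIGHTWARDS ARROW}")  # →

print(lookup("CJK UNIFIED IDEOGRAPH-5929"))  # 天


def find_char(char_name, default=None):
    """Hypothetical helper: return the char named char_name, or default if no such name."""
    try:
        return lookup(char_name)
    except KeyError:
        return default


print(find_char("NO SUCH CHARACTER NAME"))  # None
```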

Here's a short intro to Unicode:

  1. Each char has an ID, called its codepoint. It's an integer.
  2. Each char has a unique name. (But a char may also have an older name.)
  3. Each char has a number of properties, for example: upper/lower case, direction (for right-to-left languages), whether it's a combining char, whether it's punctuation, ….

The rest of the functions in the unicodedata module return these properties.
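
To make the properties concrete, here's a small Python 3 sketch that prints a few of them for a handful of chars. The sample chars are picked just for illustration. category returns the general category (for example, Lu is uppercase letter, Ll is lowercase letter), bidirectional returns the direction class, and combining returns the combining class (0 means the char is not a combining char).

```python
# Python 3
import unicodedata

# sample chars: Latin upper and lower case, an arrow,
# a combining acute accent, a Hebrew letter, a digit
samples = ["A", "a", "→", "\u0301", "\u05d0", "5"]

for char in samples:
    print(
        "U+%04X" % ord(char),
        unicodedata.category(char),       # general category, such as Lu or Ll
        unicodedata.bidirectional(char),  # direction class, such as L or R
        unicodedata.combining(char),      # combining class, 0 if not combining
        unicodedata.name(char),
    )
```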

[see Unicode Basics: Character Set, Encoding, UTF-8]

This page lets you search Unicode: Unicode Characters ∑ ♥ 😄

Print a Range of Unicode Chars

Here's an example that prints a range of Unicode chars, with their codepoint in hex, and their name.

Chars without a name are skipped. (Some of these are unassigned codepoints.)

# -*- coding: utf-8 -*-
# python 2

from unicodedata import *

# build the list of chars for codepoints 945 to 968 (α to ψ)
xlist = [unichr(i) for i in range(945, 969)]

for x in xlist:
    # name(x, '-') returns '-' when the char has no name
    if name(x, '-') != '-':
        print x.encode('utf-8'), '|', "%04x" % ord(x), '|', name(x)

# output
"""
α | 03b1 | GREEK SMALL LETTER ALPHA
β | 03b2 | GREEK SMALL LETTER BETA
γ | 03b3 | GREEK SMALL LETTER GAMMA
δ | 03b4 | GREEK SMALL LETTER DELTA
ε | 03b5 | GREEK SMALL LETTER EPSILON
ζ | 03b6 | GREEK SMALL LETTER ZETA
η | 03b7 | GREEK SMALL LETTER ETA
θ | 03b8 | GREEK SMALL LETTER THETA
ι | 03b9 | GREEK SMALL LETTER IOTA
κ | 03ba | GREEK SMALL LETTER KAPPA
λ | 03bb | GREEK SMALL LETTER LAMDA
μ | 03bc | GREEK SMALL LETTER MU
ν | 03bd | GREEK SMALL LETTER NU
ξ | 03be | GREEK SMALL LETTER XI
ο | 03bf | GREEK SMALL LETTER OMICRON
π | 03c0 | GREEK SMALL LETTER PI
ρ | 03c1 | GREEK SMALL LETTER RHO
ς | 03c2 | GREEK SMALL LETTER FINAL SIGMA
σ | 03c3 | GREEK SMALL LETTER SIGMA
τ | 03c4 | GREEK SMALL LETTER TAU
υ | 03c5 | GREEK SMALL LETTER UPSILON
φ | 03c6 | GREEK SMALL LETTER PHI
χ | 03c7 | GREEK SMALL LETTER CHI
ψ | 03c8 | GREEK SMALL LETTER PSI
"""

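Here's a Python 3 sketch of the same loop, written as a generator so the rows can be reused. named_chars is a hypothetical helper name. It uses chr in place of unichr, and passes an empty string as the default to name so that unnamed codepoints can be skipped.

```python
# Python 3
from unicodedata import name


def named_chars(start, stop):
    """Hypothetical helper: yield (char, hex codepoint, name) for each
    named codepoint in range(start, stop)."""
    for codepoint in range(start, stop):
        char = chr(codepoint)
        char_name = name(char, "")  # "" when the codepoint has no name
        if char_name:
            yield char, "%04x" % codepoint, char_name


for char, hex_code, char_name in named_chars(945, 969):
    print(char, "|", hex_code, "|", char_name)
```

The first row printed is α | 03b1 | GREEK SMALL LETTER ALPHA, same as the Python 2 version.
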
Print Unicode Symbols Whose Name Contains STAR

Here's an example.

# -*- coding: utf-8 -*-
# python 2

# print all unicode chars whose name contains "STAR"
# 2014-04-14 by 馬曉駿 https://gist.github.com/10622337

from unicodedata import name

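bullets = []

# codepoints below 0x10000 only (the Basic Multilingual Plane)
for i in range(0x10000):
    c = unichr(i)
    # name(c, '') returns '' for a char without a name, instead of raising ValueError
    if 'STAR' in name(c, ''):
        bullets.append(c)

bullets.sort(key=name)
for c in bullets:
    print name(c), c.encode("utf-8")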

# output
"""
APL FUNCTIONAL SYMBOL CIRCLE STAR ⍟
APL FUNCTIONAL SYMBOL STAR DIAERESIS ⍣
ARABIC FIVE POINTED STAR ٭
ARABIC START OF RUB EL HIZB ۞
BLACK CENTRE WHITE STAR ✬
BLACK FOUR POINTED STAR ✦
BLACK SMALL STAR ⭑
BLACK STAR ★
CIRCLED OPEN CENTRE EIGHT POINTED STAR ❂
CIRCLED WHITE STAR ✪
EIGHT POINTED BLACK STAR ✴
EIGHT POINTED PINWHEEL STAR ✵
EIGHT POINTED RECTILINEAR BLACK STAR ✷
GLEICH STARK ⧦
HEAVY EIGHT POINTED RECTILINEAR BLACK STAR ✸
HEAVY OUTLINED BLACK STAR ✮
OPEN CENTRE BLACK STAR ✫
OUTLINED BLACK STAR ✭
OUTLINED WHITE STAR ⚝
PINWHEEL STAR ✯
SHADOWED WHITE STAR ✰
SIX POINTED BLACK STAR ✶
STAR AND CRESCENT ☪
STAR EQUALS ≛
STAR OF DAVID ✡
STAR OPERATOR ⋆
STRESS OUTLINED WHITE STAR ✩
SYMBOL FOR START OF HEADING ␁
SYMBOL FOR START OF TEXT ␂
TIBETAN MARK DELIMITER TSHEG BSTAR ༌
TWELVE POINTED BLACK STAR ✹
WHITE FOUR POINTED STAR ✧
WHITE MEDIUM STAR ⭐
WHITE SMALL STAR ⭒
WHITE STAR ☆
"""

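Here's a Python 3 sketch of the same search. chars_named is a hypothetical helper name. Unlike the script above, it scans every codepoint up to sys.maxunicode, so it also finds chars outside the Basic Multilingual Plane, and the result depends on the Unicode version your Python ships with.

```python
# Python 3
import sys
from unicodedata import name


def chars_named(word):
    """Hypothetical helper: return (name, char) pairs, sorted by name,
    for every char whose name contains word."""
    found = []
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        char_name = name(char, "")  # "" when the codepoint has no name
        if char_name and word in char_name:
            found.append((char_name, char))
    return sorted(found)


for char_name, char in chars_named("STAR"):
    print(char_name, char)
```

This is a substring match, so names containing START or STARK are included too. To match whole words only, you could test word in char_name.split() instead.
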