Python: Get Unicode Name, Codepoint

By Xah Lee.

Get Codepoint

Get a Unicode character's codepoint.

# python 3

# ord is a builtin function; no import needed
# get codepoint of Unicode char in decimal
print(ord("→"))
# 8594
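ord returns the codepoint as a decimal integer. Unicode charts usually write it as U+ followed by hex, so here is a small sketch that formats it that way; the builtin chr goes the other direction, from codepoint to char. The helper name to_uplus is just for illustration, not part of any library.

```python
# python 3

def to_uplus(ch):
    """Return the codepoint of a single char in U+XXXX notation (hypothetical helper)."""
    # :04X pads to at least 4 uppercase hex digits; chars above U+FFFF get 5 or 6
    return "U+{:04X}".format(ord(ch))

print(ord("→"), hex(ord("→")), to_uplus("→"))
# 8594 0x2192 U+2192

# chr is the inverse of ord
print(chr(8594), chr(0x2192))
# → →

# chars outside the Basic Multilingual Plane work the same way in Python 3
print(to_uplus("\U0001F604"))
# U+1F604
```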

Get Name

Find a character's Unicode name.

# python 3

from unicodedata import name

print(name("→"))
# RIGHTWARDS ARROW
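Not every char has a name. For example, control chars such as newline have none, and name raises ValueError for them. name takes an optional second argument that is returned instead of raising. A short sketch; the "&lt;no name&gt;" default string is an arbitrary choice for illustration.

```python
# python 3

import unicodedata

print(unicodedata.name("α"))
# GREEK SMALL LETTER ALPHA

# control chars such as newline have no name; name() raises ValueError
try:
    unicodedata.name("\n")
except ValueError as err:
    print("no name:", err)

# passing a default avoids the exception
print(unicodedata.name("\n", "<no name>"))
# <no name>
```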

Get Char

Get the Unicode char with a given name.

# python 3

from unicodedata import lookup

char1 = lookup("GREEK SMALL LETTER ALPHA")
print(char1)
# α

char2 = lookup("RIGHTWARDS ARROW")
print(char2)
# →

char3 = lookup("CJK UNIFIED IDEOGRAPH-5929")
print(char3)
# 天
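Two related things worth knowing: lookup raises KeyError when the name doesn't exist, and Python 3 string literals accept a \N{...} escape that turns a Unicode name into the char directly. A minimal sketch; safe_lookup is a hypothetical helper name, not part of unicodedata.

```python
# python 3

import unicodedata

# the \N{...} escape resolves a Unicode name inside a string literal
arrow = "\N{RIGHTWARDS ARROW}"
print(arrow == unicodedata.lookup("RIGHTWARDS ARROW"))
# True

def safe_lookup(name):
    """Return the char for a Unicode name, or None if there is no such name (hypothetical helper)."""
    try:
        return unicodedata.lookup(name)
    except KeyError:  # lookup raises KeyError for unknown names
        return None

print(safe_lookup("GREEK SMALL LETTER ALPHA"))
# α
print(safe_lookup("NOT A REAL CHARACTER NAME"))
# None
```

name and lookup are inverses for named chars, so lookup(name(c)) gives back c.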

Here's the Python 2 version:

# -*- coding: utf-8 -*-
# python 2

from unicodedata import lookup

# lookup returns a unicode object; encode it to UTF-8 bytes for printing
char1 = lookup("GREEK SMALL LETTER ALPHA")
print(char1.encode('utf-8'))
# α

char2 = lookup("RIGHTWARDS ARROW")
print(char2.encode('utf-8'))
# →

char3 = lookup("CJK UNIFIED IDEOGRAPH-5929")
print(char3.encode('utf-8'))
# 天
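In Python 3, print handles str directly, so the .encode('utf-8') step from the Python 2 version isn't needed. Calling encode anyway gives a bytes object holding the UTF-8 encoding, which shows that different chars take different numbers of bytes. A short sketch:

```python
# python 3

import unicodedata

alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")

# str.encode returns bytes; α (U+03B1) takes 2 bytes in UTF-8
data = alpha.encode("utf-8")
print(data, len(data))
# b'\xce\xb1' 2

# → (U+2192) takes 3 bytes, U+1F604 takes 4
print(len("→".encode("utf-8")), len("\U0001F604".encode("utf-8")))
# 3 4

# bytes.decode reverses the encoding
print(data.decode("utf-8"))
# α
```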

Intro to Unicode and UTF-8:

  1. Each char has an ID, called its codepoint. It's an integer.
  2. Each char has a unique name. (But a char may also have an older name.)
  3. Each char has a number of properties, for example: upper/lower case, direction (for right-to-left languages), whether it's a combining char, whether it's punctuation, etc.

The rest of the functions in the unicodedata module return these properties.
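A minimal sketch of reading some of those properties with the unicodedata functions category, bidirectional, combining, and mirrored. The helper name describe and the sample chars are just for illustration.

```python
# python 3

import unicodedata

def describe(ch):
    """Collect a few unicodedata properties of one char (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch, None),
        # general category: Lu = uppercase letter, Ll = lowercase letter,
        # Po = other punctuation, Mn = nonspacing (combining) mark
        "category": unicodedata.category(ch),
        # bidi class: L = left-to-right, R = right-to-left (e.g. Hebrew)
        "bidirectional": unicodedata.bidirectional(ch),
        # canonical combining class; 0 means not a combining char
        "combining": unicodedata.combining(ch),
        # 1 if the char is mirrored in right-to-left text, such as brackets
        "mirrored": unicodedata.mirrored(ch),
    }

# A, a, !, Hebrew alef, combining acute accent, left parenthesis
for ch in ["A", "a", "!", "\u05d0", "\u0301", "("]:
    print(repr(ch), describe(ch))
```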

[see Unicode Basics: Character Set, Encoding, UTF-8]

To search for Unicode chars, see Unicode Search 💋 ♥ 😄

Print a Range of Unicode Chars

Here's an example that prints a range of Unicode chars, with their codepoint in hex and their name.

Chars without a name are skipped. (Some of these are unassigned codepoints.)

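# python 3

from unicodedata import name

# 945 to 968 is 0x3b1 to 0x3c8 (α to ψ); the end of range is exclusive
for i in range(945, 969):
    x = chr(i)  # chr turns a codepoint into a char
    # name(x, '-') returns '-' instead of raising ValueError when x has no name
    if name(x, '-') != '-':
        print(x, '|', "%04x" % ord(x), '|', name(x))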

# output
# α | 03b1 | GREEK SMALL LETTER ALPHA
# β | 03b2 | GREEK SMALL LETTER BETA
# γ | 03b3 | GREEK SMALL LETTER GAMMA
# δ | 03b4 | GREEK SMALL LETTER DELTA
# ε | 03b5 | GREEK SMALL LETTER EPSILON
# ζ | 03b6 | GREEK SMALL LETTER ZETA
# η | 03b7 | GREEK SMALL LETTER ETA
# θ | 03b8 | GREEK SMALL LETTER THETA
# ι | 03b9 | GREEK SMALL LETTER IOTA
# κ | 03ba | GREEK SMALL LETTER KAPPA
# λ | 03bb | GREEK SMALL LETTER LAMDA
# μ | 03bc | GREEK SMALL LETTER MU
# ν | 03bd | GREEK SMALL LETTER NU
# ξ | 03be | GREEK SMALL LETTER XI
# ο | 03bf | GREEK SMALL LETTER OMICRON
# π | 03c0 | GREEK SMALL LETTER PI
# ρ | 03c1 | GREEK SMALL LETTER RHO
# ς | 03c2 | GREEK SMALL LETTER FINAL SIGMA
# σ | 03c3 | GREEK SMALL LETTER SIGMA
# τ | 03c4 | GREEK SMALL LETTER TAU
# υ | 03c5 | GREEK SMALL LETTER UPSILON
# φ | 03c6 | GREEK SMALL LETTER PHI
# χ | 03c7 | GREEK SMALL LETTER CHI
# ψ | 03c8 | GREEK SMALL LETTER PSI
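Writing the bounds in hex (945 is 0x3b1) makes them easier to match against Unicode charts. Here is a sketch that wraps the loop above in a reusable function returning tuples instead of printing; the name named_chars is hypothetical, and it uses None as the name default rather than '-'.

```python
# python 3

import unicodedata

def named_chars(start, end):
    """Return (char, hex codepoint, name) for codepoints start..end-1 that have a name.
    Hypothetical helper; end is exclusive, like range."""
    result = []
    for i in range(start, end):
        ch = chr(i)
        nm = unicodedata.name(ch, None)  # None for unnamed or unassigned codepoints
        if nm is not None:
            result.append((ch, "%04x" % i, nm))
    return result

# arrows U+2190 to U+2193
for ch, code, nm in named_chars(0x2190, 0x2194):
    print(ch, "|", code, "|", nm)
# ← | 2190 | LEFTWARDS ARROW
# ↑ | 2191 | UPWARDS ARROW
# → | 2192 | RIGHTWARDS ARROW
# ↓ | 2193 | DOWNWARDS ARROW
```

Control chars such as tab (U+0009) and newline (U+000A) have no name, so named_chars(0x09, 0x0B) returns an empty list.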
