Python 2: Unicode Tutorial
For Python 3, see Python: Unicode 🐍
Python 2 Source Code Encoding
If your source code contains literal Unicode characters, such as ♥, the first or second line of the file must declare the encoding. Otherwise Python 2 assumes ASCII and raises a SyntaxError.
# -*- coding: utf-8 -*-
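When a script begins with a shebang line, the declaration can sit on the second line instead, since PEP 263 accepts it on either of the first two lines. A minimal sketch (the variable name heart is only for illustration):

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The coding declaration above is on line 2, which is still valid.

heart = u"♥"  # literal non-ASCII character in the source

# One Unicode character, regardless of how many bytes it takes on disk.
print(len(heart))
```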
Python 2, String Containing Unicode, Declare Unicode String
If your string contains literal Unicode characters, such as ♥ (U+2665: BLACK HEART SUIT), prefix the string with u to make it a Unicode string, e.g. u"I ♥ U".
The u makes the string a Unicode datatype (type unicode). Without the u, the string is just a byte sequence (type str).
The r and u can be combined, like this: ur"I ♥ Python"
# -*- coding: utf-8 -*-
# python 2

import sys
print(sys.version)

# unicode string starts with u
aa = u"I ♥ U"

print aa.encode('utf-8')
# I ♥ U
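To see the difference between the two datatypes, compare the length of a Unicode string with the length of its UTF-8 encoding. This is a small sketch, not part of the original example. The names text and data are arbitrary, and the code is written so that it runs on Python 2.7 and also on Python 3.

```python
# -*- coding: utf-8 -*-

text = u"I ♥ U"               # Unicode string: a sequence of characters
data = text.encode("utf-8")   # byte string: a sequence of bytes

print(len(text))  # 5 characters
print(len(data))  # 7 bytes, because ♥ takes 3 bytes in UTF-8

# Decoding the bytes with the same encoding gives back the original string.
print(data.decode("utf-8") == text)  # True
```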
Sometimes when you print Unicode strings, you may get an error like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 16: ordinal not in range(128).
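The solution is to convert explicitly: use the .encode() method to turn a Unicode string into bytes, or the .decode() method to turn bytes into a Unicode string.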
# -*- coding: utf-8 -*-
# python 2

myStr = u'α'

# Bad. Can raise UnicodeEncodeError when the output encoding is ascii (e.g. output is piped)
print 'Greek alpha: ', myStr

# Good
print 'Greek alpha: ', myStr.encode('utf-8')
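The error comes from an implicit conversion with the ASCII codec. The sketch below triggers the same failure explicitly and then shows the UTF-8 round trip. It is an illustration that assumes your terminal or file expects UTF-8, and the variable names are made up.

```python
# -*- coding: utf-8 -*-

alpha = u"\u03b1"  # GREEK SMALL LETTER ALPHA

# Encoding to ASCII fails, because α is outside the range 0 to 127.
try:
    alpha.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 works and gives a 2-byte sequence.
utf8_bytes = alpha.encode("utf-8")

# .decode() goes the other way: bytes back to a Unicode string.
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok)             # False
print(len(utf8_bytes))      # 2
print(round_trip == alpha)  # True
```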
Python 2: Unicode in Regex
When using regex on a Unicode string, if you want the word patterns {\w, \W} and boundary patterns {\b, \B} to depend on the Unicode character properties, you need to add the Unicode flag re.U when calling regex functions.
# -*- coding: utf-8 -*-
# python 2

# example showing the difference of using re.U regex flag

import re

rr = re.findall(r"\w+", u"♥αβγ!", re.U)

if rr:
    print rr
else:
    print "no match"

# prints [u'\u03b1\u03b2\u03b3']

# if re.U is not used, it prints “no match” because the \w+ pattern for “word” only considers ASCII letters, digits, and underscore
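The boundary patterns behave the same way. As a further sketch (the sample text and variable names are made up for illustration), \b\w+\b with re.U keeps accented words whole. On Python 2 without the flag, \w would stop at each non-ASCII letter. The code also runs on Python 3, where str patterns are Unicode-aware by default and the flag is redundant.

```python
# -*- coding: utf-8 -*-
import re

text = u"na\u00efve caf\u00e9"  # "naïve café", written with escapes

# re.U makes \w and \b follow Unicode character properties.
word_pattern = re.compile(r"\b\w+\b", re.U)
words = word_pattern.findall(text)

print(len(words))  # 2
```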
See: Python: Regex Flags.
Find Replace Unicode Char in String
# -*- coding: utf-8 -*-
# python 2

# example of finding all non-ASCII chars in a string

import re

ss = u"i♥NY 😸"

# find all runs of chars outside the range U+0000 to U+007E
myResult = re.findall(u"[^\u0000-\u007e]+", ss)

if myResult:
    print myResult  # [u'\u2665', u'\U0001f638']
else:
    print "no match"
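The same character class works for replacing. Here is a minimal sketch using re.sub, where the placeholder "?" and the variable names are arbitrary choices for illustration. The pattern is written with regex \x escapes, which cover the same range as the \u escapes in the find example.

```python
# -*- coding: utf-8 -*-
import re

ss = u"i\u2665NY \U0001f638"  # the text i♥NY 😸, written with escapes

# Replace each run of characters outside U+0000 to U+007E with "?".
cleaned = re.sub(r"[^\x00-\x7e]+", u"?", ss)

print(cleaned)  # i?NY ?
```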