Many Python regex functions take an optional “flags” argument (for example, “re.search” and “re.compile”). The flags modify the meaning of the given regex pattern.
The flags can be any of re.I, re.L, re.M, re.S, re.U, re.X. They can be
combined with the | operator. For example, re.search(pat, text, re.M|re.U)
searches text with a pattern in which ^ and $ match at every line and
\w, \W, \b, \B follow Unicode character properties.
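As a small sketch of combining flags with |, the snippet below assumes Python 3 and uses a made-up sample string and variable names that are illustrative only. It pairs re.I with re.M and shows that neither flag alone is enough for this particular search.

```python
import re

# Illustrative sample text; the second line starts with a capital "S".
text = "first line\nSecond Line\nthird line"

# re.I lets "second" match "Second"; re.M lets ^ match after each newline.
combined = re.search(r"^second", text, re.I | re.M)

# With only one of the two flags, the same search finds nothing.
only_ignorecase = re.search(r"^second", text, re.I)
only_multiline = re.search(r"^second", text, re.M)

print(combined.group())   # Second
print(only_ignorecase)    # None
print(only_multiline)     # None
```

The flags are integer-like constants, so | simply merges them into a single value that is passed as one argument.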
re.I (re.IGNORECASE): Perform case-insensitive matching.
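A minimal illustration of case-insensitive matching with re.I, assuming Python 3; the sample string is invented for the demonstration.

```python
import re

# Four spellings of the same word, differing only in letter case.
words = "Python PYTHON python PyThOn"

sensitive = re.findall(r"python", words)          # default: exact case only
insensitive = re.findall(r"python", words, re.I)  # re.I: any letter case

print(sensitive)    # ['python']
print(insensitive)  # ['Python', 'PYTHON', 'python', 'PyThOn']
```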
re.L (re.LOCALE): Make the pattern characters \w, \W, \b, \B dependent on the current locale.
re.M (re.MULTILINE): When specified, the pattern character ^ matches at the beginning
of the string and at the beginning of each line (immediately following each
newline), and the pattern character $ matches at the end of the
string and at the end of each line (immediately preceding each newline).
Otherwise, ^ matches only at the beginning of the string, and $ matches only at the end of the string and immediately before the newline (if any) at the end of the string.
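The difference can be seen by running the same patterns with and without re.M. This is a sketch assuming Python 3; the three-line sample string (with a trailing newline) and the variable names are made up for illustration.

```python
import re

# Three lines, with a newline at the very end of the string.
text = "alpha\nbeta\ngamma\n"

# Without re.M, ^ matches only at the start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.M)

# Without re.M, $ matches only at the end, or just before a final newline.
ends_default = re.findall(r"\w+$", text)
# With re.M, $ also matches just before every newline.
ends_multiline = re.findall(r"\w+$", text, re.M)

print(starts_default)    # ['alpha']
print(starts_multiline)  # ['alpha', 'beta', 'gamma']
print(ends_default)      # ['gamma']
print(ends_multiline)    # ['alpha', 'beta', 'gamma']
```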
re.S (re.DOTALL): Make the dot character (.) match any character, including a newline; without this flag, a dot will match anything except a newline.
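A short sketch of the effect of re.S, assuming Python 3. The HTML-like sample string is only an illustration, not a recommendation to parse markup with regex.

```python
import re

# The text between the tags spans two lines.
text = "<p>line one\nline two</p>"

# Without re.S the dot stops at the newline, so the closing tag is never reached.
no_flag = re.search(r"<p>.*</p>", text)

# With re.S the dot also consumes the newline.
dotall = re.search(r"<p>.*</p>", text, re.S)

print(no_flag)                 # None
print(dotall.group() == text)  # True
```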
re.U (re.UNICODE): Make the pattern characters \w, \W, \b, \B dependent on the Unicode character properties database. For example:
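# -*- coding: utf-8 -*-
# Python 2 syntax (print statement, u'' string prefix)
import re

# With re.U, \w+ matches the leading run of Unicode word characters.
result = re.search(r'\w+', u'真善美αβγ!', re.U)
if result:
    print result.group().encode('utf8')
else:
    print "no match"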
Note that the Python re module also allows Unicode in the pattern
string. Just be sure to add the Unicode prefix u to the pattern
string. For example:
# -*- coding: utf-8 -*-
import re

result = re.findall(ur'善+', u'真善美αβγ!', re.U)
print result[0].encode('utf8')
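The two examples above use Python 2 syntax. For comparison, here is a rough Python 3 sketch of the same searches. It relies on the fact that in Python 3 a str pattern is Unicode-aware by default, so re.U is not needed and the ur prefix no longer exists. The re.ASCII contrast is an addition for illustration and is not part of the original examples.

```python
import re

text = "真善美αβγ!"

# In Python 3, \w on a str pattern already follows Unicode properties.
word = re.search(r"\w+", text)
print(word.group())  # 真善美αβγ

# re.ASCII restricts \w to [a-zA-Z0-9_], so nothing in this text matches.
ascii_only = re.search(r"\w+", text, re.ASCII)
print(ascii_only)  # None

# Non-ASCII characters can appear directly in the pattern string.
found = re.findall(r"善+", text)
print(found[0])  # 善
```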
re.X (re.VERBOSE): This flag changes the regex syntax to allow you to add annotations in the regex.
Whitespace within the pattern is ignored, except when in a
character class or preceded by an unescaped backslash; and when a
line contains a # that is neither in a character class nor preceded by an
unescaped backslash, all characters from the leftmost such # through
the end of the line are ignored. Example:
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
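To check that a verbose pattern and its compact form behave the same, one can run both against a few inputs. The sketch below assumes Python 3; the names verbose, compact, and samples, and the sample strings, are illustrative. It also shows that a space must be escaped to be matched literally under re.X.

```python
import re

# Verbose form: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r"""
    \d +   # the integral part
    \.     # the decimal point
    \d *   # some fractional digits
""", re.X)

# Compact form of the same pattern.
compact = re.compile(r"\d+\.\d*")

samples = ["3.14", "10.", "price: 42.5 USD", "no digits"]
verbose_hits = [m.group() if m else None for m in map(verbose.search, samples)]
compact_hits = [m.group() if m else None for m in map(compact.search, samples)]

print(verbose_hits)                  # ['3.14', '10.', '42.5', None]
print(verbose_hits == compact_hits)  # True

# Under re.X an unescaped space is dropped; a backslash-escaped space is kept.
unescaped = re.search(r"hello world", "hello world", re.X)
escaped = re.search(r"hello\ world", "hello world", re.X)
print(unescaped)        # None
print(escaped.group())  # hello world
```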