
Python Regex Flags

Xah Lee, 2005, 2011-02

Many Python regex functions (e.g. “re.search”, “re.compile”) take an optional “flags” argument. The flags modify the meaning of the given regex pattern.

The flags can be any of re.I, re.L, re.M, re.S, re.U, re.X. They can be combined with the | operator. For example, re.compile(pat, re.M|re.U) creates a regex object in which ^ and $ match at every line boundary and \w, \W, \b, \B follow Unicode character properties.
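
Here is a small sketch of combining two flags with |. The sample text and variable names are made up for illustration, and print() is called with a single argument so the code should run on both Python 2 and Python 3.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "first line\nSecond Line\nthird line"

# re.I ignores case; re.M lets ^ match at the start of every line.
pat = re.compile(r"^second", re.I | re.M)
m = pat.search(text)
combined = m.group() if m else None

# With only one of the two flags, the same pattern finds nothing here.
only_i = re.search(r"^second", text, re.I)
only_m = re.search(r"^second", text, re.M)

print(combined)  # Second
print(only_i)    # None
print(only_m)    # None
```

Each flag is a bit mask, so | simply switches on both behaviors at once.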

re.IGNORECASE or re.I

Indicates case-insensitive matching.
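
A minimal sketch of the difference, using a made-up sample string:

```python
import re

# Hypothetical sample text.
text = "Python, PYTHON, python"

# Without the flag, only the exact lowercase spelling matches.
sensitive = re.findall(r"python", text)

# With re.I, every spelling matches regardless of case.
insensitive = re.findall(r"python", text, re.I)

print(sensitive)    # ['python']
print(insensitive)  # ['Python', 'PYTHON', 'python']
```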

re.LOCALE or re.L

Make the pattern characters \w, \W, \b, \B dependent on the current locale.

re.MULTILINE or re.M

When specified, the pattern character ^ matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character $ matches at the end of the string and at the end of each line (immediately preceding each newline).

Otherwise, ^ matches only at the beginning of the string, and $ only at the end of the string and immediately before the newline (if any) at the end of the string.
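
The following sketch compares ^ and $ with and without re.M. The three-line sample string is an assumption for illustration; note that it ends with a newline.

```python
import re

# Hypothetical sample text with a trailing newline.
text = "one\ntwo\nthree\n"

# Default: ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# re.M: ^ also matches right after each newline.
starts_multi = re.findall(r"^\w+", text, re.M)

# Default: $ matches at the end, or just before a final newline.
ends_default = re.findall(r"\w+$", text)
# re.M: $ also matches just before every newline.
ends_multi = re.findall(r"\w+$", text, re.M)

print(starts_default)  # ['one']
print(starts_multi)    # ['one', 'two', 'three']
print(ends_default)    # ['three']
print(ends_multi)      # ['one', 'two', 'three']
```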

re.DOTALL or re.S

Make the dot character (.) match any character, including a newline; without this flag, a dot will match anything except a newline.
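
A short sketch of the effect, assuming a made-up string whose tagged text spans two lines:

```python
import re

# Hypothetical sample: the text between the tags contains a newline.
text = "<b>bold\ntext</b>"

# Default: the dot stops at the newline, so the closing tag is never reached.
default = re.search(r"<b>(.*)</b>", text)

# re.S: the dot also matches the newline, so the whole span is captured.
dotall = re.search(r"<b>(.*)</b>", text, re.S)
inner = dotall.group(1)

print(default)      # None
print(repr(inner))  # 'bold\ntext'
```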

re.UNICODE or re.U

Make the pattern characters \w, \W, \b, \B dependent on the Unicode character properties database. For example:

# -*- coding: utf-8 -*-
import re
result = re.search(r'\w+', u'真善美αβγ!', re.U)
if result:
    print result.group().encode('utf8')
else:
    print "no match"

Note that Python's re module also allows Unicode characters in the pattern string. Just be sure to put the Unicode prefix u on the pattern string. For example:

# -*- coding: utf-8 -*-
import re
result = re.findall(ur'善+', u'真善美αβγ!', re.U)
print result[0].encode('utf8')

re.VERBOSE or re.X

This flag changes the regex syntax so that you can add annotations to a regex. Whitespace within the pattern is ignored, except when it is in a character class or preceded by an unescaped backslash. When a line contains a # that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored. Example:

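import re
# verbose form: whitespace and # comments inside the pattern are ignored
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
# the same pattern written without re.X
b = re.compile(r"\d+\.\d*")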
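
To check that the two forms behave the same, here is a rough sketch that runs both against a few made-up sample strings. It also shows that a literal space must be escaped under re.X. The names verbose, compact, and samples are illustrative only.

```python
import re

verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)
compact = re.compile(r"\d+\.\d*")

# Hypothetical test inputs.
samples = ["3.14", "42.", "abc", "7"]
verbose_hits = [bool(verbose.match(s)) for s in samples]
compact_hits = [bool(compact.match(s)) for s in samples]

print(verbose_hits)                  # [True, True, False, False]
print(verbose_hits == compact_hits)  # True

# Under re.X an unescaped space is dropped; a backslash keeps it.
dropped = re.compile(r"foo bar", re.X)
kept = re.compile(r"foo\ bar", re.X)
print(bool(dropped.match("foobar")))  # True
print(bool(kept.match("foo bar")))    # True
print(bool(kept.match("foobar")))     # False
```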