Python Regex Flags


Many Python regex functions and regex methods take an optional argument called “flags”. The flags modify the meaning of the given regex pattern.

The flags can be any of:

Summary of Regex Flags

short syntax   long syntax      meaning
re.I           re.IGNORECASE    ignore case.
re.M           re.MULTILINE     make begin/end {^, $} consider each line.
re.S           re.DOTALL        make . match newline too.
re.U           re.UNICODE       make {\w, \W, \b, \B} follow Unicode rules.
re.L           re.LOCALE        make {\w, \W, \b, \B} follow locale.
re.X           re.VERBOSE       allow comments in regex.

To specify more than one flag, join them with the | (bitwise OR) operator. For example, re.search(pattern, string, flags=re.IGNORECASE|re.MULTILINE|re.UNICODE).
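
A minimal sketch of combining two flags. The sample text and variable names are made up for illustration, and print() is used so the snippet also runs on Python 3:

```python
import re

text = "First line\nsecond LINE\nthird line"

# No flags: case-sensitive, and $ matches only at the end of the string.
plain = re.findall(r"line$", text)

# re.IGNORECASE lets "line" match "LINE";
# re.MULTILINE lets $ match at the end of every line.
combined = re.findall(r"line$", text, flags=re.IGNORECASE | re.MULTILINE)

print(plain)     # ['line']
print(combined)  # ['line', 'LINE', 'line']
```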

「re.IGNORECASE」 or 「re.I」

Makes matching case-insensitive. For example, the pattern [a-z] then also matches uppercase letters.
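
A small illustration, with a made-up sample string (written with print() so it also runs on Python 3):

```python
import re

# example of regex flag re.IGNORECASE

text = "Python, PYTHON, python"

case_sensitive = re.findall(r"python", text)
case_insensitive = re.findall(r"python", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
```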

「re.MULTILINE」 or 「re.M」

When specified, the pattern character ^ matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character $ matches at the end of the string and at the end of each line (immediately preceding each newline).

Without this flag, ^ matches only at the beginning of the string, and $ matches only at the end of the string (or just before a newline at the very end of the string). 〔➤ Python Regex Syntax〕

# -*- coding: utf-8 -*-
# python

# example of regex flag re.MULTILINE

import re

ss = """abc
def
ghi"""

r1 = re.findall(r"^\w", ss)
r2 = re.findall(r"^\w", ss, flags = re.MULTILINE)

print r1                        # ['a']
print r2                        # ['a', 'd', 'g']
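
The example above covers ^. A matching sketch for $ on the same sample string might look like this (the variable names are illustrative, and print() is used so it also runs on Python 3):

```python
import re

# example of regex flag re.MULTILINE with $

ss = """abc
def
ghi"""

# Last word character before each position where $ matches.
end_default = re.findall(r"\w$", ss)
end_multiline = re.findall(r"\w$", ss, flags=re.MULTILINE)

print(end_default)    # ['i']
print(end_multiline)  # ['c', 'f', 'i']
```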

「re.DOTALL」 or 「re.S」

Makes the dot character . match any character, including a newline. Without this flag, a dot matches any character except a newline.

# -*- coding: utf-8 -*-
# python

# example of regex flag re.DOTALL

import re

ss = """once upon a time,
there lived a king"""

r1 = re.findall(r".+", ss)
r2 = re.findall(r".+", ss, re.DOTALL)

print r1                        # ['once upon a time,', 'there lived a king']

print r2                        # ['once upon a time,\nthere lived a king']

「re.UNICODE」 or 「re.U」

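Makes the pattern characters {\w, \W, \b, \B} dependent on the Unicode character properties database. This matters in Python 2, where these patterns are ASCII-only by default; in Python 3, patterns given as str already follow Unicode rules, so the flag is redundant there. For example (Python 2):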

# -*- coding: utf-8 -*-

# example of regex re.UNICODE flag

import re

x1 = re.search(r"\w+", u"♥αβγ!", re.U)
x2 = re.search(r"\w+", u"♥αβγ!")

if x1:
    print x1.group().encode("utf8") # → 「αβγ」
else:
    print "no match"

print x2                        # → 「None」

Note that Unicode characters can also appear in the pattern string. Just be sure to add the Unicode prefix u to the pattern string. For example:

# -*- coding: utf-8 -*-

import re

result = re.findall(ur"β", u"αβγ", re.U)
print result[0].encode("utf8")  # prints β

「re.LOCALE」 or 「re.L」

Makes the word patterns {\w, \W} and the boundary patterns {\b, \B} dependent on the current locale. 〔➤ Python Regex Syntax〕

「re.VERBOSE」 or 「re.X」

This flag changes the regex syntax so that you can add annotations to a regex. Whitespace within the pattern is ignored, except when it is in a character class or preceded by an unescaped backslash. When a line contains a # that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored. Example:

# -*- coding: utf-8 -*-

import re

# example of the regex re.VERBOSE flag

# matching a decimal number
p1 = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

p2 = re.compile(r"\d+\.\d*")    # p2 is equivalent to p1

r1 = re.findall(p1, u"a3.45")
r2 = re.findall(p2, u"a3.45")

print r1[0].encode("utf8")      # 3.45
print r2[0].encode("utf8")      # 3.45
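
The exceptions to the whitespace and # rules can be sketched as follows. The pattern and sample strings are made up for illustration, and print() is used so the snippet also runs on Python 3:

```python
import re

# Under re.X a bare space in the pattern is ignored,
# so "a b" matches the string "ab".
ignored = re.search(r"a b", "ab", re.X)
print(ignored.group())  # ab

# To match a literal space or a literal #, put it in a character class
# or escape it with a backslash.
pattern = re.compile(r"""
    \w+     # a word
    [ ]     # a literal space (inside a character class)
    \#      # a literal hash sign (escaped)
    \d+     # one or more digits
    """, re.X)

m = pattern.search("see issue #42 today")
print(m.group())  # issue #42
```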