Python: Regex Flags

By Xah Lee.

Many Python Regex Functions and Regex Methods take an optional argument called “flags”. The flags modify the meaning of the given regex pattern.

The flags can be any of:

Summary of Regex Flags
syntax   long syntax      meaning
re.I     re.IGNORECASE    ignore case.
re.M     re.MULTILINE     make begin/end {^, $} consider each line.
re.S     re.DOTALL        make . match newline too.
re.U     re.UNICODE       make {\w, \W, \b, \B} follow Unicode rules.
re.L     re.LOCALE        make {\w, \W, \b, \B} follow locale.
re.X     re.VERBOSE       allow comment in regex.

To specify more than one of them, use the | operator to connect them, e.g., pass flags=re.IGNORECASE|re.MULTILINE|re.UNICODE after the pattern and string arguments.
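As a minimal sketch of combining flags, the snippet below joins two of them with | and passes the result to re.findall. It is written for Python 3 (an assumption; the examples on this page are Python 2), and the sample text and variable names are made up for illustration.

```python
import re

# Sample text, made up for this illustration.
text = "Apple\nbanana\nAVOCADO"

# The | operator merges several flags into a single value.
combined = re.IGNORECASE | re.MULTILINE

# With only IGNORECASE, ^ matches just at the start of the string.
one_flag = re.findall(r"^a\w*", text, flags=re.IGNORECASE)

# With both flags, ^ matches at the start of every line, in either case.
two_flags = re.findall(r"^a\w*", text, flags=combined)

print(one_flag)   # ['Apple']
print(two_flags)  # ['Apple', 'AVOCADO']
```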


re.IGNORECASE or re.I

Indicates case-insensitive matching.
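A short sketch of case-insensitive matching, written for Python 3 (an assumption; the other examples on this page are Python 2). The sample text is made up for illustration.

```python
import re

# Sample text, made up for this illustration.
text = "Python, PYTHON, python"

# Without the flag, only the exact lowercase spelling matches.
case_sensitive = re.findall(r"python", text)

# With re.IGNORECASE, every spelling matches.
case_insensitive = re.findall(r"python", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
```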

re.MULTILINE or re.M

When specified, the pattern character ^ matches at the beginning of the string and at the beginning of each line (immediately following each newline), and the pattern character $ matches at the end of the string and at the end of each line (immediately preceding each newline).

Normally, ^ and $ only match at the beginning/end of the string. [see Python: Regex Syntax]

# -*- coding: utf-8 -*-
# python 2

# example of regex flag re.MULTILINE

import re

ss = """abc
def
ghi"""

r1 = re.findall(r"^\w", ss)
r2 = re.findall(r"^\w", ss, flags = re.MULTILINE)

print r1    # ['a']
print r2    # ['a', 'd', 'g']
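The flag affects $ in the same way. The following is a small sketch of that, written for Python 3 (an assumption; the example above is Python 2), using a made-up three-line string.

```python
import re

# Three lines, made up for this illustration; no trailing newline.
text = "abc\ndef\nghi"

# Without the flag, $ matches only at the end of the whole string.
last_default = re.findall(r"\w$", text)

# With re.MULTILINE, $ also matches just before each newline.
last_multiline = re.findall(r"\w$", text, flags=re.MULTILINE)

print(last_default)    # ['i']
print(last_multiline)  # ['c', 'f', 'i']
```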

re.DOTALL or re.S

Make the dot character . match any character, including a newline. Without this flag, a dot will match anything except a newline.

# -*- coding: utf-8 -*-
# python 2

# example of regex flag re.DOTALL

import re

ss = """once upon a time,
there lived a king"""

r1 = re.findall(r".+", ss)
r2 = re.findall(r".+", ss, re.DOTALL)

print r1    # ['once upon a time,', 'there lived a king']

print r2    # ['once upon a time,\nthere lived a king']

re.UNICODE or re.U

Make the pattern characters {\w, \W, \b, \B} dependent on the Unicode character properties database.

# -*- coding: utf-8 -*-

# example of regex re.UNICODE flag

import re

x1 = re.search(r"\w+", u"♥αβγ!", re.U)
x2 = re.search(r"\w+", u"♥αβγ!")

if x1:
    print x1.group().encode("utf8") # → 「αβγ」
else:
    print "no match"

print x2    # → 「None」

Note that the pattern string itself can contain Unicode characters. Just be sure to add the Unicode prefix u (or ur for a raw string) to the pattern string.

# -*- coding: utf-8 -*-

import re

result = re.findall(ur"β", u"αβγ", re.U)
print result[0].encode("utf8")  # prints β

re.LOCALE or re.L

Make the word patterns {\w, \W} and the boundary patterns {\b, \B} dependent on the current locale. [see Python: Regex Syntax]

re.VERBOSE or re.X

This flag changes the regex syntax so that you can add annotations in a regex. Whitespace within the pattern is ignored, except when it is in a character class or preceded by an unescaped backslash. When a line contains a # that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

# -*- coding: utf-8 -*-

import re

# example of the regex re.VERBOSE flag

# matching a decimal number
p1 = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

p2 = re.compile(r"\d+\.\d*")    # pattern p2 is same as p1

r1 = re.findall(p1, u"a3.45")
r2 = re.findall(p2, u"a3.45")

print r1[0].encode("utf8")  # 3.45
print r2[0].encode("utf8")  # 3.45
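The exceptions for character classes and backslashes can be checked with a short sketch like the one below. It is written for Python 3 (an assumption; the example above is Python 2), and the pattern names and test strings are made up for illustration.

```python
import re

# Unescaped whitespace is ignored, so this pattern behaves like r"ab".
ignored = re.compile(r"a b", re.VERBOSE)

# A space in a character class, or after a backslash, is a literal space.
in_class = re.compile(r"a[ ]b", re.VERBOSE)
escaped = re.compile(r"a\ b", re.VERBOSE)

# A # after a backslash or in a character class is literal, not a comment.
hash_pat = re.compile(r"""
    \#      # a literal number sign
    [#\d]+  # then number signs or digits
""", re.VERBOSE)

print(ignored.findall("ab a b"))     # ['ab']
print(in_class.findall("ab a b"))    # ['a b']
print(escaped.findall("ab a b"))     # ['a b']
print(hash_pat.findall("id #12#3"))  # ['#12#3']
```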
