Python Regex Functions

By Xah Lee. Date: . Last updated: .

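Here's a summary of regex functions.

re.search(pattern, string)
Return a match object if found, else None.
re.match(pattern, string)
Similar to re.search(), but the match must start at the beginning of the string.
re.split(pattern, string)
Split string using pattern as the boundary. Return a list.
re.findall(pattern, string)
Return a list of all non-overlapping (repeated) matches.
re.finditer(pattern, string)
Similar to re.findall(), but returns an iterator.
re.sub(pattern, repl, string)
Do replacement. Return the new string.
re.subn(pattern, repl, string)
Similar to re.sub(), but returns a tuple. The 1st element is the new string, the 2nd is the number of replacements.
re.escape(string)
Add backslashes to string so it can be fed to regex as a literal pattern. Return the new string.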

Note: optional parameters are not shown above. See below for detail.

re.search(pattern, string)
Return a MatchObject if pattern matches (part or whole of the string), else return None. Note: a successful match does not necessarily contain any part of the given string, because the matched text can be empty. For example, the patterns '' and 'y*' match any string.
re.search(pattern, string, flags=flags)
Use the given flags.
# -*- coding: utf-8 -*-
# python 2

# example of re.search()

import re

xx = re.search(r"\w+@\w+\.com", "from address")

if xx:
    print "yes"
    print xx.group() # → the matched text
else:
    print "no"
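For comparison, here is a sketch of a search that does succeed, showing what the returned match object offers: group() for the matched text, and start(), end(), span() for its position. The input text and the address in it are made up for illustration.

```python
# python 2.7 or 3
import re

# Hypothetical input; the address is invented for this example.
text = "reply to joe@example.com today"

m = re.search(r"\w+@\w+\.com", text)

if m:
    print(m.group())   # joe@example.com
    print(m.start())   # 9   (index where the match begins)
    print(m.end())     # 24  (index just past the end of the match)
    print(m.span())    # (9, 24)
else:
    print("no match")
```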

Note: the pattern should be written as a raw string, like this r"…". Otherwise, backslashes in it must be escaped. For example, to search for a sequence of tabs, use re.search(r"\t+", text) or re.search("\\t+", text).
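A small check of the note above: a raw string and an ordinary string with doubled backslashes hold the same pattern text. The sketch also shows the classic pitfall with \b, where forgetting the r prefix silently turns the word-boundary escape into a backspace character. The sample strings are arbitrary.

```python
# python 2.7 or 3
import re

# Both spellings produce the same 3-character pattern text: backslash, t, plus.
assert r"\t+" == "\\t+"

text = "a\t\tb"
tabs_raw = re.search(r"\t+", text)       # raw string pattern
tabs_escaped = re.search("\\t+", text)   # escaped ordinary string pattern
print(tabs_raw.span())       # (1, 3)
print(tabs_escaped.span())   # (1, 3)

# Pitfall: in an ordinary string, "\b" is the backspace character (\x08),
# not the regex word boundary, so this search finds nothing.
no_boundary = re.search("\bcat", "a cat")
word_boundary = re.search(r"\bcat", "a cat")
print(no_boundary)             # None
print(word_boundary.group())   # cat
```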

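The optional parameter “flags” modifies the meaning of the given pattern. The flags can be any of:

re.IGNORECASE or re.I: case-insensitive matching.
re.MULTILINE or re.M: ^ and $ also match at the beginning and end of each line.
re.DOTALL or re.S: . also matches a newline.
re.UNICODE or re.U: \w, \W, \b, \B, \d, \D, \s, \S follow the Unicode character database.
re.LOCALE or re.L: \w, \W, \b, \B, \s, \S depend on the current locale.
re.VERBOSE or re.X: unescaped whitespace in the pattern is ignored and # starts a comment, for readability.

To specify more than one of them, connect them with the | operator. For example, re.search(pattern, string, flags=re.IGNORECASE|re.MULTILINE|re.UNICODE).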

For detail, see: Python Regex Flags
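To see how combined flags change a result, here is a sketch that runs the same pattern with no flags, with re.IGNORECASE alone, and with re.IGNORECASE | re.MULTILINE. The sample text is arbitrary.

```python
# python 2.7 or 3
import re

text = "Hello world\nhello again"

# No flags: ^ matches only at the very start of the string, and case matters.
no_flags = re.findall(r"^hello", text)
# re.IGNORECASE alone: case is ignored, but ^ still matches only at the start.
ignore_case = re.findall(r"^hello", text, re.IGNORECASE)
# Both flags joined with |: ^ also matches right after each newline.
both = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)

print(no_flags)      # []
print(ignore_case)   # ['Hello']
print(both)          # ['Hello', 'hello']
```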

re.match(pattern, string)
Similar to re.search(), except that the match must start at the beginning of string. For example, re.search('me','somestring') matches, but re.match('me','somestring') returns None.
re.match(pattern, string, flags=flags)
Use flags.
# -*- coding: utf-8 -*-
# python 2

import re

my_result = re.match('so','somestring') # succeed

if my_result is None:
    print "no match"
else:
    print "yes match"

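Note: re.match() is not exactly equivalent to re.search() with a pattern starting with ^. In MULTILINE mode, ^ also matches right after each newline, but re.match() still only matches at the beginning of the string:

re.search(r'^B', 'A\nB', re.M) # succeeds
re.match(r'B', 'A\nB', re.M)   # fails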
re.split(pattern, string)
Return a list of the pieces of string, split using pattern as the boundary.
# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r' +', 'what   do  you think')
#                    ['what', 'do', 'you', 'think']

If the boundary pattern is enclosed in parentheses (a capturing group), then the matched boundaries are also included in the returned list.

# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r'( +)', 'what   do  you think')
#     ['what', '   ', 'do', '  ', 'you', ' ', 'think']

If the pattern has more than one capturing group, all of them are included in the returned list, in sequence.

# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r'( +)(@+)', 'what   @@do  @@you @@think')
# ⇒ ['what', '   ', '@@', 'do', '  ', '@@', 'you', ' ', '@@', 'think']
re.split(pattern, string, maxsplit = n)
Split at most n times. The remainder of the string is returned as the last element of the list.
# -*- coding: utf-8 -*-
# python 2
import re
print re.split(r' ', 'a b c d e', maxsplit = 2)
# ['a', 'b', 'c d e']
re.findall(pattern, string)
Return a list of all non-overlapping matches of pattern in string.
re.findall(pattern, string, flags=flags)
With flags.
# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'@+', 'what   @@@do  @@you @think') # ['@@@', '@@', '@']

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'( +)(@+)', 'what   @@@do  @@you @think')
# ⇒ [('   ', '@@@'), ('  ', '@@'), (' ', '@')]

Empty matches are included in the result unless they touch the beginning of another match.

# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'\b', 'what   @@@do  @@you @think')
# ['', '', '', '', '', '', '', '']

TODO: need another example here showing what is meant by “unless they touch the beginning of another match.”

re.finditer(pattern, string)
Similar to re.findall(), except that it returns an “iterator” of MatchObject instances. This is meant to be used in a loop.
re.finditer(pattern, string, flags=flags)
With flags.
# -*- coding: utf-8 -*-
# python 2
import re

for matched in re.finditer(r'(\w+)', 'what   do  you think'):
    print matched.group() # prints each word in a line
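Because each item from re.finditer() is a match object, it also carries the position of each match, which re.findall() does not give you. A sketch on the same sample sentence:

```python
# python 2.7 or 3
import re

text = "what   do  you think"

# Each match object knows its text and where it was found.
for m in re.finditer(r"\w+", text):
    print((m.group(), m.start(), m.end()))

# Collect (word, start index) pairs.
words = [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]
print(words)   # [('what', 0), ('do', 7), ('you', 11), ('think', 15)]
```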
re.sub(pattern, repl, string)
Replace each match of pattern in string with repl. If the pattern isn't found, string is returned unchanged. Returns a new string.
# -*- coding: utf-8 -*-
# python 2

# example of using re.sub( )

import re

# add alt to image tag
t1 = '<img src="cat.jpg">'
t2 = re.sub(r'src="([a-z]+)\.jpg">', r'src="\1.jpg" alt="\1">', t1)

print t1    # <img src="cat.jpg">
print t2    # <img src="cat.jpg" alt="cat">

repl can also be a function, for more complicated replacements. The function must take a MatchObject as its argument. For each match, the function is called, and its return value is used as the replacement string.

# -*- coding: utf-8 -*-
# python 2

# example of using re.sub(pattern, rep, str ) where rep is a function

import re

def ff(xx):
    if xx.group() == "ea":
        return "æ"
    elif xx.group() == "oo":
        return "u"
    else:
        return xx.group() # leave other vowel runs unchanged

print re.sub(r"[aeiou]+", ff, "encyclopeadia") # encyclopædia
print re.sub(r"[aeiou]+", ff, "book") # buk
print re.sub(r"[aeiou]+", ff, "geek") # geek

pattern may be a string or a regex object (created with re.compile()). If you need to specify regular expression flags, you can compile them into a regex object, or, in Python 2.7 and later, pass the flags argument to re.sub(). Alternatively, you can embed flags in your pattern with (?iLmsux) at the beginning of the pattern. For example, re.sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. (See: regex pattern syntax for detail.)
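Here is a sketch of the ways to get a case-insensitive re.sub() described above: a compiled regex object (used through its own sub() method or passed to re.sub()), the flags argument, and an inline (?i). All four give the same result on this arbitrary sample.

```python
# python 2.7 or 3
import re

# Compile once with a flag; the regex object has its own sub() method.
pat = re.compile(r"b+", re.IGNORECASE)
via_method = pat.sub("x", "bbbb BBBB")

# A regex object can also be passed to re.sub() in place of a pattern string.
via_function = re.sub(pat, "x", "bbbb BBBB")

# The flags argument of re.sub() (Python 2.7 and later).
via_keyword = re.sub(r"b+", "x", "bbbb BBBB", flags=re.IGNORECASE)

# The same flag embedded at the beginning of the pattern.
via_inline = re.sub(r"(?i)b+", "x", "bbbb BBBB")

print(via_method)     # x x
print(via_function)   # x x
print(via_keyword)    # x x
print(via_inline)     # x x
```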

re.sub(pattern, repl, string, count)
count is the maximum number of pattern occurrences to be replaced. If count is 0 (the default), all occurrences are replaced.
re.subn(pattern, repl, string)
Same as re.sub(), except it returns a tuple: (new_string, number_of_substitutions_made).
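A short sketch of re.subn(), alone and with count, on an arbitrary sample string:

```python
# python 2.7 or 3
import re

# re.subn() returns the new string and the number of replacements made.
new_text, n = re.subn(r"o", "0", "foo boo")
print(new_text)   # f00 b00
print(n)          # 4

# With count, at most that many replacements happen; the tuple reports how many did.
limited = re.subn(r"o", "0", "foo boo", count=2)
print(limited)    # ('f00 boo', 2)
```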

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>…) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character 0. The backreference \g<0> substitutes in the entire substring matched by the pattern.
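A sketch of the replacement references described above: \g<name> with named groups, \g<2>0 to put a literal 0 after group 2, and \g<0> for the whole match. The date string is made-up sample input.

```python
# python 2.7 or 3
import re

date = "2024-05-17"   # hypothetical sample input

# Named groups, referenced by name in the replacement with \g<name>.
pat = r"(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)"
dmy = re.sub(pat, r"\g<day>/\g<month>/\g<year>", date)
print(dmy)     # 17/05/2024

# \g<2>0 means group 2 followed by a literal 0 (\20 would mean group 20).
g20 = re.sub(r"(a)(b)", r"\g<2>0", "ab")
print(g20)     # b0

# \g<0> is the entire substring matched by the pattern.
whole = re.sub(r"\d+", r"[\g<0>]", "a1b22")
print(whole)   # a[1]b[22]
```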

re.escape(string)
Return a string with a backslash character \ inserted in front of every non-alphanumeric character. This is useful if you want to use a given string as a pattern for an exact match.
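A sketch of why re.escape() matters: the search text below contains regex metacharacters, so it only matches literally after escaping. The strings are made up. (In Python 3.7 and later, re.escape() escapes only characters that are special in a regex, so the escaped string can differ from Python 2's, but it still matches the original text literally.)

```python
# python 2.7 or 3
import re

needle = "1+1=2 (really?)"   # literal text containing + ( ) ?
text = "He wrote 1+1=2 (really?) on the board."

# Used as-is, + ( ) ? are regex syntax, so the literal text is not found.
unescaped = re.search(needle, text)
print(unescaped)   # None

# After re.escape(), every character is matched literally.
escaped_match = re.search(re.escape(needle), text)
print(escaped_match.group())   # 1+1=2 (really?)
```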

exception re.error

Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.
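A sketch of catching re.error for an invalid pattern. The helper try_compile is hypothetical, written only for this example. The last lines show that a pattern with no match is not an error.

```python
# python 2.7 or 3
import re

def try_compile(pattern):
    """Hypothetical helper: return a compiled regex, or None if pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as err:
        print("invalid pattern %r: %s" % (pattern, err))
        return None

good = try_compile(r"(\w+)@(\w+)")
bad = try_compile(r"(\w+")   # unmatched parenthesis: re.error is raised and caught

# A pattern that simply doesn't match is not an error; re.search() returns None.
no_match = re.search(r"xyz", "abc")
print(no_match)   # None
```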
