Python Regex Match Object


Some regex functions and methods return a “MatchObject”. The following are its methods and attributes.

Match Object Methods

expand(template string) → return a transformed version of template string, with back references replaced by the corresponding captured groups. The template works the same way as the replacement string in re.sub(…, template string, …).

Back references include the numeric forms {\1, \2, …} and {\g<1>, \g<2>, …}, and the named form \g<name>. Note that a named form only works if the group is named in the regex pattern with (?P<name>pattern). (See: Regex Syntax.)

# -*- coding: utf-8 -*-
# python

# example of regex match object .expand()

import re

xx = re.compile(r"(\d\d\d\d)")

yy = xx.search("in the year 1999")

print(yy.expand(r"Year: \1"))    # Year: 1999
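
The three back reference forms can be compared side by side. The following is a minimal sketch; the pattern, the group name "year", and the sample text are made up for illustration.

```python
import re

# hypothetical sample pattern: one named group and one plain numbered group
pat = re.compile(r"(?P<year>\d\d\d\d)-(\d\d)")
m = pat.search("released on 1999-12")

numeric = m.expand(r"\2/\1")           # numeric form
explicit = m.expand(r"\g<2>/\g<1>")    # \g<n> form
named = m.expand(r"\g<2>/\g<year>")    # named form

print(numeric)     # 12/1999
print(explicit)    # 12/1999
print(named)       # 12/1999

# \g<n> is useful when a digit follows the reference directly
print(m.expand(r"\g<2>0"))    # 120
```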

groups(), group(), groupdict()

Summary of group methods:
groups() → return a tuple containing all the captured groups of the match.
group(n1, n2, …) → return a string or tuple containing one or more captured groups, in the order given. The arguments should be integers (or group names). With a single argument, returns a string; otherwise a tuple.
groupdict() → return a dictionary containing all the named groups of the match. Each key is the name of a named group.

groups()

Return a tuple containing all the subgroups of the match.

# -*- coding: utf-8 -*-
# python

# example using match object's .groups() method

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groups())         # ('a1', 'a2', 'a3')

group(…)

Return a string or tuple containing one or more captured groups, in the order given. The arguments should be integers: 0 is the whole matched pattern, 1 is the 1st captured string, etc. With just 1 argument, it returns a string; otherwise a tuple. Calling it with no argument is equivalent to group(0).

# -*- coding: utf-8 -*-
# python

# example of regex match object's methods groups() and group(…)

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)

print(matchObj.groups())       # ('a1', 'a2', 'a3')

print(matchObj.group())        # 'some a1 a2 a3 list'
print(matchObj.group(0))       # 'some a1 a2 a3 list'
print(matchObj.group(1))       # 'a1'
print(matchObj.group(2))       # 'a2'
print(matchObj.group(1,2))     # ('a1', 'a2')
print(matchObj.group(2,1,1))   # ('a2', 'a1', 'a1')
print(matchObj.group(0,1))     # ('some a1 a2 a3 list', 'a1')

If an argument is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.
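
Here is a small sketch of that behavior. The helper name raises_index_error is made up for this example; it is not part of the re module.

```python
import re

m = re.search(r"(\w\d+) (\w\d+)", "some a1 a2 list")

def raises_index_error(match, *args):
    """Return True if match.group(*args) raises IndexError. (hypothetical helper)"""
    try:
        match.group(*args)
    except IndexError:
        return True
    return False

print(raises_index_error(m, 2))     # False, group 2 exists
print(raises_index_error(m, 3))     # True, the pattern has only 2 groups
print(raises_index_error(m, -1))    # True, negative index
```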

If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
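
Both cases can be seen in a short sketch. The patterns and sample strings here are made up for illustration.

```python
import re

# group 2 sits in an optional part of the pattern that does not match here
m1 = re.search(r"(\d+)(px)?", "width 100")
print(m1.group(1))    # 100
print(m1.group(2))    # None
print(m1.groups())    # ('100', None)

# group 1 is repeated 3 times; only the last repetition is kept
m2 = re.search(r"(\w\d)+", "a1b2c3")
print(m2.group(0))    # a1b2c3
print(m2.group(1))    # c3
```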

If the regex pattern uses the (?P<name>…) syntax, the arguments may also be strings identifying groups by name.

# -*- coding: utf-8 -*-
# python

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (?P<second>\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.group(1, "second", 3)) # ('a1', 'a2', 'a3')

If a string argument is not used as a group name in the pattern, an IndexError exception is raised.
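
A minimal sketch of the unknown name case. The group name "third" is deliberately one that the pattern does not define.

```python
import re

m = re.search(r"(?P<second>\w\d+)", "some a1 list")

print(m.group("second"))    # a1

try:
    m.group("third")        # no group has this name
    result = "found"
except IndexError:
    result = "IndexError"

print(result)               # IndexError
```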

groupdict(…)

Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None.

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groupdict()) # prints {'this': 'a1', 'second': 'a2', 'thatt': 'a3'} (key order may differ on Python versions before 3.7)
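
The default argument only matters when a named group did not take part in the match. A short sketch, using a made-up pattern with an optional named group:

```python
import re

# hypothetical pattern: "value" is inside an optional part
pat = re.compile(r"(?P<name>[a-z]+)(?:=(?P<value>\d+))?")
m = pat.search("width")

print(m.groupdict())         # {'name': 'width', 'value': None}
print(m.groupdict("n/a"))    # {'name': 'width', 'value': 'n/a'}
```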

start(…), end(…), span(…)

Summary of position methods:
start(n) → return the index where the nth captured group begins.
end(n) → return the index where the nth captured group ends.
span(n) → return a tuple with the start and end positions of the nth captured group.

start(…), end(…)

Return the index of the start (or the end) of the substring matched by the nth captured group. start() is equivalent to start(0), and similarly for end(). (0 refers to the string matched by the whole regex pattern.) Example:

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.start(1))    # prints 5
print(matchObj.end(1))      # prints 7

Both methods return -1 if the group exists but did not contribute to the match. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is m.string[m.start(g):m.end(g)].
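
A minimal sketch of both points, using a made-up alternation pattern where only one of the two groups can take part in a match:

```python
import re

m = re.search(r"(a)|(b)", "xxb")

# group 1 exists in the pattern but did not contribute to the match
print(m.start(1), m.end(1))    # -1 -1

# group 2 did contribute, so slicing m.string gives the same text as m.group(2)
print(m.start(2), m.end(2))    # 2 3
print(m.string[m.start(2):m.end(2)] == m.group(2))    # True
```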

Note that m.start(myGroup) will equal m.end(myGroup) if myGroup matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an “IndexError” exception.
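
The same example, written out as a script that can be run to check each value:

```python
import re

m = re.search('b(c?)', 'cba')

print(m.start(0), m.end(0))    # 1 2
print(m.start(1), m.end(1))    # 2 2
print(repr(m.group(1)))        # ''

try:
    m.start(2)                 # the pattern has only 1 group
    outcome = "no error"
except IndexError:
    outcome = "IndexError"

print(outcome)                 # IndexError
```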

span(…)

For MatchObject m, return the 2-tuple (m.start(n), m.end(n)). Note that if the given captured group did not contribute to the match, this is (-1, -1). span() is equivalent to span(0).
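
A short sketch of span(), including a group that did not contribute to the match. The pattern and text are made up for illustration.

```python
import re

# group 2 is optional and does not match in this text
m = re.search(r"(\d+)(px)?", "width 100")

print(m.span())     # (6, 9)
print(m.span(1))    # (6, 9)
print(m.span(2))    # (-1, -1)
```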

Match Object Attributes

The following are various attributes of the MatchObject.

string

The string passed to match() or search().

Example:

# -*- coding: utf-8 -*-
# python
import re

mm = re.compile(r'some.+').search('some text')
print(mm.string)    # prints 'some text'

re

The regular expression object whose match() or search() method produced this MatchObject instance.
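
For example (a minimal sketch; the variable names are arbitrary):

```python
import re

pat = re.compile(r'some.+')
mm = pat.search('some text')

print(mm.re is pat)     # True
print(mm.re.pattern)    # some.+
```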

pos

The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos

The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
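
Here is a sketch showing pos and endpos together. Note that the optional pos and endpos arguments are accepted by the search() and match() methods of a compiled pattern, not by the module-level re.search() function. The sample text is made up.

```python
import re

pat = re.compile(r"\d+")
text = "10 20 30"

# no pos or endpos given: the whole string is searched
m1 = pat.search(text)
print(m1.pos, m1.endpos)    # 0 8
print(m1.group())           # 10

# search only from index 2 up to (not including) index 5
m2 = pat.search(text, 2, 5)
print(m2.pos, m2.endpos)    # 2 5
print(m2.group())           # 20
```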

lastindex

The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.
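
The expressions above can be checked with a short loop. The last pattern, with no groups at all, is added here to show the None case.

```python
import re

results = {}
for pattern in (r"(a)b", r"((a)(b))", r"((ab))", r"(a)(b)", r"ab"):
    m = re.search(pattern, "ab")
    results[pattern] = m.lastindex
    print(pattern, m.lastindex)

# prints:
# (a)b 1
# ((a)(b)) 1
# ((ab)) 1
# (a)(b) 2
# ab None
```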

lastgroup

The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.
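
A minimal sketch of the three cases. The group names "first" and "second" are made up.

```python
import re

# the last matched group has a name
m1 = re.search(r"(?P<first>a)(?P<second>b)", "ab")
print(m1.lastgroup)    # second

# the last matched group (group 2) has no name
m2 = re.search(r"(?P<first>a)(b)", "ab")
print(m2.lastgroup)    # None

# the pattern has no groups at all
m3 = re.search(r"ab", "ab")
print(m3.lastgroup)    # None
```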
