
Python Regex Match Objects

Xah Lee, 2005,

this page is under revision.

Match Objects

Some functions and methods of the “re” module, such as search() and match(), return a “MatchObject”. The following are its methods and attributes:

expand(‹template string›)

Returns ‹template string›, with back references replaced from the captured pattern. This is similar to the “sub()” function. Example:

matchObj = re.compile(‹pattern›).search(‹mytext›)
result = matchObj.expand(‹mytemplate›)

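is related to, but not the same as,

result = re.sub(‹pattern›, ‹mytemplate›, ‹mytext›)

The difference: expand() returns only the expanded template, while sub() returns the whole of ‹mytext› with every match replaced by the expanded template.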

Back references include the numeric forms, ⁖ \1, \2, …, or \g<1>, \g<2>, …, as well as named forms, ⁖ \g<name>. Note that a named form must be defined in the regex pattern by (?P<name>‹pattern›). 〔☛ Regex Syntax〕

Here's a complete usage example of expand():

# -*- coding: utf-8 -*-
# python

import re
xx = re.compile(r'date is (\d\d\d\d)')
yy = xx.search('the date is 1999.')
print(yy.expand(r'Date: \1')) # prints: Date: 1999
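The following is a small sketch of the difference between expand() and sub(), using a named back reference too. The sample text, the group name “year”, and the variable names are made up for illustration. print is called with one parenthesized argument so the code runs under both Python 2 and Python 3.

```python
import re

text = 'the date is 1999.'
pattern = r'date is (?P<year>\d\d\d\d)'
template = r'Date: \g<year> (\1)'   # named and numeric back reference to the same group

match_obj = re.search(pattern, text)

# expand() returns only the filled-in template
expanded = match_obj.expand(template)

# sub() returns the whole text, with the matched part replaced
substituted = re.sub(pattern, template, text)

print(expanded)      # Date: 1999 (1999)
print(substituted)   # the Date: 1999 (1999).
```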

groups(), group(), groupdict()

The methods “groups()”, “group()”, and “groupdict()” all return the captured matches, in different ways.

“groups()” returns them all, “group(n1, n2, …)” returns them in a user-specified order and combination, and “groupdict()” returns the named captures as a dictionary.

groups()

Return a tuple containing all the subgroups of the match.

Example:

# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groups()) # prints: ('a1', 'a2', 'a3')

group([n1, n2, n3, …])

Returns one or more captured patterns. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Each argument is an integer reference to a captured pattern, with 0 denoting the entire matched pattern. group() with no argument is equivalent to group(0).

Example:

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groups())       # ⇒ ('a1', 'a2', 'a3')
print(matchObj.group())        # ⇒ 'some a1 a2 a3 list'
print(matchObj.group(0))       # ⇒ 'some a1 a2 a3 list'
print(matchObj.group(1))       # ⇒ 'a1'
print(matchObj.group(2))       # ⇒ 'a2'
print(matchObj.group(1,2))     # ⇒ ('a1', 'a2')
print(matchObj.group(2,1,1))   # ⇒ ('a2', 'a1', 'a1')
print(matchObj.group(0,1))     # ⇒ ('some a1 a2 a3 list', 'a1')

If an argument is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.

If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
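Here is a short sketch of both cases. The patterns and sample strings are made up for illustration: an optional group that takes no part in the match, and a repeated group that matches several times.

```python
import re

# (b)? is optional; against 'ac' it takes no part in the match
m1 = re.search(r'a(b)?c', 'ac')
optional_group = m1.group(1)
print(optional_group)    # None
print(m1.groups())       # (None,)

# (\d)+ matches three times in '123'; only the last repetition is kept
m2 = re.search(r'(\d)+', 'x123y')
whole = m2.group(0)
repeated_group = m2.group(1)
print(whole)             # 123
print(repeated_group)    # 3
```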

If the regular expression uses the (?P<name>…) syntax, the arguments may also be strings identifying groups by name.

Example:

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (?P<second>\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.group(1,'second',3))     # prints: ('a1', 'a2', 'a3')

If a string argument is not used as a group name in the pattern, an IndexError exception is raised.
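The error cases for group() can be sketched like this. The helper try_group is hypothetical, written only for this illustration; it turns the exception into a string so that every case can be printed.

```python
import re

m = re.search(r'(\w\d+) (?P<second>\w\d+)', 'some a1 a2 a3 list')

def try_group(match_obj, ref):
    """Hypothetical helper: return the group, or 'IndexError' if the reference is invalid."""
    try:
        return match_obj.group(ref)
    except IndexError:
        return 'IndexError'

print(try_group(m, 2))          # a2
print(try_group(m, 'second'))   # a2
print(try_group(m, 3))          # IndexError, the pattern has only 2 groups
print(try_group(m, -1))         # IndexError, negative index
print(try_group(m, 'third'))    # IndexError, no group has this name
```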

groupdict([default])

Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None.

Example:

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj=patObj.search(myText)
print(matchObj.groupdict()) # prints {'this': 'a1', 'second': 'a2', 'thatt': 'a3'} (key order may differ in older Python versions)
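The default argument matters only when a named group did not participate in the match. A minimal sketch, with a made-up “key=value” pattern whose value part is optional:

```python
import re

# the "=value" part is optional, so the group named "value" may not participate
pat = re.compile(r'(?P<key>\w+)(=(?P<value>\w+))?')
m = pat.search('verbose')

with_none = m.groupdict()       # missing groups map to None
with_empty = m.groupdict('')    # missing groups map to the given default

print(with_none['value'])       # None
print(repr(with_empty['value']))  # ''
```

Note that the unnamed group around “=value” does not appear in the dictionary; only named groups do.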

start([n])

end([n])

Return the indices of the start and end of the substring matched by the nth captured pattern. start() is equivalent to start(0), and similarly for end(). (0 represents the string matched by the whole regex pattern.) Example:

# -*- coding: utf-8 -*-
# python
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.start(1))    # prints 5
print(matchObj.end(1))      # prints 7

Both methods return -1 if the group exists but did not contribute to the match, for example when the group sits in an optional part or an alternation branch that was not used. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is m.string[m.start(g):m.end(g)]

Note that m.start(myGroup) will equal m.end(myGroup) if myGroup matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an “IndexError” exception.
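The three situations above can be checked with a short sketch. The alternation pattern and its sample string are made up for illustration; the second pattern is the one from the preceding paragraph.

```python
import re

# alternation: only one branch takes part in the match
m = re.search(r'(a)|(b)', 'xxb')

unused = (m.start(1), m.end(1))    # group 1 exists but did not contribute
used = (m.start(2), m.end(2))      # group 2 matched 'b'
print(unused)    # (-1, -1)
print(used)      # (2, 3)

# slicing the original string with start and end gives the same as group()
sliced = m.string[m.start(2):m.end(2)]
print(sliced == m.group(2))    # True

# a group that matched the empty string has start equal to end
m2 = re.search('b(c?)', 'cba')
empty = (m2.start(1), m2.end(1), m2.group(1))
print(empty)     # (2, 2, '')
```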

span([n])

For MatchObject m, return the 2-tuple (m.start(n), m.end(n)). Note that if the given captured pattern did not contribute to the match, this is (-1, -1). span() is equivalent to span(0).
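A small sketch of span(), with a made-up pattern whose third group is optional and is not used in the match:

```python
import re

# the trailing " am" or " pm" is optional and absent in the sample text
m = re.search(r'(\d+)-(\d+)(?: (am|pm))?', 'open 10-20')

spans = [m.span(), m.span(1), m.span(2), m.span(3)]
print(spans)    # [(5, 10), (5, 7), (8, 10), (-1, -1)]
```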

The following are various attributes of the MatchObject.

string

The string passed to match() or search().

Example:

# -*- coding: utf-8 -*-
# python
import re

mm = re.compile(r'some.+').search('some text')
print(mm.string)    # prints: some text

re

The regular expression object whose match() or search() method produced this MatchObject instance.

pos

The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos

The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
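The attributes “re”, “pos”, and “endpos” can be seen together in a short sketch. The pattern, the sample text, and the positions 3 and 6 are made up for illustration. When pos and endpos are not given, they are 0 and the length of the string.

```python
import re

pat = re.compile(r'\d+')
text = 'a1 b22 c333'

# search only the region from index 3 up to (not including) index 6, that is 'b22'
m = pat.search(text, 3, 6)
found = m.group()
print(found)          # 22
print(m.pos)          # 3
print(m.endpos)       # 6
print(m.re is pat)    # True
print(m.string)       # a1 b22 c333

# without pos and endpos, the whole string is searched
m_default = pat.search(text)
default_range = (m_default.pos, m_default.endpos)
print(default_range)  # (0, 11)
```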

lastindex

The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.
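The expressions above can be checked in one go. This sketch adds one extra pattern, ab, which has no group at all, to show the None case.

```python
import re

patterns = [r'(a)b', r'((a)(b))', r'((ab))', r'(a)(b)', r'ab']

# apply every pattern to the same string and collect lastindex
last_indexes = [re.match(p, 'ab').lastindex for p in patterns]
print(last_indexes)    # [1, 1, 1, 2, None]
```

With nested groups, the outer group is the last one to finish matching, which is why ((a)(b)) gives 1 and not 3.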

lastgroup

The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.
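A minimal sketch of lastgroup. The group names “first”, “second”, “num”, and “word” are made up for illustration. The last case shows a common use: with an alternation of named groups, lastgroup tells which branch matched.

```python
import re

# last matched group has a name
named = re.match(r'(?P<first>a)(?P<second>b)', 'ab').lastgroup
print(named)      # second

# last matched group has no name
unnamed = re.match(r'(?P<first>a)(b)', 'ab').lastgroup
print(unnamed)    # None

# pattern has no group at all
no_group = re.match(r'ab', 'ab').lastgroup
print(no_group)   # None

# alternation of named groups: lastgroup names the branch that matched
branch = re.match(r'(?P<num>\d+)|(?P<word>\w+)', 'abc').lastgroup
print(branch)     # word
```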
