Python Regex Match Object

By Xah Lee.

Some regex functions and methods return a “MatchObject”. The following are its methods and attributes.

Match Object Methods

expand(template_string) → return a transformed version of template_string, with back references replaced by the corresponding captured groups. This is similar to using re.sub(…, template_string, …).

Back references include the numeric forms {\1, \2, …} and {\g<1>, \g<2>, …}, and the named form \g<name>. Note that a named form must be defined in the regex pattern by (?P<name>pattern) [see Regex Syntax]

# -*- coding: utf-8 -*-
# python 2

# example of regex match object .expand()

import re

xx = re.compile(r"(\d\d\d\d)")

yy = xx.search("in the year 1999")

print yy.expand(r"Year: \1")    # Year: 1999
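
The three back reference forms can be compared side by side. The following is a minimal sketch; the pattern, the group names year and month, and the sample text are made up for illustration, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# hypothetical pattern and text, chosen only for this sketch
pat = re.compile(r"(?P<year>\d\d\d\d)-(?P<month>\d\d)")
m = pat.search("released on 1999-12")

by_number = m.expand(r"\2/\1")             # numeric form
by_g_number = m.expand(r"\g<2>/\g<1>")     # \g<number> form
by_name = m.expand(r"\g<month>/\g<year>")  # named form

print(by_number)    # 12/1999
print(by_g_number)  # 12/1999
print(by_name)      # 12/1999
```

The \g<number> form is handy when the back reference is followed by a digit, since \g<1>0 is unambiguous while \10 would be read as group 10.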

groups(), group(), groupdict()

Summary of group methods

syntax → meaning
groups() → return a tuple containing all the captured groups of the match.
group(n1, n2, …) → return a string or tuple containing one or more captured groups, in the order given. The arguments should be integers. If there's a single argument, returns a string, else a tuple.
groupdict() → return a dictionary containing all the named groups of the match. Each key is the name of a named group.

groups()

Return a tuple containing all the subgroups of the match.

# -*- coding: utf-8 -*-
# python 2

# example using match object's .groups() method

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print matchObj.groups()   # ('a1', 'a2', 'a3')

group(…)

Return a string or tuple containing one or more captured groups, in the order given. The arguments should be integers: 0 is the whole matched pattern, 1 is the 1st captured string, etc. If there's just 1 argument, returns a string, else a tuple. No argument is equivalent to group(0).

# -*- coding: utf-8 -*-
# python 2

# example of regex match object's methods groups() and group(…)

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)

print matchObj.groups() # ('a1', 'a2', 'a3')

print matchObj.group()  # 'some a1 a2 a3 list'
print matchObj.group(0) # 'some a1 a2 a3 list'
print matchObj.group(1) # 'a1'
print matchObj.group(2) # 'a2'
print matchObj.group(1,2) # ('a1', 'a2')
print matchObj.group(2,1,1)   # ('a2', 'a1', 'a1')
print matchObj.group(0,1) # ('some a1 a2 a3 list', 'a1')

If an argument is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.

If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
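
Both cases can be illustrated with a short sketch. The patterns and input strings here are invented for the example, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# group 2 is optional and takes no part in this match
m1 = re.search(r"(a)(b)?", "ac")
print(m1.group(2))   # None
print(m1.groups())   # ('a', None)

# group 1 sits inside a repetition; only its last match is kept
m2 = re.search(r"(\w\d)+", "a1b2c3")
print(m2.group(0))   # a1b2c3
print(m2.group(1))   # c3
```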

If the regex pattern uses the (?P<name>…) syntax, the arguments may also be strings identifying groups by name.

# -*- coding: utf-8 -*-
# python 2

import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (?P<second>\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print matchObj.group(1, "second", 3) # ('a1', 'a2', 'a3')

If a string argument is not used as a group name in the pattern, an IndexError exception is raised.
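
The error cases can be checked with a small sketch. The pattern and the bad arguments (3, -1, "third") are chosen only for illustration, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# pattern with two groups: group 1 is unnamed, group 2 is named "second"
m = re.search(r"(\w\d+) (?P<second>\w\d+)", "some a1 a2 a3 list")

errors = []
for bad in (3, -1, "third"):   # too large, negative, unknown name
    try:
        m.group(bad)
    except IndexError:
        errors.append(bad)

print(errors)   # [3, -1, 'third']
```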

groupdict(…)

Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The optional argument default is the value used for groups that did not participate in the match; it defaults to None.

# -*- coding: utf-8 -*-
# python 2
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj=patObj.search(myText)
print matchObj.groupdict() # prints {'this': 'a1', 'thatt': 'a3', 'second': 'a2'}
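
The effect of the default argument shows up only when a named group did not participate in the match. Here is a minimal sketch, assuming a made-up pattern with an optional named group num; it is written so it should run on both Python 2 and Python 3.

```python
import re

# "num" is optional, so it does not participate when the text has no digits
pat = re.compile(r"(?P<word>[a-z]+)(?P<num>\d+)?")
m = pat.search("abc")

d_none = m.groupdict()       # missing groups become None
d_empty = m.groupdict("")    # missing groups become the empty string

print(d_none["word"])        # abc
print(d_none["num"])         # None
print(d_empty["num"] == "")  # True
```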

start(…), end(…), span(…)

Summary of position methods

syntax → meaning
start(n) → return the index where the nth captured group begins.
end(n) → return the index where the nth captured group ends.
span(n) → return a tuple, with the start and end position of the nth captured group.

start(…), end(…)

start(n) and end(n) return the indices of the start and end of the substring matched by the nth captured group. start() is equivalent to start(0), and similarly for end(). (0 represents the string matched by the whole regex pattern.)

# -*- coding: utf-8 -*-
# python 2
import re

myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print matchObj.start(1)    # prints 5
print matchObj.end(1)  # prints 7

Both methods return -1 if the group exists but did not contribute to the match. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is m.string[m.start(g):m.end(g)].

Note that m.start(myGroup) will equal m.end(myGroup) if myGroup matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an “IndexError” exception.
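
The -1 case and the slicing identity can be seen together in a short sketch. The alternation pattern and the input are made up for the example, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# only the second alternative matches, so group 1 does not contribute
m = re.search(r"(a)|(b)", "xxb")

print(m.start(1))   # -1
print(m.end(1))     # -1
print(m.start(2))   # 2
print(m.end(2))     # 3

# slicing the original string gives the same text as group(2)
piece = m.string[m.start(2):m.end(2)]
print(piece == m.group(2))   # True
```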

span(…)

For MatchObject m, return the 2-tuple (m.start(n), m.end(n)). Note that if the given captured group did not contribute to the match, this is (-1, -1). span() is equivalent to span(0).
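
A small sketch of span(), including the (-1, -1) case. The pattern and the sample text are invented for illustration, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# group 2 (a decimal part) is optional and absent from this text
m = re.search(r"(\d+)(\.\d+)?", "price 42 dollars")

print(m.span())    # (6, 8)
print(m.span(0))   # (6, 8)
print(m.span(1))   # (6, 8)
print(m.span(2))   # (-1, -1)
```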

Match Object Attributes

The following are various attributes of the MatchObject.

string

The string passed to match() or search().

Example:

# -*- coding: utf-8 -*-
# python 2
import re

mm = re.compile(r'some.+').search('some text')
print mm.string    # prints 'some text'

re

The regular expression object whose match() or search() method produced this MatchObject instance.
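
Example, as a minimal sketch (the variable names are arbitrary; written so it should run on both Python 2 and Python 3):

```python
import re

pat = re.compile(r"some.+")
mm = pat.search("some text")

# the match object keeps a reference to the compiled pattern that made it
print(mm.re is pat)     # True
print(mm.re.pattern)    # some.+
```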

pos

The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos

The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
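
Both pos and endpos can be inspected after a search restricted to a slice of the string. This is a sketch with a made-up sample string; it is written so it should run on both Python 2 and Python 3.

```python
import re

pat = re.compile(r"\d+")
text = "a1 b22 c333"

# search only within text[3:6], which is "b22"
m = pat.search(text, 3, 6)
print(m.group())   # 22
print(m.pos)       # 3
print(m.endpos)    # 6

# without pos and endpos, the whole string is searched
m2 = pat.search(text)
print(m2.group())  # 1
print(m2.pos)      # 0
print(m2.endpos)   # 11
```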

lastindex

The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.
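
The expressions above can be checked directly. In this sketch the last pattern, ab with no groups, is added only to show the None case; the code is written so it should run on both Python 2 and Python 3.

```python
import re

patterns = [r"(a)b", r"((a)(b))", r"((ab))", r"(a)(b)", r"ab"]

# lastindex is the number of the group whose closing parenthesis matched last
results = [re.search(p, "ab").lastindex for p in patterns]

print(results)   # [1, 1, 1, 2, None]
```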

lastgroup

The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.
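
A minimal sketch of the three cases. The group names first and second are made up for the example, and the code is written so it should run on both Python 2 and Python 3.

```python
import re

# last matched group is named
m1 = re.search(r"(?P<first>a)(?P<second>b)", "ab")
print(m1.lastgroup)   # second

# last matched group has no name
m2 = re.search(r"(?P<first>a)(b)", "ab")
print(m2.lastgroup)   # None

# pattern has no groups at all
m3 = re.search(r"ab", "ab")
print(m3.lastgroup)   # None
```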
