Python: Regex Object and Methods
Compile Regex Object
re.compile(regexStr)
- Compiles regexStr into a regex object, which can then be used with the regex object methods, e.g. RegexObj.search(…).
re.compile(regexStr, flags)
- Use flags. 〔see Python: Regex Flags〕
Using a regex object is similar to using the re module functions. For example, the following two lines have the same result:
x = re.search(regexStr, text)
x = re.compile(regexStr).search(text)
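Creating a regex object is useful when you need to apply the same pattern to many strings: the pattern is compiled once and then reused, instead of being processed again on every call. (The re module does cache recently compiled patterns, so for a handful of patterns the speed gain is often modest.)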
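A minimal Python 3 sketch of reusing one compiled pattern over many strings. The date pattern and the sample lines are made up for illustration; the loop also checks that the module-level re.search agrees with the compiled object.

```python
import re

# Compile once, then reuse the same regex object for every string.
date_pat = re.compile(r"\d{4}-\d{2}-\d{2}")

lines = [
    "released 2014-03-01",
    "no date here",
    "updated 2015-11-20, fixed 2015-12-02",
]

found = []
for line in lines:
    m = date_pat.search(line)  # first match only, or None
    # The module-level function gives the same answer for the same pattern.
    same = re.search(r"\d{4}-\d{2}-\d{2}", line)
    assert (m is None) == (same is None)
    if m:
        found.append(m.group())

print(found)  # ['2014-03-01', '2015-11-20']
```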
The following sections describe the regex object methods. Most of them have the same syntax as the Regex Functions, but some are more flexible because they accept optional startPos and endPos parameters.
Regex Object Methods
search
RegexObj.search(text)
- re.compile(regexStr).search(text) is equivalent to re.search(regexStr, text). 〔see re.search〕
RegexObj.search(text, startPos)
- Search only the part of text starting at index startPos.
RegexObj.search(text, startPos, endPos)
- Also stop at index endPos, as if text were only endPos characters long.
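The following two calls usually find the same text:
RegexObj.search(text, m, n)
RegexObj.search(text[m:n])
They are not fully equivalent, though. With startPos, ^ still matches only at the real beginning of text (or after a newline, with the MULTILINE flag), not at index m. Also, the positions reported by the resulting match object (.start(), .end(), .span()) are indices into the whole text in the first form, but indices into the slice in the second.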
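A small Python 3 sketch of the two differences between passing startPos and slicing. The sample string "xab" is chosen only to make the positions easy to follow.

```python
import re

text = "xab"
anchored = re.compile(r"^ab")

# With startPos, ^ still means "start of the whole string", so no match at index 1.
r1 = anchored.search(text, 1)
# After slicing, index 1 of text becomes index 0 of a new string, so ^ matches.
r2 = anchored.search(text[1:])
print(r1, r2.group())  # None ab

letter_b = re.compile(r"b")
# Positions from the startPos form are indices into the whole text...
span_full = letter_b.search(text, 1).span()
# ...while positions from the slice are indices into the slice.
span_slice = letter_b.search(text[1:]).span()
print(span_full, span_slice)  # (2, 3) (1, 2)
```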
match
RegexObj.match(text)
- Same result as re.match(regexStr, text). 〔see re.match〕
RegexObj.match(text, startPos)
- Try to match starting exactly at index startPos (unlike ^, which anchors only at the real beginning of text).
RegexObj.match(text, startPos, endPos)
- Also stop at index endPos.
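A short Python 3 sketch contrasting match with search and showing the startPos and endPos arguments. The pattern "o" and the text "dog" are illustrative only.

```python
import re

pat = re.compile(r"o")
text = "dog"

at_zero = pat.match(text)            # None: match() only tries at index 0
scanned = pat.search(text).span()    # (1, 2): search() scans the whole string
at_one = pat.match(text, 1).span()   # (1, 2): with startPos, match is tried at index 1
empty_range = pat.match(text, 1, 1)  # None: endPos 1 leaves nothing to match

print(at_zero, scanned, at_one, empty_range)
```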
split
RegexObj.split(text, …)
- Same result as re.split(regexStr, text, …). 〔see re.split〕
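A minimal Python 3 sketch of splitting with a compiled pattern, including the optional maxsplit argument. The comma-separated sample text is made up for illustration.

```python
import re

# Split on commas, swallowing any whitespace around them.
sep = re.compile(r"\s*,\s*")
text = "a , b,c ,  d"

parts = sep.split(text)                 # ['a', 'b', 'c', 'd']
first_two = sep.split(text, maxsplit=2) # ['a', 'b', 'c ,  d']

print(parts, first_two)
```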
findall
RegexObj.findall(text)
- Same result as re.findall(regexStr, text). 〔see re.findall〕
RegexObj.findall(text, startPos)
- Search only the part of text starting at index startPos.
RegexObj.findall(text, startPos, endPos)
- Also stop at index endPos.
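A short Python 3 sketch of findall with and without startPos and endPos. The digit pattern and sample text are illustrative; index 3 is the "b" and index 6 is the space after "22".

```python
import re

digits = re.compile(r"\d+")
text = "a1 b22 c333"

all_nums = digits.findall(text)      # ['1', '22', '333']
from_3 = digits.findall(text, 3)     # start at index 3 -> ['22', '333']
window = digits.findall(text, 3, 6)  # only indices 3..5 ("b22") -> ['22']

print(all_nums, from_3, window)
```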
finditer
RegexObj.finditer(text)
- Same result as re.finditer(regexStr, text). 〔see re.finditer〕
RegexObj.finditer(text, startPos)
- Search only the part of text starting at index startPos.
RegexObj.finditer(text, startPos, endPos)
- Also stop at index endPos.
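A minimal Python 3 sketch of finditer with startPos and endPos. It shows that the spans of the yielded match objects are indices into the whole text, not into the searched part. The sample words are made up.

```python
import re

word = re.compile(r"[a-z]+")
text = "one two three"

# finditer yields match objects one at a time; spans refer to the whole text.
spans = [m.span() for m in word.finditer(text, 4)]
print(spans)  # [(4, 7), (8, 13)]

# With endPos 7, only "two" (indices 4..6) is inside the searched range.
upto7 = [m.group() for m in word.finditer(text, 4, 7)]
print(upto7)  # ['two']
```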
sub
RegexObj.sub(repl, text, …)
- Same result as re.sub(regexStr, repl, text, …). 〔see re.sub〕
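A short Python 3 sketch of sub on a compiled pattern, with the optional count argument and with a function as repl. The whitespace-collapsing task is only an illustration.

```python
import re

ws = re.compile(r"\s+")
text = "a  b   c    d"

collapsed = ws.sub(" ", text)             # 'a b c d'
first_only = ws.sub(" ", text, count=1)   # 'a b   c    d'

# The module-level form takes the pattern first, then repl, then the text.
assert collapsed == re.sub(r"\s+", " ", text)

# repl may also be a function; it receives each match object and returns the replacement.
upper = re.compile(r"[a-z]").sub(lambda m: m.group().upper(), text)  # 'A  B   C    D'

print(collapsed, first_only, upper)
```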
subn
RegexObj.subn(repl, text, …)
- Same result as re.subn(regexStr, repl, text, …). 〔see re.subn〕
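A minimal Python 3 sketch of subn, which returns a tuple of the new string and the number of substitutions made. The vowel pattern and sample text are illustrative.

```python
import re

vowel = re.compile(r"[aeiou]")
new_text, n = vowel.subn("*", "regex object")
print(new_text, n)  # r*g*x *bj*ct 4

# Same result as the module-level function with the pattern passed first.
assert (new_text, n) == re.subn(r"[aeiou]", "*", "regex object")
```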
Regex Attributes
The following are read-only attributes, set when a regex object is compiled.
.flags
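RegexObj.flags
- The flags used when the regex object was compiled (including inline flags such as (?i) written in the pattern), or 0 if no flags were given. In Python 3, a pattern compiled from a str always includes re.UNICODE, so re.compile('a').flags is 32 rather than 0.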
# python 2
import re
patObj = re.compile(ur'\w', re.M|re.U)
print patObj.flags # prints 40, i.e. re.M (8) + re.U (32)
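Because .flags is an integer bit mask, a single flag is usually tested with &. A minimal Python 3 sketch; the patterns are illustrative, and the (?s) case shows an inline flag being reported in .flags.

```python
import re

pat = re.compile(r"\w+", re.M | re.I)

# Test individual flags with a bitwise AND.
has_m = bool(pat.flags & re.M)  # True
has_i = bool(pat.flags & re.I)  # True
has_s = bool(pat.flags & re.S)  # False

# Inline flags written in the pattern, such as (?s), are included too.
inline = re.compile(r"(?s)a.b")
inline_s = bool(inline.flags & re.S)  # True

print(has_m, has_i, has_s, inline_s)
```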
.groupindex
RegexObj.groupindex
-
- A dictionary mapping each named capture group to its group number, i.e. its position among the groups in the pattern. A named group has the form (?P<id>…), where id is the group's name. The dictionary is empty if the pattern has no named groups.
# -*- coding: utf-8 -*-
# python 2
# example of regex .groupindex
import re

# a typical image tag
# <img src="cat.jpg" alt="cat" width="123" height="456">

# a pattern to capture image tag, with named group
xx = re.compile(r'src="(?P<filename>[^"]+)" alt="(?P<alttext>[^"]+)" width="(?P<width>\d+)" height="(?P<height>\d+)"')

print xx.groupindex
# {'height': 4, 'width': 3, 'alttext': 2, 'filename': 1}
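A Python 3 sketch of one use of groupindex: looking up each named group's value by its number. It reuses the image-tag pattern above; in Python 3, groupindex is a read-only mapping, so it is wrapped in dict() where a plain dict is wanted. The result should agree with match.groupdict().

```python
import re

tag = '<img src="cat.jpg" alt="cat" width="123" height="456">'
img = re.compile(r'src="(?P<filename>[^"]+)" alt="(?P<alttext>[^"]+)" '
                 r'width="(?P<width>\d+)" height="(?P<height>\d+)"')

m = img.search(tag)
# groupindex maps each group name to its group number; use the number to fetch the text.
by_name = {name: m.group(num) for name, num in img.groupindex.items()}

print(by_name == m.groupdict())  # True
print(img.groupindex["width"])   # 3
```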
.pattern
RegexObj.pattern
- The pattern string from which the regex object was compiled.
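A minimal Python 3 sketch of one way .pattern can be used: recovering the source string to compile a variant with different flags. The word "cat" and the IGNORECASE variant are illustrative only.

```python
import re

word = re.compile(r"cat")
# .pattern gives back the source string, so it can be recompiled with other flags.
word_i = re.compile(word.pattern, re.I)

exact = word.findall("Cat cat CAT")      # ['cat']
any_case = word_i.findall("Cat cat CAT") # ['Cat', 'cat', 'CAT']

print(word.pattern, exact, any_case)
```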