
Python Regex Functions

, 2005, …,

search(…)

search(‹pattern›, ‹text›)

If ‹pattern› matches anywhere in ‹text›, a MatchObject is returned.

Returns None if pattern is not found in the string.

Note: A successful match does not necessarily contain any part of the given string, because a pattern can match the empty string. For example, these patterns match any string: r'' and r'y*'.
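The note above can be checked with a short sketch. The variable names are only illustrative; the point is that a pattern which can match the empty string succeeds on any input, and the matched text may be empty.

```python
import re

# r'y*' means "zero or more y", so it can match the empty string.
empty_hit = re.search(r'y*', 'abc')

# The search succeeds, but the matched text is empty.
matched_text = empty_hit.group()   # ''
matched_span = empty_hit.span()    # (0, 0)

# A pattern that requires at least one character fails on the same input.
no_hit = re.search(r'y+', 'abc')   # None
```

So testing only "if result:" tells you the pattern matched somewhere, not that it matched something non-empty.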

Example:

# -*- coding: utf-8 -*-
# python

import re
result = re.search(r"\w+@\w+\.com", "long text xyz@xyz.com long")
if result:
    print("yes!")
    print(result.group()) # ⇒ xyz@xyz.com
else:
    print("no!")

Note: the pattern string should be written as a raw string, like this r'…' (single or double quotes both work; the r prefix is what matters). Otherwise, backslashes in it must be escaped. For example, to search for a sequence of tabs, use re.search(r'\t+', text) or re.search("\\t+", text).
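A small sketch of why the r prefix matters, with made-up sample strings. For tabs the raw and the escaped spelling give the same pattern; for \b the difference is visible, because in an ordinary string "\b" is a backspace character.

```python
import re

# Raw string vs. escaped ordinary string: both spell the same 3-character pattern.
text = "a\t\tb"
raw_hit = re.search(r'\t+', text)
escaped_hit = re.search("\\t+", text)
same_pattern = (r'\t+' == "\\t+")            # True

# The kind of quote is irrelevant; only the r prefix changes the meaning.
quote_style_irrelevant = (r'\t+' == r"\t+")  # True

# Without the r prefix, "\b" becomes a backspace character before the regex
# engine sees it, so the word-boundary assertion is lost.
boundary_raw = re.search(r'\bcat\b', 'a cat sat')    # matches 'cat'
boundary_plain = re.search('\bcat\b', 'a cat sat')   # None
```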

search(‹pattern›, ‹text›, ‹flags›)

The optional third argument ‹flags› modifies the meaning of the given pattern. The flags can be any of {re.I, re.L, re.M, re.S, re.U, re.X}. They can be combined with the | operator. For example, re.search(pat, text, re.M|re.U) searches with ^ and $ matching at the start and end of each line (re.M) and with character classes such as \w following Unicode rules (re.U). For detail, see: Python Regex Flags.
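As a minimal sketch of passing combined flags (the sample text is invented for illustration), here re.I and re.M are joined with | and passed as the third argument:

```python
import re

text = "first line\nSecond line"

# Without flags, ^ matches only at the very start of the string,
# and the match is case-sensitive.
plain = re.search(r'^second', text)                   # None

# re.I ignores case; re.M lets ^ match at the start of every line.
flagged = re.search(r'^second', text, re.I | re.M)
flagged_text = flagged.group()                        # 'Second'
```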

match(…)

match(‹pattern›, ‹string›)

match(‹pattern›, ‹string›, ‹flags›)

The “match” function is like “search” except that the match must start at the beginning of string. For example, re.search('me','somestring') matches, but re.match('me','somestring') returns None. Example:

# -*- coding: utf-8 -*-
# python

import re

my_result = re.match('so','somestring') # succeed

if my_result is None:
    print("no match")
else:
    print("yes match")

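Note: match() is not exactly equivalent to search() with ^. With the re.M flag, ^ also matches right after each newline, but match() still succeeds only at the very beginning of the string. Example: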

re.search(r'^B', 'A\nB',re.M) # succeeds
re.match(r'B', 'A\nB',re.M)   # fails

split(…)

split(‹pattern›, ‹string›)

Returns a list of the pieces of ‹string›, split at every match of ‹pattern›. Example:

re.split(r' +', 'what   do  you think')
# ⇒ ['what', 'do', 'you', 'think']

If the boundary pattern is enclosed in parentheses, then the matched boundary text is included in the returned list. For example:

re.split(r'( +)', 'what   do  you think')
# ⇒ ['what', '   ', 'do', '  ', 'you', ' ', 'think']

If there is more than one capturing group in the pattern, the text of every group is included in the returned list, in sequence. For example:

re.split(r'( +)(@+)', 'what   @@do  @@you @@think')
# ⇒ ['what', '   ', '@@', 'do', '  ', '@@', 'you', ' ', '@@', 'think']

split(‹pattern›, ‹string›, maxsplit = ‹n›)

At most ‹n› splits are made; the rest of the string is returned as the last element of the list.

If maxsplit is 0 (the default), there is no limit on the number of splits.

# -*- coding: utf-8 -*-
# python

import re
my_result = re.split(r' ', 'a b c d e', maxsplit=2)
print(my_result) # ⇒ ['a', 'b', 'c d e']

findall(…)

findall(‹pattern›, ‹string›)

Return a list of all non-overlapping matches of ‹pattern› in ‹string›. Example:

# -*- coding: utf-8 -*-
# python

import re
my_result = re.findall(r'@+', 'what   @@@do  @@you @think')
print(my_result) # ⇒ ['@@@', '@@', '@']

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Example:

re.findall(r'( +)(@+)', 'what   @@@do  @@you @think')
# ⇒ [('   ', '@@@'), ('  ', '@@'), (' ', '@')]

Empty matches are included in the result unless they touch the beginning of another match. Example:

re.findall(r'\b', 'what   @@@do  @@you @think')
# ⇒ ['', '', '', '', '', '', '', '']

TODO: need another example here showing what is meant by “unless they touch the beginning of another match.”

findall(‹pattern›, ‹string›, ‹flags›)
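A short sketch of the three-argument form, using an invented sample string. The flag changes which substrings count as matches:

```python
import re

text = "Cat cat CAT dog"

# Case-sensitive by default: only the lowercase occurrence matches.
case_sensitive = re.findall(r'cat', text)        # ['cat']

# With re.I, every spelling of "cat" is returned, in order of appearance.
ignore_case = re.findall(r'cat', text, re.I)     # ['Cat', 'cat', 'CAT']
```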

finditer(…)

finditer(‹pattern›, ‹string›)

finditer(‹pattern›, ‹string›, ‹flags›)

Like “findall”, except that it returns an iterator that yields a MatchObject for each match. It is meant to be used in a loop. Example:

for matched in re.finditer(r'(\w+)', 'what   do  you think'):
    print(matched.group()) # prints what, do, you, think, one per line
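Because each item is a MatchObject rather than a plain string, the loop can also read where each match occurred. A minimal sketch, collecting the matched text with its start and end offsets (the list name is illustrative):

```python
import re

text = 'what   do  you think'

# Each MatchObject gives the matched text plus its position in the string.
words = []
for matched in re.finditer(r'\w+', text):
    words.append((matched.group(), matched.start(), matched.end()))

# words ⇒ [('what', 0, 4), ('do', 7, 9), ('you', 11, 14), ('think', 15, 20)]
```

This position information is the main reason to prefer finditer over findall.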

sub(…)

sub(‹pattern›, ‹repl›, ‹string›)

Substitute ‹pattern› in ‹string› by the replacement ‹repl›. If the pattern isn't found, ‹string› is returned unchanged.

Returns a new string.

Example:

# -*- coding: utf-8 -*-
# python

import re

my_str = '<img src="cat.jpg">'
my_result = re.sub(r'src="([a-z]+)\.jpg">', r'src="\1.jpg" alt="\1">', my_str)

print(my_str) # <img src="cat.jpg">
print(my_result) # <img src="cat.jpg" alt="cat">

‹repl› can also be a function, for more complicated replacements. The function must take a MatchObject as its argument. It is called once for each match, and its return value is used as the replacement string. Example:

def fun(matchObj):
    if matchObj.group(0) == '--A Sage':
        return '--Me'
    else:
        return '--Some Joe'

new_str = re.sub(r'--.+$', fun, '"what do you mean?" --xyz')
print(new_str)      # prints:  "what do you mean?" --Some Joe

‹pattern› may be a string or a regex object. If you need to specify regular expression flags, you can use a regex object. (Since Python 2.7, sub also accepts a ‹flags› keyword argument.) Alternatively, you can embed flags in your regex pattern by putting (?iLmsux) at the beginning of the pattern. For example, sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. (See: regex pattern syntax for detail.)
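The ways of supplying a flag described above can be compared side by side. This is only a sketch; regex_obj is an illustrative name.

```python
import re

text = "bbbb BBBB"

# Flag embedded at the start of the pattern.
inline = re.sub(r'(?i)b+', 'x', text)          # 'x x'

# Flag carried by a compiled regex object, used through its own sub method...
regex_obj = re.compile(r'b+', re.I)
via_object = regex_obj.sub('x', text)          # 'x x'

# ...or passed to re.sub in place of the pattern string.
via_module = re.sub(regex_obj, 'x', text)      # 'x x'

# Without any flag, only the lowercase run is replaced.
no_flag = re.sub(r'b+', 'x', text)             # 'x BBBB'
```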

sub(‹pattern›, ‹repl›, ‹string›, ‹count›)

‹count› is the maximum number of pattern occurrences to be replaced.
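A minimal sketch of the effect of ‹count›, using an invented sample string. It is passed here as a keyword argument for clarity:

```python
import re

text = 'a-b-c-d'

# By default every occurrence is replaced.
all_replaced = re.sub(r'-', '+', text)            # 'a+b+c+d'

# With count=2, only the first two occurrences (from the left) are replaced.
first_two = re.sub(r'-', '+', text, count=2)      # 'a+b+c-d'
```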

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>…) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character 0. The backreference \g<0> substitutes in the entire substring matched by the pattern.
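The \g<…> forms can be tried with a small sketch. The sample text and the group names key and num are made up for illustration.

```python
import re

text = 'id-7'

# \g<2>0 means "group 2, then a literal 0".
numbered = re.sub(r'(\w+)-(\d)', r'\1-\g<2>0', text)                    # 'id-70'

# The same replacement using named groups.
named = re.sub(r'(?P<key>\w+)-(?P<num>\d)', r'\g<key>-\g<num>0', text)  # 'id-70'

# \g<0> stands for the whole match.
whole = re.sub(r'(\w+)-(\d)', r'[\g<0>]', text)                         # '[id-7]'

# \20 is read as a reference to group 20, which this pattern does not have,
# so the replacement string is rejected.
try:
    re.sub(r'(\w+)-(\d)', r'\1-\20', text)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
```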

subn(…)

Same as “sub(…)”, except it returns a tuple: (‹new string›, ‹number of substitutions made›).
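A short sketch of the returned tuple, unpacked into two illustrative variables:

```python
import re

# Collapse each run of spaces into one space and count the replacements.
new_text, n_subs = re.subn(r' +', ' ', 'what   do  you think')
# new_text ⇒ 'what do you think', n_subs ⇒ 3

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged = re.subn(r'@', '#', 'plain')   # ('plain', 0)
```

The count is handy when you need to know whether a substitution changed anything.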

escape(…)

escape(‹string›)

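Return a string with a backslash character “\” inserted in front of every non-alphanumeric character. (Since Python 3.7, only characters that have a special meaning in a regular expression are escaped.) This is useful if you want to use a given string as a pattern for exact match.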
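A minimal sketch of the exact-match use case, with an invented literal that contains two regex metacharacters:

```python
import re

literal = 'a.b*c'

# Used directly as a pattern, '.' matches any character and 'b*' any run of b,
# so unrelated text matches.
loose_hit = re.search(literal, 'aXbbbc')           # matches 'aXbbbc'

# After escaping, the pattern matches only the literal text itself.
safe = re.escape(literal)                          # 'a\\.b\\*c'
strict_miss = re.search(safe, 'aXbbbc')            # None
strict_hit = re.search(safe, 'see a.b*c here')     # matches 'a.b*c'
```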

exception error

Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.
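The difference between an invalid pattern and a pattern that simply finds nothing can be sketched as follows (the flag variable is illustrative):

```python
import re

# An unmatched parenthesis makes the pattern invalid, so compiling it raises re.error.
try:
    re.compile(r'(unclosed')
    compile_failed = False
except re.error:
    compile_failed = True

# A valid pattern that finds nothing is not an error; search just returns None.
no_match = re.search(r'xyz', 'abc')
```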
