Python Regex Functions

By Xah Lee.

Here's a summary of regex functions.

regex functions summary
syntax                           meaning
re.search(regex,str)             return match object if found, else None
re.match(regex,str)              similar to re.search(), but the match must start at the beginning of the string.
re.split(regex,str)              return a list of substrings, split at each match.
re.findall(regex,str)            return a list of non-overlapping (repeated) matches.
re.finditer(…)                   similar to re.findall(), but returns an iterator.
re.sub(regex,replacement,str)    does replacement. Returns the new string.
re.subn(…)                       similar to re.sub(), but returns a tuple. 1st element is the new string, 2nd is the number of replacements.
re.escape(str)                   add backslashes to a string so it can be fed to regex as a literal pattern. Returns the new string.

Note: optional parameters not shown above. See below for detail.

re.search(…)

re.search(pattern, str) → Return a MatchObject if pattern matches anywhere in str (part or whole of the string), else return None. Note: a successful match can be empty, that is, it need not contain any character of the given string. For example, these patterns match any string: '' and 'y*'.

re.search(pattern, str, flags=flags) → same, with the regex flags given in flags.

# -*- coding: utf-8 -*-
# python 2

# example of re.search()

import re

xx = re.search(r"\w+@\w+\.com", "from xyz@example.com address")

if xx:
    print "yes"
    print xx.group() # → xyz@example.com
else:
    print "no"
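The remark that '' and 'y*' match any string can be checked directly. Here is a small sketch, where the sample string "abc" is an arbitrary choice: the search succeeds, but the matched text is empty.

```python
# python 3
import re

# 'y*' means "zero or more y", so it can match the empty string at position 0
m = re.search(r"y*", "abc")

found = m is not None   # True: the search succeeded
text = m.group()        # '' : the matched text is empty
span = m.span()         # (0, 0): starts and ends at index 0

print(found, repr(text), span)
```

So testing the result with "if xx:" only tells you that the pattern matched somewhere, not that it matched any character.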

Note: the pattern string should be written as a raw string, like this r"…". Otherwise, each backslash in it must be escaped. For example, to search for a sequence of tabs in a string str, use re.search(r"\t+", str) or re.search("\\t+", str).
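Why the raw string matters shows up most clearly with \b, which means "word boundary" to the regex engine but "backspace character" inside an ordinary Python string. A minimal sketch, with made-up sample text:

```python
# python 3
import re

raw = r"\bcat\b"    # backslash + b reaches the regex engine: word boundary
plain = "\bcat\b"   # Python turns "\b" into the backspace character (0x08)

raw_hit = re.search(raw, "a cat sat")      # matches "cat"
plain_hit = re.search(plain, "a cat sat")  # None: the text has no backspace

# two spellings of the same two-character-plus pattern text
same_tabs = (r"\t+" == "\\t+")

print(raw_hit.group(), plain_hit, same_tabs)
```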

The optional parameter “flags” modifies the meaning of the given pattern. Flags are constants of the re module, such as re.IGNORECASE, re.MULTILINE, and re.UNICODE.

To specify more than one of them, use the | operator to combine them. For example, re.search(pattern,string,flags=re.IGNORECASE|re.MULTILINE|re.UNICODE).
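As a sketch of combining flags with |, the following uses a made-up two-line string and compares the same pattern with and without flags:

```python
# python 3
import re

text = "first line\nSecond LINE"

# re.IGNORECASE: "line" also matches "LINE"
# re.MULTILINE:  ^ and $ also match at the start and end of each line
with_flags = re.findall(r"^\w+ line$", text, flags=re.IGNORECASE | re.MULTILINE)

# without flags, ^ and $ only anchor to the whole string, so nothing matches
no_flags = re.findall(r"^\w+ line$", text)

print(with_flags)  # ['first line', 'Second LINE']
print(no_flags)    # []
```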

For detail, see: Python Regex Flags

re.match(…)

re.match(pattern, string) → Similar to re.search() except that the match must start at the beginning of string. For example, re.search('me','somestring') matches, but re.match('me','somestring') returns None.

re.match(pattern, string, flags=flags) → same, with the regex flags given in flags.

# -*- coding: utf-8 -*-
# python 2

import re

my_result = re.match('so','somestring') # succeed

if my_result is None:
    print "no match"
else:
    print "yes match"

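Note: re.match() is not exactly equivalent to re.search() with ^. With the re.M (multiline) flag, ^ also matches right after each newline, but re.match() still matches only at the beginning of the string.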

re.search(r'^B', 'A\nB',re.M) # succeeds
re.match(r'B', 'A\nB',re.M)   # fails

re.split(…)

re.split(pattern, string) → Returns a list of substrings, obtained by splitting string at each match of pattern.

# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r' +', 'what   do  you think')
#                    ['what', 'do', 'you', 'think']

If the boundary pattern is enclosed in capturing parentheses, then the matched boundary text is also included in the returned list.

# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r'( +)', 'what   do  you think')
#     ['what', '   ', 'do', '  ', 'you', ' ', 'think']

If there is more than one capturing group in pattern, the text of each group is included in the returned list, in sequence.

# -*- coding: utf-8 -*-
# python 2

import re

print re.split(r'( +)(@+)', 'what   @@do  @@you @@think')
# ⇒ ['what', '   ', '@@', 'do', '  ', '@@', 'you', ' ', '@@', 'think']

re.split(pattern, string, maxsplit = n) → split, at most n times.

# -*- coding: utf-8 -*-
# python 2
import re
print re.split(r' ', 'a b c d e', maxsplit = 2)
# ['a', 'b', 'c d e']

re.findall(…)

re.findall(pattern, string) → Return a list of all non-overlapping matches of pattern in string.

re.findall(pattern, string, flags=flags)

# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'@+', 'what   @@@do  @@you @think') # ['@@@', '@@', '@']

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'( +)(@+)', 'what   @@@do  @@you @think')
# ⇒ [('   ', '@@@'), ('  ', '@@'), (' ', '@')]

Empty matches are included in the result unless they touch the beginning of another match.

# -*- coding: utf-8 -*-
# python 2
import re
print re.findall(r'\b', 'what   @@@do  @@you @think')
# ['', '', '', '', '', '', '', '']

TODO: need another example here showing what is meant by “unless they touch the beginning of another match.”
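One possible illustration, offered as a reading of that sentence and not as the author's intended example: with the pattern a*, an empty match is reported at every position where no run of "a" begins, but at the position where "aa" begins only the non-empty match is reported. The sample string is made up, and the output shown was worked out for Python 3.7 and later; older versions may differ in edge cases.

```python
# python 3
import re

# a* can match the empty string, so empty matches show up in the result
matches = re.findall(r"a*", "baac")

# the same matches with their start index, to see where each one sits
positions = [(m.start(), m.group()) for m in re.finditer(r"a*", "baac")]

print(matches)    # ['', 'aa', '', '']
print(positions)  # [(0, ''), (1, 'aa'), (3, ''), (4, '')]
```

There is no empty match at index 1, because the match 'aa' begins there.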

re.finditer(…)

re.finditer(pattern, string) → Similar to re.findall(), except an “iterator” is returned with MatchObject as members. This is to be used in a loop.

re.finditer(pattern, string, flags=flags)

# -*- coding: utf-8 -*-
# python 2
import re

for matched in re.finditer(r'(\w+)', 'what   do  you think'):
    print matched.group() # prints each word in a line
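Because each item is a match object and not a plain string, re.finditer() also gives access to where each match sits in the string. A short sketch using the same sample text:

```python
# python 3
import re

text = "what   do  you think"

# collect the matched word together with its start and end index
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

for word, start, end in found:
    print(word, start, end)
```

This is the main reason to prefer it over re.findall() when positions or groups of each match are needed.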

re.sub(…)

re.sub(pattern, repl, string) → Replace each non-overlapping match of pattern in string with the replacement repl, and return the new string. If the pattern isn't found, string is returned unchanged.

# -*- coding: utf-8 -*-
# python 2

# example of using re.sub( )

import re

# add alt to image tag
t1 = '<img src="cat.jpg">'
t2 = re.sub(r'src="([a-z]+)\.jpg">', r'src="\1.jpg" alt="\1">', t1)

print t1    # <img src="cat.jpg">
print t2    # <img src="cat.jpg" alt="cat">

repl can also be a function, for more complicated replacement. The function must take a MatchObject as argument. It is called once for each match, and its return value is used as the replacement string.

# -*- coding: utf-8 -*-
# python 2

# example of using re.sub(pattern, rep, str ) where rep is a function

import re

def ff(xx):
    if xx.group(0) == "ea":
        return "æ"
    elif xx.group(0) == "oo":
        return "u"
    else:
        return xx.group(0)

print re.sub(r"[aeiou]+", ff, "encyclopeadia") # encyclopædia
print re.sub(r"[aeiou]+", ff, "book") # buk
print re.sub(r"[aeiou]+", ff, "geek") # geek

pattern may be a string or a regex object. If you need to specify regular expression flags, you can use a regex object. Alternatively, you can embed a flag in your regex pattern by putting (?iLmsux) at the beginning of your pattern. For example, re.sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. (See: regex pattern syntax for detail.)
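The options described above can be sketched side by side. The regex object here comes from re.compile(), and the variable names are only for illustration:

```python
# python 3
import re

# compile once with the flag, then call the regex object's own sub() method
rx = re.compile(r"b+", re.IGNORECASE)
via_object = rx.sub("x", "bbbb BBBB")

# the regex object can also be passed as the pattern argument of re.sub()
via_module = re.sub(rx, "x", "bbbb BBBB")

# the same flag embedded at the beginning of the pattern
via_inline = re.sub(r"(?i)b+", "x", "bbbb BBBB")

print(via_object, via_module, via_inline)  # each one is 'x x'
```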

re.sub(pattern, repl, string, count) → count is the maximum number of pattern occurrences to be replaced.
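A small sketch of the count parameter, with a made-up sample word. It is passed by keyword here, which keeps the call readable and avoids confusing it with flags:

```python
# python 3
import re

# replace only the first 2 occurrences, scanning left to right
limited = re.sub(r"a", "-", "banana", count=2)

# without count, every occurrence is replaced
unlimited = re.sub(r"a", "-", "banana")

print(limited)    # b-n-na
print(unlimited)  # b-n-n-
```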

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>…) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character 0. The backreference \g<0> substitutes in the entire substring matched by the pattern.
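The \g<…> forms can be tried out as follows. This is a sketch with invented sample strings and group names (first, last):

```python
# python 3
import re

# \g<name> refers to a group defined with (?P<name>…)
swapped = re.sub(r"(?P<first>\w+) (?P<last>\w+)",
                 r"\g<last>, \g<first>", "Ada Lovelace")

# \g<2>0 is group 2 followed by a literal 0
group_then_zero = re.sub(r"(a)(b)", r"\g<2>0", "ab")

# \20 is read as group 20, which this pattern does not have
try:
    re.sub(r"(a)(b)", r"\20", "ab")
    group_20_failed = False
except re.error:
    group_20_failed = True

# \g<0> is the whole match
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

print(swapped)          # Lovelace, Ada
print(group_then_zero)  # b0
print(group_20_failed)  # True
print(bracketed)        # a[1]b[22]
```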

re.subn(…)

re.subn(…) → Same as re.sub(…), except it returns a tuple: (new_string, number_of_substitutions_made).
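A minimal sketch of unpacking the returned tuple, with a made-up sample string:

```python
# python 3
import re

# replace every letter o with the digit 0, and count the replacements
new_text, n = re.subn(r"o", "0", "foo boo")

print(new_text)  # f00 b00
print(n)         # 4
```

The count is handy for checking whether anything changed, for example before writing a file back to disk.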

re.escape(…)

re.escape(string) → Return a string with a backslash character 「\」 inserted in front of every non-alphanumeric character. (In Python 3.7 and later, only characters that have a special meaning in a regex are escaped.) This is useful if you want to use a given string as a pattern for exact match.
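To see what "exact match" means here, the sketch below searches with the same text twice, once as a regex and once escaped. The sample strings are made up:

```python
# python 3
import re

literal = "a.b*c"
pattern = re.escape(literal)   # the dot and star are now backslash-escaped

# unescaped, "." matches any character and "b*" matches any run of b
loose_hit = re.search(literal, "aXbbc")

# escaped, only the exact text a.b*c matches
strict_miss = re.search(pattern, "aXbbc")
strict_hit = re.search(pattern, "see a.b*c here")

print(pattern)
print(loose_hit.group(), strict_miss, strict_hit.group())
```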

Exception Error

re.error → Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.
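A short sketch of the difference between an invalid pattern and a pattern that simply does not match. The exception is caught as re.error, and the sample patterns are made up:

```python
# python 3
import re

# an unmatched parenthesis makes the pattern invalid
try:
    re.compile(r"(abc")
    compiled_ok = True
except re.error as err:
    compiled_ok = False
    print("bad pattern:", err)

# a valid pattern that finds nothing just returns None, no exception
no_match = re.search(r"xyz", "abc")

print(compiled_ok, no_match)
```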
