Python Regex Functions

By Xah Lee.

Here's a summary of regex functions.

regex functions summary
syntax                           meaning
re.search(regex,str)             return match object if found, else None.
re.match(regex,str)              similar to re.search(), but the match must start at the beginning of the string.
re.split(regex,str)              return a list.
re.findall(regex,str)            return a list of non-overlapping (repeated) matches.
re.finditer(…)                   similar to re.findall(), but returns an iterator.
re.sub(regex,replacement,str)    does replacement. Returns the new string.
re.subn(…)                       similar to re.sub(), but returns a tuple. 1st element is the new string, 2nd is the number of replacements.
re.escape(str)                   add backslashes to a string so it can be fed to regex as a pattern. Returns the new string.

Note: optional parameters not shown above. See below for detail.

re.search(…)

re.search(pattern, str) → Return a MatchObject if pattern matches (part or whole of a string), else return None. Note: a successful match does not necessarily contain any part of the given string, because a pattern can match the empty string. For example, these patterns match any string: '' and 'y*'.
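
Here's a small sketch of the note above, showing that both patterns succeed with an empty match. The sample string 'abc' and the variable names are made up for illustration.

```python
import re

# An empty pattern matches at position 0 of any string, with an empty result.
m1 = re.search(r'', 'abc')

# 'y*' means "zero or more y", so it also matches the empty string at position 0.
m2 = re.search(r'y*', 'abc')

print(m1 is not None)    # True
print(repr(m1.group()))  # ''
print(repr(m2.group()))  # ''
print(m2.span())         # (0, 0)
```

Both calls return a match object, not None, even though the matched text is empty.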

re.search(pattern, str, flags=flags) → use the given flags.

# -*- coding: utf-8 -*-
# python

# example of re.search()

import re

xx = re.search(r"\w+@\w+\.com", "from xyz@example.com address")

if xx:
    print "yes"
    print xx.group() # → xyz@example.com
else:
    print "no"

Note: the pattern string should be written as a raw string, like this r"…". Otherwise, backslashes in it must be escaped. For example, to search for a sequence of tabs, use the pattern r"\t+" or "\\t+".

The optional parameter “flags” modifies the meaning of the given pattern.

To specify more than one flag, connect them with the | operator. For example, re.search(pattern,string,flags=re.IGNORECASE|re.MULTILINE|re.UNICODE).

For detail, see: Python Regex Flags
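
As a rough illustration of combining flags with the | operator, the sketch below runs the same pattern with and without flags. The sample text and variable names are assumptions made for this example.

```python
import re

text = "First line\nsecond LINE"
pattern = r"^second line$"

# Without flags, ^ matches only at the start of the string and case must agree,
# so nothing is found.
no_flags = re.search(pattern, text)

# re.MULTILINE lets ^ and $ match at line boundaries,
# re.IGNORECASE lets "line" match "LINE".
with_flags = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)

print(no_flags)            # None
print(with_flags.group())  # second LINE
```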

re.match(…)

re.match(pattern, string) → Similar to re.search() except that the match must start at the beginning of string. For example, re.search('me','somestring') matches, but re.match('me','somestring') returns None.

re.match(pattern, string, flags=flags) → use the given flags.

# -*- coding: utf-8 -*-
# python

import re

my_result = re.match('so','somestring') # succeed

if my_result is None:
    print "no match"
else:
    print "yes match"

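Note: re.match() is not exactly equivalent to re.search() with ^. With the re.M (multiline) flag, ^ also matches at the start of each line, while re.match() still matches only at the start of the string. Example: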

re.search(r'^B', 'A\nB',re.M) # succeeds
re.match(r'B', 'A\nB',re.M)   # fails

re.split(…)

re.split(pattern, string) → Returns a list of substrings, made by splitting string wherever pattern matches.

# -*- coding: utf-8 -*-
# python

import re

print re.split(r' +', 'what   do  you think')
#                    ['what', 'do', 'you', 'think']

If the boundary pattern is enclosed in parentheses, then the matched boundary text is included in the returned list. For example:

# -*- coding: utf-8 -*-
# python

import re

print re.split(r'( +)', 'what   do  you think')
#     ['what', '   ', 'do', '  ', 'you', ' ', 'think']

If there is more than one capturing group in pattern, the text of each group is included in the returned list, in sequence. For example:

# -*- coding: utf-8 -*-
# python

import re

print re.split(r'( +)(@+)', 'what   @@do  @@you @@think')
# ⇒ ['what', '   ', '@@', 'do', '  ', '@@', 'you', ' ', '@@', 'think']

re.split(pattern, string, maxsplit = n) → split, at most n times.

# -*- coding: utf-8 -*-
# python
import re
print re.split(r' ', 'a b c d e', maxsplit = 2)
# ['a', 'b', 'c d e']

re.findall(…)

re.findall(pattern, string) → Return a list of all non-overlapping matches of pattern in string.

re.findall(pattern, string, flags=flags)

# -*- coding: utf-8 -*-
# python
import re
print re.findall(r'@+', 'what   @@@do  @@you @think') # ['@@@', '@@', '@']

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Example:

# -*- coding: utf-8 -*-
# python
import re
print re.findall(r'( +)(@+)', 'what   @@@do  @@you @think')
# ⇒ [('   ', '@@@'), ('  ', '@@'), (' ', '@')]

Empty matches are included in the result unless they touch the beginning of another match. Example:

# -*- coding: utf-8 -*-
# python
import re
print re.findall(r'\b', 'what   @@@do  @@you @think')
# ['', '', '', '', '', '', '', '']

TODO: need another example here showing what is meant by “unless they touch the beginning of another match.”

re.finditer(…)

re.finditer(pattern, string) → Similar to re.findall(), except an “iterator” is returned with MatchObject as members. This is to be used in a loop.

re.finditer(pattern, string, flags=flags)

# -*- coding: utf-8 -*-
# python
import re

for matched in re.finditer(r'(\w+)', 'what   do  you think'):
    print matched.group() # prints each word in a line
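
Because each item from re.finditer() is a match object, the position of each match is available too, which re.findall() does not give. A minimal sketch, reusing the sample string above (the variable names are made up):

```python
import re

text = 'what   do  you think'

# Collect the matched text together with its start and end positions.
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\w+', text)]

print(found)
# [('what', 0, 4), ('do', 7, 9), ('you', 11, 14), ('think', 15, 20)]
```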

re.sub(…)

re.sub(pattern, repl, string) → Substitute pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. Returns a new string.

# -*- coding: utf-8 -*-
# python

# example of using re.sub( )

import re

# add alt to image tag
t1 = '<img src="cat.jpg">'
t2 = re.sub(r'src="([a-z]+)\.jpg">', r'src="\1.jpg" alt="\1">', t1)

print t1    # <img src="cat.jpg">
print t2    # <img src="cat.jpg" alt="cat">

repl can also be a function for more complicated replacement. The function must take a MatchObject as argument. For each occurrence of match, the function is called and its return value used as the replacement string. Example:

# -*- coding: utf-8 -*-
# python

# example of using re.sub(pattern, rep, str ) where rep is a function

import re

def ff(xx):
    if xx.group(0) == "ea":
        return "æ"
    elif xx.group(0) == "oo":
        return "u"
    else:
        return xx.group(0)

print re.sub(r"[aeiou]+", ff, "encyclopeadia") # encyclopædia
print re.sub(r"[aeiou]+", ff, "book") # buk
print re.sub(r"[aeiou]+", ff, "geek") # geek

pattern may be a string or a regex object. If you need to specify regular expression flags, you can use a regex object. Alternatively, you can embed a flag in your regex pattern by (?iLmsux) in the beginning of your pattern. For example, re.sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'. (See: regex pattern syntax for detail.)
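
Here's a short sketch of the regex object approach, using re.compile() to attach the flag. It is meant to give the same result as the embedded (?i) form. The variable names are made up for illustration.

```python
import re

# Compile the pattern with a flag, then pass the regex object to re.sub().
rx = re.compile(r"b+", re.IGNORECASE)
r1 = re.sub(rx, "x", "bbbb BBBB")

# The regex object also has its own sub() method, with the same effect.
r2 = rx.sub("x", "bbbb BBBB")

print(r1)  # x x
print(r2)  # x x
```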

re.sub(pattern, repl, string, count) → count is the maximum number of pattern occurrences to be replaced.
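
A small sketch of the count parameter, passed here by keyword. The sample word is made up.

```python
import re

# Replace only the first 2 occurrences of "a".
limited = re.sub(r"a", "X", "banana", count=2)

# Without count, every occurrence is replaced.
unlimited = re.sub(r"a", "X", "banana")

print(limited)    # bXnXna
print(unlimited)  # bXnXnX
```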

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>…) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character 0. The backreference \g<0> substitutes in the entire substring matched by the pattern.
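
The \g<…> forms can be sketched like this. The date string, the group names, and the other sample strings are made up for illustration.

```python
import re

date = "2024-03-15"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# \g<name> refers to a named group.
by_name = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", date)
print(by_name)  # 15/03/2024

# \g<1>0 means group 1 followed by a literal 0.
# Writing \10 instead would be read as a reference to group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)  # a10b20

# \g<0> is the entire matched substring.
wrapped = re.sub(r"\d+", r"[\g<0>]", "room 42")
print(wrapped)  # room [42]
```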

re.subn(…)

re.subn(…) → Same as re.sub(…), except it returns a tuple: (new_string, number_of_substitutions_made).
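
A minimal sketch of unpacking the returned tuple. The sample strings and variable names are made up.

```python
import re

# Replace each run of digits and also get the number of replacements.
new_string, n = re.subn(r"\d+", "#", "a1 b22 c333")

print(new_string)  # a# b# c#
print(n)           # 3

# When nothing matches, the string is unchanged and the count is 0.
unchanged = re.subn(r"z", "#", "abc")
print(unchanged)   # ('abc', 0)
```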

re.escape(…)

re.escape(string) → Return a string with a backslash character 「\」 inserted in front of every non-alphanumeric character. This is useful if you want to use a given string as a pattern for exact match.
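
Here's a rough sketch of why escaping matters when a given string should be matched exactly. The sample strings are made up. Exactly which characters get a backslash can differ between Python versions, so the example uses only the dot, which is a regex special character in any version.

```python
import re

literal = "a.b"

# Unescaped, "." matches any character, so "axb" matches too.
loose = re.search(literal, "axb")
print(loose is not None)   # True

# Escaped, only the literal text "a.b" matches.
safe = re.escape(literal)
strict = re.search(safe, "axb")
print(strict is not None)  # False

exact = re.search(safe, "see a.b here")
print(exact.group())       # a.b
```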

Exception Error

re.error is the exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.
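
The re module exposes this exception as re.error. A minimal sketch of catching it, using a pattern with an unmatched parenthesis (the variable names are made up):

```python
import re

try:
    re.compile(r"(abc")  # invalid: the parenthesis is never closed
    compiled_ok = True
except re.error:
    compiled_ok = False

print(compiled_ok)  # False

# A valid pattern that finds nothing is not an error. It just returns None.
no_match = re.search(r"z", "abc")
print(no_match)     # None
```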
