Python: Regex Tutorial

By Xah Lee.

A regular expression (regex for short) is a character sequence that represents a pattern for matching text. For example, you can use one to find all email addresses in a file by matching the email address pattern. Many functions take a regex to check whether a string contains a certain pattern, to extract it, or to replace it with another string.

Check If String Matches

To use regex in Python, first you need to import re.

To check if a pattern is in a string, use:

re.search(pattern, string, flags=0)
If pattern matches any part of the string (or the whole string), a Match Object is returned. Otherwise, None is returned. (A Match Object evaluates to True.) [see Python: Regex Match Object]

For regex flags, see: Python: Regex Flags.

# regex matching email address

import re

text = "this xyz@example.com that"
xx = re.search(r" (\w+@\w+\.com) ", text )

if xx:
    print("matched")
    print(xx.group(1))
else:
    print("no match")
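
The optional flags argument changes how the pattern is matched. Here is a minimal sketch using re.IGNORECASE, one of the standard flags in the re module. The sample text and variable names are made up for illustration.

```python
# sketch: effect of the flags argument on re.search
import re

text = "Contact: XYZ@EXAMPLE.COM"

# without a flag, the lowercase character classes do not match uppercase text
no_flag = re.search(r"[a-z]+@[a-z]+\.com", text)
print(no_flag)
# None

# re.IGNORECASE makes letters match regardless of case
with_flag = re.search(r"[a-z]+@[a-z]+\.com", text, re.IGNORECASE)
print(with_flag.group(0))
# XYZ@EXAMPLE.COM
```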

Find and Replace

re.sub(pattern, repl, string)
Replace each match of pattern in string with the replacement repl. If the pattern isn't found, string is returned unchanged. Returns a new string.

The optional 4th argument is the maximum number of replacements to make. If omitted, all matches are replaced.

# example of regex replace
import re
x = "123"
x2 = re.sub(r"2", r"8", x)
print(x2)
# 183
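
To illustrate the optional 4th argument of re.sub, here is a small sketch that limits the number of replacements. The argument is passed by its keyword name count. The sample string is made up for illustration.

```python
# sketch: limiting the number of replacements with count
import re

s = "a-b-c-d"

# replace only the first match
first_only = re.sub(r"-", "+", s, count=1)
print(first_only)
# a+b-c-d

# count omitted, so every match is replaced
all_matches = re.sub(r"-", "+", s)
print(all_matches)
# a+b+c+d
```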

Here's a more complex example: replacing all “gif” image paths with “png” in an HTML file.

# regex example of replacing gif to png in html img tag

import re

myText = r"""<p><img src="rabbits.gif" width="30" height="20">
and <img class="xyz" src="../cats.gif">,
but <img src ="tigers.gif">,
 <img src=
"bird.gif">!</p>"""

newText = re.sub(r'src\s*=\s*"([^"]+)\.gif"', r'src="\1.png"', myText)

print(newText)

# <p><img src="rabbits.png" width="30" height="20">
# and <img class="xyz" src="../cats.png">,
# but <img src="tigers.png">,
#  <img src="bird.png">!</p>

Tip

Note: A successful match does not necessarily mean the match contains any characters of the given string, because a pattern can match the empty string. For example, these patterns match any string: '' and 'y*'.
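
A quick sketch of the empty-match case: the pattern y* succeeds on a string that contains no “y”, and the matched text is empty. The variable names are chosen for illustration.

```python
# sketch: a successful match that contains no characters
import re

m = re.search(r"y*", "abc")

# the search succeeds, so m is a Match Object, not None
print(m is not None)
# True

# the matched text is the empty string, found at position 0
print(repr(m.group(0)))
# ''
print(m.span())
# (0, 0)
```

If you need a match to contain at least one character, use + instead of *, as in r"y+".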

Note: the pattern string should be written as a raw string, like this r"…". Otherwise, backslashes in it must be escaped. For example, to search for a sequence of tabs in text, use re.search(r"\t+", text) or re.search("\\t+", text). [see Python: Quote String]
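
The following sketch shows that the raw string and the escaped string are the same pattern, and that both find the run of tabs. The sample text is made up for illustration.

```python
# sketch: raw string vs escaped backslash in a pattern
import re

text = "a\t\tb"  # the letter a, two tab characters, the letter b

# both spellings produce the same 3-character pattern: backslash, t, plus
same = r"\t+" == "\\t+"
print(same)
# True

m1 = re.search(r"\t+", text)
m2 = re.search("\\t+", text)

# both searches find the two tabs at index 1 and 2
print(m1.span())
# (1, 3)
print(m2.span())
# (1, 3)
```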
