Python: Regex
What is Regex
A regular expression (aka regex) is a character sequence that represents a text pattern.
For example, you can use it to find all email addresses in a file by matching the email address pattern.
Regex is used by many functions to check whether a string contains a certain pattern, to extract the matching text, or to replace it with another string.
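As a rough illustration of these three uses, the sketch below checks for, extracts, and replaces an email-like pattern. The sample text and the simplified pattern are assumptions made for this example; real email addresses follow more complex rules.

```python
import re

# Hypothetical sample text, for illustration only.
text = "contact: amy@example.com, bob@example.org"

# Simplified email-like pattern (an assumption, not a full email validator).
pattern = r"\w+@\w+\.\w+"

# 1. Check whether the pattern occurs anywhere in the string.
has_email = re.search(pattern, text) is not None

# 2. Extract every match as a list of strings.
emails = re.findall(pattern, text)

# 3. Replace every match with another string.
masked = re.sub(pattern, "<hidden>", text)

print(has_email)  # True
print(emails)     # ['amy@example.com', 'bob@example.org']
print(masked)     # contact: <hidden>, <hidden>
```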
Check If String Matches
To use regex in Python, first import the re module.
To check whether a pattern occurs in a string, use:
re.search(pattern, str, flags)
If pattern matches (part or the whole of str), a Match Object is returned. Otherwise, None is returned. (A Match Object evaluates to True.) 〔see Python: Regex Match Object〕
The flags argument is optional. For regex flags, see: Python: Regex Flags.
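    # regex matching an email address
    import re

    text = "this xyz@example.com that"

    # the pattern requires a space on each side of the address;
    # the parentheses capture the address itself as group 1
    xx = re.search(r" (\w+@\w+\.com) ", text)

    if xx:
        print("matched")
        print(xx.group(1))
    else:
        print("no match")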
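The optional flags argument changes how the pattern is matched. Here is a minimal sketch using re.IGNORECASE, one of the flags in the standard re module; the uppercase sample text is made up for this illustration.

```python
import re

# Hypothetical sample text with an uppercase address.
text = "Contact: XYZ@EXAMPLE.COM"

# Without a flag, the lowercase literal "com" does not match "COM".
no_flag = re.search(r"\w+@\w+\.com", text)

# re.IGNORECASE makes letters match regardless of case.
with_flag = re.search(r"\w+@\w+\.com", text, re.IGNORECASE)

print(no_flag)            # None
print(with_flag.group())  # XYZ@EXAMPLE.COM
```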
Find and Replace
re.sub(pattern, repl, string)
Substitute each match of pattern in string with the replacement repl. If the pattern isn't found, string is returned unchanged. Returns a new string.
The optional 4th argument, count, is the maximum number of replacements to make. If omitted, all matches are replaced.
    # example of regex replace
    import re

    x = "123"
    x2 = re.sub(r"2", r"8", x)
    print(x2)
    # 183
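The effect of limiting the number of replacements can be sketched as follows. The sample string is made up for illustration, and count is passed by keyword here for readability.

```python
import re

text = "a-b-c-d"

# No count given: every match is replaced.
replace_all = re.sub(r"-", "+", text)

# count=2: only the first two matches are replaced.
replace_two = re.sub(r"-", "+", text, count=2)

print(replace_all)  # a+b+c+d
print(replace_two)  # a+b+c-d
```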
Here's a more complex example, replacing all “gif” image paths with “png” in an HTML file.
    # regex example of replacing gif to png in html img tag
    import re

    myText = r"""<p><img src="rabbits.gif" width="30" height="20">
    and <img class="xyz" src="../cats.gif">,
    but <img src ="tigers.gif">,
    <img src= "bird.gif">!</p>"""

    # \1 in the replacement refers to the captured path without the extension
    newText = re.sub(r'src\s*=\s*"([^"]+)\.gif"', r'src="\1.png"', myText)

    print(newText)

    # <p><img src="rabbits.png" width="30" height="20">
    # and <img class="xyz" src="../cats.png">,
    # but <img src="tigers.png">,
    # <img src="bird.png">!</p>
Tip
Note: A successful match does not necessarily mean the matched text contains any part of the given string. For example, these patterns match any string, because they can match the empty string: '' and 'y*'.
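A small sketch of this behavior, using a made-up input string: both patterns return a Match Object, but the matched text is empty.

```python
import re

text = "abc"

# Both patterns can match zero characters, so they succeed at position 0.
m1 = re.search(r"", text)
m2 = re.search(r"y*", text)

print(bool(m1), repr(m1.group()), m1.span())  # True '' (0, 0)
print(bool(m2), repr(m2.group()), m2.span())  # True '' (0, 0)

# Requiring at least one "y" (with + instead of *) gives no match.
m3 = re.search(r"y+", text)
print(m3)  # None
```

When a non-empty match is required, checking that the matched text is not empty, or using + instead of *, avoids this surprise.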
Note: The pattern string should be written as a raw string, like this: r"…". Otherwise, backslashes in it must be escaped. For example, to search for a sequence of tabs in str, use re.search(r"\t+", str) or re.search("\\t+", str).
〔see Python: Quote String〕
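The following sketch shows that the raw and the escaped spellings are the same pattern, and what can go wrong when a backslash is left unescaped. The \b example is an addition for illustration: in a normal Python string "\b" is a backspace character, while in a regex \b means a word boundary.

```python
import re

text = "a\t\tb"  # two real tab characters between "a" and "b"

raw_pat = r"\t+"      # 3 characters: backslash, t, plus
escaped_pat = "\\t+"  # the same 3 characters, with the backslash escaped by hand

print(raw_pat == escaped_pat)  # True

m = re.search(raw_pat, text)
print(m.span())  # (1, 3)

# Without the raw prefix, "\b" becomes a backspace character,
# so the pattern no longer means "word boundary".
not_raw = re.search("\bcat\b", "a cat")
raw = re.search(r"\bcat\b", "a cat")

print(not_raw)      # None
print(raw.group())  # cat
```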