Wolfram: StringExpression Pattern Syntax

By Xah Lee. Date: 2024-06-27. Last updated: 2024-10-06.

Here are the most useful patterns for StringExpression. (See the Wolfram site for the complete list.)

Literal Char Sequence

(* literal char sequence *)
StringCases[ "abcd", "bc" ]
(* {bc} *)

Any Single Char

_ → any single character
Except["\n"] → any single character that is not a newline

(* any single char bracketed by space *)
StringCases[ "a b c", (" " ~~ _ ~~ " ")]
(* { b } *)
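
Except["\n"] is listed above without an example. A minimal sketch, with an input string made up for illustration:

```wolfram
(* runs of chars that are not newline *)
StringCases[ "ab\ncd", Except["\n"].. ]
(* {ab, cd} *)
```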

Character Class

LetterCharacter
CharacterRange[ char1, char2 ]
DigitCharacter
HexadecimalCharacter
WhitespaceCharacter
NumberString (matches a whole number, not a single char)

(* any of a b c *)
StringCases[ "abcdf", {"a", "b", "c"}]
(* {a, b, c} *)

(* any letter char *)
StringCases[ "x291h", LetterCharacter]
(* {x, h} *)

(* range of chars *)
StringCases[ "9GqYw",
CharacterRange[ "a", "z" ],
IgnoreCase -> True]
(* {G, q, Y, w} *)

(* digits *)
StringCases[ "x86w3", DigitCharacter..]
(* {86, 3} *)

(* hexadecimal digits *)
StringCases[ "pf7hs72a64", HexadecimalCharacter..]
(* {f7, 72a64} *)

(* whitespace char *)
StringCases[ "some \tthing\nthere", WhitespaceCharacter]
(* result is list of space, tab, or newline chars *)
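
NumberString has no example above. A minimal sketch, with a made-up input string. NumberString matches digits with an optional sign and decimal point.

```wolfram
(* number strings *)
StringCases[ "pi 3.14 and 42", NumberString ]
(* {3.14, 42} *)
```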

Character Class Exclusion

Except[pattern] → any single character that does not match pattern

(* catch bold text *)
StringCases[
"<b>this</b> and <b>449</b> and <b>3 some</b>",
"<b>" ~~ Except[ "<" ].. ~~ "</b>"
]
(* {<b>this</b>, <b>449</b>, <b>3 some</b>} *)
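
Except also takes a character class, not only a literal char. A small sketch, with an input string made up for illustration:

```wolfram
(* runs of chars that are not digits *)
StringCases[ "ab12cd", Except[DigitCharacter].. ]
(* {ab, cd} *)
```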

Match Boundary

StartOfString
EndOfString
StartOfLine
EndOfLine
WordBoundary
Except[WordBoundary]

(* start of string *)
StringCases[ "abc",
StartOfString ~~ LetterCharacter]
(* {a} *)

(* end of string *)
StringCases[ "abc",
LetterCharacter ~~ EndOfString]
(* {c} *)

(* start of line *)
StringCases[ "abc
def",
StartOfLine  ~~ LetterCharacter]
(* {a, d} *)

(* end of line *)
StringCases[ "abc
def",
LetterCharacter ~~ EndOfLine]
(* {c, f} *)

(* word boundary *)
StringCases[ "x471 948 y694",
WordBoundary ~~ DigitCharacter.. ~~ WordBoundary]
(* {948} *)

StringCases[ "x471 958",
Except[ WordBoundary ] ~~ DigitCharacter.. ~~ Except[ WordBoundary ]]
(* {47, 5} *)

Repeat

pattern.. → one or more times
pattern... → zero or more times

(* digit repeated one or more times *)
StringCases[ "2023 and 2024", DigitCharacter..]
(* {2023, 2024} *)

(* space repeated one or more times *)
StringCases[ "this year   2023", " "..]
(* match sequence of spaces *)

(* <p> followed by space zero or more times *)
StringCases[ "<p>  some</p>", "<p>" ~~ " "...]
(* {<p>  } *)

Repeat N Times

(* digits repeated 4 times *)
StringCases[ "95 713 2023", (d:DigitCharacter..)/;(StringLength@d === 4) ]
(* {2023} *)
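
Another way to write a fixed repeat count is Repeated with an explicit count. This is offered as an alternative sketch, not as a replacement for the condition form above:

```wolfram
(* digits repeated 4 times, using Repeated[pattern, {n}] *)
StringCases[ "95 713 2023", Repeated[DigitCharacter, {4}] ]
(* {2023} *)
```

On a longer run of digits such as 20234, this form matches the first four digits. Put WordBoundary on both sides if only whole 4-digit numbers are wanted.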

Alternatives

{pattern1, pattern2, etc}

pattern1 | pattern2 | etc

(* digits, preceded by id or x *)
StringCases[ "id57 id537 Y418 x41 m6",
{"id", "x"} ~~ DigitCharacter..]
(* {id57, id537, x41} *)

(* catch integer or hexadecimal *)
StringCases[ "18d69  447 3zzbw a1eaa",
WordBoundary ~~ {DigitCharacter.., HexadecimalCharacter..} ~~ WordBoundary]
(* {18d69, 447, a1eaa} *)
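
The examples above use the list form. The first one can also be sketched with the | form:

```wolfram
(* digits, preceded by id or x, written with | *)
StringCases[ "id57 id537 Y418 x41 m6",
("id" | "x") ~~ DigitCharacter..]
(* {id57, id537, x41} *)
```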

Named Capture

name:pattern

Named capture lets you name a pattern, so you can refer to the matched text later, in a replacement or in a condition on the match (for example, to require a particular number of repetitions).

(* digits, preceded by id or x.
name the digits as d.
return a list of the digits as numbers
 *)
StringCases[ "id57 id537 Y418 x41 m6",
{"id", "x"} ~~ d:DigitCharacter.. :> ToExpression[ d ]]
(* {57, 537, 41} *)
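
A named capture can be used the same way in a replacement. A minimal sketch using StringReplace; the input string and the new prefix "n" are made up for illustration:

```wolfram
(* replace the prefix id or x by n, keep the digits *)
StringReplace[ "id57 Y418 x41",
("id" | "x") ~~ d:DigitCharacter.. :> "n" <> d ]
(* n57 Y418 n41 *)
```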

Conditional Check on Pattern

(name:(pattern))/;(expr) → conditional check

A conditional check is an expression that uses a named capture. The pattern counts as a match only if the expression returns True.

(* get even numbers *)
StringCases[ "95 74 263 41",
 (d: (WordBoundary~~DigitCharacter..~~WordBoundary))/;(EvenQ[ToExpression[d]])
]
(* {74} *)
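
The check can be any expression on the captured string. One more sketch in the same shape, with a made-up input, testing word length instead of number value:

```wolfram
(* get words longer than 4 letters *)
StringCases[ "a quick brown fox",
 (w: (WordBoundary~~LetterCharacter..~~WordBoundary))/;(StringLength[w] > 4)
]
(* {quick, brown} *)
```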