WolframLang: String Expression

By Xah Lee.

String Expression represents a string pattern. It is WolframLang's own syntax for matching text, comparable to Regular Expression.

String Expression syntax is more readable than regex, and is similar to WolframLang: Pattern Syntax, which matches symbolic expression structures.

StringExpression[s1, s2, etc]

🔸 SHORT SYNTAX: s1 ~~ s2 ~~ etc

Represents a string pattern, used in string functions that take a string pattern.

StringExpression

WolframLang StringExpression 2022-04-29

StringExpression Short Syntax

StringExpression[ a, b, c, etc ]

is equivalent to

(a ~~ b ~~ c ~~ etc)
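
As a quick check of the equivalence, the sketch below compares the two forms with === and then uses the short form in StringCases. The sample strings and the names longForm and shortForm are made up for illustration.

```wolfram
(* long form and short form of the same pattern: "a", any one char, "c" *)
longForm = StringExpression[ "a", _, "c" ];
shortForm = "a" ~~ _ ~~ "c";

(* the two are the same expression *)
longForm === shortForm
(* True *)

StringCases[ "abc axc ac", shortForm ]
(* {abc, axc} *)
```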

StringExpression Syntax

Here are the most useful ones. (See the Wolfram site for the complete list.)

Literal Char Sequence

(* literal char sequence *)
StringCases[ "abcd", "bc" ]
(* {bc} *)

Any Single Char

(* any single char bracketed by space *)
StringCases[ "a b c", (" " ~~ _ ~~ " ")]
(* { b } *)

Character Class

(* any of a b c *)
StringCases[ "abcdf", {"a", "b", "c"}]
(* {a, b, c} *)

(* any letter char *)
StringCases[ "x291h", LetterCharacter]
(* {x, h} *)

(* range of chars *)
StringCases[ "9GqYw",
CharacterRange[ "a", "z" ],
IgnoreCase -> True]
(* {G, q, Y, w} *)

(* digits *)
StringCases[ "x86w3", DigitCharacter..]
(* {86, 3} *)

(* hexadecimal digits *)
StringCases[ "pf7hs72a64", HexadecimalCharacter..]
(* {f7, 72a64} *)

(* whitespace char *)
StringCases[ "some \tthing\nthere", WhitespaceCharacter]
(* result is list of space, tab, or newline chars *)

Character Class Exclusion

(* catch bold text *)
StringCases[
"<b>this</b> and <b>449</b> and <b>3 some</b>",
"<b>" ~~ Except[ "<" ].. ~~ "</b>"
]
(* {<b>this</b>, <b>449</b>, <b>3 some</b>} *)

Match Boundary

(* start of string *)
StringCases[ "abc",
StartOfString ~~ LetterCharacter]
(* {a} *)

(* end of string *)
StringCases[ "abc",
LetterCharacter ~~ EndOfString]
(* {c} *)

(* start of line *)
StringCases[ "abc
def",
StartOfLine  ~~ LetterCharacter]
(* {a, d} *)

(* end of line *)
StringCases[ "abc
def",
LetterCharacter ~~ EndOfLine]
(* {c, f} *)

(* word boundary *)
StringCases[ "x471 948 y694",
WordBoundary ~~ DigitCharacter.. ~~ WordBoundary]
(* {948} *)

StringCases[ "x471 958",
Except[ WordBoundary ] ~~ DigitCharacter.. ~~ Except[ WordBoundary ]]
(* {47, 5} *)

Repeat

(* digit repeated one or more times *)
StringCases[ "2023 and 2024", DigitCharacter..]
(* {2023, 2024} *)

(* space repeated one or more times *)
StringCases[ "this year   2023", " "..]
(* match sequence of spaces *)

(* <p> followed by space zero or more times *)
StringCases[ "<p>  some</p>", "<p>" ~~ " "...]
(* {<p>  } *)

Repeat N Times

(* digits repeated 4 times *)
StringCases[ "95 713 2023", (d:DigitCharacter..)/;(StringLength@d === 4) ]
(* {2023} *)
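
The condition above works, but a repetition count can also be given directly with Repeated. This is a minimal sketch, assuming Repeated[ pattern, {n} ] is accepted inside a string pattern the same way it is in an ordinary pattern. The second example adds WordBoundary so that a 4-digit run inside a longer number is not picked up. The sample strings are made up.

```wolfram
(* exactly 4 digits in a row *)
StringCases[ "95 713 2023", Repeated[ DigitCharacter, {4} ] ]
(* {2023} *)

(* 4 digits as a whole word. the 5-digit number is skipped *)
StringCases[ "95 20231 2023",
WordBoundary ~~ Repeated[ DigitCharacter, {4} ] ~~ WordBoundary ]
(* {2023} *)
```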

Alternatives

{pattern1, pattern2, etc}

or

pattern1 | pattern2 | etc

(* digits, preceded by id or x *)
StringCases[ "id57 id537 Y418 x41 m6",
{"id", "x"} ~~ DigitCharacter..]
(* {id57, id537, x41} *)

(* catch integer or hexadecimal *)
StringCases[ "18d69  447 3zzbw a1eaa",
WordBoundary ~~ {DigitCharacter.., HexadecimalCharacter..} ~~ WordBoundary]
(* {18d69, 447, a1eaa} *)

Named Capture

name:pattern

Named capture lets you name a pattern, so that the matched text can be referenced later in a replacement, or so that the same text is required to repeat later in the pattern.

(* digits, preceded by id or x.
name the digits as d.
return a list of the digits as number
 *)
StringCases[ "id57 id537 Y418 x41 m6",
{"id", "x"} ~~ d:DigitCharacter.. :> ToExpression[ d ]]
(* {57, 537, 41} *)
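
Two more uses of a named capture, as a small sketch with made-up sample strings: the first puts the captured digits into a replacement string, the second uses the same name twice so that both places must match the same character.

```wolfram
(* use the captured digits in a replacement *)
StringReplace[ "id57 x41 m6",
{"id", "x"} ~~ d:DigitCharacter.. :> "#" <> d ]
(* #57 #41 m6 *)

(* same name used twice: both places must match the same char *)
StringCases[ "aabcdd", x_ ~~ x_ ]
(* {aa, dd} *)
```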

Conditional Check on Pattern

A conditional check is an expression attached to a pattern with /; that usually uses a named capture. The pattern counts as a match only if the expression returns True.

(* get even numbers *)
StringCases[ "95 74 263 41",
 (d: (WordBoundary~~DigitCharacter..~~WordBoundary))/;(EvenQ[ToExpression[d]])
]
(* {74} *)

StringExpression Examples

(* catch email address *)

StringCases[
 "joe@mcqxf.com and mary@nvsck.org",
 LetterCharacter.. ~~ "@" ~~ LetterCharacter.. ~~ "." ~~ LetterCharacter..
]
(* {joe@mcqxf.com, mary@nvsck.org} *)

Ignore Case

To ignore case, use the option IgnoreCase -> True

(* match string, ignore case *)
StringCases["Some Thing", StringExpression["thing"], IgnoreCase -> True]
(* {Thing} *)

RegularExpression vs StringExpression

On syntax, the RegularExpression syntax is more widely understood. StringExpression is more readable. [see WolframLang: Regular Expression]

They have almost the same power, except:

WolframLang string pattern 2022-04-29 r2g8
xx = "Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green";

(* match string using regex, replace it by captured groups *)
StringCases[xx,
RegularExpression["(\\d+) (red|green|blue)"] :> {"$1","$2"}
]
(* {{8, green}, {6, blue}, {20, red}, {5, blue}, {4, red}, {13, green}} *)

(* same, using wolfram string expression syntax. *)
StringCases[xx,
d:NumberString~~" "~~c:"red"|"green"|"blue" :> {d,c}
]
(* {{8, green}, {6, blue}, {20, red}, {5, blue}, {4, red}, {13, green}} *)
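
The two syntaxes can also be mixed: a RegularExpression can appear as one part of a string expression. A small sketch, with xx defined again so the block runs on its own. The choice of " red" as the literal part is just for illustration.

```wolfram
xx = "Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green";

(* regex for the digits, literal string for the color *)
StringCases[ xx, RegularExpression["\\d+"] ~~ " red" ]
(* {20 red, 4 red} *)
```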

For a tutorial, see https://reference.wolfram.com/language/tutorial/WorkingWithStringPatterns.html
