WolframLang: String Expression
String Expression represents a string pattern. It is WolframLang's own string pattern syntax, an alternative to Regular Expression.
Its syntax is more readable than regex, and is similar to the pattern syntax for matching symbolic expression structures (see WolframLang: Pattern Syntax).
StringExpression[s1, s2, etc]
🔸 SHORT SYNTAX:
s1 ~~ s2 ~~ etc
Represents a string pattern, used in string functions that take a string pattern.
StringExpression Short Syntax
StringExpression[ a, b, c, etc ]
is equivalent to
(a ~~ b ~~ c ~~ etc)
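The equivalence can be checked directly. This is a minimal sketch; the pattern "a" ~~ _ ~~ "c" is just an illustrative choice, not from the original text.

```wolfram
(* both forms are the same expression *)
StringExpression[ "a", _, "c" ] === ( "a" ~~ _ ~~ "c" )
(* True *)

(* FullForm shows the long form behind the short syntax *)
FullForm[ "a" ~~ _ ~~ "c" ]
(* StringExpression["a", Blank[], "c"] *)
```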
StringExpression Syntax
Here are the most useful pattern elements. (See the Wolfram site for the complete list.)
Literal Char Sequence
(* literal char sequence *) StringCases[ "abcd", "bc" ] (* {bc} *)
Any Single Char
_
→ any single character
Except["\n"]
→ a single character that is not newline character.
(* any single char bracketed by space *) StringCases[ "a b c", (" " ~~ _ ~~ " ")] (* { b } *)
Character Class
LetterCharacter
CharacterRange[ "char1", "char2" ]
DigitCharacter
HexadecimalCharacter
WhitespaceCharacter
NumberString
(* any of a b c *)
StringCases[ "abcdf", {"a", "b", "c"}]
(* {a, b, c} *)

(* any letter char *)
StringCases[ "x291h", LetterCharacter]
(* {x, h} *)

(* range of chars *)
StringCases[ "9GqYw", CharacterRange[ "a", "z" ], IgnoreCase -> True]
(* {G, q, Y, w} *)

(* digits *)
StringCases[ "x86w3", DigitCharacter..]
(* {86, 3} *)

(* hexadecimal digits *)
StringCases[ "pf7hs72a64", HexadecimalCharacter..]
(* {f7, 72a64} *)

(* whitespace char *)
StringCases[ "some \tthing\nthere", WhitespaceCharacter]
(* result is list of space, tab, or newline chars *)
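NumberString is in the list above but has no example. A minimal sketch, with an input string made up for illustration:

```wolfram
(* numbers, with optional decimal point *)
StringCases[ "pi is 3.14 and e is 2.718", NumberString ]
(* {3.14, 2.718} *)
```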
Character Class Exclusion
Except[pattern]
→ any single character that does not match pattern (pattern must be a single character or a character class)
(* catch bold text *) StringCases[ "<b>this</b> and <b>449</b> and <b>3 some</b>", "<b>" ~~ Except[ "<" ].. ~~ "</b>" ] (* {<b>this</b>, <b>449</b>, <b>3 some</b>} *)
Match Boundary
StartOfString
EndOfString
StartOfLine
EndOfLine
WordBoundary
Except[WordBoundary]
(* start of string *)
StringCases[ "abc", StartOfString ~~ LetterCharacter]
(* {a} *)

(* end of string *)
StringCases[ "abc", LetterCharacter ~~ EndOfString]
(* {c} *)

(* start of line *)
StringCases[ "abc\ndef", StartOfLine ~~ LetterCharacter]
(* {a, d} *)

(* end of line *)
StringCases[ "abc\ndef", LetterCharacter ~~ EndOfLine]
(* {c, f} *)

(* word boundary *)
StringCases[ "x471 948 y694", WordBoundary ~~ DigitCharacter.. ~~ WordBoundary]
(* {948} *)

(* not word boundary *)
StringCases[ "x471 958", Except[ WordBoundary ] ~~ DigitCharacter.. ~~ Except[ WordBoundary ]]
(* {47, 5} *)
Repeat
pattern..
→ one or more times
pattern...
→ zero or more times
(* digit repeated one or more times *)
StringCases[ "2023 and 2024", DigitCharacter..]
(* {2023, 2024} *)

(* space repeated one or more times *)
StringCases[ "this year 2023", " "..]
(* match sequence of spaces *)

(* <p> followed by space zero or more times *)
StringCases[ "<p> some</p>", "<p>" ~~ " "...]
(* {<p> } *)
Repeat N Times
(* digits repeated 4 times *) StringCases[ "95 713 2023", (d:DigitCharacter..)/;(StringLength@d === 4) ]
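A fixed repeat count can also be written without a condition. The sketch below assumes that Repeated[pattern, {n}] and Repeated[pattern, {min, max}], the long form of pattern.. with a count, are accepted inside string patterns. The word boundaries in the second example are added here so that a longer run of digits is not matched in pieces.

```wolfram
(* exactly 4 digits, using Repeated with a count *)
StringCases[ "95 713 2023", Repeated[ DigitCharacter, {4} ] ]
(* {2023} *)

(* 2 to 3 digits, as a whole word *)
StringCases[ "95 713 2023", WordBoundary ~~ Repeated[ DigitCharacter, {2, 3} ] ~~ WordBoundary ]
(* {95, 713} *)
```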
Alternatives
{pattern1, pattern2, etc}
or
pattern1 | pattern2 | etc
(* digits, preceded by id or x *) StringCases[ "id57 id537 Y418 x41 m6", {"id", "x"} ~~ DigitCharacter..] (* {id57, id537, x41} *)
(* catch integer or hexadecimal *) StringCases[ "18d69 447 3zzbw a1eaa", WordBoundary ~~ {DigitCharacter.., HexadecimalCharacter..} ~~ WordBoundary] (* {18d69, 447, a1eaa} *)
Named Capture
name:pattern
Named capture allows you to name a pattern, for later reference in a replacement, or in a condition that tests the matched text.
(* digits, preceded by id or x. name the digits as d. return a list of the digits as number *) StringCases[ "id57 id537 Y418 x41 m6", {"id", "x"} ~~ d:DigitCharacter.. :> ToExpression[ d ]] (* {57, 537, 41} *)
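The same kind of named capture can be used in a replacement. A minimal sketch with StringReplace; the "#" prefix and the input string are illustrative choices only.

```wolfram
(* replace the id prefix with #, keep the captured digits *)
StringReplace[ "id57 id537 Y418", "id" ~~ d:DigitCharacter.. :> "#" <> d ]
(* #57 #537 Y418 *)
```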
Conditional Check on Pattern
(name:(pattern))/;(expr)
→ conditional check
Conditional Check is an expression that takes a named capture. If the expression returns True, it is considered a match.
(* get even numbers *) StringCases[ "95 74 263 41", (d: (WordBoundary~~DigitCharacter..~~WordBoundary))/;(EvenQ[ToExpression[d]]) ] (* {74} *)
StringExpression Examples
(* catch email address *) StringCases[ "joe@mcqxf.com and mary@nvsck.org", LetterCharacter.. ~~ "@" ~~ LetterCharacter.. ~~ "." ~~ LetterCharacter.. ] (* {joe@mcqxf.com, mary@nvsck.org} *)
Ignore Case
To ignore case, use the option
IgnoreCase -> True
(* match string, ignore case *) StringCases["Some Thing", StringExpression["thing"], IgnoreCase -> True] (* {"Thing"} *)
RegularExpression vs StringExpression
On syntax, the RegularExpression syntax is more widely understood. StringExpression is more readable.
[see WolframLang: Regular Expression]
They have almost the same power, except:
RegularExpression allows you to have lookahead/lookbehind, and has a compact syntax for repeats of a given length (in a string expression, that is written Repeated[pattern, {n}]).
StringExpression allows you to have condition test by expression (pattern/;expr) and condition test by function (pattern?functionQ), and orderless matching with AnyOrder[ pattern1, pattern2, pattern3 ]. Also, you can embed RegularExpression in it.
xx = "Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green";

(* match string using regex, replace it by captured groups *)
StringCases[xx, RegularExpression["(\\d+) (red|green|blue)"] :> {"$1","$2"} ]
(* {{8, green}, {6, blue}, {20, red}, {5, blue}, {4, red}, {13, green}} *)

(* same, using wolfram string expression syntax. *)
StringCases[xx, d:NumberString~~" "~~c:"red"|"green"|"blue" :> {d,c} ]
(* {{8, green}, {6, blue}, {20, red}, {5, blue}, {4, red}, {13, green}} *)
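Two of the StringExpression features named above, the test by function and the embedded RegularExpression, can be sketched as follows. The input strings are made up for illustration, and DigitQ is used as one possible test function.

```wolfram
(* condition test by function: single chars for which DigitQ gives True *)
StringCases[ "a1b22", _?DigitQ ]
(* {1, 2, 2} *)

(* RegularExpression embedded in a string expression *)
StringCases[ "id57 x41 id9", "id" ~~ RegularExpression[ "\\d+" ] ]
(* {id57, id9} *)
```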
For a tutorial, see https://reference.wolfram.com/language/tutorial/WorkingWithStringPatterns.html