WolframLang: StringExpression Pattern Syntax
Here are the most useful patterns for StringExpression. (See the Wolfram Language documentation for the complete list.)
Literal Char Sequence
(* literal char sequence *) StringCases[ "abcd", "bc" ] (* {bc} *)
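Literal matching is case-sensitive by default. The sketch below uses the IgnoreCase option of StringCases to match a literal sequence regardless of case. The same option appears in the character-range example later on.

```mathematica
(* case-sensitive by default: only the lowercase "bc" matches *)
StringCases[ "aBc abc", "bc" ] (* {bc} *)

(* same literal, ignoring case *)
StringCases[ "aBc abc", "bc", IgnoreCase -> True ] (* {Bc, bc} *)
```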
Any Single Char
_
→ any single character (including a newline)
Except["\n"]
→ a single character that is not a newline character.
(* any single char bracketed by space *) StringCases[ "a b c", (" " ~~ _ ~~ " ")] (* { b } *)
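This sketch shows the difference between `_` and `Except["\n"]` when the text spans two lines. StringMatchQ is used because it tests whether the whole string matches.

```mathematica
(* _ also matches a newline *)
StringMatchQ[ "a\nb", "a" ~~ _ ~~ "b" ] (* True *)

(* Except["\n"] rejects the newline *)
StringMatchQ[ "a\nb", "a" ~~ Except["\n"] ~~ "b" ] (* False *)
```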
Character Class
LetterCharacter
CharacterRange[ char1, char2 ]
DigitCharacter
HexadecimalCharacter
WhitespaceCharacter
NumberString
(* any of a b c *) StringCases[ "abcdf", {"a", "b", "c"}] (* {a, b, c} *)
(* any letter char *) StringCases[ "x291h", LetterCharacter] (* {x, h} *)
(* range of chars *) StringCases[ "9GqYw", CharacterRange[ "a", "z" ], IgnoreCase -> True] (* {G, q, Y, w} *)
(* digits *) StringCases[ "x86w3", DigitCharacter..] (* {86, 3} *)
(* hexadecimal digits *) StringCases[ "pf7hs72a64", HexadecimalCharacter..] (* {f7, 72a64} *)
(* whitespace char *) StringCases[ "some \tthing\nthere", WhitespaceCharacter] (* result is list of space, tab, or newline chars *)
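NumberString is listed above but has no example. Here is a minimal sketch: it matches a whole number, including a decimal point. The matched strings can then be converted to numbers with ToExpression.

```mathematica
(* NumberString matches each decimal number as one piece *)
StringCases[ "pi is 3.14, e is 2.718", NumberString ] (* {3.14, 2.718} *)

(* convert the matched strings into actual numbers *)
ToExpression /@ StringCases[ "pi is 3.14, e is 2.718", NumberString ]
```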
Character Class Exclusion
Except[pattern]
→ a single character that does not match pattern
(* catch bold text *) StringCases[ "<b>this</b> and <b>449</b> and <b>3 some</b>", "<b>" ~~ Except[ "<" ].. ~~ "</b>" ] (* {<b>this</b>, <b>449</b>, <b>3 some</b>} *)
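Except also works with a character class, not just a literal character. This sketch repeats `Except[DigitCharacter]` to grab the runs of non-digit characters.

```mathematica
(* runs of characters that are not digits *)
StringCases[ "a1b22c333", Except[DigitCharacter].. ] (* {a, b, c} *)
```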
Match Boundary
StartOfString
EndOfString
StartOfLine
EndOfLine
WordBoundary
Except[WordBoundary]
(* start of string *) StringCases[ "abc", StartOfString ~~ LetterCharacter] (* {a} *)
(* end of string *) StringCases[ "abc", LetterCharacter ~~ EndOfString] (* {c} *)
(* start of line *) StringCases[ "abc\ndef", StartOfLine ~~ LetterCharacter] (* {a, d} *)
(* end of line *) StringCases[ "abc\ndef", LetterCharacter ~~ EndOfLine] (* {c, f} *)
(* word boundary *) StringCases[ "x471 948 y694", WordBoundary ~~ DigitCharacter.. ~~ WordBoundary] (* {948} *)
(* not a word boundary *) StringCases[ "x471 958", Except[ WordBoundary ] ~~ DigitCharacter.. ~~ Except[ WordBoundary ]] (* {47, 5} *)
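A common use of WordBoundary is whole-word matching. This sketch finds "cat" only as a separate word, not inside "concat" or "cats". StringCount uses the same pattern to count the matches.

```mathematica
(* "cat" as a whole word only *)
StringCases[ "cat concat cats cat", WordBoundary ~~ "cat" ~~ WordBoundary ] (* {cat, cat} *)

(* count the whole-word occurrences *)
StringCount[ "cat concat cats cat", WordBoundary ~~ "cat" ~~ WordBoundary ] (* 2 *)
```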
Repeat
pattern..
→ one or more times
pattern...
→ zero or more times
(* digit repeated one or more times *) StringCases[ "2023 and 2024", DigitCharacter..] (* {2023, 2024} *)
(* space repeated one or more times *) StringCases[ "this year 2023", " "..] (* match sequence of spaces *)
(* <p> followed by space zero or more times *) StringCases[ "<p> some</p>", "<p>" ~~ " "...] (* {<p> } *)
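Repetition is greedy: it matches as much text as possible. This sketch contrasts the greedy form with Shortest, which asks for the shortest possible match instead. Here `__` (BlankSequence) is the built-in shorthand for one or more arbitrary characters, equivalent to `_..`.

```mathematica
(* greedy: one match from the first <b> to the last </b> *)
StringCases[ "<b>x</b><b>y</b>", "<b>" ~~ __ ~~ "</b>" ] (* {<b>x</b><b>y</b>} *)

(* Shortest: each bold element matched separately *)
StringCases[ "<b>x</b><b>y</b>", Shortest[ "<b>" ~~ __ ~~ "</b>" ] ] (* {<b>x</b>, <b>y</b>} *)
```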
Repeat N Times
(* digits repeated 4 times *) StringCases[ "95 713 2023", (d:DigitCharacter..)/;(StringLength@d === 4) ] (* {2023} *)
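A fixed or bounded count can also be written directly with Repeated, without a named capture and condition. This is a sketch that assumes a reasonably recent Wolfram Language version, where `Repeated[p, {n}]` (exactly n) and `Repeated[p, {min, max}]` are accepted in string patterns. WordBoundary keeps longer digit runs from matching in part.

```mathematica
(* exactly 4 digits *)
StringCases[ "95 713 2023", Repeated[ DigitCharacter, {4} ] ] (* {2023} *)

(* 2 to 3 digits, as whole words only *)
StringCases[ "95 713 2023", WordBoundary ~~ Repeated[ DigitCharacter, {2, 3} ] ~~ WordBoundary ] (* {95, 713} *)
```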
Alternatives
{pattern1, pattern2, etc}
or
pattern1 | pattern2 | etc
(* digits, preceded by id or x *) StringCases[ "id57 id537 Y418 x41 m6", {"id", "x"} ~~ DigitCharacter..] (* {id57, id537, x41} *)
(* catch integer or hexadecimal *) StringCases[ "18d69 447 3zzbw a1eaa", WordBoundary ~~ {DigitCharacter.., HexadecimalCharacter..} ~~ WordBoundary] (* {18d69, 447, a1eaa} *)
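The `|` form gives the same result as the list form. This sketch rewrites the "id or x" example with `|`. The parentheses make the grouping explicit before `~~`.

```mathematica
(* same as the {"id", "x"} example, written with | *)
StringCases[ "id57 id537 Y418 x41 m6", ("id" | "x") ~~ DigitCharacter.. ] (* {id57, id537, x41} *)
```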
Named Capture
name:pattern
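Named capture lets you give a pattern a name, so the matched text can be referred to later. You can use it in the replacement, or in a condition, for example to require a particular number of repetitions of the pattern (see Repeat N Times).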
(* digits, preceded by id or x. name the digits as d. return a list of the digits as numbers *) StringCases[ "id57 id537 Y418 x41 m6", {"id", "x"} ~~ d:DigitCharacter.. :> ToExpression[ d ]] (* {57, 537, 41} *)
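These sketches show two more uses of names. The first uses two named captures in a StringReplace rule to swap the prefix and the number. The second reuses one name, so both places must match the same text; here that finds doubled characters (`x_` is shorthand for `x:_`). Use `:>` rather than `->` on the right-hand side, so it is evaluated only after the names are bound.

```mathematica
(* swap prefix and number, using two named captures *)
StringReplace[ "id57 x41", (p:("id" | "x")) ~~ (d:DigitCharacter..) :> d <> "-" <> p ] (* 57-id 41-x *)

(* the same name twice must match the same text: doubled characters *)
StringCases[ "aabbcd", x_ ~~ x_ ] (* {aa, bb} *)
```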
Conditional Check on Pattern
(name:(pattern))/;(expr)
→ conditional check
A conditional check is an expression that uses the named capture. The text counts as a match only if the expression returns True.
(* get even numbers *) StringCases[ "95 74 263 41", (d: (WordBoundary~~DigitCharacter..~~WordBoundary))/;(EvenQ[ToExpression[d]]) ] (* {74} *)
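A condition can be combined with a `:>` rule to filter and convert the matches in one step. This sketch keeps only the numbers greater than 80 and returns them as integers rather than strings. The threshold 80 is arbitrary, chosen for illustration.

```mathematica
(* numbers greater than 80, returned as integers *)
StringCases[ "95 74 263 41",
  ((d:(WordBoundary ~~ DigitCharacter.. ~~ WordBoundary)) /; ToExpression[d] > 80) :> ToExpression[d] ] (* {95, 263} *)
```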