Pattern Matching in Ocaml
Pattern Matching is similar to regex. Regex is about pattern matching on texts. You can use regex to check if a string has certain form, or extract parts of it to construct new string. Functional language's Pattern Matching is the same idea, but applied to any expressions of the language, typically list-like data. For example, you can use pattern matching to check a list-like data has elements that is of a particular type, or extract all elements of such type, or check if a list-like-thing has certain structure of tree shape, and extract parts that matches your pattern to construct new data structure. Pattern matching is heavily used in ocaml similar to how regex is heavily used for text processing.
For example, suppose you have a list, and you want all elements that's int type in the list. Typically, you would loop thru the list and add any item you found to a new list. But with pattern matching, you have a short syntax to specify a pattern instead of coding the details of the loop. In more complicated patterns, writing a loop to test on elements will be almost impossible or inefficient. (imagine, if you don't have regex and want to write your own code to extract email addresses in a piece of text.)
The syntax for pattern matching has this form:
match expression with pattern
(* simple pattern matching examples *) (* get middle item of a list *) match [3;4;5] with [a;b;c] -> b;; (* ⇒ 4 *) (* get the second element of a 2-tuple *) match (3, "woot!") with ( m, n ) -> n;; (* "woot!" *)
You can specify several cases of patterns by using the alternative syntax “|”.
(* example of pattern matching with alternatives *) match 4 with 4 -> 5 | 5 -> 6;; (* ⇒ 5 *) match (3,4) with (3,7) -> 5 | (3,4) -> 7 | (3,8) -> 6;; (* ⇒ 7 *)
Note, when you use
match x with a -> a1 | b -> b1 | …, each of the x, a, b, etc must be of the same type, and the replacement values a1, b1, etc must all be of the same type.
Alternatives can be used for the matching part only. It is useful when you want to replace multiple patterns into the same value.
(* examples of single pattern of alternatives *) match 4 with 3 | 4 | 5 -> 6;; (* ⇒ 6 *) match (3,4) with (3,7) | (3,4) | (3,8) -> 7;; (* ⇒ 7 *)
For coding convenience, the alternative operator “|” can begin a match.
(* the following are syntactically equivalent *) match 4 with | 4 -> 5 | 5 -> 6;; match 4 with 4 -> 5 | 5 -> 6;;
The Wildcard “_”
The character underscore “_”, can be used in pattern matching to match any value.
(* examples of pattern matching using wildcard “_” *) match 4 with _ -> 5;; (* ⇒ 5 *) (* tuple *) match (3,4) with (_,4) -> 5;; (* ⇒ 5 *) (* list *) match [3;4] with [_;_] -> 5;; (* ⇒ 5 *) (* array *) match [| 3;4 |] with [| _;_ |] -> 5;; (* ⇒ 5 *)
You can name your patterns, so that you can use it in your replacement. The simplest way to name your pattern is just to use a name in place of the pattern. Examples:
match 4 with x -> x;; (* ⇒ 4 *) match 4 with xyz -> xyz;; (* ⇒ 4 *) match [3;4;5] with [a;b;c] -> b;; (* ⇒ 4 *) match (3, 7, 4) with ( a, b, c ) -> c;; (* ⇒ 4 *) match (3, 4, (8,9)) with (a, b, c ) -> c;; (* ⇒ (8, 9) *)
However, when you use a name in pattern like above, the name cannot appear twice.
Suppose you have 3-tuple “(x,y,z)”, and when the x and equal to y, you want transform it to just (x,z). You cannot do it like in the following example:
let d = (3, 3, 4) in match d with ( a, a, c ) -> (a, c);; (* Error: Variable a is bound several times in this matching *)
For more complex patterns, you can use the “as” keyword, like this:
pattern as name
(* examples of named patterns *) match 3 with 3 as x -> x;; (* 3 *) match (3,4) with (3 as y, 4) -> y;; (* 3 *) match (3,4) with ((3 as x), (4 as y)) -> (y, x);; (* ⇒ (4, 3) *) match ((3,9),4) with (_ as xx, 4) -> xx;; (* ⇒ (3,9) *)
A bit warning about pattern matching with tuples. Tuples actualy don't need the parenthesis. The tuple “(a, b, c, …)” can be written as “a, b, c, …” when there's no ambiguity. For example:
(* tuples actually don't need parenthesis *) let x = 3,4 in fst x;; (* ⇒ 3 *) (* the following are all equivalent *) let x = (3,4,5) in match x with (_ as x, _, _) -> x;; (* ⇒ 3 *) let x = 3,4,5 in match x with (_ as x, _, _) -> x;; (* ⇒ 3 *) let x = 3,4,5 in match x with _ as x, _, _ -> x;; (* ⇒ 3 *)
This means, when you pattern match on tuples, sometimes the result may be suprising because of the operator precedence. Following is a example:
match (3,4) with (3 , 4 as y) -> y;; (* ⇒ (3, 4) *) (* above is actually equivalent to the following *) match (3,4) with ((3 , 4) as y) -> y;; (* ⇒ (3, 4) *) (* to actually name 4 as y, you need to use paren to group *) match (3,4) with (3 , (4 as y)) -> y;; (* 4 *)
Compiler prints warning when patterns do not capture all possibilities. This is important because, if no pattern matches at run time, Match_Failure exception is raised, and if that is not captured, the program stops. So, make sure your program logic will always find a match, or, do catch the Match_Failure exception.
You can always simply add a wildcard pattern “_” to match anything, but this is not recommended because it losens your code logic.
Defining Function with Patterns
Ask me question on patreon