How Wolfram Language does Unicode?

By Xah Lee.

This page explains some technical details of how Wolfram Language deals with Unicode characters.

Wolfram Language supports Unicode, but does not use Unicode when saving to file.

If you are not familiar with Unicode, read an introduction to Unicode first.

Wolfram Language files use 7-bit ASCII only.

Wolfram Language notebook spec 2021-06-06 http://www.wolfram.com/technology/nb/

How does it support Unicode if it uses only ASCII?

Named Characters

Wolfram Language has a set of special characters with the syntax \[name]. For example:

Glyph  Syntax
é      \[EAcute]
É      \[CapitalEAcute]
α      \[Alpha]
Δ      \[CapitalDelta]
⊕      \[CirclePlus]
∵      \[Because]
∈      \[Element]
⧦      \[Equivalent]
ℝ      \[DoubleStruckCapitalR]

So, when you type \[Alpha], it is displayed as “α”.

As of 2021-06-06, there are 1009 named characters. See ListingOfNamedCharacters.

Some Named Chars Are Not in Unicode

Some of the named chars are not in Unicode. For example:

[Image: Wolfram Language only characters, including the wolf and ram icon characters, 2021-06-06]

Some Unicode Math Symbols Are Not in Named Chars

Similarly, many uncommon math symbols in Unicode [see Unicode Math Symbols ∑ ∫ π² ∞] are not Wolfram Language named characters.

Also, Chinese characters, Arabic letters, etc. are not Wolfram Language named characters.

Map Between Unicode and Named Chars

When you paste a Unicode character into a Wolfram notebook, it tries to interpret the character as one of the named characters.

So, for example, if you paste α (GREEK SMALL LETTER ALPHA; U+03B1), it automatically becomes Wolfram Language's \[Alpha], and is displayed as α.

If it's not a named character, Wolfram Language shows it as is.
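
The paste behavior can be sketched as a table lookup. The following is a minimal Python sketch, not Wolfram's actual implementation: the names NAMED_CHARS and interpret_pasted are hypothetical, and the table holds only a few entries taken from the examples on this page.

```python
# Hypothetical excerpt of the named-character table: code point -> name.
# The real table has about a thousand entries.
NAMED_CHARS = {
    0x00E9: "EAcute",         # é
    0x00C9: "CapitalEAcute",  # É
    0x03B1: "Alpha",          # α
    0x0394: "CapitalDelta",   # Δ
    0x03C0: "Pi",             # π
    0x221E: "Infinity",       # ∞
}


def interpret_pasted(char):
    """Return the \\[Name] form if char is in the table, else char unchanged."""
    name = NAMED_CHARS.get(ord(char))
    if name is None:
        return char  # not a named character: kept as is
    return "\\[" + name + "]"


print(interpret_pasted("α"))        # prints \[Alpha]
print(interpret_pasted("水") == "水")  # prints True
```

A character that is missing from the table, such as a Chinese character, passes through unchanged, which matches the "shown as is" case.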

Syntax for Arbitrary Unicode Char

For any Unicode character that is not one of Wolfram Language's named characters (such as Chinese characters), the syntax is \:xxxx, where xxxx is the character's code point written as 4 hexadecimal digits. For example, the Chinese character 水 (water) has hexadecimal code point 6c34, so in Wolfram Language it is \:6c34.

For a Unicode character whose code point needs more than 4 hexadecimal digits, the syntax is:
\|6_digits_hexadecimal

For example, the Unicode character 😃 is \|01f603
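
As a rough illustration of the two escape forms, here is a small Python sketch that builds the escape from a character's code point. The function name to_hex_escape is made up for this example. It assumes that code points up to ffff take the \:xxxx form and anything larger takes the \| form with 6 digits.

```python
def to_hex_escape(char):
    """Return the \\:xxxx or \\|xxxxxx escape for a single character."""
    code = ord(char)
    if code <= 0xFFFF:
        # Fits in 4 hexadecimal digits.
        return "\\:%04x" % code
    # Needs more than 4 digits: pad to 6.
    return "\\|%06x" % code


print(to_hex_escape("水"))   # prints \:6c34
print(to_hex_escape("😃"))   # prints \|01f603
```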

The above roughly summarizes how Wolfram Language takes Unicode as input.

Interpretation of Unicode Characters

When you paste in a Unicode character, how does Wolfram Language interpret it?

Not a Named Character

If the character does not correspond to one of the named characters, it is treated like any letter, such as a b c. You can use it in function or variable names. For example:

♥ = 3

Type of Named Chars: Letter/Letter-Like Forms vs Operator

A named character is one of two types:

letter or letter-like form
operator

Each character in these classes may or may not have a builtin meaning.

Here's a tree illustration:

Unicode Char
␣named char
␣␣letter or letter-like
␣␣␣has builtin meaning (π ° ∞)
␣␣␣no builtin meaning. (α β)
␣␣operator
␣␣␣has builtin meaning (≥ √ ∑ ∫)
␣␣␣no builtin meaning. (± ⊕)
␣not named char (🤡 💯)

Letter/Letter-Like-Form vs Operator

If it's a letter or letter-like form, either it has a builtin meaning, such as °, π, ∞, or it is treated like any ordinary letter, such as λ. In that case it is treated the same as a, b, c, etc., and you can use it as part of a variable or function name.

If it's an operator, either it has a builtin meaning, such as ≥, √, ∑, ∫, or it does not have a builtin meaning, such as ±, ⊕.

However, every operator character has a builtin syntactic structure and precedence.

Of the named characters, many have a special meaning in Wolfram Language. For example, π \[Pi] is automatically considered identical to the built-in symbol Pi, which means the mathematical constant. (So, if you type \[Pi] or \:03c0, it is displayed as π with the meaning of Pi.) Here are some examples of named characters with special meaning.

Glyph  Wolfram Language's name  Unicode name              Unicode hexadecimal  Default interpretation
π      \[Pi]                    GREEK SMALL LETTER PI     03c0                 Pi
∞      \[Infinity]              INFINITY                  221e                 Infinity

Glyph  Wolfram Language's name  Unicode name              Unicode hexadecimal  Default interpretation
≥      \[GreaterEqual]          GREATER-THAN OR EQUAL TO  2265                 GreaterEqual
∫      \[Integral]              INTEGRAL                  222b                 Integrate
⋂      \[Intersection]          N-ARY INTERSECTION        22c2                 Intersection
∑      \[Sum]                   N-ARY SUMMATION           2211                 Sum
√      \[Sqrt]                  SQUARE ROOT               221a                 Sqrt
⊕      \[CirclePlus]            CIRCLED PLUS              2295                 CirclePlus
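
To illustrate that \[Pi], \:03c0, and a pasted π all end up as the same character, here is a Python sketch that decodes the three spellings. NAME_TO_CODE and decode_escapes are hypothetical names, and the table has only a few rows copied from above. The real named-character table is much larger, and this is not how Wolfram Language itself is implemented.

```python
import re

# Hypothetical excerpt of the name table: name -> code point.
NAME_TO_CODE = {"Pi": 0x03C0, "Infinity": 0x221E, "Sum": 0x2211, "Sqrt": 0x221A}

# Matches \[Name], \:xxxx (4 hex digits), or \|xxxxxx (6 hex digits).
ESCAPE = re.compile(r"\\\[(\w+)\]|\\:([0-9a-fA-F]{4})|\\\|([0-9a-fA-F]{6})")


def decode_escapes(text):
    """Replace each recognized escape in text with the character it denotes."""
    def repl(match):
        name, hex4, hex6 = match.groups()
        if name is not None:
            code = NAME_TO_CODE.get(name)
            # Names missing from this small table are left untouched.
            return match.group(0) if code is None else chr(code)
        return chr(int(hex4 or hex6, 16))
    return ESCAPE.sub(repl, text)


spellings = ["\\[Pi]", "\\:03c0", "π"]
print(len({decode_escapes(s) for s in spellings}))  # prints 1
```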


Keyboard Shortcut Alias for Named Chars

Some of the named characters have one or more aliases for ease of input. For example, to enter α, you can type Escape a Escape or Escape alpha Escape. Here are some examples:

Glyph  Common Alias
α      a
π      p
∞      inf
≤      <=
°      deg
Δ      D
∈      el
→      ->

Inputting Special Chars

You can input a special character by: