How Does Wolfram Language Handle Unicode?

By Xah Lee.

This page explains some technical details of how Wolfram Language deals with Unicode characters.

Wolfram Language supports Unicode, but does not use Unicode when saving to file. [see Unicode Basics: Character Set, Encoding, UTF-8]

Wolfram Language files use 7-bit ASCII only.

Wolfram Language notebook spec 2021-06-06 http://www.wolfram.com/technology/nb/

How does it support Unicode if it uses only ASCII?

Named Characters

Wolfram Language has a set of special characters with the syntax \[name]. For example:

Glyph  Syntax
é      \[EAcute]
É      \[CapitalEAcute]
α      \[Alpha]
Δ      \[CapitalDelta]
⊕      \[CirclePlus]
∵      \[Because]
∈      \[Element]
⧦      \[Equivalent]
ℝ      \[DoubleStruckCapitalR]

So, when you type \[Alpha], it is displayed as “α”.

You can think of them as similar to HTML's “named character entities”. [see HTML Entity List]

As of 2021-06-06, there are 1009 named chars. https://reference.wolfram.com/language/guide/ListingOfNamedCharacters.html

Many of the named chars are also in Unicode, but not all. For example:

[Image: Wolfram Language only characters]

Similarly, many rarely used math symbols in Unicode are not in this list.

Also, Chinese characters, Arabic letters, etc. are not Wolfram Language named chars.

Map Between Unicode and Named Chars

When you paste a Unicode char into a Wolfram notebook, it will try to interpret the char as one of the named chars.

So, for example, if you paste α (GREEK SMALL LETTER ALPHA; U+03B1), it automatically becomes Wolfram Language's \[Alpha], and is displayed as α.

If it's not a named character, Wolfram Language shows it as is.

Syntax for Arbitrary Unicode Char

For any Unicode character that is not one of Wolfram Language's named chars (such as Chinese characters), the syntax is \:xxxx, where xxxx is the character's 4-digit hexadecimal Unicode code point. For example, the Chinese character 水 (water) has hexadecimal code point 6c34, so in Wolfram Language it is \:6c34.

A Unicode character whose code point needs more than 4 hexadecimal digits is represented by
\|6_digits_hexadecimal

For example, the Unicode character 😃 (U+1F603) is \|01f603

The above roughly summarizes how Wolfram Language takes Unicode as input.
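The input rules above can be illustrated with a small Python sketch. This is not Wolfram's own code, just a model of the rules described on this page: plain ASCII stays as is, a named char becomes \[Name], other chars up to U+FFFF become \:xxxx, and anything beyond becomes \|xxxxxx. The function names and the 4-entry NAMED table are made up for illustration (the real list has about a thousand entries), and the sketch does not escape literal backslashes in the input.

```python
# Tiny illustrative subset of Wolfram Language named characters
# (names taken from the Named Characters table earlier on this page).
NAMED = {
    "\u03b1": "Alpha",          # α
    "\u0394": "CapitalDelta",   # Δ
    "\u00e9": "EAcute",         # é
    "\u00c9": "CapitalEAcute",  # É
}

def wolfram_escape(ch):
    """Return a 7-bit ASCII form of one character, following the rules above."""
    cp = ord(ch)
    if cp < 0x80:
        return ch                        # already ASCII
    if ch in NAMED:
        return "\\[" + NAMED[ch] + "]"   # named char: \[Name]
    if cp <= 0xFFFF:
        return "\\:%04x" % cp            # 4 hex digits: \:xxxx
    return "\\|%06x" % cp                # 6 hex digits: \|xxxxxx

def to_ascii(text):
    """Escape every character of text, giving an ASCII-only string."""
    return "".join(wolfram_escape(c) for c in text)

print(to_ascii("\u03b1"))       # \[Alpha]
print(to_ascii("\u6c34 = 3"))   # \:6c34 = 3
print(to_ascii("\U0001f603"))   # \|01f603
```

The output of to_ascii is always pure ASCII, which matches the idea that a file can hold any Unicode text while using only 7-bit ASCII.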

Interpretation of Unicode Characters

When you paste in a Unicode character, how does Wolfram Language interpret it?

If the character does not correspond to one of the named characters, it is just treated as a letter, just like any of a b c, and has no builtin meaning.

Some Named Chars have Built-in Meaning

If the character does not correspond to one of the named characters, then there is no builtin meaning for it. It is treated like any letter, such as a b c. You can use it in function names, variable names, or anything else. For example: 水 = 3

If the character corresponds to one of the named characters, there are 2 possible interpretations: a letter or letter-like form, or an operator.

Each character in these classes may or may not have builtin meaning.

named chars
 builtin meaning
  letter or letter-like → e.g. π ° ∞
  operator → e.g. ≥ √ ∑ ∫
 no builtin meaning
  letter or letter-like → treated as any normal letter such as a b c.
  operator → e.g. ± ⊕

Letter/Letter-Like-Form vs Operator

If it's a letter or letter-like form, either it has a builtin meaning, such as ° π ∞, or it has none, such as λ. In the latter case it is treated the same as any of a, b, c, etc. You can define a function with it or whatever.

If it's an operator, either it has a builtin meaning, such as ≥ √ ∑, or it does not have builtin meaning, such as ± ⊕.

However, every operator character has a builtin syntactic structure and precedence.

Of the named chars, many have special meaning in Wolfram Language. For example, π \[Pi] is automatically considered identical to the built-in symbol Pi, which means the mathematical constant. (So, if you type \[Pi] or \:03c0, either one is displayed as π with the meaning of Pi.) Here are some examples of named chars with special meaning.

Glyph  Wolfram Language's name  Unicode name              Unicode hexadecimal  Default Interpretation
π      \[Pi]                    GREEK SMALL LETTER PI     03c0                 Pi
∞      \[Infinity]              INFINITY                  221e                 Infinity

Glyph  Wolfram Language's name  Unicode name              Unicode hexadecimal  Default Interpretation
≥      \[GreaterEqual]          GREATER-THAN OR EQUAL TO  2265                 GreaterEqual
∫      \[Integral]              INTEGRAL                  222b                 Integrate
⋂      \[Intersection]          N-ARY INTERSECTION        22c2                 Intersection
∑      \[Sum]                   N-ARY SUMMATION           2211                 Sum
√      \[Sqrt]                  SQUARE ROOT               221a                 Sqrt
⊕      \[CirclePlus]            CIRCLED PLUS              2295                 CirclePlus
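
Going the other way, here is a rough Python sketch of how the three escape forms (\[Name], \:xxxx, \|xxxxxx) map back to Unicode characters. It only models the character mapping, not Wolfram Language's interpretation of the symbols. The decode function and the NAME_TO_CODEPOINT table are hypothetical names for this example; the table holds just a few entries copied from the tables above, and unknown names are left untouched.

```python
import re

# A few named chars and their code points, copied from the tables above.
NAME_TO_CODEPOINT = {
    "Pi": 0x03C0,
    "Infinity": 0x221E,
    "GreaterEqual": 0x2265,
    "Integral": 0x222B,
    "Sum": 0x2211,
    "Sqrt": 0x221A,
    "CirclePlus": 0x2295,
}

# Matches \[Name], or \: plus 4 hex digits, or \| plus 6 hex digits.
ESCAPE = re.compile(
    r"\\\[([A-Za-z0-9]+)\]|\\:([0-9a-fA-F]{4})|\\\|([0-9a-fA-F]{6})"
)

def decode(text):
    """Replace each recognized escape in text with its Unicode character."""
    def repl(match):
        name, hex4, hex6 = match.groups()
        if name is not None:
            cp = NAME_TO_CODEPOINT.get(name)
            # Leave names outside this small table as they are.
            return match.group(0) if cp is None else chr(cp)
        return chr(int(hex4 or hex6, 16))
    return ESCAPE.sub(repl, text)

print(decode(r"\[Pi] \:03c0"))      # π π
print(decode(r"\:6c34 \|01f603"))   # 水 😃
```

Note that \[Pi] and \:03c0 decode to the same character, which is why both end up displayed as π.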


Keyboard Shortcut Alias for Named Chars

Some of the named chars have one or more aliases for ease of input. For example, to enter α, you can type Escape a Escape or Escape alpha Escape. Here are some examples:

Glyph  Common Alias
α      a
π      p
∞      inf
≤      <=
°      deg
Δ      D
∈      el
→      ->

Inputting Special Chars

You can input a special character by: