WolframLang: Source Code Encoding and Unicode
This page explains some technical details of how Wolfram Language handles Unicode characters.
WolframLang Supports Unicode Characters
WolframLang supports Unicode characters: math symbols such as → (U+2192: RIGHTWARDS ARROW), Greek letters such as α (U+3B1: GREEK SMALL LETTER ALPHA), Chinese characters such as 水, and any other Unicode character.
WolframLang Source Code is ASCII Only
WolframLang source code files and notebooks are saved in ASCII only. Non-ASCII characters are stored using the escape syntaxes described below. 〔see WolframLang: File Encoding〕
Syntax of Unicode Characters
When you paste text containing Unicode characters into a Wolfram notebook, each character is displayed as is. Underneath, however, the notebook stores it in one of these forms:
- Named character syntax: \[Name]. e.g. α (U+3B1: GREEK SMALL LETTER ALPHA) is \[Alpha]
- Lowercase 4-digit hexadecimal syntax: \:hhhh. e.g. 水 (U+6C34: CJK UNIFIED IDEOGRAPH-6C34) is \:6c34
- Lowercase 6-digit hexadecimal syntax: \|hhhhhh. e.g. 😃 (U+1F603: SMILING FACE WITH OPEN MOUTH) is \|01f603
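A quick way to see that these syntaxes are only different spellings of the same character is to compare the strings directly. This is a minimal sketch, assuming a Wolfram Language version that supports the 6-digit \|hhhhhh form; the last line should print the string using \[Alpha] and \:6c34 escapes, though the exact display may vary by version.

```wolfram
(* All three escape syntaxes denote ordinary one-character strings *)
"\[Alpha]" === "\:03b1" === "α"    (* True *)
"\:6c34" === "水"                   (* True *)
"\|01f603" === "😃"                 (* True *)

(* ToCharacterCode gives the Unicode code point, in decimal *)
ToCharacterCode["\:6c34"]           (* {27700}, i.e. hex 6c34 *)
ToCharacterCode["\|01f603"]         (* {128515}, i.e. hex 1f603 *)

(* Write a string back out using only ASCII escapes, as a saved file would *)
ToString["α水", InputForm, CharacterEncoding -> "ASCII"]
```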
What Is a Named Character
Named characters are the characters that have the syntax \[Name]. Some examples:
Glyph | Syntax |
---|---|
é | \[EAcute] |
É | \[CapitalEAcute] |
α | \[Alpha] |
Δ | \[CapitalDelta] |
⊕ | \[CirclePlus] |
∵ | \[Because] |
∈ | \[Element] |
⇔ | \[Equivalent] |
ℝ | \[DoubleStruckCapitalR] |
So, when you type \[Alpha], it is displayed as α.
As of 2021-06-06, there are 1009 named chars.
Some Named Chars Are Not in Unicode
Some of the named chars are not in Unicode. For example:
Some Unicode Math Symbols Are Not in Named Chars
Many less common math symbols in Unicode 〔see Unicode: Math Symbols π² ∞ ∫〕 are not WolframLang named characters. For example:
⫷ ⫸ ⩹ ⩺
Likewise, Chinese characters, Arabic letters, etc., are not Wolfram Language named chars.
Map of Unicode to Named Character
Some Unicode chars, such as π (U+3C0: GREEK SMALL LETTER PI), map to a named char, in this case \[Pi].
However, some Unicode chars, such as ℝ (U+211D: DOUBLE-STRUCK CAPITAL R), do not map to the “same looking” named char \[DoubleStruckCapitalR].
I am not aware of a document that lists all the named characters and the Unicode characters they map to, if any.
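In the absence of such a list, one way to check an individual named char is to look at the code point it is stored as. This is a minimal sketch; hexCodes is a hypothetical helper, not a built-in function. The ℝ comparison is left without an expected result because, per the observation above, the answer depends on how your version maps \[DoubleStruckCapitalR].

```wolfram
(* Hypothetical helper: the 4-digit hex code point(s) a string is stored as *)
hexCodes[s_String] := IntegerString[#, 16, 4] & /@ ToCharacterCode[s]

hexCodes["\[Pi]"]                       (* {"03c0"}, the standard Unicode π *)
"\[Pi]" === "\:03c0"                    (* True *)

(* Compare with U+211D, the standard Unicode ℝ *)
hexCodes["\[DoubleStruckCapitalR]"]
"\[DoubleStruckCapitalR]" === "\:211d"
```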
Interpretation of Unicode Characters
When you paste in a Unicode character, how does Wolfram Language interpret it?
If Not a Named Character
If the character does not map to any named character, it is treated like any letter, such as a, b, c. You can use it in a variable name or function name. For example:
Table[♥ , {♥, 1, 5}] (* {1, 2, 3, 4, 5} *)
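The same works for Chinese characters, which the previous section notes are not named chars. A minimal sketch, assuming the characters are typed or pasted directly into a notebook; the names 水 and 山 are arbitrary examples.

```wolfram
(* CJK ideographs are letters, so they can be used in symbol names *)
LetterQ["水"]        (* True *)

水 = 3;
水 + 1              (* 4 *)

(* A function whose name is a single Chinese character *)
山[x_] := x^2
山[5]               (* 25 *)
```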
Types of Named Chars: Letter/Letter-Like Forms vs Operator
A named character is one of two types:
- letter or letter-like forms
- operator
A character of either type may or may not have a built-in meaning.
Here's a tree illustration:
- named char
  - letter or letter-like
    - has builtin meaning (π ° ∞)
    - no builtin meaning (α)
  - operator
    - has builtin meaning (≥ √ ∑ ∫)
    - no builtin meaning (± ⊕)
- not named char (🤡 💯)
  - letter or letter-like
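Each of the four named-char branches of the tree can be observed directly. A minimal sketch, assuming π, α, ≥ and ⊕ are entered as the characters themselves (or as \[Pi], \[Alpha], \[GreaterEqual], \[CirclePlus]) and that α has no value assigned.

```wolfram
(* Letter-like, built-in meaning: π is the symbol Pi *)
π === Pi                    (* True *)
N[π]                        (* 3.14159 *)

(* Letter-like, no built-in meaning: α is an ordinary symbol *)
Head[α]                     (* Symbol *)

(* Operator, built-in meaning: ≥ parses to GreaterEqual, which evaluates *)
3 ≥ 2                       (* True *)

(* Operator, no built-in meaning: ⊕ parses to CirclePlus and stays unevaluated *)
FullForm[Hold[1 ⊕ 2]]       (* Hold[CirclePlus[1, 2]] *)
1 ⊕ 2                       (* returned as 1 ⊕ 2 *)
```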
Letter/Letter-Like-Form vs Operator
If it is a letter or letter-like form, either it has a built-in meaning, such as °, π, ∞, or it is treated like any other letter, such as λ, exactly the same as a, b, c, etc. In that case you can use it as part of a variable or function name.
If it is an operator, either it has a built-in meaning, such as √ ≥ ∑ ∫, or it does not, such as ⊕.
However, every operator character has built-in syntax: how it combines with its operands, and its precedence, even when it has no built-in meaning.
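For example, ⊕ (CirclePlus) and ⊗ (CircleTimes) both have no built-in meaning, yet the parser still groups them by precedence, much as it groups + and ×. A minimal sketch; a, b, c are assumed to be undefined symbols.

```wolfram
(* ⊗ binds tighter than ⊕, so b ⊗ c is grouped first *)
FullForm[Hold[a ⊕ b ⊗ c]]      (* Hold[CirclePlus[a, CircleTimes[b, c]]] *)

(* The precedences themselves can be queried *)
Precedence[CircleTimes] > Precedence[CirclePlus]   (* True *)
```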
A named character may have special meaning in Wolfram Language.
For example, π (\[Pi]) is automatically considered identical to the built-in symbol Pi, the mathematical constant.
(So if you type \[Pi] or \:03c0, it is displayed as π and has the meaning of Pi.)
Here are some examples of named chars with special meaning.
Glyph | Wolfram Language's name | Unicode name | Unicode hexadecimal | Default Interpretation |
---|---|---|---|---|
π | \[Pi] | GREEK SMALL LETTER PI | 03c0 | Pi |
∞ | \[Infinity] | INFINITY | 221e | Infinity |
√ | \[Sqrt] | SQUARE ROOT | 221a | operator for Sqrt |
≥ | \[GreaterEqual] | GREATER-THAN OR EQUAL TO | 2265 | operator for GreaterEqual |
⋂ | \[Intersection] | N-ARY INTERSECTION | 22c2 | operator for Intersection |
∑ | \[Sum] | N-ARY SUMMATION | 2211 | part of operator for Sum |
∫ | \[Integral] | INTEGRAL | 222b | part of operator for Integrate |
⊕ | \[CirclePlus] | CIRCLED PLUS | 2295 | operator for CirclePlus |
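Several rows of the table can be checked by typing the named-character syntax directly. A minimal sketch; x is assumed to be an undefined symbol. The last line illustrates why ∫ is only "part of" the operator: it is closed by ⅆ (\[DifferentialD]).

```wolfram
(* Letter-like with built-in meaning *)
\[Infinity] === Infinity                       (* True *)

(* Operators with built-in meaning *)
FullForm[Hold[\[Sqrt]x]]                       (* Hold[Sqrt[x]] *)
{1, 2, 3} \[Intersection] {2, 3, 4}            (* {2, 3} *)

(* ∫ f ⅆx is read as Integrate[f, x] *)
FullForm[Hold[\[Integral] x^2 \[DifferentialD]x]]
```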
Keyboard Shortcut Aliases for Named Chars
A named character may have one or more aliases for ease of input. For example, to enter α, you can type Esc a Esc or Esc alpha Esc (press the Escape key, type the alias, then press Escape again). Here are some examples:
Glyph | Common Alias |
---|---|
α | a |
π | p |
∞ | inf |
≤ | <= |
° | deg |
Δ | D |
∈ | el |
→ | -> |
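An alias only changes how a character is entered; the result is the same named char, with the same meaning. A minimal sketch, written with the \[Name] form that each alias produces; a, b, x, y are assumed to be undefined symbols.

```wolfram
(* Esc -> Esc enters \[Rule], which parses exactly like -> *)
Hold[a \[Rule] b] === Hold[a -> b]          (* True *)

(* Esc <= Esc enters \[LessEqual], the same operator as <= *)
Hold[x \[LessEqual] y] === Hold[x <= y]     (* True *)

(* Esc deg Esc enters \[Degree], interpreted as the symbol Degree *)
Sin[90 \[Degree]]                           (* 1 *)
```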
Inputting Special Chars
You can input a special character by:
- Use one of the graphical palettes.
- Press Escape, type the char's alias, then press Escape again.
- Copy the Unicode char from somewhere and paste it into a Wolfram notebook.
- Type its name like this: \[Name]
- Type its Unicode hexadecimal code like this: \:xxxx or \|xxxxxx
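To use the last method you need the code point in the right width: 4 hex digits for \:, 6 for \|. Below is a minimal sketch of that rule; escapeSyntax is a hypothetical helper, not a built-in, and it takes the code point as an integer (16^^6c34 is base-16 notation for 27700).

```wolfram
(* Hypothetical helper: build the \:hhhh or \|hhhhhh escape text for a code point *)
escapeSyntax[c_Integer?NonNegative] :=
  If[c <= 16^^ffff,
    "\\:" <> IntegerString[c, 16, 4],   (* fits in 4 hex digits *)
    "\\|" <> IntegerString[c, 16, 6]]   (* needs the 6-digit form *)

escapeSyntax[16^^6c34]     (* the text \:6c34 *)
escapeSyntax[16^^1f603]    (* the text \|01f603 *)

(* Parsing the escape inside a string literal gives back the character *)
ToExpression["\"" <> escapeSyntax[16^^6c34] <> "\""] === "水"   (* True *)
```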