Programing Language: Unicode Math Symbols in Function Name and Operator
Unicode Support in Function Name and Operator
The questions we want to ask are: does your favorite programing language
- Support literal Unicode characters in strings? (Almost all programing languages today (as of 2019) support Unicode in strings or source code.)
- Support literal Unicode characters in source code? (e.g. in comments.)
- Allow Unicode characters in variable or function names, e.g. φ = x and φ(x).
- Allow Unicode math symbol characters in identifiers (variable or function names), e.g. ⊕(v1, v2).
- Allow defining new operators, e.g. x +++ y.
- Allow defining new operators using Unicode symbols, e.g. M ⊗ v.
Here is a table showing support of Unicode in names and operators.
language | allow Unicode φ in names | allow math symbol ⊕ in names | allow new operator | allow math symbol ⊕ in operator |
---|---|---|---|---|
C | ✅ | ✅ | no | no |
C++ | ✅ | ✅ | no | no |
Go | ✅ | no | no | no |
Java | ✅ | no | no | no |
JavaScript | ✅ | no | no | no |
Python | ✅ | no | no | no |
Perl | ✅ | no | no | no |
Ruby | ✅ | ✅ | ? | no |
OCaml | no | no | ✅ | no |
Haskell | ✅ | no | ✅ | ✅ |
Emacs Lisp | ✅ | ✅ | no | no |
Julia | ✅ | ✅ | ✅ | ✅ |
Wolfram Lang | ✅ | ✅ | ✅ | ✅ |
Rust | ✅ | no | no | no |
Nim | ✅ | no | ? | ? |
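To make the "allow new operator" column concrete, here is a minimal Python sketch. As far as I know, Python's set of operator tokens is fixed by the grammar, so a class can only give meaning to existing operators (operator overloading), not introduce new ones. The Vec class is hypothetical, made up for this illustration, and the choice of ^ as a stand-in for ⊕ is arbitrary.

```python
class Vec:
    """Tiny 2D vector; a hypothetical class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __xor__(self, other):
        # Give the existing ^ operator a meaning for Vec: component-wise sum,
        # standing in for what one might want to write as v1 ⊕ v2.
        return Vec(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec({self.x}, {self.y})"


v = Vec(1, 2) ^ Vec(3, 4)
print(v)  # Vec(4, 6)

# A new operator token such as ⊕ is rejected by the parser
# before any code runs.
try:
    compile("v1 ⊕ v2", "<example>", "eval")
    new_operator_accepted = True
except SyntaxError as err:
    new_operator_accepted = False
    print("SyntaxError:", err.msg)
```

Languages with a ✅ in the last two columns let you skip the stand-in and write v1 ⊕ v2 directly.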
What Characters Are Unicode Letter
Many languages, e.g. JavaScript, Python, Go, and Java, do not allow math symbols 〔see Unicode: Math Symbols π² ∞ ∫〕 in names.
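One way to see which side of the letter/symbol line a character falls on is to look at its Unicode general category. The sketch below uses Python's standard unicodedata module on sample characters taken from this page. Categories starting with L are letters, Sm is math symbol, and So is other symbol. Note that each language has its own exact identifier rule, so this is only a rough guide, not a test of what any particular compiler accepts.

```python
import unicodedata

# Sample characters taken from this page.
samples = ["φ", "π", "杀", "⊕", "∞", "∫", "∑", "♥", "°"]

for ch in samples:
    cat = unicodedata.category(ch)  # two-letter general category, e.g. "Ll"
    kind = "letter" if cat.startswith("L") else "not a letter"
    print(f"U+{ord(ch):04X} {ch} {cat} {kind}")

# φ and π are Ll (lowercase letter), 杀 is Lo (other letter),
# ⊕ ∞ ∫ ∑ are Sm (math symbol), ♥ and ° are So (other symbol).
```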
Why Define Operators?
Python
Python 2
Python 2.x does not support Unicode characters in identifier names.
Python 2.x's Unicode support is not very good, but it does work for processing Unicode in strings.
Python 3
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
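The two points above can be checked from within Python 3 itself. This is a small sketch using the built-in str.isidentifier method, plus an exec call that is expected to fail on the math symbol. The sample names are arbitrary.

```python
names = ["φ", "αβ", "x1", "⊕", "t⊕", "°"]

for name in names:
    # True if the string is a valid Python identifier
    print(repr(name), name.isidentifier())

φ = 1.618  # a Greek letter works as a variable name
print(φ)

# A math symbol in a name is a syntax error.
try:
    exec("t⊕ = 4")
    math_symbol_ok = True
except SyntaxError as err:
    math_symbol_ok = False
    print("SyntaxError:", err.msg)
```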
JavaScript
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
Ruby
Ruby has had robust Unicode support since version 1.9 (2007).
- Unicode letters can be in identifier names.
- Math symbols and other non-ASCII symbols are also allowed in names (e.g. ⊕, or 🐞 in the example below).
    # -*- coding: utf-8 -*-
    # ruby

    🐞 = "🐞"

    def λ n
      n + "美"
    end

    p (λ 🐞) # "🐞美"
Perl
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
Emacs Lisp
- Unicode is allowed in function and variable names.
- Math symbols are allowed.
    ;; -*- coding: utf-8; lexical-binding: t; -*-
    ;; copy and paste this to a file

    (defun xx-🐞 ()
      "Inserts 🐞 愛 ☯ at cursor position."
      (interactive)
      (let (α)
        (setq α "🐞 愛 ☯")
        (insert α)))

    ;; then Alt-x eval-buffer
    ;; then Alt-x then type xx-🐞
    ;; press tab for completion
Java
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
C Lang
Allow math symbols in identifier names.
    /*
    2024-12-11
    testing unicode in identifier.
    author: steve, on xah lee discord.
    valid when compile it with gcc -std=c99 but -std=c89 fails.
    */

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        double π = 3.14159;
        printf("π = %f\n", π);

        const char *杀 = "xah";
        printf("杀 ≈ %s\n", 杀);

        const char *xah = "杀";
        printf("xah = %s\n", xah);

        const char* 😍 = "😍";
        printf("😍 = %s\n", 😍);
    }
Cpp
Allow math symbols in identifier names.
    // 2024-12-11
    // testing unicode in identifier.
    // author: steve, on xah lee discord.

    #include <print>

    auto main(int argc, char *argv[]) -> int {
        auto π = 3.14159;
        std::println("π ≈ {}", π);

        auto 杀 = "xah";
        std::println("杀 ≈ {}", 杀);

        auto xah = "杀";
        std::println("xah ≈ {}", xah);

        auto 😍 = "😍";
        std::println("😍 = {}", 😍);
    }
Golang
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
    package main

    import "fmt"

    func main() {
        var α = 3

        // var t⊕ = 4
        // syntax error
        // invalid identifier character U+2295 '⊕'

        fmt.Printf("%v\n", α)
    }
Rust
- Unicode letters can be in identifier names.
- Math symbols NOT allowed (e.g. ⊕°).
    // 2024-12-12
    // testing unicode char in function names
    // author: jamesni

    pub fn demo() {
        println!("Golden ratio {}", φ());
        //println!("1 ⊕ 0 = {}", f⊕(1, 0));
    }

    fn φ() -> f64 {
        let φ = 1.618;
        φ
    }

    //fn f⊕(a: isize, b: isize) -> isize {
    //    a ^ b
    //}
    // unknown start of token: \u{2295}

    // Rust has a list of overridable operators
    // see [https://doc.rust-lang.org/std/ops/index.html#traits]
OCaml
OCaml does not allow any non-ASCII characters in names.
OCaml can define operators, but operators can only be made of certain ASCII characters.
source https://caml.inria.fr/pub/docs/manual-ocaml/names.html#operator-name
Haskell
Haskell allows non-ASCII characters in identifier names, but they must be Unicode letters. Math symbols are not allowed.
Haskell can define operators, and an operator can be made of any Unicode characters in the symbol or punctuation categories.
Summary: identifier names must use Unicode letters, and operators must use Unicode symbols.
source https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-180002.4
Here is haskell user defined operators in action:
Julia
Julia allows Unicode math symbols in variable names and also allows defining operators with math symbols.
Wolfram Language
- Wolfram Language allows non-letter Unicode characters in identifiers, e.g. ♥ = 3.
- Wolfram Language allows defining the behavior of operators, but the operator name/character must be chosen from a given list, e.g. x ⊕ y.
Technically, Mathematica source code is ASCII. Characters in Unicode or in Mathematica's own set of math symbols are represented by markup, much like HTML entities. However, the Mathematica editor (the Front End) displays them rendered, and there is a robust system for the user to input math symbols.
Linden Scripting Language (Second Life)
Supports Unicode in variable names and function names.
    string a∑♥ = "variable with Unicode char in name";

    string t∑♥() {
        return "function with Unicode char in name";
    }

    default {
        state_entry() {
            llSay(0, "Hello, Avatar!");
        }

        touch_start(integer num_detected) {
            llSay(0, (string) t∑♥() + "; " + (string) a∑♥);
        }
    }
See:
Fortress
APL etc
APL is well known for its use of math symbols 〔see APL Symbols Meaning and Code Example〕, but I am not sure whether it allows Unicode symbols in identifiers or in user-defined operators.
For other programing languages' support of Unicode in names, see http://rosettacode.org/wiki/Unicode_variable_names Note: that page does not discuss whether math symbol characters can be used.
Why Use Unicode in Variable Names?
Languages and Unicode Support History
- Mathematica. v3 (1996). Major version with typesetting feature.
- Emacs Lisp (2009). Internal encoding is now superset of UTF-8. 〔see New Features in Emacs 23〕
- Linden Scripting Language (LSL)
- Perl 5.12 (2010). Major Unicode overhaul.
- Python 3 (2008). Major Unicode overhaul.
- Ruby 1.9 (2007). Major Unicode overhaul.
- Java 1.5 (2004). Java has always supported Unicode. Its char and string types are UTF-16 by spec. (Since Java 18, UTF-8 is the default charset, including for source files.)
- JavaScript. ECMA-262, 5.1 edition. Source code uses the Unicode character set, but no encoding is specified; the encoding depends on the host system (the web page or HTTP server). JavaScript strings are sequences of 16-bit units, not characters. 〔see JavaScript String is 16 Bits Byte Sequence〕
- Python 2. Does not support Unicode in identifier names.
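The JavaScript entry in the list above says strings are sequences of 16-bit units, not characters. A rough Python sketch of what that means: a character outside the Basic Multilingual Plane, such as 😍, needs two 16-bit units in UTF-16, so a language that counts units reports length 2 for it, while Python 3 counts code points. The helper name utf16_units is made up for this illustration.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one unit.
    return len(s.encode("utf-16-le")) // 2


for s in ["φ", "⊕", "😍"]:
    # columns: the string, its length in code points, its length in UTF-16 units
    print(s, len(s), utf16_units(s))
```

For φ and ⊕ both counts are 1. For 😍 the code point count is 1 and the UTF-16 unit count is 2.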
thanks to boostjam on perl
thanks to Hleb Valoshka on ruby.
https://plus.google.com/b/113859563190964307534/105354689506653311797/posts