Unicode in Function Name and Operator
The questions we want to ask are: does your favorite programming language
- Support literal Unicode characters in strings? (As of 2019, almost all programming languages support Unicode in strings or source code.)
- Support literal Unicode characters in source code, e.g. in comments?
- Allow Unicode letters in identifiers (variable or function names), e.g. φ = 3; φ(3)
- Allow non-letter Unicode characters in identifiers, e.g. ⊕(v1, v2)
- Allow defining new operators, e.g. x +++ y
- Allow defining new operators using Unicode symbols, e.g. M ⊗ v
Here is a table showing, for each language, support for Unicode in identifiers and for defining operators.
language | allow Unicode in identifier | allow math symbol in identifier | allow defining operators | allow math symbol in operator |
---|---|---|---|---|
C | no | no | no | no |
C++ | no | no | no | no |
Go | yes | no | no | no |
Java | yes | no | no | no |
JavaScript | yes | no | no | no |
Python | yes | no | yes | no |
Perl | yes | no | no | no |
Ruby | yes | yes | ? | no |
OCaml | no | no | yes | no |
Haskell | yes | no | yes | yes |
Emacs Lisp | yes | yes | no | no |
Julia | yes | yes | yes | yes |
Wolfram Lang | yes | yes | yes | yes |
Rust | ? | ? | ? | ? |
Nim | yes | no | ? | ? |
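A note on the Python row: as far as I know, Python lets a class give meaning to existing operator tokens (such as + or @) through special methods, but has no way to add a new token such as +++. The sketch below illustrates that reading of the table. The Vec class is hypothetical and exists only for this example.

```python
class Vec:
    """A tiny 2D vector, used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Gives meaning to the existing "+" operator for Vec.
        return Vec(self.x + other.x, self.y + other.y)

    def __matmul__(self, other):
        # Reuses the existing "@" operator as a dot product.
        return self.x * other.x + self.y * other.y


a = Vec(1, 2)
b = Vec(3, 4)
c = a + b
print(c.x, c.y)  # 4 6
print(a @ b)     # 11

# "1 +++ 2" is not a new operator: it parses as 1 + (+(+2)).
print(1 +++ 2)   # 3
```

So "define operator" for Python here means overloading a fixed set of operator tokens, not introducing new ones.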
What Characters Are Unicode Letters
Many languages (e.g. JavaScript, Python, Go, Java) do not allow non-letter characters in identifier names. Math symbols are one example: see Unicode: Math Symbols ∑ ∫ π² ∞.
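One way to see which characters count as letters is to look at each character's Unicode general category. The sketch below uses Python's standard unicodedata module. The helper name is_letter and the sample characters are chosen here for illustration only.

```python
import unicodedata


def is_letter(ch):
    """Return True if ch has a Unicode general category starting with 'L'."""
    return unicodedata.category(ch).startswith("L")


# Sample characters: Greek and CJK letters first, then symbols.
for ch in "αλ愛♥⊕∑":
    print(ch, unicodedata.category(ch), unicodedata.name(ch), is_letter(ch))
```

Letters report a category beginning with L (such as Ll or Lo), while the heart and the math symbols report categories beginning with S.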
Why Define Operators?
Programing Language Design: Why You Need Operators
Golang
An identifier must start with a letter (a Unicode letter or underscore), followed by letters or digits.
Unicode math symbols (non-letters, e.g. ♥ ⊕) are NOT allowed.
package main

import "fmt"

func main() {
	var α = 3
	// var t⊕ = 4
	// syntax error:
	// invalid identifier character U+2295 '⊕'
	fmt.Printf("%v\n", α)
}
Python 2
Python 2.x does not support Unicode characters in variable or function names.
Python 2.x's Unicode support is not very good, but it does work for processing Unicode in strings.
See: Python: Unicode Tutorial 🐍.
Python 3
Python 3 supports Unicode in variable names and function names.
Unicode math symbols (non-letters, e.g. ♥ ⊕) are NOT allowed.
# python 3
def φ(n):
    return n + 1

α = 4
print(φ(α))  # prints 5

# python 3
# ♥ = 4
# error:
#     ♥ = 4
#     ^
# SyntaxError: invalid character in identifier
Detail at: Python: Unicode Tutorial 🐍.
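Whether a given string is acceptable as a Python 3 name can also be checked at run time with the built-in str.isidentifier() method. This is a minimal sketch, and the candidate names are arbitrary examples.

```python
import keyword

candidates = ["α", "φ", "t1", "♥", "t⊕", "1x"]

for name in candidates:
    # isidentifier() applies the language's identifier rules;
    # keyword.iskeyword() rules out reserved words such as "def".
    ok = name.isidentifier() and not keyword.iskeyword(name)
    print(repr(name), "valid" if ok else "not valid")

# Keep the raw isidentifier() results for inspection.
results = {name: name.isidentifier() for name in candidates}
```

The names made of letters and digits pass, while the ones containing ♥ or ⊕, and the one starting with a digit, do not.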
JavaScript
JavaScript supports Unicode in variable names and function names.
JavaScript identifiers (variable or function names) must begin with a letter, underscore _, or dollar sign $. Here “letter” and “digit” include non-ASCII Unicode characters that are letters or digits.
Unicode math symbols (non-letters, e.g. ♥ ⊕) are NOT allowed.
// -*- coding: utf-8 -*-
var 愛 = "♥";
function λ(n) { return n + "♥"; }
alert(λ(愛));
All major browsers support it.
For detail, see:
- JavaScript: Allowed Characters in Identifier
- JavaScript: What's Default Charset, Encoding? How to Escape Unicode Character?
- JavaScript String is 16 Bits Byte Sequence
Ruby
Ruby has had robust Unicode support since version 1.9 (2007).
Unicode can be used in variable or function names.
Non-letter Unicode math symbols are allowed.
# -*- coding: utf-8 -*-
# ruby
♥ = "♥"
def λ n
  n + "美"
end
p (λ ♥) == "♥美" # true
Perl
Perl has had good Unicode support since about 2010. Unicode can be used in variable or function names.
# -*- coding: utf-8 -*-
# perl 5.14
use strict;
use utf8; # necessary if you want to use Unicode in function or variable names
binmode(STDOUT, ':encoding(UTF-8)'); # avoids "Wide character in print" warnings

# string with unicode char
my $s = 'I ♠ you';
$s =~ s/♠/♥/;
print "$s\n";

# var with Unicode char
my $β = 4;
print "$β\n";

# function with Unicode char
sub λ { return 2;}
print λ();
Unicode math symbols (non-letters, e.g. ♥ ⊕) are NOT allowed.
# -*- coding: utf-8 -*-
# perl v5.32.1
# 2022-12-28
use strict;
use utf8;

# version string
print $^V;
# sample output: v5.32.1

# identifier cannot be arbitrary unicode char
# my $😂 = 3;
# error
# Unrecognized character \x{1f602}

# identifier cannot be arbitrary unicode char
# my $♥ = 3;
# error
# Unrecognized character \x{2665}
The exact rule is complicated, but basically, if the character is considered a letter, it is allowed. The heart ♥ and the summation sign ∑ are not letters.
[see Perl: Unicode Tutorial 🐪]
Java
Java supports Unicode fully, including in variable, class, and method names.
Unicode math symbols (non-letters, e.g. ♥ ⊕) are NOT allowed.
class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println( x1.北 );
        System.out.println( x1.π );
    }
}
Detail: Java Tutorial: Unicode in Java.
Emacs Lisp
For text processing, the most beautiful language with respect to Unicode is Emacs Lisp. In elisp, you don't have to declare any of the Unicode or encoding stuff. You simply write code to process strings or files, without even having to know what the encoding is. Emacs the environment takes care of all that.
Emacs Lisp allows Unicode in variable and function names.
Non-letter Unicode math symbols are allowed.
(defun ♥ ()
  "Inserts stuff"
  (interactive)
  (let ((α "♥ 愛 ☯"))
    (insert α)))
(To try the above in Emacs: paste it into an empty file, select it, then press Alt+x eval-region to make Emacs evaluate it. Now you can press Alt+x, then type ♥ (just copy and paste), and it will insert “♥ 愛 ☯”.)
(See: Emacs and Unicode Tips • Emacs Lisp Basics.)
OCaml
OCaml does not allow non-ASCII characters in names.
OCaml can define operators, but an operator may consist only of certain ASCII characters.

source https://caml.inria.fr/pub/docs/manual-ocaml/names.html#operator-name


Haskell
Haskell allows non-ASCII characters in variable or function names, but they must be Unicode letters. Math symbols are not allowed.
Haskell can define operators, and an operator can use any Unicode character in the symbol or punctuation categories.
Summary: variable or function names must be made of Unicode letters, and operators must be made of Unicode symbols.

source https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-180002.4
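The summary above can be approximated with Unicode general categories. The following is only a rough Python sketch of the rule as stated here, not the Haskell lexer: it ignores reserved characters such as parentheses and the special treatment of the underscore and quote characters. The function name haskell_role is made up for illustration.

```python
import unicodedata


def haskell_role(ch):
    """Roughly classify ch as usable in a name, in an operator, or neither."""
    cat = unicodedata.category(ch)
    if cat[0] == "L" or cat == "Nd":
        return "name"      # letters and decimal digits
    if cat[0] in "SP":
        return "operator"  # symbols and punctuation
    return "neither"


for ch in "xλ7+⊕∑♥ ":
    print(repr(ch), unicodedata.category(ch), haskell_role(ch))
```

Under this approximation, λ lands on the name side, while ⊕, ∑, and ♥ land on the operator side.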

Here are Haskell user-defined operators in action:

Julia
Julia allows Unicode math symbols in variable names and also allows defining operators with math symbols.

Wolfram Language
- Wolfram Language allows non-letter Unicode in identifiers, e.g. ♥ = 3
- Wolfram Language allows defining the behavior of operators, but the operator name/character must be chosen from a given list, e.g. x ⊕ y

Technically, Mathematica source code is ASCII. Characters in Unicode or in Mathematica's own set of math symbols are represented by markup, much like HTML entities. However, the Mathematica editor (the Front End) displays them rendered, and there is a robust system for the user to input math symbols.
- For detail on how Mathematica deals with Unicode, see: Wolfram Language and Unicode.
- For example notebook with Unicode, see: Math Typesetting, Mathematica, MathML.
Linden Scripting Language (Second Life)
Linden Scripting Language supports Unicode in function and variable names.
string a∑♥ = "variable with Unicode char in name";

string t∑♥() {
    return "function with Unicode char in name";
}

default {
    state_entry() {
        llSay(0, "Hello, Avatar!");
    }
    touch_start(integer num_detected) {
        llSay(0, (string) t∑♥() + "; " + (string) a∑♥);
    }
}
APL, Fortress ...
APL is well known for its use of math symbols [see APL Symbols Meaning and Code Example], but I am not sure whether it allows Unicode symbols in identifiers or in user-defined operators.
For other programming languages' support of Unicode in names, see http://rosettacode.org/wiki/Unicode_variable_names Note: that page does not discuss whether math symbol characters can be used.
Why Use Unicode in Variable Names?
See: Variable Naming: English Words Considered Harmful .
Languages and Unicode Support History
- Mathematica. v3 (1996). Major version with typesetting feature.
- Emacs Lisp (2009). Internal encoding is now superset of UTF-8. [see New Features in Emacs 23]
- Linden Scripting Language (LSL)
- Perl 5.12 (2010). Major Unicode overhaul.
- Python 3 (2008). Major Unicode overhaul.
- Ruby 1.9 (2007). Major Unicode overhaul.
- Java 1.5 (2004). Java has always supported Unicode. The language spec defines program text as Unicode and represents text as UTF-16 code units; the encoding of source files on disk is left to the compiler.
- JavaScript. ECMA-262, 5.1 edition. Source code is in the Unicode character set, but no encoding is specified; the encoding depends on the host system (the web page or HTTP server). JavaScript strings are sequences of 16-bit units, not characters. [see JavaScript String is 16 Bits Byte Sequence]
- Python 2. Does not support Unicode in variable or function names.
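The remark about JavaScript strings being sequences of 16-bit units can be illustrated without a JavaScript engine. The Python sketch below counts UTF-16 code units, which is the quantity JavaScript's length property reports for a string. The helper name utf16_units is made up here.

```python
def utf16_units(s):
    """Number of 16-bit code units needed to encode s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


for s in ["a", "λ", "♥", "😂"]:
    # len(s) counts code points; utf16_units(s) counts 16-bit units.
    print(s, len(s), utf16_units(s))
```

Characters outside the Basic Multilingual Plane, such as 😂 (U+1F602), take two 16-bit units even though they are a single code point.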
Thanks to boostjam on Perl.
Thanks to Hleb Valoshka on Ruby.
[https://plus.google.com/b/113859563190964307534/105354689506653311797/posts]