Unicode in Function Names and Operator Symbol

By Xah Lee. Date: . Last updated: .

The questions we want to ask are: does your favorite programming language

  1. Support literal Unicode in strings?
  2. Support literal Unicode in source code?
  3. Allow Unicode in identifiers (variable or function names), e.g. φ = 3, Γ(z)?
  4. Allow non-letter Unicode in identifiers, e.g. ∑(3,4,5), vector⊕(v1,v2)?
  5. Allow defining new operators, e.g. x +++ y?
  6. Allow defining new operators using Unicode symbols, e.g. v1 ⊕ v2, M ⊗ v?

Basically, almost all programming languages today (as of 2019) support Unicode in strings and in source code.

Here's a table showing support for Unicode in identifiers, and for defining operators.

Unicode in Identifier and Operator

language      allow unicode in name   allow math symbol   allow define operator   allow math symbol in operator
C             no                      no                  no                      no
C++           no                      no                  no                      no
Go            yes                     no                  no                      no
Java          yes                     no                  no                      no
JavaScript    yes                     no                  no                      no
Python        yes                     no                  yes                     no
Perl          yes                     no                  no                      no
Ruby          yes                     yes                 ?                       no
ocaml         no                      no                  yes                     no
haskell       yes                     no                  yes                     yes
Emacs Lisp    yes                     yes                 no                      no
Julia         yes                     yes                 yes                     yes
Mathematica   yes                     yes                 yes                     yes

JavaScript, Python, Go, and Java do NOT allow non-letter math symbols, such as ⊕ or ∑, in identifiers.

[see Unicode Math Symbols ∑ ∫ π² ∞]

This means, you can't define a function with those characters.
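
For Python 3, the rule can be checked without writing a whole program. Here's a minimal sketch using `str.isidentifier()` from the standard library. The sample names come from the list of questions at the top of this page. The helper name `check` is made up for this example.

```python
# Check which names Python 3 accepts as identifiers.
# Characters are written as escapes so the code points are unambiguous.
samples = {
    "\u03c6": "GREEK SMALL LETTER PHI",              # φ, a letter
    "\u0393": "GREEK CAPITAL LETTER GAMMA",          # Γ, a letter
    "\u2211": "N-ARY SUMMATION",                     # ∑, a math symbol
    "vector\u2295": "name ending in CIRCLED PLUS",   # vector⊕
}

def check(name):
    """Return True if Python 3 accepts name as an identifier."""
    return name.isidentifier()

for name, description in samples.items():
    print(name, description, check(name))
```

The two letters pass. The two names containing a math symbol do not.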

Why Define Operators?

In scientific programing, you have to deal with math formulas often.

Math formulas written in traditional programming language notation are hard to read.

Here's a quadratic formula in normal math notation:

[image: quadratic formula, x = \frac{-b + \sqrt{b^2 - 4ac}}{2a}]

Here's functional notation in programing languages:

assign(x, div( plus(-b, sqrt( plus( power(b, 2), neg(times(4, a, c))))), times(2, a)))

If symbols are allowed in function names:

=(x, /( +(-b, sqrt( +( ^(b, 2), -(*(4, a, c))))), *(2, a)))

Lisp notation:

(set x (div (plus (neg b) (sqrt (plus (power b 2) (neg (times 4 a c))))) (times 2 a)))

If symbols are allowed:

(= x (/ (+ (- b) (√ (+ (^ b 2) (- (* 4 a c))))) (* 2 a)))

If defining operators is allowed, we have the normal infix notation:

x = (-b + √(b^2 - 4 * a * c) ) / (2 * a)
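
To see that the functional and infix notations above say the same thing, here's a small sketch in Python. The function names `root_functional` and `root_infix` are made up for this example. The standard `operator` module supplies function forms of the arithmetic operators. Its `mul` takes two arguments, so `times(4, a, c)` becomes two nested calls.

```python
import math
from operator import add, mul, neg, pow, truediv  # pow here shadows the builtin

def root_functional(a, b, c):
    """Quadratic root written as nested function calls."""
    discriminant = add(pow(b, 2), neg(mul(mul(4, a), c)))
    return truediv(add(neg(b), math.sqrt(discriminant)), mul(2, a))

def root_infix(a, b, c):
    """The same root written with infix operators."""
    return (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)

# x^2 - 3x + 2 has roots 1 and 2; both forms return the larger one
print(root_functional(1, -3, 2))
print(root_infix(1, -3, 2))
```

Both functions compute the same value. Only the readability differs.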

In math, the operands may not be numbers. So you need to define your own operators, such as for matrix multiplication or vector addition.
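
Python cannot define a brand new operator such as ⊕, but a class can give its own meaning to the existing ones. Here's a minimal sketch of vector addition with `+` and matrix times vector with `@`. The class names `Vec2` and `Mat2` and their fields are made up for this example, and only the two operations are implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other):
        # vector addition: v1 + v2
        return Vec2(self.x + other.x, self.y + other.y)

@dataclass(frozen=True)
class Mat2:
    # 2x2 matrix, row by row: [[a, b], [c, d]]
    a: float
    b: float
    c: float
    d: float

    def __matmul__(self, v):
        # matrix times vector: M @ v
        return Vec2(self.a * v.x + self.b * v.y,
                    self.c * v.x + self.d * v.y)

rot90 = Mat2(0, -1, 1, 0)  # rotate 90 degrees counterclockwise
# @ binds tighter than +, so this is (rot90 @ v) + offset
print(rot90 @ Vec2(1, 0) + Vec2(10, 10))
```

This is operator overloading, not operator definition. The symbols and their precedence are fixed by the language.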

Real world example in JavaScript, not allowing math symbols in names:

const svg_ellipse_arc = (([cx,cy],[rx,ry], [θ, Δ], φ ) => {
Δ = Δ % (2*π);
const rotMatrix = rotate_matrix (φ);
const [sX, sY] = vec_add ( matrix_times ( rotMatrix, [rx * cos(θ),ry * sin(θ)] ), [cx,cy] ) ;
const [eX, eY] = vec_add ( matrix_times ( rotMatrix, [rx * cos(θ+Δ),ry * sin(θ+Δ)] ), [cx,cy] ) ;
const fA = ( ( Math.abs (Δ) > π ) ? 1 : 0 );
const fS = ( ( Δ > 0 ) ? 1 : 0 );
const path = document.createElementNS("http://www.w3.org/2000/svg", "path");
path.setAttribute("d", "M " + sX + " " + sY + " A " + [ rx , ry , φ/π*180 , fA, fS, eX, eY ].join(" "));
return path;
});

Now, if we allow math symbols in names, we have:

const svg_ellipse_arc = (([cx,cy],[rx,ry], [θ, Δ], φ ) => {
Δ = Δ % (2*π);
const rotMatrix = rotate_matrix (φ);
const [sX, sY] = ⊕( ⊗( rotMatrix, [rx * cos(θ),ry * sin(θ)] ), [cx,cy] ) ;
const [eX, eY] = ⊕( ⊗( rotMatrix, [rx * cos(θ+Δ),ry * sin(θ+Δ)] ), [cx,cy] ) ;
const fA = ( ( Math.abs (Δ) > π ) ? 1 : 0 );
const fS = ( ( Δ > 0 ) ? 1 : 0 );
const path = document.createElementNS("http://www.w3.org/2000/svg", "path");
path.setAttribute("d", "M " + sX + " " + sY + " A " + [ rx , ry , φ/π*180 , fA, fS, eX, eY ].join(" "));
return path;
});

Now, if we can define operators with math symbols in names, we have:

const svg_ellipse_arc = (([cx,cy],[rx,ry], [θ, Δ], φ ) => {
Δ = Δ % (2*π);
const rotMatrix = rotate_matrix (φ);
const [sX, sY] = rotMatrix ⊗ [rx * cos(θ),ry * sin(θ)] ⊕ [cx,cy];
const [eX, eY] = rotMatrix ⊗ [rx * cos(θ+Δ),ry * sin(θ+Δ)] ⊕ [cx,cy];
const fA = ( ( Math.abs (Δ) > π ) ? 1 : 0 );
const fS = ( ( Δ > 0 ) ? 1 : 0 );
const path = document.createElementNS("http://www.w3.org/2000/svg", "path");
path.setAttribute("d", "M " + sX + " " + sY + " A " + [ rx , ry , φ/π*180 , fA, fS, eX, eY ].join(" "));
return path;
});

Golang

Identifier must start with a letter, followed by letter or digit.

Unicode math symbol (non-letter e.g. ♥ ⊕) NOT allowed.

package main

import "fmt"

func main() {

	var α = 3

	// syntax error
	var t⊕ = 4
	// invalid identifier character U+2295 '⊕'

	fmt.Printf("%v\n", α)
}
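
The identifier rule stated at the top of this section can be written down as a check on Unicode general categories. Here's a rough sketch in Python (not Go), assuming the Go spec's definitions: a letter is any character in a Unicode letter category, plus underscore, and a digit is any character in category Nd. The function names are made up for this example, and keywords are not handled.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}  # Unicode letter categories

def is_letter(ch):
    # the Go spec also counts underscore as a letter
    return ch == "_" or unicodedata.category(ch) in LETTER_CATEGORIES

def is_digit(ch):
    return unicodedata.category(ch) == "Nd"  # decimal digit

def looks_like_go_identifier(name):
    """Rough check: one letter, then any number of letters or digits."""
    if not name or not is_letter(name[0]):
        return False
    return all(is_letter(ch) or is_digit(ch) for ch in name[1:])

for name in ["\u03b1", "t\u2295", "x1", "1x"]:  # α, t⊕, x1, 1x
    print(repr(name), looks_like_go_identifier(name))
```

The name t⊕ fails because ⊕ is in category Sm (math symbol), which is neither a letter nor a digit.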

Python 2

Python 2.x does not support Unicode char for variable or function names.

Python 2.x's Unicode support is not very good, but it does work for processing Unicode in strings.

See: Python: Unicode Tutorial 🐍.

Python 3

Python 3 supports Unicode in variable names and function names.

Unicode math symbol (non-letter e.g. ♥ ⊕) NOT allowed.

# python 3

def ƒ(n):
    return n+1

α = 4
print(ƒ(α)) # prints 5

# python 3

♥ = 4

#     ♥ = 4
#     ^
# SyntaxError: invalid character in identifier

Detail at: Python: Unicode Tutorial 🐍.

JavaScript

JavaScript supports Unicode in variable name and function name.

JavaScript identifiers (variable or function names) must begin with a letter, underscore _, or a dollar sign $. The “letter” or “digit” includes non-ASCII Unicode that are also letters or digits.

Unicode math symbol (non-letter e.g. ♥ ⊕) NOT allowed.

// -*- coding: utf-8 -*-

var 愛 = "♥";

function λ(n) {return n + "♥";}

alert( λ(愛) );

As of , all browsers support it.


Ruby

Ruby has robust support of Unicode, starting with version 1.9 (2007).

Unicode can be in variable or function names.

Non-letter unicode math symbol allowed.

# -*- coding: utf-8 -*-
# ruby

♥ = "♥"

def λ n
  n + "美"
end

p (λ ♥) == "♥美" # true

[Ruby: Unicode Tutorial 💎]

Perl

Perl, since about 2010, has good Unicode support. Unicode can be in variable or function names.

# -*- coding: utf-8 -*-
# perl 5.14

use strict;
use utf8; # necessary if you want to use Unicode in function or variable names

# string with unicode char
my $s = 'I ★ you';
$s =~ s/★/♥/;
print "$s\n";

# var with Unicode char
my $β = 4;
print "$β\n";

# function with Unicode char
sub λ { return 2;}
print λ();

Unicode math symbol (non-letter e.g. ♥ ⊕) NOT allowed.

# -*- coding: utf-8 -*-
# perl v5.18.2

use strict;
use utf8;

# identifier cannot be arbitrary unicode char

my $😂 = 3;

# error
# Unrecognized character \x{1f602}

# -*- coding: utf-8 -*-
# perl v5.18.2

use strict;
use utf8;

# identifier cannot be arbitrary unicode char

my $♥ = 3;

# error
# Unrecognized character \x{2665}

The exact rule is complicated. But basically, if the Unicode character is considered a letter, then it's ok. Heart ♥, or the Summation sign ∑, are not letters.

[see Perl: Unicode Tutorial 🐪]

Java

Java supports Unicode fully, including use in variable/class/method names.

Unicode math symbol (non-letter e.g. ♥ ⊕) NOT allowed.

// the class name 方 is illustrative; any Unicode letter works
class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println( x1.北 );
        System.out.println( x1.π );
    }
}

Detail: Java Tutorial: Unicode in Java.

Emacs Lisp

For text processing, the most beautiful lang with respect to Unicode is emacs lisp. In elisp, you don't have to declare any of the Unicode or encoding stuff. You simply write code to process strings or files, without even having to know what encoding they are in. Emacs the environment takes care of all that.

Emacs Lisp allows Unicode in var/function names.

Non-letter unicode math symbol allowed.

(defun ♥ ()
  "Inserts stuff"
  (interactive)
  (let ((α "♥ 愛 ☯"))
    (insert α)))

(To try the above in emacs: paste it into an empty file, select it, then press 【Alt+x】 and run eval-region to make emacs eval it. Now, you can press 【Alt+x】 then type ♥ (just copy and paste), and it'll insert “♥ 愛 ☯”.) (See: Emacs and Unicode Tips; Emacs Lisp Basics.)

Ocaml

Ocaml does not allow any non-ascii character in names.

Ocaml can define operators, but operators can only be some ASCII chars.

[image: ocaml identifier allowed chars]

source https://caml.inria.fr/pub/docs/manual-ocaml/names.html#operator-name

[image: ocaml operator allowed chars]
[image: ocaml operator allowed chars, continued]

Haskell

Haskell allows non-ASCII chars in variable or function names, but they must be Unicode letter chars. Math symbols are not allowed.

Haskell can define operator, and the operator can be any unicode character in the category of symbol or punctuation.

Summary: variable or function names must be unicode letter, and operator must be unicode symbol.
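
The letter versus symbol split in this summary maps onto Unicode general categories. Here's a rough sketch in Python (not Haskell) that sorts a single character into "name", "operator", or "neither". It is an approximation, not the Haskell lexer: the function name `haskell_role` is made up, digits are ignored, and the list of reserved punctuation is an assumption based on the Haskell 2010 report.

```python
import unicodedata

# reserved punctuation, never part of an operator (assumed from the report)
SPECIAL = set("(),;[]`{}")

def haskell_role(ch):
    """Roughly classify one character: 'name', 'operator', or 'neither'."""
    category = unicodedata.category(ch)
    if category.startswith("L") or ch in ("_", "'"):
        return "name"        # letters, plus underscore and apostrophe
    if ch in SPECIAL or ch == '"':
        return "neither"     # reserved by the syntax
    if category[0] in "SP":
        return "operator"    # any other symbol or punctuation
    return "neither"

for ch in ["\u03bb", "\u2295", "\u2603", "+", "("]:  # λ ⊕ ☃ + (
    print(ch, unicodedata.category(ch), haskell_role(ch))
```

The snowman ☃ is in category So (other symbol), so it lands on the operator side.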

[image: haskell lex]

source https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-180002.4

[image: haskell operator unicode]

Here's haskell user defined operators in action:

[image: Haskell snowman and mountain operators. image source twitter Iceland_jack]

Julia

Julia allows Unicode math symbols in variable names, and also allows defining operators with math symbols.

[screenshot: Julia allowed variable names]
[https://docs.julialang.org/en/latest/manual/variables/#Allowed-Variable-Names-1]

Wolfram Language

Mathematica syntax StandardForm screenshot
Mathematica notebook

Technically, Mathematica source code is ASCII. Characters in Unicode or Mathematica's own set of math symbols are represented by markup, much like HTML entities. However, the Mathematica editor (the Front End) displays them rendered, and there's a robust system for the user to input math symbols.

Linden Scripting Language (Second Life)

Linden Scripting Language supports Unicode in function or variable names.

string a∑♥ = "variable with Unicode char in name";

string t∑♥() { return "function with Unicode char in name";}

default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer num_detected)
    {
        llSay(0, (string) t∑♥() + "; " + (string) a∑♥);
    }
}


APL, Fortress ...

APL is well-known for its use of math symbols. [see APL Symbols Meaning and Code Example] But I am not sure if it allows Unicode symbols in identifiers or in user-defined operators.

For other programming languages' support of Unicode in names, see http://rosettacode.org/wiki/Unicode_variable_names Note: that page does not discuss whether math symbol characters can be used.

Why Use Unicode in Variable Names?

See: Programing Style: Variable Naming: English Words Considered Harmful.

Languages and Unicode Support History

thanks to boostjam on perl

thanks to Hleb Valoshka on ruby. [https://plus.google.com/b/113859563190964307534/105354689506653311797/posts]
