Unicode in Ruby, Perl, Python, JavaScript, Java, Emacs Lisp, Mathematica

By Xah Lee.

This page shows which languages support Unicode symbols in variable names (for example, α = 5), in function names (for example, φ(4)), or in user-defined operators such as x ⊕ y.

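The questions we want to ask of each language are: Can Unicode characters be used in variable and function names? Are non-letter characters (such as ♥) allowed?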

Ruby

Ruby has robust support for Unicode, starting with version 1.9 (2007). Unicode can be used in variable and function names.

Non-letter Unicode is allowed.

# -*- coding: utf-8 -*-
# ruby

♥ = "♥"

def λ n
  n + "美"
end

p (λ ♥) == "♥美" # true

[Ruby: Unicode Tutorial 💎]

Perl

Perl, since about 2010, has good Unicode support. Unicode can be used in variable and function names.

# -*- coding: utf-8 -*-
# perl 5.14

use strict;
use utf8; # necessary if you want to use Unicode in function or variable names

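# make print emit UTF-8 (avoids "Wide character" warnings)
binmode(STDOUT, ':encoding(UTF-8)');

# string with unicode char
my $s = 'I ★ you';
$s =~ s/★/♥/; # replace the star with a heart
print "$s\n";

# var with Unicode char
my $β = 4;
print "$β\n";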

# function with Unicode char
sub λ { return 2;}
print λ();

Non-letter Unicode is not allowed.

# -*- coding: utf-8 -*-
# perl v5.18.2

use strict;
use utf8;

# identifier cannot be arbitrary unicode char

my $😂 = 3;

# error
# Unrecognized character \x{1f602}

# -*- coding: utf-8 -*-
# perl v5.18.2

use strict;
use utf8;

# identifier cannot be arbitrary unicode char

my $♥ = 3;

# error
# Unrecognized character \x{2665}

The exact rule is complicated, but basically, if the character is considered a letter in Unicode, it is allowed. The heart ♥ and the summation sign ∑ are not letters.
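
To see which characters count as letters, one can look up each character's Unicode general category. The following is a minimal sketch in Python (not Perl), using the standard `unicodedata` module. Categories starting with `L` are letters, while `So` and `Sm` are symbols. The helper name `is_letter` is made up for illustration, and Perl's actual identifier rule is more involved than this single check.

```python
import unicodedata

def is_letter(ch):
    """Return True if the character's Unicode general category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")

# Characters used in the Perl examples on this page.
for ch in "βλ♥∑😂":
    # Print the character, its general category, its official name,
    # and whether it falls in a letter category.
    print(ch, unicodedata.category(ch), unicodedata.name(ch), is_letter(ch))
```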

[see Perl: Unicode Tutorial 🐪]

Python 2

Python 2.x does not support Unicode characters in variable or function names.

Python 2.x's Unicode support is not very good, but it does work for processing Unicode in strings.

See: Python: Unicode Tutorial 🐍.

Python 3

Python 3 supports Unicode in variable names and function names.

# -*- coding: utf-8 -*-          ← optional, but still good to have
# python 3

def ƒ(n):
    return n+1

α = 4
print(ƒ(α)) # prints 5

Non-letter Unicode is not allowed.

# -*- coding: utf-8 -*-
# python 3

♥ = 4

#     ♥ = 4
#     ^
# SyntaxError: invalid character in identifier
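
A quick way to check a name without triggering a SyntaxError is Python 3's built-in `str.isidentifier()` method. This is a minimal sketch. The list `candidates` is just a sample built from names used on this page.

```python
# python 3
# Sample names: the first four use letters only, the rest contain symbols.
candidates = ["α", "ƒ", "λ", "愛", "♥", "∑", "a♥"]

for name in candidates:
    # str.isidentifier() reports whether the string is a valid identifier
    # according to the Python 3 language definition.
    verdict = "ok" if name.isidentifier() else "not allowed"
    print(name, verdict)
```

Note that `isidentifier()` also returns True for reserved words such as `def`, so it answers only the character question, not whether the name is free to use.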

Detail at: Python: Unicode Tutorial 🐍.

JavaScript

JavaScript supports Unicode in variable names and function names.

JavaScript identifiers (variable or function names) must begin with a letter, underscore _, or dollar sign $; subsequent characters may also be digits. Here, “letter” and “digit” include non-ASCII Unicode characters that are letters or digits.

Non-letter Unicode is not allowed.

// -*- coding: utf-8 -*-

var 愛 = "♥";

function λ(n) {return n + "♥";}

alert( λ(愛) );

All modern browsers support it.
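
The first-character versus later-character rule for JavaScript identifiers can be approximated with Unicode general categories. The sketch below is in Python, not JavaScript, and the function name `looks_like_js_identifier` is made up for illustration. It accepts letter categories (plus _ and $) as the first character, and additionally digits and combining marks afterwards. The actual ECMAScript rule is defined by the `ID_Start` and `ID_Continue` properties, and reserved words are ignored here, so treat this as an approximation only.

```python
import unicodedata

# General categories accepted for the first character (letters, letter numbers).
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Later characters may also be marks, decimal digits, connector punctuation.
CONTINUE = START | {"Mn", "Mc", "Nd", "Pc"}

def looks_like_js_identifier(name):
    """Approximate check: could this string be a JavaScript identifier?"""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # The first character must be a letter, underscore, or dollar sign.
    if first not in "_$" and unicodedata.category(first) not in START:
        return False
    # Every following character must be a letter, digit, mark, _ or $.
    return all(c in "_$" or unicodedata.category(c) in CONTINUE for c in rest)

for name in ["愛", "λ", "$x1", "1x", "♥"]:
    print(name, looks_like_js_identifier(name))
```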


Emacs Lisp

For text processing, the most beautiful language with respect to Unicode is Emacs Lisp. In elisp, you don't have to declare any of the Unicode or encoding stuff. You simply write code to process strings or files, without even having to know what encoding they are in. Emacs the environment takes care of all that.

Emacs Lisp also supports Unicode in variable and function names.

Non-letter Unicode is allowed.

(defun ♥ ()
  "Inserts stuff"
  (interactive)
  (let ((α "♥ 愛 ☯"))
    (insert α)))

(To try the above in Emacs: paste it into an empty file, select it, then press 【Alt+x】 and type eval-region to make Emacs evaluate it. Now you can press 【Alt+x】, then type ♥ (just copy and paste), and it'll insert “♥ 愛 ☯”.) (See: Emacs and Unicode Tips; Emacs Lisp Basics.)

Java

Java supports Unicode fully, including use in variable/class/method names.

Non-letter Unicode is not allowed.

class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println( x1.北 );
        System.out.println( x1.π );
    }
}

Detail: Java Tutorial: Unicode in Java.

Wolfram Language

[Screenshot: Mathematica notebook, showing syntax in StandardForm]

Technically, Mathematica source code is ASCII. Characters in Unicode or in Mathematica's own set of math symbols are represented by markup, much like HTML entities (for example, α is written \[Alpha]). However, the Mathematica editor (the Front End) displays them rendered, and there's a robust system for the user to input math symbols.
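
As an illustration of the ASCII-only representation, the sketch below (in Python, not Wolfram Language) rewrites non-ASCII characters as `\:xxxx` hexadecimal escapes, one of the escape forms Wolfram Language accepts alongside named characters such as `\[Alpha]`. The function name `to_wl_ascii` is hypothetical, and only characters up to U+FFFF are handled. This is a sketch under those assumptions, not a description of how the Front End itself saves files.

```python
def to_wl_ascii(text):
    """Rewrite non-ASCII characters as \\:xxxx escapes (4 hex digits)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Four-hex-digit escape form, e.g. \:03b1 for α.
            out.append("\\:%04x" % cp)
        else:
            # Characters beyond U+FFFF are outside the scope of this sketch.
            raise ValueError("character beyond U+FFFF not handled: %r" % ch)
    return "".join(out)

print(to_wl_ascii("α = 5"))
```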

Linden Scripting Language (Second Life)

Linden Scripting Language supports Unicode in function or variable names.

string a∑♥ = "variable with Unicode char in name";

string t∑♥() { return "function with Unicode char in name";}

default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer num_detected)
    {
        llSay(0, (string) t∑♥() + "; " + (string) a∑♥);
    }
}


APL, Julia, Fortress ...

APL is well-known for its use of math symbols. [see APL Symbols Meaning and Code Example]

Julia allows Unicode in variable names and encourages it. http://docs.julialang.org/en/latest/manual/variables/#allowed-variable-names

For a much longer list, see http://rosettacode.org/wiki/Unicode_variable_names

Why Use Unicode in Variable Names?

See: Programing Style: Variable Naming: English Words Considered Harmful.

Languages and Unicode Support History

thanks to boostjam on perl

thanks to Hleb Valoshka on ruby. [https://plus.google.com/b/113859563190964307534/105354689506653311797/posts]
