Unicode in Ruby, Perl, Python, JavaScript, Java, Emacs Lisp, Mathematica

This page shows which languages support Unicode symbols in variable names (⁖ α = 5), in function names (⁖ φ(4)), or in user-defined operators such as x1 ⊕ x2.

Of these, only Mathematica supports user-defined operators, and the operator can be a Unicode character.

Ruby

# -*- coding: utf-8 -*-
# ruby

i♥NY = "♥"

def λ n
  n + "美"
end

p λ i♥NY                        # ⇒ "♥美"

For more detail, see: Ruby Unicode Tutorial 💎.

Thanks to Hleb Valoshka.

Perl

Perl has had good Unicode support since about 2010. Unicode can be in variable or function names. Example:

# -*- coding: utf-8 -*-
# perl 5.14

use strict;
use utf8; # necessary if you want to use Unicode in function or var names

# processing Unicode string
my $s = 'I ★ you';
$s =~ s/★/♥/;
print "$s\n";

# variable with Unicode char
my $愛 = 4;
print "$愛\n";

# function with Unicode char
sub f愛 { return 2;}
print f愛();

Detail at: Unicode in Perl.

Python 2

Python 2.x does not support Unicode characters in variable or function names.

Python 2.x's Unicode support is not very good, but it does work for processing Unicode in strings.

If you are processing lots of files and one of them contains a bad character or does not use the encoding you expected, your Python script dies in the middle. You don't even know which file or which line caused it unless your code prints file names. If you are processing a few thousand files in a directory with all its sub-directories, good luck finding out which files have already been processed.
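One way to avoid losing track is to catch the decode error per file and record the path, rather than letting the whole run abort. Here is a minimal sketch in Python 3, assuming the files are expected to be UTF-8. The function name find_undecodable_files is made up for illustration.

```python
import os

def find_undecodable_files(root_dir, encoding="utf-8"):
    """Walk root_dir and return (path, error message) pairs for files that fail to decode.

    The name and signature are illustrative, not from any library.
    """
    failures = []
    for dir_path, _dir_names, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            path = os.path.join(dir_path, name)
            try:
                with open(path, encoding=encoding) as f:
                    f.read()  # real processing would go here
            except UnicodeDecodeError as err:
                # Record the offending file instead of stopping the whole run.
                failures.append((path, str(err)))
    return failures
```

Calling find_undecodable_files on the top directory before the real job gives a list of files to fix or skip, so a single bad file no longer hides which ones were handled.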

Python 3

Python 3 fixed the Unicode problem. Python 3 supports Unicode in variable names and function names. Example:

# -*- coding: utf-8 -*-          ← optional, but still good to have
# python 3

def ƒ(n):
    return n+1

α = 4
print(ƒ(α)) # prints 5

Note: not every Unicode character is allowed in names. For example, a₁ = 1 is a syntax error. (Thanks to boostjam.)
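You can test whether a name is allowed without triggering a syntax error. This is a small sketch using the standard str.isidentifier method and unicodedata.category. The helper name identifier_report and the sample names are just for illustration.

```python
import unicodedata

def identifier_report(names):
    """Map each name to (valid Python 3 identifier?, Unicode category of its last char)."""
    return {n: (n.isidentifier(), unicodedata.category(n[-1])) for n in names}

report = identifier_report(["α", "ƒ", "愛", "a₁", "♥"])
for name, (ok, category) in report.items():
    print(name, ok, category)
```

The names ending in a letter (categories Ll and Lo) pass. The subscript ₁ is in category No (other number) and ♥ is in So (other symbol), and both are rejected.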

Detail at: Python Unicode Tutorial 🐍.

JavaScript

JavaScript supports Unicode in variable and function names.

JavaScript identifiers (variable or function names) must begin with a letter, underscore (_), or dollar sign ($); the characters after the first can also be digits. Here “letter” and “digit” include non-ASCII Unicode characters that are also letters or digits.

// -*- coding: utf-8 -*-

var 愛 = "♥";

function λ(n) {return n + "♥";}

alert( λ(愛) );

All major browsers support it.

For detail, see: JavaScript: What Are Allowed Characters in Identifiers (Variable & Function Names)? Unicode?.

Emacs Lisp

For text processing, the most beautiful language with respect to Unicode is Emacs Lisp. In elisp, you don't have to declare any of the Unicode or encoding stuff. You simply write code to process strings or files, without even having to know what encoding they use. Emacs, the environment, takes care of all that.

Emacs Lisp also supports Unicode in var/function names. For example:

(defun β ()
  "Inserts stuff"
  (interactive)
  (let ((α "♥ 愛 ☯"))
    (insert α)))

(To try the above in Emacs: paste it into an empty file, select it, then call eval-region to make Emacs evaluate it. Now you can press 【Alt+x】, then type β (just copy and paste), and it will insert “♥ 愛 ☯”.) (See: Emacs and Unicode Tips; Emacs Lisp Basics.)

Java

Java supports Unicode fully, including use in var/class/method names. Example:

class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println( x1.北 );
        System.out.println( x1.π );
    }
}

Detail: Java Tutorial: Unicode in Java.

Mathematica

Mathematica (M) supports Unicode extensively in variable names and function names, and you can define your own operators where the operator is a Unicode symbol.

Technically, M source code is ASCII. Characters in Unicode or in M's own set of math symbols are represented by markup, much like HTML entities. However, M's editor (the Front End) displays them rendered, and there is a robust system for the user to input math symbols.

Linden Scripting Language (LSL)

LSL also supports Unicode in function or variable names. Example:

string aασ∑♥ = "var with Unicode char in name";

string tασ∑♥() { return "function with Unicode char in name";}

default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer num_detected)
    {
        llSay(0, (string) tασ∑♥() + "; " + (string) aασ∑♥);
    }
}

Why Use Unicode in Variable Names?

See: Programing Style: Variable Naming: English Words Considered Harmful.
