The vast majority of computer languages use ASCII as their character set. This means they jam a multitude of operators into about 20 symbols. Often, a symbol has multiple meanings depending on context. Also, a sequence of chars is used as a single symbol as a workaround for the lack of symbols. Even languages that use Unicode as their char set (⁖ Java, XML) often still use the ≈20 ASCII symbols for all their operators. The only exceptions I know of are Mathematica, Fortress, APL. This page gives examples and problems of symbol congestion.
Here are some common examples of a symbol that has multiple meanings depending on context:
In Java, the SQUARE BRACKET [ ] is used for declaring an array type main(String[] args). It is also part of the syntax for array creation myArray = new int[10];. It is also a delimiter for getting an element of an array myArray[i].
In Java and most other languages, PARENTHESIS ( ) is used for expression grouping (x + y) * z, also as delimiters for the arguments of a function call System.out.print(x), also as delimiters for the parameters of a function's declaration main(String[] args).
In {C, Perl} and many other langs, COLON : is used as a separator in a ternary expression (⁖ (test ? "yes" : "no")), also as a namespace separator (⁖ use Data::Dumper;).
In a URL, SOLIDUS / is used as a path separator, also as part of the marker that follows the protocol name. ⁖ http://example.org/comp/unicode.html
In Python and many others, LESS-THAN SIGN “<” is used for the “less than” boolean operator, but also as an alignment flag in its “format” method, also as a delimiter of a named group in regex, and also as part of other operators that are made of multiple chars, ⁖ {<<, <=, <<=, <>} (the last one in Python 2 only).
The above are just some examples to illustrate the issue. There are perhaps 100 times more.
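The Python item above can be checked directly. This is a minimal sketch; the variable names and sample values are made up for illustration.

```python
import re

# 1. Comparison operator.
is_less = 1 < 2

# 2. Alignment flag in the format mini-language: left-align in width 5.
padded = format("ab", "<5")

# 3. Delimiter of a named group in a regex.
year = re.match(r"(?P<year>\d{4})", "2024-01").group("year")

# 4. Part of multi-char operators: left shift and less-or-equal.
shifted = 1 << 3
at_most = 3 <= 3

print(is_less, repr(padded), year, shifted, at_most)
```

Four unrelated jobs, one symbol. The reader has to parse the surrounding context to know which one is meant.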
Here are some common examples of operators that are made of multiple characters:
|| Logical OR
&& Logical AND
== Equality Testing
!= Inequality Testing
<= Less-or-Equal Testing
>= Greater-or-Equal Testing
++ Increase by One
-- Decrease by One
** Exponential
+= Add and Assign
*= Multiply and Assign
:= Define
:: Namespace Separator
// Floor Division
.. Range Operator

The tradition of sticking to the 95 printable chars of 1960s ASCII is extremely limiting. It creates complex problems manifested in:
String escape mechanism, for example, C's backslash {\n, \r, \t, \\, …}, widely adopted. A better solution would be Unicode symbols for unprintable chars. Example candidates:
(Note: string escape mechanism is ultimately necessary, but using proper Unicode can alleviate 99% of the need. (See also: Computing Symbols in Unicode.))
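As a rough illustration of visible symbols for unprintable chars, the sketch below maps control chars to the Unicode “Control Pictures” block (U+2400 to U+2421). The function name show_controls is hypothetical, and this block is only one possible set of candidates, not necessarily the one intended above.

```python
def show_controls(text):
    """Replace C0 control chars, space, and DEL with Unicode Control Pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code <= 0x20:
            # U+2400..U+2420 mirror code points 0x00..0x20 one-to-one.
            out.append(chr(0x2400 + code))
        elif code == 0x7F:
            out.append("\u2421")  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


print(show_controls("a\tb\n"))
```

With symbols like these available in source code, a tab or newline could be written as a single visible char instead of a two-char backslash sequence.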
The backslash string escape mechanism directly leads to crazy leaning toothpicks syndrome, especially bad in emacs regex. Example:
"<img src=\"\\([^\"]+\\)\" alt=\"\\([^\"]+\\)\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
Overloading of symbols is particularly bad in regex. For example, ^ has multiple meanings depending on where it is placed. At the beginning of a pattern, it's a line-beginning marker; as the first char inside square brackets ⁖ [^…], it's a negation; otherwise it's literal.
Many other regex chars also have special meaning, and some depend on their position. ⁖ ^ $ ? | . + \ - { } ( ) [ ] ….
Whether a symbol's meaning is literal, or whether its position changes the meaning, or whether the meaning is changed inside [], is completely ad hoc.
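Both regex problems above, the leaning toothpicks and the position-dependent caret, show up in Python's “re” module too. A minimal sketch follows; the sample strings are made up for illustration.

```python
import re

# Leaning toothpicks: a regex for a literal backslash followed by digits.
escaped = "\\\\(\\d+)"   # ordinary string literal: each regex backslash doubled
raw = r"\\(\d+)"         # raw string literal: only the regex-level escaping remains
same_pattern = escaped == raw
digits = re.search(raw, r"code \123 end").group(1)

# The caret has three meanings depending on position.
anchor = re.findall(r"^a", "a ba")      # at pattern start: beginning-of-line anchor
negated = re.findall(r"[^a ]", "a ba")  # first char in brackets: negation
literal = re.findall(r"[a^]", "a^b")    # elsewhere in brackets: a literal caret

print(same_pattern, digits, anchor, negated, literal)
```

Python's raw strings remove one layer of backslashes, but that is itself one more quoting syntax added to work around the shortage of symbols.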
The lack of bracketing symbols leads to varieties of unnecessarily complex string delimiters to help solve the problem of quoting.
Python's triple quotes: {'''…''', """…"""}. 〔☛ Strings in Perl & Python〕
Perl's varying delimiters: {q(…), q[…], q{…}, m/…/}.
Perl, PHP, unix shell's heredoc. 〔☛ PHP: String Syntax & Heredoc〕
(See also: Computer Language Design: String Syntax.)
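To see why these extra delimiters exist, here is the same text written three ways in Python. This is only a sketch with a made-up sample sentence.

```python
# The text contains both kinds of ASCII quote, so one of them must be escaped
# unless a third delimiter (triple quotes) is used.
escaped_single = 'He said "it\'s fine"'
escaped_double = "He said \"it's fine\""
triple = '''He said "it's fine"'''

print(escaped_single == escaped_double == triple)
```

All three literals denote the same string. The variety exists only because the quote chars double as both delimiter and content.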
HTML entities, ⁖ {&amp;, &lt;, &gt;, &quot;, &alpha;, &#945;, &#x3b1;, …}.
Example. This:
<p>he wrote “4 > 3”</p>
is written as:
<p>he wrote “4 &gt; 3”</p>
The HTML entities were invented partly as a mechanism for avoiding symbol jam of the characters < > &, partly as a kludge for entering frequently needed symbols (⁖ © ™ α → …), and partly as a kludge to avoid char encoding and transmission problems (i.e. there was no Unicode in the 1980s, and only ASCII and a handful of other basic encodings were widely recognized). 〔☛ HTML/XML Entities List〕
For a concrete example of how this induced complexity in code, see: ASCII Jam Problem: HTML Entities.
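The round trip between plain text and entities can be sketched with Python's standard “html” module. The sample text is made up for illustration.

```python
import html

# Text to HTML: the three jammed chars plus the double quote become entities.
escaped = html.escape('he wrote "4 > 3" & left')

# HTML to text: named, decimal, and hex forms all decode to the same char.
decoded = html.unescape("&alpha; &#945; &#x3b1;")

print(escaped)
print(decoded)
```

Every program that reads or writes HTML needs a step like this, only because < > & serve as both markup and ordinary text.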
URL percent encoding and encoding Unicode in URL. Example:
http://en.wikipedia.org/wiki/St._Jerome_in_His_Study_%28D%C3%BCrer%29
for
http://en.wikipedia.org/wiki/St._Jerome_in_His_Study_(Dürer)
The complexity in resolving the ambiguity of the Ampersand char in URL and CGI protocol. See: URL Percent Encoding and Ampersand Char ◇ URL Percent Encoding problems ◇ JavaScript Encode URL, Escape String.
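Both URL problems above can be reproduced with Python's standard “urllib.parse” module. A minimal sketch follows; the query parameter names “q” and “lang” are made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Percent encoding: parentheses and non-ASCII chars are replaced by %XX bytes.
title = "St._Jerome_in_His_Study_(Dürer)"
encoded = quote(title)
round_trip = unquote(encoded)

# Ampersand ambiguity: "&" separates parameters, so an ampersand inside a
# value must itself be percent-encoded.
query = urlencode({"q": "AT&T", "lang": "en"})
parsed = parse_qs(query)

print(encoded)
print(query)
```

The encoded title matches the Wikipedia URL shown above, and the “&” inside the value becomes %26 so it cannot be confused with the separator.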
When a language or config needs to represent keystrokes, the ASCII jam makes the complexity and readability problems much worse. See:
All these problems occur because we are jamming so many meanings into about 20 symbols in ASCII.
The language designer Guy Steele recently gave a very interesting talk. See: Guy Steele on Parallel Programing. In it, he showed code snippets of his language Fortress, which uses Unicode as operators.
For example, list delimiters are Unicode angle brackets ⟨1,2,3⟩. 〔☛ Matching Brackets in Unicode〕 It also uses the circled plus “⊕” as an operator. 〔☛ Math Symbols in Unicode〕
Many of today's languages have only partial support for Unicode in function or variable names, so using Unicode in variable names (⁖ α = 3) or function names (⁖ “lambda” as “λ” or “function” as “ƒ”) is often not possible, and defining your own operators (⁖ “⊕”) is rarer still.
〔☛ Unicode Support in Ruby, Perl, Python, JavaScript, Java, Emacs Lisp, Mathematica〕
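As one data point, Python 3 accepts Unicode letters in names but not math symbols. The sketch below only checks what the language counts as a valid identifier; it says nothing about other languages.

```python
# A Greek letter works as a variable name in Python 3.
α = 3

# str.isidentifier() reports whether a string is a valid Python name.
letter_ok = "λ".isidentifier()   # a letter: allowed as a name
symbol_ok = "⊕".isidentifier()   # a math symbol: not allowed

print(α, letter_ok, symbol_ok)
```

So “λ” can name a function, but it cannot replace the “lambda” keyword, and “⊕” can be neither a name nor a user-defined operator.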
Today, it's trivial to create a keyboard layout to type any set of Unicode symbols you choose. See: How to Create a APL or Math Symbols Keyboard Layout.