Using Unicode in HTML Attributes

By Xah Lee.

You can use Unicode characters in HTML tag attribute values. Here's a sample HTML:

<html>
<head>
<meta charset="utf-8">
<title>Unicode in HTML Tag Attributes</title>
<style>
p.α {color:red}
</style>
</head>
<body>

<p class="α">yay!</p>

</body>
</html>

In the above, notice the Greek alpha character α, used as the value of the class attribute and as the class selector in the CSS.

Here's a page where you can see the above source code rendered: Sample Page of Unicode in HTML Tag Attributes.

This works on Windows in the latest versions of Firefox, Google Chrome, Safari, and Opera, and in Internet Explorer 8 (as of 2010-12).

You can use any other Unicode character, including the various bullet symbols and math symbols. For a sample list of Unicode chars, see: Sample Unicode Characters.
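When picking a character for a class name, it can help to confirm exactly which code point you have, since many symbols look alike. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is made up for this illustration.

```python
import unicodedata

def describe(char: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return "U+{:04X} {}".format(ord(char), unicodedata.name(char))

# Print the code point and name of a few candidate characters.
for char in "α♠":
    print(char, describe(char))
```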

If you use emacs, you can enter Unicode chars easily. See: Emacs: xah-math-input.el and Emacs and Unicode Tips.

ID's Value Cannot Contain Unicode

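However, in HTML 4 an ID's value must not contain non-ASCII Unicode characters. It must start with a letter (A to Z or a to z), followed by any number of letters, digits (0 to 9), and the characters - _ : and . (hyphen, underscore, colon, period). It cannot contain a space and cannot start with a digit. (HTML5 relaxes this: an ID only needs to contain at least one character and no whitespace.)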

Even so, a page that uses a Unicode character in an ID value passes W3C's validator. See: W3C HTML Validator Invalid.
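If you generate HTML and want to stay within the stricter HTML 4 rule for ID values (a letter, followed by letters, digits, and - _ : .), you can check candidate values yourself instead of relying on the validator. The following is a minimal sketch; the function name is_html4_id is made up for this illustration.

```python
import re

# HTML 4 ID token: one ASCII letter, then any number of ASCII letters,
# digits, hyphens, underscores, colons, or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_html4_id(value: str) -> bool:
    """Return True if value is a valid ID token under the HTML 4 rule."""
    return HTML4_ID.fullmatch(value) is not None

# Try a few candidates, including a Unicode one.
for candidate in ["main-nav", "sec:2.1", "2nd", "my id", "α"]:
    print(repr(candidate), is_html4_id(candidate))
```

Only the first two candidates pass. The others start with a digit, contain a space, or use a non-ASCII character.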

How is it Useful?

This could be useful for reducing file size and for keeping generated class names from crowding the attribute value namespace, especially in code that generates HTML. (➢ for example: content management system engines)

For example, here's some source code in the OCaml language.

(* array examples *)
let x = [| 2; 8; 3 |];;
print_int x.(1);;
x.(1) <- 9;;
let x = Array.make 9 4;;

The following is the HTML source code for the syntax-colored version:

<span class="comment">(* array examples *)</span>
<span class="tuareg-font-lock-governing">let</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-operator">[|</span> 2<span class="tuareg-font-lock-operator">;</span> 8<span class="tuareg-font-lock-operator">;</span> 3 <span class="tuareg-font-lock-operator">|];;</span>
print_int x.<span class="tuareg-font-lock-operator">(</span>1<span class="tuareg-font-lock-operator">);;</span>
x.<span class="tuareg-font-lock-operator">(</span>1<span class="tuareg-font-lock-operator">)</span> <span class="tuareg-font-lock-operator">&lt;-</span> 9<span class="tuareg-font-lock-operator">;;</span>
<span class="tuareg-font-lock-governing">let</span> <span class="variable-name">x </span><span class="tuareg-font-lock-operator">=</span> <span class="type">Array</span>.make 9 4<span class="tuareg-font-lock-operator">;;</span>

See how verbose it is? Each token of the OCaml code is wrapped in a span tag with a particular class name. Each of these class names can be replaced by a short name containing a Unicode char that is still unique and meaningful, and doesn't pollute the class value space you use for normal purposes. For example:

Before:

<span class="tuareg-font-lock-operator">…</span>
<span class="variable-name">…</span>
<span class="string">…</span>

After:

<span class="♠o">…</span> <!-- for operator -->
<span class="♠v">…</span> <!-- for variable -->
<span class="♠s">…</span> <!-- for string -->

Here, we used the spade symbol ♠ as a prefix for all class values that are used for syntax coloring. This effectively creates our own namespace.
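The renaming can be done as a post-processing step on generated HTML. The following is a minimal sketch, not a full HTML parser: it assumes class attributes are written with double quotes, and the SHORT_NAMES table and the shorten_classes function are made up for this illustration. Sizes are compared in UTF-8 bytes, because ♠ takes 3 bytes in UTF-8, so ♠o is 4 bytes against 25 bytes for tuareg-font-lock-operator.

```python
import re

# Hypothetical mapping from long class names to short names in the ♠ namespace.
SHORT_NAMES = {
    "tuareg-font-lock-operator": "♠o",
    "variable-name": "♠v",
    "string": "♠s",
}

# Matches a double-quoted class attribute and captures its value.
CLASS_ATTR = re.compile(r'class="([^"]*)"')

def shorten_classes(html: str) -> str:
    """Replace each known class name inside class="..." with its short form."""
    def replace(match):
        names = match.group(1).split()
        # Names that are not in the table are kept unchanged.
        short = " ".join(SHORT_NAMES.get(name, name) for name in names)
        return 'class="{}"'.format(short)
    return CLASS_ATTR.sub(replace, html)

before = '<span class="tuareg-font-lock-operator">;;</span>'
after = shorten_classes(before)
print(after)
# Byte counts of the two versions when saved as UTF-8.
print(len(before.encode("utf-8")), len(after.encode("utf-8")))
```

The stylesheet needs the same renaming, so a selector such as span.tuareg-font-lock-operator becomes span.♠o.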

For an example of how verbose it can become, see: Emacs nxml-mode Fontification Changes.

If you use emacs, you might be interested in: Using Emacs To Syntax Color Source Code In HTML.
