HTML: Charset and Encoding

By Xah Lee. Date: 2005-12-30. Last updated: 2022-10-23.

In HTML, you can declare the character set (aka charset) for the file, like this:

<meta charset="utf-8" />

For HTML 4, use this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

Once you have declared your character set, you can use characters from that character set directly in your HTML file. The file must actually be saved in the encoding you declared.

Unicode contains the characters of all the world's languages, and UTF-8 is an encoding of Unicode. Here is a sample of characters from Unicode: € ™ † é → ♥ ≠ 😂
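To see what an encoding does in practice, here is a small sketch in Python (Python is chosen only for illustration; nothing in HTML requires it). Unicode gives each character a number, called a code point. An encoding such as UTF-8 turns those numbers into bytes. The same characters become different bytes under different encodings, and reading bytes with the wrong encoding gives garbage.

```python
# A sketch: the same characters, stored as bytes under two encodings.
text = "é€"

# Unicode code points (the "character set" part).
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Bytes on disk or on the wire (the "encoding" part).
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(code_points)
print(utf8_bytes.hex())
print(utf16_bytes.hex())

# Reading UTF-8 bytes as if they were Latin-1 produces garbled text.
wrong = "é".encode("utf-8").decode("latin-1")
print(wrong)
```

This is why the declared charset should match the encoding the file was saved in.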

Character Entity

Another way to include special characters in your file is to use a so-called “character entity”.
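For example, the character é can be written as &eacute; (named), &#233; (decimal), or &#xE9; (hexadecimal). The number is the character's Unicode code point. The following Python sketch uses the standard library's html.unescape to show that all three forms decode to the same character, and the xmlcharrefreplace error handler to go the other way. Python is used here only for illustration.

```python
import html

# Three spellings of the same character: named, decimal, hexadecimal.
named = html.unescape("&eacute;")
decimal = html.unescape("&#233;")
hexadecimal = html.unescape("&#xE9;")
print(named, decimal, hexadecimal)

# A few named entities for symbols.
symbols = html.unescape("&euro; &trade; &rarr; &hearts; &ne;")
print(symbols)

# Going the other way: keep the file pure ASCII by replacing every
# non-ASCII character with a decimal numeric reference.
ascii_only = "é ♥".encode("ascii", "xmlcharrefreplace")
print(ascii_only)
```

Because entities are written in plain ASCII, they display correctly whatever encoding the file is saved in.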

HTTP's definition of charset (and the charset meta tag in HTML) is actually about character encoding.

Here is an excerpt:

By spec, there is no default encoding.

The encoding must come from either the HTTP header or the meta tag in the HTML file. If neither is found, the browser must guess.
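The lookup order can be sketched in Python as below. This is a simplified illustration, not the algorithm from any spec. The function name pick_encoding and the two regular expressions are made up for this example. It assumes the HTTP header takes priority over the meta tag, and it only looks at the first 1024 bytes of the file. Real browsers do more, such as checking for a byte order mark.

```python
import re

# Hypothetical patterns for this sketch; real parsers are stricter.
_HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([\w.:-]+)", re.IGNORECASE)
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE
)


def pick_encoding(content_type, body):
    """Return (encoding, source) for an HTML response.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw bytes of the HTML file.
    """
    # 1. The HTTP header, if it names a charset.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "http header"

    # 2. A meta tag near the start of the file (both forms are matched).
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta tag"

    # 3. Nothing declared: the browser has to guess.
    return None, "guess"


print(pick_encoding("text/html; charset=ISO-8859-1", b'<meta charset="utf-8" />'))
print(pick_encoding("text/html", b'<meta charset="utf-8" />'))
print(pick_encoding(None, b"<p>hi</p>"))
```

The third call returns no encoding at all, which is the case where the browser falls back to guessing.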

The document character set for HTML 4 is Unicode, but you still need to declare the encoding. [W3C: Internationalization: Document Character Set At http://www.w3.org/International/questions/qa-doc-charset ]

For how a user agent should determine the character encoding, see [HTML 4.01 Specification: 5 HTML Document Representation At http://www.w3.org/TR/html401/charset.html#idx-character_encoding-6 ]