HTML: Charset and Encoding

By Xah Lee.

In HTML, you can declare the character set (aka charset) of the file, like this:

<meta charset="utf-8" />

For HTML 4, use this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

Once you have declared your character set, you can use characters from that character set in your HTML file.

Unicode contains the characters of practically all the world's languages. UTF-8 is an encoding of Unicode. Here is a sample of characters from Unicode: é 😂
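To see why the declaration matters, here is a minimal Python sketch (Python is used only for illustration, it is not part of HTML). It assumes the file was saved as UTF-8 and shows what a reader gets when the same bytes are read with a different charset, windows-1252 being just an example of a wrong choice.

```python
# Sample character from the text above, written as an escape so that
# the encoding of this script file itself does not matter.
text = "\u00e9"  # é

# What is saved to disk or sent over the network is bytes, not characters.
data = text.encode("utf-8")
print(data)  # b'\xc3\xa9'

# Reading the bytes with the correct encoding restores the character.
right = data.decode("utf-8")

# Reading the same 2 bytes as windows-1252 gives 2 wrong characters.
wrong = data.decode("windows-1252")
print(right, wrong)  # é Ã©
```

If the declared charset does not match the encoding the file was actually saved in, this kind of garbled text is what shows up in the browser.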

Character Entity

Another way to include special characters in your file is by a so-called “character entity”.
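As a rough illustration, the following Python sketch decodes a few entities with the standard library function `html.unescape`. The helper `to_numeric_entity` is a hypothetical name made up for this example. It builds the hexadecimal numeric form of a character.

```python
import html

# Named, decimal, and hexadecimal forms of the same character (é).
print(html.unescape("&eacute; &#233; &#xE9;"))  # é é é

# A numeric entity works for any Unicode character, including emoji.
print(html.unescape("&#x1F602;"))  # 😂


def to_numeric_entity(ch):
    """Return the hexadecimal character entity for a single character.

    Hypothetical helper, for illustration only.
    """
    return "&#x{:X};".format(ord(ch))


print(to_numeric_entity("\u00e9"))  # &#xE9;
```

Since entities are written with ASCII characters only, they display the same regardless of the charset the file declares.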

HTML/HTTP Charset is About Encoding, Not Character Set

HTTP's definition of charset (and the charset meta tag in HTML) is actually about character encoding.

Here is an excerpt:

[Image: RFC 2616 charset excerpt, 2022-10-23]
[RFC 2616 at http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4 ]
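The difference between a character set and an encoding can be sketched in Python. The character é has one position in the Unicode character set (code point U+00E9), but which bytes represent it depends on the encoding. The 3 encodings below are picked only as examples.

```python
ch = "\u00e9"  # é

# Position in the character set: one number, independent of any encoding.
code_point = ord(ch)
print(hex(code_point))  # 0xe9

# The bytes depend on the encoding, and that is what "charset" selects.
encoded = {}
for encoding in ("utf-8", "utf-16-be", "iso-8859-1"):
    encoded[encoding] = ch.encode(encoding).hex()
    print(encoding, encoded[encoding])
# utf-8 c3a9
# utf-16-be 00e9
# iso-8859-1 e9
```

So a value such as charset=utf-8 tells the reader how to turn bytes into characters, not merely which characters are allowed.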

What is HTML4 or HTML5's Default Encoding?

By spec, there is no default encoding.

An encoding must come from either the HTTP header or a meta tag in the HTML file. If neither is found, the browser must guess.

[Image: HTML5 WHATWG character encoding excerpt, 2019-06-07]
[source https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding ]
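The lookup order described above (HTTP header first, then meta tag, else guess) can be sketched in Python roughly like this. This is a simplified illustration, not the algorithm from the spec: the function name `declared_encoding` and both regular expressions are made up for this example, and steps such as byte order mark detection are left out. Looking only at the first 1024 bytes of the file is likewise an assumption of this sketch.

```python
import re

# Hypothetical patterns for this sketch. They are far simpler than
# what a real browser does.
HEADER_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
META_RE = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def declared_encoding(content_type, body):
    """Return the declared encoding name in lowercase, or None.

    content_type: value of the HTTP Content-Type header (str or None).
    body: the raw bytes of the HTML file.
    None means nothing was declared, so the browser would have to guess.
    """
    # 1. The HTTP header is checked first.
    if content_type:
        m = HEADER_RE.search(content_type)
        if m:
            return m.group(1).lower()
    # 2. Otherwise look for a meta tag near the start of the file.
    m = META_RE.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Nothing declared.
    return None


page = b'<!doctype html><meta charset="utf-8" /><p>hi</p>'
print(declared_encoding("text/html; charset=ISO-8859-1", page))  # iso-8859-1
print(declared_encoding("text/html", page))  # utf-8
print(declared_encoding(None, b"<p>hi</p>"))  # None
```

The same function also matches the HTML 4 style meta tag, because that tag contains charset=utf-8 inside its content attribute.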
