HTML: Character Sets and Encoding

By Xah Lee.

In HTML, you can declare the character set for the file like this:

<meta charset="utf-8" />

For HTML 4, use this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

Once you have declared your character set, you can use characters from that character set in your HTML file.
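To see why the declaration matters, here is a minimal Python sketch (Python is used only for illustration and is not part of the original page). It encodes one character as UTF-8 and then decodes the same bytes as windows-1252, which is roughly what happens when a browser assumes the wrong encoding. The choice of windows-1252 as the wrong guess is an assumption made for the example.

```python
# The bytes a server would send for "é" stored in a UTF-8 file.
data = "\u00e9".encode("utf-8")  # é

# Decoded with the declared (correct) encoding, the text round-trips.
right = data.decode("utf-8")

# Decoded with a wrong assumption, each byte becomes its own character.
wrong = data.decode("windows-1252")

print(repr(right))
print(repr(wrong))
```

The second result is the familiar two-character garbage that appears on pages whose declared encoding does not match the bytes in the file.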

Unicode contains the characters of all the world's languages, and UTF-8 is an encoding of Unicode. Here is a sample of characters from Unicode:

© é 😂
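As a small illustration, the following Python sketch looks up the three sample characters in the Unicode database that ships with Python and reports how many bytes each one takes in UTF-8. The helper name `describe` is hypothetical, introduced only for this example.

```python
import unicodedata

# The three sample characters, written as escapes so the code points are explicit.
samples = ["\u00a9", "\u00e9", "\U0001F602"]  # © é 😂

def describe(ch):
    """Return (code point label, Unicode name, UTF-8 byte count) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))

for ch in samples:
    print(ch, *describe(ch))
```

Each character has exactly one code point in Unicode, but the number of bytes it needs depends on the encoding used to store it.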

[see Unicode Search 😄]

For Unicode, charset, and encoding basics, see: Unicode Basics: Character Set, Encoding, UTF-8.

Character Entity

Another way to show special characters in your file is to use a so-called “character entity”.

[see HTML Entity List]
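A quick way to check what an entity stands for is to decode it the way a browser would. The sketch below uses Python's standard `html.unescape` for that. The numeric forms (`&#169;`, `&#xE9;`) are numeric character references, a closely related notation that is included here as an extra, not something the text above covers.

```python
import html

# Named entities: a name between "&" and ";" stands for one character.
named = html.unescape("&copy; &eacute;")

# Numeric character references: decimal (&#...;) or hexadecimal (&#x...;) code points.
numeric = html.unescape("&#169; &#xE9; &#x1F602;")

print(named)
print(numeric)
```

Because entities are written in plain ASCII, they work regardless of the encoding the file itself is saved in.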

HTML/HTTP Charset is About Encoding, Not Character Set

HTTP's definition of charset (and the charset meta tag in HTML) is actually about character encoding.

Here's an excerpt:

[image: excerpt from RFC 2616 on encoding vs. character set. Source: RFC 2616, http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4 ]
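The distinction can be made concrete with a short Python sketch: one character, with one code point in the Unicode character set, turns into different byte sequences depending on the encoding. The list of encodings is an arbitrary selection for illustration; note that ISO-8859-1 is also a separate, much smaller character set that happens to contain this character.

```python
ch = "\u00e9"  # é, code point U+00E9 in the Unicode character set

# A value such as charset=utf-8 selects one of these byte mappings.
encodings = ["utf-8", "utf-16-le", "utf-32-le", "iso-8859-1"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:12} {data.hex(' ')}")
```

The character is the same in every row; only the bytes differ, and that byte mapping is what the charset value actually names.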

What's HTML4 or HTML5's Default Encoding?

By spec, there's no default encoding.

The encoding must come from either the HTTP header or the meta tag in the HTML file. If neither is found, the browser must guess.

[image: excerpt from the WHATWG HTML spec on determining the character encoding. Source: https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding ]
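The lookup order described above (HTTP header first, then meta tag, then a guess) can be sketched in Python as follows. This is a deliberately simplified illustration, not the full algorithm from the spec: it ignores byte order marks and other steps, the function name `pick_encoding` and both regular expressions are hypothetical, and the `windows-1252` fallback is an assumption standing in for whatever a real browser would guess.

```python
import re

# Hypothetical patterns for this sketch; real parsers are more careful.
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_\-]+)', re.IGNORECASE)
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def pick_encoding(content_type, body, fallback="windows-1252"):
    """Return (encoding, source) for an HTML response.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw bytes of the HTML file.
    fallback: assumed guess when nothing is declared.
    """
    # 1. The HTTP header wins if it carries a charset parameter.
    if content_type:
        m = HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower(), "http header"
    # 2. Otherwise look for a meta declaration near the start of the file.
    m = META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower(), "meta tag"
    # 3. Nothing declared: the browser has to guess.
    return fallback, "guess"

print(pick_encoding("text/html; charset=UTF-8", b""))
print(pick_encoding(None, b'<meta charset="utf-8" />'))
print(pick_encoding("text/html", b"<p>hi</p>"))
```

The same function handles the HTML 4 style declaration, since that form also contains `charset=` inside the meta tag.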
