HTML: Charset and Encoding

By Xah Lee.

What is HTML Charset

HTML charset is a specification of the allowed character set and the character encoding.

In HTML, you can declare the charset for the file, inside the head tag, like this:

<head>

<meta charset="utf-8" />

</head>

〔see Unicode: Character Set, Encoding, UTF-8, Codepoint〕

Declare Charset in HTML 4

For HTML 4, use this:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

Character Entity

Another way to show special characters in your file is to use a so-called “character entity”.
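
As a small illustration (not from the original page), the fragment below writes the copyright sign three ways: as a named entity, as a decimal numeric reference, and as a hexadecimal numeric reference. It also shows the entities for characters that have special meaning in HTML markup. The choice of characters is only an example.

```html
<!-- each of the three forms renders as the copyright sign -->
<p>&copy; &#169; &#xA9;</p>

<!-- characters with special meaning in markup: renders as < > & -->
<p>&lt; &gt; &amp;</p>
```

Because these references are written with plain ASCII characters, they survive even when the file is saved in an encoding that cannot represent the character directly.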

HTML/HTTP Charset is About Encoding, Not Character Set

HTTP's definition of charset (and the charset meta tag in HTML) is actually about character encoding.

Here is an excerpt:

[image: RFC 2616 excerpt on charset]
RFC 2616 At http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.4
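
The point that charset names an encoding, not a set of permitted characters, can be checked with a small test page. This is a sketch, assuming the file is saved with ASCII bytes only; the title and the chosen character are hypothetical. The page declares a legacy encoding that contains no Chinese characters, then uses a numeric character reference. Numeric references always name Unicode code points, so the character should still display, given a suitable font.

```html
<!DOCTYPE html>
<html>
<head>
<!-- tells the browser how to decode the bytes of this file -->
<meta charset="iso-8859-1" />
<title>charset test</title>
</head>
<body>
<!-- U+4E2D (a Chinese character) is not in ISO-8859-1, yet this still renders -->
<p>&#x4E2D;</p>
</body>
</html>
```

The declared charset only controls how the file's bytes are turned into characters. It does not limit which characters the page can show.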

What is HTML4 or HTML5's Default Encoding?

By spec, there is no default encoding.

An encoding specification must come from one of:

[image: HTML5 WHATWG character encoding excerpt, 2019-06-07]
[source https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding]

Reference

HTML 4 Default Charset, Encoding, and Declaration

[image: W3C charset excerpt, 2024-09-04]
W3C: Internationalization: Document Character Set At http://www.w3.org/International/questions/qa-doc-charset

How a User Agent Should Determine the Character Encoding

[image: W3C HTML 4.01 charset excerpt, 2024-09-04]
HTML 4.01 Specification: 5 HTML Document Representation At http://www.w3.org/TR/html401/charset.html#idx-character_encoding-6

Allowed Characters and Case Sensitivity