Unicode: Byte Order (Endianness)

By Xah Lee. Date: 2022-10-22. Last updated: 2024-05-22.

What is Byte Order

Byte order (aka endianness; big-endian vs little-endian) is the order in which the bytes of a multi-byte unit are stored in a file or sent in a binary transmission.
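As a rough illustration, here is a small Python sketch (assuming Python 3.8 or later, for bytes.hex with a separator) that writes the same 4-byte integer in both byte orders with the standard int.to_bytes and struct. The integer 0x1F921 is an arbitrary choice for the example.

```python
import struct
import sys

# Native byte order of the machine running this; "little" on x86-64.
print(sys.byteorder)

n = 0x1F921  # an arbitrary multi-byte integer (the clown face code point)

# int.to_bytes takes the byte order explicitly: "big" or "little".
big = n.to_bytes(4, byteorder="big")
little = n.to_bytes(4, byteorder="little")
print(big.hex(" ").upper())     # 00 01 F9 21
print(little.hex(" ").upper())  # 21 F9 01 00

# struct format codes: ">" means big-endian, "<" means little-endian.
assert struct.pack(">I", n) == big
assert struct.pack("<I", n) == little

# Reading the bytes back with the wrong byte order gives a different number.
print(hex(int.from_bytes(big, byteorder="little")))  # 0x21f90100
```

Big-endian puts the most significant byte first; little-endian puts the least significant byte first.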

Byte Order Explained

Best explained by an example. Take the character 🤡:

Character name: CLOWN FACE
Codepoint: 129313
Codepoint in hexadecimal: 1F921
UTF-8 Encoding: F0 9F A4 A1
UTF-16 Encoding: D83E DD21
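The numbers above can be checked with a short Python sketch (assuming Python 3.8 or later). It uses the standard utf-16-be codec, which writes big-endian bytes and no byte order mark.

```python
ch = "\U0001F921"  # 🤡 CLOWN FACE

cp = ord(ch)
print(cp, f"{cp:X}")  # 129313 1F921

# UTF-8 code units are single bytes, so UTF-8 has no byte order to choose.
print(ch.encode("utf-8").hex(" ").upper())      # F0 9F A4 A1

# UTF-16 written big-endian (the "-be" codec adds no byte order mark).
print(ch.encode("utf-16-be").hex(" ").upper())  # D8 3E DD 21
```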

In UTF-16, it takes 4 bytes: D8 3E DD 21. (Each pair of hexadecimal digits is one byte; one byte is 8 binary digits.)

In UTF-16, a character takes at least 2 bytes, so the encoding treats every 2 bytes as one unit, called a code unit. 🤡 takes 2 code units: D83E and DD21.
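Where do D83E and DD21 come from? A code point above FFFF is split into two code units, called a surrogate pair. Below is a Python sketch of the standard UTF-16 surrogate arithmetic. The function name utf16_code_units is made up for this example, and it does not reject invalid input such as lone surrogates.

```python
def utf16_code_units(cp):
    """Split a code point into UTF-16 code units (16-bit integers)."""
    if cp < 0x10000:
        return [cp]  # fits in a single code unit
    v = cp - 0x10000              # 20-bit offset
    high = 0xD800 + (v >> 10)     # high surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # low surrogate carries the bottom 10 bits
    return [high, low]

units = utf16_code_units(0x1F921)
print([f"{u:04X}" for u in units])  # ['D83E', 'DD21']

# Cross-check against Python's own UTF-16 (big-endian) encoder.
assert b"".join(u.to_bytes(2, "big") for u in units) == "\U0001F921".encode("utf-16-be")
```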

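In big-endian (UTF-16BE), the most significant byte of each code unit comes first: D8 3E DD 21
In little-endian (UTF-16LE), the least significant byte of each code unit comes first: 3E D8 21 DD

The order of the code units stays the same (D83E first, then DD21). Only the 2 bytes inside each code unit are swapped.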
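A Python sketch of the two byte orders, using the standard utf-16-be and utf-16-le codecs (chosen here because they write no byte order mark):

```python
ch = "\U0001F921"  # 🤡

be = ch.encode("utf-16-be")
le = ch.encode("utf-16-le")
print(be.hex(" ").upper())  # D8 3E DD 21
print(le.hex(" ").upper())  # 3E D8 21 DD

# Same result by hand: keep the code units in order and write each
# 16-bit code unit with its 2 bytes in the chosen order.
units = [0xD83E, 0xDD21]
assert b"".join(u.to_bytes(2, "big") for u in units) == be
assert b"".join(u.to_bytes(2, "little") for u in units) == le

# Decoding with the wrong byte order silently gives other characters.
print([f"U+{ord(c):04X}" for c in le.decode("utf-16-be")])  # ['U+3ED8', 'U+21DD']
```

Note that the last line raises no error: bytes read in the wrong order can still look like valid text.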

Origin of the jargon Big-Endian, Little-Endian

The terms Big-Endian and Little-Endian for byte order come from Danny Cohen's April Fools' Day essay ON HOLY WARS AND A PLEA FOR PEACE, published 1980-04-01.

It alludes to Jonathan Swift's 1726 satire Gulliver's Travels, Part I, A Voyage to Lilliput, where the people of Lilliput and Blefuscu go to war over which end of an egg to break first: the big end or the little end.
