Unicode: Byte Order (Endianness)
Byte order (also called endianness: big-endian vs little-endian) is the order in which the bytes of a multi-byte unit are stored in a file or sent in a binary transmission.
For example, the character 🤡
- Unicode Character Name: CLOWN FACE
- Codepoint: 129313
- Codepoint in hexadecimal: 1F921
- UTF-8 Encoding: F0 9F A4 A1
- UTF-16 Encoding: D83E DD21
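The values in this list can be checked with a short Python sketch. The helper name hex_bytes is made up for this illustration, and the UTF-16 bytes are produced with the big-endian codec so that they print in the same order as the code units listed above.

```python
# Reproduce the code point and encodings of CLOWN FACE.
ch = "\U0001F921"  # the character, written as an escape


def hex_bytes(data: bytes) -> str:
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


codepoint = ord(ch)
utf8 = hex_bytes(ch.encode("utf-8"))
utf16 = hex_bytes(ch.encode("utf-16-be"))

print(codepoint, f"{codepoint:X}")  # 129313 1F921
print(utf8)                         # F0 9F A4 A1
print(utf16)                        # D8 3E DD 21
```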
In UTF-16 encoding, it has 4 bytes:
D83E DD21
(Each hexadecimal digit represents 4 binary digits, so 2 hexadecimal digits are 8 binary digits, which is 1 byte.)
In UTF-16, a character takes at least 2 bytes, so the bytes are grouped in pairs. Each 2-byte group is called a code unit. This character needs two code units, D83E and DD21.
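As a rough illustration of the grouping, the following Python sketch splits the 4 bytes into two 16-bit code units and then recombines them into the code point. The recombination uses the standard UTF-16 surrogate pair formula, which is not described above, and the variable names are chosen only for this example.

```python
import struct

# The 4 bytes of the character, in big-endian order.
data = bytes.fromhex("D83EDD21")

# ">2H" reads two unsigned 16-bit integers, most significant byte first.
hi, lo = struct.unpack(">2H", data)
code_units = [f"{u:04X}" for u in (hi, lo)]
print(code_units)  # ['D83E', 'DD21']

# Standard surrogate pair formula: each unit carries 10 bits of the
# code point, offset by 0x10000.
codepoint = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
print(f"{codepoint:X}")  # 1F921
```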
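Byte order applies to the 2 bytes inside each code unit. The order of the code units themselves does not change.
- In big-endian encoding (UTF-16BE), the byte order is
D8 3E DD 21
- In little-endian encoding (UTF-16LE), the byte order is
3E D8 21 DD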
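The two byte orders can be compared with Python's built-in utf-16-be and utf-16-le codecs. This is a minimal sketch. The helper hex_bytes is made up for the example, and the last step only shows what happens when bytes are decoded with the wrong byte order. Note that the swap happens inside each 2-byte code unit.

```python
ch = "\U0001F921"  # CLOWN FACE


def hex_bytes(data: bytes) -> str:
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


be = ch.encode("utf-16-be")  # most significant byte of each unit first
le = ch.encode("utf-16-le")  # least significant byte of each unit first

print(hex_bytes(be))  # D8 3E DD 21
print(hex_bytes(le))  # 3E D8 21 DD

# Decoding with the matching byte order recovers the character.
assert le.decode("utf-16-le") == ch

# Decoding with the wrong byte order silently yields different text:
# the units are read as 3ED8 and 21DD, two unrelated characters.
wrong = le.decode("utf-16-be")
print([f"{ord(c):04X}" for c in wrong])  # ['3ED8', '21DD']
```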
Origin of the jargon Big-Endian, Little-Endian
The terms big-endian and little-endian for byte order come from an article written by Danny Cohen, published in 1980.
[ON HOLY WARS AND A PLEA FOR PEACE By Danny Cohen. At Endian_war_1980_Danny_Cohen.txt ]
The terms allude to Jonathan Swift's 1726 satire Gulliver's Travels, Part I: A Voyage to Lilliput, where the people of Lilliput and Blefuscu fight over which end of an egg to crack first.