Unicode: Byte Order (Endianness)
What is Byte Order
Byte order (also known as endianness: big-endian vs little-endian) is the order in which the bytes of a multi-byte unit are stored in a file or sent in a binary transmission.
Byte Order Explained
It is best explained by an example. The character 🦋 (U+1F98B: BUTTERFLY), in UTF-16 encoding, has 4 bytes:
D8 3E DD 8B
In UTF-16, the minimum size of a character is 2 bytes. So it treats every 2 bytes as a single unit, called a code unit. For this character, the two code units are D83E and DD8B.
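As a rough illustration of the grouping into code units, here is a minimal Python sketch. The helper name utf16_code_units is made up for this example. It encodes the character as big-endian UTF-16, then reads the bytes back two at a time as 16-bit integers.

```python
import struct

def utf16_code_units(char):
    """Return the UTF-16 code units of a string as a list of integers.

    Hypothetical helper for illustration only.
    """
    data = char.encode("utf-16-be")  # big-endian UTF-16, no byte order mark
    # ">H" reads one unsigned 16-bit integer, most significant byte first
    return [unit for (unit,) in struct.iter_unpack(">H", data)]

units = utf16_code_units("\U0001F98B")  # the butterfly character
print([f"{u:04X}" for u in units])  # ['D83E', 'DD8B']
```

A character in the Basic Multilingual Plane, such as "A", gives a single code unit instead of two.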
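- In big-endian encoding, the more significant byte of each code unit comes first, so the byte order is
D8 3E DD 8B
- In little-endian encoding, the less significant byte of each code unit comes first, so the byte order is
3E D8 8B DD
Byte order only swaps the bytes inside each code unit. The order of the code units themselves (D83E first, then DD8B) stays the same.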
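The byte sequences can be checked with Python's built-in codecs "utf-16-be" and "utf-16-le". This is only a small sketch, and the variable names are arbitrary. The last part shows what happens when bytes are decoded with the wrong byte order.

```python
butterfly = "\U0001F98B"  # U+1F98B BUTTERFLY

big = butterfly.encode("utf-16-be")     # big-endian, no byte order mark
little = butterfly.encode("utf-16-le")  # little-endian, no byte order mark

print(big.hex(" ").upper())     # D8 3E DD 8B
print(little.hex(" ").upper())  # 3E D8 8B DD

# Decoding big-endian bytes as little-endian raises no error here.
# It silently produces two unrelated characters.
wrong = big.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in wrong])  # ['U+3ED8', 'U+8BDD']
```

Because a wrong guess can decode without any error, the reader of the data needs to know the byte order in advance or be told by the data itself.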
Origin of the jargon Big-Endian, Little-Endian
The terms big-endian and little-endian for byte order came from an April Fools' joke, ON HOLY WARS AND A PLEA FOR PEACE, written by Danny Cohen.
It alludes to Jonathan Swift's 1726 satire Gulliver's Travels, PART I: A VOYAGE TO LILLIPUT, where the people of Lilliput and Blefuscu fight about which end of an egg to crack first.