Programming Language Design: String Byte vs Code Unit vs Code Point

By Xah Lee. Date: .

string byte vs code unit vs code point

string byte vs code point 2026-01-12 18ee1

The issue is about string implementation. Is a string a sequence of bytes, of code units (2 bytes each in UTF-16), or of code points (unicode characters)? Swift even has “grapheme clusters”, which treat a base character plus its combining characters as a single char.
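To make the three units concrete, here is a minimal Go sketch that measures the same string as UTF-8 bytes, UTF-16 code units, and code points. The function name `unitCounts` is made up for this illustration. Go's standard library has no grapheme cluster segmentation, so that fourth level is only noted in a comment.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// unitCounts (hypothetical helper) measures s three ways:
// UTF-8 bytes, UTF-16 code units, and Unicode code points.
func unitCounts(s string) (nBytes, nUTF16, nCodePoints int) {
	nBytes = len(s)                        // Go strings are bytes, so len counts bytes
	nUTF16 = len(utf16.Encode([]rune(s)))  // re-encode as UTF-16 and count 2-byte units
	nCodePoints = utf8.RuneCountInString(s) // decode UTF-8 and count code points
	return
}

func main() {
	samples := []string{
		"a",          // ASCII letter
		"\u00e9",     // é as one precomposed code point
		"e\u0301",    // e + combining acute accent: looks like one char, is 2 code points
		"\U0001F600", // 😀, outside the Basic Multilingual Plane
	}
	for _, s := range samples {
		b, u, c := unitCounts(s)
		fmt.Printf("%q bytes=%d utf16units=%d codepoints=%d\n", s, b, u, c)
	}
}
```

The emoji is 4 bytes, 2 UTF-16 code units, and 1 code point. The `e` plus combining accent is 2 code points, yet a grapheme-cluster-based string such as Swift's would count it as 1 character.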

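Of the langs that I know, JavaScript sucks. Its strings are sequences of UTF-16 code units, so an emoji takes 2 units, and length, indexing, and most string methods count and cut by code unit, not by character. There is basically no way to deal with strings that contain emoji, unless you manually work around every string method.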
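The problem with code-unit strings can be sketched without JavaScript itself. The following Go code only emulates UTF-16 storage (the helper names `toUTF16` and `firstUnitIsBroken` are invented for this example): it shows that "the first element" of an emoji string stored this way is half of a surrogate pair, not a character.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 (hypothetical helper) returns s as UTF-16 code units,
// the storage model assumed here for a code-unit-based string.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

// firstUnitIsBroken (hypothetical helper) reports whether taking the
// first code unit of s yields a surrogate half instead of a whole character.
func firstUnitIsBroken(s string) bool {
	units := toUTF16(s)
	if len(units) == 0 {
		return false
	}
	return utf16.IsSurrogate(rune(units[0]))
}

func main() {
	s := "\U0001F600" // 😀
	units := toUTF16(s)
	fmt.Println(len(units))           // number of code units, not characters
	fmt.Printf("%#04x\n", units[0])   // the high surrogate half
	fmt.Println(firstUnitIsBroken(s)) // true: index 0 is not a full character
	fmt.Println(firstUnitIsBroken("a"))
}
```

Any method that counts, indexes, or slices by code unit inherits this: it can cut an emoji in the middle and leave an invalid half behind.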

Golang is superb. Basically, a string is considered a sequence of bytes. However, it has a systematic way to deal with unicode strings, by converting it to a slice (aka dynamic array) of “runes” (aka unicode code points).
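Here is a small sketch of that byte view versus rune view. The function `reverseRunes` is a made-up example, not something from the standard library. It converts the string to `[]rune`, works on code points, then converts back.

```go
package main

import "fmt"

// reverseRunes (hypothetical example) reverses s by code point.
// Reversing the raw bytes instead would scramble multi-byte characters.
func reverseRunes(s string) string {
	r := []rune(s) // decode UTF-8 bytes into a slice of code points
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // encode back to a UTF-8 string
}

func main() {
	s := "go\U0001F600" // "go😀"

	fmt.Println(len(s))         // 6: length in bytes
	fmt.Println(len([]rune(s))) // 3: length in code points

	// range over a string decodes runes; i is the byte offset of each rune
	for i, r := range s {
		fmt.Printf("byte offset %d: %c\n", i, r)
	}

	fmt.Println(reverseRunes(s)) // 😀og
}
```

Note that runes are code points, not grapheme clusters, so a base character and its combining mark would still be reversed separately.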

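Other superb ones are Wolfram Language, Emacs Lisp, and Python 3. In all of these, a string is a sequence of unicode characters (code points). PowerShell and F# are a step below: they handle unicode text fine, but their strings are .NET strings, which are sequences of UTF-16 code units, so an emoji counts as length 2 there too.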