Programming Language Design: String Byte vs Code Unit vs Code Point
- software engineering. programming language design.
- very interesting topic.
- which programming languages use the code unit as their string index?
- https://x.com/i/grok/share/qoZp48hkhLcYFPNH3kHHZKTKK
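The issue is about string implementation: what is the unit of a string? Is it a byte, a code unit (1 byte in UTF-8, 2 bytes in UTF-16), or a code point (a Unicode character)? Swift goes even further and uses “grapheme clusters”, which treat a base character plus its combining characters as a single char.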
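To make the three units concrete, here is a minimal sketch in Go that measures the same string in bytes, UTF-16 code units, and code points. The `Counts` type and the `Measure` function are names made up for this illustration. The Go standard library has no grapheme cluster segmentation, so that level is only noted in a comment.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// Counts holds the length of one string measured in three units.
// (Hypothetical type, for illustration only.)
type Counts struct {
	Bytes      int // UTF-8 bytes
	CodeUnits  int // UTF-16 code units (2 bytes each)
	CodePoints int // Unicode code points
}

// Measure returns the length of s in each unit.
// (Hypothetical helper, for illustration only.)
func Measure(s string) Counts {
	return Counts{
		Bytes:      len(s), // Go strings are byte sequences
		CodeUnits:  len(utf16.Encode([]rune(s))),
		CodePoints: utf8.RuneCountInString(s),
	}
}

func main() {
	samples := []string{
		"a",          // ASCII letter
		"\u00e9",     // precomposed e with acute accent
		"e\u0301",    // e + combining acute: 2 code points, 1 grapheme cluster
		"\U0001F600", // emoji outside the Basic Multilingual Plane
	}
	for _, s := range samples {
		fmt.Printf("%q %+v\n", s, Measure(s))
	}
}
```

The emoji is the interesting case: 4 bytes, 2 code units, 1 code point. The combining-accent sample shows why Swift goes one level up: it looks like one character but is 2 code points.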
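Of the langs that I know, JavaScript sucks. Its strings are sequences of UTF-16 code units, so `length`, indexing, and most string methods count an emoji as 2 and can split it in half. There are a few code-point-aware pieces, such as `codePointAt` and iterating the string with `for...of`, but for most other string methods you have to write your own version to deal with strings that contain emoji.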
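The code unit problem can be sketched without running JavaScript. The Go program below only mimics a code-unit-indexed string by encoding to UTF-16 with the standard `unicode/utf16` package. `CodeUnits` and `TruncateToUnits` are hypothetical helpers written for this illustration, not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// CodeUnits returns s as a sequence of UTF-16 code units, which is the
// unit a code-unit-indexed string exposes for length and indexing.
// (Hypothetical helper, for illustration only.)
func CodeUnits(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

// TruncateToUnits keeps the first n code units of s and decodes the
// result back to a Go string. If the cut lands inside a surrogate pair,
// the leftover half is not a valid character, and utf16.Decode replaces
// it with U+FFFD, the replacement character.
// (Hypothetical helper, for illustration only.)
func TruncateToUnits(s string, n int) string {
	units := CodeUnits(s)
	if n < 0 {
		n = 0
	}
	if n > len(units) {
		n = len(units)
	}
	return string(utf16.Decode(units[:n]))
}

func main() {
	emoji := "\U0001F600"
	units := CodeUnits(emoji)

	// One emoji, but two code units: a high and a low surrogate.
	fmt.Printf("%d code units: %#x\n", len(units), units)

	// Cutting after the first code unit breaks the character.
	fmt.Printf("%q\n", TruncateToUnits(emoji, 1))
}
```

Any operation that counts or cuts by code unit (length, index, substring) can hit this, which is why each one needs a code-point-aware replacement.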
Golang is superb. Basically, a string is considered a sequence of bytes. However, it has a systematic way to deal with Unicode strings: convert the string to a slice (aka dynamic array) of “runes” (aka Unicode code points).
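Here is a short sketch of that Go approach. `len` on a string counts bytes, converting to `[]rune` gives code point indexing, and `for range` over a string steps through code points while reporting byte offsets. `Reverse` and `RuneAt` are example functions made up for this post, and `RuneAt` assumes the index is in range.

```go
package main

import "fmt"

// Reverse returns s with its code points in reverse order.
// Working on []rune keeps multi-byte characters intact.
// (Hypothetical helper, for illustration only.)
func Reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// RuneAt returns the i-th code point of s.
// It assumes 0 <= i < number of code points, and panics otherwise.
// (Hypothetical helper, for illustration only.)
func RuneAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "go\U0001F600!"

	fmt.Println("bytes:      ", len(s))         // string length is in bytes
	fmt.Println("code points:", len([]rune(s))) // rune slice length is in code points

	fmt.Printf("third char:  %c\n", RuneAt(s, 2))
	fmt.Println("reversed:   ", Reverse(s))

	// Ranging over a string decodes one code point per step;
	// i is the byte offset where that code point starts.
	for i, r := range s {
		fmt.Printf("byte offset %d: %c (U+%04X)\n", i, r, r)
	}
}
```

The design choice is that the cheap default (bytes) and the Unicode-aware view (runes) are both built in, and the conversion between them is explicit.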
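Other superb ones are Wolfram Language, Emacs Lisp, and Python 3. Their strings are sequences of Unicode characters (code points). PowerShell and F# are less clean: both use the .NET string type, which is a sequence of UTF-16 code units, so an emoji has length 2 there, same as in JavaScript.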