JavaScript String is a Sequence of 16-Bit Values


Is a JavaScript string a sequence of characters, or a sequence of byte values?

A JavaScript string is a sequence of 16-bit values that represent characters in the UTF-16 encoding. ECMAScript §8#sec-8.4 (All JavaScript String Methods treat a string as a sequence of 16-bit values.)

Here's an example that shows the difference.

// example: strings are sequences of 16-bit values, not characters
var aa = "α"; // GREEK SMALL LETTER ALPHA, codepoint 945, U+03B1
var bb = "😸"; // GRINNING CAT FACE WITH SMILING EYES, codepoint 128568, U+1F638

console.log(aa.length); // 1
console.log(bb.length); // 2
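
The cat face lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair of two 16-bit units. Here's a minimal sketch that inspects those units with the standard charCodeAt method; the surrogate values in the comments follow from the UTF-16 encoding of U+1F638.

// each index addresses one 16-bit unit, not one character
var aa = "α"; // U+03B1
var bb = "😸"; // U+1F638

console.log(aa.charCodeAt(0)); // 945; fits in one 16-bit unit

console.log(bb.charCodeAt(0)); // 55357 (0xD83D, high surrogate)
console.log(bb.charCodeAt(1)); // 56888 (0xDE38, low surrogate)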

If you have a Unicode character in a string whose codepoint is 2^16 or larger (a character outside the “Basic Multilingual Plane”, e.g. Unicode Emoticons, Faces 😃 😄 😱 😸 👸 👽 👍), then the result may not be what you expect. For a solution, see: JavaScript: Converting Unicode Character to/from Codepoint.
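
For a taste of codepoint-aware handling, here's a minimal sketch using the standard ES2015 methods codePointAt and String.fromCodePoint, plus for-of iteration, which walks a string by codepoint rather than by 16-bit unit (assuming an ES2015 runtime):

// ES2015 methods work with full codepoints, not 16-bit units
var bb = "😸";

console.log(bb.codePointAt(0)); // 128568 (U+1F638)
console.log(String.fromCodePoint(128568)); // "😸"

// for-of iterates by codepoint, so the surrogate pair counts once
var count = 0;
for (var ch of bb) { count++; }
console.log(count); // 1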

JavaScript compilers don't do Unicode normalization when reading source code. ECMAScript §6#sec-6
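
One consequence: two source strings that render identically can be different 16-bit sequences. Here's a minimal sketch using the standard ES2015 normalize method (assuming an ES2015 runtime):

// “é” can be one codepoint, or “e” plus a combining accent
var cc = "\u00e9";  // LATIN SMALL LETTER E WITH ACUTE
var dd = "e\u0301"; // e + COMBINING ACUTE ACCENT; also renders as é

console.log(cc === dd); // false; different 16-bit sequences
console.log(cc === dd.normalize("NFC")); // true after NFC normalization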

For an intro to Unicode, see: Unicode Basics: What's Character Set, Character Encoding, UTF-8?.
