JavaScript String Is a Sequence of 16-Bit Values

By Xah Lee. Date: . Last updated: .

A JavaScript string is a sequence of 16-bit values, representing characters in UTF-16 encoding. ECMAScript 2015 §ECMAScript Data Types and Values#sec-ecmascript-language-types-string-type

Each element in a string is a 16-bit code unit, not necessarily a whole character.

This matters when a string contains a Unicode character beyond the Basic Multilingual Plane, such as the emoticon 😸. 〔➤see Unicode Emoji 😄 😱 😸 👸 👽 🙋〕

Here's an example that shows the difference.

// strings are sequences of 16-bit values, not characters
var aa = "α"; // GREEK SMALL LETTER ALPHA, 945, U+03B1
var bb = "😸"; // GRINNING CAT FACE WITH SMILING EYES, 128568, U+1F638

console.log(aa.length); // 1
console.log(bb.length); // 2
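
The cat face is stored as a surrogate pair of two 16-bit units. Here's a sketch that reads the raw units with charCodeAt:

// charCodeAt reads raw 16-bit units, not code points
var bb = "😸";

console.log(bb.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(bb.charCodeAt(1).toString(16)); // de38 (low surrogate)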

Another example.

var aa = "αX"; // GREEK SMALL LETTER ALPHA, 945, U+3B1
var bb = "😸X"; // GRINNING CAT FACE WITH SMILING EYES, 128568, U+1F638

// we want to take the first character
console.log(aa.slice(0, 1)); // α
console.log(bb.slice(0, 1)); // � WRONG! slice cut the surrogate pair in half
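
To get whole characters, one option is to work by code point instead of by 16-bit unit, using ES2015's Array.from, spread, and codePointAt. A minimal sketch:

var bb = "😸X";

// the ES2015 string iterator (used by Array.from and spread) walks by code point
console.log(Array.from(bb)[0]); // 😸
console.log([...bb].length); // 2
console.log(bb.codePointAt(0).toString(16)); // 1f638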

Note: JavaScript does not do Unicode normalization when reading source code. ES5 §6#sec-6
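
For example, "é" can be written precomposed (U+00E9) or as "e" followed by a combining acute accent (U+0301); the two strings are not equal unless you normalize explicitly. A minimal sketch, assuming ES2015's String.prototype.normalize:

var cc = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE, precomposed
var dd = "e\u0301"; // e followed by COMBINING ACUTE ACCENT

console.log(cc === dd); // false. no automatic normalization
console.log(cc === dd.normalize("NFC")); // true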

〔➤see JavaScript: Converting Unicode Character to/from Codepoint〕

〔➤see Unicode Basics: What's Character Set, Character Encoding, UTF-8?〕
