JavaScript String Problem

By Xah Lee.

.@BrendanEich can I ask you a JS question? All JavaScript string methods break when the string contains emoji. Is there a plan to fix this? And how?

Xah Talk Show 2019-04-29 JavaScript string problem, unicode, emoji

I understand the workarounds. But the fact remains: all JavaScript string methods fail when a string contains emoji.

So there's no plan in ECMA to fix this?

It's really sad to hear that #JavaScript has no plan to fix its broken string methods. Users need to write their own string methods or rely on libs whenever a string contains a codepoint ≥ 2^16. I can't imagine it's gonna be forever. Modern langs don't have this problem.

The issue isn't just emoji. It's any codepoint ≥ 2^16. "𒀪".length===2. This is a practical problem. I thought we need strict 2015, but it seems Brendan and ECMA are against such an idea.
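
To see what is going on, here is a small sketch, assuming an ES2015 or later engine (a current browser or Node.js). It uses the emoji 😂 (U+1F602), written as the escape \u{1F602} so the source survives any file encoding. The variable names are just for illustration.

```javascript
// A codepoint >= 2^16 is stored as two UTF-16 code units (a surrogate pair).
const face = "\u{1F602}";        // 😂
const mixed = "a\u{1F602}b";     // a😂b

console.log(face.length);                       // 2, counts code units
console.log(face.charCodeAt(0).toString(16));   // d83d, high surrogate
console.log(face.charCodeAt(1).toString(16));   // de02, low surrogate
console.log(face.codePointAt(0).toString(16));  // 1f602, the real codepoint

// Index-based methods cut the pair in half.
console.log(face.slice(0, 1) === face);         // false
console.log(mixed.split("").length);            // 4, not 3
console.log(mixed.indexOf("b"));                // 3, not 2
```

So length, slice, split(""), indexOf and friends all count code units, and the result is off by one for every such character in the string.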

The point here is that, forever in the future, when a string contains a character whose codepoint ≥ 2^16, the programmer cannot rely on any string methods, and has to write his own or use an external lib. I do not see this as acceptable. But it seems that's what the industry is accepting.
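
Here is a minimal sketch of what "write your own" can look like. It relies on the fact that string iteration in ES2015 (Array.from, spread, for...of) walks codepoints, not code units. The function names cpLength, cpSlice, cpReverse are made up for this example and are not part of any standard library.

```javascript
// Hypothetical helpers that count and index by codepoint.
// Array.from(str) splits a string into codepoints, keeping surrogate pairs whole.

function cpLength(str) {
  return Array.from(str).length;
}

function cpSlice(str, start, end) {
  return Array.from(str).slice(start, end).join("");
}

function cpReverse(str) {
  return Array.from(str).reverse().join("");
}

const sample = "a\u{1F602}b";   // a😂b

console.log(sample.length);                              // 4
console.log(cpLength(sample));                           // 3
console.log(cpSlice(sample, 1, 2) === "\u{1F602}");      // true
console.log(cpReverse(sample) === "b\u{1F602}a");        // true
```

Note that each call builds an array of the whole string, so this is slower than the built-in methods. It also works per codepoint only. A character made of several codepoints (for example a letter plus a combining mark) is still counted as more than one.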

Meanwhile, the fancy industry coders talk about new features in JavaScript. Often, I truly don't understand wtf these hackers are thinking.

Someone asked why it's a practical problem. Don't you think a lot of strings will contain codepoints beyond 2^16? E.g. Twitter, Facebook, or any website that allows posts/comments will have to deal with emoji. So they can't rely on string methods. They need to use an external lib or pay special attention to strings.

Sure, such strings are perhaps still less than 0.1% of strings. But on the whole, most software will encounter them. Also, programmers have to know what's in their strings. A high-level lang such as JavaScript with this problem is really broken fundamentally.
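
One way to "know what's in the string" is to test for it. This is a sketch only, assuming an engine that supports the regex u flag (ES2015). The function name hasAstral is invented for this example.

```javascript
// Hypothetical helper: true if the string contains any codepoint >= 2^16.
// The u flag makes the regex match by codepoint and allows \u{...} escapes.
function hasAstral(str) {
  return /[\u{10000}-\u{10FFFF}]/u.test(str);
}

console.log(hasAstral("hello"));              // false
console.log(hasAstral("hi \u{1F602}"));       // true

// The same flag changes what the dot matches.
console.log(/^.$/.test("\u{1F602}"));         // false, dot matches one code unit
console.log(/^.$/u.test("\u{1F602}"));        // true, dot matches one codepoint
```

A program could use such a check to decide when the built-in string methods are safe and when it must take the slow path.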

I think the solution is really a new "use strict 2020". But I recall asking Brendan, and he or the JS future team is against it, for reasons I can't agree with. Reasons like “making things more complicated”. I think the real reason is a perspective induced by corporate interests.

If you are not sure what's going on in this thread, see

JavaScript: Character, Code Unit, Codepoint

String Problem in Java and C#

Turns out, the same problem happens in Java and C#.

/* package whatever; // don't place package name! */

import java.util.*;
import java.lang.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
	public static void main (String[] args) throws java.lang.Exception
	{
		System.out.println("😂".length() == 2); // true: length() counts UTF-16 code units
	}
}

using System;

public class Test
{
	public static void Main()
	{
		Console.WriteLine("😂".Length == 2); // True: Length counts UTF-16 code units
		Console.WriteLine("😂".Substring(0,1) != "😂"); // True: Substring cuts the surrogate pair
	}
}

2019-04-29 thanks to [2019-04-29 ] for the Java and C# code.