JavaScript String Problem

By Xah Lee. Date: . Last updated: .

.@BrendanEich can i ask you a js Q? all javascript string methods break when string contains emoji. is there plan to fix this? and how?

https://twitter.com/xah_lee/status/1122727700387389441

See also: https://twitter.com/FakeUnicode/status/1122820438340427776

xah talk show 2019-04-29 JavaScript string problem, unicode, emoji

I understand workarounds. But the fact remains. All JavaScript string methods “break” when string contains codepoint ≥ 2^16.
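To make “break” concrete, here is a minimal sketch. The sample string and variable names are mine, not from the thread. It shows index-based methods such as `slice`, `charAt`, and `split("")` counting UTF-16 code units, so they can cut a surrogate pair in half, while iteration-based APIs such as `Array.from` walk code points.

```javascript
// "\u{1F602}" is the emoji U+1F602. It is above 0xFFFF, so UTF-16
// stores it as two code units: 0xD83D (high) and 0xDE02 (low).
const s = "a\u{1F602}b";

// Index-based methods count code units and can split the pair.
const firstTwo = s.slice(0, 2); // "a" followed by a lone high surrogate
const charAt1 = s.charAt(1);    // lone high surrogate, not the emoji
const units = s.split("");      // 4 elements, two are lone surrogates

// Reversing by code units swaps the two surrogates and destroys the emoji.
const naiveReversed = s.split("").reverse().join("");

// Array.from uses the string iterator, which yields whole code points.
const points = Array.from(s); // 3 elements: "a", the emoji, "b"
const reversed = Array.from(s).reverse().join("");

console.log(s.length, units.length, points.length); // 4 4 3
console.log(reversed === "b\u{1F602}a");            // true
console.log(naiveReversed === reversed);            // false
```

Note that code points are still not the same as what a user sees as one character: some emoji are sequences of several code points, so this only addresses the surrogate pair part of the problem.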

so no plan in ECMA to fix this?

it's really sad to hear, that #JavaScript has no plan to fix its broken string methods. Users need to write their own string methods or rely on libs, whenever string contains codepoint ≥ 2^16. I can't imagine it's gonna be forever. All modern lang don't have this problem.

the issue isn't just emoji. Any codepoint ≥ 2^16. "𒀪".length === 2. this is a practical problem. I thought we need strict 2015; but seems Brendan and ECMA are against such idea.
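A small sketch of where the 2 comes from, assuming a character above U+FFFF such as the cuneiform sign U+1202A. The helper name `codePointLength` is hypothetical, made up for illustration. `length` counts UTF-16 code units, and any codepoint ≥ 2^16 (0x10000) takes two of them.

```javascript
// Hypothetical helper: count code points instead of UTF-16 code units.
function codePointLength(str) {
  let n = 0;
  for (const _ of str) n++; // for...of iterates a string by code point
  return n;
}

const sign = "\u{1202A}"; // a cuneiform sign, codepoint 0x1202A

console.log(sign.length);                     // 2 (code units)
console.log(codePointLength(sign));           // 1 (code points)
console.log(sign.codePointAt(0) >= 2 ** 16);  // true
console.log(sign.charCodeAt(0).toString(16)); // "d808", the high surrogate
console.log(sign.charCodeAt(1).toString(16)); // "dc2a", the low surrogate
```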

the point here is that, forever in the future, when a string contains a character whose codepoint ≥ 2^16, the programmer cannot rely on any string methods. Has to write his own or use external lib. I do not see this as acceptable. But seems, that's what the industry is accepting.

meanwhile, the fancy industry coders talk about new features in JavaScript. Often, i truly, don't understand wtf these hackers are thinking.

someone asked why it's a practical problem. Don't you think a lot of strings will contain codepoint beyond 2^16? e.g. twitter, facebook, or any website that allows posts/comments, will have to deal with emoji. So they can't rely on string methods. need to use ext lib or special attention to string.
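For example, a site that truncates a comment for a preview hits this right away. The following is only a sketch of the kind of helper one ends up writing. `truncateByCodePoints` is a hypothetical name, not a built-in and not from any particular library.

```javascript
// Hypothetical helper: keep at most `max` code points of `text`.
// Array.from splits the string by code point, so a surrogate pair
// is never cut in half.
function truncateByCodePoints(text, max) {
  return Array.from(text).slice(0, max).join("");
}

const post = "ok \u{1F602}\u{1F602}"; // "ok " followed by two emoji

// Built-in slice counts code units: the result ends in a lone
// high surrogate, which typically displays as a replacement glyph.
const naive = post.slice(0, 4);

// The helper keeps the whole emoji.
const safe = truncateByCodePoints(post, 4);

console.log(naive.length, safe.length);     // 4 5
console.log(safe === "ok \u{1F602}");       // true
console.log(naive.endsWith("\uD83D"));      // true
```

This still works on code points, not on what the user perceives as one character. Emoji made of several code points (flags, skin tone variants, joined sequences) would need more than this.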

Sure, such string is perhaps still less than 0.1% of strings. But on the whole, most software will encounter it. Also, programmers have to know what's in their strings. A high level lang such as JavaScript with this problem is really broken fundamentally.

i think the solution is really a new "use strict 2020". But i recall asking Brendan, he or the js future team is against it for reasons i can't agree. Reasons like “making things more complicated”. i think the real reason is, corporate interests induced perspective.

if you are not sure what's going on in this thread, see

JS: Character, Code Unit, Codepoint

String Problem in Java and C#

turns out, same problem happens in Java and C#.

/* package whatever; // don't place package name! */

import java.util.*;
import java.lang.*;
import java.io.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
	public static void main (String[] args) throws java.lang.Exception
	{
        System.out.println("😂".length() == 2);
        System.out.println(!"😂".substring(0,1).equals("😂"));
	}
}

using System;

public class Test
{
	public static void Main()
	{
        Console.WriteLine("😂".Length == 2);
        Console.WriteLine("😂".Substring(0,1) != "😂");
	}
}

2019-04-29 thanks to https://twitter.com/tjcrowder for the Java and C# code.
