JS: Source Code Encoding

By Xah Lee.

JavaScript source code can contain any Unicode character. However, the language spec does not specify an encoding (for example, UTF-8, UTF-16, or other).

In practice, you should save JavaScript in UTF-8 encoding. UTF-8 is also the standard encoding for web technologies.

Following are the details.

JavaScript Source Code Default Charset?

JavaScript source code's default charset is Unicode.

ECMAScript code is expressed using Unicode, version 8.0.0 or later. ECMAScript source text is a sequence of code points. All Unicode codepoint values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars. The actual encodings used to store and interchange ECMAScript source text is not relevant to this specification.

ECMAScript® 2016 Language Specification#sec-ecmascript-language-source-code

This means JavaScript source code can contain any Unicode character, for example any Chinese character, Egyptian hieroglyph, or emoji.
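As a small illustration, the sketch below puts non-ASCII characters in a string literal and in an identifier. The names `greeting` and `π` are made up for this example, and it assumes the file is saved in an encoding your JavaScript engine reads correctly (UTF-8 here).

```javascript
// Non-ASCII characters in a string literal (Chinese characters and an emoji).
const greeting = "你好 😀";

// Non-ASCII letter used as an identifier (Greek small letter pi).
const π = 3.14159;

// .length counts UTF-16 code units; spreading the string iterates code points.
const codeUnits = greeting.length;
const codePoints = [...greeting].length;

console.log(greeting, π, codeUnits, codePoints);
```

The emoji is outside the Basic Multilingual Plane, so it counts as two code units but one code point.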

However, the spec does not say which encoding is used.

[see Unicode Basics: Character Set, Encoding, UTF-8]

JavaScript Source Code Default Encoding

The JavaScript spec does not specify a default encoding for source code.

[see HTTP Protocol Tutorial]

[see HTML: Character Sets and Encoding]

The actual encodings used to store and interchange ECMAScript source text is not relevant to this specification.

Regardless of the external source text encoding, a conforming ECMAScript implementation processes the source text as if it was an equivalent sequence of SourceCharacter values, each SourceCharacter being a Unicode code point. Conforming ECMAScript implementations are not required to perform any normalization of source text, or behave as though they were performing normalization of source text.

ECMAScript® 2016 Language Specification#sec-ecmascript-language-source-code
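The "no normalization" part of the quote has a practical effect: a character typed in decomposed form stays decomposed. Here is a minimal sketch of that. The escape sequences stand in for the two ways the same visible character could be typed in a source file, and the variable names are invented for the example.

```javascript
// Two spellings of "é", written with escapes so the example
// does not depend on how this file itself is encoded.
const precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// The engine performs no normalization, so these are different strings.
const sameAsIs = precomposed === decomposed;

// Explicit normalization makes them compare equal.
const sameAfterNfc =
  precomposed.normalize("NFC") === decomposed.normalize("NFC");

console.log(sameAsIs, sameAfterNfc); // prints: false true
```

If your code compares text that may come from different sources, calling `normalize` yourself is one way to get consistent results.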

The best practice is to use UTF-8 to encode your JavaScript source file, because it's the most widely used on the web and is backward compatible with ASCII.
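To see what "backward compatible with ASCII" means, the sketch below encodes two strings with the standard `TextEncoder` (available in browsers and in Node.js), which always produces UTF-8. The sample strings and variable names are just for illustration.

```javascript
const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// Each ASCII character becomes one byte, equal to its ASCII code.
const asciiBytes = Array.from(encoder.encode("var x;"));

// A non-ASCII character becomes a multi-byte sequence.
// "\u00E9" is "é", written as an escape so file encoding does not matter here.
const nonAsciiBytes = Array.from(encoder.encode("\u00E9"));

console.log(asciiBytes);    // [ 118, 97, 114, 32, 120, 59 ]
console.log(nonAsciiBytes); // [ 195, 169 ]
```

So a JavaScript file that contains only ASCII characters is byte-for-byte the same whether you call it ASCII or UTF-8.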

If your code uses non-English characters, you might want to start your JavaScript file with this line:

// -*- coding: utf-8 -*-

This line indicates what encoding a file uses. It is a convention recognized by Python, Ruby, Emacs Lisp, and many editors.

Browsers ignore this line completely, because to them it is just a comment. It is useful only to your editor.

However, you still need to save the file in UTF-8 encoding. Typically, you can set your editor (such as Emacs, Vim, Xcode, Microsoft Visual Studio, etc.) to save files in UTF-8.
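If you are not sure whether a file was really saved as UTF-8, one possible check is to decode its bytes with `TextDecoder` in fatal mode, which throws on malformed input. The helper name `isValidUtf8` is hypothetical, and the byte arrays below are hand-made samples, not read from a real file.

```javascript
// Hypothetical helper: returns true if the bytes decode cleanly as UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

const utf8Bytes = new Uint8Array([0xc3, 0xa9]); // "é" as saved in UTF-8
const latin1Bytes = new Uint8Array([0xe9]);     // "é" as saved in Latin-1

console.log(isValidUtf8(utf8Bytes), isValidUtf8(latin1Bytes)); // prints: true false
```

In Node.js you could pass the Buffer returned by `fs.readFileSync` to this helper, since a Buffer is a `Uint8Array`. Note that a pure ASCII file also passes, because ASCII is valid UTF-8.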

[see Python: Unicode Tutorial 🐍]

[see Ruby: Unicode Tutorial 💎]

[see Emacs: Unicode Tutorial]
