Ruby: Unicode Tutorial 💎

By Xah Lee.

Ruby has robust support for Unicode, starting with version 1.9. This page is about Ruby 1.9 or later.

Source Code Encoding and Default Encoding for String

Start your file with the comment # -*- coding: utf-8 -*- on the first line (or on the second line, if the first line is a shebang line). This sets the source code's encoding to UTF-8. Such a comment is called a magic comment.

Any of the following forms also works.

# -*- coding: utf-8 -*-  # also the Emacs and Python convention
# -*- coding: UTF-8 -*-

# coding: utf-8
# coding: UTF-8

# encoding: utf-8
# encoding: UTF-8

p __ENCODING__ # prints #<Encoding:UTF-8>

Ruby String = Bytes + Encoding Info

In Ruby, each string is an object that carries info about its encoding. Use the method encoding to find a string's encoding.

# -*- coding: utf-8 -*-
# ruby

p "abc♥".encoding # #<Encoding:UTF-8>

p "abc♥".encoding.name # UTF-8

p "abc♥".size # 4

p "abc♥".bytesize # 6
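
The difference between size and bytesize is easier to see when you look at the bytes and code points directly. The following is a small sketch, assuming a UTF-8 source file. The variable names are only for illustration.

```ruby
# encoding: utf-8

s = "abc♥"

# ♥ is U+2665. It is one character but three bytes in UTF-8.
# (.to_a is only needed on Ruby 1.9, where these methods return enumerators.)
p s.bytes.to_a       # [97, 98, 99, 226, 153, 165]
p s.codepoints.to_a  # [97, 98, 99, 9829]

# The same bytes with the binary encoding label count one "character" per byte.
raw = s.dup.force_encoding("ASCII-8BIT")
p raw.size           # 6
```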

Change a String's Encoding Info

Use the method force_encoding to change a string's encoding info. This does not convert the string: the bytes stay the same, and only the encoding metadata changes. The string is modified in place.

# -*- coding: utf-8 -*-
# ruby

ss = "α"
p ss # "α"

ss.force_encoding("GB18030")
p ss # "\x{CEB1}"
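
To confirm that force_encoding only relabels the bytes, you can compare the bytes before and after. This is a minimal sketch. ISO-8859-1 and US-ASCII are picked here only for illustration, and the variable names are made up.

```ruby
# encoding: utf-8

original  = "α"                                     # UTF-8 bytes: CE B1
relabeled = original.dup.force_encoding("ISO-8859-1")

p original.bytes.to_a == relabeled.bytes.to_a       # true, same bytes
p original.size                                     # 1, one UTF-8 character
p relabeled.size                                    # 2, two ISO-8859-1 characters

# A label that does not fit the bytes leaves the string invalid.
broken = original.dup.force_encoding("US-ASCII")
p broken.valid_encoding?                            # false
```

The method valid_encoding? is a quick way to check whether a label matches the bytes.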

Convert Encoding for a String

Use the method encode! to convert a string to another encoding in place.

Use the method encode to get a converted copy. The original string is not modified.

# -*- coding: utf-8 -*-
# ruby

ss = "α"
p ss.encoding.name # "UTF-8"

ss.encode!("GB18030")
p ss.encoding.name # "GB18030"

p ss.encode("utf-8") # "α"
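
Unlike force_encoding, encode really changes the bytes. The sketch below shows that, and also what happens when the target encoding has no such character. UTF-16BE and ISO-8859-1 are chosen only as examples.

```ruby
# encoding: utf-8

s = "α"                              # U+03B1

utf16 = s.encode("UTF-16BE")
p s.bytes.to_a                       # [206, 177]
p utf16.bytes.to_a                   # [3, 177]
p s.encoding.name                    # "UTF-8", encode left s untouched

# ISO-8859-1 has no α, so a plain encode raises an error.
begin
  s.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  p e.class                          # Encoding::UndefinedConversionError
end

# Options can substitute a replacement string instead of raising.
fallback = s.encode("ISO-8859-1", undef: :replace, replace: "?")
p fallback                           # "?"
```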

Default Encoding of Strings

By default, a string literal has the same encoding as your source code.
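
A quick way to check this is to compare a literal's encoding with __ENCODING__. This is a small sketch, assuming the file is saved as UTF-8.

```ruby
# encoding: utf-8

# A string literal takes the encoding of the source file it appears in.
literal = "abc♥"
p __ENCODING__                        # #<Encoding:UTF-8>
p literal.encoding == __ENCODING__    # true
```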

When you open a file, you can specify its external encoding like this:

# -*- coding: utf-8 -*-
# ruby

ff = File.open( "file.txt", "r:UTF-8") # open a file for reading, specifying an encoding

ff.each {|xx| p xx } # print each line
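
The example above needs an existing file.txt. Here is a self-contained sketch that first creates a sample file in a temporary directory, then reads it back. The helper name read_utf8_lines and the sample content are made up for illustration.

```ruby
# encoding: utf-8
require "tmpdir"

# Returns the lines of the file at path, read with external encoding UTF-8.
def read_utf8_lines(path)
  File.open(path, "r:UTF-8") do |ff|   # the block form closes the file for you
    ff.readlines
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "file.txt")    # hypothetical sample file
  File.open(path, "w:UTF-8") { |ff| ff.write("α\nβ\n") }

  lines = read_utf8_lines(path)
  p lines.size                         # 2
  p lines.first.encoding.name          # "UTF-8"
end
```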

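Ruby has the concepts of external encoding (the encoding of the file's content) and internal encoding (the encoding used when the content becomes a Ruby string).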

Example:

# -*- coding: utf-8 -*-
# ruby

# Read a file.
# Tell Ruby the encoding of the file (external encoding).
# Tell Ruby the encoding to use when the content becomes a Ruby string (internal encoding).

ff = File.open( "file.txt", "r:UTF-8:UTF-16")

p ff.external_encoding.name # UTF-8
p ff.internal_encoding.name # UTF-16

# read a line, save to ss
ss = ff.readline

# print ss's encoding
p ss.encoding.name # UTF-16
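
Here is another sketch of the same idea, with a file that is stored in one encoding and read as another. It assumes an ISO-8859-1 file converted to UTF-8 on read. The helper name read_latin1_as_utf8 and the sample file are made up for illustration.

```ruby
# encoding: utf-8
require "tmpdir"

# Reads a file stored as ISO-8859-1 and returns its content as a UTF-8 string.
def read_latin1_as_utf8(path)
  File.open(path, "r:ISO-8859-1:UTF-8") do |ff|
    p ff.external_encoding.name   # "ISO-8859-1"
    p ff.internal_encoding.name   # "UTF-8"
    ff.read
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")   # hypothetical sample file
  # Write the raw bytes of "café" in ISO-8859-1, where é is the single byte 0xE9.
  File.open(path, "wb") { |ff| ff.write([0x63, 0x61, 0x66, 0xE9].pack("C*")) }

  text = read_latin1_as_utf8(path)
  p text.encoding.name                  # "UTF-8"
  p text.bytes.to_a                     # [99, 97, 102, 195, 169]
end
```

The file holds 4 bytes, but the string has 5, because é takes two bytes in UTF-8.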

When writing to a file, you can also specify an encoding, like this: open("output.txt", "w:GB18030").
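
Here is a minimal sketch of writing with an encoding. Ruby converts the string to that encoding as it is written. The helper name write_as and the sample character are made up for illustration.

```ruby
# encoding: utf-8
require "tmpdir"

# Writes text to path, converting it to the named encoding on the way out.
def write_as(path, text, encoding_name)
  File.open(path, "w:#{encoding_name}") { |ff| ff.write(text) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "output.txt")
  write_as(path, "中", "GB18030")     # 中 is 3 bytes in UTF-8

  raw = File.open(path, "rb") { |ff| ff.read }   # raw bytes, no decoding
  p raw.bytesize                                 # 2, the GB18030 form is shorter

  # Label the bytes as GB18030 and convert back to check the round trip.
  p raw.force_encoding("GB18030").encode("UTF-8") == "中"   # true
end
```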

The Encoding Class; Supported Encodings

You can print all supported encodings like this:

# -*- coding: utf-8 -*-
# ruby

Encoding.list.each { |xx| p xx.name }

Sample output (the list varies by Ruby version):

"ASCII-8BIT"
"UTF-8"
"US-ASCII"
"Big5"
"Big5-HKSCS"
"Big5-UAO"
"CP949"
"Emacs-Mule"
"EUC-JP"
"EUC-KR"
"EUC-TW"
"GB18030"
"GBK"
"ISO-8859-1"
"ISO-8859-2"
"ISO-8859-3"
"ISO-8859-4"
"ISO-8859-5"
"ISO-8859-6"
"ISO-8859-7"
"ISO-8859-8"
"ISO-8859-9"
"ISO-8859-10"
"ISO-8859-11"
"ISO-8859-13"
"ISO-8859-14"
"ISO-8859-15"
"ISO-8859-16"
"KOI8-R"
"KOI8-U"
"Shift_JIS"
"UTF-16BE"
"UTF-16LE"
"UTF-32BE"
"UTF-32LE"
"Windows-1251"
"IBM437"
"IBM737"
"IBM775"
"CP850"
"IBM852"
"CP852"
"IBM855"
"CP855"
"IBM857"
"IBM860"
"IBM861"
"IBM862"
"IBM863"
"IBM864"
"IBM865"
"IBM866"
"IBM869"
"Windows-1258"
"GB1988"
"macCentEuro"
"macCroatian"
"macCyrillic"
"macGreek"
"macIceland"
"macRoman"
"macRomania"
"macThai"
"macTurkish"
"macUkraine"
"CP950"
"CP951"
"stateless-ISO-2022-JP"
"eucJP-ms"
"CP51932"
"GB2312"
"GB12345"
"ISO-2022-JP"
"ISO-2022-JP-2"
"CP50220"
"CP50221"
"Windows-1252"
"Windows-1250"
"Windows-1256"
"Windows-1253"
"Windows-1255"
"Windows-1254"
"TIS-620"
"Windows-874"
"Windows-1257"
"Windows-31J"
"MacJapanese"
"UTF-7"
"UTF8-MAC"
"UTF-16"
"UTF-32"
"UTF8-DoCoMo"
"SJIS-DoCoMo"
"UTF8-KDDI"
"SJIS-KDDI"
"ISO-2022-JP-KDDI"
"stateless-ISO-2022-JP-KDDI"
"UTF8-SoftBank"
"SJIS-SoftBank"
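
Instead of reading the whole list, you can look up one encoding by name. The following is a small sketch that uses Encoding.find and Encoding.name_list. The variable names are only for illustration.

```ruby
# Look up an encoding by name. The lookup is case-insensitive.
utf8 = Encoding.find("utf-8")
p utf8 == Encoding::UTF_8                            # true

# Check whether a particular encoding is available before relying on it.
names = Encoding.list.map { |enc| enc.name }
p names.include?("GB18030")                          # true

# Encoding.name_list also includes aliases.
# "BINARY" and "ASCII-8BIT" name the same encoding.
p Encoding.name_list.include?("BINARY")              # true
p Encoding.find("BINARY") == Encoding::ASCII_8BIT    # true
```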

If you are not familiar with Unicode, see "Unicode Basics: What's Character Set, Character Encoding, UTF-8?"
