Ruby: Unicode Tutorial 💎

By Xah Lee.

Ruby has robust support for Unicode, starting with version 1.9. This page is about Ruby 1.9 or later.

Source Code Encoding and Default Encoding for String

Ruby String = Bytes + Encoding Info

In Ruby, each string is an object that carries its encoding along with its bytes. Use the method encoding to find a string's encoding.

# ruby

p "abc♥".encoding == Encoding::UTF_8
p "abc♥".encoding.name == "UTF-8"

# size counts characters, bytesize counts bytes
p "🤡".size == 1
p "🤡".bytesize == 4
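
To make the "bytes + encoding info" idea concrete, here is a minimal sketch. The variable names are made up for illustration, and it assumes the source file is UTF-8 (the default since Ruby 2.0). It shows that force_encoding changes only the encoding label, while the bytes stay the same.

```ruby
# A string is a byte sequence plus an encoding label.
heart = "♥"

heart_bytes = heart.bytes          # the raw UTF-8 bytes of U+2665
p heart_bytes                      # [226, 153, 165]

# force_encoding changes only the label, never the bytes.
# dup first, so the original string keeps its UTF-8 label.
raw = heart.dup.force_encoding(Encoding::BINARY)

p raw.bytes == heart_bytes         # true, same bytes
p heart.size                       # 1, one character when read as UTF-8
p raw.size                         # 3, each byte counts when read as binary

# valid_encoding? checks whether the bytes are legal under the label.
broken = "\xE2\x99".dup.force_encoding(Encoding::UTF_8)
p broken.valid_encoding?           # false, the 3-byte sequence is cut short
```

Note that force_encoding does no conversion at all. To actually convert the bytes, use encode, described next.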

Convert Encoding for a String

Use the method encode! to convert a string's encoding in place.

Use the method encode to get a converted copy. The original string is not modified.

# ruby

ss = "α"
p ss.encoding.name == "UTF-8"

ss.encode!("GB18030")
p ss.encoding.name == "GB18030"

p ss.encode("utf-8") # "α"
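
The following sketch, with illustrative variable names, shows two things the example above does not: the same character in two encodings does not compare as equal, and converting a character that the target encoding lacks raises an error unless you ask for a replacement. It assumes a UTF-8 source file.

```ruby
original = "α"                         # UTF-8 literal

# encode returns a new string; the receiver is left alone.
gb = original.encode("GB18030")
p original.encoding.name               # "UTF-8"
p gb.encoding.name                     # "GB18030"

# Same character, different bytes: the two strings are not equal.
p original == gb                       # false
round_trip = gb.encode("UTF-8")
p round_trip == original               # true

# A character missing from the target encoding raises an error
conversion_error = nil
begin
  original.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
  p e.class
end

# unless a replacement is requested.
fallback = original.encode("ISO-8859-1", undef: :replace, replace: "?")
p fallback                             # "?"
```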

Default Encoding of Strings

A string literal gets its encoding from the encoding of your source code file, which is UTF-8 by default since Ruby 2.0.
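
A quick way to check this is the keyword __ENCODING__, which returns the encoding of the current source file. The sketch below uses made-up variable names and also shows that strings not written as literals can carry a different encoding.

```ruby
# __ENCODING__ is the encoding Ruby uses for the current source file.
source_encoding = __ENCODING__
p source_encoding

# A plain string literal takes that encoding.
literal = "abc"
p literal.encoding == source_encoding     # true

# A string built from raw numbers is labeled as binary instead.
packed = [97, 98, 99].pack("C*")
p packed.encoding == Encoding::BINARY     # true
```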

When you open a file, you can specify its external encoding like this:

# ruby

# open a file for reading, specifying its encoding
ff = File.open( "filename", "r:UTF-8")

# print each line
ff.each {|xx| p xx }
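
Here is a self-contained variant you can run as is. It is only a sketch: it first creates a small sample file in a temporary directory (the file name sample.txt is invented for the demo), then reads it back with an explicit encoding. It uses the block form of File.open, which closes the file automatically.

```ruby
require "tmpdir"

lines_read = []

Dir.mktmpdir do |dir|
  # sample file, created only for this demo
  path = File.join(dir, "sample.txt")
  File.open(path, "w:UTF-8") { |out| out.write("α\nβ♥\n") }

  # Block form: the file is closed when the block ends.
  File.open(path, "r:UTF-8") do |ff|
    ff.each_line { |line| lines_read << line }
  end
end

# show encoding, character count, and byte count of each line
lines_read.each { |line| p [line.encoding.name, line.size, line.bytesize] }
```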

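Ruby has the concepts of external encoding (the encoding of the data in the file) and internal encoding (the encoding the data is converted to when it is read into a Ruby string).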

Example:

# ruby

# read a file.
# the first encoding tells Ruby how the file is encoded (external)
# the second tells Ruby what encoding the content should have as a Ruby string (internal)

ff = File.open( "filename", "r:UTF-8:UTF-16")

p ff.external_encoding.name # "UTF-8"
p ff.internal_encoding.name # "UTF-16"

# read a line, save to ss
ss = ff.readline

# print ss's encoding
p ss.encoding.name # "UTF-16"
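
A common use of this is reading a legacy file and getting UTF-8 strings in memory. The sketch below assumes a file stored in GB18030. It creates that file itself in a temporary directory, under the invented name gb.txt, so the example can run on its own.

```ruby
require "tmpdir"

line = nil
external_name = nil
internal_name = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "gb.txt")   # file name invented for the demo
  # Store "α\n" as GB18030 bytes. binwrite writes the bytes unchanged.
  File.binwrite(path, "α\n".encode("GB18030"))

  # external = GB18030 (what is on disk)
  # internal = UTF-8 (what we want in memory)
  File.open(path, "r:GB18030:UTF-8") do |ff|
    external_name = ff.external_encoding.name
    internal_name = ff.internal_encoding.name
    line = ff.readline
  end
end

p external_name        # "GB18030"
p internal_name        # "UTF-8"
p line.encoding.name   # "UTF-8"
p line == "α\n"        # true, the content was converted while reading
```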

When writing to a file, you can also specify an encoding, like this: open("output.txt", "w:GB18030").
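
To see that the conversion really happens on write, this sketch writes a UTF-8 string through a "w:GB18030" file handle in a temporary directory and then inspects the raw bytes on disk. The variable names are for illustration only.

```ruby
require "tmpdir"

bytes_on_disk = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "output.txt")

  # The string is UTF-8 in memory; Ruby converts it while writing.
  File.open(path, "w:GB18030") { |out| out.write("α") }

  # binread returns the file content without any decoding.
  bytes_on_disk = File.binread(path).bytes
end

p bytes_on_disk == "α".encode("GB18030").bytes   # true
p bytes_on_disk == "α".bytes                     # false, not the UTF-8 bytes
```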

Supported Encoding

For the list of encodings Ruby supports, see Ruby: Char Encoding.
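
You can also ask your own Ruby which encodings it knows. A minimal sketch, with invented variable names; the exact count and the alias names vary by Ruby version, so no output is shown for those lines.

```ruby
# number of encodings this Ruby knows about
encoding_count = Encoding.list.size
p encoding_count

# check for one by name
has_gb = Encoding.name_list.include?("GB18030")
p has_gb                                  # true

# look up an encoding object by name (case does not matter)
utf8 = Encoding.find("utf-8")
p utf8 == Encoding::UTF_8                 # true

# all names and aliases of one encoding
p Encoding::UTF_8.names

# an unknown name raises ArgumentError
lookup_error = nil
begin
  Encoding.find("no-such-encoding")
rescue ArgumentError => e
  lookup_error = e
  p e.message
end
```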
