Ruby: Unicode Tutorial 💎
Ruby has robust Unicode support starting with version 1.9. This page covers Ruby 1.9 and later.
Source Code Encoding and Default Encoding for Strings
Ruby String = Bytes + Encoding Info
In Ruby, every string is an object that carries both its bytes and its encoding. Use the method encoding to find a string's encoding. The method size counts characters, while bytesize counts bytes.
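# ruby
p "abc♥".encoding == Encoding::UTF_8   # true
p "abc♥".encoding.name == "UTF-8"      # true
# size counts characters; bytesize counts bytes
p "🤡".size == 1                       # true: one character (U+1F921)
p "🤡".bytesize == 4                   # true: 4 bytes in UTF-8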
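Because a string is just bytes plus an encoding label, the two parts can be inspected separately. Here is a minimal sketch using the standard String methods bytes, force_encoding and valid_encoding?. Note that force_encoding only changes the label and leaves the bytes alone, unlike encode.

```ruby
# encoding: utf-8
heart = "♥"                 # U+2665 BLACK HEART SUIT

p heart.bytes               # [226, 153, 165]: the raw UTF-8 bytes
p heart.size                # 1: one character
p heart.bytesize            # 3: three bytes

# force_encoding keeps the bytes and only swaps the label,
# so the same 3 bytes are now counted as 3 one-byte "characters".
raw = heart.dup.force_encoding(Encoding::BINARY)
p raw.encoding == Encoding::ASCII_8BIT   # true (BINARY is an alias)
p raw.size                               # 3
p raw.bytes == heart.bytes               # true: no bytes changed

# A label that does not fit the bytes gives an invalid string.
p heart.dup.force_encoding("US-ASCII").valid_encoding?  # false
```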
Convert Encoding for a String
Use the method encode! to convert a string's encoding in place (the string itself is modified).
Use the method encode to get a converted copy; the original string is not modified.
# ruby
ss = "α"
p ss.encoding.name == "UTF-8"    # true
ss.encode!("GB18030")            # converts ss in place
p ss.encoding.name == "GB18030"  # true
p ss.encode("utf-8")             # "α" (a new UTF-8 string; ss stays GB18030)
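Not every character exists in every encoding. A plain encode raises Encoding::UndefinedConversionError when the target encoding has no mapping for a character. The sketch below uses ISO-8859-1 as an example target, which has no ♥. The undef: and replace: options of String#encode substitute a placeholder instead of raising.

```ruby
# encoding: utf-8
text = "abc♥"

# ISO-8859-1 (Latin-1) has no ♥, so a plain encode raises.
begin
  text.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "cannot convert: #{e.message}"
end

# With undef: :replace, unmappable characters become the replace string.
latin = text.encode("ISO-8859-1", undef: :replace, replace: "?")
p latin.encoding.name     # "ISO-8859-1"
p latin == "abc?"         # true

# encode returns a new string; the original keeps its encoding.
p text.encoding.name      # "UTF-8"

# GB18030 covers all of Unicode, so a round trip gives back the same text.
p text.encode("GB18030").encode("UTF-8") == text   # true
```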
Default Encoding of Strings
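The default encoding of a string literal is your source code encoding. Since Ruby 2.0, source files are assumed to be UTF-8. In Ruby 1.9 the default was US-ASCII, so a file with non-ASCII literals needed a magic comment such as # -*- coding: utf-8 -*- on its first line.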
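The source encoding of the current file is available as __ENCODING__, and string literals take it on. Strings created at run time can carry other encodings, though. The two examples below, String.new and Integer#to_s, are just illustrations.

```ruby
# encoding: utf-8
# The magic comment above must be on the first line (or the second,
# after a shebang line) to take effect.

p __ENCODING__                          # the source encoding of this file
p "literal".encoding == __ENCODING__    # true: literals take the source encoding

# Strings built at run time may carry other encodings.
p String.new.encoding == Encoding::ASCII_8BIT   # true: an empty binary buffer
p 42.to_s.encoding == Encoding::US_ASCII        # true
```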
When you open a file, you can specify its external encoding like this:
# ruby
# open a file for reading, specifying its encoding;
# the block form closes the file automatically
File.open("filename", "r:UTF-8") do |ff|
  # print each line
  ff.each_line {|xx| p xx }
end
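The external encoding only tells Ruby how to label the bytes it reads. Nothing is converted. The sketch below demonstrates this under one assumption: it writes a hypothetical temporary file, alpha.txt, containing GB18030 bytes. It then reads that file once with the correct label and once with a wrong one.

```ruby
# encoding: utf-8
require "tmpdir"

gb, bad = Dir.mktmpdir do |dir|
  path = File.join(dir, "alpha.txt")          # hypothetical file name
  File.binwrite(path, "α".encode("GB18030"))  # write raw GB18030 bytes

  [
    File.open(path, "r:GB18030", &:read),     # correct external encoding
    File.open(path, "r:UTF-8", &:read)        # wrong label, same bytes
  ]
end

p gb.encoding.name        # "GB18030"
p gb.valid_encoding?      # true
p bad.encoding.name       # "UTF-8"
p bad.valid_encoding?     # false: the bytes were labeled, not converted
p bad.bytes == gb.bytes   # true
```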
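A Ruby IO object (such as an open file) has two encodings:
- external_encoding → the encoding of the data in the external file
- internal_encoding → the encoding to convert the data to when it becomes a Ruby string (nil if none is set, in which case strings keep the external encoding)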
Example:
# ruby
# read a file.
# Tell Ruby the encoding of the file (external): UTF-8
# Tell Ruby the encoding to use when the content becomes a Ruby string (internal): UTF-16
ff = File.open("filename", "r:UTF-8:UTF-16")
p ff.external_encoding.name # "UTF-8"
p ff.internal_encoding.name # "UTF-16"
# read a line, save to ss
ss = ff.readline
# print ss's encoding
p ss.encoding.name # "UTF-16"
ff.close
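A common real-world case runs in the opposite direction: a file in a legacy encoding is converted to UTF-8 as it is read. The sketch below uses GB18030 as the external encoding and writes its own hypothetical temporary file, gb.txt. Passing "r:GB18030:UTF-8" makes every string returned by readline arrive already in UTF-8.

```ruby
# encoding: utf-8
require "tmpdir"

line, ext, int = Dir.mktmpdir do |dir|
  path = File.join(dir, "gb.txt")                  # hypothetical file name
  File.binwrite(path, "αβγ\n".encode("GB18030"))   # GB18030 bytes on disk

  File.open(path, "r:GB18030:UTF-8") do |f|
    [f.readline, f.external_encoding, f.internal_encoding]
  end
end

p ext.name            # "GB18030": encoding of the bytes on disk
p int.name            # "UTF-8": encoding of strings handed to Ruby
p line.encoding.name  # "UTF-8"
p line == "αβγ\n"     # true
```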
When writing to a file, you can also specify an encoding. Ruby then converts strings to that encoding as they are written, like this: File.open("output.txt", "w:GB18030").
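To confirm that the conversion really happens on write, the following sketch writes a UTF-8 string through a "w:GB18030" file. It then checks the raw bytes on disk. The temporary directory and the use of output.txt inside it are assumptions made so the example is self-contained.

```ruby
# encoding: utf-8
require "tmpdir"

written, back = Dir.mktmpdir do |dir|
  path = File.join(dir, "output.txt")
  # "w:GB18030": the UTF-8 string is converted to GB18030 as it is written.
  File.open(path, "w:GB18030") { |f| f.write("α") }
  # Read the raw bytes back, and also decode them back to UTF-8.
  [File.binread(path), File.open(path, "r:GB18030:UTF-8", &:read)]
end

p written.bytes == "α".encode("GB18030").bytes  # true: GB18030 bytes on disk
p written.bytesize                              # 2 (UTF-8 "α" is also 2 bytes, but different ones)
p back == "α"                                   # true
```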