Golang: String, Byte Slice, Rune Slice

By Xah Lee.

Why You Need to Convert Between String, Byte Slice, Rune Slice

One annoying thing about golang is that you have to constantly convert between {string, byte slice, rune slice}.

They hold the same text data, in 3 different formats.

String is a nice way to deal with a short sequence of bytes or ASCII characters. String is immutable: every time you change it, such as find and replace or joining strings, a new string is created. This is very inefficient if the string is huge, such as file content. 〔see Golang: String〕

Byte slice is just like string, but mutable. You can modify each byte (or ASCII character) in place. This is very efficient for working with file content, whether text file or binary file. 〔see Golang: Slice〕

Rune slice is like byte slice, except that each index is a character (a Unicode code point) instead of a byte. This is best if you work with text files that have lots of non-ASCII characters, such as Chinese, Unicode math symbols π² ∞ ∫, or Unicode emoji 😄.
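Here's a small sketch of the 3 formats side by side. It shows that a byte slice and a rune slice can be modified in place, while the original string stays the same, and that the same text has a different length in bytes and in characters. The function name describe is made up for this example.

```go
package main

import "fmt"

// describe is a hypothetical helper for this example.
// It returns the length of the same text as byte slice and as rune slice.
func describe(s string) (numBytes, numRunes int) {
	return len([]byte(s)), len([]rune(s))
}

func main() {
	s := "aπ"

	// string is immutable: s[0] = 'b' does not compile

	// byte slice is a mutable copy of the bytes in s
	bs := []byte(s)
	bs[0] = 'b'
	fmt.Println(string(bs)) // bπ
	fmt.Println(s)          // aπ

	// rune slice has one element per character
	rs := []rune(s)
	rs[1] = '∞'
	fmt.Println(string(rs)) // a∞

	nb, nr := describe(s)
	fmt.Println(nb, nr) // 3 2
}
```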

Convert Between String, Byte Slice, Rune Slice

Here are common solutions for working with a golang string as a character sequence.

[]byte(str)

String to byte slice.

string(byteSlice)

Byte slice to string.

[]rune(str)

String to rune slice.

string(runeSlice)

Rune slice to string.

[]rune(string(byteSlice))

Byte slice to rune slice.

[]byte(string(runeSlice))

Rune slice to byte slice.

utf8.RuneCount(byteSlice)

Return the count of characters in byteSlice.

utf8.RuneCountInString(str)

Return the count of characters in string. (“character” here means rune.)

String To Byte Slice

[]byte(str)

package main

import "fmt"

func main() {

	var x = "abc→"

	// convert string to byte slice
	var bs = ([]byte)(x)

	fmt.Printf("%v\n", bs) // [97 98 99 226 134 146]

}

Byte Slice to String

string(byteSlice)

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to string
	var str = string(bs)

	fmt.Printf("%v\n", str) // abc→

	// print type
	fmt.Printf("%T\n", str) // string

}

String To Rune Slice

[]rune(str)

package main

import "fmt"

func main() {

	var str = "abc→"

	// convert string to rune slice
	var rs = []rune(str)

	fmt.Printf("%v\n", rs) // [97 98 99 8594]

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune Slice To String

string(runeSlice)

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// convert rune slice to string
	var str = string(rs)

	fmt.Printf("%#v\n", str) // "abc→"

	// print type
	fmt.Printf("%T\n", str) // string

}

Byte Slice To Rune Slice

[]rune(string(byteSlice))

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to rune slice
	var rs = []rune(string(bs))

	for _, v := range rs {
		fmt.Printf("%c", v)
	}
	// abc→

	fmt.Printf("\n")

	// print type
	fmt.Printf("%T\n", rs) // []int32

}
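As an alternative to going through a string, the standard library package bytes has the function bytes.Runes, which takes a byte slice and returns a rune slice. Here's a minimal sketch. The wrapper name toRunes is made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// toRunes is a hypothetical wrapper, for illustration only.
// bytes.Runes interprets the bytes as UTF-8 and returns the runes.
func toRunes(bs []byte) []rune {
	return bytes.Runes(bs)
}

func main() {
	var bs = []byte("abc→")

	var rs = toRunes(bs)

	fmt.Printf("%q\n", rs) // ['a' 'b' 'c' '→']
}
```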

Rune Slice To Byte Slice

[]byte(string(runeSlice))

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// print type
	fmt.Printf("%T\n", rs) // []int32

	// convert rune slice to byte slice
	var bs = []byte(string(rs))

	fmt.Printf("%#v\n", bs)
	// []byte{0x61, 0x62, 0x63, 0xe2, 0x86, 0x92}

	fmt.Printf("%d\n", bs)
	// [97 98 99 226 134 146]

	fmt.Printf("%q\n", bs)
	// "abc→"

}

Count of Characters

To count the number of characters, there are a few ways:

Use import "unicode/utf8"

utf8.RuneCount(byteSlice)

Return the count of characters in byteSlice.

utf8.RuneCountInString(str)

Return the count of characters in string. (“character” here means rune.)

Or convert it to a rune slice, then call len, e.g. len([]rune("I ♥ U"))

package main

import "fmt"
import "unicode/utf8"

func main() {
	var x = "I ♥ U"

	// number of bytes
	fmt.Printf("%v\n", len(x)) // 7

	// number of characters
	fmt.Printf("%v\n", utf8.RuneCountInString(x)) // 5
}
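Here's a sketch that counts the characters of the same text in all 3 ways, to show they agree. The function name countAll is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countAll is a hypothetical helper for this example.
// It returns the character count of s, computed 3 different ways.
func countAll(s string) (fromString, fromBytes, fromRunes int) {
	fromString = utf8.RuneCountInString(s)
	fromBytes = utf8.RuneCount([]byte(s))
	fromRunes = len([]rune(s))
	return fromString, fromBytes, fromRunes
}

func main() {
	a, b, c := countAll("I ♥ U")
	fmt.Println(a, b, c) // 5 5 5
}
```

The first two do not create a rune slice, so they are probably the better choice when you only need the count.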

Substring by Character Index

To get a substring with proper character boundaries, convert it to rune slice first. Like this:

package main

import "fmt"

func main() {

	// string of non-ascii chars
	var x = "♥😂→★🍎"

	// convert to rune slice
	var y = []rune(x)

	// take a slice from index 2 to 3
	var z = y[2:4]

	// print as chars
	fmt.Printf("%q\n", z) // ['→' '★']

	// print in go syntax
	fmt.Printf("%#v\n", z) // []int32{8594, 9733}

}
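If you need the result as a string, the same idea can be put in a small function. This is just a sketch. The name substr and the choice to clamp out of range indexes (instead of panic) are assumptions of this example, not something from the golang library.

```go
package main

import "fmt"

// substr is a hypothetical helper for this example.
// It returns the characters of s from index start up to (not including) end.
// Indexes count characters, not bytes. Out of range indexes are clamped.
func substr(s string, start, end int) string {
	rs := []rune(s)
	if start < 0 {
		start = 0
	}
	if end > len(rs) {
		end = len(rs)
	}
	if start >= end {
		return ""
	}
	return string(rs[start:end])
}

func main() {
	var x = "♥😂→★🍎"

	fmt.Printf("%q\n", substr(x, 2, 4)) // "→★"
}
```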

Given Byte Index that Start a Character, Find Its Char Index

Given an index (of a char start byte) of a string (or byte slice), find the corresponding rune (char) index.

Solution:

utf8.RuneCount(byteSlice[0:index])

or

utf8.RuneCountInString(textStr[0:index])

package main

import "fmt"
import "unicode/utf8"

// chinese text (or any text containing non-ASCII)
var x = "中文和英文"

// 6 is the start of the char 和
var i = 6

// we want to show user the char position

func main() {

	fmt.Printf("position of 和 is: %v\n", utf8.RuneCountInString(x[0:i])) // 2
	// position of 和 is: 2

}
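In practice the byte index often comes from a search function, such as strings.Index, which returns a byte index. Here's a sketch that combines the two steps. The function name charIndex is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// charIndex is a hypothetical helper for this example.
// It returns the character index of the first occurrence of sub in s,
// or -1 if sub is not found.
func charIndex(s, sub string) int {
	// strings.Index returns a byte index
	byteIndex := strings.Index(s, sub)
	if byteIndex < 0 {
		return -1
	}
	// count the characters before that byte index
	return utf8.RuneCountInString(s[0:byteIndex])
}

func main() {
	var x = "中文和英文"

	fmt.Println(charIndex(x, "和")) // 2
	fmt.Println(charIndex(x, "z"))  // -1
}
```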

Given a Random Byte Index, Find the Index that Start a Char

Given an index into a byte slice, how to find the nearest byte index, at or before it, that starts a character? (The byte slice may contain non-ASCII characters. 〔see ASCII Characters〕)

Solution, first:

import "unicode/utf8"

then

for !utf8.RuneStart(textBytes[index]) { index-- }

Sample code:

package main

import "fmt"
import "unicode/utf8"

// chinese text
var x = "中文"

// byte index 4 is in the middle of the char 文
var i = 4

func main() {

	fmt.Printf("%q\n", x[i]) // '\u0096'
	// x[i] is a byte in the middle of a UTF-8 byte sequence

	// move the index back to the byte that begins the char
	for !utf8.RuneStart(x[i]) {
		i--
	}

	fmt.Printf("%v\n", i) // 3
	// the index that begins the char is 3

	fmt.Printf("%q\n", x[i:]) // "文"
	// now you can extract the substring properly
}
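The loop can be wrapped in a function that also guards against a bad index. This is a sketch. The name runeStartAtOrBefore and the choice to return -1 for an out of range index are assumptions of this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStartAtOrBefore is a hypothetical helper for this example.
// It returns the largest index j <= i such that b[j] can begin a character.
// It returns -1 if i is out of range.
func runeStartAtOrBefore(b []byte, i int) int {
	if i < 0 || i >= len(b) {
		return -1
	}
	// stop at 0 so that malformed input cannot push the index negative
	for i > 0 && !utf8.RuneStart(b[i]) {
		i--
	}
	return i
}

func main() {
	var b = []byte("中文")

	fmt.Println(runeStartAtOrBefore(b, 4)) // 3
	fmt.Println(runeStartAtOrBefore(b, 2)) // 0
}
```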

Iterate Character in String

for i, c := range str {body}

Go through the characters in a string. i is the byte index where the character starts, c is the character.

package main

import "fmt"

func main() {
	const x = "abc♥ 😂d"
	for i, c := range x {
		fmt.Printf("%v %q\n", i, c)
	}
}

// 0 'a'
// 1 'b'
// 2 'c'
// 3 '♥'
// 6 ' '
// 7 '😂'
// 11 'd'

If you don't need the index, do:

for _, c := range str {body}

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		fmt.Printf("%q, %U\n", c, c)
	}
}

// '♥', U+2665
// ' ', U+0020
// '😂', U+1F602

Note: when you loop through a string by range, each character is given as a value of type rune (which is an alias of int32).

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		// print the char and its type
		fmt.Printf("%q, %T\n", c, c)
	}
}

// '♥', int32
// ' ', int32
// '😂', int32
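Range on a byte slice gives bytes, not characters. To go through the characters of a byte slice without converting it to string, one way is utf8.DecodeRune, which returns the first character and its width in bytes. Here's a sketch. The function name eachRune is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// eachRune is a hypothetical helper for this example.
// It calls visit for each character in b, with the byte index where it starts.
func eachRune(b []byte, visit func(i int, r rune)) {
	for i := 0; i < len(b); {
		// decode one character starting at byte index i
		r, size := utf8.DecodeRune(b[i:])
		visit(i, r)
		i += size
	}
}

func main() {
	var b = []byte("♥ 😂")

	eachRune(b, func(i int, r rune) {
		fmt.Printf("%v %q\n", i, r)
	})
}

// 0 '♥'
// 3 ' '
// 4 '😂'
```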
