Golang: String, Byte Slice, Rune Slice

By Xah Lee.

Why You Need to Convert Between String, Byte Slice, Rune Slice

One annoying thing about golang is that you have to constantly convert between {string, byte slice, rune slice}.

They are the same thing in 3 different formats.

String is a nice way to deal with a short sequence of bytes or ASCII Characters. Strings are immutable: every time you operate on a string in a way that changes it, such as find and replace, a new string is created. This is very inefficient if the string is huge, such as file content. [see Golang: String]

Byte slice is just like string, but mutable. You can modify each byte (or each character, if the text is ASCII). This is very efficient for working with file content, either text file or binary file. [see Golang: Slice]

Rune slice is like byte slice, except that each index is a character instead of a byte. This is best if you work with text files that have lots of non-ASCII characters, such as Chinese or Unicode: Math Symbols ∑ ∫ π² ∞ or Unicode: Emoji 😄.
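As a small illustration of the difference, here is a sketch that edits a byte slice in place and reads a character from a rune slice. The helper names replaceByte and charAt are made up for this example. They are not part of any library.

```go
package main

import "fmt"

// replaceByte is a made-up helper for this example.
// It overwrites every byte equal to from with to, in place.
// No new slice is created.
func replaceByte(bs []byte, from, to byte) {
	for i := range bs {
		if bs[i] == from {
			bs[i] = to
		}
	}
}

// charAt is a made-up helper for this example.
// It returns the character at character index i.
// It converts the whole string to a rune slice, so it is not cheap.
func charAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	var bs = []byte("a-b-c")
	replaceByte(bs, '-', '_')
	fmt.Printf("%s\n", bs) // a_b_c

	var s = "π² ∞"
	fmt.Printf("%x\n", s[0])         // cf (first byte of π, not a whole character)
	fmt.Printf("%q\n", charAt(s, 0)) // 'π'
	fmt.Printf("%q\n", charAt(s, 3)) // '∞'
}
```

Indexing the string gives a byte. Indexing the rune slice gives a character.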

Convert Between String, Byte Slice, Rune Slice

Here are common solutions for working with golang string as a character sequence.

[]byte(str)
String to byte slice.
string(byteSlice)
Byte slice to string.
[]rune(str)
String to rune slice.
string(runeSlice)
Rune slice to string.
[]rune(string(byteSlice))
Byte slice to rune slice.
[]byte(string(runeSlice))
Rune slice to byte slice.
utf8.RuneCount(byteSlice)
Return the count of characters in byteSlice.
utf8.RuneCountInString(str)
Return the count of characters in string. (“character” here means rune.)

String To Byte Slice

[]byte(str)

package main

import "fmt"

func main() {

	var x = "abc→"

	// convert string to byte slice
	var bs = []byte(x)

	fmt.Printf("%v\n", bs) // [97 98 99 226 134 146]

}

Byte Slice to String

string(byteSlice)

package main

import "fmt"

func main() {

	var bs = []byte{'a', 'b', 'c', 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 are the UTF-8 encoding of →

	// convert byte slice to string
	var str = string(bs)

	fmt.Printf("%v\n", str) // abc→

	// print type
	fmt.Printf("%T\n", str) // string

}

String To Rune Slice

[]rune(str)

package main

import "fmt"

func main() {

	var str = "abc→"

	// convert string to rune slice
	var rs = []rune(str)

	fmt.Printf("%v\n", rs) // [97 98 99 8594]

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune Slice To String

string(runeSlice)

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// convert rune slice to string
	var str = string(rs)

	fmt.Printf("%#v\n", str) // "abc→"

	// print type
	fmt.Printf("%T\n", str) // string

}

Byte Slice To Rune Slice

[]rune(string(byteSlice))

package main

import "fmt"

func main() {

	var bs = []byte{'a', 'b', 'c', 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 are the UTF-8 encoding of →

	// convert byte slice to rune slice
	var rs = []rune(string(bs))

	for _, v := range rs {
		fmt.Printf("%c", v)
	}
	// abc→

	fmt.Printf("\n")

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune Slice To Byte Slice

[]byte(string(runeSlice))

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// print type
	fmt.Printf("%T\n", rs) // []int32

	// convert rune slice to byte slice
	var bs = []byte(string(rs))

	fmt.Printf("%#v\n", bs)
	// []byte{0x61, 0x62, 0x63, 0xe2, 0x86, 0x92}

	fmt.Printf("%d\n", bs)
	// [97 98 99 226 134 146]

	fmt.Printf("%q\n", bs)
	// "abc→"

}

Count of Characters

To count the number of characters, there are a few ways:

Use import "unicode/utf8"

utf8.RuneCount(byteSlice)
Return the count of characters in byteSlice.
utf8.RuneCountInString(str)
Return the count of characters in string. (“character” here means rune.)

Or convert it to a rune slice, then call len. Example: len([]rune("I ♥ U"))

package main

import "fmt"
import "unicode/utf8"

func main() {
	var x = "I ♥ U"

	// number of bytes
	fmt.Printf("%v\n", len(x)) // 7

	// number of characters
	fmt.Printf("%v\n", utf8.RuneCountInString(x)) // 5
}
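For comparison, here is a sketch that counts the same string in all three ways. The function countChars is a made-up name for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countChars is a made-up helper for this example.
// It counts the characters (runes) in s in three ways.
func countChars(s string) (int, int, int) {
	var a = utf8.RuneCountInString(s)
	var b = utf8.RuneCount([]byte(s))
	var c = len([]rune(s)) // builds a whole rune slice just to count
	return a, b, c
}

func main() {
	fmt.Println(countChars("I ♥ U")) // 5 5 5
}
```

All three agree. The utf8 functions count without building a rune slice, so they are the likely better choice when the text is big.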

Substring by Character Index

To get a substring with proper character boundaries, convert it to rune slice first. Like this:

package main

import "fmt"

func main() {

	// string of non-ascii chars
	var x = "♥😂→★🍎"

	// convert to rune slice
	var y = []rune(x)

	// take a slice from index 2 to 3
	var z = y[2:4]

	// print as chars
	fmt.Printf("%q\n", z) // ['→' '★']

	// print in go syntax
	fmt.Printf("%#v\n", z) // []int32{8594, 9733}

}
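The example above ends with a rune slice. If you want the result as a string again, wrap it in string(). Here is a sketch as a function. The name substr is made up for this example.

```go
package main

import "fmt"

// substr is a made-up helper for this example.
// It returns the characters of s from character index start
// up to, but not including, character index end.
// It panics if the indexes are out of range, like normal slicing.
func substr(s string, start, end int) string {
	var rs = []rune(s)
	return string(rs[start:end])
}

func main() {
	var x = "♥😂→★🍎"

	fmt.Printf("%q\n", substr(x, 2, 4)) // "→★"

	// slicing the string directly cuts by bytes,
	// which here lands in the middle of characters
	fmt.Printf("%q\n", x[2:4])
}
```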

Given Byte Index that Start a Character, Find Its Char Index

Given an index (of a char start byte) of a string (or byte slice), find the corresponding rune (char start) index.

Solution:
utf8.RuneCount(byteSlice[0:index])
or
utf8.RuneCountInString(textStr[0:index])

package main

import "fmt"
import "unicode/utf8"

// chinese text (or any text containing non-ASCII)
var x = "中文和英文"

// 6 is the start of the char 和
var i = 6

// we want to show user the char position

func main() {

	fmt.Printf("position of 和 is: %v\n", utf8.RuneCountInString(x[0:i])) // 2
	// position of 和 is: 2

}
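A typical place this comes up is search. strings.Index returns a byte index, and you may want to report a character position to the user. A minimal sketch (charIndex is a made-up helper name for this example):

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// charIndex is a made-up helper for this example.
// It returns the character index of the first occurrence of sub in s,
// or -1 if sub is not found.
func charIndex(s, sub string) int {
	var byteIndex = strings.Index(s, sub) // byte index, or -1
	if byteIndex < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:byteIndex])
}

func main() {
	var x = "中文和英文"

	fmt.Println(strings.Index(x, "和")) // 6
	fmt.Println(charIndex(x, "和"))     // 2
	fmt.Println(charIndex(x, "z"))     // -1
}
```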

Given a Random Byte Index, Find the Index that Start a Char

Given an arbitrary byte index into a byte slice, how do you find the index of the byte that starts the character containing it? (The byte slice may contain non-ASCII characters. [see ASCII Characters])

Solution:
import "unicode/utf8"
then
for !utf8.RuneStart(textBytes[index]) { index-- }

Sample code:

package main

import "fmt"
import "unicode/utf8"

var x = "中文"

// chinese

var i = 4

func main() {

	fmt.Printf("%q\n", x[i]) // '\u0096'
	// result is a byte inside unicode byte sequence

	// set index to the index that begins a char
	for !utf8.RuneStart(x[i]) {
		i--
	}

	fmt.Printf("%v\n", i) // 3
	// the index that begins a unicode char is 3

	fmt.Printf("%q\n", x[i:len(x)]) // "文"
	// now you can extract the substring properly
}
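The same loop can be put in a function. Here is a sketch with a bounds check added, so the loop cannot run past the front of the slice when the data does not begin with valid UTF-8. The name charStart is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charStart is a made-up helper for this example.
// Given any byte index i into bs, it returns the index of the byte
// that starts the character containing bs[i].
// It assumes 0 <= i < len(bs).
func charStart(bs []byte, i int) int {
	for i > 0 && !utf8.RuneStart(bs[i]) {
		i--
	}
	return i
}

func main() {
	var bs = []byte("中文")

	// show the char start index for every byte index
	for i := range bs {
		fmt.Printf("%v→%v ", i, charStart(bs, i))
	}
	fmt.Println()
	// 0→0 1→0 2→0 3→3 4→3 5→3
}
```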

Loop Thru Characters in String

for i, c := range str {}
Go thru characters in string. i is the index (with respect to bytes), c is the character.

package main

import "fmt"

func main() {
	const x = "abc♥ 😂d"
	for i, c := range x {
		fmt.Printf("%v %q\n", i, c)
	}
}

// 0 'a'
// 1 'b'
// 2 'c'
// 3 '♥'
// 6 ' '
// 7 '😂'
// 11 'd'

If you don't need the index, do:

for _, c := range str {}

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		fmt.Printf("%q, %U\n", c, c)
	}
}

// '♥', U+2665
// ' ', U+0020
// '😂', U+1F602

Note: when you loop thru a string by range, each character in the string is turned into a “rune” type, which is golang's term for a Unicode codepoint. That is, an integer id for the character.

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		// print the char and its type
		fmt.Printf("%q, %T\n", c, c)
	}
}

// '♥', int32
// ' ', int32
// '😂', int32

[see Golang: Rune]
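Range over a byte slice gives bytes, not characters. To go thru the characters of a byte slice without converting it to string first, one option is utf8.DecodeRune from the standard library. A minimal sketch (runesOf is a made-up helper name for this example):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesOf is a made-up helper for this example.
// It walks bs one character at a time and collects the characters.
func runesOf(bs []byte) []rune {
	var result []rune
	for i := 0; i < len(bs); {
		// r is the character at byte index i,
		// size is how many bytes it takes
		r, size := utf8.DecodeRune(bs[i:])
		result = append(result, r)
		i += size
	}
	return result
}

func main() {
	var bs = []byte("♥ 😂")
	fmt.Printf("%q\n", runesOf(bs)) // ['♥' ' ' '😂']
}
```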
