Golang: String, Byte Slice, Rune Slice

By Xah Lee. Date: . Last updated: .

One annoying thing about golang is that you constantly have to convert between string, byte slice, and rune slice.

They hold the same text in 3 different formats.

A string is a convenient way to deal with a short sequence of bytes or characters. Strings are immutable, so every operation that changes a string, such as find-and-replace, creates a new string. This is inefficient if the string is huge, such as file content. (Taking a substring by slicing, str[i:j], does not copy; it shares the original bytes.) [see Golang: String]
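Here is a minimal sketch of what immutability means in practice: strings.ReplaceAll returns a new string and leaves the original untouched. The helper withStars is hypothetical, named here only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// withStars is a hypothetical helper: it returns a new string in which
// every "→" is replaced by "*". The input string is not modified,
// because Go strings are immutable.
func withStars(s string) string {
	return strings.ReplaceAll(s, "→", "*")
}

func main() {
	var x = "a→b→c"
	var y = withStars(x)

	fmt.Println(x) // a→b→c (original unchanged)
	fmt.Println(y) // a*b*c (a newly created string)
}
```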

A byte slice is like a string, but mutable: you can modify each byte in place. This is very efficient for working with file content, whether text or binary. [see Golang: Slice]
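A minimal sketch of in-place modification, assuming ASCII input: the hypothetical helper upcaseFirst changes the first byte of the slice without allocating a new one. Changing a single byte inside a multi-byte character would break its UTF-8 encoding, which is why the helper only touches ASCII letters.

```go
package main

import "fmt"

// upcaseFirst is a hypothetical helper that changes the first byte of bs
// in place, if it is an ASCII lowercase letter. No new slice is created.
func upcaseFirst(bs []byte) {
	if len(bs) > 0 && bs[0] >= 'a' && bs[0] <= 'z' {
		bs[0] -= 'a' - 'A'
	}
}

func main() {
	var bs = []byte("hello")
	upcaseFirst(bs)
	fmt.Printf("%s\n", bs) // Hello
}
```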

A rune slice is like a byte slice, except that each element is a character (a Unicode code point) instead of a byte. This is best if you work with text that has lots of non-ASCII characters, such as Chinese text, math formulas with ∑, or text with emoji ♥. [see Golang: Rune]
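To see the difference between a byte index and a character index, here is a small sketch. Indexing a string gives a byte; indexing a rune slice gives a character. The helper secondChar is hypothetical and panics if the string has fewer than 2 characters.

```go
package main

import "fmt"

// secondChar is a hypothetical helper returning the character at
// character index 1, by going through a rune slice.
func secondChar(s string) rune {
	return []rune(s)[1]
}

func main() {
	var s = "∑♥x"

	// indexing a string gives a byte (here, the 2nd byte of ∑), not a character
	fmt.Printf("%v\n", s[1]) // 136

	// indexing a rune slice gives a character
	fmt.Printf("%q\n", secondChar(s)) // '♥'
}
```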

[see ASCII Table]

Here are common solutions for working with a Go string as a character sequence.

[]byte(str)
String to byte slice.
string(byteSlice)
Byte slice to string.
[]rune(str)
String to rune slice.
string(runeSlice)
Rune slice to string.
[]rune(string(byteSlice))
Byte slice to rune slice.
[]byte(string(runeSlice))
Rune slice to byte slice.
utf8.RuneCount(byteSlice)
Returns the number of characters in byteSlice. (Requires import "unicode/utf8".)
utf8.RuneCountInString(str)
Returns the number of characters in str. (Character here means Unicode code point, aka rune.)
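One detail worth knowing about these conversions: if a byte slice is not valid UTF-8, converting it to a rune slice replaces each invalid byte with utf8.RuneError (U+FFFD). A minimal sketch, with a hypothetical helper toRunes that also checks validity using utf8.Valid:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toRunes is a hypothetical helper: it converts a byte slice to a rune
// slice and reports whether the bytes were valid UTF-8.
// Each invalid byte becomes utf8.RuneError (U+FFFD) in the result.
func toRunes(bs []byte) ([]rune, bool) {
	return []rune(string(bs)), utf8.Valid(bs)
}

func main() {
	var bs = []byte{'a', 0xFF, 'b'} // 0xFF never appears in valid UTF-8

	rs, ok := toRunes(bs)
	fmt.Printf("%U %v\n", rs, ok) // [U+0061 U+FFFD U+0062] false
}
```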

String To Byte Slice

[]byte(str)

package main

import "fmt"

func main() {

	var x = "abc→"

	// convert string to byte slice
	var bs = []byte(x)

	fmt.Printf("%v\n", bs) // [97 98 99 226 134 146]

}
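Note that []byte(str) copies the bytes. A minimal sketch (bytesOf is a hypothetical wrapper) showing that modifying the byte slice leaves the original string unchanged:

```go
package main

import "fmt"

// bytesOf is a hypothetical wrapper around []byte(s). The conversion
// copies the bytes, so changing the result does not change s.
func bytesOf(s string) []byte {
	return []byte(s)
}

func main() {
	var x = "abc"
	var bs = bytesOf(x)

	bs[0] = 'X'

	fmt.Printf("%s %s\n", x, bs) // abc Xbc
}
```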

Byte Slice to String

string(byteSlice)

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 are the UTF-8 encoding of →

	// convert byte slice to string
	var str = string(bs)

	fmt.Printf("%v\n", str) // abc→

	// print type
	fmt.Printf("%T\n", str) // string

}
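In this direction too, string(byteSlice) copies, so the string is a snapshot: later changes to the byte slice do not affect it. A minimal sketch with a hypothetical helper snapshot:

```go
package main

import "fmt"

// snapshot is a hypothetical helper; string(bs) copies the bytes into a
// new, immutable string.
func snapshot(bs []byte) string {
	return string(bs)
}

func main() {
	var bs = []byte("abc→")
	var str = snapshot(bs)

	// changing the byte slice afterwards does not affect str
	bs[0] = 'X'

	fmt.Printf("%s %s\n", str, bs) // abc→ Xbc→
}
```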

String To Rune Slice

[]rune(str)

package main

import "fmt"

func main() {

	var str = "abc→"

	// convert string to rune slice
	var rs = []rune(str)

	fmt.Printf("%v\n", rs) // [97 98 99 8594]

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune Slice To String

string(runeSlice)

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// convert rune slice to string
	var str = string(rs)

	fmt.Printf("%#v\n", str) // "abc→"

	// print type
	fmt.Printf("%T\n", str) // string

}
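A typical use of a rune slice is reversing text by character; reversing the bytes directly would scramble multi-byte characters such as →. A sketch under that assumption: reverse is a hypothetical helper, and since it reverses code points, combining marks (such as accents written as separate code points) would end up attached to the wrong character.

```go
package main

import "fmt"

// reverse is a hypothetical helper: it reverses s by character
// (code point), using a rune slice so multi-byte characters stay intact.
func reverse(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	fmt.Println(reverse("abc→")) // →cba
}
```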

Byte Slice To Rune Slice

[]rune(string(byteSlice))

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 are the UTF-8 encoding of →

	// convert byte slice to rune slice
	var rs = []rune(string(bs))

	for _, v := range rs {
		fmt.Printf("%c", v)
	}
	// abc→

	fmt.Printf("\n")

	// print type
	fmt.Printf("%T\n", rs) // []int32

}
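If you want to avoid the intermediate string, the unicode/utf8 package can decode a byte slice one character at a time with utf8.DecodeRune, which returns the rune and its size in bytes. A sketch with a hypothetical helper decodeRunes:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRunes is a hypothetical helper that decodes a UTF-8 byte slice
// into a rune slice one character at a time with utf8.DecodeRune.
func decodeRunes(bs []byte) []rune {
	var rs []rune
	for len(bs) > 0 {
		r, size := utf8.DecodeRune(bs) // size is the byte length of r (at least 1 here)
		rs = append(rs, r)
		bs = bs[size:]
	}
	return rs
}

func main() {
	var bs = []byte{'a', 'b', 'c', 0xE2, 0x86, 0x92}
	fmt.Printf("%q\n", decodeRunes(bs)) // ['a' 'b' 'c' '→']
}
```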

Rune Slice To Byte Slice

[]byte(string(runeSlice))

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// print type
	fmt.Printf("%T\n", rs) // []int32

	// convert rune slice to byte slice
	var bs = []byte(string(rs))

	fmt.Printf("%#v\n", bs)
	// []byte{0x61, 0x62, 0x63, 0xe2, 0x86, 0x92}

	fmt.Printf("%d\n", bs)
	// [97 98 99 226 134 146]

	fmt.Printf("%q\n", bs)
	// "abc→"

}
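Going the other way without an intermediate string, utf8.AppendRune appends the UTF-8 encoding of one rune to a byte slice. A sketch, assuming Go 1.18 or later (where AppendRune was added); encodeRunes is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRunes is a hypothetical helper that encodes a rune slice as
// UTF-8 bytes. Requires Go 1.18+ for utf8.AppendRune.
func encodeRunes(rs []rune) []byte {
	var bs []byte
	for _, r := range rs {
		// AppendRune writes the UTF-8 encoding of r (1 to 4 bytes)
		bs = utf8.AppendRune(bs, r)
	}
	return bs
}

func main() {
	var rs = []rune{'a', 'b', 'c', '→'}
	fmt.Printf("%d\n", encodeRunes(rs)) // [97 98 99 226 134 146]
}
```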

Number of Characters

To count the number of characters, there are a few ways:

Import "unicode/utf8", then use:

utf8.RuneCount(byteSlice)
Returns the number of characters in byteSlice.

utf8.RuneCountInString(str)
Returns the number of characters in str. (Character here means Unicode code point, aka rune.)

Or convert it to a rune slice, then call len. Example: len([]rune("I ♥ U"))

package main

import "fmt"
import "unicode/utf8"

func main() {
	var x = "I ♥ U"

	// number of bytes
	fmt.Printf("%v\n", len(x)) // 7

	// number of characters
	fmt.Printf("%v\n", utf8.RuneCountInString(x)) // 5
}
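The three counting methods listed above should agree on any string. A small sketch to check that, with a hypothetical helper countAll:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countAll is a hypothetical helper returning the character count of s
// computed three ways: RuneCountInString, RuneCount on bytes, len of runes.
func countAll(s string) (int, int, int) {
	return utf8.RuneCountInString(s), utf8.RuneCount([]byte(s)), len([]rune(s))
}

func main() {
	fmt.Println(countAll("I ♥ U")) // 5 5 5
}
```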

Substring by Character Index

To get a substring with proper character boundaries, convert it to rune slice first. Like this:

package main

import "fmt"

func main() {

	// string of unicode
	var x = "♥😂→★🍎"

	// convert to rune slice
	var y = []rune(x)

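	// take the characters at index 2 and 3 (the end index 4 is exclusive)
	var z = y[2:4]

	// print as chars
	fmt.Printf("%q\n", z) // ['→' '★']

	// print in go syntax
	fmt.Printf("%#v\n", z) // []int32{8594, 9733}

	// convert back to a string to get the substring
	fmt.Printf("%q\n", string(z)) // "→★"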

}
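The same technique can be wrapped in a small function. A sketch, where substr is a hypothetical helper (not part of the standard library) that takes character indexes and returns a string:

```go
package main

import "fmt"

// substr is a hypothetical helper: it returns the characters of s from
// character index start up to (not including) end, as a string.
// It panics if the indexes are out of range, just like slicing does.
func substr(s string, start, end int) string {
	rs := []rune(s)
	return string(rs[start:end])
}

func main() {
	fmt.Printf("%q\n", substr("♥😂→★🍎", 2, 4)) // "→★"
}
```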

Given a Byte Index That Starts a Character, Find Its Character Index

Given the byte index of a character's first byte in a string (or byte slice), find the corresponding rune (character) index.

Solution:

utf8.RuneCount(byteSlice[0:index])

or

utf8.RuneCountInString(textStr[0:index])

package main

import "fmt"
import "unicode/utf8"

// chinese text (or any text containing non-ASCII)
var x = "中文和英文"

// byte index 6 is the start of the char 和
var i = 6

// we want to show the user the character position

func main() {

	fmt.Printf("position of 和 is: %v\n", utf8.RuneCountInString(x[0:i])) // 2
	// position of 和 is: 2

}
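The reverse direction, from a character index to the byte index where that character starts, can be done with a range loop, since range yields the byte index of each character. A sketch with a hypothetical helper byteIndexOfChar:

```go
package main

import "fmt"

// byteIndexOfChar is a hypothetical helper: it returns the byte index at
// which the character with rune index n starts, or -1 if s has fewer
// than n+1 characters.
func byteIndexOfChar(s string, n int) int {
	k := 0 // rune index of the current character
	for i := range s { // i is the byte index where each character starts
		if k == n {
			return i
		}
		k++
	}
	return -1
}

func main() {
	fmt.Println(byteIndexOfChar("中文和英文", 2)) // 6
}
```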

Given a Random Byte Index, Find the Index That Starts a Char

Given an index into a byte slice (which may contain Unicode), how do you find the byte index where the character containing that byte starts?

Solution:

import "unicode/utf8"

and

for !utf8.RuneStart(textBytes[index]) { index-- }

Sample code:

package main

import "fmt"
import "unicode/utf8"

// chinese text
var x = "中文"

// byte index 4 falls inside the UTF-8 bytes of 文
var i = 4

func main() {

	fmt.Printf("%q\n", x[i]) // '\u0096'
	// result is a byte inside unicode byte sequence

	// set index to the index that begins a char
	for !utf8.RuneStart(x[i]) {
		i--
	}

	fmt.Printf("%v\n", i) // 3
	// the index that begins a unicode char is 3

	fmt.Printf("%q\n", x[i:]) // "文"
	// now you can extract the substring properly
}
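The loop above can run past the start of the text if the bytes are not valid UTF-8 (for example, if no byte before the index starts a character), which would panic with an index out of range. A sketch of a guarded version, with a hypothetical helper charStart that stops at index 0:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charStart is a hypothetical helper: starting from byte index i, it
// moves left until it reaches a byte that starts a UTF-8 character.
// The i > 0 check stops the loop at the beginning of the slice, so it
// cannot run past index 0 on malformed input.
func charStart(bs []byte, i int) int {
	for i > 0 && !utf8.RuneStart(bs[i]) {
		i--
	}
	return i
}

func main() {
	var bs = []byte("中文")
	var i = charStart(bs, 4)
	fmt.Printf("%v %q\n", i, bs[i:]) // 3 "文"
}
```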

Loop Through Characters in a String

for i, c := range str {}
Goes through the characters in a string. i is the byte index where each character starts; c is the character.

package main

import "fmt"

func main() {
	const x = "abc♥ 😂d"
	for i, c := range x {
		fmt.Printf("%v %q\n", i, c)
	}
}

// 0 'a'
// 1 'b'
// 2 'c'
// 3 '♥'
// 6 ' '
// 7 '😂'
// 11 'd'
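To contrast range with a plain index loop: indexing a string visits each byte, while range visits each character. A sketch with a hypothetical helper loopCounts, using the same string as above:

```go
package main

import "fmt"

// loopCounts is a hypothetical helper that counts loop iterations two
// ways: a classic index loop visits every byte, while range visits
// every character (rune).
func loopCounts(s string) (bytes, chars int) {
	for i := 0; i < len(s); i++ {
		bytes++
	}
	for range s {
		chars++
	}
	return
}

func main() {
	fmt.Println(loopCounts("abc♥ 😂d")) // 12 7
}
```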

If you don't need the index, use:

for _, c := range str {}

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		fmt.Printf("%q, %U\n", c, c)
	}
}

// '♥', U+2665
// ' ', U+0020
// '😂', U+1F602

Note: when you loop through a string with range, each character is decoded into a value of type “rune”, which is golang's term for a Unicode code point, that is, an integer ID for the character. In Go, rune is an alias for int32, which is why %T prints int32.

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		// print the char and its type
		fmt.Printf("%q, %T\n", c, c)
	}
}

// '♥', int32
// ' ', int32
// '😂', int32
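Since rune and int32 are the same type, a rune can be assigned to an int32 variable without conversion, and an integer code point can be turned back into a one-character string with string(rune(n)). A small sketch (codePoint is a hypothetical helper):

```go
package main

import "fmt"

// codePoint is a hypothetical helper showing that a rune is just an
// int32 value: no conversion is needed for the assignment below.
func codePoint(c rune) int32 {
	var n int32 = c // allowed: rune is an alias for int32
	return n
}

func main() {
	fmt.Println(codePoint('♥'))       // 9829
	fmt.Println(string(rune(9829))) // ♥
}
```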

[see Golang: Rune]
