Golang: String, Byte Slice, Rune Slice

By Xah Lee.

One annoying thing about golang is that you constantly have to convert between string, byte slice, and rune slice.

They are the same thing in 3 different formats.

String is a nice way to deal with a short sequence of bytes or characters. Every time you operate on a string, such as find/replace or taking a substring, a new string is created. This is very inefficient if the string is huge, such as file content. [see Golang: String]

Byte slice is just like string, but mutable, i.e. you can modify each byte (and thus each ASCII character) in place. This is very efficient for working with file content, whether a text file, a binary file, or an IO stream from networking. [see Golang: Slice]
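
Here is a minimal sketch of that difference. The helper name upperFirstASCII is made up for this illustration, and it only handles an ASCII first letter.

```go
package main

import "fmt"

// upperFirstASCII is a hypothetical helper for this illustration.
// It upper-cases the first byte in place, if it is an ASCII lowercase letter.
func upperFirstASCII(bs []byte) {
	if len(bs) > 0 && bs[0] >= 'a' && bs[0] <= 'z' {
		bs[0] -= 'a' - 'A'
	}
}

func main() {
	var s = "abc"

	// s[0] = 'A' would not compile, because string is immutable

	// the conversion copies the bytes, so s itself is not changed
	var bs = []byte(s)
	upperFirstASCII(bs)

	fmt.Printf("%s\n", bs) // Abc
	fmt.Printf("%s\n", s)  // abc
}
```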

Rune slice is like byte slice, except that each index is a character instead of a byte. This is best if you work with text files that have lots of non-ASCII characters, such as Chinese text, math formulas ∑, or text with emoji ♥. [see Golang: Rune]
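
A small sketch of what “each index is a character” means. The helper runeAt is a made-up name for this illustration, and it assumes the index is in range.

```go
package main

import "fmt"

// runeAt is a hypothetical helper for this illustration.
// It returns the character at character index i (i must be in range).
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	var s = "a→b"

	// indexing a string gives a single byte
	fmt.Printf("%v\n", s[1]) // 226, the first byte of →

	// indexing a rune slice gives a whole character
	fmt.Printf("%c\n", runeAt(s, 1)) // →
	fmt.Printf("%c\n", runeAt(s, 2)) // b

	fmt.Printf("%v\n", len(s))         // 5 bytes
	fmt.Printf("%v\n", len([]rune(s))) // 3 characters
}
```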

[see ASCII Table]

Here are common solutions for working with golang string as character sequence.

String To Byte Slice

[]byte(str)

package main

import "fmt"

func main() {

	var x = "abc→"

	// convert string to byte slice
	var bs = []byte(x)

	fmt.Printf("%v\n", bs) // [97 98 99 226 134 146]

}

Byte Slice to String

string(byteslice)

package main

import "fmt"

func main() {

	var bs = []byte{'a', 'b', 'c', 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to string
	var str = string(bs)

	fmt.Printf("%v\n", str) // abc→

	// print type
	fmt.Printf("%T\n", str) // string

}

String To Rune Slice

[]rune(str)

package main

import "fmt"

func main() {

	var str = "abc→"

	// convert string to rune slice
	var rs = []rune(str)

	fmt.Printf("%v\n", rs) // [97 98 99 8594]

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune Slice To String

string(runeslice)

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// convert rune slice to string
	var str = string(rs)

	fmt.Printf("%#v\n", str) // "abc→"

	// print type
	fmt.Printf("%T\n", str) // string

}

Byte Slice To Rune Slice

[]rune(string(byteslice))

package main

import "fmt"

func main() {

	var bs = []byte{'a', 'b', 'c', 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to rune slice
	var rs = []rune(string(bs))

	for _, v := range rs {
		fmt.Printf("%c", v)
	}
	// abc→

	fmt.Printf("\n")

	// print type
	fmt.Printf("%T\n", rs) // []int32

}
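
As an alternative sketch, the standard library's bytes.Runes function takes a byte slice and returns the equivalent rune slice directly. The wrapper toRunes is a made-up name, there only to give the call a label.

```go
package main

import (
	"bytes"
	"fmt"
)

// toRunes is a hypothetical wrapper, named only for this illustration.
// bytes.Runes interprets the bytes as utf8 and returns the runes.
func toRunes(bs []byte) []rune {
	return bytes.Runes(bs)
}

func main() {
	var bs = []byte("abc→")

	var rs = toRunes(bs)

	fmt.Printf("%q\n", rs) // ['a' 'b' 'c' '→']

	// print type
	fmt.Printf("%T\n", rs) // []int32
}
```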

Rune Slice To Byte Slice

[]byte(string(runeslice))

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// print type
	fmt.Printf("%T\n", rs) // []int32

	// convert rune slice to byte slice
	var bs = []byte(string(rs))

	fmt.Printf("%#v\n", bs)
	// []byte{0x61, 0x62, 0x63, 0xe2, 0x86, 0x92}

	fmt.Printf("%d\n", bs)
	// [97 98 99 226 134 146]

	fmt.Printf("%q\n", bs)
	// "abc→"

}

Number of Characters

To count the number of characters, there are a few ways:

Use import "unicode/utf8"

utf8.RuneCount(byteSlice) → returns the number of characters in byteSlice.

utf8.RuneCountInString(string) → returns the number of characters in string. (character here means Unicode codepoint, aka rune)

Or convert it to rune slice, then call len, e.g. len([]rune("I ♥ U"))

package main

import "fmt"
import "unicode/utf8"

func main() {
	var x = "I ♥ U"

	// number of bytes
	fmt.Printf("%v\n", len(x)) // 7

	// number of characters
	fmt.Printf("%v\n", utf8.RuneCountInString(x)) // 5
}
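
For comparison, here is a sketch that runs all three ways on the same text. The function countChars is a made-up name for this illustration. Converting to a rune slice just to count may copy the whole text, so the utf8 functions are probably the better choice for big input.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countChars is a hypothetical helper for this illustration.
// It counts the characters of s in 3 ways, which should all agree.
func countChars(s string) (int, int, int) {
	var a = utf8.RuneCountInString(s)
	var b = utf8.RuneCount([]byte(s))
	var c = len([]rune(s))
	return a, b, c
}

func main() {
	var a, b, c = countChars("I ♥ U")
	fmt.Printf("%v %v %v\n", a, b, c) // 5 5 5
}
```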

Substring by Character Index

To get a substring with proper character boundaries, convert it to rune slice first. Like this:

package main

import "fmt"

func main() {

	// string of unicode
	var x = "♥😂→★🍎"

	// convert to rune slice
	var y = []rune(x)

	// take a slice from index 2 to 3
	var z = y[2:4]

	// print as chars
	fmt.Printf("%q\n", z) // ['→' '★']

	// print in go syntax
	fmt.Printf("%#v\n", z) // []int32{8594, 9733}

}
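
If you want the result as a string again, convert the rune slice back. A minimal sketch, where substr is a made-up helper name and the indexes are assumed to be in range:

```go
package main

import "fmt"

// substr is a hypothetical helper for this illustration.
// It returns the characters of s from character index start
// up to (not including) character index end.
func substr(s string, start int, end int) string {
	return string([]rune(s)[start:end])
}

func main() {
	var x = "♥😂→★🍎"

	fmt.Printf("%v\n", substr(x, 2, 4)) // →★
	fmt.Printf("%v\n", substr(x, 0, 1)) // ♥
}
```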

Given Byte Index that Starts a Character, Find Its Char Index

Given the index of a byte that starts a character in a string (or byte slice), find the corresponding rune (character) index.

Solution:

utf8.RuneCount(byteSlice[0:index])

or

utf8.RuneCountInString(textStr[0:index])

package main

import "fmt"
import "unicode/utf8"

// chinese text (or any text containing non-ASCII)
var x = "中文和英文"

// 6 is the start of the char 和
var i = 6

// we want to show user the char position

func main() {

	fmt.Printf("position of 和 is: %v\n", utf8.RuneCountInString(x[0:i])) // 2
	// position of 和 is: 2

}
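
A typical place this comes up is search: strings.Index returns a byte index, but you want to show a character position. Here is a sketch, where charIndexOf is a made-up helper name. The last line does the same count on a byte slice.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// charIndexOf is a hypothetical helper for this illustration.
// It returns the character index of the first occurrence of sub in s,
// or -1 if sub is not found.
func charIndexOf(s string, sub string) int {
	var i = strings.Index(s, sub) // byte index, or -1
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:i])
}

func main() {
	var x = "中文和英文"

	fmt.Printf("%v\n", strings.Index(x, "和")) // 6
	fmt.Printf("%v\n", charIndexOf(x, "和"))   // 2
	fmt.Printf("%v\n", charIndexOf(x, "z"))   // -1

	// same count, on a byte slice
	var bs = []byte(x)
	fmt.Printf("%v\n", utf8.RuneCount(bs[:6])) // 2
}
```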

Given a Random Byte Index, Find the Index that Starts a Char

Given an arbitrary byte index into a byte slice, how do you find the byte index, at or before it, that starts a character? (the byte slice may contain non-ASCII unicode)

Solution:

import "unicode/utf8"

and

for !utf8.RuneStart(textBytes[index]) { index-- }

Sample code:

package main

import "fmt"
import "unicode/utf8"

// chinese text
var x = "中文"

// 4 is a byte index in the middle of the char 文
var i = 4

func main() {

	fmt.Printf("%q\n", x[i]) // '\u0096'
	// result is a byte inside unicode byte sequence

	// set index to the index that begins a char
	for !utf8.RuneStart(x[i]) {
		i--
	}

	fmt.Printf("%v\n", i) // 3
	// the index that begins a unicode char is 3

	fmt.Printf("%q\n", x[i:]) // "文"
	// now you can extract the substring properly
}
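
The loop can be wrapped in a function. This is a sketch, not a library function: runeStartIndex is a made-up name, it assumes i is a valid index into the slice, and the i > 0 check is added so the loop stops at 0 even if the data is not valid utf8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStartIndex is a hypothetical helper for this illustration.
// It walks back from byte index i to the nearest byte that starts a character.
// It assumes 0 <= i < len(bs).
func runeStartIndex(bs []byte, i int) int {
	for i > 0 && !utf8.RuneStart(bs[i]) {
		i--
	}
	return i
}

func main() {
	var bs = []byte("中文")

	for i := range bs {
		fmt.Printf("%v %v\n", i, runeStartIndex(bs, i))
	}
}

// 0 0
// 1 0
// 2 0
// 3 3
// 4 3
// 5 3
```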

Loop Thru Characters in String

for i, c := range string {…} → go thru characters in string. i is the index (with respect to bytes), c is the character.

package main

import "fmt"

func main() {
	const x = "abc♥ 😂d"
	for i, c := range x {
		fmt.Printf("%v %q\n", i, c)
	}
}

// 0 'a'
// 1 'b'
// 2 'c'
// 3 '♥'
// 6 ' '
// 7 '😂'
// 11 'd'
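
If you want the character index instead of the byte index, one way is to range over a rune slice. The sketch below prints both for comparison. The helper byteOffsets is a made-up name for this illustration.

```go
package main

import "fmt"

// byteOffsets is a hypothetical helper for this illustration.
// It returns the byte index at which each character of s starts.
func byteOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	const x = "abc♥ 😂d"
	var offsets = byteOffsets(x)

	// n is the character index, because we range over a rune slice
	for n, c := range []rune(x) {
		fmt.Printf("%v %v %q\n", n, offsets[n], c)
	}
}

// 0 0 'a'
// 1 1 'b'
// 2 2 'c'
// 3 3 '♥'
// 4 6 ' '
// 5 7 '😂'
// 6 11 'd'
```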

If you don't need the index, do:

for _, c := range string {…}

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		fmt.Printf("%q, %U\n", c, c)
	}
}

// '♥', U+2665
// ' ', U+0020
// '😂', U+1F602

Note: when you loop thru a string by range, each character in the string is turned into a “rune” type, which is golang's term for Unicode codepoint. That is, an integer id for the character.

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		// print the char and its type
		fmt.Printf("%q, %T\n", c, c)
	}
}

// '♥', int32
// ' ', int32
// '😂', int32

[see Golang: Rune]
