Golang: String as Chars

By Xah Lee. Date: . Last updated: .

One annoying thing about golang is that you constantly have to deal with, and convert between, string, byte slice, and rune slice.

They are basically the same data in 3 different formats.

String is a nice way to deal with a short sequence of bytes or characters. Strings are immutable, so every time you change a string, such as with find and replace, a new string is created. This is very inefficient if the string is huge, such as file content. [see Golang: String]

Byte slice is just like string, but mutable, i.e. you can modify each byte in place. This is very efficient for working with file content, whether a text file, a binary file, or an IO stream from networking. [see Golang: Slice]
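To make the mutability difference concrete, here is a minimal sketch using only the standard library. The helper upperFirstASCII is hypothetical, made up for this example; it changes a byte slice in place, something the commented-out string assignment at the end cannot do.

```go
package main

import "fmt"

// upperFirstASCII is a hypothetical helper made up for this example:
// it upper-cases the first byte in place, but only if that byte is an
// ASCII lowercase letter.
func upperFirstASCII(bs []byte) {
	if len(bs) > 0 && bs[0] >= 'a' && bs[0] <= 'z' {
		bs[0] -= 'a' - 'A'
	}
}

func main() {
	var bs = []byte("abc→")

	// modify the byte slice in place
	upperFirstASCII(bs)

	fmt.Printf("%s\n", bs) // Abc→

	// with a string, the same kind of assignment does not compile:
	// var x = "abc"
	// x[0] = 'A' // error: strings are immutable
}
```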

Rune slice is like byte slice, except that each index is a character instead of a byte. This is best if you work with text that has lots of non-ASCII characters, such as Chinese text, math formulas ∑, or text with emoji ♥. [see Golang: Rune]

[see ASCII Table]
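A small sketch of what "each index is a character instead of a byte" means in practice: indexing a string gives you one byte, while indexing a rune slice gives you a whole character. The ♥ here is encoded in UTF-8 as 3 bytes, starting with 226.

```go
package main

import "fmt"

func main() {
	var x = "I ♥ U"

	// indexing a string gives a byte, not a character
	fmt.Printf("%v\n", x[2]) // 226

	// indexing a rune slice gives a whole character
	var rs = []rune(x)
	fmt.Printf("%q\n", rs[2]) // '♥'
}
```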

Here are common solutions for working with a golang string as a character sequence.

Convert String To/From Byte Slice

String to byte slice: []byte(str)

package main

import "fmt"

func main() {

	var x = "abc→"

	// convert string to byte slice
	var bs = []byte(x)

	fmt.Printf("%v\n", bs) // [97 98 99 226 134 146]

}

Byte slice to string: string(byteslice)

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to string
	var str = string(bs)

	fmt.Printf("%v\n", str) // abc→

	// print type
	fmt.Printf("%T\n", str) // string

}
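One thing worth knowing: semantically, both conversions copy the bytes, because a string must never change after it is created. So modifying the byte slice afterwards does not affect the original string, and vice versa. A minimal sketch:

```go
package main

import "fmt"

func main() {
	var x = "abc"

	// []byte(x) makes a copy of the string's bytes
	var bs = []byte(x)
	bs[0] = 'z'

	fmt.Printf("%s %s\n", x, bs) // abc zbc

	// string(bs) also copies, so later changes to bs do not affect y
	var y = string(bs)
	bs[1] = 'z'

	fmt.Printf("%s %s\n", y, bs) // zbc zzc
}
```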

Convert String To/From Rune Slice

String to rune slice: []rune(str)

package main

import "fmt"

func main() {

	var str = "abc→"

	// convert string to rune slice
	var rs = []rune(str)

	fmt.Printf("%v\n", rs) // [97 98 99 8594]

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Rune slice to string: string(runeslice)

package main

import "fmt"

func main() {

	var rs = []rune{'a', 'b', 'c', '→'}

	// convert rune slice to string
	var str = string(rs)

	fmt.Printf("%#v\n", str) // "abc→"

	// print type
	fmt.Printf("%T\n", str) // string

}
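A common reason to round-trip through a rune slice is to change a character by its character index. Here is a sketch; replaceCharAt is a hypothetical helper name, and like normal slice indexing it panics if i is out of range.

```go
package main

import "fmt"

// replaceCharAt is a hypothetical helper: it replaces the character at
// character index i (not byte index) with c, and returns the new string.
// It panics if i is out of range.
func replaceCharAt(s string, i int, c rune) string {
	var rs = []rune(s)
	rs[i] = c
	return string(rs)
}

func main() {
	fmt.Printf("%#v\n", replaceCharAt("abc→", 3, '←')) // "abc←"
	fmt.Printf("%#v\n", replaceCharAt("♥😂→", 1, '★')) // "♥★→"
}
```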

Convert Byte Slice To/From Rune Slice

To convert between byte slice and rune slice, convert to string first.

Byte slice to rune slice: []rune(string(byteslice))

Rune slice to byte slice: []byte(string(runeslice))

package main

import "fmt"

func main() {

	var bs = []byte{"a"[0], "b"[0], "c"[0], 0xE2, 0x86, 0x92}
	// 0xE2, 0x86, 0x92 is the utf8 encoding for →

	// convert byte slice to rune slice
	var rs = []rune(string(bs))

	for _, v := range rs {
		fmt.Printf("%c", v)
	}
	// abc→

	fmt.Printf("\n")

	// print type
	fmt.Printf("%T\n", rs) // []int32

}

Number of Characters

To count the number of characters, there are a few ways:

Import "unicode/utf8", then use:

utf8.RuneCount(byteSlice) → returns the number of characters in byteSlice.

utf8.RuneCountInString(str) → returns the number of characters in str. (Character here means Unicode codepoint, aka rune.)

Or convert it to a rune slice, then call len, e.g. len([]rune("I ♥ U"))

package main

import "fmt"
import "unicode/utf8"

func main() {
	var x = "I ♥ U"

	// number of bytes
	fmt.Printf("%v\n", len(x)) // 7

	// number of characters
	fmt.Printf("%v\n", utf8.RuneCountInString(x)) // 5
}

Substring by Character Index

To get a substring with proper character boundaries, convert it to a rune slice first, like this:

package main

import "fmt"

func main() {

	// string of unicode
	var x = "♥😂→★🍎"

	// convert to rune slice
	var y = []rune(x)

	// take the runes at index 2 and 3 (end index 4 is excluded)
	var z = y[2:4]

	// print as chars
	fmt.Printf("%q\n", z) // ['→' '★']

	// print in go syntax
	fmt.Printf("%#v\n", z) // []int32{8594, 9733}

}
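Usually you want the result back as a string. Here is a sketch that wraps the steps above in a function; substr is a hypothetical helper name, and it panics on out-of-range indexes just like normal slicing.

```go
package main

import "fmt"

// substr is a hypothetical helper: it returns the characters from
// character index start up to, but not including, character index end.
// It panics if the indexes are out of range, like normal slicing.
func substr(s string, start, end int) string {
	var rs = []rune(s)
	return string(rs[start:end])
}

func main() {
	var x = "♥😂→★🍎"

	fmt.Printf("%q\n", substr(x, 2, 4)) // "→★"
	fmt.Printf("%q\n", substr(x, 0, 2)) // "♥😂"
}
```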

Given a Byte Index that Starts a Character, Find Its Char Index

Given the byte index of the first byte of a character in a string (or byte slice), find the corresponding rune (character) index.

Solution:

utf8.RuneCount(byteSlice[0:index])

or

utf8.RuneCountInString(textStr[0:index])

package main

import "fmt"
import "unicode/utf8"

// chinese text (or any text containing non-ASCII)
var x = "中文和英文"

// 6 is the start of the char 和
var i = 6

// we want to show the user the char position

func main() {

	fmt.Printf("position of 和 is: %v\n", utf8.RuneCountInString(x[0:i])) // 2
	// position of 和 is: 2

}
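The example above uses the string version. Here is a sketch of the byte slice version with utf8.RuneCount; charIndex is a hypothetical helper name. Each of these Chinese characters is 3 bytes in UTF-8, so they start at byte 0, 3, 6, 9, 12.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charIndex is a hypothetical helper: given the byte index i of the first
// byte of a character in bs, it returns that character's index.
func charIndex(bs []byte, i int) int {
	return utf8.RuneCount(bs[0:i])
}

func main() {
	var bs = []byte("中文和英文")

	// 和 starts at byte 6
	fmt.Printf("%v\n", charIndex(bs, 6)) // 2

	// the second 文 starts at byte 12
	fmt.Printf("%v\n", charIndex(bs, 12)) // 4
}
```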

Given an Arbitrary Byte Index, Find the Index that Starts a Char

Given an index into a byte slice, how do you find the index of the byte that starts the character containing it? (The byte slice may contain non-ASCII characters.)

Solution:

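Import "unicode/utf8", then:

for index > 0 && !utf8.RuneStart(textBytes[index]) { index-- }

(The index > 0 check stops the loop at the beginning even if the bytes are not valid UTF-8.)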

Sample code:

package main

import "fmt"
import "unicode/utf8"

var x = "中文"
// chinese

var i = 4

func main() {

	fmt.Printf("%q\n", x[i]) // '\u0096'
	// x[4] is a byte in the middle of the UTF-8 byte sequence of 文

	// set index to the index that begins a char
	for i > 0 && !utf8.RuneStart(x[i]) {
		i--
	}

	fmt.Printf("%v\n", i) // 3
	// the index that begins a unicode char is 3

	fmt.Printf("%q\n", x[i:]) // "文"
	// now you can extract the substring properly
}
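The sample code above works on a string; here is a sketch of the same loop on a byte slice, wrapped in a function. charStart is a hypothetical helper name. It maps every byte index of "中文" to the index where its character starts.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charStart is a hypothetical helper: it moves i back to the index of the
// byte that starts the character containing bs[i].
// The i > 0 check stops at 0 even if bs is not valid UTF-8.
func charStart(bs []byte, i int) int {
	for i > 0 && !utf8.RuneStart(bs[i]) {
		i--
	}
	return i
}

func main() {
	var bs = []byte("中文")

	for i := range bs {
		fmt.Printf("%v → %v\n", i, charStart(bs, i))
	}
}

// 0 → 0
// 1 → 0
// 2 → 0
// 3 → 3
// 4 → 3
// 5 → 3
```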

Loop Thru Characters in String

for i, c := range string {…} → goes thru the characters in string. i is the byte index where the character starts, c is the character (a rune).

package main

import "fmt"

func main() {
	const x = "abc♥ 😂d"
	for i, c := range x {
		fmt.Printf("%v %q\n", i, c)
	}
}

// 0 'a'
// 1 'b'
// 2 'c'
// 3 '♥'
// 6 ' '
// 7 '😂'
// 11 'd'
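For contrast, here is a sketch of looping with a plain index instead of range. Here x[i] is a byte, so a multi-byte character such as ♥ shows up as 3 separate bytes (e2 99 a5, its UTF-8 encoding).

```go
package main

import "fmt"

func main() {
	const x = "a♥"

	// loop by byte index: x[i] is a byte, not a character
	for i := 0; i < len(x); i++ {
		fmt.Printf("%v %x\n", i, x[i])
	}
}

// 0 61
// 1 e2
// 2 99
// 3 a5
```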

If you don't need the index, do:

for _, c := range string {…}

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		fmt.Printf("%q, %U\n", c, c)
	}
}

// '♥', U+2665
// ' ', U+0020
// '😂', U+1F602

Note: when you loop thru a string by range, each character in the string is decoded into a “rune” value, which is golang's term for Unicode codepoint, that is, an integer id for the character. rune is an alias for int32, which is why %T prints int32.

package main

import "fmt"

func main() {
	const x = "♥ 😂"
	for _, c := range x {
		// print the char and its type
		fmt.Printf("%q, %T\n", c, c)
	}
}

// '♥', int32
// ' ', int32
// '😂', int32

[see Golang: Rune]
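What if the string is not valid UTF-8? Here is a sketch: range does not stop or panic; for each invalid byte it gives the replacement character U+FFFD (utf8.RuneError) and advances by 1 byte. The "\xff" below is a single byte that cannot appear in valid UTF-8.

```go
package main

import "fmt"

func main() {
	// "\xff" is a single byte that is not valid UTF-8
	const x = "a\xffb"

	for i, c := range x {
		fmt.Printf("%v %U\n", i, c)
	}
}

// 0 U+0061
// 1 U+FFFD
// 2 U+0062
```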

