Elisp: Replace Digits by Subscript

By Xah Lee. Date: 2011-10-13. Last updated: 2011-10-19.

Here's an interesting elisp coding exercise. I have this elisp function:

(defun replace-digits-by-subscript (string)
  "Replace digits by Unicode subscript characters in STRING.
For example, 「103 and 42」 ⇒ 「₁₀₃ and ₄₂」."
  (let ((myStr string))
    (setq myStr (replace-regexp-in-string "0" "₀" myStr))
    (setq myStr (replace-regexp-in-string "1" "₁" myStr))
    (setq myStr (replace-regexp-in-string "2" "₂" myStr))
    (setq myStr (replace-regexp-in-string "3" "₃" myStr))
    (setq myStr (replace-regexp-in-string "4" "₄" myStr))
    (setq myStr (replace-regexp-in-string "5" "₅" myStr))
    (setq myStr (replace-regexp-in-string "6" "₆" myStr))
    (setq myStr (replace-regexp-in-string "7" "₇" myStr))
    (setq myStr (replace-regexp-in-string "8" "₈" myStr))
    (setq myStr (replace-regexp-in-string "9" "₉" myStr))
    myStr
    ))

You might think it's a bit verbose, or inefficient. But I can't think of a way to improve it. Can you come up with a better version?

Solution

[Rob Shinn ~~https://plus.google.com/101972882215318886468/about~~] suggested that the subscript chars can be computed from the digit chars by way of their codepoints. I implemented his idea like this:

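(defun replace-digits-by-subscript2 (string)
  "Replace digits by Unicode subscript characters in STRING."
  (let ((myStr string) (ii 0))
    (while (< ii 10)
      ;; 48 is the codepoint of 「0」; adding 8272 gives the matching subscript.
      (setq myStr (replace-regexp-in-string
                   (char-to-string (+ ii 48))
                   (char-to-string (+ ii 48 8272))
                   myStr))
      (setq ii (1+ ii))
      )
    myStr
    ))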

This is a good solution, though a bit of a hack, because it depends on how the codepoints are laid out in a charset. In this case it works because the two ranges of codepoints happen to have a constant difference. The char “0” has Unicode codepoint 48, the char “1” has Unicode codepoint 49, etc. The char “₀” has codepoint 8320, the char “₁” has codepoint 8321, etc. They have a constant difference of 8272.
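The arithmetic can be checked directly in Emacs, because a character literal such as ?0 evaluates to its codepoint. The following is only a small sketch; the names my-subscript-offset and my-digit-to-subscript are made up for illustration and are not part of any solution above.

```elisp
;; Hypothetical names, for illustration only.
;; ?₀ is 8320 and ?0 is 48, so the offset is 8272.
(defconst my-subscript-offset (- ?₀ ?0)
  "Difference between a subscript digit's codepoint and the plain digit's.")

(defun my-digit-to-subscript (char)
  "Return the subscript character for the digit character CHAR."
  (+ char my-subscript-offset))

(string (my-digit-to-subscript ?7)) ; ⇒ "₇"
```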

Independently, Jon Snader (aka jcs) gave the following solution (irreal.org), similar in idea but without the loop. Here's the code:

(defun replace-digits-by-subscript3 (string)
  (replace-regexp-in-string "[0-9]"
    (lambda (v) (format "%c" (+ (string-to-number v) 8320))) string) )

This code is an excellent use of format. But more importantly, new to me is that:

• The second argument to replace-regexp-in-string can be a function. Elisp will feed this function the matched text and use the function's return value as the replacement string.
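To see the function-as-replacement feature on its own, here is a small sketch unrelated to subscripts. The function name my-double-numbers is made up for this example.

```elisp
;; Hypothetical example function, for illustration only.
(defun my-double-numbers (string)
  "Return STRING with every run of digits replaced by twice its value."
  (replace-regexp-in-string
   "[0-9]+"
   ;; Called once per match with the matched text; must return a string.
   (lambda (match)
     (number-to-string (* 2 (string-to-number match))))
   string))

(my-double-numbers "3 apples and 10 pears") ; ⇒ "6 apples and 20 pears"
```

The lambda receives each match as a string, so it has to convert to a number and back itself.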

Independently, Anonymous wrote this solution:

(defun replace-digits-by-subscript4 (string)
  (replace-regexp-in-string "[0-9]"
    (lambda (arg) (string (aref "₀₁₂₃₄₅₆₇₈₉" (string-to-number arg)))) string) )

This is an excellent solution, probably the best of all possible solutions. It is very clever, yet doesn't rely on how the charset lays out its codepoints. It relies on this fact:

• When the subscript chars are written in order in a string, each one sits at the index equal to its digit. e.g. (string (aref "₀₁₂₃₄₅₆₇₈₉" 3)) ⇒ "₃".

The use of aref is also new to me. Salute to Anonymous!
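One place where the lookup-string approach pays off is superscripts, whose Unicode codepoints are not contiguous (¹ ² ³ live in the Latin-1 range, the rest around U+2070), so a constant offset would not work. The following is a sketch along the same lines as Anonymous's code; the function name my-replace-digits-by-superscript is made up for this example.

```elisp
;; Hypothetical variant, for illustration only.
(defun my-replace-digits-by-superscript (string)
  "Replace digits by Unicode superscript characters in STRING."
  (replace-regexp-in-string
   "[0-9]"
   ;; The digit's numeric value is the index into the lookup string.
   (lambda (digit)
     (string (aref "⁰¹²³⁴⁵⁶⁷⁸⁹" (string-to-number digit))))
   string))

(my-replace-digits-by-superscript "x2 + y10") ; ⇒ "x² + y¹⁰"
```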

Note that aref is for extracting elements of an array (e.g. string, vector). elt is for extracting elements of any sequence (e.g. string, vector, list). nth is just for lists.
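A few evaluations show the difference. Note that nth takes the index first, while aref and elt take it last.

```elisp
(aref "abc" 1)        ; ⇒ 98, the character ?b
(aref [10 20 30] 1)   ; ⇒ 20
(elt "abc" 1)         ; ⇒ 98
(elt '(10 20 30) 1)   ; ⇒ 20
(nth 1 '(10 20 30))   ; ⇒ 20

;; aref does not accept a list; it signals wrong-type-argument.
(condition-case nil
    (aref '(10 20 30) 1)
  (wrong-type-argument 'not-an-array)) ; ⇒ not-an-array
```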

See: Sequences, Arrays, and Vectors (ELISP Manual)