ELisp: Chinese Char To Reference Link

By Xah Lee.

We need a command that turns the Chinese character before the cursor into several HTML reference links.

For example, 我 becomes:

<div class="chineseXL"><span lang="zh">我</span> <span class="en"><a href="https://translate.google.com/#zh-CN|en|我">Translate</a> • <a href="https://en.wiktionary.org/wiki/我">Wiktionary</a></span></div>

Appears in browser like this:

我 Translate • Wiktionary

This is useful for writing blog posts about languages and linguistics.

Solution

Here's the code:

(defun xah-words-chinese-linkify ()
  "Make the Chinese character before cursor into Chinese dictionary reference links.

URL `http://xahlee.info/emacs/emacs/elisp_chinese_char_linkify.html'
Version 2020-11-24 2021-05-02"
  (interactive)
  (let (
        ($template
         "<div class=\"chineseXL\"><span lang=\"zh\">▮</span> <span class=\"en\"><a href=\"https://translate.google.com/#zh-CN|en|▮\">Translate</a> • <a href=\"https://en.wiktionary.org/wiki/▮\">Wiktionary</a></span></div>"
         )
        ($char (buffer-substring-no-properties (- (point) 1) (point))))
    (delete-char -1)
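    ;; The trailing t t are FIXEDCASE and LITERAL: insert $char exactly as is,
    ;; even if it is a backslash.
    (insert (replace-regexp-in-string "▮" $char $template t t))))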

This is truly a time saver.
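To try the transformation without touching a buffer, the string-building part can be pulled out into its own function. The following is a minimal sketch, not part of the original command. The name `my-chinese-char-links` is hypothetical, and the template is copied from the code above.

```elisp
;; Hypothetical helper (not from the original command): build the HTML
;; for CHAR-STRING and return it, leaving buffer editing to the caller.
(defun my-chinese-char-links (char-string)
  "Return dictionary reference links HTML for CHAR-STRING, a one-character string."
  (let ((template
         "<div class=\"chineseXL\"><span lang=\"zh\">▮</span> <span class=\"en\"><a href=\"https://translate.google.com/#zh-CN|en|▮\">Translate</a> • <a href=\"https://en.wiktionary.org/wiki/▮\">Wiktionary</a></span></div>"))
    ;; The two trailing t arguments are FIXEDCASE and LITERAL, so the
    ;; replacement text is inserted exactly as given.
    (replace-regexp-in-string "▮" char-string template t t)))

(my-chinese-char-links "我")
```

Evaluating the last form returns the div with 我 filled in at all three ▮ positions. A command could call such a helper and insert the result.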

URL Encoding of Chinese

Note: technically, Chinese chars in a URL should be URL Encoded.

For example:

http://en.wiktionary.org/wiki/中

should be:

http://en.wiktionary.org/wiki/%E4%B8%AD

(Each byte of the character's UTF-8 encoding is written as % followed by two hexadecimal digits. The UTF-8 encoding of 中 is 3 bytes, in hexadecimal: E4 B8 AD.)
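The byte-by-byte rule can be checked by hand in Emacs Lisp. This is only a sketch of the mechanism: the function name `my-percent-encode-all` is hypothetical, and unlike a real URL encoder it encodes every byte, ASCII included.

```elisp
;; Percent-encode a string by hand: encode it to UTF-8 bytes, then write
;; each byte as % followed by two uppercase hex digits.
;; `my-percent-encode-all' is a hypothetical name used for illustration.
(defun my-percent-encode-all (string)
  "Return STRING with every byte of its UTF-8 encoding written as %XX."
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             (encode-coding-string string 'utf-8)
             ""))

(my-percent-encode-all "中") ; "%E4%B8%AD"
```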

However, I think percent encoding is an abomination. [see Problems of Symbol Congestion in Computer Languages; ASCII Jam vs Unicode] I decided not to botch the Chinese chars in my URLs. This does not cause practical problems.

If you do want to percent-encode them, see: ELisp: URL Percent Decode/Encode
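As a quick illustration, Emacs ships `url-hexify-string` and `url-unhex-string` in its `url-util` library. The sketch below applies them to the Wiktionary URL from the example above. It shows one possible approach, not necessarily what the linked page does.

```elisp
(require 'url-util)

;; `url-hexify-string' percent-encodes the UTF-8 bytes of characters
;; that are not unreserved in URLs.
(concat "https://en.wiktionary.org/wiki/" (url-hexify-string "中"))
;; "https://en.wiktionary.org/wiki/%E4%B8%AD"

;; Decoding: `url-unhex-string' returns raw bytes, so decode them as UTF-8.
(decode-coding-string (url-unhex-string "%E4%B8%AD") 'utf-8)
;; "中"
```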
