Emacs: Convert Full-Width/Half-Width Punctuations
This page shows commands to convert between full-width and half-width characters (全角/半角转换, "full-width/half-width conversion").
If you type Chinese or Japanese mixed with English, you often end up with a mix of Asian and Western punctuation, which is tedious to fix by hand.
- . ↔ 。 (U+3002: IDEOGRAPHIC FULL STOP)
- , ↔ ， (U+FF0C: FULLWIDTH COMMA)
- ? ↔ ？ (U+FF1F: FULLWIDTH QUESTION MARK)
- ; ↔ ； (U+FF1B: FULLWIDTH SEMICOLON)
[see Unicode Full-Width Characters]
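The correspondence in the list above can be checked from Emacs Lisp. What follows is a minimal sketch, assuming the standard Unicode layout in which the fullwidth forms U+FF01 to U+FF5E sit exactly #xFEE0 above ASCII U+0021 to U+007E. The function name `my-ascii-to-fullwidth-char` is made up for this illustration and is not one of the commands on this page.

```elisp
(defun my-ascii-to-fullwidth-char (char)
  "Return the fullwidth form of ASCII CHAR, or CHAR itself if none applies.
Hypothetical helper, for illustration only."
  (cond
   ;; ASCII space maps to IDEOGRAPHIC SPACE (U+3000), outside the FFxx block.
   ((= char ?\s) #x3000)
   ;; Printable ASCII U+0021..U+007E maps to U+FF01..U+FF5E.
   ((and (>= char #x21) (<= char #x7E)) (+ char #xFEE0))
   (t char)))

(mapcar (lambda (s)
          (format "U+%04X" (my-ascii-to-fullwidth-char (string-to-char s))))
        '("," "?" ";" "."))
;; ⇒ ("U+FF0C" "U+FF1F" "U+FF1B" "U+FF0E")
```

Note the last result: the offset gives U+FF0E (FULLWIDTH FULL STOP), not the Chinese full stop 。 (U+3002). That is why a lookup table of pairs is a safer fit for punctuation than pure arithmetic.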
Convert English Chinese Punctuation
(defun xah-convert-english-chinese-punctuation (@begin @end &optional @to-direction)
  "Convert punctuation from/to English/Chinese characters.

When called interactively, do current line or selection. The conversion direction is automatically determined.

If `universal-argument' is called, ask user for change direction.

When called in lisp code, @begin @end are region begin/end positions. @to-direction must be any of the following values: 「\"chinese\"」, 「\"english\"」, 「\"auto\"」. If omitted, 「\"auto\"」 is used.

See also: `xah-remove-punctuation-trailing-redundant-space'.

URL `http://xahlee.info/emacs/emacs/elisp_convert_chinese_punctuation.html'
Version 2015-10-05"
  (interactive
   (let ($p1 $p2)
     (if (use-region-p)
         (progn
           (setq $p1 (region-beginning))
           (setq $p2 (region-end)))
       (progn
         (setq $p1 (line-beginning-position))
         (setq $p2 (line-end-position))))
     (list
      $p1
      $p2
      (if current-prefix-arg
          (ido-completing-read
           "Change to: "
           '("english" "chinese")
           nil                          ; PREDICATE
           t)                           ; REQUIRE-MATCH
        "auto"))))
  ;; A nil direction (argument omitted in lisp code) means "auto".
  (unless @to-direction (setq @to-direction "auto"))
  (let (($input-str (buffer-substring-no-properties @begin @end))
        ($replacePairs
         [
          [". " "。"]
          [".\n" "。\n"]
          [", " "，"]
          [",\n" "，\n"]
          [": " "："]
          ["; " "；"]
          ["? " "？"] ; no space after
          ["! " "！"]

          ;; for inside HTML
          [".</" "。</"]
          ["?</" "？</"]
          [":</" "：</"]

          [" " "\u3000"] ; ASCII space ↔ IDEOGRAPHIC SPACE
          ]))
    (when (string-equal @to-direction "auto")
      (setq @to-direction
            ;; Any Chinese punctuation in the input means: go to English.
            (if (or (string-match "\u3000" $input-str)
                    (string-match "。" $input-str)
                    (string-match "，" $input-str)
                    (string-match "？" $input-str)
                    (string-match "！" $input-str))
                "english"
              "chinese")))
    (save-excursion
      (save-restriction
        (narrow-to-region @begin @end)
        (mapc
         (lambda ($x)
           (progn
             (goto-char (point-min))
             (while (search-forward (aref $x 0) nil "noerror")
               (replace-match (aref $x 1)))))
         (cond
          ((string-equal @to-direction "chinese") $replacePairs)
          ((string-equal @to-direction "english")
           ;; Swap each pair to convert in the other direction.
           (mapcar (lambda (x) (vector (elt x 1) (elt x 0))) $replacePairs))
          (t (user-error "Your 3rd argument 「%s」 isn't valid" @to-direction))))))))
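The "auto" direction of the command above comes down to one question: does the text already contain Chinese punctuation? Here is a small string-based sketch of that check, so it can be tried outside a buffer. The name `my-guess-punctuation-direction` is hypothetical and not part of the command. The characters are written as `\u` escapes so that the ideographic space stays visible.

```elisp
(defun my-guess-punctuation-direction (text)
  "Return \"english\" if TEXT has Chinese punctuation, else \"chinese\".
Hypothetical helper, for illustration only."
  ;; U+3000 IDEOGRAPHIC SPACE, U+3002 IDEOGRAPHIC FULL STOP,
  ;; U+FF0C FULLWIDTH COMMA, U+FF1F FULLWIDTH QUESTION MARK,
  ;; U+FF01 FULLWIDTH EXCLAMATION MARK
  (if (string-match-p "[\u3000\u3002\uFF0C\uFF1F\uFF01]" text)
      "english"
    "chinese"))

(my-guess-punctuation-direction "Hello, world. ")
;; ⇒ "chinese"
```

The return value names the target of the conversion, not the language detected: text that already has Chinese punctuation gets converted to English, and vice versa.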
Remove Punctuation Trailing Redundant Spaces
Here's a helpful command to remove redundant spaces after punctuation.
- In English text, the convention is to have 1 space after punctuation (sometimes 2, after the Full Stop sign).
- In Chinese text, the convention is to have no space after punctuation.
(defun xah-remove-punctuation-trailing-redundant-space (@begin @end)
  "Remove redundant whitespace after punctuation.
Works on current line or text selection.

When called in emacs lisp code, @begin @end are cursor positions for region.

See also `xah-convert-english-chinese-punctuation'.

URL `http://xahlee.info/emacs/emacs/elisp_convert_chinese_punctuation.html'
version 2015-08-22"
  (interactive
   (if (use-region-p)
       (list (region-beginning) (region-end))
     (list (line-beginning-position) (line-end-position))))
  (require 'xah-replace-pairs)
  (xah-replace-regexp-pairs-region
   @begin @end
   [
    ;; clean up. Remove extra space.
    [" +," ","]
    [", +" ", "]
    ["\\? +" "? "] ; the 「?」 is escaped so that it is a literal question mark
    ["! +" "! "]
    ["\\. +" ". "]

    ;; fullwidth punctuations
    ["， +" "，"]
    ["。 +" "。"]
    ["： +" "："]
    ["？ +" "？"]
    ["； +" "；"]
    ["！ +" "！"]
    ["、 +" "、"]
    ]
   "FIXEDCASE" "LITERAL"))
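The command above depends on the xah-replace-pairs package. For trying the same two conventions on a plain string, here is a dependency-free sketch built on the built-in `replace-regexp-in-string`. The function name `my-squeeze-space-after-punctuation` is made up for this example, and the fullwidth characters are written as `\u` escapes. It is an approximation of the rules, not a drop-in replacement for the command.

```elisp
(defun my-squeeze-space-after-punctuation (text)
  "Return TEXT with redundant spaces around punctuation removed.
Hypothetical string-based sketch, for illustration only."
  ;; No space before an ASCII comma.
  (setq text (replace-regexp-in-string " +," "," text))
  ;; ASCII punctuation: collapse a run of spaces to exactly one.
  (setq text (replace-regexp-in-string "\\([,.?!]\\) +" "\\1 " text))
  ;; Fullwidth/CJK punctuation (， 。 ： ？ ； ！ 、): no space at all.
  (replace-regexp-in-string
   "\\([\uFF0C\u3002\uFF1A\uFF1F\uFF1B\uFF01\u3001]\\) +" "\\1" text))

(my-squeeze-space-after-punctuation "Hi,   there.  OK?")
;; ⇒ "Hi, there. OK?"
```

English punctuation keeps one space, while a string such as 「你好，  世界。 」 comes back as 「你好，世界。」 with no spaces left after the punctuation.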
These commands are useful for Twitter too, for saving a few characters against Twitter's character limit. English punctuation takes 2 characters each (the symbol plus the space after it), while the Chinese version needs just one, because the space is included in the punctuation symbol.
Convert Half-Width Full-Width Characters
This command converts all English letters, digits, and punctuation between half-width and full-width, in either direction.
[see Unicode Full-Width Characters]
(defun xah-convert-fullwidth-chars (@begin @end &optional @to-direction)
  "Convert ASCII chars to/from Unicode fullwidth version.
Works on current line or text selection.

The conversion direction is determined like this: in 「\"auto\"」 mode the direction alternates on each call. The state is stored in the `state' property of this command's symbol. The first call converts to ASCII.

If `universal-argument' is called first:

 no C-u → Automatic.
 C-u → to ASCII
 C-u 1 → to ASCII
 C-u 2 → to Unicode

When called in lisp code, @begin @end are region begin/end positions. @to-direction must be any of the following values: 「\"unicode\"」, 「\"ascii\"」, 「\"auto\"」. If omitted, 「\"auto\"」 is used.

See also: `xah-remove-punctuation-trailing-redundant-space'.

URL `http://xahlee.info/emacs/emacs/elisp_convert_chinese_punctuation.html'
Version 2018-08-02"
  (interactive
   (let ($p1 $p2)
     (if (use-region-p)
         (progn
           (setq $p1 (region-beginning))
           (setq $p2 (region-end)))
       (progn
         (setq $p1 (line-beginning-position))
         (setq $p2 (line-end-position))))
     (list $p1 $p2
           (cond
            ((equal current-prefix-arg nil) "auto")
            ((equal current-prefix-arg '(4)) "ascii")
            ((equal current-prefix-arg 1) "ascii")
            ((equal current-prefix-arg 2) "unicode")
            (t "unicode")))))
  ;; `xah-replace-pairs-region' comes from this package.
  (require 'xah-replace-pairs)
  ;; A nil direction (argument omitted in lisp code) means "auto".
  (unless @to-direction (setq @to-direction "auto"))
  (let* (($ascii-unicode-map
          [
           ["0" "０"] ["1" "１"] ["2" "２"] ["3" "３"] ["4" "４"] ["5" "５"] ["6" "６"] ["7" "７"] ["8" "８"] ["9" "９"]
           ["A" "Ａ"] ["B" "Ｂ"] ["C" "Ｃ"] ["D" "Ｄ"] ["E" "Ｅ"] ["F" "Ｆ"] ["G" "Ｇ"] ["H" "Ｈ"] ["I" "Ｉ"] ["J" "Ｊ"] ["K" "Ｋ"] ["L" "Ｌ"] ["M" "Ｍ"]
           ["N" "Ｎ"] ["O" "Ｏ"] ["P" "Ｐ"] ["Q" "Ｑ"] ["R" "Ｒ"] ["S" "Ｓ"] ["T" "Ｔ"] ["U" "Ｕ"] ["V" "Ｖ"] ["W" "Ｗ"] ["X" "Ｘ"] ["Y" "Ｙ"] ["Z" "Ｚ"]
           ["a" "ａ"] ["b" "ｂ"] ["c" "ｃ"] ["d" "ｄ"] ["e" "ｅ"] ["f" "ｆ"] ["g" "ｇ"] ["h" "ｈ"] ["i" "ｉ"] ["j" "ｊ"] ["k" "ｋ"] ["l" "ｌ"] ["m" "ｍ"]
           ["n" "ｎ"] ["o" "ｏ"] ["p" "ｐ"] ["q" "ｑ"] ["r" "ｒ"] ["s" "ｓ"] ["t" "ｔ"] ["u" "ｕ"] ["v" "ｖ"] ["w" "ｗ"] ["x" "ｘ"] ["y" "ｙ"] ["z" "ｚ"]
           ["," "，"] ["." "．"] [":" "："] [";" "；"] ["!" "！"] ["?" "？"]
           ["\"" "＂"] ["'" "＇"] ["`" "｀"] ["^" "＾"] ["~" "～"] ["¯" "￣"] ["_" "＿"]
           [" " "\u3000"] ; ASCII space ↔ IDEOGRAPHIC SPACE
           ["&" "＆"] ["@" "＠"] ["#" "＃"] ["%" "％"] ["+" "＋"] ["-" "－"] ["*" "＊"] ["=" "＝"] ["<" "＜"] [">" "＞"]
           ["(" "（"] [")" "）"] ["[" "［"] ["]" "］"] ["{" "｛"] ["}" "｝"]
           ["(" "｟"] [")" "｠"] ; FULLWIDTH WHITE PARENTHESIS, relevant for the to-ASCII direction
           ["|" "｜"] ["¦" "￤"] ["/" "／"] ["\\" "＼"] ["¬" "￢"]
           ["$" "＄"] ["£" "￡"] ["¢" "￠"] ["₩" "￦"] ["¥" "￥"]
           ])
         ($reverse-map
          (mapcar
           (lambda (x) (vector (elt x 1) (elt x 0)))
           $ascii-unicode-map))
         ;; The direction toggle is stored as a property on the command's symbol.
         ($stateBefore
          (if (get 'xah-convert-fullwidth-chars 'state)
              (get 'xah-convert-fullwidth-chars 'state)
            (progn
              (put 'xah-convert-fullwidth-chars 'state 0)
              0)))
         ($stateAfter (if (eq $stateBefore 0) 1 0)))
    (let ((case-fold-search nil))
      (xah-replace-pairs-region
       @begin @end
       (cond
        ((string-equal @to-direction "unicode") $ascii-unicode-map)
        ((string-equal @to-direction "ascii") $reverse-map)
        ((string-equal @to-direction "auto")
         ;; 2018-08-02 testing (eq last-command this-command) to detect a
         ;; repeated call doesn't work when using smex, so the stored
         ;; state is used instead.
         (if (eq $stateBefore 0)
             $reverse-map
           $ascii-unicode-map))
        (t (user-error "Your 3rd argument 「%s」 isn't valid" @to-direction)))
       t t))
    (put 'xah-convert-fullwidth-chars 'state $stateAfter)))
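The "auto" mode of the command above remembers its direction between calls by storing a value on the command's own symbol with `put` and reading it back with `get`. Here is that technique in isolation, as a minimal sketch. The function name `my-toggle-direction` is hypothetical. It assumes, as the code above does, that state 0 means the next conversion goes to ASCII.

```elisp
(defun my-toggle-direction ()
  "Return \"ascii\" and \"unicode\" alternately on successive calls.
Hypothetical helper; the state lives on this function's symbol plist."
  (let* ((before (or (get 'my-toggle-direction 'state) 0)) ; unset counts as 0
         (after (if (eq before 0) 1 0)))
    (put 'my-toggle-direction 'state after)
    (if (eq before 0) "ascii" "unicode")))

;; In a fresh session:
(list (my-toggle-direction) (my-toggle-direction) (my-toggle-direction))
;; ⇒ ("ascii" "unicode" "ascii")
```

A symbol property survives between calls without a separate global variable. The trade-off is that the state is global to the Emacs session, not per buffer.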