Emacs: Remove Accent Marks 🚀

By Xah Lee.

Here's an Emacs command that removes accent marks, or more generally converts some Unicode characters into ASCII. (aka Zap Gremlins)

For example, café becomes cafe.

Put this in your Emacs init file:

(defun xah-asciify-text (&optional Begin End)
  "Remove accented letters in current line or selection.
e.g. café → cafe.

URL `http://xahlee.info/emacs/emacs/emacs_zap_gremlins.html'
Version: 2018-11-12 2021-09-17 2022-05-02"
  (interactive)
  (let ((xcharMap
          [
           ["ß" "ss"]
           ["á\\|à\\|â\\|ä\\|ā\\|ǎ\\|ã\\|å\\|ą\\|ă\\|ạ\\|ả\\|ả\\|ấ\\|ầ\\|ẩ\\|ẫ\\|ậ\\|ắ\\|ằ\\|ẳ\\|ặ" "a"]
           ["æ" "ae"]
           ["ç\\|č\\|ć" "c"]
           ["é\\|è\\|ê\\|ë\\|ē\\|ě\\|ę\\|ẹ\\|ẻ\\|ẽ\\|ế\\|ề\\|ể\\|ễ\\|ệ" "e"]
           ["í\\|ì\\|î\\|ï\\|ī\\|ǐ\\|ỉ\\|ị" "i"]
           ["ñ\\|ň\\|ń" "n"]
           ["ó\\|ò\\|ô\\|ö\\|õ\\|ǒ\\|ø\\|ō\\|ồ\\|ơ\\|ọ\\|ỏ\\|ố\\|ổ\\|ỗ\\|ộ\\|ớ\\|ờ\\|ở\\|ợ" "o"]
           ["ú\\|ù\\|û\\|ü\\|ū\\|ũ\\|ư\\|ụ\\|ủ\\|ứ\\|ừ\\|ử\\|ữ\\|ự"     "u"]
           ["ý\\|ÿ\\|ỳ\\|ỷ\\|ỹ"     "y"]
           ["þ" "th"]
           ["ď\\|ð\\|đ" "d"]
           ["ĩ" "i"]
           ["ľ\\|ĺ\\|ł" "l"]
           ["ř\\|ŕ" "r"]
           ["š\\|ś" "s"]
           ["ť" "t"]
           ["ž\\|ź\\|ż" "z"]
           [" " " "]       ; thin space etc
           ["–" "-"]       ; dash
           ["—\\|一" "--"] ; em dash etc
           ])
         (xp1 (if Begin Begin
                (if (region-active-p)
                    (region-beginning)
                  (line-beginning-position))))
         (xp2 (if End End
                (if (region-active-p)
                    (region-end)
                  (line-end-position)))))
    (let ((case-fold-search t))
      (save-restriction
        (narrow-to-region xp1 xp2)
        (mapc
         (lambda (xpair)
           (goto-char (point-min))
           (while (re-search-forward (elt xpair 0) (point-max) t)
             (replace-match (elt xpair 1))))
         xcharMap)))))

(defun xah-asciify-string (String)
  "Returns a new string. e.g. café → cafe.
See `xah-asciify-text'
Version: 2015-06-08"
  (with-temp-buffer
    (insert String)
    (xah-asciify-text (point-min) (point-max))
    (buffer-string)))
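
Once both definitions are loaded, the command can be run with M-x xah-asciify-text on the current line or selection. A minimal sketch of a key binding follows. The key C-c z is an assumption chosen for illustration, not part of the original code, so substitute any key that is free in your setup.

```elisp
;; Assumption: "C-c z" is unused in your config; pick any free key.
(global-set-key (kbd "C-c z") #'xah-asciify-text)

;; The string version can be called from Lisp. Per its docstring:
;; (xah-asciify-string "café") returns "cafe"
```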

[see Accent Marks: Trema, Umlaut, Macron, Circumflex]

(Thanks to robert_nagy for adding chars.)

Accumulator vs Parallel Programming

This problem makes a good parallel programming exercise. See: Parallel Programing Exercise: asciify-string.

Alternative Solution with “iconv” or perl

Yuri Khan and Teemu Likonen suggested using the “iconv” shell command. See man iconv. Here's Teemu's code.

(defun asciify-string (string)
"Convert STRING to ASCII string.
For example:
“passé” becomes “passe”"
;; Code originally by Teemu Likonen
  (with-temp-buffer
    (insert string)
    (call-process-region (point-min) (point-max) "iconv" t t nil "--to-code=ASCII//TRANSLIT")
    (buffer-substring-no-properties (point-min) (point-max))))

Julian Bradfield suggested Perl. Here's his one-liner, which removes chars with accent marks.

perl -e 'use encoding utf8; use Unicode::Normalize; while ( <> ) { $_ = NFKD($_); s/\pM//g; print; }'

http://groups.google.com/group/comp.emacs/msg/8d58b6e9b2bd07fd

Still, it would be nice to have a pure elisp solution, because “iconv” was not available on Windows or Mac OS X at the time of writing.
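
One possible pure elisp approach mirrors the Perl one-liner: decompose the string with NFKD, then drop the combining marks. The sketch below is an illustration, not the article's method. The name `my-asciify-by-nfkd` is hypothetical, and it assumes the built-in `ucs-normalize` library and `get-char-code-property` are available, as they are in recent Emacs versions.

```elisp
(require 'ucs-normalize) ; built-in; provides `ucs-normalize-NFKD-string'

(defun my-asciify-by-nfkd (string)
  "Return STRING decomposed by NFKD with combining marks removed.
Hypothetical helper, not part of the original article's code."
  (concat
   (delq nil
         (mapcar
          (lambda (ch)
            ;; Keep CH unless its Unicode general category is a mark
            ;; (Mn, Mc, Me), which is what \pM matches in the Perl version.
            (unless (memq (get-char-code-property ch 'general-category)
                          '(Mn Mc Me))
              ch))
          (ucs-normalize-NFKD-string string)))))

;; (my-asciify-by-nfkd "café") returns "cafe"
```

Characters that have no Unicode decomposition, such as ß, ø, æ, and đ, pass through unchanged, so a replacement table like the one in xah-asciify-text is still needed for those.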