Emacs: Replace Invisible Unicode Chars

By Xah Lee.

Here's a command that deletes invisible Unicode characters from the current buffer.

Such characters often sneak in when you copy text from sites such as Twitter.

(defun xah-replace-invisible-char ()
  "Delete some invisible Unicode chars in the current buffer.
The chars deleted are:
 ZERO WIDTH NO-BREAK SPACE (65279, #xfeff)
 ZERO WIDTH SPACE (8203, #x200b)
 RIGHT-TO-LEFT MARK (8207, #x200f)
 RIGHT-TO-LEFT OVERRIDE (8238, #x202e)
 LEFT-TO-RIGHT MARK (8206, #x200e)
 OBJECT REPLACEMENT CHARACTER (65532, #xfffc)

Search starts at the beginning of the buffer (respects `narrow-to-region').
No confirmation is asked; every match is deleted.

URL `http://xahlee.info/emacs/emacs/elisp_unicode_replace_invisible_chars.html'
Version: 2018-09-07 2022-09-13"
  ;; `interactive' makes the function callable with M-x.
  (interactive)
  (let ((case-replace nil)
        (case-fold-search nil))
    ;; `save-excursion' restores point with a marker, so the cursor stays
    ;; on the same text even when chars before it are deleted.
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "\ufeff\\|\u200b\\|\u200f\\|\u202e\\|\u200e\\|\ufffc" nil t)
        (replace-match "")))))
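
The command above works on a buffer. If you need the same cleanup on a string from Lisp code, something like the following might do. This is only a sketch: the name my-strip-invisible-chars is hypothetical and not part of the original command, and it assumes you want the same six characters removed.

```elisp
;; Hypothetical helper (assumption, not from the original article):
;; strips the same six invisible chars from a string instead of a buffer.
(defun my-strip-invisible-chars (str)
  "Return a copy of STR with some invisible Unicode chars removed."
  (let ((case-fold-search nil))
    ;; A bracket expression matches any single one of the listed chars.
    ;; The 4th and 5th args (FIXEDCASE, LITERAL) insert the empty
    ;; replacement as-is.
    (replace-regexp-in-string
     "[\ufeff\u200b\u200f\u202e\u200e\ufffc]" "" str t t)))

(my-strip-invisible-chars "x\u200by\ufeffz") ; returns "xyz"
```

A string version is handy in hooks or filters, for example to clean text before inserting it, while the interactive command stays the right tool for fixing a buffer by hand.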

See also: Emacs: Unicode Tutorial