Emacs Lisp: Transform HTML Tags from “span” to “b”
This page shows a simple practical elisp script for HTML tag transformation.
I want to transform the HTML tag <span class="w">…</span> to <b>…</b>, for over a hundred files. Also, print a report of the changes.
Here's an outline of the steps.
- Open the file. Use a regex to search for the span markup.
- Make the replacement.
- Add the replacement to a list, for later report.
- Repeat the above until no more found.
- Use a directory traversal function to apply the above to every file. [see Emacs Lisp: Walk Directory, List Files]
- When done, print the list of changes.
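The first four steps can be tried on a plain string before touching any files. The following is a minimal sketch, not the script itself: the function name my-demo-span-to-b is hypothetical, it works in a temporary buffer instead of visiting a file, and it leaves out the length check.

```elisp
;; Sketch only: `my-demo-span-to-b' is a hypothetical name for illustration.
(defun my-demo-span-to-b (html)
  "Return a list (NEW-HTML CHANGED-WORDS) for the string HTML.
Each <span class=\"w\">…</span> in HTML becomes <b>…</b>."
  (let ((case-fold-search nil)          ; match the tag case-sensitively
        (changed '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; repeat until no more matches are found
      (while (re-search-forward "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
        (let ((word (match-string 1)))
          ;; t t = keep letter case as is, insert the replacement literally
          (replace-match (concat "<b>" word "</b>") t t)
          ;; remember the word for the report
          (push word changed)))
      (list (buffer-string) (nreverse changed)))))

;; Example call:
;; (my-demo-span-to-b "<p><span class=\"w\">abate</span> and <span class=\"w\">zeal</span></p>")
;; returns ("<p><b>abate</b> and <b>zeal</b></p>" ("abate" "zeal"))
```

Returning the changed words together with the new text keeps the sketch free of global variables, which makes it easy to test on small samples.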
Here's the code:
;; -*- coding: utf-8 -*-
;; 2011-07-18
;; replace <span class="w">…</span> to <b>…</b>
;;
;; do this for all files in a dir.

(setq inputDir "~/web/vocabulary/" ) ; dir should end with a slash

(setq changedItems '())

(defun my-process-file (fPath)
  "Process the file at FPATH"
  (let (myBuff myWord)
    (setq myBuff (find-file fPath))
    (widen)
    (goto-char (point-min)) ;; in case buffer already open
    (while (re-search-forward "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
      (setq myWord (match-string 1))
      (when (< (length myWord) 15) ; a little double check in case of possible mismatched tag
        (replace-match (concat "<b>" myWord "</b>" ) t)
        (setq changedItems (cons (substring-no-properties myWord) changedItems ) ) ) )
    ;; close buffer if there is no change. Else leave it open.
    (when (not (buffer-modified-p myBuff)) (kill-buffer myBuff) ) ) )

(require 'find-lisp)

(setq make-backup-files t)
(setq case-fold-search nil)
(setq case-replace nil)

(let (outputBuffer)
  (setq outputBuffer "*xah span.w to b replace output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (print changedItems)
    (princ "Done deal!") ) )
Here's the output: elisp_batch_html_tag_transform_bold_output.txt.
There are over 1,000 changes. The output is extremely useful because I can take a few seconds to glance at it and know there are no errors. Errors are possible because, whenever you use regex to parse HTML, a missing tag or even an unexpected nested tag can mean disaster.
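One extra check that could go with the report: the pattern [^<]+? cannot cross a "<", so a span that contains a nested tag is skipped, not replaced. Counting the opening tags that remain after a run shows how many such spots are left. This is a sketch under that assumption; the function name my-demo-count-leftover-spans is hypothetical and not part of the script.

```elisp
;; Sketch only: `my-demo-count-leftover-spans' is a hypothetical helper.
(defun my-demo-count-leftover-spans (html)
  "Return the number of <span class=\"w\"> opening tags in the string HTML."
  (let ((case-fold-search nil)          ; match the tag case-sensitively
        (count 0))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; plain string search, no regex needed for a fixed tag
      (while (search-forward "<span class=\"w\">" nil t)
        (setq count (1+ count))))
    count))

;; Example calls:
;; (my-demo-count-leftover-spans "<span class=\"w\">a<i>b</i></span>") returns 1
;; (my-demo-count-leftover-spans "<b>abate</b>") returns 0
```

A nonzero count after the batch run points to spans that need to be fixed by hand.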
The code is simple. If you don't understand it, see: