Elisp: Transform HTML Tags from “span” to “b”

By Xah Lee.

This page shows a simple practical elisp script for HTML tag transformation.

Problem

I want to transform the HTML tag

<span class="w">…</span>

to

<b>…</b>

for over a hundred files. I also want to print a report of the changes.

Solution

Here's an outline of the steps.

  1. Open the file. Use regex to search the span markup.
  2. Make the replacement.
  3. Add the replacement to a list, for later report.
  4. Repeat the above until no more found.
  5. Use a dir traverse function to apply the above to every file. 〔see Elisp: Walk Directory, List Files〕
  6. When done, print the list of changes.

Here's the code:

;; -*- coding: utf-8 -*-
;; 2011-07-18
;; replace <span class="w">…</span> with <b>…</b>
;;
;; do this for all files in a dir.

(setq inputDir "~/web/vocabulary/" ) ; dir should end with a slash

(setq changedItems '())

(defun my-process-file (fPath)
  "Replace span.w tags with b tags in the file at FPATH.
Each replaced word is added to `changedItems'."
  (let (myBuff myWord)
    (setq myBuff (find-file fPath))

    (widen) (goto-char (point-min)) ;; in case buffer already open

    (while (re-search-forward "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
      (setq myWord (match-string 1))
      (when (< (length myWord) 15) ; a little double check in case of possible mismatched tag
        (replace-match (concat "<b>" myWord "</b>" )  t t) ; FIXEDCASE and LITERAL both t, so a backslash in myWord is inserted as is
        (setq changedItems (cons (substring-no-properties myWord) changedItems ) )
        ) )

    ;; close buffer if there is no change. Else leave it open.
    (when (not (buffer-modified-p myBuff)) (kill-buffer myBuff) )
    ) )

(require 'find-lisp)

(setq make-backup-files t)
(setq case-fold-search nil)
(setq case-replace nil)

(let (outputBuffer)
  (setq outputBuffer "*xah span.w to b replace output*" )
  (with-output-to-temp-buffer outputBuffer
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.html$"))
    (print changedItems)
    (princ "Done deal!")
    )
  )
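
Before running this on a hundred files, it can help to try the regex on a string first. Here's a minimal sketch that does the same search and replace in a temp buffer, so no file is touched. The function name my-span-to-b-string is made up for this example and is not part of the script above.

```elisp
;; Hypothetical helper for testing. Not part of the original script.
(defun my-span-to-b-string (html)
  "Return a list (NEW-HTML CHANGED-WORDS) for the string HTML.
Uses the same regex and length check as `my-process-file',
but works in a temp buffer instead of a file."
  (let ((case-fold-search nil)
        (changed '())
        word)
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (while (re-search-forward "<span class=\"w\">\\([^<]+?\\)</span>" nil t)
        (setq word (match-string 1))
        ;; same double check as the script: skip suspiciously long matches
        (when (< (length word) 15)
          ;; FIXEDCASE t, LITERAL t: insert the text exactly as given
          (replace-match (concat "<b>" word "</b>") t t)
          (push word changed)))
      ;; nreverse so the words come out in document order
      (list (buffer-string) (nreverse changed)))))

(my-span-to-b-string
 "a <span class=\"w\">cat</span> and <span class=\"w\">dog</span>")
;; ⇒ ("a <b>cat</b> and <b>dog</b>" ("cat" "dog"))
```

The second element of the result is the same kind of list the script collects in changedItems, just for one string.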

Here's the output: elisp_batch_html_tag_transform_bold_output.txt.

There are over 1k changes. The output is extremely useful because I can just take a few seconds to glance at it to know there are no errors. Errors are possible because whenever you use regex to parse HTML, a missing tag or even an unexpected nested tag can mean disaster.
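
Here's a small sketch of the nested tag case. Because the regex uses [^<]+? for the content, a span that contains another tag is not matched at all, so it is silently left unchanged. One way to catch that, assuming you want a check after the run, is to count the span openings that are still there. The names my-span-regex and my-leftover-span-count are made up for this example.

```elisp
;; Hypothetical check. Not part of the original script.
(defvar my-span-regex "<span class=\"w\">\\([^<]+?\\)</span>"
  "Same regex as used in the script.")

(defun my-leftover-span-count (html)
  "Return how many <span class=\"w\"> opening tags remain in the string HTML."
  (let ((case-fold-search nil)
        (count 0)
        (start 0))
    (while (string-match "<span class=\"w\">" html start)
      (setq count (1+ count))
      (setq start (match-end 0)))
    count))

;; A nested tag: the script's regex finds nothing here.
(let ((case-fold-search nil))
  (string-match-p my-span-regex "<span class=\"w\">a <i>cat</i></span>"))
;; ⇒ nil

;; But the opening tag is still there, so the count reports it.
(my-leftover-span-count "<span class=\"w\">a <i>cat</i></span>")
;; ⇒ 1
```

A count above zero after the run means some span was skipped and needs a look by hand.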

The code is simple. If you don't understand it, see: