Emacs Lisp: Text Processing, Transforming Page Tag

By Xah Lee.

This page shows an example of using emacs lisp for text processing. The code updates the navigation bar in a set of HTML pages.


You have hundreds of HTML pages that have a nav bar like this:

<div class="pages">Goto Page:
<a href="1.html">1</a>,
<a href="2.html">2</a>,
<a href="3.html">3</a>,

It looks like this in a browser (with CSS):

[image: page tag 1]

This is the page navigation bar. Note that the page contains a link to itself.

You want to remove the self-link, keeping its text. On 1.html, the result should look like this:

<div class="pages">Goto Page:
1,
<a href="2.html">2</a>,
<a href="3.html">3</a>,

[image: page tag 2]


Here are the steps we need to do for each file:

  1. Open the file.
  2. Move cursor to the beginning of the page navigation string.
  3. Move cursor to the file's own name, which is inside the self-link's anchor tag.
  4. Call sgml-delete-tag to remove the anchor tag. (sgml-delete-tag is from html-mode.)
  5. Save the file.
  6. Close the buffer.

We begin by writing code to process a single file, and test it on one file.

(defun my-process-file-navbar (fPath)
  "Remove the self-link from the page nav bar of the HTML file at FPATH."
  (let ((fName (file-name-nondirectory fPath))
        (myBuffer (find-file fPath)))
    (widen) ; in case buffer already open, and narrow-to-region is in effect
    (goto-char (point-min))
    ;; move to the start of the nav bar
    (search-forward "<div class=\"pages\">Goto Page:")
    ;; move into the anchor tag that links to this file
    (search-forward fName)
    ;; delete the <a ...> tag and its matching </a>, keeping the link text
    (sgml-delete-tag 1)
    (save-buffer)
    (kill-buffer myBuffer)))

(my-process-file-navbar "~/test1.html")

For testing, create files {test1.html, test2.html, test3.html} in a temp directory, and change the path in the call above to point there. Place the following content into each file:

<div class="pages">Goto Page: <a href="test1.html">XYZ Overview</a>, <a href="test2.html">Second Page</a>, <a href="test3.html">Summary Z</a></div>

(Note that the link text is not necessarily 1, 2, 3.)
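
The test files can also be created from elisp instead of by hand. The following is a minimal sketch; the function name my-make-navbar-test-files and the "navbar-test-" directory prefix are made up for illustration.

```elisp
(defun my-make-navbar-test-files ()
  "Create test1.html, test2.html, test3.html in a fresh temp directory.
Return the name of that directory."
  (let ((dir (make-temp-file "navbar-test-" t)) ; t means: create a directory
        (content (concat
                  "<div class=\"pages\">Goto Page: "
                  "<a href=\"test1.html\">XYZ Overview</a>, "
                  "<a href=\"test2.html\">Second Page</a>, "
                  "<a href=\"test3.html\">Summary Z</a></div>\n")))
    (dolist (name '("test1.html" "test2.html" "test3.html"))
      ;; with-temp-file writes the temp buffer's text to the named file
      (with-temp-file (expand-file-name name dir)
        (insert content)))
    dir))
```

Calling (my-make-navbar-test-files) returns the new directory's name, which can then be combined with "test1.html" by expand-file-name and passed to my-process-file-navbar.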

The elisp code above is very basic. The functions it uses are documented in these chapters:

Open file. Files (ELISP Manual)
Move cursor. Searching and Matching (ELISP Manual)
Close buffer. Buffers (ELISP Manual)

sgml-delete-tag is from html-mode (which is automatically loaded when an HTML file is opened).

sgml-delete-tag deletes the tag the cursor is on, together with its matching closing or opening tag. The text between the two tags is kept.
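
To see what sgml-delete-tag does without touching any file, it can be tried in a temporary buffer. This is a small sketch; the function name my-demo-delete-self-link is made up, and html-mode is turned on by hand because a temporary buffer does not visit a file.

```elisp
(require 'sgml-mode) ; defines html-mode and sgml-delete-tag

(defun my-demo-delete-self-link ()
  "Return sample nav bar text after removing the link to test1.html."
  (with-temp-buffer
    (insert "<a href=\"test1.html\">XYZ Overview</a>, "
            "<a href=\"test2.html\">Second Page</a>")
    (html-mode) ; gives the buffer the syntax table the tag commands expect
    (goto-char (point-min))
    ;; leave point inside the opening tag of the first link
    (search-forward "test1.html")
    ;; remove that <a ...> tag and its matching </a>
    (sgml-delete-tag 1)
    (buffer-string)))
```

The returned string should have "XYZ Overview" as plain text, with the second link unchanged.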

All we need to do now is to feed it a bunch of file paths.

To get the list of files that contain the page-nav tag, we can simply use the unix “find” and “grep” commands, like this:

find . -name "*.html" -exec grep -l '<div class="pages">' {} \;
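
The same list can also be collected from inside emacs. The following is a minimal sketch, assuming Emacs 25 or later (for directory-files-recursively); the function name my-files-with-navbar is made up for illustration.

```elisp
(defun my-files-with-navbar (dir)
  "Return a list of .html files under DIR that contain the page nav div."
  (let (result)
    (dolist (file (directory-files-recursively dir "\\.html\\'"))
      (with-temp-buffer
        ;; read the file's text without visiting it or turning on any mode
        (insert-file-contents file)
        ;; third arg t: return nil instead of signaling an error on no match
        (when (search-forward "<div class=\"pages\">" nil t)
          (push file result))))
    (nreverse result)))
```

The result is a plain lisp list of file paths, so no editing of shell output is needed before passing it on.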

From the output, we can use string-rectangle and query-replace to construct the following code:

(mapc 'my-process-file-navbar
      '(
        ;; one double-quoted file path per line, taken from the find output
        ))

mapc is the lisp function for looping thru a list or vector. The first argument is a function, which is applied to each element in turn. The single quote in front of the function name is necessary. It prevents the symbol “my-process-file-navbar” from being evaluated as a variable.
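
Here is a small sketch of how mapc behaves, safe to run because it touches no files. The function my-record-path is a made-up stand-in for my-process-file-navbar, and the paths are hypothetical.

```elisp
(defvar my-visited-paths nil
  "Paths passed to `my-record-path', most recent first.")

(defun my-record-path (path)
  "Record PATH instead of editing a file.  For demonstration only."
  (push path my-visited-paths))

;; mapc accepts a quoted list ...
(mapc 'my-record-path '("/tmp/site/1.html" "/tmp/site/2.html"))
;; ... or a vector
(mapc 'my-record-path ["/tmp/site/3.html"])
```

After both calls, my-visited-paths holds all three paths, with the last one processed first in the list. Without the quote, emacs would look for a variable named my-record-path and signal a void-variable error.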
