Emacs Lisp: Text Processing, Transforming Page Tag
This page shows an example of using Emacs Lisp for text processing. The code updates the navigation bar of HTML pages.
Problem
You have hundreds of HTML pages that have a nav bar like this:
<div class="pages">Goto Page: <a href="1.html">1</a>, <a href="2.html">2</a>, <a href="3.html">3</a>, … </div>
In a browser (with CSS), this is rendered as the page navigation bar. Note that each page contains a link to itself.
You want to remove the self-link. The result should look like this:
<div class="pages">Goto Page: 1, <a href="2.html">2</a>, <a href="3.html">3</a>, … </div>

Solution
Here are the steps we need to do for each file:
- open the file.
- move cursor to the beginning of the page navigation string.
- move cursor to the file name.
- call sgml-delete-tag to remove the anchor tag. (sgml-delete-tag is from html-mode.)
- save the file.
- close the buffer.
We begin by writing test code that processes a single file.
(defun my-process-file-navbar (fPath)
  "Modify the HTML file at fPath."
  (let (fName myBuffer)
    (setq fName (file-name-nondirectory fPath))
    (setq myBuffer (find-file fPath))
    (widen) ; in case buffer already open, and narrow-to-region is in effect
    (goto-char (point-min))
    (search-forward "<div class=\"pages\">Goto Page:")
    (search-forward fName)
    (sgml-delete-tag 1)
    (save-buffer)
    (kill-buffer myBuffer)))

(my-process-file-navbar "~/test1.html")
For testing, create the files test1.html, test2.html, test3.html in a temp directory. Place the following content into each file:
<div class="pages">Goto Page: <a href="test1.html">XYZ Overview</a>, <a href="test2.html">Second Page</a>, <a href="test3.html">Summary Z</a></div>
(note that the link text may not be 1, 2, 3.)
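The test files can also be created from Emacs itself. The following is a minimal sketch, not part of the original workflow: the helper name my-make-navbar-test-files is made up for illustration, and it writes the sample content above into whatever directory you pass in.

```elisp
(defun my-make-navbar-test-files (dir)
  "Create test1.html, test2.html, test3.html in DIR with the sample nav bar.
Return the list of file paths created.  Hypothetical helper for testing."
  (let ((content (concat
                  "<div class=\"pages\">Goto Page: "
                  "<a href=\"test1.html\">XYZ Overview</a>, "
                  "<a href=\"test2.html\">Second Page</a>, "
                  "<a href=\"test3.html\">Summary Z</a></div>\n")))
    (mapcar (lambda (name)
              (let ((path (expand-file-name name dir)))
                ;; with-temp-file writes the buffer text to PATH, overwriting it.
                (with-temp-file path (insert content))
                path))
            '("test1.html" "test2.html" "test3.html"))))

;; Example: make a fresh temp directory and fill it.
;; (my-make-navbar-test-files (make-temp-file "navbar-test" t))
```

Passing a fresh directory from make-temp-file keeps the test files away from real pages, so the processing function can be run on them repeatedly.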
The elisp code above is very basic.
- find-file. Open file. See Files (ELISP Manual).
- search-forward. Search for a string and move cursor to the end of the match. See Searching and Matching (ELISP Manual).
- kill-buffer. Close buffer. See Buffers (ELISP Manual).
sgml-delete-tag is from html-mode (which is automatically loaded when an HTML file is opened). It deletes the tag the cursor is on, together with its matching closing or opening tag.
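To see what result to expect without opening any file, the effect on the nav bar can be sketched on a plain string. Note that this sketch is a stand-in: it uses a regex with replace-match instead of sgml-delete-tag, it assumes the link text contains no nested tags, and the function name my-unlink-in-string is made up for illustration.

```elisp
(defun my-unlink-in-string (file-name html)
  "Return HTML with the first <a href=\"FILE-NAME\">text</a> replaced by text.
Hypothetical helper; a regex stand-in for `sgml-delete-tag'."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (when (re-search-forward
           ;; regexp-quote makes the dot in the file name match literally.
           (concat "<a href=\"" (regexp-quote file-name) "\">\\([^<]*\\)</a>")
           nil t)
      ;; Keep only the link text (group 1); t means do not alter letter case.
      (replace-match "\\1" t))
    (buffer-string)))

;; Example:
;; (my-unlink-in-string
;;  "test1.html"
;;  "Goto Page: <a href=\"test1.html\">XYZ Overview</a>, <a href=\"test2.html\">Second Page</a>")
```

The regex version only handles anchors written exactly as href="file" with plain text inside, which is why the file-processing code relies on sgml-delete-tag for the real work.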
All we need to do now is to feed it a bunch of file paths.
To get the list of files that contain the page-nav tag, we can simply use Linux's “find” and “grep”, like this:
find . -name "*\.html" -exec grep -l '<div class="pages">' {} \;
From the output, we can use string-rectangle and query-replace to construct the following code:
(mapc 'my-process-file-navbar
      [
       "~/web/cat1.html"
       "~/web/dog.html"
       "~/web/something.html"
       "~/web/xyz.html"
       ])
mapc is a lisp idiom for looping through a list or vector. The first argument is a function, which is applied to every element of the sequence. The single quote in front of the function name is necessary: it prevents the symbol “my-process-file-navbar” from being evaluated as a variable.
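As a small illustration of mapc on both kinds of sequence, here is a sketch with a made-up helper, my-demo-mapc, that collects the elements it visits. It is only meant to show the calling convention.

```elisp
(defun my-demo-mapc (seq)
  "Return the elements of SEQ (a list or vector) as a list, visited via `mapc'.
Hypothetical helper for illustration."
  (let ((seen '()))
    ;; mapc calls the function once per element, for side effects only.
    (mapc (lambda (x) (push x seen)) seq)
    (nreverse seen)))

(my-demo-mapc '("a.html" "b.html"))  ; argument is a list
(my-demo-mapc ["a.html" "b.html"])   ; argument is a vector

;; A named function is passed as a quoted symbol:
;;   (mapc 'my-process-file-navbar '("~/test1.html"))
;; Without the quote, Emacs would try to read a variable named
;; my-process-file-navbar and signal a void-variable error.
```

Both calls return the same list of two strings, since mapc treats a list and a vector alike.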
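The file list can also be gathered inside Emacs instead of with find and grep. The following is a sketch under assumptions: it needs Emacs 25.1 or later for directory-files-recursively, and the function name my-files-with-navbar is made up for illustration.

```elisp
(defun my-files-with-navbar (dir)
  "Return a list of .html files under DIR that contain the page nav div.
Hypothetical helper; searches each file's text without visiting it."
  (let ((result '()))
    (dolist (f (directory-files-recursively dir "\\.html\\'"))
      (with-temp-buffer
        ;; Text is inserted after point, so point stays at the start.
        (insert-file-contents f)
        (when (search-forward "<div class=\"pages\">" nil t)
          (push f result))))
    (nreverse result)))

;; Possible use with the function from this page:
;; (mapc 'my-process-file-navbar (my-files-with-navbar "~/web/"))
```

Reading each file into a temp buffer avoids running html-mode on files that turn out not to have the nav bar.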