Elisp: Add alt Attribute to Image Tags

By Xah Lee.

This page shows an example of using emacs regex to update HTML image tags in all files in a directory.

Problem

For all HTML image tags of the form:

<img src="my_cat.png" alt="" width="832" height="513">

Add a value to the “alt” attribute. The value should be the image file name, without file extension, and LOW LINE char _ converted to space. The result should be like this:

<img src="my_cat.png" alt="my cat" width="832" height="513">

This needs to be done for about 100 files inside a dir and subdir.

Solution

List and Mark Files in Subdirectories

Alt+x find-dired, then give the dir name, then give -name "*.html". The result is a list of all HTML files in that dir and its subdirs.

Now, mark the files you want with Alt+x dired-mark-files-regexp 【% m】. Then give the pattern \.html. This marks all HTML files.
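If you prefer to get the same file list from lisp code rather than from find-dired, here is a minimal sketch. The function name my-list-html-files is made up for illustration. It assumes Emacs 25.1 or later, where directory-files-recursively is available.

```elisp
;; Hypothetical helper, not part of the interactive workflow above.
;; Assumes Emacs 25.1 or later for `directory-files-recursively'.
(defun my-list-html-files (dir)
  "Return a list of all files under DIR, recursively, whose names end in .html."
  ;; The regex is matched against each file name, without its directory part.
  ;; \\' anchors the match to the end of the name.
  (directory-files-recursively dir "\\.html\\'"))
```

Call it like (my-list-html-files "~/web/"), where the dir name is just a placeholder. The returned list is handy for checking how many files you are about to touch before you start replacing.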

Dired Query Replace by Regexp

To do a regexp replace on the files marked in dired, Alt+x dired-do-query-replace-regexp. 〔see Emacs: Interactive Find Replace Text in Directory〕

The Search Pattern

The next step is to give the regex search pattern. This is simple:

<img src="\([^"]+?\)" alt="" width="\([0-9]+\)" height="\([0-9]+\)">

This regex captures the file name, the width, and the height, as groups 1, 2, and 3.

〔see Emacs: Regular Expression〕
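Before running the pattern on real files, you can check it on a sample string in lisp code. This is only a sketch. The names my-img-pattern and my-img-fields are made up for illustration. Note that inside a lisp string every backslash and double quote of the pattern must itself be escaped, which you do not need to do when typing the pattern at the interactive prompt.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-img-pattern
  "<img src=\"\\([^\"]+?\\)\" alt=\"\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
  "The search pattern from this page, written as a lisp string.")

(defun my-img-fields (tag)
  "Return a list (SRC WIDTH HEIGHT) if the string TAG matches, else nil."
  (when (string-match my-img-pattern tag)
    (list (match-string 1 tag)
          (match-string 2 tag)
          (match-string 3 tag))))

(my-img-fields "<img src=\"my_cat.png\" alt=\"\" width=\"832\" height=\"513\">")
;; returns ("my_cat.png" "832" "513")
```

A tag that already has a non-empty alt value does not match, so my-img-fields returns nil for it. That is why the replacement leaves such tags alone.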

Using Elisp Expression for Replacement String

Since emacs 22, you can give an elisp expression for the replacement, by using this syntax for the replacement string: \, elisp_expression. The expression is evaluated for each match, and its value is used as the replacement text.

The heart of this task is to write the elisp expression that gives us the replacement string, where the alt part is the transformed version of the file name. This is surprisingly simple too. Here's the lisp expression we need:

(concat
 "<img src=\""
 (match-string 1)
 "\" alt=\""
 (replace-regexp-in-string "\\.png\\'" ""
    (replace-regexp-in-string "_" " " (match-string 1)))
 "\" width=\""
 (match-string 2)
 "\" height=\""
 (match-string 3)
 "\">")

The match-string calls simply give us the captured values. The interesting part is the pair of replace-regexp-in-string calls we use to generate the value for alt. First, we replace each “_” with a space, then we delete the “.png” at the end. The pattern is written "\\.png\\'" because the first argument is a regex: a bare dot would match any character, and \' anchors the match to the end of the string. That's all there is to it.
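The alt transformation can also be pulled out into a small function, which makes it easy to try on a few file names first. This is a sketch, not the method used above: the name my-image-alt-text is made up, and it makes two extra assumptions, that any file extension (not just .png) should be dropped, and that a directory part in the src value should not appear in the alt text.

```elisp
;; Hypothetical helper. Unlike the expression above, it strips any
;; extension and any leading directory part (both are assumptions).
(defun my-image-alt-text (file-name)
  "Return alt text for FILE-NAME: no directory, no extension, each _ turned into a space."
  (replace-regexp-in-string
   "_" " "
   (file-name-sans-extension (file-name-nondirectory file-name))))

(my-image-alt-text "my_cat.png")       ; returns "my cat"
(my-image-alt-text "i/my_big_cat.jpg") ; returns "my big cat"
```

Once such a function is defined, the alt part of the replacement expression could be written as (my-image-alt-text (match-string 1)).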

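Finally, we call dired-do-query-replace-regexp 【Q】 in the dired buffer, give the search pattern above, and for the replacement string type \, followed by the lisp expression. 〔see Emacs: Interactive Find Replace Text in Directory〕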
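For comparison, here is a sketch of what the same replacement looks like as a non-interactive function that works on the current buffer. It is not the approach this page recommends, since it changes every match without asking. The function name my-add-alt-in-buffer is made up for illustration.

```elisp
;; Hypothetical batch version of the interactive replacement.
;; It changes every matching tag in the current buffer without asking.
(defun my-add-alt-in-buffer ()
  "Fill in empty alt attributes of img tags in the current buffer.
Return the number of tags changed."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "<img src=\"\\([^\"]+?\\)\" alt=\"\" width=\"\\([0-9]+\\)\" height=\"\\([0-9]+\\)\">"
              nil t)
        (let* ((src (match-string 1))
               (width (match-string 2))
               (height (match-string 3))
               ;; Protect the match data, because `replace-match' below needs it.
               (alt (save-match-data
                      (replace-regexp-in-string
                       "\\.png\\'" ""
                       (replace-regexp-in-string "_" " " src)))))
          ;; The two t arguments mean: keep letter case as given, insert text literally.
          (replace-match
           (concat "<img src=\"" src "\" alt=\"" alt
                   "\" width=\"" width "\" height=\"" height "\">")
           t t)
          (setq count (1+ count)))))
    count))
```

To try it safely, call it inside with-temp-buffer after inserting a sample tag, and look at the result with buffer-string before pointing it at real files.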

Without emacs, the above operation might take an hour or two, and is tedious and error prone. Even with expertise in Perl or Python scripting, the problem is the lack of interactive see-and-do: you cannot look at each change before it is made. With emacs, the whole operation takes less than 5 minutes.

Advantage of Interactive Regex Replace on Multiple Files

Suppose you are given a task where hundreds of valid HTML files in a dir need to be converted to valid XHTML. Note that XHTML has a slightly different syntax. For example, all tags such as <p> and <li> now need to be closed. Tags like <img>, <hr>, <br> need to be like <img … />, <hr/>, <br/>. Also, tags are now case sensitive, so you need to lower case them. Also, image tags now must be wrapped inside a container tag, such as <div>. The DTD also needs to be changed, and there are many style oriented tags that need to be transformed.

This task seems daunting. You could try to do it in one shot with a Perl script, but it would probably take you days to code it correctly, and if your script has a parsing or regex error, it can delete parts of your files without you knowing it. You could also work by trial and error, running one regex replacement at a time. Still, your script runs in batch. If you make a mistake, you'll have to revert all your files. With mastery of emacs, you can do the above transformation using regex find replace, one match at a time, interactively and safely, saving you time by some 10 fold.

See also: Elisp: Find Replace Multiple String Pairs

Function as Replacement String