Why Emacs is Still so Useful Today

By Xah Lee. Date: . Last updated: .

This essay tells a little anecdote about why emacs is still essential, and superior to other tools of today.

Problem

Today, i needed to rework some of the markup in an HTML page, “Wallpaper groups: References and Related Web Sites”. The page is a bibliography. Each entry had markup like this:

<div class="entry">
<span class="w">Title</span>: <b>Regular Polytopes</b>
<span class="w">Author</span>: H.S.M. Coxeter<br>
<span class="w">Publisher</span>: Dover<br>
<span class="w">Date</span>: 1973.<br>
<b>Notes:</b> 3rd ed.<br>
<span class="w">Subject</span>: Symmetry<br>
<span class="w">Comment</span>: A standard reference…<br>
</div>

This page was written in 1997, and its markup has been slightly updated a few times over the past decade. The markup is not optimal. For example, the title, author, and other fields are all marked up by the same <span class="w">…</span>, and the style of “span.w” is just bold, used across my website. A better markup would be something like this:

<div class="entry">
<div class="title">Regular Polytopes</div>
<div class="author">H.S.M. Coxeter</div>
<div class="publisher">Dover</div>
<div class="date">1973</div>
Notes: 3rd ed.<br>
<div class="subject">Symmetry</div>
<div class="comment">A standard reference…</div>
</div>

together with CSS, like this:

div.entry {margin:1ex; padding:1ex}
div.entry > div.author {color:green}
div.entry > div.author:before {content:"Author: ";color:black;font-weight:bold}
div.entry > div.title {font-style:italic;color:#4d378b}
div.entry > div.title:before {content:"Title: ";color:black;font-weight:bold; font-style:normal}
div.entry > div.publisher:before {content:"Publisher: ";color:black;font-weight:bold}
div.entry > div.date:before {content:"Date: ";color:black;font-weight:bold}
div.entry > div.subject:before {content:"Subject: ";color:black;font-weight:bold}
div.entry > div.comment:before {content:"Comment: ";color:black;font-weight:bold}

would make it much better, because each of the Title, Author, etc. fields is semantically marked. This means machines can trivially process and understand it, and the styling of each field (Title, Author, etc.) can be changed easily and independently. (Such markup is called an HTML Microformat, a step towards the semantic web.)
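Here is a minimal Python sketch of what “machines can trivially process it” means in practice. It reads one entry of the semantic markup with the standard library's XML parser. It assumes the entry is well-formed XHTML, so the <br> is written <br/>. The helper name entry_fields is hypothetical and not part of the original page.

```python
import xml.etree.ElementTree as ET

# One bibliography entry in the semantic markup shown above, written as
# well-formed XHTML (<br/> instead of <br>) so a strict XML parser accepts it.
ENTRY = """<div class="entry">
<div class="title">Regular Polytopes</div>
<div class="author">H.S.M. Coxeter</div>
<div class="publisher">Dover</div>
<div class="date">1973</div>
Notes: 3rd ed.<br/>
<div class="subject">Symmetry</div>
<div class="comment">A standard reference…</div>
</div>"""


def entry_fields(xhtml: str) -> dict[str, str]:
    """Map each child div's class name to its text (hypothetical helper)."""
    root = ET.fromstring(xhtml)
    # Only direct <div> children with a class carry a field; the bare
    # "Notes:" text and the <br/> element are skipped.
    return {
        div.get("class"): (div.text or "").strip()
        for div in root.findall("div")
        if div.get("class")
    }


fields = entry_fields(ENTRY)
print(fields["title"], "/", fields["author"])
```

Nothing in the function knows about Coxeter or Dover. It relies only on the class names, which is the point of semantic markup. Run on the old markup, where every field is a <span class="w">, the same approach could not tell a title from an author.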

There are 34 such entries. So, how does one go about this little task? If you look at the markup, the entries are fairly regular, so perhaps you could write a little Python script to process them. However, if the markup is not 100% regular, the scripting approach breaks down. Some entries may not have a date line; some are journal articles rather than books, so they may be missing a publisher; some have a line about library location… Each time you run your script, it will choke on some little exception, and you loop back to fixing the script.

So, unless the text to be processed is in a valid format with a grammar and a semantic specification, the script approach will likely end up taking longer than manually doing multiple passes of find-and-replace. And if you take the time to first make your text regular and then write the script to process it, that probably won't save you time either.
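Here is a minimal Python sketch of the kind of little script described above, included to show where it chokes. It assumes every field line looks exactly like <span class="w">Label</span>: value<br>. It refuses to guess on any line that does not match. The names LINE_RE and convert_line are hypothetical.

```python
import re

# Assumed shape of one field line: <span class="w">Label</span>: value<br>
# A trailing period before <br> (as in "1973.<br>") is dropped.
LINE_RE = re.compile(r'<span class="w">(\w+)</span>: (.*?)\.?<br>$')


def convert_line(line: str) -> str:
    """Rewrite one old-style field line as a semantic <div>, or fail loudly."""
    m = LINE_RE.match(line.strip())
    if m is None:
        # This is where the script "chokes": a line that deviates from the
        # assumed pattern needs a human decision, so we refuse to guess.
        raise ValueError(f"unexpected line: {line!r}")
    label, value = m.groups()
    return f'<div class="{label.lower()}">{value}</div>'


old_entry = [
    '<span class="w">Author</span>: H.S.M. Coxeter<br>',
    '<span class="w">Date</span>: 1973.<br>',
    '<b>Notes:</b> 3rd ed.<br>',  # irregular line: not a span.w field
]

for line in old_entry:
    try:
        print(convert_line(line))
    except ValueError as err:
        print("needs manual fix:", err)
```

On this sample, the Author and Date lines convert, and the <b>Notes:</b> line is reported for manual fixing. Each new kind of irregular line means another round of editing the regex, which is the loop the paragraph describes.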

This is where emacs comes in. Emacs has several find/replace commands, by regex or by plain string, on a text selection, an entire file (buffer), or multiple files. The beauty is that they all work interactively, stopping at each match for you to decide, with the option to proceed in batch once you see a clear pattern.
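In emacs, query-replace-regexp (C-M-%) stops at each match. You can answer y to replace it, n to skip it, or ! to replace all remaining matches without asking. The Python sketch below only models that y/n/! workflow and is not how emacs implements the command. The names query_replace and decide are hypothetical.

```python
import re
from typing import Callable


def query_replace(text: str, pattern: str,
                  replacement: Callable[[re.Match], str],
                  decide: Callable[[str], str]) -> str:
    """Replace regex matches one at a time, asking `decide` about each match.

    `decide` gets the matched text and answers "y" (replace), "n" (skip) or
    "!" (replace this match and all remaining ones without asking again),
    loosely modeled on the answers emacs accepts in query-replace-regexp.
    """
    out = []
    pos = 0
    replace_all = False
    for m in re.finditer(pattern, text):
        out.append(text[pos:m.start()])  # text between matches
        answer = "y" if replace_all else decide(m.group(0))
        if answer == "!":
            replace_all = True
        out.append(replacement(m) if answer in ("y", "!") else m.group(0))
        pos = m.end()
    out.append(text[pos:])  # text after the last match
    return "".join(out)


soup = ('<span class="w">Author</span>: H.S.M. Coxeter<br>\n'
        '<span class="w">Publisher</span>: Dover<br>\n'
        '<span class="w">Date</span>: 1973<br>\n')

answers = iter(["n", "!"])  # skip the first match, then replace all the rest
result = query_replace(
    soup,
    r'<span class="w">(\w+)</span>: (.*?)<br>',
    lambda m: f'<div class="{m.group(1).lower()}">{m.group(2)}</div>',
    lambda matched: next(answers),
)
print(result)
```

The interesting part is the decide callback. In emacs, that role is played by you, looking at each match in context. This is what lets the same command handle both the regular entries and the odd ones.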

Text-Soup Situation And Lumberjack Tasks

Text-Soup Situation

For a coder or sys admin, the vast majority of the text editing he needs to do is of this text-soup nature. Sure, if your code lives in some strict environment, such as Java in a big company project with a strict code structure, you might use an IDE's built-in feature to “refactor”. However, the vast majority of texts in the world are not in such a “strict” format or environment, and the ways you need to process them do not fall into nicely categorized transformations. (As an illustration, XML became widely adopted precisely because it avoids being text-soup.) All those unix startup shell scripts, makefiles, sys admin scripts, and software installation scripts are of this text-soup nature. Almost all programming language source code is of this text-soup nature. All unix config files: soup. All man page source, texinfo, and TeX/LaTeX source: soup. All publications, journals, magazines, essays, and books: their source text is text-soup. And on the web, probably 99.99% of the HTML that exists today is text-soup, and it is not even valid HTML.

So, essentially all text in digital form is text-soup. The only notable exceptions are texts in a format with a simple grammar and at least some degree of semantic grammar, such as XML.

Lumberjack Tasks

Even if your text is in some strict format, the task you need to perform on it is 99% likely not some known transformation, so it cannot be automated. For example, if you have hundreds of pages of valid “HTML 4.01 Transitional” and you need to transform them into valid “HTML 4.01 Strict”, it will not be possible with a script without some kind of artificial intelligence that makes human decisions, because the syntax of HTML is specified, but its semantics are not. (For detailed examples, see: HTML Correctness and Validators.)
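A script can at least find the spots that need attention. This minimal Python sketch uses the standard library's html.parser to report elements that HTML 4.01 Transitional allows but the Strict DTD omits. The element list is partial, and the class name StrictnessReport is hypothetical. Treat it as an illustration, not a validator.

```python
from html.parser import HTMLParser

# A partial list of HTML 4.01 Transitional elements that the Strict DTD omits.
# This is not a validator; it only points at spots a human must rethink.
NOT_IN_STRICT = {"applet", "basefont", "center", "dir", "font",
                 "isindex", "menu", "s", "strike", "u"}


class StrictnessReport(HTMLParser):
    """Collect (line, tag) pairs for elements that are absent from Strict."""

    def __init__(self) -> None:
        super().__init__()
        self.findings: list[tuple[int, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in NOT_IN_STRICT:
            line, _col = self.getpos()
            self.findings.append((line, tag))


page = """<center><font color="red">Wallpaper Groups</font></center>
<p>See <u>Regular Polytopes</u> by Coxeter.</p>"""

report = StrictnessReport()
report.feed(page)
report.close()
for line, tag in report.findings:
    # Finding the tag is mechanical; choosing its replacement is not.
    print(f"line {line}: <{tag}> needs a human decision")
```

What the script cannot do is the conversion itself. Whether a given <u> marks a book title (perhaps <cite>), emphasis (perhaps <em>), or mere decoration (CSS) is a human decision.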

Today, perhaps a significant percentage of the world's text is HTML, and the vast majority of it is not valid HTML. Plain text is worse. In general, for automatic text processing, the text needs to be in a format with a SEMANTIC spec (and, of course, a syntax grammar to begin with). The more semantic grammar, the narrower the scope, and the more specific the context, the easier automation becomes. (But in general, even that cannot eliminate lumberjack tasks, because a format with a semantic grammar captures just one point of view. Most of the time, real-world text processing tasks aren't nicely, academically defined.)

Why Emacs But Not Other Editors

So, the majority of daily tasks on text can't be automated by scripts or by an IDE's built-in tools. What about other text editors?

The power of emacs is that it has an integrated scripting language (Emacs Lisp) designed for text processing, all its commands are oriented toward keyboard operation, and it has keyboard macros that can record and repeat commands. This means the daily lumberjack tasks on text-soup become semi-automatic. The integrated elisp covers the parts that can be automated by machine; the interactive nature and keyboard macros cover the parts that cannot be completely automated. The vast majority of text editors don't have an integrated scripting language. The vast majority are GUI-based, so they aren't suitable for tasks oriented toward text processing and programming.
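An emacs keyboard macro is a recorded sequence of commands (C-x ( to start, C-x ) to end) that you can replay with C-x e. If you give it a numeric argument of 0 (C-u 0 C-x e), it repeats until it gets an error, typically when a search inside it fails. The Python sketch below models that idea. A “macro” is a list of steps acting on (text, point), and replaying stops at the first step that cannot proceed. The step functions and run_macro_until_failure are hypothetical illustrations, not an emacs API.

```python
from typing import Callable, Optional

# Editing state a step sees: the buffer text and the cursor position ("point").
State = tuple[str, int]
# A step returns the new state, or None when it cannot proceed (a failed search).
Step = Callable[[State], Optional[State]]


def search_forward(needle: str) -> Step:
    """Move point just past the next occurrence of `needle`, or fail."""
    def step(state: State) -> Optional[State]:
        text, point = state
        i = text.find(needle, point)
        return None if i < 0 else (text, i + len(needle))
    return step


def replace_before_point(old: str, new: str) -> Step:
    """Replace `old` if it ends exactly at point, leaving point after `new`."""
    def step(state: State) -> Optional[State]:
        text, point = state
        start = point - len(old)
        if start < 0 or text[start:point] != old:
            return None
        return (text[:start] + new + text[point:], start + len(new))
    return step


def run_macro_until_failure(macro: list[Step], state: State) -> tuple[State, int]:
    """Replay `macro` until a step fails; return the final state and full runs.

    A macro whose steps never fail would loop forever (in emacs you would
    press C-g); here the first search fails once no Author label is left.
    """
    runs = 0
    while True:
        for step in macro:
            new_state = step(state)
            if new_state is None:
                # Edits made earlier in a partial run are kept, not undone.
                return state, runs
            state = new_state
        runs += 1


LABEL = '<span class="w">Author</span>: '
to_author_div = [
    search_forward(LABEL),
    replace_before_point(LABEL, '<div class="author">'),
    search_forward("<br>"),
    replace_before_point("<br>", "</div>"),
]

entries = (f"{LABEL}H.S.M. Coxeter<br>\n"
           f"{LABEL}J.H. Conway<br>\n")
(text, point), runs = run_macro_until_failure(to_author_div, (entries, 0))
print(runs)
print(text)
```

As in emacs, a run that fails midway keeps the edits it already made. For that reason, a macro like this is recorded to start with the search that finds the next item, so the final failed run changes nothing.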

(The one editor that shares this nature with emacs is vi. vi, together with its unix shell tool environment, can be as useful as emacs for dealing with text-soup tasks.)

For some examples of actual tasks and solutions done with emacs and not possible with other text editors or IDEs, see: