Emacs Lisp: Count Lines, Words, Chars

By Xah Lee.

A little emacs lisp exercise: writing a command to count words.

In emacs 23, there's count-lines, but no command to count words or characters. (count-words is now builtin in emacs 24. [see Emacs 24 (Released 2012-06)])

Here's how to write it.

(defun my-count-words-region (posBegin posEnd)
  "Print number of words and chars in region."
  (interactive "r")
  (message "Counting …")
  (save-excursion
    (let (wordCount charCount)
      (setq wordCount 0)
      (setq charCount (- posEnd posBegin))
      (goto-char posBegin)
      (while (and (< (point) posEnd)
                  (re-search-forward "\\w+\\W*" posEnd t))
        (setq wordCount (1+ wordCount)))

      (message "Words: %d. Chars: %d." wordCount charCount)
      )))

How It Works

The function has this skeleton:

(defun my-count-words-region (pos1 pos2)
  "…"
  (interactive "r")
  ;; …
  )

The (interactive "r") means emacs will automatically fill the function's parameters {pos1, pos2} with the region's beginning and end positions. (region positions are integers) [see Emacs Lisp: Mark, Region, Active Region] [see Emacs Lisp: Interactive Form]
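To see what (interactive "r") hands to a command, here is a minimal sketch. The command name my-show-region-bounds is made up for illustration; it only reports the two positions it received and returns their difference.

```elisp
;; Hypothetical helper, not part of the original command.
(defun my-show-region-bounds (pos1 pos2)
  "Report the region boundaries POS1 and POS2 and return the region length."
  ;; "r" makes emacs pass region beginning and end when called interactively.
  (interactive "r")
  (message "Region is from %d to %d." pos1 pos2)
  (- pos2 pos1))
```

When called from lisp code, the positions are passed explicitly. For example, (my-show-region-bounds 1 10) returns 9.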

The next part of the function is this:

(save-excursion
  (let (var1 var2 …)
    (setq var1 …)
    (setq var2 …)
    …
    ))

The let is lisp's way to have a block of local variables.
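As an aside, let can also give each variable an initial value directly, which saves the separate setq lines. A small sketch of both styles follows; the function name my-let-demo and its values are invented for illustration.

```elisp
;; Hypothetical example: two equivalent ways to set up local variables.
(defun my-let-demo ()
  "Return a list of two sums, each computed with a different style of `let'."
  (list
   ;; Style used on this page: declare first, then assign with setq.
   (let (a b)
     (setq a 3)
     (setq b 4)
     (+ a b))
   ;; Alternative: bind the initial values inside the let form itself.
   (let ((a 3)
         (b 4))
     (+ a b))))
```

Both forms give the same result, so (my-let-demo) returns (7 7).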

The (save-excursion …) will run its body, then restore the cursor position (and the current buffer). Before emacs 25.1 it also restored the mark. We need it because the code is going to move the cursor around. When the command is finished, the cursor is placed back where it was when the user started the command.
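The effect of save-excursion on the cursor can be checked in a scratch buffer. The sketch below is only a demonstration; the function name my-save-excursion-demo and the sample text are made up.

```elisp
;; Hypothetical demo: the cursor position is restored after save-excursion.
(defun my-save-excursion-demo ()
  "Return a list (BEFORE INSIDE AFTER) of cursor positions."
  (with-temp-buffer
    (insert "one two three")
    (goto-char 5)
    (let ((before (point))
          inside)
      (save-excursion
        ;; Move the cursor to the end of the buffer inside the body.
        (goto-char (point-max))
        (setq inside (point)))
      ;; After the body, the cursor is back where it was.
      (list before inside (point)))))
```

Here (my-save-excursion-demo) returns (5 14 5): the cursor went to the end of the buffer inside the body, and was back at 5 afterward.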

Now, to count the chars: the number of chars is just the difference between the ending and beginning positions of the region. So, it is simple, like this:

(setq charCount (- posEnd posBegin))

Now, we move the cursor to the beginning of the region, like this: (goto-char posBegin). The next part counts the words, like this:

(while (and (< (point) posEnd)
            (re-search-forward "\\w+\\W*" posEnd t))
  (setq wordCount (1+ wordCount)))

The (< (point) posEnd) is for checking that the cursor hasn't reached the end of region yet.

The (re-search-forward "\\w+\\W*" posEnd t) means move the cursor forward to the end of the next match of a regex word pattern. The “posEnd” argument there means don't search beyond the end of region. And the “t” there means return nil instead of signaling an error if no more match is found.

search-forward and re-search-forward are among the top 10 most useful functions in elisp for text processing. If you are not familiar with them, look up their doc strings (with describe-function).

So, the above “while” block basically means: keep moving the cursor and counting words, until the cursor is at the end of region or no more word is found.
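The same loop can be tried on a plain string, without selecting a region. This is a sketch under one assumption: the helper name my-count-words-in-string is invented here, and it runs in a temp buffer, so it uses that buffer's default syntax table.

```elisp
;; Hypothetical helper for experimenting with the counting loop.
(defun my-count-words-in-string (str)
  "Return the number of matches of the word regex in STR."
  (with-temp-buffer
    (insert str)
    (goto-char (point-min))
    (let ((count 0)
          (end (point-max)))
      ;; Same loop as in my-count-words-region, with END as the search bound.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))
```

For example, (my-count-words-in-string "hello world foo") returns 3, and an empty string gives 0.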

Finally, the program just prints out the result, by:

(message "Words: %d. Chars: %d." wordCount charCount)

Note

The code shown on this page counts words by emacs's syntax table, because the regex for word \\w+ is dependent on the syntax table. In emacs, each character is assigned a syntax class. For example, the English letters are in the “word” class, punctuation characters are in the “punctuation” class, etc. The current syntax table often depends on the major mode. [see Syntax Tables (ELISP Manual)]

The disadvantage of the syntax table is that the result is unpredictable: it depends on the current major mode (and any minor mode or lisp code can change it). For example, this file (at this moment) is 1325 words when in “Fundamental” mode, but 1316 words when in “text-mode”. (863 by unix “wc” command.)
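To make the dependence concrete, here is a sketch that counts the same text under two syntax tables that differ only in how the apostrophe is classified. The function name my-count-words-with-quote-syntax and its parameters are invented for this illustration; it builds its own table instead of relying on any particular major mode.

```elisp
;; Hypothetical demo of how the syntax table changes the word count.
(defun my-count-words-with-quote-syntax (str quote-syntax)
  "Count words in STR with the apostrophe given syntax QUOTE-SYNTAX.
QUOTE-SYNTAX is a syntax descriptor string such as \"w\" or \".\"."
  (let ((table (make-syntax-table)))
    ;; Classify the apostrophe as requested: \"w\" is word, \".\" is punctuation.
    (modify-syntax-entry ?\' quote-syntax table)
    (with-temp-buffer
      (insert str)
      (goto-char (point-min))
      (with-syntax-table table
        (let ((count 0))
          (while (re-search-forward "\\w+\\W*" nil t)
            (setq count (1+ count)))
          count)))))
```

With the apostrophe as a word character, (my-count-words-with-quote-syntax "don't stop" "w") returns 2. With the apostrophe as punctuation, (my-count-words-with-quote-syntax "don't stop" ".") returns 3, because “don” and “t” are counted separately.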

count-words-region-or-line

Here's a version that will count the current line, if there is no text selection.

(defun xah-count-words-region-or-line ()
  "Print number of words and chars in text selection or line.
In emacs 24, you can use `count-words'."
  (interactive)
  (let (p1 p2)
    (if (region-active-p)
        (progn (setq p1 (region-beginning))
               (setq p2 (region-end)))
      (progn (setq p1 (line-beginning-position))
             (setq p2 (line-end-position))))
    (save-excursion
      (let (wCnt charCnt)
        (setq wCnt 0)
        (setq charCnt (- p2 p1))
        (goto-char p1)
        (while (and (< (point) p2) (re-search-forward "\\w+\\W*" p2 t))
          (setq wCnt (1+ wCnt)))
        (message "Words: %d. Chars: %d." wCnt charCnt)))))
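For comparison, in emacs 24 or later the built-in count-words can also be called from lisp code with a start and end position. A minimal sketch, where the wrapper name my-builtin-word-count is made up for illustration:

```elisp
;; Hypothetical wrapper around the built-in `count-words' (emacs 24 or later).
(defun my-builtin-word-count (str)
  "Return the number of words in STR according to `count-words'."
  (with-temp-buffer
    (insert str)
    (count-words (point-min) (point-max))))
```

For example, (my-builtin-word-count "one two three") returns 3. The built-in moves by words with forward-word, so its result depends on the syntax table too.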

Introduction to Programming in Emacs Lisp by Robert J Chassell

Note: “my-count-words-region” is largely from Introduction to Programming in Emacs Lisp by Robert J Chassell.

This book is bundled with emacs since version 22. To view it in emacs, Alt+x info 【Ctrl+h i】, then click on “Emacs Lisp Intro”.

I was reading it sometime in 2005. That tutorial is for people who have never programmed before. It was quite frustrating to read, because for everything you learn, you have to scan some 10 pages of things you already know about programming, such as the meaning of {variables, assignment, syntax, etc}. In the end, I didn't really read that book. This function is about the only thing I got out of it.