Emacs Lisp Problems: Trim String, Regex Match Data, Lacking Namespace

By Xah Lee.

This page shows an emacs lisp package for trimming strings, and discusses two problems of emacs lisp: regex functions overwriting the match data, and the lack of namespace.

Emacs Lisp: Trim Whitespace in String Library

Magnar Sveen (the EmacsRocks guy) has written an elisp lib for trimming strings, such as chopping off white spaces or newline chars. Basically, its functions are wrappers to a few builtin functions. But the lib is really nice, as it provides a set of functions with a consistent interface, and you don't have to code your own. The lib is at: https://github.com/magnars/s.el

Coding string trimming functions is actually not as trivial as it seems. For example, here's one i wrote, in xeu_elisp_util.el at https://github.com/xahlee/xeu_elisp_util.el

(defun trim-string (string)
  "Remove white spaces in beginning and ending of STRING.
White space here is any of: space, tab, emacs newline (line feed, ASCII 10)."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

One of the questions i had to ask myself is about implementation choices. Should i be using replace-regexp-in-string? Which way is faster? There are actually quite a few ways to implement this, and it is not clear which is fastest unless you spend a few hours experimenting (assuming you need to call this function tens of thousands of times). The function i wrote above is NOT fast, because it calls replace-regexp-in-string, which is implemented in elisp and is quite complex. (Alt+x describe-function to look at its source code.)

(Calling trim string tens of thousands of times is realistic, and happens in practice a lot. For example, processing thousands of files line by line.)
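One way to find out which implementation is faster is to time them with benchmark-run, which comes with emacs. The following is only a sketch: the function names my-trim-via-replace and my-trim-via-substring, the sample strings, and the repetition count are made up for illustration, and the second function is just one of several possible alternatives, not a recommended implementation.

```elisp
(require 'benchmark)

;; Hypothetical name. Same approach as trim-string above:
;; two nested calls to replace-regexp-in-string.
(defun my-trim-via-replace (string)
  "Trim space, tab, newline from both ends of STRING."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

;; Hypothetical name. Find both boundaries with string-match,
;; then cut once with substring.
(defun my-trim-via-substring (string)
  "Trim space, tab, newline from both ends of STRING."
  (save-match-data
    (let* ((start (if (string-match "\\`[ \t\n]+" string)
                      (match-end 0)
                    0))
           ;; Search from START so an all-whitespace string gives END = START.
           (end (string-match "[ \t\n]*\\'" string start)))
      (substring string start end))))

;; Each call returns a list (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
;; The numbers depend on the machine and the emacs version.
(benchmark-run 10000 (my-trim-via-replace "  \t some text \n "))
(benchmark-run 10000 (my-trim-via-substring "  \t some text \n "))
```

Evaluate the two benchmark-run forms with Alt+x eval-last-sexp and compare the first number of each result. Use input that resembles your real data, because the ranking can change with string length.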

Here's Magnar Sveen's implementation:

(defun s-trim-left (s)
  "Remove whitespace at the beginning of S."
  (if (string-match "\\`[ \t\n\r]+" s)
      (replace-match "" t t s)
    s))

(defun s-trim-right (s)
  "Remove whitespace at the end of S."
  (if (string-match "[ \t\n\r]+\\'" s)
      (replace-match "" t t s)
    s))

(defun s-trim (s)
  "Remove whitespace at the beginning and end of S."
  (s-trim-left (s-trim-right s)))

Regex Search and Losing Match Data

Using string-match overwrites the match data. Emacs keeps only one set of match data, and every successful regex search replaces it.

For example, suppose you are processing thousands of HTML files. You want to change all links from relative path to absolute URL. Suppose you call search-forward-regexp to search for the link string. You need to get the matched data. Let's say (match-string 1) is the href value, (match-string 2) is the link text, (match-beginning 0) is the beginning of the tag, and (match-end 0) is the end of the tag. Now, you call (match-string 2), then trim white spaces from it with a function that uses string-match, such as s-trim above. Now, all your other match data are lost, because they have been replaced by the match data of the trim.

For safety, whenever you use any elisp regex function and want to keep the match data, you should get them immediately after the call (that is, set them to variables), then continue to do whatever you are doing. Again, this isn't very nice, as you'll need to introduce several temp variables.
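Here is a small sketch of the problem. It matches against a string instead of a buffer so that it can be run on its own. The function name my-match-data-clobber-demo and the sample tag are made up for illustration; the trim step is the same string-match call as in s-trim-left.

```elisp
;; Hypothetical demo function.
(defun my-match-data-clobber-demo ()
  "Return (END-BEFORE END-AFTER), the value of (match-end 0)
before and after a second regex search."
  (let ((tag "<a href=\"a.html\"> Home </a>"))
    ;; First search: group 1 is the href value, group 2 is the link text.
    (string-match "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" tag)
    (let ((end-before (match-end 0))
          (text (match-string 2 tag)))
      ;; Second search, as done inside s-trim-left.
      ;; It replaces the match data of the first search.
      (string-match "\\`[ \t\n\r]+" text)
      ;; (match-end 0) now describes the whitespace match in TEXT,
      ;; not the end of the tag.
      (list end-before (match-end 0)))))
```

The two numbers in the returned list differ. The first is the end of the whole tag, the second is the end of the leading whitespace in the link text.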
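The advice above can be sketched like this. The function name my-parse-link and the regex for the link are assumptions made for the example, and the regex is far too simple for real HTML.

```elisp
;; Hypothetical function.
(defun my-parse-link (tag)
  "Return a list (HREF TEXT START END) for the first link in string TAG.
Return nil if there is no link."
  (when (string-match "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" tag)
    ;; Copy everything out of the match data right after the search.
    (let ((href (match-string 1 tag))
          (text (match-string 2 tag))
          (start (match-beginning 0))
          (end (match-end 0)))
      ;; From here on, other regex calls cannot change
      ;; href, text, start, end.
      (list href
            (replace-regexp-in-string
             "\\`[ \t\n]*" ""
             (replace-regexp-in-string "[ \t\n]*\\'" "" text))
            start
            end))))
```

Note the 4 temp variables, needed just to keep what one search found.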

I really wish replace-regexp-in-string were written in C.

Here's an example of text processing that uses regex and requires the capture of several match data and trim string: Elisp vs Perl: Validate File Links.

Note: you can use save-match-data, but it's a lisp macro that saves the match data, runs your code, then restores them. That solution isn't better than saving the matches yourself.
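For comparison, here is a sketch of the save-match-data approach. The names my-trim-left-preserving and my-save-match-data-demo are made up; the body of the first is s-trim-left from above, wrapped in save-match-data.

```elisp
;; Hypothetical name. Same body as s-trim-left, wrapped in save-match-data.
(defun my-trim-left-preserving (s)
  "Remove whitespace at the beginning of S.
The caller's match data is left unchanged."
  (save-match-data
    (if (string-match "\\`[ \t\n\r]+" s)
        (replace-match "" t t s)
      s)))

;; Hypothetical demo function.
(defun my-save-match-data-demo ()
  "Return (HREF TEXT), reading HREF after TEXT has been trimmed."
  (let ((tag "<a href=\"a.html\"> Home </a>"))
    (string-match "<a href=\"\\([^\"]*\\)\">\\([^<]*\\)</a>" tag)
    (let ((text (my-trim-left-preserving (match-string 2 tag))))
      ;; Group 1 still refers to the href in TAG.
      (list (match-string 1 tag) text))))
```

The cost is that the match data gets saved and restored on every call, whether the caller needs them or not.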

Emacs Lisp Problem of Lacking Namespace

Another issue of elisp is the lack of namespace. This is a major problem that prevents elisp from growing as a language.

For example, Magnar's lib is nice, i want to use it, but, should i really? If i use it, it means i'll have to put another misc lib in my elisp system in a haphazard way. In other languages such as perl, python, ruby, there's namespace and a controlled way to install packages. In these langs, a library has expected paths, a standard naming scheme, a standard format of documentation that is embedded in the source code, and usually a system to install or uninstall it (for example, perl cpan, python pip, ruby gem).

In emacs lisp, there is no namespace, and elisp packages are not managed. [see ELisp: load, load-file, autoload]
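A sketch of what lacking namespace means in practice. Suppose two libraries both define a function with the same name. The name demo-trim and both definitions are made up for illustration.

```elisp
;; Hypothetical library A: trims both ends.
(defun demo-trim (s)
  "Remove whitespace at both ends of S."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" s)))

;; Hypothetical library B, loaded later: same name, different meaning.
(defun demo-trim (s)
  "Remove one trailing newline from S."
  (if (string-suffix-p "\n" s)
      (substring s 0 -1)
    s))

;; All function names live in one global table, so B's definition
;; has replaced A's. Code written for library A now gets
;; leading spaces back.
(demo-trim "  x\n")
```

The usual workaround is a naming convention: each library puts its own prefix on every name, such as the s- in s-trim. That is a convention only, the language does not enforce it.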

(The Emacs 24 (released 2012-06) package system is not an elisp language library system. It's a user-land app system, similar to Debian Linux's Advanced Packaging Tool (apt-get). [see Emacs: Install Package with ELPA/MELPA])

When a language lacks namespace or a standard library system, it ends up with lots of misc packages in the wild, with overlapping functionality, inconsistent interfaces, and varying quality.

2017-06-22 Addendum

Now emacs has trim string builtin. See ELisp: Trim String.
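A short usage sketch, assuming the builtin meant here is string-trim and its siblings from the bundled subr-x library (Emacs 24.4 or later). By default they remove space, tab, newline, and carriage return.

```elisp
;; Harmless if the functions are already loaded.
(require 'subr-x)

(string-trim "  \t abc \n")      ; trims both ends, returns "abc"
(string-trim-left "  abc  ")     ; returns "abc  "
(string-trim-right "  abc  ")    ; returns "  abc"
```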
