Emacs Lisp Problems: Trim String, Regex Match Data, Lacking Namespace
This page shows an emacs lisp package for trimming strings, and discusses two emacs lisp problems: regex match data and the lack of namespace.
Emacs Lisp: Trim Whitespace in String Library
Magnar Sveen (the EmacsRocks guy) has written an elisp lib for trimming strings, such as chopping off whitespace or newline chars. Basically, its functions are wrappers around a few builtin functions. But the lib is really nice, as it provides a set of functions with a consistent interface, and you don't have to code your own. The lib is at: [2012-11-08 https://github.com/magnars/s.el ]
Coding string trimming functions is actually not as trivial as it looks. For example, here's one i wrote, in xeu_elisp_util.el at https://github.com/xahlee/xeu_elisp_util.el
(defun trim-string (string)
  "Remove white spaces in beginning and ending of STRING.
White space here is any of: space, tab, emacs newline (line feed, ASCII 10)."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))
One of the questions i had to ask myself is about implementation choices. Should i be using replace-regexp-in-string? Which way is faster? There are actually quite a few ways to implement this, and it is not clear which one is fastest unless you spend a few hours experimenting. (Assume you need to call this function tens of thousands of times.) The function i wrote above is NOT fast, because it calls replace-regexp-in-string, which is implemented in elisp and is quite complex. (Alt+x describe-function to look at its source code.)
(Calling trim string tens of thousands of times is realistic, and happens in practice a lot. For example, processing thousands of files line by line.)
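One way to settle the "which way is faster" question is to time the candidates. The following is a minimal sketch, assuming the standard benchmark library. The function names my-trim-via-replace-regexp, my-trim-via-string-match and my-trim-benchmark are hypothetical, and the second implementation is just one possible alternative, not a recommended one.

```elisp
(require 'benchmark)

(defun my-trim-via-replace-regexp (string)
  "Trim whitespace from STRING with `replace-regexp-in-string'."
  (replace-regexp-in-string
   "\\`[ \t\n]*" ""
   (replace-regexp-in-string "[ \t\n]*\\'" "" string)))

(defun my-trim-via-string-match (string)
  "Trim whitespace from STRING with `string-match' and `substring'."
  ;; START is the index of the first non-whitespace char.
  (let ((start (if (string-match "\\`[ \t\n]+" string) (match-end 0) 0)))
    ;; Search for trailing whitespace only from START onward, so an
    ;; all-whitespace string does not give an end before the start.
    (if (string-match "[ \t\n]+\\'" string start)
        (substring string start (match-beginning 0))
      (substring string start))))

(defun my-trim-benchmark ()
  "Return a list of elapsed seconds for 10000 calls of each trim function."
  (let ((sample "  \t some text here \n "))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
    (list (car (benchmark-run 10000 (my-trim-via-replace-regexp sample)))
          (car (benchmark-run 10000 (my-trim-via-string-match sample))))))
```

Calling (my-trim-benchmark) returns two timings in seconds. The numbers depend on the machine and the emacs version, so treat them as a rough comparison only.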
Here's Magnar Sveen's implementation:
(defun s-trim-left (s)
  "Remove whitespace at the beginning of S."
  (if (string-match "\\`[ \t\n\r]+" s)
      (replace-match "" t t s)
    s))

(defun s-trim-right (s)
  "Remove whitespace at the end of S."
  (if (string-match "[ \t\n\r]+\\'" s)
      (replace-match "" t t s)
    s))

(defun s-trim (s)
  "Remove whitespace at the beginning and end of S."
  (s-trim-left (s-trim-right s)))
Regex Search and Losing Match Data
Using string-match will clobber your match data. Emacs keeps only one set of match data, and every regex search or match function overwrites it.
For example, suppose you are processing thousands of HTML files. You want to change all links from relative paths to absolute URLs. Suppose you call search-forward-regexp to search for the link string. You need to get the matched data.
Let's say (match-string 1) is the href value, (match-string 2) is the link text, (match-beginning 0) is the beginning of the tag, and (match-end 0) is the end of the tag.
Now, you call (match-string 2), then apply a regex to trim white spaces from it. At that point, all your other match data is lost.
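The loss can be reproduced in a few lines. This is a small sketch under assumptions: the HTML snippet, the link regex, and the function name my-demo-lost-match-data are made up for illustration.

```elisp
(defun my-demo-lost-match-data ()
  "Return (HREF-BEFORE HREF-AFTER) around a regex-based trim.
Hypothetical demo on a made-up HTML snippet."
  (with-temp-buffer
    (insert "<a href=\"x.html\"> some text </a>")
    (goto-char (point-min))
    ;; Group 1 is the href value, group 2 is the link text.
    (search-forward-regexp "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>")
    (let ((href-before (match-string 1))
          (link-text (match-string 2)))
      ;; Any successful regex match on the link text replaces the
      ;; match data of the buffer search above.
      (string-match "\\`[ \t\n\r]+" link-text)
      ;; Asking for group 1 again no longer gives the href value.
      (list href-before (match-string 1)))))
```

The first element of the result is the href value. The second element is what (match-string 1) gives after the trim regex has run, and it is no longer the href value.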
For safety, whenever you use any elisp regex function and want to capture the match data, you should get it immediately after the call (that is, set it to variables), then continue with whatever you are doing. Again, this isn't very nice, as you'll need to introduce several temp variables.
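Saving the match data into variables right after the search might look like the sketch below. The function name my-next-link-info and the link regex are hypothetical, and string-trim is assumed to be available from subr-x (Emacs 24.4 or later).

```elisp
(require 'subr-x) ; assumed to provide `string-trim'

(defun my-next-link-info ()
  "Search forward for a link and return (HREF TEXT TAG-BEGIN TAG-END).
Return nil if no link is found.  Hypothetical helper for illustration."
  (when (search-forward-regexp
         "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>" nil t)
    ;; Copy everything out of the match data before doing anything else.
    (let ((href (match-string 1))
          (text (match-string 2))
          (tag-begin (match-beginning 0))
          (tag-end (match-end 0)))
      ;; Trimming may run more regex matches, but the variables are safe.
      (list href (string-trim text) tag-begin tag-end))))
```

The cost is visible here: four temp variables, just to keep one search result intact.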
I really wish replace-regexp-in-string were written in C.
Here's an example of text processing that uses regex and requires capturing several pieces of match data and trimming strings: Elisp vs Perl: Validate File Links.
Note: you can use save-match-data, but it's a lisp macro. That solution isn't better than saving the matches yourself.
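For comparison, here is roughly what the save-match-data route looks like. It is a sketch only: my-trim-keep-match-data and my-demo-kept-match-data are hypothetical names, and the HTML snippet is made up.

```elisp
(defun my-trim-keep-match-data (s)
  "Trim whitespace at both ends of S, leaving the caller's match data alone."
  (save-match-data
    ;; START is the index of the first non-whitespace char.
    (let ((start (if (string-match "\\`[ \t\n\r]+" s) (match-end 0) 0)))
      (if (string-match "[ \t\n\r]+\\'" s start)
          (substring s start (match-beginning 0))
        (substring s start)))))

(defun my-demo-kept-match-data ()
  "Return (TRIMMED-LINK-TEXT HREF) for a made-up HTML snippet."
  (with-temp-buffer
    (insert "<a href=\"x.html\"> some text </a>")
    (goto-char (point-min))
    (search-forward-regexp "<a href=\"\\([^\"]+\\)\">\\([^<]*\\)</a>")
    ;; The trim runs first, yet group 1 is still readable afterwards.
    (list (my-trim-keep-match-data (match-string 2))
          (match-string 1))))
```

The caller stays simple, but every helper that touches regex has to remember to wrap itself this way.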
Emacs Lisp Problem of Lacking Namespace
Another issue of elisp is the lack of namespace. This is a major problem that prevents elisp from growing as a language.
For example, Magnar's lib is nice, and i want to use it, but should i really? If i use it, it means i'll have to put another misc lib in my elisp system in a haphazard way. In other languages such as perl, python, ruby, there's namespace and a controlled way to install packages. In these langs, a library has expected paths, a standard naming scheme, a standard format of documentation that is embedded in the source code, and usually a system to install or uninstall it (for example, perl cpan, python pip, ruby gem).
In emacs lisp, there is no namespace, and elisp packages are not managed. [see Emacs Lisp: load, load-file, autoload]
(The Emacs 24 (released 2012-06) package system is not an elisp language library system. It's a user-land app system, similar to Debian Linux's Advanced Packaging Tool (apt-get). [see Emacs: Install Package with ELPA/MELPA])
When a language lacks namespace or a standard library system, it ends up with lots of misc packages in the wild, with overlapping functionality, inconsistent interfaces, and varying quality.
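To make the namespace problem concrete: all elisp functions live in one global table, so two libraries that pick the same name silently replace each other. The sketch below uses made-up names (demo-trim-string, liba-trim-string, libb-trim-string) standing in for two hypothetical libraries. The only defense is the convention of prefixing every name by hand.

```elisp
;; Hypothetical library A: trims spaces at both ends.
(defun demo-trim-string (s)
  "Remove spaces at both ends of S."
  (replace-regexp-in-string
   "\\` +" ""
   (replace-regexp-in-string " +\\'" "" s)))

;; Hypothetical library B, loaded later, reuses the same name
;; but only trims the beginning.  No error, no warning at load time.
(defun demo-trim-string (s)
  "Remove spaces at the beginning of S."
  (replace-regexp-in-string "\\` +" "" s))

;; Code written against library A now silently gets B's behavior:
;; (demo-trim-string "  abc  ") keeps the trailing spaces.

;; The workaround is a manual prefix per library.
(defun liba-trim-string (s)
  "Remove spaces at both ends of S."
  (replace-regexp-in-string
   "\\` +" ""
   (replace-regexp-in-string " +\\'" "" s)))

(defun libb-trim-string (s)
  "Remove spaces at the beginning of S."
  (replace-regexp-in-string "\\` +" "" s))
```

The prefix works only as long as every author follows it and no two authors pick the same prefix.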
2017-06-22 Addendum
Emacs now has string trimming built in: see string-trim, string-trim-left, and string-trim-right.
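The builtin trim functions can be tried with a small sketch like this one, assuming Emacs 24.4 or later, where string-trim and its left and right variants are provided by subr-x. The function name my-demo-builtin-trim is hypothetical.

```elisp
(require 'subr-x) ; assumed to provide the string-trim family

(defun my-demo-builtin-trim ()
  "Return sample results of the builtin trim functions."
  (list (string-trim "  \t some text \n ")   ; both ends
        (string-trim-left "  abc  ")         ; beginning only
        (string-trim-right "  abc  ")))      ; end only
```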