Emacs Lisp Text Processing: find-file vs with-temp-buffer
This page gives a detailed speed comparison of using Emacs Lisp's find-file vs with-temp-buffer for processing 5 thousand files.
Summary
Using find-file to open 5565 files takes 63 seconds.
Using with-temp-buffer, 16 seconds. (4 times faster.)
Conclusion: when doing batch text processing of thousands of files, don't use find-file. Use with-temp-buffer or with-temp-file instead. (Use the latter when you need to make changes to the file.)
The reason for this speedup is that find-file loads a major mode, which does syntax coloring (relatively slow), keeps track of undo, and so on. (See the bottom of this page for detail.)
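As an illustration of the with-temp-file case, here is a minimal sketch of changing a file in place without visiting it. The function name my-replace-in-file and its arguments are made up for this example and are not part of the test scripts on this page.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-replace-in-file (fpath from to)
  "In the file at FPATH, replace every literal string FROM with TO.
The file is read into a temp buffer and written back when the body ends,
so no major mode is loaded and the file is never visited."
  (with-temp-file fpath
    ;; The temp buffer starts empty, so load the existing content first.
    (insert-file-contents fpath)
    (goto-char (point-min))
    (while (search-forward from nil t)
      ;; t t = keep case as given, treat TO as literal text
      (replace-match to t t))))
```

Note that with-temp-file writes the file every time, even when nothing was replaced. If that matters, do the check in with-temp-buffer first and write only when needed.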
Detail
Here's the test that you can run.
For this test, the input directory is the HTML version of the GNU Emacs Lisp Reference Manual, a total of ~900 files.
If you would like to run the test, you can download it at:
with-temp-buffer Version
Here's the actual code.
;; 2011-12-20
;; Speed test script
;;
;; What the script does:
;; Creates a [sitemap.xml] file.
;; Opens each file in a dir. If the file doesn't contain the word “refresh”, adds an entry for the file to [sitemap.xml].

;; Must end in a slash. Must not start with ~
(setq webroot "/Users/h3/web/xahlee_org/emacs_manual/elisp/")

;; ------------------------

(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char (point-min))
    (when (not (search-forward "refresh" nil "noerror"))
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))))

;; ------------------------

(print
 (benchmark-run 1
   ;; create sitemap buffer
   (let (filePath sitemapBuf)
     (setq filePath (concat webroot "sitemap.xml"))
     (setq sitemapBuf (find-file filePath))
     (erase-buffer)
     (set-buffer-file-coding-system 'unix)
     (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">
")
     (require 'find-lisp)
     (mapc
      (lambda (xx) (my-process-file xx sitemapBuf))
      (find-lisp-find-files webroot "\\.html$"))
     (insert "</urlset>")
     (save-buffer))))

(message "%s" "Yay, Done!")
To run it:
- Copy and paste the code and save it as speedtest_temp-buff.el.
- Change the “webroot” variable to a directory on your computer that has lots of HTML files.
- In a terminal, run it with “--script”, like this: emacs --script ~/speedtest_temp-buff.el. This won't load your init files. Your init files might contain hooks or other things that affect the speed.
- When the program is finished, it'll create a file named sitemap.xml in the “webroot” directory.
- The output of benchmark-run will be printed on the screen.
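To read the printed result: benchmark-run returns a list of three numbers, the total elapsed time in seconds, the number of garbage collections that ran, and the time spent in garbage collection. The sketch below pulls the three apart and prints them with labels. The workload function my-dummy-workload is a made-up stand-in, not the sitemap script.

```elisp
(require 'benchmark)

;; Hypothetical stand-in workload; swap in the real file processing.
(defun my-dummy-workload ()
  "Create 1000 temp buffers, inserting a short string into each."
  (dotimes (_ 1000)
    (with-temp-buffer (insert "refresh"))))

;; `benchmark-run' returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(let* ((result (benchmark-run 1 (my-dummy-workload)))
       (elapsed (nth 0 result))
       (gc-count (nth 1 result))
       (gc-time (nth 2 result)))
  (message "elapsed %.3fs, %d garbage collections taking %.3fs"
           elapsed gc-count gc-time))
```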
What the script does is very simple:
- Open each HTML file in a directory
- If the file contains the string “refresh”, then do nothing.
- Else, add the file name as an entry to the file sitemap.xml. (This file is created by the script.)
This version gets file content by using a temp buffer, like this:
(with-temp-buffer
  (insert-file-contents fPath)
  ;; do processing here
  )
The script is a simplified version of generating a sitemap. 〔see Elisp: Create Sitemap〕
find-file Version
The find-file version is identical except for the my-process-file function, which looks like this:
(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (let (myBuffer)
    (setq myBuffer (find-file fPath))
    (goto-char (point-min))
    (when (not (search-forward "refresh" nil "noerror"))
      (with-current-buffer destBuff
        (insert "<url><loc>")
        (insert (concat "http://example.org/" (substring fPath (length webroot))))
        (insert "</loc></url>\n")))
    (kill-buffer myBuffer)))
Speed difference
Here are the test results (all timings are in seconds, rounded).
Script Version | Script Running Time | Number of Garbage Collections | Garbage Collection Time | Actual Time |
---|---|---|---|---|
find-file | 8.2 | 80 | 0.8 | 9 |
with-temp-buffer | 1.46 | 16 | 0.16 | 1.5 |
Do Not use find-file or write-file
find-file, write-file, or any function that visits a file has many unwanted side effects, and it can be up to 40 times slower (I tested this before). Here are examples of the side effects:
- It keeps undo info.
- It syntax-colors the buffer.
- It displays the file. (Very slow if you have global-linum-mode, etc.)
- It may run tons of hooks added by others. (desktop-save-mode, recentf-mode, tabbar-mode, snippet-mode (yasnippet), etc.)
- It may do backup.
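By contrast, a buffer made by with-temp-buffer avoids several of the side effects listed above. The following sketch inspects such a buffer from the inside. The function name my-temp-buffer-state is made up for this example.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-temp-buffer-state ()
  "Return a plist describing the state of a fresh temp buffer."
  (with-temp-buffer
    (insert "some text")
    (list :major-mode major-mode
          ;; A `buffer-undo-list' of t means undo recording is off.
          :undo-disabled (eq buffer-undo-list t)
          ;; nil here means the buffer is not visiting any file.
          :visiting-file (buffer-file-name))))

(message "%S" (my-temp-buffer-state))
```

The temp buffer's name begins with a space, which is why Emacs does not record undo info for it. It is also never displayed in a window and never goes through the file-visiting hooks.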
Misc Notes
Jon Snader (jcs) suggested using “benchmark-run” for the timing report and garbage collection info. http://irreal.org/blog/?p=400. “benchmark-run” is tremendously useful. Thanks Jon.
Stefan Monnier 〔http://www.iro.umontreal.ca/~monnier/〕 suggested that turning off “vc-handled-backends” (setting it to nil) might speed up the “find-file” version slightly. Source: groups.google.com, comp.lang.lisp.
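One way to try that suggestion is to bind the variable to nil only around the call to find-file, as in the sketch below. The function name my-find-file-no-vc is made up for this example, and the effect on speed was not measured here.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-find-file-no-vc (fpath)
  "Visit the file at FPATH with version-control detection turned off.
Returns the buffer visiting the file."
  ;; With no backends listed, the VC check in the find-file hooks
  ;; has nothing to look for.  The binding lasts only for this call.
  (let ((vc-handled-backends nil))
    (find-file fpath)))
```

In the find-file version of the test script, this would replace the plain (find-file fPath) call.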
Thanks to [Trey Jackson https://plus.google.com/116944459982600529677/about] for a major correction on this article. In my previous report, the timing difference was a factor of 45, because I had various personal hooks.