Problems of grep in Emacs
This page describes problems of calling unix grep in emacs, and why an emacs lisp version is more flexible and superior.
Unix grep is quite useful, and in emacs it's even better, because emacs has commands that act as wrappers to unix grep, with the advantage that the output is colored and file names are linked. 〔see Emacs: Search Text in Directory〕
However, calling unix grep inside emacs has some problems, either directly by a shell-command command, or indirectly by emacs lisp wrapper commands {grep, rgrep, lgrep, grep-find, etc}.
External Program Problem
On Windows, calling an external unix util is a major problem, because by default there is no unix grep installed. This makes a large part of critical emacs features unusable.
The user has to install Cygwin or a similar package, and then emacs goes thru several layers (Cygwin, Windows). Unicode, or text that contains quote chars or escapes, almost always gets mangled along the way. 〔see Installing Cygwin Tutorial〕
Unix Shell Quote Escape Problem
Today, i want to grep with this regex height="[0-9]+" /> for HTML image tags.
On my Microsoft Windows machine with Cygwin installed, when calling emacs's grep command, i gave this to the prompt:
grep -ie -nH 'height\=\"[0\-9]\+\" \/\>' *html
It doesn't work.
I tried many variations: with the backslash in different places, double backslash, single/double quotes. Sometimes the error is about Cygwin detecting a DOS style slash. Sometimes it silently creates a zero-length file named ' in the current directory, due to the bad escapes.
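For comparison, here is a minimal sketch of the same search done in a script, where the regex passes thru no shell at all. It is written in Python only for illustration; the function name grep_files and the *html glob are assumptions made for this example, not part of any emacs command.

```python
import re
from pathlib import Path

# The regex from the text, written once with no shell quoting layer.
# A raw string keeps Python from interpreting backslashes itself.
HEIGHT_ATTR = re.compile(r'height="[0-9]+" />')

def grep_files(pattern, directory, glob="*html"):
    """Return (file name, line number, line) for each line matching pattern.

    grep_files is a hypothetical helper for this sketch. It assumes the
    files are UTF-8 and replaces undecodable bytes instead of failing.
    """
    hits = []
    for path in sorted(Path(directory).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, lineno, line))
    return hits
```

Calling grep_files(HEIGHT_ATTR, ".") searches the current directory. The point is only that the pattern is typed exactly as it is meant, with no backslashes added for the benefit of a shell.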
Dependence on Familiarity of Unix Shell Syntax
Unix utils syntax is incomprehensible to non-unix users, but emacs depends on them for basic features such as searching text in a dir. Users not familiar with the cryptic shell syntax won't be able to use it. For example, the emacs grep command prompts this: grep -nH -e ▮
Also, to list files in emacs dired, the only command for that is find-dired.
〔see Linux: Walk Dir: find, xargs〕
These depend on familiarity with the unix find/xargs commands. For example, find-dired prompts “Run find (with args):”, where the user is supposed to type something like -name "*html"
〔see Emacs: Inconsistency of Search Features〕
Regex Not Compatible to Emacs Regex
The regex syntax used by grep is different from emacs regex (Emacs: Regular Expression), such as the one you type after Alt+x query-replace-regexp.
Unicode String Problem
When using unix grep on MS Windows for processing Unicode text, there are many encoding problems. On Windows with Cygwin, the char encoding in the stream gets messed up thru the various layers.
For example, grep fails when searching for │ (U+2502). This is calling Cygwin grep from emacs on Windows. It's too complex to figure out exactly why it fails.
With Unicode, you have to deal with the unix environment variable “locale”, emacs's own various encoding settings, and MS Windows's locale and “codepage” setup. There's complex interplay of environment variables among {emacs, emacs's inferior shells, Cygwin, Windows}. 〔see Emacs in Microsoft Windows FAQ〕
Even on a Linux terminal, shell tools have issues with Unicode. See: Linux Shell Util uniq Unicode Bug
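A script that opens the files itself can sidestep most of these layers, because the encoding is stated in one place instead of being inherited from locale and codepage settings. The following is a rough Python sketch under the assumption that the files are UTF-8; files_containing is a name invented for this example.

```python
from pathlib import Path

MARKER = "\u2502"  # BOX DRAWINGS LIGHT VERTICAL, the char │

def files_containing(needle, directory, glob="*html", encoding="utf-8"):
    """Return names of files whose decoded text contains needle.

    The encoding is passed explicitly (assumed UTF-8 here), so the
    result does not depend on the shell's locale or codepage.
    """
    found = []
    for path in sorted(Path(directory).glob(glob)):
        text = path.read_text(encoding=encoding)
        if needle in text:
            found.append(path.name)
    return found
```

The search string never travels thru a command line, so there is no stage at which it can be re-encoded.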
Problem with Long Search String
Sometimes you want to search a string that's part of source code.
It may be long, containing 300 chars or more (e.g. a snippet of HTML that contains JavaScript and spans multiple lines). You could put your search string in a file and have grep read it using --file=filename, but this is not convenient.
Here's an example of a string i need to search for literally:
<div class="chtk"><script>ch_client="polyglut";ch_width=550;ch_height=90;ch_type="mpu";ch_sid="Chitika Default";ch_backfill=1;ch_color_site_link="#00C";ch_color_title="#00C";ch_color_border="#FFF";ch_color_text="#000";ch_color_bg="#FFF";</script><script src="http://scripts.chitika.net/eminimalls/amm.js"></script></div>
Shell-escaping this string would be very inconvenient, and more complex still when the shell is used inside emacs.
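In a script, a long literal like this needs no escaping at all. Below is a small Python sketch of a literal (non-regex) search that also copes with a search string spanning several lines. The helper name find_literal is hypothetical, and NEEDLE holds only the beginning of the snippet above to keep the example short.

```python
from pathlib import Path

# First part of the snippet from the text, pasted as-is inside a raw
# triple-quoted string. Nothing in it needs a backslash.
NEEDLE = r'''<div class="chtk"><script>ch_client="polyglut";ch_width=550;'''

def find_literal(needle, directory, glob="*html"):
    """Return (file name, line number) for each literal occurrence.

    The needle may contain newlines; the line number reported is the
    line on which the occurrence starts.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    hits = []
    for path in sorted(Path(directory).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        start = text.find(needle)
        while start != -1:
            lineno = text.count("\n", 0, start) + 1
            hits.append((path.name, lineno))
            start = text.find(needle, start + len(needle))
    return hits
```

Because the whole file is searched as one string rather than line by line, a multi-line needle works the same way as a one-line needle.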
Grep Not Flexible for Specifying Files in Directories
grep is not very flexible for working with all files in a directory. There's the -r option, but then you can't specify a file pattern (e.g. *html) the usual way. You have to do it like this: grep -r 'xyz' --include='*html' dirname
〔see Linux: Text Processing: grep, cat, awk, uniq〕
Sometimes you need to work on a list of files, sometimes by a pattern, sometimes you want to exclude some files by list or by pattern, sometimes only the first 2 levels, or a combination of the above in a specific order. Some unix tools provide these options, sometimes by a combination of tools (e.g. find/xargs), but their order and syntax are complex and tool specific. With a script in Perl, Python, or elisp, it's much easier to control.
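As an illustration of that kind of control, here is a Python sketch that combines an include pattern, exclude patterns, and a depth limit in one function. The name select_files and its parameters are made up for this example; a real tool would likely want more options.

```python
import fnmatch
import os

def select_files(root, include="*.html", exclude=(), max_depth=None):
    """Return paths under root whose file name matches include.

    Names matching any pattern in exclude are skipped. max_depth limits
    how deep to look: 1 means only files directly in root, 2 also
    includes files in its immediate subdirectories, and so on.
    """
    root = os.path.abspath(root)
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 1 if rel == "." else rel.count(os.sep) + 2
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: do not descend any further
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not fnmatch.fnmatch(name, include):
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in exclude):
                continue
            selected.append(os.path.join(dirpath, name))
    return selected
```

For example, select_files("dirname", exclude=("skip_*",), max_depth=2) would cover the “first 2 levels, minus some files” case in a single call, and the order of the tests inside the loop can be rearranged freely.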
Too Many Incompatible Versions of Grep
There are too many versions and varieties of grep. The primary 2 are BSD and GNU. Mac OS X comes with BSD versions, but some utils are GNU versions. Linuxes typically come with GNU versions. The different versions accept different options. Also, GNU grep supports a variety of confusing regex syntaxes (“--basic-regexp”, “--extended-regexp”, “--perl-regexp”). It's too painful to figure them out and remember their details.
grep is Not Powerful Enough for Nested Syntax (e.g. HTML/XML)
Unix grep and associated tools (sort, wc, uniq, pipe, sed, awk, etc.) are not flexible when your need is slightly more complex. For example, suppose i need to find all occurrences of HTML “img” tag that are not wrapped by a <div> tag. This is practically impossible with line-based unix tools, because it requires keeping track of which tags are currently open. (Extending the limit of unix tools is how Perl was born in 1987.)
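With a scripting language the “img not inside a div” question becomes a matter of counting open tags. The Python sketch below uses the standard html.parser module. It treats “wrapped” as “has a div ancestor at any level”, which is an assumption, and the class and function names are invented for this example. Badly nested HTML can throw the count off.

```python
from html.parser import HTMLParser

class ImgOutsideDiv(HTMLParser):
    """Record the position of every <img> that has no open <div> around it."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # number of currently open <div> tags
        self.positions = []  # (line, column) of each qualifying <img>

    def handle_starttag(self, tag, attrs):
        # Also called for the self-closing form <img ... />.
        if tag == "div":
            self.div_depth += 1
        elif tag == "img" and self.div_depth == 0:
            self.positions.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

def img_outside_div(html_text):
    """Return a list of (line, column) for img tags not inside any div."""
    parser = ImgOutsideDiv()
    parser.feed(html_text)
    parser.close()
    return parser.positions
```

The same idea carries over to elisp: walk the text, keep a counter of open div tags, and report an img only when the counter is zero.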
Example of a Real World Problem Using Grep Inside Emacs
Here's a concrete example of a grep problem.
In my vocabulary page Wordy English — the Making of Belles-Lettres, i use the Unicode BOX DRAWINGS LIGHT VERTICAL “│” as a temp marker for processing the word list. Today i need to grep pages containing that character.
Calling Meta+x grep in emacs with grep -inH -e "│" *html returns an error:
-*- mode: grep; default-directory: "c:/Users/xah/web/xahlee_org/emacs/" -*-
Grep started at Tue Apr 05 15:37:47
grep "│" *html
warning: extra args ignored after 'grep "│\'
Grep finished with no matches found at Tue Apr 05 15:37:47
Starting shell in emacs (which runs Microsoft cmd.exe in Windows Vista) doesn't work either. (It works fine when grepping an ASCII string.) Here's a session log:
Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.
c:\Users\xah\web\xahlee_org\emacs>grep "│" *html
grep "â\224? *html
It got stuck there. Ctrl+c Ctrl+c doesn't get out. I had to kill the buffer.
Calling msys-shell works. (msys-shell is bundled with ErgoEmacs. It calls bash in MinGW, which is a subset of Cygwin port.) Here's a log:
sh-3.2$ grep "│" *html
antonymous_synonyms.html:<li> cry, decry │ you can cry, as in crying out loud, but you can also decry, by crying out loud </li>
antonymous_synonyms.html:<li> linear, rectilinear │ linear algebra, rectilinear motion. Rectilinear is the linearness of motion.</li>
…
Calling it in Cygwin Bash running inside Windows Console also works.
So, this means, the problem isn't grep not understanding Unicode.
Something went wrong when emacs talks to Cygwin. Though, what exactly is the problem? Well, i'm not about to spend a few hours to find out.
In PowerShell, it also works. For example, with this command: select-string -path *.html -pattern "│". However, calling PowerShell thru emacs does not work.
Here's my system setup:
- I'm running ErgoEmacs 1.9.2.
- Windows Vista with latest patches.
- Cygwin installed. (too lazy to look up version)
- MSYS of MinGW installed as part of ErgoEmacs. (too lazy to look up version)
- Windows PowerShell installed (too lazy to look up version)
- PowerShell emacs interface mode installed (too lazy to look up version)
Addendum: Adding the option -P also worked. For example: call emacs “grep” command, then give grep -inH -e -P "│" *html. Thanks to “blandest” (gnu.emacs.help).
Emacs Lisp Solves All Problems
Here's pure emacs lisp for grep/sed: Emacs: Xah Find Replace (xah-find.el) 📦
This page started when i wrote a grep in emacs lisp (Elisp: Write grep), and people asked why.