Linux Shell Text Processing Tutorial: grep, cat, awk, sort, uniq, …


This page is a basic tutorial on using Linux shell's text processing tools. They are especially useful for processing text line by line.

Get Lines: grep

grep is the most important command. You should master it.

How to show only the lines that contain a text pattern?

# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files ending in html
grep 'xyz' *html
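To try these commands safely, here is a minimal sketch that builds a few throwaway files in a temporary directory (the file names and contents are made up for illustration) and runs the two searches above. When grep searches more than one file, it prefixes each matching line with the file name.

```shell
# Hypothetical sample files in a throwaway directory
tmp=$(mktemp -d)
printf 'abc\nxyz one\nno match\nline with xyz\n' > "$tmp/myFile"
printf '<p>xyz</p>\n<p>other</p>\n' > "$tmp/a.html"
printf '<b>xyz here</b>\n' > "$tmp/b.html"

# show lines containing xyz in myFile
in_one=$(grep 'xyz' "$tmp/myFile")
printf '%s\n' "$in_one"

# show lines containing xyz in all files ending in html (run inside the directory)
in_html=$(cd "$tmp" && grep 'xyz' *html)
printf '%s\n' "$in_html"

rm -rf "$tmp"
```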

How to use grep for all files in a dir?

Use -r to search recursively into all subdirectories. Use --include='*html' to search only files whose names match the given glob. Example:

grep -r 'xyz' --include='*html' dirname

This applies grep to all files whose names end in “html” in the directory dirname and all its subdirectories.
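A small sketch of the recursive search, assuming a made-up directory tree under a temporary directory. The output is piped through sort only because the order in which grep visits files depends on the file system.

```shell
# Hypothetical directory tree: two html files and one txt file
tmp=$(mktemp -d)
mkdir -p "$tmp/dirname/sub"
printf 'xyz top\n' > "$tmp/dirname/index.html"
printf 'xyz nested\n' > "$tmp/dirname/sub/page.html"
printf 'xyz but not html\n' > "$tmp/dirname/notes.txt"

# recursive search limited to *html files; sort makes the order predictable
found=$(cd "$tmp" && grep -r 'xyz' --include='*html' dirname | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$tmp"
```

The notes.txt line is skipped even though it contains xyz, because its name does not match the --include glob.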

How to use grep to search for an exact string? (How to turn off regex)

Use the option -F. Example:

# search Perl source files for the string “href\s*=\s*"([^"]+)".*>” literally
grep -F 'href\s*=\s*"([^"]+)".*>' *pl

This is useful when you want to search source code for a complicated string.
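The following sketch, using hypothetical sample files, shows -F in action: the Perl regex above is found as plain text, and a simpler pattern shows the difference, since without -F the dot in a.c matches any character.

```shell
tmp=$(mktemp -d)
# Hypothetical Perl file that contains the pattern text itself
cat > "$tmp/demo.pl" <<'EOF'
my @links = ($html =~ m/href\s*=\s*"([^"]+)".*>/g);
print "done\n";
EOF

# -F: the pattern is a fixed string, so backslashes, brackets and quotes are literal
literal_hits=$(grep -c -F 'href\s*=\s*"([^"]+)".*>' "$tmp/demo.pl")

# "." means "any character" as a regex, but a literal dot with -F
printf 'a.c\nabc\n' > "$tmp/dots.txt"
regex_hits=$(grep -c 'a.c' "$tmp/dots.txt")     # matches both lines
fixed_hits=$(grep -c -F 'a.c' "$tmp/dots.txt")  # matches only "a.c"

echo "$literal_hits $regex_hits $fixed_hits"
rm -rf "$tmp"
```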

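If your string is really complicated, you can put it in a file and use the option --file=patternFileName to read the search pattern from that file. Each line of the file is a separate pattern, and an input line matches if it matches any of them. Example: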

# search Emacs Lisp source code in the current dir and all subdirs. The search patterns are stored in the file myPattern.txt
grep -r --file=myPattern.txt --include='*el' .
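A minimal sketch of --file, with a made-up pattern file holding two patterns and a small tree of .el files. The patterns are treated as regexes unless you also pass -F.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src/lib"
# Hypothetical pattern file: one pattern per line
printf '%s\n' 'defun' 'setq' > "$tmp/myPattern.txt"
printf '(defun foo () 1)\n(message "hi")\n' > "$tmp/src/a.el"
printf '(setq x 2)\n' > "$tmp/src/lib/b.el"
printf 'defun in a text file\n' > "$tmp/src/readme.txt"

# a line is printed if it matches ANY pattern in the file; only *el files are searched
hits=$(cd "$tmp/src" && grep -r --file="$tmp/myPattern.txt" --include='*el' . | LC_ALL=C sort)
printf '%s\n' "$hits"

rm -rf "$tmp"
```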

Most Useful Grep Options

Options for Pattern String

Examples:

# print lines not matching a string, for all files ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
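Here is a sketch of both options on a few hypothetical log lines. It uses -E (extended regex) instead of -P because -E is more widely available (some grep builds lack -P); for a plain alternation like this one, both give the same result.

```shell
tmp=$(mktemp -d)
# Hypothetical access log
cat > "$tmp/access.log" <<'EOF'
GET /index.html HTTP/1.1
GET /logo.png HTTP/1.1
GET /photo.jpg HTTP/1.1
GET /style.css HTTP/1.1
EOF

# lines NOT containing "html HTTP"
not_html=$(cd "$tmp" && grep -v 'html HTTP' *log)
printf '%s\n' "$not_html"

# lines containing "png HTTP" or "jpg HTTP"
images=$(cd "$tmp" && grep -E 'png HTTP|jpg HTTP' *log)
printf '%s\n' "$images"

rm -rf "$tmp"
```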

Options for File Selection

Output Options

More Grep Examples

# print lines containing “html HTTP” in a log file, show only the 12th and 7th columns, keep only lines mentioning livejournal or blogspot (case-insensitive), then sort, then condense repetitions with a count, then sort that by the count.

grep 'html HTTP' apache.log | awk '{print $12 , $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n
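To see what each stage of that pipeline does, here is a sketch run on made-up log lines. The field layout is an assumption chosen so that the 7th field is the requested path and the 12th is the referrer; real Apache logs may place fields differently depending on the configured LogFormat. As above, -E stands in for -P.

```shell
tmp=$(mktemp -d)
# Hypothetical log: field 7 is the path, field 12 is the referrer
cat > "$tmp/apache.log" <<'EOF'
1.2.3.4 - - [01/Jan/2020:00:00:00 +0000] "GET /a.html HTTP/1.1" 200 512 - "http://x.blogspot.com/"
1.2.3.4 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 512 - "http://x.blogspot.com/"
5.6.7.8 - - [01/Jan/2020:00:00:02 +0000] "GET /b.html HTTP/1.1" 200 512 - "http://y.LiveJournal.com/"
5.6.7.8 - - [01/Jan/2020:00:00:03 +0000] "GET /c.png HTTP/1.1" 200 512 - "http://x.blogspot.com/"
9.9.9.9 - - [01/Jan/2020:00:00:04 +0000] "GET /a.html HTTP/1.1" 200 512 - "http://example.com/"
EOF

# html requests -> "referrer path" -> only livejournal/blogspot -> count -> least frequent first
report=$(grep 'html HTTP' "$tmp/apache.log" | awk '{print $12 , $7}' | grep -i -E "livejournal|blogspot" | sort | uniq -c | sort -n)
printf '%s\n' "$report"

rm -rf "$tmp"
```

The .png request is dropped by the first grep, the example.com referrer by the second, and the two identical blogspot lines collapse into one line with a count of 2.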
# print all lines containing “http://” in all html files of a dir, except lines with certain sites. Output to xx.txt

grep -r --include='*html' -F 'http://' ~/web | grep -v -P 'google.com|twitter.com|reddit.com|wikipedia.org' > xx.txt
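A sketch of the link-listing pipeline on a hypothetical web directory created under a temporary directory (standing in for ~/web). Because grep -r searches several files, each output line starts with the file name.

```shell
tmp=$(mktemp -d)
# Hypothetical site directory standing in for ~/web
mkdir -p "$tmp/web/blog"
printf '<a href="http://example.org/">ok</a>\n<a href="http://www.google.com/">search</a>\n' > "$tmp/web/index.html"
printf '<a href="http://twitter.com/x">t</a>\n<p>no link</p>\n' > "$tmp/web/blog/post.html"

# list lines with http:// links, drop the unwanted sites, save to xx.txt
(cd "$tmp" && grep -r --include='*html' -F 'http://' web | grep -v -E 'google.com|twitter.com|reddit.com|wikipedia.org' > xx.txt)
kept=$(cat "$tmp/xx.txt")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

The dots in the exclusion pattern are regex wildcards, so google.com would also match googleXcom; escape them, as in google\.com, if that matters.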

Text Columns: awk, sort, uniq, Sum Column …

How to show only the nth column in a text file?

# print the 7th column. (By default, columns are separated by runs of spaces or tabs, and leading whitespace is ignored.)
cat myFile | awk '{print $7}'
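A sketch showing how awk's default splitting handles a mix of single spaces, repeated spaces, a tab, and leading spaces (the sample file is made up). Passing the file name to awk directly is equivalent to cat myFile | awk.

```shell
tmp=$(mktemp -d)
# Hypothetical file: single spaces, a double space, a tab, and leading spaces
printf 'a b  c\t d e f seventh eight\n  1 2 3 4 5 6 7\n' > "$tmp/myFile"

# same as: cat myFile | awk '{print $7}'
col7=$(awk '{print $7}' "$tmp/myFile")
printf '%s\n' "$col7"

rm -rf "$tmp"
```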

For a delimiter other than whitespace, for example tab, use the -F option. Example:

# print the 12th and 7th columns, Tab is the separator (quote '\t' so the shell passes the backslash through to awk)
cat myFile | awk -F'\t' '{print $12 , $7}'
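Why -F matters: in this sketch (with a hypothetical 12-column tab-separated file whose 3rd column contains spaces), the default whitespace splitting shifts the columns, while -F'\t' splits only on tabs.

```shell
tmp=$(mktemp -d)
# Hypothetical 12-column tab-separated line; column 3 contains spaces
printf 'c1\tc2\tc3 has spaces\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\n' > "$tmp/myFile"

# default splitting also breaks column 3 at its spaces, so the numbering shifts
by_space=$(awk '{print $12 , $7}' "$tmp/myFile")
# splitting on tabs only keeps "c3 has spaces" as one column
by_tab=$(cat "$tmp/myFile" | awk -F'\t' '{print $12 , $7}')
printf '%s\n%s\n' "$by_space" "$by_tab"

rm -rf "$tmp"
```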

An alternative is the cut utility, but cut accepts only a single character as the delimiter, not a regex. So if your columns are separated by varying numbers of spaces, cut cannot split them correctly.
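A sketch of the problem with cut on a made-up line with a double space: cut treats each space as a delimiter, so the empty string between the two spaces becomes field 2, while awk skips the whole run of spaces.

```shell
tmp=$(mktemp -d)
# Hypothetical line with two spaces between "alpha" and "beta"
printf 'alpha  beta gamma\n' > "$tmp/f"

# cut: every single space is a delimiter, so field 2 is the empty string
cut_out=$(cut -d ' ' -f 2 "$tmp/f")
# awk: a run of spaces counts as one separator, so field 2 is "beta"
awk_out=$(awk '{print $2}' "$tmp/f")
printf 'cut: [%s]  awk: [%s]\n' "$cut_out" "$awk_out"

rm -rf "$tmp"
```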

How to show only unique lines in a file (each distinct line once)?

sort -u myFile

or

sort myFile | uniq

To prefix each line with its number of occurrences, use sort myFile | uniq -c. (uniq only collapses adjacent duplicate lines, which is why the input is sorted first.)
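A sketch on a small hypothetical file showing why the sort step matters: uniq by itself only collapses adjacent repeats. The padding before each count in uniq -c output differs between implementations.

```shell
tmp=$(mktemp -d)
# Hypothetical file: b appears 3 times (twice in a row), a twice, c once
printf 'b\nb\na\nb\nc\na\n' > "$tmp/myFile"

# uniq alone: only the adjacent pair of b lines is collapsed
uniq_only=$(uniq "$tmp/myFile")
# sort first: each distinct line once (two equivalent forms)
distinct=$(sort "$tmp/myFile" | uniq)
same=$(sort -u "$tmp/myFile")
# each distinct line prefixed with its count
counts=$(sort "$tmp/myFile" | uniq -c)
printf '%s\n' "$counts"

rm -rf "$tmp"
```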

How to sum up the 2nd column in a file?

awk '{sum += $2} END {print sum}' myFile
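A quick sketch with a made-up two-column file; awk treats the second column as numbers, including decimals.

```shell
tmp=$(mktemp -d)
# Hypothetical file: name and quantity
printf 'apples 3\npears 4.5\nplums 10\n' > "$tmp/myFile"

# add up column 2, print the total after the last line
total=$(awk '{sum += $2} END {print sum}' "$tmp/myFile")
printf '%s\n' "$total"

rm -rf "$tmp"
```

If the file is empty, sum is never set and awk prints an empty line; print sum + 0 prints 0 instead.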

How to show only the first few lines of a huge file?

head myFile shows the first 10 lines. To see the first n lines, use head -n n, for example head -n 100 myFile. To see the end of a file, use tail, for example tail -n 100 myFile.
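A sketch using seq to generate a stand-in for a huge file (the numbers 1 to 1000, one per line), combining head and tail to pick out lines.

```shell
tmp=$(mktemp -d)
# Stand-in for a huge file: the numbers 1 to 1000, one per line
seq 1 1000 > "$tmp/myFile"

# head with no option prints the first 10 lines; tail -n 1 keeps the last of those
tenth=$(head "$tmp/myFile" | tail -n 1)
# first 100 lines, then the last of them
hundredth=$(head -n 100 "$tmp/myFile" | tail -n 1)
# bottom of the file
last3=$(tail -n 3 "$tmp/myFile")
printf '%s %s\n%s\n' "$tenth" "$hundredth" "$last3"

rm -rf "$tmp"
```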

Sort by Number, by Field

See: Linux: sort Examples.

Processing Multiple Files

See: Linux: “find” Command Examples.
