Linux: Text Processing: grep, cat, awk, uniq

By Xah Lee.

This page is a basic tutorial on the Linux shell's text processing tools. They are especially useful for processing text line by line.

Get Lines: grep

grep is the most important command. You should master it.

How to show matching lines?

# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files ending in html, top level of current dir only
grep 'xyz' *html

How to use grep for all files in a dir?

# show matching lines in dir and subdir, file name ending in html
grep -r 'xyz' --include='*html' ~/web

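Here's what the options mean:

-r → search all files in the directory and its subdirectories.

--include='*html' → search only files whose name matches the pattern (here, names ending in html).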
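The effect of -r and --include can be tried in a throwaway directory. This is a minimal sketch, assuming GNU grep; the directory layout and file names are made up for the demo.

```shell
# Hypothetical throwaway tree; every file name here is invented for the demo.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'abc xyz\n'     > "$dir/a.html"
printf 'xyz again\n'   > "$dir/sub/b.html"
printf 'xyz in text\n' > "$dir/notes.txt"

# -r descends into sub/, --include limits the search to names matching *html.
# -l prints only the names of the files that contain a match.
found=$(grep -r -l 'xyz' --include='*html' "$dir" | sort)
echo "$found"
```

Both html files are listed, while notes.txt is skipped even though it contains xyz.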

How to use grep without regex?

Use the option -F. (F means “Fixed string”)

# search ruby source files for lines that contain .* literally
grep -F '.*' *rb

This is useful when you want to search for a complicated string in source code, such as *@$.*#+-/\|`.
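To see the difference -F makes, here is a small sketch on a made-up two-line file. The same search text is used once as a regex and once as a fixed string.

```shell
# Hypothetical sample file with one literal ".*" and one line without it.
tmp=$(mktemp)
printf 'x = a.*b\nx = aXYZb\n' > "$tmp"

# As a regex, a.*b means "a, then anything, then b", so both lines match.
regex_hits=$(grep -c 'a.*b' "$tmp")

# With -F the same text is a literal string, so only the first line matches.
fixed_hits=$(grep -c -F 'a.*b' "$tmp")

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -f "$tmp"
```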

If your string is really complicated, you can put it in a file, and use the option --file=my_pattern_filename for the search text. Example:

# search js source code in current dir and all subdirs. The pattern is stored in a file named myPattern.txt
grep -r --file=myPattern.txt --include='*js' .
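A pattern file holds one pattern per line, and a line is printed if it matches any of them. The sketch below uses invented file contents, and also shows that the patterns are still treated as regex unless -F is added.

```shell
# Hypothetical pattern file and source file, made up for the demo.
dir=$(mktemp -d)
printf '%s\n' 'foo(' 'bar[0]' > "$dir/myPattern.txt"
printf '%s\n' 'call foo(1)' 'x = bar[0]' 'nothing here' > "$dir/code.js"

# Without -F, bar[0] is a regex meaning "bar followed by 0",
# so the line containing the literal text bar[0] is missed.
regex_count=$(grep -c --file="$dir/myPattern.txt" "$dir/code.js")

# With -F, every line of the pattern file is a literal string.
matches=$(grep -F --file="$dir/myPattern.txt" "$dir/code.js")

echo "regex matches: $regex_count"
echo "$matches"
```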

Most Useful Grep Options

Options for Pattern String


# print lines not matching a string, for all files ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
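Here is a small sketch of both commands on a made-up three-line log. It uses -E (extended regex) rather than -P for the alternation, on the assumption that your grep may not be GNU grep; the | works the same way in both.

```shell
# Hypothetical log lines, invented for the demo.
log=$(mktemp)
printf '%s\n' \
  'GET /a.html HTTP/1.1' \
  'GET /b.png HTTP/1.1' \
  'GET /c.jpg HTTP/1.1' > "$log"

# -v inverts the match: keep lines that do NOT contain "html HTTP".
not_html=$(grep -v 'html HTTP' "$log")

# Alternation: keep lines containing "png HTTP" or "jpg HTTP".
images=$(grep -E 'png HTTP|jpg HTTP' "$log")

echo "$not_html"
echo "$images"
rm -f "$log"
```

On this sample, both commands select the same two image lines.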

Options for File Selection

Output Options

More Grep Examples

# print lines containing “html HTTP” in a log file, keep only the 12th and 7th columns, keep only lines mentioning livejournal or blogspot, sort, collapse repeated lines with a count, then sort by that count.

grep 'html HTTP' apache.log | awk '{print $12 , $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n
# print all links in all html files of a dir, except certain links. Output to xx.txt

grep -r --include='*html' -F 'http://' ~/web | grep -v -P '|||' > xx.txt

text columns, awk, sort, unique, sum column …

How to show only nth column in a text file?

# print the 7th column. (columns are separated by whitespace by default.)
cat myFile | awk '{print $7}'

For a delimiter other than whitespace, for example tab, use the -F option. Example:

# print 12th and 7th columns, Tab is the separator. (quote \t so the shell does not strip the backslash)
cat myFile | awk -F'\t' '{print $12 , $7}'

An alternative is the cut utility, but it accepts only a single character as delimiter, not a regex. So if your columns are separated by a varying number of spaces, “cut” cannot handle it.
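The sketch below compares awk and cut on made-up data: first a tab-separated file where both work, then a line with uneven spacing where only awk finds the right column.

```shell
# Hypothetical tab-separated data; a field may itself contain spaces.
f=$(mktemp)
printf 'alice\t30\tNew York\n' >  "$f"
printf 'bob\t25\tSan Jose\n'   >> "$f"

# awk with tab as the field separator: print 3rd then 1st column.
awk_out=$(awk -F'\t' '{print $3, $1}' "$f")

# cut splits on tab by default; -f picks the field.
cut_out=$(cut -f 3 "$f")

# Uneven spacing: awk treats a run of spaces as one separator,
# while cut counts every single space, so its field 2 is empty.
awk_ws=$(printf 'a   b  c\n' | awk '{print $2}')
cut_ws=$(printf 'a   b  c\n' | cut -d' ' -f2)

echo "$awk_out"
echo "$cut_out"
echo "awk: [$awk_ws] cut: [$cut_ws]"
rm -f "$f"
```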

How to show only unique lines in a file?

sort -u myFile

or

sort myFile | uniq

To prefix each line with its repetition count, use sort myFile | uniq -c
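The sort step matters because uniq only collapses duplicate lines that are adjacent. A minimal sketch on a made-up six-line file:

```shell
# Hypothetical input with repeated lines that are not adjacent.
f=$(mktemp)
printf '%s\n' b a b c b a > "$f"

# Without sorting, no two equal lines are adjacent, so uniq removes nothing.
unsorted_count=$(uniq "$f" | grep -c '')

# Sort first, count repeats, then order by count.
# The final awk only normalizes the leading spaces that uniq -c prints.
counts=$(sort "$f" | uniq -c | sort -n | awk '{print $1, $2}')

echo "lines after plain uniq: $unsorted_count"
echo "$counts"
rm -f "$f"
```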

How to sum up the 2nd column in a file?

awk '{sum += $2} END {print sum}' file_name → sum the 2nd column in a file.
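A quick way to check the summing command is to run it on a small file whose total you can add up by hand. The file contents below are made up for the demo.

```shell
# Hypothetical two-column file: a name and a number.
f=$(mktemp)
printf '%s\n' 'apple 3' 'pear 4.5' 'fig 2' > "$f"

# sum accumulates column 2 on every line; END runs once after the last line.
total=$(awk '{sum += $2} END {print sum}' "$f")

echo "$total"
rm -f "$f"
```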

How to show only first few lines of a huge file?

head file_name → show first 10 lines of a file.

head -n 100 file_name → show first 100 lines of a file.

tail file_name → show the last 10 lines of a file.

tail -n 100 file_name → show last 100 lines of a file.
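Here is a small sketch of head and tail on a generated 50-line file. It assumes the seq command is available, as it is on typical Linux systems.

```shell
# Hypothetical test file containing the numbers 1 to 50, one per line.
f=$(mktemp)
seq 1 50 > "$f"

# paste -s joins the selected lines into one line for compact display.
first=$(head -n 3 "$f" | paste -sd' ' -)
last=$(tail -n 3 "$f" | paste -sd' ' -)

# With no -n option, head prints 10 lines.
default_count=$(head "$f" | grep -c '')

echo "first: $first"
echo "last: $last"
echo "default head lines: $default_count"
rm -f "$f"
```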

Sort by Number, by Field

Linux: Sort Lines

Processing Multiple Files

Linux: Traverse Directory: find, xargs
