Linux: Text Processing: grep, cat, awk, uniq

By Xah Lee. Date: . Last updated: .

This page is a basic tutorial on the text processing tools of the Linux shell. They are especially useful for processing text line by line.

Get Lines: grep

grep is the most important command. You should master it.

Show Matching Lines

# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files ending in html in the current dir (top level only, not subdirs)
grep 'xyz' *html

Grep for All Files in a Dir

# show matching lines in ~/web and all its subdirs, searching only files whose name ends in html
grep -r 'xyz' --include='*html' ~/web

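Here's what the options mean:

-r → search recursively: every file in the given dir and all its subdirs.

--include='*html' → search only files whose name matches the glob *html. Quote the glob so the shell does not expand it first.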
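To see -r and --include together, here is a minimal sketch that builds a small hypothetical dir tree in a temp dir (created with mktemp -d, standing in for ~/web) and assumes GNU grep. Note that when grep searches more than one file, it prefixes each matching line with the file path.

```shell
#!/bin/sh
# Hypothetical tree standing in for ~/web
web=$(mktemp -d)
mkdir -p "$web/sub"
echo 'xyz top'  > "$web/a.html"
echo 'xyz deep' > "$web/sub/b.html"
echo 'xyz text' > "$web/c.txt"

# -r descends into sub/; --include skips c.txt because it does not end in html.
# Output is sorted only to make the order predictable.
result=$(grep -r --include='*html' 'xyz' "$web" | sort)
echo "$result"
# <temp dir>/a.html:xyz top
# <temp dir>/sub/b.html:xyz deep

rm -r "$web"
```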

grep without regex

Use the option -F. (F means “Fixed string”)

# search ruby source files (names ending in rb) for the literal string .*
grep -F '.*' *rb

This is useful when the string you want to find in source code is full of characters that are special in regex, such as *@$.*#+-/\|`.
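The difference -F makes is easy to check with two hypothetical sample lines fed to grep on standard input (grep reads stdin when no file is given). Counting matching lines with -c shows that the regex .* matches every line, while -F '.*' matches only the line that literally contains those two characters.

```shell
#!/bin/sh
# Two hypothetical sample lines; only the first contains a literal ".*"
sample='x = /a.*b/
plain text'

# As a regex, .* matches any line (even an empty one), so both lines count.
printf '%s\n' "$sample" | grep -c '.*'      # prints 2

# With -F, .* is a fixed string, so only the first line matches.
printf '%s\n' "$sample" | grep -c -F '.*'   # prints 1
```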

If your search string is very complicated, you can put it in a file and pass that file with the option --file=my_pattern_filename. Each line of the file is treated as a separate pattern. Example:

# search js source code in the current dir and all subdirs. The regex is stored in the file named myPattern.txt
grep -r --file=myPattern.txt --include='*js' .
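Because every line of the pattern file is its own pattern, a line of input is printed if it matches any one of them. Adding -F makes each pattern a literal string. This is a minimal sketch with a hypothetical pattern file (created with mktemp) and hypothetical input lines.

```shell
#!/bin/sh
# Hypothetical pattern file: one literal pattern per line
patterns=$(mktemp)
printf '%s\n' 'a[0]' '$x.y' > "$patterns"

# Hypothetical input lines; -F makes each pattern a literal string,
# and a line is printed if it contains any one of the patterns
matches=$(printf '%s\n' 'let v = a[0];' 'let w = a0;' 'echo $x.y' |
  grep -F --file="$patterns")
echo "$matches"
# let v = a[0];
# echo $x.y

rm -f "$patterns"
```

Without -F, a[0] would be read as a regex that matches a0, so the middle line would be printed too.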

Most Useful Grep Options

Options for Pattern String

Examples:

# print lines that do not contain “html HTTP”, in all files ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
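A few more pattern options, tried on hypothetical sample lines fed through standard input: -w (whole words only), -i (ignore case), -E (extended regex, where | means “or”), and -e (give several patterns). -E is part of standard grep, while -P (Perl regex, used above) is a GNU grep feature that some other greps lack.

```shell
#!/bin/sh
# Hypothetical sample lines, one request per line
sample='GET /a.png HTTP
get /b.jpg HTTP
GET /c.html HTTP
GETTER /d.png HTTP'

# -w: match whole words only, so GETTER does not count
printf '%s\n' "$sample" | grep -c -w 'GET'          # prints 2

# -i: ignore case, so the lowercase "get" line counts too
printf '%s\n' "$sample" | grep -c -i -w 'GET'       # prints 3

# -E: extended regex, where | means "or" (no -P needed)
printf '%s\n' "$sample" | grep -c -E 'png|jpg'      # prints 3

# -e: several patterns; a line matching any of them is counted
printf '%s\n' "$sample" | grep -c -e 'png' -e 'jpg' # prints 3
```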

Options for File Selection
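A sketch of the common file selection options, assuming GNU grep: -l (print only file names), --include, --exclude, and --exclude-dir. The dir tree and file names are hypothetical and created in a temp dir.

```shell
#!/bin/sh
# Hypothetical tree: two html files we want, one txt file, one html file in an "old" dir
dir=$(mktemp -d)
mkdir -p "$dir/blog" "$dir/old"
echo 'xyz' > "$dir/index.html"
echo 'xyz' > "$dir/blog/post.html"
echo 'xyz' > "$dir/notes.txt"
echo 'xyz' > "$dir/old/draft.html"

# -l: print only the names of files that contain a match
# --include: search only files whose name matches the glob
# --exclude-dir: skip any subdir whose name matches (here "old")
html_files=$(grep -r -l --include='*.html' --exclude-dir='old' 'xyz' "$dir" | sort)
echo "$html_files"    # <temp dir>/blog/post.html and <temp dir>/index.html

# --exclude: search every file except those matching the glob
non_html=$(grep -r -l --exclude='*.html' 'xyz' "$dir")
echo "$non_html"      # <temp dir>/notes.txt

rm -r "$dir"
```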

Output Options
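A few output options, shown on hypothetical sample lines: -n (prefix line numbers), -c (count matching lines, not matches), and -o (print only the matched text, one match per line). The regex http://[a-z.]* is a simplified stand-in for a real URL pattern.

```shell
#!/bin/sh
# Hypothetical sample lines
sample='see http://a.org and http://b.org
no link here
visit http://c.org'

# -n: prefix each matching line with its line number
printf '%s\n' "$sample" | grep -n 'http://'
# 1:see http://a.org and http://b.org
# 3:visit http://c.org

# -c: print only the number of matching lines
printf '%s\n' "$sample" | grep -c 'http://'     # prints 2

# -o: print only the matched part, one match per line
printf '%s\n' "$sample" | grep -o 'http://[a-z.]*'
# http://a.org
# http://b.org
# http://c.org
```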

More Grep Examples

# in a log file, take lines containing “html HTTP”, print only the 12th and 7th columns, keep only lines mentioning livejournal or blogspot (case-insensitive), then sort, then collapse repeated lines with a count, then sort by that count.
grep 'html HTTP' apache.log | awk '{print $12, $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n

# print all lines containing links (http://) in all html files of a dir, except lines with certain sites. Output to xx.txt
grep -r --include='*html' -F 'http://' ~/web | grep -v -P 'google\.com|twitter\.com|reddit\.com|wikipedia\.org' > xx.txt

text columns, awk, sort, unique, sum column …

show only nth column in a text file

# print the 7th column. (by default, columns are separated by runs of spaces or tabs.)
cat myFile | awk '{print $7}'
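Because awk's default separator is any run of spaces or tabs, uneven spacing does not shift the columns. This sketch uses hypothetical input from printf; it also shows awk's built-in NF (number of fields), so $NF is the last column.

```shell
#!/bin/sh
# Hypothetical input: uneven runs of spaces, and a tab on the second line
printf 'a   b  c\n1 2\t3\n' | awk '{print $2}'
# b
# 2

# NF is the number of fields on the line, so $NF is the last column
printf 'a   b  c\n1 2\t3\n' | awk '{print $NF}'
# c
# 3
```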

For a delimiter other than whitespace, for example a tab, use the -F option. Example:

# print the 12th and 7th columns, Tab is the separator
cat myFile | awk -F'\t' '{print $12, $7}'

An alternative is the cut utility, but its delimiter must be a single character, not a regex. So if your columns are separated by varying numbers of spaces, cut cannot split them correctly.
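Here is a small comparison on hypothetical input. With cut -d ' ', every single space is a delimiter, so two spaces in a row produce an empty field; awk treats the run of spaces as one separator. cut's default delimiter is a single tab.

```shell
#!/bin/sh
# Hypothetical input: the second line has two spaces between x and y
input='a b c
x  y z'

# cut: the double space creates an empty 2nd field on line 2
printf '%s\n' "$input" | cut -d ' ' -f 2
# b
# (empty line)

# awk: runs of blanks count as one separator
printf '%s\n' "$input" | awk '{print $2}'
# b
# y

# cut with no -d splits on single tabs
printf 'a\tb\tc\n' | cut -f 2
# b
```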

remove duplicate lines

sort -u myFile

or

sort myFile | uniq

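uniq only removes duplicate lines that are next to each other, which is why the input is sorted first. To prefix each line with its number of occurrences, use sort myFile | uniq -c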
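A quick check of why the sort matters, using hypothetical input where "a" appears three times but not all in a row. The exact padding before each count from uniq -c differs between implementations.

```shell
#!/bin/sh
# Hypothetical input: "a" three times, not all adjacent
input='a
b
a
a'

# Without sort, uniq only merges neighbours, so "a" appears twice
printf '%s\n' "$input" | uniq
# a
# b
# a

# Sort first so equal lines are adjacent, then count them
printf '%s\n' "$input" | sort | uniq -c
# count (padded with leading spaces), then the line:
#   3 a
#   1 b

# Most frequent first: sort the counts numerically, in reverse
printf '%s\n' "$input" | sort | uniq -c | sort -rn
```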

sum up 2nd column

awk '{sum += $2} END {print sum}' file_name → sum the 2nd column in a file.
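A sketch of the same awk idea on hypothetical "name amount" lines. In awk an unset variable is empty, so on empty input print sum outputs a blank line; adding 0 forces a number. The last command is a variation (not from the original) that keeps one total per name in an associative array; the order of a for (k in sum) loop is unspecified.

```shell
#!/bin/sh
# Hypothetical input: name, then amount
printf 'apple 3\npear 5\napple 2\n' | awk '{sum += $2} END {print sum}'
# 10

# Empty input: "sum + 0" prints 0 instead of an empty line
printf '' | awk '{sum += $2} END {print sum + 0}'
# 0

# Per-name totals (output order is unspecified)
printf 'apple 3\npear 5\napple 2\n' | awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}'
```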

show only the first or last few lines of a huge file

head file_name → show the first 10 lines of a file.

head -n 100 file_name → show the first 100 lines of a file.

tail file_name → show the last 10 lines of a file.

tail -n 100 file_name → show the last 100 lines of a file.
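head and tail can be combined to print a range of lines from the middle of a file. In this sketch, seq 1 10 stands in for a real file; tail -n +K starts output at line K instead of counting from the end.

```shell
#!/bin/sh
# Lines 4 to 6: take the first 6 lines, then the last 3 of those
seq 1 10 | head -n 6 | tail -n 3
# 4
# 5
# 6

# Everything from line 8 to the end
seq 1 10 | tail -n +8
# 8
# 9
# 10
```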

Sort by Number, by Field

Linux: Sort Lines

Processing Multiple Files

Linux: Traverse Directory: find, xargs

