Topic Contributors
mreschke
Matthew Reschke
Site Developer
Created: Dec 4th, 2010
Updated: Sep 6th, 2011
Awk Command
Post # 227 permalink Topic #225 by mreschke on 2010-12-04 21:32:15 (viewed 758 times)

See also Sed Command and Grep Command

Awk is a very powerful line-oriented text filter and small programming language available on all Unix-style operating systems. This page contains short tutorial awk scripts and snippets.

Info

  1. Columns (fields) are automatically assigned to $1 through $n, and $0 is the entire line
  2. Awk has some built-in functions:
    1. gsub(r,s) globally replaces every match of regular expression r with s within the line ($0)
    2. index(s,t) returns the first position of string t in s (or 0 if not present)
    3. length(s) returns the number of characters in s
    4. match(s,r) tests whether s contains a substring matched by regular expression r, returning its position (or 0 if there is no match)
    5. split(s,a,fs) splits string s into array a using field separator fs
    6. substr(s,p,n) returns sub-string of s of length n starting at position p
    7. Others: sin(), cos(), exp(), sqrt(), rand()...
  3. Awk has some built-in variables:
    1. NF is the number of fields (columns) in the current line
    2. NR is the current line number, counted across all input files
    3. FNR is the current line number within the current file (useful when running awk on multiple files)
    4. FS is the field separator (defaults to whitespace, meaning runs of spaces and tabs)
    5. RS is the record separator (the row separator, defaults to newline)
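
To see the functions and variables listed above in action, here is a small sketch. The passwd-style sample line and the function names demo_functions and demo_variables are made up for illustration; only standard awk built-ins are used.

```shell
# Made-up sample line in /etc/passwd format (an assumption for this demo).
line='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

demo_functions() {
    printf '%s\n' "$line" | awk '{
        print "length: " length($0)           # number of characters in the line
        print "index:  " index($0, "x")       # position of the first "x"
        print "substr: " substr($0, 1, 5)     # 5 characters starting at position 1
        n = split($0, parts, ":")             # split the line on ":" into array parts
        print "split:  " n " fields, last is " parts[n]
        if (match($0, /[0-9]+/))              # match also sets RSTART and RLENGTH
            print "match:  " substr($0, RSTART, RLENGTH)
        gsub(/1001/, "2002")                  # replace every 1001 within $0
        print "gsub:   " $0
    }'
}

demo_variables() {
    # NR counts lines, NF counts the fields on each line.
    printf 'a b c\nd e\n' | awk '{ print NR, NF }'
}

demo_functions
demo_variables
```

demo_variables prints "1 3" and then "2 2": the first line has three fields and the second has two.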

One Liners

  1. Print columns 1 and 5 of lines beginning with /dev from the df command
    1. df | awk '/^\/dev/{ print $1 ": " $5 }'
    2. Here we filter the output first (lines beginning with /dev), then act on those lines: pattern, then action. With only a pattern, awk behaves essentially like grep, so grep pulse /etc/passwd is the same as awk '/pulse/' /etc/passwd
  2. Print entire lines in /etc/passwd where userid >= 500
    1. awk -F: '$3 >= 500' /etc/passwd
    2. the -F: means we are setting the field separator to a :
  3. Print only the names from /etc/passwd
    1. awk -F: '{ print $1 }' /etc/passwd
  4. Add line numbers to output
    1. awk '{ print NR, $0 }' /etc/passwd
    2. Here the , after NR inserts the output field separator (a space by default). If you wanted xx-row (instead of xx row) you could concatenate the strings with awk '{ print NR "-" $0 }' /etc/passwd. Note that NR-$0 without the quotes would be treated as a subtraction
    3. Remember $0 is the entire line. To print just the first column (the usernames) with line numbers, use awk -F: '{ print NR, $1 }' /etc/passwd
  5. Print every 10th line
    1. awk 'NR%10 == 0' /etc/passwd (or use NR%2 == 0 for every other line)
  6. Reverse the order of the input lines (last line first, like the tac command)
    1. awk '{ s = $0 "\n" s } END { printf "%s", s }' /etc/passwd
  7. Center the output at column 40
    1. awk '{ printf "%" int(40+length($0)/2) "s\n", $0 }' /etc/passwd
  8. Print non blank lines
    1. awk 'NF' /etc/passwd
  9. Print lines longer than 80 chars
    1. awk 'length($0) > 80' /etc/passwd
  10. Replace tty with TERMINAL on the first line of output only; all other lines are still printed, but unaltered
    1. who | awk 'NR==1 { gsub("tty", "TERMINAL"); print } NR!=1'
  11. Count words in a file
    1. awk '{ total = total + NF }; END {print total}' /etc/passwd
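
The one-liners above read system files, so their output differs from machine to machine. As a rough sketch, the same pattern/action ideas can be tried on a small made-up sample (the three records in the sample variable are invented for illustration):

```shell
# Made-up passwd-style records so the results do not depend on the local machine.
sample='root:x:0:0:root:/root:/bin/bash
pulse:x:115:120:PulseAudio:/var/run/pulse:/bin/false
alice:x:1001:1001:Alice:/home/alice:/bin/bash'

# Pattern only: prints matching lines, like grep.
printf '%s\n' "$sample" | awk '/pulse/'

# Pattern on a field plus an action: names of accounts with uid >= 500.
printf '%s\n' "$sample" | awk -F: '$3 >= 500 { print $1 }'

# Adjacent strings are concatenated, so this prints 1-root, 2-pulse, 3-alice.
printf '%s\n' "$sample" | awk -F: '{ print NR "-" $1 }'

# Count whitespace-separated words across all lines.
printf 'one two\nthree\n' | awk '{ total = total + NF } END { print total }'
```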

Scripts

Show the highest userid number from /etc/passwd

Items in the BEGIN block run before any lines are processed, so we set the FS (field separator) variable to a : and initialize a maxuid variable to 0. Then, for every line whose 3rd column (the uid in /etc/passwd) is greater than maxuid, we store it as the new maxuid and set the maxname variable to that line's username, which is column 1. Items in the END block run after all lines are processed.

maxuid.awk
#!/usr/bin/awk -f
BEGIN { FS = ":"; maxuid = 0 }
$3 > maxuid { maxuid = $3; maxname = $1 }
END { print maxname ": " maxuid }
 

Since we added the awk shebang to the file, we can make it executable with chmod +x maxuid.awk and run it with ./maxuid.awk /etc/passwd. Without the shebang we could run it through awk with awk -f maxuid.awk /etc/passwd
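
To check the BEGIN / pattern / END logic without depending on the local /etc/passwd, here is a minimal sketch that feeds the same program three made-up records through a here-document. The function name maxuid_demo and the sample users are assumptions for this demo only.

```shell
maxuid_demo() {
    # Same program as maxuid.awk, given inline instead of with -f.
    awk 'BEGIN { FS = ":"; maxuid = 0 }
         $3 > maxuid { maxuid = $3; maxname = $1 }
         END { print maxname ": " maxuid }' <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001:Alice:/home/alice:/bin/bash
bob:x:1002:1002:Bob:/home/bob:/bin/bash
EOF
}

maxuid_demo    # prints "bob: 1002"
```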

Treat multi lines as one line

Awk usually works on one line at a time, but what if we have data like this:

/tmp/test.txt
Michael
Jackson
555-5551

Kevin
Jones
555-5552

We want to treat 'Michael Jackson 555-5551' as one record, and so on. To do this we alter the FS field separator and the RS record separator: FS should be a newline (\n) and RS should be the empty string (""), which makes awk split records on blank lines.

To display

Code Snippet
Michael 555-5551
Kevin 555-5552

we use awk 'BEGIN { RS = ""; FS = "\n" } { print $1,$3 }' /tmp/test.txt
or, to display only the Kevins, use awk 'BEGIN { RS = ""; FS = "\n" } $1 == "Kevin" { print $1,$3 }' /tmp/test.txt
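
As a quick sketch of what RS = "" and FS = "\n" do to NR and NF, the same sample data can be piped in from a shell variable instead of /tmp/test.txt (the variable name records is just for this demo):

```shell
# Same two records as /tmp/test.txt, separated by a blank line.
records='Michael
Jackson
555-5551

Kevin
Jones
555-5552'

# Each blank-line-separated block is one record; each line within it is one field.
printf '%s\n' "$records" |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1, $2, "(" NF " fields)" }'

# Filter on the first field (the first name) and print name and phone number.
printf '%s\n' "$records" |
    awk 'BEGIN { RS = ""; FS = "\n" } $1 == "Kevin" { print $1, $3 }'
```

The first command prints "1: Michael Jackson (3 fields)" and "2: Kevin Jones (3 fields)", showing that NR now counts blocks rather than physical lines.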

Get each process time in seconds

ps -ef shows each process together with a TIME column in hh:mm:ss format, which is the CPU time the process has accumulated. Let's print just the second column (the PID) and that time, converted into seconds.

seconds.awk
# Skip the header line, then split the TIME column (field 7) on ":"
NR > 1 { split($7, hms, ":")
    secs = (hms[1] * 3600) + (hms[2] * 60) + hms[3]
    printf "%6d %5d\n", $2, secs
}
 

And run it with ps -ef | awk -f seconds.awk. Here hms is an array variable and secs is a numeric variable. The printf function prints the values in neatly aligned columns.
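
The conversion itself can be tried on a fixed value, which is easier to verify than live ps output. This is only a sketch; the helper name to_seconds is hypothetical and it assumes the input is always in hh:mm:ss form.

```shell
to_seconds() {
    # $1 is a time string such as 01:02:03 (hh:mm:ss assumed).
    printf '%s\n' "$1" | awk '{
        split($1, hms, ":")                               # hms[1]=hours, hms[2]=minutes, hms[3]=seconds
        secs = (hms[1] * 3600) + (hms[2] * 60) + hms[3]
        print secs
    }'
}

to_seconds 01:02:03    # prints 3723
```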

Get number of processes and total memory usage per user

Notice that count is not a built-in function; it's an array indexed by username.

totalmem.awk
$1 != "USER" { count[$1]++; tot[$1] += $6 }
END {
    for (user in tot)
        printf "%8s: %4d %8d\n", user, count[user], tot[user]
}
 

And run it with ps aux | awk -f totalmem.awk. Column 6 of ps aux is RSS, the resident memory size in kilobytes, so the totals are in kilobytes.
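
To see how the two arrays build up, here is a rough sketch that runs the same logic over a few made-up ps aux lines. The function name per_user_demo and the sample processes are invented. The order of a for (user in tot) loop is not defined by awk, so the output is piped through sort to make it predictable.

```shell
per_user_demo() {
    awk '$1 != "USER" { count[$1]++; tot[$1] += $6 }      # skip the header, sum RSS per user
         END { for (user in tot) printf "%s %d %d\n", user, count[user], tot[user] }' <<'EOF' | sort
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 400 ? Ss 10:00 0:01 init
alice 200 0.0 0.5 5000 1500 pts/0 Ss 10:05 0:00 bash
alice 201 0.1 1.0 9000 2500 pts/0 S+ 10:06 0:02 vim
EOF
}

per_user_demo
```

With this sample, alice ends up with 2 processes and a total of 4000, and root with 1 process and 400.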

Resources

  1. Most came from Linux Format Magazine LXF 138 page 62.