Scripting Clinic: Your Pre-Fab Text Processing Toolkit
awk
awk is way fun. awk lets you pluck things out of lines of text according to their position. This example sorts out the human users from Linux system users, assuming you have stuck to a sensible user numbering scheme:
$ awk -F: '$3 > 999 { print $0 }' /etc/passwd
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
carla:x:1000:1000:carla schroder,,,:/home/carla:/bin/bash
dawns:x:1002:1002:Dawn Marie Schroder,,,,foo:/home/dawns:
nikitah:x:1003:1003:Nikita Horse,,123-4567,equine:/home/nikitah:
rubst:x:1004:1004:Rubs The Cat,101,,234-5678,,test test:/home/rubst:
How does awk know what the delimiter is? You tell it with the -F flag, which selects the colon as the delimiter in this example. $3 means "the third field," which in /etc/passwd is the UID. Notice that nobody, with UID 65534, sneaks in too.
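To see exactly what awk gets out of a line, here is a minimal sketch that prints every field it finds after splitting on colons. The helper name show_fields is made up for this example; the sample line is the carla entry from the output above.

```shell
#!/bin/sh
# Sketch: print each field awk sees when -F: splits a passwd-style line.
# show_fields is a hypothetical helper name, not a standard command.
show_fields() {
    # NF is awk's built-in count of fields on the current line;
    # $i is the value of field number i.
    printf '%s\n' "$1" | awk -F: '{
        for (i = 1; i <= NF; i++)
            printf "$%d = %s\n", i, $i
    }'
}

show_fields 'carla:x:1000:1000:carla schroder,,,:/home/carla:/bin/bash'
```

The third line of output reads $3 = 1000, the UID that the comparison above tests.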
On a Debian system, human users start at UID 1000. On Red Hat and SuSE, they start at 500. You can select subsets easily enough:
$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $0 }' /etc/passwd
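Because the starting UID varies between distributions, it can be handy to pass the bounds in rather than edit the awk program each time. Here is a minimal sketch using awk's -v option to set variables before the program runs; users_in_range is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print passwd entries whose UID (field 3) lies between two bounds.
# users_in_range is a hypothetical helper name.
# Usage: users_in_range LOW HIGH FILE
users_in_range() {
    # -v sets lo and hi before the program runs. The BEGIN block forces
    # them to numbers so the UID comparison is numeric, not string-based.
    awk -F: -v lo="$1" -v hi="$2" '
        BEGIN { lo += 0; hi += 0 }
        ($3 >= lo) && ($3 <= hi) { print $0 }
    ' "$3"
}

users_in_range 1000 1050 /etc/passwd
```

On a system that starts human users at 500, you would call users_in_range 500 1050 /etc/passwd instead.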
OK, but suppose all you really want are the logins. This is an awk specialty; change print $0, which means "print the whole line," to print $1:
$ awk -F: '$3 > 999 { print $1 }' /etc/passwd
nobody
carla
dawns
nikitah
rubst
Well, that's all very nice and everything, but they're listed in the order they appear in /etc/passwd. What if you want them in alphabetical order? Easy. Throw sort into the fray:
$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $1 }' /etc/passwd | sort
Now they're listed alphabetically. You can do all sorts of things with this, like copying and pasting lists of users into groups, checking for duplicate logins, and generating an index of users.
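Checking for duplicate logins is a one-liner in awk too. Here is a minimal sketch; dup_logins and the file names in the example comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print each login name that occurs more than once across one
# or more passwd-format files, e.g. copies taken from two machines.
# dup_logins is a hypothetical helper name.
dup_logins() {
    # count[$1]++ yields the count *before* incrementing, so the
    # test is true exactly once per login: on its second appearance.
    awk -F: 'count[$1]++ == 1 { print $1 }' "$@" | sort
}

# Example (hypothetical file names): dup_logins passwd.host1 passwd.host2
```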
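sort has its own option for sending the output to a file, -o. A plain redirect also works, except when the output file is one of the input files: sort list1 > list1 leaves you with an empty list1, because the shell truncates it before sort reads it. -o handles that case safely. Use it to name the output file: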
$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $1 }' /etc/passwd | sort -o list1
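Here is a minimal sketch of the in-place case, where a redirect really would fail. sort_in_place is a hypothetical helper name, and the list contents are made up.

```shell
#!/bin/sh
# Sketch: sort a file in place. POSIX allows the -o file to be one of
# sort's input files, so this is safe where "sort f > f" is not.
# sort_in_place is a hypothetical helper name.
sort_in_place() {
    sort -o "$1" "$1"
}

workdir=$(mktemp -d)
printf '%s\n' rubst carla nikitah dawns > "$workdir/list1"
sort_in_place "$workdir/list1"
cat "$workdir/list1"    # carla, dawns, nikitah, rubst (one per line)
rm -r "$workdir"
```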
Now suppose you have the ever-so-fun chore of merging lists of logins from two different systems. sort can do this too. Take your two files containing the logins, which must already be sorted, and do this:
$ sort -mu -o merged_list list1 list2
-u removes duplicates: when it finds identical lines, it prints only one of them. -m merges files that are already sorted, without sorting them again.
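To try the merge without real login files, here is a minimal sketch with two small made-up lists in a scratch directory. The options go before the file names, which is the order POSIX specifies. Note that carla is in both lists but comes out only once.

```shell
#!/bin/sh
# Sketch: merge two already-sorted login lists and drop duplicates.
# The list contents are made up; both lists are sorted already,
# because -m only merges, it doesn't sort.
workdir=$(mktemp -d)
printf '%s\n' carla dawns nikitah rubst > "$workdir/list1"
printf '%s\n' carla foober goober helen > "$workdir/list2"

sort -mu -o "$workdir/merged_list" "$workdir/list1" "$workdir/list2"
cat "$workdir/merged_list"
# carla dawns foober goober helen nikitah rubst, one per line
# Run rm -r "$workdir" when you're done looking.
```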
Suppose you want to add line numbers? Why, all you need is nl.
nl
This shows line numbers on the screen, but does not change the file:
$ nl merged_list
     1	carla
     2	dawns
     3	foober
     4	goober
     5	helen
     6	nanana
     7	nikitah
     8	nobody
     9	rubst
Or you can create a file containing the line numbers:
$ nl merged_list > merged_list_numbered
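One default worth knowing: nl numbers only non-empty lines unless you ask otherwise. Here is a minimal sketch with a made-up three-line file, comparing the default with -ba (number all lines).

```shell
#!/bin/sh
# Sketch: nl's default body style (-bt) skips blank lines;
# -ba numbers every line, blank ones included.
workdir=$(mktemp -d)
printf '%s\n' carla '' dawns > "$workdir/sample"

nl "$workdir/sample"        # carla is 1, blank line unnumbered, dawns is 2
nl -ba "$workdir/sample"    # carla is 1, blank line is 2, dawns is 3
# Run rm -r "$workdir" when you're done looking.
```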
nl can do some other interesting things, like numbering only the lines that match a pattern (a basic regular expression, so a plain string works fine). For example, to number only the lines in an article that contain your name:
$ nl -bpCarla article.text
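Because the pattern after -bp is a basic regular expression, you can be pickier than a plain string. Here is a minimal sketch, with a made-up article.text in a scratch directory, that numbers only lines beginning with Carla.

```shell
#!/bin/sh
# Sketch: -bp takes a basic regular expression, so ^ anchors the match
# to the start of the line. The article text is made up.
workdir=$(mktemp -d)
printf '%s\n' 'Carla wrote this.' 'Nobody else did.' 'Thanks, Carla' > "$workdir/article.text"

# Numbers only the first line; "Thanks, Carla" doesn't start with Carla.
nl -bp'^Carla' "$workdir/article.text"
# Run rm -r "$workdir" when you're done looking.
```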
You don't have to settle for a boring old tab separating the line numbers from the lines. Add custom text with the -s flag:
$ nl -bpCarla -s "***hey, here I am*** " article.text
You may need leading zeroes, or numbers of a specific length. Suppose you want to use 4-digit IDs on your login list, and you want to start numbering from 0500, with three spaces between the numbers and the logins:
$ nl -nrz -w4 -v0500 -s"   " list1
0500   carla
0501   dawns
0502   nikitah
0503   nobody
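-nrz is one of three number formats nl offers. Here is a minimal sketch comparing all three on two sample logins, reusing the width, starting number, and three-space separator from the command above.

```shell
#!/bin/sh
# Sketch: compare nl's three -n number formats on the same input.
#   ln = left-justified, no leading zeroes
#   rn = right-justified, no leading zeroes (the default)
#   rz = right-justified, padded with leading zeroes
for fmt in ln rn rz; do
    echo "format $fmt:"
    printf '%s\n' carla dawns | nl -n "$fmt" -w4 -v500 -s'   '
done
```

rz gives 0500, rn gives a space-padded 500, and ln puts 500 at the left of the four-character field.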
That's a mere scratch on the surface of the Wide Wide World Of Super-Specialized Text Utilities. Next month on Scripting Clinic we'll look at putting some of these together for bringing sanity to reading log files.
Resources
Check out the man pages for awk, tr, sort, expand/unexpand, and nl. awk is a rather complicated beast, being a full-grown programming language; an excellent reference book is sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins. Of course it is an O'Reilly book.