Scripting Clinic: Your Pre-Fab Text Processing Toolkit

By Carla Schroder | Posted Aug 11, 2004


awk
awk is way fun. awk lets you pluck fields out of lines of text according to their position. This example separates the human users from the Linux system users, assuming you have stuck to a sensible user numbering scheme:

$ awk -F: '$3 > 999 { print $0 }' /etc/passwd
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
carla:x:1000:1000:carla schroder,,,:/home/carla:/bin/bash
dawns:x:1002:1002:Dawn Marie Schroder,,,,foo:/home/dawns:
nikitah:x:1003:1003:Nikita Horse,,123-4567,equine:/home/nikitah:
rubst:x:1004:1004:Rubs The Cat,101,,234-5678,,test test:/home/rubst:

How does awk know what the delimiter is? You tell it with the -F flag; -F: makes the colon the delimiter in this example. $3 means "the third field," which in /etc/passwd is the numeric user ID (UID).
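To see how -F and the field variables fit together, here is a small sketch that feeds a single passwd-style line (copied from the listing above) to awk instead of reading the real /etc/passwd. $0 is the whole line, $1, $2 and so on are the individual fields, and NF is awk's built-in count of fields on the current line:

```shell
# One line from the /etc/passwd listing above.
line='carla:x:1000:1000:carla schroder,,,:/home/carla:/bin/bash'

# -F: sets the field separator to a colon. $1 is the login, $3 the UID,
# $6 the home directory; NF is the number of fields awk found.
# print joins comma-separated items with a space, so this prints:
# carla 1000 /home/carla 7
printf '%s\n' "$line" | awk -F: '{ print $1, $3, $6, NF }'
```

Note that the commas inside the GECOS field ("carla schroder,,,") do not count as separators; only colons do.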

On a Debian system, human users start at UID 1000. On Red Hat and SuSE, they start at 500. You can select subsets easily enough:

$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $0 }' /etc/passwd
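Since the first human UID differs between distributions, you may prefer to pass the bounds in with awk's -v option rather than editing the awk program each time. A minimal sketch, run against a few hypothetical passwd lines instead of your real /etc/passwd; the names first_uid, min and max are illustrative choices, not anything awk requires:

```shell
# Hypothetical sample data: two system accounts and two human users.
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
carla:x:1000:1000:carla schroder,,,:/home/carla:/bin/bash
dawns:x:1002:1002:Dawn Marie Schroder,,,,foo:/home/dawns:'

# Use 1000 on Debian, 500 on Red Hat or SuSE.
first_uid=1000

# -v assigns awk variables before the program runs; numeric-looking values
# are compared as numbers, so root and daemon are filtered out.
printf '%s\n' "$passwd_sample" |
  awk -F: -v min="$first_uid" -v max=1050 '($3 >= min) && ($3 <= max) { print $0 }'
```

On a real system, replace the printf with /etc/passwd as awk's input file, as in the command above.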

OK, but suppose all you really want are the logins. This is an awk specialty; change print $0, which means "print the whole line," to print $1:

$ awk -F: '$3 > 999 { print $1 }' /etc/passwd
nobody
carla
dawns
nikitah
rubst

Well, that's all very nice and everything, but they come out in whatever order they appear in /etc/passwd. What if you want them in alphabetical order? Easy. Throw sort into the fray:

$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $1 }' /etc/passwd | sort

And they will be listed alphabetically. You can do all sorts of things with this, like copying and pasting users into groups en masse, checking for duplicate logins, and generating an index of users.
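Checking for duplicate logins, one of the uses mentioned above, takes one more standard tool, uniq. A sketch, run on made-up passwd lines in which one login was created twice; against a real system you would give awk /etc/passwd instead:

```shell
# Hypothetical passwd lines in which the login "carla" appears twice.
passwd_sample='carla:x:1000:1000::/home/carla:/bin/bash
dawns:x:1002:1002::/home/dawns:
carla:x:1005:1005::/home/carla2:/bin/bash'

# uniq only compares adjacent lines, so sort first; uniq -d then prints
# one copy of each login that occurs more than once. Prints: carla
printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 }' | sort | uniq -d
```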

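sort has its own option for writing its output to a file: -o. A plain redirect (>) works too, as long as the output file is not also one of sort's input files; in that case the shell empties the file before sort ever reads it. sort -o has no such problem, so get in the habit of using it to name the output file: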

$ awk -F: '($3 >= 1000) && ($3 <= 1050) { print $1 }' /etc/passwd | sort -o list1.txt
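sort -o is also the safe way to sort a file in place, because POSIX allows the -o file to be one of sort's own input files. A redirect such as sort list1 > list1 would not work, because the shell truncates list1 before sort reads it. A sketch of both cases, using a scratch directory from mktemp -d so no real files are touched; the file names are illustrative:

```shell
# Scratch directory so nothing real gets overwritten.
tmp=$(mktemp -d)
printf '%s\n' rubst carla nikitah dawns > "$tmp/list1"

# Safe: the -o file may be one of sort's inputs, so this sorts list1 in place.
sort -o "$tmp/list1" "$tmp/list1"
cat "$tmp/list1"    # carla, dawns, nikitah, rubst (one per line)

# Unsafe, left commented out: the shell empties list1 before sort
# ever reads it, leaving you with an empty file.
# sort "$tmp/list1" > "$tmp/list1"
```

Remove the scratch directory with rm -r "$tmp" when you are done.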

Now suppose you have the ever-so-fun chore of merging lists of logins from two different systems. sort can do this too. Take your two files containing the logins, which are already sorted, and do this:

$ sort -m -u -o merged_list list1 list2

-m merges files that are already sorted, without sorting them all over again. -u (unique) checks for duplicates; if it finds any, it prints only one of them.
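Here is the merge run end to end on two small, made-up login lists, in a scratch directory so nothing real is overwritten:

```shell
tmp=$(mktemp -d)

# Two login lists that are already sorted (hypothetical sample data);
# carla and nikitah appear in both.
printf '%s\n' carla dawns nikitah > "$tmp/list1"
printf '%s\n' carla foober nikitah rubst > "$tmp/list2"

# -m merges the pre-sorted inputs without re-sorting them,
# -u keeps only one copy of each duplicate line.
sort -m -u -o "$tmp/merged_list" "$tmp/list1" "$tmp/list2"
cat "$tmp/merged_list"    # carla, dawns, foober, nikitah, rubst
```

If either input is not sorted to begin with, drop -m and let sort do a full sort instead; -m assumes its inputs are already in order.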

Suppose you want to add line numbers. Why, all you need is nl.

nl
nl prints the file with line numbers on the screen, but does not change the file itself:

$ nl merged_list
     1  carla
     2  dawns
     3  foober
     4  goober
     5  helen
     6  nanana
     7  nikitah
     8  nobody
     9  rubst

Or you can create a file containing the line numbers:

$ nl merged_list > merged_list_numbered
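One nl default is worth knowing before you number a list: out of the box, nl numbers only non-empty lines (the default body style, -bt), so blank lines are printed without a number. A small sketch using a hypothetical scratch copy of merged_list with a blank line in it; -ba numbers every line instead:

```shell
tmp=$(mktemp -d)
# Hypothetical sample list with a blank line in the middle.
printf '%s\n' carla dawns '' foober > "$tmp/merged_list"

# Default (-bt): the blank line is printed but not numbered,
# so foober ends up as line 3.
nl "$tmp/merged_list"

# -ba: every line gets a number, blank or not, so foober is line 4.
nl -ba "$tmp/merged_list" > "$tmp/merged_list_numbered"
cat "$tmp/merged_list_numbered"
```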

nl can do some other interesting things, like number only the lines that contain a specific string. For example, to number only the lines in an article that contain your name (the other lines are still printed, just without numbers):

$ nl -bpCarla article.text
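To see what -bp does to the lines that don't match, here is a sketch on a tiny made-up article.text in a scratch directory. The string after -bp is treated as a basic regular expression, so plain text like Carla simply matches itself:

```shell
tmp=$(mktemp -d)
# Hypothetical four-line "article"; two of the lines mention Carla.
printf '%s\n' 'Scripting Clinic' 'By Carla Schroder' 'awk is way fun.' 'Thanks, Carla' > "$tmp/article.text"

# -bpCarla numbers only lines matching the regular expression Carla.
# All four lines are printed; only the two matching ones get the
# numbers 1 and 2, the others are indented with blanks.
nl -bpCarla "$tmp/article.text"
```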

You don't have to settle for a boring old tab stop delimiting the line numbers from the lines. Add custom text with the -s flag:

$ nl -bpCarla -s "***hey, here I am*** " article.text
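A quick way to check what -s produces is to pipe a couple of sample lines straight into nl; here the separator is a colon and a space instead of the default tab:

```shell
# Sample input piped in; -s replaces the default tab after each number
# with the given string. Prints "     1: carla" and "     2: dawns".
printf '%s\n' carla dawns | nl -s ': '
```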

You may need leading zeroes, or numbers of a specific length. Suppose you want to use 4-digit IDs on your login list, and you want to start numbering from 0500, with three spaces between the numbers and the logins:

$ nl -nrz -w4 -v0500 -s"   " list1
0500   carla
0501   dawns
0502   nikitah
0503   nobody

That only scratches the surface of the Wide Wide World Of Super-Specialized Text Utilities. Next month on Scripting Clinic we'll look at putting some of these together to bring some sanity to reading log files.

Resources
Check out the man pages for awk, tr, sort, expand/unexpand, and nl. awk is a rather complicated beast, being a full-grown programming language; an excellent reference book is sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins. Of course it is an O'Reilly book.
