Scripting Clinic: Your Pre-Fab Text Processing Toolkit

Scripting he-men are fond of 'writing a few lines of Perl' whenever a file needs munging. Too bad they're ignoring the overflowing toolbox of Unix and Linux text-processing utilities.

 By Carla Schroder

Because Linux/Unix relies on text files for practically everything, a veritable thicket of text-processing utilities has grown up to satisfy every text-processing whim. Many of them are specialized and do only a few things. This is a real lifesaver when you just need to do one little thing and don't want to wade through reams of Perl or Bash documentation to figure out how. It is especially handy when you're scripting; it's often easier to keep a list of useful text-processing utilities than to explore all the meeelyuns of possibilities of more complex tools.

Convert tabs to spaces, or spaces to tabs. Some programs are finicky about whitespace; either you must use only tabs, or only spaces, or simply pick either one, and don't mix them. To convert tabs to spaces:

$ expand filename

And that's it. Each tab in the file is replaced with enough spaces to reach the next tab stop, and by default the tab stops are 8 columns apart. You don't have to settle for 8; you may select any width you like:

$ expand -t 4 filename
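
Here is a small sketch of what the tab-stop width actually does. The sample line is made up for illustration; the point is that expand pads each tab out to the next stop, so the number of spaces varies with what comes before the tab.

```shell
# Hypothetical sample line: three fields separated by tabs.
sample=$(printf 'a\tbc\tdef')

# Default: tab stops every 8 columns.
default_out=$(printf '%s\n' "$sample" | expand)

# Tab stops every 4 columns.
four_out=$(printf '%s\n' "$sample" | expand -t 4)

# Brackets make the padding easy to see.
printf '[%s]\n[%s]\n' "$default_out" "$four_out"
```

With the default stops, "a" is followed by seven spaces and "bc" by six; with -t 4 the padding shrinks to three and two.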

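That spits the output to stdout, which may not be what you want. This sends the output to a new file. Give the new file a different name from the input, because redirecting with > onto the input file empties it before expand gets to read it:

$ expand -t 4 filename > newfilename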
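
If what you really want is to replace the original file, one common approach is to write to a temporary file and rename it afterwards. This is only a sketch; the directory and the file name notes.txt are invented for the demonstration.

```shell
# Scratch directory and sample file (hypothetical names).
workdir=$(mktemp -d)
printf '\tindented\n' > "$workdir/notes.txt"

# Write the converted text to a temporary file, then move it over the
# original only if expand succeeded.
expand -t 4 "$workdir/notes.txt" > "$workdir/notes.txt.tmp" \
    && mv "$workdir/notes.txt.tmp" "$workdir/notes.txt"

result=$(cat "$workdir/notes.txt")
printf '[%s]\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```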

You don't have to convert all the tabs in the file, you can convert only the leading tabs on each line:

$ expand -i filename
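
To see the difference, here is a quick sketch using a made-up line with one leading tab and one tab in the middle. It assumes GNU expand, since -i is not available in every implementation.

```shell
# Hypothetical sample: a leading tab, then a tab between two words.
sample=$(printf '\tkey\tvalue')

# -i converts only the tabs that come before the first non-blank character.
initial_only=$(printf '%s\n' "$sample" | expand -i -t 4)

# Without -i, every tab is converted.
all_tabs=$(printf '%s\n' "$sample" | expand -t 4)

# cat -v -t shows any remaining tabs as ^I.
printf '%s\n%s\n' "$initial_only" "$all_tabs" | cat -v -t
```

The first line keeps its ^I between "key" and "value"; the second has spaces throughout.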

unexpand does the reverse, converting leading spaces to tabs:

$ unexpand filename

Or, convert all strings of two or more spaces to tabs:

$ unexpand -a filename
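
A short sketch of the two modes side by side, assuming GNU unexpand and the default 8-column tab stops. The sample line is invented: eight leading spaces, then a run of five spaces that ends exactly on the next tab stop.

```shell
# Hypothetical sample: 8 spaces, "one", 5 spaces, "two".
line=$(printf '%8sone%5stwo' '' '')

# Default: only the leading run of spaces becomes a tab.
leading_only=$(printf '%s\n' "$line" | unexpand)

# -a: interior runs of spaces that reach a tab stop are converted too.
all_runs=$(printf '%s\n' "$line" | unexpand -a)

# Show tabs as ^I so the difference is visible.
printf '%s\n%s\n' "$leading_only" "$all_runs" | cat -v -t
```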

If your text editor does not have a way of displaying tabs, cat can do it:

$ cat -v -t testfile
^IThis section shows how to put various ^Ipieces of information into the Bash prompt. ^I There are an infin^Iite^I number of things that ^I could be put in your prompt.

Each ^I marks a tab, so you can see for yourself how expand and unexpand work.
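
For a quick before-and-after check without creating a file, something like this works. The two-field sample is made up.

```shell
# One tab between two letters, viewed with cat -v -t.
before=$(printf 'a\tb\n' | cat -v -t)

# The same line after expand: the tab is gone, so no ^I appears.
after=$(printf 'a\tb\n' | expand -t 4 | cat -v -t)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```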

tr, "translate", is a deceptive little utility. It does a whole lot of things. A handy use for it is converting text to lowercase, like Windows filenames. You can test this at the command prompt:

$ tr "[:upper:]" "[:lower:]"

Hit ctrl + c to stop. This can easily be applied to files. This example uses character classes to convert the contents of trtest to lowercase, and output it to trtestout:

$ tr "[:upper:]" "[:lower:]" < trtest > trtestout

< means read from this file, > means send the output to a new file.
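
The same trick handles the filenames themselves. The loop below is a rough sketch, not a polished renaming tool: the directory and the file names are invented, and it assumes a case-sensitive filesystem.

```shell
# Scratch directory with two shouty, Windows-style names (hypothetical).
workdir=$(mktemp -d)
touch "$workdir/README.TXT" "$workdir/Photo.JPG"

for path in "$workdir"/*; do
    name=$(basename "$path")
    lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    # Rename only when the name changes and nothing would be overwritten.
    if [ "$name" != "$lower" ] && [ ! -e "$workdir/$lower" ]; then
        mv "$path" "$workdir/$lower"
    fi
done

listing=$(ls "$workdir")
printf '%s\n' "$listing"

# Clean up the scratch directory.
rm -rf "$workdir"
```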

Sometimes you want to change a specific character. Suppose you've written an article, and after spending hours on it you realize you inadvertently slipped into l33t-sp34k. Well this will never do. Use tr to change the numbers to letters:

$ tr "340" "eao"
I 4m a l33t hax0r. Ph34r m3!
I am a leet haxor. Phear me!

Pretty simple -- first list the characters you want changed, all grouped together, then the second group is what you want the new characters to be. Just make sure they are in order.
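
To see why the order matters, this sketch runs the same sentence through tr twice, once with the replacement letters lined up correctly and once with the first two swapped.

```shell
text='I 4m a l33t hax0r. Ph34r m3!'

# 3 -> e, 4 -> a, 0 -> o
fixed=$(printf '%s\n' "$text" | tr '340' 'eao')

# Swapping the first two replacements maps 3 -> a and 4 -> e instead.
scrambled=$(printf '%s\n' "$text" | tr '340' 'aeo')

printf '%s\n%s\n' "$fixed" "$scrambled"
```

The second line comes out as "I em a laat haxor. Phaer ma!", which is arguably worse than the l33t-sp34k.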

tr can delete characters, using the -d flag:

$ tr -d "leet"
I am a leet haxor e e lt et el
I am a  haxor

You can see the limitations of tr in this example -- it looks for characters, not words.
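
A small sketch makes the point. Because tr -d works on a set of characters, "leet" and "let" delete exactly the same things. For removing a whole word, a different tool such as sed is one option; it is not part of the tr example above and is shown here only for contrast.

```shell
text='I am a leet haxor'

# Both calls delete every l, e, and t; the duplicate e changes nothing.
tr_out=$(printf '%s\n' "$text" | tr -d 'leet')
same_out=$(printf '%s\n' "$text" | tr -d 'let')

# sed (a swapped-in tool) removes the word and its trailing space as a unit.
sed_out=$(printf '%s\n' "$text" | sed 's/leet //')

printf '[%s]\n[%s]\n[%s]\n' "$tr_out" "$same_out" "$sed_out"
```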

Continued on page 2: awk, sort, nl

This article was originally published on Aug 11, 2004