Scripting Clinic: Your Pre-Fab Text Processing Toolkit
Scripting he-men are fond of 'writing a few lines of Perl' whenever a file needs munging. Too bad they're ignoring the overflowing toolbox of Unix and Linux text-processing utilities.
Convert tabs to spaces, or spaces to tabs. Some programs are finicky about whitespace: some insist on tabs only, some on spaces only, and others accept either as long as you don't mix them. To convert tabs to spaces:
$ expand filename
And that's it. Every tab in the file is replaced with enough spaces to reach the next tab stop, and the stops fall every 8 columns by default. You don't have to settle for 8; you may select any width you like:
$ expand -t 4 filename
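Strictly speaking, expand pads each tab out to the next tab stop, so a tab in the middle of a line can become fewer spaces than the stop width. A quick sketch, using made-up sample text fed in with printf rather than a real file, shows the difference:

```shell
# Sample text is invented for illustration; printf turns \t into a real tab.
# Default stops fall every 8 columns: "ab" fills 2, so the tab becomes 6 spaces.
default_stops=$(printf 'ab\tc\n' | expand)

# With -t 4 the next stop after column 2 is column 4, so the tab becomes 2 spaces.
four_stops=$(printf 'ab\tc\n' | expand -t 4)

# Brackets make the padding easy to see.
printf '[%s]\n[%s]\n' "$default_stops" "$four_stops"
```

The brackets are only there so the padding is visible on screen.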
That spits the output to stdout, which may not be what you want. This sends the output to a new file:
$ expand -t 4 filename > newfilename
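If you want the converted text to end up under the original name, don't redirect onto the file you are reading from; go through a temporary file instead. A minimal sketch, assuming mktemp is available on your system. The function name expand_in_place is hypothetical, not something expand provides:

```shell
# expand_in_place WIDTH FILE  (hypothetical helper, not part of expand itself)
expand_in_place() {
    width=$1
    file=$2
    tmp=$(mktemp) || return 1
    if expand -t "$width" "$file" > "$tmp"; then
        # Copy the contents back (rather than mv) so the original file
        # keeps its permissions; remove the temp file only on success.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

You would call it as expand_in_place 4 filename. Copying back with cat instead of mv is a design choice here: mktemp creates the temporary file with restrictive permissions, and mv would carry those over to the result.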
You don't have to convert every tab in the file; you can convert only the leading tabs on each line:
$ expand -i filename
unexpand does the reverse, converting leading spaces to tabs:
$ unexpand filename
Or, convert every run of two or more spaces that ends at a tab stop, not just the leading ones:
$ unexpand -a filename
If your text editor does not have a way of displaying tabs, cat can do it:
$ cat -v -t testfile
^IThis section shows how to put various ^Ipieces of information into the Bash prompt. ^I There are an infin^Iite^I number of things that ^I could be put in your prompt.
Each ^I marks a tab, so you can see for yourself how expand and unexpand work.
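To try this without a test file of your own, you can pipe sample text through unexpand and into cat. A small sketch: the sample string is invented for illustration, and the results assume the default tab stops of every 8 columns:

```shell
# Made-up sample: 8 leading spaces, "ab", 6 more spaces, "cd".
sample='        ab      cd'

# Default unexpand touches only the leading run of spaces.
leading=$(printf '%s\n' "$sample" | unexpand | cat -v -t)

# -a also converts the interior run, because it ends exactly at a tab stop.
all=$(printf '%s\n' "$sample" | unexpand -a | cat -v -t)

# expand turns the tabs back into spaces, so no ^I remains.
restored=$(printf '%s\n' "$sample" | unexpand -a | expand | cat -v -t)

printf '%s\n%s\n%s\n' "$leading" "$all" "$restored"
```

The first line printed should show a single ^I at the start, the second should show two, and the third should match the original sample.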
tr, "translate", is a deceptive little utility. It does a whole lot of things. A handy use for it is converting text, such as Windows filenames, to lowercase. You can test this at the command prompt:
$ tr "[:upper:]" "[:lower:]"
MyVirus.pif.xls
myvirus.pif.xls
MyTrojanInstaller.exe.jpg
mytrojaninstaller.exe.jpg
Hit Ctrl+C to stop (Ctrl+D, which signals end of input, also works). This can easily be applied to files. This example uses character classes to convert the contents of trtest to lowercase and write the result to trtestout:
$ tr "[:upper:]" "[:lower:]" < trtest > trtestout
< means read from this file; > means send the output to a new file, overwriting it if it already exists.
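Since lowercasing filenames was the motivating example, here is a rough sketch of applying tr to actual file names rather than file contents. The function name lowercase_names is hypothetical, and the loop deliberately skips any rename that would overwrite an existing file:

```shell
# lowercase_names DIR: rename every entry in DIR to its lowercase form.
# (Hypothetical helper for illustration; try it on a scratch directory first.)
lowercase_names() {
    for path in "$1"/*; do
        # If the directory is empty the glob stays unexpanded; skip it.
        [ -e "$path" ] || continue
        name=$(basename "$path")
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        if [ "$name" = "$lower" ]; then
            continue                      # already lowercase
        elif [ -e "$1/$lower" ]; then
            echo "skipping $name: $lower already exists" >&2
        else
            mv -- "$path" "$1/$lower"
        fi
    done
}
```

Calling lowercase_names somedir would rename MyVirus.pif.xls inside it to myvirus.pif.xls and leave already-lowercase names alone.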
Sometimes you want to change a specific character. Suppose you've written an article, and after spending hours on it you realize you inadvertently slipped into l33t-sp34k. Well, this will never do. Use tr to change the numbers to letters:
$ tr "340" "eao"
I 4m a l33t hax0r. Ph34r m3!
I am a leet haxor. Phear me!
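Pretty simple -- first list the characters you want changed, all grouped together, then the second group is what you want the new characters to be. Just make sure the two groups are in the same order, because each character in the first group is replaced by the character at the same position in the second.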
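To see why the order matters, here is a short sketch that runs the same sentence through tr twice, once with the replacement group in the right order and once deliberately scrambled. The scrambled set is made up for the demonstration:

```shell
text='I 4m a l33t hax0r. Ph34r m3!'

# Positions line up: 3->e, 4->a, 0->o.
fixed=$(printf '%s\n' "$text" | tr '340' 'eao')

# Same characters, wrong order: 3->o, 4->a, 0->e.
scrambled=$(printf '%s\n' "$text" | tr '340' 'oae')

printf '%s\n%s\n' "$fixed" "$scrambled"
```

The first line comes out as readable English; the second turns "l33t" into "loot" and "hax0r" into "haxer", because tr pairs the groups purely by position.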
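tr can delete characters, using the -d flag:
$ tr -d "leet"
I am a leet haxor e e lt et el
I am a  haxor
You can see the limitations of tr in this example -- it looks for characters, not words. Every l, e, and t is removed wherever it appears, so only the spaces around them are left behind.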
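A short sketch makes the character-versus-word distinction concrete. The sample sentence is invented, and the second command swaps in sed, a different tool, purely as a point of comparison for removing a whole word:

```shell
text='I am a leet haxor, not elite'

# tr -d treats "leet" as the set {l, e, t} and strips those characters
# everywhere, including from "not" and "elite".
by_chars=$(printf '%s\n' "$text" | tr -d 'leet')

# sed (not tr) matches the literal string "leet " and removes only that.
by_word=$(printf '%s\n' "$text" | sed 's/leet //')

printf '%s\n%s\n' "$by_chars" "$by_word"
```

With tr the sentence is mangled well beyond the one word you meant to drop, while the sed substitution leaves the rest of the line alone.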