Hone Your Scripting With a Regexp Toolbox

 By Carla Schroder

Search and Replace

Perl is the champion of text processing. Don't be afraid of Perl; Perl is very nice and helpful. This example is adaptable for all kinds of text strings. I use it to update copyright notices on all the files in the current directory:

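$ perl -p -i -e 's/\b2006\b/2007/g' $(grep -il 2006 *)

The \b markers are word boundaries, so a 2006 that is part of a longer number, such as 120061, is left alone.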

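Run the grep command by itself first to see which files are going to be changed. Add the -r flag to recurse through all the subdirectories (grep -ril). Perl's -i flag edits files in place with no backup; use -i.bak instead to keep a copy of each original with a .bak extension. If grep finds nothing, perl gets no file names and waits for input from the keyboard; press Ctrl+C to get out.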

This is a cool little Perl command I used back in the olden days of the mbox mail file format. mbox stores all of your messages in a single text file, unlike maildir, which puts each message in a separate file. If an mbox file is damaged and your mail client can't read it, you can still read it with any text tool, which is the nice thing about text files. You can then use Perl to parse the file and write each message to a separate file, starting a new file at every line that begins with "From ":

$ perl -pe 'BEGIN { $n=1 } open STDOUT, ">$ARGV.$n" and $n++ if /^From /' inbox

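Each file is named inbox.1, inbox.2, and so on. Any lines before the first "From " line go to the terminal rather than to a file. This can be adapted to split up any file made of repeating records, as long as each record starts with a line you can match with a regexp.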

What if your file is full of excessive blank lines and you want them to go away? Try this:

$ perl -pe 's/^\s+$//' filename > filename-despaced 

This does not change your original file. It removes the blank lines, including lines that contain only spaces or tabs, and writes the results to a new file. The angle bracket, >, is the Bash output redirection operator. It creates a new file, or overwrites an existing one. Two angle brackets, >>, create a new file or append to an existing one.

Who Sed That

Managing whitespace is easy with sed. After stripping out all of the blank lines, you might want to put some back where they help readability, rather than leaving them scattered at random. This sed command inserts a blank line before every line containing the string "From" and stores the results in a new file:

$ sed '/From/{x;p;x;}' filename > newfilename

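sed '/keyword/G' inserts a blank line after every line containing keyword. Substitute your own search term for keyword, and don't wrap it in square brackets: in a regexp, [keyword] is a bracket expression that matches any single one of the letters k, e, y, w, o, r, or d.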
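Here is a small sketch, assuming a POSIX shell and sed, that runs both commands on made-up text so you can compare the results. In the first command, x swaps the current line into the hold space (which starts out empty), p prints the now-empty pattern space as a blank line, and the second x swaps the line back. G appends a newline plus the empty hold space, which produces the blank line after. The last command shows the square-bracket pitfall.

```shell
#!/bin/sh
# Made-up sample text.
sample='From alice
hello
From bob
bye'

# Blank line BEFORE each line containing "From" (x;p;x trick).
before=$(printf '%s\n' "$sample" | sed '/From/{x;p;x;}')
# Blank line AFTER each line containing "From" (G appends empty hold space).
after=$(printf '%s\n' "$sample" | sed '/From/G')
# Pitfall: [keyword] matches any one of k,e,y,w,o,r,d, so "dog" matches.
pitfall=$(printf 'dog\ncat\n' | sed '/[keyword]/G')

printf '%s\n---\n%s\n---\n%s\n' "$before" "$after" "$pitfall"
```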

You can also insert blank lines at fixed intervals. This example inserts a blank line after every five lines of text:

$ sed 'n;n;n;n;G;' filename > newfilename
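To watch the counting, feed the command numbered lines from seq, which this sketch assumes is installed, as it is on most Linux systems. Each n prints the current line and reads the next one, so after four n commands the fifth line is in the pattern space, and G appends the blank line to it.

```shell
#!/bin/sh
# Twelve numbered lines; a blank line should follow lines 5 and 10.
out=$(seq 1 12 | sed 'n;n;n;n;G;')
printf '%s\n' "$out"
```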

You can delete leading whitespace (spaces and tabs) from every line:

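$ sed 's/^[[:blank:]]*//' filename > newfilename

[[:blank:]] matches a space or a tab. The common [ \t] form depends on GNU sed reading \t as a tab, which other seds, such as the BSD sed on macOS, may not do.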
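Here is a quick sketch that runs the POSIX [[:blank:]] form of the command on made-up lines starting with a mix of spaces and tabs:

```shell
#!/bin/sh
# Made-up input: spaces then a tab, a tab then spaces, and no indent.
out=$(printf '  \tindented\n\t  also indented\nflush left\n' | sed 's/^[[:blank:]]*//')
printf '%s\n' "$out"
```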

Something that has always driven me buggy is that computers don't insert commas into long numbers. Hello, computers are supposed to make our lives easier. One way to fix this is with sed. Run the command, then type your numbers, one per line:

$ sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'

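Keep going until you run out of numbers to put commas in. Press Ctrl+D to end input, or Ctrl+C to quit. You can also give the command a file name, or pipe another command's output into it.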
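If you add commas often, you could wrap the same sed script in a shell function. The name commafy is made up for this sketch. :a sets a label, the s command inserts one comma in front of the last run of three digits that follows another digit, and ta jumps back to :a whenever that substitution succeeded, so the commas fill in from right to left.

```shell
#!/bin/sh
# commafy: hypothetical wrapper around the sed script above.
# Reads numbers on stdin, one per line, and adds thousands separators.
commafy() {
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

printf '%s\n' 1234567 100 98765 | commafy
```

It is meant for whole numbers. A decimal such as 1234.5678 comes out as 1,234.5,678, because the script groups the digits after the point too.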



This article was originally published on May 22, 2007