Hone Your Scripting With a Regexp Toolbox - Page 2
Search and Replace
Perl is the champion of text processing. Don't be afraid of Perl; Perl is very nice and helpful. This example is adaptable for all kinds of text strings. I use it to update copyright notices on all the files in the current directory:
$ perl -p -i -e 's/2006/2007/g' `grep -il 2006 *`
Run just the grep command first to see which files are going to be changed. Add the -r flag to recurse through all the subdirectories (grep -ril).
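Here is a small sketch of that preview-then-replace routine, run in a throwaway directory so nothing real gets touched. The file names and contents are made up for illustration, and the -i.bak variant (which keeps a backup copy of each changed file) is an addition for safety, not part of the original command. It assumes file names without spaces, since the list of matches is passed unquoted.

```shell
# Work in a scratch directory with two made-up files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Copyright 2006 Example Corp\n' > a.txt
printf 'No date here\n' > b.txt

# Step 1: preview. grep -l prints only the names of files that match.
matches=$(grep -il 2006 *)
echo "Files that would change: $matches"

# Step 2: replace in place. -i.bak saves each original as <name>.bak.
# $matches is left unquoted on purpose so each name becomes its own argument.
perl -p -i.bak -e 's/2006/2007/g' $matches

updated=$(cat a.txt)
echo "$updated"
```

If the results look right, the .bak files can be deleted.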
This is a cool little Perl command I used back in the olden days of the mbox mail file format. mbox stores all of your messages in a single text file, unlike maildir, which puts each message in a separate file. If the file is damaged and a mail client cannot open it, it is still readable as plain text, which is the nice thing about text files. Then you can use Perl to parse the file and output each message into a separate file:
$ perl -pe 'BEGIN { $n=1 } open STDOUT, ">$ARGV.$n" and $n++ if /^From /' inbox
Each file is named inbox.1, inbox.2, and so on. This can be adapted to split up any kind of repetitive file that has a word you can use as a separator.
What if your file is all full of excessive blank lines and you want them to go away? Try this:
$ perl -pe 's/^\s+$//' filename > filename-despaced
This does not change your original file; it removes the blank lines and writes the results to a new file. The angle bracket, >, is the shell's redirection operator. It creates a new file, or overwrites an existing file. Two angle brackets, >>, create a new file or append to an existing file.
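A quick way to see both redirects in action is to run them against a scratch file. This is only a sketch: the sample contents are invented, and the file names simply mirror the ones used in the command above.

```shell
# Sample input: two text lines separated by an empty line and a spaces-only line.
workdir=$(mktemp -d)
printf 'alpha\n\n   \nbeta\n' > "$workdir/filename"

# > creates filename-despaced, or overwrites it if it already exists.
perl -pe 's/^\s+$//' "$workdir/filename" > "$workdir/filename-despaced"
despaced=$(cat "$workdir/filename-despaced")

# >> adds to the end of the file instead of overwriting it.
echo 'gamma' >> "$workdir/filename-despaced"
appended=$(cat "$workdir/filename-despaced")

# The original still has all four of its lines.
original_lines=$(wc -l < "$workdir/filename" | tr -d ' ')
echo "$appended"
```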
Who Sed That
Managing whitespace is easy with sed. After stripping out all of the blank lines, you might want to put some back in sensible places for readability. This sed command inserts a blank line before each line containing the word "From" and stores the results in a new file:
$ sed '/From/{x;p;x;}' filename > newfilename
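sed '/keyword/G' inserts a blank line after each line containing your keyword. Substitute your own word for keyword, and leave out any square brackets, because sed treats them as a character class.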
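To see the difference between the two commands side by side, here is a sketch that feeds the same three made-up lines through each of them and keeps the results in shell variables instead of files.

```shell
# Three sample lines; only the middle one contains "From".
sample=$(printf 'one\nFrom here\ntwo')

# x;p;x swaps in the empty hold space, prints it, and swaps back:
# the blank line comes out BEFORE the matching line.
before=$(printf '%s\n' "$sample" | sed '/From/{x;p;x;}')

# G appends a newline plus the empty hold space to the matching line:
# the blank line comes out AFTER it.
after=$(printf '%s\n' "$sample" | sed '/From/G')

printf '%s\n---\n%s\n' "$before" "$after"
```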
You can insert blank lines according to numbers of lines, like this example that inserts a blank line after every five lines of text:
$ sed 'n;n;n;n;G;' filename > newfilename
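A short sketch of how that behaves on seven numbered lines: each n prints the current line and reads the next one, so G only fires on every fifth line. The input here is invented for illustration.

```shell
# Seven input lines: the blank line should appear after line 5 only,
# because lines 6 and 7 do not make up a full group of five.
grouped=$(printf '%s\n' 1 2 3 4 5 6 7 | sed 'n;n;n;n;G;')
echo "$grouped"
```

To group by a different count, use one fewer n than the group size, so 'n;n;G;' would space out every three lines.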
You can delete all leading whitespace (spaces and tabs) from every line:
$ sed 's/^[ \t]*//' filename > newfilename
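One caveat: reading \t as a tab inside square brackets is a GNU sed extension, and other sed implementations may treat it as a literal backslash and the letter t. A more portable sketch, swapping in the POSIX character class [[:space:]] (a substitution for the command above, not the author's version), looks like this:

```shell
# Sample lines with leading spaces, a leading tab, and no indentation.
trimmed=$(printf '  two spaces\n\tone tab\nnone\n' | sed 's/^[[:space:]]*//')
echo "$trimmed"
```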
Something that has always driven me buggy is that computers don't insert commas into long numbers. Hello, computers are supposed to make our lives easier. One way to do this is with sed. Run the command, then enter your numbers:
$ sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
123456789
123,456,789
Keep going until you are out of numbers to put commas in. Ctrl+C exits.
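If you would rather not type numbers in by hand, the same sed script works on piped input. Here is a sketch that wraps it in a shell function; the name add_commas is made up for this example. The script loops (ta jumps back to the :a label) for as long as it can still find a digit followed by three more digits.

```shell
# Hypothetical helper: reads numbers on stdin, one per line,
# and writes them back out with thousands separators.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

formatted=$(printf '%s\n' 123456789 1234 999 | add_commas)
echo "$formatted"
```

Numbers with three or fewer digits pass through unchanged.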
Resources
- man find
- man bash
- man grep
- man sed
- man awk
- man perl
- Linkifying with Regular Expressions
- How to Find or Validate an Email Address -- the longest regexp of all time