Scripting Clinic: Slice and Dice Text with Perl
Perl's been called the 'Swiss Army chainsaw' of scripting languages, but in this installment of the Scripting Clinic you'll learn how to use it like a scalpel for your most demanding (and disorganized) files.
No, no, don't run away just because I said "Perl"! Perl is a really nice scripting language; don't be afraid. Pretend you have never been abused by mean people on the Perl mailing lists. Pretend that you have never looked at a long, complex Perl program and wondered if it was really a program, or a random collection of characters generated as a practical joke. The key to using Perl effectively is to focus on two basic principles:
- Learn a few basic Perl tools well. You can do a lot with a little in Perl. Don't worry about the show-offs who flaunt their strange, arcane Perl knowledge; just stay focused on the tasks you need to accomplish.
- Write your Perl code for clarity and human understanding. Some Perl geeks love to compete to write the most obfuscated code. You're welcome to join in; but for writing easy-to-maintain scripts, clarity is the way to go.
Today we'll take a Real People look at Perl, and learn some Perl tricks for doing useful text searches and replacements.
First, a pop quiz: Which line is a Perl expression, and which one is a random collection of characters?
@caps= m/(\b[^\Wa-z0-9_]+\b)/g;
$a//\*/\||+=#//\n
Stay tuned for the answer.
Simple String Searches
Nothing beats Perl regular expressions for searching text in any way you can imagine. Here are some simple string comparisons.
This is a simple search for a specific string:
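As a minimal sketch, assuming a made-up search string (coffee) and a few sample lines that stand in for a real file, a plain match might look like this:

```perl
use strict;
use warnings;

# Hypothetical sample lines; 'coffee' is an assumed search string.
my @lines = ('coffee is ready', 'I need coffee', 'tea time');

# m/coffee/ matches any line that contains the string anywhere.
my @hits = grep { m/coffee/ } @lines;
print "$_\n" for @hits;
```

Both lines that mention coffee are kept, wherever the word falls in the line.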
That's a nice easy search, but we can nail it down more precisely. Adding the caret looks for our search string at the beginning of lines:
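A small sketch of the anchored version, again using an assumed search string (coffee) and invented sample lines:

```perl
use strict;
use warnings;

# Hypothetical sample lines; 'coffee' is an assumed search string.
my @lines = ('coffee is ready', 'I need coffee', 'tea time');

# The caret anchors the match to the start of the line.
my @hits = grep { m/^coffee/ } @lines;
print "$_\n" for @hits;
```

Only the line that starts with coffee survives the filter.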
Lose the caret and add a dollar sign to find your search string at the end of the line:
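Sketched with the same assumed search string (coffee) and invented sample lines, the end-of-line version could read:

```perl
use strict;
use warnings;

# Hypothetical sample lines; 'coffee' is an assumed search string.
my @lines = ('coffee is ready', 'I need coffee', 'tea time');

# The dollar sign anchors the match to the end of the line.
my @hits = grep { m/coffee$/ } @lines;
print "$_\n" for @hits;
```

This time only the line that ends with coffee is kept.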
To do a case-insensitive search, add the i modifier after the closing slash:
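A brief sketch, with an assumed search string (coffee) and sample lines invented to mix upper and lower case:

```perl
use strict;
use warnings;

# Hypothetical sample lines with mixed capitalization.
my @lines = ('Coffee is ready', 'I need COFFEE', 'tea time');

# The trailing i makes the match ignore case.
my @hits = grep { m/coffee/i } @lines;
print "$_\n" for @hits;
```

Without the i, neither line would match, because the pattern is all lowercase.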
Combine the caret and dollar sign to find lines that consist only of the search string:
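A sketch of the fully anchored form, using an assumed search string (coffee) and made-up sample lines:

```perl
use strict;
use warnings;

# Hypothetical sample lines; only the first is exactly 'coffee'.
my @lines = ('coffee', 'coffee is ready', 'I need coffee');

# With both anchors, the whole line must equal the search string.
my @hits = grep { m/^coffee$/ } @lines;
print "$_\n" for @hits;
```

Lines with anything before or after the search string are rejected.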
Wrapping the search string in \b word-boundary markers conducts a whole-word search; the previous examples will return any matching string, even if it is inside another word:
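To see the difference, here is a small sketch that assumes \b word boundaries (the same markers used in the replacement example further on) and a made-up search word, cat:

```perl
use strict;
use warnings;

# Hypothetical sample lines; 'cat' hides inside two longer words.
my @lines = ('the cat sat', 'concatenate files', 'category list');

# Plain match: finds 'cat' even inside other words.
my @loose = grep { m/cat/ } @lines;

# \b on each side requires a word boundary, so only the whole word matches.
my @whole = grep { m/\bcat\b/ } @lines;

print scalar(@loose), " loose, ", scalar(@whole), " whole-word\n";
```

The plain pattern matches all three lines, while the bounded one matches only the line where cat stands alone.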
Search And Replace
Well, that was easy and fun. So what are you going to do with those search strings when you find them? You could have Perl replace them:
s/\bGeorge Bush\b/Anyone At All/;
This replaces only the first match in the string it is applied to (each line, when you process a file line by line). To replace every occurrence, add the g modifier after the final slash:
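A short sketch comparing the two forms on an invented sentence (the sample text is an assumption, not taken from any real document):

```perl
use strict;
use warnings;

# Hypothetical sample text with three occurrences of the name.
my $text = 'George Bush said that George Bush would call George Bush.';

# Without g: only the first occurrence is replaced.
(my $first_only = $text) =~ s/\bGeorge Bush\b/Anyone At All/;

# With g: every occurrence is replaced.
(my $all = $text) =~ s/\bGeorge Bush\b/Anyone At All/g;

print "$first_only\n$all\n";
```

Copying the text into a new variable before substituting leaves the original string untouched, which is handy when you want to compare results.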
This is a huge time-saver for anyone who needs to do a global search-and-replace in a batch of files, like Web pages. Run it from the command line, substituting your own text to search and replace:
$ perl -e 's/string/stringier/gi' -p -i.bak *.html
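perl -e means "run this command." The command is enclosed in single quotes. -p makes Perl loop over every line of each file, run the command on it, and print the result. -i.bak edits the files in place and keeps backup copies of the originals with .bak added to their names, and *.html matches all .html files in the current directory.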
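If you would rather keep this in a script, the following is a rough sketch of what the one-liner sets up. It uses Perl's $^I variable, which is the in-script counterpart of -i. The scratch directory and the demo.html file are made up for the demonstration, so no real files are touched:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demo file in a scratch directory that is removed on exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.html";
open my $out, '>', $file or die "Cannot write $file: $!";
print $out "A String and a string\n";
close $out;

{
    # Roughly what -p -i.bak arranges: edit the files named in @ARGV
    # in place, keeping each original under the same name plus .bak.
    local $^I   = '.bak';
    local @ARGV = ($file);
    while (<>) {
        s/string/stringier/gi;   # the expression given to -e
        print;                   # goes to the file being edited, not the screen
    }
}

# Read the edited file back to check the result.
open my $in, '<', $file or die "Cannot read $file: $!";
my $result = <$in>;
close $in;
print STDOUT $result;
```

To use it on real files, you would put their names in @ARGV instead of the demo file. The one-liner remains the quicker choice for one-off jobs.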