Hone Your Scripting With a Regexp Toolbox

The Linux command line is the ultimate power tool. Thanks to the Bash shell, regular expressions, and all the wonderful GNU tools, Linux users can do more cool, useful power-user tasks than with many other operating systems. Regular expressions are the magic incantations that let you find and replace mass quantities of text with a single command, pluck specific text or files out of gigabytes of stuff with precision, and string commands together to perform amazing feats of computing wizardry. Today I shall share with you an assortment of my favorite one-liners for all occasions.

Finding Files

Finding big files, little files, new files, old files, changed files, files by UID or GID, finding files when you’re not sure of the name: these are all child’s play for the wonderful find command. You can even do some simple security audits. We all know the dangers of setting the SUID bit on files, so why not generate a list of the ones on your system to keep track?

# find / \( -perm -4000 -fprintf suid-list.txt '%#m %u %p\n' \)

This creates a list in the file suid-list.txt that looks like this:

04755 root /bin/umount
04755 root /bin/mount
04754 root /usr/sbin/pppd

There should be a couple dozen or so in a typical Linux installation. (wc -l suid-list.txt counts them for you.) You don’t want any homegrown SUID scripts, unless you really really know what you’re doing. man find tells what the options mean. The backslashes prevent Bash from interpreting the parentheses that follow them, so they are sent on to the find command unmolested. This is important; otherwise Bash treats the parentheses as its own syntax and the command fails with an error.
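
If you would rather practice before turning find loose on the whole filesystem, here is a minimal sketch that runs the same SUID test in a scratch directory. The directory and the file names plain and suid-demo are made up for the demo, and it prints to the terminal with -printf instead of writing a file with -fprintf. It assumes GNU find.

```shell
# Sketch only: search a throwaway directory instead of /.
# "plain" and "suid-demo" are hypothetical file names.
workdir=$(mktemp -d)
touch "$workdir/plain" "$workdir/suid-demo"
chmod 0755 "$workdir/plain"
chmod 4755 "$workdir/suid-demo"   # set the SUID bit on a file we own

# -perm -4000 matches files that have the SUID bit set.
# %#m prints the octal mode, %u the owner, %f the bare file name.
suid_list=$(find "$workdir" -type f -perm -4000 -printf '%#m %u %f\n')
echo "$suid_list"

rm -rf "$workdir"
```

Only suid-demo should be listed; plain has no SUID bit, so find skips it.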

This command searches for orphaned files; these are files that belong to logins that are not in /etc/passwd, or groups that are not in /etc/group. This might happen when a user is removed from the system, and you don’t track down all their leftover files for deletion or archiving, or whatever you want to do with them:

# find / -nouser
# find / -nogroup

What if you find some? If you want to keep them, you should assign them to a different user:

# find / -nouser -exec chown alrac {} \;

You can also change ownership by the UID of the files:

# find / -uid 1325 -exec chown alrac {} \;

How do you find the UID or GID of files? Use the stat command. Once you know a specific UID/GID to search for, find can show them all to you:

# find /var -uid 1325
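
Here is a small sketch of the stat-then-find routine, assuming GNU stat and GNU find. It works in a scratch directory on a made-up file called report.txt, so no root privileges are needed.

```shell
# Sketch only: report.txt is a hypothetical file in a throwaway directory.
workdir=$(mktemp -d)
touch "$workdir/report.txt"

# GNU stat: %u and %g are the numeric UID and GID, %U and %G the names.
stat -c '%u %g %U %G %n' "$workdir/report.txt"

# Feed the numeric UID back into find.
file_uid=$(stat -c '%u' "$workdir/report.txt")
matches=$(find "$workdir" -type f -uid "$file_uid")
echo "$matches"

rm -rf "$workdir"
```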

Here is a slick trick for finding all files in your home directory modified after a date and time of your choosing. (find compares modification times, so a newly created file counts too.) First create a file stamped with the time and date you want (YYYYMMDDhhmm), then find all the files that are newer than it:

$ touch -t "200705011200"  date.txt
$ find ~  -newer date.txt

Use ! -newer to find files that are not newer than your date.txt file, as in find ~ ! -newer date.txt.
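
To see both tests side by side, this sketch stamps three files in a scratch directory with touch -t and runs -newer and ! -newer against the middle one. The names older.txt and newer.txt are invented for the demo.

```shell
# Sketch only: all files live in a throwaway directory.
workdir=$(mktemp -d)
touch -t 200705011200 "$workdir/date.txt"    # reference: 1 May 2007, 12:00
touch -t 200704011200 "$workdir/older.txt"   # one month earlier
touch -t 200706011200 "$workdir/newer.txt"   # one month later

# Files modified after date.txt.
newer=$(find "$workdir" -type f -newer "$workdir/date.txt")

# Files NOT newer than date.txt (this includes date.txt itself).
not_newer=$(find "$workdir" -type f ! -newer "$workdir/date.txt" | sort)

echo "newer:     $newer"
echo "not newer: $not_newer"

rm -rf "$workdir"
```

Note that the reference file always shows up in the ! -newer results, since a file is never newer than itself.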

You can find files changed or created in the previous few minutes. This is handy when you’ve lost a new download, or want to see what a new application dumped on your system:

$ find ~ -mmin -5

A nice variation on this is files created or changed more than five minutes ago, but less than ten:

$ find ~  -mmin +5 -mmin -10
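
A quick way to convince yourself that the -mmin windows behave as described is to backdate a few files and search for them. This sketch assumes GNU touch, which accepts relative dates like '7 minutes ago' with -d, and the file names are made up.

```shell
# Sketch only: backdated files in a throwaway directory.
workdir=$(mktemp -d)
touch -d '1 minute ago'   "$workdir/fresh.txt"
touch -d '7 minutes ago'  "$workdir/recent.txt"
touch -d '20 minutes ago' "$workdir/stale.txt"

# Modified less than five minutes ago.
last_five=$(find "$workdir" -type f -mmin -5)

# Modified more than five but less than ten minutes ago.
five_to_ten=$(find "$workdir" -type f -mmin +5 -mmin -10)

echo "last five minutes:   $last_five"
echo "five to ten minutes: $five_to_ten"

rm -rf "$workdir"
```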

Everyone knows how to do simple find searches, like find / -name foo. When you do this as a non-root user, you get all those annoying “Permission denied” messages. Consign the derned things to /dev/null:

$ find / -name foo 2>/dev/null

Find your biggest files, print the size, owner, and name of each one, and add up the total:

# find / -size +900M -printf '%s %u %p\n' | awk '{print $1, $2, $3; total += $1} END \
{ print "The total size of these files is", total}'

In this example, the backslash after END indicates that even though the line appears as broken on this page, it is really one long unbroken line. If you enter it as one unbroken line, omit the backslash.
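
Most of us don't keep a pile of 900-megabyte files around for testing, so here is a scaled-down sketch of the same pipeline. It assumes GNU find, makes three small files with invented names in a scratch directory, and lowers the threshold to 1 KiB. It prints %f (the bare file name) rather than %p to keep the output short.

```shell
# Sketch only: tiny stand-in files instead of real 900M monsters.
workdir=$(mktemp -d)
head -c 500  /dev/zero > "$workdir/small.dat"
head -c 3000 /dev/zero > "$workdir/big1.dat"
head -c 5000 /dev/zero > "$workdir/big2.dat"

# -size +1k keeps files that need more than one 1 KiB block.
# awk echoes each line and adds the first field (bytes) to a running total.
report=$(find "$workdir" -type f -size +1k -printf '%s %u %f\n' |
  awk '{ print $1, $2, $3; total += $1 }
       END { print "The total size of these files is", total }')
echo "$report"

rm -rf "$workdir"
```

One caveat with this approach: awk splits on whitespace, so a file name containing spaces will be cut off at the first space in the listing, though the total is still correct.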

Search and Replace

Perl is the champion of text processing. Don’t be afraid of Perl; Perl is very nice and helpful. This example is adaptable for all kinds of text strings. I use it to update copyright notices on all the files in the current directory:

$ perl -p -i -e 's/2006/2007/g' `grep -il 2006 *`

You could run just the grep command first to see what files are going to be changed. Add the -r flag to recurse through all the subdirectories (grep -ril).

This is a cool little Perl command I used back in the olden days of the mbox mail file format. mbox stores all of your messages in a single text file, unlike maildir, which puts each message in a separate file. If the file is damaged and cannot be read by a mail client, it is still readable as plain text, which is the nice thing about text files. Then you can use Perl to parse the file and output each message into a separate file:

$ perl -pe 'BEGIN { $n=1 } open STDOUT, ">$ARGV.$n" and $n++ if /^From /' inbox

Each file is named inbox.1, inbox.2, and so on. This can be adapted to split up any kind of repetitive file that has a word you can use as a separator.

What if your file is all full of excessive blank lines and you want them to go away? Try this:

$ perl -pe 's/^\s+$//' filename > filename-despaced

This does not change your original file, but removes the blank lines and copies the results to a new file. The angle bracket, >, is the Bash redirect operator. It creates a new file, or overwrites an existing file. Two angle brackets, >>, create a new file or append to an existing file.

Who Sed That

Managing whitespace is easy with sed. After stripping out all of the blank lines, you might want to put some back in sensible places for readability, rather than at random. This sed command inserts a blank line before lines containing the word “From” and stores the results in a new file:

$ sed '/From/{x;p;x;}' filename > newfilename

sed '/keyword/G' inserts a blank line after each line containing your keyword; replace keyword with the text you are looking for.
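
Here is a tiny sketch that runs both variants on the same three lines of invented text, so you can compare where the blank line lands.

```shell
# Sketch only: three made-up lines fed through both sed commands.
sample=$(printf 'Subject: hi\nFrom alice\nbody')

# x;p;x swaps in the (empty) hold space, prints it, and swaps back,
# which puts a blank line BEFORE the matching line.
before=$(printf '%s\n' "$sample" | sed '/From/{x;p;x;}')

# G appends the (empty) hold space, which puts a blank line AFTER it.
after=$(printf '%s\n' "$sample" | sed '/From/G')

printf 'blank line before:\n%s\n\n' "$before"
printf 'blank line after:\n%s\n' "$after"
```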

You can insert blank lines according to numbers of lines, like this example that inserts a blank line after every five lines of text:

$ sed 'n;n;n;n;G;' filename > newfilename
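
You can watch the every-five-lines trick work by feeding it the numbers one through ten from seq. This is only a sketch; seq stands in for a real file.

```shell
# Sketch only: seq generates ten numbered lines to stand in for a file.
# Each n prints the current line and loads the next one; after four of
# them, G tacks a blank line onto the fifth.
spaced=$(seq 1 10 | sed 'n;n;n;n;G;')
printf '%s\n' "$spaced"

# Pull out the lines around the first gap for a closer look.
sixth=$(printf '%s\n' "$spaced" | sed -n '6p')
seventh=$(printf '%s\n' "$spaced" | sed -n '7p')
```

The sixth line of output should be the inserted blank, with the number 6 pushed down to the seventh line.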

You can delete all leading whitespace:

$ sed 's/^[ \t]*//' filename > newfilename
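
Here is a sketch that tries the command on lines indented with spaces and with a tab. It assumes GNU sed, which understands \t inside the brackets; the second command uses the POSIX character class [[:space:]], which should behave the same way on other seds.

```shell
# Sketch only: three invented lines with different indentation.
indented=$(printf '   three spaces\n\tone tab\nno indent')

# GNU sed: \t inside the bracket expression means a tab character.
trimmed=$(printf '%s\n' "$indented" | sed 's/^[ \t]*//')

# Portable spelling of the same idea.
trimmed_posix=$(printf '%s\n' "$indented" | sed 's/^[[:space:]]*//')

printf '%s\n' "$trimmed"
```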

Something that has always driven me buggy is computers don’t insert commas into long numbers. Hello, computers are supposed to make our lives easier. One way to do this is with sed. Run the command, then enter your numbers:

$ sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'

Keep going until you are out of numbers to put commas in. Ctrl+C exits.
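
If you would rather not type numbers by hand, you can pipe them in. This sketch feeds three sample numbers through the same command; the numbers are arbitrary.

```shell
# Sketch only: three arbitrary numbers piped in instead of typed.
# The substitution inserts one comma per pass, and "ta" jumps back to
# label "a" for another pass as long as a substitution was made.
with_commas=$(printf '1234567\n999\n1000\n' |
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
printf '%s\n' "$with_commas"
```

Numbers with three digits or fewer pass through untouched, since the pattern needs at least four digits in a row to match.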

