Scripting Clinic: Dissecting a Live Python... Script

By Carla Schroder | May 26, 2004
http://www.enterprisenetworkingplanet.com/netsysm/article.php/3359481/Scripting-Clinic-Dissecting-a-Live-Python-Script.htm

Last time we covered some Python fundamentals, and exposed the seekrit tricks and traps that foil new Python users. This month we're going to dissect a nice Python script line-by-line, that you may learn something useful thereby, then go forth and script some more. It is called "pyfind." pyfind is brought to you courtesy of ace coder Akkana Peck, whose expertise made this tutorial possible. It searches for groups of file types with a single command. In this example, it searches for image files, or for many types of Web files. It is easily modifiable to search for whatever your heart desires.

Remember to make it executable:

$ chmod +x pyfind

Using it is as easy as falling over:

$ ./pyfind web .
$ ./pyfind image ~
$ ./pyfind image ~/gallery/thumbs

And now, our lovely script:

#!/usr/bin/env python
# Find files of particular types under the given root.
# Usage: pyfind type rootdir
# Known types: image, web.

import string, sys, os

# The types we know about, their names and file extensions:
known_types = [
       [ "image", "gif", "jpg", "jpeg", "tif",
       "tiff", "bmp", "ico", "xcf" ],
       [ "web", "html", "htm", "xml", "cgi" ]
]

def find_in_dir(root, type) :
       files = os.listdir(root)

       # Loop through known_types looking for matches
       type_exts = 0
       for t in known_types :
               if type == t[0] :
                      type_exts = t
       if type_exts == 0 :
               return

       for f in files :
               if os.path.isdir(root + "/" + f) :
                      find_in_dir(root + "/" + f, type)
                      continue
               ext = string.lower(os.path.splitext(f)[1][1:])
               for e in type_exts[1:] :
                      if e == ext :
                             print f
                             break

# make it so
if len(sys.argv) < 3 :
       print "Usage:", sys.argv[0], "type rootdir"
       sys.exit(1)

type = sys.argv[1]
root = sys.argv[2]

find_in_dir(root, type)

Let's take this a piece at a time.

import string, sys, os

The import statement tells Python which library packages you want to use in the script. This is similar to the include directive in C. Python comes with bales of library packages. The Python Library Reference describes them.

known_types = [
[ "image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf" ],
[ "web", "html", "htm", "xml", "cgi" ]
]

This demonstrates using nested arrays (Python's own name for them is "lists"). Note how each individual array is enclosed in square brackets, and the whole megillah is inside another set of brackets. Our names for these arrays, image and web, are simply stuck inside the arrays as their first elements, and not given special labels. So how does Python know what the array names are? Why are they all lowercase? You shall see presently. (See Line Structure in the Python Reference Manual for how to span lines.)
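If nested lists are new to you, a few lines at the Python prompt may help. This is only a sketch written for Python 3; the variable names first_group, group_name, and first_extension are made up for illustration and are not part of pyfind.

```python
# The same nested-list layout as known_types in the script.
known_types = [
    ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"],
    ["web", "html", "htm", "xml", "cgi"],
]

# Each inner list is one element of the outer list.
first_group = known_types[0]

# By this script's convention, element 0 of an inner list is its name
# and everything after it is a file extension.
group_name = first_group[0]
first_extension = known_types[1][1]

print(len(known_types))   # 2
print(group_name)         # image
print(first_extension)    # html
```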

def find_in_dir(root, type) :

This defines a Python function, which takes two arguments from the calling function.

root = which directory to search
type = the name of the array containing the file types we want to find

files = os.listdir(root)

The os library package, one of the packages listed in the import statement, includes utilities for listing files and directories.
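To see what os.listdir hands back, here is a small Python 3 sketch. It builds a throwaway directory with tempfile so it does not depend on your own files; the file names are invented for the example.

```python
import os
import tempfile

# Build a scratch directory so the example is repeatable anywhere.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "index.html"), "w").close()
    open(os.path.join(root, "logo.gif"), "w").close()
    os.mkdir(os.path.join(root, "thumbs"))

    # listdir returns bare names (no leading path), in arbitrary order,
    # and does not say which entries are directories.
    names = sorted(os.listdir(root))

print(names)   # ['index.html', 'logo.gif', 'thumbs']
```

Note that the subdirectory shows up as just another name, which is why the script has to ask about directories separately.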

# Loop through known_types looking for matches
type_exts = 0

for t in known_types :

t is an arbitrary variable name; you can call it anything. This statement means "look in each of the arrays defined by known_types."

if type == t[0] :

Python uses 0, not 1, as the starting point for arrays. So t[0] is the first element of the array t: "image" on the first trip through the loop, "web" on the second. And that's how Python knows the name we chose for each array.

                      type_exts = t
       if type_exts == 0 :
               return

If there is a match, type_exts will be an array. If it's still 0, the type we asked for isn't one the script knows about, so exit quietly. Otherwise, move on to listing matching file types.
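The lookup can be tried on its own. The sketch below is Python 3 and wraps the loop in a helper called lookup_type, which is a hypothetical name and not something pyfind defines. It returns as soon as it finds a match and uses None where the script uses 0; since each type name appears only once, the result is the same.

```python
known_types = [
    ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"],
    ["web", "html", "htm", "xml", "cgi"],
]

def lookup_type(type_name):
    """Return the inner list whose first element is type_name, or None."""
    for t in known_types:
        if type_name == t[0]:
            return t
    return None  # plays the role of the script's 0 placeholder

print(lookup_type("web"))     # ['web', 'html', 'htm', 'xml', 'cgi']
print(lookup_type("audio"))   # None
```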

       for f in files :
               if os.path.isdir(root + "/" + f) :

Some Python libraries have sub-packages which contain useful utilities. The os library, which we've already imported, includes a group of utilities called os.path which operate on filenames, including isdir, which checks whether its argument is a directory. We don't need to import os.path explicitly if we've already imported os.

find_in_dir(root + "/" + f, type)
continue

If f is a directory, not a plain file, then we'll call find_in_dir recursively, to continue searching the file tree. But we need to specify the path to the new directory, not just its name; that's what root + "/" + f does:

root = the current path
"/" = a literal slash
f = new directory name

The second argument, type, is passed along unchanged. Let's say our root directory is /home and f is gallery. The expression adds up to /home/gallery.
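Here is a Python 3 sketch of the path-building step. It uses os.path.join, which is a swapped-in alternative to gluing the pieces together with "/" by hand; the script itself does the latter. The scratch directory and file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "thumbs"))
    open(os.path.join(root, "logo.gif"), "w").close()

    results = {}
    for f in os.listdir(root):
        # join inserts the right separator for the current platform.
        joined = os.path.join(root, f)
        # isdir needs the full path; a bare name would be looked up
        # relative to the current working directory instead.
        results[f] = os.path.isdir(joined)

print(results)
```

Only the thumbs entry should come back as a directory.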

ext = string.lower(os.path.splitext(f)[1][1:])

Taking this line one step at a time:

os.path.splitext(f) splits the filename into a 2-item tuple (an array that can't be changed) consisting of a filename (such as "pic007") and an extension (such as ".JPG").

os.path.splitext(f)[1] is the second element of this array, e.g. ".JPG".

os.path.splitext(f)[1][1:] takes all but the first character of this string. This is a very useful feature of Python: the "slicing" operator, specified with square brackets and a colon. loaf[i:j] returns the elements of loaf from position i up to, but not including, position j; if i is left out, the beginning is implied, while leaving out j takes everything from i to the end. Slices can be used with either arrays or strings.

So if os.path.splitext(f)[1] is ".JPG", then os.path.splitext(f)[1][1:] is everything from the second character to the end, or "JPG" without the dot.

Finally, string.lower turns the string to lowercase, "jpg", so that we can compare it against our list of lower-case file extensions.
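The steps above can be tried one at a time. This sketch is for Python 3, where the string.lower function no longer exists and the lower() method on the string does the same job. The helper name extension_of is invented for illustration.

```python
import os

def extension_of(filename):
    """Return the lower-cased extension of filename, without the dot."""
    parts = os.path.splitext(filename)   # for "pic007.JPG": ("pic007", ".JPG")
    dotted = parts[1]                    # ".JPG"
    bare = dotted[1:]                    # "JPG"
    return bare.lower()                  # "jpg"

print(os.path.splitext("pic007.JPG"))   # ('pic007', '.JPG')
print(extension_of("pic007.JPG"))       # jpg
print(extension_of("README"))           # prints an empty line: no extension
```

A file with no extension gives an empty string, which never matches anything in the extension lists, so such files are simply skipped.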

for e in type_exts[1:] :

This is the slice operator, used this time on an array, type_exts. The first element of the array is the type name ("image"), so we want to skip over it but loop over the rest of the array.
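A few slices side by side, as a Python 3 sketch. The contents of loaf are made up; type_exts is the "image" list from the script.

```python
loaf = ["heel", "slice1", "slice2", "slice3", "crust"]

print(loaf[1:3])   # ['slice1', 'slice2']  (position 3 is not included)
print(loaf[:2])    # ['heel', 'slice1']    (start is implied)
print(loaf[3:])    # ['slice3', 'crust']   (runs to the end)

type_exts = ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"]
extensions_only = type_exts[1:]   # everything except the name at index 0
print(extensions_only[0])         # gif
```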

if e == ext :

A string comparison between our string ("jpg") and the current extension in the array.

print f
break

If it matches, then print it, then break out of this inner loop, in order to move on to the next filename.
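The matching step can also be sketched as a pair of small Python 3 functions. matches_loop follows the script's loop-and-break approach; matches_in is an alternative using Python's in operator, which the script does not use. Both function names are hypothetical.

```python
type_exts = ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"]

def matches_loop(ext):
    """Explicit loop with break, as in the script."""
    found = False
    for e in type_exts[1:]:
        if e == ext:
            found = True
            break   # stop at the first hit
    return found

def matches_in(ext):
    """The same test written with the 'in' operator."""
    return ext in type_exts[1:]

print(matches_loop("jpg"), matches_in("jpg"))       # True True
print(matches_loop("image"), matches_in("image"))   # False False
```

The second line shows why the slice matters: "image" is the group's name, not an extension, so it must not count as a match.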

This is the end of find_in_dir. Now we begin the main code, called when we run the script.

# make it so
if len(sys.argv) < 3 :
       print "Usage:", sys.argv[0], "type rootdir"
       sys.exit(1)

Make sure we have two arguments (three items counting the program name, which is why the test is against 3), and if not, print a usage statement and exit.

sys.argv (from the library package "sys") is the array of arguments passed in. The first element (at index 0) is the program name. It's best to use argv[0] when printing error messages or warnings, rather than making assumptions that the program will always be called by the same name.
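To experiment with the argument check without running a script from the shell, the check can be wrapped in a function that takes the argument list as a parameter. This is a Python 3 sketch; parse_args is a hypothetical helper, not part of pyfind, and it returns None where the script would exit.

```python
import sys

def parse_args(argv):
    """Return (type, root), or None if too few arguments were given."""
    # argv[0] is the program name, so two real arguments means length 3.
    if len(argv) < 3:
        print("Usage:", argv[0], "type rootdir")
        return None
    return argv[1], argv[2]

# In a real script you would call parse_args(sys.argv).
print(parse_args(["./pyfind", "web", "."]))   # ('web', '.')
print(parse_args(["./pyfind"]))               # usage line, then None
```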

type = sys.argv[1]
root = sys.argv[2]

find_in_dir(root, type)

The routine find_in_dir does the heavy lifting. In a way, the # make it so section is like the movie star who lolls around the set while stand-ins and stunt doubles do all the hard stuff. Then the star strolls onstage for a closeup, then retires to let the stand-ins and stunt doubles toil some more.
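The listing above is written for the Python of its day. For readers on Python 3, here is one way the whole thing might look; treat it as a sketch, not as Akkana's script. It makes a few deliberate changes that are assumptions of this example: it collects full paths in a list instead of printing bare filenames, uses os.path.join and the in operator, sorts each directory listing so output is repeatable, and wraps the top-level code in a main function.

```python
#!/usr/bin/env python3
# Find files of particular types under the given root (Python 3 sketch).
import os
import sys

known_types = [
    ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"],
    ["web", "html", "htm", "xml", "cgi"],
]

def find_in_dir(root, type_name):
    """Return paths under root whose extension belongs to type_name."""
    type_exts = None
    for t in known_types:
        if type_name == t[0]:
            type_exts = t
    if type_exts is None:
        return []   # unknown type: nothing to report

    found = []
    for f in sorted(os.listdir(root)):
        path = os.path.join(root, f)
        if os.path.isdir(path):
            # Recurse into the subdirectory and keep whatever it finds.
            found.extend(find_in_dir(path, type_name))
            continue
        ext = os.path.splitext(f)[1][1:].lower()
        if ext in type_exts[1:]:
            found.append(path)
    return found

def main(argv):
    """Check the arguments, print matches, and return an exit status."""
    if len(argv) < 3:
        print("Usage:", argv[0], "type rootdir")
        return 1
    for path in find_in_dir(argv[2], argv[1]):
        print(path)
    return 0

# To use this as a command, end the file with:
#     sys.exit(main(sys.argv))
```

Returning a list instead of printing inside the function makes find_in_dir easier to reuse from other code, at the cost of holding all the results in memory before anything is shown.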

Conclusion
This little script does a lot in a few lines, which is a nice characteristic of Python. Be sure to visit Python's home page for reference manuals and excellent tutorials.