Scripting Clinic: Dissecting a Live Python... Script

By examining a working script line by line, this edition of the Scripting Clinic shows you how to put your own scripts together and exposes a few Python quirks along the way.

 By Carla Schroder

Last time we covered some Python fundamentals, and exposed the seekrit tricks and traps that foil new Python users. This month we're going to dissect a nice Python script line-by-line, that you may learn something useful thereby, then go forth and script some more. It is called "pyfind." pyfind is brought to you courtesy of ace coder Akkana Peck, whose expertise made this tutorial possible. It searches for groups of file types with a single command. In this example, it searches for image files, or for many types of Web files. It is easily modifiable to search for whatever your heart desires.

Remember to make it executable:

$ chmod +x pyfind

Using it is as easy as falling over:

$ ./pyfind web .
$ ./pyfind image ~
$ ./pyfind image ~/gallery/thumbs

And now, our lovely script:

#!/usr/bin/env python
# Find files of particular types under the given root.
# Usage: pyfind type rootdir
# Known types: image, web.

import string, sys, os

# The types we know about, their names and file extensions:
known_types = [
       [ "image", "gif", "jpg", "jpeg", "tif",
       "tiff", "bmp", "ico", "xcf" ],
       [ "web", "html", "htm", "xml", "cgi" ]
]

def find_in_dir(root, type) :
       files = os.listdir(root)

       # Loop through known_types looking for matches
       type_exts = 0
       for t in known_types :
               if type == t[0] :
                      type_exts = t
       if type_exts == 0 :
               # No group by that name: say so and stop searching
               print "Unknown type:", type
               return

       for f in files :
               if os.path.isdir(root + "/" + f) :
                      find_in_dir(root + "/" + f, type)
               ext = string.lower(os.path.splitext(f)[1][1:])
               for e in type_exts[1:] :
                      if e == ext :
                             print f

# make it so
if len(sys.argv) < 3 :
       print "Usage:", sys.argv[0], "type rootdir"
       sys.exit(1)

type = sys.argv[1]
root = sys.argv[2]

find_in_dir(root, type)

Let's take this a piece at a time.

import string, sys, os

The import statement tells Python which library packages you want to use in the script. This is similar to the include directive in C. Python comes with bales of library packages. The Python Library Reference describes them.
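As a small illustration of what an import makes available, here is a minimal sketch that imports os and sys and reaches into each one by name. It is written for current Python 3, where print is a function rather than the statement used in pyfind, and the filename is made up for the example.

```python
import os
import sys

# After "import os", everything in the package is reached through its name.
name, ext = os.path.splitext("holiday.JPG")   # hypothetical filename

# sys.argv is the list of command-line words; element 0 is the script name.
arg_count = len(sys.argv)

print(name)   # holiday
print(ext)    # .JPG
```

Without the import line, the first use of os or sys would stop the script with a NameError.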

known_types = [
[ "image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf" ],
[ "web", "html", "htm", "xml", "cgi" ]
]

This demonstrates using nested arrays (Python's own name for them is lists). Note how each individual array is enclosed in square brackets, and the whole megillah is inside another set of brackets. Our names for these arrays, image and web, are simply stuck inside the arrays as their first elements, and not given special labels. So how does Python know what the array names are? Why are they all lowercase? You shall see presently. (See Line Structure in the Python Reference Manual for how to span lines.)
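As a preview, here is a minimal sketch of how indexing and slicing pull the pieces out of a nested list laid out like known_types. The variable names other than known_types are invented for the illustration, and the code is written for current Python 3.

```python
# Same layout as the script: each inner list is [name, ext, ext, ...].
known_types = [
    ["image", "gif", "jpg", "jpeg", "tif", "tiff", "bmp", "ico", "xcf"],
    ["web", "html", "htm", "xml", "cgi"],
]

first_group = known_types[0]     # the whole "image" list
group_name = first_group[0]      # element 0 is the name: "image"
extensions = first_group[1:]     # slice from 1 on: just the extensions

web_extensions = known_types[1][1:]   # index twice to reach the inner list

print(group_name)       # image
print(web_extensions)   # ['html', 'htm', 'xml', 'cgi']
```

Python attaches no meaning to the first element. It is only a convention of this script that position 0 holds the group name and everything after it is an extension.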

def find_in_dir(root, type) :

This defines a Python function, which takes two arguments from whatever code calls it.

root = which directory to search
type = the name of the array containing the file types we want to find
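To show how the two arguments arrive, here is a minimal sketch of a function with the same two parameters. The function describe_search and the path are hypothetical, made up for this example, and the code is written for current Python 3.

```python
def describe_search(root, type):
    # root and type are local names for whatever the caller passed in.
    # (Naming a parameter "type" hides Python's built-in type() inside
    # the function, which is harmless here but worth knowing.)
    return "looking for " + type + " files under " + root

# Arguments are matched to parameters by position.
message = describe_search("/home/carla/gallery", "image")
print(message)   # looking for image files under /home/carla/gallery
```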

files = os.listdir(root)

The os library package, one of the packages listed in the import statement, includes utilities for listing files and directories. os.listdir(root) returns the names of everything inside root, files and subdirectories alike.
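To see what os.listdir hands back, here is a minimal sketch that builds a scratch directory and lists it. The file and directory names are invented for the example, and the code is written for current Python 3.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on your files.
root = tempfile.mkdtemp()
open(os.path.join(root, "index.html"), "w").close()
open(os.path.join(root, "logo.GIF"), "w").close()
os.mkdir(os.path.join(root, "thumbs"))

# listdir returns bare names in no particular order, so sort for display.
files = sorted(os.listdir(root))
print(files)   # ['index.html', 'logo.GIF', 'thumbs']

# The names carry no path, so join them to root before testing them.
subdirs = [f for f in files if os.path.isdir(os.path.join(root, f))]
print(subdirs)   # ['thumbs']
```

That missing path is why pyfind glues root onto each name before calling os.path.isdir.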

# Loop through known_types looking for matches
type_exts = 0

for t in known_types :

t is an arbitrary variable name; you can call it anything. This statement means "look in the arrays defined by known_types ...

Continued on Page 2

This article was originally published on May 26, 2004