Digital Archiving: The Impossible Dream?
Carla Schroder reports on the growing challenge of storing data for the long haul and retrieving it in a timely, useful manner.
Information storage is easy; anyone can throw things into a box. Retrieval is where the real power lies: the ability to sort and retrieve records efficiently when you need them. This is, of course, where computers really shine. They make it possible to store and sort vast quantities of data easily and quickly. The drawback is the transient nature of both the technologies and the media.
All of us ace sysadmins have excellent short-term backup and storage schemes. (That's my story, at least, and I'm sticking to it.) But what about planning for the future? Shakespeare's bar bills and court records have survived roughly 400 years. The Dead Sea Scrolls go back about 2,000 years. Cave paintings date back more than 30,000 years. Yet in this here fabulously advanced 21st century, neither analog nor digital creations seem to have anywhere near this longevity.
David Mandel, of Mandel and Associates, says "Modern digital media are very bad in terms of long term storage. As libraries, historical societies, and other institutions become more and more digital, we are storing more and more of our culture on transient media that require constant maintenance by highly skilled people. If something happens in our society (war, economic depression, total social chaos, etc.) and we have to mothball everything for a generation or two, we risk losing a lot of information of cultural importance."
There are two aspects of modern data storage to consider: the storage medium and the data format. The technology changes quickly, and the options seem to multiply even faster: punch cards, 9-track tapes, 1/2" reel-to-reel tapes, paper tape, floppy disks, all the various incarnations of hard drives, optical disks, and so on. Even if you have the right hardware to read obsolete media, and the media are still readable, file formats become obsolete. If you can't decode the data, it has no value, no matter how well it's been preserved.
Any magnetic medium is a bit scary for me; it feels too vulnerable. Ed Sawicki of Alcpress has this to say on the subject: "I studied this issue a while back for a customer. I did some informal testing as well. I had an extensive library of diskettes that was indexed with a database so I knew when the diskette was placed in the library. I noticed that errors started occurring in diskettes that were over 6 years old, and by 10 years the failure rate was nearly 50 percent. At 15 years, the majority of the diskettes were unreadable."
CD-ROMs and DVDs
CD-ROMs and DVDs theoretically should be more stable and last longer, but manufacturers are reluctant to make any specific longevity claims. Some people insist that "CDs will spontaneously de-laminate, you mark my words, and then you'll be sorry." Well, I don't know. I have six-year-old commercial data CDs that are fine, and older music CDs that still play. Home-burned discs are not as dependable, though. Commercial discs have their data physically pressed into the plastic, while recordable discs store it in a dye layer written by a lower-power drive laser, and that dye is more vulnerable to heat and light.
Still, if Shakespeare's bar bills can survive some 400 years, why can't we do better in this day and age? Two old-fashioned but dependable means of archiving still do the job well.
Microfilm, properly stored and handled, should have a lifespan of 100 to 200 years, with some experts claiming up to 500 years. If microfilm readers ever vanish, the film can still be read with a magnifying glass. That technology predates Galileo, so I imagine it won't be going away anytime soon.
Paper is still king. It's inexpensive, needs no special technology to read, and lasts decades, if not centuries. And with computers, there's no need to retype all those pages; simply run them through a high-quality, high-volume scanner with OCR (optical character recognition) software to convert them back into digital files. I think it is safe to assume that technologies for converting printed text into digital files will not go away.