Describing Archived Web Content


Operating MAVAC computer

Our profession has been harvesting and archiving web content for well over a dozen years, yet archivists have still not settled on preferred practices for describing the content we harvest. Part of the reason is that different units within libraries perform this work. Some institutions document harvested web content through catalog records; others do it through collection guides. Because we treat these materials as archival records and manuscript items, we are focusing our descriptive work on collection guides rather than the catalog (collection guides are already represented in the catalog). In addition to the descriptive records we create in Archive-It for the harvested sites, we've recently made some decisions about how to describe this content so that it is more discoverable and understandable to researchers.

Where possible, we will map a website to an existing collection and, in that collection's guide, create a record for the related web content. The contextual and descriptive information about archived websites that we anticipate researchers will find useful includes the title and URL of the site; the date on which harvesting began (and, where applicable, ended); the scheduled frequency of capture; the harvesting tool(s) used; and links out to the archived content hosted by the Internet Archive. An example is the guide for the Libraries Director's Office Records, which now includes the Libraries website. That site has been scheduled for monthly capture since October 2015 and is harvested using the Internet Archive's Archive-It tool; the record also includes prior captures by the Internet Archive dating back to February 2000.
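For readers who think in terms of data, the descriptive elements listed above could be sketched as a simple record. This is only an illustration, not how our collection guides or Archive-It actually store the information: the class name, field names, and both example URLs below are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ArchivedSiteRecord:
    """Hypothetical container for the descriptive elements of one archived site."""
    title: str
    url: str
    harvest_start: date                # date on which harvesting began
    capture_frequency: str             # e.g. "monthly"
    harvesting_tools: List[str]        # e.g. ["Archive-It"]
    archived_links: List[str] = field(default_factory=list)
    harvest_end: Optional[date] = None  # None while harvesting is ongoing

    def date_span(self) -> str:
        """Return the harvesting period as 'YYYY-MM to YYYY-MM' or '... to ongoing'."""
        start = f"{self.harvest_start:%Y-%m}"
        end = f"{self.harvest_end:%Y-%m}" if self.harvest_end else "ongoing"
        return f"{start} to {end}"

    def summary(self) -> str:
        """Return a one-line description suitable for a guide entry."""
        tools = ", ".join(self.harvesting_tools)
        return (f"{self.title} <{self.url}>; captured {self.capture_frequency}, "
                f"{self.date_span()}; harvested with {tools}")


# Example based on the Libraries website described above.
# The URLs are placeholders, not the real addresses.
record = ArchivedSiteRecord(
    title="Libraries website",
    url="https://library.example.edu/",
    harvest_start=date(2015, 10, 1),
    capture_frequency="monthly",
    harvesting_tools=["Archive-It"],
    archived_links=["https://archive.example.org/libraries-website"],
)
print(record.summary())
```

Leaving the end date empty keeps a still-active harvest distinct from one that has been closed out, which mirrors the "began, and possibly ended" wording of the description.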

There's more to watch for in this area. Recently, the OCLC Research Library Partnership Web Archiving Metadata Working Group shared its report, "Best Practices for Web Archiving Metadata," and has openly solicited feedback. The community is hopeful that a shared set of descriptive practices will soon be established. We, too, would welcome feedback about our approach. If you're a researcher who might be interested in using archived websites in your research, we would appreciate hearing from you about what sort of archival description you would find useful in your work. You can share comments through the SCRC's Contact Form.