Ethics in Archives: Decisions in Digital Archiving

Blog post contributed by Taylor de Klerk and Jessica Serrao, Library Associates

Digital archival work requires many tough decisions. It may not always seem that way, but a lot has to happen behind the scenes before archival materials can be made available to everyone on the Internet. Archivists have been trying to figure out the best way to incorporate digital assets into existing archival structures for quite some time. In our case, Special Collections has developed three pillars that make up our digital program: digitization, born digital processing, and web archiving. In this way, we organize our program by type of digital asset. Each of these pillars raises ethical questions as we work to fairly preserve, describe, and provide access to digital materials.

Three pillars of SCRC's Digital Program

The first pillar is digitization. Most of Special Collections’ holdings are physical items. To improve access to these materials, we select and prioritize items to digitize, scan or photograph them, create descriptive metadata, conduct quality control, and make them available online on our Rare and Unique Digital Collections site.

Our second pillar is born digital processing. Unlike digitized materials, born digital materials were originally created in a digital format, such as a Microsoft Word document or a digital photo saved as a JPEG file. Our processing procedures ensure that those files are properly preserved--we use software to gather information about the files, document our actions, and create and manage archival packages that contain the original files and preservation documentation.

The third pillar, web archiving, is an endeavor in which Special Collections captures and saves instances of websites created by or affiliated with NC State University, as well as sites created by other donors. Each capture is a snapshot of what that website looked like on the date it was captured. We use Archive-It as our platform for this activity, which makes everything we capture available to view in the Internet Archive’s Wayback Machine. You can access NCSU Libraries’ Archive-It homepage here. For more details on the three pillars of our digital program, visit Taylor’s “Space Showcase: Digital Collections” post here.

Digital archives require so much specialized expertise that Special Collections has its own Digital Program Librarian: Brian Dietz. He collaborates with the department’s curators, staff, and students to ensure that all three pillars of SCRC’s digital program are supported. One SCRC staff member who works most closely with Dietz is Library Technician Laura Abraham, who creates metadata for our digital collections and manages the collections’ overall quality.

Brian Dietz, Digital Program Librarian

Laura Abraham, SCRC Library Technician

Digital materials require similar ethical decisions to those previously addressed in this series. However, when considering digital formats, these decisions take on new forms. For example, digitization costs a lot: it takes a long time, requires a variety of technologies, and needs a supportive storage system to ensure the security and longevity of the files. Because of those costs, Dietz works with other SCRC staff to prioritize materials for enhanced access. It’s a common misconception that everything donated to an archive is immediately digitized and available for public use.

To make decisions about those and other priorities, Dietz, along with colleagues in the SCRC, balances how researchers are using our collections with needs communicated by our curators. Dietz says, “it’s those two things together--what the curator is finding of value and wants to promote but also what researchers have expressed an interest in.” In this way, Dietz makes “data-informed instead of data-driven decisions” that consider what researchers are using as well as anticipating what they will want to use.

Privacy

Archivists must be vigilant about privacy when digitizing archival collections, processing born digital materials, or capturing web content. We follow the same rules when digitizing as we do when processing, but there are additional facets to consider. For example, someone who wrote a letter in the 1960s could not have anticipated that letter being available online, so is it okay for us to put it there? The privacy expectations of creators, and of the people who may be mentioned in the materials, are worth weighing when deciding whether or not to provide online access. If we do provide access to something that a creator or subject expresses discomfort with, Special Collections accepts takedown requests and reviews them on a case-by-case basis.

We also consider privacy when collecting social media content. This is different from written correspondence in that, as Dietz says, “some social media is intended to be public. It’s intended to be findable.” For born digital materials, we use applications like bulk_extractor to search for instances of personally identifiable information (PII), such as Social Security numbers or credit card information. Due to the nature of our collections, these are not common occurrences. If PII is found, we then decide how to manage it based on the nature of the information and the privacy risks involved.
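To give a rough sense of what searching for PII can look like, here is a minimal Python sketch. It is not how bulk_extractor works and is not the workflow Special Collections uses; the patterns and the function names (find_possible_pii, passes_luhn) are assumptions made up for this illustration. It looks for strings shaped like Social Security numbers and for 13 to 16 digit runs that pass the Luhn checksum used by payment card numbers.

```python
import re

# Illustrative patterns only (assumptions for this sketch); dedicated
# forensic tools use far more thorough detection than these two regexes.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def passes_luhn(digits: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_possible_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for strings that resemble PII."""
    hits = [("ssn", match.group()) for match in SSN_PATTERN.finditer(text)]
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # Keep only digit runs that also pass the card-number checksum.
        if 13 <= len(digits) <= 16 and passes_luhn(digits):
            hits.append(("card", match.group().strip()))
    return hits
```

A match from a sketch like this is only a candidate. A person still has to review each hit and decide, as described above, how to handle it.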

Intellectual Property

Similar to our respect for privacy, our policies also respect creators’ intellectual property rights. Careful consideration of copyright interests is an important part of digital archives. Pamphlets we find in folders, born digital photographs, websites, and other media may have a variety of copyright holders and restrictions for publication. To help us make decisions about rights, and to determine how to communicate those decisions to our users, we find RightsStatements.org to be a helpful tool. Navigating the legal jargon of copyright can be challenging, and there are often many grey areas that require us to make case-by-case decisions. According to Dietz, “assessing copyright is a risk assessment. We're far more likely to put no and low risk materials online. We need a strong justification to make things available online when there's little known about the rights situation.” As a public repository, our goal is ultimately to provide access, but not at the expense of the copyright holder. It’s a balancing act, just like many of these ethical decisions.

Authenticity

Another important ethical dimension of our digital program is ensuring that the digital records we create and preserve are authentic. Born digital materials in particular are easily modified or corrupted, and we use write-protection software to mitigate that concern. Dietz says, “what gets into storage ought to be the same thing that left the work environment.” We verify checksums at multiple points throughout the process to make sure that is the case.
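For readers curious about what verifying a checksum involves, here is a minimal Python sketch. It assumes SHA-256 as the algorithm, and the function names (sha256_of, verify_copy) are hypothetical; the post does not say which algorithm or tools Special Collections actually uses. The idea is simply that a file hashed before and after a transfer should produce the same value if not a single bit has changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Reading in chunks keeps memory use low for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, destination: Path) -> bool:
    """Return True if both files have identical checksums."""
    return sha256_of(source) == sha256_of(destination)
```

If verify_copy returns False, the copy in storage is not the same as the file that left the work environment, and the transfer needs to be investigated or repeated.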

We also have an obligation to ensure that our digitized materials authentically represent the original document or item that we digitized. To do this, we use high resolution scanners and cameras, make sure any coloration is accurate, and treat the files that digitization generates much like we do with born digital archival materials so that they can be indefinitely preserved.

For web archiving, we limit our capturing to websites created by departments and units at NC State, as well as those whose creators have given us permission to capture them. Websites beyond the scope of our servers can be created by anyone for any purpose, and therefore we only archive those that we trust.

Preservation

Digital archives aren’t immune to preservation’s ethical issues. Quite the opposite! Digital files can degrade over time (also known as bit rot), and digital storage is vulnerable to technical failures and obsolescence (old storage technology gets upgraded and files need to be moved to new storage). As a result, archivists are obligated to have a more active and iterative role in preservation over time than they would with paper materials. Papers, properly stored and preserved in a stable environment, can last a long time without intervention. Digital files will suffer if treated this way. Dietz works with other library groups and departments, including the Information Technology (IT) department, Digital Library Initiatives (DLI), and the Digitization and Digital Curation Working Group, to develop preservation strategies and maximize efficient use of our storage environment. Files may be stored on fragile old media (think floppy disks), which are very vulnerable to degradation. When we process born digital materials, we move those files to a new storage environment to, as Dietz says, “liberate them from their legacy media jails.” Once in our stable storage environment, files need to be checked periodically for degradation or moved to new storage media as existing storage ages out. Dietz, IT, and DLI work to maintain, update, and check the Libraries’ storage environment to ensure it is safe, stable, and reliable.
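The periodic checking described above is often called a fixity check. The Python sketch below shows one simple way it could work, under stated assumptions: a manifest of SHA-256 checksums is recorded when files enter storage, and a later audit compares the stored files against it. The function names (build_manifest, audit) and the manifest layout are hypothetical and are not a description of the Libraries' actual systems.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's bytes with SHA-256 (reads the whole file at once,
    which is fine for a small sketch but not for very large files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored files against the manifest and report problems."""
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            report["missing"].append(name)
        elif file_digest(target) != expected:
            # A differing checksum means the bits are no longer the same.
            report["changed"].append(name)
    return report
```

Running the audit on a schedule turns silent degradation into something visible: any file listed under "changed" or "missing" can then be restored from another copy.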

Check out this cartoon for a fun overview of digital preservation. Additional information about bit level preservation and bit rot can be found in the Digital Preservation Coalition’s Introduction to Digital Preservation.

Access

Digital archivists are ethically bound to preserve and maintain digital materials so researchers can access them now and into the future. This means two different things when it comes to born digital versus digitized archival materials.

To ensure born digital materials are accessible, archivists must save the bits of each born digital file. This means battling technological obsolescence that can prevent access to the original files. There is a real risk that technology will fall out of favor and lose support from its creators and user communities. This can happen with file formats (AppleWorks’ .cwk files, .tga graphics files, etc.), digital storage media (floppy disks, compact discs, etc.), and even the software programs and hardware needed to read those files (WordStar, Windows 98, etc.). This isn’t unique to digital collections, and anyone who uses technology has experienced the ebb and flow of new technology in, old technology out. An amusing article from CNET highlights multiple ports that are now obsolete, which can become an archivist’s nightmare as external hard drives using these old ports can no longer connect to a computer to be read.

To ensure digitized archival materials are accessible, archivists must strive to provide unmediated web access. Making digital collections available online helps us to minimize geographic barriers to access and potentially to reach more diverse audiences. That same exposure, however, could be harmful to living subjects who are unknowingly represented in digital materials. As archivists provide increased access to digital materials, they must remember that some communities will still have limited access, such as those with little to no Internet availability. Other communities may be unaware that there are collections online which document their history. Additionally, there may be researchers who do not know that online collections relevant to their research exist. Online access is an improvement, but archivists must be aware that they may not be reaching all stakeholders (whether those are documented communities, researchers, creators, or others). Increased online access means more people are exposed to significant archival materials, and archives need to inform, or even work with, stakeholders to provide fair and equitable access.

Archivists must constantly balance access with the time they spend on description. Should we take the time to describe materials more thoroughly, or quickly create minimal descriptions to get more materials available online? We can use tools like Optical Character Recognition to quickly make typed documents full-text searchable, but even those materials require some additional description. Visual materials, such as photographs and AV recordings, rely more heavily on archivists’ descriptions, which directly affects how users find materials online.

Description

Describing digital materials has many of the same ethical concerns as describing physical materials. Various stakeholders have diverse values and identities that archivists need to represent fairly through descriptive practices. In addition to that responsibility, archivists need to be aware of how their own values and biases can creep into their descriptions. Archivists, subjects, and users may have conflicting claims to the language used to describe a record, but each claim should be considered empathetically in relation to each other and in relation to dominant power structures before any descriptive decisions are made. Laura Abraham, who creates descriptions for our digital collections, illustrates this point:

The Library of Congress has the authorized subject heading "Indians of North America," while many would initially think that this is anachronistic or even offensive. They may presume that "Native Americans" is what they should use as a better describer. However, the communities and individuals the terms represent are not unanimous in preferring "American Indian," "Native American," or something else. In addition, many want to be called by their tribal affiliation.

Abraham and others suggest the answer to this is getting those communities involved in descriptive practices. Abraham states, “We should ask the people the headings describe. What do the people of our campus and community prefer, how do they want to be represented?” As collections documenting more sensitive subjects and vulnerable and marginalized communities are digitized and described, this will become an even more pertinent question.


Digital materials pose (and will no doubt continue to pose) challenging ethical situations for archivists to consider at every turn, from choosing materials to digitize or websites to capture, to describing the materials and providing online access. An open and empathetic approach can help archivists view their work from diverse angles and strive to make more ethically sound decisions. These decisions should be informed by those who are most affected by our work, whether that is the researcher, the creator, or communities that have been historically unrepresented and underrepresented by archives.

This is the final post in our series about ethics in archives, introduced here. We hope this series has shed light on the tough ethical decisions that exist at every stage of archival work. Archivists do a lot more than just put papers in folders and folders in boxes. We have presented a variety of ethical issues that archivists face, and aimed to provide background information about what we do every day. Check out our previous posts on privacy, description, preservation, curation, and social responsibility.


Additional Resources

Always Already Computational - Collections as Data. (2018). Santa Barbara statement on collections as data. Institute of Museum and Library Services National Forum at the University of California Santa Barbara. Retrieved from https://collectionsasdata.github.io/statement/.

Caswell, M. L. (2016). From human rights to feminist ethics: Radical empathy in archives. UCLA. Retrieved from https://escholarship.org/uc/item/0mb9568h.

Leach, E., Billey, A., & Hurst-Wahl, J. (2018). Can there be neutrality in cataloging? A conversation starter [Webinar]. National Information Standards Organization. Retrieved from https://www.niso.org/events/2018/04/can-there-be-neutrality-cataloging-conversation-starter.

Sims, N. (2017). Rights, ethics, accuracy, and open licenses in online collections: What’s “ours” isn’t really ours. College & Research Libraries News 78(2). Retrieved from http://crln.acrl.org/index.php/crlnews/article/view/9620/11028.