NCSU Libraries Focus Online
Volume 22 number 2 - Winter 2002
First Temple of the Atom Electronic Text Project: Increasing Access to
Special Collections in the Digital Age
By Russell S. Koonts, Special Collections Department, and James Jackson Sanborn,
Research and Information Services and Digital Library Initiatives
". . . at 59 minutes past midnight in the early morning hours of September
5, 1953, the Raleigh Research Reactor breathed with nuclear life for the first
time. . . . For 51 months--four years and 12 weeks--the world's first college-owned
nuclear reactor was in the making, evolving from a dream through negotiations,
design, and construction to initial operation. . . . The N.C. State nuclear
reactor was (1) the first to be used entirely for peacetime training and research,
(2) the first to be operated on any college campus as a non-AEC reactor, (3)
the first to be open for public inspection with visitors welcomed. . . ."
--from First Temple of the Atom, NC State School of Engineering,
ca. late 1950s
In July 2000 the NCSU Libraries' Special Collections Department established
an electronic texts program to provide increased access to unique and interesting
items from its collections. The first priority was to choose a project small
enough to complete successfully but varied enough to allow experimentation
with tools and procedures at each stage of the process and to establish program
standards. Work on a Web site detailing the history of the first Raleigh Research
Reactor at North Carolina State College (operational from 1953 to 1955) had
been conducted earlier in the year. In light of this, the department decided
that the first electronic text project should focus on key historical and archival
documents relating to the establishment of the reactor program. The intent
was to add value to the Web project and to increase awareness of the unique
materials housed in Special Collections.
Throughout the summer of 1999, staff members from Special Collections and
Digital Library Initiatives reviewed boxes of historical documents from the
College of Engineering, Office of the Dean, Department of Nuclear Engineering,
and Engineering Communication record groups. The survey identified nearly 250
documents for potential inclusion; together they trace the history of the reactor
from its initial conception, through its creation, to the nuclear accident that
led to its decommissioning.
Among the papers identified for possible inclusion were many that related
to NC State faculty members who founded the reactor project: Clifford A. Beck,
A. C. Menius, Newton Underwood, Arthur Waltner, and Raymond Murray. Murray
further assisted library staff by explaining the context of the reactor program
and the importance of key documents. Additionally, he had donated his "Reactor
Notebook" to the University Archives several years earlier. The ninety-four-item
notebook--consisting of memoranda, reports, notes, and experimental results--covers
the formative days of the reactor planning and became the cornerstone of the
reactor electronic text project.
In July 2000 Russell S. Koonts, James M. Jackson Sanborn, and Maryjo George
determined which documents would be included in the reactor project. The team
reviewed the documents listed in the 1999 survey and selected items for scanning
based on subject matter, the perceived difficulty of digitizing them, and their
interest to the staff member processing them. Clifford Beck's 1950 "Proposal of
a Nuclear Reactor at North Carolina State College" was the first document chosen
because of its considerable length, its use of images, and its significance in
establishing the reactor program at North Carolina State College. Each selected
document then required intensive work to prepare it for the Web.
Special Collections unveiled the "First Temple of the Atom Electronic
Text" Web site (http://www.lib.ncsu.edu/archives/etext/engineering/reactor)
in December 2000, coinciding with the Department of Nuclear Engineering's
fiftieth anniversary celebration. Additionally, a site devoted to Murray's
notebook is accessible at
http://www.lib.ncsu.edu/archives/etext/engineering/reactor/murray/index.html.
When Special Collections began the project, it hoped to make the documents
available to a wide range of individuals who might not otherwise be able to
examine them. Since its public announcement, the site has been used in
presentations, classwork (by students and professors), and historical and
genealogical research. The site has been indexed by Yahoo and Google and is
included in the North Carolina Exploring Cultural Heritage On-Line (NCECHO) Web
site (http://www.ncecho.org/), which provides access to the special collections
of North Carolina's libraries, archives, and museums.
First Temple of the Atom Document Digitization Process
The documents digitized for the reactor electronic text project went through
a five-step process to make them accessible via the World Wide Web. Each step
is briefly described below.
Step 1: Digitization
The first step in converting traditional paper documents into electronic texts
is to scan them. The initial high-resolution scan, or master image, is captured
in color and saved as a TIFF file roughly forty-four megabytes in size. To
produce smaller files suitable for viewing on the Web, the TIFF images are then
converted and compressed into full-size and thumbnail JPEG files. The thumbnail
is displayed alongside the document text when the digitized item is viewed and
links to the larger image of the original document.
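The forty-four-megabyte figure can be sanity-checked with simple arithmetic, since an uncompressed color scan stores a fixed number of bytes per pixel. The sketch below assumes a US letter page (8.5 x 11 inches) scanned at 400 dpi in 24-bit color, plus hypothetical 800-pixel and 150-pixel limits for the full-size and thumbnail JPEGs; the article does not state the actual scan resolution or derivative sizes. It covers only the size arithmetic, not the resampling and JPEG compression that an imaging tool would perform.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Raw pixel data for a scan; 3 bytes per pixel corresponds to 24-bit RGB color.

    Ignores the small TIFF header, so an uncompressed TIFF file is slightly larger.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def fit_within(width, height, max_side):
    """Shrink (width, height) so the longer side is at most max_side, keeping the aspect ratio."""
    scale = min(1.0, max_side / max(width, height))  # never enlarge a small image
    return round(width * scale), round(height * scale)


# Assumed settings (not stated in the article): letter page at 400 dpi, 24-bit color.
master_bytes = uncompressed_size_bytes(8.5, 11, 400)  # 3400 x 4400 pixels
print(f"master image: {master_bytes / 1_000_000:.1f} MB")  # master image: 44.9 MB

# Hypothetical size limits for the two Web derivatives.
print("full-size JPEG:", fit_within(3400, 4400, 800))  # full-size JPEG: (618, 800)
print("thumbnail JPEG:", fit_within(3400, 4400, 150))  # thumbnail JPEG: (116, 150)
```

Under these assumptions the master comes to about 44.9 MB, close to the figure above, while the full-size derivative holds about 3 percent of its pixels and the thumbnail about 0.1 percent.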
Step 2: Transcribing/Converting Text
After a document is scanned, its text must be either transcribed or converted
into editable text. If the item is only a few pages long, handwritten, or
printed in a poor-quality typeface, library staff transcribe it. If the document
is longer, typed, and printed in a clear typeface, the library uses TextBridge
9.0, an optical character recognition (OCR) program, to capture the text for
later encoding. TextBridge imports the TIFF file and converts the page image
into a text file. After conversion, the text is compared with the original for
accuracy; OCR conversion can reach a 98 to 99 percent accuracy rate.
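An accuracy percentage is easier to interpret as an error count. The sketch below assumes accuracy is measured per character, as one minus the edit (Levenshtein) distance between the OCR output and a proofread transcription divided by the transcription's length; the article does not say how the 98 to 99 percent rate was measured, and the 2,000-characters-per-page figure is an assumption for a typed page. At that density, 98 to 99 percent accuracy still leaves roughly 20 to 40 errors per page to find during the comparison with the original.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_text):
    """Assumed metric: 1 minus the character error rate relative to the reference."""
    return 1 - levenshtein(reference, ocr_text) / len(reference)


def expected_errors_per_page(chars_per_page, accuracy):
    return round(chars_per_page * (1 - accuracy))


# A common OCR confusion: lowercase "l" read as the digit "1".
reference = "Proposal of a Nuclear Reactor"
ocr_text = "Proposa1 of a Nuc1ear Reactor"
print(f"{character_accuracy(reference, ocr_text):.3f}")  # 0.931 (2 errors in 29 characters)

# Assumed: 2,000 characters on a typed page (not stated in the article).
print(expected_errors_per_page(2000, 0.98), expected_errors_per_page(2000, 0.99))  # 40 20
```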
Step 3: Encoding
The third step uses an encoding language to convert the text file into a
structured format suitable for Web delivery. Special Collections projects are
encoded in the eXtensible Markup Language (XML) using teixlite, a subset of the
Text Encoding Initiative (TEI) tag set. The TEI is an international project that
is developing guidelines for the preparation and interchange of electronic texts
for scholarly research. The teixlite subset, which consists of the most widely
used tags from the TEI standard, allows the library to identify people, places,
dates, and other content within the documents by applying the appropriate tags
and attributes.
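To show what tagging people, dates, and titles produces in the markup, the sketch below wraps known phrases in a sentence with TEI-style elements using Python's standard xml.etree.ElementTree module. The helper tag_phrases is hypothetical, not a library or TEI tool, and the element and attribute choices (name with type="person", date with value, title with rend) follow TEI P4 conventions as we understand them; the teixlite DTD remains the authority on what is permitted.

```python
import xml.etree.ElementTree as ET


def tag_phrases(text, phrases):
    """Hypothetical helper: return a <p> element in which each (phrase, tag, attrs) is wrapped.

    Each phrase is tagged at its first occurrence; phrases must not overlap.
    Raises ValueError if a phrase is missing from the text or overlaps another.
    """
    # Locate each phrase, then work left to right so the mixed content is built in order.
    hits = sorted(((text.index(phrase), phrase, tag, attrs) for phrase, tag, attrs in phrases),
                  key=lambda hit: hit[0])
    para = ET.Element("p")
    para.text = ""
    last = None   # most recently added child; plain text after it belongs in its .tail
    cursor = 0    # position in `text` up to which content has been emitted
    for start, phrase, tag, attrs in hits:
        if start < cursor:
            raise ValueError(f"overlapping phrase: {phrase!r}")
        between = text[cursor:start]
        if last is None:
            para.text += between
        else:
            last.tail = between
        last = ET.SubElement(para, tag, attrs)
        last.text = phrase
        cursor = start + len(phrase)
    remainder = text[cursor:]
    if last is None:
        para.text += remainder
    else:
        last.tail = remainder
    return para


sentence = "Clifford Beck's 1950 Proposal of a Nuclear Reactor at North Carolina State College"
para = tag_phrases(sentence, [
    ("Clifford Beck", "name", {"type": "person"}),
    ("1950", "date", {"value": "1950"}),
    ("Proposal of a Nuclear Reactor at North Carolina State College", "title", {"rend": "italic"}),
])
print(ET.tostring(para, encoding="unicode"))
# <p><name type="person">Clifford Beck</name>'s <date value="1950">1950</date> <title rend="italic">Proposal of a Nuclear Reactor at North Carolina State College</title></p>
```

Because the person, date, and title are now marked explicitly, later steps can display or search them separately from the surrounding text.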
Step 4: Validating/Parsing/Viewing
In XML, tagging and encoding are governed by industry-wide standards. Tags must
be properly nested, closed in the reverse of the order in which they were
opened, and must follow precise rules governing their usage and placement. For a
given tag set, these rules appear in a document type definition (DTD). To
ensure that library documents meet teixlite practices, Special Collections runs
a program that checks the markup against the rules set forth in the DTD. This
check is referred to as parsing: the program reports any errors it encounters,
along with their locations in the document. When an error is reported, the
encoder locates and fixes it.
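Python's standard library can perform the first half of this check, well-formedness (properly nested, correctly closed tags), and report where the first error occurs, as sketched below. It does not validate against a DTD; checking documents against the teixlite DTD requires a validating parser (for example, the DTD class in the third-party lxml package). The function name check_well_formed is our own.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else (line, column, message) for the first error.

    Checks nesting and syntax only; this is not validation against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser stopped
        return line, column, str(err)
    return None


good = '<p><title rend="italic">First Temple of the Atom</title></p>'
bad = '<p><title rend="italic">First Temple of the Atom</p></title>'  # closed out of order

print(check_well_formed(good))  # None
problem = check_well_formed(bad)
print(f"error at line {problem[0]}, column {problem[1]}: {problem[2]}")
```

As with the library's parser, the report gives a location, so the encoder can go straight to the faulty tag.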
Step 5: Providing Access
Today's Internet browsers cannot display XML documents without translation.
To make XML files available to the public, they are translated into HTML
documents using the eXtensible Stylesheet Language (XSL). For example, an XSL
stylesheet converts the XML tags <title rend="italic"></title> into the HTML
tags <i></i> for display. The original file, however, retains the XML tags to
provide increased searching capabilities.
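Python's standard library has no XSLT processor, so the sketch below imitates the effect of such a stylesheet rather than reproducing XSL itself. It copies a TEI-style tree, turning a title element with rend="italic" (TEI's rendition attribute) into HTML <i> and mapping any other non-paragraph element to a <span> whose class records the original tag name; that fallback rule is a hypothetical choice. In an actual XSL stylesheet, each of these rules would be written as an xsl:template.

```python
import xml.etree.ElementTree as ET


def tei_to_html(tei):
    """Return an HTML copy of a TEI-style element tree (a stand-in for an XSL stylesheet)."""
    if tei.tag == "title" and tei.get("rend") == "italic":
        html = ET.Element("i")                         # italic title -> <i>
    elif tei.tag == "p":
        html = ET.Element("p")                         # paragraphs map straight across
    else:
        html = ET.Element("span", {"class": tei.tag})  # hypothetical fallback for other tags
    html.text = tei.text
    for child in tei:
        html.append(tei_to_html(child))
    html.tail = tei.tail  # keep the text that follows the element
    return html


source = ET.fromstring('<p><name type="person">Raymond Murray</name> donated his '
                       '<title rend="italic">Reactor Notebook</title>.</p>')
print(ET.tostring(tei_to_html(source), encoding="unicode"))
# <p><span class="name">Raymond Murray</span> donated his <i>Reactor Notebook</i>.</p>
```

The source tree is left untouched, mirroring the practice of keeping the tagged XML file while serving the translated HTML.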
Search engines can then be programmed to search for words only within specific
tags rather than performing a generic keyword search (e.g., one can find all
documents dated 1952 with Raymond Murray as the author). Presently, Internet
Explorer 5.5 and 6 are the only commercially available browsers that support
XML documents translated with XSL. Because of these browser limitations, Special
Collections provides both XML and HTML versions of the reactor documents.
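The difference between a tag-aware search and a generic keyword search can be sketched with the limited XPath support in Python's xml.etree.ElementTree. The three documents below are hypothetical and use a deliberately simplified structure (author and date directly under teiHeader, whereas a real TEI header nests them more deeply), and the function names are our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified documents: metadata in <teiHeader>, transcription in <text>.
DOCS = {
    "doc-a": ('<TEI.2><teiHeader><author>Raymond Murray</author>'
              '<date value="1952">1952</date></teiHeader>'
              '<text><p>Sample text</p></text></TEI.2>'),
    "doc-b": ('<TEI.2><teiHeader><author>Raymond Murray</author>'
              '<date value="1953">1953</date></teiHeader>'
              '<text><p>Sample text</p></text></TEI.2>'),
    "doc-c": ('<TEI.2><teiHeader><author>Clifford Beck</author>'
              '<date value="1951">1951</date></teiHeader>'
              '<text><p>Mentions Raymond Murray and the year 1952</p></text></TEI.2>'),
}


def keyword_search(docs, *terms):
    """Generic search: every term appears somewhere in the document's text."""
    hits = []
    for doc_id, xml_text in docs.items():
        full_text = " ".join(ET.fromstring(xml_text).itertext())
        if all(term in full_text for term in terms):
            hits.append(doc_id)
    return hits


def fielded_search(docs, author, date_value):
    """Tag-aware search: the author must be in <author> and the year in <date value=...>."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        has_author = any(author in (a.text or "") for a in root.iter("author"))
        has_date = root.find(f".//date[@value='{date_value}']") is not None
        if has_author and has_date:
            hits.append(doc_id)
    return hits


print(keyword_search(DOCS, "Raymond Murray", "1952"))  # ['doc-a', 'doc-c']
print(fielded_search(DOCS, "Raymond Murray", "1952"))  # ['doc-a']
```

The keyword search also matches doc-c, which merely mentions Murray and 1952 in its text; the tag-aware search returns only the document whose author and date fields match. A production search engine would index these fields in advance rather than parse every file per query.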