Endeca at the NCSU Libraries

The Technology Behind the Endeca Catalog

Basic information

The new online catalog is an implementation of the Endeca Information Access Platform Guided Navigation software. The Endeca software provides a "navigation engine" process that responds to search queries, along with an API for building a web application that communicates with this back-end server. NCSU created a servlet-based Java web application that uses URL parameters to construct the user's query, sends that query to the Endeca navigation engine, and displays the results. The application server uses Apache and Tomcat.

Once a properly formatted query is submitted, the Endeca navigation engine returns a complex object that includes the resulting records and their properties (title, author, etc.) as well as all available refinements. The web application parses this object to display the results list page. JavaBeans are used to parse holdings-level data into easy-to-use objects for the JSP files that actually produce the resulting web pages.

[Figure: architecture diagram]

Nearly every feature on the results list page is enabled using Endeca's pre-defined URL parameters. This includes sorting results and paging through result sets.
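
As a rough illustration of how a results-page request might be assembled from URL parameters, the sketch below builds a query string in Python. The parameter names (N for selected refinements, Ntt for search terms, Ns for the sort key, No for the record offset) are assumed from Endeca's common URL conventions and should be checked against the Endeca documentation. The function name, base URL, and sort key are hypothetical and are not taken from NCSU's application.

```python
from urllib.parse import urlencode


def build_query_url(base_url, search_terms=None, refinement_ids=(),
                    sort_key=None, offset=0):
    """Assemble a results-page URL from query-string parameters.

    Parameter names (N, Ntt, Ns, No) are assumptions for illustration.
    """
    params = {}
    # Selected refinement IDs, space separated; "0" is assumed to mean "none".
    params["N"] = " ".join(str(i) for i in refinement_ids) or "0"
    if search_terms:
        params["Ntt"] = search_terms   # free-text search terms
    if sort_key:
        params["Ns"] = sort_key        # property to sort the results by
    if offset:
        params["No"] = str(offset)     # index of the first record on the page
    return base_url + "?" + urlencode(params)


# Hypothetical usage: third page of 10 results, sorted by title.
url = build_query_url("http://example.org/search",
                      search_terms="civil war", sort_key="title", offset=20)
```

Because sorting and paging are just parameters on the URL, moving to the next page or changing the sort order only requires rebuilding the link with a different offset or sort key.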

Call numbers

Since Endeca is not a library-specific application, extra logic was required to enable correct sorting for Library of Congress call numbers. The first LC call number for a title is identified as the sorting call number. Then NCSU uses a Perl script to add padding to this sorting call number. When padded, the call numbers sort correctly using Endeca's default ASCII alphabetical sort.
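
The padding idea can be sketched as follows. This is a minimal Python illustration, not NCSU's Perl script: the field widths (three characters for the class letters, four digits for the whole number, six for the decimal part) and the function name are assumptions chosen for the example.

```python
import re

# Class letters, whole number, optional decimal part, then the remainder
# (Cutter numbers, year, etc.), which is passed through unchanged.
_LC_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?\s*(.*)$")


def pad_call_number(call_number):
    """Return a padded form of an LC call number that sorts correctly as ASCII."""
    match = _LC_PATTERN.match(call_number.upper())
    if match is None:
        # Not recognizably LC; fall back to the normalized string.
        return call_number.upper().strip()
    letters, whole, fraction, rest = match.groups()
    # Space-pad the letters so "Q" sorts before "QA"; zero-pad the number
    # so 9 sorts before 76; right-pad the decimals so .73 sorts before .9.
    padded = f"{letters:<3}{int(whole):04d}.{(fraction or ''):0<6} {rest}"
    return padded.rstrip()


calls = ["QA76.9 .D3", "QA9 .B5", "Q335 .R86", "QA76.73.J38 S55"]
in_shelf_order = sorted(calls, key=pad_call_number)
```

Without padding, a plain ASCII sort would place "QA9 .B5" after "QA76.9 .D3", because the character "9" sorts after "7".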

To make the LC Class dimension work properly, NCSU built a similarly padded LC hierarchy based on the documentation on the Library of Congress web site. The hierarchy creates call number ranges into which the padded LC class numbers fall. Unlike call number sorting, which uses only the first call number for a title, the LC Class dimension identifies the LC class number for each item belonging to a title, since it is theoretically possible for a single title to have different LC class numbers. NCSU uses only the first 1-3 letters and the following decimal number of the call number for the LC class number (the Cutter number is excluded).
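
A small sketch of the range lookup, again in Python and under stated assumptions: the padding widths and function names are hypothetical, and the two sample ranges are illustrative entries only, not NCSU's actual hierarchy.

```python
import re

# Only the class letters and the decimal number; the Cutter is ignored.
_CLASS_PATTERN = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?")


def padded_class(call_number):
    """Return the padded LC class number for a call number, or None."""
    match = _CLASS_PATTERN.match(call_number.upper())
    if match is None:
        return None
    letters, whole, fraction = match.groups()
    return f"{letters:<3}{int(whole):04d}.{(fraction or ''):0<6}"


# (label, padded start, padded end), listed from broadest to narrowest.
# Sample ranges for illustration only.
SAMPLE_HIERARCHY = [
    ("Mathematics", padded_class("QA1"), padded_class("QA939")),
    ("Electronic computers. Computer science",
     padded_class("QA75.5"), padded_class("QA76.95")),
]


def class_path(call_number, hierarchy=SAMPLE_HIERARCHY):
    """Return the labels of every range that contains the class number."""
    key = padded_class(call_number)
    if key is None:
        return []
    return [label for label, start, end in hierarchy if start <= key <= end]
```

With these sample ranges, `class_path("QA76.73.J38 S55 2005")` falls into both ranges, while `class_path("QA300 .R8")` falls only into the broader one. Because the bounds and the key are padded the same way, plain string comparison is enough to test range membership.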

Data processing

The Endeca portion of the online catalog does not use a live index of NCSU's MARC data. NCSU started with a full bibliographic and item record export from SirsiDynix Unicorn. Each night at 00:30, the system generates a report of records that have been added to or modified in the MARC database. Using the SirsiDynix cat_key (a unique identifier for the record), the Endeca database is updated. The process works like this:
  1. System creates list of updated cat_keys
  2. System extracts MARC records and holdings (stored in MARC 999) for the list of cat_keys
  3. The MARC records are reformatted into a flat file using MARC4J
  4. A Perl script compares the modified/added record list to the records already in the Endeca data, and overwrites or adds records based on a cascading rule of unique identifiers (unfortunately, SirsiDynix does not have a unique record identifier in the MARC record)
  5. Once the data is updated, a full index of the Endeca data is run (note: Endeca does offer a partial-indexing module, i.e., one that processes updates only)
  6. Endeca Navigation engine is restarted
The process outlined above is managed with a shell script that runs every night from cron. The entire process, including re-indexing the entire database, takes approximately 7 hours. Future changes in the architecture of the technical backend should cut this re-indexing time in half.
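
The compare-and-overwrite step (step 4) might look roughly like the sketch below. It is written in Python rather than Perl, and everything specific in it is an assumption: the record layout, the function names, and the identifier fields tried in the cascade are hypothetical, since the actual cascading rule is not described here.

```python
def record_key(record, id_fields):
    """Return the first non-empty identifier, tagged with its field name."""
    for field in id_fields:
        value = record.get(field)
        if value:
            return (field, value)
    return None


def apply_updates(existing, updates, id_fields):
    """Overwrite or add updated records; return the merged data and counts."""
    merged = dict(existing)
    counts = {"added": 0, "overwritten": 0, "skipped": 0}
    for record in updates:
        key = record_key(record, id_fields)
        if key is None:
            counts["skipped"] += 1       # no usable identifier at all
            continue
        if key in merged:
            counts["overwritten"] += 1   # replace the stale copy
        else:
            counts["added"] += 1         # record is new to the data set
        merged[key] = record
    return merged, counts


# Hypothetical identifier cascade and sample records.
ID_FIELDS = ("cat_key", "control_number")
existing = {("cat_key", "100"): {"cat_key": "100", "title": "Old title"}}
updates = [
    {"cat_key": "100", "title": "New title"},
    {"cat_key": "", "control_number": "ocm1", "title": "Another title"},
    {"title": "No identifier"},
]
merged, counts = apply_updates(existing, updates, ID_FIELDS)
```

Tagging each key with the field it came from keeps identifiers from different fields from colliding when they happen to share a value.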

New Features

We are continually working to improve the functionality of our Endeca catalog.

The CatalogWS (web services) project is one such effort to make catalog data available via XML through a simple web API. Our goal is to enable easier access to the catalog data so that other applications can take advantage of it. See CatalogWS Applications for a list of applications that take advantage of these web services.