Narrative - DPLA Collection Achievements and Profiles System

Introduction

Our DPLA Beta Sprint contribution describes a model for how cultural heritage organizations with an existing online presence can maximize the discovery and use of their digital library collections through a minimal amount of centralized coordination facilitated by the DPLA. We advance the notion of the DPLA as a platform that can help both large and small organizations (including, but not limited to, libraries, museums, archives, and other cultural organizations) maximize their efforts to “make the cultural and scientific heritage of humanity available, free of charge, to all” 1. We believe that the DPLA should encourage community adoption of a variety of lightweight standards that enable collection managers and service providers to collaborate in new ways.

A major early obstacle facing the DPLA initiative is getting a critical mass of content providers to participate in the collective effort. The potential group of organizations that can contribute content to the DPLA effort is vast. These organizations operate under varying human, financial, and technical resource constraints. Given the DPLA's aspirations as a “big tent” initiative, we believe it is critically important that the DPLA take an approach that is inclusive and extensible from the very start. Our contribution describes one approach to achieve these goals.

This narrative aims to describe our proposal in enough detail that the reader can envision a hypothetical working system. It is not intended to function as a technical specification, program plan, or description of future work. This contribution is primarily one of ideas, and is intended to stimulate thinking about the role of the DPLA.

Assumptions and Scope

“The future is already here—it’s just not very evenly distributed” -William Gibson

The World Wide Web provides an ideal infrastructure for making DPLA-affiliated content widely available. Although we don’t prescribe a specific technical implementation, we envision the proposed system utilizing a Web-based architecture that enables relatively low barrier interoperability with other Web-based services.

We believe that a significant amount of cultural heritage content is already published in digital form on the Web, but is largely hidden and poorly discoverable by the general public. In this context, we think that the DPLA initiative is well positioned to act as a catalyst to virtually move large amounts of cultural heritage content from the invisible Web 2 to the visible and interoperable Web. Further, we assume that continued digitization efforts and improvements in Web publishing tools will bring greater amounts of content onto the Web in the coming years. We believe the DPLA has a potentially significant role in improving the discovery and reuse of this content.

In our model, the DPLA provides a shared infrastructure for describing collections in a common way that can be utilized by many different service providers. The scope of our proposed system is to provide an open data platform rather than a specific service, tool, or user-facing application.

Guiding Principles

“Simple things should be simple, complex things should be possible.” -Alan Kay

Our proposed system was shaped by a set of guiding principles that our team formulated at the start of our Beta Sprint effort. We share these to help explain some of the design choices in the proposed system, and to advance one point of view regarding the role of the DPLA.

  1. Provide an extremely low barrier for collective participation
  2. Provide a clear path for organizations and institutions to deepen their engagement with the initiative over time
  3. Make explicit the benefits of participation, and provide an immediate return on investment wherever possible
  4. Enable the general public to participate in the description and promotion of collections
  5. Incorporate successful models of community building and engagement from the broader World Wide Web
  6. Leverage existing Web-based tools, protocols, and standards to maximize interoperability on the Web
  7. Assume that the best uses of this infrastructure will emerge over time, therefore design for extensibility from the start

Proposed System

“The Web is full of these kind of unanticipated uses. I think that’s part of the DNA of the Web… is that once it’s up there, people will make use of it in ways you can’t anticipate. We haven’t leveraged that fact enough.” -Daniel J. Cohen

This section describes a straw man proposal for a DPLA-managed web-based platform to encourage the discovery and use of existing cultural heritage collections among the general public.

Architectural Model

Our proposed system consists of three components.

  1. Collection Profiles
  2. Collection Achievements
  3. Contributor User Model

These components work together to create a platform intended to aggregate information about an ever growing number of collections on the open Web. Collection Profiles provide a mechanism for describing distributed collections in a common way, and Collection Achievements provide a mechanism for extending the scope of these descriptions to enable new services. We also propose a simple user model to manage metadata contributions from the broader community.

Collection Profiles

The fundamental component of the proposed system is a DPLA-managed database of Collection Profiles about extant cultural heritage collections on the Web. Our inspiration for this comes from social networking websites such as Facebook and LinkedIn that have been very successful in aggregating richly descriptive profiles of people and organizations on the Web. These social networking platforms have also enabled third-party developers to create innovative services that leverage the structured data contained in these profiles. Our model proposes something similar for cultural heritage organizations. Rather than build an online directory of individuals or organizations, we propose that the DPLA build an online editable directory of collections.

Collection Profiles are intended as a means to an end. The goal of this work is to enable service providers and developers to leverage this shared description infrastructure to build new aggregation and interoperability services. These services could include search indexing services, visualization tools, instructional tools, promotional tools, analytics services, and recommendation services. Since DPLA Collection Profiles are published on the open Web, these DPLA-scale services could be developed by the DPLA organization itself, the extended DPLA community, or third-party developers. This open data approach means that services can be developed without prior coordination with, or consent from, the DPLA.

We propose a Wikipedia-style community editing approach for building Collection Profiles. In this model, anyone with a registered user account can create a DPLA Profile record for a given collection. We advocate for a “minimum viable collection profile” that requires only a collection name, URL, and a brief description. This approach optimizes for increasing the number of collections represented in the system, rather than optimizing for collection description quality or comprehensiveness. The assumption is that volunteers will improve these Profiles over time.
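To make the “minimum viable collection profile” concrete, the following is a minimal sketch of such a record and its validation. The class name, field names, and validation rules are our own illustrative assumptions; the proposal itself only specifies that a name, URL, and brief description are required.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CollectionProfile:
    """Hypothetical minimum viable profile: name, URL, brief description."""
    name: str
    url: str
    description: str
    # Achievements are optional and can be attached later (assumed field).
    achievements: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the profile is acceptable."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")
        if not self.description.strip():
            problems.append("description is required")
        return problems


profile = CollectionProfile(
    name="Virginia Military History Museum",
    url="http://example.org/collection",  # placeholder URL
    description="An unclaimed profile created by a volunteer.",
)
print(profile.validate())
```

Keeping the required set this small reflects the stated goal of optimizing for the number of collections represented rather than for description quality.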

Figures 1-3 illustrate webpage representations of two hypothetical Collection Profiles. The first Profile (“Virginia Military History Museum”) is unclaimed, whereas the second Profile (“Cape Fear North Carolina Maps Collection”) is claimed by a user named “Cape Fear Libraries”. Figure 4 illustrates one way of providing machine-optimized access to profile data, specifically by providing JSON API access over HTTP. The Collection Profile data could be serialized in many different formats.
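As a rough illustration of what a JSON serialization of a Collection Profile might contain, the sketch below builds one from a handful of fields. Every key name here is a hypothetical choice for illustration and is not taken from the figures; the placeholder URL is likewise an assumption.

```python
import json


def serialize_profile(name, url, description, claimed_by=None, achievements=()):
    """Serialize a profile to JSON. All key names are illustrative assumptions."""
    record = {
        "name": name,
        "url": url,
        "description": description,
        "claimed": claimed_by is not None,
        "claimed_by": claimed_by,
        "achievements": sorted(achievements),
    }
    return json.dumps(record, indent=2, sort_keys=True)


print(serialize_profile(
    name="Cape Fear North Carolina Maps Collection",
    url="http://example.org/maps",  # placeholder URL
    description="Historic maps of the Cape Fear region.",
    claimed_by="Cape Fear Libraries",
    achievements=["Sitemap", "KML"],
))
```

Because the record is a plain mapping, the same data could be emitted in other formats without changing the underlying model.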

In addition to providing API access to individual profile data, the DPLA could also provide baseline services that aggregate DPLA-wide profile data. These baseline services could take the form of APIs or data dumps that make specialized services easier to build. Examples include DPLA-wide search APIs, DPLA collection recommender tools, or aggregated sitemaps for vertical search engines. This layered services model has great potential to fulfill the promise of the DPLA as a “generative platform for undefined future use cases” 3.

Collection Achievements

DPLA Collection Achievements provide a mechanism for expanding Collection Profile descriptions. Collection Achievements are named entities that represent specialized collection enhancements that can be attached to DPLA Collection Profiles. Achievements provide a mechanism for making the activity of standards adoption more explicit.

The Achievement idea is inspired by the achievement/badge gamification technique prevalent in gaming culture. In a gaming context, achievements provide a framework for gamers to complete micro goals that emphasize different aspects of gameplay, rather than focusing on “leveling-up” along a singular path. In the DPLA context, Achievements provide micro goals for enhancing Profile descriptions along many different vectors, thus enabling small and large content providers to participate equally. Each collection can collect a set of Achievements that is tailored to the attributes of the collection. The Achievements model provides a versatile framework for describing collections, rather than imposing a one-size-fits-all or a lowest common denominator approach to describing collections.

Figure 6 illustrates a collection-specific browsing interface for Achievements. Appendix A includes a longer list of proposed Achievements organized by category.

We intentionally selected a diverse set of Achievements to stimulate thinking about this model. Additional planning work would be required to identify the best starter list of Achievements for the program. Ideally, Achievements would be defined via a community-driven process so that the most useful and accessible Achievements are given priority. Over time, new Achievements could be minted that capitalize on new technology or new models for interacting with collections.

Where applicable, each Achievement should have an explicit set of requirements and supporting documentation. Ideally, verification of these requirements would happen in an automated way. We envision a suite of verification and validation tools managed by the DPLA that would periodically check to see if a collection has fulfilled the requirements of an Achievement.
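One way an automated check could look, taking the “Robot Friendly” Achievement from Appendix A as an example, is sketched below using Python's standard `urllib.robotparser` module. The function name and the pass criterion (the collection's URL may be crawled by a generic user agent) are our assumptions, and the sketch assumes the robots.txt body has already been fetched by some other part of the system.

```python
from urllib.robotparser import RobotFileParser


def is_robot_friendly(robots_txt: str, collection_url: str, user_agent: str = "*") -> bool:
    """Hypothetical verifier: does robots.txt allow crawling the collection URL?

    robots_txt is the already-downloaded text of the site's robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, collection_url)


open_site = "User-agent: *\nDisallow: /private/\n"
closed_site = "User-agent: *\nDisallow: /\n"

print(is_robot_friendly(open_site, "http://example.org/maps/"))
print(is_robot_friendly(closed_site, "http://example.org/maps/"))
```

A periodic job could run checks of this kind for each Profile and grant or withdraw the corresponding Achievement based on the result.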

Once a mature set of Achievements is in place, it should be possible to algorithmically recommend specific Achievements for specific Collection Profiles, based on attributes of the Profile/Achievement network. These recommendations would encourage collection managers to expand their Collection Profiles in relevant ways.
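The recommendation idea could be approached in many ways. The sketch below uses a simple co-occurrence heuristic that we have chosen purely for illustration: Achievements held by Profiles that already share Achievements with the target Profile are ranked by the size of that overlap. The function name and the sample data are hypothetical.

```python
from collections import Counter


def recommend_achievements(profiles: dict, target: str, limit: int = 3) -> list:
    """Suggest Achievements the target lacks, ranked by overlap with similar Profiles.

    profiles maps a Profile name to the set of Achievement names it holds.
    """
    own = profiles[target]
    scores = Counter()
    for name, earned in profiles.items():
        if name == target:
            continue
        overlap = len(own & earned)
        if overlap == 0:
            continue  # Profiles with nothing in common contribute no evidence
        for achievement in earned - own:
            scores[achievement] += overlap
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [achievement for achievement, _ in ranked[:limit]]


sample = {
    "A": {"Sitemap", "Robot Friendly"},
    "B": {"Sitemap", "Robot Friendly", "Microdata"},
    "C": {"Sitemap", "KML"},
    "D": {"Donate"},
}
print(recommend_achievements(sample, "A"))
```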

The Achievement model has great potential for advancing community adoption of metadata standards. The proposed system would not only increase the visibility of specific standards, but also increase the visibility of specific collections that have adopted a standard. This additional information should make it easier for developers to build services for DPLA collections that require some level of standardization across collections.

Contributor User Model

To manage contributions to Collection Profiles, we envision the need for a supporting user account model. This could follow the pattern of web-based user account management, whereby interested contributors go through a minimal online registration process to create a user account. These accounts should require an email address, to provide a hook for subsequent notifications about changes to Collection Profiles. The system would support a many-to-many relationship between users and collections, so one user could contribute metadata about multiple collections, and one collection could be edited by multiple users.
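The many-to-many relationship between users and collections can be sketched with a conventional join table. The example below uses an in-memory SQLite database; the table and column names are illustrative assumptions, not a proposed schema.

```python
import sqlite3

# Hypothetical schema: a join table links users to the collections they edit.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contributions (
    user_id INTEGER NOT NULL REFERENCES users(id),
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    PRIMARY KEY (user_id, collection_id)
);
"""


def build_demo_db():
    """Create an in-memory database with two users and two collections."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "a@example.org"), (2, "b@example.org")])
    conn.executemany("INSERT INTO collections VALUES (?, ?)",
                     [(1, "Maps"), (2, "Yearbooks")])
    # User 1 edits both collections; user 2 edits only the first.
    conn.executemany("INSERT INTO contributions VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    return conn


def emails_to_notify(conn, collection_id):
    """Return the email addresses of everyone who has edited a collection."""
    rows = conn.execute(
        "SELECT u.email FROM users u "
        "JOIN contributions c ON c.user_id = u.id "
        "WHERE c.collection_id = ? ORDER BY u.email",
        (collection_id,),
    )
    return [email for (email,) in rows]


conn = build_demo_db()
print(emails_to_notify(conn, 1))
```

The query shows how the required email address serves as the hook for change notifications.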

Our proposed user model distinguishes between two categories of contributors: Volunteers and Collection Managers. Volunteers are members of the general public that contribute metadata about one or more collections. Volunteers need not have a formal relationship with the organizational entity that manages the collection. We assume that Volunteers could provide the minimal metadata required to create new Collection Profiles. We also assume that some Collection Achievements could be completed by Volunteer contributions. For example, a Volunteer could provide basic information about the physical location of a collection, or provide links to published RSS feeds for a given collection.

Collection Managers are registered users that have expanded profile editing privileges for one or more collections, by virtue of their formal affiliation with a collection. The implementation should provide the option for Collection Managers to be listed by organizational name rather than personal name. To become a Collection Manager, the user would need to go through a process of “claiming” a Collection Profile. Ideally the claim would be handled through an automated verification process. Once a collection is claimed, certain parts of the Collection Profile, such as the basic collection description, could be locked from further editing by Volunteers. Additionally, Collection Managers would have expanded access to Achievements not available to Volunteers, such as the ability to complete Reuse or Administrative Achievements (see Appendix A).
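The editing rules described above can be sketched as a small permission check. Which fields lock after a claim is a policy decision; the set below, like the function names and the dictionary layout, is an assumption made for illustration, and the verification step behind a claim is deliberately left out.

```python
# Assumption: only the basic description locks once a Profile is claimed.
LOCKED_AFTER_CLAIM = {"description"}


def claim(profile: dict, user: str) -> None:
    """Record a user as Collection Manager. Real claim verification is out of scope."""
    profile.setdefault("managers", set()).add(user)


def can_edit(profile: dict, user: str, field_name: str) -> bool:
    """Managers may edit anything; Volunteers lose locked fields once a claim exists."""
    managers = profile.get("managers", set())
    if user in managers:
        return True
    return not (managers and field_name in LOCKED_AFTER_CLAIM)


profile = {"name": "Cape Fear North Carolina Maps Collection"}
print(can_edit(profile, "volunteer1", "description"))  # unclaimed Profile
claim(profile, "Cape Fear Libraries")
print(can_edit(profile, "volunteer1", "description"))  # claimed Profile
```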

The proposed split between Volunteers and Collection Managers is intended to strike a compromise between the benefits of crowdsourced contributions and the challenges of quality control over Profiles that represent a real-world entity. The user model could be simplified further by only allowing profile editing by verified Collection Managers, though this approach would run the risk of significantly limiting the representation of collections in the program long term.

Issues and Challenges

During the Sprint process, we identified several issues that would need to be addressed if the proposed system were implemented.

In general, getting critical mass participation for a program this broadly scoped is always challenging at the start. Our model describes a crowdsourced approach for creating DPLA collection profiles. This approach has the potential of increasing the breadth of collections represented in the proposed DPLA Collection Profile system by enabling the general public to participate in the process of identifying and describing collections, rather than managing this centrally with a small group of curators. However, this approach introduces additional quality and scope management challenges that would need attention.

Another issue we identified is how to determine whether a specific collection is “in scope” for the DPLA. This is a broader issue for the DPLA that is not limited to our proposed solution; however, the open profile creation model intensifies the issue.

Our model describes a system that assumes that cultural heritage organizations manage one or more discrete collections identified by a unique URL. In practice, some collections operate as a sub-collection of a larger digital library collection or online portal. These sub-collections may merit their own representation within the DPLA Collection Profile system, including their own Achievements. We did not design a solution to this issue, though we imagine it could be addressed by extending the Collection Profile schema to support named child collections, each with their own basic metadata set. Alternatively, these sub-collections could be handled as discrete collections with “is part of” relations between parent and child collections.
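The second option, discrete collections linked by “is part of” relations, could be represented as a simple mapping from child to parent. This is a sketch under our own assumptions; the collection names and helper functions are hypothetical.

```python
def ancestors(is_part_of: dict, collection: str) -> list:
    """Walk "is part of" links upward, nearest parent first."""
    chain = []
    seen = {collection}
    current = is_part_of.get(collection)
    # The seen set guards against accidental cycles in community-edited data.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = is_part_of.get(current)
    return chain


def children_of(is_part_of: dict, parent: str) -> list:
    """List the direct sub-collections of a parent, in alphabetical order."""
    return sorted(child for child, p in is_part_of.items() if p == parent)


# Hypothetical relations: child collection -> parent collection.
relations = {
    "Civil War Letters": "Special Collections",
    "Special Collections": "University Digital Library",
}
print(ancestors(relations, "Civil War Letters"))
print(children_of(relations, "Special Collections"))
```

Under this approach each sub-collection keeps its own Profile and Achievements, and the hierarchy is derived from the relations rather than stored in a nested schema.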

Our approach is primarily oriented toward improving the discovery of standalone online collections and cultural heritage websites, rather than aggregating items across collections. Large-scale item-level aggregation of heterogeneous content that is managed in a decentralized way is a very challenging problem that the digital library community has addressed with mixed success. Despite the emphasis on collection-level description in our approach, we believe that our model could provide inroads into this problem area by providing a platform that facilitates community adoption of item-level metadata standards. For example, the combination of the “Sitemap” and “Item Microdata” Achievements would provide a rudimentary approach for harvesting item-level metadata from a collection website. New standards that facilitate item-level aggregation could be layered in over time.
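The first half of that harvesting approach, discovering item pages from a sitemap, might look like the sketch below. It assumes the collection publishes a sitemap in the standard sitemaps.org XML format and that the file has already been downloaded; the function name and sample URLs are hypothetical. Extracting Microdata from each listed page would be a separate step that is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def item_urls_from_sitemap(sitemap_xml: str) -> list:
    """Return the page URLs listed in a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/maps/item/1</loc></url>
  <url><loc>http://example.org/maps/item/2</loc></url>
</urlset>
"""
print(item_urls_from_sitemap(sample_sitemap))
```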

Finally, our model does not provide specific guidance on how third-party services are integrated into the proposed DPLA platform. One approach would be to tightly integrate services into the shared platform, not unlike what Facebook and LinkedIn have done with their developer platforms. This would allow the relationship between Achievements and services to be more explicit and tightly coupled. It would also put the DPLA in the position of managing a development platform, which is a significant resource commitment. An alternative approach would be to focus on building collection profiles and provide minimal public API access to Profile data. This lightweight approach would enable many different kinds of services to leverage the DPLA effort without prior coordination. More thinking is needed about the level of services integration in this model.

Conclusion

“The best way to get good ideas is to get lots of ideas, and throw the bad ones away.” -Linus Pauling

The model presented in this Beta Sprint contribution is intended to stimulate thinking about one possible direction of effort for the DPLA initiative. We acknowledge that a significant amount of upfront development effort would be required to implement the proposed system. The scope of our Beta Sprint effort did not include estimating these costs, nor did it include research into sustainability issues. These are items for future work.

If the proposed system cannot be implemented as described, perhaps it would be worthwhile to initiate a pilot project that focuses on aggregating a minimal set of actionable metadata about DPLA-affiliated collections, to gauge community interest in such a program, and identify potential obstacles to broader participation.

Alternatively, components of this proposal could be integrated into other DPLA initiatives as appropriate. These components could include Web-based collection profiles, crowdsourced profile editing, or the Achievements model.

We look forward to exchanging ideas about this proposal with interested parties.

Acknowledgements

We would like to thank members of the DLF & IMLS/DCC Beta Sprint Project team for their feedback and comments on an early version of this proposal.

Notes

  1. Concept Note – Digital Public Library of America https://cyber.law.harvard.edu/dpla/Concept_Note 
  2. https://en.wikipedia.org/wiki/Invisible_Web
  3. https://inkdroid.org/data/dpla-amsterdam.pdf 

Appendix A: Achievement Ideas

Achievements are grouped into broad families (all caps) and, within each family, more specific categories (each on its own line). Individual Achievements are listed by name, followed by a colon and a brief description.

CONTENT

Firsts

Genealogy: One of the first 100 collections verified in this category
Maps: One of the first 50 collections verified in this category
Yearbooks: One of the first 10 collections verified in this category

Books

Full Text: The full text of books is available, either from OCR of digitized books or as original full text
PDF: Books are available as PDFs
Epub: Books are available as EPUB

Images

Faces: Images are categorized by whether the image contains a face or not. This can be done through automated means.

Audio

Transcript: Transcripts are available for audio in this collection

Video

Closed Captioned: Video is made accessible with closed captions

ACCESS/PROMOTION

Real World Access

Visitor Info: Information is provided on address, phone number, access policy, admission, hours of operation, and latitude/longitude
Tour: A tour of the physical collection can be arranged for individuals or groups

Current Awareness

Calendar of Events: Users can subscribe to a calendar of public events related to the collection
New Items Feed: A newsfeed (e.g., RSS, Atom) is available that highlights new items in the collection
Collection Blog: A blog about the collection is available

Online Access

Mobile Access: A mobile-optimized access point is provided for the collection
OpenSearch: The collection can be searched, and a URL template has been provided to allow keyword searching of the collection from external applications
Exhibit: An online exhibit is available that uses content from the collection

Audience

Target Audience: The collection is targeted at a specific audience (e.g., K-12, General Public, Higher Education, Named Community)
Reading Level: For text-based resources, an indication is given of the appropriate reading level

Search Engine Optimization

Robot Friendly: Verification checks for a robots.txt file and whether the site allows crawlers
Sitemap: An XML sitemap is provided of the most important pages for robots to crawl

REUSE

Teaching and Learning

Teaching Tools: K-6: Teaching tools, such as lesson plans, are provided that use materials from the collection and target elementary school students
Teaching Tools: HS: Teaching tools are provided that target high school students
Quiz Tool: An interactive quiz tool is available that engages users with the collection

Data/APIs

Data Dump: Collection metadata and/or content is available as a data dump
Data Access: Collection metadata and/or content is available through one or more APIs
KML: Item-level geospatial metadata is available via KML

Items and Objects

RDFa: Web pages are marked up with RDFa to expose item-level metadata
Microdata: Embedded HTML5 Microdata is used to mark up web pages and expose item-level metadata

Rights Management

Rights Statement: A statement is provided regarding the reuse terms applied to a collection or items within a collection
CC License: A machine-readable Creative Commons license is applied to a collection or items within a collection

COMMUNITY

Fundraising

Donate: A mechanism is provided for users to donate money online to help support the collection

Social Media

Discussion List: An email list is available for discussion of the collection
Comments: Items within the collection have comment streams
Twitter: The collection has an associated Twitter account to provide more information and connect with users

Crowdsource

Transcription: The collection allows users to transcribe materials that could not be OCR’d
Tagging: The collection allows users to describe items in the collection using conventional tagging

ADMINISTRATIVE

Upkeep

Claimed: Indicates that the profile has been claimed
Annual Review: A Collection Manager has logged in and verified information about the collection in the last 12 months

Analytics

Tracking: The collection enables site tracking through a JavaScript snippet
Share to Compare: The collection allows anonymized analytics to be aggregated with other DPLA collections for comparative analyses

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.