Components and Functions


Metadata Management System Components and Functions

Metadata Repository

One of the primary functions of the Metadata Management infrastructure is the aggregation of metadata, both from multiple metadata providers and from services that normalize and enhance that metadata. The Metadata Repository (MR) component provides the central storage for this aggregated data.

Metadata coming into the MR is shredded into individual elements or statements which form the base storage unit in the database. Each statement is uniquely identified, and an association is maintained with its parent item record, the collection or service providing that item record, and the metadata or service provider that is providing that service.
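As a rough illustration of the shredding described above, the sketch below splits one item record into individually identified statements. The `Statement` class, its field names, and the sample identifiers are all hypothetical; the actual MR storage schema is not specified here.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Statement:
    statement_id: int  # unique identifier for this statement
    record_id: str     # parent item record
    service_id: str    # collection or service that supplied the record
    provider_id: str   # provider responsible for that service
    element: str       # e.g. "dc:title"
    value: str


_next_id = count(1)  # hypothetical id source; a real store would use the database


def shred(record_id, service_id, provider_id, elements):
    """Split one item record into individually identified statements."""
    return [
        Statement(next(_next_id), record_id, service_id, provider_id, element, value)
        for element, value in elements
    ]


statements = shred(
    "oai:example.org:42",
    "example-collection",
    "example-provider",
    [("dc:title", "Intro to Tides"), ("dc:subject", "Oceanography")],
)
```

Each statement carries its own identifier plus the record, service, and provider it came from, so it can be tracked on its own.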

The MR also tracks changes and updates to each statement and maintains a record of each harvest that creates or updates a statement. This detailed data provenance allows very fine-grained data analysis and gives downstream services that consume data from the MR great flexibility in tuning their data usage to their needs.

Harvest Services

In order to get data into the MR, there must be a comprehensive suite of services to manage the data sources and services providing the data, and to coordinate and manage the flow of data into the MR.

Service Provider Registration

Service Providers are the organizations and individuals that provide and manage the Services. A single Service Provider may be responsible for one or more Services. The MMS lets Service Providers identify themselves and their organization, register organizational, administrative and technical contacts, manage the authentication and level of authorization for these individuals within the system, and register the individual services they're providing.

Service Registration

Both services that provide metadata and the services that provide additional enhancements and cleanup (all just 'Services' from now on) will need to be registered with the system in order to be effectively managed. It's also important that Services identify themselves and describe the nature and components of their service in order to provide downstream consumers of the data they provide with a better understanding of the source and quality of that data (as well as the techniques and methods of creation and updating).

Service Registration provides interfaces to allow Services to identify themselves to the system along with their location, access requirements, and interfaces. Both human interfaces and machine APIs will be provided in order to allow maximum flexibility.

Integrated OAI harvester

All of the Services are envisioned as intermediaries between other sets of interfaces. The primary purpose of the MMS is to move bulk data between services for processing and storage. The OAI-PMH represents a highly effective metadata transport protocol and an OAI harvester is an essential component of the MMS.

The integrated OAI harvester is designed to be flexible and forgiving in its handling of common metadata validation errors, allowing otherwise invalid multi-record metadata harvests to be processed on an individual record basis, rejecting only records that can't be processed, and logging the results. Upon completion of a harvest, these error logs are passed to the notification service, and registered Service administrators or technical contacts that have signed up for event notification will receive an email or a notice in their RSS feed. They'll then have the opportunity to correct or delete the invalid records and schedule an immediate one-time incremental harvest to retrieve them for addition to the MR data store.
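The record-by-record handling described above might look something like the following sketch. It assumes each harvested record arrives as a separate XML string and uses a simple well-formedness check as a stand-in for full metadata validation; the function name `process_harvest` and the error-log layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def process_harvest(raw_records):
    """Parse each record on its own so one bad record does not sink the batch."""
    accepted, errors = [], []
    for position, raw in enumerate(raw_records):
        try:
            accepted.append(ET.fromstring(raw))
        except ET.ParseError as exc:
            # Reject only this record and keep a log entry for the notification service.
            errors.append({"position": position, "error": str(exc)})
    return accepted, errors


batch = [
    "<record><title>Valid one</title></record>",
    "<record><title>Unclosed title</record>",  # mismatched tag, will be rejected
    "<record><title>Valid two</title></record>",
]
accepted, errors = process_harvest(batch)
```

Here the two valid records are kept and the malformed one ends up in `errors`, which is the list that would be handed to the notification service.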

Harvest Scheduler

The scheduler allows Service administrators and MMS Editors to schedule repeating OAI harvests. Any number of harvests may be scheduled for a Service, and each may be run at a different frequency with different parameters. For instance, an incremental update harvest might be run every week, but a complete cleanup harvest might be run only every 6 months.

The scheduler makes sure the Harvester is invoked at the correct time on the correct date with the correct parameters, logs the results of the harvest, and informs the Notification Service that a harvest has been started and completed (or not).
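A minimal sketch of how several harvests with different frequencies and parameters could be scheduled for one Service is shown below. The `ScheduledHarvest` class and `due_harvests` function are hypothetical names; `metadataPrefix` and `from` are standard OAI-PMH request arguments, but the values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScheduledHarvest:
    name: str
    service_id: str
    interval: timedelta
    params: dict = field(default_factory=dict)  # OAI-PMH request arguments
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # A harvest that has never run is always due.
        return self.last_run is None or now - self.last_run >= self.interval


def due_harvests(schedule, now):
    """Return the scheduled harvests the Harvester should be invoked for."""
    return [h for h in schedule if h.is_due(now)]


schedule = [
    ScheduledHarvest("weekly-incremental", "svc-1", timedelta(weeks=1),
                     {"metadataPrefix": "oai_dc", "from": "2024-05-24"},
                     last_run=datetime(2024, 5, 24)),
    ScheduledHarvest("full-cleanup", "svc-1", timedelta(weeks=26),
                     {"metadataPrefix": "oai_dc"},
                     last_run=datetime(2024, 5, 1)),
]
due = due_harvests(schedule, now=datetime(2024, 6, 1))
```

With these sample dates only the weekly incremental harvest is due; the six-monthly cleanup harvest is left alone.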

Event Logs and Histories

All of the harvest parameters, along with the date, time, result of the harvest, and several harvest statistics, are stored in a log. The harvest logs can be viewed for an individual scheduled harvest, for the historical record of all harvests from that scheduled request, for all harvests from that service, or for all harvests from all services.

Harvest results and statistics can be tracked through these logs, providing a detailed historical analysis of the interactions with individual services. For instance, it's easy to determine from the logs that a service is experiencing a high harvest failure rate, that a diminishing number of records is being harvested, or that the current harvest frequency is too high and is yielding too few records per harvest.
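The kind of analysis described above could be as simple as the following sketch, which computes a per-service failure rate from log entries. The log entry layout (`service`, `result`, `records`) is an assumption made for illustration, not the MMS log format.

```python
from collections import defaultdict


def failure_rates(log_entries):
    """Return the fraction of failed harvests for each service in the log."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for entry in log_entries:
        totals[entry["service"]] += 1
        if entry["result"] != "ok":
            failures[entry["service"]] += 1
    return {service: failures[service] / totals[service] for service in totals}


log = [
    {"service": "a", "result": "ok", "records": 120},
    {"service": "a", "result": "failed", "records": 0},
    {"service": "b", "result": "ok", "records": 15},
    {"service": "a", "result": "failed", "records": 0},
    {"service": "a", "result": "ok", "records": 98},
]
rates = failure_rates(log)
```

The same loop could be extended to track records per harvest, which is the figure needed to judge whether a harvest frequency is too high.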

Error-Tracking and Notification

As noted above, harvest errors and other events are logged and are categorized into event groups. Service contacts and MMS managers can sign up to be notified of events in any event group or any individual event by either email or RSS/Atom feed. Service administrators can only sign up for notifications from services for which they are registered. This allows for a very fine-grained control of the flow of information and ensures that notification recipients only receive the notices they're really interested in receiving.
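One possible way to match a logged event against subscriptions is sketched below, including the rule that recipients are only notified about services for which they are registered. The `Subscription` class, the event dictionary, and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    user: str
    channel: str      # "email" or "feed"
    event_group: str
    service_id: str


def recipients(event, subscriptions, registered_services):
    """Return (user, channel) pairs that should be notified about this event."""
    matched = []
    for sub in subscriptions:
        allowed = registered_services.get(sub.user, set())
        if (sub.event_group == event["group"]
                and sub.service_id == event["service"]
                and event["service"] in allowed):
            matched.append((sub.user, sub.channel))
    return matched


subscriptions = [
    Subscription("alice", "email", "harvest-error", "svc-1"),
    Subscription("bob", "feed", "harvest-error", "svc-1"),
    Subscription("alice", "email", "harvest-complete", "svc-1"),
]
registered_services = {"alice": {"svc-1"}, "bob": {"svc-2"}}
event = {"group": "harvest-error", "service": "svc-1"}
notified = recipients(event, subscriptions, registered_services)
```

In this example only alice is notified: bob subscribed to the same event group but is not registered for the service that raised the event.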

User Access Management

MMS users will have several different levels of system access depending on their role within the system. For instance, MMS Editors will be able to view logs for all services, manage harvest schedules for any service, and edit and manage Service registrations for any provider. The technical contact for an individual Service Provider might be authorized to manage harvest schedules for a single service or for all of that provider's services. Or she might be authorized to fully manage all of the Services for that provider.

The MMS allows Service Providers to exercise detailed control over functional authorizations for system activities for the Services they provide, while providing global authorizations for MMS Editors and Administrators.
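The split between global and per-Service authorizations can be sketched as a simple check. The role names, action names, and the `grants` structure below are hypothetical and only illustrate the idea.

```python
# Roles that are authorized for every action on every Service (assumed names).
GLOBAL_ROLES = {"mms_editor", "mms_administrator"}


def is_authorized(user, action, service_id):
    """Global roles may do anything; others need an explicit per-Service grant."""
    if user["role"] in GLOBAL_ROLES:
        return True
    return action in user.get("grants", {}).get(service_id, set())


editor = {"role": "mms_editor"}
contact = {
    "role": "technical_contact",
    "grants": {"svc-1": {"manage_schedule"}},  # set by the Service Provider
}
```

Here the technical contact can manage harvest schedules for `svc-1` only, while the editor is authorized everywhere.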

Harvest Diagnostics and Helpdesk

Harvests periodically fail: data that was valid suddenly becomes invalid, servers crash and aren't rebooted, network connections fail, servers are moved and aren't re-registered, fresh data is served that isn't valid. All these issues and problems arise continually in the context of an active and complex repository. The MMS includes facilities to diagnose server failures and invalid data and determine enough specifics about the problem to inform a service provider. An integrated helpdesk provides FAQ management and individual customer support tracking (particularly important where different MR staff handle different kinds and levels of problems). A technical forum provides an opportunity for public discussion and resolution of problems.

OAI-PMH Servers

The MMS will provide a reference OAI server to Service Providers for use in setting up OAI-PMH-based data interchange.

One Model, Many Platforms

The reference OAI server is built around a single internal object model that can be implemented in any object-oriented language. This will allow considerable flexibility in creating servers in whatever language and operating system a Service Provider is most comfortable with. Service Provider technical staff and MMS consultants will save considerable time in getting OAI servers set up and running.

Plugin Data Connectors

The server provides a single simple data provision API designed to support drop-in, plug-in data connectors. A single server can support any number of data connectors and can use them to aggregate data from multiple sources into a single feed, or provide data from specific data sources in response to requests for specific collections or metadata formats.
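A minimal sketch of what a plug-in connector arrangement could look like follows. The `DataConnector` protocol, the `ListConnector` and `Server` classes, and their methods are hypothetical names and do not describe the reference server's actual API.

```python
from typing import Iterable, Protocol


class DataConnector(Protocol):
    """The single interface every connector is assumed to implement."""
    name: str

    def records(self) -> Iterable[dict]: ...


class ListConnector:
    """Toy connector backed by an in-memory list."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def records(self):
        return iter(self._rows)


class Server:
    def __init__(self):
        self._connectors = {}

    def register(self, connector: DataConnector) -> None:
        self._connectors[connector.name] = connector

    def list_records(self, source=None):
        """Aggregate all connectors, or serve a single named source."""
        if source is None:
            chosen = list(self._connectors.values())
        else:
            chosen = [self._connectors[source]]
        for connector in chosen:
            yield from connector.records()


server = Server()
server.register(ListConnector("library", [{"id": "lib-1"}, {"id": "lib-2"}]))
server.register(ListConnector("museum", [{"id": "mus-1"}]))
everything = list(server.list_records())
museum_only = list(server.list_records("museum"))
```

Because the server only depends on the `records()` method, a connector for a database, a file system, or another feed could be dropped in without changing the server code.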

Comprehensive Test Suite

The MMS provides a comprehensive remote acceptance test suite that fully tests the functionality of an MMS OAI server. The MMS OAI server also ships with a set of test data that the remote test suite can access. Because that data is known and correct, serving it verifies the server itself and helps isolate data source problems from server problems.

Metadata Quality Improvement

The MMS can be viewed as a metadata washing machine, making unruly piles of dirty metadata clean and neatly folded. One of the key elements of this process is a set of integrated services that support evaluation and improvement, both automated and manual, of metadata.

Each of these services uses the MR as its data source: harvesting the data to be processed via OAI-PMH, processing the metadata, and supplying the results back to the MR via OAI-PMH. Each upgraded or additional metadata statement supplied is stored like any other statement in the MR -- as a component of its own individual record, allowing it to be tracked and managed like any other metadata statement. A relationship is maintained at the record level between the complete original source record and the record containing the service metadata. Statement-level provenance is maintained throughout the process.

When the original record is harvested from the MR, it can contain:

  • just the original supplied metadata
  • the original metadata and all of the additional and normalized statements in a 'mudball', complete with statement-level provenance
  • just the improved metadata and only those statements from the original metadata that have not required improvement in some way, again with complete statement-level provenance. We call this the 'gold' version of the metadata description -- which statements are included is dependent on a rating assigned to each service.
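The three views listed above can be sketched as filters over a record's statements. This is only one possible reading: it assumes each statement records its `source`, that a service statement replaces original statements with the same element, and that the 'gold' view admits services at or above a minimum rating. All names and sample values are hypothetical.

```python
def original_view(statements):
    """Only the metadata as originally supplied."""
    return [s for s in statements if s["source"] == "original"]


def mudball_view(statements):
    """Everything: original plus all service-supplied statements."""
    return list(statements)


def gold_view(statements, ratings, min_rating):
    """Trusted improvements plus originals that no trusted service improved."""
    trusted = [
        s for s in statements
        if s["source"] != "original" and ratings.get(s["source"], 0) >= min_rating
    ]
    improved_elements = {s["element"] for s in trusted}
    kept = [
        s for s in statements
        if s["source"] == "original" and s["element"] not in improved_elements
    ]
    return kept + trusted


statements = [
    {"element": "dc:title", "value": "Tides", "source": "original"},
    {"element": "dc:date", "value": "March 4, 2005", "source": "original"},
    {"element": "dc:date", "value": "2005-03-04", "source": "safe-transform"},
    {"element": "dc:subject", "value": "tides?", "source": "keyword-bot"},
]
ratings = {"safe-transform": 5, "keyword-bot": 2}
gold = gold_view(statements, ratings, min_rating=3)
```

With these ratings the gold view keeps the original title, swaps in the normalized date, and leaves out the low-rated keyword statement. Every statement still carries its `source`, so provenance survives in all three views.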

Validation and Normalization

All harvested metadata is initially validated by the Harvest Service before insertion into the MR, as noted above, creating by default a body of harvestable metadata that is schema-valid XML.

Immediately after a harvest, the MMS invokes the Safe-transform Service, which applies a set of common data transformations to normalize common data items, such as dates and values from known vocabulary schemes (e.g. MIME types).

The MMS also supports the creation of custom, service-specific transformations that allow fine-tuning of normalizations to the idiosyncrasies of the metadata provided by a particular Metadata Provider. If a service-specific transform exists, it is automatically invoked after the safe-transformations are complete.

Both the safe and service-specific transformation services support normalization of metadata at the vocabulary level. Service Providers will be able to create crosswalks from terms in their own metadata vocabularies to terms in published vocabulary schemes as well as formalize and register their own vocabularies using the Metadata Registry service provided as part of the NSDL.
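The order of operations described above, safe transformations first and then any service-specific transformation, could be sketched as follows. The individual transforms, the `(element, value)` statement shape, and the sample vocabulary crosswalk are illustrative assumptions, not the actual Safe-transform Service.

```python
def strip_whitespace(statement):
    element, value = statement
    return element, value.strip()


def lowercase_mime_type(statement):
    element, value = statement
    return (element, value.lower()) if element == "dc:format" else statement


# Applied to every harvest, regardless of the supplying service.
SAFE_TRANSFORMS = [strip_whitespace, lowercase_mime_type]


def make_term_crosswalk(mapping):
    """Build a service-specific transform mapping local terms to published ones."""
    def transform(statement):
        element, value = statement
        if element == "dc:subject":
            return element, mapping.get(value, value)
        return statement
    return transform


def run_transforms(statements, service_id, service_transforms):
    """Run the safe transforms, then any transforms registered for the service."""
    pipeline = SAFE_TRANSFORMS + service_transforms.get(service_id, [])
    for transform in pipeline:
        statements = [transform(s) for s in statements]
    return statements


service_transforms = {"svc-1": [make_term_crosswalk({"Sea stuff": "Oceanography"})]}
harvested = [("dc:format", " Text/HTML "), ("dc:subject", "Sea stuff ")]
normalized = run_transforms(harvested, "svc-1", service_transforms)
```

A service with no registered transform simply gets the safe transformations; the crosswalk only runs for `svc-1`.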

A simple user interface will be provided that will allow service providers, as well as MMS Editors, to utilize their own expertise in building and maintaining these custom transformations.


Services are also provided that examine the resource associated with the metadata record and provide additional metadata such as:

  • Subject keywords and LCSH subjects based on an examination of the textual content of the resource
  • Mime-type verification
  • Location verification -- is the resource still available?

Any type of service can be created that provides additional metadata about a resource. New service metadata is aggregated into the MR by registering the service with the MMS, harvesting metadata from the MR, and resupplying the metadata using OAI while maintaining the identifier of the original source metadata record in the provenance section of each OAI record provided.
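One way a service might carry the original record's identifier back to the MR is in the `about` container of each OAI record. The sketch below is modeled on the provenance container described in the OAI-PMH implementation guidelines; the element names should be checked against that schema, and the base URL, identifier, and dates are made-up sample values.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"


def provenance_about(base_url, source_identifier, datestamp, metadata_ns, harvest_date):
    """Build an <about> element naming the record this metadata was derived from."""
    about = ET.Element("about")
    provenance = ET.SubElement(about, f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(
        provenance,
        f"{{{PROV_NS}}}originDescription",
        {"harvestDate": harvest_date, "altered": "true"},
    )
    fields = (
        ("baseURL", base_url),
        ("identifier", source_identifier),  # identifier of the original source record
        ("datestamp", datestamp),
        ("metadataNamespace", metadata_ns),
    )
    for tag, text in fields:
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = text
    return about


about = provenance_about(
    base_url="http://mr.example.org/oai",
    source_identifier="oai:example.org:42",
    datestamp="2024-05-01",
    metadata_ns="http://www.openarchives.org/OAI/2.0/oai_dc/",
    harvest_date="2024-06-01T00:00:00Z",
)
xml_text = ET.tostring(about, encoding="unicode")
```

The MR can then read the `identifier` element to relate the service-supplied record to the original source record.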

MMS Editors apply an accuracy rating to each of these services based on evaluation of the quality and accuracy of the metadata provided. Consumers of metadata from the MR may choose to accept or reject metadata from services based on the service's quality rating. Each metadata record supplied by the MR contains a complete provenance of each statement and a brief description of the service that supplied it, allowing each statement to be individually evaluated for its quality and reliability.

Non-OAI Data Services

Inevitably there will be a desire to aggregate metadata from non-OAI data feeds, such as RSS or Atom feeds. Many of these feeds, while not conforming to the OAI protocol, contain regularly updated, valid, parsable XML that can be easily crosswalked to DC. The MMS will provide a service that allows RSS and Atom feed providers to create such crosswalks and to have their feeds regularly harvested by the MMS RSS-OAI service and served via the OAI protocol.
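As a small illustration of such a crosswalk, the sketch below maps the items of an RSS 2.0 feed to Dublin Core element names. The mapping table and the `crosswalk_feed` function are assumptions for illustration; a real crosswalk would be defined by the feed provider.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk from RSS 2.0 item elements to Dublin Core elements.
RSS_TO_DC = {
    "title": "dc:title",
    "link": "dc:identifier",
    "description": "dc:description",
    "pubDate": "dc:date",
}


def crosswalk_feed(rss_xml, mapping=RSS_TO_DC):
    """Return one list of (dc_element, value) pairs per RSS item."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        record = [
            (mapping[child.tag], child.text.strip())
            for child in item
            if child.tag in mapping and child.text
        ]
        records.append(record)
    return records


feed = """<rss version="2.0"><channel><title>Example feed</title>
<item>
  <title>Tide tables</title>
  <link>http://example.org/tides</link>
  <category>ignored by this crosswalk</category>
</item>
</channel></rss>"""
records = crosswalk_feed(feed)
```

Elements without a mapping, such as `category` here, are dropped; the resulting pairs could then be served as DC records by an OAI server.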

Metadata Exposure (To be added)

Validation, flexible OAI server, provenance exposure, multiple output formats (linked to schema and crosswalk registry)

Submission Support (To be added)

Recommendation service (basic editing, user management, small IR with no doc management), RSS feeds as beginning submission