Difference between revisions of "Components and Functions"

From Metadata-Registry

Revision as of 06:30, 30 October 2005

Metadata Management System Components and Functions

[Note: Include here more accessible descriptions of components from the point of view of data managers. No jargon! Should have range of functionality from simple to more long-term ideas.]


Metadata Repository

One of the primary functions of the entire Metadata Management infrastructure is the aggregation of metadata, both from multiple metadata providers and from services that normalize and enhance that metadata. The Metadata Repository (MR) component provides the essential central storage for this aggregated data.

Metadata coming into the MR is shredded into individual elements or statements which form the base storage element in the database. Each statement is uniquely identified, and an association is maintained with its parent item record, the collection or service providing that item record, and the metadata or service provider that is providing that service.

The MR also tracks changes and updates to each statement and maintains a record of each harvest that creates or updates a statement. This high level of detailed data provenance allows very fine-grained data analysis and allows downstream services created to consume data from the MR great flexibility in tuning their data usage to their needs.
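The shredding and provenance tracking described above might be sketched as follows. This is only an illustration under stated assumptions: the `Statement` fields, the `shred` function, and the integer identifier scheme are hypothetical names chosen for the example, not the MR's actual schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List

_next_id = count(1)  # hypothetical source of unique statement identifiers


@dataclass
class Statement:
    """One shredded element of an item record, with its provenance."""
    statement_id: int
    element: str       # e.g. "title"
    value: str
    item_id: str       # parent item record
    service_id: str    # collection or service that supplied the item record
    provider_id: str   # provider responsible for that service
    harvest_ids: List[str] = field(default_factory=list)  # harvests that created or updated it


def shred(record: Dict[str, List[str]], item_id: str, service_id: str,
          provider_id: str, harvest_id: str) -> List[Statement]:
    """Split one item record into individually identified statements."""
    statements = []
    for element, values in record.items():
        for value in values:
            statements.append(Statement(next(_next_id), element, value,
                                        item_id, service_id, provider_id,
                                        [harvest_id]))
    return statements


# Assumed sample record: element name mapped to its repeated values.
record = {"title": ["Water Cycle Lesson"],
          "subject": ["hydrology", "earth science"]}
statements = shred(record, "item-1", "svc-a", "prov-x", "harvest-001")
```

A later harvest that updates a statement would append its own identifier to `harvest_ids`, which is what lets downstream services filter by source or by harvest.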

Harvest Services

In order to get data into the MR, there must be a comprehensive suite of services to manage the data sources and services providing the data and to coordinate and manage the flow of data into the MR.

Service Provider Registration

Service Providers are the organizations and individuals that provide and manage the Services. A single Service Provider may be responsible for one or more Services and the MMS provides the ability for Service Providers to identify themselves and their organization, register organizational, administrative and technical contacts, manage the authentication and level of authorization for these individuals within the system, and register the services they're providing.

Service registration

Both services that provide metadata and the services that provide additional enhancements and cleanup (all just 'Services' from now on) will need to be registered with the system in order to be effectively managed. It's also important that Services identify themselves and describe the nature and components of their service in order to provide downstream consumers of the data they provide with a better understanding of the source and quality of that data.

Service Registration provides interfaces to allow Services to identify themselves to the system along with their location, access requirements, and interfaces. Both human interfaces and machine APIs will be provided in order to allow maximum flexibility.

Integrated OAI harvester

All of the Services are envisioned as intermediaries between other sets of interfaces. The primary purpose of the MMS is to move bulk data between services for processing and storage. The OAI-PMH represents a highly effective metadata transport protocol and an OAI harvester is an essential component of the MMS.

The integrated OAI harvester is designed to be flexible and forgiving in its handling of common metadata validation errors. An otherwise invalid multi-record harvest is processed on an individual record basis: only the records that can't be processed are rejected, and the results are logged. Upon completion of a harvest, these error logs are passed to the notification service, and registered Service administrators or technical contacts who have signed up for event notification will receive an email or a notice in their RSS feed. They'll then have the opportunity to correct or delete the invalid records and schedule an immediate one-time incremental harvest to retrieve them.
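A minimal sketch of the record-by-record handling, assuming records have already been parsed into dictionaries. The `validate` rule and the shape of the error log entries are hypothetical; the real harvester's validation rules are not specified here.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def validate(record: Record) -> None:
    """Hypothetical rule: a record needs a non-empty identifier and title."""
    for required in ("identifier", "title"):
        if not record.get(required):
            raise ValueError(f"missing required element: {required}")


def process_harvest(records: List[Record],
                    validator: Callable[[Record], None] = validate
                    ) -> Tuple[List[Record], List[Dict[str, object]]]:
    """Accept every record that validates; reject and log the rest."""
    accepted: List[Record] = []
    error_log: List[Dict[str, object]] = []
    for position, record in enumerate(records):
        try:
            validator(record)
        except ValueError as exc:
            # One bad record does not abort the harvest; it is logged instead.
            error_log.append({"position": position,
                              "identifier": record.get("identifier", "(unknown)"),
                              "error": str(exc)})
            continue
        accepted.append(record)
    return accepted, error_log


harvest = [
    {"identifier": "oai:example:1", "title": "Rivers"},
    {"identifier": "oai:example:2", "title": ""},
    {"identifier": "oai:example:3", "title": "Lakes"},
]
accepted, error_log = process_harvest(harvest)
```

The error log, rather than an exception, is what would be handed to the notification service once the harvest finishes.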

Harvest scheduler

The scheduler allows Service administrators and MMS Editors to schedule repeating OAI harvests. Any number of harvests may be scheduled for a Service, and each may be run at a different frequency with different parameters. For instance, an incremental update harvest might be run every week, but a complete cleanup harvest might be run only every 6 months.

The scheduler makes sure the Harvester is invoked at the correct time on the correct date with the correct parameters, logs the results of the harvest, and informs the Notification Service that a harvest has been started and completed (or not).
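The scheduler's responsibilities can be sketched roughly as below. The class and function names are hypothetical, and the `harvest` and `notify` callables stand in for the Harvester and the Notification Service; only `metadataPrefix` is taken from OAI-PMH itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ScheduledHarvest:
    name: str
    service_id: str
    interval: timedelta              # how often the harvest repeats
    parameters: Dict[str, str]       # request arguments passed to the harvester
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        return self.last_run is None or now - self.last_run >= self.interval


def run_due(schedule: List[ScheduledHarvest], now: datetime,
            harvest: Callable[[str, Dict[str, str]], None],
            notify: Callable[[str, str], None]) -> List[Tuple[str, str]]:
    """Run every due harvest, notify on start and finish, return the results."""
    results = []
    for job in schedule:
        if not job.is_due(now):
            continue
        notify(job.service_id, "started")
        try:
            harvest(job.service_id, job.parameters)
            outcome = "completed"
        except Exception:
            outcome = "failed"
        job.last_run = now
        notify(job.service_id, outcome)
        results.append((job.name, outcome))
    return results


now = datetime(2005, 10, 30, 6, 30)
schedule = [
    ScheduledHarvest("weekly-incremental", "svc-a", timedelta(weeks=1),
                     {"metadataPrefix": "oai_dc"}, now - timedelta(days=8)),
    ScheduledHarvest("cleanup", "svc-a", timedelta(days=180),
                     {"metadataPrefix": "oai_dc"}, now - timedelta(days=30)),
]
events: List[Tuple[str, str]] = []
results = run_due(schedule, now,
                  harvest=lambda service, params: None,
                  notify=lambda service, event: events.append((service, event)))
```

Here only the weekly harvest is due, so the cleanup harvest is left untouched until its 180-day interval has elapsed.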

Event logs and histories

All of the harvest parameters, along with the date, time, result of the harvest, and several harvest statistics, are stored in a log. The harvest logs can be viewed for an individual scheduled harvest, the historical record of all harvests from that scheduled request, all harvests from that service, or all harvests from all services.

Harvest results and statistics can be tracked through these logs, providing a detailed historical analysis of the interactions with individual services. For instance, it's easy to determine from the logs that a service is experiencing a high harvest failure rate, that a diminishing number of records is being harvested, or that the current harvest frequency is too high and is yielding too few records per harvest.
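As an illustration of the kind of analysis the logs allow, the sketch below computes a failure rate and an average record count per service. The log entry fields (`service`, `ok`, `records`) are assumed for the example and are not the MMS log format.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(log: List[dict]) -> Dict[str, Dict[str, float]]:
    """Per-service failure rate and mean number of records per harvest."""
    totals = defaultdict(lambda: {"harvests": 0, "failures": 0, "records": 0})
    for entry in log:
        t = totals[entry["service"]]
        t["harvests"] += 1
        t["failures"] += 0 if entry["ok"] else 1
        t["records"] += entry["records"]
    return {service: {"failure_rate": t["failures"] / t["harvests"],
                      "records_per_harvest": t["records"] / t["harvests"]}
            for service, t in totals.items()}


# Assumed sample log entries.
log = [
    {"service": "svc-a", "ok": True, "records": 100},
    {"service": "svc-a", "ok": False, "records": 0},
    {"service": "svc-a", "ok": True, "records": 60},
    {"service": "svc-a", "ok": True, "records": 40},
    {"service": "svc-b", "ok": True, "records": 10},
]
summary = summarize(log)
```

A falling `records_per_harvest` over successive periods would be one signal that the harvest frequency is higher than the service's rate of change warrants.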

Error-tracking and notification

As noted above, harvest errors and other events are logged and are categorized into event groups. Service contacts and MMS managers can sign up to be notified of events in any event group or any individual event by either email or RSS/Atom feed. Service administrators can only sign up for notifications from services for which they are registered. This allows for a very fine-grained control of the flow of information and ensures that notification recipients only receive the notices they're really interested in receiving.
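One way to picture the subscription rules is the sketch below. The `Subscription` fields and the `recipients` function are hypothetical; a `services` value of `None` is used here to model an MMS manager who is not restricted to particular services.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Subscription:
    contact: str
    channel: str                                # "email" or "feed"
    event_group: str
    event: Optional[str] = None                 # None: every event in the group
    services: Optional[FrozenSet[str]] = None   # None: all services (MMS manager)


def recipients(subscriptions: List[Subscription], service_id: str,
               event_group: str, event: str) -> List[Tuple[str, str]]:
    """Return (contact, channel) pairs that should hear about this event."""
    matched = []
    for sub in subscriptions:
        if sub.services is not None and service_id not in sub.services:
            continue  # Service administrators only see their own services
        if sub.event_group != event_group:
            continue
        if sub.event is not None and sub.event != event:
            continue
        matched.append((sub.contact, sub.channel))
    return matched


subscriptions = [
    Subscription("admin@svc-a.example", "email", "harvest-errors",
                 services=frozenset({"svc-a"})),
    Subscription("manager@mms.example", "feed", "harvest-errors",
                 event="invalid-record"),
    Subscription("admin@svc-b.example", "email", "harvest-errors",
                 services=frozenset({"svc-b"})),
]
notified = recipients(subscriptions, "svc-a", "harvest-errors", "invalid-record")
```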

User access management

MMS users will have several different levels of system access depending on their role within the system. For instance, MMS editors will be able to view logs for all services, manage harvest schedules for any service, and edit and manage Service registrations for any provider. The technical contact for an individual Service Provider might be authorized to manage harvest schedules for a single service or for all of that provider's services, or she might be authorized to fully manage all of the Services for that provider.

The MMS gives Service Providers detailed control over the functional authorizations for system activities on the Services they provide, while providing global authorizations for MMS Editors and Administrators.
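The difference between scoped and global authorizations could be modeled along these lines. The `Grant` structure, the action names, and the user names are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Grant:
    actions: Set[str]
    services: Optional[Set[str]]  # None: the grant applies to every service


def is_authorized(grants: Dict[str, List[Grant]], user: str,
                  action: str, service_id: str) -> bool:
    """True if any of the user's grants covers this action on this service."""
    for grant in grants.get(user, []):
        in_scope = grant.services is None or service_id in grant.services
        if in_scope and action in grant.actions:
            return True
    return False


grants = {
    # Global authorization for an MMS editor.
    "editor": [Grant({"view_logs", "manage_schedules", "edit_registration"}, None)],
    # A provider's technical contact, limited to schedules for one service.
    "tech-contact": [Grant({"manage_schedules"}, {"svc-a"})],
}
```

With these grants, `is_authorized(grants, "tech-contact", "manage_schedules", "svc-a")` holds, while the same contact is refused for `svc-b` or for editing a registration.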

Harvest diagnostics and helpdesk

Harvests periodically fail: data that was valid suddenly becomes invalid, servers crash and aren't rebooted, network connections fail, servers are moved and aren't re-registered, fresh data is served that isn't valid. Many's the slip twixt the server and the harvester. The MMS includes facilities to diagnose server failures and invalid data. An integrated helpdesk provides FAQ management and individual customer support and a technical forum provides an opportunity for public discussion and resolution of problems.

OAI-PMH servers

The MMS will provide a reference OAI server to Service Providers for use in setting up OAI-PMH-based data interchange.

One model, many platforms

The reference OAI server is built around a single internal object model that can be implemented in any object-oriented language. This will allow considerable flexibility in creating servers in whichever language and operating system a Service Provider is most comfortable with. Service Provider technical staff and MMS consultants will save considerable time in getting OAI servers set up and running.

Plugin data connectors

The server provides a single, simple data provision API designed to support drop-in, plug-in data connectors. A single server can support any number of data connectors and can use them to aggregate data from multiple sources into a single feed, or to provide data from specific data sources in response to requests for specific collections or metadata formats.
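A plug-in connector arrangement of this kind might look like the following sketch. The `DataConnector` interface, `ListConnector`, and `Server` are hypothetical names used for illustration and do not describe the reference server's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Optional, Set

Record = Dict[str, str]


class DataConnector(ABC):
    """Hypothetical plug-in interface: one connector per data source."""
    collection: str

    @abstractmethod
    def records(self, metadata_prefix: str) -> Iterable[Record]:
        """Yield this source's records in the requested metadata format."""


class ListConnector(DataConnector):
    """Simplest possible connector, backed by an in-memory list."""

    def __init__(self, collection: str, formats: Set[str], rows: List[Record]):
        self.collection = collection
        self.formats = formats
        self.rows = rows

    def records(self, metadata_prefix: str) -> Iterable[Record]:
        return list(self.rows) if metadata_prefix in self.formats else []


class Server:
    def __init__(self) -> None:
        self.connectors: List[DataConnector] = []

    def register(self, connector: DataConnector) -> None:
        self.connectors.append(connector)

    def list_records(self, metadata_prefix: str,
                     collection: Optional[str] = None) -> Iterator[Record]:
        """Aggregate all connectors, or serve only the requested collection."""
        for connector in self.connectors:
            if collection is not None and connector.collection != collection:
                continue
            yield from connector.records(metadata_prefix)


server = Server()
server.register(ListConnector("maps", {"oai_dc"}, [{"identifier": "m1"}]))
server.register(ListConnector("photos", {"oai_dc"}, [{"identifier": "p1"}]))
all_ids = [r["identifier"] for r in server.list_records("oai_dc")]
photo_ids = [r["identifier"] for r in server.list_records("oai_dc", "photos")]
```

Adding a new data source then means writing one more `DataConnector` subclass and registering it, without touching the server itself.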

Comprehensive test suite

The MMS provides a comprehensive remote acceptance test suite that fully tests the functionality of an MMS OAI server. The MMS OAI server is also provided with a set of test data that can be accessed from the remote test suite. Because this data set is known and correct, checking that the server serves it correctly makes it possible to isolate data source problems from problems in the server itself.

Metadata Quality Improvement

Validation, metadata evaluation, normalization, creation of transformations (safe and collection-specific), service rating (gold stuff), aggregation of RSS feeds, controlled vocabulary crosswalking (based on metadata registry)

Metadata Exposure

Validation, flexible OAI server, provenance exposure, multiple output formats (linked to schema and crosswalk registry)

Submission Support

Recommendation service (basic editing, user management, small IR with no doc management), RSS feeds as beginning submission