University of Rochester eXtensible Catalog Project

From Metadata-Registry
Revision as of 19:58, 12 December 2007 by Diane (Talk | contribs)


HUB Deliverables

OAI Server

The reference OAI server is built around a single internal object model that can be implemented in any object-oriented language. This allows servers to be created in whatever language and operating system a Service Provider is most comfortable with, and should save Service Provider technical staff and MMS consultants considerable time in getting OAI servers set up and running. In addition, the server:

  • Must provide a single simple data API designed to support drop-in, plug-in data connectors.
  • Must support any number of data connectors and can use them to aggregate data from multiple sources into a single feed.
  • Can provide data from specific data sources in response to requests for specific collections or metadata formats.
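As a rough illustration of the connector idea in the list above, the following minimal sketch shows one way a single data API could aggregate several plug-in sources into one feed and filter by collection or format. The class names (`ListConnector`, `Aggregator`), the `records()` method, and the record dictionary keys are all hypothetical and are not part of the project's design.

```python
from typing import Dict, Iterator, List, Optional


class ListConnector:
    """Toy connector (hypothetical) backed by an in-memory list of records."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self._rows = rows

    def records(self, set_spec: Optional[str] = None,
                metadata_prefix: str = "oai_dc") -> Iterator[Dict[str, str]]:
        # Yield only rows matching the requested collection and format.
        for row in self._rows:
            if set_spec is not None and row.get("set") != set_spec:
                continue
            if row.get("format") != metadata_prefix:
                continue
            yield row


class Aggregator:
    """Merges any number of connectors into one feed (assumed design)."""

    def __init__(self) -> None:
        self._connectors = []

    def register(self, connector) -> None:
        # Any object with a compatible records() method can be dropped in.
        self._connectors.append(connector)

    def feed(self, set_spec: Optional[str] = None,
             metadata_prefix: str = "oai_dc") -> Iterator[Dict[str, str]]:
        # Connectors are consulted in registration order.
        for connector in self._connectors:
            yield from connector.records(set_spec, metadata_prefix)
```

A real connector would wrap a database or file store rather than a list; the point of the sketch is only that the server sees one `records()` call regardless of the source.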

The server should provide appropriate responses to harvest requests that facilitate problem solving:

  1. 'Self-test' mode initiated by the harvester: the server would receive a special request from the harvester (outside of the protocol) that would cause the server to switch from its normal dataset to a standard-in-all-the-servers test dataset, which would then be used as the data for all of the protocol tests. This would allow the harvester (or a test service) to test every facet of the protocol without the variable of unknown data parameters. It would also allow the server to be set up, configured, and tested independent of local data, minimizing errors caused by data handling problems.
  2. The server should have the ability to embed useful information in any error response it sends to the harvester, including some administrative notification ability in the server itself.

OAI Harvester

For proper functioning, the harvest process must succeed on both a protocol level and a data level. First-time interactions with any server must begin with a test of that server's compliance with the OAI Protocol itself. Current testing services are inadequate and allow server administrators to believe that they are compliant when they are not.

OAI Protocol Testing Service

The Test Service would operate as a part of the harvester but would not require that interactions end with a harvest. It would provide a comprehensive remote acceptance test suite, open to anyone, that fully tests the functionality of a compatible OAI server.

The service should be able to test the server responses to both a valid and an invalid request -- including:

  • missing parameters
  • misspelled parameters
  • whether the server responds to mixed-case requests
  • whether the server treats non-mixed-case requests as an error
  • resumption tokens
  • record counts
  • multiple format support
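A test suite of this kind needs to know which error a compliant server ought to return for a given malformed request. The sketch below encodes the verbs and arguments defined by OAI-PMH 2.0 and predicts the expected error code for a request; the function name `expected_error` and the table layout are illustrative assumptions, and the sketch assumes a strict server that treats verb and argument names as case-sensitive.

```python
from typing import Dict, Optional, Set, Tuple

# Arguments per verb as (required, optional), following OAI-PMH 2.0.
LIST_OPTIONAL = {"from", "until", "set", "resumptionToken"}
VERB_ARGS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), {"resumptionToken"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, LIST_OPTIONAL),
    "ListRecords": ({"metadataPrefix"}, LIST_OPTIONAL),
}


def expected_error(params: Dict[str, str]) -> Optional[str]:
    """Return the error code a strict server should send, or None if valid."""
    verb = params.get("verb")
    if verb not in VERB_ARGS:
        # Covers a missing verb, a misspelled verb, and wrong letter case.
        return "badVerb"
    required, optional = VERB_ARGS[verb]
    args = set(params) - {"verb"}
    if "resumptionToken" in args:
        # resumptionToken is an exclusive argument: nothing else may join it.
        exclusive_ok = "resumptionToken" in optional and args == {"resumptionToken"}
        return None if exclusive_ok else "badArgument"
    if args - required - optional or required - args:
        # Unknown (for example misspelled) or missing arguments.
        return "badArgument"
    return None
```

The harvester-side test would then send each request to the server under test and compare the `code` attribute of the returned error with the prediction.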

To provide this service, there would need to be a set of test data that can be accessed from the remote test suite to ensure that the server is correctly serving a known and correct set of data, so that results can be compared and data source problems isolated from the server itself. This will require some interaction with a harvester, and both the harvester and the server being tested must have the same dataset to test against.

Testing sequence:

  1. The server would receive a special request from the harvester (outside of the protocol) that would cause the server to switch from its normal dataset to a standard-in-all-the-servers test dataset, which would then be used as the data for all of the protocol tests.
  2. This would allow the harvester (or a test service) to test every facet of the protocol without the variable of unknown data parameters.
  3. This process would also allow the server to be set up, configured, and tested independent of local data, minimizing errors caused by data handling problems.
  4. The tests would also make sure that the server returned valid error messages as well as valid data.

Validation results should be available to HUB Managers as well as individual server administrators. HUB Managers need to know about validation results, subject to the following rules:

  1. Tests passed should be reflected in detail in the administrative interface, but the Hub Manager should not be notified of successful tests unless notification is specifically requested
  2. Tests passed should result in an attempt to harvest, unless the test is done on a server not intended to be harvested
  3. Tests failed should be reported to the Hub Manager and reflected in the administrative interface. Reports to the Hub Manager should allow subsequent steps to be taken easily, either via a helpdesk application or manually
  4. Tests failed should be reported to the administrator of the server, specifying appropriate follow-up options, including access to the helpdesk application.
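The four rules above amount to a small routing decision. The following sketch expresses them as a function that maps a validation outcome to a list of follow-up actions; the `ValidationOutcome` class, the `route_outcome` function, and the action names are all hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValidationOutcome:
    """Result of a protocol test run (hypothetical structure)."""
    server_url: str
    passed: bool
    harvestable: bool = True  # False for servers tested but not meant to be harvested


def route_outcome(outcome: ValidationOutcome,
                  notify_manager_on_success: bool = False) -> List[str]:
    """Return the follow-up actions implied by the rules above."""
    # Every result, pass or fail, is reflected in the administrative interface.
    actions = ["record_in_admin_interface"]
    if outcome.passed:
        # Managers hear about successes only when they asked to.
        if notify_manager_on_success:
            actions.append("notify_hub_manager")
        if outcome.harvestable:
            actions.append("attempt_harvest")
    else:
        # Failures go to both the Hub Manager and the server administrator.
        actions.append("notify_hub_manager")
        actions.append("notify_server_admin")
    return actions
```

Opening a helpdesk ticket could be added as a further action on failure; it is left out here because the text describes it as one of several possible follow-up paths.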

Data Testing

The HUB Harvester must be flexible and forgiving in its handling of common metadata validation errors, allowing otherwise invalid multi-record metadata harvests to be processed on an individual record basis, rejecting only records that can't be processed, while logging the results and notifying appropriate parties.
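One way to process a harvest record by record, sketched below under simplifying assumptions, is to cut the raw response into `<record>` chunks and parse each chunk on its own, so that one malformed record does not invalidate the rest. The function name `split_and_parse` is hypothetical, and the regular-expression split assumes that records are not nested and that any namespace prefixes used inside a record are declared inside that record.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Non-greedy match of each <record ...>...</record> chunk in the raw text.
RECORD_RE = re.compile(r"<record\b.*?</record>", re.DOTALL)


def split_and_parse(raw: str) -> Tuple[List[ET.Element], List[Tuple[int, str]]]:
    """Parse records one at a time; collect failures instead of aborting."""
    accepted: List[ET.Element] = []
    rejected: List[Tuple[int, str]] = []
    for position, chunk in enumerate(RECORD_RE.findall(raw), start=1):
        try:
            accepted.append(ET.fromstring(chunk))
        except ET.ParseError as exc:
            # Keep the position and parser message for logging and notification.
            rejected.append((position, str(exc)))
    return accepted, rejected
```

The `rejected` list is what would be logged and passed on to the appropriate parties, while the `accepted` records continue through the pipeline.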

Common tests for metadata problems:

  1. Compare the first OAI record ID and dc:identifier to ensure that the service provider is not using the record ID as the resource ID. If the data fails this test, it is rejected, a notification is sent to the HUB Manager, and a ticket is created in the Helpdesk application
  2. Unescaped characters invalid in XML
  3. Embedded HTML
  4. Incorrect record counts
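Two of these checks can be sketched against a parsed OAI record, assuming the record carries unqualified Dublin Core. The function name `check_record`, the problem labels, and the short list of HTML tag names are illustrative assumptions; the HTML check only catches markup that was escaped into element text, not markup embedded as child elements.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
# A deliberately small, assumed set of tags that suggest embedded HTML.
HTML_TAG_RE = re.compile(r"</?(p|br|b|i|a|div|span|em|strong)\b[^>]*>", re.IGNORECASE)


def check_record(record: ET.Element) -> List[str]:
    """Return labels for the metadata problems found in one record."""
    problems: List[str] = []
    oai_id = record.findtext(f"{OAI}header/{OAI}identifier")
    dc_ids = [element.text for element in record.iter(f"{DC}identifier")]
    if oai_id is not None and oai_id in dc_ids:
        # The provider appears to reuse the record ID as the resource ID.
        problems.append("record-id-used-as-resource-id")
    for element in record.iter():
        if element.text and HTML_TAG_RE.search(element.text):
            problems.append("embedded-html")
            break
    return problems
```

Unescaped characters that are invalid in XML would surface earlier, as a parse error, so they are not tested here.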

The normal server responses should contain accurate counts of the records available:

  • total overall
  • total by set
  • total by format

This is optional and rarely implemented, but the protocol allows for it, even stating that the count and content of the records in a response should not vary during the life of the request -- no dropping or adding of records to the response between the first resumption token and the last.

  • One solution: if the 'Identify' response from the server includes a record count, embed the total record count in the 'GetRecord' response and the resumption tokens. Generate a notification to the HUB Manager if the count of the data harvested doesn't match the count declared by the server. If the count does not match, put the server in test mode and check the standard dataset response to make sure that the mismatch wasn't caused by an internal configuration problem (or a harvester problem).
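The count comparison could be sketched as follows, assuming the server reports its total through the optional `completeListSize` attribute that OAI-PMH 2.0 defines on `resumptionToken`. The function name `check_counts` is hypothetical, and the sketch takes the pages of one harvest as already-fetched XML strings.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def check_counts(pages: Iterable[str]) -> Tuple[Optional[int], int, bool]:
    """Return (declared, harvested, matches) for one multi-page harvest."""
    declared: Optional[int] = None
    harvested = 0
    for page in pages:
        root = ET.fromstring(page)
        harvested += len(root.findall(f".//{OAI}record"))
        token = root.find(f".//{OAI}resumptionToken")
        if token is not None and token.get("completeListSize") is not None:
            # Use the most recent declaration the server gave.
            declared = int(token.get("completeListSize"))
    # With no declared size there is nothing to compare against.
    matches = declared is None or declared == harvested
    return declared, harvested, matches
```

When `matches` is false, the harvester would notify the HUB Manager and could then rerun the harvest against the standard test dataset to rule out a configuration or harvester problem.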

[MORE TESTS?]

Services Registry (includes db and admin interface)

Service Providers are the organizations and individuals that provide and manage the Services. In view of general preferences to harvest data where it occurs and use human resources only when necessary, we intend to make a first pass at service registration, where an OAI server is specified, by simply harvesting the data that the server provides in compliance with the OAI-PMH specification. This is the approach used to populate the University of Illinois/Urbana-Champaign OAI Registry (http://gita.grainger.uiuc.edu/registry/searchform.asp).

A single Service Provider may be responsible for one or more Services and the HUB provides the ability for Service Providers to:

  • identify themselves and their organization
    • This includes the OAI server address and sufficient information to allow the OAI Harvester to validate compliance with OAI-PMH and to initiate a harvest.
  • register organizational, administrative and technical contacts
    • Name, role and email address are required. Other kinds of addresses and alternative contact information can be filled in, but are not required. The email address of the primary contact (the person providing the registration) will require validation prior to the registration of additional contacts. A role vocabulary will be provided as a pull down list, but text entry will be allowed as well.
  • manage the authentication and level of authorization for these individuals within the system
    • Default notifications and authorizations based on role will be enabled, but can be overridden by the server admin at the time of registration or during a later login.
  • register the individual services they're providing
    • A service vocabulary and definitions will enable characterization of services at the time of registration. Use of these vocabularies will be required, although additional tagging will be allowed to further develop the vocabularies. Description of the services or links to service descriptions will be required.
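The contact rules above (name, role, and email required; the primary contact's email validated before further contacts are added) can be sketched as a small data model. The `Contact` and `Registration` classes and their method names are hypothetical and say nothing about the actual registry schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """A registered contact (hypothetical structure)."""
    name: str
    role: str
    email: str
    email_validated: bool = False

    def missing_fields(self) -> List[str]:
        # Name, role and email are the only required fields.
        return [f for f in ("name", "role", "email") if not getattr(self, f).strip()]


@dataclass
class Registration:
    """A Service Provider registration with its contacts."""
    primary: Contact
    additional: List[Contact] = field(default_factory=list)

    def add_contact(self, contact: Contact) -> None:
        # Additional contacts wait until the primary email is validated.
        if not self.primary.email_validated:
            raise PermissionError("primary contact email not yet validated")
        missing = contact.missing_fields()
        if missing:
            raise ValueError("missing required fields: " + ", ".join(missing))
        self.additional.append(contact)
```

Role is kept as free text here because the registration form is described as allowing text entry alongside the pull-down vocabulary.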

Other Notifications:

  • The HUB Administrator should be notified of new Service Registrations, and also when email notifications to registered contacts bounce.
  • The Service contacts should be notified of the Service Registration, including how to change their settings if they are incorrect.

Mudball & Gold schema design

Database design

Shredding (QDC, OAIDC, MARCXML)

Reassembly (Mudball)


Service Orchestration (ping, ordering, notification)

  • Upon completion of a harvest, error logs are passed to the notification service, and registered Service administrators or technical contacts that have signed up for event notification will receive an email or a notice in their RSS feed.
  • Technical contacts will have the opportunity to correct or delete the invalid records and schedule an immediate one-time incremental harvest to retrieve them for addition to the MR data store.

Hub Raw Data browser

Hub Identifier management (record, resource, relationships)

LC authority to OAI service



HUB Use Cases


Archive Page