Difference between revisions of "Augmented Metadata and Annotations"

From Metadata-Registry
Revision as of 12:29, 25 October 2005

Management Use cases: Native metadata harvest, crosswalking, safe and collection-specific transformations, NSDL gold metadata

Use Case 1: Metadata Provision, Evaluation and Normalization

A provider is identified by Repository Collection Developers as a relevant and useful addition to the library. A collection record is created in the MMS. The provider exposes oai_dc and MARC records and, according to its policy, the Repository harvests both formats for the MR. The Repository Manager receives an email reporting the successful initial harvest and the location of csv files for the harvested metadata. Using Spotfire, she examines the files quickly and determines that the oai_dc files are very sparse, but the MARC files have exploitable information that would be useful. Using a standard MARC crosswalk stored and exposed in a Metadata Registry for this purpose, she initiates a transformation of the MARC file into qualified_dc and examines the results quickly using Spotfire to assure that the appropriate elements are populated and the values are appropriate. She invokes the safe transformation (in this case primarily to associate standard encoding schemes with appropriate values) and approves the loading of the newly transformed data into the MR. The Repository then exposes the additional qualified_dc format, along with the provider's supplied oai_dc and MARC, via the Repository OAI server. Additionally, the qualified_dc elements become available via the augmented formats: rqis_mudball and rqis_gold.

Use Case 2: Collection-Specific Transformation

A provider self-registers via the MMS, creates an initial collection record and initiates a harvest of metadata. Because this is an initial harvest, the [...] a form to create a collection-specific transform for that collection, to reverse the values in each element and add the appropriate encoding scheme. Since no other serious errors appear, she invokes the safe transformation and approves the data for the MR. She sends a notification to the provider, pointing out the error and asking him to inform her when it is corrected, so that she can pull the collection-specific transform once the data can be correctly harvested.

Use Case 3: Crosswalking Instance Metadata

The Recommender creates a collection record [...] for science museums that the MR has not yet encountered. The Metadata Manager receives notification that this new format has been harvested, and the schema provided allows the creation of a csv file so that the data can be reviewed. The schema also supports the creation of a crosswalk worksheet, allowing the Empress to set up a crosswalk from the richer format to qualified_dc. When the crosswalk is completed, the data is transformed and made available through the NSDL OAI server, and the crosswalk itself is registered in the NSDL Registry, for specific reference in the [...] for use by others. The provider is also notified about the presence of the crosswalk and invited to comment or suggest improvements.

Use Case 4: Transforming Metadata Values

Provider "S", a long-time Repository data provider, [...], and takes a look at the csv files to see how these changes would affect access to the [...] the addition of Audience values. The Manager determines that the provider is not using the available standard vocabularies but a mix of other available vocabularies and unattributed terms. She sets up a collection-specific transform that crosswalks the non-standard vocabularies to standard vocabularies, as well as a quick crosswalk from the unattributed terms to standard vocabulary. She also sets up an rqis_gold profile for the provider, so that appropriate ratings are established for the range of terms available.

Use Case 5: Machine-Generated Metadata Augmentation

A routine crawl for item metadata is initiated via the MMS after the completion of a collection record for a resource without available metadata. The iVia Service makes machine-generated metadata available for harvest by the Repository. Because a rights statement applying to all the resources on the site is available, but the iVia Service does not reflect that in the items, a collection-specific transform is initiated for the collection, and the appropriate statement is defaulted in the Rights element for the items.
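
A collection-specific transform of this kind could be sketched as follows. This is only an illustration: the dictionary record layout, the function name default_rights and the rights statement text are all assumptions, not part of the MMS or the iVia Service.

```python
from typing import Dict, List

# Assumed record layout: element name -> list of values.
Record = Dict[str, List[str]]


def default_rights(record: Record, statement: str) -> Record:
    """Return a copy of the record with Rights defaulted when absent or empty."""
    out = {element: list(values) for element, values in record.items()}
    if not out.get("rights"):
        out["rights"] = [statement]
    return out


# Hypothetical site-wide statement and machine-generated item records.
SITE_RIGHTS = "Copyright held by the site owner; see the site for terms."
items = [
    {"title": ["Photosynthesis basics"], "identifier": ["http://example.org/a"]},
    {"title": ["Leaf anatomy"], "rights": ["Public domain"]},
]
transformed = [default_rights(item, SITE_RIGHTS) for item in items]
```

The sketch leaves an item's own Rights value alone and does not modify the harvested records, so the transform can be pulled later without touching the native metadata.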

Metadata Augmentation: Use Cases for Specific Situations

Use Case #1: field replacement or deprecation

The Repository receives a file of item records from the Whatsis provider. Each record contains a defaulted value "unknown" in the Coverage element. Based on the Repository policy to deprecate useless defaults, the element is marked as deprecated, and that assertion indicates the Repository Quality Improvement Service (RQIS) is its source. In addition, the dc:format value of "application/flash" is consistently misspelled. A second version of the dc:format element with the correctly spelled value is provided by RQIS and an error notification message is sent to the data provider. MR OAI format rqis_dc_plus will include both versions of dc:format; rqis_dc_gold will only show the correctly spelled one. Lastly, the DCMIType value of "InteractiveResource" is consistently misspelled by the provider in a dc:type field. A second version of the dc:type element with the correctly spelled value is provided by RQIS, and the encoding scheme of dct:DCMIType is added. An error notification message is sent to the data provider. rqis_dc_plus will include both versions of dc:type; rqis_dc_gold will only show the correctly spelled one, with its indicated encoding scheme.

Later, the Repository harvests updated item records from Whatsis. RQIS quality control routines are run on the updated metadata:

  1. <coverage>unknown</coverage> is provided again. The RQIS continues to keep the deprecation assertion and NOT serve this useless info to downstream users.
  2. <coverage>unknown</coverage> is no longer provided. The RQIS needs to remove the deprecation assertion because it no longer refers to an actively served statement.
  3. <coverage>unknown</coverage> is no longer provided BUT <coverage>Washington</coverage> is now provided. The RQIS removes the deprecation assertion because there is now a useful (!) value; the Repository must serve the new coverage info to downstream users.
  4. provider now serves newly misspelled "apprication/flash". Because we have a separate RQIS-provided element with the correct spelling, the new (incorrect) provider element replaces the old (incorrect) provider element, and the correct RQIS attributed element is left alone.
  5. correctly spelled "application/flash" is now provided. The Repository should now drop the RQIS-sourced correct element, as it is a duplicate of the provider sourced correct element. Or not -- the bottom line is to serve only ONE, rather than duplicate.
  6. newly misspelled "Inteactive Resource" is provided. Because the Repository has a separate element, with the correct spelling, that indicates encoding scheme, the newly incorrect provider element replaces the old (incorrect) provider element, and the correct one is retained.
  7. correctly spelled "InteractiveResource" is now provided. The MR should now either add the encoding scheme to the provider's newly correct element and drop the (duplicate) RQIS-sourced correct element, or the MR should keep both, with the encoding scheme only applied to the RQIS element. Or not -- the bottom line is to only serve ONE, rather than duplicate.
  8. provider no longer serves dc:type element. (orphaned field enhancement) Should the RQIS dc:type field be retained, or should it be discarded? If the Repository doesn't retain a connection from the RQIS assertion to the original provider assertion, then the RQIS dc:type element just remains (with what provenance?). [Alternatively, since this level of quality improvement is based on examination of metadata, not resources, the element is not retained.]

The critical thing is who makes the assertion. For example, if the original metadata provider supplies a field with a typo, "texp/html", and RQIS corrects the typo to "text/html", the original metadata provider made the assertion. However, if the original metadata provider says a resource is an image, when it's really (or also) text, then the RQIS correction has a new assertion in it.
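
The re-harvest scenarios above can be read as a reconciliation between what the provider currently serves and the assertions RQIS holds. The following is a minimal sketch under stated assumptions: the Deprecation and Correction classes and the reconcile function are hypothetical names, duplicates are resolved by dropping the RQIS element (one of the two options in scenarios 5 and 7), and orphaned corrections are discarded (the bracketed alternative in scenario 8).

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Statement = Tuple[str, str]  # (element, value) as served by the provider


@dataclass(frozen=True)
class Deprecation:
    target: Statement  # the provider statement RQIS has deprecated


@dataclass(frozen=True)
class Correction:
    element: str
    value: str  # the corrected value supplied by RQIS


def reconcile(provider: Set[Statement],
              deprecations: List[Deprecation],
              corrections: List[Correction]):
    """Decide which RQIS assertions survive a re-harvest."""
    # Scenarios 1-3: a deprecation lives only while its target is still served.
    kept_deps = [d for d in deprecations if d.target in provider]
    provided_elements = {element for element, _ in provider}
    kept_corrs = [
        c for c in corrections
        # Scenario 8 (assumed policy): drop corrections whose element is gone.
        if c.element in provided_elements
        # Scenarios 5 and 7: drop corrections the provider now duplicates.
        and (c.element, c.value) not in provider
    ]
    return kept_deps, kept_corrs


deps = [Deprecation(("coverage", "unknown"))]
corrs = [Correction("format", "application/flash"),
         Correction("type", "InteractiveResource")]
reharvest = {("coverage", "Washington"),
             ("format", "apprication/flash"),
             ("type", "InteractiveResource")}
kept_deps, kept_corrs = reconcile(reharvest, deps, corrs)
```

In this example the deprecation is removed (scenario 3), the dc:format correction is retained against the new misspelling (scenario 4), and the dc:type correction is dropped as a duplicate (scenario 7).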

Use Case #N1: provider updates their metadata after it has been augmented

  1. ThatsUs provides rqis_dc to the Repository
  2. iVia augments the ThatsUs items with dc:subject fields with LCC values
  3. MR harvests updated rqis_dc from ThatsUs. Either:
    • ThatsUs' new rqis_dc has no dc:subject fields, or
    • ThatsUs' new rqis_dc has dc:subject fields with LCC values
  4. Q: Under what conditions do we trigger new iVia augmentations? Only if primary identifier changes?
  5. Q: (When do safe xforms happen? where are they in this sequence?)

Use Case #N2: augmentation service updates their provided augmentations

  1. ThatsUs provides oai_dc to the Repository
  2. iVia augments the ThatsUs items with dc:subject fields with LCC values
  3. iVia newly augments the ThatsUs items with new, improved dc:subject fields with LCC values
    • do we set up the process to assume augmentations supersede older versions of themselves?
  4. Q: (When do safe xforms happen? where are they in this sequence?)

Use Case #N3: auto-chosen/auto-gen item metadata is augmented by another service

  1. iVia does a crawl and provides item level metadata to the Repository as collection wowza.
  2. ENC augments the wowza items with dct:audience fields
  3. SDSC augments the wowza items with dc:format information and information about broken links.
  4. Q: (is there anything special about this case, or is it the same as N1?)
  5. Q: (When do safe xforms happen? where are they in this sequence?)

Use Case #2: multiple equivalent resources and their relationship to augmentations on output

ENC provides the Repository with metadata augmentations asserting that specified items in a number of collections relate to the Illinois third grade science standard for basic understanding of photosynthesis. ENC provides the Repository with metadata records identifying a Repository metadata record ID, a URL (providing an internal check as well as an additional identifier for the resource) and the DC refinement "conformsTo" specifying the particular standard to which the resource is related. This element contains the source ENC and is identified as human created data. The Repository Simple Equivalency Service (based on resource URLs) identifies three other items in other collections that this relationship assertion applies to, and the appropriate links are made, and the resource metadata records (aggregated version only) updated in the Repository OAI server.
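
One way to picture the URL-based equivalence step is the sketch below. It is not the Simple Equivalency Service itself: the record identifiers, the normalize_url rule (trim whitespace and a trailing slash) and the augmentation layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def normalize_url(url: str) -> str:
    # Assumption: URLs are equivalent when equal after trimming
    # surrounding whitespace and a trailing slash.
    return url.strip().rstrip("/")


def equivalent_records(records: Dict[str, str], record_id: str) -> List[str]:
    """Return ids of all other records whose resource URL matches record_id's."""
    by_url = defaultdict(list)
    for rid, url in records.items():
        by_url[normalize_url(url)].append(rid)
    group = by_url[normalize_url(records[record_id])]
    return sorted(rid for rid in group if rid != record_id)


# Hypothetical MR contents: record id -> resource URL.
records = {
    "collA:1": "http://example.org/photosynthesis/",
    "collB:7": "http://example.org/photosynthesis",
    "collC:3": "http://example.org/photosynthesis",
    "collD:9": "http://example.org/photosynthesis",
    "collA:2": "http://example.org/leaves",
}
# Hypothetical augmentation record, as ENC might supply it.
augmentation = {
    "record_id": "collA:1",
    "element": "conformsTo",
    "value": "(standard identifier)",
    "source": "ENC",
    "human_created": True,
}
linked = [augmentation["record_id"]] + equivalent_records(
    records, augmentation["record_id"]
)
```

Here the assertion attaches to the named record plus the three other records that describe the same URL; only the aggregated version of each record would then need regenerating.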

Use Case #3: Multiple providers of metadata and augmentation -- original metadata provider, RQIS (as augmenter), 3rd party augmenter, metadata served out in various flavors

The Whomever Collection supplies NSDL with 2233 item records described with oai_dc metadata. Based on routine normalization procedures, NSDL adds several encoding schemes to the records: "URI" to the identifier element (all values begin with "http") and "DCMIType" to most of the Type values which are valid DCMIType terms. In each of these cases, the source of the data continues to be identified as the original data provider. Several weeks later, the iVia staff harvest the metadata for the collection, and feed back to NSDL LCC classification and LCSH subject headings for the collection. This information is identified as originating with iVia and also as machine generated data.

The metadata is served out in a number of flavors:

  • native_oai_dc: metadata exactly how it came to us
  • rqis_dc: native_oai_dc plus safe xforms (was: as received, though normalized for errors and with added valid schemes)
  • rqis_dc_plus: rqis_dc plus augmentations (each safe xformed native record with any augmentations that apply, based on equivalence relationships)
  • rqis_dc_gold: rqis_dc, with erroneous values removed. Different from "rqis_dc_plus" because fields may be removed.
  • oai_dc: the RQIS's "dumbing down" of one of the above rqis_dc formats so we are compliant with OAI-PMH 2.0 (we must serve oai_dc)
  • "Mudball" (aggregation of all available metadata elements, with source, identified as being about a particular resource)
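
The relationship between three of these flavors can be sketched as filters over a pool of sourced statements. The Statement class, its flag names and the sample values (including the LCC class number) are illustrative assumptions; the actual MR storage model is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    element: str
    value: str
    source: str                   # who made the assertion
    scheme: Optional[str] = None  # encoding scheme, if any
    augmentation: bool = False    # supplied by RQIS or a third party
    erroneous: bool = False       # flagged by RQIS quality control


def rqis_dc(stmts: List[Statement]) -> List[Statement]:
    """Provider statements after safe transforms, without augmentations."""
    return [s for s in stmts if not s.augmentation]


def rqis_dc_plus(stmts: List[Statement]) -> List[Statement]:
    """Provider statements plus every augmentation that applies."""
    return list(stmts)


def rqis_dc_gold(stmts: List[Statement]) -> List[Statement]:
    """Like rqis_dc_plus, but with erroneous values removed."""
    return [s for s in stmts if not s.erroneous]


pool = [
    Statement("identifier", "http://example.org/r1", "Whomever", scheme="URI"),
    Statement("format", "apprication/flash", "Whomever", erroneous=True),
    Statement("format", "application/flash", "RQIS", augmentation=True),
    Statement("subject", "QK882", "iVia", scheme="LCC", augmentation=True),
]
gold_formats = [s.value for s in rqis_dc_gold(pool) if s.element == "format"]
plus_formats = [s.value for s in rqis_dc_plus(pool) if s.element == "format"]
```

With this pool, rqis_dc_plus carries both versions of dc:format while rqis_dc_gold carries only the corrected one, matching the behaviour described in Use Case #1. Keeping the source on every statement is what would let the mudball expose who asserted what.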

Use Case #4: focus on possible uses available to downstream users

ENC harvests "mudball" metadata records from the Repository to fulfil a number of specific requirements of their middle school portal:

  • They look for assertions of "conformsTo" relationships from a small number of sources that they consider reliable
  • They look for subject terms from controlled vocabularies on relevant resource metadata that they can use on their portal to provide topical navigation.
  • They look for annotations about middle school resources from teachers, librarians, and specific sources known by them to be reliable and appropriate for middle school audiences

MathForum re-harvests their metadata records from the Repository in the "rqis_dc_plus" flavor, looking for additional metadata added by others to provide additional value on their site. They also harvest the "mudball" records from other math collections to see if they can add some resources described by others to their site, making them available to their special services.

Use Case #5: when a resource or its metadata changes or is deleted, what happens to augmentations?

  • Deletion from specific providing collection: link moves to an equivalent resource metadata record? (or doesn't it matter, so long as there's another available Repository metadata record of that resource?)
  • Deletion of last Repository metadata record for that particular resource (perhaps it died?):
    • mark for deletion, but run occasional report to see if some can be revived?
    • point to Repository archived version of resource
  • Resource changes in ways that cannot be easily determined:
    • Augmentors notified to re-crawl or review,
    • non-updated augmentations could be "sunsetted" after some passage of time?
  • how can we be sure disappearance is permanent vs. temporary?

Use Case #6: when an augmentation is changed or deleted, what happens?

  1. a metadata augmentation is changed
    • MR picks it up on regular harvest from aug service (b/c OAI datestamp of changed record is after our "from" date argument in the OAI harvest from the aug service)
    • augmentation metadata record is updated in MR
    • changes to augmentation metadata record are duly propagated through MR storage and affected XML served out of MR
  2. a metadata augmentation is deleted
    • MR harvests deletion on regular harvest from aug service (b/c OAI datestamp of deleted record is after our "from" date argument in the OAI harvest from the aug service) [Q: what if aug service doesn't do persistent OAI deletes (or does transient deletes over a long period of time)?]
    • augmentation metadata record is marked deleted in MR
    • deletion of augmentation metadata record is duly propagated through MR storage and affected XML served out of MR.
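
The incremental harvest described above might look like the following sketch. The verb, metadataPrefix and from arguments and the "deleted" header status come from OAI-PMH 2.0; the base URL, the in-memory store and the function names are assumptions, and the harvested records are shown as already-parsed dictionaries rather than XML.

```python
from typing import Dict, List
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str, from_date: str) -> str:
    """Build an OAI-PMH ListRecords request limited to changes since from_date."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,
    })
    return f"{base_url}?{query}"


def apply_harvest(store: Dict[str, dict], harvested: List[dict]) -> List[str]:
    """Update or mark-deleted each harvested record; return the ids touched."""
    touched = []
    for rec in harvested:
        entry = store.setdefault(rec["identifier"], {})
        entry["datestamp"] = rec["datestamp"]
        if rec.get("status") == "deleted":
            entry["deleted"] = True  # marked, not physically removed
        else:
            entry["deleted"] = False
            entry["metadata"] = rec["metadata"]
        touched.append(rec["identifier"])
    return touched


# Hypothetical augmentation service endpoint and MR contents.
url = list_records_url("http://aug.example.org/oai", "oai_dc", "2005-10-01")
store = {
    "aug:1": {"datestamp": "2005-09-01", "deleted": False,
              "metadata": {"subject": ["old"]}},
    "aug:2": {"datestamp": "2005-09-01", "deleted": False,
              "metadata": {"subject": ["x"]}},
}
harvested = [
    {"identifier": "aug:1", "datestamp": "2005-10-20",
     "metadata": {"subject": ["new"]}},
    {"identifier": "aug:2", "datestamp": "2005-10-21", "status": "deleted"},
]
to_regenerate = apply_harvest(store, harvested)
```

The returned ids stand in for the propagation step: each is a record whose served XML would need to be rebuilt. The sketch does not address the open question of services without persistent deletes.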

Scenarios and Sequences

A simple augmentation sequence

  1. Repository gets metadata record 1 from provider Q.
  2. Repository normalizes the metadata, creating record 1N.
  3. iVia harvests metadata record 1N from the MR's OAI server
  4. iVia uses IDs from harvested metadata to target resources for automated metadata creation, subject and classification assignment
  5. iVia exposes its metadata augmentations (not data harvested from original records) to the world as metadata record 1NiVia
  6. Repository harvests metadata record 1NiVia from iVia
  7. Repository normalizes or otherwise alters and stores the iVia aug record as record 1NiViaN
  8. Repository uses 1NiViaN as part of a rqis augmented/gold record

A more complex augmentation sequence

  1. Repository gets metadata record 1 from provider Q.
  2. Repository normalizes the metadata, creating record 1N.
  3. iVia harvests metadata record 1N from the MR's OAI server
  4. iVia uses IDs from harvested metadata to target resources for automated metadata creation, subject and classification assignment
  5. iVia exposes its metadata augmentations (not statements harvested from the original record) to the world as metadata record 1NiVia
  6. Repository harvests metadata record 1NiVia from iVia
  7. Repository normalizes or otherwise alters and stores the iVia aug record as record 1NiViaN
  8. Repository uses 1NiViaN in record 1aug, a nsdl augmented/gold record.
  9. Repository search service harvests record 1N or record 1aug.
  10. Repository search service discovers that the dc:format value is wrong -- it's text, not an image.
  11. Repository search service provides a correction to the dc:format field
  12. Repository archive service harvests record 1N or record 1aug.
  13. Repository archive service discovers that the dc:format value is wrong -- it's text/xml, not an image.
  14. Repository archive service provides a correction to the dc:format field
  15. Repository harvests via OAI or otherwise gets the corrections from the search service.
  16. Repository harvests via OAI or otherwise gets the corrections from the archive service.
  17. Repository Rating Service determines what value(s) are exposed for dc:format in record 1aug, the nsdl augmented/gold record
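
How the Rating Service decides is not specified here. Purely as an illustration of step 17, the sketch below assumes each source carries a numeric rating and that values from sources at or above a threshold are exposed, best first; the ratings, the threshold and the choose_values function are all hypothetical.

```python
from typing import Dict, List, Tuple


def choose_values(candidates: List[Tuple[str, str]],
                  ratings: Dict[str, float],
                  threshold: float) -> List[str]:
    """candidates are (source, value) pairs; return the values to expose."""
    best: Dict[str, float] = {}
    for source, value in candidates:
        score = ratings.get(source, 0.0)  # unknown sources rate lowest
        if score > best.get(value, float("-inf")):
            best[value] = score
    # Highest rating first; ties broken alphabetically for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [value for value, score in ranked if score >= threshold]


# Competing dc:format assertions from the sequence above.
candidates = [
    ("provider Q", "image"),
    ("search service", "text"),
    ("archive service", "text/xml"),
]
# Assumed ratings; real values would come from a rating profile.
ratings = {"provider Q": 0.4, "search service": 0.7, "archive service": 0.9}
exposed = choose_values(candidates, ratings, threshold=0.5)
```

With these assumed ratings both corrections are exposed and the provider's original value is withheld; a stricter threshold would expose only the archive service's value.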

Outline--an Annotation and Augmentation Harvest Scenario

  1. The Repository publishes two metadata formats designed to support annotations and augmented metadata
    • The key feature of these formats is two metadata elements, one (and only one) of which must be present, either of which may be repeated:
      • xxxxUniqueIdRef? -- which contains a reference to an existing metadata record in the MR and must match an existing record
      • dcIdentifierRef? -- which contains a reference to a URI that may or may not exist in the MR.
    • Augmented metadata must reference an existing URI in the MR. This could also be expressed as <reference type=xxxxUniqueId?> or <reference type=dcIdentifier>
    • These are intended to be used to supply annotations and augmented metadata for harvest via OAI and perhaps a services interface.
  2. Annotation and augmentation suppliers wishing to supply metadata about a resource identified by a URI should first query or harvest the MR to get a list of metadata records that are about that URI.
  3. They create metadata about their annotation in the above format and serve it via OAI. This record may carry the actual annotation or it can simply contain a reference. In the case of metadata augmentation, each record served should be a self-contained, incomplete metadata record and should not reference another source of metadata.
  4. We harvest the records through a standard harvest -- all incoming records will have to be associated with a collection record
  5. The ingest process creates a unique mrec record for each incoming record
  6. References in the MR must always be mrec_ids so in the case of dcIdentifierRef? the ingest process retrieves all mrecs that reference each dcIdentifierRef?.
  7. If a dcIdentifierRef? references a URI that is not found, an mrec record is created for that URI and is queued for metadata generation by iVia (controversial)
  8. An entry is created in the link table for each mrec identified either directly or by reference. This will contain the mrec_id of the annotation record, the mrec_id of the mrec being annotated, a reference type, a datestamp, and a source mrec_id
    • Note that the link table will need an additional 'source' field that will, in the case of annotations and augmentations, contain the mrec_id of the annotation or augmentation metadata record that supplied the link.
    • Note also that reference type and datestamp are denormalized values that can be determined by reference to the source mrec_id if necessary.
  9. Output of augmented metadata is the tough thing -- it needs to be served both as a component part of the metadata format being augmented and as a distinct format, both within and without the mudball.
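
Steps 5 through 8 of the outline can be sketched with an in-memory SQLite database. The table and column names, the "uniqueId" and "dcIdentifier" reference type strings and the ingest_annotation function are assumptions for illustration; only the fields listed in step 8 are modelled, and the stub-record behaviour of step 7 is included even though the outline flags it as controversial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mrec (mrec_id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE link (
    annotation_mrec_id INTEGER NOT NULL,  -- the annotation record
    target_mrec_id     INTEGER NOT NULL,  -- the mrec being annotated
    reference_type     TEXT NOT NULL,
    datestamp          TEXT NOT NULL,
    source_mrec_id     INTEGER NOT NULL   -- record that supplied the link
);
""")


def ingest_annotation(conn, ref_type, ref_value, datestamp):
    """Resolve the reference to mrec_ids, store the annotation, write links."""
    if ref_type == "uniqueId":
        rows = conn.execute(
            "SELECT mrec_id FROM mrec WHERE mrec_id = ?", (ref_value,))
        targets = [r[0] for r in rows]
        if not targets:
            raise ValueError("uniqueId reference must match an existing record")
    else:
        rows = conn.execute(
            "SELECT mrec_id FROM mrec WHERE uri = ? ORDER BY mrec_id",
            (ref_value,))
        targets = [r[0] for r in rows]
        if not targets:
            # Step 7: create a stub mrec for the unknown URI.
            cur = conn.execute("INSERT INTO mrec (uri) VALUES (?)", (ref_value,))
            targets = [cur.lastrowid]
    # Step 5: every incoming record gets its own mrec.
    ann_id = conn.execute("INSERT INTO mrec (uri) VALUES (NULL)").lastrowid
    # Step 8: one link row per target, with the annotation as source.
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?, ?, ?)",
        [(ann_id, t, ref_type, datestamp, ann_id) for t in targets])
    return ann_id, targets


# Two existing records describe the same (hypothetical) resource URI.
conn.execute("INSERT INTO mrec (uri) VALUES ('http://example.org/r1')")
conn.execute("INSERT INTO mrec (uri) VALUES ('http://example.org/r1')")
first = ingest_annotation(conn, "dcIdentifier", "http://example.org/r1", "2005-10-25")
second = ingest_annotation(conn, "dcIdentifier", "http://example.org/new", "2005-10-25")
link_count = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]
```

Because the source mrec_id is stored on each link row, the reference type and datestamp could in principle be looked up from the source record, which is the denormalization the outline notes.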