Difference between revisions of "Versioning"

From Metadata-Registry
Latest revision as of 08:00, 1 November 2006

Draft Versioning [Change Management?] Policy

Controlled vocabulary versioning issues occur with both URIs and descriptions. Each can change at two levels: at the term level, where each term requires change management, and at the level of the overall vocabulary, which is intrinsically different each time a term changes. Because it's not entirely clear what end users of vocabularies will require from registered vocabularies, the Registry will make available versions of the vocabularies and individual terms to the extent possible.

Because the Registry has distinctly different degrees of control over hosted and non-hosted vocabularies, the Registry policies for each will be addressed separately.

General Assumptions:

  1. URIs will remain stable so long as the semantics of the term do not change
  2. URIs of individual term values won't contain version information
  3. The Registry must be able to allow people/services to create dependencies on a 'versioned' snapshot of a particular SKOS/OWL representation of a vocabulary and its relationships
  4. A 'versioned snapshot' must include the version designation (either 'number' or 'date')
  5. Individual values in a vocabulary may be created, updated, or deprecated, but not deleted
  6. Namespaces of vocabulary schemas won't be versioned
  7. The version in a schema name will change only if the change would break backward compatibility
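
As an illustration only, the assumptions above could be sketched as a small data model. Everything named here (the `Vocabulary` and `Term` classes, the status values, the example namespace) is hypothetical and not part of any Registry implementation; the sketch only shows term URIs that carry no version information, terms that can be deprecated but not deleted, and snapshots that must carry a version designation.

```python
from dataclasses import dataclass


@dataclass
class Term:
    uri: str                    # stable; contains no version information
    label: str
    status: str = "published"   # hypothetical status values


class Vocabulary:
    """Hypothetical in-memory vocabulary; the namespace itself is not versioned."""

    def __init__(self, namespace):
        self.namespace = namespace
        self._terms = {}

    def add(self, local_name, label):
        uri = self.namespace + local_name
        if uri in self._terms:
            raise ValueError(f"{uri} already exists")
        term = Term(uri, label)
        self._terms[uri] = term
        return term

    def update_label(self, uri, label):
        # A descriptive change: the URI stays the same.
        self._terms[uri].label = label

    def deprecate(self, uri):
        # Deprecated terms stay in the vocabulary and keep their URI.
        self._terms[uri].status = "deprecated"

    def delete(self, uri):
        raise PermissionError("terms may be deprecated but not deleted")

    def snapshot(self, version):
        """Return a frozen copy labelled with a version number or date."""
        if not version:
            raise ValueError("a snapshot needs a version designation")
        terms = {
            uri: {"label": t.label, "status": t.status}
            for uri, t in self._terms.items()
        }
        return {"version": version, "namespace": self.namespace, "terms": terms}
```

A service that depends on the snapshot labelled "2006-11-01" would keep seeing the same terms and statuses even after a term is later deprecated, because each snapshot is a separate copy.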

Hosted Vocabularies

A hosted vocabulary is one whose canonical (official) version resides, or is "hosted", in the Registry.

URI Changes

Stability and reliability of concept URIs are critical to the Registry. Determining unambiguously when a registry maintainer intends to change the semantics of a term will be a challenge with some forms of controlled vocabularies. If the Registry allows registration of simple term lists, without hierarchies or definitions to determine term boundaries, there is clearly no way to signal any semantic change beyond the addition and deprecation of terms.

Changes in description of the term, including most changes of definitions and simple additions or changes in term relationships, should not qualify as semantic changes requiring a change in a term URI. In general, non-semantic changes might be:

  1. Additions of broader, narrower or related terms
  2. Changes in definition for clarification, correction of typos or grammar, etc.
  3. Addition of definition or scope note when none is present
  4. Change in term status
  5. Addition of other information (references, etc.)

Semantic changes, requiring a change in URI, might include:

  1. Term splitting or consolidation
  2. Semantic changes in definition
  3. Changes in hierarchical relationships, when there's no definition and the hierarchy placement is the only semantic clue

Enforcement of this policy is problematic, since the initial decision on whether a change requires a change in URI is made by the maintainer (the exception is splits or consolidation, where machine enforcement is possible). It's possible that a combination of explicit questions to the maintainer before a submission and some monitoring by the Registry administrator (particularly focusing on new maintainers) might decrease the chances of semantically significant changes being made without triggering a new URI. This is certainly an area where experience will be instructive (and where research may be useful).
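
A minimal sketch of how the two lists above, and the machine-checkable case of splits and consolidations, might be encoded. The change-kind names, the `requires_new_uri` and `detect_splits_and_merges` functions, and the shape of the `successors` mapping are all assumptions made for illustration; the maintainer would still have to declare honestly which kind of change is being submitted.

```python
from collections import Counter

# Hypothetical labels for the kinds of change listed above.
NON_SEMANTIC = {
    "add_relationship",      # broader, narrower or related terms
    "clarify_definition",    # clarification, typos, grammar
    "add_definition",        # definition or scope note where none existed
    "change_status",
    "add_information",       # references, etc.
}
SEMANTIC = {"split", "consolidate", "semantic_definition_change"}


def requires_new_uri(kind, has_definition=True):
    """Return True if the declared change kind calls for a new term URI."""
    if kind in SEMANTIC:
        return True
    if kind == "change_hierarchy":
        # Semantic only when hierarchy placement is the sole semantic clue.
        return not has_definition
    if kind in NON_SEMANTIC:
        return False
    raise ValueError(f"unknown change kind: {kind}")


def detect_splits_and_merges(successors):
    """Find splits and consolidations in a submission.

    `successors` maps an old term URI to the list of URIs replacing it.
    A split is one old URI with several successors; a consolidation is
    one successor shared by several old URIs.
    """
    splits = sorted(old for old, new in successors.items() if len(set(new)) > 1)
    counts = Counter(n for new in successors.values() for n in set(new))
    merges = sorted(n for n, count in counts.items() if count > 1)
    return splits, merges
```

The second function needs no judgment from the maintainer, which is why splits and consolidations are the one case where machine enforcement looks feasible; the first only records the policy and cannot tell whether a definition change was declared under the right kind.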

Non-Hosted Vocabularies

A Non-Hosted vocabulary is one that is published (exposed) through the Registry but that is created and maintained by its promulgating agency in a separate registry or as a Web-addressable file in its own namespace.

It seems clear that most of the 'control' over externally managed vocabularies, particularly in terms of versioning, will be at a policy level, since the maintenance agency processes will be independent of the Registry. If the Registry is to make available any notion of 'versioned copies' for these vocabularies, the versioning information at both the vocabulary and term levels must be exposed to the Registry. Ideally, the Registry will at some point be able to 'slurp' vocabulary 'snapshots' (if the maintaining agency makes them available), or to build viable versioned 'snapshots' by 'slurping' term changes, for use by other services or organizations.

Registry services may be developed to manage agreements with agencies and 'slurping' processes when terms change externally, and the Registry should maintain sequenced copies of the concept schemes to be able to track changes over time and reflect those to vocabulary users.
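
One way sequenced copies could be used to track changes is to compare two consecutive snapshots. The sketch below assumes, purely for illustration, that each 'slurped' snapshot is a mapping from term URI to a dictionary of descriptive properties; the `diff_snapshots` name and that representation are hypothetical.

```python
def diff_snapshots(older, newer):
    """Compare two sequenced snapshots of an externally managed vocabulary.

    Each snapshot maps a term URI to a dict of descriptive properties.
    """
    older_uris = set(older)
    newer_uris = set(newer)
    return {
        # URIs that first appear in the newer copy.
        "added": sorted(newer_uris - older_uris),
        # URIs that vanished; worth flagging, since terms are expected
        # to be deprecated rather than deleted.
        "missing": sorted(older_uris - newer_uris),
        # URIs present in both copies whose description differs.
        "changed": sorted(
            uri for uri in older_uris & newer_uris if older[uri] != newer[uri]
        ),
    }
```

The "changed" list says only that a description differs, not whether the difference is semantic; that judgment would still rest on policy and on the maintaining agency.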

Versioning for Vocabularies: Notes

At this point it seems relevant to gather information about the issue:

  • I've requested information from Rebecca Guenther and Eric Childress about the administrative information around the LC vocabularies and DDC (respectively) that might help us manage terms in our vocabularies.
  • Some beginnings of consensus:
  • URIs should be stable so long as the semantics of the term/concept do not change
  • Non-semantic changes might be:
    1. Additions of broader, narrower or related terms
    2. Changes in definition for clarification, correction of typos or grammar, etc.
    3. Addition of definition or scope note when none is present
    4. Change in term status
    5. Addition of other information (references, etc.)
  • Semantic changes, requiring a change in URI, might include:
    1. Term splitting or consolidation
    2. Semantic changes in definition
    3. [Possibly] changes in hierarchical relationships (only when there's no definition and the hierarchy placement is the only semantic clue?) [Not sure how we would enforce this]
    4. Changes in rdfs:range or rdfs:domain that result in breaking backward compatibility (e.g., stipulating the use of a URI where a string value had been previously permitted)

JP's notes on versioning:

  • Some potential versioning requirements:
  1. must be able to allow people/services to create dependencies on a 'versioned' snapshot of a particular SKOS/OWL representation of a vocabulary and its relationships
  2. must be able to encode the version 'number' in the above snapshot
  3. individual values in a vocabulary may be created, updated, or deprecated, but not deleted
  4. URIs of individual values won't contain version information
  5. Namespaces of vocabulary schemas won't be versioned
  6. Vocabulary schema names will contain version information
  7. The version in a schema name will change only if the change would break backward compatibility

F2F Meeting Additions

Most control over externally managed vocabularies in terms of versioning will be at a policy level. Services may be developed to manage "slurping" processes when terms change externally, and we should maintain sequenced copies of the concept schemes to be able to track changes over time and reflect those to vocabulary users.