HarvestSteps
From Metadata-Registry
Latest revision as of 12:09, 18 January 2006
Harvest steps for tab-delimited files for Spotfire Data analysis:
harvest to files
note: write one file for each resumption block
note: filename should be [service_id]_[set_id]_[metadata-prefix]_[yyyy-mm-dd-hh-mm-ss(request time)]_[00000(chunk number)].xml
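A minimal Python sketch of the "one file per resumption block" rule, assuming a standard OAI-PMH 2.0 ListRecords harvest in which each response either carries a non-empty `resumptionToken` (more to come) or an empty/absent one (last block). The names `chunk_filename`, `next_token`, `harvest_to_files` and the `fetch` callable are hypothetical; `fetch` stands in for whatever HTTP code issues the request. Numbering chunks from 1 with five zero-padded digits is an assumption read off the `[00000(chunk number)]` placeholder.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Optional

# OAI-PMH 2.0 response namespace, in ElementTree's "{uri}" form
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def chunk_filename(service_id: str, set_id: str, prefix: str,
                   request_time: datetime, chunk: int) -> str:
    """Build [service_id]_[set_id]_[metadata-prefix]_[timestamp]_[chunk].xml."""
    stamp = request_time.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{service_id}_{set_id}_{prefix}_{stamp}_{chunk:05d}.xml"


def next_token(xml_bytes: bytes) -> Optional[str]:
    """Return the resumptionToken of a response, or None if the list is complete."""
    root = ET.fromstring(xml_bytes)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None  # absent or empty token: this was the last block
    return token.text.strip()


def harvest_to_files(fetch: Callable[[Optional[str]], bytes], out_dir: Path,
                     service_id: str, set_id: str, prefix: str) -> List[Path]:
    """Save each resumption block to its own file.

    `fetch` is a hypothetical callable that issues the ListRecords request
    (token=None for the first request) and returns the raw response body.
    """
    request_time = datetime.now()  # one request timestamp shared by all chunks
    written: List[Path] = []
    token: Optional[str] = None
    chunk = 1  # assumption: chunks are numbered from 1
    while True:
        body = fetch(token)
        path = out_dir / chunk_filename(service_id, set_id, prefix,
                                        request_time, chunk)
        path.write_bytes(body)
        written.append(path)
        token = next_token(body)
        if token is None:
            break
        chunk += 1
    return written
```

Using a single request time for every chunk keeps all files from one harvest grouped under the same name prefix, which matches the "(request time)" wording in the filename note.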
- store a harvest log for each harvest, with:
  - stat section
    - harvest stat
      - start time
      - end time
      - record count
      - http errors
      - http redirects
      - chunk stats (written at the top of each chunk file?)
        - chunk 01
          - start time
          - end time
          - record count
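One way to hold and write the stat section, sketched in Python under some assumptions: the log format is not specified here, so XML is used only because the outline reads like a nested tag structure, and the tag names (`stat_section`, `harvest_stat`, `chunk_stats`, `chunk`) are hypothetical. HTTP errors and redirects are stored as simple counts, which is also an assumption; a real log might keep the individual status codes instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ChunkStat:
    number: int
    start_time: datetime
    end_time: datetime
    record_count: int


@dataclass
class HarvestStat:
    start_time: datetime
    end_time: datetime
    record_count: int = 0
    http_errors: int = 0      # assumption: stored as a count
    http_redirects: int = 0   # assumption: stored as a count
    chunks: List[ChunkStat] = field(default_factory=list)


def stat_section_xml(stat: HarvestStat) -> str:
    """Serialize the stats with the nesting of the outline (tag names are hypothetical)."""
    section = ET.Element("stat_section")
    harvest = ET.SubElement(section, "harvest_stat")
    for tag, value in [("start_time", stat.start_time.isoformat()),
                       ("end_time", stat.end_time.isoformat()),
                       ("record_count", stat.record_count),
                       ("http_errors", stat.http_errors),
                       ("http_redirects", stat.http_redirects)]:
        ET.SubElement(harvest, tag).text = str(value)
    chunk_stats = ET.SubElement(harvest, "chunk_stats")
    for c in stat.chunks:
        # two-digit chunk number, as in "chunk 01"
        chunk = ET.SubElement(chunk_stats, "chunk", number=f"{c.number:02d}")
        ET.SubElement(chunk, "start_time").text = c.start_time.isoformat()
        ET.SubElement(chunk, "end_time").text = c.end_time.isoformat()
        ET.SubElement(chunk, "record_count").text = str(c.record_count)
    return ET.tostring(section, encoding="unicode")
```

The same `ChunkStat` values could also be written at the top of each chunk file, which is the open question in the outline.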
parse the files to create CSV
- get the total record count (x) from the harvest stats, and the number of files (z)
- get the number of requested records (y) from the CSV convert command arguments
- divide y by z and take that many random records from each file
- take all records if y == 0
- open the CSV file for writing
note: filename should be [service_id]_[set_id]_[yyyy-mm-dd-hh-mm-ss(request time)].csv
- for each record
  - store record.header.identifier
  - store namespaces -- record.metadata.dc xmlns:dc, xmlns:oai_dc, xmlns:xsi
  - for each row in the record
    - metadata record id == record.header.identifier (stored)
    - element namespace == record.metadata.dc xmlns:dc, xmlns:oai_dc, xmlns:xsi (stored)
    - element name == record.metadata.dc.[any element]
    - element value == record.metadata.dc.[any element].value
    - element type == record.metadata.dc.[any element].type.value
    - element lang == record.metadata.dc.[any element].lang.value
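The sampling step (y requested records spread over z files, or everything when y == 0) can be sketched in Python as below. It assumes the records of each chunk file have already been parsed into a list; `sample_records` is a hypothetical name, and the per-file count uses integer division, which is one reading of "divide y/z".

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_records(files: Sequence[Sequence[T]], requested: int,
                   rng: Optional[random.Random] = None) -> List[T]:
    """Take requested // len(files) random records from each file (all if requested == 0).

    `files` holds the already-parsed records of each chunk file (z = len(files));
    `requested` is y from the command-line arguments.
    """
    rng = rng if rng is not None else random.Random()
    if requested == 0 or not files:
        # y == 0 means "take everything"; no files means nothing to sample
        return [rec for recs in files for rec in recs]
    per_file = requested // len(files)  # y / z, rounded down
    picked: List[T] = []
    for recs in files:
        # a short file (e.g. the last chunk) may hold fewer than per_file records
        picked.extend(rng.sample(list(recs), min(per_file, len(recs))))
    return picked
```

Because of the rounding and short last chunks, the result can contain fewer than y records; if an exact count matters, the remainder would have to be redistributed across the other files.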
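A rough Python sketch of turning harvested oai_dc records into one CSV row per metadata element, with the columns listed above. Several points are assumptions rather than part of the outline: "element type" is read as an `xsi:type` attribute (falling back to a plain `type` attribute), "element lang" as `xml:lang`, and the namespace column holds each element's own namespace URI rather than the full set of `xmlns` declarations, since ElementTree does not keep those declarations. All function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Iterable, List, TextIO, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
COLUMNS = ["metadata record id", "element namespace", "element name",
           "element value", "element type", "element lang"]


def csv_filename(service_id: str, set_id: str, request_time: datetime) -> str:
    """Build [service_id]_[set_id]_[yyyy-mm-dd-hh-mm-ss].csv."""
    return f"{service_id}_{set_id}_{request_time.strftime('%Y-%m-%d-%H-%M-%S')}.csv"


def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's "{uri}local" form into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def record_rows(record: ET.Element) -> List[List[str]]:
    """One row per element inside record/metadata/<container> (e.g. oai_dc:dc)."""
    identifier = (record.findtext(f"{OAI}header/{OAI}identifier") or "").strip()
    metadata = record.find(f"{OAI}metadata")
    rows: List[List[str]] = []
    if metadata is None:  # deleted records carry a header but no metadata
        return rows
    for container in metadata:       # e.g. <oai_dc:dc>
        for el in container:         # e.g. <dc:title>, <dc:creator>, ...
            ns, name = split_tag(el.tag)
            el_type = el.get(XSI_TYPE, el.get("type", ""))
            rows.append([identifier, ns, name, (el.text or "").strip(),
                         el_type, el.get(XML_LANG, "")])
    return rows


def write_csv(chunks: Iterable[bytes], out: TextIO, delimiter: str = ",") -> int:
    """Write a header plus every row of every chunk; return the number of data rows."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(COLUMNS)
    count = 0
    for chunk in chunks:
        for record in ET.fromstring(chunk).iter(f"{OAI}record"):
            for row in record_rows(record):
                writer.writerow(row)
                count += 1
    return count


def to_csv_string(chunks: Iterable[bytes], delimiter: str = ",") -> str:
    """Convenience wrapper that writes into memory instead of a file."""
    buf = io.StringIO()
    write_csv(chunks, buf, delimiter)
    return buf.getvalue()
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")` as the `csv` module documentation recommends, and pass `delimiter="\t"` if the tab-delimited output named in the page heading is wanted for Spotfire.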