Abstract
The purpose of this report is to describe the SMURF (semantically marked up record format) profile, which covers ERMS (electronic records management systems) and SFSB (simple file-system based) records as described below. When extracting information from a producer's system, one has the choice of two generic options:
1. Extracting data in a relational database structure
Extracting data from a relational database into a long-term preservation format (SIARD) that preserves the properties of the relational database, so that the data can be imported into a relational database management system (RDBMS) when access is needed. Access can happen via database queries or via a search field.
The main access use cases are:
a. The producer wishes to retrieve their data for business purposes and/or re-use.
b. The consumer wishes to consult the data for purposes of research.
c. The archivist wishes to retrieve the data for professional treatment: to check it and, if necessary, perform preservation actions, etc.
More information about this option can be found in the SIARD 2.0 Profile Specification.
2. Extracting data and metadata as records
Extract the records and normalise them to a standard E-ARK XML format. This means that the records are semantically marked up using metadata. Because they are technically valid and comply with this specification, they are directly accessible for validation, data management, indexing and searching. Their structured semantic metadata description is explicit rather than hidden inside an RDBMS. The representation of descriptive metadata inside the archive can be in the E-ARK SMURF AIP format and/or another native archive format. The main advantages over the RDBMS representation are that:
o Records from different sources can be merged.
o Search and access are possible across all records from all sources.
o Records can be managed and accessed uniformly.
o The original database / records system software does not need to be licensed and preserved.
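The advantages listed above can be illustrated with a minimal sketch, assuming a deliberately simplified record structure. The element names (`records`, `record`, `title`, `body`), the attributes (`source`, `id`), and the helper functions are hypothetical and chosen for illustration only; they are not the E-ARK SMURF schema. The sketch marks up records from two sources in one XML form, merges them, and searches across all of them without the originating software.

```python
import xml.etree.ElementTree as ET


def to_record_xml(source, record_id, title, body):
    """Mark up one record as XML (hypothetical element names, not the SMURF schema)."""
    rec = ET.Element("record", attrib={"source": source, "id": record_id})
    ET.SubElement(rec, "title").text = title
    ET.SubElement(rec, "body").text = body
    return rec


def merge(*record_lists):
    """Merge records from any number of sources under a single root element."""
    root = ET.Element("records")
    for records in record_lists:
        root.extend(records)
    return root


def search(root, term):
    """Return the ids of all records whose title or body contains the term."""
    term = term.lower()
    return [
        rec.get("id")
        for rec in root.iter("record")
        if any(term in (child.text or "").lower() for child in rec)
    ]


# Illustrative data: one record from an ERMS export, one from a file-system export.
erms_records = [to_record_xml("erms", "erms-1", "Building permit", "Permit granted for site A")]
sfsb_records = [to_record_xml("sfsb", "sfsb-1", "Site survey", "Survey of site A completed")]

archive = merge(erms_records, sfsb_records)
hits = search(archive, "site a")
```

Because every record shares the same explicit markup, the same `search` function works regardless of which system produced the record.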
| Original language | English |
|---|---|
| Type | Project deliverable |
| Publication status | Published - 13 Feb 2018 |
Keywords
- E-ARK
- SMURF