Provenance Information

Provenance Information is essential if the creation and processing of an object are to be reproduced.

For a specific data object there are likely to be many "siblings" with similar histories, but the most recent activities will be unique to that object.

The Provenance Information is initially relatively simple, but becomes more complex once information from multiple sources is combined, e.g. a FITS image of an area of the sky in one wavelength combined with similar images at other wavelengths. Handling such Provenance is still an area of active research; see https://www.sciencedirect.com/topics/computer-science/data-provenance and https://www.research.ed.ac.uk/en/publications/data-provenance.

There are multiple methods in use to capture Provenance, often dependent on the domain. For example, the library community often uses PREMIS, which can be associated with specific vocabularies. The Open Provenance Model (OPM, https://openprovenance.org) and PROV (https://www.w3.org/TR/prov-dm/) are available standards.
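As an illustration, the multi-wavelength combination mentioned above could be described with two of the PROV-DM core relations, "used" and "wasGeneratedBy". The following is only a minimal sketch: the dictionary layout loosely follows the PROV-JSON serialization, and every identifier under the `ex:` prefix, as well as the helper `sources_of`, is hypothetical and chosen purely for this example.

```python
import json

# Hypothetical provenance record for a composite image built from two
# single-wavelength images. Layout loosely modelled on PROV-JSON.
prov_doc = {
    "prefix": {"ex": "https://example.org/"},
    "entity": {
        "ex:image_optical": {},
        "ex:image_infrared": {},
        "ex:image_composite": {},
    },
    "activity": {
        "ex:combine_wavelengths": {},
    },
    # The combining activity consumed both input images.
    "used": {
        "_:u1": {"prov:activity": "ex:combine_wavelengths",
                 "prov:entity": "ex:image_optical"},
        "_:u2": {"prov:activity": "ex:combine_wavelengths",
                 "prov:entity": "ex:image_infrared"},
    },
    # The composite image was produced by the combining activity.
    "wasGeneratedBy": {
        "_:g1": {"prov:entity": "ex:image_composite",
                 "prov:activity": "ex:combine_wavelengths"},
    },
}


def sources_of(doc, entity_id):
    """Return, sorted, the entities used by any activity that generated entity_id."""
    activities = {
        rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
        if rel["prov:entity"] == entity_id
    }
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] in activities
    )


if __name__ == "__main__":
    print(sources_of(prov_doc, "ex:image_composite"))
    print(json.dumps(prov_doc, indent=2))
```

Each input image would carry its own earlier history in the same form, so the record for the composite grows as the histories of its sources are merged.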

Many scientific formats contain elements of Provenance. For example, FITS files (https://www.loc.gov/preservation/digital/formats/fdd/fdd000317.shtml) have COMMENT and HISTORY records, which can be used to describe the data unit and its provenance, while HDF can also contain customized Provenance. In such cases the way in which the Provenance is encoded may be specific to a particular project, and specific code would be required to extract it, perhaps into one of the standard formats so that it can be queried efficiently.
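The kind of extraction code referred to above might look like the following sketch for the FITS case. It relies only on the basic header layout (80-character cards grouped in 2880-byte blocks, keyword in the first 8 columns, header terminated by an END card) and reads the primary header only. The function names and the demo header contents are assumptions made for this example; a real project would more likely use an established FITS library, and the meaning of the HISTORY text itself remains project-specific.

```python
CARD_LEN = 80      # every FITS header card is 80 characters long
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def read_primary_header_cards(data: bytes) -> list[str]:
    """Split the primary header into 80-character cards, stopping at END."""
    cards = []
    for offset in range(0, len(data), CARD_LEN):
        card = data[offset:offset + CARD_LEN].decode("ascii", errors="replace")
        if card[:8].rstrip() == "END":
            break
        cards.append(card)
    return cards


def extract_commentary(data: bytes, keyword: str = "HISTORY") -> list[str]:
    """Return the free text of all cards with the given commentary keyword."""
    return [
        card[8:].rstrip()
        for card in read_primary_header_cards(data)
        if card[:8].rstrip() == keyword
    ]


def make_demo_header(cards: list[str]) -> bytes:
    """Build a padded in-memory header for demonstration (header only, no data unit)."""
    body = "".join(card.ljust(CARD_LEN) for card in cards + ["END"])
    n_blocks = -(-len(body) // BLOCK_LEN)  # ceiling division
    return body.ljust(n_blocks * BLOCK_LEN).encode("ascii")


# Hypothetical header contents, for illustration only.
demo = make_demo_header([
    f"{'SIMPLE':<8}= {'T':>20}",
    f"{'BITPIX':<8}= {'8':>20}",
    f"{'NAXIS':<8}= {'0':>20}",
    "HISTORY bias subtracted by hypothetical pipeline v1",
    "COMMENT demo header",
    "HISTORY flat-fielded",
])

if __name__ == "__main__":
    for line in extract_commentary(demo, "HISTORY"):
        print(line)
```

For a file on disk, the bytes could be obtained with `open(path, "rb").read()`. The extracted lines could then be mapped into one of the standard formats, with the mapping rules depending on the conventions the project used when writing the HISTORY records.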

Where the Provenance, or part of it, is encoded within the Data Object, as with FITS or HDF, the Provenance Information should describe how to extract that Provenance, or the relevant piece of it, and its Representation Information, such as its Semantic RepInfo.

The Provenance Information may instead be encoded as a separate Data Object, for example in PREMIS or OPM. There must then be associated Representation Information for that object, for example the definition of the PREMIS version and the specific vocabulary used.
