Provenance Information

Provenance information is important if one is to be able to reproduce the object's creation and processing.

Last updated 3 years ago

For a specific data object there are likely to be many "siblings" with similar histories, but the most recent activities will be unique to that object.

The Provenance Information is initially relatively simple, until information from multiple sources is combined, e.g. a FITS image of an area of the sky at one wavelength combined with similar images at other wavelengths. Handling such Provenance is still an area of active research; see https://www.sciencedirect.com/topics/computer-science/data-provenance and https://www.research.ed.ac.uk/en/publications/data-provenance.

There are multiple methods in use to capture Provenance, often dependent on the domain. For example, the library community often uses PREMIS, which can be associated with specific vocabularies. The Open Provenance Model (OPM, https://openprovenance.org) and PROV (https://www.w3.org/TR/prov-dm/) are available standards.
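As an illustration of recording combined Provenance in a standard model, the multi-wavelength FITS example above could be written down roughly as follows. This is a minimal sketch that loosely follows the PROV-JSON layout (entities, an activity, and the relations between them). The `ex:` prefix, the file names, the activity name, and the timestamp are all hypothetical, and nothing here is a LABDRIVE API.

```python
import json


def combine_provenance(output_id, input_ids, activity_id, ended_at):
    """Build a PROV-style record for one output derived from several inputs."""
    doc = {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {eid: {} for eid in [output_id, *input_ids]},
        "activity": {activity_id: {"prov:endTime": ended_at}},
        "used": {},
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "wasDerivedFrom": {},
    }
    # One "used" and one "wasDerivedFrom" relation per input image.
    for n, eid in enumerate(input_ids, start=1):
        doc["used"][f"_:use{n}"] = {
            "prov:activity": activity_id,
            "prov:entity": eid,
        }
        doc["wasDerivedFrom"][f"_:der{n}"] = {
            "prov:generatedEntity": output_id,
            "prov:usedEntity": eid,
        }
    return doc


# Hypothetical identifiers, for illustration only.
record = combine_provenance(
    "ex:sky_composite.fits",
    ["ex:sky_optical.fits", "ex:sky_infrared.fits", "ex:sky_radio.fits"],
    "ex:combine_wavelengths",
    "2021-01-01T00:00:00Z",
)
print(json.dumps(record, indent=2))
```

Each input keeps its own earlier history as a separate record, so the composite's Provenance only needs to add the final combining step and point back at its inputs.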

Many scientific formats contain elements of Provenance. For example, FITS files (https://www.loc.gov/preservation/digital/formats/fdd/fdd000317.shtml) have COMMENT and HISTORY records, which can be used to describe the data unit and its provenance, while HDF can also contain customized Provenance. In such cases the way in which the Provenance is encoded may be specific to a particular project, and specific code would be required to extract it, perhaps into one of the standard formats so that it can be queried efficiently.
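The kind of extraction code mentioned above might look like the following for FITS. It is a minimal sketch using only the Python standard library, and it relies on the fixed FITS header layout: 80-character cards, the keyword in the first 8 characters, and an END card closing the header. The helper names and the demonstration header are hypothetical; the free text inside HISTORY records remains project-specific and would still need its own interpretation.

```python
CARD_LEN = 80      # every FITS header card is 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to a multiple of 2880 bytes


def extract_history(header_bytes):
    """Return the HISTORY and COMMENT texts found in a FITS header."""
    records = {"HISTORY": [], "COMMENT": []}
    for start in range(0, len(header_bytes), CARD_LEN):
        card = header_bytes[start:start + CARD_LEN].decode("ascii", errors="replace")
        keyword = card[:8].rstrip()
        if keyword == "END":  # end of this header unit
            break
        if keyword in records:
            records[keyword].append(card[8:].strip())
    return records


def make_header(cards):
    """Build a padded in-memory header so the sketch runs without a file."""
    body = "".join(c.ljust(CARD_LEN) for c in [*cards, "END"])
    padding = (-len(body)) % BLOCK_LEN
    return (body + " " * padding).encode("ascii")


# Illustrative cards only, not taken from a real observation.
demo = make_header([
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                    8",
    "NAXIS   =                    0",
    "HISTORY bias subtracted using a hypothetical pipeline step",
    "COMMENT illustrative card, not from a real observation",
])
found = extract_history(demo)
print(found["HISTORY"])
```

For real files, an established reader such as `astropy.io.fits` is likely the more robust choice; the point of the sketch is only that the extracted records can then be mapped into a standard Provenance format for querying.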

Where the Provenance, or part of it, is encoded within the Data Object, as with FITS or HDF, the Provenance Information should describe how to extract that Provenance (or piece of it) and its Representation Information, such as its Semantic RepInfo.

The Provenance Information may be encoded as a separate Data Object, for example as PREMIS or OPM. There must then be associated Representation Information for that object, for example the definition of the version of PREMIS and of the specific vocabulary used.
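One way to keep this requirement visible is to record, next to each separate Provenance object, which standard, version, and vocabulary it follows, and to flag any object where that is missing. The following is a minimal sketch under that assumption. The class names, field names, paths, and vocabulary label are hypothetical and do not come from LABDRIVE, OAIS, or PREMIS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepInfoRef:
    standard: str                     # e.g. "PREMIS" or "OPM"
    version: str                      # version of the standard the file follows
    vocabulary: Optional[str] = None  # controlled vocabulary used, if any


@dataclass
class ProvenanceObject:
    path: str
    rep_info: Optional[RepInfoRef] = None


def missing_rep_info(objects):
    """Return the paths of Provenance objects lacking Representation Information."""
    return [o.path for o in objects if o.rep_info is None]


# Hypothetical entries, for illustration only.
objects = [
    ProvenanceObject(
        "provenance/premis.xml",
        RepInfoRef("PREMIS", "3.0", "hypothetical-event-vocabulary"),
    ),
    ProvenanceObject("provenance/opm.xml"),
]
print(missing_rep_info(objects))
```

A check of this kind could run before ingest, so that a Provenance file never enters the archive without a pointer to the information needed to interpret it.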

https://www.sciencedirect.com/topics/computer-science/data-provenance
https://www.research.ed.ac.uk/en/publications/data-provenance
https://openprovenance.org
https://www.w3.org/TR/prov-dm/
https://www.loc.gov/preservation/digital/formats/fdd/fdd000317.shtml
HDF