Collecting Information needed for Re-Use and Preservation

To make information re-usable and preservable, the appropriate supporting information should be collected as early as possible, during planning and creation, before it is forgotten or lost.

Last updated 2 years ago

IPELTU takes a very general approach to describing projects, using what it terms Collection Groups: “Initiating”, “Planning”, “Executing” and “Closing”. Each Collection Group requires its own Additional Information.

The table below provides examples for the various stages. The IPELTU document provides further details and checklists for a number of types of projects.

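Additional Information Area, by Collection Group (Initiating, Planning, Executing, Closing)

For each Additional Information Area, the items below follow the Collection Group order: Initiating, Planning, Executing, Closing.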

Data Object

  • Estimate of volume of data to be produced

  • Ideas of the potential value of the data

  • Update Additional Information from Initiating based on more detailed plans

  • Identify types of data (raw, processed, etc.) which should be preserved

  • Identify types of data e.g., images, tables – and any generic interfaces

  • Quality constraints

  • Planned rate of data production

  • Expand and add detail

  • Update Additional Information from Planning based on what really happens

  • Finalise Additional Information from Executing

  • Inventory of data produced which should be preserved

  • Volume that would require preservation

  • Collect quality checks which may be performed on the data by non-experts

  • Define Information Properties which may be useful

  • Checks for (and logs of) any missing data
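
The Closing items for the Data Object (an inventory, the volume to preserve, and checks for missing data) lend themselves to a small script. The following is a minimal sketch, not a LABDRIVE feature: the function names `inventory`, `summarise` and `missing_items` are hypothetical, and it assumes the data sits in a local directory tree and that the project keeps its own list of expected relative paths.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List


def inventory(root: Path) -> List[Dict[str, object]]:
    """List every file under root with its relative path, size and suffix."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "suffix": path.suffix.lower() or "(none)",
            })
    return rows


def summarise(rows: List[Dict[str, object]]) -> Dict[str, object]:
    """Report the file count, total volume and volume per file suffix."""
    by_suffix = Counter()
    for row in rows:
        by_suffix[row["suffix"]] += row["bytes"]
    return {
        "files": len(rows),
        "total_bytes": sum(row["bytes"] for row in rows),
        "bytes_by_suffix": dict(by_suffix),
    }


def missing_items(rows: List[Dict[str, object]], expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that are absent from the inventory."""
    present = {row["path"] for row in rows}
    return sorted(set(expected) - present)
```

Calling `summarise(inventory(Path("project_data")))` would give the volume figures, and the output of `missing_items` can be written to a log so that gaps are recorded rather than silently lost. The directory name here is only an example.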

Representation Information

  • Standards planned to be used

  • Information Model

  • Update Additional Information from Initiating based on more detailed plans

  • Review applicable standards

  • Refine Information Model

  • Choice of data format

  • Identify Hardware and Software Dependencies

  • Relationships between data items

  • Update Additional Information from Planning based on what really happens

  • Collect Semantics of the data elements e.g., data dictionaries and other semantics

  • Collect Format definitions and formal descriptions

  • Create Other Data Documentation

  • Calibration and system test tools and system test data that will be delivered

  • Finalise Additional Information from Executing

  • Finalise Representation Information Networks to reasonable level

  • Identify other software which may be used on the data

  • Create suggestions for the Designated Community and Representation Information needed

Reference Information

  • Identify standards which will be used to identify and reference the data and metadata

  • Update Additional Information from Initiating based on more detailed plans

  • Identify which unique identifiers should be used (e.g., DOI or other)

  • Update Additional Information from Planning based on what really happens

  • Rules, methods, tools for referencing data

  • Generate references to data as it is being created/captured

  • Finalise Additional Information from Executing

  • Identify what may be used in future to identify the Information

  • Checks for (and logs of) missing references and logs of any

Provenance Information

  • Record of origins of the project e.g., in a Current Research Information System (CRIS)

  • Update Additional Information from Initiating based on more detailed plans

  • Define Processing workflow, Processing inputs and Processing parameters

  • Define System Testing required

  • Documents from system development milestones

  • Update Additional Information from Planning based on what really happens

  • Documentation about the hardware and software used to create the data, including a history of the changes in these over time

  • Update Documentation of Processing workflow, Processing inputs and Processing parameters

  • Record who was responsible for each stage of processing

  • Record when each stage was performed

  • Record of any special hardware needed

  • Record Calibration

  • Processing logs

  • Record checking of Fixity

  • Finalise Additional Information from Executing

  • Finalise Provenance handover
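
Several of the Provenance items above (who was responsible for each stage, when it was performed, which inputs and parameters were used) can be captured as structured log entries while the work happens. The following is a minimal sketch under stated assumptions: the `ProcessingStep` class, its field names and the `to_log_line` helper are hypothetical illustrations, not a LABDRIVE or IPELTU schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


def _utc_now() -> str:
    """Current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ProcessingStep:
    """One stage of processing, recorded at the time it is performed."""
    stage: str                      # e.g. "calibration" (hypothetical label)
    responsible: str                # who carried out this stage
    inputs: List[str]               # identifiers or paths of the input data
    parameters: Dict[str, object] = field(default_factory=dict)
    software: str = ""              # software name and version, if known
    performed_at: str = field(default_factory=_utc_now)


def to_log_line(step: ProcessingStep) -> str:
    """Serialise a step as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(step), sort_keys=True)
```

Appending one such line per stage to a plain text file gives a processing log that remains readable without special software, which helps when Provenance is finalised and handed over at Closing.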

Context Information

  • Outline of background concepts needed to understand the project

  • Update Additional Information from Initiating based on more detailed plans

  • Update Additional Information from Planning based on what really happens

  • Collect publications related to the data or the processing system

  • Potential Value of the data and likely business case for sustainability

  • Finalise Additional Information from Executing

  • Identify related data which may in the future be combined with this data

Fixity Information

  • Fixity mechanism (e.g., CRC or digest) of data which may be preserved

  • Update Additional Information from Planning based on what really happens

  • Identify any special validation procedures that should be carried out

  • Finalise Additional Information from Executing

  • Identify how to verify that all files are intact
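
One common way to address the Fixity items above is to record a digest for every file and later re-compute and compare them. The following is a minimal sketch, assuming SHA-256 as the digest and a local directory tree; the function names `sha256_of`, `build_manifest` and `verify_manifest` are hypothetical, and the source does not prescribe any particular fixity mechanism.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the files under root with a previously recorded manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in shared if manifest[k] != current[k]),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A manifest built during Executing can be stored alongside the data; running `verify_manifest` at Closing, and again after any transfer, reports files that are missing, altered or unexpectedly present. An empty report in all three categories indicates that every recorded file is intact.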

Access Rights Information

  • What are the restrictions on access in the long term?

  • Clear identification of Intellectual Property Rights

  • Owners of the data – who can authorize hand-over

  • Update Additional Information from Planning based on what really happens

  • Finalise Additional Information from Executing

  • Licenses involved

  • The owner, and the restrictions on access (licenses), and the intellectual property rights

Packaging Information

  • Details of the way components are packaged together for delivery to a repository

  • Definition of mechanisms for transferring information to next element in the workflow or next in the chain of preservation (e.g., definitions of SIPs)

Descriptive Information

  • Identification of methods for exploration/quick look at the data

  • Finalise Additional Information from Executing

  • Create browse/query data if needed

Issues Outside the Information Model

  • Estimated Cost of the project

  • The budget for archiving and its relationship to the overall budget for the project

  • The schedule for major project milestones and deliveries to the archive

  • Identification of archives which are likely to be able to host the data

  • Update Additional Information from Planning based on what really happens

  • Finalise Additional Information from Executing

  • Schedule of deliveries

  • Pointers to the components to be transferred to the next element in the workflow or next in the chain of preservation

  • Potential preservation aims for the information created

  • Potential risks to preservation and exploitation of the data

  • Define the mechanism for communication between project and archive

  • Define suggested Transformational Information Properties

  • Publications, or references to publications, including scientific publications, related to the project

Phases and cycles in a project which collects/creates information to be preserved/curated