Collecting Information needed for Re-Use and Preservation

To be able to (re-)use and preserve information, the appropriate Additional Information should be collected as early as possible during planning and creation, before it is forgotten or lost.

IPELTU uses a very general approach to describing projects, in terms of what are termed Collection Groups, namely “Initiating”, “Planning”, “Executing” and “Closing”, each of which requires Additional Information.
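One way to picture this structure is as a checklist in which every item pairs a Collection Group with an Additional Information Area (such as Data Object or Reference Information, listed in the table further down). The following is only a minimal sketch of how a project might track such items; the class and field names (`ChecklistItem`, `ProjectChecklist`, `collected`) are assumptions made for illustration and are not defined by IPELTU.

```python
from dataclasses import dataclass, field
from enum import Enum


class CollectionGroup(Enum):
    """The four Collection Groups named in the text."""
    INITIATING = "Initiating"
    PLANNING = "Planning"
    EXECUTING = "Executing"
    CLOSING = "Closing"


@dataclass
class ChecklistItem:
    """One piece of Additional Information to collect (hypothetical structure)."""
    area: str                # Additional Information Area, e.g. "Data Object"
    group: CollectionGroup   # stage at which the item should be collected
    description: str
    collected: bool = False


@dataclass
class ProjectChecklist:
    """Hypothetical container for all items of one project."""
    items: list[ChecklistItem] = field(default_factory=list)

    def add(self, area, group, description):
        item = ChecklistItem(area, group, description)
        self.items.append(item)
        return item

    def outstanding(self, group):
        """Items of the given Collection Group that are not yet collected."""
        return [i for i in self.items if i.group is group and not i.collected]


# Example entries taken from the table in this section.
checklist = ProjectChecklist()
volume = checklist.add("Data Object", CollectionGroup.INITIATING,
                       "Estimate of volume of data to be produced")
checklist.add("Representation Information", CollectionGroup.INITIATING,
              "Standards planned to be used")
checklist.add("Reference Information", CollectionGroup.PLANNING,
              "Identify which unique identifiers should be used")

volume.collected = True  # the volume estimate has been recorded
for item in checklist.outstanding(CollectionGroup.INITIATING):
    print(f"{item.group.value} / {item.area}: {item.description}")
```

Listing the outstanding items per Collection Group makes it easier to notice, before a stage ends, which Additional Information has not yet been captured.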

The table below provides examples for the various stages. The IPELTU document provides further details and checklists for a number of types of projects.

Collection Group | Initiating | Planning | Executing | Closing

Additional Information Area

Data Object

  • Estimate of volume of data to be produced

  • Ideas of the potential value of the data

  • Update Additional Information from Initiating based on more detailed plans

  • Identify types of data (raw, processed, etc.) which should be preserved

  • Identify types of data (e.g., images, tables) and any generic interfaces

  • Quality constraints

  • Planned rate of data production

  • Expand and add detail

  • Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Inventory of data produced which should be preserved

· Volume that would require preservation

· Collect quality checks which may be performed on the data by non-experts

· Define Information Properties which may be useful

· Checks for (and logs of) any missing data

Representation Information

· Standards planned to be used

· Information Model

· Update Additional Information from Initiating based on more detailed plans

· Review applicable standards

· Refine Information Model

· Choice of data format

· Identify Hardware and Software Dependencies

· Relationships between data items

· Update Additional Information from Planning based on what really happens

· Collect Semantics of the data elements e.g., data dictionaries and other semantics

· Collect Format definitions and formal descriptions

· Create Other Data Documentation

· Calibration and system test tools and system test data that will be delivered

· Finalise Additional Information from Executing

· Finalise Representation Information Networks to a reasonable level

· Identify other software which may be used on the data

· Create suggestions for the Designated Community and Representation Information needed

Reference Information

· Identify standards which will be used to identify and reference the data and metadata

· Update Additional Information from Initiating based on more detailed plans

· Identify which unique identifiers should be used (e.g., DOI or other)

· Update Additional Information from Planning based on what really happens

· Rules, methods, tools for referencing data

· Generate references to data as it is being created/captured

· Finalise Additional Information from Executing

· Identify what may be used in future to identify the Information

· Checks for (and logs of) any missing references

Provenance Information

· Record of origins of the project e.g., in a Current Research Information System (CRIS)

· Update Additional Information from Initiating based on more detailed plans

· Define Processing workflow, Processing inputs and Processing parameters

· Define System Testing required

· Documents from system development milestones

· Update Additional Information from Planning based on what really happens

· Documentation about the hardware and software used to create the data, including a history of the changes in these over time

· Update Documentation of Processing workflow, Processing inputs and Processing parameters

· Record who was responsible for each stage of processing

· Record when each stage was performed

· Record of any special hardware needed

· Record Calibration

· Processing logs

· Record checking of Fixity

· Finalise Additional Information from Executing

· Finalise Provenance handover

Context Information

· Outline of background concepts needed to understand the project

· Update Additional Information from Initiating based on more detailed plans

· Update Additional Information from Planning based on what really happens

· Collect publications related to the data or the processing system

· Potential Value of the data and likely business case for sustainability

· Finalise Additional Information from Executing

· Identify related data which may in the future be combined with this data

Fixity Information

· Fixity mechanism (e.g., CRC or digest) of data which may be preserved

· Update Additional Information from Planning based on what really happens

· Identify any special validation procedures that should be carried out.

· Finalise Additional Information from Executing

· Identify how to verify that all files are intact

Access Rights Information

· What are the restrictions on access in the long term?

· Clear identification of Intellectual Property Rights

· Owners of the data – who can authorize hand-over

· Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Licenses involved

· The owner, the restrictions on access (licenses) and the intellectual property rights

Packaging Information

· Details of the way components are packaged together for delivery to a repository

· Definition of mechanisms for transferring information to the next element in the workflow or the next in the chain of preservation (e.g., definitions of SIPs)

Descriptive Information

  • Identification of methods for exploration/quick look at the data

· Finalise Additional Information from Executing

· Create browse/query data if needed

Issues Outside the Information Model

  • Estimated Cost of the project

  • The budget for archiving and its relationship to the overall budget for the project

  • The schedule for major project milestones and deliveries to the archive.

  • Identification of archives which are likely to be able to host the data

  • Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Schedule of deliveries

· Pointers to the components to be transferred to the next element in the workflow or next in the chain of preservation

· Potential preservation aims for the information created

· Potential risks to preservation and exploitation of the data

· Define the mechanism for communication between project and archive.

· Define suggested Transformational Information Properties

· Publications, or references to publications, including scientific publications, related to the project.
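Several of the entries above (the inventory of data produced, the volume that would require preservation, the fixity mechanism, and verifying that all files are intact) can be supported by a simple file manifest. The following is a minimal sketch, assuming SHA-256 as the digest and a plain dictionary as the manifest; the function names `sha256_of`, `build_manifest` and `verify`, and the file names in the demonstration, are hypothetical and not prescribed by IPELTU.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Inventory every file under root: relative path -> (size in bytes, digest)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify(root, manifest):
    """Compare files under root with the manifest; return (missing, altered)."""
    root = Path(root)
    missing, altered = [], []
    for rel, (_size, digest) in manifest.items():
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif sha256_of(target) != digest:
            altered.append(rel)
    return missing, altered


# Demonstration on a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "raw.csv").write_text("t,value\n0,1.5\n")
    (data_dir / "processed.csv").write_text("t,mean\n0,1.5\n")

    manifest = build_manifest(data_dir)
    total_bytes = sum(size for size, _ in manifest.values())

    (data_dir / "raw.csv").write_text("t,value\n0,9.9\n")  # simulate corruption
    (data_dir / "processed.csv").unlink()                  # simulate loss
    missing, altered = verify(data_dir, manifest)

print("volume (bytes):", total_bytes)
print("missing:", missing)
print("altered:", altered)
```

A manifest built while the data is being created can be handed over with the data, so that a non-expert at the archive can repeat the check and log any missing or altered files.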
