Collecting Information needed for Re-Use and Preservation

To make information (re-)usable and preservable, the appropriate supporting information should be collected as early as possible during planning and creation, before it is forgotten or lost.

IPELTU takes a very general approach to describing projects, dividing them into what it terms Collection Groups, namely “Initiating”, “Planning”, “Executing” and “Closing”, each of which requires Additional Information.

Phases and cycles in a project which collects/creates information to be preserved/curated

The table below provides examples for the various stages. The IPELTU document provides further details and checklists for a number of types of projects.

Collection Group: Initiating | Planning | Executing | Closing

Additional Information Area

Data Object

  • Estimate of volume of data to be produced

  • Ideas of the potential value of the data

  • Update Additional Information from Initiating based on more detailed plans

  • Identify types of data (raw, processed, etc.) which should be preserved

  • Identify types of data e.g., images, tables – and any generic interfaces

  • Quality constraints

  • Planned rate of data production

  • Expand and add detail

  • Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Inventory of data produced which should be preserved

· Volume that would require preservation

· Collect quality checks which may be performed on the data by non-experts

· Define Information Properties which may be useful

· Checks for (and logs of) any missing data
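
The Closing items above (an inventory of the data produced, the volume requiring preservation, and checks for missing data) could be gathered with a small script. The following is a minimal sketch, assuming the data sit as files under one directory and that the project keeps a list of the items it expected to produce; the function names `inventory` and `summarise` are illustrative and not part of IPELTU.

```python
from pathlib import Path


def inventory(root: Path) -> dict[str, int]:
    """Map each file under root (relative path) to its size in bytes."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def summarise(root: Path, expected: list[str]) -> dict[str, object]:
    """Summarise what was produced and what is missing.

    `expected` is an assumed, project-maintained list of relative paths
    that should exist by the end of the Executing phase.
    """
    found = inventory(root)
    return {
        "items": found,                                  # inventory of data produced
        "total_bytes": sum(found.values()),              # volume to be preserved
        "missing": sorted(set(expected) - set(found)),   # candidates for the missing-data log
    }
```

The returned summary can be stored alongside the data so that the archive receives the inventory, the volume and the list of known gaps together.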

Representation Information

· Standards planned to be used

· Information Model

· Update Additional Information from Initiating based on more detailed plans

· Review applicable standards

· Refine Information Model

· Choice of data format

· Identify Hardware and Software Dependencies

· Relationships between data items

· Update Additional Information from Planning based on what really happens

· Collect Semantics of the data elements e.g., data dictionaries and other semantics

· Collect Format definitions and formal descriptions

· Create Other Data Documentation

· Calibration and system test tools and system test data that will be delivered

· Finalise Additional Information from Executing

· Finalise Representation Information Networks to reasonable level

· Identify other software which may be used on the data

· Create suggestions for the Designated Community and Representation Information needed

Reference Information

· Identify standards which will be used to identify and reference the data and metadata

· Update Additional Information from Initiating based on more detailed plans

· Identify which unique identifiers should be used (e.g., DOI or other)

· Update Additional Information from Planning based on what really happens

· Rules, methods, tools for referencing data

· Generate references to data as it is being created/captured

· Finalise Additional Information from Executing

· Identify what may be used in future to identify the Information

· Checks for (and logs of) missing references and logs of any

Provenance Information

· Record of origins of the project e.g., in a Current Research Information System (CRIS)

· Update Additional Information from Initiating based on more detailed plans

· Define Processing workflow, Processing inputs and Processing parameters

· Define System Testing required

· Documents from system development milestones

· Update Additional Information from Planning based on what really happens

· Documentation about the hardware and software used to create the data, including a history of the changes in these over time

· Update Documentation of Processing workflow, Processing inputs and Processing parameters

· Record who was responsible for each stage of processing

· Record when each stage was performed

· Record of any special hardware needed

· Record Calibration

· Processing logs

· Record checking of Fixity

· Finalise Additional Information from Executing

· Finalise Provenance handover
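
Several of the Provenance items above (who was responsible for each stage, when it was performed, and the processing inputs and parameters) lend themselves to an append-only log written as the work happens. The following is a minimal sketch, assuming a JSON Lines file with one record per processing stage; the `ProcessingStep` fields and the helper names are illustrative assumptions, not a format defined by IPELTU.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessingStep:
    """One stage of processing (hypothetical record layout)."""
    stage: str                      # name of the processing stage
    responsible: str                # who was responsible for the stage
    inputs: list[str]               # processing inputs
    parameters: dict[str, object]   # processing parameters
    # When the stage was performed; defaults to the current UTC time.
    performed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_step(log_path: str, step: ProcessingStep) -> None:
    """Append one step to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(step), sort_keys=True) + "\n")


def read_log(log_path: str) -> list[dict]:
    """Read all recorded steps back, in the order they were written."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Appending rather than rewriting keeps earlier records untouched, which makes the log easier to hand over at Closing.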

Context Information

· Outline of background concepts needed to understand the project

· Update Additional Information from Initiating based on more detailed plans

· Update Additional Information from Planning based on what really happens

· Collect publications related to the data or the processing system

· Potential Value of the data and likely business case for sustainability

· Finalise Additional Information from Executing

· Identify related data which may in the future be combined with this data

Fixity Information

· Fixity mechanism (e.g., CRC or digest) of data which may be preserved

· Update Additional Information from Planning based on what really happens

· Identify any special validation procedures that should be carried out.

· Finalise Additional Information from Executing

· Identify how to verify that all files are intact
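
One common way to address the fixity items above is to record a digest for every file and re-check it later. The following is a minimal sketch, assuming SHA-256 as the digest and a simple in-memory manifest mapping relative paths to digests; the function names are illustrative, and a project may equally choose a CRC or another digest algorithm.

```python
import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

If all three lists in the result are empty, every file recorded in the manifest is present and unchanged; the result itself can be kept as a record that fixity was checked.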

Access Rights Information

· What are the restrictions on access in the long term?

· Clear identification of Intellectual Property Rights

· Owners of the data – who can authorize hand-over

· Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Licenses involved

· The owner, the restrictions on access (licenses) and the intellectual property rights

Packaging Information

· Details of the way components are packaged together for delivery to a repository

· Definition of mechanisms for transferring information to next element in the workflow or next in the chain of preservation (e.g., definitions of SIPs)

Descriptive Information

  • Identification of methods for exploration/quick look at the data

· Finalise Additional Information from Executing

· Create browse/query data if needed

Issues Outside the Information Model

  • Estimated Cost of the project

  • The budget for archiving and its relationship to the overall budget for the project

  • The schedule for major project milestones and deliveries to the archive.

  • Identification of archives which are likely to be able to host the data

  • Update Additional Information from Planning based on what really happens

· Finalise Additional Information from Executing

· Schedule of deliveries

· Pointers to the components to be transferred to the next element in the workflow or next in the chain of preservation

· Potential preservation aims for the information created

· Potential risks to preservation and exploitation of the data

· Define the mechanism for communication between project and archive.

· Define suggested Transformational Information Properties

· Publications, or references to publications, including scientific publications, related to the project.
