Collecting Information needed for Re-Use and Preservation
To be able to (re-)use and preserve information, appropriate information about it should be collected as early as possible during its planning and creation, before it is forgotten or lost.
IPELTU takes a very general approach to describing projects, in terms of what are termed Collection Groups, namely “Initiating”, “Planning”, “Executing” and “Closing”, each of which requires Additional Information.
The table below provides examples for the various stages. The IPELTU document provides further details and checklists for a number of types of projects.
Collection Group | Initiating | Planning | Executing | Closing |
---|---|---|---|---|
Additional Information Area | | | | |
Data Object | | | | · Finalise Additional Information from Executing · Inventory of data produced which should be preserved · Volume that would require preservation · Collect quality checks which may be performed on the data by non-experts · Define Information Properties which may be useful · Checks for (and logs of) any missing data |
Representation Information | · Standards planned to be used · Information Model | · Update Additional Information from Initiating based on more detailed plans · Review applicable standards · Refine Information Model · Choice of data format · Identify Hardware and Software Dependencies · Relationships between data items | · Update Additional Information from Planning based on what really happens · Collect Semantics of the data elements e.g., data dictionaries and other semantics · Collect Format definitions and formal descriptions · Create Other Data Documentation · Calibration and system test tools and system test data that will be delivered | · Finalise Additional Information from Executing · Finalise Representation Information Networks to reasonable level · Identify other software which may be used on the data · Create suggestions for the Designated Community and Representation Information needed |
Reference Information | · Identify standards which will be used to identify and reference the data and metadata | · Update Additional Information from Initiating based on more detailed plans · Identify which unique identifiers should be used (e.g., DOI or other) | · Update Additional Information from Planning based on what really happens · Rules, methods, tools for referencing data · Generate references to data as it is being created/captured | · Finalise Additional Information from Executing · Identify what may be used in future to identify the Information · Checks for (and logs of) any missing references |
Provenance Information | · Record of origins of the project e.g., in a Current Research Information System (CRIS) | · Update Additional Information from Initiating based on more detailed plans · Define Processing workflow, Processing inputs and Processing parameters · Define System Testing required · Documents from system development milestones | · Update Additional Information from Planning based on what really happens · Documentation about the hardware and software used to create the data, including a history of the changes in these over time · Update Documentation of Processing workflow, Processing inputs and Processing parameters · Record who was responsible for each stage of processing · Record when each stage was performed · Record of any special hardware needed · Record Calibration · Processing logs · Record checking of Fixity | · Finalise Additional Information from Executing · Finalise Provenance handover |
Context Information | · Outline of background concepts needed to understand the project | · Update Additional Information from Initiating based on more detailed plans | · Update Additional Information from Planning based on what really happens · Collect publications related to the data or the processing system · Potential Value of the data and likely business case for sustainability | · Finalise Additional Information from Executing · Identify related data which may in the future be combined with this data |
Fixity Information | | · Fixity mechanism (e.g., CRC or digest) of data which may be preserved | · Update Additional Information from Planning based on what really happens · Identify any special validation procedures that should be carried out | · Finalise Additional Information from Executing · Identify how to verify that all files are intact |
Access Rights Information | | · What are the restrictions on access in the long term? · Clear identification of Intellectual Property Rights · Owners of the data – who can authorize hand-over | · Update Additional Information from Planning based on what really happens | · Finalise Additional Information from Executing · Licenses involved · The owner, and the restrictions on access (licenses), and the intellectual property rights |
Packaging Information | | | | · Details of the way components are packaged together for delivery to a repository · Definition of mechanisms for transferring information to next element in the workflow or next in the chain of preservation (e.g., definitions of SIPs) |
Descriptive Information | | | | · Finalise Additional Information from Executing · Create browse/query data if needed |
Issues Outside the Information Model | | | | · Finalise Additional Information from Executing · Schedule of deliveries · Pointers to the components to be transferred to the next element in the workflow or next in the chain of preservation · Potential preservation aims for the information created · Potential risks to preservation and exploitation of the data · Define the mechanism for communication between project and archive · Define suggested Transformational Information Properties · Publications, or references to publications, including scientific publications, related to the project |
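
Several entries in the table (the inventory and volume of data to be preserved, a digest-based Fixity mechanism, and checks for missing data or for whether all files are intact) lend themselves to simple automation. The following is a minimal Python sketch of one possible approach, not something prescribed by IPELTU: the function names `sha256_of`, `build_manifest` and `verify_manifest`, the manifest layout, and the choice of SHA-256 as the digest are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Inventory every file under data_dir with its size and digest.

    The layout of the returned dict is hypothetical, chosen only for
    this example.
    """
    files = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        files[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return {
        "algorithm": "sha256",
        # Total volume that would require preservation.
        "total_bytes": sum(entry["bytes"] for entry in files.values()),
        "files": files,
    }


def verify_manifest(data_dir: Path, manifest: dict) -> dict:
    """Compare data_dir with a manifest; list missing and altered files."""
    missing, altered = [], []
    for rel, entry in manifest["files"].items():
        path = data_dir / rel
        if not path.is_file():
            missing.append(rel)
        elif sha256_of(path) != entry["sha256"]:
            altered.append(rel)
    return {"missing": missing, "altered": altered}
```

A manifest could be built during Executing, stored alongside the data (for example as JSON), and checked again with `verify_manifest` at Closing or after transfer to a repository. The check only reports files listed in the manifest that are missing or changed; files added later are not detected unless a new manifest is built and compared.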