General considerations
A preservation system processes very large volumes of data. These data are usually stored on online, automatically accessible storage such as hard disks. This has some consequences that should be considered when operating the system:
a. Ingestion, audit, retrieve and datamover operations on objects take time to complete, anywhere from seconds to days.
b. Technical factors that influence the duration of these processes are:
i. Physical distribution of the data across the disks.
ii. Accumulation of concurrent tasks that require disk access.
iii. Communication speed between the processing unit and the storage units.
c. Operational factors that influence the duration are:
i. Time that tasks requiring disk access spend queued.
ii. Size and/or number of the objects affected by the tasks.
For this reason, tasks that are expected to run for a long time should be carried out according to a prior plan, and in such a way that the operator or administrator responsible for the platform has visibility of them.
If the storage systems used by LIBSAFE are shared with other tools, the administrators of those shared systems will need to study the effects of this sharing in more detail.
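To make the planning advice concrete, the sketch below combines the factors listed above into a back-of-the-envelope duration estimate and flags tasks that exceed a threshold. It is an illustrative linear model, not how LIBSAFE computes anything: `TaskProfile`, `estimate_duration_s`, `needs_planning`, the one-hour threshold and the per-object overhead are all hypothetical, and data layout and link speed are assumed to be folded into a single effective throughput figure.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical inputs for a rough, linear duration estimate."""
    total_bytes: int               # size of the objects affected by the task
    object_count: int              # number of objects affected
    throughput_bytes_per_s: float  # assumed effective throughput (folds in data layout and link speed)
    per_object_overhead_s: float   # assumed fixed cost per object (e.g. checksums, metadata)
    queue_wait_s: float            # time spent waiting behind other disk-bound tasks


def estimate_duration_s(task: TaskProfile) -> float:
    """Queue wait + data transfer + per-object overhead, all in seconds."""
    transfer_s = task.total_bytes / task.throughput_bytes_per_s
    overhead_s = task.object_count * task.per_object_overhead_s
    return task.queue_wait_s + transfer_s + overhead_s


def needs_planning(task: TaskProfile, threshold_s: float = 3600.0) -> bool:
    """Flag tasks expected to exceed the (assumed) threshold so they can be scheduled in advance."""
    return estimate_duration_s(task) > threshold_s


# Example: 1 TB at an effective 100 MB/s, 10,000 objects, 10 minutes of queueing.
audit = TaskProfile(
    total_bytes=10**12,
    object_count=10_000,
    throughput_bytes_per_s=100e6,
    per_object_overhead_s=0.05,
    queue_wait_s=600.0,
)
print(round(estimate_duration_s(audit)))  # 11100 (seconds, roughly 3 hours)
print(needs_planning(audit))              # True
```

In this model the transfer term dominates for large objects and the per-object term dominates for many small ones, which mirrors the "size and/or number" factor above.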
An important point to consider is that preservation implies immutability. Everything inserted into a preservation system is meant to be kept indefinitely, with no possibility of changing or editing it. Because of this:
Once any kind of information is inserted, it cannot be changed, and the insertion is recorded so that its influence on, or implication in, any process of the system can be determined.
Changing, editing or versioning any data in the system actually means creating a new copy of the data and saving it linked to the original. For example, an object can evolve through successive versions, but the original still exists; likewise, a preservation plan can be edited, but what actually happens is that a copy is made and then set as active.
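The copy-on-edit behaviour described above can be sketched as an append-only record. This is a minimal illustration, assuming an in-memory model; `Version`, `PreservedItem` and their methods are hypothetical names, not LIBSAFE's data model. Versions are frozen, "editing" appends a copy linked to the version it came from and marks it active, and every action is logged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    """One immutable version; frozen=True forbids changing its fields after creation."""
    number: int
    content: bytes
    derived_from: Optional[int]  # version it was copied from; None for the original
    created_at: datetime


class PreservedItem:
    """Hypothetical append-only record: versions are added, never edited or deleted."""

    def __init__(self, original: bytes) -> None:
        self._versions: List[Version] = []
        self._log: List[str] = []  # every action is recorded here
        self._active = self._append(original, derived_from=None)

    def _append(self, content: bytes, derived_from: Optional[int]) -> int:
        number = len(self._versions) + 1
        self._versions.append(
            Version(number, content, derived_from, datetime.now(timezone.utc))
        )
        origin = f"from v{derived_from}" if derived_from else "(original)"
        self._log.append(f"v{number} created {origin}")
        return number

    def edit(self, new_content: bytes) -> int:
        """'Editing' stores a new version linked to the active one and makes it active."""
        number = self._append(new_content, derived_from=self._active)
        self._active = number
        self._log.append(f"v{number} set as active")
        return number

    @property
    def active(self) -> Version:
        return self._versions[self._active - 1]

    @property
    def versions(self) -> Tuple[Version, ...]:
        return tuple(self._versions)  # read-only snapshot for callers

    @property
    def log(self) -> Tuple[str, ...]:
        return tuple(self._log)


plan = PreservedItem(b"audit monthly")
plan.edit(b"audit weekly")
print(plan.active.content)       # b'audit weekly'
print(plan.versions[0].content)  # b'audit monthly'
print(plan.log)                  # ('v1 created (original)', 'v2 created from v1', 'v2 set as active')
```

Keeping the old version and only moving the "active" pointer is what lets the original survive every edit.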
LIBSAFE runs most of its processes asynchronously. This means that it is not necessary to wait for one process to end before defining or executing another. For example, we can create several ingestion jobs without waiting for the previous one to finish, or create retrieve jobs while ingestion and audit jobs are running.
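The general asynchronous pattern can be illustrated with Python's standard `concurrent.futures` module. This does not show LIBSAFE's own job interface; `run_job`, `submit_jobs`, `status` and the job names are hypothetical, and the sleeps merely stand in for disk-bound work. Submitting returns immediately, and a non-blocking status snapshot gives the operator visibility while jobs run.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Dict


def run_job(kind: str, name: str, seconds: float) -> str:
    """Hypothetical stand-in for a long-running, disk-bound job."""
    time.sleep(seconds)
    return f"{kind} {name} finished"


def submit_jobs(pool: ThreadPoolExecutor) -> Dict[str, Future]:
    # submit() returns a Future right away, so each job is queued
    # without waiting for the previous one to finish.
    return {
        "ingest-1": pool.submit(run_job, "ingestion", "batch-1", 0.2),
        "ingest-2": pool.submit(run_job, "ingestion", "batch-2", 0.2),
        "audit-1": pool.submit(run_job, "audit", "collection-A", 0.3),
        "retrieve-1": pool.submit(run_job, "retrieve", "object-42", 0.1),
    }


def status(jobs: Dict[str, Future]) -> Dict[str, str]:
    """Non-blocking snapshot an operator could inspect at any time."""
    return {name: ("done" if f.done() else "running/queued") for name, f in jobs.items()}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        jobs = submit_jobs(pool)
        print(status(jobs))  # jobs are likely still running right after submission
        wait(jobs.values())  # block only when the results are actually needed
        for name, f in jobs.items():
            print(name, "->", f.result())
```

A thread pool is used here only because the stand-in jobs just sleep; real disk-bound jobs would still compete for the same storage, which is why the duration factors listed earlier still apply.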