Data Integrity

In content preservation, security mechanisms are needed to guarantee that information has not been modified over time or by external factors. Tools that verify this are therefore important in processes such as transferring information, recovering content, or checking the integrity of data produced by a computation.

In digital preservation, files are kept for an indeterminate period of time and make up the data structures that LIBSAFE calls objects. Throughout that period, validation mechanisms must confirm that the content of these files remains unchanged. The files of an object may be moved to another location for reasons of performance or free space, but a file must never be altered after it has been ingested.

For this reason, LIBSAFE provides a tool that digitally verifies that the files contained within an object are still the ones that were originally added to it. This tool is the cryptographic hash value (checksum) calculated for each file, which is used to check that the current version of a file, or one recovered some time later, is exactly the same as the initial version.
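The check described above can be illustrated with a minimal sketch. It is not LIBSAFE's implementation: the function names `file_hash` and `is_unchanged` are hypothetical, and SHA-256 is assumed only as an example default. The idea is to record a hash value at ingest time and compare it with a freshly calculated one later.

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, stored_hash, algorithm="sha256"):
    """Compare the current hash of the file with the value stored at ingest."""
    return file_hash(path, algorithm) == stored_hash
```

A file that was moved to another disk without being altered yields the same hash value, so `is_unchanged` returns `True`; any change to its content makes the comparison fail.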

LIBSAFE currently supports three of the most commonly used hash algorithms for file verification: MD5, SHA-1 and SHA-256.
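As an illustration only, the three algorithms can be calculated for a file in a single read using Python's standard `hashlib` module. The function name `multi_hash` and the `ALGORITHMS` tuple are assumptions made for this sketch and say nothing about how LIBSAFE calculates the values internally.

```python
import hashlib

# hashlib names for MD5, SHA-1 and SHA-256.
ALGORITHMS = ("md5", "sha1", "sha256")


def multi_hash(path, algorithms=ALGORITHMS, chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file, reading it only once."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the same chunk to every selected algorithm.
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Reading the file once and updating all digests together avoids repeating the disk access for each selected algorithm.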

To use these algorithms, they must be defined when creating the environment in which the platform configures all the mechanisms to be used, that is, when creating a Preservation Plan or Area. At this point the user chooses which algorithms to use and which one will be the default algorithm for linear flow operations, such as verifying files when an object is moved to other disks.

When the user creates a Preservation Plan or Area in LIBSAFE, one or several algorithms can be chosen from the available list. All algorithms selected for the plan are listed, with an option to mark one of them as the master (default) algorithm.

If more than one algorithm needs to be included in the plan, the user can click on the "select" link to choose additional algorithms.

A selection entry appears in the interface with the available algorithms (in black) and the selected algorithms (in light grey). The user can select the new algorithm and finish the process by clicking on the Add link.
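The selection rules described above (one or several algorithms per plan, exactly one of them marked as master) can be modelled with a small sketch. The class `PlanHashSettings` and its methods are hypothetical names used for illustration; they do not correspond to any LIBSAFE API.

```python
from dataclasses import dataclass

# Algorithms named in this document as available in the platform.
AVAILABLE = ("MD5", "SHA-1", "SHA-256")


@dataclass
class PlanHashSettings:
    """Hypothetical model of the hash settings of a Preservation Plan."""
    algorithms: list
    master: str

    def __post_init__(self):
        unknown = [a for a in self.algorithms if a not in AVAILABLE]
        if unknown:
            raise ValueError(f"unsupported algorithms: {unknown}")
        if self.master not in self.algorithms:
            raise ValueError("the master algorithm must be one of the selected ones")

    def selectable(self):
        """Algorithms that are available but not yet selected."""
        return [a for a in AVAILABLE if a not in self.algorithms]

    def add(self, algorithm):
        """Add one more algorithm, mirroring the select and Add steps."""
        if algorithm not in self.selectable():
            raise ValueError(f"{algorithm} is not available for selection")
        self.algorithms.append(algorithm)
```

For example, `PlanHashSettings(["SHA-256"], master="SHA-256")` starts with a single algorithm, and calling `add("MD5")` leaves SHA-256 as the master while making both algorithms part of the plan.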

The result of this algorithm selection is visible in the details of an object's file. Internally, it is also used when moving files with the Datamover functionality or when retrieving a copy of the object for consultation with the Retrieve service. To view the hash information, go to the File and Folder Structure section of the object catalogue and click on a file name; the file details that appear include the algorithm(s) used.

All algorithms defined for or associated with the plan are listed in the File Hash row of the file details summary.

The Hash table inside this row has four columns:

  • Hash algorithm: The cryptographic algorithm used

  • Hash value: The result of applying the algorithm to the file

  • Creation date time: The date and time when the calculation was made

  • Master: Indicates which of the algorithms is used by default for all processes in the preservation plan, including internal verification processes such as moving objects between disks and retrieving object files.
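The four columns listed above can be represented as a simple record, with verification carried out using the row marked as master. This is a sketch under stated assumptions: `HashRow`, `build_rows` and `verify_with_master` are hypothetical names, and the content is passed as bytes to keep the example short.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Mapping from the algorithm names shown in the interface to hashlib names.
HASHLIB_NAMES = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256"}


@dataclass(frozen=True)
class HashRow:
    algorithm: str      # Hash algorithm
    value: str          # Hash value
    created: datetime   # Creation date time
    master: bool        # Master


def build_rows(data, algorithms, master):
    """Calculate one row per selected algorithm for the given content."""
    now = datetime.now(timezone.utc)
    return [
        HashRow(name, hashlib.new(HASHLIB_NAMES[name], data).hexdigest(), now, name == master)
        for name in algorithms
    ]


def verify_with_master(data, rows):
    """Recalculate the hash with the master algorithm and compare values."""
    masters = [row for row in rows if row.master]
    if len(masters) != 1:
        raise ValueError("exactly one row must be marked as master")
    row = masters[0]
    return hashlib.new(HASHLIB_NAMES[row.algorithm], data).hexdigest() == row.value
```

With rows built for `["MD5", "SHA-256"]` and `master="SHA-256"`, only the SHA-256 value is recalculated during verification, which reflects the role of the master algorithm in processes such as moving or retrieving an object.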
