Ingestion – Execution of an ingestion job

No user intervention is required while an ingestion job is executing. What follows is an explanation of the information available on screen, as well as the tasks the system performs.

When accessing an ingestion job under execution, the screen will show its detail, which is updated periodically (every minute, subject to system configuration values).

The first section shows the same detail as at creation, plus a progress bar, the task currently being executed and the type of ingestion job. The type can be Manual (the user started the job), Automatic (the automatic ingestion process started the job) or Evolution (the job results from a manual or automatic evolution process).

The tasks/modules that make up an ingestion job are:

  • Sanitizing: the process in which the sanitizers configured in the preservation plan are applied. It typically consists of cleaning the object of defects that do not prevent the object’s correct composition, for example deleting temporary folders or system index files, such as Thumbs.db, that some operating systems generate automatically when handling files.

  • Preprocessing: preprocessing actions, evolution of pre-ingestion files, and file format characterization and validation are executed in this phase. All of these actions are optional, so all or only some of them are executed, depending on the preservation plan configuration.

  • Explorer: in this process, the object metadata is processed and an index of the object's files and folders is built. The metadata examined is determined by the metadata schema and filter defined for use in the preservation plan. This phase is also designed for the execution of the characterization and validation routines.

  • Checks: the process in which the checks configured in the preservation plan are applied. If the object does not conform to these checks, the ingestion job stops and user intervention is needed.

  • Archiving: the module and process that makes the copies of the object in the storage areas. It first calculates a hash signature, plans the future location of the object copies on the disks, copies the information to the database for later cataloging, calculates a second hash from the copied files on the destination disks, checks that the two hash signatures are equal, and packages the information in an XML file to be added to the object AIP.

  • Audit: a process that has been programmed completely independently of the other modules. It performs the same checks as the archiving module and then compares the results to certify the correct archiving (and preservation) of the objects.
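
The hash, copy, re-hash and compare sequence described for the Archiving task can be illustrated with a minimal Python sketch. This is not the system's implementation: the function names (file_hash, archive_file), the use of SHA-256 and the folder-per-storage-area layout are assumptions made only for illustration, and the location planning, database cataloging and XML packaging steps are left out.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy source into each destination folder and verify every copy.

    Hypothetical helper: each destination stands in for one storage area.
    Returns a mapping of copy path to True when the hashes match.
    """
    source_hash = file_hash(source)  # first hash, taken before copying
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, folder / source.name))
        # second hash, calculated from the copied file on the destination
        results[str(copy_path)] = file_hash(copy_path) == source_hash
    return results
```

An independent audit, as described for the Audit task, would recompute the same hashes with separately written code and compare its results with these.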

The second section shows the results in the different areas. They can be consulted in Related audit job summary and Ingestion job results. If a task fails any of the controls, the reasons can be consulted in these sections by expanding the execution details of each element.

Next comes the File format characterization and validation section. Its charts show the distribution of files by file format, the status of file validation, and a combination of the two: the distribution of file formats with their validation statuses. A summary of the file formats identified in the ingestion job follows, and then a dynamic list of files.
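
As a rough illustration of the three distributions these charts represent, the counts can be derived as in the Python sketch below. The records, format names and status values ("valid", "invalid") are hypothetical sample data, not output from the system.

```python
from collections import Counter

# Hypothetical records: (file path, file format, validation status).
files = [
    ("docs/report.pdf", "PDF", "valid"),
    ("docs/scan.tif", "TIFF", "valid"),
    ("docs/draft.pdf", "PDF", "invalid"),
]

# Distribution of files by file format.
by_format = Counter(fmt for _, fmt, _ in files)

# Distribution of files by validation status.
by_status = Counter(status for _, _, status in files)

# Combined view: validation statuses within each file format.
by_format_and_status = Counter((fmt, status) for _, fmt, status in files)
```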

The summary shows additional information for every file format identified in the ingestion process, gives access to format-specific details, and lets the list of files be filtered by file format.

The list of files shows the files processed in the ingestion process. It can be filtered by related path, object name, validation status or file format (as mentioned above).

The next section shows information related to file transformation (ingestion evolution), if that was configured in the preservation plan and if there were files that should be transformed.

The section called Objects preserved by this job shows the ingested objects as soon as the ingestion job has finished successfully. The ingested objects are listed, each with access to its object or object group detail.

The last section shows the logs of all the tasks that make up the complete job, as well as a summary of the ingestion job history, where the time spent on each section can be consulted.

This job execution screen remains available for historical consultation at any time.

Note: the ingestion jobs under execution available for consultation are not only those created manually, but also those created through the automatic (external) ingestion system and the file format evolution system.
