Understanding OAIS and ISO 16363

By: David Giaretta (head of the OAIS and ISO 16363 working group)

Overview of OAIS

The Reference Model for an Open Archival Information System (OAIS), also known as ISO 14721:2012, is the "gold standard" guiding how digital preservation should be done. It is available from ISO and also from the CCSDS site. It is the fundamental standard for digital preservation and provides a high-level reference model or framework identifying the participants in digital preservation, their roles and responsibilities, and the kinds of information to be exchanged during the course of deposit and ingest into and dissemination from a digital repository.

OAIS itself defines what conformance means; this definition is detailed enough to allow one to say that a specific repository is NOT conformant and therefore is not able to carry out digital preservation properly.

In order to be sure that a repository IS ABLE to carry out digital preservation properly, the standard ISO 16363, which builds on OAIS, must be used.

Overview of ISO 16363

ISO 16363:2012 is available from ISO and also from the CCSDS web site.

The standard was designed to provide the basis of an ISO audit and certification process. However, it also contains a great deal of additional explanation so that repository managers can perform initial assessments of their own repositories.

It was written with auditors in mind, recognising that the outcome of an audit will depend on the auditors’ judgement. To help auditors, the standard is organised hierarchically. At the top level it directs the auditors’ attention to three separate aspects of the repository:

  1. organizational infrastructure – which addresses the repository organisation, its commitment to preservation, governance, staffing, financial sustainability and legal responsibilities

  2. digital object management – which addresses the fundamentals of digital preservation, following the OAIS concepts

  3. infrastructure and security risk management – addressing security aspects, which may be taken care of by ISO 27000 certification

Within each of these aspects, specific metrics direct the auditors’ attention to particular areas; where appropriate, a metric is further broken down into sub-metrics to ensure that even more specific points are inspected.

Repository managers must also be able to use the standard in order to prepare for audits.

For auditors, and even more so for repository managers, each metric has additional explanatory text:

  • supporting text – which provides a brief explanation of why the metric is important

  • examples of evidence the repository may present

  • a more detailed discussion of the metric – to provide a broader understanding of the metric

It is unlikely that any repository will be found to be perfect in all metrics. The aim of the audit is to identify areas which are in need of improvement – as part of a cycle of continuous improvement.

OAIS and ISO 16363 Conformance

OAIS Conformance, which also underpins ISO 16363 conformance, is defined as follows (using the updated version of OAIS):

A conforming OAIS Archive implementation shall support, and be able to map to the components of, the model of information .... A conforming OAIS Archive shall fulfill the responsibilities ....

In order to clarify what these requirements mean, the next sections outline the Information Model and Mandatory Responsibilities.

What does an archive need right now

An archive must have an archive system which supports OAIS conformance, as described below. Some aspects of conformance rely on policies and procedures, while others rely on the capabilities of the software system. LABDRIVE provides the latter, while templates and advice for the former can be provided.

The amount of each of the various types of "metadata" depends on the aims of the archive. It may be that at this moment very little of certain kinds of "metadata" is needed, but in future, in order for the information to be preserved, a great deal more will need to be collected. The software system must not be limited to what is needed right now, but instead must be able to deal with the challenges presented as time passes. This is the very essence of the requirement for preservation.

OAIS Information Model

OAIS does not use the word "metadata" because this has been found to be confusing; "metadata" is interpreted by different people to mean widely different things. Instead OAIS uses more precise terminology for the various components needed in order for information to be preserved. The aim is to be able to clearly define what and how much of the various types of information are needed in order to successfully carry out preservation. An archive is not required to use OAIS terminology BUT it must be able to clearly identify what the various required pieces of information identified by OAIS are in its organisation.

The OAIS Information Model is fully expanded in the diagram of the Archival Information Package (AIP).

Any Information which is to be preserved, referred to as the Content Information, must be associated with all these pieces of "metadata". The AIP is a logical container; in other words, the various components need not be stored together, but it must be possible to locate each of them starting from the Information being preserved.

Therefore to conform to OAIS the repository software must allow the repository to make these associations, the mechanism for which we illustrate below for LABDRIVE.
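The idea of the AIP as a logical container can be sketched in a few lines. This is only an illustration and is not the LABDRIVE mechanism: the class name `AIPPointers`, its field names and the `urn:example:` identifiers are all hypothetical. The fields follow the components named in the OAIS AIP (Content Information made up of a Data Object and its Representation Information, the kinds of Preservation Description Information, and Packaging Information).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AIPPointers:
    """Hypothetical logical AIP: each field is a pointer (an identifier),
    not the component itself, so components may live anywhere."""
    data_object: Optional[str] = None
    representation_info: Optional[str] = None
    reference_info: Optional[str] = None
    provenance_info: Optional[str] = None
    context_info: Optional[str] = None
    fixity_info: Optional[str] = None
    access_rights_info: Optional[str] = None
    packaging_info: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of components for which no pointer has been recorded yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Illustrative identifiers only; a real repository would use its own scheme.
aip = AIPPointers(
    data_object="urn:example:object/60736",
    fixity_info="urn:example:fixity/60736",
)
print(aip.missing())
```

A check of this kind gives the repository a simple way to see which associations still have to be made for a given object.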

In order to help the reader understand the significance of the Information Model a brief outline is provided in the next section.

Rationale for the OAIS Information Model

An important point to note is that the Representation Information is required to understand/use the Data Object. Most systems identify what they call the "format" of the data, for example using PRONOM. However the problem with doing this, especially for scientific data, may be illustrated using the simple example of CSV files containing the text:

FirstName, Surname, Gender

Fred,Bloggs, Male

Jane, Bloggs, Female

The files may be saved using UTF-8, UTF-8 with BOM, UTF-16, and a variety of other encodings. Current applications, such as Excel or Notepad, are likely to deal with these in an apparently identical fashion. However, the bit sequences are different.

Encoding     Hex (start of first line)
UTF-8        46 69 72 73 74 4e 61 6d 65 2c 20 53 75 72 6e 61
UTF-8 BOM    ef bb bf 46 69 72 73 74 4e 61 6d 65 2c 20 53 75
UTF-16       ff fe 46 00 69 00 72 00 73 00 74 00 4e 00 61 00

In the future, applications may not be so accommodating, and may not recognize one or other of these encodings automatically.

The PRONOM code for all these is x-fmt/18 and the PRONOM page further tells us that the MIME Type is text/csv but provides no further information about encoding.
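The difference between the encodings can be reproduced with a short Python sketch. It encodes the header line of the CSV example in the three ways discussed and prints the first 16 bytes of each. The function names `encode_variants` and `sniff_bom` are hypothetical, and the UTF-16 variant is assumed to be little-endian with a byte order mark, which is what the `ff fe` prefix in the example indicates.

```python
import codecs
from typing import Dict, Optional

HEADER = "FirstName, Surname, Gender"


def encode_variants(text: str) -> Dict[str, bytes]:
    """Encode the same text in the three encodings discussed above."""
    return {
        "UTF-8": text.encode("utf-8"),
        "UTF-8 BOM": text.encode("utf-8-sig"),  # prepends ef bb bf
        # Explicit little-endian BOM so the result is platform independent.
        "UTF-16": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    }


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name implied by a leading BOM, or None if there is none.

    Only UTF-8 and UTF-16 marks are checked; UTF-32 is not handled here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


for name, data in encode_variants(HEADER).items():
    print(f"{name:10} {data[:16].hex(' ')}  BOM codec: {sniff_bom(data)}")
```

For the plain UTF-8 file `sniff_bom` returns `None`: nothing in the bytes themselves states the encoding, so that knowledge has to be recorded somewhere else.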

More importantly no information about the semantics is provided. In this case it may be obvious what FirstName, Surname and Gender "mean", but does the latter mean gender at birth or by declaration or following medical procedure?

The software which may be used to deal with these files is readily available now, for example Excel or Notepad in Windows, but will it be so readily available in future? Moreover, while the UTF-8 and UTF-8 BOM files appear in Excel as a table with 3 columns, the UTF-16 file is shown as having only 1 column.

This is a trivial example and one can imagine the difficulties which can arise for more complex scientific and other data, such as that belonging to scientific research organizations.

Scientific data may use specialised terminology. The current users of that data will be very familiar with terms such as

  • bad_thing/min_x = -390

  • coffset = 582.00e-3

  • SUBRUN

Such current users will understand the meaning, units and use of such terms. In the future such things may not be common knowledge when a project has ended but the data is still used. In some cases it may be possible to guess the meaning, but guesses may be catastrophically wrong.

The OAIS Information Model insists that one should be able to provide, when required:

  • Structure Representation Information - for example the format (in more detail than x-fmt/18)

  • Semantic Representation Information - for example the meaning of terms, or units of measurements

  • Other Representation Information - for example software

Note that how much Representation Information is provided at any specific time depends upon the Designated Community which is defined by the repository. If the Designated Community is simply the current users then no Representation Information may be needed, but as time passes this will change and more Representation Information will need to be added as part of the preservation process.
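As a rough sketch of how these three kinds of Representation Information might be recorded and checked for gaps, consider the CSV example again. The class `RepresentationInfo`, its field names and the example descriptions are illustrative assumptions, not OAIS-defined structures or a LABDRIVE interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationInfo:
    """Hypothetical record of the three kinds of Representation Information."""
    structure: List[str] = field(default_factory=list)      # format, encoding
    semantic: Dict[str, str] = field(default_factory=dict)  # term -> meaning
    other: List[str] = field(default_factory=list)          # e.g. software

    def undefined_terms(self, terms: List[str]) -> List[str]:
        """Terms used in the data for which no meaning has been recorded."""
        return [t for t in terms if t not in self.semantic]


# Illustrative values for the CSV example; the meanings are assumptions.
rep_info = RepresentationInfo(
    structure=["CSV, comma-delimited, first row is a header",
               "UTF-8 without BOM"],
    semantic={"FirstName": "given name", "Surname": "family name"},
    other=["any application able to read delimited text"],
)

columns = ["FirstName", "Surname", "Gender"]
print(rep_info.undefined_terms(columns))  # ['Gender']
```

Here "Gender" is reported because its meaning has not yet been recorded. Whether that gap matters depends on the Knowledge Base of the Designated Community.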

Provenance Information is, as one might guess, about Provenance. The point of saying that it is Information is to make it clear that a file, for example a PREMIS file or a simple table of events, is what OAIS calls a Data Object. This needs appropriate Representation Information to ensure that it is understandable. For example, a PREMIS file is defined by the appropriate version of the PREMIS standard, but one may need to provide the specific vocabulary being used.

Further details are provided in the examples and evidence given below.

Part of the provenance, including fixity checks, will be the events recorded by the LABDRIVE system. Extracting the events gives one, for example, the following:

    {
        "container_id": 133,
        "file_id": 60736,
        "user_id": 0,
        "module": "SafeboxCommon",
        "action": "file.index",
        "message": "Harvested file #60736: \\Objects\\MAGIC_2019_GRB190114C_mw.fits",
        "level": "SUCCESS",
        "timestamp": "2021-05-04T10:38:02.3032826Z"
    },
    {
        "container_id": 133,
        "file_id": 60736,
        "user_id": 0,
        "module": "FILE.CREATE",
        "action": "event.init",
        "message": "",
        "level": "INFO",
        "timestamp": "2021-05-04T10:38:02.3421292Z"
    },
    {
        "container_id": 133,
        "file_id": 60736,
        "user_id": 0,
        "module": "Safebox",
        "action": "file.hash",
        "message": "File hash retrieved from FileMeta: \r\n [*] File path: \\CONTAINERS\\133\\Objects\\MAGIC_2019_GRB190114C_mw.fits\r\n [*] File hash: a9ddcea04b67c4b635b9a0504e6fa3ff\r\n [*] Hash algo: md5",
        "level": "SUCCESS",
        "timestamp": "2021-05-04T10:38:02.3493161Z"
    },
    {
        "container_id": 133,
        "file_id": 60736,
        "user_id": 0,
        "module": "Safebox",
        "action": "file.hash",
        "message": "File hash retrieved from FileMeta: \r\n [*] File path: \\CONTAINERS\\133\\Objects\\MAGIC_2019_GRB190114C_mw.fits\r\n [*] File hash: a5597f04de027b7b029ea80c86c0302a2f90e3e6\r\n [*] Hash algo: sha1",
        "level": "SUCCESS",
        "timestamp": "2021-05-04T10:38:02.3559376Z"
    },

In order to understand this, one needs to know what terms such as "action" and "file.hash" mean; in other words, the data file containing that text should be associated with its own Representation Information.
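To show how such events could be used for a fixity check, here is a minimal sketch in Python. It assumes that the message layout seen in the two "file.hash" events above is stable. That is an assumption drawn from this excerpt only, not from a documented LABDRIVE interface, and the function names `recorded_hashes` and `verify` are hypothetical. The events are abridged to the fields the sketch needs.

```python
import hashlib
import json
import re
from typing import Dict, List

# Abridged copy of three of the events shown above (raw string keeps the
# JSON escapes intact).
EVENTS_JSON = r'''
[
  {"file_id": 60736, "action": "file.index",
   "message": "Harvested file #60736: \\Objects\\MAGIC_2019_GRB190114C_mw.fits"},
  {"file_id": 60736, "action": "file.hash",
   "message": "File hash retrieved from FileMeta: \r\n [*] File path: \\CONTAINERS\\133\\Objects\\MAGIC_2019_GRB190114C_mw.fits\r\n [*] File hash: a9ddcea04b67c4b635b9a0504e6fa3ff\r\n [*] Hash algo: md5"},
  {"file_id": 60736, "action": "file.hash",
   "message": "File hash retrieved from FileMeta: \r\n [*] File path: \\CONTAINERS\\133\\Objects\\MAGIC_2019_GRB190114C_mw.fits\r\n [*] File hash: a5597f04de027b7b029ea80c86c0302a2f90e3e6\r\n [*] Hash algo: sha1"}
]
'''


def recorded_hashes(events: List[dict]) -> Dict[str, str]:
    """Collect {algorithm: hex digest} from the 'file.hash' events."""
    found: Dict[str, str] = {}
    for event in events:
        if event.get("action") != "file.hash":
            continue
        digest = re.search(r"File hash:\s*([0-9a-fA-F]+)", event["message"])
        algo = re.search(r"Hash algo:\s*(\w+)", event["message"])
        if digest and algo:
            found[algo.group(1).lower()] = digest.group(1).lower()
    return found


def verify(data: bytes, expected: Dict[str, str]) -> Dict[str, bool]:
    """Recompute each digest over `data` and compare with the recorded value."""
    return {
        algo: hashlib.new(algo, data).hexdigest() == digest
        for algo, digest in expected.items()
    }


hashes = recorded_hashes(json.loads(EVENTS_JSON))
print(hashes)
# With the bytes of the preserved file at hand: verify(file_bytes, hashes)
```

Note that the sketch only works because its author knows what "file.hash", "File hash" and "Hash algo" mean, which is exactly the knowledge that needs to be captured as Representation Information.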

OAIS Mandatory Responsibilities

These responsibilities apply to the organisation and not the software. However, one can describe what the software solution should support in order to enable the archive to meet its responsibilities.

1. Negotiate for and accept appropriate information from information Producers.

2. Obtain sufficient control of the information provided to the level needed to ensure Long Term Preservation.

3. Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided, thereby defining its Knowledge Base.

4. Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information.

5. Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, including the demise of the Archive, ensuring that it is never deleted unless allowed as part of an approved strategy. There should be no ad-hoc deletions.

6. Make the preserved information available to the Designated Community and enable the information to be disseminated as copies of, or as traceable to, the original submitted Data Objects with evidence supporting its Authenticity.

Details of how LABDRIVE supports OAIS Conformance, both the OAIS Information Model and the Mandatory Responsibilities, are also provided.

Fundamental Preservation Options

OAIS identifies three basic preservation strategies for use as time passes and the Knowledge Base of the Designated Community (including hardware, software and tacit knowledge) changes. In all cases, of course, the bits of the digital objects must be kept safe.

The three options may be identified as follows.

The Data Object of the Information being preserved may be

  1. kept by the Archive unchanged; or

  2. kept by the Archive but may be changed; or

  3. not kept by the Archive, but instead handed on to another Archive.

Each of these three options implies the following:

  • In case 1) the archive may add Representation Information to ensure the Content Information is Independently Understandable.

  • In case 2) the archive may Transform the Data Object of the Information being preserved.

  • In case 3) the archive may hand over the AIP which contains the Object being preserved.

Whichever approach is taken, the archive must ensure that the Information Object being preserved continues to be Independently Understandable by the Designated Community, and that the components of its AIP are not lost and are updated appropriately.

More details are available in Preservation Activities.

The preservation of "metadata" is discussed in Introduction.
