LABDRIVE support for FAIRness

This section contains an evaluation of how well LABDRIVE supports the FAIR guiding principles.

Advantages of FAIR

FAIR is an abbreviation of Findable, Accessible, Interoperable and Reusable. Many repositories use the FAIR Guiding Principles for scientific data management and stewardship to show that their data is valuable: it is easier to find through unique identifiers, and easier to combine and integrate. The principles provide a checklist for managing scientific and other data, helping to make decisions which will enable the data to be more useful.

FAIRness and OAIS Representation Information

In order to support Interoperability and Re-usability, Representation Information is essential. Therefore the archive system must allow the addition of more Representation Information than the current Designated Community requires.

For example, if the description of the way in which FITS is used is not made available, then the meaning of the keyword EFFICIEN in the FITS header may not be understood by someone from a different community who wishes to Re-use that data or combine it with other data (Interoperate).

EFFICIEN[float]:  Average efficiency of the gamma/hadron separation cut(s) (in %).

Besides preserving information, such as scientific information, LABDRIVE allows one to preserve software systems, whether source code or complete virtual machines, each with the Representation Information needed to be able to use them.

Test steps and test criteria

The proposed test steps are to review each FAIR principle in detail. The ability of LABDRIVE to fulfil each FAIR principle will be described, with specific evidence provided by examples from LABDRIVE instances. We distinguish between:

  • principles which are fulfilled automatically by LABDRIVE

  • principles which depend upon user choices in LABDRIVE configuration, in which cases we look at LABDRIVE support for the user actions.

The proposed test criteria are that the evidence is found to be convincing by reviewers of this document, on the understanding that LABDRIVE satisfies some of the principles automatically, such as indexing, and in other cases allows a user or archive manager to configure the system appropriately, e.g. to choose metadata elements such as vocabularies that follow FAIR principles. We provide our own evaluation.

SUMMARY

FAIR PRINCIPLE | OUR EVALUATION

TO BE FINDABLE

F1. (meta)data are assigned a globally unique and persistent identifier | FULLY MET
F2. data are described with rich metadata (defined by R1 below) | FULLY MET
F3. metadata clearly and explicitly include the identifier of the data it describes | FULLY MET
F4. (meta)data are registered or indexed in a searchable resource | FULLY MET

TO BE ACCESSIBLE

A1. (meta)data are retrievable by their identifier using a standardized communications protocol | FULLY MET
A1.1 the protocol is open, free, and universally implementable | FULLY MET
A1.2 the protocol allows for an authentication and authorization procedure, where necessary | FULLY MET
A2. metadata are accessible, even when the data are no longer available | FULLY MET

TO BE INTEROPERABLE

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. | FULLY MET
I2. (meta)data use vocabularies that follow FAIR principles | FULLY MET
I3. (meta)data include qualified references to other (meta)data | FULLY MET

TO BE REUSABLE

R1. meta(data) are richly described with a plurality of accurate and relevant attributes | FULLY MET
R1.1. (meta)data are released with a clear and accessible data usage license | FULLY MET
R1.2. (meta)data are associated with detailed provenance | FULLY MET
R1.3. (meta)data meet domain-relevant community standards | FULLY MET IN THAT LABDRIVE CAN BE CONFIGURED APPROPRIATELY

DETAILED EVALUATION of FAIRness

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

Each object within LABDRIVE, whether data or metadata, is assigned a globally unique identifier, which persists as long as the object is in the archive. These unique identifiers may then be registered with a redirection service such as DOI or a URN name resolver, depending upon the granularity required.

For example: https://certification.demo.libnova.com/containers/12/file/128 and associated UUID

Once the UUID has been created, the Search Links feature can be used to register a unique URL with an external service. The platform then uses this URL to map an external DOI/ARK-type resolver URL to a platform object.
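The mapping idea can be sketched in Python as follows. This is an illustration only: the `ResolverTable` class, its methods and the ARK string are hypothetical and are not part of LABDRIVE or of any real resolver service. Only the URL pattern is taken from the example above.

```python
# Illustrative only: a local table standing in for an external resolver service.
BASE_URL = "https://certification.demo.libnova.com"


def object_url(container_id: int, file_id: int) -> str:
    """Build the platform URL of an object, following the example URL pattern."""
    return f"{BASE_URL}/containers/{container_id}/file/{file_id}"


class ResolverTable:
    """Maps external identifiers (e.g. an ARK or DOI string) to platform URLs."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, identifier: str, target_url: str) -> None:
        # Refuse to silently re-point an identifier: persistence is the point.
        if identifier in self._targets and self._targets[identifier] != target_url:
            raise ValueError(f"{identifier} is already registered")
        self._targets[identifier] = target_url

    def resolve(self, identifier: str) -> str:
        return self._targets[identifier]


table = ResolverTable()
# "ark:/99999/fk4example" is a made-up identifier used purely for illustration.
table.register("ark:/99999/fk4example", object_url(12, 128))
print(table.resolve("ark:/99999/fk4example"))
```

Rejecting a second registration that points elsewhere reflects the requirement that an identifier, once issued, keeps referring to the same object.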

F1: FULLY MET

F2. data are described with rich metadata (defined by R1 below)

Each object or folder is associated with one or more “metadata” schemas, such as https://certification.demo.libnova.com/configuration/metadata/object. Each schema can be tailored to meet any requirement or standard, for example:

  • Dublin Core

  • OAIS Archival Information Model Schema

Each metadata element can be simple descriptive text, or can point to another object, for example a document containing a standard, either inside or outside the archive, or can include both text and pointer.

Containers can also have a more limited set of metadata assigned.

For example:

curl $your_labdrive_url/api/file/128/metadata -H "authorization: Bearer $your_labdrive_api_key" (See Appendix for results)
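The same request can be issued from a script. The following Python sketch uses only the standard library and mirrors the endpoint and header of the curl command above. The function names, the host `labdrive.example.org` and the key `my-api-key` are placeholders chosen for illustration.

```python
import json
import urllib.request


def build_metadata_request(base_url: str, api_key: str, file_id: int) -> urllib.request.Request:
    """Prepare the GET request equivalent to the curl command above."""
    url = f"{base_url.rstrip('/')}/api/file/{file_id}/metadata"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_metadata(base_url: str, api_key: str, file_id: int):
    """Send the request and decode the JSON body (requires network access)."""
    request = build_metadata_request(base_url, api_key, file_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not contact the server.
request = build_metadata_request("https://labdrive.example.org", "my-api-key", 128)
print(request.full_url)
```

Separating request construction from sending makes it possible to inspect the URL and headers without contacting a server.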

F2: FULLY MET

F3. metadata clearly and explicitly include the identifier of the data it describes

Metadata is always linked to the data it describes in LABDRIVE. This is done by keeping an internal relationship between a data element (a file, for instance) and its metadata. This can be observed in the interface: when an object is selected and its properties are opened, its metadata is displayed.

An analogous result can be observed when using the API. When the details for a given object are requested:

$ curl --request GET \
        --url "$your_platform_url/api/file/13015" \
        --header "authorization: Bearer $your_platform_api_key"
        

its metadata is delivered.

The opposite is also possible: using the Search functionality, users can find which objects are referenced by a particular metadata value, selecting individual elements with the method described in the metadata search documentation.
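As a minimal sketch of this two-way link, the Python below works on records shaped like the metadata output shown in the Appendix. It does not use the LABDRIVE Search feature itself, the helper names are hypothetical, and it assumes that metadata responses in general follow the Appendix layout.

```python
# Two records abbreviated from the Appendix example; field names follow that output.
records = [
    {"id": "37", "file_id": "128", "container_id": "12",
     "iecode": "provenanceOais", "value": "Provenance"},
    {"id": "38", "file_id": "128", "container_id": "12",
     "iecode": "contextOais", "value": "Context text"},
]


def describes(record: dict) -> tuple[str, str]:
    """Return the (container_id, file_id) pair a metadata record refers to."""
    return record["container_id"], record["file_id"]


def files_with_value(records: list[dict], iecode: str, value: str) -> set[str]:
    """Reverse lookup: which file ids carry the given metadata value?"""
    return {r["file_id"] for r in records
            if r["iecode"] == iecode and r["value"] == value}


print(describes(records[0]))
print(files_with_value(records, "contextOais", "Context text"))
```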

F3: FULLY MET

F4. (meta)data are registered or indexed in a searchable resource

Every object and metadata element is indexed and this index can be searched.

See the documentation on search for details.

F4: FULLY MET

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

We use the Australian Research Data Commons definition:

A standardised communications protocol is one that has been codified as a standard. Examples of standardised communications protocols include WiFi, the Internet Protocol, and the Hypertext Transfer Protocol (HTTP).

Each object (some of which will themselves be metadata, e.g. the FITS standard, which is metadata for a FITS file) can be retrieved using a URI (standard: https://datatracker.ietf.org/doc/html/rfc3986 ), for example https://certification.demo.libnova.com/containers/12/file/128

A full list of the metadata associated with a file can be retrieved with curl:

curl $your_labdrive_url/api/file/128/metadata -H "authorization: Bearer $your_labdrive_api_key"

With curl, one can download or upload data using one of the supported protocols, including HTTP (standard: https://datatracker.ietf.org/doc/html/rfc2616), HTTPS (standard: https://datatracker.ietf.org/doc/html/rfc2818), SCP, SFTP, and FTP (standard: https://datatracker.ietf.org/doc/html/rfc959).

S3 access uses HTTP and HTTPS as the underlying communications protocol, which have been described above.

A1: FULLY MET

A1.1 the protocol is open, free, and universally implementable

Each of the protocols is open (for example, defined in RFCs), free, and universally implementable.

A1.1: FULLY MET

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

HTTP and HTTPS allow authentication (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication and https://datatracker.ietf.org/doc/html/rfc7235). LABDRIVE supports multiple authentication options; see Users and Permissions.

HTTP and HTTPS also allow for an authorization procedure (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization). The authorization procedure is normally a function of the server (https://httpd.apache.org/docs/2.4/howto/auth.html). LABDRIVE supports a very fine-grained level of object specification, as described in Users and Permissions.
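The two steps can be told apart by standard HTTP status codes: 401 signals missing or invalid credentials (authentication), while 403 signals that the authenticated user lacks permission (authorization). The Python sketch below is generic HTTP handling. The function name is hypothetical and nothing in it is specific to LABDRIVE.

```python
from http import HTTPStatus


def explain_access(status: int) -> str:
    """Classify an HTTP status code by the access-control step it reflects."""
    if status == HTTPStatus.UNAUTHORIZED:   # 401
        return "authentication required or failed"
    if status == HTTPStatus.FORBIDDEN:      # 403
        return "authenticated, but not authorized for this object"
    if 200 <= status < 300:
        return "access granted"
    return "other outcome"


for code in (200, 401, 403):
    print(code, explain_access(code))
```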

A1.2: FULLY MET

A2. metadata are accessible, even when the data are no longer available

Metadata elements, such as the FITS standard mentioned earlier, are not deleted when the object they describe is deleted, because (1) a piece of metadata may be used by billions of other objects and (2) each element is itself an object.

Using LABDRIVE, users are free to store the metadata with the objects or keep it independently of them.

For example, a package could be created following the recipe for creating a full AIP provided in https://docs.libnova.com/labdrive/concepts/oais-and-iso-16363/labdrive-support-for-oais-conformance#labdrive-and-aips-as-bagit-files. In this way one can extract all the metadata elements required, such as the events that have happened to the data object and all the hashes which have been calculated and re-calculated, and keep them even if the data object is deleted.

Another way to achieve this is to store metadata at container level, so that if data is deleted, the metadata about the object remains.

In cases in which this is not possible, organizations can achieve the same result by following this procedure when removing data:

  1. Metadata is exported from the object.

  2. Metadata is imported back into an empty folder representing the object that is going to be erased.

  3. The original object is deleted, while its metadata is retained.

This procedure can be carried out as outlined in the Importing/Exporting metadata section.
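The three steps can be modelled as follows. This Python sketch uses an in-memory dictionary as a stand-in for an archive. The `Archive` class, its methods and the object name `run-42.fits` are hypothetical and are not LABDRIVE API calls. The sketch only illustrates that metadata survives the deletion of the object it describes.

```python
class Archive:
    """Toy archive: object name -> {"data": bytes or None, "metadata": dict}."""

    def __init__(self) -> None:
        self.objects: dict[str, dict] = {}

    def store(self, name: str, data: bytes, metadata: dict) -> None:
        self.objects[name] = {"data": data, "metadata": dict(metadata)}

    def export_metadata(self, name: str) -> dict:
        # Step 1: export a copy of the metadata from the object.
        return dict(self.objects[name]["metadata"])

    def import_metadata(self, name: str, metadata: dict) -> None:
        # Step 2: import it into an empty placeholder representing the object.
        self.objects[name] = {"data": None, "metadata": dict(metadata)}

    def delete(self, name: str) -> None:
        # Step 3: delete the original object.
        del self.objects[name]


archive = Archive()
archive.store("run-42.fits", b"...", {"title": "Observation run 42"})

exported = archive.export_metadata("run-42.fits")
archive.import_metadata("run-42.fits (metadata only)", exported)
archive.delete("run-42.fits")

print(archive.objects)
```

Copying the metadata on export and import keeps the retained record independent of the object that is about to be removed.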

As described in https://docs.libnova.com/labdrive/get-started/file-versioning-and-recovery, it is possible to restore a deleted file, with all its metadata, if required.

Of course, there may be some archives in which an object, and all its metadata, must be deleted, perhaps for reasons of security or confidentiality; LABDRIVE can be configured to support that also.

A2: FULLY MET

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

Springer Link defines knowledge representation (https://link.springer.com/referenceworkentry/10.1007/978-1-4419-9863-7_595) as:

Knowledge representation refers to the technical problem of encoding human knowledge and reasoning (Automated Reasoning) into a symbolic language that enables it to be processed by information systems. In systems biology, knowledge representation is used to infuse data with scientific concepts and understanding in order to maximize its utility for furthering scientific insight.

LABDRIVE supports many formal metadata languages; examples are given in https://docs.libnova.com/labdrive/configuration/metadata.

All these, and many others, are accessible, shared, and broadly applicable, as described in the above documentation.

The example Representation Information Network shows the metadata which describes a piece of scientific data, with each piece of metadata having its own metadata.

I1: FULLY MET

I2. (meta)data use vocabularies that follow FAIR principles

Where the vocabularies are archived within LABDRIVE, they will also follow FAIR principles. Vocabularies outside LABDRIVE can be used (pointed to), and it is up to the specifier of the vocabularies to choose ones which follow FAIR principles.

Vocabularies can be defined with the methods described in the Configure metadata section, using:

To point to external ones, you can use the linked fields, that are referenced here:

I2: FULLY MET

I3. (meta)data include qualified references to other (meta)data

Where the metadata elements are objects preserved within LABDRIVE, they will themselves have appropriate metadata, as illustrated here
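In the metadata output shown in the Appendix, each record carries an `iecode` naming the kind of relationship and a `linked` entry pointing at the related object, so the reference is qualified rather than a bare link. A minimal Python sketch, assuming records of that shape (the helper name is hypothetical):

```python
# Two records abbreviated from the Appendix example.
records = [
    {"file_id": "128", "iecode": "provenanceOais",
     "linked": {"file_id": "130", "link": "/containers/12/file/130"}},
    {"file_id": "128", "iecode": "fixityOais",
     "linked": {"file_id": "134", "link": "/containers/12/file/134"}},
]


def qualified_references(records: list[dict]) -> list[tuple[str, str, str]]:
    """Return (source file id, relationship, target link) triples."""
    triples = []
    for record in records:
        linked = record.get("linked") or {}
        if linked.get("link"):
            triples.append((record["file_id"], record["iecode"], linked["link"]))
    return triples


for source, relation, target in qualified_references(records):
    print(f"file {source} --{relation}--> {target}")
```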

I3: FULLY MET

To be Reusable:

R1. meta(data) are richly described with a plurality of accurate and relevant attributes

LABDRIVE allows an unlimited number of metadata schemas and schema elements to be associated with data as well as metadata, allowing each to be “richly described with a plurality of accurate and relevant attributes”. In particular, the full set of Representation Information can be attached, as described in the response to I1 above.
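One way to check the richness of a description is to compare the attributes present against a required set. The Python sketch below uses the six OAIS-related `iecode` values that appear in the Appendix example as the required set. Treating exactly these as required is an assumption made for illustration, and the helper name is hypothetical.

```python
REQUIRED_IECODES = {
    "provenanceOais", "contextOais", "referenceOais",
    "fixityOais", "accessRightsOais", "packageDescriptionOais",
}


def missing_attributes(records: list[dict], required: set[str] = REQUIRED_IECODES) -> set[str]:
    """Return the required iecodes with no non-empty value in the records."""
    present = {r["iecode"] for r in records if r.get("value")}
    return required - present


records = [
    {"iecode": "provenanceOais", "value": "Provenance"},
    {"iecode": "fixityOais", "value": "Fixity description"},
    {"iecode": "contextOais", "value": ""},  # empty values do not count
]
print(sorted(missing_attributes(records)))
```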

R1: FULLY MET

R1.1. (meta)data are released with a clear and accessible data usage license

Every object, whether data or metadata, has configurable access permissions (https://docs.libnova.com/labdrive/configuration/sharing-and-permissions). Each preserved object should also have Access Rights Information (https://docs.libnova.com/labdrive/concepts/oais-and-iso-16363/planning-for-preservation#access-rights) which can point to the associated licences.

R1.1: FULLY MET

R1.2. (meta)data are associated with detailed provenance

Detailed Provenance Information is associated with each object (data as well as metadata) in two forms:

1) explicitly, as described in https://docs.libnova.com/labdrive/data-curation-and-preservation-1/oais-based-information-preservation-curation-and-exploitation/how-to-deal-with-additional-information/provenance-information?q=provenance and https://docs.libnova.com/labdrive/data-curation-and-preservation-1/oais-based-information-preservation-curation-and-exploitation/reproducing-research#use-of-provenance-in-reproducibility

2) as the internal LABDRIVE events, which may be listed as part of the Provenance using, for example:

curl $your_labdrive_url/api/file/128/event -H "authorization: Bearer $your_labdrive_api_key"

R1.2: FULLY MET

R1.3. (meta)data meet domain-relevant community standards

LABDRIVE can be configured to support specific community standard formats or to alert the user if non-standard formats are used. It is up to the user/administrator to ensure that the system is configured appropriately.

Examples of how metadata can be configured: https://docs.libnova.com/labdrive/configuration/metadata

R1.3: FULLY MET IN THAT LABDRIVE CAN BE CONFIGURED APPROPRIATELY

Conclusion

LABDRIVE supports many of the FAIR principles automatically, as is the case with identifiers and the internal events which are part of the Provenance. Other FAIR principles cannot be satisfied by software alone because they require choices to be made by the researcher or archive manager. In these cases the only consistent way to decide whether or not the FAIR principles are met is to ask whether the software can be configured appropriately.

This document proposes the Test Criteria to be consistent with this understanding.

The evidence is presented for each of the FAIR principles as descriptive text, supported by LABDRIVE documentation, with concrete examples where possible.

Overall we feel that reviewers will agree that a LABDRIVE repository fully supports the FAIR principles and so this evaluation has been PASSED.

APPENDIX: EXAMPLES

F2 example

curl $your_labdrive_url/api/file/128/metadata -H "authorization: Bearer $your_labdrive_api_key"

        {
            "id": "37",
            "metadata_schema_descriptor_id": "69",
            "container_id": "12",
            "file_id": "128",
            "value": "Provenance",
            "creator": "8",
            "iecode": "provenanceOais",
            "linked": {
                "file_id": "130",
                "container_id": null,
                "target": "BOTH",
                "link": "/containers/12/file/130"
            }
        },
        {
            "id": "38",
            "metadata_schema_descriptor_id": "70",
            "container_id": "12",
            "file_id": "128",
            "value": "Context text",
            "creator": "8",
            "iecode": "contextOais",
            "linked": {
                "file_id": "133",
                "container_id": null,
                "target": "BOTH",
                "link": "/containers/12/file/133"
            }
        },
        {
            "id": "39",
            "metadata_schema_descriptor_id": "71",
            "container_id": "12",
            "file_id": "128",
            "value": "Ref text",
            "creator": "8",
            "iecode": "referenceOais",
            "linked": {
                "file_id": "132",
                "container_id": null,
                "target": "BOTH",
                "link": "/containers/12/file/132"
            }
        },
        {
            "id": "40",
            "metadata_schema_descriptor_id": "72",
            "container_id": "12",
            "file_id": "128",
            "value": "Fixity description",
            "creator": "8",
            "iecode": "fixityOais",
            "linked": {
                "file_id": "134",
                "container_id": null,
                "target": "BOTH",
                "link": "/containers/12/file/134"
            }
        },
        {
            "id": "41",
            "metadata_schema_descriptor_id": "73",
            "container_id": "12",
            "file_id": "128",
            "value": "Access rights description",
            "creator": "8",
            "iecode": "accessRightsOais",
            "linked": {
                "file_id": null,
                "container_id": null,
                "target": "BOTH",
                "link": "/container/12/file/131"
            }
        },
        {
            "id": "42",
            "metadata_schema_descriptor_id": "74",
            "container_id": "12",
            "file_id": "128",
            "value": "Package description text",
            "creator": "8",
            "iecode": "packageDescriptionOais",
            "linked": {
                "file_id": null,
                "container_id": null,
                "target": "BOTH",
                "link": "/container/12/file/129"
            }
        }
        
