LABDRIVE support for FAIRness
This section evaluates how well LABDRIVE supports the FAIR guiding principles.
Advantages of FAIR
FAIR is an abbreviation of Findable, Accessible, Interoperable and Reusable. Many repositories use the FAIR Guiding Principles for scientific data management and stewardship to show that their data is valuable, because it is easier to find through unique identifiers and easier to combine and integrate with other data. The principles provide a checklist for managing scientific, and other, data, helping managers make decisions that will make the data more useful.
FAIRness and OAIS Representation Information
In order to support Interoperability and Re-usability, Representation Information is essential. Therefore, the archive system must allow the addition of more Representation Information than the current Designated Community requires.
For example, if the description of how FITS is used is not made available, then the meaning of the keyword EFFICIEN in the FITS header may not be understood by someone from a different community who wishes to Re-use that data or to combine it with other data (Interoperate).
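To make the point concrete, the sketch below parses a single FITS header card. It recovers the keyword EFFICIEN and its value syntactically, but nothing in the card says what the keyword means; that is what the Representation Information has to supply. The value 0.85 is a made-up example, and the parser is deliberately simplified: it assumes the basic fixed-width card layout and does not handle quoted string values containing "/" or long-string CONTINUE cards.

```python
def parse_fits_card(card: str) -> dict:
    """Split one 80-character FITS header card into keyword, value and comment."""
    card = card.ljust(80)[:80]          # cards are fixed-width, 80 characters
    keyword = card[:8].strip()          # columns 1-8 hold the keyword
    if card[8:10] != "= ":              # columns 9-10 hold the value indicator
        # Commentary-style card (e.g. COMMENT, HISTORY): no value.
        return {"keyword": keyword, "value": None, "comment": card[8:].strip()}
    # Simplification: a "/" inside a quoted string value would be mis-split here.
    value, _, comment = card[10:].partition("/")
    return {"keyword": keyword, "value": value.strip(), "comment": comment.strip()}


print(parse_fits_card("EFFICIEN=                 0.85"))
# {'keyword': 'EFFICIEN', 'value': '0.85', 'comment': ''}
```

The syntax is fully recoverable from the file, yet a reader from another community still cannot tell whether 0.85 is a detector, optical or pipeline efficiency without external documentation.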
Besides preserving information, such as scientific information, LABDRIVE can preserve software systems, whether as source code or as complete virtual machines, each with the Representation Information needed to use them.
Test steps and test criteria
The proposed test steps are to review each FAIR principle in detail, describing how LABDRIVE fulfils it and providing specific evidence from examples in LABDRIVE instances. We distinguish between:
principles which are fulfilled automatically by LABDRIVE
principles which depend upon user choices in LABDRIVE configuration, in which case we examine how LABDRIVE supports the user actions.
The proposed test criteria are that reviewers of this document find the evidence convincing, on the understanding that LABDRIVE satisfies some of the principles automatically, such as indexing, and in other cases allows a user or archive manager to configure the system appropriately, e.g. by choosing metadata elements such as vocabularies that follow FAIR principles. We provide our own evaluation.
SUMMARY
FAIR PRINCIPLE | OUR EVALUATION |
---|---|
TO BE FINDABLE | |
F1. (meta)data are assigned a globally unique and persistent identifier | FULLY MET |
F2. data are described with rich metadata (defined by R1 below) | FULLY MET |
F3. metadata clearly and explicitly include the identifier of the data it describes | FULLY MET |
F4. (meta)data are registered or indexed in a searchable resource | FULLY MET |
TO BE ACCESSIBLE | |
A1. (meta)data are retrievable by their identifier using a standardized communications protocol | FULLY MET |
A1.1 the protocol is open, free, and universally implementable | FULLY MET |
A1.2 the protocol allows for an authentication and authorization procedure, where necessary | FULLY MET |
A2. metadata are accessible, even when the data are no longer available | FULLY MET |
TO BE INTEROPERABLE | |
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. | FULLY MET |
I2. (meta)data use vocabularies that follow FAIR principles | FULLY MET |
I3. (meta)data include qualified references to other (meta)data | FULLY MET |
TO BE REUSABLE | |
R1. (meta)data are richly described with a plurality of accurate and relevant attributes | FULLY MET |
R1.1. (meta)data are released with a clear and accessible data usage license | FULLY MET |
R1.2. (meta)data are associated with detailed provenance | FULLY MET |
R1.3. (meta)data meet domain-relevant community standards | FULLY MET IN THAT LABDRIVE CAN BE CONFIGURED APPROPRIATELY |
DETAILED EVALUATION of FAIRness
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
Each object within LABDRIVE, whether data or metadata, is assigned a globally unique identifier, which persists as long as the object is in the archive. These unique identifiers may then be registered with a redirection service such as DOI or a URN name resolver, depending upon the granularity required.
For example, https://certification.demo.libnova.com/containers/12/file/128 identifies a file, which also has an associated UUID.
Once the UUID has been created, the Search Links feature can be used to register a unique URL with an external service; the platform uses this URL to map an external resolver, such as a DOI or ARK, to a platform object.
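The mapping described above can be pictured with a minimal sketch. It assumes random (version 4) UUIDs and models the external resolver as a plain dictionary from a persistent identifier to the platform URL. The `register`/`resolve` helpers and the DOI `10.1234/example-128` are hypothetical illustrations, not the Search Links API.

```python
import uuid

PLATFORM_BASE = "https://certification.demo.libnova.com"

# Hypothetical resolver table: external persistent identifier -> platform object URL.
resolver: dict[str, str] = {}


def new_object_id() -> str:
    """Return a random (version 4) UUID, globally unique with overwhelming probability."""
    return str(uuid.uuid4())


def register(external_pid: str, container: int, file_id: int) -> None:
    """Map an external identifier (e.g. a DOI) onto the object's platform URL."""
    resolver[external_pid] = f"{PLATFORM_BASE}/containers/{container}/file/{file_id}"


def resolve(external_pid: str) -> str:
    """Look up the current platform URL for an external identifier."""
    return resolver[external_pid]


object_uuid = new_object_id()
register("doi:10.1234/example-128", container=12, file_id=128)
print(resolve("doi:10.1234/example-128"))
# https://certification.demo.libnova.com/containers/12/file/128
```

Keeping the external identifier separate from the platform URL is what makes it persistent: if the object moves, only the resolver entry changes.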
F1: FULLY MET
F2. data are described with rich metadata (defined by R1 below)
Each object or folder is associated with one or more metadata schemas, such as https://certification.demo.libnova.com/configuration/metadata/object. Each schema can be tailored to meet the requirements of any standard, for example:
Dublin Core
OAIS Archival Information Model Schema
Each metadata element can be simple descriptive text, can point to another object (for example, a document containing a standard, either inside or outside the archive), or can include both text and a pointer.
Containers can also have a more limited set of metadata assigned.
For example, the metadata of file 128 can be retrieved through the API (see the Appendix for the results):
```shell
curl "$your_labdrive_url/api/file/128/metadata" -H "authorization: Bearer $your_labdrive_api_key"
```
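For scripted access, the same call can be expressed with Python's standard library. This is a sketch that only builds the request. The endpoint path and Bearer header are taken from the curl example, while the base URL and API key are placeholders. The structure of the JSON response is not assumed here.

```python
import json
import urllib.request


def metadata_request(base_url: str, file_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one file's metadata."""
    # Same endpoint and header as the curl example; HTTP header names are
    # case-insensitive, so "Authorization" is equivalent to curl's "authorization".
    url = f"{base_url.rstrip('/')}/api/file/{file_id}/metadata"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


# Placeholder values: substitute your own instance URL and API key.
req = metadata_request("https://labdrive.example.org", 128, "YOUR_API_KEY")

# Sending it requires network access and a valid key:
# with urllib.request.urlopen(req) as resp:
#     metadata = json.load(resp)
```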
F2: FULLY MET
F3. metadata clearly and explicitly include the identifier of the data it describes
Metadata is always linked to the data it describes in LABDRIVE. This is done by keeping an internal relationship between a data element (a file, for instance) and its metadata. This can be observed in the interface: selecting an object and opening its properties displays its metadata.
An analogous result can be observed when using the API: when the details of a given object are requested, its metadata is delivered with them.
The opposite is also possible: using the Search functionality, users can find which objects are referenced by a particular metadata value, selecting individual elements with the method described in the metadata search documentation.
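The two directions of this relationship can be illustrated with a small in-memory model. This is not LABDRIVE's internal storage. The forward mapping goes from an object to its metadata, and a derived reverse index goes from a metadata value back to the objects that carry it. The object ids and field values are hypothetical.

```python
from collections import defaultdict

# Hypothetical forward mapping: object id -> {metadata field: value}.
metadata_by_object = {
    128: {"dc:title": "Observation run 7", "dc:format": "FITS"},
    129: {"dc:title": "Calibration frames", "dc:format": "FITS"},
    130: {"dc:title": "Observing log", "dc:format": "PDF"},
}


def build_reverse_index(forward: dict[int, dict[str, str]]) -> dict[tuple[str, str], set[int]]:
    """Invert the mapping: (field, value) -> ids of objects carrying that value."""
    reverse: dict[tuple[str, str], set[int]] = defaultdict(set)
    for obj_id, fields in forward.items():
        for field, value in fields.items():
            reverse[(field, value)].add(obj_id)
    return reverse


reverse = build_reverse_index(metadata_by_object)
print(metadata_by_object[128])                 # object -> its metadata
print(sorted(reverse[("dc:format", "FITS")]))  # metadata value -> objects: [128, 129]
```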
F3: FULLY MET
F4. (meta)data are registered or indexed in a searchable resource
Every object and metadata element is indexed and this index can be searched.
See the documentation on search for details.
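The search engine behind LABDRIVE is not described in this section. As a generic illustration of what "indexed in a searchable resource" means, the sketch below builds a simple inverted index from tokens to object ids and answers queries that must match every term. The records are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(records: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: token -> ids of the objects whose text contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for obj_id, text in records.items():
        for token in tokenize(text):
            index[token].add(obj_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the objects that contain every token of the query (AND semantics)."""
    matches = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*matches) if matches else set()


records = {
    128: "FITS image, observation run 7",
    129: "FITS calibration frames",
    130: "PDF observing log",
}
idx = build_index(records)
print(search(idx, "fits calibration"))  # {129}
```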
F4: FULLY MET
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
We use the definition given by the Australian Research Data Commons:
A standardised communications protocol is one that has been codified as a standard. Examples of standardised communications protocols include WiFi, the Internet Protocol, and the Hypertext Transfer Protocol (HTTP).
Each object can be retrieved using a URI (standard: https://datatracker.ietf.org/doc/html/rfc3986), for example https://certification.demo.libnova.com/containers/12/file/128. Some of these objects are themselves metadata; for example, the FITS standard is metadata for a FITS file.
The full list of metadata associated with a file can be accessed with curl:
```shell
curl "$your_labdrive_url/api/file/128/metadata" -H "authorization: Bearer $your_labdrive_api_key"
```
With curl, one can download or upload data using any of the supported protocols, including HTTP (standard: https://datatracker.ietf.org/doc/html/rfc2616), HTTPS (standard: https://datatracker.ietf.org/doc/html/rfc2818), SCP, SFTP, and FTP (standard: https://datatracker.ietf.org/doc/html/rfc959).
S3 access uses HTTP and HTTPS, described above, as the underlying communications protocols.
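Because the identifier is a standard URI, any RFC 3986-compliant parser can take it apart. The sketch below uses Python's `urllib.parse.urlsplit` to recover the container and file numbers. The `/containers/<container>/file/<file>` pattern is an assumption taken from the single example URL above, so other object types may use different paths.

```python
from urllib.parse import urlsplit


def parse_object_uri(uri: str) -> tuple[int, int]:
    """Return (container id, file id) from an object URI of the example's shape."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme}")
    segments = [s for s in parts.path.split("/") if s]
    # Assumed path pattern: /containers/<container>/file/<file>
    if len(segments) != 4 or segments[0] != "containers" or segments[2] != "file":
        raise ValueError(f"unexpected path: {parts.path}")
    return int(segments[1]), int(segments[3])


print(parse_object_uri("https://certification.demo.libnova.com/containers/12/file/128"))
# (12, 128)
```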
A1: FULLY MET
A1.1 the protocol is open, free, and universally implementable
Each of these protocols is open (for example, defined in RFCs), free, and universally implementable.
A1.1: FULLY MET
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
HTTP and HTTPS allow authentication (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication and https://datatracker.ietf.org/doc/html/rfc7235). LABDRIVE supports multiple authentication options; see Users and Permissions.
HTTP and HTTPS also allow for an authorization procedure (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization). Authorization is normally a function of the server (https://httpd.apache.org/docs/2.4/howto/auth.html). LABDRIVE supports very fine-grained permissions, down to the level of individual objects, as described in Users and Permissions.
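The split between the two steps can be sketched as follows. Authentication is the credential sent in the standard `Authorization` header using the Bearer scheme, as in the curl examples. Authorization is a server-side decision made per user and per object. The `PERMISSIONS` table and the user names are hypothetical stand-ins for what Users and Permissions configures, not LABDRIVE's data model.

```python
# Hypothetical fine-grained permissions: (user, object id) -> allowed actions.
PERMISSIONS: dict[tuple[str, int], set[str]] = {
    ("alice", 128): {"read", "write"},
    ("bob", 128): {"read"},
}


def bearer_header(token: str) -> dict[str, str]:
    """Authentication: present credentials in the standard Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def is_authorized(user: str, object_id: int, action: str) -> bool:
    """Authorization: decided per user and per object; anything not granted is denied."""
    return action in PERMISSIONS.get((user, object_id), set())


print(is_authorized("bob", 128, "read"))   # True
print(is_authorized("bob", 128, "write"))  # False
```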
A1.2: FULLY MET
A2. metadata are accessible, even when the data are no longer available
Metadata elements, such as the FITS standard mentioned earlier, are not deleted when the object they describe is deleted, because (1) the same piece of metadata may be used by billions of other objects and (2) the element is itself an object.
Using LABDRIVE, users are free to store the metadata with the objects or keep it independently of them.
For example, a package could be created following the recipe for creating a full AIP provided in https://docs.libnova.com/labdrive/concepts/oais-and-iso-16363/labdrive-support-for-oais-conformance#labdrive-and-aips-as-bagit-files. In this way one can extract all the required metadata elements, such as the events that have happened to the data object and all the hashes which have been calculated and re-calculated, and keep them even if the data object is deleted.
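The linked recipe is not reproduced here. As a generic illustration of the BagIt layout it refers to (RFC 8493), the sketch below writes a minimal bag: a `bagit.txt` declaration, the payload under `data/`, and a SHA-256 payload manifest. The payload name `metadata.json` and its content are hypothetical, standing in for exported metadata and events. Only flat file names are handled.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write a minimal BagIt bag: bagit.txt, data/ payload, manifest-sha256.txt."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        # Manifest line: checksum, whitespace, path relative to the bag root.
        lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{name}")
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical exported metadata for file 128.
payload = {"metadata.json": b'{"dc:title": "Observation run 7"}'}
bag = Path(tempfile.mkdtemp()) / "aip-128"
make_bag(bag, payload)
print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because the manifest stores the hashes alongside the metadata, the package stays verifiable after the original data object has gone.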
Another way to achieve this is to store metadata at container level, so that if or when data is deleted, metadata about the object remains.
Where this is not possible, organizations can still achieve it by following a procedure when removing data:
Metadata is exported from the object.
Metadata is imported back to an empty folder representing the object that is going to be erased.
The original object is deleted, while retaining its metadata.
These steps can be carried out as outlined in the Importing/Exporting metadata section.
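The three-step procedure can be sketched against a hypothetical in-memory store. The `Store` class and its methods are illustrations only, not the Importing/Exporting metadata tooling. The order of the steps matters. The metadata is written to its new folder before the original is removed, so a failure part-way never leaves the metadata missing.

```python
import copy


class Store:
    """Hypothetical stand-in for an archive: path -> {"content": ..., "metadata": ...}."""

    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def export_metadata(self, path: str) -> dict:
        return copy.deepcopy(self.items[path]["metadata"])

    def create_folder(self, path: str, metadata: dict) -> None:
        self.items[path] = {"content": None, "metadata": metadata}

    def delete(self, path: str) -> None:
        del self.items[path]


def delete_keeping_metadata(store: Store, path: str, tombstone_path: str) -> None:
    metadata = store.export_metadata(path)          # 1. export metadata from the object
    store.create_folder(tombstone_path, metadata)   # 2. import it into an empty folder
    store.delete(path)                              # 3. delete the original object


store = Store()
store.items["run7.fits"] = {"content": b"...", "metadata": {"dc:title": "Observation run 7"}}
delete_keeping_metadata(store, "run7.fits", "deleted/run7.fits")
print(store.items)
```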
As described in https://docs.libnova.com/labdrive/get-started/file-versioning-and-recovery, it is possible to restore a deleted file, with all its metadata, if required.
Of course, there may be some archives in which an object, and all its metadata, must be deleted, perhaps for reasons of security or confidentiality; LABDRIVE can be configured to support that also.
A2: FULLY MET
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Springer Link defines knowledge representation (https://link.springer.com/referenceworkentry/10.1007/978-1-4419-9863-7_595) as:
Knowledge representation refers to the technical problem of encoding human knowledge and reasoning (Automated Reasoning) into a symbolic language that enables it to be processed by information systems. In systems biology, knowledge representation is used to infuse data with scientific concepts and understanding in order to maximize its utility for furthering scientific insight.
LABDRIVE supports many formal metadata languages (see https://docs.libnova.com/labdrive/configuration/metadata). Examples include:
Other Representation Information, which includes software code; code is certainly a “symbolic language that enables it to be processed by information systems”
Descriptive Information: Dublin Core
All these, and many others, are accessible, shared, and broadly applicable, as described in the above documentation.
The example Representation Information Network shows the metadata that describes a piece of scientific data, together with that metadata's own metadata.
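A Representation Information Network is a graph, so the full set of Representation Information needed to understand an object can be collected by walking it. The sketch below does a breadth-first traversal and tracks visited nodes so that shared or cyclic references terminate. The network itself is a hypothetical example, not the one in the LABDRIVE documentation.

```python
from collections import deque

# Hypothetical network: each node lists the Representation Information needed to understand it.
RIN: dict[str, list[str]] = {
    "run7.fits": ["FITS standard", "EFFICIEN keyword description"],
    "FITS standard": ["PDF specification"],
    "EFFICIEN keyword description": ["PDF specification"],
    "PDF specification": [],
}


def required_repinfo(obj: str, network: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning all Representation Information reachable from obj."""
    seen, order, queue = {obj}, [], deque([obj])
    while queue:
        for dep in network.get(queue.popleft(), []):
            if dep not in seen:  # visit shared nodes once; also guards against cycles
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(required_repinfo("run7.fits", RIN))
# ['FITS standard', 'EFFICIEN keyword description', 'PDF specification']
```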
I1: FULLY MET
I2. (meta)data use vocabularies that follow FAIR principles
Where the vocabularies are archived within LABDRIVE, they will also follow FAIR principles. Vocabularies outside LABDRIVE can be used (pointed to), and it is up to whoever specifies the vocabularies to choose ones which follow FAIR principles.
Vocabularies can be defined with the methods described in the Configure metadata section.
To point to external vocabularies, you can use linked fields.
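As an illustration of a controlled vocabulary in use, the sketch below accepts only listed terms for a field and returns each term's URI, so the stored value points to a resolvable, shared definition. It uses a subset of the DCMI Type Vocabulary (namespace `http://purl.org/dc/dcmitype/`) as an example of an external FAIR vocabulary. The validation function is hypothetical and is not how LABDRIVE's linked fields are implemented.

```python
# Subset of the DCMI Type Vocabulary, used as an example external vocabulary.
DCMI_TYPE = "http://purl.org/dc/dcmitype/"
VOCABULARY = {term: DCMI_TYPE + term for term in ("Dataset", "Image", "Software", "Text")}


def validate_term(value: str) -> str:
    """Return the term's URI, or raise if the value is not in the vocabulary."""
    try:
        return VOCABULARY[value]
    except KeyError:
        raise ValueError(f"{value!r} is not in the controlled vocabulary") from None


print(validate_term("Dataset"))  # http://purl.org/dc/dcmitype/Dataset
```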
I2: FULLY MET
I3. (meta)data include qualified references to other (meta)data
Where the metadata elements are objects preserved within LABDRIVE, they will themselves have appropriate metadata, as illustrated by the example Representation Information Network described under I1.
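A reference is "qualified" when it states what the relationship means, not only what it points to. The sketch below models a reference as a source, a relation type and a target, and rejects unknown relation types. The allowed relations are a small subset of the DataCite `relationType` list, chosen only as an example. The `QualifiedReference` class is hypothetical.

```python
from dataclasses import dataclass

# Subset of DataCite relationType values, used as an example of qualified relations.
ALLOWED_RELATIONS = {"HasMetadata", "IsDerivedFrom", "IsDocumentedBy", "IsPartOf", "References"}


@dataclass(frozen=True)
class QualifiedReference:
    source: str
    relation: str
    target: str

    def __post_init__(self) -> None:
        if self.relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unqualified or unknown relation: {self.relation}")


ref = QualifiedReference("containers/12/file/128", "HasMetadata", "FITS standard")
print(ref)
```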
I3: FULLY MET
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
LABDRIVE allows an unlimited number of metadata schema and schema elements to be associated with data as well as metadata, allowing each to be “richly described with a plurality of accurate and relevant attributes”. In particular the full set of Representation Information can be attached, as described in the response to I1 above.
R1: FULLY MET
R1.1. (meta)data are released with a clear and accessible data usage license
Every object, whether data or metadata, has configurable access permissions (https://docs.libnova.com/labdrive/configuration/sharing-and-permissions). Each preserved object should also have Access Rights Information (https://docs.libnova.com/labdrive/concepts/oais-and-iso-16363/planning-for-preservation#access-rights) which can point to the associated licences.
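Because licence information is attached per object, an archive manager can audit coverage mechanically. The sketch below lists objects whose Access Rights Information does not point to a licence. The record layout and the `license` field name are hypothetical assumptions, not LABDRIVE's schema.

```python
# Hypothetical Access Rights Information per object id.
ACCESS_RIGHTS: dict[int, dict[str, str]] = {
    128: {"license": "https://creativecommons.org/licenses/by/4.0/"},
    129: {"license": ""},
    130: {},
}


def objects_missing_license(access_rights: dict[int, dict[str, str]]) -> list[int]:
    """Return the ids of objects with no licence pointer (missing or empty)."""
    return sorted(obj for obj, info in access_rights.items() if not info.get("license"))


print(objects_missing_license(ACCESS_RIGHTS))  # [129, 130]
```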
R1.1: FULLY MET
R1.2. (meta)data are associated with detailed provenance
Detailed Provenance Information is associated with each object (data as well as metadata) in two forms.
1) explicitly, as Provenance Information (see https://docs.libnova.com/labdrive/data-curation-and-preservation-1/oais-based-information-preservation-curation-and-exploitation/how-to-deal-with-additional-information/provenance-information and https://docs.libnova.com/labdrive/data-curation-and-preservation-1/oais-based-information-preservation-curation-and-exploitation/reproducing-research#use-of-provenance-in-reproducibility)
2) as the internal LABDRIVE events, which may be listed as part of the Provenance using, for example:
```shell
curl "$your_labdrive_url/api/file/128/event" -H "authorization: Bearer $your_labdrive_api_key"
```
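Once retrieved, the event list can be turned into a readable provenance timeline. The sketch below sorts events chronologically and formats one line per event. The sample response and its field names (`date`, `type`, `description`) are placeholders, not the documented LABDRIVE event schema, so adapt them to the actual API output.

```python
import json

# Placeholder response; field names are assumptions, not the documented event schema.
sample = """[
  {"date": "2023-03-02T10:15:00Z", "type": "checksum", "description": "SHA-256 recalculated"},
  {"date": "2023-03-01T09:00:00Z", "type": "ingest", "description": "File uploaded"}
]"""


def provenance_timeline(events_json: str) -> list[str]:
    """Return one 'date type: description' line per event, oldest first."""
    events = json.loads(events_json)
    # ISO 8601 timestamps in the same UTC format sort correctly as strings.
    events.sort(key=lambda e: e["date"])
    return [f'{e["date"]} {e["type"]}: {e["description"]}' for e in events]


for line in provenance_timeline(sample):
    print(line)
```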
R1.2: FULLY MET
R1.3. (meta)data meet domain-relevant community standards
LABDRIVE can be configured to support specific community standard formats or to alert the user if non-standard formats are used. It is up to the user/administrator to ensure that the system is configured appropriately.
Examples of how metadata can be configured are given in https://docs.libnova.com/labdrive/configuration/metadata.
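LABDRIVE's own format-identification mechanism is not shown in this section. As a hypothetical illustration of "alerting the user if non-standard formats are used", the sketch below checks whether a file expected to be FITS begins with the mandatory `SIMPLE` keyword card of a FITS primary header, and returns a warning otherwise.

```python
from typing import Optional

# A FITS primary header starts with the SIMPLE keyword card: "SIMPLE", padded to 8 columns, then "= ".
FITS_SIGNATURE = b"SIMPLE  = "


def format_alert(name: str, first_bytes: bytes) -> Optional[str]:
    """Return a warning if a file expected to be FITS does not start like one."""
    if first_bytes.startswith(FITS_SIGNATURE):
        return None
    return f"{name}: not a FITS file; check it against the community standard"


print(format_alert("run7.fits", b"SIMPLE  =                    T"))  # None
print(format_alert("notes.fits", b"%PDF-1.7"))
```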
R1.3: FULLY MET IN THAT LABDRIVE CAN BE CONFIGURED APPROPRIATELY
Conclusion
LABDRIVE supports many of the FAIR principles automatically, as is the case with identifiers and the internal events which are part of the Provenance. Other FAIR principles cannot be satisfied by software alone because they require choices to be made by the researcher or archive manager. In these cases, the only consistent way to decide whether the FAIR principles are met is to check whether the software can be configured appropriately.
The Test Criteria proposed in this document are consistent with this understanding.
The evidence is presented for each of the FAIR principles as descriptive text, supported by LABDRIVE documentation, with concrete examples where possible.
Overall we feel that reviewers will agree that a LABDRIVE repository fully supports the FAIR principles and so this evaluation has been PASSED.
APPENDIX: EXAMPLES
F2 example
```shell
curl "$your_labdrive_url/api/file/128/metadata" -H "authorization: Bearer $your_labdrive_api_key"
```