Extended API documentation

LABDRIVE includes a powerful HTTP RESTful API capable of performing any system action or adjusting any platform setting.

To better understand the platform, we recommend reviewing What is LABDRIVE and Architecture first.

See the full list of API methods here, or continue reading for an introduction to them.

A Python library is available to make the API easier and more convenient to use: see the library and its documentation. The library can be used in Functions, Jupyter Notebooks and in your own scripts.

When working with the API, the platform imposes a maximum execution time of 600 seconds on a single query.

For example: if you want to associate multiple metadata values with descriptors for a single item in a single query, the maximum number of operations in that query is between 20,000 and 30,000. This number can be higher or lower depending on the type of operation, the character count of the values, server node load, etc.
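For instance, one conservative way to stay below that ceiling is to split large metadata assignments into batches. The following is a minimal sketch using the generic Python requests library (not the LABDRIVE Python library); the /file/{id}/metadata payload shape is an assumption, so check the file methods reference for the exact format:

import requests

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

def assign_in_batches(file_id, values, batch_size=10_000):
    """Send metadata values in conservative batches so that each
    query stays well below the 600-second execution limit."""
    for i in range(0, len(values), batch_size):
        resp = requests.post(f"{API}/file/{file_id}/metadata", headers=HEADERS,
                             json={"values": values[i:i + batch_size]})  # payload shape is an assumption
        resp.raise_for_status()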

Introduction to the LABDRIVE API

While API usage examples are included throughout the documentation, this section gives a more specific overview of all existing methods:

Containers

Data containers are the basic way of grouping content in LABDRIVE. See available container methods here.

Tags

It is possible to associate tags with elements (files or folders), so you can search for them later. See the tag methods to create, edit or remove tags. To assign a tag to a file, review /file/{id}/tag/{tag_id} in the file methods section.
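For illustration, assigning a tag could look like this minimal sketch. It assumes the Python requests library and that the endpoint accepts a POST; verify the exact verb in the file methods reference:

import requests

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

file_id, tag_id = 42, 7                             # hypothetical IDs
requests.post(f"{API}/file/{file_id}/tag/{tag_id}", headers=HEADERS)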

Workflows

Each data container can be associated with a workflow step. Containers can be listed by the step they are in, so users know which container is in which process. Workflows can be created, edited or deleted with the methods in the Workflow methods section.

Lifecycle policies

Lifecycle policies can protect content or erase it automatically based on dates or periods of time (e.g. "make the content immutable for 5 years once I upload it"). Policies can be maintained in the Lifecycle policies section, and applied using the container method /container/{id}/lifecycle-policy/{lifecycle_id}.
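Applying a policy to a container could look like the following sketch, under the same assumptions (Python requests, a POST verb); see the container methods reference for the exact call:

import requests

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

container_id, lifecycle_id = 42, 3                  # hypothetical IDs
requests.post(f"{API}/container/{container_id}/lifecycle-policy/{lifecycle_id}",
              headers=HEADERS)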

Container metadata

Data containers can have metadata associated with them to describe them. Metadata is grouped into schemas and fields/descriptors, which are associated with the containers. See the Container metadata schemas section, and assign schemas when creating containers.

Public share

Files, folders or whole containers can be made publicly accessible to unauthenticated/anonymous users using the methods in the Sharing section.

Container templates

When data containers are created, many parameters need to be defined. Some users prefer to create templates with all their settings and simply apply them when a container is created. The Container templates section describes the methods.

Objects/items

Data containers preserve files and folders (both are represented as files inside LABDRIVE) that can have metadata associated with them, becoming objects.

Files

See the file methods section for methods to create files and assign other properties to them.

Metadata

Files and folders can have metadata associated with them. Metadata is grouped into metadata schemas; how to maintain them is described in the Object metadata section. Actual values for the fields defined in a schema can be set on, and obtained from, the objects using /file/{id}/metadata.

PRONOM

For digital preservation purposes, it is important to understand the format of a file. To do that, the preservation community uses the PRONOM registry. Methods to work with PRONOM are defined in the PRONOM section.

Archival structure

Data containers are organized hierarchically using the Archival structure and its Archival structure nodes. This is managed using the Archival Structure methods.

Submission areas

Submission areas can be created so anonymous or unauthenticated users can ingest content into the platform (without being able to access it after depositing it). Submission areas are managed using the methods available in the Submission Areas section.

Reports

Reports can be launched, retrieved or scheduled using the Report methods.

Events

Many user and system actions are recorded by LABDRIVE. They can be accessed using the methods in the Events section.

Functions

LABDRIVE users can define lambda functions (code) that the platform executes on certain triggers, greatly extending how far LABDRIVE can be customized and adapted to specific use cases.

Jobs

When a Function is executed by a user, it creates a job, which is used to retrieve the function's output. Job-related methods are available in the Jobs section.

Users

User accounts are managed using the User methods. Users can be grouped into Groups, which can be maintained in the Groups section. Permissions, which make users or groups capable of performing certain actions, are assigned in the Permissions section.

Tips when working with the API and creating your scripts

Safe approach for container creation and permission assignment

When you create a new container using the API POST /container method, LABDRIVE needs to create it in the S3 bucket and assign permissions to your user (and to others with permissions on the same data container) before you can start uploading to it. This process may take a few seconds to complete.

Because of this, if you create a new container and try to write to it using S3 immediately after making the request, you will get 404 or permission-denied errors. Permissions are adjusted in under 5 seconds in most situations, but the safe approach (sketched after this list) is to:

  • Create the data container with the POST /container method

  • Loop until you can write your first file without getting an error back

  • Continue your uploads
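A minimal sketch of that loop, assuming the Python requests and boto3 libraries; the creation payload and the bucket field in the answer are assumptions, so adapt them to your deployment:

import time
import requests
import boto3
from botocore.exceptions import ClientError

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

# 1. Create the data container with POST /container
resp = requests.post(f"{API}/container", headers=HEADERS,
                     json={"name": "my-new-container"})  # payload fields are illustrative
resp.raise_for_status()
bucket = resp.json()["bucket"]    # assumption: the answer exposes the S3 bucket name

# 2. Loop until the first write succeeds (permissions may take a few seconds)
s3 = boto3.client("s3")
for attempt in range(60):
    try:
        s3.put_object(Bucket=bucket, Key="first-file.txt", Body=b"hello")
        break                     # write succeeded: permissions are in place
    except ClientError:
        time.sleep(1)             # 404 / permission denied: wait and retry
else:
    raise RuntimeError("Container never became writable")

# 3. Continue your uploads normally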

Safe approach for modifying a file's properties or metadata immediately after uploading it

When you upload a new file using S3, LABDRIVE needs to:

  • Phase I (Index your file): Detect the file in the storage and create the LABDRIVE-internal data structures (such as assigning a file ID to every file).

  • Phase II (System functions): LABDRIVE carries out its basic and mandatory preservation-related actions (integrity hash calculation, characterization, etc.).

  • Phase III (User functions): Finally, user-level lambda functions are called by the system.

Uploaded files cannot be handled with the API until LABDRIVE has completed Phase I and the platform has become consistent for the file. This initial process takes less than half a second under normal circumstances. So, if your code uploads a file and immediately tries to get its file ID, LABDRIVE may not return it yet. Because this period of time depends on the platform workload (and is not always half a second), your code needs to be ready for this to happen. The safe approach (sketched after this list) is to:

  • Upload your file using S3

  • Loop until you get the file ID using the /container/{containerID}/file/path/{your path} method:

    • If the file is not yet indexed (or if it does not exist, of course), LABDRIVE will return a 404 error.

    • If the file is indexed, LABDRIVE will return the file details.

  • Assign your metadata or perform any other action on the file using the file ID.
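A minimal sketch of that loop, again assuming requests and boto3; the bucket name, the "id" field in the file details and the metadata payload are assumptions:

import time
import requests
import boto3

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}
container_id = 123                                  # hypothetical container ID
path = "data/sample.csv"

# 1. Upload the file using S3
boto3.client("s3").upload_file("sample.csv", "your-bucket", path)  # bucket name is hypothetical

# 2. Loop until the file is indexed (Phase I complete)
while True:
    resp = requests.get(f"{API}/container/{container_id}/file/path/{path}",
                        headers=HEADERS)
    if resp.status_code == 200:   # file details returned: it is indexed
        file_id = resp.json()["id"]   # assumption: the details include "id"
        break
    time.sleep(0.5)               # 404: not indexed yet, retry shortly

# 3. Now act on the file by its ID, for example assigning metadata
requests.post(f"{API}/file/{file_id}/metadata", headers=HEADERS,
              json={"field": "value"})  # payload shape is illustrative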

Under high workload (for instance, when uploading 2 million files), a file query made immediately after the upload may report that the file does not exist, while querying again a moment later returns the expected result. In such cases the platform typically becomes consistent in less than one second, but sometimes the first query already succeeds and sometimes it takes a few extra seconds. As a general recommendation, follow the safe approach described above.

Regarding the output of system functions and user functions, follow the same advice. Until the file hashing and characterization process is finished, your results may show a file without hashes (or with only some of them) or without a characterization result. Your code needs to be ready for this to happen.

If you want to show feedback to your users in your code, the container details method /container/{container id} exposes the property files_pending_ingestion, which is true if there are files still to be processed and false once everything has been processed in Phase II.
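For example, a progress indicator could poll that property until ingestion finishes; a minimal sketch assuming the Python requests library:

import time
import requests

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

def wait_for_ingestion(container_id, poll_seconds=5):
    """Block until Phase II has finished for every file in the container."""
    while True:
        details = requests.get(f"{API}/container/{container_id}",
                               headers=HEADERS).json()
        if not details["files_pending_ingestion"]:
            return
        time.sleep(poll_seconds)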

Properly managing limit/offset values

Every search/list request accepts two values that delimit the set of results returned: limit and offset.

  • limit defines the number of results you want for your query, with a maximum of 200.

  • offset defines from which result (and not which page) you would like to start.

For instance, if you have 4 files in your container (with IDs 1 to 4), a query with limit=2 and offset=0 returns objects 1 and 2, while a query with limit=2 and offset=1 returns objects 2 and 3.

As an example, with this search query we request that LABDRIVE return up to 20 results:

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 0
        }'

To request the next 20 results, use:

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 20
        }'

And for the next 20:

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 40
        }'

Note that the total number of results is usually provided in the platform's answer.

There is a maximum limit of 200 results per request. Any value for limit greater than 200 is ignored.
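Putting this together, a loop that walks every page could look like the sketch below. It assumes the Python requests library and that the page of results lives under a hypothetical "data" key in the answer:

import requests

API = "https://your-labdrive-url/api"               # assumption: your deployment's base URL
HEADERS = {"authorization": "Bearer YOUR_API_KEY"}

def iter_all_files():
    """Yield every file by walking GET /file with limit/offset pages."""
    offset, limit = 0, 200        # 200 is the maximum the platform allows
    while True:
        page = requests.get(f"{API}/file", headers=HEADERS,
                            json={"limit": limit, "offset": offset}).json()
        results = page.get("data", [])  # assumption: results live under "data"
        if not results:
            return
        yield from results
        offset += limit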
