
API Extended documentation


LABDRIVE includes a powerful RESTful HTTP API capable of performing any system action and adjusting any platform setting.

To better understand the platform, we recommend reviewing What is LABDRIVE and Architecture first.

See the full list of API methods, or continue reading for an introduction to its methods.

A Python library that makes using the API easier and more convenient is available, together with its documentation. This library can be used in Functions, Jupyter Notebooks and in your own scripts.

When working with the API, the platform imposes a maximum execution time of 600 seconds for a single query.

For example: if the user wants to associate multiple metadata values with descriptors for a single item in a single query, the maximum number of operations per query is between 20,000 and 30,000. This number can be higher or lower depending on the type of operation, the character count of the values, server node load and other factors.
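To stay under these limits, a large set of operations can be split client-side into smaller batches, each sent as its own query. A minimal sketch; the 10,000-operation batch size is an illustrative assumption, not a platform constant, and sending each batch to the API is left to your own client code:

```python
def chunked(operations, batch_size=10_000):
    """Yield successive batches of at most batch_size operations.

    batch_size is a hypothetical value chosen to stay well below the
    20,000-30,000 operations-per-query range mentioned above.
    """
    for start in range(0, len(operations), batch_size):
        yield operations[start:start + batch_size]


# Example: 25,000 metadata operations split into 3 batches
batches = list(chunked(list(range(25_000))))
```

Each batch can then be submitted as a separate API request, keeping every individual query comfortably inside the 600-second execution window.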

Introduction to the LABDRIVE API

While some usage examples of the API appear throughout the documentation, this section documents all existing methods in more detail:

Containers

Data containers are the basic way of grouping content in LABDRIVE. See the available container methods here.

Tags

It is possible to associate tags with elements (files or folders) so you can search for them later. See the tag methods to create, edit or remove them. To assign a tag to a file, review the /file/{id}/tag/{tag_id} method in the file methods section.

Workflows

Each data container can be associated with a workflow step. Containers can be listed by the step they are in, so users know which container is in which process. Workflows can be created, edited or deleted with the methods in the Workflow methods section.

Lifecycle policies

Container metadata

Public share

Container templates

Objects/items

Data containers preserve files and folders (both are files inside LABDRIVE) that have metadata associated with them, becoming objects.

Files

Metadata

PRONOM

Archival structure

Submission areas

Reports

Events

Functions

LABDRIVE users can define lambda functions (code) that the platform executes on certain triggers, greatly increasing how far LABDRIVE can be customized and adapted to specific use cases.

Jobs

Users

Tips when working with the API and creating your scripts

Safe approach for the container creation and permissions assignment

When you create a new container using the POST /container API method, LABDRIVE needs to create it in the S3 bucket and assign permissions to your user (and to others with permissions on the same data container) before you can start uploading to it. This process may take a few seconds to complete.

As a result, if you create a new container and, immediately after making the request, try to write to it using S3, you will get 404 or permission-denied errors. Permissions are usually adjusted within 5 seconds, but the safe approach is to:

  • Create the data container with the POST /container method

  • Loop until you can write your first file without getting an error back

  • Continue your uploads
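The retry loop in the steps above can be sketched as follows. Here `try_write` stands in for whatever first S3 write you perform (for example, a boto3 `put_object` call wrapped in a function); the helper name and its parameters are illustrative assumptions, not part of the LABDRIVE API:

```python
import time


def wait_until_writable(try_write, attempts=10, delay=1.0):
    """Retry try_write() until it succeeds or attempts run out.

    try_write: any zero-argument callable that raises on 404 or
    permission-denied errors (e.g. a wrapped S3 put_object call).
    Raises the last error if the container never becomes writable.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return try_write()
        except Exception as exc:  # S3 clients raise on 404 / AccessDenied
            last_error = exc
            time.sleep(delay)
    raise last_error
```

With the defaults above, the loop keeps trying for roughly 10 seconds, comfortably beyond the under-5-second window in which permissions are usually adjusted.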

Safe approach for uploading and modifying a file's property or metadata field immediately after uploading it

When you upload a new file using S3, LABDRIVE needs to:

  • Phase I (Index your file): Detect the file in the storage and create LABDRIVE-internal data structures (such as assigning a file ID to every file).

  • Phase II (System functions): LABDRIVE carries out its basic and mandatory preservation-related actions (integrity hash calculation, characterization, etc.).

  • Phase III (User functions): Finally, user-level lambda functions are called by the system.

Uploaded files cannot be handled with the API until LABDRIVE has completed phase I and the platform becomes consistent for the file. This initial process takes less than half a second under normal circumstances. So, if your code uploads a file and immediately tries to get its file ID, LABDRIVE may not return it yet. As this period depends on the platform workload (it is not always half a second), your code needs to be ready for this to happen, and the safe approach is to:

  • Upload your file using S3

  • Loop until you get the file ID using the /container/{containerID}/file/path/{your path} method

    • If the file is not yet indexed (or if it does not exist, of course), LABDRIVE will return a 404 error.

    • If the file is indexed, LABDRIVE will return the file details.

  • Assign your metadata or any other action over the file using the file ID.
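The polling step above can be sketched like this. `get_file_details` stands in for your own wrapper around the /container/{containerID}/file/path/{your path} request; the helper name, the `attempts`/`delay` parameters and the None-on-404 convention are illustrative assumptions:

```python
import time


def wait_for_file_id(get_file_details, attempts=20, delay=0.5):
    """Poll until LABDRIVE has indexed the uploaded file.

    get_file_details: zero-argument callable wrapping the
    /container/{containerID}/file/path/{path} request; assumed to
    return a dict with an "id" key once the file is indexed, or
    None while the request still returns a 404.
    """
    for _ in range(attempts):
        details = get_file_details()
        if details is not None:
            return details["id"]
        time.sleep(delay)
    raise TimeoutError("file was not indexed in time")
```

Once the call returns, the file ID can be used safely for metadata assignment or any other action over the file.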

Under high workload (for instance, while uploading 2 million files), the first file query may report that the file does not exist if we check immediately after uploading it:

But if we query it again, we get the result:

In this example the platform reaches consistency in less than one second, but sometimes the first query succeeds and sometimes a few extra seconds are needed. As a general recommendation, follow the safe approach described above.

Regarding the output of system functions and user functions, follow the same advice. Until the file hashing and characterization process has finished, your results may show a file without hashes (or with only some of them) or without characterization results. Your code needs to be ready for this to happen.

If you want to show feedback to your users, the container details method /container/{container id} exposes the files_pending_ingestion property, which is true if there are files still to be processed and false once everything has been processed in phase II:
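A simple way to use that property is to poll the container details until it flips to false. In this sketch, `get_container_details` stands in for your own wrapper around the /container/{container id} request; the helper name and its parameters are assumptions:

```python
import time


def wait_for_ingestion(get_container_details, poll_interval=2.0, timeout=600.0):
    """Block until files_pending_ingestion becomes False, or time out.

    get_container_details: zero-argument callable assumed to wrap
    GET /container/{container id} and return the container
    properties as a dict. Returns True when ingestion is complete,
    False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        details = get_container_details()
        if not details.get("files_pending_ingestion", False):
            return True
        time.sleep(poll_interval)
    return False
```

This is useful, for example, to delay a report or a bulk metadata pass until phase II has completed for every uploaded file.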

Properly managing limit/offset values

Every search/list request accepts two values that delimit the number of results returned: limit and offset.

  • limit defines the number of results you want for your query, with a maximum of 200.

  • offset defines from which result (not from which page) you would like to start.

For instance, if you have 4 files in your container (with ids 1 to 4), a query with limit=2 and offset=0 returns objects 1 and 2, while a query with limit=2 and offset=1 returns objects 2 and 3.

For instance, with this search query, we are requesting LABDRIVE to return up to 20 results:

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 0
        }'

When we want to request the next 20 results, we should use:

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 20
        }'

And for the next 20,

curl --request GET  --url "$your_labdrive_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_labdrive_api_key" \
     --data '{
            "limit" : 20,
            "offset" : 40
        }'

Note that the number of total results is usually provided in the platform answer:

There is a maximum limit of 200 results per request. Any value for limit greater than 200 is ignored.
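The three curl requests above generalize to a loop that advances offset in steps of limit until a short (or empty) page signals the end of the results. A minimal sketch; `fetch_page` stands in for your own wrapper around the GET /api/file request and is an assumption:

```python
def iter_all_results(fetch_page, limit=200):
    """Yield every result of a search by paging with limit/offset.

    fetch_page(limit, offset): callable assumed to perform the
    GET /api/file request shown above and return the list of
    results for that window (a page shorter than limit, possibly
    empty, means the results are exhausted).
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return
        offset += limit
```

Using limit=200 (the maximum the platform honours) minimizes the number of round trips for large result sets.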

Lifecycle policies can protect content or erase it automatically based on dates or periods of time (e.g. "make the content immutable for 5 years once I upload it"). Policies can be maintained in the Lifecycle policies section, and applied using the /container/{id}/lifecycle-policy/{lifecycle_id} container method.

Data containers can have metadata associated with them to describe them. Metadata is grouped into schemas and fields/descriptors that are associated with the containers. See the Container metadata schemas section, and assign them when creating containers.

Files, folders or whole containers can be made publicly accessible to unauthenticated/anonymous users using the methods in the Sharing section.

When data containers are created, many parameters need to be defined. Some users prefer to create templates with all their settings and simply apply them when the container is created. The Container templates section describes the methods.

See the methods to create files and assign other properties to them in the file methods section.

Files/folders can have metadata associated with them. Metadata is grouped into metadata schemas; how to maintain them is described in the Object metadata section. Actual values for the fields defined in the schema can be associated with (or obtained from) the objects using /file/{id}/metadata.

For digital preservation purposes, it is important to understand the format of a file. To do that, the preservation community uses the PRONOM registry. Methods to work with PRONOM are defined in the PRONOM section.

Data containers are organized hierarchically using the Archival structure and its Archival structure nodes. This is managed using the Archival Structure methods.

Submission areas can be created so anonymous or unauthenticated users can ingest content into the platform (without being able to access it after depositing it). Submission areas are managed using the methods available in the Submission Areas section.

Reports can be launched, retrieved or scheduled using the Report methods.

Many user and system actions are retained by LABDRIVE. They can be accessed using the methods in the Events section.

When a Function is executed by a user, it creates a job, which is used to retrieve the function output. Job-related methods are accessed in the Jobs section.

User accounts are managed using the User methods. Users can be grouped in Groups, which can be maintained in the Groups section. Permissions are assigned to users or groups to make them capable of performing certain actions, and are adjusted in the Permissions section.
