What is LABDRIVE

LABDRIVE is a Research Data Management and Preservation platform. It lets organizations capture the research data they produce and manage, preserve and provide access to it throughout the data lifecycle.

LABDRIVE lets organizations move away from a siloed approach, in which each department, unit or series of datasets keeps its content in multiple, disaggregated systems, to a single repository that adapts to the particularities of each dataset. The platform scales from organizations with a few gigabytes of data to organizations managing several petabytes.

Continue reading here or go to:

The CONCEPTS section, to get used to the LABDRIVE terminology and architecture.
The GETTING STARTED section, for a practical, hands-on introduction to the usual processes.
The CONFIGURATION section, on how to create structures, policies, configure the system and permissions.
The DATA CURATION AND PRESERVATION section, on how to prepare data for curation and preservation within LABDRIVE.
If you want to quickly learn how to adapt LABDRIVE to your dataset, the DATA RECIPES section shows how LABDRIVE is configured to work with datasets from organizations like CERN, PIC, EMBL-EBI or DESY.

LABDRIVE foundation

Metadata-driven virtualized scalable storage
- Admins can assign a specific Storage Policy to each Data Container, dictating the storage types, replicas, technologies, providers and integrity policies to use at the data container level.
- A single repository supports multiple storage providers and types, and is designed with high volumes of content in mind.
- Transitions from one storage policy to another (even from one storage provider to another) are fully managed by the platform.
- Storage is virtualized, so file paths remain unchanged when the underlying storage technology changes.
- Extensible storage architecture (cloud object storage, Ceph, tape, etc.)
Code-driven advanced content management
- Organizations (or integrators) can define LABDRIVE lambda functions, so the platform automatically processes content using the logic they provide.
Easy to use and powerful
- Equally capable web interface and API, so users can easily manage the platform while power users can automate every process.
Strong digital preservation technology
- Digital preservation principles are always present: data protection comes first.
- Fully aligned with OAIS and ISO 16363, with redundant checks and safe processes.

Basic concepts

Content, represented as files and folders plus metadata, lives inside a Data Container (similar to an S3 bucket or an Azure container).
Content in a given Data Container shares some commonalities:
- Metadata schema to use,
- Storage policy,
- Functions,
- Permissions,
- and others
Data Containers are grouped in collections or sub collections (archive nodes), giving users a way to group and organize datasets and content.
Users/organizations are able to see the whole tree or just a fraction of it, depending on their permissions.
LABDRIVE lambda functions can process files as they are created, periodically or on request.

Core platform capabilities

01 Automation with code

Users can define data container-level lambda functions (LABDRIVE Lambda functions) that are executed on certain events. See more in the Functions section.
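To illustrate the idea of functions attached to a data container and run on events, here is a minimal sketch in plain Python. It does not use the LABDRIVE API: the `DataContainer` class, the `file_created` event name and the `tag_by_extension` function are all hypothetical names chosen for this example, and the real function interface is described in the Functions section.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical event name; the platform's actual event types may differ.
FILE_CREATED = "file_created"


class DataContainer:
    """Toy stand-in for a data container that runs functions on events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._functions: Dict[str, List[Callable[[dict], dict]]] = defaultdict(list)

    def register(self, event: str, func: Callable[[dict], dict]) -> None:
        """Attach a function to an event type for this container only."""
        self._functions[event].append(func)

    def emit(self, event: str, item: dict) -> dict:
        """Run every function registered for the event, in registration order."""
        for func in self._functions[event]:
            item = func(item)
        return item


def tag_by_extension(item: dict) -> dict:
    """Example function: derive a 'format' metadata field from the file name."""
    name = item["path"].rsplit("/", 1)[-1]
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
    return {**item, "metadata": {**item.get("metadata", {}), "format": extension}}


container = DataContainer("run-2024")
container.register(FILE_CREATED, tag_by_extension)
result = container.emit(FILE_CREATED, {"path": "raw/events_001.ROOT"})
print(result["metadata"]["format"])  # root
```

Because functions are registered per container, two containers can react to the same event with different logic, which is the point of defining them at data container level.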

02 Virtualized storage

LABDRIVE creates an abstraction layer between the user and the content, making it possible to define policies that drive the infrastructure. See more in the Storage section.
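The abstraction can be pictured as a mapping from stable logical paths to physical locations that are derived from the active policy. The following is only a sketch of that idea, not LABDRIVE's implementation: `StoragePolicy`, `VirtualStore` and the backend names are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy: which backend holds the data and how many replicas."""
    backend: str
    replicas: int


class VirtualStore:
    """Maps logical paths (what users see) to physical locations (where data lives)."""

    def __init__(self, policy: StoragePolicy) -> None:
        self.policy = policy
        self._locations: Dict[str, str] = {}

    def _physical(self, logical_path: str) -> str:
        return f"{self.policy.backend}://{logical_path}"

    def put(self, logical_path: str) -> None:
        self._locations[logical_path] = self._physical(logical_path)

    def apply_policy(self, new_policy: StoragePolicy) -> None:
        """Switch policy and relocate every file; logical paths are untouched."""
        self.policy = new_policy
        for path in self._locations:
            self._locations[path] = self._physical(path)

    def resolve(self, logical_path: str) -> str:
        return self._locations[logical_path]

    def list_paths(self) -> List[str]:
        return sorted(self._locations)


store = VirtualStore(StoragePolicy(backend="object-storage", replicas=2))
store.put("experiment/run1.dat")
paths_before = store.list_paths()
store.apply_policy(StoragePolicy(backend="tape", replicas=3))
print(store.list_paths() == paths_before)    # True
print(store.resolve("experiment/run1.dat"))  # tape://experiment/run1.dat
```

Users and scripts only ever refer to `experiment/run1.dat`, so a change of policy or provider does not require them to update any path.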

03 Metadata and discovery

Metadata schemas can be defined and associated with each item in a data container, making it possible to store any type of metadata, structured (XML, JSON and triples/links) or simple (strings, dates, etc.), with your data.

LABDRIVE includes search capabilities in both the Management Interface and the API. Metadata can be automatically imported (and exported) using the LABDRIVE Lambda functions. See more in the Metadata section.
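As a rough illustration of what a schema with both simple and structured fields implies, the sketch below checks an item's metadata against a schema expressed as a plain Python dictionary. The field names and the `validate` helper are hypothetical; the actual schema format used by the platform is documented in the Metadata section.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical schema: two simple fields and one structured (JSON-like) field.
SCHEMA: Dict[str, type] = {
    "title": str,
    "collected_on": date,
    "instrument": dict,
}


def validate(metadata: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the metadata fits the schema."""
    problems = []
    for field, expected in schema.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], expected):
            actual = type(metadata[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


good = {
    "title": "Detector calibration",
    "collected_on": date(2024, 1, 15),
    "instrument": {"name": "calorimeter", "channels": 128},
}
bad = {"title": 42, "collected_on": date(2024, 1, 15)}

print(validate(good, SCHEMA))  # []
print(validate(bad, SCHEMA))   # ['title: expected str, got int', 'missing field: instrument']
```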

LABDRIVE allows users to access content using multiple protocols (S3, SFTP, NFS and rsync) in addition to the management web interface. The same data container can be accessed through several protocols.

05 Smart reports

Smart reports let you analyze your content and gain insights into it.

06 Federated access

Content is organized in collections and sub collections (of data containers). Permissions for content and metadata are set per data container or per collection (inherited). Users can belong to multiple organizations and can log in using their own identity provider. See more in the Federated access section.
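One way to picture inherited permissions is to walk from a data container up through its parent collections and combine whatever was granted along the way. The sketch below assumes a purely additive model (grants are unioned and nothing is ever denied), which may be simpler than the platform's actual rules; `Node`, the user names and the permission names are hypothetical.

```python
from typing import Dict, Optional, Set


class Node:
    """A collection, sub collection or data container in the archive tree."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.grants: Dict[str, Set[str]] = {}  # user -> permissions granted here

    def grant(self, user: str, *permissions: str) -> None:
        self.grants.setdefault(user, set()).update(permissions)

    def effective_permissions(self, user: str) -> Set[str]:
        """Union of the grants on this node and on every ancestor."""
        result: Set[str] = set()
        node: Optional[Node] = self
        while node is not None:
            result |= node.grants.get(user, set())
            node = node.parent
        return result


physics = Node("physics")                        # collection
runs = Node("runs-2024", parent=physics)         # sub collection
detector = Node("detector-a", parent=runs)       # data container

physics.grant("alice", "read")      # collection-based, inherited downwards
detector.grant("alice", "write")    # data container-based

print(sorted(detector.effective_permissions("alice")))  # ['read', 'write']
print(sorted(runs.effective_permissions("alice")))      # ['read']
print(sorted(detector.effective_permissions("bob")))    # []
```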


Last updated 1 year ago
