What is LABDRIVE

Last updated 1 year ago

LABDRIVE is a Research Data Management and Preservation platform. It lets organizations capture the research data they produce and helps them manage, preserve and provide access to it throughout the whole data lifecycle.

LABDRIVE allows organizations to move away from a siloed approach, in which each dataset series, department or unit keeps its content in multiple, disaggregated systems, to a single repository that adapts to the particularities of each dataset and unifies all content in one platform. The platform works for organizations holding a few gigabytes of data as well as for those managing several petabytes.

Continue reading here or go to:

  • The CONCEPTS section, to become familiar with the LABDRIVE terminology and architecture.

  • The GETTING STARTED section, for a practical, hands-on introduction to the usual processes.

  • The CONFIGURATION section, on how to create structures, policies, configure the system and permissions.

  • The DATA CURATION AND PRESERVATION section, on how to prepare data for curation and preservation within LABDRIVE.

  • If you want to quickly learn how to adapt LABDRIVE to your data set, see how LABDRIVE is configured to work with datasets from organizations like CERN, PIC, EMBL-EBI or DESY in the DATA RECIPES section.

LABDRIVE foundation

  • Metadata-driven virtualized scalable storage

    • Admins can assign a specific Storage Policy to each Data Container in the platform. The policy dictates the storage types, replicas, technologies, providers and integrity policies used at data container level.

    • A single repository supports multiple storage providers and types, and is designed with high volumes of content in mind.

    • Transition from one storage policy to another (even from one storage provider to another) is fully managed by the platform.

    • Storage is virtualized, so file paths remain unchanged when the underlying storage technology changes.

    • Extensible storage architecture (cloud object storage, Ceph, tapes, etc.).

  • Code-driven advanced content management

    • LABDRIVE lambda functions can be defined by the organizations (or integrators) so the platform automatically processes the content using the logic they define.

  • Easy to use and powerful

    • Equally-capable web interface and API, so users can easily manage the platform while power users can automate every process.

  • Strong digital preservation technology

    • Digital preservation principles are always present: data protection comes first.

    • Fully aligned with OAIS and ISO 16363, with redundant checks and safe processes.
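
The storage behaviour described above (a Storage Policy assigned per Data Container, and paths that survive a policy transition) can be pictured with a small Python model. This is only an illustrative sketch, not LABDRIVE's API: the `StoragePolicy` and `DataContainer` classes, their fields and the `provider://...` location strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy: where and how a container's files are kept."""
    name: str
    provider: str              # e.g. "cloud-object", "ceph", "tape"
    replicas: int
    integrity_check_days: int


@dataclass
class DataContainer:
    """Hypothetical container that maps logical paths to physical locations."""
    name: str
    policy: StoragePolicy
    locations: Dict[str, str] = field(default_factory=dict)

    def put(self, logical_path: str) -> None:
        # The physical location is derived from the current policy.
        self.locations[logical_path] = (
            f"{self.policy.provider}://{self.name}/{logical_path}"
        )

    def transition(self, new_policy: StoragePolicy) -> None:
        """Switch policy: physical locations change, logical paths do not."""
        self.policy = new_policy
        for logical_path in self.locations:
            self.locations[logical_path] = (
                f"{new_policy.provider}://{self.name}/{logical_path}"
            )


hot = StoragePolicy("hot", "cloud-object", replicas=2, integrity_check_days=30)
cold = StoragePolicy("cold", "tape", replicas=3, integrity_check_days=180)

container = DataContainer("run-2021", hot)
container.put("raw/events.csv")
paths_before = list(container.locations)
container.transition(cold)
paths_after = list(container.locations)
```

In this model users only ever handle the keys of `locations` (the logical paths), which is why changing the policy does not affect them.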

Basic concepts

  • Content, represented as files/folders + metadata, lives inside Data Containers (which are similar to S3 buckets or Azure containers).

  • Content in a given Data Container shares some commonalities:

    • Metadata schema to use,

    • Storage policy,

    • Functions,

    • Permissions,

    • and others

  • Data Containers are grouped in collections or sub-collections (archive nodes), giving users a way to group and organize datasets and content.

  • Users/organizations are able to see the whole tree or just a fraction of it, depending on their permissions.

  • LABDRIVE lambda functions can process files as they are created, periodically or on request, providing a powerful way to process content.
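
As a rough mental model of these concepts, the archive can be seen as a tree of archive nodes with Data Containers as leaves, each container carrying its own schema, storage policy and functions. The sketch below is an assumption made for illustration only: the class names, fields and sample values are hypothetical and do not come from the LABDRIVE API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DataContainer:
    """Hypothetical container: settings shared by all content inside it."""
    name: str
    metadata_schema: str
    storage_policy: str
    functions: List[str] = field(default_factory=list)


@dataclass
class ArchiveNode:
    """Hypothetical collection or sub-collection in the archive tree."""
    name: str
    children: List["ArchiveNode"] = field(default_factory=list)
    containers: List[DataContainer] = field(default_factory=list)

    def walk(self, prefix: str = "") -> Iterator[str]:
        """Yield the full path of every container below this node."""
        path = f"{prefix}/{self.name}"
        for container in self.containers:
            yield f"{path}/{container.name}"
        for child in self.children:
            yield from child.walk(path)


archive = ArchiveNode(
    "archive",
    children=[
        ArchiveNode(
            "physics",
            containers=[
                DataContainer("run-2021", "experiment-v1", "hot", ["extract-metadata"]),
                DataContainer("run-2022", "experiment-v1", "cold"),
            ],
        )
    ],
    containers=[DataContainer("admin-docs", "basic", "hot")],
)

paths = list(archive.walk())
```

A user with access to the whole tree would see every entry in `paths`; a user restricted to the `physics` node would see only the two containers below it.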

Core platform capacities

01 Automation with code

Users can define data container-level lambda functions (LABDRIVE Lambda functions) that are executed on certain events. See more in the Functions section.
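
To make the idea of event-triggered, container-level functions concrete, here is a minimal Python sketch of the pattern. It does not use the real LABDRIVE Functions interface: the `FunctionRegistry` class, the event name `"created"` and the `tag_format` handler are hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


class FunctionRegistry:
    """Hypothetical registry of functions keyed by (container, event)."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], List[Handler]] = defaultdict(list)

    def register(self, container: str, event: str, handler: Handler) -> None:
        self._handlers[(container, event)].append(handler)

    def dispatch(self, container: str, event: str, file_path: str) -> List[str]:
        # Run every function registered for this container and event.
        handlers = self._handlers.get((container, event), [])
        return [handler(file_path) for handler in handlers]


def tag_format(file_path: str) -> str:
    """Example function: derive a format label from the file extension."""
    extension = file_path.rsplit(".", 1)[-1]
    return f"{file_path}: format={extension}"


registry = FunctionRegistry()
registry.register("run-2021", "created", tag_format)

results = registry.dispatch("run-2021", "created", "raw/events.csv")
other = registry.dispatch("run-2022", "created", "raw/events.csv")
```

Because functions are registered per container, the same upload event in a different container (`run-2022` here) triggers nothing.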

02 Virtualized storage

LABDRIVE creates an abstraction layer between the user and the content, making it possible to define policies that drive the infrastructure. See more in the Storage section.

03 Metadata and discovery

It is possible to define metadata schemas that are associated with each item in a data container, making it possible to store any type of metadata (structured [XML, JSON and triples/links] or simple [strings, dates, etc.]) with your data.

LABDRIVE includes search capabilities in the Management Interface and when using the API. Metadata can be automatically imported (and exported) using the LABDRIVE Lambda functions. See more in the Metadata section.
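
The sketch below illustrates, under stated assumptions, how a schema mixing simple fields (strings, dates) and structured fields (JSON) could be checked against an item's metadata, and how a basic field search could work. The schema format, the field names and the `validate` and `search` helpers are hypothetical; they are not LABDRIVE's actual schema language or search API.

```python
import json
from datetime import date
from typing import Dict, List

# Hypothetical schema: field name -> kind of value expected.
SCHEMA = {"title": "string", "collected_on": "date", "instrument": "json"}


def validate(metadata: Dict[str, object], schema: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the item conforms."""
    errors = []
    for name, kind in schema.items():
        if name not in metadata:
            errors.append(f"{name}: missing")
            continue
        value = metadata[name]
        try:
            if kind == "string":
                if not isinstance(value, str):
                    raise ValueError("expected a string")
            elif kind == "date":
                date.fromisoformat(value)   # ISO 8601 calendar date
            elif kind == "json":
                json.loads(value)           # structured metadata as JSON text
            else:
                raise ValueError(f"unknown kind {kind!r}")
        except (TypeError, ValueError) as exc:
            errors.append(f"{name}: {exc}")
    return errors


def search(items: Dict[str, Dict[str, object]], name: str, value: object) -> List[str]:
    """Return the paths of items whose metadata field equals the given value."""
    return [path for path, meta in items.items() if meta.get(name) == value]


good = {
    "title": "Beam test",
    "collected_on": "2021-03-14",
    "instrument": '{"detector": "D1", "channels": 64}',
}
bad = {"title": "Beam test", "collected_on": "2021-13-01", "instrument": "{}"}

good_errors = validate(good, SCHEMA)
bad_errors = validate(bad, SCHEMA)
hits = search({"raw/a.csv": good, "raw/b.csv": bad}, "collected_on", "2021-03-14")
```

Here `good` passes, while `bad` is rejected because its date has month 13.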

04 Multi-protocol file sharing

LABDRIVE allows users to access content using multiple protocols: S3, SFTP, NFS, rsync, in addition to the capable management web interface. Accessing the same data container using multiple protocols is possible.

05 Smart reports

Smart reports allow you to get insights about your content, and to analyze it.

06 Federated access

Content is organized by collections and sub-collections (of data containers). Permissions for content and metadata are data container-based or collection-based (inherited). Users can belong to multiple organizations and can log in using their own identity provider. See more in the Federated access section.
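
One way to picture inherited permissions is to resolve a user's rights by walking from the root collection down to the data container and accumulating every grant found along the way. The following is a minimal sketch assuming a purely additive model; the `GRANTS` table, the paths, the user names and the `effective_permissions` function are hypothetical, and the platform's real rules may differ.

```python
from typing import Dict, Set

# Hypothetical explicit grants: node path -> user -> permissions.
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "/archive/physics": {"alice": {"read"}},
    "/archive/physics/run-2021": {"bob": {"read", "write"}},
}


def effective_permissions(
    user: str, path: str, grants: Dict[str, Dict[str, Set[str]]]
) -> Set[str]:
    """Union of the grants on the path itself and on all of its ancestors."""
    permissions: Set[str] = set()
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        node = "/" + "/".join(parts[:depth])
        permissions |= grants.get(node, {}).get(user, set())
    return permissions


alice = effective_permissions("alice", "/archive/physics/run-2021", GRANTS)
bob = effective_permissions("bob", "/archive/physics/run-2021", GRANTS)
bob_elsewhere = effective_permissions("bob", "/archive/physics/run-2020", GRANTS)
```

In this model `alice` reads `run-2021` through the collection-level grant, while `bob` has rights on that one container only.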