
Jupyter Notebooks as Other RepInfo

Interactive computational notebooks are tools that enable the creation of dynamic documents that combine narrative, data, and code, and are provided as web-based interactive computing platforms. Within an interactive computational notebook, data, code and explanation live together to describe and interactively show the steps pursued in order to obtain the results.

Jupyter Notebooks are currently widely adopted across multiple domains. They can be created in or imported into LABDRIVE.

LABDRIVE support for reproducibility by means of Jupyter Notebooks

Computational artifacts (mainly code) and data reside in the same containers, following the model "keep the data and code together".

A full notebook is executable from the platform, given that the different components are located in different containers.

External users are allowed to access the data from external reproducibility services through JupyterHub software, which is installed on the platform.

For each of these three options, LABDRIVE isolates each user's work from other platform components by running interactive notebooks in isolated containers.

Preservation challenges of Jupyter Notebooks

The Jupyter Notebook software evolves. To be sure that what worked in the past can be reproduced in the future, snapshots of the various components, including support libraries, Jupyter kernels for the required languages, and the appropriate operating system, must be captured and preserved, for example as described in Virtual machines as Other RepInfo.
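As a rough illustration of what such a snapshot might record, the following Python sketch collects the interpreter version, the operating system description and the installed package versions of the environment it runs in. It is a minimal sketch, not a LABDRIVE feature: the function name `capture_environment` and the dictionary keys are assumptions made for this example, and a listing like this only documents the environment. It does not replace preserving the operating system and kernels themselves.

```python
import json
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Collect a minimal description of the running Python environment.

    The key names used here are illustrative assumptions, not a standard.
    """
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip installs without a usable name
    )
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.platform(),
        "packages": packages,
    }


# Print the snapshot as JSON so it can be stored next to the notebook.
print(json.dumps(capture_environment(), indent=2))
```

Run from inside the notebook's own kernel, the output describes the environment the notebook actually used, which is the information a later reader needs to rebuild it.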

Good practices for the reproducibility of Jupyter Notebooks

Pimentel, Murta, Braganholo and Freire* identified a set of good practices for developing notebooks that underpin their future reproducibility, based on an extensive study.

  1. Use short titles with a restricted character set (A-Z a-z 0-9 . -) for notebook files, and Markdown headings for more detailed ones in the body.

  2. Pay attention to the bottom of the notebook. Check whether it can benefit from descriptive Markdown cells. Additionally, check whether the bottom cells have been executed. If not, consider either executing or removing them.

  3. Abstract code into functions, classes, and modules, and test them.

  4. Declare the dependencies in requirement files and pin the versions of all packages.

  5. Use a clean environment for testing the dependencies to check if all of them are declared.

  6. Put imports at the beginning of notebooks.

  7. Use relative paths for accessing data in the repository.

  8. Re-run notebooks top to bottom before committing.
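Some of these practices can be checked mechanically. The sketch below, which uses only the Python standard library, tests a notebook file name against the restricted character set (practice 1), lists requirement lines that are not pinned with `==` (practice 4), finds imports placed after the first code cell (practice 6), and checks whether the execution counters run 1, 2, 3, ... as they would after a clean top-to-bottom run (practice 8). All function names are hypothetical and the checks are deliberately simple. The sketch assumes the notebook has already been loaded as a dictionary in the nbformat 4 JSON layout.

```python
import re

# Character set from practice 1: A-Z a-z 0-9 . -
TITLE_PATTERN = re.compile(r"[A-Za-z0-9.\-]+")
# Simplified detection of "import x" and "from x import y" lines.
IMPORT_PATTERN = re.compile(
    r"^\s*(import\s+\w|from\s+[\w.]+\s+import\s)", re.MULTILINE
)


def title_uses_restricted_charset(file_name: str) -> bool:
    """True if the file name contains only the recommended characters."""
    return TITLE_PATTERN.fullmatch(file_name) is not None


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        requirement = line.split("#", 1)[0].strip()  # drop comments
        if not requirement or requirement.startswith("-"):
            continue  # skip blank lines and pip options such as "-r base.txt"
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


def code_cells(notebook: dict) -> list[dict]:
    """Return the non-empty code cells with their source as one string."""
    cells = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as str or list
            source = "".join(source)
        if cell.get("cell_type") == "code" and source.strip():
            cells.append(
                {"source": source, "execution_count": cell.get("execution_count")}
            )
    return cells


def late_import_cells(notebook: dict) -> list[int]:
    """Positions (among code cells) after the first that contain imports."""
    return [
        index
        for index, cell in enumerate(code_cells(notebook))
        if index > 0 and IMPORT_PATTERN.search(cell["source"])
    ]


def ran_top_to_bottom(notebook: dict) -> bool:
    """True if the code cells carry the execution counts 1, 2, 3, ... in order."""
    counts = [cell["execution_count"] for cell in code_cells(notebook)]
    return counts == list(range(1, len(counts) + 1))


# A tiny in-memory notebook used only to exercise the checks.
example_notebook = {
    "cells": [
        {"cell_type": "code", "source": ["import json\n"], "execution_count": 1},
        {"cell_type": "markdown", "source": ["# Analysis"]},
        {"cell_type": "code", "source": "import re\nprint('x')", "execution_count": 3},
    ]
}

print(title_uses_restricted_charset("analysis-2021.ipynb"))
print(unpinned_requirements("numpy==1.21.0\npandas>=1.3\n"))
print(late_import_cells(example_notebook))
print(ran_top_to_bottom(example_notebook))
```

A real notebook file can be loaded with `json.load` before being passed to these functions. The regular expression for imports is a heuristic and will miss or misreport unusual cases, so the results are prompts for manual review rather than verdicts.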




A recommended tool is Julynter (https://dew-uff.github.io/julynter/), a JupyterLab extension that checks the quality and reproducibility of notebooks in real time and produces recommendations.

*Pimentel, J. F., Murta, L., Braganholo, V., and Freire, J. (2021). Understanding and improving the quality and reproducibility of Jupyter notebooks. Empirical Software Engineering, 26(4), 65. https://doi.org/10.1007/s10664-021-09961-9