Jupyter Notebooks as Other RepInfo

Interactive computational notebooks are web-based interactive computing platforms for creating dynamic documents that combine narrative, data, and code. Within a notebook, data, code and explanation live together, describing and interactively showing the steps taken to obtain the results.

Jupyter Notebooks are currently widely adopted across multiple domains. They can be created in, or imported into, LABDRIVE.

LABDRIVE support for reproducibility by means of Jupyter Notebooks

Computational artifacts (mainly code) and data reside in the same containers, following the model "keep the data and code together".

A full notebook can be executed from the platform, even when its different components are located in different containers.

External users can access the data from external reproducibility services through JupyterHub, which is installed on the platform.

For all three of these options, LABDRIVE provides sufficient per-user isolation from other platform components: interactive notebooks run in isolated containers.

Preservation challenges of Jupyter Notebooks

The Jupyter Notebook software evolves over time. To be sure that what worked in the past can be reproduced in the future, snapshots of the various components, including support libraries, the Jupyter kernels for the required languages, and the appropriate operating system, must be captured and preserved, for example as virtual machines stored as Other RepInfo.
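Part of such a snapshot can be recorded from inside the notebook itself. The sketch below is only an illustration, assuming a Python kernel and using only the standard library: it captures the interpreter version, the operating system string and the installed package versions as JSON. The function names `environment_snapshot` and `write_snapshot` are hypothetical. It documents the environment rather than preserving it: the kernels themselves (for example, what `jupyter kernelspec list` reports) and the operating system still need to be captured separately, for instance in a virtual machine image.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect a JSON-serialisable description of the current Python environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
        "os": platform.platform(),
        "machine": platform.machine(),
        # Sort case-insensitively so snapshots are easy to diff over time.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_snapshot(path):
    """Write the snapshot to a JSON file (e.g. next to the notebook) and return it."""
    snapshot = environment_snapshot()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot
```

Calling `write_snapshot("environment.json")` as the last cell of a notebook stores a record of the environment alongside the notebook, so it can be preserved with it.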

Good practices for the reproducibility of Jupyter Notebooks

Based on an extensive empirical study, Pimentel, Murta, Braganholo and Freire* identified a set of good practices for developing notebooks that support their future reproducibility:

Use short titles with a restricted character set (A-Z a-z 0-9 . -) for notebook file names, and use Markdown headings in the body for more detailed titles.
Pay attention to the bottom of the notebook. Check whether it can benefit from descriptive Markdown cells. Additionally, check whether the bottom cells have been executed. If not, consider either executing or removing them.
Abstract code into functions, classes, and modules, and test them.
Declare the dependencies in requirements files and pin the versions of all packages.
Use a clean environment for testing the dependencies to check if all of them are declared.
Put imports at the beginning of notebooks.
Use relative paths for accessing data in the repository.
Re-run notebooks top to bottom before committing.
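Several of these practices can be checked mechanically. The following is a minimal sketch, not the authors' tooling or Julynter's implementation. It assumes notebooks in the nbformat 4 JSON layout (a top-level "cells" list whose code cells carry "source" and "execution_count"), and the function names `lint_notebook`, `lint_file` and `unpinned_requirements` are hypothetical. The import and absolute-path checks are simple heuristics.

```python
import json
import re
from pathlib import Path

# Restricted character set for notebook file names (A-Z a-z 0-9 . -).
SAFE_NAME = re.compile(r"^[A-Za-z0-9.-]+$")
IMPORT_LINE = re.compile(r"^\s*(import|from)\s+\w")
# Heuristic: a quoted string literal starting like a POSIX or Windows absolute path.
ABSOLUTE_PATH = re.compile(r"""["'](/|[A-Za-z]:\\)""")


def _source(cell):
    # nbformat stores cell source either as one string or as a list of lines.
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def lint_notebook(name, nb):
    """Return warnings for a notebook dict loaded from .ipynb JSON."""
    warnings = []
    if not SAFE_NAME.match(name):
        warnings.append(f"file name {name!r} uses characters outside A-Z a-z 0-9 . -")

    code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]

    # Re-run top to bottom: a fresh run numbers code cells 1, 2, 3, ... in order.
    counts = [c.get("execution_count") for c in code_cells]
    if any(n is None for n in counts):
        warnings.append("some code cells have not been executed")
    elif counts != list(range(1, len(counts) + 1)):
        warnings.append("cells were not executed top to bottom in a clean kernel")

    seen_code = False
    for i, cell in enumerate(code_cells):
        for line in _source(cell).splitlines():
            if not line.strip() or line.lstrip().startswith("#"):
                continue  # blank lines and comments do not count as code
            if IMPORT_LINE.match(line):
                if seen_code:  # imports should come before any other code
                    warnings.append(f"import after other code in code cell {i + 1}")
            else:
                seen_code = True
        # Relative paths keep data access working when the repository moves.
        if ABSOLUTE_PATH.search(_source(cell)):
            warnings.append(f"possible absolute path in code cell {i + 1}")
    return warnings


def lint_file(path):
    """Load an .ipynb file from disk and lint it, including its file name."""
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        return lint_notebook(path.name, json.load(f))


def unpinned_requirements(text):
    """Return requirements-file entries that do not pin an exact version with '=='."""
    unpinned = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments
        if entry and "==" not in entry:
            unpinned.append(entry)
    return unpinned
```

Checking that execution counts run 1, 2, 3, ... is only a cheap proxy for "re-run top to bottom". It shows that the saved outputs came from a single clean pass, not that the notebook will still run in a new environment. That second question is what testing in a clean environment with pinned requirements addresses.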

A recommended tool is Julynter (https://dew-uff.github.io/julynter/), a JupyterLab extension that checks the quality and reproducibility of notebooks in real time and produces recommendations.

*Pimentel, J. F., Murta, L., Braganholo, V., and Freire, J. (2021). Understanding and improving the quality and reproducibility of Jupyter notebooks. Empirical Software Engineering, 26(4), 65. https://doi.org/10.1007/s10664-021-09961-9.
