
Jupyter Notebooks

Last updated 2 years ago

LIBSAFE Advanced integrates with Jupyter Notebooks. A Jupyter Notebook is a document made up of an ordered list of input/output cells that can contain code (usually Python, though other languages are supported), text (using Markdown), mathematics, plots, and rich media. Cells can be executed step by step or all at once, in an easy-to-use computational environment integrated into LIBSAFE Advanced.

The source code used to create, read, and analyze scientific and research data is often written by researchers as Jupyter Notebooks, and it must be preserved along with the datasets: it is frequently the best available Provenance and Structure metadata for the dataset.

LIBSAFE Advanced lets users keep the Jupyter Notebooks containing the code that reads and "understands" their data as part of the dataset they are creating.

Before using the Jupyter Notebooks feature, make sure your user has an active API key and S3 credentials already generated. Otherwise, a 403 Forbidden error will be shown when you try to access a notebook.

Create a new digital notebook

In the Explore Content tab of a Data container, right-click an empty space in the files area and select New, then Dynamic Notebook, to create a new notebook.

Upload an existing Jupyter Notebook

You can upload an existing Jupyter Notebook like any other file: use a file transfer protocol, or simply drag and drop the file into the LIBSAFE Advanced Data Container.

Open an existing Jupyter Notebook

To open a Jupyter Notebook, double-click the icon of the notebook you would like to open.

How to use them

For example, suppose you would like to create a Function that hashes your files with a new algorithm.
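As a standalone illustration (independent of LIBSAFE Advanced), Python's built-in `hashlib` already ships algorithms beyond MD5 and the SHA family, such as BLAKE2b; the algorithm choice here is just an example:

```python
import hashlib

# BLAKE2b has been in the standard library since Python 3.6;
# its default digest size is 64 bytes (128 hex characters)
data = b"example file contents"
digest = hashlib.blake2b(data).hexdigest()
print(digest)
```

The same `update()`/`hexdigest()` interface used below for MD5/SHA works unchanged for any `hashlib` algorithm.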

First, initialize your Function by loading the LIBNOVA LIBSAFE Advanced libraries:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

If your code is called as a LIBSAFE Advanced Function, you will receive these parameters from LIBSAFE Advanced on every invocation; if you plan to run it inside a Jupyter Notebook instead, you must initialize them yourself:

json_sample = {
    "api": {
        "url": "http://go.libnova.com",
        "key_user": "1234567890abcdefghijklmnopqrstuvwxyz",
        "key_root": "1234567890abcdefghijklmnopqrstuvwxyz"
    },
    "function_data": {
        "container": {
            "id": "1"
        },
        "user": {
            "id": "1"
        },
        "files": {
            "ids":   [ ],
            "paths": [ ]
        },
        "job": {
            "id": "299"
        },
        "trigger": {
            "id": "0",
            "type": "",
            "regex": ""
        },
        "function": {
            "id": "0",
            "key": ""
        }
    },
    "function_params": {
        "your_custom_parameter": "custom_parameter_value"
    }
}

# Initialize the Request parser
#
# This will automatically parse the data sent by the platform to this function, like the File ID,
# the Job ID, or the User ID who triggered this function.
#
# It will also initialize the API Driver using the user API Key
request_helper = com.Nuclio.Request.Request(
    None,
    type('',(object,),{"body": json.dumps(json_sample)})()
)
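The `type('', (object,), {...})()` call above is a compact way to build a throwaway object with a `body` attribute, mimicking the request object the platform would normally pass in. A minimal standalone illustration of the same trick (the `payload` and `greeting` names are just for this example):

```python
import json

payload = {"greeting": "hello"}

# type(name, bases, dict) creates a class on the fly;
# calling the class immediately yields an instance with a .body attribute
fake_request = type('', (object,), {"body": json.dumps(payload)})()

parsed = json.loads(fake_request.body)
print(parsed["greeting"])
```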

Every Function executes in the context of an (Execution) Job, which is very useful for logging execution progress. Initialize it with:

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

And you can log to it using:

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

And then, you would usually have your payload. In this example:

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        request_helper.log("File " + str(request_file.id) + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # By reading the stream in fixed-size blocks we can update multiple hash algorithms in a single pass
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())

And finally, we must let LIBSAFE Advanced know that our function has finished, with the result status:

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)

The full code sample:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

json_sample = {
    "api": {
        "url": "http://go.libnova.com",
        "key_user": "1234567890abcdefghijklmnopqrstuvwxyz",
        "key_root": "1234567890abcdefghijklmnopqrstuvwxyz"
    },
    "function_data": {
        "container": {
            "id": "1"
        },
        "user": {
            "id": "1"
        },
        "files": {
            "ids":   [ ],
            "paths": [ ]
        },
        "job": {
            "id": "299"
        },
        "trigger": {
            "id": "0",
            "type": "",
            "regex": ""
        },
        "function": {
            "id": "0",
            "key": ""
        }
    },
    "function_params": {
        "your_custom_parameter": "custom_parameter_value"
    }
}

# Initialize the Request parser
#
# This will automatically parse the data sent by the platform to this function, like the File ID,
# the Job ID, or the User ID who triggered this function.
#
# It will also initialize the API Driver using the user API Key
request_helper = com.Nuclio.Request.Request(
    None,
    type('',(object,),{"body": json.dumps(json_sample)})()
)

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        request_helper.log("File " + str(request_file.id) + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # By reading the stream in fixed-size blocks we can update multiple hash algorithms in a single pass
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)
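The chunked-read pattern used in the sample above can be exercised on its own with an in-memory stream. This sketch uses only `hashlib` and `io` (no LIBSAFE dependencies) and computes three digests in a single pass:

```python
import hashlib
import io

data = b"x" * (20 * 1024 * 1024)  # 20 MiB of sample data
stream = io.BytesIO(data)

md5 = hashlib.md5()
sha1 = hashlib.sha1()
sha256 = hashlib.sha256()

# Read in 8 MiB blocks so memory use stays bounded regardless of file size,
# updating all three hash objects from the same buffer
chunk = stream.read(8 * 1024 * 1024)
while chunk:
    md5.update(chunk)
    sha1.update(chunk)
    sha256.update(chunk)
    chunk = stream.read(8 * 1024 * 1024)

print(sha256.hexdigest())
```

Because every algorithm consumes the same buffer, the file is read from storage only once no matter how many digests you compute.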

You can use your Jupyter Notebooks just as you would on any other platform but, if you plan to work with the data you have in a LIBSAFE Advanced container, we have created a Python library that simplifies many actions and makes your programming easier.

The JobMessage.JobMessageType defines the type of message. A list of the available types is included with the Python library.
