Jupyter Notebooks


Last updated 2 years ago


LIBSAFE Go is integrated with Jupyter Notebooks. A Jupyter Notebook is a document made of an ordered list of input/output cells that can hold code (usually Python, although other languages can be used), text (written in Markdown), mathematics, plots and rich media. Cells can be executed step by step or all at once, in an easy-to-use computational environment that is integrated with LIBSAFE Go.

Researchers often write the source code that creates, reads and analyzes their scientific and research data as Jupyter Notebooks, so this code must be preserved along with the datasets. It is usually the best existing Provenance and Structure metadata for the dataset.

LIBSAFE Go allows users to keep the Jupyter Notebooks in which they have the code that reads and "understands" their data as part of the dataset they are creating.

Before using the Jupyter notebooks feature, make sure that your user has an active API key and S3 credentials already generated. If not, a 403 Forbidden error will be shown while trying to access a notebook.

Create a new digital notebook

In the Explore Content tab of a Data Container, right-click an empty space in the files area. Select New and then Dynamic Notebook to create a new notebook.

Upload an existing Jupyter Notebook

You can upload an existing Jupyter Notebook like any other file, either with a file transfer protocol or by dragging and dropping the file into the LIBSAFE Go Data Container.

Open an existing Jupyter Notebook

To open a Jupyter Notebook, double-click the icon of the notebook you would like to open.

How to use them

For example, let's say you would like to create a function that hashes your files with a new algorithm you would like to use.

First, you should initialize your function, loading the LIBNOVA LIBSAFE Go libraries:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

When your code runs as a LIBSAFE Go Function, it receives some parameters from LIBSAFE Go every time it is called. If you plan to run it inside a Jupyter Notebook instead, you must initialize those parameters yourself:

json_sample = {
    "api": {
        "url": "http://go.libnova.com",
        "key_user": "1234567890abcdefghijklmnopqrstuvwxyz",
        "key_root": "1234567890abcdefghijklmnopqrstuvwxyz"
    },
    "function_data": {
        "container": {
            "id": "1"
        },
        "user": {
            "id": "1"
        },
        "files": {
            "ids":   [ ],
            "paths": [ ]
        },
        "job": {
            "id": "299"
        },
        "trigger": {
            "id": "0",
            "type": "",
            "regex": ""
        },
        "function": {
            "id": "0",
            "key": ""
        }
    },
    "function_params": {
        "your_custom_parameter": "custom_parameter_value"
    }
}

# Initialize the Request parser
#
# This will automatically parse the data sent by the platform to this function, like the File ID,
# the Job ID, or the User ID who triggered this function.
#
# It will also initialize the API Driver using the user API Key
request_helper = com.Nuclio.Request.Request(
    None,
    type('',(object,),{"body": json.dumps(json_sample)})()
)
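The `type('', (object,), {...})()` expression above builds a throwaway object whose only attribute is `body`, standing in for the event the platform would normally send. As a minimal sketch, assuming the Request parser only needs an object with a `body` attribute (an assumption, not confirmed by the library documentation), the same object can be built more readably with `types.SimpleNamespace` from the standard library. The `payload` dictionary here is a trimmed, illustrative stand-in for `json_sample`:

```python
import json
from types import SimpleNamespace

# Trimmed, illustrative stand-in for the json_sample dictionary.
payload = {"function_data": {"job": {"id": "299"}, "files": {"ids": [], "paths": []}}}

# Original idiom: create an anonymous class with a `body` attribute and instantiate it.
event_anonymous = type("", (object,), {"body": json.dumps(payload)})()

# More readable equivalent: a namespace object with the same single attribute.
event_namespace = SimpleNamespace(body=json.dumps(payload))

# Both objects expose the same JSON string through `.body`.
print(event_anonymous.body == event_namespace.body)
print(json.loads(event_namespace.body)["function_data"]["job"]["id"])
```

Either object could then be passed as the second argument when constructing the Request, in place of the inline `type(...)` call.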

Every function runs in the context of an (Execution) Job, which is useful for logging the execution progress. Initialize it with:

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

And you can log to it using:

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

Next comes your payload, the work the function actually performs. In this example:

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        # str() keeps the concatenation valid whether the ID is a string or an integer
        request_helper.log("File " + str(request_file.id) + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # Read the stream in 8 MiB blocks so that all algorithms are updated in a single pass over the file
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())
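The chunked-read pattern in the payload can be tried without access to LIBSAFE Go. The following is a minimal sketch, assuming the S3 stream behaves like a standard binary file object whose `read()` returns empty bytes at end of file. `hash_stream` is a hypothetical helper (not part of the LIBNOVA library), and `io.BytesIO` stands in for the S3 stream. Using `hashlib.new` also makes it easy to plug in a different algorithm, such as `sha3_256`:

```python
import hashlib
import io


def hash_stream(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=8 * 1024 * 1024):
    """Hypothetical helper: compute several digests in one pass over a binary stream."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# io.BytesIO stands in for the stream returned by the platform.
data = b"LIBSAFE Go sample content" * 1000
digests = hash_stream(
    io.BytesIO(data),
    algorithms=("md5", "sha256", "sha3_256"),
    chunk_size=4096,  # small chunk so the loop runs several times
)
for name, digest in digests.items():
    print(name, digest)
```

Reading in fixed-size blocks keeps memory use bounded regardless of file size, and updating every hasher per block avoids reading the file once per algorithm.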

And finally, we must let LIBSAFE Go know that our function has finished, with the result status:

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)
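Because `job_end` takes a success flag, one way to avoid leaving a Job in the "RUNNING" status when the payload raises an exception is to wrap the payload in `try`/`except`. This is a sketch under stated assumptions: `FakeRequestHelper` is a hypothetical stand-in for `request_helper` that only records calls, so the snippet runs without the LIBNOVA library, and `run_with_job` is an illustrative helper, not a library function.

```python
class FakeRequestHelper:
    """Hypothetical stand-in that records calls instead of contacting LIBSAFE Go."""

    def __init__(self):
        self.calls = []

    def job_init(self):
        self.calls.append("job_init")

    def job_end(self, success):
        self.calls.append(("job_end", success))


def run_with_job(helper, payload):
    """Run payload() between job_init() and job_end(), reporting failure on error."""
    helper.job_init()
    try:
        payload()
    except Exception:
        # Mark the Job as failed, then let the error surface in the notebook.
        helper.job_end(False)
        raise
    helper.job_end(True)


# A payload that succeeds.
ok_helper = FakeRequestHelper()
run_with_job(ok_helper, lambda: None)

# A payload that fails: the Job is still closed, with a failure status.
failing_helper = FakeRequestHelper()
try:
    run_with_job(failing_helper, lambda: 1 / 0)
except ZeroDivisionError:
    pass

print(ok_helper.calls)
print(failing_helper.calls)
```

With the real `request_helper`, the same wrapper shape would call `job_end(False)` on any exception and `job_end(True)` otherwise.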

The full code sample:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

json_sample = {
    "api": {
        "url": "http://go.libnova.com",
        "key_user": "1234567890abcdefghijklmnopqrstuvwxyz",
        "key_root": "1234567890abcdefghijklmnopqrstuvwxyz"
    },
    "function_data": {
        "container": {
            "id": "1"
        },
        "user": {
            "id": "1"
        },
        "files": {
            "ids":   [ ],
            "paths": [ ]
        },
        "job": {
            "id": "299"
        },
        "trigger": {
            "id": "0",
            "type": "",
            "regex": ""
        },
        "function": {
            "id": "0",
            "key": ""
        }
    },
    "function_params": {
        "your_custom_parameter": "custom_parameter_value"
    }
}

# Initialize the Request parser
#
# This will automatically parse the data sent by the platform to this function, like the File ID,
# the Job ID, or the User ID who triggered this function.
#
# It will also initialize the API Driver using the user API Key
request_helper = com.Nuclio.Request.Request(
    None,
    type('',(object,),{"body": json.dumps(json_sample)})()
)

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        # str() keeps the concatenation valid whether the ID is a string or an integer
        request_helper.log("File " + str(request_file.id) + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # Read the stream in 8 MiB blocks so that all algorithms are updated in a single pass over the file
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)

You can use your Jupyter Notebooks in the same way you would use them on any other platform but, if you plan to work with the data you have in a LIBSAFE Go container, we have created a Python library that simplifies many actions and makes your programming easier.

The JobMessage.JobMessageType defines the type of message. You can see a list of the available types here.