Functions

LIBSAFE Advanced Functions let you run code in the platform in response to certain events or triggers, without needing any client-side script. They are similar to AWS Lambda functions.

Functions are useful when you want the platform to behave in a specific way in response to external events, or when you want to add your own code and execute it on demand.

With LIBSAFE Advanced Functions, you just upload your code and add the triggers that should execute it.

Users can define data container-level functions (LIBSAFE Advanced Functions) that are executed on certain events:

  • CRUD Functions for files and metadata: When files are created, read, updated or deleted.

    E.g.: Every time you upload one of your photographs:

    • Extract the date from the file name to the metadata catalogue, so it is searchable using the web interface or the API,

    • Calculate each file's integrity, and

    • Tag images that contain faces.

  • Periodic functions: Run every minute, hour, or day.

    E.g. Webhooks to other systems

  • On-demand functions: Launched when the user selects files using the GUI, or via the API.

    E.g.: Apply a bulk rename to the selected items.

This guide focuses on the on-demand functions (the ones users trigger manually using the Management Interface or the API). See Create functions to learn how to create your own functions.

A Python library is available to make using the API easier and more convenient: see the library and its documentation. This library can be used in Functions, in Jupyter Notebooks, and in your own scripts.

Real-world example of a LIBSAFE Advanced Function

Let's say you have the following use case: you would like to upload your BagIt packages (bags), then run an integrity verification to maintain the chain of custody (and detect upload errors), and finally assign metadata to them.

There are multiple ways to achieve this in LIBSAFE Advanced, but this use case is a perfect example of combining Workflows and Functions. LIBSAFE Advanced Functions can be triggered by changes in a container's workflow, so it would be relatively easy to implement a six-step workflow:

Upload content: You upload content to the container. When your upload has finished, you make a single API call (or use the Management Interface) to advance the container to the next status of the workflow. To do that, you can use the following API call:

  --url "$your_platform_url/api/container/{container_id}/step/next" \
  --header "Authorization: Bearer $your_platform_api_key" \
  --data ''

You upload your metadata as a file, as part of your package, or as an Excel spreadsheet, for instance.

Waiting for ingestion to finish: The platform itself triggers a function that waits for all files to be ingested. When ingestion is complete, the function moves the container to the next status.

To know whether a container is still ingesting content you have uploaded, check the container's files_pending_ingestion property. If it is true, LIBSAFE Advanced is still processing content.
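
As a client-side sketch, you could poll this property with Python. The container-details endpoint (GET /api/container/{container_id}) and the response shape are assumptions here; verify both against the LIBSAFE Advanced API documentation:

import time

import requests

PLATFORM_URL = "https://your-platform-url"  # your LIBSAFE Advanced URL
API_KEY = "your_platform_api_key"           # your API key
CONTAINER_ID = 171                          # illustrative container id

headers = {"Authorization": "Bearer " + API_KEY}

# Poll the container until the platform reports that ingestion is done
while True:
    response = requests.get(
        PLATFORM_URL + "/api/container/" + str(CONTAINER_ID),
        headers=headers,
        timeout=30,
    )
    response.raise_for_status()
    if not response.json().get("files_pending_ingestion"):
        break  # ingestion has finished
    time.sleep(60)  # still processing; check again in a minute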

Integrity verification: The platform itself triggers a function that launches a bagit verification process. If the verification succeeds (your bags are intact), it moves the container to the “Assign metadata” step; if not, to “Validation errors detected”.

Validation errors detected: A function sends you an email telling you that the bags failed validation.

Assign metadata: The platform itself triggers a function that assigns all your metadata to your bags or, alternatively, waits for you to update it yourself.

Archive data: The platform itself triggers a function that launches the process to move all data to cold storage.

You could achieve the same results with client-side API calls, managing the process in your own code, but Functions deliver a more integrated approach and allow other users to simply upload data without any scripting. As an additional benefit, you get integrated logging for the whole process, and your code performs better because it executes server-side.

With this approach, you only upload the content and make a single API call at the end of the upload.

Launch functions using the Management Interface

  1. Locate the data container you want to work with using the Containers menu section or by searching. This guide assumes metadata is properly configured for the data container. See Configuration\Metadata for more details, or Working with data containers to learn how to create them.

  2. Select Check-in if you are not already checked in to the container and check-in/out is enabled for the data container.

In the data container page, choose Explore content:

  1. Select the file you want to run the function on, then choose the function you want to launch in the sidebar:

It is possible to select multiple items by using your mouse to drag a selection box around the files or folders you want. You can also use Ctrl and Shift, or the Select all, Select none, and Invert selection buttons in the file browser top bar.

You can also filter the files in the current folder by file name, for instance to select all XML or JPG files:

and then launch your function from the sidebar.

Some functions execute immediately, while others may take several hours. To track a function's progress, you can use the link provided in the confirmation window that LIBSAFE Advanced shows when you launch the function:

You can also go to the Container and select Functions to see the functions in execution and their outcomes:

Some functions will process and change your content, while others may create new files in the container. When a function produces a new report or file, it is shown as an Asset when you open the function execution details page:

Launch functions using the API

API examples here are just illustrative. Check the LIBSAFE Advanced API documentation for additional information and all available methods.

  1. Sign in to the LIBSAFE Advanced Management Interface

  2. Obtain your API key by selecting your name and then Access Methods.

Launch the function

  1. To execute a Function, you need to know the Function ID and the container or file IDs you want to apply it to. You can get all Functions loaded in the LIBSAFE Advanced platform using the following method:

$ curl --request GET \
      --url "$your_platform_url/api/functions" \
      --header "authorization: Bearer $your_platform_api_key"

Some Functions receive parameters. To call them, use the following method:

$ curl --request POST \
      --url "$your_platform_url/api/container/{your container id}/file/0/function/{your function id}" \
      --header "authorization: Bearer $your_platform_api_key" \
      --data '{your parameters here}'

For instance, this Function requests the file IDs and the path as parameters:

$ curl --request POST \
      --url "$your_platform_url/api/container/171/file/0/event/32" \
      --header "authorization: Bearer $your_platform_api_key" \
      --data '{"extra":{"filename":"","ids":["985248"],"path":"/"}}'

For small Functions that complete immediately, LIBSAFE Advanced answers with a success/error code, but for more complex Functions it creates a job to execute them, so you can track the execution progress.

When the Function is launched, LIBSAFE Advanced will provide the job id in the response:
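
For illustration only (the exact field names may differ in your deployment), a response could look like {"job_id": 1234}. A minimal Python sketch that launches the Function from the previous example and captures the job id:

import requests

PLATFORM_URL = "https://your-platform-url"
API_KEY = "your_platform_api_key"

headers = {"Authorization": "Bearer " + API_KEY}
payload = {"extra": {"filename": "", "ids": ["985248"], "path": "/"}}

# Launch the Function (same endpoint as the curl example above)
response = requests.post(
    PLATFORM_URL + "/api/container/171/file/0/event/32",
    headers=headers,
    json=payload,
    timeout=30,
)
response.raise_for_status()

job_id = response.json().get("job_id")  # assumed field name
print("Function launched as job", job_id)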

Then, you can do three things:

Monitor its execution

Some Functions may take hours to complete. You can use the /job API endpoint with the job_id returned by the previous method to monitor progress:

$ curl --request GET \
      --url "$your_platform_url/api/job/{your job id}" \
      --header "authorization: Bearer $your_platform_api_key" \
      --data '{}'
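
A simple polling loop in Python; the "status" field name is an assumption, while the COMPLETED/FAILED values match the job statuses mentioned later in this guide:

import time

import requests

PLATFORM_URL = "https://your-platform-url"
API_KEY = "your_platform_api_key"
JOB_ID = 1234  # the job id returned when the Function was launched

headers = {"Authorization": "Bearer " + API_KEY}

# Poll the job until it reaches a terminal state
while True:
    job = requests.get(
        PLATFORM_URL + "/api/job/" + str(JOB_ID), headers=headers, timeout=30
    ).json()
    if job.get("status") in ("COMPLETED", "FAILED"):  # assumed terminal states
        print("Job finished with status:", job.get("status"))
        break
    time.sleep(30)  # still running; check again in 30 seconds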

See Function output log

For each job, you can get its log using the /job/{job id}/messages method:

$ curl --request GET \
        --url "$your_platform_url/api/job/{job id}/messages" \
        --header "authorization: Bearer $your_platform_api_key" \
        --data '{}'

Review created assets/files

Finally, some Functions create new assets (files). For example, your Function could produce a report as a PDF file, or a Function that compresses data will create a ZIP file.

You can list the assets a job has created with the /job/{job id}/assets method:

$ curl --request GET \
        --url "$your_platform_url/api/job/{job id}/assets" \
        --header "authorization: Bearer $your_platform_api_key" \
        --data '{}'

You can then download the asset like any other file using its file_id, via the API or any other available download method (remember to include the -L flag so curl follows redirects):

$ curl --request GET \
       --url "$your_platform_url/api/file/{your file id}/download" \
       --header "Content-Type: application/json" \
       --header "authorization: Bearer $your_platform_api_key" \
       --data '{}' -L --output my_execution_report.html
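
The equivalent download in Python. requests follows redirects by default, which is what curl's -L flag does:

import requests

PLATFORM_URL = "https://your-platform-url"
API_KEY = "your_platform_api_key"
FILE_ID = 985248  # illustrative file id taken from the asset listing

headers = {"Authorization": "Bearer " + API_KEY}

# Stream the download to disk; redirects are followed automatically
with requests.get(
    PLATFORM_URL + "/api/file/" + str(FILE_ID) + "/download",
    headers=headers,
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    with open("my_execution_report.html", "wb") as output:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            output.write(chunk)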

Create a LIBSAFE Advanced Function

We have created a Python library that simplifies many actions and makes your programming easier when creating a function: see the library and its documentation. It can also be used in Jupyter Notebooks and in your own scripts.

For example, let's say you would like to create a function that hashes your files with a new algorithm.

First, initialize your function by loading the LIBNOVA LIBSAFE Advanced libraries:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

If your code is called from a LIBSAFE Advanced Function, you receive parameters from LIBSAFE Advanced every time the function is called. The request helper that parses them is initialized in the following way:

request_helper = com.Nuclio.Request.Request(context, event)

Depending on the function type, the structure you receive can change, but it usually contains the following:

{
    "api": {
        "url": "http://go.libnova.com",
        "key_user": "1234567890abcdefghijklmnopqrstuvwxyz",
        "key_root": "1234567890abcdefghijklmnopqrstuvwxyz"
    },
    "function_data": {
        "container": {
            "id": "1"
        },
        "user": {
            "id": "1"
        },
        "files": {
            "ids":   [ ],
            "paths": [ ]
        },
        "job": {
            "id": "299"
        },
        "trigger": {
            "id": "0",
            "type": "",
            "regex": ""
        },
        "function": {
            "id": "0",
            "key": ""
        }
    },
    "function_params": {
        "your_custom_parameter": "custom_parameter_value"
    }
}

Every function executes in relation to an (Execution) Job, which is really useful for logging execution progress. You should initialize it with:

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

And you can log to it using:

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

The JobMessage.JobMessageType defines the type of message. You can see a list of the available types in the method documentation.

Then you would usually have your payload. In this example:

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        request_helper.log("File " + request_file.id + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # By reading the stream in buffered blocks we can compute several hash algorithms in a single pass
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())

And finally, we must let LIBSAFE Advanced know that our function has finished, with the result status:

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)

The full code sample:

#!/usr/bin/env python
# coding: utf-8

import json
import hashlib

from libnova                           import com, Util
from libnova.com                       import Nuclio
from libnova.com.Nuclio                import Request
from libnova.com.Api                   import Driver, Container, File, Job, JobMessage
from libnova.com.Filesystem            import S3
from libnova.com.Filesystem.S3         import File as S3File, Storage

# Initialize the Request parser
#
# This will automatically parse the data sent by the platform to this function, like the File ID,
# the Job ID, or the User ID who triggered this function.
#
# It will also initialize the API Driver using the user API Key
context.logger.info(event.body.decode("utf-8"))
request_helper = com.Nuclio.Request.Request(context, event) 

# This will set the current function Job to the status "RUNNING"
request_helper.job_init()

# This will write a new Job Message related with the current function Job
request_helper.log("Sample message", JobMessage.JobMessageType.INFO)

# This will iterate over all the files related with this function execution
for request_file in request_helper.Files:
    # This will retrieve the current function File metadata
    file_metadata = File.get_metadata(request_file.id, True)
    if file_metadata is not None:
        # We log the metadata
        request_helper.log(Util.format_json_item(file_metadata), JobMessage.JobMessageType.INFO)
    else:
        request_helper.log("File " + request_file.id + " has no metadata", JobMessage.JobMessageType.INFO)

    # This will retrieve a seekable S3 file stream that can be used like a native file stream reader
    file_stream = S3.File.get_stream(
        # The storage is needed to set the source bucket of the file
        request_helper.Storage,
        request_file
    )
    if file_stream is not None:
        file_hash_md5 = hashlib.md5()
        file_hash_sha1 = hashlib.sha1()
        file_hash_sha256 = hashlib.sha256()

        # By reading the stream in buffered blocks we can compute several hash algorithms in a single pass
        file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)
        while file_data_stream_buffer:
            file_hash_md5.update(file_data_stream_buffer)
            file_hash_sha1.update(file_data_stream_buffer)
            file_hash_sha256.update(file_data_stream_buffer)

            file_data_stream_buffer = file_stream.read(8 * 1024 * 1024)

        # We log some messages related to the result of the function
        request_helper.log("File hash calculated: MD5    - " + file_hash_md5.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA1   - " + file_hash_sha1.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)
        request_helper.log("File hash calculated: SHA256 - " + file_hash_sha256.hexdigest(),
                           JobMessage.JobMessageType.INFO, request_file.id)

        # We can also store the calculated hashes in the database
        File.set_hash(request_file.id, "md5", file_hash_md5.hexdigest())
        File.set_hash(request_file.id, "sha1", file_hash_sha1.hexdigest())
        File.set_hash(request_file.id, "sha256", file_hash_sha256.hexdigest())

# This will finalize the current function Job
# The parameter is a boolean that determines if the function Job was successful or not
#
# If the parameter is True,  the result will be "COMPLETED",
# else,
# If the parameter is False, the result will be "FAILED"
request_helper.job_end(True)
