
Scripting

You can write your own code to interact with the platform. Here are some examples to help you get started.

Shell scripting

By combining a few shell commands, you can automate many processes and work with the platform in batch.
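All of the examples below assume that your platform URL and API key are available as shell variables. In fish, you could define them like this (both values are placeholders you should replace with your own):

set your_platform_url "https://your-instance.example.com"
set your_platform_api_key "your-api-key"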

Use case: Launching a function over every item in a container

Let's say you would like to launch a function over every item in a certain container. Doing this through the interface would be tedious, so let's use the fish shell instead.

First, let's list the items in the container. You could simply list them using something like:

curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            }
        ]
    }'

But this has three problems: 1) it only retrieves the first 200 elements (out of 2,335 in this case), 2) it does not give you the identifiers you need to launch your functions from the shell, and 3) it returns both files and folders, while you only want files.

To accomplish this task, you can combine a for loop, to iterate over the API and fetch each page of results, with jq, to parse the JSON output into a list of identifiers.

First, make a query that retrieves a single element, so the response includes the total number of files:

curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 1,
        "offset": 0
    }'

You can then use jq to obtain the total number of elements in the query (which you could keep in a variable for the next step if you wish):

curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 1,
        "offset": 0
    }' -s | jq --raw-output '.total'
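
If you prefer to keep that number in a variable, here is a minimal fish sketch (the variable name total is our own choice):

set total (curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 1,
        "offset": 0
    }' -s | jq --raw-output '.total')
# $total can now drive the pagination below, e.g. (seq 0 100 (math $total - 1))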

Next, you can iterate over the pages, using the limit and offset values to retrieve the full result set, and using jq to extract the ids:

for i in (seq 0 100 2334); curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 100,
        "offset": '$i'
    }' -s | jq --raw-output '.result[].id'; end

The resulting list of ids could be used to launch the function directly (using xargs, for instance), but we are going to save it to a text file instead:

for i in (seq 0 100 2334); curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 15
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 100,
        "offset": '$i'
    }' -s | jq --raw-output '.result[].id' >> my_file_ids.txt; end

And then, we'll use the list of ids to launch a function for each of them:

while read id; curl --request POST \
     --url "$your_platform_url/api/container/15/function/v2/1" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" -s \
     --data '{
        "files": {
            "ids": [
                '$id'
            ]
        },
        "params": {
            "target_container": "17",
            "target_path": "/v1"
        }
    }'; end < my_file_ids.txt

Use case: Copy content from one container to another one

Let's say you would like to copy content from one container to another, but renaming your files in the process.

First of all, let's obtain a full list of the files in the container, using the same principle showcased in the previous use case:

for i in (seq 0 100 3745); curl --request GET --url "$your_platform_url/api/file" \
     --header "Content-Type: application/json" \
     --header "authorization: Bearer $your_platform_api_key" \
     --data '{
        "conditions": [
            {
                "container_id": 18
            },
            {
                "type": "FILE"
            }
        ],
        "limit": 100,
        "offset": '$i'
    }' -s | jq --raw-output '.result[].id' >> container18.txt; end

(remember to update the seq command with the total number of items in your container)

Quickly check that your generated file contains the elements you expect. For instance, using wc:
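
wc -l container18.txt

The line count should match the number of files in the container (3,746 in this example).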

Let's now build the file that will contain the commands to execute. A while loop may work well in this case:

while read id;
echo 'aws s3 cp s3://YourS3Bucket/18'$id' s3://YourS3Bucket/19'(string replace 'PROCESSED__' '' $id) --metadata-directive REPLACE;
end < container18.txt > container18to19.txt

(note the string replace function, which is used to rename each file on its way to the destination. This is a fish shell built-in, but you can find alternatives for Bash, for instance)
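
A minimal Bash equivalent would use parameter expansion instead of string replace (same bucket and file names as above):

while read -r id; do
  echo "aws s3 cp s3://YourS3Bucket/18${id} s3://YourS3Bucket/19${id/PROCESSED__/} --metadata-directive REPLACE"
done < container18.txt > container18to19.txt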

And finally, you can launch your processes in parallel for maximum performance using, for instance:

parallel -a container18to19.txt --jobs 20 --bar {}

This will copy 20 files at a time in parallel from one container to the other.
