Scripting

Your own code can interact with the platform. This page collects some code examples to get you started.

Shell scripting

Combining a few shell-based commands, you can automate many processes and work with the platform in batch.

Use case: Launching a function over every item in a container

Let's say you would like to launch a function over every item in a certain container. Doing this by hand in the interface is tedious, so let's do it using the fish shell instead.

First, list the items in the container. A simple request would be something like:

curl --request GET  --url "$your_platform_url/api/file" \
         --header "Content-Type: application/json" \
         --header "authorization: Bearer $your_platform_api_key" \
         --data '{
            "conditions": [
                {
                    "container_id": 15
                }
                          ]
        }'

However, this request 1) only retrieves the first 200 elements (out of 2335 in this case), 2) does not give you the bare identifiers you need to launch your functions from the shell, and 3) returns both files and folders, while you only want the files.

To accomplish this, combine a for loop, which pages through the API, with jq, which parses the JSON output into a list of identifiers.

First, make a query restricted to files, with a limit of 1, since at this point you only need the number of files:

curl --request GET  --url "$your_platform_url/api/file" \
         --header "Content-Type: application/json" \
         --header "authorization: Bearer $your_platform_api_key" \
         --data '{
            "conditions": [
                {
                    "container_id": 15
                },
                {
                    "type": "FILE"
                }
                          ],
            "limit": 1,
            "offset": 0
        }'

You can then use jq to obtain the number of elements in the query (that you could keep in a variable for the next step if you wish):

curl --request GET  --url "$your_platform_url/api/file" \
         --header "Content-Type: application/json" \
         --header "authorization: Bearer $your_platform_api_key" \
         --data '{
            "conditions": [
                {
                    "container_id": 15
                },
                {
                    "type": "FILE"
                }
                          ],
            "limit": 1,
            "offset": 0
        }' -s | jq --raw-output '.total'
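
The total obtained this way determines which offsets a paging loop has to visit. As a minimal sketch in POSIX-style shell, the offsets can be derived from the total instead of being typed by hand. Note that page_offsets is a hypothetical helper name chosen for illustration, not part of the platform:

```shell
# Hypothetical helper: print the offsets needed to page through `total`
# items, `limit` items at a time (0, limit, 2*limit, ...).
page_offsets() {
    local total=$1 limit=$2
    # Nothing to fetch when the query matched no files.
    [ "$total" -gt 0 ] || return 0
    # The last valid offset is at most total - 1.
    seq 0 "$limit" "$((total - 1))"
}

# Example: offsets for 2335 files in pages of 100.
page_offsets 2335 100
```

With 2335 files and a page size of 100, this prints 24 offsets, from 0 to 2300.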

Next, iterate over the results in pages of 100, using the limit and offset values to cover the full list (the upper bound of seq is the total minus one), and use jq to extract the ids:

for i in (seq 0 100 2334); curl --request GET  --url "$your_platform_url/api/file" \
               --header "Content-Type: application/json" \
               --header "authorization: Bearer $your_platform_api_key" \
               --data '{
                  "conditions": [
                      {
                          "container_id": 15
                      },
                      {
                          "type": "FILE"
                      }
                                ],
                  "limit": 100,
                  "offset": '$i'
              }' -s | jq --raw-output '.result[].id'; end
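
For readers who use Bash rather than fish, the same paging loop might look like the sketch below. build_query and list_file_ids are hypothetical helper names chosen for illustration. The endpoint, headers and JSON fields are the ones used in the examples above, and $your_platform_url and $your_platform_api_key are assumed to be set in the environment.

```shell
# Hypothetical helper: build the JSON query for one page of files.
build_query() {
    local container_id=$1 limit=$2 offset=$3
    printf '{"conditions":[{"container_id":%d},{"type":"FILE"}],"limit":%d,"offset":%d}' \
        "$container_id" "$limit" "$offset"
}

# Hypothetical helper: print the id of every file in a container,
# fetching `limit` results per request (100 by default).
list_file_ids() {
    local container_id=$1 total=$2 limit=${3:-100} offset
    for offset in $(seq 0 "$limit" "$((total - 1))"); do
        curl --silent --request GET --url "$your_platform_url/api/file" \
            --header "Content-Type: application/json" \
            --header "authorization: Bearer $your_platform_api_key" \
            --data "$(build_query "$container_id" "$limit" "$offset")" |
            jq --raw-output '.result[].id'
    done
}
```

Calling list_file_ids 15 2335 would then print one id per line for container 15, which can be redirected to a file or piped into another command.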

The resulting list of ids could be used to launch the function directly (using xargs, for instance), but here we save it to a text file instead. Because >> appends, delete my_file_ids.txt before rerunning the loop:

for i in (seq 0 100 2334); curl --request GET  --url "$your_platform_url/api/file" \
               --header "Content-Type: application/json" \
               --header "authorization: Bearer $your_platform_api_key" \
               --data '{
                  "conditions": [
                      {
                          "container_id": 15
                      },
                      {
                          "type": "FILE"
                      }
                                ],
                  "limit": 100,
                  "offset": '$i'
              }' -s | jq --raw-output '.result[].id' >> my_file_ids.txt; end

And then, we'll use the list of ids to launch a function for each of them:

while read id; curl --request POST \
                  --url "$your_platform_url/api/container/15/function/v2/1" \
                  --header "Content-Type: application/json" \
                  --header "authorization: Bearer $your_platform_api_key" -s \
                  --data '{
                     "files": {
                         "ids": [
                             '$id'
                         ]
                     },
                     "params": {
                         "target_container": "17",
                         "target_path": "/v1"
                     }
                  }'; end < my_file_ids.txt

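A Bash counterpart of the launch loop above could be sketched as follows. function_payload and launch_for_each_id are hypothetical helper names. The URL, container numbers and parameters are copied from the example above, and $your_platform_url and $your_platform_api_key are assumed to be set in the environment.

```shell
# Hypothetical helper: build the JSON body that launches the function
# on a single file id.
function_payload() {
    local file_id=$1 target_container=$2 target_path=$3
    printf '{"files":{"ids":[%d]},"params":{"target_container":"%s","target_path":"%s"}}' \
        "$file_id" "$target_container" "$target_path"
}

# Hypothetical helper: launch the function once per id listed in a file
# (one id per line).
launch_for_each_id() {
    local ids_file=$1 id
    while IFS= read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        curl --silent --request POST \
            --url "$your_platform_url/api/container/15/function/v2/1" \
            --header "Content-Type: application/json" \
            --header "authorization: Bearer $your_platform_api_key" \
            --data "$(function_payload "$id" 17 /v1)"
    done < "$ids_file"
}
```

Running launch_for_each_id my_file_ids.txt sends one request per id, in sequence.
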
Use case: Copy content from one container to another one

Let's say you would like to copy content from one container to another, renaming your files in the process.

First of all, let's obtain a full list of the files in the container, using the same approach as in the previous use case:

for i in (seq 0 100 3745); curl --request GET  --url "$your_platform_url/api/file" \
                     --header "Content-Type: application/json" \
                     --header "authorization: Bearer $your_platform_api_key" \
                     --data '{
                        "conditions": [
                            {
                                "container_id": 18
                            },
                            {
                                "type": "FILE"
                            }
                                      ],
                        "limit": 100,
                        "offset": '$i'
                    }' -s | jq --raw-output '.result[].id' >> container18.txt; end
                    

Remember to update the upper bound in the seq command to match the total number of items in your container.

Quickly check that the generated file contains the elements you expect, for instance by counting its lines with wc -l.
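
One way to sketch that check in shell is to compare the line count against the total reported by the API. check_id_count is a hypothetical helper name, and the expected total is whatever the earlier .total query returned for your container:

```shell
# Hypothetical helper: verify that an id list holds the expected number
# of lines (one id per line).
check_id_count() {
    local ids_file=$1 expected=$2 actual
    # Reading from stdin keeps the file name out of the wc output.
    actual=$(wc -l < "$ids_file" | tr -d ' ')
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual ids"
    else
        echo "Mismatch: expected $expected ids, found $actual" >&2
        return 1
    fi
}
```

A count higher than expected usually means the loop was run more than once, since >> appends to the existing file.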

Let's now build the file that will contain the commands to execute. A while loop may work well in this case:

while read id;
echo 'aws s3 cp s3://YourS3Bucket/8'$id' s3://YourS3Bucket/19'(string replace 'PROCESSED__' ''  $id) --metadata-directive REPLACE;
end  < container18.txt > container8to19.txt

Note the string replace call, which removes the PROCESSED__ text from the file name when building the destination path. It is a fish shell builtin, but Bash has alternatives, such as sed or parameter expansion.
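
As a hedged illustration of one such alternative, the sketch below builds the same command line using sed, which is available in Bash and other POSIX shells. build_copy_command is a hypothetical helper name, and the bucket name and prefixes are the placeholders from the example above:

```shell
# Hypothetical helper: print the aws s3 cp command for one entry,
# dropping the PROCESSED__ marker from the destination name.
build_copy_command() {
    local id=$1 renamed
    # Without the g flag, sed replaces only the first occurrence,
    # which matches the default behaviour of fish's string replace.
    renamed=$(printf '%s\n' "$id" | sed 's/PROCESSED__//')
    printf 'aws s3 cp s3://YourS3Bucket/8%s s3://YourS3Bucket/19%s --metadata-directive REPLACE\n' \
        "$id" "$renamed"
}
```

Calling this helper once per line of the id list, inside a while read loop, and redirecting the output to a file yields the same kind of command list as the fish version.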

And finally, you can launch your processes in parallel for maximum performance using, for instance:

parallel -a container8to19.txt --jobs 20 --bar {}

This runs up to 20 copy commands at a time, copying the files from one container to the other.
