Using the API
Flexible Intake includes a powerful HTTP RESTful API capable of performing any system action and adjusting any platform setting.
To better understand the platform, we recommend reviewing the overview first.
See the full API reference, or continue reading for an introduction to its methods.
While usage examples of the API appear throughout the documentation, this section offers more specific documentation of all existing methods:
Data containers are the basic way of grouping content in Flexible Intake.
It is possible to associate tags with elements (files or folders) so that you can search for them later. See the tag methods to create, edit, or remove tags. To assign a tag to a file, review the /file/{id}/tag/{tag_id} method in the file methods section.
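As a minimal sketch of what such a call could look like (the base URL, authentication header, and HTTP verb here are assumptions, not confirmed by this documentation):

```python
import requests

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

def assign_tag(file_id: str, tag_id: str) -> None:
    # Associate an existing tag with a file; the POST verb is an assumption,
    # so check the file methods section for the exact verb and payload.
    resp = requests.post(f"{BASE_URL}/file/{file_id}/tag/{tag_id}", headers=HEADERS)
    resp.raise_for_status()
```

The same request shape would presumably apply to other association endpoints, such as the lifecycle policy method mentioned below.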
Each data container can be associated with a workflow step. Containers can be listed by the step they are in, so users know which container is in which process. Workflows can be created, edited, or deleted with the methods in the Workflow methods section.
Lifecycle policies can protect content or erase it automatically based on dates or periods of time (e.g. "make the content immutable for 5 years once I upload it"). Policies can be maintained in the Lifecycle policies section, and applied using the container method /container/{id}/lifecycle-policy/{lifecycle_id}.
Data containers can have metadata associated with them to describe them. Metadata is grouped into schemas and fields/descriptors that are associated with the containers. See the Container metadata schemas section, and assign schemas when creating containers.
Files, folders, or whole containers can be made publicly accessible to unauthenticated/anonymous users using the methods in the Sharing section.
When data containers are created, many parameters need to be defined. Some users prefer to create templates with all their settings and simply apply them when a container is created. The Container templates section describes these methods.
Data containers preserve the files/folders inside them (both are files inside Flexible Intake); these have metadata associated with them, becoming objects. See the file methods section for the methods to create them and assign other properties.
Files/folders can have metadata associated with them. Metadata is grouped into metadata schemas; how to maintain them is described in the Object metadata section. Actual values for the fields defined in a schema can be assigned to/obtained from the objects using /file/{id}/metadata.
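A minimal sketch of reading and writing those values, assuming a hypothetical base URL and auth header, a GET/PUT verb pair, and an invented field name (none of which are confirmed here):

```python
import requests

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

file_id = "1234"  # hypothetical file ID

# Read the metadata values currently associated with the object.
current = requests.get(f"{BASE_URL}/file/{file_id}/metadata", headers=HEADERS)
current.raise_for_status()
print(current.json())

# Write values for fields defined in the object's metadata schema.
# The PUT verb and payload shape are assumptions -- see the Object metadata section.
payload = {"title": "Annual report 2023"}  # hypothetical field name/value
resp = requests.put(f"{BASE_URL}/file/{file_id}/metadata", headers=HEADERS, json=payload)
resp.raise_for_status()
```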
For digital preservation purposes, it is important to understand the format of a file. To do that, the preservation community uses the PRONOM standard. Methods to work with PRONOM are defined in the PRONOM section.
Data containers are organized hierarchically using the Archival structure and the Archival structure nodes. This is managed using the Archival Structure methods.
Submission areas can be created so anonymous or unauthenticated users can ingest content into the platform (without being able to access it after depositing it). Submission areas are managed using the methods available in the Submission Areas section.
Reports can be launched, retrieved or scheduled using the Report methods.
Many user and system actions are retained by Flexible Intake. They can be accessed using the methods in the Events section.
Flexible Intake users can define lambda functions (code) that the platform executes on certain triggers, greatly extending how far Flexible Intake can be customized and adapted to specific use cases.
When a Function is executed by a user, it creates a job that is used to retrieve the function output. Job-related methods are accessed in the Jobs section.
User accounts are managed using the User methods. Users can be grouped in Groups that can be maintained in the Groups section. Permissions are assigned to users or groups to make them capable of performing certain actions, and are adjusted in the Permissions section.
When you create a new container using the API POST /container method, Flexible Intake needs to create it in the S3 bucket and assign permissions to your user (and to others with permissions on the same data container) before you can start uploading to it. This process may take a few seconds to complete.
Consequently, if you create a new container and, immediately after making the request, try to write to it using S3, you will get 404 or permission-denied errors. Permissions are adjusted within 5 seconds in most situations, but the safe approach is to:
1. Create the data container with the POST /container method.
2. Loop until you can write your first file without getting an error back (a sketch of this loop follows the list).
3. Continue your uploads.
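A minimal sketch of that sequence with requests and boto3, assuming a hypothetical base URL, auth header, request payload, and response field (adapt them to the actual POST /container contract):

```python
import time

import boto3
import requests
from botocore.exceptions import ClientError

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

# 1. Create the data container.
resp = requests.post(f"{BASE_URL}/container", headers=HEADERS,
                     json={"name": "my-container"})  # hypothetical payload
resp.raise_for_status()
container = resp.json()

s3 = boto3.client("s3")
bucket = container["bucket"]  # hypothetical response field

# 2. Loop until the first write succeeds; permissions may take a few seconds.
for attempt in range(30):
    try:
        s3.put_object(Bucket=bucket, Key="first-file.txt", Body=b"hello")
        break  # write succeeded, permissions are in place
    except ClientError:
        time.sleep(1)  # 404 / access denied: wait and retry
else:
    raise RuntimeError("container never became writable")

# 3. Continue with the rest of the uploads.
```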
When you upload a new file using S3, Flexible Intake needs to:
Phase I (Index your file): Detect the file in storage and create Flexible Intake-internal data structures (such as assigning a file ID to every file).
Phase II (System functions): Flexible Intake carries out its basic and mandatory preservation-related actions (integrity hash calculation, characterization, etc.).
Phase III (User functions): Finally, user-level lambda functions are called by the system.
Uploaded files cannot be handled with the API until Flexible Intake has completed Phase I and the platform gains consistency for the file. This initial process takes less than half a second under normal circumstances, so if your code uploads a file and immediately tries to get its file ID, Flexible Intake may not show it yet. Because this period depends on the platform workload (it is not always half a second), your code needs to be ready for this to happen, and the safe approach is to:
1. Upload your file using S3.
2. Loop until you get the file ID using the /container/{containerID}/file/path/{your path} method (sketched below):
   - If the file is not yet indexed (or if it does not exist, of course), Flexible Intake will return a 404 error.
   - If the file is indexed, Flexible Intake will return the file details.
3. Assign your metadata or perform any other action over the file using the file ID.
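A minimal sketch of step 2, assuming a hypothetical base URL, auth header, response field name, and URL-encoding convention for the path:

```python
import time
from urllib.parse import quote

import requests

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

def wait_for_file_id(container_id: str, path: str, timeout: float = 60.0) -> str:
    # Poll /container/{containerID}/file/path/{your path} until Phase I
    # indexing completes: 404 means "not indexed yet (or missing)",
    # 200 returns the file details.
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get(
            # URL-encoding the path is an assumption about the API's contract.
            f"{BASE_URL}/container/{container_id}/file/path/{quote(path, safe='')}",
            headers=HEADERS,
        )
        if resp.status_code == 200:
            return resp.json()["id"]  # hypothetical response field
        if resp.status_code != 404:
            resp.raise_for_status()
        time.sleep(0.5)
    raise TimeoutError(f"{path} was not indexed within {timeout}s")

# After uploading via S3:
# file_id = wait_for_file_id(container_id, "folder/my-file.pdf")
# ...then assign metadata or perform any other action using file_id.
```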
Under high workload (for instance, when uploading 2 million files), a file query made immediately after the upload may report that the file does not exist, while the same query a moment later returns the file details. The platform typically gains consistency in under one second, but the first query can arrive too early, or consistency may take a few extra seconds. As a general recommendation, follow the safe approach described above.
The same advice applies to the output of system functions and user functions: until the file hashing and characterization process has finished, your results may show a file without hashes (or with only some of them) or without a characterization result. Your code needs to be ready for this to happen.
If you want to show feedback to your users, the container details method /container/{container id} exposes the property files_pending_ingestion, which is true if there are files still to be processed and false if everything has been processed in Phase II.
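For example, a minimal sketch that polls this property until ingestion has finished (the base URL and auth header are assumptions):

```python
import time

import requests

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

def wait_for_ingestion(container_id: str, poll_seconds: float = 5.0) -> None:
    # Poll the container details until files_pending_ingestion turns false,
    # i.e. Phase II has finished for every uploaded file.
    while True:
        resp = requests.get(f"{BASE_URL}/container/{container_id}", headers=HEADERS)
        resp.raise_for_status()
        if not resp.json()["files_pending_ingestion"]:
            return
        time.sleep(poll_seconds)
```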
On every search/list request, it is possible to include two parameters that delimit the number of search results returned: limit and offset.
For instance, a search query with limit=20 asks Flexible Intake to return up to 20 results. To request the next 20, we repeat the query with limit=20 and offset=20.
Note that the total number of results is usually provided in the Flexible Intake answer.
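Putting both parameters together, a minimal pagination sketch (the base URL, auth header, and the results/total field names in the response are assumptions):

```python
import requests

BASE_URL = "https://flexible-intake.example.com/api"  # hypothetical base URL
HEADERS = {"Authorization": "Bearer <your-token>"}  # hypothetical auth scheme

def search_all(endpoint: str, page_size: int = 20):
    # Walk a search/list endpoint page by page using limit and offset.
    offset = 0
    while True:
        resp = requests.get(f"{BASE_URL}{endpoint}", headers=HEADERS,
                            params={"limit": page_size, "offset": offset})
        resp.raise_for_status()
        body = resp.json()
        results = body["results"]  # hypothetical response field names
        if not results:
            break  # guard against looping past the last page
        yield from results
        offset += page_size
        if offset >= body["total"]:  # hypothetical total-count field
            break

# Usage sketch:
# for container in search_all("/container"):
#     print(container)
```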