Organize your content

Once you create, gather, or start manipulating data and files, your content can easily and quickly become disorganised.

To be more efficient and prevent errors later on, decide how you will name and structure your content. This includes metadata, which provides context for your content so that future users can also understand it.

LABDRIVE is flexible about how information is organized in the platform, and it can adapt to any kind of data structure. In LABDRIVE, organizations and users are in charge of defining their own data models for their content.

Because LABDRIVE supports multiple metadata schemas and policies for the content, it can also accommodate content coming from multiple sources and disciplines, and make it usable by organizations of any size and complexity.

A good place to start is to develop a logical data structure that includes organising by type of processing and by data type. When doing so, you need to pay attention to the following topics:

  • Permissions: The lowest level for permissions is at container level. It is not possible to define that, for two folders in a container, one should be accessed by a given user while the other should not. If you need to have different access permissions for two groups of content, you cannot have them in the same container.

  • Storage policies: Everything you place in a container will have the same INITIAL default policy for storage. You can change it at file or folder level later, but the platform will assign one initially.

  • Reports: You may want a report that groups only certain information, or you may want to use reporting to charge back costs to other platform tenants, departments or users. You may need to arrange your content in a particular way to fulfill this need.

  • Search: You can restrict a search to a single container. If your search needs are limited to a fraction of the information you have, it may be simpler to split your content into two containers than to create more complex searches.

  • Metadata: All items in a container share the same metadata schema. It is not possible to have some items with one schema and other items with another schema in the same container. You may need to combine your schemas in one, or to have content with different schemas in two containers.

  • Functions: The platform can execute functions (like macros or scripts) to organize, transform or analyse your content. Having (or not having) some functions available may impact your decision on how to arrange your information.

  • Preference: If you are already used to organizing your information in a particular way, it may be convenient to keep doing so.
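
The permission and metadata constraints above imply that content needing different access groups or different schemas has to live in different containers. The following is a minimal planning sketch in plain Python; it makes no LABDRIVE calls, and the dataset, group and schema names are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: each dataset lists who may access it and
# which metadata schema describes it.
datasets = [
    {"name": "run-001", "access_group": "physics", "schema": "experiment"},
    {"name": "run-002", "access_group": "physics", "schema": "experiment"},
    {"name": "survey-a", "access_group": "biology", "schema": "experiment"},
    {"name": "calib-01", "access_group": "physics", "schema": "calibration"},
]


def plan_containers(items):
    """Group items so each planned container has one access group and one schema."""
    plan = defaultdict(list)
    for item in items:
        key = (item["access_group"], item["schema"])
        plan[key].append(item["name"])
    return dict(plan)


container_plan = plan_containers(datasets)
for (group, schema), names in container_plan.items():
    print(f"container for group={group}, schema={schema}: {names}")
```

Each key in the resulting plan corresponds to one container you would need to create. Other criteria from the list above, such as reporting or search scope, could be added to the grouping key in the same way.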

The tools you have in LABDRIVE to organize your information are:

  • Archival nodes' structure and the user/group permissions, which cover:

    • Permissions

    • Policies

    • Reports, including explanation of logical structure

    • Search

  • Data containers, which share several commonalities with the archival nodes, including:

    • Permissions

    • Policies

    • Reports, including explanation of logical structure

    • Search

    • Metadata

  • Folder and file names, and associated metadata

  • Items' metadata (including descriptive fields, tags and schema to support preservation and usability)

Shared Provenance, such as details of designs of instruments or process algorithms, and Representation Information, such as file formatting and semantics, can be attached to any level including containers, folders or files, in order to share those pieces of metadata important for use, re-use and preservation.

Archival nodes

Archival nodes contain other nodes or containers, and allow organizations to create a first level of organization for their data.

There are four factors to consider when deciding how to use them:

Permissions

Access permissions can be adjusted at node level. You can group your information in the archival nodes based on who should be able to access it.

Policies

LABDRIVE allows you to define container templates, which determine how content is managed in the containers created from them, including:

  • The content source for the container: When creating a new container, LABDRIVE can copy the content from another data container into it. This is useful if you want to define a template or a pre-defined folder structure for it.

  • Container metadata schema: The available set of fields to describe a container.

  • Items metadata schema: The available set of fields to describe the items inside a container (files and folders).

  • Workflow: The workflow to use to handle the container.

  • Storage: The type and class of storage to use for the content that is placed in the container.

  • Check-in/out policy: Whether multiple users are allowed to work simultaneously on a data container.

  • Quota: The storage space you would like to assign to the container.

At the archival node level, you can define which templates are available to users when they create new data containers in a given archival node. This way, if you want to enforce a certain policy for a certain group of experiments, you can create a node and associate a policy with it that, for instance, enforces the use of certain metadata fields.

To define a container template, go to Configuration and then Data container templates. Once the template has been created, you can associate it with an archival node in the Configuration menu under the Archival Structure option.
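
Before entering a template in the Configuration screens, it can help to write down its settings in one place. The sketch below models the settings listed above as a local Python dataclass. The class, its field names and the sample values are hypothetical and are not part of any LABDRIVE API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerTemplate:
    """Hypothetical local model of a data container template (not a LABDRIVE class)."""

    name: str
    container_schema: str           # fields that describe the container
    items_schema: str               # fields that describe files and folders
    workflow: str
    storage_class: str
    allow_concurrent_editing: bool  # check-in/out policy
    quota_gb: int
    content_source: Optional[str] = None  # container to copy initial content from


def validate(template: ContainerTemplate) -> list[str]:
    """Return a list of problems found in the template definition."""
    problems = []
    if template.quota_gb <= 0:
        problems.append("quota must be positive")
    for field_name in ("container_schema", "items_schema", "workflow", "storage_class"):
        if not getattr(template, field_name):
            problems.append(f"{field_name} must not be empty")
    return problems


experiment_template = ContainerTemplate(
    name="experiment-default",
    container_schema="experiment-container",
    items_schema="experiment-items",
    workflow="standard-ingest",
    storage_class="standard",
    allow_concurrent_editing=False,
    quota_gb=500,
    content_source="experiment-folder-skeleton",
)
print(validate(experiment_template))
```

An empty list from `validate` means the definition is complete under these assumed rules; the real platform may apply different or additional checks.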

Reports

Certain reports may explain the logical structure and may take into consideration the archival node in which the content is located, rather than the individual data container. For instance, if you create a first level classification for each organization department, you can launch a report that will tell you how much storage each department is using.
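
As an illustration of why the first-level classification matters, the sketch below totals storage by department from a list of containers. It is plain Python over hypothetical sample data (node paths, container names and sizes are invented) and does not call LABDRIVE's reporting features.

```python
from collections import Counter

# Hypothetical export: (archival node path, container name, size in bytes).
containers = [
    ("Physics/Accelerator", "run-2023", 120_000_000),
    ("Physics/Optics", "laser-tests", 30_000_000),
    ("Biology", "survey-a", 75_000_000),
]


def usage_by_department(rows):
    """Sum container sizes by the first-level archival node in each path."""
    totals = Counter()
    for node_path, _container, size_bytes in rows:
        department = node_path.split("/")[0]
        totals[department] += size_bytes
    return dict(totals)


report = usage_by_department(containers)
for department, total in sorted(report.items()):
    print(f"{department}: {total / 1_000_000:.1f} MB")
```

The grouping only works because each department sits at the first level of the node structure, which is the design choice the paragraph above recommends.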

Search

Search results can be limited to a certain node or set of nodes. This may be useful to you as a way to easily filter the content you are looking for.

Data containers

Data containers are created inside an archival node, and are the next level of granularity you can use. You can use the same elements available for the archival nodes, described in the preceding section.

You can create containers for each experiment, year, type of data, etc., or you could have everything in the same container. This decision depends on how you plan to organize your data, but keep in mind that every item in the container shares:

  • The same permissions: you cannot adjust permissions by folder or by file, only by archival node or container.

  • The same metadata schema: Every file and folder inside a data container shares the same metadata schema, which sub-folders inherit where appropriate.

When you create a container in a given archival node, LABDRIVE lets you choose from the list of templates (outlined in the previous section) configured for that node. If no templates are configured for the node, it offers the full range of settings to choose from.

Part of the metadata could be Provenance, for example overall designs of instrumentation, which is applicable to all files and sub-folders in this container.

Folder and file names

Inside a LABDRIVE data container, you can create a hierarchical folders/subfolders structure like in traditional filesystems. This makes it possible to:

  • Use folders/subfolders: Group files within folders so information on a particular topic is located in one place. Start with a limited number of folders for the broader topics, and then create more specific folders within these.

  • Use a template: LABDRIVE allows you to start from a template when creating a new container. If you plan to have multiple projects with the same folder/file structure, create a template and use it as the starting point.

  • Use existing naming conventions: If you are already using a certain approach in your organization, you can always use it.

  • Name folders in a meaningful way: Name folders after the areas of work to which they relate, and avoid creating folders for individuals, each of whom will organize information in their own way. This makes the file system easier to navigate for new people joining the workspace, and makes the information easier for them to locate.

  • Keep naming consistency: When developing a naming scheme for your folders it is important that once you have decided on a method, you stick to it.

  • Structure folders hierarchically: Create a limited number of folders for the broader areas or concepts, Provenance or Representation Information, and then create more specific folders inside them using the same principles.

  • Separate ongoing and final versioned work: As you start to create lots of folders and files, it is a good idea to separate your older objects from those you are currently working on. Alternatively, you can keep all versions of an object in a single folder. The folder name then acts as a logical name for all versions, and the latest version can always be delivered based on creation date within that folder.

  • Review your content: Assess your team's naming and organization scheme, and call their attention to content that does not follow it. This is a good opportunity to train your team and to revisit the naming scheme for needed changes.

In LABDRIVE, you can search for files, folders, etc. using any part of the file or folder name. Check the Advanced API File Search to learn how.
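
One way to support the consistency and review points above is to express the naming scheme as a pattern and check names against it. The sketch below assumes a hypothetical convention, `<experiment code>_<YYYYMMDD>_<description>_v<NN>.<ext>`; substitute your organization's own scheme.

```python
import re

# Hypothetical convention: <experiment code>_<YYYYMMDD>_<description>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<experiment>[A-Z]{2,5}\d{3})_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def non_conforming(names):
    """Return the names that do not follow the convention."""
    return [name for name in names if NAME_PATTERN.match(name) is None]


file_names = [
    "EXP001_20240115_beam-profile_v01.csv",
    "EXP001_20240115_beam-profile_v02.csv",
    "final results NEW.xlsx",
]
print(non_conforming(file_names))
```

The pattern only checks the shape of a name, so a date such as `20241399` would still pass; stricter validation would need to parse the date field as well.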

Items' metadata

Files and folders can be associated with metadata in LABDRIVE. Object metadata can be used for searching and grouping content, independently of the container/folder in which the files are located.

For instance, you could create a metadata schema with three fields for your datasets:

  • Experiment code (string)

  • Date (date/time)

  • Final (boolean)

You can then query LABDRIVE for the datasets that belong to a specific experiment, were created between dates X and Y, and are final versions. You can also narrow the query to the elements in a certain container, or to those of a certain file type (CSV, for instance).
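
To make the logic of such a query concrete, the sketch below applies the same criteria to an in-memory list of records. It is plain Python over hypothetical sample data and does not use the LABDRIVE search API; the key names `experiment_code`, `date` and `final` are assumed stand-ins for the three schema fields above.

```python
from datetime import datetime

# Hypothetical records using the three-field schema from the example above.
items = [
    {"name": "a.csv", "experiment_code": "EXP001", "date": datetime(2024, 1, 15), "final": True},
    {"name": "b.csv", "experiment_code": "EXP001", "date": datetime(2024, 3, 2), "final": False},
    {"name": "c.json", "experiment_code": "EXP001", "date": datetime(2024, 2, 20), "final": True},
    {"name": "d.csv", "experiment_code": "EXP002", "date": datetime(2024, 1, 20), "final": True},
]


def find_final_datasets(records, experiment_code, start, end, extension=None):
    """Return names of final datasets for one experiment within a date range."""
    matches = []
    for record in records:
        if record["experiment_code"] != experiment_code:
            continue
        if not (start <= record["date"] <= end):
            continue
        if not record["final"]:
            continue
        # Optional file-type filter, compared case-insensitively.
        if extension and not record["name"].lower().endswith("." + extension.lower()):
            continue
        matches.append(record["name"])
    return matches


results = find_final_datasets(
    items, "EXP001", datetime(2024, 1, 1), datetime(2024, 2, 29), extension="csv"
)
print(results)
```

Because the criteria are metadata values rather than folder locations, the same filter works regardless of where the files sit in the container.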

If you are using the API or the management interface, take a look at the Search page. Remember that you can always use Jupyter Notebooks to extend LABDRIVE capabilities and to create your advanced search interfaces and custom reports.

Creating your structures in order

When creating your data structures, it is important to follow a particular order. Otherwise, you may end up trying to create a Container before the Archival Node that should hold it exists.

The recommended order to create your structures is the following (skip the ones you don't need to use):

  1. Users and groups

  2. Object metadata schema categories

  3. Container metadata schema categories

  4. Object metadata schema

  5. Object metadata fields

  6. Container metadata schema

  7. Container metadata fields

  8. Workflows

  9. Data container templates

  10. Archival node for template containers

  11. Template containers

  12. Archival Nodes for everything else
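
If you script your setup, the order above can be kept as a list and used to sort whichever steps you need. This is a minimal sketch in plain Python; the step names are copied from the list above, and the helper function is hypothetical.

```python
RECOMMENDED_ORDER = [
    "Users and groups",
    "Object metadata schema categories",
    "Container metadata schema categories",
    "Object metadata schema",
    "Object metadata fields",
    "Container metadata schema",
    "Container metadata fields",
    "Workflows",
    "Data container templates",
    "Archival node for template containers",
    "Template containers",
    "Archival Nodes for everything else",
]


def order_steps(needed):
    """Sort the needed steps by recommended position; reject unknown step names."""
    position = {step: index for index, step in enumerate(RECOMMENDED_ORDER)}
    unknown = [step for step in needed if step not in position]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return sorted(needed, key=position.__getitem__)


plan = order_steps(["Data container templates", "Users and groups", "Workflows"])
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```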

Further considerations which will affect the organisation of the information may be found in Planning and Using Additional Information in LABDRIVE.
