Resources and resource mappings

In the terminology of this library, a “resource” is a sequence of bytes associated with a URI. Currently the two types of resources recognized by asdf are schemas and extension manifests. Both of these are YAML documents whose associated URI is expected to match the id property of the document.

A “resource mapping” is an asdf plugin that provides access to the content for a URI. These plugins must implement the Mapping interface (a simple dict qualifies) and map str URI keys to bytes values. Resource mappings are installed into the asdf library via one of two routes: the AsdfConfig.add_resource_mapping method or the asdf.resource_mappings entry point.

Installing resources via AsdfConfig

The simplest way to isntall a resource into asdf is to add it at runtime using the AsdfConfig.add_resource_mapping method. For example, the following code istalls a schema for use with the asdf.AsdfFile custom_schema argument:

import asdf

content = b"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: asdf://example.com/example-project/schemas/foo-1.0.0
type: object
properties:
  foo:
    type: string
required: [foo]
...
"""

asdf.get_config().add_resource_mapping({
  "asdf://example.com/example-project/schemas/foo-1.0.0": content
})

The schema will now be available for validating files:

af = asdf.AsdfFile(custom_schema="asdf://example.com/example-project/schemas/foo-1.0.0")
af.validate() # Error, "foo" is missing

The DirectoryResourceMapping class

But what if we don’t want to store our schemas in variables in the code? Storing resources in a directory tree is a common use case, so asdf provides a Mapping implementation that reads schema content from a filesystem. This is the DirectoryResourceMapping class.

Consider these three schemas:

# foo-1.0.0.yaml
id: asdf://example.com/example-project/schemas/foo-1.0.0
# ...

# bar-2.3.4.yaml
id: asdf://example.com/example-project/nested/bar-2.3.4
# ...

# baz-8.1.1.yaml
id: asdf://example.com/example-project/nested/baz-8.1.1
# ...

which are arranged in the following directory structure:

schemas
├─ foo-1.0.0.yaml
├─ README
└─ nested
    ├─ bar-2.3.4.yaml
    └─ baz-8.1.1.yaml

Our goal is to install all schemas in the directory tree so that they are available for use with asdf. The DirectoryResourceMapping class can do that for us, but we need to show it how to construct the schema URIs from the file paths without reading the id property from the files. This requirement is a performance consideration; not all resources are used in every session, and if asdf were to read and parse all available files when plugins are loaded, the first call to asdf.open would be intolerably slow.

We should configure DirectoryResourceMapping like this:

import asdf
from asdf.resource import DirectoryResourceMapping

mapping = DirectoryResourceMapping(
    "/path/to/schemas",
    "asdf://example.com/example-project/schemas/",
    recursive=True,
    filename_pattern="*.yaml",
    stem_filename=True
)

asdf.get_config().add_resource_mapping(mapping)

The first argument is the path to the schemas directory on the filesystem. The second argument is the prefix that should be prepended to file paths relative to that root when constructing the schema URIs. The recursive argument tells the class to descend into the nested directory when searching for schemas, filename_pattern is a glob pattern chosen to exclude our README file, and stem_filename causes the class to drop the .yaml suffix when constructing URIs.

We can test that our configuration is correct by asking asdf to read and parse one of the schemas:

from asdf.schema import load_schema

uri = "asdf://example.com/example-project/schemas/nested/bar-2.3.4.yaml"
schema = load_schema(uri)
assert schema["id"] == uri

Installing resources via entry points

The asdf package also offers an entry point for installing resource mapping plugins. This installs a package’s resources automatically without requiring calls to the AsdfConfig method. The entry point is called asdf.resource_mappings and expects to receive a method that returns a list of Mapping instances.

For example, let’s say we’re creating a package named asdf-foo-schemas that provides the same schemas described in the previous section. Our directory structure might look something like this:

asdf-foo-schemas
├─ setup.cfg
├─ setup.py
└─ src
   └─ asdf_foo_schemas
      ├─ __init__.py
      ├─ integration.py
      └─ schemas
         ├─ __init__.py
         ├─ foo-1.0.0.yaml
         ├─ README
         └─ nested
            ├─ __init__.py
            ├─ bar-2.3.4.yaml
            └─ baz-8.1.1.yaml

In integration.py, we’ll define the entry point method and have it return a list with a single element, our DirectoryResourceMapping instance:

# integration.py
from pathlib import Path

from asdf.resource import DirectoryResourceMapping


def get_resource_mappings():
    # Get path to schemas directory relative to this file
    schemas_path = Path(__file__).parent / "schemas"
    mapping = DirectoryResourceMapping(
        schemas_path,
        "asdf://example.com/example-project/schemas/",
        recursive=True,
        filename_pattern="*.yaml",
        stem_filename=True
    )
    return [mapping]

Then in setup.cfg, define an [options.entry_points] section that identifies the method as an asdf.resource_mappings entry point:

# setup.cfg
[options.entry_points]
asdf.resource_mappings =
    asdf_foo_schemas = asdf_foo_schemas.integration:get_resource_mappings

After installing the package, it should be possible to load one of our schemas in a new session without any additional setup:

from asdf.schema import load_schema

uri = "asdf://example.com/example-project/schemas/nested/bar-2.3.4.yaml"
schema = load_schema(uri)
assert schema["id"] == uri

Note that the package will need to be configured to include the YAML files. There are multiple ways to accomplish this, but one easy option is to add an [options.package_data] section to setup.cfg requesting that all files with a .yaml extension be installed:

# setup.cfg
[options.package_data]
* = *.yaml

Entry point performance considerations

For the good of asdf users everywhere, it’s important that entry point methods load as quickly as possible. All resource URIs must be loaded before reading an ASDF file, so any entry point method that lingers will introduce a delay to the initial call to asdf.open. For that reason, we recommend to minimize the number of imports that occur in the module containing the entry point method, particularly imports of modules outside of the Python standard library or asdf itself. When resources are stored in a filesystem, it’s also helpful to delay reading a file until its URI is actually requested, which may not occur in a given session. The DirectoryResourceMapping class is implemented with this behavior.