ASDF schemas

ASDF schemas are YAML documents that describe validations to be performed on tagged objects nested within the ASDF tree or on the tree itself. Schemas can validate the presence, datatype, and value of objects and their properties, and can be combined in different ways to facilitate reuse.

These schemas, though expressed in YAML, are structured according to the JSON Schema Draft 4 specification. The Understanding JSON Schema book is a good place to start for users not already familiar with JSON Schema. Keep in mind that the book also covers later drafts of the JSON Schema spec, so certain features (constant values, conditional subschemas, etc.) are not available when writing schemas for ASDF. The book makes clear which features were introduced after Draft 4.

Anatomy of a schema

Here is an example of an ASDF schema that validates an object with a numeric value and corresponding unit:

%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: asdf://asdf-format.org/core/schemas/quantity-2.0.0

title: Quantity object containing numeric value and unit
description: >-
  An object with a numeric value, which may be a scalar
  or an array, and associated unit.

type: object
properties:
  value:
    description: A vector of one or more values
    anyOf:
      - type: number
      - tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
  unit:
    description: The unit corresponding to the values
    tag: tag:stsci.edu:asdf/unit/unit-1.0.0
required: [value, unit]
...

This is similar to the quantity schema of the ASDF Standard, but has been updated to reflect current recommendations regarding schemas. Let’s walk through this schema line by line.

%YAML 1.1
---

These first two lines form the header of the file. The %YAML 1.1 directive indicates that we’re following version 1.1 of the YAML spec. The --- marker starts a new YAML document.

$schema: http://stsci.edu/schemas/yaml-schema/draft-01

The $schema property contains the URI of the schema that validates this document. Since our document is itself a schema, the URI refers to a metaschema. ASDF comes with three built-in metaschemas:

  • http://json-schema.org/draft-04/schema - The JSON Schema Draft 4 metaschema. Includes basic validators and combiners.

  • http://stsci.edu/schemas/yaml-schema/draft-01 - The YAML Schema metaschema. Includes everything in JSON Schema Draft 4, plus additional YAML-specific validators including tag and propertyOrder.

  • http://stsci.edu/schemas/asdf/asdf-schema-1.0.0 - The ASDF Schema metaschema. Includes everything in YAML Schema, plus additional ASDF-specific validators that check ndarray properties.

Our schema makes use of the tag validator, so we’re specifying the YAML Schema URI here.

id: asdf://asdf-format.org/core/schemas/quantity-2.0.0

The id property contains the URI that uniquely identifies our schema. This URI is how we’ll refer to the schema when using the asdf library.

title: Quantity object containing numeric value and unit
description: >-
  An object with a numeric value, which may be a scalar
  or an array, and associated unit.

Title and description are optional (but recommended) documentation properties. They can appear at any level of the schema and have no impact on the validation process.

type: object

This line invokes the type validator to check the data type of the top-level value. We’re asserting that the type must be a YAML mapping, which in Python is represented as a dict.
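To make the correspondence between schema types and Python objects concrete, here is a toy sketch in plain Python of how the Draft 4 type names could be checked against values in a loaded tree. The names DRAFT4_TYPES and matches_type are hypothetical, and this is not how asdf implements the type validator; it only illustrates the mapping described above.

```python
# Toy illustration only: hypothetical names, not asdf's implementation.
DRAFT4_TYPES = {
    "object": dict,        # YAML mapping
    "array": list,         # YAML sequence
    "string": str,
    "boolean": bool,
    "null": type(None),
}


def matches_type(value, type_name):
    """Return True if value satisfies the Draft 4 type named type_name."""
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, DRAFT4_TYPES[type_name])


print(matches_type({"value": 1.0}, "object"))  # True
print(matches_type([1, 2], "object"))          # False
```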

properties:

The properties validator announces that we’d like to validate certain named properties of the mapping. If a property is listed here and is present in the ASDF tree, it will be validated accordingly.

  value:
    description: A vector of one or more values

Here we’re identifying a property named value that we’d like to validate. The description is used to add some additional documentation.

    anyOf:

The anyOf validator is one of JSON Schema’s combiners. The value property will be validated against each of the following subschemas, and if any validates successfully, the entire anyOf will be considered valid. Other available combiners are allOf, which requires that all subschemas validate successfully, oneOf, which requires that one and only one of the subschemas validates, and not, which requires that a single subschema does not validate.

      - type: number

The first subschema in the list contains a type validator that succeeds if the entity assigned to value is a numeric literal.

      - tag: tag:stsci.edu:asdf/core/ndarray-1.0.0

The second subschema contains a tag validator, which makes an assertion regarding the YAML tag URI of the object assigned to value. In this subschema we’re requiring the tag of an ndarray-1.0.0 object, which is how n-dimensional arrays are represented in an ASDF tree.

The net effect of the anyOf combiner and its two subschemas is: validate successfully if the value object is either a numeric literal or an n-dimensional array.
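The behavior of the four combiners can be sketched with ordinary Python predicates. This is a toy model under stated assumptions, not the validator asdf actually runs: the helper names (any_of, all_of, one_of, not_, is_number, is_list, is_positive) are hypothetical, and is_list merely stands in for the ndarray tag check.

```python
def any_of(*checks):
    """Pass if at least one check passes (like anyOf)."""
    return lambda value: any(check(value) for check in checks)


def all_of(*checks):
    """Pass only if every check passes (like allOf)."""
    return lambda value: all(check(value) for check in checks)


def one_of(*checks):
    """Pass if exactly one check passes (like oneOf)."""
    return lambda value: sum(bool(check(value)) for check in checks) == 1


def not_(check):
    """Pass if the single check fails (like not)."""
    return lambda value: not check(value)


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_list(value):
    # Stand-in for the ndarray tag check in the schema above
    return isinstance(value, list)


def is_positive(value):
    return is_number(value) and value > 0


number_or_array = any_of(is_number, is_list)
print(number_or_array(3.5))      # True
print(number_or_array([1, 2]))   # True
print(number_or_array("3.5"))    # False

# anyOf tolerates overlapping subschemas; oneOf does not.
print(any_of(is_number, is_positive)(2))  # True
print(one_of(is_number, is_positive)(2))  # False
```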

  unit:
    description: The unit corresponding to the values
    tag: tag:stsci.edu:asdf/unit/unit-1.0.0

The unit property has another bit of documentation and a tag validator that requires it to be a unit-1.0.0 object.

required: [value, unit]

Since the properties validator does not require the presence of its listed properties, we need another validator to do that. The required validator defines a list of properties that need to be present if validation is to succeed.
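The division of labor between properties and required can be illustrated with a small plain-Python sketch. The functions check_properties and check_required are hypothetical stand-ins written for this example, and the unit check is simplified to a string test rather than a tag check.

```python
def check_properties(obj, property_checks):
    """Mimic `properties`: check only the listed keys that are present."""
    return all(
        check(obj[name])
        for name, check in property_checks.items()
        if name in obj
    )


def check_required(obj, required):
    """Mimic `required`: every listed key must be present."""
    return all(name in obj for name in required)


checks = {
    "value": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "unit": lambda v: isinstance(v, str),  # simplified stand-in for the unit tag check
}

incomplete = {"value": 2.5}  # no "unit" key
print(check_properties(incomplete, checks))            # True
print(check_required(incomplete, ["value", "unit"]))   # False
```

An object missing unit passes the properties-style check because absent keys are simply skipped; only the required-style check rejects it.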

...

Finally, the YAML document end indicator marks the end of the schema.

Checking schema syntax

The check_schema function performs basic syntax checks on a schema and will raise an error if it discovers a problem. It does not currently accept URIs and requires that the schema already be loaded into Python objects. If the schema is already registered with the asdf library as a resource (see Resources and resource mappings), it can be loaded and checked like this:

from asdf.schema import load_schema, check_schema

schema = load_schema("asdf://example.com/example-project/schemas/foo-1.0.0")
check_schema(schema)

Otherwise, the schema can be loaded using pyyaml directly:

from asdf.schema import check_schema
import yaml

# Load the schema file into plain Python objects
with open("/path/to/foo-1.0.0.yaml") as f:
    schema = yaml.safe_load(f)
check_schema(schema)

Testing validation

Getting a schema to validate as intended can be a tricky business, so it’s helpful to test validation against some example objects as you go along. The validate function will validate a Python object against a schema:

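from asdf.schema import validate
import yaml

# Load the schema file into plain Python objects
with open("/path/to/foo-1.0.0.yaml") as f:
    schema = yaml.safe_load(f)

obj = {"foo": "bar"}
# Raises an error if obj does not conform to the schema
validate(obj, schema=schema)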

The validate function will return successfully if the object is valid, or raise an error if not.
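When testing several example objects at once, a small table-driven helper can keep the expectations in one place. The sketch below is an assumption-laden illustration: run_cases and fake_validate are hypothetical names, and fake_validate is a stand-in so the example runs without asdf installed. In practice you would pass asdf.schema.validate as the validate argument. The helper catches Exception broadly because the text above only says that an error is raised, without naming its class.

```python
def run_cases(validate, schema, cases):
    """Return the cases whose outcome differs from the expectation.

    `cases` is a list of (obj, should_pass) pairs.
    """
    mismatches = []
    for obj, should_pass in cases:
        try:
            validate(obj, schema=schema)
            passed = True
        except Exception:  # error class not assumed here, so catch broadly
            passed = False
        if passed != should_pass:
            mismatches.append((obj, should_pass))
    return mismatches


# Hypothetical stand-in validator; it only understands `required`.
def fake_validate(obj, schema=None):
    for name in schema.get("required", []):
        if name not in obj:
            raise ValueError(f"missing required property: {name}")


schema = {"type": "object", "required": ["foo"]}
cases = [
    ({"foo": "bar"}, True),   # expected to validate
    ({}, False),              # expected to be rejected
]
print(run_cases(fake_validate, schema, cases))  # []
```

An empty result means every example behaved as expected; any returned pair points at an object the schema accepts or rejects contrary to your intent.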

Testing custom schemas

Packages that provide their own schemas can test them using asdf’s pytest plugin for schema testing. Schemas are tested for overall validity, and any examples given within the schemas are also tested.

The schema tester plugin is automatically registered when the asdf package is installed. In order to enable testing, it is necessary to add the directory containing your schema files to the pytest section of your project’s build configuration (pyproject.toml or setup.cfg). If you do not already have such a file, creating one with the following should be sufficient:

pyproject.toml:

[tool.pytest.ini_options]
asdf_schema_root = 'path/to/schemas another/path/to/schemas'

setup.cfg:

[tool:pytest]
asdf_schema_root = path/to/schemas another/path/to/schemas

The schema directory paths should be paths that are relative to the top of the package directory when it is installed. If this is different from the path in the source directory, then both paths can be used to facilitate in-place testing (see asdf’s own pyproject.toml for an example of this).

Note

Older versions of asdf (prior to 2.4.0) required the plugin to be registered in your project’s conftest.py file. As of 2.4.0, the plugin is registered automatically, so that registration should be removed from your conftest.py file unless you need to retain compatibility with older versions of asdf.

The asdf_schema_skip_names configuration variable can be used to skip schema files that live within one of the asdf_schema_root directories but should not be tested. The names should be given as simple base file names (without directory paths or extensions). Again, see asdf’s own pyproject.toml file for an example.

The schema tests do not run by default. In order to enable the tests by default for your package, add asdf_schema_tests_enabled = 'true' to the [tool.pytest.ini_options] section of your pyproject.toml file (or [tool:pytest] in setup.cfg). If you do not wish to enable the schema tests by default, you can add the --asdf-tests option to the pytest command line to enable tests on a per-run basis.
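Putting the options from this section together, a pyproject.toml section might look like the sketch below. The schema base names are hypothetical, and the space-separated format for asdf_schema_skip_names is an assumption modeled on asdf_schema_root; check asdf’s own pyproject.toml for the authoritative form.

```toml
[tool.pytest.ini_options]
asdf_schema_root = 'path/to/schemas'
# Hypothetical base names of schema files to skip (no directories or extensions)
asdf_schema_skip_names = 'draft-schema-1.0.0 scratch-1.0.0'
# Run the schema tests on every pytest invocation
asdf_schema_tests_enabled = 'true'
```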

See also: