ASDF schemas
ASDF schemas are YAML documents that describe validations to be performed on tagged objects nested within the ASDF tree or on the tree itself. Schemas can validate the presence, datatype, and value of objects and their properties, and can be combined in different ways to facilitate reuse.
These schemas, though expressed in YAML, are structured according to the JSON Schema Draft 4 specification. The Understanding JSON Schema book is a good starting point for users not already familiar with JSON Schema. Keep in mind that the book also covers later drafts of the JSON Schema spec, so certain features (constant values, conditional subschemas, etc.) are not available when writing schemas for ASDF. The book makes clear which features were introduced after Draft 4.
Anatomy of a schema
Here is an example of an ASDF schema that validates an object with a numeric value and corresponding unit:
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: asdf://asdf-format.org/core/schemas/quantity-2.0.0

title: Quantity object containing numeric value and unit
description: >-
  An object with a numeric value, which may be a scalar
  or an array, and associated unit.

type: object
properties:
  value:
    description: A vector of one or more values
    anyOf:
      - type: number
      - tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
  unit:
    description: The unit corresponding to the values
    tag: tag:stsci.edu:asdf/unit/unit-1.0.0
required: [value, unit]
...
This is similar to the quantity schema in the ASDF Standard, but has been updated to reflect current recommendations regarding schemas. Let’s walk through this schema line by line.
%YAML 1.1
---
These first two lines form the header of the file. The %YAML 1.1 directive indicates that we’re following version 1.1 of the YAML spec. The --- marks the start of a new YAML document.
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
The $schema property contains the URI of the schema that validates this document. Since our document is itself a schema, the URI refers to a metaschema. ASDF comes with three built-in metaschemas:
http://json-schema.org/draft-04/schema - The JSON Schema Draft 4 metaschema. Includes basic validators and combiners.
http://stsci.edu/schemas/yaml-schema/draft-01 - The YAML Schema metaschema. Includes everything in JSON Schema Draft 4, plus additional YAML-specific validators including tag and propertyOrder.
http://stsci.edu/schemas/asdf/asdf-schema-1.0.0 - The ASDF Schema metaschema. Includes everything in YAML Schema, plus additional ASDF-specific validators that check ndarray properties.
Our schema makes use of the tag validator, so we’re specifying the YAML Schema URI here.
id: asdf://asdf-format.org/core/schemas/quantity-2.0.0
The id property contains the URI that uniquely identifies our schema. This URI is how we’ll refer to the schema when using the asdf library.
title: Quantity object containing numeric value and unit
description: >-
  An object with a numeric value, which may be a scalar
  or an array, and associated unit.
Title and description are optional (but recommended) documentation properties. They can appear at any level of the schema and have no effect on validation.
type: object
This line invokes the type validator to check the data type of the top-level value. We’re asserting that the type must be a YAML mapping, which in Python is represented as a dict.
properties:
The properties validator announces that we’d like to validate certain named properties of the mapping. If a property is listed here and is present in the ASDF tree, it will be validated accordingly.
  value:
    description: A vector of one or more values
Here we’re identifying a property named value that we’d like to validate. The description is used to add some additional documentation.
    anyOf:
The anyOf validator is one of JSON Schema’s combiners. The value property will be validated against each of the following subschemas, and if any validates successfully, the entire anyOf will be considered valid. Other available combiners are allOf, which requires that all subschemas validate successfully, oneOf, which requires that one and only one of the subschemas validates, and not, which requires that a single subschema does not validate.
      - type: number
The first subschema in the list contains a type validator that succeeds if the entity assigned to value is a numeric literal.
      - tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
The second subschema contains a tag validator, which makes an assertion regarding the YAML tag URI of the object assigned to value. In this subschema we’re requiring the tag of an ndarray-1.0.0 object, which is how n-dimensional arrays are represented in an ASDF tree.
The net effect of the anyOf combiner and its two subschemas is: validate successfully if the value object is either a numeric literal or an n-dimensional array.
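The combiner semantics described above can be sketched in plain Python. This is a toy illustration only, not how asdf or JSON Schema implement validation: each "subschema" is modelled as a function returning True or False, and the Tagged class and helper names are hypothetical stand-ins invented for the example.

```python
# Tag URI copied from the quantity schema above.
NDARRAY_TAG = "tag:stsci.edu:asdf/core/ndarray-1.0.0"


class Tagged:
    """Hypothetical stand-in for a tagged object in an ASDF tree."""

    def __init__(self, tag):
        self.tag = tag


def is_number(obj):
    # JSON Schema does not count booleans as numbers, although Python's
    # bool is a subclass of int, so bool is excluded explicitly.
    return isinstance(obj, (int, float)) and not isinstance(obj, bool)


def has_tag(tag):
    # Build a check that passes only for objects carrying the given tag.
    return lambda obj: getattr(obj, "tag", None) == tag


def any_of(*checks):
    # Passes if at least one subschema passes.
    return lambda obj: any(check(obj) for check in checks)


def all_of(*checks):
    # Passes only if every subschema passes.
    return lambda obj: all(check(obj) for check in checks)


def one_of(*checks):
    # Passes only if exactly one subschema passes.
    return lambda obj: sum(bool(check(obj)) for check in checks) == 1


def not_(check):
    # Passes only if the single subschema fails.
    return lambda obj: not check(obj)


# Mirrors the anyOf in the quantity schema: a number or an ndarray-tagged object.
value_ok = any_of(is_number, has_tag(NDARRAY_TAG))

print(value_ok(3.5))                  # True
print(value_ok(Tagged(NDARRAY_TAG)))  # True
print(value_ok("3.5"))                # False
```

Note the difference between anyOf and oneOf: an object that satisfies two subschemas at once passes anyOf but fails oneOf.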
  unit:
    description: The unit corresponding to the values
    tag: tag:stsci.edu:asdf/unit/unit-1.0.0
The unit property has another bit of documentation and a tag validator that requires it to be a unit-1.0.0 object.
required: [value, unit]
Since the properties validator does not require the presence of its listed properties, we need another validator to do that. The required validator defines a list of properties that need to be present if validation is to succeed.
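The division of labour between properties and required can be shown with a small plain-Python sketch. It is a simplified model, not asdf’s implementation: the two helper functions are hypothetical, and the unit check uses a string test in place of the real tag validator.

```python
def check_properties(obj, property_checks):
    """Model of `properties`: check only the listed keys that are present."""
    return all(
        check(obj[name])
        for name, check in property_checks.items()
        if name in obj
    )


def check_required(obj, names):
    """Model of `required`: return the listed names missing from obj."""
    return [name for name in names if name not in obj]


checks = {
    "value": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    # Simplification: the real schema checks a YAML tag, not a Python type.
    "unit": lambda u: isinstance(u, str),
}

incomplete = {"value": 2.0}  # no "unit" key

# Absent properties are simply not checked, so this passes.
print(check_properties(incomplete, checks))           # True
# The missing key is only caught by the `required` model.
print(check_required(incomplete, ["value", "unit"]))  # ['unit']
```

Without the required list, an object missing its unit would validate successfully, which is why the quantity schema uses both validators.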
...
Finally, the YAML document end indicator marks the end of the schema.
Checking schema syntax
The check_schema
function performs basic syntax checks on a schema and
will raise an error if it discovers a problem. It does not currently accept URIs and
requires that the schema already be loaded into Python objects. If the schema is already
registered with the asdf library as a resource (see Resources and resource mappings), it can
be loaded and checked like this:
from asdf.schema import load_schema, check_schema
schema = load_schema("asdf://example.com/example-project/schemas/foo-1.0.0")
check_schema(schema)
Otherwise, the schema can be loaded using pyyaml directly:
from asdf.schema import check_schema
import yaml
# Use a context manager so the file is closed after loading
with open("/path/to/foo-1.0.0.yaml") as f:
    schema = yaml.safe_load(f)
check_schema(schema)
Testing validation
Getting a schema to validate as intended can be a tricky business, so it’s helpful
to test validation against some example objects as you go along. The validate
function will validate a Python object against a schema:
from asdf.schema import validate
import yaml
# Use a context manager so the file is closed after loading
with open("/path/to/foo-1.0.0.yaml") as f:
    schema = yaml.safe_load(f)
obj = {"foo": "bar"}
validate(obj, schema=schema)
The validate function will return successfully if the object is valid, or raise an error if not.
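When iterating on a schema it can help to run several example objects in one pass and collect the failures instead of stopping at the first error. The following is a minimal sketch of such a harness. The try_examples helper and the stand_in_validate function are hypothetical; the stand-in exists only so the sketch runs without asdf installed, and the exception type raised by the real validate function is deliberately not assumed.

```python
def try_examples(validate_fn, examples):
    """Run validate_fn on each example; map its name to None or the error text."""
    results = {}
    for name, obj in examples.items():
        try:
            validate_fn(obj)
        except Exception as err:  # broad on purpose: the error type is not assumed
            results[name] = str(err)
        else:
            results[name] = None
    return results


def stand_in_validate(obj):
    # Placeholder validator: requires a string-valued "foo" property.
    if not isinstance(obj.get("foo"), str):
        raise ValueError("'foo' must be a string")


examples = {
    "good": {"foo": "bar"},
    "wrong type": {"foo": 1},
    "missing": {},
}

for name, error in try_examples(stand_in_validate, examples).items():
    print(f"{name}: {'valid' if error is None else error}")
```

With asdf available, the stand-in could be replaced by a wrapper around the call shown above, for example lambda obj: validate(obj, schema=schema).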
Testing custom schemas
Packages that provide their own schemas can test them using asdf’s pytest plugin for schema testing. Schemas are tested for overall validity, and any examples given within the schemas are also tested.
The schema tester plugin is automatically registered when the asdf package is installed. In order to enable testing, it is necessary to add the directory containing your schema files to the pytest section of your project’s build configuration (pyproject.toml or setup.cfg). If you do not already have such a file, creating one with the following should be sufficient:
In pyproject.toml:
[tool.pytest.ini_options]
asdf_schema_root = 'path/to/schemas another/path/to/schemas'
In setup.cfg:
[tool:pytest]
asdf_schema_root = path/to/schemas another/path/to/schemas
The schema directory paths should be relative to the top of the package directory when it is installed. If this is different from the path in the source directory, then both paths can be used to facilitate in-place testing (see asdf’s own pyproject.toml for an example of this).
Note
Older versions of asdf (prior to 2.4.0) required the plugin to be registered in your project’s conftest.py file. As of 2.4.0, the plugin is registered automatically, so any such registration should be removed from your conftest.py file unless you need to retain compatibility with older versions of asdf.
The asdf_schema_skip_names configuration variable can be used to skip schema files that live within one of the asdf_schema_root directories but should not be tested. The names should be given as simple base file names (without directory paths or extensions). Again, see asdf’s own pyproject.toml file for an example.
The schema tests do not run by default. In order to enable the tests by default for your package, add asdf_schema_tests_enabled = 'true' to the [tool.pytest.ini_options] section of your pyproject.toml file (or [tool:pytest] in setup.cfg).
If you do not wish to enable the schema tests by default, you can add the --asdf-tests option to the pytest command line to enable tests on a per-run basis.