Tag Versioning and You

Here we’ll explore ASDF tag versioning, and walk through the process of supporting new and updated tags with AsdfType subclasses. AsdfType is the original API that is currently used to support the ASDF core tags. The new API, Converter, remains experimental and is currently (2020-09-24) being trialled in the asdf-astropy package.

ASDF versioning conventions

The ASDF Standard document provides a helpful overview of the various ASDF versioning conventions. We will be concerned with the standard version and individual tag versions.

Overview

The “standard version” or “ASDF Standard version” refers to the subset of individual tag versions that correspond to a specific release version of the ASDF Standard. The list of tags and versions is maintained in version_map files in the asdf-standard repository. For example, version_map-1.3.0.yaml contains a list of all tag versions that we must handle in order to fully support version 1.3.0 of the ASDF Standard. This list contains both “core” tags and non-core tags. The distinction there is that core tags are supported by this library, while the others are supported by some external Python library, such as astropy.

Our support for specific versions of the ASDF core tags is implemented with AsdfType subclasses. We’ll discuss these more later, but for now the important thing to know is that each AsdfType class identifies the tag name and version(s) that it supports. Any core tag objects that lack this support will not serialize or deserialize properly.

When reading an ASDF file, the standard version doesn’t play a significant role. Each core object is self-described by a YAML tag, which will be used to deserialize the object even if that tag conflicts with the overall standard version of the file. The library will use the tag to identify the most appropriate AsdfType to deserialize the object.

On write, the situation is different. The library may have a choice in which tag and/or AsdfType to use when serializing a given core object – if multiple versions of the same tag are present, which shall we choose? Here the standard version becomes important. The tag version selected is specified by the version map of the standard version that the file is being written under.

By default, the standard version used for writes is the latest offered, but users may override with another version.

Implementation details

Supported ASDF standard version list

The list of supported ASDF standard versions is maintained in asdf.versioning.supported_versions. The default version, asdf.versioning.default_version, is applied whenever a user declines to specify the standard version of a new file, and is set to the latest supported version.

AsdfType

In this library, each core tag is handled by a distinct asdf.types.AsdfType subclass. The AsdfType subclass is responsible for identifying the base name of its tag and the tag version(s) that it supports. It also provides any custom serialization/deserialization behavior that is required – AsdfType provides a default implementation that is only able to get and set attributes on dict-like objects.

In some cases, the AsdfType subclass also serves as the deserialized object type. For example, asdf.types.core.Software subclasses both AsdfType and dict. Its AsdfType-like behavior is to identify its tag and version, while its dict-like behavior is to act as a container for the attributes described by the tag. The class definition is mostly empty because as a dict it can rely on AsdfType’s default implementation for (de)serialization.

Meanwhile, other AsdfType subclasses deserialize ASDF objects into instances of entirely separate classes. For example, asdf.types.core.complex.ComplexType handles complex number types, which aren’t natively supported by YAML. ComplexType includes an additional class attribute, types, that lists the types that it is able to handle. It also provides custom implementations of the to_tree and from_tree class methods, which enable it to serialize a complex value into the appropriate string, and later rebuild the complex value from that string. This additional code is necessary because ComplexType does not (de)serialize itself.

We won’t find an explicit list of AsdfType subclasses in the code; that list is assembled at runtime by AsdfType’s metaclass, asdf.types.AsdfTypeMeta. The list can be inspected in the console like so:

>>> import asdf
>>> asdf.types._all_asdftypes 
...

The AsdfType class attributes relevant to versioning are as follows:

  • name: the base name of the tag, without its version string. For example, the tag URI tag:stsci.edu:asdf/core/example-1.2.0 will have a name value of "core/example".

  • version: the primary tag version supported by the AsdfType. For the example above, version should be set to "1.2.0". This should be the latest version that the tag supports.

  • supported_versions: a set of tag versions that the AsdfType supports. In the above example, this might be {"1.0.0", "1.1.0", "1.2.0"}.

AsdfType selection rules

On read, the library will ideally be able to identify an AsdfType subclass that explicitly supports a given tag (either in the version class attribute or supported_versions. If that is not possible, it proceeds as follows:

  • Use the AsdfType that supports the latest version that is less than the tag version. For example, if the tag is example-1.2.0, and AsdfType are available for 1.1.0 and 1.3.0, it will use the 1.1.0 subclass.

  • If the above fails, use the earliest available AsdfType

  • If no AsdfType exists that supports any version of that tag, then ASDF will deserialize the data into vanilla diff.

The library does not currently emit a warning in either of the first two cases, but in the third case, a warning is emitted.

The rules for selecting an AsdfType for a given tag are implemented by asdf.type_index.AsdfTypeIndex.fix_yaml_tag.

On write, the library will read the version map that corresponds to the ASDF Standard version in use, which dictates the subset of tag versions that are available. From the subset of AsdfType subclasses that handle those tag versions, it selects the subclass that is able to handle the type of the core object being serialized.

If an object is not supported by an AsdfType, its serialization will be handled by pyyaml. If pyyaml doesn’t know how to serialize, it will raise yaml.representer.RepresenterError.

The rules for selecting an AsdfType for a given serializable object are implemented by asdf.type_index.AsdfTypeIndex.from_custom_type.

Implementing updates to the standard

Let’s assume that there is a new standard version, 2.0.0, which includes one entirely new core tag, core/new_object-1.0.0, one backwards-compatible update to an existing tag, core/updated_object-1.1.0, and one breaking change to an existing tag, core/breaking_object-2.0.0. The following sections walk through the steps we’ll need to take to support this new material.

Update the asdf-standard submodule commit pointer

The asdf-standard repository is integrated into the asdf repository as a submodule. To pull in new commits from the remote master ( assumed to be named origin:

$ cd asdf-standard
$ git fetch origin
$ git checkout origin/master

Support the new standard version

The list can be found in asdf.versioning.supported_versions. Add AsdfVersion("2.0.0") to the end of the list (maintaining the sort order). This new version will become the default for new files, but we can update the definition of asdf.versioning.default_version if that is undesirable.

Support the new tag

Tags for previously unsupported objects are straightforward, since we don’t need to worry about compatibility issues. Create a new AsdfType subclass with name and version set appropriately:

class NewObjectType(AsdfType):
    name = "core/new_object"
    version = "1.0.0"

In a real-life scenario, we’d need to actually support (de)serialization in some way, but those details are beyond the scope of this document.

Support the backwards-compatible tag

Since our updated_object-1.1.0 is backwards-compatible, we can share the same AsdfType subclass between it and the previous version. Presumably there exists an AsdfType that looks something like this:

class UpdatedObjectType(AsdfType):
    name = "core/updated_object"
    version = "1.0.0"

We’ll need to update the version, and list 1.0.0 as a supported version, so that this class can continue to handle it:

class UpdatedObjectType(AsdfType):
    name = "core/updated_object"
    version = "1.1.0"
    supported_versions = {"1.0.0", "1.1.0"}

Support the breaking tag

The tag with breaking changes, core/breaking_object-2.0.0, may not be easily supported by the same AsdfType as the previous version. In that case, we can create a new AsdfType for 2.0.0, and as long as the two subclasses have distinct version values and non-overlapping supported_versions sets, they should coexist peaceably.

If this is the existing AsdfType:

class BreakingObjectType(AsdfType):
    name = "core/breaking_object"
    version = "1.0.0"

The new AsdfType might look something like this:

class BreakingObjectType2(AsdfType):
    name = "core/breaking_object"
    version = "2.0.0"

CAUTION: We might be tempted here to simply update the original BreakingObjectType, but failing to handle an older version of the tag constitutes dropping support for any ASDF Standard version that relies on that tag. This should only be done after a deprecation period and with a major version release of the library, since files written by an older release will not be readable by the new code.