Binary block compressors

A Compressor implements a compression algorithm that can be used to transform the binary blocks of an AsdfFile. Each Compressor must provide a 4-byte compression code that identifies the algorithm. Once the Compressor is installed as part of an Extension plugin, users can pass this code as an argument to set_array_compression and as the all_array_compression argument to write_to and update.

See Additional block compressors for details on including a Compressor in an extension.

The Compressor interface

Every Compressor implementation must provide one required property and two required methods:

Compressor.label - A 4-byte compression code. Users pass this code to select a compression algorithm, and it is also stored in the binary block header to identify the algorithm that was applied to the block’s data.

Compressor.compress - The method that transforms the block’s bytes before they are written to an ASDF file. The positional argument is a memoryview object which is guaranteed to be 1D and contiguous. Compressors must be prepared to handle memoryview.itemsize > 1, in which case the length of the memoryview counts items rather than bytes. Any keyword arguments are passed through from the user and may be used to tune the compression algorithm. A compress method has no return value; instead, it is expected to yield bytes-like values until the input data has been fully compressed.

Compressor.decompress - The method that transforms the block’s bytes after they are read from an ASDF file. The first positional argument is an Iterable of bytes-like objects that each contain a chunk of the compressed input data. The second positional argument is a pre-allocated output array where the decompressed bytes should be written. The method is expected to return the number of bytes written to the output array.
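The three members described above can be sketched as follows. This is a minimal illustration that uses only the standard-library zlib module and does not import asdf. The class name ExampleZlibCompressor, the label b"xmpl", the level keyword, and the chunk size are all hypothetical choices made for this example; a real plugin would implement the Compressor interface provided by asdf rather than stand alone.

```python
import zlib


class ExampleZlibCompressor:
    """Illustrative stand-in for a Compressor implementation (hypothetical)."""

    @property
    def label(self):
        # Hypothetical 4-byte compression code.
        return b"xmpl"

    def compress(self, data, **kwargs):
        # "level" is an assumed tuning keyword passed through from the user.
        compressor = zlib.compressobj(kwargs.get("level", 6))
        # View the input as raw bytes so that itemsize > 1 is handled.
        view = data.cast("B")
        chunk_size = 64 * 1024
        for start in range(0, len(view), chunk_size):
            chunk = compressor.compress(view[start:start + chunk_size])
            if chunk:
                yield chunk
        # Emit whatever the compressor is still buffering.
        yield compressor.flush()

    def decompress(self, data, out, **kwargs):
        # Extra keyword arguments are accepted defensively and ignored.
        decompressor = zlib.decompressobj()
        out_view = memoryview(out).cast("B")
        written = 0
        for chunk in data:
            decoded = decompressor.decompress(chunk)
            out_view[written:written + len(decoded)] = decoded
            written += len(decoded)
        tail = decompressor.flush()
        out_view[written:written + len(tail)] = tail
        written += len(tail)
        # Report the number of bytes written to the output array.
        return written
```

Note that compress is a generator, so nothing is compressed until the caller iterates over it, and decompress writes directly into the pre-allocated output instead of building an intermediate buffer.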

Entry point performance considerations

For the good of asdf users everywhere, it’s important that entry point methods load as quickly as possible. All extensions must be loaded before an ASDF file is read, so every installed compressor is created at that point as well. Any compressor module that is slow to import, or any __init__ method that is slow to run, will delay the initial call to asdf.open. For that reason, we recommend that compressor authors minimize the number of imports in the module containing the Compressor implementation, and defer imports of compression libraries to the bodies of the Compressor.compress and Compressor.decompress methods. This prevents the library from being imported at all when reading ASDF files that do not use the Compressor’s algorithm.
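The deferred-import recommendation might look like the sketch below. The standard-library lzma module stands in for a heavier third-party compression library, and the class name LazyImportCompressor and the label b"lazy" are hypothetical. The module itself imports nothing, so importing it costs almost nothing until a block is actually compressed or decompressed.

```python
# No compression library is imported at module level.


class LazyImportCompressor:
    """Illustrative compressor that defers its library import (hypothetical)."""

    @property
    def label(self):
        # Hypothetical 4-byte compression code.
        return b"lazy"

    def compress(self, data, **kwargs):
        import lzma  # deferred: only loaded when a block is compressed

        compressor = lzma.LZMACompressor()
        chunk = compressor.compress(data.cast("B"))
        if chunk:
            yield chunk
        yield compressor.flush()

    def decompress(self, data, out, **kwargs):
        import lzma  # deferred: only loaded when a block is decompressed

        decompressor = lzma.LZMADecompressor()
        out_view = memoryview(out).cast("B")
        written = 0
        for chunk in data:
            decoded = decompressor.decompress(chunk)
            out_view[written:written + len(decoded)] = decoded
            written += len(decoded)
        return written
```

Python caches imported modules, so repeating the import statement inside each method adds only a dictionary lookup after the first call.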