Overview

Let’s start with a few basic ASDF use cases. They introduce some of the core features of ASDF and show how to get started with using ASDF in your own projects.

To follow along with this tutorial, you will need to install the asdf package. See Installation for details.

Hello World

At its core, ASDF is a way of saving nested data structures to YAML. Here we save a dict with the key/value pair 'hello': 'world'.

from asdf import AsdfFile

# Make the tree structure, and create an AsdfFile from it.
tree = {'hello': 'world'}
ff = AsdfFile(tree)
ff.write_to("test.asdf")

# You can also make the AsdfFile first, and modify its tree directly:
ff = AsdfFile()
ff.tree['hello'] = 'world'
ff.write_to("test.asdf")

The resulting test.asdf file contains:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 2.7.4}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 2.7.4}
hello: world
...

Creating Files

We’re going to store several numpy arrays and other data in an ASDF file. We do this by creating a “tree”, which is simply a dict, and passing it to the constructor of AsdfFile:

import asdf
import numpy as np

# Create some data
sequence = np.arange(100)
squares = sequence**2
random = np.random.random(100)

# Store the data in an arbitrarily nested dictionary
tree = {
    'foo': 42,
    'name': 'Monty',
    'sequence': sequence,
    'powers': {'squares': squares},
    'random': random
}

# Create the ASDF file object from our data tree
af = asdf.AsdfFile(tree)

# Write the data to a new file
af.write_to('example.asdf')

If we open the newly created file, we can see some of the key features of ASDF on display:

#ASDF 1.0.0
#ASDF_STANDARD 1.2.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 2.0.0}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: {name: asdf, version: 2.0.0}
foo: 42
name: Monty
powers:
  squares: !core/ndarray-1.0.0
    source: 1
    datatype: int64
    byteorder: little
    shape: [100]
random: !core/ndarray-1.0.0
  source: 2
  datatype: float64
  byteorder: little
  shape: [100]
sequence: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [100]
...

The metadata in the file mirrors the structure of the tree that was stored. It is hierarchical and human-readable. Notice that the library has added metadata (the asdf_library and history entries) that was not explicitly given by the user. Notice also that the numerical array data is not stored in the metadata tree itself. Instead, it is stored as binary data blocks below the metadata section (not shown here).

It is possible to compress the array data when writing the file:

af.write_to('compressed.asdf', all_array_compression='zlib')

Available compression algorithms are 'zlib', 'bzp2' (the label ASDF uses for bzip2), and 'lz4'.

Reading Files

To read an existing ASDF file, we simply use the top-level open function of the asdf package:

import asdf

af = asdf.open('example.asdf')

The open function also works as a context manager, which closes the file when the block exits:

with asdf.open('example.asdf') as af:
    ...

To access the data stored in the file, use the top-level AsdfFile.tree attribute:

>>> import asdf
>>> af = asdf.open('example.asdf')
>>> af.tree
{'asdf_library': {'author': 'Space Telescope Science Institute',
  'homepage': 'http://github.com/spacetelescope/asdf',
  'name': 'asdf',
  'version': '1.3.1'},
 'foo': 42,
 'name': 'Monty',
 'powers': {'squares': <array (unloaded) shape: [100] dtype: int64>},
 'random': <array (unloaded) shape: [100] dtype: float64>,
 'sequence': <array (unloaded) shape: [100] dtype: int64>}

The tree is simply a Python dict, and nodes are accessed like any other dictionary entry:

>>> af.tree['name']
'Monty'
>>> af.tree['powers']
{'squares': <array (unloaded) shape: [100] dtype: int64>}
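
Because the tree behaves like a nested dict, ordinary Python recursion is enough to list everything it contains. The helper iter_leaves below is hypothetical (it is not part of the asdf package) and is shown on a plain dict so that it runs without any file; the same function should work on af.tree under the assumption that nested nodes are dict instances.

```python
def iter_leaves(node, prefix=()):
    """Yield (path, value) pairs for every non-dict node in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, prefix + (key,))
    else:
        yield prefix, node


# A small stand-in for the tree used in this tutorial.
tree = {"foo": 42, "name": "Monty", "powers": {"squares": [0, 1, 4]}}

leaves = dict(iter_leaves(tree))
for path, value in leaves.items():
    print("/".join(path), "->", value)
```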

Array data remains unloaded until it is explicitly accessed:

>>> af.tree['powers']['squares']
array([   0,    1,    4,    9,   16,   25,   36,   49,   64,   81,  100,
        121,  144,  169,  196,  225,  256,  289,  324,  361,  400,  441,
        484,  529,  576,  625,  676,  729,  784,  841,  900,  961, 1024,
       1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849,
       1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916,
       3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225,
       4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776,
       5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569,
       7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604,
       9801])

>>> import numpy as np
>>> expected = [x**2 for x in range(100)]
>>> np.equal(af.tree['powers']['squares'], expected).all()
True

By default, uncompressed data blocks are memory mapped for efficient access. To disable memory mapping and read the arrays into memory instead, pass copy_arrays=True to open:

af = asdf.open('example.asdf', copy_arrays=True)
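
For readers unfamiliar with memory mapping, the difference between a mapped array and a copied one can be illustrated with plain numpy. This sketch does not use asdf and is not a description of its internals; it only shows the general idea on a raw binary file, with an illustrative file name and variable names.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "block.bin")
    np.arange(100, dtype=np.int64).tofile(path)

    # A memory-mapped array reads from the file on demand.
    mapped = np.memmap(path, dtype=np.int64, mode="r")

    # np.array copies the data into ordinary process memory.
    copied = np.array(mapped)

    mapped_is_memmap = isinstance(mapped, np.memmap)
    copied_is_memmap = isinstance(copied, np.memmap)
    values_match = bool(np.array_equal(mapped, copied))

    # Release the mapping before the temporary directory is removed.
    del mapped

print(mapped_is_memmap, copied_is_memmap, values_match)
```

A mapped array avoids loading data that is never touched, while a copy stays valid regardless of what later happens to the underlying file.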