Examples

Hello World

In its simplest form, ASDF is a way of saving nested data structures to YAML. Here we save a dictionary with the key/value pair 'hello': 'world'.

from asdf import AsdfFile

# Make the tree structure, and create an AsdfFile from it.
tree = {'hello': 'world'}
ff = AsdfFile(tree)
ff.write_to("test.asdf")

# You can also make the AsdfFile first, and modify its tree directly:
ff = AsdfFile()
ff.tree['hello'] = 'world'
ff.write_to("test.asdf")

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
hello: world
...
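
Because the file opens with a one-line #ASDF comment followed by plain YAML, its header can be inspected with the standard library alone. The following is a minimal sketch, not part of asdf: the helper name read_asdf_version is hypothetical, and it assumes the file begins with a "#ASDF <version>" line as in the listing above.

```python
import io

def read_asdf_version(fd):
    """Return the version string from the leading '#ASDF x.y.z' line."""
    first_line = fd.readline().rstrip(b"\r\n")
    magic = b"#ASDF "
    if not first_line.startswith(magic):
        raise ValueError("not an ASDF file: missing '#ASDF' header line")
    return first_line[len(magic):].decode("ascii")

# Stand-in for the first bytes of test.asdf, copied from the listing above.
sample = io.BytesIO(b"#ASDF 1.0.0\n%YAML 1.1\n%TAG ! tag:stsci.edu:asdf/\n")
version = read_asdf_version(sample)
print(version)
```

For a file on disk, the same helper could be called on a handle opened in binary mode.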

Saving arrays

Beyond the basic data types of dictionaries, lists, strings and numbers, the most important thing ASDF can save is arrays. It’s as simple as putting a NumPy array somewhere in the tree. Here, we save an 8x8 array of random floating-point numbers. Note that the YAML part contains information about the structure (size and data type) of the array, but the actual array content is in a binary block.

from asdf import AsdfFile
import numpy as np

tree = {'my_array': np.random.rand(8, 8)}
ff = AsdfFile(tree)
ff.write_to("test.asdf")

Note

In the file examples below, the first YAML part is shown exactly as it appears in the file. The BLOCK sections are stored as binary data in the file, but are presented in human-readable form on this page.

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 8]
...
BLOCK 0:
    allocated_size: 512
    used_size: 512
    data_size: 512
    data: b73b154fb031e93fa02df3a075cba13fd46e6814...
#ASDF BLOCK INDEX
%YAML 1.1
--- [353]
...
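
The data_size of 512 in the block above follows directly from the shape and datatype recorded in the YAML. As a small illustrative sketch (plain Python, no asdf involved), the size of an uncompressed block is the number of elements times the size of one element:

```python
import math
import struct

shape = (8, 8)
# Size in bytes of one little-endian float64 value.
itemsize = struct.calcsize("<d")

# Number of elements times bytes per element.
data_size = math.prod(shape) * itemsize
print(data_size)
```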

Schema validation

In the current draft of the ASDF schema, very few elements are defined at the top level; for the most part, the top level can contain any elements. One of the few specified elements is data: it must be an array, and is used to specify the “main” data content (for some definition of “main”) so that tools that merely want to view or preview the ASDF file have a standard location to find the most interesting data. If you set this to anything but an array, asdf raises a ValidationError:

>>> from asdf import AsdfFile
>>> tree = {'data': 'Not an array'}
>>> AsdfFile(tree)  
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:asdf/core/ndarray-1.0.0', got
'tag:yaml.org,2002:str'
...

This validation happens only when an AsdfFile is instantiated, read or saved, so it’s still possible to get the tree into an invalid intermediate state:

>>> from asdf import AsdfFile
>>> ff = AsdfFile()
>>> ff.tree['data'] = 'Not an array'
>>> # The ASDF file is now invalid, but asdf will tell us when
>>> # we write it out.
>>> ff.write_to('test.asdf')  
Traceback (most recent call last):
...
ValidationError: mismatched tags, wanted
'tag:stsci.edu:asdf/core/ndarray-1.0.0', got
'tag:yaml.org,2002:str'
...

Sharing of data

Arrays that are views on the same data automatically share the same data in the file. In this example an array and a subview on that same array are saved to the same file, resulting in only a single block of data being saved.

from asdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
subset = my_array[2:4,3:6]
tree = {
    'my_array': my_array,
    'subset':   subset
}
ff = AsdfFile(tree)
ff.write_to("test.asdf")

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 8]
subset: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [2, 3]
  offset: 152
  strides: [64, 8]
...
BLOCK 0:
    allocated_size: 512
    used_size: 512
    data_size: 512
    data: fed8a5a36292de3fc9742f478bcbe33f2a04cf63...
#ASDF BLOCK INDEX
%YAML 1.1
--- [482]
...
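
The offset and strides recorded for subset in the listing above can be reproduced by hand. This sketch uses plain Python arithmetic and assumes the parent array is stored in row-major (C) order, which is what np.random.rand returns:

```python
itemsize = 8            # bytes per float64
parent_shape = (8, 8)   # my_array, stored row-major (C order)

# Byte strides of the parent: step to the next row, step to the next column.
strides = (parent_shape[1] * itemsize, itemsize)

# subset = my_array[2:4, 3:6] starts at row 2, column 3.
row_start, col_start = 2, 3
offset = row_start * strides[0] + col_start * strides[1]

subset_shape = (4 - 2, 6 - 3)
print(offset, list(strides), list(subset_shape))
```

The view therefore needs no data of its own: a starting offset, a shape and the parent's strides are enough to locate every element inside block 0.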

Saving inline arrays

For small arrays like these, you may not care about the efficiency of a binary representation and may prefer to save the content directly in the YAML tree. The set_array_storage method sets the type of block used for the data of a given array: internal, external or inline.

  • internal: The default. The array data will be stored in a binary block in the same ASDF file.
  • external: Store the data in a binary block in a separate ASDF file.
  • inline: Store the data as YAML inline in the tree.

from asdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = AsdfFile(tree)
ff.set_array_storage(my_array, 'inline')
ff.write_to("test.asdf")

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  data:
  - [0.5456266967971288, 0.4317107421294396, 0.5287068110840725, 0.28251023187580415,
    0.9320987210870508, 0.22255185080078488, 0.09525305414563379, 0.7699704461037513]
  - [0.3151041558282649, 0.6681743291152566, 0.8812497094057354, 0.14558063656945297,
    0.25656158545928076, 0.23579111970388444, 0.24037146154097044, 0.6645671630976169]
  - [0.6194576673417725, 0.30086561161753755, 0.19913519076816832, 0.9257413227079443,
    0.2886004051119241, 0.20348742711899137, 0.16109329832024422, 0.9545821649061179]
  - [0.35206027447031574, 0.6703522443418255, 0.3874959243071987, 0.229113329747213,
    0.08841674420082579, 0.39016847990097114, 0.5010751719422006, 0.5682131436544698]
  - [0.2765049239325059, 0.7589027466154844, 0.6316227401736904, 0.9441958054464983,
    0.9152572905566972, 0.2704158565602697, 0.9473659589855606, 0.9076415767902133]
  - [0.45742851896138603, 0.9994086464127573, 0.12959035739986058, 0.1253267606175884,
    0.19922761260828836, 0.16226655570144555, 0.6507213278165754, 0.40628689181529243]
  - [0.02595763103925508, 0.913914184983345, 0.35075284374190885, 0.6809966874798863,
    0.48427027817320056, 0.01743840814577513, 0.9352790688748627, 0.11334678418163247]
  - [0.17567847788971624, 0.260077175212994, 0.02522040143379478, 0.7689959501909805,
    0.24905748691177376, 0.8722415963540282, 0.6344327171211053, 0.4692444771869736]
  datatype: float64
  shape: [8, 8]
...

Saving external arrays

ASDF files may also be saved in “exploded form”, in multiple files:

  • An ASDF file containing only the header and tree.
  • n ASDF files, each containing a single block.

Exploded form is useful in the following scenarios:

  • Some text editors cannot handle the hybrid text and binary nature of an ASDF file, and therefore either can’t open an ASDF file or would break it upon saving. In this scenario, a user may explode the ASDF file, edit the YAML portion as a pure YAML file, and implode the parts back together.
  • Over a network protocol, such as HTTP, a client may only need to access some of the blocks. While reading a subset of the file can be done using HTTP Range headers, it still requires one (small) request per block to “jump” through the file to determine the start location of each block. This can become time-consuming over a high-latency network if there are many blocks. Exploded form allows each block to be requested directly by a specific URI.
  • An ASDF writer may stream a table to disk, when the size of the table is not known at the outset. Using exploded form simplifies this, since a standalone file containing a single table can be iteratively appended to without worrying about any blocks that may follow it.

To save a block in an external file, set its block type to 'external'.

from asdf import AsdfFile
import numpy as np

my_array = np.random.rand(8, 8)
tree = {'my_array': my_array}
ff = AsdfFile(tree)

# On an individual block basis:
ff.set_array_storage(my_array, 'external')
ff.write_to("test.asdf")

# Or for every block:
ff.write_to("test.asdf", all_array_storage='external')

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_array: !core/ndarray-1.0.0
  source: test0000.asdf
  datatype: float64
  byteorder: little
  shape: [8, 8]
...

test0000.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
...
BLOCK 0:
    allocated_size: 512
    used_size: 512
    data_size: 512
    data: 00898e92e526b63f60953d70dbc1a23f7078cf40...
#ASDF BLOCK INDEX
%YAML 1.1
--- [255]
...

Streaming array data

In certain scenarios, you may want to stream data to disk, rather than writing an entire array of data at once. For example, it may not be possible to fit the entire array in memory, or you may want to save data from a device as it comes in to prevent data loss. The ASDF standard allows exactly one streaming block per file where the size of the block isn’t included in the block header, but instead is implicitly determined to include all of the remaining contents of the file. By definition, it must be the last block in the file.

To use streaming, rather than including a NumPy array object in the tree, you include an asdf.Stream object which sets up the structure of the streamed data, but will not write out the actual content. The file handle’s write method is then used to manually write out the binary data.

from asdf import AsdfFile, Stream
import numpy as np

tree = {
    # Each "row" of data will have 128 entries.
    'my_stream': Stream([128], np.float64)
}

ff = AsdfFile(tree)
with open('test.asdf', 'wb') as fd:
    ff.write_to(fd)
    # Write 100 rows of data, one row at a time.  ``write``
    # expects the raw binary bytes, not an array, so we use
    # ``tobytes()``.
    for i in range(100):
        fd.write(np.array([i] * 128, np.float64).tobytes())

test.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_stream: !core/ndarray-1.0.0
  source: -1
  datatype: float64
  byteorder: little
  shape: ['*', 128]
...
BLOCK 0:
    flags: BLOCK_FLAG_STREAMED
    allocated_size: 0
    used_size: 0
    data_size: 0
    data: 0000000000000000000000000000000000000000...
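
Since the streamed block carries no size in its header, a reader has to infer the '*' dimension from however many bytes remain. The sketch below illustrates that arithmetic with the standard library only; the in-memory buffer is a stand-in for the streamed payload (no ASDF header or block header is written), so it is an assumption-laden illustration rather than how asdf itself reads the file.

```python
import io
import struct

row_length = 128
row_format = "<%dd" % row_length          # 128 little-endian float64 values
row_size = struct.calcsize(row_format)    # bytes per row

# Stand-in for the tail of the file: only the streamed payload.
payload = io.BytesIO()
for i in range(100):
    payload.write(struct.pack(row_format, *([float(i)] * row_length)))

data = payload.getvalue()

# The '*' dimension is whatever the remaining bytes add up to.
n_rows, leftover = divmod(len(data), row_size)
last_row = struct.unpack_from(row_format, data, (n_rows - 1) * row_size)
print(n_rows, leftover, last_row[0])
```

A non-zero leftover would indicate a partially written final row, for example after an interrupted acquisition.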

References

ASDF files may reference items in the tree in other ASDF files. The syntax used in the file for this is called “JSON Pointer”, but users of asdf can largely ignore that.

First, we’ll create an ASDF file with a couple of arrays in it:

from asdf import AsdfFile
import numpy as np

tree = {
    'a': np.arange(0, 10),
    'b': np.arange(10, 20)
}

target = AsdfFile(tree)
target.write_to('target.asdf')

target.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [10]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
b: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [10]
...
BLOCK 0:
    allocated_size: 80
    used_size: 80
    data_size: 80
    data: 0000000000000000010000000000000002000000...
BLOCK 1:
    allocated_size: 80
    used_size: 80
    data_size: 80
    data: 0a000000000000000b000000000000000c000000...
#ASDF BLOCK INDEX
%YAML 1.1
--- [429, 563]
...

Then we will reference those arrays in a couple of different ways. First, we’ll load the target file in Python and use the make_reference method to generate a reference to array a. Second, we’ll work at the lower level by manually writing a JSON Pointer to array b, which doesn’t require loading or having access to the target file.

ff = AsdfFile()

with AsdfFile.open('target.asdf') as target:
    ff.tree['my_ref_a'] = target.make_reference(['a'])

ff.tree['my_ref_b'] = {'$ref': 'target.asdf#b'}

ff.write_to('source.asdf')

source.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_ref_a: {$ref: target.asdf#a}
my_ref_b: {$ref: target.asdf#b}
...
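
To make the shape of a $ref value concrete, the following sketch splits a reference such as target.asdf#b into a file name and a path, then walks a nested structure along that path. It is a simplified illustration, not asdf's implementation: the helper names split_ref and follow are hypothetical, the target tree is an in-memory stand-in, and JSON Pointer escape sequences are not handled.

```python
def split_ref(ref):
    """Split 'file#pointer' into (file, [path components])."""
    filename, _, fragment = ref.partition("#")
    parts = [p for p in fragment.split("/") if p]
    return filename, parts

def follow(tree, parts):
    """Walk a nested dict/list along the given path components."""
    node = tree
    for part in parts:
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node

# Hypothetical in-memory stand-in for the tree stored in target.asdf.
target_tree = {"a": list(range(0, 10)), "b": list(range(10, 20))}

filename, parts = split_ref("target.asdf#b")
value = follow(target_tree, parts)
print(filename, parts, value[0])
```

Because the path is a location in the tree rather than a named identifier, any element can be addressed this way, including items nested several levels deep.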

Calling find_references will look up all of the references so they can be used as if they were local to the tree. It doesn’t actually move any of the data, and keeps the references as references.

with AsdfFile.open('source.asdf') as ff:
    ff.find_references()
    assert ff.tree['my_ref_b'].shape == (10,)

On the other hand, calling resolve_references places all of the referenced content directly in the tree, so when we write it out again, all of the external references are gone, with the literal content in its place.

with AsdfFile.open('source.asdf') as ff:
    ff.resolve_references()
    ff.write_to('resolved.asdf')

resolved.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
my_ref_a: !core/ndarray-1.0.0
  source: 0
  datatype: int64
  byteorder: little
  shape: [10]
my_ref_b: !core/ndarray-1.0.0
  source: 1
  datatype: int64
  byteorder: little
  shape: [10]
...
BLOCK 0:
    allocated_size: 80
    used_size: 80
    data_size: 80
    data: 0000000000000000010000000000000002000000...
BLOCK 1:
    allocated_size: 80
    used_size: 80
    data_size: 80
    data: 0a000000000000000b000000000000000c000000...
#ASDF BLOCK INDEX
%YAML 1.1
--- [443, 577]
...

YAML provides a similar feature, anchors and aliases, which supports references within the same file. asdf supports these, but the JSON Pointer approach is generally favored because:

  • It is possible to reference elements in another file.
  • Elements are referenced by location in the tree, not by an identifier, so everything can be referenced.

Anchors and aliases are handled automatically by asdf when the data structure is recursive. For example, here is a dictionary that contains a reference to itself, so it appears more than once in the same tree:

d = {'foo': 'bar'}
d['baz'] = d
tree = {'d': d}

ff = AsdfFile(tree)
ff.write_to('anchors.asdf')

anchors.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
d:
  baz:
    baz: &id001
      baz: *id001
      foo: bar
    foo: bar
  foo: bar
...
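
The reason an anchor and alias are needed here can be seen in plain Python, without asdf: the dictionary is a single object that is reachable from inside itself, so it cannot be written out as a finite nested mapping. A short sketch:

```python
d = {'foo': 'bar'}
d['baz'] = d   # the dictionary now contains itself

# Following 'baz' any number of times always lands on the same object.
same_object = d['baz'] is d and d['baz']['baz'] is d
nested_value = d['baz']['baz']['baz']['foo']
print(same_object, nested_value)
```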

Compression

Individual blocks in an ASDF file may be compressed.

You can compress all blocks with zlib or bzip2:

from asdf import AsdfFile
import numpy as np

tree = {
    'a': np.random.rand(256, 256),
    'b': np.random.rand(512, 512)
}

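target = AsdfFile(tree)
# Compress every block with zlib:
target.write_to('target.asdf', all_array_compression='zlib')
# Or with bzip2, which is identified as 'bzp2'. This overwrites the file above.
target.write_to('target.asdf', all_array_compression='bzp2')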

target.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [256, 256]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
b: !core/ndarray-1.0.0
  source: 1
  datatype: float64
  byteorder: little
  shape: [512, 512]
...
BLOCK 0:
    compression: bzp2
    allocated_size: 500870
    used_size: 500870
    data_size: 524288
    data: 4c0a61827701e83f22269fedc7ebd73fd2a952fa...
BLOCK 1:
    compression: bzp2
    allocated_size: 2002130
    used_size: 2002130
    data_size: 2097152
    data: ed2c91f9bfc1ea3f00389698ba51733f94f099d9...
#ASDF BLOCK INDEX
%YAML 1.1
--- [445, 501369]
...
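
In the blocks above, data_size is the uncompressed size and used_size is what the compressed bytes occupy. The sketch below reproduces that relationship with Python's standard zlib and bz2 modules on a smaller buffer of random float64 values. It is an illustration only and does not write an ASDF file; as in the listing, random data leaves little for either codec to remove.

```python
import bz2
import random
import struct
import zlib

random.seed(0)
# 4096 random float64 values, packed little-endian.
values = [random.random() for _ in range(4096)]
raw = struct.pack("<%dd" % len(values), *values)

zlib_bytes = zlib.compress(raw)
bzip2_bytes = bz2.compress(raw)

print("data_size:", len(raw))
print("zlib used_size:", len(zlib_bytes))
print("bzip2 used_size:", len(bzip2_bytes))

# Both codecs are lossless: decompressing restores the original bytes.
zlib_ok = zlib.decompress(zlib_bytes) == raw
bzip2_ok = bz2.decompress(bzip2_bytes) == raw
```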

Saving history entries

asdf has a convenience method for recording the history of transformations that have been performed on a file.

Given an AsdfFile object, call add_history_entry with a description of the change and, optionally, a description of the software (that is, your software, not asdf) that performed the operation.

from asdf import AsdfFile
import numpy as np

tree = {
    'a': np.random.rand(256, 256)
}

ff = AsdfFile(tree)
ff.add_history_entry(
    u"Initial random numbers",
    {u'name': u'asdf examples',
     u'author': u'John Q. Public',
     u'homepage': u'http://github.com/spacetelescope/asdf',
     u'version': u'0.1'})
ff.write_to('example.asdf')

example.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
a: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [256, 256]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
history:
- !core/history_entry-1.0.0
  description: Initial random numbers
  software: !core/software-1.0.0 {author: John Q. Public, homepage: 'http://github.com/spacetelescope/asdf',
    name: asdf examples, version: '0.1'}
  time: 2017-07-07 15:45:09.857781
...
BLOCK 0:
    allocated_size: 524288
    used_size: 524288
    data_size: 524288
    data: 9062936f33a4e33fcc8b5ac0c9badb3f25c1e9bf...
#ASDF BLOCK INDEX
%YAML 1.1
--- [610]
...

Saving ASDF in FITS

Sometimes you may need to store the structured data supported by ASDF inside of a FITS file in order to be compatible with legacy tools that support only FITS. This can be achieved by adding a special extension named ASDF to the FITS file, containing the YAML tree from an ASDF file. The array tags within the ASDF tree point directly to other binary extensions in the FITS file.

First, make a FITS file in the usual way with astropy.io.fits. Here, we are building a FITS file from scratch, but it could also have been loaded from a file.

This FITS file has two image extensions, named SCI and DQ.

import numpy as np
from astropy.io import fits

hdulist = fits.HDUList()
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float64), name='SCI'))
hdulist.append(fits.ImageHDU(np.arange(512, dtype=np.float64), name='DQ'))

Next we make a tree structure out of the data in the FITS file. Importantly, we store in the tree the same array objects that are in the FITS HDUList. Because the objects are shared, asdf can point to the data in the regular FITS extensions instead of saving a second copy.

tree = {
    'model': {
        'sci': {
            'data': hdulist['SCI'].data,
        },
        'dq': {
            'data': hdulist['DQ'].data,
        }
    }
}

Now we take both the FITS HDUList and the ASDF tree and create an AsdfInFits object. It behaves identically to the AsdfFile object, but reads and writes this special ASDF-in-FITS format.

from asdf import fits_embed

ff = fits_embed.AsdfInFits(hdulist, tree)
ff.write_to('embedded_asdf.fits')

The special ASDF extension in the resulting FITS file looks like the following. Note that the data source of the arrays uses the fits: prefix to indicate that the data comes from a FITS extension.

content.asdf

#ASDF 1.0.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.2.1}
model:
  dq:
    data: !core/ndarray-1.0.0
      source: fits:DQ,1
      datatype: float64
      byteorder: big
      shape: [512]
  sci:
    data: !core/ndarray-1.0.0
      source: fits:SCI,1
      datatype: float64
      byteorder: big
      shape: [512]
...
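
The byteorder: big entries above reflect that FITS stores binary data big-endian, whereas the arrays in the earlier listings were recorded as little-endian. As a small standard-library sketch of what that difference means for a single float64 value:

```python
import struct

value = 1.5
big = struct.pack(">d", value)      # byte order used by FITS
little = struct.pack("<d", value)   # byte order in the earlier ASDF listings

print(big.hex())
print(little.hex())

# The two encodings hold the same eight bytes in opposite order.
reversed_match = big == little[::-1]
```

Recording the byte order alongside the datatype is what lets a reader interpret the same extension data correctly on any machine.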

To load an ASDF-in-FITS file, first open it with astropy.io.fits, and then pass that HDU list to AsdfInFits:

with fits.open('embedded_asdf.fits') as hdulist:
    # Use a name other than ``asdf`` to avoid shadowing the package.
    with fits_embed.AsdfInFits.open(hdulist) as ff:
        science = ff.tree['model']['sci']