cf.Field.HDF_chunks

Field.HDF_chunks(*chunksizes)

Specify HDF5 chunks for the data array.

Chunking refers to a storage layout where the data array is partitioned into fixed-size multi-dimensional chunks when written to a netCDF4 file on disk. Chunking is ignored if the data array is written to a netCDF3 format file.

A chunk has the same rank as the data array, with no more elements than the data array along each axis. The chunk is defined by a dictionary in which the keys identify axes and the values are the chunk sizes for those axes.

If a given chunk size for an axis is larger than the axis size, then the size of the axis at the time of writing to disk will be used instead.

If chunk sizes have been specified for some but not all axes, then each unspecified chunk size is assumed to be the full size of its axis.

If no chunk sizes have been set for any axes, then the netCDF default chunking is used (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html).
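The three rules above can be sketched in plain Python. This is only an illustration of the documented behaviour, not code from cf: the helper name effective_chunks is hypothetical, axes are identified by position only, and a return value of None stands for "use the netCDF default".

```python
def effective_chunks(axis_sizes, requested):
    """Return the chunk shape implied by the rules above (illustrative only).

    axis_sizes: sequence of axis lengths, in data array order.
    requested: dict mapping axis position to a chunk size or None.
    """
    # No chunk size set for any axis: defer to the netCDF default,
    # signalled here by returning None.
    if all(requested.get(i) is None for i in range(len(axis_sizes))):
        return None

    shape = []
    for i, size in enumerate(axis_sizes):
        chunk = requested.get(i)
        if chunk is None:
            # Unspecified axis: the chunk spans the full axis.
            chunk = size
        # A chunk larger than the axis is reduced to the axis size.
        shape.append(min(chunk, size))
    return tuple(shape)


print(effective_chunks((3650, 64, 128), {0: 365, 2: 1000}))  # (365, 64, 128)
print(effective_chunks((3650, 64, 128), {}))                 # None
```

In the first call the requested size of 1000 for the last axis is reduced to the axis size of 128, and the unspecified middle axis takes its full size of 64.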

A detailed discussion of HDF chunking and I/O performance is available at https://www.hdfgroup.org/HDF5/doc/H5.user/Chunking.html and http://www.unidata.ucar.edu/software/netcdf/workshops/2011/nc4chunking. As a rule of thumb, the chunk shape should match as closely as possible the size and shape of the data block that users will typically read from the file.
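To see why the chunk shape matters, the following rough sketch counts how many chunks a read has to touch. It assumes the read block starts at the array origin and is aligned with the chunk grid; the function name chunks_touched is hypothetical and the model ignores caching and compression.

```python
import math


def chunks_touched(chunk_shape, read_shape):
    """Count the chunks overlapped by a read block starting at the origin."""
    return math.prod(
        math.ceil(r / c) for r, c in zip(read_shape, chunk_shape)
    )


# Reading one full time series at a single grid point from a
# (3650, 64, 128) array.
read = (3650, 1, 1)
print(chunks_touched((1, 64, 128), read))  # 3650: one chunk per time step
print(chunks_touched((3650, 1, 1), read))  # 1: chunk shape matches the read
```

Because a whole chunk must be read from disk even if only one element of it is needed, the first layout reads far more data than the second for this access pattern.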

Chunking the metadata

The coordinate, cell measure, and ancillary constructs are not automatically chunked, but they may be chunked manually. For example, a two-dimensional latitude coordinate could be chunked as follows (see cf.AuxiliaryCoordinate.HDF_chunks for details):

>>> f.coord('latitude').HDF_chunks({0: 10, 1: 15})

In version 2.0, the metadata will be automatically chunked.

Chunking via cf.write

Chunking may also be defined via a parameter to the cf.write function, in which case any axis chunk sizes set on the field take precedence.
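The precedence rule can be pictured as a per-axis merge. The sketch below is an assumption about how such a merge might look, not the cf.write implementation: the function name resolve_chunks is hypothetical and axes are keyed by position.

```python
def resolve_chunks(write_chunks, field_chunks):
    """Merge chunk sizes so that sizes set on the field take precedence."""
    resolved = dict(write_chunks)
    for axis, size in field_chunks.items():
        # Only axes with a size actually set on the field override
        # the sizes given at write time.
        if size is not None:
            resolved[axis] = size
    return resolved


# Chunk sizes given at write time for all three axes, with a chunk
# size set on the field for axis 0 only.
print(resolve_chunks({0: 100, 1: 32, 2: 64}, {0: 365, 1: None, 2: None}))
# {0: 365, 1: 32, 2: 64}
```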

New in version 1.1.13.

See also

cf.write

Examples 1:

To define chunks which are the full size for each axis except for the time axis which is to have a chunk size of 12:

>>> old_chunks = f.HDF_chunks({'T': 12})

Parameters:
chunksizes : dict or None, optional

Specify the chunk sizes for axes of the field. Axes are given by the dictionary keys, with the chunk size for those axes as the dictionary values. A dictionary key, axes, selects the axes that would be returned by the field’s axes method, i.e. by f.axes(axes). See cf.Field.axes for details. In the special case of chunksizes being None, chunking is set to the netCDF default.

Example:

To set the chunk size for time axes to 365: {'T': 365}.

Example:

To set the chunk size for the first and third data array axes to 100: {0: 100, 2: 100}, or equivalently {(0, 2): 100}.

Example:

To set the chunk size for the longitude axis to 100 and for the air temperature axis to 5: {'X': 100, 'air_temperature': 5}.

Example:

To set the chunk size for all axes to 10: {None: 10}. This works because f.axes(None) returns all field axes.

Example:

To set the chunking to the netCDF default: None.
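The equivalence between the key forms in the examples above can be sketched as a key expansion. This is a simplified illustration, not the cf implementation: the helper expand_keys is hypothetical and handles only integer positions, tuples of positions, and a None key. Identity keys such as 'T' or 'X' are resolved by f.axes and are out of scope here.

```python
def expand_keys(chunksizes, ndim):
    """Expand tuple and None keys into one entry per axis position."""
    expanded = {}
    for key, size in chunksizes.items():
        if key is None:
            # A None key selects every axis.
            positions = range(ndim)
        elif isinstance(key, tuple):
            positions = key
        else:
            positions = (key,)
        for pos in positions:
            expanded[pos] = size
    return expanded


print(expand_keys({(0, 2): 100}, 3))  # {0: 100, 2: 100}
print(expand_keys({None: 10}, 3))     # {0: 10, 1: 10, 2: 10}
```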

Returns:
out : dict

The chunk sizes prior to the new setting, or the current chunk sizes if no new values are specified.

Examples 2:
>>> f
<CF Field: air_temperature(time(3650), latitude(64), longitude(128)) K>
>>> f.HDF_chunks()
{0: None, 1: None, 2: None}
>>> f.HDF_chunks({'T': 365, 2: 1000})
{0: None, 1: None, 2: None}
>>> f.HDF_chunks({'X': None})
{0: 365, 1: None, 2: 1000}
>>> f.HDF_chunks(None)
{0: 365, 1: None, 2: None}
>>> f.HDF_chunks()
{0: None, 1: None, 2: None}
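The set-and-return-previous behaviour in these examples can be modelled with a small stand-in class. This is a toy sketch, not part of cf: the class ChunkSettings is hypothetical, its method merely mirrors the name HDF_chunks, and only integer axis positions are accepted as keys.

```python
class ChunkSettings:
    """Toy stand-in for the chunk-size state of a field."""

    def __init__(self, ndim):
        self._chunks = {i: None for i in range(ndim)}

    def HDF_chunks(self, *chunksizes):
        # Always return the settings as they were before this call.
        old = dict(self._chunks)
        if not chunksizes:
            return old  # Query only, nothing changes.
        new = chunksizes[0]
        if new is None:
            # Reset every axis to the netCDF default.
            self._chunks = {i: None for i in self._chunks}
        else:
            self._chunks.update(new)
        return old


s = ChunkSettings(3)
print(s.HDF_chunks({0: 365, 2: 1000}))  # {0: None, 1: None, 2: None}
print(s.HDF_chunks({2: None}))          # {0: 365, 1: None, 2: 1000}
print(s.HDF_chunks(None))               # {0: 365, 1: None, 2: None}
print(s.HDF_chunks())                   # {0: None, 1: None, 2: None}
```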