cf-python

Go to the cf-python home page for more documentation, downloads and source code.

Space structure and behaviour

An implementation of the CF data model.

Provide the objects necessary to represent a space according to the CF data model, with functions for I/O and basic manipulation.

Space structure

A space is composed as follows (ignoring attributes not composed from cf classes):

Space:
* cell methods
* ancillary variables
* grid:
    * coordinates:
        * coordinate bounds
    * cell measures
    * transforms

All components of a space are optional. Coordinates and cell measures are key values of the grid dictionary and all other components are attributes. Refer to each class’s documentation for details.

Data storage, access and manipulation

Some of the objects contain data arrays, namely all objects derived from the base class Variable (spaces, coordinates, coordinate bounds and cell measures). The data storage, access and manipulation model is very similar for all of them. In the following description, ‘variable’ may refer to any such object and it will be noted when certain objects (usually a space) exhibit different behaviour.

Copying and resizing

When a variable is indexed, a new variable is created which is a deep copy of the original but with the data array sliced according to given indices. For a space, the grid is also sliced by applying each dimension’s slice to the appropriate grid elements.

A variable uses similar indexing to a numpy array:

>>> isinstance(v, Variable)
True
>>> v.shape
(10, 20, 30)
>>> v[0:5,...,slice(0,11,2)].shape
(5, 20, 6)
>>> v.shape
(10, 20, 30)
>>> v[:,:,numpy.arange(30) < 4].shape
(10, 20, 4)

The only differences to numpy array indexing are:

  1. More than one dimension’s slice may be a boolean array.
  2. A tuple (as well as a list) of indices for a single dimension is allowed.

Dimensions of size one resulting from a slice are discarded for a space but retained for other types:

>>> s
<CF Space: air_temperature(7070, 64, 128)>
>>> s[0]
<CF Space: air_temperature(64, 128)>
>>> c
<CF Coordinate: latitude(10, 20)>
>>> c[0]
<CF Coordinate: latitude(1, 20)>

Scalar data are not allowed in any case and are stored instead as size 1 arrays with at least one dimension:

>>> s[0,0,0]
<CF Space: air_temperature(1)>
>>> c[-1,-1]
<CF Coordinate: latitude(1, 1)>
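The shape rules above can be sketched in plain Python. This is only an illustration under stated assumptions: `sliced_shape` is a hypothetical helper (not part of cf), it handles integer and slice indices only, and its `squeeze` flag stands in for the difference between a space and the other variable types.

```python
def sliced_shape(shape, indices, squeeze):
    """Return the shape left after indexing with integers and slices.

    Hypothetical helper, not part of cf. squeeze=True mimics a space
    (size one dimensions are discarded); squeeze=False mimics the other
    variable types (size one dimensions are retained).
    """
    if not isinstance(indices, tuple):
        indices = (indices,)
    # Dimensions that are not indexed are taken whole
    indices = indices + (slice(None),) * (len(shape) - len(indices))

    sizes = []
    for size, index in zip(shape, indices):
        if isinstance(index, slice):
            sizes.append(len(range(*index.indices(size))))
        else:
            sizes.append(1)  # an integer index selects a single element
    if squeeze:
        sizes = [n for n in sizes if n != 1]
    # Scalar data are not allowed: fall back to a size 1, one dimensional shape
    return tuple(sizes) or (1,)


space_shape = sliced_shape((7070, 64, 128), 0, squeeze=True)
coord_shape = sliced_shape((10, 20), 0, squeeze=False)
space_scalar = sliced_shape((7070, 64, 128), (0, 0, 0), squeeze=True)
coord_scalar = sliced_shape((10, 20), (-1, -1), squeeze=False)
```

The four results correspond to the four indexing examples shown above for `s` and `c`.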

Creating a deep copy of a variable

Indexing a variable with an ellipsis or equivalent reads the data from disk (if required) before returning a deep copy of itself, and so both the original variable and its copy contain different numpy array objects in memory. It is, however, also possible to create a deep copy of a variable without altering the type of its data from a file pointer to a numpy array, which can enhance performance (see the ‘Storage’ section). There are three equivalent ways of doing this:

>>> w = v()
>>> w = v.copy()
>>> w = copy.deepcopy(v)

The difference between copying by one of these methods and copying by indexing is as follows:

>>> v.type
<type 'netCDF4.Variable'>
>>> w = v()
>>> w.type, v.type
(<type 'netCDF4.Variable'>, <type 'netCDF4.Variable'>)
>>> w = v[...]
>>> w.type, v.type
(<type 'numpy.ndarray'>, <type 'numpy.ndarray'>)
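One way the three equivalent copying idioms could be wired together is sketched below. The `Variable` class here is a hypothetical stand-in written for illustration, not the cf class, and it keeps its data in a plain list rather than a file pointer or numpy array.

```python
import copy


class Variable:
    """Hypothetical stand-in for a cf variable (illustration only)."""

    def __init__(self, data, **properties):
        self.data = data
        self.properties = properties

    def copy(self):
        # Deep copy: the new object shares nothing mutable with the original
        return copy.deepcopy(self)

    def __call__(self):
        # Calling the variable with no arguments is the same as copy()
        return self.copy()


v = Variable([1.0, 2.0, 3.0], units="K")
copies = [v(), v.copy(), copy.deepcopy(v)]
```

Each element of `copies` is a distinct object whose data compare equal to, but are not the same object as, the data of `v`.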

Note that a call to a space also allows arguments which create a slice of the space according to conditions on its coordinate values.

Assignment to the data array

Assignment to the data array is done with the same indexing as for data access. Assignment cannot change the shape of the data:

>>> v.shape
(10, 20, 30)
>>> v[0, 0, 0] = 273.15
>>> v.varray[0,0,0]
273.15
>>> v[0] = 273.15
>>> v.shape
(10, 20, 30)
>>> v[1,2,:] = range(30)
>>> v[...] = numpy.arange(6000).reshape(10, 20, 30)

Storage

The data may be stored as a numpy array or as a file pointer in the form of a netCDF4.Variable instance. Accessing the data makes no distinction between the two storage methods, but there are I/O, memory and speed performance issues to consider. If the data are a file pointer then accessing any part of the array will cause the entire array to be read from disk into memory and stored as a numpy array, replacing the original file pointer. Refer to read() and write().

A variable’s repoint() method will revert the data back to a file pointer if possible, freeing memory if no other variables are referring to the numpy array (or arrays).
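The storage behaviour described above (read everything on first access, keep it in memory, optionally revert) can be sketched as follows. `LazyData` and `fake_read` are hypothetical names used only for illustration; a callable stands in for the netCDF4.Variable file pointer and a list stands in for the numpy array.

```python
class LazyData:
    """Hypothetical stand-in for a variable's data storage (not the cf class)."""

    def __init__(self, loader):
        self._loader = loader  # callable standing in for the file pointer
        self._array = None     # in-memory copy, filled on first access

    @property
    def in_memory(self):
        return self._array is not None

    def __getitem__(self, index):
        if self._array is None:
            # Accessing any part reads the entire array, not just the part requested
            self._array = self._loader()
        return self._array[index]

    def repoint(self):
        """Drop the in-memory copy so that the next access reads from disk again."""
        self._array = None


reads = []


def fake_read():
    """Pretend to read ten values from disk, recording each read."""
    reads.append(1)
    return list(range(10))


data = LazyData(fake_read)
first = data[2:4]   # triggers the one and only read so far
second = data[0]    # served from memory
data.repoint()      # revert to the "file pointer"
third = data[-1]    # triggers a second read
```

The sketch makes the trade-off visible: repeated access is cheap once the data are in memory, while `repoint()` trades that speed for a smaller memory footprint.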

Operator overloading for variables

The following operators, operations and assignments are overloaded in a variable to apply element-wise to the variable’s data (a numpy array).

Comparison operators:

==, !=, >, <, >=, <=

Binary arithmetic operations:

+, -, *, /, //, %, pow(), **, &, ^, |

Unary arithmetic operations:

-, +, abs(), ~

Augmented arithmetic assignments:

+=, -=, *=, /=, //=, %=, **=, &=, ^=, |=

Either side of an arithmetic operation may be a variable (in which case its data as a numpy array are used) or any object allowed by the equivalent numpy operation. Note that, as usual, if the left hand side object supports the operation with the right hand side then the returned object will be of the former’s type:

>>> type(v)
<class 'cf.space.Space'>
>>> type(2.5 + v)
<class 'cf.space.Space'>
>>> type(v.varray + v)
<type 'numpy.ndarray'>

Apart from this exception, the unary and binary arithmetic operations return a new variable with modified data. The augmented arithmetic assignments change a variable’s data in place.
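The distinction between operations that return a new variable and assignments that modify data in place can be sketched with Python's special methods. `Var` is a hypothetical stand-in (not the cf class) that holds its data in a plain list and overloads addition and negation only.

```python
class Var:
    """Hypothetical stand-in for a variable; data are held in a plain list."""

    def __init__(self, data):
        self.data = list(data)

    def _added(self, other):
        # Use the other operand's data if it is a Var, else treat it as a scalar
        if isinstance(other, Var):
            other_data = other.data
        else:
            other_data = [other] * len(self.data)
        return [a + b for a, b in zip(self.data, other_data)]

    def __add__(self, other):
        return Var(self._added(other))      # binary operation: new object

    __radd__ = __add__                      # lets "2.5 + v" return a Var

    def __iadd__(self, other):
        self.data[:] = self._added(other)   # augmented assignment: in place
        return self

    def __neg__(self):
        return Var(-a for a in self.data)   # unary operation: new object


v = Var([1.0, 2.0])
w = 2.5 + v          # float does not know how to add a Var, so Var.__radd__ is used
data_before = v.data
v += 1               # modifies the existing list rather than creating a new one
```

Here `w` is a new `Var` and `v` is untouched by the addition, whereas `v += 1` leaves `v.data` as the same list object with updated contents.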

Numeric equalities for the == and != comparisons are determined to within a tolerance which may be adjusted.

Tolerance of numeric equality

All objects defined in cf have an equals() method which determines the congruence of two instances. The aspects of this equality vary between objects, but for all objects numeric equalities are tested to within a tolerance defined by parameters ‘rtol’ (relative tolerance) and ‘atol’ (absolute tolerance), where two numbers a and b (from the left and right sides respectively of the comparison) are considered equal if:

|a - b| <= atol + rtol * |b|

Default tolerances are taken from the parameters cf.DEFAULT_RTOL and cf.DEFAULT_ATOL.
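The tolerance test can be written out directly. This is a minimal sketch: `equal_within_tolerance` is a hypothetical function name, and the tolerance values used below are illustrative, not the actual values of cf.DEFAULT_RTOL and cf.DEFAULT_ATOL.

```python
def equal_within_tolerance(a, b, rtol, atol):
    """Return True if |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


# Illustrative tolerances only (not the cf defaults)
close = equal_within_tolerance(100.0, 100.0005, rtol=1e-5, atol=1e-8)
not_close = equal_within_tolerance(100.0, 100.01, rtol=1e-5, atol=1e-8)

# The relative term scales with |b| only, so the test is not symmetric
forward = equal_within_tolerance(1.0, 2.0, rtol=0.5, atol=0.0)
backward = equal_within_tolerance(2.0, 1.0, rtol=0.5, atol=0.0)
```

Because only the right hand side value `b` scales the relative tolerance, swapping the operands can change the outcome, as the last two calls show.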

The values of these parameters may be overridden for any object by setting its _rtol and _atol attributes for relative and absolute tolerances respectively. Either of these attributes may be non-existent or None, which generally results in the default parameters being used. The only exception to this is if the object is an element of a CF iterable (see below).

In a comparison between objects x and y (x==y, for example), the values used for relative and absolute tolerances are the first found from each column of the following table:

Relative tolerance    Absolute tolerance
x._rtol               x._atol
y._rtol               y._atol
cf.DEFAULT_RTOL       cf.DEFAULT_ATOL
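The lookup order in the table can be sketched as a simple search for the first value that is set. The names `first_set`, `tolerances` and `Obj` are hypothetical, the module-level defaults are stand-ins for cf.DEFAULT_RTOL and cf.DEFAULT_ATOL with made-up values, and the special handling of elements of CF iterables is not modelled.

```python
# Stand-ins for cf.DEFAULT_RTOL and cf.DEFAULT_ATOL (values are made up)
DEFAULT_RTOL = 1e-5
DEFAULT_ATOL = 1e-8


def first_set(name, x, y, default):
    """Return the first of x.<name>, y.<name> that exists and is not None."""
    for obj in (x, y):
        value = getattr(obj, name, None)
        if value is not None:
            return value
    return default


def tolerances(x, y):
    """Return the (rtol, atol) pair used when comparing x with y."""
    return (first_set("_rtol", x, y, DEFAULT_RTOL),
            first_set("_atol", x, y, DEFAULT_ATOL))


class Obj:
    """Empty placeholder object for the demonstration."""


x, y = Obj(), Obj()
y._rtol = 1e-3   # x has no _rtol attribute, so y's value is used
x._atol = None   # None counts as unset, so y's value is used
y._atol = 1e-6
chosen = tolerances(x, y)
fallback = tolerances(Obj(), Obj())
```

Each column of the table is resolved independently, so the relative and absolute tolerances may come from different objects.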

If the _rtol and _atol attributes of iterable CF objects (those subclassed from CfDict or CfList, such as SpaceList) exist, then they are used if and only if an element does not have its own numeric value of a tolerance parameter.

Grid structure

A grid contains any number of dimension coordinate constructs (or ‘dimensions’ for short), auxiliary coordinate constructs, cell measure constructs and transforms.

The grid must have dimensionality attributes (even if they are null, i.e. empty dictionaries) but all other components are optional.

Dimensionality

The dimensions of the grid, and of its associated space, are given by the dimension_sizes and dimensions attributes.

The dimension_sizes attribute is a dictionary whose keys are dimension identifiers and values are positive integer dimension sizes. A dimension identifier is the string ‘dim’ suffixed by an arbitrary, but unique, integer. For example:

>>> g
<CF Grid: (30, 24, 1, 17)>
>>> g.dimension_sizes
{'dim0': 1, 'dim1': 17, 'dim2': 30, 'dim3': 24}

The dimensions attribute specifies which dimensions relate to each grid component (coordinates and cell measures) and to the data array of the space which holds the grid. For example:

>>> g.dimensions
{'data': ['dim1', 'dim2', 'dim3'],
 'aux0': ['dim2', 'dim3'],
 'aux1': ['dim2', 'dim3'],
 'cm0' : ['dim2', 'dim3'],
 'dim0': ['dim0'],
 'dim1': ['dim1'],
 'dim2': ['dim2'],
 'dim3': ['dim3']}

Each value of this dictionary is an ordered list which corresponds to the shape of each component.
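As an illustration of how the two dictionaries fit together, the shape of every component can be derived by looking up each listed dimension identifier in the sizes dictionary. The dictionaries below are copied from the examples above; `component_shapes` is a hypothetical helper, not a cf function.

```python
dimension_sizes = {'dim0': 1, 'dim1': 17, 'dim2': 30, 'dim3': 24}

dimensions = {'data': ['dim1', 'dim2', 'dim3'],
              'aux0': ['dim2', 'dim3'],
              'aux1': ['dim2', 'dim3'],
              'cm0': ['dim2', 'dim3'],
              'dim0': ['dim0'],
              'dim1': ['dim1'],
              'dim2': ['dim2'],
              'dim3': ['dim3']}


def component_shapes(dimension_sizes, dimensions):
    """Map each component key to the shape implied by its ordered dimensions."""
    return {key: tuple(dimension_sizes[dim] for dim in dims)
            for key, dims in dimensions.items()}


shapes = component_shapes(dimension_sizes, dimensions)
data_shape = shapes['data']
aux_shape = shapes['aux0']
```

In this example the data array omits the size 1 dimension 'dim0', so its shape has three elements while the grid has four dimensions.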

Note that if the grid contains dimensions of size 1 then the data of the space may have a lesser dimensionality than the grid (see the ‘data’ key in the above example).

Note that it is possible for a grid dimension to have no associated grid components.

Storage of coordinates and cell measures variables

Keys of the grid are identifiers for each of the grid’s coordinate and cell measures variables. The key values are the variable objects themselves. The key names are strings which describe which type of variable they store. Recognized types are:

Key prefix    Description
aux           Auxiliary coordinate
cm            Cell measures
dim           Dimension coordinate

A key string comprises one of these prefixes followed by a non-negative integer to discern between grid components of the same type. For example, a grid’s keys may be:

>>> sorted(g.keys())
['aux0', 'aux1', 'cm0', 'dim0', 'dim1', 'dim2', 'dim3']

The non-negative integer suffixing each key is arbitrary but must be unique amongst keys of the same type.
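Since the integer suffix is arbitrary but must be unique within a type, one possible way to pick a key for a new grid component is to take the smallest unused integer for the prefix. This is an assumption made for illustration: `new_key` is a hypothetical helper and cf may choose suffixes differently.

```python
import re


def new_key(keys, prefix):
    """Return prefix + the smallest non-negative integer not yet used with it."""
    pattern = re.compile(r"^%s(\d+)$" % re.escape(prefix))
    used = {int(match.group(1))
            for match in map(pattern.match, keys) if match}
    n = 0
    while n in used:
        n += 1
    return "%s%d" % (prefix, n)


keys = ['aux0', 'aux1', 'cm0', 'dim0', 'dim1', 'dim2', 'dim3']
next_aux = new_key(keys, 'aux')
next_cm = new_key(keys, 'cm')
next_dim = new_key(keys, 'dim')
```

Any other scheme that avoids clashes among keys of the same type would satisfy the rule equally well.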

Note that, similarly to the storage of dimension coordinates in a netCDF file, a dimension coordinate’s dimension name must be the same as its grid identifier.

Transforms

If a coordinate has an associated transform (such as projection parameters or a derived dimensional coordinate), then it will have a transform attribute whose value is a key of the grid’s transform attribute, which in turn is a CfDict object (a cf dictionary). The values of this cf dictionary are Transform objects which contain the information required to realise the transform. For example:

>>> g.transform['atmosphere_sigma_coordinate']
<CF Transform: atmosphere_sigma_coordinate transform>
>>> cf.dump(g.transform['atmosphere_sigma_coordinate'])
atmosphere_sigma_coordinate transform
-------------------------------------
Transform['ps'] = <CF Space: surface_air_pressure(30, 24)>
Transform['ptop'] = <CF Space: ptop(1)>
Transform['sigma'] = 'dim0'

Virtual coordinates

If a coordinate has a transform attribute but no data and no specified dimensions (i.e. its entry in the grid’s dimensions dictionary attribute is an empty list) then it should be considered a container for the coordinate implied by the named transform, should it be realized.

cf classes

Class               Description                          Parent class
Variable            Base class                           object
Space               Space                                Variable
Coordinate          Dimension or auxiliary coordinate    Variable
CoordinateBounds    Coordinate bounds                    Variable
CellMeasures        Cell measures                        Variable
CfDict              Base class                           MutableMapping
Grid                Grid                                 CfDict
Transform           Coordinate transforms                CfDict
CfList              Base class                           MutableSequence
VariableList        List of variables                    CfList
SpaceList           List of spaces                       VariableList
CellMethods         Cell methods                         CfList
Comparison          Comparison expression                object

The composition of a space may be cast in terms of the class of each space element type:

Note

Ancillary variables are stored in a SpaceList object

cf functions

Function     Description
dump()       Print the string returned from an object’s dump() method.
eq()         Create a Comparison object.
equals()     Determine whether two objects are congruent.
ge()         Create a Comparison object.
gt()         Create a Comparison object.
inside()     Create a Comparison object.
le()         Create a Comparison object.
lt()         Create a Comparison object.
ne()         Create a Comparison object.
outside()    Create a Comparison object.
read()       Read spaces from netCDF files.
read1()      Read a space from netCDF files.
write()      Write spaces to a netCDF file.
