Field manipulation

Manipulating a field generally involves operating on its data array and making any necessary changes to the field’s space to make it consistent with the new array.

Data array

A field’s data array is stored by the Data attribute as a Data object:

>>> type(f.Data)
<class 'cf.data.Data'>

This Data object:

  • Contains an N-dimensional array with many similarities to a numpy array
  • Contains the units of the array elements.
  • Uses LAMA functionality to store and operate on arrays that are larger than the available memory.
  • Supports masked arrays [1], regardless of whether or not it was initialized with a masked array.

Data mask

The data array’s mask may be retrieved and deleted with the field’s mask attribute. The mask is returned as a Data object:

>>> f.shape
(12, 73, 96)
>>> m = f.mask
>>> type(m)
<class 'cf.data.Data'>
>>> m.dtype
dtype('bool')
>>> m.shape
(12, 73, 96)
>>> m.array.shape
(12, 73, 96)
>>> del f.mask
>>> f.array.mask
False
>>> import numpy
>>> f.array.mask is numpy.ma.nomask
True
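
The nomask behaviour above mirrors numpy's masked arrays directly; a minimal numpy sketch (independent of cf) of masks and the nomask sentinel:

```python
import numpy

# A masked array with one missing element
a = numpy.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(a.mask)                        # [False  True False]

# An array created without a mask carries numpy.ma.nomask,
# which is the singleton that f.array.mask returns after del f.mask
b = numpy.ma.array([1.0, 2.0, 3.0])
print(b.mask is numpy.ma.nomask)     # True
```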

Conversion to a numpy array

A field’s data array may be converted to an independent numpy array with the array attribute, or to a numpy array view (numpy.ndarray.view) of the underlying data with the varray attribute:

>>> a = d.array
>>> type(a)
<class 'numpy.ndarray'>
>>> v = d.varray
>>> type(v)
<class 'numpy.ndarray'>
>>> type(v.base)
<class 'numpy.ndarray'>

Changing a numpy array view in place will also change the data array:

>>> d.array
array([1, 2, 3])
>>> v = d.varray
>>> v[0] = -999
>>> d.array
array([-999,    2,    3])
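
This is exactly numpy's view semantics: a view shares memory with its base array, so in-place changes propagate, whereas a copy is independent. A plain numpy illustration:

```python
import numpy

base = numpy.array([1, 2, 3])

view = base.view()      # shares memory with base, like varray
view[0] = -999
print(base)             # [-999    2    3]

indep = base.copy()     # independent copy, like array
indep[1] = 0
print(base[1])          # 2 (unchanged)
```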

Warning

The numpy array created with the array or varray attribute forces all of the data to be read into memory at the same time, which may not be possible for very large arrays.

Copying

A deep copy of a variable may be created with its copy method or equivalently with the copy.deepcopy function:

>>> g = f.copy()
>>> import copy
>>> g = copy.deepcopy(f)

Copying utilizes LAMA copying functionality.

Subsetting

Subsetting a field means subsetting its data array and its space in a consistent manner.

A field may be subsetted with its subset attribute. This attribute may be indexed to select a subset from dimension index values (f.subset[indices]) or called to select a subset from dimension coordinate array values (f.subset(coordinate_values)):

>>> g = f.subset[0, ...]
>>> g = f.subset(latitude=30, longitude=cf.inside(0, 90, 'degrees'))

The result of subsetting a field is a new field whose data array and, crucially, any data arrays within the field’s metadata (such as coordinates, for example) are subsets of their originals:

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> g = f.subset[-1, :, 48::-1]
>>> print g
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(1) = [345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(49) = [180, ..., 0] degrees_east
                : height(1) = [2] m
Auxiliary coords:

The new subsetted field is independent of the original. Subsetting utilizes LAMA subsetting functionality.

Indexing

Subsetting by dimension indices uses an extended Python slicing syntax, which is similar to numpy array indexing. There are two extensions to the numpy indexing functionality:

  • Size 1 dimensions are never removed.

    An integer index i takes the i-th element but does not reduce the rank of the output array by one.

  • When advanced indexing is used on more than one dimension, the advanced indices work independently.

    When more than one dimension’s index is a 1-d boolean array or a 1-d sequence of integers, these indices work independently along their respective dimensions (similar to the way vector subscripts work in Fortran), rather than being broadcast together element-wise as in numpy.

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> f.shape
(12, 73, 96)
>>> f.subset[...].shape
(12, 73, 96)
>>> f.subset[0].shape
(1, 73, 96)
>>> f.subset[0,...].shape
(1, 73, 96)
>>> f.subset[::-1].shape
(12, 73, 96)
>>> f.subset[0:5, ..., slice(None, None, 2)].shape
(5, 73, 48)
>>> lon = f.coord('longitude').array
>>> f.subset[..., lon<90].shape
(12, 73, 24)
>>> f.subset[[1,2], [1,2,3], [1,2,3,4]].shape
(2, 3, 4)

Note that the indices of the last example above would raise an error when given to a numpy array.
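
The independent (orthogonal) behaviour of that last example can be reproduced for a plain numpy array with numpy.ix_, which turns the per-dimension index lists into an outer product:

```python
import numpy

a = numpy.arange(12 * 73 * 96).reshape(12, 73, 96)

# numpy broadcasts fancy indices together, so this raises IndexError:
# a[[1, 2], [1, 2, 3], [1, 2, 3, 4]]

# numpy.ix_ makes the index arrays orthogonal, matching f.subset's behaviour
print(a[numpy.ix_([1, 2], [1, 2, 3], [1, 2, 3, 4])].shape)   # (2, 3, 4)
```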

Coordinate values

Subsetting by coordinate values allows a subsetted field to be defined by particular coordinate values of its space.

Subsetting by coordinate values is functionally equivalent to subsetting by indexing – internally, the selected coordinate values are in fact converted to dimension indices.

Coordinate values are provided as arguments to a call to the subset method.

The benefits to subsetting in this fashion are:

  • The dimensions to be subsetted are identified by name.
  • The position in the data array of each dimension need not be known.
  • Dimensions for which no subsetting is required need not be specified.
  • Size 1 dimensions of the space which are not spanned by the data array may be specified.
>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> f.subset(latitude=0).shape
(12, 1, 96)
>>> f.subset(latitude=cf.inside(-30, 30)).shape
(12, 25, 96)
>>> f.subset(longitude=cf.ge(270, 'degrees_east'), latitude=[0, 2.5, 10]).shape
(12, 3, 24)
>>> f.subset(latitude=cf.lt(0, 'degrees_north')).shape
(12, 36, 96)
>>> f.subset(latitude=[cf.lt(0, 'degrees_north'), 90]).shape
(12, 37, 96)
>>> import math
>>> f.subset(longitude=cf.lt(math.pi, 'radian'), height=2).shape
(12, 73, 48)
>>> f.subset(height=cf.gt(3))
IndexError: No indices found for 'height' values gt 3

Note that if a comparison function (such as inside) does not specify any units, then the units of the named coordinate are assumed.
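
The internal conversion of coordinate values to dimension indices can be sketched in plain numpy (this is an illustration of the idea, not cf's actual implementation): build a boolean condition on the coordinate array and take the matching indices.

```python
import numpy

# 73 latitude values from -90 to 90, as in the example field
latitude = numpy.arange(-90.0, 91.0, 2.5)

# Hypothetical equivalent of the condition cf.inside(-30, 30)
indices = numpy.nonzero((latitude >= -30) & (latitude <= 30))[0]
print(indices.size)    # 25, matching the (12, 25, 96) result above
```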

Selection

Fields may be tested for matching given conditions and selected according to those matches with the match and extract methods. Conditions may be given on:

  • The field’s standard and non-standard attributes (attr parameter).
  • Any other of the field’s attributes (priv parameter).
  • The field’s coordinate values (coord parameter).
  • The field’s coordinate cell sizes (cellsize parameter).
>>> f
<CF Field: air_temperature(12, 73, 96)>
>>> f.match(priv={'ncvar': 'tas'})
True
>>> g = f.extract(priv={'ncvar': 'tas'})
>>> g is f
True
>>> f
[<CF Field: eastward_wind(110, 106)>,
 <CF Field: air_temperature(12, 73, 96)>]
>>> f.match(attr={'standard_name': '.*temperature'})
[False, True]
>>> g = f.extract(attr={'standard_name': '.*temperature'}, coord={'longitude': 0})
>>> g
[<CF Field: air_temperature(12, 73, 96)>]

All of these keywords may be used with the read function to select on input:

>>> f = cf.read('file*.nc', attr={'standard_name': '.*temperature'}, coord={'longitude': 0})

Aggregation

Fields are aggregated into as few multidimensional fields as possible with the aggregate function:

>>> f
[<CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: air_temperature(12, 73, 96)>,
 <CF Field: air_temperature(96, 73)>]
>>> cf.aggregate(f)
>>> f
[<CF Field: eastward_wind(3, 2, 110, 106)>,
 <CF Field: air_temperature(13, 73, 96)>]

By default, the fields returned by the read function have already been aggregated:

>>> f = cf.read('file*.nc')
>>> len(f)
1
>>> f = cf.read('file*.nc', aggregate=False)
>>> len(f)
12

Aggregation implements the CF aggregation rules.
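
At its core, aggregation concatenates compatible data arrays along a dimension identified from the metadata. A much-simplified numpy sketch of joining two time chunks (ignoring the metadata checks that the CF aggregation rules require):

```python
import numpy

# Two fields' data arrays covering consecutive time chunks
chunk1 = numpy.zeros((12, 73, 96))   # e.g. twelve months of data
chunk2 = numpy.zeros((1, 73, 96))    # one extra month

# Aggregation joins them along the common time dimension
combined = numpy.concatenate([chunk1, chunk2], axis=0)
print(combined.shape)    # (13, 73, 96), as in the example above
```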

Assignment

In-place assignment to a field’s data array may be done by assigning to the field’s indexed subset attribute, observing the numpy broadcasting rules.

Assigning to a subset uses LAMA functionality, so it is possible to assign to subsets which are larger than the available memory.

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> f.subset[0] = 273.15
>>> f.subset[0, 0] = 273.15
>>> f.subset[0, 0, 0] = 273.15
>>> f.subset[11, :, :] = numpy.arange(96)
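
The broadcasting in these assignments works as for a numpy array: the right-hand side is stretched to the shape of the indexed subset.

```python
import numpy

a = numpy.zeros((12, 73, 96))

# A scalar broadcasts to every selected element
a[0] = 273.15

# A 1-d array of length 96 broadcasts across the latitude dimension
a[11, :, :] = numpy.arange(96)
print(a[11, 5, 3])    # 3.0
```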

In-place assignment may also be done by creating and then changing a numpy array view (numpy.ndarray.view) of the data with the varray attribute:

>>> f.subset[0, 0, 0].array.item()
287.567
>>> a = f.varray
>>> type(a)
<type 'numpy.ndarray'>
>>> a[0, 0, 0] = 300
>>> f.first_datum
300.0

Warning

The numpy array created with the varray attribute forces all of the data to be read into memory at the same time, which may not be possible for very large arrays.

Arithmetic and comparison

Arithmetic and comparison operations on a field are defined as element-wise operations on the field’s data array, and return a field as the result:

  • When using a field in unary or binary arithmetic operations (such as abs(), + or **) a new, independent field is created with a modified data array.
  • When using a field in augmented arithmetic operations (such as -=), the field’s data array is modified in place.
  • When using a field in comparison operations (such as < or !=) a new, independent field is created with a boolean data array.

A field’s data array is modified in a very similar way to how a numpy array would be modified in the same operation, i.e. broadcasting ensures that the operands are compatible and the data array is modified element-wise.

Broadcasting is metadata-aware and will automatically account for arbitrary configurations, such as dimension order, but will not allow incompatible metadata to be combined, such as adding a field of height to one of temperature.

The resulting field’s metadata will be very similar to that of the operands which are also fields. Differences arise when the existing metadata can not correctly describe the newly created field. For example, when dividing a field with units of metres by one with units of seconds, the resulting field will have units of metres/second.

Arithmetic and comparison utilizes LAMA functionality so data arrays larger than the available physical memory may be operated on.

Broadcasting

The term broadcasting describes how data arrays of the operands with different shapes are treated during arithmetic and comparison operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

The general broadcasting rules are similar to the broadcasting rules implemented in numpy, the only difference being when both operands are fields, in which case the fields are temporarily conformed so that:

  • Dimensions are aligned according to the coordinates’ metadata to ensure that matching dimensions are broadcast against each other.
  • Common dimensions have matching units.
  • Common dimensions have matching axis directions.

This restructuring of the field ensures that the matching dimensions are broadcast against each other.

Broadcasting is done without making needless copies of data and so is usually very efficient.
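
For plain arrays the numpy rules alone apply; a minimal example of shape compatibility, using the example field's dimension sizes:

```python
import numpy

a = numpy.ones((12, 73, 96))
b = numpy.ones((73, 96))   # trailing dimensions match, so b is broadcast
c = numpy.ones((73, 1))    # size 1 dimensions are stretched

print((a + b).shape)   # (12, 73, 96)
print((a + c).shape)   # (12, 73, 96)
```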

What a field may be combined with

A field may be combined or compared with the following objects:

Object            Description
int, long, float  The field’s data array is combined with the Python scalar.
Data with size 1  The field’s data array is combined with the Data object’s
                  scalar value, taking into account:

                    • Different but equivalent units

Field             The two fields must satisfy the field combination rules.
                  The fields’ data arrays and spaces are combined, taking
                  into account:

                    • Identities of dimensions
                    • Different but equivalent units
                    • Different dimension orders
                    • Different dimension directions
A field may appear on the left or right hand side of an operator, but note the following warning:

Warning

Combining a numpy array on the left with a field on the right does work, but generally gives unintended results – namely a numpy array of fields.

Resulting metadata

When creating any new field, the field’s history attribute is updated to record the operation. The field’s existing name-like attributes may also need to be changed:

>>> f.standard_name
'air_temperature'
>>> f += 2
>>> f.standard_name
'air_temperature'
>>> f.history
'air_temperature+2'
>>> f.standard_name
'air_temperature'
>>> f **= 2
>>> f.standard_name
AttributeError: 'Field' object has no attribute 'standard_name'
>>> f.history
'air_temperature**2'
>>> f.standard_name, g.standard_name
('air_temperature', 'eastward_wind')
>>> h = f * g
>>> h.standard_name
AttributeError: 'Field' object has no attribute 'standard_name'
>>> h.long_name
'air_temperature*eastward_wind'
>>> h.history
'air_temperature*eastward_wind'

When creating a new field which has different physical properties to the input field(s) the units will also need to be changed:

>>> f.units
'K'
>>> f += 2
>>> f.units
'K'
>>> f.units
'K'
>>> f **= 2
>>> f.units
'K2'
>>> f.units, g.units
('m', 's')
>>> h = f / g
>>> h.units
'm/s'

When creating a new field which has a different space to the input fields, the new space will in general contain the superset of dimensions from the two input fields, but may not have some of either input field’s auxiliary coordinates or size 1 dimension coordinates. Refer to the field combination rules for details.

Overloaded operators

A field defines the following overloaded operators for arithmetic and comparison.

Rich comparison operators

Operator Method
< __lt__()
<= __le__()
== __eq__()
!= __ne__()
> __gt__()
>= __ge__()

Binary arithmetic operators

Operator Methods
+ __add__() __radd__()
- __sub__() __rsub__()
* __mul__() __rmul__()
/ __div__() __rdiv__()
/ __truediv__() __rtruediv__()
// __floordiv__() __rfloordiv__()
% __mod__() __rmod__()
divmod() __divmod__() __rdivmod__()
**, pow() __pow__() __rpow__()
& __and__() __rand__()
^ __xor__() __rxor__()
| __or__() __ror__()

Augmented arithmetic operators

Operator Method
+= __iadd__()
-= __isub__()
*= __imul__()
/= __idiv__()
/= __itruediv__()
//= __ifloordiv__()
%= __imod__()
**= __ipow__()
&= __iand__()
^= __ixor__()
|= __ior__()

Unary arithmetic operators

Operator Method
- __neg__()
+ __pos__()
abs() __abs__()
~ __invert__()

Manipulation routines

A field has attributes and methods which return information about its data array or manipulate the data array in some manner. Many of these behave similarly to their numpy counterparts with the same name but, where appropriate, return Field objects rather than numpy arrays.

Attributes

Field attribute Description Numpy counterpart
size Number of elements in the data array numpy.ndarray.size
shape Tuple of the data array’s dimension sizes numpy.ndarray.shape
ndim Number of dimensions in the data array numpy.ndarray.ndim
dtype Numpy data-type of the data array numpy.ndarray.dtype

Methods

Field method Description Numpy counterpart
expand_dims Expand the shape of the data array numpy.expand_dims
reverse_dims Reverse the directions of data array axes  
squeeze Remove size 1 dimensions from the field’s data array numpy.squeeze
transpose Permute the dimensions of the data array numpy.transpose
unsqueeze Insert size 1 dimensions from the field’s space into its data array  
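
The numpy counterparts in the table behave analogously on plain arrays:

```python
import numpy

a = numpy.zeros((12, 73, 96))

print(numpy.expand_dims(a, 0).shape)        # (1, 12, 73, 96)
print(numpy.transpose(a, (2, 0, 1)).shape)  # (96, 12, 73)

b = numpy.zeros((1, 73, 96))
print(numpy.squeeze(b).shape)               # (73, 96)
```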

Manipulating other variables

Subsetting, assignment, arithmetic and comparison operations on other Variable types (such as Coordinate, CoordinateBounds, CellMeasures) are very similar to those for fields.

In general, different dimension identities, different dimension orders and different dimension directions are not considered, since these objects do not contain the coordinate system required to define these properties (unlike a field).

Coordinates

Coordinates are a special case as they may contain a data array for their coordinate bounds which needs to be treated consistently with the main coordinate array:

>>> type(c)
<cf.coordinate.Coordinate>
>>> type(c.bounds)
<cf.coordinate.CoordinateBounds>
>>> c.shape
(12,)
>>> c.bounds.shape
(12, 2)
>>> d = c.subset[0:6]
>>> d.shape
(6,)
>>> d.bounds.shape
(6, 2)
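
The consistency between a coordinate and its bounds amounts to applying the same first-dimension indices to both arrays; a plain numpy sketch of the idea (the arrays here are illustrative, not cf objects):

```python
import numpy

coords = numpy.arange(12.0)                  # 12 coordinate values
bounds = numpy.column_stack([coords - 0.5,   # (12, 2) array of cell bounds
                             coords + 0.5])

# Subsetting the coordinate applies the same indices to the bounds
indices = slice(0, 6)
sub_coords = coords[indices]
sub_bounds = bounds[indices]
print(sub_coords.shape, sub_bounds.shape)    # (6,) (6, 2)
```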

Warning

If the coordinate bounds are operated on directly, consistency with the parent coordinate may be broken.


Footnotes

[1] Arrays that may have missing or invalid entries.