Field manipulation
==================

Manipulating a field generally involves operating on its data array
and making any necessary changes to the field's space to make it
consistent with the new array.


Data array
----------

A field's data array is stored by the *Data* attribute as a
:class:`.Data` object:

>>> type(f.Data)
<class 'cf.data.Data'>

This :class:`.Data` object:

* Contains an N-dimensional array with many similarities to a
  :ref:`numpy array <numpy:arrays>`.

* Contains the :ref:`units <units>` of the array elements.

* Uses :ref:`LAMA functionality <LAMA>` to store and operate on arrays
  which are larger than the available memory.

* Supports masked arrays [1]_, regardless of whether or not it was
  initialized with a masked array.


Data mask
^^^^^^^^^

The data array's mask may be retrieved and deleted with the field's
:attr:`~cf.Field.mask` attribute. The mask is returned as a
:class:`.Data` object:

>>> f.shape
(12, 73, 96)
>>> m = f.mask
>>> type(m)
<class 'cf.data.Data'>
>>> m.dtype
dtype('bool')
>>> m.shape
[12, 73, 96]
>>> m.array.shape
(12, 73, 96)

>>> del f.mask
>>> f.array.mask
False
>>> import numpy
>>> f.array.mask is numpy.ma.nomask
True
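
For readers less familiar with masked arrays, the following sketch uses
only numpy (not cf) to show what a mask does and what the special
``numpy.ma.nomask`` value looks like. The array values are made up for
illustration.

```python
import numpy

# A masked array: the second element is flagged as missing.
a = numpy.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

valid_count = a.count()      # number of unmasked elements
mean_of_valid = a.mean()     # masked elements are ignored

# Building a masked array from the plain data, without supplying a
# mask, leaves it with the special "no mask" value.
b = numpy.ma.array(a.data)
has_no_mask = numpy.ma.getmask(b) is numpy.ma.nomask
```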


Conversion to a numpy array
^^^^^^^^^^^^^^^^^^^^^^^^^^^

A field's data array may be converted to either a numpy array view
(:meth:`numpy.ndarray.view`) or an independent numpy array of the
underlying data with the :attr:`~cf.Field.varray` and
:attr:`~cf.Field.array` attributes respectively:

>>> a = d.array
>>> type(a)
<class 'numpy.ndarray'>
>>> v = d.varray
>>> type(v)
<class 'numpy.ndarray'>
>>> type(v.base)
<class 'numpy.ndarray'>

Changing a numpy array view in place will also change the data array:

>>> d.array
array([1, 2, 3])
>>> v = d.varray
>>> v[0] = -999
>>> d.array
array([-999,    2,    3])
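
The difference between a view and an independent array can be seen with
plain numpy arrays. This is a minimal sketch of the numpy behaviour only;
it does not use cf, and the variable names are illustrative.

```python
import numpy

data = numpy.array([1, 2, 3])

view = data.view()          # shares memory with ``data``
independent = data.copy()   # owns its own memory

view[0] = -999              # visible through ``data``
independent[1] = 0          # not visible through ``data``
```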

.. warning:: The numpy array created with the *array* or *varray*
             attribute forces all of the data to be read into memory
             at the same time, which may not be possible for very
             large arrays.


Copying
-------

A deep copy of a variable may be created with its
:meth:`~cf.Field.copy` method or equivalently with the
:func:`copy.deepcopy` function:

>>> g = f.copy()
>>> import copy
>>> g = copy.deepcopy(f)

Copying utilizes :ref:`LAMA copying functionality <LAMA_copying>`.
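
What "deep" means here can be sketched with the standard :mod:`copy`
module and a hypothetical ``Record`` class, which stands in for a variable
holding an array and some properties. It is not part of cf.

```python
import copy


class Record(object):
    """Hypothetical container with mutable contents."""

    def __init__(self, values, properties):
        self.values = values
        self.properties = properties


original = Record([1, 2, 3], {'units': 'K'})

shallow = copy.copy(original)     # new object, shared contents
deep = copy.deepcopy(original)    # new object, independent contents

# Change the original's contents in place
original.values[0] = -999
original.properties['units'] = 'degC'
```

Only the deep copy is unaffected by the later in-place changes to the
original.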

.. _Subsetting:

Subsetting
----------

Subsetting a field means subsetting its data array and its space in a
consistent manner.

A field may be subsetted with its :attr:`~cf.Field.subset`
attribute. This attribute may be **indexed** to select a subset from
dimension index values (``f.subset[indices]``) or **called** to select
a subset from dimension coordinate array values
(``f.subset(coordinate_values)``):

>>> g = f.subset[0, ...]
>>> g = f.subset(latitude=30, longitude=cf.inside(0, 90, 'degrees'))

The result of subsetting a field is a new field whose data array
and, crucially, any data arrays within the field's metadata (such as
coordinates, for example) are subsets of their originals:

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> g = f.subset[-1, :, 48::-1]
>>> print g
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(1) = [345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(49) = [180, ..., 0] degrees_east
                : height(1) = [2] m
Auxiliary coords:

The new subsetted field is independent of the original. Subsetting utilizes
:ref:`LAMA subsetting functionality <LAMA_subsetting>`.

.. _indexing:

Indexing
^^^^^^^^

Subsetting by dimension indices uses an extended Python slicing
syntax, which is similar to :ref:`numpy array indexing
<numpy:arrays.indexing>`. There are two extensions to the numpy
indexing functionality:

* **Size 1 dimensions are never removed.**

  An integer index `i` takes the `i`-th element but does not reduce
  the rank of the output array by one.

* **When advanced indexing is used on more than one dimension, the
  advanced indices work independently.**

  When more than one dimension's slice is a 1-d boolean array or 1-d
  sequence of integers, then these indices work independently along
  each dimension (similar to the way vector subscripts work in
  Fortran), rather than by their elements.

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> f.shape
(12, 73, 96)
>>> f.subset[...].shape
(12, 73, 96)
>>> f.subset[0].shape
(1, 73, 96)
>>> f.subset[0,...].shape
(1, 73, 96)
>>> f.subset[::-1].shape
(12, 73, 96)
>>> f.subset[0:5, ..., slice(None, None, 2)].shape
(5, 73, 48)
>>> lon = f.coord('longitude').array
>>> f.subset[..., lon<90].shape
(12, 73, 24)
>>> f.subset[[1,2], [1,2,3], [1,2,3,4]].shape
(2, 3, 4)

Note that the indices of the last example above would raise an error
when given to a numpy array.
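
For comparison, the sketch below shows how the same two behaviours have
to be requested explicitly in plain numpy. It uses a zero-filled array
with the same shape as the example field and does not involve cf.

```python
import numpy

a = numpy.zeros((12, 73, 96))

# numpy drops a dimension for an integer index; a length-1 slice keeps it.
dropped = a[0].shape      # (73, 96)
kept = a[0:1].shape       # (1, 73, 96)

# Independent ("orthogonal") indexing along each dimension needs numpy.ix_
orthogonal = a[numpy.ix_([1, 2], [1, 2, 3], [1, 2, 3, 4])].shape

# Without it, numpy tries to pair the index lists element by element,
# which fails because the lists have different lengths.
try:
    a[[1, 2], [1, 2, 3], [1, 2, 3, 4]]
    raised = False
except IndexError:
    raised = True
```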

Coordinate values
^^^^^^^^^^^^^^^^^

Subsetting by coordinate values allows a subsetted field to be defined
by particular coordinate values of its space.

Subsetting by coordinate values is functionally equivalent to
subsetting by :ref:`indexing <indexing>` -- internally, the selected
coordinate values are in fact converted to dimension indices.

Coordinate values are provided as arguments to a **call** to the
*subset* attribute.

The benefits of subsetting in this fashion are:

* The dimensions to be subsetted are identified by name.

* The position in the data array of each dimension need not be known.

* Dimensions for which no subsetting is required need not be
  specified.

* Size 1 dimensions of the space which are not spanned by the data
  array may be specified.

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:
>>> f.subset(latitude=0).shape
(12, 1, 96)
>>> f.subset(latitude=cf.inside(-30, 30)).shape
(12, 25, 96)
>>> f.subset(longitude=cf.ge(270, 'degrees_east'), latitude=[0, 2.5, 10]).shape
(12, 3, 24)
>>> f.subset(latitude=cf.lt(0, 'degrees_north')).shape
(12, 36, 96)
>>> f.subset(latitude=[cf.lt(0, 'degrees_north'), 90]).shape
(12, 37, 96)
>>> import math
>>> f.subset(longitude=cf.lt(math.pi, 'radian'), height=2).shape
(12, 73, 48)
>>> f.subset(height=cf.gt(3))
IndexError: No indices found for 'height' values gt 3

Note that if a comparison function (such as :func:`~cf.inside`) does
not specify any units, then the units of the named coordinate are
assumed.
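
The conversion of coordinate values to dimension indices can be sketched
in plain Python. The helper ``indices_where`` is hypothetical and only
illustrates the idea; the real conversion also handles units, which this
sketch ignores.

```python
def indices_where(coordinate_values, condition):
    """Return the indices of the coordinate values satisfying condition."""
    return [i for i, value in enumerate(coordinate_values) if condition(value)]


# Latitudes from -90 to 90 degrees north in steps of 2.5 degrees,
# as in the example field above.
latitude = [-90 + 2.5 * i for i in range(73)]

tropics = indices_where(latitude, lambda v: -30 <= v <= 30)
southern = indices_where(latitude, lambda v: v < 0)
```

The resulting index lists have lengths 25 and 36, matching the latitude
sizes of the corresponding subsets shown above.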


Selection
---------

Fields may be tested for matching given conditions and selected
according to those matches with the :meth:`~cf.Field.match` and
:meth:`~cf.Field.extract` methods. Conditions may be given on:

* The field's standard and non-standard attributes (*attr* parameter).

* Any other of the field's attributes (*priv* parameter).

* The field's coordinate values (*coord* parameter).

* The field's coordinate cell sizes (*cellsize* parameter).

>>> f
<CF Field: air_temperature(12, 73, 96)>
>>> f.match(priv={'ncvar': 'tas'})
True
>>> g = f.extract(priv={'ncvar': 'tas'})
>>> g is f
True

>>> f
[<CF Field: eastward_wind(110, 106)>,
 <CF Field: air_temperature(12, 73, 96)>]
>>> f.match(attr={'standard_name': '.*temperature'})
[False, True]
>>> g = f.extract(attr={'standard_name': '.*temperature'}, coord={'longitude': 0})
>>> g
[<CF Field: air_temperature(12, 73, 96)>]

All of these keywords may be used with the :func:`~cf.read` function
to select on input:

>>> f = cf.read('file*.nc', attr={'standard_name': '.*temperature'}, coord={'longitude': 0})
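
The matching of attribute conditions can be pictured with the sketch
below. It assumes that string conditions are treated as regular
expressions matched from the start of the attribute value, as the
``'.*temperature'`` pattern above suggests. The ``match`` and ``extract``
functions here are hypothetical stand-ins operating on dictionaries, not
the cf methods themselves.

```python
import re


def match(properties, attr):
    """Return True if every pattern in attr matches the named property."""
    for name, pattern in attr.items():
        value = properties.get(name)
        if value is None or re.match(pattern, value) is None:
            return False
    return True


def extract(fields, attr):
    """Return the fields for which match() is True."""
    return [f for f in fields if match(f, attr)]


fields = [
    {'standard_name': 'eastward_wind'},
    {'standard_name': 'air_temperature'},
]
condition = {'standard_name': '.*temperature'}

flags = [match(f, condition) for f in fields]
selected = extract(fields, condition)
```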

Aggregation
-----------

Fields are aggregated into as few multidimensional fields as possible
with the :func:`~cf.aggregate` function:

>>> f
[<CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: eastward_wind(110, 106)>,
 <CF Field: air_temperature(12, 73, 96)>,
 <CF Field: air_temperature(96, 73)>]
>>> cf.aggregate(f)
>>> f
[<CF Field: eastward_wind(3, 2, 110, 106)>,
 <CF Field: air_temperature(13, 73, 96)>]
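
In terms of the data arrays alone, combining the two air temperature
fields above amounts to aligning their dimensions and joining them along
time. The numpy sketch below shows only that array step, with made-up
values; the decision about which fields may be combined, and how, is
taken from their metadata and is not reproduced here.

```python
import numpy

a = numpy.zeros((12, 73, 96))   # (time, latitude, longitude)
b = numpy.ones((96, 73))        # (longitude, latitude), a single time

# Reorder b to (latitude, longitude) and add a size 1 time dimension
b_aligned = b.transpose()[numpy.newaxis, ...]

# Join along the time dimension
combined = numpy.concatenate((a, b_aligned), axis=0)
```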

By default, the fields returned from the :func:`~cf.read` function
have been aggregated:

>>> f = cf.read('file*.nc')
>>> len(f)
1
>>> f = cf.read('file*.nc', aggregate=False)
>>> len(f)
12

Aggregation implements the `CF aggregation rules
<http://www.met.reading.ac.uk/~david/cf_aggregation_rules.html>`_.


Assignment
----------

In-place assignment to a field's data array may be done by assigning
to the field's indexed :attr:`~cf.Field.subset` attribute, observing
the :mod:`numpy broadcasting rules <numpy.doc.broadcasting>`.

Assigning to a subset uses :ref:`LAMA functionality <LAMA>`, so it is possible
to assign to subsets which are larger than the available memory.

>>> print f
Data            : air_temperature(time, latitude, longitude)
Cell methods    : time: mean
Dimensions      : time(12) = [15, ..., 345] days since 1860-1-1
                : latitude(73) = [-90, ..., 90] degrees_north
                : longitude(96) = [0, ..., 356.25] degrees_east
                : height(1) = [2] m
Auxiliary coords:

>>> f.subset[0] = 273.15
>>> f.subset[0, 0] = 273.15
>>> f.subset[0, 0, 0] = 273.15
>>> f.subset[11, :, :] = numpy.arange(96)
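
The broadcasting in these assignments follows the same pattern as in
plain numpy, sketched here on a zero-filled array with the same shape as
the example field (no cf involved):

```python
import numpy

a = numpy.zeros((12, 73, 96))

# A scalar is broadcast over the whole selected (73, 96) slab
a[0] = 273.15

# A length-96 array is broadcast along the latitude dimension
a[11, :, :] = numpy.arange(96)
```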

In-place assignment may also be done by creating and then changing a
numpy array view (:meth:`numpy.ndarray.view`) of the data with the
:attr:`~cf.Field.varray` attribute:

>>> f.subset[0, 0, 0].array.item()
287.567
>>> a = f.varray
>>> type(a)
<type 'numpy.ndarray'>
>>> a[0, 0, 0] = 300
>>> f.first_datum
300.0

.. warning:: The numpy array created with the *varray* attribute
             forces all of the data to be read into memory at the same
             time, which may not be possible for very large arrays.


Arithmetic and comparison
-------------------------

Arithmetic and comparison operations on a field are defined as
element-wise operations on the field's data array, and return a field
as the result:

* When using a field in unary or binary arithmetic operations (such as
  ``abs()``, ``+`` or ``**``) a new, independent field is created with
  a modified data array.

* When using a field in augmented arithmetic operations (such as
  ``-=``), the field's data array is modified in place.

* When using a field in comparison operations (such as ``<`` or
  ``!=``) a new, independent field is created with a boolean data
  array.

A field's data array is modified in a very similar way to how a numpy
array would be modified in the same operation, i.e. :ref:`broadcasting
<broadcasting>` ensures that the operands are compatible and the data
array is modified element-wise.
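
The three kinds of operation can be illustrated with a plain numpy array,
which behaves analogously for this purpose. This sketch does not use cf
and the values are arbitrary.

```python
import numpy

a = numpy.array([1.0, 2.0, 3.0])
alias = a              # a second name for the same array

b = a + 2              # binary operation: a new, independent array
a += 2                 # augmented operation: ``a`` is modified in place
c = a < 4              # comparison: a new boolean array
```

After the augmented assignment, ``alias`` still refers to the same,
now modified, array, whereas ``b`` is a separate object.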

Broadcasting is metadata-aware and will automatically account for
arbitrary configurations, such as dimension order, but will not allow
incompatible metadata to be combined, such as adding a field of
height to one of temperature.

The :ref:`resulting field's metadata <resulting_metadata>` will be very
similar to that of the operands which are also fields. Differences arise when
the existing metadata cannot correctly describe the newly created field. For
example, when dividing a field with units of *metres* by one with units of
*seconds*, the resulting field will have units of *metres/second*.

Arithmetic and comparison utilize :ref:`LAMA functionality <LAMA>` so
data arrays larger than the available physical memory may be operated
on.


.. _broadcasting:

Broadcasting
^^^^^^^^^^^^

The term broadcasting describes how data arrays of the operands with
different shapes are treated during arithmetic and comparison
operations. Subject to certain constraints, the smaller array is
"broadcast" across the larger array so that they have compatible
shapes.

The general broadcasting rules are similar to the :mod:`broadcasting
rules implemented in numpy <numpy.doc.broadcasting>`, the only
difference being when both operands are fields, in which case the
fields are temporarily conformed so that:

* Dimensions are aligned according to the coordinates' metadata to
  ensure that matching dimensions are broadcast against each other.

* Common dimensions have matching units.

* Common dimensions have matching axis directions.

This restructuring of the field ensures that the matching dimensions
are broadcast against each other.

Broadcasting is done without making needless copies of data and so is
usually very efficient.
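
The dimension alignment step can be sketched with numpy as follows. The
``align`` function and the use of plain name lists to identify
dimensions are assumptions made for illustration; the sketch ignores
units and axis directions and requires every dimension of the array to
appear in the target list.

```python
import numpy


def align(array, dims, target_dims):
    """Reorder and pad array so that its axes follow target_dims."""
    # Put the existing axes in the order in which they occur in target_dims
    order = sorted(range(len(dims)), key=lambda i: target_dims.index(dims[i]))
    array = array.transpose(order)

    # Give every target dimension that the array lacks a size 1 axis
    sizes = iter(array.shape)
    new_shape = [next(sizes) if d in dims else 1 for d in target_dims]
    return array.reshape(new_shape)


temperature = numpy.zeros((12, 73, 96))   # (time, latitude, longitude)
correction = numpy.ones((96, 73))         # (longitude, latitude)

aligned = align(correction,
                ['longitude', 'latitude'],
                ['time', 'latitude', 'longitude'])

# Ordinary numpy broadcasting now pairs up the matching dimensions
result = temperature + aligned
```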


What a field may be combined with
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A field may be combined or compared with the following objects:

+---------------+----------------------------------------------------+
| Object        | Description                                        |
+===============+====================================================+
|:obj:`int`,    | The field's data array is combined with            |
|:obj:`long`,   | the Python scalar                                  |
|:obj:`float`   |                                                    |
+---------------+----------------------------------------------------+
|:class:`.Data` | The field's data array                             |
|with size 1    | is combined with the :class:`.Data` object's scalar|
|               | value, taking into account:                        |
|               |                                                    |
|               | * Different but equivalent units                   |
+---------------+----------------------------------------------------+
|:class:`.Field`| The two fields must satisfy the field combination  |
|               | rules. The fields' data arrays and spaces are      |
|               | combined taking into account:                      |
|               |                                                    |
|               | * Identities of dimensions                         |
|               | * Different but equivalent units                   |
|               | * Different dimension orders                       |
|               | * Different dimension directions                   |
+---------------+----------------------------------------------------+


A field may appear on the left or right hand side of an operator, but
note the following warning:

.. warning::

   Combining a numpy array on the *left* with a field on the right
   does work, but will give generally unintended results -- namely a
   numpy array of fields.


.. _resulting_metadata:

Resulting metadata
^^^^^^^^^^^^^^^^^^

When creating any new field, the field's `history` attribute is updated to
record the operation. The field's existing name-like attributes may also need
to be changed:

>>> f.standard_name
'air_temperature'
>>> f += 2
>>> f.standard_name
'air_temperature'
>>> f.history
'air_temperature+2'

>>> f.standard_name
'air_temperature'
>>> f **= 2
>>> f.standard_name
AttributeError: 'Field' object has no attribute 'standard_name'
>>> f.history
'air_temperature**2'

>>> f.standard_name, g.standard_name
('air_temperature', 'eastward_wind')
>>> h = f * g
>>> h.standard_name
AttributeError: 'Field' object has no attribute 'standard_name'
>>> h.long_name
'air_temperature*eastward_wind'
>>> h.history
'air_temperature*eastward_wind'

When creating a new field which has different physical properties from
the input field(s), the units will also need to be changed:

>>> f.units
'K'
>>> f += 2
>>> f.units
'K'

>>> f.units
'K'
>>> f **= 2
>>> f.units
'K2'

>>> f.units, g.units
('m', 's')
>>> h = f / g
>>> h.units
'm/s'
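
The way units change under arithmetic can be pictured by tracking the
exponent of each base unit. The sketch below is a simplified model using
dictionaries and hypothetical helper functions; it is not how cf
represents units.

```python
def multiply(u, v):
    """Units of a product: add the exponents of each base unit."""
    out = dict(u)
    for base, exponent in v.items():
        out[base] = out.get(base, 0) + exponent
    # Base units whose exponents cancel drop out
    return dict((b, e) for b, e in out.items() if e != 0)


def power(u, n):
    """Units of a quantity raised to the power n."""
    return dict((b, e * n) for b, e in u.items())


def divide(u, v):
    """Units of a quotient."""
    return multiply(u, power(v, -1))


kelvin, metre, second = {'K': 1}, {'m': 1}, {'s': 1}

kelvin_squared = power(kelvin, 2)     # corresponds to 'K2'
speed = divide(metre, second)         # corresponds to 'm/s'
dimensionless = divide(metre, metre)  # all exponents cancel
```

Adding a constant, as in ``f += 2``, leaves the exponents and therefore
the units unchanged.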

When creating a new field which has a different space to the input fields, the
new space will in general contain the superset of dimensions from the two
input fields, but may not have some of either input field's auxiliary
coordinates or size 1 dimension coordinates. Refer to the field combination
rules for details.

Overloaded operators
^^^^^^^^^^^^^^^^^^^^

A field defines the following overloaded operators for arithmetic and comparison.

.. admonition:: Rich comparison operators

   +-------------------+---------------+
   | Operator          | Method        |
   +===================+===============+
   | ``<``             | __lt__()      |
   +-------------------+---------------+
   | ``<=``            | __le__()      |
   +-------------------+---------------+
   | ``==``            | __eq__()      |
   +-------------------+---------------+
   | ``!=``            | __ne__()      |
   +-------------------+---------------+
   | ``>``             | __gt__()      |
   +-------------------+---------------+
   | ``>=``            | __ge__()      |
   +-------------------+---------------+

.. admonition:: Binary arithmetic operators

   +-------------------+-------------------------------+
   | Operator          | Methods                       |
   +===================+==============+================+
   | ``+``             |__add__()     | __radd__()     |
   +-------------------+--------------+----------------+
   | ``-``             |__sub__()     | __rsub__()     |
   +-------------------+--------------+----------------+
   | ``*``             |__mul__()     | __rmul__()     |
   +-------------------+--------------+----------------+
   | ``/``             |__div__()     | __rdiv__()     |
   +-------------------+--------------+----------------+ 
   | ``/``             |__truediv__() | __rtruediv__() |
   +-------------------+--------------+----------------+ 
   | ``//``            |__floordiv__()| __rfloordiv__()| 
   +-------------------+--------------+----------------+
   | ``%``             |__mod__()     | __rmod__()     |
   +-------------------+--------------+----------------+
   | ``divmod()``      |__divmod__()  | __rdivmod__()  | 
   +-------------------+--------------+----------------+
   | ``**``, ``pow()`` |__pow__()     | __rpow__()     | 
   +-------------------+--------------+----------------+
   | ``&``             |__and__()     | __rand__()     | 
   +-------------------+--------------+----------------+
   | ``^``             |__xor__()     | __rxor__()     | 
   +-------------------+--------------+----------------+
   | ``|``             |__or__()      | __ror__()      |
   +-------------------+--------------+----------------+
   
.. admonition:: Augmented arithmetic operators
   
   +-------------------+---------------+
   | Operator          | Method        |
   +===================+===============+
   | ``+=``            |__iadd__()     |
   +-------------------+---------------+
   | ``-=``            |__isub__()     |
   +-------------------+---------------+
   | ``*=``            |__imul__()     |
   +-------------------+---------------+
   | ``/=``            |__idiv__()     |
   +-------------------+---------------+
   | ``/=``            |__itruediv__() |
   +-------------------+---------------+ 
   | ``//=``           |__ifloordiv__()| 
   +-------------------+---------------+
   | ``%=``            |__imod__()     |
   +-------------------+---------------+
   | ``**=``           |__ipow__()     | 
   +-------------------+---------------+
   | ``&=``            |__iand__()     | 
   +-------------------+---------------+
   | ``^=``            |__ixor__()     | 
   +-------------------+---------------+
   | ``|=``            |__ior__()      |
   +-------------------+---------------+
   
.. admonition:: Unary arithmetic operators
   
   +-------------------+---------------+
   | Operator          | Method        |
   +===================+===============+
   | ``-``             |__neg__()      |
   +-------------------+---------------+
   | ``+``             |__pos__()      |
   +-------------------+---------------+
   | ``abs()``         |__abs__()      |
   +-------------------+---------------+
   | ``~``             |__invert__()   |
   +-------------------+---------------+
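
As a reminder of how Python dispatches to these special methods, the
sketch below defines a hypothetical ``Quantity`` class that overloads a
few of them. It is a toy illustration of the mechanism, not a model of
the field class.

```python
class Quantity(object):
    """Hypothetical object overloading ``+``, ``+=`` and ``<``."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):       # quantity + other
        return Quantity(v + other for v in self.values)

    __radd__ = __add__              # other + quantity

    def __iadd__(self, other):      # quantity += other
        self.values = [v + other for v in self.values]
        return self

    def __lt__(self, other):        # quantity < other
        return Quantity(v < other for v in self.values)


q = Quantity([1, 2, 3])
same = q

left = q + 10     # calls q.__add__(10)
right = 10 + q    # int cannot handle this, so q.__radd__(10) is called
q += 1            # calls q.__iadd__(1); q is still the same object
below = q < 3     # calls q.__lt__(3)
```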


Manipulation routines
---------------------

A field has attributes and methods which return information about its
data array or manipulate the data array in some manner. Many of these
behave similarly to their numpy counterparts with the same name but,
where appropriate, return :class:`.Field` objects rather than numpy
arrays.

.. admonition:: Attributes

   .. tabularcolumns:: |l|l|l|

   ==============================  ===============================  ===========================
   Field attribute                 Description                      Numpy counterpart
   ==============================  ===============================  ===========================
   :attr:`~cf.Field.size`          Number of elements in the data   :attr:`numpy.ndarray.size`
                                   array
   :attr:`~cf.Field.shape`         Tuple of the data array's        :attr:`numpy.ndarray.shape`
                                   dimension sizes
   :attr:`~cf.Field.ndim`          Number of dimensions in the      :attr:`numpy.ndarray.ndim`
                                   data array
   :attr:`~cf.Field.dtype`         Numpy data-type of the data      :attr:`numpy.ndarray.dtype`
                                   array
   ==============================  ===============================  ===========================

.. admonition:: Methods

   .. tabularcolumns:: |l|l|l|

   ==============================  ==============================  =========================
   Field method                    Description                     Numpy counterpart
   ==============================  ==============================  =========================
   :meth:`~cf.Field.expand_dims`   Expand the shape of the data    :func:`numpy.expand_dims`
                                   array
   :meth:`~cf.Field.reverse_dims`  Reverse the directions of data
                                   array axes
   :meth:`~cf.Field.squeeze`       Remove size 1 dimensions from   :func:`numpy.squeeze`
                                   the field's data array
   :meth:`~cf.Field.transpose`     Permute the dimensions of the   :func:`numpy.transpose`
                                   data array
   :meth:`~cf.Field.unsqueeze`     Insert size 1 dimensions from
                                   the field's space into its
                                   data array
   ==============================  ==============================  =========================


Manipulating other variables
----------------------------

Subsetting, assignment, arithmetic and comparison operations on other
:class:`.Variable` types (such as :class:`.Coordinate`,
:class:`.CoordinateBounds`, :class:`.CellMeasures`) are very similar
to those for fields.

In general, different dimension identities, different dimension orders
and different dimension directions are not considered, since these
objects do not contain the coordinate system required to define these
properties (unlike a field).

Coordinates
^^^^^^^^^^^

Coordinates are a special case as they may contain a data array for
their coordinate bounds which needs to be treated consistently with
the main coordinate array:

>>> type(c)
<class 'cf.coordinate.Coordinate'>
>>> type(c.bounds)
<class 'cf.coordinate.CoordinateBounds'>
>>> c.shape
(12,)
>>> c.bounds.shape
(12, 2)
>>> d = c.subset[0:6]
>>> d.shape
(6,)
>>> d.bounds.shape
(6, 2)
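
The consistent treatment of the two arrays can be sketched with plain
numpy: the same index is applied to the coordinate values and to the
leading dimension of the bounds. The values below are made up to resemble
the monthly time coordinate used earlier, with 30-day cells assumed.

```python
import numpy

values = numpy.arange(15., 360., 30.)                     # 12 cell centres
bounds = numpy.column_stack((values - 15, values + 15))   # shape (12, 2)

# Apply one index to both arrays so that they stay consistent
index = slice(0, 6)
sub_values = values[index]
sub_bounds = bounds[index]
```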

.. warning:: 

   If the coordinate bounds are operated on directly, consistency with the
   parent coordinate may be broken.

----

.. rubric:: Footnotes

.. [1] Arrays that may have missing or invalid entries