cf.Data¶

class cf.Data(data=None, units=None, _FillValue=None, hardmask=True, chunk=False)¶

Bases: object

An N-dimensional data array with units and masked values.

Contains an N-dimensional, indexable and broadcastable array with many similarities to a numpy array.
Contains the units of the array elements.
Supports masked arrays, regardless of whether or not it was initialized with a masked array.
Uses Large Amounts of Massive Arrays (LAMA) functionality to store and operate on arrays which are larger than the available memory.

Indexing

A data array is indexable in a similar way to a numpy array, with two important differences:

Size 1 dimensions are never removed.

An integer index i takes the i-th element but does not reduce the rank of the output array by one.

When advanced indexing is used on more than one dimension, the advanced indices work independently.

When more than one dimension’s slice is a 1-d boolean array or 1-d sequence of integers, these indices work independently along each dimension (similar to the way vector subscripts work in Fortran), rather than being paired element by element.

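The independent ("orthogonal") indexing described above differs from plain numpy, where `a[0, :, [0,1], [0,1,2]]` raises a broadcasting error. The following is a minimal NumPy sketch of the documented behaviour, not cf's implementation; `orthogonal_getitem` is a hypothetical helper that builds an open mesh with `numpy.ix_` so each index acts only on its own dimension and integer indices keep a size 1 dimension.

```python
import numpy as np

def orthogonal_getitem(a, indices):
    """Hypothetical helper: index `a` independently along each dimension.

    Integer indices keep a size 1 dimension, and 1-d integer or boolean
    sequences select along their own dimension only (Fortran-style vector
    subscripts), mimicking the behaviour described for cf.Data.
    """
    if not isinstance(indices, tuple):
        indices = (indices,)
    # Pad with full slices so that every dimension has an index.
    indices = indices + (slice(None),) * (a.ndim - len(indices))
    per_dim = []
    for size, idx in zip(a.shape, indices):
        if isinstance(idx, slice):
            idx = np.arange(size)[idx]   # slice -> explicit positions
        elif isinstance(idx, (int, np.integer)):
            idx = [idx]                  # integer -> length 1 selection
        idx = np.asarray(idx)
        if idx.dtype == bool:
            idx = np.flatnonzero(idx)    # boolean mask -> positions
        per_dim.append(idx)
    # np.ix_ builds an open mesh, so each index acts on its own axis only.
    return a[np.ix_(*per_dim)]

a = np.zeros((12, 19, 73, 96))
print(orthogonal_getitem(a, (0, slice(None), [0, 1], [0, 1, 2])).shape)
# (1, 19, 2, 3)
```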
Miscellaneous

Data objects are hashable. Note that, since Data objects are mutable, the hash value of a Data object may change if it is computed at different times.

Examples

>>> d.shape
(12, 19, 73, 96)
>>> d[0, :, [0,1], [0,1,2]].shape
(1, 19, 2, 3)

Initialization

Parameters :

data : array-like, optional: The data for the array.
units : str or Units, optional: The units of the data. By default the array elements are dimensionless.
_FillValue : optional: The fill value of the data. By default the numpy fill value appropriate to the array’s data type will be used.
hardmask : bool, optional: If False then the mask is soft. By default the mask is hard.
chunk : bool, optional: If True then the data array will be partitioned if it is larger than the chunk size. By default the data array will be stored in a single partition.

Examples

>>> d = cf.Data(5)
>>> d = cf.Data([1,2,3], units='K')
>>> import numpy   
>>> d = cf.Data(numpy.arange(10).reshape(2,5), units=cf.Units('m/s'), _FillValue=-999)
>>> d = cf.Data(('f', 'l', 'y'))

Data attributes

`array`	A numpy array copy of the data array.
`direction`
`dtype`	The numpy data type of the data array.
`_FillValue`	The _FillValue CF attribute.
`first_datum`	The first element of the data array.
`hardmask`	Whether the mask is hard (True) or soft (False).
`ismasked`	True if the data array has any masked values.
`isscalar`	True if the data array is a 0-d scalar array.
`last_datum`	The last element of the data array.
`mask`	The boolean missing data mask of the data array.
`ndim`	Number of dimensions in the data array.
`order`
`partitions`
`pdims`
`pshape`	List of the data array’s partition dimension sizes.
`psize`	Number of data array partitions.
`shape`	List of the data array’s dimension sizes.
`size`	Number of elements in the data array.
`Units`	The Units object containing the units of the data array.
`varray`	A numpy array view of the data array.

Data methods

`add_partitions`	Add partition boundaries.
`all`	Test whether all data array elements evaluate to True.
`any`	Test whether any data array elements evaluate to True.
`binary_mask`	Return a binary missing data mask of the data array.
`change_dimension_names`	Change the dimension names.
`chunk`	Partition the data array.
`clip`	Clip (limit) the values in the data array in place.
`copy`	Return a deep copy.
`cos`	Take the trigonometric cosine of the data array in place.
`dump`	Return a string containing a full description of the instance.
`equals`	True if two data arrays are logically equal, False otherwise.
`expand_dims`	No check is done for whether dim is already in self.order.
`expand_partition_dims`	Insert a new size 1 partition dimension in place.
`flat`	Return a flat iterator over elements of the data array.
`flip`	Flip dimensions of the data array in place.
`func`	Apply an element-wise array operation to the data array in place.
`iterindices`	Return an iterator over indices of the data array.
`new_dimension_name`	Return a dimension name not being used by the data array.
`override_units`	Override the data array units in place.
`partition_boundaries`
`save_to_disk`	Put the data array on disk.
`setitem`
`setmask`	Set selected elements of the data array’s mask in place.
`sin`	Take the trigonometric sine of the data array in place.
`squeeze`	Remove size 1 dimensions from the data in place.
`to_disk`	Store each partition’s data on disk in place.
`to_memory`	Store each partition’s data in memory in place if the master array is smaller than the chunk size.
`transpose`	Permute the dimensions of the data array in place.

add_partitions(extra_boundaries, pdim, existing_boundaries=None)¶

Add partition boundaries.

Examples

>>> d.add_partitions(    )

all()¶

Test whether all data array elements evaluate to True.

Performs a logical AND over the data array and returns the result. Masked values are considered as True during computation.

Examples

>>> print d.array
[[0 3 0]]
>>> d.all()
False

>>> print d.array
[[1 3 --]]
>>> d.all()
True

any()¶

Test whether any data array elements evaluate to True.

Performs a logical OR over the data array and returns the result. Masked values are considered as False during computation.

Examples

>>> print d.array
[[0 0 0]]
>>> d.any()
False

>>> print d.array
[[-- 0 0]]
>>> d.any()
False

>>> print d.array
[[0 3 0]]
>>> d.any()
True
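The masked-value rules of all and any match those of numpy's `MaskedArray.all` and `MaskedArray.any`, so they can be checked with `numpy.ma`. This is only an analogy; it does not claim that cf.Data delegates to numpy.ma. The masked elements below deliberately hide values that would otherwise change the result.

```python
import numpy.ma as ma

# [[1 3 --]]: the hidden 0 would make all() False if it were not masked.
d1 = ma.masked_array([[1, 3, 0]], mask=[[False, False, True]])
# [[-- 0 0]]: the hidden 5 would make any() True if it were not masked.
d2 = ma.masked_array([[5, 0, 0]], mask=[[True, False, False]])

print(bool(d1.all()))  # True: the masked element counts as True
print(bool(d2.any()))  # False: the masked element counts as False
```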

binary_mask()¶

Return a binary missing data mask of the data array.

The binary mask’s data array comprises dimensionless 8-bit integers and has 0 where the data array has missing data and 1 otherwise.

Returns :	out : Data The binary mask.

Examples

>>> print d.mask.array
[[ True False  True False]]
>>> b = d.binary_mask().array
>>> print b
[[0 1 0 1]]
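A binary mask is the logical negation of the boolean mask, cast to 8-bit integers. A minimal numpy.ma sketch, where `binary_mask` is a hypothetical stand-alone function rather than the cf method:

```python
import numpy as np
import numpy.ma as ma

def binary_mask(a):
    """Hypothetical helper: 0 where `a` is masked, 1 elsewhere, as int8."""
    # getmaskarray always returns a full boolean array, even with no mask.
    return (~ma.getmaskarray(a)).astype(np.int8)

a = ma.masked_array([[5, 6, 7, 8]], mask=[[True, False, True, False]])
print(binary_mask(a).tolist())  # [[0, 1, 0, 1]]
```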

change_dimension_names(dim_name_map)¶

Change the dimension names.

The dimension names are arbitrary (though unique), so mapping them to another arbitrary (though unique) set does not change the data array values, units, dimension directions nor dimension order.

Examples

>>> d.order
['dim0', 'dim1', 'dim2']
>>> dim_name_map
{'dim0': 'dim1',
 'dim1': 'dim0',
 'dim2': 'dim2',
 'dim3': 'dim3'}
>>> d.change_dimension_names(dim_name_map)
>>> d.order
['dim1', 'dim0', 'dim2']
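Because only the labels change, renaming amounts to mapping each entry of the order list through the dictionary while leaving positions untouched; entries in the map that are not in use (such as 'dim3' above) are simply ignored. A hypothetical pure-Python sketch of that bookkeeping, not cf's code:

```python
def change_dimension_names(order, dim_name_map):
    """Hypothetical helper: rename every dimension in `order` via the map.

    Only the labels change; each dimension keeps its position.
    """
    new_order = [dim_name_map[name] for name in order]
    # Names must remain unique after renaming.
    if len(set(new_order)) != len(new_order):
        raise ValueError("dimension names must remain unique")
    return new_order

dim_name_map = {"dim0": "dim1", "dim1": "dim0", "dim2": "dim2", "dim3": "dim3"}
print(change_dimension_names(["dim0", "dim1", "dim2"], dim_name_map))
# ['dim1', 'dim0', 'dim2']
```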

chunk(chunksize=None, extra_boundaries=None, chunk_dims=None)¶

Partition the data array.

Parameters :

chunksize : int, optional

extra_boundaries : sequence of lists or tuples, optional

chunk_dims : sequence of lists or tuples, optional

Returns :

extra_boundaries, chunk_dims : list, list

Examples

>>> d.chunk()
>>> d.chunk(100000)
>>> d.chunk(extra_boundaries=([3, 6],), chunk_dims=['dim0'])
>>> d.chunk(extra_boundaries=([3, 6], [40, 80]), chunk_dims=['dim0', 'dim1'])

clip(a_min, a_max, units=None)¶

Clip (limit) the values in the data array in place.

Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified then values smaller than 0 become 0 and values larger than 1 become 1.

Parameters :

a_min : scalar

a_max : scalar

units : str or Units

Returns :	None

Examples
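No cf example is given here, so the following numpy.ma sketch illustrates only the clipping rule for an interval of [0, 1]. It clips the underlying data in place through the masked array's `data` view, leaving the mask untouched; the `units` parameter of the cf method has no counterpart in this sketch.

```python
import numpy as np
import numpy.ma as ma

d = ma.masked_array([[-2.0, 0.5, 3.0, 9.0]], mask=[[False, False, False, True]])

raw = d.data                          # a view of d's underlying values
np.clip(raw, 0.0, 1.0, out=raw)       # clip in place; the mask is unchanged
print(d.tolist())                     # [[0.0, 0.5, 1.0, None]]
```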

conform_args(save=None, dtype=True, **kwargs)¶

Return a dictionary of arguments for the Partition object’s conform method.

The values are inferred from the state of the Data object and any keyword arguments.

Parameters :

save : bool, optional

dtype : numpy.dtype or None, optional

kwargs :

Returns :

out : dict

Examples

copy()¶

Return a deep copy.

Equivalent to copy.deepcopy(d).

Returns :	out : The deep copy.

Examples

>>> e = d.copy()

cos()¶

Take the trigonometric cosine of the data array in place.

Units are accounted for in the calculation, so that the cosine of 90 degrees_east is 0.0, as is the cosine of 1.57079632 radians. If the units are not equivalent to radians (such as Kelvin) then they are treated as if they were radians.

The Units are changed to ‘1’ (nondimensional).

Returns :	None

Examples

>>> d.Units
<CF Units: degrees_east>
>>> print d.array
[[-90 0 90 --]]
>>> d.cos()
>>> d.Units
<CF Units: 1>
>>> print d.array
[[0.0 1.0 0.0 --]]

>>> d.Units
<CF Units: m s-1>
>>> print d.array
[[1 2 3 --]]
>>> d.cos()
>>> d.Units
<CF Units: 1>
>>> print d.array
[[0.540302305868 -0.416146836547 -0.9899924966 --]]
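The unit handling above can be sketched as "convert angles to radians, otherwise use the raw values". In this NumPy sketch `TO_RADIANS` and `cos_with_units` are hypothetical; a real implementation would consult a units library rather than a hard-coded table.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical conversion factors to radians (a small illustrative table).
TO_RADIANS = {
    "radians": 1.0,
    "degrees": np.pi / 180.0,
    "degrees_east": np.pi / 180.0,
    "degrees_north": np.pi / 180.0,
}

def cos_with_units(values, units):
    """Hypothetical helper returning (cosine, new_units).

    Units with no entry in TO_RADIANS (for example 'K' or 'm s-1') are
    treated as if they were radians; the result is nondimensional ('1').
    """
    factor = TO_RADIANS.get(units, 1.0)
    return np.cos(values * factor), "1"

d = ma.masked_array([[-90, 0, 90, 0]], mask=[[False, False, False, True]])
result, new_units = cos_with_units(d, "degrees_east")
print(new_units)                    # 1
print(result.round(12).tolist())    # [[0.0, 1.0, 0.0, None]]
```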

dump(id=None)¶

Return a string containing a full description of the instance.

Parameters :	id : str, optional Set the common prefix of component names. By default the instance’s class name is used.
Returns :	out : str A string containing the description.

Examples

>>> x = d.dump()
>>> print d.dump()
>>> print d.dump(id='data1')

equals(other, rtol=None, atol=None, traceback=False)¶

True if two data arrays are logically equal, False otherwise.

Parameters :

other : The object to compare for equality.
atol : float, optional: The absolute tolerance for all numerical comparisons. By default the value returned by the ATOL function is used.
rtol : float, optional: The relative tolerance for all numerical comparisons. By default the value returned by the RTOL function is used.
traceback : bool, optional: If True then print a traceback highlighting where the two instances differ.

Returns :

out : bool: Whether or not the two instances are equal.

Examples

>>> d.equals(d)
True
>>> d.equals(d + 1)
False

expand_dims(axis=0, dim='None', direction=True)¶

No check is done for whether dim is already in self.order.

Not to be confused with the expand_partition_dims method.

expand_partition_dims(pdim)¶

Insert a new size 1 partition dimension in place.

The new partition dimension is inserted at position 0.

Not to be confused with the expand_dims method.

Parameters :	pdim : str The name of the new partition dimension.
Returns :	None

Examples

>>> d.pdims
['dim0', 'dim1']
>>> d.expand_partition_dims('dim2')
>>> d.pdims
['dim2', 'dim0', 'dim1']

flat(ignore_masked=True)¶

Return a flat iterator over elements of the data array.

Parameters :	ignore_masked : bool, optional If False then masked and unmasked elements will be returned. By default only unmasked elements are returned.
Returns :	out : generator An iterator over elements of the data array.

Examples

>>> print d.array
[[1 -- 3]]
>>> for x in d.flat():
...     print x
...
1
3

>>> for x in d.flat(False):
...     print x
...
1
--
3
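The iteration rule can be sketched as a generator over a numpy masked array in C (row-major) order. `flat` here is a hypothetical stand-alone function; as in the examples above, masked elements are either skipped or yielded as the `numpy.ma.masked` constant.

```python
import numpy.ma as ma

def flat(a, ignore_masked=True):
    """Hypothetical generator mirroring Data.flat on a numpy masked array."""
    a = ma.asarray(a)
    data = a.data.ravel()                  # underlying values, C order
    mask = ma.getmaskarray(a).ravel()      # matching boolean mask
    for value, is_masked in zip(data, mask):
        if not is_masked:
            yield value
        elif not ignore_masked:
            yield ma.masked                # stand-in for a masked element

a = ma.masked_array([[1, 2, 3]], mask=[[False, True, False]])
print([int(x) for x in flat(a)])  # [1, 3]
```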

flip(axes=None)¶

Flip dimensions of the data array in place.

Parameters :	axes : int or sequence of ints Flip the dimensions whose positions are given. By default all dimensions are flipped.
Returns :	out : list of ints The axes which were flipped, in arbitrary order.

Examples

>>> d.flip()
>>> d.flip(1)

>>> e = d[::-1, :, ::-1]
>>> d.flip([2, 0])
>>> d.equals(e)
True

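Flipping a dimension is equivalent to indexing it with a reversed slice, as the `d[::-1, :, ::-1]` example suggests. A minimal NumPy sketch, where `flip` is a hypothetical helper that returns a view instead of working in place (numpy's own `numpy.flip` offers similar functionality):

```python
import numpy as np

def flip(a, axes=None):
    """Hypothetical helper: return a view of `a` reversed along `axes`."""
    if axes is None:
        axes = range(a.ndim)          # default: flip every dimension
    elif isinstance(axes, int):
        axes = [axes]
    index = [slice(None)] * a.ndim
    for axis in axes:
        index[axis] = slice(None, None, -1)
    return a[tuple(index)]

b = np.arange(24).reshape(2, 3, 4)
print(np.array_equal(flip(b, [2, 0]), b[::-1, :, ::-1]))  # True
```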
func(f, *args, **kwargs)¶

Apply an element-wise array operation to the data array in place.

Parameters :	f : function The function to be applied. args, kwargs : Any arguments and keyword arguments passed to the function given by the f parameter.
Returns :	None

Examples

>>> print d.array
[[ 0.          1.57079633]
 [ 3.14159265  4.71238898]]
>>> import numpy
>>> d.func(numpy.cos)
>>> print d.array
[[ 1.0  0.0]
 [-1.0  0.0]]
>>> def f(x, y, a=0):
...     return x*y + a
...
>>> d.func(f, 2, a=10)
>>> print d.array
[[ 12.0  10.0]
 [  8.0  10.0]]

iterindices()¶

Return an iterator over indices of the data array.

Returns :	out : generator An iterator over indices of the data array.

Examples

>>> d.shape
(2, 1, 3)
>>> for index in d.iterindices():
...     print index
...
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(1, 0, 0)
(1, 0, 1)
(1, 0, 2)
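The indices are produced in C (row-major) order, which is the Cartesian product of the ranges of each dimension. A minimal sketch using `itertools.product`; `iterindices` is a hypothetical stand-alone function taking a shape (numpy's `numpy.ndindex` yields the same sequence):

```python
import itertools

def iterindices(shape):
    """Hypothetical helper: an iterator over every index of `shape`, C order."""
    return itertools.product(*(range(n) for n in shape))

for index in iterindices((2, 1, 3)):
    print(index)
```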

new_dimension_name()¶

Return a dimension name not being used by the data array.

Note that a partition of the data array may have dimensions which don’t belong to the data array itself.

Returns :	out : str The new dimension name.

Examples

>>> d.order
['dim1', 'dim0']
>>> d.partitions.info('order')
[['dim0', 'dim0'],
 ['dim1', 'dim0', 'dim2']]
>>> d.new_dimension_name()
'dim3'
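The example implies that names used only by partitions must also be avoided. The following sketch assumes the 'dimN' naming pattern seen in the examples and picks the smallest unused N; whether cf chooses the smallest unused number or some other scheme is not stated, so `new_dimension_name` is hypothetical.

```python
def new_dimension_name(used_names, prefix="dim"):
    """Hypothetical helper: the first 'dimN' not present in `used_names`."""
    used = set(used_names)
    n = 0
    while "%s%d" % (prefix, n) in used:
        n += 1
    return "%s%d" % (prefix, n)

order = ["dim1", "dim0"]
partition_orders = [["dim0", "dim0"], ["dim1", "dim0", "dim2"]]
# Names used by partitions count too, even if the data array lacks them.
used = set(order).union(*partition_orders)
print(new_dimension_name(used))  # dim3
```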

override_units(units)¶

Override the data array units in place.

This method is not to be confused with setting the Units attribute to units which are equivalent to the original units. Here the new units need not be equivalent to the original ones, and the data array elements are not changed to reflect the new units.

Parameters :	units : str or Units The new units for the data array.
Returns :	None

Examples

>>> d.Units
<CF Units: hPa>
>>> d.first_datum
1012.0
>>> d.override_units('km')
>>> d.Units
<CF Units: km>
>>> d.first_datum
1012.0
>>> d.override_units(cf.Units('watts'))
>>> d.Units
<CF Units: watts>
>>> d.first_datum
1012.0

partition_boundaries()¶

save_to_disk(itemsize=None)¶

Put the data array on disk.

Parameters :	itemsize : int, optional
Returns :	out : bool

setitem(value, indices=Ellipsis, condition=None, masked=None, ref_mask=None, hardmask=None)¶

setmask(value, indices=Ellipsis)¶

Set selected elements of the data array’s mask in place.

The value to which the selected elements of the mask will be set may be any object which is broadcastable across the selected elements. The broadcasted value may be of any data type but will be evaluated as boolean.

Unmasked elements are set to the fill value.

The mask may be effectively removed by setting every element to False with f.setmask(False).

Note that if and only if the value to be assigned is logically scalar and evaluates to True then f.setmask(value, indices) is equivalent to f.setitem(cf.masked, indices). This is consistent with the behaviour of numpy masked arrays.

Parameters :	value : array-like The value to which the selected elements of the mask will be set. Must be an object which is broadcastable across the selected elements. indices : optional Indices of the data array. Only elements of the mask described by the indices are set to value. By default, the entire mask is considered.
Returns :	None

Examples
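No cf example is given here, so the following numpy.ma sketch illustrates only the broadcasting and boolean evaluation of value. `setmask` is a hypothetical stand-alone function; it does not reproduce the fill-value behaviour described above, nor any hard-mask handling.

```python
import numpy as np
import numpy.ma as ma

def setmask(a, value, indices=Ellipsis):
    """Hypothetical helper: set the mask of `a` at `indices` to `value`.

    `value` is broadcast across the selected elements and evaluated as
    boolean (the fill-value behaviour of the cf method is not modelled).
    """
    mask = ma.getmaskarray(a).copy()
    mask[indices] = np.asarray(value, dtype=bool)
    a.mask = mask

a = ma.masked_array([[1, 2, 3], [4, 5, 6]])
setmask(a, [1, 0, 1], (0, slice(None)))   # mask row 0, columns 0 and 2
print(a.mask.tolist())  # [[True, False, True], [False, False, False]]
setmask(a, False)                          # effectively remove the mask
```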

sin()¶

Take the trigonometric sine of the data array in place.

Units are accounted for in the calculation, so that the sine of 90 degrees_east is 1.0, as is the sine of 1.57079632 radians. If the units are not equivalent to radians (such as Kelvin) then they are treated as if they were radians.

The Units are changed to ‘1’ (nondimensional).

Returns :	None

Examples

>>> d.Units
<CF Units: degrees_north>
>>> print d.array
[[-90 0 90 --]]
>>> d.sin()
>>> d.Units
<CF Units: 1>
>>> print d.array
[[-1.0 0.0 1.0 --]]

>>> d.Units
<CF Units: m s-1>
>>> print d.array
[[1 2 3 --]]
>>> d.sin()
>>> d.Units
<CF Units: 1>
>>> print d.array
[[0.841470984808 0.909297426826 0.14112000806 --]]

squeeze(axes=None)¶

Remove size 1 dimensions from the data in place.

Parameters :	axes : (sequence of) int or str The axes to be squeezed. May be one of, or a sequence of any combination of: The integer position of a dimension in the data array (negative indices allowed). The internal name of a dimension. By default all size 1 dimensions are squeezed.
Returns :	out : list of ints The positions of the axes which were squeezed.

Examples

>>> v.shape
(1,)
>>> v.squeeze()
>>> v.shape
()

>>> v.shape
(1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1)
>>> v.squeeze((0,))
>>> v.shape
(2, 1, 3, 1, 4, 1, 5, 1, 6, 1)
>>> v.squeeze(1)
>>> v.shape
(2, 3, 1, 4, 1, 5, 1, 6, 1)
>>> v.squeeze([2, 4])
>>> v.shape
(2, 3, 4, 5, 1, 6, 1)
>>> v.squeeze()
>>> v.shape
(2, 3, 4, 5, 6)

to_disk()¶

Store each partition’s data on disk in place.

There is no change to partitions with data that are already on disk.

Returns :	None

Examples

>>> d.to_disk()

to_memory(regardless=False)¶

Store each partition’s data in memory in place if the master array is smaller than the chunk size.

There is no change to partitions with data that are already in memory.

Parameters :	regardless : bool, optional If True then store all partitions’ data in memory regardless of the size of the master array. By default only store all partitions’ data in memory if the master array is smaller than the chunk size.
Returns :	None

Examples

>>> d.to_memory()
>>> d.to_memory(True)

transpose(axes=None)¶

Permute the dimensions of the data array in place.

Parameters :	axes : sequence, optional The new order of the data array. By default, reverse the dimensions’ order, otherwise the axes are permuted according to the values given. The values of the sequence may be any combination of: The integer position of a dimension in the data array. The internal name of a dimension.
Returns :	None

Examples

>>> d.ndim
3
>>> d.transpose()
>>> d.transpose([1, 0, 2])
>>> d.transpose(['dim2', 'dim0', 'dim1'])
>>> d.transpose((1, 0, 'dim2'))
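Mixed axis specifications such as `(1, 0, 'dim2')` must first be resolved to integer positions using the internal dimension names. A hypothetical sketch of that resolution, assuming the names are listed in an order list like `d.order`; `resolve_axes` is not a cf function.

```python
import numpy as np

def resolve_axes(axes, order):
    """Hypothetical helper: convert a mix of positions and internal
    dimension names (as listed in `order`) into integer positions."""
    if axes is None:
        return list(range(len(order)))[::-1]   # default: reverse the order
    positions = [order.index(ax) if isinstance(ax, str) else ax for ax in axes]
    if sorted(positions) != list(range(len(order))):
        raise ValueError("axes must be a permutation of the dimensions")
    return positions

order = ["dim0", "dim1", "dim2"]
a = np.zeros((2, 3, 4))
axes = resolve_axes((1, 0, "dim2"), order)
print(a.transpose(axes).shape)        # (3, 2, 4)
print([order[i] for i in axes])       # ['dim1', 'dim0', 'dim2']
```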

Units¶

The Units object containing the units of the data array.

Deleting the Units attribute is equivalent to setting the units to the undefined units object, so the Data object is guaranteed to always have the Units attribute.

Examples

>>> d.Units = cf.Units('m')
>>> d.Units
<CF Units: m>
>>> del d.Units
>>> d.Units
<CF Units: >

array¶

A numpy array copy of the data array.

Examples

>>> a = d.array
>>> type(a)
<type 'numpy.ndarray'>

direction¶

dtype¶

The numpy data type of the data array.

By default this is the data type with the smallest size and smallest scalar kind to which all data array partitions may be safely cast without loss of information. For example, if the partitions have data types ‘int64’ and ‘float32’ then the data array’s data type will be ‘float64’; if they have data types ‘int64’ and ‘int32’ then it will be ‘int64’.

Setting the data type to a numpy.dtype object, or any object convertible to a numpy.dtype object, will change the interpretation of the underlying data array elements. Note that the underlying data are not altered, so reinstating the original data type results in no loss of information, even if the interim data type was of smaller size and scalar kind.

Deleting the data type after setting it will reinstate the default behaviour. Deleting the data type when the default behaviour is in place will have no effect.

Examples

>>> d.dtype
dtype('float64')
>>> type(d.dtype)
<type 'numpy.dtype'>
>>> print d.array
[0.5 1.5 2.5]

>>> print d.array
[0.5 1.5 2.5]
>>> import numpy
>>> d.dtype = numpy.dtype(int)
>>> print d.array
[0 1 2]
>>> d.dtype = bool
>>> print d.array
[True True True]
>>> d.dtype = 'float64'
>>> print d.array
[0.5 1.5 2.5]
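The default rule matches numpy's type promotion (`numpy.result_type`), and the "reinterpret without altering" behaviour can be modelled by casting a copy on every read. `ReinterpretedArray` is a hypothetical class illustrating the set and delete semantics described above, not cf's implementation.

```python
import numpy as np

# Default: the smallest type to which every partition can be safely cast.
print(np.result_type(np.dtype("int64"), np.dtype("float32")))  # float64
print(np.result_type(np.dtype("int64"), np.dtype("int32")))    # int64

class ReinterpretedArray:
    """Hypothetical sketch: a settable dtype that only casts on read."""

    def __init__(self, data):
        self._data = np.asarray(data)   # stored values, never modified
        self._dtype = None              # None means "use the default"

    @property
    def dtype(self):
        return self._data.dtype if self._dtype is None else self._dtype

    @dtype.setter
    def dtype(self, value):
        self._dtype = np.dtype(value)

    @dtype.deleter
    def dtype(self):
        self._dtype = None              # reinstate the default behaviour

    @property
    def array(self):
        # astype returns a new array, so the stored data are untouched.
        return self._data.astype(self.dtype)

r = ReinterpretedArray([0.5, 1.5, 2.5])
r.dtype = int
print(r.array.tolist())   # [0, 1, 2]
del r.dtype
print(r.array.tolist())   # [0.5, 1.5, 2.5]
```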

first_datum¶

The first element of the data array.

Examples

>>> print d.array
[[1 2 3 4]]
>>> d.first_datum
1

>>> print d.array
[[-- 2 3 4]]
>>> d.first_datum
--

hardmask¶

Whether the mask is hard (True) or soft (False).

When the mask is hard, masked entries of the data array cannot be unmasked by assignment.

By default, the mask is hard.

Examples

>>> d.hardmask = False
>>> d.hardmask
False

ismasked¶

True if the data array has any masked values.

Examples

>>> d.ismasked
True

isscalar¶

True if the data array is a 0-d scalar array.

Examples

>>> d.ndim
0
>>> d.isscalar
True

>>> d.ndim >= 1
True
>>> d.isscalar
False

last_datum¶

The last element of the data array.

Examples

>>> print d.array
[[1 2 3 4]]
>>> d.last_datum
4

>>> print d.array
[[1 2 3 --]]
>>> d.last_datum
--

mask¶

The boolean missing data mask of the data array.

The boolean mask has True where the data array has missing data and False otherwise.

Examples

>>> d.shape
(12, 73, 96)
>>> m = d.mask
>>> m
<CF Data: >
>>> m.dtype
dtype('bool')
>>> m.shape
(12, 73, 96)

ndim¶

Number of dimensions in the data array.

Examples

>>> d.shape
(73, 96)
>>> d.ndim
2

order¶

pdims¶

pndim¶

pshape¶

List of the data array’s partition dimension sizes.

Note that this attribute is a list, not a tuple.

Examples

>>> d.shape
(73, 96)
>>> d.pshape
[73, 2]

psize¶

Number of data array partitions.

Examples

>>> d.pshape
[73, 2]
>>> d.psize
146

shape¶

List of the data array’s dimension sizes.

Examples

>>> d.shape
(73, 96)

>>> d.shape
()

size¶

Number of elements in the data array.

Examples

>>> d.shape
(73, 96)
>>> d.size
7008

varray¶

A numpy array view of the data array.

Note that making changes to elements of the returned view changes the underlying data.

Examples

>>> a = d.varray
>>> type(a)
<type 'numpy.ndarray'>
>>> a
array([0, 1, 2, 3, 4])
>>> a[0] = 999
>>> d.varray
array([999, 1, 2, 3, 4])
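The difference between array (a copy) and varray (a view) follows the usual numpy copy and view semantics. A plain NumPy illustration, which makes no claim about how cf builds either object:

```python
import numpy as np

data = np.array([0, 1, 2, 3, 4])
a_copy = data.copy()   # like array: independent of the original
a_view = data.view()   # like varray: shares memory with the original

a_copy[0] = -1         # does not affect data
a_view[0] = 999        # changes data as well
print(data.tolist())   # [999, 1, 2, 3, 4]
```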
