cf.Field.collapse
Field.collapse(method, axes=None, squeeze=False, mtol=1, weights=None, ddof=1, a=None, i=False, group=None, regroup=False, within_days=None, within_years=None, over_days=None, over_years=None, coordinate='mid_range', group_by='coords', _debug=False, **kwargs)
Collapse axes of the field.
Collapsing an axis involves reducing its size with a given (typically statistical) method.
By default all axes with size greater than 1 are collapsed completely with the given method. For example, to find the minimum of the data array:
>>> g = f.collapse('min')
By default the calculations of means, standard deviations and variances are not weighted. For example, to find the unweighted mean of the data array:
>>> g = f.collapse('mean')
Specific weights may be forced with the weights parameter. For example to find the variance of the data array, weighting the X and Y axes by cell area, the T axis linearly and leaving all other axes unweighted:
>>> g = f.collapse('variance', weights=['area', 'T'])
A subset of the axes may be collapsed. For example, to find the mean over the time axis:
>>> f
<CF Field: air_temperature(time(12), latitude(73), longitude(96)) K>
>>> g = f.collapse('T: mean')
>>> g
<CF Field: air_temperature(time(1), latitude(73), longitude(96)) K>
For example, to find the maximum over the time and height axes:
>>> g = f.collapse('T: Z: max')
or, equivalently:
>>> g = f.collapse('max', axes=['T', 'Z'])
An ordered sequence of collapses over different (or the same) subsets of the axes may be specified. For example, to first find the mean over the time axis and subsequently the standard deviation over the latitude and longitude axes:
>>> g = f.collapse('T: mean area: sd')
or, equivalently, in two steps:
>>> g = f.collapse('mean', axes='T').collapse('sd', axes='area')
Grouped collapses are possible, whereby groups of elements along an axis are defined and each group is collapsed independently. The collapsed groups are concatenated so that the collapsed axis in the output field has a size equal to the number of groups. For example, to find the variance along the longitude axis within each group of size 10 degrees:
>>> g = f.collapse('longitude: variance', group=cf.Data(10, 'degrees'))
Climatological statistics (a type of grouped collapse) as defined by the CF conventions may be specified. For example, to collapse a time axis into multiannual means of calendar monthly minima:
>>> g = f.collapse('time: minimum within years T: mean over years',
...                within_years=cf.M())
In all collapses, missing data array elements are accounted for in the calculation.
The following collapse methods are available, over any subset of the axes:
Method: Notes
Maximum: The maximum of the values.
Minimum: The minimum of the values.
Sum: The sum of the values.
Mid-range: The average of the maximum and the minimum of the values.
Range: The absolute difference between the maximum and the minimum of the values.
Mean: The unweighted mean of \(N\) values \(x_i\) is
\[\mu=\frac{1}{N}\sum_{i=1}^{N} x_i\]
The weighted mean of \(N\) values \(x_i\) with corresponding weights \(w_i\) is
\[\hat{\mu}=\frac{1}{V_{1}} \sum_{i=1}^{N} w_i x_i\]
where \(V_{1}=\sum_{i=1}^{N} w_i\), the sum of the weights.
Variance: The unweighted variance of \(N\) values \(x_i\) with \(N-ddof\) degrees of freedom (\(ddof\ge0\)) is
\[s_{N-ddof}^{2}=\frac{1}{N-ddof} \sum_{i=1}^{N} (x_i - \mu)^2\]
The unweighted biased estimate of the variance (\(s_{N}^{2}\)) is given by \(ddof=0\) and the unweighted unbiased estimate of the variance using Bessel’s correction (\(s^{2}=s_{N-1}^{2}\)) is given by \(ddof=1\).
The weighted biased estimate of the variance of \(N\) values \(x_i\) with corresponding weights \(w_i\) is
\[\hat{s}_{N}^{2}=\frac{1}{V_{1}} \sum_{i=1}^{N} w_i(x_i - \hat{\mu})^{2}\]
The corresponding weighted unbiased estimate of the variance is
\[\hat{s}^{2}=\frac{1}{V_{1} - (V_{2}/V_{1})} \sum_{i=1}^{N} w_i(x_i - \hat{\mu})^{2}\]
where \(V_{2}=\sum_{i=1}^{N} w_i^{2}\), the sum of the squares of weights. In both cases, the weights are assumed to be non-random “reliability weights”, as opposed to frequency weights.
Standard deviation: The standard deviation is the square root of the variance.
Sample size: The sample size, \(N\), as would be used for other statistical calculations.
Sum of weights: The sum of sample weights, \(V_{1}\), as would be used for other statistical calculations.
Sum of squares of weights: The sum of squares of sample weights, \(V_{2}\), as would be used for other statistical calculations.
New in version 1.0.
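The mean and variance definitions can be checked numerically with a few lines of plain Python. This is only an illustrative sketch, not the library's implementation: the function name `weighted_stats` is hypothetical, missing data are not handled, and the weighted unbiased estimate uses the standard reliability-weights denominator \(V_{1} - V_{2}/V_{1}\), which reduces to \(N-1\) for unit weights.

```python
def weighted_stats(values, weights=None, ddof=1):
    """Return (mean, variance) of values, optionally weighted.

    ddof=0 gives the biased estimate; ddof=1 gives the unbiased one.
    Hypothetical helper for illustration only.
    """
    n = len(values)
    if weights is None:
        # Unweighted: divide the sum of squares by N - ddof
        mean = sum(values) / n
        ss = sum((x - mean) ** 2 for x in values)
        return mean, ss / (n - ddof)

    v1 = sum(weights)                 # V1: sum of the weights
    v2 = sum(w * w for w in weights)  # V2: sum of the squared weights
    mean = sum(w * x for w, x in zip(weights, values)) / v1
    ss = sum(w * (x - mean) ** 2 for w, x in zip(weights, values))
    # Reliability weights: V1 - V2/V1 equals N - 1 for unit weights
    denom = v1 if ddof == 0 else v1 - v2 / v1
    return mean, ss / denom


data = [1.0, 2.0, 3.0, 4.0]
print(weighted_stats(data))                        # unweighted, ddof=1
print(weighted_stats(data, weights=[1, 1, 1, 1]))  # unit weights, same estimate
print(weighted_stats(data, weights=[3, 1, 1, 1], ddof=0))
```

Multiplying every weight by the same constant leaves both weighted estimates unchanged, which is a quick sanity check on the denominators.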
Parameters:
- method: str
Define the collapse method. All of the axes specified by the axes parameter are collapsed simultaneously by this method. The method is given by one of the following strings:
method                            Description
'max' or 'maximum'                Maximum
'min' or 'minimum'                Minimum
'sum'                             Sum
'mid_range'                       Mid-range
'range'                           Range
'mean' or 'average' or 'avg'      Mean
'var' or 'variance'               Variance
'sd' or 'standard_deviation'      Standard deviation
'sample_size'                     Sample size
'sum_of_weights'                  Sum of weights
'sum_of_weights2'                 Sum of squares of weights
An alternative form is to provide a CF cell methods-like string. In this case an ordered sequence of collapses may be defined and both the collapse methods and their axes are provided. The axes are interpreted as for the axes parameter, which must not also be set. For example:
>>> g = f.collapse('time: max (interval 1 hr) X: Y: mean dim3: sd')
is equivalent to:
>>> g = f.collapse('max', axes='time')
>>> g = g.collapse('mean', axes=['X', 'Y'])
>>> g = g.collapse('sd', axes='dim3')
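To make the equivalence concrete, the following sketch splits a cell methods-like string into an ordered list of (axes, method) steps. It is an assumption-laden illustration rather than the parser used by the library: `parse_collapse_string` is a hypothetical name, parenthesised qualifiers such as `(interval 1 hr)` are simply discarded, and any words after the method (for example `within years`) are kept attached to it.

```python
import re


def parse_collapse_string(spec):
    """Split a cell methods-like string into ordered (axes, method) steps."""
    # Drop parenthesised qualifiers such as "(interval 1 hr)"
    spec = re.sub(r"\([^)]*\)", " ", spec)

    steps, axes, pending = [], [], None
    for token in spec.split():
        if token.endswith(":"):
            # An axis name following a method starts a new collapse step
            if pending is not None:
                steps.append(pending)
                axes, pending = [], None
            axes.append(token[:-1])
        elif pending is None:
            # First non-axis word after the axes is the method
            pending = (axes, token)
        else:
            # Later words are modifiers, e.g. "within years"
            pending = (pending[0], pending[1] + " " + token)
    if pending is not None:
        steps.append(pending)
    return steps


for axes, method in parse_collapse_string(
        'time: max (interval 1 hr) X: Y: mean dim3: sd'):
    print(axes, method)
```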
Climatological collapses are carried out if a method string contains any of the modifiers 'within days', 'within years', 'over days' or 'over years'. For example, to collapse a time axis into multiannual means of calendar monthly minima:
>>> g = f.collapse('time: minimum within years T: mean over years',
...                within_years=cf.M())
which is equivalent to:
>>> g = f.collapse('time: minimum within years', within_years=cf.M())
>>> g = g.collapse('mean over years', axes='T')
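The two-step structure of this climatological collapse can be illustrated without the library. The sketch below uses made-up sample data and plain dictionaries: it first takes the minimum within each calendar month of each year, then averages matching months across the years. It ignores coordinate bounds, weights and missing data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical samples: (date, value)
samples = [
    (date(2000, 1, 5), 3.0), (date(2000, 1, 20), 1.0),
    (date(2000, 2, 10), 4.0),
    (date(2001, 1, 7), 5.0), (date(2001, 1, 25), 2.0),
    (date(2001, 2, 14), 6.0),
]

# Step 1, "minimum within years": one group per calendar month of each year
monthly_min = defaultdict(lambda: float("inf"))
for day, value in samples:
    key = (day.year, day.month)
    monthly_min[key] = min(monthly_min[key], value)

# Step 2, "mean over years": average the matching months across years
by_month = defaultdict(list)
for (year, month), value in monthly_min.items():
    by_month[month].append(value)
climatology = {m: sum(v) / len(v) for m, v in sorted(by_month.items())}

print(climatology)
```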
- axes, kwargs: optional
The axes to be collapsed. The axes are those that would be selected by this call of the field’s axes method: f.axes(axes, **kwargs). See cf.Field.axes for details. If an axis has size 1 then it is ignored. By default all axes with size greater than 1 are collapsed. If axes has the special value 'area' then it is assumed that the X and Y axes are intended.
- Example:
axes='area' is equivalent to axes=['X', 'Y']. axes=['area', 'Z'] is equivalent to axes=['X', 'Y', 'Z'].
- weights: optional
Specify the weights for the collapse. By default the collapse is unweighted. The weights are those that would be returned by this call of the field’s weights method: f.weights(weights, components=True). See cf.Field.weights for details. By default weights is None, meaning that the collapse is unweighted.
- Example:
To specify weights based on cell areas use weights='area'.
- Example:
To specify weights based on cell areas and linearly in time you could set weights=('area', 'T').
- Example:
To automatically detect the best weights available for all axes from the metadata: weights='auto'. See cf.Field.weights for details on how the weights are derived in this case.
- squeeze: bool, optional
If True then size 1 collapsed axes are removed from the output data array. By default the axes which are collapsed are retained in the result’s data array.
- mtol: number, optional
Set the fraction of input array elements which is allowed to contain missing data when contributing to an individual output array element. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning that a missing datum in the output array only occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.
- Example:
To ensure that an output array element is a missing datum if more than 25% of its input array elements are missing data: mtol=0.25.
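The mtol rule can be sketched in a few lines of plain Python, with None standing in for a missing datum. The helper name `collapse_with_mtol` is hypothetical and the code only mirrors the rule as described here; it is not how the library stores or masks data.

```python
def collapse_with_mtol(values, mtol=1.0, method=max):
    """Collapse a list in which None marks missing data.

    Returns None (missing) when the fraction of missing inputs
    exceeds mtol, or when every input is missing.
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    # Compare counts rather than fractions to avoid rounding surprises
    if not present or n_missing > mtol * len(values):
        return None
    return method(present)


print(collapse_with_mtol([1, None, 3, 4], mtol=0.25))     # 25% missing: allowed
print(collapse_with_mtol([1, None, None, 4], mtol=0.25))  # 50% missing: masked
print(collapse_with_mtol([None, None, 5]))                # default mtol=1
```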
- ddof: number, optional
The delta degrees of freedom in the calculation of a standard deviation or variance. The number of degrees of freedom used in the calculation is (N-ddof) where N represents the number of non-missing elements. By default ddof is 1, meaning the standard deviation and variance of the population are estimated according to the usual formula with (N-1) in the denominator to avoid the bias caused by the use of the sample mean (Bessel’s correction).
- coordinate: str, optional
Set how the cell coordinate values for collapsed axes are defined. This has no effect on the cell bounds for the collapsed axes, which always represent the extrema of the input coordinates. Valid values are:
coordinate     Description
'mid_range'    An output coordinate is the average of the first and last input coordinate bounds (or the first and last coordinates if there are no bounds). This is the default.
'min'          An output coordinate is the minimum of the input coordinates.
'max'          An output coordinate is the maximum of the input coordinates.
- group: optional
Independently collapse groups of axis elements. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups. The group parameter defines how the elements are partitioned into groups, and may be one of:
A cf.Data defining the group size in terms of ranges of coordinate values. The first group starts at the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups of 10 kilometres: group=cf.Data(10, 'km').
- Note:
  - By default each element will be in exactly one group (see the group_by parameter).
  - Groups may contain different numbers of elements.
  - If no units are specified then the units of the coordinates are assumed.
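As a rough illustration of range-based grouping, the sketch below assigns each coordinate to a group index. It assumes increasing coordinates without bounds, ignores units, and treats each group as a half-open interval starting at the first coordinate; the half-open choice and the name `group_by_range` are assumptions made for the example.

```python
def group_by_range(coords, size):
    """Return a group index for each coordinate.

    Group k covers [start + k*size, start + (k+1)*size), where start
    is the first coordinate (assumed convention for this sketch).
    """
    start = coords[0]
    return [int((c - start) // size) for c in coords]


coords = [0, 4, 9, 10, 15, 27]
print(group_by_range(coords, 10))
```

With unevenly spaced coordinates the groups end up with different numbers of elements, as the notes above point out.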
A cf.TimeDuration defining the group size in terms of calendar months and years or other time intervals. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups of 5 days, starting and ending at midnight on each day: group=cf.D(5) (see cf.D).
- Example:
To define groups of 1 calendar month, starting and ending at day 16 of each month: group=cf.M(day=16) (see cf.M).
- Note:
  - By default each element will be in exactly one group (see the group_by parameter).
  - Groups may contain different numbers of elements.
  - The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if group=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
A (sequence of) cf.Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created, one for each maximally consecutive run within these elements.
- Example:
To define groups of the season MAM in each year: group=cf.mam() (see cf.mam).
- Example:
To define groups of the seasons DJF and JJA in each year: group=[cf.jja(), cf.djf()]. To define groups for seasons DJF, MAM, JJA and SON in each year: group=cf.seasons() (see cf.djf, cf.jja and cf.seasons).
- Example:
To define groups for longitude elements less than or equal to 90 degrees and greater than 90 degrees: group=[cf.le(90, 'degrees'), cf.gt(90, 'degrees')] (see cf.le and cf.gt).
- Note:
  - If a coordinate does not satisfy any of the conditions then its element will not be in a group.
  - Groups may contain different numbers of elements.
  - If no units are specified then the units of the coordinates are assumed.
  - If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
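The query rules above (unselected elements are left out, later queries win, and each maximally consecutive run forms its own group) can be mimicked with ordinary predicates. This sketch uses plain callables in place of cf.Query objects, and the labelling scheme (a running counter, with -1 for ungrouped elements) is an assumption chosen for the illustration.

```python
def group_by_queries(coords, queries):
    """Label maximal consecutive runs selected by a sequence of predicates."""
    # Record which query claims each element; later queries overwrite earlier ones
    owner = [None] * len(coords)
    for q_index, query in enumerate(queries):
        for i, c in enumerate(coords):
            if query(c):
                owner[i] = q_index

    # Give each run of consecutive elements with the same owner its own label
    labels, next_group = [], 0
    for i, own in enumerate(owner):
        if own is None:
            labels.append(-1)          # not in any group
        elif i > 0 and owner[i - 1] == own:
            labels.append(labels[-1])  # continue the current run
        else:
            labels.append(next_group)  # start a new run
            next_group += 1
    return labels


months = [12, 1, 2, 3, 4, 5, 6, 7, 8, 12, 1, 2]
is_djf = lambda m: m in (12, 1, 2)
is_jja = lambda m: m in (6, 7, 8)
print(group_by_queries(months, [is_djf, is_jja]))
```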
An int defining the number of elements in each group. The first group starts with the first axis element and spans the defined number of consecutive elements. Each subsequent group immediately follows the preceding one.
- Example:
To define groups of 5 elements: group=5.
- Note:
  - Each group has the defined number of elements, apart from the last group which may contain fewer elements.
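Grouping by a fixed number of elements amounts to chunking the axis. A minimal sketch on a plain list, with the hypothetical helper `collapse_in_chunks`:

```python
def collapse_in_chunks(values, size, method):
    """Collapse consecutive chunks of `size` elements with `method`.

    The last chunk may be shorter than `size`.
    """
    return [method(values[i:i + size]) for i in range(0, len(values), size)]


values = list(range(12))
print(collapse_in_chunks(values, 5, max))
print(collapse_in_chunks(values, 5, lambda chunk: sum(chunk) / len(chunk)))
```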
A numpy.array of integers defining groups. The array must have the same length as the axis to be collapsed and its sequence of values corresponds to the axis elements. Each group contains the elements which correspond to a common non-negative integer value in the numpy array. Upon output, the collapsed axis is arranged in order of increasing group number.
- Example:
For an axis of size 8, create two groups, the first containing the first and last elements and the second containing the 3rd, 4th and 5th elements, whilst ignoring the 2nd, 6th and 7th elements: group=numpy.array([0, -1, 4, 4, 4, -1, -2, 0]).
- Note:
  - The groups do not have to be in runs of consecutive elements; they may be scattered throughout the axis.
  - An element which corresponds to a negative integer in the array will not be in a group.
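The integer-label convention can be tried out on a plain list. In this sketch (the helper `collapse_by_labels` is hypothetical, and lists stand in for numpy arrays) negative labels are skipped and the output is ordered by increasing group number, using the label array from the example above.

```python
def collapse_by_labels(values, labels, method):
    """Collapse the elements sharing each non-negative label."""
    groups = {}
    for value, label in zip(values, labels):
        if label >= 0:  # negative labels are not in any group
            groups.setdefault(label, []).append(value)
    # Output is arranged in order of increasing group number
    return [method(groups[label]) for label in sorted(groups)]


values = [10, 20, 30, 40, 50, 60, 70, 80]
labels = [0, -1, 4, 4, 4, -1, -2, 0]
print(collapse_by_labels(values, labels, sum))
```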
- group_by: str, optional
Specify how coordinates are assigned to the groups defined by the group, within_days or within_years parameter. Ignored unless one of these parameters is a cf.Data or cf.TimeDuration object. The group_by parameter may be one of:
'coords'. This is the default. Each group contains the axis elements whose coordinate values lie within the group limits. Every element will be in a group.
'bounds'. Each group contains the axis elements whose upper and lower coordinate bounds both lie within the group limits. Some elements may not be inside any group, either because the group limits do not coincide with coordinate bounds or because the group size is sufficiently small.
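The difference between the two modes shows up when a group limit cuts through a cell. The sketch below tests membership of a single cell, represented as a (coordinate, (lower bound, upper bound)) tuple, in a group with given limits. The tuple layout, the name `in_group` and the half-open treatment of the upper limit in 'coords' mode are assumptions made for the example.

```python
def in_group(cell, lower, upper, group_by="coords"):
    """Return True if the cell belongs to the group [lower, upper)."""
    coord, (b0, b1) = cell
    if group_by == "coords":
        # Only the coordinate value has to lie within the limits
        return lower <= coord < upper
    if group_by == "bounds":
        # Both cell bounds have to lie within the limits
        return lower <= b0 and b1 <= upper
    raise ValueError("group_by must be 'coords' or 'bounds'")


cells = [(5, (0, 10)), (15, (10, 20)), (25, (20, 30))]
print([in_group(c, 0, 18, "coords") for c in cells])
print([in_group(c, 0, 18, "bounds") for c in cells])
```

Here the second cell is selected by its coordinate (15) but rejected by its bounds, because its upper bound (20) lies beyond the group limit.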
- regroup: bool, optional
For grouped collapses, return a numpy.array of integers which identifies the groups defined by the group parameter. The array is interpreted as for a numpy array value of the group parameter, and thus may subsequently be used by the group parameter in a separate collapse. For example:
>>> groups = f.collapse('time: mean', group=10, regroup=True)
>>> g = f.collapse('time: mean', group=groups)
is equivalent to:
>>> g = f.collapse('time: mean', group=10)
- within_days: optional
Independently collapse groups of reference-time axis elements for CF “within days” climatological statistics. Each group contains elements whose coordinates span a time interval of up to one day. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.
- Note:
For CF compliance, a “within days” collapse should be followed by an “over days” collapse.
The within_days parameter defines how the elements are partitioned into groups, and may be one of:
A cf.TimeDuration defining the group size in terms of a time interval of up to one day. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups of 6 hours, starting at 00:00, 06:00, 12:00 and 18:00: within_days=cf.h(6) (see cf.h).
- Example:
To define groups of 1 day, starting at 06:00: within_days=cf.D(1, hour=6) (see cf.D).
- Note:
  - Groups may contain different numbers of elements.
  - The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if within_days=cf.D(hour=12) then the first group will start on the closest midday to the first axis element.
A (sequence of) cf.Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created, one for each maximally consecutive run within these elements.
- Example:
To define groups of 00:00 to 06:00 within each day, ignoring the rest of each day: within_days=cf.hour(cf.le(6)) (see cf.hour and cf.le).
- Example:
To define groups of 00:00 to 06:00 and 18:00 to 24:00 within each day, ignoring the rest of each day: within_days=[cf.hour(cf.le(6)), cf.hour(cf.gt(18))] (see cf.gt, cf.hour and cf.le).
- Note:
  - Groups may contain different numbers of elements.
  - If no units are specified then the units of the coordinates are assumed.
  - If a coordinate does not satisfy any of the conditions then its element will not be in a group.
  - If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
- within_years: optional
Independently collapse groups of reference-time axis elements for CF “within years” climatological statistics. Each group contains elements whose coordinates span a time interval of up to one calendar year. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.
- Note:
For CF compliance, a “within years” collapse should be followed by an “over years” collapse.
The within_years parameter defines how the elements are partitioned into groups, and may be one of:
A cf.TimeDuration defining the group size in terms of a time interval of up to one calendar year. The first group starts at or before the first coordinate bound of the first axis element (or its coordinate if there are no bounds) and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the consecutive run of elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups of 90 days: within_years=cf.D(90) (see cf.D).
- Example:
To define groups of 3 calendar months, starting on the 15th of a month: within_years=cf.M(3, day=15) (see cf.M).
- Note:
  - Groups may contain different numbers of elements.
  - The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if within_years=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
A (sequence of) cf.Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created, one for each maximally consecutive run within these elements.
- Example:
To define groups for the season MAM within each year: within_years=cf.mam() (see cf.mam).
- Example:
To define groups for February and for November to December within each year: within_years=[cf.month(2), cf.month(cf.ge(11))] (see cf.month and cf.ge).
- Note:
  - The first group may start outside of the range of coordinates (the start of the first group is controlled by parameters of the cf.TimeDuration).
  - If group boundaries do not coincide with coordinate bounds then some elements may not be inside any group.
  - If the group size is sufficiently small then some elements may not be inside any group.
  - Groups may contain different numbers of elements.
- over_days: optional
Independently collapse groups of reference-time axis elements for CF “over days” climatological statistics. Each group contains elements whose coordinates are matching, in that their lower bounds have a common time of day but different dates of the year, and their upper bounds also have a common time of day but different dates of the year. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.
- Example:
An element with coordinate bounds {1999-12-31 06:00:00, 1999-12-31 18:00:00} matches an element with coordinate bounds {2000-01-01 06:00:00, 2000-01-01 18:00:00}.
- Example:
An element with coordinate bounds {1999-12-31 00:00:00, 2000-01-01 00:00:00} matches an element with coordinate bounds {2000-01-01 00:00:00, 2000-01-02 00:00:00}.
- Note:
A coordinate parameter value of 'min' is assumed, regardless of its given value.
A group_by parameter value of 'bounds' is assumed, regardless of its given value.
An “over days” collapse must be preceded by a “within days” collapse, as described by the CF conventions. If the field already contains sub-daily data, but does not have the “within days” cell methods flag then it may be added, for example, as follows (this example assumes that the appropriate cell method is the most recently applied, which need not be the case; see cf.CellMethods for details):
>>> f.cell_methods[-1].within = 'days'
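The notion of matching elements for an “over days” collapse can be sketched with the standard library: cells are matched when their lower bounds share a time of day and their upper bounds share a time of day. The function name `match_over_days` and the representation of bounds as pairs of datetime objects are assumptions for this illustration, which does not check cell durations or calendars.

```python
from collections import defaultdict
from datetime import datetime


def match_over_days(bounds):
    """Group cell indices whose bounds have matching times of day."""
    groups = defaultdict(list)
    for i, (lower, upper) in enumerate(bounds):
        # Key on the time of day of each bound, ignoring the date
        groups[(lower.time(), upper.time())].append(i)
    return list(groups.values())


bounds = [
    (datetime(1999, 12, 31, 6), datetime(1999, 12, 31, 18)),
    (datetime(1999, 12, 31, 18), datetime(2000, 1, 1, 6)),
    (datetime(2000, 1, 1, 6), datetime(2000, 1, 1, 18)),
    (datetime(2000, 1, 1, 18), datetime(2000, 1, 2, 6)),
]
print(match_over_days(bounds))
```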
The over_days parameter defines how the elements are partitioned into groups, and may be one of:
None. This is the default. Each collection of matching elements forms a group.
A cf.TimeDuration defining the group size in terms of a time duration of at least one day. Multiple groups are created from each collection of matching elements, the first of which starts at or before the first coordinate bound of the first element and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the matching elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups spanning 90 days: over_days=cf.D(90) or over_days=cf.h(2160) (see cf.D and cf.h).
- Example:
To define groups spanning 3 calendar months, starting and ending at 06:00 in the first day of each month: over_days=cf.M(3, hour=6) (see cf.M).
- Note:
  - Groups may contain different numbers of elements.
  - The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if over_days=cf.M(day=15) then the first group will start on the closest 15th of a month to the first axis element.
A (sequence of) cf.Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created, one for each subset of matching elements.
- Example:
To define groups for January and for June to December, ignoring all other months: over_days=[cf.month(1), cf.month(cf.wi(6, 12))] (see cf.month and cf.wi).
- Note:
  - If a coordinate does not satisfy any of the conditions then its element will not be in a group.
  - Groups may contain different numbers of elements.
  - If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
- over_years: optional
Independently collapse groups of reference-time axis elements for CF “over years” climatological statistics. Each group contains elements whose coordinates are matching, in that their lower bounds have a common sub-annual date but different years, and their upper bounds also have a common sub-annual date but different years. Upon output, the results of the collapses are concatenated so that the output axis has a size equal to the number of groups.
- Example:
An element with coordinate bounds {1999-06-01 06:00:00, 1999-09-01 06:00:00} matches an element with coordinate bounds {2000-06-01 06:00:00, 2000-09-01 06:00:00}.
- Example:
An element with coordinate bounds {1999-12-01 00:00:00, 2000-12-01 00:00:00} matches an element with coordinate bounds {2000-12-01 00:00:00, 2001-12-01 00:00:00}.
- Note:
A coordinate parameter value of 'min' is assumed, regardless of its given value.
A group_by parameter value of 'bounds' is assumed, regardless of its given value.
An “over years” collapse must be preceded by a “within years” or an “over days” collapse, as described by the CF conventions. If the field already contains sub-annual data, but does not have the “within years” or “over days” cell methods flag then it may be added, for example, as follows (this example assumes that the appropriate cell method is the most recently applied, which need not be the case; see cf.CellMethods for details):
>>> f.cell_methods[-1].over = 'days'
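For “over years”, matching is on the sub-annual part of each bound (month, day and time of day). The sketch below groups hypothetical seasonal cells this way and averages each group; `mean_over_years` and the ((lower, upper), value) layout are assumptions for the example, and leap days and non-standard calendars are not considered.

```python
from collections import defaultdict
from datetime import datetime


def mean_over_years(cells):
    """Average values of cells whose bounds match apart from the year.

    cells is a list of ((lower, upper), value) with datetime bounds.
    """
    def sub_annual(d):
        # Everything except the year
        return (d.month, d.day, d.hour, d.minute, d.second)

    groups = defaultdict(list)
    for (lower, upper), value in cells:
        groups[(sub_annual(lower), sub_annual(upper))].append(value)
    return [sum(v) / len(v) for v in groups.values()]


cells = [
    ((datetime(1999, 6, 1, 6), datetime(1999, 9, 1, 6)), 10.0),
    ((datetime(1999, 9, 1, 6), datetime(1999, 12, 1, 6)), 4.0),
    ((datetime(2000, 6, 1, 6), datetime(2000, 9, 1, 6)), 14.0),
    ((datetime(2000, 9, 1, 6), datetime(2000, 12, 1, 6)), 6.0),
]
print(mean_over_years(cells))
```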
The over_years parameter defines how the elements are partitioned into groups, and may be one of:
None. Each collection of matching elements forms a group. This is the default.
A cf.TimeDuration defining the group size in terms of a time interval of at least one calendar year. Multiple groups are created from each collection of matching elements, the first of which starts at or before the first coordinate bound of the first element and spans the defined group size. Each subsequent group immediately follows the preceding one. By default each group contains the matching elements whose coordinate values lie within the group limits (see the group_by parameter).
- Example:
To define groups spanning 10 calendar years: over_years=cf.Y(10) or over_years=cf.M(120) (see cf.M and cf.Y).
- Example:
To define groups spanning 5 calendar years, starting and ending at 06:00 on 01 December of each year: over_years=cf.Y(5, month=12, hour=6) (see cf.Y).
- Note:
  - Groups may contain different numbers of elements.
  - The start of the first group may be before the first axis element, depending on the offset defined by the time duration. For example, if over_years=cf.Y(month=12) then the first group will start on the closest 1st December to the first axis element.
A (sequence of) cf.Query, each of which is a condition defining one or more groups. Each query selects elements whose coordinates satisfy its condition and from these elements multiple groups are created, one for each subset of matching elements.
- Example:
To define one group spanning 1981 to 1990 and another spanning 2001 to 2005: over_years=[cf.year(cf.wi(1981, 1990)), cf.year(cf.wi(2001, 2005))] (see cf.year and cf.wi).
- Note:
  - If a coordinate does not satisfy any of the conditions then its element will not be in a group.
  - Groups may contain different numbers of elements.
  - If an element is selected by two or more queries then the latest one in the sequence defines which group it will be in.
- i: bool, optional
If True then update the field in place. By default a new field is created. In either case, a field is returned.
Returns:
- out: cf.Field or numpy.ndarray
The collapsed field. Alternatively, if the regroup parameter is True then a numpy array is returned.
Examples:
Calculate the unweighted mean over the entire field:
>>> g = f.collapse('mean')
Five equivalent ways to calculate the unweighted mean over a CF latitude axis:
>>> g = f.collapse('latitude: mean')
>>> g = f.collapse('lat: avg')
>>> g = f.collapse('Y: average')
>>> g = f.collapse('mean', 'Y')
>>> g = f.collapse('mean', ['latitude'])
Three equivalent ways to calculate an area weighted mean over CF latitude and longitude axes:
>>> g = f.collapse('area: mean', weights='area')
>>> g = f.collapse('lat: lon: mean', weights='area')
>>> g = f.collapse('mean', axes=['Y', 'X'], weights='area')
Two equivalent ways to calculate a time weighted mean over CF latitude, longitude and time axes:
>>> g = f.collapse('X: Y: T: mean', weights='T')
>>> g = f.collapse('mean', axes=['T', 'Y', 'X'], weights='T')
Find how many non-missing elements in each group of a grouped collapse:
>>> f.collapse('latitude: sample_size', group=cf.Data(5, 'degrees'))