cf.Data.sd
Data.sd(axes=None, squeeze=False, mtol=1, weights=None, ddof=1, a=None, i=False, _preserve_partitions=False)
Collapse axes by calculating their standard deviation.
The standard deviation may be adjusted for the number of degrees of freedom and may be calculated with weighted values.
Missing data array elements and those with zero weight are omitted from the calculation.
The unweighted standard deviation, \(s\), of \(N\) values \(x_i\) with mean \(m\) and with \(N-ddof\) degrees of freedom (\(ddof\ge0\)) is
\[s=\sqrt{\frac{1}{N-ddof} \sum_{i=1}^{N} (x_i - m)^2}\]
The weighted standard deviation, \(\tilde{s}_N\), of \(N\) values \(x_i\) with corresponding weights \(w_i\), weighted mean \(\tilde{m}\) and with \(N\) degrees of freedom is
\[\tilde{s}_N=\sqrt{\frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} w_i(x_i - \tilde{m})^2}\]
The weighted standard deviation, \(\tilde{s}\), of \(N\) values \(x_i\) with corresponding weights \(w_i\) and with \(N-ddof\) degrees of freedom (\(ddof>0\)) is
\[\tilde{s} = \sqrt{\frac{f \sum_{i=1}^{N} w_i}{f \sum_{i=1}^{N} w_i - ddof}} \tilde{s}_N\]
where \(f\) is the smallest positive number whose product with each weight is an integer. \(f \sum_{i=1}^{N} w_i\) is the size of a new sample created by each \(x_i\) having \(fw_i\) repeats.
In practice, \(f\) may not exist or may be difficult to calculate, so \(f\) is either set to a predetermined value or an approximate value is calculated. The approximation is the smallest positive number whose products with the smallest and largest weights and the sum of the weights are all integers, where a positive number is considered to be an integer if its decimal part is sufficiently small (no greater than \(10^{-8}\) plus \(10^{-5}\) times its integer part). This approximation will never overestimate \(f\), so \(\tilde{s}\) will never be underestimated when the approximation is used. If the weights are all integers which are collectively coprime then setting \(f=1\) will guarantee that \(\tilde{s}\) is exact.
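The three formulas above can be checked numerically with a short standard-library sketch. It is an illustration of the formulas only, not the code used by cf: the function names `unweighted_sd`, `weighted_sd` and `approximate_f` are hypothetical, and `approximate_f` assumes the weights are exact rationals, so it ignores the floating-point tolerance described above.

```python
import math
from fractions import Fraction
from functools import reduce


def unweighted_sd(x, ddof=1):
    """Unweighted s with N - ddof degrees of freedom."""
    n = len(x)
    m = sum(x) / n
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - ddof))


def weighted_sd(x, w, ddof=0, f=1):
    """Weighted s~_N, rescaled to s~ when ddof > 0 (f is chosen by the caller)."""
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    s_n = math.sqrt(sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw)
    if ddof == 0:
        return s_n
    return math.sqrt(f * sw / (f * sw - ddof)) * s_n


def approximate_f(w):
    """Smallest positive f making f*min(w), f*max(w) and f*sum(w) integers.

    Assumption: the weights are exactly representable as rationals.
    For reduced fractions p/q the answer is lcm(q's) / gcd(p's).
    """
    targets = [Fraction(v) for v in (min(w), max(w), sum(w))]
    numer_gcd = reduce(math.gcd, (t.numerator for t in targets))
    denom_lcm = reduce(lambda a, b: a * b // math.gcd(a, b),
                       (t.denominator for t in targets))
    return Fraction(denom_lcm, numer_gcd)


x, w = [1, 2, 3, 4], [2, 3, 5, 6]
f = approximate_f(w)                       # 1/2 for these weights
print(weighted_sd(x, w))                   # s~_N (ddof=0)
print(weighted_sd(x, w, ddof=1, f=f))      # s~ with the approximate f
print(weighted_sd(x, w, ddof=1, f=1))      # s~ with f=1 (coprime integer weights)
```

For these weights the exact \(f\) is 1 and the approximation is 1/2, so the approximation does not overestimate \(f\) and the resulting \(\tilde{s}\) is the larger of the two. The three printed values should agree with the corresponding results in the Examples section.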
Parameters:
- axes : (sequence of) int, optional
The axes to be collapsed. By default flattened input is used. Each axis is identified by its integer position. No axes are collapsed if axes is an empty sequence.
- squeeze : bool, optional
If True then collapsed axes are removed. By default the axes which are collapsed are left in the result as axes with size 1. When the collapsed axes are retained, the result is guaranteed to broadcast correctly against the original array.
Example: Suppose that an array, d, has shape (2, 3, 4) and e = d.sd(axes=1). Then e has shape (2, 1, 4) and, for example, d/e is allowed. If e = d.sd(axes=1, squeeze=True) then e will have shape (2, 4) and d/e is an illegal operation.
- weights : data-like or dict, optional
Weights associated with values of the array. By default all non-missing elements of the array are assumed to have equal weights of 1. If weights is a data-like object then it must have either the same shape as the array or, if that is not the case, the same shape as the axes being collapsed. If weights is a dictionary then each key identifies axes of the array (an int or tuple of ints) and the corresponding value is a data-like object of weights for those axes. In this case, the implied weights array is the outer product of the dictionary’s values. A dictionary may be used in conjunction with any value of axes, because the axes to which the weights apply are given explicitly.
Example: Suppose that the original array being collapsed has shape (2, 3, 4) and weights is set to a data-like object, w. If axes=None then w must have shape (2, 3, 4). If axes=(0, 1, 2) then w must have shape (2, 3, 4). If axes=(2, 0, 1) then w must either have shape (2, 3, 4) or else (4, 2, 3). If axes=1 then w must either have shape (2, 3, 4) or else (3,). If axes=(2, 0) then w must either have shape (2, 3, 4) or else (4, 2). Suppose weights is a dictionary. If weights={1: x} then x must have shape (3,). If weights={1: x, (2, 0): y} then x must have shape (3,) and y must have shape (4, 2). The last example is equivalent to weights={(1, 2, 0): x.outerproduct(y)} (see outerproduct for details).
- mtol : number, optional
For each element in the output data array, the fraction of contributing input array elements which is allowed to contain missing data. Where this fraction exceeds mtol, missing data is returned. The default is 1, meaning a missing datum in the output array only occurs when its contributing input array elements are all missing data. A value of 0 means that a missing datum in the output array occurs whenever any of its contributing input array elements are missing data. Any intermediate value is permitted.
- ddof : number, optional
The delta degrees of freedom. The number of degrees of freedom used in the calculation is (N-ddof) where N represents the number of elements. By default ddof is 1, meaning the standard deviation of the population is estimated according to the usual formula with (N-1) in the denominator to avoid the bias caused by the use of the sample mean (Bessel’s correction).
- i : bool, optional
If True then update the data array in place. By default a new data array is created.
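The shape rules for squeeze and for dictionary weights can be mirrored with plain NumPy arrays. This is a sketch by analogy, not cf code: it assumes NumPy is available, uses NumPy's `keepdims` as a stand-in for `squeeze=False`, and builds the implied weights array by hand. Note that NumPy's keyword is `axis`, whereas this method takes `axes`.

```python
import numpy as np

d = np.arange(24.0).reshape(2, 3, 4)

# squeeze=False analogue: the collapsed axis is kept with size 1,
# so the result broadcasts against the original array.
e_kept = d.std(axis=1, ddof=1, keepdims=True)
ratio = d / e_kept                      # shapes (2, 3, 4) and (2, 1, 4)

# squeeze=True analogue: the collapsed axis is removed, giving shape (2, 4).
# d / e_squeezed raises ValueError because (2, 3, 4) and (2, 4) do not broadcast.
e_squeezed = d.std(axis=1, ddof=1)

# Dictionary weights {1: x, (2, 0): y}: x spans axis 1, y spans axes (2, 0).
x = np.array([1.0, 2.0, 3.0])           # shape (3,)
y = np.arange(1.0, 9.0).reshape(4, 2)   # shape (4, 2)

# The outer product has its axes in the order (1, 2, 0);
# transposing puts them back into array order (0, 1, 2).
w_120 = np.multiply.outer(x, y)         # shape (3, 4, 2)
w = np.transpose(w_120, (2, 0, 1))      # shape (2, 3, 4)

print(e_kept.shape, e_squeezed.shape, w.shape)
```

Each element of the implied weights array is `w[i, j, k] == x[j] * y[k, i]`, which is why the dictionary form can be combined with any choice of axes.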
Returns: out: cf.Data
Examples: Some, not wholly comprehensive, examples:
>>> d = cf.Data([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4])
>>> e = cf.Data([1, 2, 3, 4])
>>> d.sd(squeeze=False)
<CF Data: [1.06262254195] >
>>> d.sd()
<CF Data: 1.06262254195 >
>>> e.sd(weights=[2, 3, 5, 6])
<CF Data: 1.09991882817 >
>>> e.sd(weights=[2, 3, 5, 6], f=1)
<CF Data: 1.06262254195 >
>>> d.sd(ddof=0)
<CF Data: 1.02887985207 >
>>> e.sd(ddof=0, weights=[2, 3, 5, 6])
<CF Data: 1.02887985207 >
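The mtol rule described under Parameters can also be illustrated without cf. The sketch below is a minimal standard-library model under stated assumptions: `sd_with_mtol` is a hypothetical helper, `None` stands in for a missing element, and returning `None` when too few values remain for the requested ddof is a choice made for this sketch, not documented behaviour.

```python
import math


def sd_with_mtol(values, mtol=1, ddof=1):
    """Return the sd of the non-missing values, or None for a missing result.

    None marks a missing input element (a stand-in for masked data).
    """
    present = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(present)) / len(values)
    # Missing output where the missing fraction exceeds mtol,
    # or where nothing is left to calculate with.
    if missing_fraction > mtol or not present:
        return None
    # Sketch-only assumption: too few values for this ddof also gives None.
    if len(present) - ddof <= 0:
        return None
    m = sum(present) / len(present)
    return math.sqrt(sum((v - m) ** 2 for v in present) / (len(present) - ddof))


column = [1.0, None, 3.0, 5.0]           # one quarter of the elements missing
print(sd_with_mtol(column, mtol=1))      # computed from 1.0, 3.0 and 5.0
print(sd_with_mtol(column, mtol=0.25))   # 0.25 does not exceed mtol: computed
print(sd_with_mtol(column, mtol=0))      # any missing element: None
```

With the default mtol=1 the result is missing only when every contributing element is missing, whereas mtol=0 makes the result missing as soon as one contributing element is.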