sigma_clip¶
-
astropy.stats.
sigma_clip
(data, sig=3.0, iters=1, cenfunc=<function median>, varfunc=<function var>, axis=None, copy=True)[source] [edit on github]¶ Perform sigma-clipping on the provided data.
This performs the sigma clipping algorithm - i.e. the data will be iterated over, each time rejecting points that are more than a specified number of standard deviations discrepant.
Note
scipy.stats.sigmaclip provides a subset of the functionality in this function.
Parameters: data : array-like
The data to be sigma-clipped (any shape).
sig : float
The number of standard deviations (not variances) to use as the clipping limit.
iters : int or
None
The number of iterations to perform clipping for, or
None
to clip until convergence is achieved (i.e. continue until the last iteration clips nothing).cenfunc : callable
The technique to compute the center for the clipping. Must be a callable that takes in a masked array and outputs the central value. Defaults to the median (numpy.median).
varfunc : callable
The technique to compute the standard deviation about the center. Must be a callable that takes in a masked array and outputs a width estimator:
deviation**2 > sig**2 * varfunc(deviation)
Defaults to the variance (numpy.var).
axis : int or
None
If not
None
, clip along the given axis. For this case, axis=int will be passed on to cenfunc and varfunc, which are expected to return an array with the axis dimension removed (like the numpy functions). IfNone
, clip over all values. Defaults toNone
.copy : bool
If
True
, the data array will be copied. IfFalse
, the masked array data will contain the same array asdata
. Defaults toTrue
.Returns: filtered_data :
numpy.ma.MaskedArray
A masked array with the same shape as
data
input, where the points rejected by the algorithm have been masked.Notes
The routine works by calculating:
deviation = data - cenfunc(data [,axis=int])
and then setting a mask for points outside the range:
data.mask = deviation**2 > sig**2 * varfunc(deviation)
It will iterate a given number of times, or until no further points are rejected.
Most numpy functions deal well with masked arrays, but if one would like to have an array with just the good (or bad) values, one can use:
good_only = filtered_data.data[~filtered_data.mask] bad_only = filtered_data.data[filtered_data.mask]
However, for multidimensional data, this flattens the array, which may not be what one wants (especially is filtering was done along an axis).
Examples
This will generate random variates from a Gaussian distribution and return a masked array in which all points that are more than 2 sample standard deviation from the median are masked:
>>> from astropy.stats import sigma_clip >>> from numpy.random import randn >>> randvar = randn(10000) >>> filtered_data = sigma_clip(randvar, 2, 1)
This will clipping on a similar distribution, but for 3 sigma relative to the sample mean, will clip until converged, and does not copy the data:
>>> from astropy.stats import sigma_clip >>> from numpy.random import randn >>> from numpy import mean >>> randvar = randn(10000) >>> filtered_data = sigma_clip(randvar, 3, None, mean, copy=False)
This will clip along one axis on a similar distribution with bad points inserted:
>>> from astropy.stats import sigma_clip >>> from numpy.random import normal >>> from numpy import arange, diag, ones >>> data = arange(5)+normal(0.,0.05,(5,5))+diag(ones(5)) >>> filtered_data = sigma_clip(data, axis=0, sig=2.3)
Note that along the other axis, no points would be masked, as the variance is higher.