get_stats
- pySurf.data2D.get_stats(data=None, x=None, y=None, units=None, vars=None, string=False, fmt=None)
Return selected statistics for each of data, x, y as a numeric array or string, wrapping dataIO.arrays.stats.
vars determines which statistical indicators are included in the output, while string, fmt and units generate and control a string output in a similar way to the wrapped function dataIO.arrays.stats. get_stats implements a more versatile syntax that handles statistics on three coordinate axes. See the test function test_get_stats for more examples.
vars is an array of indices selecting which variables must be included in the statistics. You can call stats with no argument to see the list of variables (and standard format). The options are:
- scalar: use a preset (=1 for basic statistics, =2 for extended statistics).
- single-level list of integer indices (e.g. =[0,1]): applied only to data.
- two-level nested list whose outer list has a single element (e.g. =[[0,1]]): the selection is replicated to data, x and y.
- 3-element nested list (e.g. =[[0,1],[1],[2]]): indicates different choices for data, x and y.
In this context, special values can be used to indicate different types of defaults (N.B.: vars are in the order data, x, y); these are internally converted to the proper format:
- None: don't include the element (e.g. [1,2] is equivalent to [[1,2],None,None]).
- []: use the default (e.g. [[0,2],[],None] uses the default for x and doesn't report y).
- [[]]: use the full set of variables (e.g. [[0,2],[],[[]]] uses the default for x and full stats for y; [[[]]] uses the full set for data).
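The shape rules for vars can be sketched as a small normalization step. This is only an illustration, not the code used by get_stats: the name normalize_vars is hypothetical, presets (scalar values) are not resolved, the markers None, [] and [[]] are passed through unchanged, and shorthands such as [[[]]] are deliberately left out.

```python
def normalize_vars(vars):
    """Hypothetical helper: expand `vars` to a 3-element [data, x, y] list."""
    if isinstance(vars, int):
        # Presets (1 = basic, 2 = extended) are resolved inside get_stats;
        # their index lists are not documented here, so they are not guessed.
        raise NotImplementedError("presets are resolved by get_stats itself")
    if vars and all(isinstance(v, int) for v in vars):
        # Single-level list of indices: applies to data only.
        return [list(vars), None, None]
    if (len(vars) == 1 and isinstance(vars[0], list) and vars[0]
            and all(isinstance(v, int) for v in vars[0])):
        # One-element nested list: same selection for data, x and y.
        return [list(vars[0]) for _ in range(3)]
    if len(vars) == 3:
        # Already one entry per axis; None, [] and [[]] are kept as markers.
        return list(vars)
    raise ValueError("vars shape not covered by this sketch: %r" % (vars,))


print(normalize_vars([0, 1]))            # [[0, 1], None, None]
print(normalize_vars([[0, 1]]))          # [[0, 1], [0, 1], [0, 1]]
print(normalize_vars([[0, 2], [], None]))  # [[0, 2], [], None]
```

Raising on any shape that is not explicitly listed keeps the sketch from silently guessing how ambiguous inputs should be interpreted.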
Statistics are returned as an array of numerical values, unless the string flag is set. In that case, units and fmt are used to control the output format.
units (a scalar in dataIO.arrays.stats) can be passed as a 3-element list of strings to set the units individually for each axis. These are appended to every value on the respective axis (a more flexible behavior can be obtained by using fmt). If a scalar is passed, it is used for the data axis; if a single-element list of strings is passed, it is used for all axes (i.e. pass units as a 3-element list to obtain a different behavior, like units=['','',u] to set only the data axis).
fmt follows dataIO.arrays.stats, but it is not divided by axis: the settings for all axes are combined in a single list. If string is set to True, get_stats returns a flattened array of strings, so fmt can be an array of equal length, or a scalar that is used for all axes and stats. Note that the strings are assembled here without calling dataIO.arrays.stats, whose fmt argument is not used at all here.
units are used and appended to fmt unless they are None or set to an empty string. The lengths of the two must match; units are converted to the correct format inside this function. In this case the conversion depends on the format of vars. For example, vars = [[1,2,3],None,None] requires converting ['mm','mm','um'] to ['um','um','um'].
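The conversion in the example above can be sketched as follows. The helper name units_per_stat is hypothetical, and the order of the units list (x, y, data) is an assumption inferred from the ['mm','mm','um'] example rather than something stated by the source; statistics that carry no units (such as the number of elements) are not treated specially.

```python
def units_per_stat(vars3, units_xyz):
    """Hypothetical helper: one unit string per selected statistic.

    vars3:     [data, x, y] selections (None means the axis is not reported).
    units_xyz: [x_unit, y_unit, data_unit]; this order is an assumption
               inferred from the ['mm', 'mm', 'um'] example.
    """
    x_unit, y_unit, data_unit = units_xyz
    axis_units = [data_unit, x_unit, y_unit]  # reorder to match vars3
    flat = []
    for selection, unit in zip(vars3, axis_units):
        if selection is None:
            continue  # axis excluded: contributes no entries
        flat.extend([unit] * len(selection))
    return flat


print(units_per_stat([[1, 2, 3], None, None], ['mm', 'mm', 'um']))
# ['um', 'um', 'um']
```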
If left at the default, units are built from vars and from the strings obtained from dataIO.arrays.stats (called without data).
TODO: span doesn't exclude nan data; add a flag to tune this option.
TODO: there is some confusion in creating labels for plot_data because it can be unclear which one is X, Y, Z. A label should be added externally or in a routine. Also, statistics cannot be sorted (a list is returned, so it is possible to sort the list).
TODO: make a default extended stats, with span and number of points for x and y, and mean, span, rms for z.
- stats(data=None, units=None, string=False, fmt=None, vars=None)
Return selected statistics on data as a numerical array or a list of strings (one for each statistic).
vars is a list of indices that select the variables to be included, with respect to the following list (if called without data, stats returns a format representation of the variables in the list):
    0 - mean
    1 - stddev
    2 - rms
    3 - PV
    4 - min
    5 - max
    6 - number of elements
N.B.: (1) is intended as the rms of the deviation from the mean, while (2) is the root mean square of the signal value itself (with respect to zero). Note that span doesn't exclude nan data; a flag to tune this option is still to be added.
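To make the distinction between (1) and (2) concrete, here is a minimal pure-Python sketch of the seven indicators. It is not the implementation of stats: the name basic_stats is hypothetical, the population form of the standard deviation (dividing by n) is an assumption, and nan values are not handled.

```python
import math


def basic_stats(values):
    """Hypothetical sketch: [mean, stddev, rms, PV, min, max, n]."""
    n = len(values)
    mean = sum(values) / n
    # (1) stddev: rms of the deviation from the mean (population form assumed)
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # (2) rms: root mean square of the values themselves, relative to zero
    rms = math.sqrt(sum(v * v for v in values) / n)
    vmin, vmax = min(values), max(values)
    return [mean, stddev, rms, vmax - vmin, vmin, vmax, n]


mean, stddev, rms, pv, vmin, vmax, n = basic_stats([1.0, 2.0, 3.0])
# With these definitions the two are linked by rms**2 == mean**2 + stddev**2
print(math.isclose(rms ** 2, mean ** 2 + stddev ** 2))  # True
```

The two values coincide only when the mean is zero, which is why they are listed as separate indicators.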
string: if set to True, return stats as strings. In this case a string units can be used to add a suffix to the statistics. Finer control can be obtained by passing in fmt a list of format strings, one for each var; e.g. the default is obtained with:
    fmt = ['mean: %.5g'+units,
           'StdDev: %.5g'+units,
           'rms: %.5g'+units,
           'PV: %.5g'+units,
           'min: %.5g'+units,
           'max: %.5g'+units,
           'n: %i']
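As an illustration of how such a format list turns numbers into labels, the sketch below applies each format string to one value with Python's % operator. The sample values and the leading space in units are assumptions made for the example; how stats applies fmt internally is not shown in the source.

```python
units = " um"  # assumed: a leading space keeps the unit apart from the number

fmt = ['mean: %.5g' + units,
       'StdDev: %.5g' + units,
       'rms: %.5g' + units,
       'PV: %.5g' + units,
       'min: %.5g' + units,
       'max: %.5g' + units,
       'n: %i']

# Made-up sample values in the order mean, stddev, rms, PV, min, max, n
values = [2.0, 0.8165, 2.1602, 2.0, 1.0, 3.0, 3]

# One label per statistic: each format string receives exactly one value
labels = [f % v for f, v in zip(fmt, values)]
print(labels[0])   # mean: 2 um
print(labels[-1])  # n: 3
```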
2021/06/30 added rms (different from standard dev, which is centered about mean).