Full Complement of Summary Statistics

summaryFull is a generic function used to produce a full complement of summary statistics. The function invokes particular methods which depend on the class of the first argument. The summary statistics include: sample size, number of missing values, mean, median, trimmed mean, geometric mean, skew, kurtosis, min, max, range, 1st quartile, 3rd quartile, standard deviation, geometric standard deviation, interquartile range, median absolute deviation, and coefficient of variation.

summaryFull(object, ...)

# S3 method for class 'formula'
summaryFull(object, data = NULL, subset, 
  na.action = na.pass, ...)

# Default S3 method
summaryFull(object, group = NULL, 
    combine.groups = FALSE, drop.unused.levels = TRUE, 
    rm.group.na = TRUE, stats = NULL, trim = 0.1, 
    sd.method = "sqrt.unbiased", geo.sd.method = "sqrt.unbiased", 
    skew.list = list(), kurtosis.list = list(), 
    cv.list = list(), digits = max(3, getOption("digits") - 3), 
    digit.type = "signif", stats.in.rows = TRUE, 
    drop0trailing = TRUE, data.name = deparse(substitute(object)), 
    ...)

# S3 method for class 'data.frame'
summaryFull(object, ...)

# S3 method for class 'matrix'
summaryFull(object, ...)

# S3 method for class 'list'
summaryFull(object, ...)

Arguments

object: an object for which summary statistics are desired. In the default method, the argument object must be a numeric vector. When object is a data frame, all columns must be numeric. When object is a matrix, it must be a numeric matrix. When object is a list, all components must be numeric vectors. In the formula method, a symbolic specification of the form y ~ g can be given, indicating the observations in the vector y are to be grouped according to the levels of the factor g (the form y ~ 1 indicates no grouping). NAs are allowed in the data.
data: when object is a formula, data specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which summaryFull is called.
subset: when object is a formula, subset specifies an optional vector specifying a subset of observations to be used.
na.action: when object is a formula, na.action specifies a function which indicates what should happen when the data contain NAs. The default is na.pass.
group: when object is a numeric vector, group is a factor or character vector indicating which group each observation belongs to. When object is a matrix or data frame this argument is ignored and the columns define the groups. When object is a formula, this argument is ignored and the right-hand side of the formula specifies the grouping variable.
combine.groups: logical scalar indicating whether to show summary statistics for all groups combined. The default value is FALSE.
drop.unused.levels: logical scalar indicating whether to drop groups with no observations. The default value is drop.unused.levels=TRUE.
rm.group.na: logical scalar indicating whether to remove missing values from the group argument. By default rm.group.na=TRUE.
stats: character vector indicating which statistics to compute. Possible elements of the character vector include: "all" (indicating to include all summary statistics), "for.non.pos" (only compute statistics that are meaningful for datasets with non-positive values), "n" (number of non-missing values), "n.miss" (number of missing values), "mean", "median", "trimmed.mean", "geo.mean", "skew", "kurtosis", "min", "max", "range", "1st.quart", "3rd.quart", "sd", "geo.sd", "iqr", "mad", "cv". The default value is stats="for.non.pos" when object contains non-positive values (i.e., values \(\le 0\)), and stats="all" when object contains only positive values.
trim: fraction (between 0 and 0.5 inclusive) of values to be trimmed from each end of the ordered data to compute the trimmed mean. The default value is trim=0.1. If trim=0.5, this yields the median.
sd.method: character string specifying what method to use to compute the sample standard deviation. The possible values are "sqrt.unbiased" (the square root of the unbiased estimate of variance; the default), or "moments" (the method of moments estimator).
geo.sd.method: character string specifying what method to use to compute the sample standard deviation of the log-transformed observations prior to exponentiating this quantity. The possible values are "sqrt.unbiased" (the square root of the unbiased estimate of variance; the default), or "moments" (the method of moments estimator). See the help file for geoSD for more information.
skew.list: list of arguments to supply to the skewness function. See the help file for skewness for more information. The default value is skew.list=list(), which results in using the default arguments to skewness.
kurtosis.list: list of arguments to supply to the kurtosis function. See the help file for kurtosis for more information. The default value is kurtosis.list=list(), which results in using the default arguments to kurtosis.
cv.list: list of arguments to supply to the cv function. See the help file for cv for more information. The default value is cv.list=list(), which results in using the default arguments to cv.
digits: integer indicating the number of digits to use for the summary statistics. When digit.type="signif", digits indicates the number of significant digits. When digit.type="round", digits indicates the number of decimal places to round to. The default value is max(3, getOption("digits") - 3), that is, the maximum of 3 versus the current setting of the "digits" component of .Options minus 3.
digit.type: character string indicating whether the digits argument refers to significant digits (digit.type="signif", the default), or how many decimal places to round to (digit.type="round").
stats.in.rows: logical scalar indicating whether to show the summary statistics in the rows or columns of the output. The default is stats.in.rows=TRUE.
drop0trailing: logical scalar indicating whether to drop trailing 0's when printing the summary statistics. The value of this argument is added as an attribute to the returned list and is used by the print.summaryStats function. The default value is TRUE.
data.name: character string indicating the name of the data used for the summary statistics.
...: additional arguments affecting the summary statistics produced.
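
As a rough illustration of what trim, sd.method, and digit.type control, the following base R sketch computes the corresponding quantities by hand. It does not call summaryFull. The vector x is made-up data, and the formula shown for "moments" (dividing by n rather than n - 1) is an assumption based on the argument description above, not taken from the package source.

```r
# Hypothetical data, for illustration only
x <- c(2.1, 3.4, 1.8, 5.6, 4.2, 3.9, 2.7, 12.5, 3.1, 4.8)
n <- length(x)

# trim: fraction of values removed from each end before averaging.
# With n = 10 and trim = 0.1, the smallest and largest values are dropped.
trimmed <- mean(x, trim = 0.1)

# trim = 0.5 reduces the trimmed mean to the median
stopifnot(isTRUE(all.equal(mean(x, trim = 0.5), median(x))))

# sd.method = "sqrt.unbiased": square root of the unbiased variance estimate
sd.unbiased <- sqrt(var(x))      # same value as sd(x)

# sd.method = "moments": assumed here to divide by n instead of n - 1
sd.moments <- sqrt(sum((x - mean(x))^2) / n)

# digit.type: significant digits versus decimal places
signif(sd.unbiased, 3)
round(sd.unbiased, 3)
```

The two standard deviation estimates differ by the factor sqrt((n - 1) / n), so the choice matters mostly for small samples.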

Details

The function summaryFull returns summary statistics that are useful to describe various characteristics of one or more variables. It is an extended version of the built-in R function summary specifically for non-factor numeric data. The table below shows what statistics are computed and what functions are called by summaryFull to compute these statistics.

The object returned by summaryFull is useful for printing or reporting purposes. You may also use the functions that summaryFull calls (see table below) to compute summary statistics to be used by other functions.

See the help files for the functions listed in the table below for more information on these summary statistics.

Summary Statistic	Function Used
Mean	`mean`
Median	`median`
Trimmed Mean	`mean` with `trim` argument
Geometric Mean	`geoMean`
Skew	`skewness`
Kurtosis	`kurtosis`
Min	`min`
Max	`max`
Range	`range` and `diff`
1st Quartile	`quantile`
3rd Quartile	`quantile`
Standard Deviation	`sd`
Geometric Standard Deviation	`geoSD`
Interquartile Range	`iqr`
Median Absolute Deviation	`mad`
Coefficient of Variation	`cv`
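
Most of the statistics in the table can also be computed one at a time. The sketch below uses only base R and the stats package; the geometric mean, geometric standard deviation, and coefficient of variation are written out by hand as stand-ins for geoMean, geoSD, and cv, and IQR stands in for iqr. These hand-written formulas are assumptions about the default behavior of those functions, and x is made-up data.

```r
x <- c(2.1, 3.4, 1.8, 5.6, 4.2, 3.9, 2.7, 12.5, 3.1, 4.8)  # hypothetical data

stats.by.hand <- c(
  "Mean"      = mean(x),
  "Median"    = median(x),
  "Geo Mean"  = exp(mean(log(x))),          # stand-in for geoMean(x)
  "Min"       = min(x),
  "Max"       = max(x),
  "Range"     = diff(range(x)),             # max minus min
  "1st Quart" = unname(quantile(x, 0.25)),
  "3rd Quart" = unname(quantile(x, 0.75)),
  "SD"        = sd(x),
  "Geo SD"    = exp(sd(log(x))),            # stand-in for geoSD(x)
  "IQR"       = IQR(x),                     # assumed counterpart of iqr(x)
  "MAD"       = mad(x),
  "CV"        = sd(x) / mean(x)             # stand-in for cv(x)
)

# Display with three significant digits
signif(stats.by.hand, 3)
```

The geometric statistics take logarithms, so they are defined only when every value of x is positive.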

Value

An object of class "summaryStats" (see summaryStats.object). Objects of class "summaryStats" are numeric matrices that contain the summary statistics produced by a call to summaryStats or summaryFull. These objects have a special printing method that by default removes trailing zeros for sample size entries and prints blanks for statistics that are normally displayed as NA (see print.summaryStats).

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources. Elsevier, New York, NY.

Leidel, N.A., K.A. Busch, and J.R. Lynch. (1977). Occupational Exposure Sampling Strategy Manual. U.S. Department of Health, Education, and Welfare, Public Health Service, Center for Disease Control, National Institute for Occupational Safety and Health, Cincinnati, Ohio 45226, January, 1977, pp.102-103.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

Zar, J.H. (2010). Biostatistical Analysis, Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Examples
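
A minimal usage sketch, assuming the EnvStats package is installed. The data frame dat and its columns conc and site are simulated and purely hypothetical; the argument names follow the usage section above, and passing combine.groups and stats through the formula method relies on the ... argument being forwarded to the default method.

```r
set.seed(47)

# Simulated positive data in two hypothetical groups
dat <- data.frame(
  conc = rlnorm(30, meanlog = 1, sdlog = 0.5),
  site = factor(rep(c("A", "B"), each = 15))
)

if (requireNamespace("EnvStats", quietly = TRUE)) {
  library(EnvStats)

  # Default method: a single numeric vector
  print(summaryFull(dat$conc))

  # Formula method: statistics by group, plus all groups combined
  print(summaryFull(conc ~ site, data = dat, combine.groups = TRUE))

  # Restrict the output to a few statistics, one group per row
  print(summaryFull(conc ~ site, data = dat,
    stats = c("n", "mean", "sd", "cv"), stats.in.rows = FALSE))
}
```

Because the simulated values are all positive, the geometric mean and geometric standard deviation should appear in the full output.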