Compute the sample geometric mean.

geoMean(x, na.rm = FALSE)

Arguments

x

numeric vector of observations.

na.rm

logical scalar indicating whether to remove missing values from x. If na.rm=FALSE (the default) and x contains missing values, then a missing value (NA) is returned. If na.rm=TRUE, missing values are removed from x prior to computing the geometric mean.

Details

If x contains any non-positive values (values less than or equal to 0), geoMean returns NA and issues a warning.
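The following is a minimal sketch of the documented argument handling (na.rm and non-positive values), written only for illustration. The function name geo_mean_sketch is hypothetical and this is not the EnvStats source code. In particular, the order of the NA check and the non-positive check is an assumption.

```r
# Hypothetical re-implementation for illustration only; NOT the EnvStats
# source. It mirrors the behavior described in this help page.
geo_mean_sketch <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) stop("'x' must be a numeric vector")
  if (na.rm) {
    x <- x[!is.na(x)]          # drop missing values before computing
  } else if (anyNA(x)) {
    return(NA_real_)           # missing values present and not removed
  }
  if (any(x <= 0)) {
    warning("non-positive values in 'x'; returning NA")
    return(NA_real_)
  }
  # Antilog of the mean of the natural-log-transformed values (Equation 2)
  exp(mean(log(x)))
}

geo_mean_sketch(c(2, 8))                    # 4
geo_mean_sketch(c(2, 8, NA))                # NA
geo_mean_sketch(c(2, 8, NA), na.rm = TRUE)  # 4
geo_mean_sketch(c(2, 0, 8))                 # NA, with a warning
```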

Let \(\underline{x}\) denote a vector of \(n\) observations from some distribution. The sample geometric mean is a measure of central tendency. It is defined as: $$\bar{x}_G = \sqrt[n]{x_1 x_2 \ldots x_n} = [\prod_{i=1}^n x_i]^{1/n} \;\;\;\;\;\; (1)$$ that is, it is the \(n\)'th root of the product of all \(n\) observations.
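As a small worked example of Equation (1), the helper below takes the \(n\)th root of the product directly. The name geo_mean_prod is hypothetical and used only for illustration; for example, the geometric mean of 2 and 8 is \(\sqrt{2 \cdot 8} = 4\).

```r
# Direct translation of Equation (1): the n-th root of the product.
# 'geo_mean_prod' is a hypothetical helper, used only for illustration.
geo_mean_prod <- function(x) {
  n <- length(x)
  prod(x)^(1 / n)
}

geo_mean_prod(c(2, 8))     # sqrt(2 * 8) = 4
geo_mean_prod(c(1, 3, 9))  # (1 * 3 * 9)^(1/3) = 27^(1/3), i.e. 3 up to rounding
```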

Equivalently, the geometric mean can be defined as: $$\bar{x}_G = \exp\left[\frac{1}{n} \sum_{i=1}^n \log(x_i)\right] = e^{\bar{y}} \;\;\;\;\;\; (2)$$ where $$\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\;\;\; (3)$$ $$y_i = \log(x_i), \;\; i = 1, 2, \ldots, n \;\;\;\;\;\; (4)$$ That is, the sample geometric mean is the antilog of the sample mean of the log-transformed observations.
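The short check below compares the product form of Equation (1) with the log form of Equations (2) to (4), using the natural log (R's log()) so that exp() is its antilog. It also illustrates one practical reason to prefer the log form: for many large values, the product overflows double precision, while the mean of the logs stays finite. The data values are arbitrary choices for illustration.

```r
x <- c(1, 3, 9)

# Equation (1): n-th root of the product
prod(x)^(1 / length(x))

# Equations (2)-(4): antilog of the mean of y_i = log(x_i)
y <- log(x)
exp(mean(y))

# With many large values the product overflows to Inf,
# while the log form stays finite.
big <- rep(1e10, 40)
prod(big)^(1 / length(big))  # Inf, because prod(big) = 1e400 exceeds the double range
exp(mean(log(big)))          # approximately 1e10
```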

The geometric mean is defined only for positive observations. By the inequality of arithmetic and geometric means, the geometric mean is less than or equal to the sample arithmetic mean, with equality if and only if all of the observations are equal.
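A quick numerical check of this inequality, using arbitrary example values: for unequal observations the arithmetic mean exceeds the geometric mean, and for identical observations the two coincide.

```r
x <- c(1, 3, 9)
am <- mean(x)              # arithmetic mean: 13/3, about 4.33
gm <- exp(mean(log(x)))    # geometric mean: 3
am >= gm                   # TRUE

# Equality holds when all observations are the same value
z <- rep(5, 4)
c(am = mean(z), gm = exp(mean(log(z))))  # both 5
```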

Value

A numeric scalar – the sample geometric mean.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers, Second Edition. Lewis Publishers, Boca Raton, FL.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, NY.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

Taylor, J.K. (1990). Statistical Techniques for Data Analysis. Lewis Publishers, Boca Raton, FL.

Zar, J.H. (2010). Biostatistical Analysis, Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The geometric mean is sometimes used to average ratios and percent changes (Zar, 2010). For the lognormal distribution, the geometric mean is the maximum likelihood estimator of the median of the distribution, although it is sometimes used incorrectly to estimate the mean of the distribution (see the NOTE section in the help file for elnormAlt).
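The sketch below illustrates both points in this note with base R only. First, with growth factors of 1.5 (+50%) and 0.5 (-50%), the arithmetic mean of 1.0 suggests no change, whereas the geometric mean raised to the number of periods reproduces the actual overall factor of 0.75. Second, it simulates lognormal data with base R's rlnorm(), which is parameterized by meanlog and sdlog (not by mean and cv as in EnvStats's rlnormAlt). The sample geometric mean lands near the median exp(meanlog), while the arithmetic mean lands near exp(meanlog + sdlog^2/2). The parameter values and seed are arbitrary choices.

```r
# Averaging percent changes: +50% in period 1, then -50% in period 2.
growth <- c(1.5, 0.5)           # growth factors (1 + percent change / 100)
mean(growth)                    # 1, suggests "no change" per period
gm <- exp(mean(log(growth)))    # sqrt(0.75), about 0.866 per period
gm^length(growth)               # 0.75, the actual overall change
prod(growth)                    # 0.75

# Lognormal data: the geometric mean estimates the median exp(meanlog),
# not the mean exp(meanlog + sdlog^2 / 2).
set.seed(1)
meanlog <- log(5)
sdlog <- 1
x <- rlnorm(1e5, meanlog = meanlog, sdlog = sdlog)
exp(mean(log(x)))               # close to the median, exp(meanlog) = 5
mean(x)                         # close to the population mean
exp(meanlog + sdlog^2 / 2)      # population mean, 5 * exp(0.5), about 8.24
```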

Examples

  # Generate 20 observations from a lognormal distribution with parameters 
  # mean=10 and cv=2, and compute the mean, median, and geometric mean. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  library(EnvStats)  # provides rlnormAlt() and geoMean()
  set.seed(250)
  dat <- rlnormAlt(20, mean = 10, cv = 2) 

  mean(dat) 
  #[1] 5.339273

  median(dat) 
  #[1] 3.692091
 
  geoMean(dat) 
  #[1] 4.095127
 
  #----------
  # Clean up
  rm(dat)