enorm.Rd
Estimate the mean and standard deviation parameters of a normal (Gaussian) distribution, and optionally construct a confidence interval for the mean or the variance.
enorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
ci.method = "exact", conf.level = 0.95, ci.param = "mean")
numeric vector of observations.
character string specifying the method of estimation. Possible values are
"mvue" (minimum variance unbiased; the default), and "mle/mme"
(maximum likelihood/method of moments).
See the DETAILS section for more information on these estimation methods.
logical scalar indicating whether to compute a confidence interval for the
mean or variance. The default value is FALSE.
character string indicating what kind of confidence interval to compute. The
possible values are "two-sided" (the default), "lower", and "upper".
This argument is ignored if ci=FALSE.
character string indicating what method to use to construct the confidence interval
for the mean or variance. The only possible value is "exact" (the default).
See the DETAILS section for more information. This argument is ignored if
ci=FALSE.
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is conf.level=0.95. This argument is ignored if ci=FALSE.
character string indicating which parameter to create a confidence interval for.
The possible values are ci.param="mean" (the default) and ci.param="variance".
This argument is ignored if ci=FALSE.
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values,
they will be removed prior to performing the estimation.
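The removal step described above can be illustrated in base R. This is a minimal sketch with hypothetical data; it shows one way to drop such values and is not necessarily how enorm handles them internally.

```r
# Hypothetical input containing missing, undefined, and infinite values
x <- c(2.1, NA, 3.4, NaN, Inf, 1.9, -Inf, 2.8)

# is.finite() is FALSE for NA, NaN, Inf, and -Inf, so a single filter
# removes all four kinds of value
x_clean <- x[is.finite(x)]

n <- length(x_clean)      # number of observations actually used
mu_hat <- mean(x_clean)   # sample mean of the retained values
```

Note that the sample size used for estimation is the count of retained values, not the length of the original vector.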
Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of
\(n\) observations from a normal (Gaussian) distribution with
parameters mean=\(\mu\) and sd=\(\sigma\).
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
The minimum variance unbiased estimators (mvue's) of the mean and variance are:
$$\hat{\mu}_{mvue} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (1)$$
$$\hat{\sigma}^2_{mvue} = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (2)$$
(Johnson et al., 1994; Forbes et al., 2011). Note that when method="mvue",
the estimated standard deviation is the square root of the mvue of the variance,
but is not itself an mvue.
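Equations (1) and (2) can be checked by hand in base R. The sketch below uses a hypothetical data vector and writes out the sums explicitly; the results agree with the built-in mean(), var(), and sd() functions.

```r
# Hypothetical sample of n = 5 observations
x <- c(4.2, 3.1, 5.6, 2.8, 4.9)
n <- length(x)

# Equation (1): the sample mean
mu_hat <- sum(x) / n

# Equation (2): the sample variance with divisor n - 1
s2 <- sum((x - mu_hat)^2) / (n - 1)

# Estimated standard deviation: square root of the mvue of the variance
sd_hat <- sqrt(s2)
```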
Maximum Likelihood/Method of Moments Estimation (method="mle/mme")
The maximum likelihood estimator (mle) and method of moments estimator (mme) of the
mean are both the same as the mvue of the mean given in equation (1) above. The
mle and mme of the variance are identical and are given by:
$$\hat{\sigma}^2_{mle} = s^2_m = \frac{n-1}{n} s^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (3)$$
When method="mle/mme", the estimated standard deviation is the square root of
the mle of the variance, and is itself an mle.
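As a rough illustration of equation (3), the following base R sketch (hypothetical data) computes the mle of the variance in two ways: directly with divisor n, and by rescaling the value returned by var(), which uses divisor n - 1.

```r
# Hypothetical sample of n = 5 observations
x <- c(4.2, 3.1, 5.6, 2.8, 4.9)
n <- length(x)

# Equation (3), direct form: sum of squared deviations divided by n
s2_mle <- sum((x - mean(x))^2) / n

# Equation (3), rescaled form: (n - 1) / n times the sample variance
s2_mle_alt <- (n - 1) / n * var(x)

# Estimated standard deviation under method = "mle/mme"
sd_mle <- sqrt(s2_mle)
```

Because of the factor (n - 1)/n, the mle of the variance is always smaller than the mvue for a non-constant sample, and the difference shrinks as n grows.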
Confidence Intervals
Confidence Interval for the Mean (ci.param="mean")
When ci=TRUE and ci.param = "mean", the usual confidence interval
for \(\mu\) is constructed as follows. If ci.type="two-sided", the
\((1-\alpha)\)100% confidence interval for \(\mu\) is given by:
$$[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (4)$$
where \(t(\nu, p)\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the \((1-\alpha)\)100% confidence interval for
\(\mu\) is given by:
$$[\hat{\mu} - t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}, \, \infty] \;\;\;\; (5)$$
and if ci.type="upper", the confidence interval is given by:
$$[-\infty, \, \hat{\mu} + t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (6)$$
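The t-based intervals for the mean can be reproduced with qt() in base R. This is a sketch with hypothetical data, assuming \(\hat{\sigma}\) is the sample standard deviation s (as with method="mvue"); each one-sided interval places the full \(\alpha\) in a single tail, so it uses the \(1-\alpha\) quantile.

```r
# Hypothetical sample and a 95% confidence level
x <- c(4.2, 3.1, 5.6, 2.8, 4.9)
n <- length(x)
alpha <- 0.05

mu_hat <- mean(x)
se <- sd(x) / sqrt(n)   # estimated standard error of the mean

# Two-sided interval: alpha is split between the two tails
two_sided <- mu_hat + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se

# One-sided intervals: all of alpha is placed in one tail
lower <- c(mu_hat - qt(1 - alpha, df = n - 1) * se, Inf)
upper <- c(-Inf, mu_hat + qt(1 - alpha, df = n - 1) * se)
```

The two-sided limits should match those returned by t.test(x)$conf.int, which is a convenient cross-check.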
Confidence Interval for the Variance (ci.param="variance")
When ci=TRUE and ci.param = "variance", the usual confidence interval
for \(\sigma^2\) is constructed as follows. A two-sided
\((1-\alpha)\)100% confidence interval for \(\sigma^2\) is given by:
$$[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} ] \;\;\;\; (7)$$
Similarly, a one-sided upper \((1-\alpha)\)100% confidence interval for the
population variance is given by:
$$[ 0, \, \frac{(n-1)s^2}{\chi^2_{n-1,\alpha}} ] \;\;\;\; (8)$$
and a one-sided lower \((1-\alpha)\)100% confidence interval for the population
variance is given by:
$$[ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha}}, \, \infty ] \;\;\;\; (9)$$
(van Belle et al., 2004; Zar, 2010).
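Equations (7) through (9) can be sketched in base R with qchisq(). The data below are hypothetical. Note that the larger chi-square quantile appears in the denominator of the lower limit, and the smaller quantile in the denominator of the upper limit.

```r
# Hypothetical sample and a 95% confidence level
x <- c(4.2, 3.1, 5.6, 2.8, 4.9)
n <- length(x)
alpha <- 0.05

ss <- (n - 1) * var(x)   # (n - 1) * s^2, the numerator in equations (7)-(9)

# Equation (7): two-sided interval
two_sided <- c(ss / qchisq(1 - alpha / 2, df = n - 1),
               ss / qchisq(alpha / 2, df = n - 1))

# Equation (8): one-sided upper interval, bounded below by 0
upper <- c(0, ss / qchisq(alpha, df = n - 1))

# Equation (9): one-sided lower interval, unbounded above
lower <- c(ss / qchisq(1 - alpha, df = n - 1), Inf)
```

Unlike the interval for the mean, this interval is not symmetric about the point estimate, because the chi-square distribution is skewed.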
a list of class "estimate" containing the estimated parameters and other information.
See estimate.object for details.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences. Second Edition. John Wiley and Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
The normal and lognormal distributions are probably the two most frequently used distributions to model environmental data. To make any kind of probability statement about a normally distributed population (of chemical concentrations, for example), you first have to estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean or variance. This is done with confidence intervals.
# Generate 20 observations from a normal distribution with mean 3 and
# standard deviation 2, then estimate the parameters and create a 95%
# confidence interval for the mean.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
dat <- rnorm(20, mean = 3, sd = 2)
enorm(dat, ci = TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Normal
#>
#> Estimated Parameter(s): mean = 2.861160
#> sd = 1.180226
#>
#> Estimation Method: mvue
#>
#> Data: dat
#>
#> Sample Size: 20
#>
#> Confidence Interval for: mean
#>
#> Confidence Interval Method: Exact
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 2.308798
#> UCL = 3.413523
#>
#----------
# Using the same data, construct an upper 95% confidence interval for
# the variance (conf.level is left at its default of 0.95).
enorm(dat, ci = TRUE, ci.type = "upper", ci.param = "variance")$interval
#> $name
#> [1] "Confidence"
#>
#> $parameter
#> [1] "variance"
#>
#> $limits
#> LCL UCL
#> 0.000000 2.615963
#>
#> $type
#> [1] "upper"
#>
#> $method
#> [1] "Exact"
#>
#> $conf.level
#> [1] 0.95
#>
#> $sample.size
#> [1] 20
#>
#> $dof
#> [1] 19
#>
#> attr(,"class")
#> [1] "intervalEstimate"
#----------
# Clean up
#---------
rm(dat)
#----------
# Using the Reference area TcCB data in the data frame EPA.94b.tccb.df,
# estimate the mean and standard deviation of the log-transformed data,
# and construct a 95% confidence interval for the mean.
with(EPA.94b.tccb.df, enorm(log(TcCB[Area == "Reference"]), ci = TRUE))
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Normal
#>
#> Estimated Parameter(s): mean = -0.6195712
#> sd = 0.4679530
#>
#> Estimation Method: mvue
#>
#> Data: log(TcCB[Area == "Reference"])
#>
#> Sample Size: 47
#>
#> Confidence Interval for: mean
#>
#> Confidence Interval Method: Exact
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = -0.7569673
#> UCL = -0.4821751
#>