ezmnorm.Rd
Estimate the mean and standard deviation of a zero-modified normal distribution, and optionally construct a confidence interval for the mean.
ezmnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
ci.method = "normal.approx", conf.level = 0.95)
x | numeric vector of observations. |
method | character string specifying the method of estimation. Currently, the only possible value is "mvue" (minimum variance unbiased; the default). See the DETAILS section for more information. |
ci | logical scalar indicating whether to compute a confidence interval for the mean. The default value is FALSE. |
ci.type | character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE. |
ci.method | character string indicating what method to use to construct the confidence interval for the mean. Currently the only possible value is "normal.approx" (the default). See the DETAILS section for more information. |
conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE. |
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
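The removal step can be pictured with base R's is.finite(), which is FALSE for NA, NaN, Inf, and -Inf. This is only a sketch of the behavior described above, not the function's actual source, and the vector x is made up for illustration.

```r
# Hypothetical input containing missing, undefined, and infinite values
x <- c(0, 3.2, NA, 5.1, NaN, Inf, 0, -Inf, 4.4)

# is.finite() is FALSE for NA, NaN, Inf, and -Inf,
# so this keeps only the values that can be used for estimation
x.clean <- x[is.finite(x)]

x.clean
length(x.clean)
```

Zeros are kept, since they are legitimate observations from the zero part of the distribution.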
Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified normal distribution with parameters mean=\(\mu\), sd=\(\sigma\), and p.zero=\(p\).
Let \(r\) denote the number of observations in \(\underline{x}\) that are equal
to 0, and order the observations so that \(x_1, x_2, \ldots, x_r\) denote
the \(r\) zero observations, and \(x_{r+1}, x_{r+2}, \ldots, x_n\) denote the
\(n-r\) non-zero observations.
Note that \(\mu\) is not the mean of the zero-modified normal distribution; it is the mean of the normal part of the distribution. Similarly, \(\sigma\) is not the standard deviation of the zero-modified normal distribution; it is the standard deviation of the normal part of the distribution.
Let \(\gamma\) and \(\delta\) denote the mean and standard deviation of the
overall zero-modified normal distribution. Aitchison (1955) shows that:
$$\gamma = (1 - p) \mu \;\;\;\; (1)$$
$$\delta^2 = (1 - p) \sigma^2 + p (1 - p) \mu^2 \;\;\;\; (2)$$
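As a quick numeric check of equations (1) and (2), the sketch below plugs in mean=4, sd=2, and p.zero=0.5, the values used in the first example in the Examples section. The variable names are chosen here for illustration only.

```r
# Parameters of the normal part and the probability of a zero
mu <- 4
sigma <- 2
p <- 0.5

# Equation (1): mean of the overall zero-modified normal distribution
gamma.zm <- (1 - p) * mu

# Equation (2): variance of the overall distribution, then its square root
delta2.zm <- (1 - p) * sigma^2 + p * (1 - p) * mu^2
delta.zm <- sqrt(delta2.zm)

c(mean.zmnorm = gamma.zm, sd.zmnorm = delta.zm)
```

With these inputs the overall mean is 2 and the overall variance is 6, so the overall standard deviation is sqrt(6).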
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of
\(\gamma\) and \(\delta^2\) are:
$$\hat{\gamma}_{mvue} = \bar{x} \;\;\;\; (3)$$
$$\hat{\delta}^2_{mvue} = \begin{cases} \frac{n-r-1}{n-1} (s^*)^2 + \frac{r}{n} \left(\frac{n-r}{n-1}\right) (\bar{x}^*)^2 & \text{if } r < n - 1, \\ x_n^2 / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (4)$$
where
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (5)$$
$$\bar{x}^* = \frac{1}{n-r} \sum_{i=r+1}^n x_i \;\;\;\; (6)$$
$$(s^*)^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (x_i - \bar{x}^*)^2 \;\;\;\; (7)$$
Note that the quantity in equation (5) is the sample mean of all observations
(including 0 values), the quantity in equation (6) is the sample mean of all non-zero
observations, and the quantity in equation (7) is the sample variance of all
non-zero observations. Also note that for \(r=n-1\) or \(r=n\), the estimator
of \(\delta^2\) is the sample variance for all observations (including 0 values).
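Equations (3) through (7) can be sketched in base R as follows. The helper zmnorm.mvue is hypothetical (it is not part of any package and is not the source of ezmnorm), the data are made up, and the sketch assumes at least two finite observations.

```r
# Hypothetical helper: estimators of equations (3)-(7), assuming length(x) >= 2
zmnorm.mvue <- function(x) {
  n <- length(x)
  r <- sum(x == 0)      # number of zero observations
  x.nz <- x[x != 0]     # the n - r non-zero observations

  gamma.hat <- mean(x)  # equation (3), using the overall mean of equation (5)

  if (r < n - 1) {
    xbar.star <- mean(x.nz)  # equation (6)
    s2.star <- var(x.nz)     # equation (7)
    delta2.hat <- (n - r - 1) / (n - 1) * s2.star +
      (r / n) * ((n - r) / (n - 1)) * xbar.star^2
  } else if (r == n - 1) {
    delta2.hat <- x.nz^2 / n  # a single non-zero observation
  } else {
    delta2.hat <- 0           # every observation is zero
  }

  c(mean.zmnorm = gamma.hat, sd.zmnorm = sqrt(delta2.hat))
}

# Made-up data: 4 zeros and 6 non-zero values
x <- c(0, 0, 0, 3.1, 4.7, 5.2, 2.9, 0, 6.0, 4.4)
est <- zmnorm.mvue(x)
est

# Compare the estimated overall standard deviation with sd() of all observations
all.equal(unname(est["sd.zmnorm"]), sd(x))
```

The final comparison reflects that expanding the sums in the first case of equation (4) also yields the sample variance of all \(n\) observations, so the three cases can be read as one rule written out by the number of zeros.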
Confidence Intervals
Based on Normal Approximation (ci.method="normal.approx")
An approximate \((1-\alpha)100\%\) confidence interval for \(\gamma\) is
constructed based on the assumption that the estimator of \(\gamma\) is
approximately normally distributed. Aitchison (1955) shows that
$$Var(\hat{\gamma}_{mvue}) = Var(\bar{x}) = \frac{\delta^2}{n} \;\;\;\; (8)$$
Thus, an approximate two-sided \((1-\alpha)100\%\) confidence interval for
\(\gamma\) is constructed as:
$$[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \frac{\hat{\delta}_{mvue}}{\sqrt{n}} ] \;\;\;\; (9)$$
where \(t_{\nu, p}\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom.
One-sided confidence intervals are computed in a similar fashion.
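The interval in equation (9) can be sketched with base R's qt(). The inputs are the values of mean.zmnorm, sd.zmnorm, and the sample size reported in the first example in the Examples section. The one-sided limits are an assumption about what "in a similar fashion" means here: they replace \(\alpha/2\) with \(\alpha\) and leave the other end of the interval unbounded.

```r
# Values taken from the printed output of the first example
n <- 100
gamma.hat <- 2.220753  # estimated mean.zmnorm
delta.hat <- 2.465829  # estimated sd.zmnorm
conf.level <- 0.95
alpha <- 1 - conf.level

# Estimated standard error of the mean, from equation (8)
se <- delta.hat / sqrt(n)

# Two-sided interval of equation (9), using n - 2 degrees of freedom
t.two <- qt(1 - alpha / 2, df = n - 2)
ci.two.sided <- c(LCL = gamma.hat - t.two * se, UCL = gamma.hat + t.two * se)
ci.two.sided

# Assumed one-sided versions: all of alpha is placed in one tail
t.one <- qt(1 - alpha, df = n - 2)
ci.lower <- c(LCL = gamma.hat - t.one * se, UCL = Inf)   # lower bound only
ci.upper <- c(LCL = -Inf, UCL = gamma.hat + t.one * se)  # upper bound only
```

Because the inputs are rounded to six decimals, the two-sided limits should agree with the LCL and UCL printed in that example only up to rounding in the last digit.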
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
The component called parameters is a numeric vector with the following estimated parameters:
Parameter Name | Explanation |
mean | mean of the normal (Gaussian) part of the distribution. |
sd | standard deviation of the normal (Gaussian) part of the distribution. |
p.zero | probability that an observation will be 0. |
mean.zmnorm | mean of the overall zero-modified normal distribution. |
sd.zmnorm | standard deviation of the overall zero-modified normal distribution. |
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
The zero-modified normal distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit”. See, for example, USEPA (1992c, pp. 27-34). In most cases, however, the zero-modified lognormal (delta) distribution will be more appropriate, since chemical concentrations are bounded below at 0 (e.g., Gilliom and Helsel, 1986; Owen and DeRouen, 1980).
Once you estimate the parameters of the zero-modified normal distribution, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with a confidence interval.
One way to try to assess whether a zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).
# Generate 100 observations from a zero-modified normal distribution
# with mean=4, sd=2, and p.zero=0.5, then estimate the parameters.
# According to equations (1) and (2) above, the overall mean is
# mean.zmnorm=2 and the overall standard deviation is sd.zmnorm=sqrt(6).
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
dat <- rzmnorm(100, mean = 4, sd = 2, p.zero = 0.5)
ezmnorm(dat, ci = TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Zero-Modified Normal
#>
#> Estimated Parameter(s): mean = 4.037732
#> sd = 1.917004
#> p.zero = 0.450000
#> mean.zmnorm = 2.220753
#> sd.zmnorm = 2.465829
#>
#> Estimation Method: mvue
#>
#> Data: dat
#>
#> Sample Size: 100
#>
#> Confidence Interval for: mean.zmnorm
#>
#> Confidence Interval Method: Normal Approximation
#> (t Distribution)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 1.731417
#> UCL = 2.710088
#>
#----------
# Following Example 9 on page 34 of USEPA (1992c), compute an
# estimate of the mean of the zinc data, assuming a
# zero-modified normal distribution. The data are stored in
# EPA.92c.zinc.df.
head(EPA.92c.zinc.df)
#> Zinc.orig Zinc Censored Sample Well
#> 1 <7 7.00 TRUE 1 1
#> 2 11.41 11.41 FALSE 2 1
#> 3 <7 7.00 TRUE 3 1
#> 4 <7 7.00 TRUE 4 1
#> 5 <7 7.00 TRUE 5 1
#> 6 10.00 10.00 FALSE 6 1
New.Zinc <- EPA.92c.zinc.df$Zinc
New.Zinc[EPA.92c.zinc.df$Censored] <- 0
ezmnorm(New.Zinc, ci = TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Zero-Modified Normal
#>
#> Estimated Parameter(s): mean = 11.891000
#> sd = 1.594523
#> p.zero = 0.500000
#> mean.zmnorm = 5.945500
#> sd.zmnorm = 6.123235
#>
#> Estimation Method: mvue
#>
#> Data: New.Zinc
#>
#> Sample Size: 40
#>
#> Confidence Interval for: mean.zmnorm
#>
#> Confidence Interval Method: Normal Approximation
#> (t Distribution)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 3.985545
#> UCL = 7.905455
#>
#----------
# Clean up
rm(dat, New.Zinc)