Estimate the parameters of a zero-modified lognormal (delta) distribution, in either its standard or its alternative parameterization, and optionally construct a confidence interval for the mean.

ezmlnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
    ci.method = "normal.approx", conf.level = 0.95)

ezmlnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided",
    ci.method = "normal.approx", conf.level = 0.95)

Arguments

x

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

method

character string specifying the method of estimation. The only possible value is "mvue" (minimum variance unbiased; the default). See the DETAILS section for more information on this estimation method.

ci

logical scalar indicating whether to compute a confidence interval for the mean. The default value is FALSE. If ci=TRUE and there are fewer than three non-missing observations in x, or if all observations are zeros, a warning will be issued and no confidence interval will be computed.

ci.type

character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.

ci.method

character string indicating what method to use to construct the confidence interval for the mean. The only possible value is "normal.approx" (the default). See the DETAILS section for more information. This argument is ignored if ci=FALSE.

conf.level

a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution with parameters meanlog=\(\mu\), sdlog=\(\sigma\), and p.zero=\(p\). Alternatively, let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of \(n\) observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=\(\theta\), cv=\(\tau\), and p.zero=\(p\).

Let \(r\) denote the number of observations in \(\underline{x}\) that are equal to 0, and order the observations so that \(x_1, x_2, \ldots, x_r\) denote the \(r\) zero observations and \(x_{r+1}, x_{r+2}, \ldots, x_n\) denote the \(n-r\) non-zero observations.
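As a minimal sketch of the bookkeeping described above (not the package's internal code), the following base R snippet drops non-finite values and then counts \(n\) and \(r\). The helper name drop_nonfinite and the sample vector are hypothetical and serve only as an illustration.

```r
# Hypothetical helper (not part of EnvStats): keep only finite values,
# dropping NA, NaN, Inf, and -Inf in one step.
drop_nonfinite <- function(x) x[is.finite(x)]

x_raw   <- c(0, 1.2, NA, 3.4, NaN, Inf, 0, -Inf, 0.7)
x_clean <- drop_nonfinite(x_raw)

n <- length(x_clean)      # number of observations actually used
r <- sum(x_clean == 0)    # number of zero observations
```

Here five values survive the cleaning step, two of which are zeros, so \(n = 5\) and \(r = 2\). The ordering of the observations is only a notational convenience and is not needed for the counts.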

Note that \(\theta\) is not the mean of the zero-modified lognormal distribution; it is the mean of the lognormal part of the distribution. Similarly, \(\tau\) is not the coefficient of variation of the zero-modified lognormal distribution; it is the coefficient of variation of the lognormal part of the distribution.

Let \(\gamma\), \(\delta\), and \(\phi\) denote the mean, standard deviation, and coefficient of variation of the overall zero-modified lognormal (delta) distribution. Let \(\eta\) denote the standard deviation of the lognormal part of the distribution, so that \(\eta = \theta \tau\). Aitchison (1955) shows that: $$\gamma = (1 - p) \theta \;\;\;\; (1)$$ $$\delta^2 = (1 - p) \eta^2 + p (1 - p) \theta^2 \;\;\;\; (2)$$ so that $$\phi = \frac{\delta}{\gamma} = \frac{\sqrt{\tau^2 + p}}{\sqrt{1-p}} \;\;\;\; (3)$$
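Equations (1) to (3) can be checked numerically with a few lines of base R. This is a sketch only: the function name zmlnorm_moments is hypothetical and is not an EnvStats function.

```r
# Hypothetical helper (not part of EnvStats): moments of the overall
# zero-modified lognormal (delta) distribution, given the mean (theta)
# and cv (tau) of the lognormal part and p = Pr(X = 0).
zmlnorm_moments <- function(theta, tau, p) {
  eta   <- theta * tau                                    # sd of the lognormal part
  gamma <- (1 - p) * theta                                # equation (1)
  delta <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)  # equation (2)
  phi   <- delta / gamma                                  # equation (3)
  c(mean = gamma, sd = delta, cv = phi)
}

m <- zmlnorm_moments(theta = 2, tau = 1, p = 0.5)
```

With mean=2, cv=1, and p.zero=0.5, the overall mean is 1 and both the overall standard deviation and the overall coefficient of variation equal sqrt(3), in agreement with the closed form \(\sqrt{\tau^2 + p} / \sqrt{1-p}\).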

Estimation

Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of \(\gamma\) and \(\delta^2\) are:

$$\hat{\gamma}_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) & \text{if } r < n - 1, \\ x_n / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (4)$$

$$\hat{\delta}^2_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}] \} & \text{if } r < n - 1, \\ x_n^2 / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (5)$$

where $$y_i = \log(x_i), \; i = r+1, r+2, \ldots, n \;\;\;\; (6)$$ $$\bar{y} = \frac{1}{n-r} \sum_{i=r+1}^n y_i \;\;\;\; (7)$$ $$s^2 = \frac{1}{n-r-1} \sum_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)$$ $$g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)$$

Note that when \(r=n-1\) or \(r=n\), the estimator of \(\gamma\) is simply the sample mean for all observations (including zero values), and the estimator for \(\delta^2\) is simply the sample variance for all observations.
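The estimator in equation (4) can be sketched in base R as follows. This is an illustration under stated assumptions, not the EnvStats implementation: the names finney_g and mvue_mean are hypothetical, the infinite series in equation (9) is truncated once its terms become negligible, and all observations are assumed to be non-negative.

```r
# Hypothetical helpers (not EnvStats functions).

# Series g_m(z) from equation (9), summed until the terms are negligible.
# The factor (m + 2i) cancels against the last factor of the denominator,
# which gives the recursion
#   term_i = term_{i-1} * [m / (m + 2(i-1))] * [m / (m+1)] * z / i.
finney_g <- function(m, z, tol = 1e-12, max_terms = 1000L) {
  term  <- 1   # the i = 0 term
  total <- 1
  for (i in seq_len(max_terms)) {
    term  <- term * (m / (m + 2 * (i - 1))) * (m / (m + 1)) * z / i
    total <- total + term
    if (abs(term) <= tol * abs(total)) break
  }
  total
}

# mvue of the overall mean, equation (4); assumes x >= 0.
mvue_mean <- function(x) {
  x <- x[is.finite(x)]
  n <- length(x)
  r <- sum(x == 0)
  if (r == n) return(0)                # all observations are zero
  if (r == n - 1) return(sum(x) / n)   # a single non-zero value: x_n / n
  y <- log(x[x != 0])                  # equation (6)
  # mean(y) is equation (7); var(y) uses the n-r-1 divisor of equation (8)
  (1 - r / n) * exp(mean(y)) * finney_g(n - r - 1, var(y) / 2)
}
```

For example, mvue_mean(c(0, 0, 0, 4)) falls into the \(r = n-1\) case and returns the sample mean, 1. As \(m\) grows, finney_g(m, z) approaches exp(z), which offers a quick sanity check on the series.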

The expected value and asymptotic variance of the mvue of \(\gamma\) are (Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980): $$E(\hat{\gamma}_{mvue}) = \gamma \;\;\;\; (10)$$ $$AVar(\hat{\gamma}_{mvue}) = \frac{1}{n} \exp(2\mu + \sigma^2) (1-p) (p + \frac{2\sigma^2 + \sigma^4}{2}) \;\;\;\; (11)$$

Confidence Intervals

Based on Normal Approximation (ci.method="normal.approx")
An approximate \((1-\alpha)100\%\) confidence interval for \(\gamma\) is constructed based on the assumption that the estimator of \(\gamma\) is approximately normally distributed. Thus, an approximate two-sided \((1-\alpha)100\%\) confidence interval for \(\gamma\) is constructed as: $$[ \hat{\gamma}_{mvue} - t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}}, \; \hat{\gamma}_{mvue} + t_{n-2, 1-\alpha/2} \hat{\sigma}_{\hat{\gamma}} ] \;\;\;\; (12)$$ where \(t_{\nu, p}\) is the \(p\)'th quantile of Student's t-distribution with \(\nu\) degrees of freedom, and the quantity \(\hat{\sigma}_{\hat{\gamma}}\) is the estimated standard deviation of the mvue of \(\gamma\), and is computed by replacing the values of \(\mu\), \(\sigma\), and \(p\) in equation (11) above with their estimated values and taking the square root.

Note that there must be at least 3 non-missing observations (\(n \ge 3\)) and at least one observation must be non-zero (\(r \le n-1\)) in order to construct a confidence interval.

One-sided confidence intervals are computed in a similar fashion.
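The two-sided interval in equation (12) can be sketched in base R as shown here. The function name zmlnorm_mean_ci is hypothetical, and the caller supplies the estimates of \(\mu\), \(\sigma\), and \(p\). Which estimates EnvStats plugs into equation (11) is not stated above, so results from this sketch need not match the package output exactly.

```r
# Hypothetical helper (not part of EnvStats): two-sided interval from
# equation (12), with the standard error taken from equation (11).
# gamma_hat, mu_hat, sigma_hat, and p_hat are estimates supplied by the caller.
zmlnorm_mean_ci <- function(gamma_hat, mu_hat, sigma_hat, p_hat, n,
                            conf.level = 0.95) {
  stopifnot(n >= 3, p_hat < 1)   # conditions stated in the note above
  avar <- exp(2 * mu_hat + sigma_hat^2) * (1 - p_hat) *
    (p_hat + (2 * sigma_hat^2 + sigma_hat^4) / 2) / n   # equation (11)
  se    <- sqrt(avar)
  alpha <- 1 - conf.level
  half  <- qt(1 - alpha / 2, df = n - 2) * se           # equation (12)
  c(LCL = gamma_hat - half, UCL = gamma_hat + half, se = se)
}

ci <- zmlnorm_mean_ci(gamma_hat = 1, mu_hat = 0, sigma_hat = 1,
                      p_hat = 0.5, n = 100)
```

With \(\mu = 0\), \(\sigma = 1\), \(p = 0.5\), and \(n = 100\), equation (11) reduces to \(e/100\), so the standard error is \(\sqrt{e}/10\) and the interval is symmetric about the supplied estimate of \(\gamma\).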

Value

a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

For the function ezmlnorm, the component called parameters is a numeric vector with the following estimated parameters:

Parameter Name    Explanation
meanlog           mean of the log of the lognormal part of the distribution.
sdlog             standard deviation of the log of the lognormal part of the distribution.
p.zero            probability that an observation will be 0.
mean.zmlnorm      mean of the overall zero-modified lognormal (delta) distribution.
sd.zmlnorm        standard deviation of the overall zero-modified lognormal (delta) distribution.

For the function ezmlnormAlt, the component called parameters is a numeric vector with the following estimated parameters:

Parameter Name    Explanation
mean              mean of the lognormal part of the distribution.
cv                coefficient of variation of the lognormal part of the distribution.
p.zero            probability that an observation will be 0.
mean.zmlnorm      mean of the overall zero-modified lognormal (delta) distribution.
cv.zmlnorm        coefficient of variation of the overall zero-modified lognormal (delta) distribution.

References

Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.

Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London, pp.94–99.

Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47–51.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.

Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.

Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.

Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.

Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.

USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with nondetects.

A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.

One way to try to assess whether a zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).

Examples

  # Generate 100 observations from a zero-modified lognormal (delta) 
  # distribution with mean=2, cv=1, and p.zero=0.5, then estimate the 
  # parameters. According to equations (1) and (3) above, the overall mean 
  # is mean.zmlnorm=1 and the overall cv is cv.zmlnorm=sqrt(3). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5) 
  ezmlnormAlt(dat, ci = TRUE) 
#> 
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#> 
#> Assumed Distribution:            Zero-Modified Lognormal (Delta)
#> 
#> Estimated Parameter(s):          mean         = 1.9604561
#>                                  cv           = 0.9169411
#>                                  p.zero       = 0.4500000
#>                                  mean.zmlnorm = 1.0782508
#>                                  cv.zmlnorm   = 1.5307175
#> 
#> Estimation Method:               mvue
#> 
#> Data:                            dat
#> 
#> Sample Size:                     100
#> 
#> Confidence Interval for:         mean.zmlnorm
#> 
#> Confidence Interval Method:      Normal Approximation
#>                                  (t Distribution)
#> 
#> Confidence Interval Type:        two-sided
#> 
#> Confidence Level:                95%
#> 
#> Confidence Interval:             LCL = 0.748134
#>                                  UCL = 1.408368
#> 

  #----------

  # Clean up
  rm(dat)