Estimate the mean and coefficient of variation of a lognormal distribution, and optionally construct a confidence interval for the mean.

elnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided", 
    ci.method = "land", conf.level = 0.95, parkin.list = NULL)

Arguments

x

numeric vector of positive observations.

method

character string specifying the method of estimation. Possible values are "mvue" (minimum variance unbiased; the default), "qmle" (quasi maximum likelihood), "mle" (maximum likelihood), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimate of variance). See the DETAILS section for more information on these estimation methods.

ci

logical scalar indicating whether to compute a confidence interval for the mean. The default value is FALSE.

ci.type

character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.

ci.method

character string indicating what method to use to construct the confidence interval for the mean. The possible values are "land" (Land's method; the default), zou (Zou et al.'s method), "parkin" (Parkin et al.'s method), "cox" (Cox's approximation), and "normal.approx" (normal approximation). See the DETAILS section for more information. This argument is ignored if ci=FALSE.

conf.level

a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

parkin.list

a list containing arguments for the function eqnpar. The components of this list are lcl.rank (set to NULL by default), ucl.rank (set to NULL by default), ci.method (set to "exact" if the sample size is \(\le 20\), otherwise set to
"normal.approx"), and approx.conf.level (set to the value of conf.level). This argument is ignored unless ci=TRUE and ci.method="parkin".

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

Let \(\underline{x}\) be a vector of \(n\) observations from a lognormal distribution with parameters mean=\(\theta\) and cv=\(\tau\). Let \(\eta\) denote the standard deviation of this distribution, so that \(\eta = \theta \tau\). Set \(\underline{y} = log(\underline{x})\). Then \(\underline{y}\) is a vector of observations from a normal distribution with parameters mean=\(\mu\) and sd=\(\sigma\). See the help file for LognormalAlt for the relationship between \(\theta, \tau, \eta, \mu\), and \(\sigma\).

Estimation
This section explains how each of the estimators of mean=\(\theta\) and cv=\(\tau\) are computed. The approach is to first compute estimates of \(\theta\) and \(\eta^2\) (the mean and variance of the lognormal distribution), say \(\hat{\theta}\) and \(\hat{\eta}^2\), then compute the estimate of the cv \(\tau\) by \(\hat{\tau} = \hat{\eta}/\hat{\theta}\).

Minimum Variance Unbiased Estimation (method="mvue")
The minimum variance unbiased estimators (mvue's) of \(\theta\) and \(\eta^2\) were derived by Finney (1941) and are discussed in Gilbert (1987, pp. 164-167) and Cohn et al. (1989). These estimators are computed as: $$\hat{\theta}_{mvue} = e^{\bar{y}} g_{n-1}(\frac{s^2}{2}) \;\;\;\; (1)$$ $$\hat{\eta}^2_{mvue} = e^{2 \bar{y}} \{g_{n-1}(2s^2) - g_{n-1}[\frac{(n-2)s^2}{n-1}]\} \;\;\;\; (2)$$ where $$\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \;\;\;\; (3)$$ $$s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \;\;\;\; (4)$$ $$g_m(z) = \sum_{i=0}^\infty \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (5)$$

The expected value and variance of the mvue of \(\theta\) are (Bradu and Mundlak, 1970; Cohn et al., 1989): $$E[\hat{\theta}_{mvue}] = \theta \;\;\;\; (6)$$ $$Var[\hat{\theta}_{mvue}] = e^{2\mu} \{e^{[(2+n-1)\sigma^2]/n} g_{n-1}(\frac{\sigma^4}{4n}) - e^{\sigma^2} \} \;\;\;\; (7)$$

Maximum Likelihood Estimation (method="mle")
The maximum likelihood estimators (mle's) of \(\theta\) and \(\eta^2\) are given by: $$\hat{\theta}_{mle} = exp(\bar{y} + \frac{\hat{\sigma}^2_{mle}}{2}) \;\;\;\; (8)$$ $$\hat{\eta}^2_{mle} = \hat{\theta}^2_{mle} \hat{\tau}^2_{mle} \;\;\;\; (9)$$ where $$\hat{\tau}^2_{mle} = exp(\hat{\sigma}^2_{mle}) - 1 \;\;\;\; (10)$$ $$\hat{\sigma}^2_{mle} = \frac{n-1}{n} s^2 \;\;\;\; (11)$$

The expected value and variance of the mle of \(\theta\) are (after Cohn et al., 1989): $$E[\hat{\theta}_{mle}] = \theta exp[\frac{-(n-1)\sigma^2}{2n}] (1 - \frac{\sigma^2}{n})^{-(n-1)/2} \;\;\;\; (12)$$ $$Var[\hat{\theta}_{mle}] = exp(2\mu + \frac{\sigma^2}{n}) \{exp(\frac{\sigma^2}{n}) [1 - \frac{2\sigma^2}{n}]^{-(n-1)/2} - [1 - \frac{\sigma^2}{n}]^{-(n-1)} \} \;\;\;\; (13)$$ As can be seen from equation (12), the expected value of the mle of \(\theta\) does not exist when \(\sigma^2 > n\). In general, the \(p\)'th moment of the mle of \(\theta\) does not exist when \(\sigma^2 > n/p\).

Quasi Maximum Likelihood Estimation (method="qmle")
The quasi maximum likelihood estimators (qmle's; Cohn et al., 1989; Gilbert, 1987, p.167) of \(\theta\) and \(\eta^2\) are the same as the mle's, except the mle of \(\sigma^2\) in equations (8) and (10) is replaced with the more commonly used mvue of \(\sigma^2\) shown in equation (4): $$\hat{\theta}_{qmle} = exp(\bar{y} + \frac{s^2}{2}) \;\;\;\; (14)$$ $$\hat{\eta}^2_{qmle} = \hat{\theta}^2_{qmle} \hat{\tau}^2_{qmle} \;\;\;\; (15)$$ $$\hat{\tau}^2_{qmle} = exp(s^2) - 1 \;\;\;\; (16)$$

The expected value and variance of the qmle of \(\theta\) are (Cohn et al., 1989): $$E[\hat{\theta}_{mle}] = \theta exp[\frac{-(n-1)\sigma^2}{2n}] (1 - \frac{\sigma^2}{n-1})^{-(n-1)/2} \;\;\;\; (17)$$ $$Var[\hat{\theta}_{mle}] = exp(2\mu + \frac{\sigma^2}{n}) \{exp(\frac{\sigma^2}{n}) [1 - \frac{2\sigma^2}{n-1}]^{-(n-1)/2} - [1 - \frac{\sigma^2}{n-1}]^{-(n-1)} \} \;\;\;\; (18)$$ As can be seen from equation (17), the expected value of the qmle of \(\theta\) does not exist when \(\sigma^2 > (n - 1)\). In general, the \(p\)'th moment of the mle of \(\theta\) does not exist when \(\sigma^2 > (n - 1)/p\).

Note that Gilbert (1987, p. 167) incorrectly presents equation (12) rather than equation (17) as the expected value of the qmle of \(\theta\). For large values of \(n\) relative to \(\sigma^2\), however, equations (12) and (17) are virtually identical.

Method of Moments Estimation (method="mme")
The method of moments estimators (mme's) of \(\theta\) and \(\eta^2\) are found by equating the sample mean and variance with their population values: $$\hat{\theta}_{mme} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (19)$$ $$\hat{\eta}_{mme} = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (20)$$ Note that the estimator of variance in equation (20) is biased.

The expected value and variance of the mme of \(\theta\) are: $$E[\hat{\theta}_{mme}] = \theta \;\;\;\; (21)$$ $$Var[\hat{\theta}_{mme}] = \frac{\eta^2}{n} = \frac{1}{n} exp(2\mu + \sigma^2) [exp(\sigma^2)-1] \;\;\;\; (22)$$

Method of Moments Estimation Based on the Unbiased Estimate of Variance (method="mmue")
These estimators are exactly the same as the method of moments estimators described above, except that the usual unbiased estimate of variance is used: $$\hat{\theta}_{mmue} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (23)$$ $$\hat{\eta}_{mmue} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (24)$$ Since the mmue of \(\theta\) is equivalent to the mme of \(\theta\), so are its mean and varaince.

Confidence Intervals
This section explains the different methods for constructing confidence intervals for \(\theta\), the mean of the lognormal distribution.

Land's Method (ci.method="land")
Land (1971, 1975) derived a method for computing one-sided (lower or upper) uniformly most accurate unbiased confidence intervals for \(\theta\). A two-sided confidence interval can be constructed by combining an optimal lower confidence limit with an optimal upper confidence limit. This procedure for two-sided confidence intervals is only asymptotically optimal, but for most purposes should be acceptable (Land, 1975, p.387).

As shown in equation (3) in the help file for LognormalAlt, the mean \(\theta\) of a lognormal random variable is related to the mean \(\mu\) and standard deviation \(\sigma\) of the log-transformed random variable by the following relationship: $$\theta = e^{\beta} \;\;\;\; (25)$$ where $$\beta = \mu + \frac{\sigma^2}{2} \;\;\;\; (26)$$ Land (1971) developed confidence bounds for the quantity \(\beta\). The mvue of \(\beta\) is given by: $$\hat{\beta}_{mvue} = \bar{y} + \frac{s^2}{2} \;\;\;\; (27)$$ Note that \(\hat{\theta}_{qmle} = exp(\hat{\beta}_{mvue})\). The \((1-\alpha)100\%\) two-sided confidence interval for \(\beta\) is given by: $$[ \hat{\beta}_{mvue} + s \frac{C_{\alpha/2}}{\sqrt{n-1}}, \; \hat{\beta}_{mvue} + s \frac{C_{1-\alpha/2}}{\sqrt{n-1}} ] \;\;\;\; (28)$$ the \((1-\alpha)100\%\) one-sided upper confidence interval for \(\beta\) is given by: $$[ -\infty, \; \hat{\beta}_{mvue} + s \frac{C_{1-\alpha}}{\sqrt{n-1}} ] \;\;\;\; (29)$$ and the \((1-\alpha)100\%\) one-sided lower confidence interval for \(\beta\) is given by: $$[ \hat{\beta}_{mvue} + s \frac{C_{\alpha}}{\sqrt{n-1}}, \; \infty ] \;\;\;\; (30)$$ where \(s\) is the estimate of \(\sigma\) (see equation (4) above), and the factor \(C\) is given in tables in Land (1975).

Thus, by equations (25)-(30), the two-sided \((1-\alpha)100\%\) confidence interval for \(\theta\) is given by: $$\{\hat{\theta}_{qmle} exp[s \frac{C_{\alpha/2}}{\sqrt{n-1}}], \; \hat{\theta}_{qmle} exp[s \frac{C_{1-\alpha/2}}{\sqrt{n-1}}] \} \;\;\;\; (31)$$ the \((1-\alpha)100\%\) one-sided upper confidence interval for \(\theta\) is given by: $$\{ 0, \; \hat{\theta}_{qmle} exp[s \frac{C_{1-\alpha}}{\sqrt{n-1}}] \} \;\;\;\; (32)$$ and the \((1-\alpha)100\%\) one-sided lower confidence interval for \(\theta\) is given by: $$\{\hat{\theta}_{qmle} exp[s \frac{C_{\alpha}}{\sqrt{n-1}} ], \; \infty \} \;\;\;\; (33)$$

Note that Gilbert (1987, pp. 169-171, 264-265) denotes the quantity \(C\) above as \(H\) and reproduces a subset of Land's (1975) tables. Some guidance documents (e.g., USEPA, 1992d) refer to this quantity as the \(H\)-statistic.

Zou et al.'s Method (ci.method="zou")
Zou et al. (2009) proposed the following approximation for the two-sided \((1-\alpha)100\%\) confidence intervals for \(\theta\). The lower limit \(LL\) is given by: $$LL = \hat{\theta}_{qmle} exp\{ -[\frac{z^2_{1-\alpha/2}s^2}{n} + (\frac{s^2}{2} - \frac{(n-1)s^2}{2\chi^2_{1-\alpha/2, n-1}})^2]^{1/2}\} \;\;\;\; (34)$$ and the upper limit \(UL\) is given by: $$UL = \hat{\theta}_{qmle} exp\{ [\frac{z^2_{1-\alpha/2}s^2}{n} + (\frac{(n-1)s^2}{2\chi^2_{\alpha/2, n-1}} - \frac{s^2}{2})^2]^{1/2}\} \;\;\;\; (35)$$ where \(z_p\) denotes the \(p\)'th quantile of the standard normal distribuiton, and \(\chi_{p, \nu}\) denotes the \(p\)'th quantile of the chi-square distribution with \(\nu\) degrees of freedom. The \((1-\alpha)100\%\) one-sided lower confidence limit and one-sided upper confidence limit are given by equations (34) and (35), respectively, with \(\alpha/2\) replaced by \(\alpha\).

Parkin et al.'s Method (ci.method="parkin")
This method was developed by Parkin et al. (1990). It can be shown that the mean of a lognormal distribution corresponds to the \(p\)'th quantile, where $$p = \Phi(\frac{\sigma}{2}) \;\;\;\; (36)$$ and \(\Phi\) denotes the cumulative distribution function of the standard normal distribution. Parkin et al. (1990) suggested estimating \(p\) by replacing \(\sigma\) in equation (36) with the estimate \(s\) as computed in equation (4). Once an estimate of \(p\) is obtained, a nonparametric confidence interval can be constructed for \(p\), assuming \(p\) is equal to its estimated value (see eqnpar).

Cox's Method (ci.method="cox")
This method was suggested by Professor D.R. Cox and is illustrated in Land (1972). El-Shaarawi (1989) adapts this method to the case of censored water quality data. Cox's idea is to construct an approximate \((1-\alpha)100\%\) confidence interval for the quantity \(\beta\) defined in equation (26) above assuming the estimate of \(\beta\) is approximately normally distributed, and then exponentiate the confidence limits. That is, a two-sided \((1-\alpha)100\%\) confidence interval for \(\theta\) is constructed as: $$[exp(\hat{\beta} - t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\beta}}), \; exp(\hat{\beta} + t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\beta}})] \;\;\;\; (37)$$ where \(t(p, \nu)\) denotes the \(p\)'th quantile of Student's t-distribution with \(\nu\) degrees of freedom. Note that this method, unlike the normal approximation method discussed below, guarantees a positive value for the lower confidence limit. One-sided confidence intervals are computed in a similar fashion.

Define an estimator of \(\beta\) by: $$\hat{\beta} = \hat{\mu} + \frac{\hat{\sigma}^2}{2} \;\;\;\; (38)$$ Then the variance of this estimator is given by: $$Var(\hat{\beta}) = Var(\hat{\mu}) + Cov(\hat{\mu}, \hat{\sigma}^2) + \frac{1}{4}Var(\hat{\sigma}^2) \;\;\;\; (39)$$ The function elnormAlt follows Land (1972) and uses the minimum variance unbiased estimator for \(\beta\) shown in equation (27) above, so the variance and estimated variance of this estimator are: $$Var(\hat{\beta}_{mvue}) = \frac{\sigma^2}{n} + \frac{\sigma^4}{2(n-1)} \;\;\;\; (40)$$ $$\hat{\sigma}^2_{\hat{\beta}} = \frac{s^2}{n} + \frac{s^4}{2(n+1)} \;\;\;\; (41)$$ Note that El-Shaarawi (1989, equation 5) simply replaces the value of \(s^2\) in equation (41) with some estimator of \(\sigma^2\) (the mle or mvue of \(\sigma^2\)), rather than using the mvue of the variance of \(\beta\) as shown in equation (41).

Normal Approximation (ci.method="normal.approx") This method constructs approximate \((1-\alpha)100\%\) confidence intervals for \(\theta\) based on the assumption that the estimator of \(\theta\) is approximately normally distributed. That is, a two-sided \((1-\alpha)100\%\) confidence interval for \(\theta\) is constructed as: $$[\hat{\theta} - t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\theta}}, \; \hat{\theta} + t_{1-\alpha/2, n-1}\hat{\sigma}_{\hat{\theta}}] \;\;\;\; (42)$$ One-sided confidence intervals are computed in a similar fashion.

When method="mvue" is used to estimate \(\theta\), an unbiased estimate of the variance of the estimator of \(\theta\) is used in equation (42) (Bradu and Mundlak, 1970, equation 4.3; Gilbert, 1987, equation 13.5): $$\hat{\sigma^2}_{\hat{\theta}} = e^{2\bar{y}} \{[g_{n-1}(\frac{s^2}{2})]^2 - g_{n-1}[\frac{s^2(n-2)}{n-1}] \} \;\;\;\; (43)$$

When method="mle" is used to estimate \(\theta\), the estimate of the variance of the estimator of \(\theta\) is computed by replacing \(\mu\) and \(\sigma^2\) in equation (13) with their mle's: $$\hat{\sigma}^2_{\hat{\theta}} = exp(2\bar{y} + \frac{\hat{\sigma}^2_{mle}}{n}) \{exp(\frac{\hat{\sigma}^2_{mle}}{n}) [1 - \frac{2\hat{\sigma}^2_{mle}}{n}]^{-(n-1)/2} - [1 - \frac{\hat{\sigma}^2_{mle}}{n}]^{-(n-1)} \} \;\;\;\; (44)$$

When method="qmle" is used to estimate \(\theta\), the estimate of the variance of the estimator of \(\theta\) is computed by replacing \(\mu\) and \(\sigma^2\) in equation (18) with their mvue's: $$\hat{\sigma}^2_{\hat{\theta}} = exp(2\bar{y} + \frac{s^2}{n}) \{exp(\frac{s^2}{n}) [1 - \frac{2 s^2}{n-1}]^{-(n-1)/2} - [1 - \frac{s^2}{n-1}]^{-(n-1)} \} \;\;\;\; (45)$$ Note that equation (45) is exactly the same as Gilbert's (1987, p. 167) equation 13.8a, except that Gilbert (1987) erroneously uses \(n\) where he should use \(n-1\) instead. For large values of \(n\) relative to \(s^2\), however, this makes little difference.

When method="mme", the estimate of the variance of the estimator of \(\theta\) is computed by replacing \(eta^2\) in equation (22) with the mme of \(\eta^2\) defined in equation (20): $$\hat{\sigma}^2_{\hat{\theta}} = \frac{\hat{\eta}_{mme}^2}{n} = \frac{1}{n^2} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (46)$$

When method="mmue", the estimate of the variance of the estimator of \(\theta\) is computed by replacing \(eta^2\) in equation (22) with the mmue of \(\eta^2\) defined in equation (24): $$\hat{\sigma}^2_{\hat{\theta}} = \frac{\hat{\eta}_{mmue}^2}{n} = \frac{1}{n(n-1)} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\; (47)$$

Value

a list of class "estimate" containing the estimated parameters and other information. See
estimate.object for details.

References

Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special references to its uses in economics). Cambridge University Press, London, Chapter 5.

Armstrong, B.G. (1992). Confidence Intervals for Arithmetic Means of Lognormally Distributed Exposures. American Industrial Hygiene Association Journal 53, 481–485.

Bradu, D., and Y. Mundlak. (1970). Estimation in Lognormal Linear Models. Journal of the American Statistical Association 65, 198–211.

Cohn, T.A., L.L. DeLong, E.J. Gilroy, R.M. Hirsch, and D.K. Wells. (1989). Estimating Constituent Loads. Water Resources Research 25(5), 937–942.

Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, Chapter 2.

El-Shaarawi, A.H., and J. Lin. (2007). Interval Estimation for Log-Normal Mean with Applications to Water Quality. Environmetrics 18, 1–10.

El-Shaarawi, A.H., and R. Viveros. (1997). Inference About the Mean in Log-Regression with Environmental Applications. Environmetrics 8, 569–582.

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Finney, D.J. (1941). On the Distribution of a Variate Whose Logarithm is Normally Distributed. Supplement to the Journal of the Royal Statistical Society 7, 155–161.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.

Krishnamoorthy, K., and T.P. Mathew. (2003). Inferences on the Means of Lognormal Distributions Using Generalized p-Values and Generalized Confidence Intervals. Journal of Statistical Planning and Inference 115, 103–121.

Land, C.E. (1971). Confidence Intervals for Linear Functions of the Normal Mean and Variance. The Annals of Mathematical Statistics 42(4), 1187–1205.

Land, C.E. (1972). An Evaluation of Approximate Confidence Interval Estimation Methods for Lognormal Means. Technometrics 14(1), 145–158.

Land, C.E. (1973). Standard Confidence Limits for Linear Functions of the Normal Mean and Variance. Journal of the American Statistical Association 68(344), 960–963.

Land, C.E. (1975). Tables of Confidence Limits for Linear Functions of the Normal Mean and Variance, in Selected Tables in Mathematical Statistics, Vol. III. American Mathematical Society, Providence, RI, pp. 385–419.

Likes, J. (1980). Variance of the MVUE for Lognormal Variance. Technometrics 22(2), 253–258.

Limpert, E., W.A. Stahel, and M. Abbt. (2001). Log-Normal Distributions Across the Sciences: Keys and Clues. BioScience 51, 341–352.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

Parkin, T.B., J.J. Meisinger, S.T. Chester, J.L. Starr, and J.A. Robinson. (1988). Evaluation of Statistical Estimation Methods for Lognormally Distributed Variables. Journal of the Soil Science Society of America 52, 323–329.

Parkin, T.B., S.T. Chester, and J.A. Robinson. (1990). Calculating Confidence Intervals for the Mean of a Lognormally Distributed Variable. Journal of the Soil Science Society of America 54, 321–326.

Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (1992d). Supplemental Guidance to RAGS: Calculating the Concentration Term. Publication 9285.7-081, May 1992. Intermittenet Bulletin, Volume 1, Number 1. Office of Emergency and Remedial Response, Hazardous Site Evaluation Division, OS-230. Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

Zou, G.Y., C.Y. Huo, and J. Taleban. (2009). Simple Confidence Intervals for Lognormal Means and their Differences with Environmental Applications. Environmetrics 20, 172–180.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The normal and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean or variance. This is done with confidence intervals.

Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribuiton instead.

USEPA (1992d) directs persons involved in risk assessment for Superfund sites to use Land's (1971, 1975) method (ci.method="land") for computing the upper 95% confidence interval for the mean, assuming the data follow a lognormal distribution (the guidance document cites Gilbert (1987) as a source of descriptions and tables for this method). The last example in the EXAMPLES section below reproduces an example from this guidance document.

In the past, some authors suggested using the geometric mean, also called the "rating curve" estimator (Cohn et al., 1989), as the estimator of the mean, \(\theta\). This estimator is computed as: $$\hat{\theta}_{rc} = e^{\bar{y}} \;\;\;\; (48)$$ Cohn et al. (1989) cite several authors who have pointed out this estimator is biased and is not even a consistent estimator of the mean. In fact, it is the maximum likelihood estimator of the median of the distribution (see eqlnorm.)

Finney (1941) computed the efficiency of the method of moments estimators of the mean (\(\theta\)) and variance (\(\eta^2\)) of the lognormal distribution (equations (19)-(20)) relative to the mvue's (equations (1)-(2)) as a function of \(\sigma^2\) (the variance of the log-transformed observations), and found that while the mme of \(\theta\) is reasonably efficient compared to the mvue of \(\theta\), the mme of \(\eta^2\) performs quite poorly relative to the mvue of \(\eta^2\).

Cohn et al. (1989) and Parkin et al. (1988) have shown that the qmle and the mle of the mean can be severely biased for typical environmental data, and suggest always using the mvue.

Parkin et al. (1990) studied the performance of various methods for constructing a confidence interval for the mean via Monte Carlo simulation. They compared approximate methods to Land's optimal method (ci.method="land"). They used four parent lognormal distributions to generate observations; all had mean 10, but differed in coefficient of variation: 50, 100, 200, and 500%. They also generated sample sizes from 6 to 100 in increments of 2. For each combination of parent distribution and sample size, they generated 25,000 Monte Carlo trials. Parkin et al. found that for small sample sizes (\(n < 20\)), none of the approximate methods ("parkin", "cox", "normal.approx") worked very well. For \(n > 20\), their method ("parkin") provided reasonably accurate coverage. Cox's method ("cox") worked well for \(n > 60\), and performed slightly better than Parkin et al.'s method ("parkin") for highly skewed populations.

Zou et al. (2009) used Monte Carlo simulation to compare the performance of their method with the CGI method of Krishnamoorthy and Mathew (2003) and the modified Cox method of Armstrong (1992) and El-Shaarawi and Lin (2007). Performance was assessed based on 1) percentage of times the interval contained the parameter value (coverage%), 2) balance between left and right tail errors, and 3) confidence interval width. All three methods showed acceptable coverage percentages. The modified Cox method showed unbalanced tail errors, and Zou et al.'s method showed consistently narrower average width.

Examples

  # Using the Reference area TcCB data in the data frame EPA.94b.tccb.df, 
  # estimate the mean and coefficient of variation, 
  # and construct a 95% confidence interval for the mean.

  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], ci = TRUE))  
#> 
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#> 
#> Assumed Distribution:            Lognormal
#> 
#> Estimated Parameter(s):          mean = 0.5989072
#>                                  cv   = 0.4899539
#> 
#> Estimation Method:               mvue
#> 
#> Data:                            TcCB[Area == "Reference"]
#> 
#> Sample Size:                     47
#> 
#> Confidence Interval for:         mean
#> 
#> Confidence Interval Method:      Land
#> 
#> Confidence Interval Type:        two-sided
#> 
#> Confidence Level:                95%
#> 
#> Confidence Interval:             LCL = 0.5243787
#>                                  UCL = 0.7016992
#> 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          mean = 0.5989072
  #                                 cv   = 0.4899539
  #
  #Estimation Method:               mvue
  #
  #Data:                            TcCB[Area == "Reference"]
  #
  #Sample Size:                     47
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Land
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.5243787
  #                                 UCL = 0.7016992

  #----------

  # Compare the different methods of estimating the distribution parameters using the 
  # Reference area TcCB data.

  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mvue"))$parameters
#>      mean        cv 
#> 0.5989072 0.4899539 
  #     mean        cv 
  #0.5989072 0.4899539

  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "qmle"))$parameters
#>      mean        cv 
#> 0.6004468 0.4947791 
  #     mean        cv 
  #0.6004468 0.4947791 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mle"))$parameters
#>      mean        cv 
#> 0.5990497 0.4888968 
  #     mean        cv 
  #0.5990497 0.4888968 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mme"))$parameters
#>      mean        cv 
#> 0.5985106 0.4688423 
  #     mean        cv 
  #0.5985106 0.4688423 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], method = "mmue"))$parameters
#>      mean        cv 
#> 0.5985106 0.4739110 
  #     mean        cv 
  #0.5985106 0.4739110

  #----------

  # Compare the different methods of constructing the confidence interval for
  # the mean using the Reference area TcCB data.
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "land"))$interval$limits
#>       LCL       UCL 
#> 0.5243787 0.7016992 
  #      LCL       UCL 
  #0.5243787 0.7016992

  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "zou"))$interval$limits
#>       LCL       UCL 
#> 0.5230444 0.6962071 
  #      LCL       UCL 
  #0.5230444 0.6962071 

  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
    method = "mvue", ci = TRUE, ci.method = "parkin"))$interval$limits
#>  LCL  UCL 
#> 0.50 0.74 
  # LCL  UCL 
  #0.50 0.74 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
     method = "mvue", ci = TRUE, ci.method = "cox"))$interval$limits
#>       LCL       UCL 
#> 0.5196213 0.6938444 
  #      LCL       UCL 
  #0.5196213 0.6938444 
 
  with(EPA.94b.tccb.df, elnormAlt(TcCB[Area == "Reference"], 
     method = "mvue", ci = TRUE, ci.method = "normal.approx"))$interval$limits
#>       LCL       UCL 
#> 0.5130160 0.6847984 
  #      LCL       UCL 
  #0.5130160 0.6847984 

  #----------

  # Reproduce the example in Highlights 7 and 8 of USEPA (1992d).  This example shows 
  # how to compute the upper 95% confidence limit of the mean of a lognormal distribution 
  # and compares it to the result of computing the upper 95% confidence limit assuming a 
  # normal distribution. The data for this example are chromium concentrations (mg/kg) in 
  # soil samples collected randomly over a Superfund site, and are stored in the data frame 
  # EPA.92d.chromium.vec.

  # First look at the data 

  EPA.92d.chromium.vec
#>  [1]   10   13   20   36   41   59   67  110  110  136  140  160  200  230 1300
  # [1]   10   13   20   36   41   59   67  110  110  136  140  160  200  230 1300

  stripChart(EPA.92d.chromium.vec, ylab = "Chromium (mg/kg)")


  # Note there is one very large "outlier" (1300).  
  # Perform a goodness-of-fit test to determine whether a lognormal distribution 
  # is appropriate:

  gof.list <- gofTest(EPA.92d.chromium.vec, dist = 'lnormAlt') 
  gof.list 
#> 
#> Results of Goodness-of-Fit Test
#> -------------------------------
#> 
#> Test Method:                     Shapiro-Wilk GOF
#> 
#> Hypothesized Distribution:       Lognormal
#> 
#> Estimated Parameter(s):          mean = 159.855185
#>                                  cv   =   1.493994
#> 
#> Estimation Method:               mvue
#> 
#> Data:                            EPA.92d.chromium.vec
#> 
#> Sample Size:                     15
#> 
#> Test Statistic:                  W = 0.9607179
#> 
#> Test Statistic Parameter:        n = 15
#> 
#> P-value:                         0.7048747
#> 
#> Alternative Hypothesis:          True cdf does not equal the
#>                                  Lognormal Distribution.

  #Results of Goodness-of-Fit Test 
  #------------------------------- 
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal 
  #
  #Estimated Parameter(s):          mean = 159.855185
  #                                 cv   =   1.493994
  #
  #Estimation Method:               mvue
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Sample Size:                     15
  #
  #Test Statistic:                  W = 0.9607179
  #
  #Test Statistic Parameter:        n = 15
  #
  #P-value:                         0.7048747
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution. 

  plot(gof.list, digits = 2)


  # The lognormal distribution seems to provide an adequate fit, although the largest 
  # observation (1300) is somewhat suspect, and given the small sample size there is 
  # not much power to detect any kind of mild deviation from a lognormal distribution.
  
  # Now compute the one-sided 95% upper confidence limit for the mean.  
  # Note that the value of 502 mg/kg shown in Hightlight 7 of USEPA (1992d) is a bit 
  # larger than the exact value of 496.6 mg/kg shown below.  
  # This is simply due to rounding error.

  elnormAlt(EPA.92d.chromium.vec, ci = TRUE, ci.type = "upper") 
#> 
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#> 
#> Assumed Distribution:            Lognormal
#> 
#> Estimated Parameter(s):          mean = 159.855185
#>                                  cv   =   1.493994
#> 
#> Estimation Method:               mvue
#> 
#> Data:                            EPA.92d.chromium.vec
#> 
#> Sample Size:                     15
#> 
#> Confidence Interval for:         mean
#> 
#> Confidence Interval Method:      Land
#> 
#> Confidence Interval Type:        upper
#> 
#> Confidence Level:                95%
#> 
#> Confidence Interval:             LCL =   0.0000
#>                                  UCL = 496.6282
#> 

  #Results of Distribution Parameter Estimation 
  #-------------------------------------------- 
  #
  #Assumed Distribution:          Lognormal 
  #
  #Estimated Parameter(s):        mean = 159.855185
  #                                 cv   =   1.493994 
  #
  #Estimation Method:             mvue 
  #
  #Data:                          EPA.92d.chromium.vec 
  #
  #Sample Size:                   15 
  #
  #Confidence Interval for:       mean 
  #
  #Confidence Interval Method:    Land 
  #
  #Confidence Interval Type:      upper 
  #
  #Confidence Level:              95% 
  #
  #Confidence Interval:           LCL =   0 
  #                               UCL = 496.6282 

  # Now compare this result with the upper 95% confidence limit based on assuming 
  # a normal distribution.  Again note that the value of 325 mg/kg shown in 
  # Hightlight 8 is slightly larger than the exact value of 320.3 mg/kg shown below.  
  # This is simply due to rounding error.

  enorm(EPA.92d.chromium.vec, ci = TRUE, ci.type = "upper") 
#> 
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#> 
#> Assumed Distribution:            Normal
#> 
#> Estimated Parameter(s):          mean = 175.4667
#>                                  sd   = 318.5440
#> 
#> Estimation Method:               mvue
#> 
#> Data:                            EPA.92d.chromium.vec
#> 
#> Sample Size:                     15
#> 
#> Confidence Interval for:         mean
#> 
#> Confidence Interval Method:      Exact
#> 
#> Confidence Interval Type:        upper
#> 
#> Confidence Level:                95%
#> 
#> Confidence Interval:             LCL =     -Inf
#>                                  UCL = 320.3304
#> 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Estimated Parameter(s):          mean = 175.4667
  #                                 sd   = 318.5440
  #
  #Estimation Method:               mvue
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Sample Size:                     15
  #
  #Confidence Interval for:         mean
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =     -Inf
  #                                 UCL = 320.3304

  #----------

  # Clean up
  #---------

  rm(gof.list)