Prediction Interval for a Lognormal Distribution

Estimate the mean and standard deviation on the log-scale for a lognormal distribution, or estimate the mean and coefficient of variation for a lognormal distribution (alternative parameterization), and construct a prediction interval for the next \(k\) observations or next set of \(k\) geometric means.

predIntLnorm(x, n.geomean = 1, k = 1, method = "Bonferroni",
    pi.type = "two-sided", conf.level = 0.95)

  predIntLnormAlt(x, n.geomean = 1, k = 1, method = "Bonferroni",
    pi.type = "two-sided", conf.level = 0.95, est.arg.list = NULL)

Arguments

x

For predIntLnorm, x can be a numeric vector of positive observations, or an object resulting from a call to an estimating function that assumes a lognormal distribution (i.e., elnorm or elnormCensored). You cannot supply objects resulting from a call to estimating functions that use the alternative parameterization such as elnormAlt or elnormAltCensored.

For predIntLnormAlt, a numeric vector of positive observations.

If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

n.geomean

positive integer specifying the sample size associated with the \(k\) future geometric means. The default value is n.geomean=1 (i.e., individual observations). Note that all future geometric means must be based on the same sample size.

k

positive integer specifying the number of future observations or geometric means the prediction interval should contain with confidence level conf.level. The default value is k=1.

method

character string specifying the method to use if the number of future observations (k) is greater than 1. The possible values are method="Bonferroni" (approximate method based on Bonferonni inequality; the default), and
method="exact" (exact method due to Dunnett, 1955). See the DETAILS section of predIntNormK for more information. This argument is ignored if k=1.

pi.type

character string indicating what kind of prediction interval to compute. The possible values are pi.type="two-sided" (the default), pi.type="lower", and pi.type="upper".

conf.level

a scalar between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.

est.arg.list

for predIntLnormAlt, a list containing arguments to pass to the function
elnormAlt for estimating the mean and coefficient of variation. The default value is est.arg.list=NULL, which implies the default values will be used in the call to elnormAlt.

Details

The function predIntLnorm returns a prediction interval as well as estimates of the meanlog and sdlog parameters. The function predIntLnormAlt returns a prediction interval as well as estimates of the mean and coefficient of variation.

A prediction interval for a lognormal distribution is constructed by taking the natural logarithm of the observations and constructing a prediction interval based on the normal (Gaussian) distribution by calling predIntNorm. These prediction limits are then exponentiated to produce a prediction interval on the original scale of the data.

Value

If x is a numeric vector, a list of class "estimate" containing the estimated parameters, the prediction interval, and other information. See the help file for
estimate.object for details.

If x is the result of calling an estimation function, predIntLnorm returns a list whose class is the same as x. The list contains the same components as x, as well as a component called interval containing the prediction interval information. If x already has a component called interval, this component is replaced with the prediction interval information.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.

Dunnett, C.W. (1955). A Multiple Comparisons Procedure for Comparing Several Treatments with a Control. Journal of the American Statistical Association 50, 1096-1121.

Dunnett, C.W. (1964). New Tables for Multiple Comparisons with a Control. Biometrics 20, 482-491.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Hahn, G.J. (1969). Factors for Calculating Two-Sided Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 64(327), 878-898.

Hahn, G.J. (1970a). Additional Factors for Calculating Prediction Intervals for Samples from a Lognormal Distribution. Journal of the American Statistical Association 65(332), 1668-1676.

Hahn, G.J. (1970b). Statistical Intervals for a Lognormal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.

Hahn, G.J. (1970c). Statistical Intervals for a Lognormal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.

Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.

Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their Applications. Journal of Quality Technology 5, 178-188.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York.

Helsel, D.R., and R.M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. (available on-line at: https://pubs.usgs.gov/tm/04/a03/tm4a3.pdf).

Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.

Miller, R.G. (1981a). Simultaneous Statistical Inference. McGraw-Hill, New York.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Prediction and tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Nelson, 1973; Krishnamoorthy and Mathew, 2009). In the context of environmental statistics, prediction intervals are useful for analyzing data from groundwater detection monitoring programs at hazardous and solid waste facilities (e.g., Gibbons et al., 2009; Millard and Neerchal, 2001; USEPA, 2009).

Examples

  # Generate 20 observations from a lognormal distribution with parameters
  # meanlog=0 and sdlog=1.  The exact two-sided 90% prediction interval for
  # k=1 future observation is given by: [exp(-1.645), exp(1.645)] = [0.1930, 5.181].
  # Use predIntLnorm to estimate the distribution parameters, and construct a
  # two-sided 90% prediction interval.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(47)
  dat <- rlnorm(20, meanlog = 0, sdlog = 1)
  predIntLnorm(dat, conf = 0.9)
#> $distribution
#> [1] "Lognormal"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>    meanlog      sdlog 
#> -0.1035722  0.9106429 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "mvue"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Prediction"
#> 
#> $limits
#>       LPL       UPL 
#> 0.1795898 4.5264399 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "exact"
#> 
#> $conf.level
#> [1] 0.9
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> $n.geomean
#> [1] 1
#> 
#> $k
#> [1] 1
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0.1795898
  #                                 UPL = 4.5264399

  #----------

  # Repeat the above example, but do it in two steps.
  # First create a list called est.list containing information about the
  # estimated parameters, then create the prediction interval.

  est.list <- elnorm(dat)

  est.list
#> $distribution
#> [1] "Lognormal"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>    meanlog      sdlog 
#> -0.1035722  0.9106429 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "mvue"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> attr(,"class")
#> [1] "estimate"
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20

  predIntLnorm(est.list, conf = 0.9)
#> $distribution
#> [1] "Lognormal"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>    meanlog      sdlog 
#> -0.1035722  0.9106429 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "mvue"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Prediction"
#> 
#> $limits
#>       LPL       UPL 
#> 0.1795898 4.5264399 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "exact"
#> 
#> $conf.level
#> [1] 0.9
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> $n.geomean
#> [1] 1
#> 
#> $k
#> [1] 1
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"
  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Number of Future Observations:   1
  #
  #Prediction Interval:             LPL = 0.1795898
  #                                 UPL = 4.5264399

  #----------

  # Using the same data from the first example, create a one-sided
  # upper 99% prediction limit for the next 3 geometric means of order 2
  # (i.e., each of the 3 future geometric means is based on a sample size
  # of 2 future observations).

  predIntLnorm(dat, n.geomean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper")
#> $distribution
#> [1] "Lognormal"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>    meanlog      sdlog 
#> -0.1035722  0.9106429 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "mvue"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Prediction"
#> 
#> $limits
#>      LPL      UPL 
#> 0.000000 7.047571 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "Bonferroni"
#> 
#> $conf.level
#> [1] 0.99
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> $n.geomean
#> [1] 2
#> 
#> $k
#> [1] 3
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = -0.1035722
  #                                 sdlog   =  0.9106429
  #
  #Estimation Method:               mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Prediction Interval Method:      Bonferroni
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99%
  #
  #Number of Future
  #Geometric Means:                 3
  #
  #Sample Size for
  #Geometric Means:                 2
  #
  #Prediction Interval:             LPL = 0.000000
  #                                 UPL = 7.047571

  #----------

  # Compare the result above that is based on the Bonferroni method
  # with the exact method

  predIntLnorm(dat, n.geomean = 2, k = 3, conf.level = 0.99,
    pi.type = "upper", method = "exact")$interval$limits["UPL"]
#>     UPL 
#> 7.00316 

  #    UPL
  #7.00316

  #----------

  # Clean up
  rm(dat, est.list)

  #--------------------------------------------------------------------

  # Example 18-2 of USEPA (2009, p.18-15) shows how to construct a 99%
  # upper prediction interval for the log-scale mean of 4 future observations
  # (future mean of order 4) assuming a lognormal distribution based on
  # chrysene concentrations (ppb) in groundwater at 2 background wells.
  # Data were collected once per month over 4 months at the 2 background
  # wells, and also at a compliance well.
  # The question to be answered is whether there is evidence of
  # contamination at the compliance well.

  # Here we will follow the example, but look at the geometric mean
  # instead of the log-scale mean.

  #----------

  # The data for this example are stored in EPA.09.Ex.18.2.chrysene.df.

  EPA.09.Ex.18.2.chrysene.df
#>    Month   Well  Well.type Chrysene.ppb
#> 1      1 Well.1 Background          6.9
#> 2      2 Well.1 Background         27.3
#> 3      3 Well.1 Background         10.8
#> 4      4 Well.1 Background          8.9
#> 5      1 Well.2 Background         15.1
#> 6      2 Well.2 Background          7.2
#> 7      3 Well.2 Background         48.4
#> 8      4 Well.2 Background          7.8
#> 9      1 Well.3 Compliance         68.0
#> 10     2 Well.3 Compliance         48.9
#> 11     3 Well.3 Compliance         30.1
#> 12     4 Well.3 Compliance         38.1

  #   Month   Well  Well.type Chrysene.ppb
  #1      1 Well.1 Background          6.9
  #2      2 Well.1 Background         27.3
  #3      3 Well.1 Background         10.8
  #4      4 Well.1 Background          8.9
  #5      1 Well.2 Background         15.1
  #6      2 Well.2 Background          7.2
  #7      3 Well.2 Background         48.4
  #8      4 Well.2 Background          7.8
  #9      1 Well.3 Compliance         68.0
  #10     2 Well.3 Compliance         48.9
  #11     3 Well.3 Compliance         30.1
  #12     4 Well.3 Compliance         38.1

  Chrysene.bkgd <- with(EPA.09.Ex.18.2.chrysene.df,
    Chrysene.ppb[Well.type == "Background"])
  Chrysene.cmpl <- with(EPA.09.Ex.18.2.chrysene.df,
    Chrysene.ppb[Well.type == "Compliance"])


  #----------

  # A Shapiro-Wilks goodness-of-fit test for normality indicates
  # we should reject the assumption of normality and assume a
  # lognormal distribution for the background well data:

  gofTest(Chrysene.bkgd)
#> $distribution
#> [1] "Normal"
#> 
#> $dist.abb
#> [1] "norm"
#> 
#> $distribution.parameters
#>     mean       sd 
#> 16.55000 14.54441 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $estimation.method
#> [1] "mvue"
#> 
#> $statistic
#>         W 
#> 0.7289006 
#> 
#> $sample.size
#> [1] 8
#> 
#> $parameters
#> n 
#> 8 
#> 
#> $z.value
#> [1] 2.592803
#> 
#> $p.value
#> [1] 0.004759859
#> 
#> $alternative
#> [1] "True cdf does not equal the\n                                 Normal Distribution."
#> 
#> $method
#> [1] "Shapiro-Wilk GOF"
#> 
#> $data
#> [1]  6.9 27.3 10.8  8.9 15.1  7.2 48.4  7.8
#> 
#> $data.name
#> [1] "Chrysene.bkgd"
#> 
#> $bad.obs
#> [1] 0
#> 
#> attr(,"class")
#> [1] "gof"

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Normal
  #
  #Estimated Parameter(s):          mean = 16.55000
  #                                 sd   = 14.54441
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Test Statistic:                  W = 0.7289006
  #
  #Test Statistic Parameter:        n = 8
  #
  #P-value:                         0.004759859
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Normal Distribution.

  gofTest(Chrysene.bkgd, dist = "lnorm")
#> $distribution
#> [1] "Lognormal"
#> 
#> $dist.abb
#> [1] "lnorm"
#> 
#> $distribution.parameters
#>   meanlog     sdlog 
#> 2.5533006 0.7060038 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $estimation.method
#> [1] "mvue"
#> 
#> $statistic
#>         W 
#> 0.8546352 
#> 
#> $sample.size
#> [1] 8
#> 
#> $parameters
#> n 
#> 8 
#> 
#> $z.value
#> [1] 1.247508
#> 
#> $p.value
#> [1] 0.1061057
#> 
#> $alternative
#> [1] "True cdf does not equal the\n                                 Lognormal Distribution."
#> 
#> $method
#> [1] "Shapiro-Wilk GOF"
#> 
#> $data
#> [1]  6.9 27.3 10.8  8.9 15.1  7.2 48.4  7.8
#> 
#> $data.name
#> [1] "Chrysene.bkgd"
#> 
#> $bad.obs
#> [1] 0
#> 
#> attr(,"class")
#> [1] "gof"

  #Results of Goodness-of-Fit Test
  #-------------------------------
  #
  #Test Method:                     Shapiro-Wilk GOF
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5533006
  #                                 sdlog   = 0.7060038
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Test Statistic:                  W = 0.8546352
  #
  #Test Statistic Parameter:        n = 8
  #
  #P-value:                         0.1061057
  #
  #Alternative Hypothesis:          True cdf does not equal the
  #                                 Lognormal Distribution.

  #----------

  # Here is the one-sided 99% upper prediction limit for
  # a geometric mean based on 4 future observations:

  predIntLnorm(Chrysene.bkgd, n.geomean = 4, k = 1,
    conf.level = 0.99, pi.type = "upper")
#> $distribution
#> [1] "Lognormal"
#> 
#> $sample.size
#> [1] 8
#> 
#> $parameters
#>   meanlog     sdlog 
#> 2.5533006 0.7060038 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "mvue"
#> 
#> $data.name
#> [1] "Chrysene.bkgd"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Prediction"
#> 
#> $limits
#>      LPL      UPL 
#>  0.00000 46.96613 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "exact"
#> 
#> $conf.level
#> [1] 0.99
#> 
#> $sample.size
#> [1] 8
#> 
#> $dof
#> [1] 7
#> 
#> $n.geomean
#> [1] 4
#> 
#> $k
#> [1] 1
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Lognormal
  #
  #Estimated Parameter(s):          meanlog = 2.5533006
  #                                 sdlog   = 0.7060038
  #
  #Estimation Method:               mvue
  #
  #Data:                            Chrysene.bkgd
  #
  #Sample Size:                     8
  #
  #Prediction Interval Method:      exact
  #
  #Prediction Interval Type:        upper
  #
  #Confidence Level:                99%
  #
  #Number of Future
  #Geometric Means:                 1
  #
  #Sample Size for
  #Geometric Means:                 4
  #
  #Prediction Interval:             LPL =  0.00000
  #                                 UPL = 46.96613

  UPL <- predIntLnorm(Chrysene.bkgd, n.geomean = 4, k = 1,
    conf.level = 0.99, pi.type = "upper")$interval$limits["UPL"]

  UPL
#>      UPL 
#> 46.96613 
  #     UPL
  #46.96613

  # Is there evidence of contamination at the compliance well?

  geoMean(Chrysene.cmpl)
#> [1] 44.19034
  #[1] 44.19034

  # Since the geometric mean at the compliance well is less than
  # the upper prediction limit, there is no evidence of contamination.

  #----------

  # Cleanup
  #--------

  rm(Chrysene.bkgd, Chrysene.cmpl, UPL)