Tolerance Interval for a Normal Distribution Based on Censored Data

Construct a \(\beta\)-content or \(\beta\)-expectation tolerance interval for a normal distribution based on Type I or Type II censored data.

tolIntNormCensored(x, censored, censoring.side = "left", coverage = 0.95, 
    cov.type = "content", ti.type = "two-sided", conf.level = 0.95, 
    method = "mle", ti.method = "exact.for.complete", seed = NULL, 
    nmc = 1000)

Arguments

x: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
censored: numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored. If the mode of censored is "numeric", it must contain only 1's and 0's; 1 corresponds to TRUE and 0 corresponds to FALSE. Missing (NA) values are allowed but will be removed.
censoring.side: character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".
coverage: a scalar between 0 and 1 indicating the desired coverage of the tolerance interval. The default value is coverage=0.95.
cov.type: character string specifying the coverage type for the tolerance interval. The possible values are "content" (\(\beta\)-content; the default), and "expectation" (\(\beta\)-expectation). See the DETAILS section for more information.
ti.type: character string indicating what kind of tolerance interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".
conf.level: a scalar between 0 and 1 indicating the confidence level associated with the tolerance interval. The default value is conf.level=0.95.
method: character string indicating the method to use for parameter estimation.

For singly censored data, possible values are "mle" (the default), "bcmle", "qq.reg", "qq.reg.w.cen.level", "impute.w.qq.reg", "impute.w.qq.reg.w.cen.level", "impute.w.mle", "iterative.impute.w.qq.reg", "m.est", and "half.cen.level". See the help file for enormCensored for details.

For multiply censored data, possible values are "mle" (the default), "qq.reg", "impute.w.qq.reg", and "half.cen.level". See the help file for enormCensored for details.
ti.method: character string specifying the method for constructing the tolerance interval. Possible values are:
"exact.for.complete" (the default),
"gpq" (Generalized Pivotal Quantity), and
"wald.wolfowitz.for.complete" (only available for a two-sided tolerance interval, i.e., when ti.type="two-sided").
See the DETAILS section for more information.
seed: for the case when ti.method="gpq", a positive integer to pass to the function gpqTolIntNormSinglyCensored or gpqTolIntNormMultiplyCensored. This argument is ignored if seed=NULL (the default). Using the seed argument lets you reproduce the exact same result if all other arguments stay the same.
nmc: for the case when ti.method="gpq", a positive integer \(\ge 10\) indicating the number of Monte Carlo trials to run in order to compute the GPQ(s).
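
As a minimal sketch of how the x and censored arguments fit together, the base R code below builds both the logical and the numeric form of the censoring indicator. The measurements and the detection limit are hypothetical values chosen only for illustration, and the names raw and detection.limit are not part of the package.

```r
# Hypothetical raw measurements and detection limit (illustrative values only).
raw <- c(6.4, 7.1, 9.4, 9.5, 10.0, 11.4, 12.5)
detection.limit <- 9

# Logical form: TRUE marks a left-censored (below-limit) observation.
censored.lgl <- raw < detection.limit

# Numeric form: 1 corresponds to TRUE and 0 corresponds to FALSE.
censored.num <- as.numeric(censored.lgl)

# Censored observations are recorded at the censoring level, as in the
# Examples section (dat[censored] <- 9).
x <- ifelse(censored.lgl, detection.limit, raw)

# Both vectors must have the same length as x.
stopifnot(length(x) == length(censored.lgl),
          length(x) == length(censored.num),
          all(censored.num %in% c(0, 1)))
```

Either censored.lgl or censored.num could then be supplied as the censored argument alongside x.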

Details

See the help file for tolIntNorm for an explanation of tolerance intervals. When ti.method="gpq", the tolerance interval is constructed using the method of Generalized Pivotal Quantities as explained in Krishnamoorthy and Mathew (2009, p. 327). When ti.method="exact.for.complete" or ti.method="wald.wolfowitz.for.complete", the tolerance interval is constructed by first computing the maximum likelihood estimates of the mean and standard deviation by calling enormCensored, then passing these values to the function tolIntNorm to produce the tolerance interval as if the estimates were based on complete rather than censored data. These last two methods are purely ad-hoc and their properties need to be studied.
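
To illustrate the "as if complete data" step, here is a minimal base R sketch of a one-sided upper \(\beta\)-content tolerance limit computed from a plugged-in mean and standard deviation. It assumes the standard noncentral t formula for the one-sided normal tolerance factor K; the variable names are illustrative, and the estimates are copied from the first example in the Examples section rather than recomputed. This is a sketch of the idea under those assumptions, not the package's internal code.

```r
# Estimates and sample size copied from the first example on this page.
est.mean   <- 9.700962
est.sd     <- 1.845067
n          <- 20
coverage   <- 0.90
conf.level <- 0.95

# One-sided tolerance factor K for complete normal data, based on the
# noncentral t distribution with n - 1 degrees of freedom.
ncp <- qnorm(coverage) * sqrt(n)
K   <- qt(conf.level, df = n - 1, ncp = ncp) / sqrt(n)

# Upper tolerance limit, treating the censored-data estimates as if they
# came from a complete sample of size n.
utl <- est.mean + K * est.sd
utl
```

With these inputs the result should land close to the upper limit of about 13.25 reported in the example output. Because the estimates come from censored data, the nominal confidence level is only approximate.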

Value

A list of class "estimateCensored" containing the estimated parameters, the tolerance interval, and other information. See estimateCensored.object for details.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.

Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.

Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.

Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.

Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.

Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.

Krishnamoorthy, K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.

Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.

Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.

Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery, Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).

Examples

  # Generate 20 observations from a normal distribution with parameters 
  # mean=10 and sd=3, censor the observations less than 9, 
  # then create a one-sided upper tolerance interval with 90% 
  # coverage and 95% confidence based on these Type I left, singly 
  # censored data. 
  # (Note: the call to set.seed allows you to reproduce this example.)

  set.seed(250) 
  dat <- sort(rnorm(20, mean = 10, sd = 3))
  dat
#>  [1]  6.406313  7.126621  8.119660  8.277216  8.426941  8.847961  8.899098
#>  [8]  9.357509  9.525756  9.534858  9.558567  9.847663 10.001989 10.014964
#> [15] 10.841384 11.386264 11.721850 12.524300 12.602469 12.813429

  censored <- dat < 9
  dat[censored] <- 9
 
  tolIntNormCensored(dat, censored, coverage = 0.9, ti.type="upper")
#> $distribution
#> [1] "Normal"
#> 
#> $sample.size
#> [1] 20
#> 
#> $censoring.side
#> [1] "left"
#> 
#> $censoring.levels
#> [1] 9
#> 
#> $percent.censored
#> [1] 35
#> 
#> $parameters
#>     mean       sd 
#> 9.700962 1.845067 
#> 
#> $n.param.est
#> [1] 2
#> 
#> $method
#> [1] "MLE"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $censoring.name
#> [1] "censored"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Tolerance"
#> 
#> $coverage
#> [1] 0.9
#> 
#> $coverage.type
#> [1] "content"
#> 
#> $limits
#>      LTL      UTL 
#>     -Inf 13.25454 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "Exact for\n                                 Complete Data"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> attr(,"class")
#> [1] "intervalEstimateCensored"
#> 
#> attr(,"class")
#> [1] "estimateCensored"

  #Results of Distribution Parameter Estimation
  #Based on Type I Censored Data
  #--------------------------------------------
  #
  #Assumed Distribution:            Normal
  #
  #Censoring Side:                  left
  #
  #Censoring Level(s):              9 
  #
  #Estimated Parameter(s):          mean = 9.700962
  #                                 sd   = 1.845067
  #
  #Estimation Method:               MLE
  #
  #Data:                            dat
  #
  #Censoring Variable:              censored
  #
  #Sample Size:                     20
  #
  #Percent Censored:                35%
  #
  #Assumed Sample Size:             20
  #
  #Tolerance Interval Coverage:     90%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Exact for
  #                                 Complete Data
  #
  #Tolerance Interval Type:         upper
  #
  #Confidence Level:                95%
  #
  #Tolerance Interval:              LTL =     -Inf
  #                                 UTL = 13.25454
if (FALSE) { # \dontrun{

  # Note:  The true 90th percentile is 13.84465
  #---------------------------------------------
  qnorm(0.9, mean = 10, sd = 3)
  # [1] 13.84465

  # Compare the result using the method "gpq"
  tolIntNormCensored(dat, censored, coverage = 0.9, ti.type="upper", 
    ti.method = "gpq", seed = 432)$interval$limits
  #     LTL      UTL 
  #    -Inf 13.56826 

  # Clean Up
  #---------
  rm(dat, censored)

  #==========

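  # Example 15-1 of USEPA (2009, p. 15-10) shows how to estimate 
  # the mean and standard deviation using log-transformed multiply 
  # left-censored manganese concentration data.  Here we'll construct a 
  # one-sided upper tolerance interval with 90% coverage and 95% 
  # confidence for the log-transformed data.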

  EPA.09.Ex.15.1.manganese.df
  #    Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  # 1       1 Well.1                 <5           5.0     TRUE
  # 2       2 Well.1               12.1          12.1    FALSE
  # 3       3 Well.1               16.9          16.9    FALSE
  # ...
  # 23      3 Well.5                3.3           3.3    FALSE
  # 24      4 Well.5                8.4           8.4    FALSE
  # 25      5 Well.5                 <2           2.0     TRUE

  with(EPA.09.Ex.15.1.manganese.df, 
    tolIntNormCensored(log(Manganese.ppb), Censored, coverage = 0.9, 
      ti.type = "upper"))

  # Results of Distribution Parameter Estimation
  # Based on Type I Censored Data
  # --------------------------------------------
  #
  # Assumed Distribution:            Normal
  #
  # Censoring Side:                  left
  #
  # Censoring Level(s):              0.6931472 1.6094379 
  #
  # Estimated Parameter(s):          mean = 2.215905
  #                                  sd   = 1.356291
  #
  # Estimation Method:               MLE
  #
  # Data:                            log(Manganese.ppb)
  #
  # Censoring Variable:              censored
  #
  # Sample Size:                     25
  #
  # Percent Censored:                24%
  #
  # Assumed Sample Size:             25
  #
  # Tolerance Interval Coverage:     90%
  #
  # Coverage Type:                   content
  #
  # Tolerance Interval Method:       Exact for
  #                                  Complete Data
  #
  # Tolerance Interval Type:         upper
  #
  # Confidence Level:                95%
  #
  # Tolerance Interval:              LTL =     -Inf
  #                                  UTL = 4.708904
  } # }