tolIntNpar.Rd
Construct a \(\beta\)-content or \(\beta\)-expectation tolerance interval nonparametrically without making any assumptions about the form of the distribution except that it is continuous.
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
a scalar between 0 and 1 indicating the desired coverage of the \(\beta\)-content tolerance interval. The default value is coverage=0.95. If cov.type="content", you must supply a value for coverage or a value for conf.level, but not both. If cov.type="expectation", this argument is ignored.
a scalar between 0 and 1 indicating the confidence level associated with the \(\beta\)-content tolerance interval. The default value is conf.level=0.95. If cov.type="content", you must supply a value for coverage or a value for conf.level, but not both. If cov.type="expectation", this argument is ignored.
character string specifying the coverage type for the tolerance interval. The possible values are "content" (\(\beta\)-content; the default) and "expectation" (\(\beta\)-expectation). See the DETAILS section for more information.
positive integer indicating the rank of the order statistic to use for the lower bound of the tolerance interval. If ti.type="two-sided" or ti.type="lower", the default value is ltl.rank=1 (implying the minimum value of x is used as the lower bound of the tolerance interval). If ti.type="upper", this argument is set equal to 0 and the value of lb is used as the lower bound of the tolerance interval.
positive integer related to the rank of the order statistic to use for the upper bound of the tolerance interval. A value of n.plus.one.minus.utl.rank=1 (the default) means use the largest value of x, and in general a value of n.plus.one.minus.utl.rank=\(i\) means use the \(i\)'th largest value. If ti.type="lower", this argument is set equal to 0 and the value of ub is used as the upper bound of the tolerance interval.
scalars indicating lower and upper bounds on the distribution. By default, lb=-Inf and ub=Inf. If you are constructing a tolerance interval for a distribution that you know has a lower bound other than -Inf (e.g., 0), set lb to this value. Similarly, if you know the distribution has an upper bound other than Inf, set ub to this value. The argument lb is ignored if ti.type="two-sided" or ti.type="lower". The argument ub is ignored if ti.type="two-sided" or ti.type="upper".
character string indicating what kind of tolerance interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".
A tolerance interval for some population is an interval on the real line constructed so as to contain \(100 \beta \%\) of the population (i.e., \(100 \beta \%\) of all future observations), where \(0 < \beta < 1\). The quantity \(100 \beta \%\) is called the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
A \(\beta\)-content tolerance interval with confidence level \(100(1-\alpha)\%\) is constructed so that it contains at least \(100 \beta \%\) of the population (i.e., the coverage is at least \(100 \beta \%\)) with probability \(100(1-\alpha)\%\), where \(0 < \alpha < 1\). The quantity \(100(1-\alpha)\%\) is called the confidence level or confidence coefficient associated with the tolerance interval.
A \(\beta\)-expectation tolerance interval is constructed so that the average coverage of the interval is \(100 \beta \%\).
Note: A \(\beta\)-expectation tolerance interval with coverage \(100 \beta \%\) is
equivalent to a prediction interval for one future observation with associated confidence level
\(100 \beta \%\). Note that there is no explicit confidence level associated with a
\(\beta\)-expectation tolerance interval. If a \(\beta\)-expectation tolerance interval is
treated as a \(\beta\)-content tolerance interval, the confidence level associated with this
tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).
The Form of a Nonparametric Tolerance Interval
Let \(\underline{x}\) denote a random sample of \(n\) independent observations
from some continuous distribution and let \(x_{(i)}\) denote the \(i\)'th order
statistic in \(\underline{x}\). A two-sided nonparametric tolerance interval is
constructed as:
$$[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)$$
where \(u\) and \(v\) are positive integers between \(1\) and \(n\), and
\(u < v\). That is, \(u\) denotes the rank of the lower tolerance limit, and
\(v\) denotes the rank of the upper tolerance limit. To make it easier to write
some equations later on, we can also write the tolerance interval (1) in a slightly
different way as:
$$[x_{(u)}, x_{(n+1-w)}] \;\;\;\;\;\; (2)$$
where
$$w = n + 1 - v \;\;\;\;\;\; (3)$$
so that \(w\) is a positive integer between \(1\) and \(n-1\), and \(u < n+1-w\).
In terms of the arguments to the function tolIntNpar, the argument ltl.rank corresponds to \(u\), and the argument n.plus.one.minus.utl.rank corresponds to \(w\).
If we allow \(u=0\) and \(w=0\) and define lower and upper bounds as:
$$x_{(0)} = lb \;\;\;\;\;\; (4)$$
$$x_{(n+1)} = ub \;\;\;\;\;\; (5)$$
then equation (2) above can also represent a one-sided lower or one-sided upper tolerance interval. That is, a one-sided lower nonparametric tolerance interval is constructed as:
$$[x_{(u)}, x_{(n+1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)$$
and a one-sided upper nonparametric tolerance interval is constructed as:
$$[x_{(0)}, x_{(v)}] = [lb, x_{(v)}] \;\;\;\;\;\; (7)$$
Usually, \(lb = -\infty\) or \(lb = 0\) and \(ub = \infty\).
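The construction in equations (2) through (7) can be sketched in base R. The helper name npar_limits below is hypothetical and is not part of the package; it only illustrates how \(u\), \(w\), lb, and ub select the limits, and it should not be read as the code that tolIntNpar runs.

```r
# Hypothetical helper (an assumption for illustration, not an EnvStats function):
# returns [x_(u), x_(n+1-w)], with x_(0) = lb and x_(n+1) = ub.
npar_limits <- function(x, u = 1, w = 1, lb = -Inf, ub = Inf) {
  x <- sort(x[is.finite(x)])   # drop NA, NaN, Inf, -Inf, then order
  n <- length(x)
  stopifnot(u >= 0, w >= 0, u < n + 1 - w)
  padded <- c(lb, x, ub)       # padded[i + 1] holds x_(i) for i = 0, ..., n + 1
  c(LTL = padded[u + 1], UTL = padded[(n + 1 - w) + 1])
}

x <- c(5.4, 9.2, 6.7, NA, 6.1, 7.5)
two_sided <- npar_limits(x)                        # [x_(1), x_(n)]
upper     <- npar_limits(x, u = 0, w = 1, lb = 0)  # [lb, x_(n)]
lower     <- npar_limits(x, u = 1, w = 0)          # [x_(1), ub]
```

Padding the sorted sample with lb and ub lets one indexing rule cover the two-sided and both one-sided cases.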
Let \(C\) be a random variable denoting the coverage of the above nonparametric tolerance intervals. Wilks (1941) showed that, when the unknown distribution is continuous, \(C\) follows a beta distribution with parameters shape1=\(v-u\) and shape2=\(w+u\).
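As an informal check of this result, the following base R sketch simulates the coverage of \([x_{(u)}, x_{(v)}]\) for Uniform(0, 1) samples, where the coverage is simply \(x_{(v)} - x_{(u)}\), and compares it with the stated beta distribution. The sample size, ranks, seed, and number of replicates are arbitrary choices made for illustration.

```r
set.seed(1)
n <- 20; u <- 1; v <- 20
w <- n + 1 - v

# For Uniform(0, 1) data the true coverage of [x_(u), x_(v)] is x_(v) - x_(u).
coverage <- replicate(10000, {
  x <- sort(runif(n))
  x[v] - x[u]
})

sim_mean  <- mean(coverage)
beta_mean <- (v - u) / (n + 1)                    # mean of Beta(v - u, w + u)
sim_q     <- quantile(coverage, c(0.1, 0.5, 0.9))
beta_q    <- qbeta(c(0.1, 0.5, 0.9), shape1 = v - u, shape2 = w + u)
```

The simulated mean and quantiles should lie close to their beta counterparts, up to Monte Carlo error.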
Computations for a \(\beta\)-Content Tolerance Interval
For a \(\beta\)-content tolerance interval, if the coverage \(C = \beta\) is specified,
then the associated confidence level \((1-\alpha)100\%\) is computed as:
$$1 - \alpha = 1 - F(\beta, v-u, w+u) \;\;\;\;\;\; (8)$$
where \(F(y, \delta, \gamma)\) denotes the cumulative distribution function of a beta random variable with parameters shape1=\(\delta\) and shape2=\(\gamma\) evaluated at \(y\).
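Equation (8) can be evaluated directly with pbeta from base R. The sketch below assumes the setting of the first example in this help file (\(n = 20\), default ranks \(u = 1\) and \(w = 1\), coverage 0.9); the variable names are illustrative.

```r
n <- 20
u <- 1            # ltl.rank
w <- 1            # n.plus.one.minus.utl.rank
v <- n + 1 - w    # rank of the upper tolerance limit
beta <- 0.90      # requested coverage

# Equation (8): confidence level = 1 - F(beta, v - u, w + u)
conf_level <- 1 - pbeta(beta, shape1 = v - u, shape2 = w + u)
round(100 * conf_level, 4)   # 60.8253
```

This agrees with the confidence level of 60.8253% shown in the output of that example.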
Similarly, if the confidence level associated with the tolerance interval is specified as
\((1-\alpha)100\%\), then the coverage \(C = \beta\) is computed as:
$$\beta = B(\alpha, v-u, w+u) \;\;\;\;\;\; (9)$$
where \(B(p, \delta, \gamma)\) denotes the \(p\)'th quantile of a beta distribution with parameters shape1=\(\delta\) and shape2=\(\gamma\).
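Equation (9) can likewise be evaluated with qbeta from base R. The sketch below assumes the setting of the copper example in this help file (an upper tolerance interval from \(n = 24\) background observations, so \(u = 0\) and \(w = 1\), with a 95% confidence level); the variable names are illustrative.

```r
n <- 24
u <- 0            # ltl.rank is set to 0 for an upper tolerance interval
w <- 1            # n.plus.one.minus.utl.rank
v <- n + 1 - w    # rank of the upper tolerance limit
alpha <- 0.05     # 1 - confidence level

# Equation (9): coverage = alpha'th quantile of Beta(v - u, w + u)
coverage <- qbeta(alpha, shape1 = v - u, shape2 = w + u)

# With shape2 = 1 the beta CDF is y^(v - u), so the quantile has a closed form.
closed_form <- alpha^(1 / (v - u))
```

Both values are about 0.8826538, matching the coverage of 88.26538% reported in that example.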
Computations for a \(\beta\)-Expectation Tolerance Interval
For a \(\beta\)-expectation tolerance interval, the expected coverage is simply the mean of a beta random variable with parameters shape1=\(v-u\) and shape2=\(w+u\), which is given by:
$$E(C) = \frac{v-u}{n+1} \;\;\;\;\;\; (10)$$
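As a small worked instance of equation (10), the following base R sketch uses the setting of the last example in this help file (upper interval, \(n = 24\), \(u = 0\), \(w = 1\)); the variable names are illustrative.

```r
n <- 24
u <- 0
w <- 1
v <- n + 1 - w

# Equation (10): E(C) = (v - u) / (n + 1)
expected_coverage <- (v - u) / (n + 1)

# Same value from the general beta mean, shape1 / (shape1 + shape2)
beta_mean <- (v - u) / ((v - u) + (w + u))
```

Both expressions give 0.96, the 96% coverage reported for the \(\beta\)-expectation interval in that example.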
As stated above, a \(\beta\)-expectation tolerance interval with coverage \(\beta 100\%\) is equivalent to a prediction interval for one future observation with associated confidence level \(\beta 100\%\). This is because the probability that any single future observation will fall into this interval is \(\beta 100\%\), so the number of observations, out of \(N\) future observations, that fall into this interval follows a binomial distribution with parameters size=\(N\) and prob=\(\beta\). Hence the expected proportion of future observations that fall into this interval is \(\beta 100\%\) and is independent of the value of \(N\).
See the help file for predIntNpar for more information on constructing a nonparametric prediction interval.
A list of class "estimate" containing the estimated parameters, the tolerance interval, and other information. See estimate.object for details.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.
Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT, Chapter 2.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392pp.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources. Elsevier, New York, NY, pp. 88–90.
Krishnamoorthy, K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Wilks, S.S. (1941). Determination of Sample Sizes for Setting Tolerance Limits. Annals of Mathematical Statistics 12, 91–96.
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
# Generate 20 observations from a lognormal mixture distribution
# with parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1.
# The exact two-sided interval that contains 90% of this distribution is given by:
# [0.682312, 13.32052]. Use tolIntNpar to construct a two-sided 90%
# beta-content tolerance interval. Note that the associated confidence level
# is only 61%. A larger sample size is required to obtain a larger confidence
# level (see the help file for tolIntNparN).
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(23)
dat <- rlnormMixAlt(20, 1, 0.5, 5, 1, 0.1)
tolIntNpar(dat, coverage = 0.9)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: None
#>
#> Data: dat
#>
#> Sample Size: 20
#>
#> Tolerance Interval Coverage: 90%
#>
#> Coverage Type: content
#>
#> Tolerance Interval Method: Exact
#>
#> Tolerance Interval Type: two-sided
#>
#> Confidence Level: 60.8253%
#>
#> Tolerance Limit Rank(s): 1 20
#>
#> Tolerance Interval: LTL = 0.5035035
#> UTL = 9.9504662
#>
#----------
# Clean up
rm(dat)
#----------
# Reproduce Example 17-4 on page 17-21 of USEPA (2009). This example uses
# copper concentrations (ppb) from 3 background wells to set an upper
# limit for 2 compliance wells. The maximum value from the 3 wells is set
# to the 95% confidence upper tolerance limit, and we need to determine the
# coverage of this tolerance interval. The data are stored in EPA.92c.copper2.df.
# Note that even though these data are Type I left singly censored, it is still
# possible to compute an upper tolerance interval using any of the uncensored
# observations as the upper limit.
EPA.92c.copper2.df
#> Copper.orig Copper Censored Month Well Well.type
#> 1 <5 5.0 TRUE 1 1 Background
#> 2 <5 5.0 TRUE 2 1 Background
#> 3 7.5 7.5 FALSE 3 1 Background
#> 4 <5 5.0 TRUE 4 1 Background
#> 5 <5 5.0 TRUE 5 1 Background
#> 6 <5 5.0 TRUE 6 1 Background
#> 7 6.4 6.4 FALSE 7 1 Background
#> 8 6 6.0 FALSE 8 1 Background
#> 9 9.2 9.2 FALSE 1 2 Background
#> 10 <5 5.0 TRUE 2 2 Background
#> 11 <5 5.0 TRUE 3 2 Background
#> 12 6.1 6.1 FALSE 4 2 Background
#> 13 8 8.0 FALSE 5 2 Background
#> 14 5.9 5.9 FALSE 6 2 Background
#> 15 <5 5.0 TRUE 7 2 Background
#> 16 <5 5.0 TRUE 8 2 Background
#> 17 <5 5.0 TRUE 1 3 Background
#> 18 5.4 5.4 FALSE 2 3 Background
#> 19 6.7 6.7 FALSE 3 3 Background
#> 20 <5 5.0 TRUE 4 3 Background
#> 21 <5 5.0 TRUE 5 3 Background
#> 22 <5 5.0 TRUE 6 3 Background
#> 23 <5 5.0 TRUE 7 3 Background
#> 24 <5 5.0 TRUE 8 3 Background
#> 25 NA FALSE 1 4 Compliance
#> 26 NA FALSE 2 4 Compliance
#> 27 NA FALSE 3 4 Compliance
#> 28 NA FALSE 4 4 Compliance
#> 29 6.2 6.2 FALSE 5 4 Compliance
#> 30 <5 5.0 TRUE 6 4 Compliance
#> 31 7.8 7.8 FALSE 7 4 Compliance
#> 32 10.4 10.4 FALSE 8 4 Compliance
#> 33 NA FALSE 1 5 Compliance
#> 34 NA FALSE 2 5 Compliance
#> 35 NA FALSE 3 5 Compliance
#> 36 NA FALSE 4 5 Compliance
#> 37 <5 5.0 TRUE 5 5 Compliance
#> 38 <5 5.0 TRUE 6 5 Compliance
#> 39 5.6 5.6 FALSE 7 5 Compliance
#> 40 <5 5.0 TRUE 8 5 Compliance
with(EPA.92c.copper2.df,
tolIntNpar(Copper[Well.type=="Background"],
conf.level = 0.95, lb = 0, ti.type = "upper"))
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: None
#>
#> Data: Copper[Well.type == "Background"]
#>
#> Sample Size: 24
#>
#> Tolerance Interval Coverage: 88.26538%
#>
#> Coverage Type: content
#>
#> Tolerance Interval Method: Exact
#>
#> Tolerance Interval Type: upper
#>
#> Confidence Level: 95%
#>
#> Tolerance Limit Rank(s): 24
#>
#> Tolerance Interval: LTL = 0.0
#> UTL = 9.2
#>
#----------
# Repeat the last example, except compute an upper
# beta-expectation tolerance interval:
with(EPA.92c.copper2.df,
tolIntNpar(Copper[Well.type=="Background"],
cov.type = "expectation", lb = 0, ti.type = "upper"))
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: None
#>
#> Data: Copper[Well.type == "Background"]
#>
#> Sample Size: 24
#>
#> Tolerance Interval Coverage: 96%
#>
#> Coverage Type: expectation
#>
#> Tolerance Interval Method: Exact
#>
#> Tolerance Interval Type: upper
#>
#> Tolerance Limit Rank(s): 24
#>
#> Tolerance Interval: LTL = 0.0
#> UTL = 9.2
#>