distChooseCensored.Rd
Perform a series of goodness-of-fit tests for censored data from a (possibly user-specified) set of candidate probability distributions to determine which probability distribution provides the best fit for a data set.
distChooseCensored(x, censored, censoring.side = "left", alpha = 0.05,
method = "sf", choices = c("norm", "gamma", "lnorm"),
est.arg.list = NULL, prob.method = "hirsch-stedinger",
plot.pos.con = 0.375, warn = TRUE, keep.data = TRUE,
data.name = NULL, censoring.name = NULL)
a numeric vector containing data for the goodness-of-fit tests.
Missing (NA
), undefined (NaN
), and
infinite (Inf
, -Inf
) values are allowed but will be
removed.
numeric or logical vector indicating which values of x
are censored.
This must be the same length as x
. If the mode of censored
is
"logical"
, TRUE
values correspond to elements of x
that
are censored, and FALSE
values correspond to elements of x
that
are not censored. If the mode of censored
is "numeric"
,
it must contain only 1
's and 0
's; 1
corresponds to
TRUE
and 0
corresponds to FALSE
. Missing (NA
)
values are allowed but will be removed.
character string indicating on which side the censoring occurs. The possible
values are "left"
(the default) and "right"
.
numeric scalar between 0 and 1 specifying the Type I error associated with each
goodness-of-fit test. When method="proucl"
the only allowed values for
alpha
are 0.01, 0.05, and 0.1. The default value is alpha=0.05
.
character string defining which method to use. Possible values are:
"sw"
. Shapiro-Wilk.
"sf"
. Shapiro-Francia; the default.
"ppcc"
. Probability Plot Correlation Coefficient.
"proucl"
. ProUCL.
The Shapiro-Wilk method is only available for singly censored data.
See the DETAILS section for more information.
a character vector denoting the distribution abbreviations of the candidate
distributions. See the help file for Distribution.df
for a list
of distributions and their abbreviations.
The default value is choices=c("norm", "gamma", "lnorm")
,
indicating the Normal, Gamma, and Lognormal distributions.
This argument is ignored when method="proucl"
.
a list containing one or more lists of arguments to be passed to the
function(s) estimating the distribution parameters. The name(s) of
the components of the list must be equal to or a subset of the values of the
argument choices
. For example, if
choices=c("norm", "gammaAlt", "lnormAlt")
, setting est.arg.list=list(lnormAlt=list(method="bcmle"))
indicates
using the bias-corrected maximum-likelihood estimators (see the help file for
elnormAltCensored
).
See the section Estimating Distribution Parameters in the help file
EnvStats Functions for Censored Data for a list of
available estimating functions for censored data.
The default value is est.arg.list=NULL
so that all default values for the
estimating functions are used.
In the course of testing for some form of normality (i.e., Normal, Lognormal),
the estimated parameters are saved in the test.results
component of the returned object, but the choice of the method of estimation
has no effect on the goodness-of-fit test statistic or p-value.
This argument is ignored when method="proucl"
.
character string indicating what method to use to compute the plotting positions
(empirical probabilities) when test="sf"
or test="ppcc"
.
Possible values are:
"modified kaplan-meier"
(modification of product-limit method of
Kaplan and Meier (1958))
"nelson"
(hazard plotting method of Nelson (1972))
"michael-schucany"
(generalization of the product-limit method due to
Michael and Schucany (1986))
"hirsch-stedinger"
(generalization of the product-limit method due to
Hirsch and Stedinger (1987))
The default value is prob.method="hirsch-stedinger"
.
The "nelson"
method is only available for censoring.side="right"
, and
the "modified kaplan-meier"
method is only available for
censoring.side="left"
. See the help files for
gofTestCensored
and ppointsCensored
for more information.
numeric scalar between 0 and 1 containing the value of the plotting position
constant to use when test="sf"
or test="ppcc"
. The default value is plot.pos.con=0.375
. See the help files for gofTestCensored
and
ppointsCensored
for more information.
logical scalar indicating whether to print a warning message when
observations with NA
s, NaN
s, or Inf
s in
y
are removed. The default value is warn=TRUE
.
logical scalar indicating whether to return the original data. The
default value is keep.data=TRUE
.
optional character string indicating the name of the data used for argument x
.
optional character string indicating the name for the data used for argument censored
.
The function distChooseCensored
returns a list with information on the goodness-of-fit
tests for various distributions and which distribution appears to best fit the
data based on the p-values from the goodness-of-fit tests. This function was written in
order to compare ProUCL's way of choosing the best-fitting distribution (USEPA, 2015) with
other ways of choosing the best-fitting distribution.
Method Based on Shapiro-Wilk, Shapiro-Francia, or Probability Plot Correlation Test
(method="sw"
, method="sf"
, or method="ppcc"
)
For each value of the argument choices
, the function distChooseCensored
runs the goodness-of-fit test using the data in x
assuming that particular
distribution. For example, if choices=c("norm", "gamma", "lnorm")
,
indicating the Normal, Gamma, and Lognormal distributions, and
method="sf"
, then the usual Shapiro-Francia test is performed for the Normal
and Lognormal distributions, and the extension of the Shapiro-Francia test is performed
for the Gamma distribution (see the section
Testing Goodness-of-Fit for Any Continuous Distribution in the help
file for gofTestCensored
for an explanation of the latter). The distribution associated
with the largest p-value is the chosen distribution. In the case when all p-values are
less than the value of the argument alpha
, the distribution “Nonparametric” is chosen.
Method Based on ProUCL Algorithm (method="proucl"
)
When method="proucl"
, the function distChooseCensored
uses the
algorithm that ProUCL (USEPA, 2015) uses to determine the best fitting
distribution. The candidate distributions are the
Normal, Gamma, and Lognormal distributions. The algorithm
used by ProUCL is as follows:
Remove all censored observations and use only the uncensored observations.
Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the Normal distribution,
i.e., call the function gofTest
with distribution="norm", test="sw"
and distribution = "norm", test="lillie"
.
If either or both of the associated p-values are greater than or equal to the user-supplied value
of alpha
, then choose the Normal distribution. Otherwise, proceed to the next step.
Perform the “ProUCL Anderson-Darling” and
“ProUCL Kolmogorov-Smirnov” goodness-of-fit tests for the Gamma distribution,
i.e., call the function gofTest
with distribution="gamma", test="proucl.ad.gamma"
and distribution="gamma", test="proucl.ks.gamma"
.
If either or both of the associated p-values are greater than or equal to the user-supplied value
of alpha
, then choose the Gamma distribution. Otherwise, proceed to the next step.
Perform the Shapiro-Wilk and Lilliefors goodness-of-fit tests for the
Lognormal distribution, i.e., call the function gofTest
with distribution = "lnorm", test="sw"
and distribution = "lnorm", test="lillie"
.
If either or both of the associated p-values are greater than or equal to the user-supplied value
of alpha
, then choose the Lognormal distribution. Otherwise, proceed to the next step.
If none of the goodness-of-fit tests above yields a p-value greater than or equal to the user-supplied value
of alpha
, then choose the “Nonparametric” distribution.
a list of class "distChooseCensored"
containing the results of the goodness-of-fit tests.
Objects of class "distChooseCensored"
have a special printing method.
See the help file for distChooseCensored.object
for details.
Birnbaum, Z.W., and F.H. Tingey. (1951). One-Sided Confidence Contours for Probability Distribution Functions. Annals of Mathematical Statistics 22, 592-596.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Dallal, G.E., and L. Wilkinson. (1986). An Analytic Approximation to the Distribution of Lilliefor's Test for Normality. The American Statistician 40, 294-296.
D'Agostino, R.B. (1970). Transformation to Normality of the Null Distribution of \(g1\). Biometrika 57, 679-681.
D'Agostino, R.B. (1971). An Omnibus Test of Normality for Moderate and Large Size Samples. Biometrika 58, 341-348.
D'Agostino, R.B. (1986b). Tests for the Normal Distribution. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York.
D'Agostino, R.B., and E.S. Pearson (1973). Tests for Departures from Normality. Empirical Results for the Distributions of \(b2\) and \(\sqrt{b1}\). Biometrika 60(3), 613-622.
D'Agostino, R.B., and G.L. Tietjen (1973). Approaches to the Null Distribution of \(\sqrt{b1}\). Biometrika 60(1), 169-173.
Fisher, R.A. (1950). Statistical Methods for Research Workers. 11'th Edition. Hafner Publishing Company, New York, pp.99-100.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Kendall, M.G., and A. Stuart. (1991). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Fifth Edition. Oxford University Press, New York.
Kim, P.J., and R.I. Jennrich. (1973). Tables of the Exact Sampling Distribution of the Two Sample Kolmogorov-Smirnov Criterion. In Harter, H.L., and D.B. Owen, eds. Selected Tables in Mathematical Statistics, Vol. 1. American Mathematical Society, Providence, Rhode Island, pp.79-170.
Kolmogorov, A.N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell' Istituto Italiano degle Attuari 4, 83-91.
Marsaglia, G., W.W. Tsang, and J. Wang. (2003). Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8(18). doi:10.18637/jss.v008.i18 .
Moore, D.S. (1986). Tests of Chi-Squared Type. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, pp.63-95.
Pomeranz, J. (1973). Exact Cumulative Distribution of the Kolmogorov-Smirnov Statistic for Small Samples (Algorithm 487). Collected Algorithms from ACM ??, ???-???.
Royston, J.P. (1992a). Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing 2, 117-119.
Royston, J.P. (1992b). Estimation, Reference Ranges and Goodness of Fit for the Three-Parameter Log-Normal Distribution. Statistics in Medicine 11, 897-912.
Royston, J.P. (1992c). A Pocket-Calculator Algorithm for the Shapiro-Francia Test of Non-Normality: An Application to Medicine. Statistics in Medicine 12, 181-184.
Royston, P. (1993). A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician 42, 37-43.
Ryan, T., and B. Joiner. (1973). Normal Probability Plots and Tests for Normality. Technical Report, Pennsylvannia State University, Department of Statistics.
Shapiro, S.S., and R.S. Francia. (1972). An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association 67(337), 215-219.
Shapiro, S.S., and M.B. Wilk. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika 52, 591-611.
Smirnov, N.V. (1939). Estimate of Deviation Between Empirical Distribution Functions in Two Independent Samples. Bulletin Moscow University 2(2), 3-16.
Smirnov, N.V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. Annals of Mathematical Statistics 19, 279-281.
Stephens, M.A. (1970). Use of the Kolmogorov-Smirnov, Cramer-von Mises and Related Statistics Without Extensive Tables. Journal of the Royal Statistical Society, Series B, 32, 115-122.
Stephens, M.A. (1986a). Tests Based on EDF Statistics. In D'Agostino, R. B., and M.A. Stevens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York.
USEPA. (2015). ProUCL Version 5.1.002 Technical Guide. EPA/600/R-07/041, October 2015. Office of Research and Development. U.S. Environmental Protection Agency, Washington, D.C.
Verrill, S., and R.A. Johnson. (1987). The Asymptotic Equivalence of Some Modified Shapiro-Wilk Statistics – Complete and Censored Sample Cases. The Annals of Statistics 15(1), 413-419.
Verrill, S., and R.A. Johnson. (1988). Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality. Journal of the American Statistical Association 83, 1192-1197.
Weisberg, S., and C. Bingham. (1975). An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics 17, 133-134.
Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
In practice, almost any goodness-of-fit test will not reject the null hypothesis
if the number of observations is relatively small. Conversely, almost any goodness-of-fit
test will reject the null hypothesis if the number of observations is very large,
since “real” data are never distributed according to any theoretical distribution
(Conover, 1980, p.367). For most cases, however, the distribution of “real” data
is close enough to some theoretical distribution that fairly accurate results may be
provided by assuming that particular theoretical distribution. One way to asses the
goodness of the fit is to use goodness-of-fit tests. Another way is to look at
quantile-quantile (Q-Q) plots (see qqPlotCensored
).
# Generate 30 observations from a gamma distribution with
# parameters mean=10 and cv=1 and then censor observations less than 5.
# Then:
#
# 1) Call distChooseCensored using the Shapiro-Wilk method and specify
# choices of the
# normal,
# gamma (alternative parameterzation), and
# lognormal (alternative parameterization)
# distributions.
#
# 2) Compare the results in 1) above with the results using the
# ProUCL method.
#
# Notes: The call to set.seed lets you reproduce this example.
#
# The ProUCL method chooses the Normal distribution, whereas the
# Shapiro-Wilk method chooses the Gamma distribution.
set.seed(598)
dat <- sort(rgammaAlt(30, mean = 10, cv = 1))
dat
#> [1] 0.5313509 1.4741833 1.9936208 2.7980636 3.4509840 3.7987348
#> [7] 4.5542952 5.5207531 5.5253596 5.7177872 5.7513827 9.1086375
#> [13] 9.8444090 10.6247123 10.9304922 11.7925398 13.3432689 13.9562777
#> [19] 14.6029065 15.0563342 15.8730642 16.0039936 16.6910715 17.0288922
#> [25] 17.8507891 19.1105522 20.2657141 26.3815970 30.2912797 42.8726101
# [1] 0.5313509 1.4741833 1.9936208 2.7980636 3.4509840
# [6] 3.7987348 4.5542952 5.5207531 5.5253596 5.7177872
#[11] 5.7513827 9.1086375 9.8444090 10.6247123 10.9304922
#[16] 11.7925398 13.3432689 13.9562777 14.6029065 15.0563342
#[21] 15.8730642 16.0039936 16.6910715 17.0288922 17.8507891
#[26] 19.1105522 20.2657141 26.3815970 30.2912797 42.8726101
dat.censored <- dat
censored <- dat.censored < 5
dat.censored[censored] <- 5
# 1) Call distChooseCensored using the Shapiro-Wilk method.
#----------------------------------------------------------
distChooseCensored(dat.censored, censored, method = "sw",
choices = c("norm", "gammaAlt", "lnormAlt"))
#> $choices
#> [1] "Normal" "Gamma" "Lognormal"
#>
#> $method
#> [1] "Shapiro-Wilk"
#>
#> $decision
#> [1] "Gamma"
#>
#> $alpha
#> [1] 0.05
#>
#> $distribution.parameters
#> mean cv
#> 12.4911448 0.7617343
#>
#> $estimation.method
#> [1] "MLE"
#>
#> $sample.size
#> [1] 30
#>
#> $censoring.side
#> [1] "left"
#>
#> $censoring.levels
#> [1] 5
#>
#> $percent.censored
#> [1] 23.33333
#>
#> $test.results
#> $test.results$norm
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#> (Singly Censored Data)
#>
#> Hypothesized Distribution: Normal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 5
#>
#> Estimated Parameter(s): mean = 11.44780
#> sd = 10.64297
#>
#> Estimation Method: MLE
#>
#> Data: dat.censored
#>
#> Censoring Variable: censored
#>
#> Sample Size: 30
#>
#> Percent Censored: 23.33333%
#>
#> Test Statistic: W = 0.9372741
#>
#> Test Statistic Parameters: N = 30.0000000
#> DELTA = 0.2333333
#>
#> P-value: 0.1704876
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$gammaAlt
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#> (Singly Censored Data)
#> Based on Chen & Balakrisnan (1995)
#>
#> Hypothesized Distribution: Gamma
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 5
#>
#> Estimated Parameter(s): mean = 12.4911448
#> cv = 0.7617343
#>
#> Estimation Method: MLE
#>
#> Data: dat.censored
#>
#> Censoring Variable: censored
#>
#> Sample Size: 30
#>
#> Percent Censored: 23.33333%
#>
#> Test Statistic: W = 0.9613711
#>
#> Test Statistic Parameters: N = 30.0000000
#> DELTA = 0.2333333
#>
#> P-value: 0.522329
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$lnormAlt
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#> (Singly Censored Data)
#>
#> Hypothesized Distribution: Lognormal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 5
#>
#> Estimated Parameter(s): mean = 13.0382221
#> cv = 0.9129512
#>
#> Estimation Method: MLE
#>
#> Data: dat.censored
#>
#> Censoring Variable: censored
#>
#> Sample Size: 30
#>
#> Percent Censored: 23.33333%
#>
#> Test Statistic: W = 0.9292406
#>
#> Test Statistic Parameters: N = 30.0000000
#> DELTA = 0.2333333
#>
#> P-value: 0.114511
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#>
#> $data
#> [1] 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
#> [8] 5.520753 5.525360 5.717787 5.751383 9.108638 9.844409 10.624712
#> [15] 10.930492 11.792540 13.343269 13.956278 14.602906 15.056334 15.873064
#> [22] 16.003994 16.691071 17.028892 17.850789 19.110552 20.265714 26.381597
#> [29] 30.291280 42.872610
#>
#> $censored
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE FALSE FALSE FALSE FALSE
#>
#> $data.name
#> [1] "dat.censored"
#>
#> $censoring.name
#> [1] "censored"
#>
#> attr(,"class")
#> [1] "distChooseCensored"
#Results of Choosing Distribution
#--------------------------------
#
#Candidate Distributions: Normal
# Gamma
# Lognormal
#
#Choice Method: Shapiro-Wilk
#
#Type I Error per Test: 0.05
#
#Decision: Gamma
#
#Estimated Parameter(s): mean = 12.4911448
# cv = 0.7617343
#
#Estimation Method: MLE
#
#Data: dat.censored
#
#Sample Size: 30
#
#Censoring Side: left
#
#Censoring Variable: censored
#
#Censoring Level(s): 5
#
#Percent Censored: 23.33333%
#
#Test Results:
#
# Normal
# Test Statistic: W = 0.9372741
# P-value: 0.1704876
#
# Gamma
# Test Statistic: W = 0.9613711
# P-value: 0.522329
#
# Lognormal
# Test Statistic: W = 0.9292406
# P-value: 0.114511
#--------------------
# 2) Compare the results in 1) above with the results using the
# ProUCL method.
#---------------------------------------------------------------
distChooseCensored(dat.censored, censored, method = "proucl")
#> $choices
#> [1] "Normal" "Gamma" "Lognormal"
#>
#> $method
#> [1] "ProUCL"
#>
#> $decision
#> [1] "Normal"
#>
#> $alpha
#> [1] 0.05
#>
#> $distribution.parameters
#> mean sd
#> 15.397584 8.688302
#>
#> $estimation.method
#> [1] "mvue"
#>
#> $sample.size
#> [1] 30
#>
#> $censoring.side
#> [1] "left"
#>
#> $censoring.levels
#> [1] 5
#>
#> $percent.censored
#> [1] 23.33333
#>
#> $proucl.sample.size
#> [1] 23
#>
#> $test.results
#> $test.results$norm
#> $test.results$norm$`Shapiro-Wilk GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#>
#> Hypothesized Distribution: Normal
#>
#> Estimated Parameter(s): mean = 15.397584
#> sd = 8.688302
#>
#> Estimation Method: mvue
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: W = 0.861652
#>
#> Test Statistic Parameter: n = 23
#>
#> P-value: 0.004457924
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$norm$`Lilliefors (Kolmogorov-Smirnov) GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Lilliefors (Kolmogorov-Smirnov) GOF
#>
#> Hypothesized Distribution: Normal
#>
#> Estimated Parameter(s): mean = 15.397584
#> sd = 8.688302
#>
#> Estimation Method: mvue
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: D = 0.1714435
#>
#> Test Statistic Parameter: n = 23
#>
#> P-value: 0.07794315
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$norm$text
#> [1] "Data appear Approximate Normal at 5% Significance Level"
#>
#>
#> $test.results$gamma
#> $test.results$gamma$`ProUCL Anderson-Darling Gamma GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: ProUCL Anderson-Darling Gamma GOF
#>
#> Hypothesized Distribution: Gamma
#>
#> Estimated Parameter(s): shape = 3.789306
#> scale = 4.063431
#>
#> Estimation Method: MLE
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: A = 0.3805556
#>
#> Test Statistic Parameter: n = 23
#>
#> Critical Values: A.0.01 = 1.024
#> A.0.05 = 0.750
#> A.0.10 = 0.631
#>
#> P-value: >= 0.10
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$gamma$`ProUCL Kolmogorov-Smirnov Gamma GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: ProUCL Kolmogorov-Smirnov Gamma GOF
#>
#> Hypothesized Distribution: Gamma
#>
#> Estimated Parameter(s): shape = 3.789306
#> scale = 4.063431
#>
#> Estimation Method: MLE
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: D = 0.1035271
#>
#> Test Statistic Parameter: n = 23
#>
#> Critical Values: D.0.01 = 0.213
#> D.0.05 = 0.183
#> D.0.10 = 0.168
#>
#> P-value: >= 0.10
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$gamma$text
#> [1] "Data appear Gamma Distributed at 5% Significance Level"
#>
#>
#> $test.results$lnorm
#> $test.results$lnorm$`Shapiro-Wilk GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#>
#> Hypothesized Distribution: Lognormal
#>
#> Estimated Parameter(s): meanlog = 2.5964958
#> sdlog = 0.5398728
#>
#> Estimation Method: mvue
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: W = 0.9532604
#>
#> Test Statistic Parameter: n = 23
#>
#> P-value: 0.3414187
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#> $test.results$lnorm$`Lilliefors (Kolmogorov-Smirnov) GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Lilliefors (Kolmogorov-Smirnov) GOF
#>
#> Hypothesized Distribution: Lognormal
#>
#> Estimated Parameter(s): meanlog = 2.5964958
#> sdlog = 0.5398728
#>
#> Estimation Method: mvue
#>
#> Data: dat.censored
#>
#> Sample Size: 23
#>
#> Test Statistic: D = 0.115588
#>
#> Test Statistic Parameter: n = 23
#>
#> P-value: 0.5899259
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#> $test.results$lnorm$text
#> [1] "Data appear Lognormal at 5% Significance Level"
#>
#>
#>
#> $<NA>
#> NULL
#>
#> $censoring.name
#> [1] "censored"
#>
#> $data
#> [1] 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
#> [8] 5.520753 5.525360 5.717787 5.751383 9.108638 9.844409 10.624712
#> [15] 10.930492 11.792540 13.343269 13.956278 14.602906 15.056334 15.873064
#> [22] 16.003994 16.691071 17.028892 17.850789 19.110552 20.265714 26.381597
#> [29] 30.291280 42.872610
#>
#> $censored
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE FALSE FALSE FALSE FALSE
#>
#> $data.name
#> [1] "dat.censored"
#>
#> attr(,"class")
#> [1] "distChooseCensored"
#Results of Choosing Distribution
#--------------------------------
#
#Candidate Distributions: Normal
# Gamma
# Lognormal
#
#Choice Method: ProUCL
#
#Type I Error per Test: 0.05
#
#Decision: Normal
#
#Estimated Parameter(s): mean = 15.397584
# sd = 8.688302
#
#Estimation Method: mvue
#
#Data: dat.censored
#
#Sample Size: 30
#
#Censoring Side: left
#
#Censoring Variable: censored
#
#Censoring Level(s): 5
#
#Percent Censored: 23.33333%
#
#ProUCL Sample Size: 23
#
#Test Results:
#
# Normal
# Shapiro-Wilk GOF
# Test Statistic: W = 0.861652
# P-value: 0.004457924
# Lilliefors (Kolmogorov-Smirnov) GOF
# Test Statistic: D = 0.1714435
# P-value: 0.07794315
#
# Gamma
# ProUCL Anderson-Darling Gamma GOF
# Test Statistic: A = 0.3805556
# P-value: >= 0.10
# ProUCL Kolmogorov-Smirnov Gamma GOF
# Test Statistic: D = 0.1035271
# P-value: >= 0.10
#
# Lognormal
# Shapiro-Wilk GOF
# Test Statistic: W = 0.9532604
# P-value: 0.3414187
# Lilliefors (Kolmogorov-Smirnov) GOF
# Test Statistic: D = 0.115588
# P-value: 0.5899259
#--------------------
# Clean up
#---------
rm(dat, censored, dat.censored)
#====================================================================
# Check the assumption that the silver data stored in Helsel.Cohn.88.silver.df
# follows a lognormal distribution.
# Note that the small p-value and the shape of the Q-Q plot
# (an inverted S-shape) suggests that the log transformation is not quite strong
# enough to "bring in" the tails (i.e., the log-transformed silver data has tails
# that are slightly too long relative to a normal distribution).
# Helsel and Cohn (1988, p.2002) note that the gross outlier of 560 mg/L tends to
# make the shape of the data resemble a gamma distribution, but
# the distChooseCensored function decision is neither Gamma nor Lognormal,
# but instead Nonparametric.
# First create a lognormal Q-Q plot
#----------------------------------
dev.new()
with(Helsel.Cohn.88.silver.df,
qqPlotCensored(Ag, Censored, distribution = "lnorm",
points.col = "blue", add.line = TRUE))
#----------
# Now call the distChooseCensored function using the default settings.
#---------------------------------------------------------------------
with(Helsel.Cohn.88.silver.df,
distChooseCensored(Ag, Censored))
#> $choices
#> [1] "Normal" "Gamma" "Lognormal"
#>
#> $method
#> [1] "Shapiro-Francia"
#>
#> $decision
#> [1] "Nonparametric"
#>
#> $alpha
#> [1] 0.05
#>
#> $distribution.parameters
#> NULL
#>
#> $estimation.method
#> NULL
#>
#> $sample.size
#> [1] 56
#>
#> $censoring.side
#> [1] "left"
#>
#> $censoring.levels
#> [1] 0.1 0.2 0.3 0.5 1.0 2.0 2.5 5.0 6.0 10.0 20.0 25.0
#>
#> $percent.censored
#> [1] 60.71429
#>
#> $test.results
#> $test.results$norm
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Francia GOF
#> (Multiply Censored Data)
#>
#> Hypothesized Distribution: Normal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 0.1 0.2 0.3 0.5 1.0 2.0 2.5 5.0 6.0 10.0 20.0 25.0
#>
#> Estimated Parameter(s): mean = -64.18403
#> sd = 127.22776
#>
#> Estimation Method: MLE
#>
#> Data: Ag
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 56
#>
#> Percent Censored: 60.71429%
#>
#> Test Statistic: W = 0.3065529
#>
#> Test Statistic Parameters: N = 56.0000000
#> DELTA = 0.6071429
#>
#> P-value: 8.346126e-08
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$gamma
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Francia GOF
#> (Multiply Censored Data)
#> Based on Chen & Balakrisnan (1995)
#>
#> Hypothesized Distribution: Gamma
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 0.1 0.2 0.3 0.5 1.0 2.0 2.5 5.0 6.0 10.0 20.0 25.0
#>
#> Estimated Parameter(s): shape = 0.1045252
#> scale = 121.0024944
#>
#> Estimation Method: MLE
#>
#> Data: Ag
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 56
#>
#> Percent Censored: 60.71429%
#>
#> Test Statistic: W = 0.6254148
#>
#> Test Statistic Parameters: N = 56.0000000
#> DELTA = 0.6071429
#>
#> P-value: 1.884155e-05
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$lnorm
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: Shapiro-Francia GOF
#> (Multiply Censored Data)
#>
#> Hypothesized Distribution: Lognormal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 0.1 0.2 0.3 0.5 1.0 2.0 2.5 5.0 6.0 10.0 20.0 25.0
#>
#> Estimated Parameter(s): meanlog = -1.040572
#> sdlog = 2.354847
#>
#> Estimation Method: MLE
#>
#> Data: Ag
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 56
#>
#> Percent Censored: 60.71429%
#>
#> Test Statistic: W = 0.8957198
#>
#> Test Statistic Parameters: N = 56.0000000
#> DELTA = 0.6071429
#>
#> P-value: 0.03490314
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#>
#> $data
#> [1] 0.8 0.1 2.0 0.2 5.0 1.0 2.0 25.0 2.7 1.0 1.2 3.2
#> [13] 20.0 10.0 5.0 0.1 10.0 1.0 2.0 5.0 560.0 0.2 20.0 1.0
#> [25] 1.0 10.0 10.0 5.0 0.5 1.4 0.2 6.0 1.0 10.0 0.1 5.0
#> [37] 2.0 1.0 1.0 4.4 90.0 20.0 0.3 2.5 10.0 0.7 1.0 1.5
#> [49] 1.0 0.2 2.0 0.2 1.0 1.0 1.0 0.1
#>
#> $censored
#> [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
#> [13] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
#> [25] FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
#> [37] FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE
#> [49] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#>
#> $data.name
#> [1] "Ag"
#>
#> $censoring.name
#> [1] "Censored"
#>
#> attr(,"class")
#> [1] "distChooseCensored"
#Results of Choosing Distribution
#--------------------------------
#
#Candidate Distributions: Normal
# Gamma
# Lognormal
#
#Choice Method: Shapiro-Francia
#
#Type I Error per Test: 0.05
#
#Decision: Nonparametric
#
#Data: Ag
#
#Sample Size: 56
#
#Censoring Side: left
#
#Censoring Variable: Censored
#
#Censoring Level(s): 0.1 0.2 0.3 0.5 1.0 2.0 2.5 5.0 6.0 10.0 20.0 25.0
#
#Percent Censored: 60.71429%
#
#Test Results:
#
# Normal
# Test Statistic: W = 0.3065529
# P-value: 8.346126e-08
#
# Gamma
# Test Statistic: W = 0.6254148
# P-value: 1.884155e-05
#
# Lognormal
# Test Statistic: W = 0.8957198
# P-value: 0.03490314
#----------
# Clean up
#---------
graphics.off()
#====================================================================
# Chapter 15 of USEPA (2009) gives several examples of looking
# at normal Q-Q plots and estimating the mean and standard deviation
# for manganese concentrations (ppb) in groundwater at five background
# wells (USEPA, 2009, p. 15-10). The Q-Q plot shown in Figure 15-4
# on page 15-13 clearly indicates that the Lognormal distribution
# is a good fit for these data.
# In EnvStats these data are stored in the data frame EPA.09.Ex.15.1.manganese.df.
# Here we will call the distChooseCensored function to determine
# whether the data appear to come from a normal, gamma, or lognormal
# distribution.
#
# Note that using the Probability Plot Correlation Coefficient method
# (equivalent to using the Shapiro-Francia method) yields a decision of
# Lognormal, but using the ProUCL method yields a decision of Gamma.
#----------------------------------------------------------------------
# First look at the data:
#-----------------------
EPA.09.Ex.15.1.manganese.df
#> Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#> 1 1 Well.1 <5 5.0 TRUE
#> 2 2 Well.1 12.1 12.1 FALSE
#> 3 3 Well.1 16.9 16.9 FALSE
#> 4 4 Well.1 21.6 21.6 FALSE
#> 5 5 Well.1 <2 2.0 TRUE
#> 6 1 Well.2 <5 5.0 TRUE
#> 7 2 Well.2 7.7 7.7 FALSE
#> 8 3 Well.2 53.6 53.6 FALSE
#> 9 4 Well.2 9.5 9.5 FALSE
#> 10 5 Well.2 45.9 45.9 FALSE
#> 11 1 Well.3 <5 5.0 TRUE
#> 12 2 Well.3 5.3 5.3 FALSE
#> 13 3 Well.3 12.6 12.6 FALSE
#> 14 4 Well.3 106.3 106.3 FALSE
#> 15 5 Well.3 34.5 34.5 FALSE
#> 16 1 Well.4 6.3 6.3 FALSE
#> 17 2 Well.4 11.9 11.9 FALSE
#> 18 3 Well.4 10 10.0 FALSE
#> 19 4 Well.4 <2 2.0 TRUE
#> 20 5 Well.4 77.2 77.2 FALSE
#> 21 1 Well.5 17.9 17.9 FALSE
#> 22 2 Well.5 22.7 22.7 FALSE
#> 23 3 Well.5 3.3 3.3 FALSE
#> 24 4 Well.5 8.4 8.4 FALSE
#> 25 5 Well.5 <2 2.0 TRUE
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#...
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
longToWide(EPA.09.Ex.15.1.manganese.df,
"Manganese.Orig.ppb", "Sample", "Well",
paste.row.name = TRUE)
#> Well.1 Well.2 Well.3 Well.4 Well.5
#> Sample.1 <5 <5 <5 6.3 17.9
#> Sample.2 12.1 7.7 5.3 11.9 22.7
#> Sample.3 16.9 53.6 12.6 10 3.3
#> Sample.4 21.6 9.5 106.3 <2 8.4
#> Sample.5 <2 45.9 34.5 77.2 <2
# Well.1 Well.2 Well.3 Well.4 Well.5
#Sample.1 <5 <5 <5 6.3 17.9
#Sample.2 12.1 7.7 5.3 11.9 22.7
#Sample.3 16.9 53.6 12.6 10 3.3
#Sample.4 21.6 9.5 106.3 <2 8.4
#Sample.5 <2 45.9 34.5 77.2 <2
# Use distChooseCensored with the probability plot correlation method,
# and for the gamma and lognormal distribution specify the
# mean and CV parameterization:
#------------------------------------------------------------
with(EPA.09.Ex.15.1.manganese.df,
distChooseCensored(Manganese.ppb, Censored,
choices = c("norm", "gamma", "lnormAlt"), method = "ppcc"))
#> $choices
#> [1] "Normal" "Gamma" "Lognormal"
#>
#> $method
#> [1] "PPCC"
#>
#> $decision
#> [1] "Lognormal"
#>
#> $alpha
#> [1] 0.05
#>
#> $distribution.parameters
#> mean cv
#> 23.003987 2.300772
#>
#> $estimation.method
#> [1] "MLE"
#>
#> $sample.size
#> [1] 25
#>
#> $censoring.side
#> [1] "left"
#>
#> $censoring.levels
#> [1] 2 5
#>
#> $percent.censored
#> [1] 24
#>
#> $test.results
#> $test.results$norm
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: PPCC GOF
#> (Multiply Censored Data)
#>
#> Hypothesized Distribution: Normal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 2 5
#>
#> Estimated Parameter(s): mean = 15.23508
#> sd = 30.62812
#>
#> Estimation Method: MLE
#>
#> Data: Manganese.ppb
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 25
#>
#> Percent Censored: 24%
#>
#> Test Statistic: r = 0.9147686
#>
#> Test Statistic Parameters: N = 25.00
#> DELTA = 0.24
#>
#> P-value: 0.004662658
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$gamma
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: PPCC GOF
#> (Multiply Censored Data)
#> Based on Chen & Balakrisnan (1995)
#>
#> Hypothesized Distribution: Gamma
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 2 5
#>
#> Estimated Parameter(s): shape = 0.6370043
#> scale = 30.8707540
#>
#> Estimation Method: MLE
#>
#> Data: Manganese.ppb
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 25
#>
#> Percent Censored: 24%
#>
#> Test Statistic: r = 0.9844875
#>
#> Test Statistic Parameters: N = 25.00
#> DELTA = 0.24
#>
#> P-value: 0.6836625
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$lnormAlt
#>
#> Results of Goodness-of-Fit Test
#> Based on Type I Censored Data
#> -------------------------------
#>
#> Test Method: PPCC GOF
#> (Multiply Censored Data)
#>
#> Hypothesized Distribution: Lognormal
#>
#> Censoring Side: left
#>
#> Censoring Level(s): 2 5
#>
#> Estimated Parameter(s): mean = 23.003987
#> cv = 2.300772
#>
#> Estimation Method: MLE
#>
#> Data: Manganese.ppb
#>
#> Censoring Variable: Censored
#>
#> Sample Size: 25
#>
#> Percent Censored: 24%
#>
#> Test Statistic: r = 0.9931982
#>
#> Test Statistic Parameters: N = 25.00
#> DELTA = 0.24
#>
#> P-value: 0.9767731
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#>
#> $data
#> [1] 5.0 12.1 16.9 21.6 2.0 5.0 7.7 53.6 9.5 45.9 5.0 5.3
#> [13] 12.6 106.3 34.5 6.3 11.9 10.0 2.0 77.2 17.9 22.7 3.3 8.4
#> [25] 2.0
#>
#> $censored
#> [1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
#> [25] TRUE
#>
#> $data.name
#> [1] "Manganese.ppb"
#>
#> $censoring.name
#> [1] "Censored"
#>
#> attr(,"class")
#> [1] "distChooseCensored"
#Results of Choosing Distribution
#--------------------------------
#
#Candidate Distributions: Normal
# Gamma
# Lognormal
#
#Choice Method: PPCC
#
#Type I Error per Test: 0.05
#
#Decision: Lognormal
#
#Estimated Parameter(s): mean = 23.003987
# cv = 2.300772
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Sample Size: 25
#
#Censoring Side: left
#
#Censoring Variable: Censored
#
#Censoring Level(s): 2 5
#
#Percent Censored: 24%
#
#Test Results:
#
# Normal
# Test Statistic: r = 0.9147686
# P-value: 0.004662658
#
# Gamma
# Test Statistic: r = 0.9844875
# P-value: 0.6836625
#
# Lognormal
# Test Statistic: r = 0.9931982
# P-value: 0.9767731
#--------------------
# Repeat the above example using the ProUCL method.
#--------------------------------------------------
with(EPA.09.Ex.15.1.manganese.df,
distChooseCensored(Manganese.ppb, Censored, method = "proucl"))
#> $choices
#> [1] "Normal" "Gamma" "Lognormal"
#>
#> $method
#> [1] "ProUCL"
#>
#> $decision
#> [1] "Gamma"
#>
#> $alpha
#> [1] 0.05
#>
#> $distribution.parameters
#> shape scale
#> 1.284882 19.813413
#>
#> $estimation.method
#> [1] "MLE"
#>
#> $sample.size
#> [1] 25
#>
#> $censoring.side
#> [1] "left"
#>
#> $censoring.levels
#> [1] 2 5
#>
#> $percent.censored
#> [1] 24
#>
#> $proucl.sample.size
#> [1] 19
#>
#> $test.results
#> $test.results$norm
#> $test.results$norm$`Shapiro-Wilk GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#>
#> Hypothesized Distribution: Normal
#>
#> Estimated Parameter(s): mean = 25.45789
#> sd = 27.43577
#>
#> Estimation Method: mvue
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: W = 0.7423947
#>
#> Test Statistic Parameter: n = 19
#>
#> P-value: 0.0001862975
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$norm$`Lilliefors (Kolmogorov-Smirnov) GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Lilliefors (Kolmogorov-Smirnov) GOF
#>
#> Hypothesized Distribution: Normal
#>
#> Estimated Parameter(s): mean = 25.45789
#> sd = 27.43577
#>
#> Estimation Method: mvue
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: D = 0.2768771
#>
#> Test Statistic Parameter: n = 19
#>
#> P-value: 0.0004771155
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Normal Distribution.
#>
#> $test.results$norm$text
#> [1] "Data Not Normal at 5% Significance Level"
#>
#>
#> $test.results$gamma
#> $test.results$gamma$`ProUCL Anderson-Darling Gamma GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: ProUCL Anderson-Darling Gamma GOF
#>
#> Hypothesized Distribution: Gamma
#>
#> Estimated Parameter(s): shape = 1.284882
#> scale = 19.813413
#>
#> Estimation Method: MLE
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: A = 0.6857121
#>
#> Test Statistic Parameter: n = 19
#>
#> Critical Values: A.0.01 = 1.050
#> A.0.05 = 0.764
#> A.0.10 = 0.642
#>
#> P-value: 0.05 <= p < 0.10
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$gamma$`ProUCL Kolmogorov-Smirnov Gamma GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: ProUCL Kolmogorov-Smirnov Gamma GOF
#>
#> Hypothesized Distribution: Gamma
#>
#> Estimated Parameter(s): shape = 1.284882
#> scale = 19.813413
#>
#> Estimation Method: MLE
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: D = 0.1830034
#>
#> Test Statistic Parameter: n = 19
#>
#> Critical Values: D.0.01 = 0.237
#> D.0.05 = 0.203
#> D.0.10 = 0.186
#>
#> P-value: >= 0.10
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Gamma Distribution.
#>
#> $test.results$gamma$text
#> [1] "Data appear Gamma Distributed at 5% Significance Level"
#>
#>
#> $test.results$lnorm
#> $test.results$lnorm$`Shapiro-Wilk GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Shapiro-Wilk GOF
#>
#> Hypothesized Distribution: Lognormal
#>
#> Estimated Parameter(s): meanlog = 2.7998821
#> sdlog = 0.9335277
#>
#> Estimation Method: mvue
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: W = 0.969805
#>
#> Test Statistic Parameter: n = 19
#>
#> P-value: 0.7725528
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#> $test.results$lnorm$`Lilliefors (Kolmogorov-Smirnov) GOF`
#>
#> Results of Goodness-of-Fit Test
#> -------------------------------
#>
#> Test Method: Lilliefors (Kolmogorov-Smirnov) GOF
#>
#> Hypothesized Distribution: Lognormal
#>
#> Estimated Parameter(s): meanlog = 2.7998821
#> sdlog = 0.9335277
#>
#> Estimation Method: mvue
#>
#> Data: Manganese.ppb
#>
#> Sample Size: 19
#>
#> Test Statistic: D = 0.138547
#>
#> Test Statistic Parameter: n = 19
#>
#> P-value: 0.4385195
#>
#> Alternative Hypothesis: True cdf does not equal the
#> Lognormal Distribution.
#>
#> $test.results$lnorm$text
#> [1] "Data appear Lognormal at 5% Significance Level"
#>
#>
#>
#> $<NA>
#> NULL
#>
#> $censoring.name
#> [1] "Censored"
#>
#> $data
#> [1] 5.0 12.1 16.9 21.6 2.0 5.0 7.7 53.6 9.5 45.9 5.0 5.3
#> [13] 12.6 106.3 34.5 6.3 11.9 10.0 2.0 77.2 17.9 22.7 3.3 8.4
#> [25] 2.0
#>
#> $censored
#> [1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
#> [25] TRUE
#>
#> $data.name
#> [1] "Manganese.ppb"
#>
#> attr(,"class")
#> [1] "distChooseCensored"
#Results of Choosing Distribution
#--------------------------------
#
#Candidate Distributions: Normal
# Gamma
# Lognormal
#
#Choice Method: ProUCL
#
#Type I Error per Test: 0.05
#
#Decision: Gamma
#
#Estimated Parameter(s): shape = 1.284882
# scale = 19.813413
#
#Estimation Method: MLE
#
#Data: Manganese.ppb
#
#Sample Size: 25
#
#Censoring Side: left
#
#Censoring Variable: Censored
#
#Censoring Level(s): 2 5
#
#Percent Censored: 24%
#
#ProUCL Sample Size: 19
#
#Test Results:
#
# Normal
# Shapiro-Wilk GOF
# Test Statistic: W = 0.7423947
# P-value: 0.0001862975
# Lilliefors (Kolmogorov-Smirnov) GOF
# Test Statistic: D = 0.2768771
# P-value: 0.0004771155
#
# Gamma
# ProUCL Anderson-Darling Gamma GOF
# Test Statistic: A = 0.6857121
# P-value: 0.05 <= p < 0.10
# ProUCL Kolmogorov-Smirnov Gamma GOF
# Test Statistic: D = 0.1830034
# P-value: >= 0.10
#
# Lognormal
# Shapiro-Wilk GOF
# Test Statistic: W = 0.969805
# P-value: 0.7725528
# Lilliefors (Kolmogorov-Smirnov) GOF
# Test Statistic: D = 0.138547
# P-value: 0.4385195
#====================================================================
if (FALSE) { # \dontrun{
# 1) Simulate 1000 trials where for each trial you:
# a) Generate 30 observations from a Gamma distribution with
# parameters mean = 10 and CV = 1.
# b) Censor observations less than 5 (the 39th percentile).
# c) Use distChooseCensored with the Shapiro-Francia method.
# d) Use distChooseCensored with the ProUCL method.
#
# 2) Compare the proportion of times the
# Normal vs. Gamma vs. Lognormal vs. Nonparametric distribution
# is chosen for c) and d) above.
#------------------------------------------------------------------
set.seed(58)
N <- 1000
Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
for(i in 1:N) {
dat <- rgammaAlt(30, mean = 10, cv = 1)
censored <- dat < 5
dat[censored] <- 5
Choose.df[i, "SW"] <- distChooseCensored(dat, censored, method = "sw")$decision
Choose.df[i, "ProUCL"] <- distChooseCensored(dat, censored, method = "proucl")$decision
}
summaryStats(Choose.df, digits = 0)
# ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
#Normal 520 52 398 40
#Gamma 336 34 375 38
#Lognormal 105 10 221 22
#Nonparametric 39 4 6 1
#Combined 1000 100 1000 100
#--------------------
# Repeat above example for the Lognormal Distribution with mean=10 and CV = 1.
# In this case, 5 is the 34th percentile.
#-----------------------------------------------------------------------------
set.seed(297)
N <- 1000
Choose.fac <- factor(rep("", N), levels = c("Normal", "Gamma", "Lognormal", "Nonparametric"))
Choose.df <- data.frame(SW = Choose.fac, ProUCL = Choose.fac)
for(i in 1:N) {
dat <- rlnormAlt(30, mean = 10, cv = 1)
censored <- dat < 5
dat[censored] <- 5
Choose.df[i, "SW"] <- distChooseCensored(dat, censored, method = "sf")$decision
Choose.df[i, "ProUCL"] <- distChooseCensored(dat, censored, method = "proucl")$decision
}
summaryStats(Choose.df, digits = 0)
# ProUCL(N) ProUCL(Pct) SW(N) SW(Pct)
#Normal 277 28 92 9
#Gamma 393 39 231 23
#Lognormal 190 19 624 62
#Nonparametric 140 14 53 5
#Combined 1000 100 1000 100
#--------------------
# Clean up
#---------
rm(N, Choose.fac, Choose.df, i, dat, censored)
} # }