ebinom.Rd
Estimate \(p\) (the probability of “success”) for a binomial distribution, and optionally construct a confidence interval for \(p\).
ebinom(x, size = NULL, method = "mle/mme/mvue", ci = FALSE,
ci.type = "two-sided", ci.method = "score", correct = TRUE,
var.denom = "n", conf.level = 0.95, warn = TRUE)
numeric or logical vector of observations. When size is not supplied, x must be a numeric vector of 0s (“failures”) and 1s (“successes”), or else a logical vector of FALSE values (“failures”) and TRUE values (“successes”). When size is supplied, x must be a non-negative integer containing the number of “successes” out of the number of trials indicated by size. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
positive integer indicating the number of trials; size must be at least as large as the value of x.
character string specifying the method of estimation. The only possible value is "mle/mme/mvue" (maximum likelihood, method of moments, and minimum variance unbiased). See the DETAILS section for more information.
logical scalar indicating whether to compute a confidence interval for \(p\). The default value is ci=FALSE.
character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.
character string indicating which method to use to construct the confidence interval. Possible values are "score" (the default), "exact", "adjusted Wald", and "Wald". This argument is ignored if ci=FALSE.
logical scalar indicating whether to use the continuity correction when ci.method="score" or ci.method="Wald". The default value is correct=TRUE.
character string indicating what value to use in the denominator of the variance estimator when ci.method="Wald". Possible values are "n" (the default) and "n-1". This argument is ignored if ci=FALSE.
a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.
a logical scalar indicating whether to issue a warning when ci=TRUE, ci.method="Wald", and any of the following conditions is true: the estimated proportion is less than 0.2, the estimated proportion is greater than 0.8, or the number of successes or failures is less than 5. The default value is warn=TRUE.
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
If \(\underline{x}\) is a vector of \(n\) observations from a binomial distribution with parameters size=\(1\) and prob=\(p\), then the sum of all the values in \(\underline{x}\) is an observation from a binomial distribution with parameters size=\(n\) and prob=\(p\).
If \(x\) is an observation from a binomial distribution with parameters size=\(n\) and prob=\(p\), the maximum likelihood estimator (mle), method of moments estimator (mme), and minimum variance unbiased estimator (mvue) of \(p\) is simply \(x/n\).
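The estimator can be reproduced in a few lines of base R. The following is only a sketch, not code from the package; the data values and the variable names (x, n, successes, p.hat) are made up for illustration.

```r
# Illustrative 0/1 data: 2 "successes" in 20 Bernoulli trials (made-up values).
x <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)

# Drop NA, NaN, Inf and -Inf, mirroring the removal step described above.
x <- x[is.finite(x)]

n <- length(x)          # number of trials
successes <- sum(x)     # one observation from Binomial(size = n, prob = p)
p.hat <- successes / n  # the mle, mme and mvue of p

# The same estimate is the sample mean of the 0/1 vector.
stopifnot(isTRUE(all.equal(p.hat, mean(x))))
p.hat
```

Supplying the 0/1 vector and supplying the count together with the number of trials are therefore two routes to the same estimate.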
Confidence Intervals.
ci.method="score"
The confidence interval for \(p\) based on the score method was developed by Wilson (1927) and is discussed by Newcombe (1998a), Agresti and Coull (1998), and Agresti and Caffo (2000). When ci=TRUE and ci.method="score", the function ebinom calls the R function prop.test to compute the confidence interval. This method has been shown to provide the best performance (in terms of actual coverage matching assumed coverage) of all the methods provided here, although unlike the exact method, the actual coverage can fall below the assumed coverage.
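Because the score interval is delegated to prop.test, it can be checked directly in base R. The sketch below assumes 7 successes in 20 trials, the same count that appears in the examples on this page; the closed-form Wilson expression is included only as a cross-check of the uncorrected interval, and the variable names are illustrative.

```r
x <- 7   # number of successes (7 of 20, as in the examples on this page)
n <- 20  # number of trials

# Score (Wilson) interval with continuity correction, via stats::prop.test
score.cc <- prop.test(x, n, conf.level = 0.95, correct = TRUE)$conf.int

# The same interval without the continuity correction
score.nocc <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int

# Closed-form Wilson limits (no continuity correction) for comparison
z <- qnorm(0.975)
p.hat <- x / n
centre <- (p.hat + z^2 / (2 * n)) / (1 + z^2 / n)
half <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

as.numeric(score.cc)
as.numeric(score.nocc)
wilson
```

The uncorrected prop.test limits and the closed-form limits should agree to numerical precision; the continuity-corrected interval is slightly wider.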
ci.method="exact"
The confidence interval for \(p\) based on the exact (Clopper-Pearson) method is discussed by Newcombe (1998a), Agresti and Coull (1998), and Zar (2010, pp.543-547). This is the method used in the R function binom.test. This method ensures the actual coverage is greater than or equal to the assumed coverage.
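As a rough illustration in base R, the Clopper-Pearson limits can be obtained either from binom.test or from beta quantiles. The sketch assumes 7 successes in 20 trials (the count used in the examples on this page) and 0 < x < n; the variable names are illustrative.

```r
x <- 7        # number of successes
n <- 20       # number of trials
alpha <- 0.05 # 1 - confidence level

# Exact interval as returned by stats::binom.test
exact <- binom.test(x, n, conf.level = 1 - alpha)$conf.int

# Clopper-Pearson limits written as beta quantiles (valid for 0 < x < n)
lcl <- qbeta(alpha / 2, x, n - x + 1)
ucl <- qbeta(1 - alpha / 2, x + 1, n - x)

as.numeric(exact)
c(LCL = lcl, UCL = ucl)
```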
ci.method="Wald"
The confidence interval for \(p\) based on the Wald method (with or without a correction for continuity) is the usual “normal approximation” method and is discussed by Newcombe (1998a), Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This method is never recommended but is included for historical purposes.
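A minimal base R sketch of the Wald interval follows. It assumes the variance denominator is n and that the continuity correction adds 1/(2n) to the half-width, which reproduces the limits shown in the examples on this page for 7 successes in 20 trials. The helper wald.ci is hypothetical and is not a function of the package.

```r
# Hypothetical helper: Wald ("normal approximation") interval for a proportion.
# Assumes variance denominator n and a continuity correction of 1 / (2 * n).
wald.ci <- function(x, n, conf.level = 0.95, correct = TRUE) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p.hat <- x / n
  half <- z * sqrt(p.hat * (1 - p.hat) / n)
  if (correct) half <- half + 1 / (2 * n)
  c(LCL = p.hat - half, UCL = p.hat + half)
}

wald.cc <- wald.ci(7, 20, correct = TRUE)
wald.nocc <- wald.ci(7, 20, correct = FALSE)
wald.cc
wald.nocc
```

Nothing in this sketch keeps the limits inside [0, 1], which is one reason the method behaves poorly for proportions near 0 or 1.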
ci.method="adjusted Wald"
The confidence interval for \(p\) based on the adjusted Wald method is discussed by Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This is a simple modification of the Wald method and performs surprisingly well.
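The adjustment can be sketched in base R as follows, assuming the Agresti and Coull (1998) form in which z^2/2 pseudo-successes and z^2/2 pseudo-failures are added before applying the Wald formula. This form reproduces the adjusted Wald limits shown in the examples on this page for 7 successes in 20 trials. The helper adj.wald.ci is hypothetical and is not a function of the package.

```r
# Hypothetical helper: adjusted Wald interval (Agresti-Coull form, assumed here).
adj.wald.ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  n.adj <- n + z^2              # adjusted number of trials
  p.adj <- (x + z^2 / 2) / n.adj # adjusted proportion
  half <- z * sqrt(p.adj * (1 - p.adj) / n.adj)
  c(LCL = p.adj - half, UCL = p.adj + half)
}

adj <- adj.wald.ci(7, 20)
adj
```

At the 95% level z^2 is close to 4, which is where the "add two successes and two failures" rule of Agresti and Caffo (2000) comes from.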
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, Heads-Tails, etc.) outcomes. It is assumed that the outcome of any one trial is independent of any other trial, and that the probability of “success”, \(p\), is the same on each trial. A binomial discrete random variable \(X\) is the number of “successes” in \(n\) independent trials. A special case of the binomial distribution occurs when \(n=1\), in which case \(X\) is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model the proportion of times a chemical concentration exceeds a set standard in a given period of time (e.g., Gilbert, 1987, p.143). The binomial distribution is also used to compute an upper bound on the overall Type I error rate for deciding whether a facility or location is in compliance with some set standard. Assume the null hypothesis is that the facility is in compliance. Suppose that a test of hypothesis is conducted periodically over time to test compliance and/or several tests are performed during each time period, that the facility or location is always in compliance, that each single test has a Type I error rate of \(\alpha\), and that the result of each test is independent of the result of any other test (usually not a reasonable assumption). Then the number of times the facility is declared out of compliance when in fact it is in compliance is a binomial random variable whose probability of “success”, \(p=\alpha\), is the probability of being declared out of compliance (see USEPA, 2009).
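Under the independence assumption stated above, the overall Type I error rate follows directly from the binomial distribution. The base R sketch below uses made-up values for the per-test error rate and the number of tests; the variable names are illustrative.

```r
alpha <- 0.05  # Type I error rate of each single test (illustrative value)
k <- 12        # number of independent tests (illustrative value)

# The number of false "out of compliance" declarations is
# Binomial(size = k, prob = alpha). Probability of at least one:
p.any <- 1 - pbinom(0, size = k, prob = alpha)

# Closed form of the same probability: 1 - (1 - alpha)^k
stopifnot(isTRUE(all.equal(p.any, 1 - (1 - alpha)^k)))

# Expected number of false declarations over the k tests
expected.false <- k * alpha

p.any
expected.false
```

The probability of at least one false declaration grows quickly with the number of tests, which is why the per-test rate is usually set well below the desired overall rate.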
# Generate 20 observations from a binomial distribution with
# parameters size=1 and prob=0.2, then estimate the 'prob' parameter.
# (Note: the call to set.seed simply allows you to reproduce this
# example. Also, the only parameter estimated is 'prob'; 'size' is
# specified in the call to ebinom. The parameter 'size' is printed
# in order to show all of the parameters associated with the
# distribution.)
set.seed(251)
dat <- rbinom(20, size = 1, prob = 0.2)
ebinom(dat)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Binomial
#>
#> Estimated Parameter(s): size = 20.0
#> prob = 0.1
#>
#> Estimation Method: mle/mme/mvue for 'prob'
#>
#> Data: dat
#>
#> Sample Size: 20
#>
#----------------------------------------------------------------
# Generate one observation from a binomial distribution with
# parameters size=20 and prob=0.2, then estimate the "prob"
# parameter and compute a confidence interval:
set.seed(763)
dat <- rbinom(1, size=20, prob=0.2)
ebinom(dat, size = 20, ci = TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Binomial
#>
#> Estimated Parameter(s): size = 20.00
#> prob = 0.35
#>
#> Estimation Method: mle/mme/mvue for 'prob'
#>
#> Data: dat
#>
#> Sample Size: 20
#>
#> Confidence Interval for: prob
#>
#> Confidence Interval Method: Score normal approximation
#> (With continuity correction)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 0.1630867
#> UCL = 0.5905104
#>
#----------------------------------------------------------------
# Using the data from the last example, compare confidence
# intervals based on the various methods
ebinom(dat, size = 20, ci = TRUE,
ci.method = "score", correct = TRUE)$interval$limits
#> LCL UCL
#> 0.1630867 0.5905104
ebinom(dat, size = 20, ci = TRUE,
ci.method = "score", correct = FALSE)$interval$limits
#> LCL UCL
#> 0.1811918 0.5671457
ebinom(dat, size = 20, ci = TRUE,
ci.method = "exact")$interval$limits
#> LCL UCL
#> 0.1539092 0.5921885
ebinom(dat, size = 20, ci = TRUE,
ci.method = "adjusted Wald")$interval$limits
#> LCL UCL
#> 0.1799264 0.5684112
ebinom(dat, size = 20, ci = TRUE,
ci.method = "Wald", correct = TRUE)$interval$limits
#> LCL UCL
#> 0.1159627 0.5840373
ebinom(dat, size = 20, ci = TRUE,
ci.method = "Wald", correct = FALSE)$interval$limits
#> LCL UCL
#> 0.1409627 0.5590373
#----------------------------------------------------------------
# Use the cadmium data on page 8-6 of USEPA (1989b) to compute
# two-sided 95% confidence intervals for the probability of
# detection at background and compliance wells. The data are
# stored in EPA.89b.cadmium.df.
EPA.89b.cadmium.df
#> Cadmium.orig Cadmium Censored Well.type
#> 1 0.1 0.100 FALSE Background
#> 2 0.12 0.120 FALSE Background
#> 3 BDL 0.000 TRUE Background
#> 4 0.26 0.260 FALSE Background
#> 5 BDL 0.000 TRUE Background
#> 6 0.1 0.100 FALSE Background
#> 7 BDL 0.000 TRUE Background
#> 8 0.014 0.014 FALSE Background
#> 9 BDL 0.000 TRUE Background
#> 10 BDL 0.000 TRUE Background
#> 11 BDL 0.000 TRUE Background
#> 12 BDL 0.000 TRUE Background
#> 13 BDL 0.000 TRUE Background
#> 14 0.12 0.120 FALSE Background
#> 15 BDL 0.000 TRUE Background
#> 16 0.21 0.210 FALSE Background
#> 17 BDL 0.000 TRUE Background
#> 18 0.12 0.120 FALSE Background
#> 19 BDL 0.000 TRUE Background
#> 20 BDL 0.000 TRUE Background
#> 21 BDL 0.000 TRUE Background
#> 22 BDL 0.000 TRUE Background
#> 23 BDL 0.000 TRUE Background
#> 24 BDL 0.000 TRUE Background
#> 25 0.12 0.120 FALSE Compliance
#> 26 0.08 0.080 FALSE Compliance
#> 27 BDL 0.000 TRUE Compliance
#> 28 0.2 0.200 FALSE Compliance
#> 29 BDL 0.000 TRUE Compliance
#> 30 0.1 0.100 FALSE Compliance
#> 31 BDL 0.000 TRUE Compliance
#> 32 0.012 0.012 FALSE Compliance
#> 33 BDL 0.000 TRUE Compliance
#> 34 BDL 0.000 TRUE Compliance
#> 35 BDL 0.000 TRUE Compliance
#> 36 BDL 0.000 TRUE Compliance
#> 37 BDL 0.000 TRUE Compliance
#> 38 0.12 0.120 FALSE Compliance
#> 39 0.07 0.070 FALSE Compliance
#> 40 BDL 0.000 TRUE Compliance
#> 41 0.19 0.190 FALSE Compliance
#> 42 BDL 0.000 TRUE Compliance
#> 43 0.1 0.100 FALSE Compliance
#> 44 BDL 0.000 TRUE Compliance
#> 45 0.01 0.010 FALSE Compliance
#> 46 BDL 0.000 TRUE Compliance
#> 47 BDL 0.000 TRUE Compliance
#> 48 BDL 0.000 TRUE Compliance
#> 49 BDL 0.000 TRUE Compliance
#> 50 BDL 0.000 TRUE Compliance
#> 51 0.11 0.110 FALSE Compliance
#> 52 0.06 0.060 FALSE Compliance
#> 53 BDL 0.000 TRUE Compliance
#> 54 0.23 0.230 FALSE Compliance
#> 55 BDL 0.000 TRUE Compliance
#> 56 0.11 0.110 FALSE Compliance
#> 57 BDL 0.000 TRUE Compliance
#> 58 0.031 0.031 FALSE Compliance
#> 59 BDL 0.000 TRUE Compliance
#> 60 BDL 0.000 TRUE Compliance
#> 61 BDL 0.000 TRUE Compliance
#> 62 BDL 0.000 TRUE Compliance
#> 63 BDL 0.000 TRUE Compliance
#> 64 0.12 0.120 FALSE Compliance
#> 65 0.08 0.080 FALSE Compliance
#> 66 BDL 0.000 TRUE Compliance
#> 67 0.26 0.260 FALSE Compliance
#> 68 BDL 0.000 TRUE Compliance
#> 69 0.02 0.020 FALSE Compliance
#> 70 BDL 0.000 TRUE Compliance
#> 71 0.024 0.024 FALSE Compliance
#> 72 BDL 0.000 TRUE Compliance
#> 73 BDL 0.000 TRUE Compliance
#> 74 BDL 0.000 TRUE Compliance
#> 75 BDL 0.000 TRUE Compliance
#> 76 BDL 0.000 TRUE Compliance
#> 77 0.1 0.100 FALSE Compliance
#> 78 0.04 0.040 FALSE Compliance
#> 79 BDL 0.000 TRUE Compliance
#> 80 BDL 0.000 TRUE Compliance
#> 81 0.1 0.100 FALSE Compliance
#> 82 BDL 0.000 TRUE Compliance
#> 83 0.01 0.010 FALSE Compliance
#> 84 BDL 0.000 TRUE Compliance
#> 85 BDL 0.000 TRUE Compliance
#> 86 BDL 0.000 TRUE Compliance
#> 87 BDL 0.000 TRUE Compliance
#> 88 BDL 0.000 TRUE Compliance
attach(EPA.89b.cadmium.df)
# Probability of detection at Background well:
#--------------------------------------------
ebinom(!Censored[Well.type=="Background"], ci=TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Binomial
#>
#> Estimated Parameter(s): size = 24.0000000
#> prob = 0.3333333
#>
#> Estimation Method: mle/mme/mvue for 'prob'
#>
#> Data: !Censored[Well.type == "Background"]
#>
#> Sample Size: 24
#>
#> Confidence Interval for: prob
#>
#> Confidence Interval Method: Score normal approximation
#> (With continuity correction)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 0.1642654
#> UCL = 0.5530745
#>
# Probability of detection at Compliance well:
#--------------------------------------------
ebinom(!Censored[Well.type=="Compliance"], ci=TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Binomial
#>
#> Estimated Parameter(s): size = 64.000
#> prob = 0.375
#>
#> Estimation Method: mle/mme/mvue for 'prob'
#>
#> Data: !Censored[Well.type == "Compliance"]
#>
#> Sample Size: 64
#>
#> Confidence Interval for: prob
#>
#> Confidence Interval Method: Score normal approximation
#> (With continuity correction)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 0.2597567
#> UCL = 0.5053034
#>
#----------------------------------------------------------------
# Clean up
rm(dat)
detach("EPA.89b.cadmium.df")