ciTableProp.Rd
Create a table of confidence intervals for the probability of "success" for a binomial distribution, or for the difference between two proportions, following Bacchetti (2010), by varying the estimated proportion or the difference between the two estimated proportions given the sample size(s).
positive integer greater than 1 specifying the sample size when sample.type="one.sample"
or the sample size for group 1 when sample.type="two.sample"
. The default value is n1=10
.
numeric vector of values between 0 and 1 indicating the estimated proportion
(sample.type="one.sample"
) or the estimated proportion for group 1
(sample.type="two.sample"
). The default value is c(0.1, 0.2, 0.3)
.
Missing (NA
), undefined (NaN
), and infinite
(-Inf
, Inf
) values are not allowed.
positive integer greater than 1 specifying the sample size for group 2 when
sample.type="two.sample"
. The default value is n2=n1
, i.e.,
equal sample sizes. This argument is ignored when sample.type="one.sample"
.
numeric vector indicating the assumed difference between the two sample proportions
when sample.type="two.sample"
. The default value is c(0.2, 0.1, 0)
.
Missing (NA
), undefined (NaN
), and infinite
(-Inf
, Inf
) values are not allowed. This argument is ignored when
sample.type="one.sample"
.
character string specifying whether to create confidence intervals for the difference
between two proportions (sample.type="two.sample"
; the default) or confidence
intervals for a single proportion (sample.type="one.sample"
).
character string indicating what kind of confidence interval to compute. The
possible values are "two-sided"
(the default), "lower"
, and
"upper"
.
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is conf.level=0.95
.
positive integer indicating how many decimal places to display in the table. The
default value is digits=2
.
character string indicating the method to use to construct the confidence interval.
The default value is ci.method="score"
(i.e., the score method; see the
help file for prop.test
), which is the only method available when sample.type="two.sample"
. When sample.type="one.sample"
, you may
also set ci.method="exact"
(i.e., the exact method).
logical scalar indicating whether to use the correction for continuity when ci.method="score"
(see the help file for prop.test
). The
default value is correct=TRUE
.
numeric scalar indicating how close the values of the adjusted elements of p2.hat.minus.p1.hat
have to be in order to provide a simple display
of confidence intervals (see DETAILS section below). The default value is tol=10^-(digits + 1)
.
One-Sample Case (sample.type="one.sample"
)
For the one-sample case, the function ciTableProp
calls the R function
prop.test
when ci.method="score"
, and calls the R function
binom.test
when ci.method="exact"
. To ensure that the
user-supplied values of p1.hat
are valid for the given user-supplied values
of n1
, values for the argument x
to the function
prop.test
or binom.test
are computed using the formula
x <- unique(round((p1.hat * n1), 0))
and the argument p1.hat
is then adjusted using the formula
p.hat <- x/n1
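The adjustment above can be sketched in base R as follows. The values of n1 and p1.hat are illustrative assumptions chosen so that two requested proportions collapse to the same count, and the way the intervals are collected into matrices is only one possible arrangement, not necessarily what ciTableProp does internally.

```r
# Illustrative inputs (assumed for this sketch, not taken from the help file)
n1 <- 10
p1.hat <- c(0.12, 0.14, 0.3)

# Convert requested proportions to attainable counts; 1.2 and 1.4 both round to 1,
# so unique() leaves only two distinct counts
x <- unique(round(p1.hat * n1, 0))
p.hat <- x / n1

# Score intervals via prop.test() and exact intervals via binom.test(),
# one row per attainable count (warnings about small counts are suppressed)
score.ci <- t(sapply(x, function(xi)
  suppressWarnings(prop.test(xi, n1, conf.level = 0.95, correct = TRUE))$conf.int))
exact.ci <- t(sapply(x, function(xi)
  binom.test(xi, n1, conf.level = 0.95)$conf.int))

rownames(score.ci) <- rownames(exact.ci) <- paste0("p.hat=", p.hat)
colnames(score.ci) <- colnames(exact.ci) <- c("LCL", "UCL")
print(score.ci)
print(exact.ci)
```

Because of the call to unique, the table can have fewer columns than the number of proportions requested.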
Two-Sample Case (sample.type="two.sample"
)
For the two-sample case, the function ciTableProp
calls the R function
prop.test
. To ensure that the user-supplied values of p1.hat
are valid for the given user-supplied values of n1
, the values for the
first component of the argument x
to the function
prop.test
are computed using the formula
x1 <- unique(round((p1.hat * n1), 0))
and the argument p1.hat
is then adjusted using the formula
p1.hat <- x1/n1
Next, the estimated proportions from group 2 are computed by adding together all
possible combinations of the elements of p1.hat
possible combinations from the elements of p1.hat
and
p2.hat.minus.p1.hat
. These estimated proportions from group 2 are then
adjusted using the formulas:
x2.rep <- round((p2.hat.rep * n2), 0)
p2.hat.rep <- x2.rep/n2
If any of these adjusted proportions from group 2 are \(\le 0\) or \(\ge 1\), the function terminates with a message indicating that impossible values have been supplied.
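As a rough illustration of these steps, the sketch below applies the formulas to the 36-sample inputs used in the examples of this help file. The use of outer to form all combinations, the wording of the error message, and the ordering of the groups in the call to prop.test (group 2 first, so that the interval is for group 2 minus group 1) are assumptions made for this sketch.

```r
n1 <- 36
n2 <- 36
p1.hat <- c(0.2, 0.4, 0.6)
p2.hat.minus.p1.hat <- c(0.3, 0.15, 0)

# Adjust group 1 proportions to attainable values
x1 <- unique(round(p1.hat * n1, 0))
p1.hat <- x1 / n1

# All combinations: rows follow p1.hat, columns follow the requested differences
p2.hat.rep <- outer(p1.hat, p2.hat.minus.p1.hat, "+")
x2.rep <- round(p2.hat.rep * n2, 0)
p2.hat.rep <- x2.rep / n2

# Hypothetical error message; the real function's wording may differ
if (any(p2.hat.rep <= 0 | p2.hat.rep >= 1))
  stop("impossible values supplied for p1.hat and/or p2.hat.minus.p1.hat")

# Interval for one cell (first row, first column); assumed group ordering
ci <- prop.test(x = c(x2.rep[1, 1], x1[1]), n = c(n2, n1),
                conf.level = 0.95, correct = TRUE)$conf.int
print(x2.rep)
print(round(as.numeric(ci), 2))
```

The rounded interval for this cell agrees with the entry for P1.hat=0.19 and Diff=0.31 in the 36-sample example output.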
In cases where the sample sizes are small there may be instances where the
user-supplied values of p1.hat
and/or p2.hat.minus.p1.hat
are not
attainable. The argument tol
is used to determine whether to return
the table in conventional form or whether it is necessary to modify the table
to include twice as many columns (see EXAMPLES section below).
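One plausible reading of how tol could be applied is sketched below. The helper names adjusted_diffs and simple_display are hypothetical, and the rule used here (the adjusted differences within each column must agree to within tol) is an assumption rather than a description of the actual implementation.

```r
# Hypothetical helper: adjusted differences, rows = p1.hat, columns = requested differences
adjusted_diffs <- function(n1, n2, p1.hat, d) {
  p1 <- unique(round(p1.hat * n1, 0)) / n1
  p2 <- round(outer(p1, d, "+") * n2, 0) / n2
  p2 - p1  # p1 is recycled down each column, so row i has p1[i] subtracted
}

# Hypothetical rule: a single "Diff=" header per column is adequate only if
# the adjusted differences in every column agree to within tol
simple_display <- function(diffs, tol) {
  spread <- apply(diffs, 2, function(col) diff(range(col)))
  all(spread < tol)
}

digits <- 2
tol <- 10^-(digits + 1)

# Equal sample sizes of 36: each column has one common adjusted difference
simple_display(adjusted_diffs(36, 36, c(0.2, 0.4, 0.6), c(0.3, 0.15, 0)), tol)

# Unequal small samples (illustrative values): differences vary within a column
simple_display(adjusted_diffs(5, 7, c(0.2, 0.6), c(0.2, 0)), tol)
```

Under this reading, the first call would correspond to the conventional table and the second to the expanded table with a separate column for the actual differences.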
a data frame with elements that are character strings indicating the confidence intervals.
When sample.type="two.sample"
, a data frame with the rows varying
the estimated proportion for group 1 (i.e., the values of p1.hat
) and
the columns varying the estimated difference between the proportions from
group 2 and group 1 (i.e., the values of p2.hat.minus.p1.hat
). In cases
where the sample sizes are small, it may not be possible to obtain certain
differences for given values of p1.hat
, in which case the returned
data frame contains twice as many columns, indicating the actual difference
in one column and the computed confidence interval next to it (see EXAMPLES
section below).
When sample.type="one.sample"
, a 1-row data frame with the columns
varying the estimated proportion (i.e., the values of p1.hat
).
Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.
Also see the references in the help files for prop.test
and
binom.test
.
Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.
# Reproduce Table 1 in Bacchetti (2010). This involves planning a study with
# n1 = n2 = 935 subjects per group, where Group 1 is the control group and
# Group 2 is the treatment group. The outcome in the study is proportion of
# subjects with serious outcomes or death. A negative value for the difference
# in proportions between groups (Group 2 proportion - Group 1 proportion)
# indicates the treatment group has a better outcome. In this table, the
# proportion of subjects in Group 1 with serious outcomes or death is set
# to 3%, 6.5%, and 12%, and the difference in proportions between the two
# groups is set to -2.8 percentage points, -1.4 percentage points, and 0.
ciTableProp(n1 = 935, p1.hat = c(0.03, 0.065, 0.12), n2 = 935,
p2.hat.minus.p1.hat = c(-0.028, -0.014, 0), digits = 3)
#> Diff=-0.028 Diff=-0.014 Diff=0
#> P1.hat=0.030 [-0.040, -0.015] [-0.029, 0.001] [-0.015, 0.015]
#> P1.hat=0.065 [-0.049, -0.007] [-0.036, 0.008] [-0.022, 0.022]
#> P1.hat=0.120 [-0.057, 0.001] [-0.044, 0.016] [-0.029, 0.029]
#==========
# Show how the returned data frame has to be modified for cases of small
# sample sizes where not all user-supplied differences are possible.
ciTableProp(n1 = 5, n2 = 5, p1.hat = c(0.3, 0.6, 0.12),
p2.hat.minus.p1.hat = c(0.2, 0.1, 0))
#> Diff CI Diff CI Diff CI
#> P1.hat=0.4 0.2 [-0.61, 1.00] 0.0 [-0.61, 0.61] 0 [-0.61, 0.61]
#> P1.hat=0.6 0.2 [-0.55, 0.95] 0.2 [-0.55, 0.95] 0 [-0.61, 0.61]
#> P1.hat=0.2 0.2 [-0.55, 0.95] 0.2 [-0.55, 0.95] 0 [-0.50, 0.50]
#==========
# Suppose we are planning a study to compare the proportion of nondetects at
# a background and downgradient well, and we can use ciTableProp to look at how
# the confidence interval for the difference between the two proportions using
# say 36 quarterly samples at each well varies with the observed estimated
# proportions. Here we'll let the argument "p1.hat" denote the proportion of
# nondetects observed at the downgradient well and set this equal to
# 20%, 40% and 60%. The argument "p2.hat.minus.p1.hat" represents the proportion
# of nondetects at the background well minus the proportion of nondetects at the
# downgradient well.
ciTableProp(n1 = 36, p1.hat = c(0.2, 0.4, 0.6), n2 = 36,
p2.hat.minus.p1.hat = c(0.3, 0.15, 0))
#> Diff=0.31 Diff=0.14 Diff=0
#> P1.hat=0.19 [ 0.07, 0.54] [-0.09, 0.37] [-0.18, 0.18]
#> P1.hat=0.39 [ 0.06, 0.55] [-0.12, 0.39] [-0.23, 0.23]
#> P1.hat=0.61 [ 0.09, 0.52] [-0.10, 0.38] [-0.23, 0.23]
# We see that even if the observed difference in the proportion of nondetects
# is about 15 percentage points, all of the confidence intervals for the
# difference between the proportions of nondetects at the two wells contain 0,
# so if a difference of 15 percentage points is important to substantiate, we
# may need to increase our sample sizes.
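To explore the last point, the sketch below recomputes the interval for a requested difference of 15 percentage points at several per-well sample sizes, calling prop.test directly. The helper diff_ci is hypothetical, the candidate sample sizes are illustrative, and equal sample sizes at the two wells are assumed.

```r
# Hypothetical helper: interval for (group 2 - group 1) with n samples per group,
# using the same rounding to attainable counts described in the details above
diff_ci <- function(n, p1, d, conf.level = 0.95) {
  x1 <- round(p1 * n, 0)
  x2 <- round((x1 / n + d) * n, 0)
  as.numeric(prop.test(x = c(x2, x1), n = c(n, n),
                       conf.level = conf.level, correct = TRUE)$conf.int)
}

# Illustrative sample sizes; p1 = 0.4 and d = 0.15 come from the example above
for (n in c(36, 100, 200, 400)) {
  ci <- diff_ci(n, p1 = 0.4, d = 0.15)
  cat(sprintf("n = %3d  [%5.2f, %5.2f]  contains 0: %s\n",
              n, ci[1], ci[2], ci[1] <= 0 && ci[2] >= 0))
}
```

With 36 samples per well the interval contains 0, while with 400 samples per well the lower limit is above 0; the intermediate sizes show roughly where the change occurs.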