Create a table of confidence intervals for probability of "success" for a binomial distribution or the difference between two proportions following Bacchetti (2010), by varying the estimated proportion or differene between the two estimated proportions given the sample size(s).

ciTableProp(n1 = 10, p1.hat = c(0.1, 0.2, 0.3), n2 = n1, 
    p2.hat.minus.p1.hat = c(0.2, 0.1, 0), sample.type = "two.sample", 
    ci.type = "two.sided", conf.level = 0.95, digits = 2, ci.method = "score", 
    correct = TRUE, tol = 10^-(digits + 1))

Arguments

n1

positive integer greater than 1 specifying the sample size when
sample.type="one.sample" or the sample size for group 1 when
sample.type="two.sample". The default value is n1=10.

p1.hat

numeric vector of values between 0 and 1 indicating the estimated proportion (sample.type="one.sample") or the estimated proportion for group 1
(sample.type="two.sample"). The default value is c(0.1, 0.2, 0.3). Missing (NA), undefined (NaN), an infinite (-Inf, Inf) values are not allowed.

n2

positive integer greater than 1 specifying the sample size for group 2 when sample.type="two.sample". The default value is n2=n1, i.e., equal sample sizes. This argument is ignored when sample.type="one.sample".

p2.hat.minus.p1.hat

numeric vector indicating the assumed difference between the two sample proportions when sample.type="two.sample". The default value is c(0.2, 0.1, 0). Missing (NA), undefined (NaN), an infinite (-Inf, Inf) values are not allowed. This argument is ignored when sample.type="one.sample".

sample.type

character string specifying whether to create confidence intervals for the difference between two proportions (sample.type="two.sample"; the default) or confidence intervals for a single proportion (sample.type="one.sample").

ci.type

character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".

conf.level

a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95.

digits

positive integer indicating how many decimal places to display in the table. The default value is digits=2.

ci.method

character string indicating the method to use to construct the confidence interval. The default value is ci.method="score" (i.e., the score method; see the help file for prop.test), which is the only method available when
sample.type="two.sample". When sample.type="one.sample", you may also set ci.method="exact" (i.e., the exact method).

correct

logical scalar indicating whether to use the correction for continuity when
ci.method="score" (see the help file for prop.test). The default value is correct=TRUE.

tol

numeric scalar indicating how close the values of the adjusted elements of
p2.hat.minus.p1.hat have to be in order to provide a simply display of confidence intervals (see DETAILS section below). The default value is
tol=10^-(digits + 1).

Details

One-Sample Case (sample.type="one.sample")
For the one-sample case, the function ciTableProp calls the R function prop.test when
ci.method="score", and calls the R function binom.test, when ci.method="exact". To ensure that the user-supplied values of p1.hat are valid for the given user-supplied values of n1, values for the argument x to the function prop.test or binom.test are computed using the formula

x <- unique(round((p1.hat * n1), 0))

and the argument p.hat is then adjusted using the formula

p.hat <- x/n1

Two-Sample Case (sample.type="two.sample")
For the two-sample case, the function ciTableProp calls the R function prop.test. To ensure that the user-supplied values of p1.hat are valid for the given user-supplied values of n1, the values for the first component of the argument x to the function prop.test are computed using the formula

x1 <- unique(round((p1.hat * n1), 0))

and the argument p1.hat is then adjusted using the formula

p1.hat <- x1/n1

Next, the estimated proportions from group 2 are computed by adding together all possible combinations from the elements of p1.hat and p2.hat.minus.p1.hat. These estimated proportions from group 2 are then adjusted using the formulas:

x2.rep <- round((p2.hat.rep * n2), 0)
p2.hat.rep <- x2.rep/n2

If any of these adjusted proportions from group 2 are \(\le 0\) or \(\ge 1\) the function terminates with a message indicating that impossible values have been supplied.

In cases where the sample sizes are small there may be instances where the user-supplied values of p1.hat and/or p2.hat.minus.p1.hat are not attainable. The argument tol is used to determine whether to return the table in conventional form or whether it is necessary to modify the table to include twice as many columns (see EXAMPLES section below).

Value

a data frame with elements that are character strings indicating the confidence intervals.

When sample.type="two.sample", a data frame with the rows varying the estimated proportion for group 1 (i.e., the values of p1.hat) and the columns varying the estimated difference between the proportions from group 2 and group 1 (i.e., the values of p2.hat.minus.p1.hat). In cases where the sample sizes are small, it may not be possible to obtain certain differences for given values of p1.hat, in which case the returned data frame contains twice as many columns indicating the actual difference in one column and the compute confidence interval next to it (see EXAMPLES section below).

When sample.type="one.sample", a 1-row data frame with the columns varying the estimated proportion (i.e., the values of p1.hat).

References

Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.

Also see the references in the help files for prop.test and binom.test.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.

Examples

  # Reproduce Table 1 in Bacchetti (2010).  This involves planning a study with 
  # n1 = n2 = 935 subjects per group, where Group 1 is the control group and 
  # Group 2 is the treatment group.  The outcome in the study is proportion of 
  # subjects with serious outcomes or death.  A negative value for the difference 
  # in proportions between groups (Group 2 proportion - Group 1 proportion) 
  # indicates the treatment group has a better outcome.  In this table, the 
  # proportion of subjects in Group 1 with serious outcomes or death is set 
  # to 3%, 6.5%, and 12%, and the difference in proportions between the two 
  # groups is set to -2.8 percentage points, -1.4 percentage points, and 0.

  ciTableProp(n1 = 935, p1.hat = c(0.03, 0.065, 0.12), n2 = 935, 
    p2.hat.minus.p1.hat = c(-0.028, -0.014, 0), digits = 3)
#>                   Diff=-0.028      Diff=-0.014           Diff=0
#> P1.hat=0.030 [-0.040, -0.015] [-0.029,  0.001] [-0.015,  0.015]
#> P1.hat=0.065 [-0.049, -0.007] [-0.036,  0.008] [-0.022,  0.022]
#> P1.hat=0.120 [-0.057,  0.001] [-0.044,  0.016] [-0.029,  0.029]
  #                  Diff=-0.028      Diff=-0.014           Diff=0
  #P1.hat=0.030 [-0.040, -0.015] [-0.029,  0.001] [-0.015,  0.015]
  #P1.hat=0.065 [-0.049, -0.007] [-0.036,  0.008] [-0.022,  0.022]
  #P1.hat=0.120 [-0.057,  0.001] [-0.044,  0.016] [-0.029,  0.029]

  #==========

  # Show how the returned data frame has to be modified for cases of small 
  # sample sizes where not all user-supplied differenes are possible.

  ciTableProp(n1 = 5, n2 = 5, p1.hat = c(0.3, 0.6, 0.12), p2.hat = c(0.2, 0.1, 0))
#>            Diff            CI Diff            CI Diff            CI
#> P1.hat=0.4  0.2 [-0.61, 1.00]  0.0 [-0.61, 0.61]    0 [-0.61, 0.61]
#> P1.hat=0.6  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.61, 0.61]
#> P1.hat=0.2  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.50, 0.50]
  #           Diff            CI Diff            CI Diff            CI
  #P1.hat=0.4  0.2 [-0.61, 1.00]  0.0 [-0.61, 0.61]    0 [-0.61, 0.61]
  #P1.hat=0.6  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.61, 0.61]
  #P1.hat=0.2  0.2 [-0.55, 0.95]  0.2 [-0.55, 0.95]    0 [-0.50, 0.50]

  #==========

  # Suppose we are planning a study to compare the proportion of nondetects at 
  # a background and downgradient well, and we can use ciTableProp to look how 
  # the confidence interval for the difference between the two proportions using 
  # say 36 quarterly samples at each well varies with the observed estimated 
  # proportions.  Here we'll let the argument "p1.hat" denote the proportion of 
  # nondetects observed at the downgradient well and set this equal to 
  # 20%, 40% and 60%.  The argument "p2.hat.minus.p1.hat" represents the proportion 
  # of nondetects at the background well minus the proportion of nondetects at the 
  # downgradient well.

  ciTableProp(n1 = 36, p1.hat = c(0.2, 0.4, 0.6), n2 = 36, 
    p2.hat.minus.p1.hat = c(0.3, 0.15, 0))
#>                 Diff=0.31     Diff=0.14        Diff=0
#> P1.hat=0.19 [ 0.07, 0.54] [-0.09, 0.37] [-0.18, 0.18]
#> P1.hat=0.39 [ 0.06, 0.55] [-0.12, 0.39] [-0.23, 0.23]
#> P1.hat=0.61 [ 0.09, 0.52] [-0.10, 0.38] [-0.23, 0.23]
  #                Diff=0.31     Diff=0.14        Diff=0
  #P1.hat=0.19 [ 0.07, 0.54] [-0.09, 0.37] [-0.18, 0.18]
  #P1.hat=0.39 [ 0.06, 0.55] [-0.12, 0.39] [-0.23, 0.23]
  #P1.hat=0.61 [ 0.09, 0.52] [-0.10, 0.38] [-0.23, 0.23]

  # We see that even if the observed difference in the proportion of nondetects 
  # is about 15 percentage points, all of the confidence intervals for the 
  # difference between the proportions of nondetects at the two wells contain 0, 
  # so if a difference of 15 percentage points is important to substantiate, we 
  # may need to increase our sample sizes.