ciTableMean.Rd
Create a table of confidence intervals for the mean of a normal distribution or the difference between two means following Bacchetti (2010), by varying the estimated standard deviation and the estimated mean or difference between the two estimated means given the sample size(s).
ciTableMean(n1 = 10, n2 = n1, diff.or.mean = 2:0, SD = 1:3,
sample.type = "two.sample", ci.type = "two.sided", conf.level = 0.95,
digits = 1)
positive integer greater than 1 specifying the sample size when sample.type="one.sample"
or the sample size for group 1 when sample.type="two.sample". The default value is n1=10.
positive integer greater than 1 specifying the sample size for group 2 when
sample.type="two.sample". The default value is n2=n1, i.e.,
equal sample sizes. This argument is ignored when sample.type="one.sample".
numeric vector indicating either the assumed difference between the two sample means
when sample.type="two.sample" or the value of the sample mean when
sample.type="one.sample". The default value is diff.or.mean=2:0.
Missing (NA), undefined (NaN), and infinite (-Inf, Inf) values are not allowed.
numeric vector of positive values specifying the assumed estimated standard
deviation. The default value is SD=1:3.
Missing (NA), undefined (NaN), and infinite (-Inf, Inf) values are not allowed.
character string specifying whether to create confidence intervals for the difference
between two means (sample.type="two.sample"; the default) or confidence
intervals for a single mean (sample.type="one.sample").
character string indicating what kind of confidence interval to compute. The
possible values are "two.sided" (the default), "lower", and "upper".
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is conf.level=0.95.
non-negative integer indicating how many decimal places to display in the table. The
default value is digits=1.
Following Bacchetti (2010) (see NOTE below), the function ciTableMean
allows you to perform sensitivity analyses while planning future studies by
producing a table of confidence intervals for the mean or the difference
between two means by varying the estimated standard deviation and the
estimated mean or difference between the two estimated means given the
sample size(s).
One Sample Case (sample.type="one.sample")
Let \(\underline{x} = (x_1, x_2, \ldots, x_n)\) be a vector of
\(n\) observations from a normal (Gaussian) distribution with
parameters mean=\(\mu\) and sd=\(\sigma\).
The usual confidence interval for \(\mu\) is constructed as follows.
If ci.type="two.sided", the \((1-\alpha)\)100% confidence interval for
\(\mu\) is given by:
$$[\hat{\mu} - t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}, \, \hat{\mu} + t(n-1, 1-\alpha/2) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\;\;\; (1)$$
where
$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)$$
$$\hat{\sigma}^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)$$
and \(t(\nu, p)\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the \((1-\alpha)\)100% confidence interval for
\(\mu\) is given by:
$$[\hat{\mu} - t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}, \, \infty] \;\;\;\; (4)$$
and if ci.type="upper", the confidence interval is given by:
$$[-\infty, \, \hat{\mu} + t(n-1, 1-\alpha) \frac{\hat{\sigma}}{\sqrt{n}}] \;\;\;\; (5)$$
For the one-sample case, the argument n1 corresponds to \(n\) in
Equation (1), the argument diff.or.mean corresponds to
\(\hat{\mu} = \bar{x}\) in Equation (2), and the argument SD corresponds to
\(\hat{\sigma} = s\) in Equation (3).
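As a rough check on the one-sample formulas, the limits can be computed directly from summary statistics with base R's qt(). The helper name one_sample_ci below is hypothetical and is not part of the package; it is only a sketch, and it assumes that both one-sided bounds use the \(1-\alpha\) quantile.

```r
# Hypothetical helper (not part of the package): one-sample t interval
# computed from the sample size, sample mean, and sample standard deviation.
one_sample_ci <- function(n, mean, sd, ci.type = "two.sided",
                          conf.level = 0.95) {
  alpha <- 1 - conf.level
  se <- sd / sqrt(n)  # estimated standard error of the mean
  switch(ci.type,
    two.sided = mean + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se,
    lower     = c(mean - qt(1 - alpha, df = n - 1) * se, Inf),
    upper     = c(-Inf, mean + qt(1 - alpha, df = n - 1) * se),
    stop("ci.type must be 'two.sided', 'lower', or 'upper'")
  )
}

# n = 15, sample mean 5, sample SD 1
one_sample_ci(n = 15, mean = 5, sd = 1)
```

Rounded to one decimal place, the two-sided limits are 4.4 and 5.6, which agrees with the Mean=5, SD=1 cell of the one-sample table in the examples.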
Two Sample Case (sample.type="two.sample")
Let \(\underline{x}_1 = (x_{11}, x_{21}, \ldots, x_{n_11})\) be a vector of
\(n_1\) observations from a normal (Gaussian) distribution
with parameters mean=\(\mu_1\) and sd=\(\sigma\), and let
\(\underline{x}_2 = (x_{12}, x_{22}, \ldots, x_{n_22})\) be a vector of
\(n_2\) observations from a normal (Gaussian) distribution
with parameters mean=\(\mu_2\) and sd=\(\sigma\).
The usual confidence interval for the difference between the two population means
\(\mu_1 - \mu_2\) is constructed as follows.
If ci.type="two.sided", the \((1-\alpha)\)100% confidence interval for
\(\mu_1 - \mu_2\) is given by:
$$[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 -2, 1-\alpha/2) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \; (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 -2, 1-\alpha/2) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (6)$$
where
$$\hat{\mu}_1 = \bar{x}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} x_{i1} \;\;\;\;\;\; (7)$$
$$\hat{\mu}_2 = \bar{x}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} x_{i2} \;\;\;\;\;\; (8)$$
$$\hat{\sigma}^2 = s_p^2 = \frac{(n_1-1) s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2} \;\;\;\;\;\; (9)$$
$$s_1^2 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (x_{i1} - \bar{x}_1)^2 \;\;\;\;\;\; (10)$$
$$s_2^2 = \frac{1}{n_2-1} \sum_{i=1}^{n_2} (x_{i2} - \bar{x}_2)^2 \;\;\;\;\;\; (11)$$
and \(t(\nu, p)\) is the \(p\)'th quantile of
Student's t-distribution with \(\nu\) degrees of freedom
(Zar, 2010; Gilbert, 1987; Ott, 1995; Helsel and Hirsch, 1992).
If ci.type="lower", the \((1-\alpha)\)100% confidence interval for
\(\mu_1 - \mu_2\) is given by:
$$[(\hat{\mu}_1 - \hat{\mu}_2) - t(n_1 + n_2 -2, 1-\alpha) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \; \infty] \;\;\;\;\;\; (12)$$
and if ci.type="upper", the confidence interval is given by:
$$[-\infty, \; (\hat{\mu}_1 - \hat{\mu}_2) + t(n_1 + n_2 -2, 1-\alpha) \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}] \;\;\;\;\;\; (13)$$
For the two-sample case, the arguments n1 and n2 correspond to
\(n_1\) and \(n_2\) in Equation (6), the argument diff.or.mean corresponds
to \(\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2\) in Equations (7) and (8),
and the argument SD corresponds to \(\hat{\sigma} = s_p\) in Equation (9).
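To make the pooled estimate in Equation (9) and the two-sided interval in Equation (6) concrete, the following base R sketch applies them to the sulfate concentrations listed in the examples (with the two missing downgradient values omitted). The variable names are illustrative only.

```r
# Sulfate concentrations (ppm) copied from the examples; NAs omitted
back <- c(560, 530, 570, 490, 510, 550, 550, 530)  # background well
down <- c(600, 590, 590, 630, 610, 630)            # downgradient well

n1 <- length(down)
n2 <- length(back)

# Pooled standard deviation, Equation (9)
sp <- sqrt(((n1 - 1) * var(down) + (n2 - 1) * var(back)) / (n1 + n2 - 2))

# Two-sided 95% confidence interval for mu_1 - mu_2, Equation (6)
d    <- mean(down) - mean(back)
half <- qt(0.975, df = n1 + n2 - 2) * sp * sqrt(1 / n1 + 1 / n2)
ci   <- c(d - half, d + half)

sp
ci
```

The results should agree with the pooled standard deviation (about 23.58 ppm) and the t.test() interval (about 44.34 to 99.83 ppm) reported in the examples.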
a data frame with the rows varying the standard deviation and the columns varying the estimated mean or difference between the means. Elements of the data frame are character strings indicating the confidence intervals.
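The layout of the returned object can be illustrated with a minimal base R sketch for the two-sample, two-sided case. The function ci_table_sketch is hypothetical and is not the package's implementation; in particular, it does not reproduce the padding that ciTableMean uses to align the limits.

```r
# Hypothetical sketch (not the package implementation): rows vary the
# standard deviation, columns vary the difference between the means, and
# each cell is a character string giving a two-sided confidence interval.
ci_table_sketch <- function(n1, n2, diffs, sds, conf.level = 0.95,
                            digits = 1) {
  t.crit  <- qt(1 - (1 - conf.level) / 2, df = n1 + n2 - 2)
  se.mult <- sqrt(1 / n1 + 1 / n2)
  fmt     <- paste0("[%.", digits, "f, %.", digits, "f]")
  # One column of formatted intervals per assumed difference
  cells <- sapply(diffs, function(d) {
    half <- t.crit * sds * se.mult
    sprintf(fmt, d - half, d + half)
  })
  # Keep a matrix shape even when only one SD is supplied
  cells <- matrix(cells, nrow = length(sds))
  dimnames(cells) <- list(paste0("SD=", sds), paste0("Diff=", diffs))
  as.data.frame(cells, stringsAsFactors = FALSE)
}

tab <- ci_table_sketch(n1 = 10, n2 = 10, diffs = 2:0, sds = 1:3)
tab
```

Apart from spacing, the cells should match the table produced by the default call ciTableMean() in the examples.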
Bacchetti, P. (2010). Current sample size conventions: Flaws, Harms, and Alternatives. BMC Medicine 8, 17–23.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
Bacchetti (2010) presents strong arguments against the current convention in scientific research for computing sample size that is based on formulas that use a fixed Type I error (usually 5%) and a fixed minimal power (often 80%) without regard to costs. He notes that a key input to these formulas is a measure of variability (usually a standard deviation) that is difficult to measure accurately "unless there is so much preliminary data that the study isn't really needed." Also, study designers often avoid defining what a scientifically meaningful difference is by presenting sample size results in terms of the effect size (i.e., the difference of interest divided by the elusive standard deviation). Bacchetti (2010) encourages study designers to use simple tables in a sensitivity analysis to see what results of a study may look like for low, moderate, and high rates of variability and large, intermediate, and no underlying differences in the populations or processes being studied.
# Show how potential confidence intervals for the difference between two means
# will look assuming standard deviations of 1, 2, or 3, differences between
# the two means of 2, 1, or 0, and a sample size of 10 in each group.
ciTableMean()
#> Diff=2 Diff=1 Diff=0
#> SD=1 [ 1.1, 2.9] [ 0.1, 1.9] [-0.9, 0.9]
#> SD=2 [ 0.1, 3.9] [-0.9, 2.9] [-1.9, 1.9]
#> SD=3 [-0.8, 4.8] [-1.8, 3.8] [-2.8, 2.8]
#==========
# Show how a potential confidence interval for a mean will look assuming
# standard deviations of 1, 2, or 5, a sample mean of 5, 3, or 1, and
# a sample size of 15.
ciTableMean(n1 = 15, diff.or.mean = c(5, 3, 1), SD = c(1, 2, 5), sample.type = "one.sample")
#> Mean=5 Mean=3 Mean=1
#> SD=1 [ 4.4, 5.6] [ 2.4, 3.6] [ 0.4, 1.6]
#> SD=2 [ 3.9, 6.1] [ 1.9, 4.1] [-0.1, 2.1]
#> SD=5 [ 2.2, 7.8] [ 0.2, 5.8] [-1.8, 3.8]
#==========
# The data frame EPA.09.Ex.16.1.sulfate.df contains sulfate concentrations
# (ppm) at one background and one downgradient well. The estimated
# mean and standard deviation for the background well are 536 and 27 ppm,
# respectively, based on a sample size of n = 8 quarterly samples taken over
# 2 years. A two-sided 95% confidence interval for this mean is [514, 559],
# which has a half-width of about 22 ppm.
#
# The estimated mean and standard deviation for the downgradient well are
# 608 and 18 ppm, respectively, based on a sample size of n = 6 quarterly
# samples. A two-sided 95% confidence interval for the difference between
# this mean and the background mean is [44, 100] ppm.
#
# Suppose we want to design a future sampling program and are interested in
# the size of the confidence interval for the difference between the two means.
# We will use ciTableMean to generate a table of possible confidence intervals
# by varying the assumed standard deviation and assumed differences between
# the means.
# Look at the data
#-----------------
EPA.09.Ex.16.1.sulfate.df
#> Month Year Well.type Sulfate.ppm
#> 1 Jan 1995 Background 560
#> 2 Apr 1995 Background 530
#> 3 Jul 1995 Background 570
#> 4 Oct 1995 Background 490
#> 5 Jan 1996 Background 510
#> 6 Apr 1996 Background 550
#> 7 Jul 1996 Background 550
#> 8 Oct 1996 Background 530
#> 9 Jan 1995 Downgradient NA
#> 10 Apr 1995 Downgradient NA
#> 11 Jul 1995 Downgradient 600
#> 12 Oct 1995 Downgradient 590
#> 13 Jan 1996 Downgradient 590
#> 14 Apr 1996 Downgradient 630
#> 15 Jul 1996 Downgradient 610
#> 16 Oct 1996 Downgradient 630
# Compute the estimated mean and standard deviation for the
# background well.
#-----------------------------------------------------------
Sulfate.back <- with(EPA.09.Ex.16.1.sulfate.df,
Sulfate.ppm[Well.type == "Background"])
enorm(Sulfate.back, ci = TRUE)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Normal
#>
#> Estimated Parameter(s): mean = 536.2500
#> sd = 26.6927
#>
#> Estimation Method: mvue
#>
#> Data: Sulfate.back
#>
#> Sample Size: 8
#>
#> Confidence Interval for: mean
#>
#> Confidence Interval Method: Exact
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 513.9343
#> UCL = 558.5657
#>
# Compute the estimated mean and standard deviation for the
# downgradient well.
#----------------------------------------------------------
Sulfate.down <- with(EPA.09.Ex.16.1.sulfate.df,
Sulfate.ppm[Well.type == "Downgradient"])
enorm(Sulfate.down, ci = TRUE)
#> Warning: There were 2 nonfinite values in x : 2 NA's
#> Warning: 2 observations with NA/NaN/Inf in 'x' removed.
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Normal
#>
#> Estimated Parameter(s): mean = 608.33333
#> sd = 18.34848
#>
#> Estimation Method: mvue
#>
#> Data: Sulfate.down
#>
#> Sample Size: 6
#>
#> Number NA/NaN/Inf's: 2
#>
#> Confidence Interval for: mean
#>
#> Confidence Interval Method: Exact
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 589.0778
#> UCL = 627.5889
#>
# Compute the estimated difference between the means and the confidence
# interval for the difference:
#----------------------------------------------------------------------
t.test(Sulfate.down, Sulfate.back, var.equal = TRUE)
#>
#> Two Sample t-test
#>
#> data: Sulfate.down and Sulfate.back
#> t = 5.661, df = 12, p-value = 0.0001054
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> 44.33974 99.82693
#> sample estimates:
#> mean of x mean of y
#> 608.3333 536.2500
#>
# Use ciTableMean to look at how the confidence interval for the difference
# between the background and downgradient means in a future study using eight
# quarterly samples at each well varies with the assumed value of the pooled
# standard deviation and the observed difference between the sample means.
#--------------------------------------------------------------------------------
# Our current estimate of the pooled standard deviation is 24 ppm:
summary(lm(Sulfate.ppm ~ Well.type, data = EPA.09.Ex.16.1.sulfate.df))$sigma
#> [1] 23.57759
# We can see that if this is overly optimistic and in our next study the
# pooled standard deviation is around 50 ppm, then if the observed difference
# between the means is 50 ppm, the confidence interval for the difference
# between the two means will include 0 (its lower limit is -4 ppm), so we may
# want to increase our sample size.
ciTableMean(n1 = 8, n2 = 8, diff.or.mean = c(100, 50, 0), SD = c(15, 25, 50), digits = 0)
#> Diff=100 Diff=50 Diff=0
#> SD=15 [ 84, 116] [ 34, 66] [-16, 16]
#> SD=25 [ 73, 127] [ 23, 77] [-27, 27]
#> SD=50 [ 46, 154] [ -4, 104] [-54, 54]
#==========
# Clean up
#---------
rm(Sulfate.back, Sulfate.down)