ciNormN.Rd
Compute the sample size necessary to achieve a specified half-width of a confidence interval for the mean of a normal distribution or the difference between two means, given the estimated standard deviation and confidence level.
numeric vector of (positive) half-widths.
Missing (NA
), undefined (NaN
), and infinite (Inf
, -Inf
) values are not allowed.
numeric vector specifying the value(s) of the estimated standard deviation(s).
numeric vector of numbers between 0 and 1 indicating the confidence level
associated with the confidence interval(s). The default value is conf.level=0.95
.
character string indicating whether this is a one-sample
(sample.type="one.sample"
) or two-sample
(sample.type="two.sample"
) confidence interval.
When sample.type="one.sample"
, the computed sample size is based on
a confidence interval for a single mean.
When sample.type="two.sample"
, the computed sample size is based on
a confidence interval for the difference between two means.
The default value is sample.type="one.sample"
unless the argument
n2
is supplied.
numeric vector of sample sizes for group 2. The default value is NULL
,
in which case it is assumed that the sample sizes for groups 1 and 2 are equal.
This argument is ignored when sample.type="one.sample"
.
Missing (NA
), undefined (NaN
), and infinite (Inf
, -Inf
) values are not allowed.
logical scalar indicating whether to round up the values of the computed sample size(s)
to the next smallest integer. The default value is round.up=TRUE
.
positive integer greater than 1 specifying the maximum sample size for the single
group when sample.type="one.sample"
or for group 1 when sample.type="two.sample"
. The default value is n.max=5000
.
numeric scalar indicating the tolerance to use in the uniroot
search algorithm. The default value is tol=1e-7
.
positive integer indicating the maximum number of iterations to use in the
uniroot
search algorithm. The default value is
maxiter=1000
.
If the arguments half.width
, n2
, sigma.hat
, and
conf.level
are not all the same length, they are replicated to be the same length as
the length of the longest argument.
The function ciNormN
uses the formulas given in the help file for
ciNormHalfWidth
for the half-width of the confidence interval
to iteratively solve for the sample size. For the two-sample case, the default
is to assume equal sample sizes for each group unless the argument n2
is supplied.
When sample.type="one.sample"
, or sample.type="two.sample"
and n2
is not supplied (so equal sample sizes for each group is assumed),
the function ciNormN
returns a numeric vector of sample sizes.
When sample.type="two.sample"
and n2
is supplied,
the function ciNormN
returns a list with two components called n1
and n2
,
specifying the sample sizes for each group.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.
Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.
The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations for example), you have to first estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean. This is done with confidence intervals.
In the course of designing a sampling program, an environmental scientist may wish to determine
the relationship between sample size, confidence level, and half-width if one of the objectives
of the sampling program is to produce confidence intervals. The functions
ciNormHalfWidth
, ciNormN
, and plotCiNormDesign
can be used to investigate these relationships for the case of normally-distributed observations.
# Look at how the required sample size for a one-sample
# confidence interval decreases with increasing half-width:
seq(0.25, 1, by = 0.25)
#> [1] 0.25 0.50 0.75 1.00
#[1] 0.25 0.50 0.75 1.00
ciNormN(half.width = seq(0.25, 1, by = 0.25))
#> [1] 64 18 10 7
#[1] 64 18 10 7
ciNormN(seq(0.25, 1, by=0.25), round = FALSE)
#> [1] 63.897899 17.832337 9.325967 6.352717
#[1] 63.897899 17.832337 9.325967 6.352717
#----------------------------------------------------------------
# Look at how the required sample size for a one-sample
# confidence interval increases with increasing estimated
# standard deviation for a fixed half-width:
seq(0.5, 2, by = 0.5)
#> [1] 0.5 1.0 1.5 2.0
#[1] 0.5 1.0 1.5 2.0
ciNormN(half.width = 0.5, sigma.hat = seq(0.5, 2, by = 0.5))
#> [1] 7 18 38 64
#[1] 7 18 38 64
#----------------------------------------------------------------
# Look at how the required sample size for a one-sample
# confidence interval increases with increasing confidence
# level for a fixed half-width:
seq(0.5, 0.9, by = 0.1)
#> [1] 0.5 0.6 0.7 0.8 0.9
#[1] 0.5 0.6 0.7 0.8 0.9
ciNormN(half.width = 0.25, conf.level = seq(0.5, 0.9, by = 0.1))
#> [1] 9 13 19 28 46
#[1] 9 13 19 28 46
#----------------------------------------------------------------
# Modifying the example on pages 21-4 to 21-5 of USEPA (2009),
# determine the required sample size in order to achieve a
# half-width that is 10% of the observed mean (based on the first
# four months of observations) for the Aldicarb level at the first
# compliance well. Assume a 95% confidence level and use the
# estimated standard deviation from the first four months of data.
# (The data are stored in EPA.09.Ex.21.1.aldicarb.df.)
#
# The required sample size is 20, so almost two years of data are
# required assuming observations are taken once per month.
EPA.09.Ex.21.1.aldicarb.df
#> Month Well Aldicarb.ppb
#> 1 1 Well.1 19.9
#> 2 2 Well.1 29.6
#> 3 3 Well.1 18.7
#> 4 4 Well.1 24.2
#> 5 1 Well.2 23.7
#> 6 2 Well.2 21.9
#> 7 3 Well.2 26.9
#> 8 4 Well.2 26.1
#> 9 1 Well.3 5.6
#> 10 2 Well.3 3.3
#> 11 3 Well.3 2.3
#> 12 4 Well.3 6.9
# Month Well Aldicarb.ppb
#1 1 Well.1 19.9
#2 2 Well.1 29.6
#3 3 Well.1 18.7
#4 4 Well.1 24.2
#...
mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df,
mean(Aldicarb.ppb[Well=="Well.1"]))
mu.hat
#> [1] 23.1
#[1] 23.1
sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df,
sd(Aldicarb.ppb[Well=="Well.1"]))
sigma.hat
#> [1] 4.93491
#[1] 4.93491
ciNormN(half.width = 0.1 * mu.hat, sigma.hat = sigma.hat)
#> [1] 20
#[1] 20
#----------
# Clean up
rm(mu.hat, sigma.hat)