Plots for Sampling Design Based on Confidence Interval for Mean of a Normal Distribution or Difference Between Two Means

Create plots involving sample size, half-width, estimated standard deviation, and confidence level for a confidence interval for the mean of a normal distribution or the difference between two means.

plotCiNormDesign(x.var = "n", y.var = "half.width", 
    range.x.var = NULL, n.or.n1 = 25, n2 = n.or.n1, 
    half.width = sigma.hat/2, sigma.hat = 1, conf.level = 0.95, 
    sample.type = ifelse(missing(n2), "one.sample", "two.sample"), 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000,
    plot.it = TRUE, add = FALSE, n.points = 100,
    plot.col = "black", plot.lwd = 3 * par("cex"), plot.lty = 1, 
    digits = .Options$digits, 
    main = NULL, xlab = NULL, ylab = NULL, type = "l", ...)

Arguments

x.var: character string indicating what variable to use for the x-axis. Possible values are "n" (sample size; the default), "half.width" (the half-width of the confidence interval), "sigma.hat" (the estimated standard deviation), and "conf.level" (the confidence level).
y.var: character string indicating what variable to use for the y-axis. Possible values are "half.width" (the half-width of the confidence interval; the default) and "n" (sample size).
range.x.var: numeric vector of length 2 indicating the range of the x-variable to use for the plot. The default value depends on the value of x.var. When x.var="n" the default value is c(2,50). When x.var="half.width" the default value is c(0.1*sigma.hat, 2*sigma.hat). When x.var="sigma.hat", the default value is c(0.1, 2). When x.var="conf.level", the default value is c(0.5, 0.99).
n.or.n1: numeric scalar indicating the sample size. The default value is n.or.n1=25. When sample.type="one.sample", this argument denotes the number of observations in the single sample. When sample.type="two.sample", this argument denotes the number of observations from group 1. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. This argument is ignored if either x.var="n" or y.var="n".
n2: numeric scalar indicating the sample size for group 2. The default value is the value of n.or.n1. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. This argument is ignored when sample.type="one.sample".
half.width: positive numeric scalar indicating the half-width of the confidence interval. The default value is sigma.hat/2. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. This argument is ignored if either x.var="half.width" or y.var="half.width".
sigma.hat: positive numeric scalar specifying the estimated standard deviation. The default value is sigma.hat=1. This argument is ignored if x.var="sigma.hat".
conf.level: a scalar between 0 and 1 indicating the confidence level associated with the confidence interval. The default value is conf.level=0.95. This argument is ignored if x.var="conf.level".
sample.type: character string indicating whether this is a one-sample or two-sample confidence interval.
When sample.type="one.sample", the computations for the plot are based on a confidence interval for a single mean.
When sample.type="two.sample", the computations for the plot are based on a confidence interval for the difference between two means.
The default value is sample.type="one.sample" unless the argument n2 is supplied.
round.up: logical scalar indicating whether to round up the computed sample sizes to the next larger integer. The default value is round.up=FALSE. This argument is ignored unless y.var="n".
n.max: for the case when y.var="n", positive integer greater than 1 specifying the maximum sample size for the single group when sample.type="one.sample" or for group 1 when sample.type="two.sample". The default value is n.max=5000.
tol: for the case when y.var="n", numeric scalar indicating the tolerance to use in the uniroot search algorithm. The default value is tol=1e-7.
maxiter: for the case when y.var="n", positive integer indicating the maximum number of iterations to use in the uniroot search algorithm. The default value is maxiter=1000.
plot.it: a logical scalar indicating whether to create a plot, or add to the existing plot (see the argument add below), on the current graphics device. If plot.it=FALSE, no plot is produced, but a list of (x,y) values is returned (see the section Value). The default value is plot.it=TRUE.
add: a logical scalar indicating whether to add the design plot to the existing plot (add=TRUE), or to create a plot from scratch (add=FALSE). The default value is add=FALSE. This argument is ignored if plot.it=FALSE.
n.points: a numeric scalar specifying how many (x,y) pairs to use to produce the plot. There are n.points x-values evenly spaced between range.x.var[1] and range.x.var[2]. The default value is n.points=100.
plot.col: a numeric scalar or character string determining the color of the plotted line or points. The default value is plot.col="black". See the entry for col in the help file for par for more information.
plot.lwd: a numeric scalar determining the width of the plotted line. The default value is 3*par("cex"). See the entry for lwd in the help file for par for more information.
plot.lty: a numeric scalar determining the line type of the plotted line. The default value is plot.lty=1. See the entry for lty in the help file for par for more information.
digits: a scalar indicating how many significant digits to print out on the plot. The default value is the current setting of options("digits").
main, xlab, ylab, type, ...: additional graphical parameters (see par).
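When y.var="n", the plotted sample sizes come from a uniroot search for the n at which the half-width equals the target value. The following is a minimal one-sample sketch of such a search, assuming the usual Student's t half-width qt(1 - alpha/2, n - 1) * sigma.hat / sqrt(n). The helper n_sketch() is hypothetical and not part of EnvStats; the search actually used by ciNormN may differ in its details. The sketch shows where n.max, tol, maxiter, and round.up enter the computation.

```r
# n_sketch() is a HYPOTHETICAL helper (not an EnvStats function).
# One-sample case only; assumes half-width = t quantile * sigma.hat / sqrt(n).
n_sketch <- function(half.width, sigma.hat = 1, conf.level = 0.95,
                     round.up = FALSE, n.max = 5000,
                     tol = 1e-7, maxiter = 1000) {
  hw <- function(n) qt(1 - (1 - conf.level) / 2, n - 1) * sigma.hat / sqrt(n)
  f <- function(n) hw(n) - half.width
  # The half-width decreases as n grows, so a root in [2, n.max] exists
  # only if the target lies between hw(n.max) and hw(2).
  if (f(n.max) > 0) stop("half.width not attainable with n <= n.max")
  if (f(2) <= 0) return(2)
  n <- uniroot(f, lower = 2, upper = n.max, tol = tol, maxiter = maxiter)$root
  # round.up = TRUE returns the next larger integer sample size
  if (round.up) ceiling(n) else n
}

n_sketch(0.5)                  # fractional n between 17 and 18
n_sketch(0.5, round.up = TRUE)
```

Returning the fractional root by default mirrors round.up=FALSE, which gives a smooth curve on the plot; rounding up is what you would use when planning an actual number of samples.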

Details

See the help files for ciNormHalfWidth and ciNormN for information on how to compute a one-sample confidence interval for the mean of a normal distribution or a two-sample confidence interval for the difference between two means, how the half-width is computed when other quantities are fixed, and how the sample size is computed when other quantities are fixed.
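As a rough illustration of the half-width computation, here is a sketch that assumes the standard Student's t intervals: for one sample, qt(1 - alpha/2, n - 1) * sigma.hat / sqrt(n); for two samples with a pooled standard deviation estimate sigma.hat, qt(1 - alpha/2, n1 + n2 - 2) * sigma.hat * sqrt(1/n1 + 1/n2). The function hw_sketch() is hypothetical; consult ciNormHalfWidth for the computation EnvStats actually performs.

```r
# hw_sketch() is a HYPOTHETICAL helper (not an EnvStats function).
# Assumes Student's t intervals; the two-sample case assumes sigma.hat is
# a pooled standard deviation with n1 + n2 - 2 degrees of freedom.
hw_sketch <- function(n.or.n1, n2 = n.or.n1, sigma.hat = 1,
                      conf.level = 0.95,
                      sample.type = c("one.sample", "two.sample")) {
  sample.type <- match.arg(sample.type)
  alpha <- 1 - conf.level
  if (sample.type == "one.sample") {
    df <- n.or.n1 - 1
    se.factor <- sqrt(1 / n.or.n1)
  } else {
    df <- n.or.n1 + n2 - 2
    se.factor <- sqrt(1 / n.or.n1 + 1 / n2)
  }
  qt(1 - alpha / 2, df) * sigma.hat * se.factor
}

hw_sketch(25)                                   # one sample of size 25
hw_sketch(25, 25, sample.type = "two.sample")   # two groups of size 25
```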

Value

plotCiNormDesign invisibly returns a list with components:

x.var: x-coordinates of points that have been or would have been plotted.
y.var: y-coordinates of points that have been or would have been plotted.
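To make the shape of this list concrete, the sketch below builds a list of the same form in base R for x.var="n" and y.var="half.width" in the one-sample case, using n.points evenly spaced x-values as described under the argument n.points. The function design_points_sketch() is hypothetical, and the half-width formula (Student's t, one sample) is an assumption; with EnvStats installed you would obtain the real list from plotCiNormDesign(plot.it = FALSE).

```r
# design_points_sketch() is a HYPOTHETICAL helper mimicking the
# (x.var, y.var) list; it assumes the one-sample Student's t half-width.
design_points_sketch <- function(range.x.var = c(2, 50), n.points = 100,
                                 sigma.hat = 1, conf.level = 0.95) {
  # n.points x-values evenly spaced over range.x.var (may be fractional)
  x <- seq(range.x.var[1], range.x.var[2], length.out = n.points)
  y <- qt(1 - (1 - conf.level) / 2, x - 1) * sigma.hat / sqrt(x)
  list(x.var = x, y.var = y)
}

pts <- design_points_sketch()
str(pts)
```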

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.

Millard, S.P., and N. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.21-3.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapters 7 and 8.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The normal distribution and lognormal distribution are probably the two most frequently used distributions to model environmental data. In order to make any kind of probability statement about a normally-distributed population (of chemical concentrations, for example), you first have to estimate the mean and standard deviation (the population parameters) of the distribution. Once you estimate these parameters, it is often useful to characterize the uncertainty in the estimate of the mean, and a confidence interval for the mean is the standard way to do this.

In the course of designing a sampling program, an environmental scientist may wish to determine the relationship between sample size, confidence level, and half-width if one of the objectives of the sampling program is to produce confidence intervals. The functions ciNormHalfWidth, ciNormN, and plotCiNormDesign can be used to investigate these relationships for the case of normally-distributed observations.
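The three-way trade-off can be seen in a small table. The sketch below assumes the one-sample Student's t half-width with sigma.hat = 1 (an illustrative choice) and tabulates the half-width for a few sample sizes and confidence levels: the half-width shrinks as n grows and widens as the confidence level rises.

```r
# Assumes the one-sample Student's t half-width with sigma.hat = 1.
n <- c(10, 20, 50)
conf <- c(0.80, 0.90, 0.95, 0.99)
hw <- outer(n, conf, function(n, cl) qt(1 - (1 - cl) / 2, n - 1) / sqrt(n))
dimnames(hw) <- list(n = as.character(n), conf.level = as.character(conf))
round(hw, 3)
```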

Examples

  # Look at the relationship between half-width and sample size 
  # for a one-sample confidence interval for the mean, assuming 
  # an estimated standard deviation of 1 and a confidence level of 95%.

  dev.new()
  plotCiNormDesign()

  #--------------------------------------------------------------------

  # Plot sample size vs. the estimated standard deviation for 
  # various levels of confidence, using a half-width of 0.5.

  dev.new()
  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", main = "") 

  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", conf.level = 0.9, 
    add = TRUE, plot.col = 2) 

  plotCiNormDesign(x.var = "sigma.hat", y.var = "n", conf.level = 0.8, 
    add = TRUE, plot.col = 3) 

  legend(0.25, 60, c("95%", "90%", "80%"), lty = 1, lwd = 3, col = 1:3) 

  mtext("Sample Size vs. Estimated SD for Confidence Interval for Mean",
    font = 2, cex = 1.25, line = 2.75)
  mtext("with Half-Width=0.5 and Various Confidence Levels", font = 2, 
    cex = 1.25, line = 1.25)

  #--------------------------------------------------------------------

  # Modifying the example on pages 21-4 to 21-5 of USEPA (2009), 
  # look at the relationship between half-width and sample size for a 
  # 95% confidence interval for the mean level of Aldicarb at the 
  # first compliance well.  Use the estimated standard deviation from 
  # the first four months of data. 
  # (The data are stored in EPA.09.Ex.21.1.aldicarb.df.)

  EPA.09.Ex.21.1.aldicarb.df
#>    Month   Well Aldicarb.ppb
#> 1      1 Well.1         19.9
#> 2      2 Well.1         29.6
#> 3      3 Well.1         18.7
#> 4      4 Well.1         24.2
#> 5      1 Well.2         23.7
#> 6      2 Well.2         21.9
#> 7      3 Well.2         26.9
#> 8      4 Well.2         26.1
#> 9      1 Well.3          5.6
#> 10     2 Well.3          3.3
#> 11     3 Well.3          2.3
#> 12     4 Well.3          6.9

  mu.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    mean(Aldicarb.ppb[Well=="Well.1"]))

  mu.hat 
#> [1] 23.1

  sigma.hat <- with(EPA.09.Ex.21.1.aldicarb.df, 
    sd(Aldicarb.ppb[Well=="Well.1"]))

  sigma.hat 
#> [1] 4.93491

  dev.new()
  plotCiNormDesign(sigma.hat = sigma.hat, digits = 2, 
    range.x.var = c(2, 25))

  #==========

  # Clean up
  #---------
  rm(mu.hat, sigma.hat)
  graphics.off()
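As a numeric cross-check on the Aldicarb example, the sketch below computes the 95% half-width for the current Well.1 sample of n = 4 directly, assuming the one-sample Student's t formula, and compares it with the interval reported by base R's t.test(). The data values are copied from the listing of EPA.09.Ex.21.1.aldicarb.df above, so this runs without EnvStats.

```r
# Well.1 Aldicarb values (ppb), copied from the data listing above
aldicarb.well1 <- c(19.9, 29.6, 18.7, 24.2)
n <- length(aldicarb.well1)
sd.well1 <- sd(aldicarb.well1)

# One-sample 95% t-interval half-width for n = 4 (assumed formula)
half.width <- qt(0.975, n - 1) * sd.well1 / sqrt(n)

# The same half-width recovered from the interval computed by t.test()
ci <- t.test(aldicarb.well1, conf.level = 0.95)$conf.int
all.equal(half.width, diff(as.numeric(ci)) / 2)
```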