Create Plots for a Sampling Design Based on a One-Way Fixed-Effects Analysis of Variance

Create plots involving sample size, power, scaled difference, and significance level for a one-way fixed-effects analysis of variance.

plotAovDesign(x.var = "n", y.var = "power", range.x.var = NULL, 
    n.vec = c(25, 25), mu.vec = c(0, 1), sigma = 1, alpha = 0.05, power = 0.95, 
    round.up = FALSE, n.max = 5000, tol = 1e-07, maxiter = 1000, plot.it = TRUE, 
    add = FALSE, n.points = 50, plot.col = 1, plot.lwd = 3 * par("cex"), 
    plot.lty = 1, digits = .Options$digits, main = NULL, xlab = NULL, ylab = NULL, 
    type = "l", ...)

Arguments

x.var: character string indicating what variable to use for the x-axis. Possible values are "n" (sample size; the default), "power" (power of the test), and "alpha" (significance level of the test).
y.var: character string indicating what variable to use for the y-axis. Possible values are "power" (power of the test; the default) and "n" (sample size).
range.x.var: numeric vector of length 2 indicating the range of the x-variable to use for the plot. The default value depends on the value of x.var. When x.var="n" the default value is c(2,50). When x.var="power" the default value is
c(alpha+.Machine$double.eps, 0.95). When x.var="alpha", the default value is c(0.01, 0.2).
n.vec: numeric vector indicating the sample size for each group. The default value is n.vec=c(25, 25). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. This argument must be the same length as mu.vec. This argument is ignored if either x.var="n" or y.var="n".
mu.vec: numeric vector indicating the population mean for each group. The default value is mu.vec=c(0, 1). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed. This argument must be the same length as n.vec.
sigma: numeric scalar indicating the population standard deviation for all groups. The default value is sigma=1. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.
alpha: numeric scalar between 0 and 1 indicating the Type I error level associated with the hypothesis test. The default value is alpha=0.05. This argument is ignored when x.var="alpha".
power: numeric scalar between 0 and 1 indicating the power associated with the hypothesis test. The default value is power=0.95. This argument is ignored when x.var="power" or y.var="power".
round.up: logical scalar indicating whether to round up the values of the computed sample size(s) to the next smallest integer. The default value is FALSE. This argument is ignored unless y.var="n".
n.max: for the case when y.var="n", a positive integer greater than 2 indicating the maximum sample size per group. The default value is n.max=5000.
tol: for the case when y.var="n", numeric scalar indicating the tolerance to use in the uniroot search for the sample size. The default value is tol=1e-7.
maxiter: for the case when y.var="n", positive integer greater then 1 indicating the maximum number of iterations to use in the uniroot search for the sample size. The default value is maxiter=1000.
plot.it: a logical scalar indicating whether to create a plot or add to the existing plot (see add) on the current graphics device. If plot.it=FALSE, no plot is produced, but a list of (x,y) values is returned (see VALUE). The default value is plot.it=TRUE.
add: a logical scalar indicating whether to add the design plot to the existing plot (add=TRUE), or to create a new plot (add=FALSE). The default value is
add=FALSE. This argument is ignored if plot.it=FALSE.
n.points: a numeric scalar specifying how many (x,y) pairs to use to produce the plot. There are n.points x-values evenly spaced between range.x.var[1] and
range.x.var[2]. The default value is n.points=50.
plot.col: a numeric scalar or character string determining the color of the plotted line or points. The default value is plot.col=1. See the entry for col in the help file for par for more information.
plot.lwd: a numeric scalar determining the width of the plotted line. The default value is 3*par("cex"). See the entry for lwd in the help file for par for more information.
plot.lty: a numeric scalar determining the line type of the plotted line. The default value is plot.lty=1. See the entry for lty in the help file for par for more information.
digits: a scalar indicating how many significant digits to print out on the plot. The default value is the current setting of options("digits").
main, xlab, ylab, type, ...: additional graphical parameters (see par).

Details

See the help files for aovPower and aovN for information on how to compute the power and sample size for a one-way fixed-effects analysis of variance.

Value

plotAovDesign invisibly returns a list with components:

x.var: x-coordinates of the points that have been or would have been plotted
y.var: y-coordinates of the points that have been or would have been plotted

References

Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapter 17.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, Chapter 7.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 27, 29, 30.

Scheffe, H. (1959). The Analysis of Variance. John Wiley and Sons, New York, 477pp.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 10.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The normal and lognormal distribution are probably the two most frequently used distributions to model environmental data. Sometimes it is necessary to compare several means to determine whether any are significantly different from each other (e.g., USEPA, 2009, p.6-38). In this case, assuming normally distributed data, you perform a one-way parametric analysis of variance.

In the course of designing a sampling program, an environmental scientist may wish to determine the relationship between sample size, Type I error level, power, and differences in means if one of the objectives of the sampling program is to determine whether a particular mean differs from a group of means. The functions aovPower, aovN, and plotAovDesign can be used to investigate these relationships for the case of normally-distributed observations.

Examples

  # Look at the relationship between power and sample size 
  # for a one-way ANOVA, assuming k=2 groups, group means of 
  # 0 and 1, a population standard deviation of 1, and a 
  # 5% significance level:

  dev.new()
  plotAovDesign()

  #--------------------------------------------------------------------

  # Plot power vs. sample size for various levels of significance:

  dev.new()
  plotAovDesign(mu.vec = c(0, 0.5, 1), ylim=c(0, 1), main="") 

  plotAovDesign(mu.vec = c(0, 0.5, 1), alpha=0.1, add=TRUE, plot.col=2) 

  plotAovDesign(mu.vec = c(0, 0.5, 1), alpha=0.2, add=TRUE, plot.col=3) 

  legend(35, 0.6, c("20%", "10%", "   5%"), lty=1, lwd = 3, col=3:1, 
    bty = "n") 

  mtext("Power vs. Sample Size for One-Way ANOVA", line = 3, cex = 1.25)
  mtext(expression(paste("with ", mu, "=(0, 0.5, 1), ", sigma, 
    "=1, and Various Significance Levels", sep="")), 
    line = 1.5, cex = 1.25)

  #--------------------------------------------------------------------

  # The example on pages 5-11 to 5-14 of USEPA (1989b) shows 
  # log-transformed concentrations of lead (mg/L) at two 
  # background wells and four compliance wells, where 
  # observations were taken once per month over four months 
  # (the data are stored in EPA.89b.loglead.df).  
  # Assume the true mean levels at each well are 
  # 3.9, 3.9, 4.5, 4.5, 4.5, and 5, respectively.  Plot the 
  # power vs. sample size of a one-way ANOVA to test for mean 
  # differences between wells.  Use alpha=0.05, and assume the 
  # true standard deviation is equal to the one estimated 
  # from the data in this example.

  names(EPA.89b.loglead.df) 
#> [1] "LogLead"   "Month"     "Well"      "Well.type"
  #[1] "LogLead"   "Month"     "Well"      "Well.type"

  # Perform the ANOVA and get the estimated sd 
  aov.list <- aov(LogLead ~ Well, data=EPA.89b.loglead.df) 

  summary(aov.list) 
#>             Df Sum Sq Mean Sq F value Pr(>F)  
#> Well         5  5.745  1.1489   3.347  0.026 *
#> Residuals   18  6.179  0.3433                 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  #            Df Sum Sq Mean Sq F value  Pr(>F)  
  #Well         5 5.7447 1.14895  3.3469 0.02599 *
  #Residuals   18 6.1791 0.34328                  
  #---
  #Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
 
  # Now create the plot 
  dev.new()
  plotAovDesign(range.x.var = c(2, 20), 
    mu.vec = c(3.9,3.9,4.5,4.5,4.5,5), 
    sigma=sqrt(0.34), 
    ylim = c(0, 1), digits=2)

  # Clean up
  #---------
  rm(aov.list)
  graphics.off()