Plot Empirical Probability Density Function

Produces an empirical probability density function plot.

epdfPlot(x, discrete = FALSE, density.arg.list = NULL, plot.it = TRUE, 
    add = FALSE, epdf.col = "black", epdf.lwd = 3 * par("cex"), epdf.lty = 1, 
    curve.fill = FALSE, curve.fill.col = "cyan", ..., 
    type = ifelse(discrete, "h", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)

Arguments

x: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
discrete: logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).
density.arg.list: list with arguments to the density function. The default value is density.arg.list=NULL. This argument is ignored if discrete=TRUE.
plot.it: logical scalar indicating whether to produce a plot or add to the current plot (see add) on the current graphics device. The default value is plot.it=TRUE.
add: logical scalar indicating whether to add the empirical pdf to the current plot (add=TRUE) or generate a new plot (add=FALSE; the default). This argument is ignored if plot.it=FALSE.
epdf.col: a numeric scalar or character string determining the color of the empirical pdf line or points. The default value is epdf.col="black". See the entry for col in the help file for par for more information.
epdf.lwd: a numeric scalar determining the width of the empirical pdf line. The default value is epdf.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.
epdf.lty: a numeric scalar determining the line type of the empirical pdf line. The default value is epdf.lty=1. See the entry for lty in the help file for par for more information.
curve.fill: a logical scalar indicating whether to fill in the area below the empirical pdf curve with the color specified by curve.fill.col. The default value is curve.fill=FALSE.
curve.fill.col: a numeric scalar or character string indicating what color to use to fill in the area below the empirical pdf curve. The default value is curve.fill.col="cyan". This argument is ignored if curve.fill=FALSE.
type, main, xlab, ylab, xlim, ylim, ...: additional graphical parameters (see lines and par). In particular, the argument type specifies the type of plot drawn. By default, the function epdfPlot plots histogram-like vertical lines (type="h") when discrete=TRUE, and joins the points with straight line segments (type="l") when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="h" for histogram-like vertical lines, type="l" for linear interpolation, type="p" for points only, etc.).
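
As an illustration of density.arg.list, the sketch below passes adjust = 2 (a standard argument of density that doubles the bandwidth) through to the density estimate. It assumes the EnvStats package is installed; the sample obs and the object names are made up for the example, and plot.it = FALSE is used so that no graphics device is needed.

```r
# Assumes EnvStats is installed; obs is a hypothetical sample.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(23)
  obs <- rnorm(50)

  # Default density settings.
  epdf.default <- EnvStats::epdfPlot(obs, plot.it = FALSE)

  # Same data, with arguments forwarded to density() via density.arg.list.
  epdf.smooth <- EnvStats::epdfPlot(obs, plot.it = FALSE,
    density.arg.list = list(adjust = 2))
}
```

Any other argument documented in the help file for density (for example bw or kernel) could be supplied in the list in the same way.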

Details

When a distribution is discrete and can only take on a finite number of values, the empirical pdf plot is the same as the standard relative frequency histogram; that is, each bar of the histogram represents the proportion of the sample equal to that particular number (or category). When a distribution is continuous, the function epdfPlot calls the R function density to compute the estimated probability density at a number of evenly spaced points between the minimum and maximum values.
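
The two cases can be sketched with base R alone. This illustrates the computations described above and is not the source code of epdfPlot; the samples and variable names are invented for the example.

```r
# Hypothetical samples; any numeric vectors would do.
set.seed(1)
counts <- rpois(20, lambda = 10)
obs <- rnorm(50)

# Discrete case: proportion of the sample equal to each observed value.
tab <- table(counts)
values <- as.numeric(names(tab))
rel.freq <- as.vector(tab) / length(counts)

# Continuous case: kernel density estimate evaluated on an
# evenly spaced grid of points.
dens <- density(obs)
grid.step <- diff(dens$x)
```

Plotting values against rel.freq with type = "h", or dens$x against dens$y with type = "l", gives pictures comparable to the default discrete and continuous plots.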

Value

epdfPlot invisibly returns a list with the following components:

x: numeric vector of ordered quantiles.
f.x: numeric vector of the associated estimated values of the pdf.
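
Because the list is returned invisibly, it has to be assigned to be inspected. The sketch below assumes the EnvStats package is installed and uses a made-up Poisson sample; for a discrete sample the values in f.x are proportions, so they would be expected to sum to 1.

```r
# Assumes EnvStats is installed; counts is a hypothetical sample.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(47)
  counts <- rpois(20, lambda = 10)

  # Assign the result to capture the invisibly returned list.
  epdf <- EnvStats::epdfPlot(counts, discrete = TRUE, plot.it = FALSE)

  str(epdf)                # components x and f.x
  total <- sum(epdf$f.x)   # sum of the relative frequencies
}
```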

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA.

See the REFERENCES section in the help file for density.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

An empirical probability density function (epdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms and boxplots to assess the characteristics of a set of data.

Examples

  # Using Reference Area TcCB data in EPA.94b.tccb.df, 
  # create a histogram of the log-transformed observations, 
  # then superimpose the empirical pdf plot.

  dev.new()
  log.TcCB <- with(EPA.94b.tccb.df, log(TcCB[Area == "Reference"]))

  hist(log.TcCB, freq = FALSE, xlim = c(-2, 1),
    col = "cyan", xlab = "log [ TcCB (ppb) ]",
    ylab = "Relative Frequency", 
    main = "Reference Area TcCB with Empirical PDF")

  epdfPlot(log.TcCB, add = TRUE)

  #==========

  # Generate 20 observations from a Poisson distribution with 
  # parameter lambda = 10, and plot the empirical PDF.

  set.seed(875)
  x <- rpois(20, lambda = 10)
  dev.new()
  epdfPlot(x, discrete = TRUE)

  #==========

  # Clean up
  #---------
  rm(log.TcCB, x)
  graphics.off()