Produce a cumulative distribution function (cdf) plot for a user-specified distribution.

cdfPlot(distribution = "norm", param.list = list(mean = 0, sd = 1), 
    left.tail.cutoff = ifelse(is.finite(supp.min), 0, 0.001), 
    right.tail.cutoff = ifelse(is.finite(supp.max), 0, 0.001), plot.it = TRUE, 
    add = FALSE, n.points = 1000, cdf.col = "black", cdf.lwd = 3 * par("cex"), 
    cdf.lty = 1, curve.fill = FALSE, curve.fill.col = "cyan", 
    digits = .Options$digits, ..., type = ifelse(discrete, "s", "l"), 
    main = NULL, xlab = NULL, ylab = NULL, xlim = NULL, ylim = NULL)

Arguments

distribution

a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df for a list of possible distribution abbreviations.

param.list

a list with values for the parameters of the distribution. The default value is param.list=list(mean=0, sd=1). See the help file for Distribution.df for the names and possible values of the parameters associated with each distribution.

left.tail.cutoff

a numeric scalar indicating what proportion of the left-tail of the probability distribution to omit from the plot. For densities with a finite support minimum (e.g., Lognormal) the default value is 0; for all other densities the default value is 0.001.

right.tail.cutoff

a numeric scalar indicating what proportion of the right-tail of the probability distribution to omit from the plot. For densities with a finite support maximum (e.g., Binomial) the default value is 0; for all other densities the default value is 0.001.
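As a rough illustration of what the two cutoff arguments mean, the plotted range can be thought of as running from the left.tail.cutoff quantile to the (1 - right.tail.cutoff) quantile. The base R sketch below only mimics that idea; it is an assumption about the behavior described above, not the internal code of cdfPlot, and the variable names are invented for the example.

```r
# Sketch only: illustrates the tail-cutoff idea with base R quantile functions.
left.cut  <- 0.001   # proportion omitted from the left tail
right.cut <- 0.001   # proportion omitted from the right tail

# Standard normal: unbounded support on both sides, so both tails are trimmed
x.range.norm <- qnorm(c(left.cut, 1 - right.cut), mean = 0, sd = 1)

# Lognormal: the support minimum is finite (0), so only the right tail is trimmed
x.range.lnorm <- qlnorm(c(0, 1 - right.cut), meanlog = 0, sdlog = 1)

print(x.range.norm)
print(x.range.lnorm)
```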

plot.it

a logical scalar indicating whether to draw on the current graphics device, either by creating a new plot or by adding to the existing plot (see add). If plot.it=FALSE, no plot is produced, but a list of \((x, y)\) values is returned (see the section Value below). The default value is plot.it=TRUE.

add

a logical scalar indicating whether to add the cumulative distribution function curve to the existing plot (add=TRUE), or to create a new plot (add=FALSE; the default). This argument is ignored if plot.it=FALSE.

n.points

a numeric scalar specifying at how many evenly-spaced points the cumulative distribution function will be evaluated. The default value is n.points=1000.

cdf.col

a numeric scalar or character string determining the color of the cdf line in the plot. The default value is cdf.col="black". See the entry for col in the help file for par for more information.

cdf.lwd

a numeric scalar determining the width of the cdf line in the plot. The default value is cdf.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.

cdf.lty

a numeric scalar determining the line type of the cdf line in the plot. The default value is cdf.lty=1. See the entry for lty in the help file for par for more information.

curve.fill

a logical value indicating whether to fill in the area below the cumulative distribution function curve with the color specified by curve.fill.col. The default value is curve.fill=FALSE.

curve.fill.col

when curve.fill=TRUE, a numeric scalar or character string indicating what color to use to fill in the area below the cumulative distribution function curve. The default value is curve.fill.col="cyan". See the entry for col in the help file for par for more information.

digits

a scalar indicating how many significant digits to print for the distribution parameters. The default value is digits=.Options$digits.

type, main, xlab, ylab, xlim, ylim, ...

additional graphical parameters (see lines and par). In particular, the argument type specifies how the points are drawn. By default, the function cdfPlot plots a step function (type="s") for discrete distributions, and connects the points with straight line segments (type="l") otherwise. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).
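The difference between the two default values of type can be seen with base graphics alone. The following is a minimal sketch that assumes a Binomial(5, 0.5) distribution purely for illustration; it does not call cdfPlot.

```r
# Sketch only: base R comparison of type = "s" and type = "l" for a discrete cdf.
x <- 0:5
p <- pbinom(x, size = 5, prob = 0.5)   # cdf at each support point

# Step function, the default that cdfPlot uses for discrete distributions
plot(x, p, type = "s", ylim = c(0, 1), xlab = "x", ylab = "F(x)")

# Linear interpolation between the same points, drawn for contrast
lines(x, p, type = "l", lty = 2, col = "red")
```

The dashed line suggests intermediate cdf values between support points that a discrete distribution does not actually take, which is why the step form is the more faithful picture in that case.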

Details

The cumulative distribution function (cdf) of a random variable \(X\), usually denoted \(F\), is defined as: $$F(x) = Pr(X \le x) \;\;\;\;\;\; (1)$$ That is, \(F(x)\) is the probability that \(X\) is less than or equal to \(x\). This is the probability that the random variable \(X\) takes on a value in the interval \((-\infty, x]\) and is simply the (Lebesgue) integral of the pdf evaluated between \(-\infty\) and \(x\). That is, $$F(x) = Pr(X \le x) = \int_{-\infty}^x f(t) dt \;\;\;\;\;\; (2)$$ where \(f(t)\) denotes the probability density function of \(X\) evaluated at \(t\). For discrete distributions, Equation (2) translates to summing up the probabilities of all values in this interval: $$F(x) = Pr(X \le x) = \sum_{t \in (-\infty,x]} f(t) = \sum_{t \in (-\infty,x]} Pr(X = t) \;\;\;\;\;\; (3)$$
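Equations (2) and (3) can be checked numerically with base R. The sketch below assumes a standard normal distribution for the continuous case and a Binomial(10, 0.3) distribution for the discrete case; both choices, and the variable names, are arbitrary and only serve the illustration.

```r
# Equation (2): integrate the standard normal pdf from -Inf to x
x <- 1
F.by.integration <- integrate(dnorm, lower = -Inf, upper = x)$value
F.by.pnorm <- pnorm(x)

# Equation (3): sum the binomial probabilities of all support points <= k
k <- 3
F.by.summation <- sum(dbinom(0:k, size = 10, prob = 0.3))
F.by.pbinom <- pbinom(k, size = 10, prob = 0.3)

# Each pair should agree up to numerical error
print(c(F.by.integration, F.by.pnorm))
print(c(F.by.summation, F.by.pbinom))
```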

A cumulative distribution function (cdf) plot displays the values of the cdf against quantiles of the specified distribution. Theoretical cdf plots are sometimes plotted along with empirical cdf plots to visually assess whether data have a particular distribution.
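The comparison of a theoretical cdf with an empirical cdf can be sketched in base R as follows. The data here are simulated and the seed is arbitrary; in practice x would hold real observations and the parameters of the theoretical curve would be hypothesized or estimated.

```r
# Sketch only: overlay a theoretical normal cdf on an empirical cdf.
set.seed(47)                        # arbitrary seed, for reproducibility
x <- rnorm(50, mean = 0, sd = 1)    # simulated data standing in for observations

Fn <- ecdf(x)                       # empirical cdf (a step function)
plot(Fn, main = "Empirical vs. theoretical cdf", xlab = "x", ylab = "F(x)")
curve(pnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)

# Largest vertical gap at the observed points, a rough numeric companion to the plot
xs <- sort(x)
max.gap <- max(abs(Fn(xs) - pnorm(xs)))
print(max.gap)
```

If the data really come from the hypothesized distribution, the step function should track the smooth curve closely and max.gap should be small.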

Value

cdfPlot invisibly returns a list giving coordinates of the points that have been or would have been plotted:

Quantiles

The quantiles used for the plot.

Cumulative.Probabilities

The values of the cdf associated with the quantiles.
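The returned list can be captured without drawing anything by setting plot.it=FALSE. The sketch below assumes the EnvStats package is installed; the guard simply skips the call when it is not, and the object name cdf.list is chosen only for this example.

```r
# Sketch only: capture the plotting coordinates without producing a plot.
cdf.list <- NULL
if (requireNamespace("EnvStats", quietly = TRUE)) {
  cdf.list <- EnvStats::cdfPlot(distribution = "norm",
    param.list = list(mean = 0, sd = 1), plot.it = FALSE)

  # Inspect the two components described above
  str(cdf.list$Quantiles)
  str(cdf.list$Cumulative.Probabilities)
}
```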

References

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Johnson, N. L., S. Kotz, and A. W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Examples

  # Plot the cdf of the standard normal distribution 
  #-------------------------------------------------
  dev.new()
  cdfPlot()

  #==========

  # Plot the cdf of the standard normal distribution
  # and a N(2, 2) distribution on the same plot. 
  #-------------------------------------------------
  dev.new()
  cdfPlot(param.list = list(mean=2, sd=2), main = "") 

  cdfPlot(add = TRUE, cdf.col = "red") 

  legend("topleft", legend = c("N(2,2)", "N(0,1)"), 
    col = c("black", "red"), lwd = 3 * par("cex")) 

  title("CDF Plots for Two Normal Distributions")
 
  #==========

  # Clean up
  #---------
  graphics.off()