cdfCompare.Rd
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdfs on the same plot. These plots are used to graphically assess goodness of fit.
cdfCompare(x, y = NULL, discrete = FALSE,
prob.method = ifelse(discrete, "emp.probs", "plot.pos"), plot.pos.con = NULL,
distribution = "norm", param.list = NULL,
estimate.params = is.null(param.list), est.arg.list = NULL,
x.col = "blue", y.or.fitted.col = "black",
x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"),
x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
x
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
y
a numeric vector (not necessarily of the same length as x). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. The default value is y=NULL, in which case the empirical cdf of x will be plotted along with the theoretical cdf specified by the argument distribution.
discrete
logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).
prob.method
character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are "plot.pos" (plotting positions, the default if discrete=FALSE) and "emp.probs" (empirical probabilities, the default if discrete=TRUE). See the help file for ecdfPlot for more explanation.
plot.pos.con
numeric scalar between 0 and 1 containing the value of the plotting position constant. When y is supplied, the default value is plot.pos.con=0.375. When y is not supplied, for the normal, lognormal, three-parameter lognormal, zero-modified normal, and zero-modified lognormal distributions, the default value is plot.pos.con=0.375. For the Type I extreme value (Gumbel) distribution (distribution="evd"), the default value is plot.pos.con=0.44. For all other distributions, the default value is plot.pos.con=0.4. See the help files for ecdfPlot and qqPlot for more information. This argument is ignored if prob.method="emp.probs".
distribution
when y is not supplied, a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df for a list of possible distribution abbreviations. This argument is ignored if y is supplied.
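param.list
when y is not supplied, a list with values for the parameters of the distribution (for example, param.list=list(mean=0, sd=1) for a standard normal distribution). The default in the function signature is param.list=NULL, in which case estimate.params defaults to TRUE and the parameters are estimated from x. See the help file for Distribution.df for the names and possible values of the parameters associated with each distribution. This argument is ignored if y is supplied or estimate.params=TRUE.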
estimate.params
when y is not supplied, a logical scalar indicating whether to compute the cdf for x based on estimating the distribution parameters (estimate.params=TRUE) or using the known distribution parameters specified in param.list (estimate.params=FALSE). The default value is TRUE unless the argument param.list is supplied. The argument estimate.params is ignored if y is supplied.
est.arg.list
when y is not supplied and estimate.params=TRUE, a list whose components are optional arguments associated with the function used to estimate the parameters of the assumed distribution (see the help file Estimating Distribution Parameters). For example, all functions used to estimate distribution parameters have an optional argument called method that specifies the method to use to estimate the parameters. (See the help file for Distribution.df for a list of available estimation methods for each distribution.) To override the default estimation method, supply the argument est.arg.list with a component called method; for example, est.arg.list=list(method="mle"). The default value is est.arg.list=NULL, so that all default values for the estimating function are used. This argument is ignored if estimate.params=FALSE or y is supplied.
x.col
a numeric scalar or character string determining the color of the line or points of the empirical cdf based on x. The default value is x.col="blue". See the entry for col in the help file for par for more information.
y.or.fitted.col
a numeric scalar or character string determining the color of the line or points of the empirical cdf based on y, or of the theoretical cdf. The default value is y.or.fitted.col="black". See the entry for col in the help file for par for more information.
x.lwd
a numeric scalar determining the width of the line for the empirical cdf based on x. The default value is x.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.
y.or.fitted.lwd
a numeric scalar determining the width of the line for the empirical cdf based on y, or for the theoretical cdf. The default value is y.or.fitted.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.
x.lty
a numeric scalar determining the line type of the line for the empirical cdf based on x. The default value is x.lty=1. See the entry for lty in the help file for par for more information.
y.or.fitted.lty
a numeric scalar determining the line type of the line for the empirical cdf based on y, or for the theoretical cdf. The default value is y.or.fitted.lty=2. See the entry for lty in the help file for par for more information.
digits
when y is not supplied, a scalar indicating how many significant digits to print for the distribution parameters. The default value is digits=.Options$digits.
...
additional graphical parameters (see lines and par). In particular, the argument type specifies the type of plot that is drawn. By default, the function cdfCompare plots a step function (type="s") when discrete=TRUE, and plots a straight line between points (type="l") when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).
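The difference between the two values of prob.method can be illustrated with a short base R sketch. It assumes the usual plotting-position formula (i - a) / (n - 2a + 1), where a is the plotting position constant, and takes empirical probabilities to be the proportion of observations less than or equal to each order statistic; the help file for ecdfPlot is the authoritative source for what the package computes. The helper names plotting_positions and empirical_probs are hypothetical and are not part of any package.

```r
# Hypothetical helper: plotting positions (i - a) / (n - 2a + 1)
# for the order statistics of x (assumed formula; see ?ecdfPlot).
plotting_positions <- function(x, a = 0.375) {
  x <- sort(x[is.finite(x)])   # drop NA, NaN, Inf, -Inf
  n <- length(x)
  (seq_len(n) - a) / (n - 2 * a + 1)
}

# Hypothetical helper: empirical probabilities, i.e. the proportion
# of observations less than or equal to each order statistic.
empirical_probs <- function(x) {
  x <- sort(x[is.finite(x)])
  ecdf(x)(x)
}

set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

pp <- plotting_positions(x)   # strictly between 0 and 1
ep <- empirical_probs(x)      # reaches exactly 1 at the maximum

# Side-by-side comparison for each order statistic
round(cbind(sorted.x = sort(x), plot.pos = pp, emp.probs = ep), 3)
```

Plotting positions never reach 0 or 1, which is convenient when the values are later compared with a continuous distribution; empirical probabilities do reach 1 at the largest observation, which suits a step function for discrete data.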
When both x and y are supplied, the function cdfCompare creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlot.
When y is not supplied, the function cdfCompare creates the empirical cdf plot of x (by calling ecdfPlot) and the theoretical cdf plot (by calling cdfPlot and using the argument distribution) on the same plot.
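For readers who want to see the idea of the one-sample comparison without the package, the following is a rough base R sketch. It is not the implementation of cdfCompare: it assumes a normal distribution, estimates the parameters with the sample mean and standard deviation, and uses ppoints with a = 0.375 for the plotting positions. The colors, widths, and line types simply mirror the defaults listed in the usage.

```r
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Empirical cdf: order statistics against plotting positions
x.sorted <- sort(x)
n <- length(x.sorted)
p.emp <- ppoints(n, a = 0.375)   # (i - a) / (n + 1 - 2a)

# Fitted cdf: normal with parameters estimated from the sample
# (sample mean and sd are an assumption made for this sketch)
q <- seq(min(x.sorted), max(x.sorted), length.out = 200)
p.fit <- pnorm(q, mean = mean(x), sd = sd(x))

# Draw on a null device so the sketch also runs non-interactively;
# remove the pdf(NULL) and dev.off() lines to see the plot on screen.
pdf(NULL)
plot(x.sorted, p.emp, type = "l", col = "blue", lwd = 3, lty = 1,
     ylim = c(0, 1), xlab = "x", ylab = "Cumulative Probability")
lines(q, p.fit, col = "black", lwd = 3, lty = 2)
invisible(dev.off())
```

A close match between the solid and dashed curves over the whole range of x suggests that the assumed distribution is plausible; systematic gaps in one region point to lack of fit there.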
When y is supplied, cdfCompare invisibly returns a list with components:
a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the x values.
a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the y values.
When y is not supplied, cdfCompare invisibly returns a list with components:
a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the x values.
a list with components Quantiles and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the fitted cdf.
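Because the value is returned invisibly, it has to be assigned to be inspected. The sketch below assumes the EnvStats package is installed and is written so that it does nothing otherwise; it relies only on the returned object being a list, as described above, and uses str to show whatever components are present rather than assuming their names.

```r
res <- NULL

# Guarded so the snippet is a no-op when EnvStats is not installed
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(250)
  x <- rnorm(20, mean = 10, sd = 2)

  pdf(NULL)                        # null device; drop for on-screen plots
  res <- EnvStats::cdfCompare(x)   # assign the invisibly returned list
  invisible(dev.off())

  str(res, max.level = 2)          # inspect the plotted coordinates
}
```

The stored coordinates can be reused, for example to redraw the comparison with a different graphics system.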
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the \(y\)-axis vs. the ecdf on the \(x\)-axis and call this a quantile plot.
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompare) to graphically assess whether a sample of observations comes from a particular distribution. The Kolmogorov-Smirnov goodness-of-fit test (see gofTest) is the statistical companion of this kind of comparison; it is based on the maximum vertical distance between the empirical cdf plot and the theoretical cdf plot. More often, however, quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess departures from an assumed distribution (see qqPlot).
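The "maximum vertical distance" behind the Kolmogorov-Smirnov statistic can be computed by hand. The sketch below uses base R's ks.test as a stand-in for gofTest and assumes a fully specified normal distribution (mean 10, sd 2) rather than estimated parameters. Note that the statistic is defined on the step-function ecdf (probabilities i/n), not on plotting positions.

```r
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Hypothesized distribution with known (not estimated) parameters
mu <- 10
sigma <- 2

x.sorted <- sort(x)
n <- length(x.sorted)
F0 <- pnorm(x.sorted, mean = mu, sd = sigma)   # theoretical cdf at the data

# The ecdf jumps at each observation, so the largest gap occurs either
# just after a jump (i/n - F0) or just before it (F0 - (i - 1)/n).
D.plus  <- max(seq_len(n) / n - F0)
D.minus <- max(F0 - (seq_len(n) - 1) / n)
D <- max(D.plus, D.minus)

# Compare with the statistic reported by the two-sided test
ks <- ks.test(x, "pnorm", mean = mu, sd = sigma)
all.equal(D, unname(ks$statistic))
```

On a cdf comparison plot drawn with a step-function ecdf, D corresponds to the largest vertical gap between the two curves.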
# Generate 20 observations from a normal (Gaussian) distribution
# with mean=10 and sd=2 and compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
dev.new()
cdfCompare(x)
#----------
# Generate 30 observations from an exponential distribution with parameter
# rate=0.1 (see the R help file for Exponential) and compare the empirical
# cdf with the empirical cdf of the normal observations generated in the
# previous example:
set.seed(432)
y <- rexp(30, rate = 0.1)
dev.new()
cdfCompare(x, y)
#==========
# Generate 20 observations from a Poisson distribution with parameter lambda=10
# (see the R help file for Poisson) and compare the empirical cdf with a
# theoretical Poisson cdf based on estimating the distribution parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rpois(20, lambda = 10)
dev.new()
cdfCompare(x, distribution = "pois")
#==========
# Clean up
#---------
rm(x, y)
graphics.off()