cdfCompareCensored.Rd
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.
cdfCompareCensored(x, censored, censoring.side = "left",
y = NULL, y.censored = NULL, y.censoring.side = censoring.side,
discrete = FALSE, prob.method = "michael-schucany",
plot.pos.con = NULL, distribution = "norm", param.list = NULL,
estimate.params = is.null(param.list), est.arg.list = NULL,
x.col = "blue", y.or.fitted.col = "black", x.lwd = 3 * par("cex"),
y.or.fitted.lwd = 3 * par("cex"), x.lty = 1, y.or.fitted.lty = 2,
include.x.cen = FALSE, x.cen.pch = ifelse(censoring.side == "left", 6, 2),
x.cen.cex = par("cex"), x.cen.col = "red",
include.y.cen = FALSE, y.cen.pch = ifelse(y.censoring.side == "left", 6, 2),
y.cen.cex = par("cex"), y.cen.col = "black", digits = .Options$digits, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
numeric vector of observations. Missing (NA
), undefined (NaN
), and
infinite (Inf
, -Inf
) values are allowed but will be removed.
numeric or logical vector indicating which values of x
are censored. This must be the
same length as x
. If the mode of censored
is "logical"
, TRUE
values
correspond to elements of x
that are censored, and FALSE
values correspond to
elements of x
that are not censored. If the mode of censored
is "numeric"
,
it must contain only 1
's and 0
's; 1
corresponds to TRUE
and
0
corresponds to FALSE
. Missing (NA
) values are allowed but will be removed.
character string indicating on which side the censoring occurs. The possible values are
"left"
(the default) and "right"
.
a numeric vector (not necessarily of the same length as x
).
Missing (NA
), undefined (NaN
), and infinite
(Inf
, -Inf
) values are allowed but will be removed.
The default value is y=NULL
, in which case the empirical cdf of
x
will be plotted along with the theoretical cdf specified by the
argument distribution
.
numeric or logical vector indicating which values of y
are censored.
This must be the same length as y
. If the mode of censored
is
"logical"
, TRUE
values correspond to elements of y
that are
censored, and FALSE
values correspond to elements of y
that are not
censored. If the mode of censored
is "numeric"
, it must contain only
1
's and 0
's; 1
corresponds to TRUE
and
0
corresponds to FALSE
.
Missing (NA
) values are allowed but will be removed.
This argument is ignored when y
is not supplied. The default value is y.censored=NULL
since the default value of y
is y=NULL
.
character string indicating on which side the censoring occurs for the values of
y
. The possible values are "left"
(the default) and "right"
.
This argument is ignored when y
is not supplied. The default value is y.censoring.side=censoring.side
.
logical scalar indicating whether the assumed parent distribution of x
is discrete (discrete=TRUE
) or continuous (discrete=FALSE
; the default).
character string indicating what method to use to compute the plotting positions (empirical probabilities).
Possible values are
"kaplan-meier"
(product-limit method of Kaplan and Meier (1958)),
"nelson"
(hazard plotting method of Nelson (1972)),
"michael-schucany"
(generalization of the product-limit method due to Michael and Schucany (1986)), and
"hirsch-stedinger"
(generalization of the product-limit method due to Hirsch and Stedinger (1987)).
The default value is prob.method="michael-schucany"
.
The "nelson"
method is only available for censoring.side="right"
.
See the help file for ecdfPlotCensored
for more explanation.
numeric scalar between 0 and 1 containing the value of the plotting position constant.
When y
is supplied, the default value is plot.pos.con=0.375
.
When y
is not supplied, for the normal, lognormal, three-parameter lognormal,
zero-modified normal, and zero-modified lognormal distributions, the default value
is plot.pos.con=0.375
.
For the Type I extreme value (Gumbel) distribution (distribution="evd"
),
the default value is plot.pos.con=0.44
. For all other distributions, the
default value is plot.pos.con=0.4
.
See the help files for ecdfPlot
and qqPlot
for more
information. This argument is used only if prob.method
is equal to
"michael-schucany"
or "hirsch-stedinger"
.
when y
is not supplied,
a character string denoting the distribution abbreviation. The default value is
distribution="norm"
. See the help file for Distribution.df
for a
list of possible distribution abbreviations. This argument is ignored if y
is supplied.
when y
is not supplied,
a list with values for the parameters of the distribution. The default value is
param.list=list(mean=0, sd=1)
. See the help file for Distribution.df
for the names and possible values of the parameters associated with each distribution.
This argument is ignored if y
is supplied or estimate.params=TRUE
.
when y
is not supplied,
a logical scalar indicating whether to compute the cdf for x
based on estimating the distribution parameters (estimate.params=TRUE
) or
using the known distribution parameters specified in param.list
(estimate.params=FALSE
). The default value is TRUE
unless the argument param.list
is supplied. The argument estimate.params
is ignored if y
is supplied.
when y
is not supplied and estimate.params=TRUE
,
a list whose components are optional arguments associated with the function used to
estimate the parameters of the assumed distribution (see the Section
Estimating Distribution Parameters in the help file Censored Data).
For example, all functions used to estimate distribution parameters have an
optional argument called method
that specifies the method to use to estimate
the parameters.
(See the help file for Distribution.df
for a list of available estimation
methods for each distribution.) To override the default estimation method, supply the argument
est.arg.list
with a component called method
; for example est.arg.list=list(method="mle")
. The default value is
est.arg.list=NULL
so that all default values for the estimating function are used.
This argument is ignored if estimate.params=FALSE
or y
is supplied.
a numeric scalar or character string determining the color of the empirical cdf
(based on x
) line or points. The default value is x.col="blue"
.
See the entry for col
in the help file for par
for more
information.
a numeric scalar or character string determining the color of the empirical cdf
(based on y
) or the theoretical cdf line or points.
The default value is y.or.fitted.col="black"
. See the entry for
col
in the help file for par
for more information.
a numeric scalar determining the width of the empirical cdf (based on x
) line.
The default value is x.lwd=3*par("cex")
.
See the entry for lwd
in the help file for par
for more information.
a numeric scalar determining the width of the empirical cdf (based on y
)
or theoretical cdf line.
The default value is y.or.fitted.lwd=3*par("cex")
.
See the entry for lwd
in the help file for par
for more information.
a numeric scalar determining the line type of the empirical cdf
(based on x
) line. The default value is
x.lty=1
. See the entry for lty
in the help file for par
for more information.
a numeric scalar determining the line type of the empirical cdf
(based on y
) or theoretical cdf line. The default value is
y.or.fitted.lty=2
.
See the entry for lty
in the help file for par
for more information.
logical scalar indicating whether to include censored values in x
in the
plot. The default value is include.x.cen=FALSE
. If include.x.cen=TRUE
,
censored values in x
are plotted using the plotting character indicated by
the argument x.cen.pch
(see below). This argument is ignored if there are
no censored values in x
.
numeric scalar or character string indicating the plotting character to use to plot
censored values in x
. The default value is x.cen.pch=2
(hollow triangle pointing up) when x.censoring.side="right"
, and
x.cen.pch=6
(hollow triangle pointing down) when x.censoring.side="left"
.
See the R help file for points
for an explanation of how plotting
symbols are specified. This argument is ignored if include.x.cen=FALSE
.
numeric scalar that determines the size of the plotting character used to plot
censored values in x
. The default value is the current value of the
cex
graphics parameter. See the entry for cex
in the R help file for
par
for more information. This argument is ignored if
include.x.cen=FALSE
.
numeric scalar or character string that determines the color of the plotting
character used to plot censored values in x
. The default value is
x.cen.col="red"
. See the entry for col
in the R help file for
par
for more information. This argument is ignored if
include.x.cen=FALSE
.
logical scalar indicating whether to include censored values in y
in the plot.
The default value is include.y.cen=FALSE
. If include.y.cen=TRUE
,
censored values in y
are plotted using the plotting character indicated by
the argument y.cen.pch
(see below). This argument is ignored if y
is
not supplied and/or there are no censored values in y
.
numeric scalar or character string indicating the plotting character to use to plot
censored values in y
. The default value is y.cen.pch=2
(hollow triangle pointing up) when y.censoring.side="right"
, and
y.cen.pch=6
(hollow triangle pointing down) when y.censoring.side="left"
.
See the R help file for points
for an explanation of how plotting
symbols are specified. This argument is ignored if include.y.cen=FALSE
.
numeric scalar that determines the size of the plotting character used to plot
censored values in y
. The default value is the current value of the
cex
graphics parameter. See the entry for cex
in the R help file for
par
for more information. This argument is ignored if
include.y.cen=FALSE
.
numeric scalar or character string that determines the color of the plotting
character used to plot censored values in y
. The default value is
y.cen.col="black"
. See the entry for col
in the R help file for
par
for more information. This argument is ignored if
include.y.cen=FALSE
.
when y
is not supplied,
a scalar indicating how many significant digits to print for the distribution
parameters. The default value is digits=.Options$digits
.
additional graphical parameters (see lines
and par
).
In particular, the argument type
specifies the kind of line type.
By default, the function cdfCompareCensored
plots a step function (type="s"
)
when discrete=TRUE
, and plots a straight line between points
(type="l"
) when discrete=FALSE
.
The user may override these defaults by supplying the graphics parameter type
(type="s"
for a step function, type="l"
for linear interpolation,
type="p"
for points only, etc.).
When both x
and y
are supplied, the function cdfCompareCensored
creates the empirical cdf plot of x
and y
on
the same plot by calling the function ecdfPlotCensored
.
When y
is not supplied, the function cdfCompareCensored
creates the
emprical cdf plot of x
(by calling ecdfPlotCensored
) and the
theoretical cdf plot (by calling cdfPlot
and using the
argument distribution
) on the same plot.
When y
is supplied, cdfCompareCensored
invisibly returns a list with
components:
a list with components Order.Statistics
and
Cumulative.Probabilities
, giving coordinates of the points that have
been plotted for the x
values.
a list with components Order.Statistics
and
Cumulative.Probabilities
, giving coordinates of the points that have
been plotted for the y
values.
When y
is not supplied, cdfCompareCensored
invisibly returns a list with
components:
a list with components Order.Statistics
and
Cumulative.Probabilities
, giving coordinates of the points that have
been plotted for the x
values.
a list with components Quantiles
and
Cumulative.Probabilities
, giving coordinates of the
points that have been plotted for the fitted cdf.
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the \(y\)-axis vs. the ecdf on the \(x\)-axis and call this a quantile plot.
Censored observations complicate the procedures used to graphically explore data.
Techniques from survival analysis and life testing have been developed to generalize
the procedures for constructing plotting positions, empirical cdf plots, and
q-q plots to data sets with censored observations
(see ppointsCensored
).
Empirical cumulative distribution function (ecdf) plots are often plotted with
theoretical cdf plots to graphically assess whether a sample of observations
comes from a particular distribution. More often, however, quantile-quantile
(Q-Q) plots are used instead of ecdf plots to graphically assess departures from
an assumed distribution (see qqPlotCensored
).
# Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(333)
x <- sort(rnorm(20, mean=20, sd=5))
x
#> [1] 9.743551 12.370197 14.375499 15.628482 15.883507 17.080124 17.197588
#> [8] 18.097714 18.654182 19.585942 20.219308 20.268505 20.552964 21.388695
#> [15] 21.763587 21.823639 23.168039 26.165269 26.843362 29.673405
# [1] 9.743551 12.370197 14.375499 15.628482 15.883507 17.080124
# [7] 17.197588 18.097714 18.654182 19.585942 20.219308 20.268505
#[13] 20.552964 21.388695 21.763587 21.823639 23.168039 26.165269
#[19] 26.843362 29.673405
censored <- x < 18
x[censored] <- 18
sum(censored)
#> [1] 7
#[1] 7
dev.new()
cdfCompareCensored(x, censored)
# Clean up
#---------
rm(x, censored)
#==========
# Example 15-1 of USEPA (2009, page 15-10) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells. The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df. Here we will compare the empirical
# cdf based on Kaplan-Meier plotting positions or Michael-Schucany
# plotting positions with various assumed distributions
# (based on estimating the parameters of these distributions):
# 1) normal distribution
# 2) lognormal distribution
# 3) gamma distribution
# First look at the data:
#------------------------
EPA.09.Ex.15.1.manganese.df
#> Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#> 1 1 Well.1 <5 5.0 TRUE
#> 2 2 Well.1 12.1 12.1 FALSE
#> 3 3 Well.1 16.9 16.9 FALSE
#> 4 4 Well.1 21.6 21.6 FALSE
#> 5 5 Well.1 <2 2.0 TRUE
#> 6 1 Well.2 <5 5.0 TRUE
#> 7 2 Well.2 7.7 7.7 FALSE
#> 8 3 Well.2 53.6 53.6 FALSE
#> 9 4 Well.2 9.5 9.5 FALSE
#> 10 5 Well.2 45.9 45.9 FALSE
#> 11 1 Well.3 <5 5.0 TRUE
#> 12 2 Well.3 5.3 5.3 FALSE
#> 13 3 Well.3 12.6 12.6 FALSE
#> 14 4 Well.3 106.3 106.3 FALSE
#> 15 5 Well.3 34.5 34.5 FALSE
#> 16 1 Well.4 6.3 6.3 FALSE
#> 17 2 Well.4 11.9 11.9 FALSE
#> 18 3 Well.4 10 10.0 FALSE
#> 19 4 Well.4 <2 2.0 TRUE
#> 20 5 Well.4 77.2 77.2 FALSE
#> 21 1 Well.5 17.9 17.9 FALSE
#> 22 2 Well.5 22.7 22.7 FALSE
#> 23 3 Well.5 3.3 3.3 FALSE
#> 24 4 Well.5 8.4 8.4 FALSE
#> 25 5 Well.5 <2 2.0 TRUE
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#4 4 Well.1 21.6 21.6 FALSE
#5 5 Well.1 <2 2.0 TRUE
#...
#21 1 Well.5 17.9 17.9 FALSE
#22 2 Well.5 22.7 22.7 FALSE
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
longToWide(EPA.09.Ex.15.1.manganese.df,
"Manganese.Orig.ppb", "Sample", "Well",
paste.row.name = TRUE)
#> Well.1 Well.2 Well.3 Well.4 Well.5
#> Sample.1 <5 <5 <5 6.3 17.9
#> Sample.2 12.1 7.7 5.3 11.9 22.7
#> Sample.3 16.9 53.6 12.6 10 3.3
#> Sample.4 21.6 9.5 106.3 <2 8.4
#> Sample.5 <2 45.9 34.5 77.2 <2
# Well.1 Well.2 Well.3 Well.4 Well.5
#Sample.1 <5 <5 <5 6.3 17.9
#Sample.2 12.1 7.7 5.3 11.9 22.7
#Sample.3 16.9 53.6 12.6 10 3.3
#Sample.4 21.6 9.5 106.3 <2 8.4
#Sample.5 <2 45.9 34.5 77.2 <2
# Assume a normal distribution
#-----------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored,
prob.method = "kaplan-meier"))
# Assume a lognormal distribution
#--------------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm"))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm",
prob.method = "kaplan-meier"))
# Assume a gamma distribution
#----------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma"))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma",
prob.method = "kaplan-meier"))
# Clean up
#---------
graphics.off()
#==========
# Compare the distributions of copper and zinc between the Alluvial Fan Zone
# and the Basin-Trough Zone using the data of Millard and Deverel (1988).
# The data are stored in Millard.Deverel.88.df.
Millard.Deverel.88.df
#> Cu.orig Cu Cu.censored Zn.orig Zn Zn.censored Zone Location
#> 1 < 1 1 TRUE <10 10 TRUE Alluvial.Fan 1
#> 2 < 1 1 TRUE 9 9 FALSE Alluvial.Fan 2
#> 3 3 3 FALSE NA NA FALSE Alluvial.Fan 3
#> 4 3 3 FALSE 5 5 FALSE Alluvial.Fan 4
#> 5 5 5 FALSE 18 18 FALSE Alluvial.Fan 5
#> 6 1 1 FALSE <10 10 TRUE Alluvial.Fan 6
#> 7 4 4 FALSE 12 12 FALSE Alluvial.Fan 7
#> 8 4 4 FALSE 10 10 FALSE Alluvial.Fan 8
#> 9 2 2 FALSE 11 11 FALSE Alluvial.Fan 9
#> 10 2 2 FALSE 11 11 FALSE Alluvial.Fan 10
#> 11 1 1 FALSE 19 19 FALSE Alluvial.Fan 11
#> 12 2 2 FALSE 8 8 FALSE Alluvial.Fan 12
#> 13 < 5 5 TRUE < 3 3 TRUE Alluvial.Fan 13
#> 14 11 11 FALSE <10 10 TRUE Alluvial.Fan 14
#> 15 < 1 1 TRUE <10 10 TRUE Alluvial.Fan 15
#> 16 2 2 FALSE 10 10 FALSE Alluvial.Fan 16
#> 17 2 2 FALSE 10 10 FALSE Alluvial.Fan 17
#> 18 2 2 FALSE 10 10 FALSE Alluvial.Fan 18
#> 19 2 2 FALSE 10 10 FALSE Alluvial.Fan 19
#> 20 <20 20 TRUE <10 10 TRUE Alluvial.Fan 20
#> 21 2 2 FALSE 10 10 FALSE Alluvial.Fan 21
#> 22 2 2 FALSE <10 10 TRUE Alluvial.Fan 22
#> 23 3 3 FALSE 10 10 FALSE Alluvial.Fan 23
#> 24 3 3 FALSE <10 10 TRUE Alluvial.Fan 24
#> 25 NA NA FALSE 10 10 FALSE Alluvial.Fan 25
#> 26 <20 20 TRUE <10 10 TRUE Alluvial.Fan 26
#> 27 <10 10 TRUE 10 10 FALSE Alluvial.Fan 27
#> 28 7 7 FALSE 10 10 FALSE Alluvial.Fan 28
#> 29 5 5 FALSE 20 20 FALSE Alluvial.Fan 29
#> 30 2 2 FALSE 20 20 FALSE Alluvial.Fan 30
#> 31 2 2 FALSE <10 10 TRUE Alluvial.Fan 31
#> 32 <10 10 TRUE 20 20 FALSE Alluvial.Fan 32
#> 33 7 7 FALSE 20 20 FALSE Alluvial.Fan 33
#> 34 12 12 FALSE 20 20 FALSE Alluvial.Fan 34
#> 35 < 1 1 TRUE <10 10 TRUE Alluvial.Fan 35
#> 36 20 20 FALSE 10 10 FALSE Alluvial.Fan 36
#> 37 NA NA FALSE 20 20 FALSE Alluvial.Fan 37
#> 38 NA NA FALSE 620 620 FALSE Alluvial.Fan 38
#> 39 16 16 FALSE 40 40 FALSE Alluvial.Fan 39
#> 40 < 5 5 TRUE 50 50 FALSE Alluvial.Fan 40
#> 41 1 1 FALSE 33 33 FALSE Alluvial.Fan 41
#> 42 2 2 FALSE 10 10 FALSE Alluvial.Fan 42
#> 43 < 5 5 TRUE 20 20 FALSE Alluvial.Fan 43
#> 44 3 3 FALSE 10 10 FALSE Alluvial.Fan 44
#> 45 2 2 FALSE 10 10 FALSE Alluvial.Fan 45
#> 46 8 8 FALSE 10 10 FALSE Alluvial.Fan 46
#> 47 7 7 FALSE 30 30 FALSE Alluvial.Fan 47
#> 48 5 5 FALSE 20 20 FALSE Alluvial.Fan 48
#> 49 < 5 5 TRUE 10 10 FALSE Alluvial.Fan 49
#> 50 2 2 FALSE 20 20 FALSE Alluvial.Fan 50
#> 51 <10 10 TRUE 20 20 FALSE Alluvial.Fan 51
#> 52 < 5 5 TRUE 20 20 FALSE Alluvial.Fan 52
#> 53 < 5 5 TRUE <10 10 TRUE Alluvial.Fan 53
#> 54 2 2 FALSE 20 20 FALSE Alluvial.Fan 54
#> 55 10 10 FALSE 23 23 FALSE Alluvial.Fan 55
#> 56 2 2 FALSE 17 17 FALSE Alluvial.Fan 56
#> 57 4 4 FALSE 10 10 FALSE Alluvial.Fan 57
#> 58 < 5 5 TRUE <10 10 TRUE Alluvial.Fan 58
#> 59 2 2 FALSE 10 10 FALSE Alluvial.Fan 59
#> 60 3 3 FALSE 20 20 FALSE Alluvial.Fan 60
#> 61 9 9 FALSE 29 29 FALSE Alluvial.Fan 61
#> 62 < 5 5 TRUE 20 20 FALSE Alluvial.Fan 62
#> 63 2 2 FALSE <10 10 TRUE Alluvial.Fan 63
#> 64 2 2 FALSE 10 10 FALSE Alluvial.Fan 64
#> 65 2 2 FALSE <10 10 TRUE Alluvial.Fan 65
#> 66 2 2 FALSE 10 10 FALSE Alluvial.Fan 66
#> 67 1 1 FALSE 7 7 FALSE Alluvial.Fan 67
#> 68 1 1 FALSE <10 10 TRUE Alluvial.Fan 68
#> 69 2 2 FALSE 20 20 FALSE Basin.Trough 1
#> 70 2 2 FALSE 10 10 FALSE Basin.Trough 2
#> 71 12 12 FALSE 60 60 FALSE Basin.Trough 3
#> 72 2 2 FALSE 20 20 FALSE Basin.Trough 4
#> 73 1 1 FALSE 12 12 FALSE Basin.Trough 5
#> 74 <10 10 TRUE 8 8 FALSE Basin.Trough 6
#> 75 <10 10 TRUE <10 10 TRUE Basin.Trough 7
#> 76 4 4 FALSE 14 14 FALSE Basin.Trough 8
#> 77 <10 10 TRUE <10 10 TRUE Basin.Trough 9
#> 78 < 1 1 TRUE 17 17 FALSE Basin.Trough 10
#> 79 1 1 FALSE < 3 3 TRUE Basin.Trough 11
#> 80 < 2 2 TRUE 11 11 FALSE Basin.Trough 12
#> 81 < 2 2 TRUE 5 5 FALSE Basin.Trough 13
#> 82 1 1 FALSE 12 12 FALSE Basin.Trough 14
#> 83 2 2 FALSE 4 4 FALSE Basin.Trough 15
#> 84 <10 10 TRUE 3 3 FALSE Basin.Trough 16
#> 85 3 3 FALSE 6 6 FALSE Basin.Trough 17
#> 86 < 1 1 TRUE 3 3 FALSE Basin.Trough 18
#> 87 1 1 FALSE 15 15 FALSE Basin.Trough 19
#> 88 1 1 FALSE 13 13 FALSE Basin.Trough 20
#> 89 3 3 FALSE 4 4 FALSE Basin.Trough 21
#> 90 < 5 5 TRUE 20 20 FALSE Basin.Trough 22
#> 91 NA NA FALSE 20 20 FALSE Basin.Trough 23
#> 92 17 17 FALSE 70 70 FALSE Basin.Trough 24
#> 93 23 23 FALSE 60 60 FALSE Basin.Trough 25
#> 94 9 9 FALSE 40 40 FALSE Basin.Trough 26
#> 95 9 9 FALSE 30 30 FALSE Basin.Trough 27
#> 96 3 3 FALSE 40 40 FALSE Basin.Trough 28
#> 97 3 3 FALSE 17 17 FALSE Basin.Trough 29
#> 98 <15 15 TRUE 10 10 FALSE Basin.Trough 30
#> 99 < 5 5 TRUE 20 20 FALSE Basin.Trough 31
#> 100 4 4 FALSE 20 20 FALSE Basin.Trough 32
#> 101 < 5 5 TRUE 5 5 FALSE Basin.Trough 33
#> 102 < 5 5 TRUE 10 10 FALSE Basin.Trough 34
#> 103 < 5 5 TRUE 50 50 FALSE Basin.Trough 35
#> 104 4 4 FALSE 30 30 FALSE Basin.Trough 36
#> 105 8 8 FALSE 25 25 FALSE Basin.Trough 37
#> 106 1 1 FALSE <10 10 TRUE Basin.Trough 38
#> 107 15 15 FALSE 10 10 FALSE Basin.Trough 39
#> 108 3 3 FALSE 40 40 FALSE Basin.Trough 40
#> 109 3 3 FALSE 20 20 FALSE Basin.Trough 41
#> 110 1 1 FALSE 10 10 FALSE Basin.Trough 42
#> 111 6 6 FALSE 20 20 FALSE Basin.Trough 43
#> 112 3 3 FALSE 20 20 FALSE Basin.Trough 44
#> 113 6 6 FALSE 30 30 FALSE Basin.Trough 45
#> 114 3 3 FALSE 20 20 FALSE Basin.Trough 46
#> 115 4 4 FALSE 30 30 FALSE Basin.Trough 47
#> 116 5 5 FALSE 50 50 FALSE Basin.Trough 48
#> 117 14 14 FALSE 90 90 FALSE Basin.Trough 49
#> 118 4 4 FALSE 20 20 FALSE Basin.Trough 50
# Cu.orig Cu Cu.censored Zn.orig Zn Zn.censored Zone Location
#1 < 1 1 TRUE <10 10 TRUE Alluvial.Fan 1
#2 < 1 1 TRUE 9 9 FALSE Alluvial.Fan 2
#3 3 3 FALSE NA NA FALSE Alluvial.Fan 3
#.
#.
#.
#116 5 5 FALSE 50 50 FALSE Basin.Trough 48
#117 14 14 FALSE 90 90 FALSE Basin.Trough 49
#118 4 4 FALSE 20 20 FALSE Basin.Trough 50
Cu.AF <- with(Millard.Deverel.88.df,
Cu[Zone == "Alluvial.Fan"])
Cu.AF.cen <- with(Millard.Deverel.88.df,
Cu.censored[Zone == "Alluvial.Fan"])
Cu.BT <- with(Millard.Deverel.88.df,
Cu[Zone == "Basin.Trough"])
Cu.BT.cen <- with(Millard.Deverel.88.df,
Cu.censored[Zone == "Basin.Trough"])
Zn.AF <- with(Millard.Deverel.88.df,
Zn[Zone == "Alluvial.Fan"])
Zn.AF.cen <- with(Millard.Deverel.88.df,
Zn.censored[Zone == "Alluvial.Fan"])
Zn.BT <- with(Millard.Deverel.88.df,
Zn[Zone == "Basin.Trough"])
Zn.BT.cen <- with(Millard.Deverel.88.df,
Zn.censored[Zone == "Basin.Trough"])
# First compare the copper concentrations
#----------------------------------------
dev.new()
cdfCompareCensored(x = Cu.AF, censored = Cu.AF.cen,
y = Cu.BT, y.censored = Cu.BT.cen)
#> Warning: 3 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.
#> Warning: 1 observations with NA/NaN/Inf in 'y' and/or 'y.censored' removed.
# Now compare the zinc concentrations
#------------------------------------
dev.new()
cdfCompareCensored(x = Zn.AF, censored = Zn.AF.cen,
y = Zn.BT, y.censored = Zn.BT.cen)
#> Warning: 1 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.
# Compare the Zinc concentrations again, but delete
# the one "outlier".
#--------------------------------------------------
summaryStats(Zn.AF)
#> N Mean SD Median Min Max NA's N.Total
#> Zn.AF 67 23.5075 74.4192 10 3 620 1 68
# N Mean SD Median Min Max NA's N.Total
#Zn.AF 67 23.5075 74.4192 10 3 620 1 68
summaryStats(Zn.BT)
#> N Mean SD Median Min Max
#> Zn.BT 50 21.94 18.7044 18.5 3 90
# N Mean SD Median Min Max
#Zn.BT 50 21.94 18.7044 18.5 3 90
which(Zn.AF == 620)
#> [1] 38
#[1] 38
summaryStats(Zn.AF[-38])
#> N Mean SD Median Min Max NA's N.Total
#> Zn.AF[-38] 66 14.4697 8.1604 10 3 50 1 67
# N Mean SD Median Min Max NA's N.Total
#Zn.AF[-38] 66 14.4697 8.1604 10 3 50 1 67
dev.new()
cdfCompareCensored(x = Zn.AF[-38], censored = Zn.AF.cen[-38],
y = Zn.BT, y.censored = Zn.BT.cen)
#> Warning: 1 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.
#----------
# Clean up
#---------
rm(Cu.AF, Cu.AF.cen, Cu.BT, Cu.BT.cen,
Zn.AF, Zn.AF.cen, Zn.BT, Zn.BT.cen)
graphics.off()