Plot Two Cumulative Distribution Functions Based on Censored Data

For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.

cdfCompareCensored(x, censored, censoring.side = "left",
    y = NULL, y.censored = NULL, y.censoring.side = censoring.side,
    discrete = FALSE, prob.method = "michael-schucany",
    plot.pos.con = NULL, distribution = "norm", param.list = NULL,
    estimate.params = is.null(param.list), est.arg.list = NULL,
    x.col = "blue", y.or.fitted.col = "black", x.lwd = 3 * par("cex"),
    y.or.fitted.lwd = 3 * par("cex"), x.lty = 1, y.or.fitted.lty = 2,
    include.x.cen = FALSE, x.cen.pch = ifelse(censoring.side == "left", 6, 2),
    x.cen.cex = par("cex"), x.cen.col = "red",
    include.y.cen = FALSE, y.cen.pch = ifelse(y.censoring.side == "left", 6, 2),
    y.cen.cex = par("cex"), y.cen.col = "black", digits = .Options$digits, ...,
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
    xlim = NULL, ylim = NULL)

Arguments

x

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored

numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored. If the mode of censored is "numeric", it must contain only 1's and 0's; 1 corresponds to TRUE and 0 corresponds to FALSE. Missing (NA) values are allowed but will be removed.

censoring.side

character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

y

a numeric vector (not necessarily of the same length as x). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. The default value is y=NULL, in which case the empirical cdf of x will be plotted along with the theoretical cdf specified by the argument distribution.

y.censored

numeric or logical vector indicating which values of y are censored. This must be the same length as y. If the mode of y.censored is "logical", TRUE values correspond to elements of y that are censored, and FALSE values correspond to elements of y that are not censored. If the mode of y.censored is "numeric", it must contain only 1's and 0's; 1 corresponds to TRUE and 0 corresponds to FALSE. Missing (NA) values are allowed but will be removed.

This argument is ignored when y is not supplied. The default value is y.censored=NULL since the default value of y is y=NULL.

y.censoring.side

character string indicating on which side the censoring occurs for the values of y. The possible values are "left" and "right". The default value is y.censoring.side=censoring.side, so y is assumed to be censored on the same side as x unless stated otherwise. This argument is ignored when y is not supplied.

discrete

logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).

prob.method

character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are "kaplan-meier" (product-limit method of Kaplan and Meier (1958)), "nelson" (hazard plotting method of Nelson (1972)), "michael-schucany" (generalization of the product-limit method due to Michael and Schucany (1986)), and "hirsch-stedinger" (generalization of the product-limit method due to Hirsch and Stedinger (1987)). The default value is prob.method="michael-schucany".

The "nelson" method is only available for censoring.side="right". See the help file for ecdfPlotCensored for more explanation.

plot.pos.con

numeric scalar between 0 and 1 containing the value of the plotting position constant. When y is supplied, the default value is plot.pos.con=0.375. When y is not supplied, for the normal, lognormal, three-parameter lognormal, zero-modified normal, and zero-modified lognormal distributions, the default value is plot.pos.con=0.375. For the Type I extreme value (Gumbel) distribution (distribution="evd"), the default value is plot.pos.con=0.44. For all other distributions, the default value is plot.pos.con=0.4. See the help files for ecdfPlot and qqPlot for more information. This argument is used only if prob.method is equal to "michael-schucany" or "hirsch-stedinger".

distribution

when y is not supplied, a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df for a list of possible distribution abbreviations. This argument is ignored if y is supplied.

param.list

when y is not supplied, a list with values for the parameters of the distribution. The default value is param.list=NULL, in which case the parameters are estimated from x (see estimate.params). See the help file for Distribution.df for the names and possible values of the parameters associated with each distribution. This argument is ignored if y is supplied or estimate.params=TRUE.

estimate.params

when y is not supplied, a logical scalar indicating whether to compute the cdf for x based on estimating the distribution parameters (estimate.params=TRUE) or using the known distribution parameters specified in param.list (estimate.params=FALSE). The default value is TRUE unless the argument param.list is supplied. The argument estimate.params is ignored if y is supplied.

est.arg.list

when y is not supplied and estimate.params=TRUE, a list whose components are optional arguments associated with the function used to estimate the parameters of the assumed distribution (see the Section Estimating Distribution Parameters in the help file Censored Data). For example, all functions used to estimate distribution parameters have an optional argument called method that specifies the method to use to estimate the parameters. (See the help file for Distribution.df for a list of available estimation methods for each distribution.) To override the default estimation method, supply the argument est.arg.list with a component called method; for example est.arg.list=list(method="mle"). The default value is est.arg.list=NULL so that all default values for the estimating function are used. This argument is ignored if estimate.params=FALSE or y is supplied.

x.col

a numeric scalar or character string determining the color of the empirical cdf (based on x) line or points. The default value is x.col="blue". See the entry for col in the help file for par for more information.

y.or.fitted.col

a numeric scalar or character string determining the color of the empirical cdf (based on y) or the theoretical cdf line or points. The default value is y.or.fitted.col="black". See the entry for col in the help file for par for more information.

x.lwd

a numeric scalar determining the width of the empirical cdf (based on x) line. The default value is x.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.

y.or.fitted.lwd

a numeric scalar determining the width of the empirical cdf (based on y) or theoretical cdf line. The default value is y.or.fitted.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.

x.lty

a numeric scalar determining the line type of the empirical cdf (based on x) line. The default value is x.lty=1. See the entry for lty in the help file for par for more information.

y.or.fitted.lty

a numeric scalar determining the line type of the empirical cdf (based on y) or theoretical cdf line. The default value is y.or.fitted.lty=2. See the entry for lty in the help file for par for more information.

include.x.cen

logical scalar indicating whether to include censored values in x in the plot. The default value is include.x.cen=FALSE. If include.x.cen=TRUE, censored values in x are plotted using the plotting character indicated by the argument x.cen.pch (see below). This argument is ignored if there are no censored values in x.

x.cen.pch

numeric scalar or character string indicating the plotting character to use to plot censored values in x. The default value is x.cen.pch=2 (hollow triangle pointing up) when censoring.side="right", and x.cen.pch=6 (hollow triangle pointing down) when censoring.side="left". See the R help file for points for an explanation of how plotting symbols are specified. This argument is ignored if include.x.cen=FALSE.

x.cen.cex

numeric scalar that determines the size of the plotting character used to plot censored values in x. The default value is the current value of the cex graphics parameter. See the entry for cex in the R help file for par for more information. This argument is ignored if include.x.cen=FALSE.

x.cen.col

numeric scalar or character string that determines the color of the plotting character used to plot censored values in x. The default value is x.cen.col="red". See the entry for col in the R help file for par for more information. This argument is ignored if include.x.cen=FALSE.

include.y.cen

logical scalar indicating whether to include censored values in y in the plot. The default value is include.y.cen=FALSE. If include.y.cen=TRUE, censored values in y are plotted using the plotting character indicated by the argument y.cen.pch (see below). This argument is ignored if y is not supplied and/or there are no censored values in y.

y.cen.pch

numeric scalar or character string indicating the plotting character to use to plot censored values in y. The default value is y.cen.pch=2 (hollow triangle pointing up) when y.censoring.side="right", and y.cen.pch=6 (hollow triangle pointing down) when y.censoring.side="left". See the R help file for points for an explanation of how plotting symbols are specified. This argument is ignored if include.y.cen=FALSE.

y.cen.cex

numeric scalar that determines the size of the plotting character used to plot censored values in y. The default value is the current value of the cex graphics parameter. See the entry for cex in the R help file for par for more information. This argument is ignored if include.y.cen=FALSE.

y.cen.col

numeric scalar or character string that determines the color of the plotting character used to plot censored values in y. The default value is y.cen.col="black". See the entry for col in the R help file for par for more information. This argument is ignored if include.y.cen=FALSE.

digits

when y is not supplied, a scalar indicating how many significant digits to print for the distribution parameters. The default value is digits=.Options$digits.

type, main, xlab, ylab, xlim, ylim, ...

additional graphical parameters (see lines and par). In particular, the argument type specifies the kind of plot to draw. By default, the function cdfCompareCensored plots a step function (type="s") when discrete=TRUE, and connects the points with straight lines (type="l") when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).

Details

When both x and y are supplied, the function cdfCompareCensored creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlotCensored.

When y is not supplied, the function cdfCompareCensored creates the empirical cdf plot of x (by calling ecdfPlotCensored) and the theoretical cdf plot (by calling cdfPlot and using the argument distribution) on the same plot.

Value

When y is supplied, cdfCompareCensored invisibly returns a list with components:

x.ecdf.list: a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the x values.
y.ecdf.list: a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the y values.

When y is not supplied, cdfCompareCensored invisibly returns a list with components:

x.ecdf.list: a list with components Order.Statistics and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the x values.
fitted.cdf.list: a list with components Quantiles and Cumulative.Probabilities, giving coordinates of the points that have been plotted for the fitted cdf.

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.

Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.

D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.

Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.

Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.

Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.

Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.

Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.

Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.

Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, pp.461-496.

Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.

Chambers et al. (1983, pp.11-16) plot the observed order statistics on the y-axis vs. the ecdf on the x-axis and call this a quantile plot.

Censored observations complicate the procedures used to graphically explore data. Techniques from survival analysis and life testing have been developed to generalize the procedures for constructing plotting positions, empirical cdf plots, and q-q plots to data sets with censored observations (see ppointsCensored).

Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots to graphically assess whether a sample of observations comes from a particular distribution. More often, however, quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess departures from an assumed distribution (see qqPlotCensored).

Examples

  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
  # censor all observations less than 18, then compare the empirical cdf with a
  # theoretical normal cdf that is based on estimating the parameters.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(333)
  x <- sort(rnorm(20, mean=20, sd=5))
  x
#>  [1]  9.743551 12.370197 14.375499 15.628482 15.883507 17.080124 17.197588
#>  [8] 18.097714 18.654182 19.585942 20.219308 20.268505 20.552964 21.388695
#> [15] 21.763587 21.823639 23.168039 26.165269 26.843362 29.673405

  censored <- x < 18
  x[censored] <- 18

  sum(censored)
#> [1] 7

  dev.new()
  cdfCompareCensored(x, censored)

  # Clean up
  #---------
  rm(x, censored)

  #==========
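
  # The following is a minimal sketch (not part of the original examples) of
  # comparing the empirical cdf with a fully specified theoretical cdf rather
  # than a fitted one.  According to the Arguments section, supplying
  # param.list makes estimate.params default to FALSE, so the normal cdf
  # below should use mean=20 and sd=5 exactly as given.  The object name
  # cdf.known is arbitrary, and library(EnvStats) is called only so that
  # this snippet can be run on its own.

  library(EnvStats)

  set.seed(333)
  x <- rnorm(20, mean=20, sd=5)
  censored <- x < 18
  x[censored] <- 18

  dev.new()
  cdf.known <- cdfCompareCensored(x, censored,
    param.list = list(mean=20, sd=5))

  # Clean up
  #---------
  rm(x, censored)

  #==========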

  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in
  # EPA.09.Ex.15.1.manganese.df.  Here we will compare the empirical
  # cdf based on Kaplan-Meier plotting positions or Michael-Schucany
  # plotting positions with various assumed distributions
  # (based on estimating the parameters of these distributions):
  # 1) normal distribution
  # 2) lognormal distribution
  # 3) gamma distribution

  # First look at the data:
  #------------------------

  EPA.09.Ex.15.1.manganese.df
#>    Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
#> 1       1 Well.1                 <5           5.0     TRUE
#> 2       2 Well.1               12.1          12.1    FALSE
#> 3       3 Well.1               16.9          16.9    FALSE
#> 4       4 Well.1               21.6          21.6    FALSE
#> 5       5 Well.1                 <2           2.0     TRUE
#> 6       1 Well.2                 <5           5.0     TRUE
#> 7       2 Well.2                7.7           7.7    FALSE
#> 8       3 Well.2               53.6          53.6    FALSE
#> 9       4 Well.2                9.5           9.5    FALSE
#> 10      5 Well.2               45.9          45.9    FALSE
#> 11      1 Well.3                 <5           5.0     TRUE
#> 12      2 Well.3                5.3           5.3    FALSE
#> 13      3 Well.3               12.6          12.6    FALSE
#> 14      4 Well.3              106.3         106.3    FALSE
#> 15      5 Well.3               34.5          34.5    FALSE
#> 16      1 Well.4                6.3           6.3    FALSE
#> 17      2 Well.4               11.9          11.9    FALSE
#> 18      3 Well.4                 10          10.0    FALSE
#> 19      4 Well.4                 <2           2.0     TRUE
#> 20      5 Well.4               77.2          77.2    FALSE
#> 21      1 Well.5               17.9          17.9    FALSE
#> 22      2 Well.5               22.7          22.7    FALSE
#> 23      3 Well.5                3.3           3.3    FALSE
#> 24      4 Well.5                8.4           8.4    FALSE
#> 25      5 Well.5                 <2           2.0     TRUE

  longToWide(EPA.09.Ex.15.1.manganese.df,
    "Manganese.Orig.ppb", "Sample", "Well",
    paste.row.name = TRUE)
#>          Well.1 Well.2 Well.3 Well.4 Well.5
#> Sample.1     <5     <5     <5    6.3   17.9
#> Sample.2   12.1    7.7    5.3   11.9   22.7
#> Sample.3   16.9   53.6   12.6     10    3.3
#> Sample.4   21.6    9.5  106.3     <2    8.4
#> Sample.5     <2   45.9   34.5   77.2     <2


  # Assume a normal distribution
  #-----------------------------

  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored))

  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored,
      prob.method = "kaplan-meier"))


  # Assume a lognormal distribution
  #--------------------------------

  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm"))

  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm",
      prob.method = "kaplan-meier"))


  # Assume a gamma distribution
  #----------------------------

  # Michael-Schucany plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma"))

  # Kaplan-Meier plotting positions:
  dev.new()
  with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma",
      prob.method = "kaplan-meier"))

  # Clean up
  #---------
  graphics.off()

  #==========
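
  # The Value section states that the result is returned invisibly, so
  # nothing is printed unless the result is assigned.  A small sketch (not
  # part of the original examples) of capturing the plotted coordinates for
  # the lognormal comparison of the manganese data.  The name cdf.list is
  # arbitrary, and library(EnvStats) is included only to make the snippet
  # self-contained.

  library(EnvStats)

  dev.new()
  cdf.list <- with(EPA.09.Ex.15.1.manganese.df,
    cdfCompareCensored(Manganese.ppb, Censored, distribution = "lnorm"))

  # Component names of the returned list, then the structure of the
  # coordinates plotted for the empirical cdf of x
  names(cdf.list)
  str(cdf.list$x.ecdf.list)

  #==========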

  # Compare the distributions of copper and zinc between the Alluvial Fan Zone
  # and the Basin-Trough Zone using the data of Millard and Deverel (1988).
  # The data are stored in Millard.Deverel.88.df.

  Millard.Deverel.88.df
#>     Cu.orig Cu Cu.censored Zn.orig  Zn Zn.censored         Zone Location
#> 1       < 1  1        TRUE     <10  10        TRUE Alluvial.Fan        1
#> 2       < 1  1        TRUE       9   9       FALSE Alluvial.Fan        2
#> 3         3  3       FALSE      NA  NA       FALSE Alluvial.Fan        3
#> 4         3  3       FALSE       5   5       FALSE Alluvial.Fan        4
#> 5         5  5       FALSE      18  18       FALSE Alluvial.Fan        5
#> 6         1  1       FALSE     <10  10        TRUE Alluvial.Fan        6
#> 7         4  4       FALSE      12  12       FALSE Alluvial.Fan        7
#> 8         4  4       FALSE      10  10       FALSE Alluvial.Fan        8
#> 9         2  2       FALSE      11  11       FALSE Alluvial.Fan        9
#> 10        2  2       FALSE      11  11       FALSE Alluvial.Fan       10
#> 11        1  1       FALSE      19  19       FALSE Alluvial.Fan       11
#> 12        2  2       FALSE       8   8       FALSE Alluvial.Fan       12
#> 13      < 5  5        TRUE     < 3   3        TRUE Alluvial.Fan       13
#> 14       11 11       FALSE     <10  10        TRUE Alluvial.Fan       14
#> 15      < 1  1        TRUE     <10  10        TRUE Alluvial.Fan       15
#> 16        2  2       FALSE      10  10       FALSE Alluvial.Fan       16
#> 17        2  2       FALSE      10  10       FALSE Alluvial.Fan       17
#> 18        2  2       FALSE      10  10       FALSE Alluvial.Fan       18
#> 19        2  2       FALSE      10  10       FALSE Alluvial.Fan       19
#> 20      <20 20        TRUE     <10  10        TRUE Alluvial.Fan       20
#> 21        2  2       FALSE      10  10       FALSE Alluvial.Fan       21
#> 22        2  2       FALSE     <10  10        TRUE Alluvial.Fan       22
#> 23        3  3       FALSE      10  10       FALSE Alluvial.Fan       23
#> 24        3  3       FALSE     <10  10        TRUE Alluvial.Fan       24
#> 25       NA NA       FALSE      10  10       FALSE Alluvial.Fan       25
#> 26      <20 20        TRUE     <10  10        TRUE Alluvial.Fan       26
#> 27      <10 10        TRUE      10  10       FALSE Alluvial.Fan       27
#> 28        7  7       FALSE      10  10       FALSE Alluvial.Fan       28
#> 29        5  5       FALSE      20  20       FALSE Alluvial.Fan       29
#> 30        2  2       FALSE      20  20       FALSE Alluvial.Fan       30
#> 31        2  2       FALSE     <10  10        TRUE Alluvial.Fan       31
#> 32      <10 10        TRUE      20  20       FALSE Alluvial.Fan       32
#> 33        7  7       FALSE      20  20       FALSE Alluvial.Fan       33
#> 34       12 12       FALSE      20  20       FALSE Alluvial.Fan       34
#> 35      < 1  1        TRUE     <10  10        TRUE Alluvial.Fan       35
#> 36       20 20       FALSE      10  10       FALSE Alluvial.Fan       36
#> 37       NA NA       FALSE      20  20       FALSE Alluvial.Fan       37
#> 38       NA NA       FALSE     620 620       FALSE Alluvial.Fan       38
#> 39       16 16       FALSE      40  40       FALSE Alluvial.Fan       39
#> 40      < 5  5        TRUE      50  50       FALSE Alluvial.Fan       40
#> 41        1  1       FALSE      33  33       FALSE Alluvial.Fan       41
#> 42        2  2       FALSE      10  10       FALSE Alluvial.Fan       42
#> 43      < 5  5        TRUE      20  20       FALSE Alluvial.Fan       43
#> 44        3  3       FALSE      10  10       FALSE Alluvial.Fan       44
#> 45        2  2       FALSE      10  10       FALSE Alluvial.Fan       45
#> 46        8  8       FALSE      10  10       FALSE Alluvial.Fan       46
#> 47        7  7       FALSE      30  30       FALSE Alluvial.Fan       47
#> 48        5  5       FALSE      20  20       FALSE Alluvial.Fan       48
#> 49      < 5  5        TRUE      10  10       FALSE Alluvial.Fan       49
#> 50        2  2       FALSE      20  20       FALSE Alluvial.Fan       50
#> 51      <10 10        TRUE      20  20       FALSE Alluvial.Fan       51
#> 52      < 5  5        TRUE      20  20       FALSE Alluvial.Fan       52
#> 53      < 5  5        TRUE     <10  10        TRUE Alluvial.Fan       53
#> 54        2  2       FALSE      20  20       FALSE Alluvial.Fan       54
#> 55       10 10       FALSE      23  23       FALSE Alluvial.Fan       55
#> 56        2  2       FALSE      17  17       FALSE Alluvial.Fan       56
#> 57        4  4       FALSE      10  10       FALSE Alluvial.Fan       57
#> 58      < 5  5        TRUE     <10  10        TRUE Alluvial.Fan       58
#> 59        2  2       FALSE      10  10       FALSE Alluvial.Fan       59
#> 60        3  3       FALSE      20  20       FALSE Alluvial.Fan       60
#> 61        9  9       FALSE      29  29       FALSE Alluvial.Fan       61
#> 62      < 5  5        TRUE      20  20       FALSE Alluvial.Fan       62
#> 63        2  2       FALSE     <10  10        TRUE Alluvial.Fan       63
#> 64        2  2       FALSE      10  10       FALSE Alluvial.Fan       64
#> 65        2  2       FALSE     <10  10        TRUE Alluvial.Fan       65
#> 66        2  2       FALSE      10  10       FALSE Alluvial.Fan       66
#> 67        1  1       FALSE       7   7       FALSE Alluvial.Fan       67
#> 68        1  1       FALSE     <10  10        TRUE Alluvial.Fan       68
#> 69        2  2       FALSE      20  20       FALSE Basin.Trough        1
#> 70        2  2       FALSE      10  10       FALSE Basin.Trough        2
#> 71       12 12       FALSE      60  60       FALSE Basin.Trough        3
#> 72        2  2       FALSE      20  20       FALSE Basin.Trough        4
#> 73        1  1       FALSE      12  12       FALSE Basin.Trough        5
#> 74      <10 10        TRUE       8   8       FALSE Basin.Trough        6
#> 75      <10 10        TRUE     <10  10        TRUE Basin.Trough        7
#> 76        4  4       FALSE      14  14       FALSE Basin.Trough        8
#> 77      <10 10        TRUE     <10  10        TRUE Basin.Trough        9
#> 78      < 1  1        TRUE      17  17       FALSE Basin.Trough       10
#> 79        1  1       FALSE     < 3   3        TRUE Basin.Trough       11
#> 80      < 2  2        TRUE      11  11       FALSE Basin.Trough       12
#> 81      < 2  2        TRUE       5   5       FALSE Basin.Trough       13
#> 82        1  1       FALSE      12  12       FALSE Basin.Trough       14
#> 83        2  2       FALSE       4   4       FALSE Basin.Trough       15
#> 84      <10 10        TRUE       3   3       FALSE Basin.Trough       16
#> 85        3  3       FALSE       6   6       FALSE Basin.Trough       17
#> 86      < 1  1        TRUE       3   3       FALSE Basin.Trough       18
#> 87        1  1       FALSE      15  15       FALSE Basin.Trough       19
#> 88        1  1       FALSE      13  13       FALSE Basin.Trough       20
#> 89        3  3       FALSE       4   4       FALSE Basin.Trough       21
#> 90      < 5  5        TRUE      20  20       FALSE Basin.Trough       22
#> 91       NA NA       FALSE      20  20       FALSE Basin.Trough       23
#> 92       17 17       FALSE      70  70       FALSE Basin.Trough       24
#> 93       23 23       FALSE      60  60       FALSE Basin.Trough       25
#> 94        9  9       FALSE      40  40       FALSE Basin.Trough       26
#> 95        9  9       FALSE      30  30       FALSE Basin.Trough       27
#> 96        3  3       FALSE      40  40       FALSE Basin.Trough       28
#> 97        3  3       FALSE      17  17       FALSE Basin.Trough       29
#> 98      <15 15        TRUE      10  10       FALSE Basin.Trough       30
#> 99      < 5  5        TRUE      20  20       FALSE Basin.Trough       31
#> 100       4  4       FALSE      20  20       FALSE Basin.Trough       32
#> 101     < 5  5        TRUE       5   5       FALSE Basin.Trough       33
#> 102     < 5  5        TRUE      10  10       FALSE Basin.Trough       34
#> 103     < 5  5        TRUE      50  50       FALSE Basin.Trough       35
#> 104       4  4       FALSE      30  30       FALSE Basin.Trough       36
#> 105       8  8       FALSE      25  25       FALSE Basin.Trough       37
#> 106       1  1       FALSE     <10  10        TRUE Basin.Trough       38
#> 107      15 15       FALSE      10  10       FALSE Basin.Trough       39
#> 108       3  3       FALSE      40  40       FALSE Basin.Trough       40
#> 109       3  3       FALSE      20  20       FALSE Basin.Trough       41
#> 110       1  1       FALSE      10  10       FALSE Basin.Trough       42
#> 111       6  6       FALSE      20  20       FALSE Basin.Trough       43
#> 112       3  3       FALSE      20  20       FALSE Basin.Trough       44
#> 113       6  6       FALSE      30  30       FALSE Basin.Trough       45
#> 114       3  3       FALSE      20  20       FALSE Basin.Trough       46
#> 115       4  4       FALSE      30  30       FALSE Basin.Trough       47
#> 116       5  5       FALSE      50  50       FALSE Basin.Trough       48
#> 117      14 14       FALSE      90  90       FALSE Basin.Trough       49
#> 118       4  4       FALSE      20  20       FALSE Basin.Trough       50

  Cu.AF <- with(Millard.Deverel.88.df,
    Cu[Zone == "Alluvial.Fan"])

  Cu.AF.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Alluvial.Fan"])

  Cu.BT <- with(Millard.Deverel.88.df,
    Cu[Zone == "Basin.Trough"])

  Cu.BT.cen <- with(Millard.Deverel.88.df,
    Cu.censored[Zone == "Basin.Trough"])

  Zn.AF <- with(Millard.Deverel.88.df,
    Zn[Zone == "Alluvial.Fan"])

  Zn.AF.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Alluvial.Fan"])

  Zn.BT <- with(Millard.Deverel.88.df,
    Zn[Zone == "Basin.Trough"])

  Zn.BT.cen <- with(Millard.Deverel.88.df,
    Zn.censored[Zone == "Basin.Trough"])


  # First compare the copper concentrations
  #----------------------------------------
  dev.new()
  cdfCompareCensored(x = Cu.AF, censored = Cu.AF.cen,
    y = Cu.BT, y.censored = Cu.BT.cen)
#> Warning: 3 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.
#> Warning: 1 observations with NA/NaN/Inf in 'y' and/or 'y.censored' removed.


  # Now compare the zinc concentrations
  #------------------------------------
  dev.new()
  cdfCompareCensored(x = Zn.AF, censored = Zn.AF.cen,
    y = Zn.BT, y.censored = Zn.BT.cen)
#> Warning: 1 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.


  # Compare the Zinc concentrations again, but delete
  # the one "outlier".
  #--------------------------------------------------

  summaryStats(Zn.AF)
#>        N    Mean      SD Median Min Max NA's N.Total
#> Zn.AF 67 23.5075 74.4192     10   3 620    1      68
#> attr(,"class")
#> [1] "summaryStats"
#> attr(,"stats.in.rows")
#> [1] FALSE
#> attr(,"drop0trailing")
#> [1] TRUE

  summaryStats(Zn.BT)
#>        N  Mean      SD Median Min Max
#> Zn.BT 50 21.94 18.7044   18.5   3  90
#> attr(,"class")
#> [1] "summaryStats"
#> attr(,"stats.in.rows")
#> [1] FALSE
#> attr(,"drop0trailing")
#> [1] TRUE

  which(Zn.AF == 620)
#> [1] 38

  summaryStats(Zn.AF[-38])
#>             N    Mean     SD Median Min Max NA's N.Total
#> Zn.AF[-38] 66 14.4697 8.1604     10   3  50    1      67
#> attr(,"class")
#> [1] "summaryStats"
#> attr(,"stats.in.rows")
#> [1] FALSE
#> attr(,"drop0trailing")
#> [1] TRUE


  dev.new()
  cdfCompareCensored(x = Zn.AF[-38], censored = Zn.AF.cen[-38],
    y = Zn.BT, y.censored = Zn.BT.cen)
#> Warning: 1 observations with NA/NaN/Inf in 'x' and/or 'censored' removed.

  #----------

  # Clean up
  #---------

  rm(Cu.AF, Cu.AF.cen, Cu.BT, Cu.BT.cen,
     Zn.AF, Zn.AF.cen, Zn.BT, Zn.BT.cen)
  graphics.off()
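
  #==========

  # A further sketch (not part of the original examples): repeat the copper
  # comparison between the two zones, this time marking the censored
  # observations on the plot through the arguments include.x.cen and
  # include.y.cen described in the Arguments section.  The name Cu.cdfs is
  # arbitrary, and library(EnvStats) is included only so that the snippet
  # can be run on its own.  As in the copper example above, warnings about
  # removed missing values are to be expected.

  library(EnvStats)

  dev.new()
  Cu.cdfs <- with(Millard.Deverel.88.df,
    cdfCompareCensored(
      x = Cu[Zone == "Alluvial.Fan"],
      censored = Cu.censored[Zone == "Alluvial.Fan"],
      y = Cu[Zone == "Basin.Trough"],
      y.censored = Cu.censored[Zone == "Basin.Trough"],
      include.x.cen = TRUE, include.y.cen = TRUE))

  # Clean up
  #---------
  graphics.off()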