Empirical.RdDensity, distribution function, quantile function, and random generation for the empirical distribution based on a set of observations
vector of quantiles.
vector of quantiles.
vector of probabilities between 0 and 1.
sample size.  If length(n) is larger than 1, then length(n)
  random values are returned.
numeric vector of observations.  Missing (NA), undefined (NaN), and
  infinite (Inf, -Inf) values are allowed but will be removed.
logical scalar indicating whether the assumed parent distribution of x is
  discrete (discrete=TRUE) or continuous (discrete=FALSE).  The
  default value is FALSE.
list with arguments to the R density function.  The default value is
  NULL.  (See the help file for density
  for more information on the arguments to density.)  The argument
  density.arg.list is ignored if discrete=TRUE.
character string indicating what method to use to compute the empirical
  probabilities.  Possible values are "emp.probs" (empirical probabilities,
  default if discrete=TRUE) and "plot.pos" (plotting positions,
  default if discrete=FALSE).  See the DETAILS section for more explanation.
numeric scalar between 0 and 1 containing the value of the plotting position
  constant.  The default value is plot.pos.con=0.375.  See the DETAILS
  section for more information. This argument is ignored if
  prob.method="emp.probs".
Let \(x_1, x_2, \ldots, x_n\) denote a random sample of n observations
  from some unknown probability distribution (i.e., the elements of the argument
  obs), and let \(x_{(i)}\) denote the \(i^{th}\) order statistic, that is,
  the \(i^{th}\) largest observation, for \(i = 1, 2, \ldots, n\).
Estimating Density 
  The function demp computes the empirical probability density function.  If
  the observations are assumed to come from a discrete distribution, the probability
  density (mass) function is estimated by:
  $$\hat{f}(x) = \widehat{Pr}(X = x) = \frac{\sum^n_{i=1} I_{[x]}(x_i)}{n}$$
  where \(I\) is the indicator function:
| \(I_{[x]}(y) =\) | \(1\) | if \(y = x\), | 
| \(0\) | if \(y \ne x\) | 
That is, the estimated probability of observing the value \(x\) is simply the observed proportion of observations equal to \(x\).
If the observations are assumed to come from a continuous distribution, the
  function demp calls the R function density to compute the
  estimated density based on the values specified in the argument obs,
  and then uses linear interpolation to estimate the density at the values
  specified in the argument x.  See the R help file for
  density for more information on how the empirical density is
  computed in the continuous case.
Estimating Probabilities 
  The function pemp computes the estimated cumulative distribution function
  (cdf), also called the empirical cdf (ecdf).  If the observations are assumed to
  come from a discrete distribution, the value of the cdf evaluated at the \(i^{th}\)
  order statistic is usually estimated by:
  $$\hat{F}[x_{(i)}] = \widehat{Pr}(X \le x_{(i)}) = \hat{p}_i =
    \frac{\sum^n_{j=1} I_{(-\infty, x_{(i)}]}(x_j)}{n}$$
  where:
| \(I_{(-\infty, x]}(y) =\) | \(1\) | if \(y \le x\), | 
| \(0\) | if \(y > x\) | 
(D'Agostino, 1986a).  That is, the estimated value of the cdf at the \(i^{th}\)
  order statistic is simply the observed proportion of observations less than or
  equal to the \(i^{th}\) order statistic.  This estimator is sometimes called the
  “empirical probabilities” estimator and is intuitively appealing.
  The function pemp uses the above equations to compute the empirical cdf when
  prob.method="emp.probs".
For any general value of \(x\), when the observations are assumed to come from a discrete distribution, the value of the cdf is estimated by:
| \(\hat{F}(x) =\) | \(0\) | if \(x < x_{(1)}\), | 
| \(\hat{p}_i\) | if \(x_{(i)} \le x < x_{(i+1)}\), | |
| \(1\) | if \(x \ge x_{(n)}\) | 
The function pemp uses the above equation when discrete=TRUE.
If the observations are assumed to come from a continuous distribution, the value
  of the cdf evaluated at the \(i^{th}\) order statistic is usually estimated by:
  $$\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{n - 2a + 1}$$
  where \(a\) denotes the plotting position constant and \(0 \le a \le 1\)
  (Cleveland, 1993, p.18; D'Agostino, 1986a, pp.8,25).  The estimators defined by
  the above equation are called plotting positions and are used to construct
  probability plots.  The function pemp uses the above equation
  when prob.method="plot.pos".
For any general value of \(x\), the value of the cdf is estimated by linear interpolation:
| \(\hat{F}(x) =\) | \(\hat{p}_1\) | if \(x < x_{(1)}\), | 
| \((1 - r)\hat{p}_i + r\hat{p}_{i+1}\) | if \(x_{(i)} \le x < x_{(i+1)}\), | |
| \(\hat{p}_n\) | if \(x \ge x_{(n)}\) | 
where
  $$r = \frac{x - x_{(i)}}{x_{(i+1)} - x_{(i)}}$$
  (Chambers et al., 1983).  The function pemp uses the above two equations
  when discrete=FALSE.
Estimating Quantiles 
  The function qemp computes the estimated quantiles based on the observed
  data.  If the observations are assumed to come from a discrete distribution, the
  \(p^{th}\) quantile is usually estimated by:
| \(\hat{x}_p =\) | \(x_{(1)}\) | if \(p \le \hat{p}_1\), | 
| \(x_{(i)}\) | if \(\hat{p}_{i-1} < p \le \hat{p}_i\), | |
| \(x_n\) | if \(p > \hat{p}_n\) | 
The function qemp uses the above equation when discrete=TRUE.
If the observations are assumed to come from a continuous distribution, the \(p^{th}\) quantile is usually estimated by linear interpolation:
| \(\hat{x}_p =\) | \(x_{(1)}\) | if \(p \le \hat{p}_1\), | 
| \((1 - r)x_{(i-1)} + rx_{(i)}\) | if \(\hat{p}_{i-1} < p \le \hat{p}_i\), | |
| \(x_n\) | if \(p > \hat{p}_n\) | 
where
  $$r = \frac{p - \hat{p}_{i-1}}{\hat{p}_i - \hat{p}_{i-1}}$$
  The function qemp uses the above two equations when discrete=FALSE.
Generating Random Numbers From the Empirical Distribution 
  The function remp simply calls the R function sample to
  sample the elements of obs with replacement.
density (demp), probability (pemp), quantile (qemp), or
  random sample (remp) for the empirical distribution based on the data
  contained in the vector obs.
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11–16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7–62.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley and Sons, New York.
Sheather, S. J. and Jones M. C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statististical Society B, 683–690.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Wegman, E.J. (1972). Nonparametric Probability Density Estimation. Technometrics 14, 533-546.
The function demp let's you perform nonparametric density estimation.
  The function pemp computes the value of the empirical cumulative
  distribution function (ecdf) for user-specified quantiles.  The ecdf is a
  nonparametric estimate of the true cdf (see ecdfPlot).  The
  function qemp computes nonparametric estimates of quantiles
  (see the help files for eqnpar and quantile).
  The function remp let's you sample a set of observations with replacement,
  which is often done while bootstrapping or performing some other kind of
  Monte Carlo simulation.
  # Create a set of 100 observations from a gamma distribution with 
  # parameters shape=4 and scale=5. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)
  set.seed(3) 
  obs <- rgamma(100, shape=4, scale=5)
  # Now plot the empirical distribution (with a histogram) and the true distribution:
  dev.new()
  hist(obs, col = "cyan", xlim = c(0, 65), freq = FALSE, 
    ylab = "Relative Frequency") 
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE) 
  box()
  # Now plot the empirical distribution (based on demp) with the 
  # true distribution:
  x <- qemp(p = seq(0, 1, len = 100), obs = obs) 
  y <- demp(x, obs) 
  dev.new()
  plot(x, y, xlim = c(0, 65), type = "n", 
    xlab = "Value of Random Variable", 
    ylab = "Relative Frequency") 
  lines(x, y, lwd = 2, col = "cyan") 
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
  # Alternatively, you can create the above plot with the function 
  # epdfPlot:
  dev.new()
  epdfPlot(obs, xlim = c(0, 65), epdf.col = "cyan", 
    xlab = "Value of Random Variable", 
    main = "Empirical and Theoretical PDFs")
  pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
  
  # Clean Up
  #---------
  rm(obs, x, y)