Estimate Quantiles of a Poisson Distribution

Estimate quantiles of a Poisson distribution, and optionally construct a confidence interval for a quantile.

eqpois(x, p = 0.5, method = "mle/mme/mvue", ci = FALSE, ci.method = "exact",
    ci.type = "two-sided", conf.level = 0.95, digits = 0)

Arguments

x: a numeric vector of observations, or an object resulting from a call to an estimating function that assumes a Poisson distribution (e.g., epois). If ci=TRUE then x must be a numeric vector of observations. If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
p: numeric vector of probabilities for which quantiles will be estimated. All values of p must be between 0 and 1. When ci=TRUE, p must be a scalar. The default value is p=0.5.
method: character string specifying the method to use to estimate the mean. Currently the only possible value is "mle/mme/mvue" (maximum likelihood/method of moments/minimum variance unbiased; the default). See the DETAILS section of the help file for epois for more information.
ci: logical scalar indicating whether to compute a confidence interval for the specified quantile. The default value is ci=FALSE.
ci.method: character string indicating what method to use to construct the confidence interval for the quantile. The only possible value is "exact" (exact method; the default). See the DETAILS section for more information.
ci.type: character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.
conf.level: a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.
digits: an integer indicating the number of decimal places to round to when printing out the value of 100*p. The default value is digits=0.

Details

The function eqpois returns estimated quantiles as well as the estimate of the mean parameter.

Estimation
Let $X$ denote a Poisson random variable with parameter lambda=$\lambda$. Let $x_{p|\lambda}$ denote the $p$'th quantile of the distribution. That is, $$Pr(X < x_{p|\lambda}) \le p \le Pr(X \le x_{p|\lambda}) \;\;\;\; (1)$$ Because the Poisson distribution is discrete, a whole range of values of $p$ is associated with a single value of $X$. For example, for $\lambda=2$, the value $1$ is the $p$'th quantile for any value of $p$ between 0.135 and 0.406, since $Pr(X \le 0) = e^{-2} \approx 0.135$ and $Pr(X \le 1) = 3e^{-2} \approx 0.406$.
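The $\lambda=2$ example can be checked with base R's ppois and qpois. This is a small illustration only; the grid of $p$ values is an arbitrary choice.

```r
lambda <- 2
# Pr(X <= 0) and Pr(X <= 1), approximately 0.135 and 0.406
cdf <- ppois(0:1, lambda)
cdf
# Every p strictly above Pr(X <= 0) and not above Pr(X <= 1)
# has the same quantile, namely 1
p.grid <- c(0.14, 0.2, 0.3, 0.4)
qpois(p.grid, lambda)
```

qpois returns the smallest integer $x$ with $Pr(X \le x) \ge p$, which satisfies equation (1).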

Let $\underline{x}$ denote a vector of $n$ observations from a Poisson distribution with parameter lambda=$\lambda$. The $p$'th quantile is estimated as the $p$'th quantile of a Poisson distribution whose parameter is set equal to the estimated value of $\lambda$. That is: $$\hat{x}_{p|\lambda} = x_{p|\lambda=\hat{\lambda}} \;\;\;\; (2)$$ where $$\hat{\lambda} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\; (3)$$ Because the estimator in equation (3) is the maximum likelihood estimator of $\lambda$ (see the help file for epois), the invariance property of maximum likelihood estimators implies that the estimated quantile in equation (2) is the maximum likelihood estimator of $x_{p|\lambda}$.
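To see equation (3) at work, the sketch below maximizes the Poisson log-likelihood numerically with base R's optimize and compares the maximizer with the sample mean. The data vector is made up for illustration and does not come from this help file.

```r
# Hypothetical sample of counts (illustration only)
x <- c(2, 0, 3, 1, 4, 2, 1, 2, 3, 0)
# Poisson log-likelihood as a function of lambda
loglik <- function(lambda) sum(dpois(x, lambda, log = TRUE))
# Numerical maximization over a bracketing interval
fit <- optimize(loglik, interval = c(0.01, 10), maximum = TRUE)
c(numerical.mle = fit$maximum, sample.mean = mean(x))
```

The two values agree up to the tolerance of optimize, as equation (3) predicts.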

Computationally, quantiles are estimated by 1) estimating the mean parameter with epois, and then 2) calling qpois with the estimated mean parameter.
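The two-step procedure can be sketched in base R without EnvStats. The function name eqpois_sketch is hypothetical and is not part of any package; it only mirrors the point estimate, not the full "estimate" object that eqpois returns.

```r
# Hypothetical helper (not EnvStats code): plug-in Poisson quantiles
eqpois_sketch <- function(x, p = 0.5) {
  x <- x[is.finite(x)]                 # drop NA, NaN, Inf, -Inf
  lambda.hat <- mean(x)                # step 1: estimate lambda, equation (3)
  q <- qpois(p, lambda = lambda.hat)   # step 2: quantile at lambda.hat, equation (2)
  names(q) <- paste0(round(100 * p), "'th %ile")
  list(lambda = lambda.hat, quantiles = q)
}

res <- eqpois_sketch(c(2, 0, 3, 1, 4, 2, 1, 2, 3, 0, NA), p = c(0.5, 0.9))
res
```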

Confidence Intervals
It can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the $p$'th quantile with confidence level $100(1-\alpha)\%$ is equivalent to an upper $\beta$-content tolerance interval with coverage $100p\%$ and confidence level $100(1-\alpha)\%$. Also, a lower confidence interval for the $p$'th quantile with confidence level $100(1-\alpha)\%$ is equivalent to a lower $\beta$-content tolerance interval with coverage $100(1-p)\%$ and confidence level $100(1-\alpha)\%$.
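The constructions in equations (4) through (6) map confidence limits for $\lambda$ onto limits for the quantile. This works because, for a fixed $p$, the Poisson quantile $x_{p|\lambda}$ never decreases as $\lambda$ increases. The check below is a numerical sketch over a finite grid of $\lambda$ values, not a proof.

```r
p <- 0.9
lambda.grid <- seq(0.05, 20, by = 0.05)
q <- qpois(p, lambda = lambda.grid)
# The 90'th percentile should be nondecreasing in lambda
all(diff(q) >= 0)
```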

Thus, based on the theory of tolerance intervals for a Poisson distribution (see tolIntPois), if ci.type="upper", a one-sided upper $100(1-\alpha)\%$ confidence interval for the $p$'th quantile is constructed as: $$[0, x_{p|\lambda=UCL}] \;\;\;\; (4)$$ where $UCL$ denotes the upper $100(1-\alpha)\%$ confidence limit for $\lambda$ (see the help file for epois for information on how $UCL$ is computed).

Similarly, if ci.type="lower", a one-sided lower $100(1-\alpha)\%$ confidence interval for the $p$'th quantile is constructed as: $$[x_{p|\lambda=LCL}, \infty) \;\;\;\; (5)$$ where $LCL$ denotes the lower $100(1-\alpha)\%$ confidence limit for $\lambda$ (see the help file for epois for information on how $LCL$ is computed).

Finally, if ci.type="two-sided", a two-sided $100(1-\alpha)\%$ confidence interval for the $p$'th quantile is constructed as: $$[x_{p|\lambda=LCL}, x_{p|\lambda=UCL}] \;\;\;\; (6)$$ where $LCL$ and $UCL$ denote the two-sided lower and upper $100(1-\alpha)\%$ confidence limits for $\lambda$ (see the help file for epois for information on how $LCL$ and $UCL$ are computed).
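Equations (4) through (6) can be sketched in base R as follows. The function ci_qpois_sketch is hypothetical. It assumes the exact chi-square based confidence limits for $\lambda$, $LCL = \chi^2_{a, 2s}/(2n)$ and $UCL = \chi^2_{1-a, 2s+2}/(2n)$ with $s = \sum x_i$, where $a = \alpha/2$ for a two-sided interval and $a = \alpha$ for a one-sided one; the formulas documented for epois should be treated as authoritative if they differ in detail.

```r
# Hypothetical helper (not EnvStats code). Assumes exact chi-square
# confidence limits for lambda.
ci_qpois_sketch <- function(x, p, conf.level = 0.95,
                            ci.type = c("two-sided", "lower", "upper")) {
  ci.type <- match.arg(ci.type)
  x <- x[is.finite(x)]
  n <- length(x)
  s <- sum(x)
  alpha <- 1 - conf.level
  a <- if (ci.type == "two-sided") alpha / 2 else alpha
  # Exact limits for lambda from the Poisson/chi-square relationship
  LCL <- if (s == 0) 0 else qchisq(a, df = 2 * s) / (2 * n)
  UCL <- qchisq(1 - a, df = 2 * s + 2) / (2 * n)
  switch(ci.type,
    "upper"     = c(LCL = 0, UCL = qpois(p, UCL)),              # equation (4)
    "lower"     = c(LCL = qpois(p, LCL), UCL = Inf),            # equation (5)
    "two-sided" = c(LCL = qpois(p, LCL), UCL = qpois(p, UCL)))  # equation (6)
}

x <- c(2, 0, 3, 1, 4, 2, 1, 2, 3, 0)   # hypothetical data
two <- ci_qpois_sketch(x, p = 0.9)
lo  <- ci_qpois_sketch(x, p = 0.9, ci.type = "lower")
up  <- ci_qpois_sketch(x, p = 0.9, ci.type = "upper")
rbind(two.sided = two, lower = lo, upper = up)
```

Because a one-sided interval spends all of $\alpha$ in one tail, its finite limit is never farther from the point estimate than the corresponding two-sided limit.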

Value

If x is a numeric vector, eqpois returns a list of class "estimate" containing the estimated quantile(s) and other information. See estimate.object for details.

If x is the result of calling an estimation function, eqpois returns a list whose class is the same as x. The list contains the same components as x, as well as components called quantiles and quantile.method.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.

Berthouex, P.M., and I. Hau. (1991). Difficulties Related to Using Extreme Percentiles for Water Quality Regulations. Research Journal of the Water Pollution Control Federation 63(6), 873–879.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 3.

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572-580.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Johnson, N.L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.

Pearson, E.S., and H.O. Hartley, eds. (1970). Biometrika Tables for Statisticians, Volume 1. Cambridge University Press, New York, p.81.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) state:

The U.S. EPA has specifications for air quality monitoring that are, in effect, percentile limitations. ... The U.S. EPA has provided guidance for setting aquatic standards on toxic chemicals that require estimating 99th percentiles and using this statistic to make important decisions about monitoring and compliance. They have also used the 99th percentile to establish maximum daily limits for industrial effluents (e.g., pulp and paper).

Because these percentiles drive regulatory decisions, it is essential to characterize the uncertainty in their estimates, and confidence intervals are the standard tool for doing so.

The Poisson distribution is named after Poisson, who derived this distribution as the limiting distribution of the binomial distribution with parameters size=$N$ and prob=$p$, where $N$ tends to infinity, $p$ tends to 0, and $Np$ stays constant.
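The limiting argument can be illustrated numerically with base R's dbinom and dpois. Holding $Np = \lambda = 2$ fixed, the largest pointwise gap between the binomial and Poisson probabilities over the counts 0 through 10 shrinks as $N$ grows. The value of $\lambda$ and the grid of $N$ values are arbitrary choices for illustration.

```r
lambda <- 2
k <- 0:10
N.grid <- c(10, 100, 10000)
# Largest absolute difference between binomial(N, lambda/N) and Poisson(lambda)
max.diff <- sapply(N.grid, function(N)
  max(abs(dbinom(k, size = N, prob = lambda / N) - dpois(k, lambda))))
names(max.diff) <- N.grid
max.diff
```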

In this context, the Poisson distribution was used by Bortkiewicz (1898) to model the number of deaths (per annum) from kicks by horses in Prussian Army Corps. In this case, $p$, the probability of death from this cause, was small, but the number of soldiers exposed to this risk, $N$, was large.

The Poisson distribution has been applied in a variety of fields, including quality control (modeling number of defects produced in a process), ecology (number of organisms per unit area), and queueing theory. Gibbons (1987b) used the Poisson distribution to model the number of detected compounds per scan of the 32 volatile organic priority pollutants (VOC), and also to model the distribution of chemical concentration (in ppb).

Examples

  # Generate 20 observations from a Poisson distribution with parameter
  # lambda=2.  The true 90'th percentile of this distribution is 4 (actually,
  # 4 is the p'th quantile for any value of p between 0.86 and 0.947).
  # Here we will use eqpois to estimate the 90'th percentile and construct a
  # two-sided 95% confidence interval for this percentile.
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250)
  dat <- rpois(20, lambda = 2)
  eqpois(dat, p = 0.9, ci = TRUE)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Estimated Quantile(s):           90'th %ile = 4
  #
  #Quantile Estimation Method:      mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 3
  #                                 UCL = 5

  # Clean up
  #---------
  rm(dat)