Estimate Quantiles of a Gamma Distribution

Estimate quantiles of a gamma distribution, and optionally construct a confidence interval for a quantile.

eqgamma(x, p = 0.5, method = "mle", ci = FALSE, 
    ci.type = "two-sided", conf.level = 0.95, 
    normal.approx.transform = "kulkarni.powar", digits = 0)

  eqgammaAlt(x, p = 0.5, method = "mle", ci = FALSE, 
    ci.type = "two-sided", conf.level = 0.95, 
    normal.approx.transform = "kulkarni.powar", digits = 0)

Arguments

x: a numeric vector of observations, or an object resulting from a call to an estimating function that assumes a gamma distribution (e.g., egamma or egammaAlt). If ci=TRUE then x must be a numeric vector of observations. If x is a numeric vector, missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
p: numeric vector of probabilities for which quantiles will be estimated. All values of p must be between 0 and 1. When ci=TRUE, p must be a scalar. The default value is p=0.5.
method: character string specifying the method to use to estimate the shape and scale parameters of the distribution. The possible values are "mle" (maximum likelihood; the default), "bcmle" (bias-corrected mle), "mme" (method of moments), and "mmue" (method of moments based on the unbiased estimator of variance). See the DETAILS section of the help file for egamma for more information.
ci: logical scalar indicating whether to compute a confidence interval for the quantile. The default value is ci=FALSE.
ci.type: character string indicating what kind of confidence interval for the quantile to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.
conf.level: a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.
normal.approx.transform: character string indicating which power transformation to use. Possible values are "kulkarni.powar" (the default), "cube.root", and
"fourth.root". See the DETAILS section for more informaiton. This argument is ignored if ci=FALSE.
digits: an integer indicating the number of decimal places to round to when printing out the value of 100*p. The default value is digits=0.

Details

The function eqgamma returns estimated quantiles as well as estimates of the shape and scale parameters. The function eqgammaAlt returns estimated quantiles as well as estimates of the mean and coefficient of variation.

Quantiles are estimated by 1) estimating the shape and scale parameters by calling egamma, and then 2) calling the function qgamma and using the estimated values for shape and scale.

The confidence interval for a quantile is computed by:

using a power transformation on the original data to induce approximate normality,
using eqnorm to compute the confidence interval, and then
back-transforming the interval to create a confidence interval on the original scale.

This is similar to what is done to create tolerance intervals for a gamma distribuiton (Krishnamoorthy et al., 2008), and there is a one-to-one relationship between confidence intervals for a quantile and tolerance intervals (see the DETAILS section of the help file for eqnorm). The value normal.approx.transform="cube.root" uses the cube root transformation suggested by Wilson and Hilferty (1931) and used by Krishnamoorthy et al. (2008) and Singh et al. (2010b), and the value normal.approx.transform="fourth.root" uses the fourth root transformation suggested by Hawkins and Wixley (1986) and used by Singh et al. (2010b). The default value normal.approx.transform="kulkarni.powar" uses the “Optimum Power Normal Approximation Method” of Kulkarni and Powar (2010). The “optimum” power \(r\) is determined by:

\(r = -0.0705 - 0.178 \, shape + 0.475 \, \sqrt{shape}\)	if \(shape \le 1.5\)
\(r = 0.246\)	if \(shape > 1.5\)

where \(shape\) denotes the estimate of the shape parameter. Although Kulkarni and Powar (2010) use the maximum likelihood estimate of shape to determine the power \(r\), for the functions eqgamma and eqgammaAlt the power \(r\) is based on whatever estimate of shape is used
(e.g., method="mle", method="bcmle", etc.).

Value

If x is a numeric vector, eqgamma and eqgammaAlt return a list of class "estimate" containing the estimated quantile(s) and other information. See estimate.object for details.

If x is the result of calling an estimation function, eqgamma and eqgammaAlt return a list whose class is the same as x. The list contains the same components as x, as well as components called quantiles and quantile.method. In addition, if ci=TRUE, the returned list contains a component called interval containing the confidence interval information. If x already has a component called interval, this component is replaced with the confidence interval information.

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Hawkins, D. M., and R.A.J. Wixley. (1986). A Note on the Transformation of Chi-Squared Variables to Normality. The American Statistician, 40, 296–298.

Johnson, N.L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, Chapter 17.

Krishnamoorthy K., T. Mathew, and S. Mukherjee. (2008). Normal-Based Methods for a Gamma Distribution: Prediction and Tolerance Intervals and Stress-Strength Reliability. Technometrics, 50(1), 69–78.

Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Kulkarni, H.V., and S.K. Powar. (2010). A New Method for Interval Estimation of the Mean of the Gamma Distribution. Lifetime Data Analysis, 16, 431–447.

Singh, A., A.K. Singh, and R.J. Iaci. (2002). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Wilson, E.B., and M.M. Hilferty. (1931). The Distribution of Chi-Squares. Proceedings of the National Academy of Sciences, 17, 684–688.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The gamma distribution takes values on the positive real line. Special cases of the gamma are the exponential distribution and the chi-square distributions. Applications of the gamma include life testing, statistical ecology, queuing theory, inventory control, and precipitation processes. A gamma distribution starts to resemble a normal distribution as the shape parameter a tends to infinity.

Some EPA guidance documents (e.g., Singh et al., 2002; Singh et al., 2010a,b) strongly recommend against using a lognormal model for environmental data and recommend trying a gamma distribuiton instead.

Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.

Examples

  # Generate 20 observations from a gamma distribution with parameters 
  # shape=3 and scale=2, then estimate the 90th percentile and create 
  # a one-sided upper 95% confidence interval for that percentile. 
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)

  set.seed(250) 
  dat <- rgamma(20, shape = 3, scale = 2) 
  eqgamma(dat, p = 0.9, ci = TRUE, ci.type = "upper")
#> $distribution
#> [1] "Gamma"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>    shape    scale 
#> 2.203862 2.174928 
#> 
#> $method
#> [1] "MLE"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $quantiles
#> 90'th %ile 
#>   9.113446 
#> 
#> $quantile.method
#> [1] "Quantile(s) Based on\n                                 MLE Estimators"
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "90'th %ile"
#> 
#> $limits
#>      LCL      UCL 
#>  0.00000 13.79733 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "Exact using\n                                 Kulkarni & Powar (2010)\n                                 transformation to Normality\n                                 based on mle of 'shape'"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.203862
  #                                 scale = 2.174928
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           90'th %ile = 9.113446
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 13.79733

  #----------
  # Compare these results with the true 90'th percentile:

  qgamma(p = 0.9, shape = 3, scale = 2)
#> [1] 10.64464
  #[1] 10.64464

  #----------

  # Using the same data as in the previous example, use egammaAlt
  # to estimate the mean and cv based on the bias-corrected 
  # estimate of shape, and use the cube-root transformation to 
  # normality.

  eqgammaAlt(dat, p = 0.9, method = "bcmle", ci = TRUE, 
    ci.type = "upper", normal.approx.transform = "cube.root")
#> $distribution
#> [1] "Gamma"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>      mean        cv 
#> 4.7932408 0.7242165 
#> 
#> $method
#> [1] "Bias-Corrected MLE"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $quantiles
#> 90'th %ile 
#>   9.427999 
#> 
#> $quantile.method
#> [1] "Quantile(s) Based on\n                                 Bias-Corrected MLE"
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "90'th %ile"
#> 
#> $limits
#>      LCL      UCL 
#>  0.00000 12.89643 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "Exact using\n                                 Wilson & Hilferty (1931) cube-root\n                                 transformation to Normality"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 19
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          mean = 4.7932408
  #                                 cv   = 0.7242165
  #
  #Estimation Method:               bcmle of 'shape'
  #
  #Estimated Quantile(s):           90'th %ile = 9.428
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 bcmle of 'shape'
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         90'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Wilson & Hilferty (1931) cube-root
  #                                 transformation to Normality
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 12.89643

  #----------

  # Clean up
  rm(dat)
  
  #--------------------------------------------------------------------

  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 
  # 95% confidence  using chrysene data and assuming a lognormal 
  # distribution.  Here we will use the same chrysene data but assume a 
  # gamma distribution.

  # A beta-content upper tolerance limit with 95% coverage and 
  # 95% confidence is equivalent to the 95% upper confidence limit for 
  # the 95th percentile.

  attach(EPA.09.Ex.17.3.chrysene.df)
  Chrysene <- Chrysene.ppb[Well.type == "Background"]
  eqgamma(Chrysene, p = 0.95, ci = TRUE, ci.type = "upper")
#> $distribution
#> [1] "Gamma"
#> 
#> $sample.size
#> [1] 8
#> 
#> $parameters
#>    shape    scale 
#> 2.806929 5.286026 
#> 
#> $method
#> [1] "MLE"
#> 
#> $data.name
#> [1] "Chrysene"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $quantiles
#> 95'th %ile 
#>   31.74348 
#> 
#> $quantile.method
#> [1] "Quantile(s) Based on\n                                 MLE Estimators"
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "95'th %ile"
#> 
#> $limits
#>      LCL      UCL 
#>  0.00000 69.32425 
#> 
#> $type
#> [1] "upper"
#> 
#> $method
#> [1] "Exact using\n                                 Kulkarni & Powar (2010)\n                                 transformation to Normality\n                                 based on mle of 'shape'"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $sample.size
#> [1] 8
#> 
#> $dof
#> [1] 7
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Gamma
  #
  #Estimated Parameter(s):          shape = 2.806929
  #                                 scale = 5.286026
  #
  #Estimation Method:               mle
  #
  #Estimated Quantile(s):           95'th %ile = 31.74348
  #
  #Quantile Estimation Method:      Quantile(s) Based on
  #                                 mle Estimators
  #
  #Data:                            Chrysene
  #
  #Sample Size:                     8
  #
  #Confidence Interval for:         95'th %ile
  #
  #Confidence Interval Method:      Exact using
  #                                 Kulkarni & Powar (2010)
  #                                 transformation to Normality
  #                                 based on mle of 'shape'
  #
  #Confidence Interval Type:        upper
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL =  0.00000
  #                                 UCL = 69.32425

  #----------
  # Clean up

  rm(Chrysene)
  detach("EPA.09.Ex.17.3.chrysene.df")