Estimate Parameters of a Generalized Extreme Value Distribution

Estimate the location, scale and shape parameters of a generalized extreme value distribution, and optionally construct a confidence interval for one of the parameters.

egevd(x, method = "mle", pwme.method = "unbiased", tsoe.method = "med", 
    plot.pos.cons = c(a = 0.35, b = 0), ci = FALSE, ci.parameter = "location", 
    ci.type = "two-sided", ci.method = "normal.approx", information = "observed", 
    conf.level = 0.95)

Arguments

x: numeric vector of observations.
method: character string specifying the method of estimation. Possible values are "mle" (maximum likelihood; the default), "pwme" (probability-weighted moments), and "tsoe" (two-stage order-statistics estimator of Castillo and Hadi (1994)). See the DETAILS section for more information on these estimation methods.
pwme.method: character string specifying what method to use to compute the probability-weighted moments when method="pwme". The possible values are "unbiased" (method based on the U-statistic; the default), or "plotting.position" (method based on the plotting position formula). See the DETAILS section in this help file and the help file for pwMoment for more information. This argument is ignored if method is not equal to "pwme".
tsoe.method: character string specifying the robust function to apply in the second stage of the two-stage order-statistics estimator when method="tsoe". Possible values are "med" (median; the default), "lms" (least median of squares), and "lts" (least trimmed squares). See the DETAILS section for more information on these estimation methods. This argument is ignored if method is not equal to "tsoe".
plot.pos.cons: numeric vector of length 2 specifying the constants used in the formula for the plotting positions when method="pwme" and pwme.method="plotting.position". The default value is plot.pos.cons=c(a=0.35, b=0). If this vector has a names attribute with the value c("a","b") or c("b","a"), then the elements will be matched by name in the formula for computing the plotting positions. Otherwise, the first element is mapped to the name "a" and the second element to the name "b". See the DETAILS section in this help file and the help file for pwMoment for more information. This argument is used only if method="tsoe", or if both method="pwme" and pwme.method="plotting.position".
ci: logical scalar indicating whether to compute a confidence interval for the location, scale, or shape parameter. The default value is FALSE.
ci.parameter: character string indicating the parameter for which the confidence interval is desired. The possible values are "location" (the default), "scale", or "shape". This argument is ignored if ci=FALSE.
ci.type: character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". This argument is ignored if ci=FALSE.
ci.method: character string indicating what method to use to construct the confidence interval for the location, scale, or shape parameter. Currently, the only possible value is "normal.approx" (the default). See the DETAILS section for more information. This argument is ignored if ci=FALSE.
information: character string indicating which kind of Fisher information to use when computing the variance-covariance matrix of the maximum likelihood estimators. The possible values are "observed" (the default) and "expected". See the DETAILS section for more information. This argument is used only when method="mle" and ci=TRUE.
conf.level: a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95. This argument is ignored if ci=FALSE.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
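As a minimal sketch of this cleaning step in base R (not the internal code of egevd), is.finite() is FALSE for NA, NaN, Inf and -Inf, so a single logical index is enough. The names keep, x_clean and n_removed are illustrative only.

```r
# Illustrative data containing every kind of value that is dropped
x <- c(1.2, NA, 3.4, NaN, Inf, -Inf, 2.1)

# is.finite() is FALSE for NA, NaN, Inf and -Inf
keep <- is.finite(x)
x_clean <- x[keep]        # observations kept for estimation
n_removed <- sum(!keep)   # number of observations removed
```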

Let $\underline{x} = (x_1, x_2, \ldots, x_n)$ be a vector of $n$ observations from a generalized extreme value distribution with parameters location=$\eta$, scale=$\theta$, and shape=$\kappa$.

Estimation

Maximum Likelihood Estimation (method="mle")
The log-likelihood function is given by: $$L(\eta, \theta, \kappa) = -n \, \log(\theta) - (1 - \kappa) \sum^n_{i=1} y_i - \sum^n_{i=1} e^{-y_i}$$ where $$y_i = -\frac{1}{\kappa} \log\left[1 - \frac{\kappa(x_i - \eta)}{\theta}\right]$$ (see, for example, Jenkinson, 1969; Prescott and Walden, 1980; Prescott and Walden, 1983; Hosking, 1985; Macleod, 1989). The maximum likelihood estimators (MLEs) of $\eta$, $\theta$, and $\kappa$ are those values that maximize the log-likelihood function, subject to the following constraints: $$\theta > 0$$ $$\kappa \le 1$$ $$x_i < \eta + \frac{\theta}{\kappa} \; \text{if} \; \kappa > 0$$ $$x_i > \eta + \frac{\theta}{\kappa} \; \text{if} \; \kappa < 0$$ Although in theory the value of $\kappa$ may lie anywhere in the interval $(-\infty, \infty)$ (see GEVD), the constraint $\kappa \le 1$ is imposed because when $\kappa > 1$ the likelihood can be made infinite and thus the MLE does not exist (Castillo and Hadi, 1994). Hence, this method of estimation is not valid when the true value of $\kappa$ is larger than 1. Hosking (1985) and Hosking et al. (1985) note that in practice the value of $\kappa$ tends to lie in the interval $-1/2 < \kappa < 1/2$.

The value of $-L$ is minimized using the R function nlminb. Prescott and Walden (1983) give formulas for the gradient and Hessian. Only the gradient is supplied in the call to nlminb. The values of the PWME (see below) are used as the starting values. If the starting value of $\kappa$ is less than 0.001 in absolute value, it is reset to sign(k) * 0.001, as suggested by Hosking (1985).
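The minimization can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by egevd: no analytic gradient is supplied, the starting values are simple Gumbel moment estimates rather than the PWME, and points outside the constraints return a large penalty value. The names gevd_negloglik, theta_start and start are hypothetical.

```r
# Negative log-likelihood of the GEVD with location eta, scale theta
# and shape kappa (the kappa -> 0 limit is the Gumbel case).
gevd_negloglik <- function(par, x) {
  eta <- par[1]; theta <- par[2]; kappa <- par[3]
  z <- 1 - kappa * (x - eta) / theta
  # Assumption: a large finite penalty outside the constraints
  if (theta <= 0 || kappa > 1 || any(z <= 0)) return(1e10)
  y <- if (abs(kappa) < 1e-8) (x - eta) / theta else -log(z) / kappa
  length(x) * log(theta) + (1 - kappa) * sum(y) + sum(exp(-y))
}

# Simulate data by inverting the GEVD cdf (illustrative values only)
set.seed(1)
n <- 200
u <- runif(n)
x <- 2 + 1 * (1 - (-log(u))^0.2) / 0.2

# Assumed starting values: Gumbel moment estimates and a small shape
theta_start <- sd(x) * sqrt(6) / pi
start <- c(mean(x) - 0.5772 * theta_start, theta_start, 0.001)

# Minimize -L subject to theta > 0 and kappa <= 1
fit <- nlminb(start, gevd_negloglik, x = x,
              lower = c(-Inf, 1e-8, -Inf), upper = c(Inf, Inf, 1))
fit$par  # estimates of (eta, theta, kappa)
```

The support constraints on $x_i$ depend on the parameters themselves, so they cannot be passed as box bounds; the penalty value is one simple way to keep the optimizer inside the valid region.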

Probability-Weighted Moments Estimation (method="pwme")
The idea of probability-weighted moments was introduced by Greenwood et al. (1979). Landwehr et al. (1979) derived probability-weighted moment estimators (PWME's) for the parameters of the Type I (Gumbel) extreme value distribution. Hosking et al. (1985) extended these results to the generalized extreme value distribution. See the abstract for Hosking et al. (1985) for details on how these estimators are computed.
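For orientation, the PWME calculation can be sketched in base R as below. The formulas are stated from the commonly cited form of the Hosking et al. (1985) estimators and should be checked against that paper. The plotting-position form $(j - a)/(n + b)$ is an assumption here, and the function names pwm_sample and gevd_pwme are hypothetical, not part of EnvStats.

```r
# Sample probability-weighted moments b0, b1, b2 of a numeric vector
pwm_sample <- function(x, method = c("unbiased", "plotting.position"),
                       a = 0.35, b = 0) {
  method <- match.arg(method)
  x <- sort(x)
  n <- length(x)
  j <- seq_len(n)
  if (method == "unbiased") {
    w1 <- (j - 1) / (n - 1)        # U-statistic weights for b1
    w2 <- w1 * (j - 2) / (n - 2)   # U-statistic weights for b2
  } else {
    p <- (j - a) / (n + b)         # assumed plotting-position formula
    w1 <- p
    w2 <- p^2
  }
  c(b0 = mean(x), b1 = mean(w1 * x), b2 = mean(w2 * x))
}

# PWME of (location, scale, shape), using the approximation for kappa
gevd_pwme <- function(x, ...) {
  b <- pwm_sample(x, ...)
  d <- 2 * b[["b1"]] - b[["b0"]]
  cc <- d / (3 * b[["b2"]] - b[["b0"]]) - log(2) / log(3)
  kappa <- 7.8590 * cc + 2.9554 * cc^2
  theta <- d * kappa / (gamma(1 + kappa) * (1 - 2^(-kappa)))
  eta <- b[["b0"]] + theta * (gamma(1 + kappa) - 1) / kappa
  c(location = eta, scale = theta, shape = kappa)
}

# Illustrative data simulated by inverting the GEVD cdf
set.seed(1)
x <- 2 + 1 * (1 - (-log(runif(5000)))^0.2) / 0.2
est <- gevd_pwme(x)
est_pp <- gevd_pwme(x, method = "plotting.position")
```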

Two-Stage Order Statistics Estimation (method="tsoe")
The two-stage order statistics estimator (TSOE) was introduced by Castillo and Hadi (1994) as an alternative to the MLE and PWME. Unlike the MLE and PWME, the TSOE of $\kappa$ exists for all combinations of sample values and possible values of $\kappa$. See the abstract for Castillo and Hadi (1994) for details on how these estimators are computed. In the second stage, Castillo and Hadi (1994) suggest using either the median or the least median of squares as the robust function. The function egevd allows three options for the robust function: median (tsoe.method="med"; see the R help file for median), least median of squares (tsoe.method="lms"; see the help file for lmsreg in the package MASS), and least trimmed squares (tsoe.method="lts"; see the help file for ltsreg in the package MASS).

Confidence Intervals
When ci=TRUE, an approximate $(1-\alpha)100\%$ confidence interval for $\eta$ can be constructed by assuming that the estimator of $\eta$ is approximately normally distributed. A two-sided confidence interval is constructed as: $$[\hat{\eta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}, \, \hat{\eta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\eta}}]$$ where $t(\nu, p)$ is the $p$'th quantile of Student's t-distribution with $\nu$ degrees of freedom, and the quantity $$\hat{\sigma}_{\hat{\eta}}$$ denotes the estimated asymptotic standard deviation of the estimator of $\eta$.

Similarly, a two-sided confidence interval for $\theta$ is constructed as: $$[\hat{\theta} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}, \, \hat{\theta} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\theta}}]$$ and a two-sided confidence interval for $\kappa$ is constructed as: $$[\hat{\kappa} - t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\kappa}}, \, \hat{\kappa} + t(n-1, 1-\alpha/2) \hat{\sigma}_{\hat{\kappa}}]$$

One-sided confidence intervals for $\eta$, $\theta$, and $\kappa$ are computed in a similar fashion.
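The interval formulas above can be sketched as a small base-R helper. This is a hypothetical function, not part of EnvStats; it takes an estimate and its estimated standard deviation as given. The one-sided forms shown (replace $\alpha/2$ with $\alpha$ and leave the other limit unbounded) are the usual convention and are an assumption here.

```r
# t-based normal-approximation interval for a single parameter.
# `estimate` and `se` would come from the fitted model; df defaults to
# n - 1 as in the formulas above.
normal_approx_ci <- function(estimate, se, n, conf.level = 0.95,
                             ci.type = c("two-sided", "lower", "upper"),
                             df = n - 1) {
  ci.type <- match.arg(ci.type)
  alpha <- 1 - conf.level
  limits <- switch(ci.type,
    "two-sided" = estimate + c(-1, 1) * qt(1 - alpha / 2, df) * se,
    "lower"     = c(estimate - qt(1 - alpha, df) * se, Inf),
    "upper"     = c(-Inf, estimate + qt(1 - alpha, df) * se))
  names(limits) <- c("LCL", "UCL")
  limits
}

# Usage with made-up numbers (not output from egevd)
ci_two <- normal_approx_ci(1.6, 0.2, n = 20, conf.level = 0.9)
ci_low <- normal_approx_ci(1.6, 0.2, n = 20, conf.level = 0.9,
                           ci.type = "lower")
```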

Maximum Likelihood Estimator (method="mle")
Prescott and Walden (1980) derive the elements of the Fisher information matrix (the expected information). The inverse of this matrix, evaluated at the values of the MLE, is the estimated asymptotic variance-covariance matrix of the MLE. This method is used to estimate the standard deviations of the estimated distribution parameters when information="expected". The necessary regularity conditions hold for $\kappa < 1/2$. Thus, this method of constructing confidence intervals is not valid when the true value of $\kappa$ is greater than or equal to 1/2.

Prescott and Walden (1983) derive expressions for the observed information matrix (i.e., the Hessian). This matrix is used to compute the estimated asymptotic variance-covariance matrix of the MLE when information="observed".

In computer simulations, Prescott and Walden (1983) found that the variance-covariance matrix based on the observed information gave slightly more accurate estimates of the variance of the MLE of $\kappa$ than the estimated variance based on the expected information.
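To illustrate the observed-information approach, the sketch below inverts a numerically differentiated Hessian of the negative log-likelihood at the MLE. It is an approximation under assumptions: optimHess stands in for the analytic expressions of Prescott and Walden (1983), the starting values are Gumbel moment estimates, and the names gevd_negloglik, vcov_hat and se_hat are hypothetical.

```r
# Negative log-likelihood of the GEVD (location eta, scale theta, shape kappa)
gevd_negloglik <- function(par, x) {
  eta <- par[1]; theta <- par[2]; kappa <- par[3]
  z <- 1 - kappa * (x - eta) / theta
  if (theta <= 0 || kappa > 1 || any(z <= 0)) return(1e10)  # penalty
  y <- if (abs(kappa) < 1e-8) (x - eta) / theta else -log(z) / kappa
  length(x) * log(theta) + (1 - kappa) * sum(y) + sum(exp(-y))
}

# Illustrative data simulated by inverting the GEVD cdf
set.seed(2)
x <- 2 + 1 * (1 - (-log(runif(500)))^0.2) / 0.2

# Fit by minimizing the negative log-likelihood (assumed starting values)
theta_start <- sd(x) * sqrt(6) / pi
start <- c(mean(x) - 0.5772 * theta_start, theta_start, 0.001)
fit <- nlminb(start, gevd_negloglik, x = x,
              lower = c(-Inf, 1e-8, -Inf), upper = c(Inf, Inf, 1))

# Observed information = Hessian of the negative log-likelihood at the MLE;
# its inverse estimates the asymptotic variance-covariance matrix.
H <- optimHess(fit$par, gevd_negloglik, x = x)
vcov_hat <- solve(H)
se_hat <- sqrt(diag(vcov_hat))
names(se_hat) <- c("location", "scale", "shape")
```

The resulting standard deviations are the quantities that enter the confidence interval formulas when information="observed".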

Probability-Weighted Moments Estimator (method="pwme")
Hosking et al. (1985) show that these estimators are asymptotically multivariate normal and derive the asymptotic variance-covariance matrix. See the abstract for Hosking et al. (1985) for details on how this matrix is computed.

Two-Stage Order Statistics Estimator (method="tsoe")
Currently there is no built-in method in EnvStats for computing confidence intervals when method="tsoe". Castillo and Hadi (1994) suggest using the bootstrap or jackknife method.
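One way to follow that suggestion is a percentile bootstrap, sketched below in base R. The percentile variant is a choice made for this illustration and is not prescribed by Castillo and Hadi (1994). The helper boot_percentile_ci is hypothetical, and the median of exponential data is only a stand-in estimator so that the block runs without EnvStats.

```r
# Percentile bootstrap interval for any scalar estimator
boot_percentile_ci <- function(x, estimator, B = 1000, conf.level = 0.95) {
  # Re-estimate on B resamples drawn with replacement
  stats <- replicate(B, estimator(sample(x, replace = TRUE)))
  alpha <- 1 - conf.level
  quantile(stats, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Stand-in example: bootstrap the sample median
set.seed(1)
x <- rexp(50)
ci <- boot_percentile_ci(x, median)

# With EnvStats loaded, the estimator could instead be, for example:
#   function(z) egevd(z, method = "tsoe")$parameters[["shape"]]
```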

Value

A list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

References

Castillo, E., and A. Hadi. (1994). Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution. Environmetrics 5, 417–432.

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Greenwood, J.A., J.M. Landwehr, N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressible in Inverse Form. Water Resources Research 15(5), 1049–1054.

Hosking, J.R.M. (1984). Testing Whether the Shape Parameter is Zero in the Generalized Extreme-Value Distribution. Biometrika 71(2), 367–374.

Hosking, J.R.M. (1985). Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 34(3), 301–310.

Hosking, J.R.M., J.R. Wallis, and E.F. Wood. (1985). Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments. Technometrics 27(3), 251–261.

Jenkinson, A.F. (1969). Statistics of Extremes. Technical Note 98, World Meteorological Organization, Geneva.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York.

Landwehr, J.M., N.C. Matalas, and J.R. Wallis. (1979). Probability Weighted Moments Compared With Some Traditional Techniques in Estimating Gumbel Parameters and Quantiles. Water Resources Research 15(5), 1055–1064.

Macleod, A.J. (1989). Remark AS R76: A Remark on Algorithm AS 215: Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Applied Statistics 38(1), 198–199.

Prescott, P., and A.T. Walden. (1980). Maximum Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution. Biometrika 67(3), 723–724.

Prescott, P., and A.T. Walden. (1983). Maximum Likelihood Estimation of the Three-Parameter Generalized Extreme-Value Distribution from Censored Samples. Journal of Statistical Computation and Simulation 16, 241–250.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Two-parameter extreme value distributions (EVD) have been applied extensively since the 1930s to several fields of study, including the distributions of hydrological and meteorological variables, human lifetimes, and strength of materials. The three-parameter generalized extreme value distribution (GEVD) was introduced by Jenkinson (1955) to model annual maximum and minimum values of meteorological events. Since then, it has been used extensively in the hydrological and meteorological fields.

The three families of EVDs are all special kinds of GEVDs. When the shape parameter $\kappa=0$, the GEVD reduces to the Type I extreme value (Gumbel) distribution. (The function zTestGevdShape allows you to test the null hypothesis $H_0: \kappa=0$.) When $\kappa < 0$, the GEVD is the same as the Type II extreme value distribution, and when $\kappa > 0$ (in which case the distribution is bounded above by $\eta + \theta/\kappa$) it is the same as the Type III extreme value distribution.
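The reduction to the Gumbel distribution as $\kappa$ approaches 0 can be checked numerically with a small base-R sketch of the GEVD distribution function. The function gevd_cdf is a hypothetical stand-in written for this illustration and is not the EnvStats implementation.

```r
# GEVD cdf with location eta, scale theta and shape kappa; kappa = 0 is
# handled as the Gumbel limit.
gevd_cdf <- function(x, location = 0, scale = 1, shape = 0) {
  z <- (x - location) / scale
  if (shape == 0) return(exp(-exp(-z)))
  # Truncating at 0 gives cdf = 1 above the upper bound (shape > 0)
  # and cdf = 0 below the lower bound (shape < 0)
  t <- pmax(1 - shape * z, 0)
  exp(-t^(1 / shape))
}

x <- c(0, 1, 2, 5)
p_gumbel <- gevd_cdf(x, location = 2, scale = 1, shape = 0)
p_near0  <- gevd_cdf(x, location = 2, scale = 1, shape = 1e-6)

# Largest discrepancy between the near-zero-shape GEVD and the Gumbel cdf
max_diff <- max(abs(p_near0 - p_gumbel))
```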

Hosking et al. (1985) compare the asymptotic and small-sample statistical properties of the PWME with the MLE and Jenkinson's (1969) method of sextiles. Castillo and Hadi (1994) compare the small-sample statistical properties of the MLE, PWME, and TSOE. Hosking and Wallis (1995) compare the small-sample properties of unbiased $L$-moment estimators vs. plotting-position $L$-moment estimators. (PWMEs can be written as linear combinations of $L$-moments and thus have equivalent statistical properties.) Hosking and Wallis (1995) conclude that unbiased estimators should be used for almost all applications.

Examples

  # Generate 20 observations from a generalized extreme value distribution 
  # with parameters location=2, scale=1, and shape=0.2, then compute the 
  # MLE and construct a 90% confidence interval for the location parameter. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(498) 
  dat <- rgevd(20, location = 2, scale = 1, shape = 0.2) 
  egevd(dat, ci = TRUE, conf.level = 0.9)
#> $distribution
#> [1] "Generalized Extreme Value"
#> 
#> $sample.size
#> [1] 20
#> 
#> $parameters
#>  location     scale     shape 
#> 1.6144630 0.9867007 0.2632493 
#> 
#> $n.param.est
#> [1] 3
#> 
#> $method
#> [1] "mle"
#> 
#> $data.name
#> [1] "dat"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "location"
#> 
#> $limits
#>      LCL      UCL 
#> 1.225249 2.003677 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "Normal Approximation\n                                 (t Distribution) based on\n                                 observed information"
#> 
#> $conf.level
#> [1] 0.9
#> 
#> $sample.size
#> [1] 20
#> 
#> $dof
#> [1] 18
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "estimate"

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Generalized Extreme Value
  #
  #Estimated Parameter(s):          location = 1.6144631
  #                                 scale    = 0.9867007
  #                                 shape    = 0.2632493
  #
  #Estimation Method:               mle
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Confidence Interval for:         location
  #
  #Confidence Interval Method:      Normal Approximation
  #                                 (t Distribution) based on
  #                                 observed information
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                90%
  #
  #Confidence Interval:             LCL = 1.225249
  #                                 UCL = 2.003677

  #----------

  # Compare the values of the different types of estimators:

  egevd(dat, method = "mle")$parameters 
#>  location     scale     shape 
#> 1.6144630 0.9867007 0.2632493 
  # location     scale     shape 
  #1.6144631 0.9867007 0.2632493 

  egevd(dat, method = "pwme")$parameters
#>  location     scale     shape 
#> 1.5785779 1.0187880 0.2257948 
  # location     scale     shape 
  #1.5785779 1.0187880 0.2257948 

  egevd(dat, method = "pwme", pwme.method = "plotting.position")$parameters 
#>  location     scale     shape 
#> 1.5509183 0.9804992 0.1657040 
  # location     scale     shape 
  #1.5509183 0.9804992 0.1657040

  egevd(dat, method = "tsoe")$parameters 
#>  location     scale     shape 
#> 1.5372694 1.0876041 0.2927272 
  # location     scale     shape 
  #1.5372694 1.0876041 0.2927272 

  egevd(dat, method = "tsoe", tsoe.method = "lms")$parameters 
#> location    scale    shape 
#> 1.519469 1.081149 0.284863 
  #location    scale    shape 
  #1.519469 1.081149 0.284863

  egevd(dat, method = "tsoe", tsoe.method = "lts")$parameters 
#>  location     scale     shape 
#> 1.4840198 1.0679549 0.2691914 
  # location     scale     shape 
  #1.4840198 1.0679549 0.2691914 

  #----------

  # Clean up
  #---------
  rm(dat)