Expected Value of Order Statistics for Random Sample from Standard Normal Distribution

Compute the expected values of order statistics for a random sample from a standard normal distribution.

evNormOrdStats(n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, seed = 47, approximate = NULL)

  evNormOrdStatsScalar(r = 1, n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, conf.level = 0.95, seed = 47, approximate = NULL)

Arguments

n

positive integer indicating the sample size.

r

positive integer between 1 and n specifying the order statistic for which to compute the expected value.

method

character string indicating what method to use. The possible values are:

"royston". Method based on approximating the exact integral as given in Royston (1982).
"blom". Method based on the approximation formula proposed by Blom (1958).
"mc". Method based on Monte Carlo simulation.

See the DETAILS section below.

lower

numeric scalar $\le -9$ defining the lower bound used for approximating the integral when method="royston". The upper bound is automatically set to -lower. The default value is lower=-9.

inc

numeric scalar between .Machine$double.eps and 0.025 that determines the width of each subdivision used to approximate the integral when
method="royston". The default value is inc=0.025.

warn

logical scalar indicating whether to issue a warning when method="royston" and the sample size is greater than 2000. The default value is warn=TRUE.

alpha

numeric scalar between 0 and 0.5 that determines the constant used when
method="blom". The default value is alpha=3/8.

nmc

integer $\ge 100$ denoting the number of Monte Carlo simulations to use when method="mc". The default value is nmc=2000.

conf.level

numeric scalar between 0 and 1 denoting the confidence level of the confidence interval for the expected value of the normal order statistic when method="mc". The default value is conf.level=0.95.

seed

integer between $-(2^31 - 1)$ and $2^31 - 1$ specifying the argument to set.seed (the random number seed) when method="mc". The default value is seed=47.

approximate

logical scalar included for backwards compatibility with versions of EnvStats prior to version 2.3.0. When method is not supplied and
approximate=FALSE, method is set to method="royston". When method is not supplied and approximate=TRUE, method is set to method="blom". This argument is ignored if method is supplied and/or approxmiate=NULL (the default).

Details

Let $\underline{z} = z_1, z_2, \ldots, z_n$ denote a vector of $n$ observations from a normal distribution with parameters mean=0 and sd=1. That is, $\underline{z}$ denotes a vector of $n$ observations from a standard normal distribution. Let $z_{(r)}$ denote the $r$'th order statistic of $\underline{z}$, for $r = 1, 2, \ldots, n$. The probability density function of $z_{(r)}$ is given by: $$f_{r,n}(t) = \frac{n!}{(r-1)!(n-r)!} [\Phi(t)]^{r-1} [1 - \Phi(t)]^{n-r} \phi(t) \;\;\;\;\;\; (1)$$ where $\Phi$ and $\phi$ denote the cumulative distribution function and probability density function of the standard normal distribution, respectively (Johnson et al., 1994, p.93). Thus, the expected value of $z_{(r)}$ is given by: $$E(r, n) = E[z_{(r)}] = \int_{-\infty}^{\infty} t f_{r,n}(t) dt \;\;\;\;\;\; (2)$$ It can be shown that if $n$ is odd, then $$E[(n+1)/2, n] = 0 \;\;\;\;\;\; (3)$$ Also, for all values of $n$, $$E(r, n) = -E(n-r+1, n) \;\;\;\;\;\; (4)$$

The function evNormOrdStatsScalar computes the value of $E(r,n)$ for user-specified values of $r$ and $n$.

The function evNormOrdStats computes the values of $E(r,n)$ for all values of $r$ (i.e., for $r = 1, 2, \ldots, n$) for a user-specified value of $n$.

Exact Method Based on Royston's Approximation to the Integral (method="royston")

When method="royston", the integral in Equation (2) above is approximated by computing the value of the integrand between the values of lower and -lower using increments of inc, then summing these values and multiplying by inc. In particular, the integrand is restructured as: $$t \; f_{r,n}(t) = t \; exp\{log(n!) - log[(r-1)!] - log[(n-r)!] + (r-1)log[\Phi(t)] + (n-r)log[1 - \Phi(t)] + log[\phi(t)]\} \;\;\; (5)$$ By default, as per Royston (1982), the integrand is evaluated between -9 and 9 in increments of 0.025. The approximation is computed this way for values of $r$ between $1$ and $[n/2]$, where $[x]$ denotes the floor of $x$. If $r > [n/2]$, then the approximation is computed for $E(n-r+1, n)$ and Equation (4) is used.

Note that Equation (1) in Royston (1982) differs from Equations (1) and (2) above because Royston's paper is based on the $r^{th}$ largest value, not the $r^{th}$ order statistic.

Royston (1982) states that this algorithm “is accurate to at least seven decimal places on a 36-bit machine,” that it has been validated up to a sample size of $n=2000$, and that the accuracy for $n > 2000$ may be improved by reducing the value of the argument inc. Note that making inc smaller will increase the computation time.

Approxmation Based on Blom's Method (method="blom")

When method="blom", the following approximation to $E(r,n)$, proposed by Blom (1958, pp. 68-75), is used: $$E(r, n) \approx \Phi^{-1}(\frac{r - \alpha}{n - 2\alpha + 1}) \;\;\;\;\;\; (5)$$ By default, $\alpha = 3/8 = 0.375$. This approximation is quite accurate. For example, for $n \ge 2$, the approximation is accurate to the first decimal place, and for $n \ge 9$ it is accurate to the second decimal place.

Harter (1961) discusses appropriate values of $\alpha$ for various sample sizes $n$ and values of $r$.

Approximation Based on Monte Carlo Simulation (method="mc")

When method="mc", Monte Carlo simulation is used to estmate the expected value of the $r^{th}$ order statistic. That is, $N =$ nmc trials are run in which, for each trial, a random sample of $n$ standard normal observations is generated and the $r^{th}$ order statistic is computed. Then, the average value of this order statistic over all $N$ trials is computed, along with a confidence interval for the expected value, assuming an approximately normal distribution for the mean of the order statistic (the confidence interval is computed by supplying the simulated values of the $r^{th}$ order statistic to the function enorm).

NOTE: This method has not been optimized for large sample sizes $n$ (i.e., large values of the argument n) and/or a large number of Monte Carlo trials $N$ (i.e., large values of the argument nmc) and may take a long time to execute in these cases.

Value

For evNormOrdStats: a numeric vector of length n containing the expected values of all the order statistics for a random sample of n standard normal deviates.

For evNormOrdStatsScalar: a numeric scalar containing the expected value of the r'th order statistic from a random sample of n standard normal deviates. When method="mc", the returned object also has a cont.int attribute that contains the 95 and a nmc attribute indicating the number of Monte Carlo trials run.

References

Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.

Harter, H. L. (1961). Expected Values of Normal Order Statistics 48, 151–165.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, pp. 93–99.

Royston, J.P. (1982). Algorithm AS 177. Expected Normal Order Statistics (Exact and Approximate). Applied Statistics 31, 161–165.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The expected values of normal order statistics are used to construct normal quantile-quantile (Q-Q) plots (see qqPlot) and to compute goodness-of-fit statistics (see gofTest). Usually, however, approximations are used instead of exact values. The functions evNormOrdStats and
evNormOrdStatsScalar have been included mainly because evNormOrdStatsScalar is called by elnorm3 and predIntNparSimultaneousTestPower.

Examples

  # Compute the expected value of the minimum for a random sample of size 10 
  # from a standard normal distribution:

  # Based on method="royston"
  #--------------------------
  evNormOrdStatsScalar(r = 1, n = 10) 
#> [1] -1.538753
  #[1] -1.538753


  # Based on method="blom"
  #-----------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "blom") 
#> [1] -1.546635
  #[1] -1.546635


  # Based on method="mc" with 10,000 Monte Carlo trials
  #----------------------------------------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "mc", nmc = 10000) 
#> [1] -1.544318
#> attr(,"confint")
#>    95%LCL    95%UCL 
#> -1.555838 -1.532797 
#> attr(,"nmc")
#> [1] 10000
  #[1] -1.544318
  #attr(,"confint")
  #   95%LCL    95%UCL 
  #-1.555838 -1.532797 
  #attr(,"nmc")
  #[1] 10000

  #====================

  # Compute the expected values of all of the order statistics 
  # for a random sample of size 10 from a standard normal distribution
  # based on Royston's (1982) method:
  #--------------------------------------------------------------------

  evNormOrdStats(10) 
#>  [1] -1.5387527 -1.0013570 -0.6560591 -0.3757647 -0.1226678  0.1226678
#>  [7]  0.3757647  0.6560591  1.0013570  1.5387527
  #[1] -1.5387527 -1.0013570 -0.6560591 -0.3757647 -0.1226678
  #[6]  0.1226678  0.3757647  0.6560591  1.0013570  1.5387527


  # Compare the above with Blom (1958) scores:
  #-------------------------------------------

  evNormOrdStats(10, method = "blom") 
#>  [1] -1.5466353 -1.0004905 -0.6554235 -0.3754618 -0.1225808  0.1225808
#>  [7]  0.3754618  0.6554235  1.0004905  1.5466353
  #[1] -1.5466353 -1.0004905 -0.6554235 -0.3754618 -0.1225808
  #[6]  0.1225808  0.3754618  0.6554235  1.0004905  1.5466353