ehyper.Rd
Estimate \(m\), the number of white balls in the urn, or \(m+n\), the total number of balls in the urn, for a hypergeometric distribution.
ehyper(x, m = NULL, total = NULL, k, method = "mle")
non-negative integer indicating the number of white balls out of a sample of size k drawn without replacement from the urn. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.
non-negative integer indicating the number of white balls in the urn. You must supply m or total, but not both. Missing values (NAs) are not allowed.
positive integer indicating the total number of balls in the urn (i.e., m+n). You must supply m or total, but not both. Missing values (NAs) are not allowed.
positive integer indicating the number of balls drawn without replacement from the urn. Missing values (NAs) are not allowed.
character string specifying the method of estimation. Possible values are "mle" (maximum likelihood; the default) and "mvue" (minimum variance unbiased). The mvue method is only available when you are estimating \(m\) (i.e., when you supply the argument total). See the DETAILS section for more information on these estimation methods. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.
Let \(x\) be an observation from a hypergeometric distribution with parameters m=\(M\), n=\(N\), and k=\(K\).
In R nomenclature, \(x\) represents the number of white balls drawn out of a
sample of \(K\) balls drawn without replacement from an urn containing
\(M\) white balls and \(N\) black balls. The total number of balls in the
urn is thus \(M+N\). Denote the total number of balls by \(T = M+N\).
Estimation
Estimating M, given T and K are known
When \(T\) and \(K\) are known, the maximum likelihood estimator (mle) of
\(M\) is given by (Forbes et al., 2011):
$$\hat{M}_{mle} = floor[(T + 1) x / K] \;\;\;\; (1)$$
where \(floor()\) represents the floor function. That is, \(floor(y)\) is the largest integer less than or equal to \(y\).
If the quantity \((T + 1) x / K\) is itself an integer, then the mle of
\(M\) is also given by (Johnson et al., 1992, p.263):
$$\hat{M}_{mle} = [(T + 1) x / K] - 1 \;\;\;\; (2)$$
which is what the function ehyper uses for this case.
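As a rough illustration of equations (1) and (2), the following base R sketch computes the estimate directly and compares it with a brute-force search over the likelihood. The helper name mle_m is hypothetical and is not part of the package; clamping the result at 0 when x = 0 is an assumption made here, not documented behaviour of ehyper.

```r
# Hypothetical helper (not the ehyper source): mle of M given total T and k K.
mle_m <- function(x, total, k) {
  q <- (total + 1) * x / k
  if (q == floor(q)) {
    # Equation (2): q is a whole number, so report q - 1.
    # Clamping at 0 for x = 0 is an assumption of this sketch.
    max(q - 1, 0)
  } else {
    # Equation (1)
    floor(q)
  }
}

mle_m(x = 1, total = 40, k = 5)  # (41 * 1) / 5 = 8.2, so the estimate is 8

# Brute-force check: evaluate the likelihood at every candidate M in 0..40
# and confirm that the closed-form estimate attains the maximum.
lik <- dhyper(1, m = 0:40, n = 40 - 0:40, k = 5)
all.equal(lik[mle_m(1, 40, 5) + 1], max(lik))
```

The brute-force search relies only on stats::dhyper, so it can be used to sanity-check the closed-form result for other values of x, T, and K.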
The minimum variance unbiased estimator (mvue) of \(M\) is given by
(Forbes et al., 2011):
$$\hat{M}_{mvue} = (T x / K) \;\;\;\; (3)$$
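The unbiasedness of the estimator in equation (3) can be checked numerically. The sketch below, which uses only base R and illustrative parameter values (T = 40, M = 10, K = 5, chosen here to match the example data), averages \(T x / K\) over the exact hypergeometric distribution of \(x\).

```r
# Illustrative values (assumptions for this sketch, not package defaults)
total <- 40
m     <- 10
k     <- 5

x <- 0:k
p <- dhyper(x, m = m, n = total - m, k = k)  # exact probabilities of each x

# Expected value of the estimator T * x / K under the true M
expected_mvue <- sum(p * total * x / k)
all.equal(expected_mvue, m)  # TRUE: the expectation matches the true M

# For a single observation x = 1 the estimate itself is:
total * 1 / k  # 8
```

Unlike the mle, the estimate from equation (3) is not forced to be an integer; for example, x = 1 with T = 42 and K = 5 gives 8.4.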
Estimating T, given M and K are known
When \(M\) and \(K\) are known, the maximum likelihood estimator (mle) of
\(T\) is given by (Forbes et al., 2011):
$$\hat{T}_{mle} = floor(K M / x) \;\;\;\; (4)$$
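Equation (4) can be sketched the same way. The helper name mle_total below is hypothetical, and stopping with an error when x = 0 (where the formula divides by zero) is an assumption of this sketch rather than documented behaviour of ehyper.

```r
# Hypothetical helper (not the ehyper source): mle of T given m M and k K.
mle_total <- function(x, m, k) {
  if (x == 0) stop("equation (4) is undefined when x = 0")
  floor(k * m / x)
}

mle_total(x = 1, m = 10, k = 5)  # floor(5 * 10 / 1) = 50

# Brute-force check over candidate totals. With x = 1, m = 10, and k = 5,
# the urn must hold at least 4 black balls, so T starts at 14.
candidates <- 14:200
lik <- dhyper(1, m = 10, n = candidates - 10, k = 5)
all.equal(lik[candidates == mle_total(1, 10, 5)], max(lik))
```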
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.
The hypergeometric distribution can be described by an urn model with \(M\) white balls and \(N\) black balls. If \(K\) balls are drawn with replacement, then the number of white balls in the sample of size \(K\) follows a binomial distribution with parameters size=\(K\) and prob=\(M/(M+N)\). If \(K\) balls are drawn without replacement, then the number of white balls in the sample of size \(K\) follows a hypergeometric distribution with parameters m=\(M\), n=\(N\), and k=\(K\).
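The contrast between the two sampling schemes can be seen by tabulating both probability mass functions for the same urn. This is a minimal base R sketch using illustrative values (M = 10, N = 30, K = 5); the variable names are chosen here for readability.

```r
# Illustrative urn: 10 white balls, 30 black balls, 5 draws
m <- 10
n <- 30
k <- 5
x <- 0:k

with_replacement    <- dbinom(x, size = k, prob = m / (m + n))
without_replacement <- dhyper(x, m = m, n = n, k = k)

# Side-by-side probabilities of drawing x white balls
round(cbind(x, with_replacement, without_replacement), 4)

# Both distributions have the same mean, K * M / (M + N) ...
mean_binom <- sum(x * with_replacement)
mean_hyper <- sum(x * without_replacement)

# ... but sampling without replacement gives the smaller variance.
var_binom <- sum((x - mean_binom)^2 * with_replacement)
var_hyper <- sum((x - mean_hyper)^2 * without_replacement)
c(var_binom = var_binom, var_hyper = var_hyper)
```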
The name “hypergeometric” comes from the fact that the probabilities associated with this distribution can be written as successive terms in the expansion of a function of a Gaussian hypergeometric series.
The hypergeometric distribution is applied in a variety of fields, including quality control and estimation of animal population size. It is also the distribution used to compute probabilities for Fisher's exact test for a 2x2 contingency table.
# Generate an observation from a hypergeometric distribution with
# parameters m=10, n=30, and k=5, then estimate the parameter m.
# Note: the call to set.seed simply allows you to reproduce this example.
# Also, the only parameter actually estimated is m; once m is estimated,
# n is computed by subtracting the estimated value of m (8 in this example)
# from the given value of m+n (40 in this example). The parameters
# n and k are shown in the output in order to provide information on
# all of the parameters associated with the hypergeometric distribution.
set.seed(250)
dat <- rhyper(nn = 1, m = 10, n = 30, k = 5)
dat
#> [1] 1
ehyper(dat, total = 40, k = 5)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Hypergeometric
#>
#> Estimated Parameter(s): m = 8
#> n = 32
#> k = 5
#>
#> Estimation Method: mle for 'm'
#>
#> Data: dat
#>
#> Sample Size: 1
#>
#----------
# Use the same data as in the previous example, but estimate m+n instead.
# Note: The only parameter estimated is m+n. Once this is estimated,
# n is computed by subtracting the given value of m (10 in this case)
# from the estimated value of m+n (50 in this example).
ehyper(dat, m = 10, k = 5)
#>
#> Results of Distribution Parameter Estimation
#> --------------------------------------------
#>
#> Assumed Distribution: Hypergeometric
#>
#> Estimated Parameter(s): m = 10
#> n = 40
#> k = 5
#>
#> Estimation Method: mle for 'm+n'
#>
#> Data: dat
#>
#> Sample Size: 1
#>
#----------
# Clean up
#---------
rm(dat)