signTest.Rd
Estimate the median, test the null hypothesis that the median is equal to a user-specified value based on the sign test, and create a confidence interval for the median.
signTest(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE,
conf.level = 0.95)
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
optional numeric vector of observations that are paired with the observations in x. The length of y must be the same as the length of x. This argument is ignored if paired=FALSE, and must be supplied if paired=TRUE. The default value is y=NULL. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less".
numeric scalar indicating the hypothesized value of the median. The default value is mu=0.
logical scalar indicating whether to perform a paired or one-sample sign test. The possible values are paired=FALSE (the default; indicates a one-sample sign test) and paired=TRUE.
numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population median. The default value is conf.level=0.95.
One-Sample Case (paired=FALSE)
Let \(\underline{x} = x_1, x_2, \ldots, x_n\) be a vector of \(n\)
independent observations from one or more distributions that all have the
same median \(\mu\).
Consider the test of the null hypothesis:
$$H_0: \mu = \mu_0 \;\;\;\;\;\; (1)$$
The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater")
$$H_a: \mu > \mu_0 \;\;\;\;\;\; (2)$$
the lower one-sided alternative (alternative="less")
$$H_a: \mu < \mu_0 \;\;\;\;\;\; (3)$$
and the two-sided alternative (alternative="two.sided")
$$H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)$$
To perform the test of the null hypothesis (1) versus any of the three alternatives
(2)-(4), the sign test uses the test statistic \(T\) which is simply the number of
observations that are greater than \(\mu_0\) (Conover, 1980, p. 122;
van Belle et al., 2004, p. 256; Hollander and Wolfe, 1999, p. 60;
Lehmann, 1975, p. 120; Sheskin, 2011; Zar, 2010, p. 537). Under the null
hypothesis, \(T\) follows a binomial distribution with parameters size=\(n\) and prob=0.5. Usually, however, observations that are equal to \(\mu_0\) are discarded, so \(T\) is taken to be binomial with parameters size=\(r\) and prob=0.5, where \(r\) denotes the number of observations not equal to
\(\mu_0\). The sign test only requires that the observations are independent
and that they all come from one or more distributions (not necessarily the same
ones) that all have the same population median.
For a two-sided alternative hypothesis (Equation (4)), the p-value is computed as:
$$p = Pr(X_{r,0.5} \le r-m) + Pr(X_{r,0.5} \ge m) \;\;\;\;\;\; (5)$$
where \(X_{r,p}\) denotes a binomial random variable with parameters size=\(r\) and prob=\(p\), and \(m\) is defined by:
$$m = max(T, r-T) \;\;\;\;\;\; (6)$$
For a one-sided lower alternative hypothesis (Equation (3)), the p-value is computed as:
$$p = Pr(X_{r,0.5} \le T) \;\;\;\;\;\; (7)$$
and for a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed as:
$$p = Pr(X_{r,0.5} \ge T) \;\;\;\;\;\; (8)$$
The sign test is thus a special case of the binomial test with p=0.5.
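The p-value computations described above can be sketched in base R. This is only an illustration, not the source code of signTest: the helper name sign_test_pvalue and the data vector are hypothetical, and the cap at 1 for the two-sided case is an added guard for the situation where the two tails overlap. The two-sided result is compared against binom.test.

```r
# Hypothetical helper (not part of any package): sign test p-values
# computed directly from the binomial distribution with prob = 0.5.
sign_test_pvalue <- function(x, mu0 = 0,
                             alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  x <- x[is.finite(x)]           # drop NA, NaN, Inf, and -Inf
  x <- x[x != mu0]               # discard observations equal to mu0
  r <- length(x)                 # number of observations not equal to mu0
  T_stat <- sum(x > mu0)         # number of observations greater than mu0
  m <- max(T_stat, r - T_stat)
  switch(alternative,
    # Pr(X <= r - m) + Pr(X >= m), capped at 1 (assumed guard)
    two.sided = min(1, pbinom(r - m, r, 0.5) +
                       pbinom(m - 1, r, 0.5, lower.tail = FALSE)),
    # Pr(X >= T)
    greater   = pbinom(T_stat - 1, r, 0.5, lower.tail = FALSE),
    # Pr(X <= T)
    less      = pbinom(T_stat, r, 0.5)
  )
}

# Made-up data: 9 of the 10 values exceed the hypothesized median of 5
x_demo <- c(6, 7, 8, 9, 10, 11, 12, 13, 14, 3)
p_two_sided <- sign_test_pvalue(x_demo, mu0 = 5)
p_binom     <- binom.test(9, 10, p = 0.5)$p.value
all.equal(p_two_sided, p_binom)
```

With 9 of 10 observations above the hypothesized median, both computations give 22/1024, the same p-value that appears in the first example on this page, which also has 9 of 10 observations above 5.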
Computing Confidence Intervals
Based on the relationship between hypothesis tests and confidence intervals,
we can construct a confidence interval for the population median based on the
sign test (e.g., Hollander and Wolfe, 1999, p. 72; Lehmann, 1975, p. 182).
It turns out that this is equivalent to using the formulas for a nonparametric confidence interval for the 0.5 quantile (see eqnpar).
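As a rough illustration of the order-statistic idea, the sketch below computes the exact confidence level attached to an interval whose limits are order statistics. It assumes a continuous distribution, so that the number of observations below the median is binomial with size=\(n\) and prob=0.5. The helper names are hypothetical, and the sketch says nothing about how eqnpar chooses the ranks.

```r
# Hypothetical helpers (not part of any package).

# Confidence level of the two-sided interval [x_(l), x_(u)] for the median,
# where x_(k) is the k-th order statistic of n observations.
# The interval covers the median when l <= B <= u - 1, B ~ Binomial(n, 0.5).
median_ci_level <- function(l, u, n) {
  pbinom(u - 1, n, 0.5) - pbinom(l - 1, n, 0.5)
}

# Confidence level of the one-sided lower limit x_(l): Pr(B >= l).
median_lcl_level <- function(l, n) {
  pbinom(l - 1, n, 0.5, lower.tail = FALSE)
}

level_two_sided <- median_ci_level(3, 9, 10)   # ranks 3 and 9, n = 10
level_lower     <- median_lcl_level(5, 15)     # rank 5, n = 15
round(100 * c(level_two_sided, level_lower), 5)
```

These two values, 93.45703% and 94.07654%, agree with the confidence levels shown in the commented "exact" output of the examples on this page (ranks 3 and 9 with 10 observations, and rank 5 with 15 observations).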
Paired-Sample Case (paired=TRUE)
When the argument paired=TRUE, the arguments x and y are
assumed to have the same length, and the \(n\) differences
\(d_i = x_i - y_i, \;\; i = 1, 2, \ldots, n\) are assumed to be independent
observations from distributions with the same median \(\mu\). The sign test
can then be applied to the differences.
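A minimal base R sketch of the paired case follows, using made-up measurements (the vectors before and after are hypothetical). It forms the differences, drops those equal to the hypothesized median difference of 0, and hands the count of positive differences to binom.test. It illustrates the reduction to the one-sample case and is not the signTest implementation.

```r
# Made-up paired measurements, for illustration only
before <- c(12.1, 9.8, 15.3, 11.0, 14.2, 10.5, 13.7, 12.9)
after  <- c(10.4, 10.1, 13.9,  9.2, 12.8, 10.5, 11.5, 11.1)

d <- before - after      # paired differences d_i = x_i - y_i
d <- d[d != 0]           # discard differences equal to mu0 = 0
r <- length(d)           # number of non-zero differences
T_stat <- sum(d > 0)     # number of positive differences

# Two-sided sign test on the differences via the binomial test
p_paired <- binom.test(T_stat, r, p = 0.5)$p.value
c(T = T_stat, r = r, p.value = p_paired)
```

Here one pair is tied, so 7 differences remain, 6 of them positive, and the two-sided p-value is 16/128 = 0.125.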
A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, p.122.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York, p.60.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, p.120.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.404–406.
Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures. Fifth Edition. CRC Press, Boca Raton, FL.
van Belle, G., L.D. Fisher, P.J. Heagerty, and T. Lumley. (2004). Biostatistics: A Methodology for the Health Sciences. Second Edition. John Wiley and Sons, New York.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
A frequent question in environmental statistics is “Is the concentration of chemical X greater than Y units?”. For example, in groundwater assessment (compliance) monitoring at hazardous and solid waste sites, the concentration of a chemical in the groundwater at a downgradient well must be compared to a groundwater protection standard (GWPS). If the concentration is “above” the GWPS, then the site enters corrective action monitoring. As another example, soil screening at a Superfund site involves comparing the concentration of a chemical in the soil with a pre-determined soil screening level (SSL). If the concentration is “above” the SSL, then further investigation and possible remedial action is required. Determining what it means for the chemical concentration to be “above” a GWPS or an SSL is a policy decision: the average of the distribution of the chemical concentration must be above the GWPS or SSL, or the median must be above the GWPS or SSL, or the 95th percentile must be above the GWPS or SSL, or something else. Often, the first interpretation is used.
Hypothesis tests you can use to perform tests of location include: Student's t-test, Fisher's randomization test, the Wilcoxon signed rank test, Chen's modified t-test, the sign test, and a test based on a bootstrap confidence interval. For a discussion comparing the performance of these tests, see Millard and Neerchal (2001, pp.408-409).
# Generate 10 observations from a lognormal distribution with parameters
# meanlog=2 and sdlog=1. The median of this distribution is e^2 (about 7.4).
# Test the null hypothesis that the true median is equal to 5 against the
# two-sided alternative that the true median is not equal to 5.
# (Note: the call to set.seed allows you to reproduce this example).
set.seed(23)
dat <- rlnorm(10, meanlog = 2, sdlog = 1)
signTest(dat, mu = 5)
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: median = 5
#>
#> Alternative Hypothesis: True median is not equal to 5
#>
#> Test Name: Sign test
#>
#> Estimated Parameter(s): median = 19.21717
#>
#> Data: dat
#>
#> Test Statistic: # Obs > median = 9
#>
#> P-value: 0.02148438
#>
#> Confidence Interval for: median
#>
#> Confidence Interval Method: interpolate (Nyblom, 1992)
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Limit Rank(s): 2 3 9 8
#>
#> Confidence Interval: LCL = 7.000846
#> UCL = 26.937725
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: median = 5
#
#Alternative Hypothesis: True median is not equal to 5
#
#Test Name: Sign test
#
#Estimated Parameter(s): median = 19.21717
#
#Data: dat
#
#Test Statistic: # Obs > median = 9
#
#P-value: 0.02148438
#
#Confidence Interval for: median
#
#Confidence Interval Method: exact
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 93.45703%
#
#Confidence Limit Rank(s): 3 9
#
#Confidence Interval: LCL = 7.732538
# UCL = 35.722459
# Clean up
#---------
rm(dat)
#==========
# The guidance document "Supplemental Guidance to RAGS: Calculating the
# Concentration Term" (USEPA, 1992d) contains an example of 15 observations
# of chromium concentrations (mg/kg) which are assumed to come from a
# lognormal distribution. These data are stored in the vector
# EPA.92d.chromium.vec. Here, we will use the sign test to test the null
# hypothesis that the median chromium concentration is less than or equal to
# 100 mg/kg vs. the alternative that it is greater than 100 mg/kg. The
# estimated median is 110 mg/kg. There are 8 out of 15 observations greater
# than 100 mg/kg, the p-value is equal to 0.5, and the lower 94% confidence
# limit is 41 mg/kg.
signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater")
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: median = 100
#>
#> Alternative Hypothesis: True median is greater than 100
#>
#> Test Name: Sign test
#>
#> Estimated Parameter(s): median = 110
#>
#> Data: EPA.92d.chromium.vec
#>
#> Test Statistic: # Obs > median = 8
#>
#> P-value: 0.5
#>
#> Confidence Interval for: median
#>
#> Confidence Interval Method: interpolate (Nyblom, 1992)
#>
#> Confidence Interval Type: lower
#>
#> Confidence Level: 95%
#>
#> Confidence Limit Rank(s): 4 5 NA NA
#>
#> Confidence Interval: LCL = 40.53074
#> UCL = Inf
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: median = 100
#
#Alternative Hypothesis: True median is greater than 100
#
#Test Name: Sign test
#
#Estimated Parameter(s): median = 110
#
#Data: EPA.92d.chromium.vec
#
#Test Statistic: # Obs > median = 8
#
#P-value: 0.5
#
#Confidence Interval for: median
#
#Confidence Interval Method: exact
#
#Confidence Interval Type: lower
#
#Confidence Level: 94.07654%
#
#Confidence Limit Rank(s): 5
#
#Confidence Interval: LCL = 41
# UCL = Inf