Two-Sample Rank Test to Detect a Shift in a Proportion of the "Treated" Population

Two-sample rank test to detect a positive shift in a proportion of one population (here called the “treated” population) compared to another (here called the “reference” population). This test is usually called the quantile test (Johnson et al., 1987).

quantileTest(x, y, alternative = "greater", target.quantile = 0.5, 
    target.r = NULL, exact.p = TRUE)

Arguments

x: numeric vector of observations from the “treatment” group. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
y: numeric vector of observations from the “reference” group. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
alternative: character string indicating the kind of alternative hypothesis. The possible values are "greater" (right tail of treatment group shifted to the right of the right tail of the reference group) and "less" (left tail of treatment group shifted to the left of the left tail of the reference group). The default value is alternative="greater".
target.quantile: numeric scalar between 0 and 1 indicating the desired quantile to use as the lower cut off point for the test. Because of the discrete nature of empirical quantiles, the upper bound for the possible empirical quantiles will often differ from the value of target.quantile. The default value is target.quantile=0.5 (i.e., the median). This argument is ignored if the argument target.r is supplied.
target.r: integer indicating the rank of the observation to use as the lower cut off point for the test. The value of target.r must be greater than or equal to 2 and less than or equal to $N$ (the total number of valid observations contained in the arguments x and y). The actual rank of the cut off point may differ from target.r in the case of tied observations in x and/or y. The default value of this argument is NULL, in which case the argument target.quantile is used to determine the lower cut off for the test.
exact.p: logical scalar indicating whether to compute the p-value based on the exact distribution of the test statistic (exact.p=TRUE; the default) or based on the normal approximation (exact.p=FALSE).

Details

Let $X$ denote a random variable representing measurements from a “treatment” group with cumulative distribution function (cdf) $$F_X(t) = Pr(X \le t) \;\;\;\;\;\; (1)$$ and let $x_1, x_2, \ldots, x_m$ denote $m$ observations from this treatment group. Let $Y$ denote a random variable from a “reference” group with cdf $$F_Y(t) = Pr(Y \le t) \;\;\;\;\;\; (2)$$ and let $y_1, y_2, \ldots, y_n$ denote $n$ observations from this reference group. Consider the null hypothesis: $$H_0: F_X(t) = F_Y(t), \;\; -\infty < t < \infty \;\;\;\;\;\; (3)$$ versus the alternative hypothesis $$H_a: F_X(t) = (1 - \epsilon) F_Y(t) + \epsilon F_Z(t) \;\;\;\;\;\; (4)$$ where $Z$ denotes some random variable with cdf $$F_Z(t) = Pr(Z \le t) \;\;\;\;\;\; (5)$$ and $0 < \epsilon \le 1$, $F_Z(t) \le F_Y(t)$ for all values of $t$, and $F_Z(t) \ne F_Y(t)$ for at least one value of $t$.

In English, the alternative hypothesis (4) says that a portion $\epsilon$ of the distribution for the treatment group (the distribution of $X$) is shifted to the right of the distribution for the reference group (the distribution of $Y$). The alternative hypothesis (4) with $\epsilon = 1$ is the alternative hypothesis associated with testing a location shift, for which the Wilcoxon rank sum test can be used.
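To make the mixture in Equation (4) concrete, the following sketch simulates one data set under the alternative and evaluates the implied cdf of $X$ on a grid. The normal distributions, the shift of 3, the sample sizes, and $\epsilon = 0.2$ are illustrative assumptions only; they are not taken from Johnson et al. (1987) or from the function itself.

```r
# Simulate data under the mixture alternative (4).
# All distributions and parameter values are illustrative assumptions.
set.seed(1)
n <- 50          # reference sample size
m <- 50          # treatment sample size
epsilon <- 0.2   # mixing proportion (assumed)

# Reference group: Y ~ N(0, 1)
y <- rnorm(n, mean = 0, sd = 1)

# Treatment group: each observation comes from Z ~ N(3, 1) with
# probability epsilon, and from the reference distribution otherwise.
from.z <- rbinom(m, size = 1, prob = epsilon) == 1
x <- ifelse(from.z, rnorm(m, mean = 3, sd = 1), rnorm(m, mean = 0, sd = 1))

# cdf of X implied by Equation (4), evaluated on a grid of t values
t.grid <- seq(-3, 6, by = 0.5)
F.y <- pnorm(t.grid, mean = 0, sd = 1)
F.z <- pnorm(t.grid, mean = 3, sd = 1)
F.x <- (1 - epsilon) * F.y + epsilon * F.z
```

Because $F_Z(t) \le F_Y(t)$ everywhere for this choice of $Z$, the mixture cdf F.x never exceeds F.y, which is the sense in which part of the treatment distribution is shifted to the right.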

Johnson et al. (1987) investigated locally most powerful rank tests for the test of the null hypothesis (3) against the alternative hypothesis (4). They considered the case when $Y$ and $Z$ were normal random variables and the case when the densities of $Y$ and $Z$ assumed only two positive values. For the latter case, the locally most powerful rank test reduces to the following procedure, which Johnson et al. (1987) call the quantile test.

1. Combine the $n$ observations from the reference group and the $m$ observations from the treatment group and rank them from smallest to largest. Tied observations receive the average rank of all observations tied at that value.
2. Choose a quantile $q$ and determine the smallest rank $r$ such that $$\frac{r}{m+n+1} > q \;\;\;\;\;\; (6)$$ Note that because of the discrete nature of ranks, any quantile $q'$ such that $$\frac{r}{m+n+1} > q' \ge \frac{r-1}{m+n+1} \;\;\;\;\;\; (7)$$ will yield the same value for $r$ as the quantile $q$ does. Alternatively, choose a value of $r$. The bounds on an associated quantile are then given in Equation (7). Note: the component called parameters in the list returned by quantileTest contains an element named quantile.ub. The value of this element is the left-hand side of Equation (7).
3. Set $k$ equal to the number of observations from the treatment group (the number of $X$ observations) with ranks bigger than or equal to $r$.
4. Under the null hypothesis (3), the probability that at least $k$ out of the $r$ largest observations come from the treatment group is given by: $$p = \sum_{i=k}^r \frac{{m+n-r \choose m-i} {r \choose i}}{{m+n \choose n}} \;\;\;\;\;\; (8)$$ This probability may be approximated by: $$p = 1 - \Phi\left(\frac{k - \mu_k - 1/2}{\sigma_k}\right) \;\;\;\;\;\; (9)$$ where $$\mu_k = \frac{mr}{m+n} \;\;\;\;\;\; (10)$$ $$\sigma_k^2 = \frac{mnr(m+n-r)}{(m+n)^2 (m+n-1)} \;\;\;\;\;\; (11)$$ and $\Phi$ denotes the cumulative distribution function of the standard normal distribution (USEPA, 1994, pp.7.16-7.17). (See quantileTestPValue.)
5. Reject the null hypothesis (3) in favor of the alternative hypothesis (4) at significance level $\alpha$ if $p \le \alpha$.
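Equations (8) through (11) can be checked in base R without calling quantileTest. The sketch below uses the sample sizes and cut-off from the TcCB example in the Examples section ($m = 77$, $n = 47$, $r = 9$, $k = 9$). Equation (8) is the upper tail of a hypergeometric distribution, so phyper can stand in for the explicit sum. The last line reflects an inferred reading, not one confirmed from the function's source: the example output (target.r = 9 giving quantile.ub = 0.928 = 116/125) is consistent with $r$ counting the largest observations, so that the cut-off rank is $m + n - r + 1$.

```r
# Values taken from the TcCB example in this help file
m <- 77   # treatment (cleanup) observations
n <- 47   # reference observations
r <- 9    # number of largest observations considered
k <- 9    # how many of those r come from the treatment group
N <- m + n

# Equation (8) as a hypergeometric upper tail:
# phyper(k - 1, ...) is Pr(K <= k - 1), so lower.tail = FALSE gives Pr(K >= k).
p.exact <- phyper(k - 1, m, n, r, lower.tail = FALSE)

# The same quantity written out term by term, mirroring Equation (8)
i <- k:min(r, m)
p.sum <- sum(choose(N - r, m - i) * choose(r, i)) / choose(N, n)

# Equations (9)-(11): normal approximation with continuity correction
mu.k <- m * r / N
sigma.k <- sqrt(m * n * r * (N - r) / (N^2 * (N - 1)))
p.normal <- 1 - pnorm((k - mu.k - 0.5) / sigma.k)

# Assumed reading of quantile.ub (inferred from the example output only):
# cut-off rank N - r + 1 divided by N + 1
quantile.ub <- (N - r + 1) / (N + 1)

signif(c(exact = p.exact, sum = p.sum, normal = p.normal,
         quantile.ub = quantile.ub), 4)
```

Here p.exact reproduces the p-value 0.01136926 shown in the Examples section. The normal approximation gives a noticeably different value in this case, which is one reason exact.p=TRUE is the default.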

Johnson et al. (1987) note that their quantile test is asymptotically equivalent to one proposed by Carrano and Moore (1982) in the context of a two-sided test. Also, when $q=0.5$, the quantile test reduces to Mood's median test for two groups (see Zar, 2010, p.172; Conover, 1980, pp.171-178).

The optimal choice of $q$ or $r$ in Step 2 above (i.e., the choice that yields the largest power) depends on the true underlying distributions of $Y$ and $Z$ and the mixing proportion $\epsilon$. Johnson et al. (1987) performed a simulation study and showed that the quantile test performs better than the Wilcoxon rank sum test and the normal scores test under the alternative of a mixed normal distribution with a shift of at least 2 standard deviations in the $Z$ distribution. USEPA (1994, pp.7.17-7.21) shows that when the mixing proportion $\epsilon$ is small and the shift is large, the quantile test is more powerful than the Wilcoxon rank sum test, and when $\epsilon$ is large and the shift is small the Wilcoxon rank sum test is more powerful than the quantile test.
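A small Monte Carlo experiment is one way to explore this trade-off for a particular scenario. The sketch below is not the simulation design of Johnson et al. (1987) or USEPA (1994); the sample sizes, $r = 5$, $\epsilon = 0.1$, the shift of 4 standard deviations, and the number of replications are all assumptions chosen for illustration. The quantile test p-value is computed directly from Equation (8) via phyper, with $r$ treated as the number of largest observations, and the helper one.sim is a hypothetical name.

```r
# Monte Carlo power comparison under a normal mixture alternative.
# Every setting below is an illustrative assumption.
set.seed(42)
m <- 40; n <- 40; N <- m + n
r <- 5            # number of largest observations used by the quantile test
alpha <- 0.05     # significance level
epsilon <- 0.1    # mixing proportion
shift <- 4        # shift of Z relative to Y, in standard deviations
n.sim <- 500      # number of simulated data sets

one.sim <- function() {
  y <- rnorm(n)                         # reference group
  from.z <- runif(m) < epsilon          # which treatment obs are shifted
  x <- rnorm(m) + shift * from.z        # treatment group (mixture)
  # k = number of treatment observations among the r largest overall
  k <- sum(rank(c(x, y))[seq_len(m)] > N - r)
  p.quantile <- phyper(k - 1, m, n, r, lower.tail = FALSE)
  p.wilcoxon <- wilcox.test(x, y, alternative = "greater")$p.value
  c(quantile = p.quantile, wilcoxon = p.wilcoxon)
}

p.mat <- replicate(n.sim, one.sim())    # 2 x n.sim matrix of p-values
power.est <- rowMeans(p.mat <= alpha)   # estimated power of each test
power.est
```

Rerunning with a larger epsilon and a smaller shift gives a feel for the regime in which the Wilcoxon rank sum test is expected to do better.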

Value

A list of class "htestEnvStats" containing the results of the hypothesis test. See the help file for htestEnvStats.object for details.

References

Carrano, A., and D. Moore. (1982). The Rationale and Methodology for Quantifying Sister Chromatid Exchange in Humans. In Heddle, J.A., ed., Mutagenicity: New Horizons in Genetic Toxicology. Academic Press, New York, pp.268-304.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 4.

Johnson, R.A., S. Verrill, and D.H. Moore. (1987). Two-Sample Rank Tests for Detecting Changes That Occur in a Small Proportion of the Treated Population. Biometrics 43, 641-655.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.435-439.

USEPA. (1994). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

The EPA guidance document Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media (USEPA, 1994, pp.4.7-4.9) recommends three different statistical tests for determining whether a remediated Superfund site has attained compliance: the Wilcoxon rank sum test, the quantile test, and the “hot measurement” comparison test. The Wilcoxon rank sum test and quantile test are nonparametric tests that compare chemical concentrations in the cleanup area with those in the reference area. The hot-measurement comparison test compares concentrations in the cleanup area with a pre-specified upper limit value Hm (the value of Hm must be negotiated between the EPA and the Superfund-site owner or operator). The Wilcoxon rank sum test is appropriate for detecting uniform failure of remedial action throughout the cleanup area. The quantile test is appropriate for detecting failure in only a few areas within the cleanup area. The hot-measurement comparison test is appropriate for detecting hot spots that need to be remediated regardless of the outcomes of the other two tests.

USEPA (1994, pp.4.7-4.9) recommends applying all three tests to all cleanup units within a cleanup area. This leads to the usual multiple comparisons problem: the probability of at least one of the tests indicating non-compliance, when in fact the cleanup area is in compliance, is greater than the pre-set Type I error level for any of the individual tests. USEPA (1994, p.3.3) recommends against using multiple comparison procedures to control the overall Type I error and suggests instead a re-sampling scheme where additional samples are taken in cases where non-compliance is indicated.
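The size of the multiple comparisons effect can be illustrated with simple arithmetic. The sketch below assumes a per-test level of 0.05 and, purely for illustration, that the three tests are independent. In practice they are applied to the same data and are not independent, so the first number is only indicative; the Bonferroni bound does not need that assumption.

```r
alpha <- 0.05    # per-test Type I error level (illustrative assumption)
n.tests <- 3     # Wilcoxon rank sum, quantile, and hot-measurement tests

# Chance that at least one test rejects when the cleanup unit is in
# compliance, IF the tests were independent (assumed for illustration only)
fwer.indep <- 1 - (1 - alpha)^n.tests

# Bonferroni upper bound on the same probability; valid without independence
fwer.bound <- min(1, n.tests * alpha)

c(independent = fwer.indep, bonferroni.bound = fwer.bound)
```

Applying the tests to several cleanup units increases the number of comparisons further, and the overall error rate grows accordingly.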

Examples

  # Following Example 7.5 on pages 7.23-7.24 of USEPA (1994), perform the 
  # quantile test for the TcCB data (the data are stored in EPA.94b.tccb.df).  
  # There are n=47 observations from the reference area and m=77 observations 
  # from the cleanup unit.  The target rank is set to 9, resulting in a value 
  # of quantile.ub=0.928.  Note that the p-value computed here is 0.0114, 
  # not the value 0.0117 given in that example.

  EPA.94b.tccb.df
#>     TcCB.orig   TcCB Censored      Area
#> 1        0.22   0.22    FALSE Reference
#> 2        0.23   0.23    FALSE Reference
#> 3        0.26   0.26    FALSE Reference
#> 4        0.27   0.27    FALSE Reference
#> 5        0.28   0.28    FALSE Reference
#> 6        0.28   0.28    FALSE Reference
#> 7        0.29   0.29    FALSE Reference
#> 8        0.33   0.33    FALSE Reference
#> 9        0.34   0.34    FALSE Reference
#> 10       0.35   0.35    FALSE Reference
#> 11       0.38   0.38    FALSE Reference
#> 12       0.39   0.39    FALSE Reference
#> 13       0.39   0.39    FALSE Reference
#> 14       0.42   0.42    FALSE Reference
#> 15       0.42   0.42    FALSE Reference
#> 16       0.43   0.43    FALSE Reference
#> 17       0.45   0.45    FALSE Reference
#> 18       0.46   0.46    FALSE Reference
#> 19       0.48   0.48    FALSE Reference
#> 20       0.50   0.50    FALSE Reference
#> 21       0.50   0.50    FALSE Reference
#> 22       0.51   0.51    FALSE Reference
#> 23       0.52   0.52    FALSE Reference
#> 24       0.54   0.54    FALSE Reference
#> 25       0.56   0.56    FALSE Reference
#> 26       0.56   0.56    FALSE Reference
#> 27       0.57   0.57    FALSE Reference
#> 28       0.57   0.57    FALSE Reference
#> 29       0.60   0.60    FALSE Reference
#> 30       0.62   0.62    FALSE Reference
#> 31       0.63   0.63    FALSE Reference
#> 32       0.67   0.67    FALSE Reference
#> 33       0.69   0.69    FALSE Reference
#> 34       0.72   0.72    FALSE Reference
#> 35       0.74   0.74    FALSE Reference
#> 36       0.76   0.76    FALSE Reference
#> 37       0.79   0.79    FALSE Reference
#> 38       0.81   0.81    FALSE Reference
#> 39       0.82   0.82    FALSE Reference
#> 40       0.84   0.84    FALSE Reference
#> 41       0.89   0.89    FALSE Reference
#> 42       1.11   1.11    FALSE Reference
#> 43       1.13   1.13    FALSE Reference
#> 44       1.14   1.14    FALSE Reference
#> 45       1.14   1.14    FALSE Reference
#> 46       1.20   1.20    FALSE Reference
#> 47       1.33   1.33    FALSE Reference
#> 48      <0.09   0.09     TRUE   Cleanup
#> 49       0.09   0.09    FALSE   Cleanup
#> 50       0.09   0.09    FALSE   Cleanup
#> 51       0.12   0.12    FALSE   Cleanup
#> 52       0.12   0.12    FALSE   Cleanup
#> 53       0.14   0.14    FALSE   Cleanup
#> 54       0.16   0.16    FALSE   Cleanup
#> 55       0.17   0.17    FALSE   Cleanup
#> 56       0.17   0.17    FALSE   Cleanup
#> 57       0.17   0.17    FALSE   Cleanup
#> 58       0.18   0.18    FALSE   Cleanup
#> 59       0.19   0.19    FALSE   Cleanup
#> 60       0.20   0.20    FALSE   Cleanup
#> 61       0.20   0.20    FALSE   Cleanup
#> 62       0.21   0.21    FALSE   Cleanup
#> 63       0.21   0.21    FALSE   Cleanup
#> 64       0.22   0.22    FALSE   Cleanup
#> 65       0.22   0.22    FALSE   Cleanup
#> 66       0.22   0.22    FALSE   Cleanup
#> 67       0.23   0.23    FALSE   Cleanup
#> 68       0.24   0.24    FALSE   Cleanup
#> 69       0.25   0.25    FALSE   Cleanup
#> 70       0.25   0.25    FALSE   Cleanup
#> 71       0.25   0.25    FALSE   Cleanup
#> 72       0.25   0.25    FALSE   Cleanup
#> 73       0.26   0.26    FALSE   Cleanup
#> 74       0.28   0.28    FALSE   Cleanup
#> 75       0.28   0.28    FALSE   Cleanup
#> 76       0.29   0.29    FALSE   Cleanup
#> 77       0.31   0.31    FALSE   Cleanup
#> 78       0.33   0.33    FALSE   Cleanup
#> 79       0.33   0.33    FALSE   Cleanup
#> 80       0.33   0.33    FALSE   Cleanup
#> 81       0.34   0.34    FALSE   Cleanup
#> 82       0.37   0.37    FALSE   Cleanup
#> 83       0.38   0.38    FALSE   Cleanup
#> 84       0.39   0.39    FALSE   Cleanup
#> 85       0.40   0.40    FALSE   Cleanup
#> 86       0.43   0.43    FALSE   Cleanup
#> 87       0.43   0.43    FALSE   Cleanup
#> 88       0.47   0.47    FALSE   Cleanup
#> 89       0.48   0.48    FALSE   Cleanup
#> 90       0.48   0.48    FALSE   Cleanup
#> 91       0.49   0.49    FALSE   Cleanup
#> 92       0.51   0.51    FALSE   Cleanup
#> 93       0.51   0.51    FALSE   Cleanup
#> 94       0.54   0.54    FALSE   Cleanup
#> 95       0.60   0.60    FALSE   Cleanup
#> 96       0.61   0.61    FALSE   Cleanup
#> 97       0.62   0.62    FALSE   Cleanup
#> 98       0.75   0.75    FALSE   Cleanup
#> 99       0.82   0.82    FALSE   Cleanup
#> 100      0.85   0.85    FALSE   Cleanup
#> 101      0.92   0.92    FALSE   Cleanup
#> 102      0.94   0.94    FALSE   Cleanup
#> 103      1.05   1.05    FALSE   Cleanup
#> 104      1.10   1.10    FALSE   Cleanup
#> 105      1.10   1.10    FALSE   Cleanup
#> 106      1.19   1.19    FALSE   Cleanup
#> 107      1.22   1.22    FALSE   Cleanup
#> 108      1.33   1.33    FALSE   Cleanup
#> 109      1.39   1.39    FALSE   Cleanup
#> 110      1.39   1.39    FALSE   Cleanup
#> 111      1.52   1.52    FALSE   Cleanup
#> 112      1.53   1.53    FALSE   Cleanup
#> 113      1.73   1.73    FALSE   Cleanup
#> 114      2.35   2.35    FALSE   Cleanup
#> 115      2.46   2.46    FALSE   Cleanup
#> 116      2.59   2.59    FALSE   Cleanup
#> 117      2.61   2.61    FALSE   Cleanup
#> 118      3.06   3.06    FALSE   Cleanup
#> 119      3.29   3.29    FALSE   Cleanup
#> 120      5.56   5.56    FALSE   Cleanup
#> 121      6.61   6.61    FALSE   Cleanup
#> 122     18.40  18.40    FALSE   Cleanup
#> 123     51.97  51.97    FALSE   Cleanup
#> 124    168.64 168.64    FALSE   Cleanup

  # Determine the values to use for r and k for 
  # a desired significance level of 0.01 
  #--------------------------------------------

  p.vals <- quantileTestPValue(m = 77, n = 47, 
    r = c(rep(8, 3), rep(9, 3), rep(10, 3)), 
    k = c(6, 7, 8, 7, 8, 9, 8, 9, 10)) 

  round(p.vals, 3) 
#> [1] 0.355 0.122 0.019 0.264 0.081 0.011 0.193 0.053 0.007

  # Choose r=9, k=9 to get a significance level of 0.011
  #-----------------------------------------------------

  with(EPA.94b.tccb.df, 
    quantileTest(TcCB[Area=="Cleanup"], TcCB[Area=="Reference"], 
    target.r = 9)) 
#> $statistic
#> k (# x obs of r largest)                        r 
#>                        9                        9 
#> 
#> $parameters
#>           m           n quantile.ub 
#>      77.000      47.000       0.928 
#> 
#> $p.value
#> [1] 0.01136926
#> 
#> $estimate
#> NULL
#> 
#> $null.value
#> e 
#> 0 
#> 
#> $alternative
#> [1] "Tail of Fx Shifted to Right of\n                                 Tail of Fy.\n                                 0 < e <= 1, where\n                                 Fx(t) = (1-e)*Fy(t) + e*Fz(t),\n                                 Fz(t) <= Fy(t) for all t,\n                                 and Fy != Fz"
#> 
#> $method
#> [1] "Quantile Test"
#> 
#> $estimation.method
#> NULL
#> 
#> $sample.size
#> nx ny 
#> 77 47 
#> 
#> $data.name
#>                             x                             y 
#>   "TcCB[Area == \"Cleanup\"]" "TcCB[Area == \"Reference\"]" 
#> 
#> $bad.obs
#> [1] 0
#> 
#> attr(,"class")
#> [1] "htestEnvStats"

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 e = 0
  #
  #Alternative Hypothesis:          Tail of Fx Shifted to Right of
  #                                 Tail of Fy.
  #                                 0 < e <= 1, where
  #                                 Fx(t) = (1-e)*Fy(t) + e*Fz(t),
  #                                 Fz(t) <= Fy(t) for all t,
  #                                 and Fy != Fz
  #
  #Test Name:                       Quantile Test
  #
  #Data:                            x = TcCB[Area == "Cleanup"]  
  #                                 y = TcCB[Area == "Reference"]
  #
  #Sample Sizes:                    nx = 77
  #                                 ny = 47
  #
  #Test Statistics:                 k (# x obs of r largest) = 9
  #                                 r                        = 9
  #
  #Test Statistic Parameters:       m           = 77.000
  #                                 n           = 47.000
  #                                 quantile.ub =  0.928
  #
  #P-value:                         0.01136926

  #==========

  # Clean up
  #---------

  rm(p.vals)