serialCorrelationTest.Rd
serialCorrelationTest
is a generic function used to test for the
presence of lag-one serial correlation using either the rank
von Neumann ratio test, the normal approximation based on the Yule-Walker
estimate of lag-one correlation, or the normal approximation based on the
MLE of lag-one correlation. The function invokes particular
methods
which depend on the class
of the first
argument.
Currently, there is a default method and a method for objects of class "lm"
.
serialCorrelationTest(x, ...)
# Default S3 method
serialCorrelationTest(x, test = "rank.von.Neumann",
alternative = "two.sided", conf.level = 0.95, ...)
# S3 method for class 'lm'
serialCorrelationTest(x, test = "rank.von.Neumann",
alternative = "two.sided", conf.level = 0.95, ...)
numeric vector of observations, a numeric univariate time series of
class "ts"
, or an object of class "lm"
. Undefined (NaN
) and
infinite (Inf
, -Inf
) values are not allowed for x
when x
is a numeric vector or time series, nor for the residuals
associated with x
when x
is an object of class "lm"
.
When test="AR1.mle"
, missing (NA
) values are allowed, otherwise
they are not allowed. When x
is a numeric vector of observations
or a numeric univariate time series of class "ts"
, it must contain at least
3 non-missing values. When x
is an object of class "lm"
, the
residuals must contain at least 3 non-missing values.
Note: when x
is an object of class "lm"
, the linear model
should have been fit using the argument na.action=na.exclude
in the
call to lm
in order to correctly deal with missing values.
character string indicating which test to use. The possible values are: "rank.von.Neumann"
(rank von Neumann ratio test; the default), "AR1.yw"
(z-test based on Yule-Walker lag-one estimate of correlation), and "AR1.mle"
(z-test based on MLE of lag-one correlation).
character string indicating the kind of alternative hypothesis. The possible
values are "two.sided"
(the default), "greater"
, and "less"
.
numeric scalar between 0 and 1 indicating the confidence level associated with
the confidence interval for the population lag-one autocorrelation. The default
value is conf.level=0.95
.
optional arguments for possible future methods. Currently not used.
Let \(\underline{x} = x_1, x_2, \ldots, x_n\) denote \(n\) observations from a
stationary time series sampled at equispaced points in time with normal (Gaussian)
errors. The function serialCorrelationTest
tests the null hypothesis:
$$H_0: \rho_1 = 0 \;\;\;\;\;\; (1)$$
where \(\rho_1\) denotes the true lag-1 autocorrelation (also called the lag-1
serial correlation coefficient). Actually, the null hypothesis is that the
lag-\(k\) autocorrelation is 0 for all values of \(k\) greater than 0 (i.e.,
the time series is purely random).
In the case when the argument x
is a linear model, the function
serialCorrelationTest
tests the null hypothesis (1) for the
residuals.
The three possible alternative hypotheses are the upper one-sided alternative
(alternative="greater"
):
$$H_a: \rho_1 > 0 \;\;\;\;\;\; (2)$$
the lower one-sided alternative (alternative="less"
):
$$H_a: \rho_1 < 0 \;\;\;\;\;\; (3)$$
and the two-sided alternative:
$$H_a: \rho_1 \ne 0 \;\;\;\;\;\; (4)$$
Testing the Null Hypothesis of No Lag-1 Autocorrelation
There are several possible methods for testing the null hypothesis (1) versus any
of the three alternatives (2)-(4). The function serialCorrelationTest
allows
you to use one of three possible tests:
The rank von Neuman ratio test.
The test based on the normal approximation for the distribution of the Yule-Walker estimate of lag-one correlation.
The test based on the normal approximation for the distribution of the maximum likelihood estimate (MLE) of lag-one correlation.
Each of these tests is described below.
Test Based on Yule-Walker Estimate (test="AR1.yw"
)
The Yule-Walker estimate of the lag-1 autocorrelation is given by:
$$\hat{\rho}_1 = \frac{\hat{\gamma}_1}{\hat{\gamma}_0} \;\;\;\;\;\; (5)$$
where
$$\hat{\gamma}_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \;\;\;\;\;\; (6)$$
is the estimate of the lag-\(k\) autocovariance.
(This estimator does not allow for missing values.)
Under the null hypothesis (1), the estimator of lag-1 correlation in Equation (5) is
approximately distributed as a normal (Gaussian) random variable with mean 0 and
variance given by:
$$Var(\hat{\rho}_1) \approx \frac{1}{n} \;\;\;\;\;\; (7)$$
(Box and Jenkins, 1976, pp.34-35). Thus, the null hypothesis (1) can be tested
with the statistic
$$z = \sqrt{n} \hat{\rho_1} \;\;\;\;\;\; (8)$$
which is distributed approximately as a standard normal random variable under the
null hypothesis that the lag-1 autocorrelation is 0.
Test Based on the MLE (test="AR1.mle"
)
The function serialCorrelationTest
the R function arima
to
compute the MLE of the lag-one autocorrelation and the estimated variance of this
estimator. As for the test based on the Yule-Walker estimate, the z-statistic is
computed as the estimated lag-one autocorrelation divided by the square root of the
estimated variance.
Test Based on Rank von Neumann Ratio (test="rank.von.Neumann"
)
The null distribution of the serial correlation coefficient may be badly affected
by departures from normality in the underlying process (Cox, 1966; Bartels, 1977).
It is therefore a good idea to consider using a nonparametric test for randomness if
the normality of the underlying process is in doubt (Bartels, 1982).
Wald and Wolfowitz (1943) introduced the rank serial correlation coefficient, which for lag-1 autocorrelation is simply the Yule-Walker estimate (Equation (5) above) with the actual observations replaced with their ranks.
von Neumann et al. (1941) introduced a test for randomness in the context of testing for trend in the mean of a process. Their statistic is given by: $$V = \frac{\sum_{i=1}^{n-1}(x_i - x_{i+1})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (9)$$ which is the ratio of the square of successive differences to the usual sums of squared deviations from the mean. This statistic is bounded between 0 and 4, and for a purely random process is symmetric about 2. Small values of this statistic indicate possible positive autocorrelation, and large values of this statistics indicate possible negative autocorrelation. Durbin and Watson (1950, 1951, 1971) proposed using this statistic in the context of checking the independence of residuals from a linear regression model and provided tables for the distribution of this statistic. This statistic is therefore often called the “Durbin-Watson statistic” (Draper and Smith, 1998, p.181).
The rank version of the von Neumann ratio statistic is given by: $$V_{rank} = \frac{\sum_{i=1}^{n-1}(R_i - R_{i+1})^2}{\sum_{i=1}^n (R_i - \bar{R})^2} \;\;\;\;\;\; (10)$$ where \(R_i\) denotes the rank of the \(i\)'th observation (Bartels, 1982). (This test statistic does not allow for missing values.) In the absence of ties, the denominator of this test statistic is equal to $$\sum_{i=1}^n (R_i - \bar{R})^2 = \frac{n(n^2 - 1)}{12} \;\;\;\;\;\; (11)$$ The range of the \(V_{rank}\) test statistic is given by: $$[\frac{12}{(n)(n+1)} , 4 - \frac{12}{(n)(n+1)}] \;\;\;\;\;\; (12)$$ if n is even, with a negligible adjustment if n is odd (Bartels, 1982), so asymptotically the range is from 0 to 4, just as for the \(V\) test statistic in Equation (9) above.
Bartels (1982) shows that asymptotically, the rank von Neumann ratio statistic is a linear transformation of the rank serial correlation coefficient, so any asymptotic results apply to both statistics.
For any fixed sample size \(n\), the exact distribution of the \(V_{rank}\) statistic in Equation (10) above can be computed by simply computing the value of \(V_{rank}\) for all possible permutations of the serial order of the ranks. Based on this exact distribution, Bartels (1982) presents a table of critical values for the numerator of the RVN statistic for sample sizes between 4 and 10.
Determining the exact distribution of \(V_{rank}\) becomes impractical as the
sample size increases. For values of n between 10 and 100, Bartels (1982)
approximated the distribution of \(V_{rank}\) by a
beta distribution over the range 0 to 4 with shape parameters
shape1=
\(\nu\) and shape2=
\(\omega\) and:
$$\nu = \omega = \frac{5n(n+1)(n-1)^2}{2(n-2)(5n^2 - 2n - 9)} - \frac{1}{2} \;\;\;\;\;\; (13)$$
Bartels (1982) checked this approximation by simulating the distribution of
\(V_{rank}\) for \(n=25\) and \(n=50\) and comparing the empirical quantiles
at \(0.005\), \(0.01\), \(0.025\), \(0.05\), and \(0.1\) with the
approximated quantiles based on the beta distribution. He found that the quantiles
agreed to 2 decimal places for eight of the 10 values, and differed by \(0.01\)
for the other two values.
Note: The definition of the beta distribution assumes the random variable ranges from 0 to 1. This definition can be generalized as follows. Suppose the random variable \(Y\) has a beta distribution over the range \(a \le y \le b\), with shape parameters \(\nu\) and \(\omega\). Then the random variable \(X\) defined as: $$X = \frac{Y-a}{b-a} \;\;\;\;\;\; (14)$$ has the “standard beta distribution” as described in the help file for Beta (Johnson et al., 1995, p.210).
Bartels (1982) shows that asymptotically, \(V_{rank}\) has normal distribution with mean 2 and variance \(4/n\), but notes that a slightly better approximation is given by using a variance of \(20/(5n + 7)\).
To test the null hypothesis (1) when test="rank.von.Neumann"
, the function serialCorrelationTest
does the following:
When the sample size is between 3 and 10, the exact distribution of \(V_{rank}\) is used to compute the p-value.
When the sample size is between 11 and 100, the beta approximation to the distribution of \(V_{rank}\) is used to compute the p-value.
When the sample size is larger than 100, the normal approximation to the distribution of \(V_{rank}\) is used to compute the p-value. (This uses the variance \(20/(5n + 7)\).)
When ties are present in the observations and midranks are used for the tied observations, the distribution of the \(V_{rank}\) statistic based on the assumption of no ties is not applicable. If the number of ties is small, however, they may not grossly affect the assumed p-value.
When ties are present, the function serialCorrelationTest
issues a warning.
When the sample size is between 3 and 10, the p-value is computed based on
rounding up the computed value of \(V_{rank}\) to the nearest possible value
that could be observed in the case of no ties.
Computing a Confidence Interval for the Lag-1 Autocorrelation
The function serialCorrelationTest
computes an approximate
\(100(1-\alpha)\%\) confidence interval for the lag-1 autocorrelation as follows:
$$[\hat{\rho}_1 - z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1}, \hat{\rho}_1 + z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1}] \;\;\;\;\;\; (15)$$
where \(\hat{\sigma}_{\hat{\rho}_1}\) denotes the estimated standard deviation of
the estimated of lag-1 autocorrelation and \(z_p\) denotes the \(p\)'th quantile
of the standard normal distribution.
When test="AR1.yw"
or test="rank.von.Neumann"
, the Yule-Walker
estimate of lag-1 autocorrelation is used and the variance of the estimated
lag-1 autocorrelation is approximately:
$$Var(\hat{\rho}_1) \approx \frac{1}{n} (1 - \rho_1^2) \;\;\;\;\;\; (16)$$
(Box and Jenkins, 1976, p.34), so
$$\hat{\sigma}_{\hat{\rho}_1} = \sqrt{\frac{1 - \hat{\rho}_1^2}{n}} \;\;\;\;\;\; (17)$$
When test="AR1.mle"
, the MLE of the lag-1 autocorrelation is used, and its
standard deviation is estimated with the square root of the estimated variance
returned by arima
.
A list of class "htest"
containing the results of the hypothesis test.
See the help file for htest.object
for details.
Bartels, R. (1982). The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association 77(377), 40–46.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.
Box, G.E.P., and G.M. Jenkins. (1976). Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, Chapter 2.
Cox, D.R. (1966). The Null Distribution of the First Serial Correlation Coefficient. Biometrika 53, 623–626.
Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.69-70;181-192.
Durbin, J., and G.S. Watson. (1950). Testing for Serial Correlation in Least Squares Regression I. Biometrika 37, 409–428.
Durbin, J., and G.S. Watson. (1951). Testing for Serial Correlation in Least Squares Regression II. Biometrika 38, 159–178.
Durbin, J., and G.S. Watson. (1971). Testing for Serial Correlation in Least Squares Regression III. Biometrika 58, 1–19.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.250–253.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapter 25.
Knoke, J.D. (1975). Testing for Randomness Against Autocorrelation Alternatives: The Parametric Case. Biometrika 62, 571–575.
Knoke, J.D. (1977). Testing for Randomness Against Autocorrelation Alternatives: Alternative Tests. Biometrika 64, 523–529.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, 457pp.
von Neumann, J., R.H. Kent, H.R. Bellinson, and B.I. Hart. (1941). The Mean Square Successive Difference. Annals of Mathematical Statistics 12(2), 153–162.
Wald, A., and J. Wolfowitz. (1943). An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation. Annals of Mathematical Statistics 14, 378–388.
Data collected over time on the same phenomenon are called a time series. A time series is usually modeled as a single realization of a stochastic process; that is, if we could go back in time and repeat the experiment, we would get different results that would vary according to some probabilistic law. The simplest kind of time series is a stationary time series, in which the mean value is constant over time, the variability of the observations is constant over time, etc. That is, the probability distribution associated with each future observation is the same.
A common concern in applying standard statistical tests to time series data is the assumption of independence. Most conventional statistical hypothesis tests assume the observations are independent, but data collected sequentially in time may not satisfy this assumption. For example, high observations may tend to follow high observations (positive serial correlation), or low observations may tend to follow high observations (negative serial correlation). One way to investigate the assumption of independence is to estimate the lag-one serial correlation and test whether it is significantly different from 0.
The null distribution of the serial correlation coefficient may be badly affected by departures from normality in the underlying process (Cox, 1966; Bartels, 1977). It is therefore a good idea to consider using a nonparametric test for randomness if the normality of the underlying process is in doubt (Bartels, 1982). Knoke (1977) showed that under normality, the test based on the rank serial correlation coefficient (and hence the test based on the rank von Neumann ratio statistic) has asymptotic relative efficiency of 0.91 with respect to using the test based on the ordinary serial correlation coefficient against the alternative of first-order autocorrelation.
Bartels (1982) performed an extensive simulation study of the power of the rank von Neumann ratio test relative to the standard von Neumann ratio test (based on the statistic in Equation (9) above) and the runs test (Lehmann, 1975, 313-315). He generated a first-order autoregressive process for sample sizes of 10, 25, and 50, using 6 different parent distributions: normal, Cauchy, contaminated normal, Johnson, Stable, and exponential. Values of lag-1 autocorrelation ranged from -0.8 to 0.8. Bartels (1982) found three important results:
The rank von Neumann ratio test is far more powerful than the runs test.
For the normal process, the power of the rank von Neumann ratio test was never less than 89% of the power of the standard von Neumann ratio test.
For non-normal processes, the rank von Neumann ratio test was often much more powerful than of the standard von Neumann ratio test.
# Generate a purely random normal process, then use serialCorrelationTest
# to test for the presence of correlation.
# (Note: the call to set.seed allows you to reproduce this example.)
set.seed(345)
x <- rnorm(100)
# Look at the data
#-----------------
dev.new()
ts.plot(x)
dev.new()
acf(x)
# Test for serial correlation
#----------------------------
serialCorrelationTest(x)
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: rho = 0
#>
#> Alternative Hypothesis: True rho is not equal to 0
#>
#> Test Name: Rank von Neumann Test for
#> Lag-1 Autocorrelation
#> (Beta Approximation)
#>
#> Estimated Parameter(s): rho = 0.02773737
#>
#> Estimation Method: Yule-Walker
#>
#> Data: x
#>
#> Sample Size: 100
#>
#> Test Statistic: RVN = 1.929733
#>
#> P-value: 0.7253405
#>
#> Confidence Interval for: rho
#>
#> Confidence Interval Method: Normal Approximation
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = -0.1681836
#> UCL = 0.2236584
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: rho = 0
#
#Alternative Hypothesis: True rho is not equal to 0
#
#Test Name: Rank von Neumann Test for
# Lag-1 Autocorrelation
# (Beta Approximation)
#
#Estimated Parameter(s): rho = 0.02773737
#
#Estimation Method: Yule-Walker
#
#Data: x
#
#Sample Size: 100
#
#Test Statistic: RVN = 1.929733
#
#P-value: 0.7253405
#
#Confidence Interval for: rho
#
#Confidence Interval Method: Normal Approximation
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = -0.1681836
# UCL = 0.2236584
# Clean up
#---------
rm(x)
graphics.off()
#==========
# Now use the R function arima.sim to generate an AR(1) process with a
# lag-1 autocorrelation of 0.8, then test for autocorrelation.
set.seed(432)
y <- arima.sim(model = list(ar = 0.8), n = 100)
# Look at the data
#-----------------
dev.new()
ts.plot(y)
dev.new()
acf(y)
# Test for serial correlation
#----------------------------
serialCorrelationTest(y)
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: rho = 0
#>
#> Alternative Hypothesis: True rho is not equal to 0
#>
#> Test Name: Rank von Neumann Test for
#> Lag-1 Autocorrelation
#> (Beta Approximation)
#>
#> Estimated Parameter(s): rho = 0.835214
#>
#> Estimation Method: Yule-Walker
#>
#> Data: y
#>
#> Sample Size: 100
#>
#> Test Statistic: RVN = 0.3743174
#>
#> P-value: 0
#>
#> Confidence Interval for: rho
#>
#> Confidence Interval Method: Normal Approximation
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 0.7274307
#> UCL = 0.9429973
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: rho = 0
#
#Alternative Hypothesis: True rho is not equal to 0
#
#Test Name: Rank von Neumann Test for
# Lag-1 Autocorrelation
# (Beta Approximation)
#
#Estimated Parameter(s): rho = 0.835214
#
#Estimation Method: Yule-Walker
#
#Data: y
#
#Sample Size: 100
#
#Test Statistic: RVN = 0.3743174
#
#P-value: 0
#
#Confidence Interval for: rho
#
#Confidence Interval Method: Normal Approximation
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 0.7274307
# UCL = 0.9429973
#----------
# Clean up
#---------
rm(y)
graphics.off()
#==========
# The data frame Air.df contains information on ozone (ppb^1/3),
# radiation (langleys), temperature (degrees F), and wind speed (mph)
# for 153 consecutive days between May 1 and September 30, 1973.
# First test for serial correlation in (the cube root of) ozone.
# Note that we must use the test based on the MLE because the time series
# contains missing values. Serial correlation appears to be present.
# Next fit a linear model that includes the predictor variables temperature,
# radiation, and wind speed, and test for the presence of serial correlation
# in the residuals. There is no evidence of serial correlation.
# Look at the data
#-----------------
Air.df
#> ozone radiation temperature wind
#> 05/01/1973 3.448217 190 67 7.4
#> 05/02/1973 3.301927 118 72 8.0
#> 05/03/1973 2.289428 149 74 12.6
#> 05/04/1973 2.620741 313 62 11.5
#> 05/05/1973 NA NA 56 14.3
#> 05/06/1973 3.036589 NA 66 14.9
#> 05/07/1973 2.843867 299 65 8.6
#> 05/08/1973 2.668402 99 59 13.8
#> 05/09/1973 2.000000 19 61 20.1
#> 05/10/1973 NA 194 69 8.6
#> 05/11/1973 1.912931 NA 74 6.9
#> 05/12/1973 2.519842 256 69 9.7
#> 05/13/1973 2.223980 290 66 9.2
#> 05/14/1973 2.410142 274 68 10.9
#> 05/15/1973 2.620741 65 58 13.2
#> 05/16/1973 2.410142 334 64 11.5
#> 05/17/1973 3.239612 307 66 12.0
#> 05/18/1973 1.817121 78 57 18.4
#> 05/19/1973 3.107233 322 68 11.5
#> 05/20/1973 2.223980 44 62 9.7
#> 05/21/1973 1.000000 8 59 9.7
#> 05/22/1973 2.223980 320 73 16.6
#> 05/23/1973 1.587401 25 61 9.7
#> 05/24/1973 3.174802 92 61 12.0
#> 05/25/1973 NA 66 57 16.6
#> 05/26/1973 NA 266 58 14.9
#> 05/27/1973 NA NA 57 8.0
#> 05/28/1973 2.843867 13 67 12.0
#> 05/29/1973 3.556893 252 81 14.9
#> 05/30/1973 4.862944 223 79 5.7
#> 05/31/1973 3.332222 279 76 7.4
#> 06/01/1973 NA 286 78 8.6
#> 06/02/1973 NA 287 74 9.7
#> 06/03/1973 NA 242 67 16.1
#> 06/04/1973 NA 186 84 9.2
#> 06/05/1973 NA 220 85 8.6
#> 06/06/1973 NA 264 79 14.3
#> 06/07/1973 3.072317 127 82 9.7
#> 06/08/1973 NA 273 87 6.9
#> 06/09/1973 4.140818 291 90 13.8
#> 06/10/1973 3.391211 323 87 11.5
#> 06/11/1973 NA 259 93 10.9
#> 06/12/1973 NA 250 92 9.2
#> 06/13/1973 2.843867 148 82 8.0
#> 06/14/1973 NA 332 80 13.8
#> 06/15/1973 NA 322 79 11.5
#> 06/16/1973 2.758924 191 77 14.9
#> 06/17/1973 3.332222 284 72 20.7
#> 06/18/1973 2.714418 37 65 9.2
#> 06/19/1973 2.289428 120 73 11.5
#> 06/20/1973 2.351335 137 76 10.3
#> 06/21/1973 NA 150 77 6.3
#> 06/22/1973 NA 59 76 1.7
#> 06/23/1973 NA 91 76 4.6
#> 06/24/1973 NA 250 76 6.3
#> 06/25/1973 NA 135 75 8.0
#> 06/26/1973 NA 127 78 8.0
#> 06/27/1973 NA 47 73 10.3
#> 06/28/1973 NA 98 80 11.5
#> 06/29/1973 NA 31 77 14.9
#> 06/30/1973 NA 138 83 8.0
#> 07/01/1973 5.129928 269 84 4.0
#> 07/02/1973 3.659306 248 85 9.2
#> 07/03/1973 3.174802 236 81 9.2
#> 07/04/1973 NA 101 84 10.9
#> 07/05/1973 4.000000 175 83 4.6
#> 07/06/1973 3.419952 314 83 10.9
#> 07/07/1973 4.254321 276 88 5.1
#> 07/08/1973 4.594701 267 92 6.3
#> 07/09/1973 4.594701 272 92 5.7
#> 07/10/1973 4.396830 175 89 7.4
#> 07/11/1973 NA 139 82 8.6
#> 07/12/1973 2.154435 264 73 14.3
#> 07/13/1973 3.000000 175 81 14.9
#> 07/14/1973 NA 291 91 14.9
#> 07/15/1973 1.912931 48 80 14.3
#> 07/16/1973 3.634241 260 81 6.9
#> 07/17/1973 3.271066 274 82 10.3
#> 07/18/1973 3.936497 285 84 6.3
#> 07/19/1973 4.290840 187 87 5.1
#> 07/20/1973 3.979057 220 85 11.5
#> 07/21/1973 2.519842 7 74 6.9
#> 07/22/1973 NA 258 81 9.7
#> 07/23/1973 NA 295 82 11.5
#> 07/24/1973 4.308869 294 86 8.6
#> 07/25/1973 4.762203 223 85 8.0
#> 07/26/1973 2.714418 81 82 8.6
#> 07/27/1973 3.732511 82 86 12.0
#> 07/28/1973 4.344481 213 88 7.4
#> 07/29/1973 3.684031 275 86 7.4
#> 07/30/1973 4.000000 253 83 7.4
#> 07/31/1973 3.892996 254 81 9.2
#> 08/01/1973 3.391211 83 81 6.9
#> 08/02/1973 2.080084 24 81 13.8
#> 08/03/1973 2.519842 77 82 7.4
#> 08/04/1973 4.272659 NA 86 6.9
#> 08/05/1973 3.271066 NA 85 7.4
#> 08/06/1973 4.041240 NA 87 4.6
#> 08/07/1973 4.959676 255 89 4.0
#> 08/08/1973 4.464745 229 90 10.3
#> 08/09/1973 4.791420 207 90 8.0
#> 08/10/1973 NA 222 92 8.6
#> 08/11/1973 NA 137 86 11.5
#> 08/12/1973 3.530348 192 86 11.5
#> 08/13/1973 3.036589 273 82 11.5
#> 08/14/1973 4.020726 157 80 9.7
#> 08/15/1973 NA 64 79 11.5
#> 08/16/1973 2.802039 71 77 10.3
#> 08/17/1973 3.892996 51 79 6.3
#> 08/18/1973 2.843867 115 76 7.4
#> 08/19/1973 3.141381 244 78 10.9
#> 08/20/1973 3.530348 190 78 10.3
#> 08/21/1973 2.758924 259 77 15.5
#> 08/22/1973 2.080084 36 72 14.3
#> 08/23/1973 NA 255 75 12.6
#> 08/24/1973 3.556893 212 79 9.7
#> 08/25/1973 5.517848 238 81 3.4
#> 08/26/1973 4.179339 215 86 8.0
#> 08/27/1973 NA 153 88 5.7
#> 08/28/1973 4.235824 203 97 9.7
#> 08/29/1973 4.904868 225 94 2.3
#> 08/30/1973 4.379519 237 96 6.3
#> 08/31/1973 4.396830 188 94 6.3
#> 09/01/1973 4.578857 167 91 6.9
#> 09/02/1973 4.272659 197 92 5.1
#> 09/03/1973 4.179339 183 93 2.8
#> 09/04/1973 4.497941 189 93 4.6
#> 09/05/1973 3.608826 95 87 7.4
#> 09/06/1973 3.174802 92 84 15.5
#> 09/07/1973 2.714418 252 80 10.9
#> 09/08/1973 2.843867 220 78 10.3
#> 09/09/1973 2.758924 230 75 10.9
#> 09/10/1973 2.884499 259 73 9.7
#> 09/11/1973 3.530348 236 81 14.9
#> 09/12/1973 2.758924 259 76 15.5
#> 09/13/1973 3.036589 238 77 6.3
#> 09/14/1973 2.080084 24 71 10.9
#> 09/15/1973 2.351335 112 71 11.5
#> 09/16/1973 3.583048 237 78 6.9
#> 09/17/1973 2.620741 224 67 13.8
#> 09/18/1973 2.351335 27 76 10.3
#> 09/19/1973 2.884499 238 68 10.3
#> 09/20/1973 2.519842 201 82 8.0
#> 09/21/1973 2.351335 238 64 12.6
#> 09/22/1973 2.843867 14 71 9.2
#> 09/23/1973 3.301927 139 81 10.3
#> 09/24/1973 1.912931 49 69 10.3
#> 09/25/1973 2.410142 20 63 16.6
#> 09/26/1973 3.107233 193 70 6.9
#> 09/27/1973 NA 145 77 13.2
#> 09/28/1973 2.410142 191 75 14.3
#> 09/29/1973 2.620741 131 76 8.0
#> 09/30/1973 2.714418 223 68 11.5
# ozone radiation temperature wind
#05/01/1973 3.448217 190 67 7.4
#05/02/1973 3.301927 118 72 8.0
#05/03/1973 2.289428 149 74 12.6
#05/04/1973 2.620741 313 62 11.5
#05/05/1973 NA NA 56 14.3
#...
#09/27/1973 NA 145 77 13.2
#09/28/1973 2.410142 191 75 14.3
#09/29/1973 2.620741 131 76 8.0
#09/30/1973 2.714418 223 68 11.5
#----------
# Test for serial correlation
#----------------------------
with(Air.df,
serialCorrelationTest(ozone, test = "AR1.mle"))
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: rho = 0
#>
#> Alternative Hypothesis: True rho is not equal to 0
#>
#> Test Name: z-Test for
#> Lag-1 Autocorrelation
#> (Wald Test Based on MLE)
#>
#> Estimated Parameter(s): rho = 0.5641616
#>
#> Estimation Method: Maximum Likelihood
#>
#> Data: ozone
#>
#> Sample Size: 153
#>
#> Number NA/NaN/Inf's: 37
#>
#> Test Statistic: z = 7.586952
#>
#> P-value: 3.28626e-14
#>
#> Confidence Interval for: rho
#>
#> Confidence Interval Method: Normal Approximation
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = 0.4184197
#> UCL = 0.7099034
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: rho = 0
#
#Alternative Hypothesis: True rho is not equal to 0
#
#Test Name: z-Test for
# Lag-1 Autocorrelation
# (Wald Test Based on MLE)
#
#Estimated Parameter(s): rho = 0.5641616
#
#Estimation Method: Maximum Likelihood
#
#Data: ozone
#
#Sample Size: 153
#
#Number NA/NaN/Inf's: 37
#
#Test Statistic: z = 7.586952
#
#P-value: 3.28626e-14
#
#Confidence Interval for: rho
#
#Confidence Interval Method: Normal Approximation
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 0.4184197
# UCL = 0.7099034
#----------
# Next fit a linear model that includes the predictor variables temperature,
# radiation, and wind speed, and test for the presence of serial correlation
# in the residuals. Note setting the argument na.action = na.exclude in the
# call to lm to correctly deal with missing values.
#----------------------------------------------------------------------------
lm.ozone <- lm(ozone ~ radiation + temperature + wind +
I(temperature^2) + I(wind^2),
data = Air.df, na.action = na.exclude)
# Now test for serial correlation in the residuals.
#--------------------------------------------------
serialCorrelationTest(lm.ozone, test = "AR1.mle")
#>
#> Results of Hypothesis Test
#> --------------------------
#>
#> Null Hypothesis: rho = 0
#>
#> Alternative Hypothesis: True rho is not equal to 0
#>
#> Test Name: z-Test for
#> Lag-1 Autocorrelation
#> (Wald Test Based on MLE)
#>
#> Estimated Parameter(s): rho = 0.1298024
#>
#> Estimation Method: Maximum Likelihood
#>
#> Data: Residuals
#>
#> Data Source: lm.ozone
#>
#> Sample Size: 153
#>
#> Number NA/NaN/Inf's: 42
#>
#> Test Statistic: z = 1.285963
#>
#> P-value: 0.1984559
#>
#> Confidence Interval for: rho
#>
#> Confidence Interval Method: Normal Approximation
#>
#> Confidence Interval Type: two-sided
#>
#> Confidence Level: 95%
#>
#> Confidence Interval: LCL = -0.06803223
#> UCL = 0.32763704
#>
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: rho = 0
#
#Alternative Hypothesis: True rho is not equal to 0
#
#Test Name: z-Test for
# Lag-1 Autocorrelation
# (Wald Test Based on MLE)
#
#Estimated Parameter(s): rho = 0.1298024
#
#Estimation Method: Maximum Likelihood
#
#Data: Residuals
#
#Data Source: lm.ozone
#
#Sample Size: 153
#
#Number NA/NaN/Inf's: 42
#
#Test Statistic: z = 1.285963
#
#P-value: 0.1984559
#
#Confidence Interval for: rho
#
#Confidence Interval Method: Normal Approximation
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = -0.06803223
# UCL = 0.32763704
# Clean up
#---------
rm(lm.ozone)