Test for the Presence of Serial Correlation

serialCorrelationTest is a generic function used to test for the presence of lag-one serial correlation using either the rank von Neumann ratio test, the normal approximation based on the Yule-Walker estimate of lag-one correlation, or the normal approximation based on the MLE of lag-one correlation. The function invokes particular methods which depend on the class of the first argument.

Currently, there is a default method and a method for objects of class "lm".

serialCorrelationTest(x, ...)

  # Default S3 method
serialCorrelationTest(x, test = "rank.von.Neumann", 
    alternative = "two.sided", conf.level = 0.95, ...) 

  # S3 method for class 'lm'
serialCorrelationTest(x, test = "rank.von.Neumann", 
    alternative = "two.sided", conf.level = 0.95, ...)

Arguments

x

numeric vector of observations, a numeric univariate time series of class "ts", or an object of class "lm". Undefined (NaN) and infinite (Inf, -Inf) values are not allowed for x when x is a numeric vector or time series, nor for the residuals associated with x when x is an object of class "lm".

When test="AR1.mle", missing (NA) values are allowed, otherwise they are not allowed. When x is a numeric vector of observations or a numeric univariate time series of class "ts", it must contain at least 3 non-missing values. When x is an object of class "lm", the residuals must contain at least 3 non-missing values.

Note: when x is an object of class "lm", the linear model should have been fit using the argument na.action=na.exclude in the call to lm in order to correctly deal with missing values.

test

character string indicating which test to use. The possible values are:
"rank.von.Neumann" (rank von Neumann ratio test; the default),
"AR1.yw" (z-test based on Yule-Walker lag-one estimate of correlation), and
"AR1.mle" (z-test based on MLE of lag-one correlation).

alternative

character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less".

conf.level

numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population lag-one autocorrelation. The default value is conf.level=0.95.

...

optional arguments for possible future methods. Currently not used.

Details

Let $\underline{x} = x_1, x_2, \ldots, x_n$ denote $n$ observations from a stationary time series sampled at equispaced points in time with normal (Gaussian) errors. The function serialCorrelationTest tests the null hypothesis: $$H_0: \rho_1 = 0 \;\;\;\;\;\; (1)$$ where $\rho_1$ denotes the true lag-1 autocorrelation (also called the lag-1 serial correlation coefficient). Actually, the null hypothesis is that the lag-$k$ autocorrelation is 0 for all values of $k$ greater than 0 (i.e., the time series is purely random).

In the case when the argument x is a linear model, the function serialCorrelationTest tests the null hypothesis (1) for the residuals.

The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater"): $$H_a: \rho_1 > 0 \;\;\;\;\;\; (2)$$ the lower one-sided alternative (alternative="less"): $$H_a: \rho_1 < 0 \;\;\;\;\;\; (3)$$ and the two-sided alternative: $$H_a: \rho_1 \ne 0 \;\;\;\;\;\; (4)$$

Testing the Null Hypothesis of No Lag-1 Autocorrelation
There are several possible methods for testing the null hypothesis (1) versus any of the three alternatives (2)-(4). The function serialCorrelationTest allows you to use one of three possible tests:

The rank von Neuman ratio test.
The test based on the normal approximation for the distribution of the Yule-Walker estimate of lag-one correlation.
The test based on the normal approximation for the distribution of the maximum likelihood estimate (MLE) of lag-one correlation.

Each of these tests is described below.

Test Based on Yule-Walker Estimate (test="AR1.yw")
The Yule-Walker estimate of the lag-1 autocorrelation is given by: $$\hat{\rho}_1 = \frac{\hat{\gamma}_1}{\hat{\gamma}_0} \;\;\;\;\;\; (5)$$ where $$\hat{\gamma}_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \;\;\;\;\;\; (6)$$ is the estimate of the lag-$k$ autocovariance. (This estimator does not allow for missing values.)

Under the null hypothesis (1), the estimator of lag-1 correlation in Equation (5) is approximately distributed as a normal (Gaussian) random variable with mean 0 and variance given by: $$Var(\hat{\rho}_1) \approx \frac{1}{n} \;\;\;\;\;\; (7)$$ (Box and Jenkins, 1976, pp.34-35). Thus, the null hypothesis (1) can be tested with the statistic $$z = \sqrt{n} \hat{\rho_1} \;\;\;\;\;\; (8)$$ which is distributed approximately as a standard normal random variable under the null hypothesis that the lag-1 autocorrelation is 0.

Test Based on the MLE (test="AR1.mle")
The function serialCorrelationTest the R function arima to compute the MLE of the lag-one autocorrelation and the estimated variance of this estimator. As for the test based on the Yule-Walker estimate, the z-statistic is computed as the estimated lag-one autocorrelation divided by the square root of the estimated variance.

Test Based on Rank von Neumann Ratio (test="rank.von.Neumann")
The null distribution of the serial correlation coefficient may be badly affected by departures from normality in the underlying process (Cox, 1966; Bartels, 1977). It is therefore a good idea to consider using a nonparametric test for randomness if the normality of the underlying process is in doubt (Bartels, 1982).

Wald and Wolfowitz (1943) introduced the rank serial correlation coefficient, which for lag-1 autocorrelation is simply the Yule-Walker estimate (Equation (5) above) with the actual observations replaced with their ranks.

von Neumann et al. (1941) introduced a test for randomness in the context of testing for trend in the mean of a process. Their statistic is given by: $$V = \frac{\sum_{i=1}^{n-1}(x_i - x_{i+1})^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \;\;\;\;\;\; (9)$$ which is the ratio of the square of successive differences to the usual sums of squared deviations from the mean. This statistic is bounded between 0 and 4, and for a purely random process is symmetric about 2. Small values of this statistic indicate possible positive autocorrelation, and large values of this statistics indicate possible negative autocorrelation. Durbin and Watson (1950, 1951, 1971) proposed using this statistic in the context of checking the independence of residuals from a linear regression model and provided tables for the distribution of this statistic. This statistic is therefore often called the “Durbin-Watson statistic” (Draper and Smith, 1998, p.181).

The rank version of the von Neumann ratio statistic is given by: $$V_{rank} = \frac{\sum_{i=1}^{n-1}(R_i - R_{i+1})^2}{\sum_{i=1}^n (R_i - \bar{R})^2} \;\;\;\;\;\; (10)$$ where $R_i$ denotes the rank of the $i$'th observation (Bartels, 1982). (This test statistic does not allow for missing values.) In the absence of ties, the denominator of this test statistic is equal to $$\sum_{i=1}^n (R_i - \bar{R})^2 = \frac{n(n^2 - 1)}{12} \;\;\;\;\;\; (11)$$ The range of the $V_{rank}$ test statistic is given by: $$[\frac{12}{(n)(n+1)} , 4 - \frac{12}{(n)(n+1)}] \;\;\;\;\;\; (12)$$ if n is even, with a negligible adjustment if n is odd (Bartels, 1982), so asymptotically the range is from 0 to 4, just as for the $V$ test statistic in Equation (9) above.

Bartels (1982) shows that asymptotically, the rank von Neumann ratio statistic is a linear transformation of the rank serial correlation coefficient, so any asymptotic results apply to both statistics.

For any fixed sample size $n$, the exact distribution of the $V_{rank}$ statistic in Equation (10) above can be computed by simply computing the value of $V_{rank}$ for all possible permutations of the serial order of the ranks. Based on this exact distribution, Bartels (1982) presents a table of critical values for the numerator of the RVN statistic for sample sizes between 4 and 10.

Determining the exact distribution of $V_{rank}$ becomes impractical as the sample size increases. For values of n between 10 and 100, Bartels (1982) approximated the distribution of $V_{rank}$ by a beta distribution over the range 0 to 4 with shape parameters shape1=$\nu$ and shape2=$\omega$ and: $$\nu = \omega = \frac{5n(n+1)(n-1)^2}{2(n-2)(5n^2 - 2n - 9)} - \frac{1}{2} \;\;\;\;\;\; (13)$$ Bartels (1982) checked this approximation by simulating the distribution of $V_{rank}$ for $n=25$ and $n=50$ and comparing the empirical quantiles at $0.005$, $0.01$, $0.025$, $0.05$, and $0.1$ with the approximated quantiles based on the beta distribution. He found that the quantiles agreed to 2 decimal places for eight of the 10 values, and differed by $0.01$ for the other two values.

Note: The definition of the beta distribution assumes the random variable ranges from 0 to 1. This definition can be generalized as follows. Suppose the random variable $Y$ has a beta distribution over the range $a \le y \le b$, with shape parameters $\nu$ and $\omega$. Then the random variable $X$ defined as: $$X = \frac{Y-a}{b-a} \;\;\;\;\;\; (14)$$ has the “standard beta distribution” as described in the help file for Beta (Johnson et al., 1995, p.210).

Bartels (1982) shows that asymptotically, $V_{rank}$ has normal distribution with mean 2 and variance $4/n$, but notes that a slightly better approximation is given by using a variance of $20/(5n + 7)$.

To test the null hypothesis (1) when test="rank.von.Neumann", the function
serialCorrelationTest does the following:

When the sample size is between 3 and 10, the exact distribution of $V_{rank}$ is used to compute the p-value.
When the sample size is between 11 and 100, the beta approximation to the distribution of $V_{rank}$ is used to compute the p-value.
When the sample size is larger than 100, the normal approximation to the distribution of $V_{rank}$ is used to compute the p-value. (This uses the variance $20/(5n + 7)$.)

When ties are present in the observations and midranks are used for the tied observations, the distribution of the $V_{rank}$ statistic based on the assumption of no ties is not applicable. If the number of ties is small, however, they may not grossly affect the assumed p-value.

When ties are present, the function serialCorrelationTest issues a warning. When the sample size is between 3 and 10, the p-value is computed based on rounding up the computed value of $V_{rank}$ to the nearest possible value that could be observed in the case of no ties.

Computing a Confidence Interval for the Lag-1 Autocorrelation
The function serialCorrelationTest computes an approximate $100(1-\alpha)\%$ confidence interval for the lag-1 autocorrelation as follows: $$[\hat{\rho}_1 - z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1}, \hat{\rho}_1 + z_{1-\alpha/2}\hat{\sigma}_{\hat{\rho}_1}] \;\;\;\;\;\; (15)$$ where $\hat{\sigma}_{\hat{\rho}_1}$ denotes the estimated standard deviation of the estimated of lag-1 autocorrelation and $z_p$ denotes the $p$'th quantile of the standard normal distribution.

When test="AR1.yw" or test="rank.von.Neumann", the Yule-Walker estimate of lag-1 autocorrelation is used and the variance of the estimated lag-1 autocorrelation is approximately: $$Var(\hat{\rho}_1) \approx \frac{1}{n} (1 - \rho_1^2) \;\;\;\;\;\; (16)$$ (Box and Jenkins, 1976, p.34), so $$\hat{\sigma}_{\hat{\rho}_1} = \sqrt{\frac{1 - \hat{\rho}_1^2}{n}} \;\;\;\;\;\; (17)$$ When test="AR1.mle", the MLE of the lag-1 autocorrelation is used, and its standard deviation is estimated with the square root of the estimated variance returned by arima.

Value

A list of class "htestEnvStats" containing the results of the hypothesis test. See the help file for htestEnvStats.object for details.

References

Bartels, R. (1982). The Rank Version of von Neumann's Ratio Test for Randomness. Journal of the American Statistical Association 77(377), 40–46.

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Second Edition. Lewis Publishers, Boca Raton, FL.

Box, G.E.P., and G.M. Jenkins. (1976). Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, Chapter 2.

Cox, D.R. (1966). The Null Distribution of the First Serial Correlation Coefficient. Biometrika 53, 623–626.

Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York, pp.69-70;181-192.

Durbin, J., and G.S. Watson. (1950). Testing for Serial Correlation in Least Squares Regression I. Biometrika 37, 409–428.

Durbin, J., and G.S. Watson. (1951). Testing for Serial Correlation in Least Squares Regression II. Biometrika 38, 159–178.

Durbin, J., and G.S. Watson. (1971). Testing for Serial Correlation in Least Squares Regression III. Biometrika 58, 1–19.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.250–253.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapter 25.

Knoke, J.D. (1975). Testing for Randomness Against Autocorrelation Alternatives: The Parametric Case. Biometrika 62, 571–575.

Knoke, J.D. (1977). Testing for Randomness Against Autocorrelation Alternatives: Alternative Tests. Biometrika 64, 523–529.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, 457pp.

von Neumann, J., R.H. Kent, H.R. Bellinson, and B.I. Hart. (1941). The Mean Square Successive Difference. Annals of Mathematical Statistics 12(2), 153–162.

Wald, A., and J. Wolfowitz. (1943). An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation. Annals of Mathematical Statistics 14, 378–388.

Author

Steven P. Millard (EnvStats@ProbStatInfo.com)

Note

Data collected over time on the same phenomenon are called a time series. A time series is usually modeled as a single realization of a stochastic process; that is, if we could go back in time and repeat the experiment, we would get different results that would vary according to some probabilistic law. The simplest kind of time series is a stationary time series, in which the mean value is constant over time, the variability of the observations is constant over time, etc. That is, the probability distribution associated with each future observation is the same.

A common concern in applying standard statistical tests to time series data is the assumption of independence. Most conventional statistical hypothesis tests assume the observations are independent, but data collected sequentially in time may not satisfy this assumption. For example, high observations may tend to follow high observations (positive serial correlation), or low observations may tend to follow high observations (negative serial correlation). One way to investigate the assumption of independence is to estimate the lag-one serial correlation and test whether it is significantly different from 0.

The null distribution of the serial correlation coefficient may be badly affected by departures from normality in the underlying process (Cox, 1966; Bartels, 1977). It is therefore a good idea to consider using a nonparametric test for randomness if the normality of the underlying process is in doubt (Bartels, 1982). Knoke (1977) showed that under normality, the test based on the rank serial correlation coefficient (and hence the test based on the rank von Neumann ratio statistic) has asymptotic relative efficiency of 0.91 with respect to using the test based on the ordinary serial correlation coefficient against the alternative of first-order autocorrelation.

Bartels (1982) performed an extensive simulation study of the power of the rank von Neumann ratio test relative to the standard von Neumann ratio test (based on the statistic in Equation (9) above) and the runs test (Lehmann, 1975, 313-315). He generated a first-order autoregressive process for sample sizes of 10, 25, and 50, using 6 different parent distributions: normal, Cauchy, contaminated normal, Johnson, Stable, and exponential. Values of lag-1 autocorrelation ranged from -0.8 to 0.8. Bartels (1982) found three important results:

The rank von Neumann ratio test is far more powerful than the runs test.
For the normal process, the power of the rank von Neumann ratio test was never less than 89% of the power of the standard von Neumann ratio test.
For non-normal processes, the rank von Neumann ratio test was often much more powerful than of the standard von Neumann ratio test.

Examples

  # Generate a purely random normal process, then use serialCorrelationTest 
  # to test for the presence of correlation. 
  # (Note: the call to set.seed allows you to reproduce this example.) 

  set.seed(345) 
  x <- rnorm(100) 

  # Look at the data
  #-----------------
  dev.new()
  ts.plot(x)

  dev.new()
  acf(x)

  # Test for serial correlation
  #----------------------------
  serialCorrelationTest(x) 
#> $statistic
#>      RVN 
#> 1.929733 
#> 
#> $parameters
#> NULL
#> 
#> $p.value
#> [1] 0.7253405
#> 
#> $estimate
#>        rho 
#> 0.02773737 
#> 
#> $estimation.method
#> [1] "Yule-Walker"
#> 
#> $null.value
#> rho 
#>   0 
#> 
#> $alternative
#> [1] "two.sided"
#> 
#> $method
#> [1] "Rank von Neumann Test for\n                                 Lag-1 Autocorrelation\n                                 (Beta Approximation)"
#> 
#> $sample.size
#> [1] 100
#> 
#> $data.name
#> [1] "x"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "rho"
#> 
#> $limits
#>        LCL        UCL 
#> -0.1681836  0.2236584 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "Normal Approximation"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "htestEnvStats"

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       Rank von Neumann Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Beta Approximation)
  #
  #Estimated Parameter(s):          rho = 0.02773737
  #
  #Estimation Method:               Yule-Walker
  #
  #Data:                            x
  #
  #Sample Size:                     100
  #
  #Test Statistic:                  RVN = 1.929733
  #
  #P-value:                         0.7253405
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.1681836
  #                                 UCL =  0.2236584

  # Clean up
  #---------
  rm(x)
  graphics.off()

  #==========

  # Now use the R function arima.sim to generate an AR(1) process with a 
  # lag-1 autocorrelation of 0.8, then test for autocorrelation.

  set.seed(432) 
  y <- arima.sim(model = list(ar = 0.8), n = 100) 

  # Look at the data
  #-----------------
  dev.new()
  ts.plot(y)

  dev.new()
  acf(y)

  # Test for serial correlation
  #----------------------------
  serialCorrelationTest(y)
#> $statistic
#>       RVN 
#> 0.3743174 
#> 
#> $parameters
#> NULL
#> 
#> $p.value
#> [1] 0
#> 
#> $estimate
#>      rho 
#> 0.835214 
#> 
#> $estimation.method
#> [1] "Yule-Walker"
#> 
#> $null.value
#> rho 
#>   0 
#> 
#> $alternative
#> [1] "two.sided"
#> 
#> $method
#> [1] "Rank von Neumann Test for\n                                 Lag-1 Autocorrelation\n                                 (Beta Approximation)"
#> 
#> $sample.size
#> [1] 100
#> 
#> $data.name
#> [1] "y"
#> 
#> $bad.obs
#> [1] 0
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "rho"
#> 
#> $limits
#>       LCL       UCL 
#> 0.7274307 0.9429973 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "Normal Approximation"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "htestEnvStats"

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       Rank von Neumann Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Beta Approximation)
  #
  #Estimated Parameter(s):          rho = 0.835214
  #
  #Estimation Method:               Yule-Walker
  #
  #Data:                            y
  #
  #Sample Size:                     100
  #
  #Test Statistic:                  RVN = 0.3743174
  #
  #P-value:                         0
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.7274307
  #                                 UCL = 0.9429973

  #----------

  # Clean up
  #---------
  rm(y)
  graphics.off()

  #==========

  # The data frame Air.df contains information on ozone (ppb^1/3), 
  # radiation (langleys), temperature (degrees F), and wind speed (mph) 
  # for 153 consecutive days between May 1 and September 30, 1973.  
  # First test for serial correlation in (the cube root of) ozone.  
  # Note that we must use the test based on the MLE because the time series 
  # contains missing values.  Serial correlation appears to be present.  
  # Next fit a linear model that includes the predictor variables temperature, 
  # radiation, and wind speed, and test for the presence of serial correlation 
  # in the residuals.  There is no evidence of serial correlation.

  # Look at the data
  #-----------------

  Air.df
#>               ozone radiation temperature wind
#> 05/01/1973 3.448217       190          67  7.4
#> 05/02/1973 3.301927       118          72  8.0
#> 05/03/1973 2.289428       149          74 12.6
#> 05/04/1973 2.620741       313          62 11.5
#> 05/05/1973       NA        NA          56 14.3
#> 05/06/1973 3.036589        NA          66 14.9
#> 05/07/1973 2.843867       299          65  8.6
#> 05/08/1973 2.668402        99          59 13.8
#> 05/09/1973 2.000000        19          61 20.1
#> 05/10/1973       NA       194          69  8.6
#> 05/11/1973 1.912931        NA          74  6.9
#> 05/12/1973 2.519842       256          69  9.7
#> 05/13/1973 2.223980       290          66  9.2
#> 05/14/1973 2.410142       274          68 10.9
#> 05/15/1973 2.620741        65          58 13.2
#> 05/16/1973 2.410142       334          64 11.5
#> 05/17/1973 3.239612       307          66 12.0
#> 05/18/1973 1.817121        78          57 18.4
#> 05/19/1973 3.107233       322          68 11.5
#> 05/20/1973 2.223980        44          62  9.7
#> 05/21/1973 1.000000         8          59  9.7
#> 05/22/1973 2.223980       320          73 16.6
#> 05/23/1973 1.587401        25          61  9.7
#> 05/24/1973 3.174802        92          61 12.0
#> 05/25/1973       NA        66          57 16.6
#> 05/26/1973       NA       266          58 14.9
#> 05/27/1973       NA        NA          57  8.0
#> 05/28/1973 2.843867        13          67 12.0
#> 05/29/1973 3.556893       252          81 14.9
#> 05/30/1973 4.862944       223          79  5.7
#> 05/31/1973 3.332222       279          76  7.4
#> 06/01/1973       NA       286          78  8.6
#> 06/02/1973       NA       287          74  9.7
#> 06/03/1973       NA       242          67 16.1
#> 06/04/1973       NA       186          84  9.2
#> 06/05/1973       NA       220          85  8.6
#> 06/06/1973       NA       264          79 14.3
#> 06/07/1973 3.072317       127          82  9.7
#> 06/08/1973       NA       273          87  6.9
#> 06/09/1973 4.140818       291          90 13.8
#> 06/10/1973 3.391211       323          87 11.5
#> 06/11/1973       NA       259          93 10.9
#> 06/12/1973       NA       250          92  9.2
#> 06/13/1973 2.843867       148          82  8.0
#> 06/14/1973       NA       332          80 13.8
#> 06/15/1973       NA       322          79 11.5
#> 06/16/1973 2.758924       191          77 14.9
#> 06/17/1973 3.332222       284          72 20.7
#> 06/18/1973 2.714418        37          65  9.2
#> 06/19/1973 2.289428       120          73 11.5
#> 06/20/1973 2.351335       137          76 10.3
#> 06/21/1973       NA       150          77  6.3
#> 06/22/1973       NA        59          76  1.7
#> 06/23/1973       NA        91          76  4.6
#> 06/24/1973       NA       250          76  6.3
#> 06/25/1973       NA       135          75  8.0
#> 06/26/1973       NA       127          78  8.0
#> 06/27/1973       NA        47          73 10.3
#> 06/28/1973       NA        98          80 11.5
#> 06/29/1973       NA        31          77 14.9
#> 06/30/1973       NA       138          83  8.0
#> 07/01/1973 5.129928       269          84  4.0
#> 07/02/1973 3.659306       248          85  9.2
#> 07/03/1973 3.174802       236          81  9.2
#> 07/04/1973       NA       101          84 10.9
#> 07/05/1973 4.000000       175          83  4.6
#> 07/06/1973 3.419952       314          83 10.9
#> 07/07/1973 4.254321       276          88  5.1
#> 07/08/1973 4.594701       267          92  6.3
#> 07/09/1973 4.594701       272          92  5.7
#> 07/10/1973 4.396830       175          89  7.4
#> 07/11/1973       NA       139          82  8.6
#> 07/12/1973 2.154435       264          73 14.3
#> 07/13/1973 3.000000       175          81 14.9
#> 07/14/1973       NA       291          91 14.9
#> 07/15/1973 1.912931        48          80 14.3
#> 07/16/1973 3.634241       260          81  6.9
#> 07/17/1973 3.271066       274          82 10.3
#> 07/18/1973 3.936497       285          84  6.3
#> 07/19/1973 4.290840       187          87  5.1
#> 07/20/1973 3.979057       220          85 11.5
#> 07/21/1973 2.519842         7          74  6.9
#> 07/22/1973       NA       258          81  9.7
#> 07/23/1973       NA       295          82 11.5
#> 07/24/1973 4.308869       294          86  8.6
#> 07/25/1973 4.762203       223          85  8.0
#> 07/26/1973 2.714418        81          82  8.6
#> 07/27/1973 3.732511        82          86 12.0
#> 07/28/1973 4.344481       213          88  7.4
#> 07/29/1973 3.684031       275          86  7.4
#> 07/30/1973 4.000000       253          83  7.4
#> 07/31/1973 3.892996       254          81  9.2
#> 08/01/1973 3.391211        83          81  6.9
#> 08/02/1973 2.080084        24          81 13.8
#> 08/03/1973 2.519842        77          82  7.4
#> 08/04/1973 4.272659        NA          86  6.9
#> 08/05/1973 3.271066        NA          85  7.4
#> 08/06/1973 4.041240        NA          87  4.6
#> 08/07/1973 4.959676       255          89  4.0
#> 08/08/1973 4.464745       229          90 10.3
#> 08/09/1973 4.791420       207          90  8.0
#> 08/10/1973       NA       222          92  8.6
#> 08/11/1973       NA       137          86 11.5
#> 08/12/1973 3.530348       192          86 11.5
#> 08/13/1973 3.036589       273          82 11.5
#> 08/14/1973 4.020726       157          80  9.7
#> 08/15/1973       NA        64          79 11.5
#> 08/16/1973 2.802039        71          77 10.3
#> 08/17/1973 3.892996        51          79  6.3
#> 08/18/1973 2.843867       115          76  7.4
#> 08/19/1973 3.141381       244          78 10.9
#> 08/20/1973 3.530348       190          78 10.3
#> 08/21/1973 2.758924       259          77 15.5
#> 08/22/1973 2.080084        36          72 14.3
#> 08/23/1973       NA       255          75 12.6
#> 08/24/1973 3.556893       212          79  9.7
#> 08/25/1973 5.517848       238          81  3.4
#> 08/26/1973 4.179339       215          86  8.0
#> 08/27/1973       NA       153          88  5.7
#> 08/28/1973 4.235824       203          97  9.7
#> 08/29/1973 4.904868       225          94  2.3
#> 08/30/1973 4.379519       237          96  6.3
#> 08/31/1973 4.396830       188          94  6.3
#> 09/01/1973 4.578857       167          91  6.9
#> 09/02/1973 4.272659       197          92  5.1
#> 09/03/1973 4.179339       183          93  2.8
#> 09/04/1973 4.497941       189          93  4.6
#> 09/05/1973 3.608826        95          87  7.4
#> 09/06/1973 3.174802        92          84 15.5
#> 09/07/1973 2.714418       252          80 10.9
#> 09/08/1973 2.843867       220          78 10.3
#> 09/09/1973 2.758924       230          75 10.9
#> 09/10/1973 2.884499       259          73  9.7
#> 09/11/1973 3.530348       236          81 14.9
#> 09/12/1973 2.758924       259          76 15.5
#> 09/13/1973 3.036589       238          77  6.3
#> 09/14/1973 2.080084        24          71 10.9
#> 09/15/1973 2.351335       112          71 11.5
#> 09/16/1973 3.583048       237          78  6.9
#> 09/17/1973 2.620741       224          67 13.8
#> 09/18/1973 2.351335        27          76 10.3
#> 09/19/1973 2.884499       238          68 10.3
#> 09/20/1973 2.519842       201          82  8.0
#> 09/21/1973 2.351335       238          64 12.6
#> 09/22/1973 2.843867        14          71  9.2
#> 09/23/1973 3.301927       139          81 10.3
#> 09/24/1973 1.912931        49          69 10.3
#> 09/25/1973 2.410142        20          63 16.6
#> 09/26/1973 3.107233       193          70  6.9
#> 09/27/1973       NA       145          77 13.2
#> 09/28/1973 2.410142       191          75 14.3
#> 09/29/1973 2.620741       131          76  8.0
#> 09/30/1973 2.714418       223          68 11.5
  #              ozone radiation temperature wind
  #05/01/1973 3.448217       190          67  7.4
  #05/02/1973 3.301927       118          72  8.0
  #05/03/1973 2.289428       149          74 12.6
  #05/04/1973 2.620741       313          62 11.5
  #05/05/1973       NA        NA          56 14.3
  #...
  #09/27/1973       NA       145          77 13.2
  #09/28/1973 2.410142       191          75 14.3
  #09/29/1973 2.620741       131          76  8.0
  #09/30/1973 2.714418       223          68 11.5

  #----------

  # Test for serial correlation
  #----------------------------

  with(Air.df, 
    serialCorrelationTest(ozone, test = "AR1.mle"))
#> $statistic
#>        z 
#> 7.586952 
#> 
#> $parameters
#> NULL
#> 
#> $p.value
#>         rho 
#> 3.28626e-14 
#> 
#> $estimate
#>       rho 
#> 0.5641616 
#> 
#> $estimation.method
#> [1] "Maximum Likelihood"
#> 
#> $null.value
#> rho 
#>   0 
#> 
#> $alternative
#> [1] "two.sided"
#> 
#> $method
#> [1] "z-Test for\n                                 Lag-1 Autocorrelation\n                                 (Wald Test Based on MLE)"
#> 
#> $sample.size
#> [1] 153
#> 
#> $data.name
#> [1] "ozone"
#> 
#> $bad.obs
#> [1] 37
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "rho"
#> 
#> $limits
#>       LCL       UCL 
#> 0.4184197 0.7099034 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "Normal Approximation"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> attr(,"class")
#> [1] "htestEnvStats"

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       z-Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Wald Test Based on MLE)
  #
  #Estimated Parameter(s):          rho = 0.5641616
  #
  #Estimation Method:               Maximum Likelihood
  #
  #Data:                            ozone
  #
  #Sample Size:                     153
  #
  #Number NA/NaN/Inf's:             37
  #
  #Test Statistic:                  z = 7.586952
  #
  #P-value:                         3.28626e-14
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.4184197
  #                                 UCL = 0.7099034

  #----------

  # Next fit a linear model that includes the predictor variables temperature, 
  # radiation, and wind speed, and test for the presence of serial correlation 
  # in the residuals.  Note setting the argument na.action = na.exclude in the 
  # call to lm to correctly deal with missing values.
  #----------------------------------------------------------------------------

  lm.ozone <- lm(ozone ~ radiation + temperature + wind + 
    I(temperature^2) + I(wind^2), 
    data = Air.df, na.action = na.exclude) 


  # Now test for serial correlation in the residuals.
  #--------------------------------------------------

  serialCorrelationTest(lm.ozone, test = "AR1.mle") 
#> $statistic
#>        z 
#> 1.285963 
#> 
#> $parameters
#> NULL
#> 
#> $p.value
#>       rho 
#> 0.1984559 
#> 
#> $estimate
#>       rho 
#> 0.1298024 
#> 
#> $estimation.method
#> [1] "Maximum Likelihood"
#> 
#> $null.value
#> rho 
#>   0 
#> 
#> $alternative
#> [1] "two.sided"
#> 
#> $method
#> [1] "z-Test for\n                                 Lag-1 Autocorrelation\n                                 (Wald Test Based on MLE)"
#> 
#> $sample.size
#> [1] 153
#> 
#> $data.name
#> [1] "Residuals"
#> 
#> $bad.obs
#> [1] 42
#> 
#> $interval
#> $name
#> [1] "Confidence"
#> 
#> $parameter
#> [1] "rho"
#> 
#> $limits
#>         LCL         UCL 
#> -0.06803223  0.32763704 
#> 
#> $type
#> [1] "two-sided"
#> 
#> $method
#> [1] "Normal Approximation"
#> 
#> $conf.level
#> [1] 0.95
#> 
#> attr(,"class")
#> [1] "intervalEstimate"
#> 
#> $parent.of.data
#> [1] "lm.ozone"
#> 
#> attr(,"class")
#> [1] "htestEnvStats"

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 rho = 0
  #
  #Alternative Hypothesis:          True rho is not equal to 0
  #
  #Test Name:                       z-Test for
  #                                 Lag-1 Autocorrelation
  #                                 (Wald Test Based on MLE)
  #
  #Estimated Parameter(s):          rho = 0.1298024
  #
  #Estimation Method:               Maximum Likelihood
  #
  #Data:                            Residuals
  #
  #Data Source:                     lm.ozone
  #
  #Sample Size:                     153
  #
  #Number NA/NaN/Inf's:             42
  #
  #Test Statistic:                  z = 1.285963
  #
  #P-value:                         0.1984559
  #
  #Confidence Interval for:         rho
  #
  #Confidence Interval Method:      Normal Approximation
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = -0.06803223
  #                                 UCL =  0.32763704

  # Clean up
  #---------
  rm(lm.ozone)