Cover

Contents

Preface

1 Introduction

1.1 Stationarity

1.2 The effect of correlation in estimation and prediction

2 Geostatistics

2.1 A model for optimal prediction and error assessment

2.2 Optimal prediction (kriging)

2.3 Prediction intervals

2.4 Universal kriging

2.5 The intuition behind kriging

3 Variogram and covariance models and estimation

3.1 Empirical estimation of the variogram or covariance function

3.2 On the necessity of parametric variogram and covariance models

3.3 Covariance and variogram models

3.4 Convolution methods and extensions

3.5 Parameter estimation for variogram and covariance models

3.6 Prediction for the phosphorus data

3.7 Nonstationary covariance models

4 Spatial models and statistical inference

4.1 Estimation in the Gaussian case

4.2 Estimation for binary spatial observations

5 Isotropy

5.1 Geometric anisotropy

5.2 Other types of anisotropy

5.3 Covariance modeling under anisotropy

5.4 Detection of anisotropy: the rose plot

5.5 Parametric methods to assess isotropy

5.6 Nonparametric methods of assessing anisotropy

5.7 Assessment of isotropy for general sampling designs

5.8 An assessment of isotropy for the longleaf pine sizes

6 Space–time data

6.1 Space–time observations

6.2 Spatio-temporal stationarity and spatio-temporal prediction

6.3 Empirical estimation of the variogram, covariance models, and estimation

6.4 Spatio-temporal covariance models

6.5 Space–time models

6.6 Parametric methods of assessing full symmetry and space–time separability

6.7 Nonparametric methods of assessing full symmetry and space–time separability

6.8 Nonstationary space–time covariance models

7 Spatial point patterns

7.1 The Poisson process and spatial randomness

7.2 Inhibition models

7.3 Clustered models

8 Isotropy for spatial point patterns

8.1 Some large sample results

8.2 A test for isotropy

8.3 Practical issues

8.4 Numerical results

8.5 An application to leukemia data

9 Multivariate spatial and spatio-temporal models

9.1 Cokriging

9.2 An alternative to cokriging

9.3 Multivariate covariance functions

9.4 Testing and assessing intrinsic correlation

9.5 Numerical experiments

9.6 A data application to pollutants

9.7 Discussion

10 Resampling for correlated observations

10.1 Independent observations

10.2 Other data structures

10.3 Model-based bootstrap

10.4 Model-free resampling methods

10.5 Spatial resampling

10.6 Model-free spatial resampling

10.7 Unequally spaced observations

Bibliography

Index

Spatial Statistics and Spatio-Temporal Data

WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors

David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Harvey Goldstein,

Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith,

Ruey S. Tsay, Sanford Weisberg

Editors Emeriti

Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G. Kendall,

Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume


Preface

The fields of spatial data and spatio-temporal data have expanded greatly over the past 20 years. This has occurred as the amount of spatial and spatio-temporal data has increased. One main tool in spatial prediction is the covariance function or the variogram. Given these functions, we know how to make optimal predictions of quantities of interest at unsampled locations. In practice, these covariance functions are unknown and need to be estimated from sample data. Covariance functions and their estimation are part of the field of geostatistics. Several previous texts in geostatistics consider these topics in great detail, specifically those of Cressie (1993), Chiles and Delfiner (1999), Diggle and Ribeiro (2007), Isaaks and Srivastava (1989), and Stein (1999).

A common assumption on variogram or covariance functions is that they are isotropic, that is, not direction dependent. For spatio-temporal covariance functions, a common assumption is that the spatial and temporal covariances are separable. For multivariate spatial observations, a common assumption is intrinsic correlation; that is, that the variable correlations and spatial correlations are separable. All these types of assumptions make models simpler, and thus aid in effective parameter estimation in these covariance models. Much of this book details the effects of these assumptions, and addresses methods to assess the appropriateness of such assumptions for these various data structures.

Chapters 1–3 are an introduction to the topics of stationarity, spatial prediction, variogram and covariance models, and estimation for these models. Chapter 4 gives a brief survey of spatial models, highlighting the Gaussian case and the binary data setting and the different methodologies for these two data structures. Chapter 5 discusses the assumption of isotropy for spatial covariances, and methods to assess and correct for anisotropies; while Chapter 6 discusses models for spatio-temporal covariances and assessment of symmetry and separability assumptions. Chapter 7 serves as an introduction to spatial point patterns. In this chapter we discuss testing for spatial randomness and models for both regular and clustered point patterns. These and further topics in the analysis of point patterns can be found in, for example, Diggle (2003) or Illian et al. (2008). The isotropy assumption for point pattern models has not been addressed as often as in the geostatistical setting. Chapter 8 details methods for testing for isotropy based on spatial point pattern observations. Chapter 9 considers models for multivariate spatial and spatio-temporal observations and covariance functions for these data. Due to spatial correlation and unwieldy likelihoods in the spatial setting, the distributions of many statistics are complicated. In particular, this means that variances and other distributional properties are difficult to derive analytically. Resampling methodology can greatly aid in estimating these quantities. For this reason, Chapter 10 gives some background and details on resampling methodology for independent, time series, and spatial observations.

The first four chapters and Chapters 7 and 10 of this book are relatively non-technical, and any necessary technical items should be accessible along the way. Chapters 5, 6, 8, and 9 often make reference to large sample theory, but the basic methodology can be followed without reference to these large sample results. The chapters that address the testing of various assumptions of covariance functions, Chapters 5, 6, 8, and 9, often rely on a common testing approach. This approach is repeated separately, to some extent, within each of these chapters. Hopefully, this will aid the data analyst who may be interested in only one or two of the data structures addressed in these chapters. There are no exercises given at the end of chapters. It is hoped that some of the details within the chapters will lend themselves to further exploration by an instructor, if desired. All data analyses have been carried out using the R language, and various R packages. I have not listed any specific packages, as the continual growth and improvement of these packages would make this inappropriate. Furthermore, as R is freeware, users can experiment, and find the software they are most comfortable with.

This book introduces spatial covariance models and discusses their importance in making predictions. Whenever building models based on data, a key component is to assess the validity of any model assumptions. It is hoped that this book shows how this can be done, and hopefully suggests further methodology to expand the applicability of such assessments.

The content of this book could never have come into being without the benefits of associations with mentors, colleagues, collaborators, and students. Specifically, I greatly appreciate Ed Carlstein and Martin Tanner for their imparting of wisdom and experience to me when I was a student. I greatly thank the colleagues with whom many of the results in this book have been obtained. Specifically, I thank Tanya Apanosovich, Jim Calvin, Ray Carroll, Marc Genton, Yongtao Guan, Bo Li, Johan Lim, Arnab Maity, Dimitris Politis, Gad Ritvo, and Michael Speed for their collaborative efforts over the years. I also thank Professor Christopher K. Wikle for the use of the Pacific Ocean wind-speed data in Chapters 5 and 6, and Professor Sue Carrozza for use of the leukemia data in Chapter 8. Lastly, I sincerely appreciate the loving aid of my wife, Aviva Sherman, who provided the utmost emotional and technical support in the writing of this book.

1

Introduction

Spatial statistics, like all branches of statistics, is the process of learning from data. Many of the questions that arise in spatial analyses are common to all areas of statistics. Namely,

i. What are the phenomena under study?

ii. What are the relevant data and how should they be collected?

iii. How should we analyze the data after they are collected?

iv. How can we draw inferences from the data collected to the phenomena under study?

The way these questions are answered depends on the type of phenomena under study. In the spatial or spatio-temporal setting, these issues are typically addressed in particular ways. We illustrate this with the following study of phosphorus measurements in shrimp ponds.

Figure 1.1 gives the locations of phosphorus measurements in a 300m × 100m pond in a Texas shrimp farm.

i. The phenomena under study are:

a. Are the observed measurements sufficient to measure total phosphorus in the pond? What can be gained in precision by further sampling?

b. What are the levels of phosphorus at unsampled locations in the pond, and how can we predict them?

c. How does the phosphorus level at one location relate to the amount at another location?

d. Does this relationship depend only on distance or also on direction?

Figure 1.1 Sampling locations of phosphorus measurements.


ii. The relevant data that are collected are as follows: a total of n = 103 samples were collected from the top 10 cm of the soil from each pond by a core sampler with a 2.5 cm diameter. We see 15 equidistant samples on the long edge (300 m), and 5 equidistant samples from the short edge (100 m). Additionally, 14 samples were taken from each of the shallow and deep edges of each pond. The 14 samples were distributed in a cross shape. Two of the sides of the cross consist of samples at distances of 1, 5, 10, and 15 m from the center while the remaining two have samples at 1, 5, and 10 m from the center.

iii. In analyzing the data, the 14 samples in each of the two cross patterns turn out to be very important for both the analysis, (iii), and the inferences, (iv), drawn from these data. This will be discussed further in Section 3.5.

iv. Inferences show that the answer to (d) helps greatly in answering question (c), which in turn helps in answering question (b) in an informative and efficient manner. Further, the answers to (b), (c), and (d) determine how well we can answer question (a). Also, we will see that increased sampling will not give much better answers to (a); while addressing (c), it is found that phosphorus levels are related but only up to a distance of about 15–20 m. The exact meaning of ‘related,’ and how these conclusions are reached, are discussed in the next paragraph and in Chapter 2.

We consider all observed values to be the outcome of random variables observed at the given locations. Let {Z(si), i = 1,…, n} denote the random quantity Z of interest observed at locations si ∈ D ⊂ Rd, where D is the domain where observations are taken, and d is the dimension of the domain. In the phosphorus study, Z(si) denotes the log(phosphorus) measurement at the ith sampling location, i = 1,…, 103. The dimension d is 2, and the domain D is the 300m × 100m pond. For usual spatial data, the dimension, d, is 2.

Sometimes the locations themselves will be considered random, but for now we consider them to be fixed by the experimenter (as they are, e.g., in the phosphorus study). A fundamental concept for addressing question (iii) in the first paragraph of the introduction is the covariance function.

For any two variables Z(s) and Z(t) with means μ(s) and μ(t), respectively, we define the covariance to be

Cov[Z(s), Z(t)] = E{[Z(s) − μ(s)][Z(t) − μ(t)]}.

The correlation function is then Cov[Z(s), Z(t)]/(σsσt), where σs and σt denote the standard deviations of the two variables. We see, for example, that if all random observations are independent, then the covariance and the correlation are identically zero, for all locations s and t, such that s ≠ t. In the special case where the mean and variances are constant, that is, μ(t) = μ and σs = σ for all locations s, we have

Corr[Z(s), Z(t)] = Cov[Z(s), Z(t)]/σ².

The covariance function, which is very important for prediction and inference, typically needs to be estimated. Without any replication this is usually not feasible. We next give a common assumption made in order to obtain replicates.

1.1 Stationarity

A standard method of obtaining replication is through the assumption of second-order stationarity (SOS). This assumption holds that:

i. E[Z(s)] = μ;

ii. Cov[Z(s), Z(t)] = Cov[Z(s + h), Z(t + h)] for all shifts h.

Figure 1.2 shows the locations for a particular shift vector h. In this case we can write

Cov[Z(s), Z(t)] = Cov[Z(0), Z(t − s)] = C(t − s),

so that the covariance depends only on the spatial lag between the locations, t − s, and not on the two locations themselves. Second-order stationarity is often known as ‘weak stationarity.’ Strong (or strict) stationarity assumes that, for any collection of k variables, Z(si), i = 1,…, k, and constants ai, i = 1,…, k, we have

P[Z(s1 + h) ≤ a1, …, Z(sk + h) ≤ ak] = P[Z(s1) ≤ a1, …, Z(sk) ≤ ak]

for all shift vectors h.

Figure 1.2 A depiction of stationarity: two identical lag vectors.


This says that the entire joint distribution of k variables is invariant under shifts. Taking k = 1 and k = 2, and observing that covariances are determined by the joint distribution, it is seen that strong stationarity implies SOS. Generally, to address the phenomena of interest in the phosphorus study (and many others), only the assumption of weak stationarity is necessary. Still, we will have occasions to use both concepts in what follows.

It turns out that the effects of correlation in estimation and in prediction are entirely different. To illustrate this, the role of correlation in estimation and prediction is considered in the time series setting (d = 1). The lessons learned here are more simply derived, but are largely analogous to the situation for spatial observations and spatio-temporal observations.

1.2 The effect of correlation in estimation and prediction

1.2.1 Estimation

Consider equally spaced observations, Zi, representing the response variable of interest at time i. Assume that the observations come from an autoregressive time series of order one. This AR(1) model is given by

Zi = μ + ρ(Zi−1 − μ) + εi,

where the independent errors, εi, are such that E(εi) = 0 and Var(εi) = η². For the sake of simplicity, take μ = 0 and η² = 1, and then the AR(1) model simplifies to

Zi = ρZi−1 + εi,

with Var(εi) = 1.

For −1 < ρ < 1, assume that Var(Zi) is constant. Then we have Var(Zi) = (1 − ρ²)⁻¹, and thus direct calculations show that Cov(Zi+1, Zi) = ρ/(1 − ρ²). Iteration then shows that, for any time lag k, we have:

Cov(Zi+k, Zi) = ρ^|k|/(1 − ρ²).

Noting that the right hand side does not depend on i, it is seen that SOS holds, and we can define C(k) := ρ^|k|/(1 − ρ²). Further, note that the distribution of Zi conditional on the entire past is the same as the distribution of Zi given only the immediate past, Zi−1. Any such process is an example of a Markov process. We say that the AR(1) process is a Markov process of order one, as the present depends only on the one immediately previous observation in time.
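As a quick numerical check of this covariance formula, the following R sketch compares the empirical lag-k covariances of a long simulated AR(1) series with ρ^k/(1 − ρ²); the seed and series length are illustrative only.

# Compare empirical AR(1) covariances with C(k) = rho^k / (1 - rho^2), for eta^2 = 1.
set.seed(6)
rho <- 0.5
z <- as.numeric(arima.sim(list(ar = rho), n = 1e5))
empirical   <- sapply(0:3, function(k) cov(z[1:(length(z) - k)], z[(1 + k):length(z)]))
theoretical <- rho^(0:3) / (1 - rho^2)
round(rbind(empirical, theoretical), 3)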

Figure 1.3 Outcomes of three AR(1) time series.


Figure 1.3 shows the outcomes of three AR(1) time series, the first an uncorrelated series (ρ = 0.0), the second with moderate correlation (ρ = 0.5), and the third with strong correlation (ρ = 0.9). Each time series consists of n = 50 temporal observations. Note that as ρ increases the oscillations of the time plots decrease. For example, the number of crossings of the mean (µ = 0) decreases from 22 (ρ = 0), to 17 (ρ = 0.5), to 8 (ρ = 0.9). In other words, the ‘smoothness’ of the time plots increases. This notion of ‘smoothness’ and its importance in spatial prediction is discussed in Section 3.3.
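Series like these can be generated in a few lines of R. The sketch below is illustrative (it does not use the seed behind Figure 1.3), so the crossing counts will differ somewhat from those quoted above.

# Simulate n = 50 observations from a stationary AR(1) (eta^2 = 1) and count
# how many times each series crosses its mean of 0.
set.seed(1)
n <- 50
sim_ar1 <- function(n, rho) {
  z <- numeric(n)
  z[1] <- rnorm(1, sd = 1 / sqrt(1 - rho^2))  # start in the stationary distribution
  for (i in 2:n) z[i] <- rho * z[i - 1] + rnorm(1)
  z
}
crossings <- function(z) sum(sign(z[-1]) != sign(z[-length(z)]))
for (rho in c(0, 0.5, 0.9)) {
  z <- sim_ar1(n, rho)
  cat("rho =", rho, " crossings of the mean:", crossings(z), "\n")
}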

To examine the effect of correlation on estimation, assume that SOS holds. From observations Zi, i = 1,…, n, we seek to estimate and draw inferences concerning the mean of the process, μ. To do this we desire a confidence interval for the unknown μ. Under SOS, it holds that each observation has the same variability, σ² = Var(Zi), and to simplify we assume that this value is known. The usual large sample 95 percent confidence interval for μ is then given by

Z̄ ± 1.96σ/√n,

where

Z̄ = (Z1 + ⋯ + Zn)/n

denotes the sample mean of the observations.

We hope that the true coverage of this interval is equal to the nominal coverage of 95%. To see the true coverage of this interval, continue to assume that the data come from a (SOS) time series. Using the fact that, for any constants ai, i = 1,…, n, we have

Var(a1Z1 + ⋯ + anZn) = Σi Σj ai aj Cov(Zi, Zj),

and setting ai = 1/n, for all i, gives:

Var(Z̄) = (1/n)[C(0) + 2 Σ_{k=1}^{n−1} (1 − k/n) C(k)],

where the second equality uses SOS and counting. To evaluate this for large n, we need the following result named after the 19th century mathematician, Leopold Kronecker:

Lemma 1.1 Kronecker's lemma

For a sequence of numbers ai, i = 1,…, such that

Σ_{i=1}^{∞} ai converges,

we have that

(1/n) Σ_{i=1}^{n} i·ai → 0 as n → ∞.

In a direct application taking ai = C(i) in Kronecker’s lemma, it is seen that:

(1/n) Σ_{i=1}^{n} i·C(i) → 0 as n → ∞,

and thus that

n·Var(Z̄) → σ̄² := C(0) + 2 Σ_{k=1}^{∞} C(k),

whenever the covariances are absolutely summable, that is, Σ_{k=1}^{∞} |C(k)| < ∞. This last condition, known as 'summable covariances', ensures that the variance of the mean tends to 0 at the same rate as in the case of independent observations. In particular, it holds for the AR(1) process with −1 < ρ < 1. The same rate of convergence does not mean, however, that the correlation has no effect in estimation.

To see the effects of correlation, note that in the case of independent observations, the variance of the standardized mean is Var[√n(Z̄ − μ)] = σ². It is seen that in the presence of correlation, the true large sample variance of the standardized mean, σ̄², is quite different from σ². In particular, for the stationary AR(1) process (with η = 1), σ² = C(0) = (1 − ρ²)⁻¹ while arithmetic shows that σ̄² = (1 − ρ)⁻², so that the ratio of the large sample variance of the mean under independence to the true variance of the mean is R = σ²/σ̄² = (1 − ρ)/(1 + ρ). In the common situation where correlation is positive, 0 < ρ < 1, we see that ignoring correlation leads to underestimation of the correct variance.

To determine the practical effect of this, let Φ(·) denote the cumulative distribution function of a standard normal variable. The coverage of the interval that ignores the correlation is given by

P(Z̄ − 1.96σ/√n ≤ μ ≤ Z̄ + 1.96σ/√n) ≈ 2Φ(1.96 σ/σ̄) − 1 = 2Φ(1.96√R) − 1.

We have assumed that the Central Limit Theorem holds for temporally stationary observations. It does under mild moment conditions on the Zi's and on the strength of correlation. In particular, it holds for the stationary AR(1) model. Some details are given in Chapter 10.

Evaluating the approximate coverage from the last expression, we see that when ρ = 0.2, the ratio R = 0.667 and the approximate coverage of the usual nominal 95% confidence interval is 89%. When ρ = 0.5, R = 0.333 and the approximate coverage is 74%. The true coverage has begun to differ from the nominal of 95% so much that the interval is not performing at all as advertised. When ρ = 0.9, R = 0.053, and the true coverage is approximately 35%. This interval is completely unreliable. It is seen that the undercoverage becomes more severe as temporal correlation increases.
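These calculations are easily reproduced. The R sketch below computes the exact finite-sample variance of the mean from the AR(1) covariance matrix, the ratio R, and the resulting approximate coverage; the coverages should match, approximately, the values just quoted.

# Variance of the sample mean under the stationary AR(1) (eta^2 = 1), the ratio
# R = sigma^2 / sigma_bar^2 = (1 - rho)/(1 + rho), and the coverage of the naive
# nominal 95% confidence interval that ignores the correlation.
var_mean_ar1 <- function(n, rho) {
  Sigma <- rho^abs(outer(1:n, 1:n, "-")) / (1 - rho^2)  # Cov(Z_i, Z_j) = rho^|i-j|/(1 - rho^2)
  sum(Sigma) / n^2
}
n <- 1000
for (rho in c(0.2, 0.5, 0.9)) {
  sigma2     <- 1 / (1 - rho^2)            # C(0), the marginal variance
  sigma_bar2 <- 1 / (1 - rho)^2            # large sample variance of sqrt(n)*(Zbar - mu)
  R <- sigma2 / sigma_bar2
  coverage <- 2 * pnorm(1.96 * sqrt(R)) - 1
  cat(sprintf("rho = %.1f  n*Var(Zbar) = %.2f (limit %.2f)  R = %.3f  coverage = %.2f\n",
              rho, n * var_mean_ar1(n, rho), sigma_bar2, R, coverage))
}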

Using the correct interval, with σ̄ replacing σ, makes the interval wider, but we now obtain approximately the correct coverage. Note, however, that the estimator, Z̄, is still (mean square) consistent for its target, μ, as we still have Var(Z̄) → 0, as n → ∞, whenever the covariances are summable.

To generalize this to the spatial setting, first note that we can write the conditional mean and the conditional variance for the temporal AR(1) model as:

E(Zi | Zi−1, …, Z1) = ρZi−1

and

Var(Zi | Zi−1, …, Z1) = η².

A spatial first-order autoregressive model is a direct generalization of these two conditional moments. Specifically, conditioning on the past is replaced by conditioning on all other observations. In the temporal AR(1) case, it is assumed that the conditional distribution of the present given the past depends only on the immediate past. The spatial analogue assumes that the conditional distribution of Z(s) depends only on the nearest neighbors of s. Specifically, with equally spaced observations in two dimensions, assume that:

E[Z(s) | Z(t), t ≠ s] = γ[Z(s + (1, 0)) + Z(s − (1, 0)) + Z(s + (0, 1)) + Z(s − (0, 1))]

and

Var[Z(s) | Z(t), t ≠ s] = η².

Note how these two conditional moments are a natural spatial analogue to the conditional moments in the temporal AR(1) model. If the observations follow a normal distribution, then we call this spatial model a Gaussian first-order autoregressive model. The first-order Gaussian autoregressive model is an example of a (spatial) Markov process. Figure 1.4 shows sample observations from a first-order Gaussian model on a 100 × 100 grid with γ = 0.0 and γ = 0.2. Note how high values (and low values) tend to accumulate near each other for the γ = 0.2 data set. In particular, we find that when γ = 0.0, 2428 of the observations with a positive neighbor sum (of which there are 4891) are also positive (49.6 percent), while when γ = 0.2, we have that 3166 of the observations with a positive neighbor sum (of which there are 4813) are also positive (65.8 percent). To see the effects of spatial correlation on inference for the mean, we again compare the true variances of the mean with the variances that ignore correlation.

First we need to find the variance of the mean as a function of the strength of correlation. Analogously to the temporal case, we have that

n·Var(Z̄) → σ̄² := Σ_h C(h)

as n → ∞. Unfortunately, unlike in the temporal case of an AR(1), it is not a simple matter to evaluate this sum for the conditionally specified spatial model. Instead, we compute the actual finite sample variance for any given sample size n and correlation parameter γ.

Towards this end, let Z denote the vector of n spatial observations (in some order). Then Var(Z) is an n × n matrix, and it can be shown, using a factorization theorem of Besag (1974), that Σ := Var(Z) = (I − Γ)⁻¹ (with η² = 1), where Γ is an n × n matrix with elements γst = γ whenever locations s and t are neighbors, that is, d(s, t) = 1, and γst = 0 otherwise. This model is discussed further in Chapter 4.

Using this and the fact that Var(Z̄) is the sum of all the elements of Σ divided by n², for any sample size n we can compute the variance of the mean for any value of γ. Take n·Var(Z̄), which is simply the sum of all elements in Σ divided by the sample size n, to be σ̄². In the time series AR(1) setting, we were able to find the stationary variance explicitly. In this spatial model this is not so simply done. Nevertheless, observations from the center of the spatial field are close to the stationary distribution. From the diagonal elements of Var(Z), for observations near the center of the field, we can read off the (unconditional) variance of a single observation, σ², for various values of γ.

Figure 1.4 Output from two 100 × 100 first-order spatial Gaussian models.

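Fields like those in Figure 1.4 can be simulated directly from the conditional specification, for example by Gibbs sampling, in which each site is repeatedly redrawn from its conditional distribution. The R sketch below is only illustrative (it is not the method or grid size used for Figure 1.4, and a 50 × 50 grid is used to keep the loops fast):

# Gibbs-sampler sketch of the first-order Gaussian conditional autoregression:
# each site is drawn from N(gamma * (sum of its four nearest neighbors), eta^2 = 1).
set.seed(5)
m <- 50; gamma <- 0.2; n_sweeps <- 200
z <- matrix(rnorm(m * m), m, m)
neighbor_sum <- function(z, i, j) {
  s <- 0
  if (i > 1) s <- s + z[i - 1, j]
  if (i < nrow(z)) s <- s + z[i + 1, j]
  if (j > 1) s <- s + z[i, j - 1]
  if (j < ncol(z)) s <- s + z[i, j + 1]
  s
}
for (sweep in 1:n_sweeps)
  for (i in 1:m) for (j in 1:m)
    z[i, j] <- rnorm(1, mean = gamma * neighbor_sum(z, i, j), sd = 1)
image(z)  # for gamma = 0.2, high and low values cluster together, as in Figure 1.4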

For a 30 × 30 grid of observations (with η² = 1.0), direct calculation shows that σ̄² = 1.24 for γ = 0.05 (σ² = 1.01), σ̄² = 1.63 for γ = 0.10 (σ² = 1.03), and σ̄² = 4.60 for γ = 0.20 (σ² = 1.17). It is seen that, as in the temporal setting, the variance of the mean increases as spatial correlation increases, and that the ratio R = σ²/σ̄² = 0.813, 0.632, 0.254 for γ = 0.05, 0.10, 0.20, respectively. This leads to approximate coverages of the usual 95% nominal confidence interval for μ of 92%, 88%, and 68%, respectively. We have seen that, as in the temporal setting, accounting for spatial correlation is necessary to obtain accurate inferences. Further, it is seen that when correlations are positive, ignoring the correlation leads to undercoverage of the incorrect intervals. This corresponds to an increased type-I error in hypothesis testing, and thus the errors are often of the most serious kind. Further, to obtain accurate inferences, we need to account for the spatial correlation and use the correct σ̄², or a good estimate of it, in place of the incorrect σ².
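These values can be computed directly from Σ = (I − Γ)⁻¹. A rough R sketch, assuming four-nearest-neighbor adjacency on the 30 × 30 grid (the results should be close to the quoted values, up to the exact treatment of the grid boundary):

# sigma_bar^2 = n * Var(Zbar) and the marginal variance near the center of the field,
# for the first-order conditional spatial model with Var(Z) = (I - Gamma)^{-1}, eta^2 = 1.
m <- 30
n <- m^2
row_id <- rep(1:m, times = m)
col_id <- rep(1:m, each = m)
# adjacency matrix: 1 if two grid sites are at distance 1, 0 otherwise
A <- 1 * outer(1:n, 1:n, function(i, j)
  abs(row_id[i] - row_id[j]) + abs(col_id[i] - col_id[j]) == 1)
for (gamma in c(0.05, 0.10, 0.20)) {
  Sigma <- solve(diag(n) - gamma * A)
  sigma_bar2 <- sum(Sigma) / n                            # n * Var(Zbar)
  center <- which(row_id == m %/% 2 & col_id == m %/% 2)  # a site near the middle
  sigma2 <- Sigma[center, center]                         # marginal variance there
  cat(sprintf("gamma = %.2f  sigma_bar2 = %.2f  sigma2 = %.2f  R = %.3f\n",
              gamma, sigma_bar2, sigma2, sigma2 / sigma_bar2))
}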

1.2.2 Prediction

To see the effects of correlation on prediction, consider again the temporal AR(1) process. In this situation, we observe the first n observations in time, Zi, i = 1,…, n, and seek to predict the unobserved Zn+1. If we entirely ignore the temporal correlation, then each observation is an equally good predictor, and this leads to the predictor Z̄. Direct calculation shows that the true expected square prediction error for this estimator, MSE(Z̄), is approximately given by

Image

where the error is in terms of order smaller than 1/n². From this equation we see that, as n → ∞, MSE(Z̄) tends to σ² > 0. This is in stark contrast to the situation in Section 1.2.1, where the sample mean estimator has asymptotic MSE equal to 0. This is generally true in prediction. No amount of data will make the prediction error approach zero. The reason is that the future observation Zn+1 is random for any sample size n. Additionally, unlike in Section 1.2.1, we see that as ρ increases, Image decreases. So, although strong correlation is hurtful when the goal is estimation (i.e., estimation becomes more difficult), strong correlation is helpful when the goal is prediction (prediction becomes easier).

Consider the unbiased linear predictor, Ẑn+1 = a1Z1 + ⋯ + anZn with a1 + ⋯ + an = 1, that minimizes the MSE over a1,…, an. Then, it can be shown using the methods in Chapter 2, Section 2.2, that

Image

Note that this predictor is approximately the weighted average of Zn and the average of all previous time points. Also, the weight on Zn increases as correlation increases. The methods in Section 2.2 further show that, for this predictor,

Image

where the error is in terms of order smaller than 1/n.

Imagine that we ignore the correlation and use the predictor Z̄ (i.e., we assume ρ = 0). Then we would approximately report

Image

which is approximately equal to MSE(Z̄) (for large n and/or moderate ρ, the error is of order 1/n²), and thus Z̄ is approximately accurate. Accurate means that the inferences drawn using this predictor and the assumed MSE would be approximately right for this predictor. In particular, prediction intervals will have approximately the correct coverage for the predictand Zn+1. This is in stark contrast to the estimation setting in Section 1.2.1, where ignoring the correlation led to completely inaccurate inferences (confidence intervals with coverage far from nominal). It seems that, in prediction, ignoring the correlation is not as serious as in estimation. It holds, on the other hand, that

Image

This shows that Z̄ is not the correct predictor under correlation. Correct means that the inferences drawn using this predictor are the 'best' possible. Here 'best' means the linear unbiased predictor with minimal variance. The optimal linear predictor given above is both accurate and correct for the AR(1) model with known AR(1) parameter ρ. In estimation, the estimator Z̄ is approximately correct for all but extremely large |ρ|, but is only approximately accurate when we use the correct variance, σ̄².
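A small simulation makes the accuracy/correctness distinction concrete. The R sketch below uses ρZn + (1 − ρ)Z̄ as a rough stand-in for the optimal linear predictor described above (an approximation chosen for simplicity, not the exact weights):

# Empirical prediction MSE of the naive predictor Zbar versus an approximate
# optimal linear predictor, rho*Z_n + (1 - rho)*Zbar, for the stationary AR(1).
set.seed(2)
n <- 50; nrep <- 10000
for (rho in c(0.2, 0.5, 0.9)) {
  err_naive <- err_opt <- numeric(nrep)
  for (r in 1:nrep) {
    z <- as.numeric(arima.sim(list(ar = rho), n = n + 1))
    zpast <- z[1:n]; znew <- z[n + 1]
    zbar <- mean(zpast)
    err_naive[r] <- (zbar - znew)^2
    err_opt[r]   <- (rho * zpast[n] + (1 - rho) * zbar - znew)^2
  }
  cat(sprintf("rho = %.1f  MSE(Zbar) = %.2f  MSE(approx. optimal) = %.2f\n",
              rho, mean(err_naive), mean(err_opt)))
}

The gap between the two prediction errors widens as ρ grows, illustrating that accounting for the correlation matters most when the correlation is strong.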

The conclusions just drawn concerning prediction in the temporal setting are qualitatively similar in the spatial setting. Ignoring spatial correlation leads to predictions which are approximately accurate, but are not correct. The correct predictor is formed by accounting for the spatial correlation that is present. This is done using the kriging methodology discussed in Chapter 2.

In summary, it is seen that, when estimation is the goal, we need to account for correlation to draw accurate inferences. Specifically, when positive correlation is present, ignoring the correlation leads to confidence intervals which are too narrow. In other words, in hypothesis testing there is an inflated type-I error. When prediction is the goal, we can obtain approximately accurate inferences when ignoring correlations, but we need to account for the temporal or spatial correlation in order to obtain correct (i.e., efficient) predictions of unobserved variables.

We now discuss, in the temporal setting, a situation where ignoring correlation leads to inaccurate and surprising conclusions in estimation.

1.3 Texas tidal data

A court case tried to decide a very fundamental question: where is the coastline, that is, the division between land and water. In many places of the world, most people would agree to within a few meters as to where the coastline is. However, near Port Mansfield, TX (south of Corpus Christi, TX), there is an area of approximately six miles between the intercoastal canal and a place where almost all people would agree land begins. Within this six-mile gap, it could be water or land depending on the season of the year and on the observer. To help determine a coastline it is informative to consider the history of this question.

In the 1300s, the Spanish 'Las Siete Partidas,' Law 4 of Title 28, stated that the '…sea shore is that space of ground … covered by water in their … highest annual swells.' This suggests that the furthest reach of the water in a typical year determines the coastline. This coastline is approximately six miles away from the intercoastal canal. In 1935, the US Supreme Court, in Borax v. Los Angeles, established MHW ('Mean High Water') as the definition of coastal boundary. This states that the coastline is the average of the daily high-tide reaches of the water. In 1956, in Rudder v. Ponder, Texas adopted MHW as the definition of coastal boundaries. This coastline is relatively close to the intercoastal canal. The two definitions of coastline do not agree in this case and we seek to understand which is more appropriate. The development here follows that in Sherman et al. (1997).

The hourly data in a typical year are given by Yt, t = 1,…, 8760, where Yt denotes the height of the water at hour t at a station at the intercoastal canal. The horizontal projection from this height determines the coastal boundary. The regression model dictated by NOAA (National Oceanographic and Atmospheric Administration) is

Image

where ai and bi are amplitudes associated with Si, the speed of the ith constituent, i = 1,…, 37, and the εt are random errors. The speeds are assumed to be known, while the amplitudes are unknown and need to be estimated. This model is similar to that in classical harmonic analysis and periodogram analysis as discussed in, for example, Hartley (1949).
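As an illustration of how such amplitudes can be estimated, the R sketch below fits a small harmonic regression by ordinary least squares. It is purely illustrative: it assumes the standard sine and cosine parameterization, uses only two constituents (the semidiurnal M2 and diurnal K1 periods of 12.42 and 23.93 hours), and simulates the water levels.

# Fit Y_t = sum_i [ a_i cos(S_i t) + b_i sin(S_i t) ] + e_t by OLS (illustrative only).
speeds <- 2 * pi / c(12.42, 23.93)  # M2 and K1 speeds in radians per hour
t <- 1:8760                         # one year of hourly observations
X <- do.call(cbind, lapply(speeds, function(s) cbind(cos(s * t), sin(s * t))))
colnames(X) <- c("a1", "b1", "a2", "b2")
set.seed(3)
y <- 1.0 * X[, "a1"] + 0.5 * X[, "b2"] + arima.sim(list(ar = 0.8), n = length(t))
fit <- lm(y ~ X)                    # OLS estimates of the amplitudes
summary(fit)$coefficients           # these p-values ignore the serial correlation in e_t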

The basic question in the coastal controversy is: which constituents best explain the variability in water levels? If annual or semiannual constituents explain a large proportion of the overall variability in tidal levels, this suggests that the flooded regions between the intercoastal canal and land are an important feature in the data, and suggests that the contested area cannot be called land. If, however, daily and twice-daily constituents explain most of the variability in tidal levels, then the contested area should be considered land. Note that the regression model is an example of a general linear model, and the amplitudes can be estimated using least squares estimation. In an effort to assess goodness of fit, consider the residuals from this fitted model. Figure 1.5 shows (a) the first 200 residuals, et, t = 1,…, 200, and (b) residuals et, t = 1001,…, 1200, from the least squares fit. One typical assumption in multiple regression is one of independent errors, that is, Cov[εs, εt] = 0 whenever s ≠ t.

Notice that the plot of the first 200 residuals shows a stretch of approximately 60 consecutive negative residuals. This suggests that the errors are (strongly) positively correlated. The second residual plot similarly suggests a clear lack of independence in the errors, as do most stretches of residuals. From the results in estimation in Section 1.2.1, we know that ignoring the correlation would likely be a serious error if our goal is to estimate the mean of the process. The goal here, however, is to estimate the regression parameters in the harmonic analysis, and it is not clear in the regression setting what the effect of ignoring the correlation would be. To explore this, consider the setting of a simple linear regression model:

Figure 1.5 Two sets of residuals from OLS in the tidal data. (a) Residuals 1–200; (b) residuals 1001–1200.

Image

where Yt is the response, xt denotes a covariate, and εt are stationary errors. The ordinary least squares (OLS) estimator of β is

Image

with an associated variance of

Image

It is seen that the variance of the estimated slope depends on the correlations between errors and on the structure of the design. To see the effect of the latter, consider AR(1) errors, εt = ρεt−1 + ηt [with Var(ηt) = 1.0], under the following two designs:

Design 1: Monotone

Image

Design 2: Alternating

Image

To numerically compare the variance under these two scenarios, consider T = 10 and ρ = 0.5. In this case we have

Image

If we ignore the correlation, we then report the variance to be:

Image

for both designs.

The conclusion is that, in contrast to the stationary case in Section 1.2.1, OLS variance estimates that ignore the positive correlation can under- or overestimate the correct variance, depending on the structure of the design. In the tidal data, constituents of fast speed (short periods) correspond to the alternating design, while constituents of low speed (long periods) correspond to the monotone design.
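A short R sketch of this comparison follows. The monotone design xt = t and the alternating design xt = ±1 used below are illustrative stand-ins (the designs behind the numerical results above are not reproduced here), and a regression with an intercept is assumed, so the covariate is centered.

# True variance of the OLS slope under AR(1) errors versus the variance reported
# when the correlation is ignored, for two illustrative designs with T = 10, rho = 0.5.
nt <- 10; rho <- 0.5
Sigma <- rho^abs(outer(1:nt, 1:nt, "-")) / (1 - rho^2)  # Cov(eps_s, eps_t), Var(eta) = 1
true_var <- function(x) {
  xc <- x - mean(x)
  drop(t(xc) %*% Sigma %*% xc) / sum(xc^2)^2   # Var(beta_hat) accounting for correlation
}
naive_var <- function(x) {
  xc <- x - mean(x)
  Sigma[1, 1] / sum(xc^2)                      # sigma^2 / sum((x - xbar)^2), ignoring correlation
}
x_mono <- 1:nt                   # monotone design (assumed form)
x_alt  <- rep(c(-1, 1), nt / 2)  # alternating design (assumed form)
cat("monotone:    true =", true_var(x_mono), " naive =", naive_var(x_mono), "\n")
cat("alternating: true =", true_var(x_alt),  " naive =", naive_var(x_alt),  "\n")

Under positive correlation, the naive variance is too small for the monotone design and too large for the alternating design, matching the direction of the errors described for the tidal constituents.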

Table 1.1 P-value comparison between ignoring and accounting for correlation in tidal data.

Period              Correlation    OLS
8765 (annual)       0.550          ≤0.001
4382 (semiannual)   ≤0.001         ≤0.001
327                 0.690          0.002
25.8 (day)          ≤0.001         ≤0.001
12                  ≤0.001         0.145

OLS are the p-values for parameter estimates that ignore correlation. Correlation p-values account for the correlation.

Table 1.1 gives the p-values for the test that the given constituent is not present in the model, based on the usual t-statistic for a few selected constituents for data from 1993. One set of p-values is computed under an assumption of independent errors, while the second set of p-values is based on standard errors which account for correlation. Variances in these cases are constructed using the block bootstrap in the regression setting.

We discuss the block bootstrap in more detail in Chapter 10. The block bootstrap appears to be a reliable tool in this case, as the residual process is well approximated by a low-order autoregressive moving-average (ARMA) process.

From the table we see that the OLS p-values that ignore correlation cannot be trusted. Further, the errors in ignoring the correlation are as predicted from the simple linear regression example. Namely that, for long periods, OLS variance estimates underestimate the correct variance, and thus lead to large t-statistics and hence p-values which are too small. For short periods, however, this is reversed and the OLS variances are too large, leading to small t-statistics and overly large p-values. The block bootstrap accounts for the temporal correlation and gives reliable variance estimates and thus reliable p-values. A careful parametric approach that estimates the correlation from within the ARMA(p, q) class of models gives results similar to those using the block bootstrap.

Finally, the semiannual period (Period = 4382) is very significant. This suggests that the flooding of the contested area is a significant feature of the data and thus this area cannot reasonably be considered as land. This outcome is qualitatively similar for other years of data as well. Although the Mean High Water criterion may be reasonable for tides in Los Angeles, CA (on which the original Supreme Court decision was based), it does not appear to be reasonable for tides in Port Mansfield, TX.

Much of the discussion in this chapter has focused on the role of correlation and how the effects of correlation are similar in the time series and spatial settings. There are, however, several fundamental differences between time series and spatial observations. Some of these will become clear as we develop spatial methodology. For now, note that in time there is a natural ordering, while this is not the case in the spatial setting. One effect of this became clear when considering the marginal variance of time series and spatial fields in Section 1.2.1. A second major difference between the time series and spatial settings is the effect of edge sites, observations on the domain boundary. For a time series of length n, there are only two observations on the boundary. For spatial observations on an n × n grid, there are approximately 4n observations on the boundary. The effects of this, and methods to account for a large proportion of edge sites, are discussed in Section 4.2.1. A third fundamental difference is that, in time series, observations are typically equally spaced and predictions are typically made for future observations. In the spatial setting, observations are often not equally spaced and predictions are typically made for unobserved variables 'between' existing observations. The effects of this will be discussed throughout the text, but especially in Chapters 3 and 5. Other differences, including increased computational burden, complicated parameter estimation, and unwieldy likelihoods in the spatial setting, will be discussed, particularly in Chapters 4 and 9.

2

Geostatistics

The name geostatistics comes from the common applications in geology. The term is somewhat unfortunate, however, because the methodology is quite general: it applies whenever the goal is to understand the type and extent of spatial correlation, and to use this knowledge to make efficient predictions. The basic situation is as follows:

A process of interest is observed at n locations s1, … , sn. These n observations are denoted by Z(s1), … , Z(sn), although sometimes this will be denoted as Z1, … , Zn. This is simply shorthand, as we typically need to know the specific locations s1, …, sn to carry out any estimation and inferences. In the phosphorus example, described in Chapter 1, the n = 103 locations are displayed in Figure 1.1. The main goals are:

i. Predict the process at unobserved locations. For example, let s0 denote a specific location of interest, and let Z(s0) denote the amount of log(Phosphorus) at this location. We desire to predict the unobserved value Z(s0). If, however, we desire, for example, the total amount of phosphorus in a region B of the pond, then we seek to predict ∫B Z(s) ds.

ii. Understand the connection between process values at ‘neighboring’ locations. In the phosphorus study this helps to understand the additional information gained by more intensive sampling.

In order to describe the relationship between nearby observations, we (re)introduce the following models of correlation:

Second-order stationarity (SOS)

i. E[Z(s)] − µ = 0,

ii. Cov[Z (s + h), Z(s)] = Cov[Z(h), Z(0)] = C(h) for all shifts h.

This is identical to the notion introduced in Section 1.1.

Intrinsic stationarity (IS)

i. E[Z(s + h) − Z(s)] = 0 for all shifts h,

ii. Var[Z(s + h) − Z(s)] =: 2γ(h) for all shifts h,

which is a closely related notion.

Although it is stated in terms of a difference, assumption (i) for IS is identical to assumption (i) in SOS. Under IS, from (i), we can write 2γ(h) = E[Z(s + h) − Z(s)]².

Typically, 2γ(h) is called the variogram function, and γ(h) is called the semivariogram, although some also call γ(h) the variogram function.

A reasonable question is: why consider the assumption of IS when the assumption of SOS is apparently more immediate and interpretable? There are four main reasons to consider the variogram and the assumption of IS.

i. IS is a more general assumption than SOS.

Assume that SOS holds, then direct calculation shows that 2γ(h) = 2[C(0) − C(h)]. Thus, we can define the variogram as the right hand side of the previous equality and IS then holds.

On the other hand, for temporal data consider an AR(1) process with ρ = 1, that is, Zi = Zi−1 + ɛi. Iterating, we have

(2.1)  Zi = ε1 + ε2 + ⋯ + εi (taking Z0 = 0).

Then Cov[Zi+j, Zi] = iσ², which is a function of i, so SOS does not hold. We can, however, simply write 2γ(j) = jσ², so IS holds. This example is called an unstable AR(1) in the time series literature. It is also known as a random walk in one dimension, or one-dimensional Brownian motion. A brief numerical illustration of this example is given below, following reason (iv).

ii. The variogram adapts more easily to nonstationary observations.

One such example is when the mean function is not constant. This is explored in Section 3.5.1.

iii. To estimate the variogram, no estimate of µ is required.

Using increments filters out the mean. This is not the case when estimating the covariance function. For this, an estimate of µ is required. This is shown in Section 3.1.

iv. Estimation of the variogram is easier than estimation of the covariance function. This is shown, for example, in Cressie (1993).

Although, from (i), IS is a more general assumption, (iv) states that estimation of the variogram is easier than estimation of the covariance function. Easier means that the common moment estimator is more accurate in most cases. In particular, the natural moment estimator of the variogram is unbiased under IS, while the covariance estimator is biased. This is surprising given reason (i): the more general quantity is the one that is more easily estimated.
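The random walk example in reason (i) is easy to check empirically: the moment estimator of the variogram grows roughly linearly in the lag, even though the process has no stationary covariance function. A short R sketch (illustrative, with σ² = 1):

# Empirical semivariogram of a random walk: 2*gamma(j) = j*sigma^2 grows without bound.
set.seed(4)
n <- 2000
z <- cumsum(rnorm(n))   # random walk, i.e. AR(1) with rho = 1 and sigma^2 = 1
variogram_hat <- function(z, j) mean((z[(1 + j):length(z)] - z[1:(length(z) - j)])^2)
lags <- c(1, 5, 10, 20)
round(sapply(lags, variogram_hat, z = z), 2)  # roughly 1, 5, 10, 20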

2.1 A model for optimal prediction and error assessment

To begin, start with the simplest structure for the underlying mean function, namely that it is constant. Specifically, the initial model is

(2.2)  Z(s) = µ + δ(s),

where µ is an unknown constant, and δ(s) is intrinsically stationary, IS. Notice this is exactly the assumption that Z(s) is IS. Although simple, this is a model that is often practically useful. When sampling is relatively dense, and the mean function changes smoothly, there is often little harm in assuming that the mean function is constant.

Consider an unsampled location, s0, where we desire to know the unobserved process value Z(s0). The goal is to find the ‘best’ prediction of Z(s0) based on the observations Z(s1), … , Z(sn).

First we need to define what is meant by best. As in the time series setting, described in Chapter 1, this is taken to mean having the lowest squared prediction error, that is, the desire is to minimize the mean squared prediction error, E{[p − Z(s0)]²}, over all predictors p that are functions of the observations Z(s1), …, Z(sn).

We now show that the solution is:

E[Z(s0) | Z(s1), …, Z(sn)].

To see this, let Zn := [Z(s1), …, Z(sn)] denote the vector of n observations. Then, adding and subtracting the solution inside the square, we have

Image

The third term in the last expression is 0. This can be seen by conditioning on Zn, and observing that both the candidate predictor and E[Z(s0)|Zn] are constant given Zn. Finally, E{E[Z(s0)|Zn] − Z(s0) | Zn} is identically 0 for any value of Zn. Thus,

Image

and this is minimized when the predictor is taken to be E[Z(s0)|Zn]. The prediction variance for this predictor is then

Image

The predictor, E[Z(s0)|Zn], makes great intuitive sense. The best guess of the response at any unsampled location based on observed values is its expectation given all the observed values. The fact that we can easily derive this optimal predictor is the good news. The difficulty occurs in attempting to calculate this predictor. When Z(s) has a continuous distribution (i.e., a density function), this conditional expectation depends on the joint distribution of the observations Z(s0), Z(s1), …, Z(sn). Thus, computation of this conditional mean requires knowledge of this joint density, and calculation of an (n + 1)-dimensional integral. Given that we have only n available data points, there is no hope of estimating this joint distribution, and thus little hope of estimating the conditional expectation.

A more modest goal is to seek the predictor, Ẑ(s0), that is the best linear function of the observed values, that is,

Ẑ(s0) = λ1Z(s1) + λ2Z(s2) + ⋯ + λnZ(sn)

for constants λi, i = 1,…,n. This has reduced the problem from estimating an (n + 1)-dimensional distribution to estimating n constants.

Certainly we could take all λi = 1/n, which gives the sample mean of the observations as the predictor. The question is: can we do better? Before launching into finding the best linear estimator, an immediate question arises: are there any constraints on the coefficients λi? Given that the mean is assumed to be constant in Equation 2.2, it seems natural to require that E[Ẑ(s0)] = E[Z(s0)] = µ. This would imply that the expectation of our predictor is the expectation of the predictand, that is, the predictor is unbiased. For this to hold, it is clear that λ1 + ⋯ + λn = 1 is required. Thus the goal has now become: minimize

E{[λ1Z(s1) + ⋯ + λnZ(sn) − Z(s0)]²}, subject to λ1 + ⋯ + λn = 1.

Can any of the λis be negative? At first glance, allowing negative weights may seem inappropriate. Negative weights can lead to inappropriate predictions for variables known to be positive. On the other hand, allowing negative weights is necessary to allow for predictions to be larger than any observed values (or smaller than any observed values), and this is often desirable. For this reason, the development here allows for negative weights in prediction. Further, negative weights will be seen in examples, for instance in Section 2.2.2 and in Section 2.5.1.

2.2 Optimal prediction (kriging)

Now that we have defined what is meant by best, we can address obtaining the best linear predictor. First expand the square error of prediction:

Image

Now, completing the square on the first two terms on the right hand side of the previous equation by adding Image subtracting the same quantity from the third term, and using the fact that Image we have that

Image

Now, taking the expectation of both sides gives:

(2.3)  E{[λ1Z(s1) + ⋯ + λnZ(sn) − Z(s0)]²} = 2 Σi λi γ(si − s0) − Σi Σj λi λj γ(si − sj),

using the definition of the variogram function and the fact that the semivariogram satisfies γ(−h) = γ(h) for all h. Note that this last expression in Equation 2.3 is the prediction variance for a linear predictor using any weights λi, i = 1,…,n.
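Equation 2.3 is straightforward to evaluate numerically for any candidate weights. The R sketch below assumes, purely for illustration, an exponential semivariogram γ(h) = 1 − exp(−||h||/10) and four observation sites:

# Prediction variance (Equation 2.3) for weights lambda with sum(lambda) = 1,
# under an assumed exponential semivariogram.
gamma_fn <- function(h) 1 - exp(-h / 10)            # illustrative semivariogram
pred_var <- function(lambda, sites, s0) {
  d0 <- sqrt(colSums((t(sites) - s0)^2))            # distances from each site to s0
  D  <- as.matrix(dist(sites))                      # pairwise distances among sites
  2 * sum(lambda * gamma_fn(d0)) - drop(t(lambda) %*% gamma_fn(D) %*% lambda)
}
sites <- rbind(c(0, 0), c(10, 0), c(0, 10), c(10, 10))
s0 <- c(3, 3)
pred_var(rep(1/4, 4), sites, s0)                    # equal weights (the sample mean)
pred_var(c(0.7, 0.1, 0.1, 0.1), sites, s0)          # more weight on the site nearest s0

Here, shifting weight toward the site nearest s0 lowers the prediction variance; choosing the weights to minimize Equation 2.3, subject to the unbiasedness constraint, is precisely the kriging problem addressed in the remainder of this section.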