Contents
Preface
1 Introduction
1.1 Stationarity
1.2 The effect of correlation in estimation and prediction
2 Geostatistics
2.1 A model for optimal prediction and error assessment
2.2 Optimal prediction (kriging)
2.3 Prediction intervals
2.4 Universal kriging
2.5 The intuition behind kriging
3 Variogram and covariance models and estimation
3.1 Empirical estimation of the variogram or covariance function
3.2 On the necessity of parametric variogram and covariance models
3.3 Covariance and variogram models
3.4 Convolution methods and extensions
3.5 Parameter estimation for variogram and covariance models
3.6 Prediction for the phosphorus data
3.7 Nonstationary covariance models
4 Spatial models and statistical inference
4.1 Estimation in the Gaussian case
4.2 Estimation for binary spatial observations
5 Isotropy
5.1 Geometric anisotropy
5.2 Other types of anisotropy
5.3 Covariance modeling under anisotropy
5.4 Detection of anisotropy: the rose plot
5.5 Parametric methods to assess isotropy
5.6 Nonparametric methods of assessing anisotropy
5.7 Assessment of isotropy for general sampling designs
5.8 An assessment of isotropy for the longleaf pine sizes
6 Space–time data
6.1 Space–time observations
6.2 Spatio-temporal stationarity and spatio-temporal prediction
6.3 Empirical estimation of the variogram, covariance models, and estimation
6.4 Spatio-temporal covariance models
6.5 Space–time models
6.6 Parametric methods of assessing full symmetry and space–time separability
6.7 Nonparametric methods of assessing full symmetry and space–time separability
6.8 Nonstationary space–time covariance models
7 Spatial point patterns
7.1 The Poisson process and spatial randomness
7.2 Inhibition models
7.3 Clustered models
8 Isotropy for spatial point patterns
8.1 Some large sample results
8.2 A test for isotropy
8.3 Practical issues
8.4 Numerical results
8.5 An application to leukemia data
9 Multivariate spatial and spatio-temporal models
9.1 Cokriging
9.2 An alternative to cokriging
9.3 Multivariate covariance functions
9.4 Testing and assessing intrinsic correlation
9.5 Numerical experiments
9.6 A data application to pollutants
9.7 Discussion
10 Resampling for correlated observations
10.1 Independent observations
10.2 Other data structures
10.3 Model-based bootstrap
10.4 Model-free resampling methods
10.5 Spatial resampling
10.6 Model-free spatial resampling
10.7 Unequally spaced observations
Bibliography
Index
1 Introduction
Spatial statistics, like all branches of statistics, is the process of learning from data. Many of the questions that arise in spatial analyses are common to all areas of statistics. Namely,
i. What are the phenomena under study?
ii. What are the relevant data, and how should they be collected?
iii. How should we analyze the data after they are collected?
iv. How can we draw inferences from the data collected to the phenomena under study?
The way these questions are answered depends on the type of phenomena under study. We illustrate how these issues are typically addressed in the spatial or spatio-temporal setting with the following study of phosphorus measurements in shrimp ponds.
Figure 1.1 gives the locations of phosphorus measurements in a 300m × 100m pond in a Texas shrimp farm.
i. The phenomena under study are:
a. Are the observed measurements sufficient to measure total phosphorus in the pond? What can be gained in precision by further sampling?
b. What are the levels of phosphorus at unsampled locations in the pond, and how can we predict them?
c. How does the phosphorus level at one location relate to the amount at another location?
d. Does this relationship depend only on distance or also on direction?
ii. The relevant data are as follows: a total of n = 103 samples were collected from the top 10 cm of the soil of each pond with a core sampler of 2.5 cm diameter. The main grid consists of 15 equidistant samples along the long edge (300 m) crossed with 5 equidistant samples along the short edge (100 m), giving 75 locations. Additionally, 14 samples were taken near each of the shallow and deep ends of each pond, distributed in a cross shape: two arms of the cross have samples at distances of 1, 5, 10, and 15 m from the center, while the remaining two arms have samples at 1, 5, and 10 m from the center.
iii. The analysis of the data shows that the 14 samples in each of the two cross patterns turn out to be very important, both for the analysis itself and for the inferences, (iv), drawn from these data. This will be discussed further in Section 3.5.
iv. Inferences show that the answer to (d) helps greatly in answering question (c), which in turn helps in answering question (b) in an informative and efficient manner. Further, the answers to (b), (c), and (d) determine how well we can answer question (a). Also, we will see that increased sampling will not give much better answers to (a); while addressing (c), it is found that phosphorus levels are related but only up to a distance of about 15–20 m. The exact meaning of ‘related,’ and how these conclusions are reached, are discussed in the next paragraph and in Chapter 2.
We consider all observed values to be the outcome of random variables observed at the given locations. Let {Z(si), i = 1,…, n} denote the random quantity Z of interest observed at locations si ∈ D ⊂ ℝd, where D is the domain where observations are taken, and d is the dimension of the domain. In the phosphorus study, Z(si) denotes the log(phosphorus) measurement at the ith sampling location, i = 1,…, 103. The dimension d is 2, and the domain D is the 300m × 100m pond. For typical spatial data the dimension, d, is 2.
Sometimes the locations themselves will be considered random, but for now we consider them to be fixed by the experimenter (as they are, e.g., in the phosphorus study). A fundamental concept for addressing question (iii) in the first paragraph of the introduction is the covariance function.
For any two variables Z(s) and Z(t) with means μ(s) and μ(t), respectively, we define the covariance to be

Cov[Z(s), Z(t)] = E{[Z(s) − μ(s)][Z(t) − μ(t)]}.

The correlation function is then Corr[Z(s), Z(t)] = Cov[Z(s), Z(t)]/(σsσt), where σs and σt denote the standard deviations of the two variables. We see, for example, that if all random observations are independent, then the covariance and the correlation are identically zero for all locations s and t such that s ≠ t. In the special case where the mean and variance are constant, that is, μ(s) = μ and σs = σ for all locations s, we have

Corr[Z(s), Z(t)] = Cov[Z(s), Z(t)]/σ2.
The covariance function, which is very important for prediction and inference, typically needs to be estimated. Without any replication this is usually not feasible. We next give a common assumption made in order to obtain replicates.
1.1 Stationarity
A standard method of obtaining replication is through the assumption of second-order stationarity (SOS). This assumption holds that:
i. E[Z(s)] = μ for all s ∈ D;
ii. Cov[Z(s), Z(t)] = Cov[Z(s + h), Z(t + h)] for all s, t and all shifts h.
Figure 1.2 shows the locations for a particular shift vector h. In this case we can write

Cov[Z(s), Z(t)] = Cov[Z(0), Z(t − s)] =: C(t − s),

so that the covariance depends only on the spatial lag between the locations, t − s, and not on the two locations themselves. Second-order stationarity is often known as ‘weak stationarity.’ Strong (or strict) stationarity assumes that, for any collection of k variables, Z(si), i = 1,…, k, and constants ai, i = 1,…, k, we have

P[Z(s1) ≤ a1,…, Z(sk) ≤ ak] = P[Z(s1 + h) ≤ a1,…, Z(sk + h) ≤ ak]

for all shift vectors h.
This says that the entire joint distribution of the k variables is invariant under shifts. Taking k = 1 and k = 2, and observing that covariances are determined by the joint distribution, it is seen that strong stationarity implies SOS. Generally, to address the phenomena of interest in the phosphorus study (and many others), only the assumption of weak stationarity is necessary. Still, we will have occasion to use both concepts in what follows.
It turns out that the effects of covariance and correlation in estimation and prediction are entirely different. To illustrate this, the role of covariance in estimation and prediction is considered in the time series setting (d = 1). The lessons learned here are more simply derived, but are largely analogous to the situation for spatial observations and spatio-temporal observations.
1.2 The effect of correlation in estimation and prediction
1.2.1 Estimation
Consider equally spaced observations, Zi, representing the response variable of interest at time i. Assume that the observations come from an autoregressive time series of order one. This AR(1) model is given by

Zi − μ = ρ(Zi−1 − μ) + εi,

where the independent errors, εi, are such that E(εi) = 0 and Var(εi) = η2. For the sake of simplicity, take μ = 0 and η2 = 1, and then the AR(1) model simplifies to

Zi = ρZi−1 + εi,

with Var(εi) = 1.
For −1 < ρ < 1, assume that Var(Zi) is constant. Then we have Var(Zi) = (1 − ρ2)−1, and direct calculation shows that Cov(Zi+1, Zi) = ρ/(1 − ρ2). Iteration then shows that, for any time lag k, we have

Cov(Zi+k, Zi) = ρ|k|/(1 − ρ2).

Noting that the right hand side does not depend on i, it is seen that SOS holds, and we can define C(k) := ρ|k|/(1 − ρ2). Further, note that the distribution of Zi conditional on the entire past is the same as the distribution of Zi given only the immediate past, Zi−1. Any such process is an example of a Markov process. We say that the AR(1) process is a Markov process of order one, as the present depends only on the single, immediately previous observation in time.
Figure 1.3 shows the outcomes of three AR(1) time series, the first an uncorrelated series (ρ = 0.0), the second with moderate correlation (ρ = 0.5), and the third with strong correlation (ρ = 0.9). Each time series consists of n = 50 temporal observations. Note that as ρ increases the oscillations of the time plots decrease. For example, the number of crossings of the mean (µ = 0) decreases from 22 (ρ = 0), to 17 (ρ = 0.5), to 8 (ρ = 0.9). In other words, the ‘smoothness’ of the time plots increases. This notion of ‘smoothness’ and its importance in spatial prediction is discussed in Section 3.3.
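The relationship between ρ and the smoothness of the time plots is easy to reproduce in a quick simulation. The sketch below is our own illustration, not from the text; `simulate_ar1` and `mean_crossings` are hypothetical helper names.

```python
import numpy as np

def simulate_ar1(n, rho, rng):
    """Simulate a mean-zero AR(1) series Z_i = rho*Z_{i-1} + eps_i with Var(eps)=1.

    The first value is drawn from the stationary distribution,
    Var(Z) = 1/(1 - rho**2), so the whole series is stationary.
    """
    z = np.empty(n)
    z[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2))
    for i in range(1, n):
        z[i] = rho * z[i - 1] + rng.normal()
    return z

def mean_crossings(z, mu=0.0):
    """Count how often consecutive observations straddle the mean."""
    s = np.sign(z - mu)
    return int(np.sum(s[:-1] * s[1:] < 0))

rng = np.random.default_rng(0)
for rho in (0.0, 0.5, 0.9):
    z = simulate_ar1(5000, rho, rng)
    print(rho, mean_crossings(z))
```

On series of length 5000 the crossing counts fall sharply as ρ grows, mirroring the pattern seen in Figure 1.3.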
To examine the effect of correlation on estimation, assume that SOS holds. From observations Zi, i = 1,…, n, we seek to estimate and draw inferences concerning the mean of the process, μ. To do this we desire a confidence interval for the unknown μ. Under SOS, each observation has the same variability, σ2 = Var(Zi), and to simplify we assume that this value is known. The usual large sample 95% confidence interval for μ is then given by

Z̄n ± 1.96 σ/√n,

where

Z̄n = (1/n) Σi=1..n Zi

denotes the sample mean of the observations.
We hope that the true coverage of this interval is equal to the nominal coverage of 95%. To see the true coverage, continue to assume that the data come from a (SOS) time series. Using the fact that, for any constants ai, i = 1,…, n, we have

Var(Σi=1..n ai Zi) = Σi Σj ai aj Cov(Zi, Zj),

and setting ai = 1/n for all i, gives

Var(Z̄n) = (1/n2) Σi Σj Cov(Zi, Zj) = (1/n)[C(0) + 2 Σk=1..n−1 (1 − k/n) C(k)],

where the second equality uses SOS and counts the number of pairs at each lag k. To evaluate this for large n, we need the following result named after the 19th century mathematician Leopold Kronecker:
Lemma 1.1 (Kronecker's lemma)
For a sequence of numbers ai, i = 1, 2,…, such that Σi=1..∞ ai converges, we have that

(1/n) Σi=1..n i ai → 0 as n → ∞.
In a direct application of Kronecker's lemma, taking ai = C(i), it is seen that

(1/n) Σk=1..n−1 k C(k) → 0,

and thus that

n Var(Z̄n) → σ̄2 := C(0) + 2 Σk=1..∞ C(k),

whenever Σk |C(k)| < ∞. This last condition, known as ‘summable covariances,’ says that the variance of the mean tends to 0 at the same rate, 1/n, as in the case of independent observations. In particular, it holds for the AR(1) process with −1 < ρ < 1. The same rate of convergence does not mean, however, that the correlation has no effect in estimation.
To see the effects of correlation, note that in the case of independent observations the variance of the standardized mean is Var(√n Z̄n) = σ2. In the presence of correlation, the true large sample variance of the standardized mean, σ̄2, is quite different from σ2. In particular, for the stationary AR(1) process (with η = 1), σ2 = C(0) = (1 − ρ2)−1, while arithmetic shows that σ̄2 = (1 − ρ)−2, so that the ratio of the large sample variance of the mean under independence to the true variance of the mean is R = σ2/σ̄2 = (1 − ρ)/(1 + ρ). In the common situation where correlation is positive, 0 < ρ < 1, we see that ignoring the correlation leads to underestimation of the correct variance.
To determine the practical effect of this, let Φ(·) denote the cumulative distribution function of a standard normal variable. The coverage of the interval that ignores the correlation is given by

P[|√n(Z̄n − μ)| ≤ 1.96σ] = P[|√n(Z̄n − μ)/σ̄| ≤ 1.96(σ/σ̄)] ≈ 2Φ(1.96√R) − 1.
We have assumed that the Central Limit Theorem holds for temporally stationary observations. It does under mild moment conditions on the Zi's and on the strength of correlation. In particular, it holds for the stationary AR(1) model. Some details are given in Chapter 10.
Evaluating the approximate coverage from the last expression, we see that when ρ = 0.2, the ratio is R = 0.667 and the approximate coverage of the usual nominal 95% confidence interval is 89%. When ρ = 0.5, R = 0.333 and the approximate coverage is 74%; the true coverage now differs from the nominal 95% so much that the interval is not performing at all as advertised. When ρ = 0.9, R = 0.053 and the true coverage is approximately 35%; this interval is completely unreliable. It is seen that the undercoverage becomes more severe as the temporal correlation increases.
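These coverages follow directly from the formula 2Φ(1.96√R) − 1 with R = (1 − ρ)/(1 + ρ). A short check (our sketch, using only the standard library):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def coverage_ignoring_correlation(rho):
    """Large-sample coverage of the nominal 95% interval Zbar +/- 1.96*sigma/sqrt(n)
    when the data are a stationary AR(1) with parameter rho (eta^2 = 1)."""
    R = (1.0 - rho) / (1.0 + rho)  # sigma^2 / sigma_bar^2
    return 2.0 * phi(1.96 * sqrt(R)) - 1.0

for rho in (0.0, 0.2, 0.5, 0.9):
    print(rho, round(coverage_ignoring_correlation(rho), 3))
```

For ρ = 0, 0.2, 0.5, 0.9 this returns approximately 0.95, 0.89, 0.74, and 0.35, matching the values above.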
Using the correct interval, with σ̄ replacing σ, makes the interval wider, but we now obtain approximately the correct coverage. Note, however, that the estimator, Z̄n, is still (mean square) consistent for its target, μ, as we still have Var(Z̄n) → 0 as n → ∞ whenever Σk |C(k)| < ∞.
To generalize this to the spatial setting, first note that we can write the conditional mean and the conditional variance for the temporal AR(1) model as

E(Zi | Zi−1, Zi−2,…) = ρZi−1

and

Var(Zi | Zi−1, Zi−2,…) = η2 = 1.
A spatial first-order autoregressive model is a direct generalization of these two conditional moments. Specifically, conditioning on the past is replaced by conditioning on all other observations. In the temporal AR(1) case, it is assumed that the conditional distribution of the present given the past depends only on the immediate past. The spatial analogue assumes that the conditional distribution of Z(s) depends only on the nearest neighbors of s. Specifically, with equally spaced observations on a grid in two dimensions, letting si,j denote the location in row i and column j, assume that

E[Z(si,j) | all other observations] = γ[Z(si−1,j) + Z(si+1,j) + Z(si,j−1) + Z(si,j+1)]

and

Var[Z(si,j) | all other observations] = η2.
Note how these two conditional moments are a natural spatial analogue to the conditional moments in the temporal AR(1) model. If the observations follow a normal distribution, then we call this spatial model a Gaussian first-order autoregressive model. The first-order Gaussian autoregressive model is an example of a (spatial) Markov process. Figure 1.4 shows sample observations from a first-order Gaussian model on a 100 × 100 grid with γ = 0.0 and γ = 0.2. Note how high values (and low values) tend to accumulate near each other for the γ = 0.2 data set. In particular, we find that when γ = 0.0, 2428 of the observations with a positive neighbor sum (of which there are 4891) are also positive (49.6 percent), while when γ = 0.2, we have that 3166 of the observations with a positive neighbor sum (of which there are 4813) are also positive (65.8 percent). To see the effects of spatial correlation on inference for the mean, we again compare the true variances of the mean with the variances that ignore correlation.
First we need to find the variance of the mean as a function of the strength of correlation. Analogously to the temporal case, we have that

n Var(Z̄n) → Σh C(h)

as n → ∞, where the sum is over all spatial lags h. Unfortunately, unlike in the temporal case of an AR(1), it is not a simple matter to evaluate this sum for the conditionally specified spatial model. Instead, we compute the exact finite sample variance for any given sample size n and correlation parameter γ.
Towards this end, let Z denote the vector of n spatial observations (in some order). Then Var(Z) is an n × n matrix, and it can be shown, using a factorization theorem of Besag (1974), that Var(Z) =: Σ = η2(I − Γ)−1, where Γ is an n × n matrix with elements γst = γ whenever locations s and t are neighbors, that is, d(s, t) = 1, and γst = 0 otherwise. This model is discussed further in Chapter 4.
Using this and the fact that Var(Z̄n) = (1/n2) 1′Σ1, for any sample size n we can compute the variance of the mean for any value of γ; n Var(Z̄n) is simply the sum of all the elements of Σ divided by the sample size n. Take this value to be σ̄2. In the time series AR(1) setting we were able to find the stationary variance explicitly. In this spatial model this is not simply done. Nevertheless, observations from the center of the spatial field are close to the stationary distribution, so from the diagonal elements of Var(Z) corresponding to observations near the center of the field we can obtain the (unconditional) variance of a single observation, σ2, for various values of γ.
For a 30 × 30 grid of observations (with η2 = 1.0), direct calculation shows that σ̄2 = 1.24 for γ = 0.05 (σ2 = 1.01), σ̄2 = 1.63 for γ = 0.10 (σ2 = 1.03), and σ̄2 = 4.60 for γ = 0.20 (σ2 = 1.17). It is seen that, as in the temporal setting, the variance of the mean increases as spatial correlation increases, and that the ratio R = σ2/σ̄2 = 0.813, 0.632, 0.254 for γ = 0.05, 0.10, 0.20, respectively. This leads to approximate coverages of the usual nominal 95% confidence interval for μ of 92%, 88%, and 68%, respectively. We have seen that, as in the temporal setting, accounting for spatial correlation is necessary to obtain accurate inferences. Further, when correlations are positive, ignoring the correlation leads to undercoverage of the resulting intervals. This corresponds to an inflated type-I error in hypothesis testing, and thus the errors are often of the most serious kind. To obtain accurate inferences, we need to account for the spatial correlation and use the correct σ̄2, or a good estimate of it, in place of the incorrect σ2.
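The finite-sample calculation just described can be sketched directly (our own code; `car_covariance` is a hypothetical helper name). It builds Γ for an m × m grid, forms Σ = η2(I − Γ)−1, and compares n·Var(Z̄n) with the variance of a center observation:

```python
import numpy as np

def car_covariance(m, gamma, eta2=1.0):
    """Covariance matrix of a first-order conditional autoregression on an
    m x m grid: Var(Z) = eta2 * (I - Gamma)^{-1} (Besag, 1974), where Gamma
    has entry gamma for each pair of grid neighbors and zeros elsewhere."""
    n = m * m
    Gamma = np.zeros((n, n))
    for r in range(m):
        for c in range(m):
            i = r * m + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < m:
                    Gamma[i, rr * m + cc] = gamma
    return eta2 * np.linalg.inv(np.eye(n) - Gamma)

m = 30
for gamma in (0.05, 0.10, 0.20):
    Sigma = car_covariance(m, gamma)
    n = m * m
    nvar_mean = Sigma.sum() / n       # n * Var(Zbar) = (1/n) 1' Sigma 1
    center = n // 2 + m // 2          # an interior grid site
    sigma2 = Sigma[center, center]    # variance of a single center observation
    print(gamma, round(nvar_mean, 2), round(sigma2, 2), round(sigma2 / nvar_mean, 3))
```

The printed values should be close to the σ̄2 and σ2 figures quoted above, and the final column is the ratio R.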
1.2.2 Prediction
To see the effects of correlation on prediction, consider again the temporal AR(1) process. In this situation, we observe the first n observations in time, Zi, i = 1,…, n, and seek to predict the unobserved Zn+1. If we entirely ignore the temporal correlation, then each observation is an equally good predictor, and this leads to the predictor Z̄n. Direct calculation shows that the true expected square prediction error for this predictor is approximately given by

MSE(Z̄n) = E[(Zn+1 − Z̄n)2] ≈ σ2(1 + 1/n),

where the error is in terms of order smaller than 1/n. From this equation we see that, as n → ∞, MSE(Z̄n) → σ2 > 0. This is in stark contrast to the situation in Section 1.2.1, where the sample mean estimator has asymptotic MSE equal to 0. This is generally true in prediction: no amount of data will make the prediction error approach zero, because the future observation Zn+1 is random for any sample size n. Additionally, unlike in Section 1.2.1, we see that as ρ increases, the proportion of the variability in Zn+1 that cannot be predicted from the data decreases. So, although strong correlation is hurtful when the goal is estimation (i.e., estimation becomes more difficult), strong correlation is helpful when the goal is prediction (prediction becomes easier).
Consider the unbiased linear predictor, Ẑn+1 = Σi=1..n ai Zi with Σi ai = 1, that minimizes the MSE over a1,…, an. Then, it can be shown using the methods in Chapter 2, Section 2.2, that

Ẑn+1 ≈ ρZn + (1 − ρ)Z̄n.

Note that this predictor is approximately a weighted average of Zn and the average of all previous time points. Also, the weight on Zn increases as correlation increases. The methods in Section 2.2 further show that, for this predictor,

MSE(Ẑn+1) = E[(Zn+1 − Ẑn+1)2] ≈ η2(1 + 1/n) = 1 + 1/n,

where the error is in terms of order smaller than 1/n.
Imagine that we ignore the correlation and use the predictor Z̄n (i.e., we assume ρ = 0). Then we would approximately report

E[(Zn+1 − Z̄n)2] ≈ σ2(1 + 1/n),

which is approximately equal to the true MSE(Z̄n) (for large n and/or moderate ρ, the error is of smaller order), and thus the reported error is approximately accurate. Accurate means that the inferences drawn using this predictor and the assumed MSE would be approximately right for this predictor. In particular, prediction intervals will have approximately the correct coverage for the predictand Zn+1. This is in stark contrast to the estimation setting in Section 1.2.1, where ignoring the correlation led to completely inaccurate inferences (confidence intervals with coverage far from nominal). It seems that, in prediction, ignoring the correlation is not as serious as in estimation. It holds, on the other hand, that
MSE(Z̄n)/MSE(Ẑn+1) ≈ σ2 = (1 − ρ2)−1 > 1 whenever ρ ≠ 0.

This shows that Z̄n is not the correct predictor under correlation. Correct means that the inferences drawn using this predictor are the ‘best’ possible; here ‘best’ means the linear unbiased predictor with minimal prediction variance. The predictor Ẑn+1 is both accurate and correct for the AR(1) model with known AR(1) parameter ρ. In estimation, the estimator Z̄n is approximately correct for all but extremely large |ρ|, but is only approximately accurate when we use the correct variance, σ̄2.
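A small Monte Carlo sketch (ours, not from the text) comparing the two predictors for ρ = 0.5 and n = 50:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, reps = 0.5, 50, 20000
sigma2 = 1.0 / (1.0 - rho**2)  # stationary variance with eta^2 = 1

err_naive, err_best = [], []
for _ in range(reps):
    # Simulate a stationary AR(1) path of length n + 1.
    z = np.empty(n + 1)
    z[0] = rng.normal(scale=np.sqrt(sigma2))
    for i in range(1, n + 1):
        z[i] = rho * z[i - 1] + rng.normal()
    zbar = z[:n].mean()
    # Predictor ignoring correlation vs. the (approximate) best linear predictor.
    err_naive.append((z[n] - zbar) ** 2)
    err_best.append((z[n] - (rho * z[n - 1] + (1 - rho) * zbar)) ** 2)

print(np.mean(err_naive), sigma2 * (1 + 1 / n))  # close to sigma^2 (1 + 1/n)
print(np.mean(err_best), 1 + 1 / n)              # close to eta^2 (1 + 1/n)
```

The first printed pair should be near σ2(1 + 1/n) ≈ 1.36 and the second near 1 + 1/n = 1.02, illustrating the ratio of roughly σ2 between the two MSEs.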
The conclusions just drawn concerning prediction in the temporal setting are qualitatively similar in the spatial setting. Ignoring spatial correlation leads to predictions which are approximately accurate, but are not correct. The correct predictor is formed by accounting for the spatial correlation that is present. This is done using the kriging methodology discussed in Chapter 2.
In summary, it is seen that, when estimation is the goal, we need to account for correlation to draw accurate inferences. Specifically, when positive correlation is present, ignoring the correlation leads to confidence intervals which are too narrow. In other words, in hypothesis testing there is an inflated type-I error. When prediction is the goal, we can obtain approximately accurate inferences when ignoring correlations, but we need to account for the temporal or spatial correlation in order to obtain correct (i.e., efficient) predictions of unobserved variables.
We now discuss in the temporal setting a situation where ignoring correlation leads to inaccurate and surprising conclusions in the estimation setting.
1.3 Texas tidal data
A court case tried to decide a very fundamental question: where is the coastline, that is, the division between land and water. In many places of the world, most people would agree to within a few meters as to where the coastline is. However, near Port Mansfield, TX (south of Corpus Christi, TX), there is an area of approximately six miles between the intercoastal canal and a place where almost all people would agree land begins. Within this six-mile gap, it could be water or land depending on the season of the year and on the observer. To help determine a coastline it is informative to consider the history of this question.
In the 1300s, the Spanish ‘Las Siete Partidas,’ Law 4 of Title 28, stated that the ‘…sea shore is that space of ground … covered by water in their … highest annual swells.’ This suggests that the furthest reach of the water in a typical year determines the coastline. This coastline is approximately six miles away from the intercoastal canal. In 1935, the US Supreme Court, in Borax v. Los Angeles, established MHW (‘Mean High Water’) as the definition of the coastal boundary; that is, the coastline is the average of the daily high-tide reaches of the water. In 1956, in Rudder v. Ponder, Texas adopted MHW as the definition of coastal boundaries. This coastline is relatively close to the intercoastal canal. The two definitions of coastline do not agree in this case, and we seek to understand which is more appropriate. The development here follows that in Sherman et al. (1997).
The hourly data in a typical year are given by Yt, t = 1,…, 8760, where Yt denotes the height of the water at hour t at a station at the intercoastal canal. The horizontal projection from this height determines the coastal boundary. The regression model dictated by NOAA (National Oceanic and Atmospheric Administration) is

Yt = Σi=1..37 [ai cos(Si t) + bi sin(Si t)] + εt,

where ai and bi are amplitudes associated with Si, the speed of the ith constituent, i = 1,…, 37, and the εt are random errors. The speeds are assumed to be known, while the amplitudes are unknown and need to be estimated. This model is similar to that in classical harmonic analysis and periodogram analysis as discussed in, for example, Hartley (1949).
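Fitting such a harmonic regression by OLS can be sketched as follows. This is our own illustration: the three constituents and their speeds are hypothetical stand-ins (NOAA's model uses 37), and the amplitudes and noise level are invented for the example.

```python
import numpy as np

# Three illustrative constituents (speeds in radians/hour): roughly annual,
# solar-day, and lunar-half-day periods. Hourly water levels over one year.
speeds = 2 * np.pi / np.array([8766.0, 24.0, 12.42])
t = np.arange(1.0, 8761.0)

rng = np.random.default_rng(2)
true_a = np.array([0.3, 0.2, 0.5])  # hypothetical cosine amplitudes
y = sum(a * np.cos(s * t) for a, s in zip(true_a, speeds)) + 0.1 * rng.normal(size=t.size)

# Design matrix: one cosine and one sine column per constituent, then OLS.
X = np.column_stack([f(s * t) for s in speeds for f in (np.cos, np.sin)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
print(np.round(coef, 2))
```

With a year of hourly data the constituent columns are nearly orthogonal, so the estimated cosine amplitudes land very close to the values used to generate the data; in the real analysis it is the residuals of such a fit that are examined for correlation.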
The basic question in the coastal controversy is: which constituents best explain the variability in water levels? If annual or semiannual constituents explain a large proportion of the overall variability in tidal levels, this suggests that the flooded regions between the intercoastal canal and land are an important feature in the data, and suggests that the contested area cannot be called land. If, however, daily and twice-daily constituents explain most of the variability in tidal levels, then the contested area should be considered land. Note that the regression model is an example of a general linear model, and the amplitudes can be estimated using least squares estimation. In an effort to assess goodness of fit, consider the residuals from this fitted model. Figure 1.5 shows (a) the first 200 residuals, et, t = 1,…, 200, and (b) residuals et, t = 1001,…, 1200, from the least squares fit. One typical assumption in multiple regression is one of independent errors, that is, Cov[εs, εt] = 0 whenever s ≠ t.
Notice that the plot of the first 200 residuals shows a stretch of approximately 60 consecutive negative residuals. This suggests that the errors are (strongly) positively correlated. The second residual plot similarly suggests a clear lack of independence in the errors, as do most stretches of residuals. From the results in estimation in Section 1.2.1, we know that ignoring the correlation would likely be a serious error if our goal is to estimate the mean of the process. The goal here, however, is to estimate the regression parameters in the harmonic analysis, and it is not clear in the regression setting what the effect of ignoring the correlation would be. To explore this, consider the setting of a simple linear regression model:

Yt = α + βxt + εt, t = 1,…, T,

where Yt is the response, xt denotes a covariate, and the εt are stationary errors. The ordinary least squares (OLS) estimator of β is

β̂ = Σt (xt − x̄)Yt / Σt (xt − x̄)2,

with an associated variance of

Var(β̂) = Σs Σt (xs − x̄)(xt − x̄) Cov(εs, εt) / [Σt (xt − x̄)2]2.
It is seen that the variance of the estimated slope depends on the correlations between the errors and on the structure of the design. To see the effect of the latter, consider AR(1) errors, εt = ρεt−1 + ηt [with Var(ηt) = 1.0], under the following two designs:

Design 1 (Monotone): the covariate values xt increase in t.
Design 2 (Alternating): the covariate values alternate between a low and a high value.
To numerically compare the variance under these two scenarios, consider T = 10 and ρ = 0.5. In this case, direct calculation from the variance formula above shows that the true variance of β̂ is larger under the monotone design and smaller under the alternating design than the variance

Var(β̂) = Var(εt) / Σt (xt − x̄)2

that we would report for both designs if we ignored the correlation.
The conclusion is that, in contrast to the stationary mean-estimation case in Section 1.2.1, OLS variance estimates that ignore positive correlation can either under- or over-estimate the correct variance, depending on the structure of the design. In the tidal data, constituents of fast speed (short periods) correspond to the alternating design, while constituents of slow speed (long periods) correspond to the monotone design.
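The under/over-estimation can be verified directly from the variance formula above. In this sketch (ours; the specific covariate values are illustrative choices, not the designs used in the text), the true slope variance under AR(1) errors is compared with the value reported when correlation is ignored:

```python
import numpy as np

def slope_variance(x, rho):
    """True variance of the OLS slope when the errors are stationary AR(1)
    with parameter rho and innovation variance 1, so Var(eps_t) = 1/(1 - rho^2)."""
    T = len(x)
    w = (x - x.mean()) / np.sum((x - x.mean()) ** 2)  # OLS weights for the slope
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Sigma = rho ** lags / (1.0 - rho**2)              # AR(1) covariance matrix
    return w @ Sigma @ w

T, rho = 10, 0.5
monotone = np.arange(1.0, T + 1)               # e.g. x_t = t
alternating = np.array([1.0, 2.0] * (T // 2))  # e.g. x_t alternates between two levels
for x in (monotone, alternating):
    true_var = slope_variance(x, rho)
    iid_var = (1.0 / (1.0 - rho**2)) / np.sum((x - x.mean()) ** 2)  # reported if correlation ignored
    print(round(true_var, 4), round(iid_var, 4))
```

For the monotone design the true variance exceeds the reported one (undercoverage), while for the alternating design the reported variance is too large.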
Table 1.1 P-value comparison between ignoring and accounting for correlation in tidal data.

| Period (hours) | P-value, correlation accounted for | P-value, independence assumed |
| --- | --- | --- |
| 8765 (annual) | 0.550 | ≤0.001 |
| 4382 (semiannual) | ≤0.001 | ≤0.001 |
| 327 | 0.690 | 0.002 |
| 25.8 (day) | ≤0.001 | ≤0.001 |
| 12 | ≤0.001 | 0.145 |
Table 1.1 gives the p-values for the test that the given constituent is not present in the model, based on the usual t-statistic, for a few selected constituents for data from 1993. One set of p-values is computed under an assumption of independent errors, while the second set is based on standard errors which account for correlation; the latter variances are constructed using the block bootstrap in the regression setting.
We discuss the block bootstrap in more detail in Chapter 10. The block bootstrap appears to be a reliable tool in this case, as the residual process is well approximated by a low-order autoregressive moving-average (ARMA) process.
From the table we see that the OLS p-values that ignore correlation cannot be trusted. Further, the errors in ignoring the correlation are as predicted from the simple linear regression example. Namely that, for long periods, OLS variance estimates underestimate the correct variance, and thus lead to large t-statistics and hence p-values which are too small. For short periods, however, this is reversed and the OLS variances are too large, leading to small t-statistics and overly large p-values. The block bootstrap accounts for the temporal correlation and gives reliable variance estimates and thus reliable p-values. A careful parametric approach that estimates the correlation from within the ARMA(p, q) class of models gives results similar to those using the block bootstrap.
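As a preview of Chapter 10, a minimal moving block bootstrap can be sketched as follows. This is our own illustration: `block_bootstrap_se` is a hypothetical name, and the toy check uses a simulated AR(1) series rather than the tidal residuals.

```python
import numpy as np

def block_bootstrap_se(series, statistic, block_len, reps, rng):
    """Moving-block-bootstrap standard error of a statistic of a time series.

    Overlapping blocks of length block_len are resampled with replacement and
    concatenated back to the original length, so short-range dependence is
    preserved within each block."""
    n = len(series)
    blocks = np.array([series[i:i + block_len] for i in range(n - block_len + 1)])
    stats = []
    for _ in range(reps):
        idx = rng.integers(0, len(blocks), size=int(np.ceil(n / block_len)))
        stats.append(statistic(np.concatenate(blocks[idx])[:n]))
    return np.std(stats, ddof=1)

# Toy check on an AR(1) series: the block bootstrap SE of the mean should be
# close to sigma_bar/sqrt(n) = 1/((1 - rho) sqrt(n)), not sigma/sqrt(n).
rng = np.random.default_rng(3)
rho, n = 0.5, 2000
z = np.empty(n)
z[0] = rng.normal(scale=1 / np.sqrt(1 - rho**2))
for i in range(1, n):
    z[i] = rho * z[i - 1] + rng.normal()
print(block_bootstrap_se(z, np.mean, block_len=30, reps=500, rng=rng),
      1 / ((1 - rho) * np.sqrt(n)))
```

The bootstrap standard error comes out well above the naive σ/√n value, reflecting the positive correlation, which is exactly the correction at work in Table 1.1.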
Finally, the semiannual period (Period = 4382) is very significant. This suggests that the flooding of the contested area is a significant feature of the data, and thus this area cannot reasonably be considered as land. This outcome is qualitatively similar for other years of data as well. Although the Mean High Water criterion may be reasonable for tides in Los Angeles, CA (on which the original Supreme Court decision was based), it does not appear to be reasonable for tides in Port Mansfield, TX.
Much of the discussion in this chapter has focused on the role of correlation and how the effects of correlation are similar in the time series and spatial settings. There are, however, several fundamental differences between time series and spatial observations. Some of these will become clear as we develop spatial methodology. For now, note that in time there is a natural ordering, while this is not the case in the spatial setting. One effect of this became clear when considering the marginal variance of time series and spatial fields in Section 1.2.1. A second major difference between the time series and spatial settings is the effect of edge sites, observations on the domain boundary. For a time series of length n, there are only two observations on the boundary. For spatial observations on an n × n grid, there are approximately 4n observations on the boundary. The effects of this, and methods to account for a large proportion of edge sites, are discussed in Section 4.2.1. A third fundamental difference is that, in time series, observations are typically equally spaced and predictions are typically made for future observations. In the spatial setting, observations are often not equally spaced and predictions are typically made for unobserved variables ‘between’ existing observations. The effects of this will be discussed throughout the text, but especially in Chapters 3 and 5. Other differences, including increased computational burden, complicated parameter estimation, and unwieldy likelihoods in the spatial setting, will be discussed, particularly in Chapters 4 and 9.
Although, from (i), intrinsic stationarity (IS) is a more general assumption, (iv) states that estimation of the variogram is easier than estimation of the covariance function. Easier means that the common moment estimator is more accurate in most cases. In particular, the natural moment estimator of the variogram is unbiased under IS, while the covariance estimator is biased. This is surprising given reason (i): the more general quantity is the more easily estimated.
To begin, start with the simplest structure for the underlying mean function, namely that it is constant. Specifically, the initial model is

Z(s) = μ + δ(s), s ∈ D,

where μ is an unknown constant and δ(·) is a mean zero error process. First we need to define what is meant by best. As in the time series setting described in Chapter 1, this is taken to mean having the smallest squared prediction error; that is, for a predictor p(Zn) of Z(s0) based on the observations Zn = [Z(s1),…, Z(sn)]′, the desire is to minimize

E{[Z(s0) − p(Zn)]2}.
The third term in the last expression is 0. This can be seen by conditioning on Zn, and observing that both p(Zn) and E[Z(s0)|Zn] are constant given Zn, while E{E[Z(s0)|Zn] − Z(s0) | Zn} is identically 0 for any value of Zn. Thus, the conditional expectation E[Z(s0)|Zn] minimizes the squared prediction error.
Now that we have defined what is meant by best, we can address obtaining the best linear predictor. First expand the square error of prediction:
Now, completing the square on the first two terms on the right hand side of the previous equation, by adding and subtracting the same quantity from the third term, and using the fact that the weights satisfy Σi=1..n λi = 1, we have that
using the definition of the variogram function and the fact that the semivariogram satisfies γ(h) = γ(−h) for all h. Note that this last expression, in Equation 2.3, is the prediction variance for a linear predictor using any weights λi, i = 1,…, n.