Details

Basic Data Analysis for Time Series with R




1st edition

by: DeWayne R. Derryberry

104,99 €

Publisher: Wiley
Format: PDF
Published: 23.07.2014
ISBN/EAN: 9781118593370
Language: English
Number of pages: 320

DRM-protected eBook; to read it you will need e.g. Adobe Digital Editions and an Adobe ID.

Description

<p>PREFACE xv</p> <p>ACKNOWLEDGMENTS xvii</p> <p><b>PART I BASIC CORRELATION STRUCTURES</b></p> <p><b>1 R Basics 3</b></p> <p>1.1 Getting Started, 3</p> <p>1.2 Special R Conventions, 5</p> <p>1.3 Common Structures, 5</p> <p>1.4 Common Functions, 6</p> <p>1.5 Time Series Functions, 6</p> <p>1.6 Importing Data, 7</p> <p>Exercises, 7</p> <p><b>2 Review of Regression and More About R 8</b></p> <p>2.1 Goals of this Chapter, 8</p> <p>2.2 The Simple(st) Regression Model, 8</p> <p>2.2.1 Ordinary Least Squares, 8</p> <p>2.2.2 Properties of OLS Estimates, 9</p> <p>2.2.3 Matrix Representation of the Problem, 9</p> <p>2.3 Simulating the Data from a Model and Estimating the Model Parameters in R, 9</p> <p>2.3.1 Simulating Data, 9</p> <p>2.3.2 Estimating the Model Parameters in R, 9</p> <p>2.4 Basic Inference for the Model, 12</p> <p>2.5 Residuals Analysis—What Can Go Wrong…, 13</p> <p>2.6 Matrix Manipulation in R, 15</p> <p>2.6.1 Introduction, 15</p> <p>2.6.2 OLS the Hard Way, 15</p> <p>2.6.3 Some Other Matrix Commands, 16</p> <p>Exercises, 16</p> <p><b>3 The Modeling Approach Taken in this Book and Some Examples of Typical Serially Correlated Data 18</b></p> <p>3.1 Signal and Noise, 18</p> <p>3.2 Time Series Data, 19</p> <p>3.3 Simple Regression in the Framework, 20</p> <p>3.4 Real Data and Simulated Data, 20</p> <p>3.5 The Diversity of Time Series Data, 21</p> <p>3.6 Getting Data Into R, 24</p> <p>3.6.1 Overview, 24</p> <p>3.6.2 The Diskette and the scan() and ts() Functions—New York City Temperatures, 25</p> <p>3.6.3 The Diskette and the read.table() Function—The Semmelweis Data, 25</p> <p>3.6.4 Cut and Paste Data to a Text Editor, 26</p> <p>Exercises, 26</p> <p><b>4 Some Comments on Assumptions 28</b></p> <p>4.1 Introduction, 28</p> <p>4.2 The Normality Assumption, 29</p> <p>4.2.1 Right Skew, 30</p> <p>4.2.2 Left Skew, 30</p> <p>4.2.3 Heavy Tails, 30</p> <p>4.3 Equal Variance, 31</p> <p>4.3.1 Two-Sample t-Test, 31</p> <p>4.3.2 Regression, 31</p> <p>4.4 Independence, 31</p>
<p>4.5 Power of Logarithmic Transformations Illustrated, 32</p> <p>4.6 Summary, 34</p> <p>Exercises, 34</p> <p><b>5 The Autocorrelation Function and AR(1), AR(2) Models 35</b></p> <p>5.1 Standard Models—What Are the Alternatives to White Noise?, 35</p> <p>5.2 Autocovariance and Autocorrelation, 36</p> <p>5.2.1 Stationarity, 36</p> <p>5.2.2 A Note About Conditions, 36</p> <p>5.2.3 Properties of Autocovariance, 36</p> <p>5.2.4 White Noise, 37</p> <p>5.2.5 Estimation of the Autocovariance and Autocorrelation, 37</p> <p>5.3 The acf() Function in R, 37</p> <p>5.3.1 Background, 37</p> <p>5.3.2 The Basic Code for Estimating the Autocovariance, 38</p> <p>5.4 The First Alternative to White Noise: Autoregressive Errors—AR(1), AR(2), 40</p> <p>5.4.1 Definition of the AR(1) and AR(2) Models, 40</p> <p>5.4.2 Some Preliminary Facts, 40</p> <p>5.4.3 The AR(1) Model Autocorrelation and Autocovariance, 41</p> <p>5.4.4 Using Correlation and Scatterplots to Illustrate the AR(1) Model, 41</p> <p>5.4.5 The AR(2) Model Autocorrelation and Autocovariance, 41</p> <p>5.4.6 Simulating Data for AR(m) Models, 42</p> <p>5.4.7 Examples of Stable and Unstable AR(1) Models, 44</p> <p>5.4.8 Examples of Stable and Unstable AR(2) Models, 46</p> <p>Exercises, 49</p> <p><b>6 The Moving Average Models MA(1) and MA(2) 51</b></p> <p>6.1 The Moving Average Model, 51</p> <p>6.2 The Autocorrelation for MA(1) Models, 51</p> <p>6.3 A Duality Between MA(l) and AR(m) Models, 52</p> <p>6.4 The Autocorrelation for MA(2) Models, 52</p> <p>6.5 Simulated Examples of the MA(1) Model, 52</p> <p>6.6 Simulated Examples of the MA(2) Model, 54</p> <p>6.7 AR(m) and MA(l) Model acf() Plots, 54</p> <p>Exercises, 57</p> <p><b>PART II ANALYSIS OF PERIODIC DATA AND MODEL SELECTION</b></p> <p><b>7 Review of Transcendental Functions and Complex Numbers 61</b></p> <p>7.1 Background, 61</p> <p>7.2 Complex Arithmetic, 62</p> <p>7.2.1 The Number i, 62</p> <p>7.2.2 Complex Conjugates, 62</p> <p>7.2.3 The Magnitude of a Complex Number, 
62</p> <p>7.3 Some Important Series, 63</p> <p>7.3.1 The Geometric and Some Transcendental Series, 63</p> <p>7.3.2 A Rationale for Euler’s Formula, 63</p> <p>7.4 Useful Facts About Periodic Transcendental Functions, 64</p> <p>Exercises, 64</p> <p><b>8 The Power Spectrum and the Periodogram 65</b></p> <p>8.1 Introduction, 65</p> <p>8.2 A Definition and a Simplified Form for p(f), 66</p> <p>8.3 Inverting p(f) to Recover the C<sub>k</sub> Values, 66</p> <p>8.4 The Power Spectrum for Some Familiar Models, 68</p> <p>8.4.1 White Noise, 68</p> <p>8.4.2 The Spectrum for AR(1) Models, 68</p> <p>8.4.3 The Spectrum for AR(2) Models, 70</p> <p>8.5 The Periodogram, a Closer Look, 72</p> <p>8.5.1 Why is the Periodogram Useful?, 72</p> <p>8.5.2 Some Naïve Code for a Periodogram, 72</p> <p>8.5.3 An Example—The Sunspot Data, 74</p> <p>8.6 The Function spec.pgram() in R, 75</p> <p>Exercises, 77</p> <p><b>9 Smoothers, the Bias-Variance Tradeoff, and the Smoothed Periodogram 79</b></p> <p>9.1 Why is Smoothing Required?, 79</p> <p>9.2 Smoothing, Bias, and Variance, 79</p> <p>9.3 Smoothers Used in R, 80</p> <p>9.3.1 The R Function lowess(), 81</p> <p>9.3.2 The R Function smooth.spline(), 82</p> <p>9.3.3 Kernel Smoothers in spec.pgram(), 83</p> <p>9.4 Smoothing the Periodogram for a Series With a Known and Unknown Period, 85</p> <p>9.4.1 Period Known, 85</p> <p>9.4.2 Period Unknown, 86</p> <p>9.5 Summary, 87</p> <p>Exercises, 87</p> <p><b>10 A Regression Model for Periodic Data 89</b></p> <p>10.1 The Model, 89</p> <p>10.2 An Example: The NYC Temperature Data, 91</p> <p>10.2.1 Fitting a Periodic Function, 91</p> <p>10.2.2 An Outlier, 92</p> <p>10.2.3 Refitting the Model with the Outlier Corrected, 92</p> <p>10.3 Complications 1: CO2 Data, 93</p> <p>10.4 Complications 2: Sunspot Numbers, 94</p> <p>10.5 Complications 3: Accidental Deaths, 96</p> <p>10.6 Summary, 96</p> <p>Exercises, 96</p> <p><b>11 Model Selection and Cross-Validation 98</b></p> <p>11.1 Background, 98</p> <p>11.2 Hypothesis Tests 
in Simple Regression, 99</p> <p>11.3 A More General Setting for Likelihood Ratio Tests, 101</p> <p>11.4 A Subtly Different Situation, 104</p> <p>11.5 Information Criteria, 106</p> <p>11.6 Cross-Validation (Data Splitting): NYC Temperatures, 108</p> <p>11.6.1 Explained Variation, R<sup>2</sup>, 108</p> <p>11.6.2 Data Splitting, 108</p> <p>11.6.3 Leave-One-Out Cross-Validation, 110</p> <p>11.6.4 AIC as Leave-One-Out Cross-Validation, 112</p> <p>11.7 Summary, 112</p> <p>Exercises, 113</p> <p><b>12 Fitting Fourier Series 115</b></p> <p>12.1 Introduction: More Complex Periodic Models, 115</p> <p>12.2 More Complex Periodic Behavior: Accidental Deaths, 116</p> <p>12.2.1 Fourier Series Structure, 116</p> <p>12.2.2 R Code for Fitting Large Fourier Series, 116</p> <p>12.2.3 Model Selection with AIC, 117</p> <p>12.2.4 Model Selection with Likelihood Ratio Tests, 118</p> <p>12.2.5 Data Splitting, 119</p> <p>12.2.6 Accidental Deaths—Some Comments on Periodic Data, 120</p> <p>12.3 The Boise River Flow Data, 121</p> <p>12.3.1 The Data, 121</p> <p>12.3.2 Model Selection with AIC, 122</p> <p>12.3.3 Data Splitting, 123</p> <p>12.3.4 The Residuals, 123</p> <p>12.4 Where Do We Go from Here?, 124</p> <p>Exercises, 124</p> <p><b>13 Adjusting for AR(1) Correlation in Complex Models 125</b></p> <p>13.1 Introduction, 125</p> <p>13.2 The Two-Sample t-Test—Uncut and Patch-Cut Forest, 125</p> <p>13.2.1 The Sleuth Data and the Question of Interest, 125</p> <p>13.2.2 A Simple Adjustment for t-Tests When the Residuals Are AR(1), 128</p> <p>13.2.3 A Simulation Example, 129</p> <p>13.2.4 Analysis of the Sleuth Data, 131</p> <p>13.3 The Second Sleuth Case—Global Warming, A Simple Regression, 132</p> <p>13.3.1 The Data and the Question, 132</p> <p>13.3.2 Filtering to Produce (Quasi-)Independent Observations, 133</p> <p>13.3.3 Simulated Example—Regression, 134</p> <p>13.3.4 Analysis of the Regression Case, 135</p> <p>13.3.5 The Filtering Approach for the Logging Case, 136</p> <p>13.3.6 A Few Comments on 
Filtering, 137</p> <p>13.4 The Semmelweis Intervention, 138</p> <p>13.4.1 The Data, 138</p> <p>13.4.2 Why Serial Correlation?, 139</p> <p>13.4.3 How This Data Differs from the Patch/Uncut Case, 139</p> <p>13.4.4 Filtered Analysis, 140</p> <p>13.4.5 Transformations and Inference, 142</p> <p>13.5 The NYC Temperatures (Adjusted), 142</p> <p>13.5.1 The Data and Prediction Intervals, 142</p> <p>13.5.2 The AR(1) Prediction Model, 144</p> <p>13.5.3 A Simulation to Evaluate These Formulas, 144</p> <p>13.5.4 Application to NYC Data, 146</p> <p>13.6 The Boise River Flow Data: Model Selection With Filtering, 147</p> <p>13.6.1 The Revised Model Selection Problem, 147</p> <p>13.6.2 Comments on R<sup>2</sup> and R<sup>2</sup><sub>pred</sub>, 147</p> <p>13.6.3 Model Selection After Filtering with a Matrix, 148</p> <p>13.7 Implications of AR(1) Adjustments and the “Skip” Method, 151</p> <p>13.7.1 Adjustments for AR(1) Autocorrelation, 151</p> <p>13.7.2 Impact of Serial Correlation on p-Values, 152</p> <p>13.7.3 The “Skip” Method, 152</p> <p>13.8 Summary, 152</p> <p>Exercises, 153</p> <p><b>PART III COMPLEX TEMPORAL STRUCTURES</b></p> <p><b>14 The Backshift Operator, the Impulse Response Function, and General ARMA Models 159</b></p> <p>14.1 The General ARMA Model, 159</p> <p>14.1.1 The Mathematical Formulation, 159</p> <p>14.1.2 The arima.sim() Function in R Revisited, 159</p> <p>14.1.3 Examples of ARMA(m,l) Models, 160</p> <p>14.2 The Backshift (Shift, Lag) Operator, 161</p> <p>14.2.1 Definition of B, 161</p> <p>14.2.2 The Stationary Conditions for a General AR(m) Model, 161</p> <p>14.2.3 ARMA(m,l) Models and the Backshift Operator, 162</p> <p>14.2.4 More Examples of ARMA(m,l) Models, 162</p> <p>14.3 The Impulse Response Operator—Intuition, 164</p> <p>14.4 Impulse Response Operator, g(B)—Computation, 165</p> <p>14.4.1 Definition of g(B), 165</p> <p>14.4.2 Computing the Coefficients, g<sub>j</sub>, 165</p> <p>14.4.3 Plotting an Impulse Response Function, 166</p> <p>14.5 Interpretation and Utility of the Impulse Response 
Function, 167</p> <p>Exercises, 167</p> <p><b>15 The Yule–Walker Equations and the Partial Autocorrelation Function 169</b></p> <p>15.1 Background, 169</p> <p>15.2 Autocovariance of an ARMA(m,l) Model, 169</p> <p>15.2.1 A Preliminary Result, 169</p> <p>15.2.2 The Autocovariance Function for ARMA(m,l) Models, 170</p> <p>15.3 AR(m) and the Yule–Walker Equations, 170</p> <p>15.3.1 The Equations, 170</p> <p>15.3.2 The R Function ar.yw() with an AR(3) Example, 171</p> <p>15.3.3 Information Criteria-Based Model Selection Using ar.yw(), 173</p> <p>15.4 The Partial Autocorrelation Plot, 174</p> <p>15.4.1 A Sequence of Hypothesis Tests, 174</p> <p>15.4.2 The pacf() Function—Hypothesis Tests Presented in a Plot, 174</p> <p>15.5 The Spectrum for ARMA Processes, 175</p> <p>15.6 Summary, 177</p> <p>Exercises, 178</p> <p><b>16 Modeling Philosophy and Complete Examples 180</b></p> <p>16.1 Modeling Overview, 180</p> <p>16.1.1 The Algorithm, 180</p> <p>16.1.2 The Underlying Assumption, 180</p> <p>16.1.3 An Example Using an AR(m) Filter to Model MA(3), 181</p> <p>16.1.4 Generalizing the “Skip” Method, 184</p> <p>16.2 A Complex Periodic Model—Monthly River Flows, Furnas 1931–1978, 185</p> <p>16.2.1 The Data, 185</p> <p>16.2.2 A Saturated Model, 186</p> <p>16.2.3 Building an AR(m) Filtering Matrix, 187</p> <p>16.2.4 Model Selection, 189</p> <p>16.2.5 Predictions and Prediction Intervals for an AR(3) Model, 190</p> <p>16.2.6 Data Splitting, 191</p> <p>16.2.7 Model Selection Based on a Validation Set, 192</p> <p>16.3 A Modeling Example—Trend and Periodicity: CO2 Levels at Mauna Loa, 193</p> <p>16.3.1 The Saturated Model and Filter, 193</p> <p>16.3.2 Model Selection, 194</p> <p>16.3.3 How Well Does the Model Fit the Data?, 197</p> <p>16.4 Modeling Periodicity with a Possible Intervention—Two Examples, 198</p> <p>16.4.1 The General Structure, 198</p> <p>16.4.2 Directory Assistance, 199</p> <p>16.4.3 Ozone Levels in Los Angeles, 202</p> <p>16.5 Periodic Models: Monthly, Weekly, and Daily 
Averages, 205</p> <p>16.6 Summary, 207</p> <p>Exercises, 207</p> <p><b>PART IV SOME DETAILED AND COMPLETE EXAMPLES</b></p> <p><b>17 Wolf’s Sunspot Number Data 213</b></p> <p>17.1 Background, 213</p> <p>17.2 Unknown Period ⇒ Nonlinear Model, 214</p> <p>17.3 The Function nls() in R, 214</p> <p>17.4 Determining the Period, 216</p> <p>17.5 Instability in the Mean, Amplitude, and Period, 217</p> <p>17.6 Data Splitting for Prediction, 220</p> <p>17.6.1 The Approach, 220</p> <p>17.6.2 Step 1—Fitting One Step Ahead, 222</p> <p>17.6.3 The AR Correction, 222</p> <p>17.6.4 Putting it All Together, 223</p> <p>17.6.5 Model Selection, 223</p> <p>17.6.6 Predictions Two Steps Ahead, 224</p> <p>17.7 Summary, 226</p> <p>Exercises, 226</p> <p><b>18 An Analysis of Some Prostate and Breast Cancer Data 228</b></p> <p>18.1 Background, 228</p> <p>18.2 The First Data Set, 229</p> <p>18.3 The Second Data Set, 232</p> <p>18.3.1 Background and Questions, 232</p> <p>18.3.2 Outline of the Statistical Analysis, 233</p> <p>18.3.3 Looking at the Data, 233</p> <p>18.3.4 Examining the Residuals for AR(m) Structure, 235</p> <p>18.3.5 Regression Analysis with Filtered Data, 238</p> <p>Exercises, 243</p> <p><b>19 Christopher Tennant/Ben Crosby Watershed Data 245</b></p> <p>19.1 Background and Question, 245</p> <p>19.2 Looking at the Data and Fitting Fourier Series, 246</p> <p>19.2.1 The Structure of the Data, 246</p> <p>19.2.2 Fourier Series Fits to the Data, 246</p> <p>19.2.3 Connecting Patterns in Data to Physical Processes, 246</p> <p>19.3 Averaging Data, 248</p> <p>19.4 Results, 250</p> <p>Exercises, 250</p> <p><b>20 Vostok Ice Core Data 251</b></p> <p>20.1 Source of the Data, 251</p> <p>20.2 Background, 252</p> <p>20.3 Alignment, 253</p> <p>20.3.1 Need for Alignment, and Possible Issues Resulting from Alignment, 253</p> <p>20.3.2 Is the Pattern in the Temperature Data Maintained?, 254</p> <p>20.3.3 Are the Dates Closely Matched?, 254</p> <p>20.3.4 Are the Times Equally Spaced?, 255</p> <p>20.4 A 
Naïve Analysis, 256</p> <p>20.4.1 A Saturated Model, 256</p> <p>20.4.2 Model Selection, 258</p> <p>20.4.3 The Association Between CO2 and Temperature Change, 258</p> <p>20.5 A Related Simulation, 259</p> <p>20.5.1 The Model and the Question of Interest, 259</p> <p>20.5.2 Simulation Code in R, 260</p> <p>20.5.3 A Model Using All of the Simulated Data, 261</p> <p>20.5.4 A Model Using a Sample of 283 from the Simulated Data, 262</p> <p>20.6 An AR(1) Model for Irregular Spacing, 265</p> <p>20.6.1 Motivation, 265</p> <p>20.6.2 Method, 266</p> <p>20.6.3 Results, 266</p> <p>20.6.4 Sensitivity Analysis, 267</p> <p>20.6.5 A Final Analysis, Well Not Quite, 268</p> <p>20.7 Summary, 269</p> <p>Exercises, 270</p> <p><b>Appendix A Using Datamarket 273</b></p> <p>A.1 Overview, 273</p> <p>A.2 Loading a Time Series in Datamarket, 277</p> <p>A.3 Respecting Datamarket Licensing Agreements, 280</p> <p><b>Appendix B AIC is PRESS! 281</b></p> <p>B.1 Introduction, 281</p> <p>B.2 PRESS, 281</p> <p>B.3 Connection to Akaike’s Result, 282</p> <p>B.4 Normalization and R<sup>2</sup>, 282</p> <p>B.5 An Example, 283</p> <p>B.6 Conclusion and Further Comments, 283</p> <p><b>Appendix C A 15-Minute Tutorial on Nonlinear Optimization 284</b></p> <p>C.1 Introduction, 284</p> <p>C.2 Newton’s Method for One-Dimensional Nonlinear Optimization, 284</p> <p>C.3 A Sequence of Directions, Step Sizes, and a Stopping Rule, 285</p> <p>C.4 What Could Go Wrong?, 285</p> <p>C.5 Generalizing the Optimization Problem, 286</p> <p>C.6 What Could Go Wrong—Revisited, 286</p> <p>C.7 What Can Be Done?, 287</p> <p>REFERENCES 291</p> <p>INDEX 293</p>
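<p>To give a flavor of the base-R functions the table of contents mentions (arima.sim(), acf(), ar.yw()), here is a brief sketch of the simulate-then-estimate workflow of Chapters 5 and 15. The specific numbers (500 observations, AR coefficient 0.7) are illustrative assumptions, not examples taken from the book.</p>

```r
# Simulate 500 observations from a stable AR(1) model with phi = 0.7
# (illustrative values; arima.sim() is introduced for AR(m) models in Chapter 5).
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 500)

# Sample autocorrelation function (Chapter 5); for an AR(1) model the
# theoretical autocorrelation at lag k is phi^k, so the estimates should
# decay roughly geometrically.
r <- acf(x, plot = FALSE)

# Yule-Walker estimation (Chapter 15); ar.yw() also selects the order by AIC.
fit <- ar.yw(x)
print(fit$order)
print(round(fit$ar, 2))
```

<p>With a series this long, the Yule–Walker coefficient estimate should land close to the true value of 0.7.</p>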
<p><b>DeWayne R. Derryberry, PhD,</b> is Associate Professor in the Department of Mathematics and Statistics at Idaho State University. Dr. Derryberry has published more than a dozen journal articles and his research interests include meta-analysis, discriminant analysis with messy data, time series analysis of the relationship between several cancers, and geographically-weighted regression.</p>
<p><b>Presents modern methods for analyzing data, with multiple applications in a variety of scientific fields</b></p> <p>Written at a readily accessible level, <i>Basic Data Analysis for Time Series with R</i> emphasizes the analysis of data collected in increments of time or space. Balancing theoretical and practical approaches to analyzing data in the context of serial correlation, the book presents a coherent and systematic regression-based approach to model selection. The book illustrates these principles of model selection and model building through the use of information criteria, cross-validation, hypothesis tests, and confidence intervals.</p> <p>Focusing on frequency- and time-domain methods and trigonometric regression as the primary themes, the book also includes modern topical coverage of Fourier series and Akaike's Information Criterion (AIC). In addition, <i>Basic Data Analysis for Time Series with R</i> features:</p> <ul> <li>Real-world examples that provide readers with practical hands-on experience</li> <li>Multiple R software subroutines employed with graphical displays</li> <li>Numerous exercise sets intended to support readers' understanding of the core concepts</li> <li>Specific chapters devoted to the analysis of the Wolf sunspot number data and the Vostok ice core data sets</li> </ul> <p><i>Basic Data Analysis for Time Series with R</i> is an ideal textbook for upper-undergraduate and beginning graduate-level courses in time series analysis, as well as a useful reference for practicing statisticians and scientists.</p>
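<p>The model-selection theme the description highlights (information criteria, cross-validation) can be illustrated in a few lines of base R. This hypothetical example, not taken from the book, compares a linear trend against a trigonometric (periodic) regression by AIC for a simulated monthly series; all names and values are assumptions for illustration.</p>

```r
# Hypothetical monthly series with a period-12 cycle (illustrative values only).
set.seed(2)
t_idx <- 1:120
y <- 10 + 5 * sin(2 * pi * t_idx / 12) + rnorm(120)

# Two candidate models: a linear trend versus a trigonometric regression.
fit_linear   <- lm(y ~ t_idx)
fit_periodic <- lm(y ~ sin(2 * pi * t_idx / 12) + cos(2 * pi * t_idx / 12))

# The model with the smaller AIC is preferred; here the periodic model
# should win by a wide margin, since the data were generated with a cycle.
print(AIC(fit_linear))
print(AIC(fit_periodic))
```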

You might also be interested in these products:

Statistics for Microarrays
by: Ernst Wit, John McClure
PDF ebook
90,99 €