Kernel Smoothing

Principles, Methods and Applications

Sucharita Ghosh

Swiss Federal Research Institute WSL
Birmensdorf, Switzerland



Typically, patterns in real data, which we may call curves or surfaces, will not follow simple rules. However, there may be a sufficiently good description in terms of a finite number of interpretable parameters. When this is not the case, or if the parametric description is too complex, a nonparametric approach is an option. In developing nonparametric curve estimation methods, however, sometimes we may take advantage of the vast array of available parametric statistical methods and adapt these to the nonparametric setting. While assessing properties of the nonparametric curve estimators, we will use asymptotic arguments.

This book grew out of a set of lecture notes for a course on smoothing given to the graduate students of the Seminar für Statistik (Department of Mathematics, ETH Zürich). To understand the material presented here, knowledge of linear algebra and calculus, together with a background in statistical inference, in particular the theory of estimation, testing, and linear models, should suffice. The textbooks Statistical Inference (Chapman & Hall) by Samuel David Silvey, Regression Analysis: Theory, Methods and Applications (Springer-Verlag) by Ashis Sen and Muni Srivastava, Linear Statistical Inference, second edition (John Wiley) by Calyampudi Radhakrishna Rao, and Robert Serfling's Approximation Theorems of Mathematical Statistics (John Wiley) are excellent sources for background material. For nonparametric curve estimation there are several good books; in particular, the classic Density Estimation (Chapman & Hall) by Bernard Silverman is a must-have for anyone venturing into this topic. The present text also includes some discussion of nonparametric curve estimation with time series and spatial data, in particular with different correlation types such as long memory. A nice monograph on long-range dependence is Statistics for Long-Memory Processes (Chapman & Hall) by Jan Beran. Additional references on this topic, as well as an incomplete list of textbooks on smoothing methods, are included in the list of references.

Our discussion of nonparametric curve estimation starts with density estimation (Chapter 1) for continuous random variables, followed by a chapter on nonparametric regression (Chapter 2). Inspired by applications of nonparametric curve estimation techniques to dependent data, several chapters are dedicated to a selection of problems in nonparametric regression, specifically trend estimation (Chapter 3) and semiparametric regression (Chapter 4) with time series data, and surface estimation with spatial observations (Chapter 5). While the possible dependence structures for such data sets are vast, we mainly focus on (slow) hyperbolic decay of correlations (long memory), as data of this type occur in many important fields of application in science as well as in business. Results for short memory and anti-persistence are also presented in some cases. Of additional interest are spatial or temporal observations that are not necessarily Gaussian, but are unknown transformations of latent Gaussian processes. Moreover, their marginal probability distributions may be time (or spatial location) dependent and assume arbitrary (non-Gaussian) shapes. These types of model assumptions provide flexible yet parsimonious alternatives to stronger distributional assumptions such as Gaussianity or stationarity, which is advantageous for analyzing large-scale and long-term spatial and temporal data sets occurring, for instance, in the geosciences, forestry, climate research, medicine, and finance. An overview of the relevant literature on this topic is given in Long Memory Processes: Probabilistic Properties and Statistical Models (Springer-Verlag) by Beran et al. (2013). The literature on nonparametric curve estimation is vast.
There are other important methods that are not covered here, such as wavelets (see Percival and Walden 2000) and splines (a very brief discussion is included in Chapter 2; see in particular Wahba 1990 and Eubank 1988), as well as other approaches. This book looks at kernel smoothing methods, and even for kernel-based approaches, admittedly, not all topics are presented here; the focus is on a selection. The book also includes a few data examples; outlines of proofs are given in several cases, with references to relevant sources provided otherwise. The data examples are based on calculations done using the S-Plus statistical package (TIBCO Software, TIBCO Spotfire) and the R environment for statistical computing (The R Foundation for Statistical Computing).
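To give a flavor of the methods developed in the chapters that follow: a kernel density estimate at a point x averages kernel weights centered at the observations, scaled by a bandwidth h. The following is a minimal sketch in Python (the function names, the Gaussian kernel choice, and the sample values are illustrative only, not taken from the book):

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel: K(u) = exp(-u^2/2) / sqrt(2*pi)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    # Kernel density estimate: f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h)
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Illustrative sample; evaluate the estimated density at x = 0
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
print(round(kde(0.0, sample, h=0.5), 4))
```

The bandwidth h governs the bias-variance trade-off that recurs throughout the book: small h yields a wiggly estimate, large h an oversmoothed one.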

Various people have been instrumental in seeing this project through. First and foremost, I am very grateful to my students at ETH, Zürich, for giving me the motivation to write this book and for pointing out many typos in earlier versions of the lecture notes. A big thank you goes to Debbie Jupe, Heather Kay, Richard Davies, and Liz Wingett at John Wiley & Sons in Chichester, West Sussex, to Alison Oliver at Oxford, and to the editors at Wiley, India, for their support from the start of the project and for making it possible. I am grateful to the Swiss National Science Foundation for funding PhD students, to the IT unit of the WSL for infallible support and for maintaining an extremely comfortable and state-of-the-art computing infrastructure, and to the Forest Resources and Management Unit, WSL, for generous funding and collaboration. Special thanks go to Jan Beran (Konstanz, Germany) for many helpful remarks on earlier versions of the manuscript and for long-term collaboration on several papers on this and related topics. I also wish to thank Yuanhua Feng (Paderborn, Germany), Philipp Sibbertsen (Hannover, Germany), Rafal Kulik (Ottawa, Canada), Hans Künsch (Zurich, Switzerland), and my graduate students Dana Draghicescu, Patricia Menéndez, Hesam Montazeri, Gabrielle Moser, Carlos Ricardo Ochoa Pereira, and Fan Wu, for close collaboration, as well as Bimal Roy and various other colleagues at the Indian Statistical Institute, Kolkata, and Liudas Giraitis at Queen Mary, University of London, for fruitful discussions and warm hospitality during recent academic trips.
I want to thank the following for sharing data and subject-specific knowledge, which have been used in related research elsewhere or in this book: Christoph Frei at MeteoSwiss and ETH, Zürich; various colleagues at the University of Bern, in particular Willy Tinner at the Oeschger Centre for Climate Change Research, Brigitta Ammann at the Institute of Plant Sciences, and Jakob Schwander at the Department of Physics; Matthias Plattner at Hintermann & Weber AG, Switzerland; and various colleagues from the Swiss Federal Research Institute WSL, Birmensdorf, in particular Urs-Beat Brändli, Fabrizio Cioldi, and Andreas Schwyzer, all at the Forest Resources and Management unit. Data obtained from MeteoSwiss, the Swiss National Forest Inventory, the Federal Office for the Environment (FOEN) in Switzerland, and various public-domain data sets made available through the web platforms of the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the Meteorological Office, UK (Met Office), used in related research elsewhere or in this book for methodological illustrations, are gratefully acknowledged.

My deepest gratitude goes to my family and friends. I want to thank my family, Céline and Jan, for being with me every step of the way, making sure that I finish this book at last; my family in India for their unfailing support; our colleagues Suju and Yuanhua for their hospitality on many occasions; Maria, Gunnar, Shila, and Goutam for holding the fort during conferences and other long trips; Wolfgang for his sense of humor; and, last but not least, Sir Hastings, our lovely Coton de Tuléar, for keeping us all on track with his incredible wit and judgment.

Sucharita Ghosh