Contents

Preface

Part I: Fuzzy Information

1: Fuzzy data

1.1 One-dimensional fuzzy data

1.2 Vector-valued fuzzy data

1.3 Fuzziness and variability

1.4 Fuzziness and errors

1.5 Problems

2: Fuzzy numbers and fuzzy vectors

2.1 Fuzzy numbers and characterizing functions

2.2 Vectors of fuzzy numbers and fuzzy vectors

2.3 Triangular norms

2.4 Problems

3: Mathematical operations for fuzzy quantities

3.1 Functions of fuzzy variables

3.3 Multiplication of fuzzy numbers

3.4 Mean value of fuzzy numbers

3.5 Differences and quotients

3.6 Fuzzy valued functions

3.7 Problems

Part II: Descriptive Statistics for Fuzzy Data

4: Fuzzy samples

4.1 Minimum of fuzzy data

4.2 Maximum of fuzzy data

4.3 Cumulative sum for fuzzy data

4.4 Problems

5: Histograms for fuzzy data

5.1 Fuzzy frequency of a fixed class

5.2 Fuzzy frequency distributions

5.3 Axonometric diagram of the fuzzy histogram

5.4 Problems

6: Empirical distribution functions

6.1 Fuzzy valued empirical distribution function

6.2 Fuzzy empirical fractiles

6.3 Smoothed empirical distribution function

6.4 Problems

7: Empirical correlation for fuzzy data

7.1 Fuzzy empirical correlation coefficient

7.2 Problems

Part III: Foundations of Statistical Inference With Fuzzy Data

8: Fuzzy probability distributions

8.1 Fuzzy probability densities

8.2 Probabilities based on fuzzy probability densities

8.3 General fuzzy probability distributions

8.4 Problems

9: A law of large numbers

9.1 Fuzzy random variables

9.2 Fuzzy probability distributions induced by fuzzy random variables

9.3 Sequences of fuzzy random variables

9.4 Law of large numbers for fuzzy random variables

9.5 Problems

10: Combined fuzzy samples

10.1 Observation space and sample space

10.2 Combination of fuzzy samples

10.3 Statistics of fuzzy data

10.4 Problems

Part IV: Classical Statistical Inference for Fuzzy Data

11: Generalized point estimators

11.1 Estimators based on fuzzy samples

11.2 Sample moments

11.3 Problems

12: Generalized confidence regions

12.1 Confidence functions

12.2 Fuzzy confidence regions

12.3 Problems

13: Statistical tests for fuzzy data

13.1 Test statistics and fuzzy data

13.2 Fuzzy p-values

13.3 Problems

Part V: Bayesian Inference and Fuzzy Information

14: Bayes’ theorem and fuzzy information

14.1 Fuzzy a priori distributions

14.2 Updating fuzzy a priori distributions

14.3 Problems

15: Generalized Bayes’ theorem

15.1 Likelihood function for fuzzy data

15.2 Bayes’ theorem for fuzzy a priori distribution and fuzzy data

15.3 Problems

16: Bayesian confidence regions

16.1 Bayesian confidence regions based on fuzzy data

16.2 Fuzzy HPD-regions

16.3 Problems

17: Fuzzy predictive distributions

17.1 Discrete case

17.2 Discrete models with continuous parameter space

17.3 Continuous case

17.4 Problems

18: Bayesian decisions and fuzzy information

18.1 Bayesian decisions

18.2 Fuzzy utility

18.3 Discrete state space

18.4 Continuous state space

18.5 Problems

Part VI: Regression Analysis and Fuzzy Information

19: Classical regression analysis

19.1 Regression models

19.2 Linear regression models with Gaussian dependent variables

19.3 General linear models

19.4 Nonidentical variances

19.5 Problems

20: Regression models and fuzzy data

20.1 Regression models and fuzzy data

20.2 Generalized estimators for linear regression models based on the extension principle

20.3 Generalized confidence regions for parameters

20.4 Prediction in fuzzy regression models

20.5 Problems

21: Bayesian regression analysis

21.1 Calculation of a posteriori distributions

21.2 Bayesian confidence regions

21.3 Probabilities of hypotheses

21.4 Predictive distributions

21.5 A posteriori Bayes estimators for regression parameters

21.6 Bayesian regression with Gaussian distributions

21.7 Problems

22: Bayesian regression analysis and fuzzy information

22.1 Fuzzy estimators of regression parameters

22.2 Generalized Bayesian confidence regions

22.3 Fuzzy predictive distributions

22.4 Problems

Part VII: Fuzzy Time Series

23: Mathematical concepts

23.1 Support functions of fuzzy quantities

23.2 Distances of fuzzy quantities

23.3 Generalized Hukuhara difference

24: Descriptive methods for fuzzy time series

24.1 Moving averages

24.2 Filtering

24.3 Exponential smoothing

24.4 Components model

24.5 Difference filters

24.6 Generalized Holt–Winter method

24.7 Presentation in the frequency domain

25: More on fuzzy random variables and fuzzy random vectors

25.1 Basics

25.2 Expectation and variance of fuzzy random variables

25.3 Covariance and correlation

25.4 Further results

26: Stochastic methods in fuzzy time series analysis

26.1 Linear approximation and prediction

26.2 Remarks concerning Kalman filtering

Part VIII: Appendices

A1: List of symbols and abbreviations

A2: Solutions to the problems

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Chapter 14

Chapter 15

Chapter 16

Chapter 17

Chapter 18

Chapter 19

Chapter 20

Chapter 21

Chapter 22

A3: Glossary

A4: Related literature

References

Index

Preface

Statistics is concerned with the analysis of data and the estimation of probability distributions and stochastic models. Therefore the quantitative description of data is essential for statistics.

In standard statistics data are assumed to be numbers, vectors or classical functions. But in applications real data are frequently not precise numbers or vectors, but often more or less imprecise, also called fuzzy. It is important to note that this kind of uncertainty is different from errors; it is the imprecision of individual observations or measurements.

Whereas counting data can be precise, though possibly biased by errors, measurement data of continuous quantities such as length, time, volume, concentrations of poisons, amounts of chemicals released to the environment, and others are never precise real numbers but are always connected with imprecision.

In measurement analysis usually statistical models are used to describe data uncertainty. But statistical models are describing variability and not the imprecision of individual measurement results. Therefore other models are necessary to quantify the imprecision of measurement results.

For a special kind of data, e.g. data from digital instruments, interval arithmetic can be used to describe the propagation of data imprecision in statistical inference. But there are data of a more general form than intervals, e.g. data obtained from analog instruments or data from oscillographs, or graphical data like color intensity pictures. Therefore it is necessary to have a more general numerical model to describe measurement data.

The most up-to-date concept for this is that of special fuzzy subsets of the set ℝ of real numbers, or special fuzzy subsets of the k-dimensional Euclidean space ℝ^k in the case of vector quantities. These special fuzzy subsets of ℝ and ℝ^k are called nonprecise numbers in the one-dimensional case and nonprecise vectors in the k-dimensional case for k > 1, respectively. Nonprecise numbers are defined by so-called characterizing functions and nonprecise vectors by so-called vector-characterizing functions. These are generalizations of indicator functions of classical sets in standard set theory. The concept of fuzzy numbers from fuzzy set theory is too restrictive to describe real data. Therefore nonprecise numbers are introduced.

Because fuzzy data require a quantitative description, statistical methods have to be adapted to the situation of fuzzy data. This is possible, and generalized statistical procedures for fuzzy data are described in this book.

There are also other approaches for the analysis of fuzzy data. Here an approach from the viewpoint of applications is used. Other approaches are mentioned in Appendix A4.

Besides fuzziness of data there is also fuzziness of a priori distributions in Bayesian statistics. So-called fuzzy probability distributions can be used to model nonprecise a priori knowledge concerning parameters in statistical models.

In the text the necessary foundations of fuzzy models are explained and basic statistical analysis methods for fuzzy samples are described. These include generalized classical statistical procedures as well as generalized Bayesian inference procedures.

A software system for statistical analysis of fuzzy data (AFD) is under development. Some procedures are already available, and others are in progress. The available software can be obtained from the author.

Last but not least I want to thank all persons who contributed to this work: Dr D. Hareter, Mr H. Schwarz, Mrs D. Vater, Dr I. Meliconi, H. Kay, P. Sinha-Sahay and B. Kaur from Wiley for the excellent cooperation, and my wife Dorothea for preparing the files for the last two parts of this book.

I hope the readers will enjoy the text.

Reinhard Viertl
Vienna, Austria
July 2010

Part I

FUZZY INFORMATION

Fuzzy information is a special kind of information and information is an omnipresent word in our society. But in general there is no precise definition of information.

However, in the context of statistics which is connected to uncertainty, a possible definition of information is the following: Information is everything which has influence on the assessment of uncertainty by an analyst. This uncertainty can be of different types: data uncertainty, nondeterministic quantities, model uncertainty, and uncertainty of a priori information.

Measurement results and observational data are special forms of information. Such data are frequently not precise numbers but more or less nonprecise, also called fuzzy. Such data will be considered in the first chapter.

Probabilities are another kind of information. Standard probability theory considers probabilities to be numbers. Often this is not realistic, and in a more general approach probabilities are considered to be so-called fuzzy numbers.

The idea of generalized sets was originally published in Menger (1951) and the term ‘fuzzy set’ was coined in Zadeh (1965).

1

Fuzzy data

All kinds of data which cannot be presented as precise numbers or cannot be precisely classified are called nonprecise or fuzzy. Examples are data in the form of linguistic descriptions like high temperature, low flexibility and high blood pressure. Also, precision measurement results of continuous variables are not precise numbers but always more or less fuzzy.

1.1 One-dimensional fuzzy data

Measurement results of one-dimensional continuous quantities are frequently idealized to be numbers times a measurement unit. However, real measurement results of continuous quantities are never precise numbers but always connected with uncertainty. Usually this uncertainty is considered to be statistical in nature, but this is not appropriate, since statistical models describe variability. For a single measurement result there is no variability; therefore another method is necessary to model the measurement uncertainty of individual measurement results. The best up-to-date mathematical model for this is that of so-called fuzzy numbers, which are described in Section 2.1 [cf. Viertl (2002)].

Examples of one-dimensional fuzzy data are lifetimes of biological units, length measurements, volume measurements, height of a tree, water levels in lakes and rivers, speed measurements, mass measurements, concentrations of dangerous substances in environmental media, and so on.

A special kind of one-dimensional fuzzy data are data in the form of intervals [a;b] ⊆ ℝ. Such data are generated by digital measurement equipment, because such equipment displays only a finite number of digits.
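How a digital reading leads to an interval can be sketched as follows. This is only a minimal illustration, assuming that the instrument rounds to the nearest displayed digit; an instrument that truncates would produce a different interval. The function name `reading_to_interval` is hypothetical and introduced only for this example.

```python
from decimal import Decimal


def reading_to_interval(reading: str) -> tuple[Decimal, Decimal]:
    """Return the interval [a; b] of values compatible with a displayed reading.

    Assumption (not from the text): the display rounds to the nearest
    value representable with the digits shown.
    """
    value = Decimal(reading)
    # Size of one unit in the last displayed digit, e.g. 0.01 for "12.34".
    step = Decimal(1).scaleb(value.as_tuple().exponent)
    half = step / 2
    return value - half, value + half


a, b = reading_to_interval("12.34")
print(f"[{a}; {b}]")  # [12.335; 12.345]
```

The reading is passed as a string so that the number of displayed digits, which determines the width of the interval, is not lost.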

1.2 Vector-valued fuzzy data

Many statistical data are multivariate, i.e. ideally the corresponding measurement results are real vectors (x_1, …, x_k) ∈ ℝ^k. In applications such data are frequently not precise vectors but to some degree fuzzy. A mathematical model for this kind of data is provided by so-called fuzzy vectors, which are formalized in Section 2.2.

Examples of vector-valued fuzzy data are locations of objects in space like positions of ships on radar screens, space–time data, and multivariate nonprecise data in the form of vectors (x_1*, …, x_n*) of fuzzy numbers x_i*.

1.3 Fuzziness and variability

In statistics frequently so-called stochastic quantities (also called random variables) are observed, where the observed results are fuzzy. In this situation two kinds of uncertainty are present: Variability, which can be modeled by probability distributions, also called stochastic models, and fuzziness, which can be modeled by fuzzy numbers and fuzzy vectors, respectively. It is important to note that these are two different kinds of uncertainty. Moreover it is necessary to describe fuzziness of data in order to obtain realistic results from statistical analysis. In Figure 1.1 the situation is graphically outlined.

Real data are also subject to a third kind of uncertainty: errors. These are the subject of Section 1.4.

1.4 Fuzziness and errors

In standard statistics errors are modeled in the following way. The observation y of a stochastic quantity is not its true value x, but x superimposed by a quantity e, called error, i.e.

y = x + e.

The error is considered as the realization of another stochastic quantity. These kinds of errors are denoted as random errors.

For one-dimensional quantities, all three quantities x, y, and e are, after the experiment, real numbers. But this is not suitable for continuous variables because the observed values y are fuzzy.

It is important to note that all three kinds of uncertainty are present in real data. Therefore it is necessary to generalize the mathematical operations for real numbers to the situation of fuzzy numbers.
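The interplay of the three kinds of uncertainty can be illustrated by a small simulation. This is a sketch under stated assumptions, not a method from the text: the Gaussian distributions, their parameters, the display resolution, and the names `Observation` and `observe` are all hypothetical choices. Fuzziness is represented here only in its simplest form, an interval; the general model is that of fuzzy numbers as described in Section 2.1.

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:
    x: float      # true value (variability: a realization of the stochastic quantity)
    e: float      # random error
    y: float      # error-contaminated value, y = x + e
    lower: float  # lower end a of the reported interval [a; b]
    upper: float  # upper end b of the reported interval [a; b]


def observe(rng: random.Random, resolution: float = 0.1) -> Observation:
    """Simulate one observation that carries all three kinds of uncertainty."""
    x = rng.gauss(10.0, 2.0)   # variability (assumed Gaussian)
    e = rng.gauss(0.0, 0.05)   # random error (assumed Gaussian)
    y = x + e                  # additive error model
    # Fuzziness: the instrument is assumed to round y to a multiple of
    # `resolution`, so only an interval of that width is known.
    shown = round(y / resolution) * resolution
    half = resolution / 2
    return Observation(x, e, y, shown - half, shown + half)


rng = random.Random(0)  # fixed seed so that the run is reproducible
for obs in (observe(rng) for _ in range(5)):
    print(f"x = {obs.x:.3f}, y = {obs.y:.3f}, "
          f"reported [{obs.lower:.2f}; {obs.upper:.2f}]")
```

In this sketch neither the true value x nor the value y is available to the analyst; only the interval is reported, which is why operations on real numbers have to be generalized.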

1.5 Problems

a. Find examples of fuzzy numerical data which are not given in Section 1.1 and Section 1.2.

b. Work out the difference between stochastic uncertainty and fuzziness of individual observations.

c. Make clear how data in the form of intervals are obtained by digital measurement devices.

d. What do X-ray pictures and data from satellite photographs have in common?