Cover
Title Page
PREFACE
PART I: FUNDAMENTALS
1. CHAPTER 1: PATTERN RECOGNITION: FEATURE SPACE CONSTRUCTION
  1. 1.1 CONCEPTS
  2. 1.2 FROM PATTERNS TO FEATURES
  3. 1.3 FEATURES SCALING
  4. 1.4 EVALUATION AND SELECTION OF FEATURES
  5. 1.5 CONCLUSIONS
  6. APPENDIX 1.A
  7. APPENDIX 1.B
  8. REFERENCES
2. CHAPTER 2: PATTERN RECOGNITION: CLASSIFIERS
  1. 2.1 CONCEPTS
  2. 2.2 NEAREST NEIGHBORS CLASSIFICATION METHOD
  3. 2.3 SUPPORT VECTOR MACHINES CLASSIFICATION ALGORITHM
  4. 2.4 DECISION TREES IN CLASSIFICATION PROBLEMS
  5. 2.5 ENSEMBLE CLASSIFIERS
  6. 2.6 BAYES CLASSIFIERS
  7. 2.7 CONCLUSIONS
  8. REFERENCES
3. CHAPTER 3: CLASSIFICATION WITH REJECTION: PROBLEM FORMULATION AND AN OVERVIEW
  1. 3.1 CONCEPTS
  2. 3.2 THE CONCEPT OF REJECTING ARCHITECTURES
  3. 3.3 NATIVE PATTERNS‐BASED REJECTION
  4. 3.4 REJECTION OPTION IN THE DATASET OF NATIVE PATTERNS: A CASE STUDY
  5. 3.5 CONCLUSIONS
  6. REFERENCES
4. CHAPTER 4: EVALUATING PATTERN RECOGNITION PROBLEM
  1. 4.1 EVALUATING RECOGNITION WITH REJECTION: BASIC CONCEPTS
  2. 4.2 CLASSIFICATION WITH REJECTION WITH NO FOREIGN PATTERNS
  3. 4.3 CLASSIFICATION WITH REJECTION: LOCAL CHARACTERIZATION
  4. 4.4 CONCLUSIONS
  5. REFERENCES
5. CHAPTER 5: RECOGNITION WITH REJECTION: EMPIRICAL ANALYSIS
  1. 5.1 EXPERIMENTAL RESULTS
  2. 5.2 GEOMETRICAL APPROACH
  3. 5.3 CONCLUSIONS
  4. REFERENCES
PART II: ADVANCED TOPICS: A FRAMEWORK OF GRANULAR COMPUTING
1. CHAPTER 6: CONCEPTS AND NOTIONS OF INFORMATION GRANULES
  1. 6.1 INFORMATION GRANULARITY AND GRANULAR COMPUTING
  2. 6.2 FORMAL PLATFORMS OF INFORMATION GRANULARITY
  3. 6.3 INTERVALS AND CALCULUS OF INTERVALS
  4. 6.4 CALCULUS OF FUZZY SETS
  5. 6.5 CHARACTERIZATION OF INFORMATION GRANULES: COVERAGE AND SPECIFICITY
  6. 6.6 MATCHING INFORMATION GRANULES
  7. 6.7 CONCLUSIONS
  8. REFERENCES
2. CHAPTER 7: INFORMATION GRANULES: FUNDAMENTAL CONSTRUCTS
  1. 7.1 THE PRINCIPLE OF JUSTIFIABLE GRANULARITY
  2. 7.2 INFORMATION GRANULARITY AS A DESIGN ASSET
  3. 7.3 SINGLE‐STEP AND MULTISTEP PREDICTION OF TEMPORAL DATA IN TIME SERIES MODELS
  4. 7.4 DEVELOPMENT OF GRANULAR MODELS OF HIGHER TYPE
  5. 7.5 CLASSIFICATION WITH GRANULAR PATTERNS
  6. 7.6 CONCLUSIONS
  7. REFERENCES
3. CHAPTER 8: CLUSTERING
  1. 8.1 FUZZY C‐MEANS CLUSTERING METHOD
  2. 8.2 k‐MEANS CLUSTERING ALGORITHM
  3. 8.3 AUGMENTED FUZZY CLUSTERING WITH CLUSTERS AND VARIABLES WEIGHTING
  4. 8.4 KNOWLEDGE‐BASED CLUSTERING
  5. 8.5 QUALITY OF CLUSTERING RESULTS
  6. 8.6 INFORMATION GRANULES AND INTERPRETATION OF CLUSTERING RESULTS
  7. 8.7 HIERARCHICAL CLUSTERING
  8. 8.8 INFORMATION GRANULES IN PRIVACY PROBLEM: A CONCEPT OF MICROAGGREGATION
  9. 8.9 DEVELOPMENT OF INFORMATION GRANULES OF HIGHER TYPE
  10. 8.10 EXPERIMENTAL STUDIES
  11. 8.11 CONCLUSIONS
  12. REFERENCES
4. CHAPTER 9: QUALITY OF DATA: IMPUTATION AND DATA BALANCING
  1. 9.1 DATA IMPUTATION: UNDERLYING CONCEPTS AND KEY PROBLEMS
  2. 9.2 SELECTED CATEGORIES OF IMPUTATION METHODS
  3. 9.3 IMPUTATION WITH THE USE OF INFORMATION GRANULES
  4. 9.4 GRANULAR IMPUTATION WITH THE PRINCIPLE OF JUSTIFIABLE GRANULARITY
  5. 9.5 GRANULAR IMPUTATION WITH FUZZY CLUSTERING
  6. 9.6 DATA IMPUTATION IN SYSTEM MODELING
  7. 9.7 IMBALANCED DATA AND THEIR GRANULAR CHARACTERIZATION
  8. 9.8 CONCLUSIONS
  9. REFERENCES
INDEX
End User License Agreement

List of Tables

Chapter 01
1. TABLE 1.1 Numerical features derived from two vectorial features: vertical projection and differential of vertical projections
2. TABLE 1.2 Matrix of the Pearson correlation coefficients for features outlined in Table 1.1
3. TABLE 1.3 Features ranking with two classifiers and different indices applied: classifiers SVM and k‐NN (k = 1), ANOVA index, and several clustering indices
4. TABLE 1.4 Features ranking with distance by rank: compared are all pairs of classifiers and indices; scores less than 6000 for the SVM index and less than 7000 for the k‐NN index are bolded
5. TABLE 1.5 Features ranking with distance by segments cardinality: compared are initial segments of ranks created by classifiers and an index
6. TABLE 1.6 Performance time of Algorithm 1.10 on the MNIST dataset (LeCun et al., 1998) for three clustering indices and two classifiers
Chapter 04
1. TABLE 4.1 Confusion matrix for rejecting option of pattern recognition applied to two sets of native and foreign patterns without splitting the set of native patterns into classes
2. TABLE 4.2 Balanced confusion matrix from Table 4.1 with equalizing parameter applied
3. TABLE 4.3 Classification with rejection applied to handwritten digits (native set, 10 classes) and crossed‐out digits and handwritten Latin alphabet letters (foreign sets)
4. TABLE 4.4 Results of classification with rejection from Table 4.3 in terms of identification of native and foreign patterns, according to (4.6) and (4.7), while CC = 9558 according to (4.8)
5. TABLE 4.5 Characteristics of classification with rejection delineated by accuracy, sensitivity, and precision measures obtained from Tables 4.3 and 4.4
6. TABLE 4.6 Characteristics of multiclass classification with rejection delineated by precision, sensitivity, accuracy, and separability measures drawn from Tables 4.3 and 4.4
7. TABLE 4.7 Confusion matrix for classification without rejection applied to the set of native patterns only
8. TABLE 4.8 Confusion matrix for classification with local rejection for the native set of patterns only, CC = 9558
9. TABLE 4.9 Summary of recognition results without rejection and with rejection
10. TABLE 4.10 Characteristics of classification with and without rejection delineated by accuracy measures drawn from Tables 4.7 and 4.8
11. TABLE 4.11 References to class names, which are used in Tables 4.12 and 4.13
12. TABLE 4.12 Confusion matrix for classification without rejection applied to the set of symbols of music notation (selected 10 classes), standardized features
13. TABLE 4.13 Confusion matrix for classification with local rejection for selected 10 classes from the dataset of music notation symbols, standardized features
14. TABLE 4.14 Characteristics of classification with and without rejection delineated by accuracy measures drawn from Tables 4.12 and 4.13
Chapter 05
1. TABLE 5.1 Confusion matrices for the classifiers without and with rejection shown in Figures 3.13 and 3.15, respectively
2. TABLE 5.2 Comparison of classification results with rejection (global, local, and embedded architectures) on the train and test sets of native patterns of handwritten digits with classification results without a rejection mechanism
3. TABLE 5.3 Classification performance of classifiers constructed in features spaces of different dimensionalities: ranging from 24 to 4 features
4. TABLE 5.4 Confusion matrix for the low quality basic classifier
5. TABLE 5.5 Results of the considered models without rejection (upper part) and with rejection (bottom part)
6. TABLE 5.6 Comparison of experiments on classification with rejection (global, local, and embedded architectures) for two native datasets: handwritten digits and printed music notation
7. TABLE 5.7 Comparing results of rejection with the hyperrectangles model performed for datasets of handwritten digits and music notation
Chapter 06
1. TABLE 6.1 Selected examples of t‐norms and t‐conorms
2. TABLE 6.2 Selected examples of parametric t‐norms and t‐conorms
Chapter 08
1. TABLE 8.1 Values of the reconstruction criterion V obtained for selected values of c
2. TABLE 8.2 Values of the reconstruction criterion V obtained for selected numbers of clusters c; the drops in the values of V(c) are included in the last column, with the larger drops marked in boldface
3. TABLE 8.3 Values of the reconstruction error V obtained for single‐, complete‐, and average‐linkage method and selected values of c
Chapter 09
1. TABLE 9.1 Coverage produced by imputed data versus different values of p; the parameter β comes with the performance index maximized by the principle of justifiable granularity

List of Illustrations

Chapter 01
1. Figure 1.1 Pattern recognition schemes: direct mapping from the space of patterns into the space of classes (a) and composition of mappings from the space of patterns into the space of features and from the space of features into the space of classes (b).
2. Figure 1.2 A typical pattern recognition scheme.
3. Figure 1.3 Pattern recognition with rejection.
4. Figure 1.4 A treble clef, a symbol belonging to a data set of printed music notation, taken as an example of a pattern. The pattern is surrounded by a bounding box of width W = 22 and height H = 60 pixels. The bounding box is not part of the pattern; it has been added only for illustrative purposes.
5. Figure 1.5 Vectorial features: (a) original pattern, (b) horizontal projection, (c) vertical projection, (d) left margin, (e) right margin, (f) bottom margin, (g) top margin, (h) horizontal transition, and (i) vertical transition. Please note that the transition values are very small, so in order to enhance visibility, we multiplied them by 4.
6. Figure 1.6 Vectorial to vectorial transformations: (a) original pattern, (b) horizontal projection, (c) its histogram, (d) its smoothing, (e) its differentiation, (f) vertical projection, (g) its histogram, (h) its smoothing, and (i) its differentiation. Note: The values of the vertical histogram are multiplied by 4.
7. Figure 1.7 Vectorial to numerical transformations: (a) original pattern, (b) numerical features of vertical projection (min = 2, mean = 23, max = 34, min position = 22, max position = 13), (c) directions—white lines on the black pattern (W–E = 13, N–S = 28, NE–SW = 20, NW–SE = 11), (d) eccentricity, and (e) Euler numbers (treble clef: −2, flat: 0, sharp: 0, fermata: 2, mezzo forte: 2).
8. Figure 1.8 Quality of different sets of features selected with the greedy search method. The procedure was adding features one by one: in each iteration one best feature was added. Feature evaluation was performed using the ANOVA F‐test and PBM index. We display accuracy (vertical axis) versus feature sets cardinality (horizontal axis). Results concern sets of features ranging from 1 to 100. Plots present accuracy measured on test sets. Plots concern different sets: original, normalized, standardized digits, and musical symbols for the dataset (Homenda et al., 2017). Information about the kind of data is presented in each individual plot.
9. Figure 1.9 Evaluation of sets of features with ANOVA F‐test, PBM index, GDI‐41 index, and SVM classifier. Sets of features were selected with greedy forward with expansion limited to 1, 3, and 5. Displayed are scores of indices at the whole learning set of patterns and SVM classifier accuracy at the training set of patterns (vertical axis) as a function of cardinality of features sets ranging from 1 to 100 (horizontal axis). Results concern a dataset of handwritten digits.
10. Figure 1.10 Quality of classification using SVM classifier constructed based on selected sets of features. Sets of features were selected using the ANOVA F‐test and PBM index. Classification accuracy (vertical axis) is measured at the training set and at the test set for three values of expansion limit and for the best features in the individual index rank, cardinality of features sets ranging from 1 to 100 (horizontal axis). The first two rows of graphs concern F‐ANOVA; the last two rows—PBM. Results concern handwritten digits and musical symbols datasets.
11. Figure 1.11 Quality of classification using the SVM classifier constructed based on selected sets of features. Sets of features were selected using the GDI‐41 index and SVM classifier. Classification accuracy (vertical axis) is measured at the training set and at the test set for three values of expansion limit and for the best features in the individual index rank, cardinality of features sets ranging from 1 to 100 (horizontal axis). The first two rows of graphs concern GDI‐41; the last two rows, SVM. Results concern handwritten digits and musical symbols datasets.
Chapter 02
1. Figure 2.1 Illustration of the k‐NN classifier. The pattern marked with a black star is being classified depending on the value of k: for k = 1 to the class of plus signs, for k = 3 to the class of squares, for k = 5 to the class of squares, and for k = 7 to the class of triangles. We may conclude that the classification outcome for the pattern marked with a star greatly depends on parameter k. In contrast, it is clear that classification of two patterns marked with a hash (#) and an at (@) sign is independent of the parameter k for quite wide ranges of k.
2. Figure 2.2 Illustration of the SVM algorithm: a linearly separable case.
3. Figure 2.3 The SVM algorithm: a case when classes O₁ and O₋₁ are not linearly separable.
4. Figure 2.4 The decision tree built for the training set of the wine dataset. Nodes are illustrated with ellipses (non‐leaf nodes) and rectangles (leaf nodes). Notation x‐y‐z (e.g., 43‐52‐35 in the root) informs about the number of patterns from classes O₁, O₂, and O₃, respectively, that were assigned to a given node. A single number inside tree nodes, positioned above the x‐y‐z notation, informs about the label of the majority class in a node. The majority class is also indicated with a number of patterns in boldface, for example, in the root we have 43‐52‐35 that says that the second class is the majority class in this node. Inequalities on the edges indicate the split condition of the father’s set of patterns and the feature that was used to perform this split.
5. Figure 2.5 Classification procedure for the test set using the decision tree constructed in Figure 2.4. Numbers inside nodes (ellipses and rectangles) depict numbers of patterns from each class that fall into a given node. The majority class number (an integer in the first row in each node) concerns the set of training data. The majority class for the test set is indicated with a number written in boldface in the second row.
6. Figure 2.6 A simplified decision tree. In the full tree outlined in Figure 2.4, internal nodes for which the set of training patterns included less than 10% incorrectly classified patterns were turned to leaves. Classification results for the simplified (pruned) tree are displayed for the training set (left graph) and the test set (right graph).
7. Figure 2.7 Graphs of entropy, the Gini index, and index of incorrect classification for a case of two classes. (a) True values of indexes and (b) values of indexes scaled to the unit interval.
8. Figure 2.8 Illustration of the Bayes classifier. Minimal misclassification error is achieved for the crossing point of joined probability densities. Minimal misclassification error is shown as a lined area; reducible error is shown as a double lined area.
9. Figure 2.9 Illustration of rejecting regions of the Bayes classifier. A posteriori probabilities for both classes are smaller than δ_max on the left and the right, while the absolute difference between a posteriori probabilities is smaller than δ_diff in the middle.
10. Figure 2.10 Density function estimation for the first feature of the wine dataset. We present estimation of density function for classes 1, 2, and 3 in the first, second, and third columns, respectively. Results concern interval length h equal to 0.2, 0.4, and 0.8 in the first, second, and third rows, respectively.
11. Figure 2.11 Estimation of density function with the Gaussian kernel function for the first feature of the wine dataset. We present the estimation of the density function for classes 1, 2, and 3 and for interval length (h) equal to 0.2, 0.4, and 0.8, respectively.
Chapter 03
1. Figure 3.1 An idea of classification without rejection. (a) A classifier is constructed on the basis of a learning set of native patterns. (b) The constructed classifier is used in practice. Native and foreign patterns are presented to the classifier. Foreign patterns do not belong to any class, but they must nevertheless be assigned to a native class, which decreases the overall quality of data processing. In addition, we see that a few native patterns are classified to incorrect classes.
2. Figure 3.2 An idea of classification with rejection. Now most foreign patterns are rejected. Compare this sketch with Figure 3.1b, which depicts an analogous classification mechanism without rejection, where foreign patterns are incorrectly classified to native classes.
3. Figure 3.3 The architecture of global rejection. Native and foreign patterns are separated prior to classification of native patterns. Since foreign patterns are eliminated at the first step of recognition with rejection, the set of patterns subjected to classification is assumed to include native patterns only.
4. Figure 3.4 An example of a dataset of native patterns contaminated with foreign patterns suitable for global rejection. Both sets of native patterns are easily separable from the set of foreign patterns.
5. Figure 3.5 The architecture of local rejection. Sets of native and foreign patterns are subjected to classification prior to foreign pattern rejection.
6. Figure 3.6 An example of a dataset consisting of native and foreign patterns suitable to be processed using the local rejection architecture. Native and foreign patterns are hardly separable. Hence, local rejecting would be more effective than global rejecting.
7. Figure 3.7 Illustration of embedded rejection architecture.
8. Figure 3.8 Native patterns—handwritten digits.
9. Figure 3.9 Native patterns: symbols from the dataset of printed music notation. Below the name of each symbol, we give the number of samples of that class in the dataset.
10. Figure 3.10 An excerpt of printed music notation—visualization of imbalances in the dataset of printed music notation symbols with regard to shapes and sizes.
11. Figure 3.11 Samples of foreign patterns, two rows with samples of each dataset. From top to bottom: handwritten digits crossed out, handwritten Latin alphabet letters, garbage patterns of music notation.
12. Figure 3.12 The binary tree structure of the classifier built for the set of 10 classes of the MNIST database (cf. LeCun et al., 1998). The set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} of 10 classes is split into two subsets, {2, 3, 5, 7} and {0, 1, 4, 6, 8, 9}, and then the set {2, 3, 5, 7} of four classes is split into two subsets, {3, 5} and {2, 7}, and so on.
13. Figure 3.13 Architecture of the example tree of classifiers. In this study, random forests and SVMs were used as binary classifiers placed in nodes of this tree.
14. Figure 3.14 Tree‐based classifier from Figure 3.13 furnished with the global rejection option. The illustration concerns the right branch of the tree (the left branch is skipped). Rejecting is performed with the same binary classifiers as in the basic C‐class classifier, that is, random forests or SVMs.
15. Figure 3.15 The tree‐based classifier from Figure 3.13 furnished with the local rejection option. The illustration concerns the right branch of the tree (the left branch is skipped). Rejecting is performed with random forests or SVM binary classifiers attached to each leaf of the tree.
16. Figure 3.16 Tree‐based classifier from Figure 3.13 furnished with the embedded rejection option. The illustration concerns the right branch of the tree. Rejecting is performed with SVM classifiers attached to internal nodes and leaves of the tree.
Chapter 04
1. Figure 4.1 The structure of binary tree‐based classifiers used in musical symbol classification.
Chapter 05
1. Figure 5.1 Structure of binary tree‐based classifiers and its textual representation for the handwritten digits dataset.
2. Figure 5.2 Structure of binary tree‐based classifiers for various sets of features: of size 24, 20, 16, 12, 8, and 4. The classifier for the full set of 24 features is fully represented in the textual form (it is the first shown structure). Dots represent digits (class labels), which are not changed in comparison with the classifier above.
3. Figure 5.3 Textual representation of the structure of binary tree‐based classifiers applied in classification of music notation symbols.
4. Figure 5.4 Construction of a hyperrectangles‐based area enclosing the training set of native patterns. The area is the union of hyperrectangles built on classes of patterns from the training set.
5. Figure 5.5 Construction of an area enclosing the training set of patterns. The area is the union (E) of three ellipsoids (E₁, E₂, E₃) built on classes of patterns from the training set.
6. Figure 5.6 Construction of k‐spans H^k for two features and three classes of native patterns: (a) H⁰ span is the union of rectangles enclosing all patterns of classes. (b) H³ span is constructed by removing the three outermost patterns from each class.
7. Figure 5.7 Geometrical rejection with hyperrectangles—characteristics for handwritten digits and symbols of music notation datasets. Plots outline acceptance rate (for the training and test sets of native patterns) and rejection rate (for the set of foreign patterns) as functions of iterations performed in Algorithm 5.2. Compared are two methods of removing patterns: one feature and volume.
8. Figure 5.8 Construction of ellipsoids‐based k‐spans E^k for two features and three classes of native patterns: (a) E⁰ span is the union of three ellipsoids; each ellipsoid encloses all patterns of one class. (b) Spans E¹, then E², and then E³ are constructed by removing, in each iteration, the outermost pattern from each class (three patterns in total per iteration).
9. Figure 5.9 Geometrical rejection with ellipsoids—characteristics for handwritten digits and symbols of music notation datasets. Plots outline acceptance rate (for the training and test sets of native patterns) and rejection rate (for the set of foreign patterns) as functions of iterations performed in Algorithm 5.4.
Chapter 06
1. Figure 6.1 From a stream of numeric data to their granular description completed with the aid of information granules.
2. Figure 6.2 Histogram as an example of information granule with data coming from a single‐class (a) and two‐class problem (b).
3. Figure 6.3 Conceptual realization of information granules—a comparative view.
4. Figure 6.4 Examples of set‐theoretic operations on numeric intervals.
5. Figure 6.5 Examples of α‐cut and strong α‐cut.
6. Figure 6.6 Determination of specificity of (a) interval (set), (b) unimodal fuzzy set, and (c) multimodal fuzzy set.
7. Figure 6.7 Plots of coverage and specificity as a function of b: (a) σ = 1.0, (b) σ = 2.0.
8. Figure 6.8 Join and meet of interval information granules.
Chapter 07
1. Figure 7.1 Example plots of coverage and specificity (linear model) regarded as a function of b.
2. Figure 7.2 V(b) as a function of b: (a) σ = 1.0 and (b) σ = 2.0.
3. Figure 7.3 Development of information granules in the two‐dimensional case when using two distance functions: (a) Euclidean distance and (b) Chebyshev distance.
4. Figure 7.4 Triangular membership function with adjustable (optimized) bounds a and b.
5. Figure 7.5 Aggregation of experimental evidence through the principle of justifiable granularity: an elevation of type of information granularity.
6. Figure 7.6 Plots of pdfs of the data for which the principle of justifiable granularity is applied. Shown are also inhibitory data (governed by p₂).
7. Figure 7.7 Plots of V(b) for γ = 1 (solid line) and γ = 0 (dotted line).
8. Figure 7.8 Original numeric mapping along with the interval bounds.
9. Figure 7.9 Linear mapping (dark line) along with its interval (granular) generalization; gray and dotted lines show the upper and lower bounds, respectively.
10. Figure 7.10 Nonlinear mapping and its granular (interval‐valued) generalization.
11. Figure 7.11 From fuzzy models to granular fuzzy models: a formation of a granular space of parameters.
12. Figure 7.12 Allocation of information granularity in aggregation problem.
13. Figure 7.13 Multistep prediction in granular time series; shown are successive steps of prediction.
14. Figure 7.14 Forming granular models of higher type: a concept.
15. Figure 7.15 Interval‐valued output of the rule‐based fuzzy model; gray lines—the bounds of the interval.
16. Figure 7.16 Interval‐valued output of the rule‐based fuzzy model: the bounds y⁻ and y⁺ of the interval are shown in gray.
17. Figure 7.17 Characteristics of type‐2 granular rule‐based models; dotted lines show the bounds produced by the type‐2 (granular) intervals.
18. Figure 7.18 An overview of the design of granular classifiers; note a functionality of the preprocessing module forming a granular feature space. Two modes of performance evaluation: (a) considering class membership and (b) considering binary classification.
Chapter 08
1. Figure 8.1 Geometry of data distributed from the origin at some constant distance ρ.
2. Figure 8.2 Example of a dendrogram formed through hierarchical clustering; in the case of agglomerative bottom‐up clustering, starting from the number of clusters being equal to the number of data in X, larger groups are formed by aggregating the two clusters that are the closest to each other.
3. Figure 8.3 Computing distances in single‐linkage (a), complete‐linkage (b), and average‐linkage (c) variants of hierarchical clustering.
4. Figure 8.4 From data to hotspots and information granules of higher type.
5. Figure 8.5 Synthetic data coming as a mixture of normal distributions; various symbols denote data coming from the individual distributions.
6. Figure 8.6 Results of fuzzy clustering; included are prototypes superimposed over the existing data.
7. Figure 8.7 Reconstruction results obtained for the k‐means clustering; the plots include the obtained prototypes superimposed over the data.
8. Figure 8.8 Plots of granular prototypes for prototypes produced by k‐means. Shown are the numeric prototypes along with their radii (shown in dark color).
9. Figure 8.9 Granular prototypes constructed for results produced by the FCM clustering; shown are the numeric centers and their radii (shown in dark color).
10. Figure 8.10 Granular prototypes built on a basis of prototypes formed by k‐means. Inhibitory information is included when forming information granules; centers and radii are shown in dark color.
11. Figure 8.11 Dendrograms obtained for hierarchical clustering for c = 2, 5, and 8: (a) single linkage, (b) complete linkage, and (c) average linkage.
Chapter 09
1. Figure 9.1 Illustration of the principle of the selected imputation methods: (a) statistics‐based imputation and (b) hot deck imputation.
2. Figure 9.2 Main phases of processing with a visualization of imputation module; here the features space is treated as an n‐dimensional space of real numbers Rⁿ.
3. Figure 9.3 Process of granular imputation yielding a new features space.
4. Figure 9.4 Granular imputation realized with the principle of justifiable granularity.
5. Figure 9.5 Building augmented features space resulting from data imputation to be used for models of classification and prediction.
6. Figure 9.6 Location of z and the resulting information granule (circles, majority class). (a) z surrounded by data coming from the majority class, (b) z located in the region exhibiting a mixture of data coming from the majority and minority class, and (c) z located in the region occupied by data coming from the minority class.
7. Figure 9.7 From a set of imbalanced data to their balanced counterpart involving granular data.

PREFACE

Pattern recognition has established itself as an advanced area with a well‐defined methodology, a plethora of algorithms, and clearly identified application areas. For decades, it has been a subject of intense theoretical and applied research inspired by practical needs. Prudently formulated evaluation strategies and methods of pattern recognition, especially a suite of classification algorithms, constitute the crux of numerous pattern classifiers. Representative realms of application include recognizing printed text and manuscripts, identifying musical notation, supporting multimodal biometric systems (voice, iris, signature), classifying medical signals (such as ECG, EEG, and EMG), and classifying and interpreting images.

The abundance of data, their volume, and their diversity give rise to evident challenges that need to be carefully addressed to foster further advancements of the area and meet the needs of ever‐growing applications. In a nutshell, these challenges concern data quality. The term manifests in numerous ways and has to be perceived in a very general sense. Missing data, data affected by noise, foreign patterns, limited precision, information granularity, and imbalanced data are commonly encountered phenomena that one has to take into consideration when building pattern classifiers and carrying out comprehensive data analysis. In particular, one has to engage suitable ways of transforming (preprocessing) data (patterns) prior to their analysis, classification, and interpretation.

The quality of data impacts the very essence of pattern recognition and calls for thorough investigations of the principles of the area. Data quality has a direct impact on the architectures and development schemes of classifiers. This book aims to cover the essentials of pattern recognition by casting it in a new perspective of data quality. In essence, we advocate that a new framework of pattern recognition, along with its methodology and algorithms, has to be established to cope with the challenges of data quality. As a representative example, it is of interest to look at the problem of the so‐called foreign (odd) patterns. By foreign patterns we mean patterns not belonging to the family of classes under consideration. The ever‐growing presence of pattern recognition technologies increases the importance of identifying foreign patterns. For example, in the recognition of printed texts, odd patterns (say, blots, grease, or damaged symbols) appear quite rarely. On the other hand, in recognition problems posed for other sources, such as geodetic maps or musical notation, foreign patterns occur quite often and their presence cannot be ignored. Unlike printed text, such documents contain objects that are irregularly positioned, differ in size, overlap, or have complex shapes. Thus, overly strict segmentation results in the rejection of many recognizable symbols. Due to the weak separability of recognized patterns, segmentation criteria need to be relaxed, and foreign patterns similar to recognized symbols have to be carefully inspected and rejected.

The material is organized into two parts, Part I: Fundamentals and Part II: Advanced Topics: A Framework of Granular Computing. This arrangement reflects the general nature of the main topics being covered.

Part I addresses the principles of pattern recognition with rejection. The task of rejecting foreign patterns arises as an extension and an enhancement of the standard schemes and practices of pattern recognition. Essential notions of pattern recognition are elaborated on and carefully revisited in order to clarify how to augment existing classifiers with a new rejection option required to cope with the discussed category of problems. The book is self‐contained, which implies that a number of well‐known methods and algorithms are discussed to offer a complete overview of the area, to identify its main objectives, and to present the main phases of pattern recognition. The key topics here involve problem formulation and understanding; feature space formation, selection, transformation, and reduction; pattern classification; and performance evaluation. The evolution of research on pattern recognition with rejection is analyzed, including its historical perspective. Current approaches are identified along with present and forthcoming issues that need to be tackled to ensure further progress in this domain. In particular, new trends are identified and linked with existing challenges. The chapters forming this part revisit well‐known material and elaborate on new approaches to pattern recognition with rejection. Chapter 1 covers fundamental notions of feature space formation. The feature space is of paramount relevance, since it bears directly on the quality of classifiers. The focus of the chapter is on the analysis and comparative assessment of the main categories of methods used in feature construction, transformation, and reduction. In Chapter 2, we cover a variety of approaches to the design of fundamental classifiers, including such well‐known constructs as k‐NN (nearest neighbor), naïve Bayesian classifier, decision trees, random forests, and support vector machines (SVMs). Comparative studies are supported by a suite of illustrative examples.
Chapter 3 offers a detailed formulation of the problem of recognition with rejection. It delivers a number of motivating examples and elaborates on the existing studies carried out in this domain. Chapter 4 covers a suite of evaluation methods required to realize tasks of pattern recognition with a rejection option. Along with classic performance evaluation approaches, a thorough discussion is presented on the multifaceted nature of pattern recognition evaluation mechanisms. The analysis is extended to deal with balanced and imbalanced datasets. The discussion commences with an evaluation of a standard pattern recognition problem and then progresses toward pattern recognition with rejection. We tackle the issue of how to evaluate pattern recognition with rejection when the problem is further exacerbated by the presence of imbalanced data. A wide spectrum of measures is discussed and employed in experiments, including those of a comparative nature. In Chapter 5, we present an empirical evaluation of different rejecting architectures. The empirical verification is performed using datasets of handwritten digits and symbols of printed music notation. In addition, we propose a rejecting method based on the concept of geometrical regions. This method, unlike rejecting architectures, is a stand‐alone approach to support discrimination between native and foreign patterns. We study the usage of elementary geometrical regions, especially hyperrectangles and hyperellipsoids.

Part II focuses on the fundamental concept of information granules and information granularity. Information granules give rise to the area of granular computing, a paradigm of forming, processing, and interpreting information granules. Information granularity comes hand in hand with the key notion of data quality: it helps identify, quantify, and process patterns of a certain quality. The chapters are structured to present the material in a top‐down manner. Chapter 6 introduces the fundamentals of information granules, delivering the key motivating factors and elaborating on the underlying formalisms (including sets, fuzzy sets, and probabilities), along with the operations and transformation mechanisms as well as the characterization of information granules. The design of information granules is covered in Chapter 7. Chapter 8 positions clustering in a new setting, revealing its role as a mechanism for building information granules. In the same vein, it is shown that the clustering results (predominantly of a numeric nature) are significantly augmented by bringing information granularity to the description of the originally constructed numeric clusters. A question of clustering information granules is posed and translated into some algorithmic augmentations of the existing clustering methods. Further studies on data quality and its quantification and processing are contained in Chapter 9. Here we focus on data (value) imputation and imbalanced data, the two dominant manifestations in which the quality of data plays a pivotal role. In both situations, the problem is captured through information granules that lead to the quantification of the quality of data and enrich the ensuing classification schemes.

This book exhibits a number of essential and appealing features:

Systematic exposition of the concepts, design methodology, and detailed algorithms. In organizing the material, we adhere to a top‐down strategy, starting with the concepts and motivation and then proceeding with the detailed design, which materializes in specific algorithms and a slew of representative applications.

A wealth of carefully structured and organized illustrative material. This book includes a series of brief illustrative numeric experiments, detailed schemes, and more advanced problems.

Self‐containment. We aimed to deliver self‐contained material that provides all necessary prerequisites. Where required, some parts of the text are augmented with a step‐by‐step explanation of more advanced concepts supported by carefully selected illustrative material.

Given the central theme of this book, we hope that this volume will appeal to a broad audience of researchers and practitioners in the area of pattern recognition and data analytics. It can serve as a compendium of current methods in the area and offer a sound algorithmic framework.

This book would not have been possible without the support provided by organizations and individuals.

We gratefully acknowledge the support provided by the National Science Centre, grant no. 2012/07/B/ST6/01501, decision no. UMO‐2012/07/B/ST6/01501.

Dr Agnieszka Jastrzebska did a meticulous job helping to carry out the experiments and produce the graphic materials. We are grateful to the team of professionals at John Wiley, Kshitija Iyer, and Grace Paulin Jeeva S for their encouragement from the outset of the project and their continuous support throughout its realization.

Władysław Homenda and Witold Pedrycz

WILEY SERIES ON METHODS AND APPLICATIONS IN DATA MINING

PATTERN RECOGNITION

A Quality of Data Perspective

PART I
FUNDAMENTALS