Cover Page

Geographic Data Imperfection 1

From Theory to Applications

Edited by

Mireille Batton-Hubert

Eric Desjardin

François Pinet


Preface

Geomatics is a scientific field that, over the last 30 years, has become so closely intertwined with our everyday life that we often forget the challenges underlying it. Who does not have a navigation application on their mobile phone? Who does not manipulate geolocated data? In the coming decades, the volumes of georeferenced data generated should increase dramatically.

This book focuses on the notion of imperfection in geographic data, a significant topic in geomatics. It is essential to define and represent the imperfection that may affect geographic data. Uncertainty is the basis of the study of so-called modern probability, a field that became very active in the 18th century (thanks to the work of P. de Fermat, B. Pascal, Th. Bayes, P. S. Laplace, and several others) and was complemented by concepts developed in the 19th century and, more particularly, later in the 20th century. The notion of imperfection supplements this concept: representing only the stochastic (random) nature of a fact is limiting when the aim is to represent the precision of a fact and/or the lack of knowledge about data. The theories that deal with these two aspects were complemented in particular by the Dempster–Shafer theory.

A better awareness of this imperfection, which is linked specifically to geographic data, especially when it is formalized, stored, and manipulated, improves the analysis of these data and any decision-analysis process. Even if it is important (and even critical) to manage imperfection, it must be acknowledged that integrating it into data-processing procedures may be challenging. To take up this challenge, this book intends to bridge the gap between the need and its implementation. It explores both theoretical aspects, in order to illustrate phenomena and representations more clearly, and practical aspects, by presenting concrete examples and applied tools.

This book was written in the context of an initiative of the Groupement de Recherche du CNRS sur les Méthodes et Applications pour la Géomatique et l’Information Spatiale (GDR MAGIS) (Associated Research of the CNRS on the Methods and Uses of Geomatics and Spatial Information). This initiative, which targeted the uncertainty of spatial data, gave rise to a dedicated working group, which took part in writing this book. This book is thus the joint product of an analysis of this topic. It is our hope that it will meet readers’ expectations.

We would like to express our sincere thanks to the authors of the various chapters and, more generally, to all the individuals who took part in the work groups of the GDR MAGIS over time. We extend our thanks to them for their fruitful ideas, which have made it possible to elaborate further on the ideas expressed in this book. We would like to thank the GDR MAGIS of the CNRS as well as its various directors for their support and their trust in this project.

We hope that readers will enjoy this book and that it will shed some light on the methods that make it possible to better understand and process geographic imperfections.

The editors of this book and the organizers of the initiative Incertitude épistémique – des données aux modèles en géomatique (Epistemic Uncertainty – from Data to Models in Geomatics) of the GDR MAGIS of the CNRS:

Mireille BATTON-HUBERT

Eric DESJARDIN

François PINET

May 2019

Part 1
Bases and Concepts

1
Imperfection and Geographic Information

“We should learn to navigate on a sea of uncertainties, sailing in and around islands of certainty”

Edgar Morin, Seven Complex Lessons in Education for the Future (2000)

“Uncertainty is not in things but in our head: uncertainty is a lack of knowledge”

Jacques Bernoulli, Ars Conjectandi (1713)

1.1. Context

Today, geographic information is everywhere. With the constant development of new information and communication technologies, we are witnessing a significant increase in the number of sources of georeferenced data. Data are acquired by IT (information technology) means, such as connected objects, computers, mobile equipment, and through remote sensing, and are then processed in Geographic Information Systems (GISs). The increasing systematization of the automated acquisition of geographic data is paving the way for ever more numerous and complex applications.

In several fields, the terms “data” and “information” are quite often considered to be interchangeable. Yet, many authors distinguish between the concept of information and that of data [COO 17]. A piece of data corresponds to a value. It may be seen as the assignment of a value to a property, for example, City = “Paris”. Sometimes, the types of data are complex, as is the case for multimedia data. When data are processed, organized together, and structured in a precise context, we refer to them as information. In IT, knowledge often corresponds to rules and models that rely on information [BEL 04]. A knowledge base makes it possible, among other things, to reason and make deductions [ABI 00, NIL 90].
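To make the distinction concrete, the following minimal Python sketch separates the three levels. It is only an illustration under our own assumptions: the dictionary `located_in`, the function `containers`, and the transitivity rule are hypothetical names and choices, not taken from the cited references.

```python
# Data: a value assigned to a property (the example from the text).
datum = {"City": "Paris"}

# Information: data organized and structured in a precise context.
# Here, a hypothetical set of "is located in" facts.
located_in = {
    "Paris": "France",
    "France": "Europe",
    "Quebec City": "Canada",
    "Canada": "North America",
}


# Knowledge: a rule that relies on information and supports deduction.
# Assumed rule (transitivity): if X is located in Y and Y is located in Z,
# then X is located in Z.
def containers(place, facts):
    """Return, from nearest to farthest, every region deduced to contain `place`."""
    result = []
    seen = {place}
    current = place
    while current in facts and facts[current] not in seen:
        current = facts[current]
        seen.add(current)  # guard against cyclic facts
        result.append(current)
    return result


print(containers(datum["City"], located_in))
```

The fact "Paris is located in Europe" is not stored anywhere; it is deduced by applying the rule to the stored information, which is the role the text assigns to a knowledge base.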

Information and data may be geographic or spatial [BEA 19]. “Geographic” is the adjective used when we refer to the Earth. In the field on which this book focuses, the term “spatial” usually refers to a localization (coordinates, topology, etc.) in some type of space (whether geographic or not). A spatial or geographic object has a geometry (a dimension, a shape, some coordinates) that may be more or less known or established. Different properties may be assigned to the object depending on its meaning. The field that studies the methods and technologies linked to geographic information (from its acquisition to its dissemination) is called “geomatics”. The geomatic paradigm was born in Canada [BÉD 07].

Objects are often affected by imperfections. In the literature, various terms are used to refer to these imperfections, so it is difficult to put forward a single terminology. Depending on the point of view, the same term may be defined differently.

The imperfection of geographic information and data is often neglected, so using them occasionally involves risks [BÉD 86, EDO 15]. For example, these risks are significant when data are used to support decision-making. Imperfection often derives from a restriction that hinders the correct identification of an object and/or the accurate measurement of its properties [BÉD 86]. In most cases, a representation said to be certain is used even if the object has not been completely defined, so there is a difference between the object and its representation. Determining this difference is a difficult and intricate task. Expressing it presupposes an “actual world” independent of the observer, which is problematic [BÉD 86] because the objects of the actual world are in general perceived and known through observations. According to [FIS 99], the main problem concerns the way in which a data collector and a data user understand the nature of the uncertainties, which may be of different kinds.

As this book demonstrates, it is possible to avoid overlooking data imperfection. There are solutions that allow us to manipulate imperfect geographic data effectively. Over time, various specific techniques and methods have been put forward to define, represent, and deal with the imperfection of a geographic object. Each of them may be used in relation to the level of quality expected and the application targeted. As [EDO 15] and [BÉD 86] recall, using imperfect data may indeed be acceptable for some uses but not for others. This book aims to present some of the techniques and methods used to manage the imperfection of geographic data.

To give a (very general) idea of the trend in research on the uncertainty of spatial data, Table 1.1 and Figure 1.1 present the results of searches in Scopus, a bibliographic database of scientific publications. A 25-year interval (1994–2018) is considered. Column A indicates the number of scientific publications whose keywords include both “spatial data” and “uncertainty”. Column B shows the number of publications that include “spatial data” in their keywords. The last column shows the ratio A/B for each year; over the whole 25-year period, this ratio is 2.25% (279 out of 12,426 publications). The chosen terms, i.e. “spatial data” and “uncertainty”, are quite emblematic of the topic on which we focus. Yet, the Scopus searches could certainly be refined, especially by testing various keywords for concepts related to spatial data and information as well as to imprecision.
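The ratios can be recomputed directly from the counts. The short Python sketch below transcribes columns A and B of Table 1.1 and derives the yearly and overall ratios; the variable and function names (`counts`, `ratio_percent`) are our own and purely illustrative.

```python
# Counts transcribed from Table 1.1 (Scopus, January 30, 2019).
# year -> (A, B), where A = publications with both keywords,
# and B = publications with the keyword "spatial data".
counts = {
    1994: (2, 75), 1995: (0, 62), 1996: (1, 75), 1997: (3, 80),
    1998: (2, 73), 1999: (1, 68), 2000: (8, 168), 2001: (3, 131),
    2002: (2, 161), 2003: (7, 214), 2004: (6, 286), 2005: (15, 427),
    2006: (13, 456), 2007: (19, 504), 2008: (22, 770), 2009: (14, 1084),
    2010: (29, 1106), 2011: (25, 1001), 2012: (17, 923), 2013: (17, 835),
    2014: (12, 756), 2015: (14, 759), 2016: (20, 825), 2017: (17, 812),
    2018: (10, 775),
}


def ratio_percent(a, b):
    """Return A/B expressed as a percentage."""
    return 100.0 * a / b


# Yearly ratios (last column of the table).
for year, (a, b) in sorted(counts.items()):
    print(f"{year}: {ratio_percent(a, b):.2f}%")

# Overall ratio: total of A divided by total of B (last row of the table).
total_a = sum(a for a, _ in counts.values())
total_b = sum(b for _, b in counts.values())
overall = ratio_percent(total_a, total_b)
print(f"1994-2018: {total_a}/{total_b} = {overall:.2f}%")
```

Note that the overall figure is the ratio of the two totals, which weights each year by its number of publications; it is not the same as the unweighted mean of the 25 yearly ratios.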

In this chapter, we introduce the different parts of this book and the issues they tackle. We have chosen to structure this book into three parts: an introduction to the foundations and main concepts, a part on the modes of representation, and a description of reasoning systems and processes.

1.2. Concepts, representation, reasoning system, and data processing

1.2.1. Foundations and concepts

The first part describes the foundations and main concepts related to the imperfection of geographic data. The issue is to shed light on and provide a summary of terminologies, the origins of imperfections, as well as the concepts of quality, integrity, and confidence.

The main goal of Chapter 2 is to clarify the terminology and the definitions assigned to various concepts that revolve around the imperfection and uncertainty of geographic information. These terms have been used in different ways over the years. This chapter underlines some definitions that can be found in the field. The analysis put forward does not lead to a new terminology. Rather, it brings out the diversity of uses while also highlighting the main differences and similarities between the concepts and the terminologies.

Chapter 3 introduces the principal sources of imperfections. It attempts to answer the following question: “where do the imperfections of geographic data originate?”. Naturally, there is more than one answer to this question, since these imperfections have different causes. One of the aims of this chapter is to show and illustrate imperfection at various points during the life cycle of geographic information.

Chapter 4 provides a basic explanation of the quality and integrity of data. It recalls standard quality criteria as well as the way in which they are assessed. This chapter establishes the notions of data integrity and confidence, and it concretely illustrates the various problems related to these concepts through examples drawn from the field of maritime navigation.

1.2.2. Representations of imperfection

Part 2 tackles the main representations of imperfection and their applications for geographic information.

Chapter 5 describes various modeling formalisms, especially fuzzy sets and the means of representing confidence and certainty (probability, possibility, necessity, etc.). It also presents the operations used to manipulate these concepts and reveals how spatial entities like broad boundary objects and fuzzy objects can be modeled.
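As a rough indication of what possibility and necessity measures look like in practice, the following Python sketch uses the standard definitions from possibility theory on a finite set of values: the possibility of an event is the largest possibility degree among its values, and its necessity is one minus the possibility of the complementary event. This is our own minimal illustration, not the formalism of Chapter 5; the land-cover classes, the degrees in `pi`, and the function names are hypothetical.

```python
# Hypothetical possibility distribution for the land cover of one parcel.
# The degrees are illustrative only; a normalized distribution has at
# least one value with degree 1.
pi = {"forest": 1.0, "shrubland": 0.7, "meadow": 0.2, "water": 0.0}


def possibility(event, distribution):
    """Pi(A): largest possibility degree among the values in A (0 if A is empty)."""
    return max((distribution[x] for x in event), default=0.0)


def necessity(event, distribution):
    """N(A) = 1 - Pi(complement of A)."""
    complement = set(distribution) - set(event)
    return 1.0 - possibility(complement, distribution)


wooded = {"forest", "shrubland"}
print("possibility:", possibility(wooded, pi))
print("necessity:", necessity(wooded, pi))
```

With these illustrative degrees, the event “the parcel is wooded” is fully possible but not fully certain, because the value “meadow” remains possible to a small degree. The pair (necessity, possibility) brackets the confidence that can be placed in the event.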

Chapter 6 focuses on the representation of classes of objects. When several objects share the same properties and are of the same type, they can be grouped into classes. Elements of the same class share the same characteristics. Thus, establishing classes means identifying what the various entities have in common. Defining classes is very important when a dataset must be described. This chapter shows how data imperfection can be described when drawing class diagrams.

1.2.3. Reasoning systems and data processing

Part 3 introduces a few data processing and reasoning systems that involve spatial objects. Imperfection is considered in relation to our knowledge about the objects.

Chapter 7 concerns the spatial relations among objects. It reveals how it is possible to reason specifically about these relations and then move on to modeling these relations on imperfect objects such as broad boundary objects in space.

Chapter 8 deals with a type of knowledge that is founded on rules and deduction. This chapter provides rational approaches that employ a type of modeling based on the rules in first-order logic and then in modal logic. Modal logic can describe uncertainties. This chapter includes an example involving geographic data so that this approach can be understood.

Chapter 9 deals with the case involving the gradual acquisition of information and the repeated revision of the state of knowledge. This necessity is all the more significant as geographic data may be acquired in various ways over time and their sources may be heterogeneous depending on the case. This chapter focuses on the belief revision of imperfect information, especially Bayesian revisions and alternatives in nonprobabilistic formalisms.
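To give an idea of what a Bayesian revision involves, the following Python sketch updates a belief about the land cover of a parcel as two observations arrive from heterogeneous sources. It simply applies Bayes' rule (posterior proportional to prior times likelihood) and is not the method of Chapter 9; the classes, the sources, and all probability values are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Revise `prior` given P(observation | hypothesis) for each hypothesis."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observation is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Initial state of knowledge: no preference between the two classes.
belief = {"forest": 0.5, "meadow": 0.5}

# Observation 1 (assumed): a satellite classifier reports "forest".
# Hypothetical reliability: it reports "forest" for 80% of forests
# and for 30% of meadows.
belief = bayes_update(belief, {"forest": 0.8, "meadow": 0.3})

# Observation 2 (assumed): a later field survey also reports "forest",
# with hypothetical likelihoods 0.9 and 0.1.
belief = bayes_update(belief, {"forest": 0.9, "meadow": 0.1})

print({h: round(p, 2) for h, p in belief.items()})
```

Each new observation revises the current state of knowledge instead of replacing it, which is the behavior needed when geographic data are acquired gradually. The sketch assumes that the two observations are independent given the true class.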

Chapter 10 considers, at an operational level, how decision-support processes take account of uncertainties, representations, and perceptions of the territory. Choosing a suitable approach is difficult: the chapter tackles the potential variety of formalisms employed to deal with these aspects in relation to the types of management. It also highlights that the decision-support process may take a specific form when it is managed by an analyst/geographer and when it addresses multi-criteria approaches and risk analyses.

1.3. Some concluding remarks

This book aims to present an analysis of the imperfection of spatial information and its origins as well as to bring together descriptions of methods that can be useful for representing, reasoning about, and processing this information. In order to make its content more intuitive, each chapter provides illustrations drawn from various applications.

Table 1.1. Evolution of the appearance of the terms “spatial data” and “uncertainty” from 1994 to 2018 in the keywords of publications (Scopus – January 30, 2019)

Years  A: keywords “spatial data” “uncertainty”  B: keyword “spatial data”  A/B
1994   2                                         75                         2.67%
1995   0                                         62                         0.00%
1996   1                                         75                         1.33%
1997   3                                         80                         3.75%
1998   2                                         73                         2.74%
1999   1                                         68                         1.47%
2000   8                                         168                        4.76%
2001   3                                         131                        2.29%
2002   2                                         161                        1.24%
2003   7                                         214                        3.27%
2004   6                                         286                        2.10%
2005   15                                        427                        3.51%
2006   13                                        456                        2.85%
2007   19                                        504                        3.77%
2008   22                                        770                        2.86%
2009   14                                        1084                       1.29%
2010   29                                        1106                       2.62%
2011   25                                        1001                       2.50%
2012   17                                        923                        1.84%
2013   17                                        835                        2.04%
2014   12                                        756                        1.59%
2015   14                                        759                        1.84%
2016   20                                        825                        2.42%
2017   17                                        812                        2.09%
2018   10                                        775                        1.29%
Total  279                                       12426                      2.25%

Naturally, this work does not treat the topic exhaustively, but it allows us to take into consideration various points related to managing imperfection in geographic information. In our view, it is advisable to pay more attention to the imperfection of spatial data in order to make the future use of geographic information more reliable. The use of geographic data is likely to accelerate, so awareness of these issues will probably become more and more important in many areas of application.


Figure 1.1. A chart of the ratio A/B of Table 1.1

1.4. References

[ABI 00] ABITEBOUL S., HULL R., VIANU V., Fondements des bases de données, Vuibert, Paris, 2000.

[BEA 19] BEAL V., Spatial Data, available at: https://www.webopedia.com/TERM/S/spatial_data.html, 2019.

[BÉD 07] BÉDARD Y., “Geomatics. 26 years of history already!”, Geomatica, vol. 61, no. 3, pp. 269–272, 2007.

[BÉD 86] BÉDARD Y., “A study of data using a communication based conceptual framework of land information systems”, (updated version), The Canadian Surveyor, vol. 40, no. 4, pp. 449–460, 1986.

[BEL 04] BELLINGER G., CASTRO D., MILLS A., Data, Information, Knowledge, and Wisdom, available at: http://www.systems-thinking.org/dikw/dikw.htm, 2004.

[COO 17] COOPER P., “Data, information, knowledge and wisdom”, Anaesthesia & Intensive Care Medicine, vol. 18, no. 1, pp. 55–56, 2017.

[EDO 15] EDOH-ALOVE D.E.A., Handling spatial vagueness issues in SOLAP datacubes by introducing a risk-aware approach in their design, PhD thesis, Université Blaise Pascal – Clermont-Ferrand II, Université Laval Québec, available at: https://tel.archives-ouvertes.fr/tel-01875720, 2015.

[FIS 99] FISHER P.F., “Models of uncertainty in spatial data”, in LONGLEY P., GOODCHILD M., MAGUIRE D. et al. (eds), Geographical Information Systems: Principles, Techniques, Management and Applications, vol. 1, John Wiley & Sons, New York, 1999.

[NIL 90] NILSSON U., MATUSZYNSKI J., Logic, Programming and Prolog, Wiley, Chichester, 1990.

Chapter written by François PINET, Mireille BATTON-HUBERT and Eric DESJARDIN.