
Series Editor
Patrick Paroubek

Natural Language Processing and Computational Linguistics 2

Semantics, Discourse and Applications

Mohamed Zakaria Kurdi

Wiley Logo

Introduction

Language is a central tool in our social and professional lives. It is a means of conveying ideas, information, opinions and emotions, as well as of persuading, requesting information, giving orders, etc. Interest in language from a computer science perspective is as old as computer science itself, notably in the context of work on artificial intelligence. The Turing test, one of the first tests devised to determine whether a machine is intelligent, stipulates that to be considered intelligent, a machine must have conversational abilities comparable to those of a human [TUR 50]. An intelligent machine must therefore be capable of comprehension and generation, in the broad sense of these terms, hence the interest in natural language processing (NLP) from the dawn of the computer age. Historically, the computer processing of languages was very quickly directed toward applied domains such as machine translation (MT), in the context of the Cold War. In this context, the first MT system was created as part of a joint project between Georgetown University and IBM in the United States [DOS 55, HUT 04]. These early applied efforts were less successful than intended, and researchers quickly realized that a deep understanding of the linguistic system was a prerequisite for any successful application.

The growth of the Internet between the mid-1990s and the early 2000s was a major driving force for NLP and related domains. This was notably the case for information retrieval, which went from a marginal domain, limited to searching within a large company, to retrieval on the scale of the Internet, whose content is constantly expanding. This increased availability of data also favored a discipline that was then in its infancy: Data Science. Located at the intersection of statistics, computer science and mathematics, Data Science focuses on the analysis, visualization and processing of digital data in all its forms: images, text and speech. The role of NLP within Data Science is clear, given that most of the information processed is contained in written documents or speech recordings. Within NLP itself, two different but complementary research approaches can be distinguished. On the one hand, some works aim to solve the fundamental problem of language processing and are consequently concerned with its cognitive and linguistic aspects. On the other hand, many works are dedicated to optimizing and adapting existing NLP techniques for various applied domains, such as the medical or banking sectors.

The objective of this book is to provide a comprehensive review of classic and modern works in the domains of lexical databases and knowledge representation for NLP, semantics, discourse analysis, and NLP applications such as machine translation and information retrieval. The book also aims to be profoundly interdisciplinary, giving equal consideration, as far as possible, to linguistic and cognitive models, algorithms and computer applications. We start from the premise, borne out time and again in NLP and elsewhere, that the best results come from a good theory paired with a well-designed empirical approach.

In addition to this Introduction, the book has four chapters. The first chapter concerns the lexicon and knowledge representation. After an introduction to the principles of lexical semantics and theories of lexical meaning, it covers lexical databases, the main methods for representing knowledge, and ontologies. The second chapter is dedicated to semantics. It first presents the main approaches in combinatorial semantics, such as interpretive semantics, generative semantics and case grammar, and then turns to the logical approaches to formal semantics used in NLP. The third chapter focuses on discourse. It covers the fundamental concepts of discourse analysis, such as utterance production, thematic progression, information structuring, coherence and cohesion. It also presents different approaches to discourse processing, such as linear segmentation, discourse analysis and interpretation, and anaphora resolution. The fourth and final chapter is dedicated to NLP applications. It first presents the fundamental aspects of NLP systems, such as software architecture and evaluation approaches, and then reviews some particularly important NLP applications, such as machine translation, information retrieval and information extraction.