Table of Contents

Cover

Dedication

Title

Acknowledgments

1 Introduction: the Project

1.1. Characterizing a set of infinite size
1.2. Computers and linguistics
1.3. Levels of formalization
1.4. Not applicable
1.5. NLP applications
1.6. Linguistic formalisms: NooJ
1.7. Conclusion and structure of this book
1.8. Exercises
1.9. Internet links

PART 1: Linguistic Units

2 Formalizing the Alphabet
2.1. Bits and bytes
2.2. Digitizing information
2.3. Representing natural numbers
2.4. Encoding characters
2.5. Alphabetical order
2.6. Classification of characters
2.7. Conclusion
2.8. Exercises
2.9. Internet links
3 Defining Vocabulary
3.1. Multiple vocabularies and the evolution of vocabulary
3.2. Derivation
3.3. Atomic linguistic units (ALUs)
3.4. Multiword units versus analyzable sequences of simple words
3.5. Conclusion
3.6. Exercises
3.7. Internet links
4 Electronic Dictionaries
4.1. Could editorial dictionaries be reused?
4.2. LADL electronic dictionaries
4.3. Dubois and Dubois-Charlier electronic dictionaries
4.4. Specifications for the construction of an electronic dictionary
4.5. Conclusion
4.6. Exercises
4.7. Internet links

PART 2: Languages, Grammars and Machines

5 Languages, Grammars, and Machines
5.1. Definitions
5.2. Generative grammars
5.3. Chomsky-Schützenberger hierarchy
5.4. The NooJ approach
5.5. Conclusion
5.6. Exercises
5.7. Internet links
6 Regular Grammars
6.1. Regular expressions
6.2. Finite-state graphs
6.3. Non-deterministic and deterministic graphs
6.4. Minimal deterministic graphs
6.5. Kleene’s theorem
6.6. Regular expressions with outputs and finite-state transducers
6.7. Extensions of regular grammars
6.8. Conclusion
6.9. Exercises
6.10. Internet links
7 Context-Free Grammars
7.1. Recursion
7.2. Parse trees
7.3. Conclusion
7.4. Exercises
7.5. Internet links
8 Context-Sensitive Grammars
8.1. The NooJ approach
8.2. NooJ contextual constraints
8.3. NooJ variables
8.4. Conclusion
8.5. Exercises
8.6. Internet links
9 Unrestricted Grammars
9.1. Linguistic adequacy
9.2. Conclusion
9.3. Exercise
9.4. Internet links

PART 3: Automatic Linguistic Parsing

10 Text Annotation Structure
10.1. Parsing a text
10.2. Annotations
10.3. Text annotation structure (TAS)
10.4. Exercise
10.5. Internet links
11 Lexical Analysis
11.1. Tokenization
11.2. Word forms
11.3. Morphological analyses
11.4. Multiword unit recognition
11.5. Recognizing expressions
11.6. Conclusion
11.7. Exercise
12 Syntactic Analysis
12.1. Local grammars
12.2. Structural grammars
12.3. Conclusion
12.4. Exercises
12.5. Internet links
13 Transformational Analysis
13.1. Implementing transformations
13.2. Theoretical problems
13.3. Transformational analysis with NooJ
13.4. Question answering
13.5. Semantic analysis
13.6. Machine translation
13.7. Conclusion
13.8. Exercises
13.9. Internet links

Conclusion

Bibliography

Index

End User License Agreement

List of Illustrations

1 Introduction: the Project

Figure 1.1. The number of any set of sentences can be doubled
Figure 1.2. Really?
Figure 1.3. Vietnamese–English translation with Google Translate
Figure 1.4. Translation with Google Translate vs. with NooJ
Figure 1.5. Article from the newspaper Le Monde (October 2014) translated with Google Translate
Figure 1.6. Extract from Penn Treebank
Figure 1.7. A single tool for formalization: NooJ

2 Formalizing the Alphabet

Figure 2.1. Two electrical states: a light bulb turned on or off
Figure 2.2. Representation of numbers in binary notation
Figure 2.3. Extract from the Unicode table
Figure 2.4. One possible encoding of the Latin alphabet
Figure 2.5. ASCII encoding
Figure 2.6. Accented Latin letters
Figure 2.7. Character encoding is still problematic as of late 2015
Figure 2.8. Unicode representation of the character “é”
Figure 2.9. A Chinese character that has no Unicode code
Figure 2.10. One Chinese character has three Unicode codes
Figure 2.11. Four graphical variants for a single Unicode code

3 Defining Vocabulary

Figure 3.1. Phablet, Bushism, Chipotlification, tocoupify

4 Electronic Dictionaries

Figure 4.1. Analysis of the lexical entry “artisan”
Figure 4.2. A lexicon-grammar table for English verbs
Figure 4.3. Lexicon-grammar table for phrasal verbs
Figure 4.4. Extract from DELAC dictionary (Nouns)
Figure 4.5. Le Dictionnaire électronique des mots
Figure 4.6. Les Verbes Français dictionary
Figure 4.7. T grammar of constructions
Figure 4.8. Occurrences of the verb “abriter” in a direct transitive construction (T)
Figure 4.9. Occurrences of the verb “abriter” in a pronominal construction (P)

5 Languages, Grammars, and Machines

Figure 5.1. A generative grammar
Figure 5.2. Generation of the sentence “the cat sees a dog”
Figure 5.3. Chomsky-Schützenberger hierarchy

6 Regular Grammars

Figure 6.1. Applying a regular expression to a text
Figure 6.2. Display of a graph using XFST
Figure 6.3. Informal time
Figure 6.4. A non-deterministic graph
Figure 6.5. A deterministic graph
Figure 6.6. A minimal graph
Figure 6.7. Five basic graphs
Figure 6.8. Disjunction and Kleene operator
Figure 6.9. Graph equivalent to a regular expression
Figure 6.10. A finite-state graph
Figure 6.11. Incorporating the node “red”
Figure 6.12. Incorporating the node “pretty”
Figure 6.13. Completing the node “very”
Figure 6.14. Final graph
Figure 6.15. A spelling transducer
Figure 6.16. A terminological transducer
Figure 6.17. A morphological transducer
Figure 6.18. A transducer for translation
Figure 6.19. A query containing syntactic symbols
Figure 6.20. The operator +ONE

7 Context-Free Grammars

Figure 7.1. A NooJ context-free grammar
Figure 7.2. A context-free grammar with syntactic symbols
Figure 7.3. Recursive graph
Figure 7.4. A more general grammar
Figure 7.5. A recursive context-free grammar
Figure 7.6. Right recursive grammar
Figure 7.7. Finite-state graph equivalent to a right-recursive context-free grammar
Figure 7.8. Left recursive grammar
Figure 7.9. Finite-state graph equivalent to a left-recursive context-free grammar
Figure 7.10. Middle recursion
Figure 7.11. An ambiguous grammar
Figure 7.12. First parse tree for the ambiguous sentence: This man sees the chair from his house
Figure 7.13. Second derivation for the sentence: This man sees the chair from his house

8 Context-Sensitive Grammars

Figure 8.1. Context-sensitive grammar for the language aⁿbⁿcⁿ
Figure 8.2. NooJ grammar for the language aⁿbⁿcⁿ
Figure 8.3. NooJ grammar that recognizes the language aⁿbⁿcⁿdⁿeⁿ
Figure 8.4. Grammar of language a^2ⁿ
Figure 8.5. Grammar that recognizes reduplications
Figure 8.6. A German finite-state graph to describe agreement in gender, number and case
Figure 8.7. Agreement with constraints
Figure 8.8. Morphological context-sensitive grammar
Figure 8.9. Checking the presence of a question mark
Figure 8.10. Setting a variable
Figure 8.11. Inheritance: $N → $NPH

9 Unrestricted Grammars

Figure 9.1. Unrestricted grammar
Figure 9.2. NooJ unrestricted grammar
Figure 9.3. Respectively

10 Text Annotation Structure

Figure 10.1. Annotations for the ambiguous sequence “black box”
Figure 10.2. The two terms “big screen” and “screen star” overlap
Figure 10.3. Annotating the contracted form “cannot”
Figure 10.4. Annotating the phrasal verb “call back”
Figure 10.5. A TAS right after the lexical analysis

11 Lexical Analysis

Figure 11.1. Ambiguity triggered by the lack of vowels
Figure 11.2. Hebrew and Latin alphabets together in the same text
Figure 11.3. Itogi Weekly no. 40, October 3rd 2011
Figure 11.4. Transliteration variants
Figure 11.5. Contractions
Figure 11.6. Contractions of “not”
Figure 11.7. Prefixes
Figure 11.8. Numerical determinants
Figure 11.9. Multiple solutions for breaking down a Chinese text
Figure 11.10. Intonation in Armenian
Figure 11.11. Recognizing US phone numbers
Figure 11.12. Roman numerals
Figure 11.13. Paradigm TABLE
Figure 11.14. Inflection codes used in the English NooJ module
Figure 11.15. Paradigm HELP
Figure 11.16. Paradigm for KNOW
Figure 11.17. Morphological operators
Figure 11.19. Paradigm NN
Figure 11.20. France and its derived forms
Figure 11.21. Dictionary produced automatically from a morphological grammar
Figure 11.22. A productive morphological rule
Figure 11.23. Description of Spanish clitics (infinitive form)
Figure 11.24. Agglutination in German
Figure 11.25. A family of terms
Figure 11.26. Checking context for the characteristic constituent
Figure 11.27. Checking context, v2
Figure 11.28. Checking context, v3
Figure 11.29. Annotate phrasal verbs
Figure 11.30. Discontinuous annotation in the TAS

12 Syntactic Analysis

Figure 12.1. A local grammar for common email addresses
Figure 12.2. Graph “on the 3rd of June”
Figure 12.3. Graph “at seven o’clock”
Figure 12.4. Date grammar
Figure 12.5. A syntactic annotation in TAS
Figure 12.6. Grammar of preverbal particles in French
Figure 12.7. Detecting ambiguities in the word form “this”
Figure 12.8. A syntax tree
Figure 12.9. Structure of a sentence that contains a discontinuous expression
Figure 12.10. A grammar produces structured annotations
Figure 12.11. A structured group of syntactic annotations
Figure 12.12. Syntactic analysis of a lexically ambiguous sentence
Figure 12.13. Analyzing a structurally ambiguous sentence
Figure 12.14. Simplified grammar
Figure 12.15. Another parse tree for a simplified grammar
Figure 12.16. Parse tree for a structured grammar
Figure 12.17. Dependency grammar
Figure 12.18. Dependency tree
Figure 12.19. ALUs in the syntax tree

13 Transformational Analysis

Figure 13.1. The sentence Joe loves Lea is transformed automatically
Figure 13.2. Passive
Figure 13.3. Negation
Figure 13.4. Making the subject into a pronoun
Figure 13.5. A few elementary transformations
Figure 13.6. The operation [Passive-inv]
Figure 13.7. A transformation chain
Figure 13.8. Grammar for declarative transitive sentences
Figure 13.9. Grammar used in mixed “analysis + generation” mode
Figure 13.10. Linking complex sentences to their transformational properties
Figure 13.11. Automatic transformation
Figure 13.12. Simple French → English translation
Figure 13.13. Translation changing the word order
Figure 13.14. Translation with constraints

C, D

characteristic constituent

Chinese character

Chomsky-Schützenberger hierarchy

composite character

compound noun

conjugation

context-free grammar (CFG)

context-sensitive grammar (CSG)

contextual constraint

contraction

corpus linguistics

dash

decimal notation

delimiter

dependency tree

descriptive linguistics

Dictionnaire électronique des mots (DEM)

digitization

discontinuous annotation

distributional class

Formalizing Natural Languages

The NooJ Approach

WILEY END USER LICENSE AGREEMENT
