Table of Contents

Cover

Praise for Articulatory Phonetics

Title page

Copyright page

List of Figures

Acknowledgments

Introduction

Part I: Getting to Sounds

Chapter 1 The Speech System and Basic Anatomy

1.1 The Speech Chain

1.2 The Building Blocks of Articulatory Phonetics

1.3 The Tools of Articulatory Phonetics

Exercises

Chapter 2 Where It All Starts: The Central Nervous System

2.1 The Basic Units of the Nervous System

2.2 The Central Nervous System

2.3 Measuring the Brain: fMRI, PET, EEG, MEG, TMS

Exercises

Chapter 3 From Thought to Movement: The Peripheral Nervous System

3.1 The Peripheral Nervous System

3.2 How Muscles Move

3.3 Measuring Muscles: EMG

Exercises

Chapter 4 From Movement to Flow: Respiration

4.1 Breathing Basics

4.2 The Anatomy of Breathing

4.3 Measuring Airflow and Pressure: Pneumotachograph

4.4 Sounds

Exercises

Chapter 5 From Flow to Sound

5.1 Intrinsic Laryngeal Anatomy

5.2 Sounds: The Voice

5.3 Measuring the Vocal Folds: EGG

Exercises

Part II: Articulating Sounds

Chapter 6 Articulating Laryngeal Sounds

6.1 Extrinsic Laryngeal Anatomy

6.2 Sounds

6.3 Measuring Laryngeal Articulations: Endoscopy

Exercises

Chapter 7 Articulating Velic Sounds

7.1 Anatomy of the Velum

7.2 Sounds

7.3 Measuring the Velum: X-ray Video

Exercises

Chapter 8 Articulating Vowels

8.1 The Jaw and Extrinsic Tongue Muscles

8.2 Sounds: Vowels

8.3 Measuring Vowels: Ultrasound

Exercises

Chapter 9 Articulating Lingual Consonants

9.1 The Intrinsic Tongue Muscles

9.2 Sounds: Lingual Consonants

9.3 Measuring Lingual Consonants: Palatography and Linguography

Exercises

Chapter 10 Articulating Labial Sounds

10.1 Muscles of the Lips and Face

10.2 Sounds: Making Sense of [labial]

10.3 Measuring the Lips and Face: Point Tracking and Video

Exercises

Chapter 11 Putting Articulations Together

11.1 Coordinating Movements

11.2 Coordinating Complex Sounds

11.3 Coarticulation

11.4 Measuring the Whole Vocal Tract: Tomography

Exercises

Abbreviations Used in this Book

Muscles with Innervation, Origin, and Insertion

Index

Praise for Articulatory Phonetics

“Life has just become less lonely for Acoustic and Auditory Phonetics. Gick, Wilson, and Derrick have given us a marvelous addition to the classroom, providing an authoritative description of speech articulation, an insightful and balanced guide to the theory of cognitive control of speech, and a highly readable introduction to the methods used in articulatory phonetics. All students of phonetics should study this book!”

Keith Johnson, University of California, Berkeley

“Gick, Wilson, and Derrick offer an engaging, comprehensive introduction to how articulation works and how it is investigated in the laboratory. This textbook fills an important gap in our training of phoneticians and speech scientists.”

Patrice Beddor, University of Michigan

“A rich yet approachable source of phonetic information, this new text is well structured, well designed, and full of original diagrams.”

James Scobbie, Queen Margaret University

List of Figures

Figure 1.1 Feed-forward, auditory-only speech chain (image by W. Murphey and A. Yeung).
Figure 1.2 Multimodal speech chain with feedback loops (image by W. Murphey and A. Yeung).
Figure 1.3 Speech production chain; the first half (left) takes you through Part I of the book, and the second half (right) covers Part II (image by D. Derrick and W. Murphey).
Figure 1.4 Anatomy overview: full body (left), vocal tract (right) (image by D. Derrick).
Figure 1.5 Anatomical planes and spatial relationships: full body (left), vocal tract (right) (image by D. Derrick).
Figure 1.6a Measurement Tools for Articulatory Phonetics (image by D. Derrick).
Figure 1.6b Measurement Tools for Articulatory Phonetics (image by D. Derrick).
Figure 2.1 Central nervous system (CNS) versus peripheral nervous system (PNS): coronal view with sagittal head view (image by A. Klenin).
Figure 2.2 A myelinated neuron (image by D. Derrick).
Figure 2.3 An action potential and its chemical reactions (image by D. Derrick).
Figure 2.4a Gross anatomy of the brain: left side view of gyri, sulci, and lobes (image by A. Yeung).
Figure 2.4b Gross anatomy of the brain: top view (image by E. Komova).
Figure 2.5 The perisylvian language zone of the brain: left side view (image by D. Derrick and A. Yeung).
Figure 2.6 Motor cortex, somatosensory cortex, and visual cortex of the brain: left side view (image by D. Derrick and A. Yeung).
Figure 2.7 Sensory and motor homunculi: coronal view of brain (image adapted from Penfield and Rasmussen, 1950, Wikimedia Commons public domain).
Figure 2.8 Deeper structures of the brain: left side view (image by D. Derrick and A. Yeung).
Figure 2.9 Structural MRI image with fMRI overlay of areas of activation (in white): sagittal (top left), coronal (top right), transverse (bottom) (image by D. Derrick, with data from R. Watts).
Figure 3.1 Cranial nerves (left to right and top to bottom: Accessory, Vagus, Glossopharyngeal, Trigeminal, Hypoglossal, and Facial) (image by W. Murphey).
Figure 3.2 Spinal nerves: posterior view (image by W. Murphey and A. Yeung).
Figure 3.3 Muscle bundles (image by D. Derrick, from United States government public domain).
Figure 3.4 Motor unit and muscle fibers (image by D. Derrick).
Figure 3.5 Sliding filament model (image by D. Derrick).
Figure 3.6a EMG signal of the left sternocleidomastoid muscle during startle response. On the left is the raw signal. In the center is that same raw signal rectified. On the right is the rectified image after low-pass filtering (image by D. Derrick, from data provided by C. Chiu).
Figure 3.6b EMG signal of the left sternocleidomastoid muscle during startle response. On the left are 5 raw signals. The top-right image shows those 5 signals after they have been rectified and then averaged. The bottom-right image is after low-pass filtering (image by D. Derrick, from data provided by C. Chiu).
Figure 3.7 Reaction and response times based on EMG of the lower lip and lower lip displacement (image by C. Chiu and D. Derrick).
Figure 4.1 A bellows (based on public domain image by Pearson Scott Foresman).
Figure 4.2 Respiration volumes and graphs for tidal breathing, speech breathing, and maximum breathing (See Marieb and Hoehn, 2010) (image by D. Derrick).
Figure 4.3 Overview of the respiratory system: coronal cross-section (image by D. Derrick).
Figure 4.4 Collapsed lung: coronal cross-section (image by D. Derrick).
Figure 4.5a Bones and cartilages of respiration: anterior view (image by A. Yeung).
Figure 4.5b Superior view of vertebra and rib (with right side view of vertebra in the center) (image by D. Derrick).
Figure 4.6 Pump handle and bucket handle motions of the ribcage: anterior view (left), right side view (right) (image by D. Derrick, E. Komova, W. Murphey, and A. Yeung).
Figure 4.7 Diaphragm: anterior view (image by W. Murphey and A. Yeung).
Figure 4.8 Vectors of the EI and III muscles: anterior view (image by W. Murphey and A. Yeung).
Figure 4.9 Some accessory muscles of inspiration: posterior view (image by W. Murphey and A. Yeung).
Figure 4.10 II muscle vectors with EI muscle overlay: right side view (image by W. Murphey and A. Yeung).
Figure 4.11 Abdominal muscles: anterior view (image by W. Murphey).
Figure 4.12a Accessory muscles of expiration: deep accessory muscles, posterior view (image by W. Murphey and A. Yeung).
Figure 4.12b Accessory muscles of expiration: shallow accessory muscles, posterior view (image by W. Murphey and A. Yeung).
Figure 4.13 The respiratory cycle and muscle activation (image by D. Derrick, based in part on ideas in Ladefoged, 1967).
Figure 4.14 Acoustic waveform, oral airflow, oral air pressure, and nasal airflow (top to bottom) as measured by an airflow meter (image by D. Derrick).
Figure 5.1a Major cartilages of the larynx: anterior view (top left), posterior view (top right), right side view (bottom left), top view (bottom right) (image by D. Derrick).
Figure 5.1b Larynx: top view (image by A. Yeung).
Figure 5.2 Internal structure of a vocal fold: coronal cross-section (image by W. Murphey).
Figure 5.3 Coronal cross-section of the vocal folds and false vocal folds (image by W. Murphey).
Figure 5.4 Intrinsic muscles of the larynx: top view (left), posterior view (right) (image by W. Murphey and A. Yeung).
Figure 5.5 Intrinsic muscles of the larynx: right side view (image by E. Komova, based on Gray, 1918).
Figure 5.6 Schematic showing functions of the laryngeal muscles: top view of the larynx (image by D. Derrick).
Figure 5.7 Top view of the glottis (image by E. Komova, based on Rohen et al., 1998).
Figure 5.8 High-speed laryngograph images of modal voice cycle with typical EGG: top view. Note that the EGG image does not come from the same person, and has been added for illustrative purposes (image by D. Derrick, full video available on YouTube by tjduetsch).
Figure 5.9 Laminar (above) and turbulent (below) airflow (image by D. Derrick).
Figure 5.10 The Venturi tube (left); demonstrating the Bernoulli effect by blowing over a sheet of paper (right) (image by D. Derrick).
Figure 5.11 A schematic of vocal fold vibration: coronal view (image from Reetz and Jongman, 2008).
Figure 5.12 One-, two-, and three-mass models generate different predictions (dotted lines) for the motion of the vocal folds: coronal view (image by D. Derrick).
Figure 5.13 Flow separation theory: coronal view of the vocal folds (image from Reetz and Jongman, 2008).
Figure 5.14a Muscles of the larynx – pitch control: top view (image by D. Derrick and W. Murphey).
Figure 5.14b Muscles of the larynx – pitch control: right side view (image by D. Derrick and W. Murphey).
Figure 5.15 Comparison of glottal waveform from acoustics and EGG (image by D. Derrick).
Figure 5.16 Paper larynx (image by B. Gick).
Figure 6.1 Skull: right side view (on right), bottom view (on left) showing styloid and mastoid processes (image by A. Yeung).
Figure 6.2 Hyoid bone: isometric view (left), anterior view (right) (image by W. Murphey).
Figure 6.3 Pharyngeal constrictor muscles: posterior view (image by D. Derrick and W. Murphey).
Figure 6.4 Infrahyoid muscles: anterior view (image by W. Murphey).
Figure 6.5 Suprahyoid muscles – digastric and stylohyoid: right view (image by W. Murphey).
Figure 6.6 Suprahyoid muscles – mylohyoid and geniohyoid: posterior view (image by W. Murphey).
Figure 6.7 Pharyngeal elevator muscles. The pharyngeal constrictor muscles have been cut away on the left side to show deeper structures. Posterior view (image by D. Derrick and W. Murphey).
Figure 6.8 Breathy voice: one full cycle of vocal fold opening and closure, as seen via high-speed video. Note that the EGG image at the bottom does not come from the same person, and is added for illustrative purposes (image by D. Derrick, full video available on YouTube by tjduetsch).
Figure 6.9 Creaky voice versus modal voice – as seen via a laryngoscope: top view. Note that this speaker’s mechanism for producing creaky voice appears to engage the aryepiglottic folds. Note that the EGG image at the bottom does not come from the same person, and is added for illustrative purposes (image by D. Derrick, with data from Esling and Harris, 2005).
Figure 6.10 Vertical compressing and stretching of the true and false vocal folds, as the larynx is raised and lowered (image by D. Derrick and W. Murphey).
Figure 6.11 Voiceless aryepiglottic trill as seen via high-speed video (image by D. Derrick, with data from Moisik et al., 2010).
Figure 6.12 Falsetto: one full cycle of vocal fold opening and closure, as seen via high-speed video. Note that the EGG image at the bottom does not come from the same person, and is added for illustrative purposes (image by D. Derrick, full video available on YouTube by tjduetsch).
Figure 6.13 Timing of airflow and articulator motion for ingressive sounds: midsagittal cross-section (image by D. Derrick).
Figure 6.14 Timing of airflow and articulator motion for ejective sounds: midsagittal cross-section (image by D. Derrick).
Figure 6.15 Comparison of glottal waveforms from acoustics, PGG, and EGG. Note that the acoustic waveform and EGG data come from the same person but the PGG data were constructed for illustrative purposes (image by D. Derrick).
Figure 7.1 Velopharyngeal port (VPP) and the oropharyngeal isthmus (OPI): right view (see Kuehn and Azzam, 1978) (image by D. Derrick).
Figure 7.2 Skull: right side view (left), bottom view (right) (image by W. Murphey and A. Yeung).
Figure 7.3 Three types of cleft palate (incomplete, unilateral complete, and bilateral complete): bottom view (image from public domain: Felsir).
Figure 7.4 Muscles of the VPP: posterior isometric view (image by D. Derrick and N. Francis).
Figure 7.5 Some locations where sphincters have been described in the vocal tract (image by D. Derrick and B. Gick).
Figure 7.6a Various VPP closure mechanisms: midsagittal view (image by D. Derrick).
Figure 7.6b Various VPP closure mechanisms, superimposed on a transverse cross-section through the VPP (image by D. Derrick, inspired by Biavati et al., 2009).
Figure 7.7 The double archway of the OPI (image by D. Derrick).
Figure 7.8 X-ray film of two French speakers each producing a uvular “r” in a different way: right side view; note that the tongue, velum, and rear pharyngeal wall have been traced in these images (image by D. Derrick and N. Francis, with data from Munhall et al., 1995).
Figure 8.1 Traditional vowel quadrilateral (image from Creative Commons, Kwamikagami, based on International Phonetic Association, 1999).
Figure 8.2 A midsagittal circular tongue model with two degrees of freedom (image by D. Derrick).
Figure 8.3 Full jaw (top left); mandible anterior view (top right); right half of mandible – side view from inside (bottom) (image by W. Murphey).
Figure 8.4 The six degrees of freedom of a rigid body in 3D space (image by D. Derrick).
Figure 8.5 Shallow muscles of the jaw: right side view (image by E. Komova and A. Yeung).
Figure 8.6a Deep muscles of the jaw: left side view (image by E. Komova and A. Yeung).
Figure 8.6b Deep muscles of the jaw: bottom view (image by E. Komova and A. Yeung).
Figure 8.7 Mandible depressors (posterior view on left; anterior view on right) (image by W. Murphey).
Figure 8.8 Extrinsic tongue muscles: right side view. Geniohyoid and mylohyoid are included for context only (image by D. Derrick).
Figure 8.9 Regions of control of the genioglossus (GG) muscle: right midsagittal view (image by D. Derrick).
Figure 8.10 Overlay of tongue positions for three English vowels (high front [i], high back [u], and low back [ɑ]) from MRI data: right midsagittal view (image by D. Derrick).
Figure 8.11a Ultrasound images of tongue: right midsagittal B mode (image by D. Derrick).
Figure 8.11b Ultrasound images of tongue: right midsagittal B/M mode (image by D. Derrick).
Figure 9.1 Example of a hydrostat (left) dropped onto a hard surface – think “water balloon.” The sponge on the right loses volume (i.e., there is no expansion to the sides as it loses volume vertically), while the hydrostat on the left bulges out to the sides. The ball in the middle is partially hydrostatic – it also bulges out to the sides, but loses some of its volume vertically (image by D. Derrick).
Figure 9.2 Tongue muscles: coronal cross-section through the mid-tongue; the location of the cross-section is indicated by the vertical line through the sagittal tongue image at the bottom (image by D. Derrick, inspired by Strong, 1956).
Figure 9.3a Degrees of anterior tongue constriction for vowel, approximant, fricative, and stop: midsagittal (top) and coronal (bottom) cross-sections are shown; dotted lines indicate location of the coronal cross-sections (image by D. Derrick, W. Murphey, and A. Yeung).
Figure 9.3b Lateral constrictions with medial bracing for a lateral fricative (on the left) and medial constrictions with lateral bracing for schwa (on the right): transverse (top) and midsagittal (bottom) cross-sections are shown; dotted lines indicate location of the transverse cross-sections (image by D. Derrick, W. Murphey, and A. Yeung).
Figure 9.4a Overshoot in lingual stop production: right midsagittal view of vocal tract (image by D. Derrick).
Figure 9.4b Overshoot in flap production: right midsagittal view of vocal tract; the up-arrow and down-arrow tap symbols indicate an upward vs. downward flap motion, after Derrick and Gick (2011) (image by D. Derrick, with x-ray data from Cooper and Abramson, 1960).
Figure 9.5 An electropalate (top) and electropalatography data (bottom); black cells indicate tongue contact on the electropalate (image by D. Derrick).
Figure 9.6a Clay tongue model exercise (image by D. Derrick, B. Gick, and G. Carden).
Figure 9.6b Clay tongue model exercise (image by D. Derrick, B. Gick, and G. Carden).
Figure 10.1 The Tadoma method (image by D. Derrick).
Figure 10.2 Muscles of the lips and face: anterior coronal view (image by W. Murphey).
Figure 10.3 Orbicularis oris muscle: right midsagittal cross-section (left), coronal view (right) (image by D. Derrick).
Figure 10.4 Labial constrictions by type (stop, fricative, approximant). Midsagittal MRI images are shown above, and frontal video images below (image by D. Derrick).
Figure 10.5 Video images of lip constrictions in Norwegian (speaker: S. S. Johnsen).
Figure 10.6 Optotrak: example placement of nine infrared markers (image by D. Derrick).
Figure 10.7 Electromagnetic Articulometer (EMA) marker placement: right midsagittal cross-section (image by D. Derrick).
Figure 11.1 Light and dark English /l/ based on MRI images: right midsagittal cross-section; tracings are overlaid in lower image (image by D. Derrick).
Figure 11.2 Bunched vs. tip-up English “r”: right midsagittal cross-section; tracings are overlaid in lower image (image by D. Derrick).
Figure 11.3a Schematic of a click produced by Jenggu Rooi Fransisko, a speaker of Mangetti Dune !Xung: right midsagittal cross-section (image by D. Derrick, with data supplied by A. Miller).
Figure 11.3b Tracings of the tongue during a palatal click [ǂ] produced by Jenggu Rooi Fransisko, a speaker of Mangetti Dune !Xung: right midsagittal cross-section (image by D. Derrick, with data supplied by A. Miller).
Figure 11.3c Tracings of the tongue during an alveolar click [!] produced by Jenggu Rooi Fransisko, a speaker of Mangetti Dune !Xung: right midsagittal cross-section (image by D. Derrick, with data supplied by A. Miller).
Figure 11.4 Non-labialized schwa (top) versus labialized [w] (bottom): right midsagittal cross-section (image by D. Derrick).
Figure 11.5 Vocal tract slices (in black), showing cross-sectional area of the vocal tract during production of schwa (image by D. Derrick).
Figure 11.6 CT image of the vocal tract. The bright objects visible in the bottom-center and bottom-right are an ultrasound probe and a chinrest, respectively; this image was taken during an experiment that used simultaneous CT and ultrasound imaging: right midsagittal cross-section (image by D. Derrick).
Figure 11.7 MRI image of the vocal tract: right midsagittal cross-section (image by D. Derrick).

Acknowledgments

We owe thanks to many people without whom this book might not exist.

For the beautiful images, thanks to Chenhao Chiu, Anna Klenin, Ekaterina Komova, Naomi Francis, Andrea Yeung, and especially to Winifred Murphey. Thanks to J. W. Rohen, C. Yokochi, and E. Lütjen-Drecoll for their Color Atlas of Anatomy, the photographs from which inspired us, and to W. R. Zemlin for his excellent work, Speech and Hearing Science. A special thanks to the countless authors and creators who produced the publicly available articles and videos that have inspired us. These resources and the detailed sourcing they provided saved countless hours of effort on our part, and made this textbook better than it would have been otherwise.

There is a great deal of original data in this book for which we owe credit to many contributors. Thanks to Dr Sayoko Takano and Mr Ichiro Fujimoto at ATR (Kyoto, Japan) for the MRI images of Ian Wilson, to Dr John Houde at the University of California (San Francisco) for the MRI images of Donald Derrick, and to Dr Elaine Orpe at the University of British Columbia School of Dentistry for the CT images of Ian Wilson. We are grateful to Dr Sverre Stausland Johnsen for the image of Norwegian lip rounding in Chapter 10, to Dr Richard Watts for fMRI data used in Figure 2.9, and to Chenhao Chiu for EMG data used in Figure 3.6. Many thanks to Dr Amanda Miller for the click data upon which the click images in Chapter 11 are based. Special thanks to Jenggu Rooi Fransisko, speaker of Mangetti Dune !Xung, whose speech provided the information for those click images. We are grateful to Scott Moisik, Dr John Esling, and colleagues for the use of their high-speed laryngoscope data in Chapters 5 and 6, and to Dr Penelope Bacsfalvi for her helpful input on cleft palate speech in Chapter 7.

We are grateful to the phonetics students at the 2009 LSA Linguistic Institute at UC Berkeley, and to generations of Linguistics and Speech Science students at the University of British Columbia (Department of Linguistics) for patiently working with earlier, rougher versions of this book. In particular, thanks to the UBC Linguistics 316 students for helpful feedback, and many thanks to Taucha Gretzinger and Priscilla Riegel for detailed comments on the very early drafts of the textbook.

Thanks to Dr Guy Carden for teaching all of us about speech science and for developing the original clay tongue exercise we have modified for this text, and to Vocal Process (UK) for the idea of creating a paper larynx exercise. Special thanks to Chenhao Chiu for pulling together the chapter assignments from various sources and cleaning them up, and for contributions and suggestions throughout the text.

We couldn’t have done it without Danielle Descoteaux and Julia Kirk at Wiley-Blackwell, as well as the anonymous reviewers. This book is the result of a dynamic collaborative process; we the authors jointly reserve responsibility for all inconsistencies, oversights, errors, and omissions.

At many junctures in this book there simply were not satisfying answers to even quite basic questions. As such, we have done a good deal of original research to confirm or underscore many of the points made here. Part of this research was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Bryan Gick, by National Institutes of Health (NIH) Grant DC-02717 to Haskins Laboratories, and by Japan Society for the Promotion of Science (JSPS) “kakenhi” grant 19520355 to Ian Wilson.

Finally, to our families and loved ones who have sacrificed so many hours together and provided support in so many ways in the creation of this textbook … Thank you!

Introduction

The goal of this book is to provide a short, non-technical introduction to articulatory phonetics. We focus especially on (1) the basic anatomy and physiology of speech, (2) how different kinds of speech sounds are made, and (3) how to measure the vocal tract to learn about these speech sounds. This book was conceived of and written to function as a companion to Keith Johnson’s Acoustic and Auditory Phonetics (also published by Wiley-Blackwell). It is intended as a supplement or follow-up to a general introduction to phonetics or speech science for students of linguistic phonetics, speech science, and the psychology of speech.

Part I of this book, entitled “Getting to Sounds,” leads the reader through the speech production system up to the point where simple vocal sounds are produced. Chapter 1 introduces the speech chain and basic terms and concepts that will be useful in the rest of the book; this chapter also introduces anatomical terminology and an overview of tools used to measure anatomy. Chapters 2 and 3 walk the reader from thought to action, starting with the brain in Chapter 2, following through the peripheral nervous system and ending with muscle movement in Chapter 3. Chapter 4 continues from muscle action to airflow, describing respiratory anatomy and physiology. Chapter 5 moves from airflow to sound by describing laryngeal anatomy and physiology and introducing basic phonation.

Part II, entitled “Articulating Sounds,” continues through the speech system, introducing more anatomy and tools along the way, but giving more focus to particular sounds of speech. Chapter 6 introduces more advanced phonation types and airstream mechanisms, and describes the hyoid bone and supporting muscles. Chapter 7 introduces the nasopharynx, skull, and palate, and the sphincter mechanisms that allow the description of velic sounds. Chapter 8 describes how vowel sounds are made, introducing the jaw and jaw muscles, and the extrinsic muscles of the tongue, with special emphasis on hydrostatics and the inverse problem of speech. Chapter 9 describes how lingual consonant sounds are made, introducing the intrinsic muscles of the tongue and the concepts of ballistics, overshoot, and constriction degree and location. Chapter 10 covers labial sounds, introducing lip and face anatomy and the visual modality in speech. Chapter 11 wraps up by considering what happens when we combine the articulations discussed throughout the book. It starts by talking about context-sensitive versus context-invariant models of coordinating sounds, describes complex sounds including liquids and clicks, and finishes with coarticulation. At the very end of the book, there is a list of all abbreviations used, as well as a table of muscles with their innervations and attachment points. While this book follows a logical flow, it is possible to cover some parts in a different order. In particular, Chapter 2, which deals with the brain, is designed so that it can be read either in the order presented, or at the end of the book.

While there is very little math in this textbook, many of the questions and assignments at the end of each chapter and in the online material (www.wiley.com/go/articulatoryphonetics) require making measurements, and often require a basic knowledge of descriptive statistics and the use of t-tests. Students who study articulatory phonetics are strongly encouraged to study statistics for psychological or motor behavior experiments, as our field often requires the use of more complex statistics such as those offered in more advanced courses.
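
For readers who have not yet met a t-test, the short Python sketch below shows the general shape of such a comparison. The duration values are invented purely for illustration, and the scipy library is just one common way to run the test – it is not assumed anywhere else in this book.

```python
# A minimal sketch with made-up numbers: comparing two sets of hypothetical
# vowel-duration measurements (in milliseconds) with an independent-samples t-test.
from scipy import stats

stressed   = [142, 155, 149, 161, 150, 158, 147, 153]
unstressed = [118, 126, 131, 122, 129, 124, 133, 120]

t_stat, p_value = stats.ttest_ind(stressed, unstressed)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```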

One important note about this book: some traditions identify articulatory phonetics with a general description of how sounds are made, or with a focus on recognizing, producing or transcribing sounds using systems such as the International Phonetic Alphabet (IPA). We do not. Rather, this textbook sets out to give students the basic content and conceptual grounding they will need to understand how articulation works, and to navigate the kinds of research conducted by practitioners in the field of articulatory phonetics. IPA symbols are used throughout the text, and students are expected to use other sources to learn how to pronounce such sounds and transcribe acoustic data.


Semi-Related Stuff in Boxes
Our textbook includes semi-related topics in gray boxes. These boxes give us a place to point out some of the many interesting side notes relating to articulatory phonetics that we would otherwise not have the space to cover. Don’t be surprised if you find that the boxes contain some of the most interesting material in this book.

Part I

Getting to Sounds

Chapter 1

The Speech System and Basic Anatomy

Sound is movement. You can see or feel an object even if it – and everything around it – is perfectly still, but you can only hear an object when it moves. When things move, they sometimes create disturbances in the surrounding air that can, in turn, move the eardrum, giving us the sensation of hearing (Keith Johnson’s Acoustic and Auditory Phonetics discusses this topic in detail). In order to understand the sounds of speech (the central goal of phonetics as a whole), we must first understand how the different parts of the human body move to produce those sounds (the central goal of articulatory phonetics).

This chapter describes the roadmap we follow in this book, as well as some of the background basics you’ll need to know.

1.1 The Speech Chain

Traditionally, scientists have described the process of producing and perceiving speech in terms of a mostly feed-forward system, represented by a linear speech chain (Denes and Pinson, 1993). A feed-forward system is one in which a plan (in this case a speech plan) is constructed and carried out, without paying attention to the results. If you were to draw a map of a feed-forward system, all the arrows would go in one direction (see Figure 1.1).

Figure 1.1 Feed-forward, auditory-only speech chain

(image by W. Murphey and A. Yeung).

Thus, in a feed-forward speech chain model, a speaker’s thoughts are converted into linguistic representations, which are organized into vocal tract movements – articulations – that produce acoustic output. A listener can then pick up this acoustic signal through hearing, or audition, after which it is perceived by the brain, converted into abstract linguistic representations and, finally, meaning.

Although the simplicity of a feed-forward model is appealing, we know that producing speech is not strictly linear and unidirectional. Rather, when we speak, we are also constantly monitoring and adjusting what we’re doing as we move along the chain. We do this by using our senses to perceive what we are doing. This is called feedback. In a feedback system, control is based on observed results, rather than on a predetermined plan. The relationship between feed-forward and feedback control in speech is complex. Also, speech perception feedback is multimodal. That is, we use not just our sense of hearing when we perceive and produce speech, but all of our sense modalities – even some you may not have heard of before. Thus, while the speech chain as a whole is generally linear, each link in the chain – and each step in the process of speech communication – is a loop (see Figure 1.2). We can think of each link of the chain as a feedback loop.
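
If it helps to see the difference in miniature, the toy Python sketch below (our own illustration, not a model of real speech motor control) contrasts the two kinds of control: the feed-forward “speaker” executes its plan no matter what happens, while the feedback “speaker” corrects its command after perceiving each result.

```python
# Toy sketch with invented numbers: both controllers aim for a 200 Hz pitch,
# but some disturbance shifts every production down by 20 Hz.
TARGET = 200.0       # intended pitch (Hz)
DISTURBANCE = -20.0  # unexpected shift applied to every production

def feed_forward(steps=5):
    """Carry out the plan without monitoring the result."""
    command = TARGET
    return [command + DISTURBANCE for _ in range(steps)]

def feedback(steps=5, gain=0.5):
    """Adjust the command after perceiving each result (a feedback loop)."""
    command = TARGET
    produced = []
    for _ in range(steps):
        output = command + DISTURBANCE
        produced.append(output)
        command += gain * (TARGET - output)  # nudge the plan toward the target
    return produced

print(feed_forward())  # stays 20 Hz flat: [180.0, 180.0, 180.0, 180.0, 180.0]
print(feedback())      # converges: [180.0, 190.0, 195.0, 197.5, 198.75]
```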

Figure 1.2 Multimodal speech chain with feedback loops

(image by W. Murphey and A. Yeung).

Multimodality and Feedback
Speech production uses many different sensory mechanisms for feedback. The most commonly known feedback in speech is auditory feedback, though many senses are important in providing feedback in speech.
Speech is often thought of largely in terms of sound. Sound is indeed an efficient medium for sharing information: it can be disconnected from its source, can travel a long distance through and around objects, and so on. As such, sound is a powerful modality for communication. Likewise, auditory feedback from sound provides a speaker with a constant flow of feedback about his or her speech.
Speech can also be perceived visually, by watching movements of the face and body. However, because one cannot normally see oneself speaking, vision is of little use for providing speech feedback from one’s own articulators.
The tactile, or touch, senses can also be used to perceive speech. For example, perceivers are able to pick up vibrotactile and aero-tactile information from others’ vibrations and airflow, respectively. Tactile information from one’s own body can also be used as feedback. A related sense is the sense of proprioception (also known as kinesthetic sense), or the sense of body position and movement. The senses of touch and proprioception are often combined under the single term haptic (Greek, “grasp”).

1.1.1 The Speech Production Chain

Because this textbook is about articulatory phonetics, we’ll focus mainly on the first part of the speech chain, just up to where speech sounds leave the mouth. This part of the chain has been called the speech production chain (see Figure 1.3). For simplicity’s sake, this book will use a roadmap that follows along this feed-forward model of speech production, starting with the brain and moving in turn through the processes involved in making different speech sounds.

Figure 1.3 Speech production chain; the first half (left) takes you through Part I of the book, and the second half (right) covers Part II

(image by D. Derrick and W. Murphey).

This is often how we think of speech: our brains come up with a speech plan, which is then sent through our bodies as nerve impulses. These nerve impulses reach muscles, causing them to contract. Muscle movements expand and contract our lungs, allowing us to move air. This air moves through our vocal tract, which we can shape with more muscle movements. By changing the shape of our vocal tract, we can block or release airflow, create vibrations or turbulence, change frequencies or resonances, and so on, all of which produce different speech sounds. The sound, air, vibrations and movements we produce through these actions can then be perceived by ourselves (through feedback) or by other people as speech.

1.2 The Building Blocks of Articulatory Phonetics

The field of articulatory phonetics is all about the movements we make when we speak. So, in order to understand articulatory phonetics, you’ll need to learn a good deal of anatomy. Figure 1.4 shows an overview of speech production anatomy. The speech production chain begins with the brain and other parts of the nervous system, and continues with the respiratory system, composed of the ribcage, lungs, trachea, and all the supporting muscles. Above the trachea is the larynx, and above that the pharynx, which is divided into the laryngeal, oral, and nasal parts. The upper vocal tract includes the nasal passages, and also the oral passage, which includes structures of the mouth such as the tongue and palate. The oral passage opens to the teeth and lips. The face is also intricately connected to the rest of the vocal tract, and is an important part of the visual and tactile communication of speech.

Figure 1.4 Anatomy overview: full body (left), vocal tract (right)

(image by D. Derrick).

Scientists use many terms to describe anatomical structures, and anatomy diagrams often represent anatomical information along two-dimensional slices or planes (see Figure 1.5). A midsagittal plane divides a body down the middle into two halves: dextrad (Latin, “rightward”) and sinistrad (Latin, “leftward”). The two axes of the sagittal plane are (a) vertical and (b) anterior-posterior. Midsagittal slices run down the midline of the body and are the most common cross-sections seen in articulatory phonetics. Structures near this midline are called medial or mesial, and structures along the edge are called lateral.

Figure 1.5 Anatomical planes and spatial relationships: full body (left), vocal tract (right)

(image by D. Derrick).

Coronal slices cut the body into anterior (front) and posterior (back) parts. The two axes of the coronal plane are (a) vertical and (b) side-to-side.

The transverse plane is horizontal, and cuts a body into superior (top) and inferior (bottom) parts.

The direction of the head is cranial or cephalad, and the direction of the tail is caudal. Also, ventral refers to the belly, and dorsal refers to the back. So, for creatures like humans that stand in an upright position, ventral is equivalent to anterior, and dorsal is equivalent to posterior. There are also terms that refer to locations relative to a center point rather than planes. Areas closer to the trunk are called proximal, while areas away from the trunk, like hands and feet, are distal.

Finally, structures in the body can also be described in terms of depth, with superficial structures being nearer the skin surface, and deep structures being closer to the center of the body.

1.2.1 Materials in the Body

Anatomical structures are made up of several materials. Nerves make up the nervous system, and will be discussed in Chapters 2 and 3. As we are mostly interested in movement in this book, though, we’ll mainly be learning about bones and muscles.

The “hard parts” of the body are made up of bones and cartilages. Bony, or osseous material is the hardest. The skull, ribs, and vertebrae are all composed of bone. These bones form the support structure of the vocal tract. Cartilaginous or chondral (Greek, “cartilage, grain”) material is composed of semi-flexible material called cartilage. Cartilage is what makes up the stiff but flexible parts you can feel in your ears and nose. The larynx and ribcage contain several important cartilages for speech. Bones and cartilages are also the “hard parts” in the sense that you need to memorize their names, whereas most muscles are just named according to which hard parts they are attached to. For these reasons, we usually learn about the hard parts first, and then proceed to the muscles.

Muscles are made up of long strings of cells that have the specialized ability to contract (we’ll look in more detail at how muscles work in Chapter 3). The word “muscle” comes from the Latin musculus, meaning “little mouse” (when you look at your biceps, it’s not hard to imagine it’s a little mouse moving under your skin!). In this textbook we’ll study only striated, or skeletal, muscles. Striated muscles are often named by combining their origin, which is the larger, unmoving structure (usually a bone) to which they attach, and their insertion, which is usually the part that moves most when a muscle contracts. Many muscle names take the form “origin-insertion.” For example, the muscle that originates at the palate and inserts in the pharynx is called the “palatopharyngeus” muscle (this is a real muscle you’ll learn about later)! As you can see, if you know the origin and insertion of a muscle, then in many cases, you can guess its name.
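
The pattern is regular enough that you can treat it almost mechanically. The small Python sketch below is our own illustration; the stems and glosses are standard anatomical combining forms, and all three muscles it builds appear later in this book.

```python
# Building muscle names from "origin + insertion" combining forms.
stems = {
    "genio":      "chin (inner surface of the mandible)",
    "stylo":      "styloid process",
    "palato":     "palate",
    "glossus":    "tongue",
    "hyoid":      "hyoid bone",
    "pharyngeus": "pharynx",
}

# Real muscles, listed as (origin, insertion) pairs.
muscles = [("palato", "pharyngeus"), ("genio", "glossus"), ("stylo", "hyoid")]

for origin, insertion in muscles:
    name = origin + insertion
    print(f"{name:<18} runs from the {stems[origin]} to the {stems[insertion]}")
# palatopharyngeus   runs from the palate to the pharynx
# genioglossus       runs from the chin (inner surface of the mandible) to the tongue
# stylohyoid         runs from the styloid process to the hyoid bone
```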

Muscles seldom act alone. Most of the time, they interact in agonist-antagonist pairs. The agonist produces the main movement of an articulator, while the antagonist pulls in the opposite direction, lending control to the primary movement. Other muscles may also act as synergists. A synergist does not create movement, but lends stability to the system by preventing other unwanted motion.

Depending on whether a muscle is attached closer to or farther from a joint, it can have a higher or lower mechanical advantage. A muscle attached farther from a joint has a higher mechanical advantage, giving the muscle greater strength, but less speed and a smaller range of motion. A muscle attached closer to a joint has a lower mechanical advantage, reducing power but increasing speed and range of motion.
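
The trade-off is the familiar one for levers, and a rough calculation makes it concrete. The Python sketch below is our own back-of-the-envelope illustration with invented numbers: the joint acts as the fulcrum, the muscle pulls at some distance from it, and the load (say, resistance at the front teeth) sits farther out along the same rigid body.

```python
# Lever arithmetic: force delivered at the load scales with the muscle's distance
# from the joint, while the load's movement (for a fixed muscle shortening) scales
# with the inverse of that distance. All distances are in the same arbitrary units.

def lever(d_muscle, d_load=6.0, muscle_force=10.0, muscle_shortening=1.0):
    mechanical_advantage = d_muscle / d_load
    load_force = muscle_force * mechanical_advantage          # stronger with more leverage
    load_movement = muscle_shortening / mechanical_advantage  # but slower, smaller range
    return round(load_force, 2), round(load_movement, 2)

print(lever(d_muscle=2.0))  # attached near the joint: (3.33, 3.0)  weaker, bigger/faster movement
print(lever(d_muscle=4.0))  # attached farther out:    (6.67, 1.5)  stronger, smaller/slower movement
```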

1.3 The Tools of Articulatory Phonetics

Scientists interested in articulatory phonetics use a wide array of tools to track and measure the movements of articulators. Some of these tools are shown in Figures 1.6a and 1.6b, including examples of data obtained using each tool. Each tool has important advantages and disadvantages, including time and space resolution, subject comfort, availability and setup time, data storage and analysis, and expense. The issues related to each tool are discussed in depth in their respective chapters.

Figure 1.6a Measurement Tools for Articulatory Phonetics

(image by D. Derrick).

Figure 1.6b Measurement Tools for Articulatory Phonetics

(image by D. Derrick).

Decades ago, when graphical computer games first came out, the objects on the screen made choppy movements and looked blocky. Their movements were choppy because it took a long time for early computers to redraw images on the screen, indicating poor temporal resolution. Temporal resolution is a term for how often an event happens, such as how often a recording, or sample, is taken. The term “temporal resolution” is often interchangeable with “sampling rate,” and is often measured in samples per second, or Hertz (Hz). The “blocky” look was because only a few square pixels were used to represent an object, indicating low spatial resolution. From a measurement point of view, spatial resolution can be thought of as a term for how accurately you can identify or represent a specific location in space. Temporal and spatial resolution both draw on a computer’s memory, as one involves recording more detail in time and the other requires recording more detail in space. Because of this, temporal and spatial resolution normally trade off, such that when one increases, the other will often decrease.
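
One way to see why this trade-off arises is to treat recording capacity as a fixed budget. The Python sketch below (our own illustration, with arbitrary numbers) computes the highest frame rate that fits within a hypothetical recording bandwidth at two image sizes: doubling the spatial resolution in each dimension cuts the achievable sampling rate by a factor of four.

```python
# Hypothetical recording budget shared between spatial and temporal detail.
BUDGET_BYTES_PER_SEC = 30_000_000  # invented bandwidth/memory limit
BYTES_PER_PIXEL = 1                # 8-bit grayscale

def max_frame_rate(width, height):
    """Highest sampling rate (Hz) that fits the budget at this image size."""
    bytes_per_frame = width * height * BYTES_PER_PIXEL
    return BUDGET_BYTES_PER_SEC / bytes_per_frame

print(round(max_frame_rate(320, 240)))  # coarser images: ~391 frames per second
print(round(max_frame_rate(640, 480)))  # finer images:    ~98 frames per second
```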

Many of the data-recording tools we’ll look at in this book are ones that researchers use to measure anatomy relatively directly: imaging devices and point-tracking devices. Imaging