Speech Production


  • Laura Docio-Fernandez 3 &
  • Carmen García Mateo 4  


Synonyms: Sound generation; Speech system

Speech production is the process of uttering articulated sounds or words, i.e., how humans generate meaningful speech. It is a complex feedback process in which hearing, perception, and information processing in the nervous system and the brain are also involved.

Speaking is in essence the by-product of a necessary bodily process, the expulsion from the lungs of air charged with carbon dioxide after it has fulfilled its function in respiration. Most of the time, one breathes out silently; but it is possible, by contracting and relaxing the vocal tract, to change the characteristics of the air expelled from the lungs.

Introduction

Speech is one of the most natural forms of communication for human beings. Researchers in speech technology are developing systems that can understand speech and speak with human users.

Human-computer interaction is a discipline concerned with the design, evaluation, and implementation...


References

T. Hewett, R. Baecker, S. Card, T. Carey, J. Gasen, M. Mantei, G. Perlman, G. Strong, W. Verplank, Chapter 2: Human-computer interaction, in ACM SIGCHI Curricula for Human-Computer Interaction, ed. by B. Hefley (ACM, 2007)

G. Fant, Acoustic Theory of Speech Production, 1st edn. (Mouton, The Hague, 1960)

G. Fant, Glottal flow: models and interaction. J. Phon. 14, 393–399 (1986)

R.D. Kent, S.G. Adams, G.S. Turner, Models of speech production, in Principles of Experimental Phonetics, ed. by N.J. Lass (Mosby, St. Louis, 1996), pp. 2–45

T.L. Burrows, Speech Processing with Linear and Neural Network Models (1996)

J.R. Deller, J.G. Proakis, J.H.L. Hansen, Discrete-Time Processing of Speech Signals, 1st edn. (Macmillan, New York, 1993)

Author information

Authors and Affiliations

Department of Signal Theory and Communications, University of Vigo, Vigo, Spain

Laura Docio-Fernandez

Atlantic Research Center for Information and Communication Technologies, University of Vigo, Pontevedra, Spain

Carmen García Mateo



Docio-Fernandez, L., García Mateo, C. (2015). Speech Production. In: Li, S.Z., Jain, A.K. (eds) Encyclopedia of Biometrics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7488-4_199



Overview of Speech Production and Speech Mechanism

by BASLPCOURSE.COM

Communication is a fundamental aspect of human interaction, and speech production is at the heart of this process. Behind every spoken word lies a series of intricate steps that allow us to convey our thoughts and ideas effectively. Speech production involves three essential levels: conceptualization, formulation, and articulation. In this article, we will explore each level and see how each contributes to the seamless flow of communication.

Overview of Speech Production  

Speech production operates at three levels:

  • Conceptualization
  • Formulation
  • Articulation

Speech production is a remarkable process that involves multiple intricate levels. From the initial conceptualization of ideas to their formulation into linguistic forms and the precise articulation of sounds, each stage plays a vital role in effective communication. Understanding these levels helps us appreciate the complexity of human speech and the incredible coordination between the brain and the vocal tract. By honing our speech production skills, we can become more effective communicators and forge stronger connections with others.


Steps of Speech Production

Conceptualization is the first level of speech production, where ideas and thoughts are born in the mind. At this stage, a person identifies the message they want to convey, decides on the key points, and organizes the information in a coherent manner. This process is highly cognitive and involves accessing knowledge, memories, and emotions related to the topic.  

During conceptualization, the brain’s language centers, such as Broca’s area and Wernicke’s area, play a crucial role. Broca’s area is involved in the planning and sequencing of speech, while Wernicke’s area is responsible for understanding and accessing linguistic information.

For example, when preparing to give a presentation, the conceptualization phase involves structuring the content logically, identifying the main ideas, and determining the tone and purpose of the speech.  

The formulation stage follows conceptualization and involves transforming abstract thoughts and ideas into linguistic forms. In this stage, the brain converts the intended message into grammatically correct sentences and phrases. The formulation process requires selecting appropriate words, arranging them in a meaningful sequence, and applying the rules of grammar and syntax.  

At the formulation level, the brain engages the motor cortex and the areas responsible for language production. These regions work together to plan the motor movements required for speech.  

During formulation, individuals may face challenges, such as word-finding difficulties or grammatical errors. However, with practice and language exposure, these difficulties can be minimized.  

Continuing with the previous example of a presentation, during the formulation phase, the speaker translates the organized ideas into spoken language, ensuring that the sentences are clear and coherent.  

Articulation is the final level of speech production, where the formulated linguistic message is physically produced and delivered. This stage involves the precise coordination of the articulatory organs, such as the tongue, lips, jaw, and vocal cords, to create the specific sounds and speech patterns of the chosen language.  

Smooth and accurate articulation is essential for clear communication. Proper articulation ensures that speech sounds are recognizable and intelligible to the listener. Articulation difficulties can lead to mispronunciations or speech disorders, impacting effective communication.

In the articulation phase, the motor cortex sends signals to the speech muscles, guiding their movements to produce the intended sounds. The brain continuously monitors and adjusts these movements to maintain the fluency of speech.  

For instance, during the presentation, the speaker’s articulation comes into play as they deliver each sentence, ensuring that their words are pronounced correctly and clearly.  
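To make the three levels concrete, here is a minimal Python sketch of the pipeline. It is purely illustrative: every function name and data structure below is invented for this example and does not correspond to any real psycholinguistic model.

```python
# Illustrative only: a toy pipeline mirroring the three levels above.
def conceptualize(idea):
    """Conceptualization: settle on the message and its key points."""
    return {"message": idea, "key_points": [p.strip() for p in idea.split(",")]}

def formulate(concept):
    """Formulation: map the preverbal message onto words and grammar."""
    return "Today I will talk about " + " and ".join(concept["key_points"]) + "."

def articulate(sentence):
    """Articulation: stand-in for motor execution; here, just segmentation."""
    return sentence.lower().rstrip(".").split()

plan = conceptualize("breathing, phonation, articulation")
print(articulate(formulate(plan)))
```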

Overview of Speech Mechanism


Speech Sub-system

The speech mechanism is a complex and intricate process that enables us to produce and comprehend speech. It involves a coordinated effort of several subsystems working together seamlessly. Five subsystems carry out the speech mechanism:

  • Respiratory System
  • Phonatory System
  • Resonatory System
  • Articulatory System
  • Regulatory System

I. Respiratory System

Respiration: The Foundation of Speech

Speech begins with respiration, where the lungs provide the necessary airflow. The diaphragm and intercostal muscles play a crucial role in controlling the breath, facilitating the production of speech sounds.

II. Phonatory System

Phonation: Generating the Sound Source

Phonation refers to the production of sound through the vibration of the vocal folds in the larynx. As air from the lungs passes through the vocal folds, they rapidly open and close, generating vibrations that create the fundamental frequency of speech. The resulting sound waves then resonate in the vocal tract, where they are shaped into distinct speech sounds.

The Importance of Phonation in Speech Production  

Phonation is a fundamental aspect of speech production as it forms the basis for vocalization. The process allows us to articulate various speech sounds, control pitch, and modulate our voices to convey emotions and meaning effectively.  

Mechanism of Phonation


Vocal Fold Structure  

To understand phonation better, we must examine the structure of the vocal folds. The vocal folds, also known as vocal cords, are situated in the larynx (voice box) and are composed of elastic tissues. The larynx contains two pairs of folds; the lower pair, the true vocal folds, is responsible for phonation.

The Process of Phonation  

The process of phonation involves a series of coordinated movements. When we exhale, air is expelled from the lungs, causing the vocal folds to close partially. The buildup of air pressure beneath the closed vocal folds causes them to be pushed open, releasing a burst of air. As the air escapes, the vocal folds quickly close again, repeating the cycle of vibrations, which results in a continuous sound stream during speech.
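This open-close cycle can be caricatured in a few lines of numpy: phonation as a train of glottal pulses, one per cycle. This is a sketch under simplifying assumptions (an idealized impulse per cycle; the sampling rate and fundamental frequency are chosen only for illustration):

```python
import numpy as np

fs = 16000                     # sampling rate, Hz (assumed)
f0 = 120                       # vocal fold vibration rate, Hz (illustrative)
n = int(fs * 0.5)              # half a second of voicing

# One burst of air per glottal cycle: an impulse every fs/f0 samples.
source = np.zeros(n)
source[::int(fs / f0)] = 1.0   # the repeated opening/closing of the folds
```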

III. Resonatory System

Resonance: Amplifying the Sound  

The sound produced in the larynx travels through the pharynx, oral cavity, and nasal cavity, where resonance occurs. This amplification process adds richness and depth to the speech sounds.
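One classic way to see this amplification at work is the source-filter idea (cf. Fant's acoustic theory of speech production): pass a glottal pulse train through resonators tuned near formant frequencies. The sketch below is illustrative; the formant frequencies and bandwidths are rough assumptions, not measurements.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000
source = np.zeros(8000)
source[::133] = 1.0            # ~120 Hz glottal pulse train (see sketch above)

def resonator(x, freq, bw):
    """Second-order IIR resonance at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / fs)
    w = 2 * np.pi * freq / fs
    return lfilter([1.0 - r], [1.0, -2.0 * r * np.cos(w), r * r], x)

# Cascade two resonances, roughly where a neutral vowel's formants sit.
vowel = resonator(resonator(source, 500, 80), 1500, 90)
```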

IV. Articulatory System

Articulation: Shaping Speech Sounds  

Articulation involves the precise movements of the tongue, lips, jaw, and soft palate to shape the sound into recognizable speech sounds or phonemes.  

When we speak, our brain sends signals to the muscles responsible for controlling these speech organs, guiding them to produce different articulatory configurations that result in distinct sounds. For example, to form the sound of the letter “t,” the tongue makes contact with the alveolar ridge (the ridge behind the upper front teeth), momentarily blocking the airflow before releasing it to create the characteristic “t” sound.  

The articulation process is highly complex and allows us to produce a vast array of speech sounds, enabling effective communication. Different languages use different sets of speech sounds, and variations in articulation lead to various accents and dialects.  

Efficient articulation is essential for clear and intelligible speech, and any impairment or deviation in the articulatory process can result in speech disorders or difficulties. Speech therapists often work with individuals who have articulation problems to help them improve their speech and communication skills. Understanding the mechanisms of articulation is crucial in studying linguistics, phonetics, and the science of speech production.  

Articulators are the organs and structures within the vocal tract that are involved in shaping the airflow to produce specific sounds. Here are some of the main articulators and the sounds they help create:


Tongue:

  • The tongue is one of the most versatile articulators and plays a significant role in shaping speech sounds.
  • It can move forward and backward, up and down, and touch various parts of the mouth to produce different sounds.
  • For example, the tip of the tongue is involved in producing sounds like “t,” “d,” “n,” and “l,” while the back of the tongue is used for sounds like “k,” “g,” and “ng.”

Lips:

  • The lips are essential for producing labial sounds, which involve the use of the lips to shape the airflow.
  • Sounds like “p,” “b,” “m,” “f,” and “v” are all labial sounds, where the lips either close or come close together during articulation.

Teeth:

  • The teeth are involved in producing sounds like “th” as in “think” and “this.”
  • In these sounds, the tip of the tongue is placed against the upper front teeth, creating a unique airflow pattern.

Alveolar Ridge:

  • The alveolar ridge is a small ridge just behind the upper front teeth.
  • Sounds like “t,” “d,” “s,” “z,” “n,” and “l” involve the tongue making contact with or near the alveolar ridge.

Palate:

  • The palate, also known as the roof of the mouth, plays a role in producing sounds like “sh” and “ch.”
  • These sounds, known as postalveolar or palato-alveolar sounds, involve the tongue articulating against the area just behind the alveolar ridge.

Velum (Soft Palate):

  • The velum is the soft part at the back of the mouth.
  • It is raised to close off the nasal cavity during the production of non-nasal sounds like “p,” “b,” “t,” and “d” and lowered to allow airflow through the nose for nasal sounds like “m,” “n,” and “ng.”

Glottis:

  • The glottis is the space between the vocal cords in the larynx.
  • It plays a role in producing sounds like “h,” where the vocal cords remain open, allowing the airflow to pass through without obstruction.

By combining the movements and positions of these articulators, we can produce the vast range of speech sounds used in different languages around the world. Understanding the role of articulators is fundamental to the study of phonetics and speech production.
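As a compact summary of the examples above, here is a small lookup table pairing each articulator (place of articulation) with the sounds the article associates with it. It is a simplified sketch; strictly speaking, “f” and “v” are labiodental rather than purely labial.

```python
# Simplified place-of-articulation table built from the examples above.
PLACES = {
    "bilabial (lips)":         ["p", "b", "m"],
    "labiodental (lip/teeth)": ["f", "v"],
    "dental (teeth)":          ["th"],
    "alveolar (ridge)":        ["t", "d", "s", "z", "n", "l"],
    "postalveolar (palate)":   ["sh", "ch"],
    "velar (soft palate)":     ["k", "g", "ng"],
    "glottal (glottis)":       ["h"],
}

def place_of(sound):
    """Return the place of articulation for a sound, if listed."""
    return next((p for p, ss in PLACES.items() if sound in ss), None)

print(place_of("t"))    # alveolar (ridge)
print(place_of("ng"))   # velar (soft palate)
```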

V. Regulatory System


Regulation: The Role of the Brain and Nervous System  

The brain plays a pivotal role in controlling and coordinating the speech mechanism.  

Broca’s Area: The Seat of Speech Production  

Located in the left frontal lobe, Broca’s area is responsible for speech production and motor planning for speech movements.  

Wernicke’s Area: Understanding Spoken Language  

Found in the left temporal lobe, Wernicke’s area is crucial for understanding spoken language and processing its meaning.  

Arcuate Fasciculus: Connecting Broca’s and Wernicke’s Areas  

The arcuate fasciculus is a bundle of nerve fibers that connects Broca’s and Wernicke’s areas, facilitating communication between speech production and comprehension centers.  

Motor Cortex: Executing Speech Movements  

The motor cortex controls the muscles involved in speech production, translating neural signals into precise motor movements.



Encyclopedia Britannica

  • History & Society
  • Science & Tech
  • Biographies
  • Animals & Nature
  • Geography & Travel
  • Arts & Culture
  • Games & Quizzes
  • On This Day
  • One Good Fact
  • New Articles
  • Lifestyles & Social Issues
  • Philosophy & Religion
  • Politics, Law & Government
  • World History
  • Health & Medicine
  • Browse Biographies
  • Birds, Reptiles & Other Vertebrates
  • Bugs, Mollusks & Other Invertebrates
  • Environment
  • Fossils & Geologic Time
  • Entertainment & Pop Culture
  • Sports & Recreation
  • Visual Arts
  • Demystified
  • Image Galleries
  • Infographics
  • Top Questions
  • Britannica Kids
  • Saving Earth
  • Space Next 50
  • Student Center
  • Introduction

Respiratory mechanisms

Brain functions.

  • Cartilages of the larynx
  • Extrinsic muscles
  • Intrinsic muscles
  • Vocal cords
  • Esophageal voice
  • Artificial larynx
  • The basic registers
  • Studies of register differences
  • Vocal frequency
  • Voice types
  • Vocal ranges
  • Harmonic structure
  • Vocal styles
  • Individual voice quality
  • Singing and speaking
  • Synthetic production of speech sounds

Uncover the science behind the transformation of sounds into speech

  • What did Martin Luther King, Jr., do?
  • What is Martin Luther King, Jr., known for?
  • Who did Martin Luther King, Jr., influence and in what ways?
  • What was Martin Luther King’s family life like?
  • How did Martin Luther King, Jr., die?

Spike Lee at the 2007 Primetime Creative Arts Emmy Awards. Shrine Auditorium, Los Angeles, California

Our editors will review what you’ve submitted and determine whether to revise the article.

  • American Speech-Language-Hearing Association - What is Speech? What is Language?
  • Institute for Natural Language Processing - Voice quality: description and classification
  • speech - Children's Encyclopedia (Ages 8-11)
  • speech - Student Encyclopedia (Ages 11 and up)
  • Table Of Contents

speech, human communication through spoken language. Although many animals possess voices of various types and inflectional capabilities, humans have learned to modulate their voices by articulating the laryngeal tones into audible oral speech.

The regulators


Human speech is served by a bellows-like respiratory activator, which furnishes the driving energy in the form of an airstream; a phonating sound generator in the larynx (low in the throat) to transform the energy; a sound-molding resonator in the pharynx (higher in the throat), where the individual voice pattern is shaped; and a speech-forming articulator in the oral cavity (mouth). Normally, but not necessarily, the four structures function in close coordination. Audible speech without any voice is possible during toneless whisper, and there can be phonation without oral articulation, as in some aspects of yodeling that depend on pharyngeal and laryngeal changes. Silent articulation without breath and voice may be used for lipreading.

An early achievement in experimental phonetics at about the end of the 19th century was a description of the differences between quiet breathing and phonic (speaking) respiration. An individual typically breathes approximately 18 to 20 times per minute during rest and much more frequently during periods of strenuous effort. Quiet respiration at rest as well as deep respiration during physical exertion are characterized by symmetry and synchrony of inhalation (inspiration) and exhalation (expiration). Inspiration and expiration are equally long, equally deep, and transport the same amount of air during the same period of time, approximately half a litre (one pint) of air per breath at rest in most adults. Recordings (made with a device called a pneumograph) of respiratory movements during rest depict a curve in which peaks are followed by valleys in fairly regular alternation.

Phonic respiration is different; inhalation is much deeper than it is during rest and much more rapid. After one takes this deep breath (one or two litres of air), phonic exhalation proceeds slowly and fairly regularly for as long as the spoken utterance lasts. Trained speakers and singers are able to phonate on one breath for at least 30 seconds, often for as much as 45 seconds, and exceptionally up to one minute. The period during which one can hold a tone on one breath with moderate effort is called the maximum phonation time; this potential depends on such factors as body physiology, state of health, age, body size, physical training, and the competence of the laryngeal voice generator—that is, the ability of the glottis (the vocal cords and the opening between them) to convert the moving energy of the breath stream into audible sound. A marked reduction in phonation time is characteristic of all the laryngeal diseases and disorders that weaken the precision of glottal closure, in which the cords (vocal folds) come close together for phonation.
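The figures quoted above support a quick back-of-envelope check; the Python below simply restates the text's approximations:

```python
# Quiet breathing: ~18-20 breaths/min at ~0.5 L per breath.
quiet_ventilation = 19 * 0.5
print(f"Quiet breathing: about {quiet_ventilation:.1f} L/min")   # ~9.5 L/min

# Phonic respiration: a 1-2 L inhalation spent over ~30 s of phonation.
mean_flow = 1.5 / 30 * 1000
print(f"Mean phonatory airflow: about {mean_flow:.0f} mL/s")     # ~50 mL/s
```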


Respiratory movements, whether one is awake or asleep, at rest or at work, silent or speaking, are under constant regulation by the nervous system. Specific respiratory centres within the brain stem regulate the details of respiratory mechanics according to the body needs of the moment. The impact of emotions is heard immediately in the manner in which respiration drives the phonic generator: the timid voice of fear, the barking voice of fury, the feeble monotony of melancholy, or the raucous vehemence during agitation are examples. Conversely, many organic diseases of the nervous system or of the breathing mechanism are projected in the sound of the sufferer’s voice. Some forms of nervous system disease make the voice sound tremulous; the voice of the asthmatic sounds laboured and short winded; certain types of disease affecting a part of the brain called the cerebellum cause respiration to be forced and strained so that the voice becomes extremely low and grunting. Such observations have led to the traditional practice of prescribing that vocal education begin with exercises in proper breathing.

The mechanism of phonic breathing involves three types of respiration: (1) predominantly pectoral breathing (chiefly by elevation of the chest), (2) predominantly abdominal breathing (through marked movements of the abdominal wall), and (3) an optimal combination of both (with widening of the lower chest). Females tend to use upper-chest respiration predominantly, while males rely primarily on abdominal breathing. Many voice coaches stress the ideal of a mixture of pectoral (chest) and abdominal breathing for economy of movement. Any exaggeration of one particular breathing habit is impractical and may damage the voice.


The question of what the brain does to make the mouth speak or the hand write is still incompletely understood despite a rapidly growing number of studies by specialists in many sciences, including neurology, psychology, psycholinguistics, neurophysiology, aphasiology, speech pathology, cybernetics, and others. A basic understanding, however, has emerged from such study. In evolution, one of the oldest structures in the brain is the so-called limbic system, which evolved as part of the olfactory (smell) sense. It traverses both hemispheres in a front-to-back direction, connecting many vitally important brain centres as if it were a basic mainline for the distribution of energy and information. The limbic system involves the so-called reticular activating system (structures in the brain stem), which represents the chief brain mechanism of arousal, such as from sleep or from rest to activity. In humans, all activities of thinking and moving (as expressed by speaking or writing) require the guidance of the brain cortex. Moreover, in humans the functional organization of the cortical regions of the brain is fundamentally distinct from that of other species, resulting in high sensitivity and responsiveness toward harmonic frequencies and sounds with pitch, which characterize human speech and music.


In contrast to animals, humans possess several language centres in the dominant brain hemisphere (on the left side in a clearly right-handed person). It was previously thought that left-handers had their dominant hemisphere on the right side, but recent findings tend to show that many left-handed persons have the language centres more equally developed in both hemispheres or that the left side of the brain is indeed dominant. The foot of the third frontal convolution of the brain cortex, called Broca’s area, is involved with motor elaboration of all movements for expressive language. Its destruction through disease or injury causes expressive aphasia, the inability to speak or write. The posterior third of the upper temporal convolution represents Wernicke’s area of receptive speech comprehension. Damage to this area produces receptive aphasia, the inability to understand what is spoken or written as if the patient had never known that language.

Broca’s area surrounds and serves to regulate the function of other brain parts that initiate the complex patterns of bodily movement (somatomotor function) necessary for the performance of a given motor act. Swallowing is an inborn reflex (present at birth) in the somatomotor area for mouth, throat, and larynx. From these cells in the motor cortex of the brain emerge fibres that connect eventually with the cranial and spinal nerves that control the muscles of oral speech.

In the opposite direction, fibres from the inner ear have a first relay station in the so-called acoustic nuclei of the brain stem. From here the impulses from the ear ascend, via various regulating relay stations for the acoustic reflexes and directional hearing, to the cortical projection of the auditory fibres on the upper surface of the superior temporal convolution (on each side of the brain cortex). This is the cortical hearing centre where the effects of sound stimuli seem to become conscious and understandable. Surrounding this audito-sensory area of initial crude recognition, the inner and outer auditopsychic regions spread over the remainder of the temporal lobe of the brain, where sound signals of all kinds appear to be remembered, comprehended, and fully appreciated. Wernicke’s area (the posterior part of the outer auditopsychic region) appears to be uniquely important for the comprehension of speech sounds.

The integrity of these language areas in the cortex seems insufficient for the smooth production and reception of language. The cortical centres are interconnected with various subcortical areas (deeper within the brain) such as those for emotional integration in the thalamus and for the coordination of movements in the cerebellum (hindbrain).

All creatures regulate their performance instantaneously, comparing it with what it was intended to be, through so-called feedback mechanisms involving the nervous system. Auditory feedback through the ear, for example, informs the speaker about the pitch, volume, and inflection of his voice, the accuracy of articulation, the selection of the appropriate words, and other audible features of his utterance. Another feedback system, through the proprioceptive sense (represented by sensory structures within muscles, tendons, joints, and other moving parts), provides continual information on the position of these parts. Limitations of these systems curtail the quality of speech, as observed in pathologic examples (deafness, paralysis, underdevelopment).
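A toy control loop illustrates the feedback idea: the speaker compares what the ear reports against an intended target and corrects a fraction of the error on each cycle. The gain, the values, and the linear correction rule are all invented for illustration, not taken from any specific model.

```python
def feedback_loop(target, start, gain=0.4, perturb=0.0, steps=8):
    """Toy auditory feedback: correct a fraction of the heard error each step."""
    produced = start
    for _ in range(steps):
        heard = produced + perturb            # what auditory feedback reports
        produced += gain * (target - heard)   # corrective adjustment
    return produced

print(feedback_loop(target=120.0, start=110.0))              # -> ~120 Hz
print(feedback_loop(target=120.0, start=120.0, perturb=10))  # -> ~110 Hz
# With feedback shifted 10 Hz upward, the loop lowers production so that
# what is heard matches the target: a caricature of compensation.
```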


2.1 How Humans Produce Speech

Phonetics studies human speech. Speech is produced by bringing air from the lungs to the larynx (respiration), where the vocal folds may be held open to allow the air to pass through or may vibrate to make a sound (phonation). The airflow from the lungs is then shaped by the articulators in the mouth and nose (articulation).

Video Script

The field of phonetics studies the sounds of human speech. When we study speech sounds, we can consider them from two angles. Acoustic phonetics, in addition to being part of linguistics, is also a branch of physics. It’s concerned with the physical, acoustic properties of the sound waves that we produce. We’ll talk some about the acoustics of speech sounds, but we’re primarily interested in articulatory phonetics, that is, how we humans use our bodies to produce speech sounds. Producing speech requires three mechanisms.

The first is a source of energy.  Anything that makes a sound needs a source of energy.  For human speech sounds, the air flowing from our lungs provides energy.

The second is a source of the sound: air flowing from the lungs arrives at the larynx. Put your hand on the front of your throat and gently feel the bony part under your skin. That’s the front of your larynx. It’s not actually made of bone; it’s cartilage and muscle. This picture shows what the larynx looks like from the front.

Larynx external

This next picture is a view down a person’s throat.

Cartilages of the Larynx

What you see here is that the opening of the larynx can be covered by two triangle-shaped pieces of skin. These are often called “vocal cords” but they’re not really like cords or strings. A better name for them is vocal folds.

The opening between the vocal folds is called the glottis.

We can control our vocal folds to make a sound.  I want you to try this out so take a moment and close your door or make sure there’s no one around that you might disturb.

First I want you to say the word “uh-oh”. Now say it again, but stop half-way through, “Uh-”. When you do that, you’ve closed your vocal folds by bringing them together. This stops the air flowing through your vocal tract.  That little silence in the middle of “uh-oh” is called a glottal stop because the air is stopped completely when the vocal folds close off the glottis.

Now I want you to open your mouth and breathe out quietly, “haaaaaaah”. When you do this, your vocal folds are open and the air is passing freely through the glottis.

Now breathe out again and say “aaah”, as if the doctor is looking down your throat.  To make that “aaaah” sound, you’re holding your vocal folds close together and vibrating them rapidly.

When we speak, we make some sounds with vocal folds open, and some with vocal folds vibrating.  Put your hand on the front of your larynx again and make a long “SSSSS” sound.  Now switch and make a “ZZZZZ” sound. You can feel your larynx vibrate on “ZZZZZ” but not on “SSSSS”.  That’s because [s] is a voiceless sound, made with the vocal folds held open, and [z] is a voiced sound, where we vibrate the vocal folds.  Do it again and feel the difference between voiced and voiceless.

Now take your hand off your larynx and plug your ears and make the two sounds again with your ears plugged. You can hear the difference between voiceless and voiced sounds inside your head.
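The contrast you can feel with your hand also has a simple acoustic correlate: voiced sounds like [z] are periodic, while voiceless frication like [s] is noise-like. Below is a minimal numpy sketch that tests for a periodicity peak in a frame's autocorrelation. The frame length and threshold are ad hoc choices, a rough demonstration rather than a production-grade voicing detector.

```python
import numpy as np

def is_voiced(frame, fs=16000, fmin=75, fmax=300, thresh=0.3):
    """Crude voicing test: look for an autocorrelation peak at a pitch lag."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False
    ac = ac / ac[0]                           # normalize so ac[0] == 1
    lo, hi = int(fs / fmax), int(fs / fmin)   # plausible pitch-lag range
    return ac[lo:hi].max() > thresh

t = np.arange(480) / 16000                     # one 30 ms frame
print(is_voiced(np.sin(2 * np.pi * 120 * t)))  # True: periodic, like [z]
print(is_voiced(np.random.randn(480)))         # usually False: noisy, like [s]
```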

I said at the beginning that there are three crucial mechanisms involved in producing speech, and so far we’ve looked at only two:

  • Energy comes from the air supplied by the lungs.
  • The vocal folds produce sound at the larynx.
  • The sound is then filtered, or shaped, by the articulators.

The oral cavity is the space in your mouth. The nasal cavity, obviously, is the space inside and behind your nose. And of course, we use our tongues, lips, teeth and jaws to articulate speech as well.  In the next unit, we’ll look in more detail at how we use our articulators.

So to sum up, the three mechanisms that we use to produce speech are:

  • respiration at the lungs,
  • phonation at the larynx, and
  • articulation in the mouth.

Essentials of Linguistics Copyright © 2018 by Catherine Anderson is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.


Speech Production


Speech Production, by Eryk Walczak. Last reviewed: 22 February 2018. DOI: 10.1093/obo/9780199772810-0217

Speech production is one of the most complex human activities. It involves coordinating numerous muscles and complex cognitive processes. The area of speech production is related to Articulatory Phonetics, Acoustic Phonetics, and Speech Perception, which all study various elements of language and are part of the broader field of Linguistics. Because of the interdisciplinary nature of the topic, it is usually studied on several levels: neurological, acoustic, motor, evolutionary, and developmental. Each of these levels has its own literature, but in the vast majority of speech production literature all of these elements are present. The large body of relevant literature is covered in the speech perception entry, on which this bibliography builds. This entry covers general speech production mechanisms and speech disorders. However, speech production in second language learners or bilinguals has special features, which are described in the separate bibliography on Cross-Language Speech Perception and Production. Speech produces sounds, and sounds are a topic of study for Phonology.

As mentioned in the introduction, speech production tends to be described in relation to acoustics, speech perception, neuroscience, and linguistics. Because of this interdisciplinarity, there are not many published textbooks focusing exclusively on speech production; Guenther 2016 and Levelt 1993 are the exceptions. The former has a stronger focus on the neuroscientific underpinnings of speech. Auditory neuroscience is also extensively covered by Schnupp, et al. 2011 and in the extensive textbook Hickok and Small 2015. Rosen and Howell 2011 is a textbook focusing on the signal processing and acoustics that any speech scientist needs to understand. A historical approach to psycholinguistics that also covers speech research is Levelt 2013.

Guenther, F. H. 2016. Neural control of speech. Cambridge, MA: MIT.

This textbook provides an overview of neural processes responsible for speech production. Large sections describe speech motor control, especially the DIVA model (co-authored by Guenther). It includes extensive coverage of behavioral and neuroimaging studies of speech as well as speech disorders and ties them together with a unifying theoretical framework.

Hickok, G., and S. L. Small. 2015. Neurobiology of language. London: Academic Press.

This voluminous textbook edited by Hickok and Small covers a wide range of topics related to neurobiology of language. It includes a section devoted to speaking which covers neurobiology of speech production, motor control perspective, neuroimaging studies, and aphasia.

Levelt, W. J. M. 1993. Speaking: From intention to articulation. Cambridge, MA: MIT.

A seminal textbook Speaking is worth reading particularly for its detailed explanation of the author’s speech model, which is part of the author’s language model. The book is slightly dated, as it was released in 1993, but chapters 8–12 are especially relevant to readers interested in phonetic plans, articulating, and self-monitoring.

Levelt, W. J. M. 2013. A history of psycholinguistics: The pre-Chomskyan era. Oxford: Oxford University Press.

Levelt published another important book detailing the development of psycholinguistics. As its title suggests, it focuses on the early history of discipline, so readers interested in historical research on speech can find an abundance of speech-related research in that book. It covers a wide range of psycholinguistic specializations.

Rosen, S., and P. Howell. 2011. Signals and Systems for Speech and Hearing. 2d ed. Bingley, UK: Emerald.

Rosen and Howell provide a low-level explanation of speech signals and systems. The book includes informative charts explaining the basic acoustic and signal processing concepts useful for understanding speech science.

Schnupp, J., I. Nelken, and A. King. 2011. Auditory neuroscience: Making sense of sound. Cambridge, MA: MIT.

A general introduction to speech concepts with main focus on neuroscience. The textbook is linked with a website which provides a demonstration of described phenomena.



Speech Production


Speech may be defined as the production of vocal sounds for communication through the processes of respiration, phonation, resonation, and articulation.

From: Encyclopedia of Neuroscience, 2009


Neural Models of Motor Speech Control

Frank H. Guenther, Gregory Hickok, in Neurobiology of Language, 2016

58.1 Introduction

Speech production is a highly complex motor act involving the finely coordinated activation of approximately 100 muscles in the respiratory, laryngeal, and oral motor systems. To achieve this task, speakers utilize a large network of brain regions. This network includes regions involved in other motor tasks, such as the motor and somatosensory cortical areas, cerebellum, basal ganglia, and thalamus, as well as regions that are more specialized for speech and language, including inferior and middle prefrontal cortex and superior and middle temporal cortex. Our goal in this chapter is to describe the critical role of the auditory system in speech production. We first discuss the role of sensory systems in motor control broadly and summarize the long history of ideas and research on the interaction between auditory and motor systems for speech. We then describe current research on speech planning, which strongly implicates the auditory system in this process. Two large-scale neurocomputational models of speech production are then discussed. Finally, we highlight some future directions for research on speech production.

Movement is absolutely dependent on sensory information. We know where and how to reach for an object because we see its location and shape; we know how much force to exert while we are holding the object because we feel the pressure of the object on our hand and the weight on our limb; and we know how to initiate any of these movements because our sensory systems tell us where our limb is in relation to our body and the object. British neurologist and physiologist Henry Charlton Bastian (1837–1915) wrote on the topic of movement control in 1887, stating, “It may be regarded as a physiological axiom, that all purposive movements of animals are guided by sensations or by afferent impressions of some kind” (Bastian, 1887, p. 1). Experimental work over the decades backs these claims. This work has found, for example, that blocking somatosensory feedback from a monkey’s limb (while leaving motor fibers intact) causes the limb to go dead. With training, the monkey can learn to reuse it clumsily, but only with visual feedback; blindfold the animal and motor control degrades dramatically (Sanes, Mauritz, Evarts, Dalakas, &amp; Chu, 1984). Similar symptomology can be found in humans suffering from large-fiber sensory neuropathy, which deafferents the body sense while leaving motor fibers intact (Sanes et al., 1984).

Speech is no different. Without the auditory system, as in prelingual-onset peripheral deafness, normal speech development cannot occur. Importantly, it is not just during development that auditory information is critical. Experimental or naturally caused manipulations of acoustic input can have dramatic effects on speech production. For example, delayed auditory feedback induces nonfluency (Yates, 1963); altering feedback in the form of pitch or the formant frequency structure results in automatic and largely unconscious compensation in speech articulation (Burnett, Freedland, Larson, &amp; Hain, 1998; Houde &amp; Jordan, 1998; Larson, Burnett, Bauer, Kiran, &amp; Hain, 2001); and exposure to a different linguistic environment can induce changes in the listener-speaker’s articulation (picking up accents; Sancier &amp; Fowler, 1997). Furthermore, although individuals who become deaf as adults can remain intelligible for years after they lose hearing, they show some speech output impairments immediately, including an impaired ability to adjust pitch and loudness in different listening conditions; over time, their phonetic contrasts become reduced (Perkell et al., 2000) and they exhibit articulatory decline (Waldstein, 1989).

The speech research literature contains numerous theoretical proposals that strongly link speech perception and speech production. Notable examples include the motor theory of speech perception (Liberman, Cooper, Shankweiler, &amp; Studdert-Kennedy, 1967; Liberman &amp; Mattingly, 1985), which posits that speech perception involves translating acoustic signals into the motor gestures that produce them, as well as acoustic theories of speech production (Fant, 1960; Stevens, 1998), which highlight the importance of acoustic or auditory targets in the speech production process. In the following sections we elaborate on the roles of auditory information in recent neural models of speech planning and execution.
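As a deliberately simplified picture of the formant-perturbation findings cited above, the sketch below simulates trial-by-trial adaptation: auditory feedback shifts F1 upward, and a speaker-like controller adjusts production in the opposing direction. The target value, shift, and learning rate are invented, and this idealized loop converges to full compensation, whereas real speakers typically compensate only partially.

```python
target_f1 = 700.0   # assumed auditory target for the first formant, Hz
produced = 700.0
shift = 1.3         # perturbation: F1 is heard 30% higher than produced
rate = 0.05         # fraction of the heard error corrected per trial

for trial in range(60):
    heard = produced * shift
    produced -= rate * (heard - target_f1)   # opposing articulatory adjustment

print(round(produced, 1))   # settles near 700 / 1.3, i.e. about 538 Hz
```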

Language and Communication – Brain Substrate

S. Coulson, in Encyclopedia of Behavioral Neuroscience, 2010

If speech production is a somewhat understudied area in the cognitive neuroscience of language, the study of the neural correlates of writing is virtually uncharted territory. A handful of studies on the production of written words, however, point to the importance of the left superior parietal lobule and the left inferior posterior frontal cortex.

Motor-Timing and Sequencing in Speech Production

Sonja A. Kotz, Michael Schwartze, in Neurobiology of Language, 2016

Speech production can be considered as a sensorimotor behavior that requires precise control and the dynamic interplay of several parallel processing levels. To produce an utterance, the respective information has to be selected, sequenced, and articulated in an adequate, highly time-sensitive manner. Moreover, while these incremental processes develop, sensory feedback is integrated into the system. Efficient temporal processing and coordination is therefore mandatory to successful speech production. Temporal processing in the range of milliseconds to seconds, critical to speech production and perception alike, has been suggested to engage dedicated neural mechanisms. Similar to selection and sequencing, these mechanisms rely on the basal ganglia and their associated thalamocortical circuits. On this basis, we discuss and refine anatomical and functional characteristics of a general-purpose framework in which dedicated temporal processing establishes a reference that guides the generation of faster and efficient dynamics in speech production.

Psycholinguistics

Nan Bernstein Ratner, Jean Berko Gleason, in Reference Module in Neuroscience and Biobehavioral Psychology, 2017

How Individuals Produce Language: Speech Production

It is easier to evaluate the comprehension process than the production process. We can analyze patterns of accuracy and error, response time, and other behaviors to arrive at a rough estimate of how language stimuli are comprehended by listeners. However, it is more difficult to gain insight into how ideas are put into linguistic form; the process is largely hidden from observation, and speakers' verbal expressions, even in response to controlled eliciting stimuli, vary considerably. Researchers continue to develop complex models of the production process. Typical models of language production contain multiple stages, beginning with the meaning that speakers wish to convey; they then proceed to the selection of syntactic structures and intonation contours, the specification of words from the lexicon, and, finally, phonological rules and commands to the motor system for speech. Major sources of information about the probable nature of the speech production process can be found in speakers' speech errors or false starts and in hesitations and pauses that break the rhythm of ongoing speech, as well as in the time course of language production in brain imaging experiments.

Phonological Aspects of Aphasia

Sheila E. Blumstein, in Acquired Aphasia (Third Edition), 1998

As Figure 5.1 shows, the production of a word or words involves the selection of a word candidate or candidates from the lexicon, the encoding of the abstract phonological representation of the word in terms of the proper order of the segments and the phonological context in which they appear (articulatory planning), and then the implementation of this phonetic string into a set of motor commands or motor programs to the vocal tract. A number of models of speech production propose that these word candidates are scanned into a short-term buffer to account for the fact that the ultimate production of a sequence of words or an utterance is influenced not only at the segmental level but also at the prosodic level by the phonological context of neighboring words and, ultimately, the syntactic role that the individual lexical item plays in the utterance string (cf. Levelt, 1989, for discussion). For example, the auxiliary have may be reduced in certain syntactic contexts and appended to the preceding word; for example, “I have eaten” may be produced as “I’ve eaten.”

Studies of speech production deficits in aphasia have typically distinguished between analyses of the phonological patterns of production, that is, the selection of a lexical item and the planning of its production, and the phonetic patterns of speech production, that is, its articulatory implementation. Thus, for example, if a patient produces a wrong sound segment, but its phonetic (articulatory) implementation is correct, that is, for cat the patient says gat, then it is inferred that the basis of the patient’s output error is phonological and is related to either the selection or planning stages of speech output. In contrast, if a patient produces the correct sound segment but its phonetic implementation is incorrect, that is, for cat the patient produces an initial /k/ that is overly aspirated [kʰ], then it is inferred that the basis of the patient’s disorder is phonetic and is related to the articulatory implementation stages of speech output.

Recently, phonological output disorders have been looked at within a broader framework, not just focused on sound structure per se, but on how phonological form may be instantiated in the processes of word (lexical) retrieval. The essential features of the working model proposed in Figure 5.1 are implemented within an interactive spreading activation framework (cf. Dell, 1989; Schwartz, Saffran, Bloch, & Dell, 1994 ). It is beyond the scope of this chapter to present the details of these models. Nonetheless, the working model in Figure 5.1 is compatible with these models, and, more importantly, the overall conclusions drawn from the results of phonological disorders presented here are similar.

Phonological Patterns of Speech Production

Clinical evidence shows that nearly all aphasic patients produce phonological errors in their speech output. These errors can be characterized according to four main types:

Phoneme substitution errors, in which a phoneme is substituted for a different phoneme in the language, for example, teams → /kimz/.

Simplification errors, in which a phoneme or syllable is deleted, for example, brown → /bawn/.

Addition errors, in which an extra phoneme or syllable is added to a word, for example, papa → [paprə].

Environment errors, in which the occurrence of a particular phoneme is influenced by the surrounding phonetic context. The order of the segments may be changed, for example, degree → [gәdri], or the presence of one sound may influence the occurrence of another, for example, Crete → [trit].

Within each of the four categories of errors, there are systematic patterns that have been observed among the aphasic patients studied and that provide clues as to the basis of the deficit. The majority of phoneme substitution errors are characterized by the replacement of a single phonetic feature. For example, patients may make errors involving the phonetic feature [voice], for example, peace → [bis], the phonetic feature [place of articulation], for example, pay → [tei], or manner of articulation such as [nasal], for example, day → [nei]. Rarely do they make errors involving more than one phonetic feature. Moreover, there is a hierarchy of phoneme substitution errors, with the greatest preponderance of errors involving place of articulation, then voicing, and the fewest, manner of articulation. The overall pattern of sound substitutions is consistent with the view that incorrect phonetic features have been selected or activated but have been correctly implemented by the articulatory system. Most simplification errors and addition errors result in what is believed to be the simplest and thus the canonical syllable structure of language, Consonant-Vowel (CV). For example, consonants are more likely to be deleted in a word beginning with two consonants, sky → ky, and more likely to be added in a word beginning with a vowel, army → jarmy (Blumstein, 1990). And finally, environment errors that occur across word boundaries preserve the syllable structure relations of the lexical candidates. That is, if the influencing phoneme is at the beginning of the target word, so is the influenced phoneme, for example, history books → bistory books. If the influencing phoneme is at the end of the target word, so is the influenced phoneme: roast beef → roaf beef.

The stability of these patterns is evidenced by their occurrence across languages: French (Bouman & Grunbaum, 1925; Lecours & Lhermitte, 1969), German (Bouman & Grunbaum, 1925; Goldstein, 1948), English (Blumstein, 1973; Green, 1969), Turkish (Peuser & Fittschen, 1977), Russian (Luria, 1966), and Finnish (Niemi, Koivuselka-Sallinen, & Hanninen, 1985). Despite the systematicity and regularity of these phonological errors, their particular occurrence cannot be predicted. That is, sometimes the patient may make an error on a particular word, and other times she or he will produce it correctly. Moreover, the pattern of errors is bidirectional (Blumstein, 1973; Hatfield & Walton, 1975). A voiced stop consonant may become voiceless, /d/ → /t/, and a voiceless stop consonant may become voiced, /t/ → /d/.

Taken together, these results suggest that the patient has not “lost” the ability to produce particular phonemes or to instantiate particular features. Rather, his or her speech output mechanism does not seem to be able to encode consistently the correct phonemic (i.e., phonetic feature) representation of the word. As a consequence, the patient may produce an utterance that is articulatorily correct but deviates phonologically from the target word. On other occasions, the patient may produce the same target word correctly. These results are consistent with the view that the underlying phonological representations are intact, but there are deficits in accessing these representations (Butterworth, 1992). As such, these patients have a selection or phonological planning deficit (Blumstein, 1973, 1994; cf. also Nespoulous & Villiard, 1990). To return to the model for speech production in Figure 5.1, a word candidate is selected from the lexicon. To produce the word requires that its sound properties (i.e., its segments and features) be specified so that they can be “planned” for articulation and ultimately translated into neuromuscular commands relating to the speech apparatus. Phonological deficits seem then to relate to changes in the activation patterns of the nodes corresponding to the phonetic representations themselves (e.g., features, syllable structure) as the word candidate is selected, as well as to deficits in the processes involved in storage in the short-term lexical buffer and in phonological planning (cf. also Schwartz et al., 1994; Waters & Caplan, 1995).

The similar patterns of performance are particularly striking given the very different clinical characteristics and neuropathology of the patients investigated. The groups studied have included both anterior and posterior patients. Anterior aphasics, especially Broca’s aphasics, show a profound expressive deficit in the face of relatively preserved auditory language comprehension. Speech output is nonfluent in that it is slow, labored, and often dysarthric, and the melody pattern is often flat. Furthermore, speech output is often agrammatic. This agrammatism is characterized by the omission of grammatical words, such as the and is , as well as the substitution of grammatical inflectional endings marking number, tense, and so forth.

In contrast to the nonfluent speech output of the anterior aphasic, the posterior patient’s speech output is fluent. Among the posterior aphasias, Wernicke’s and conduction aphasia are perhaps the most studied in relation to phonology (cf. Ardila, 1992; Buckingham & Kertesz, 1976; Kohn, 1992; Schwartz et al., 1994 ). The characteristic features of the language abilities of Wernicke’s aphasia include well articulated but paraphasic speech in the context of severe auditory language comprehension deficits. Paraphasias include literal paraphasias (sound substitutions), verbal paraphasias (word substitutions), or neologisms (productions that are phonologically possible but have no meaning associated with them). Speech output, although grammatically full, is often empty of semantic content and is marked by the overuse of high-frequency “contentless” nouns and verbs, such as thing and be . Another frequent characteristic of this disorder is logorrhea, or a press for speech.

Conduction aphasia refers to the syndrome in which there is a disproportionately severe repetition deficit in relation to the relative fluency and ease of spontaneous speech production and to the generally good auditory language comprehension of the patient. Speech output contains many literal paraphasias and some verbal paraphasias.

The results of the studies of the phonological patterns of speech production challenge the classical view of the clinical/neurological basis of language disorders in adult aphasics. The classical view has typically characterized the aphasia syndromes in broad anatomical (anterior and posterior) and functional (expressive and receptive) dichotomies (cf. Geschwind, 1965). To a first approximation, the anterior/posterior anatomical dichotomy corresponds well with the functional expressive/receptive dichotomy, as anterior patients are typically nonfluent and posterior patients are typically fluent, and anterior patients typically have good comprehension whereas posterior patients typically have poor comprehension. Nonetheless, the similar patterns of performance across these aphasic syndromes indicate that both anterior and posterior brain structures contribute to the selection of phonological representations as well as to phonological planning in speech production.

An interesting syndrome from the perspective of phonological output disorders is jargon aphasia (those Wernicke’s aphasics who produce neologisms, or jargon, defined as nonwords that do not derive from any obvious literal paraphasia or phonologically distorted semantic paraphasia). Phonological analyses reveal that neologisms follow the phonological patterns of the language. They respect the sound structure, stress rules, syllable structure, and phonotactics (allowable order of sounds). Although it is not clear what the source of these jargon productions is, their phonological characteristics are consistent with the general observation that the processes of lexical activation and retrieval are the source of the problem, not the more abstract phonological shape or organizational principles of the lexicon (Christman, 1994; Hanlon & Edmondson, 1996; Kohn, Melvold, & Smith, 1995).

Phonetic Patterns of Speech Production

As Figure 5.1 shows, subsequent to the selection of a lexical candidate or candidates and the articulatory planning of the utterance, the phonetic string is ultimately converted into a set of motor commands to the articulatory system. There is a wide range of speech production deficits that reflect impairments to the motor commands or motor programs to the vocal tract system. For the purpose of this chapter, we limit discussion to those phonetic disorders occurring in the context of a language impairment, that is, aphasia. Thus, we will not consider the dysarthrias, which are speech disorders resulting from damage to the speech musculature itself or to the neural mechanisms that regulate speech movements, or those speech production deficits that involve the descending motor pathways, including subcortical structures, various levels of the brain stem, the extrapyramidal system, the cranial nerves, and so on.

Whereas the processes involved in the selection and planning of the sound structure of language seem to be broadly represented in the left dominant language hemisphere, the neural basis of phonetic disorders seems to be more localized. And yet, as will be discussed, the language production apparatus also seems to be a highly interconnected system, involving the contribution not only of anterior speech motor areas, but also potentially of posterior areas as well.

A long-held observation is that anterior aphasics produce phonetic errors. The implied basis for these errors is one of articulatory implementation; that is, the commands to the articulators to encode the word are poorly timed and impaired. A number of studies have explored these phonetic patterns of speech by investigating the acoustic properties or the articulatory parameters underlying the production of particular phonetic dimensions.

Studies of speech production in anterior patients have shown that these patients have difficulty producing phonetic dimensions that require the timing of two independent articulators. These findings have emerged in the analysis of the production of two phonetic dimensions, voicing and nasality. In the case of the feature voicing, the dimension studied has been voice-onset time, that is, the timing relation between the release of a stop consonant and the onset of vocal cord vibration. For voiceless consonants, such as /p/, there is a delay in the onset of vocal cord vibration of around 30 ms after the stop consonant is released, whereas for voiced consonants, such as /b/, vocal cord vibration begins either coincident with the release of the consonant or within a short lag thereafter. The production of nasal consonants also requires appropriate timing between two articulators, in this case, the release of the closure in the oral cavity and the opening of the velum. For /m/, the velum must be opened when the closure at the lips is released, whereas for /b/, the velum must stay closed while the closure at the lips is released.
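To make the timing contrast concrete, the sketch below labels a stop consonant from two hypothetical measurement times (burst release and voicing onset). The roughly 30 ms boundary follows the description above; the function name and example values are illustrative and are not drawn from the studies cited in this chapter.

```python
# Minimal sketch: labeling an initial stop by voice-onset time (VOT).
# The ~30 ms boundary and the example values are illustrative only.

def classify_vot(burst_time_s: float, voicing_onset_s: float,
                 boundary_s: float = 0.030) -> str:
    """Label an initial stop as voiced or voiceless from the lag between
    consonant release (burst) and the onset of vocal fold vibration."""
    vot = voicing_onset_s - burst_time_s
    return "voiceless" if vot >= boundary_s else "voiced"

# A /p/-like token: voicing begins ~45 ms after release.
print(classify_vot(0.100, 0.145))  # voiceless
# A /b/-like token: voicing begins almost coincident with release.
print(classify_vot(0.100, 0.105))  # voiced
```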

Results of analyses of the production of the voicing and nasal phonetic dimensions have shown that anterior aphasics evidence significant deficits (Blumstein, Cooper, Goodglass, Statlender, & Gottlieb, 1980; Blumstein, Cooper, Zurif, & Caramazza, 1977; Freeman, Sands, & Harris, 1978; Gandour & Dardarananda, 1984a; Itoh, Sasanuma, Hirose, Yoshioka, & Ushijima, 1980; Itoh et al., 1982; Itoh, Sasanuma, & Ushijima, 1979; Shewan, Leeper, & Booth, 1984). These same patterns emerge across different languages. They occur not only in English and Japanese, for which voice-onset time distinguishes two categories of voicing, voiced and voiceless, but also in Thai, for which voice-onset time distinguishes three categories of voicing in stop consonants, prevoiced, voiced, and voiceless aspirated (although cf. Ryalls, Provost, & Arsenault, 1995, for some different findings in French-speaking aphasics). All of these studies have used acoustic measurements and have inferred the articulatory states giving rise to the acoustic patterns observed. More direct measures of articulatory timing with fiber optics (Itoh & Sasanuma, 1983; Itoh et al., 1979), computer-controlled X-ray microbeams (Itoh et al., 1980), and electromyography (Shankweiler, Harris, & Taylor, 1968) have also shown that the timing relations among the articulators are impaired.

That the anterior aphasics have particular difficulties with the production of two phonetic features, voice and nasal, could indicate that these patients have an impairment that is affecting the articulatory implementation of particular phonetic features (voice and nasal) or, alternatively, the implementation of particular articulatory maneuvers. It is possible to answer this question by exploring the constellation of spared and impaired patterns of articulation associated with the production of voicing in stop consonants. In English, the feature voicing in stop consonants can be cued in several ways. Voice-onset time provides one measure of voicing for stop consonants occurring in initial position. A second measure is the duration of the vowel preceding a stop consonant. Vowels are short before voiceless stops, write , and long before voiced stops, ride . If patients have a deficit related to the implementation of the feature voicing, then they should display impairments in the production of voice-onset time as well as vowel length preceding voiced and voiceless stop consonants. In contrast, if they have a deficit related to particular articulatory maneuvers, such as the timing of two independent articulators, the production of voice-onset time may be impaired, whereas the production of vowel length may be normal. Results indicate that although these patients show an impairment in the implementation of the voicing phonetic dimension, via voice-onset time, they are able to maintain the distinction between voiced and voiceless stops on the basis of the duration of the preceding vowel ( Baum, Blumstein, Naeser, & Palumbo, 1990; Duffy & Gawle, 1984; Tuller, 1984 ). Thus, these patients do not have a disorder affecting the articulatory production of the feature voicing, but a disorder affecting particular articulatory maneuvers, namely, the timing or integration of movements of two independent articulators.

Consistent with this view are the results from the acoustic analysis of the production of vowels. Differences among vowel sounds such as /i a u/ are determined acoustically by the frequency of the first two resonant peaks, called formant frequencies. Analyses of the formant frequencies of spoken vowel utterances show that anterior aphasics, including Broca’s aphasics, maintain formant frequency characteristics of different vowels, despite increased variability in their productions ( Kent & Rosenbek, 1983; Ryalls, 1981, 1986, 1987 ). The production of vowels requires articulatory gestures based on the overall shape of the tongue, rather than on the coordination of independent articulators.

Although anterior aphasics show a disorder in temporal coordination, their disorder does not reflect a pervasive timing impairment. Fricative durations do not differ significantly from those of normals (Harmes et al., 1984), and the patients maintain the intrinsic duration differences characteristic of fricatives varying in place of articulation; for example, /s/ and /š/ are longer in duration than /f/ and /θ/ (Baum, 1996; Baum et al., 1990). Although overall vowel duration is longer for anterior aphasics than for normals (Baum, 1993; see Ryalls, 1987, for review), these patients do maintain differences in the intrinsic durations of vowels; for example, tense vowels such as /i/ and /e/ are longer than their lax vowel counterparts, /ɪ/ and /ɛ/. In addition, Thai-speaking anterior aphasics maintain the contrast between short vowels and long vowels. In Thai, vowel length is phonemic, in that long and short vowels distinguish words in the language, for example, /hat/ to practice versus /haat/ shoal (Gandour & Dardarananda, 1984b; Gandour, Ponglorpisit, Khunadorn, Dechongkit, Boongird, & Boonklam, 1992).

In addition to impairment in timing of independent articulators, difficulties for anterior aphasics have also emerged with laryngeal control. They have shown impairments in voicing in the production of voiced fricatives ( Baum, 1996; Baum et al., 1990; Harmes et al., 1984; Kent & Rosenbek, 1983 ), and impairments in voicing influencing the spectral shape associated with place of articulation in stop consonants ( Shinn & Blumstein, 1983 ).

Consistent with the findings that anterior aphasics have impairments involving laryngeal control are studies of intonation. Intonation, or the melody of language, is ultimately determined by laryngeal maneuvers. A number of acoustic parameters are used to study intonation. Among the most common is the analysis of fundamental frequency, which relates to the frequency of vibration of the vocal cords. The study of intonation provides important clues to speech planning abilities. That is, different intonation patterns emerge as a function of syntactic complexity and sentence length. Typically, declarative sentences in English have a falling intonation at the end of the sentence, called terminal falling f0 (fundamental frequency), and the final word of the sentence is typically lengthened (Cooper & Sorenson, 1980). For the speaker to produce the appropriate pitch contours and word duration, it is necessary to effectively preplan the sentence, taking into consideration its length and syntactic structure.
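Because several of the findings below rest on measuring fundamental frequency contours, a minimal sketch of one common estimation approach may help. The autocorrelation method shown here is a generic textbook technique, not the procedure used in the studies cited; the parameter values are illustrative assumptions.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int,
                fmin: float = 75.0, fmax: float = 300.0) -> float:
    """Estimate fundamental frequency (Hz) of a voiced frame by picking
    the autocorrelation peak within a plausible pitch-period range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic 120 Hz "voiced" frame standing in for real speech.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(round(estimate_f0(frame, sr), 1))  # close to 120.0
```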

Acoustic analyses of two-word spontaneous speech utterances and of reading in Broca’s aphasics have shown that these patients have rudimentary control over some features of prosody: they maintain a terminal falling fundamental frequency even in utterances in which the pauses between words may reach durations as long as 7 sec (Cooper, Soares, Nicol, Michelow, & Goloskie, 1984). Nevertheless, they show a restriction in fundamental frequency range (Cooper et al., 1984; Ryalls, 1982). Restrictions in fundamental frequency range support the clinical impression that these speakers produce utterances in a monotone or with a flattened intonation. However, the fact that they maintain a falling fundamental frequency suggests that they do have a linguistic sense of an utterance and are not simply stringing together lexical items. Nonetheless, these patients do show a number of systematic problems in the production of prosody. They typically do not show utterance-final lengthening, but rather show longer durations in word-initial position (Danly, de Villiers, & Cooper, 1979; Danly & Shapiro, 1982). An increased threshold for initiating and maintaining the flow of speech may account for these findings.

Another dimension of prosody, tone, is used in languages such as Thai and Chinese to distinguish among lexical items or words. Only a few acoustic analyses of the production of tone have been conducted. The results suggest that deficits in tonal production may emerge in anterior aphasics; however, the global properties of the tone, for example, whether the tone is high or falling, appear to be maintained, suggesting that the production deficit is due to articulatory implementation rather than to phonological planning (Gandour, Holasuit, Petty, & Dardarananda, 1988; Gandour, Ponglorpisit, Khunadorn, Dechongkit, Boongird, Boonklam, & Potisuk, 1992).

Kent and Rosenbek (1983) have suggested that the timing problem found for individual segments and their underlying features is a manifestation of a broader impairment in the integration of articulatory movements from one phonetic segment to another. The sounds of speech are affected by the phonetic contexts in which they occur. For example, the production of /s/ and its consequent acoustic characteristics vary depending on whether /s/ is followed by the vowel /i/ or the vowel /u/. When /s/ is followed by the vowel /u/, it is produced with rounding (pursing of the lips) in anticipation of the rounded vowel /u/. No such adjustments are made for /i/. The rounding of the lips lengthens the vocal tract causing a lowering of the formant frequencies for /s/ before /u/ compared with /s/ before /i/. The study of such coarticulation effects provides insights into the dynamic aspects of speech production, and also provides evidence about the size of the planning units that can be programmed in the production of syllables or words.

Investigations of coarticulation effects in anterior aphasics show that they produce relatively normal anticipatory coarticulation (Katz, 1988; Katz, Machetanz, Orth, & Schonle, 1990a, 1990b; Sussman, Marquardt, Hutchinson, & MacNeilage, 1988). For example, in producing the syllable /su/, they anticipate the rounded vowel /u/ in the production of the preceding /s/ (Katz, 1988). Nevertheless, they seem to show a delay in the time it takes to produce these effects (Ziegler & von Cramon, 1985, 1986), and they show some deficiencies in their production (Tuller & Story, 1986; but see Katz, 1987, for discussion). What these results suggest is that phonological planning is relatively intact, but the ultimate timing or coordination of the implementation of the articulatory movements is impaired. Consistent with this view are results showing that Broca’s patients demonstrate impairments in the complex timing relations between syllables (Gandour, Dechongkit, Ponglorpisit, & Khunadorn, 1994; Gandour, Dechongkit, Ponglorpisit, Khunadorn, & Boongird, 1993). For example, they do not show the normal decrease in the duration of a root syllable as word length increases (cf. Baum, 1992), nor do they show a normal ability to increase rate of articulation beyond a certain limit. Such an impairment abnormally affects the production of the segmental properties of speech, such as voice-onset time in stop consonants, fricative duration, and vowel duration as a function of speaking rate (cf. Baum, 1993, 1996; Baum & Ryan, 1993; Kent & McNeill, 1987; McNeill, Liss, Tseng, & Kent, 1990).

Several conclusions can be drawn concerning the nature of the phonetic disorders and their underlying mechanisms. In particular, the impairment is not a linguistic one, in the sense that it is not an inability to implement a particular phonetic feature. Moreover, the patients have lost neither the representation for implementation nor the knowledge base for how to implement sounds in context. They not only adjust their articulatory mechanism in the implementation of a segment to anticipate a neighboring segment and to produce the appropriate timing relations in a consonant-vowel sequence, but also compensate for fixation of the jaw by a bite block (Baum, Kim, & Katz, 1997). Instead, particular maneuvers relating to the timing of the articulators seem to be impaired, ultimately affecting the phonetic realization of some sound segments and some aspects of speech prosody.

Computerized tomography (CT) scan correlations with patterns of speech production deficits suggest the involvement of Broca’s area (slices B and B/W), the anterior limb of the internal capsule (slices B, B/W, and W; Baum et al., 1990), and the precentral gyrus of the insula (Dronkers, 1997). The lower motor cortex regions for the larynx, tongue, and lips (slices W and SM) are also implicated, although less consistently so. Nevertheless, phonetic disorders, as described in this chapter, do not emerge with damage to analogous speech areas in the right hemisphere, suggesting that even though both hemispheres may ultimately be involved in the production of speech, the control site for these mechanisms is in the left hemisphere (Gandour et al., 1994; Kurowski, Blumstein, & Mathison, 1998).

There is another phonetic disorder that occurs rarely and reflects a pattern of deficit different from that previously described. This disorder is called the foreign accent syndrome and is characterized by the emergence of what is perceived by the listener as a foreign accent subsequent to organic brain disease (Blumstein, Alexander, Ryalls, Katz, & Dworetzky, 1987; Gurd, Bessel, Bladon, & Bamford, 1988; Ingram, McCormack, & Kennedy, 1992; Kurowski, Blumstein, & Alexander, 1996). Acoustic analyses of the patterns of speech production of these patients provide a potential explanation for why listeners report that the patient speaks as though she or he has a foreign accent. Although the deficit is primarily phonetic in nature, it particularly affects the rhythmic and prosodic patterns of language, including the production of vowels, the syllable structure of words, and the prosody of language (which is often hypermelodic), phonetic characteristics that are only minimally affected in anterior aphasics. Even phonetic errors that occur on consonants typically affect the syllable structure of the output. Thus, unlike the articulatory implementation deficits that characterize anterior aphasics, the phonetic patterns of speech of patients displaying the foreign accent syndrome preserve the patterns of prosody and rhythm that occur in natural language. It is for this reason that listeners are likely to “hear” the speech output pattern of these patients as “foreign.” These results suggest that there are multiple mechanisms that result in speech output disorders. A great deal more research is required to determine whether these mechanisms are part of a single output system or whether a number of different mechanisms contribute to the articulatory implementation of speech.

Although it is not surprising to find that anterior portions of the left hemisphere, particularly those localized in the vicinity of the motor cortex, are implicated in the production of speech, recent results suggest that posterior areas of the brain are also involved. There is no question that phonetic patterns of speech are qualitatively distinct in anterior and posterior aphasics. Posterior aphasics do not display the timing deficits that anterior aphasics manifest in the production of voice-onset time in stop consonants ( Blumstein et al., 1980; Gandour & Dardarananda, 1984a; Hoit-Dalgaard, Murry, & Kopp, 1983; Shewan et al., 1984; Tuller, 1984 ) or in the production of nasal consonants ( Itoh & Sasanuma, 1983 ). Nor do they show impairments in laryngeal control either for the production of voicing or for those articulatory maneuvers requiring the integration of laryngeal movements and movements of the supralaryngeal vocal tract ( Baum et al., 1990; Shinn & Blumstein, 1983 ). Nevertheless, although clearly distinguished from anterior aphasics, posterior patients do display a subtle phonetic impairment even in the production of single syllables or isolated words in citation form. Most typically, they show increased variability in the implementation of a number of phonetic parameters ( Kent & McNeill, 1987; Ryalls, 1986 ), including vowel formant frequencies ( Ryalls, 1986 ) and vowel durations ( Gandour, Ponglorpisit, Khunadorn, Dechongkit, Boongird, & Boonklam, 1992; Ryalls, 1986; Tuller, 1984 ). In addition, they show abnormal patterns in the temporal relations of segmental structure within and between words ( Baum, 1992; Baum et al., 1990; Gandour et al., 1993, 1994 ). Finally, they show impairments in the production of a number of phonetic dimensions under different speaking rate conditions, including voice-onset time in stop consonants, fricative duration as a cue to voicing, and vowel duration ( Baum, 1993, 1996; Baum & Ryan, 1993; Kent & McNeill, 1987; McNeill et al., 1990 ). Studies exploring the temporal patterns at the sentence level also indicate deficits in temporal (durational) patterns, perhaps caused by speech planning deficits ( Gandour et al., 1994 ). Because these phonetic impairments are not clinically perceptible but emerge only on acoustic analysis, they are thought to be subclinical (cf. Baum et al., 1990; Vijayan & Gandour, 1995 ).

These subclinical impairments in speech production found in left hemisphere posterior aphasics do not emerge in right hemisphere patients ( Gandour et al., 1994; Kurowski et al., 1998 ). Thus, the increased variability in posterior aphasics is not due to a so-called brain-damage effect. Rather, these impairments suggest that the speech production system is a complex network involving both posterior and anterior brain structures. The role of these brain structures in speech production seems to be different, as shown by the differential patterns of deficits. Nevertheless, both anterior and posterior structures ultimately contribute to the speech production process.

The nature of the posterior mechanism contributing to articulatory implementation is not clear. Several hypotheses may be suggested, but at this point they remain speculative. Posterior fibers project anteriorly to the motor cortex system, and damage to those fibers could affect the speech implementation system itself. Alternatively, the auditory feedback system that normally contributes to the control of the articulatory parameters of speech may be impaired. Finally, the planning window that allows for the production of word strings at the sentence level may be shorter than normal. More research is required to determine the nature of the mechanisms involved, but what is clear is that the traditional dichotomy between production, subserved by anterior brain structures, and perception, subserved solely by posterior structures, is not supported.

Deep Brain Stimulation and Speech Production in Parkinson's Disease

Gary Weismer , ... Kate Bunton , in Journal of Neurolinguistics , 2012

1 Introduction

Speech production is a complex motor behavior, implemented by multiple structures moving simultaneously and often asynchronously. The movements occur within the respiratory system, larynx, and vocal tract, the latter a flexible tube extending from the upper level of the vocal folds to the lips. The vocal tract plays the major role in the production of speech sounds such as vowels and consonants. Two intertwined results of this time-varying ensemble of movements are that (1) the airstream generated by movements of the respiratory system and modulated by the nearly periodic valving at the level of the vocal folds is intermittently and briefly interrupted, partially or completely, by movements of vocal tract structures (such as the lips, tongue, and jaw), and (2) the time-varying shapes and valvings within the vocal tract, often combined with quasi-periodic energy generated by the vibrating vocal folds but sometimes independent of this energy, generate the speech acoustic signal.

The generation of the speech acoustic signal by the underlying movements of the speech mechanism defines one side of the communication process. The other side is a listener’s understanding of the speech acoustic signal. Depending on the specific design of an experimental test, the identity of the speaker and listener, and additional factors, lawful relations of varying strength can be demonstrated between the “goodness” of the speech acoustic signal and a listener’s ability to understand a talker’s speech movements ( Weismer, 2008 ).

When a speaker experiences some loss of ability to control speech movements, a measure of listener ability to identify the intended speech message should reflect that loss. Moreover, this listener measure might be expected to vary systematically with the magnitude of the speech movement problem: the greater the problem, the lower the index of speech understanding. Correspondingly, the index of speech understanding should improve with improvements in speech movements resulting from disease recovery or from any of several therapies designed to ameliorate the problem. The opposite expectation is deteriorating speech movements with disease progression or as an unintended result of certain contemporary drug or surgical therapies whose primary purpose is relief from nonspeech (limb) symptoms in persons with neurological disease. Some loss of speech movement control, of course, might not affect the listener’s recovery of the intended message, but may still be detectable by a listener as a deviation from “normal” speech; because this paper is concerned with speech movement effects on intelligibility (rather than, say, “acceptability”), we will not consider this issue further, although it is worthy of careful consideration in the future.

An open question concerns the effects of Deep Brain Stimulation (DBS) on speech production and intelligibility, primarily in persons with Parkinson’s disease (PD) but in patients with other neurological diseases as well. The current paper considers different ways to measure changes in speech production, and by implication speech intelligibility, resulting from disease progression, speech therapy, or drug and/or surgical interventions whose primary goal is alleviation of nonspeech (e.g., limb) symptoms. As reviewed by Tripoliti and Limousin (2010) , some studies of patients with PD have shown limited improvement in speech measures as a result of DBS, but other studies show an absence of a speech effect or a worsening of speech production and speech intelligibility. Many published evaluations of DBS and speech production/intelligibility have made use of relatively crude measures, or measures whose relevance to speech can be questioned; these issues are treated below. The discussion that follows focuses on DBS and its potential effects on speech production/intelligibility, but the considerations and conclusions may be applicable to any treatment or disease-progression changes in speech production and intelligibility.

Pediatric Facial Plastic and Reconstructive Surgery

Tianjie Shen MD , Kathleen C.Y. Sie MD , in Facial Plastic Surgery Clinics of North America , 2014

Speech is one of the main forms of communication. Speech production is a complex motor activity that requires generation of air pressure (respiratory control), vocal function (phonation), articulation, motor planning, and velopharyngeal (VP) function. Coordination of these components is needed to produce intelligible speech. 1,2 Most speech delays, such as dysfluency, articulation errors, and childhood apraxia of speech, are treated with speech therapy. However, 2 conditions, ankyloglossia and VP dysfunction (VPD), may require surgical intervention for their associated speech symptoms. This article discusses the surgical management of ankyloglossia and VPD.

Cerebral Lateralization and Cognition: Evolutionary and Developmental Investigations of Behavioral Biases

Jessica C. Hodgson , John M. Hudson , in Progress in Brain Research , 2018

4.5 Neurobiological Evidence

It has been suggested that the crucial component underlying the relationship between language lateralization and handedness is the extent to which each of these functions relies on sequential processing, for example, through motor phrases or speech utterances ( Kimura, 1993 ). Speech production is a highly complex motor act involving the coordination and synchronization of multiple neural and muscular networks. During speech, a number of component processes occur to support the retrieval of the phonological code, which underlies the lexical representation of word forms ( Tremblay et al., 2016 ). This code consists of segmental information, such as syllables and phonemes, and suprasegmental information, such as emphasis or stress.

This process of retrieval of phonologically encoded material is associated with the preparation of speech motor action, which involves the activation and translation of phonological representations into multiple domain-general mechanisms, such as response selection, response sequencing, and movement initiation. These so-called supramotor functions are not specific to speech but underlie all action preparation ( Pulvermüller and Fadiga, 2010 ). Such models of speech motor planning posit that speech builds on common action control and motor-sequencing mechanisms which support many different cognitive processes ( Freund et al., 2005 ; Tremblay et al., 2016 ) and therefore suggest that the neural organization of speech production is likely to overlap with motor control regions.

Studies have examined each component process of the speech motor control system and have made considerable progress in identifying the associated neural substrates. Response selection processes involve a bilateral network of supporting motor areas, predominantly the pre-SMA, which have been shown to activate more strongly during execution of stimulus-response tasks requiring selection of appropriate responses from a range of alternatives ( Crosson et al., 2001 ). Speech motor response sequencing, the act of organizing segmental information into words and sentences prior to vocalization, relies more heavily on the motor areas. Indeed, interruption of these regions via transcranial magnetic stimulation (TMS) has been shown to disrupt sequencing performance on a finger-tapping task and to impair performance on oral motor gestures ( Tremblay and Gracco, 2009 ), findings that demonstrate the complementary cortical organization of motor and language networks ( Gentilucci, 2003 ). Finally, the response initiation component of speech production is perhaps the least well documented, although early studies show a clear role for the cerebellum and basal ganglia in translating motor planning into action ( Tremblay et al., 2016 ). Indeed, deficits in cerebellar–parietal networks have been identified in children who have neurodevelopmental motor impairments such as DCD ( Zwicker et al., 2011 ), supporting the suggestion that response initiation involves these regions.

Speech MRI: Morphology and function

Andrew D. Scott , ... Marc E. Miquel , in Physica Medica , 2014

Basics of speech production

The production of human speech is a complex process (see Refs. [1–4], for example) involving numerous organs, namely: the lungs, diaphragm and chest wall; the larynx, pharynx and vocal folds (or cords); the tongue, lips and soft palate (or velum); and the teeth, jaw and nasal cavity. The lungs, driven by the diaphragm and/or chest wall, provide the airflow, which travels through the lower respiratory tract (the bronchioles, the two bronchial tubes and the trachea). Air enters the upper respiratory tract (the focus of this review) through the larynx, which contains the vocal cords; a schematic of the upper airway in mid-sagittal view is shown in Fig. 1 . Air is forced through a narrow gap between the vocal folds, which vibrate, producing a fundamental frequency plus harmonics. By manipulation of the tension, length and separation of the vocal cords and control of the airflow between them, the fundamental frequency, volume, and therefore the intonation, of speech can be controlled. The remainder of the upper respiratory tract forms a series of connected resonant cavities which can be modified in size and shape using the pharynx, velum, uvula, jaw, tongue and lips. These manipulators modify the formant frequencies of speech. For consonant sounds, speech is further complicated by articulation, which is the partial or full obstruction of the vocal tract by a pair of articulators – the tongue tip and upper teeth, tongue body and hard palate, the lips, or the velum and tongue dorsum, for example ( Fig. 2 ).
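The resonances of these connected cavities can be roughly anticipated with the standard uniform-tube idealization (a tube closed at the glottis and open at the lips resonates at odd multiples of c/4L). The short sketch below is a textbook approximation added here for illustration, not a model drawn from this review.

```python
# Quarter-wavelength approximation: a uniform tube closed at one end
# (glottis) and open at the other (lips) resonates at odd multiples
# of c / 4L. Real vocal tracts are non-uniform, so this is a rough guide.

SPEED_OF_SOUND_CM_S = 35000.0  # approximate, warm moist air

def uniform_tube_formants(length_cm: float, n: int = 3) -> list[float]:
    """Return the first n resonance (formant) frequencies in Hz."""
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for k in range(1, n + 1)]

# A ~17.5 cm adult vocal tract yields roughly 500, 1500, 2500 Hz,
# close to the formants of a neutral, schwa-like vowel.
print([round(f) for f in uniform_tube_formants(17.5)])
```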


Figure 1 . Anatomy of the upper airways, mid-sagittal view.


Figure 2 . Anatomy of the palate and principal musculature involved in speech, en-face view.

Speech: Biological Basis

D.H. Whalen , B. Lindblom , in Encyclopedia of Language & Linguistics (Second Edition) , 2006

Speech is produced by coordinated action of the lungs, larynx, jaw, tongue, velum, and lips ( see Speech Production ). As mentioned, each of these organs contributes to other functions, ranging from the necessary (breathing) to the aesthetic (singing). The neural control of the vocal system is quite complex and is generally separate from that of the other functions. There is a different coupling of muscle activations for chewing than for speech ( Moore et al. , 1988 ); chewing is even dissociated from babbling ( Moore and Ruark, 1996 ). This calls into question theories that propose that speech arose out of nonspeech gestures ( MacNeilage, 1998 ). Speech is not only biologically special (unique to humans), it is specialized as well (innervating the same muscles differently depending on whether speech function is involved or not).

For humans, unlike other primates, vocalization is under cortical control. Other primates primarily initiate their vocalizations subcortically and thus have no control over the form of most of their vocalizations, though they are able to control some ( Jürgens, 1998 ). Humans have volitional (though usually not conscious) control over every aspect of the speech production process, from breathing up through articulation. The breathing patterns for speech are quite different from those of tidal breathing, with very short intakes and long exhalations under active control ( Lenneberg, 1967; MacLarnon and Hewitt, 1999 ). The lungs are elastic, like a balloon, so that pressure is generated by this elasticity itself, promoting exhalation. During speech, the chest cavity is held outward to slow the escape of the air ( Winkworth et al. , 1995 ); those same muscles can contract to increase pressure, especially as lung volume decreases.

The larynx provides the voiced source for speech, creating a periodic source by the alternating closure and opening of the vocal folds. The rate of alternation is the fundamental frequency of a sound and is governed by the rate of air flow, the stiffness and density of the folds, and other adjustments in the vocal tract. The cycling movement of the vocal folds is an aerodynamic phenomenon due to elasticity of the folds and variations in air pressure: the folds are forced apart by subglottal pressure, then drawn back together, according to the Bernoulli principle, due to reduced pressure in the air rushing through the glottis.

Speech sounds are overlapped in their production, resulting in coarticulation (e.g., Fowler, 1980; Liberman et al. , 1967 ). This allows for more efficient use of the sound stream but results in an acoustic signal that is highly context dependent. The perceptual system, having evolved in concert with the production system, is prepared to deal with this coarticulation, a fact that has been dubbed ‘parity’ between the two systems ( Liberman and Mattingly, 1985 ). Rather than treating the context dependence as noise to be overcome, the perceptual system treats virtually all of it as information about the production process.

Neurologically, the area most clearly involved is Broca's area (Brodmann areas 44/45), which is essential to voluntary articulation. Motor and premotor areas and the anterior insula are involved as well; the cerebellum is involved in fine-tuning the timing of speech. Although recent neural imaging techniques have improved our ability to examine the brain during various tasks, studying speech production is difficult because the movement of the articulators tends to move the head as well, making alignment of the images problematic. Event-related fMRI allows us to overcome this limitation to some extent, but the results so far have been somewhat inconclusive about the exact network of systems responsible for speech production. Future refinements should improve our models.



9.2 The Standard Model of Speech Production

Speech production falls into three broad areas: conceptualization, formulation, and articulation (Levelt, 1989). In conceptualization, we determine what to say; this is sometimes known as message-level processing. We then need to formulate these concepts into linguistic forms. Formulation takes conceptual entities as input and connects them with the relevant words to build a syntactic, morphological, and phonological structure. This structure is then phonetically encoded and articulated, resulting in speech.

During conceptualization, we develop an intention and select relevant information from the internal (memory) or external (stimuli) environment to create an utterance. Very little is known about this level as it is pre-verbal. Levelt (1989) divided this stage into macroplanning and microplanning. Macroplanning is thought to be the elaboration of a communicative goal into subgoals and the connection of these subgoals with the relevant information. Microplanning assigns the correct shape to these pieces of information and decides on the focus of the utterance.

Formulation is divided into lexicalization and syntactic planning . In lexicalization, we select the relevant word-forms, and in syntactic planning we put these together into a sentence. In talking about word-forms, we need to consider the idea of lemmas . A lemma is the basic abstract conceptual form that is the basis for other derivations. For example, break can be considered a lemma that is the basis for forms such as break , breaks , broke , broken and breaking . Lemma retrieval uses a conceptual structure to retrieve a lemma that makes syntactic properties available for encoding (Kempen & Hoenkamp, 1987). These properties specify parameters such as number, tense, and gender. During word-form encoding, the information connected to lemmas is used to access the morphemes and phonemes linked to the word. The reason these two processing levels, lemma retrieval and word-form encoding, are assumed to exist comes from speech errors in which words exchange within the same syntactic categories. For example, nouns exchange with nouns and verbs with verbs from different phrases. Bierwisch (1970), Garrett (1975, 1980) and Nooteboom (1967) provide some examples:

  • “… I left my briefcase in the cigar ”
  • “What we want to do is train its tongue to move the cat ”
  •  “We completely forgot to add the list to the roof ”
  • “As you reap , Roger, so shall you sow ”

We see here that not only do the exchanged words belong to the same syntactic categories, the function words associated with the exchanges appear to be added after the exchange (as in ‘its’ before ‘tongue’ and ‘the’ before ‘cat’). In contrast to entire words (which exchange across different phrases), segment exchanges usually occur within the same phrase and do not make any reference to syntactic categories. Garrett (1988) provides an example in “she is a real rack pat” instead of “she is a real pack rat.” In such errors, the segments involved often share phonetic similarities or the same syllable position (Dell, 1984). This suggests that these segments must be operating within some frame such as syllable structure. To state this in broader terms, word exchanges are assumed to occur during lemma retrieval, and segment exchanges occur during word-form encoding.

Putting these basic elements together, Meyer (2000) introduced the ‘Standard Model of Word-form Encoding’ (see Figure 9.2) as a summation of previously proposed speech production models (Dell, 1986; Levelt et al., 1999; Shattuck-Hufnagel, 1979, 1983; Fromkin, 1971, 1973; Garrett, 1975, 1980). The model is not complete in itself but rather a way of understanding the various levels assumed by most psycholinguistic models. The model represents levels for morphemes, segments, and phonetic representations.

[Figure 9.2: The Standard Model of Word-form Encoding; see image description below.]
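As a rough illustration of the levels in Figure 9.2, the sketch below lays them out as a simple data structure for the word “tiger”. The transcriptions are simplified stand-ins added for this example and are not the output of any particular model.

```python
# Illustrative only: the levels of the Standard Model for "tiger".
# Transcriptions are simplified stand-ins, not model output.

word_form_encoding = {
    "semantic": "concept TIGER",
    "lemma": "tiger (noun, singular)",
    "morpheme": ["{tiger}"],                # a single free morpheme
    "phoneme": ["t", "aɪ", "g", "ə", "r"],  # phonological encoding
    "phonetic": ["taɪ", "gər"],             # syllabified for articulation
}

for level, representation in word_form_encoding.items():
    print(f"{level:>8}: {representation}")
```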

Morpheme Level

We have already seen (in Chapter 3 ) that morphemes are the smallest units of meaning. A word can be made up of one or more morphemes. Speech errors involving morphemes affect the lemma level or the word-form level (Dell, 1986), as in:

  • “how many pies does it take to make an apple ?” (Garrett, 1988)
  • “so the apple has less trees ” (Garrett, 2001)
  • “I’d hear one if I knew it” (Garrett, 1980)
  • “… slicely thinned” (Stemberger, 1985)

In the first example, we see that the morpheme indicating plural number has remained in place while the stems ‘apple’ and ‘pie’ exchanged. This is also seen in the last example. This suggests that the exchange occurred after the parameters for number were set, indicating that lemmas can switch independently of their morphological and phonological representations (which occur further down in speech production).

Segment Level

While speech production models differ in their organisation and storage of segments, we will assume that segments have to be retrieved at some level of speech production. Between 60% and 90% of all speech errors tend to involve single segments (Boomer & Laver, 1968; Fromkin, 1971; Nooteboom, 1969; Shattuck-Hufnagel, 1983), while 10–30% of all speech errors involve segment sequences (Stemberger, 1983; Shattuck-Hufnagel, 1983). Reaction-time experiments have also been employed to justify this level. Roelofs (1999) asked participants to learn a set of word pairs; the first word of each pair was then presented as a prompt to produce the second word. The test blocks were phonologically homogeneous or heterogeneous. In the homogeneous blocks the targets shared onsets or had initial segments differing only in voicing. In the heterogeneous blocks the initial segments contrasted in both voicing and place of articulation. He found priming effects in homogeneous blocks when the targets shared an initial segment, but not when all but one feature was shared, suggesting that whole phonological segments, rather than distinctive features, are represented at some level.

Phonetic Level

The segmental level we just discussed is based on phonemes. The standard understanding of speech is that there must be a phonetic level that represents the actual articulated speech as opposed to the stored representations of sound. We have already discussed this in Chapter 2 and will expand on it here. For example, in English, there are two realizations of unvoiced stops. One form is unaspirated [p], [k], and [t], and the other is aspirated [pʰ], [kʰ], and [tʰ]. This can be seen in the words pit [pʰɪt] and lip [lɪp], where syllable-initial stops are aspirated as a rule. The pronunciation of pit as *[pɪt] doesn’t change the meaning but will sound odd to a native speaker. This shows that /p/ has one phonemic value but two phonetic values: [p] and [pʰ]. Speech production can thus be understood as moving from an abstract level to a concrete one as it unfolds. Having familiarized ourselves with the basic levels of speech production, we can now go on to see how they are realized in actual speech production models; a minimal sketch of the aspiration rule follows.
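The toy function below applies the allophonic rule just described, aspirating a voiceless stop in syllable-initial position. The transcription scheme is simplified and assumes the syllable has already been identified; it is an illustration, not a full phonological model.

```python
# Toy allophonic rule: English voiceless stops /p t k/ surface as
# aspirated [pʰ tʰ kʰ] in syllable-initial position.

VOICELESS_STOPS = {"p", "t", "k"}

def realize_syllable(phonemes: list[str]) -> list[str]:
    """Map a syllable's phonemes to phones, aspirating an initial stop."""
    phones = list(phonemes)
    if phones and phones[0] in VOICELESS_STOPS:
        phones[0] += "ʰ"  # aspirated allophone
    return phones

print(realize_syllable(["p", "ɪ", "t"]))  # ['pʰ', 'ɪ', 't'] -> pit [pʰɪt]
print(realize_syllable(["l", "ɪ", "p"]))  # ['l', 'ɪ', 'p']  -> lip [lɪp]
```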

Image descriptions

Figure 9.2 The Standard Model of Speech Production

The Standard Model of Word-form Encoding as described by Meyer (2000), illustrating five levels, from conceptualization through lemma, morpheme, and phoneme to phonetic encoding, using the example word “tiger”. From top to bottom, the levels are:

  • Semantic level: the conceptualization of “tiger” with an image of a tiger.
  • Lemma level: select the lemma of the word “tiger”.
  • Morpheme level: morphological encoding of the word “tiger”, which consists of a single morpheme.
  • Phoneme level: phonological encoding of each morpheme in the word “tiger”.
  • Phonetic level: syllabification of the phonemes in the word “tiger”.


Media Attributions

  • Figure 9.2 The Standard Model of Speech Production by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .

Glossary

  • Conceptualization: the process of forming a concept or idea.
  • Formulation: the creation of the word form during speech production.
  • Articulation: the formation of speech.
  • Lexicalization: the process of developing a word for production.
  • Syntactic planning: the planning of word order in a sentence.
  • Lemma: the form of a word as it is presented at the head of an entry in a dictionary.

Psychology of Language Copyright © 2021 by Dinesh Ramoo is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.



The Source–Filter Theory of Speech

  • Isao Tokuda, Ritsumeikan University
  • https://doi.org/10.1093/acrefore/9780199384655.013.894
  • Published online: 29 November 2021

In the source-filter theory, the mechanism of speech production is described as a two-stage process: (a) The air flow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the “source” sound. Turbulent airflows are also created at the glottis or at the vocal tract to generate noisy sound sources. (b) Spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are diminished. The source sound mainly characterizes the vocal pitch (i.e., fundamental frequency), while the filter forms the timbre. The source-filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communications, especially for human language, which requires expression of various phonemes realized by a flexible maneuver of the vocal tract configuration. Based on this idea, the articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds.

The source-filter theory elucidates the mechanism of “resonance tuning,” that is, a specialized way of singing. To increase efficiency of the vocalization, soprano singers adjust the vocal tract filter to tune one of the resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice, which is well perceived in a large concert hall over the orchestra.

It should be noted that the source–filter theory is based upon the assumption that the source and the filter are independent from each other. Under certain conditions, the source and the filter interact with each other. The source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos.

  • source–filter theory
  • speech production
  • vocal fold vibration
  • turbulent air flow
  • vocal tract acoustics
  • resonance tuning
  • source–filter interaction

1. Background

Human speech sounds are generated by a complex interaction of components of human anatomy. Most speech sounds begin with the respiratory system, which expels air from the lungs (figure 1). The air goes through the trachea and enters the larynx, where two small muscular folds, called “vocal folds,” are located. As the vocal folds are brought together to form a narrow air passage, the airstream causes them to vibrate in a periodic manner (Titze, 2008). The vocal fold vibrations modulate the air pressure and produce a periodic sound. The sounds produced while the vocal folds are vibrating are called “voiced sounds,” while those in which the vocal folds do not vibrate are called “unvoiced sounds.” The air passages above the larynx are called the “vocal tract.” Turbulent air flows generated at constricted parts of the glottis or the vocal tract also contribute aperiodic source sounds distributed over a wide range of frequencies. The shape of the vocal tract, and consequently the positions of the articulators (i.e., jaw, tongue, velum, lips, mouth, teeth, and hard palate), is a crucial factor in determining the acoustical characteristics of the speech sounds. The state of the vocal folds, as well as the positions, shapes, and sizes of the articulators, changes over time to produce various phonetic sounds sequentially.

Figure 1. Concept of the source-filter theory. Airflow from the lung induces vocal fold vibrations, where glottal source sound is created. The vocal tract filter shapes the spectral structure of the source sound. The filtered speech sound is finally radiated from the mouth.

To systematically understand the mechanism of speech production, the source-filter theory divides the process into two stages (Chiba & Kajiyama, 1941; Fant, 1960) (see figure 1): (a) The air flow coming from the lungs induces tissue vibration of the vocal folds that generates the “source” sound. Turbulent noise sources are also created at constricted parts of the glottis or the vocal tract. (b) Spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components, which correspond to the resonances of the vocal tract, are amplified, while the other frequency components are diminished. The source sound characterizes mainly the vocal pitch, while the filter forms the overall spectral structure.

The source-filter theory provides a good approximation of normal human speech, under which the source sounds are only weakly influenced by the vocal tract filter, and has been applied successfully to speech analysis, synthesis, and processing (Atal & Schroeder, 1978 ; Markel & Gray, 2013 ). Independent control of the source (phonation) and the filter (articulation) is advantageous for acoustic communications with language, which requires expression of various phonemes with a flexible maneuver of the vocal tract configuration (Fitch, 2010 ; Lieberman, 1977 ).

2. Source-Filter Theory

There are four main types of sound sources that provide an acoustic input to the vocal tract filter: glottal source, aspiration source, frication source, and transient source (Stevens, 1999 , 2005 ).

The glottal source is generated by the vocal fold vibrations. The vocal folds are muscular folds located in the larynx. The opening space between the left and right vocal folds is called the “glottal area.” When the vocal folds are closely located to each other, the airflow coming from the lungs can cause the vocal fold tissues to vibrate. With combined effects of pressure, airflow, tissue elasticity, and collision between the left and right vocal folds, the vocal folds give rise to vibrations, which periodically modulate acoustic air pressure at the glottis. The number of periodic glottal vibrations per second is called the “fundamental frequency ( f o )” and is expressed in Hz or cycles per second. In the spectral space, the glottal source sound determines the strengths of the fundamental frequency and its integer multiples (harmonics). The glottal wave provides sources for voiced sounds such as vowels (e.g., [a], [e], [i], [o], [u]), diphthongs (i.e., combinations of two vowel sounds), and voiced consonants (e.g., [b], [d], [ɡ], [v], [z], [ð], [ʒ], [ʤ], [w], [n], [m], [r], [j], [ŋ], [l]).

In addition to the glottal source, noisy signals also serve as the sound sources for consonants. Here, air turbulence developed at constricted or obstructed parts of the airway contributes random (aperiodic) pressure fluctuations over a wide range of frequencies. Among such noisy signals, the one generated through the glottis or immediately above the glottis is called “aspiration noise.” It is characterized by a strong burst of breath that accompanies either the release or the closure of some obstruents. “Frication noise,” on the other hand, is generated by forcing air through a supraglottal constriction created by placing two articulators close together (e.g., constrictions between lower lip and upper teeth, between back of the tongue and soft palate, and between side of the tongue and molars) (Shadle, 1985, 1991). When an airway in the vocal tract is completely closed and then released, “transient noise” is generated. By forming a closure in the vocal tract, pressure builds up in the mouth behind the closure. As the closure is released, a brief burst of turbulence is produced, which lasts for a few milliseconds.

Some speech sounds may involve more than one sound source. For instance, a voiced fricative combines the glottal source and the frication noise. A breathy voice may come from the glottal source and the aspiration noise, whereas voiceless fricatives can combine two noise sources generated at the glottis and at the supralaryngeal constriction. These sound sources are fed into the vocal-tract filter to create speech sounds.

In the source-filter theory, the vocal tract acts as an acoustic filter to modify the source sound. Through this acoustic filter, certain frequency components are passed to the output speech, while the others are attenuated. The characteristics of the filter depend upon the shape of the vocal tract. As a simple case, consider the acoustic characteristics of a uniform tube of length L = 17.5 cm, a standard length for a male vocal tract (see figure 2). At one end, the tube is closed (as glottis), while, at the other end, it is open (as mouth). Inside the tube, longitudinal sound waves travel either toward the mouth or toward the glottis. The wave propagates by alternately compressing and expanding the air in the tube segments. By this compression/expansion, the air molecules are slightly displaced from their rest positions. Accordingly, the acoustic air pressure inside the tube changes in time, depending upon the longitudinal displacement of the air along the direction of the traveling wave. The profile of the acoustic air pressure inside the tube is determined by the traveling waves going to the mouth or to the glottis. What is formed here is a “standing wave,” the peak amplitude profile of which does not move in space. The locations at which the absolute value of the amplitude is minimum are called “nodes,” whereas the locations at which the absolute value of the amplitude is maximum are called “antinodes.” Since the air molecules cannot vibrate much at the closed end of the tube, the closed end becomes a node. The open end of the tube, on the other hand, becomes an antinode, since the air molecules can move freely there. Various standing waves that satisfy these boundary conditions can be formed. In figure 2, the 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves indicate the first, second, and third resonances, respectively. Depending upon the number of nodes in the tube, the wavelengths of the standing waves are λ = 4L, 4L/3, and 4L/5. The corresponding frequencies are f = c/λ = 490, 1470, and 2450 Hz, where c = 343 m/s is the sound speed. These resonant frequencies are called “formants” in phonetics.

Figure 2. Standing waves of a uniform tube. For a tube having one closed end (glottis) and one open end (mouth), only odd-numbered harmonics are available. The 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves correspond to the first, second, and third resonances (“1/4 wave” means 1/4 of a one-cycle waveform is inside the tube).
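
The quarter-wavelength relation above is easy to check numerically. A minimal sketch (Python; the tube length and sound speed are the values quoted in the text) reproduces the three formant frequencies:

    # Resonances of a uniform tube closed at one end (glottis) and open
    # at the other (mouth): only odd quarter-wavelength modes fit the
    # boundary conditions, so f_n = (2n - 1) * c / (4 * L).
    c = 343.0    # speed of sound (m/s)
    L = 0.175    # vocal tract length (m)

    for n in (1, 2, 3):
        print(f"Resonance {n}: {(2 * n - 1) * c / (4 * L):.0f} Hz")
    # -> 490, 1470, 2450 Hz, the formant values quoted in the text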

Next, consider that a source sound is input to this acoustic tube. In the source sound (voiced source or noise, or both), acoustic energy is distributed in a broad range of frequencies. The source sound induces vibrations of the air column inside the tube and produces a sound wave in the external air as the output. The strength at which an input frequency is output from this acoustic filter depends upon the characteristics of the tube. If the input frequency component is close to one of the formants, the tube resonates with the input and propagates the corresponding vibration. Consequently, the frequency components near the formant frequencies are passed to the output at their full strength. If the input frequency component is far from any of these formants, however, the tube does not resonate with the input. Such frequency components are strongly attenuated and achieve only low oscillation amplitudes in the output. In this way, the acoustic tube, or the vocal tract, filters the source sound. This filtering process can be characterized by a transfer function, which describes dependence of the amplification ratio between the input and output acoustic signals on the frequency. Physically, the transfer function is determined by the shape of the vocal tract.

Finally, the sound wave is radiated from the lips of the mouth and the nose. Their radiation characteristics are also included in the vocal-tract transfer function.

2.3 Convolution of the Source and the Filter

Humans are able to control phonation (source generation) and articulation (filtering process) largely independently. The speech sounds are therefore considered as the response of the vocal-tract filter, into which a sound source is fed. To model such source-filter systems for speech production, the sound source, or excitation signal x(t), is often implemented as a periodic impulse train for voiced speech, while white noise is used as a source for unvoiced speech. If the vocal-tract configuration does not change in time, the vocal-tract filter becomes a linear time-invariant (LTI) system, and the output signal y(t) can be expressed as a convolution of the input signal x(t) and the impulse response of the system h(t):

y(t) = x(t) * h(t),   (1)

where the asterisk denotes convolution. Equation (1), which is described in the time domain, can also be expressed in the frequency domain as

Y(ω) = X(ω) H(ω).   (2)

The frequency-domain formula states that the speech spectrum Y(ω) is modeled as the product of the source spectrum X(ω) and the spectrum of the vocal-tract filter H(ω). The spectrum of the vocal-tract filter H(ω) is represented by the product of the vocal-tract transfer function T(ω) and the radiation characteristics from the mouth and the nose R(ω), that is, H(ω) = T(ω) R(ω).
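
As a minimal numerical illustration of equations (1) and (2), the following sketch (Python/NumPy; the impulse-train source and the single-resonance impulse response are illustrative assumptions, not data from this article) verifies that time-domain convolution and frequency-domain multiplication give the same output signal:

    import numpy as np

    fs = 16000                      # sampling rate (Hz)
    f0 = 100                        # fundamental frequency (Hz)
    n = 4096                        # frame length in samples

    # Source x(t): periodic impulse train standing in for voiced excitation.
    x = np.zeros(n)
    x[:: fs // f0] = 1.0

    # Toy impulse response h(t): a single decaying resonance at 800 Hz.
    t = np.arange(n) / fs
    h = np.exp(-200 * t) * np.cos(2 * np.pi * 800 * t)

    # Equation (1): convolution in the time domain.
    y_time = np.convolve(x, h)[:n]

    # Equation (2): multiplication of spectra (zero-padded so the FFT's
    # circular convolution matches the linear convolution above).
    N = 2 * n - 1
    y_freq = np.fft.irfft(np.fft.rfft(x, N) * np.fft.rfft(h, N), N)[:n]

    print(np.allclose(y_time, y_freq))  # True: (1) and (2) agree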

There exist several ways to estimate the vocal-tract filter H(ω). The most popular approach is inverse filtering, in which autoregressive parameters are estimated from an acoustic speech signal by the method of least squares (Atal & Schroeder, 1978; Markel & Gray, 2013). The transfer function can then be recovered from the estimated autoregressive parameters. In practice, however, inverse filtering is limited to non-nasalized or slightly nasalized vowels. An alternative approach is based upon measurement of the vocal tract shape. For a human subject, the cross-sectional area of the vocal tract can be measured by X-ray photography or magnetic resonance imaging (MRI). Once the area function of the vocal tract is obtained, the corresponding transfer function can be computed by the so-called transmission line model, which assumes one-dimensional plane-wave propagation inside the vocal tract (Sondhi & Schroeter, 1987; Story et al., 1996).
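
A sketch of the inverse-filtering idea, assuming the standard autocorrelation method with the Levinson-Durbin recursion (the article does not prescribe a particular algorithm, and `frame` stands for a hypothetical short, windowed excerpt of a recorded speech signal):

    import numpy as np

    def lpc(frame, order):
        # All-pole (autoregressive) coefficients for one speech frame,
        # estimated by the autocorrelation method with the Levinson-Durbin
        # recursion. Returns a with a[0] = 1, so the estimated filter is
        # H(z) = 1 / (a[0] + a[1] z^-1 + ... + a[order] z^-order).
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
            k = -acc / err                       # reflection coefficient
            a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update previous coefficients
            a[i] = k
            err *= 1.0 - k * k                   # prediction error update
        return a

    # The magnitude of 1 / FFT(a) then traces the spectral envelope of the
    # frame, whose peaks approximate the formants:
    # a = lpc(windowed_frame, order=12)
    # envelope = 1.0 / np.abs(np.fft.rfft(a, 1024))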

Figure 3. (a) Vocal tract area function for a male speaker’s vowel [a]. (b) Transfer function calculated from the area function of (a). (c) Power spectrum of the source sound generated from Liljencrants-Fant model. (d) Power spectrum of the speech signal generated from the source-filter theory.

As an example to illustrate the source-filter modeling, a sound of the vowel /a/ is synthesized in figure 3. The vocal tract area function of figure 3(a) was measured from a male subject by MRI (Story et al., 1996). By the transmission line model, the transfer function H(ω) is obtained as figure 3(b). The first and the second formants are located at F1 = 805 Hz and F2 = 1205 Hz. By the inverse Fourier transform, the impulse response of the vocal tract system h(t) is derived. As a glottal source sound, the Liljencrants-Fant synthesis model (Fant et al., 1985) is utilized. The fundamental frequency is set to f o = 100 Hz, which gives rise to a sharp peak in the power spectrum in figure 3(c). Except for the peaks appearing at higher harmonics of f o, the spectral structure of the glottal source is rather flat. As shown in figure 3(d), convolution of the source signal with the vocal tract filter amplifies the higher harmonics of f o located close to the formants.
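
A hedged sketch of this kind of synthesis is given below. Instead of the measured area function and the Liljencrants-Fant source, it uses a plain impulse train at f o = 100 Hz and a two-resonance all-pole filter placed at the F1 and F2 values quoted above; the formant bandwidths (80 and 100 Hz) are assumptions, not values from the article:

    import numpy as np
    from scipy.signal import lfilter

    fs = 16000                            # sampling rate (Hz)
    f0 = 100                              # fundamental frequency (Hz)

    # One conjugate pole pair per formant: radius from the bandwidth,
    # angle from the centre frequency.
    a = np.array([1.0])
    for F, B in [(805, 80), (1205, 100)]:
        r = np.exp(-np.pi * B / fs)
        theta = 2 * np.pi * F / fs
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])

    # Crude glottal source: a periodic impulse train. (The article uses
    # the Liljencrants-Fant model; an impulse train keeps the sketch short.)
    x = np.zeros(fs)                      # one second of excitation
    x[:: fs // f0] = 1.0

    y = lfilter([1.0], a, x)              # vowel-like /a/ approximation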

Since the source-filter modeling captures the essence of speech production, it has been successfully applied to speech analysis, synthesis, and processing (Atal & Schroeder, 1978; Markel & Gray, 2013). It was Chiba and Kajiyama (1941) who first explained the mechanisms of speech production based on the concept of phonation (source) and articulation (filter). Their idea was combined with Fant’s filter theory (Fant, 1960), which led to the “source-filter theory of vowel production” in the studies of speech production.

So far, the source-filter modeling has been applied only to the glottal source, in which the vocal fold vibrations provide the main source sounds. There are other sound sources, such as the frication noise. In the frication noise, air turbulence is developed at constricted (or obstructed) parts of the airway. Such a random source also excites the resonances of the vocal tract in a similar manner as the glottal source (Stevens, 1999, 2005). Its marked difference from the glottal source is that the filter property is determined by the vocal tract shape downstream from the constriction (or obstruction). For instance, if the constriction is at the lips, there exists no cavity downstream from the constriction, and therefore the acoustic source is radiated directly from the mouth opening with no filtering. When the constriction is upstream from the lips, the shape of the airway between the constriction and the lips determines the filter properties. It should also be noted that the turbulent source, generated at the constriction, depends sensitively on the three-dimensional geometry of the vocal tract. Therefore, the three-dimensional shape of the vocal tract (not the one-dimensional area function) should be taken into account to model the frication noise (Shadle, 1985, 1991).

3. Resonance Tuning

As an interesting application of the source-filter theory, “resonance tuning” (Sundberg, 1989) is illustrated. In female speech, the first and the second formants lie between 300 and 900 Hz and between 900 and 2,800 Hz, respectively. In soprano singing, the vocal pitch can reach these two ranges. To increase the efficiency of the vocalization at high f o, a soprano singer adjusts the shape of the vocal tract to tune the first or second resonance (R1 or R2) to the fundamental frequency f o. When one of the harmonics of f o coincides with a formant resonance, the resulting acoustic power (and musical success) is enhanced.

Figure 4. Resonance tuning. (a) The same transfer function as figure 3(b). (b) Power spectrum of the source sound, whose fundamental frequency f o is tuned to the first resonance R1 of the vocal tract. (c) Power spectrum of the speech signal generated from the source-filter theory. (d) Dependence of the amplification ratio (i.e., power ratio between the output speech and the input source) on the fundamental frequency f o.

Figure 4 shows an example of resonance tuning, in which the fundamental frequency is tuned to the first resonance R1 of the vowel /a/ as f o = 805 Hz. As recognized in the output speech spectrum (figure 4(c)), the vocal tract filter strongly amplifies the fundamental frequency component of the vocal source, while the other harmonics are attenuated. Since only a single frequency component is emphasized, the output speech sounds like a pure tone. Figure 4(d) shows the dependence of the amplification ratio (i.e., the power ratio between the output speech and the input source) on the fundamental frequency f o. Indeed, the power of the output speech is maximized at the resonance tuning point of f o = 805 Hz. Without losing the source power, loud voices can be produced with less effort from the singers and, moreover, they are well perceived in a large concert hall over the orchestra (Joliveau et al., 2004).
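
Resonance tuning can be mimicked with the same kind of toy filter: sweep the fundamental frequency and read off the filter gain at f o alone. The gain, and hence the radiated power of the fundamental, peaks when f o reaches the first resonance (a sketch under the same assumed formant bandwidths as in the synthesis example above):

    import numpy as np
    from scipy.signal import freqz

    fs = 16000
    # Two-formant all-pole filter: F1 = 805 Hz, F2 = 1205 Hz
    # (bandwidths of 80 and 100 Hz are assumed values).
    a = np.array([1.0])
    for F, B in [(805, 80), (1205, 100)]:
        r = np.exp(-np.pi * B / fs)
        a = np.convolve(a, [1.0, -2 * r * np.cos(2 * np.pi * F / fs), r * r])

    # Filter gain evaluated at the fundamental frequency as f0 is swept.
    f0_values = np.arange(100.0, 1200.0, 5.0)
    _, H = freqz([1.0], a, worN=f0_values, fs=fs)

    # The gain peaks near the first resonance: tuning f0 to R1 is what
    # soprano singers exploit.
    print(f0_values[np.argmax(np.abs(H))])   # ~805 Hz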

Despite the significant increase in loudness, comprehensibility is sacrificed. With a strong enhancement of the fundamental frequency f o , its higher harmonics are weakened considerably, making it difficult to perceive the formant structure (figure 4 (c) ). This explains why it is difficult to identify words sung in the high range by sopranos.

The resonance tuning discussed here has been based on the linear convolution of the source and the filter, which are assumed to be independent of each other. In reality, however, the source and the filter interact with each other. Depending upon the acoustic properties of the vocal tract, this interaction can facilitate the vocal fold oscillations and make the vocal source stronger. Consequently, the source-filter interaction can make the output speech sound even louder in addition to the linear resonance effect. Such interaction will be explained in more detail in section 4.

It should be of interest to note that some animals such as songbirds and gibbons utilize the technique of resonance tuning in their vocalizations (Koda et al., 2012 ; Nowicki, 1987 ; Riede et al., 2006 ). It has been found through X-ray filming as well as via heliox experiments that these animals adjust the vocal tract resonance to track the fundamental frequency f o . This may facilitate the acoustic communication by increasing the loudness of their vocalization. Again, higher harmonic components, which are needed to emphasize the formants in human language communications, are suppressed. Whether the animals utilize formants information in their communications is under debate (Fitch, 2010 ; Lieberman, 1977 ) but, at least in this context, production of a loud sound is more advantageous for long-distance alarm calls and pure-tone singing of animals.

4. Source-Filter Interaction

The linear source–filter theory, under which speech is represented as a convolution of the source and the filter, is based upon the assumption that the vocal fold vibrations as well as the turbulent noise sources are only weakly influenced by the vocal tract. Such an assumption is, however, valid mostly for male adult speech. The actual process of speech production is nonlinear. The vocal fold oscillations are due to combined effects of pressure, airflow, tissue elasticity, and tissue collision. It is natural that such a complex system obeys nonlinear equations of motion. Aerodynamics inside the glottis and the vocal tract is also governed by nonlinear equations in a strict sense. Moreover, there exists a mutual interaction between the source and the filter (Flanagan, 1968; Lucero et al., 2012; Rothenberg, 1981; Titze, 2008; Titze & Alipour, 2006). First, the source sound, which is generated from the vocal folds, is influenced by the vocal tract, since the vocal tract determines the pressure above the vocal folds and thereby changes the aerodynamics of the glottal flow. As described in section 2.3, the turbulent source is also very sensitive to the vocal tract geometry. Second, the source sound, which then propagates through the vocal tract, is not only radiated from the mouth but is also partially reflected back to the glottis through the vocal tract. Such reflection can influence the vocal fold oscillations, especially when the fundamental frequency or its harmonics is closely located to one of the vocal tract resonances, for instance, in singing. The strong acoustic feedback makes the interrelation between the source and the filter nonlinear and induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos (Hatzikirou et al., 2006; Lucero et al., 2012; Migimatsu & Tokuda, 2019; Titze et al., 2008).

Figure 5. Example of a glissando singing. A male subject glided the fundamental frequency ( f o ) from 120 Hz to 350 Hz and then back. The first resonance ( R 1 = 270 Hz ) is indicated by a black bold line. The pitch jump occurred when f o crossed R 1 .

Figure 5 shows a spectrogram that demonstrates such a pitch jump. The horizontal axis represents time, while the vertical axis represents the spectral power of a singing voice. In this recording, a male singer glided his pitch in a certain frequency range. Accordingly, the fundamental frequency increases from 120 Hz to 350 Hz and then decreases back to 120 Hz. Around 270 Hz, the fundamental frequency or its higher harmonics crosses one of the resonances of the vocal tract (black bold line of figure 5), and it jumps abruptly. At such a frequency crossing point, acoustic reflection from the vocal tract to the vocal folds becomes very strong and non-negligible. The source-filter interaction has two aspects (Story et al., 2000). On the one hand, the vocal tract acoustics facilitates the vocal fold oscillations and contributes to the production of a loud vocal sound, as discussed in the resonance tuning (section 3). On the other hand, the vocal tract acoustics can inhibit the vocal fold oscillations and consequently induce a voice instability. For instance, the vocal fold oscillation can stop suddenly or spontaneously jump to another fundamental frequency, as exemplified by the glissando singing of figure 5. To avoid such voice instabilities, singers must weaken the level of the acoustic coupling, possibly by adjusting the epilarynx, whenever the frequency crossing takes place (Lucero et al., 2012; Titze et al., 2008).

5. Conclusions

In summary, the source-filter theory has been described as a basic framework to model human speech production. The source is generated from the vocal fold oscillations and/or the turbulent airflows developed above the glottis. The vocal tract functions as a filter to modify the spectral structure of the source sounds. This filtering mechanism has been explained in terms of the resonances of the acoustical tube. Independence between the source and the filter is vital for language-based acoustic communications in humans, which require flexible maneuvering of the vocal tract configuration to express various phonemes sequentially and smoothly (Fitch, 2010; Lieberman, 1977). As an application of the source-filter theory, resonance tuning has been explained as a technique utilized by soprano singers and some animals. Finally, the existence of the source-filter interaction has been described. It is inevitable that the source sound is aerodynamically influenced by the vocal tract, since they are closely located to each other. Moreover, the acoustic pressure wave reflecting back from the vocal tract to the glottis influences the vocal fold oscillations and can induce various voice instabilities. The source-filter interaction may become strong when the fundamental frequency or its higher harmonics crosses one of the vocal tract resonances, for example, in singing.

Further Reading

  • Atal, B. S. , & Schroeder, M. (1978). Linear prediction analysis of speech based on a pole-zero representation. The Journal of the Acoustical Society of America , 64 (5), 1310–1318.
  • Chiba, T. , & Kajiyama, M. (1941). The vowel: Its nature and structure . Tokyo, Japan: Kaiseikan.
  • Fant, G. (1960). Acoustic theory of speech production . The Hague, The Netherlands: Mouton.
  • Lieberman, P. (1977). Speech physiology and acoustic phonetics: An introduction . New York: Macmillan.
  • Markel, J. D. , & Gray, A. J. (2013). Linear prediction of speech (Vol. 12). New York: Springer Science & Business Media.
  • Stevens, K. N. (1999). Acoustic phonetics . Cambridge, MA: MIT Press.
  • Sundberg, J. (1989). The science of singing voice . DeKalb, IL: Northern Illinois University Press.
  • Titze, I. R. (1994). Principles of voice production . Englewood Cliffs, NJ: Prentice Hall.
  • Titze, I. R. , & Alipour, F. (2006). The myoelastic aerodynamic theory of phonation . Iowa, IA: National Center for Voice and Speech.
References

  • Fant, G. , Liljencrants, J. , & Lin, Q. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratory. Quarterly Progress and Status Report , 26 (4), 1–13.
  • Fitch, W. T. (2010). The evolution of language . Cambridge, UK: Cambridge University Press.
  • Flanagan, J. L. (1968). Source-system interaction in the vocal tract. Annals of the New York Academy of Sciences , 155 (1), 9–17.
  • Hatzikirou, H. , Fitch, W. T. , & Herzel, H. (2006). Voice instabilities due to source-tract interactions. Acta Acoustica United With Acoustica , 92 , 468–475.
  • Joliveau, E. , Smith, J. , & Wolfe, J. (2004). Acoustics: Tuning of vocal tract resonance by sopranos. Nature , 427 (6970), 116.
  • Koda, H. , Nishimura, T. , Tokuda, I. T. , Oyakawa, C. , Nihonmatsu, T. , & Masataka, N. (2012). Soprano singing in gibbons. American Journal of Physical Anthropology , 149 (3), 347–355.
  • Lucero, J. C. , Lourenço, K. G. , Hermant, N. , Van Hirtum, A. , & Pelorson, X. (2012). Effect of source–tract acoustical coupling on the oscillation onset of the vocal folds. The Journal of the Acoustical Society of America , 132 (1), 403–411.
  • Migimatsu, K. , & Tokuda, I. T. (2019). Experimental study on nonlinear source–filter interaction using synthetic vocal fold models. The Journal of the Acoustical Society of America , 146 (2), 983–997.
  • Nowicki, S. (1987). Vocal tract resonances in oscine bird sound production: Evidence from birdsongs in a helium atmosphere. Nature , 325 (6099), 53–55.
  • Riede, T. , Suthers, R. A. , Fletcher, N. H. , & Blevins, W. E. (2006). Songbirds tune their vocal tract to the fundamental frequency of their song. Proceedings of the National Academy of Sciences , 103 (14), 5543–5548.
  • Rothenberg, M. (1981). The voice source in singing. In J. Sundberg (Ed.), Research aspects on singing (pp. 15–33). Stockholm, Sweden: Royal Swedish Academy of Music.
  • Shadle, C. H. (1985). The acoustics of fricative consonants [Doctoral thesis]. Cambridge, MA: Massachusetts Institute of Technology, released as MIT-RLE Technical Report No. 506.
  • Shadle, C. H. (1991). The effect of geometry on source mechanisms of fricative consonants. Journal of Phonetics , 19 (3–4), 409–424.
  • Sondhi, M. , & Schroeter, J. (1987). A hybrid time-frequency domain articulatory speech synthesizer. IEEE Transactions on Acoustics, Speech, and Signal Processing , 35 (7), 955–967.
  • Stevens, K. N. (2005). The acoustic/articulatory interface. Acoustical Science and Technology , 26 (5), 410–417.
  • Story, B. H. , Laukkanen, A.M. , & Titze, I. R. (2000). Acoustic impedance of an artificially lengthened and constricted vocal tract. Journal of Voice , 14 (4), 455–469.
  • Story, B. H. , Titze, I. R. , & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America , 100 (1), 537–554.
  • Titze, I. R. (2008). Nonlinear source–filter coupling in phonation: Theory. The Journal of the Acoustical Society of America , 123 (5), 2733–2749.
  • Titze, I. , Riede, T. , & Popolo, P. (2008). Nonlinear source–filter coupling in phonation: Vocal exercises. The Journal of the Acoustical Society of America , 123 (4), 1902–1915.



4 Stages of Speech Production

Spencer Coffman

Humans produce speech on a daily basis. People are social creatures and are always talking to one another. Whether it is through social media, live conversation, texting, chat, or otherwise, we are always producing some form of speech. We produce this speech without thought.

That is, without thinking about how we produce it. Of course, we think about what we are going to say and how to say it so that other people will listen, but we don’t think about what it is made of and how our mind and body actually produce speech.

If you have been following my other language-related articles, then you will not be surprised to find out that there are four stages of speech production. It seems that those who classified this data did so in measures of fours and fives. There are…

Five Methods to Learn a Language

Four Ways to Assess Student Knowledge

Five Language Learning Strategies

Four Properties of Spoken Language

The list goes on! Now we have four stages of speech production. These are the processes by which humans produce speech. All of the ways that we come up with the words we say have been compiled into four stages. These stages are not consecutive like normal scientific stages; they are simply a classification.

This means that they are not something you go through developmentally. Rather they are simply different ways in which you may produce speech. I’ll describe each one of them so you can learn and understand what they are and know how exactly you come up with everything you say.


Stage 1 – Conceptualization

The first one is called the Conceptualization Stage. This is when a speaker spontaneously thinks of what he or she is going to say. It is an immediate reaction to external stimuli and is often based on prior knowledge of the particular subject. No premeditation goes into these words and they are all formulated based upon the speaker’s knowledge and experience at hand. It is spontaneous speech. Examples of this can range from answering questions to the immediate verbiage produced as a result of stubbing your toe.

Stage 2 – Formulation

The second stage is called the Formulation Stage. This is when the speaker thinks of the particular words that are going to express their thoughts. It occurs almost simultaneously with the conceptualization stage. However, this time the speaker thinks about the response before responding. The speaker is formulating his or her words and deciding how best to reply to the external stimuli. Where conceptualization is more of an instant and immediate response, formulation is a little delayed.

Stage 3 – Articulation

The third stage is the Articulation Stage. This is when the speaker physically says what he or she has thought of saying. This is prepared speech or planned wordage. In addition, the words may have been rehearsed, such as when someone practices a presentation or rehearses a lie.

It involves coordinating the physical actions of several motor speech organs such as the lungs, larynx, tongue, lips, and other vocal apparatus. Of course, the first two stages also involve these organs; however, the articulation stage uses them repeatedly for the same word patterns.

Stage 4 – Self-Monitoring

The fourth stage is called the Self-Monitoring Stage. This is when the speaker reflects on what he or she has said and makes an effort to correct any errors in his or her speech. Oftentimes this is done in a rebuttal or last-words argument.

In addition, it could also be done during a conversation when the speaker realizes that he or she slipped up. This is the action of reflecting on what you said and making sure that what you said is what you meant.


There you have it. Those are the four stages of speech production. Think about this and start to notice each time you are in each stage. Of course, you won’t be able to consciously notice what stage you are in all of the time. However, once in a while it may be amusing for you to reflect on these stages and see how they coincide with the words you speak.




Psycholinguistics/Development of Speech Production

Introduction

Speech production is an important part of the way we communicate. We indicate intonation through stress and pitch while communicating our thoughts, ideas, requests or demands, and while maintaining grammatically correct sentences. However, we rarely consider how this ability develops. We know infants often begin producing one-word utterances, such as "mama," eventually move to two-word utterances, such as "gimme toy," and finally sound like an adult. However, the process itself involves development not only of the vocal sounds (phonology), but also semantics (meaning of words), morphology and syntax (rules and structure). How do children learn this complex ability? Considering that an infant goes from an inability to speak to two-word utterances within 2 years, the accelerated development pattern is incredible and deserves some attention. When we ponder children's speech production development more closely, we begin to ask more questions. How does a child who says "tree" for "three" eventually learn to correct him/herself? How does a child know "nana" (banana) is the yellow, boat-shaped fruit he/she enjoys eating? Why does a child call all four-legged animals "horsie"? Why does a child say "I goed to the kitchen"? What causes a child to learn words such as "doggie" before "hand"? This chapter will address these questions and focus on the four areas of speech development mentioned: phonology, semantics, and morphology and syntax.

Prelinguistic Speech Development

Throughout infancy, vocalizations develop from automatic, reflexive vocalizations with no linguistic meaning to articulated words with meaning and intonation. In this section, we will examine the various stages an infant goes through while developing speech. In general, researchers seem to agree that as infants develop they increase their speech-like vocalizations and decrease their non-speech vocalizations (Nathani, Ertmer, & Stark) [ 1 ] . Many researchers (Oller; [ 2 ] Stark, as cited in Nathani, Ertmer, & Stark) [ 1 ] have documented this development and suggest growth through the following five stages: reflexive vocalizations, cooing and laughing, vocal play (expansion stage), canonical babbling and, finally, the integration stage.

Stage 1: Reflexive Vocalization


As newborns, infants make noises in response to their environment and current needs. These reflexive vocalizations may consist of crying or vegetative sounds such as grunting, burping, sneezing, and coughing (Oller) [ 2 ] . Although it is often thought that infants of this age do not show evidence of linguistic abilities, a recent study has found that newborns’ cries follow the melody of their surrounding language input (Mampe, Friederici, Christophe, & Wermke) [ 3 ] . They discovered that the French newborns’ pattern was a rising contour, where the melody of the cry rose slowly and then quickly decreased. In comparison, the German newborns’ cry pattern rose quickly and slowly decreased. These patterns matched the intonation patterns that are found in each of the respective spoken languages. Their findings suggest that infants’ vocalizations are perhaps not exclusively reflexive and may contain patterns of their native language.

Stage 2: Gooing, Cooing and Laughing

Between 2 and 4 months, infants begin to produce “cooing” and “gooing” to demonstrate their comfort states. These sounds may often take the form of vowel-like sounds such as “aah” or “oooh.” This stage is often associated with a happy infant as laughing and giggling begin and crying is reduced. Infants will also engage in more face-to-face interactions with their caregivers, smiling and attempting to make eye contact (Oller) [ 2 ] .

Stage 3: Vocal Play

From 4 to 6 months, infants will attempt to vary the sounds they can produce using their developing vocal apparatus. They show a desire to explore and develop new sounds, which may include yells, squeals, growls and whispers (Oller) [ 2 ] . Face-to-face interactions are still important at this stage as they promote the development of conversational abilities. Beebe, Alson, Jaffe et al. [ 4 ] found that even at this young age, infants’ vocal expressions show a “ dialogic structure ” - meaning that, during interactions with caregivers, infants were able to take turns vocalizing.

Stage 4: Canonical babbling

After 6 months, infants begin to make and combine sounds that are found in their native language, sometimes known as “well-formed syllables,” which are often replicated in their first words (Oller) [ 2 ] . During this stage, infants combine consonants and vowels and replicate them over and over - these are thus called reduplicated babbles . For example, an infant may produce ‘ga-ga’ over and over. Eventually, infants will begin to string together multiple varied syllables, such as ‘gabamaga’, called variegated babbles . Other times, infants will move right into the variegated babbles stage without evidence of the reduplicated babbles (Oller) [ 2 ] . Early in this stage, infants do not produce these sounds for communicative purposes. As they move closer to pronouncing their first words, they may begin to use sounds for rudimentary communicative purposes (Oller) [ 2 ] .

Stage 5: Integration


In the final stage of prelinguistic speech, 10-month-old infants use intonation and stress patterns in their babbling syllables, imitating adult-like speech. This stage is sometimes known as conversational babble or gibberish because infants may also use gestures and eye movements that resemble conversations (Oller) [ 2 ] . Interestingly, they also seem to have acoustic differences in their vocalizations depending on the purpose of their communication. Papaeliou and Trevarthen [ 5 ] found that when infants were communicating for social purposes they used a higher pitch and were more expressive in their vocalizations and gestures than when exploring and investigating their surroundings. The transition from gibberish to real words is not obvious (Oller) [ 2 ] as this stage often overlaps with the acquisition of an infant’s first words. These words begin when an infant understands that the sounds produced are associated with an object. During this stage, infants develop vocal motor schemes , the consistent production of certain consonants in a certain period of time. Keren-Portnoy and Marjorano’s [ 6 ] study showed that these vocal motor schemes play a significant part in the development of first words, as children who mastered them earlier produced words earlier. These consistent consonants were used in babble and vocal motor schemes, and would also be present in a child’s first words. Evidence that a child may understand the connection between context and sounds is shown when they make consistent sound patterns in certain contexts (Oller) [ 2 ] . For example, a child may begin to call his favorite toy “mub.” These phonetically consistent sound patterns, known as protowords or quasi-words , do not always reflect real words, but they are an important step towards achieving adult-like speech (Otomo; [ 7 ] Oller) [ 2 ] . Infants may also use their protowords to represent an entire sentence (Vetter) [ 8 ] . For example, the child may say “mub” but may be expressing “I want my toy”, “Give me back my toy”, “Where is my toy?”, etc.

Phonological Development

When a child explicitly pronounces their first word, they have understood the association between sounds and their meaning. Yet their pronunciation may be poor, they produce phonetic errors, and they have yet to produce all the sound combinations in their language. Researchers have come up with many theories about the patterns and rules children and infants use while developing their language. In this section, we will examine some frequent error patterns and basic rules children use to articulate words. We will also look at how phonological development can be enhanced.

Patterns of Speech

Depending on their personalities and individual development, infants develop their speech production slightly differently. Some children, productive learners , attempt any word regardless of proper pronunciation (Rabagaliati, Marcus, & Pylkkänen) [ 9 ] . Conservative learners (Rabagaliati, Marcus, & Pylkkänen) [ 9 ] are hesitant until they are confident in their pronunciation. Other differences include a preference to use nouns and name things versus using language in a more social context (Bates et al., as cited in Smits-Bandstra) [ 10 ] . Although infants vary in their first words and the development of their phonology, by examining the sound patterns found in their early language, researchers have extracted many similar patterns. For example, McIntosh and Dodd [ 11 ] examined these patterns in 2-year-olds and found that they were able to produce multiple phonemes but were lacking [ ʃ , θ , tʃ , dʒ , r ]. They were also able to produce complex syllables. Vowel errors also occurred, although consonant errors are much more prevalent. The development of phonemes continues throughout childhood and many are not completely developed until age 8 (Vetter) [ 8 ] .

Phonological Errors

As a child pronounces new words and phonemes, he/she may produce various errors that follow patterns. However, all errors decrease with age (McIntosh & Dodd) [ 11 ] . Although each child does not necessarily produce the same errors, errors can typically be categorized into various groups. For example, there are multiple kinds of consonant errors. A cluster reduction involves reducing multiple consonants in a row (i.e., "skate"). Most often, a child will skip the first consonant (thus "skate" becomes "kate"), or they may leave out the second stop consonant ( consonant deletion - Wyllie-Smith, McLeod, & Ball) [ 12 ] (thus "skate" becomes "sate"). This type of error has been found by McIntosh and Dodd [ 11 ] . For words that have multiple syllables, a child may skip the unstressed syllable at the beginning of a word (i.e., "potato" becomes "tato") or in the middle of a word (i.e., "telephone" becomes "tephone") (Ganger & Brent) [ 13 ] . This omission may simply be due to the properties of unstressed syllables, as they are more difficult to perceive and thus a child may simply not attend to them. As a child grows more aware of the unstressed syllable, he/she may choose to insert a dummy syllable in place of the unstressed syllable to attempt to lengthen the utterance (Aoyama, Peters, & Winchester [ 14 ] ). For example, a child may say [ə hat] (‘ə hot’) (Clark, as cited in Smits-Bandstra) [ 10 ] . Replacement shows that the child understands that there should be some sound there, but has inserted the wrong one. Another common phonological error pattern is assimilation . A child may pronounce a word such that a phoneme within that word sounds more like another phoneme near it (McIntosh & Dodd) [ 11 ] . For example, a child may say “gug” instead of “bug”. This kind of error may also be seen with vowels and is common in 2-year-olds, but decreases with age (Newton) [ 15 ] .


Factors affecting development of phonology


As adequate phonology is an important aspect of effective communication, researchers are interested in factors that can enhance it. In a study by Goldstein and Schwade [ 16 ] , it was found that interactions with caregivers provided opportunities for 8- to 10-month-old infants to increase their babbling of language sounds (consonant-vowel syllables and vowels). This study also found that infants were not simply imitating their caregivers’ vocalizations, as they produced various phonological patterns and had longer vocalizations! Thus, it would seem that social feedback from caregivers advances infants’ phonological development. On the other hand, factors such as hearing impairment can negatively affect phonological development (Nicolaidis [ 17 ] ). A Greek population with hearing impairments was compared to a control group and it was found that they had a different pattern of pronunciation of phonemes. Their pattern displayed substitutions (i.e., [x] for target /k/), distortions (i.e., place of articulation) and epenthesis/cluster production (i.e., [ʃtʃ] or [jθ] for /s/).

Semantic Development

When children purposefully use words, they are trying to express a desire, a refusal, a label, or social communication (Ninio & Snow) [ 18 ] . As a child begins to understand that each word has a specific purpose, they will inevitably need to learn the meanings of many words. Their vocabulary will rapidly expand as they experience various social contexts, sing songs, practice routines and receive direct instruction at school (Smits-Bandstra, 2006) [ 19 ] . In this section, we will examine children’s first words, their vocabulary spurt, and what their semantic errors are like.

First Words

Many studies have analyzed the types of words found in early speech. Overall, children’s first words are usually shorter in syllabic length, easier to pronounce, and occur frequently in everyday speech (Storkel, 2004 [ 20 ] ). Whether early vocabularies have a noun bias or not tends to divide researchers. Some researchers argue that children’s tendency to produce names for objects, people and animals is sufficient evidence of this bias (Gillette et al.) [ 21 ] . However, this bias may not be entirely accurate. Recently, Tardif [ 22 ] studied first words cross-culturally among English-, Cantonese- and Mandarin-learning 8- to 16-month-old infants and found interesting differences. Although all children used terms for people, there was much variation between languages for animals and objects. This suggests that there may be some language differences in which types of words children acquire first.

Vocabulary Spurt


Around the age of 18 months, many infants will undergo a vocabulary spurt , or vocabulary explosion , where they learn new words at an increasingly rapid rate (Smits-Bandstra; [ 10 ] Mitchell & McMurray, 2009 [ 23 ] ). Before the onset of this spurt, the first 50 words a child learns are usually acquired at a gradual rate (Plunkett, as cited in Smits-Bandstra) [ 10 ] . After the spurt, some studies have found upwards of 20 words learned per week (Mitchell and McMurray) [ 23 ] . There has been a lot of speculation about the process underlying the vocabulary spurt and there are three main theories. First, it has been suggested that the vocabulary spurt results from the naming insight (Reznick and Goldfield) [ 24 ] . The naming insight is a process where children begin to understand that referents can be labeled, either out of context or in place of the object. Second, this period seems to coincide with Piaget’s sensorimotor stage, in which children are expanding their understanding of categorizing concepts and objects. Thus, children would necessarily need to expand their vocabulary to label categories (Gopnik) [ 25 ] . Finally, it has been suggested that leveraged learning may facilitate the vocabulary explosion (Mitchell & McMurray) [ 23 ] . Learning any word begins slowly - one word is learned, which acts as a ‘leverage’ to learn the next word, then those two words can each facilitate learning a new word, and so on. Learning therefore becomes easier. It is possible that not all children experience a vocabulary spurt, however. Some researchers have tested whether there truly is an accelerated learning process. Interestingly, Ganger and Brent [ 13 ] used a mathematical model and found that only a minority of the infants studied fit the criteria of a growth spurt. Thus the growth spurt may not be as common as once believed.

Semantic Errors

Even after a child has developed a large vocabulary, errors are made in selecting words to convey the desired meaning. One type of improper word selection is when children invent a word (called lexical innovation ). This is usually because they have not yet learned a word associated with the meaning they are trying to express, or they simply cannot retrieve it properly. Although made-up words are not real words, it is fairly easy to figure out what a child means, and the invented words are sometimes easier to remember than the traditional ones (Clark, as cited in Swan) [ 26 ] . For example, a child may say “pourer” for “cup” (Clark, as cited in Swan) [ 26 ] . These lexical innovations show that the child is able to understand derivational morphology and use it creatively and productively (Swan) [ 26 ] .

Sometimes children may use a word in an inappropriate context, either extending or restricting use of the word. For example, a child says “doggie” while pointing to any four-legged animal - this is known as overextension and is most common in 1- to 2-year-olds (McGregor et al.; [ 27 ] Bloomquist; [ 28 ] Bowerman; [ 29 ] Jerger & Damian) [ 30 ] . Other times, children may use a word only in one specific context; this is called underextension (McGregor et al.; [ 27 ] Bloomquist; [ 28 ] Bowerman; [ 29 ] Jerger & Damian) [ 30 ] . For example, they may only say “baba” for their bottle and not another infant’s bottle. Semantic errors manifest themselves in naming tasks and provide an opportunity to examine how children might organize semantic representations. In McGregor et al.’s [ 27 ] picture-naming task for 3- to 5-year-olds, errors were most often related to functional or physical properties (i.e., saying chair for saddle). McGregor et al. [ 27 ] proposed three reasons for these errors.

Grammatical and Morphological Development

As children develop larger lexicons, they begin to combine words into sentences that become progressively longer and more complex, demonstrating their syntactic development. Longer utterances provide evidence that children are reaching an important milestone in beginning the development of morphosyntax (Aoyama et al.) [ 14 ] . Brown [ 31 ] developed a method to measure syntactic growth called mean length of utterance (MLU) . It is determined by recording or listening to a 30-minute sample of a child’s speech, counting the number of meaningful morphemes and dividing it by the number of utterances. Meaningful morphemes can be function words (i.e., “of”), content words (i.e., “cat”) or grammatical inflections (i.e., -s). Each utterance is a separate thought conveyed; repetitions, filler words, recitations, titles and compound words are each counted as one unit. Brown ended up with 5 different stages to describe syntactic development: Stage I (MLU 1.0-2.0), Stage II (MLU 2.0-2.5), Stage III (MLU 2.5-3.0), Stage IV (MLU 3.0-3.5) and Stage V (MLU 3.5-4.0).


Stage I: MLU 1.0-2.0
Stage II: MLU 2.0-2.5
Stage III: MLU 2.5-3.0
Stage IV: MLU 3.0-3.5
Stage V: MLU 3.5-4.0

What is this child's MLU?

Sample of speech: “Mommy, want cookie. No dinner! Drink juice.”

Utterances (3):
1. Mommy want cookie.
2. No dinner!
3. Drink juice.

Morphemes (7): Mommy, want, cookie, no, dinner, drink, juice

MLU: 7/3 = 2.33
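
A minimal sketch of the MLU computation itself, assuming each utterance has already been segmented into morphemes by hand (automatic morpheme segmentation is a separate problem):

    # Brown's MLU: total morphemes divided by the number of utterances.
    utterances = [
        ["Mommy", "want", "cookie"],
        ["no", "dinner"],
        ["drink", "juice"],
    ]

    def mean_length_of_utterance(utterances):
        morphemes = sum(len(u) for u in utterances)
        return morphemes / len(utterances)

    print(f"MLU = {mean_length_of_utterance(utterances):.2f}")  # 2.33 -> Stage II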

Two-word utterances

Around the age of 18 months, children’s utterances are usually in two-word forms such as “want that, mommy do, doll fall, etc.” (Vetter) [ 8 ] . In English, these forms are dominated by content words such as nouns, verbs and adjectives and are restricted to concepts that the child is learning based on their sensorimotor stage, as suggested by Piaget (Brown) [ 31 ] . Thus, they will express relations between objects, actions and people. This type of speech is called telegraphic speech . During this development stage, children are combining words to convey various meanings. They are also displaying evidence of grammatical structure with consistent word orders and inflections (Behrens & Gut; [ 32 ] Vetter) [ 8 ] .

Once the child moves from Stage I, simple sentences begin to form and the child begins to use inflections and function words (Aoyama et al.) [ 14 ] . At this time, the child develops grammatical morphemes (Brown) [ 31 ] , which are classified into 14 different categories ordered by acquisition (see chart below). These morphemes modify the meaning of the utterance, marking tense, plurality, possession, etc. There are two theories for why this particular order takes place. The frequency hypothesis suggests that children acquire the morphemes they hear most frequently in adult speech. Brown argued against this theory by analyzing adult speech, where articles were the most common word form, yet children did not acquire articles quickly. He suggested that linguistic complexity may account for the order of acquisition, where the less complex morphemes are acquired first. Complexity of the morphemes was determined based on the semantics (meaning) and/or syntax (rules) of the morpheme. In other words, a morpheme with only one meaning, such as the plural (-s), is easier to learn than the copula “is” (which encodes both number and the time the action occurs). Brown also suggested that for a child to have successfully mastered a grammatical morpheme, they must use it properly 90% of the time.

Order   Morpheme                        Example
1       Present progressive (-ing)      running
2-3     in, on                          sit in chair
4       Plural (-s)                     cookies
5       Past irregular                  ran, drew
6       Possessive ('s)                 Daddy's toy
7       Uncontractible copula           That is my cookie.
8       Articles (a, the)               the cat; a dog
9       Past regular (-ed)              jumped
10      Third person regular (-s)       cooks
11      Third person irregular          he has my toy
12      Uncontractible auxiliary        Do you have one?
13      Contractible copula ('s, 're)   You're here.
14      Contractible auxiliary ('s)     He's coming!

Syntactic Errors

As children begin to develop more complex sentences, they must also learn to use grammar rules appropriately. This is difficult in English because of the prevalence of irregular forms. For example, a child may say, “I buyed my toy from the store.” This is known as an overregularization error . The child has understood that there are syntactic patterns and rules to follow, but overuses them, failing to realize that there are exceptions to the rules. In the previous example, the child applied a regular past tense rule (-ed) to an irregular verb. Why do these errors occur? It may be that the child does not have a complete understanding of the word meaning and thus incorrectly selects it (Pinker, et al.) [ 33 ] . Brooks et al. [ 34 ] suggested that these errors may be categorization errors. For example, intransitive and transitive verbs appear in different contexts and thus the child is required to learn that certain verbs appear only in certain contexts (Brooks) [ 34 ] . Interestingly, Hartshorne and Ullman [ 35 ] found a gender difference for overregularization errors. Girls were more than three times more likely than boys to produce overregularizations. They concluded that girls were more likely to overgeneralize associatively, whereas boys overgeneralized only through rule-governed methods. In other words, girls, who remember regular forms better than boys, quickly associated these forms with similar-sounding words (i.e., fold-folded, mold-molded, so they would say hold becomes holded). Boys, on the other hand, will use the regular rule when they have difficulty retrieving the irregular form (i.e., the past tense -ed added to the irregular verb run yields runed) (Hartshorne & Ullman) [ 35 ] .

Another common error children make is omitting words from an utterance. Such omissions are especially prevalent in early speech production, which frequently lacks function words (Gerken, Landau, & Remez [36]). For example, a child may say "dog eat bone," omitting the function words "the" and "a." This type of error has been studied extensively, and researchers have proposed three main theories to account for omissions. First, children may focus on words that have referents (Brown [31]); for example, a child may attend to "car" or "ball" rather than "jump" or "happy." The second theory suggests that children simply pick up the content words, which carry greater stress and emphasis (Brown [31]). The final theory, suggested by Gerken et al. [36], involves an immature production system: in their study, children could perceive function words and classify them into syntactic categories, yet still omitted them from their own productions.

In this chapter, the development of speech production was examined in the areas of prelinguistics , phonology , semantics , syntax and morphology . As infants develop, their vocalizations undergo a transition from reflexive vocalizations to speech-like sounds and finally words. However, linguistic development does not end there. Infants' underdeveloped speech apparatus keeps them from producing all phonemes properly, so they produce errors such as consonant cluster reduction , omission of syllables and assimilation . At 18 months, many children seem to undergo a vocabulary spurt . Even with a larger vocabulary, children may overextend (calling a horse a doggie) or underextend (not calling the neighbors' dog doggie) their words. When a child begins to combine words, they are developing syntax and morphology. Syntactic development is measured using mean length of utterance (MLU), which is categorized into five stages (Brown [31]). After Stage II, children begin to use grammatical morphemes (e.g., -ed, -s, is), which encode tense, plurality, and so on. As in other areas of linguistic development, children also produce errors such as overregularization (e.g., "I buyed it") or omission (e.g., "dog eat bone"). In spite of these early error patterns, children eventually develop adult-like speech with few errors. Understanding and studying child language development is an important area of research, as it may give us insight into the underlying processes of language as well as into how we might facilitate it or treat individuals with language difficulties.
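
Because MLU figures prominently in the staging just summarized (and in the transcript exercise below), a minimal sketch of its computation may help: total morphemes divided by total utterances. The morpheme segmentation shown is an assumption for illustration; real CHILDES transcripts use CHAT coding and the CLAN tools.

```python
# Minimal MLU (mean length of utterance) sketch: total morphemes / total utterances.
# Assumes utterances are already segmented into morphemes separated by spaces
# (e.g., "dog -s" would count as two morphemes); this convention is illustrative only.

def mlu(utterances: list[str]) -> float:
    """Mean length of utterance in morphemes."""
    total_morphemes = sum(len(u.split()) for u in utterances)
    return total_morphemes / len(utterances)

# Hypothetical child utterances (each word here happens to be one morpheme):
sample = ["more cookie", "Mommy fix", "that Kathy", "Kathy cry", "my telephone"]
print(round(mlu(sample), 2))  # 2.0, around the Stage I/Stage II boundary in Brown's scheme
```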

Learning Exercise

1. Watch the video clips of a young boy, CC, provided below.

Video 1 Video 2 Video 3 Video 4 Video 5

2. The following is a transcription of conversations between a mother (*MOT) and a child (*CHI) from Brown's (1970) corpus. You can ignore the # symbol as it represents unintelligible utterances. Use the charts found in the section on " Grammatical and Morphological Development " to help answer this question.

  • Possessive morphemes ('s)
  • Present progressive (-ing)
  • MOT: let me see .
  • MOT: over here +...
  • MOT: you have tapioca on your finger .
  • CHI: tapioca finger .
  • MOT: here you go .
  • CHI: more cookie .
  • MOT: you have another cookie right on the table .
  • CHI: Mommy fix .
  • MOT: want me to fix it ?
  • MOT: alright .
  • MOT: bring it here .
  • CHI: bring it .
  • CHI: that Kathy .
  • MOT: yes # that's Kathy .
  • CHI: op(en) .
  • MOT: no # we'll leave the door shut .
  • CHI: why ?
  • MOT: because I want it shut .
  • CHI: Mommy .
  • MOT: I'll fix it once more and that's all .
  • CHI: Mommy telephone .
  • MOT: well # go and get your telephone .
  • MOT: yes # he gave you your telephone .
  • MOT: who are you calling # Eve ?
  • CHI: my telephone .
  • CHI: Kathy cry .
  • MOT: yes # Kathy was crying .
  • MOT: Kathy was unhappy .
  • MOT: what is that ?
  • CHI: letter .
  • MOT: Eve's letter .
  • CHI: Mommy letter .
  • MOT: there's Mommy's letter .
  • CHI: Eve letter .
  • CHI: a fly .
  • MOT: yes # a fly .
  • MOT: why don't you go in the room and kill a fly ?
  • MOT: you go in the room and kill a fly .
  • MOT: yes # you get a fly .
  • MOT: oh # what's that ?
  • MOT: I'm going to go in the basement # Eve .

3. Below are examples of children's speech. These children display some of the characteristics covered in this chapter. The specific terms found in each video are provided. Find examples of these terms within the associated video, and indicate which type of development (phonological, semantic, or syntactic) is associated with each term.

Terms (each is illustrated in one of the videos above):
  • Dummy syllable
  • Lexical innovations
  • Assimilation
  • What kind of learner (conservative or productive)?
  • This child does not produce which two phonemes? (hint: "camera" and "the")
  • Cluster reduction
  • Overregularization

4. The following are examples of children's speech errors. Name each error and the type of development it is associated with (phonological, syntactic, morphological, or semantic). Can you explain why such an error occurs?


References

  • [1] Nathani, S., Ertmer, D. J., & Stark, R. E. (2006). Assessing vocal development in infants and toddlers. Clinical Linguistics & Phonetics, 20(5), 351–369.
  • [2] Oller, D. K. (2000). The Emergence of the Speech Capacity. Mahwah, NJ: Lawrence Erlbaum Associates.
  • [3] Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns' cry melody is shaped by their native language. Current Biology, 19(23), 1994–1997.
  • [4] Beebe, B., Alson, D., Jaffe, J., Feldstein, S., & Crown, C. (1988). Vocal congruence in mother-infant play. Journal of Psycholinguistic Research, 17(3), 245–259.
  • [5] Papaeliou, C. F., & Trevarthen, C. (2006). Prelinguistic pitch patterns expressing "communication" and "apprehension." Journal of Child Language, 33(1), 163.
  • [6] Keren-Portnoy, T., Majorano, M., & Vihman, M. M. (2009). From phonetics to phonology: The emergence of first words in Italian. Journal of Child Language, 36(2), 235–267.
  • [7] Otomo, K. (2001). Maternal responses to word approximations in Japanese children's transition to language. Journal of Child Language, 28(1), 29–57.
  • [8] Vetter, H. J. (1971). Theories of language acquisition. Journal of Psycholinguistic Research, 1(1), 31.
  • [9] Rabagliati, H., Marcus, G. F., & Pylkkänen, L. (2010). Shifting senses in lexical semantic development. Cognition, 117(1), 17–37.
  • [10] Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children. Canadian Journal of Speech-Language Pathology and Audiology, 30(3), 182–191.
  • [11] McIntosh, B., & Dodd, B. J. (2008). Two-year-olds' phonological acquisition: Normative data. International Journal of Speech-Language Pathology, 10(6), 460–469.
  • [12] Wyllie-Smith, L., McLeod, S., & Ball, M. J. (2006). Typically developing and speech-impaired children's adherence to the sonority hypothesis. Clinical Linguistics & Phonetics, 20(4), 271–291.
  • [13] Ganger, J., & Brent, M. R. (2004). Reexamining the vocabulary spurt. Developmental Psychology, 40(4), 621–632.
  • [14] Aoyama, K., Peters, A. M., & Winchester, K. S. (2010). Phonological changes during the transition from one-word to productive word combination. Journal of Child Language, 37(1), 145–157.
  • [15] Newton, C., & Wells, B. (2002). Between-word junctures in early multi-word speech. Journal of Child Language.
  • [16] Goldstein, M. H., & Schwade, J. A. (2008). Social feedback to infants' babbling facilitates rapid phonological learning. Psychological Science, 19(5), 515–523.
  • [17] Nicolaidis, K. (2004). Articulatory variability during consonant production by Greek speakers with hearing impairment: An electropalatographic study. Clinical Linguistics & Phonetics, 18(6–8), 419–432.
  • [18] Ninio, A., & Snow, C. (1996). Pragmatic Development. Boulder, CO: Westview Press.
  • [19] Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children. Canadian Journal of Speech-Language Pathology and Audiology, 30(3), 182–191.
  • [20] Storkel, H. L. (2004). Do children acquire dense neighborhoods? An investigation of similarity neighborhoods in lexical acquisition. Applied Psycholinguistics, 25(2), 201–221.
  • [21] Gillette, J., Gleitman, H., Gleitman, L., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73(2), 135–176.
  • [22] Tardif, T., Fletcher, P., Liang, W., Zhang, Z., Kaciroti, N., & Marchman, V. A. (2008). Baby's first 10 words. Developmental Psychology, 44(4), 929–938.
  • [23] Mitchell, C., & McMurray, B. (2009). On leveraged learning in lexical acquisition and its relationship to acceleration. Cognitive Science, 33(8), 1503–1523.
  • [24] Reznick, J. S., & Goldfield, B. A. (1992). Rapid change in lexical development in comprehension and production. Developmental Psychology, 28(3), 406–413.
  • [25] Gopnik, A., & Meltzoff, A. (1987). The development of categorization in the second year and its relation to other cognitive and linguistic developments. Child Development, 58(6), 1523.
  • [26] Swan, D. W. (2000). How to build a lexicon: A case study of lexical errors and innovations. First Language, 20(59), 187–204.
  • [27] McGregor, K. K., Friedman, R. M., Reilly, R. M., & Newman, R. M. (2002). Semantic representation and naming in young children. Journal of Speech, Language, and Hearing Research, 45(2), 332–346.
  • [28] Bloomquist, J. (2007). Developmental trends in semantic acquisition: Evidence from over-extensions in child language. First Language, 27(4), 407–420.
  • [29] Bowerman, M. (1978). Systematizing semantic knowledge: Changes over time in the child's organization of word meaning. Child Development, 49(4), 977–987.
  • [30] Jerger, S., & Damian, M. F. (2005). What's in a name? Typicality and relatedness effects in children. Journal of Experimental Child Psychology, 92(1), 46–75.
  • [31] Brown, R. (1973). A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
  • [32] Behrens, H., & Gut, U. (2005). The relationship between prosodic and syntactic organization in early multiword speech. Journal of Child Language, 32(1), 1–34.
  • [33] Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., & Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4, Serial No. 228).
  • [34] Brooks, P. J., Tomasello, M., Dodson, K., & Lewis, L. B. (1999). Young children's overgeneralizations with fixed transitivity verbs. Child Development, 70(6), 1325–1337.
  • [35] Hartshorne, J. K., & Ullman, M. T. (2006). Why girls say "holded" more than boys. Developmental Science, 9(1), 21–32.
  • [36] Gerken, L., Landau, B., & Remez, R. E. (1990). Function morphemes in young children's speech perception and production. Developmental Psychology, 26(2), 204–216.

Brain Areas Involved in Speech Production


Nina Dronkers, Jennifer Ogar, Brain areas involved in speech production, Brain, Volume 127, Issue 7, July 2004, Pages 1461–1462, https://doi.org/10.1093/brain/awh233


The ability to produce accurate speech sounds in rapid succession is something we humans take for granted. In fact, speech production is an extremely involved process. Thoughts must be translated into linguistic representations (itself not a trivial feat), which are then sent to speech mechanisms that can coordinate, initiate, modify and execute the articulation of an utterance. Through the study of patients with disorders affecting this complex process, we have come to learn that numerous brain areas are recruited in speech production and that they hang in a precarious balance that is easily affected by neurological disease and dysfunction.

The coordination of articulatory movements, an end‐stage component of speech production, has received increased attention in recent years. In order for sounds to be produced correctly, the lips, tongue, jaw, velum and larynx must make accurate movements at the right time or the intended sounds become distorted. For example, to say the simple word ‘gap,’ airflow must briefly be halted by raising the back of the tongue to the soft palate. This airflow is suddenly released, during which time the vocal cords must vibrate to create phonation. The tongue and jaw lower and the air should flow unobstructed to produce the proper vowel. The lips seal and the cords relax. All of this must be orchestrated perfectly in time and sequence so that the word ‘gap’ results. Given the many fine movements that are required for speech production, it is no wonder that the mouth area is so largely represented in the homunculus of primary motor cortex.

Patients with deficits in this ability to programme speech movements are said to have a disorder known as ‘apraxia of speech’. The disorder has been well studied in the realm of speech–language pathology, and treatment for the disorder has received equal attention ( Wertz et al., 1984 ; Duffy, 1995 ; McNeil et al., 1997 ). The brain regions that might support this function had been less well investigated until the advent of neuroimaging techniques that allowed for the in vivo investigation of the brain areas affected in patients who had sustained injuries that resulted in apraxia of speech. In one such study ( Dronkers, 1996 ), the computer‐reconstructed lesions of 25 chronic stroke patients with left hemisphere lesions who had been diagnosed with apraxia of speech were overlapped to determine if a common area of infarction could be found in this group. The only region of overlap in 100% of the cases was found in the superior tip of the precentral gyrus of the insula (SPGI). Since this region fell within the central‐most area of the brain, it was possible that this common area merely reflected a vulnerable area in patients with left hemisphere strokes and was not specific to apraxia of speech. For that reason, the lesions of 19 patients who were similarly assessed but who did not carry the diagnosis of apraxia of speech were also overlapped. Their lesions spanned the same distribution of the left hemisphere but completely spared the same region that was affected in the patients with the disorder. This dissociation was taken to mean that the SPGI might play some role in the coordination of articulatory movements. Such lesion analysis methods serve not only to tie behaviours to brain areas, but also to take the complementary, reverse step of comparing the behaviour of patients with spared regions of interest. Other patient studies and some functional imaging studies have also implicated the insula in the process of speech production (e.g. Wise et al., 1999 ; Nestor et al., 2003 ; Gorno‐Tempini et al., 2004 ).
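
The overlap logic of this lesion analysis is easy to state computationally: register each patient's lesion as a binary mask in a common brain space, sum the masks voxel by voxel, and look for voxels lesioned in every patient with the deficit but spared in every patient without it. The NumPy sketch below is a schematic of that procedure with invented array shapes and random stand-in masks, not the actual pipeline of Dronkers (1996).

```python
import numpy as np

# Schematic lesion-overlap analysis; shapes, densities, and masks are invented.
# Each lesion is assumed to be a binary mask registered to a common template.
n_apraxic, n_control = 25, 19
shape = (40, 48, 40)  # a coarse illustrative grid, not a real template resolution

rng = np.random.default_rng(0)
apraxic_masks = rng.random((n_apraxic, *shape)) > 0.7   # stand-in lesion masks
control_masks = rng.random((n_control, *shape)) > 0.7

overlap = apraxic_masks.sum(axis=0)      # lesion count per voxel in the apraxic group
common = overlap == n_apraxic            # voxels lesioned in 100% of apraxic cases
spared = control_masks.sum(axis=0) == 0  # voxels intact in every control patient

# The candidate region is lesioned in all patients with the deficit and spared
# in all patients without it -- the dissociation described in the text. With
# random masks this intersection is (almost surely) empty; real lesions are
# spatially coherent, which is what makes a shared region informative.
candidate = common & spared
print(int(candidate.sum()), "candidate voxels")
```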

In this issue of Brain , the relationship of the insula to apraxia of speech was examined by Hillis and colleagues in acute stroke patients by utilizing diffusion‐weighted imaging (DWI) and perfusion‐weighted imaging (PWI) within the first 24 h after stroke. Forty patients with and 40 without lesions and/or hypoperfusion to the insula were selected and given several short oral language tasks from which a diagnosis of apraxia of speech was later extracted. The authors found no reliable relationship between apraxia of speech and structural changes or low blood flow to regions of the insula, but instead found that 84% of patients with apraxia of speech had such changes in the posterior inferior frontal gyrus. The authors present an interesting and alternative method for identifying the relationship between behavioural deficits and affected regions of the brain, and raise questions concerning the best methods of lesion analysis.

The study of Hillis et al . makes a contribution to the field for several reasons. First, its starting point is the regions of interest that were lesioned and/or dysfunctional and evaluates whether patients with changes there show the expected deficit. This is the complementary approach to first selecting patients with the deficit and then evaluating if they demonstrate a common lesion. Secondly, the study evaluates patients in the acute stage of stroke and captures those who might have small lesions that could resolve quickly and might be overlooked in a study of chronic patients. Thirdly, the study draws on the authors’ earlier work that evaluates both dysfunctional and structural damage within the first 24 h. Few studies have assessed large numbers of patients with both techniques in this early stage after stroke and thus have not evaluated the effects of tissue dysfunction in addition to the effects of tissue loss.

At the same time, the paper opens the discussion concerning the assessment of lesion–symptom mapping in brain‐injured patients. What is the best way to assess which areas are important for certain functions? How do methods of lesion analysis (lesion overlapping, DWI and PWI) contribute to this understanding? How do brain–behaviour relationships in acute patients using one set of methods reliably compare with those found using an alternative method in chronic patients? Should these relationships be pursued in acute patients before the brain has had the opportunity for reorganization of function, or should they be assessed in chronic patients when the physiological effects of the brain injury have passed and the behaviour has settled into a stable pattern? Should we be viewing structural changes or functional ones, and how do they compare? Should we constrain our search to regions of interest or open our investigation to all regions of the brain? Finally, how should behavioural deficits be investigated? Should we try localizing individual symptoms or search for syndromes and networks in the brain?

Clearly all of these approaches contribute to the study of brain–behaviour relationships in complementary ways. The difference in findings between the acute patients of Hillis et al. and the chronic patients of Dronkers is of great interest and raises the question of what might be happening between these two stages to yield a shift in localization from Broca's area to the precentral gyrus of the insula for speech praxis. The ability to view both functional and structural lesions in the brain allows us to see which areas are recruited during a behavioural task and which ones are necessary to support the function. While lesion overlapping allows us to consider a wide area of brain in our search for localization of particular disorders (and has succeeded in yielding numerous associations throughout the brain, not just those in the insula), the a priori determination of regions of interest allows us to focus on the specific deficits that follow injury to that one area. Ideally, a mixture of both techniques would be advantageous and would allow for more detailed correlations between symptoms and brain regions. The new voxel-based methods such as VLSM (voxel-based lesion–symptom mapping; Bates et al., 2003), in which well-defined continuous data can be evaluated at the voxel level, are already making contributions in this area (e.g. Saygin et al., 2003; Dronkers et al., 2004).
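
The voxel-based methods mentioned here take a complementary, statistical route: at each voxel, patients are divided into lesioned and spared groups and their continuous behavioural scores are compared. The toy sketch below, with invented masks and scores and no correction for multiple comparisons, only illustrates the shape of such an analysis; it is not the VLSM implementation of Bates et al. (2003).

```python
import numpy as np
from scipy.stats import ttest_ind

# Toy voxel-wise lesion-symptom sketch; all data are invented.
rng = np.random.default_rng(1)
n_patients, n_voxels = 60, 5000                       # voxels flattened to 1-D
lesions = rng.random((n_patients, n_voxels)) > 0.85   # ~15% lesion density
scores = rng.normal(70.0, 10.0, n_patients)           # e.g., an articulation score

t_map = np.full(n_voxels, np.nan)
for v in range(n_voxels):
    lesioned = scores[lesions[:, v]]
    spared = scores[~lesions[:, v]]
    if lesioned.size >= 5 and spared.size >= 5:       # skip poorly sampled voxels
        t_map[v] = ttest_ind(lesioned, spared).statistic

# Strongly negative t-values (lesion -> lower score) mark candidate voxels;
# a real analysis would correct for the thousands of simultaneous tests.
print(float(np.nanmin(t_map)))
```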

Speech production is a complex process, involving a networked system of brain areas that each contribute in unique ways. Areas beyond Broca’s area and the anterior insula have been implicated in the complex process of producing speech movements. Future studies, associating even more specific apraxia of speech symptoms (e.g. pure motoric groping) with discrete brain areas, may further our understanding of such a distributed network. For the patients suffering from apraxia of speech, a better characterization of the disorder and its symptoms may ultimately help clinicians in planning for more effective rehabilitation. Perhaps using multiple methods, e.g. lesion overlap, DWI, PWI and functional MRI, to follow brain‐damaged patients from the acute phase through early and late stages of rehabilitation will add to our knowledge of the time course of recovery, localization of function and the nature of reorganization after injury.

Bates E, Wilson SM, Saygin AP, Dick F, Sereno MI, Knight RT, Dronkers NF. Voxel-based lesion–symptom mapping. Nature Neurosci 2003; 6: 448–50.

Dronkers NF. A new brain region for coordinating speech articulation. Nature 1996; 384: 159–61.

Dronkers NF, Wilkins DP, Van Valin RD, Redfern BB, Jaeger JJ. Lesion analysis of the brain areas involved in language comprehension. Cognition 2004; 92: 145–77.

Duffy J. Motor speech disorders. St Louis: Mosby; 1995.

Gorno-Tempini ML, Dronkers NF, Rankin KP, Ogar JM, Phengrasamy L, Rosen HJ, et al. Cognition and anatomy in three variants of primary progressive aphasia. Ann Neurol 2004; 55: 335–46.

Hillis AE, Work M, Barker PB, Jacobs MA, Breese EL, Maurer K. Re-examining the brain regions crucial for orchestrating speech articulation. Brain 2004; 127: 0000–0000.

McNeil MR, Robin DA, Schmidt RA. Apraxia of speech: definition, differentiation, and treatment. In: McNeil MR, editor. Clinical management of sensorimotor speech disorders. New York: Thieme; 1997. p. 311–44.

Nestor PJ, Graham NL, Fryer TD, Williams GB, Patterson K, Hodges JR. Progressive non-fluent aphasia is associated with hypometabolism centred on the left anterior insula. Brain 2003; 126: 2406–18.

Saygin AP, Dick F, Wilson S, Dronkers NF, Bates E. Neural resources for processing language and environmental sounds: evidence from aphasia. Brain 2003; 126: 928–45.

Wertz RT, LaPointe LL, Rosenbek JC. Apraxia of speech: the disorder and its management. New York: Grune and Stratton; 1984.

Wise RJ, Green J, Buchel C, Scott SK. Brain regions involved in articulation. Lancet 1999; 353: 1057–61.

1 - Speech sounds and their production

Published online by Cambridge University Press:  05 June 2012

Organs and processes

Most speech is produced by an air stream that originates in the lungs and is pushed upwards through the trachea (the windpipe) and the oral and nasal cavities. During its passage, the air stream is modified by the various organs of speech. Each such modification has different acoustic effects, which are used for the differentiation of sounds. The production of a speech sound may be divided into four separate but interrelated processes: the initiation of the air stream, normally in the lungs; its phonation in the larynx through the operation of the vocal folds; its direction by the velum into either the oral cavity or the nasal cavity (the oro-nasal process); and finally its articulation, mainly by the tongue, in the oral cavity. We shall deal with each of the four processes in turn. (See figure 1.1.)

The initiation process

The operation of the lungs is familiar through their primary function in the breathing process: contraction of the intercostal muscles and lowering of the diaphragm causes the chest volume to increase and air is sucked into the lungs through the trachea. When the process is reversed, air will escape – again through the trachea. Apart from recurring at regular intervals as breath, this air stream provides the source of energy for speech. In speech, the rate of the air flow is not constant; rather, the air stream pulsates as the result of variation in the activity of the chest muscles.


Heinz J. Giegerich, University of Edinburgh. English Phonology, Chapter 1: Speech sounds and their production. Cambridge University Press; online publication 05 June 2012. Chapter DOI: https://doi.org/10.1017/CBO9781139166126.002


Speech Production From a Developmental Perspective

Melissa A. Redford

Linguistics Department, University of Oregon, Eugene


Current approaches to speech production aim to explain adult behavior and so make assumptions that, when taken to their logical conclusion, fail to adequately account for development. This failure is problematic if adult behavior can be understood to emerge from the developmental process. This problem motivates the proposal of a developmentally sensitive theory of speech production. The working hypothesis, which structures the theory, is that feedforward representations and processes mature earlier than central feedback control processes in speech production.

Theoretical assumptions that underpin the 2 major approaches to adult speech production are reviewed. Strengths and weaknesses are evaluated with respect to developmental patterns. A developmental approach is then pursued. The strengths of existing theories are borrowed, and the ideas are resynthesized under the working hypothesis. The speech production process is then reimagined in developmental stages, with each stage building on the previous one.

The resulting theory proposes that speech production relies on conceptually linked representations that are information-reduced holistic perceptual and motoric forms, constituting the phonological aspect of a system that is acquired with the lexicon. These forms are referred to as exemplars and schemas, respectively. When a particular exemplar and schema are activated with the selection of a particular lexical concept, their forms are used to define unique trajectories through an endogenous perceptual–motor space that guides implementation. This space is not linguistic, reflecting its origin in the prespeech period. Central feedback control over production emerges with failures in communication and the development of a self-concept.

Speech motor control allows for flexible, fast, and precise coordination of speech articulators to achieve a motor goal. Adult performance in auditory feedback perturbation experiments suggests not only sensitivity to deviations between, say, an intended vowel and the acoustics of the vowel produced but also an ability to compensate for these deviations with fine motor adjustments that can raise or lower a particular formant frequency by as little as 50 Hz (see, e.g., Katseff, Houde, & Johnson, 2012 ; MacDonald, Goldberg, & Munhall, 2010 ). It is perhaps not surprising that this kind of fine-grained spatiotemporal control over articulation develops slowly. Large gains in speech motor skill are made during the first few years of life, but adultlike control is not achieved until mid-adolescence. Evidence for this claim dates back to Kent and Forner (1980) , who pointed out that temporal variability in young school-aged children's segmental durations is higher than in adults' speech and that this remains true until 12 years of age (see also Lee, Potamianos, & Narayanan, 1999 ; B. L. Smith, 1992 ). These acoustic findings were later supplemented with kinematic ones, which validated the interpretation of greater temporal variability in children's speech as the result of immature articulatory timing control ( Green, Moore, Higashikawa, & Steeve, 2000 ; Sharkey & Folkins, 1985 ; A. Smith & Goffman, 1998 ). A. Smith and Zelaznik (2004) followed up on this work with older children and showed that articulatory timing control is not fully mature until mid-adolescence. So, given the protracted development of speech motor control, why can we more or less understand what children are saying when they first begin to use words at about 12 months of age? Also, even more strikingly, how is it possible that 3-year-old children seem to never stop talking when their speech motor skills are still so immature? The answer put forward in this review article is that feedforward processes mature earlier than central feedback control processes.

More specifically, the argument developed herein is that speech production relies on conceptually linked representations that are abstract (i.e., information-reduced) holistic perceptual and motoric forms. These forms constitute the phonological aspect of the lexicon. The perceptual phonological forms are exogenous representations. They are exemplars that are acquired with lexical concepts beginning around 9 months of age. The motoric phonological forms are endogenous representations. They are schemas that begin to be abstracted around 12 months of age with first word productions. When a particular exemplar and schema are activated with the selection of a particular concept, their forms are used to define unique trajectories through an endogenous perceptual–motor space that guides implementation. This space is not linguistic; its processes are entirely free from conceptual information. The absence of conceptual information reflects the origin of this space in the prespeech period when infants' vocal explorations create the first linkages between perceptual and motoric trajectories.

By hypothesis, schemas are modified through developmental time as central feedback control is incorporated into the production process. This is because the act of speaking indirectly modifies schemas via the same process used to first abstract them. The onset of high-level predictive feedback control emerges with communication failures. These failures are assumed to significantly increase with vocabulary size due to homophony, motivating a shift in the production system toward exemplar representations around 18 months of age. The shift drives the emergence of an internal loop that matches the (projected) perceptual consequences of self-productions against targeted exemplar representations. Selective attention to auditory feedback develops later during the preschool years with the emergence of self-concept. At this point, the child begins to focus on sound production per se in addition to communication. The latter hypothesis could explain why literacy acquisition becomes possible around the age of 5 years and why direct intervention for speech sound disorders also becomes effective at this age.

The argument outlined above is in fact a general theory of speech production that is developmentally sensitive. The theory combines those aspects of existing adult-focused theories that best accommodate acquisition to define whole-word production at different stages of development from infancy to childhood on into adulthood. This developmentally sensitive theory of speech production is further motivated below. This motivation begins with a review of adult-focused theories. A major point of the review will be that the two major approaches to speech, the ecological dynamics and information-processing approaches, lead to different emphases regarding the type of feedforward information used in production (motoric vs. perceptual) and to different views on the type of feedback control processes engaged during execution (peripheral vs. central). I will argue that the holistic motoric representations that drive production in the ecological dynamics approach are consistent with functional approaches to child phonology and better account for young children's speech patterns than the discrete perceptual representations that drive production in the information-processing approach. Nonetheless, the information-processing assumption of distinct production and perception systems is embraced in the developmentally sensitive theory of speech production that I put forward because central feedback control is deemed necessary to account for the evolution of children's speech patterns from first words to adultlike forms.

Adult-Focused Theories of Speech Production

Adult-focused theories of speech production assume the activation of an abstract phonological plan that is then rendered in sufficient phonetic detail for the sensorimotor system to activate speech movements (e.g., Browman & Goldstein, 1992 ; Dell, 1986 ; Garrett, 1988 ; Goldrick, 2006 ; Goldstein, Byrd, & Saltzman, 2006 ; Guenther, 1995 ; Keating & Shattuck-Hufnagel, 2002 ; Roelofs, 1999 ; Turk & Shattuck-Hufnagel, 2014 ). The detailed phonetic plan is known as a speech plan . It contains or directly activates linguistic representations that provide relevant feedforward information for implementation. The representations and type of feedback control processes used in production differ according to the theoretical approach taken. Here, the two main approaches to speech production are reviewed: the ecological dynamics approach and the information-processing approach (see Figure 1 ). These approaches represent an amalgam of different theories, hence the generic labels. The different sets of theories emerge from two fundamentally different approaches to human cognition—an ecological-embodied approach versus a representation-based information-processing approach, which are briefly described next.

Figure 1. The ecological dynamics and information-processing approaches to speech production both assume three major levels of analysis: a phonological level where abstract form representations are associated with conceptual meaning, a speech plan level where abstract forms are elaborated for implementation, and an implementation level where articulatory action is formulated and adjusted in real time to achieve the plan. The two approaches otherwise adopt very different fundamental assumptions, resulting in different theories of representation, sequencing, and control. In particular, the ecological dynamics approach emphasizes speech as action and assumes gestalt articulatory representations, emergent sequential structure, and self-organized articulation. In contrast, the information-processing approach emphasizes the importance of discrete elements and assumes executive control over sequencing and implementation, thus promoting a strong role for perception in production while assuming that the two processes are distinct. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Richardson, Shockley, Fajen, Riley, and Turvey (2009) outline the tenets of an ecological-embodied approach in contrast to the assumptions of an information-processing approach as follows. In an ecological-embodied approach, behavior is emergent and self-organized, which is to say behavior is not planned or controlled (pp. 170–173). Perception and action are viewed as continuous and cyclic and thus functionally united (pp. 173–175). In particular, the concept of affordances assumes that the objects of perception provide information about action possibilities (pp. 178–182). The theory of direct perception assumes that these useful objects are wholly conveyed by sensory input (pp. 176–178). This means that knowledge is simply extracted from the environment within which the individual lives and moves (pp. 167–170).

The ecological-embodied view of knowledge contrasts with the information-processing view where knowledge emerges from learned associations, which give rise to mediating representations. These representations are knowledge in the information-processing approach. This view of knowledge follows from other assumptions: Individuals are separate from their environment, the mind is separate from the body, and action is separate from perception. Overall, representational and computational processes are “lifted away from the organism–environment system and…studied on their own, permitting cognitive scientists to proceed whereas other specialists work to understand the body and environment of the knower” ( Richardson et al., 2009 , pp. 161–162). This approach to human cognition is likely more familiar to readers than the ecological-embodied approach because it has provided the philosophical foundation for much of mainstream cognitive sciences in North America, including linguistics and psychology, since the “cognitive revolution” in the 1950s (see Mandler, 2007 , Chap. 10). The assumptions of this approach are detailed in Newell and Simon's (1972) classic book, Human Problem Solving .

The information-processing approach has resulted in the modular study of language (e.g., syntax vs. phonology) and in a sharp division of expertise between those who study language and those who are interested in speech production (e.g., phonology vs. phonetics). Among the latter, those who adhere closely to the approach often focus on the translation problem that follows from their computational view, for example, the problem of how discrete phonological elements are transformed into continuous speech action (see, inter alia, Bladon & Al-Bamerni, 1976; Keating, 1990; MacNeilage, 1970; Recasens, 1989; Stevens & Blumstein, 1981; Wickelgren, 1969). This focus also structures psycholinguistic models of production that posit multiple processing stages to generate production units (e.g., Dell, 1986; Garrett, 1988; Goldrick, 2006; Levelt, 1989; Roelofs, 1999), a generic version of which is presented in the right-hand panel of Figure 1. Models of speech motor control that have discrete elements as goals emphasize feedback control to ensure accurate implementation of these elements in speech movement (e.g., Abbs & Gracco, 1984; Hickok, 2012; Houde & Nagarajan, 2011; Lindblom, Lubker, & Gay, 1979; Niziolek, Nagarajan, & Houde, 2013; Perkell, Matthies, Svirsky, & Jordan, 1993; Tourville & Guenther, 2011).

In contrast to the information-processing approach, the ecological-embodied approach has been mainly applied to the study of speech ( Best, 1995 ; Browman & Goldstein, 1992 ; Fowler, 1986 ; Galantucci, Fowler, & Turvey, 2006 ; Goldstein & Fowler, 2003 ; Kelso, Saltzman, & Tuller, 1986 ; Saltzman & Kelso, 1987 ; Saltzman & Munhall, 1989 ). The assumption of separate language and speech systems is thus preserved by default, and only speech processes are fully consistent with the tenets of an ecological-embodied approach. This entails no translation between higher level speech sound representations and lower level speech movement. Phonological forms are objects of both action and perception. These forms become increasingly elaborated when activated through self-organization rather than through planning. Thus, the flow from high to low is better conceived of as the emergence of speech form, which is mediated only by a linearized version of a nonlinear representation (i.e., a gestural score; see Figure 1 , left). The specific assumptions of each approach to speech production are elaborated further below, beginning with the action-focused ecological dynamics approach.

The Ecological Dynamics Approach

The ecological dynamics approach to speech production is best represented by articulatory phonology ( Browman & Goldstein, 1992 , and subsequent), a task-dynamic approach to articulation ( Kelso et al., 1986 ; Saltzman & Kelso, 1987 ; Saltzman & Munhall, 1989 ), and by ecological theories of speech perception ( Best, 1995 ; Fowler, 1986 ; Galantucci et al., 2006 ; Goldstein & Fowler, 2003 ) and speech sound acquisition ( Best, 1995 ; Best, Goldstein, Nam, & Tyler, 2016 ). The fundamental unit of analysis is a vocal tract constriction that serves as an articulatory attractor. This unit is known as a gesture . Gestures are linguistic primitives, similar to distinctive features in generative theory, that emerge during development under the assumption that infants acquire “a relation between actions of distinct (articulatory) organs and lexical units very early in the process of developing language” ( Goldstein & Fowler, 2003 , p. 35; see also Best et al., 2016 ). Gestures are defined as “events that unfold during speech production and whose consequences can be observed in the movements of the speech articulators” ( Browman & Goldstein, 1992 , p. 156). More specifically, they are abstract representations of “the formation and release of constrictions in the vocal tract ( ibid ),” which are realized dynamically, thus giving them an event-like status. This status in turn confers intrinsic timing; that is, once activated, gestures take time to achieve a target vocal tract constriction and then time to move away from the constriction.

The assumption of intrinsic timing has a number of interesting theoretical consequences, several of which are compatible with a developmental perspective on speech production. Perhaps, the most important of these consequences is in the representation of sequential articulation (see, e.g., Browman & Goldstein, 1992 ; Fowler, 1980 ; Fowler & Saltzman, 1993 ; Kelso et al., 1986 ; Saltzman & Munhall, 1989 ). Gestures, like their distinctive feature counterparts in generative phonology, are always realized as part of a larger whole (i.e., a “molecule”). However, unlike distinctive features, the wholes are not bundled up into individual phonemes that must be sequenced during the production process. Instead, gestures participate in an articulatory gestalt that is, minimally, syllable sized. Moreover, all relevant gestures associated with a lexical entry are coactivated when that entry is selected for production ( Browman & Goldstein, 1989 , 1992 ; Goldstein et al., 2006 ). Put another way, the articulatory phonology view of lexical form representations is that these are holistic and motorically based. The developmentally sensitive theory I propose shares this view of lexical representation; I also argue for holistic, perceptually based form representations.

Under the ecological-embodied assumption of cyclic action, appropriate sequencing within a word is emergent. To understand emergent sequencing, consider, for example, the coordination of a single consonantal and vocalic gesture. Consonantal gestures are intrinsically shorter than vocalic gestures. They are also phased relative to one another: If the cyclic gestures are coordinated without a phase difference, a consonant–vowel syllable emerges; if they are 180° out of phase, a vowel–consonant syllable emerges ( Browman & Goldstein, 1988 ; Goldstein et al., 2006 ; Nam, Goldstein, & Saltzman, 2009 ). These in-phase and antiphase relations are stable coordination patterns in motor systems ( Haken, Kelso, & Bunz, 1985 ; Turvey, 1990 ). Of course, languages allow for consonant or vowel sequences that complicate stable coordination dynamics (e.g., consider the English word “sixths” among many, many others). Thus, gestural timing associated with individual words may be learned during speech acquisition and incorporated into a coupling graph, which is the lexical form representation in articulatory phonology ( Goldstein & Fowler, 2003 ; Goldstein et al., 2006 ; Nam et al., 2009 ).
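
The stability claims in this paragraph come from the Haken–Kelso–Bunz model cited above, in which the relative phase φ between two coupled oscillators evolves as dφ/dt = −a sin φ − 2b sin 2φ, with attractors at 0° (in-phase) and 180° (antiphase). The short numerical sketch below integrates that equation; the parameter values are illustrative, not fitted to articulatory data.

```python
import numpy as np

# Haken-Kelso-Bunz relative-phase dynamics: dphi/dt = -a*sin(phi) - 2*b*sin(2*phi).
# For sufficiently large b/a, both phi = 0 (in-phase) and phi = pi (antiphase)
# are stable attractors -- the two coordination modes discussed above.
# Parameter values are illustrative, not fitted to speech kinematics.

def settle(phi0: float, a: float = 1.0, b: float = 1.0,
           dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate the phase equation from phi0 and return the settled phase."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (-a * np.sin(phi) - 2.0 * b * np.sin(2.0 * phi))
    return phi % (2.0 * np.pi)

print(round(np.degrees(settle(0.5)), 1))  # starts near 0    -> settles at ~0 (in-phase)
print(round(np.degrees(settle(2.5)), 1))  # starts nearer pi -> settles at ~180 (antiphase)
```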

Note that the ecological dynamics conception of coordination also has implications for a theory of coarticulation, which is understood within this approach to speech production as coproduction (see Fowler, 1980 ). In contrast to information-processing approaches to coarticulation, dynamic formant trajectories and distributed spectral effects of rounding and nasalization and so on emerge directly from the representation; they are never due to a central executive that “looks ahead” to the next sound(s) while preparing the current one. This view of coarticulation appears to be more compatible with developmental findings on coarticulation than the information-processing view, a point to which I return later.

When words are selected for production, their coupling graphs give rise to linearized gestural scores (see, inter alia, Goldstein et al., 2006 ). These scores meet the generic definitions of both a speech plan and a motor program. They are plans in that they specify, abstractly, the relative timing and duration of specific speech actions. They are programs in that they drive these actions directly via task dynamics ( Saltzman & Munhall, 1989 ). The dynamic transformation from coupling graph to gestural score means that there is no speech planning in the ecological dynamic approach to speech production; there are only speech plans that serve also as phonological representations. I make a similar assumption in the developmentally sensitive theory proposed herein.

During the implementation stage of the production process, gestures represent motor goals ( Fowler & Saltzman, 1993 ; Löfqvist, 1990 ). Articulators self-organize to effect these goals. Self-organization is based in large part on functional synergies that stabilize over developmental time to become part of the motor control system (see, e.g., A. Smith & Zelaznik, 2004 ). In other words, gestures give rise to a type of functional motor unit of coordination (i.e., a “coordinative structure”). Peripheral perceptual feedback provides relevant context information to subcortical structures and the peripheral nervous system for goal achievement (see, e.g., Saltzman & Munhall, 1989 , p. 48) and to automatically compensate for perturbations (see, e.g., Abbs & Gracco, 1984 ). In this way, there is no real control over production in the sense of cortically mediated adjustments to movement direction and velocity. Whereas this view of implementation and its development can account for infant vocalizations and early speech attempts and for the overall slow development of speech motor skills, I argue below that the strong evidence from adult speech for cortically mediated control over production must be incorporated into a developmentally sensitive theory of speech production to account for phonological change through developmental time.

In summary, an ecological dynamics approach to speech production assumes an entirely feedforward process. Motor goals are articulatory and event-like and are phased relative to one another in articulatory gestalt representations that are linked to conceptual information in the lexicon. Sequential structure and coarticulatory overlap emerge from gestural dynamics. Production itself is a self-organized process. Thus, the approach eschews the concept of central control over speech production based on first principles.

The Information-Processing Approach

The information-processing approach to speech production is best represented by mainstream psycholinguistic theories of language production (e.g., Dell, 1986 ; Garrett, 1988 ; Goldrick, 2006 ; Roelofs, 1999 ), phonetically informed theories of implementation (e.g., Guenther, 1995 ; Guenther & Perkell, 2004 ; Keating & Shattuck-Hufnagel, 2002 ; Turk & Shattuck-Hufnagel, 2014 ), and by prediction-based models of speech motor control (e.g., Hickok, 2012 ; Houde & Nagarajan, 2011 ; Niziolek et al., 2013 ; Tourville & Guenther, 2011 ). In this approach, phonological representations mediate between perception and production. They are abstract and symbolic.

The phoneme—a categorical and discrete element—is often the fundamental unit of analysis in this approach. The emphasis on phonemes is due to a modeling focus on speech errors (e.g., Bock & Levelt, 2002 ; Dell, 1986 ; Garrett, 1988 ; Levelt, 1989 ; Roelofs, 1999 ), which are best described with reference to segmental structure (see also MacKay, 1970 ; Shattuck-Hufnagel & Klatt, 1979 ). These modeling efforts have led to the psycholinguistic assumption that segment sequencing is an active process during production (see, inter alia, Bock & Levelt, 2002 ; Dell, 1986 ; Garrett, 1988 ; Levelt, 1989 ; Roelofs, 1999 ). This process has come to be known as phonological encoding (see Figure 1 , right). Theories diverge on how encoding happens, but once encoded, all theories recognize that the phonemic string must be further specified before it can be used as a plan for output. In Levelt's (1989) highly influential model, the string is metrically chunked for output, allowing for specification of positional information via allophone selection; for example, the aspirated variant of the voiceless alveolar stop is chosen for tab (i.e., [tʰæb]), the unreleased variant is selected for bat (i.e., [bæt̚]), and the stop is replaced by a flap in batter (i.e., [bæɾɚ]). From a developmental perspective, the mainstream assumption of phonological and phonetic encoding complexifies speech acquisition since it predicts that infants must learn a symbolic system and the computational steps necessary to translate symbolic representations into action plans.

Once a phonological string has been phonetically encoded, it can be implemented. Implementation can mean the appropriate selection of a syllable-sized motor program from a mental syllabary (e.g., Bohland, Bullock, & Guenther, 2010 ; Guenther, Ghosh, & Tourville, 2006 ; Levelt, 1989 ) or careful specification of articulatory timing information (e.g., Keating, 1990 ; Turk & Shattuck-Hufnagel, 2014 ). Either way, discrete phones remain high-level motor goals during execution. These goals are conceived of specifically as speech sound categories (e.g., Guenther, 1995 ; Hickok & Poeppel, 2000 ; Johnson, Flemming, & Wright, 1993 ; Lindblom, 1990 ; Lindblom et al., 1972 ) or more generally as perceptual categories (e.g., Perkell, Matthies, Svirsky, & Jordan, 1995 ; Savariaux, Perrier, & Orliaguet, 1995 ; Schwartz, Boë, Vallée, & Abry, 1997 ). Importantly, the goals remain nonoverlapping even in high-frequency combinations when, through repeated practice, they may be stored together as part of a larger chunk (see, e.g., Bohland et al., 2010 , p. 1505). This view stands very much in contrast to the ecological dynamics view where chunks are articulatory gestalts composed of overlapping gestures/articulatory events. The assumption of discrete goals also requires computationally intensive accounts of coarticulation, especially long-distance coarticulation, which is explained in the information-processing approach to result either from feature spreading at an early stage of encoding (e.g., Bladon & Al-Bamerni, 1976 ; Daniloff & Hammarberg, 1973 ; Recasens, 1989 ) or from planning for the articulation of individual phones within a well-defined window during a later stage of encoding (e.g., Guenther, 1995 ; Keating, 1990 ). These accounts wrongly predict the slow development of coarticulation (see below).

Although discrete perceptual speech motor goals are problematic from a developmental perspective, they are posited in the information-processing approach to explain “the exquisite control of vocal performance that speakers/singers retain for even the highest frequency syllables” (Bohland et al., 2010, p. 1509). Exquisite control of vocal performance requires the coordination of multiple independent speech articulators through time, each of which also has many degrees of movement freedom—another developmentally unfriendly computational problem. The coordination problem is solved in the information-processing approach by assuming central perceptual feedback control over articulatory movements—an assumption for which there is now abundant evidence.

Central feedback control means cortically mediated adjustments to articulation made with reference to perceptual goals in order to achieve on-target sound production. Of course, slow central processing of perceptual feedback is a problem for any account that relies on overt feedback during real-time speech production (see, e.g., Lindblom et al., 1979; MacNeilage, 1970). Lindblom et al. (1979, p. 160) were the first to propose a viable solution to this problem. Specifically, they proposed that motor control does not rely on processing perceptual feedback per se but instead references the simulated perceptual results of planned action while execution unfolds. Lindblom et al. called this proposal predictive encoding, and with it, they foreshadowed the emphasis in current models of speech motor control where a copy of the output signal (= efference copy) is used to predict sensory outcomes (e.g., Hickok, 2012; Houde & Nagarajan, 2011; Niziolek et al., 2013; Tourville & Guenther, 2011) for error correction purposes (e.g., Tourville & Guenther, 2011) or real-time speech motor control (see, e.g., Niziolek et al., 2013). The proposal is supported by speakers' remarkable ability to correctly produce target sounds when normal articulation is disrupted.
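
This prediction-and-correction logic is easy to sketch. The toy loop below is a minimal sketch with an invented linear forward model and invented parameter values (none drawn from a published implementation); it shows how an efference copy allows correction to run off predicted rather than overtly heard outcomes:

```python
# Toy illustration of prediction-based control: an efference copy of the
# motor command feeds a forward model whose predicted F1 (not slow overt
# feedback) drives online correction. All names and values are invented.

def forward_model(command, gain=1000.0, offset=300.0):
    """Predict the auditory outcome (F1, in Hz) of a command in [0, 1]."""
    return offset + gain * command

def control_step(command, target_f1, correction_rate=0.0005):
    """Adjust the command using the predicted, not the heard, outcome."""
    predicted_f1 = forward_model(command)  # internal prediction via the copy
    error = target_f1 - predicted_f1       # predicted sensory error
    return command + correction_rate * error

command = 0.2
for _ in range(50):
    command = control_step(command, target_f1=580.0)
print(round(forward_model(command)))  # ~580: on target before overt feedback
```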

Lindblom et al. (1979) proposed predictive encoding to account for their speakers' near-instantaneous adaptation to different bite-block manipulations during vowel production. Since then, many sophisticated perturbation experiments have been conducted (e.g., Katseff et al., 2012; Lametti, Nasir, & Ostry, 2012; MacDonald et al., 2010; Savariaux et al., 1995). These experiments provide strong evidence in favor of perceptual goals and for the role of central feedback control in speech production. Consider, for example, a study by Lametti et al. (2012), which investigated the effects of different types of perceptual feedback perturbations on the repetition of a target word, head. Somatosensory feedback was disrupted by a robot arm, which tugged randomly at the speakers' lower jaw, thereby disrupting the normal articulatory path for the target /ɛ/ vowel. Auditory feedback was perturbed by altering speakers' own F1 upward in the direction of an /æ/ vowel. This real-time alteration was fed back to speakers via headphones. The results indicated that speakers counteracted the effects of perturbation through compensation so as to maintain the target production of head. While the majority of speakers compensated more for auditory perturbations than for somatosensory perturbations, some speakers showed the opposite effect, and many adapted to both types of perturbations.
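
A minimal numerical sketch of this compensation logic, with invented F1 values, shift size, and per-trial gain, may help make the paradigm concrete; the point is simply that opposing the heard error pushes production away from the perturbed percept:

```python
# Minimal numerical sketch of an auditory perturbation experiment.
# All values (F1 target, shift size, per-trial gain) are invented.

target_f1 = 580.0      # intended /ɛ/-like F1, in Hz
perturbation = 100.0   # upward F1 shift applied to the headphone signal
produced_f1 = 580.0
gain = 0.3             # fraction of the heard error opposed per trial

for trial in range(20):
    heard_f1 = produced_f1 + perturbation  # own speech now sounds too /æ/-like
    error = heard_f1 - target_f1
    produced_f1 -= gain * error            # compensate against the shift

# Production settles near 480 Hz so that the heard signal approaches the
# 580 Hz target; real speakers typically show only partial compensation.
print(round(produced_f1))
```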

It has been argued that, whereas perturbation experiments provide evidence for error correction based on perceptual feedback, conclusions about real-time speech motor control are more dubious since the experimental findings require manipulations that create very unnatural speaking conditions (see, e.g., Guenther et al., 2006 , p. 288). Yet, the basic behavior observed in perturbation experiments—speaker adjustments based on incoming perceptual information—is also observed in phonetic imitation experiments, which are significantly more natural. Instead of participants hearing their own perturbed speech, they simply repeat words that others have produced (e.g., Babel, 2012 ; Goldinger, 1998 ; Nielsen, 2011 ; Shockley, Sabadini, & Fowler, 2004 ). Just as in the perturbation paradigm, participants are found to make fine-tuned adjustments to their own speech in the direction of the input; for example, participants' production of voice onset time (VOT) in stop production is measurably changed when shadowing exposure to stop-initial words with substantially different VOT values than their own ( Shockley et al., 2004 ). Moreover, behavior in these laboratory experiments also corresponds to the real-world language phenomenon of convergence ( Giles & Powesland, 1997 ), where interlocutors begin to sound like one another over the course of an exchange. When speakers subconsciously “converge” on a set of phonetic features during an interaction, they are demonstrating that perceptual input informs online spoken language production (see, e.g., Babel, 2012 ). Thus, speakers' behavior in contrived and natural speaking conditions provides strong evidence for the importance of perceptual feedback during speech production. The developmentally sensitive theory proposed herein is meant to accommodate this evidence.
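
The same adjust-toward-the-input logic can be sketched for shadowing-induced VOT convergence. The values and the per-repetition imitation weight below are invented, and real imitation is only ever partial and selective:

```python
# Toy sketch of VOT convergence during shadowing; all numbers are invented.
own_vot, model_vot = 60.0, 110.0  # /p/ VOTs in ms: shadower vs. model talker
imitation_weight = 0.15           # assumed partial per-repetition drift

for repetition in range(10):
    own_vot += imitation_weight * (model_vot - own_vot)

print(round(own_vot, 1))  # ~100 ms: shifted measurably toward the model talker
```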

In summary, the information-processing approach emphasizes the importance of discrete elements and so assumes executive control over sequencing and implementation. This assumption entails a role for perception in production. The evidence for online vocal–motor adjustments based on self- and other- generated auditory information is especially strong and consistent with the hypothesis of central perceptual feedback control over speech production.

Implications of Adult-Focused Theories for the Development of Speech Production

From a developmental perspective, each of the different approaches to speech production has strengths and important limitations, as alluded to above. The main strength of the ecological dynamics approach is the central hypothesis that temporal relations between articulators are preserved as part of an articulatory gestalt lexical representation. This hypothesis, consistent with whole-word approaches to child phonology, provides a framework for understanding children's speech patterns. The strength of the information-processing approach is in recognizing the importance of perceptual feedback for tuning speech production. This emphasis is not only consistent with adult behavior; it also provides a powerful mechanism for learning and thus the ability to explain change over developmental time. These points are elaborated below with a focus on explaining children's speech patterns and developmental change.

Children's Speech Patterns

Child phonology is often viewed from the adult perspective, hence the description of children's speech as fronted, harmonized, simplified, and so on. Implicit is the idea of transformed adultlike representations. As long as the transformation results in a string of phonemes readied for output, speech acquisition can be handled by an information-processing approach and construed as phonemic acquisition (see Vihman, 2017 , for a review and critique of this view). When construed in this way, the learning problem is restricted to the mapping of phoneme-related speech sounds to articulatory movement. The DIVA model ( Guenther, 1995 ; Guenther et al., 2006 ) instantiates this view of speech acquisition and production. The following discussion focuses on the shortcomings of this model to convey a general, developmental critique of the information-processing approach. This focus is a testament to DIVA's influence on the field and to its status as the most complete and explicit statement of an information-processing theory of speech production. Also, the original DIVA model ( Guenther, 1995 ), though ultimately adult focused, was at least constructed to reflect the knowledge that adult behavior emerges over developmental time. This further increases the relevance of DIVA to the present discussion.

In DIVA, speech motor targets are specified as coordinates in an orosensory space. The coordinates correspond to vocal tract shapes. Speech motor goals are acoustically defined and reside in the speech sound map of the model. Linkages between the speech sound map and orosensory space are acquired during babbling. An orosensory to articulation map is established during the first phase of babbling via random articulatory movements. The speech sound map is then acquired during a second phase that relies on overt perceptual feedback to register regions in the orosensory space associated with known (i.e., perceptually acquired) language-specific sounds. Once linkages between discrete sounds and articulation have been established via orosensory space, speech production can be driven by phoneme strings that sequentially activate cells within the speech sound map.
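
This two-phase learning story can be caricatured in a few lines. In the sketch below, the stand-in vocal tract function, the two-dimensional spaces, and the nearest-neighbor inversion are all simplifying assumptions of mine; DIVA itself learns neural mappings rather than lookup tables:

```python
import numpy as np

# Caricature of DIVA-style acquisition: random babbling pairs articulations
# with their acoustic outcomes; production later inverts the stored pairing.
rng = np.random.default_rng(0)

def vocal_tract(articulation):
    """Stand-in synthesizer: 2-D articulation -> 2-D formant-like output."""
    return np.array([300 + 500 * articulation[0], 800 + 1400 * articulation[1]])

# Phase 1: random articulatory movements build paired motor/auditory memory.
motor_memory = rng.random((2000, 2))
auditory_memory = np.array([vocal_tract(m) for m in motor_memory])

# Phase 2: a perceptually acquired target activates the nearest stored
# experience, whose associated articulation then drives production.
def produce(auditory_target):
    i = np.argmin(np.linalg.norm(auditory_memory - auditory_target, axis=1))
    return motor_memory[i]

print(vocal_tract(produce(np.array([500.0, 1500.0]))))  # lands near the target
```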

The ease with which the DIVA model can learn to produce language-specific sequences highlights a limitation of the information-processing approach to the development of speech production: It does not take seriously the slow development of speech motor skills. Production proceeds just as in the adult once the phoneme-to-sound and sound-to-articulation mappings have been established. For example, “after babbling, the (DIVA) model can produce arbitrary phoneme strings using a set of 29 English phonemes in any combination” ( Guenther, 1995 , p. 598). In this way, DIVA's behavior is obviously at odds with real development. Child phonological patterns such as gliding ( leg ➔ weg, bread ➔ bwead ), stopping ( feet ➔ peet, house ➔ hout ), epenthesis ( sleep ➔ se-leep, green ➔ ge-reen ), and cluster simplification ( clean ➔ keen, stop ➔ top ) often persist until the school-age years ( Stoel-Gammon & Dunn, 1985 , pp. 43–46).

Although child phonological patterns can be explained within the information-processing approach by positing grammatical rules that constrain sequencing (see, e.g., Kager, Pater, & Zonneveld, 2004, and the contributions therein), the assumption that children learn via perceptual feedback to produce discrete perceptual goals in sequence incorrectly predicts that young children produce speech that is less coarticulated than adult speech (see, e.g., Guenther, 1995; Kent, 1983; Tilsen, 2014). Guenther (1995, p. 617) cites Thompson and Hixon's (1979) study on anticipatory nasal coarticulation in support of this prediction. However, the vowel midpoint measure used in that study assumes static phonemic targets that are achieved at the middle of an acoustic interval rather than the dynamic specification of movement. Flege (1988) took a different approach and measured the duration of nasalization across the entire vowel in child and adult speech. His results showed that children and adults both open “the (velar-pharyngeal port) long before the lingual constriction for word-final /n/” (p. 533). Moreover, when vowel duration was controlled, Flege found no significant differences in the degree to which children and adults engaged in anticipatory behavior.

Guenther (1995) also cites Kent's (1983) chapter to argue that children's speech is more segmental than that of adults. This was Kent's contention, but it was not rigorously demonstrated. Instead, Kent made a qualitative comparison of F2 trajectories in 4-year-old children's and adults' production of spoken phrases. He discussed the F2 patterns in the spectrograms provided and noted that children's vowel productions appeared to be less influenced by adjacent consonantal articulations than adults' vowel productions. I found something similar in an acoustic investigation of unstressed vowels produced by 5-year-olds, 8-year-olds, and adults (Redford, 2018), but I also found that anticipatory V-to-C effects on F1 were stronger in children's speech than in adults' speech.

In fact, findings from recent ultrasound studies on coarticulation in children's and adults' speech strongly suggest that children's speech is more coarticulated than adults' speech (Noiray, Abakarova, Rubertus, Krüger, & Tiede, 2018; Noiray, Ménard, & Iskarous, 2013; Zharkova, Hewlett, & Hardcastle, 2011, 2012; but see Barbier, 2016, for an alternative view). For instance, Zharkova et al. (2011) used ultrasound to investigate C-to-V coarticulation in school-aged children's and adults' production of /ʃV/ syllables in the frame sentence “It's a __ Pam.” They found that children's production of the palato-alveolar fricative was more influenced by the following vowel than adults' productions (see also Zharkova et al., 2012). Noiray et al. (2018) studied coarticulation degree across a wider age range and more consonantal and vocalic contrasts. Their results showed that coarticulation degree becomes weaker with age. In particular, they found that preschool children's articulation of labial, alveolar, and velar stop consonants was consistently more influenced by the following vowel than school-aged children's articulation of these consonants and that coarticulation degree was stronger in school-aged children's productions than in adults' productions. These and other similar results are opposite the prediction from the information-processing hypothesis that phonemes provide a basis for speech acquisition and production.

In contrast to the information-processing approach, the ecological dynamics approach to speech production predicts that children's speech is more coarticulated than adults' ( Nittrouer, 1993 , 1995 ; Nittrouer, Studdert-Kennedy, & McGowan, 1989 ; Nittrouer, Studdert-Kennedy, & Neely, 1996 ; see also Noiray et al., 2018 , 2013 ). For example, Nittrouer (1995) hypothesized that children's early word productions are articulatory gestalts and that “the emergence of mature production skills involves two processes: differentiation and tuning of individual gestures, and improvement in coordination among gestures that compose a word” (p. 521). The hypothesis aligns well with a functional approach to child phonology, which emphasizes the communicative intent behind spoken language production and so argues for word-based analyses of children's speech sound patterns (e.g., Ferguson & Farwell, 1975 ; Menn, 1983 ; Stoel-Gammon, 1983 ; Vihman, 2017 ; Vihman & Croft, 2007 ; Vihman, Macken, Miller, Simmons, & Miller, 1985 ; Waterson, 1971 ). In fact, Nittrouer et al. (1989, pp. 120–121) explicitly motivated their prediction that children's speech is more coarticulated than adults' with reference to two of the articles that first introduced the idea that child phonology should take the word as its principal unit of analysis (see “setting papers” in Vihman & Keren-Portnoy, 2013 ). Following Ferguson and Farwell (1975) , they suggested that a child's failure to appropriately generalize correct phonetic forms (e.g., [n] and [m]) from one word to another (e.g., “no” is [noʊ], but “night” is [mɑɪt], whereas “moo” is [buː]) indicated that whole words, rather than phonemes, were the targets of acquisition and also the units of production. Nittrouer et al. also referred to Ferguson and Farwell's observation of children's variable word realizations to argue for an account of word form representation as a “collection of gestures” that were inappropriately timed and so genuinely more gestalt-like than segment-like. Finally, they cited Menn's (1983) analysis of consonant harmony in her son's first words to make a point about the existence of “articulatory routines” for word production.

In summary, children's speech patterns are more compatible with the hypothesis of whole-word production than with the hypothesis of phonemic, or segmental, production. In so far as the systematic patterns of child phonology can also be explained to emerge from motoric constraints (see, e.g., Davis, MacNeilage, & Matyear, 2002 ; Locke, 1983 ; McCune & Vihman, 1987 ), the ecological dynamics emphasis on action-based representations is also more compatible with children's speech patterns than the information-processing emphasis on sequencing constraints derived from a child-specific grammar. For this reason, I deem holistic motoric word form representations fundamental to a developmentally sensitive theory of speech production.

Explaining Phonological Change Over Developmental Time

As in Redford (2015), the specific proposal is that children begin to acquire holistic motoric representations, or schemas, with their attempts at first words. These schemas then provide the basic speech plan for future word productions. This proposal raises the developmental question: How do schema representations change over time as children's speech becomes more and more adultlike? Here, I argue that the information-processing assumption of separate perception and production systems is required to account for developmental change. To make this argument, let us first consider development from the ecological dynamics perspective.

In an ecological dynamics approach, learning is an attunement process ( Goldstein & Fowler, 2003 ; Studdert-Kennedy, 1987 ). Unsuccessful communication destabilizes representations that encode timing relations between gestures, forcing a random walk through motor space until the word-specific timing patterns have been discovered (see, e.g., Nam et al., 2009 ). This mode of phonological learning implies that the temporary but systematic patterns of child phonology represent local minima in the random walk. This implication is consistent with articulatory constraint-based explanations for these patterns (e.g., Davis & MacNeilage, 2000 ; Davis et al., 2002 ; Locke, 1983 ; McCune & Vihman, 1987 ). However, similar to the constraint-based explanations, the assumption of a self-organized system based on dynamic principles predicts a universal pattern of speech development, albeit one that interacts in predictable ways with the target language. This prediction is undermined by the strong individual differences in speech development that are observed within a language (e.g., Ferguson & Farwell, 1975 ; Macken & Ferguson, 1981 ; Stoel-Gammon & Cooper, 1984 ; Vihman, Ferguson, & Elbert, 1986 ).

Ferguson and Farwell (1975) were among the first to take individual differences in development seriously and to propose, in effect, that these signal the child's control over the speech production process. The specific suggestion was that children select word forms from the adult language that they are able to produce. Word selection implies a kind of insight into the production process meted out by an executive controller—an implication that is anathema to the ecological dynamics approach. McCune and Vihman (1987 , 2001) better defined the “what” of what children are able to produce when they proposed that children build up a unique set of vocal motor schemes during babbling based on individual preferences for particular patterns. Vihman (1996) then recast the notion of selection with respect to these schemas. She proposed that a schema acted as a kind of “articulatory filter” that “selectively enhances motoric recall of phonetically accessible words” (p. 142). Elsewhere, Vihman (2017) refers to resonances between the production and perception systems to explain the selective memory for phonetically accessible words. In this way, Vihman is able to explain individual differences in words and forms attempted while avoiding the homunculus problem inherent to the concept of an executive controller.

Although the idea of an articulatory filter very much implies interactions between action and perception, the specific theory of perception Vihman adopts is very clearly not a direct realist one; for example, elsewhere, Vihman is interested in the role of perceptual saliency in children's development of lexical representations (e.g., Vihman, Nakai, DePaolis, & Hallé, 2004). The notion of perceptual saliency relies on the psychoacoustic theory of speech perception that undergirds the information-processing approach to speech production, that is, a theory of perception in which the perceptual primitives are “intrinsically meaningless, simple acoustic features, such as spectral distribution patterns, bursts of band-limited aperiodic noise … into which the speech signal can be analyzed” (Best, 1995, p. 175). Why does Vihman adopt this theory? Probably because a psychoacoustic theory of speech perception provides targets of acquisition that go beyond a child's immediate abilities and so allows for directed motor learning and change (see also Menn, Schmidt, & Nicholas, 2013). More generally, a psychoacoustic theory of speech perception explains a wider variety of speech-related phenomena than a direct realist theory; for example, it accounts for categorical perception in nonhuman animals and why auditory processing constraints appear to affect the structure of phonological systems (see Diehl, Lotto, & Holt, 2004, for a review).

In summary, the observation that individual children take very different paths to acquire the same spoken language suggests a developmental process more compatible with the information-processing assumption of distinct perception and production systems than with the ecological dynamics assumption of a unified perception–action system. The developmentally sensitive theory of speech production described below further assumes that distinct production and perception systems entail a role for central perceptual feedback control in speech production.

A Developmental Approach to Speech Production

The developmentally sensitive theory of speech production outlined in this section extends the basic idea, first outlined in Redford (2015) , that adult speech production processes and representations are structured by the acquisition of spoken language. The alternative view, implicit in mainstream theory, is that adult speech production processes and representations are the targets of spoken language acquisition. As in Redford (2015) , the theory assumes that the fundamental unit of production is a word. This assumption follows from the view that “the child's entry into language is mediated by meaning: and meaning cannot be conveyed by isolated features or phonemes” ( Studdert-Kennedy, 1987 , p. 51). Similar to an ecological dynamics approach, endogenous representations are assumed to be holistic and action based. As in Redford (2015) , I call these representations schemas, not gestural scores or coupling graphs, to acknowledge borrowing from Vihman and McCune's theoretical work on child phonology ( McCune & Vihman, 1987 , 2001 ; Vihman & McCune, 1994 ) and debts to schema theory in the area of skilled action and motor control ( Arbib, 1992 ; Cooper & Shallice, 2006 ; Norman & Shallice, 1986 ; Schmidt, 1975 ). These acknowledgments also signal the aforementioned embrace of certain information-processing assumptions, namely, that production and perception are distinct processes and that adults implicitly predict perceptual outcomes and use perceptual feedback to make articulatory (and whole-word) adjustments while speaking.

In addition to building on these assumptions, the developmentally sensitive theory outlined here emphasizes two distinctions: (a) the distinction between others' productions and self-productions and (b) the distinction between self-productions for oneself and self-productions for others. Self-productions provide a basis for endogenous representations. When these are for oneself, they are assumed to be exploratory and so free from association with conceptual information. In this way, they provide the basis for the nonlinguistic perceptual–motor map that is used to integrate exemplar and schema representations for production. When self-productions are for others, they are assumed to be communicative and associated with conceptual information. In this way, they provide the basis for schemas. In contrast to self-productions, others' productions provide the basis for just one type of representation—an exogenous perceptual representation associated with conceptual information. I will call this representation a perceptual exemplar . This label acknowledges inspiration from a class of phonetically informed phonological theories that emphasize the importance of detailed, often word-specific, acoustic–phonetic information for production (e.g., Johnson, 2007 ; Pierrehumbert, 2002 ). Perceptual exemplars provide production targets. A child cannot even attempt first words without having acquired at least a few of these from the ambient language.

The foundational assumptions enumerated above entail speech plan representations that are different from either the ecological dynamics or information-processing approaches to speech production. They also entail a different approach to phonology than the ones alluded to so far. Otherwise, the developmentally sensitive theory proposed here borrows heavily from current models of speech production and motor control. It contributes to the field by accounting for the transition from prespeech to adultlike speech in a series of steps that correspond to major developmental milestones.

Step 1: The Perceptual–Motor Map

As in an information-processing approach to speech production, a developmental approach requires a perceptual–motor map, specifically a mapping between auditory speech and articulatory movement that is likely mediated by somatosensory information (e.g., Guenther, 1995 ; Guenther et al., 2006 ; Perkell et al., 1993 ). The existence of a perceptual–motor map is supported by neuropsychological findings on sensorimotor integration in different regions along the auditory dorsal stream pathway from the primary auditory cortex (= superior temporal gyrus, superior temporal sulcus) to the anterior premotor cortex (= inferior frontal gyrus; see Hickok & Poeppel, 2007 ). It is common to assume that the perceptual–motor map develops during the first year of life as infants engage in vocal exploration (e.g., Davis & MacNeilage, 2000 ; Guenther, 1995 ; Hickok, Buchsbaum, Humphries, & Muftuler, 2003 ; Kuhl, 2000 ; Menn et al., 2013 ). Following Oller (2000, pp. 165–179) , I will assume that this exploration includes all prespeech vocalizations from cooing to squealing to babbling and so describes the mapping of continuous acoustic and motor dimensions, with somatosensory information at the intersection of these two. For example, it associates the frequency sweeps of squealing with continuous changes to the length and tension of the vocal folds and the amplitude-modulated frication of raspberries with the forcing of air through loosely coupled lips. It also associates static sounds, such as silence, to transient actions in the vocal tract, such as a briefly sustained oral or glottal closure. This view of the perceptual–motor map enables the gestural interpretation of acoustic form (cf. Best, 1995 ; see also Hickok, 2012 , 2014 ) and so can take holistic representations as input.

Although the map develops during the prespeech period of infant vocalization, it is important to stipulate that it continues to evolve with the acquisition of speech motor skills and across the life span with the acquisition of new languages and with conformity to or disengagement from the sociolinguistic environment (see Kuhl, Ramírez, Bosseler, Lin, & Imada, 2014 , for a related view). In the context of the current theory, this assumption is required to explain developmental changes that are traditionally attributed to the phonology, that is, the evolution of word forms from childlike to more adultlike. This is because the perceptual–motor map provides a source for the abstract action-based word form representations that are schemas, as described below.

Step 2: Perceptual Word Forms and Action Schemas

Children's first words mark the onset of speech production. Word production depends on conceptual development, including the insight that adult vocalizations are referential. This insight, which occurs perhaps as early as 7 months of age ( Bergelson & Swingley, 2012 ; Harris, Yeeles, Chasin, & Oakley, 1995 ), coincides with the acquisition of perceptual word forms—exemplars—from the ambient language. Bergelson and Swingley (2012) provided evidence for this claim when they used eye tracking to assess 6- to 9-month-old infants' ability to comprehend familiar nouns by discriminating between paired pictures while listening to spoken stimuli (e.g., “Can you find the X ?” and “Where's the X ?”). The authors reported that infants as young as 6 months of age were reliably able to discriminate a significant number of the pairs. Note that, by most accounts, perceptual attunement to the native language occurs between 6 and 10 months of age (see Vihman, 2017 , for a review). Bergelson and Swingley therefore interpreted the finding to indicate that learning the sounds of a language goes hand in hand with learning its vocabulary.

At around 12 months of age, the infant has acquired both a reasonably stable perceptual lexicon and a perceptual–motor map. The production of first words is now possible. This heralds the onset of speech production, which is imagined here as the moment when the infant, motivated to communicate a specific referential meaning, uses her perceptual–motor map to translate an exogenously derived perceptual exemplar into vocal action. As in Redford (2015) , I assume that the motor routines an infant first uses to convey a particular concept are abstracted and associated with that concept when the child has succeeded in communicating the intended meaning. This abstraction is the schema. Similar to gestural scores, schemas encode routine-specific relational information between articulators across time, for example, tongue advancement during jaw opening. Similar to coupling graphs, they are the action-based word form representations. Put another way, schemas are both the phonological representation and speech plan for a given word/concept, where word is broadly construed as any conventionalized form–meaning association that is part of the child's repertoire (e.g., “uh oh” or “gimme” for “give me”). Figure 2 depicts first word production and schema abstraction.

Figure 2. The onset of speech coincides with attempts to produce specific meanings (i.e., concepts) associated with perceptual word forms learned from the ambient language (left). Specifically, infants engage their perceptual–motor map to derive a best motoric approximation of the exogenous perceptual form or “perceptual exemplar.” The shape of the approximation will depend on how the map has been warped through vocal exploration, which itself is constrained by motor development. The motor routines used to convey specific concepts are abstracted and stored during production (right). These abstractions, or “motor schemas,” are associated with the concept attempted and so serve as one half of the phonological representation of a word. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Schemas are continually updated with production. This means that they become more abstract over time as a one-to-one relationship with a single motor routine gives way to timing generalizations that are common to all attempts of a particular word production. Note that the protracted development of articulatory timing control, which results in highly variable speech output, ensures that the schema-encoded generalizations become abstract quite quickly. Ultimately, schemas may encode little else than the number of syllables as iterations of the open–close cycle of the vocal tract and the relative durations of these cycles, plus the initial posture and direction of major articulators for each cycle. This hypothesis is consistent (or at least reconcilable) with evidence for serial timing control and frame-based plans generated in the supplementary motor area and the pre–supplementary motor area, respectively, during adult speech production (see, e.g., Bohland & Guenther, 2006 ; MacNeilage, 1998 ).
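
A hypothetical rendering of such a maximally abstract schema is sketched below. Every field name is illustrative rather than drawn from an implemented model, but the data structure conveys how little the representation would ultimately need to encode:

```python
from dataclasses import dataclass, field

# Hypothetical schema data structure; all field names are illustrative.

@dataclass
class SyllableCycle:
    relative_duration: float      # this cycle's share of the word's duration
    initial_posture: dict = field(default_factory=dict)  # e.g., {"lips": "closed"}
    direction: str = "open-close" # one iteration of the vocal tract cycle

@dataclass
class Schema:
    concept: str   # the meaning this motor routine conveys
    cycles: list   # one entry per syllable-sized open-close cycle

# A two-syllable word plan: trochaic timing plus coarse initial postures.
bottle = Schema(
    concept="bottle",
    cycles=[
        SyllableCycle(0.6, {"lips": "closed", "tongue": "low"}),
        SyllableCycle(0.4, {"tongue": "alveolar"}),
    ],
)
```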

Step 3: Onset of Perceptually Based Control

Once schemas are abstracted, they are activated with the perceptual form when a concept is selected for production. The motor and perceptual forms are integrated in the perceptual–motor map. Hickok, Houde, and Rong (2011 , p. 413) adopt a similar hypothesis, albeit with an emphasis on sensorimotor integration at the level of phoneme production. They note that the hypothesis “is consistent with Wernicke's early model in which he argued that the representation of speech, e.g., a word, has two components, one sensory (what the word sounds like) and one motor (what sequence of movements will generate that sequence of sounds).” Wernicke's exact hypothesis of dual word form representations is adopted here to explain both why child forms deviate from adult forms and how the forms change over time.

With respect to children's deviant forms, schemas are assumed to initially weight production in such a way that it appears motorically constrained. The weighting is the result of a very small productive vocabulary, which serves to entrench particular trajectories through motor space. For a while, this entrenchment may even limit the child's ability to form new motor trajectories. At this stage, children's productions of novel words may appear more template-like than in first word production. In Vihman and Croft's (2007 , p. 696) words, “the child (implicitly) impos(es) one or more preexisting templates, or familiar phonological patterns, on an adult form that is…similar to those patterns.”

Around 18 months of age, significant vocabulary expansion results in a developmental shift away from forms that suggest production constraints and toward those that suggest perceptual ones due to increasing homophony among expressive word forms ( Redford & Miikkulainen, 2007 ). This shift heralds the next critical step in the evolution of speech production: a newfound focus on how self-productions should sound. The onset of predictive encoding (state feedback control) emerges from this focus.

In particular, the proposed process by which the 18-month-old infant begins to forge new paths through motor space takes as its inspiration the hierarchical state feedback control model of production ( Hickok, 2012 , 2014 ; Hickok et al., 2011 ), where state feedback control is described as having two functions. The first is to adjust motor commands so that the articulators reach desired perceptual targets; the second is to use external feedback to update the representations that guide speech. In the present proposal, both functions are thought to emerge with a communication-driven shift in production toward better matching of endogenously derived motor forms to exogenously derived perceptual forms. Furthermore, Function 2 is proposed to drive Function 1 in that Function 1 may begin as a delayed comparison between the perceptual trace of a production and the intended target, absent any motor adjustments (see Figure 3 ).

Figure 3. Following early word production, the next major developmental change is hypothesized to occur when motorically driven homophony begins to threaten the young child's ability to effectively communicate. At this stage, the child begins to focus on how words should sound. As a result, production shifts from an entirely feedforward process to one where feedforward routines are adjusted to match perceptual representations. The adjustment process, carried out through interactions between the endogenous perceptual–motor map and the repository of exogenous word form representations or “perceptual exemplars,” sets the stage for state feedback control, which nonetheless begins with a delayed comparison between the perceptual trace and target—absent adjustment (left). Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

How might a delayed matching process evolve into real-time state feedback control? One possibility is that the matching process creates a bidirectional connection between the exogenously derived exemplar targets and the perceptual–motor map, where the connections between motor routines and perceptual patterns are already robust and bidirectional. Now, the perceptual outcomes of schema-associated routines can be matched in real time against perceptual exemplars. Any discrepancies between the expected self-outcomes and other-based representations could force new paths through motor space by stretching entrenched motor routines in the direction of the exogenously derived perceptual form.
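
One way to make this stretching concrete is to represent an entrenched routine as a short trajectory through a toy two-dimensional motor space that is repeatedly deformed toward the motor image of the exemplar. The interpolation rule and the weight below are assumptions for illustration only:

```python
import numpy as np

# Sketch of exemplar-driven deformation of an entrenched motor routine.
# The 2-D "motor space," the paths, and the weight are invented.

def update_routine(routine, exemplar_motor_image, exemplar_weight=0.2):
    """Pull each point of the stored routine toward the exemplar-derived path."""
    routine = np.asarray(routine, dtype=float)
    target = np.asarray(exemplar_motor_image, dtype=float)
    return (1 - exemplar_weight) * routine + exemplar_weight * target

child_routine = np.array([[0.1, 0.3], [0.2, 0.5], [0.1, 0.4]])  # entrenched path
adult_image = np.array([[0.1, 0.3], [0.4, 0.7], [0.3, 0.5]])    # exemplar image

for _ in range(10):  # repeated productions with perceptual matching
    child_routine = update_routine(child_routine, adult_image)

print(np.round(child_routine, 2))  # now lies on the exemplar-derived path
```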

Step 4: Self-Monitoring

Speech production does not become adultlike until children begin to externally monitor their own speech and consciously recognize its divergence from (chosen) adult norms. The evidence suggests that this may not occur until around the age of 4 years. In particular, feedback perturbation experiments with young children suggest that perceptual input plays little role in speech production before the age of 4 years; for example, toddlers neither immediately compensate nor adapt over time with articulatory changes to their vowel productions when hearing spectrally perturbed alterations of their own speech during a word production task ( MacDonald, Johnson, Forsythe, Plante, & Munhall, 2012 ). At the age of 4 years, children begin to compensate but do not adapt over the long term to perturbed feedback ( MacDonald et al., 2012 ; Ménard, Perrier, Aubin, Savariaux, & Thibeault, 2008 ); for example, Ménard et al. showed that 4-year-old children return immediately to preferred productions after compensating online to an articulatory perturbation. Failures to adapt suggest that, although 4-year-old children may use auditory information to help guide speech production, they do not yet use external feedback to update existing production representations and processes. Still, the ability to adapt appears to emerge soon after 4 years of age in typically developing children ( Terband, Van Brenk, & van Doornik-van der Zee, 2014 ).

Psycholinguistic evidence is consistent with the hypothesis that self-monitoring emerges late in the preschool years during spoken language development. For example, preschool children understand unfamiliar adult speech better than their own unadultlike speech (Dodd, 1975). In addition, self-initiated speech repairs increase over developmental time, with many fewer repairs observed in the speech of 5-year-old children than in the speech of older school-aged children (Evans, 1985; Rogers, 1978). Moreover, if we imagine the self-monitoring process as one where speakers must identify particular discrepancies between what they intended to produce and what they actually produced, then its slow development is consistent with the slow development of selective attention (see, e.g., Plude, Enns, & Brodeur, 1994; Wellman, Cross, & Watson, 2001). The speculation here is that selective attention to one's own speech is also motivated by a developing self-concept. When children begin to appreciate those aspects of their own speech that signal an undesired social distance between themselves and others, they shift their attentional focus to identifying discrepancies between how they sound and whom they want to sound like. This motivates a final marked disruption of entrenched motor routines in service of better approximating the exogenously derived exemplars.

Self-concept emerges with theory of mind during the preschool years (see Symons, 2004 ). Self-identity, which is part of the self-concept ( Baumeister, 1999 ; Gecas, 1982 ), manifests in speech with socio-indexical marking. For example, VOT for stops varies differently as a function of gender across languages ( Li, 2013 ; Oh, 2011 ; Whiteside & Irving, 1998 ), suggesting social as opposed to physiological reasons for this speech production difference. How does the child acquire female- versus male-gendered speech? The suggestion here is that a burgeoning sense of identity leads the child to selectively attend to those adult productions he or she is most interested in approximating. In identifying a discrepancy between how they sound and who they want to sound like, children may highlight exemplars associated with those individuals, thereby highlighting aspects of the perceptual form that need special attention in production. At the same time, self-monitoring focuses more attention on the perceptual consequences of one's own speech, which further increases the weight of exemplars in the production process, thus pushing motor routines and resulting schema ever more in the adult direction (see Figure 4 ).

Figure 4. During the preschool years, children begin to self-monitor based on external perceptual feedback to identify deviations between how they sound and who they want to sound like. The perceived deviations highlight aspects of the stored perceptual representations, driving the perceptual–motor mapping and resulting endogenous motoric representations (i.e., schemas) ever more toward matching exogenous perceptual goals (i.e., exemplars). Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Thus, the full proposal is that, during the preschool years, socially directed listening induces changes in speech production through a self-monitoring–led shift toward perceptually weighted production. Prior to this point, self-productions are (unconsciously) heard as being the same as other productions. Consider, for example, the toddler who points to a picture of a fish in a picture book and utters “fifth,” to which the parent responds “fifth?” and the child answers, “No, fifth!” (see Menn, 1983). Updates to both the perceptual–motor map and schema representations follow from this shift, soon resulting in adultlike representations. This proposed final stage in the development of speech production is consistent with the evidence that socio-indexical information, such as gender-specific use of phonetic features, begins to emerge in children's speech around the age of 4 years (see Foulkes & Docherty, 2006, pp. 422–424). This observation brings us back to an earlier one that closes the gap between work in speech motor control and real-world speaker behavior, that is, the observation that participants' behavior in auditory feedback perturbation experiments resembles phonetic convergence, normally understood as a socially driven behavior meant to lubricate interactions between interlocutors.

Current approaches to speech production aim to explain adult behavior and, in so doing, frequently make at least some assumptions that, when taken to their logical conclusion, fail to adequately account for how the system develops. This failure is problematic from a developmental perspective. According to this perspective, the representations and processes of adult speech and language should emerge from the developmental process (for a similar view, see Menn et al., 2013 ; Vihman & Croft, 2007 ).

Development is particularly relevant for theories of speech production because of the paradox of early speech onset despite slowly developing speech motor control. Here, this paradox was taken to suggest the working hypothesis that feedforward processes mature earlier than central feedback control processes in speech production. This hypothesis structured a developmentally sensitive theory of speech production that was elaborated in stages, with each stage building on the previous one. The stages proposed were designed to accommodate developmental patterns. At the same time, developmental patterns were given new meanings and grouped in novel ways by the working hypothesis. The accommodation of speech production theory to developmental findings and vice versa results in many new testable hypotheses that could motivate future empirical work and usher in new knowledge and even new clinical practice. For example, the hypothesis that perceptual–motor integration relies on the development of a nonlinguistic perceptual–motor map suggests that therapeutic uses of speech sound practice should cover as broad a range of sound combinations as possible. By hypothesis, these sound combinations need not be tied to lexical content and so the therapy could involve a fun and silly random sound sequence–generating game using, say, magnetic letters that could be arranged and then rearranged on a board. Such a game would allow the set of possible sound combinations in a language to be more fully explored than is possible when that set is constrained by picturable words in the language. The benefits of this therapy for generalization to novel or known word production could be tested against current therapies where speech sound practice typically involves the use of visual props to elicit specific lexical items. Intriguingly, this idea echoes, to some extent, Gierut's (2007) differently motivated contention that words with complex speech sound sequences allow for better generalization of treatment in children with phonological disorder than words that have simple phonological structure.

The hypothesized disassociation of the perceptual–motor map and perceptual exemplar representation of word forms also has implications for the clinical assessment of speech sound disorder. For example, when this hypothesis is taken together with the idea that articulatory change is motivated by weighting perceptual exemplar representations more heavily during production, it suggests that the aforementioned fun and silly random sound sequence–generating game could be used to supplement a comprehensive evaluation of speech sound disorder. Performance in the game could help diagnose whether the articulation problem is due to a poorly developed perceptual–motor map or to poorly specified perceptual exemplars. The diagnosis would then lead to therapy that focuses either on speech sound practice or on developing perceptual exemplars. Finally, the theory-dependent hypothesis that perceptual weighting of production is driven in part by the emergence of a self-concept and the ensuing selective attention to self-productions suggests not only a testable hypothesis regarding the development of convergence behaviors in spoken language interactions but also a novel way to understand the absence of convergence behaviors and mild segmental speech sound disorders in individuals on the autism spectrum.

Another major implication of the developmentally sensitive theory elaborated in this review article is a new adult model of speech production. This model, illustrated in Figure 5, incorporates insights from many existing theories. Some of these insights were explicitly acknowledged in the preceding text; others were merely implied. For example, the reference to “self-monitoring” indicates an acceptance of the evidence in favor of this well-established hypothesis (see Postma, 2000, for a review). Otherwise, the model diverges from most adult-focused theories in assuming distinct action- and perception-based representations (though see Hickok, 2012, 2014). This aspect of the model provides a framework for understanding phenomena that have been traditionally ignored in adult-focused theories of speech production. For example, the model very obviously allows for the different possible speaking modes that are thought to correspond with speaking style differences: one mode wherein the motor pathway is emphasized over the perceptual pathway—this is Lindblom's (1990) hypo or system-oriented mode; one mode wherein the reverse occurs—this is Lindblom's hyper or output-oriented mode (shown); and a mode wherein the two pathways are in equilibrium—this is likely the default mode.

Figure 5. The adult model of speech production implied by the developmental model outlined in this review article. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes. The linkages between the repository of lexical concepts and motor schemas and between lexical concepts and perceptual exemplars represent the conceptual and phonological aspects of the lexicon.

The implied adult model shown in Figure 5 also diverges from information-processing theories in assuming that holistic phonological representations serve as speech plan representations. This developmentally sensitive aspect of the model is not immediately compatible with the evidence for sublexical units in productions, including the speech error data that have long been used to argue for the psychological reality of a phonological encoding process. The developmentally sensitive adult model automatically fails if it cannot account for these data. Accordingly, we are currently pursuing the hypothesis that discreteness emerges at the level of the perceptual–motor map ( Davis & Redford, 2019 ). More specifically, we have formally defined the perceptual–motor map as a linked set of experienced perceptual and motor trajectories that are time-based excursions through speaker-defined perceptual and motor spaces. By hypothesis, nodes appear where motor trajectories intersect in motor space, creating perceptually linked node-delimited paths that can be recombined. Though weighted in the direction of already experienced paths, exemplar-driven novel word production picks new trajectories through motor space by deforming existing node-delimited paths in systematic ways. These new trajectories may intersect existing trajectories or go on to be intersected themselves. In this way, motor space is reticulated with vocabulary acquisition, and discrete speech motor goals emerge absent discrete phonological representations. In future work, we will investigate how this view of discreteness might account for the speech error data. Our initial hypothesis is that these arise from the competing motoric and perceptual pressures of schema and exemplar integration during speech production.
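
Under these assumptions, the reticulated map can be caricatured as a graph of node-delimited path segments. The node labels and the enumeration below are purely illustrative, but they show how segments experienced in, say, “ba” and “dam” could recombine into a novel “bam” without any phoneme-level representation:

```python
from collections import defaultdict

# Toy graph of node-delimited paths through motor space; labels are invented.
graph = defaultdict(set)  # node -> nodes reachable along experienced paths

def add_trajectory(nodes):
    """Register an experienced motor trajectory as a chain of nodes."""
    for a, b in zip(nodes, nodes[1:]):
        graph[a].add(b)

add_trajectory(["rest", "b-close", "a-open", "rest"])             # "ba"
add_trajectory(["rest", "d-close", "a-open", "m-close", "rest"])  # "dam"

def paths(node="rest", path=None, max_len=5):
    """Enumerate rest-to-rest recombinations of node-delimited segments."""
    path = [node] if path is None else path + [node]
    if node == "rest" and len(path) > 1:
        yield path
    elif len(path) < max_len:
        for nxt in sorted(graph[node]):
            yield from paths(nxt, path, max_len)

for p in paths():
    print(" -> ".join(p))  # includes the never-experienced "bam" and "da"
```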

Theories of spoken language production provide frameworks for understanding developmental speech sound disorders. Even the distinction between motor speech, articulation, and phonological disorders reflects this fact. In so far as the types of interventions chosen to address a disorder follow from how the disorder is understood, theory informs practice. This is as it should be. However, the relationship between theory and practice should also motivate a reconsideration of theory when it fails to address a problem that is relevant to practice. The problem of development clearly falls into this category. A major aim of this review article was to show that current adult-focused approaches to speech production fail to address the paradox of slow developing speech motor control despite early speech onset because they depart from perspectives that are not developmental. A developmental perspective assumes change over time, and those who adopt it focus on explaining how this change occurs. A second major aim of this review article was to show how a commitment to this perspective leads to a theory of speech production that is different in many respects from existing theories. Thus, even if the various ideas presented herein are dismissed after testing, the conclusion should be that a developmental approach to understanding speech production should be pursued if theory is to be useful for practice.

Acknowledgments

Article preparation was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development under Grant R01HD087452. The content is solely the author's responsibility and does not necessarily reflect the views of the National Institute of Child Health & Human Development.


  • Abbs J. H., & Gracco V. L. (1984). Control of complex motor gestures: Orofacial muscle responses to load perturbations of lip during speech. Journal of Neurophysiology, 51(4), 705–723.
  • Arbib M. A. (1992). Schema theory. In Shapiro S. (Ed.), The encyclopedia of artificial intelligence (Vol. 2, pp. 1427–1443). Hoboken, NJ: Wiley.
  • Babel M. (2012). Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics, 40(1), 177–189.
  • Barbier G. (2016). Contrôle de la production de la parole chez l'enfant de 4 ans: L'anticipation comme indice de maturité motrice [Speech motor control in the 4-year-old child: Anticipatory coarticulation as an index of speech motor development] (PhD thesis). Université Grenoble Alpes, Grenoble, France.
  • Baumeister R. F. (1999). Self-concept, self-esteem, and identity. In Derlega V. J., Winstead B. A., & Jones W. H. (Eds.), Personality: Contemporary theory and research (Nelson-Hall series in psychology, 2nd ed., pp. 339–375). Chicago, IL: Nelson-Hall Publishers.
  • Bergelson E., & Swingley D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences of the United States of America, 109(9), 3253–3258.
  • Best C. T. (1995). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In Goodman J. C. & Nusbaum H. C. (Eds.), The development of speech perception: The transition from speech sounds to spoken words (pp. 167–224). Cambridge, MA: MIT Press.
  • Best C. T., Goldstein L. M., Nam H., & Tyler M. D. (2016). Articulating what infants attune to in native speech. Ecological Psychology, 28(4), 216–261.
  • Bladon R. A. W., & Al-Bamerni A. (1976). Coarticulation resistance in English /l/. Journal of Phonetics, 4(2), 137–150.
  • Bock K., & Levelt W. (2002). Language production. In Altman G. T. (Ed.), Psycholinguistics: Critical concepts in psychology (pp. 405–450). Abingdon-on-Thames, England: Routledge.
  • Bohland J. W., Bullock D., & Guenther F. H. (2010). Neural representations and mechanisms for the performance of simple speech sequences. Journal of Cognitive Neuroscience, 22(7), 1504–1529.
  • Bohland J. W., & Guenther F. H. (2006). An fMRI investigation of syllable sequence production. Neuroimage, 32(2), 821–841.
  • Browman C. P., & Goldstein L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45(2–4), 140–155.
  • Browman C. P., & Goldstein L. (1989). Articulatory gestures as phonological units. Phonology, 6(2), 201–251.
  • Browman C. P., & Goldstein L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4), 155–180.
  • Cooper R. P., & Shallice T. (2006). Hierarchical schemas and goals in the control of sequential behavior. Psychological Review, 113(4), 887–916.
  • Daniloff R., & Hammarberg R. (1973). On defining coarticulation. Journal of Phonetics, 1(3), 239–248.
  • Davis B. L., & MacNeilage P. F. (2000). An embodiment perspective on the acquisition of speech perception. Phonetica, 57(2–4), 229–241.
  • Davis B. L., MacNeilage P. F., & Matyear C. L. (2002). Acquisition of serial complexity in speech production: A comparison of phonetic and phonological approaches to first word production. Phonetica, 59(2–3), 75–107.
  • Davis M., & Redford M. A. (2019). The emergence of discrete motor units in a production model that assumes holistic speech plans. Submitted.
  • Dell G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283–321.
  • Diehl R. L., Lotto A. J., & Holt L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.
  • Dodd B. (1975). Children's understanding of their own phonological forms. The Quarterly Journal of Experimental Psychology, 27(2), 165–172.
  • Evans M. A. (1985). Self-initiated speech repairs: A reflection of communicative monitoring in young children. Developmental Psychology, 21(2), 365–371.
  • Ferguson C. A., & Farwell C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419–439.
  • Flege J. E. (1988). Anticipatory and carry-over nasal coarticulation in the speech of children and adults. Journal of Speech and Hearing Research, 31(4), 525–536.
  • Foulkes P., & Docherty G. (2006). The social life of phonetics and phonology. Journal of Phonetics, 34(4), 409–438.
  • Fowler C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8(1), 113–133.
  • Fowler C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3–28.
  • Fowler C. A., & Saltzman E. (1993). Coordination and coarticulation in speech production. Language and Speech, 36(2–3), 171–195.
  • Galantucci B., Fowler C. A., & Turvey M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13(3), 361–377.
  • Garrett M. F. (1988). Processes in language production. Linguistics: The Cambridge Survey, 3, 69–96.
  • Gecas V. (1982). The self-concept. Annual Review of Sociology, 8(1), 1–33.
  • Gierut J. A. (2007). Phonological complexity and language learnability. American Journal of Speech-Language Pathology, 16(1), 6–17.
  • Giles H., & Powesland P. (1997). Accommodation theory. In Coupland N. & Jaworski A. (Eds.), Sociolinguistics (pp. 232–239). London, England: Palgrave.
  • Goldinger S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251–279.
  • Goldrick M. (2006). Limited interaction in speech production: Chronometric, speech error, and neuropsychological evidence. Language and Cognitive Processes, 21(7–8), 817–855.
  • Goldstein L., Byrd D., & Saltzman E. (2006). The role of vocal tract gestural action units in understanding the evolution of phonology. In Arbib M. A. (Ed.), Action to language via the mirror neuron system (pp. 215–249). Cambridge, England: Cambridge University Press.
  • Goldstein L., & Fowler C. A. (2003). Articulatory phonology: A phonology for public language use. In Schiller N. O. & Meyer A. (Eds.), Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159–207). Berlin, Germany: De Gruyter.
  • Green J. R., Moore C. A., Higashikawa M., & Steeve R. W. (2000). The physiologic development of speech motor control: Lip and jaw coordination. Journal of Speech, Language, and Hearing Research, 43, 239–255.
  • Guenther F. H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102(3), 594–621.
  • Guenther F. H., Ghosh S. S., & Tourville J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language, 96(3), 280–301.
  • Guenther F. H., & Perkell J. S. (2004). A neural model of speech production and its application to studies of the role of auditory feedback in speech. In Maassen B., Kent R. D., Peters H. F. M., van Lieshout P. H. H. M., & Hulstijn W. (Eds.), Speech motor control in normal and disordered speech (pp. 29–49). Oxford, England: Oxford University Press.
  • Haken H., Kelso J. S., & Bunz H. (1985). A theoretical model of phase transitions in human hand movements . Biological Cybernetics , 51 ( 5 ), 347–356. [ PubMed ] [ Google Scholar ]
  • Harris M., Yeeles C., Chasin J., & Oakley Y. (1995). Symmetries and asymmetries in early lexical comprehension and production . Journal of Child Language , 22 ( 1 ), 1–18. [ PubMed ] [ Google Scholar ]
  • Hickok G. (2012). Computational neuroanatomy of speech production . Nature Reviews Neuroscience , 13 ( 2 ), 135–145. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G. (2014). The architecture of speech production and the role of the phoneme in speech processing . Language, Cognition and Neuroscience , 29 ( 1 ), 2–20. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G., Buchsbaum B., Humphries C., & Muftuler T. (2003). Auditory–motor interaction revealed by fMRI: Speech, music, and working memory in area Spt . Journal of Cognitive Neuroscience , 15 ( 5 ), 673–682. [ PubMed ] [ Google Scholar ]
  • Hickok G., Houde J., & Rong F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization . Neuron , 69 ( 3 ), 407–422. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G., & Poeppel D. (2000). Towards a functional neuroanatomy of speech perception . Trends in Cognitive Sciences , 4 ( 4 ), 131–138. [ PubMed ] [ Google Scholar ]
  • Hickok G, & Poeppel D. (2007). The cortical organization of speech processing . Nature Reviews Neuroscience , 8 ( 5 ), 393–402. [ PubMed ] [ Google Scholar ]
  • Houde J. F., & Nagarajan S. S. (2011). Speech production as state feedback control . Frontiers in Human Neuroscience , 5 , 82 https://doi.org/10.3389/fnhum.2011.00082 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Johnson K. (2007). Decisions and mechanisms in exemplar-based phonology . In Sole M.-J., Beddor P., & Ohala M. (Eds.), Experimental approaches to phonology (pp. 25–40). Oxford, England: Oxford University Press. [ Google Scholar ]
  • Johnson K., Flemming E., & Wright R. (1993). The hyperspace effect: Phonetic targets are hyperarticulated . Language , 69 , 505–528. [ Google Scholar ]
  • Kager R., Pater J., & Zonneveld W. (Eds.). (2004). Constraints in phonological acquisition . Cambridge, England: Cambridge University Press. [ Google Scholar ]
  • Katseff S., Houde J., & Johnson K. (2012). Partial compensation for altered auditory feedback: A tradeoff with somatosensory feedback? Language and Speech , 55 ( 2 ), 295–308. [ PubMed ] [ Google Scholar ]
  • Keating P. (1990). The window model of coarticulation: Articulatory evidence . In Kingston J. & Beckman M. E. (Eds.), Papers in laboratory phonology (pp. 451–470). Cambridge, England: Cambridge University Press. [ Google Scholar ]
  • Keating P., & Shattuck-Hufnagel S. (2002). A prosodic view of word form encoding for speech production . UCLA Working Papers in Phonetics , 101 , 112–156. [ Google Scholar ]
  • Kelso J. A., Saltzman E. L., & Tuller B. (1986). The dynamical perspective on speech production: Data and theory . Journal of Phonetics , 14 ( 1 ), 29–59. [ Google Scholar ]
  • Kent R. D. (1983). The segmental organization of speech . In MacNeilage P. F. (Ed.), The production of speech (pp. 57–89). New York, NY: Springer-Verlag. [ Google Scholar ]
  • Kent R. D., & Forner L. L. (1980). Speech segment duration in sentence recitations by children and adults . Journal of Phonetics , 8 , 157–168. [ Google Scholar ]
  • Kuhl P. K. (2000). A new view of language acquisition . Proceedings of the National Academy of Sciences of the United States of America , 97 ( 22 ), 11850–11857. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kuhl P. K., Ramírez R. R., Bosseler A., Lin J. F. L., & Imada T. (2014). Infants' brain responses to speech suggest analysis by synthesis . Proceedings of the National Academy of Sciences of the United States of America , 111 ( 31 ), 11238–11245. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lametti D. R., Nasir S. M., & Ostry D. J. (2012). Sensory preference in speech production revealed by simultaneous alteration of auditory and somatosensory feedback . Journal of Neuroscience , 32 ( 27 ), 9351–9358. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lee S., Potamianos A., & Narayanan S. (1999). Acoustics of children's speech: Developmental changes of temporal and spectral parameters . The Journal of the Acoustical Society of America , 105 , 1455–1468. [ PubMed ] [ Google Scholar ]
  • Levelt W. J. (1989). Speaking: From intention to articulation . Cambridge, MA: MIT Press. [ Google Scholar ]
  • Li F. (2013). The effect of speakers' sex on voice onset time in Mandarin stops . The Journal of the Acoustical Society of America , 133 ( 2 ), EL142–EL147. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lindblom B. (1990). Explaining phonetic variation: A sketch of the H&H theory . In Hardcastle W. J. & Marchal A. (Eds.), Speech production and speech modelling (pp. 403–439). Dordrecht, the Netherlands: Springer. [ Google Scholar ]
  • Lindblom B., Lubker J., & Gay T. (1979). Formant frequencies of some fixed-mandible vowels and a model of motor programming by predictive simulation . Journal of Phonetics , 7 , 147–161. [ Google Scholar ]
  • Locke J. L. (1983). Phonological acquisition and change . New York, NY: Academic Press. [ Google Scholar ]
  • Löfqvist A. (1990). Speech as audible gestures . In Hardcastle H. W. & Marchal A. (Eds.), Speech production and speech modelling (pp. 289–322). Dordrecht, the Netherlands: Kluwer Academic. [ Google Scholar ]
  • MacDonald E. N., Goldberg R., & Munhall K. G. (2010). Compensations in response to real-time formant perturbations of different magnitudes . The Journal of the Acoustical Society of America , 127 ( 2 ), 1059–1068. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • MacDonald E. N., Johnson E. K., Forsythe J., Plante P., & Munhall K. G. (2012). Children's development of self-regulation in speech production . Current Biology , 22 ( 2 ), 113–117. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • MacKay D. G. (1970). Spoonerisms: The structure of errors in the serial order of speech . Neuropsychologia , 8 ( 3 ), 323–350. [ PubMed ] [ Google Scholar ]
  • Macken M. A., & Ferguson C. A. (1981). Phonological universals in language acquisition . Annals of the New York Academy of Sciences , 379 ( 1 ), 110–129. [ Google Scholar ]
  • MacNeilage P. F. (1970). Motor control of serial ordering of speech . Psychological Review , 77 ( 3 ), 182–196. [ PubMed ] [ Google Scholar ]
  • MacNeilage P. F. (1998). The frame/content theory of evolution of speech production . Behavioral and Brain Sciences , 21 ( 4 ), 499–511. [ PubMed ] [ Google Scholar ]
  • Mandler G. (2007). A history of modern experimental psychology . Cambridge, MA: MIT Press. [ Google Scholar ]
  • McCune L., & Vihman M. M. (1987). Vocal motor schemes . Papers and Reports on Child Language Development , 26 , 72–79. [ Google Scholar ]
  • McCune L., & Vihman M. M. (2001). Early phonetic and lexical development: A productivity approach . Journal of Speech, Language, and Hearing Research , 44 ( 3 ), 670–684. [ PubMed ] [ Google Scholar ]
  • Ménard L., Perrier P., Aubin J., Savariaux C., & Thibeault M. (2008). Compensation strategies for a lip-tube perturbation of French [u]: An acoustic and perceptual study of 4-year-old children . The Journal of the Acoustical Society of America , 124 ( 2 ), 1192–1206. [ PubMed ] [ Google Scholar ]
  • Menn L. (1983). Development of articulatory, phonetic, and phonological capabilities . In Butterworth B. (Ed.), Language production (Vol. 2 , pp. 3–50). Cambridge, MA: Academic Press. [ Google Scholar ]
  • Menn L., Schmidt E., & Nicholas B. (2013). Challenges to theories, charges to a model: The linked-attractor model of phonological development . In Vihman M. M. & Keren-Portnoy T. (Eds.), The emergence of phonology: Whole-word approaches and cross-linguistic evidence (pp. 460–502). Cambridge, England: Cambridge University Press. [ Google Scholar ]
  • Nam H., Goldstein L., & Saltzman E. (2009). Self-organization of syllable structure: A coupled oscillator model . In Pellegrino F., Chitoran I., Marsico E., & Coupé C. (Eds.), Approaches to phonological complexity (pp. 299–328). Berlin, Germany: De Gruyter. [ Google Scholar ]
  • Newell A., & Simon H. A. (1972). Human problem solving . Oxford, England: Prentice-Hall. [ Google Scholar ]
  • Nielsen K. (2011). Specificity and abstractness of VOT imitation . Journal of Phonetics , 39 ( 2 ), 132–142. [ Google Scholar ]
  • Nittrouer S. (1993). The emergence of mature gestural patterns is not uniform: Evidence from an acoustic study . Journal of Speech and Hearing Research , 36 ( 5 ), 959–972. [ PubMed ] [ Google Scholar ]
  • Nittrouer S. (1995). Children learn separate aspects of speech production at different rates: Evidence from spectral moments . The Journal of the Acoustical Society of America , 97 ( 1 ), 520–530. [ PubMed ] [ Google Scholar ]
  • Nittrouer S., Studdert-Kennedy M., & McGowan R. S. (1989). The emergence of phonetic segments: Evidence from the spectral structure of fricative-vowel syllables spoken by children and adults . Journal of Speech and Hearing Research , 32 ( 1 ), 120–132. [ PubMed ] [ Google Scholar ]
  • Nittrouer S., Studdert-Kennedy M., & Neely S. T. (1996). How children learn to organize their speech gestures: Further evidence from fricative-vowel syllables . Journal of Speech and Hearing Research , 39 , 379–389. [ PubMed ] [ Google Scholar ]
  • Niziolek C. A., Nagarajan S. S., & Houde J. F. (2013). What does motor efference copy represent? Evidence from speech production . Journal of Neuroscience , 33 ( 41 ), 16110–16116. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Noiray A., Abakarova D., Rubertus E., Krüger S., & Tiede M. (2018). How do children organize their speech in the first years of life? Insight from ultrasound imaging . Journal of Speech, Language, and Hearing Research , 61 , 1355–1368. [ PubMed ] [ Google Scholar ]
  • Noiray A., Ménard L., & Iskarous K. (2013). The development of motor synergies in children: Ultrasound and acoustic measurements . The Journal of the Acoustical Society of America , 133 ( 1 ), 444–452. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Norman D. A., & Shallice T. (1986). Attention to action . In Davidson R. J., Schwartz G. E., & Shapiro D. (Eds.), Consciousness and self-regulation (pp. 1–18). Plenum Press. [ Google Scholar ]
  • Oh E. (2011). Effects of speaker gender on voice onset time in Korean stops . Journal of Phonetics , 39 ( 1 ), 59–67. [ Google Scholar ]
  • Oller D. K. (2000). The emergence of the speech capacity . New York, NY: Psychology Press. [ Google Scholar ]
  • Perkell J. S., Matthies M. L., Svirsky M. A., & Jordan M. I. (1993). Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: A pilot “motor equivalence” study . The Journal of the Acoustical Society of America , 93 ( 5 ), 2948–2961. [ PubMed ] [ Google Scholar ]
  • Perkell J. S., Matthies M. L., Svirsky M. A., & Jordan M. I. (1995). Goal-based speech motor control: A theoretical framework and some preliminary data . Journal of Phonetics , 23 ( 1 – 2 ), 23–35. [ Google Scholar ]
  • Pierrehumbert J. (2002). Word-specific phonetics . In Gussenhoven C. & Warner N. (Eds.), Laboratory phonology 7 (pp. 101–139). Berlin, Germany: Mouton de Gruyter. [ Google Scholar ]
  • Plude D. J., Enns J. T., & Brodeur D. (1994). The development of selective attention: A life-span overview . Acta Psychologica , 86 ( 2–3 ), 227–272. [ PubMed ] [ Google Scholar ]
  • Postma A. (2000). Detection of errors during speech production: A review of speech monitoring models . Cognition , 77 ( 2 ), 97–132. [ PubMed ] [ Google Scholar ]
  • Recasens D. (1989). Long range coarticulation effects for tongue dorsum contact in VCVCV sequences . Speech Communication , 8 ( 4 ), 293–307. [ Google Scholar ]
  • Redford M. A. (2015). Unifying speech and language in a developmentally sensitive model of production . Journal of Phonetics , 53 , 141–152. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Redford M. A. (2018). Grammatical word production across metrical contexts in school-aged children's and adults' speech . Journal of Speech, Language, and Hearing Research , 61 , 1339–1354. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Redford M. A., & Miikkulainen R. (2007). Effects of acquisition rate on emergent structure in phonological development . Language , 83 ( 4 ), 737–769. [ Google Scholar ]
  • Richardson M. J., Shockley K., Fajen B. R., Riley M. A., & Turvey M. T. (2009). Ecological psychology: Six principles for an embodied–embedded approach to behavior . In Calvo P. & Gomila A. (Eds.), Handbook of cognitive science: An embodied approach (pp. 159–187). Amsterdam, the Netherlands: Elsevier. [ Google Scholar ]
  • Roelofs A. (1999). Phonological segments and features as planning units in speech production . Language and Cognitive Processes , 14 ( 2 ), 173–200. [ Google Scholar ]
  • Rogers S. (1978). Self-initiated corrections in the speech of infant–school children . Journal of Child Language , 5 , 365–371. [ Google Scholar ]
  • Saltzman E., & Kelso J. A. (1987). Skilled actions: A task-dynamic approach . Psychological Review , 94 ( 1 ), 84–106. [ PubMed ] [ Google Scholar ]
  • Saltzman E., & Munhall K. G. (1989). A dynamical approach to gestural patterning in speech production . Ecological Psychology , 1 ( 4 ), 333–382. [ Google Scholar ]
  • Savariaux C., Perrier P., & Orliaguet J. P. (1995). Compensation strategies for the perturbation of the rounded vowel [u] using a lip tube: A study of the control space in speech production . The Journal of the Acoustical Society of America , 98 ( 5 ), 2428–2442. [ Google Scholar ]
  • Schmidt R. A. (1975). A schema theory of discrete motor skill learning . Psychological Review , 82 ( 4 ), 225–260. [ Google Scholar ]
  • Schwartz J. L., Boë L. J., Vallée N., & Abry C. (1997). The dispersion–focalization theory of vowel systems . Journal of Phonetics , 25 ( 3 ), 255–286. [ Google Scholar ]
  • Sharkey S. G., & Folkins J. W. (1985). Variability of lip and jaw movements in children and adults: Implications for the development of speech motor control . Journal of Speech and Hearing Research , 28 , 8–15. [ PubMed ] [ Google Scholar ]
  • Shattuck-Hufnagel S., & Klatt D. H. (1979). The limited use of distinctive features and markedness in speech production: Evidence from speech error data . Journal of Verbal Learning and Verbal Behavior , 18 ( 1 ), 41–55. [ Google Scholar ]
  • Shockley K., Sabadini L., & Fowler C. A. (2004). Imitation in shadowing words . Perception & Psychophysics , 66 ( 3 ), 422–429. [ PubMed ] [ Google Scholar ]
  • Smith A., & Goffman L. (1998). Stability and patterning of speech movement sequences in children and adults . Journal of Speech, Language, and Hearing Research , 41 , 18–30. [ PubMed ] [ Google Scholar ]
  • Smith A., & Zelaznik H. N. (2004). Development of functional synergies for speech motor coordination in childhood and adolescence . Developmental Psychobiology , 45 , 22–33. [ PubMed ] [ Google Scholar ]
  • Smith B. L. (1992). Relationships between duration and temporal variability in children's speech . The Journal of the Acoustical Society of America , 91 , 2165–2174. [ PubMed ] [ Google Scholar ]
  • Stevens K. N., & Blumstein S. E. (1981). The search for invariant acoustic correlates of phonetic features . In Eimas P. D. & Miller J. L. (Eds.), Perspectives on the study of speech (pp. 1–38). Hillsdale, NJ: Erlbaum. [ Google Scholar ]
  • Stoel-Gammon C. (1983). Constraints on consonant–vowel sequences in early words . Journal of Child Language , 10 ( 2 ), 455–457. [ PubMed ] [ Google Scholar ]
  • Stoel-Gammon C., & Cooper J. A. (1984). Patterns of early lexical and phonological development . Journal of Child Language , 11 ( 2 ), 247–271. [ PubMed ] [ Google Scholar ]
  • Stoel-Gammon C., & Dunn C. (1985). Normal and disordered phonology in children . Austin, TX: Pro-Ed. [ Google Scholar ]
  • Studdert-Kennedy M. (1987). The phoneme as a perceptuomotor structure . In Allport A., MacKay D., Prinz W., & Scheerer E. (Eds.), Language perception and production (pp. 67–84). London, England: Academic Press. [ Google Scholar ]
  • Symons D. K. (2004). Mental state discourse, theory of mind, and the internalization of self–other understanding . Developmental Review , 24 ( 2 ), 159–188. [ Google Scholar ]
  • Terband H., Van Brenk F., & van Doornik-van der Zee A. (2014). Auditory feedback perturbation in children with developmental speech sound disorders . Journal of Communication Disorders , 51 , 64–77. [ PubMed ] [ Google Scholar ]
  • Thompson A. E., & Hixon T. J. (1979). Nasal air flow during normal speech production . Cleft Palate Journal , 16 , 412–420. [ PubMed ] [ Google Scholar ]
  • Tilsen S. (2014). Selection and coordination of articulatory gestures in temporally constrained production . Journal of Phonetics , 44 , 26–46. [ Google Scholar ]
  • Tourville J. A., & Guenther F. H. (2011). The DIVA model: A neural theory of speech acquisition and production . Language and Cognitive Processes , 26 ( 7 ), 952–981. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Turk A., & Shattuck-Hufnagel S. (2014). Timing in talking: What is it used for, and how is it controlled . Philosophical Transactions of the Royal Society of London B: Biological Sciences , 369 ( 1658 ), 20130395. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Turvey M. T. (1990). Coordination . American Psychologist , 45 ( 8 ), 938–953. https://doi.org/10.1037/0003-066X.45.8.938 [ PubMed ] [ Google Scholar ]
  • Vihman M. M. (1996). Phonological development: The origins of language in the child . Oxford, England: Blackwell. [ Google Scholar ]
  • Vihman M. M. (2017). Learning words and learning sounds: Advances in language development . British Journal of Psychology , 108 ( 1 ), 1–27. [ PubMed ] [ Google Scholar ]
  • Vihman M. M., & Croft W. (2007). Phonological development: Toward a “radical” templatic phonology . Linguistics , 45 , 683–725. [ Google Scholar ]
  • Vihman M. M., Ferguson C. A., & Elbert M. (1986). Phonological development from babbling to speech: Common tendencies and individual differences . Applied Psycholinguistics , 7 ( 1 ), 3–40. [ Google Scholar ]
  • Vihman M. M. & Keren-Portnoy T. (Eds.). (2013). The emergence of phonology: Whole-approaches and cross-linguistic evidence . Cambridge, England: Cambridge University Press. [ Google Scholar ]
  • Vihman M. M., Macken M. A., Miller R., Simmons H., & Miller J. (1985). From babbling to speech: A re-assessment of the continuity issue . Language , 61 , 397–445. [ Google Scholar ]
  • Vihman M. M., & McCune L. (1994). When is a word a word? Journal of Child Language , 21 ( 3 ), 517–542. [ PubMed ] [ Google Scholar ]
  • Vihman M. M., Nakai S., DePaolis R. A., & Hallé P. (2004). The role of accentual pattern in early lexical representation . Journal of Memory and Language , 50 ( 3 ), 336–353. [ Google Scholar ]
  • Waterson N. (1971). Child phonology: A prosodic view . Journal of Linguistics , 7 ( 2 ), 179–211. [ Google Scholar ]
  • Wellman H. M., Cross D., & Watson J. (2001). Meta-analysis of theory-of-mind development: The truth about false belief . Child Development , 72 ( 3 ), 655–684. [ PubMed ] [ Google Scholar ]
  • Whiteside S. P., & Irving C. J. (1998). Speakers' sex differences in voice onset time: A study of isolated word production . Perceptual and Motor Skills , 86 ( 2 ), 651–654. [ PubMed ] [ Google Scholar ]
  • Wickelgran W. A. (1969). Context-sensitive coding, associative memory, and serial order in (speech) behavior . Psychological Review , 76 ( 1 ), 1–15. [ Google Scholar ]
  • Zharkova N., Hewlett N., & Hardcastle W. J. (2011). Coarticulation as an indicator of speech motor control development in children: An ultrasound study . Motor Control , 15 ( 1 ), 118–140. [ PubMed ] [ Google Scholar ]
  • Zharkova N., Hewlett N., & Hardcastle W. J. (2012). An ultrasound study of lingual coarticulation in /sV/ syllables produced by adults and typically developing children . Journal of the International Phonetic Association , 42 ( 2 ), 193–208. [ Google Scholar ]


9.1 Evidence for Speech Production

Dinesh Ramoo

The evidence used by psycholinguists to understand speech production is varied and interesting. It includes speech errors, reaction-time experiments, neuroimaging, computational modelling, and the analysis of patients with language disorders. Until recently, the most prominent evidence for understanding how we speak came from speech errors: spontaneous mistakes we sometimes make in casual speech. Ordinary speech is far from perfect, and we often notice ourselves slipping up. These slips of the tongue can be transcribed and analyzed for broad patterns. The most common method is to collect a large corpus of speech errors by recording all the errors one comes across in daily life.

Perhaps the most famous example of this type of analysis is what are termed ‘Freudian slips.’ Freud (1901/1975) proposed that slips of the tongue were a way to understand repressed thoughts. According to his theories of the subconscious, certain thoughts may be too uncomfortable for the conscious mind to process and can be repressed; sometimes, however, these unconscious thoughts surface in dreams and slips of the tongue. Even before Freud, Meringer and Mayer (1895) analyzed slips of the tongue (although not in terms of psychoanalysis).

Speech errors can be categorized into a number of subsets in terms of the linguistic units or mechanisms involved. The linguistic units involved may be phonemes, syllables, morphemes, words, or phrases. The mechanisms can involve the deletion, substitution, insertion, or blending of these units. Fromkin (1971; 1973) argued that the fact that these errors involve definable linguistic units establishes their mental existence at some level in speech production. We will consider these in more detail in discussing the various stages of speech production.

Speech Error Types and Examples

Error Type      Error           Target
Anticipation    leading list    reading list
Perseveration   black bloxes    black boxes
Exchange        rat pack        pack rat
Substitution    spencil         stencil
Deletion        sippery         slippery
Insertion       sekool          school
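To see how corpus analysis of such slips proceeds in practice, here is a minimal sketch in Python (hypothetical, not from the chapter) that tallies a small, hand-labeled set of error/target pairs by mechanism. The entries follow the table above, which was itself reconstructed from a garbled source, so the exact example spellings are illustrative.

```python
from collections import Counter

# Hypothetical mini-corpus: (error, target, type) triples transcribed from
# casual speech and labeled by hand, following the table above.
corpus = [
    ("leading list", "reading list", "anticipation"),
    ("black bloxes", "black boxes",  "perseveration"),
    ("rat pack",     "pack rat",     "exchange"),
    ("spencil",      "stencil",      "substitution"),
    ("sippery",      "slippery",     "deletion"),
    ("sekool",       "school",       "insertion"),
]

# Tally the broad pattern of mechanisms, as a corpus analysis would.
counts = Counter(label for _, _, label in corpus)
for label, n in counts.most_common():
    print(f"{label:>13}: {n}")

# Group errors by mechanism for later inspection of finer patterns
# (e.g., which phonemes are most often anticipated).
by_type: dict[str, list[tuple[str, str]]] = {}
for error, target, label in corpus:
    by_type.setdefault(label, []).append((error, target))
print(by_type["anticipation"])  # [('leading list', 'reading list')]
```

With a corpus of thousands of such triples, the same tallying reveals the broad regularities (e.g., that exchanges respect syllable position) that theories of speech production must explain.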

speech error: An error in the production of speech.

Freudian slip: An unintentional speech error hypothesized by Sigmund Freud as indicating subconscious feelings.

9.1 Evidence for Speech Production Copyright © 2021 by Dinesh Ramoo is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Exploring the Wonders of Speech: What is Speech and How Does it Work?

Speech is an intricate and remarkable aspect of human communication. It’s the primary means by which we express our thoughts, share our emotions, and connect with others. But what exactly is speech, and how does it work? In this blog, we’ll delve into the fascinating world of speech, exploring its components, functions, and the science behind this remarkable ability.

Understanding Speech: The Basics

Speech is the vocalized form of human communication. It involves the production and articulation of sounds to convey meaning and thoughts. This intricate process combines physiological, cognitive, and linguistic elements to create a rich tapestry of communication.

Components of Speech

  • Phonemes: Phonemes are the smallest distinct units of sound in a language. Different languages have different sets of phonemes, and combining these phonemes in various ways forms words and sentences (see the short sketch after this list).
  • Articulation: Articulation refers to the movement of the vocal organs (such as the lips, tongue, and vocal cords) to produce specific sounds. The precise coordination of these movements allows us to create a diverse range of sounds.
  • Prosody: Prosody encompasses the rhythm, intonation, and stress patterns in speech. It adds emotional nuances and conveys intentions, like whether a statement is a question or a declaration.
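As a concrete illustration of the first point, the toy sketch below (hypothetical; the segment labels are informal stand-ins, not IPA) shows one small sound inventory recombining into several different English words.

```python
# Toy example: a small phoneme inventory recombines into different words.
# Labels are informal stand-ins for phonemes ("ae" is the vowel in "cat").
inventory = {"p", "t", "k", "ae", "s"}

# Each word is just an ordered sequence drawn from the same inventory.
words = {
    "pat":   ["p", "ae", "t"],
    "tap":   ["t", "ae", "p"],
    "apt":   ["ae", "p", "t"],
    "cat":   ["k", "ae", "t"],
    "stack": ["s", "t", "ae", "k"],
}

for word, segments in words.items():
    assert set(segments) <= inventory  # every word uses only this inventory
    print(f"{word}: {'-'.join(segments)}")
```

The same handful of sounds, reordered, yields distinct meanings, which is why a few dozen phonemes are enough to build an entire vocabulary.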

The Science Behind Speech Production

Speech production involves a complex interplay of the brain, vocal apparatus, and linguistic knowledge. Here’s a simplified breakdown of the process:

  • Brain Activation: The brain’s language centers, including Broca’s area and Wernicke’s area, play crucial roles in speech production and comprehension. These areas coordinate the planning and execution of speech.
  • Speech Planning: The brain formulates the intended message into linguistic units (words and phrases). It then sends signals to the motor cortex, which controls the movements of the vocal organs.
  • Vocal Cord Vibrations: As air from the lungs passes through the vocal cords, they vibrate. This vibration generates sound waves, which are then shaped into specific sounds by manipulating the vocal organs.
  • Articulation: The tongue, lips, teeth, and other vocal organs work together to modify the airflow and shape the sound waves. This process results in the production of different phonemes.
  • Auditory Feedback: The brain constantly monitors the sounds being produced and compares them to the intended speech. This feedback loop allows for real-time adjustments to pronunciation (a toy version of this loop is sketched after this list).
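The steps above can be read as a closed control loop: produce, hear, compare, adjust. Below is a deliberately simplified sketch of that idea (an illustration of the post's framing, not a physiological model); the target value, starting value, and gain are all invented for the example.

```python
# Toy closed-loop sketch of auditory feedback: production drifts toward the
# intended target because perceived output is compared against the intention.
target_pitch_hz = 120.0   # the intended fundamental frequency
produced_hz = 100.0       # initial production, slightly off target
gain = 0.5                # how strongly each comparison corrects production

for step in range(6):
    heard = produced_hz                 # auditory feedback of one's own voice
    error = target_pitch_hz - heard     # intention vs. perception
    produced_hz += gain * error         # motor adjustment for the next moment
    print(f"step {step}: {produced_hz:.1f} Hz (error was {error:+.1f} Hz)")
```

Each pass halves the remaining error, so production converges on the intended pitch; real speakers show the same corrective behavior when their auditory feedback is artificially perturbed.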

Functions of Speech

  • Communication: The primary function of speech is to convey information, thoughts, feelings, and intentions to others.
  • Social Interaction: Speech plays a pivotal role in social interactions, enabling connections, friendships, and cooperation among individuals.
  • Cultural Transmission: Through speech, cultures pass down traditions, stories, and knowledge from one generation to the next.
  • Learning and Education: Speech is essential for learning languages, acquiring new information, and participating in educational activities.

Speech is a marvel of human evolution, blending cognitive abilities, linguistic knowledge, and intricate motor coordination. It allows us to connect, express, and understand each other in ways that written language cannot replicate. Understanding the components and science behind speech sheds light on the complexity of this everyday phenomenon and deepens our appreciation for the power of communication.


Define the terms: speech, language, communication

Qualification: NCFE CACHE Level 3 Diploma for the Early Years Educator
Unit: Unit 3.10: Develop the speech, language and communication of children
Learning outcome: Understand theory and current frameworks which underpin children’s speech, language and communication development.
Assessment criteria: Define the terms: speech, language, communication

Speech, language, and communication are three core components of how we use English, and the terms are easily confused, so let’s break down each one individually to get a better sense of its meaning.

What is Speech?

Speech refers to the physical act of producing sound. Speech production involves the coordination of several muscles in the mouth and throat that work together to produce sound. This sound can then be used to communicate with others. Speech production also involves the ability to hear oneself speak, called auditory feedback. People who have difficulty producing speech may seek help from a speech-language pathologist or speech therapist.

What is Language?

Language refers to the rules and conventions used when speaking or writing in any given language. It involves grammar, vocabulary, syntax, and other linguistic features that are specific to a certain language, such as English or Spanish. Language proficiency requires an understanding of how words combine into sentences and how those sentences convey meaning; without this knowledge, two people may not be able to understand each other’s messages.

What is Communication?

Communication is the exchange of information between two or more people or groups using spoken words, gestures, signs, symbols, or written messages. It involves both verbal and nonverbal cues (such as facial expressions) that give clues about what someone is trying to say or express through their body language. The goal of communication is typically for one person or group to share their ideas with another so that both gain a better understanding of each other’s point of view on a particular subject. Effective communication requires understanding and respect for each other’s beliefs and values, as well as an awareness of potential cultural differences that could affect the conversation’s outcome if not taken into account.

In summary, speech refers to the physical act of producing sound, which can then be used for communication; language encompasses all aspects of speaking a particular language; and communication involves exchanging information between people using verbal and/or nonverbal cues so that both parties gain an understanding of a given subject. These three terms are essential components of successful interaction with others, whether in face-to-face conversations or digital exchanges online, making them invaluable tools for any student studying English as a second language.


