Language Matters: Speaking Styles Match the Audience

Siri's voice recognition is trained using informal speech.

Paul Warren is Professor of Linguistics at Victoria University of Wellington. Language Matters is a bimonthly column on language.

OPINION: A quick search of the internet turns up plenty of cute (and not so cute) photos of dog owners who look a lot like their dogs. Similar patterns of convergence show up in the way we speak.

That doesn’t mean we sound more like our dogs (although I’ll get to how we talk to pets soon), but over time we can sound more and more like the people we hang out with.

The opposite effect also exists – we show patterns of divergence, especially from people we might not want to be associated with.

We also adopt different speaking registers or styles depending on the general characteristics of whoever we perceive our audience to be.

A classic example of this is what linguists call child-directed speech. It goes by many other names, such as infant-directed speech and baby talk, although the latter can also refer to the speech of infants themselves.

In many earlier studies, we find the term motherese, as well as more inclusive terms like parentese and caregiver speech.

Baby talk can also refer to the child’s speech.

Child-directed speech has a number of key characteristics. These include simpler phrases and vocabulary, as well as special words such as doggie and onomatopoeic forms like choo-choo or bow-wow. There is a lot of repetition, and a distinctive way of speaking that uses more dramatic intonation patterns, with greater rises and falls and a generally higher pitch of voice.

Adults (and a child’s older siblings) adopt these ways of speaking without usually realizing it. Young children seem to find these types of speech more engaging and pay more attention to them, and features such as simple grammar and repetition provide good scaffolding for their learning.

Many of these characteristics are also found in what is sometimes called pet-directed speech, so perhaps we instinctively adopt a certain style of conversation with cuddly little creatures.

Paul Warren: “Along with particular ways of talking to children, it seems we also have particular ways of talking to our devices.”

It seems that we also have particular ways of talking to our devices. A recent study in the Journal of Phonetics examined how speakers sound when talking to voice-activated smart systems such as Apple’s Siri or Amazon’s Alexa, compared to how they sound when talking with other humans.

The study showed that, unlike child-directed speech, Siri-directed speech has a lower pitch and a smaller pitch range than adult-directed speech. This smaller pitch range perhaps reflects less emotional engagement with Siri than with a human. The pitch range increases over the course of a Siri interaction, possibly reflecting increasing engagement with the device.

In one particularly interesting part of the study, participants took part in a simulation where they believed they were interacting with either a native English-speaking adult or Siri. They were seated in front of a computer and asked to say a short sentence out loud, such as “The word is bone.” They then heard a human voice or a Siri voice saying “Is that the word?” as a word appeared on the computer screen. If the wrong word appeared (for example, augur rather than bone), then the participant had to repeat the sentence.

The researchers were interested in how participants would change the way they pronounced the sentence in order to correct the error, and whether that differed depending on whether they thought they were speaking with Siri or with another human. Most of the strategies were similar, including making an extra effort to speak more clearly.

However, Siri’s speech recognition is trained using informal speech, and exaggerated corrections make speech less intelligible to Siri.

The resulting “misunderstanding cycle” suggests that Siri training data should also include examples of speakers performing this type of correction.
