
Unit 9: Prosody

The contribution of the voice to perceived speaker meaning has long been recognized as very important. In the nineteenth and early twentieth centuries many books were written about how to speak well (‘elocution’), and these included not only instruction in ‘correct’ pronunciation, but also guidance on how to speak effectively. Speaking was seen as an art form, and all those who spoke publicly, including politicians, ministers of religion and actors, were considered to need training in how to do it well. Such skills were, of course, the main focus of the much earlier study of rhetoric, which was concerned both with what was said and how it was said. Elocution handbooks on the other hand were concerned solely with the voice, and much of what they contain in relation to speaking effectively involves what we now refer to as prosody. This, as you have read in Unit A9, has a number of component elements – tempo, voice quality, loudness and pitch – and the most important of these was considered to be pitch or intonation, referred to then as ‘modulation’. An early handbook of the ‘elocutionary art’ (Brewer 1912) claimed that it was only through modulation that it was possible to establish ‘a sympathy between the speaker and his audience’ (1912: 83). The so-called ‘attitudinal function’ of intonation has continued to be seen as primary, but exactly how we convey an attitude in our voices has so far been difficult to determine.

In Unit B9.4 you can read an extract from the work of Carlos Gussenhoven (2004), who has suggested that some effects originate in animal behaviour. His theory of ‘Biological Codes’ builds on earlier work by John Ohala (e.g. REF), who coined the notion of the ‘Frequency Code’ relating specifically to Fundamental Frequency (pitch).1

Recent work at the interface between prosody and pragmatics (e.g. Barth-Weingarten et al. 2009) suggests that meaning can be conveyed both by paralinguistic features of the voice, for example, changes in pitch range over longer stretches of speech, and by linguistic choices, for example, choice of a rising or falling nucleus, which can trigger prosodic implicatures if there is a mismatch between expected and unexpected usage.

In this section we suggest two different ways to follow up what you have read about prosody. The first deals with the elusive attitudinal function of intonation, so often linked with emotion; this approach assumes that such meanings are paralinguistic effects. The second is an examination of a recent innovation in English intonation – ‘Uptalk’ – which seems to reverse the expected phonological choice of nuclear tone associated with statements.

Some sound files are included which illustrate prosodic patterns mentioned in the book.

  • 9.1. Emotion and attitude in the voice.
  • 9.2. Speaker meaning and linguistic choices: the case of ‘Uptalk’.
  • 9.3. Audio Excerpts from the book


  • Barth-Weingarten, D., N. Dehé and A. Wichmann (eds.) (2009) Where Prosody Meets Pragmatics. Bingley: Emerald
  • Brewer, R.F. (1912) ‘Speech’ in R.D. Blackman (ed.) Voice Speech and Gesture: A practical handbook to the elocutionary art. Edinburgh: John Grant
  • Gussenhoven, C. (2004) The phonology of tone and intonation. Cambridge: Cambridge University Press.

1 Pitch is the term for what we hear, while the fundamental frequency (F0) is the measurable acoustic correlate. It is measured in Hertz.

9.1. Emotion and attitude in the voice

Intonation is known to indicate the speaker’s attitudes and emotions, and many studies, both descriptive and experimental, have been carried out to find out exactly how this is signalled in the voice. It has, however, proved difficult to identify reliable characteristics, and listening tests show that some emotions are easily confused with others unless there is additional contextual information. One of the greatest problems has been the failure to distinguish adequately between emotion on the one hand – how the speaker feels – and attitude on the other hand – the speaker’s stance or behaviour towards an interlocutor. The simplest way to illustrate this is by comparing what is meant by you sound sad with he sounded so patronizing: a person can feel sad on their own, whether they speak or not, while it is not possible to be patronizing on your own. The first is an emotion and the second is an attitude.

The effects of emotion on the voice have been of great interest to those working in speech technology, especially automatic speech recognition. But to get a computer to recognize emotions in a voice it is first necessary for humans to analyse how they do it themselves. The use of emotional labels has been unreliable: for example, what one listener hears as anger another listener may hear as fear. Cowie et al. (2000), referred to in Unit B9.1, have developed a way of tracking perceived emotion in speech that avoids the use of labels and requires listeners to place what they hear in an emotional ‘space’ that has two intersecting dimensions – the active–passive dimension and the positive–negative dimension. According to this, for example, anger would be in the active–negative quarter, while sadness would be in the passive–negative quarter.
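The two-dimensional space can be pictured as a simple coordinate scheme. The sketch below is only an illustration of the idea, not the Feeltrace tool itself: the `quadrant` function and the example coordinates are our own assumptions, invented for demonstration.

```python
def quadrant(activation, valence):
    """Place a perceived emotion in the two-dimensional space:
    activation runs from passive (negative values) to active (positive values),
    valence from negative (negative values) to positive (positive values)."""
    a = "active" if activation >= 0 else "passive"
    v = "positive" if valence >= 0 else "negative"
    return f"{a}-{v}"

# Illustrative coordinates only, not measured values:
print(quadrant(0.7, -0.6))   # anger would fall here: active-negative
print(quadrant(-0.5, -0.4))  # sadness would fall here: passive-negative
```

The point of the scheme is that listeners never have to choose a label such as ‘anger’ or ‘fear’; they only place what they hear somewhere in the space, and the quadrant emerges from the two coordinates.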


Such studies, however, assume that the cues to emotion or attitude are in the utterance itself – in other words, that there is something about how a word, phrase or utterance is said that carries the information hearers use. Other research suggests that this is not necessarily the case: sequential relationships are also important in conveying speaker meaning. You can read about this in relation to conversation analysis in Unit A6.

In one interesting study, Cauldwell (2000) edited some recordings of conversation that he had collected. He took two short utterances and played them to listeners, first in isolation and then in context. The impressions reported by the listeners were very different in each case: in isolation most of them thought the utterances sounded angry, but in context only 10 percent of the listeners heard any anger. This suggests that emotions and attitudes that we ‘hear’ in people’s voices are not necessarily in the utterance itself but in the conjunction between the sound of the voice and the sequential context, or even in the words themselves.


You can carry out your own study if you have access to a simple sound editor.

  • Record someone reading the same sentence in a happy, angry, sad, enthusiastic way (choose your own adjectives), and then play them to listeners.
  • You can
    • ask listeners to match a given list of attitudes/emotions with the recordings (perhaps with a few extra ones as distractors), or
    • ask listeners for their own descriptions: e.g. How does this person sound? Or
    • ask listeners to place what they hear in a circle like that used by Cowie et al. (described above).
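If you do run a matching test, it helps to randomize the order in which listeners hear the recordings and to tally intended against perceived labels afterwards, so that confusions (for example, anger heard as fear) show up clearly. A minimal sketch in Python – the file names and labels are invented for illustration:

```python
import random

def presentation_order(stimuli, seed=None):
    """Shuffle the stimulus recordings so that listeners cannot
    predict which emotion comes next."""
    rng = random.Random(seed)
    order = list(stimuli)
    rng.shuffle(order)
    return order

def confusion_counts(responses):
    """Tally (intended, perceived) label pairs, e.g. how often
    recordings meant to sound angry were heard as fearful."""
    counts = {}
    for intended, perceived in responses:
        pair = (intended, perceived)
        counts[pair] = counts.get(pair, 0) + 1
    return counts

# Invented example data:
order = presentation_order(["happy.wav", "angry.wav", "sad.wav"], seed=42)
tally = confusion_counts([("angry", "angry"), ("angry", "fear"), ("sad", "sad")])
```

Fixing the `seed` gives every listener in a pilot run the same order; omitting it gives each listener a fresh random order, which is usually preferable for the real study.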

Remember to give some thought to your methodology here.

  • How easy would it be to record people who were really angry, or sad, or happy?
  • Might there be ethical problems involved?
  • Are you sure that the attitude or emotion is not obvious from the words themselves? I am absolutely furious is unlikely to sound happy; We had a wonderful time is unlikely to sound angry. On the other hand, How did it begin? could be said in a number of different contexts with different underlying emotions, and might be a good sentence to try.

When you have designed and carried out your study, either with sentences devised and read for the purpose or using naturally occurring utterances, note any difficulties listeners may have in identifying the intended, or inferred, attitude. Why might this be?


  • Cauldwell, R.T. (2000) ‘Where did the anger go? The role of context in interpreting emotion in speech’ in Proceedings of the ISCA workshop on Speech and Emotion, Belfast, pp127–31
  • Cowie, R., E. Douglas-Cowie, S. Savvidou, E. McMahon, M. Sawey and M. Schröder (2000) ‘“Feeltrace”: an instrument for recording perceived emotion in real time’, in R. Cowie, E. Douglas-Cowie and M. Schröder (eds) Speech and emotion: Proceedings of the ISCA workshop, Belfast, NI: Textflow, pp19–24
  • Wichmann, A. (2000) Intonation in Text and Discourse. London: Longman (especially Chapter 6)
  • Wichmann, A. and Cauldwell, R.T. (2003) ‘Wh-Questions and attitude: The effect of context’ in A. Wilson, P. Rayson and T. McEnery (eds.) Corpus Linguistics by the Lune. Frankfurt: Peter Lang, pp291–305

9.2. Speaker meaning and linguistic choices: the case of ‘Uptalk’

As you can read in Unit A9.3 and Unit B9.3, some intonational meaning is conveyed not by paralinguistic variation in the voice but by means of simple linguistic choices, such as the choice of a fall or a rise. A fundamental distinction in English, in the standard UK variety at least, is that between a question and a statement. Questions typically end in a rising nucleus, while statements typically end in a fall. As in:

A: Are you coming to the party to/night?
B: I intend to but I might be a bit \late.1

Of course, there may be many reasons why, in a given context, these typical patterns do not occur, but there has been one apparent innovation in English that has attracted considerable attention. According to Wells (2006: 37) ‘[s]ince about 1980 a new use of a rising tone on statements has started to be heard in English’. As the examples below show, the rising tone is used where conventionally a falling tone would be used, and tends to make statements sound more like questions. It is referred to sometimes as the ‘high rising terminal’ but is more generally known as ‘Uptalk’. This way of speaking is typically only used by younger speakers, and older speakers tend to be very annoyed by it. It has also been associated with female speakers, but in our view there is little evidence to support this.

Here are some of the examples quoted by Wells (with adapted intonation transcription):

It’d be safer if you stayed with /friends for a couple of days.
We’re working people but our pay doesn’t re/flect that.
Where are you working? I’m in an office in Princess /Hall. (Wells 2006: 37)

What appears to be happening is that speakers are superimposing on their declarative utterances a rising intonation contour that functions to elicit some kind of backchannel response from the interlocutors. It is not changing the statement to a question, but giving the statement a dual function: (1) this is my information, (2) are you listening?

In the context of conversation there is rarely any ambiguity involved – it is clear that the utterance is intended as a statement and not as a question – but those who do not use it tend to dislike it. It can, however, be a genuine source of misunderstanding. One of the authors of this volume observed a young student teacher in a class of undergraduates. Unlike the regular, older teachers, the (male) student teacher was a very marked user of Uptalk, and throughout the lesson students repeatedly prepared to answer what they thought were questions – by intake of breath, change of posture, raising the hand – only to discover that the teacher was continuing to speak and the utterance had simply been a statement with a rising intonation contour. This shows that listeners’ expectations may be guided by the situational context – in an interactive teaching session students expect to be asked questions, and while they frequently used Uptalk themselves, they did not associate it with their teachers, who were mostly older and therefore did not use it.


  • Listen to the recordings of the examples from Wells. They have been spoken twice, first with a falling nucleus, which is the more expected choice, and secondly with a rising nucleus, making them sound more like questions.
  • Can you find examples of this usage, or references to it?

Older people are often annoyed by the way in which young people use language. Such irritation is often a sign that the language is changing permanently, but Uptalk is sometimes used and then dropped again by the same speakers. It may also be the case that speakers adapt their usage in different contexts.

  • What should learners be encouraged to do?


  • Cruttenden, A. (1997) Intonation. 2nd edn. Cambridge: Cambridge University Press (especially pp128–31)
  • Wells, J.C. (2006) English Intonation. Cambridge: Cambridge University Press (especially pp37–8)

1 Remember that the forward slash / indicates a rising tone and the backward slash \ indicates a falling tone.

9.3. Audio Excerpts from the Textbook

This book has a strong focus on spoken English, and contains a number of examples that illustrate how intonation can contribute to the meaning of an utterance. In the printed form it is necessary to use a set of symbols to indicate the different pitch movements that contribute to these different meanings. The most common contours are falls, rises and fall-rises, which are transcribed with the symbols \, / and \/ respectively. If the word is polysyllabic, the symbol is always placed immediately before the accented syllable.

For example:

\yes; \yesterday; abso\lutely; /no; /maybe; to/day; \/so; \/actually; to\/day

We have recorded some of the examples in the book so that you can hear how they might sound. These recordings are slow and careful, but of course in fast, natural speech these pitch movements are harder to identify.
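The placement rule – tone symbol immediately before the accented syllable – can be checked mechanically. The short sketch below is our own illustration, not part of the book’s materials; it reads a marked word and reports the tone and the position of the accent:

```python
# Tone symbols, checked longest first so that the fall-rise \/
# is not mistaken for a plain fall \ followed by a rise /.
TONES = [("\\/", "fall-rise"), ("\\", "fall"), ("/", "rise")]

def read_tone(marked):
    """Return (tone name, word without the symbol, accent position).
    For example, r'abso\lutely' -> ('fall', 'absolutely', 4):
    a fall whose accented syllable begins at character 4, 'lu'."""
    for symbol, name in TONES:
        i = marked.find(symbol)
        if i != -1:
            return name, marked[:i] + marked[i + len(symbol):], i
    return None, marked, -1

print(read_tone(r"abso\lutely"))  # ('fall', 'absolutely', 4)
print(read_tone(r"to\/day"))      # ('fall-rise', 'today', 2)
```

Note that the ordering in `TONES` matters: in `to\/day` the characters `\/` must be read as one fall-rise symbol, not as a fall followed by a rise.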

A9.2 Prosody and Information Structure (pp98–9)

The sentences on page 99 indicate the unmarked position of the sentence accent (nucleus). The accented syllable is capitalized.

You will hear them read as they are here, with the prosodic prominence on the last lexical item. Note that this is not always the last word.

  1. I think I’ll go HOME
  2. Can you give me some MOney
  3. Don’t LOOK at it
  4. Can we HELP him
  5. It’s not a good time to SEE him about it
  6. I’ve cut out an interesting NEWSpaper article
  7. There’s nothing to LOOK at
  8. Pass the DICtionary | there’s something I want to look UP

Box A9.1 (p99)

You ARE the weakest link – goodbye

This is the expression used to dismiss a contestant from the popular British TV programme The Weakest Link. The Praat picture in your book shows the intonation contour as it was recorded by a male British speaker. Here the recording has been made by a female British speaker. It is spoken twice; the second time with a wider pitch range.

  • Note the marked prominence on ARE.
  • Note also the intonation on ‘Goodbye’. The first syllable is relatively high, and the second syllable is lower and rises slightly. This final rise sounds rather casual and dismissive to a British English ear.


The fall-rise contour (and also the rise) is normally used to indicate that the speaker has not finished. A simple intonation transcription marks this contour as \/, and the symbol is placed immediately before the accented syllable.

You can hear these fragments being read as they are printed (i.e. unfinished):

  • I went to \/town, and then...
  • \/Sometimes, we like to...

The non-finality of the fall-rise contour can also be exploited strategically to generate an implicature – to suggest a meaning without saying it. It usually implies an unspoken ‘but...’

In the following fictional conversation, the second speaker makes this explicit by asking ‘But?’ with a rising (here: questioning) tone.

A: What do you think of \Hubert?
B: He’s very me\/ticulous
A: /But?
B: Utterly \boring


A single word such as no can be spoken in different ways. Here you can hear the word no first with rising tone, then with a falling tone and finally with a fall-rise.

/no      \no     \/no


You will hear a series of utterances containing sorry, with different intonation patterns as indicated.


  1. I’m very \sorry
  2. I’m very \/sorry
  3. \/ Sorry
  4. /Sorry?


  1. I’m \sorry I’m /late
  2. |Sorry about /that
  3. \Sorry /Neil
  4. I’m \so /sorry
  5. I’m \so \/sorry
  6. I \am sorry

A: \Pete’s arrived
B: /Sorry?
A: I said \Pete’s arrived

B9.3 Wichmann 2004 (pp219–21)

Wichmann, A. (2004) ‘The intonation of please-requests: a corpus based study’, Journal of Pragmatics 36: 1543–5.

The examples illustrated in Fig. 1 on p220 show different ways of saying please-requests. The original utterances described in the article came from a spoken corpus (ICE–GB) but they are read here for illustration only.

  • Can I have a glass of water please
  • Could we have a second question please
  • Please go on


The examples on p263 are adapted from a corpus (ICE–GB). Here they are being read aloud for illustration only.

1. And thank you a\gain Rob | very very \much.

2. A: What’s your \name?
B: \Susan
A: \Susan | \thank you

3. A: How \are you?
B: Oh \fine /thank you

4. A: Here’s your pres/cription
B: Thanks very \much

5. I don’t want big lifesize photographs of relatives hanging on the wall /thank you.

6. A: Here’s your /ticket
B: /Thank you

7. A: Do you want a cuppa?
B: \No /thanks