To transcribe something is to put it into a written or printed form.
Once we have transcribed spoken data, we have a transcription that we can use to analyse it.
A transcription (or transcript) is a written or printed version of something.
In this article, we’re going to look at why we transcribe spoken data, how we transcribe it, how the International Phonetic Alphabet is used in transcription, and how to cite speech transcriptions.
Why do we Transcribe Spoken Data?
Due to the nature of spoken language, once we’ve heard it, we generally can’t hear it again.
Spoken data is language data that shows how something was actually said. It differs from written language in that it usually contains informal features that aren’t present in writing, such as fillers, false starts and interruptions.
To collect spoken data that we can listen to again, we must record it. This can be an audio recording or an audio-visual (video) recording, which lets us listen to the spoken data as many times as we need.
Although audio recordings are important when analysing spoken data, they are not always the most useful way to store data: a recording is hard to analyse in detail, and it can be difficult to find a specific piece of data in it quickly.
We transcribe spoken data so that we have a written form of it. This makes it much easier to analyse what has been said and how. Looking at the content of the spoken data (such as topics, words and interruptions) can be useful in areas of linguistics like sociolinguistics where we may need to analyse and compare the language of different speakers.
Language can vary between speakers, and these differences can be related to social factors such as age, class, gender, occupation, ethnicity and region.
Another reason we transcribe spoken data is to look at a person’s accent and pronunciation features. This is done by transcribing the data using the International Phonetic Alphabet, which we’ll look at in more detail later. Doing this allows more detailed and precise speech analysis in fields such as phonetics and phonology.
Accent and pronunciation features are the aspects of spoken language that can differ between speakers. For example, the vowel in ‘bath’ is pronounced differently across British accents: a short /a/ is a feature of northern English accents, whereas many southern English accents use a long /ɑː/.
Fig. 1 - Transcribing data involves writing it out.
Transcription of data in research
Before transcribing, you first need to collect the data. This is most often done by recording spoken language as audio or as video. A video recording may be useful for looking at features such as NVC alongside a person’s speech.
NVC stands for non-verbal communication and is the name given to any sort of gesture, movement or facial expression used to communicate something. NVC is often used in conjunction with verbal communication (speech) but can also be used on its own.
When recording and transcribing data, two factors need to be considered: ethics and the observer’s paradox.
Ethics
In relation to ethics, we need to think about what the morally right practice is for us as researchers. As spoken language is produced by an individual and is unique to that individual, you need their permission to record them.
If you don’t ask permission before recording someone, it could be considered a breach of that person’s privacy. Every study that uses spoken data must first consider these ethical issues and make sure that permission has been obtained where it is needed.
The observer’s paradox
The observer’s paradox is the name given to the problem that arises when trying to record natural spoken language. The most natural speech occurs when speakers are completely at ease and talking casually among themselves.
When recording data, though, there is usually an observer (the person recording the data) or, at the very least, a recording device. Because of ethical considerations, the speakers will also know that they are being recorded. However hard people try to speak naturally, there is always an element of being on edge when you know you’re being recorded or listened to. This may cause speakers to alter how they speak, either consciously or subconsciously.
How to overcome the observer’s paradox
When collecting data, you can take steps to reduce the effect of the observer’s paradox. One option is to ask for permission to record someone’s speech in advance and then record them at a time when they’re not expecting it. With this method, you’ll have to let them listen to the recording before you use it as data, to make sure they’re happy for you to use it.
Another way to try and sidestep the observer’s paradox is to let people know that you are recording them and then lead the conversation through some casual topics before you get to the conversation you want to record.
By doing this, you’ll allow the speakers to get accustomed to being recorded and settle into speaking more naturally by the time it gets to the data you need. This will hopefully encourage more natural speech.
Transcribing Data
Before you start writing out your data into transcript form, you’ll need to write a sentence or two outlining some basic context. This will need to include:
Where and when the interaction is taking place
Who the speakers are
Any contextual information relevant to your study, for example, the gender of the speakers if you’re looking at language and gender
When writing out a transcript, you’ll first need to listen to your recording and write out what was said. It’s a good idea to listen to the recording a few times to make sure you write what you actually hear and not what you expect to hear.
It’s easy to mishear and automatically correct what you hear when you write it down. You’ve got to be careful not to do this when transcribing as you want a true representation of the spoken data.
If something is said that is unusual or of note (this will depend on what you’re looking for), it’s a good idea to annotate this on your transcript and to listen through again to see if it appears anywhere else as well.
Features of communication that can be shown in transcriptions:
| Feature | Definition | As it would be shown in a transcript |
| --- | --- | --- |
| False start | Where someone starts speaking, pauses, and starts again. | John: I don't think... I didn't really see him. |
| Micro-pause | A pause in speech that is less than a tenth of a second. | (.) |
| Pause | A pause in speech longer than a tenth of a second, with its length shown in seconds. | (0.6) |
| Interruption | Where one speaker interrupts another. Two slashes indicate the point at which the speaker interrupts. | John: I did see that the game // was on over the weekend. Peter: // The game was amazing! |
| Simultaneous speech | Where two speakers speak at the same time. Vertical lines are placed on either side of the overlapping speech. | John: Did you see the game? It was amazing, \| there was a goal right at the end of the second half! \| Peter: \| It was so close! I couldn't believe they got in there so quick with that goal. \| |
| Repetition | Where the same word or utterance is repeated. | John: I did see that. I did see that yeah. |
| Stutter | Where a speaker struggles to keep a flow in speech, often repeating the first sound of a word. | Tom: D d d did you see the g g game? |
| Filler | A small word or sound, such as "erm", "uh" or "like", inserted by a speaker between utterances. | John: I erm, did see uh, that it like, was really sudden. |
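If you time pauses with audio or transcription software, the pause rule in the table can be applied mechanically. The short Python sketch that follows is purely illustrative: the function name `pause_notation` is hypothetical, and it assumes the tenth-of-a-second threshold given in the table and rounding to one decimal place, as in the example (0.6). Transcription conventions vary between schemes, so check which one your course or study uses.

```python
def pause_notation(seconds: float) -> str:
    """Return transcript notation for a pause of the given length (hypothetical helper).

    Assumes the convention from the table: a pause shorter than a tenth of a
    second is a micro-pause, written (.); a longer pause is written as its
    length in seconds, rounded to one decimal place.
    """
    if seconds < 0:
        raise ValueError("a pause cannot have a negative length")
    if seconds < 0.1:
        return "(.)"
    # Format to one decimal place, then drop a trailing ".0" so that
    # 2.0 seconds is written (2) rather than (2.0).
    text = f"{seconds:.1f}".rstrip("0").rstrip(".")
    return f"({text})"


if __name__ == "__main__":
    for length in (0.05, 0.5, 0.6, 2.0):
        print(length, "->", pause_notation(length))
```

For example, `pause_notation(0.05)` gives (.), `pause_notation(0.6)` gives (0.6) and `pause_notation(2)` gives (2), matching the style of the pauses in the transcript example later in this article.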
Specific speech sounds, such as phonemes, can be noted using the International Phonetic Alphabet.
What is the International Phonetic Alphabet?
The International Phonetic Alphabet (IPA) was developed in the 19th century as an internationally recognised system of phonetic symbols. Each symbol corresponds to one specific speech sound, removing the confusion caused by having multiple sounds represented by the same letters.
In English, the letter ‘c’ can sound like /k/, as in ‘cat’, or like /s/, as in ‘centipede’. IPA symbols help us tell these sounds apart because each sound has its own symbol: ‘cat’ is transcribed /kæt/ and ‘centipede’ /sɛntɪpiːd/.
You can see all of the different symbols in the IPA chart (Fig. 2).
Fig. 2 - IPA Chart.
How to use the IPA when Transcribing Spoken Data
Using IPA when transcribing spoken data can make your data much more accurate and can be especially useful if you’re looking at accent features such as vowel pronunciation. In A-level English Language, you won’t be expected to transcribe whole extracts into IPA, but you will be expected to have a basic understanding of it.
Let's look at an example of how the IPA can be used to show pronunciation features.
A glottal stop is a sound made by briefly closing the glottis (the gap between the vocal folds), which stops the airflow. In certain languages and dialects, glottal stops often replace consonants, particularly /t/, in the middle or at the end of words. In the IPA, the glottal stop is represented by the symbol /ʔ/.
Let's look at the glottal stop that appears in the word ‘hat’ in certain dialects.
If the ‘t’ is pronounced, it would be written as /hat/.
If the ‘t’ isn’t pronounced and is replaced with a glottal stop, it would be written as /haʔ/.
When you write something using IPA, put slanted brackets (slashes) on either side of it to show that it is IPA. For example, /kat/ for ‘cat’, /waʊ/ for ‘wow’, and /beɪð/ for ‘bathe’. Slanted brackets are used for phonemic transcription (otherwise known as broad transcription), which is language-specific and records only enough detail to show how words differ from one another in a language. Square brackets [ ] are used for narrow transcription, which records as much detail about the sounds as possible.
The IPA chart also includes diacritics and suprasegmentals: small marks placed next to, under or above vowel and consonant symbols. Diacritics add finer detail about how a sound is produced, while suprasegmentals mark prosodic features of speech such as stress, length and intonation.
Prosodic features are the extra elements of speech sound, such as tone, intonation, rhythm, and stress.
Suprasegmentals and diacritics can be used to show stress, syllable divisions and the linking of speech, so that you can represent in writing exactly how something has been said. When you add diacritics and suprasegmentals to a transcription, use square brackets around the transcribed speech to show that it is a narrow transcription.
Transcript example
This transcript is an extract from a recorded conversation between two friends (Polly and Laura) who are planning a trip. You can spot some of the features from the table earlier.
1 Polly: Well I was thinking that we could all get the train together.
2 Laura: (0.5) Yeah… Yeah well I was going to say I could drive some of (.) four
3 of us.
4 Polly: Oh yeah (2) Well how about (.) | how about girls | in the car and boys
5 on the train.
6 Laura: | How about we |
7 Yeah that sounds okay (1) We’ll have to //
8 Polly: // I mean (.) we’ll have to see (.) Like we’ll have to ask the boys what
9 they think
10 Laura: Yeah yeah
What are we looking at in this example?
Line 1 is an example of an utterance without any notable speech features.
In line 2, we can see that Laura took a pause of half a second before she started speaking, and then took another micro-pause later on in her utterance.
In line 4, Polly pauses for two seconds, and then we see an example of simultaneous speech: Polly says "how about girls" on line 4 while Laura says "how about we" on line 6. Because the vertical lines surround only these two stretches of speech, they are the only parts spoken simultaneously.
In lines 7 and 8, we can see an interruption where the double slashes (//) appear. Here, Polly interrupts Laura and then carries on speaking.
An utterance is a spoken sound, word or sentence. ‘Utterance’ is often used in relation to transcription instead of ‘sentence.’
Citing speech transcriptions
When you first reference the transcript you’re talking about in your work, it’s usually good to cite the year and to give an overview of the general context, saying briefly who the speakers are and where the conversation is taking place (providing it’s relevant to what you’re discussing). From then on, it’s usually fine to reference a line number (as all transcripts should have numbered lines) and also state who is speaking to make it clear for your reader.
Quoting transcriptions
When quoting a short utterance or a word, simply put it in quote marks as you would when quoting a book.
In line 4, Polly pauses for 2 seconds, saying "Oh yeah (2) Well how about".
When you explain something with the help of the IPA, put the transcribed sounds in slanted brackets (or in square brackets if it is a narrow transcription).
When quoting multiple lines, do it as a separate section underneath your paragraph and then do your explanation underneath, making sure to still reference specific line numbers.
----- Paragraph explaining your point -----
“
Quoted lines from the transcript
”
----- Paragraph discussing the quoted text -----
Transcribing Spoken Data - Key Takeaways
A transcription is a written or printed version of something.
When recording data for transcription, we have to consider ethics and the observer’s paradox.
Transcripts can be used to show features of spoken language such as interruptions, pauses and simultaneous speech.
The International Phonetic Alphabet (IPA) can be used to represent specific sounds of speech.
When citing speech transcripts, you can either quote a short utterance or a longer extract.
References
- Fig. 2: IPA chart 2020 (https://commons.wikimedia.org/wiki/File:IPA_chart_2020.svg) by International Phonetic Association (https://www.internationalphoneticassociation.org/IPAcharts/IPA_chart_orig/IPA_charts_E.html) is licensed under CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/deed.en)
How do we ensure our content is accurate and trustworthy?
At StudySmarter, we have created a learning platform that serves millions of students. Meet the people who work hard to deliver fact-based content and make sure it is verified.
Content Creation Process:
Lily Hulatt is a Digital Content Specialist with over three years of experience in content strategy and curriculum design. She gained her PhD in English Literature from Durham University in 2022, taught in Durham University’s English Studies Department, and has contributed to a number of publications. Lily specialises in English Literature, English Language, History, and Philosophy.
Content Quality Monitored by:
Gabriel Freitas is an AI Engineer with solid experience in software development, machine learning algorithms and generative AI, including large language model (LLM) applications. A graduate in Electrical Engineering from the University of São Paulo, he is currently pursuing an MSc in Computer Engineering at the University of Campinas, specialising in machine learning. Gabriel has a strong background in software engineering and has worked on projects involving computer vision, embedded AI and LLM applications.