Guest blogger: “The reliability of speechreading as forensic evidence” by Catherine Hill

Earlier this year, Australian linguistics professor Helen Fraser received a phone call asking whether she could recommend a lipreader to transcribe CCTV footage for use as evidence in a court of law. Fraser told the caller she was unable to do so but, finding it an interesting question, later asked me to investigate the standing of lipreading as forensic evidence for this blog.

This blog post summarises my findings on the reliability of speechreading. I begin with linguistic accuracy, then explore individual aptitude, and finish with how lipreading is used as forensic evidence.

About the author – Cat Hill

I am a lover of languages with a passion for adventure. I hold a Bachelor of Arts from The University of Melbourne with majors in Linguistics and Spanish. I often spend time researching word etymologies and forming connections between languages. When I can, I enjoy mountaineering and going on road trips.

Speechreading

Speechreading is a form of visual speech perception whereby lipreading is combined with observation of facial expressions and gestures, using context to assist comprehension (Anderson, 2016). The term ‘speechreading’ is more appropriate than ‘lipreading’ because speech is multimodal, encompassing both auditory and visual information.

The importance of speechreading in communication

Visual information greatly influences auditory perception, and visual context can make speech easier to understand. The McGurk Effect is a perceptual phenomenon in which audio of one syllable is paired with video of a speaker articulating a different syllable, creating the illusion of hearing what is seen in the video rather than what is actually in the audio (BBC, 2010).

Fraser and her colleague Debbie Loakes have explained that the interpretation of covert recordings in court is susceptible to similar inaccuracies (Fraser & Loakes, 2020, p. 406). The audio evidence supplied is often of low quality and is therefore accompanied by a transcript. Despite the existing safeguards, transcripts can be erroneous and could drastically affect the jury’s understanding (Fraser & Loakes, 2020, pp. 406-7).

Speechreading without sound

Visual speech perception is much harder than auditory speech perception.

The term ‘viseme’ was coined in 1968, from ‘visual speech phoneme’, to refer to visually distinct units of speech (Fisher, 1968). There are fewer visemes than phonemes in English; only about a quarter of the 44 English phonemes are visually distinct (Files et al., 2015). Words that are indistinguishable to a lipreader include not only true homophones such as ‘bear’ and ‘bare’, but also words that differ only in sounds made at the back of the mouth using the larynx and velum, such as the /ŋ/ in ‘bring’, which are barely visible on the lips. Therefore, there are far more ‘homophones’ for lipreaders than for listeners.
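To make the viseme limitation concrete, here is a minimal sketch in Python (my own illustration, not drawn from the cited sources). The phoneme-to-viseme grouping below is simplified and hypothetical, but it shows how words that sound different can collapse into the same sequence of lip shapes.

```python
# Minimal sketch: a simplified, hypothetical phoneme-to-viseme mapping.
# Several phonemes share one viseme because they look alike on the lips.
PHONEME_TO_VISEME = {
    "p": "B", "b": "B", "m": "B",            # bilabials look identical
    "f": "F", "v": "F",                       # labiodentals look identical
    "t": "T", "d": "T", "n": "T", "s": "T",   # alveolars are hard to tell apart
    "k": "K", "g": "K", "ng": "K",            # back-of-mouth sounds, barely visible
    "ae": "A",                                # the vowel in 'bat'
}

def viseme_sequence(phonemes):
    """Map a word's phonemes to the visemes a lipreader would see."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

# 'pat', 'bat' and 'mat' differ in sound but share one viseme sequence,
# so from lip movements alone they are indistinguishable.
for word, phonemes in [("pat", ["p", "ae", "t"]),
                       ("bat", ["b", "ae", "t"]),
                       ("mat", ["m", "ae", "t"])]:
    print(word, viseme_sequence(phonemes))   # each prints ['B', 'A', 'T']
```

Each of the three words maps to the same viseme sequence, which is exactly the ambiguity a speech reader faces when voicing and back-of-mouth distinctions are invisible.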

Another limitation of speechreading is the visual quality of the available information: speech readers depend on adequate illumination, the speaker’s rate of movement, and the distance from the subject (Files et al., 2015).

Speechreading aptitude

Much of the literature surrounding speechreading suggests that Deaf individuals are more accurate at visual speech perception than hearing people (Bernstein et al., 2000, p. 233). A 2000 study found that average speechreading accuracy was 44% for Deaf participants but only 21% for hearing participants (Bernstein et al., 2000, p. 234).

However, the top score in both groups was 75%, which implies that the best Deaf speech readers are no better than the best hearing ones, suggesting that speechreading aptitude is, at least in part, innate (Bernstein et al., 2000, p. 233). Of the Deaf participants, those with ‘severe to profound hearing impairment’ scored higher, so while aptitude may be innate, the skill can perhaps still be improved through practice (Bernstein et al., 2000, p. 246).

For this blog post, I interviewed Patricia Thornton, former principal of the Victorian College for the Deaf (quoted here with permission). Thornton acknowledges the importance of speechreading for Deaf people’s comprehension of common words and phrases.

The teaching of speechreading does not feature prominently in modern Deaf education. This could be due to its unreliability, or to the fact that the skill is largely innate. Historically, oralism (Deaf education through spoken language) was the main form of education for Deaf students. Classroom use of sign language was prohibited at the 1880 International Congress on the Education of the Deaf (ICED) in Milan, a prohibition that lasted until the 2010 ICED in Vancouver, where it was officially retracted (World Federation of the Deaf, 2016). The recent cultural shift towards greater inclusion of people with different needs and abilities has meant Deaf people are given more opportunities to take part in spoken communication.

Thornton also attributes the decreased emphasis on speechreading in Australian Deaf education since the 1980s to the rise of AUSLAN (Australian Sign Language) and to vaccines that target diseases causing hearing loss (meningococcal and MMR).

Speechreading as forensic evidence

Speechreading as forensic evidence is accepted in principle in Australia, the UK, and the US. Forensic speechreading is typically performed from a CCTV recording and is harder than live speechreading as it lacks full contextual information.

Although there is no official test of speechreading ability, there are several self-employed professional speech readers around the world. A landmark 2004 case established speechreading as admissible evidence in the UK, when Deaf speech reader Jessica Rees gave expert evidence at Reading Crown Court (R v Luttrell & Ors, 2004, pp. 13-59). The judge was required to issue a special warning to the jury regarding the limitations of speechreading. Ultimately, the defendant was found guilty and sentenced to three years in prison (R v Luttrell & Ors, 2004, p. 2).

Rees was also the first expert speech reader to give evidence in Australia. In 2012, Rees compiled a nine-page transcript from CCTV footage of an evening on which a casino patron died following a conflict with security guards. With the aid of Rees’s evidence, one of the six guards was charged with manslaughter and the others faced assault charges (Russel, 2012).

How reliable is forensic speechreading?

This blog post has outlined the linguistic limitations of speechreading, the complexities of training and testing forensic speechreading ability, and how speechreading is used as forensic evidence.

Thornton’s main stance is that while Deaf speech readers can decipher common words, speechreading would be unreliable as forensic evidence.

In UK courts, a judge is required to issue the jury a warning concerning the inaccuracies of speechreading (R v Luttrell & Ors, 2004, p. 2).

Under the Australian Criminal Code Act 1995, guilt must be proved ‘beyond reasonable doubt’ (Presumption of Innocence). Yet, on average, speechreading is only around 30% accurate (Bernstein et al., 2000, p. 246).

While speechreading is a valid form of comprehension, it is less reliable than listening comprehension. Speechreading could be used as a tool to assist forensic investigations where appropriate; however, its use at trial should be regulated. An official forensic speechreading test needs to be developed to identify expert speech readers and ensure accurate speechreading.

Languages other than English?

It is notable that all the references about speechreading cited here refer to English. It would be interesting to learn about speechreading in languages other than English, if any readers have information on that topic.

Virtual speechreading

At a time when Automatic Speech Recognition (ASR) is pervasive in our society (Zajechowski, 2022), perhaps automated lipreading will become a valuable tool in forensics, although given the limitations of ASR when dealing with indistinct speech, explored in presentations by Debbie Loakes and Lauren Harrington (videos on this blog), we would need to be very careful to ensure automated lipreading was used responsibly.

References

Anderson, K. (2016). Speechreading. Supporting Success for Children with Hearing Loss.

BBC. (2010). Try this bizarre audio illusion! [Video]. YouTube.

Bernstein, L., Tucker, E., & Demorest, M. (2000). Speech perception without hearing. Perception & Psychophysics, 62(2).

Files, B., Tjan, B., Jiang, J., & Bernstein, L. (2015). Visual speech discrimination and identification of natural and synthetic consonant stimuli. Frontiers in Psychology, 6(878).

Fisher, C. (1968). Confusions among visually perceived consonants. Journal of Speech and Hearing Research, 11(4).

Fraser, H., & Loakes, D. (2020). Acoustic injustice: The experience of listening to indistinct covert recordings presented as evidence in court. Law Text Culture, 24.

Presumption of Innocence. Attorney-General’s Department.

R v Luttrell & Ors (2004). England and Wales Court of Appeal (Criminal Division).

Russel, M. (2012). Battle over lip-read expert in Crown ‘shut-down’ case. The Age.

World Federation of the Deaf. (2016). 21st International Congress on the Education of the Deaf (ICED) in July 2010 in Vancouver, Canada. https://wfdeaf.org/news/21st-international-congress-on-the-education-of-the-deaf-iced-in-july-2010-in-vancouver-canada/

Zajechowski, M. (2022). Automated Speech Recognition (ASR) Software – An Introduction. Usability Geek.

~~~

Header Image credit

Koorosh Orooj, CC BY-SA 4.0, via Wikimedia Commons