Helen in Melbourne
The Hub has had a busy month, with Helen coming to Melbourne (for the first time in two years!). We had some work to do together on a research project (which we will report on soon), and Helen gave two lectures about forensic linguistics in the first-year Language course. The first lecture focused on written language as evidence, while the second looked at spoken language as evidence. The students responded well to the material Helen covered – you can see her in action below.
One particularly interesting aspect of the lectures was that live-captioning was turned on. Live-captioning is great for accessibility, providing (almost) real-time text on the screen as the speaker talks. Working on transcription in the Hub, and having also focused on automatic methods (see a previous blog post here), I found my interest piqued. One especially interesting observation was that hesitations such as um and ah are filtered out immediately and do not appear in the live-captioning at all (although the system sometimes captioned ah as are, which are phonetically identical in Australian English).
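As a rough illustration of what this kind of filtering might involve, here is a minimal Python sketch that strips hesitation tokens from a caption line. The word list and function name are my own invention for illustration; the actual captioning system's implementation is not public, and real systems typically handle disfluencies within their language models rather than with a fixed list.

```python
import re

# Hypothetical list of hesitation markers to suppress; a real captioning
# system would model disfluencies rather than use a fixed list like this.
HESITATIONS = {"um", "uh", "ah", "er", "erm"}

def strip_hesitations(caption: str) -> str:
    """Remove standalone hesitation tokens from a caption line."""
    kept = [tok for tok in caption.split()
            if re.sub(r"\W", "", tok).lower() not in HESITATIONS]
    return " ".join(kept)

print(strip_hesitations("So, um, written language can, ah, serve as evidence"))
# -> "So, written language can, serve as evidence"
```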
The live-captioning, while exceptionally good, produced some errors that I couldn’t help but be interested in. The system is self-correcting, so some amusing errors were auto-corrected as Helen was speaking, but a lot of them slipped through too. Here is a small list of examples, with what Helen said on the left and what the system captioned on the right.
| Helen | Live-caption |
| --- | --- |
| Disadvantage before the law | Disadvantage before the ball |
| Written language and spoken language | Recent language and spoken language |
| As someone who was present during the crime | As someone whose president during the crime |
| somebody who’s confident | somebody who’s conferences |
| gunshot residue | gunshot resident |
| Where had it been | Where headed being |
| I’d like to speak about written language as evidence | I’d like to speak about Britain Language as evidence |
| AIFL | AFL |
| A hold up at a petrol station | A whole lot better petrol station |
| phonetic science | pathetic science |
| you measure the pitch of the voices | you measure the ph of the voices |
| indistinct audio | indistinct audience |
| maybe that’s the experience of a lot of you | maybe that’s the experience of logic view |
| Debbie Loakes | Davey Lopes |
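A standard way to quantify how far captions stray from what was said is word error rate (WER): the word-level edit distance between the reference and the hypothesis, divided by the length of the reference. The sketch below is a minimal illustration applied to one pair from the table; it is not how the captioning system itself is evaluated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# One pair from the table: 3 substitutions and 1 deletion over 7 words.
print(word_error_rate("a hold up at a petrol station",
                      "a whole lot better petrol station"))  # ≈ 0.57
```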
Note that most of these errors make sense phonetically, but not semantically. For example, the errors usually contain sounds or sound patterns similar to the original utterance, but the system has misclassified them as something else (arguably, something more frequent or likely in day-to-day speech). This is similar to what I observed when trying to apply automatic transcription to indistinct covert recordings – see the hyperlink above, or you can read about it in much more detail in Loakes (2022). Taking an example from the table above, AIFL (the Aston Institute for Forensic Linguistics) was live-captioned as AFL (the Australian Football League) – the latter acronym is used very frequently in Australia and, while wrong, was probably a statistically more likely choice for the system to make.
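To make the "statistically more likely" point concrete, here is a toy sketch of how a recogniser might weight phonetically confusable candidates by how frequent they are. The frequency counts and scores are invented for illustration; real ASR systems combine acoustic scores with full language models over word sequences, not single-word counts.

```python
# Invented corpus counts: "AFL" is vastly more common than "AIFL" in
# Australian English text, which is the point of the example.
CORPUS_COUNTS = {"AFL": 120_000, "AIFL": 40}

def pick_candidate(acoustic_scores: dict) -> str:
    """Pick the candidate with the best combination of acoustic match
    and frequency prior, (very loosely) mirroring how a recogniser
    trades off how something sounds against how likely it is."""
    total = sum(CORPUS_COUNTS.values())
    def combined(word):
        prior = CORPUS_COUNTS.get(word, 1) / total
        return acoustic_scores[word] * prior
    return max(acoustic_scores, key=combined)

# Both candidates match the audio almost equally well, but the prior
# drags the system towards the frequent (and here, wrong) acronym.
print(pick_candidate({"AIFL": 0.9, "AFL": 0.8}))  # -> AFL
```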
Another interesting thing to note is that Helen must use a lot of “fricated /t/” sounds – this is when /t/ sounds a bit like /s/, a normal phonetic feature for some speakers of Australian English. This really tricked the system, such as when written language and spoken language was live-captioned as recent language and spoken language (there were other examples too, not given here). Finally, I also particularly liked the error the system made when phonetic science was live-captioned as pathetic science, and of course my “new name”, Davey Lopes.
Of course we can make light of these mistakes as they occur in a relatively low-stakes context, although they are no doubt frustrating for students using the live-captions to help them. However, we know that errors in converting spoken language to written language can have a “dark side” (Burridge 2017), so it is always important to keep weighing the trade-off between accuracy and efficiency, and to keep assessing the usefulness of automatic systems in every context in which they are applied.
During Helen’s visit, we also had a little down time. We went to dinner at the 50-year-old Shakahari restaurant one night, joined by Ester Leung and Sunyoung Oh from the Asia Institute. We had got in touch with them because they presented on a fantastic podcast featured on Ear to Asia called Beyond Squid Game: Translating Asian film and TV for a hungry global market. We found some very interesting parallels with their discussion of automatic translation in particular. Having said that, the podcast is just generally very engaging for people who are interested in translation and in linguistics (note that Sunyoung has a PhD in linguistics, and Ester has also written about forensic linguistics in Hong Kong!). One thing I learnt from the podcast is that people crowd-source translations for popular Korean television shows – Ester and Sunyoung talk about how such translations contain errors, but because it is a low-stakes activity (i.e. for entertainment purposes) this is not too problematic. I also learnt about the idea of “English templating”, in which a translation is made into English first and then from English into various other languages – this happens with Korean television shows and apparently in many other contexts too. Tune in to their podcast to learn more.
References
Burridge, K. (2017). The dark side of mondegreens: How a simple mishearing can lead to wrongful conviction. The Conversation.
Loakes, D. (2022). Does Automatic Speech Recognition (ASR) have a role in the transcription of indistinct covert recordings for forensic purposes? Frontiers in Communication, Vol. 7, Article ID 803452.