Arts West building
Arts West Building, University of Melbourne main campus

Helen in Melbourne

The Hub has had a busy month, with Helen coming to Melbourne (for the first time in two years!). We had some work to do together on a research project (which we will report on soon), and Helen gave two lectures about forensic linguistics in the first-year Language course. The first lecture focused on written language as evidence, while the second looked at spoken language as evidence. The students responded well to the material Helen covered – you can see her in action below.

Helen Fraser lecturing about speaker identification (April 14, 2022)

One of the really interesting things about the lectures is that live-captioning was turned on. Live-captioning is great for accessibility, providing (almost) real-time text on the screen as the speaker talks. Working on transcription in the Hub, and having also focused on automatic methods (see a previous blog post here), I found my interest piqued. One thing that was especially interesting is that hesitations such as um and ah are filtered out immediately and do not appear in the live-captioning at all (although sometimes it captioned ah as are, which are phonetically identical in Australian English).

The live-captioning, while exceptionally good, produced some interesting errors that I couldn’t help noticing. The system is self-correcting, so some amusing errors were auto-corrected as Helen was speaking, but a lot of them slipped through too. Here is a small list of examples, with what Helen said on the left and what the system captioned on the right.

What Helen said → What the system captioned
Disadvantage before the law → Disadvantage before the ball
Written language and spoken language → Recent language and spoken language
As someone who was present during the crime → As someone whose president during the crime
somebody who’s confident → somebody who’s conferences
gunshot residue → gunshot resident
Where had it been → Where headed being
I’d like to speak about written language as evidence → I’d like to speak about Britain Language as evidence
AIFL → AFL
A hold up at a petrol station → A whole lot better petrol station
phonetic science → pathetic science
you measure the pitch of the voices → you measure the ph of the voices
indistinct audio → indistinct audience
maybe that’s the experience of a lot of you → maybe that’s the experience of logic view
Debbie Loakes → Davey Lopes

Note that most of these errors make sense phonetically, but not semantically. For example, the errors usually contain sounds or sound patterns similar to the original utterance, but the system has misclassified them as something else (arguably, something more frequent or likely in day-to-day speech). This is similar to what I observed when trying to apply automatic transcription to indistinct covert recordings – see the hyperlink above, or read about it in much more detail in Loakes (2022). Taking an example from the table above, AIFL (the Aston Institute for Forensic Linguistics) was live-captioned as AFL (the Australian Football League) – the latter acronym is used very frequently in Australia and, while wrong, was probably a statistically more likely choice for the system to make.
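The way an acoustically plausible but semantically odd caption like AFL can win out can be sketched with a toy rescoring model. This is purely illustrative – it is not how the captioning system actually works – and both the acoustic_similarity stand-in (letter-sequence overlap instead of real phonetic distance) and the frequency values are invented for the sketch:

```python
from difflib import SequenceMatcher

# Hypothetical relative frequencies: AFL is heard constantly in
# Australian speech, AIFL almost never (values invented for the sketch).
frequency = {"AFL": 0.9, "AIFL": 0.01}

def acoustic_similarity(heard: str, candidate: str) -> float:
    # Crude stand-in for phonetic similarity: letter-sequence overlap.
    return SequenceMatcher(None, heard, candidate).ratio()

def best_caption(heard: str, candidates: list[str]) -> str:
    # Combined score: a candidate that both sounds similar AND is
    # frequent in everyday language wins.
    return max(candidates, key=lambda c: acoustic_similarity(heard, c) * frequency[c])

print(best_caption("AIFL", ["AIFL", "AFL"]))  # prints "AFL"
```

Even though AIFL matches the audio perfectly, its tiny frequency weight lets the near-match AFL overtake it – the same trade-off that turns phonetic science into pathetic science.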

Another interesting thing to note is that Helen must use a lot of “fricated /t/” sounds – this is when /t/ sounds a bit like /s/ (a normal phonetic feature for some speakers of Australian English). This really tricked the system, such as when written language and spoken language was live-captioned as recent language and spoken language (there were other examples too, not provided here). Finally, I also particularly liked the error made by the system when phonetic science was live-captioned as pathetic science and of course my “new name” Davey.

Of course we can make light of these mistakes as they occur in a relatively low-stakes context, although they are no doubt frustrating for students using the live-captions to help them. However, we know that errors in converting spoken language to written language can have a “dark side”, so it is always important to keep weighing the trade-off between accuracy and efficiency, and to keep assessing the usefulness of automatic systems in every context in which they are applied.

During Helen’s visit, we also had a little down time. We went to dinner one night at the 50-year-old Shakahari restaurant, joined by Ester Leung and Sunyoung Oh from the Asia Institute. We had gotten in touch with them because they presented on a fantastic podcast featured on Ear to Asia called Beyond Squid Game: Translating Asian film and TV for a hungry global market. We found some very interesting parallels with their discussion of automatic translation in particular. Having said that, the podcast is just generally very engaging for people who are interested in translation and in linguistics (note that Sunyoung has a PhD in linguistics, and Ester has also written about forensic linguistics in Hong Kong!). One thing I learnt from the podcast is that people crowd-source translations for popular Korean television shows – Ester and Sunyoung talk about how such translations contain errors, but because it is a low-stakes activity (i.e. for entertainment purposes) it is not too problematic. I also learnt about the idea of “English templating”, in which a translation is made into English and then translated into various other languages – this happens in the case of Korean television shows and apparently many other contexts too. Tune in to their podcast to learn more.

Image of the outside of the Shakahari restaurant – a very Melbourne view with various street artistry and graffiti
Aside from working hard on our research project, talking to students and (in Helen’s case) meeting with some potential collaborators who reside in Melbourne, we also made some plans going forward. We will be talking about those things soon in some upcoming blog posts.

References

Burridge, K. (2017). The dark side of mondegreens: How a simple mishearing can lead to wrongful conviction. The Conversation.

Loakes, D. (2022). Does Automatic Speech Recognition (ASR) have a role in the transcription of indistinct covert recordings for forensic purposes? Frontiers in Communication, Vol. 7, Article ID 803452.