Researchers develop AI software that lip-reads better than humans
Scientists at Oxford University claim to have developed a computer program that can lip-read better than humans, which could help people with hearing impairments. The system was trained on thousands of hours of BBC news programmes, the broadcaster reported on Friday.
The new artificial intelligence (AI) system, called “Watch, Attend and Spell” (WAS), was developed by Oxford University in collaboration with Google’s DeepMind.
The AI system uses computer vision and machine learning to learn lip-reading from a dataset of more than 5,000 hours of BBC TV footage. The videos contain more than 118,000 sentences, with a vocabulary of 17,500 words, spoken by more than 1,000 different people.
The researchers tested WAS against a professional lip-reader, tasking both with working out what was being said in silent videos using only the speaker’s mouth movements. The AI system correctly lip-read 50 percent of the silent speech, while the professional lip-reader got only 12 percent right. The machine’s mistakes were small, such as a missing “s” at the end of a word or a single-letter misspelling.
Although much more work is needed before the technology can be put into practice, the researchers told the BBC that their aim is to get it working in real time, and that this is feasible. They added that the system will continue to learn as long as they keep training it on TV footage.
The software could support a number of developments, including helping people who are hard of hearing to navigate the world around them. Commenting on the technology’s value, Jesal Vishnuram, Technology Research Manager at Action on Hearing Loss, said: “Action on Hearing Loss welcomes the development of new technology that helps people who are deaf or have a hearing loss to have better access to television through superior real-time subtitling.
“It is great to see research being conducted in this area, with new breakthroughs welcomed by Action on Hearing Loss by improving accessibility for people with a hearing loss. AI lip-reading technology would be able to enhance the accuracy and speed of speech-to-text especially in noisy environments and we encourage further research in this area and look forward to seeing new advances being made.”
Joon Son Chung, lead author of the study and a graduate student at Oxford’s Department of Engineering, said of the potential uses: “Lip-reading is an impressive and challenging skill, so WAS can hopefully offer support to this task – for example, suggesting hypotheses for professional lip readers to verify using their expertise. There are also a host of other applications, such as dictating instructions to a phone in a noisy environment, dubbing archival silent films, resolving multi-talker simultaneous speech and improving the performance of automated speech recognition in general.”
The researchers have published a paper describing the system.