Building on the findings behind the McGurk effect, a team at Disney's research lab has figured out a way to generate every phrase that could plausibly match video of a person speaking
Disney’s research lab has put out a lot of adorable and sometimes weird creations over the past few years. We’ve seen 3D printers that make “huggable” objects out of felt, cute robots that etch massive pictures on the beach, and even spinning tops in unorthodox shapes that seem to defy physics.
This time around, the lab has churned out a tool that uses advanced algorithms to produce videos showing people saying things they never actually said.
The project is akin to lip reading, and the results look similar to the immensely popular videos by YouTuber BadLipReading. Lip reading may look easy when deaf people do it, but it is actually very tough. While speaking, we make a lot of different shapes with our mouths, and many of those shapes resemble each other, making it hard to tell which sound is being made.
To make listening smoother, the brain combines visual and aural cues from the eyes and ears to work out what the other person is saying, because lip reading alone often isn’t enough to gauge speech. Many sounds look nearly identical on the lips: if you record separate videos of a person saying “bah,” “pah” and “mah” and remove the audio, you will struggle to tell which one is which. What we see also changes what we hear. When the audio of one syllable, such as “bah,” is played over video of a person mouthing another, such as “gah,” many viewers perceive a third sound, such as “dah.” This is called the McGurk effect.
Using the same concept, a research team from the lab has devised an algorithm that looks at a video of a person speaking and generates all the phrases that could plausibly match the lip movements.
For example, a recording of a person saying “clean swatches” could have its audio swapped with 9,000 different phrases that fit the lip movement in the video quite well. Not all of those 9,000 phrases fit the bill, though, and most of them don’t even make sense. You’ll see hilarious results like “like to watch you” or even “need no pots.” However, some do make perfect sense and can be used to re-dub a video while keeping the lip-sync intact.
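To get a feel for why one clip can match so many phrases, here is a minimal Python sketch. It is not Disney's algorithm: the sound-to-mouth-shape table (`VISEME_OF`), the tiny word list (`LEXICON`) and the function names are all made up for illustration. It simply groups sounds that look alike on the lips and lists every word sequence from the toy lexicon that produces the same sequence of mouth shapes.

```python
from functools import lru_cache

# Hypothetical, heavily simplified table: sounds that look alike on the
# lips share one mouth-shape label. Real systems use far richer models.
VISEME_OF = {
    "p": "B", "b": "B", "m": "B",            # lips pressed together
    "t": "T", "d": "T", "n": "T",            # tongue behind the teeth
    "k": "K", "g": "K",                      # back of the mouth
    "ae": "A", "iy": "I", "ow": "U",         # a few vowels
}

# Toy lexicon (an assumption for this sketch): word -> list of sounds.
LEXICON = {
    "bat": ["b", "ae", "t"], "mat": ["m", "ae", "t"],
    "pad": ["p", "ae", "d"], "man": ["m", "ae", "n"],
    "bee": ["b", "iy"], "me": ["m", "iy"], "pea": ["p", "iy"],
    "tea": ["t", "iy"], "knee": ["n", "iy"], "go": ["g", "ow"],
}


def to_visemes(phonemes):
    """Map a list of sounds to the tuple of mouth shapes a viewer sees."""
    return tuple(VISEME_OF[p] for p in phonemes)


def visually_consistent_phrases(target, lexicon):
    """List every word sequence whose mouth shapes equal `target` exactly."""
    word_visemes = {w: to_visemes(p) for w, p in lexicon.items()}

    @lru_cache(maxsize=None)
    def expand(start):
        # All ways to cover target[start:] with whole words.
        if start == len(target):
            return ((),)
        results = []
        for word, shapes in word_visemes.items():
            end = start + len(shapes)
            if target[start:end] == shapes:
                for rest in expand(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [" ".join(words) for words in expand(0)]


# Mouth shapes for the spoken phrase "bat me".
target = to_visemes(LEXICON["bat"] + LEXICON["me"])
phrases = visually_consistent_phrases(target, LEXICON)
print(len(phrases), phrases)
```

Even with only ten words, the two-word phrase “bat me” has 12 lookalikes in this toy setup, including “pad bee” and “man pea.” Scale the word list up to a full dictionary and it becomes easier to see how a short clip could fit thousands of phrases, most of them nonsense.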
The following is a demo video for the project from Disney. Keep in mind that the demo uses a robotic voice for efficiency; a human voice would sound more natural and make the illusion more believable.
While this doesn’t have any real utility right now beyond being a proof of concept, it’s a good demonstration of just how much effort our brain puts into perceiving even the simplest of things.