In a press conference earlier today, Google announced a bold and ambitious plan to automatically add computer-generated captions to all English-language videos on YouTube. Capitalizing on more than a decade of expertise gained building services such as the Google Translation Engine and Google Voice, the web giant will leverage its powerful computing infrastructure to automatically transcribe English speech into text across virtually every video in its archive. The announcement represents a landmark moment in the field of automated speech recognition. It is also a welcome boon to deaf and hearing-impaired people across the world. What’s more, the automatic transcription of millions of hours of speech-laden video is a big boost to information retrieval and search: all of that speech will now be accessible to search engines. Google’s captioning initiative portends a fast-approaching world where all speech, whether in videos, phone calls, or plain old face-to-face conversation, will be automatically converted into text and made searchable for anyone in the world, speaking any language. Be sure to watch the video after the break, which explains most of what you want to know about Google’s video caption service.
The importance of today’s announcement is not really about the technology or the capability of speech-to-text conversion, both of which have been inching into the mainstream for many years. Indeed, Google has been automatically captioning YouTube videos for a limited set of partners since at least November 2009. The real news here is the enormous scale of the effort. Google indicates that nearly every English video on YouTube will soon receive automatic captions, and languages other than English are sure to follow. Although only English-language videos can be captioned for now, once a video has English captions, those captions can be machine-translated into roughly 50 other languages. Below is a short video from Google that clearly explains the basics of their captioning service:
When it comes to managing information, whether searching, sorting, or grouping, text is king. Search engines such as Google or Bing, question-answering services like Wolfram Alpha, and nearly every other major information management tool you can imagine are built on the ability to analyze text. Give them videos or images, and they don’t know what to do with them or how to understand them, save for the text metadata that accompanies them. Today’s captioning announcement means that millions of hours of speech that have been trapped within the world’s videos will now be extracted and eventually made available for search engines and other information tools to manipulate. It is not yet clear when or how these captions will be made openly available to other services, but hopefully that will happen soon. As if the world weren’t already awash in information, it looks like Google has just opened the floodgates for a whole lot more to flow into our lives.
Of course, for the millions of deaf people around the world, Google’s captioning initiative is an especially exciting development, opening the door to a rich universe of YouTube content that has until now been very difficult to access. During today’s presentation, Google gave ample time to deaf and hearing-impaired advocates to explain how much this captioning service will mean to them. The promo video below demonstrates that value:
Automated language translation, whether converting speech to text or converting one language into another, has been a big theme for us here at the Hub. We have seen tweets automatically translated into other languages, an entire social network that allows people using any language to converse with one another, and smartphone apps that can translate your dictated speech. Although full-fledged, human-level artificial intelligence, or strong AI, may still be a ways off, advances in language translation show that artificial intelligence aimed at specific subsets of human intelligence, or narrow AI, is blooming all around us.
Star Trek-quality universal language translation isn’t quite here yet, but it is tantalizingly close. Even with current technology, the translations are good enough for most of our needs. Over the last few months, I have on several occasions easily communicated with someone over Facebook in a foreign language by passing our conversation through Google’s language translation engine. Ten to twenty years from now, automatic translation will be really, really good, and language may cease to be a barrier to human communication altogether. Spoken and written words will be seamlessly converted from one language to another in real time as we converse in person or over the phone, video chat, write to each other, and, yes, even watch videos on YouTube.
*Disclosure – Keith Kleiner is a former employee of Google, Inc.