IBM’s ‘Watson’ Takes on Jeopardy! You Can Challenge the Computer to a Trivia Duel
Back in 1997, IBM made history when its computer Deep Blue defeated Grandmaster Garry Kasparov in a game of chess. Now the company's new question answering system, Watson, is looking to do the same with Jeopardy! Run on multimillion-dollar supercomputers, Watson solves questions by analyzing their language and finding possible answers in millions of documents stored in its memory. It is not connected to the internet. IBM is looking to pit Watson against former Jeopardy! champions in a broadcast match sometime soon (perhaps this fall). Clive Thompson from The New York Times recently wrote an amazing article on Watson, and the Times has a special page where you can challenge the machine to a match. Good luck, you'll need it. You can scout your competition by watching Watson in action against real human contestants in a video from IBM below.
Jeopardy! questions prove especially difficult for computers because they contain so much wordplay and twisted phrasing. Watson has advanced skills in natural language processing, and is able to parse the relevant portions of a question. IBM wants to create a computer that can understand the way humans talk. That will have big applications in the future, as we try to build virtual assistants (imagine super-smart versions of the Siri iPhone App). As we mentioned before, automation is going to leak into many fields you wouldn't expect. A computer that can answer Jeopardy! questions could cut through bureaucratic red tape, do research for lawyers, or answer questions for doctors. Watson may make its debut playing humans on Jeopardy! but make no mistake: IBM is building a system with far greater potential.
For now, however, that potential is still limited. Watson works by rapidly searching through its millions of stored documents and finding associations between words and phrases. 'Shakespeare' often appears with 'Hamlet' and 'A Midsummer Night's Dream' but also with 'William' and 'England'. Likewise, 'pen' is linked to 'writing' and 'ink'. All these associations help Watson answer a Rhyme-Time clue like "Shakespeare's writing instrument" with "What is Will's quill?"
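To make the idea of word associations concrete, here is a minimal sketch that counts how often pairs of words share a document. It is not IBM's implementation: the tiny corpus and the function names `build_cooccurrence` and `association` are invented for illustration, and a real system would work over millions of documents with far richer language analysis.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for Watson's stored documents (invented for illustration).
DOCUMENTS = [
    "william shakespeare wrote hamlet in england",
    "shakespeare wrote with a quill pen and ink",
    "a quill is a pen used for writing with ink",
    "will is a short form of william",
]

def build_cooccurrence(documents):
    """Count how often each unordered pair of words shares a document."""
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair is stored in one canonical order.
        words = sorted(set(doc.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

def association(counts, word_a, word_b):
    """Look up the co-occurrence count for a pair, in either order."""
    key = tuple(sorted((word_a, word_b)))
    return counts[key]  # Counter returns 0 for pairs never seen together

counts = build_cooccurrence(DOCUMENTS)
print("shakespeare/william:", association(counts, "shakespeare", "william"))
print("pen/ink:", association(counts, "pen", "ink"))
print("hamlet/ink:", association(counts, "hamlet", "ink"))
```

Raw counts like these are only a starting point. Words that appear together often are not always relevant to the clue, which is one of the ways a system built on associations can be led astray.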
To determine the right associations, Watson makes evaluations. It finds all the possible connections between relevant words using several different algorithms and then weighs them according to how often they come up in its database. It prefers answers that are found by multiple algorithms, and it double-checks its answers by running them back through its system. The analysis of possibilities, probabilities, and double-checking lets Watson not only know what the answer might be but also evaluate how likely it is to be right. If it's not confident, it doesn't buzz in. That sort of decision making is a sign of a great Jeopardy! player and as you'll see in the following clip, Watson has some serious trivia chops:
When you play against Watson on the New York Times site, you're actually playing against pre-recorded guesses - so questions don't change. That means if you lose the first time through you can go back and answer all the questions with the correct answers. While you don't get a sense of Watson's speed, you can get an idea of how it evaluates answers by looking at the probabilities (bar graph) it associates with each one. Pretty cool.
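The probabilities Watson attaches to each answer, and its choice to stay silent when unsure, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the simple additive weighting, the sample scores, and the 0.5 threshold are hypothetical and are not taken from IBM's DeepQA system.

```python
from collections import defaultdict

BUZZ_THRESHOLD = 0.5  # hypothetical cutoff, not IBM's actual value

def combine_candidates(scorer_outputs):
    """Merge per-algorithm scores into one confidence per candidate.

    scorer_outputs is a list of dicts mapping candidate answer -> raw score.
    A candidate proposed by several algorithms accumulates more weight.
    """
    totals = defaultdict(float)
    for output in scorer_outputs:
        for candidate, score in output.items():
            totals[candidate] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Normalize so the confidences sum to 1, like bars in a chart.
    return {c: s / grand_total for c, s in totals.items()}

def decide(scorer_outputs, threshold=BUZZ_THRESHOLD):
    """Return (best_answer, confidence, should_buzz)."""
    confidences = combine_candidates(scorer_outputs)
    if not confidences:
        return None, 0.0, False
    best = max(confidences, key=confidences.get)
    return best, confidences[best], confidences[best] >= threshold

# Made-up scores from three imaginary algorithms for one clue.
outputs = [
    {"Will's quill": 3.0, "Hamlet's pen": 1.0},
    {"Will's quill": 2.0},
    {"Bard's card": 2.0},
]
answer, confidence, buzz = decide(outputs)
print(answer, confidence, buzz)
```

In this toy setup, an answer backed by two algorithms clears the threshold, while three equally weighted guesses would each fall below it and the player would not buzz in.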
Of course, you'll also notice that Watson makes some big mistakes. Tricks in semantics and reasoning still trip up the machine. But that's okay, it's a work in progress. The DeepQA project, headed by David Ferrucci, began only in 2007, and IBM only announced the Jeopardy! Challenge last year. Thompson does an amazing job describing the history of the project in greater detail in his NYT article. Watson still has several months, perhaps longer, before it must face a set of unknown Jeopardy! all-stars on television.
It will need the time to prepare. IBM categorizes what it takes to win at Jeopardy! into four basic skills: searching through a clue for the relevant portion, finding the correct answer in a vast realm of stored knowledge, evaluating the confidence in your answer, and quickly deciding whether or not to buzz in. Watson is good at all four skills, but humans excel at the latter two. An all-star contestant will buzz in before they are even sure of their answer, trusting in the five-second grace period to figure things out. Winners typically buzz in first for half or more of the questions, and get the answer right 85-95% of the time. Watson isn't at that level yet.
The human trials you see in the video took place over several days. According to the NYT, Watson carried one day, winning 4 out of 6 games against 7 human opponents. Yet the following day it lost just as many games, once with no points. It still loses to humans on rhyming clues and wordplay, and can get distracted by word associations that appear often but are not relevant to the question. Watson will have to get better if it hopes to beat the likes of Ken Jennings.
Whether or not Watson finds success in its eventual Jeopardy! showdown, IBM plans on marketing similar systems to companies in the next few years. In the beginning, the list of those who could afford such a machine will be short, as Watson depends on Blue Gene servers, which cost around $1 million each. IBM executives, however, hope that in the next ten to fifteen years improvements in computing price-performance will allow a DeepQA system to become much cheaper, eventually available on machines the size of a laptop. 2025 could be the year that everyone has a Watson in their home.
But we might be experiencing the benefits of question answering systems far sooner. We've already discussed how medical AI programs could help doctors in the near future (the Xprize is even aiming to put them on your smartphone). These systems won't have Watson's level of language analysis, but they'll answer questions with quick (and hopefully accurate) results. One day, the techniques created for the DeepQA project will allow us to interact with such systems by talking just as we would with any human. Eventually we may not be able to tell the difference between the two experiences. Eventually we might not even care if there is a difference.
[image credits: New York Times, IBM]
[source: IBM, New York Times]