Inspired by Watson’s success on Jeopardy!, AI specialist Matthew Ginsberg wanted to see if computers could out-duel humans in another language-based game. What he created was Dr. Fill, a software program that fills in the vertical and horizontal boxes of a crossword puzzle. A few weeks ago Ginsberg pitted Dr. Fill against a field of human competitors at the American Crossword Puzzle Tournament in Brooklyn. The entry was unofficial, however, because the rules require that the $5,000 prize go to a human being. Singularity Hub got in touch with Dr. Ginsberg and asked him how Dr. Fill works and about its surprising performance in the tournament.
Just as Watson was blazing fast on Jeopardy!’s buzzer, Dr. Fill makes short work of its crosswords, at least the relatively easy ones. It finishes crosswords of average difficulty in about 30 seconds, then, like a gloating perfectionist, spends another minute and a half checking whether any of its answers can be improved. How does that compare to human solve times? Even the fastest puzzle masters can’t complete a tournament-level puzzle in under two minutes. Tougher puzzles make Dr. Fill ponder for three minutes, about half as long as humans need.
Speed matters, since tournament points reward fast finishes, but of course the answers have to be right.
Dr. Fill’s approach to solving crosswords bears no resemblance to human strategy. As Ginsberg explains in an email, “It solves the crosswords in a very inhuman way, really relying on search more than on an understanding of what’s going on. It shares that property with all other computer game players of which I’m aware, including Watson.”
When humans tackle crosswords, they use knowledge and experience. Dr. Fill uses the sheer power of statistics. When Dr. Fill first encounters a crossword with 75 words, for instance, it chooses the 10,000 or so words from its database that could possibly fit into the crossword pattern. It then uses keywords from the clues to narrow its choices further and, like its human counterparts, uses letters from intersecting words to rule out impossible solutions. If it doesn’t find a perfect match, it chooses the 100 most probable words. Dr. Fill’s database draws on a dictionary of over 10 million words, Wikipedia, and crossword puzzles it has learned from in the past. Still, there remain some types of crosswords that the software simply can’t figure out. “In most years, every puzzle’s fill is either words or sequences of words,” Ginsberg says. “Sometimes there is a puzzle where the fill, for whatever reason, doesn’t have this property. Dr. Fill does poorly on such puzzles because it doesn’t know what’s going on.”
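To make the filtering steps above a little more concrete, here is a toy sketch in Python. It is not Ginsberg’s code and makes several loud assumptions: the five-word lexicon and its keyword tags are made up, the function names (`fits`, `candidates`, `best_crossing_fill`) are hypothetical, and simple keyword overlap stands in for whatever statistical scoring Dr. Fill actually uses. It shows the three ideas the paragraph describes: keep only words that fit a slot’s pattern, rank them by how well they match the clue (keeping at most 100), and discard combinations whose shared squares disagree.

```python
import re
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: word -> clue keywords it is associated with.
# Dr. Fill's real data (dictionaries, Wikipedia, past puzzles) is far larger.
LEXICON: Dict[str, Set[str]] = {
    "TUNIC": {"garb", "roman", "caesar", "garment"},
    "TONIC": {"gin", "mixer", "drink"},
    "TUNIS": {"capital", "tunisia"},
    "OASIS": {"desert", "spring"},
    "ETNA": {"volcano", "sicily"},
}


def fits(word: str, pattern: str) -> bool:
    """True if word matches the slot; '.' marks a still-empty square."""
    return len(word) == len(pattern) and all(
        p in (".", w) for p, w in zip(pattern, word)
    )


def candidates(pattern: str, clue: str, k: int = 100) -> List[Tuple[int, str]]:
    """Keep words that fit the slot, ranked by overlap with clue keywords."""
    keywords = set(re.findall(r"[a-z]+", clue.lower()))
    scored = [
        (len(keywords & tags), word)
        for word, tags in LEXICON.items()
        if fits(word, pattern)
    ]
    # Highest overlap first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]  # keep at most the k most probable words


def best_crossing_fill(
    across: List[Tuple[int, str]],
    down: List[Tuple[int, str]],
    a_idx: int,
    d_idx: int,
) -> Optional[Tuple[int, str, str]]:
    """Drop pairs whose shared square disagrees; return the best-scoring pair."""
    consistent = [
        (sa + sd, wa, wd)
        for (sa, wa), (sd, wd) in product(across, down)
        if wa[a_idx] == wd[d_idx]  # the crossing letter must match
    ]
    return max(consistent, default=None)


if __name__ == "__main__":
    across = candidates(".....", "Fancy garb for Caesar")
    down = candidates("....", "Sicilian volcano")
    print(across)  # [(2, 'TUNIC'), (0, 'OASIS'), (0, 'TONIC'), (0, 'TUNIS')]
    # Across square 2 and down square 2 are assumed to be the shared square.
    print(best_crossing_fill(across, down, a_idx=2, d_idx=2))  # (3, 'TUNIC', 'ETNA')
```

Here OASIS fits the five-letter slot but is ruled out because its middle letter, S, clashes with the N of ETNA at the shared square. A real solver would have to repeat this kind of pruning across every crossing in the grid at once, which is where the search Ginsberg describes does its work.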
Ginsberg had reason to feel confident about Dr. Fill’s chances going into the tournament. Before the contest, Dr. Fill was run through simulations of the previous 15 tournaments, and it topped the competition in three of them. Tournament organizers were handing out “I BEAT DR. FILL” buttons to anyone who out-solved the software.
How well did Dr. Fill do this time? It placed 141st out of nearly 600 contestants. Hope they made enough buttons.
How could Dr. Fill be bettered by 140 sluggish-brained humans? I suggested to Ginsberg that the tournament organizers may have stacked the deck against Dr. Fill this year by choosing crosswords that were particularly unorthodox. “They did,” he responded, “but not on purpose. This year there were two such puzzles (which has never happened before).” Had the two puzzles not reared their illogical heads, Ginsberg told me, Dr. Fill would have placed 19th.
Ginsberg is a crossword aficionado himself, having created over two dozen crosswords for the New York Times. He even co-created one with the actress Dana Delany that would have given Dr. Fill some problems. Entitled “That’s Disgusting,” the crossword required the solver to add “ic” (sounds like “Ick!”) to words and phrases. For the clue “fancy garb for Caesar,” for example, the solver needed to think of the phrase “fine tune” and give it an “icky” ending: “FINE TUNIC.”
So can Dr. Fill improve its standing by brute force, overpowering its opponents with supercomputer-sized memory and computing power as Deep Blue did? Ginsberg, who runs Dr. Fill from his notebook computer, says no. “I don’t think that more memory or computing power is really what separates me from having the best score at the ACPT. I just need more time to describe various crossword themes to [Dr. Fill], and a bit more careful analysis of the clues in some cases.”
Ginsberg pursues other mind-bending interests besides crosswords. He earned a PhD in mathematics at Oxford at the age of 24, then went on to teach artificial intelligence at Stanford for nine years. He wrote the book “Essentials of Artificial Intelligence” and edited “Readings in Nonmonotonic Reasoning.” And if that sounds like light stuff to you (because you’re a genius!), he also co-founded the University of Oregon’s Computational Intelligence Research Laboratory (CIRL). His current day job is chief executive of On Time Systems, a software company that helps clients like the US Air Force calculate efficient flight paths for their aircraft.
So Ginsberg uses statistics for both work and play. But is there a more practical side to Dr. Fill (other than reminding its human competitors of their inevitable demise in the crossword arena, which Ginsberg is confident will happen)? “Dr. Fill is good at crosswords, but not much else,” Ginsberg admits. “I think that the real payoff from these systems is the development of new algorithms that can then be used in other search-based applications.”
[image credits: ZME Science and crosswordtournament.com]
[images: Dr. Fill; the tournament; Feyer]