Anyone who’s spent any time on the internet lately may feel there’s a little too much “debate,” much of which descends into ad hominem insults.
At the same time, there’s increasing concern about automation displacing humans in the world of work. But this has not deterred IBM from pursuing “Project Debater,” which was first tested on the world stage in San Francisco in June 2018.
Naturally, when you think of IBM and AI, you think of Watson, the program capable of answering natural-language questions that famously beat human contestants at Jeopardy! in 2011. IBM sees Project Debater as a natural successor to Watson.
Both are, in turn, a natural progression from the search engine. Search engines return results relevant to key terms. Watson can answer open-ended questions by assessing context and applying some degree of logic and reasoning. Project Debater aims not only to find relevant information but also to use it cogently and coherently in a debate on a complex topic.
It’s a new level of natural language comprehension and engagement. If the Turing Test asks whether a machine can convince people it’s human, then building a machine that tries to persuade makes sense.
Understanding context has typically been difficult for conversational AI, and many famous examples are limited to simple call-and-response. Yet context is crucial to being convincing in a debate. To let software debate, IBM had to break the art of debating down into steps an algorithm can follow.
In the challenge, the AI was given one of forty previously unseen debate topics. It then constructed a four-minute opening argument by scanning a vast database of some 300 million news articles and stitching together the components it deemed relevant to the topic at hand.
Natural language processing is allowing algorithms to construct ever more realistic dialogue, as Google Duplex showed—but it’s a big leap from fooling one person during a highly formulaic conversation about restaurant bookings to arguing with a human in order to convince a room full of people.
Project Debater has to deal with standard text-to-speech and NLP problems as well as the rules of debate. Yet the structured environment of a formal debate likely helped the software persuade nine audience members that it was correct on space exploration.
After the opening statements, the debate moved on to rebuttals. IBM’s research has focused on the semantic structure of arguments, breaking them down into individual claims and the evidence offered in support of them. A 2014 research paper demonstrated an algorithm that could detect context-specific claims.
The example used was the argument that violent video games should not be sold to minors. An underlying claim might be that these games promote violent behavior or desensitize players. You can see how an algorithm that identifies this specific claim might analyze a large corpus of related material for a rebuttal, and possibly provide supporting factual evidence for that rebuttal.
Indeed, subsequent algorithms detected evidence and categorized it as “expert,” “anecdotal,” or “study.” Here, you can see the beginnings of nuanced argument: rebutting expert evidence with anecdotes may be unpersuasive in a less emotionally charged debate, but if things get heated, citing academic studies might prove less persuasive than a story that supports your point.
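To make the three evidence categories concrete, here is a deliberately naive sketch in Python that tags a sentence as “study,” “expert,” or “anecdotal” by counting cue words. The cue lists and the name `classify_evidence` are hypothetical illustrations, not IBM’s method; real argument-mining systems typically learn these distinctions from annotated examples rather than from a fixed word list.

```python
import re

# Hypothetical cue words for each evidence type. A real system would learn
# such signals from labeled data instead of hard-coding them.
CUES = {
    "study": {"study", "survey", "researchers", "trial", "percent", "data"},
    "expert": {"professor", "according", "expert", "dr", "said", "argues"},
    "anecdotal": {"i", "my", "once", "remember", "friend", "story"},
}


def classify_evidence(sentence: str) -> str:
    """Return the evidence type whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {label: len(words & cues) for label, cues in CUES.items()}
    best = max(scores, key=scores.get)
    # Report "unknown" when no cue word appears at all.
    return best if scores[best] > 0 else "unknown"
```

For instance, `classify_evidence("According to one professor, the games desensitize players.")` returns `"expert"`. The keyword approach fails quickly on real text (a quoted researcher describing a personal anecdote, say), which is exactly why the distinction is a research problem.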
This may seem to be a simple, algorithmic approach to argument: identify the claims your opponent is making, and rebut each in turn by analyzing similar arguments in the database, or perhaps sourcing expert opinions.
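That claim-by-claim loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the `Passage` type, its hand-labeled `stance` field, and the word-overlap (Jaccard) similarity are hypothetical stand-ins chosen for brevity, not how Project Debater works. A real system would need claim detection, stance classification, and far richer semantic matching.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    stance: str  # hypothetical hand label: "pro" or "con" relative to the motion


def tokens(text: str) -> set[str]:
    """Lowercase word set for a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rebut(claims: list[str], corpus: list[Passage], our_stance: str) -> dict[str, str | None]:
    """For each opposing claim, pick the most similar passage that takes our side."""
    candidates = [p for p in corpus if p.stance == our_stance]
    rebuttals: dict[str, str | None] = {}
    for claim in claims:
        claim_words = tokens(claim)
        best = max(candidates, key=lambda p: jaccard(claim_words, tokens(p.text)), default=None)
        # Leave the claim unanswered if nothing on our side overlaps with it at all.
        if best is None or jaccard(claim_words, tokens(best.text)) == 0.0:
            rebuttals[claim] = None
        else:
            rebuttals[claim] = best.text
    return rebuttals
```

Returning `None` for an unmatched claim makes the gaps visible: those are the points where a human debater would have to argue from principle rather than from retrieved material.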
But each link in the chain of “argument mining” requires complex reasoning. Sentiment analysis, such as that developed by researchers at Stanford, is crucial for understanding whether a particular quote from an expert actually supports your argument.
Similarly, you need some way of analyzing whether individual claims found in the literature support or rebut your overall argument. All the while, linguistic subtleties such as idioms can prove to be stumbling blocks, and need to be taken into account.
An endless string of rebuttals that presents nothing positive is not the most convincing argument, so IBM also developed a means of generating “de novo” claims and arguments related to the topic at hand by scanning its corpus, introducing them as points for the opponent to rebut.
The algorithm also scores the persuasiveness of its arguments, though that scoring may need tinkering: one “de novo” claim was apparently sourced from the opinion pages of The Morning Star, Britain’s socialist newspaper. To a certain degree, the old computing maxim “garbage in, garbage out” applies, but the same could be said of the sources human debaters use.
While you can attempt to formalize the logical rules of debate and avoid the dreaded logical fallacies and internal inconsistencies, a perfectly logical argument may not be the most persuasive. People are generally far less logical than we like to think.
It is perhaps with this in mind that IBM chose to make its AI more sympathetic. At one point, when discussing telemedicine, the AI tells a joke: “I would say that this makes my blood boil, but I have no blood.” Coming up with the perfect zinger to get the audience onside requires a little work; earlier versions of the bot made inappropriate jokes about sex education and having children.
If you listen to Intelligence Squared debates on specific motions, you will quickly notice that the main technique is often to reinterpret the terms of the debate and force the opposing side into arguing for something indefensible.
While this may technically be an example of the straw man fallacy (distorting your opponent’s point of view to make it easier to attack), it can nonetheless persuade audiences if the opposing team falls into the trap. Similarly, unconventional techniques such as the “Gish gallop” can prove effective, especially if the audience is partisan to begin with. The Gish galloper hurls as many arguments as possible at the opponent, even internally contradictory ones, forcing them to spend their time on rebuttals.
Given that Project Debater’s core approach seems to be identifying, researching, and rebutting the individual claims made by the opponent, a strategy that relies on making many claims might prove successful. Or perhaps Project Debater would have the patience to pick them all apart calmly and methodically. Either way, bad-faith debating in general is likely to remain harder for an algorithm to deal with.
Perhaps in a few years, even procrastinating by arguing about politics on Twitter will be a task for AI. As Project Debater points out, an AI need not worry about its blood pressure.
IBM does not see its project as a means of winning every online argument by creating a digital Cicero. Instead, the company aims to help humans understand the nature of logical argument.
According to IBM’s website, “Debate enriches decision-making, helping people weigh the pros and cons of new ideas and philosophies. It lies at the core of civilized society. We debate not only to convince others of our own opinions, but also to understand and learn from each other’s views.” The quest for AI often forces us to think about and understand our own intelligence, and that rule clearly applies to the logic of debate and rhetoric, too.
In this vision, Project Debater is far from the last word in any debate; instead, people engage with it as a tool for civilization and democracy. Discussions with the AI could help us interrogate our own worldviews and opinions, and so become better informed and make better decisions.
Here, again, we find the idealism of science fiction: artificial intelligence as an oracle, a Deep Thought that humans can consult for rational answers to our complex questions. Whether people will want to listen to a machine’s reason—or even contemplate any reasons aside from their own—remains to be seen.