It's a good day for sports fans, but a weird one for sports writers. North Carolina-based startup Statsheet has launched 347 new websites, one for every Division I college basketball team in the US. Now every team, no matter its size or budget, will have its own dedicated sports coverage, just like the big names in the NCAA such as Duke or UNC. Statsheet is even providing a journalist to cover each game, make predictions, and put the team's stats in perspective. There's one catch: that journalist is a computer program. Using their own proprietary software, Statsheet automates all of their sports coverage. Hundreds of websites, thousands of charts and graphs, and miles of writing, all generated without humans. It's astounding. I had a chance to speak with CEO Robbie Allen about the launch of Statsheet's college basketball network and their future projects. Watch out, sports writers: there's not a team, league, or game that will be left unexplored by the new wave of automated journalism.
When Allen first announced the Division I college basketball project back in March, he didn't anticipate how well it would be received. The first wave of feedback has been "95% positive", much better than the 50/50 split he was hoping for. The big attraction seems to be that all teams in the league, no matter how meager their chances of making the playoffs, get their own dedicated coverage. And we're not talking about a perfunctory discussion here. There are colorful graphs, detailed analysis, and rosters for everyone, from the biggest names right down to the most obscure. Many games, though not all, get detailed write-ups. Computer programs turn data from box scores into full sentences that put the reader in the game. How good is this writing? Certainly better than what you'd expect. Here's a passage from an article discussing the predicted success for a team in the upcoming year:
PENN - The Quakers are ready to return to their glory days under head coach Jerome Allen, who took over after Glen Miller was let go in December. Allen went just 6-15 as an interim following Miller's release, but the poor record was mostly due to a depleted roster. Penn lost a few players to injury, but now fully healthy, the Quakers should be ready to compete for an Ivy League championship. Leading the charge is scoring champ Zack Rosen, who tallied 17.7 ppg and 4.4 apg last season. If Tyler Bernardini can stay healthy, he will join Rosen to help form one of the elite backcourt tandems in the league. Jack Eggleston is another important player for the Quakers and he produced 13.0 ppg, 6.4 rpg a year ago.
*UPDATE 10.17.10 Robbie Allen just contacted me and let me know that I had actually selected a paragraph from Statsheet's The Sports Network, which is not automated. Below is a fully automated story. Actually, I'm kind of glad this happened, because it gives you an idea of how closely the automated and non-automated content resemble one another:
The first game of the 2010-2011 season for North Carolina basketball will be in Chapel Hill on November 12 against Lipscomb. Expectations are high that this year's Tar Heels team is an improvement on last year's. They'll be bringing back a group that played 43% of last season's minutes and adding the efforts of 3 Top 100 recruits, including #1 Harrison Barnes. North Carolina has the largest deficiency in rebounding where they lost 64.3% of their output. Equally as concerning is three point shooting, where they also lost a big 63% of last year's output. The AP gives the Tar Heels a #8 ranking in their preseason AP Top 25 poll. They weren't ranked in last year's final poll. North Carolina closed out the last season with an overall record of 20-17, placing 9th in the ACC with their 5-11 conference record. The Tar Heels lost to Georgia Tech 62-58 in the ACC tournament. They then went to the National Invitational Tournament (NIT) as a #4 seed, losing in the Championship game to Dayton, 79-68.
Answer honestly: would you have guessed that was written by a computer? I definitely didn't. Not every game preview, season prediction, and post-game discussion is going to be this good though, right?
Well, I pretty much chose this paragraph at random from the Statsheet network, so they just might be. Allen stressed to me that this project only really began six months ago. So, however good you think the writing is or isn't now, it's going to get better as they continue to improve their algorithms.
I'm also impressed with Statsheet's graphs, which can be easily embedded in your blogs and personal websites. Here's a typical breakdown of player impact from a recent Troy Trojans game:
As a fan of some teams with mediocre records (Go Crimson! Go Owls!), I'm pretty excited by all this automated sports journalism. Luckily for me, there's going to be much more available soon. Allen says that 2011 will likely see the arrival of most (or all) of the professional leagues in the US. NFL, MLB, NHL, you name it, Statsheet will be covering it.
There's some good money to be made in that expansion. Besides ad sales, Statsheet generates revenue through merchandise, ticket sales, and business-to-business services. Allen says it's possible we'll see Statsheet subscription services and licensed content as well. Again, all of this will be automated.
Allen has plans to expand Statsheet not just in terms of geography, but time as well. One of the amazing features of automated writing is that it's not dependent on the memory of a human journalist. Give Statsheet data from a twenty-year-old game, and it can write up an article for it as easily as it does for tomorrow's doubleheader. With the addition of Major League Baseball, Allen says that they'll be able to reach back to the late 19th century. Imagine having a play-by-play for a game from 1898.
Statsheet already has data reaching back to the mid-90s, and it can use this information to provide historical perspective on current events in the 'Game Notes' section at the bottom of every story. When was the last time a player at UNC scored three double-doubles in back-to-back games? Statsheet will let you know.
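To make the 'Game Notes' idea a little more concrete, here is a rough sketch in Python of how a program might answer a "when was the last time" question from stored stat lines. Statsheet's software is proprietary, so none of this is their code: the `StatLine` record, the function names, and the sample players and numbers are all made up for illustration. The only outside fact it relies on is the usual definition of a double-double, meaning ten or more in two statistical categories.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatLine:
    """One player's line from one game (hypothetical record layout)."""
    player: str
    game_date: date
    points: int
    rebounds: int
    assists: int


def is_double_double(line: StatLine) -> bool:
    """True if the player hit double digits in at least two categories."""
    categories = (line.points, line.rebounds, line.assists)
    return sum(value >= 10 for value in categories) >= 2


def last_double_double(history: List[StatLine], player: str) -> Optional[date]:
    """Most recent date on which `player` recorded a double-double, if any."""
    dates = [
        line.game_date
        for line in history
        if line.player == player and is_double_double(line)
    ]
    return max(dates) if dates else None


def game_note(history: List[StatLine], player: str) -> str:
    """Turn the lookup result into a one-sentence note."""
    when = last_double_double(history, player)
    if when is None:
        return f"{player} has no double-double on record."
    return f"{player} last recorded a double-double on {when.isoformat()}."


# Invented sample data, not real players or games.
HISTORY = [
    StatLine("Player A", date(1996, 1, 6), points=12, rebounds=11, assists=3),
    StatLine("Player A", date(1997, 2, 1), points=22, rebounds=4, assists=2),
    StatLine("Player B", date(1998, 3, 3), points=10, rebounds=10, assists=10),
]

print(game_note(HISTORY, "Player A"))
print(game_note(HISTORY, "Player C"))
```

The point is that the program never has to "remember" anything. As long as the old box scores are in the database, a note about 1996 costs no more effort than a note about last night.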
At their core, artificial writing programs take raw data and convert it into a narrative that humans like to read. That basic skill could be used in real estate, finance, or even national security. Yet the benefits of automated journalism are greater than the breadth of its applications. It can also provide a different kind of perspective, maybe even a better one. Allen pointed out that a single writer can only write from their own head. Sure, they might have a fact checker, or an assistant, but essentially all the words are coming from one brain. Automation is much more collective. Many different developers work on shaping the program's word choices, its flow, and the importance it places on different facts. Add this to a perfect memory, and you get an artificial writer that has some clear advantages over its human counterparts.
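For the curious, the basic data-to-narrative step can be sketched in a few lines of Python. This is only a toy template approach under my own assumptions, not Statsheet's proprietary method: the `GameResult` record, the margin thresholds, and the verb choices are invented for illustration. The two scores come from the North Carolina story quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameResult:
    """Final score of one game (hypothetical record layout)."""
    team_a: str
    points_a: int
    team_b: str
    points_b: int


def margin_verb(margin: int) -> str:
    """Pick a verb that matches how close the game was (arbitrary cutoffs)."""
    if margin <= 3:
        return "edged"
    if margin <= 10:
        return "defeated"
    return "rolled past"


def write_recap(game: GameResult) -> str:
    """Convert a raw final score into a one-sentence recap."""
    if game.points_a > game.points_b:
        winner, loser = game.team_a, game.team_b
        high, low = game.points_a, game.points_b
    else:
        winner, loser = game.team_b, game.team_a
        high, low = game.points_b, game.points_a
    verb = margin_verb(high - low)
    return f"{winner} {verb} {loser} {high}-{low}."


print(write_recap(GameResult("North Carolina", 68, "Dayton", 79)))
print(write_recap(GameResult("Georgia Tech", 62, "North Carolina", 58)))
```

Even in this toy version, the word choice is a decision somebody made in code. That is the collective aspect Allen describes: each developer who adjusts a threshold or adds a phrase changes how every future story reads.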
As we've discussed before, there really isn't a job out there that is 'safe' from automation. Computer programs, as they become more sophisticated, are going to expand into every niche employment opportunity out there. Clearly the best human sports writers can outperform Statsheet's automated articles, but what about all the mediocre writers? To the average reader, Statsheet will seem just as good...maybe better. This college basketball season is the start of a growing trend. Give it time, and human journalists could be the exception, not the norm. That reminds me, I need to polish my resume.
[image credit: Statsheet]
[sources: Statsheet press release and CEO Robbie Allen]