If you want to make a killing on Wall Street, social media may be the secret to your success. Researchers at Indiana University and the University of Manchester have found that the moods expressed in Twitter feeds can accurately predict some changes in the Dow Jones Industrial Average three or four days before they occur. As discussed in their paper recently uploaded to arXiv, the computer scientists studied 9.8 million tweets posted by 2.7 million users in 2008. A self-organizing fuzzy neural network (SOFNN), basically a type of AI, trained on sentiment analysis of this tweet data could predict changes in the Dow with an accuracy of 87.6%. Pretty incredible. Without knowing if there's a real causal link between social media and the stock market, it's unclear if this technique is a new means of predicting the Dow, or just a handy correlation. With trillions of dollars at stake, however, you can bet that investors are going to try to find out.

Before we dive into the juicy facts of this study, we should take a healthy grain of salt. There have been many other social phenomena that researchers have claimed could predict the stock market. Sports events like the Super Bowl, the NY Mets winning the playoffs, or a horse taking the Triple Crown have all been linked to determining how the Dow will rise and fall. Gains and losses in the month of January have been said to have greater than a 90% success rate in predicting how the rest of the year will proceed. But this is all just correlation, not causation. As far as anyone can prove, none of these factors, even the success in January, is actually impacting the Dow directly. It could be that sentiments on Twitter are being caused by the same things that impact the stock market, or it could just be another bizarre correlation.

The study, however, did look at a lot of data in new ways, and there's a chance it did discover a strong link. The three authors, Johan Bollen and Huina Mao at Indiana University and Xiao-Jun Zeng at the University of Manchester, have all explored the impact and emotional sentiments expressed in social media. For the Twitter-Dow research they used two pieces of analytic software. OpinionFinder is a freely available program that rates the general level of positive/negative feelings in social data. The authors also developed a new system that uses data collected by Google in 2006 (on billions of webpages) that can label information with one of 964 different emotional terms. They combined the Google data with a Profile of Mood States to create their GPOMS program, which takes those 900+ labels and groups them into 6 different 'mood dimensions': Calm, Alert, Sure, Vital, Kind and Happy. So, OpinionFinder tells you if a bunch of tweets are positive or negative, and GPOMS categorizes them as Calm, or Alert, or Sure, etc.

Researchers used 7 different ways of analyzing Twitter feeds. OpinionFinder (top) gave an overall positive or negative score for sentiments. Other fields, like Happy, Alert, Sure, etc., came from the GPOMS analysis. Of these, only Calm was a reasonable predictor for the Dow.
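To make the idea of 'mood dimensions' a little more concrete, here is a minimal sketch of how a term-to-dimension lookup could score a batch of tweets. It is only an illustration: the tiny word list (`TOY_LEXICON`) and the function name `mood_scores` are made up for this example, and the real GPOMS term lists and weighting are not described here.

```python
from collections import Counter

# The six mood dimensions named in the study.
DIMENSIONS = ("Calm", "Alert", "Sure", "Vital", "Kind", "Happy")

# ASSUMPTION: a made-up mini-lexicon for illustration only. The real GPOMS
# maps 964 emotional terms onto the six dimensions; those lists are not
# reproduced in this article.
TOY_LEXICON = {
    "relaxed": "Calm",
    "peaceful": "Calm",
    "focused": "Alert",
    "confident": "Sure",
    "energetic": "Vital",
    "friendly": "Kind",
    "cheerful": "Happy",
}


def mood_scores(tweets):
    """Return each dimension's share of all matched mood terms in the tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            # Strip common punctuation so "cheerful!" still matches "cheerful".
            dimension = TOY_LEXICON.get(word.strip(".,!?#"))
            if dimension:
                counts[dimension] += 1
    total = sum(counts.values())
    # With no matches at all, every dimension scores 0.0.
    return {d: (counts[d] / total if total else 0.0) for d in DIMENSIONS}
```

Calling `mood_scores` on one day's tweets gives six numbers for that day; repeating it day after day would produce the kind of mood time series that gets compared with the Dow.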

Bollen, Mao, and Zeng fed the 9.8 million tweets into OpinionFinder and GPOMS and compared the trends they found to the Dow. The emotional lens that seemed to have a strong correlation to the stock market was 'Calm'. When the calmness of tweets changed, two to six days later the Dow would fluctuate in about the same way. 'Calm + Happy' also seemed to be able to predict the Dow. In the graph below you can see how the 'Calm' trend delayed by three days overlaps with the Dow trend with remarkable frequency. Let's take a step back and think about that for a moment. If you see tweets becoming calmer and calmer, then a few days later chances are the Dow will rise. If the levels of calmness drop off, it might be time to take your money out of stocks.

That's pretty frakkin' nuts.

You could try to explain it. Perhaps the tweeting public cares deeply about the stock market and the calmer they feel the more likely they will be to invest their money. The authors mainly avoid the attempt to understand why calmness and Dow are linked, merely stating at the end of the paper that the relation needs to be researched further.

When delayed by three days, the trends in GPOMS' 'Calm' field overlapped meaningfully with the Dow (top). The trends are separated for better viewing in the middle and bottom graphs. Events that the public can't predict, like the bank bailout pointed to in the top, can disrupt the accuracy of the 'Calm'-based SOFNN model.
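The 'delay one series and see how well it lines up with the other' idea can be sketched in a few lines. This is a hedged illustration, not the authors' method: the article doesn't say which statistical test the paper relied on, so plain Pearson correlation stands in here, and the function names and any input series are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(mood, market, lag):
    """Correlate mood on day t with the market on day t + lag."""
    if len(mood) != len(market):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(mood):
        raise ValueError("lag must be between 0 and len(series) - 1")
    if lag == 0:
        return pearson(mood, market)
    # Drop the last `lag` mood days and the first `lag` market days so that
    # each remaining mood value is paired with the market value `lag` days later.
    return pearson(mood[:-lag], market[lag:])


def best_lag(mood, market, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(mood, market, lag))
```

Given a daily 'Calm' series and a daily Dow series of the same length, `best_lag(calm, dow, 6)` would report which delay lines the two up best. A high correlation at some lag still says nothing about why the two move together.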

The team used a self-organizing fuzzy neural network (SOFNN) to learn from the emotional data and predict behavior in the Dow. According to Bollen's statements to Indiana University News, this SOFNN was a five-tiered system capable of rewiring itself while it learned. Still, Bollen refers to this SOFNN as a fairly basic economic analysis AI. Basic or not, the SOFNN could use the emotional tweet data to predict the changes in the 2008 Dow data with an accuracy of 87.6%. That's bankable success. To convince themselves that they had meaningful data (and not another Super Bowl style superstition) the team took a random sample of 20 days and calculated how likely it was that a prediction program could be 87.6% successful by chance alone. The odds are just 3.4%. In other words, it could just be dumb luck that 'Calm' and Dow are related, but the chances are slim. There's probably something meaningful in this approach to predicting the stock market.
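For a sense of what an 'accuracy' figure like this could mean in practice, here is a minimal sketch that scores a forecast by how often it gets the direction of the daily change right. That reading of accuracy is an assumption, since the exact metric isn't spelled out above, and the function name `direction_accuracy` is made up for this example.

```python
def direction_accuracy(predicted_changes, actual_changes):
    """Fraction of days where the predicted and actual change point the same way.

    A change counts as 'up' if it is strictly positive; zero or negative
    counts as 'not up'. ASSUMPTION: this up/down scoring is one plausible
    reading of 'accuracy', not a confirmed description of the paper's metric.
    """
    if len(predicted_changes) != len(actual_changes):
        raise ValueError("series must have the same length")
    if not actual_changes:
        raise ValueError("need at least one day to score")
    hits = sum(
        1
        for predicted, actual in zip(predicted_changes, actual_changes)
        if (predicted > 0) == (actual > 0)
    )
    return hits / len(actual_changes)
```

With hypothetical predicted changes of `[12.0, -5.0, 3.5, -8.0]` and actual changes of `[20.0, -1.0, -4.0, -15.0]`, the forecast gets three of four directions right, so the function returns 0.75.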

The overlap for 'Calm' and Dow isn't perfect, but it doesn't have to be. Rather than try to create a program that simply uses tweet analysis to predict stocks, the research team was interested in augmenting other prediction algorithms. As stated earlier, the SOFNN is a fairly basic model for the stock market. More sophisticated programs may have much better luck when combining the Twitter data with information about the economy and other factors. The authors are hoping to upgrade their programs in the near future.

So could we one day use Twitter feeds to predict the Dow? Maybe. The authors acknowledge there are some limitations they will need to account for as they move ahead with their research. They didn't separate tweets according to geography, and it's unclear if they should. Does globalization mean that all tweets should be counted, or should researchers focus on just those from the US? Twitter already skews toward English speakers and US residents, but that is likely to change in the years ahead. Also, while they used two different methods for analyzing tweets (GPOMS and OpinionFinder), no one really knows what the 'real' emotional sentiment on Twitter is at any given moment. Is GPOMS really a good measure for how calm people are on Twitter? If not, would a more accurate measure for calmness actually be a better predictor for the Dow? Again, the authors think it will take more research to find out. Finally, the authors acknowledge that even though the GPOMS 'Calm' trend and Dow are strongly correlated, there isn't a clear explanation for why this should be.

I can't predict whether or not tweets will one day help us cheat the stock market, but I do know one thing for certain: there's going to be a lot more scientific analysis of social media. The content of social media is so extensive, and so cheap to collect, that it will prove to be a very seductive data set for scientists to explore. Already there are free Twitter analysis programs that can help you understand the public mood there. New programs will let us analyze other parts of our lives for similar information. Centamental eavesdrops on casual conversations and translates them into sentimental data. In the future we could analyze all aspects of our lives, virtual or real, to better understand how we'll invest. Or maybe we'll use such information to predict wars, elections, and everything else. To some extent social media is an expression of who we are and how we feel. The more we know about it, the more we know how we will act. It may even be possible that influencing social media could directly impact the things it predicts - like the Dow. ...Are you thinking what I'm thinking? Quick, everyone, start tweeting really calm thoughts! Maybe we can save the economy.

[image credits: Bollen et al 2010]
[sources: Bollen et al 2010 (PDF via arXiv), IU News]