The job of a scientist has its fun parts, and its not-so-fun parts. Making new discoveries, understanding the way things work, and experimenting with the natural world are all pretty cool ways to spend your day. Sifting through endless files of data looking for small correlations and insight...not so much. Which may explain the popularity of the new software from Cornell's Computational Synthesis Lab called Eureqa. Touted as something of a virtual scientist, Eureqa finds hidden mathematical relations in large spreadsheets of data. The software uses a technique called symbolic regression, which slowly evolves equations over time to see which best fits the information you give it. How powerful is Eureqa? Well, it can derive Newton's Second Law from the motion of a pendulum in just a few hours, without any input on the physical laws of mechanics. So it has Newton beat by several years. Other researchers are hoping to have Eureqa find the mathematical relations in their own work, which is much more complicated than simple Newtonian physics. If successful, Eureqa could not only speed up scientific research, it could change the roles humans take in science. Check out video tutorials for Eureqa after the break.

Eureqa examines data from an experiment, and produces equations that explain what happened. Sounds like a scientist to me.

Eureqa is free to download and free to use, and scientists are eager to give it a try. At this moment, it's just another interesting software tool that will help researchers make new discoveries. In its success, however, are the roots of a much larger change. Programs like Eureqa could one day take over a large part of scientific work. Data analysis is a key task in any modern lab, and is the core service provided by many auxiliary companies working in major industries. Did you perform a geological survey, do some market research, or study the stars? Chances are you employed a data analyst or an entire firm of them. Now, programs like Eureqa are on their way to improving that analysis, and one day reducing the number of humans needed in the process. We've already seen software that can mimic the work of journalists; now it seems scientists can be automated to some degree as well. Sure, this could have some negative consequences (which we've discussed in connection with Martin Ford's recent book), but it's also going to be amazingly helpful. The quicker we find the underlying mathematical equations for a phenomenon, the quicker we can learn how to harness it for everyone's good.

Anyone with Excel and a penchant for listening to Clippy the helpful paper clip can find an equation for a set of numbers in a spreadsheet. What makes Eureqa so powerful is the way that it approaches fitting equations to data. Symbolic regression takes the lessons of biology and applies them to computation. Say you have a big file of numbers that represent something cool you're researching. Now let's look at a set of random equations. None are really going to match your file of numbers very well, but some will do better than others. They are more 'fit'. In symbolic regression, those 'fit' equations are kept and varied. Their 'offspring' are then tested for fitness. Over time, the best equations survive and evolve until you have one that matches your data as perfectly as it can. Computational analysis and mathematics can now enjoy the same advantages as bacteria in a pond. Nifty.
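For the curious, here is a minimal sketch of that keep-the-fittest loop in Python. It is not Eureqa's actual algorithm: the tree representation, the three operators, the mutation-only variation, and names like `random_expr` and `evolve` are all assumptions made for illustration.

```python
import math
import random

# Candidate equations are small expression trees built from the leaf "x",
# float constants, and tuples of the form (operator, left, right).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def random_expr(rng, depth=2):
    """Build a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else float(rng.randint(-3, 3))
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Compute the value of an expression tree at a given x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(expr, data):
    """Mean squared error against the data; lower means more 'fit'."""
    total = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        total += diff * diff
    # Treat overflowed or undefined results as infinitely bad.
    return total / len(data) if math.isfinite(total) else math.inf

def mutate(expr, rng):
    """Vary an equation by swapping one random subtree for a fresh one."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def evolve(data, generations=60, pop_size=100, seed=0):
    """Keep the fittest quarter each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        survivors = population[: pop_size // 4]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda e: error(e, data))

if __name__ == "__main__":
    # Made-up data generated from y = x*x + x, standing in for a "big file of numbers".
    data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-6, 7)]
    best = evolve(data)
    print(best, error(best, data))
```

Because the best equations always survive into the next generation, the leading error can only stay the same or shrink as the run goes on. A real tool would also recombine equations with one another and penalize needlessly complicated ones.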

Eureqa is a scientific tool, and that's apparent in the videos below. The first is an introduction to the experiment that would become Eureqa; the other two are tutorials on how to use the final software. Those unaccustomed to analytical software are likely to be bored. Still, if you know your way around an equation, watching Eureqa in action is awesome.

Eureqa comes with some great features to help you analyze faster, better, stronger. You can run Eureqa across multiple computers in a network, allowing for faster processing. It also lets you watch its progress in real time and see which equations are currently leading the pack, so you don't have to wait to know if it's getting close to a solution. You can pick among the surviving equations, stop the analysis if you find one you want to explore, and compare equations for complexity as well as fitness. Eureqa can even use derivatives to give you differential equations for your data, which is an important part of producing many mathematical models.
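One common way to compare equations on complexity and fitness at the same time is to keep only those that no other candidate beats on both counts. The Python sketch below illustrates that idea; it is not necessarily how Eureqa ranks its results, and the `Candidate` type, the complexity scores, and the error values are all made up for the example.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    equation: str      # human-readable form of the equation
    complexity: int    # hypothetical size score; lower is simpler
    error: float       # hypothetical misfit against the data; lower is better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate beats on both measures."""
    front = []
    for c in candidates:
        dominated = any(
            o.complexity <= c.complexity
            and o.error <= c.error
            and (o.complexity < c.complexity or o.error < c.error)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.complexity)

# Illustrative values only, not output from Eureqa.
candidates = [
    Candidate("y = 3", 1, 9.5),
    Candidate("y = x", 1, 4.0),
    Candidate("y = x*x", 3, 1.2),
    Candidate("y = x*x + x", 5, 0.0),
    Candidate("y = x*x + x + 0*x", 9, 0.0),
]

for c in pareto_front(candidates):
    print(c.equation, c.complexity, c.error)
```

Here "y = 3" drops out because "y = x" is just as simple and fits better, and the padded nine-term equation drops out because a simpler one fits equally well. What remains is a short menu of trade-offs for a human to choose from.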

There are some limitations. While the user interface is only designed for Windows, Eureqa can be run on Linux and Mac OS X using a compatibility layer like Wine. Processing time is going to vary for each set of data you provide, and the tutorials recommend several hours (or an overnight run) for more complex applications. Most importantly, Eureqa is not a tool for the common human. You need to know what you're doing, how to best select seed equations, and how to interpret the results before this software will improve your research.

But make no mistake, it will improve research. And as data analysis improves, we're going to see quicker turnaround between experiments and published results. Hopefully, when paired with greater communication between scientists (in forums like OpenWetWare), this innovation will accelerate technological progress. Even if Eureqa itself doesn't become popular, we're bound to see other applications of symbolic regression. This sort of computing is going to change how we learn about our world. And it should give scientists more time to focus on the fun parts of their jobs.

[photo and video credit: Hod Lipson, Cornell University]