This AI Learned the Design of a Million Algorithms to Help Build New AIs Faster

The skyrocketing scale of AI has been hard to miss in recent years. The most advanced algorithms now have hundreds of billions of connections, and it takes millions of dollars and a supercomputer to train them. But as eye-catching as big AI is, progress isn’t all about scale—work on the opposite end of the spectrum is just as crucial to the future of the field.

Some researchers are trying to make building AI faster, more efficient, and more accessible, and one area ripe for improvement is the learning process itself. Because AI models and the data sets they feed on have grown exponentially, advanced models can take days or weeks to train, even on supercomputers.

Might there be a better way? Perhaps.

A new paper published on the preprint server arXiv describes how a type of algorithm called a “hypernetwork” could make the training process much more efficient. The hypernetwork in the study learned the internal connections (or parameters) of a million example algorithms so it could pre-configure the parameters of new, untrained algorithms.

The AI, called GHN-2, can predict and set the parameters of an untrained neural network in a fraction of a second. And in most cases, the algorithms using GHN-2’s parameters performed as well as algorithms that had cycled through thousands of rounds of training.

There’s room for improvement, and algorithms developed using the method still need additional training to achieve state-of-the-art results. But the approach could benefit the field if it reduces the energy, computing power, and cash needed to build AI.

Automating AI 

Although machine learning is partially automated, in that no one tells a machine learning algorithm exactly how to accomplish its task, actually building the algorithms is far more hands-on. It takes a good deal of skill and experience to tweak and tune a neural network’s internal settings so that it can learn a task at a high enough level to be useful.

“It’s almost like being the coach rather than the player,” Demis Hassabis, co-founder of DeepMind, told Wired in 2016. “You’re coaxing these things, rather than directly telling them what to do.”

To reduce the lift, researchers have been developing tools to automate key steps in this process, such as finding the ideal architecture for a new algorithm. A neural network’s architecture is its high-level design: the number of layers of artificial neurons and how those layers link together. Finding the best architecture takes a good bit of trial and error, and automating it can save engineers time.

So, in 2018, a team of researchers from Google Brain and the University of Toronto built an algorithm called a graph hypernetwork to do the job. Of course, they couldn’t actually train a bunch of candidate architectures and pit them against each other to see which would come out on top. The set of possibilities is huge, and training them one by one would quickly get out of hand. Instead, they used the hypernetwork to predict the parameters of candidate architectures, ran them through a task, and then ranked them to see which performed best.
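The predict, evaluate, and rank loop described above can be sketched in a few lines of Python. This is only a schematic under loud assumptions: the names `rank_architectures`, `predict_params`, and `evaluate` are hypothetical, the "architectures" are toy polynomials rather than neural networks, and a lookup table of made-up coefficients stands in for a trained hypernetwork.

```python
def rank_architectures(candidates, predict_params, evaluate):
    """Score each candidate using predicted (not trained) parameters, best first."""
    scored = []
    for arch in candidates:
        params = predict_params(arch)   # stand-in for one hypernetwork prediction
        scored.append((evaluate(arch, params), arch))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy task (assumption): approximate y = x^2 at a handful of points.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [x * x for x in XS]

# Toy "architectures" (assumption): polynomials of different degree.
CANDIDATES = [
    {"name": "constant", "degree": 0},
    {"name": "linear", "degree": 1},
    {"name": "quadratic", "degree": 2},
]

# Made-up coefficients standing in for a hypernetwork's predictions.
PRETEND_PREDICTIONS = {
    "constant": [0.3],
    "linear": [0.3, 0.1],
    "quadratic": [0.0, 0.0, 0.9],
}

def predict_params(arch):
    return PRETEND_PREDICTIONS[arch["name"]]

def evaluate(arch, params):
    """Return negative mean squared error, so a higher score is better."""
    total = 0.0
    for x, y in zip(XS, YS):
        prediction = sum(c * x ** i for i, c in enumerate(params))
        total += (prediction - y) ** 2
    return -total / len(XS)

ranking = rank_architectures(CANDIDATES, predict_params, evaluate)
best_score, best_arch = ranking[0]
print(best_arch["name"], best_score)
```

The point of the structure is that no candidate is ever trained: each one costs a single parameter prediction plus one evaluation pass, which is what makes searching a huge space of architectures tractable.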

The new research builds on this idea. But instead of using a hypernetwork to rank architectures, the team focused on parameter prediction. By building a hypernetwork that’s expert at predicting the values of parameters, they thought, perhaps they could then apply it to any new algorithm. And instead of starting with a random set of values—which is how training usually begins—they could give algorithms a big head start in training.

To build a useful AI parameter-picker, you need a good, deep training data set. So the team made one—a selection of a million possible algorithmic architectures—to train GHN-2. Because the data set is so large and diverse, the team found GHN-2 can generalize well to architectures it’s never seen. “They can, for example, account for all the typical state-of-the-art architectures that people use,” Thomas Kipf, a research scientist at Google Research’s Brain Team in Amsterdam, recently told Quanta. “That is one big contribution.”

After training, the team put GHN-2 through its paces and compared algorithms using its predictions to traditionally trained algorithms.

The results were impressive.

Traditionally, algorithms use a process called stochastic gradient descent (SGD) to gradually tune a neural network’s connections. Each time the algorithm performs a task, the actual output is compared to the desirable output (is this an image of a cat or a dog?), and the network’s parameters are adjusted. Over thousands or millions of iterations, training nudges an algorithm toward an optimal state where errors are minimized.
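To make that update loop concrete, here is a minimal Python sketch of SGD on a toy model with just two parameters. The model, the data, the learning rate, and the function name `sgd_fit` are illustrative assumptions and have nothing to do with the GHN-2 paper's setup; real networks have millions of parameters and compute their gradients by backpropagation.

```python
import random

def sgd_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w*x + b to (x, y) pairs with stochastic gradient descent."""
    rng = random.Random(seed)
    # Training usually begins from random parameter values.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)              # visit examples in random order
        for x, y in samples:
            error = (w * x + b) - y       # actual output minus desired output
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b

# Toy data (assumption): noise-free points on the line y = 2x + 1.
data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
w, b = sgd_fit(data)
print(w, b)
```

Each pass nudges `w` and `b` a little in the direction that shrinks the error on one example, which is the same idea scaled up across every connection in a neural network.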

Algorithms using GHN-2’s predictions—that is, with no training whatsoever—matched the accuracy of algorithms that were trained with SGD over thousands of iterations. Crucially, however, it took GHN-2 less than a second to predict a model’s parameters, whereas the traditionally trained algorithms took some 10,000 times longer to reach the same level.

To be clear, the performance the team achieved isn’t yet state-of-the-art. Most machine learning algorithms are trained much more intensively to higher standards. But even if an algorithm like GHN-2 doesn’t get its predictions just right—a likely outcome—starting with a set of parameters that is, say, 60 percent of the way there is far superior to starting with a set of random parameters. Algorithms would need fewer learning cycles to reach their optimal state.
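A toy calculation illustrates why a head start cuts the number of learning cycles. The sketch below is an assumption-laden simplification: it runs plain gradient descent on a simple quadratic loss whose optimum is known in advance, and the "warm" start is placed 60 percent of the way from the "cold" start to that optimum. The function name `steps_to_converge` and all the numbers are hypothetical, and real neural network loss surfaces are far less well behaved.

```python
def steps_to_converge(start, target, lr=0.1, tol=1e-3, max_steps=10_000):
    """Count gradient steps on 0.5 * sum((p - t)**2) until every parameter is within tol."""
    params = list(start)
    for step in range(max_steps):
        gap = max(abs(p - t) for p, t in zip(params, target))
        if gap < tol:
            return step
        # The gradient of the loss with respect to each parameter is (p - t).
        params = [p - lr * (p - t) for p, t in zip(params, target)]
    return max_steps

target = [1.5, -0.7, 2.0]    # the optimum (assumed known only for this demo)
cold = [0.0, 0.0, 0.0]       # stand-in for an uninformed starting point
warm = [c + 0.6 * (t - c) for c, t in zip(cold, target)]  # 60 percent of the way there

cold_steps = steps_to_converge(cold, target)
warm_steps = steps_to_converge(warm, target)
print(cold_steps, warm_steps)
```

On this idealized loss the saving is modest because the distance shrinks geometrically at every step; how much a predicted starting point saves on a real network is an empirical question the sketch does not answer.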

“The results are definitely super impressive,” DeepMind’s Peter Veličković told Quanta. “They basically cut down the energy costs significantly.”

As billion-parameter models give way to trillion-parameter models, it’s refreshing to see researchers crafting elegant solutions to complement brute force. Efficiency, it seems, may well be prized as much as scale in the years ahead.

Image Credit: Leni Johnston / Unsplash

Jason Dorrier
Jason is editorial director of Singularity Hub. He researched and wrote about finance and economics before moving on to science and technology. He's curious about pretty much everything, but especially loves learning about and sharing big ideas and advances in artificial intelligence, computing, robotics, biotech, neuroscience, and space.