The neural networks that power today’s AI are incredibly powerful, but training them can require entire server farms and huge amounts of energy. A new approach from IBM suggests we may be able to slash that dramatically by reducing the number of bits used to carry out their calculations.
Bits are the most basic units of information in a computer and can exist in one of two states—1 or 0, on or off. Many of the numbers computers have to deal with are much bigger than 1 or 0, though, so computers lump multiple bits together to represent them. You’ve probably noticed that a lot of software is described as either 32-bit or 64-bit. That’s simply telling you how many bits it uses to encode each chunk of data it handles.
The more bits you use, the more precise that piece of data can be. The easiest way to think about how this might be useful is in dealing with numbers. If you were trying to encode pi as a 64-bit floating-point number, you could carry it to roughly 15 or 16 significant digits, about twice as many as the 7 or so a 32-bit number can hold.
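To make that difference concrete, here is a minimal Python sketch that rounds pi to a 32-bit float and compares it with the 64-bit value Python uses by default. The helper name `round_to_float32` is made up for this illustration.

```python
import math
import struct


def round_to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    # Packing as "f" stores the value in 4 bytes; unpacking reads it back.
    return struct.unpack("f", struct.pack("f", x))[0]


pi64 = math.pi                    # Python floats are 64-bit
pi32 = round_to_float32(math.pi)  # same value squeezed into 32 bits

print(f"64-bit pi: {pi64:.17f}")
print(f"32-bit pi: {pi32:.17f}")
print(f"difference: {abs(pi64 - pi32):.3e}")
```

The two printed values agree for the first several digits and then diverge, which is the precision the smaller format gives up.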
For programs that have to deal with very large or very small numbers, like those that model the climate or quantum physics on supercomputers, this extra precision can be very useful. Even in consumer computers, it can come in handy for more complex tasks like video editing or gaming.
But not everything needs to run at such high precision, and that includes neural networks. While the standard data size for deep learning is still 32-bit, it’s becoming increasingly common to run neural networks in 16-bit. A wealth of research has shown that with careful design, you can go even lower without any significant loss in accuracy.
This is highly desirable, because lower-precision data requires less memory and less bandwidth, which can speed up calculations and reduce the energy required to carry out operations. That could be a major boon for an industry that is pumping out most of its headline-grabbing feats by building ever-larger and more power-hungry models.
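The memory saving is simple arithmetic. The sketch below assumes a hypothetical model with one billion parameters (a figure chosen for illustration, not taken from the IBM work) and counts only the storage for the parameters themselves.

```python
def model_size_gb(n_params: int, bits: int) -> float:
    """Storage needed for n_params numbers of the given bit width, in gigabytes."""
    return n_params * bits / 8 / 1e9


n_params = 1_000_000_000  # hypothetical model size, for illustration only

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: {model_size_gb(n_params, bits):.1f} GB")
```

Going from 32 bits to 4 bits cuts the storage for those numbers by a factor of eight, and the same factor applies to the data that has to move between memory and processor.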
Now researchers from IBM have taken this trend to the extreme. In a paper presented at the prestigious AI conference NeurIPS, the researchers showed that they could train AI to tackle a variety of image and language tasks with limited loss in accuracy using only four bits. This enabled them to speed up the training process more than sevenfold.
The main challenge the researchers had to solve is the fact that different parts of a neural network deal with numbers on very different scales. While the weights between neurons are normally some decimal ranging from -1 to 1, elsewhere you can get values as high as 1,000 or as low as 0.001.
That’s a problem, because you need to come up with a way of representing numbers of vastly different magnitudes using only the 16 distinct values that a 4-bit data scheme provides. Any evenly spaced system that can cover a range as broad as 0.001 to 1,000 is inevitably going to lose almost all precision when dealing with numbers between -1 and 1.
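A rough sketch shows why. It assumes the simplest possible scheme, 16 evenly spaced levels stretched from 0 to 1,000, which is an illustration rather than anything the researchers proposed. The function name and default range are hypothetical.

```python
def quantize_linear(x, lo=0.0, hi=1000.0, bits=4):
    """Snap x to the nearest of 2**bits evenly spaced levels between lo and hi."""
    levels = 2 ** bits                 # 16 levels for 4 bits
    step = (hi - lo) / (levels - 1)    # gap between neighboring levels
    index = round((x - lo) / step)     # nearest level
    index = min(max(index, 0), levels - 1)  # clamp into the valid range
    return lo + index * step


for x in [0.001, 0.5, 1.0, 400.0, 1000.0]:
    print(x, "->", quantize_linear(x))
```

With a gap of roughly 67 between levels, every value from 0.001 to 1 lands on the same level, zero, so the differences between typical weights disappear entirely.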
The researchers’ solution was to use a logarithmic scale, a common mathematical tool for representing values over large ranges. On these scales numbers don’t increase in equal increments, but jump up by a set factor each time, most commonly a factor of 10. So while a standard linear scale would go “1, 2, 3,” a logarithmic one could go “1, 10, 100.”
In this case the researchers used a factor of four for their scale. But importantly, they also used an adaptive scaling technique that enabled them to refit this logarithmic scale to different parts of the network that operate over different ranges of values.
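The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not IBM's actual number format: it spends one bit on the sign and three bits on a power of four, picks the scale from the largest value in each group of numbers, and lets exact zeros pass through unchanged (a real 4-bit format would have to account for that code). The function name `quantize_log4` is hypothetical.

```python
import math


def quantize_log4(values, exp_bits=3):
    """Snap each value to sign * scale * 4**(-k), with k in 0 .. 2**exp_bits - 1.

    The scale is refit per call to the largest magnitude in `values`,
    a stand-in for adapting the scale to each part of the network.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    n_levels = 2 ** exp_bits  # 8 magnitudes, plus a sign bit = 4 bits
    out = []
    for v in values:
        if v == 0:
            out.append(0.0)  # simplification: zero passes through
            continue
        # How many factor-of-4 steps below the largest magnitude is v?
        k = round(math.log(max_abs / abs(v), 4))
        k = min(max(k, 0), n_levels - 1)  # clamp to representable steps
        out.append(math.copysign(max_abs * 4.0 ** (-k), v))
    return out


weights = [0.9, -0.3, 0.05, 0.004]    # small values, made up for illustration
gradients = [800.0, 40.0, 3.0, 0.1]   # a much wider range, also made up

print(quantize_log4(weights))
print(quantize_log4(gradients))
```

Because the scale is recomputed for each group, the same eight magnitudes serve both the small weights and the much larger values, and any number inside the covered range stays within a factor of two of its original.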
This resulted in a massive speed-up that could significantly slash the energy, computing power, and time required to train neural networks. That could make it possible for resource-constrained academics to build the kinds of giant models coming out of companies like Google and OpenAI, or even train neural networks on edge devices like smartphones.
But there is a major caveat. The research was done in a simulation, not on real chips, because four-bit AI chips don’t exist yet. Study leader Kailash Gopalakrishnan told MIT Technology Review that such chips are on their way, though, and that IBM should have four-bit hardware ready in three to four years.
IBM is not the only one working on this problem, either. Most companies building dedicated AI hardware are busy boosting the low-precision capabilities of their chips. The future of AI looks increasingly fuzzy.