The Race to a Billion Billion Operations Per Second: An Exaflop by 2018?


Control Data Corp’s (CDC) first supercomputer, the CDC 6600, operated at a speed of three megaflops (10^6 floating-point operations per second). A half century on, our most powerful supercomputers are a billion times faster. But even that impressive mark will inevitably fall. Engineers are eyeing an exaflop (10^18 flops)—and some think they’ll get there by 2018.
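To get a feel for the scale of that jump, here’s a quick back-of-envelope check using only figures that appear in this article (the 16.3-petaflop machine is today’s champion, Sequoia, discussed below):

```python
# Rough scale comparison; all figures are taken from the article itself.
CDC_6600_FLOPS = 3e6        # three megaflops (1964)
SEQUOIA_FLOPS  = 16.3e15    # 16.3 petaflops (2012)
EXAFLOP        = 1e18       # the hoped-for 2018 milestone

print(f"Sequoia vs. CDC 6600: ~{SEQUOIA_FLOPS / CDC_6600_FLOPS:.1e}x faster")
print(f"CDC 6600s needed to equal one exaflop: ~{EXAFLOP / CDC_6600_FLOPS:.1e}")
```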

What’s so special about an exaflop? Other than the fact it’s a quintillion operations a second? Simple. We can always use more computing power.

Supercomputers enable scientists to model nature—protein folding, the Big Bang, Earth’s climate—as never before. China’s Tianhe-1A (2.57 petaflops) recently ran a 110 billion atom model through 500,000 steps. The simulation covered a mere 0.116 nanoseconds of real time, and it took the machine three hours to complete.
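A back-of-envelope reading of those figures shows just how small a slice of reality that is. The one-microsecond target in the last two lines is purely an illustrative assumption, not part of the Tianhe-1A run:

```python
# Arithmetic derived from the Tianhe-1A run described above.
SIMULATED_TIME_S = 0.116e-9    # 0.116 nanoseconds of simulated time
STEPS = 500_000                # timesteps in the run
WALL_TIME_S = 3 * 3600         # three hours of machine time

dt = SIMULATED_TIME_S / STEPS           # simulated time per step
wall_per_step = WALL_TIME_S / STEPS     # wall-clock time per step
print(f"timestep: {dt:.2e} s (~{dt * 1e15:.2f} femtoseconds)")
print(f"wall-clock per step: {wall_per_step * 1e3:.1f} ms")

# Hypothetical: how long would one microsecond of simulated time take at this rate?
steps_needed = 1e-6 / dt
years = steps_needed * wall_per_step / (3600 * 24 * 365)
print(f"one microsecond would need ~{steps_needed:.1e} steps, roughly {years:.1f} years of runtime")
```

At that rate, even a single microsecond of molecular motion would take years of machine time.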

Yet even the simplest natural systems have vastly more particles playing out over vastly greater timescales. There are roughly as many molecules in ten drops of water as there are stars in the universe.

So, while a petaflop is good, an exaflop is better.

Further, Henry Markram’s Blue Brain Project estimates a full simulation of the human brain would require about an exaflop. Might insights gleaned from such a simulation lead to breakthroughs in AI? Maybe. (See here for more on that debate.)

Whether it leads to a breakthrough in AI, or a deeper understanding of the human brain, or is just a killer scientific model-maker, the first exaflop machine will be a data-processing beast. And world powers are gunning for it.

Japan’s K computer, recently overtaken by IBM’s Sequoia.

International competition for the top spot is as tight as it’s ever been. China knocked Oak Ridge’s Cray Jaguar off the top of the pile with their Tianhe-1A in 2010 (2.57 petaflops). Then it was Japan’s turn to lead the pack with their K computer in 2011 (10.5 petaflops). And the US retook the lead with IBM’s Sequoia in 2012 (16.3 petaflops).

The pace is blistering. Today’s top speed (16.3 petaflops) is 16 times faster than its counterpart four years ago (1.04 petaflops). And Oak Ridge National Laboratory is converting its ex-champion Cray Jaguar into the 20-petaflop Titan (operational later this year). It’s believed Titan’s capacity will be upwards of 35 petaflops.

But even at 35 petaflops, an exaflop (1,000 petaflops) seems distant. Is 2018 a realistic expectation? Sure, it’s plausible. It took 21 years to go from megaflops in 1964 (CDC 6600) to gigaflops in 1985 (Cray 2). But only 11 years to break the teraflop barrier in 1996 (ASCI Red). And just 12 years to enter petaflop territory in 2008 (Roadrunner).

Clocking an exaflop by 2018 would be a decade’s development—a record pace, but not too far outside the realm of reason. The chart below maps the top supercomputers for as long as they’ve been officially ranked by Top500. Today’s pace puts processing power within range of an exaflop by 2018.

Source data: www.top500.org
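Since the chart itself isn’t reproduced here, the trend can be sketched with just two figures quoted in this article: Roadrunner’s 1.04 petaflops in 2008 and Sequoia’s 16.3 petaflops in 2012. Assuming that growth rate simply continues is, of course, the big “if”:

```python
import math

# Log-linear extrapolation from two Top500 #1 figures quoted in the article.
YEAR_A, FLOPS_A = 2008, 1.04e15   # Roadrunner
YEAR_B, FLOPS_B = 2012, 16.3e15   # Sequoia
TARGET = 1e18                     # one exaflop

rate = (math.log10(FLOPS_B) - math.log10(FLOPS_A)) / (YEAR_B - YEAR_A)  # decades per year
years_to_target = (math.log10(TARGET) - math.log10(FLOPS_B)) / rate

print(f"growth: ~{10**rate:.1f}x per year")
print(f"projected exaflop year: ~{YEAR_B + years_to_target:.0f}")   # lands right around 2018
```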

But whether supercomputer speed can continue increasing at the current pace is debatable.

“The laws of physics are hunting us down,” says Mike McCoy of Lawrence Livermore National Laboratory. “One of the things that make processors work faster is increasing the frequency of the processors. We found that we can’t increase the frequency like we used to simply because the amount of heat generated would melt the computer.”

So, if you can’t make the parts faster, use more of them, right? You bet. The fastest computer in the world, IBM’s Sequoia, packs 1.6 million processors.

The problem is that energy consumption increases in lockstep with size, absent corresponding efficiency gains. Sequoia operates on an average of six to seven megawatts; each of its 96 racks radiates enough heat to power 50 single-family homes; and the system requires 3,000 gallons of water a minute to carry all that heat away.
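A naive extrapolation (an assumption made purely for illustration, ignoring any efficiency gains) shows why power is the wall everyone worries about:

```python
# If an exaflop machine were just a scaled-up Sequoia, with no improvement
# in flops per watt, its power draw would grow in proportion to its speed.
SEQUOIA_FLOPS = 16.3e15     # 16.3 petaflops
SEQUOIA_POWER_MW = 6.5      # midpoint of the six-to-seven-megawatt figure above
TARGET_FLOPS = 1e18         # one exaflop

scale = TARGET_FLOPS / SEQUOIA_FLOPS
print(f"scale factor: ~{scale:.0f}x")
print(f"naively scaled power draw: ~{scale * SEQUOIA_POWER_MW:.0f} MW")
```

Roughly 400 megawatts is the output of a sizable power plant, which is why efficiency, not just more hardware, is the crux of the exascale problem.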

Apart from massive energy requirements, engineers can’t just keep adding processors indefinitely. The more cores they add, the more difficult it is to synchronize them all. At some point, scaling up further will yield diminishing returns.
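The article doesn’t name it, but the textbook way to picture those diminishing returns is Amdahl’s law: if any fixed fraction of a job can’t be parallelized, adding processors eventually stops helping. A minimal sketch, assuming an arbitrary 5 percent serial fraction:

```python
# Amdahl's law: speedup(N) = 1 / (serial + (1 - serial) / N).
# The 5% serial fraction is an arbitrary illustrative assumption.
SERIAL_FRACTION = 0.05

def speedup(n_processors: int) -> float:
    """Ideal speedup on n_processors when a fixed fraction of the work is serial."""
    return 1.0 / (SERIAL_FRACTION + (1.0 - SERIAL_FRACTION) / n_processors)

for n in (1, 16, 1_000, 100_000, 1_600_000):   # up to Sequoia's 1.6 million processors
    print(f"{n:>9} processors -> {speedup(n):6.1f}x speedup")
```

With a 5 percent serial fraction the speedup never exceeds 20x, no matter how many processors are added; synchronization and serial overhead, not raw core counts, set the ceiling.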

That’s why the US Defense Advanced Research Projects Agency (DARPA) is funding a project titled the Power Efficiency Revolution for Embedded Computer Technologies (PERFECT). PERFECT will explore alternative technologies to increase processor efficiency. Two such technologies are already in development.

Titan will use NVIDIA Tesla GPUs. Credit: NVIDIA

The first approach increases overall efficiency by parceling out special duties (e.g., graphics) to specialized processors (GPUs, or graphics processing units). It’s called “massive heterogeneous processing concurrency.” And in fact, Titan will make use of this approach. The second idea addresses power. Near-threshold voltage (NTV) technology significantly lowers operating voltage to make a more energy-efficient chip.
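The article doesn’t spell out the physics, but the standard CMOS dynamic-power relation, P ≈ activity × capacitance × V² × frequency, shows why lowering voltage pays off so handsomely. The numbers below are illustrative assumptions, not the specs of any real NTV chip:

```python
# Dynamic switching power of a CMOS chip: P = activity * capacitance * V^2 * f.
# Voltages, frequencies, and capacitance here are illustrative assumptions.
def dynamic_power(v_supply: float, freq_hz: float,
                  activity: float = 0.1, capacitance_f: float = 1e-9) -> float:
    return activity * capacitance_f * v_supply**2 * freq_hz

nominal = dynamic_power(v_supply=1.0, freq_hz=3e9)   # conventional operating point
ntv     = dynamic_power(v_supply=0.5, freq_hz=1e9)   # near threshold: lower voltage, lower clock

print(f"power ratio (nominal / NTV): ~{nominal / ntv:.0f}x")
print(f"energy-per-cycle ratio:      ~{(nominal / 3e9) / (ntv / 1e9):.0f}x")
```

The quadratic savings in energy per operation is why NTV pairs naturally with the “use more, slower processors” strategy: each chip does less per second but far more per joule.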

Both approaches are still young and have their obstacles. It’s difficult to spread work evenly across large numbers of specialized chips. And chips operating at lower voltages flirt with the transistors’ on/off point, making it paramount to precisely control current leakage—a difficult thing to do.

Nevertheless, Oak Ridge National Laboratory, home to the soon-operational 20-petaflop Titan, is optimistic. Oak Ridge engineers foresee “two systems beyond Titan to achieve exascale performance by about 2018.” The first will be a 200-petaflop prototype, using exascale technologies. The second will be the real deal—an exaflop behemoth.

And keep in mind, even as the boundaries at the top are pushed, the speeds already broken are more commonly reached. Roadrunner was the world’s only petaflop machine in 2008. As of June 2012, there are 20 computers operating at a petaflop or more. You can still do a lot with all that power. IBM’s Jeopardy! champ, Watson, operates at a mere 80 teraflops, yet, with some ingenious software, it defeated humans at their own game.

Despite the challenges, exascale computing seems attainable at some point in the next ten or fifteen years. How much further we go will depend on fundamentally new innovations. But when hasn’t that been the case? Human ingenuity is forever making the impossible possible.

Discussion — 8 Responses

  • Dave Sill November 1, 2012 on 1:53 pm

    There’s no such thing as a FLOP, MegaFLOP, PetaFLOP, or ExaFLOP. The base unit is FLOPS–floating point operations per second. A FLOP would be a floating point operation per. Per what? FLOPS isn’t plural, it just sounds plural.

    • vmagna Dave Sill November 1, 2012 on 6:22 pm

      Yes, there is such a thing as a flop
      http://en.wikipedia.org/wiki/Flop

      If you were to phrase your statement as “there is no such thing as a FLOP… as a term referring to computing operations,” you would have made sense.

      Correction – corrected

      • turtles_allthewaydown vmagna November 2, 2012 on 1:45 pm

        Actually Dave is correct, as he used the acronym FLOP, not the word Flop.

        Correction – correction – corrected

        Okay, before you get totally nitpicky, there are other expansions of FLOP, and some might say there is the term FLoating-point OPeration. But it has no reference to time, and doesn’t make sense for measuring speed as in the article above.

  • Tracy R. Atkins November 2, 2012 on 7:50 am

    Didn’t Kurzweil estimate the human brain as operating around 10^16 CPS at the lower end, and around 10^19 at the upper theoretical limit?

    • turtles_allthewaydown Tracy R. Atkins November 2, 2012 on 1:40 pm

      Maybe so, but having the raw power means nothing unless we can program it properly. Software is lagging behind the hardware in many aspects. The understanding of the brain itself is even further behind, so we sure aren’t ready to fully model a real brain, even when we get an exaflops computer.

  • eldras November 3, 2012 on 4:41 am

    Quantum computers are going to calculate near infinities in a few seconds. IBM is due to fix the error problems by 2022.
    Super-recursive algorithms may outperform QCs.

    Supercomputers need radical designs to compete.

  • StableAnarchy November 7, 2012 on 10:23 pm

    “Oak Ridge Laboratories, home to the soon-operational 20-gigaflop Titan”

    * petaFLOPS

  • Gary Bernstein March 28, 2013 on 10:15 pm

    Brain FLOPS under-estimated? Astrocytes missing:
    ==
    Astrocytes seem left out of brain FLOPS estimates, which take, IIRC, num_synapses * frequency ~= 38 Peta-FLOPs (IBM?)

    Is it reasonable to leave out astrocytes? If not, by what factor would the FLOPs estimate of the brain need to be increased? Astrocytes are the most numerous of brain cells, and interconnect with several thousand to hundreds of thousands of synapses.

    Why would astrocytes need to be considered?

    1) Astrocytes affect synaptic signal strength, and thereby neuron firing. Astrocytes communicate with each other and with synapses, so shouldn’t the processing represented by their collective states be considered as more than a single alteration of the “floating point” value, and, rather, an alteration of the calculation that produces the value?

    2) Something else to consider: recent discoveries such as augmented intelligence in mice injected with human astrocytes.