2021 Could Be a Banner Year for AI—If We Solve These 4 Problems

If AI has anything to say to 2020, it’s “you can’t touch this.”

Last year may have severed our connections with the physical world, but in the digital realm, AI thrived. Take NeurIPS, the crown jewel of AI conferences. While lacking the usual backdrop of the dazzling mountains of British Columbia or the beaches of Barcelona, the annual AI extravaganza highlighted a slew of “big picture” problems (bias, robustness, generalization) that will occupy the field for years to come.

On the nerdier side, scientists further explored the intersection between AI and our own bodies. Core concepts in deep learning, such as backpropagation, were considered a plausible means by which our brains “assign fault” in biological networks—allowing the brain to learn. Others argued it’s high time to double-team intelligence, combining the reigning AI “golden child” method—deep learning—with other methods, such as those that guide efficient search.

Here are four areas we’re keeping our eyes on in 2021. They touch upon outstanding AI problems, such as reducing energy consumption, nixing the need for an exorbitant number of learning examples, and teaching AI some good ol’ common sense.

Greed: Less Than One-Shot Learning

You’ve heard this a billion times: deep learning is extremely greedy, in that its algorithms need thousands of examples (if not more) to show basic signs of learning, such as identifying a dog or a cat, or making Netflix or Amazon recommendations.

It’s extremely time-consuming, wasteful of energy, and a head-scratcher in that it doesn’t match our human experience of learning. Toddlers need to see just a few examples of something before they remember it for life. Take the concept of “dog”: regardless of breed, a kid who’s seen a few dogs can recognize a slew of different breeds without ever having laid eyes on them. Now take something completely alien: a unicorn. A kid who understands the concept of a horse and a narwhal can infer what a unicorn looks like by combining the two.

In AI speak, this is “less than one-shot” learning, a sort of holy-grail ability that allows an algorithm to learn more objects than the number of examples it was trained on. If realized, the implications would be huge. Currently bulky algorithms could potentially run smoothly on mobile devices with lower processing capabilities. Any sort of “inference,” even if it doesn’t come with true understanding, could make self-driving cars far more efficient at navigating our object-filled world.

Last year, one team from Canada suggested the goal isn’t a pipe dream. Building on work from MIT analyzing handwritten digits (a common “toy problem” in computer vision), they distilled 60,000 images into 5 using a concept called “soft labels.” Rather than specifying what each number should look like, they labeled each distilled example as a blend of digits: a certain percentage “3,” a certain percentage “8,” a certain percentage “0.” Shockingly, the team found that with carefully constructed labels, just two examples could in theory encode thousands of different objects. Karen Hao at MIT Technology Review gets into more detail here.
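To get a feel for how soft labels stretch a handful of examples, here is a rough toy sketch in Python (an illustration of the general idea with made-up points, not the researchers’ actual code or data). Two prototype examples carry probability distributions over three classes, and a simple distance-weighted nearest-prototype rule ends up carving out three decision regions from only two examples.

```python
import numpy as np

# Two "training examples" in 2D, each tagged with a soft label: a probability
# distribution over THREE classes rather than a single hard class.
prototypes = np.array([[0.0, 0.0],
                       [4.0, 0.0]])
soft_labels = np.array([[0.6, 0.4, 0.0],   # mostly class 0, partly class 1
                        [0.0, 0.4, 0.6]])  # mostly class 2, partly class 1

def predict(x):
    """Weight each prototype's label distribution by inverse distance,
    sum the contributions, and return the most probable class."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    weights = 1.0 / (dists + 1e-9)
    scores = (weights[:, None] * soft_labels).sum(axis=0)
    return int(np.argmax(scores))

# Points near each prototype take its dominant class; points in the middle,
# where the two 0.4 contributions add up, fall into a THIRD class that has
# no example of its own.
print(predict(np.array([-0.5, 0.0])))  # -> 0
print(predict(np.array([2.0, 0.0])))   # -> 1
print(predict(np.array([4.5, 0.0])))   # -> 2
```

The same principle, scaled up with carefully engineered examples and labels, is what lets a classifier recognize more categories than it has training images.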

Brittleness: A Method to Keep AI Hacker-Proof

For everything AI can do, it’s flawed at defending against insidious attacks that target its input data. Slight or seemingly random perturbations to a dataset, often undetectable by the human eye, can enormously alter the final output; algorithms that fail this way are dubbed “brittle.” Too abstract? An AI trained to recognize cancer from a slew of medical scans annotated in yellow marker by a human doctor could learn to associate “yellow” with “cancer.” A more malicious example is deliberate tampering: stickers placed on a roadway can trick Tesla’s Autopilot system into mistaking lanes and careening into oncoming traffic.

Overcoming accidental brittleness requires AI to learn a certain level of flexibility, but deliberate sabotage, known as “adversarial attacks,” is an increasingly recognized problem of its own. Here, hackers can change the AI’s decision-making process with carefully crafted inputs. When it comes to network security, medical diagnoses, or other high-stakes uses, building defense systems against these attacks is critical.

This year, a team from the University of Illinois proposed a powerful way to make deep learning systems more resilient. They used an iterative approach, having two neural nets battle it out: one for image recognition, the other for generating adversarial attacks. Like a cat-and-mouse game, the “enemy” neural net tries to fool the computer vision network into recognizing fictitious things; the latter network fights back. While far from perfect, the study highlights one increasingly popular approach to making AI more resilient and trustworthy.
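To make the cat-and-mouse idea concrete, here is a stripped-down sketch in PyTorch (a generic illustration, not the Illinois team’s method). The “attacker” uses the well-known fast gradient sign method to nudge images toward misclassification, and the classifier then trains on those perturbed images so the same trick stops working. The model and data are toy stand-ins.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def fgsm_attack(images, labels, epsilon=0.1):
    """Fast Gradient Sign Method: nudge each pixel in the direction that
    most increases the classifier's loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), labels)
    loss.backward()
    return (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(images, labels):
    """One round of the game: generate adversarial images, then update the
    classifier so it labels them correctly anyway."""
    adv_images = fgsm_attack(images, labels)
    optimizer.zero_grad()
    loss = loss_fn(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Random stand-in data shaped like MNIST digits, purely for illustration.
x = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
print(adversarial_training_step(x, y))
```

Repeating this loop, with ever-stronger attackers, is the basic recipe behind most adversarial training schemes.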

AI Savant Syndrome: Learning Common Sense

One of the most impressive algorithms of the past year is GPT-3, a marvel by OpenAI that spits out eerily human-like language. Dubbed “one of the most interesting and important AI systems ever produced,” GPT-3 is the third generation of an algorithm that produces writing so “natural” that at a glance it’s hard to tell machine from human.

Yet GPT-3’s language proficiency is, upon deeper inspection, just a thin veil of “intelligence.” Because it’s trained on human language, it’s also locked into the intricacies and limitations of our everyday phrases, without any understanding of what they mean in the real world. It’s akin to learning slang from Urban Dictionary instead of living it. An AI may learn to associate “cats and dogs” with “rain” in every context, drawing that inference solely from the vernacular we use to describe massive downpours.

One way to make GPT-3, or any natural language-producing AI, smarter is to combine it with computer vision. Teaching language models to “see” is an increasingly popular area in AI research. The technique pairs the strengths of language models with those of image models. AI language models, including GPT-3, learn through a process called “unsupervised training,” which means they can parse patterns in data without explicit labels. In other words, they don’t need a human to tell them grammatical rules or how words relate to one another, which makes it easy to scale learning by bombarding the AI with tons of example text (the toy snippet below shows why those training targets come for free). Image models, on the other hand, better reflect physical reality, but they typically require manual labeling, which makes training slower and more tedious.
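Here is a tiny illustration of why the text side scales so easily, assuming nothing more than a raw sentence (this is not GPT-3’s actual tokenizer or training code). The “labels” are simply the next word in the text, so no human annotator is needed, which is exactly what makes it cheap to train on oceans of text but leaves the model ungrounded.

```python
# Build (context, next-word) training pairs straight from raw text.
text = "it was raining cats and dogs all afternoon"
tokens = text.split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(context, "->", target)
# ['it'] -> was
# ['it', 'was'] -> raining
# ['it', 'was', 'raining'] -> cats
```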

Combining the two yields the best of both worlds. A robot that can “see” the world captures a sort of physicality, or common sense, that’s missing from analyzing language alone. One study in 2020 smartly combined both approaches. The researchers started with language, using a scalable approach to write captions for images based on the inner workings of GPT-3 (details here). The takeaway is that the team was able to connect the physical world, represented through images, with the language we use to describe it.
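For a flavor of how vision and language can be tied together, here is a hedged, generic sketch in PyTorch (not the study’s actual method, and with deliberately tiny stand-in encoders). An image encoder and a text encoder are trained so that each image lands near its own caption in a shared embedding space, one common recipe for grounding words in pictures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyImageEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
    def forward(self, images):
        return F.normalize(self.net(images), dim=-1)

class TinyTextEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # bag-of-words encoder
    def forward(self, token_ids):
        return F.normalize(self.embed(token_ids), dim=-1)

def contrastive_loss(img_vecs, txt_vecs, temperature=0.07):
    """Pull each image toward its own caption and push it away from the
    other captions in the batch (and vice versa)."""
    logits = img_vecs @ txt_vecs.t() / temperature
    targets = torch.arange(len(img_vecs))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 random "images" paired with 8 random 5-token "captions."
images = torch.rand(8, 3, 32, 32)
captions = torch.randint(0, 1000, (8, 5))
loss = contrastive_loss(TinyImageEncoder()(images), TinyTextEncoder()(captions))
print(loss.item())
```

The specific architecture matters less than the division of labor: the image side supplies the physical grounding that the text side, trained on words alone, inherently lacks.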

Translation? A blind, deaf, and utterly quarantined AI learns a sort of common sense. For example, “cats and dogs” can just mean pets, rather than rain.

The trick is still mostly experimental, but it’s an example of thinking outside the artificial confines of a particular AI domain: models that combine the two areas, natural language processing and computer vision, work better than either alone. Imagine an Alexa with common sense.

Deep Learning Fatigue

Speaking of thinking outside the box, DeepMind is among those experimenting with combining different approaches to AI into something more powerful. Take MuZero, an Atari-smashing algorithm they released just before Christmas.

Unlike DeepMind’s earlier Go-, chess-, and shogi-slaying AI wizards, MuZero has another trick up its sleeve. It listens to no one, in that it doesn’t start with any prior knowledge of the game’s rules or decision-making processes. Rather, it learns without a rulebook, simply by observing the game’s environment, much like a novice human watching a new game. In this way, after millions of games, it doesn’t just learn how the game behaves; it also picks up more general policies that help it get ahead, and it can evaluate its own mistakes in hindsight.

Sounds pretty human, eh? In AI vernacular, the engineers combined two different approaches, tree-based search (in MuZero’s case, Monte Carlo tree search) and a learned model, to make an AI great at planning winning moves. For now, it’s only been shown to master games at a level similar to AlphaGo. But we can’t wait to see what this sort of cross-fertilization of ideas in AI can lead to in 2021.
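Here is a heavily simplified schematic of that combination (illustrative Python, nothing like DeepMind’s real implementation). Small untrained networks stand in for the learned model, and a brute-force lookahead stands in for MuZero’s Monte Carlo tree search; the point is only that planning happens entirely inside the learned model, never by consulting the game’s rules.

```python
import torch
import torch.nn as nn

ACTIONS, HIDDEN = 4, 32

representation = nn.Linear(16, HIDDEN)               # observation -> hidden state
dynamics = nn.Linear(HIDDEN + ACTIONS, HIDDEN + 1)   # (state, action) -> next state, reward
value = nn.Linear(HIDDEN, 1)                         # hidden state -> predicted value

def imagine(state, action):
    """Roll the LEARNED model forward one step; no game rules are consulted."""
    one_hot = torch.zeros(ACTIONS)
    one_hot[action] = 1.0
    out = dynamics(torch.cat([state, one_hot]))
    return out[:-1], out[-1]  # next hidden state, predicted reward

def plan(observation, depth=2):
    """Tiny exhaustive lookahead over imagined futures (real MuZero uses
    Monte Carlo tree search, but the principle is the same)."""
    def search(state, d):
        if d == 0:
            return value(state).item()
        return max(reward.item() + search(next_state, d - 1)
                   for next_state, reward in (imagine(state, a) for a in range(ACTIONS)))
    root = representation(observation)
    scores = [imagine(root, a)[1].item() + search(imagine(root, a)[0], depth - 1)
              for a in range(ACTIONS)]
    return int(torch.tensor(scores).argmax())  # best action found by planning

print(plan(torch.rand(16)))  # pick a move for a random toy observation
```

A real system would train these networks from millions of games and search far more cleverly, but the division of labor, a learned model plus planning, is the part worth noticing.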

Image Credit: Oleg Gamulinskiy from Pixabay

Shelly Fan (https://neurofantastic.com/)
Shelly Xuelai Fan is a neuroscientist-turned-science writer. She completed her PhD in neuroscience at the University of British Columbia, where she developed novel treatments for neurodegeneration. While studying biological brains, she became fascinated with AI and all things biotech. Following graduation, she moved to UCSF to study blood-based factors that rejuvenate aged brains. She is the co-founder of Vantastic Media, a media venture that explores science stories through text and video, and runs the award-winning blog NeuroFantastic.com. Her first book, "Will AI Replace Us?" (Thames & Hudson) was published in 2019.