Robot ‘Iron Chef’ Sharpens Skills With YouTube Cooking Videos

If robots and AI are our technological children (and of course they are!), what’s the best way to teach them about the world? Why, the internet, of course. Using deep learning, a popular machine learning technique, computer scientists are rearing the next generation of infant AIs on a steady diet of online images and videos.

A bit like the scene in The Matrix when Neo downloads kung fu directly to his brain, deep learning programs rapidly absorb large amounts of data and learn from it. Of course, the former is fictional. The latter, not at all.

In computer vision, for example, programs fed thousands of images can independently learn to isolate and identify individual components in them. While you won’t just stumble upon ten thousand cat pictures in a desk drawer, the web is a treasure trove of such data. Fertile ground for young, impressionable programs.

Now, artificial intelligence researchers are moving beyond still images. University of Maryland researchers, for example, recently trained deep learning software with 88 YouTube cooking videos. After binging on the videos, the software learned to identify simple culinary tasks and form commands for a robot arm.

How does it work?

The program identifies hands and classifies their position.

The program isolates hands in the video and assigns one of six “grasp” positions. It identifies objects and classifies them as one of 48 foods or tools. Finally, it identifies the action being performed—combining the lot into a command for execution by a robotic arm and grasper.
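As a rough illustration of that last step, here is a minimal sketch of how recognized pieces might be combined into a single command. The label names, the `Observation` type, and the `build_command` helper are all hypothetical; the article only says there are six grasp types and 48 food or tool classes, not what they are called or how the researchers format their commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What the recognizer reports for one hand (all label names are assumed)."""
    hand: str   # "left" or "right"
    grasp: str  # one of six grasp classes
    obj: str    # one of 48 food or tool classes


def build_command(action: str, tool: Observation, target: Observation) -> str:
    """Combine the recognized action, tool, and target into one command string."""
    return (
        f"{action}(tool={tool.obj}, target={target.obj}, "
        f"{tool.hand}_grasp={tool.grasp}, {target.hand}_grasp={target.grasp})"
    )


# Hypothetical recognition results for a single video clip.
knife = Observation(hand="right", grasp="power-small", obj="knife")
tofu = Observation(hand="left", grasp="precision-large", obj="tofu")

command = build_command("cut", knife, tofu)
print(command)
```

A real system would hand a structured command like this to a motion planner rather than print a string, but the idea is the same: grasp, object, and action are recognized separately and then assembled.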

Simple as the tasks are, teaching software from raw YouTube footage isn’t easy. Background variation and noise make it harder to pick out a video’s critical elements. To improve accuracy, the program calculates the most probable action by associating verbs and nouns in the video.
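One way to picture the verb-and-noun idea is to weight each candidate action by how well it fits the objects on screen. The sketch below is an assumption-laden toy, not the researchers' model: the scores, the co-occurrence table, and the `most_probable_action` function are all made up for illustration.

```python
# Hypothetical visual classifier scores: on its own, vision slightly prefers "stir".
visual_scores = {"stir": 0.45, "cut": 0.40, "pour": 0.15}

# Hypothetical verb-noun association strengths (e.g. estimated from text).
cooccurrence = {
    ("cut", "knife"): 0.70, ("cut", "cucumber"): 0.60,
    ("stir", "knife"): 0.05, ("stir", "cucumber"): 0.10,
    ("pour", "knife"): 0.02, ("pour", "cucumber"): 0.05,
}


def most_probable_action(visual_scores, objects, cooccurrence, floor=1e-3):
    """Rescore each verb by multiplying in its association with each seen noun."""
    combined = {}
    for verb, score in visual_scores.items():
        weight = score
        for noun in objects:
            # Unseen verb-noun pairs get a small floor instead of zero.
            weight *= cooccurrence.get((verb, noun), floor)
        combined[verb] = weight
    total = sum(combined.values())
    probabilities = {verb: w / total for verb, w in combined.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


best, probabilities = most_probable_action(
    visual_scores, ["knife", "cucumber"], cooccurrence
)
print(best, round(probabilities[best], 3))
```

In this toy case the noisy visual guess ("stir") is overruled, because a knife and a cucumber are far more strongly associated with cutting.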

Put to the test, the deep learning software accurately recognized and classified objects, and used what it saw in the videos to form commands for a robotic arm across a range of related actions. Looking forward, the team thinks software like theirs has great potential for robot learning.

“We believe this preliminary integrated system raises hope towards a fully intelligent robot for manipulation tasks that can automatically enrich its own knowledge resource by ‘watching’ recordings from the World Wide Web,” the researchers wrote in a paper describing the project.

As they imply, the technique would be useful beyond cooking. You’d be surprised what’s already on YouTube. Robots might learn how to wash dishes or fold laundry. Agricultural robots might learn to pick fruit.

Of course, you can’t learn everything on YouTube. (Can you?) But no one is talking about turning these programs loose online. For now, researchers will select the action to be learned and the group of videos best suited to the task. Even so, the technique could make robot learning faster and easier.

And as the volume of online information grows, perhaps future robots will be able to tie a tie like a pro, chop carrots like Julia Child, or learn kung fu (yikes!) like Neo—just by watching YouTube.

Image Credit: Alberto D’Ottavi/Flickr; Yezhou Yang/Yi Li/Cornelia Fermuller/Yiannis Aloimonos/University of Maryland

Jason Dorrier
Jason is editorial director of Singularity Hub. He researched and wrote about finance and economics before moving on to science and technology. He's curious about pretty much everything, but especially loves learning about and sharing big ideas and advances in artificial intelligence, computing, robotics, biotech, neuroscience, and space.