What if a camera could not only take a picture but describe the scene, identify objects, and list the names of people within it? The artificial intelligence necessary to perform such a feat may seem a distant reality, but a communications project by a New York University grad student has ignited interest with a clever workaround. Matt Richardson's creation is called the Descriptive Camera. Take a picture, wait roughly three to six minutes, and the camera prints out a description of what's in the scene. Inside the camera are a webcam, a BeagleBone board, and a thermal printer.
How does an AI-less camera accomplish such a currently insurmountable task? Through artificial artificial intelligence.
The camera connects to the Internet through an Ethernet cable (ideally it would use Wi-Fi) and sends the image from the webcam to Amazon's Mechanical Turk, a service that allows a requester to post small tasks that workers, many overseas, complete for minimal payments. Once a worker describes the image, the text is transmitted back to the camera for printing.
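The round trip can be sketched in a few lines of Python. This is a minimal illustration, not Richardson's actual code: the function names are hypothetical, and the Mechanical Turk step is stubbed out (the real device posts a HIT and polls for a worker's answer, which is where the multi-minute wait comes from) so the flow can run without AWS credentials or hardware.

```python
def capture_image() -> bytes:
    """Stand-in for grabbing a frame from the webcam."""
    return b"\xff\xd8fake-jpeg-bytes"  # placeholder JPEG-ish bytes

def post_hit_and_wait(image: bytes) -> str:
    """Stand-in for posting a Mechanical Turk HIT containing the image
    and polling until a worker submits a description (three to six
    minutes on the real device)."""
    assert image, "no image captured"
    return "A cluttered desk with a laptop and a coffee mug."

def print_description(text: str) -> str:
    """Stand-in for the thermal printer: wrap the description to a
    narrow receipt-style column (32 characters is assumed here)."""
    width = 32
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))

def take_descriptive_photo() -> str:
    """The whole pipeline: capture, describe via human worker, print."""
    image = capture_image()
    description = post_hit_and_wait(image)
    return print_description(description)
```

The design point is that the "intelligence" lives entirely inside `post_hit_and_wait`: swap that one function for a call to a computer-vision API and the rest of the device is unchanged.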
It's a clever workaround that lives up to the name of the Amazon service, itself named after an 18th-century machine that seemingly played chess but actually housed a chess master within it. Using the service, Richardson paid $1.25 for each description.
Human analysis of each photo produces metadata, which is effectively a set of tags. How important is photo tagging? Back in 2010, Facebook CEO Mark Zuckerberg said that 95% of users had been tagged in a photo linking to their profiles, which today would mean roughly 800 million people tagged. Last year, the site also introduced tags that link to company pages, which means users are busily identifying products in pictures, whether cars, soft drinks, or designer jeans. Unlike the Amazon service, Facebook users do all of this tagging as a social activity, which means it's free (like most of the data generated on the site).
But developers are quickly eliminating the need for humans in photo analysis, at least when it comes to faces. Facial recognition tools have become powerful: a new system from Hitachi identifies faces in photos or videos by searching through 36 million images a second, and Face.com not only identifies faces but also characterizes them by gender, mood, and age.
While some may see the Descriptive Camera as a bunch of hype, Richardson sees it as a window into the future. That's probably why, along with getting class credit, he was willing to invest $200 in parts and many hours of programming to get it to work. One way to look at this camera is as a device for collecting searchable data. With all the recent talk about big data, it's becoming more obvious by the day that data is the new currency, and in that economy an image without metadata is money thrown away. In that light, think of how much data is going to waste in your photo albums. Perhaps that's why Facebook was willing to spend $1 billion to purchase Instagram, which introduced hashtags last year.
With that kind of money being invested in photos, Richardson’s interest in the project for his Interactive Communications class comes into better focus.
Even as a prototype, the Descriptive Camera may very well be the spark that ignites dozens of startups to hit the problem of photo descriptions hard, so that years from now, it’ll be difficult to imagine using a camera that doesn’t tell you everything about the photo.