Google’s Gemini AI Hints at the Next Great Leap for the Technology

Google has launched Gemini, a new artificial intelligence system that can seemingly understand and speak intelligently about almost any kind of prompt—pictures, text, speech, music, computer code, and much more.

This type of AI system is known as a multimodal model. It's a step beyond previous algorithms, which could handle just text or just images. And it provides a strong hint of where AI may be going next: being able to analyze and respond to real-time information from the outside world.

Although Gemini’s capabilities might not be quite as advanced as they seemed in a viral video, which was edited from carefully curated text and still-image prompts, it is clear that AI systems are rapidly advancing. They are heading towards the ability to handle more and more complex inputs and outputs.

To develop new capabilities, AI systems are highly dependent on the kind of “training” data they have access to. They are exposed to this data to help them improve at what they do, including making inferences such as recognizing a face in a picture or writing an essay.

At the moment, the data that companies such as Google, OpenAI, Meta, and others train their models on is still mainly harvested from digitized information on the internet. However, there are efforts to radically expand the scope of the data that AI can work on. For example, by using always-on cameras, microphones, and other sensors, it would be possible to let an AI know what is going on in the world as it happens.

Real-Time Data

Google’s new Gemini system has shown that it can understand real-time content such as live video and human speech. With new data and sensors, AI will be able to observe, discuss, and act upon occurrences in the real world.

Self-driving cars, which already collect enormous amounts of data as they drive on our roads, are the most obvious example of this. This information ends up on the manufacturers’ servers where it is used not just in the moment of operating the vehicle, but to build long-term, computer-based models of driving situations that can support better traffic flow or help authorities identify suspicious or criminal behavior.

In the home, we already use motion sensors, voice assistants, and security cameras to detect activity and pick up on our habits. Other “smart” appliances are appearing on the market all the time. While early uses for this tech are familiar, such as optimizing heating for better energy usage, the understanding of habits will become much more advanced.

This means that an AI could not only infer activities in the home but even predict what will happen in the future. This data could then be used, for instance, by doctors to detect the early onset of conditions such as diabetes or dementia, as well as to recommend and follow up on changes in lifestyle.

As AI's knowledge of the real world becomes more comprehensive, it will act as a companion. At the grocery store, I will be able to discuss the best and most economical ingredients for a meal I am planning. At work, AI will be able to remind me of the names and interests of clients in a face-to-face meeting, and suggest the best way to secure their business. When I am on a trip in a foreign country, it will be able to maintain an ongoing conversation about local tourist attractions while keeping an eye on any potentially dangerous situations I might encounter.

Privacy Implications

There are enormous positive opportunities that come with all this new data, but there is an equal risk of overreach and intrusion on people’s privacy. As we have seen, users have so far been more than happy to trade a staggering amount of their personal information in return for access to free products, such as social media and search engines.

The trade-offs in the future will be even greater and potentially more dangerous, as AI gets to know and support us in every aspect of everyday life.

If given a chance, the industry will continue to expand its data collection into all aspects of life, even offline ones. Policymakers need to understand this new landscape and ensure the benefits balance the risks. They will need to monitor not just the power and pervasiveness of the new AI models, but also the data they collect.

When AI expands its capabilities into the next frontier—the real world—only our imaginations will limit the possibilities.

This article is republished from The Conversation under a Creative Commons license.

Image Credit: Google DeepMind / Unsplash

Lars Erik Holmquist
Lars Erik Holmquist is professor of design and innovation at the School of Art and Design at Nottingham Trent University (NTU). He is an internationally leading researcher in human-computer interaction, interaction design, and ubiquitous computing. He has published over 100 articles in fields such as HCI, design methods, mobile applications, and ubiquitous computing, which have been cited more than 4500 times. Before joining NTU, he was professor of innovation at Northumbria University's Department of Design.