Google recently launched Gemini Live, a groundbreaking feature that allows users to engage in semi-natural spoken conversations with an AI chatbot powered by the company’s latest large language model. This development is part of Google’s ongoing efforts to push the boundaries of artificial intelligence and voice interaction.
A Response to OpenAI and ChatGPT
Gemini Live marks a significant milestone for Google, as it is the company’s direct answer to OpenAI’s Advanced Voice Mode, ChatGPT’s nearly identical feature, which is currently in a limited alpha test. While OpenAI was the first to demo the feature, Google has rolled out a finalized version.
In my experience with Gemini Live, I found that these low-latency, verbal features feel much more natural than texting with ChatGPT or even interacting with Siri or Alexa. The AI chatbot responds quickly, often in under two seconds, and is able to pivot fairly rapidly when interrupted. While not perfect, Gemini Live represents the best way to use your phone hands-free that I’ve seen yet.
How Gemini Live Works
Before initiating a conversation with Gemini Live, users can choose from 10 distinct voices, each created through collaboration with voice actors. This variety is a notable improvement over OpenAI’s offering, which only features three voices. The human-like quality of these voices adds to the overall naturalness of the interaction.
In one example, a Google product manager verbally asked Gemini Live to recommend family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby. The AI successfully suggested Cooper-Garrod Vineyards in Saratoga, which met all the specified criteria. However, it’s worth noting that Gemini Live did seem to hallucinate a nearby playground called Henry Elementary School Playground, which is supposedly 10 minutes away from the vineyard. In reality, there are other playgrounds closer by.
Limitations and Future Plans
While Gemini Live shows great promise, it leaves room for improvement. For instance, when users interrupt the AI mid-sentence, the chatbot doesn’t always pick up on what’s being said. Separately, Google has stated that it’s not allowing Gemini Live to sing or mimic voices outside of the provided options, citing concerns about copyright law.
Furthermore, the company is not currently focused on getting Gemini Live to understand emotional intonation in a user’s voice – an aspect that OpenAI highlighted during its demo. However, Google does plan to add real-time video understanding capabilities in the future as part of Project Astra, a fully multimodal AI model that debuted at Google I/O.
Conclusion
Gemini Live represents a significant step forward for Google and the field of artificial intelligence. This feature offers users a more natural way to interact with their devices, making it easier to dive deeply into subjects without relying on simple Google Search. While there’s still room for improvement, Gemini Live is an exciting development that showcases the potential of AI in voice interaction.
The Future of Voice Interaction
As we move forward, it will be interesting to see how companies like Google and OpenAI continue to innovate in the field of voice interaction. With advancements like Gemini Live and Project Astra on the horizon, one thing is certain – the future of voice interaction holds much promise for improved natural language processing and a more seamless user experience.