Google I/O 2024, Alphabet’s (NASDAQ:GOOGL) annual developer conference, was held on Tuesday (May 14) at the Shoreline Amphitheatre in Mountain View, California.
During the event, Google’s development team marked a new era in artificial intelligence (AI) by showcasing advances in Gemini, its AI assistant, along with other AI products and features across its portfolio.
Read on to learn about five of the most important takeaways from the event.
1. Gemini advances keep coming
I/O’s keynote presentation centered on the advanced version of Gemini 1.5 Pro, which has a 1 million token context window. It was launched as an experimental feature in February. Sundar Pichai, Google’s CEO, announced that the new model will become globally available to developers through Gemini Advanced starting in June. With a longer context window, Gemini can analyze more information at once, improving its handling of long or complex inputs.
Aside from that, the company revealed it will be offering a private preview of an even larger model of Gemini 1.5 Pro with a context window of 2 million tokens. It also introduced Gemini 1.5 Flash, a smaller, faster AI model.
In addition, Pichai introduced new AI features coming to Google services. Enhanced Gemini interactions will enable users to communicate with AI through Google Messages. The upcoming Gemini Live feature, accessible to Gemini Advanced subscribers, offers a natural conversational experience by leveraging advanced speech technology. Users can converse with Gemini, select from various voices and interrupt for clarification. Gemini Live can assist in tasks such as job interview preparation and will integrate camera capabilities later this year for environment-based discussions.
Google also launched Gems for Gemini Advanced subscribers, offering a personalized AI experience. Users can create custom Gemini versions and provide role descriptions such as ‘gym buddies’ or ‘coding partners.’ Gemini will then process these inputs to generate tailored Gems for users to interact with.
In the realm of search, Google announced the launch of AI Overviews in the US later this week; it said this feature will deliver nuanced, paragraph-long answers based on multi-step reasoning. Meanwhile, Ask Photos is a tool that can simplify recalling information from personal photos. “Say you’re paying at the parking station, but you can’t recall your license plate number,” Pichai explained during a demonstration. “Before, you could search Photos for keywords and then scroll through years’ worth of photos, looking for license plates. Now, you can simply ‘Ask Photos.’ It knows the cars that appear often, it triangulates which one is yours, and tells you the license plate number.”
Furthermore, Google introduced Search with Video, which allows users to ask questions and obtain detailed solutions by “showing” Google a live video feed. During the demonstration, one of Google’s developers asked Google to help her figure out why her record player wasn’t working and to identify a specific part. Google could tell what part of the record player she was asking about (the tonearm) and provided instructions for troubleshooting.
2. Workspace gets new AI features
Google has also integrated Gemini 1.5 Pro into several of its Workspace services, introducing new features such as summaries located in side panels. This development lets users pull out key information from lengthy content such as email threads and documents. For example, Gemini can summarize an email thread and generate a response.
NotebookLM’s series of AI-driven enhancements greatly improve its usability and intelligence. Greater contextual understanding makes it more effective at suggesting relevant connections, and advanced natural language processing abilities contribute to a more interactive experience with the platform; notably, NotebookLM can be interrupted, a feature it shares with GPT-4o, which was introduced at OpenAI’s Spring Update on Monday (May 13).
NotebookLM will essentially function as a private tutor, generating study guides, quizzes and audio overviews from input text material and adapting more closely to individual learning styles.
3. DeepMind shares Project Astra updates
Google’s DeepMind team unveiled one of its latest AI endeavors, Project Astra, which essentially replaces Google Assistant. Astra can “recall” past events by capturing video input and caching it in chronological order. Its capabilities echo those of GPT-4o, although Project Astra is not yet available to the public and its capabilities are still being refined.
DeepMind also introduced a suite of AI-enabled creative tools, including Imagen 3, an advanced image-generation model capable of producing images with richer detail, greater precision and fewer distortions than previous models.
In collaboration with YouTube, Google has also launched Music AI Sandbox, a platform that offers musicians creative support. Musician Marc Rebillet likened it to having a friend offering creative suggestions. Musicians can now compose new instrumentals from scratch, experiment with sound transfer and enhance their tracks.
Finally, there’s Veo, Google’s newest AI-powered video-generation software, which is capable of generating 1080p videos that are over a minute long.
4. Google teases arrival of Gemma 2
Google also teased the upcoming arrival of Gemma 2, the newest addition to its family of lightweight open models built on the same foundation as Gemini. Gemma 2, a 27 billion parameter model, will be optimized to run on NVIDIA’s (NASDAQ:NVDA) GPUs, and will offer enhanced performance and efficiency on a single TPU host in Vertex AI. According to developers, this will make Gemma 2 a more affordable model and will make it accessible to a wider audience.
The company also introduced a new addition to the Gemma family: PaliGemma, an open vision-language model available on multiple platforms, including GitHub, Hugging Face, Kaggle and Vertex AI Model Garden.
5. AI comes to Android
Finally, Google is set to transform the Android experience with AI built into the device.
For Pixel users, Gemini Nano with multimodality offers richer image descriptions and scam alerts during phone calls. Google is bringing Gemini Nano’s multimodal capabilities to TalkBack later this year, where it will enhance image descriptions for users who are blind or have low vision. With TalkBack, Gemini Nano will provide detailed and accurate information about images, even without an Internet connection.
With Gemini on Android, users can ask questions about videos and PDFs, and can isolate specific features within images using Circle to Search.
Market reaction
Alphabet’s share price has seen an overall positive trend this year, despite a temporary decline in valuation following lower-than-expected Q4 earnings. The company has rebounded since a tech stock selloff on March 6, and is up about 25 percent year-to-date. On Monday (May 13), the stock opened low amid rumors of a rival search tool from Microsoft (NASDAQ:MSFT) and OpenAI, but had recovered by the end of the day.
Alphabet traded flat on the day of Google I/O. It closed Wednesday (May 15) at US$173.88.
Securities Disclosure: I, Meagen Seatter, hold no direct investment interest in any company mentioned in this article.