Veo 3, launched at the 2025 Google I/O conference, becomes Google’s flagship AI video generation model, adding native audio generation that Google positions as a step ahead of the competition.
Picture a world where you can generate lifelike video at the touch of a button and, with a click of the mouse, have an AI produce perfectly matched sound effects, background noise, and even dialogue. That vision was unveiled on May 20, 2025, at the Google I/O conference. With Veo 3, Google intends to redefine AI video generation by adding audio, strengthening its position in the fiercely competitive field of AI video technology.
Veo 3 marks a major leap. For the first time, Google’s video model can produce audio to accompany its videos, synchronized with the footage and generated from the same text prompt. For instance, if you describe a busy city street, Veo 3 will show the cars and pedestrians and also generate the matching sounds: traffic noise, people talking, and even blaring horns. This sets Google apart from competitors such as OpenAI’s Sora, Meta’s Movie Gen, Runway, and Stability AI, all of which are vying for position in this emerging domain.
Veo 3 is not the only upgrade. Google also released new features for the previous version, Veo 2, allowing users to add and remove objects, extend frames, switch between portrait and landscape orientation, and control camera movement with precision. These improvements make video editing more intuitive and give content creators finer control.
Google’s focus is not limited to video. The company also announced Imagen 4, a new image generation model that can produce sharp images at up to 2K resolution in a variety of formats. Imagen 4 is already available in major Google services such as the Gemini app, Whisk, Vertex AI, and the Workspace apps (Slides, Vids, and Docs). A faster variant of Imagen 4, described as up to 10x faster than Imagen 3, is expected soon and should further speed up the creative process.
For film producers and writers, Google has unveiled an AI filmmaking tool called Flow. Drawing on the capabilities of Veo, Imagen, and Gemini, Flow can create single scenes or even entire movies. Flow, which is built on VideoFX, is available to subscribers of the Google AI Pro and Ultra plans in the US and will roll out to other countries soon. The tool is meant to enable people with no prior filmmaking experience to create high-quality films.
These tools are powered by advances in Google’s Gemini models. Gemini 2.5 Pro with Deep Think mode strengthens reasoning, while Gemini 2.5 Flash gives developers low-latency performance. Both models build on the success of Gemini 2.0 Flash, which was released in December 2024 and made widely available the following February. Furthermore, Google is bringing Project Mariner’s computer-control capabilities to the Gemini API and Vertex AI. They are currently being tested by companies such as Automation Anywhere, UiPath, and Browserbase, with expansion planned for 2025.
Through these announcements, Google aims to reinforce its position at the forefront of the AI race. The company is not only making advanced video, audio, and image generation available to other businesses but also equipping creators and developers with powerful tools of their own. With Veo 3, an AI-powered toolset poised to change the creative industries, and the ecosystem built around it, Google aims to set a new benchmark for creative work.