Baidu just showed the world that China’s AI ambitions aren’t slowing down — they’re scaling up.
At its flagship event Baidu World 2025, the company unveiled ERNIE 5.0, a natively omni-modal foundation model that can handle text, images, audio, and video in one unified framework. Think ChatGPT meets Midjourney meets Synthesia — but all under one roof.
And that wasn’t the only headline moment. Baidu’s CEO Robin Li used the event to roll out a full suite of AI-powered products — from digital humans and no-code builders to global AI workspaces — signaling that Baidu wants to compete not just in model quality, but in how people actually use AI.
“When you internalize AI, it becomes a source of productivity, not cost,” Li told the audience in Beijing. The line landed like a thesis statement for Baidu’s next era.
ERNIE 5.0: Built to Think Across Mediums
While OpenAI and Google push their own multimodal updates, Baidu’s ERNIE 5.0 may be the most natively integrated yet. Baidu says that, unlike patchwork models that bolt image or audio features onto a text model, ERNIE 5.0 was trained from the ground up to understand and generate text, visuals, and sound simultaneously.
That means ERNIE 5.0 can read an article, watch the attached video, and answer in context, or even create a cohesive presentation that mixes all formats. Baidu says the model handles instruction following, creative writing, factual reasoning, and agentic planning within a single system.
A public preview is now live via ERNIE Bot, and enterprise customers can tap it through Baidu Cloud’s Qianfan model-as-a-service platform.
Li said the model’s intelligence has “broken through its previous ceiling,” and suggested that AI agents could soon outpace the foundation models that power them.
Meet Famou, the Self-Evolving Agent
Perhaps the wildest announcement came when Baidu pulled back the curtain on Famou, which it claims is the world’s first commercially available self-evolving agent.
Baidu says Famou can “simulate and surpass” top algorithm experts by adapting its own logic on the fly: in effect, a self-upgrading AI that learns by doing. The company is targeting dynamic optimization problems in complex sectors such as energy, logistics, and finance.
For now, Famou is invite-only, accessible through famou.com. But if it works as promised, it could mark a new frontier in autonomous AI — systems that improve themselves without human retraining.
GenFlow 3.0 and the Rise of the Super-Agent
Baidu’s GenFlow, a general-purpose AI agent designed to automate tasks and workflows, also got a major upgrade to version 3.0. With more than 20 million users, Baidu says it is now one of the world’s largest active general agents.
The new version boasts stronger memory and multimodal abilities — it can jointly process documents, slides, videos, and podcasts — and seamlessly coordinate between them. If OpenAI’s GPTs or Anthropic’s “Projects” aim to assist individuals, GenFlow’s ambition is broader: to run companies.
AI in Motion: Apollo Go’s 17 Million Rides
Not everything at Baidu World 2025 was software. The company’s autonomous ride-hailing arm, Apollo Go, has quietly grown into what Baidu calls the largest robotaxi service in the world, surpassing 17 million cumulative rides across 22 cities.
Baidu claims more than 240 million km of autonomous driving to date, 140 million km of it fully driverless. That is a staggering scale, even compared with Waymo.
Li’s vision is that autonomous vehicles will “become mobile living spaces.” He might not be wrong: as robotaxi costs drop, fully automated mobility could become as routine as hailing an Uber.
No-Code Goes Global: Meet MeDo
Baidu also revealed its no-code application builder Miaoda 2.0, which has already generated 400,000+ apps inside China. Its international twin, MeDo, just went live globally via meDo.dev.
It’s a direct move into developer ecosystems beyond China — and a test of whether Baidu’s AI tools can appeal to Western and Southeast Asian markets, not just domestic users.
Alongside MeDo, Baidu introduced Oreate, an all-in-one AI workspace for creators. Think Notion meets Canva meets ChatGPT. The platform already counts 1.2 million users globally.
Digital Humans and the Future of Interaction
One of the biggest flexes of Baidu World 2025 came via its digital human tech. These avatars now offer what Baidu calls full-modal emotional alignment: they can react, gesture, and respond in real time during live streams.
During China’s recent “Double 11” shopping festival, Baidu says 83% of livestreamers used its digital human tech, which it credits with helping drive a 91% year-over-year surge in GMV (gross merchandise value).
Baidu’s Li called digital humans a “universal interaction interface for the AI era” — in other words, the face of the machine age.
Why It Matters: From Model Wars to App Wars
The subtext of Baidu World 2025 is clear: the model wars are giving way to application wars.
Every major AI company — OpenAI, Google, Anthropic, Baidu — has powerful base models now. What matters next is how those models are used, monetized, and embedded into workflows.
Robin Li put it bluntly: “Applications will create 100× the value of foundation models.”
In that sense, Baidu is pivoting from chasing benchmarks to building ecosystems. Its global rollout of MeDo, Oreate, and digital human platforms shows that China’s AI innovation is starting to travel — and maybe, to compete head-on with Western incumbents.
Conclusion
ERNIE 5.0 isn’t just another model update; it’s Baidu’s global comeback statement. With multimodal models, self-evolving agents, and what it calls the world’s largest robotaxi service, Baidu is quietly threading AI through every corner of its empire.
The next question isn’t whether ERNIE 5.0 can match GPT-5 — it’s whether Baidu can turn these tools into a global ecosystem that users actually rely on.
And if it can, the AI race just got a lot more interesting.