For years, the global AI conversation has felt like a two-horse race. Silicon Valley on one side. China on the other. India—despite its massive developer base, academic depth, and real-world complexity—has mostly been viewed as a consumer or implementer of AI, not a creator of core, foundational models.
That narrative is now beginning to crack.
From Bengaluru, a relatively young startup is forcing the global AI community to pause and pay attention. Sarvam AI is not just building tools for India—it is building from India. And its latest breakthroughs suggest that India’s AI ambitions may finally be moving from promise to proof.
Over the past week, two of Sarvam’s in-house models have sparked rare global admiration: Sarvam Vision, an optical character recognition (OCR) system, and Bulbul V3, a text-to-speech model. The reason? They are outperforming or rivalling some of the most talked-about AI systems in the world, including ChatGPT, Google Gemini, and Anthropic Claude, in areas that matter deeply to India.
Sarvam AI: Beating the Giants at Their Own Game
Sarvam Vision focuses on OCR—teaching machines to accurately read and understand documents. While this may sound niche, it is one of the hardest and most critical problems in a country like India, where documents often mix languages, scripts, layouts, tables, stamps, and handwritten elements.
According to details shared publicly by Sarvam AI co-founder Pratyush Kumar, Sarvam Vision achieved an 84.3 percent accuracy score on olmOCR-Bench, a well-known benchmark for OCR performance. That score places it ahead of Gemini 3 Pro and recent OCR-focused models like DeepSeek OCR v2. Notably, ChatGPT ranked significantly lower on the same benchmark.
The momentum did not stop there.
On OmniDocBench v1.5, a benchmark designed to test how AI systems interpret real-world documents, Sarvam Vision posted an overall score of 93.28 percent. Its strongest performance came in areas that traditionally break OCR systems—complex document layouts, technical tables, and mathematical formulas. These are precisely the formats found in government files, financial records, academic papers, and enterprise workflows across India.
For a startup that was once questioned for focusing too heavily on Indic-language problems, this performance has flipped scepticism into validation.
From Doubt to Admission: Global Voices Take Notice
Perhaps the most telling signal of Sarvam’s impact is not the benchmark numbers but the change in opinions.
Tech commentator Deedy Das, who had earlier expressed doubts about the value of training smaller, Indic-focused AI models, publicly acknowledged that he had underestimated Sarvam’s approach. In a candid post on X, Das admitted that the company had identified and filled a gap global AI labs largely ignored.
He wrote that Sarvam now offers the best text-to-speech, speech-to-text, and OCR models for Indic languages, calling them both valuable and reasonably priced. His shift from scepticism to praise mirrors a broader reassessment taking place within the AI community.
Users have echoed this sentiment. One early user, after testing Sarvam’s tools, summed up their experience simply: “I used this a couple of days ago! Oh man wow.”
Bulbul V3: Giving Indian Languages a Natural Voice
Alongside Sarvam Vision, the company has also released Bulbul V3, its latest text-to-speech AI model. Designed to generate natural, expressive audio, Bulbul V3 targets a challenge that global AI tools have historically struggled with—high-quality voice generation for Indian languages.
In its official announcement, Sarvam described Bulbul V3 as a production-ready model built specifically for India’s linguistic diversity. The company says it has focused on reducing failure modes and ensuring stable, content-accurate speech across real-world inputs.
At present, Bulbul V3 supports 35+ voices across 11 Indian languages, with plans already in motion to expand coverage to 22 languages.
The comparison many users are making is with ElevenLabs, widely considered a global leader in AI voice generation. But for Indian use cases, cost and language depth matter. According to Pratik Desai, founder of KissanAI, Bulbul has become the company’s default choice.
Desai noted that Bulbul has improved consistently with every release, adding that ElevenLabs’ pricing never made sense for Indic or regional language applications.
A Bigger Signal for India’s AI Future
Sarvam AI’s rise is not just about one startup or two successful models. It represents a deeper shift in how India’s AI ecosystem is evolving—from adaptation to original innovation.
By building foundational models locally and optimising them for India’s linguistic, cultural, and document-heavy realities, Sarvam is demonstrating that world-class AI does not have to be generic to be global. Sometimes, being deeply local is exactly what makes technology stand out.
As global attention slowly turns toward what is being built in Bengaluru, one thing is becoming clear: India’s AI story is no longer just about scale or talent supply. It is about capability, credibility, and confidence.
And Sarvam AI, quietly but convincingly, is leading that change.