With its most recent releases, Sarvam Vision and Bulbul V3, Bengaluru-based firm Sarvam AI has been causing a stir in the artificial intelligence (AI) community around the world. According to reports, Sarvam Vision has outperformed industry giants such as Google Gemini and ChatGPT on key optical character recognition (OCR) tasks.
Co-founder Pratyush Kumar stated in a post on X (formerly Twitter) that Sarvam Vision outperformed Gemini 3 Pro and DeepSeek OCR v2, with an accuracy of 84.3% on olmOCR Bench and 93.28% on OmniDocBench v1.5.
Bulbul V3, the company's text-to-speech model, supports 35 voices. The sample set described spans 22 official Indian languages, with documents dating from 1800 to the present and varying in material and scan quality. Kumar said that Sarvam Vision is by far the best model on Indian languages and that it supports all 22 scheduled Indian languages.
The series features a 3B-parameter state-space vision-language model that can perform visual understanding tasks such as complex table parsing, chart interpretation, image captioning, and scene text recognition.
Sarvam AI is a sovereign AI
According to the company's official website, it wants to create a future in which AI is readily available to all Indians, and it wants India to embrace the most significant technological transformation of our time confidently and responsibly.
The company says on its website that its goal is to build foundational building blocks and apply them to the country's particular needs. It contends that, on problems unique to India, careful data curation and task-specific tuning can outperform more general-purpose models.
The success of Sarvam AI represents a critical turning point in India's AI development and demonstrates the nation's capacity for fundamental AI innovation. Global experts, such as tech commentator Deedy Das, have recognized the startup's focus on India-specific issues and the usefulness of Sarvam's OCR and speech models for Indian languages.
Writing on X, Das said that when he wrote about Sarvam a year ago, he thought the approach of training small Indic language models was the wrong one, but that the company has since turned things around. He called it genuinely valuable that Sarvam has the best OCR, speech-to-text, and text-to-speech models for Indic languages, and added that the pricing is reasonable and that the website is both well designed and very easy to use.
Das also said the company is addressing a critical gap in the ecosystem and taking on work that large labs are unlikely to pursue fully, at least in the near future. Although he does not know much about the company, he said, he finds its technological progress impressive, and he could not recall the last time he held that opinion of software originating from India.
How did it beat Google and ChatGPT?
The company's recent products, such as Bulbul V3 (a text-to-speech system) and Vision (an OCR/vision model), were evaluated against international systems and performed better on specific benchmarks aimed at India.
According to Sarvam and national media coverage, Bulbul V3 outperformed various international TTS systems in handling numerals, named entities, and code-mixed text, and it recorded lower error rates on telephony-grade audio in both automated error tests and a blind listening study.
Separately, in preliminary tests on document reading, the company's Vision tool outperformed generalist models on certain Indian-language OCR tasks.