OpenAI, a San Francisco-based AI research and deployment firm that created ChatGPT, has introduced IndQA, a new benchmark for evaluating AI systems on Indian culture and languages, here on Tuesday.
The company said its mission was to make AGI (Artificial General Intelligence) benefit all of humanity, across languages and cultures. Some 80% of people worldwide do not speak English as their primary language and yet most existing benchmarks that measure non-English language capabilities fell short, the firm noted.
Existing multilingual benchmarks like MMMLU are now saturated, which makes them less useful for measuring real progress. In addition, current benchmarks mostly focus on translation or multiple-choice tasks. They don’t adequately capture what really matters for evaluating an AI system’s language capabilities: understanding context, culture, history, and the things that matter to people where they live.
That is why the company built IndQA, a new benchmark designed to evaluate how well AI models understand and reason about questions that matter in Indian languages, across a wide range of cultural domains.
“Today we are rolling out IndQA. Built in collaboration with 261 experts across 12 languages, IndQA fills a key gap by enabling fair and rigorous evaluation that reflects India’s cultural and linguistic diversity,” said Srinivas Narayanan, CTO, B2B Application, OpenAI.
According to Mr. Narayanan, the benchmark will help all AI models perform better in languages and contexts that are currently underrepresented in global datasets.
While OpenAI’s aim was to create similar benchmarks for other languages and regions, India, where about a billion people did not speak English as their primary language and which had 22 official languages, was an obvious starting point for the company.
According to company officials, this work is part of OpenAI’s ongoing commitment to improve products and tools for Indian users, and to make its technology more accessible throughout the country for a wide range of users, including students, farmers and educators.
IndQA evaluates knowledge and reasoning about Indian culture and everyday life in Indian languages. According to OpenAI, it spans 2,278 questions across 12 languages and 10 cultural domains, and was created in partnership with 261 domain experts from across India.
“Unlike existing benchmarks like MMMLU and MGSM, it is designed to probe culturally nuanced, reasoning-heavy tasks that existing evaluations struggle to capture,” the firm said in a blog post.
IndQA covers a broad range of culturally relevant topics, such as Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation—with items written natively in Bengali, English, Hindi, Hinglish (given the prevalence of code-switching in conversations), Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.
IndQA uses a rubric-based approach. Each datapoint includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.
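For readers who want a concrete picture, the four parts of a datapoint listed above can be sketched as a small Python record. This is an illustration only, not OpenAI's actual data format or grader: the class names, field names, the per-criterion `points` weight and the "points earned over points possible" scoring rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str     # what a good answer is expected to contain
    points: float = 1.0  # hypothetical weight; the article does not specify one


@dataclass
class Datapoint:
    prompt: str               # culturally grounded prompt in an Indian language
    english_translation: str  # kept for auditability
    rubric: list[Criterion]   # criteria used for grading
    ideal_answer: str         # reflects expert expectations


def rubric_score(item: Datapoint, met: list[bool]) -> float:
    """Fraction of rubric points earned, given one met/not-met flag per criterion.

    Assumed scoring rule for illustration: points earned / points possible.
    """
    if len(met) != len(item.rubric):
        raise ValueError("need exactly one flag per rubric criterion")
    total = sum(c.points for c in item.rubric)
    if total == 0:
        return 0.0
    earned = sum(c.points for c, ok in zip(item.rubric, met) if ok)
    return earned / total
```

In this sketch, the met/not-met flags would come from whoever or whatever grades the model's response against each criterion; how IndQA produces those judgments is not described in the company's statement as reported here.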
Experts (who are native-level speakers of the relevant language and English with deep expertise) from 10 different domains in India drafted difficult, reasoning-focused prompts tied to their regions and specialties. Each question was tested against OpenAI’s strongest models at the time of its creation: GPT-4o, OpenAI o3, GPT-4.5, and (partially, post public launch) GPT-5.
As a caveat, the firm said, because questions were not identical across languages, IndQA was not a language leaderboard; cross‑language scores shouldn’t be interpreted as direct comparisons of language ability. Instead, IndQA would be used to measure improvement over time within a model family or configuration.
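The caveat above can be made concrete with a short Python sketch. It assumes, purely for illustration, that each evaluation run is summarised as a mapping from language name to an average score; the function name and that data shape are hypothetical and do not come from OpenAI. Each language is compared only with itself across two runs of the same model family, and no cross-language ranking is produced.

```python
def improvement_by_language(
    earlier: dict[str, float], later: dict[str, float]
) -> dict[str, float]:
    """Per-language score change between two runs of the same model family.

    Only languages present in both runs are compared, and each language is
    compared only with itself, never with another language.
    """
    shared = sorted(earlier.keys() & later.keys())
    return {lang: later[lang] - earlier[lang] for lang in shared}
```

A positive value for a language would indicate that the later model did better on that language's own question set; it says nothing about whether the model is stronger in one language than in another.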
Speaking at a media conference, Mr. Narayanan also said, “India can be a beacon of how AI can be used for social good including education, health and farming etc.”
He further said the company has 4-5 million developers globally. “We are really propping up the developer ecosystems so that they can do more with AI. We continue to improve our models, pushing the frontiers of technology to help enterprises to have a better agentic future.”
Published – November 04, 2025 09:10 pm IST