[Bhaashini] : A Crowdsourcing initiative for Indian Languages

[Bhaashini] aims to create open datasets to develop Speech Recognition, Text-to-Speech, Machine Translation and Optical Character Recognition for Indian languages. This initiative will empower our technologists, language enthusiasts and language communities to build world class digital applications in our own local languages.

Help Indian languages become atma-nirbhar (self-reliant) through these datasets.

This is an effort by MeitY, Government of India, under the National Language Translation Mission (NLTM). This effort is supported by EkStep Foundation on a pro-bono basis and leverages its open source work under https://sunbird.org/projects/vakyansh along with various other open source frameworks.


India is a land of many languages; our digital strategy must reflect it.

90% of India's population uses regional languages to conduct their day-to-day activities.
As more and more people use digital platforms for financial, educational and social activities, digital platforms must work seamlessly in all Indian Languages.
Speaker Diversity

Enrich your own language

We need to build speech recognition, text-to-speech, machine translation and OCR technologies that are tuned for your language.

These technologies will help your language become "digital first" in sectors such as education, healthcare and media.

The tools, utilities and models proposed to be developed in your language will rely on open source contributions through the Bhaashini crowdsourcing platform.

Crowdsourcing & your contribution

The technologies mentioned above rely on AI, which requires large datasets in each of their respective fields.

[Bhaashini] currently has four crowdsourcing initiatives to create these datasets:

1. Bolo India creates a repository of diverse voices speaking Indian languages, in which volunteers read out the corresponding text.
2. Suno India creates an open dataset through transcription of audio files.
3. Likho India creates open parallel translation datasets between corresponding sentences in two languages.
4. Dekho India creates an open data repository of images and the corresponding text.

This is where you come in.

We invite you to join us and voluntarily contribute to this initiative. Contributing in one stretch can be overwhelming, so you can visit the website multiple times and resume from where you left off.

Every minute of your contribution will make a huge difference and bring us closer to our objective.

EkStep Foundation

EkStep Foundation gathers partners and creates open infrastructure, tools and frameworks to solve complex societal problems at scale. We do this by leveraging the open source digital infrastructure we have created called Sunbird and by following a way of thinking and doing called the Societal Platform Approach.

EkStep is a not-for-profit organization founded by Rohini Nilekani, Nandan Nilekani and Shankar Maruwada.