Nepali AI Initiative: Phase 1

AI for the Nepali language.

17 million speakers. One of the most underserved languages in AI. We're fixing that — openly.

17M+ speakers
Phase 1 live now
CC BY 4.0 license
0 datasets

The problem

Nepali is invisible to AI.

Nepali is spoken by over 17 million people as a first language — and tens of millions more across India, Bhutan, and the diaspora. Yet it has almost no dedicated AI tooling.

Most large language models perform poorly on Nepali. There are almost no public datasets for training, no open speech corpora, and no production-grade translation APIs.

Suswo is changing that. We're starting with open data — because you can't build models without it.

Our mission

Build. Publish. Repeat.

Collect and clean Nepali text, speech, and parallel data.

Train and publish open models on Hugging Face under permissive licenses.

Build developer tools — APIs, libraries, and demos — that make Nepali AI usable.
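As a rough illustration of what "collect and clean" can mean for Nepali text, here is a minimal sketch using only the Python standard library. It is not Suswo's actual pipeline: the function names (`clean_line`, `devanagari_ratio`, `clean_corpus`) and the 0.5 script threshold are hypothetical choices made for this example.

```python
import re
import unicodedata

# Devanagari Unicode block (U+0900 to U+097F), the script used for Nepali.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
# Zero-width space and byte-order mark. ZWJ/ZWNJ (U+200D/U+200C) are kept
# on purpose because they can be meaningful in Devanagari conjuncts.
INVISIBLE = re.compile(r"[\u200b\ufeff]")
WHITESPACE = re.compile(r"\s+")


def clean_line(text: str) -> str:
    """Normalize Unicode, drop invisible debris, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = INVISIBLE.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


def devanagari_ratio(text: str) -> float:
    """Share of non-space characters that fall in the Devanagari block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if DEVANAGARI.match(ch))
    return hits / len(chars)


def clean_corpus(lines, min_ratio=0.5):
    """Clean each line, drop empties, exact duplicates, and lines that
    are mostly not Devanagari. min_ratio is an assumed threshold."""
    seen = set()
    kept = []
    for raw in lines:
        line = clean_line(raw)
        if not line or line in seen:
            continue
        if devanagari_ratio(line) < min_ratio:
            continue
        seen.add(line)
        kept.append(line)
    return kept
```

A real pipeline would likely add language identification (Devanagari is shared with Hindi and Marathi), near-duplicate detection, and a manual quality review before release.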

Roadmap

Three phases. Long game.

Phase 1 — Now Live

Open Contributor

Open community. Datasets and models published freely on Hugging Face and GitHub under CC BY 4.0 and Apache 2.0.

Project scaffolding
First dataset pipeline
Public dataset release
First model checkpoint

Phase 2 — Next

Open Core

Free base models with a hosted API. Developers can call the API for free at a limited rate and pay for production scale.

○ REST API design
○ Rate-limited free tier
○ Paid hosted tier

Phase 3 — Future

Commercial Products

Nepali speech-to-text, translation API, writing assistant, and government / NGO contracts.

○ Speech-to-text API
○ Translation API
○ Writing assistant
○ Enterprise contracts

Open Datasets

Data we've published.

Datasets publishing soon.

Phase 1 is underway. The first dataset will appear here and on Hugging Face when it passes our quality review.

Watch on Hugging Face ↗
Open Models

Models we've trained and published.

Models publishing soon.

We train on the published datasets. The first checkpoint will be released alongside, or shortly after, the first dataset.

Get involved

Contribute to Nepali AI.

📝

Contribute Data

Help us collect Nepali text, speech recordings, and parallel translations. No ML experience needed.

Contributor guide ↗
💻

Contribute Code

Build dataset pipelines, evaluation scripts, and NLP tooling. We work in Python, use Hugging Face Transformers, and move fast.

View repos ↗
📢

Spread the Word

Share the project with Nepali-speaking communities, universities, and researchers. Awareness is half the battle for low-resource languages.

Share on LinkedIn ↗

Ready to contribute?

Every dataset row, every commit, every share helps build the future of Nepali AI.
