Independent GenAI R&D Lab
AI for the languages the industry ignores
We build language models, datasets, and tools for underrepresented languages and cultures, starting with the 500 million Arabic speakers whose dialects remain invisible to frontier AI.
What we do
Three pillars of research, one mission
Low-Resource Languages
Pre-trained models, tokenizers, and datasets for 340+ underrepresented languages. No GPU required.
Agentic AI
End-to-end agentic LLM training pipelines, MCP tooling, and broad-coverage agent behaviors.
Cultural Alignment
Language models that reflect the values, norms, and linguistic realities of non-Western cultures.
What we ship
Flagship Projects
Sawalni
The first LLM for Moroccan languages: Darija and Amazigh in Latin, Arabic, and Tifinagh scripts.

Wikilangs
Open NLP models for 300+ languages, with base LLMs coming soon.

Wikilangs Games
A Wordle for the World: word games for every culture on Wikipedia, a fun and educational way to explore languages.
Sawalni Games
Word games adapted for Moroccans to connect with their language and culture through play.
Wikipedia Monthly
An open dataset for every language on Wikipedia, processed and refreshed monthly for the research community.
Publications
Research
Sawtone: A Universal Framework for Phonetic Similarity and Alignment Across Languages and Scripts
Lingua Posnaniensis, Vol. 67(1)
GenAI for Moroccan Darija: Challenges and Early Results
University of Navarra, Spain
Gherbal: A Multilingual Classifier for Low-Resource Languages
University Hassan II, Casablanca, Morocco
Stay connected
Reducing the digital divide, one language at a time
Get updates on our research, open-source releases, and upcoming publications.