- Models are available on Hugging Face: https://huggingface.co/JamieOgundiran
- The dataset is available on Hugging Face: https://huggingface.co/datasets/JamieOgundiran/AfricaLLM
AfricaLLM is a comprehensive research project that evaluates and fine-tunes Large Language Models (LLMs) for African languages. It addresses the significant gap in culturally aware AI systems for African languages by:
- Evaluating 5 open-source LLMs on African language benchmarks
- Fine-tuning models using World Values Survey (WVS) data from African countries
- Analyzing performance across 17 African languages and 3 key NLP tasks
- Providing comprehensive datasets, analysis tools, and reproducible results
The project tackles cultural bias in LLMs by leveraging authentic cultural data from African countries and creating more linguistically appropriate language models.
- Multi-Model Evaluation: Base and fine-tuned versions of Qwen, Llama, Mistral, and Gemma models
- Comprehensive Benchmarks: AfriQA, AfriSenti, and AfriXNLI tasks
- 16+ African Languages: From Amharic to Zulu with cultural context
- Advanced Fine-tuning: QLoRA-based efficient fine-tuning pipeline
- Rich Analysis Tools: Performance comparisons, visualizations, and language-specific insights
- Reproducible Research: Complete datasets, scripts, and result analysis
Averaged over all tasks and languages, fine-tuning improved four of the five models, while Llama 8B regressed:
| Model | Base Score | Fine-tuned Score | Change |
|---|---|---|---|
| Qwen 32B | 0.3942 | 0.3973 | +0.78% |
| Gemma 27B | 0.3894 | 0.41785 | +5.1% |
| Qwen 8B | 0.2970 | 0.3172 | +6.80% |
| Llama 8B | 0.3139 | 0.2572 | -18.07% |
| Mistral 7B | 0.3032 | 0.3092 | +2.2% |
Note: Scores are averaged across all tasks and languages.
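The Change column can be read as a relative difference from the base score. A minimal sketch, assuming that definition (it reproduces the Qwen 8B row); `relative_change` is a hypothetical helper, not part of the repository:

```python
def relative_change(base: float, finetuned: float) -> float:
    """Percentage change of the fine-tuned score relative to the base score.

    Assumption: Change = (fine-tuned - base) / base * 100.
    """
    if base == 0:
        raise ValueError("base score must be non-zero")
    return (finetuned - base) / base * 100


# Qwen 8B row from the table above
print(f"{relative_change(0.2970, 0.3172):+.2f}%")
```

A negative result, as for Llama 8B, means the fine-tuned model scored below its base model.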
```
AfricaLLM/
├── data/ # Training and evaluation data
│ ├── _Finetune_/ # Fine-tuning datasets (with reasoning)
│ ├── _Finetune_No_Reasoning/ # Fine-tuning datasets (no reasoning)
│ ├── [Language]/ # Language-specific data (16+ languages)
│ │ ├── Finetune/ # Fine-tuning data per language
│ │ └── WVQ_[Language].csv # World Values data per language
│ ├── WVQ.jsonl # Combined World Values dataset
│ └── new_WVQ.jsonl # Enhanced World Values dataset
├── WVS_original_dataset/ # Original World Values Survey data
├── results/ # Comprehensive evaluation results
│ ├── result_raw/ # Raw model outputs
│ ├── result_cleaned/ # Processed results (CSV format)
│ └── result_analysis/ # Statistical analysis and summaries
├── data_preprocees.ipynb # Data preprocessing pipeline
├── finetune.ipynb # Model fine-tuning pipeline
├── results.ipynb # Results analysis and visualization
└── pyproject.toml # Project dependencies
```
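Scripts that consume these files need to locate the per-language World Values data. A minimal sketch, assuming the `data/[Language]/WVQ_[Language].csv` naming shown in the tree holds for every language folder; `find_wvq_files` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path


def find_wvq_files(data_dir: str = "data") -> dict[str, Path]:
    """Map each language folder name to its WVQ_<Language>.csv file.

    Hypothetical helper: assumes the data/<Language>/WVQ_<Language>.csv layout.
    Folders without a matching CSV (e.g. _Finetune_) are skipped.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return {}
    files = {}
    for lang_dir in sorted(root.iterdir()):
        if not lang_dir.is_dir():
            continue  # skip top-level files such as WVQ.jsonl
        csv_path = lang_dir / f"WVQ_{lang_dir.name}.csv"
        if csv_path.is_file():
            files[lang_dir.name] = csv_path
    return files


if __name__ == "__main__":
    for language, path in find_wvq_files().items():
        print(f"{language}: {path}")
```

Because the helper derives language names from folder names, it needs no hard-coded language list when new languages are added.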
```bash
# Clone the repository
git clone https://github.com/username/AfricaLLM.git
cd AfricaLLM

# Install dependencies using uv (recommended)
uv sync

# Or using pip
pip install -e .
```

The easiest way to start is by exploring the pre-computed results:
```python
import pandas as pd

# Load model performance comparison
model_results = pd.read_csv('results/result_analysis/model_averages.csv')
print(model_results)

# Load task-specific analysis
task_results = pd.read_csv('results/result_analysis/task_averages.csv')
print(task_results)

# Load language-specific performance
language_results = pd.read_csv('results/result_analysis/language_averages.csv')
print(language_results)
```

Run the full results analysis:
```bash
# Open the results analysis notebook
jupyter notebook results.ipynb
```

Transform World Values Survey data into the training format:
```bash
# Open the preprocessing notebook
jupyter notebook data_preprocees.ipynb

# The notebook processes WVS data from multiple African countries:
# - Ethiopia (Amharic, Tigrinya, Oromo)
# - Kenya (Swahili)
# - Nigeria (Hausa, Igbo, Yoruba)
# - Zimbabwe (Shona, Ndebele)
# - Ghana (Twi, Ewe)
# - Rwanda (Kinyarwanda)
# - South Africa (Afrikaans, Sotho, Tswana, Xhosa, Zulu)
```

Fine-tune models using our optimized QLoRA pipeline:
```bash
# Open the fine-tuning notebook
jupyter notebook finetune.ipynb

# Key configurations:
# - QLoRA with 4-bit quantization for efficiency
# - LoRA parameters: r=32, alpha=64, dropout=0.2
# - Optimized for African language datasets
# - Support for both reasoning and no-reasoning variants
```

Supported model families:
- Qwen: 8B, 32B variants
- Llama: 8B variants
- Mistral: 7B variants
- Gemma: 27B variants
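The LoRA settings listed for the fine-tuning notebook (r=32, alpha=64) determine how large the trainable adapters are. In standard LoRA, each adapted `d_out × d_in` weight gets two low-rank factors, A (`r × d_in`) and B (`d_out × r`), and their product is scaled by `alpha / r`, which is 2 for these settings. The sketch below only does that arithmetic. It assumes plain LoRA scaling (variants such as rsLoRA use `alpha / sqrt(r)` instead), and the hidden size of 4096 is a hypothetical value for illustration; the notebook's actual target modules and per-model dimensions are not specified here.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scale factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


r, alpha = 32, 64  # values listed for the fine-tuning notebook
d = 4096  # hypothetical hidden size, for illustration only

print("scaling:", lora_scaling(r, alpha))
added = lora_param_count(d, d, r)
full = d * d
print(f"LoRA params: {added:,} vs full matrix: {full:,} ({added / full:.2%})")
```

For a square 4096 × 4096 projection, the adapter adds 262,144 trainable parameters, about 1.6% of the frozen matrix, which is what makes QLoRA fine-tuning of 27B to 32B models practical.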
Comprehensive evaluation on African language benchmarks:
- AfriQA: Question Answering in African languages
- AfriSenti: Sentiment Analysis across African languages
- AfriXNLI: Cross-lingual Natural Language Inference
Evaluation uses EleutherAI's lm-evaluation-harness (AfroBench tasks): https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/afrobench
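Because each benchmark is scored separately for each language, the per-pair scores have to be collapsed into the single averages reported in the results table. This README does not state the aggregation scheme; the sketch below assumes a simple unweighted mean over every (task, language) score, and `average_scores` is a hypothetical helper, not the project's notebook code.

```python
from collections import defaultdict
from statistics import mean


def average_scores(scores: dict[tuple[str, str], float]) -> dict[str, float]:
    """Collapse (task, language) -> score into per-task means plus an overall mean.

    Assumption: unweighted means; the project's notebooks may aggregate differently.
    The overall value averages every (task, language) pair, so tasks that
    cover more languages carry more weight.
    """
    by_task = defaultdict(list)
    for (task, _language), score in scores.items():
        by_task[task].append(score)
    summary = {task: mean(values) for task, values in by_task.items()}
    summary["overall"] = mean(scores.values())
    return summary
```

Called with a dict such as `{("afriqa", "hau"): score, ...}`, it returns one mean per task plus an `"overall"` entry. Averaging the per-task means instead would weight every task equally regardless of language coverage.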
```bash
# Compare base and fine-tuned model performance
jupyter notebook results.ipynb
```

Our project supports 16+ African languages with cultural context:
| Language | Code | Country | Script | Speakers |
|---|---|---|---|---|
| Amharic | amh | Ethiopia | Ge'ez | 57M |
| Afrikaans | afr | South Africa | Latin | 7M |
| Ewe | ewe | Ghana/Togo | Latin | 4.5M |
| Hausa | hau | Nigeria/Niger | Latin/Arabic | 70M |
| Igbo | ibo | Nigeria | Latin | 45M |
| Kinyarwanda | kin | Rwanda | Latin | 12M |
| Oromo | orm | Ethiopia | Latin | 37M |
| Shona | sna | Zimbabwe | Latin | 14M |
| Sotho | sot | South Africa | Latin | 5.6M |
| Swahili | swa | Kenya/Tanzania | Latin | 200M |
| Tigrinya | tir | Ethiopia/Eritrea | Ge'ez | 9M |
| Twi | twi | Ghana | Latin | 17M |
| Xhosa | xho | South Africa | Latin | 8.2M |
| Yoruba | yor | Nigeria/Benin | Latin | 45M |
| Zulu | zul | South Africa | Latin | 12M |
| + More | ... | Various | ... | ... |
- Model Size Impact: Larger models (32B) generally perform better but show diminishing returns
- Fine-tuning Effectiveness: Gains vary by model family, from +6.80% for Qwen 8B to a -18.07% drop for Llama 8B
- Language-Specific Patterns: Some languages benefit more from fine-tuning than others
- Task Complexity: AfriXNLI shows more consistent improvements than AfriSenti
- Best Performing Languages: Swahili, Hausa, Yoruba (high-resource)
- Most Improved: Portuguese (+39.97%), Amharic (+36.34%)
- Challenging Languages: Tswana, Twi, Tsonga (limited resources)
We welcome contributions! Here's how you can help:
- Add New Languages: Contribute World Values Survey data for additional African languages
- Improve Models: Experiment with different fine-tuning strategies
- Enhance Analysis: Add new evaluation metrics or visualization tools
- Documentation: Improve guides and tutorials
```bash
# Fork and clone the repository
git clone https://github.com/yourusername/AfricaLLM.git
cd AfricaLLM

# Create development environment
uv sync --dev

# Then submit a pull request
```

If you use this work in your research, please cite:
```bibtex
@misc{africallm2024,
  title={AfricaLLM: Comprehensive Evaluation and Fine-tuning of Large Language Models for African Languages},
  author={AfricaLLM Research Team},
  year={2024},
  url={https://github.com/username/AfricaLLM},
  note={A comprehensive study of LLM performance on African languages with cultural context}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
Making Large Language Models work for everyone, everywhere