The DHS AI/ML Toolkit is a suite of interoperable tools for building scalable, explainable, and efficient AI workflows on Demographic and Health Surveys (DHS) data. Built with a focus on open science, public health impact, and technical reproducibility, the toolkit supports every stage of the data pipeline, from ingestion and transformation to modeling, visualization, and policy application.
The project was inspired by a 2020 study by Bitew et al., which highlighted the untapped potential of applying data science and machine learning methods to DHS survey data. In response, our team launched this initiative to promote standardized, reproducible AI frameworks and support graduate students, researchers, and organizations working with DHS data.
Whether you are building child mortality risk models, spatial dashboards, or Bayesian inference systems, the DHS AI/ML Toolkit offers components that support data-driven development and decision-making in low-resource and research settings.
Toolkit Components
- DHS-To-Database-dhs2CSVTables-simplified (Open Source)
A simplified Python wrapper for converting raw DHS survey data into clean CSV tables and SQLite databases. Built on top of the original DHS-To-Database by Harry Gibson, this tool offers a user-friendly interface, Python 3.8+ support, and streamlined data engineering for research and analysis workflows.
- CIAO BAYESIAN
An explainable AI (XAI) system that uses Bayesian statistics to analyze under-five child survival risks. Extending the work of KILIMA TULIP AI, this project emphasizes transparency and interpretability in modeling and offers insights through probabilistic reasoning. Results can be explored interactively via the FLOWER Dashboard.
- KILIMA TULIP AI
A deep learning model developed to predict under-five child survival outcomes across five African countries: Ethiopia, Ghana, Uganda, South Africa, and Zimbabwe. Building on the success of DEEP MINTILO AI, it achieves 95% prediction accuracy on DHS survey data in support of efforts to improve child health.
- DEEP MINTILO AI
A deep learning model focused on predicting under-five child survival risk in Ethiopia. As an extension of MINTILO AI, it leverages neural networks to uncover key risk factors and has achieved over 90% prediction accuracy using DHS survey data.
- MINTILO AI (Open Source)
An open-source machine learning project built in Python that applies classical models, including logistic regression, k-nearest neighbors (KNN), random forest, gradient boosting, and CatBoost, to DHS survey data. MINTILO AI uncovers key risk factors for under-five child survival and welcomes contributions from data scientists and developers passionate about public health and AI for good.
- WATOTO SURVIVAL (Open Source)
An open-source project developed in R that uses classical survival analysis techniques, such as Kaplan-Meier estimation and Cox proportional hazards regression, to study under-five child survival. Built on DHS survey data, WATOTO SURVIVAL enables collaboration among researchers and graduate students and welcomes contributions from anyone interested in advancing public health analytics through open science.
- DHS AI Genesis
DHS AI Genesis was our initial step toward building a platform that applies data science and machine learning to DHS data. It originated as a space to demonstrate how machine learning algorithms can be used to extract insights from household survey data. Genesis marked the beginning of our broader work on the DHS AI/ML Toolkit and remains a reference point for researchers interested in this field.
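The CSV-to-SQLite step described for DHS-To-Database-dhs2CSVTables-simplified can be illustrated with Python's standard library alone. This is a minimal sketch, not the tool's actual interface: the helper name `load_csv_tables`, the one-table-per-CSV layout (table named after the file stem), and storing every column as TEXT are all assumptions made for illustration.

```python
from __future__ import annotations

import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_tables(conn: sqlite3.Connection, csv_paths: list[Path]) -> list[str]:
    """Load each CSV file into its own SQLite table named after the file stem.

    Hypothetical helper: column names come from the CSV header and every
    column is stored as TEXT; real DHS exports may need explicit types.
    """
    loaded = []
    for path in csv_paths:
        table = path.stem
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Quote identifiers so header names are used verbatim (headers are assumed trusted).
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            placeholders = ", ".join("?" for _ in header)
            # executemany consumes the remaining rows while the file is still open.
            conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        loaded.append(table)
    conn.commit()
    return loaded


if __name__ == "__main__":
    # Hypothetical sample file standing in for one exported DHS table.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "births.csv"
        sample.write_text("caseid,b5\n1,1\n2,0\n", encoding="utf-8")
        conn = sqlite3.connect(":memory:")
        print(load_csv_tables(conn, [sample]))
        print(conn.execute('SELECT * FROM "births"').fetchall())
        conn.close()
```

Keeping every column as TEXT avoids guessing types during ingestion; casting can then happen in SQL or pandas once the codebook for each variable is known.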
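CIAO BAYESIAN's model is not described here, so the following does not reproduce it. It only illustrates the kind of probabilistic reasoning the description refers to, using a conjugate Beta-Binomial update for the probability that a child in a subgroup dies before age five. With a Beta(α, β) prior and d deaths observed among n children, the posterior is Beta(α + d, β + n − d). The class and function names, the flat Beta(1, 1) prior, and the counts are hypothetical toy values, not DHS results.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about a subgroup's under-five death probability."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def samples(self, draws: int, seed: int) -> list[float]:
        rng = random.Random(seed)
        return [rng.betavariate(self.alpha, self.beta) for _ in range(draws)]

    def credible_interval(self, level: float = 0.95, draws: int = 20_000, seed: int = 0) -> tuple[float, float]:
        # Monte Carlo equal-tailed interval; the stdlib has no Beta quantile function.
        s = sorted(self.samples(draws, seed))
        lo = round((1 - level) / 2 * (draws - 1))
        hi = round((1 + level) / 2 * (draws - 1))
        return s[lo], s[hi]


def update(prior_alpha: float, prior_beta: float, deaths: int, children: int) -> BetaPosterior:
    """Conjugate update: Beta prior plus binomial death counts gives a Beta posterior."""
    if not 0 <= deaths <= children:
        raise ValueError("deaths must be between 0 and children")
    return BetaPosterior(prior_alpha + deaths, prior_beta + children - deaths)


def prob_higher_risk(a: BetaPosterior, b: BetaPosterior, draws: int = 20_000, seed: int = 0) -> float:
    """Estimate P(risk_a > risk_b) from independent posterior draws."""
    sa = a.samples(draws, seed)
    sb = b.samples(draws, seed + 1)
    return sum(x > y for x, y in zip(sa, sb)) / draws


# Toy counts for illustration only (not DHS estimates), with a flat Beta(1, 1) prior.
group_a = update(1, 1, deaths=40, children=200)
group_b = update(1, 1, deaths=10, children=200)
print(round(group_a.mean, 3), group_a.credible_interval())
print(prob_higher_risk(group_a, group_b))
```

Quantities such as a credible interval or "the probability that group A has higher risk than group B" are one way a Bayesian model can present its conclusions in directly interpretable terms.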
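MINTILO AI's code is not reproduced here. To show what identifying risk factors with one of its listed classical models typically involves, this standard-library-only sketch fits a logistic regression by batch gradient descent on the mean log-loss and reads exp(weight) as an odds ratio, the usual interpretation of a logistic regression coefficient. The function names, learning rate, epoch count, and toy data are illustrative assumptions; in practice a library implementation (for example scikit-learn's `LogisticRegression`) is the more common choice.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one example is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, X: list[list[float]]) -> list[int]:
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: one hypothetical binary exposure; y = 1 marks an under-five death.
X = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
w, b = fit_logistic(X, y)
print("odds ratio:", math.exp(w[0]))
print("accuracy:", accuracy(y, predict(w, b, X)))
```

Here the exposed group has death odds 3:1 and the unexposed group 1:3, so the fitted odds ratio approaches 9; the coefficient, not the accuracy figure, is what carries the risk-factor interpretation.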
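WATOTO SURVIVAL itself is written in R; to keep the examples in one language, here is a Python sketch of the Kaplan-Meier product-limit estimator it refers to. The survival estimate Ŝ(t) is the product, over death times t_i ≤ t, of (1 − d_i/n_i), where d_i is the number of deaths at t_i and n_i the number of children still at risk just before t_i. The sketch assumes each child contributes a follow-up time (for example, age in months) and an event flag (1 = died, 0 = censored, such as still alive at interview); the function name and data are hypothetical.

```python
from __future__ import annotations


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, S(time)) at each distinct time with at least one death.

    times  -- follow-up time per child (e.g. age in months at death or at censoring)
    events -- 1 if the child died at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all children whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Children who died or were censored at t leave the risk set afterwards.
        at_risk -= removed
    return curve


# Toy follow-up data in months, not DHS results.
print(kaplan_meier([1, 2, 2, 3, 5, 6], [1, 1, 0, 1, 0, 1]))
```

Counting children censored at t as still at risk at t follows the usual convention; a Cox proportional hazards model would then relate covariates to the hazard underlying this curve.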