DHS AI Genesis is a machine learning (ML) framework that showcases applications of data science and machine learning to Demographic and Health Surveys (DHS) data. The platform was created to support and inspire students, researchers, and developers who are interested in building and sharing tools that analyze household survey datasets and visualize the resulting insights. We started this project with a few guiding motivations:
- Demonstrate what is possible: Many machine learning tutorials are built on synthetic or toy datasets. Our goal is to demonstrate practical workflows on real-world DHS data that can be extended or replicated in other research contexts.
- Support reproducibility and transparency: All data processing steps, code logic, and application behavior are visible and modifiable. This makes it easier for others to learn from and improve upon our tools.
- Build lightweight, portable tools: Every tool on this platform is designed to run on modest computing infrastructure (e.g., a laptop or low-cost server), using simple tech stacks. We want our tools to be usable in constrained environments as well as scalable ones.
What You Will Find Here
This platform includes a set of web applications, each focused on a particular task or workflow:
- Importing and transforming DHS household or child-level data
- Estimating and visualizing child survival outcomes
- Comparing patterns across regions, gender, or other covariates
- Running classification or survival models on DHS-like variables
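As an illustration of the kind of child survival estimate listed above, here is a minimal sketch of a Kaplan-Meier estimator written with plain Python. The apps' actual estimation method is not described on this page, so Kaplan-Meier is used here only as a common choice; the function name and the `durations` and `events` inputs are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per child (e.g., months), hypothetical input
    events:    1 if a death was observed at that time, 0 if censored
    """
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all records that share the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored records both leave the risk set after t.
        n_at_risk -= removed
    return curve
```

Censored records (for example, children still alive at the interview) contribute to the risk set until their last observed time and then drop out without lowering the curve.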
ETL
The data used in these apps has been preprocessed and structured using a pipeline that transforms DHS flat files into normalized tables. This ETL (Extract, Transform, Load) process involves:
- Renaming DHS variables with readable column names
- Joining household or child records with metadata (region, cluster, survey year, etc.)
- Filtering to include relevant age groups or variables
- Exporting cleaned data to SQLite-compatible tables
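The four steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the platform's actual pipeline or the DHS-To-Database scripts. The variable codes (`v001`, `v002`, `b4`, `b5`, `b8`) follow common DHS recode naming but should be checked against the recode manual for your survey; the readable column names, the `children` table, and all function names are assumptions made for this example.

```python
import sqlite3

# Hypothetical mapping from DHS variable codes to readable column names.
RENAME_MAP = {
    "v001": "cluster_id",
    "v002": "household_id",
    "b4": "child_sex",
    "b5": "child_alive",
    "b8": "child_age_years",
}
COLUMNS = list(RENAME_MAP.values()) + ["region", "survey_year"]


def rename_columns(record, mapping=RENAME_MAP):
    """Keep only mapped variables, renamed; missing ones become None."""
    return {new: record.get(old) for old, new in mapping.items()}


def join_metadata(records, cluster_meta):
    """Attach cluster-level metadata (region, survey year) to each record."""
    joined = []
    for rec in records:
        meta = cluster_meta.get(rec["cluster_id"])
        if meta is not None:  # skip records whose cluster has no metadata
            joined.append({**rec, **meta})
    return joined


def filter_under_five(records):
    """Keep children under five; records with missing age are dropped here."""
    return [
        r for r in records
        if r["child_age_years"] is not None and r["child_age_years"] < 5
    ]


def load_to_sqlite(records, conn, table="children"):
    """Write the cleaned records to a SQLite table."""
    col_sql = ", ".join(COLUMNS)
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_sql})")
    conn.executemany(
        f"INSERT INTO {table} ({col_sql}) VALUES ({placeholders})", records
    )
    conn.commit()


def run_etl(raw_records, cluster_meta, conn):
    """Rename, join, filter, and load; returns the number of rows written."""
    renamed = [rename_columns(r) for r in raw_records]
    kept = filter_under_five(join_metadata(renamed, cluster_meta))
    load_to_sqlite(kept, conn)
    return len(kept)
```

Calling `run_etl(raw_records, cluster_meta, sqlite3.connect(":memory:"))` with a list of record dictionaries and a dictionary of cluster metadata keyed by cluster number runs the whole flow against an in-memory database, which is convenient for experimenting before writing to a file.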
This work builds upon the excellent open-source project by Harry Gibson, whose repository DHS-To-Database provides foundational scripts and schemas for extracting and transforming DHS datasets into usable tabular formats. We gratefully acknowledge their contribution as the starting point for our own adaptations.
Our Broader Goal
DHS AI Genesis is part of a larger vision to promote the use of machine learning and explainable AI for survey-based public health research. It laid the foundation for the more comprehensive DHS AI/ML Toolkit, which includes more advanced modeling tools and broader country coverage.
We believe that tools like these can help researchers and decision-makers make better use of publicly available survey data, particularly in low-resource settings where tools must be efficient, interpretable, and easy to deploy.
If you are interested in collaborating, adapting one of the apps, or sharing your feedback, please do not hesitate to reach out via our main kofiyatech contact page.
The DHS Program
The original DHS datasets are collected and maintained by The DHS Program, which provides open access to a wide range of standardized health and demographic surveys. Its surveys, ongoing since 1985, have been conducted in over 90 countries and provide data for population, health, and nutrition programs worldwide.
Researchers can request access to the data at https://dhsprogram.com/data/. We thank The DHS Program for making high-quality data freely available to the global research community.
DISCLAIMER
This platform is an independent research and learning tool. It is not affiliated with The DHS Program or any other governmental or international organization. The content, visualizations, and analyses provided here are for educational and exploratory purposes only. The interpretations and tools are those of the authors and do not necessarily reflect the views of any associated institutions.