Pump it Up: Data Mining the Water Table
This project was part of a competition hosted by DrivenData (drivendata.org). The goal of the competition was to use data from Taarifa and the Tanzanian Ministry of Water to predict which pumps are functional, which need repair, and which are non-functional. Predictions of this kind can help optimize Tanzanian maintenance operations and ensure that clean, potable water is available to communities across Tanzania.
Background
The demand for water in Tanzania is high. Tanzania has a population of approximately 57 million and one of the fastest-growing economies in Africa, but not all areas have grown equally. Women and girls in particular spend a significant amount of time traveling to collect water.
Exploratory Data Analysis
Some of the key insights obtained from exploring the data:
- There are a total of 23,000 non-functional wells.
- Most wells have good water quality, yet about 17,000 of the wells with good water quality are non-functional.
- Both submersible and motor pumps tend to fail at a higher rate than gravity and handpump wells. This is probably because they require more maintenance.
- About 9,000 wells that are currently non-functional have adequate water availability, but that water cannot be accessed.
- Restoration should be prioritized for non-functional wells that have good water quality and use gravity or a handpump as their lifting system. About 4,500 wells meet these criteria.
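The counts above come down to a few filters and group-bys. The following is a minimal pandas sketch of how they could be computed, not the code actually used in the project. The column names (`status_group`, `quality_group`, `extraction_type_class`) and their values are assumptions about the competition's data layout, and the eight rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the competition data; column names and values are
# assumptions, and these rows are invented purely for illustration.
wells = pd.DataFrame(
    {
        "status_group": [
            "functional", "non functional", "non functional",
            "functional needs repair", "non functional", "non functional",
            "functional", "non functional",
        ],
        "quality_group": [
            "good", "good", "good", "good", "salty", "good", "good", "good",
        ],
        "extraction_type_class": [
            "gravity", "gravity", "submersible", "handpump",
            "handpump", "handpump", "motorpump", "motorpump",
        ],
    }
)

# Number of wells in each status class.
status_counts = wells["status_group"].value_counts()

# Share of non-functional wells within each extraction type.
is_broken = wells["status_group"].eq("non functional")
failure_rate = is_broken.groupby(wells["extraction_type_class"]).mean()

# Restoration candidates: non-functional, good water quality,
# and a gravity or handpump lifting system.
candidates = wells[
    is_broken
    & wells["quality_group"].eq("good")
    & wells["extraction_type_class"].isin(["gravity", "handpump"])
]

print(status_counts)
print(failure_rate.sort_values(ascending=False))
print(len(candidates), "restoration candidates")
```

On the real data, the same three expressions would give the class totals, the per-type failure rates, and the list of wells to prioritize.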
Model
The wells were classified using SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes, followed by an XGBoost classifier. The model had an accuracy of around 80% (top 15% on the competition leaderboard). More details can be found on my GitHub page.