Can Big Data root out corruption in Africa?

Many anticorruption advocates are excited about the prospect that “big data” will help detect and deter graft and other forms of malfeasance. But good data alone isn’t enough. For the data to be useful, there must be a group of interested and informed users who have both the tools and the skills to analyse it and uncover misconduct, and who can then lobby governments and donors to listen to and act on the findings. Analysing big datasets for evidence of corruption requires statistical skills and software, both of which are in short supply in many parts of the developing world, such as sub-Saharan Africa.

Yet some ambitious recent initiatives are trying to address this problem. Oxford mathematician Balázs Szendrői, his colleague Danny Parsons, and Elizabeth Dávid-Barrett from the University of Sussex have been leading one such initiative, which helps empower a group of young African mathematicians to analyse “big data” on public procurement.

As part of the British Academy/DFID-funded project Curbing Corruption in Development Aid-funded Procurement, Elizabeth, together with Mihály Fazekas and Olli Hellmann, had painstakingly collected contract-level data from three major donors covering 20 years. However, data is only the start. Elizabeth explains:

"The first step in this project was to develop software; this may seem trivial, but many cash-strapped African universities simply don’t have the resources to purchase the latest statistical software packages. The African Maths Initiative (AMI), a Kenyan NGO that works to create a stronger mathematical community and culture of mathematics across Africa, has helped to solve this problem by developing a new open-source program, R-Instat (which builds on the popular but difficult-to-learn statistics package R), funded through crowd-sourcing. Still in development, it is on track for launch in July this year. AMI has also helped develop a menu on R-Instat that can be used specifically for analysing procurement data and identifying corruption risk indicators.

Once we’ve got the data and the software to analyse it, the next and most crucial ingredient is the people. For “big data” to be useful as an anticorruption tool, we need to bring together two groups: people who understand how to analyse data, and people who understand how procurement systems can be manipulated to corrupt ends. Communication between the two is essential. So last month I tried to do my part by visiting AIMS Tanzania, an institute that offers a one-year high-level Master’s programme to some of Africa’s best maths students, to help conduct a one-day workshop. After a preliminary session in which we discussed the ways in which the procurement process can be corrupted, and how that might manifest in certain red flags (such as single-bidder contracts), the students had the opportunity to use the R-Instat software to analyse the aid-funded procurement dataset that my colleagues and I had created. Students formed teams and developed their own research questions, which they attempted to answer by using R-Instat to run analyses on the data.

Even the simplest analyses revealed interesting patterns. Why did one country’s receipts from the World Bank drop off a cliff one year and never recover? Discussion revealed a few possible reasons: perhaps a change of government led donors to change policy, or the country reached a stage of development where it no longer qualified for aid? Students became excited as they realised how statistical methods could be applied to identify, understand and solve real-world problems. Some teams came up with really provocative questions, such as the group who wanted to know whether Francophone or Anglophone countries were more vulnerable to corruption risks. Their initial analysis revealed that contracting in the Francophone countries was more associated with red flags. They developed the analysis to include a wider selection of countries, and maintained broadly similar results. Another group found that one-quarter of contracts in the education sector in one country had been won by just one company, and more than half of contracts by value in this sector had been won by three companies, all of which had suspiciously similar names. Again, there might be perfectly innocent reasons for this, but in just a couple of hours, we had a set of preliminary results that certainly warrant further analysis. Imagine what we might find with a little more time!

It is programmes like these, which develop the tools and cultivate the skills in the next generation of analysts, that will determine whether the promise of “big data” as an anticorruption tool will be realised in the developing world."
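
The checks described in the workshop, such as counting single-bidder contracts and measuring how much contract value goes to a handful of suppliers, are simple to express in code. The Python sketch below is only illustrative: the workshop itself used R-Instat, and the records, the field names (`supplier`, `value`, `bidders`) and the function names here are invented assumptions, not anything taken from the project's dataset.

```python
from collections import defaultdict

# Hypothetical contract records. The suppliers, values and bidder counts are
# invented for illustration and do not come from the project's dataset.
CONTRACTS = [
    {"supplier": "Alpha Ltd", "value": 400_000, "bidders": 1},
    {"supplier": "Alpha Trading", "value": 250_000, "bidders": 1},
    {"supplier": "Beta Works", "value": 150_000, "bidders": 4},
    {"supplier": "Gamma Co", "value": 200_000, "bidders": 3},
]


def single_bidder_share(contracts):
    """Return the fraction of contracts that attracted exactly one bidder."""
    if not contracts:
        return 0.0
    flagged = sum(1 for c in contracts if c["bidders"] == 1)
    return flagged / len(contracts)


def top_supplier_value_share(contracts, n):
    """Return the share of total contract value won by the n largest suppliers."""
    totals = defaultdict(float)
    for c in contracts:
        totals[c["supplier"]] += c["value"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top = sorted(totals.values(), reverse=True)[:n]
    return sum(top) / grand_total


print(single_bidder_share(CONTRACTS))          # 0.5
print(top_supplier_value_share(CONTRACTS, 2))  # 0.65
```

A high figure on either measure is a prompt for closer inspection, not proof of wrongdoing; as the workshop discussion noted, there may be perfectly innocent explanations.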

A fuller discussion of the research appears in the Sussex Centre for Corruption blog. In the image above Balázs Szendrői from Oxford Mathematics addresses the students.