Data science cat and dog

Andrew Russell Green

Research, data science and software portfolio

Wikimedia Fundraising

An analysis of donations to the Wikimedia Foundation based on publicly available data.

Skills used
Trend analysis
Data visualization
Data preparation
Python
Research methodology

This analysis shows trends in average donation amounts from banner campaigns in nine countries. Tasks included scraping, cleaning and preprocessing data from multiple public sources, and review of possible approaches for the high-level analysis.

I did this work as a personal project. It is a proof of concept of my approach, not a completed analysis.

Approach

A principle I followed is that, whenever possible, we should consider the causal (in this case, social and economic) processes behind our data. The principle is applied in the analysis in two ways:

  1. Amounts are adjusted for local inflation in each country, and take into account fluctuations in exchange rates. This shows the value of the money donors gave, in terms of the local economic context, providing a sense of what donating meant to them.

  2. I abstracted away from the internal Wikimedia campaign concept used in the raw data, and instead consolidated campaigns into uninterrupted blocks of time during which donors in a given country and language were targeted.

    This also highlights the donors’ point of view. Donors know that at certain times of the year, they see fundraising banners on Wikipedia, but they are indifferent to internal Wikimedia Foundation campaigns. (The concept comes from the banner targeting software that Wikimedia uses. A given donor segment may be targeted by many of these internal campaigns at the same time or in short succession.)
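As an illustration of the consolidation step, here is a minimal Python sketch. It is not the code from this project: the `Campaign` and `Block` records, the `consolidate` function and the one-day gap tolerance are hypothetical. It assumes each campaign has already been reduced to one country, one Wikipedia language and inclusive start and end dates, and it merges campaigns for the same segment whose date ranges overlap or touch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby
from operator import attrgetter


# Hypothetical records; field names are assumptions, not the project's schema.
@dataclass(frozen=True)
class Campaign:
    name: str
    country: str   # e.g. "NL"
    language: str  # Wikipedia language the banners ran on, e.g. "nl" or "en"
    start: date
    end: date      # inclusive last day


@dataclass
class Block:
    country: str
    language: str
    start: date
    end: date
    campaigns: list  # names of the internal campaigns merged into this block


segment = attrgetter("country", "language")


def consolidate(campaigns, max_gap_days=1):
    """Merge campaigns for the same country/language segment whose date ranges
    overlap or are separated by at most max_gap_days into uninterrupted blocks."""
    ordered = sorted(campaigns, key=lambda c: (segment(c), c.start))
    blocks = []
    for (country, language), group in groupby(ordered, key=segment):
        current = None
        for c in group:
            if current is not None and c.start <= current.end + timedelta(days=max_gap_days):
                # Overlapping or adjacent: extend the open block.
                current.end = max(current.end, c.end)
                current.campaigns.append(c.name)
            else:
                # A real break in the banners: start a new block.
                current = Block(country, language, c.start, c.end, [c.name])
                blocks.append(current)
    return blocks
```

The `max_gap_days` parameter controls how short a break between campaigns still counts as uninterrupted; a suitable value depends on how the banner logs record start and end times.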

This approach ties into my theoretical work about culture, and aligns broadly with causal statistical analysis.

Results

Results should be seen as extremely tentative at best, given the limitations of the data and the analysis. For most countries analyzed, we see a downward trend in inflation-adjusted average donation amounts. In other words, in these countries, the local value of the average amount contributed by each donor has gone down.

There are many possible explanations for these trends. One is that donors may have been influenced by the suggested donation amounts that appear on banners, and Wikimedia may have lowered these suggested amounts to promote recurring donations.

Below are plots for three of the nine countries analyzed: the Netherlands, Japan and India. (For the Netherlands and Japan, banners were shown on Wikipedias in English and the local language. In the legends, the first two letters indicate which Wikipedia the banners were shown on.)

Sources, Preprocessing and Analysis

The Wikimedia Foundation publishes average donation amounts here. Data is aggregated by internal Wikimedia campaign. Amounts are in US dollars.

To determine the targeted reader segments and dates of each campaign, I scraped the public logs of the Wikimedia banner system. Campaigns that targeted more than one country or segment were not considered. (For this reason, the analysis mostly omits English-speaking countries.)
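The filtering rule can be sketched as follows, assuming (hypothetically) that each scraped campaign has been parsed into a dict with `countries` and `languages` lists. The key names and the `single_segment` helper are illustrative and not taken from the project's code.

```python
def single_segment(campaigns):
    """Keep only campaigns that target exactly one country and one language.

    Each campaign is a dict with hypothetical keys "countries" and "languages"
    (lists of targeting values extracted from the banner logs).
    """
    return [
        c for c in campaigns
        # Use sets so a value repeated in the logs is not counted twice.
        if len(set(c["countries"])) == 1 and len(set(c["languages"])) == 1
    ]
```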

To calculate inflation-adjusted donation amounts, I first converted the USD amounts in the published data to local currencies using exchange rates at the time of each campaign. I then adjusted these amounts to the current value of the local currency based on local inflation rates, and converted them back to USD at the current exchange rate. To merge data from contiguous campaigns, amounts were weighted by the total number of donations in each campaign.
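Per campaign, the adjustment works out to average × (exchange rate then) × (price index now ÷ price index then) ÷ (exchange rate now), with exchange rates expressed in local currency per US dollar. The sketch below illustrates this under stated assumptions: `adjust_amount` and `weighted_average` are hypothetical helpers, a consumer price index stands in for the local inflation rates, each campaign is adjusted with rates for its own dates before campaigns are combined, and looking up the right IMF figures for each period is left out.

```python
def adjust_amount(avg_usd, fx_then, cpi_then, fx_now, cpi_now):
    """Express a past USD average in current USD, via the local currency.

    fx_* are local-currency units per US dollar; cpi_* are local price
    indices (an assumed stand-in for the local inflation rates).
    """
    local_then = avg_usd * fx_then               # USD -> local currency at the campaign's rate
    local_now = local_then * cpi_now / cpi_then  # restate in current local prices
    return local_now / fx_now                    # local currency -> USD at the current rate


def weighted_average(averages, counts):
    """Combine per-campaign averages into one block average,
    weighting each campaign by its number of donations."""
    total = sum(counts)
    return sum(a * n for a, n in zip(averages, counts)) / total
```

With hypothetical figures, a 10 USD average at 100 yen per dollar, with local prices up 10% since then and a current rate of 150 yen per dollar, becomes 10 × 100 × 1.1 ÷ 150 ≈ 7.33 USD.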

Exchange and inflation rates are from the International Monetary Fund’s International Financial Statistics, downloaded via the IMF’s API.

I set a change point for all trend lines at May 2020 to account for major shifts in economies and online habits due to the global pandemic.
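How the trend lines were fitted is not described here. One common way to impose a change point is a continuous piecewise-linear (hinge) regression, sketched below with NumPy under that assumption. `fit_trend`, `CHANGE_POINT` (taken here as 1 May 2020) and the use of years as the time unit are hypothetical choices.

```python
from datetime import date

import numpy as np

CHANGE_POINT = date(2020, 5, 1)  # assumption: the change point is the first day of May 2020


def fit_trend(dates, values, change_point=CHANGE_POINT):
    """Fit a continuous piecewise-linear trend whose slope may change at change_point.

    Model: y = b0 + b1 * t + b2 * max(0, t), with t in years since the change point.
    """
    t = np.array([(d - change_point).days / 365.25 for d in dates])
    # Columns: intercept, linear time, hinge term that is zero before the change point.
    X = np.column_stack([np.ones_like(t), t, np.maximum(t, 0.0)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coef  # [level at change point, slope before, change in slope after]
```

Because time is measured from the change point, the first coefficient is the fitted level in May 2020, the second is the slope before it, and the third is the change in slope afterwards. If the trend lines are allowed to jump at the change point, a step column such as `(t >= 0).astype(float)` could be added to the design matrix.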

Code

Code for this analysis is here. It includes a Python package, a command-line tool and a Jupyter notebook.