
Monitoring the most controversial Wikipedia articles

How I built a web application that tracks edits and controversy scores of the English-language Wikipedia to monitor edit wars as they unfold in real time.

Published: at 10:00 AM

Edit wars in Wikipedia

Because anyone can edit Wikipedia, sometimes people disagree about what an article should say. These disagreements can lead to what’s called an edit war.

An edit war happens when two or more people keep changing an article back and forth. For example, one person might add some information, and another person removes it. Then the first person adds it again, and so on. Instead of discussing the issue calmly, they fight by editing the page over and over.

This phenomenon has been described in several studies, and multiple approaches for quantifying the controversy of a Wikipedia article have been suggested. These studies most often rely on Wikipedia data dumps to answer questions such as which articles are the most controversial. One limitation of this approach is that the scores are computed for a single point in time and not continuously updated. The goal of my little exploration is to compute the controversy values of Wikipedia articles as done in previous work, but to do so frequently over a given time period, so that one can watch the "live" values and see how they evolve in real time.

Definition of the controversy score of a Wikipedia article

The controversy value $M$ as defined in the work of Yasseri et al. 1 is based on mutual reverts. A revert is an edit that undoes the changes of another editor. A mutual revert occurs when editor A reverts editor B and vice versa. Yasseri et al. came up with the following definition of $M$:

$$M = E \sum_{\text{all mutual reverts}} \min(N^d, N^r)$$

where $N^r$ and $N^d$ are the numbers of edits on the article committed by the reverting and the reverted editor, respectively. The authors sum over mutual reverts rather than single reverts because reverting is part of the normal editing workflow, e.g. for defending against vandalism.

I also decided to use mutual reverts as an indication of a dispute, mostly because they are easy to compute and are probably a good signal of disagreement between editors. The concept of mutual reverts is also very close to the definition of an edit war itself. That said, the metric certainly has its limitations; a more sophisticated method for detecting disagreement might include a semantic analysis of the content of the edits.

Since we want to measure controversy within a given time frame, I came up with the following method: let the time frame be defined by a start date $t_0$ and an end date $t_1$, and let $R_{[t_0, t_1]}$ be the set of mutual reverts of an article within that time frame:

$$M_{[t_0, t_1]} = \sum_{(u, v) \in R_{[t_0, t_1]}} \min\big(e_u(t_r), e_v(t_r)\big)$$

where $t_r$ is the timestamp of the revert, $R_{[t_0, t_1]}$ is the set of mutual revert events in the window $[t_0, t_1]$, and $e_u(t)$, $e_v(t)$ are the cumulative numbers of edits by users $u$ and $v$ up to time $t$.
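To make the definition concrete, here is a minimal Python sketch of how the windowed score could be computed from a list of mutual revert events. The data structures (a `MutualRevert` record and a per-user lookup of edit timestamps) are hypothetical and only serve to illustrate the formula, not the actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MutualRevert:
    """A mutual revert event between two editors (hypothetical record)."""
    user_a: str
    user_b: str
    timestamp: datetime  # t_r, the time of the revert


def edits_up_to(edit_times: dict[str, list[datetime]], user: str, t: datetime) -> int:
    """Cumulative number of edits by `user` up to time t, i.e. e_user(t)."""
    return sum(1 for ts in edit_times.get(user, []) if ts <= t)


def controversy_score(mutual_reverts: list[MutualRevert],
                      edit_times: dict[str, list[datetime]],
                      t0: datetime, t1: datetime) -> int:
    """M_[t0, t1]: sum of min(e_u(t_r), e_v(t_r)) over mutual reverts in the window."""
    score = 0
    for mr in mutual_reverts:
        if t0 <= mr.timestamp <= t1:
            e_u = edits_up_to(edit_times, mr.user_a, mr.timestamp)
            e_v = edits_up_to(edit_times, mr.user_b, mr.timestamp)
            score += min(e_u, e_v)
    return score
```

In the running system these numbers come from the database rather than from in-memory lists, but the arithmetic is the same.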

Implementation

The monitoring application consists of three components: a small Python script that listens to a websocket stream of all Wikipedia edits, a TimescaleDB database that stores the edit events as a time series and runs a cron job to recalculate the controversy scores every hour, and finally a web application that presents the statistics to users.
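To illustrate the storage layer, here is a sketch of how the edit events could be stored as a TimescaleDB hypertable. The table layout, column names, and connection string are assumptions for illustration, not the project's actual schema.

```python
import psycopg2

# Connection string and schema are illustrative placeholders.
conn = psycopg2.connect("dbname=wikimonitor user=postgres host=localhost")

with conn, conn.cursor() as cur:
    # One row per edit event, keyed by time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS edits (
            time         TIMESTAMPTZ NOT NULL,
            page_title   TEXT        NOT NULL,
            username     TEXT        NOT NULL,
            rev_id       BIGINT,
            content_hash TEXT
        );
    """)
    # Turn the table into a TimescaleDB hypertable partitioned by the time column.
    cur.execute("SELECT create_hypertable('edits', 'time', if_not_exists => TRUE);")
```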

Consuming recent Wikipedia edits

The stream of recent changes comes from the wikimon project by hatnote 2. Wikimon monitors Wikipedia's IRC recent-changes feed and streams it over a websocket.

In this project, the websocket stream is consumed and all edits that don't belong to the “Main” namespace of the English Wikipedia are filtered out, so discussion and talk pages are not part of the analysis.
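A minimal sketch of such a consumer, based on the `websockets` library, could look like this. The wikimon endpoint URL and the message field names (`ns`, `page_title`, `user`) are assumptions; check the wikimon README for the actual values.

```python
import asyncio
import json

import websockets

# Assumed wikimon endpoint for the English Wikipedia; see the wikimon README for the real URL.
WIKIMON_URL = "ws://wikimon.hatnote.com:9000"


async def consume_edits() -> None:
    async with websockets.connect(WIKIMON_URL) as ws:
        async for raw in ws:
            change = json.loads(raw)
            # Keep only edits to articles in the "Main" namespace;
            # field names like "ns" and "page_title" are assumptions about the message format.
            if change.get("ns") != "Main":
                continue
            handle_edit(change)


def handle_edit(change: dict) -> None:
    # Placeholder: fetch the revision content, hash it, and store it (see below).
    print(change.get("page_title"), change.get("user"))


if __name__ == "__main__":
    asyncio.run(consume_edits())
```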

For each change, the consumer script makes an API call to Wikipedia to retrieve the content of the article's revision. For fast comparison of article content, an md5 hash is computed over the wikitext of the revision.

The content hash is then stored in the database along with the metadata of the edit.
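A sketch of this step could look roughly as follows, using the MediaWiki query API to fetch the wikitext of a revision and `hashlib` to compute the md5 hash. The exact parameters and error handling of the real consumer script may differ.

```python
import hashlib

import requests

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_revision_wikitext(rev_id: int) -> str:
    """Fetch the wikitext of a single revision via the MediaWiki query API."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": rev_id,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def content_hash(wikitext: str) -> str:
    """md5 hash of the wikitext, used for fast comparison of revisions."""
    return hashlib.md5(wikitext.encode("utf-8")).hexdigest()
```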

Calculating the c-score

Every hour, the controversy scores of the articles edited within a predefined time frame are recalculated. Reverts are detected via the md5 hashes of the article content, and the c-score is then calculated as defined above.
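The following sketch outlines one way the revert detection and mutual-revert pairing could work on top of the stored hashes: an edit counts as a revert if its content hash matches an earlier revision of the same article, and a mutual revert is a pair of editors who have each reverted the other. The edit dictionaries and field names are hypothetical; in the project this logic runs as an hourly job against the database.

```python
from datetime import datetime


def find_reverts(edits: list[dict]) -> list[tuple[str, str, datetime]]:
    """Detect reverts in a chronologically ordered list of edits of one article.

    An edit counts as a revert if its content hash matches an earlier revision;
    the editors of the revisions in between are treated as reverted. Each edit
    is a dict with hypothetical "user", "hash" and "time" keys. Returns
    (reverting_user, reverted_user, revert_time) triples.
    """
    reverts = []
    last_seen: dict[str, int] = {}  # content hash -> index of the last revision with that hash
    for i, edit in enumerate(edits):
        h = edit["hash"]
        if h in last_seen:
            start = last_seen[h]
            reverted_users = {e["user"] for e in edits[start + 1:i]}
            for reverted in reverted_users - {edit["user"]}:
                reverts.append((edit["user"], reverted, edit["time"]))
        last_seen[h] = i
    return reverts


def mutual_reverts(reverts: list[tuple[str, str, datetime]]) -> list[tuple[str, str, datetime]]:
    """Keep only reverts where the reverted editor has also reverted the reverter."""
    pairs = {(a, b) for a, b, _ in reverts}
    return [(a, b, t) for a, b, t in reverts if (b, a) in pairs]
```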

Making the monitoring visible to users

Lastly, I created a minimal web application that displays the calculated metrics in a simple table. The 5 most controversial articles as well as the 10 most edited pages are displayed and can be filtered by different time periods.

The current version of the application is available here.

And the source code is openly available on my gitlab.

Call for suggestions

This small project was a lot of fun to put together, and I think I'm only scratching the surface of what's possible with the stream of edit data. So if you have an idea for a specific metric that would be interesting to compute, or a different way the data could be visualized, please get in touch with me.

Finally, I want to thank Hatnote for creating and hosting wikimon. Without such an easy way to connect to the stream of Wikipedia edits, this project wouldn't have come together so smoothly.

References

Footnotes

  1. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2269392

  2. https://github.com/hatnote/wikimon

