[UPDATE 15/03/30] Check out this follow-up post to get the R code used to extract and clean the Politwoops & Twitter data.
A package that automates the data collection is now available on GitHub: PolitwoopsR.

For some months now, I’ve been playing with data from Politwoops: an international project that tracks politicians on Twitter and stores their deleted tweets for posterity. The project started in the Netherlands and spread to dozens of other countries. In the US, it is run by the good people at the Sunlight Foundation. Looking at those US deleted tweets has been one of my side projects at the Lazer Lab. The initial idea was to separate the tweets deleted for trivial reasons (typos, broken links) from those with politically problematic content, compare the latter to non-deleted tweets, and see whether certain political issues or topics appear disproportionately often in content removed by the left and the right.

I am still working on some of that, along with colleagues at the lab. In the meantime, I’ve put together some fun descriptive stats and posted them as charts below: the terms, hashtags, and URLs most often deleted by Democrats and Republicans, the US states where politicians tend to go back and edit their Twitter history most often, the top tweet-deleting politicians in the US, the sentiment of the tweets, a semantic network of deleted terms, and so on.
Thanks are due to Open State’s Breyten Ernsting (@breyten) and Sunlight’s Nicko Margolies (@SFnicko), whose helpful tweets clued me in on the data structure.

Data: politwoops.sunlightfoundation.com


Top 10 Politicians Deleted Tweets

Top 10 Politicians %Deleted Tweets


Deleted Tweets by Party (2011-2015)

Tweet Sentiment by Party (2011-2015)


Deleted tweets per politician by state over time – Total

Deleted tweets per politician by state over time – Democrats

Deleted tweets per politician by state over time – Republicans

Top terms in deleted tweets (2011-2015)