December 1st, 2016

Using Sentiment Analysis to Measure Election Surprise

Category: Data Journalism
prediction_comparisons

Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset.
One recent event that surprised many people was the November 8th US Presidential election. Hillary Clinton, who ended up losing the race, had been given chances ranging from a 71.4% (FiveThirtyEight), to a 85% (New York Times), to a >99% chance of victory (Princeton Election Consortium).
prediction_comparisons

Credit: New York Times

To measure the shock of this upset, we decided to examine comments made during the announcements of the election results and see how (if) the sentiment changed. The sentiment of a comment is measured by how its words correspond to either a negative or positive connotation. A score of ‘0.00’ means the comment is neutral, while a higher score means that the sentiment is more positive (and a negative score implies the comment is negative).
Our dataset is a .csv of all Reddit comments made during 11/8/2016 to 11/10/2016 (UTC) and is courtesy of /u/Stuck_In_the_Matrix. All times are in EST, and we’ve annotated the timeline (the height of the bars denotes the number of comments posted during that hour):
politics_counts2


We examined five political subreddits to gauge their reactions. Our first target was /r/hillaryclinton, Clinton’s primary support base. The number of comments reached a high starting at around 9pm EST, but the sentiment gradually fell as news came in that Donald Trump was winning more states than expected.
hillaryclinton_counts

/r/hillaryclinton: Number of Comments per Hour

hillaryclinton_sentiment

/r/hillaryclinton: Mean Sentiment Score per Hour

What is interesting is the low number of comments made after the election was called for Donald Trump. I suspect that it may have been a subreddit-wide pause on comments due to concerns about trolls, but I’m not sure; I contacted the moderators but haven’t received a response back yet.
A few other left-leaning subreddits had interesting results as well. While /r/SandersforPresident was closed for the election season post-Bernie concession, it’s successor, /r/Political_Revolution, had not closed and experienced declines in comment sentiment as well.
political_revolution_counts

/r/Political_Revolution: Number of Comments per Hour

political_revolution_sentiment

/r/Political_Revolution: Mean Sentiment Score per Hour


On /r/The_Donald (Donald Trump’s base), the results were the opposite.
the_donald_counts

/r/The_Donald: Number of Comments per Hour

the_donald_sentiment

/r/The_Donald: Mean Sentiment Score per Hour


There are also a few subreddits that are less candidate- or ideology-specific: /r/politics and /r/PoliticalDiscussion. /r/PoliticalDiscussion didn’t seem to show any shift, but /r/politics did seem to become more muted, at least compared to the previous night.
politicaldiscussion_counts

/r/PoliticalDiscussion: Number of Comments per Hour

politicaldiscussion_sentiment

/r/PoliticalDiscussion: Mean Sentiment Score per Hour
politics_sentiment
/r/politics: Mean Sentiment Score per Hour

To recap,

  1. Reddit political subreddits experienced a sizable increase in activity during the election results
  2. Subreddits differed in their reactions to the news along idealogical lines, with pro-Trump subreddits having higher positive sentiment than pro-Clinton subreddits

What could be the next steps for this type of analysis?

  1. Can we use these patterns to classify the readership of the comments sections of newspapers as left- or right-leaning?
  2. Can we apply these time-series sentiment analyses to other events, such as sporting events (which also includes two ‘teams’)?
  3. Can we use sentiment analysis to evaluate the long-term health of communities, such as subreddits dedicated to eventually-losing candidates, like Bernie Sanders?

Leave a Reply

The Making of H2O Driverless AI – Automatic Machine Learning

It is my pleasure to share with you some never before exposed nuggets and insights

December 5, 2018 - by Arno Candel
Gratitude and thank you, makers!

Makers, Happy Thanksgiving - Hope you get to spend time with your loved ones this week. Thank them

November 21, 2018 - by Saurabh Kumar
New features in H2O 3.22

Xia Release (H2O 3.22) There's a new major release of H2O and it's packed with new

November 12, 2018 - by Jo-Fai Chow
Top 5 things you should know about H2O AI World London

We had a blast at H2O AI World London last week! With a record number

November 6, 2018 - by Bruna Smith
Fallback Featured Image
Anomaly Detection with Isolation Forests using H2O

Introduction Anomaly detection is a common data science problem where the goal is to identify odd

November 6, 2018 - by angela
Fallback Featured Image
Launching the Academic Program … OR … What Made My First Four Weeks at H2O.ai so Special!

We just launched the H2O.ai Academic Program at our sold-out H2O AI World London. With

October 30, 2018 - by Conrad

Join the AI Revolution

Subscribe, read the documentation, download or contact us.

Subscribe to the Newsletter

Start Your 21-Day Free Trial Today

Get It Now
Desktop img