Buzz Blog

Scientists Predicted American Idol Winner Using Twitter Data

Thursday, May 24, 2012
Last night, millions of viewers watched as Phillip Phillips was crowned "American Idol's" winner for the show's 11th season. But hours before the announcement, computer scientists had already predicted he would win after looking at relevant Twitter posts, and they published their prediction online.

A key factor missing from other analyses, the geographic location of the tweets, proved critical in making the final prediction. The researchers also successfully predicted many of the show's earlier eliminations, a more difficult task given the greater number of contestants.

Phillip Phillips, the 11th American Idol winner. Image Credit: Fox Television via ABC News.

Researchers have been looking at Twitter's predictive power for applications including disease spreading, stock market behavior and political election outcomes. Twitter users, however, don't always represent an accurate sample of a target population. Consequently, predictions made for events like political elections have had mixed results.

Bruno Gonçalves, a computer scientist at Northeastern University, and his team hoped to better understand what Twitter can and cannot predict by looking at "Idol" Twitter chatter (no pun intended) between 8:00 pm and 3:00 am on the show's election nights.

"People don't usually go into the effort to understand the intrinsic limitations [of Twitter]," said Gonçalves, a coauthor of the American Idol research.

So why study a TV show? Several of the assumptions that go into traditional election analyses hold particularly well for American Idol:
  • The stakes aren't as high as real elections, so there aren't as many organized interests pushing for one candidate over another.
  • Unlike in traditional elections, fans are allowed to vote multiple times, so particularly vocal tweeters can have more of an impact on the actual results by calling in several times.
  • The people tweeting about American Idol tend to fit the demographics of American Idol voters. For political elections, Twitter users discussing politics may tend to be activists, which skews predictive results toward one political extreme.
Furthermore, there's a mini-election every week on American Idol, giving researchers ample opportunity to test their analyses. Overall, the analysis accurately predicted the eliminated contestant or the three contestants with the fewest votes for a significant majority of the weeks.

And the winner is...

For the three weeks leading up to the finale, the research team correctly predicted every person voted off the show. There's only a 1.7 percent chance that random guessing would have yielded all of these correct predictions. Also, for all but one week, the analysis correctly predicted either all of the bottom three contestants or two of the bottom three.
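For the curious, here is a quick back-of-the-envelope check of that 1.7 percent figure. It is only a sketch and rests on an assumption not spelled out above: that five, four and three contestants remained in those three weeks, with one eliminated each week, and that a random guesser picks uniformly among the remaining contestants. The function name is made up for illustration.

```python
from fractions import Fraction

def chance_all_correct(contestants_per_week):
    """Probability of naming every eliminated contestant by uniform random guessing.

    contestants_per_week: number of contestants remaining each week
    (assumed values; one contestant is eliminated per week).
    """
    p = Fraction(1)
    for n in contestants_per_week:
        p *= Fraction(1, n)  # one correct pick out of n contestants
    return p

# Assumption: 5, 4 and 3 contestants in the three weeks before the finale.
p = chance_all_correct([5, 4, 3])
print(p, f"{float(p):.1%}")  # prints: 1/60 1.7%
```

Under those assumptions the odds come out to 1 in 60, which rounds to the 1.7 percent quoted above.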

The percentage of tweets about each of the contestants for the three weeks leading up to the finale. Contestants who were voted off are shown in red, and error bars for the 99 percent confidence interval are shown in black. Image Courtesy Ciulla et al. via arxiv.org.

That's pretty impressive, given that the team didn't even take into account whether tweets were positive or negative. Discussions about a contestant, regardless of the specific content, seemed to predict his or her success. But there was one important component that the team did include for the finale prediction: the geographic location of the Twitter users.

Only U.S. viewers are allowed to cast votes for the show (although there are some workarounds, most votes seem to come from the U.S.). Therefore, tweet location was vital for the final showdown between Phillip Phillips and Jessica Sanchez. Although Sanchez had much more Twitter volume globally (she had a lot of support from the Philippines, her mother's native country), Phillips had a slight edge in the States. Other analyses that tallied the total volume of tweets missed this aspect, and they wrongly predicted that Sanchez would win.
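To see how a location filter can flip a volume-based prediction, here is a minimal sketch. It is not the research team's code: the tweet records, field names (`contestant`, `country`) and function names are all hypothetical, and the toy counts are invented purely for illustration, using placeholder contestants "A" and "B" rather than real data.

```python
from collections import Counter

def mention_share(tweets, country=None):
    """Share of mentions per contestant, optionally restricted to one country."""
    counts = Counter(
        t["contestant"]
        for t in tweets
        if country is None or t["country"] == country
    )
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}

def predicted_winner(tweets, country=None):
    """Contestant with the largest share of mentions (None if there are no tweets)."""
    shares = mention_share(tweets, country)
    return max(shares, key=shares.get) if shares else None

# Invented toy data: "B" leads worldwide, "A" leads among U.S. tweets.
toy_tweets = (
    [{"contestant": "A", "country": "US"}] * 3
    + [{"contestant": "B", "country": "US"}] * 2
    + [{"contestant": "B", "country": "PH"}] * 4
)

print(predicted_winner(toy_tweets))                # B (all tweets)
print(predicted_winner(toy_tweets, country="US"))  # A (U.S. tweets only)
```

Counting every tweet picks one contestant, while counting only tweets from the country where votes are cast picks the other, which is the effect described above.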

Nonetheless, this analysis was still fairly basic, and there are several other factors, such as the sentiment of the tweets, worth considering in future research.

"Here, our goal was to establish a solid baseline," said Gonçalves.

Beyond the potential applications to actual elections, there could be widespread implications for American Idol itself, including the huge betting market for the show. By predicting the winners ahead of time with this public data, someone could stand to win a substantial amount of money. When analyses start to consider even more variables, they'll likely become more accurate, and potentially more lucrative.

But the research team simply wanted to know more about Twitter's predictive power, according to Gonçalves. When asked whether he was a fan of the show, Gonçalves admitted that he had never really sat down to watch it.

"This was a purely scientific interest," he said.

You can find out more about the team's analysis through their research paper on arxiv.org. In particular, notice the timestamp for the latest version of the paper, indicating that it was published before the final "Idol" results were announced.

-------------------------------------------------------------------------------------

If you want to keep up with Hyperspace, AKA Brian, you can follow him on Twitter.
Posted by Hyperspace