Text Mining: Twitter Reacts to M. S. Dhoni's Decision

A look at the feelings behind the resignation of an Indian cricket team captain, using Hadoop and R.

On Wednesday, Mahendra Singh Dhoni announced that he was stepping down as captain of India's limited-overs cricket teams. As expected, the decision sent fans, fellow cricketers, and experts into a tizzy. Many congratulated Dhoni on his glorious career as captain, led by cricket legend Sachin Tendulkar, who said “it’s a day to celebrate his successful career and respect the decision.”

We (BDCoE Lab) then decided to analyze the tweets and gauge the sentiment surrounding the announcement using Hadoop and R.


Data Collection: In the first stage, we fetched tweets from the Twitter service using Apache Flume.

Store Data in HDFS: Flume wrote the raw Twitter JSON data to HDFS.

Apache Hive: We used Hive to transform the raw JSON into a structured dataset for the data science process.

Data Science Using R 
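The snippets in this section assume a tidy data frame, word_tweets_dhoni, with one word per row. The article does not show how it was built, so here is a minimal sketch using unnest_tokens() from the tidytext package. The tweets data frame and its id and text columns are hypothetical stand-ins for the dataset exported from Hive.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the Hive export: one row per tweet.
tweets <- data.frame(
  id   = 1:3,
  text = c("Thank you captain Dhoni",
           "Dhoni the best captain",
           "Respect the decision Dhoni"),
  stringsAsFactors = FALSE
)

# Split each tweet into lowercase words, one word per row.
word_tweets <- tweets %>%
  unnest_tokens(word, text)

# Count the words, most frequent first.
word_counts <- word_tweets %>%
  count(word, sort = TRUE)

print(word_counts)
```

With real data, the same two steps would presumably run over the full tweet table before any of the plots below.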

  • Word Frequencies: A common task in text mining is to look at word frequencies.
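library(dplyr)
library(ggplot2)

# Count each word, keep those seen more than 3,000 times, and plot them as bars
word_tweets_dhoni %>%
  count(word, sort = TRUE) %>%
  filter(n > 3000) %>%
  mutate(word = reorder(word, n)) %>%   # order the bars by frequency
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()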


  • Generate a WordCloud: An image composed of words used in a particular text or subject, in which the size of each word indicates its frequency or importance.
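library(dplyr)
library(tidytext)    # provides stop_words
library(wordcloud)

# Drop common stop words, then size each word by its count
word_tweets_dhoni %>%
  anti_join(stop_words, by = "word") %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 200))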


Sentiment Wordcloud

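library(reshape2)    # provides acast()

# Tag words with the Bing lexicon, then reshape to a word-by-sentiment matrix
word_tweets_dhoni %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("#F8766D", "#00BFC4"),
                   max.words = 200)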
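
comparison.cloud() expects a matrix with one row per word and one column per group, which is what the acast() step produces. Below is a small sketch of that reshaping on toy data. The toy_words table and the two-word toy_lexicon are made up for illustration and stand in for the tweet words and the Bing lexicon.

```r
library(dplyr)
library(reshape2)

# Made-up tokens standing in for word_tweets_dhoni.
toy_words <- data.frame(
  word = c("great", "great", "sad", "great", "sad", "captain"),
  stringsAsFactors = FALSE
)

# Made-up lexicon with the same columns as get_sentiments("bing").
toy_lexicon <- data.frame(
  word      = c("great", "sad"),
  sentiment = c("positive", "negative"),
  stringsAsFactors = FALSE
)

sentiment_matrix <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # drops words missing from the lexicon
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0)

print(sentiment_matrix)
```

Words that never appear with a given sentiment get 0 in that column because of fill = 0, so every word has a value for both groups.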


  • Combinations of words using n-grams: Using bigrams to provide context in sentiment analysis. 
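
The bigram snippets in this section rely on two tables the article does not define: tweets_dhoni_bigrams_separated and tweets_dhoni_bigrams_filtered. One plausible way to build such tables is to tokenize the tweets into bigrams and split each pair into word1 and word2. The sketch below does this on a hypothetical tweets data frame with a text column.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the tweet dataset.
tweets <- data.frame(
  id   = 1:2,
  text = c("Dhoni was never selfish", "Thank you captain cool"),
  stringsAsFactors = FALSE
)

# Tokenize into overlapping word pairs, then split each pair into two columns.
bigrams_separated <- tweets %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# Keep only the pairs in which neither word is a stop word.
bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word)

print(bigrams_separated)
print(bigrams_filtered)
```

Stop-word filtering also removes negations such as "never", which is presumably why the negation analysis works on the separated table while the network graph uses the filtered one.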

Words Preceded By "Captain"

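# AFINN is assumed to be loaded beforehand: a data frame of words,
# each with a numeric sentiment `score`.
captain_words <- tweets_dhoni_bigrams_separated %>%
  filter(word1 == "captain") %>%
  inner_join(AFINN, by = c(word2 = "word")) %>%
  count(word2, score, sort = TRUE) %>%
  ungroup()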


# Plot the 20 words whose count-weighted score has the largest magnitude
captain_words %>%
  mutate(contribution = n * score) %>%
  arrange(desc(abs(contribution))) %>%
  head(20) %>%
  mutate(word2 = reorder(word2, contribution)) %>%
  ggplot(aes(word2, contribution, fill = contribution > 0)) +
  geom_col(show.legend = FALSE) +
  xlab("Words preceded by \"captain\"") +
  ylab("Sentiment score * number of occurrences") +
  coord_flip()
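
The contribution column is simply each word's count multiplied by its sentiment score, so a frequent, mildly positive word can outweigh a rare, strongly negative one. Here is a toy illustration with invented counts and scores, not taken from the tweet data:

```r
library(dplyr)

# Invented counts and scores, for illustration only.
toy_captain_words <- data.frame(
  word2 = c("cool", "great", "worst"),
  n     = c(40, 10, 5),
  score = c(1, 3, -3),
  stringsAsFactors = FALSE
)

# Weight each score by its count and rank by absolute size.
toy_contributions <- toy_captain_words %>%
  mutate(contribution = n * score) %>%
  arrange(desc(abs(contribution)))

print(toy_contributions)
```

Ranking by abs(contribution) keeps strongly negative words in the top 20 alongside positive ones, and the sign is then used only for the bar color.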


Words Preceded By Negation

negation_words <- c("not", "no", "never", "without","like")


negated_words <- tweets_dhoni_bigrams_separated %>%
  filter(word1 %in% negation_words) %>%
  inner_join(AFINN, by = c(word2 = "word")) %>%
  count(word1, word2, score, sort = TRUE) %>%
  ungroup()

# For each negation word, plot the 10 following words with the largest weighted score
negated_words %>%
  mutate(contribution = n * score) %>%
  mutate(word2 = reorder(word2, contribution)) %>%
  group_by(word1) %>%
  top_n(10, abs(contribution)) %>%
  ggplot(aes(word2, contribution, fill = contribution > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ word1, scales = "free") +
  xlab("Words preceded by negation") +
  ylab("Sentiment score * number of occurrences") +
  coord_flip()


Visualizing a Network of Bigrams With igraph

tweets_dhoni_bigrams_counts <- tweets_dhoni_bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

library(igraph)

tweets_dhoni_bigrams_graph <- tweets_dhoni_bigrams_counts %>%
  filter(n > 500 & n < 3000) %>%
  graph_from_data_frame()

tweets_dhoni_bigrams_graph

library(ggraph)

set.seed(2016)

a <- grid::arrow(type = "closed", length = grid::unit(.15, "inches"))

ggraph(tweets_dhoni_bigrams_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()
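
graph_from_data_frame() treats the first two columns of its input as the edge list (from, to) and keeps the remaining columns, here n, as edge attributes. That is why edge_alpha = n can be mapped in the plot above. A small sketch with made-up bigram counts:

```r
library(igraph)

# Made-up bigram counts, for illustration only.
toy_bigram_counts <- data.frame(
  word1 = c("captain", "ms", "captain"),
  word2 = c("cool", "dhoni", "dhoni"),
  n     = c(900, 2500, 700),
  stringsAsFactors = FALSE
)

# Build a directed graph: one vertex per distinct word, one edge per bigram.
toy_graph <- graph_from_data_frame(toy_bigram_counts)

n_words   <- vcount(toy_graph)   # number of distinct words
n_bigrams <- ecount(toy_graph)   # number of bigram edges
edge_n    <- E(toy_graph)$n      # counts carried along as an edge attribute

print(toy_graph)
```

The graph is directed by default, so each arrow in the plot points from the first word of a bigram to the second.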


Thanks!


Published at DZone with permission of Ankur Kumar. See the original article here.
