CatarinaNSSilva/twitter_sentiment_python_r

How do Twitter users feel about Python and R?

Catarina Silva

September 20, 2020

 

Python and R are popular programming languages for statistics. There is a vast amount of discussion about how they differ, which one is best for data science, and which one to learn first.

I wanted an overview of what the Twitter community thinks of each language, so I ran two Twitter search queries: "python AND language" for Python and "rstats" for R.

One interesting conclusion from this little exercise is that people think that both R and Python are easy (although, in the Python tweets, the word "difficult" was tweeted twice as often as "easy", while "difficult" didn't show up in the top words for R!).

Top sentiments for Python

Sentiment Python

Top sentiments for R

Sentiment R

Word cloud of top tweets for Python | Word cloud of top tweets for R
Word Cloud Python | Word Cloud R

 

Here is how I conducted the analysis:

 

Load the packages


library("rmarkdown")
library("rtweet")
library("dplyr")
library("tidyr")
library("tidytext")
library("textdata")
library("ggplot2")
library("wordcloud2") 
library("webshot")
library("htmlwidgets")

 

Load data and process each set of tweets into tidy text


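# Read the saved tweets; keep text columns as character rather than factor
python_2020_09_19_clean = read.csv("./python_2020_09_19_clean.csv", header = TRUE, stringsAsFactors = FALSE)
r_2020_09_20_clean = read.csv("./r_2020_09_20_clean.csv", header = TRUE, stringsAsFactors = FALSE)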

tweets_python_2020_09_19_clean = python_2020_09_19_clean %>% select(screen_name, text)
tweets_r_2020_09_20_clean = r_2020_09_20_clean %>% select(screen_name, text)

 

Use pre-processing text transformations to clean up the tweets

  1. Remove http elements manually

tweets_python_2020_09_19_clean$stripped_text1 = gsub("http\\S+","",tweets_python_2020_09_19_clean$text)

tweets_r_2020_09_20_clean$stripped_text1 = gsub("http\\S+","",tweets_r_2020_09_20_clean$text)

  2. Remove punctuation by splitting each tweet into one word per row (note: unnest_tokens() also converts to lower case)

tweets_python_2020_09_19_clean_stem = tweets_python_2020_09_19_clean %>% 
  select(stripped_text1) %>%
  unnest_tokens(word, stripped_text1)

tweets_r_2020_09_20_clean_stem = tweets_r_2020_09_20_clean %>% 
  select(stripped_text1) %>%
  unnest_tokens(word, stripped_text1)

  3. Remove stop words from the list of words (e.g. is, on...)

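# stop_words comes with tidytext; join on "word" explicitly to avoid the join message
cleaned_tweets_python_2020_09_19_stem = tweets_python_2020_09_19_clean_stem %>%
  anti_join(stop_words, by = "word")

cleaned_tweets_r_2020_09_20_stem = tweets_r_2020_09_20_clean_stem %>%
  anti_join(stop_words, by = "word")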
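
The three steps above can be tried on a couple of made-up tweets before running them on the full data. This is only a sketch: the `toy` data frame, its user names and its tweet texts are invented for illustration and are not part of the collected tweets.

```r
library(dplyr)
library(tidytext)

# Hypothetical example tweets (not from the real data set)
toy = data.frame(
  screen_name = c("user_a", "user_b"),
  text = c("Python is EASY to learn! https://example.com/abc",
           "Plotting on #rstats is fun, see http://example.org"),
  stringsAsFactors = FALSE
)

# 1. Strip anything that starts with "http" up to the next whitespace
toy$stripped_text1 = gsub("http\\S+", "", toy$text)

# 2. One lower-case word per row; punctuation is dropped by the tokenizer
toy_words = toy %>%
  select(stripped_text1) %>%
  unnest_tokens(word, stripped_text1)

# 3. Drop common stop words such as "is", "on" and "to"
toy_clean = toy_words %>%
  anti_join(stop_words, by = "word")

toy_clean$word
```

Printing `toy_clean$word` shows which words survive all three steps, which is a quick way to check that links and filler words are really gone.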

 

Have a look at the top 40 words


top_words_tweets_python_2020_09_19 = cleaned_tweets_python_2020_09_19_stem %>% 
  count(word, sort=TRUE) %>%
  top_n(40) %>%
  mutate(word = reorder(word,n))

cloud_python = wordcloud2(top_words_tweets_python_2020_09_19, size = 2, color = "grey", shape = "circle")
cloud_python

top_words_tweets_r_2020_09_20 = cleaned_tweets_r_2020_09_20_stem %>% 
  count(word, sort=TRUE) %>%
  top_n(40) %>%
  mutate(word = reorder(word,n))

cloud_r = wordcloud2(top_words_tweets_r_2020_09_20, size = 1, color = "grey", shape = "circle")
cloud_r
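
To compare two specific words, such as "difficult" and "easy", their counts can be pulled out of the same kind of table. A minimal sketch on invented words (the `toy_words` data frame is hypothetical and only stands in for the cleaned tweets):

```r
library(dplyr)

# Hypothetical cleaned tokens, one word per row
toy_words = data.frame(
  word = c("difficult", "easy", "difficult", "language", "easy", "difficult"),
  stringsAsFactors = FALSE
)

# Count every word, most frequent first
toy_counts = toy_words %>%
  count(word, sort = TRUE)

# Keep only the words of interest to compare them directly
toy_counts %>%
  filter(word %in% c("easy", "difficult"))
```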

 

Get sentiment lexicons


# Inspect the positive and negative words in the Bing lexicon
# (passing the full vector of lexicon names selects "bing", the first option, anyway)
get_sentiments("bing") %>% filter(sentiment == "positive")
get_sentiments("bing") %>% filter(sentiment == "negative")

bing_python = cleaned_tweets_python_2020_09_19_stem %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort=TRUE) %>%
  ungroup()

bing_r = cleaned_tweets_r_2020_09_20_stem %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort=TRUE) %>%
  ungroup()
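
To see what the join with the Bing lexicon does, here is a small sketch on invented words. Words that are not in the lexicon (such as "python" here) simply drop out of the result. The `toy_words` data frame is hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical cleaned tokens, one word per row
toy_words = data.frame(
  word = c("easy", "difficult", "python", "easy"),
  stringsAsFactors = FALSE
)

# Keep only words that appear in the Bing lexicon, then count them per sentiment
toy_bing = toy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

toy_bing
```

Because this is an inner join, the analysis only reflects words that the lexicon knows about, so the sentiment counts cover a subset of each tweet.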

 

Plot sentiment analysis for top 5 words


bing_python %>%
  group_by(sentiment) %>%
  top_n(5) %>%
  ungroup() %>%
  mutate(word = reorder(word,n)) %>%
  ggplot(aes(word, n, fill=sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(title="Python tweets",
       y="",
       x=NULL) +
  coord_flip() + theme_bw()

bing_r %>%
  group_by(sentiment) %>%
  # Exclude "plot", "cloud" and "shiny" from the R plot
  filter(!word %in% c("plot", "cloud", "shiny")) %>% 
  top_n(5) %>%
  ungroup() %>%
  mutate(word = reorder(word,n)) %>%
  ggplot(aes(word, n, fill=sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(title="R tweets",
       y="",
       x=NULL) +
  coord_flip() + theme_bw()

