3. Sentiment analysis

3.1. Importing textblob

As we mentioned at the beginning of this workshop, textblob will allow us to do sentiment analysis in a very simple way. We will also use Python's re library, which is used to work with regular expressions. For this, I'll provide two utility functions: a) one to clean the text (meaning that mentions, links and any character that is not alphanumeric are replaced with spaces), and b) one to classify the polarity of each tweet after its text has been cleaned. I won't explain in detail how the cleaning function works, since it would take too long and the regular expressions involved are better understood by reading the official re documentation.

The code that I'm providing is:

from textblob import TextBlob
import re

def clean_tweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    links and special characters using regex.
    '''
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analize_sentiment(tweet):
    '''
    Utility function to classify the polarity of a tweet
    using textblob.
    '''
    analysis = TextBlob(clean_tweet(tweet))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
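
Just to get a feeling for what these two functions do, we can try them on a made-up example (the tweet below is invented purely for illustration and is not part of the dataset):

# An invented example tweet, just to illustrate the two utilities:
example = "What a great day! Thanks @SomeUser, more info at https://example.com"

print(clean_tweet(example))
# Expected: 'What a great day Thanks more info at'

print(analize_sentiment(example))
# Expected: 1, since 'great' carries a positive polarity in textblob's lexicon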

The way it works is that textblob already provides a trained analyzer (cool, right?). Textblob can work with different machine learning models used in natural language processing. If you want to train your own classifier (or at least check how it works), feel free to check the following link. It might be relevant since we're working with a pre-trained model (for which we don't know the data that was used to train it).
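
As a rough sketch of what training your own classifier looks like, textblob includes a NaiveBayesClassifier that learns from labeled examples. The tiny training set below is invented purely to show the API, so don't expect it to be accurate:

from textblob.classifiers import NaiveBayesClassifier

# A toy, hand-labeled training set (invented for illustration only):
train = [
    ("I love this, it is amazing", "pos"),
    ("This is great news", "pos"),
    ("What a terrible decision", "neg"),
    ("I am very disappointed", "neg"),
]

# Train the classifier and classify a new piece of text:
cl = NaiveBayesClassifier(train)
print(cl.classify("This is really great"))   # expected: 'pos'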

Anyway, getting back to the code, we will just add an extra column to our data. This column will contain the result of the sentiment analysis, and we can display the dataframe to see the update:

# We create a column with the result of the analysis:
data['SA'] = np.array([ analize_sentiment(tweet) for tweet in data['Tweets'] ])

# We display the updated dataframe with the new column:
display(data.head(10))

Obtaining the new output:

|   | Tweets | len | ID | Date | Source | Likes | RTs | SA |
|---|--------|-----|----|------|--------|-------|-----|----|
| 0 | On behalf of @FLOTUS Melania & myself, THA... | 144 | 903778130850131970 | 2017-09-02 00:34:32 | Twitter for iPhone | 24572 | 5585 | 1 |
| 1 | I will be going to Texas and Louisiana tomorro... | 132 | 903770196388831233 | 2017-09-02 00:03:00 | Twitter for iPhone | 44748 | 8825 | 1 |
| 2 | Stock Market up 5 months in a row! | 34 | 903766326631698432 | 2017-09-01 23:47:38 | Twitter for iPhone | 44518 | 9134 | 0 |
| 3 | 'President Donald J. Trump Proclaims September... | 140 | 903705867891204096 | 2017-09-01 19:47:23 | Media Studio | 47009 | 15127 | 0 |
| 4 | Texas is healing fast thanks to all of the gre... | 143 | 903603043714957312 | 2017-09-01 12:58:48 | Twitter for iPhone | 77680 | 15398 | 1 |
| 5 | ...get things done at a record clip. Many big ... | 113 | 903600265420578819 | 2017-09-01 12:47:46 | Twitter for iPhone | 54664 | 11424 | 1 |
| 6 | General John Kelly is doing a great job as Chi... | 140 | 903597166249246720 | 2017-09-01 12:35:27 | Twitter for iPhone | 59840 | 11678 | 1 |
| 7 | Wow, looks like James Comey exonerated Hillary... | 130 | 903587428488839170 | 2017-09-01 11:56:45 | Twitter for iPhone | 110667 | 35936 | 1 |
| 8 | THANK YOU to all of the incredible HEROES in T... | 110 | 903348312421670912 | 2017-08-31 20:06:35 | Twitter for iPhone | 112012 | 29064 | 1 |
| 9 | RT @FoxNews: .@KellyannePolls on Harvey recove... | 140 | 903234878124249090 | 2017-08-31 12:35:50 | Twitter for iPhone | 0 | 6638 | 0 |

As we can see, the last column contains the sentiment analysis (SA). We now just need to check the results.

3.2. Analyzing the results

To have a simple way to verify the results, we will count the number of neutral, positive and negative tweets and extract the percentages.

# We construct lists with classified tweets:

pos_tweets = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] > 0]
neu_tweets = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] == 0]
neg_tweets = [ tweet for index, tweet in enumerate(data['Tweets']) if data['SA'][index] < 0]
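
Since the labels are already stored in the dataframe, the same lists can also be built with pandas boolean indexing; this is just an equivalent alternative to the comprehensions above:

# Equivalent filtering using boolean masks on the dataframe:
pos_tweets = data[data['SA'] > 0]['Tweets'].tolist()
neu_tweets = data[data['SA'] == 0]['Tweets'].tolist()
neg_tweets = data[data['SA'] < 0]['Tweets'].tolist()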

Now that we have the lists, we just print the percentages:

# We print percentages:

print("Percentage of positive tweets: {}%".format(len(pos_tweets)*100/len(data['Tweets'])))
print("Percentage of neutral tweets: {}%".format(len(neu_tweets)*100/len(data['Tweets'])))
print("Percentage de negative tweets: {}%".format(len(neg_tweets)*100/len(data['Tweets'])))

Obtaining the following result:

Percentage of positive tweets: 51.0%
Percentage of neutral tweets: 27.0%
Percentage of negative tweets: 22.0%
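
For reference, pandas can produce the same percentages in a single line with value_counts (the exact numbers will of course depend on the tweets fetched when you run this):

# Percentage of each sentiment label (1, 0, -1) in the dataframe:
print(data['SA'].value_counts(normalize=True) * 100)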

We have to consider that we're working only with the 200 most recent tweets from D. Trump (last updated: September 2nd). For more accurate results we could consider more tweets. An interesting exercise (and an invitation to the readers) is to analyze the polarity of the tweets from each source separately; it might turn out that, by considering only the tweets from one source, the polarity comes out more positive or more negative. A minimal sketch of how this could look is given below. Anyway, I hope you find this interesting.
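
A minimal sketch of that per-source comparison, assuming the same data dataframe we have been using, could be:

# Share of positive / neutral / negative tweets within each source:
print(data.groupby('Source')['SA'].value_counts(normalize=True) * 100)

# Or simply the average sentiment label per source:
print(data.groupby('Source')['SA'].mean())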

As we saw, we can extract, manipulate, visualize and analyze data in a very simple way with Python. I hope this leaves the reader with some curiosity for further exploration using these tools.

Go back to 2. Visualization and basic statistics
Go next to 4. References