Sentiment Analysis in Twitter

5 min read · Apr 5, 2020

Titan Tutorial #7: Building and deploying a basic Sentiment Analysis model

Sentiment Analysis is a subfield of NLP (Natural Language Processing) focused on identifying opinions and feelings in text.

For example, these techniques are commonly used to understand how customers feel about a product or service, or to measure the success of a marketing campaign.

In this tutorial, we will see how to build and deploy a basic Sentiment Analysis model using TextBlob, a well-known library for processing textual data.

Sentiment Analysis can be tackled from two different perspectives: a Machine Learning approach (with supervised or unsupervised models) or a Lexicon approach.

Different approaches to Sentiment Analysis

The Lexicon approach estimates the sentiment of a text from the semantic orientation of the words or phrases that occur in it.

For example, words like “inspiration” or “love” are weighted as positive, while words like “sham” or “disgust” are weighted as negative. When processing a sentence, Lexicon techniques sum or average the values of all the identified words to produce a sentiment estimate.
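As a rough illustration of the lexicon idea, the sketch below scores a sentence by averaging the weights of the words it recognizes. The tiny `TOY_LEXICON` and its weights are invented for this example and are not taken from any real library; production lexicons are far larger and typically also account for negation and intensifiers.

```python
import re

# Hypothetical word weights in [-1, 1]; invented for illustration only.
TOY_LEXICON = {
    "inspiration": 0.6,
    "love": 0.8,
    "sham": -0.6,
    "disgust": -0.8,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Average the weights of the words in `text` that appear in `lexicon`."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    # A text with no known words is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_polarity("I love this, what an inspiration"))
print(lexicon_polarity("This deal is a sham"))
```

Words outside the lexicon are simply ignored, which is why the quality of a lexicon approach depends heavily on the coverage of its word list.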

For this tutorial, we will be using TextBlob, a well-known Python library with several tools for text processing (noun phrase extraction, sentiment analysis, classification, translation and more), and NLTK (Natural Language Toolkit), a library that helps with text tokenization, parsing, stemming, tagging, etc.

In order to retrieve the tweets for the analysis, we will use Tweepy, an easy-to-use Python library for accessing the Twitter API.

NOTE: To use Tweepy, you will need to open a Twitter Developer Account and create a new application to obtain the required credentials. You can see the complete process here.

Let’s now dive into the actual model we want to build. The idea is the following:

Aim: Estimate the sentiment about a topic based on the last 100 tweets about it.

First of all, let’s define the YAML specification for our deployment in an initial markdown cell as we saw in Tutorial #5.

```yaml
titan: v1
service:
  image: tensorflow
  machine:
    cpu: 2
    memory: 1024MB
  command:
    - pip install -r requirements.txt
```

Note that a pip install command has been included to install the dependencies specified in the requirements.txt file. The content of this file is:

tweepy
nltk
textblob

After this we can define the required imports for the model:

import pandas as pd
import json
import re
import tweepy as tw
from nltk.corpus import stopwords
from textblob import TextBlob

For this model, we will also need the Twitter credentials to allow Tweepy to access Twitter’s API:

consumer_key= 'your consumer key'
consumer_secret= 'your consumer secret'
access_token= 'your access token'
access_token_secret= 'your access token secret'
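Hard-coding secrets in a notebook makes them easy to leak. As one possible alternative, the sketch below reads the four values from environment variables. The variable names and the helper `load_credentials` are assumptions made for this example, and how environment variables are set in your deployment is not covered here.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four Twitter credentials, failing early if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary can then supply the values used in the rest of the notebook, for example `creds = load_credentials()` followed by `consumer_key = creds["consumer_key"]`.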

Once the credentials have been set, the API authentication for Twitter can be set up:

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

The next step is to define some helper functions to handle the tweets we will be retrieving:

1. A function that removes URLs (and any non-alphanumeric characters) from a tweet and returns the cleaned text:

def remove_url(txt):
    # Raw string avoids invalid-escape warnings for \w, \/ and \S
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())
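To see what this helper does in practice, here is the same function applied to a made-up tweet. Besides URLs, the pattern also strips every character that is not a letter, digit, space or tab, so hashtags lose their `#` and punctuation disappears.

```python
import re


def remove_url(txt):
    # Same pattern as the helper above, written as a raw string.
    return " ".join(re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())


sample = "Loving #bitcoin today! https://t.co/abc123 :)"  # made-up tweet
print(remove_url(sample))  # Loving bitcoin today
```

Losing punctuation and emoticons such as `:)` removes some sentiment cues, which is an acceptable trade-off for a basic model like this one.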

2. A function to get a certain number of tweets about a certain topic:

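def get_tweets(search_term, amount, lang, date):
    # Written for Tweepy 3.x; Tweepy 4 renamed api.search to api.search_tweets.
    # The cursor is materialized into a list so an empty result can be detected.
    tweets = list(tw.Cursor(api.search,
                            q=search_term,
                            lang=lang,
                            since=date).items(amount))

    if not tweets:
        raise ValueError('Error: not enough tweets available for your request. Please change your query')

    return tweets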

3. A function that displays the result based on a very basic set of rules applied to the average sentiment of the tweets. An average of 0 or above is considered positive and an average below 0 negative, with finer labels at the ±0.25 and ±0.5 thresholds:

def show_result(mean_sentiment):
 
  if mean_sentiment > 0.5:
    print('%8.2f Very Positive Sentiment' % (mean_sentiment))
  elif mean_sentiment >= 0.25 and mean_sentiment <= 0.5:
    print('%8.2f Positive Sentiment' % (mean_sentiment))
  elif mean_sentiment >= 0.0 and mean_sentiment <= 0.25:
    print('%8.2f Slightly Positive Sentiment' % (mean_sentiment))
  elif mean_sentiment < 0.0 and mean_sentiment >= -0.25:
    print('%8.2f Slightly Negative Sentiment' % (mean_sentiment))
  elif mean_sentiment < -0.25 and mean_sentiment >= -0.5:
    print('%8.2f Negative Sentiment' % (mean_sentiment))
  elif mean_sentiment < -0.5: 
    print('%8.2f Very Negative Sentiment' % (mean_sentiment))
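If the label is needed as a value rather than printed output, the same thresholds can be expressed as a function that returns a string. This is a sketch of an alternative, not part of the original notebook; the name `sentiment_label` and the explicit NaN check are additions made for this example.

```python
import math


def sentiment_label(mean_sentiment):
    """Map a mean polarity to the labels printed by show_result."""
    if math.isnan(mean_sentiment):
        raise ValueError("mean_sentiment is not a number")
    # Checks run from most positive to most negative, so each
    # condition only needs the lower bound of its band.
    if mean_sentiment > 0.5:
        return "Very Positive Sentiment"
    if mean_sentiment >= 0.25:
        return "Positive Sentiment"
    if mean_sentiment >= 0.0:
        return "Slightly Positive Sentiment"
    if mean_sentiment >= -0.25:
        return "Slightly Negative Sentiment"
    if mean_sentiment >= -0.5:
        return "Negative Sentiment"
    return "Very Negative Sentiment"


print('%8.2f %s' % (0.09, sentiment_label(0.09)))
```

Returning the label makes the rules easy to unit test and to include in a JSON response.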

To test the model locally in our Notebook, we will also define a mock request object as follows:

# Mock request object for local API testing
args = {
    "param": ['#bitcoin']
}
REQUEST = json.dumps({ 'args': args })

Note that the value of ‘param’ is what we will be changing for testing purposes.
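The request is a JSON string whose `args` field maps parameter names to lists of values. The sketch below shows how such a request can be read back. The helper name `extract_search_term` is made up for this example, and the fallback to a `text` parameter mirrors the lookup used in the exposed cell later in this tutorial.

```python
import json


def extract_search_term(raw_request):
    """Read the search term from 'param', falling back to 'text'."""
    request = json.loads(raw_request)
    args = request.get("args", {})
    return args.get("param", args.get("text"))


with_param = json.dumps({"args": {"param": ["#bitcoin"]}})
with_text = json.dumps({"args": {"text": ["#ethereum"]}})

print(extract_search_term(with_param))  # ['#bitcoin']
print(extract_search_term(with_text))   # ['#ethereum']
print(extract_search_term("{}"))        # None
```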

Finally, we will define the cell that we will be exposing through Titan:

# POST /sentiment
status = 200
location = None
content_type = 'application/json'
lang = 'en'
since = '2020-01-01'
amount = 100

try:
    request = json.loads(REQUEST)
    args = request.get('args', {})
    search_term = args.get('param', args.get('text', None))

    tweets = get_tweets(search_term, amount, lang, since)

    # Remove URLs
    tweets_no_urls = [remove_url(tweet.text) for tweet in tweets]

    # Create TextBlob objects of the tweets
    sentiment_objects = [TextBlob(tweet) for tweet in tweets_no_urls]

    # Check polarity
    sentiment_values = [[tweet.sentiment.polarity, str(tweet)] for tweet in sentiment_objects]

    sentiment_df = pd.DataFrame(sentiment_values, columns=["polarity", "tweet"])

    # Average only the numeric polarity column (the "tweet" column holds text)
    mean_sentiment = float(sentiment_df["polarity"].mean())
    show_result(mean_sentiment)

except Exception as err:
    status = 500
    content_type = 'application/json'
    print(json.dumps({ 'error': 'Cannot process request due to an error: {}'.format(err)}))

If we run the model locally, we obtain the following result for the query ‘#bitcoin’:

0.09 Slightly Positive Sentiment
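The aggregation step behind this number boils down to averaging the per-tweet polarities. A minimal sketch using only the standard library follows; the (polarity, tweet) pairs are invented for illustration and the helper `mean_polarity` is not part of the notebook.

```python
from statistics import fmean

# Hypothetical (polarity, tweet) pairs; the values are invented.
sentiment_values = [
    (0.50, "Great day for crypto"),
    (-0.10, "Not sure about this dip"),
    (0.00, "Bitcoin price update"),
]


def mean_polarity(pairs):
    """Average the polarity of (polarity, text) pairs; 0.0 if there are none."""
    polarities = [polarity for polarity, _ in pairs]
    return fmean(polarities) if polarities else 0.0


print('%8.2f' % mean_polarity(sentiment_values))
```

Treating an empty result as neutral is a choice made for this sketch. Note also that a plain average weights every tweet equally, so a few strongly worded tweets can be diluted by many neutral ones.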

You can get all the code to run the model in a Notebook here or by cloning this GitHub repository.

Now that the model is running successfully locally, we can deploy it with Titan using the deploy command:

$ titan deploy

Once it has finished, we can finally test our model with arbitrary topics to check the estimated sentiment.
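As a rough sketch of what such a test call could look like from Python, the snippet below builds a POST request to the `/sentiment` endpoint without sending it. The base URL is a placeholder, and passing the topic as a `param` query parameter is an assumption based on the mock request used for local testing.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; replace it with the address of your own deployment.
BASE_URL = "https://titan.example.com"


def build_sentiment_request(topic, base_url=BASE_URL):
    """Build (but do not send) a POST request to the /sentiment endpoint."""
    query = urlencode({"param": topic})  # encodes '#' as %23
    return Request(f"{base_url}/sentiment?{query}", method="POST")


request = build_sentiment_request("#bitcoin")
print(request.get_method(), request.full_url)
# POST https://titan.example.com/sentiment?param=%23bitcoin
```

URL-encoding matters here: an unencoded `#` would be interpreted as the start of a URL fragment and the hashtag would never reach the service. The request can be sent with `urllib.request.urlopen(request)` once the base URL points at a real deployment.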

Wrap-up

In this new post of our series of tutorials we have seen how to create and deploy a simple Sentiment Analysis model based on TextBlob and Tweepy.

As we have seen, using Titan we have been able to get the model up and running with very little effort.

Stay tuned for more interesting tutorials!

Next Tutorial

In our next tutorial, we will build a complete movie recommendation model. Be sure to check it out!

Foreword

Titan can help you to radically reduce and simplify the effort required to put AI/ML models into production, enabling Data Science teams to be agile, more productive and closer to the business impact of their developments.

If you want to know more about how to start using Titan or to get a free demo, please visit our website or drop us a line at info@akoios.com.

If you prefer, you can schedule a meeting with us here.

Written by Akoios