Get data from Twitter API in Python – Step by Step Guide

Twitter is one of the most prominent social networks today. Because everyone from ordinary users to public figures uses it to share thoughts and opinions, it is a rich source of data. The Twitter API lets you “Programmatically analyze, learn from, and engage with conversation on Twitter”. In this tutorial, we’ll cover how to use the Twitter API in Python to access data for your own analysis.

Note: If you’re looking to get data from Twitter API v2, check out our new tutorial. If you’re interested in extracting data from Twitter API v1.1, continue with this tutorial.

To make any request to the Twitter API (in Python or anywhere else), you need an API Key and Access Token. To get them, apply for a developer account with Twitter and have your account approved. Once approved, you can create a project and associate it with a sample App. This App provides the API Key and Access Token that you use to authenticate with the Twitter API.

To apply for a developer account with Twitter –

Twitter API apply for access page
  • You’ll be redirected to log in to your Twitter account. If you do not have a Twitter account, sign up for one.
Log in to Twitter
  • After logging in, you’ll be taken to a questionnaire on why and how you intend to use the Twitter API. Fill it in according to your use case. If you’re a hobbyist exploring the API, select Exploring the API under the Hobbyist column.
Define your use case of the Twitter API
  • Answer all the follow-up questions.
  • Review the Developer Agreement and Policy and Submit your Application.
  • Check your email and click the confirmation link to complete the application process.

Generally, if Twitter finds nothing wrong with your application, you can access your developer account immediately after completing the application process. To get your API Key and Access Token, follow these steps –

Twitter Developer platform welcome screen
  • Give your App a name and click Get keys.
  • You’ll be shown your API key and API secret key. Copy and save them securely. You’ll be using them to access the Twitter API.
Twitter API key and secret

With the Twitter API key and secret in hand, you can move on to the Python IDE of your choice and use them to access data from the Twitter API.

There are a number of ways to access data from the Twitter API in Python. For this tutorial, we’ll use the tweepy Python library, which makes it easy to connect to and fetch data from the Twitter API. As an example, we’ll fetch the tweets with a specific hashtag (#covid19) from the API.

If you do not have the tweepy library you can install it using the command:

pip install tweepy

This will install the Tweepy library, which comes with a range of functionality for fetching data from the Twitter API. Its API class provides access to the Twitter RESTful API methods. Each method accepts various parameters and returns a response.

For more, refer to tweepy’s documentation.

Open your preferred Python environment (e.g. Jupyter Notebook, Spyder, etc.) and use your Twitter API credentials to authenticate and connect to the API.

# import tweepy
import tweepy as tw

# your Twitter API key and API secret
my_api_key = "XXXXXXXXXXXXXXXXX"
my_api_secret = "XXXXXXXXXXXXXXXXXXXXXXX"

# authenticate
auth = tw.OAuthHandler(my_api_key, my_api_secret)
api = tw.API(auth, wait_on_rate_limit=True)

Use your Twitter API key and secret key as the values of the variables my_api_key and my_api_secret respectively. Then, initialize the tweepy OAuthHandler with the API key and the API secret, and use it to create an instance of the tweepy API class, through which you’ll make requests to the Twitter API. Setting wait_on_rate_limit=True tells tweepy to wait and retry when a rate limit is reached instead of raising an error.
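Since the keys should be stored securely, you may prefer not to paste them directly into a notebook. Here is a minimal sketch of one alternative, assuming you have exported the keys as environment variables. The variable names TWITTER_API_KEY and TWITTER_API_SECRET and the helper load_twitter_credentials are made up for this example and are not something tweepy requires.

```python
import os


def load_twitter_credentials(env=None):
    """Return (api_key, api_secret) read from environment variables.

    The variable names used here are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    names = ("TWITTER_API_KEY", "TWITTER_API_SECRET")
    # Collect the names that are unset or empty so the error is specific
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TWITTER_API_KEY"], env["TWITTER_API_SECRET"]
```

You could then write my_api_key, my_api_secret = load_twitter_credentials() and pass the two values to the OAuthHandler exactly as shown above.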

A search query is simply a string telling the Twitter API what kind of tweets you want to search for. Imagine using the search bar on Twitter itself without the API. For example, if you want to search for tweets with #covid19, you’d simply type #covid19 in the Twitter search bar and it’ll show you those tweets.

Under the hood, a search query sent through the Twitter API returns the results you would get had you searched for it directly on Twitter.

search_query = "#covid19 -filter:retweets"

Here we set up our search_query to fetch tweets with #covid19 but also filter out the retweets. You can customize your query based on your requirements. For more, refer to this guide.
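If you build several such queries, a small helper can keep them consistent. The sketch below is only an illustration: build_search_query is a hypothetical function, and it assumes the since: search operator is available for your API access level, in addition to the -filter:retweets operator used above.

```python
def build_search_query(hashtag, exclude_retweets=True, since=None):
    """Assemble a search string such as '#covid19 -filter:retweets'."""
    # Accept the hashtag with or without a leading '#'
    parts = ["#" + hashtag.lstrip("#")]
    if since is not None:
        # Assumed operator: restricts results to tweets on or after this date
        parts.append("since:" + since)
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


search_query = build_search_query("covid19")
print(search_query)  # #covid19 -filter:retweets
```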

We use the Tweepy Cursor to fetch the tweets. It returns an object which can be iterated over to get the API responses. We fetch 50 tweets for the search query specified above.

# get tweets from the API
# (Tweepy 4.0 renamed api.search to api.search_tweets)
tweets = tw.Cursor(api.search,
                   q=search_query,
                   lang="en",
                   since="2020-09-16").items(50)

# store the API responses in a list
tweets_copy = []
for tweet in tweets:
    tweets_copy.append(tweet)
    
print("Total Tweets fetched:", len(tweets_copy))

Output:

Total Tweets fetched: 50

Here, we pass the api.search method to the Cursor, along with the search query, the language of the tweets, and the date from which to search. We also limit the number of items (i.e. tweets in this case) to 50. The responses are iterated over and saved to the list tweets_copy.
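To build some intuition for why the Cursor result can be iterated and capped at 50 items, here is a toy sketch of lazy pagination. It is not tweepy's implementation: fake_search_page is a hypothetical stand-in for a single API call, and the real API's paging scheme may differ. The point is only that pages are requested one at a time and fetching stops once enough items have been consumed.

```python
from itertools import islice


def fake_search_page(page_number, page_size=20):
    """Hypothetical stand-in for one API call; returns one page of results."""
    start = page_number * page_size
    return ["tweet-%d" % i for i in range(start, start + page_size)]


def iterate_items(fetch_page):
    """Yield items one at a time, requesting a new page only when needed."""
    page_number = 0
    while True:
        page = fetch_page(page_number)
        if not page:
            return  # an empty page means there is nothing left to fetch
        for item in page:
            yield item
        page_number += 1


# Take the first 50 items; only three pages are ever requested
first_50 = list(islice(iterate_items(fake_search_page), 50))
print(len(first_50))  # 50
```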

We now create a dataset (a pandas dataframe) using the attributes of the tweets received from the API.

import pandas as pd

# collect one dict per tweet, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for tweet in tweets_copy:
    hashtags = [hashtag["text"] for hashtag in tweet.entities["hashtags"]]
    try:
        # call the API again to fetch the untruncated text of the tweet
        text = api.get_status(id=tweet.id, tweet_mode='extended').full_text
    except Exception:
        # fall back to the (possibly truncated) text from the search response
        text = tweet.text
    rows.append({'user_name': tweet.user.name,
                 'user_location': tweet.user.location,
                 'user_description': tweet.user.description,
                 'user_verified': tweet.user.verified,
                 'date': tweet.created_at,
                 'text': text,
                 'hashtags': hashtags if hashtags else None,
                 'source': tweet.source})

# create the dataframe
tweets_df = pd.DataFrame(rows)

# show the dataframe
tweets_df.head()

Output:

The populated dataframe with Tweets

Here, the dataframe tweets_df is populated with different attributes of each Tweet, such as the user’s name, location, and description, whether the user is verified, the time the Tweet was posted, its text, its hashtags, and its source.

Also, note that for the tweet’s text we do not use tweet.text. Instead, we call the API again with the tweet id and fetch its full text. This is because, by default, tweet.text can be truncated and may not contain the full text of the Tweet.
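The second API call costs one extra request per tweet. One possible alternative, offered here as an assumption to verify against your tweepy version, is to pass tweet_mode='extended' to the search call itself, in which case each returned object should carry a full_text attribute. The hypothetical helper below reads whichever attribute is present. The SimpleNamespace objects are stand-ins for illustration only, since real tweet objects would come from tweepy.

```python
from types import SimpleNamespace


def get_tweet_text(tweet):
    """Prefer the untruncated full_text attribute, fall back to text."""
    full_text = getattr(tweet, "full_text", None)
    return full_text if full_text is not None else tweet.text


# Stand-in objects for illustration; real ones would come from tweepy
extended = SimpleNamespace(full_text="a long tweet, complete", text="a long tw...")
compat = SimpleNamespace(text="a short tweet")

print(get_tweet_text(extended))  # a long tweet, complete
print(get_tweet_text(compat))    # a short tweet
```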

Having the data stored as a dataframe is quite useful for further analysis and reference.




Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.