Get Data From Twitter API v2 in Python

Twitter is one of the most prominent social networks today. With everyone from ordinary users to public figures using it to share their thoughts and opinions, tweets have become a key source of data. In this tutorial, we will show you the steps to get data from Twitter API v2 in Python.

Note – This tutorial assumes that you’re trying to get data from the newly released v2 version of the Twitter API. If you’re looking to get data from the Twitter API v1.1, refer to this tutorial.

To make any request to the Twitter API (in Python or anywhere else), you need an API Key and tokens for authentication. For this, you need to apply for a developer account with Twitter and have your account approved. Once approved, you can create a project and associate it with a sample App. This App will provide you with your API Key and Authentication Token (Bearer Token), which you can use to authenticate with the Twitter API.

To apply for a developer account with Twitter –

  • Go to the Twitter API “apply for access” page.
  • You’ll be asked to log in to your Twitter account. Log in, or sign up for an account if you don’t have one.
  • After logging in, you’ll be taken to a questionnaire on why and how you intend to use the Twitter API. Fill it in according to your use case. If you’re a hobbyist exploring the API, select Exploring the API under the Hobbyist column.
  • Answer all the follow-up questions.
  • Review the Developer Agreement and Policy and Submit your Application.
  • Check your email and click the confirmation link to complete the application process.

Generally, if Twitter doesn’t find anything off with your application, you’d be able to access your developer account immediately after completing your application process. Now, to get your API Key and Authentication Token follow the steps –

  • On the Twitter Developer platform welcome screen, give your App a name and click Get keys.
  • You’ll be shown your API key, API secret key and your Authentication Token. Copy and save them securely. You’ll be using them to access the Twitter API.

Having secured your Authentication Token (Bearer Token), you can move on to the Python IDE of your choice and use it to access data from the Twitter API.

There are a number of ways to access data from the Twitter API in Python. For this tutorial, we’ll use the tweepy library, which makes it easy to connect to and fetch data from the Twitter API. We’ll fetch tweets with a specific hashtag (#covid19) from the API.

Note: Make sure you’re using tweepy version 4.0 or above to make requests to the v2 version of the Twitter API.

If you do not have the tweepy library you can install it using the command:

pip install tweepy

This installs the Tweepy library, which offers a wide range of functionality for fetching data from the Twitter API v2. Its Client class provides methods for the Twitter API v2 endpoints. Each method accepts various parameters and returns a response.

For more, refer to its documentation.

Open up your preferred Python environment (e.g., Jupyter Notebook, Spyder, etc.) and use your Bearer Token to authenticate and connect to the API.

import tweepy

# your bearer token
MY_BEARER_TOKEN = "YOUR_BEARER_TOKEN"
# create your client 
client = tweepy.Client(bearer_token=MY_BEARER_TOKEN)

Here we import the tweepy library and create our client instance by passing our Authentication Token (Bearer Token) as an argument that will allow access to the Twitter API. Now, we can use the client to request information from the API.

Note: Your API credentials and Authentication Tokens are sensitive information that you shouldn’t share publicly. As a more secure alternative to pasting your Authentication Token directly into your code, you can save the token on your machine as an environment variable and read it in your program without displaying it.
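The environment-variable approach from the note above could look something like the sketch below. The variable name TWITTER_BEARER_TOKEN and the helper load_bearer_token() are assumptions made for illustration; they are not part of tweepy, and any variable name works as long as you set it in your shell first.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Read the Bearer Token from an environment variable (hypothetical helper)."""
    # strip stray whitespace that can sneak in when exporting the variable
    token = os.environ.get(env_var, "").strip()
    if not token:
        # fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

With this in place, the client can be created as tweepy.Client(bearer_token=load_bearer_token()), so the token never appears in your notebook or script.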

A search query is simply a string that tells the Twitter API what kind of tweets you want to search for. Imagine using the search bar on Twitter itself without the API. For example, if you want to search for tweets with #covid19, you’d simply type #covid19 in the Twitter search bar and it’ll show you those tweets.

Under the hood, a search query sent through the Twitter API returns results much like those you’d get had you searched for it directly on Twitter.

search_query = "#covid19 -is:retweet"

Here we set up our search_query to fetch tweets with #covid19 but exclude retweets. You can customize your query based on your requirements. Note that the v2 version of the API has changed how search queries are built. You can refer to this guide for a better understanding of how to build search queries.
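If you build several similar queries, a small helper can keep the operators consistent. The sketch below is only an illustration: build_query() is a hypothetical function, not part of tweepy, and it covers just the hashtag, lang: and -is:retweet operators used in this tutorial.

```python
def build_query(hashtag, lang=None, exclude_retweets=True):
    """Compose a v2 search query string for a single hashtag (hypothetical helper)."""
    # accept the hashtag with or without a leading '#'
    parts = ["#" + hashtag.lstrip("#")]
    if lang:
        # restrict results to one language, e.g. lang:en
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        # the leading '-' negates the is:retweet operator
        parts.append("-is:retweet")
    # operators separated by spaces are combined with a logical AND
    return " ".join(parts)


print(build_query("covid19", lang="en"))  # prints: #covid19 lang:en -is:retweet
```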

tweepy.Client comes with a number of functions to make requests to the Twitter API v2. For example, you can use the search_recent_tweets() function to get tweets relevant to your query from the last seven days.

You can use the search_all_tweets() function to perform a full-archive search of public tweets relevant to your query. Note that this endpoint is only available to those users who have been approved for the Academic Research product track.

There are other functions as well. For example, you can use get_users_tweets() to get the tweets of a specific user. You can also perform actions such as tweeting, following, and unfollowing using the API.

Let’s use the search_recent_tweets() function to fetch tweets and the user details relevant to the query created above.

# query to search for tweets
query = "#covid19 lang:en -is:retweet"

# your start and end time for fetching tweets; search_recent_tweets() only
# covers the last seven days, so replace these with recent timestamps
start_time = "2021-12-10T00:00:00Z"
end_time = "2021-12-14T00:00:00Z"

# get tweets from the API
tweets = client.search_recent_tweets(query=query,
                                     start_time=start_time,
                                     end_time=end_time,
                                     tweet_fields=["created_at", "text", "source"],
                                     user_fields=["name", "username", "location", "verified", "description"],
                                     max_results=10,
                                     expansions='author_id'
                                     )

Here we passed the query and optional parameters start_time, end_time, tweet_fields (to get tweet specific additional information), user_fields (to get user specific additional information), max_results, and expansions. For all the parameters and other information on this function refer to its documentation.
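Because search_recent_tweets() only reaches back seven days, hard-coded timestamps go stale quickly. One way to avoid that is to compute start_time and end_time relative to the current time, as in the sketch below. The helper recent_window() is hypothetical, and the 30-second margin on end_time is an assumption meant to keep the end of the window safely in the past.

```python
from datetime import datetime, timedelta, timezone


def recent_window(days_back=3, now=None):
    """Return (start_time, end_time) as UTC timestamp strings (hypothetical helper)."""
    # stay strictly inside the seven-day range of the recent search endpoint
    if not 0 < days_back < 7:
        raise ValueError("days_back must be greater than 0 and less than 7")
    if now is None:
        now = datetime.now(timezone.utc)
    # assumed safety margin so that end_time is not "in the future" for the API
    end = now - timedelta(seconds=30)
    start = now - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same format as the strings used above
    return start.strftime(fmt), end.strftime(fmt)
```

The two returned strings can be passed straight to the start_time and end_time parameters.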

By default, this function returns 10 results for a request. You can modify this with the max_results parameter which takes a number between 10 and 100.
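Since a single request is capped at 100 results, collecting more tweets means requesting several pages. The sketch below follows the next_token value that the response carries in its meta dictionary; fetch_recent_tweets() is a hypothetical helper written for this tutorial, and it gathers only the tweets, not the “includes” data. Tweepy also ships a Paginator class that serves the same purpose.

```python
def fetch_recent_tweets(client, query, limit=250, page_size=100, **kwargs):
    """Collect up to `limit` tweets across pages (hypothetical helper)."""
    collected = []
    next_token = None
    while len(collected) < limit:
        params = dict(query=query, max_results=page_size, **kwargs)
        if next_token is not None:
            # ask for the page that follows the previous response
            params["next_token"] = next_token
        response = client.search_recent_tweets(**params)
        # data is None when a page has no matching tweets
        collected.extend(response.data or [])
        next_token = (response.meta or {}).get("next_token")
        if next_token is None:
            # no further pages are available
            break
    return collected[:limit]
```

A call such as fetch_recent_tweets(client, query, limit=250, tweet_fields=["created_at"]) would then issue up to three requests of 100 tweets each and return at most 250 tweets.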

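The expansions parameter asks the API to return objects related to each tweet alongside the tweets themselves. Here, expansions='author_id' requests the user object for each tweet’s author, and user_fields selects which attributes of those users to return. The expanded objects are returned in the “includes” part of the response.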

Let’s look at the results from the API to better understand.

# tweet specific info
print(len(tweets.data))
# user specific info
print(len(tweets.includes["users"]))

Output:

10
10

Here tweets.data contains the tweet-related data while tweets.includes["users"] contains the additional user data that we requested. Let’s look at this information for the first tweet (present at index 0).

# first tweet
first_tweet = tweets.data[0]
dict(first_tweet)

Output:

{'source': 'Twitter Web App',
 'text': 'My tweets are focused on transport, especially safety and related areas of #ClimateCrisis and #COVID19. \n\nI stray into less important areas occasionally.\n\nBut I also sometimes RT when the matters are of such great importance that they should not be neglected,
 'id': 1470544059047555077,
 'author_id': 2380906790,
 'created_at': datetime.datetime(2021, 12, 13, 23, 59, 59, tzinfo=datetime.timezone.utc)}

# user information for the first tweet, matched on author_id
first_tweet_user = next(user for user in tweets.includes["users"]
                        if user.id == first_tweet.author_id)
dict(first_tweet_user)

Output:

{'verified': False,
 'id': 2380906790,
 'username': 'CHAIRRDRF',
 'name': 'CHAIRRDRF',
 'description': 'Dr Robert Davis, Chair of the Road Danger Reduction Forum'}

You can see that we get all the tweet and user information requested.

We now create a dataset (a pandas dataframe) using the attributes of the tweets received from the API.

# import the pandas library
import pandas as pd

# create a list of records
tweet_info_ls = []

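# index users by id: includes['users'] holds each author only once, so it
# may be shorter than, and ordered differently from, tweets.data
users_by_id = {user.id: user for user in tweets.includes['users']}

# iterate over each tweet and look up its author's details
for tweet in tweets.data:
    user = users_by_id[tweet.author_id]
    tweet_info = {
        'created_at': tweet.created_at,
        'text': tweet.text,
        'source': tweet.source,
        'name': user.name,
        'username': user.username,
        'location': user.location,
        'verified': user.verified,
        'description': user.description
    }
    tweet_info_ls.append(tweet_info)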

# create dataframe from the extracted records
tweets_df = pd.DataFrame(tweet_info_ls)
# display the dataframe
tweets_df.head()

Output:

Dataframe created from the tweet data received from the API

Here, the dataframe tweets_df is populated with different attributes of the tweet and its corresponding user like the created_at, text, source, name, username, location, etc.

Having the data stored as a dataframe is quite useful for further analysis and reference.

Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.