Twitter is one of the most prominent social networks today. With everyone from ordinary users to public figures using it to share their thoughts and opinions, it is a rich source of data. The Twitter API lets you “Programmatically analyze, learn from, and engage with conversation on Twitter”. In this tutorial, we’ll cover how you can use the Twitter API in Python to access data for your own analysis.
Note: If you’re looking to get data from Twitter API v2, check out our new tutorial. If you’re interested in extracting data from Twitter API v1.1, continue with this tutorial.
1. Get access to the Twitter API
To make any request to the Twitter API (in Python or anywhere else) you need your API Key and Access Token. For this, you need to apply for a developer account with Twitter and have your account approved. Once approved, you can create a project and associate it with a sample App. This App will provide you with your API Key and Access Token, which you can use to authenticate and use the Twitter API.
1.1 Apply for a developer account with Twitter
To apply for a developer account with Twitter –
- Navigate to Twitter’s apply for access page and apply for a developer account.
- You’ll be prompted to log in to your Twitter account. If you do not have a Twitter account, sign up for one.
- After logging in, you’ll be taken to a questionnaire on why and how you intend to use the Twitter API. Fill it in according to your use case. If you’re a hobbyist exploring the API, select Exploring the API under the Hobbyist column.
- Answer all the follow-up questions.
- Review the Developer Agreement and Policy and Submit your Application.
- Check your email and click the confirmation link to complete the application process.
1.2 Get your Twitter API Key and Access Token
Generally, if Twitter doesn’t find anything off with your application, you’ll be able to access your developer account immediately after completing the application process. Now, to get your API Key and Access Token, follow these steps –
- On clicking the confirmation link in the email from the above application step, you’ll be taken to the Twitter Developer Platform.
- Give your App a name and click Get keys.
- You’ll be shown your API key and API secret key. Copy and save them securely. You’ll be using them to access the Twitter API.
Having secured the Twitter API key and secret, you can move on to the Python IDE of your choice and use them to access data from the Twitter API.
2. Fetch data from Twitter API in Python
There are a number of ways to access data from the Twitter API in Python. For this tutorial, we’ll be using the tweepy Python library, which makes it easy to connect to and fetch data from the Twitter API. We’ll be fetching the tweets with a specific hashtag (#covid19) from the API.
2.1 Install tweepy
If you do not have the tweepy library you can install it using the command:
pip install tweepy
This will install the Tweepy library, which comes with a whole range of functionality for fetching data from the Twitter API. Its API class provides access to the Twitter RESTful API methods. Each method can accept various parameters and return responses.
For more, refer to tweepy’s documentation.
2.2 Authenticate with your credentials
Open up your preferred Python environment (e.g. Jupyter Notebook, Spyder, etc.) and use your Twitter API credentials to authenticate and connect to the API.
# import tweepy
import tweepy as tw

# your Twitter API key and API secret
my_api_key = "XXXXXXXXXXXXXXXXX"
my_api_secret = "XXXXXXXXXXXXXXXXXXXXXXX"

# authenticate
auth = tw.OAuthHandler(my_api_key, my_api_secret)
api = tw.API(auth, wait_on_rate_limit=True)
Use your Twitter API key and secret key as values for the variables my_api_key and my_api_secret respectively. Then, initialize the tweepy OAuthHandler with the API key and the API secret and use it to get an instance of the tweepy API class, which you’ll use to make requests to the Twitter API.
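Hard-coding the key and secret in a notebook makes them easy to leak when the file is shared. As a minimal sketch, assuming you have exported the credentials as environment variables, you could read them at runtime instead. The variable names TWITTER_API_KEY and TWITTER_API_SECRET and the helper load_credentials are hypothetical and chosen only for illustration; neither Twitter nor tweepy requires them.

```python
import os


def load_credentials(env=os.environ):
    """Read the API key and secret from a mapping of environment variables.

    The variable names below are illustrative assumptions; use whatever
    names you exported your credentials under.
    """
    try:
        api_key = env["TWITTER_API_KEY"]
        api_secret = env["TWITTER_API_SECRET"]
    except KeyError as missing:
        # fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing environment variable: {missing}") from None
    return api_key, api_secret


# placeholder values so the sketch runs on its own;
# call load_credentials() with no argument to read the real environment
demo_env = {"TWITTER_API_KEY": "XXXX", "TWITTER_API_SECRET": "YYYY"}
my_api_key, my_api_secret = load_credentials(demo_env)
print("Credentials loaded:", bool(my_api_key and my_api_secret))
```

The two returned values can then be passed to the tweepy OAuthHandler in place of the hard-coded strings.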
2.3 Set up your search query
A search query is simply a string telling the Twitter API what kind of tweets you want to search for. Imagine using the search bar on Twitter itself without the API. For example, if you want to search for tweets with #covid19, you’d simply type #covid19 in the Twitter search bar and it’ll show you those tweets.
Under the hood, a search query sent through the Twitter API returns the same kind of results you’d get had you searched for it directly on Twitter.
search_query = "#covid19 -filter:retweets"
Here we set up our search_query to fetch tweets with #covid19 but also filter out the retweets. You can customize your query based on your requirements. For more, refer to this guide.
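If you customize queries often, assembling the string in a small helper keeps the pieces readable. The following is a rough sketch, not part of tweepy: build_query and its parameters are hypothetical names. It relies only on operators that behave like the search bar: terms separated by spaces must all match, a leading "-" excludes a term, and -filter:retweets drops retweets.

```python
def build_query(hashtags, exclude_words=(), exclude_retweets=True):
    """Build a search query string.

    All hashtags must be present in a tweet (space means AND),
    and every word in exclude_words must be absent.
    """
    if not hashtags:
        raise ValueError("at least one hashtag is required")
    # add the leading '#' if the caller left it out
    parts = [tag if tag.startswith("#") else "#" + tag for tag in hashtags]
    # a leading '-' negates a term
    parts += ["-" + word for word in exclude_words]
    if exclude_retweets:
        parts.append("-filter:retweets")
    return " ".join(parts)


search_query = build_query(["covid19"])
print(search_query)  # #covid19 -filter:retweets
```

For instance, build_query(["covid19"], exclude_words=["vaccine"]) would produce "#covid19 -vaccine -filter:retweets".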
2.4 Collect the Tweets
We use the Tweepy Cursor to fetch the tweets. It returns an object which can be iterated over to get the API responses. We fetch 50 tweets for the search query specified above.
# get tweets from the API
# (note: Tweepy 4.0 renamed api.search to api.search_tweets)
tweets = tw.Cursor(api.search,
                   q=search_query,
                   lang="en",
                   since="2020-09-16").items(50)

# store the API responses in a list
tweets_copy = []
for tweet in tweets:
    tweets_copy.append(tweet)

print("Total Tweets fetched:", len(tweets_copy))
Output:
Total Tweets fetched: 50
Here, we pass as arguments the api.search method, the search query, the language of the tweets, and the date from which to search the tweets. We also limit the number of items (i.e. tweets in this case) to 50. The responses are iterated over and saved to the list tweets_copy.
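To build intuition for what iterating over a cursor with an item limit involves, here is a toy model in plain Python. It is not Tweepy's implementation: fake_search_pages is a made-up stand-in for an API that returns results one page at a time, and the page size of 15 is an arbitrary assumption. The point is that items are pulled lazily, so once the limit is reached no further pages are produced.

```python
from itertools import islice


def fake_search_pages(page_size=15, total=200):
    """Stand-in for a paged API: yields lists ("pages") of fake tweet ids."""
    for start in range(0, total, page_size):
        yield list(range(start, min(start + page_size, total)))


def iter_items(pages):
    """Flatten pages into a stream of individual items, one at a time."""
    for page in pages:
        yield from page


# take only the first 50 items; pages beyond the fourth are never generated
tweets_copy = list(islice(iter_items(fake_search_pages()), 50))
print("Total items fetched:", len(tweets_copy))  # Total items fetched: 50
```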
2.5 Create a dataset
We now create a dataset (a pandas dataframe) using the attributes of the tweets received from the API.
import pandas as pd

# collect one row (a dict) per tweet
rows = []
for tweet in tweets_copy:
    # hashtag texts from the tweet's entities
    hashtags = [hashtag["text"] for hashtag in tweet.entities.get("hashtags", [])]
    # fetch the full text; fall back to tweet.text if the extra request fails
    try:
        text = api.get_status(id=tweet.id, tweet_mode='extended').full_text
    except Exception:
        text = tweet.text
    rows.append({'user_name': tweet.user.name,
                 'user_location': tweet.user.location,
                 'user_description': tweet.user.description,
                 'user_verified': tweet.user.verified,
                 'date': tweet.created_at,
                 'text': text,
                 'hashtags': hashtags if hashtags else None,
                 'source': tweet.source})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
tweets_df = pd.DataFrame(rows)

# show the dataframe
tweets_df.head()
Output:
Here, the dataframe tweets_df is populated with different attributes of the Tweet like the username, user’s location, the user’s description, tweet’s timing, tweet’s text, hashtag, etc.
Also, note that for the tweet’s text we’re not using tweet.text. Instead, we’re calling the API again with the tweet id and fetching its full text. This is because tweet.text may be truncated and so does not always contain the full text of the Tweet.
Having the data stored as a dataframe is quite useful for further analysis and reference.
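As one small example of such analysis, you might count which hashtags occur most often alongside the one you searched for. The sketch below uses only the standard library and made-up sample values shaped like the hashtags column (a list of strings, or None when a tweet has none); count_hashtags is a hypothetical helper name.

```python
from collections import Counter


def count_hashtags(hashtag_lists):
    """Count how often each hashtag occurs, ignoring case and empty entries."""
    counts = Counter()
    for tags in hashtag_lists:
        if not tags:  # None or empty list: the tweet had no hashtags
            continue
        counts.update(tag.lower() for tag in tags)
    return counts


# sample values shaped like the 'hashtags' column (made up for illustration)
sample = [["covid19", "Health"], None, ["COVID19"], ["health", "covid19"]]
top = count_hashtags(sample).most_common(2)
print(top)  # [('covid19', 3), ('health', 2)]
```

With the real data, you could pass tweets_df["hashtags"].dropna() in place of the sample list.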