Pandas is a powerful data manipulation library in Python that lets us interact with and manipulate tabular data. In this tutorial, we will look at how to select columns from a Pandas dataframe that are of a specific data type (for example, numeric, object, boolean, etc.).
How to select columns by data type in a Pandas dataframe?
You can use the Pandas dataframe select_dtypes() function to select columns from a dataframe based on their data types (the dtypes). Pass the list of dtypes to include as an argument to the include parameter.
The following is the syntax –
# select columns based on dtypes
df.select_dtypes(include=[dtypes_to_include], exclude=[dtypes_to_exclude])
It returns a subset of the dataframe’s columns based on the column dtypes.
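As a rough sketch of how the two parameters interact, the snippet below uses a small made-up dataframe (the name toy and the column names a, b, and c are placeholders, not part of this tutorial's data). It also illustrates that pandas raises a ValueError when neither include nor exclude is supplied.

```python
import pandas as pd

# toy dataframe with one integer, one float, and one boolean column (placeholder names)
toy = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": [True, False]})

# keep the numeric columns, but leave out the float64 ones
ints_only = toy.select_dtypes(include=["number"], exclude=["float64"])
print(list(ints_only.columns))

# calling select_dtypes() with neither parameter raises a ValueError
try:
    toy.select_dtypes()
    raised = False
except ValueError:
    raised = True
print(raised)
```

Note that the boolean column is not treated as numeric here, so only the integer column remains.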
Examples
Let’s now look at some examples of using the above syntax.
First, we will create a dataframe that we will be using throughout this tutorial.
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df
Here, we created a dataframe with information about some employees in an office. The dataframe has the columns “Name”, “Age”, “Department”, “Salary”, “Bonus%”, and “OnProbation”.
Let’s get the dtypes of the columns in the above dataframe.
# get column dtypes
df.dtypes
Output:
Name            object
Age              int64
Department      object
Salary           int64
Bonus%         float64
OnProbation       bool
dtype: object
There are two object type columns (“Name” and “Department”), two int64 columns (“Age” and “Salary”), one float64 column (“Bonus%”), and one bool column (“OnProbation”).
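If you just want a quick tally of how many columns share each dtype, one possible approach is to count the values of df.dtypes. The sketch below rebuilds the same employee data so that it runs on its own; dtype_counts is an illustrative name. Depending on your pandas version, the string columns may be reported with a dedicated string dtype rather than object.

```python
import pandas as pd

# same employee data as above, repeated so the snippet is self-contained
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False],
})

# convert each dtype to its string name, then count how often each name occurs
dtype_counts = df.dtypes.astype(str).value_counts()
print(dtype_counts)
```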
Example 1 – Select all object type columns
Let’s select all the object type columns from the above dataframe.
# select object type columns
df.select_dtypes(include=['object'])
Here, we passed 'object' in the list of dtypes to include, so we get the “Name” and “Department” columns.
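Often you only need the names of the matching columns rather than the data itself. A minimal sketch, assuming the string columns are stored with the object dtype as in the dtypes output above (the explicit astype call is an addition here, used only to make that assumption hold across pandas versions):

```python
import pandas as pd

# trimmed-down version of the employee data
df = pd.DataFrame({
    "Name": ["Jim", "Dwight"],
    "Age": [26, 28],
    "Department": ["Sales", "Sales"],
})

# assumption: force object dtype so the example behaves the same on any pandas version
df = df.astype({"Name": object, "Department": object})

# get the names of the object type columns as a plain Python list
object_cols = df.select_dtypes(include=["object"]).columns.tolist()
print(object_cols)
```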
Example 2 – Select all numeric columns
Let’s now select all the numeric columns in the above dataframe. This includes both the integer and the float columns.
You can use the number type to select numeric columns in a dataframe.
# select all numeric type columns
df.select_dtypes(include=['number'])
We get the “Age”, “Salary”, and “Bonus%” columns.
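A common reason to select the numeric columns is to run a calculation that only makes sense for numbers. As an illustration (numeric_means is a made-up name), the sketch below computes the mean of every numeric column in the employee data:

```python
import pandas as pd

# employee data, limited to the columns needed for this example
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
})

# select the numeric columns, then average each one
numeric_means = df.select_dtypes(include=["number"]).mean()
print(numeric_means)
```

The “Name” column is left out of the result because it is not numeric.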
Note that if you only want to select columns with a specific dtype, like int64 or float64, use that dtype instead of the generic number type. For example, let’s select only the int64 type columns from the above dataframe.
# select int64 type columns
df.select_dtypes(include=['int64'])
We get the int64 columns “Age” and “Salary”.
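Keep in mind that 'int64' matches only 64-bit integer columns. The sketch below assumes a variation of the data in which “Age” is stored as int32 (this is not the case in the tutorial's dataframe) and uses NumPy's abstract np.integer type as one possible way to match integer columns of any width.

```python
import numpy as np
import pandas as pd

# assumption: "Age" is stored as int32 here, unlike in the tutorial's dataframe
df = pd.DataFrame({
    "Age": np.array([26, 28], dtype="int32"),
    "Salary": np.array([55000, 60000], dtype="int64"),
    "Bonus%": [8.25, 10.0],
})

# 'int64' matches only the 64-bit integer column
only_int64 = df.select_dtypes(include=["int64"])
print(list(only_int64.columns))

# np.integer matches integer columns regardless of their bit width
any_int = df.select_dtypes(include=[np.integer])
print(list(any_int.columns))
```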
Example 3 – Select all numeric and boolean columns
You can also pass multiple dtypes in the list to the include parameter. For example, let’s now select all the numeric and the boolean type columns from the above dataframe.
# select all numeric and bool type columns
df.select_dtypes(include=['number', 'bool'])
We get the bool type column “OnProbation” along with the numeric columns “Age”, “Salary”, and “Bonus%”.
Example 4 – Select all columns except numeric columns
The select_dtypes() function also takes an exclude parameter that you can use to leave out columns of specific dtypes.
For example, let’s get all the columns from the above dataframe that are non-numeric.
# select all columns except numeric columns
df.select_dtypes(exclude=['number'])
We get the non-numeric columns of the dataframe: “Name”, “Department”, and “OnProbation”.
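Like include, the exclude parameter accepts more than one dtype. As a small sketch on a trimmed-down copy of the employee data (text_cols is an illustrative name), excluding both the numeric and the boolean columns leaves only the text columns:

```python
import pandas as pd

# trimmed-down version of the employee data
df = pd.DataFrame({
    "Name": ["Jim", "Dwight"],
    "Age": [26, 28],
    "Department": ["Sales", "Sales"],
    "Bonus%": [8.25, 10.0],
    "OnProbation": [True, False],
})

# drop every numeric and boolean column in one call
text_cols = df.select_dtypes(exclude=["number", "bool"])
print(list(text_cols.columns))
```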
Summary
In this tutorial, we looked at how to use the pandas dataframe select_dtypes() function to select columns based on their dtypes. The following are the key takeaways:
- Pass the dtypes to include as a list to the include parameter.
- Pass the dtypes to exclude as a list to the exclude parameter.