Pandas – Select Columns of a Specific Type

Pandas is a powerful data manipulation library in Python that lets us interact with and manipulate tabular data. In this tutorial, we will look at how to select columns from a Pandas dataframe that are of a specific data type (for example, numeric, object, boolean, etc.).

How to select columns by data type in a Pandas dataframe?

You can use the Pandas dataframe select_dtypes() method to select columns from a dataframe based on their data types (dtypes). Pass the dtypes you want to keep to the include parameter, the dtypes you want to drop to the exclude parameter, or both. At least one of the two must be given.

The following is the syntax –

# select columns based on dtypes
df.select_dtypes(include=[dtypes_to_include], exclude=[dtypes_to_exclude])

It returns a new dataframe containing only the columns whose dtypes match the given criteria.
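As a minimal sketch of how the two parameters interact, the snippet below builds a small throwaway dataframe (the name `demo` and its columns are made up for illustration) and combines include with exclude in a single call. It also shows what happens when neither parameter is given.

```python
import pandas as pd

# hypothetical frame, used only to illustrate the call
demo = pd.DataFrame({
    "a": [1, 2, 3],            # int64
    "b": [0.5, 1.5, 2.5],      # float64
    "c": [True, False, True],  # bool
})

# keep the numeric columns, but drop the float64 ones
ints_only = demo.select_dtypes(include=["number"], exclude=["float64"])
print(ints_only.columns.tolist())

# the original dataframe keeps all of its columns
print(demo.shape, ints_only.shape)

# calling the method with neither parameter raises a ValueError
try:
    demo.select_dtypes()
    raised = False
except ValueError:
    raised = True
print(raised)
```

The selection does not modify `demo`; it only returns a narrower dataframe.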

Examples

Let’s now look at some examples of using the above syntax.

First, we will create a dataframe that we will be using throughout this tutorial.

import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df

Output:

[Image: the complete employees dataframe]

Here, we created a dataframe with information about some employees in an office. The dataframe has the columns “Name”, “Age”, “Department”, “Salary”, “Bonus%”, and “OnProbation”.

Let’s get the dtypes of the columns in the above dataframe.

# get column dtypes
df.dtypes

Output:

Name            object
Age              int64
Department      object
Salary           int64
Bonus%         float64
OnProbation       bool
dtype: object

There are two object columns (“Name” and “Department”), two int64 columns (“Age” and “Salary”), one float64 column (“Bonus%”), and one bool column (“OnProbation”).
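If you would rather have this grouping as data than read it off the printed output, one possible approach is to loop over df.dtypes and collect the column names per dtype. The sketch below uses only the non-string columns of the employee data, so the result does not depend on how a given pandas version infers string columns. The name `columns_by_dtype` is an illustrative choice.

```python
import pandas as pd

# non-string columns of the employee data
df = pd.DataFrame({
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False],
})

# map each dtype name to the list of columns that have it
columns_by_dtype = {}
for column, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(column)

print(columns_by_dtype)
```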

Example 1 – Select all object type columns

Let’s select all the object type columns from the above dataframe.

# select object type columns
df.select_dtypes(include=['object'])

Output:

[Image: object type columns of employees dataframe]

Here, we passed 'object' in the list of dtypes to include.
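Often you only need the names of the matching columns rather than the sub-dataframe itself. The sketch below reads them from the columns attribute of the result. It uses a small hypothetical frame called `people` whose text columns are created with dtype=object explicitly, because newer pandas versions may infer text columns as a dedicated string dtype instead of object.

```python
import pandas as pd

# hypothetical frame; text columns are forced to the object dtype
people = pd.DataFrame({
    "Name": pd.Series(["Jim", "Dwight"], dtype=object),
    "Department": pd.Series(["Sales", "Sales"], dtype=object),
    "Age": [26, 28],
})

# names of the object columns, in their original order
object_columns = people.select_dtypes(include=["object"]).columns.tolist()
print(object_columns)

# a single dtype can also be passed without wrapping it in a list
same_columns = people.select_dtypes(include="object").columns.tolist()
print(same_columns == object_columns)
```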

Example 2 – Select all numeric columns

Let’s now select all the numeric columns in the above dataframe. This includes both the integer and the float columns.

You can use the number type to select numeric columns in a dataframe.

# select all numeric type columns
df.select_dtypes(include=['number'])

Output:

[Image: numeric columns of employees dataframe]

We get the “Age”, “Salary”, and the “Bonus%” columns.
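A common follow-up is to run a numeric operation on just these columns. As a rough sketch, the snippet below rebuilds the employee dataframe, collects the numeric column names into a list (the name `numeric_cols` is an illustrative choice), and computes the column means.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False],
})

# names of the numeric columns
numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
print(numeric_cols)

# column-wise mean of the numeric columns only
means = df[numeric_cols].mean()
print(means)
```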

Note that if you only want columns of one specific dtype, such as int64 or float64, you can pass that dtype instead of the generic number type. For example, let’s select only the int64 type columns from the above dataframe.

# select integer type columns
df.select_dtypes(include=['int64'])

Output:

[Image: integer columns of employees dataframe]

We get the integer columns “Age” and “Salary”.
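Keep in mind that 'int64' matches only that exact width. The sketch below assumes a variant of the employee data in which “Age” has been converted to int32 (a change made here purely for illustration) and compares 'int64' with NumPy's generic np.integer type, which matches integer columns of any width.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
})

# store Age in a narrower integer type (illustrative change)
df["Age"] = df["Age"].astype("int32")

# 'int64' no longer matches Age
only_int64 = df.select_dtypes(include=["int64"]).columns.tolist()
print(only_int64)

# np.integer matches integer columns regardless of width
any_integer = df.select_dtypes(include=[np.integer]).columns.tolist()
print(any_integer)
```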

Example 3 – Select all numeric and boolean columns

You can also pass multiple dtypes in the list to the include parameter. For example, let’s now select both the numeric and the boolean type columns from the above dataframe.

# select all numeric and bool type columns
df.select_dtypes(include=['number', 'bool'])

Output:

[Image: numeric and bool type columns of employees dataframe]

We get the bool type column “OnProbation” along with the numeric columns in the dataframe.
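The reason 'bool' has to be listed separately is that the number type does not cover boolean columns. A short sketch on a trimmed-down copy of the employee data makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [26, 28, 27, 32],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False],
})

# 'number' alone leaves the boolean column out
numeric_only = df.select_dtypes(include=["number"]).columns.tolist()
print(numeric_only)

# adding 'bool' brings it back, in the original column order
with_bool = df.select_dtypes(include=["number", "bool"]).columns.tolist()
print(with_bool)
```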

Example 4 – Select all columns except numeric columns

The select_dtypes() function also takes an exclude parameter, which you can use to leave out columns of specific dtypes.

For example, let’s get all the columns from the above dataframe that are non-numeric.

# select all columns except numeric columns
df.select_dtypes(exclude=['number'])

Output:

[Image: non-numeric columns of employees dataframe]

We get the non-numeric columns of the dataframe – “Name”, “Department”, and “OnProbation”.
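Using include and exclude with the same dtype splits a dataframe into two complementary parts. The sketch below rebuilds the employee dataframe and checks that the numeric and non-numeric selections together account for every column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
    "Salary": [55000, 60000, 52000, 45000],
    "Bonus%": [8.25, 10, 8.50, 8.25],
    "OnProbation": [True, False, False, False],
})

# split the columns into numeric and non-numeric parts
numeric = df.select_dtypes(include=["number"])
non_numeric = df.select_dtypes(exclude=["number"])

# every column lands in exactly one of the two parts
all_selected = list(numeric.columns) + list(non_numeric.columns)
print(sorted(all_selected) == sorted(df.columns))
```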

Summary

In this tutorial, we looked at how to use the pandas dataframe select_dtypes function to select columns based on their dtypes in a pandas dataframe. The following are the key takeaways –

  • Pass the dtypes to include as a list to the include parameter.
  • Pass the dtypes to exclude as a list to the exclude parameter.


Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
