
Pandas – Find Column Names that Contain a Specific String

In this tutorial, we will look at how to get the column names in a pandas dataframe that contain a specific string (in the column name) with the help of some examples.

How to find columns whose names contain a specific string?


The .str accessor lets you apply string functions to all the column names in a pandas dataframe at once. To get the column names that contain a specific string, apply the string contains() function to df.columns through this accessor.

Pass the string you want to check for as an argument to the contains() function. The following is the syntax.

# get column names containing a specific string, s
df.columns[df.columns.str.contains(s)]

The idea is to get a boolean array using df.columns.str.contains() and then use it to filter the column names in df.columns.

Alternatively, you can use a list comprehension to iterate through the column names and check whether each one contains the specified string.
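One detail of the contains() syntax above is worth a short illustration: by default, pandas treats the argument as a regular expression, so characters such as "." have a special meaning. Passing regex=False makes the match literal. The following is a minimal sketch; the dataframe df_demo and its column names are hypothetical and used only for this illustration.

```python
import pandas as pd

# hypothetical dataframe (no rows needed), used only for this illustration
df_demo = pd.DataFrame(columns=["score_v1.0", "score_v1_0", "score_v2"])

# default behavior: "1.0" is a regular expression, so "." matches any character
regex_match = list(df_demo.columns[df_demo.columns.str.contains("1.0")])

# regex=False: "1.0" is matched as a literal substring
literal_match = list(df_demo.columns[df_demo.columns.str.contains("1.0", regex=False)])

print(regex_match)    # ['score_v1.0', 'score_v1_0']
print(literal_match)  # ['score_v1.0']
```

The list comprehension approach uses Python's in operator, which always performs a literal substring check.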

Examples

Let’s now look at some examples of using the above syntax.

First, we will create a pandas dataframe that we will be using throughout this tutorial.

import pandas as pd

# employee data
data = {
    "First Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Last Name": ["Halpert", "Schrute", "Martin", "Flenderson"],
    "Age": [26, 28, 27, 32]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df

Output:

  First Name   Last Name  Age
0        Jim     Halpert   26
1     Dwight     Schrute   28
2     Angela      Martin   27
3       Tobi  Flenderson   32

Here, we created a dataframe with information about some employees in an office. The dataframe has the columns – “First Name”, “Last Name”, and “Age”.

Example 1 – Get column names that contain a specific string

Let’s get the column names in the above dataframe that contain the string “Name” in their column labels.

We’ll apply the string contains() function with the help of the .str accessor to df.columns.

# check if column name contains the string, "Name"
df.columns.str.contains("Name")

Output:

array([ True,  True, False])

You can see that we get a boolean array indicating which columns in the dataframe contain the string “Name”.

We can use the above boolean array to filter df.columns and get only the column names that contain the specified string (in this example, “Name”).

# get column names that contain the string, "Name"
df.columns[df.columns.str.contains("Name")]

Output:

Index(['First Name', 'Last Name'], dtype='object')

We get the column names with “Name” in them.
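If you want the data in those columns rather than just their names, the same boolean array can, as one option, be passed to df.loc as the column selector. The sketch below recreates the employee dataframe so that it runs on its own; the variable names mask and name_df are assumptions made for this illustration.

```python
import pandas as pd

# recreate the employee dataframe from this tutorial
df = pd.DataFrame({
    "First Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Last Name": ["Halpert", "Schrute", "Martin", "Flenderson"],
    "Age": [26, 28, 27, 32]
})

# boolean array marking the columns whose name contains "Name"
mask = df.columns.str.contains("Name")

# keep all rows, but only the columns where the mask is True
name_df = df.loc[:, mask]

print(list(name_df.columns))  # ['First Name', 'Last Name']
```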

Example 2 – Get column names with a specific string using list comprehension

Alternatively, we can use a list comprehension to iterate through the column names in df.columns and select the ones that contain the given string.

# get column names that contain the string, "Name"
[col for col in df.columns if "Name" in col]

Output:

['First Name', 'Last Name']

We get the same column names as in the previous example, this time as a Python list instead of a pandas Index. “First Name” and “Last Name” are the only columns in the above dataframe with the string “Name” in their names.
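Both approaches shown so far are case-sensitive, so searching for “name” in lowercase would return nothing for this dataframe. The following is a small sketch of one way to ignore case: contains() accepts a case parameter, and the list comprehension can compare against a lowercased column name. The variable names vectorized and comprehension are assumptions made for this illustration.

```python
import pandas as pd

# recreate the employee dataframe from this tutorial
df = pd.DataFrame({
    "First Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Last Name": ["Halpert", "Schrute", "Martin", "Flenderson"],
    "Age": [26, 28, 27, 32]
})

# .str accessor: case=False makes the match case-insensitive
vectorized = list(df.columns[df.columns.str.contains("name", case=False)])

# list comprehension: lowercase each column name before checking
comprehension = [col for col in df.columns if "name" in col.lower()]

print(vectorized)     # ['First Name', 'Last Name']
print(comprehension)  # ['First Name', 'Last Name']
```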

Summary

In this tutorial, we looked at how to get the column names containing a specified string in a pandas dataframe. The following are the key takeaways –

  • Use the string contains() function (applied using the .str accessor on df.columns) to check if a column name contains a given string or not (and use this result to filter df.columns).
  • You can also get column names containing a specified string with the help of a list comprehension.


Author

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.