In this tutorial, we will look at how to remove prefix or suffix substrings from column names of a pandas dataframe.
Remove Prefix from column names in Pandas
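You can use the string lstrip() function or the string replace() function to remove a prefix from column names. Note that lstrip() strips a set of characters rather than an exact substring, and replace() replaces every occurrence of the substring, so check the results on your own column names. Let’s go over them with the help of examples. First, we will create a sample dataframe that we will be using throughout this tutorial.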
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16]
})

# display the dataframe
print(df)
Output:
  tb1_Name  tb1_Age
0     Emma       16
1   Shivam       17
2     Mike       14
3     Noor       16
We now have a dataframe containing the names and ages of four students in a high school. Note that the column names have the prefix “tb1_”, which doesn’t give any relevant information and as such can be removed.
Let’s see how we can remove the prefix from all the columns.
1. Using string lstrip()
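The string lstrip() function is used to remove leading characters from a string. Its argument is treated as a set of characters, not as an exact substring: every leading character that appears in the argument is removed until a character outside the set is reached. It works for the column names here because neither “Name” nor “Age” starts with one of the characters t, b, 1, or _.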
To rename the columns, we will apply this function on each column name as follows.
# remove prefix
df.columns = df.columns.str.lstrip("tb1_")

# display the dataframe
print(df)
Output:
     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16
You can see that the prefix has been removed from each column.
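One caveat is worth checking before relying on lstrip(): its argument is treated as a set of characters, not as an exact prefix. The sketch below uses hypothetical column names (tb1_total and tb1_1st_bid are not part of the dataframe above) to show how a name that begins with one of those characters gets trimmed too far.

```python
import pandas as pd

# hypothetical column names, chosen only to expose the pitfall
cols = pd.Index(["tb1_Name", "tb1_total", "tb1_1st_bid"])

# lstrip removes any leading run of the characters "t", "b", "1", "_"
stripped = cols.str.lstrip("tb1_")

# "tb1_total" loses its leading "t"; "tb1_1st_bid" loses its leading "1"
print(list(stripped))
```

If any of your real column names start with a character that also appears in the prefix, lstrip() is likely to give the wrong result.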
2. Using string replace()
You can also use the string replace() function to remove a prefix from column names.
# create a dataframe
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16]
})

# remove prefix
df.columns = df.columns.str.replace("tb1_", "")

# display the dataframe
print(df)
Output:
     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16
We get the same result as above.
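Note that the string replace() function will replace every occurrence of the substring. This can be an issue if the prefix substring occurs later in the column name. The string lstrip() function is not a safe alternative either, since it strips any leading characters found in its argument rather than the exact prefix. To remove only a leading prefix, anchor the pattern to the start of the name with a regular expression (for example, "^tb1_" with regex=True), or use removeprefix() on Python 3.9+.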
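The concern about replace() matching later occurrences can be avoided by anchoring the pattern to the start of the name. The following is a minimal sketch, assuming a hypothetical column tb1_total_tb1_score in which the prefix text appears twice; it compares the plain replacement with an anchored regular expression.

```python
import pandas as pd

df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam"],
    "tb1_total_tb1_score": [16, 17],  # hypothetical column for illustration
})

# plain replacement removes every occurrence of "tb1_"
unanchored = df.columns.str.replace("tb1_", "", regex=False)

# "^" anchors the pattern to the start of each column name
anchored = df.columns.str.replace("^tb1_", "", regex=True)

print(list(unanchored))
print(list(anchored))
```

Passing regex explicitly keeps the behavior the same across pandas versions, since the default for this parameter has changed over time.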
Remove Suffix from column names in Pandas
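You can use the string rstrip() function or the string replace() function to remove a suffix from column names. The same caveats apply: rstrip() strips a set of characters rather than an exact substring, and replace() replaces every occurrence. Let’s look at some examples.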
1. Using string rstrip()
The string rstrip() function is used to remove trailing characters from a string. Its argument is treated as a set of characters, not as an exact substring: every trailing character that appears in the argument is removed until a character outside the set is reached.
# create a dataframe
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam", "Mike", "Noor"],
    "Age_tb1": [16, 17, 14, 16]
})

# display the dataframe
print(df)
Output:
  Name_tb1  Age_tb1
0     Emma       16
1   Shivam       17
2     Mike       14
3     Noor       16
Here the column names in the dataframe df have the suffix “_tb1”, which we want to remove. To rename the columns, we will apply the rstrip() function on each column name as follows.
# remove suffix
df.columns = df.columns.str.rstrip("_tb1")

# display the dataframe
print(df)
Output:
     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16
You can see that the suffix is not present in the updated column names.
2. Using string replace()
You can also use the string replace() function to remove a suffix from column names.
# create a dataframe
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam", "Mike", "Noor"],
    "Age_tb1": [16, 17, 14, 16]
})

# remove suffix
df.columns = df.columns.str.replace("_tb1", "")

# display the dataframe
print(df)
Output:
     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16
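As already mentioned, the string replace() function will replace every occurrence of the passed substring. This can be an issue if the substring occurs earlier in the column name. The rstrip() function has its own problem: it strips any trailing characters found in its argument, so a name whose last letter is one of t, b, 1, or _ would be trimmed too far. To remove only a trailing suffix, anchor the pattern to the end of the name with a regular expression (for example, "_tb1$" with regex=True), or use removesuffix() on Python 3.9+.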
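To see how the two approaches differ on less convenient names, here is a small sketch. The column names Amount_tb1 and _tb1_flag_tb1 are hypothetical and were picked because they end in, or contain, characters from the suffix.

```python
import pandas as pd

# hypothetical column names for illustration
cols = pd.Index(["Name_tb1", "Amount_tb1", "_tb1_flag_tb1"])

# rstrip treats "_tb1" as a character set, so the final "t" of "Amount" is lost
with_rstrip = cols.str.rstrip("_tb1")

# "$" anchors the pattern to the end of each name, leaving earlier matches alone
with_regex = cols.str.replace("_tb1$", "", regex=True)

print(list(with_rstrip))
print(list(with_regex))
```

The anchored pattern removes exactly one trailing “_tb1” and nothing else, which is usually what is wanted when cleaning up suffixes.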
For Python 3.9+ use string removeprefix() and removesuffix()
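Python version 3.9 introduced new string functions to remove a prefix or suffix from a string. Unlike lstrip() and rstrip(), they remove the exact substring, and only when the string starts (or ends) with it. You can use these functions on the column names to remove prefixes and suffixes.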
To remove prefix from column names:
# remove prefix
df.columns = df.columns.map(lambda x: x.removeprefix("prefix_string"))
To remove suffix from column names:
# remove suffix
df.columns = df.columns.map(lambda x: x.removesuffix("suffix_string"))
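As a worked example on the sample dataframe from this tutorial, the sketch below applies removeprefix() through map() and, for comparison, through the pandas string accessor. The accessor form is an assumption about your environment: it is available in pandas 1.4 and newer.

```python
import pandas as pd

df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16],
})

# plain Python string method (Python 3.9+), applied to each column name
via_map = df.columns.map(lambda x: x.removeprefix("tb1_"))

# pandas string accessor equivalent (assumes pandas 1.4 or newer)
via_str = df.columns.str.removeprefix("tb1_")

print(list(via_map))
print(list(via_str))
```

Both forms leave a name unchanged when it does not start with the given prefix, so they are safe to apply to a dataframe where only some columns carry it.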
For more on these functions, refer to their documentation.