Remove Prefix or Suffix from Pandas Column Names

In this tutorial, we will look at how to remove prefix or suffix substrings from column names of a pandas dataframe.

Remove prefix from pandas column names

Two common ways to remove a prefix from column names are the string lstrip() function and the string replace() function. Each has a caveat, which we point out along the way. Let’s go over them with the help of examples. First, we will create a sample dataframe that we will be using throughout this tutorial.

import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16] 
})
# display the dataframe
print(df)

Output:

  tb1_Name  tb1_Age
0     Emma       16
1   Shivam       17
2     Mike       14
3     Noor       16

We now have a dataframe containing the names and ages of four students in a high school. Note that the column names have the prefix “tb1_”, which doesn’t carry any relevant information and can therefore be removed.

Let’s see how we can remove the prefix from all the columns.

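The string lstrip() function removes leading characters from a string. Its argument is treated as a set of characters, not as a literal substring: characters are stripped from the start of the string for as long as they belong to that set. Passing “tb1_” works for our column names because the first character after the prefix (“N” in “Name”, “A” in “Age”) is not one of “t”, “b”, “1” or “_”.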

To rename the columns, we will apply this function on each column name as follows.

# remove prefix
df.columns = df.columns.str.lstrip("tb1_")
# display the dataframe
print(df)

Output:

     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16

You can see that the prefix has been removed from each column.
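One caveat is worth checking before relying on lstrip() for other column names. Because the argument is a set of characters rather than a literal prefix, stripping continues past the prefix whenever the next character is also in the set. The short sketch below illustrates this with plain Python strings; the names "tb1_total" and "tb1_budget" are hypothetical and not part of the sample dataframe.

```python
# lstrip() strips any leading characters found in the given set,
# not the literal substring "tb1_".
# "tb1_total" and "tb1_budget" are hypothetical names chosen to expose the issue.
columns = ["tb1_Name", "tb1_Age", "tb1_total", "tb1_budget"]

stripped = [name.lstrip("tb1_") for name in columns]
print(stripped)  # ['Name', 'Age', 'otal', 'udget']
```

The first two names come out as expected, but the last two lose extra leading characters because "t" and "b" are also in the set.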

You can also use the string replace() function to remove the prefix from the column names.

# create a dataframe
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16] 
})
# remove prefix
df.columns = df.columns.str.replace("tb1_", "")
# display the dataframe
print(df)

Output:

     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16

We get the same result as above.

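Note that the string replace() function will replace every occurrence of the substring. This can be an issue if the prefix substring occurs later in the column name. The string lstrip() function has its own pitfall: it strips any leading characters that appear in its argument rather than the literal prefix, so a name like “tb1_total” would become “otal”. Use lstrip() only when you are sure that the character following the prefix is not one of the characters in the prefix.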
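One way to limit replace() to the start of the name is to anchor a regular expression with "^". The following is a minimal sketch, assuming a hypothetical dataframe in which "tb1_" also appears in the middle of a column name; the column "tb1_old_tb1_Age" is made up for illustration.

```python
import pandas as pd

# hypothetical dataframe where "tb1_" also occurs inside a column name
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam"],
    "tb1_old_tb1_Age": [16, 17],
})

# "^" anchors the pattern to the start of each column name,
# so only the leading "tb1_" is removed
df.columns = df.columns.str.replace(r"^tb1_", "", regex=True)
print(list(df.columns))  # ['Name', 'old_tb1_Age']
```

Passing regex=True explicitly avoids depending on the default, which differs between pandas versions.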

Remove suffix from pandas column names

Two common ways to remove a suffix from column names are the string rstrip() function and the string replace() function. Let’s look at some examples.

The string rstrip() function removes trailing characters from a string. Its argument is treated as a set of characters, not as a literal substring: characters are stripped from the end of the string for as long as they belong to that set.

# create a dataframe
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam", "Mike", "Noor"],
    "Age_tb1": [16, 17, 14, 16] 
})
# display the dataframe
print(df)

Output:

  Name_tb1  Age_tb1
0     Emma       16
1   Shivam       17
2     Mike       14
3     Noor       16

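Here the column names in the dataframe df have the suffix “_tb1”, which we want to remove. To rename the columns, we will apply the rstrip() function on each column name as follows. This works here because the character just before the suffix (“e” in both “Name” and “Age”) is not one of “_”, “t”, “b” or “1”.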

# remove suffix
df.columns = df.columns.str.rstrip("_tb1")
# display the dataframe
print(df)

Output:

     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16

You can see that the suffix is not present in the updated column names.
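As with lstrip(), it helps to see what rstrip() does when the name itself ends in one of the characters from the argument. The sketch below uses plain Python strings; "Amount_tb1" and "Debt_tb1" are hypothetical names that are not in the sample dataframe.

```python
# rstrip() strips any trailing characters found in the given set,
# not the literal substring "_tb1".
# "Amount_tb1" and "Debt_tb1" are hypothetical names chosen to expose the issue.
columns = ["Name_tb1", "Age_tb1", "Amount_tb1", "Debt_tb1"]

stripped = [name.rstrip("_tb1") for name in columns]
print(stripped)  # ['Name', 'Age', 'Amoun', 'De']
```

"Amount" loses its final "t" and "Debt" loses "bt", since both letters belong to the set of characters being stripped.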

You can also use the string replace() function to remove the suffix from the column names.

# create a dataframe
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam", "Mike", "Noor"],
    "Age_tb1": [16, 17, 14, 16] 
})
# remove suffix
df.columns = df.columns.str.replace("_tb1", "")
# display the dataframe
print(df)

Output:

     Name  Age
0    Emma   16
1  Shivam   17
2    Mike   14
3    Noor   16

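As already mentioned, the string replace() function will replace every occurrence of the passed substring. This can be an issue if the substring occurs earlier in the column name. The rstrip() function is not a safe default either: it strips any trailing characters that appear in its argument rather than the literal suffix, so a name like “Amount_tb1” would become “Amoun”. Use rstrip() only when you are sure that the character preceding the suffix is not one of the characters in the suffix.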
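To restrict replace() to the end of the name, a regular expression can be anchored with "$". This is a minimal sketch, assuming a hypothetical dataframe in which "_tb1" also appears earlier in a column name; the column "Age_tb1_raw_tb1" is made up for illustration.

```python
import pandas as pd

# hypothetical dataframe where "_tb1" also occurs inside a column name
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam"],
    "Age_tb1_raw_tb1": [16, 17],
})

# "$" anchors the pattern to the end of each column name,
# so only the trailing "_tb1" is removed
df.columns = df.columns.str.replace(r"_tb1$", "", regex=True)
print(list(df.columns))  # ['Name', 'Age_tb1_raw']
```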

Python 3.9 introduced the string functions removeprefix() and removesuffix(), which remove a literal prefix or suffix from a string only if the string actually starts or ends with it. You can use these functions on the column names to remove prefixes and suffixes.

To remove prefix from column names:

# remove prefix
df.columns = df.columns.map(lambda x: x.removeprefix("prefix_string"))

To remove suffix from column names:

# remove suffix
df.columns = df.columns.map(lambda x: x.removesuffix("suffix_string"))

For more on these functions, refer to their documentation.
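Putting the two snippets above together, here is a small runnable sketch that requires Python 3.9 or later. The column names "tb1_total" and "Amount_tb1" are hypothetical and were chosen because lstrip() and rstrip() would mangle them.

```python
import pandas as pd

# hypothetical column names that trip up lstrip()/rstrip()
df_prefix = pd.DataFrame({"tb1_Name": ["Emma", "Noor"], "tb1_total": [16, 16]})
df_suffix = pd.DataFrame({"Name_tb1": ["Emma", "Noor"], "Amount_tb1": [16, 16]})

# removeprefix()/removesuffix() drop the literal substring once,
# and leave the name unchanged if it is not present
df_prefix.columns = df_prefix.columns.map(lambda name: name.removeprefix("tb1_"))
df_suffix.columns = df_suffix.columns.map(lambda name: name.removesuffix("_tb1"))

print(list(df_prefix.columns))  # ['Name', 'total']
print(list(df_suffix.columns))  # ['Name', 'Amount']
```

If you are on pandas 1.4 or later, the string accessor offers the same operations directly, for example df.columns.str.removeprefix("tb1_"), which avoids the lambda.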




Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
