
Rename DataFrame Column Name in Pyspark

In this tutorial, we will look at how to rename a column in a Pyspark dataframe with the help of some examples.

How to rename a Pyspark dataframe column?


You can use the Pyspark withColumnRenamed() function to rename a column in a Pyspark dataframe. It takes the old column name and the new column name as arguments. The following is the syntax.

DataFrame.withColumnRenamed(old_column_name, new_column_name)

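It returns a new Pyspark dataframe with the column renamed and leaves the original dataframe unchanged. If the dataframe has no column with the old name, the call is a no-op and the returned dataframe has the same columns as before.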

Examples

Let’s look at some examples of using the above function to rename one or more columns. First, we’ll create a Pyspark dataframe that we will be using throughout this tutorial.

# import the SparkSession class from pyspark.sql
from pyspark.sql import SparkSession

# create (or reuse) a Spark session
spark = SparkSession.builder.appName('datascience_parichay').getOrCreate()

# books data as a list of lists
books = [[1, "PHP", "Sravan", 250],
         [2, "SQL", "Chandra", 300],
         [3, "Python", "Harsha", 250],
         [4, "R", "Rohith", 1200],
         [5, "Hadoop", "Manasa", 700],
         ]

# create a dataframe from the books data
dataframe = spark.createDataFrame(books, ['Book_Id', 'Book_Name', 'Author', 'Price'])

# display the dataframe
dataframe.show()

Output:

+-------+---------+-------+-----+
|Book_Id|Book_Name| Author|Price|
+-------+---------+-------+-----+
|      1|      PHP| Sravan|  250|
|      2|      SQL|Chandra|  300|
|      3|   Python| Harsha|  250|
|      4|        R| Rohith| 1200|
|      5|   Hadoop| Manasa|  700|
+-------+---------+-------+-----+

We have a dataframe with 5 rows and 4 columns containing information about some books: the book ID, name, author, and price.

Rename a column name in Pyspark

Let’s rename the column “Author” to “Writer”. For this, we pass the old column name, “Author”, and the new column name, “Writer”, as arguments to the withColumnRenamed() function.

# change column name from Author to Writer 
dataframe.withColumnRenamed("Author", "Writer").show()

Output:

+-------+---------+-------+-----+
|Book_Id|Book_Name| Writer|Price|
+-------+---------+-------+-----+
|      1|      PHP| Sravan|  250|
|      2|      SQL|Chandra|  300|
|      3|   Python| Harsha|  250|
|      4|        R| Rohith| 1200|
|      5|   Hadoop| Manasa|  700|
+-------+---------+-------+-----+

You can see that the “Author” column was renamed to “Writer”.

Rename multiple columns in Pyspark

To rename multiple columns, you can chain multiple calls to the withColumnRenamed() function. For example, let’s rename the “Book_Id” column to “Id” and the “Book_Name” column to “Name”.

# change column names - Book_Id to Id and Book_Name to Name
dataframe.withColumnRenamed("Book_Id", "Id").withColumnRenamed("Book_Name", "Name").show()

Output:

+---+------+-------+-----+
| Id|  Name| Author|Price|
+---+------+-------+-----+
|  1|   PHP| Sravan|  250|
|  2|   SQL|Chandra|  300|
|  3|Python| Harsha|  250|
|  4|     R| Rohith| 1200|
|  5|Hadoop| Manasa|  700|
+---+------+-------+-----+

Both columns were renamed.

In this tutorial, we looked at how to use the withColumnRenamed() function to change column names in a Pyspark dataframe.



Authors

  • Piyush

    Piyush is a data scientist passionate about using data to understand things better and make informed decisions. In the past, he's worked as a Data Scientist for ZS and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.

  • Gottumukkala Sravan Kumar