In this tutorial, we will look at how to rename a column in a Pyspark dataframe with the help of some examples. How to rename a Pyspark dataframe column? You can use the Pyspark withColumnRenamed() function to rename a column in a Pyspark dataframe. It takes the old column name and the new column name …

Read More about Rename DataFrame Column Name in Pyspark

In this tutorial, we will look at how to get the variance of a column in a Pyspark dataframe with the help of some examples. How to get variance for a Pyspark dataframe column? You can use the variance() function from the pyspark.sql.functions module to compute the variance of a Pyspark column. The following is …

Read More about PySpark – Variance of a DataFrame Column

Standard deviation is a descriptive statistic used as a measure of the spread in the data. In this tutorial, we will look at how to get the standard deviation of a column in a Pyspark dataframe with the help of some examples. How to get standard deviation for a Pyspark dataframe column? You can use …

Read More about Pyspark – Standard Deviation of a Column

In this tutorial, we will look at how to rename the categories in a Pandas category type column (or series) with the help of examples. How to rename categories in Pandas? You can use the Pandas rename_categories() function to rename the categories in a category type column in Pandas. The following is the syntax – …

Read More about Pandas – Rename Categories in Category Column
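A short sketch of `rename_categories()` on a category-type series; the category values and their new names here are made up for illustration:

```python
import pandas as pd

# A category-type series with hypothetical sample values
s = pd.Series(["a", "b", "a", "c"], dtype="category")

# rename_categories() accepts a dict mapping old category names to new ones
s_renamed = s.cat.rename_categories({"a": "apple", "b": "banana", "c": "cherry"})
print(s_renamed.cat.categories.tolist())  # ['apple', 'banana', 'cherry']
```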

In this tutorial, we will look at how to use the Pyspark where() function to filter a Pyspark dataframe with the help of some examples. How to filter dataframe in Pyspark? You can use the Pyspark where() method to filter data in a Pyspark dataframe. You can use relational operators, SQL expressions, string functions, lists, …

Read More about Filter PySpark DataFrame with where()

In this tutorial, we will see different aggregate functions in Pyspark and how to use them on dataframes with the help of examples. How to apply them to Pyspark dataframes? Aggregate functions are used to combine the data using descriptive statistics like count, average, min, max, etc. You can apply aggregate functions to Pyspark dataframes …

Read More about Aggregate Functions in PySpark

In this tutorial, we will look at how to use the Pyspark collect() function to collect data from a Pyspark dataframe. Collect data from Pyspark dataframe You can use the collect() function to collect data from a Pyspark dataframe as a list of Pyspark dataframe rows. It does not take any parameters but if …

Read More about Get DataFrame Records with Pyspark collect()

In this tutorial, we will look at how to display a dataframe using the show() method in PySpark with the help of some examples. How to display dataframe in Pyspark? The show() method in Pyspark is used to display the data from a dataframe in a tabular format. The following is the syntax – Here, df …

Read More about Display DataFrame in Pyspark with show()