In this tutorial, we will look at how to apply string functions to a category type column in Pandas with the help of some examples.
How to apply string functions to a category type column?
For category type columns in Pandas whose categories are strings, you can use the .str accessor to apply string functions. The following is the syntax:
# apply string function on category column
df["Col"].str.contains("abc")
Here, "Col" is a category type column to which we apply the string function contains(). You can similarly apply other string functions.
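The "appropriate" condition can be illustrated with a small sketch. The two series below (sizes and codes) are hypothetical examples, not part of the tutorial's dataframe. It assumes that the .str accessor is available on a categorical series only when its categories are strings, and that accessing it otherwise raises an AttributeError.

```python
import pandas as pd

# hypothetical example series, used only for illustration
sizes = pd.Series(["Medium", "Small", "Large"], dtype="category")
codes = pd.Series([1, 2, 3], dtype="category")

# string categories: the .str accessor is available
upper_sizes = sizes.str.upper()
print(upper_sizes.tolist())

# integer categories: accessing .str raises an AttributeError
try:
    codes.str.upper()
    str_accessor_failed = False
except AttributeError as err:
    str_accessor_failed = True
    print("AttributeError:", err)
```

If you need string functions on such a column, one option is to convert it to strings first, for example with codes.astype(str).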
Examples
Let’s look at some examples of applying string functions to a categorical column in Pandas. First, we will create a sample dataframe that we’ll be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "Shirt Size": ["Medium", "Small", "Small", "Medium", "Large"],
    "University": ["MIT, USA", "Stanford, USA", "MIT, USA", "IIT Delhi, India", "Cambridge, UK"]
})

# change to category dtype
df["Shirt Size"] = df["Shirt Size"].astype("category")
df["University"] = df["University"].astype("category")

# display the dataframe
print(df)
Output:
    Name  Shirt Size        University
0    Tim      Medium          MIT, USA
1  Sarah       Small     Stanford, USA
2  Hasan       Small          MIT, USA
3  Jyoti      Medium  IIT Delhi, India
4   Jack       Large     Cambridge, UK
We now have a dataframe containing the name, t-shirt size, and the university information of some students participating in a hackathon.
Note that the “Shirt Size” and “University” columns are of category type.
String function contains() on a Pandas category column
Let’s use the string function contains() to check which of the above students are from universities in the USA. For this, apply the contains() function on the “University” column with the help of the .str accessor.
# check if student is from a US university
print(df["University"].str.contains("USA"))
Output:
0     True
1     True
2     True
3    False
4    False
Name: University, dtype: bool
The resulting series is of bool type. We get True if the string “USA” is present in the “University” field and False otherwise.
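A common next step is to use this boolean series as a mask to select the matching rows. The sketch below rebuilds only the columns it needs from the sample dataframe; the variable names is_us and us_students are introduced here for illustration.

```python
import pandas as pd

# rebuild the relevant part of the sample dataframe
df = pd.DataFrame({
    "Name": ["Tim", "Sarah", "Hasan", "Jyoti", "Jack"],
    "University": ["MIT, USA", "Stanford, USA", "MIT, USA", "IIT Delhi, India", "Cambridge, UK"]
})
df["University"] = df["University"].astype("category")

# boolean mask: True where the university string contains "USA"
is_us = df["University"].str.contains("USA")

# keep only the rows where the mask is True
us_students = df[is_us]
print(us_students["Name"].tolist())
```

Note that contains() treats its pattern as a regular expression by default. If the text you are searching for contains regex special characters, you can pass regex=False to match it literally.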
String function lower() on a Pandas category column
Let’s look at another example. Here, we’ll convert the values in the “Shirt Size” column to lowercase. For this, apply the string lower() function on the “Shirt Size” column with the help of the .str accessor.
# shirt size column to lowercase
print(df["Shirt Size"].str.lower())
Output:
0    medium
1     small
2     small
3    medium
4     large
Name: Shirt Size, dtype: object
The resulting series has its values in lowercase. Note that the returned series is of object type. That is, the category type of the original series is not preserved here.
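If you want the lowercased result to remain a category column, two possible approaches are sketched below. Neither is the only way to do it. The variable names sizes, lower_then_cast and renamed are introduced here for illustration, and the second approach assumes the transformed category labels stay unique.

```python
import pandas as pd

# the "Shirt Size" column from the sample dataframe, as a standalone series
sizes = pd.Series(
    ["Medium", "Small", "Small", "Medium", "Large"],
    name="Shirt Size",
    dtype="category",
)

# Option 1: apply the string function, then convert back to category
lower_then_cast = sizes.str.lower().astype("category")

# Option 2: transform only the category labels; the dtype stays category
renamed = sizes.cat.rename_categories(str.lower)

print(lower_then_cast.dtype)
print(renamed.dtype)
print(renamed.tolist())
```

The second option works on the category labels rather than on every row, which can be convenient when a column has many rows but few distinct categories.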
You might also be interested in –
- Pandas – Convert Category Type Column to String
- Change Category Order of a Pandas Column
- Add New Categories to a Category Column in Pandas