
Select One or More Columns in Pandas

There are several ways to select a subset of columns in pandas. You can select them by name (label) or by integer position. In this tutorial, we’ll look at how to select one or more columns in a pandas dataframe through some examples.

Let’s look at some of the different ways to select columns of a dataframe by name, starting with the indexing operator []:

import pandas as pd

# create a sample dataframe
data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

# select columns 'Name' and 'Department'
df_selected = df[['Name', 'Department']]

# print the dataframe
print("The original dataframe:\n")
print(df)
print("\nDataframe with the selected columns:\n")
print(df_selected)

Output:

The original dataframe:

     Name  Age       Department
0     Jim   26            Sales
1  Dwight   28            Sales
2  Angela   27       Accounting
3    Tobi   32  Human Resources

Dataframe with the selected columns:

     Name       Department
0     Jim            Sales
1  Dwight            Sales
2  Angela       Accounting
3    Tobi  Human Resources

In the above example, we select the columns Name and Department from the dataframe df by passing them as a list to the indexing operator []. You can see that the returned dataframe just has those two columns.
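The same operator also covers the single-column case. The short sketch below, which reuses the sample data from the example above, shows that a bare label returns a pandas Series while a one-element list returns a dataframe with one column. The variable names name_series and name_frame are only illustrative.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
})

# a single label (no list) returns a Series
name_series = df['Name']

# a list with one label returns a dataframe with a single column
name_frame = df[['Name']]

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_frame.shape)            # (4, 1)
```

Which form to use depends on what the next step expects: many column-wise operations work on a Series, whereas the list form keeps the result two-dimensional.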

.loc is a pandas dataframe property used for accessing rows or columns of a dataframe by their labels. You can use it to select a subset of columns of a dataframe by their names.

import pandas as pd

# create a sample dataframe
data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

# select columns 'Name' and 'Department'
df_selected = df.loc[:,['Name', 'Department']]

# print the dataframe
print("The original dataframe:\n")
print(df)
print("\nDataframe with the selected columns:\n")
print(df_selected)

Output:

The original dataframe:

     Name  Age       Department
0     Jim   26            Sales
1  Dwight   28            Sales
2  Angela   27       Accounting
3    Tobi   32  Human Resources

Dataframe with the selected columns:

     Name       Department
0     Jim            Sales
1  Dwight            Sales
2  Angela       Accounting
3    Tobi  Human Resources

In the above example, we use df.loc[:,['Name', 'Department']] to select columns Name and Department. Note that the : before the , selects all the rows for the two columns. You can replace it with a specific slice or list of row labels if you only need some of the rows.

You can also select columns by their integer positions (starting at 0) using the .iloc property of the dataframe.
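As a rough illustration of restricting the rows as well, the sketch below uses the same sample data and combines a row slice with a column selection in .loc. It also shows a label slice on the columns. Keep in mind that .loc slices include both endpoints. The variable names rows_subset and name_to_age are only illustrative.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
})

# rows labelled 1 through 2 (both included), columns 'Name' and 'Department'
rows_subset = df.loc[1:2, ['Name', 'Department']]

# all rows, every column from 'Name' through 'Age' (both included)
name_to_age = df.loc[:, 'Name':'Age']

print(rows_subset)
print(name_to_age)
```

Here the row labels happen to be the default integers 0 to 3, so 1:2 looks like a positional slice, but .loc treats it as a label slice and therefore includes row 2.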

import pandas as pd

# create a sample dataframe
data = {
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
}

df = pd.DataFrame(data)

# select the columns at positions 0 and 2 ('Name' and 'Department')
df_selected = df.iloc[:,[0, 2]]

# print the dataframe
print("The original dataframe:\n")
print(df)
print("\nDataframe with the selected columns:\n")
print(df_selected)

Output:

The original dataframe:

     Name  Age       Department
0     Jim   26            Sales
1  Dwight   28            Sales
2  Angela   27       Accounting
3    Tobi   32  Human Resources

Dataframe with the selected columns:

     Name       Department
0     Jim            Sales
1  Dwight            Sales
2  Angela       Accounting
3    Tobi  Human Resources

In the above example, we use the column indexes 0 and 2 to select columns Name and Department respectively from the dataframe df.
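Positions can also be given as a slice or as negative numbers. The sketch below, again on the same sample data, shows both. It also shows one way to look up the position of a column from its name with df.columns.get_loc. The variable names are only illustrative. Unlike .loc, an .iloc slice excludes its end position.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Jim', 'Dwight', 'Angela', 'Tobi'],
    'Age': [26, 28, 27, 32],
    'Department': ['Sales', 'Sales', 'Accounting', 'Human Resources']
})

# positional slice: columns 0 and 1 (the end position 2 is excluded)
first_two = df.iloc[:, 0:2]

# a negative position counts from the end; a single integer returns a Series
last_col = df.iloc[:, -1]

# look up positions from column names, then select by position
positions = [df.columns.get_loc(col) for col in ['Name', 'Department']]
by_position = df.iloc[:, positions]

print(first_two.columns.tolist())    # ['Name', 'Age']
print(last_col.name)                 # Department
print(positions)                     # [0, 2]
```

Selecting by position is handy when column names are unknown or repeated, but it silently picks different columns if the column order changes, so selecting by name is usually the safer default.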

Refer to this guide for more on indexing and selecting data in pandas.

With this, we come to the end of this tutorial. The code examples and results presented here were run in a Jupyter Notebook with a Python 3.8.3 kernel and pandas 1.0.5.



Author

  • Piyush Raj

    Piyush is a data professional passionate about using data to understand things better and make informed decisions. He has experience working as a Data Scientist in the consulting domain and holds an engineering degree from IIT Roorkee. His hobbies include watching cricket, reading, and working on side projects.
