How to access a Column in Pandas? - Data Science Parichay

In this tutorial, we will learn how to access a column in a Pandas dataframe and then work through several examples.

We can select a column in a Pandas dataframe by:

Column Name
Column Index

1. Accessing a column in a Pandas dataframe by column name/label:

In order to access a particular column by its name or label, we can use the pandas loc property.

Syntax: dataframeName.loc[:, "columnName"]

Here, ":" in the row position selects all rows, and "columnName" is the name (label) of the column we want to access.
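
As a minimal sketch of this syntax, the snippet below contrasts ":" with a label slice in the row position. The dataframe name scores and its values are made up purely for illustration. Note that label slices in loc include the end label.

```python
import pandas as pd

# hypothetical data, used only for this illustration
scores = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [70, 85, 90]})

# ":" in the row position selects every row of the "Marks" column
all_rows = scores.loc[:, "Marks"]

# a label slice selects only some rows; with loc, the end label (1) is included
first_two = scores.loc[0:1, "Marks"]

print(all_rows.tolist())
print(first_two.tolist())
```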

2. Accessing a column in a Pandas dataframe by column index:

In order to access a particular column by its index, we can use the pandas iloc property.

Syntax: dataframeName.iloc[:, columnIndex]

Here, ":" in the row position selects all rows, and columnIndex is the integer position of the column we want to access. Positions are zero-based, so the first column has index 0.
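
The following sketch illustrates positional access on a small dataframe. The name scores and its values are assumptions made for this example. It also shows that iloc accepts negative positions, which count from the last column.

```python
import pandas as pd

# hypothetical data, used only for this illustration
scores = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [70, 85, 90]})

# position 0 is the first column ("Name")
first_col = scores.iloc[:, 0]

# negative positions count from the end, so -1 is the last column ("Marks")
last_col = scores.iloc[:, -1]

print(first_col.name, last_col.name)
```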

Examples

We will now look at a few examples for a better understanding.

Before that, let’s create the pandas dataframe that we will use throughout this tutorial:

import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"]
}

# create pandas dataframe
df = pd.DataFrame(data)
# displays dataframe
df

Output:

[Image: pandas dataframe with employee information]

Example 1: Get a column by its name or label

Let’s access the “Age” column in the above dataframe using its name.

df.loc[:,"Age"]

Output:

0    26
1    28
2    27
3    32
Name: Age, dtype: int64

Alternatively, you can use the [] notation in pandas to access a dataframe column using its name.

df["Age"]

Output:

0    26
1    28
2    27
3    32
Name: Age, dtype: int64

We get the same result as above.
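
As a quick check, the sketch below rebuilds the same dataframe and compares the two approaches with Series.equals. It also shows one related detail that is not covered above: passing a list of names inside the brackets returns a dataframe rather than a Series.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

by_loc = df.loc[:, "Age"]
by_brackets = df["Age"]

# both approaches return the same Series
same = by_loc.equals(by_brackets)

# a list of column names returns a DataFrame with one column
as_frame = df[["Age"]]

print(same, type(by_brackets).__name__, type(as_frame).__name__)
```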

Example 2: Get a column by its index

Let’s now access the same “Age” column using its column index (which is 1).

df.iloc[:, 1]

Output:

0    26
1    28
2    27
3    32
Name: Age, dtype: int64
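
If only the column name is known, its position can be looked up rather than counted by hand. The sketch below assumes the column names are unique and uses Index.get_loc on df.columns to find the position of "Age" before passing it to iloc.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

# look up the integer position of the "Age" column (unique names assumed)
position = df.columns.get_loc("Age")

# use that position with iloc
by_position = df.iloc[:, position]

print(position)
print(by_position.tolist())
```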

You might be wondering what happens if we try to access a column that does not exist in our dataframe. Let’s look at two more examples:

Example 3

Here, we try to access a column by a name, “grade”, that is not present in the above dataframe.

df.loc[:,"grade"]

Output:

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/usr/local/lib/python3.7/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.7/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'grade'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

<ipython-input-7-6638a7078278> in <module>
----> 1 df.loc[:,"grade"]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1098     def _getitem_tuple(self, tup: tuple):
   1099         with suppress(IndexingError):
-> 1100             return self._getitem_lowerdim(tup)
   1101 
   1102         # no multi-index, so validate all of the indexers

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    836                 # We don't need to check for tuples here because those are
    837                 #  caught by the _is_nested_tuple_indexer check above.
--> 838                 section = self._getitem_axis(key, axis=i)
    839 
    840                 # We should never have a scalar section here, because

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1162         # fall thru to straight lookup
   1163         self._validate_key(key, axis)
-> 1164         return self._get_label(key, axis=axis)
   1165 
   1166     def _get_slice_axis(self, slice_obj: slice, axis: int):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _get_label(self, label, axis)
   1111     def _get_label(self, label, axis: int):
   1112         # GH#5667 this will fail if the label is not present in the axis.
-> 1113         return self.obj.xs(label, axis=axis)
   1114 
   1115     def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3759         if axis == 1:
   3760             if drop_level:
-> 3761                 return self[key]
   3762             index = self.columns
   3763         else:

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'grade'

As we can see, trying to access a column by a name that is not present in the dataframe raises a KeyError.
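
One way to avoid this error is to check for the column before accessing it. The sketch below is only one possible approach: it tests membership in df.columns, uses DataFrame.get (which returns None by default for a missing column), and catches the KeyError explicitly. The variable names are illustrative.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

# option 1: test whether the column exists
has_grade = "grade" in df.columns

# option 2: get() returns None instead of raising when the column is missing
grade = df.get("grade")

# option 3: catch the KeyError raised by loc
try:
    df.loc[:, "grade"]
    message = "column found"
except KeyError as err:
    message = f"column not found: {err}"

print(has_grade, grade, message)
```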

Example 4

Now, let’s try to access a column (by its index) that is not present in the above dataframe.

df.iloc[:, 7]

Output:

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-8-ba5e2fdae46d> in <module>
----> 1 df.iloc[:, 7]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1504     def _getitem_tuple(self, tup: tuple):
   1505 
-> 1506         self._has_valid_tuple(tup)
   1507         with suppress(IndexingError):
   1508             return self._getitem_lowerdim(tup)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    752         for i, k in enumerate(key):
    753             try:
--> 754                 self._validate_key(k, i)
    755             except ValueError as err:
    756                 raise ValueError(

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   1407             return
   1408         elif is_integer(key):
-> 1409             self._validate_integer(key, axis)
   1410         elif isinstance(key, tuple):
   1411             # a tuple should already have been caught by this point

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
   1498         len_axis = len(self.obj._get_axis(axis))
   1499         if key >= len_axis or key < -len_axis:
-> 1500             raise IndexError("single positional indexer is out-of-bounds")
   1501 
   1502     # -------------------------------------------------------------------

IndexError: single positional indexer is out-of-bounds

We can see that trying to access a column through an index that is out of bounds for the dataframe raises an IndexError.
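
A similar guard works for positional access. The sketch below compares the requested position against the number of columns, df.shape[1], before calling iloc, and also shows catching the IndexError directly. The variable names are illustrative.

```python
import pandas as pd

# same employee data as in the tutorial
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

n_cols = df.shape[1]  # number of columns in the dataframe
position = 7

# valid positions run from -n_cols to n_cols - 1
if -n_cols <= position < n_cols:
    column = df.iloc[:, position]
else:
    column = None

# alternatively, catch the IndexError raised by iloc
try:
    df.iloc[:, position]
    caught = False
except IndexError:
    caught = True

print(n_cols, column, caught)
```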

Summary:

In this tutorial, we looked at how to access a column in a Pandas dataframe. The following are the key takeaways:

You can use the pandas loc property to access a column using its name or label.
You can use the pandas iloc property to access a column using its index (column position).
Accessing a column not present in the dataframe raises an error: a KeyError with loc and an IndexError with iloc.

Author

Anushka Singh
