In this tutorial, we will learn how to access a column in a Pandas dataframe, with the help of several examples.
We can select a column in a Pandas dataframe by –
- Column Name
- Column Index
1. Accessing a column in a Pandas dataframe by Column name/label:
In order to access a particular column by its name or label, we can use the pandas loc property.
Syntax: dataframeName.loc[:, "columnName"]
Here, “:” specifies the rows we want to access (on its own, it means all rows), and “columnName” specifies the name of the column we want to access.
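The following minimal sketch shows the role of each part of the syntax. It reuses the employee data from the examples below. Replacing “:” with a label slice restricts the rows. Unlike ordinary Python slicing, a `loc` slice by label includes both endpoints.

```python
import pandas as pd

# Same employee data used in the examples below
df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

# ":" selects every row, "Age" selects the column by its label
all_ages = df.loc[:, "Age"]

# A label slice selects only some rows; loc includes both endpoints (0 and 1)
first_two_ages = df.loc[0:1, "Age"]

print(all_ages.tolist())        # [26, 28, 27, 32]
print(first_two_ages.tolist())  # [26, 28]
```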
2. Accessing a column in a Pandas dataframe by Column index:
In order to access a particular column by its index, we can use the pandas iloc property.
Syntax: dataframeName.iloc[:, columnIndex]
Here, “:” specifies the rows we want to access, and columnIndex, an integer, specifies the zero-based position of the column we want to access.
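Column positions start at 0 and follow the order of `df.columns`. If you know a column's name but need its position, `Index.get_loc` returns it. The sketch below assumes the same employee dataframe used in the examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

# Positions start at 0: Name -> 0, Age -> 1, Department -> 2
for position, name in enumerate(df.columns):
    print(position, name)

# Look up the position of a column from its label
age_position = df.columns.get_loc("Age")

# ":" selects all rows, age_position selects the column by position
ages = df.iloc[:, age_position]
print(ages.tolist())  # [26, 28, 27, 32]
```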
Examples
We will now look at a few examples for a better understanding.
But first, let’s create the pandas dataframe that we will use throughout this tutorial:
import pandas as pd

# employee data
data = {
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"]
}

# create pandas dataframe
df = pd.DataFrame(data)

# displays dataframe
df
Output:
     Name  Age  Department
0     Jim   26       Sales
1  Dwight   28       Sales
2  Angela   27  Accounting
3    Tobi   32          HR
Example 1: Get a column by its name or label
Let’s access the “Age” column in the above dataframe using its name.
df.loc[:,"Age"]
Output:
0    26
1    28
2    27
3    32
Name: Age, dtype: int64
Alternatively, you can use the [] notation in pandas to access a dataframe column using its name.
df["Age"]
Output:
0    26
1    28
2    27
3    32
Name: Age, dtype: int64
We get the same result as above.
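Both forms return a pandas Series with the same values and name. The quick check below rebuilds the employee dataframe so it runs on its own. It also shows that passing a list of labels, such as `df[["Age"]]`, returns a one-column DataFrame instead of a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

via_loc = df.loc[:, "Age"]
via_brackets = df["Age"]

# Both selections produce a Series named "Age"
print(type(via_loc).__name__, via_loc.name)  # Series Age

# .equals checks that both Series hold the same elements in the same positions
print(via_loc.equals(via_brackets))  # True

# A list of labels keeps the result two-dimensional
one_col_frame = df[["Age"]]
print(type(one_col_frame).__name__)  # DataFrame
```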
Example 2: Get a column by its index
Let’s now access the same “Age” column using its column index, which is 1 (positions start at 0, so “Name” is at position 0 and “Age” at position 1).
df.iloc[:, 1]
Output:
0    26
1    28
2    27
3    32
Name: Age, dtype: int64
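`iloc` also accepts negative positions, which count back from the last column. In this three-column dataframe, position -2 also refers to “Age”. Here is a short sketch that rebuilds the employee dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

by_position = df.iloc[:, 1]
# -1 is the last column (Department), so -2 is Age
by_negative_position = df.iloc[:, -2]

print(by_position.equals(df["Age"]))           # True
print(by_negative_position.equals(df["Age"]))  # True
```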
You might be wondering what happens if we try to access a column that does not exist in our dataframe. Let’s look at a few more examples.
Example 3
Here, we try to access a column by its name, “grade”, which is not present in the above dataframe.
df.loc[:,"grade"]
Output:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
...
KeyError: 'grade'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-6638a7078278> in <module>
----> 1 df.loc[:,"grade"]
...
KeyError: 'grade'
As we can see, accessing a column by a name that is not present in the dataframe raises a KeyError.
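If a column may be missing, you can check for it before selecting it, let pandas return a default, or catch the error. The sketch below uses three standard options: a membership test on `df.columns`, `DataFrame.get` (which returns `None`, or a default you pass, instead of raising), and a `try`/`except KeyError` block.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

# Option 1: check the column labels before selecting
grades = df.loc[:, "grade"] if "grade" in df.columns else None

# Option 2: DataFrame.get returns a default (None here) instead of raising KeyError
grades_or_none = df.get("grade")
ages = df.get("Age")  # an existing column is returned as a Series

# Option 3: catch the KeyError explicitly
try:
    df.loc[:, "grade"]
    missing = False
except KeyError:
    missing = True

print(grades, grades_or_none, missing)  # None None True
```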
Example 4
Now, let’s try to access a column (by its index) that is not present in the above dataframe.
df.iloc[:, 7]
Output:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-ba5e2fdae46d> in <module>
----> 1 df.iloc[:, 7]
...
IndexError: single positional indexer is out-of-bounds
We can see that accessing a column position that does not exist in the dataframe raises an IndexError.
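To guard against out-of-range positions, you can compare the position with the number of columns, `df.shape[1]`. Valid positions run from `-n` to `n - 1`, matching the bounds check in the traceback above. The helper `column_at` is a hypothetical name used only for illustration. It returns `None` instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "HR"],
})

def column_at(frame, position):
    """Hypothetical helper: return the column at `position`, or None if out of range."""
    n_cols = frame.shape[1]
    # iloc accepts positions from -n_cols to n_cols - 1
    if -n_cols <= position < n_cols:
        return frame.iloc[:, position]
    return None

print(column_at(df, 1).tolist())  # [26, 28, 27, 32]
print(column_at(df, 7))           # None
```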
Summary:
In this tutorial, we looked at how to access a column in a Pandas dataframe. The following are the key takeaways –
- You can use the pandas loc property, or the [] notation, to access a column using its name or label.
- You can use the pandas iloc property to access a column using its index (column position).
- Accessing a column that is not present in the dataframe raises an error: a KeyError for a missing name with loc, and an IndexError for an out-of-range position with iloc.