Python for Data Science – Data Structures

Data structures in Python let you store and access data efficiently. In this tutorial, we’ll cover the four basic inbuilt data structures in Python: lists, tuples, sets, and dictionaries. These inbuilt data structures are used not just by programmers but also by data science practitioners in their day-to-day tasks.

This is our fifth and final article in our series, Python for Data Science.

  • In the first article, we introduced how data science is changing the world and why Python is preferred by a majority of data science practitioners.
  • The second article explained some of the fundamental building blocks of programming in Python: expressions, variables, data types, operators, comments, input/output functions, etc.
  • In our third article, we looked at the flow of control in Python using constructs like conditionals and loops.
  • The fourth article covered the fundamentals of functions in Python: function definition, arguments, the scope of variables, the return statement, etc.

Contents:

  • Inbuilt Data Structures in Python
    • Lists
      • Creating a list
      • Accessing list items
      • Slicing a list
      • Adding elements to a list
      • Removing elements from a list
      • Concatenating lists
    • Tuples
      • Creating a tuple
      • Accessing and slicing tuple items
      • Concatenating tuples
      • Immutable but potentially changing
    • Set
      • Creating a set
      • Adding and removing elements from set
      • Set operations
    • Dictionary
      • Creating a dictionary
      • Updating a dictionary
  • Recommended Reading

While storing and accessing data efficiently is important, there’s no one-size-fits-all way of doing so. Different use cases may require the data to be stored differently. This is why Python offers four different inbuilt data structures (lists, tuples, sets, and dictionaries), each with its own utility and use cases.

Lists are used to store an ordered collection of items. These items can be any type of object, from numbers to strings or even another list. This makes lists one of the most versatile data structures in Python for storing a collection of objects.

Creating a list

In Python, a list is created by placing comma-separated items inside square brackets [].

# syntax for creating a list
# list_name = [item1, item2,..., itemn]
# example
ls = [1, 2, 3]
print(ls)

Output:

[1, 2, 3]
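As a small illustration of the point that list items can be of any type, the sketch below builds a list holding a number, a string, a float, and another list. The names mixed and inner_first are made up for this example.

```python
# a list can mix types and can even contain another list
mixed = [1, "two", 3.0, [4, 5]]

# the nested list counts as a single item of the outer list
print("Number of items:", len(mixed))

# the fourth item is itself a list
print("Type of the fourth item:", type(mixed[3]))

# index twice to reach an item inside the nested list
inner_first = mixed[3][0]
print("First item of the nested list:", inner_first)
```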

Accessing list items

Items in a list are ordered, meaning they are stored in a specific sequence and can be accessed by their index, which denotes their position in the list. The items of a list are indexed from 0 to n-1, where n is the length of the list. This is called positive indexing.

# indexing example
ls = ['a', 'b', 'c']
print("Item at the 0th index:", ls[0])
print("Item at the 1st index:", ls[1])
print("Item at the 2nd index:", ls[2])

Output:

Item at the 0th index: a
Item at the 1st index: b
Item at the 2nd index: c

Items in a list can also be accessed by a negative index. These indices refer to items from the end of the list. The negative index starts from -1. For example, ls[-1] will give the last item, ls[-2], the second last item, and so on.

# indexing example
ls = ['a', 'b', 'c']
print("Item at the -1 index:", ls[-1])
print("Item at the -2 index:", ls[-2])
print("Item at the -3 index:", ls[-3])

Output:

Item at the -1 index: c
Item at the -2 index: b
Item at the -3 index: a
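One way to relate the two indexing schemes: for a list of length n, ls[-k] refers to the same item as ls[n - k]. The sketch below checks this for every valid k and also shows that an index outside the valid range raises an IndexError. The variable out_of_range is only a helper for this example.

```python
ls = ['a', 'b', 'c']
n = len(ls)

# ls[-k] and ls[n - k] refer to the same item for k = 1 .. n
for k in range(1, n + 1):
    assert ls[-k] == ls[n - k]

# valid positive indices are 0 .. n-1, so index 3 is out of range here
try:
    ls[3]
    out_of_range = False
except IndexError:
    out_of_range = True

print("Index 3 is out of range:", out_of_range)
```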

Slicing a list

We can access individual items in a list through their index, but what if we want to select a range of items? In Python, this is done through slicing, which uses the : symbol. The syntax to slice a list is:

list_name[start_index:end_index]

The above syntax returns the sub-segment of the list starting at start_index and going up to, but not including, end_index. Example:

# slicing example
ls = ['India', 'USA', 'Canada', 'Australia', 'UK']
print("Slicing [1:3] gives", ls[1:3])
# If starting index is not provided, it's assumed to be 0
print("Slicing [:3] gives", ls[:3])
# If end index is not provided, it's assumed to be the list's length
print("Slicing [3:] gives", ls[3:])
# slicing using [:] gives the entire list
print("Slicing [:] gives", ls[:])

Output:

Slicing [1:3] gives ['USA', 'Canada']
Slicing [:3] gives ['India', 'USA', 'Canada']
Slicing [3:] gives ['Australia', 'UK']
Slicing [:] gives ['India', 'USA', 'Canada', 'Australia', 'UK']
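Two further properties of slicing are worth a quick sketch: a slice is a new list, so changing it leaves the original untouched, and slice bounds beyond the end of the list are clipped rather than raising an error. The names copy_of_ls and clipped, and the value 'Japan', are assumptions made for this example.

```python
ls = ['India', 'USA', 'Canada', 'Australia', 'UK']

# [:] builds a new list with the same items
copy_of_ls = ls[:]
copy_of_ls[0] = 'Japan'

# the original list is not affected by changes to the slice
print("Original first item:", ls[0])
print("Copy's first item:", copy_of_ls[0])

# an end index past the end of the list is clipped to the list's length
clipped = ls[3:100]
print("Slicing [3:100] gives", clipped)
```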

Updating list elements

Lists are mutable. You can add, remove, or update the elements in a list. This ability makes lists quite flexible when it comes to storing data. Example:

# update the list
ls = ['a', 'b', 'c']
# original list
print("Original list:", ls)
# change the second element to 'd'
ls[1] = 'd'
# updated list
print("Updated list:", ls)

Output:

Original list: ['a', 'b', 'c']
Updated list: ['a', 'd', 'c']

Adding an element to a list

Items can be added to a list using the append() or insert() method.

The append() method adds an element to the end of the list.
The insert() method inserts an element at a specific index inside the list. Example:

# adding elements to a list example
ls = ['India', 'USA', 'Canada', 'Australia', 'UK']
# append is used to add the element to the end of the list
ls.append("South Africa")
print(ls)
# insert is used to add the element at a specific index
ls.insert(1, "South Korea")
print(ls)

Output:

['India', 'USA', 'Canada', 'Australia', 'UK', 'South Africa']
['India', 'South Korea', 'USA', 'Canada', 'Australia', 'UK', 'South Africa']
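As a brief sketch of how insert() treats the index it is given: index 0 places the item at the front, and an index past the end of the list behaves like append() instead of raising an error. The index 100 is an arbitrary value chosen for this example.

```python
ls = ['India', 'USA']

# index 0 places the new item at the front of the list
ls.insert(0, 'Canada')

# an index past the end of the list adds the item at the end
ls.insert(100, 'UK')

print(ls)
```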

Removing elements from a list

Items can be removed from a list using the remove() or pop() method.

The remove() method removes the first occurrence of the given value from the list.
The pop() method removes the element at the specified index.
Note: If you don’t pass an index to pop(), the last element of the list is removed (or popped out). Example:

# removing elements from a list example
ls = ['India', 'USA', 'Canada', 'Australia', 'UK']
# remove the element based on the value passed
ls.remove("Australia")
print(ls)
# remove the element based on the index passed
ls.pop(1)
print(ls)
# remove the last element from the list
ls.pop()
print(ls)

Output:

['India', 'USA', 'Canada', 'UK']
['India', 'Canada', 'UK']
['India', 'Canada']
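The two methods also differ in what they give back. The sketch below shows that pop() returns the element it removed, while remove() raises a ValueError when the value is not in the list. The value 'Brazil' and the names popped and removed are assumptions made for this example.

```python
ls = ['India', 'USA', 'Canada', 'UK']

# pop() returns the element it removed, so it can be stored and reused
popped = ls.pop(1)
print("Popped item:", popped)
print("List after pop:", ls)

# remove() raises ValueError if the value is not present in the list
try:
    ls.remove('Brazil')
    removed = True
except ValueError:
    removed = False

print("Was 'Brazil' removed?", removed)
```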

Concatenating lists

We can concatenate lists using the + operator. Just as using the + operator on strings concatenates them, using it on lists results in a combined list. Example:

# concatenate lists
ls1 = [1, 2, 3]
ls2 = [4, 5]
ls3 = ls1 + ls2
print(ls3)

Output:

[1, 2, 3, 4, 5]
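Note that + builds a new list and leaves its operands as they were. A minimal sketch of this, reusing the names from the example above:

```python
ls1 = [1, 2, 3]
ls2 = [4, 5]

# + creates a new list object
ls3 = ls1 + ls2

# changing the combined list does not affect the lists it was built from
ls3.append(6)

print("ls1:", ls1)
print("ls2:", ls2)
print("ls3:", ls3)
```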

Tuples are similar to lists in that they’re used to store ordered data. The key difference is that tuples are immutable, meaning that once a tuple is created, its items cannot be added, removed, or replaced.

Creating a tuple

A tuple can be created by placing comma-separated items inside parentheses (). Example:

# syntax for creating a tuple
# tuple_name = (item1, item2,..., itemn)
# example
tup = (1, 2, 3)
print(tup)

Output:

(1, 2, 3)
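To see immutability in practice, the sketch below tries to replace an item of a tuple and catches the resulting TypeError. It also checks that tuples have no append() method. The names error_message and has_append are helpers made up for this example.

```python
tup = (1, 2, 3)

# assigning to an index of a tuple raises TypeError
try:
    tup[0] = 10
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print("Tuple after the attempted update:", tup)
print("Error raised:", error_message)

# tuples do not provide list-style methods such as append()
has_append = hasattr(tup, 'append')
print("Tuple has an append method:", has_append)
```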

Accessing and slicing tuple items

Like lists, tuple items can be accessed and sliced using their indices enclosed in square brackets []. Example:

# tuple accessing and slicing example
tup = ('India', 'USA', 'Canada', 'Australia', 'UK')
# accessing tuple elements
print("Item at 0 index:", tup[0])
print("Item at 2 index:", tup[2])
# negative indexing
print("Item at -1 index:", tup[-1])
# slicing tuple
print("Slicing [1:3] gives", tup[1:3])
# if starting index is not provided, it's assumed to be 0
print("Slicing [:3] gives", tup[:3])
# if end index is not provided, it's assumed to be the tuple's length
print("Slicing [3:] gives", tup[3:])
# slicing using [:] gives the entire tuple
print("Slicing [:] gives", tup[:])

Output:

Item at 0 index: India
Item at 2 index: Canada
Item at -1 index: UK
Slicing [1:3] gives ('USA', 'Canada')
Slicing [:3] gives ('India', 'USA', 'Canada')
Slicing [3:] gives ('Australia', 'UK')
Slicing [:] gives ('India', 'USA', 'Canada', 'Australia', 'UK')

Concatenating tuples

Like lists, tuples can be concatenated using the + operator. Example:

# concatenate tuples
tup1 = (1, 2, 3)
tup2 = (4, 5)
tup3 = tup1 + tup2
print(tup3)

Output:

(1, 2, 3, 4, 5)

Immutable but potentially changing

Since tuples are immutable we cannot add or remove elements from a tuple. But, what if a tuple contains a mutable item, for example, a list? Can we change the entries in the list?

This is an interesting question. A tuple, by definition, is a collection of objects. This collection is immutable, that is, it cannot be changed. But, if it contains a mutable element, for example, a list, it can potentially change.

This happens because a tuple, like other data structures, stores references to its items rather than the items themselves. Replacing an item of the tuple (for example, swapping an integer for another value) would mean changing a stored reference, which is not allowed. Modifying a mutable item in place (for example, a list) does not change the reference to that list, so the tuple, which is essentially an immutable collection of such references, remains unchanged. The example below depicts this behaviour:

# tuple with a mutable object
tup = ('red', 'blue', [1,2,3])
print("The original tuple:", tup)
# print the memory location of the list
print("Memory location of tuple's third item(the list):", id(tup[2]))
# updating the list inside the tuple
tup[2][0] = 7
# tuple after updating the list
print("The updated tuple:", tup)
# print the memory location of the list
print("Memory location of tuple's third item(the list):", id(tup[2]))

Output:

The original tuple: ('red', 'blue', [1, 2, 3])
Memory location of tuple's third item(the list): 140065732921856
The updated tuple: ('red', 'blue', [7, 2, 3])
Memory location of tuple's third item(the list): 140065732921856
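The distinction can be sketched side by side: replacing the list stored in the tuple fails, while modifying that same list in place succeeds. The names inner and rebound are assumptions made for this example.

```python
tup = ('red', 'blue', [1, 2, 3])
inner = tup[2]

# replacing the reference stored in the tuple is not allowed
try:
    tup[2] = [7, 8, 9]
    rebound = True
except TypeError:
    rebound = False

# changing the list in place keeps the same reference, so it works
inner.append(4)

print("Could the item be replaced?", rebound)
print("Tuple after the in-place change:", tup)
print("Tuple still holds the same list object:", tup[2] is inner)
```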

For more on this behaviour of tuples, refer to this article.

Sets are used to store an unordered collection of unique elements. Because there is no inherent ordering of elements inside a set, you cannot use sets for storing sequences, and they do not support indexing, slicing, or other sequence-like behaviour. Sets are also mutable.

Creating a set

A set can be created by placing comma-separated items inside curly braces {}. Example:

# set example
sample_set = {'red', 'red', 'blue', 'green'}
# print the set
print(sample_set)

Output:

{'red', 'green', 'blue'}

As you can see in the above example, the duplicate value for ‘red’ was not considered in the set.
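Because sets keep only unique values, a common use is removing duplicates from a list with the inbuilt set() function. The sketch below shows this, along with one caveat: empty curly braces create a dictionary, so an empty set has to be created with set(). The variable names are made up for this example.

```python
colours = ['red', 'red', 'blue', 'green', 'blue']

# set() keeps a single copy of each distinct value
unique_colours = set(colours)
print("Items in the list:", len(colours))
print("Distinct items:", len(unique_colours))

# {} creates an empty dictionary, not an empty set
empty_braces = {}
empty_set = set()
print("Type of {}:", type(empty_braces))
print("Type of set():", type(empty_set))
```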

Adding and removing elements from set

Sets are mutable and thus we can add and remove elements from a set.

To add an element to a set, we use the add() function. Example:

# add element to set example
sample_set = {'red', 'blue', 'green'}
# print the set
print("Original set:", sample_set)
# add 'yellow' to the set
sample_set.add('yellow')
# print the updated set
print("Updated set:", sample_set)

Output:

Original set: {'red', 'green', 'blue'}
Updated set: {'red', 'green', 'blue', 'yellow'}

To remove an element from a set we can use remove() or discard() methods:
1. remove(): This method gives an error if the element is not present in the set.
2. discard(): This method does not give an error if the element is not present in the set.

Example:

# remove element from set example
sample_set = {'red', 'blue', 'green', 'yellow'}
# print the set
print("Original set:", sample_set)
# use the remove() function
sample_set.remove('yellow')
# print the updated set
print("Updated set after removing yellow:", sample_set)
# use the discard() function
sample_set.discard('blue')
# print the updated set
print("Updated set after removing blue:", sample_set)

Output:

Original set: {'red', 'green', 'blue', 'yellow'}
Updated set after removing yellow: {'red', 'green', 'blue'}
Updated set after removing blue: {'red', 'green'}
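The example above removes elements that are present in the set. The difference between the two methods only shows up when the element is missing, as in this sketch, where 'purple' is a value chosen because it is not in the set.

```python
sample_set = {'red', 'green'}

# discard() does nothing when the element is missing
sample_set.discard('purple')

# remove() raises KeyError when the element is missing
try:
    sample_set.remove('purple')
    raised = False
except KeyError:
    raised = True

print("remove() raised an error:", raised)
print("Number of items left in the set:", len(sample_set))
```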

Set Operations

Python sets are similar to sets in mathematics, and many mathematical set operations such as union, intersection, and difference can be performed on them. Example:

# common set operations
a = {1, 2, 3}
b = {2, 3, 4}
# print the sets
print("Set a:", a)
print("Set b:", b)
# union operation
print("Union of sets a and b:", a.union(b))
# intersection operation
print("Intersection of sets a and b:", a.intersection(b))
# difference operation
print("Elements of set a not in b (a-b):", a.difference(b))
print("Elements of set b not in a (b-a):", b.difference(a))
# symmetric difference operation
print("Elements present in either set a or set b but not in both:", a.symmetric_difference(b))
print("Elements present in either set a or set b but not in both:", b.symmetric_difference(a))

Output:

Set a: {1, 2, 3}
Set b: {2, 3, 4}
Union of sets a and b: {1, 2, 3, 4}
Intersection of sets a and b: {2, 3}
Elements of set a not in b (a-b): {1}
Elements of set b not in a (b-a): {4}
Elements present in either set a or set b but not in both: {1, 4}
Elements present in either set a or set b but not in both: {1, 4}
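The same operations are also available as operators on sets: | for union, & for intersection, - for difference, and ^ for symmetric difference. A short sketch comparing them with the method calls used above (the result names are made up for this example):

```python
a = {1, 2, 3}
b = {2, 3, 4}

union_ab = a | b          # same result as a.union(b)
intersection_ab = a & b   # same result as a.intersection(b)
difference_ab = a - b     # same result as a.difference(b)
symmetric_ab = a ^ b      # same result as a.symmetric_difference(b)

# sorted() is used only to print the elements in a stable order
print("a | b:", sorted(union_ab))
print("a & b:", sorted(intersection_ab))
print("a - b:", sorted(difference_ab))
print("a ^ b:", sorted(symmetric_ab))
```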

Dictionaries are used to store key-to-value mappings in Python. Unlike sequences (for example, lists and tuples), which are indexed by a range of numbers, dictionaries are indexed by keys. Dictionaries are mutable, but their keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable items.

Creating a dictionary

A dictionary in Python can be created by placing comma-separated key:value pairs inside curly braces {}. The value corresponding to a key can be accessed by placing the key inside square brackets []. Example:

# creating a sample dictionary
sample_dict = {'USA': 'Washington', 'India': 'New Delhi', 'UK':'London'}
# print the dictionary
print("Sample dictionary:", sample_dict)
# print the value corresponding to the key India
print("Value corresponding to India:", sample_dict['India'])

Output:

Sample dictionary: {'USA': 'Washington', 'India': 'New Delhi', 'UK': 'London'}
Value corresponding to India: New Delhi
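The restriction on key types mentioned earlier can be sketched as follows: a tuple works as a dictionary key, while a list is rejected with a TypeError. The keys and the names tuple_keyed and list_key_ok are hypothetical and used only for illustration.

```python
# a tuple of strings is immutable, so it can be used as a key
tuple_keyed = {('USA', 'capital'): 'Washington'}
print("Lookup with a tuple key:", tuple_keyed[('USA', 'capital')])

# a list is mutable, so using it as a key raises TypeError
try:
    list_keyed = {['USA', 'capital']: 'Washington'}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print("Can a list be used as a key?", list_key_ok)
```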

Updating a dictionary

Dictionaries are mutable and hence we can update the dictionary by adding new key:value pairs, removing existing key:value pairs, or changing the value corresponding to a key.

The example below shows how to update the value of an existing key and also add a new key:value pair to the dictionary.

sample_dict = {'USA': 'Washington', 'India': 'New Delhi', 'UK':'London'}
# print the dictionary
print("Original dictionary:", sample_dict)
# update the value corresponding to the key 'USA'
sample_dict['USA'] = 'New York'
# print the updated dictionary
print("Updated dictionary:", sample_dict)
# add another key:value pair
sample_dict['France'] = 'Paris'
# print the updated dictionary
print("Updated dictionary:", sample_dict)

Output:

Original dictionary: {'USA': 'Washington', 'India': 'New Delhi', 'UK': 'London'}
Updated dictionary: {'USA': 'New York', 'India': 'New Delhi', 'UK': 'London'}
Updated dictionary: {'USA': 'New York', 'India': 'New Delhi', 'UK': 'London', 'France': 'Paris'}

To remove a key:value pair from a dictionary, we can use the pop() function which returns the value that has been removed. Example:

sample_dict = {'USA': 'Washington', 'India': 'New Delhi', 'UK':'London'}
# print the dictionary
print("Original dictionary:", sample_dict)
# remove the key UK
sample_dict.pop("UK")
# print the updated dictionary
print("Updated dictionary:", sample_dict)

Output:

Original dictionary: {'USA': 'Washington', 'India': 'New Delhi', 'UK': 'London'}
Updated dictionary: {'USA': 'Washington', 'India': 'New Delhi'}
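The example above discards the value returned by pop(). The sketch below stores it, and also shows what happens when the key is missing: pop() raises a KeyError unless a default value is passed as the second argument. The key 'France' and the variable names are assumptions made for this example.

```python
sample_dict = {'USA': 'Washington', 'India': 'New Delhi', 'UK': 'London'}

# pop() returns the value of the key it removed
removed_value = sample_dict.pop('UK')
print("Removed value:", removed_value)

# with a default, pop() returns the default when the key is missing
missing_value = sample_dict.pop('France', None)
print("Value for a missing key with a default:", missing_value)

# without a default, a missing key raises KeyError
try:
    sample_dict.pop('France')
    raised = False
except KeyError:
    raised = True

print("pop() without a default raised an error:", raised)
```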

In this tutorial, we looked at the inbuilt data structures in Python (lists, tuples, sets, and dictionaries), their characteristics, and how each of them differs when it comes to storing data. If you’d like to dive deeper, we recommend the following openly available resource to supplement the topics covered:

  • Chapters 4 and 5 of the book Automate the Boring Stuff with Python. This book is designed around an implementation-first approach. The online version of the book is freely available.


With this, we come to the end of our Python for Data Science series. In the series, we covered some of the fundamentals of the Python programming language which are essential for Data Science. Stay curious and happy learning!