Python for Data Science: Introduction

Hi, welcome to Python for Data Science, a short series of articles to help beginners in Data Science learn the fundamentals of the Python programming language with a focus on Data Science use cases. There are no prerequisites; the series aims to introduce learners in Data Science to Python in as simple and no-nonsense a way as possible.

This is the first article in the series. By the end of this article, you’ll have an idea about how data-driven technologies are changing our world, the current state and future prospects of data science, why Python could be a good choice for data science, and how to install Python on your local machine.

Table of contents

  • Data, the most valuable resource
  • Use cases of Data Science
  • The Data Science Job Market
  • Why Python for Data Science?
  • Python Introduction
  • Install Python

Data, the most valuable resource

Technologies powered by data are changing the world we live in. Fast. Whether it’s recommending which videos to watch or helping identify cancer, machine learning algorithms feeding on data have become so good that in some areas they’re challenging or even outperforming humans.

GPT-3, a language model developed by OpenAI, is the talk of the town. Trained on a dataset drawn from around 45 terabytes of raw text, it has shown shockingly good results on a number of tasks, such as generating code and writing remarkably good content.

A JavaScript code generator built on top of GPT-3

It’s no wonder that many now consider data a more valuable resource than oil. The shift towards data as a goldmine of information is even more evident from the fact that some of the most valuable companies are technology companies. At the time of writing, the only four US companies to have ever hit $1 trillion in market capitalization are all technology companies: Apple, Microsoft, Amazon, and Alphabet (Google’s parent company).

Data science is leading the way to better understand the mammoth amounts of data at our disposal. Not only are businesses using it to make better decisions, but it’s also transforming the way consumers and individuals interact in the connected world.

The FAANG companies (Facebook, Apple, Amazon, Netflix, and Google)

Use cases of Data Science

The applications of data science are numerous. Facebook and Google use it to target consumers with more personalized ads. Streaming services like Netflix and Spotify use it to give better recommendations. Banks and credit card companies use it to screen candidates for loans. The list goes on and on. And this is what makes data science really exciting: the wide range of problems that can be solved using the same principles and fundamentals.

The Data Science Job Market

Glassdoor has consistently ranked Data Scientist among the top 3 jobs in the United States, with a median salary of over $100,000. According to LinkedIn’s Emerging Jobs Report 2020, Data Science as a specialty continues to grow significantly across all industries, with an expected annual growth rate of ~37%.

As companies become more efficient in pursuing their data goals, the traditional jack-of-all-trades role of a Data Scientist is being split into specialized roles. Roles like Data Engineer and Machine Learning Engineer are in high demand as well.

Why Python for Data Science?

Data Science is the practice of extracting meaningful and actionable insights from data. Python, R, Jupyter, Spyder, etc. are just tools that enable us to do that, and it’s important to understand that the data science fundamentals do not change with the choice of tools. Having said that, if you’re a beginner with no preference for a specific language, we’d recommend Python, mainly because of its gentler learning curve. It also has the support of a number of open-source libraries like NumPy, pandas, SciPy, and scikit-learn for accomplishing almost any data-related task.

Python is quite popular and has strong online community support. This makes it easy to find help and solutions to bugs online. According to the 2019 Kaggle ML and DS survey, the majority of Kaggle practitioners use Python.

Python vs. R: Kaggle users’ preference

Even as a standalone programming language, Python is more popular than ever.

Python Introduction

Python is an interpreted, general-purpose, object-oriented programming language. It’s interpreted, meaning that you do not need to compile the entire program before running it, as you would in C++ or Java. In an interactive Python session, you can execute code line by line. It’s designed for code readability and has a gentler learning curve than many other languages. Its formatting is visually uncluttered, and it often uses English keywords where other languages use punctuation.

In Python, printing a statement is as simple as:

    print("Hello World")
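Going one step beyond a single print statement, here is a minimal sketch of the English-keyword style mentioned above. The `scores` list and the `summarize` helper are made up for this illustration; they are not part of Python or of any library.

```python
# A hypothetical list of exam scores, made up for this illustration.
scores = [72, 88, 95, 61]


def summarize(values):
    """Return the count, highest value, and average of a list of numbers."""
    return len(values), max(values), sum(values) / len(values)


count, highest, average = summarize(scores)

# Conditions read close to plain English: `in`, `and`, `not`.
if 95 in scores and not average < 50:
    print("Someone scored 95, and the class average is at least 50.")

print("Scores:", count, "Highest:", highest, "Average:", average)
```

Where many languages would write `&&` and `!`, Python uses the words `and` and `not`, and indentation rather than braces marks which lines belong to the `if` block.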

Over the course of the next few articles in this series, we’ll cover some of the most fundamental and important elements of Python. The articles will cater specifically to the Data Science use cases of Python. With this series, we aim to help you start your Data Science journey and become a better practitioner.

Install Python

We recommend installing Anaconda, which not only installs Python for you but also installs other important tools like Jupyter, Spyder, and RStudio. The installation process is relatively simple.

Anaconda Individual Edition installer options
  • Install Anaconda from the downloaded installer. During the installation process, choose how you’d like Anaconda to be installed. For more details, you can follow this guide or use the default configuration if you’re not sure.
  • After a successful installation, you’ll be able to open the Anaconda Navigator. A simple way to test the installation is to launch a Jupyter Notebook from the Anaconda Navigator and run a simple Python command, like print("Hello World").
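If you’d like a slightly more informative check than printing a greeting, the sketch below also reports which Python interpreter is running. It uses only the standard library; the function name `describe_python` is our own choice for this example and is not something Anaconda provides.

```python
import platform
import sys


def describe_python():
    """Return a short description of the running Python interpreter."""
    version = platform.python_version()  # version string such as "3.x.y"
    return f"Python {version} running from {sys.executable}"


print("Hello World")
print(describe_python())
```

Run in a Jupyter Notebook cell, the path in the second line of output should typically point inside your Anaconda installation folder, which suggests that the notebook is using the Python that Anaconda installed.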

That’s it. After successfully completing the above steps, you’ll have all the tools necessary to run Python, be it directly in the command prompt or via a simple application like Jupyter Notebook.

In the next article in this Python for Data Science series, we’ll cover some of the fundamental building blocks of Python. Stay curious and keep learning!

