Pandas dataframes are great for handling two-dimensional tabular data. You may sometimes need to randomly select a subset of rows from a dataframe. In this tutorial, we’ll look at how to get a random sample of rows from a pandas dataframe.

## The `sample()` function

The pandas dataframe `sample()` function can be used to randomly sample rows from a pandas dataframe. It can sample rows based on a count or a fraction, and it can optionally sample rows with replacement. The following is its syntax:

`df_subset = df.sample(n=num_rows)`

Here, `df` is the dataframe from which you want to sample rows. The parameter `n` sets the number of rows to sample; if neither `n` nor `frac` is given, one row is returned. To sample a fraction of the rows instead of a fixed count, use the `frac` parameter. The two parameters cannot be combined in a single call.
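As a minimal sketch of the default behavior and of the fact that `n` and `frac` are alternatives, the snippet below uses a small illustrative dataframe named `df_demo` (a hypothetical name and data, not part of this tutorial's examples).

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df_demo = pd.DataFrame({"x": range(10)})

# With neither n nor frac given, a single row is returned
one_row = df_demo.sample()
print(len(one_row))  # 1

# n and frac are alternatives; passing both raises a ValueError
try:
    df_demo.sample(n=2, frac=0.5)
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)  # True
```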

Note: Pass a fixed value to the `random_state` parameter to get reproducible results.
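To illustrate the note, here is a short sketch that draws two samples with the same seed and checks that they match. The dataframe `df_demo` and the seed value `42` are arbitrary choices made for illustration.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data)
df_demo = pd.DataFrame({"x": range(10)})

# Two calls with the same seed
first = df_demo.sample(n=3, random_state=42)
second = df_demo.sample(n=3, random_state=42)

# Same seed, same rows in the same order
print(first.equals(second))  # True
```

Without `random_state`, repeated calls will generally return different rows.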

## Examples

First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.

```
import pandas as pd

# Sample data: company names, ticker symbols, and share counts
data = {
    'Name': ['Microsoft Corporation', 'Google, LLC', 'Tesla, Inc.',
             'Apple Inc.', 'Netflix, Inc.'],
    'Symbol': ['MSFT', 'GOOG', 'TSLA', 'AAPL', 'NFLX'],
    'Shares': [100, 50, 150, 200, 80]
}
df = pd.DataFrame(data)
df
```

Now, let’s look at some of the different use cases of sampling rows from a dataframe with the pandas dataframe `sample()` function.

### 1. Sample rows based on count

To randomly sample a fixed number of rows from a dataframe, pass the number of rows to sample to the `n` parameter of the `sample()` function.

```
df_sub = df.sample(n=2, random_state=2)
print(df_sub)
```

Output:

```
            Name  Symbol  Shares
2    Tesla, Inc.    TSLA     150
4  Netflix, Inc.    NFLX      80
```

In the above example, we randomly sample two rows from the dataframe `df`.
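Note that the sampled rows in the output keep their original index labels (`2` and `4`). If you would rather have the result numbered from zero, one option is to reset the index afterwards. The sketch below rebuilds a reduced version of the dataframe so that it runs on its own; `df_sub_reset` is a name introduced here for illustration.

```python
import pandas as pd

# Reduced copy of the tutorial dataframe so the snippet is self-contained
df = pd.DataFrame({
    "Symbol": ["MSFT", "GOOG", "TSLA", "AAPL", "NFLX"],
    "Shares": [100, 50, 150, 200, 80],
})

df_sub = df.sample(n=2, random_state=2)

# The sampled rows carry index labels from the original dataframe
print(df_sub.index.isin(df.index).all())  # True

# Renumber from 0 if the original labels are not needed
df_sub_reset = df_sub.reset_index(drop=True)
print(list(df_sub_reset.index))  # [0, 1]
```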

### 2. Sample rows based on fraction

If you want to sample rows based on a fraction instead of a count, for example, 40% of all the rows, you can use the `frac` parameter.

```
df_sub = df.sample(frac=0.4, random_state=2)
print(df_sub)
```

Output:

```
            Name  Symbol  Shares
2    Tesla, Inc.    TSLA     150
4  Netflix, Inc.    NFLX      80
```

In the above example, we sample 40% of the rows of the dataframe `df` by passing the fraction `0.4` to the `frac` parameter. Since `df` has five rows, this returns two rows.
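The relationship between `frac` and the number of returned rows can be checked directly. The sketch below uses a one-column stand-in for the tutorial dataframe and also shows a common use of `frac=1`, which returns every row once in random order (in effect, a shuffle). The variable names `two_rows` and `shuffled` are illustrative.

```python
import pandas as pd

# One-column stand-in for the five-row tutorial dataframe
df = pd.DataFrame({"Shares": [100, 50, 150, 200, 80]})

# 40% of 5 rows is 2 rows
two_rows = df.sample(frac=0.4, random_state=2)
print(len(two_rows))  # 2

# frac=1 returns every row exactly once, in random order
shuffled = df.sample(frac=1, random_state=2)
print(sorted(shuffled.index) == list(df.index))  # True
```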

### 3. Sample rows with replacement

The pandas dataframe `sample()` function also lets you sample rows with replacement, meaning the same row can be sampled more than once. To enable sampling with replacement, pass `replace=True` to the `sample()` function.

Note that the default behavior of the `sample()` function is to sample without replacement. That is, the parameter `replace` is `False` by default.

```
df_sub = df.sample(n=3, replace=True, random_state=2)
print(df_sub)
```

Output:

```
                    Name  Symbol  Shares
0  Microsoft Corporation    MSFT     100
0  Microsoft Corporation    MSFT     100
3             Apple Inc.    AAPL     200
```

In the above example, you can see that the row with index `0` of the dataframe `df` is sampled twice. This is possible because the rows were sampled with replacement.
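One practical consequence is that sampling with replacement lets you request more rows than the dataframe contains, whereas the default (without replacement) raises a `ValueError` in that case. The following is a minimal sketch using a one-column stand-in for the tutorial dataframe; the request for 8 rows is an arbitrary choice.

```python
import pandas as pd

# One-column stand-in for the five-row tutorial dataframe
df = pd.DataFrame({"Shares": [100, 50, 150, 200, 80]})

# Without replacement, asking for more rows than exist fails
try:
    df.sample(n=8)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With replacement, the same request succeeds
bigger = df.sample(n=8, replace=True, random_state=2)
print(len(bigger))  # 8

# Eight draws from five rows must repeat at least one index label
print(bigger.index.is_unique)  # False
```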

For more on the pandas dataframe `sample()` function, refer to its official documentation.

With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were produced in a Jupyter Notebook with a Python (version 3.8.3) kernel and pandas version 1.0.5.
