It can be very handy to know how to programmatically get a list of all files in a folder. For example, you might have a folder full of text files containing useful data that you want to collate into a dataset, or you might just want to find out whether a given file exists in a folder. In this tutorial, we will look at how to get a list of all the files in a folder using Python.
How to get a list of files in a directory?
There are a number of ways to get a list of all files in a directory using Python. You can use the os module’s os.listdir() function or the glob module’s glob.glob() function to list the contents of a directory.
Let’s demonstrate each of these methods with the help of some examples. First, let’s look at the structure of the directory we will use in this tutorial.
The “weather” directory contains one Python script, one requirements text file, one README markdown file, and a directory named “data”, which stores the data for the project.
The os module in Python comes with a number of handy functions for file handling. To list the contents of a directory, you can use the os.listdir() function. It returns a list containing the names of all files and directories in a directory, in arbitrary order.
For example, let’s use it to get the list of contents of the current working directory, which is the “weather” directory described above.
import os

print(os.listdir())
['data', 'README.md', 'requirements.txt', 'train.py']
You can see we get all the files and directories in the current working directory. You can, however, pass a custom directory path to list out its contents instead. For example, let’s list out the contents of the “data” directory present inside the current working directory.
import os

print(os.listdir('./data'))
['chennai.txt', 'data sources', 'delhi.txt', 'kolkata.txt', 'mumbai.txt', 'test_set.csv', 'train_set.csv']
We get a list of all files and folders present in the “data” directory. In this example, we passed a relative path but you can also pass an absolute path and get its contents as well.
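As a minimal sketch of the absolute-path case, the snippet below resolves a path with os.path.abspath() before listing it, and sorts the result because os.listdir() makes no promise about order. The helper name list_dir_sorted and the temporary directory used for the demo are assumptions made for illustration; they are not part of the “weather” project.

```python
import os
import tempfile


def list_dir_sorted(path):
    """Return the entry names of `path` in sorted order.

    The path is converted to an absolute path first; os.listdir()
    accepts relative and absolute paths alike.
    """
    return sorted(os.listdir(os.path.abspath(path)))


# Demo on a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("delhi.txt", "chennai.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "data sources"))

    entries = list_dir_sorted(tmp)
    print(entries)
```

Sorting is optional, but it makes the output reproducible across operating systems.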
If you only want to get a list of files and not the directories, you can use the os.path.isfile() function, which checks whether a given path is a file or not. For example, let’s list only the files (and not the directories) inside the “data” directory.
import os
from os.path import isfile, join

# set the base path
base_path = './data'
file_ls = [f for f in os.listdir(base_path) if isfile(join(base_path, f))]
print(file_ls)
['chennai.txt', 'delhi.txt', 'kolkata.txt', 'mumbai.txt', 'test_set.csv', 'train_set.csv']
You can see that we only get the files and not the directories present inside the “data” folder.
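The same filtering idea can be taken a step further. The sketch below separates files from directories in one pass and checks whether a given file exists in a folder. The helper names split_entries and has_file are hypothetical, and the demo runs on a temporary directory rather than the “data” folder.

```python
import os
import tempfile


def split_entries(base_path):
    """Return (files, dirs): sorted names of files and directories in base_path."""
    files, dirs = [], []
    for name in sorted(os.listdir(base_path)):
        full_path = os.path.join(base_path, name)
        if os.path.isfile(full_path):
            files.append(name)
        elif os.path.isdir(full_path):
            dirs.append(name)
    return files, dirs


def has_file(base_path, name):
    """Return True if `name` exists in base_path and is a regular file."""
    return os.path.isfile(os.path.join(base_path, name))


# Demo on a throwaway directory (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "chennai.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "data sources"))

    files, dirs = split_entries(tmp)
    found = has_file(tmp, "chennai.txt")
    missing = has_file(tmp, "missing.txt")
    print(files, dirs, found, missing)
```

Note that os.path.isfile() needs the full path, not just the name returned by os.listdir(), which is why the base path is joined back on before each check.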
For more on the os module in Python, refer to its documentation.
You can also use the glob module to get a list of files in a directory. Let’s use it to list the files in our current directory.
import glob

print(glob.glob("*"))
['data', 'README.md', 'requirements.txt', 'train.py']
You can see that we get all the files and directories in the current working directory. Note that we passed “*” as the pattern to the glob.glob() function, which matches every file and folder in the given directory except those whose names begin with a dot.
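One difference between the two functions is worth seeing in practice: os.listdir() returns names that start with a dot, while the “*” pattern in glob.glob() skips them. The following sketch compares the two on a temporary directory. The helper name compare_listings and the file names are made up for illustration.

```python
import glob
import os
import tempfile


def compare_listings(folder):
    """Return (names from os.listdir, names matched by glob '*'), both sorted."""
    from_listdir = sorted(os.listdir(folder))
    # glob.escape() protects any special characters in the folder path itself
    pattern = os.path.join(glob.escape(folder), "*")
    from_glob = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return from_listdir, from_glob


# Demo on a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    for name in (".env", "train.py"):
        open(os.path.join(tmp, name), "w").close()

    listed, globbed = compare_listings(tmp)
    print(listed)   # includes the dot-file
    print(globbed)  # dot-file is not matched by "*"
```

If you need dot-files from glob as well, a second pattern such as ".*" can be matched separately.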
You can also specify the types of files you want to get from a path. For example, to get only the text files from the “data” folder in our current working directory:
import glob

print(glob.glob("data/*.txt"))
['data\\chennai.txt', 'data\\delhi.txt', 'data\\kolkata.txt', 'data\\mumbai.txt']
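We get a list of only the text files present in the “data” folder. Note that the above result was obtained on a Windows machine, hence the backslash separators (shown escaped as “\\” in the printed list). On Linux or macOS, the same call would return paths such as 'data/chennai.txt'.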
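Going back to the use case from the introduction, here is a rough sketch of how the matched text files could be collated into a single structure. It assumes UTF-8 text files and uses a hypothetical helper named collate_text_files; the file names and contents in the demo are invented, not the actual weather data.

```python
import glob
import os
import tempfile


def collate_text_files(folder):
    """Return a dict mapping each .txt file name in `folder` to its contents."""
    pattern = os.path.join(glob.escape(folder), "*.txt")
    dataset = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            # os.path.basename() strips the folder part on any operating system
            dataset[os.path.basename(path)] = fh.read()
    return dataset


# Demo on a throwaway directory (hypothetical names and contents)
with tempfile.TemporaryDirectory() as tmp:
    samples = {"chennai.txt": "32C", "delhi.txt": "28C", "test_set.csv": "a,b"}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(text)

    dataset = collate_text_files(tmp)
    print(dataset)
```

Keying the result by os.path.basename() keeps the output identical regardless of which path separator the platform uses.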
With this, we come to the end of this tutorial. The code examples and results presented in this tutorial were run in a Jupyter Notebook with a Python (version 3.8.3) kernel.