In this tutorial, we’ll try to understand how to determine the file type using Python with the help of some examples.
There are multiple ways to detect the type of a file using Python. For example –
- Get the file type from the file name and its extension using the os.path.splitext() function. For example, the file “cat.png” appears to be an image file since it has “.png” as its extension.
- Alternatively, determine the file type from the contents of the file by using the python-magic library.
Let’s look at both methods in detail.
Method 1 – Using os.path.splitext()
os.path.splitext() is a built-in function in Python’s os.path module. It splits a pathname into a (root, ext) pair such that root + ext == path. The extension, ext, is either empty or begins with a period and contains at most one period.
Parameters: The only parameter is path, the path of the file to split. The function returns the (root, ext) tuple described above.
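As a quick sanity check of the two guarantees described above, the following sketch runs os.path.splitext() on the sample paths used in the examples of this tutorial and asserts both properties for each one. The list of paths is just illustrative input.

```python
import os

# Sample inputs taken from the examples in this tutorial
paths = ["file1.txt", "file", "foo.bar.exe", ".sdfasg", "/faaaoo/gves/....png"]

for path in paths:
    root, ext = os.path.splitext(path)
    # Guarantee 1: joining the two parts gives back the original path
    assert root + ext == path
    # Guarantee 2: ext is empty, or starts with a period and has no other period
    assert ext == "" or (ext.startswith(".") and ext.count(".") == 1)
    print(path, "->", (root, ext))
```

If either assertion failed, the loop would stop with an AssertionError; for these inputs it simply prints each path next to its (root, ext) pair.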
Let’s now look at the usage of this method with some worked-out examples.
Example 1 – Simple path
```python
import os

print(os.path.splitext("file1.txt"))
# ('file1', '.txt')
```
Here, we get a tuple containing the root (“file1”) and the extension (“.txt”) of the file name. From the extension, we can infer that the file is a text file.
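To turn the extension into a file-type label, one option is a lookup table. The sketch below is a hypothetical illustration: the EXTENSION_TYPES table and the type_from_extension() helper are not part of Python or of this tutorial’s original code, and the table only covers a few extensions.

```python
import os

# Hypothetical lookup table for illustration only; extend it as needed
EXTENSION_TYPES = {
    ".txt": "text file",
    ".png": "image file",
    ".jpg": "image file",
    ".jpeg": "image file",
    ".html": "HTML document",
    ".exe": "executable file",
}

def type_from_extension(path):
    """Guess a file type from the extension alone (the content is never read)."""
    _, ext = os.path.splitext(path)
    # Compare case-insensitively so "CAT.PNG" and "cat.png" agree
    return EXTENSION_TYPES.get(ext.lower(), "unknown")

print(type_from_extension("file1.txt"))  # text file
print(type_from_extension("cat.PNG"))    # image file
print(type_from_extension("file"))       # unknown
```

Lowercasing the extension is a design choice: file systems and users are inconsistent about case, so “.PNG” and “.png” should map to the same type.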
Example 2 – Path with no extension
What happens if the filename does not contain any extension?
```python
import os

print(os.path.splitext("file"))
# ('file', '')
```
Since there is no extension in the path, ext is an empty string, so the file name alone doesn’t tell us what type of file this is.
Example 3 – Path containing multiple periods
Let’s look at another example, this time a file name with more than one period.
```python
import os

print(os.path.splitext("foo.bar.exe"))
# ('foo.bar', '.exe')
```
The file appears to be an executable, since its extension is “.exe”. When the name contains several periods, splitext() splits at the last one: ext is set to “.exe”, including the leading period, and the earlier periods stay in the root (“foo.bar”).
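Because splitext() only peels off the last extension, a name like “archive.tar.gz” yields just “.gz”. The following sketch collects every extension by calling splitext() repeatedly. The all_extensions() helper is hypothetical, written only for illustration.

```python
import os

def all_extensions(path):
    """Hypothetical helper: collect every extension by calling splitext repeatedly."""
    extensions = []
    root, ext = os.path.splitext(path)
    while ext:
        # splitext only removes the last extension, so prepend it and split again
        extensions.insert(0, ext)
        root, ext = os.path.splitext(root)
    return extensions

print(os.path.splitext("archive.tar.gz"))  # ('archive.tar', '.gz')
print(all_extensions("archive.tar.gz"))    # ['.tar', '.gz']
print(all_extensions("foo.bar.exe"))       # ['.bar', '.exe']
```

Note that this helper treats every dotted part as an extension, so for “foo.bar.exe” it also returns “.bar”, which is not a real file type. Whether the extra parts are meaningful depends on your naming conventions.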
Example 4 – Path containing leading periods
```python
import os

print(os.path.splitext(".sdfasg"))
# ('.sdfasg', '')
print(os.path.splitext("/faaaoo/gves/....png"))
# ('/faaaoo/gves/....png', '')
```
Here we can see that leading periods in the last component of the path are treated as part of the root, so neither call returns an extension.
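Only the leading periods are skipped; a later period still starts an extension, and periods in directory names are never considered. The sample paths below are made up for illustration.

```python
import os

# A hidden file (leading period) with no further period: no extension
print(os.path.splitext(".bashrc"))         # ('.bashrc', '')
# Leading periods are skipped, but a later period still starts an extension
print(os.path.splitext(".config.json"))    # ('.config', '.json')
# Only the last path component is examined; the period in "data.d" is ignored
print(os.path.splitext("/data.d/readme"))  # ('/data.d/readme', '')
```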
A drawback of determining file type from its extension
In this method, we determine the file type just by looking at the path of the file. However, a file’s name and its content don’t have to agree: a file can have an extension of one type while containing data of another type. For example, a file named “f1.jpg” might actually contain HTML. In such cases, we cannot determine the file type from its name or path alone.
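The mismatch is easy to reproduce. This sketch writes HTML markup into a temporary file named “f1.jpg” (the file name and its content are illustrative) and compares what the name says with the first bytes of the file.

```python
import os
import tempfile

# Write HTML markup into a file whose name claims it is a JPEG image
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "f1.jpg")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<!DOCTYPE html>\n<html><body>Not an image</body></html>\n")

    # The name says ".jpg" ...
    _, ext = os.path.splitext(path)
    print(ext)  # .jpg

    # ... but the first bytes are HTML text, not a JPEG header
    with open(path, "rb") as f:
        head = f.read(15)
    print(head)  # b'<!DOCTYPE html>'
```

A real JPEG file would start with the bytes 0xFF 0xD8 0xFF, which is exactly the kind of signature the next approach relies on.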
A solution is to use the magic number associated with the file to determine its type. A magic number is a fixed sequence of bytes, usually at the very beginning of a file, that identifies its format. Because this approach looks at the content rather than the name, it does not require an extension, gives you more freedom when naming files, and still works when a file has an incorrect extension.
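To make the idea concrete without any third-party package, here is a minimal sketch that compares the first bytes of a file against a handful of well-known signatures. The SIGNATURES table and the sniff_type()/sniff_file() helpers are hypothetical illustrations, not part of python-magic, and the table is deliberately tiny.

```python
# Hypothetical, deliberately tiny table of well-known file signatures
# ("magic numbers"); real tools such as libmagic use far larger databases
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]

def sniff_type(data):
    """Return a file type based on the leading bytes of data, ignoring any name."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"

def sniff_file(path):
    # Read only the first few bytes; 8 covers the longest signature above
    with open(path, "rb") as f:
        return sniff_type(f.read(8))

print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG image
print(sniff_type(b"<!DOCTYPE html>"))                  # unknown
```

A signature only identifies the container format: for example, .docx, .xlsx and .jar files are ZIP archives internally, so this table would report them all as “ZIP archive”. Dedicated libraries handle such cases with many more rules.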
In Python, this can be done using the python-magic library.
Method 2 – Using the python-magic library
You can use the magic.from_file() function from the python-magic library to determine the type of a file based on its contents. python-magic is a Python interface to libmagic, the file type identification library behind the Unix file command, and it uses the magic number of the file to determine its type.
The syntax is magic.from_file(filename, mime=False).
Parameters: filename is the path of the file to inspect. mime is optional and defaults to False; if set to True, the function returns the file’s MIME type (for example, “text/html”) instead of a human-readable description.
For more information, refer to the python-magic documentation.
Let’s now look at the usage of the above method with an example.
Example – Determine file type
Let’s say we create a simple file containing HTML markup and save it under the name “cats.jpg”, giving it a “.jpg” extension. Now, let’s try to determine its type using the magic.from_file() function.
```python
import magic  # pip install python-magic (requires the libmagic C library)

print(magic.from_file("cats.jpg"))
# HTML document, ASCII text, with CRLF line terminators
```
Even though the file has a “.jpg” extension, its content is HTML, so python-magic reports it as an HTML document.
In this tutorial, we saw how to determine a file’s type from its extension using the os.path.splitext() function. We then looked at the drawback of relying on the extension and used the python-magic library, which identifies the type from the file’s contents and is therefore more reliable.