
Python – Determine File Type

In this tutorial, we’ll look at how to determine a file’s type in Python, with the help of some examples.

There are multiple ways to detect the type of a file using Python. For example –

  1. Infer the file type from the file name’s extension using the os.path.splitext() function. For example, the file “cat.png” appears to be an image file since it has “.png” as its extension.
  2. Alternatively, determine the file type from the contents of the file using the python-magic library.

Let’s look at both methods in detail.

Method 1 – Using the os.path.splitext() function

os.path.splitext() is a built-in function in the os.path module. It splits the pathname into a (root, ext) pair such that root + ext == path. The extension, ext, is either empty or begins with a period and contains at most one period.

Basic Syntax:

os.path.splitext(path)

Parameters: The only parameter is path, the file path (a string or path-like object) to split. The function returns a (root, ext) tuple.
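As a quick sanity check of these rules, the sketch below runs os.path.splitext() over a handful of sample paths and asserts the two documented guarantees: root + ext == path, and ext is either empty or a single period followed by the suffix. Most of the paths come from the examples that follow. "archive.tar.gz" is a made-up name added for illustration.

```python
import os

# Sample paths: most come from the examples below, plus a made-up "archive.tar.gz"
paths = ["file1.txt", "file", "foo.bar.exe", ".sdfasg", "/faaaoo/gves/....png", "archive.tar.gz"]

for path in paths:
    root, ext = os.path.splitext(path)
    # Guarantee 1: the two parts always join back into the original path
    assert root + ext == path
    # Guarantee 2: ext is empty, or starts with a period and contains only that one period
    assert ext == "" or (ext.startswith(".") and ext.count(".") == 1)
    print(f"{path!r:25} -> ({root!r}, {ext!r})")
```

Note that "archive.tar.gz" splits into ("archive.tar", ".gz"). Only the final suffix is treated as the extension.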

Let’s now look at how this function behaves with some worked examples.

Example 1 – Simple path

import os
print(os.path.splitext("file1.txt"))

Output:



('file1', '.txt')

Here, we get the filename and the extension of the file. From the extension, we can say that the given file is a text file.
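The extension is the only signal this method provides, so turning it into a file type still needs a lookup of our own. Below is a minimal sketch built on a hand-written mapping. The EXTENSION_TYPES table and the type_from_extension() helper are hypothetical names for illustration and are not part of the os module.

```python
import os

# Hypothetical lookup table: extend it with the extensions you care about
EXTENSION_TYPES = {
    ".txt": "text",
    ".csv": "text",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".exe": "executable",
    ".html": "HTML",
}

def type_from_extension(path):
    """Guess a coarse file type from the path's extension alone (hypothetical helper)."""
    _, ext = os.path.splitext(path)
    # Compare case-insensitively so "PHOTO.JPG" matches ".jpg"
    return EXTENSION_TYPES.get(ext.lower(), "unknown")

print(type_from_extension("file1.txt"))  # text
print(type_from_extension("cat.png"))    # image
print(type_from_extension("file"))       # unknown
```

Any extension missing from the table, including an empty one, falls back to "unknown".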

Example 2 – Path with no extension

What happens if the filename does not contain any extension?

import os
print(os.path.splitext("file"))

Output:

('file', '')

As there is no extension in the path specified, we can only see the filename and can’t really determine the file type here.

Example 3 – Path containing multiple periods

Let’s look at a filename that contains more than one period.

import os
print(os.path.splitext("foo.bar.exe"))

Output:

('foo.bar', '.exe')

Only the last period starts the extension, so ext is set to “.exe” (including the leading period) and the earlier period stays in the root. From this extension, we can say that the given file is an executable file.
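If you do need every suffix, for example for a compound extension like “.tar.gz”, one option is to call os.path.splitext() repeatedly on the root. The all_extensions() helper below is a hypothetical sketch, not a standard function. Keep in mind that splitext cannot tell whether “.bar” in “foo.bar.exe” is a real extension or just part of the name.

```python
import os

def all_extensions(path):
    """Hypothetical helper: peel off extensions one at a time, last one first."""
    exts = []
    root, ext = os.path.splitext(path)
    while ext:
        exts.insert(0, ext)  # keep the suffixes in their original left-to-right order
        root, ext = os.path.splitext(root)
    return root, exts

print(all_extensions("foo.bar.exe"))     # ('foo', ['.bar', '.exe'])
print(all_extensions("archive.tar.gz"))  # ('archive', ['.tar', '.gz'])
```

Splitting once, as splitext does by default, is usually the safer choice. Repeated splitting only makes sense for suffix combinations you already expect, such as .tar.gz.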

Example 4 – Path containing leading periods

import os
print(os.path.splitext(".sdfasg"))
print(os.path.splitext("/faaaoo/gves/....png"))

Output:

('.sdfasg', '')
('/faaaoo/gves/....png', '')

Leading periods in the last component of the path are treated as part of the root rather than as the start of an extension, so both paths get an empty ext.
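This rule prevents Unix-style hidden files, whose names start with a period, from being misread as extension-only names. The short check below uses made-up paths. It shows that the leading period stays in the root, while a later period still starts an extension.

```python
import os

# Dotfiles (made-up paths): the leading period belongs to the root, not the extension
print(os.path.splitext(".bashrc"))             # ('.bashrc', '')
print(os.path.splitext(".bashrc.bak"))         # ('.bashrc', '.bak')
print(os.path.splitext("/home/user/.config"))  # ('/home/user/.config', '')
```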

A drawback of determining file type from its extension

With this method, we determine the file type only by looking at the file’s path. However, a file’s extension can disagree with its contents. For example, a file named “f1.jpg” might actually contain HTML. In such cases, the name or path alone cannot tell us the real file type.
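To see the mismatch concretely, the sketch below writes some made-up HTML into a temporary file named “f1.jpg”. It then compares what the name claims with what the first bytes contain. The check relies on one assumption: genuine JPEG files start with the bytes FF D8 FF.

```python
import os
import tempfile

# Recreate the scenario: HTML markup saved under a ".jpg" name (made-up content)
html = "<!DOCTYPE html>\n<html><body><p>Cats!</p></body></html>\n"
folder = tempfile.mkdtemp()  # temporary folder, left in place for illustration
path = os.path.join(folder, "f1.jpg")
with open(path, "w", encoding="ascii") as f:
    f.write(html)

# What the name says
_, ext = os.path.splitext(path)

# What the bytes say: real JPEG files begin with FF D8 FF
with open(path, "rb") as f:
    head = f.read(3)

print(ext)                      # .jpg
print(head == b"\xff\xd8\xff")  # False
print(head)                     # b'<!D'
```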

Solution:

The solution is to use the file’s magic number to determine its type. A magic number is a fixed sequence of bytes, usually at the very start of a file, that identifies its format. For example, every PNG file begins with the bytes 89 50 4E 47 0D 0A 1A 0A (“\x89PNG\r\n\x1a\n”). Detection relies on the contents rather than the name, so this approach gives you more freedom when naming files and works even when a file has no extension or an incorrect one.

This can be done using the python-magic library.
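To make the idea concrete, here is a minimal sketch of magic-number detection that uses only the standard library. It is not how python-magic works internally, since libmagic relies on a much larger database of tests. The SIGNATURES table and the sniff_type() helper are hypothetical, and the table lists only a few well-known signatures.

```python
import os
import tempfile

# A deliberately small, hypothetical signature table: leading bytes -> file type
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]

def sniff_type(path):
    """Return a type name based on the file's first bytes, or None if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest signature in the table is 8 bytes
    for signature, name in SIGNATURES:
        if head.startswith(signature):
            return name
    return None

# Usage: PNG bytes saved under a misleading ".txt" name
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")
with open(path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)

print(os.path.splitext(path)[1])  # .txt
print(sniff_type(path))           # PNG image
```

Any format missing from the table comes back as None. That gap is why, in practice, a maintained signature database like the one python-magic relies on is preferable to a hand-rolled table.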

Method 2 – Using the python-magic library

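You can use the magic.from_file() function from the python-magic library to determine the type of a file based on its contents. python-magic is a Python interface to libmagic, the file type identification library behind the Unix file command, and can be installed with pip install python-magic. It uses the magic number associated with the file to determine its type.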

Basic Syntax:

magic.from_file(path, mime=False)

Parameters: path is the path of the file to inspect. mime is optional and defaults to False. If it is True, the function returns the file’s MIME type (for example, “text/html”) instead of a human-readable description.

For more information, refer to the python-magic documentation.

Let’s now look at the usage of the above method with an example.

Example – Determine file type

Let’s say we create a simple file that contains HTML markup but give it a “.jpg” extension. Look at the image below.

[Image: file with HTML content]

The contents of this file are saved under the name “cats.jpg”. Now, let’s try to determine its type using the magic.from_file() method.

import magic

# Identify the file from its contents, ignoring the misleading ".jpg" name
print(magic.from_file("cats.jpg"))

Output:

HTML document, ASCII text, with CRLF line terminators

Even though the file has a “.jpg” extension, its contents are HTML, so magic.from_file() reports it as an HTML document.

Summary

In this tutorial, we saw how to infer a file’s type from its extension using the os.path.splitext() function. We then looked at the drawback of this approach and used the python-magic library instead. Because python-magic determines the type from the file’s contents, it is more reliable when an extension is missing or misleading.



Author

  • Chaitanya Betha

    I'm an undergraduate student at IIT Madras interested in exploring new technologies. I have worked on various projects related to data science, machine learning, and neural networks, including image classification using convolutional neural networks, stock prediction using recurrent neural networks, and training many other machine learning models. In my blog articles, I try to provide a complete guide to a particular topic, covering as many examples and edge cases as possible so that readers get a thorough understanding of it.
