Python utility functions for computer vision

How to save some time managing your images when working on image classification.

Yan Gobeil
Apr 20, 2020


If you spend enough time working on computer vision using deep learning, you’ll know that there are some code snippets that come back again and again when managing your images. If you are like me and you are tired of always writing python code to go through a list of images separated into categories in a dataset or building a NumPy array from your images, this post will be useful to you. I will also explain how to build a module that can be imported from any script on your machine with your most used functions.

Adding a module to PYTHONPATH

Before describing the functions that I think are useful when dealing with images, I want to explain how you will be able to call them from any script without having to rewrite the code. This is done by writing all your functions in a script and adding the location of this script to your computer’s environment variables. More specifically, you want to add a new variable called PYTHONPATH. The path that you specify for this variable will be included in the directories that Python searches when you import a module. If you want to see a list of these directories, you can use the following code in a script or a notebook:

import sys
print(sys.path)

The first step is to write all your functions and classes, including the imports, in a script. As an example, let’s call this script myutils.py. The second step is to add the location of this file to your environment variables. Instead of reinventing the wheel, I point you towards this source, which explains how to do it in detail for any operating system. The last step is simply to import your newly created module from another script, as you would with any other module:

import myutils
myutils.some_function()

or

from myutils import another_function
another_function()

Note that it is very important in this context to write some documentation for your functions. You will not have access to the code when importing the module, and it will be hard to remember exactly what each function does and what its arguments mean. I will not include my docs here, simply to avoid cluttering the code and repeating the explanations in the text.

Requirements

Before diving into the functions themselves, it is important to list the packages required to run them. All of them are standard data science packages, and they should be imported at the top of the utilities script. They will not have to be imported again when you import your module.
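
As a sketch, here is what the top of myutils.py could look like, assuming the reconstructed functions below, which rely on NumPy, OpenCV, Pillow and Matplotlib on top of the standard library:

import os
import random
import shutil

import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image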

Loading an image into a NumPy array

The first function that I want to introduce is pretty simple, but it will be useful for other functions and can easily avoid a lot of repetition in your code. The code is the following:
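
(A minimal sketch of what this function can look like, assuming OpenCV for reading and resizing; the default size and interpolation method are my own placeholders.)

def load_image(path, size=(224, 224), interpolation=cv2.INTER_NEAREST):
    """Load an image, resize it, normalize it to [0, 1] and add a batch dimension."""
    img = cv2.imread(path)  # numpy array in BGR order
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # most models expect RGB
    img = cv2.resize(img, size, interpolation=interpolation)
    img = img.astype(np.float32) / 255.0  # normalize pixels to [0, 1]
    return np.expand_dims(img, axis=0)  # shape (1, height, width, 3)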

This function is used to load an image and resize it using a specific interpolation method. The interpolation part doesn’t sound very useful, but it turns out that tensorflow and tf.keras don’t use the same methods by default, so it can be useful to keep track of which one is used. The image is then converted to a numpy array, its pixels are normalized to be between 0 and 1, and a dimension is added. These last steps are necessary to use the images as inputs to a neural network.

Split a dataset into training and validation sets

The first step in any good image classification project is to get a dataset and organize it into different folders, one for each of your classes. Once this is done, you will want to split the data into train and val sets to be able to assess the performance of your deep learning model. This can be done with the train_test_split function that sklearn provides. That function, however, only works on numpy arrays, and it can be useful to have a more permanent split, done directly on your images. This is what the following function does.
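
(A sketch of what this function can look like; the name train_split comes from a later section, but treating the input path as the parent of the train directory is my own assumption.)

def train_split(path, split=0.2):
    """Move a fraction of the images in path/train/<class> to path/val/<class>."""
    train_dir = os.path.join(path, 'train')
    val_dir = os.path.join(path, 'val')
    for category in os.listdir(train_dir):
        src = os.path.join(train_dir, category)
        if not os.path.isdir(src):
            continue
        dst = os.path.join(val_dir, category)
        os.makedirs(dst, exist_ok=True)
        images = os.listdir(src)
        # move a random fraction of each class to the validation set
        for image in random.sample(images, int(len(images) * split)):
            shutil.move(os.path.join(src, image), os.path.join(dst, image))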

This function simply takes as input the path to the images. These need to be in a directory called “train”, in which each folder represents a class. The function takes a certain fraction of the images in each class, determined by the argument “split”, and moves them to a newly created directory called “val”.

Create numpy dataset from directory of images

Once you have your dataset, there are generally three ways to input the images into a tf.keras neural network: as numpy arrays, using keras generators or using tfrecords. The first option can be annoying to do, so the following function takes care of the necessary steps.
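
(The function name and signature below are my own reconstruction from the description that follows.)

def create_dataset(path, size=(224, 224), onehot=False):
    """Build an (images, labels) pair of numpy arrays from a directory of class folders."""
    categories = sorted(os.listdir(path))  # folder order determines the label numbering
    images, labels = [], []
    for label, category in enumerate(categories):
        folder = os.path.join(path, category)
        for filename in os.listdir(folder):
            images.append(load_image(os.path.join(folder, filename), size))
            labels.append(label)
    images = np.concatenate(images)  # load_image already adds a batch dimension
    labels = np.array(labels)
    if onehot:
        labels = np.eye(len(categories))[labels]  # category numbers -> one-hot vectors
    return images, labels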

Again, to use this function the images must be separated into folders that are each dedicated to a class. Examples of directories to use as input are the train and val directories created with the previous function, “train_split”. All the images are loaded into numpy arrays using “load_image” and combined into one big array. An array of labels is also created from the division into folders. The categories are numbered according to the order that the folders are in (normally alphabetical). If the “onehot” argument is False, each label is simply the number of the category. If “onehot” is True, each label is converted to a one-hot encoded vector.

Looking for duplicate images

Sometimes when you make your dataset yourself you end up with many copies of the same image. If your dataset is small this can be handled manually, but if you have thousands of images it becomes a problem. This is why I built a class to find duplicates and delete them. There are already many scripts online that do this. This one is definitely not the best, but one thing that I like is that it works even if your data is already split: duplicates will be found even if one image is in the train directory and the other is in val. Also, the comparison between images is done on numpy arrays, so even if it is slower it is more accurate than using a hash. Here is the code:
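
(The class name and implementation details are my reconstruction of the behaviour described below; grouping files by their size on disk before comparing pixels is an assumption.)

class DuplicateFinder:
    """Find, visualize and delete duplicate images anywhere under a directory."""

    def __init__(self, directory):
        self.directory = directory
        self.sizes = {}  # file size in bytes -> list of image paths
        for root, _, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                self.sizes.setdefault(os.path.getsize(path), []).append(path)

    def find_duplicates(self):
        """Compare same-size files as numpy arrays and record the extra copies."""
        self.duplicates = []
        for paths in self.sizes.values():
            for i, first in enumerate(paths):
                image = np.array(Image.open(first))
                for second in paths[i + 1:]:
                    if second in self.duplicates:
                        continue
                    other = np.array(Image.open(second))
                    if image.shape == other.shape and np.array_equal(image, other):
                        self.duplicates.append(second)
        return self.duplicates

    def show_duplicates(self):
        for path in self.duplicates:
            plt.imshow(Image.open(path))
            plt.title(path)
            plt.show()

    def delete_duplicates(self):
        for path in self.duplicates:
            os.remove(path)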

The only input of this class is the directory where the images are. All the images in the directory will be checked, no matter how deeply they are nested. When the class is instantiated, a dictionary is created in which the images are grouped by size. You can then use the method “find_duplicates()” to obtain the list of images that appear more than once. Once this is done, the method “show_duplicates()” lets you visualize the images that are present more than once, and the method “delete_duplicates()” deletes them.

Apply a generic function to images

The last function that I want to discuss seems simple at first, but it is so generic that the strategy can become pretty powerful. The idea is again to work on images that are separated into folders according to classes. This function goes through all the images and applies a generic function to them.
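
(A sketch matching the description below; the name work_with_files appears later in the post, and the rest is my reconstruction.)

def work_with_files(path, do, *args):
    """Apply the function do to every image in every class folder under path."""
    for category in os.listdir(path):
        folder = os.path.join(path, category)
        if not os.path.isdir(folder):
            continue
        for filename in os.listdir(folder):
            do(os.path.join(folder, filename), *args)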

The first argument is the path to the directory where the images are located; again, train or val are examples. The argument “do” is any function you want that takes the path to an image as its first argument. The *args part refers to any extra parameters. Whatever you write there when using the function will be passed on as arguments to the function do. These must be given as unnamed (positional) parameters. If you wanted to specify named arguments, you would have to use **kwargs. The difference comes down to the difference between calls like “function(2)” and “function(number=2)”.

I think an example is required here. Let’s say I want to print the filename along with a keyword. I would define a function and give it as an argument to “work_with_files” as follows:
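
(A hypothetical version of such a call; print_file and the “train” directory are names I made up for the illustration.)

from myutils import work_with_files
import os

def print_file(path, keyword):
    # print the keyword followed by the image's filename
    print(f'{keyword}: {os.path.basename(path)}')

work_with_files('train', print_file, 'filename')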

The output of this call would look like this:

filename: image1.jpg
filename: image2.jpg
filename: image3.jpg
...

I think these functions are simple but used often enough that they are worth sharing. Hopefully someone will benefit from the things that I learned by spending a lot of time manipulating images with Python. Let me know if you have any comments or questions :)

I am a data scientist at Décathlon Canada working on generating intelligence from sports images. I aim to learn as much as I can about AI and programming.