Agglomerative Clustering for OpenCV Contours with Python
Group Contours That Belong to the Same Object
Introduction
Computer vision engineers often use OpenCV’s “findContours” function to detect objects. Thanks to OpenCV, detecting contours (objects) takes only a few lines of code. However, the contours OpenCV detects are usually fragmented. For instance, a feature-rich image could have hundreds or even thousands of contours, but that doesn’t mean the image contains that many objects. Some contours that belong to the same object are detected separately, so we want to group them such that one contour corresponds to one object.
Why and How
When I ran into this problem in my project, I spent a lot of time trying different parameters and different OpenCV functions to detect contours, but none of them worked. After more research, I found a post on OpenCV’s forum that mentioned agglomerative clustering, but it gave no source code. I also found that sklearn supports agglomerative clustering, but I didn’t use it for the following two reasons:
- The function seemed very complex to me. I didn’t know how to supply the right parameters, and I doubted whether the contours’ data type would fit the function.
- I needed to use Python 2.7, OpenCV 3.3.1 and NumPy 1.11.3. They were not compatible with the sklearn versions (0.20+) that support agglomerative clustering.
So I had to implement agglomerative clustering myself. Luckily, I had learned clustering algorithms in my “Data Mining” course, so I already knew K-means (Plus) clustering and agglomerative clustering.
The Source Code
To share the functions I wrote with the community, I have open-sourced them on GitHub and also published them as a gist, embedded below. The gist version is for Python 3. To use it in Python 2.7, you just need to change `range` to `xrange`. Refer here if you need to see the actual code in my GitHub repo.
The code should be self-explanatory, but here are a few minor notes:
- The “calculate_contour_distance” function gets the bounding boxes of two contours and calculates the distance between the two rectangles.
- The “merge_contours” function can simply use `numpy.concatenate`, because each contour is just a NumPy array of points.
- With the agglomerative clustering algorithm, we don’t need to know the number of clusters in advance. Instead, a threshold distance, e.g. 40 pixels, can be supplied to the function, so that it stops merging once the closest distance among all contours is bigger than the threshold.
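The notes above can be sketched roughly as follows. This is a minimal illustration under a few assumptions, and it is not necessarily identical to the code in the gist: `bounding_box`, `agglomerative_cluster` and `square` are hypothetical names, the bounding boxes are computed with NumPy so the snippet runs without OpenCV (`cv2.boundingRect` could be used instead), and the rectangle distance used here (the Euclidean gap between the two boxes, zero when they touch or overlap) is just one plausible choice.

```python
import numpy as np


def bounding_box(contour):
    """Return (x_min, y_min, x_max, y_max) for a contour.

    Hypothetical helper; assumes an array of shape (N, 1, 2) or (N, 2).
    """
    points = np.asarray(contour).reshape(-1, 2)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return x_min, y_min, x_max, y_max


def calculate_contour_distance(contour1, contour2):
    """Gap between the two bounding boxes; 0 if they touch or overlap.

    Assumption: this particular distance definition is illustrative only.
    """
    ax1, ay1, ax2, ay2 = bounding_box(contour1)
    bx1, by1, bx2, by2 = bounding_box(contour2)
    gap_x = max(0, max(ax1, bx1) - min(ax2, bx2))
    gap_y = max(0, max(ay1, by1) - min(ay2, by2))
    return float(np.hypot(gap_x, gap_y))


def merge_contours(contour1, contour2):
    """Merge two contours by stacking their point arrays."""
    return np.concatenate((contour1, contour2), axis=0)


def agglomerative_cluster(contours, threshold_distance=40.0):
    """Repeatedly merge the closest pair until it is farther than the threshold."""
    clusters = list(contours)  # shallow copy so the caller's list is untouched
    while len(clusters) > 1:
        best_distance, best_pair = None, None
        for i in range(len(clusters) - 1):
            for j in range(i + 1, len(clusters)):
                d = calculate_contour_distance(clusters[i], clusters[j])
                if best_distance is None or d < best_distance:
                    best_distance, best_pair = d, (i, j)
        if best_distance > threshold_distance:
            break  # the closest pair is already too far apart
        i, j = best_pair
        clusters[i] = merge_contours(clusters[i], clusters[j])
        del clusters[j]
    return clusters


def square(x, y, size=10):
    """Build a square contour in OpenCV's (N, 1, 2) int32 layout (test data)."""
    corners = [[x, y], [x + size, y], [x + size, y + size], [x, y + size]]
    return np.array(corners, dtype=np.int32).reshape(-1, 1, 2)


# Two squares 10 pixels apart plus one far-away square.
contours = [square(0, 0), square(20, 0), square(300, 300)]
clusters = agglomerative_cluster(contours, threshold_distance=40.0)
print(len(clusters))  # 2
```

In this sketch the two nearby squares are merged into a single 8-point contour, while the distant one stays on its own because its gap to the merged cluster is far above the 40-pixel threshold. The pairwise search is recomputed on every pass, which is simple but slow for thousands of contours.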
Result
To visualize the clustering effect, please see the two images below. The first image shows that 12 contours were detected initially; the second shows that only 4 were left after clustering. The two small objects were caused by noise, and they were not merged because their distance from the two big clusters exceeded the threshold distance.
Summary
In this post, I have shared Python source code for clustering OpenCV contours based on the agglomerative clustering algorithm. I wish I had found a similar function on the internet when I was doing my project. I hope this helps someone.