Check out how the R&D Engineering team at Ciklum created an image segmentation algorithm using computer vision and semantic segmentation.
Carvana is an online marketplace for buying used vehicles. It provides a fully automated service scheme that resembles a coin-operated vending machine for cars.
Carvana has a custom rotating photo studio that automatically captures and processes 16 standard images of each vehicle in their inventory. While Carvana takes high-quality photos, bright reflections and cars whose color is similar to the background cause automation errors that a skilled photo editor has to fix by hand.
As with any big purchase, consumers need transparent and comprehensive information before committing. To manually edit 16 projections of thousands of individual cars, the company would have to hire an army of designers, which would be a time-consuming and expensive process.
Customers had to rely on blurry pictures with little information. Carvana urgently needed an algorithm that could separate the car from the background the way a professional designer would. The company hosted a competition on Kaggle with a $25,000 prize, 735 teams, and two months to find a solution.
Develop an algorithm to automatically generate masks for cars in images. The algorithm has to predict whether each pixel of the picture belongs to the class of car or the background.
The training set consisted of 2.5K images, with the manual cutout mask for each image provided in a GIF file with pixel intensity values of 0 and 1. The test set was 100,064 images in .jpg format at 1280 x 1918 resolution.
The car and the background in the image had to be colored at the pixel level. The input was a JPG file and the output, a binary mask.
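The pixel-level task above can be sketched in a few lines. This is a minimal illustration, not the competition code: the function names are hypothetical, and it assumes the model's raw output is a per-pixel car-probability map that gets thresholded into the required binary mask.

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel car-probability map (values in [0, 1]) into a 0/1 mask."""
    return (prob_map >= threshold).astype(np.uint8)

def cut_out(image, mask, background=255):
    """Replace background pixels with a flat color, like a designer's cutout."""
    out = image.copy()
    out[mask == 0] = background
    return out

# Tiny example: a 2 x 2 probability map.
probs = np.array([[0.9, 0.2],
                  [0.6, 0.4]])
mask = binarize(probs)   # [[1, 0], [1, 0]]
```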
There are several popular models for semantic segmentation in recent deep learning literature, such as SegNet, FCN, ENet, and U-Net, as well as this year's state-of-the-art models, such as PSPNet and LinkNet. After some initial experimentation, SegNet and U-Net were used as a base to build a custom architecture.
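The defining idea shared by these encoder-decoder architectures is downsampling to capture context and then upsampling back to full resolution, with U-Net adding skip connections between matching levels. The exact custom architecture used in the project is not shown here; below is a deliberately tiny U-Net-style sketch in PyTorch (the `TinyUNet` name and layer sizes are illustrative assumptions) to make the pattern concrete.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 conv + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """A two-level U-Net: one downsampling step, one upsampling step, one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = conv_block(3, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)        # 32 channels = 16 from skip + 16 upsampled
        self.head = nn.Conv2d(16, 1, 1)      # per-pixel car/background logit

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        d = self.dec(torch.cat([e, u], dim=1))  # skip connection from the encoder
        return torch.sigmoid(self.head(d))      # probability map, same H x W as input

model = TinyUNet()
probs = model(torch.zeros(1, 3, 64, 64))   # output shape: (1, 1, 64, 64)
```

A production model would stack several such levels and use far more channels; the skip connections are what preserve sharp edges, which matters for cutting out thin details like antennas.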
Building the Model
The model consisted of two levels:
Generating a candidate mask from each model's predictions.
Reducing the multiple predictions to a single mask.
Level 2 was executed in two stages:
The predictions, weighted by each model's accuracy, were summed, normalized, and thresholded to give a per-pixel probability between 0 and 1.
A final model was built from the best results on the basis of these weighted values; depending on internal validation, the weights were updated and the values summed again for the next validation round.
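The two stages above amount to a weighted ensemble of probability maps. Here is a minimal sketch of that idea (the `ensemble_mask` helper and the example weights are hypothetical, not the team's actual code): weighted predictions are summed, divided by the total weight, and thresholded.

```python
import numpy as np

def ensemble_mask(prob_maps, weights, threshold=0.5):
    """Blend several models' per-pixel probability maps into one binary mask.

    Each map is weighted (e.g. by that model's validation accuracy), the weighted
    maps are summed, divided by the total weight, and thresholded.
    """
    prob_maps = np.asarray(prob_maps, dtype=float)   # shape: (n_models, H, W)
    weights = np.asarray(weights, dtype=float)       # shape: (n_models,)
    blended = np.tensordot(weights, prob_maps, axes=1) / weights.sum()
    return (blended >= threshold).astype(np.uint8)

# Two hypothetical models, the first trusted twice as much as the second.
a = np.array([[0.9, 0.3]])
b = np.array([[0.4, 0.8]])
mask = ensemble_mask([a, b], weights=[2, 1])   # blended: [[0.733, 0.467]] -> [[1, 0]]
```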
Our obstacles were wheels, colors, and errors around edges.
Wheels

Some cars had disc brakes with special holes, while other cars had different types of brakes and different wheel designs. The dataset had been annotated by different specialists, and the annotations differed: some considered marking the holes important, while others did not. Consequently, the algorithm had difficulty with wheels in some cases.
Colors

All of the cars came in different colors, photographed against a grayish-white background. When white vehicles were placed on this background, the model struggled to find the border between the car and the background, as the training dataset contained few images of white cars.
Errors Around Edges
Shadows near the cars, small antennas, and roof racks added complexity to the image masking process.
The evaluation metric was the mean Dice coefficient, which measures how strongly two sets overlap. The higher the Dice coefficient, the better (max 1).
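The Dice coefficient for two binary masks A (prediction) and B (ground truth) is 2|A ∩ B| / (|A| + |B|). A short NumPy sketch (the `eps` smoothing term is a common convention for avoiding division by zero on empty masks, not something stated in the competition rules):

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice score 2|A n B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

pred  = np.array([[1, 1, 0, 0]])
truth = np.array([[1, 1, 1, 0]])
score = dice_coefficient(pred, truth)   # 2*2 / (2 + 3) = 0.8
```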
The algorithm demonstrated a mean Dice coefficient of 0.997 for differentiating the cars from the background, which was much better and faster than manual design work.
With an efficient algorithm, Carvana could now build trust with consumers by providing better images, streamline the online buying process, and most importantly, save time and money.
Any questions about the process? If this project is interesting to you or you're doing something similar, drop your comments below and we can have a more detailed discussion.