Computer Vision Systems Applied to Real Business Problems
Explore how computer vision systems powered by convolutional neural networks сan be applied to real business problems.
Join the DZone community and get the full member experience.Join For Free
Computer Vision has finally found its way out of the lab into real-world applications.
The latest state-of-the-art CV systems, which rely heavily on Deep Learning and Convolutional Neural Networks, are now capable of providing high levels of accuracy in object classification, object detection, and other visual recognition tasks. Companies across various industries are finding multiple use-cases for this groundbreaking tech.
However, there’s still a great deal of confusion as to how Computer Vision platforms work. Outside the narrow circle of practitioners, not many understand the complex inner workings of Deep Learning and Convolutional Neural Networks (CNNs, ConvNets) that underpin most modern CV systems.
In this post, we’ll shed some light on what CNNs do and discuss how businesses can leverage modern CV platforms.
What Is Computer Vision?
Computer vision is a study of visual data; it is an interdisciplinary field that concerns itself with how machines can be made not only to mimic the way a person’s eye processes light and colors but also how human brain gains a high-level understanding of visual content.
As humans, we put no conscious effort into identifying objects.
We look at this picture, for example, and immediately recognize a tree with green leaves on its branches (so it’s probably summer) standing alone in the field with clear blue sky above it (which we associate with hot weather.) We can also tell that the tree isn't young (its trunk is quite bulked up), arrive at conclusions as to the regions this kind of a tree is indigenous to and so on. Even in this seemingly simple image, there are a lot of things going on, and we can come up with an entire story to describe what we see in a few seconds. We’re able to process this visual information so efficiently because our brains have thousands of years of evolutionary context to rely on. Machines, however, do not have that advantage. To them, images are just massive piles of integer values that represent intensities on the color spectrum.
So, to make computers extract meaning from visual content, we require something beyond devices that capture light properly; we need software that allows machines to understand context. The technology that’s currently being used to bridge the gap between pixels and meaning is called Convolutional Neural Networks (CNNs).
ImageNet and CNN's Rise to Fame
In 2009, a group of researchers from Princeton and Stanford created the ImageNet database; they downloaded millions of pictures from the web, organized them into thousands of object classes, sorted, cleaned, and annotated labels for each image. They compiled a large labeled dataset of visual data for Machine Learning models to train on. By doing so, they were trying to learn definitively if algorithms available at the time were capable of recognizing most of the objects in our world and, also, to help tackle the issue of overfitting which graphical models, support vector machines, AdaBoost and other machine learning algorithms tended to stumble upon when processing high-dimensional data in small quantities.
Then, they launched the ImageNet Large Scale Visual Recognition Challenge (ILSCVRC) which, since its inception, has become a benchmark in object detection/classification on hundreds of categories and millions of images.
The challenge has been running annually since 2010; it consists of two components — a huge labeled dataset (publicly available) on which algorithms are trained, and a competition where a set of test images, without annotated labels, is used to measure and compare models’ performances. The winner is whoever manages to deliver the most accurate predictions.
From the results of ILSVRC, we can see that each year the winning algorithm has been able to trump the accuracy record set by its predecessor, to achieve even lower error rates. In 2010 and 2011, the models hovered around 25 percent error rate. In 2012, there was a significant, nearly 10 percent drop (to 16.4%.) That year, and ever since then, the winners of ImageNet challenge have been Convolutional Neural Networks.
Today’s state-of-the-art CNNs achieve error rates of ~2.5 percent; they’re able to rival the humans in image classification and have proved to perform well in optical character recognition, natural language processing, drug discovery, video analysis, and other related tasks.
How Exactly do CNNs Work?
Describing in detail the inner workings of a CNN is beyond the scope of this post, so here’s a simplified explanation.
In a nutshell, CNNs are trainable architectures that can determine invariant features within data automatically. They can efficiently detect and classify objects into categories and not get confused over background clutter, scale, pose and other distracting factors.
To achieve this, CNNs break images down into groups of pixels (pixel matrices/massive grids of numbers) and carry out calculations and manipulations to detect patterns.
A typical convolutional neural network consists of:
- Input layer that ingests visual data in a form of arrays of numbers;
- Hidden layers that carry out feature extraction — Convolution Layers, ReLU layers, Pooling layers, etc. They each utilize various filters to perform computations and reorganize pixels into a form that’s easier for a machine to read;
- Fully connected layer that identifies objects.
The filters of each convolutional layer are mathematical objects that slide over the image (a gigantic grid of numbers) and convolve (compute dot products) the image’s pixel matrix at each step. CNNs go from determining high-level patterns — rough edges, curves, etc. — on top layers to recognizing sophisticated objects such faces, animals, cars and so on, on deeper layers, as they perform more convolutions.
The teaching happens when CNN models ingest labeled datasets. At first, their filter values are completely randomized and thus the predictions they make are bogus. But as the network continues using the error function to compare its outputs to the actual labels and adjusting its filters appropriately, it keeps achieving better accuracy with each iteration.
Given how hard it is to make the algorithms overcome the inherent challenges of image classification such as viewpoint variation, deformation, illumination, occlusion, background clutter, and intraclass variation - the accuracy CNNs have been able to achieve seems astounding.
So taken are companies with the technology’s potential, they’ve already begun utilizing it across various applications.
Computer Vision Systems Can Be Used In:
Healthcare. The latest advances in high-performance computing allow us to train neural networks to diagnose certain diseases from MRI scans. Companies such Arterys have officially got a green light from FDA to apply Deep Learning algorithms to medical imagery.
Agriculture. Due to the rise of drone and satellite technologies, it costs substantially cheaper to acquire large datasets of areal imagery compared to a few years ago. Computer Vision technologies can help farmers predict crop yields, detect and classify crop diseases, and enhance field inspection.
Insurance. In this area, companies can use facial recognition technologies to speed up identity verification, enhance underwriting and claim processing and thus deliver better customer service. Also, tornadoes, wildfires, hurricanes, tornadoes, etc. can be monitored and the impact on insurance claims can be modified in real time with the help image recognition tech. Companies such as Orbital Insights also offer to help firms to understand better the emerging markets and support the development of new products in various regions through geospatial analysis.
Automotive. Here, machine learning is used for automated reading of road signs and speed limit setting, lane detection, scene analysis and more. At their core, most modern self-driving cars utilize technologies that input pictures of what’s in front of the vehicle (radar readings, etc.) and output the positions of other objects on the road.
Computer vision and Convolutional Neural Networks have been around for quite some time, but it’s only now that corporations have started to recognize the business benefits of integrating them into applications. This sudden spike of interest can be attributed to the increased accuracy deep learning algorithms have been showing in competitions such as ImageNet, recent advances in computing, and the growing amount of visual data floating around the web.
We’re still nowhere near replicating human visual intelligence but, apparently, we can already benefit from incorporating CNNs into business processes in agriculture, insurance, finance, healthcare, and other fields.
Opinions expressed by DZone contributors are their own.