An Introduction to 5 Types of Image Annotation

The five most common types of image annotation are bounding boxes, cuboids, polygons, lines, and semantic segmentation.



Looking for information on the different image annotation types? In the world of AI and machine learning, data is king. Without data, there can be no data science. For AI developers and researchers to achieve the ambitious goals of their projects, they need access to enormous amounts of high-quality data. When it comes to image data, one major field of machine learning that requires large amounts of annotated images is computer vision.

Table of Contents

  1. What Is Computer Vision?
  2. What Is Image Annotation?
  3. Common Image Annotation Types
     1. 2D Bounding Boxes
     2. 3D Bounding Boxes / Cuboids
     3. Polygons
     4. Lines and Splines
     5. Semantic Segmentation


What Is Computer Vision?

Computer vision is one of the biggest fields of machine learning and AI development. Put simply, computer vision is the area of AI research that seeks to make a computer see and visually interpret the world. From autonomous vehicles and drones to medical diagnosis technology and facial recognition software, the applications of computer vision are vast and revolutionary.

Since computer vision deals with developing machines to mimic or surpass the capabilities of human sight, training such models requires a plethora of annotated images.

What Is Image Annotation?

Image annotation is simply the process of attaching labels to an image. This can range from one label for the entire image to numerous labels for every group of pixels within the image. A simple example of this is providing human annotators with images of animals and having them label each image with the correct animal name. The method of labeling, of course, depends on the image annotation types used for the project. The annotated images, sometimes referred to as ground truth data, would then be fed to a computer vision algorithm. Through training, the model would learn to recognize animals in unannotated images.
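As a minimal sketch of what whole-image ground truth data might look like (the record fields here are illustrative, not any specific tool's schema):

```python
# Hypothetical whole-image annotation records: one class label per image.
# Field names are illustrative, not taken from a specific annotation tool.
animal_annotations = [
    {"image": "img_001.jpg", "label": "cat"},
    {"image": "img_002.jpg", "label": "dog"},
    {"image": "img_003.jpg", "label": "horse"},
]

def distinct_labels(annotations):
    """Collect the set of class labels present in the ground truth."""
    return {record["label"] for record in annotations}

print(distinct_labels(animal_annotations))  # {'cat', 'dog', 'horse'} (set order varies)
```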

While the above example is quite simple, more advanced areas of computer vision, such as autonomous vehicles, require far more intricate image annotation.

What Are the Most Common Image Annotation Types?

Wondering what image annotation types best suit your project? Below are five common types of image annotations and some of their applications.

1. 2D Bounding Boxes

For bounding box annotation, human annotators are given an image and are tasked with drawing a box around certain objects within the image. The box should be as close to every edge of the object as possible. The work is usually done on custom platforms that differ from company to company. If your project has unique requirements, some companies can tweak their existing platforms to match your needs.

One specific application of bounding boxes would be autonomous vehicle development. Annotators would be told to draw bounding boxes around entities like vehicles, pedestrians, and cyclists within traffic images.

Developers would feed the machine learning model with the bounding-box-annotated images to help the autonomous vehicle distinguish these entities in real-time and avoid contact with them.
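A bounding-box label is commonly stored as `[x_min, y_min, width, height]` in pixels. The sketch below (with an illustrative record format) also computes intersection-over-union, a standard measure of how closely one box matches another, often used to check annotation quality or model predictions:

```python
# A hypothetical bounding-box annotation record; the [x, y, width, height]
# pixel format is a common convention, but field names are illustrative.
box_annotation = {
    "image": "traffic_0042.jpg",
    "class": "pedestrian",
    "bbox": [220, 130, 40, 90],  # x_min, y_min, width, height
}

def iou(box_a, box_b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # overlap rectangle (empty if the boxes don't intersect)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

print(iou([0, 0, 10, 10], [5, 5, 10, 10]))  # 25 / 175 ≈ 0.1429
```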

2. 3D Cuboids

Much like bounding boxes, 3D cuboid annotation tasks annotators with drawing a box around objects in an image. Where bounding boxes only depict length and width, 3D cuboids label length, width, and approximate depth.

With 3D cuboid annotation, human annotators draw a box encapsulating the object of interest and place anchor points at each of the object’s edges. If one of the object’s edges is out of view or blocked by another object in the image, the annotator approximates where the edge would be based on the size and height of the object and the angle of the image.
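One way a cuboid label can be parameterized is as a center, three dimensions, and a heading angle; the eight anchor points then follow from those values. This is only one common convention (some tools store the corner points directly), and the class below is a hypothetical sketch:

```python
import math
from dataclasses import dataclass

# A hypothetical 3D cuboid annotation: center + size + yaw (rotation around
# the vertical axis). Some tools store the eight corner points instead.
@dataclass
class Cuboid:
    cx: float
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float

    def corners(self):
        """The 8 corner points implied by the center, size, and heading."""
        pts = []
        for sx in (-0.5, 0.5):
            for sy in (-0.5, 0.5):
                for sz in (-0.5, 0.5):
                    x, y = sx * self.length, sy * self.width
                    # rotate the footprint in the ground plane by yaw
                    rx = x * math.cos(self.yaw) - y * math.sin(self.yaw)
                    ry = x * math.sin(self.yaw) + y * math.cos(self.yaw)
                    pts.append((self.cx + rx, self.cy + ry,
                                self.cz + sz * self.height))
        return pts

car = Cuboid(cx=12.0, cy=-3.5, cz=0.9,
             length=4.5, width=1.8, height=1.5, yaw=0.0)
print(len(car.corners()))  # 8
```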

3. Polygons

Sometimes objects in an image don’t fit well in a bounding box or 3D cuboid due to their shape, size, or orientation within the image. Developers may also want more precise annotation for objects such as cars in traffic images or landmarks and buildings in aerial images. In these cases, developers might opt for polygonal annotation.

With polygons, annotators place dots around the outer edge of the object they want to annotate, drawing connecting lines as they go, much like a connect-the-dots exercise. The area enclosed by the dots is then labeled using a predetermined set of classes, e.g., cars, bicycles, or trucks. When annotators are assigned more than one class to annotate, it is called multi-class annotation.
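A polygon label is just an ordered list of vertex coordinates plus a class. As a small sketch (with illustrative point values), the shoelace formula turns those vertices into the enclosed pixel area, a handy sanity check on annotator output:

```python
# A hypothetical polygon annotation: ordered (x, y) vertices plus a class.
car_outline = {"class": "car",
               "points": [(10, 10), (60, 10), (60, 40), (10, 40)]}

def polygon_area(points):
    """Area enclosed by an ordered list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

print(polygon_area(car_outline["points"]))  # 50 * 30 = 1500.0
```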

4. Lines and Splines

While lines and splines can be used for a variety of purposes, they’re mainly used to train machines to recognize lanes and boundaries. As the name suggests, annotators simply draw lines along the boundaries the machine needs to learn.

Lines and splines can be used to train warehouse robots to accurately place boxes in a row, or items on a conveyor belt. However, the most common application of lines and splines annotation is autonomous vehicles. By annotating road lanes and sidewalks, the autonomous vehicle can be trained to understand boundaries and stay in one lane without veering.
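A line or spline annotation typically reduces to an ordered list of points along the boundary. As a minimal sketch with illustrative coordinates, summing the segment lengths of such a polyline is a simple way to validate a lane-marking label:

```python
import math

# A hypothetical lane-marking annotation: ordered points along the boundary.
lane = [(0, 0), (3, 4), (3, 10)]

def polyline_length(points):
    """Total length of a polyline: the sum of its segment lengths."""
    return sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))

print(polyline_length(lane))  # 5.0 + 6.0 = 11.0
```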

5. Semantic Segmentation

Whereas the previous examples on this list dealt with outlining the outer edges or boundaries of an object, semantic segmentation is much more precise and specific. Semantic segmentation is the process of associating every single pixel in an entire image with a tag. For projects requiring semantic segmentation, human annotators will usually be given a list of predetermined tags to choose from, with which they must tag everything within the image.

Using platforms similar to those used for polygonal annotation, annotators draw lines around a group of pixels they want to tag. This can also be done with AI-assisted platforms where, for example, the program can approximate the boundaries of a car but might make a mistake and include the shadow underneath the car in the segmentation.

In those cases, human annotators would use a separate tool to crop out the pixels that don’t belong. For example, with training data for autonomous vehicles, annotators might be given instructions like “Please segment everything in the image by roads, buildings, cyclists, pedestrians, obstacles, trees, sidewalks, and vehicles.”
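The result of semantic segmentation is a per-pixel class map the same size as the image. The tiny mask and class ids below are illustrative, but they show the key property: every pixel carries exactly one tag:

```python
# Hypothetical class ids for a street scene; real projects define their own.
CLASSES = {0: "road", 1: "sidewalk", 2: "vehicle"}

# A tiny 4x6 segmentation mask: one class id per pixel.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [1, 1, 0, 2, 2, 0],
    [1, 1, 0, 0, 0, 0],
]

def pixel_counts(mask):
    """Count how many pixels carry each class id."""
    counts = {}
    for row in mask:
        for cls in row:
            counts[cls] = counts.get(cls, 0) + 1
    return counts

print({CLASSES[c]: n for c, n in pixel_counts(mask).items()})
# {'sidewalk': 8, 'road': 12, 'vehicle': 4}
```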

Another common application of semantic segmentation is medical imaging. For anatomy and body part labeling, annotators would be given a picture of a person and told to tag each body part with its correct name. Semantic segmentation can also be used for highly specialized tasks like tagging brain lesions within CT scan images.

These are just five of the most common image annotation types used in machine learning and AI development.

If you have a project with unique requirements or are in need of custom AI training data, get in touch with Lionbridge AI to see how we can help you. Our multilingual community of over 1 million contributors can help you get the data you need when you need it.


Published at DZone with permission of Limarc Ambalina . See the original article here.

