
Region of Interest Pooling Explained

RoI pooling is used for object detection tasks, significantly speeds up both training and testing, and lets us reuse the feature map from the convolutional network.

Tomasz Grel · Mar. 06, 17 · Opinion

In this post, we’re going to say a few words about an interesting neural network layer called Region of Interest pooling (also known as RoI pooling), the implementation of which we’ve recently open-sourced (you can find it here). First, let’s start with some background.

Two major tasks in computer vision are object classification and object detection. In the first case, the system is supposed to correctly label the dominant object in an image. In the second case, it should provide correct labels and locations for all objects in an image. Of course, there are other interesting areas of computer vision, such as image segmentation, but today, we’re going to focus on detection.

In this task, we’re usually supposed to draw bounding boxes around any object from a previously specified set of categories and assign a class to each of them. For example, let’s say we’re developing an algorithm for self-driving cars and we’d like to use a camera to detect other cars, pedestrians, cyclists, etc. Our dataset might look like this.

In this case, we’d have to draw a box around every significant object and assign a class to it. This task is more challenging than classification tasks such as MNIST or CIFAR. On each frame of the video, there might be multiple objects — some of them are overlapping, and some are poorly visible or occluded. Moreover, for such an algorithm, performance can be a key issue. In particular, for autonomous driving, we have to process tens of frames per second.

So, how do we solve this problem?

Typical Architecture

The object detection architecture we’re going to be talking about today is broken down into two stages:

1. Region Proposal

Given an input image, find all possible places where objects can be located. The output of this stage should be a list of bounding boxes of likely positions of objects. These are often called region proposals or Regions of Interest. There are quite a few methods for this task, but we’re not going to talk about them in this post.

2. Final Classification

For every region proposal from the previous stage, decide whether it belongs to one of the target classes or to the background. Here we could use a deep convolutional network.

Object detection pipeline with Region of Interest pooling.

Usually, in the proposal phase, we have to generate a lot of regions of interest. Why? If an object is not detected during the first stage (region proposal), there’s no way to correctly classify it in the second stage. That’s why it’s extremely important for the region proposals to have high recall. That’s achieved by generating very large numbers of proposals (e.g., a few thousand per frame). Most of them will be classified as background in the second stage of the detection algorithm.

Some problems with this architecture are:

  • Generating a large number of regions of interest can lead to performance problems. This would make real-time object detection difficult to implement.
  • It’s suboptimal in terms of processing speed. More on this later.
  • You can’t do end-to-end training, i.e., you can’t train all the components of the system in one run (which would yield much better results).

That’s where Region of Interest pooling comes into play.

Region of Interest Pooling: Description

Region of Interest pooling is a neural-net layer used for object detection tasks. It was first proposed by Ross Girshick in April 2015 (the article can be found here) and it achieves a significant speedup of both training and testing. It also maintains a high detection accuracy. The layer takes two inputs:

  1. A fixed-size feature map obtained from a deep convolutional network with several convolutions and max-pooling layers.
  2. An N x 5 matrix representing a list of Regions of Interest, where N is the number of RoIs. The first column represents the image index and the remaining four are the coordinates of the top left and bottom right corners of the region.
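To make the second input more concrete, here is a minimal sketch of what such an N x 5 list might look like in plain Python. The values, the (x1, y1, x2, y2) column order, and the helper name `rois_for_image` are assumptions made for illustration; they are not taken from the open-sourced implementation.

```python
# Hypothetical RoI list: each row is (image_index, x1, y1, x2, y2).
# The coordinate order within a row is an assumption for this sketch.
rois = [
    (0, 0, 3, 7, 8),  # first image, first proposal
    (0, 2, 1, 6, 5),  # first image, second proposal
    (1, 4, 0, 8, 4),  # second image, first proposal
]


def rois_for_image(rois, image_index):
    """Return the (x1, y1, x2, y2) boxes that belong to one image."""
    return [row[1:] for row in rois if row[0] == image_index]


n_rois = len(rois)     # N, the number of RoIs
n_cols = len(rois[0])  # always 5: image index plus four coordinates
boxes_image_0 = rois_for_image(rois, 0)
```

Storing the image index in the first column lets a single list carry proposals for a whole batch of images.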

An image from the Pascal VOC dataset annotated with region proposals (the pink rectangles).

What does RoI pooling actually do? For every Region of Interest from the input list, it takes a section of the input feature map that corresponds to it and scales it to some pre-defined size (e.g., 7×7). The scaling is done by:

  1. Dividing the region proposal into roughly equal-sized sections (the number of which is the same as the dimension of the output).
  2. Finding the largest value in each section.
  3. Copying these max values to the output buffer.

The result is that from a list of rectangles with different sizes we can quickly get a list of corresponding feature maps with a fixed size. Note that the dimension of the RoI pooling output depends neither on the size of the input feature map nor on the size of the region proposals. It’s determined solely by the number of sections we divide the proposal into.
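The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the open-sourced TensorFlow implementation: the function name `roi_max_pool` is hypothetical, the bottom right corner is treated as exclusive, section boundaries are rounded down, and the region is assumed to be at least as large as the output grid in both dimensions.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a 2-D feature map to an out_h x out_w grid.

    feature_map: list of rows, each a list of numbers.
    roi: (x1, y1, x2, y2); the bottom right corner is exclusive (assumption).
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    output = []
    for i in range(out_h):
        # Row range of this section; integer division copes with
        # regions that do not divide evenly.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ((i + 1) * roi_h) // out_h
        out_row = []
        for j in range(out_w):
            # Column range of this section.
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ((j + 1) * roi_w) // out_w
            section = [feature_map[r][c]
                       for r in range(r0, r1)
                       for c in range(c0, c1)]
            out_row.append(max(section))  # largest value in the section
        output.append(out_row)
    return output


# Hypothetical 8x8 feature map whose value at (row, col) is row * 8 + col.
feature_map = [[row * 8 + col for col in range(8)] for row in range(8)]

pooled = roi_max_pool(feature_map, (0, 3, 7, 8), 2, 2)
# pooled == [[34, 38], [58, 62]]
```

Whatever the size of the region passed in, the result always has `out_h` rows and `out_w` columns, which is what allows a fixed-size classifier to follow this layer.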

What’s the benefit of RoI pooling? One of them is processing speed. If there are multiple object proposals on the frame (and usually there’ll be a lot of them), we can still use the same input feature map for all of them. Since computing the convolutions at early stages of processing is very expensive, this approach can save us a lot of time.

Region of Interest Pooling: Example

Let’s consider a small example to see how it works. We’re going to perform Region of Interest pooling on a single 8×8 feature map, one Region of Interest and an output size of 2×2. Our input feature map looks like this:

Region of interest pooling example (input feature map)

Let’s say we also have a region proposal (top left, bottom right coordinates): (0,3), (7,8). In the picture, it would look like this:

Region of interest pooling example (region proposal)

Normally, there’d be multiple feature maps and multiple proposals for each of them, but we’re keeping things simple for the example.

By dividing it into 2×2 sections (because the output size is 2×2) we get:

Region of interest pooling example (pooling sections)

Notice that the size of the Region of Interest doesn’t have to be perfectly divisible by the number of pooling sections (in this case, our RoI is 7×5 and we have 2×2 pooling sections).
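To see how a 7×5 region can be split into 2×2 sections, here is a small sketch that computes the section boundaries along each axis. Rounding the boundaries down is one plausible choice and is an assumption here; other implementations may round differently. The helper name `section_bounds` is hypothetical.

```python
def section_bounds(length, n_sections):
    """Split the range 0..length into n_sections (start, end) pairs.

    Boundaries are rounded down, so later sections absorb the remainder.
    """
    return [((k * length) // n_sections, ((k + 1) * length) // n_sections)
            for k in range(n_sections)]


col_bounds = section_bounds(7, 2)  # width 7 split into 2 sections
row_bounds = section_bounds(5, 2)  # height 5 split into 2 sections

col_widths = [end - start for start, end in col_bounds]
row_heights = [end - start for start, end in row_bounds]
# col_widths == [3, 4] and row_heights == [2, 3]
```

With this rounding the four sections measure 3×2, 4×2, 3×3, and 4×3 cells (width × height), so together they cover the whole region without overlapping.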

The max values in each of the sections are:

Region of interest pooling example (output)

And that’s the output from the Region of Interest pooling layer. Here’s our example presented in the form of an animation:

Region of interest pooling (animation)

What are the most important things to remember about RoI pooling?

  • It’s used for object detection tasks.

  • It allows us to reuse the feature map from the convolutional network.

  • It can significantly speed up both training and testing.

  • It allows for the training of object detection systems in an end-to-end manner.

If you need an open-source implementation of RoI pooling in TensorFlow, you can find our version here.

In the next post, we’re going to show you some examples of how to use Region of Interest pooling with Neptune and TensorFlow.


Published at DZone with permission of Tomasz Grel, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
