Exploring Deep Learning Models for Compression and Acceleration

Let's take a look at compressing and accelerating deep learning models, including extremely low-bit neural networks and extremely sparse networks.

By Leona Zhang · Sep. 17, 18 · Analysis

1. Compression and Acceleration With Deep Learning Models

As the scale of a deep learning network increases, its computational complexity increases accordingly, which severely limits its application on smart devices such as mobile phones. For example, running large, complex network models such as VGGNet or residual networks on an end device is simply not practical.


Therefore, we need to compress and accelerate deep learning models. We describe two major compression approaches below.

1.1 Extremely Low Bit Neural Networks

The Low Bit model refers to compressing continuous weights into discrete, low-precision weights. As shown in the image below, the parameters of the original deep learning network are floating-point variables that each require 32 bits of storage. If we convert them into a representation with only three values (0, +1, -1), each weight needs just 2 bits, which significantly compresses the storage space. It also avoids multiplication operations: because only the sign of each weight matters, multiplications become additions and subtractions, thereby increasing computation speed.

[Image: 32-bit floating-point weights quantized to the ternary values -1, 0, and +1]
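As a rough illustration (a minimal sketch with an assumed threshold value, not the ADMM-based method the referenced paper actually uses), the following Python snippet shows what ternary quantization looks like and why a dot product with ternary weights needs no multiplications:

```python
import numpy as np

# Minimal sketch: map float32 weights to {-1, 0, +1} with a simple threshold.
# (Illustrative only; the paper derives the quantized weights via ADMM.)
def ternarize(weights, threshold=0.05):
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > threshold] = 1
    q[weights < -threshold] = -1
    return q

def ternary_dot(q_weights, x):
    # No multiplications: activations are added where the weight is +1
    # and subtracted where it is -1; zero weights are skipped entirely.
    return x[q_weights == 1].sum() - x[q_weights == -1].sum()

w = np.random.randn(8).astype(np.float32)   # 8 weights at 32 bits each
q = ternarize(w)                            # 8 weights at 2 bits each when packed
x = np.random.randn(8).astype(np.float32)
print(q, ternary_dot(q, x))
```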

Here is a reference paper on the Low Bit model: Extremely Low Bit Neural Networks: Squeeze the Last Bit Out with ADMM.

Next, we will use a binary network as an example to explain the compression process. First, suppose that the objective function of the original neural network is f(W), and the constraint is that the network parameters must lie within a set C. If C is {-1, +1}, then the network is a binary network, as below:

[Image: minimize f(W) subject to W ∈ C]

Here, we introduce ADMM (Alternating Direction Method of Multipliers), a method for distributed and constrained optimization, to solve this discrete, non-convex constrained optimization problem. It takes the following form:

[Image: the ADMM standard form — minimize f(x) + g(z) subject to Ax + Bz = c]

ADMM applies when the objective function is f(x) + g(z) and the constraint is Ax + Bz = c. First, we write the augmented Lagrangian function, and then solve the problem by alternately updating x, z, and y:

[Image: the augmented Lagrangian and the alternating x, z, y updates]
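Since the original figure is not available here, the textbook form of the augmented Lagrangian and the alternating updates (standard ADMM, written out in LaTeX for clarity) is:

```latex
L_\rho(x, z, y) = f(x) + g(z) + y^\top (Ax + Bz - c) + \frac{\rho}{2}\,\lVert Ax + Bz - c \rVert_2^2

x^{k+1} = \arg\min_x \; L_\rho(x, z^k, y^k)
z^{k+1} = \arg\min_z \; L_\rho(x^{k+1}, z, y^k)
y^{k+1} = y^k + \rho\,(A x^{k+1} + B z^{k+1} - c)
```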

That is, we first minimize over x and z, and then update y accordingly. The above is the standard ADMM procedure. Next, let's see how we can convert the Low Bit neural network problem into an ADMM problem. First, we need to introduce the indicator function, the form of which is as follows:

[Image: the indicator function — 0 when the parameters lie in C, positive infinity otherwise]

With it, the objective function of the binary neural network is equivalent to the sum of the original objective function and the indicator function.

[Image: the objective rewritten as f(W) plus the indicator function of C]

This means that when the parameters belong to C, the combined objective equals the original objective. When the parameters do not belong to C, the indicator function is positive infinity, so the optimization first drives the parameters back into C.

Then, we need to introduce a consistency constraint. We introduce an auxiliary variable G and constrain W = G, so the objective function is equivalent to:

[Image: the equivalent objective with the auxiliary variable G and the constraint W = G]
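Written out (a reconstruction from the description above, with I_C denoting the indicator function of the constraint set C), the reformulated problem is:

```latex
\min_{W, G} \; f(W) + I_C(G) \quad \text{s.t.} \quad W = G,
\qquad
I_C(G) =
\begin{cases}
0 & G \in C \\
+\infty & \text{otherwise}
\end{cases}
```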

After adding this auxiliary variable, we can transform the optimization problem of the binary neural network into the standard ADMM form. Next, we write the augmented Lagrangian and use the ADMM algorithm to reach the optimization goal, as described below:

[Image: the augmented Lagrangian of the constrained problem and the resulting ADMM updates]
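For intuition, applying the generic (scaled-form) ADMM recipe above to this problem gives roughly the following iteration. This is a sketch, not a verbatim reproduction of the paper's equations; λ is the scaled dual variable and Proj_C denotes projection onto the discrete set C:

```latex
W^{k+1} = \arg\min_W \; f(W) + \frac{\rho}{2}\,\lVert W - G^k + \lambda^k \rVert_2^2
G^{k+1} = \operatorname{Proj}_C\!\left(W^{k+1} + \lambda^k\right)
\lambda^{k+1} = \lambda^k + W^{k+1} - G^{k+1}
```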

Aside from the binary network described above, there are also the following common parameter spaces:

[Image: larger parameter spaces that also include the values ±2, ±4, and ±8]

Even after adding 2, 4, and 8 as values in the parameter space, there is still no need for multiplication. Because these values are powers of two, multiplying by them reduces to shift operations, so all multiplications in the neural network are replaced with shift and add operations.
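A quick illustration of this point (an assumed toy example, not taken from the paper): for a fixed-point activation, multiplying by a power-of-two weight is just a bit shift.

```python
# Multiplying an integer (fixed-point) activation by a power-of-two weight
# is equivalent to a bit shift, so no multiplier hardware is needed.
x = 13
assert x * 2 == x << 1   # weight +2 -> shift left by 1
assert x * 4 == x << 2   # weight +4 -> shift left by 2
assert x * 8 == x << 3   # weight +8 -> shift left by 3
# A negative weight such as -4 just negates the shifted result: a subtraction.
assert x * -4 == -(x << 2)
```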

The following tables show the final results of applying the above Low Bit model to ImageNet classification:

[Tables 1 and 2: ImageNet classification results for AlexNet, VGG-16, ResNet-18, and ResNet-50 at different bit widths]

Table 1 shows the results of the algorithm on AlexNet and VGG-16. You will find that the algorithm performs well in both the binary and the ternary settings. Furthermore, the classification results of the ternary network are nearly lossless compared with full-precision classification. Table 2 shows the results of the algorithm on ResNet-18 and ResNet-50; they are similar to those in Table 1.

For detection tasks, the algorithm also remains highly usable, as you can see in the following table:

[Table: detection results on Pascal VOC 2007]

We drew the data for this experiment from Pascal VOC 2007. According to the table above, the loss in detection accuracy in the three-value (ternary) parameter space is almost negligible compared to the full-precision parameter space.

1.2 Extremely Sparse Networks

A sparse neural network is one in which most of the parameters are zero. We can then store the parameters with a simple compression scheme, such as run-length coding, to reduce the parameter storage space significantly. And because zero-valued weights can be skipped during computation, a large amount of computation is also saved, which greatly increases computation speed. In sparse networks, the optimization objective stays the same as above, but the constraint changes as follows:

[Image: the same objective f(W), now with a sparsity constraint on the parameters]

We compute the gradient of f(W) and use it in successive iterations. In each iteration, we prune connections according to the criterion that the smaller a parameter's magnitude, the less important it is. By setting the small parameters to zero, we maintain sparsity.

[Image: iterative pruning — after each update, the smallest weights are set to zero]
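A minimal sketch of one such magnitude-based pruning step (an illustration with an assumed target sparsity, not the exact procedure used in the experiments):

```python
import numpy as np

def prune_smallest(weights, sparsity=0.9):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(sparsity * weights.size)
    threshold = np.sort(np.abs(weights).ravel())[k]
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

w = np.random.randn(1000)
w_sparse = prune_smallest(w, sparsity=0.9)
print((w_sparse == 0).mean())   # roughly 0.9
```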

However, there is an obvious problem with the above solution, as we have described in the following figure:

[Image: w1 is closer to zero than w2, yet zeroing w1 increases the loss more]

Here, w1 is closer to 0 than w2; however, setting w1 to zero increases the loss more. Therefore, when determining the importance of a weight, we should consider its magnitude and its slope (gradient) simultaneously: set a weight to zero only if both its value and its slope are small. Based on this criterion, we conducted a sparsification experiment on AlexNet and GoogLeNet, as shown below:

[Image: per-layer sparsity achieved on AlexNet and GoogLeNet]

We can see from the above image that, whether the layers are purely convolutional or fully connected, the networks can reach a sparsity of around 90%.

1.3 Comparison of Experiment Results

We have described the sparsification and quantization methods above. In Experiment 1, we applied both methods to AlexNet at the same time. The results are as follows:

[Image: accuracy and compression for AlexNet with combined pruning and quantization]

We can see from the above image that at 3 bits, with sparsity of 90% or more, the loss of accuracy is almost negligible, while the compression rate exceeds 82x.

In Experiment 2, we applied both methods to ImageNet and Pascal VOC, where P denotes pruning (sparsification) and Q denotes quantization. From the results in the figure, we can see that the accuracy loss throughout the experiment was minimal, and inference speed on ImageNet improved significantly. On Pascal VOC, the model reaches a sparsity of 88.7% with 3-bit quantization and a compression ratio of 40x, which is only a one-point drop in mAP from the full-precision network.

[Image: results of combined pruning and quantization on ImageNet and Pascal VOC]


2. Training Platforms

We built the Gauss training platform based on the above two methods. The Gauss training platform currently supports several common types of training tasks (such as facial recognition, OCR, classification, and supervision) and models (such as CNNs and LSTMs). It also supports multi-machine training and can be configured with as few parameters as possible, reducing the effort required from the user.

[Image: overview of the Gauss training platform]

At the same time, the Gauss training platform supports two types of model training tools: data-dependent and data-independent. Data-dependent tools require users to provide training data; training takes longer, and they suit scenarios with demanding compression and acceleration requirements. Data-independent tools require no user-supplied training data and offer one-click processing that completes within seconds.

3. Highly Efficient Forward Inference Tools

Even with the training platform in place, actually using a model still requires efficient forward inference tools. Low-bit matrix computation can be implemented quickly with low-precision matrix calculation tools such as AliNN and BNN. After implementation, the inference tool runs 2-5 times faster than competing products on the ARM platform and about three times faster on the Intel platform.

[Image: forward inference speed comparison on ARM and Intel platforms]
