
Deep Learning Achievements of 2017 (Part 2)


As we wrap up our deep learning review of 2017, let's look at reinforcement learning, notable news from the year, and some other miscellaneous developments.


In Part 1, we looked at text, voice, and computer vision advancements in 2017. In Part 2, we'll look at reinforcement learning, news, and more!

Reinforcement Learning

Reinforcement learning (RL) is one of the most interesting and most actively developing approaches to machine learning.

The essence of the approach is that an agent learns successful behavior from experience, in an environment that rewards it for its actions, just as people learn throughout their lives.

RL is actively used in games, robotics, and systems management (traffic control, for example).
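To make this loop concrete, here is a minimal sketch of tabular Q-learning, the ancestor of the deep RL methods discussed below. It assumes a toy environment with discrete states and a Gym-style reset()/step() interface; all hyperparameters are illustrative:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn action values from reward alone."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # Move Q(s, a) toward the reward plus discounted future value.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```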

Of course, everyone has heard about AlphaGo's victories over the best Go professionals. The researchers used RL for training: the bot played against itself to improve its strategies.

Reinforcement Learning With Unsupervised Auxiliary Tasks

In previous years, DeepMind had learned to play arcade games better than humans using DQN. Now, algorithms are being taught to play more complex games like Doom.

Much of the attention is paid to speeding up learning, because accumulating the agent's experience of interacting with the environment requires many hours of training on modern GPUs.

DeepMind reported that introducing additional losses (auxiliary tasks), such as predicting frame changes (pixel control) so that the agent better understands the consequences of its actions, significantly speeds up learning.
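Schematically, the trick is just to add the auxiliary term to the main RL loss and minimize them jointly. A minimal sketch (the mean-squared pixel-control loss and the weight below are illustrative, not DeepMind's exact formulation):

```python
import torch

def total_loss(rl_loss, predicted_pixel_change, actual_pixel_change,
               aux_weight=0.05):
    """Combine the main RL loss with an auxiliary pixel-control loss.

    The auxiliary term asks the network to predict how its actions
    change the pixels of the frame, which provides a dense training
    signal even when the environment's reward is sparse.
    """
    aux_loss = torch.mean((predicted_pixel_change - actual_pixel_change) ** 2)
    return rl_loss + aux_weight * aux_loss
```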

Learning Robots

OpenAI has been actively studying the training of agents by humans in a virtual environment, which is safer for experiments than real life.

In one of the studies, the team showed that one-shot learning is possible: a person shows in VR how to perform a certain task, and one demonstration is enough for the algorithm to learn it and then reproduce it in real conditions.

If only it were so easy with people. :)
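The published system uses a more sophisticated architecture with soft attention over the demonstration, but as a rough stand-in, here is a sketch of plain behavioral cloning on the (observation, action) pairs recorded from a single demonstration (all names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

def clone_from_demo(observations, actions, obs_dim, act_dim, epochs=200):
    """Fit a policy to imitate one recorded demonstration.

    observations: tensor of shape (T, obs_dim)
    actions:      tensor of shape (T, act_dim)
    """
    policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                           nn.Linear(64, act_dim))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        # Regress the policy's actions onto the demonstrated ones.
        loss = nn.functional.mse_loss(policy(observations), actions)
        loss.backward()
        opt.step()
    return policy
```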

Learning About Human Preferences

Here is related work from OpenAI and DeepMind on the same topic. The idea is that an agent has a task, and the algorithm presents two possible solutions to a human, who indicates which one is better. The process is repeated iteratively, and with about 900 bits of feedback (binary labels) from the human, the algorithm learned how to solve the problem.
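At the heart of the method is a reward model fit to the human's binary choices. Here is a minimal sketch of that comparison loss, assuming the predicted rewards of the two trajectory segments have already been summed:

```python
import torch

def preference_loss(reward_a, reward_b, human_prefers_a):
    """Loss for learning a reward model from binary human feedback.

    reward_a, reward_b: summed predicted rewards of the two trajectory
    segments shown to the human (tensors of shape (batch,)).
    human_prefers_a: 1.0 where the human picked segment A, else 0.0.
    """
    # Probability the model assigns to "A is better".
    p_a = torch.sigmoid(reward_a - reward_b)
    # Standard binary cross-entropy against the human's choice.
    return -torch.mean(human_prefers_a * torch.log(p_a)
                       + (1 - human_prefers_a) * torch.log(1 - p_a))
```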

As always, the human must be careful and think about what he or she is teaching the machine. For example, an evaluator decided that the algorithm really wanted to grasp an object when, in fact, it was only simulating that action.

Movement in Complex Environments

Here is another study from DeepMind. To teach a robot complex behaviors (walking, jumping, etc.), and even to do so in a human-like way, you normally have to be heavily involved in choosing the loss function that encourages the desired behavior. It would be preferable for the algorithm to learn complex behavior on its own from simple rewards.

The researchers managed to achieve this: they taught agents (body emulators) to perform complex actions by constructing a complex environment with obstacles and giving a simple reward for progress in movement.
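To give a feel for how simple such a reward can be, here is a hedged sketch: pay the agent for forward progress and penalize falling (the threshold and penalty are made up for illustration):

```python
def progress_reward(x_before, x_after, torso_height, fall_threshold=0.3):
    """Reward forward motion; the complex gait emerges from the obstacles
    in the environment, not from the reward itself."""
    reward = x_after - x_before          # distance gained this step
    if torso_height < fall_threshold:    # crude "has fallen" check
        reward -= 1.0
    return reward
```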

You can watch the impressive video of the results. However, it's much more fun to watch it with superimposed sound!

Finally, I will give a link to the recently published RL algorithms from OpenAI. You can now use solutions that are more advanced than the standard DQN.


Cooling the Data Center

In July 2017, Google reported that it had taken advantage of DeepMind's machine learning developments to reduce the energy costs of its data centers.

Based on information from thousands of sensors in the data center, Google developers trained an ensemble of neural networks to predict PUE (power usage effectiveness) and to manage the data center more efficiently. This is an impressive and significant example of the practical application of ML.
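As a rough illustration of the setup (not Google's actual system), here is a sketch of an ensemble of regressors that predicts PUE from sensor readings and averages their outputs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_pue_ensemble(sensor_data, pue_values, n_models=5):
    """Train several networks on sensor readings and keep them all.

    sensor_data: array of shape (n_samples, n_sensors)
    pue_values:  array of shape (n_samples,)
    """
    return [MLPRegressor(hidden_layer_sizes=(64, 64),
                         random_state=i, max_iter=500)
            .fit(sensor_data, pue_values)
            for i in range(n_models)]

def predict_pue(models, sensor_data):
    # Averaging the ensemble's predictions reduces variance.
    return np.mean([m.predict(sensor_data) for m in models], axis=0)
```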

One Model for All Tasks

As you know, trained models transfer poorly from task to task, since a specific model has to be trained for each task. A small step towards the universality of models was made by Google Brain in its paper One Model to Learn Them All.

The researchers trained a model that performs eight tasks from different domains (text, speech, and images): translation between languages, text parsing, and image and speech recognition.

To achieve this, they built a complex network architecture with various blocks to process different input data and generate results. The blocks for the encoder/decoder fall into three types: convolutional, attention, and gated mixture of experts (MoE).
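Of the three block types, the gated mixture of experts is probably the least familiar. Here is a minimal dense-gating sketch of the idea; the real model uses sparse top-k gating over far more experts:

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Each input is processed by a weighted mix of expert networks."""
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)  # decides which experts to use

    def forward(self, x):  # x: (batch, dim)
        weights = torch.softmax(self.gate(x), dim=-1)   # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts],
                              dim=-1)                   # (batch, dim, n_experts)
        # Weighted sum of expert outputs, per input.
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)
```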

Main results of learning:

  • Almost perfect models were obtained (even though the authors did not fine-tune the hyperparameters).
  • Knowledge transfers between different domains: on tasks with a lot of data, performance is almost the same as with dedicated models, and on small tasks (for example, parsing), it is better.
  • Blocks needed for different tasks do not interfere with each other and sometimes even help; for example, the MoE block helps on the ImageNet task.

By the way, this model is available in tensor2tensor.

Learning on ImageNet in One Hour

In this post, Facebook explained how its engineers were able to train the ResNet-50 model on ImageNet in just one hour. Truth be told, this required a cluster of 256 GPUs (Tesla P100).

They used Gloo and Caffe2 for distributed training. To make the process efficient, it was necessary to adapt the training strategy to a huge batch size (8,192 elements): gradient averaging, a warm-up phase, a special learning rate, etc.
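Two of those tricks are easy to show in code: the linear scaling rule (multiply the base learning rate by the batch-size ratio) and a gradual warm-up over the first few epochs. A sketch with illustrative base values, roughly following the recipe in the paper:

```python
def learning_rate(epoch, step_in_epoch, steps_per_epoch,
                  base_lr=0.1, base_batch=256, batch=8192,
                  warmup_epochs=5):
    """Linear-scaling rule with a gradual warm-up phase."""
    target_lr = base_lr * batch / base_batch   # linear scaling rule
    if epoch < warmup_epochs:
        # Ramp linearly from base_lr up to target_lr during warm-up
        # to avoid diverging early with the huge batch.
        progress = (epoch + step_in_epoch / steps_per_epoch) / warmup_epochs
        return base_lr + progress * (target_lr - base_lr)
    return target_lr
```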

As a result, it was possible to achieve 90% scaling efficiency when going from 8 to 256 GPUs. Now, researchers at Facebook can experiment even faster, unlike mere mortals without such a cluster.


Self-Driving Cars

The self-driving car sphere is developing intensively, and cars are being actively tested. Among relatively recent events, we can note the purchase of Mobileye by Intel, the scandals around Uber and the technology stolen from Google by a former employee, the first death while using autopilot mode, and much more.

I will note one thing: Google Waymo is launching a beta program. Google is a pioneer in this field, and it is assumed that its technology is very good, because its cars have driven more than three million miles.

As for more recent events, self-driving cars have been allowed to travel across all U.S. states.


Healthcare

As I said, modern ML is beginning to be introduced into medicine. For example, Google collaborates with a medical center to help with diagnostics.

DeepMind has even established a separate healthcare unit.

This year, as part of the Data Science Bowl, a competition was held to predict lung cancer within a year based on detailed CT images, with a prize fund of one million dollars.


Investments

Currently, there are heavy investments in ML, just as there were before with big data. China is investing $150 billion in AI to become the world leader in the industry. For comparison, Baidu Research employs 1,300 people, while FAIR (Facebook AI Research) employs 80. At the latest KDD, Alibaba employees presented their parameter server KunPeng, which runs on 100 billion samples with a trillion parameters, "which becomes a common task."

You can draw your own conclusions, but it's never too late to study machine learning. One way or another, over time, all developers will use machine learning, and it will become one of the common skills.
