Deep Reinforcement Learning: Addressing Complex Enterprise Challenges
Explore reinforcement learning and some complex enterprise challenges.
Current deep learning algorithms and methods are nowhere near the holy grail of “Artificial General Intelligence (AGI).”
Current algorithms lean more towards narrow learning, meaning they are good at learning and solving specific types of problems under specific conditions. They also require enormous amounts of data compared to humans, who can learn from relatively few encounters. Their ability to transfer learnings from one problem domain to another is limited as well.
Recently, reinforcement learning (RL) has been gaining popularity compared to other deep learning techniques. The buzz around reinforcement learning started with the advent of AlphaGo by DeepMind, which was built to play the very complex game of Go. The essence of RL is that it trains models through interaction with the environment, letting them learn from and correct their mistakes. Learning happens through a delayed, cumulative reward system: an agent selects an action, which acts on the environment to produce a state change, and the agent then takes the next best action based on the optimized delayed reward. The system retains the learning and recalls the best action when a similar circumstance arises.
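The interaction loop described above can be sketched with tabular Q-learning on a toy problem. The chain environment and hyperparameters below are illustrative assumptions, not taken from any real system:

```python
import random

random.seed(0)

# Toy environment: states 0..4 in a chain; action 1 moves right, 0 moves left.
# Reaching state 4 yields the (delayed) reward of 1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally, otherwise exploit the best known action
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q toward reward + discounted best future value
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The learned policy prefers action 1 (move right) in every non-terminal state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Note how the reward only arrives at the goal state, yet the discounted update propagates it backward so that earlier states learn which action eventually pays off.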
This ability of RL to improve and evolve without constant human or programmatic intervention makes it attractive for real-world problems like autonomous driving. The autonomous driving puzzle cannot be solved by conventional AI alone, which typically leverages computer vision using Convolutional Neural Networks (CNNs). Autonomous driving cannot be modeled as a supervised learning problem because of the strong interaction with the environment, including other vehicles, pedestrians, driver behavior, and road infrastructure. At an abstract level, an autonomous driving agent is an implementation of three sequential tasks: sense (recognize), plan, and control.
Figure 2: Autonomous Driving sequential tasks
The recognition problem has been solved with a high degree of accuracy thanks to advancements in computer vision. We can now detect pedestrians, curb space, free space between vehicles, and traffic signs with low computing power and high accuracy. Path planning is the most difficult piece of the puzzle: one needs to take a series of environment inputs and incorporate recognitions and predictions to chart future driving actions that maneuver the vehicle safely to its destination (reward) while avoiding accidents and delays (penalties). The control task is relatively easy, as it simply involves passing the signal to speed (brake, accelerator) or direction (steering) controls.
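The sense-plan-control decomposition can be illustrated with a deliberately simplified pipeline. Every type, threshold, and actuator value below is a made-up stand-in; real stacks use sensor fusion, trajectory planners, and low-level controllers:

```python
from dataclasses import dataclass

@dataclass
class Perception:
    obstacle_ahead: bool
    distance_m: float

def sense(camera_frame) -> Perception:
    # Stand-in for a CNN-based detector (the "recognize" stage)
    return Perception(obstacle_ahead=camera_frame["obstacle"],
                      distance_m=camera_frame["distance"])

def plan(p: Perception) -> str:
    # Stand-in for path planning: pick a high-level maneuver
    if p.obstacle_ahead and p.distance_m < 30.0:
        return "brake"
    return "cruise"

def control(maneuver: str) -> dict:
    # Stand-in for actuation: translate the plan into actuator signals
    if maneuver == "brake":
        return {"brake": 0.8, "throttle": 0.0}
    return {"brake": 0.0, "throttle": 0.3}

frame = {"obstacle": True, "distance": 12.0}  # assumed sensor reading
signal = control(plan(sense(frame)))
print(signal)  # {'brake': 0.8, 'throttle': 0.0}
```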
What makes RL so attractive and suitable for autonomous driving is the fact that driving is a multi-player, multi-state problem that involves implicit negotiations and interactions. There can literally be thousands of combinations while entering or exiting a freeway ramp or negotiating a crowded roundabout. The driver temperament, skill level, and experience level cannot be programmed with supervised learning. Through exploration and exploitation techniques, RL can be a great tool for boundary cases, as it can learn from its own experiences and actions that lead to a reward. RL, in a way, closely mimics human decision making — it is like learning to ride a bicycle by trial and error. Mathematically, this state model is best explained with the Markov Decision Process (MDP).
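The MDP framing can be made concrete with value iteration on a tiny, driving-flavored decision problem. The states, transition probabilities, and rewards below are purely illustrative assumptions:

```python
# Tiny MDP: P maps (state, action) to a list of
# (probability, next_state, reward) transitions.
P = {
    ("approach", "yield"): [(1.0, "merged", 1.0)],
    ("approach", "force"): [(0.7, "merged", 1.0), (0.3, "collision", -10.0)],
    ("merged", "cruise"):  [(1.0, "merged", 0.0)],
    ("collision", "stop"): [(1.0, "collision", 0.0)],
}
gamma = 0.9
states = {"approach", "merged", "collision"}
V = {s: 0.0 for s in states}

# Value iteration: repeatedly apply the Bellman backup
# V(s) = max_a sum_{s'} p * (r + gamma * V(s'))
for _ in range(50):
    for s in states:
        actions = [a for (st, a) in P if st == s]
        V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                   for a in actions)

# Greedy action at the merge: expected value of "force" is dragged down
# by the 30% collision penalty, so "yield" wins
best = max((a for (st, a) in P if st == "approach"),
           key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[("approach", a)]))
print(best)  # yield
```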
Advancements in reinforcement learning are slowly addressing some of the challenges of huge training data requirements and intense computing needs. There are new advancements in the DQN (Deep Q-Network, where Q mathematically models the expected cumulative reward of taking a given action in a given state), where an AI agent can learn to drive just by observing synthetic scenes with virtually simulated miles. The amazing thing is that this learning can happen without much prior information about actual physically driven miles. DQNs currently do have some limitations, especially when it comes to dealing with a high-dimensional observation space like autonomous driving, which is a continuous domain. Significant progress is being made in this space with Google DeepMind's innovations with the Deep Deterministic Policy Gradient (DDPG) algorithm to address these limitations.
True Level 4+ autonomous driving as defined by SAE is still many years away. What is in the imminent future is shared mobility and autonomous technology working in concert with humans. The technological advancements will address the use cases of driver safety, enhanced V2X connectivity, and autonomous driving under prescribed conditions such as on a stretch of freeway.
There are quite a few other industries where RL can be a game changer. As another example, robots on factory floors today mostly operate on pre-defined paths and in confined areas. With RL leveraging the State-action-reward-state-action (SARSA) algorithm, robots can find and negotiate optimum paths with more available degrees of freedom on the factory floor. Robots can become an integrated part of the floor and co-exist safely with humans and other plant equipment. The RL-led, gaming-inspired advancements in obstacle avoidance can play an important role in providing a collision-free, safe, and secure mechanism for robots to navigate. Think of the opportunities it can open up where a robot brings material from a warehouse and loads it on the assembly line, all within an extended ERP and warehouse management system.
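SARSA differs from Q-learning in that it is on-policy: the update uses the action the agent actually chooses next, rather than the best possible one. A minimal sketch on a made-up factory corridor (all states, actions, and hyperparameters are illustrative assumptions):

```python
import random

random.seed(1)

# 1-D corridor for a factory robot: positions 0..5, goal (loading dock) at 5.
GOAL, ACTIONS = 5, [-1, +1]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

def choose(state):
    # Epsilon-greedy: the same policy being learned is also being followed
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = 0
    action = choose(state)
    while state != GOAL:
        next_state = min(GOAL, max(0, state + action))
        reward = 1.0 if next_state == GOAL else 0.0
        next_action = choose(next_state)
        # SARSA update: uses the action actually taken next (hence S,A,R,S,A)
        Q[(state, action)] += alpha * (
            reward + gamma * Q[(next_state, next_action)] - Q[(state, action)])
        state, action = next_state, next_action

print(max(ACTIONS, key=lambda a: Q[(0, a)]))  # learned to move toward the goal
```

Because the update tracks the policy actually followed, SARSA tends to learn more conservative paths than Q-learning, which is often the desirable behavior around humans and plant equipment.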
One of the factors democratizing deep learning adoption is the availability of abstraction libraries such as Keras. These libraries hide the mathematical complexities involved in various tensor operations and let you focus on model development, hyperparameter tuning, and model deployment to carry out predictions. To illustrate, the 9-line Keras code teaser snippet below builds a functioning reinforcement learning DQN with one hidden layer and an input layer with 12 neurons.
Figure 3: Keras code snippet for DQN implementation
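The original snippet appears as an image (Figure 3). A representative sketch along the lines the text describes, with an input layer of 12 neurons and one hidden layer, might look like the following; the state and action sizes are illustrative assumptions, and TensorFlow/Keras is assumed to be installed:

```python
from tensorflow import keras

state_size, action_size = 4, 2  # illustrative, e.g. a CartPole-like task

# Q-network: maps a state vector to one estimated Q-value per action
model = keras.Sequential([
    keras.Input(shape=(state_size,)),
    keras.layers.Dense(12, activation="relu"),   # input layer, 12 neurons
    keras.layers.Dense(12, activation="relu"),   # one hidden layer
    keras.layers.Dense(action_size, activation="linear"),  # Q-values
])
model.compile(loss="mse", optimizer=keras.optimizers.Adam(learning_rate=0.001))
```

A DQN training loop would repeatedly fit this network toward targets of the form reward + gamma * max Q(next_state), typically with experience replay.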
Open-source interfaces such as OpenAI Gym provide a suite of reinforcement learning tasks. They provide the environment where developers can bring their algorithms developed in a backend of their choice, whether it is TensorFlow or Theano. The maturity of platforms like CUDA, which further leverages GPU compute power, along with Tensor Processing Units (TPUs), neural network chips, etc., has contributed significantly to deep learning progress as well.
Enterprises will need a new mindset to fully exploit these emerging deep learning trends. At a minimum, they will need access to high-performance computing (HPC) environments that can support prototyping, simulations, transformations, rendering, visualizations, and training. Enterprises will need skilled resources that combine a wide range of engineering and computer science disciplines and are savvy with machine learning and data science concepts.
This important trend of moving away from rule-based toward AI-driven programming will further evolve into modeless programming with RL. I foresee challenges because RL neural networks are difficult to train due to extremely high training data needs. It takes a significant amount of time and resources to feed the interesting physically driven miles (driving segments that bring new scenarios and conditions) to the learning algorithms. An autonomous vehicle equipped with multiple Lidars, cameras, and other sensors can create petabytes of data in hours.
The trick is to create virtual miles and use modern simulation techniques to improve the accuracy of the predictions. I foresee GANs (Generative Adversarial Networks) becoming mainstream as a technology for creating synthetic data. GANs are dueling networks that are pitted against each other like two boxers. The generator part of the neural net creates fake data, and the discriminator part evaluates it for authenticity. Over time, the generator gets so good that the discriminator cannot differentiate between fake and real data. In the autonomous driving world, GANs can take an actual driving scenario, add different weather, lighting, and congestion conditions to create diverse scenarios, and produce synthetic scenes that are photorealistic enough to be used for training.
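The generator/discriminator duel can be sketched minimally with TensorFlow on made-up 1-D data. The network sizes, the target distribution, and the training schedule are all illustrative assumptions; real image GANs are vastly larger:

```python
import tensorflow as tf
from tensorflow import keras

tf.random.set_seed(0)
NOISE_DIM = 8

# Generator: turns random noise into a fake 1-D "sample"
generator = keras.Sequential([
    keras.Input(shape=(NOISE_DIM,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])

# Discriminator: scores how likely a sample is to be real
discriminator = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

g_opt, d_opt = keras.optimizers.Adam(1e-3), keras.optimizers.Adam(1e-3)
bce = keras.losses.BinaryCrossentropy()

for _ in range(50):
    noise = tf.random.normal((32, NOISE_DIM))
    real = tf.random.normal((32, 1), mean=4.0, stddev=0.5)  # stand-in "real" data
    # Discriminator step: push real samples toward label 1, fakes toward 0
    with tf.GradientTape() as tape:
        fake = generator(noise)
        d_loss = (bce(tf.ones((32, 1)), discriminator(real)) +
                  bce(tf.zeros((32, 1)), discriminator(fake)))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    # Generator step: try to make the discriminator label fakes as real
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones((32, 1)), discriminator(generator(noise)))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
```

The two opposing losses are the "boxers": the discriminator minimizes its classification error while the generator minimizes the discriminator's confidence in spotting fakes.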
Another challenge will be the verifiability and explainability of deep learning algorithms. This is still an area where a lot of research is happening. At the end of the day, the whole solution needs to be automotive grade and ASIL (Automotive Safety Integrity Level) compliant and provide traceability into each decision that an AI algorithm makes.
Another concern I have is that feature engineering, which re-shapes data using domain knowledge, is still a critical data science skill that is in short supply. There is no substitute for proper feature engineering to improve the accuracy of predictive models. Some of the modern AutoML platforms are getting smarter at discarding weak features and removing noise from the signal. The right ensembles of different models will continue to be very important. After all, you need a little bit of everything, like XGBoost and some amount of k-means, to provide the best predictions for real-world problems!
These exciting technologies will find their home in multiple domains and bring significant improvements to our quality of life and tackle some of the most demanding challenges of humanity.
About the author
Raman Mehta is the CIO at Visteon. Raman has earned several leadership awards including CIO magazine’s 2017, 2013 CIO 100 Award, Computerworld’s 2012 Premier 100 IT Leaders Award, and a Crain’s Detroit Business CIO award.