Your Self-Driving Car: How Did It Get So Smart?
Get a high-level look at the data management challenges and opportunities that surround autonomous vehicle research.
The march towards autonomous vehicles continues to accelerate. While expert opinions differ on the specific timing and use cases that will emerge first, few deny that self-driving cars are in our future. Not surprisingly, when reviewing big data strategies with my automotive clients, discussions on data management strategies for autonomous driving research inevitably surface.
A few weeks ago, at DataWorks Summit 2017 in Munich, I co-presented a “Big Data in Automotive” session with NorCom and Microsoft, two Hortonworks solution partners collaborating with Hortonworks on data management solutions for autonomous driving research. Particularly intriguing was a discussion I had with Dr. Tobias Abthoff from NorCom. Through our discussions, a more robust data management strategy for autonomous vehicle research emerged.
Autonomous Driving Research: What Is It?
As we all can appreciate, “teaching” a vehicle to drive under the full range of conditions it will encounter (e.g., road conditions, weather conditions, and the behavior of other traffic participants such as cars, trucks, or people) is a daunting proposition. If merely the thought of this makes you nervous, you’re not alone. According to the American Automobile Association (AAA), 75 percent of consumers are not yet ready to embrace self-driving cars. However, that is the very challenge facing automakers: teaching vehicles to unfailingly assess and respond to any combination of operational conditions “on-the-fly” through discrete rules (algorithms) governing a vehicle’s behavior.
Machines Learn, Too
Interestingly, humans and machines “learn” in similar ways. For any given situation, both humans and machines must first absorb experiences (data), followed by applying a set of rules (algorithms) that facilitate problem-solving. When outcomes are either positive or negative, we generally learn from the exercise.
Traditional Data Management Approaches Slow Autonomous Development
As it turns out, teaching cars to drive is an incredibly data-intensive endeavor. Traditional data management approaches are straining to cope with the demands imposed by autonomous driving research. Consider the challenges in the following two areas.
1. Data Storage Challenges
Autonomous test vehicles (those cars with the funky oversized cameras you’ve seen driving around) generate multiple terabytes of video, RADAR, LIDAR, and sensor data per vehicle, per day. For the large fleets of vehicles currently being tested, simple math suggests that automakers are now ingesting and storing multiple petabytes of autonomous car test data. Why store all of this data? The reason is simple: Automakers are attempting to capture data describing virtually every operating condition a vehicle may encounter (e.g., crossing an intersection in slippery conditions at night) and use this information to “teach” cars to drive.
Traditionally, vehicle test data has been stored in numerous Network Attached Storage (NAS) systems, often distributed in locations around the world. However, given the cost and performance limitations inherent with NAS-based systems, automakers are investigating more efficient storage solutions for autonomous vehicle research.
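The “simple math” behind the petabyte estimate can be made explicit with a back-of-envelope calculation. The sketch below is illustrative only: the fleet size, daily volume per vehicle, and number of test days are assumptions chosen for the example, not figures from any automaker, and `fleet_storage_pb` is a hypothetical helper name.

```python
# Back-of-envelope estimate of raw test data volume for a vehicle fleet.
# All input figures used below are illustrative assumptions.

TB_PER_PB = 1000  # decimal units: 1 PB = 1,000 TB


def fleet_storage_pb(vehicles: int, tb_per_vehicle_per_day: float, days: int) -> float:
    """Return the total raw data volume in petabytes."""
    if vehicles < 0 or tb_per_vehicle_per_day < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    total_tb = vehicles * tb_per_vehicle_per_day * days
    return total_tb / TB_PER_PB


if __name__ == "__main__":
    # Hypothetical fleet: 100 vehicles, 4 TB per vehicle per day, 250 test days.
    volume = fleet_storage_pb(vehicles=100, tb_per_vehicle_per_day=4.0, days=250)
    print(f"Estimated raw volume: {volume:.0f} PB")
```

Even with modest assumptions, the total lands well into the petabyte range, which is why storage cost per terabyte becomes a central design concern.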
2. Data Processing Challenges
Once all of this vehicle data is stored, how is it actually used to teach a car to drive? Several data processing steps are required. First, each frame of video (with associated RADAR, LIDAR, and sensor data) is analyzed to capture exactly what was “seen” (e.g., a person crossing an intersection) and cataloged, providing a library of driving scenario “inputs” from which engineers can develop rules (algorithms) that dictate how a vehicle should respond. Next, these algorithms must be tested via simulations, utilizing the real-world autonomous vehicle big data previously collected.
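To make the cataloging step more concrete, here is a minimal sketch of a scenario library: each analyzed frame carries labels describing what was “seen,” and an index maps every label back to the frames that contain it. The names `FrameAnnotation`, `build_scenario_index`, and `find_frames`, along with the labels themselves, are hypothetical and chosen for illustration; a production catalog would be far richer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameAnnotation:
    """What was detected in one frame of one recorded drive (hypothetical schema)."""
    drive_id: str
    frame_index: int
    labels: frozenset


def build_scenario_index(annotations):
    """Map each label to the (drive_id, frame_index) pairs where it appears."""
    index = defaultdict(list)
    for ann in annotations:
        for label in ann.labels:
            index[label].append((ann.drive_id, ann.frame_index))
    return dict(index)


def find_frames(index, *labels):
    """Return the frames tagged with every one of the given labels, sorted."""
    if not labels:
        return []
    matches = [set(index.get(label, [])) for label in labels]
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    catalog = [
        FrameAnnotation("drive-001", 12, frozenset({"pedestrian", "intersection"})),
        FrameAnnotation("drive-001", 13, frozenset({"intersection"})),
        FrameAnnotation("drive-002", 7, frozenset({"pedestrian", "night"})),
    ]
    index = build_scenario_index(catalog)
    print(find_frames(index, "pedestrian", "intersection"))
```

Engineers can then pull every recorded instance of a scenario, such as a pedestrian at an intersection, as input for developing and testing an algorithm.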
Traditional architectures are not optimized for the large-scale data processing workloads required for testing algorithms via simulations. Using traditional methods, vehicle driving data is stored in NAS-based solutions and transferred to workstations, where engineers test algorithms under development. This process introduces two fundamental challenges. First, large amounts of data must be moved, requiring considerable time and network bandwidth. Second, individual workstations do not provide the massive computing power required to return simulation results in a timely manner. Not surprisingly, automakers are seeking more efficient solutions.
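The first challenge, data movement, is easy to quantify. The sketch below estimates how long it takes to copy a data set from storage to a workstation. The data size, link speed, and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are assumptions for illustration, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move data_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of nominal bandwidth achieved in practice.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                       # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 50 TB of drive data over a 10 Gbit/s network link.
    print(f"{transfer_hours(50, 10):.1f} hours")
```

Under these assumptions, the copy alone takes many hours before a single simulation has run, and the cost recurs each time a different slice of data is needed.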
A Better Way: Accelerating the Pace of Autonomous Development
In our discussions, Dr. Abthoff shared with me a fascinating approach that NorCom has taken to address the data management challenges associated with autonomous vehicle research. The approach consists of the following two basic principles.
1. Data Storage Within Hadoop
Because it can store data of virtually unlimited size (petabytes and beyond) and variety (video, LIDAR, RADAR, sensor, etc.), the Hadoop Distributed File System (HDFS) provides a high-performance and cost-effective foundation for storing data associated with autonomous vehicle research.
2. Data Processing Within Hadoop
The NorCom approach also leverages Hadoop’s inherent capability to perform massively scalable MapReduce and Spark workloads, particularly useful for processing algorithm test simulations. By doing so, NorCom has rewritten the data processing playbook. Rather than moving data to the algorithms on workstations for data processing, this new method prescribes exactly the opposite approach by redeploying the algorithms to the data (Hadoop), where high-performance computing also resides.
With this approach, data processing performance improves substantially. Simulation test results that once required days can now be achieved in minutes, accelerating the pace of autonomous development. Complementary technologies extend these benefits still further. For example, equipping Hadoop nodes with Graphics Processing Units (GPUs) can dramatically accelerate simulation computations based on deep learning frameworks. In addition, container technologies such as Docker allow legacy applications that once could run only on workstations to be deployed directly on the high-performance Hadoop cluster, without the need to adapt the applications.
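The idea of sending the algorithm to the data, rather than the reverse, can be illustrated with a small map-and-reduce sketch in plain Python. This is not NorCom’s implementation and does not use the Hadoop or Spark APIs: a thread pool stands in for cluster nodes, each list in `partitions` stands in for a block of data held on one node, and `should_brake`, its two-second threshold, and the record fields are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def should_brake(record) -> bool:
    """Hypothetical algorithm under test: brake if the obstacle is under 2 s away."""
    if record["speed_mps"] <= 0:
        return False
    return record["obstacle_distance_m"] / record["speed_mps"] < 2.0


def evaluate_partition(partition) -> Counter:
    """Map step: run the algorithm where the records live, return a small summary."""
    summary = Counter()
    for record in partition:
        summary["total"] += 1
        summary["correct"] += int(should_brake(record) == record["expected_brake"])
    return summary


def evaluate_dataset(partitions) -> Counter:
    """Reduce step: merge the per-partition summaries into one result."""
    combined = Counter()
    with ThreadPoolExecutor() as pool:  # stand-in for nodes working in parallel
        for summary in pool.map(evaluate_partition, partitions):
            combined.update(summary)
    return combined


if __name__ == "__main__":
    partitions = [
        [{"speed_mps": 10, "obstacle_distance_m": 15, "expected_brake": True},
         {"speed_mps": 10, "obstacle_distance_m": 50, "expected_brake": False}],
        [{"speed_mps": 20, "obstacle_distance_m": 30, "expected_brake": False}],
    ]
    result = evaluate_dataset(partitions)
    print(f"{result['correct']} of {result['total']} decisions matched")
```

The point of the design is that only the small summaries cross the network, while the bulky sensor records stay where they are stored.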
Published at DZone with permission of Michael Ger. See the original article here.