Your Self-Driving Car: How Did It Get So Smart?

Get a high-level look at the data management challenges and opportunities that surround autonomous vehicle research.

By Michael Ger · Jul. 08, 17 · Opinion


The march towards autonomous vehicles continues to accelerate. While expert opinions differ on the specific timing and use cases that will emerge first, few deny that self-driving cars are in our future. Not surprisingly, when reviewing big data strategies with my automotive clients, discussions on data management strategies for autonomous driving research inevitably surface.

A few weeks ago, at DataWorks Summit 2017 in Munich, I co-presented a “Big Data in Automotive” session with NorCom and Microsoft, two solution partners collaborating with Hortonworks on data management solutions for autonomous drive research. Particularly intriguing was a discussion I had with Dr. Tobias Abthoff from NorCom. Through our discussions, a more robust data management strategy for autonomous vehicle research emerged.

Autonomous Driving Research: What Is It?

As we can all appreciate, “teaching” a vehicle to drive under the full range of conditions it will encounter (e.g., road conditions, weather conditions, and the behavior of other traffic participants such as cars, trucks, or people) is a daunting proposition. If the mere thought of this makes you nervous, you’re not alone: according to the American Automobile Association (AAA), 75 percent of consumers are not yet ready to embrace self-driving cars. Yet that is the very challenge facing automakers: teaching vehicles to unfailingly assess and respond to any combination of operating conditions “on the fly” through discrete rules (algorithms) governing the vehicle’s behavior.

Machines Learn, Too

Interestingly, humans and machines “learn” in similar ways. In any given situation, both must first absorb experiences (data) and then apply a set of rules (algorithms) to solve the problem at hand. Whether the outcome is positive or negative, we generally learn from the exercise.

Traditional Data Management Approaches Slow Autonomous Development

As it turns out, teaching cars to drive is an incredibly data-intensive endeavor. Traditional data management approaches are straining to cope with the demands imposed by autonomous driving research. Consider the challenges in the following two areas.

1. Data Storage Challenges

Autonomous test vehicles (those cars with the funky oversized cameras you’ve seen driving around) generate multiple terabytes of video, RADAR, LIDAR, and sensor data per vehicle, per day. For the large fleets of vehicles currently being tested, simple math suggests that automakers are now ingesting and storing multiple petabytes of autonomous car test data. Why store all of this data? The reason is simple: automakers are attempting to capture data describing virtually every operating condition a vehicle may encounter (e.g., crossing an intersection in slippery conditions at night) and use this information to “teach” cars to drive.
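
To get a feel for the scale, a quick back-of-the-envelope calculation illustrates how the petabytes accumulate. The fleet size, per-vehicle data rate, and number of test days below are hypothetical, not figures from any particular automaker:

    # Rough, illustrative estimate of raw test-data volume (all figures hypothetical).
    tb_per_vehicle_per_day = 2      # "multiple terabytes per vehicle, per day"
    fleet_size = 50                 # assumed number of test vehicles
    test_days_per_year = 200        # assumed driving days per year

    total_tb = tb_per_vehicle_per_day * fleet_size * test_days_per_year
    print(f"Raw data per year: {total_tb:,} TB (~{total_tb / 1024:.1f} PB)")
    # -> Raw data per year: 20,000 TB (~19.5 PB)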

Traditionally, vehicle test data has been stored in numerous Network Attached Storage (NAS) systems, often distributed in locations around the world. However, given the cost and performance limitations inherent with NAS-based systems, automakers are investigating more efficient storage solutions for autonomous vehicle research.

2. Data Processing Challenges

Once all of this vehicle data is stored, how is it actually used to teach a car to drive? Several data processing steps are required. First, each frame of video (with its associated RADAR, LIDAR, and sensor data) is analyzed to capture exactly what was “seen” (e.g., a person crossing an intersection) and cataloged, providing a library of driving scenario “inputs” from which engineers can develop rules (algorithms) that dictate how a vehicle should respond. Next, these algorithms must be tested via simulations, utilizing the real-world autonomous vehicle data previously collected.
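
As a simplified sketch of that cataloging step, each recorded frame can be tagged with the scenario elements it contains so that engineers can later query for relevant driving situations. The frame structure and the trivial detector below are hypothetical placeholders, not NorCom’s actual pipeline:

    # Minimal sketch of scenario cataloging: tag each frame with what was "seen"
    # so it can be indexed and queried later.
    def detect_objects(frame):
        """Hypothetical stand-in for a real perception model run over video/LIDAR/RADAR."""
        return frame.get("annotations", [])

    def catalog_frame(frame):
        return {
            "vehicle_id": frame["vehicle_id"],
            "timestamp": frame["timestamp"],
            "labels": detect_objects(frame),   # e.g. ["pedestrian", "intersection", "night"]
        }

    frames = [
        {"vehicle_id": "car-07", "timestamp": 1499472000.0,
         "annotations": ["pedestrian", "intersection", "night"]},
        {"vehicle_id": "car-07", "timestamp": 1499472000.1,
         "annotations": ["truck", "highway", "rain"]},
    ]

    catalog = [catalog_frame(f) for f in frames]
    night_crossings = [c for c in catalog
                       if {"pedestrian", "night"} <= set(c["labels"])]
    print(night_crossings)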

Traditional architectures are not optimized for the large-scale data processing workloads required for testing algorithms via simulations. Using traditional methods, vehicle driving data is stored in NAS-based solutions and transferred to workstations, where engineers test algorithms under development. This process introduces two fundamental challenges. First, large amounts of data must be moved, requiring considerable time and network bandwidth. Second, individual workstations do not provide the massive computing power required to return simulation results in a timely manner. Not surprisingly, automakers are seeking more efficient solutions.
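
The first of these bottlenecks is easy to quantify with a rough calculation. The dataset size and link speed below are hypothetical, but they show why shuttling driving data to workstations quickly becomes impractical:

    # Rough, illustrative transfer-time estimate (hypothetical volume and bandwidth).
    dataset_tb = 50        # assumed driving data needed for one simulation campaign
    link_gbps = 10         # assumed network bandwidth to the workstation

    dataset_bits = dataset_tb * 8 * 1024**4        # TB -> bits (binary units)
    seconds = dataset_bits / (link_gbps * 1e9)     # Gbps -> bits per second
    print(f"~{seconds / 3600:.1f} hours just to move the data")
    # -> ~12.2 hours just to move the data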

A Better Way: Accelerating the Pace of Autonomous Development

In discussions with Dr. Abthoff, he shared with me a fascinating approach that NorCom has taken to address the data management challenges associated with autonomous vehicle research. The approach consists of the following two basic principles.

1. Data Storage Within Hadoop

Through its ability to store data at virtually unlimited scale (petabytes and beyond) and of any variety (video, LIDAR, RADAR, sensor, etc.), the Hadoop Distributed File System (HDFS) provides a high-performance and cost-effective foundation for storing the data associated with autonomous vehicle research.
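
As a minimal sketch of what landing that data might look like, assuming a PySpark environment, decoded sensor records could be written to HDFS as Parquet. The paths, schema, and partitioning scheme are hypothetical choices for illustration:

    # Minimal PySpark sketch: land decoded sensor records in HDFS as Parquet,
    # partitioned by vehicle and recording date (paths and columns are hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-drive-logs").getOrCreate()

    records = spark.read.json("hdfs:///raw/drive_logs/2017-07-08/*.json")
    (records
        .write
        .mode("append")
        .partitionBy("vehicle_id", "recording_date")
        .parquet("hdfs:///autonomous/frames"))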

2. Data Processing Within Hadoop

The NorCom approach also leverages Hadoop’s inherent capability to run massively scalable MapReduce and Spark workloads, which is particularly useful for processing algorithm test simulations. In doing so, NorCom has rewritten the data processing playbook: rather than moving data to the algorithms on workstations, this new method prescribes exactly the opposite, redeploying the algorithms to the data in Hadoop, where high-performance computing also resides.
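
A hedged sketch of what “moving the algorithm to the data” can look like in practice: the candidate driving algorithm runs as a Spark job across the cluster, next to the frames stored in HDFS, instead of pulling those frames down to a workstation. The paths, schema, and the toy algorithm are hypothetical placeholders, not NorCom’s implementation:

    # Sketch of running an algorithm-under-test as a distributed Spark job over
    # cataloged frames in HDFS (schema and logic are illustrative only).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("algo-simulation").getOrCreate()
    frames = spark.read.parquet("hdfs:///autonomous/frames")

    def simulate(frame_row):
        """Hypothetical stand-in for the driving algorithm being evaluated."""
        decision = "brake" if "pedestrian" in (frame_row.labels or []) else "continue"
        return int(decision == frame_row.expected_action)

    pass_rate = frames.rdd.map(simulate).mean()
    print(f"Simulation pass rate: {pass_rate:.1%}")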

By leveraging this approach, data processing performance improves dramatically: simulation results that once required days can now be achieved in minutes, accelerating the pace of autonomous development. Complementary technologies extend these benefits still further. For example, equipping Hadoop nodes with Graphics Processing Units (GPUs) dramatically accelerates simulation computations based on deep learning frameworks. In addition, container technologies such as Docker make it possible to deploy legacy applications, once confined to workstations, directly on the high-performance Hadoop cluster without modification.
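
For the container piece, one way this can be wired up (assuming a Hadoop/YARN version with the Docker container runtime enabled; the image name is a hypothetical placeholder) is to ask YARN to launch Spark’s driver and executors inside a Docker image that packages the legacy simulation tool:

    # Hedged sketch: request Docker containers for Spark on YARN so a containerized
    # legacy tool runs next to the data. Requires YARN's Docker runtime to be enabled.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    image = "registry.example.com/legacy-sim:latest"   # hypothetical image
    conf = (SparkConf()
        .set("spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE", "docker")
        .set("spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", image)
        .set("spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE", "docker")
        .set("spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", image))

    spark = SparkSession.builder.appName("containerized-sim").config(conf=conf).getOrCreate()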

Big data · Data management · Data processing · Processing

Published at DZone with permission of Michael Ger. See the original article here.

Opinions expressed by DZone contributors are their own.
