Autonomous Cars, Big Data, and Edge Computing: What You Need to Know

Self-driving cars need to take in large amounts of data and process it almost instantly. Read on for an overview of this process.

by Arjuna Chala · Feb. 19, 19 · Analysis

This article is featured in the new DZone Guide to Big Data: Volume, Variety, and Velocity. Get your free copy for insightful articles, industry stats, and more!

The driverless car has been a high-tech dream for decades. Now that broadband connectivity, cloud computing, and artificial intelligence are increasingly available, autonomous cars should go mainstream in the near future, provided certain technical and regulatory milestones are reached. But one more issue must be addressed before self-driving cars can reach critical mass: data. Specifically, the data analysis and storage requirements of autonomous cars present challenges beyond the capabilities of most current big data solutions.

Autonomous cars generate a staggering amount of data. Intel estimated one car generates terabytes of data in eight hours of operation. Multiple image, radar/lidar, time-of-flight, accelerometer, telemetry, and gyroscope sensors generate data streams that must be analyzed in order to perform the calculations and adjustments required to safely navigate a car. That analysis needs to happen in real time if the car is to keep up with constantly changing driving conditions (other cars or pedestrians moving around the vehicle, changing weather and light conditions, traffic signs, and so on). These real-time performance requirements mean there's no time to upload data to a central server, conduct the necessary analytics, and then send instructions back to the car for execution. Data that is critical to safely navigating the car must be analyzed locally by the car itself. In effect, the car is an edge device in a cloud network.
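To get a feel for why a round trip to a central server is too slow, a rough back-of-the-envelope calculation helps. The sketch below is only illustrative: the speed and both latency figures are assumed values, not measurements from any real vehicle or network, and `distance_travelled_m` is a hypothetical helper.

```python
def distance_travelled_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance in metres a car covers while waiting latency_ms for a decision."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_m_per_s * (latency_ms / 1000.0)


# Assumed, illustrative values: highway speed and two decision latencies.
SPEED_KMH = 100.0
SCENARIOS = [
    ("on-board analysis", 10.0),   # assumed local processing time in ms
    ("cloud round trip", 200.0),   # assumed upload + analysis + reply in ms
]

for label, latency_ms in SCENARIOS:
    metres = distance_travelled_m(SPEED_KMH, latency_ms)
    print(f"{label}: {metres:.2f} m travelled before the car can react")
```

Under these assumptions the car moves roughly a quarter of a metre while waiting for an on-board decision, but more than five metres while waiting for the cloud.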

Not only does the car need to analyze data on its own, it must also learn to pick and choose between different data streams to identify the ones best suited for analysis at any given moment to keep the car driving safely. 

That last requirement, the need to determine what data is required to perform an analysis, is tricky. While predefined filters can help a car's machine learning routines learn what data to use and when to use it, those filters are generated by human engineers, so they can't be updated in real time. Accordingly, an autonomous car will need to run machine learning and analytics engines powerful enough to recognize, on their own, mission-critical data requiring immediate analysis and action, without involving a human in the analysis. Once input from a person is required, decision-making based on data analysis in real time is simply not possible.

We need analytics and machine learning algorithms for autonomous cars that can:

  • Identify data in all formats.

  • Recognize what data is required for mission-critical operations and perform analysis of that data locally.

  • Compress or aggregate non-critical data for uploading to the cloud for future use.

  • Schedule uploads of non-critical data from the car to the cloud when less expensive communications are available (for example, when the car is parked overnight at home and can access the owner's Wi-Fi instead of a metered cellular network).

  • Know how to call for historical data from the cloud so the AI can use it for future analytics.
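The triage described in the list above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the sensor names, the fixed `CRITICAL_SENSORS` set (standing in for what a trained model would decide), and the `EdgeTriage` class are all hypothetical, and "analysis" is reduced to a placeholder.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean
from typing import Deque, Dict, List


@dataclass
class Reading:
    sensor: str           # hypothetical sensor name, e.g. "lidar"
    values: List[float]   # raw samples from that sensor


# Assumption: a static set replaces the learned notion of "mission-critical".
CRITICAL_SENSORS = {"lidar", "radar", "camera"}


@dataclass
class EdgeTriage:
    upload_queue: Deque[Dict] = field(default_factory=deque)

    def handle(self, reading: Reading) -> str:
        if reading.sensor in CRITICAL_SENSORS:
            return self.analyze_locally(reading)
        if not reading.values:
            return "dropped"  # nothing to summarize
        # Non-critical data: aggregate to a small summary and defer the upload.
        self.upload_queue.append({
            "sensor": reading.sensor,
            "count": len(reading.values),
            "mean": mean(reading.values),
            "min": min(reading.values),
            "max": max(reading.values),
        })
        return "queued"

    def analyze_locally(self, reading: Reading) -> str:
        # Placeholder for the on-board analytics the article describes.
        return "analyzed"

    def flush(self, network: str) -> List[Dict]:
        """Send queued summaries only when an unmetered network is available."""
        if network != "wifi":
            return []
        sent = list(self.upload_queue)
        self.upload_queue.clear()
        return sent


triage = EdgeTriage()
triage.handle(Reading("lidar", [1.2, 1.3]))         # analyzed on the car
triage.handle(Reading("cabin_temp", [20.0, 22.0]))  # summarized and queued
print(triage.flush("cellular"))  # nothing is sent over the metered link
print(triage.flush("wifi"))      # queued summaries go out over Wi-Fi
```

A real system would also need the last capability in the list, pulling historical data back from the cloud, which this sketch leaves out.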

The last bullet is particularly important. An autonomous car manufacturer will be responsible for storing vast amounts of data generated by cars operating around the world, and much of that data will likely have no real value when initially captured. However, that data's value may be revealed in the future as the manufacturer's autonomous driving applications evolve and improve. Today's non-critical data can be useful for future applications, provided the data is properly stored and easily accessible. If they don't plan in advance for how to make data available whenever necessary, autonomous car vendors run the risk of creating a "dark data" problem. Dark data is the term for data assets an organization collects but fails to take advantage of, either because it doesn't know how to use them or because it has forgotten it has them. This will be a particularly significant problem for self-driving cars because of the sheer volume of data they generate.

To address the dark data problem, autonomous car vendors need to move their data storage strategies away from data warehouse models and adopt emerging data storage models like data lakes. While a detailed examination of the difference between a data warehouse and a data lake is beyond the scope of this article, comparing a book with a library illustrates it. With a book (data warehouse), someone has already determined what content the book contains and how it is formatted, while a library (data lake) allows you to store whatever content you want in almost any format.

In other words, a data warehouse is a centralized platform for basic importing, exporting, and preprocessing of data gathered from a collection of linked systems using one data schema. A data lake is a distributed yet integrated data platform that supports schemaless data (both structured and unstructured) and performs queries in real time by leveraging metadata to quickly find, transform, and load data between systems. Data lakes' support for both structured and unstructured data on the same platform is important, as autonomous car sensors generate data streams in very different formats that can't easily be stored in the same schema. Other key differences that distinguish a data lake from a data warehouse include:

  • Schema on read.

  • Unlimited storage.

  • The ability to access both raw and processed data.

  • The ability to link data from many individual clusters.
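"Schema on read" is the least self-explanatory item in the list above, so here is a minimal sketch of the idea. The records, field names, and the `read_with_schema` helper are made up for illustration; a real data lake would use a storage and query engine rather than an in-memory list.

```python
import json

# Raw records are stored as-is, in whatever shape each sensor produced.
# No schema is enforced at write time (illustrative, made-up data).
RAW_STORE = [
    '{"sensor": "gps", "lat": 35.9, "lon": -78.9, "ts": 1}',
    '{"sensor": "accelerometer", "x": 0.1, "y": 0.0, "z": 9.8, "ts": 1}',
    '{"sensor": "gps", "lat": 36.0, "lon": -78.8, "ts": 2, "hdop": 0.9}',
]


def read_with_schema(raw_lines, sensor, schema):
    """Apply a schema at query time: filter by sensor, then project and cast fields."""
    for line in raw_lines:
        record = json.loads(line)
        if record.get("sensor") != sensor:
            continue
        yield {
            name: cast(record[name])
            for name, cast in schema.items()
            if name in record
        }


# The schema lives with the query, not with the stored data.
gps_schema = {"ts": int, "lat": float, "lon": float}
rows = list(read_with_schema(RAW_STORE, "gps", gps_schema))
print(rows)
```

Because the raw records are untouched, a future application can read the same store with a different schema, for example one that includes the `hdop` field that only some records carry.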

Linking data between clusters is particularly important for autonomous cars, as it allows for the integration of different datasets from different geographic locations. Car OEMs are global companies with multiple offices and data centers scattered around the world. As more countries move to support autonomous cars, vendors will want to feed the data generated by cars driving in each country into the AI and ML algorithms that power their cars globally. As more vendors enter the autonomous driving market, the ones who ultimately win out will be those best prepared to analyze data at the local level and those who have cataloged their databases properly, so that future autonomous applications can find the data they need when they need it.
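As a rough illustration of what "cataloged properly" could mean in practice, the sketch below keeps one metadata catalog over datasets held in several regional clusters, so an application can locate data without knowing where it lives. Every cluster name, path, and field here is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetEntry:
    cluster: str   # hypothetical regional cluster name
    path: str      # hypothetical storage location inside that cluster
    sensor: str
    country: str
    year: int


# Assumed catalog contents, for illustration only.
CATALOG = [
    DatasetEntry("eu-cluster", "/raw/lidar/2018", "lidar", "DE", 2018),
    DatasetEntry("us-cluster", "/raw/lidar/2018", "lidar", "US", 2018),
    DatasetEntry("us-cluster", "/raw/camera/2018", "camera", "US", 2018),
]


def find(catalog: List[DatasetEntry], **criteria) -> List[DatasetEntry]:
    """Return every entry whose metadata matches all the given criteria."""
    return [
        entry for entry in catalog
        if all(getattr(entry, key) == value for key, value in criteria.items())
    ]


# One query spans every cluster that holds matching data.
for entry in find(CATALOG, sensor="lidar"):
    print(entry.cluster, entry.path)
```

The point of the design is that the query is expressed in terms of metadata (sensor, country, year) and the catalog resolves it to physical locations.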
