Building LinkedIn's Real-Time Data Pipeline

At the core of many of LinkedIn's analytics applications is a real-time data pipeline built on top of Apache Kafka. This system handles over 10 billion message writes per day for thousands of production processes. This talk will cover some of the challenges of building and scaling this data pipeline for log data, system metrics, and other high-volume data streams. It will also cover some details of Kafka's design, as well as some of the particular requirements of Hadoop data loads and real-time processing applications.

About Jay Kreps
Jay is the technical lead for LinkedIn's data team, which is responsible for the site's core data technologies, including storage systems, data pipelines, Hadoop, search, social graph, and recommendation systems. He is an original author of several open source projects, including Apache Kafka, a real-time distributed messaging system, and Project Voldemort, a distributed key-value store. He has a master's degree in computer science from UC Santa Cruz, where he studied machine learning.
