Apache Flink: Full Checkpoint vs Incremental Checkpoint

This article will analyze the benefits a streaming application can gain when configured to run with incremental checkpointing.

By Josson Paul Kalapparambath · Feb. 17, 25 · Analysis


Apache Flink is a real-time data stream processing engine. Most stream processing applications are ‘stateful,’ meaning state is stored and used for further processing. In Apache Flink, state is managed through a configured state backend, and Flink supports two state backends in production: HashMapStateBackend and EmbeddedRocksDBStateBackend.

To prevent data loss and achieve fault tolerance, Flink can persist snapshots of the state to durable storage. Flink can be configured to snapshot either the entire state or only the delta since the last snapshot. The former is called a full checkpoint, and the latter is known as an incremental checkpoint.
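As a quick illustration, here is a minimal sketch of enabling periodic checkpoints to durable storage with the DataStream API. The 3-minute interval mirrors the experiment described later in this article; the S3 bucket path is a hypothetical placeholder, not taken from the original setup.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot the job state every 3 minutes (interval is in milliseconds).
        env.enableCheckpointing(180_000);

        // Durable remote location for the snapshots; the bucket name is hypothetical.
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/flink-checkpoints");

        // ... define sources and operators here, then:
        // env.execute("checkpointed-job");
    }
}
```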

In this article, we are going to compare HashMapStateBackend with full checkpointing and EmbeddedRocksDBStateBackend with incremental checkpointing. This article assumes that the audience has working or theoretical knowledge of Apache Flink.

Overview of Flink State Backend

To understand the Flink state backend, it is important to know the difference between in-flight state and state snapshots. 

The in-flight state, also known as Flink’s working state, is stored locally. Depending on the state backend configuration, it lives either in heap memory or in off-heap memory, with possible spillover to the local disk.

On the other hand, state snapshots (checkpoints or savepoints) are stored in a durable remote location. These snapshots are used to reconstruct the Flink job state in case of a job failure.

The in-flight state can be lost if the job fails, but this doesn’t impact job recovery as long as checkpointing is enabled: at recovery time, the state is retrieved from durable storage.

Which state backend to select for production depends on the application’s requirements for throughput, latency, and scalability.

There are two state backends that Apache Flink supports in production.

1. HashMapStateBackend

It is a lightweight state backend for managing Keyed State and Operator State during stream processing. The state is stored on the Java heap using a HashMap data structure. Since the state lives in memory, the main constraint is that the maximum state size is limited by the Java heap size. No serialization is involved in writing to or reading from the state, so this backend suits applications that need low latency and high throughput and whose state is not very large.
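A minimal sketch of selecting this backend programmatically, assuming the DataStream API on Flink 1.13 or later (where HashMapStateBackend was introduced):

```java
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HashMapBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep working state as plain Java objects on the JVM heap -- no serialization.
        env.setStateBackend(new HashMapStateBackend());
    }
}
```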

2. EmbeddedRocksDBStateBackend

This state backend stores the in-flight data in an embedded RocksDB database. By default, RocksDB keeps the data on the local disk of the task manager: the data is serialized, held in off-heap memory, and spilled over to the local disk attached to the task manager. The serialization format depends on the type serializer configured in the application.

With this state backend, the amount of state that can be stored is limited only by the disk space attached to the task manager. If the application has a huge state that can’t be contained in heap memory, this is the right state backend. Since serialization is involved, the application will have higher latency and lower throughput compared to HashMapStateBackend.
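A minimal sketch of selecting the RocksDB backend, assuming the flink-statebackend-rocksdb dependency is on the classpath:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // State is serialized to off-heap memory and spilled to local disk by RocksDB.
        // The boolean flag enables incremental checkpoints (discussed below).
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
    }
}
```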

Overview of Snapshot State

The snapshot represents the global state of the Flink job. It consists of a pointer into each data source and the state of all of Flink’s stateful operators after processing everything up to those pointers. Checkpointing in Apache Flink is a mechanism to achieve fault tolerance by periodically saving this state to durable remote storage.

In case of a job failure, Flink retrieves the stored state from durable remote storage and resumes processing the stream from where it left off. To take snapshots, Flink uses asynchronous barrier snapshotting, a variant of the Chandy-Lamport algorithm.
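The barrier-based snapshot behavior is steered through the checkpointing mode; a minimal sketch, assuming the DataStream API (unaligned checkpoints are available from Flink 1.11 onward):

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BarrierSnapshotSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Barrier-based, exactly-once snapshots every 3 minutes.
        env.enableCheckpointing(180_000, CheckpointingMode.EXACTLY_ONCE);
        // Optional: let barriers overtake in-flight records under backpressure.
        env.getCheckpointConfig().enableUnalignedCheckpoints();
    }
}
```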

Flink supports two types of checkpointing.

1. Full Checkpointing

Full checkpointing is where the entire state of the Flink job is captured and stored in durable remote storage. In case of a job failure, the job recovers from the previously stored state. The storage space required and the time taken to checkpoint depend entirely on the size of the application state. Full checkpointing works with both HashMapStateBackend and EmbeddedRocksDBStateBackend.
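If you prefer configuration over code, the backend can also be selected through a config option; a minimal sketch, assuming Flink 1.14+ where StateBackendOptions is available:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.StateBackendOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FullCheckpointConfig {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // "hashmap" or "rocksdb"; both take full checkpoints by default.
        config.set(StateBackendOptions.STATE_BACKEND, "hashmap");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(config);
    }
}
```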

2. Incremental Checkpointing

Incremental checkpointing is an optimized approach: instead of snapshotting the entire state, Flink saves only the ‘delta’ made to the state since the last checkpoint. This reduces the network overhead and, thus, the time taken for checkpointing. Checkpointing happens fully asynchronously in this case.

Only EmbeddedRocksDBStateBackend supports incremental checkpointing; Flink leverages RocksDB’s internal mechanism for this. Even though an incremental checkpoint takes less time than a full checkpoint, in case of a job failure the recovery time depends on many factors. If the network is a bottleneck, recovery can take longer than recovery from a full checkpoint.
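Incremental checkpointing can be toggled through configuration as well; a minimal sketch, assuming Flink 1.14+ option names (equivalent to the EmbeddedRocksDBStateBackend(true) constructor shown earlier):

```java
import org.apache.flink.configuration.CheckpointingOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.StateBackendOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointConfig {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.set(StateBackendOptions.STATE_BACKEND, "rocksdb");
        // Only honored by the RocksDB backend.
        config.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(config);
    }
}
```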

Analysis

Pipeline details: an Apache Beam pipeline running on the Flink engine, with a fixed window of 10 minutes and checkpoints configured to run every 3 minutes. The serialization type configured is Avro. A sketch of how such a pipeline is wired follows the list below.

  • Cluster type: "m5dn.4xlarge"
  • Final checkpoint storage: S3
  • Number of unique keys: 2K
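For orientation, here is a minimal sketch of how a Beam-on-Flink pipeline like this is typically configured. The option names are assumptions based on recent Beam versions (the stateBackend option in particular is only present in newer Beam Flink runners), not the authors' actual pipeline code:

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class BeamOnFlinkSetup {
    public static void main(String[] args) {
        FlinkPipelineOptions options = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        // Checkpoint every 3 minutes, matching the experiment.
        options.setCheckpointingInterval(180_000L);
        // "rocksdb" for the RocksDB runs; accepted values depend on the Beam version.
        options.setStateBackend("rocksdb");

        Pipeline pipeline = Pipeline.create(options);
        // ... apply the 10-minute fixed-window transforms, then pipeline.run();
    }
}
```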
Input rate 10k/s (~7.8 MB/s)

| Type | No. of TM | Parallelism | Heap Allocation per TM | Heap Usage per TM | Pod Memory Usage per TM | CPU Usage per TM | Checkpoint Size | Checkpoint Duration | Flink Managed Memory |
|---|---|---|---|---|---|---|---|---|---|
| HashMapState with Full Checkpoint | 1 | 1 | 10GB | 8.5GB | 11.1GB | 1.2 | 4GB | 50 sec | 0 |
| RocksDBState with Incremental Checkpoint with Avro | 1 | 1 | 3GB | 1.82GB | 4.63GB | 1.5 | 207MB | 3 sec | 3GB |

Input rate 20k/s (~15.6 MB/s)

| Type | No. of TM | Parallelism | Heap Allocation per TM | Heap Usage per TM | Pod Memory Usage per TM | CPU Usage per TM | Checkpoint Size | Checkpoint Duration | Flink Managed Memory |
|---|---|---|---|---|---|---|---|---|---|
| HashMapState with Full Checkpoint | 2 | 2 | 10GB | 8.69GB | 11.2GB | 1.3 | 8.39GB | 50 sec | 0 |
| RocksDBState with Incremental Checkpoint with Avro | 2 | 2 | 3GB | 1.87GB | 4.71GB | 1.4 | 404MB | 3 sec | 3GB |


Input rate 30k/s (~23.5 MB/s)

| Type | No. of TM | Parallelism | Heap Allocation per TM | Heap Usage per TM | Pod Memory Usage per TM | CPU Usage per TM | Checkpoint Size | Checkpoint Duration | Flink Managed Memory |
|---|---|---|---|---|---|---|---|---|---|
| HashMapState with Full Checkpoint | 3 | 3 | 10GB | 9.74GB | 11.2GB | 1.2 | 11.8GB | 65 sec | 0 |
| RocksDBState with Incremental Checkpoint with Avro | 3 | 3 | 3GB | 1.72GB | 3.75GB | 1.4 | 576MB | 5 sec | 3GB |


As you can see from the experiment above, the checkpoint duration decreases sharply with incremental checkpointing. This can very well help overall application performance.

Summary

Below is the summary of the experiment.


| | HashMapStateBackend with Full Checkpoint | RocksDBStateBackend with Incremental Checkpoint |
|---|---|---|
| Application latency | Low latency, because data is stored as Java objects on the heap; reading and writing involve no serialization. | Higher latency, because serialization is involved in every read and write. |
| Scalability | Less scalable for jobs with large state. | Highly scalable for jobs with large, slowly changing state. |
| Fault tolerance | Highly fault tolerant. | Highly fault tolerant. |
| Checkpoint duration | High, because the entire data set is snapshotted every time. | Low, because only the delta since the last checkpoint is saved. |
| Recovery complexity | Recovery is easy, because only one snapshot has to be loaded. | Recovery is complex, because RocksDB has to build the state from multiple checkpoints, and much depends on the network speed. |
| Supported state backends | Supported by both HashMapStateBackend and RocksDBStateBackend. | Supported only by RocksDBStateBackend. |
| State snapshot | Saves the entire state at every checkpoint. | Saves only the delta since the last successful checkpoint. |
| Heap size | Since the state is stored on the heap before checkpointing, the heap requirement is high, and more GC cycles are to be expected. | State is stored off-heap and possibly on the local disk, so less heap space is needed and fewer GC cycles occur. |
| State backend size | Limited to the maximum heap allocated to a JVM. | Not limited by the JVM heap, only by the available disk space. |
| Performance impact | Higher impact on processing, because a full snapshot is taken. | Lower impact on processing, because only the delta is snapshotted. |
| CPU | CPU usage is only for processing and GC; no state backend serialization is involved. | CPU usage is higher than with full checkpointing for the same input rate. It can be optimized with a proper serialization mechanism; we experimented with Avro and got much better results compared to Kryo. |
| Best use case | Good for smaller state sizes and frequently changing state. | Good for larger state sizes and slowly changing state. |
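Since the CPU row above highlights Avro outperforming Kryo, here is a minimal sketch of steering Flink's POJO serialization toward Avro; this uses standard ExecutionConfig switches, not the authors' exact setup:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroSerializationSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Serialize POJOs with Avro instead of letting Flink fall back to Kryo.
        env.getConfig().enableForceAvro();
        // Optional: fail fast if any type would still go through Kryo.
        env.getConfig().disableGenericTypes();
    }
}
```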


