Kafka Streams: Catching Data in the Act (Part 2)
Go over factors we need to consider in our design, understand the capacity of our system, characterize it in terms of operational delays, and learn how to plan for failures and recovery.
I was on vacation with my son at Yosemite over spring break a few weekends ago. The early part of the trip was washed out by rain: the park was closed and we were cooped up in the lodge waiting it out. But we had a patio view of the Merced River and the torrent of water gushing through, and loads of time to break out the laptop and write the long-overdue second installment on the use of Kafka Streams. So, here we go.
In the previous post, we laid out a monitoring problem and a solution approach with Kafka Streams. Here, we'll go over the design of the experiment and simulate it to convince ourselves that it all works as expected. We start with the overarching timing belt, the pace, and the resiliency concerns that should be accounted for in the design. We want an intuitive feel for the rhythm of the operation, and to know what to expect before we start generating a lot of data.
The objective of this post is to go over the factors we need to consider in the design, understand the capacity of our system, characterize it in terms of the operational delays inherent in the system, and plan for failures and recovery. We'll run a few simulations as we go along to confirm that the results are as per the design.
The apparatus for the experiment to twist the triangle is the laptop. The three producer processes are started at the same time and write the rawVertex data to disk. The stream process pulls the rawVertex data, smooths it over a short contiguous period, writes to various state stores, computes stress metrics, and saves to long-term storage. That is the big picture.
The Timing Belt
Stream operations live and die by time. So first, the operational definition of time in the model is critical: the monitors and/or the stream processors can go down and restart, get stuck in expensive calculations, the network can slow the delivery of messages, and so on, even as the wall clock chugs along uninterrupted. The good folks at Kafka have realized this and allow the concept of stream time to be defined as per the needs of the use case.
In our case, this would be the time a measurement is made, or a close-enough instant in time when all three vertices have a measurement, so that we can approximate a snapshot of the triangle (it certainly will not do to define a continuously deforming triangle from vertex positions measured at widely different times). This stream time will not advance if no new measurements emanate from the monitors. Perfect!
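To make the notion concrete, here is a minimal sketch of how such a stream time behaves. The function and field names are illustrative, not the Kafka Streams API; in Kafka Streams this is governed by the timestamps a timestamp extractor assigns to each record.

```python
def stream_time(event_timestamps, current=0):
    """Stream time is the maximum event timestamp seen so far.
    It never moves backward, and it stalls when no new measurements arrive,
    regardless of how much wall-clock time passes."""
    for ts in event_timestamps:
        current = max(current, ts)
    return current

# A late-delivered message (timestamp 15 arriving after 20) does not rewind time.
assert stream_time([10, 20, 15]) == 20
# No new measurements: stream time stands still.
assert stream_time([], current=20) == 20
```

Because the timestamp is taken when the measurement is made rather than when it arrives, downtime and network delays stretch wall-clock latency but leave stream time honest.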
Second, we need an intuitive understanding of the operational rhythm, so that a quick glance at the various metrics in a production scenario at, say, 4 AM can tell us what may be amiss. This is the clip of the operation, like the number of raw/smoothed measurements and the number of triangle metrics per unit time that we should be seeing on a production dashboard. This pace is determined by the rate of rawVertex production at the monitors, the smoothing interval at the VertexProcessor before it produces a smoothedVertex, and the window size (and its retention time) in the TriangleProcessor. A few obvious considerations:
- Smoothing interval: the smoothing interval employed by the VertexProcessor should be small enough not to mask any trends while still removing noise. This interval defines the minimum size of the window in the TriangleProcessor.
- Window size: too big a window means that the triangle is not close to a snapshot image. Too small a window may yield zero measurements from one or more of the sensors. We need at least one measurement from each sensor/vertex to fall in a window in order to define a triangle.
- Window retention time: too small a retention time means any late-arriving messages will not be used, leaving some windows with zero measurements for a vertex even as the window is about to be dropped, so no triangle can be defined for that window period. But too big a retention time means more bookkeeping, computation, and disk usage, and likely a longer wait for a metric to plateau/saturate.
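The window-size and retention bookkeeping above can be sketched as follows. This is an illustrative model of tumbling-window assignment and an approximate retention check, not the actual Kafka Streams internals; all names are assumptions.

```python
WINDOW_MS = 60_000             # window size: one minute
RETENTION_MS = 3 * 3_600_000   # window retention: three hours

def window_start(event_ts_ms):
    """Tumbling-window assignment: each event falls in exactly one window,
    identified by the window's start timestamp."""
    return (event_ts_ms // WINDOW_MS) * WINDOW_MS

def accepts(event_ts_ms, stream_time_ms):
    """A late record is kept only while its window is still within the
    retention period (approximate semantics)."""
    return stream_time_ms - window_start(event_ts_ms) < RETENTION_MS

# An event 1 second into the second minute lands in the 60s-120s window.
assert window_start(61_000) == 60_000
# A slightly late record is still accepted...
assert accepts(61_000, stream_time_ms=120_000)
# ...but a record for a window older than the retention period is dropped.
assert not accepts(0, stream_time_ms=4 * 3_600_000)
```

The trade-off in the bullets falls out directly: shrinking RETENTION_MS makes `accepts` reject more late records, while growing it keeps more windows alive and on disk.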
Latency and Steady/Unsteady State Operations
Third, we have to take into account the limitations of the apparatus at hand (the laptop, in this case), such as latency and the contrast between the steady and unsteady operational states.
- Latency: how long does it take for a trend at the monitors to manifest itself as a trend in the stress metrics for the triangle under steady-state operation? This is the core latency inherent in the system, and it determines how fast we can respond to events.
- Capacity and resiliency: if the stream processor dies and is restarted, it will pick back up at about the same stream time as when it went down, as the processor is backed by persistent state stores (RocksDB). There is no data loss. But how long does it take for the system to catch up and reach steady state? This is the capacity and resiliency of the system.
In steady-state operation, the rate at which the raw vertices are delivered stays about the same. But when the stream processor comes back up after a failure, it can get overloaded with all the backlogged measurements. The VertexProcessor gets too busy to spare time to forward the smoothed measurements to the next stage. The punctuate/forward operation is best effort, meaning that if the processor is otherwise busy, the punctuate operation may be delayed or clubbed with the next punctuate op. If the processor has been down for an extended period, one can see large delays in producing the first smoothed vertices after a restart.
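The "clubbing" of punctuate operations can be sketched with a little arithmetic. This models only the scheduling behavior described above, with illustrative names; it is not the Kafka Streams Processor API.

```python
INTERVAL_S = 12  # punctuate every 12 seconds of stream time

def punctuations_due(last_punctuated_s, now_s):
    """Number of scheduled punctuations that are due the next time the
    processor gets a chance to run. In steady state this is 1; after a
    long stall or restart, many fire together ("clubbed")."""
    return max(0, (now_s - last_punctuated_s) // INTERVAL_S)

# Steady state: one punctuate per 12-second interval.
assert punctuations_due(0, 12) == 1
# A two-minute stall: ten punctuations come due in one burst,
# dumping ten smoothed vertices downstream at once.
assert punctuations_due(0, 120) == 10
```

This burst behavior is exactly what produces the near-zero window saturation times we will see in the unsteady-state simulation later.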
The Steady State
With the above discussion in mind, we proceed with the simulation in steady state.
- One raw vertex per second: the producers sleep for one second after producing a message to the rawVertex topic. So, there can be no more than 60 raw measurements delivered to each partition in a minute.
- Five smoothed vertices per minute: the VertexProcessor punctuates (ideally!) the forward operation every 12 seconds. That is, it keeps a running average of the x and y coordinate values for 12 seconds and then forwards those averages to the smoothedVertex topic before starting a new smoothing batch. So, under ideal conditions, we should see no more than 12 raw measurements (as the monitors produce one measurement per second) in each average, with five such averages produced per minute. The three VertexProcessors together then pump 15 smoothed measurements into the smoothedVertex topic per minute.
- One triangle per minute: the window store employs one-minute windows, so we get a triangle snapshot for every minute of stream time (and real time, in steady-state operation) that we can alert on if needed.
- Three hours of window retention: each window lives for three hours of stream time before it is disposed of. In steady-state operation, three hours is plenty of time to saturate a window with all the measurements it can ever get.
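The steady-state rates above are just the design parameters multiplied out; written as arithmetic, with variable names of my choosing:

```python
raw_per_vertex_per_min = 60   # producers emit one raw measurement per second
smoothing_interval_s = 12     # VertexProcessor smoothing batch length
vertices = 3                  # one monitor/producer per triangle vertex

smoothed_per_vertex_per_min = 60 // smoothing_interval_s
smoothed_per_min = vertices * smoothed_per_vertex_per_min
triangles_per_min = 1         # one 1-minute window -> one triangle snapshot

assert smoothed_per_vertex_per_min == 5   # five averages per vertex per minute
assert smoothed_per_min == 15             # 3 x 5 smoothed vertices per minute
```

These are the dashboard numbers to expect at a 4 AM glance: 60 raw messages per partition, 15 smoothed vertices, and 1 triangle per minute.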
Running the simulation for over 12 hours and capturing the beat for verification in Elasticsearch and Kibana, we see that our design holds up with respect to the rate of production of messages and metrics.
Figure 1: steady-state operation. 1 rawVertex/sec and 12 seconds of smoothing, so 3 (vertices) x 60/12 = 15 smoothed vertices/min. A 1-min window size means one triangle per minute, as expected from the design.
But what about latency? Figure 2a below shows the latency of smoothed vertex messages in the stream process.
- There is essentially no delay (a few milliseconds, maybe) in receiving a raw measurement from the producer (not shown in Figure 2).
- Travel time: travel time is the elapsed wall-clock time it takes for a smoothed vertex message to make its way from the VertexProcessor, where it is emitted, to the TriangleProcessor. On average, it takes 1.6 seconds (Figure 2a). Whether 1.6 seconds on a laptop is too long is hard to say; it is probably attributable to all the push/pull ops at the disk and the fact that the laptop was loaded throughout the simulation.
- Saturation time: saturation time is the elapsed wall-clock time between the latest-arriving smoothed vertex in a window and the earliest-arriving one. On average, a typical window saturates in about 49 seconds (Figure 2b)! That is fantastic, as it means we can cut the window retention time way back from the three hours we chose in the design. But we will have to see what a restart scenario looks like before we do that.
Figure 2: steady-state operation. 2a shows the spread of the travel time, or stream delay, for a smoothed vertex message to go from the VertexProcessor to the TriangleProcessor. 2b shows the spread of the wall-clock elapsed time between the earliest- and latest-arriving messages in a window (that is, the saturation time). 2c and 2d show the deforming triangle and some metrics over time.
- Figures 2c and 2d show the metrics from this run. They are quite similar to the pictures we produced via analytical means in the earlier post, which makes sense, as we apply the same perturbation to the vertices.
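For clarity, the two latency metrics above reduce to simple differences of timestamps. A minimal sketch, with illustrative function and field names (the actual implementation records these from message metadata):

```python
def travel_time(emitted_at, received_at):
    """Wall-clock delay from emission at the VertexProcessor to arrival
    at the TriangleProcessor, for one smoothed vertex message."""
    return received_at - emitted_at

def saturation_time(arrival_times):
    """Wall-clock spread between the earliest- and latest-arriving
    smoothed vertices in one window."""
    return max(arrival_times) - min(arrival_times)

assert travel_time(100, 102) == 2
# Arrivals at seconds 5, 30, and 54 of a window: it saturates in 49 s,
# matching the steady-state average reported above.
assert saturation_time([5, 30, 54]) == 49
```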
A snapshot of the current offsets (obtained with kafka-utils) during the steady state shows that the processing is humming along nicely, with just a few measurements on disk waiting to be processed (see lines 7, 14, 20, and 26 below). This is the desired behavior, and if anything, it shows that we may be able to increase the production rate and the laptop may still be able to stay current. But we will see in the next section that we do need this extra processing capacity to quickly lead us out of an unsteady state back to steady state.
cluster name: local
consumer group: triangle-stress-monitor
topic name: smoothedvertex, total distance: 2
partition id: 0
high watermark: 8793
low watermark: 0
current offset: 8791
offset distance: 2
percentage distance: 0.02%
topic name: rawvertex, total distance: 5
partition id: 0
high watermark: 28165
low watermark: 0
current offset: 28163
offset distance: 2
percentage distance: 0.01%
partition id: 1
high watermark: 28169
low watermark: 0
current offset: 28168
offset distance: 1
percentage distance: 0.0%
partition id: 2
high watermark: 28200
low watermark: 0
current offset: 28198
offset distance: 2
percentage distance: 0.01%
The Unsteady State
An unsteady state results when the stream processor has a backlog of measurements to process. This causes some hiccups in the processing rhythm of the operation until it all stabilizes once more. We know for a fact that it will stabilize, given enough time, because of our conservative design parameters. Our conclusion from the discussion of the steady state above is that we have excess capacity in the stream processes that will eat into the backlog and eventually catch up to the currently produced measurements.
We simulate two periods of unsteady state. We start the producers, build up data in the rawVertex partitions a bit, and then start the stream processing. This gives rise to the first period of unsteady state. After achieving steady state and running for a bit, we kill the stream process, wait several hours, and restart it to produce the second period of unsteady state. For example, the state of the offsets just before the second period of unsteady state is as follows. There is no backlog (line 8 below) in smoothedVertex, as expected. But we have built up a significant backlog to be processed in the raw vertices (lines 15, 21, and 27 below).
cluster name: local
consumer group: triangle-stress-monitor
topic name: smoothedvertex, total distance: 0
partition id: 0
high watermark: 8886
low watermark: 0
current offset: 8886
offset distance: 0
percentage distance: 0.0%
topic name: rawvertex, total distance: 56558
partition id: 0
high watermark: 47325
low watermark: 0
current offset: 28463
offset distance: 18862
percentage distance: 39.86%
partition id: 1
high watermark: 47303
low watermark: 0
current offset: 28460
offset distance: 18843
percentage distance: 39.83%
partition id: 2
high watermark: 47351
low watermark: 0
current offset: 28498
offset distance: 18853
percentage distance: 39.82%
Figure 3: unsteady and steady states. 3a: the apparent travel time for a smoothed vertex to make its way from the VertexProcessor to the TriangleProcessor is larger during unsteady states. It approaches its steady-state value as the stream processor catches up with the backlogged messages. 3b: a smaller fraction (57%) of overall smoothed vertices have less than two seconds of travel time compared to the steady-state-only operation (85%; see Figure 2a). 3c: windows saturate right away during the unsteady state as a burst of smoothed vertices unloads onto the TriangleProcessor. As the stream processor catches up, the natural rhythm reasserts itself and the saturation time goes back up to about a minute, as in the steady state. 3d: window saturation time has a bimodal nature. The first mode corresponds to the steady-state value, and the other is near zero, reflecting the burst of smoothed vertices into the TriangleProcessor.
The simulation results in Figure 3 bear out many of the assertions we have been making about the steady and unsteady states.
- Delays in receiving the smoothed vertices: Figure 3a plots the wall-clock delay in receiving smoothed vertices at the TriangleProcessor after they are forwarded by the VertexProcessor as the simulation continues. The unsteady states are characterized by sharply higher initial delays that decay with time as the operation marches toward a steady state with smaller, single-digit delays. These larger delays during the unsteady state influence the overall distribution shown in Figure 3b: only about 57% of messages have a delay of less than two seconds in this steady-plus-unsteady simulation, compared to about 84% of messages in the steady-state-only simulation in Figure 2a.
- A torrent of triangle metrics: Figures 3a and 3c make for an interesting pair. During the unsteady state, a large number of punctuate operations are clubbed together as one in the VertexProcessor. This is why we observed the delays noted in item #1 above. But this exact behavior dumps a whole bunch of smoothed vertices in one shot onto the TriangleProcessor, and the time windows in the window store saturate right away. This means there is not much wall-clock time gap between the earliest-arriving message in a window and the latest-arriving one. This gap dives to zero during the unsteady state.
- A bimodal distribution for window saturation time: this is shown in Figure 3d. During each period of unsteady state, the time taken to saturate a window in the window store is near zero, then recovers to its steady-state value, which we know from Figure 2b to be under a minute. This sets up an approximately bimodal distribution of window saturation time.
We spent a good bit of time looking into the dynamics and the ebb and flow of messages under different conditions. Simulations such as these, establishing the operational metrics for normal and failure conditions, are essential to developing insight into the system. We will get to the actual code and mechanics behind these simulations in the next installment.
Published at DZone with permission of Ashok Chilakapati. See the original article here.