The Best Decision: Your Future and Serverless Stream Processing
Data-driven companies are harnessing the power of serverless computing to optimize their analyses with stream processing.
About a year ago, we became part of a digital transformation with the first ever cloud-based IDE for serverless development. It was no cakewalk; we burned the candle at both ends trying to cover most of AWS's serverless stack. Working with AWS Kinesis made me appreciate the beauty of serverless, and of course, prior exposure to streaming data with Kafka spared me some time on the rudiments.
How do online role-playing games adjust according to your decisions?
How do gambling sites predict the odds of a live game?
Why were the Splash brothers benched while Portland was handing the Warriors their worst loss in a 73-win NBA season?
The power of real-time streaming data analytics is astonishing indeed. Now that serverless technology is gaining momentum, maybe you won't have to worry about making risky decisions on your own at all. This post covers the basics of serverless streaming data processing and how it will become an influential component of our decision making in the future.
Stephen Curry and Klay Thompson benched during a game (source: The Smoking Cuban)
Look Around: It's All Data
Life is an endless series of events. The technology around us has made it a stream of digital actions emitting streams of data. If you turn back and inspect your life very carefully, you'll see the never-ending string of data you have generated with your every digital action. It could be a lot to digest at first, but let’s explore some scenarios and try to find what applies to you and me.
- Online banking and convenient e-commerce purchasing capabilities
- Ride-sharing and modern-day travelling and transportation
- Industrial equipment and agricultural use cases like monitored machinery, autonomous tractors, and precision farming
- Automated power generation, smart grids, zero-net buildings, and smart metering
- Real-estate property recommendations based on geolocation
- Predictive maintenance
- Online dating and matchmaking relying on complex personality patterns and attribute distribution
- Financial trading according to the real-time changes in the stock market, analytical risk management
- Movies, songs, and other digital media with adaptive experiences depending on demographics, preference, and history
- Improved web and mobile application experience based on usage
- Dynamic and personalized experiences in online gaming
- Enhanced social media experiences with hyper-personalization and predictive analytics
- Telemetry from connected devices and remote data centers, and geospatial services like weather forecasting and resource assessment
- Sports analytics to enhance players' performance while reducing health risks
All these events produce data, and lots of it. The sheer frequency of this emission has made the resulting volume an increasing burden on the digital space.
What is Streaming Data?
A survey conducted last year estimated that, at the current pace of data generation, 1.7 MB of data will be created every second for every person on Earth by 2020.
Data poured continuously from millions of sources every second has become a fact we can't ignore. The Big Data discipline was an eye-opener for the tech world: it showed how this once-irritating data could be put to good use. Today, that same data is collected and analyzed by a new breed of engineers: data scientists.
Because it flows continuously and usually arrives in small sizes (on the order of kilobytes), this data, commonly referred to as streaming data, is collected simultaneously as records and sent on for further processing.
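To make this concrete, a streaming source can be simulated as a generator that emits a series of small records, each one a self-contained event well under a kilobyte (the field names here are purely illustrative):

```python
import json

def click_stream(n):
    """Simulate a stream of small, self-contained records."""
    for i in range(n):
        record = {"user_id": i % 3, "action": "click", "seq": i}
        # Records typically travel over the wire as serialized bytes.
        yield json.dumps(record).encode("utf-8")

for rec in click_stream(3):
    print(rec)  # each record is only a few dozen bytes
```

Real pipelines simply scale this pattern up: millions of such records per second, collected as they are produced and forwarded for processing.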
Stream Processing to Decisions
A streaming data processing architecture usually comprises two layers: a storage layer and a processing layer. The former orders large streams of records and provides fast, durable access to them. The processing layer consumes the data, executes computations, and notifies the storage layer to discard records that have already been processed. Processing is done either incrementally, record by record, or by matching over sliding time windows. The processed data is then subjected to streaming analytics operations, and the derived information is used to make context-based decisions. For instance, companies can track shifts in public sentiment about their products by continuously analyzing social media streams; the world's most influential nations can intervene in decisive events such as presidential elections in other powerful countries; and mobile apps can offer personalized product recommendations based on device geolocation and user emotions.
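The two-layer structure above can be sketched in a few lines of Python. This is a minimal illustration, not any particular service's API: the storage layer keeps records in arrival order, and the processing layer consumes them incrementally, removing each record once it has been processed.

```python
from collections import deque

class StorageLayer:
    """Ordered, append-only buffer of records."""
    def __init__(self):
        self._records = deque()

    def append(self, record):
        self._records.append(record)

    def take(self):
        """Hand the oldest unprocessed record to the processing layer."""
        return self._records.popleft() if self._records else None

class ProcessingLayer:
    """Consumes records one at a time and keeps a running aggregate."""
    def __init__(self, storage):
        self.storage = storage
        self.count = 0

    def poll(self):
        record = self.storage.take()  # record is discarded once processed
        if record is not None:
            self.count += 1
        return record

storage = StorageLayer()
for value in (3, 5, 7):
    storage.append(value)

processor = ProcessingLayer(storage)
while processor.poll() is not None:
    pass
print(processor.count)  # 3 records consumed, storage is now empty
```

A managed service such as Kinesis plays the role of `StorageLayer` at scale, with ordering guaranteed per shard rather than globally.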
Most applications collect a portion of their data at the outset to produce simple summary reports and make simple decisions, such as triggering alarms or calculating a moving average. As time goes by, these requirements become more and more sophisticated, and companies may want deeper insights to perform intricate activities with the aid of machine learning algorithms and data analysis techniques. The continual growth of data has data scientists working around the clock on trailblazing solutions that use as much data as possible to shape better futures through better decisions.
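The simple summary decisions mentioned above can be captured in a compact sketch: each incoming record updates a fixed-size window, and an alarm fires when the moving average crosses a threshold (the window size and threshold here are arbitrary example values).

```python
from collections import deque

def moving_average_alarm(stream, window=3, threshold=10.0):
    """Yield (value, moving_average, alarm) for each record in the stream."""
    buf = deque(maxlen=window)  # old values fall out automatically
    for value in stream:
        buf.append(value)
        avg = sum(buf) / len(buf)
        yield value, avg, avg > threshold

readings = [8, 9, 11, 14, 15]
for value, avg, alarm in moving_average_alarm(readings):
    print(f"value={value} avg={avg:.2f} alarm={alarm}")
```

The alarm stays quiet while individual spikes are absorbed by the window, and only fires once the average itself trends past the threshold, which is exactly why moving averages are a popular first stream-processing computation.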
Choosing the ideal cloud provider to fit organizational requirements can be overwhelming. However, because stream processing has become so ubiquitous, all the major cloud service providers offer competitive options to accommodate it. Here's a list of commonly used serverless services that bolster enterprise-grade applications relying heavily on streaming data.
Infographic: Serverless Stream Processing Components
Real-World Use Cases
Many companies use insights from stream analytics to improve visibility into their businesses, which allows them to deliver a personalized experience to customers. Additionally, near-real-time transparency gives these firms the flexibility to address emergencies promptly. The emergence of serverless architecture has driven all the leading cloud platforms to present complementary solutions, making stream processing available for serverless application development through fully managed, cloud-based services for real-time processing of large distributed data streams.
1. Hyper-Personalized Television!
Netflix, the leading online television network in the world, developed a solution that centralizes its flow logs using Amazon Kinesis Streams. For a system processing billions of traffic flows every day, the absence of a database in the architecture eliminates plenty of complexity. Thanks to high scalability and lightning speed, Netflix can monitor its application at massive scale, discovering and addressing issues as they arise. Together with its upgraded recommendation algorithm, video transcoding, and licensing of popular media, this grants subscribers a seamless experience. With subscriber numbers growing exponentially, the company's responsibilities increase by the day; however, nothing seems to be a problem for Netflix for many years to come, since it is considered to have a sound decision-making model.
2. Empowering the Decision Makers
As a leading source of integrated and intelligent information for businesses and professionals, Thomson Reuters serves decision makers in a wide range of domains, including finance and risk, science, legal, and technology. The company built an in-house analytics engine to take full control of its data and moved to AWS because it was familiar with AWS's capabilities and scale. The new real-time pipeline, attached to an Amazon Kinesis stream, yields a more perceptive customer experience, delivering accurate economic forecasts and financial trends for beneficiaries that include a range of government activities.
3. Unicorn: A Solution to Traffic Congestion
Jakarta has become a heavily congested city where the motorcycle is deemed the most efficient mode of transport. To exploit this business opportunity, GO-JEK, one of the few unicorn businesses in Southeast Asia, started as a call center for motorcycle taxi bookings. However, to meet demand that exceeded expectations, the company had to consider expansion. Now, with the support of Google Cloud Professional Services, a business architecture built on Cloud Dataflow for stream inference enables it to predict changes in demand effectively.
Shortcomings in Stream Processing
Serverless stream processing is increasingly becoming a vital part of decision-making engines. However, with the current set of features, it's not the ideal solution for some scenarios. Implementing real-time analytics for sliding windows and temporal event patterns is not a course for the faint-hearted.
The best way to assimilate never-ending data of this magnitude is through real-time dashboards, which require additional data organization and persistence. These maneuvers introduce undesirable latency and data management issues. However, the technology is evolving to catch up, integrating advanced cloud data management techniques to produce materialized views.
Poor Data Analytics, Poor Decisions
Stream processing typically operates over a time-based or record-based window, in contrast to batch processing, which can lead to challenges in use cases that require query re-execution.
Nowadays, application requirements grow beyond aggregated analytics. Increasing the window size seems an appropriate temporary solution, but it creates another intractable problem: memory management. Modern solutions usually provide advanced memory management and scheduling techniques to overcome this.
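The memory trade-off behind this problem is easy to demonstrate. The sketch below (not any specific engine's API) keeps a time-based window: records are evicted by timestamp, so the memory the window holds depends on both the window span and the arrival rate, which is exactly what grows when the window is widened.

```python
from collections import deque

class TimeWindow:
    """Keep only records whose timestamps fall inside the last `span` seconds."""
    def __init__(self, span_seconds):
        self.span = span_seconds
        self.records = deque()  # (timestamp, value) pairs in arrival order

    def add(self, value, now):
        self.records.append((now, value))
        # Evict expired records; memory held here grows with arrival rate.
        while self.records and now - self.records[0][0] > self.span:
            self.records.popleft()

    def aggregate(self):
        return sum(v for _, v in self.records)

w = TimeWindow(span_seconds=60)
w.add(5, now=0)
w.add(7, now=30)
w.add(2, now=100)     # records from t=0 and t=30 are now outside the window
print(w.aggregate())  # only the record inside the 60s window remains
```

Doubling `span_seconds` doubles the worst-case number of records held in memory at a given arrival rate; production engines mitigate this with incremental aggregates and spill-to-disk scheduling rather than retaining raw records.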
All in all, it's apparent that serverless stream processing has been playing a prominent role around us without our even knowing. With its power, applications can evolve from traditional batch processing to real-time analytics, and the revelation of profound insights will enable effective decision making without the need to manage infrastructure. Even today, many organizations practice orthodox decision-making strategies based on analytics derived from the big data clusters of the past. The new horizons of serverless and real-time data processing are equipped with the power to make effective decisions and create a more productive, relevant, and, most importantly, secure world around you.
Will serverless stream processing make your future brighter by making the best decisions?
What do you think?
Share your thoughts, and check my personal blog.
Published at DZone with permission of Chamath Kirinde , DZone MVB. See the original article here.