Building a Real Time Streaming Solution with Apache Spark and EVAM
Apache Spark (http://spark.apache.org) is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. Spark is growing in popularity, and supports batch and streaming workloads, graph processing, and machine learning, as well as integration with Hadoop and MapReduce-style processing. It’s an ideal platform on which to explore building a real time event processing solution.
EVAM is an established leader in real time event processing, with over 40 solutions supporting real time event processing for over 200 million end-users globally. In recent years EVAM has explored building customer solutions that leverage popular services such as AWS Kinesis and Redshift. In this article we explore a solution architecture built on Apache Spark combined with EVAM’s event processing engine.
Real Time Event Processing Requirements:
Real time customer engagement systems pose a strict set of requirements, centered on “event to action” within 50 milliseconds. Achieving this level of responsiveness is possible through selective data integration, combining technical events in a way that is useful for customers and the business. The general requirements include:
- Data integration with any source, including click stream, logs, transactional systems, IoT, Twitter, and others
- Real time ingestion through Kafka, Kinesis, and other systems
- Event processing that combines events and non-events with a time window and customer profile data, to trigger actions. For the purpose of this article we will refer to each such combination as a “scenario.”
- With tens or hundreds of scenarios, it’s necessary to constrain and prioritize actions. Actions related to customer support presumably will take precedence over “new offer” actions, for example.
- A sequence of events which fulfills one scenario should be available as input for other scenarios.
- The latency of “event to action” should be no longer than 50 milliseconds.
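The requirement that a fulfilled scenario be available as input to other scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EVAM’s or Spark’s implementation: the `Event`, `Scenario`, and `Engine` classes, the event names, and the window lengths are all hypothetical, and the sketch ignores non-events, profile data, and the 50 millisecond latency target.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    actor_id: str
    name: str
    timestamp: float  # seconds


@dataclass(frozen=True)
class Scenario:
    name: str
    required: frozenset  # event names that must all occur inside the window
    window_seconds: float


class Engine:
    """Hypothetical engine: a fulfilled scenario is re-submitted as an event."""

    def __init__(self, scenarios):
        # Assumption: scenarios do not depend on each other in a cycle.
        self.scenarios = scenarios
        self.history = defaultdict(list)  # actor_id -> events seen so far
        self.fired = []                   # (actor_id, scenario name) in firing order

    def submit(self, event):
        queue = deque([event])
        while queue:
            current = queue.popleft()
            self.history[current.actor_id].append(current)
            for scenario in self.scenarios:
                if current.name not in scenario.required:
                    continue
                if self._fulfilled(scenario, current):
                    self.fired.append((current.actor_id, scenario.name))
                    # The scenario itself becomes an event other scenarios can use.
                    queue.append(Event(current.actor_id, scenario.name, current.timestamp))

    def _fulfilled(self, scenario, current):
        cutoff = current.timestamp - scenario.window_seconds
        seen = {
            e.name
            for e in self.history[current.actor_id]
            if cutoff <= e.timestamp <= current.timestamp
        }
        return scenario.required <= seen


engine = Engine([
    Scenario("checkout_trouble", frozenset({"add_to_cart", "payment_failed"}), 600),
    Scenario("support_outreach", frozenset({"checkout_trouble", "help_page_viewed"}), 3600),
])
engine.submit(Event("c1", "add_to_cart", 0))
engine.submit(Event("c1", "help_page_viewed", 100))
engine.submit(Event("c1", "payment_failed", 200))
print(engine.fired)
```

In this sketch the third event completes `checkout_trouble`, and the derived event in turn completes `support_outreach` for the same actor. A production engine would also need to decide whether a scenario may fire repeatedly within one window, which this sketch does not address.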
A Conceptual Model for Real Time Event Processing:
Having been involved in real time event solutions for the past decade, I’ve learned the importance of a business abstraction layer for real time customer engagement. A business abstraction layer defines events in terms meaningful for the business, and creates the opportunity for business analysts to define and implement scenarios.
The model is focused on an “actor” who generates events. The actor is typically a person, but can be a device on the network (an IoT device or CPE). Events originate as technical events, which are combined to constitute a “business” event. For example, the creation of a new customer can involve the creation of records in multiple tables in a relational database, each of which is individually recognized as an event, but which are combined to form the “customer creation” event.
A robust model incorporates support for non-events, and aggregation of events over time. A non-event is an expected event that fails to occur: for example, a customer purchases a new device or service, but it is not registered on the network within a certain time window. Recognizing the lack of an event can be critically important for business systems.
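As a rough illustration of how technical events might be rolled up into a single business event, the Python sketch below waits until every required table has reported an event for an actor and then emits one “customer creation” event. The table names, the `BusinessEventAssembler` class, and the shape of the emitted event are assumptions made for the example; they do not describe EVAM’s actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical set of tables touched when a customer is created.
REQUIRED_TABLES = frozenset({"customer", "address", "billing_account"})


@dataclass
class BusinessEventAssembler:
    required: frozenset = REQUIRED_TABLES
    pending: dict = field(default_factory=dict)  # actor_id -> tables seen so far

    def on_technical_event(self, actor_id, table):
        """Record one technical event; return a business event once all are seen."""
        if table not in self.required:
            return None  # unrelated technical event, ignored by this assembler
        seen = self.pending.setdefault(actor_id, set())
        seen.add(table)
        if seen == self.required:
            del self.pending[actor_id]
            return {"actor_id": actor_id, "event": "customer_creation"}
        return None


assembler = BusinessEventAssembler()
results = [
    assembler.on_technical_event("c42", table)
    for table in ["customer", "audit_log", "address", "billing_account"]
]
print(results[-1])
```

Only the last call returns a business event; the earlier calls return `None` because the set of technical events is still incomplete or the table is not relevant.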
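Detecting a non-event usually means tracking a deadline rather than an occurrence. The following Python sketch, using the purchase and registration example, keeps a deadline per actor and reports actors whose deadline passes without the expected registration. The `NonEventDetector` class, its method names, and the one hour window are hypothetical choices for illustration only.

```python
import heapq


class NonEventDetector:
    """Flags purchases not followed by a network registration within a window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.deadlines = []  # min-heap of (deadline, actor_id)
        self.waiting = {}    # actor_id -> deadline still being awaited

    def on_purchase(self, actor_id, timestamp):
        deadline = timestamp + self.window_seconds
        self.waiting[actor_id] = deadline
        heapq.heappush(self.deadlines, (deadline, actor_id))

    def on_registration(self, actor_id, timestamp):
        deadline = self.waiting.get(actor_id)
        # A registration only cancels the non-event if it arrives in time.
        if deadline is not None and timestamp <= deadline:
            del self.waiting[actor_id]

    def advance_to(self, now):
        """Return actors whose deadline has passed without a registration."""
        missed = []
        while self.deadlines and self.deadlines[0][0] < now:
            deadline, actor_id = heapq.heappop(self.deadlines)
            if self.waiting.get(actor_id) == deadline:
                del self.waiting[actor_id]
                missed.append(actor_id)
        return missed


detector = NonEventDetector(window_seconds=3600)
detector.on_purchase("alice", 0)
detector.on_purchase("bob", 10)
detector.on_registration("alice", 1800)
early = detector.advance_to(3000)
missed = detector.advance_to(4000)
print(early, missed)
```

Here `alice` registers within the window and is never reported, while `bob` is reported once the clock moves past his deadline. The clock is advanced explicitly so that the logic can be driven by event time instead of wall-clock time.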
The resulting actions require context such as name, email address, and other “actor” data for engaging via email or other channels.
Finally, it’s important to support prioritization and constraints on scenarios. Customers that are in the midst of a “technical support” scenario can be treated differently than other customers. Equally important is that customers aren’t inundated with multiple actions, as it’s fairly common for multiple scenarios to be triggered within a short time frame. Effective management strategies for scenarios are among the many challenges posed in building a robust real time engagement solution on top of Spark or other open source frameworks.
In addition to prioritization and support for global constraints on scenarios, it’s also important to monitor scenarios with a real time dashboard. Monitoring scenarios can lead to enhancements and optimizations, which are straightforward to implement if scenarios are exposed through templates whose parameters can be updated.
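One simple way to picture prioritization combined with a global constraint is a per-actor cap on actions within a time window, applied after sorting candidate actions by priority. The Python sketch below is a minimal example under assumed conventions: the `Action` and `ActionGovernor` names, the rule that a lower number means higher priority, and the limit of one action per day are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    actor_id: str
    scenario: str
    priority: int  # assumed convention: lower number = more important
    timestamp: float


class ActionGovernor:
    """Allows at most `max_actions` per actor within `window_seconds`."""

    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.sent = {}  # actor_id -> timestamps of actions already allowed

    def select(self, candidates, now):
        """From actions triggered together, return those allowed to go out."""
        allowed = []
        for action in sorted(candidates, key=lambda a: a.priority):
            recent = [
                t for t in self.sent.get(action.actor_id, [])
                if now - t < self.window_seconds
            ]
            if len(recent) < self.max_actions:
                recent.append(now)
                allowed.append(action)
            self.sent[action.actor_id] = recent
        return allowed


governor = ActionGovernor(max_actions=1, window_seconds=86400)
first = governor.select(
    [
        Action("c1", "new_offer", 5, 100),
        Action("c1", "technical_support", 1, 100),
        Action("c2", "new_offer", 5, 100),
    ],
    now=100,
)
later = governor.select([Action("c1", "new_offer", 5, 200)], now=200)
print([(a.actor_id, a.scenario) for a in first], later)
```

For customer `c1` the support action wins over the offer, and a further offer shortly afterwards is suppressed because the daily limit is already used. Customer `c2` is unaffected and still receives the offer.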
Putting it all together:
One challenge associated with enterprise Big Data strategies is simply to organize the range of use cases and technical requirements. In this article we’ve focused on support for complex real time event processing for customer (and device) engagement, with a corresponding need to recognize combinations of events and non-events with time windows and customer profile data, along with an overlay of prioritization and constraints for different scenarios.
Spark provides an ideal framework, offering widely adopted programming support for data integration, technical event processing, and a range of batch processes. In the solution architecture, Spark provides an effective front end to the EVAM event engine, which provides a business event abstraction suitable for supporting complex scenarios.
The EVAM event processing engine is easily integrated with cloud-based designs (in another article we’ll outline how EVAM is used with AWS Kinesis, Redshift, and other services to serve a global wireless carrier). In this article we’ve highlighted how EVAM integrates with Apache Spark.
In this architecture, EVAM hosts the in-memory resilient cache of real time events, and associated business rules for scenario recognition. The EVAM design includes a Visual Scenario designer, which uses the input of Spark technical events to recognize a higher level business event. EVAM’s design allows for complex scenarios, which can mix real time events and non-events with time windows and customer profile data (customer name, email, address, payment status). EVAM is also well suited to act as an “enterprise event hub” to which existing legacy event processing systems are routed, for a centralized enterprise-wide view of events and associated actions.
Real time event processing is an exciting space. It’s technically interesting, but is also proving to be a practical solution for real business value. Rather than collecting “everything,” these systems focus on real time event collection to support specific scenarios. Data collection is focused and generates immediate insight into customer behavior, with associated real time actions. These systems reduce churn, lower the customer support burden, and improve cross selling and revenues.
Apache Spark will continue to grow in popularity, as it provides an increasingly mature framework for real time data collection, with support for a range of batch processing functions including Graph, Hadoop, and others. Delivering an effective real time event management system on Spark, however, would be a large undertaking. Such systems require logic for event recognition that includes real time events, non-events, time windows, and customer profile data. Developing such a system on Spark, along with the flexibility to prioritize and constrain scenarios, is not a realistic goal for most teams.
A practical approach would make use of Spark alongside a proven enterprise real time event processing engine, such as the one provided by EVAM. My firm, EVAM, is a leader in real time event processing, with over forty enterprises relying on EVAM to support over 200 million end-users. In another article we explore how EVAM was deployed on AWS, using Kinesis, Redshift, and other services to provide a global wireless carrier with a real time event solution.
To learn more about Real Time Streaming solutions, visit our site and let us know how we can help at http://www.evam.com
Opinions expressed by DZone contributors are their own.