DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Fraud Detection With Apache Kafka, KSQL, and Apache Flink

Fraud Detection With Apache Kafka, KSQL, and Apache Flink

Exploring fraud detection case studies and architectures with Apache Kafka, KSQL, and Apache Flink with examples, guide images, and informative details.

Kai Wähner user avatar by
Kai Wähner
CORE ·
Jan. 26, 23 · Presentation
Like (7)
Save
Tweet
Share
10.44K Views

Join the DZone community and get the full member experience.

Join For Free

Fraud detection becomes increasingly challenging in a digital world across all industries. Real-time data processing with Apache Kafka became the de facto standard to correlate and prevent fraud continuously before it happens. This article explores case studies for fraud prevention from companies such as Paypal, Capital One, ING Bank, Grab, and Kakao Games that leverage stream processing technologies like Kafka Streams, KSQL, and Apache Flink.

Fraud Detection and the Need for Real-Time Data

Fraud detection and prevention is the adequate response to fraudulent activities in companies (like fraud, embezzlement, and loss of assets because of employee actions).

An anti-fraud management system (AFMS) comprises fraud auditing, prevention, and detection tasks. Larger companies use it as a company-wide system to prevent, detect, and adequately respond to fraudulent activities. These distinct elements are interconnected or exist independently. An integrated solution is usually more effective if the architecture considers the interdependencies during planning.

Real-time data beats slow data across business domains and industries in almost all use cases. But there are few better examples than fraud prevention and fraud detection. It is not helpful to detect fraud in your data warehouse or data lake after hours or even minutes, as the money is already lost. This “too late architecture” increases risk, revenue loss, and lousy customer experience.

It is no surprise that most modern payment platforms and anti-fraud management systems implement real-time capabilities with streaming analytics technologies for these transactional and analytical workloads. The Kappa architecture powered by Apache Kafka became the de facto standard replacing the Lambda architecture.

A Stream Processing Example in Payments

Stream processing is the foundation for implementing fraud detection and prevention while the data is in motion (and relevant) instead of just storing data at rest for analytics (too late).

No matter what modern stream processing technology you choose (e.g., Kafka Streams, KSQL, Apache Flink), it enables continuous real-time processing and correlation of different data sets. Often, the combination of real-time and historical data helps find the right insights and correlations to detect fraud with a high probability.

Let's look at a few examples of stateless and stateful stream processing for real-time data correlation with the Kafka-native tools Kafka Streams and ksqlDB. Similarly, Apache Flink or other stream processing engines can be combined with the Kafka data stream. It always has pros and cons. While Flink might be the better fit for some projects, it is another engine and infrastructure you need to combine with Kafka.

Ensure you understand your end-to-end SLAs and requirements regarding latency, exactly-once semantics, potential data loss, etc. Then use the right combination of tools for the job.

Stateless Transaction Monitoring With Kafka Streams

A Kafka Streams application, written in Java, processes each payment event in a stateless fashion one by one:

Transaction Monitoring for Fraud Detection with Kafka Streams

Stateful Anomaly Detection With Kafka and KSQL

A ksqlDB application, written with SQL code, continuously analyzes the transactions of the last hour per customer ID to identify malicious behavior:

Anomaly Detection

Kafka and Machine Learning With TensorFlow for Real-Time Scoring for Fraud Detection

A KSQL UDF (user-defined function) embeds an analytic model trained with TensorFlow for real-time fraud prevention:

Machine Learning

Case Studies Across Industries

Several case studies exist for fraud detection with Kafka. It is usually combined with stream processing technologies, such as Kafka Streams, KSQL, and Apache Flink. Here are a few real-world deployments across industries, including financial services, gaming, and mobility services:

  • Paypal: processes billions of messages with Kafka for fraud detection.
  • Capital One: looks at events as running its entire business (powered by Confluent), where stream processing prevents $150 of fraud per customer on average per year by preventing personally identifiable information (PII) violations of in-flight transactions.
  • ING Bank: started many years ago by implementing real-time fraud detection with Kafka, Flink, and embedded analytic models.
  • Grab: is a mobility service in Asia that leverages fully managed Confluent Cloud, Kafka Streams, and ML for stateful stream processing in its internal GrabDefence SaaS service.
  • Kakao Games: a South-Korean gaming company uses data streaming to detect and operate anomalies with 300+ patterns through KSQL.

Let’s explore the latter case study in more detail.

Deep Dive Into Fraud Prevention 

With Kafka and KSQL in Mobile Gaming

Kakao Games is a South Korea-based global video game publisher specializing in games across various genres for PC, mobile, and VR platforms. The company presented at Current 2022—The Next Generation of Kafka Summit in Austin, Texas.

Here is a detailed summary of their compelling use case and architecture for fraud detection with Kafka and KSQL.

Use Case: Detect Malicious Behavior by Gamers in Real-Time

The challenge is evident when you understand the company’s history: Kakao Games has many outsourced games purchased via third-party game studios. Each game has its unique log with its standard structure and message format. Reliable real-time data integration at scale is required as a foundation for analytical business processes like fraud detection.

The goal is to analyze game logs and telemetry data in real-time. This capability is critical for preventing and remediating threats or suspicious actions from users.

Architecture: Change Data Capture and Streaming Analytics for Fraud Prevention

The confluent-powered event streaming platform supports game log standardization. ksqlDB analyzes the incoming telemetry data for in-game abuse and anomaly detection.

Kakao Games 1

Source: Kakao Games (Current 2022 in Austin, Texas)

Implementation: SQL Recipes for Data Streaming With KSQL

Kakao Games detects anomalies and prevents fraud with 300+ patterns through KSQL. Use cases include bonus abuse, multiple account usage, account takeover, chargeback fraud, and affiliate fraud.

Here are a few code examples written with SQL code using KSQL:

Kakao Games 2

Source: Kakao Games (Current 2022 in Austin, Texas)

Results: Reduced Risk and Improved Customer Experience

Kakao Games can do real-time data tracking and analysis at scale. Business benefits are faster time to market, increased active users, and more revenue thanks to a better gaming experience.

Conclusion

Ingesting data with Kafka into a data warehouse or a data lake is only part of a good enterprise architecture. Tools like Apache Spark, Databricks, Snowflake, or Google BigQuery enable finding insights within historical data. But real-time fraud prevention is only possible if you act while the data is in motion. Otherwise, the fraud already happened before you detected it. 

Stream processing provides a scalable and reliable infrastructure for real-time fraud prevention. The choice of the right technology is essential. However, all major frameworks, like Kafka Streams, KSQL, or Apache Flink, are very good. Hence, the case studies of Paypal, Capital One, ING Bank, Grab, and Kakao Games look different. Still, they have the same foundation with data streaming powered by the de facto standard Apache Kafka to reduce risk, increase revenue, and improve customer experience.

If you want to learn more about streaming analytics with the Kafka ecosystem, check out how Apache Kafka helps in cybersecurity to create situational awareness and threat intelligence and how to learn from a concrete fraud detection example with Apache Kafka in the crypto and NFT space.

How do you leverage data streaming for fraud prevention and detection? What does your architecture look like? What technologies do you combine? Let’s connect on LinkedIn and discuss it!

Anomaly detection Apache Flink Data processing Stream processing Data (computing) kafka Data lake Data warehouse Java (programming language) AWS Lambda

Published at DZone with permission of Kai Wähner, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 5 Software Developer Competencies: How To Recognize a Good Programmer
  • Create a REST API in C# Using ChatGPT
  • Java Code Review Solution
  • A Beginner's Guide to Infrastructure as Code

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: