Refcard #390

Getting Started With Real-Time Analytics

Real-time analytics is necessary for any business that needs to make decisions in hours, minutes, or seconds. Implementing real-time analytics requires processing high volumes of input data and matching it with existing data in minutes, seconds, or even less time. This Refcard aims to acquaint readers with real-time analytics, where it is used, how it works, and the challenges involved.

Written By

author avatar Sida Shen
Product Marketing Manager, CelerData
Table of Contents
  • What Is Real-Time Analytics?
  • About Real-Time Analytics
  • Common Challenges of Real-Time Analytics
  • Getting Started With Real-Time Analytics
  • Conclusion
Section 1

What Is Real-Time Analytics?

Real-time analytics is necessary for any business that needs to make decisions in hours, minutes, or seconds. Implementing real-time analytics requires processing high volumes of input data and matching it with existing data in minutes, seconds, or even less time.

This Refcard aims to acquaint you with real-time analytics, where it is used, how it works, and the challenges involved.

Real-time analytics involves processing data as soon as it comes into a system. Analytics is the practice of processing data and of discovering and communicating meaningful insights from it through math, statistics, and machine learning. Traditionally, analytical systems process a large amount of data, often on the scale of petabytes, and have few users running relatively large queries when compared to transactional or operational systems. This profile has changed in recent years, and analytical systems are now often user- and customer-facing. They are also frequently used as part of a larger workflow where the "user" is another system or artificial intelligence solution that makes data-driven decisions.

Real time is a relative term in the sphere of analytics. Generally speaking, it means data freshness is measured in minutes or seconds instead of hours or days. The exact threshold depends on user and business expectations. As a rule of thumb, data that is accumulated into large blocks and then processed can be considered "batch," and data that is processed soon after it is received can be considered "real-time."

Importance of Real-Time Analytics in Today's Business Environment 

Businesses have changed fundamentally over the past few years. Business operations are becoming increasingly fast-paced and complex. Real-time analytics provides up-to-the-minute insights into various aspects of the business, enabling quick and informed decision-making, especially in industries where market conditions change rapidly, such as adtech, crypto, or finance. In these industries and many others, competitive advantage is gained by reacting swiftly to market changes.

Meanwhile, fraud and security breaches are a bigger threat than ever before, and real-time analytics is essential to both detecting and preventing them. Real-time analytics has also become more crucial across industries where supply chains are now constantly changing due to the dynamic geopolitical and economic environment. Often, everything from consumer devices to manufacturing equipment is instrumented with sensors and provides real-time data; analytics must keep up in order to supply correct control and telemetry data to the people and algorithms that control and monitor these devices.

Whether it is due to industry-specific requirements or to the changing nature of customer expectations and competitive requirements, real-time analytics has become an integral tool.

Section 2

About Real-Time Analytics

Real-time analytics shares a lot in common with batch analytics, including the need for data transformation. However, it differs in both timeframe and implementation.

How Real-Time Processing Differs from Traditional Analytics 

Batch processing systems operate on an accumulated set of data over a specified time interval. These systems usually consist of regular load processes that extract data, transform it often into summary tables, and load it into a destination system. Often, tools like Apache Spark are used to process the data before loading it into a destination system, usually a data warehouse like Teradata or Snowflake.

Real-time analytics processes data as soon as it arrives, ideally within seconds. Real-time systems usually include a message or event queue like Apache Kafka, often paired with a stream processor like Apache Flink. Data is often sent to the operational system and transformed and loaded into the analytics system either one after the other or simultaneously. The message or event queue and its associated logic are called a real-time data pipeline or data transformation pipeline.

Real-time data pipelines also process data as soon as it arrives, almost instantaneously. In order to ensure data can be queried efficiently, data pipelines transform the data into a more denormalized form, pre-joining and often pre-aggregating data. The data lake or data warehouse ends up containing a more efficiently queried summarized form of the data in the operational system. The primary difference between batch and real-time analytics is simply batching.
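To make the point about batching concrete, here is a minimal sketch in plain Python. The event tuples, `batch_totals`, and `StreamingTotals` are hypothetical names invented for illustration; a production pipeline would read from a queue such as Kafka rather than an in-memory list. Both styles end with the same totals, but the streaming version has a usable result after every event.

```python
from collections import defaultdict

# Hypothetical events: (minute, page, views). Values are illustrative only.
EVENTS = [
    (0, "home", 3),
    (0, "cart", 1),
    (1, "home", 2),
    (1, "cart", 4),
    (2, "home", 5),
]


def batch_totals(events):
    """Batch style: wait for the full set, then aggregate once."""
    totals = defaultdict(int)
    for _minute, page, views in events:
        totals[page] += views
    return dict(totals)


class StreamingTotals:
    """Streaming style: update the aggregate as each event arrives."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, event):
        _minute, page, views = event
        self.totals[page] += views
        # A fresh snapshot is available immediately after every event.
        return dict(self.totals)


stream = StreamingTotals()
snapshots = [stream.on_event(e) for e in EVENTS]
```

The final snapshot equals the batch result; the difference is only when each intermediate answer becomes available.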

Real-World Examples of Real-Time Analytics Usage

There can be no exhaustive list of every real-time analytics use case. First, new innovations are happening every day as the global economy increasingly digitizes. Secondly, competition is forcing previously slow-moving industries to move faster and provide service on demand. With that said, there are some places where real-time analytics is a must or has a clear advantage:

Table 1

Usage: Security and threat detection

Data requirements:

  • Requires analyzing massive amounts of data.
  • To be effective, it must happen in real time.

Real-world examples: According to Statista, in 2022, there were 1,802 reported data compromises, which affected more than 422 million people. While there are yearly fluctuations, the trend has been upward virtually since the start of modern computing. Detecting a breach is useful even after the fact, but shutting down anomalous activity before there is damage is crucial for sensitive data and systems.

Usage: Fraud and risk analysis

Data requirements:

  • Real-time analytics, often combined with machine learning and other algorithmic techniques, can detect when transactions are "abnormal" or unusually risky.
  • Traditionally, this analysis was largely done in batches after the fact.
  • Modern payment systems are real-time, so fraud and risk detection must be as well.

Real-world examples: According to the NICE Actimize 2023 Fraud Insights Report, fraudulent transactions have risen 92% year over year, and the amounts are up 146%.

Fraud has been especially challenging to cryptocurrency firms, where major incidents have helped convince a majority of the public that crypto investments are unusually risky, according to CNBC. To combat this, crypto firms are implementing anti-money-laundering (AML) and know-your-customer (KYC) systems using real-time analytics. The crypto industry requires new database technology, as analyzing blockchains is more intensive than analyzing flat transaction logs.

Usage: Network telemetry and traffic monitoring

Data requirements:

  • Related to security and threat detection, but also covers other issues such as misconfiguration, faulty equipment, or traffic congestion.
  • These issues can be detected and corrected.
  • Requires real-time analytics at extreme volume, as the system essentially has to outrun the network, at least at some sample rate.

Real-world examples: Real-time analytics can ensure networks are stable. As a result, faulty equipment is routed around and replaced, and traffic is moved to appropriate routes on redundant networks.

Usage: Online user behavior tracking

Data requirements:

  • Used across industries, especially gaming and e-commerce.
  • Every click, mouseover, and scroll generates data.

Real-world examples: By understanding what users do and why, vendors can provide customers with a better experience and, ultimately, close more deals. Internet giants, like Airbnb in the US and Tencent and Alibaba in China, have implemented systems to help algorithmically understand their users as well as provide information to professionals.

Usage: Supply chain and inventory management

Data requirements:

  • Has become a real-time endeavor in recent years.
  • Before 2020, seeing an empty shelf at a major retailer in a developed country was unheard of. Shelf space was too valuable, and there was never a reason to be out of stock in economies of abundance.
  • While empty slots are still costly, they are now more common.

Real-world examples: In recent years, supply chains have become more dynamic and, in many cases, more risky. Gone are the days of permanent contracts like the storied contract of yesteryear between Ford and Firestone.

Now, everyone from brick-and-mortar retail to manufacturing and beyond must be aware of their entire inventory and supply chains, and be prepared to make changes.
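As an illustration of the fraud and risk analysis use case in Table 1, the sketch below flags a transaction as "abnormal" when its amount is far from the recent average. This is a deliberately simple rolling z-score check, not the method used by any vendor named above; the class name `AmountAnomalyDetector`, its parameters, and the sample amounts are all assumptions made for the example.

```python
import statistics
from collections import deque


class AmountAnomalyDetector:
    """Flags amounts far from the recent mean (rolling z-score)."""

    def __init__(self, window=50, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # recent "normal" amounts
        self.threshold = threshold           # z-score above which we flag
        self.min_history = min_history       # warm-up before flagging starts

    def check(self, amount):
        """Return True if `amount` looks abnormal versus recent history."""
        abnormal = False
        if len(self.history) >= self.min_history:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0:
                abnormal = abs(amount - mean) / stdev > self.threshold
        # Learn only from unflagged amounts so one outlier
        # does not widen the baseline.
        if not abnormal:
            self.history.append(amount)
        return abnormal


detector = AmountAnomalyDetector()
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 950.0, 21.5]
flags = [detector.check(a) for a in amounts]
```

Each transaction is scored the moment it arrives, which is the property that batch-based fraud analysis lacks. Real systems would combine many more signals than the amount alone.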

Should You Use Real-Time Analytics? 

In some industries and use cases, real-time analytics is a clear-cut requirement. If you are not sure, there are four questions to answer:

  1. How fast is data being generated and at what frequency?
    • It's crucial to first assess if your data pipeline can generate fresh data at the source. Real-time analytics becomes futile if the starting point is already outdated data.
  2. How quickly can you make decisions based on this data?
    • Businesses reap the rewards of real-time analytics when they can swiftly translate freshly produced data into actionable insights. Equally crucial to the freshness of data is the rapidity of decision-making. If your business process cannot drive decisions or lead action quickly, then real-time analytics might not be the best fit for you.
  3. Does your business model or strategy benefit from real-time insights?
    • It is widely recognized that fresh data can be extremely advantageous, but these benefits aren't universal. Rather than merely focusing on the perks of real-time analytics, why not flip the perspective? Consider what the upper limit of data freshness is that your operations can effectively manage or tolerate.
  4. Can the benefits justify the cost of real-time analytics?
    • It's crucial to determine if the advantages gained from real-time analytics outweigh the investment it requires. Implementing real-time analytics often demands specific tools and additional resource investments. Despite the emergence of cloud-based solutions reducing some of these costs, real-time analytics still requires significant investments in infrastructure and skilled personnel.

If your data is generated quickly, your business makes decisions and acts on them quickly, and the benefits justify the cost, then real-time analytics may be right for you.
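One rough way to reason about the first three questions is to add up how old the data is by the time someone acts on it. The sketch below is only a back-of-the-envelope model; the stage names, the numbers, and the functions `end_to_end_staleness`, `fits_freshness_budget`, and `slowest_stage` are hypothetical.

```python
def end_to_end_staleness(stages):
    """Worst-case age of the data, in seconds, when a decision is made."""
    return sum(stages.values())


def fits_freshness_budget(stages, tolerable_staleness_s):
    """True if the whole chain stays within what operations can tolerate."""
    return end_to_end_staleness(stages) <= tolerable_staleness_s


def slowest_stage(stages):
    """The stage that contributes the most delay."""
    return max(stages, key=stages.get)


# Hypothetical numbers for illustration only.
stages = {
    "generation_interval_s": 5,   # the source emits a reading every 5 s
    "pipeline_latency_s": 20,     # capture + preprocessing + load
    "decision_latency_s": 600,    # a person reviews the dashboard every 10 min
}
```

With these numbers the decision step dominates, so investing in a faster pipeline alone would barely change how fresh the data is when it is actually used.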

Section 3

Common Challenges of Real-Time Analytics

Real-time analytics is more challenging than batch analytics and generally costs more. It is used where the business use case demands it or in competitive industries seeking an advantage.

Demand for Fresh, Mutable Data 

Real-time analytics is a delicate balancing act. Handling analytics at scale is difficult. Doing everything in seconds is much harder. And the hardest part is dealing with mutable data in real time. Yet many new applications — from SaaS dashboards and network support systems to gaming and finance — require real-time analytics on data that is forever changing. Everything necessary to get the data from the source to the appropriate form in the destination system must happen increasingly fast. Analytical databases have not traditionally handled updates well, but this is increasingly necessary when analyzing in real time.
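To show what handling mutable data involves, here is a minimal sketch of a keyed store that keeps only the newest version of each row and ignores updates that arrive late or twice. The class `LatestRowStore`, the version numbers, and the order records are assumptions for illustration; real analytical databases implement this very differently internally.

```python
class LatestRowStore:
    """Keeps only the newest version of each row, keyed by primary key."""

    def __init__(self):
        self.rows = {}  # key -> (version, row)

    def upsert(self, key, version, row):
        """Apply an update unless an equal or newer version is stored."""
        current = self.rows.get(key)
        if current is not None and current[0] >= version:
            return False  # stale or duplicate update, ignore it
        self.rows[key] = (version, row)
        return True

    def get(self, key):
        stored = self.rows.get(key)
        return None if stored is None else stored[1]


store = LatestRowStore()
applied = [
    store.upsert("order-1", 1, {"status": "created"}),
    store.upsert("order-1", 3, {"status": "shipped"}),
    store.upsert("order-1", 2, {"status": "paid"}),  # arrives out of order
]
```

Queries against the store always see the latest state of `order-1`, even though the updates did not arrive in order.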

Real-Time Preprocessing Pipeline: The Hidden Cost of Real-Time Analytics

Most analytical databases do not handle join operations efficiently at scale. Because of this, it is usually necessary to do some amount of denormalization, pre-aggregation, and transformation before loading data into the analytical system. These data pipelines are complex, difficult to maintain, and make it hard to track data points back to the source. Additionally, the more preprocessing, the less fresh the data.

Next, handling updates to pre-joined or pre-aggregated data often requires reprocessing every row or aggregate derived from the changed record. Finally, many analytics systems are full of data that will never be queried, but which data that is cannot be known in advance. All of these issues make the system costly to operate and difficult to maintain.
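The update problem can be sketched with a few lines of Python. The `customers` and `orders` data and the helper functions below are hypothetical. The point is that a one-row change at the source fans out to every pre-joined row that copied the changed value.

```python
# Hypothetical normalized data: orders reference customers by id.
customers = {1: {"region": "EU"}, 2: {"region": "US"}}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40},
    {"order_id": 11, "customer_id": 1, "amount": 60},
    {"order_id": 12, "customer_id": 2, "amount": 25},
]


def denormalize(orders, customers):
    """Pre-join: copy the customer's region onto every order row."""
    return [{**o, "region": customers[o["customer_id"]]["region"]} for o in orders]


def revenue_by_region(rows):
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


def apply_region_change(wide_rows, customer_id, new_region):
    """Rewrite every pre-joined row for the customer; return rows touched."""
    touched = 0
    for row in wide_rows:
        if row["customer_id"] == customer_id:
            row["region"] = new_region
            touched += 1
    return touched


wide = denormalize(orders, customers)
before = revenue_by_region(wide)

customers[1]["region"] = "UK"                 # one-row change at the source
touched = apply_region_change(wide, 1, "UK")  # fans out to every copied row
after = revenue_by_region(wide)
```

Here one source update touches two wide rows; at production scale the same fan-out can mean rewriting millions of rows and recomputing any aggregates built on them.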

Figure 1: Real-time data pipeline

Compounding these challenges, user and business expectations frequently outpace technical innovation. Data volumes are forever rising, and new techniques require processing even more data at an increased pace. So maintaining these systems also means ensuring they can evolve.

To mitigate these problems, data platform engineers should select data query engines and technology that handle joins and aggregations more efficiently. Moreover, consider technologies that allow transformation directly in the database, also called extract, load, transform (ELT) instead of the traditional extract, transform, load (ETL). Where possible, it's often preferable to adopt data lake technologies rather than black-box proprietary technology. This allows the system to continuously evolve as new technologies become available.
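As a small illustration of the ELT idea, the sketch below loads raw rows untransformed and then expresses the join and aggregation as a view inside the database. It uses Python's built-in `sqlite3` module purely as a stand-in for an analytical engine; the table, column, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw events go in untransformed.
conn.execute("CREATE TABLE raw_events (user_id INTEGER, page TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [(1, "home", 120), (1, "cart", 300), (2, "home", 80), (2, "home", 100)],
)
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, plan TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "pro"), (2, "free")])

# Transform: the join and aggregation live in the database as a view,
# so they are evaluated at query time against the latest raw data.
conn.execute(
    """
    CREATE VIEW page_stats AS
    SELECT u.plan, e.page, COUNT(*) AS views, AVG(e.ms) AS avg_ms
    FROM raw_events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY u.plan, e.page
    """
)

rows = conn.execute(
    "SELECT plan, page, views, avg_ms FROM page_stats ORDER BY plan, page"
).fetchall()
conn.close()
```

Because nothing is pre-joined before loading, a change to a user's plan is a single-row update, and the view reflects it on the next query. Whether this is fast enough depends on how efficiently the chosen engine handles joins and aggregations.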

Section 4

Getting Started With Real-Time Analytics

Implementing real-time analytics involves generating, capturing, preprocessing, and analyzing data, including visualization and reporting, and then acting on the results. The tasks of preprocessing and analysis can be simplified by using newer technologies that enable joins at scale.

Figure 2: Real-time analytics in a nutshell

Step 1: Real-Time Data Generation 

Every process in real-time analytics starts with data. This data is generated by multiple sources such as online transactions, social media interactions, and Internet of Things (IoT) devices. The data can be structured or unstructured and often arrives in various formats that demand different kinds of handling and processing.

Step 2: Data Capture and Ingestion for Real-Time Analytics 

Once data is generated, the next step is data capturing and ingestion. This process involves gathering the generated data from its various sources and importing it into the system where it will be analyzed. In the context of real-time analytics, data capture and ingestion is a continuous cycle that happens frequently. The captured data doesn't just sit idle; it's immediately put to work. This swift and continuous flow is what allows real-time analytics to provide immediate insights and drive quick decision-making.

Step 3: Data Preprocessing for Real-Time Analytics 

Preprocessing refers to cleaning and transforming raw data to make it ready for analysis. This stage can involve filling in gaps where data may be missing, eradicating duplicates, and changing the data into a format that is easier to work with.
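A minimal sketch of these three cleaning steps might look like the following. The field names, the `preprocess` function, and the choice of default values for missing fields are assumptions for illustration; the right defaults depend on the data and the analysis.

```python
from datetime import datetime, timezone


def preprocess(records, default_country="unknown"):
    """Deduplicate by event_id, fill missing fields, and normalize types."""
    seen = set()
    cleaned = []
    for rec in records:
        event_id = rec["event_id"]
        if event_id in seen:
            continue  # drop duplicates, e.g. redelivered messages
        seen.add(event_id)
        cleaned.append({
            "event_id": event_id,
            # Fill gaps with explicit defaults (an assumption of this sketch).
            "country": rec.get("country") or default_country,
            "amount": float(rec.get("amount") or 0),
            # Convert epoch seconds to an ISO 8601 UTC timestamp.
            "ts": datetime.fromtimestamp(rec["ts"], tz=timezone.utc).isoformat(),
        })
    return cleaned


raw = [
    {"event_id": "a", "country": "DE", "amount": "12.50", "ts": 0},
    {"event_id": "a", "country": "DE", "amount": "12.50", "ts": 0},
    {"event_id": "b", "country": None, "amount": None, "ts": 60},
]
clean = preprocess(raw)
```

In a streaming pipeline the same logic would run per event, with the set of seen IDs bounded by a time window rather than growing without limit.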

The pace of real-time analytics poses a unique challenge at this stage — a lot of databases designed for this type of analysis struggle with multi-table queries (JOIN operations). To ensure real-time insights aren't held back by these constraints, users typically perform a process called denormalization during preprocessing.

This preprocessing stage must be fast, or it reduces the "real-time" nature of the analytics. Batch-oriented ETL jobs, such as scheduled Spark jobs, may not work here because they are too slow. Often, a stream processing framework such as Spark Streaming or Flink is used for data pipelines instead. These tools are like high-speed blenders, capable of preparing our "data ingredients" much more quickly, keeping everything fresh. However, they can be a challenge to set up and maintain because of their complexity.

Step 4: Real-Time Data Analysis, Visualization, and Reporting

This is the pivotal stage where the magic of real-time analytics truly unfolds, and it begins with retrieving the data from our real-time database. When analysts work in business intelligence (BI) tools, such as Tableau or Apache Superset, the tools generate SQL queries on the back end that fetch the most current data for real-time dashboards and reports.
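The kind of query a dashboard tile might issue on each refresh can be sketched as follows. This is not the SQL that Tableau or Superset actually emits; `sqlite3` stands in for the real-time database, and the `orders` table, its columns, and `revenue_last_window` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ts INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1000, "EU", 10.0), (1290, "EU", 15.0), (1295, "US", 7.5), (1300, "EU", 5.0)],
)


def revenue_last_window(conn, now_ts, window_s=300):
    """Revenue per region over the trailing window ending at now_ts."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE ts > ? AND ts <= ? GROUP BY region ORDER BY region",
        (now_ts - window_s, now_ts),
    ).fetchall()


# ts values are epoch-style seconds; 1300 plays the role of "now".
tile = revenue_last_window(conn, now_ts=1300)
```

Re-running the same query a few seconds later picks up any rows ingested in the meantime, which is what keeps the dashboard current.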

This freshly retrieved real-time data might also be sent to other applications for a deeper dive. Some of these could be AI-powered applications, using advanced algorithms to go beyond just analyzing the data. They can draw out deeper insights, trends, or even predictions. With real-time analytics, we're not just looking at what's happening now but also anticipating what could happen next.

Step 5: Decision-Making With Real-Time Analytics

This is where the data we've collected, cleaned, and analyzed is finally put to use. This could involve adjusting a marketing strategy in response to user behavior, optimizing system performance, or identifying and responding to potential security threats.

Human analysts, using real-time dashboards and reports, can quickly adjust strategies based on current data trends. Meanwhile, algorithms can make automated adjustments in real time, responding instantly to data-driven triggers. Regardless of the decision-maker, the speed and accuracy of real-time analytics make for an efficient and responsive decision-making process. It's all about reacting promptly and staying ahead of the curve.
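An automated, data-driven trigger of the kind described above can be sketched as a sliding-window check that calls an action when a threshold is crossed. The class `ErrorRateTrigger`, its parameters, and the sample outcomes are hypothetical; the action here simply records the observed rate.

```python
from collections import deque


class ErrorRateTrigger:
    """Fires an action when the error rate over the last N requests is too high."""

    def __init__(self, window=10, max_error_rate=0.3, action=None):
        self.outcomes = deque(maxlen=window)  # 1 = error, 0 = success
        self.max_error_rate = max_error_rate
        self.action = action or (lambda rate: None)

    def observe(self, is_error):
        """Record one outcome; return True if the trigger fired."""
        self.outcomes.append(1 if is_error else 0)
        rate = sum(self.outcomes) / len(self.outcomes)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and rate > self.max_error_rate:
            self.action(rate)
            return True
        return False


alerts = []
trigger = ErrorRateTrigger(window=5, max_error_rate=0.4, action=alerts.append)
outcomes = [False, False, True, False, True, True, True]
fired = [trigger.observe(e) for e in outcomes]
```

This sketch fires on every observation while the rate stays above the threshold; a production trigger would usually add a cooldown or state change so the action runs once per incident.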

Section 5

Conclusion

Real-time analytics is becoming more common as new use cases and industry practices emerge. While real-time analytics is more costly than traditional batch analytics, analysts such as Ventana's Matthew Aslett expect it to grow from 22% to 50% in the next couple of years as companies seek new advantages. By using new technologies, you can minimize the cost of data transformation pipelines and, therefore, increase the freshness of data.

If you're navigating the space of real-time analytics, you are not alone. Check out the following resources to learn more about:

  • How Airbnb implemented Minerva while reducing the amount of data transformation they perform
  • How vectorization improves database performance and makes joins at scale possible, reducing the need for data pipelines
  • How to go pipeline-free with your real-time analytics and ditch denormalization and complex, time-consuming data transformation work
