
Faster and Smarter Analytics to Lower Risk and Increase Profit


Lower risk and increase profit with these tips for real-time analytics.


Don't be risky with your real-time analytics

Financial services institutions complete millions of trades every day while contending with strict regulation and fierce competition.

Sophisticated IT tools are being leveraged to reduce financial risk: algorithms fed by growing amounts of data aim to deliver informed decisions and rapid risk mitigation at the time of the transaction. But the effectiveness of the analysis depends entirely on performance and speed.

By processing large volumes of data more efficiently to accelerate and enrich insights, financial institutions can create a competitive advantage. Their decisions become more objective, informed, and timely, which also means they can be more profitable.


Big Data Bottlenecks

Several factors can slow down analytics. Huge volumes of different types of data must be handled, including trade data, customer data, web logs, research, publications, market data, and public sentiment communicated via social media. Inconsistent data definitions and implementations can complicate data aggregation. Some data warehouses pre-aggregate the data to provide a fixed view, and this data may need to be aggregated again to provide a full risk analysis.

In addition, different network hops between systems can also slow processing times, as shown in the following diagram:

Typical real-time analytics and machine learning architecture

A unified data layer can overcome these challenges, reducing calculation and retrieval times to near real-time. Having one data layer that supports both push and pull of archived data in data lakes or data warehouses (Hadoop or cloud storage) can deliver continuous and on-demand insights.

Solution for Real-Time Insights

Typically, data is stored in a speed layer (a traditional fast database or an in-memory data grid, IMDG) and a batch layer (such as HDFS or S3) where archived data is kept. Data can be streamed into both layers, with the speed layer holding a sliding window of the most recent data, e.g., the last 24 hours or a few days. Users can then query the appropriate layer according to business needs and, when needed, query both layers independently and merge the query results.
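To make the caveats listed below concrete, here is a minimal sketch of querying the two layers independently and merging the results in application code. It assumes both layers expose JDBC endpoints; the speed-layer URL, table, and column names are hypothetical, while the Hive URL mirrors the one used in the policy configuration shown later in this article.

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class TwoLayerQuery {

    // Run the same SQL against one layer and collect rows as strings.
    static List<String> fetch(String jdbcUrl, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                rows.add(rs.getString("symbol") + "," + rs.getBigDecimal("price")
                        + "," + rs.getDate("stock_date"));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        String sql = "SELECT symbol, price, stock_date FROM StockData WHERE symbol = 'ACME'";

        // Speed layer: a sliding window of recent data in a fast database or IMDG
        // (hypothetical URL for illustration only).
        List<String> recent = fetch("jdbc:postgresql://speed-host:5432/trades", sql);

        // Batch layer: archived data in Hive over HDFS.
        List<String> archived = fetch("jdbc:hive2://hive-server:10000/;ssl=false", sql);

        // Merging (and de-duplicating) the two result sets is the application's
        // responsibility, which is one of the caveats listed below.
        List<String> merged = new ArrayList<>(recent);
        merged.addAll(archived);
        merged.forEach(System.out::println);
    }
}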

The caveats for this architecture include:

  • Lifecycle management of both the speed and batch layers.
  • Complex queries are required.
  • Multiple code bases for multiple products.
  • High availability for each component.
  • High availability for the entire workflow.
  • Slow batch layer performance.


Typical Lambda architecture

This process can be greatly simplified and accelerated by ingesting data directly, or from the selected message broker, into a unified data layer and automatically managing the data tiers (both upstream and downstream) from a single platform with a unified API.

The configuration below shows how trading data is automatically loaded to the historical batch layer, per date. In this case, data from before 2018 is considered historical, or non-operational.

The business policy for loading data to the historical layer (Hadoop, S3) can be customized. For example, highly-traded stocks can be stored in the operational speed layer for longer than others.

lambda.policy.trade.class=policies.ThresholdArchivePolicy
lambda.policy.trade.table=model.v1.StockData
lambda.policy.trade.threshold-column=stock_date
lambda.policy.trade.threshold-value=2018-01-01
lambda.policy.trade.threshold-date-format=yyyy-MM-dd
lambda.policy.trade.batch-data-source.class=com.gigaspaces.lambda.JdbcBatchDataSource
lambda.policy.trade.batch-data-source.url=jdbc:hive2://hive-server:10000/;ssl=false
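As noted above, the archiving policy can be customized, for example keeping highly-traded stocks in the speed layer longer than others. The sketch below illustrates that idea only; the actual policy SPI behind policies.ThresholdArchivePolicy is not shown in this article, so the class shape and method signature here are assumptions, not the real interface.

package policies;

import java.time.LocalDate;
import java.util.Set;

// Hypothetical custom policy: heavily-traded symbols stay operational longer.
public class VolumeAwareArchivePolicy /* implements an assumed archive-policy SPI */ {

    // Assumed list of hot symbols kept in the speed layer for two years instead of one.
    private static final Set<String> HOT_SYMBOLS = Set.of("AAPL", "MSFT", "AMZN");

    public boolean shouldArchive(String symbol, LocalDate tradeDate) {
        int retentionYears = HOT_SYMBOLS.contains(symbol) ? 2 : 1;
        return tradeDate.isBefore(LocalDate.now().minusYears(retentionYears));
    }
}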


Even though the defined data has been moved to an external data store, it remains fully available for queries, analysis, and simulations via a unified API that also accesses the operational layer. If a query relates only to data in the operational speed layer, the batch layer is not accessed at all. For example, a parameter such as “the last 90 days” can be used to locate the data on both RAM and SSD, so it can be fetched in 37 milliseconds without touching the batch layer.
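As an illustration of such a tier-aware query, the sketch below issues a “last 90 days” filter through a single JDBC connection. The unified endpoint URL is hypothetical; the assumption is that the platform routes the query to the speed layer alone because the filter falls entirely within its sliding window.

import java.sql.*;
import java.time.LocalDate;

public class SpeedLayerQuery {
    public static void main(String[] args) throws SQLException {
        // Hypothetical unified endpoint; tier routing happens behind this one connection.
        try (Connection conn = DriverManager.getConnection("jdbc:unified://platform-host:4174/trades");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT symbol, AVG(price) FROM StockData WHERE stock_date >= ? GROUP BY symbol")) {
            // Only the last 90 days: the filter never reaches the batch layer.
            ps.setDate(1, Date.valueOf(LocalDate.now().minusDays(90)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s -> %.2f%n", rs.getString(1), rs.getDouble(2));
                }
            }
        }
    }
}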

Running a query that accesses only the speed layer (data on both RAM and SSD) in 37 milliseconds


If the data queried is older than 90 days, it can be fetched from the batch layer in 1,545 milliseconds without the need to access the speed layer.

Running the same query on data from before the date defined in the policy in 1,545 milliseconds


In this example, data both older and newer than 90 days is fetched in 1,530 milliseconds from the speed and batch layers. Only one query is needed to receive the merged results.

Unified result received when running the query on data that is split between the batch and speed tiers


Data from multiple sources, such as trades, positions, market data, and derivatives (forex, interest rate, equity), is streamed through Kafka and ETL tools to a single platform. Using defined business policies, this data is intelligently tiered between RAM, persistent memory, SSD, and external data storage technologies. From the instant data is ingested, it is available for continuous and on-demand queries, interactive reports, and simulations.
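A minimal sketch of the ingestion side, using the standard Kafka consumer API, is shown below. The broker address, topic names, and the writeToUnifiedLayer hook are assumptions for illustration; in practice the platform's own connector would perform the write into the tiered data layer.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TradeIngest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");  // assumed broker address
        props.put("group.id", "trade-ingest");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Topic names are assumptions matching the data sources named in the text.
            consumer.subscribe(List.of("trades", "positions", "market-data"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    writeToUnifiedLayer(rec.key(), rec.value());
                }
            }
        }
    }

    // Hypothetical hook: the platform would tier this record across RAM, SSD,
    // and cold storage according to the configured business policies.
    static void writeToUnifiedLayer(String key, String payload) {
        System.out.println(key + " -> " + payload); // placeholder for the real write
    }
}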

Stock price prediction simulation for the next 100 days
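The article does not show how the pictured simulation is implemented. As one common approach, the sketch below generates a single geometric Brownian motion price path over 100 trading days; the starting price, drift, and volatility are all assumed values.

import java.util.Random;

public class PricePathSimulation {
    public static void main(String[] args) {
        double price = 100.0;                 // assumed starting price
        double mu = 0.05 / 252;               // assumed daily drift (5% annual over 252 trading days)
        double sigma = 0.2 / Math.sqrt(252);  // assumed daily volatility (20% annual)
        Random rng = new Random(42);          // fixed seed for a reproducible path

        // Geometric Brownian motion step: S(t+1) = S(t) * exp((mu - sigma^2/2) + sigma * Z)
        for (int day = 1; day <= 100; day++) {
            double z = rng.nextGaussian();
            price *= Math.exp((mu - sigma * sigma / 2) + sigma * z);
            System.out.printf("day %3d: %.2f%n", day, price);
        }
    }
}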

One clear benefit is the ability to retrieve data and run advanced analytics and ML from a single API, irrespective of where the data is located: RAM, SSD, persistent memory, or a historical tier (Hadoop, AWS S3, Azure Blob Storage, etc.). The second benefit is speed.

Speeding up analytics for more informed trading decisions can make the difference between success and failure for trading analysts. A unified data layer provides the necessary access to streaming and hot data, plus fast access to historical data, at the speed banks need to create and maintain a competitive advantage, minimizing risk and improving the profitability of trading decisions.

