Faster and Smarter Analytics to Lower Risk and Increase Profit
Lower risk and increase profit with these tips for real-time analytics.
Millions of trades are completed every day by financial services institutions that are challenged with regulations and fierce competition.
Financial institutions leverage sophisticated IT tools to reduce financial risk, using algorithms fed by growing volumes of data to reach informed decisions and mitigate risk rapidly, at the time of the transaction. The effectiveness of the analysis, however, depends entirely on performance and speed.
By processing large volumes of data more efficiently to accelerate and enrich insights, financial institutions can create a competitive advantage. Their decisions become more objective, informed, and timely, which means they can also be more profitable.
Big Data Bottlenecks
There are several factors that can slow down analytics. There are huge volumes of different types of data, including trade data, customer data, web logs, research, publications, market data, and public sentiment communicated via social media. Inconsistent data definitions and implementations can complicate the process of data aggregation. Some data warehouses pre-aggregate the data to provide a fixed view, and this data may need to be aggregated again to provide a full risk analysis.
In addition, multiple network hops between systems can also slow processing times, as shown in the following diagram:
A unified data layer can overcome these challenges to reduce calculation and retrieval times in near real-time. Having one data layer that supports both push and pull capabilities from archived data in data lakes or data warehouses (Hadoop or cloud storage) can deliver continuous and on-demand insights.
Solution for Real-Time Insights
Typically, data is stored in a speed layer (a traditional fast database or an in-memory data grid, IMDG) and a batch layer where archived data is stored, such as HDFS or S3. Data can be streamed into both layers, with the speed layer holding a sliding window of recent data, e.g., the last 24 hours or a few days. Users can then query the appropriate layer according to business needs and, when needed, query both layers independently and merge the query results.
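The routing-and-merging step described above can be sketched as follows. This is a hypothetical illustration, not any vendor's API: the speed and batch layers are represented as plain dictionaries keyed by trade ID, and the 24-hour sliding window is an assumption taken from the example above.

```python
from datetime import datetime, timedelta

# Sliding window of recent data held in the speed layer (illustrative).
SPEED_WINDOW = timedelta(hours=24)

def query_layers(speed_layer, batch_layer, since, now=None):
    """Return trades executed at or after `since`, merging layers as needed."""
    now = now or datetime.utcnow()
    results = {}
    if since < now - SPEED_WINDOW:
        # The requested range reaches past the speed layer's window,
        # so archived trades from the batch layer must be included.
        results.update({k: v for k, v in batch_layer.items()
                        if v["ts"] >= since})
    # Speed-layer records override any older copies of the same trade.
    results.update({k: v for k, v in speed_layer.items()
                    if v["ts"] >= since})
    return results
```

If the range falls entirely inside the window, the batch layer is never touched, which is exactly why the speed layer answers recent queries so quickly.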
The caveats for this architecture include:
- Lifecycle management of both the speed and batch layers.
- Complex queries are required.
- Multiple code bases across multiple products.
- High availability for each component.
- High availability for the entire workflow.
- Slow batch-layer performance.
This process can be greatly simplified and accelerated by ingesting data directly, or from the selected message broker, into a unified data layer and automatically managing data tiers (both upstream and downstream) from a single platform using a unified API.
Trading data can be automatically loaded to the historical batch layer by date. In this case, data from before 2018 is considered historical, or non-operational.
The business policy for loading data to the historical layer (Hadoop, S3) can be customized. For example, highly-traded stocks can be stored in the operational speed layer for longer than others.
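A tiering policy like the one described above could be sketched as follows. This is a hypothetical illustration, not the platform's actual API: the pre-2018 cutoff comes from the example above, while the hot-symbol list and its longer retention cutoff are invented for the sake of the sketch.

```python
from datetime import date

# Data before this date is considered historical (from the example above).
HISTORICAL_CUTOFF = date(2018, 1, 1)

# Illustrative assumption: highly-traded stocks stay in the operational
# speed layer longer than others.
HOT_SYMBOLS = {"AAPL", "MSFT"}
HOT_CUTOFF = date(2016, 1, 1)

def target_tier(trade_date, symbol):
    """Decide whether a trade belongs in the speed or batch layer."""
    cutoff = HOT_CUTOFF if symbol in HOT_SYMBOLS else HISTORICAL_CUTOFF
    return "speed" if trade_date >= cutoff else "batch"
```

The point of expressing the policy as a function is that the platform, not the application, evaluates it on ingest, so no application code needs to know which tier a record landed in.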
Even though the defined data has been moved to an external data store, it remains completely available for queries, analysis, and simulations via a unified API that also accesses the operational layer. If a query relates only to data in the operational speed layer, the batch layer is not accessed at all. For example, a parameter such as "the last 90 days" can be used to locate the data on both RAM and SSD, so it can be fetched in 37 milliseconds without having to access the batch layer.
If the data queried is older than 90 days, it can be fetched from the batch layer in 1,545 milliseconds without the need to access the speed layer.
In this example, data both older and younger than 90 days is fetched in 1,530 milliseconds from the speed and batch layers together. Only one query is needed to retrieve the results.
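The three cases above (speed layer only, batch layer only, or both) come down to comparing the requested date range against the operational window. A minimal sketch, assuming the 90-day window from the example:

```python
from datetime import date, timedelta

# Operational window held in the speed layer (from the example above).
OPERATIONAL_DAYS = 90

def layers_for_range(start, end, today):
    """Return the set of tiers a query over [start, end] must touch."""
    window_start = today - timedelta(days=OPERATIONAL_DAYS)
    layers = set()
    if end >= window_start:
        layers.add("speed")   # part of the range sits in RAM/SSD
    if start < window_start:
        layers.add("batch")   # part of the range sits in Hadoop/S3
    return layers
```

A single user query spanning both sides of the window would fan out to both tiers behind the unified API, matching the one-query behavior described above.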
Data from multiple sources — such as trades, positions, market data, and derivatives (Forex, interest rate, equity) — is streamed through Kafka and ETL tools to a single platform. Using defined business policies, this data is intelligently tiered between RAM, persistent memory, SSD, and external data storage technologies. From the instant data is ingested, it is available for continuous and on-demand queries, interactive reports, and simulations.
A clear benefit is the ability to retrieve data and run advanced analytics and ML from a single API, irrespective of where the data is located: RAM, SSD, persistent memory, or a historical tier (Hadoop, AWS S3, Azure Blob Storage, etc.). The second benefit is speed.
Speeding up analytics for more informed trading decisions can make the difference between success and failure for trading analysts. A unified data layer provides the necessary access to streaming and hot data, plus fast access to historical data, at the speed banks need to create and maintain a competitive advantage, minimize risk, and improve the profitability of trading decisions.
Opinions expressed by DZone contributors are their own.