Big Data as a Platform for EU Market Regulation
There are many business and technology requirements that drive the new EU market implementations to an open data architecture.
The first post in this three-part series explored the evolution of capital markets regulation in the European financial markets over the last 15 years. We covered the important aspects of MAR (Market Abuse Regulation) and MiFID II. In this second blog post, we will discuss the business and technology requirements that drive these implementations to an open data architecture.
Key Business and Technology Requirements
The MAR and MiFID II regulations have broad ramifications for key capital markets business functions across the front, middle, and back offices. These include compliance, compensation policies, regulatory reporting, and trade surveillance. However, the biggest obstacles are related to technology, as we will examine below.
Some of the key business requirements that can be distilled from a perusal of the regulatory mandates include the following.
Efficiently Store Enormous Amounts of Heterogeneous Trade Data
Both MiFID II and MAR mandate trade monitoring and analysis on not just real-time data but also historical data spanning a few years. Among others, this will include data feeds from a range of business systems: trade data; valuation and position data; reference data; rates; market data; client data; front-, middle-, and back-office data; and voice, chat, and other internal communications. To sum up, the ability to store a range of cross-asset (almost all kinds of instruments), cross-format (structured and unstructured, including voice), and cross-venue (exchange, OTC, etc.) trading data at a higher degree of granularity is key.
Perform Data Lineage and Auditing
Such stored data needs to be fully auditable for five years. This implies not just being able to store it but also putting in place capabilities to ensure strict governance and a complete audit trail.
The platform must also manage a huge increase in data storage volume (5+ years of history) due to the extensive record-keeping requirements.
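To make the audit trail requirement more concrete, here is a minimal Python sketch of an append-only log in which every entry carries the hash of the previous one, so that later tampering can be detected. Hash chaining is an assumption on our part (one common technique, not something the regulations or this series prescribe), and the `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON rendering of the entry."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry stores the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, record_id: str, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "actor": actor,
            "action": action,
            "record_id": record_id,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        body["hash"] = _digest(body)  # digest is computed before the key is added
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

A real deployment would keep such a log in durable storage and lean on the platform's own governance tooling; the sketch only shows why an audit trail needs to be more than a plain table of events.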
Perform Real-Time Surveillance and Monitoring of Data
Once data is collected, normalized, and segmented, the platform will need to support real-time monitoring (around five seconds) to ensure that every trade can be tracked through its lifecycle. Detecting patterns that indicate market abuse and monitoring for best execution are key.
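As a rough illustration of lifecycle tracking, the Python sketch below records each order's state changes and raises an alert when a transition is unexpected or when an event reaches the monitor too late. It assumes that "around five seconds" can be read as a latency budget; the order states, the `LifecycleMonitor` class, and the alert wording are all hypothetical.

```python
from collections import defaultdict

# Assumption: "around five seconds" is treated as a latency budget.
LATENCY_BUDGET_SECONDS = 5.0

# Hypothetical order states and the transitions this sketch treats as valid.
VALID_TRANSITIONS = {
    None: {"NEW"},
    "NEW": {"PARTIALLY_FILLED", "FILLED", "CANCELLED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCELLED"},
    "FILLED": set(),
    "CANCELLED": set(),
}


class LifecycleMonitor:
    def __init__(self):
        # order_id -> list of (state, event_time) in arrival order
        self.history = defaultdict(list)

    def observe(self, order_id, state, event_time, seen_time):
        """Record one event and return any alerts it triggers."""
        alerts = []
        events = self.history[order_id]
        previous = events[-1][0] if events else None
        if state not in VALID_TRANSITIONS.get(previous, set()):
            alerts.append(f"{order_id}: unexpected transition {previous} -> {state}")
        delay = seen_time - event_time
        if delay > LATENCY_BUDGET_SECONDS:
            alerts.append(f"{order_id}: event seen {delay:.1f}s late")
        events.append((state, event_time))
        return alerts
```

In a streaming deployment the same checks would run inside the stream processor rather than in a single in-process object, but the per-order history and the latency comparison are the essential pieces.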
Create Business Rules
The core logic that identifies some of the above trade patterns is created using business rules. Business rules have been covered in various places on the blog, but they primarily work based on an IF-THEN-ELSE construct.
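The IF-THEN-ELSE construct can be sketched in a few lines of Python. The example below is illustrative only: the `Order` fields, the two rules, the thresholds, and the restricted symbol are assumptions chosen for the demonstration, not rules taken from MAR or MiFID II.

```python
from dataclasses import dataclass
from typing import Callable, Optional

RESTRICTED_SYMBOLS = {"XYZ"}  # hypothetical restricted list


@dataclass
class Order:
    order_id: str
    symbol: str
    quantity: int
    cancelled_after_s: Optional[float] = None  # None: the order was not cancelled


@dataclass
class Rule:
    name: str
    condition: Callable[[Order], bool]  # IF
    then_action: str                    # THEN
    else_action: str = "PASS"           # ELSE

    def evaluate(self, order: Order) -> str:
        return self.then_action if self.condition(order) else self.else_action


# Illustrative thresholds; real values would come from compliance policy.
RULES = [
    Rule(
        name="large_order_cancelled_quickly",
        condition=lambda o: o.cancelled_after_s is not None
        and o.quantity >= 10_000
        and o.cancelled_after_s < 1.0,
        then_action="ALERT",
    ),
    Rule(
        name="restricted_list_trading",
        condition=lambda o: o.symbol in RESTRICTED_SYMBOLS,
        then_action="BLOCK",
    ),
]


def evaluate_order(order: Order, rules=RULES) -> dict:
    """Run every rule against one order and collect the resulting actions."""
    return {rule.name: rule.evaluate(order) for rule in rules}
```

Keeping each rule as data (a name, a condition, and two outcomes) is what lets compliance teams add new rules without touching the evaluation loop; a dedicated rules engine offers the same separation with richer tooling.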
Machine Learning and Predictive Analytics
A variety of supervised and unsupervised learning approaches can be used to perform extensive behavioral modeling and segmentation of transaction behavior, with a view to identifying behavioral patterns of traders and any outlier behaviors that connote potential regulatory violations.
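As a small stand-in for the unsupervised end of that spectrum, the sketch below flags traders whose daily volume is an outlier relative to their peers, using a robust (median-based) z-score. This is a deliberately simple technique substituted for a full behavioral model; the function names and the choice of daily volume as the feature are assumptions, while the 0.6745 scaling constant and the 3.5 cutoff follow the usual modified z-score convention.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored as unusual.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def outlier_traders(daily_volume_by_trader, threshold=3.5):
    """Return {trader: score} for traders whose score exceeds the threshold."""
    traders = list(daily_volume_by_trader)
    scores = robust_z_scores([daily_volume_by_trader[t] for t in traders])
    return {t: round(s, 2) for t, s in zip(traders, scores) if abs(s) > threshold}
```

For example, `outlier_traders({"t1": 100, "t2": 110, "t3": 95, "t4": 105, "t5": 900})` singles out `t5`. A production model would use many more features (order-to-trade ratios, timing, counterparties) and would typically run on a distributed engine.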
Provide a Single View of an Institutional Client
From the firm’s standpoint, it would be very useful to have a single view capability for clients that shows all of their positions across multiple desks, risk position, KYC score, etc.
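A single client view is essentially an aggregation keyed by client. The Python sketch below assumes position records arrive as dictionaries with `client_id`, `desk`, `instrument`, and `quantity` keys and that KYC scores sit in a separate lookup; all of these names are hypothetical.

```python
from collections import defaultdict


def single_client_view(client_id, positions, kyc_scores):
    """Combine one client's positions from every desk into a single summary."""
    net_positions = defaultdict(int)
    desks = set()
    for position in positions:
        if position["client_id"] != client_id:
            continue
        net_positions[position["instrument"]] += position["quantity"]
        desks.add(position["desk"])
    return {
        "client_id": client_id,
        "desks": sorted(desks),
        "net_positions": dict(net_positions),
        "kyc_score": kyc_scores.get(client_id),  # None if no score is on file
    }
```

The hard part in practice is upstream of this function: resolving the same client across desks that use different identifiers.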
Logical Architecture of a Market Surveillance System
The ability to perform deep and multi-level analysis of trade activity implies the capability of not only storing heterogeneous data for years in one place but also performing forensic analytics (rules and machine learning) in place at very low latency. Interactive (SQL-like) querying needs to be supported, as well as the ability to perform deep forensics on the data via data science.
Further, quick and effective investigation of suspicious trader behavior requires that compliance teams be able to access and visualize patterns of trade and drill into behavior to identify potential compliance violations. A Big Data platform is ideal for this complete range of requirements.
Key Design Requirements for a Market Surveillance System for MiFID II and MAR
The most important technical features for such a system are the following.
Support end-to-end monitoring across a variety of financial instruments and multiple trading venues. Support a wide variety of analytics that enable the discovery of interrelationships between customers, traders, and trades as the next major advance in surveillance technology. HDFS is the ideal storage repository for this data.
Provide a platform that can ingest from tens of millions to billions of market events (spanning a range of financial instruments: equities, bonds, forex, commodities, and derivatives, etc.) on a daily basis from thousands of institutional market participants. Data can be ingested using a range of tools such as Sqoop, Kafka, Flume, API, etc.
The ability to add new business rules (via a business rules engine and/or a model-based system that supports machine learning) is a key requirement. As we can see from the above, market manipulation is an activity that seems to constantly push the boundaries in new and unforeseen ways. This requirement can be met using open source languages like Python and R. Multifaceted projects such as Apache Spark allow users to perform exploratory data analysis (EDA) and data science-based analysis using language bindings for Python, R, etc. for a range of investigative use cases.
Provide advanced visualization techniques, thus helping compliance and surveillance officers manage the information overload.
The ability to perform deep cross-market analysis, i.e., to be able to look at financial instruments and securities trading on multiple geographies and exchanges.
The ability to create views and correlate data that are both wide and deep. A wide view is one that helps look at related securities across multiple venues; a deep view will look for a range of illegal behaviors that threaten market integrity such as market manipulation, insider trading, watch/restricted list trading, and unusual pricing.
The ability to provide in-memory caches of data for rapid pre-trade and post-trade compliance checks.
The ability to create prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, i.e., best execution and analysis). The most popular way to link R and Hadoop is to use HDFS as the long-term store for all data and use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich, and sample data sets from HDFS into R.
Provide data scientists and quants with development interfaces using tools like SAS and R.
The results of the processing and queries need to be exported in various data formats, a simple CSV/txt format or more optimized binary formats, JSON formats, or even into custom formats. The results will be in the form of standard relational DB data types (i.e., String, Date, Numeric, Boolean).
Based on back testing and simulation, analysts should be able to tweak the model and also allow subscribers (typically compliance personnel) of the platform to customize their execution models.
A wide range of analytical tools needs to be integrated that allow the best dashboards and visualizations. This can be supported by platforms like Tableau, Qlikview, and SAS.
An intelligent surveillance system needs to store trade data, reference data, order data, and market data, as well as all of the relevant communication from a range of disparate systems, both internal and external, and then match these things appropriately. The matching engine can be created using languages supported in Hadoop: Java, Scala, Python, R, etc.
Provide for multiple layers of detection capabilities starting with configuring business rules (that describe a trading pattern) as well as dynamic capabilities based on Machine Learning models (typically thought of as being more predictive). Such a system can also parallelize execution at scale to be able to meet demanding latency requirements for a market surveillance platform.
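One of the requirements listed above, exporting results as CSV or JSON using standard relational data types (String, Date, Numeric, Boolean), can be sketched with the Python standard library alone. The row layout and the function names below are assumptions for illustration; dates are written in ISO 8601 form so that both formats carry the same value.

```python
import csv
import io
import json
from datetime import date


def _plain(value):
    """Render dates as ISO 8601 strings; leave other values untouched."""
    return value.isoformat() if isinstance(value, date) else value


def to_csv(rows):
    """Serialize a list of dict rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: _plain(value) for key, value in row.items()})
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to a JSON array of objects."""
    return json.dumps([{k: _plain(v) for k, v in row.items()} for row in rows])
```

CSV flattens every value to text, so a Boolean comes out as `True` or `False` and the consumer has to re-parse types; JSON preserves numbers and Booleans natively, which is one reason to offer both.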
The next and final post will delve into the above logical architecture and will discuss the end-to-end flow from an open enterprise Hadoop design standpoint. We will use the Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) as candidate technologies for the implementation.
Call to Action
The Hortonworks 100% open-source solution is at the heart of it all in financial services: connected data platforms for data at rest and data in motion. Working together, Hortonworks Data Platform and Hortonworks DataFlow provide our retail banking and capital market customers with a crucial competitive advantage in their dynamic, competitive industries.
Published at DZone with permission of Vamsi Chemitiganti. See the original article here.