Time-Sensitive Trade Data Processing

Learn about algorithmic trading or automated trading: the use of historical data to predict future events in the world of trading.

· Big Data Zone ·

Statistics and probability are two modern mathematical disciplines that let us predict the future, to a reasonable extent, from a fair amount of past data. Events do not simply happen at random; they have a strong relationship with time, and the "future" is not just a random stream of events. Can we truly predict the future? For some future events, yes, we certainly can. We have no doubt that the sun will rise tomorrow as it did today, and we know that spring will follow winter.

Those who trade in stock markets are fortune seekers working through numbers. Whether the asset is equity, forex, cryptocurrency, a commodity, or a complex derivative, what they are after is a sense of the future that is available through data.

Computers are the best candidates for producing accurate, number-based predictions about the future of a stock or asset, because they perform calculations far faster than humans can. This practice of machine-driven prediction is called algorithmic trading or automated trading.

Most mature markets are heavy users of automated trading, and emerging markets are quickly following in their footsteps. But even though computers are good at calculations, how can they predict events?

Trading data is highly time-sensitive. Some automated trading algorithms must run historical data through many different financial formulas. The purpose of this article is to explain the techniques we used to process historical trade data for direct use in algorithmic trading applications.

Challenges and Type of Queries

The entire subject of technical analysis rests on the belief that history repeats itself. Mathematicians interested in financial markets have developed concepts, techniques, and well-defined formulas for them. Technical analysis uses these to identify trends in price movements and correlations between asset classes. Traders do not only place buy and sell orders; they are also miners of a sort. Instead of mining the earth, they mine historical trading data to extract the fortunes hidden within it.

A single year of trade data can exceed ten million records for an emerging market, while busier markets may produce several hundred million records. Automated trading algorithms issue time-sensitive queries over these large datasets.

If we iterate through a dataset of this size and select matching trades one by one, answering a complex query can take hours or even days. Linear searching must be avoided, if not completely, then at least to a reasonable extent.

To access the required data directly at its memory location, we must know where that data is stored. A mathematical formula that computes the location from the query parameters makes direct access possible and avoids linear searching.

The Shape of Data

Data can’t exist without a shape. Some data is inherently tabular and well-suited to relational databases. Some data forms graphs, while other data represents complex hierarchical models. Finding the most natural shape of the data for a given application is a key factor in the success of the application and the timely completion of the project.

Historical stock trading data has some important properties to consider:

  • Time of the trade
  • Date of the trade
  • Symbol of the trade
  • Trade price and quantity
  • Sequence number

The number of seconds in a day is a constant: each day has 86,400 seconds, no more and no less. The number of trading symbols at a given trading venue is also constant over a certain time period; only the number of days grows with the dataset. Let’s store all other properties of a trade against time, date, and symbol: (time, date, symbol) -> {trade1, trade2, …, tradeN}. This arrangement gives the data the three-dimensional shape of a cuboid:

[Figure: trade data arranged as a cuboid with time, date, and symbol as its three axes]

At this point, it’s all simple geometry, as the data has taken a tangible shape in space. Applying formulas to avoid linear searching is now possible.
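One way to picture such a formula is the usual row-major flattening of a three-dimensional array. The sketch below is an illustration only, not the implementation used in the project: the class `CuboidIndex`, its method names, the day-then-symbol-then-second layout, and the "HH:mm:ss" time format are all assumptions made for this example.

```java
import java.time.LocalTime;

/** Hypothetical helper (not from the article): maps (day, symbol, second) to one slot number. */
final class CuboidIndex {
    static final int SECONDS_PER_DAY = 86_400;

    private final int symbolCount;

    CuboidIndex(int symbolCount) {
        if (symbolCount <= 0) {
            throw new IllegalArgumentException("symbolCount must be positive");
        }
        this.symbolCount = symbolCount;
    }

    /** Parses a time such as "09:30:00" (assumed format) into seconds since midnight. */
    static int secondOfDay(String time) {
        return LocalTime.parse(time).toSecondOfDay();
    }

    /**
     * Computes the slot of a (day, symbol, second) cell in constant time.
     * Days are the outermost dimension, so appending a new day never moves existing slots.
     */
    long offset(int dayIndex, int symbolIndex, int secondOfDay) {
        if (dayIndex < 0) {
            throw new IllegalArgumentException("dayIndex must not be negative");
        }
        if (symbolIndex < 0 || symbolIndex >= symbolCount) {
            throw new IllegalArgumentException("symbolIndex out of range");
        }
        if (secondOfDay < 0 || secondOfDay >= SECONDS_PER_DAY) {
            throw new IllegalArgumentException("secondOfDay out of range");
        }
        return ((long) dayIndex * symbolCount + symbolIndex) * SECONDS_PER_DAY + secondOfDay;
    }
}
```

With 100 symbols, the cell for day 2, symbol 3, at 09:30:00 lands at (2 × 100 + 3) × 86,400 + 34,200 = 17,573,400. A query for a time range on one symbol and day then becomes a contiguous range of slots rather than a scan over every trade.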

Time Series Databases

Just as relational databases are designed for relational queries, time series databases are designed for answering time-sensitive queries. There are well-tested and popular time series databases on the market, both open source and commercial. Time series databases are growing faster than any other category of database, because the massive volume of IoT data is essentially time-sensitive: patterns discovered against time help explain why things happen when they do.

To process historical time-sensitive data, our approach was to implement a simple time series database designed specifically for trade data, rather than adopting one of the advanced general-purpose solutions on the market. The main data structure is a cube, conceptualized as the Java interface listed below.

The implementation of the cube interface can vary with requirements. It could be a simple file-based implementation or a complete in-memory implementation.

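import java.util.List;

// Trade, TradeQuery, OLHC, and TimeTradeVolume are application types not listed in this article.
public interface Cube {

    /**
     * Saves a single trade in the time series database.
     *
     * @param trade the trade to store
     */
    void saveTrade(Trade trade);

    /**
     * Saves a list of trades from the same time series in the database.
     * All trades in the list must share the same time.
     *
     * @param tradeList the trades to store
     * @param date the trading date of the trades
     */
    void saveTradeList(List<Trade> tradeList, String date);

    /**
     * Returns the list of trades that match a particular query.
     *
     * @param tradeQuery the query describing the trades to fetch
     * @return the matching trades
     */
    List<Trade> getTrades(TradeQuery tradeQuery);

    /**
     * Calculates OLHC (open, low, high, close) values dynamically.
     *
     * @param tradeQuery the query describing the trades to aggregate
     * @param intervalInSec the length of each interval, in seconds
     * @return the OLHC values computed over intervals of the given length
     */
    List<OLHC> getOlhc(TradeQuery tradeQuery, int intervalInSec);

    /**
     * Returns traded volume at fixed intervals. The results are used in more
     * advanced trading algorithms such as VWAP.
     *
     * @param symbol the trading symbol
     * @param fromTime the start of the time window
     * @param toTime the end of the time window
     * @param historyDays the number of historical days to include
     * @param intervalInSec the length of each interval, in seconds
     * @return the traded volume per interval
     */
    List<TimeTradeVolume> getTradeAtInterval(String symbol, String fromTime, String toTime, int historyDays, int intervalInSec);

}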
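To make the interval-based methods more concrete, here is a minimal sketch of how trades could be grouped into fixed intervals to produce open, low, high, and close values, together with the standard VWAP formula (sum of price × quantity divided by total quantity). It is not the article's implementation: `SketchTrade`, `SketchBar`, and `OlhcSketch` are hypothetical stand-ins for the unlisted `Trade` and `OLHC` types, and the code assumes trades arrive in ascending time order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical minimal trade; the article's Trade type is not shown. */
final class SketchTrade {
    final int secondOfDay; // seconds since midnight
    final double price;
    final long quantity;

    SketchTrade(int secondOfDay, double price, long quantity) {
        this.secondOfDay = secondOfDay;
        this.price = price;
        this.quantity = quantity;
    }
}

/** Hypothetical bar: open, low, high, close, and volume for one interval. */
final class SketchBar {
    final int startSecond;
    final double open;
    double low;
    double high;
    double close;
    long volume;

    SketchBar(int startSecond, SketchTrade first) {
        this.startSecond = startSecond;
        this.open = first.price;
        this.low = first.price;
        this.high = first.price;
        this.close = first.price;
        this.volume = first.quantity;
    }

    void add(SketchTrade t) {
        low = Math.min(low, t.price);
        high = Math.max(high, t.price);
        close = t.price; // input is time-ordered, so the latest trade is the close so far
        volume += t.quantity;
    }
}

final class OlhcSketch {
    /** Groups time-ordered trades into bars of intervalInSec seconds each. */
    static List<SketchBar> toBars(List<SketchTrade> trades, int intervalInSec) {
        if (intervalInSec <= 0) {
            throw new IllegalArgumentException("intervalInSec must be positive");
        }
        TreeMap<Integer, SketchBar> bars = new TreeMap<>();
        for (SketchTrade t : trades) {
            // Integer division snaps the trade time down to the start of its interval.
            int start = (t.secondOfDay / intervalInSec) * intervalInSec;
            SketchBar bar = bars.get(start);
            if (bar == null) {
                bars.put(start, new SketchBar(start, t));
            } else {
                bar.add(t);
            }
        }
        return new ArrayList<>(bars.values());
    }

    /** Volume-weighted average price: sum(price * quantity) / sum(quantity). */
    static double vwap(List<SketchTrade> trades) {
        double notional = 0.0;
        long totalQuantity = 0;
        for (SketchTrade t : trades) {
            notional += t.price * t.quantity;
            totalQuantity += t.quantity;
        }
        if (totalQuantity == 0) {
            throw new IllegalArgumentException("no traded volume");
        }
        return notional / totalQuantity;
    }
}
```

Prices are kept as `double` only to keep the sketch short; a production system would more likely use `BigDecimal` or integer ticks. Intervals that contain no trades produce no bar here, which is a simplification.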

Topics:
big data, time series, data analytics, predictive analytics, data processing

Opinions expressed by DZone contributors are their own.
