Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Analyzing Financial Time Series Data Using MariaDB ColumnStore

DZone's Guide to

Analyzing Financial Time Series Data Using MariaDB ColumnStore

Financial data can hold tons of valuable insights. Get some in-depth examples of how to analyze financial time series data using MariaDB ColumnStore.

· Database Zone ·
Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  

MariaDB ColumnStore is a GPLv2 open-source columnar database built on MariaDB Server. It is a fork and evolution of the former InfiniDB. It can be deployed in the cloud (optimized for Amazon Web Services) or on a local cluster of Linux servers using either local or networked storage. In this blog post, I will show some examples of analysis of financial time series data using MariaDB ColumnStore.

Free Forex Historical Data at HistData.com

First of all, we will download Forex historical data (GBPUSD M1 Full 2016 Year Data), which is freely available on HistData.com. HistData.com forex historical data look like:

2016.01.03,17:00,1.473350,1.473350,1.473290,1.473290,0
2016.01.03,17:01,1.473280,1.473360,1.473260,1.473350,0
2016.01.03,17:02,1.473350,1.473350,1.473290,1.473290,0
2016.01.03,17:03,1.473300,1.473330,1.473290,1.473320,0

The first column is a timestamp of currency rate of every minute needed to convert the format in order to fit with ColumnStore DATETIME data type. I modified the timestamp format using the following simple Ruby script:

convert.rb:
#!/usr/bin/env ruby
id = 0
while line = gets
  date, time, open, high, low, close = line.split(",")
  year, month, day = date.unpack("a4xa2xa2")
  hour, minute = time.unpack("a2xa2")
  id+= 1
  print "#{id},#{year}-#{month}-#{day} #{hour}:#{minute},”
  puts  [open, high, low, close].join(“,”)
end

For example, you can convert the historical data as follows:  ruby convert.rb DAT_ASCII_GBPUSD_M1_2016.csv > gbpusd2016.csv. The converted forex historical data will be:

1,2016-01-03 17:00,1.473350,1.473350,1.473290,1.473290
2,2016-01-03 17:01,1.473280,1.473360,1.473260,1.473350
3,2016-01-03 17:02,1.473350,1.473350,1.473290,1.473290
4,2016-01-03 17:03,1.473300,1.473330,1.473290,1.473320

This demo was performed on the following setup:

  • CPU: Intel Core i7-7560U 2.4 GHz 2 physical cores/4 logical cores
  • OS: CentOS 7.3 ( virtualized on VMware Workstation 12 Pro on Windows 10 Pro)
  • Memory: 4GB
  • ColumnStore: 1 UM and 1 PM on single node

Next, we create a database and a table with MariaDB monitor:

# mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.23-MariaDB Columnstore 1.0.9-1

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE forex;
MariaDB [(none)]> USE forex;
MariaDB [forex]> CREATE TABLE gbpusd (
    id INT, 
    time DATETIME,
    open DOUBLE,
    high DOUBLE, 
    low DOUBLE, 
    close DOUBLE
    ) engine=ColumnStore default character set=utf8;

Note: With standard ColumnStore installation, MariaDB monitor command is mcsmysql (aliased to /usr/local/mariadb/columnstore/mysql/bin/mysql)
Storage engine must be ColumnStore

Now it's time to import CSV data whose timestamps are corrected. For bulk import, we use the cpimport command with ColumnStore. With -s (separator) option, we specify the delimiter character (in this example, comma).

# cpimport -s ',' forex gbpusd gbpusd2016.csv
Locale is : C
Column delimiter : ,

Using table OID 3163 as the default JOB ID
Input file(s) will be read from : /home/vagrant/histdata
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3163_D20170624_T103843_S950145_Job_3163.xml

...

2017-06-24 10:38:45 (29756) INFO : For table forex.gbpusd: 372480 rows processed and 372480 rows inserted.
2017-06-24 10:38:46 (29756) INFO : Bulk load completed, total run time : 2.11976 seconds

We could import 372,280 rows of forex data in 2 seconds. LOAD DATA LOCAL INFILE can be used as well with ColumnStore.

Sample Queries Using Functions

The reason why GBPUSD M1 data 2016 was chosen, is to track very quick moves of GBPUSD before and after the Brexit vote, which was held on June 23, 2016.

Now we look at maximum, minimum and drop off rate during June 23-24, 2016, using Windows Functions of ColumnStore.

MariaDB [forex]> SELECT MAX(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MAX(close) |
+------------+
|    1.50153 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MIN(close) |
+------------+
|    1.32322 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MAX(close)/MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+-----------------------+
| MAX(close)/MIN(close) |
+-----------------------+
|    1.1347546399658233 |
+-----------------------+
1 row in set (0.03 sec)

GBPUSD fell from 1.50 to 1.32 (-13%) in 2 days (actually in half a day).

Correlation GBPUSD: USDJPY on June 23, 2016

Next graph shows scatter plot of GBPUSD vs. USDJPY during June 23-24, 2016. (USDJPY M1 2016 were imported same as GBPUSD).

Note: GBPUSD is multiplied by 100, then subtracted by 130. USDJPY is subtracted by 110.

Now, we calculate Pearson correlation coefficient using statistical aggregation functions. Following SELECT statement was used:

SELECT
  (AVG(gbpusd.close*usdjpy.close) - AVG(gbpusd.close)*AVG(usdjpy.close)) / 
  (STDDEV(gbpusd.close) * STDDEV(usdjpy.close))
AS correlation_coefficient_population
FROM gbpusd
JOIN usdjpy ON gbpusd.time = usdjpy.time
WHERE
  gbpusd.time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

Query result:

+ ------------------------------------+
| correlation_coefficient_population |
+------------------------------------+
|                 0.9647697302087848 |
+------------------------------------+
1 row in set (0.57 sec)

GBPUSD and USDJPY were highly correlated during the UK EU membership referendum.

Moving Average GBPUSD June 23-24, 2016

In Forex trading, a moving average is often used to smooth out price fluctuations. The following query allows moving average of sliding 13 rows window.

SELECT 
  time,
  close,
  AVG(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS MA13,
  COUNT(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS row_count
FROM gbpusd
WHERE time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

Summary

In this blog post, the following features of MariaDB ColumnStore were explained:

  • With the cpimport command, users can easily perform fast bulk import to MariaDB ColumnStore tables.
  • MariaDB ColumnStore allows fast and easy data analytics using SQL 2003 compliant Window Functions, such as MAX, MIN, AVG, and STDDEV.
  • No index management for query performance tuning needed.

MariaDB ColumnStore can be downloaded here, detailed instructions to install and test MariaDB ColumnStore on Windows using Hyper-V can be found here.

Get comfortable using NoSQL in a free, self-directed learning course provided by RavenDB. Learn to create fully-functional real-world programs on NoSQL Databases. Register today.

Topics:
columnstore ,database

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}