DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Data Engineering
  3. Data
  4. Ingest BTC.com and Blockchain.com Data via Apache NiFi

Ingest BTC.com and Blockchain.com Data via Apache NiFi

How to ingest bitcoin and other cryptocurrency data from blockchain.com and btc.com.

Tim Spann user avatar by
Tim Spann
CORE ·
Jul. 05, 18 · Tutorial
Like (3)
Save
Tweet
Share
6.32K Views

Join the DZone community and get the full member experience.

Join For Free

Ingesting Blockchain Data From btc.com and blockchain.com(.info).

Blockchain exchanges contain a large amount of data that you need to ingest, process, store, and analyze if you wish to make proper trades and find the right time to buy and sell various currencies. Bitcoin, being the first and most valuable, is a currency I wish to trash. Using two sites that provide great APIs for data: btc.com and blockchain.com. They provide easy to ingest and fast REST APIs returning clean JSON. With this data, sometimes like in Blockchain Blocks for the current data in the format of multiple JSON records, I want to split that, which is easy with Apache NiFi. Once there, I use the prebuilt JSON schema (I used InferAvroSchema to derive this). I can now run live queries on my blockchain data as it enters the system before I put it into Apache Hive for permanent storage.

Image title

API Calls

blockchain.com

  • https://blockchain.info/latestblock

  • https://api.blockchain.info/charts/transactions-per-second?timespan=5weeks&rollingAverage=8hours&format=json

  • https://blockchain.info/blocks/BTC.com?format=json

  • https://api.blockchain.info/pools?timespan=5days

  • https://api.blockchain.info/stats

  • https://blockchain.info/ticker

  • https://blockchain.info/tobtc?currency=USD&value=10000

Blocks For Today

  • https://chain.api.btc.com/v3/block/date/${now():format('yyyyMMdd')}

btc.com

  • https://chain.api.btc.com/v3/block/latest

  • https://chain.api.btc.com/v3/tx/unconfirmed

Deeper Use Case

Today's Blocks

This API returns records for all of today's blocks.   We are gathering some important information like nonce, version, timestamp, hash, pool, and previous/next hash to link the blocks.

Apache Avro Schema in JSON Format Stored in Hortonworks Schema Registry

{
 "type": "record",
 "name": "blocksfortoday",
 "fields": [
  {
   "name": "height",
   "type": "int",
   "doc": "Type inferred from '527784'"
  },
  {
   "name": "version",
   "type": "int",
   "doc": "Type inferred from '536870912'"
  },
  {
   "name": "mrkl_root",
   "type": "string",
   "doc": "Type inferred from '\"c8f658ad595854f4c8c510b672447e838a2746e8724bfb26d0d127e5a4421385\"'"
  },
  {
   "name": "timestamp",
   "type": "int",
   "doc": "Type inferred from '1529180826'"
  },
  {
   "name": "bits",
   "type": "int",
   "doc": "Type inferred from '389609537'"
  },
  {
   "name": "nonce",
   "type": "int",
   "doc": "Type inferred from '236046944'"
  },
  {
   "name": "hash",
   "type": "string",
   "doc": "Type inferred from '\"00000000000000000017ca1d74bdee575dd48d4b3513eea2f7e06b313883d73d\"'"
  },
  {
   "name": "prev_block_hash",
   "type": "string",
   "doc": "Type inferred from '\"00000000000000000000acf6259fffe63d36623c324f756faaf995a9e2896b87\"'"
  },
  {
   "name": "next_block_hash",
   "type": "string",
   "doc": "Type inferred from '\"0000000000000000000000000000000000000000000000000000000000000000\"'"
  },
  {
   "name": "size",
   "type": "int",
   "doc": "Type inferred from '189026'"
  },
  {
   "name": "pool_difficulty",
   "type": "long",
   "doc": "Type inferred from '11831716619811'"
  },
  {
   "name": "difficulty",
   "type": "double",
   "doc": "Type inferred from '4.940704885521827E12'"
  },
  {
   "name": "tx_count",
   "type": "int",
   "doc": "Type inferred from '486'"
  },
  {
   "name": "reward_block",
   "type": "int",
   "doc": "Type inferred from '1250000000'"
  },
  {
   "name": "reward_fees",
   "type": "int",
   "doc": "Type inferred from '15691427'"
  },
  {
   "name": "created_at",
   "type": "int",
   "doc": "Type inferred from '1529180835'"
  },
  {
   "name": "confirmations",
   "type": "int",
   "doc": "Type inferred from '1'"
  },
  {
   "name": "is_orphan",
   "type": "boolean",
   "doc": "Type inferred from 'false'"
  },
  {
   "name": "curr_max_timestamp",
   "type": "int",
   "doc": "Type inferred from '1529180826'"
  },
  {
   "name": "is_sw_block",
   "type": "boolean",
   "doc": "Type inferred from 'true'"
  },
  {
   "name": "stripped_size",
   "type": "int",
   "doc": "Type inferred from '153817'"
  },
  {
   "name": "weight",
   "type": "int",
   "doc": "Type inferred from '650477'"
  },
  {
   "name": "extras",
   "type": {
    "type": "record",
    "name": "extras",
    "fields": [
     {
      "name": "pool_name",
      "type": "string",
      "doc": "Type inferred from '\"BTC.com\"'"
     },
     {
      "name": "pool_link",
      "type": "string",
      "doc": "Type inferred from '\"https://pool.btc.com\"'"
     }
    ]
   },
   "doc": "Type inferred from '{\"pool_name\":\"BTC.com\",\"pool_link\":\"https://pool.btc.com\"}'"
  }
 ]
}

QueryRecord Query

 SELECT * FROM FLOWFILE WHERE CAST(tx_count AS INT) > 0 

With my query in Apache NiFi, I am only interested in records that have a transaction count.  So we filter out zero records.

Once we have ingested the data real-time with Apache NiFi, I can now store it in the external table that NiFi built the DDL for. As you can see, it's an external table since our Apache ORC files are stored outside of the Hive data warehouse directory. So, we just have to create new Apache ORC files in that LOCATION directory to add more rows to the dataset.

Create an Apache Hive Table

CREATE EXTERNAL TABLE IF NOT EXISTS blocksfortoday1 (height INT, version INT, mrkl_root STRING, timestamp INT, bits INT, nonce INT, hash STRING, prev_block_hash STRING, next_block_hash STRING, size INT, pool_difficulty BIGINT, difficulty DOUBLE, tx_count INT, reward_block INT, reward_fees INT, created_at INT, confirmations INT, is_orphan BOOLEAN, curr_max_timestamp INT, is_sw_block BOOLEAN, stripped_size INT, weight INT, extras STRUCT<pool_name:STRING, pool_link:STRING>)
STORED AS ORC
LOCATION '/blocksfortoday1'

Example Apache Hive Query Run in Apache Zeppelin

select * from blocksfortoday1 where  CAST(tx_count as INT) > 500 order by created_at desc


Image title

Apache Zeppelin let's use query our current and constantly ingesting Blocks For Today Hive data as well as create charts and graphs. In the same environment, I can run Spark batch code, Spark machine learning, as well as Python libraries for Deep Learning like Apache MXNet and TensorFlow.   

References:

  • https://btc.com/
  • https://www.blockchain.com/en/explorer
  • https://github.com/blockchain
Data (computing) Database Apache NiFi Apache Hive

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • DevOps vs Agile: Which Approach Will Win the Battle for Efficiency?
  • Tracking Software Architecture Decisions
  • 10 Best Ways to Level Up as a Developer
  • Comparing Map.of() and New HashMap() in Java

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: