Concord.io: Processing Data in Motion
Tim Spann interviews Concord co-founder Shinji Kim about their distributed stream processing framework and lists some quick specs about the technology.
Concord.io is a very fast distributed stream processing framework written in C++ and designed to scale. There are many stream processing frameworks, but this appears to be the only one written in C++ that offers APIs for multiple languages such as Go and Java.
For code snippets, check out the walkthrough.
I spoke with Shinji Kim, co-founder of Concord Systems in NYC, about what Concord is and what makes it fast. Concord currently runs on Apache Mesos but could run on other environments such as YARN. Its main use is processing data in motion: it sits between message queues and databases and handles fast data enrichment, aggregation, filtering, and deduplication. Because it is written in C++, there are no garbage collection pauses to slow down processing and no JVM heap memory management to worry about. It supports popular fast data sources such as Kafka and Kinesis, and data sinks such as HDFS and MySQL. It uses a pub/sub operator model (jobs are composable via metadata), and users can manage local state in a key-value store backed by RocksDB.
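Concord's actual operator API isn't shown in this article, so the following is only a minimal sketch of the stateful pattern described above: a deduplication operator that remembers keys it has seen in a local key-value store. The `KVStore` and `DedupOperator` names are illustrative, not Concord's API, and an in-memory dict stands in for the RocksDB-backed store.

```python
class KVStore:
    """Stand-in for Concord's RocksDB-backed local state; a real store
    would persist to disk and survive operator restarts."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class DedupOperator:
    """Hypothetical pub/sub-style operator: consumes records from an
    input stream and forwards each unique key downstream exactly once."""
    def __init__(self, store):
        self.store = store
        self.emitted = []  # stands in for publishing to an output stream

    def process_record(self, key, value):
        if self.store.get(key) is None:        # first time we see this key
            self.store.put(key, value)         # remember it in local state
            self.emitted.append((key, value))  # forward downstream


op = DedupOperator(KVStore())
for k, v in [("a", 1), ("b", 2), ("a", 3)]:
    op.process_record(k, v)
# op.emitted is now [("a", 1), ("b", 2)] -- the duplicate "a" was filtered
```

The point of the pattern is that the state lives with the operator on each node, so deduplication does not require a round trip to an external database on every event.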
It supports writing applications in Python, Ruby, Go, Java, Scala, and C++ via its API. It runs in a containerized execution environment and uses the open-source Twitter Zipkin for monitoring.
Concord Systems reports benchmarks of 500K QPS per node at 10 ms per event for Concord, versus 100K QPS per node with a 1 s batch window for Spark Streaming.
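Taking those vendor-reported figures at face value, a quick back-of-the-envelope comparison (using only the numbers quoted above, not independent measurements):

```python
# Reported per-node throughput (queries per second)
concord_qps = 500_000
spark_qps = 100_000

# Reported latency: per-event for Concord, micro-batch window for Spark Streaming
concord_latency_s = 0.010  # 10 ms per event
spark_window_s = 1.0       # 1 s batch window

throughput_ratio = concord_qps / spark_qps          # 5.0x throughput
latency_ratio = spark_window_s / concord_latency_s  # ~100x lower latency

print(f"Concord (reported): {throughput_ratio:.0f}x the throughput, "
      f"results visible ~{latency_ratio:.0f}x sooner")
```

Note that the two latency figures measure different things (per-event processing time versus a batch window), so the latency ratio is only a rough indication of how much sooner results become visible, not a like-for-like benchmark.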
The API is only missing the ubiquitous hipster NodeJS and R. If this product becomes open source, those APIs are very likely to appear.
It's an interesting product to consider if you are hitting some performance walls in streaming.
Opinions expressed by DZone contributors are their own.