BI/Analytics on Big Data/Cassandra: Vertica, Acunu and Intravert (!?)

by Brian O' Neill · Apr. 03, 13 · Interview

As part of our presentation up at NYC* Big Data Tech Day, we noted that Hadoop didn't really work for us.  It was great for ingesting flat files from HDFS into Cassandra, but the map/reduce jobs that used Cassandra as input didn't cut it.  We found ourselves contorting our solutions to fit within the map/reduce framework, which required developer-level capabilities.  We had to add complexity into the system to do batch management/composition, and in the end the map/reduce jobs took too long to complete.

Eventually, we swapped out Hadoop for Storm.  That allowed us to do real-time cumulative analytics.  And most recently, we converted our topologies to Trident.  Handling all CRUD operations through Storm allowed us to perform roll-up metrics by different dimensions using Trident State.  (Additionally, we can write to wide-rows for indexing, etc.)

This is working really well, but we are seeing increasing demand from our data scientists and customers to support "ad hoc" dimensional analysis, dashboards, and reporting.  Elastic Search keeps us covered on many of the ad hoc queries, but aside from facets, it has little support for real-time dimensional aggregations, and no support for dashboards and reports.

We turned to the industry to find the best of breed. With some help from others who have traveled this road (shout out to @elubow), we settled on Vertica, Infobright and Acunu as contenders. I quickly grabbed VMs from each of them and went to work.

WARNING: What I'm about to say is based on a few days' experimentation, and largely consists of first impressions. It has no basis in real production experience. (yet =)

First up was Acunu.  Although each of the VMs functioned as an appliance, when logging into the VM and playing around with things, we were most at home with Acunu.  Acunu is backed by Cassandra.  Having C* installed and running as the persistence layer was like having an old friend playing wingman on an initial first date.  (they can bail you out if things start going south =)

Acunu had a nice REST API and a simple enough web-based UI to manage schemas and dimensions. Within minutes, I was inserting data from a Ruby script and playing around with dashboards... until something went wrong and the server started throwing OOMs. After a restart, things cleared up, but it left me questioning the stability a bit. (Once again, this was a *single* VM running on my laptop, so it wasn't the most robust environment.)

Next, I moved on to Vertica.  From a features and functions point of view, Vertica looked to be leaps and bounds ahead.  It had sophisticated support for R, which would make our data scientists happy.  It also has compression capabilities, which will make our IT/Ops guys happy.  And it looked to have some sophisticated integration with Hadoop, just in case we ever wanted/needed to support deep analytics that could leverage M/R.

That said, it was far more cumbersome to get up and running, and felt a bit like I went backwards in time. I couldn't find a REST API. (Please let me know if someone has one for Vertica.) So, I was left to go through the hoop-drill of getting a JDBC client driver, which was not available in public repos, etc. When using the admin tool provided on the appliance, I felt like I was back in middle school (early 90's) installing Linux via an ANSI interface on an Intel 8080. In the end, however, I grew accustomed to their client (vsql) and was happily hacking away over the JDBC driver, and it felt fairly solid.

Although we are still interested in pursuing both Acunu and Vertica, both experiences left me wanting. What we really want is a fully open-source solution (preferably Apache license) that we are free to enhance, supplement, etc... with optional commercial support.

That got me thinking about Edward Capriolo's presentation on Intravert.   If I boil down our needs into "must-haves" and "nice-to-haves", what we really *need* is just an implementation of Rainbird.  (http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011) 

AS AN ASIDE:
Does anyone know what happened to Rainbird?  I've been trying to get the answer, to no avail.
http://www.youtube.com/watch?v=84k7o4GdkQg

Now, time for crazy talk...
Intravert provides a slick REST API for CRUD operations on Cassandra. As I said before, I'm a *huge* REST fan. It provides the loose coupling for everything in our polyglot persistence architecture. Intravert also provides a loosely coupled eventing framework to which I can attach handlers. What if I implemented a handler that took the CRUD events and updated additional column families with the dimensional counts/aggregations? If I then combine that with a JavaScript framework for charting, how far would that get me? (60-70% solution?)
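
To make the handler idea a bit more concrete, here is a rough Python sketch of the bookkeeping such a handler would do. This is not Intravert's API: the event shape (`op`, `row`, `old`, `new`), the class name `DimensionalCounter`, and the in-memory dict standing in for counter column families are all assumptions made up for illustration. It only shows how one CRUD event can fan out into increments for every combination of dimensions.

```python
from collections import defaultdict
from itertools import combinations


class DimensionalCounter:
    """Hypothetical handler: keeps roll-up counts per dimension combination.

    The dict below is an in-memory stand-in for counter column families;
    a real handler would issue counter increments against Cassandra instead.
    """

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.counts = defaultdict(int)

    def _keys(self, row):
        # One key per non-empty subset of the dimensions present in the row,
        # always ordered the same way as self.dimensions.
        present = [d for d in self.dimensions if d in row]
        for size in range(1, len(present) + 1):
            for combo in combinations(present, size):
                yield tuple((d, row[d]) for d in combo)

    def _apply(self, row, delta):
        for key in self._keys(row):
            self.counts[key] += delta

    def handle(self, event):
        # Assumed event shape: {"op": ..., "row": ...} for create/delete,
        # {"op": "update", "old": ..., "new": ...} for updates.
        op = event["op"]
        if op == "create":
            self._apply(event["row"], +1)
        elif op == "delete":
            self._apply(event["row"], -1)
        elif op == "update":
            self._apply(event["old"], -1)
            self._apply(event["new"], +1)
        else:
            raise ValueError(f"unknown operation: {op}")

    def count(self, **filters):
        # Build the lookup key in the same dimension order used by _keys.
        key = tuple((d, filters[d]) for d in self.dimensions if d in filters)
        return self.counts.get(key, 0)


counter = DimensionalCounter(["country", "device"])
counter.handle({"op": "create", "row": {"country": "US", "device": "mobile"}})
counter.handle({"op": "create", "row": {"country": "US", "device": "desktop"}})
print(counter.count(country="US"))                   # count by one dimension
print(counter.count(country="US", device="mobile"))  # count by two dimensions
```

Pre-computing every combination trades write amplification for cheap reads: each event touches 2^n - 1 counters for n dimensions, so this only stays reasonable for a handful of dimensions. A charting front end would then just read the counters it needs.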

To be clear, I'm not bashing Vertica or Acunu. Both have solid value propositions and they are both contenders in our options analysis. I'm just mourning the fact that there seems to be no good open-source solution in this space like there is in others. (Neo4j/TitanDB for graphs, Elastic Search/SOLR for search, Kafka/Kestrel for queueing, Cassandra for storage, etc.)

We are also considering Druid and Infobright, but I haven't gotten to them yet:
https://github.com/metamx/druid
Please don't bash me for early judgments.
I'm definitely interested in hearing people's thoughts.


Published at DZone with permission of Brian O' Neill, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
