Data Collection Tools for Events Analytics

Tools like Google Analytics are great until a startup gets bigger and its events analytics requirements get more sophisticated. Here are some of the best data collection tools for events data.

One of the first things we do after launching a website is connect to Google Analytics. A little bit down the road, we’ll connect more “out-of-the-box” analytics tools to calculate funnels, retention, A/B tests, and more.

These tools are great, and they work fine until a company gets bigger and its analytics requirements get more sophisticated. At that point, it’s time to set up a data infrastructure, which means selecting a data collection tool, an ETL tool, a data warehouse, and a BI tool on top of that.

In the startup world, this usually happens when a company has raised a Series A and has around 25-50 employees. Google Analytics and other web analytics tools are not enough anymore: their costs are rising, but they no longer deliver what you need. At this point, you probably also have a lot of data in other places, such as production databases and marketing and sales tools, and you want your reports to consolidate data from all of them.

For the scope of this post, we’ll give you an overview of the best data collection tools specifically for events data.

Build Your Own

In a couple of hours, you can probably build a simple proof of concept in which AJAX requests send events to your server, which then writes them to your database. But a production-ready solution at scale could easily become a full-time job for several engineers at your company.
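To make the proof-of-concept idea concrete, here is a minimal sketch of the server-side half: validating one JSON event and writing it to a database. It assumes SQLite as the store and a hypothetical event shape with `user_id`, `name`, and optional `properties` fields; the table layout and the function names `init_db` and `store_event` are illustrative, not taken from any particular tool.

```python
import json
import sqlite3
import time


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) events table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            name TEXT NOT NULL,
            properties TEXT NOT NULL,
            received_at REAL NOT NULL
        )
        """
    )


def store_event(conn: sqlite3.Connection, raw_body: str) -> bool:
    """Validate one JSON event and insert it; return False if it is rejected."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    user_id = event.get("user_id")
    name = event.get("name")
    # Assumed minimal contract: both fields must be present and be strings.
    if not isinstance(user_id, str) or not isinstance(name, str):
        return False
    properties = event.get("properties", {})
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (user_id, name, properties, received_at) "
            "VALUES (?, ?, ?, ?)",
            (user_id, name, json.dumps(properties), time.time()),
        )
    return True
```

An HTTP handler would call `store_event` with the request body of each AJAX call. Everything this sketch leaves out (batching, retries, deduplication, schema changes, backfills) is where the full-time engineering work tends to go.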

We’ve seen companies that built their own setup and love it, and other companies that suffer from the maintenance and hope to move from their in-house setup to a data collection tool at some point.

We’d recommend almost never doing it yourself unless you have very specific requirements and/or a use case where it’s impossible (or highly inefficient) to use the options available on the market.

  • Pros: It is fun to write code.

  • Cons: You probably need to focus on your core business instead.

Segment

Segment allows you to route events from different sources to different destinations. A client in your JavaScript or server code (such as Ruby or Node.js) is one kind of source, and a warehouse such as Redshift or BigQuery is one kind of destination. Segment can also send data from some services, such as Stripe, to your analytical warehouse, though it doesn’t support databases as a source.
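The source-to-destination routing idea can be sketched in a few lines. This is a toy, in-memory illustration of the fan-out pattern only; it is not Segment’s API, and the `EventRouter` class and its `connect` and `track` methods are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Destination = Callable[[Event], None]


class EventRouter:
    """Toy fan-out router: each source name maps to a list of destinations."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Destination]] = defaultdict(list)

    def connect(self, source: str, destination: Destination) -> None:
        """Register a destination callable for events coming from a source."""
        self._routes[source].append(destination)

    def track(self, source: str, event: Event) -> int:
        """Deliver a copy of the event to every destination; return the count."""
        destinations = self._routes.get(source, [])
        for deliver in destinations:
            # Tag each delivered copy with the source it came from.
            deliver(dict(event, source=source))
        return len(destinations)
```

For example, connecting a `"javascript"` source to both a warehouse loader and an analytics forwarder means a single `track` call reaches both, which is why adding a warehouse later does not require touching the tracking code.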

Segment is quite popular for organizing events that stream to services such as Google Analytics and Mixpanel, and we see it become part of the stack early on at a lot of companies. If you already have Segment, it could be one of the most painless ways to “upgrade to SQL.” You just need to enable your warehouse as a destination and you’re good to go.

Pricing depends on monthly active users, which Segment calls Monthly Tracked Users (MTU). If you have a lot of users, which is usually the case for fast-growing B2C startups, Segment could become quite expensive: 100,000 MTU costs ~$1,000 per month.
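To get a feel for how this scales, you can estimate your own monthly user count from an existing event log. The sketch below counts distinct user IDs per calendar month and applies a naive linear rate of $0.01 per user, which is simply the ~$1,000 per 100,000 MTU figure divided out. Both function names are hypothetical, real pricing may be tiered, and Segment’s exact counting rules are not reproduced here.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def monthly_tracked_users(
    events: Iterable[Tuple[str, datetime]]
) -> Dict[str, int]:
    """Count distinct user IDs per calendar month, keyed as 'YYYY-MM'."""
    users_by_month: Dict[str, Set[str]] = defaultdict(set)
    for user_id, timestamp in events:
        users_by_month[timestamp.strftime("%Y-%m")].add(user_id)
    return {month: len(users) for month, users in users_by_month.items()}


def rough_monthly_cost(mtu: int, usd_per_mtu: float = 0.01) -> float:
    """Linear estimate only; 0.01 assumes ~$1,000 per 100,000 MTU."""
    return mtu * usd_per_mtu
```

Because the bill follows the number of distinct users rather than the number of events, a B2C product with many occasional visitors can cost more than a B2B product with far heavier usage per user.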

Segment could be a good option if you already use it to route events to different destinations and you don’t expect a high volume of monthly active users.

  • Pros: Easy to start if you already use Segment; good ecosystem with a lot of guides and ready-to-go solutions.

  • Cons: Vendor lock-in; your bill could go crazy.

Snowplow

Snowplow is an open-source web, mobile, and event analytics platform. Since it’s open-source, you don’t have vendor lock-in here and should not have to worry about bills getting crazy. However, the initial implementation could be pretty expensive, and you probably need to hire a consultant if your team doesn’t have enough experience.

There are some options to make it easier, such as hosting your Snowplow collector at third-party providers. As you scale, you can always host it yourself at some point.

Besides the initial cost, you should also consider future maintenance if you’re going to host it yourself. Snowplow itself is battle-tested and production-ready at large scale. It is more a question of having enough expertise to implement it and maintain it later.

  • Pros: No vendor lock-in; good ecosystem with technology and consultant partners.

  • Cons: Initial implementation could be quite expensive.

Firebase

For iOS/Android only.

Firebase started as a real-time backend-as-a-service. After it was acquired by Google in 2014, Firebase evolved into a bigger platform providing more features besides the real-time backend, such as crash reporting, push notifications, and analytics.

With Firebase Analytics, you can collect events data and assign properties to users. But the reason we mention it in our data collection tools overview is that it has native BigQuery integration, which makes it very convenient to load your data from Firebase into BigQuery.

It’s a go-to option if you already use other Firebase features, you have BigQuery, and are ready to have a long-term relationship with Google Infrastructure.

Analytics itself is free, but BigQuery integration is paid. The pricing is affordable, but not very straightforward. You can play around with their pricing calculator to get a better sense.

  • Pros: Easy to start if you have Firebase already; scales perfectly; affordable pricing.

  • Cons: Lock-in on Google Cloud products; iOS/Android only.

Heap

Heap is a mobile and web analytics platform similar to Mixpanel or Amplitude. The main difference with Heap is that it tracks everything automatically, so you don’t need to specify in your app which events you want to send. You create new events in the Heap interface by setting some rules; for example, a click on a specific button on a specific page could be considered a “Purchase” event.
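The idea of defining events after the fact, as rules over automatically captured interactions, can be sketched as follows. This is a conceptual illustration only and not Heap’s implementation or API; the `RawInteraction` and `EventRule` shapes, their fields, and the `apply_rules` function are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RawInteraction:
    """One auto-captured interaction (hypothetical shape)."""
    kind: str      # e.g. "click"
    page: str      # e.g. "/checkout"
    target: str    # e.g. "#buy-button"
    user_id: str


@dataclass(frozen=True)
class EventRule:
    """A named event defined as a filter over raw interactions."""
    name: str
    kind: str
    page: str
    target: str

    def matches(self, interaction: RawInteraction) -> bool:
        return (
            interaction.kind == self.kind
            and interaction.page == self.page
            and interaction.target == self.target
        )


def apply_rules(
    interactions: Iterable[RawInteraction], rules: List[EventRule]
) -> Dict[str, List[RawInteraction]]:
    """Group raw interactions under every rule they match."""
    matched: Dict[str, List[RawInteraction]] = {rule.name: [] for rule in rules}
    for interaction in interactions:
        for rule in rules:
            if rule.matches(interaction):
                matched[rule.name].append(interaction)
    return matched
```

Since the raw interactions are stored first and the rules are applied afterwards, a rule created today can also classify interactions that were captured before the rule existed.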

What also makes Heap different, and the reason it is on this list of data collection tools, is that Heap provides a SQL feature, which is basically a managed Redshift instance. Heap “owns” the warehouse, but you can connect your BI tool by requesting credentials from the Heap team.

The “codeless” event creation is probably not a huge bonus here since you want your data team to control your raw data and expose transformed data models to end users. However, not placing snippets of Heap’s code across your application to send events could make life easier in case of migration away from Heap.

  • Pros: Easy to start for Heap users.

  • Cons: Vendor lock-in both on data collection and data warehousing.

Topics:
data collection, event analytics, big data, heap, firebase, snowplow, segment, data analytics

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
