DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. Building an Open Source Mixpanel Alternative, Part 1: Collecting and Displaying Events

Building an Open Source Mixpanel Alternative, Part 1: Collecting and Displaying Events

In the first part of this series, we take a look at how you gather the necessary data to begin displaying your events.

Artyom Keydunov user avatar by
Artyom Keydunov
·
Mar. 22, 19 · Tutorial
Like (2)
Save
Tweet
Share
10.05K Views

Join the DZone community and get the full member experience.

Join For Free

This is the first part of a tutorial series on building an analytical web application with Cube.js. It expects the reader to be familiar with Javascript, Node.js, React, and have basic knowledge of SQL. The final source code is available here and the live demo is here. The example app is serverless and running on AWS Lambda. It displays data about its own usage.

Image title

There is a category of analytics tools like Mixpanel or Amplitude, which are good at working with events data. They are ideal for measuring product or engagement metrics, such as activation funnels or retention. They are also very useful for measuring A/B tests.

Although all these tools do a job, they are proprietary and cloud-based. That could be a problem when privacy is a concern. Or if one wants to customize how funnels or retention work under the hood. While traditional BI tools, like Tableau or Power BI, could potentially be used to run the same analysis, they cannot offer the same level of user experience. The problem is that they are designed to be general business intelligence tools, and not specific for funnels, retention, A/B tests, etc.

With recent advancements in frontend development, it became possible to rapidly develop complex user interfaces. Things which took a week to build five years ago can now be built in an afternoon. On the backend and infrastructure side, cloud-based MPP databases, such as BigQuery and Athena, are dramatically changing the landscape. The ELT approach, when data is transformed inside the database, is getting more and more popular, replacing traditional ETL. Serverless architecture makes it possible to easily deploy and scale applications.

All of these made it possible to build internal alternatives to established services like Mixpanel, Amplitude, or Kissmetrics. In this series of tutorials, we’re going to build a full-featured open-source event analytics system.

It will include the following features:

  • Data collection;
  • Dashboarding;
  • Ad hoc analysis with query builder;
  • Funnel analysis;
  • Retention analysis;
  • Serverless deployment;
  • A/B tests;
  • Real-time events monitoring;

The diagram below shows the architecture of our application:

Image title

In the first part of our tutorial, we’ll focus more on how to collect and store data, and briefly cover how to make a simple chart based on this data. The following parts focus more on querying data and building various analytics reporting features.

Collecting Events

We’re going to use Snowplow Cloudfront Collector and Javascript Tracker. We need to upload a tracking pixel to Amazon CloudFront CDN. The Snowplow Tracker sends data to the collector by making a GET request for the pixel and passing data as a query string parameter. The CloudFront Collector uses CloudFront logging to record the request (including the query string) to an S3 bucket.

Next, we need to install Javascript Tracker. Here is the full guide.

But, in short, it is similar to Google Analytics’s tracking code or Mixpanel’s, so we need to just embed it into our HTML page.

<script type="text/javascript">      
  ;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
   p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
   };p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
   n.src=w;g.parentNode.insertBefore(n,g)}} .  (window,document,"script","//d1fc8wv8zag5ca.cloudfront.net/2.10.2/sp.js","snowplow"));

  window.snowplow('newTracker', 'cf', '<YOUR_CLOUDFRONT_DISTRIBUTION_URL>’, { post: false });
</script>


Here you can find how it is embedded into our example application.

Once we have our data, which is CloudFront logs, in the S3 bucket, we can query it with Athena. All we need to do is create a table for CloudFront logs.

Copy and paste the following DDL statement into the Athena console. Modify the LOCATION for the S3 bucket that stores your logs.


CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
  `date` DATE,
  time STRING,
  location STRING,
  bytes BIGINT,
  requestip STRING,
  method STRING,
  host STRING,
  uri STRING,
  status INT,
  referrer STRING,
  useragent STRING,
  querystring STRING,
  cookie STRING,
  resulttype STRING,
  requestid STRING,
  hostheader STRING,
  requestprotocol STRING,
  requestbytes BIGINT,
  timetaken FLOAT,
  xforwardedfor STRING,
  sslprotocol STRING,
  sslcipher STRING,
  responseresulttype STRING,
  httpversion STRING,
  filestatus STRING,
  encryptedfields INT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/AWSLogs/Account_ID/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )


Now we are ready to connect Cube.js to Athena and start building our first dashboard.

Building Our First Chart

First, install Cube.js CLI. It is used for various Cube.js workflows.

$ npm install -g cubejs-cli


Next, сreate a new Cube.js service by running the following command. Note, we are specifying Athena as a database here ( -d athena ) and template as serveless ( -t serverless ). Cube.js supports different configurations, but for this tutorial, we will use the serverless one.

$ cubejs create event-analytics-backend -d athena -t serverless


Once run, the create command will create a new project directory that contains the scaffolding for your new Cube.js project. This includes all the files necessary to spin up the Cube.js backend, example frontend code for displaying the results of Cube.js queries in a React app, and some example schema files to highlight the format of the Cube.js Data Schema layer.

The .env file in this project directory contains placeholders for the relevant database credentials. For Athena, you'll need to specify the AWS access and secret keys with the access necessary to run Athena queries, and the target AWS region and S3 output location where query results are stored.

CUBEJS_DB_TYPE=athena
CUBEJS_AWS_KEY=<YOUR ATHENA AWS KEY HERE>
CUBEJS_AWS_SECRET=<YOUR ATHENA SECRET KEY HERE>
CUBEJS_AWS_REGION=<AWS REGION STRING, e.g. us-east-1>
# You can find the Athena S3 Output location here: https://docs.aws.amazon.com/athena/latest/ug/querying.html
CUBEJS_AWS_S3_OUTPUT_LOCATION=<S3 OUTPUT LOCATION>


Now, let’s create a basic Cube.js Schema for our events model. Cube.js uses Data Schema to generate and execute SQL; you can read more about it here.

Create a schema/Events.js file with the following content.


const regexp = (key) => `&${key}=([^&]+)`;
const parameters = {
  event: regexp('e'),
  event_id: regexp('eid'),
  page_title: regexp('page')
}

cube(`Events`, {
  sql:
    `SELECT
      from_iso8601_timestamp(to_iso8601(date) || 'T' || "time") as time,
      ${Object.keys(parameters).map((key) => ( `url_decode(url_decode(regexp_extract(querystring, '${parameters[key]}', 1))) as ${key}` )).join(", ")}
    FROM cloudfront_logs
    WHERE length(querystring) > 1
    `,

  measures: {
    pageView: {
      type: `count`,
      filters: [
        { sql: `${CUBE}.event = 'pv'` }
      ]
    },
  },

  dimensions: {
    pageTitle: {
      sql: `page_title`,
      type: `string`
    }
  }
});


In the schema file, we create an Events cube. It is going to contain all the information about our events. In the base SQL statement, we’re extracting values from the query string sent by the tracker by using the regexp function. Cube.js is good at running transformations such this and it could also materialize some of them for performance optimization. We’ll talk about it in the next parts of our tutorial.

With this schema in place, we can run our dev server and build the first chart.

Spin up the development server by running the following command.

$ npm dev


Visit http://localhost:4000, it should open a CodeSandbox with an example. Change the renderChart function and the query variable to the following.

const renderChart = resultSet => (
  <Chart height={400} data={resultSet.chartPivot()} forceFit>
    <Coord type="theta" radius={0.75} />
    <Axis name="Events.pageView" />
    <Legend position="right" name="category" />
    <Tooltip showTitle={false} />
    <Geom type="intervalStack" position="Events.pageView" color="x" />
  </Chart>
);

const query = {
  measures: ["Events.pageView"],
  dimensions: ["Events.pageTitle"]
};


Now, you should be able to see the pie chart, depending on what data you have in your S3.

Image title

In the next part, we’ll walk through how to build Conversion Funnels. In the third part we will make a dashboard and dynamic query builder, like one in Mixpanel or Amplitude. Part 3 also will cover Retention Analysis. In the final part, we will discuss how to deploy the whole application in the serverless mode to AWS Lambda.

You can check out the full source code of the application here.

And the live demo is available here.

Open source Database Event Mixpanel CloudFront Data (computing) application AWS

Published at DZone with permission of Artyom Keydunov. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Reliability Is Slowing You Down
  • How We Solved an OOM Issue in TiDB with GOMEMLIMIT
  • OpenVPN With Radius and Multi-Factor Authentication
  • Choosing the Right Framework for Your Project

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: