The Analytics Race Amongst the World's Largest Companies

In a real-time world, there's no time to wait for lengthy ETL processes. Using a system that supports transactions AND analytics allows data to be analyzed in place.

Data is fueling the world’s most valuable companies. Today that list is topped by Apple, Google, Microsoft, Amazon, and Facebook, and each of them harnesses data to drive outsized value. While the companies are unique, their approaches to analytics have more in common than you might expect.

Figure: Market capitalization of the world's largest companies.

The Rapid Rise of Data Capture for Analytics

In a short span, entire industries have emerged that did not exist before. Each of them is supported by one or more of the world’s largest companies:

  • App stores from Apple and Google.
  • Online music, videos, and books from Apple, Google, and Amazon.
  • Seller marketplaces from Amazon.com.
  • Social networks from Facebook.

Figure: Apps, music, and books.

These areas share characteristics that drive their data workloads:

  • End-user bases numbering in the hundreds of millions.

  • A smaller (but still large) base of creators or sellers.

The platform providers (Apple, Google, Amazon, and Facebook) seek analytics for:

  • Themselves.

  • The content producers or sellers.

  • Often, the end users as well.

Together, these characteristics produce a stack that starts with the platform provider, extends up to the creators or sellers, and ends with the consumers. Each level has its own analytics requirements.

Figure: Multilevel analytics.

The App Store Example

Let’s use the App Store example to explore analytics architectures across this type of stack. App Stores are also an ideal example of new workloads that require a fresh approach to data engineering.

App Store Characteristics

The largest App Stores have the following characteristics:

  • Hundreds of millions of end users.
  • Millions of application developers.
  • Dozens of app segments.
  • One primary platform provider (e.g., Apple or Google).

App Stores also represent a large, fast-growing segment of the economy. According to a recent San Francisco Chronicle article based on data from the analytics firm App Annie, both Apple's and Google's app stores are growing, with Android app stores projected to take the lead:

This year, things are changing: Android app distributors will leap ahead of the App Store, according to projections by analytics firm App Annie. In 2017, the App Store will generate $40 billion in revenue, while Android app stores run by Google and other parties will generate $41 billion, App Annie said. That gap is expected to widen in 2021, with Android app stores generating $78 billion and Apple’s store at $60 billion, according to the analytics firm’s report, which was released on Wednesday.

Figure: App store revenue and projections.
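The quoted projections imply quite different growth rates. As a quick worked check (our own arithmetic on the quoted figures, not numbers published in the App Annie report), the compound annual growth rate (CAGR) from 2017 to 2021 can be computed as follows; the `cagr` helper is illustrative.

```python
# Implied compound annual growth rate from the projections quoted above.
# Revenue figures are in billions of US dollars.


def cagr(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


projections = {
    # store: (2017 revenue, 2021 projected revenue)
    "Android app stores": (41.0, 78.0),
    "Apple App Store": (40.0, 60.0),
}

for store, (rev_2017, rev_2021) in projections.items():
    rate = cagr(rev_2017, rev_2021, years=2021 - 2017)
    print(f"{store}: {rate:.1%} per year")

gap_2017 = projections["Android app stores"][0] - projections["Apple App Store"][0]
gap_2021 = projections["Android app stores"][1] - projections["Apple App Store"][1]
print(f"Gap widens from ${gap_2017:.0f}B to ${gap_2021:.0f}B")
```

Running it gives roughly 17% per year for Android app stores versus roughly 11% for Apple's store, which is why the gap grows from $1 billion to $18 billion.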

Data Workloads From App Stores

App Store workloads produce and collect information on:

  • The distribution of apps to end users.

  • Data generated by each app for each end user, including:

    • Transactional data.

    • Log data.
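One way to model these workloads is a small set of record types, one per kind of data listed above. The sketch below is hypothetical: the class and field names are assumptions for illustration, not an actual App Store schema. The design point is that every record carries the developer, app, and user keys that multilevel analytics depends on.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple, Union

# Hypothetical record shapes; field names are illustrative only.


@dataclass(frozen=True)
class InstallEvent:
    """Distribution of an app to an end user (an install or an update)."""
    ts: datetime
    app_id: int
    developer_id: int
    user_id: int
    version: str


@dataclass(frozen=True)
class PurchaseEvent:
    """Transactional data emitted by an app, such as an in-app purchase."""
    ts: datetime
    app_id: int
    developer_id: int
    user_id: int
    amount_cents: int


@dataclass(frozen=True)
class LogEvent:
    """Log data emitted by an app running on a user's device."""
    ts: datetime
    app_id: int
    developer_id: int
    user_id: int
    level: str
    message: str


Event = Union[InstallEvent, PurchaseEvent, LogEvent]


def keys(event: Event) -> Tuple[int, int, int]:
    """Keys used to scope queries to a developer, an app, or a single user."""
    return (event.developer_id, event.app_id, event.user_id)


e = PurchaseEvent(datetime.now(timezone.utc), app_id=7, developer_id=3,
                  user_id=1001, amount_cents=499)
print(keys(e))  # (3, 7, 1001)
```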

Desired Data Engineering Capabilities

To meet the needs of comprehensive, multilevel App Store analytics, data solutions need to provide:

  • Fast data capture, including the ability to ingest data in real time.

  • Low-latency query capability, supporting sophisticated queries with sub-second responses.

  • High concurrency, enabling many users to access the system simultaneously without slowdown.

Figure: Fast ingest, low-latency queries, and high concurrency.
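A minimal, single-node sketch of the ingest and query path, using Python's built-in `sqlite3` module as a stand-in. This is an assumption for illustration only: SQLite is neither distributed nor built for high-concurrency analytics. It shows two common techniques: writing micro-batches in one transaction rather than committing row by row, and indexing the column queries filter on so a lookup avoids scanning the whole table. The table and index names are hypothetical.

```python
import random
import sqlite3

# Single-node stand-in for the ingest and query path (not a distributed system).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts INTEGER, app_id INTEGER, user_id INTEGER, amount_cents INTEGER)"
)
# Index the column most queries filter on, so point lookups avoid a full scan.
conn.execute("CREATE INDEX idx_events_app ON events (app_id)")


def ingest(batch):
    """Write one micro-batch in a single transaction instead of committing per row."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", batch)


rng = random.Random(42)
BATCH_SIZE = 10_000
for start in range(0, 100_000, BATCH_SIZE):
    batch = [
        (start + i, rng.randrange(500), rng.randrange(50_000), rng.randrange(1_000))
        for i in range(BATCH_SIZE)
    ]
    ingest(batch)

# Low-latency query: total revenue for one app, answered via the index.
query = "SELECT COALESCE(SUM(amount_cents), 0) FROM events WHERE app_id = ?"
(total,) = conn.execute(query, (42,)).fetchone()
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
print(total)
print(plan[0][-1])  # the plan's detail text should name idx_events_app
```

Concurrency is not shown here; in a scale-out system it comes largely from routing each query to only the partitions it needs.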

Desired Analytical Capabilities

To serve every level of the stack, App Stores (and many other businesses with similar characteristics) need to deliver:

  • Analytics for the platform: Real-time analytics to understand what is happening operationally at any moment, plus ad hoc analytics for impromptu drill-downs.

  • Analytics for app developers: Ad hoc queries, so developers can segment their data any way they want. Traditional solutions serving many groups of analytics users often required pre-computing results, which ruled out ad hoc analysis.

  • Analytics for end users: Responsive, lightweight analytics for hundreds of millions of users, such as which apps are installed and whether they are up to date.
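All three levels can be served from the same data by changing only the scope of each query. The sketch below assumes a hypothetical `installs` table in an in-memory SQLite database; the schema, function names, and sample rows are invented for illustration. The platform aggregates across everyone, a developer's ad hoc query is always filtered to that developer's rows, and an end user's lookup is a narrow indexed read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE installs (
    developer_id INTEGER, app_id INTEGER, user_id INTEGER,
    version TEXT, latest_version TEXT
);
CREATE INDEX idx_dev ON installs (developer_id);
CREATE INDEX idx_user ON installs (user_id);
""")
conn.executemany(
    "INSERT INTO installs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, "2.0", "2.0"),
        (1, 10, 101, "1.9", "2.0"),
        (1, 11, 100, "1.0", "1.0"),
        (2, 20, 101, "3.1", "3.2"),
        (2, 20, 102, "3.2", "3.2"),
    ],
)


def platform_installs_per_developer():
    """Platform level: a view across every developer."""
    return conn.execute(
        "SELECT developer_id, COUNT(*) FROM installs "
        "GROUP BY developer_id ORDER BY developer_id"
    ).fetchall()


def developer_installs_by_version(developer_id, app_id):
    """Developer level: ad hoc segmentation, always scoped to that developer."""
    return conn.execute(
        "SELECT version, COUNT(*) FROM installs "
        "WHERE developer_id = ? AND app_id = ? GROUP BY version ORDER BY version",
        (developer_id, app_id),
    ).fetchall()


def user_apps(user_id):
    """End-user level: installed apps and whether each is up to date (1 or 0)."""
    return conn.execute(
        "SELECT app_id, version = latest_version FROM installs "
        "WHERE user_id = ? ORDER BY app_id",
        (user_id,),
    ).fetchall()


print(platform_installs_per_developer())     # [(1, 3), (2, 2)]
print(developer_installs_by_version(1, 10))  # [('1.9', 1), ('2.0', 1)]
print(user_apps(101))                        # [(10, 0), (20, 0)]
```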

Analytics Architecture Strategies

For App Stores or any other large data-driven business, the following goals and implementation approaches can make analytics at scale easier to achieve.

Goals

  • Multilevel: Provide analytics for the platform, developers, and end consumers. With the appropriate indexing and sharding, the platform provider can architect a solution that meets the needs of all three constituencies.

  • Self-service: Supporting self-service analytics keeps results instant and up to date, without the cost and complexity of pre-computing them.
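The cost of pre-computing is easiest to see with a rollup that goes stale. In this sketch (hypothetical tables, with SQLite standing in for the analytics store), a rollup built by a batch job misses a purchase that arrives afterward, while an ad hoc query over the base table reflects it immediately. The rollup also answers only the grouping it was built for; a new question needs a new rollup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (app_id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 100), (1, 250), (2, 400)])

# Pre-computed approach: a rollup table built by a (hypothetical) scheduled batch job.
conn.execute(
    "CREATE TABLE revenue_rollup AS "
    "SELECT app_id, SUM(amount_cents) AS revenue FROM purchases GROUP BY app_id"
)

# New data arrives after the rollup was built.
conn.execute("INSERT INTO purchases VALUES (1, 50)")

# The rollup still shows the old total; the ad hoc query sees the new row.
(stale,) = conn.execute("SELECT revenue FROM revenue_rollup WHERE app_id = 1").fetchone()
(live,) = conn.execute("SELECT SUM(amount_cents) FROM purchases WHERE app_id = 1").fetchone()
print(stale, live)  # 350 400
```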

Implementation Recommendations

  • Use a scale-out distributed system: A distributed system can support both the speed and the volume required for large-scale analytics. The right indexing and sharding also let queries be routed appropriately. For example, if thousands of developers are each querying data about their own applications, each query can be directed to the partitions that hold that developer's data rather than to the entire distributed system. Because each query touches only a small part of the system, many queries can run concurrently.

  • Ensure a modern query execution system. Newer systems include features such as:

    • Code compilation, which enables sub-second responses on repetitive queries.

    • Distributed joins for efficient operations across multiple tables.

    • Vectorization to take advantage of modern CPU capabilities such as Single Instruction, Multiple Data (SIMD).

  • Bundle transaction support to enable real-time analytics.
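A toy, in-process sketch of the sharding and distributed-join ideas above. Everything in it is an assumption for illustration: the shard count, the hash-based placement, and the `apps` and `purchases` tables. Both tables are partitioned on `developer_id`. Because each app belongs to exactly one developer, an app and its purchases always land on the same shard, so a developer's query is routed to a single shard and its join runs locally there (a co-located join). A platform-wide query instead fans out to every shard and merges partial results. Real distributed databases also use other strategies, such as broadcasting or reshuffling a table, when matching rows are not guaranteed to share a shard.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 8  # assumption: fixed shard count, chosen only for illustration


def shard_for(developer_id: int) -> int:
    """Stable hash of the shard key, so every node agrees on where a row lives."""
    digest = hashlib.sha256(str(developer_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Each shard holds two tables, both partitioned on developer_id.
shards = defaultdict(lambda: {"apps": [], "purchases": []})


def insert(table: str, row: dict) -> None:
    shards[shard_for(row["developer_id"])][table].append(row)


def hash_join(apps, purchases):
    """Shard-local hash join: build on apps, probe with purchases, match on app_id."""
    names = {a["app_id"]: a["name"] for a in apps}
    return [(names[p["app_id"]], p["amount_cents"]) for p in purchases if p["app_id"] in names]


def developer_revenue_by_app(developer_id: int) -> dict:
    """Routed query: reads only the shard that owns this developer's rows."""
    shard = shards[shard_for(developer_id)]

    def mine(rows):
        # Other developers may hash to the same shard, so filter on the key too.
        return [r for r in rows if r["developer_id"] == developer_id]

    totals = defaultdict(int)
    for name, amount in hash_join(mine(shard["apps"]), mine(shard["purchases"])):
        totals[name] += amount
    return dict(totals)


def platform_revenue() -> int:
    """Fan-out query: each shard computes a partial sum, then results are merged."""
    return sum(sum(p["amount_cents"] for p in s["purchases"]) for s in shards.values())


insert("apps", {"developer_id": 1, "app_id": 10, "name": "Notes"})
insert("apps", {"developer_id": 1, "app_id": 11, "name": "Tasks"})
insert("apps", {"developer_id": 2, "app_id": 20, "name": "Maps"})
for dev, app, amount in [(1, 10, 99), (1, 10, 199), (1, 11, 299), (2, 20, 499)]:
    insert("purchases", {"developer_id": dev, "app_id": app, "amount_cents": amount})

print(developer_revenue_by_app(1))  # {'Notes': 298, 'Tasks': 299}
print(platform_revenue())           # 1096
```

Each developer query touches one shard out of eight, which is what lets thousands of developers query at the same time without contending for the whole cluster.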

In a real-time world, there is no time to wait for lengthy extract, transform, and load processes. Using a system that supports transactions as well as analytics allows data to be analyzed in place.
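A minimal sketch of analyzing data in place: purchases are written transactionally, and the same table is queried analytically with no ETL step in between. SQLite is a single-node stand-in here, and the schema and function names are hypothetical; the article's point concerns scale-out systems that support both workloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, app_id INTEGER, amount_cents INTEGER)")


def record_purchase(user_id, app_id, amount_cents):
    """Transactional write: commit on success, roll back on any exception."""
    with conn:
        conn.execute("INSERT INTO purchases VALUES (?, ?, ?)", (user_id, app_id, amount_cents))
        if amount_cents <= 0:
            raise ValueError("amount must be positive")  # rolls back the insert above


def revenue_by_app():
    """Analytical read over the same table the transactions just wrote."""
    return conn.execute(
        "SELECT app_id, SUM(amount_cents) FROM purchases GROUP BY app_id ORDER BY app_id"
    ).fetchall()


record_purchase(1, 10, 99)
record_purchase(2, 10, 199)
record_purchase(2, 20, 499)
try:
    record_purchase(3, 20, -5)  # rejected; the transaction leaves no trace
except ValueError:
    pass

print(revenue_by_app())  # [(10, 298), (20, 499)]
```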

Published at DZone with permission of Gary Orenstein, DZone MVB.
