The Data Journey

During the last few years, I've observed that many organizations are in the middle of a three-step process that typically takes years. This process is the Data Journey.

By Alejandro Martin · Updated Sep. 23, 22 · Opinion

Over the last few years, I've talked to many retailers and other large tech companies dealing with data.

I've observed the same pattern repeatedly: these organizations are in the middle of a three-step process, which usually takes more than five years.

1. Data Consolidation

Migrating a platform to a microservices ecosystem means you'll generate tons of information silos: every service has its own private repository.

Most analytical use cases require merging information from multiple services, and this usually gets challenging, given that each service may even use a different database technology for its data repository. The difficulty of solving analytical use cases grows exponentially under this paradigm.

That's why the first step of this process is consolidating this data in a data lake, ideally on the cloud, for scalability and flexibility.
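As a rough illustration of why consolidation helps, here is a minimal sketch in which extracts from two services are loaded into one shared store and then joined in a single query. The service names, table layouts, and sample rows are hypothetical, and an in-memory SQLite database merely stands in for a cloud data lake.

```python
import sqlite3

# Hypothetical extracts pulled from two independent services.
orders_extract = [("o-1", "p-1", 2), ("o-2", "p-2", 1), ("o-3", "p-1", 4)]
catalog_extract = [("p-1", "Red mug"), ("p-2", "Blue mug")]

# An in-memory SQLite database stands in for the consolidated data lake.
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE orders (order_id TEXT, product_id TEXT, quantity INTEGER)")
lake.execute("CREATE TABLE catalog (product_id TEXT, name TEXT)")
lake.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders_extract)
lake.executemany("INSERT INTO catalog VALUES (?, ?)", catalog_extract)

# Once both silos live in one place, a cross-service question is one query.
units_by_product = lake.execute(
    """
    SELECT c.name, SUM(o.quantity) AS units
    FROM orders AS o
    JOIN catalog AS c ON c.product_id = o.product_id
    GROUP BY c.name
    ORDER BY units DESC
    """
).fetchall()
```

Without the shared store, the same question would require calling both services and joining the results in application code.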

2. Data Modeling

Once the information is all together in a data lake, organizations need to model and normalize it so that analysts can get insights and run queries.

The same entity may be referenced using different codes across services, on top of other non-standard conventions.
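To make the problem of differing codes concrete, the sketch below maps each service's local code to one canonical identifier before aggregating. The mapping table, service names, and codes are all hypothetical; in practice this kind of logic would more likely live in the modeling layer than in application code.

```python
# Hypothetical mapping from (service, local code) to one canonical product id.
CANONICAL_IDS = {
    ("catalog", "SKU-001"): "product-1",
    ("orders", "1"): "product-1",
    ("catalog", "SKU-002"): "product-2",
    ("orders", "2"): "product-2",
}


def canonical_id(service: str, code: str) -> str:
    """Translate a service-local code into the shared identifier."""
    try:
        return CANONICAL_IDS[(service, code)]
    except KeyError:
        raise KeyError(f"no canonical id for {service}:{code}") from None


def units_sold_by_product(order_rows):
    """Sum quantities per canonical product id."""
    totals = {}
    for row in order_rows:
        product = canonical_id("orders", row["item_code"])
        totals[product] = totals.get(product, 0) + row["quantity"]
    return totals


orders = [
    {"item_code": "1", "quantity": 3},
    {"item_code": "2", "quantity": 1},
    {"item_code": "1", "quantity": 2},
]
totals = units_sold_by_product(orders)
```

Failing loudly on an unknown code is a deliberate choice here: silently dropping unmapped rows would undermine the single source of truth the modeling step is meant to provide.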

This task is ideally organized by domain entity: every vertical domain is accountable for modeling the entities it owns for the rest of the company. This approach fits quite well with the Data Mesh principles.

Ideally, teams will have tools, such as dbt, that guarantee a single source of truth for definitions and logic, along with some level of automation and observability. Given these premises, it's easy to see why dbt is growing in popularity.

Completing this step enables analytical and ML teams to provide huge value. Now your organization has a single source of truth and can run on-demand queries with "the power of the cloud." A huge leap forward.

3. Publication

Most companies are already somewhere between steps one and two, but they are really struggling with this last one: "OK, now you have your data modeled on, say, Snowflake; how do I build a dashboard on top of that? How do I consume these metrics or insights in near real time?"

The first attempt is typically building a REST data service on top of Snowflake, BigQuery, Redshift, etc. I've seen this countless times, always with the same poor results. These products are great for long-running analytical queries, but they're just not meant for interactive use cases: by design, they don't offer the low latency and high concurrency those use cases need.

And that's the exact situation where a product like Tinybird provides the most value: making the data you've already modeled available in real time, with the concurrency and latency you need.

Operational Analytics

When you complete the three steps, the ultimate benefit you get as an organization is enabling real-time Operational Analytics.

This means you can operate your business in real-time based on actual facts and insights you get from your Data Platform.

For example, if you're a retailer running a big promotion during Black Friday, you'll be able to rearrange the items on your website in real time depending on their performance, or hide out-of-stock products. Meeting demand optimally during a time-sensitive event is huge for increasing revenue.

Another common use case that's also unlocked is real-time personalization of the user experience, where the website adapts to users as they interact with it.
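The retail example can be sketched in a few lines, assuming the data platform can return current stock and recent sales per item with low latency. The `Item` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    sku: str
    stock: int          # units currently available
    recent_sales: int   # units sold in some recent time window


def arrange_listing(items):
    """Hide out-of-stock items and put the best performers first."""
    in_stock = [item for item in items if item.stock > 0]
    return sorted(in_stock, key=lambda item: item.recent_sales, reverse=True)


items = [Item("mug", 10, 42), Item("lamp", 0, 99), Item("desk", 3, 57)]
listing = [item.sku for item in arrange_listing(items)]
```

The ranking logic itself is trivial; the hard part is keeping `stock` and `recent_sales` fresh enough for the result to be trustworthy during a traffic peak.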

A Note on Streaming Analytics

There's general agreement that distributed systems increase complexity exponentially. So does asynchronous communication between services.

Once you've shifted to an event-driven paradigm, you'll typically want to ingest all your events using a common hub, for example, a Kafka cluster. But unlocking the value and getting insights from the information you already have on the platform in near real-time is challenging.

There are already a few products for streaming analytics, such as Rockset, ksqlDB, Apache Druid, Imply, etc. They work great for some streaming use cases, but they all fall a bit short when it comes to high volume and concurrency, complex logic, or multiple joins.

That's because they don't leverage a full OLAP database like ClickHouse, which enables queries over arbitrary time spans (vs. window functions), advanced joins for complex use cases, managed materialized views (MVs) for rollups, and many other benefits.
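To illustrate the idea behind a rollup, the sketch below keeps per-minute counts up to date as events arrive and then answers a query over an arbitrary, minute-aligned time span by summing the pre-aggregated buckets. This is only a conceptual model in plain Python; the `MinuteRollup` class is hypothetical and does not reflect how ClickHouse implements materialized views.

```python
from collections import defaultdict


class MinuteRollup:
    """Incrementally maintained per-minute event counts."""

    def __init__(self):
        self._counts = defaultdict(int)

    def ingest(self, ts_seconds: int) -> None:
        # Update the rollup at write time, as a materialized view would.
        self._counts[ts_seconds // 60] += 1

    def count_between(self, start_seconds: int, end_seconds: int) -> int:
        """Count events in [start, end); both bounds must be minute-aligned."""
        first, last = start_seconds // 60, end_seconds // 60
        return sum(self._counts.get(minute, 0) for minute in range(first, last))


rollup = MinuteRollup()
for ts in (5, 59, 60, 125, 190):
    rollup.ingest(ts)

first_two_minutes = rollup.count_between(0, 120)
after_first_minute = rollup.count_between(60, 240)
```

Because the aggregation happens on ingestion, a query only touches one small bucket per minute in the requested span rather than every raw event.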


Published at DZone with permission of Alejandro Martin. See the original article here.

Opinions expressed by DZone contributors are their own.
