Pros and Cons of Multi-Step Data Platforms

This blog post highlights the pros and cons of a “one shoe fits all approach,” where one platform is used for all the use cases vs. the “best tool for the job.”

By Francesco Tisiot · Jun. 13, 23 · Opinion

In the modern world, it's rare for data to keep the same shape and stay on the same platform from the beginning to the end of its journey. Yes, some technologies can cover quite a wide range of functionality, but sometimes at the expense of precision, developer experience, or performance. Therefore, to achieve better or faster results, people might select a new tool for a specific task and start an implementation and integration process to move the data around.

This blog post highlights the pros and cons of a "one shoe fits all" approach, where one platform is used for all the use cases, vs. the "best tool for the job," where various tools and integrations are used to fulfill the requirements.

The Initial Data Touchpoint

To discuss data evolution, it's necessary to define its origin. Most of the time, data originates from some sort of application facing an internal or external audience and storing its transactions in a backend datastore. It's common for the backend to be a database (relational or not) that can immediately store and validate the data, or a streaming service (like Apache Kafka) that can forward the transaction to the relevant parties.

No matter which solution is chosen, at this point of the data journey the backend datastore must provide excellent performance for both writing and reading a single transaction, since the application mainly works at the level of a single click, purchase, or order.
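
As a purely illustrative example, this transactional touchpoint can be as simple as an application writing one order per request to a relational backend. The following minimal sketch assumes a PostgreSQL database and a hypothetical orders table; connection details and column names are placeholders.

# Minimal sketch of the transactional touchpoint: one short-lived write per
# order, assuming PostgreSQL, a hypothetical "orders" table, and placeholder
# connection details.
import psycopg2

conn = psycopg2.connect("dbname=shop user=app password=secret host=db.example.com")

def record_order(order_id: str, customer_id: str, amount: float) -> None:
    # Each call is a single, small transaction: exactly the access pattern
    # the backend must serve with low latency.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (order_id, customer_id, amount) VALUES (%s, %s, %s)",
            (order_id, customer_id, amount),
        )

record_order("o-1001", "c-42", 19.99)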

Evolving With Time

But, as mentioned in the intro, data rarely stays in place. On top of the transactional needs, new data stakeholders appear in the company: people in charge of inventory are not interested in a single order movement but rather in overall daily trends; executives need a week-by-week view of defined KPIs; sales agents want an immediate "maybe you should offer this" kind of functionality when talking to prospects.

All the above needs are very different from the original "transactional" task and require storage techniques, functionality, and APIs that are adapted and customized to each single use case in order to perform best.
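
To make the contrast concrete, here is a hedged sketch of the analytical access pattern, reusing the hypothetical orders table from the earlier example: instead of touching one row per request, the inventory team scans whole days of data and aggregates them, which is a completely different workload from the transactional one.

import psycopg2

conn = psycopg2.connect("dbname=shop user=analytics password=secret host=db.example.com")

# Daily trend over the last 30 days: a scan-and-aggregate query, assuming the
# hypothetical "orders" table also carries a created_at timestamp column.
DAILY_TREND_SQL = """
    SELECT date_trunc('day', created_at) AS order_day,
           count(*)                      AS orders,
           sum(amount)                   AS revenue
    FROM orders
    WHERE created_at >= now() - interval '30 days'
    GROUP BY order_day
    ORDER BY order_day
"""

with conn, conn.cursor() as cur:
    cur.execute(DAILY_TREND_SQL)
    for order_day, orders, revenue in cur.fetchall():
        print(order_day, orders, revenue)

Running this kind of scan directly against the transactional backend is exactly where the noisy-neighbor and performance trade-offs discussed below come from.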

Yes, there are tools that can perform a wide range of functions, but is it always advisable to stick with them?

The "Downloading to Excel" Problem

I've been in the Data and Analytics space for 15 years, and for 10 of them, I've been building centralized BI systems to provide a "unique source of truth" to companies.

For the same number of years, I've been battling the "download to Excel" practice. Whenever a certain functionality, option, or calculation wasn't available in the tool of choice, people downloaded the entire dataset and recreated the analytical process in Excel. This, on top of the security concerns, caused a plethora of different ways to calculate the same KPI, thereby defeating the "unique source of truth" mission.

How to avoid that? By having frequent feedback loops about desired functionality or outcomes and trying to build them into the main product. But sometimes the centralized BI effort was not fast, accurate, or flexible enough for the end user; therefore, the data leakage continued.

Later in my career, I realized that the main problem was trying to force everyone to use THE BI tool I was overseeing, which didn't fit all the stakeholders' needs. Therefore, it was important to broaden the feedback process: not only what to build next, but also what other tooling and integrations were needed to achieve the goal best. In other words, one centralized team on one platform can't scale, but a centralized team empowering others to use their favorite tool via trusted integrations can!

The Modern Era of Business Units as IT Purchasers

Nowadays, a lot of IT spending is not governed by central IT. Every division in a company is able to purchase, test, deploy, and manage its tool of choice to achieve its goals. When data is in the mix, this recreates the "Excel" problem, only with nicer UIs. Therefore, it's very important that data teams act as enablers, allowing business units to act, improve, analyze, and evolve over time by providing them with accurate and up-to-date data.

I feel that the times of "one tech fits all" are over… but what are we going to gain or miss?

Pros of a Single "One Tech Fits All" Solution

The main pros of a single "one tech fits all" solution were:

  • Unique source of truth: The initial datastore (database) containing the data is the only source of raw transactions, so no questions of "how fresh is this data?" need to be raised.
  • Transaction accuracy: If using ACID databases, a transaction is either committed or uncommitted; therefore, there is a guarantee of always querying the current state.
  • Expertise: Providing several functions on top of the same technology means that expertise in that single technology, and the related investment, can be maximized.
  • Unique point of contact: For security and technical queries, there is a single point of contact, and the surface area exposed to attack is minimal.

Cons of the "One Tech Fits All" Solution

On the other side, the "one tech fits all" solution also had the following limitations:

  • Lack of features: Not all the features/functionalities required were included, and sometimes, if available, their performance was not optimized for the workload.
  • Performance: Adding different use cases and query patterns on top of the same tool means having noisy neighbors that can affect performance.
  • Sub-optimization: When hitting performance limits, the optimizations are usually a trade-off between the requirements of the various use cases and are not focused on achieving the best performance for any single function.
  • Baseline: When hitting walls, like a lack of functionality, the source tech ends up being used only as a baseline, with the majority of the work, parsing, and analytics done elsewhere.

Pros of the "Best Tool for the Job" Solution

When choosing the best tool for the job after a proper evaluation phase, you have the following pros:

  • Tools fit exact needs: The evaluation phase is key to understanding the stakeholder requirements and buying/building the tool matching them.
  • Dedicated scaling: Scaling is purely based on the activity (or set of activities) that the tool needs to perform, with no other use cases or noisy neighbors to optimize for.
  • Performance: Tool selection is based not only on feature availability but also on performance. If the limits of the selected tool are hit, component upgrades can be tied to the particular functionality needed.
  • Dedicated Expertise: Tooling experts can be hired or acquired, focusing on exactly the functionality covered by the use case.

Cons of the "Best Tool for the Job" Solution

On the other side, when multiple tools are part of the data chain, you could face the following problems:

  • Integrations: Now, the data flow is the responsibility of the source platform, the target tool, and the integration between them. Dedicated people need to be responsible for creating and monitoring the connections between tools.
  • Export performance: Data needs to be exported from the source system with minimal disruption. The methodology and frequency of extraction need to be agreed upon, possibly limiting the types of integration and the freshness of the data.
  • Unique source of truth: The same or similar datasets are now available in several tools, possibly in different formats. Which one should be used?
  • Security: Having data assets spread across different tools exposes companies to a wider risk of data breaches or misuse.

What Lessons Can We Take From the "One Tech Fits All" to the "Best Tool for the Job" Solutions?

We have somewhat established that the "best tool for the job" solution is the unavoidable choice nowadays, but it has its own limits. How can we mitigate them with the lessons learned from the "one tech fits all" solution?

  • Create trusted integrations: If data doesn't live in a single place, create secure, efficient, and fast integrations between tools that keep the data in sync without compromising the source platform's performance. For example, tools like Debezium integrate with some of the most widely used databases at the minimal cost of reading their change logs (see the sketch after this list).
  • Transaction accuracy: Rely on source system features to address the consistency of changes. Debezium, for example, allows tracking transaction metadata.
  • Security: Minimize security risks by having methods to centrally define, approve, monitor, and evaluate security settings across technologies and how they affect each other. Tools like a metadata parser could help with this job.
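
To make the trusted-integration point more tangible, below is a minimal sketch of consuming Debezium change events from Kafka. It assumes the connector streams the hypothetical orders table to a topic named shop.public.orders, that the JSON converter is used, and that transaction metadata is enabled on the connector; topic names and connection details are placeholders.

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "shop.public.orders",                       # hypothetical Debezium topic
    bootstrap_servers="kafka.example.com:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
)

for message in consumer:
    event = message.value
    if event is None:                           # tombstone records carry no payload
        continue
    payload = event.get("payload", event)
    after = payload.get("after")                # row state after the change
    tx = payload.get("transaction")             # source transaction metadata, if enabled
    print(f"change={after} source_tx={tx}")

The important property is that the source database keeps serving its transactional workload untouched: downstream tools read the change log instead of querying the tables directly.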

Summary

As a data team, embrace the "best tool for the job." Your role is to provide best-in-class, fast, monitorable, and observable integrations between systems.

Published at DZone with permission of Francesco Tisiot. See the original article here.

Opinions expressed by DZone contributors are their own.
