Pros and Cons of Multi-Step Data Platforms

This blog post highlights the pros and cons of a “one shoe fits all approach,” where one platform is used for all the use cases vs. the “best tool for the job.”

By Francesco Tisiot · Jun. 13, 23 · Opinion


In the modern world, it's rare for data to keep the same shape and stay on the same platform from the beginning to the end of its journey. Yes, some technologies can cover quite a wide range of functionalities, but sometimes at the expense of precision, developer experience, or performance. Therefore, to achieve better or faster results, people might select a new tool for a specific task and start an implementation and integration process to move the data around.

This blog post highlights the pros and cons of a "one shoe fits all approach," where one platform is used for all the use cases vs. the "best tool for the job," where various tools and integrations are used to fulfill the requirements.

The Initial Data Touchpoint

To discuss data evolution, it's necessary to define its origin. Most of the time, data originates from some sort of application facing an internal or external audience and storing transactions in a backend datastore. It's common for the backend to be a database (relational or not) that can immediately store and validate the data, or a streaming service (like Apache Kafka) that can forward the transaction to the relevant parties.

No matter which solution is chosen, at this point of the data journey, the backend datastore must provide excellent performance for both writing and reading a single transaction, since the application will mainly work at the level of a single click/purchase/order.
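As a concrete illustration, here is a minimal sketch of that single-transaction write/read pattern. SQLite is used purely as a stand-in for whatever backend database is chosen, and the table and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite as a stand-in for the real backend datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# Write path: store a single transaction and commit it immediately.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", ("widget", 3))

# Read path: fetch that one order back by its key.
row = conn.execute("SELECT item, qty FROM orders WHERE id = 1").fetchone()
print(row)  # ('widget', 3)
```

The point is the access pattern, not the engine: one keyed row in, one keyed row out, with each interaction wrapped in its own transaction.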

Evolving With Time

But, as mentioned in the intro, data rarely stays in place. On top of the transactional needs, new data stakeholders appear in the company: people in charge of inventory are not interested in single order movements but rather in overall daily trends; executives need a week-by-week view of defined KPIs; sales agents want an immediate "maybe you should offer this" kind of functionality when talking to prospects.

All the above needs are very different from the original "transactional" task and require data storage techniques, functionalities, and APIs that must be adapted and customized to each functionality and use case to perform best.
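To see how different the shapes are, the inventory team's "daily trend" question translates into a full-scan aggregation rather than a single-row lookup. A sketch, again with SQLite as a stand-in and illustrative column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders (day, qty) VALUES (?, ?)",
    [("2023-06-12", 3), ("2023-06-12", 5), ("2023-06-13", 2)],
)

# Analytical query: scan and aggregate, instead of a keyed single-row read.
daily = conn.execute(
    "SELECT day, SUM(qty) FROM orders GROUP BY day ORDER BY day"
).fetchall()
print(daily)  # [('2023-06-12', 8), ('2023-06-13', 2)]
```

A row-oriented transactional store will answer this, but columnar or pre-aggregated stores are typically built for exactly this query shape, which is why the stakeholders above start pulling toward other tools.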

Yes, there are tools that can perform a wide range of functions, but is it always advisable to stick with them?

The "Downloading to Excel" Problem

I've been in the Data and Analytics space for 15 years, and for 10 of them, I've been building centralized BI systems to provide a "unique source of truth" to companies.

For the same number of years, I've been trying to battle the "download to Excel" practice. Whenever a certain functionality/option/calculation wasn't available in the tool of choice, people downloaded the entire dataset and recreated the analytical process in Excel. This, on top of the security concerns, caused a plethora of different ways to calculate the same KPI, thereby defeating the "unique source of truth" mission.

How to avoid that? By having frequent feedback loops about desired functionalities or outcomes and trying to build them into the main product. But sometimes the centralized BI efforts were not fast, accurate, or flexible enough for the end user; therefore, the leakage of data continued.

Later in my career, I realized that the main problem was trying to force everyone to use THE BI tool I was overseeing, which didn't fit all the stakeholders' needs. Therefore, it was important that the feedback process covered not only what to build next but also what other tooling and integrations were needed to best achieve the goal. In other words, one centralized team on one platform can't scale, but a centralized team empowering others to use their favorite tool via trusted integrations can!

The Modern Era of Business Units as IT Purchasers

Nowadays, a lot of spending is not governed by central IT. Every division in a company is able to purchase, test, deploy, and manage its tool of choice to achieve its goals. When data is in the mix, this again creates the "Excel" problem, only with nicer UIs. Therefore, it's very important that data teams act as enablers, allowing business units to act, improve, analyze, and evolve over time by providing accurate and up-to-date data.

I feel that the times of "one tech fits all" are over… but what are we going to gain or miss?

Pros of a Single "One Tech Fits All" Solution

The main pros of a single "one tech fits all" solution were:

  • Unique source of truth: The initial datastore (database) containing the data is the only source of raw transactions, so no questions like "how fresh is this data?" need to be raised.
  • Transaction accuracy: If using ACID databases, the transaction is either committed or uncommitted; therefore, there is a guarantee of always querying the current state.
  • Expertise: Providing several functionalities on top of the same technology means that the single tech expertise and related cost can be maximized.
  • Unique point of contact: For security and technical queries, there was a single point of contact, and the surface area exposed to attack was minimal.

Cons of the "One Tech Fits All" Solution

On the other hand, the "one tech fits all" solution also had the following limitations:

  • Lack of features: Not all the features/functionalities required were included, and sometimes, if available, their performance was not optimized for the workload.
  • Performance: Adding different use cases and query patterns on top of the same tool means having noisy neighbors that can affect performance.
  • Sub-optimization: When hitting performance limits, optimizations are usually a trade-off between the requirements of the various use cases and are not focused on achieving the best performance for a single function.
  • Baseline: When hitting walls, like the lack of functionality, the source tech is only used as a baseline, with the majority of the work/parsing/analytics done elsewhere.

Pros of the "Best Tool for the Job" Solution

When choosing the best tool for the job after a proper evaluation phase, you have the following pros:

  • Tools fit exact needs: The evaluation phase is key to understanding the stakeholder requirements and buying/building the tool matching them.
  • Dedicated scaling: Scaling is purely based on the activity (or set of activities) that the tool needs to perform, with no other use cases or noisy neighbors to optimize for.
  • Performance: The tool selection is based not only on feature availability but also on performance. If the limits of the selected choice are hit, upgrades of the components can be tied to the particular functionality needed.
  • Dedicated Expertise: Tooling experts can be hired or acquired, focusing on exactly the functionality covered by the use case.

Cons of the "Best Tool for the Job" Solution

On the other hand, when multiple tools are part of the data chain, you could face the following problems:

  • Integrations: Now the data flow is the responsibility of the source platform, the target tool, and the integration between them. Dedicated people need to be responsible for creating and monitoring the connections between tools.
  • Export performance: Data needs to be exported from the source system with minimal disruption. The methodology and frequency of extraction need to be agreed upon, possibly limiting the types of integrations and the freshness of the data.
  • Unique source of truth: The same or similar datasets are now available in several tools in possibly different formats. Which should be used?
  • Security: Having data assets spread across different tools exposes companies to a wider risk of data breaches or misuse.
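One common way to contain the export-performance cost above is incremental extraction: each run pulls only the rows changed since the last run, tracked by a watermark. A hypothetical sketch (SQLite as a stand-in; the table, the `updated_at` column, and the `extract_since` helper are all illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders (updated_at) VALUES (?)",
    [("2023-06-10",), ("2023-06-12",), ("2023-06-13",)],
)

def extract_since(conn, watermark):
    """Pull only the rows modified after the last successful export."""
    rows = conn.execute(
        "SELECT id, updated_at FROM orders"
        " WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest row seen, if any.
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

rows, wm = extract_since(conn, "2023-06-11")
print(rows, wm)  # only the two newer rows; watermark advances to '2023-06-13'
```

The trade-off is exactly the one the bullet describes: the less often the extract runs, the less load on the source, but the staler the data in the target tool.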

What Lessons Can We Take From the "One Tech Fits All" to the "Best Tool for the Job" Solutions?

We have somehow established that the "best tool for the job" solution is the unavoidable choice nowadays, but it has its own limits. How can we mitigate them with lessons taken from the "one tech fits all" solution?

  • Create trusted integrations: If data is not in a unique place, create secure, efficient, and fast integrations between tools that can provide data synchronization without compromising the source platform's performance. For example, tools like Debezium allow integration with some of the most used databases at the minimal cost of reading their change logs.
  • Transaction accuracy: Rely on source system features to address the consistency of changes. Debezium allows tracking transaction metadata.
  • Security: Minimize the security risks by having methods to centrally define, approve, monitor, and evaluate security settings across technologies and how they affect each other. Tools that can parse and compare metadata across systems could help with this job.
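To make the Debezium point more concrete: each change event captured from the database log carries the before/after row images plus an operation code, so downstream tools can replay changes faithfully. A sketch of applying one such event to an in-memory replica (the envelope fields follow Debezium's documented format; the Kafka consumer wiring is omitted, and the table data is invented):

```python
import json

# A simplified Debezium-style change event for an UPDATE ("op": "u").
event = json.loads("""
{
  "payload": {
    "op": "u",
    "before": {"id": 42, "qty": 3},
    "after":  {"id": 42, "qty": 5},
    "source": {"table": "orders", "ts_ms": 1686650000000}
  }
}
""")

def apply_change(state, payload):
    """Apply a single change event to an in-memory replica keyed by id."""
    op = payload["op"]
    if op in ("c", "u", "r"):   # create, update, snapshot read
        row = payload["after"]
        state[row["id"]] = row
    elif op == "d":             # delete: "after" is null, use "before"
        state.pop(payload["before"]["id"], None)
    return state

replica = apply_change({42: {"id": 42, "qty": 3}}, event["payload"])
print(replica[42]["qty"])  # 5
```

Because events are read from the database's own change log rather than via queries, the source system pays almost nothing for the synchronization, which is what makes this kind of integration "trusted" in the sense used above.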

Summary

As a data team, embrace the "best tool for the job." Your role is to provide best-in-class, fast, monitorable, and observable integrations between systems.


Published at DZone with permission of Francesco Tisiot. See the original article here.

Opinions expressed by DZone contributors are their own.
