People moving to Talend from other integration tools always come to our team with many questions in mind. As a Customer Success Architect, the most common ones I come across are:
- Does Talend support most of the features we already have in our current tool?
- How quickly can we learn and start building production-quality data integration jobs?
- How can I get started quickly if I have never used modern data integration tools before?
Too often, I see customers moving from proprietary software to a solution built on open source technologies while ignoring the design patterns and "anti-patterns" that are common knowledge in object-oriented programming communities such as Java and .NET.
In revisiting those anti-patterns, I have listed and summarized a few that are still relevant when using Talend. Many anti-patterns stem from weak organizational decision-making or inappropriate project management practices. In this blog, I want to focus instead on software engineering design practices that can adversely impact a new system roll-out. Let's analyze their symptoms and consequences, root causes, and possible solutions below.
The stovepipe anti-pattern takes its name from the age-old wood-burning stoves, whose pipes often needed repairing and, as a common practice, were patched with whatever materials were at hand. Many software systems grow the same way: ad-hoc repairs lead to more ad-hoc structures. In software engineering, stovepipes can arise during the migration of interfaces (autogenerated stovepipe), from islands of implementation (stovepipe enterprise), or from legacy software with undesirable qualities (stovepipe system).
During the migration of interfaces, there might be old interfaces that are implementation-specific and tightly coupled to the underlying subsystem, which can cause dependency problems when scaling to a distributed world. There can also be location, address space or access level restrictions that have to be considered during migration. Possible solutions to stovepipe issues in software design include re-engineering the interfaces from scratch and replacing old object models with new, agile data modeling techniques centered around a distributed architecture. You can also look at a microservices architecture to solve problems around interoperability and consistency.
Islands of Implementation
The common symptoms of islands of implementation are incorrect use of technology standards, usability and interoperability issues, and excessive cost and time escalations as business needs change. The root cause is typically a lack of enterprise-level standards, organizational structures that lead to poor communication, or inadequately trained people deployed on projects. These islands can also appear during corporate mergers and acquisitions or because of vendor lock-in. Possible enterprise-level solutions include building proper requirement models (high-level architecture, short-term standards, releases/installation conventions, scoping of system capabilities) and specification models (enterprise architecture, interoperability specification, development profiles).
Migration From Legacy Systems
The symptoms companies experience when migrating from legacy systems include insufficient or outdated documentation, expensive requirement changes, lots of workarounds, and interoperability issues. The root causes can be a lack of architectural vision, technological disruption, tight coupling, insufficient use of metadata, or a missing abstraction layer. Component architectures, which allow software modules to be substituted flexibly as the business and technology landscape changes, can solve this issue. This can be achieved through appropriate use of a microservices architecture, a metadata management tool within a component architecture, and data dictionary tools.
Vertical design elements are those that depend on a specific software application and an individual implementation. Horizontal design elements are those that are common across applications. The "Jumble" problem occurs when developers or architects mix up these horizontal and vertical elements, and stability, reusability and scaling problems show up in the design. The Horizontal-Vertical-Metadata (HVM) pattern depicted below shows a way forward: refactor the code so that specialized functionality is added as vertical extensions, and trade some static architecture design for dynamic architecture (metadata change and resource management). The result can be well-structured, scalable and reusable software.
Vendor lock-in occurs when a software project adopts a product technology and becomes completely dependent upon the vendor's implementation. When upgrades are done, common symptoms such as forced software changes and interoperability problems occur, and continuous maintenance is required to keep the system running. Sometimes, promised features get delayed or the product varies significantly from its advertised compliance with open standards.
Typically, the root cause of vendor lock-in issues is not having an effective process for checking standards compliance, or simply not doing a technical analysis before buying the product. Possible solutions to avoid these problems or reduce their impact include building an "Isolation Layer" between the vendor software and the application software, with interoperability and consistency in mind.
Using Talend as a data integration platform can alleviate vendor lock-in problems by cutting down the effort needed to achieve portability between big data distributions such as Hortonworks, Cloudera, and MapR. It can also ease migration between AWS, Azure and Google Cloud, so that you can benefit from what each of these cloud offerings provides in cost, features, etc.
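To make the isolation layer idea concrete, here is a minimal Java sketch. All names in it (`RecordStore`, `InMemoryRecordStore`, `CustomerLoader`) are hypothetical and not part of any vendor or Talend API. The point is only that application code depends on an interface you own, so a vendor-specific adapter can be swapped without touching the callers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical isolation layer: the application depends only on this interface.
interface RecordStore {
    void save(String record);
    List<String> loadAll();
}

// Hypothetical adapter. In a real system this class would wrap the vendor's
// SDK calls; an in-memory list stands in for them here.
class InMemoryRecordStore implements RecordStore {
    private final List<String> records = new ArrayList<>();

    @Override
    public void save(String record) {
        records.add(record);
    }

    @Override
    public List<String> loadAll() {
        return new ArrayList<>(records); // defensive copy
    }
}

// Application code: it never references a vendor class directly.
class CustomerLoader {
    private final RecordStore store;

    CustomerLoader(RecordStore store) {
        this.store = store;
    }

    // Trims each customer name, saves it, and returns the number of stored records.
    int load(List<String> customers) {
        for (String customer : customers) {
            store.save(customer.trim());
        }
        return store.loadAll().size();
    }
}
```

Moving to another vendor would then mean writing one new `RecordStore` implementation rather than changing every class that reads or writes records.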
A "Wolf Ticket" is a product that claims openness and conformance to standards that have no enforceable meaning. The products are delivered with proprietary interfaces that may vary significantly from the published standard. A key problem here is that technology consumers often assume that openness comes with some benefits. Standards do reduce technology migration costs and improve technology stability, but differences in how the standards are implemented often negate their assumed benefits, such as multivendor interoperability and software portability.
To tackle this, clients should work with the vendor to get the gaps filled or build custom components for critical business requirements. Using an open source platform like Talend has its benefits here as well: for requirements that are common among clients, the roadmap might already have something planned, or at least you can build something that is easily ported across upgrades thanks to its reliance on Java.
Reinvent the Wheel
Custom software systems are built from the ground up, even though several systems with overlapping functionality exist in the market. Because top-down analysis and design lead to new architectures and custom software, software reuse is limited and interoperability is accommodated after the fact. Typical symptoms of this anti-pattern include commercial or open source software on the market that already has these features, inadequate support for change management and interoperability, extensive effort to deliver similar functionality, and a closed system architecture.
The root cause can be insufficient documentation or knowledge transfer, incorrect assumptions about future business requirements, etc. One possible resolution is to perform architecture mining to facilitate understanding, design validation, refactoring and documentation. Incorporating an OSGi framework, which promotes reuse, can help as well. Talend uses an OSGi container based on Apache Karaf for Application Integration implementations.
A Big Ball of Mud
A "big ball of mud" is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover, and code entropy.
The typical symptoms show up in maintenance and scalability. The possible resolutions given for the "Reinvent the Wheel" anti-pattern can be tried here as well.
In this anti-pattern, a database is used as the message queue for routine interprocess communication. A DBMS is a way to store information, while messages are a way to transport information, so your decision should be based on the answer to the question, "Do I need the data to persist over time, or is the data consumed by the recipient?" The consequences of this software design anti-pattern can be many, such as unnecessary reads and writes to the database and the need to build additional message monitoring tools.
The root cause can be ease of programming, a lack of technical know-how on messaging systems, short-sightedness, etc. The possible solution is to have a proper messaging system in place to manage load, consistency, and scale. Talend ESB, based on Apache Camel, can be an appropriate solution built on open source technology.
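The difference between storing and transporting can be sketched with the JDK's own `BlockingQueue`. This is an in-process stand-in for a real message broker, not a Talend ESB or Camel example; the class name `QueueSketch` and the `STOP` marker are hypothetical. It shows the property a polled database table lacks: the consumer blocks until a message arrives, and each message is handed over exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueSketch {
    // Hypothetical end-of-stream marker used only by this sketch.
    static final String STOP = "__STOP__";

    // Sends the messages through a queue to a consumer thread and
    // returns what the consumer received, in order.
    static List<String> relay(List<String> messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> received = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until a message is available: no polling
                // loop and no repeated reads against a table.
                for (String m = queue.take(); !m.equals(STOP); m = queue.take()) {
                    received.add(m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String m : messages) {
                queue.put(m);
            }
            queue.put(STOP);
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

A production setup would replace the in-memory queue with a broker that also handles durability, retries and monitoring, which is exactly the tooling one ends up rebuilding by hand on top of a database table.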
An inner-platform effect refers to the tendency of software architects to create a system so customizable as to become a replica, and often a poor replica, of the software development platform they are using.
An example is found in XML, where developers sometimes favor generic element names and use attributes to store meaningful information. For example, every element might be named item and have attributes type and value. This practice requires joins across multiple attributes in order to extract meaning. As a result, XPath expressions are more convoluted, evaluation is less efficient, and structural validation provides little benefit. We have seen similar problems when clients try to migrate from one database platform to another. The temptation to create generic jobs with incorrect designs can end up putting too much transformation logic into a few jobs, which causes issues with speed, code quality, maintenance, and reliability. Again, a careful design should leverage the built-in capabilities of the source systems instead of replicating them.
An input kludge is a type of failure in software (an anti-pattern) where simple user input is not handled properly. This may cause a buffer overflow security hole, or issues during deduplication, BI reporting, etc. It is often caused by having multiple channels of data entry, such as web forms, APIs, and mobile apps, a few of which fail to implement the necessary front-end validations. Here, the only long-term solution is to fix the problem at the source. But if the data comes from a third party or arrives through mergers and acquisitions, then solutions like Talend Data Quality, MDM, and Data Dictionary might come in handy.
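The XML case can be illustrated with a small Java sketch using the JDK's `javax.xml.xpath` API. The two documents and the class name `XPathSketch` are made up for illustration. Both queries return the same customer name, but the generic layout needs an attribute predicate plus an attribute lookup, while the specific layout needs only a path.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Generic layout: every element is "item"; meaning lives in attributes.
    static final String GENERIC =
        "<order>"
        + "<item type=\"customer\" value=\"Acme\"/>"
        + "<item type=\"total\" value=\"42\"/>"
        + "</order>";

    // Specific layout: element names carry the meaning.
    static final String SPECIFIC =
        "<order><customer>Acme</customer><total>42</total></order>";

    // Evaluates an XPath expression against an XML string and returns
    // the result as a string.
    static String eval(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Generic layout: filter on one attribute, then read another.
    static String customerFromGeneric() {
        return eval(GENERIC, "/order/item[@type='customer']/@value");
    }

    // Specific layout: a plain path is enough.
    static String customerFromSpecific() {
        return eval(SPECIFIC, "/order/customer");
    }
}
```

A schema for the specific layout can also require that `customer` and `total` are present, whereas a schema for the generic layout can only say that `item` elements exist.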
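As a rough illustration of fixing the problem at the source, the sketch below shows one shared validation routine that every entry channel could call before data is stored. The class `InputValidator`, the length limit, and the deliberately simple email pattern are all assumptions for this example, not a Talend Data Quality rule or a complete email validator.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

class InputValidator {
    // Deliberately simple pattern (an assumption of this sketch):
    // something@something.something, with no spaces and a single "@".
    private static final Pattern EMAIL =
        Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Hypothetical upper bound so oversized input is rejected early.
    static final int MAX_LENGTH = 254;

    // Returns a trimmed, lower-cased email, or empty if the input is
    // missing, too long, or does not match the pattern.
    static Optional<String> cleanEmail(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String candidate = raw.trim().toLowerCase(Locale.ROOT);
        if (candidate.isEmpty() || candidate.length() > MAX_LENGTH) {
            return Optional.empty();
        }
        return EMAIL.matcher(candidate).matches()
            ? Optional.of(candidate)
            : Optional.empty();
    }
}
```

Normalizing case and whitespace in one place also helps later deduplication, since two spellings of the same address collapse to one value.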
Interface bloat occurs when a computer interface incorporates too many operations on some data, only to find that most of the objects cannot perform the given operations. Possible symptoms include having one monolithic interface for all possible operations, as well as lots of unused parameters when calling the interface. Possible solutions include logically separating interfaces into a series of headers and libraries, so that only the ones needed have to be maintained. Backward compatibility should also be considered while refactoring.
Talend's data integration platform provides a very easy mechanism to build SOAP and RESTful APIs, and different versions of them can be managed with a continuous integration process using source control like Git.
A race condition may occur inside a critical section: when the result of multiple threads executing that section differs depending on the order in which the threads run, the critical section is said to contain a race condition. This can happen in Java code, file access, database access, etc. In Java, mechanisms such as synchronized blocks can help solve this issue. Similarly, in a clustered environment where several nodes write logs to shared files, or where multiple web applications share the same database, the environment setup must include proper design and configuration to avoid possible race conditions.
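In Java terms, separating a bloated interface could look like the sketch below. The names (`RecordReader`, `RecordWriter`, `ReadOnlyFeed`, `RecordBuffer`) are hypothetical. Instead of one interface that forces every object to declare both operations, each object implements only the small interfaces it can actually honor.

```java
import java.util.ArrayList;
import java.util.List;

// Two narrow interfaces instead of one monolithic "do everything" interface.
interface RecordReader {
    List<String> readAll();
}

interface RecordWriter {
    void write(String record);
}

// A read-only source implements only what it can actually do, so it
// needs no write() that throws "unsupported operation".
class ReadOnlyFeed implements RecordReader {
    private final List<String> records;

    ReadOnlyFeed(List<String> records) {
        this.records = new ArrayList<>(records);
    }

    @Override
    public List<String> readAll() {
        return new ArrayList<>(records);
    }
}

// A read-write store opts into both interfaces.
class RecordBuffer implements RecordReader, RecordWriter {
    private final List<String> records = new ArrayList<>();

    @Override
    public void write(String record) {
        records.add(record);
    }

    @Override
    public List<String> readAll() {
        return new ArrayList<>(records);
    }
}
```

Callers that only read can accept a `RecordReader`, so both classes stay usable through the same narrow type.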
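For the Java case, a minimal sketch of a synchronized block guarding a shared counter is shown here. The class `SafeCounter` and its helper `runConcurrently` are hypothetical names for this example. Without the `synchronized` blocks, `count++` is a read-modify-write sequence and concurrent increments can be lost.

```java
class SafeCounter {
    private final Object lock = new Object();
    private int count;

    // The critical section: only one thread at a time may run count++.
    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Starts the given number of threads, each incrementing the same
    // counter perThread times, and returns the final value.
    static int runConcurrently(int threads, int perThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With the lock in place, the final value always equals `threads * perThread`, regardless of how the threads are scheduled.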
The few anti-patterns described above are only a subset of the many design anomalies that an architect or developer may come across in an IT environment. As you can see, teams make software design decisions based on many assumptions, so it's always good practice to have those assumptions well documented and maintained as a living document throughout the life cycle of every system.
I want to conclude by saying that even with a solution like Talend, which improves ease of development, time to market, and portability, nothing can prevent these aberrations from creeping back into a system if design practices are not followed consistently by all teams. Until next time!