Database Systems
This data-forward, analytics-driven world would be lost without its database and data storage solutions. As more organizations continue to transition their software to cloud-based systems, the growing demand for database innovation and enhancements has climbed to new heights. We have entered a new era of the "Modern Database," where databases must both store data and ensure that data is prepped and primed securely for insights and analytics, integrity and quality, and microservices and cloud-based architectures.

In our 2023 Database Systems Trend Report, we explore these database trends, assess current strategies and challenges, and provide forward-looking assessments of the database technologies most commonly used today. Further, readers will find insightful articles — written by several of our very own DZone Community experts — that cover hand-selected topics, including what "good" database design is, database monitoring and observability, and how to navigate the realm of cloud databases.
In the fast-paced world of software development, projects need agility to respond quickly to market changes, which is only possible when organizations and project management improve efficiency, reduce waste, and deliver value to their customers as fast as possible. One methodology that has become very popular in this digital era is Agile. Agile strives to reduce effort while delivering high-quality features and value in each build. Within the Agile spectrum, there exists a concept known as "Pure Agile Methodology," often referred to simply as "Pure Agile," which is a refined and uncompromising approach to Agile project management. It adheres strictly to the core values of the Agile Manifesto: favoring individuals and interactions over processes and tools, working solutions over comprehensive documentation, customer collaboration over contract negotiation, and responding to change over following a plan. Though Agile is used worldwide for most software projects, the way it is implemented is not always pure Agile. When the implementation follows the Manifesto seamlessly, we can recognize Pure Agile, which is why it is also known as "Agile in its truest form."

Within the Agile framework, Agile testing plays a pivotal role in ensuring that software products are not only developed faster but also meet high quality standards. Agile testing is a new-age approach to software testing that keeps pace with the agile software development process. It is an iterative and incremental approach that applies the principles of agile software development to the practice of testing. It goes beyond traditional testing methods, becoming a collaborative and continuous effort throughout the project lifecycle. Agile testing is a collaborative, team-oriented process. Unlike traditional software testing, Agile testing tests the system in small increments, often developing tests before writing the code or feature. Here are the ways it differs from traditional testing:

Early involvement: Agile testing applies a "test-first" approach. Testers are involved in the project from the beginning — requirements discussions, user story creation, and sprint planning. This ensures that testing considerations are taken into account from the outset.
Integration: In Agile testing, testing activities are performed simultaneously with development rather than in a separate testing phase. The biggest advantage of Agile testing is that defects are detected and addressed at an early stage, which reduces cost, time, and effort.
User-centric: Agile testing gives the highest importance to customer feedback, and the testing effort is aligned with the feedback given by the customer.
Feedback-driven: Agile testing relies on continuous feedback. This ongoing feedback and communication ensures that everyone is aligned on project goals and quality standards.
TDD: Test-driven development is common practice in Agile, where tests are prepared before the code is written to ensure that the code meets the acceptance criteria. This promotes a "test-first" mindset among developers.
Regression testing: As the product evolves with each iteration, regression testing becomes critical. New functionality or feature changes shouldn't introduce regressions that break existing functionality.
Minimal documentation: Agile testing often relies on lightweight documentation, focusing more on working software than on extensive test plans and reports. Test cases may be captured as code or in simple, accessible formats.
Collaboration: All Agile teams are cross-functional, with all the groups of people and skills needed to deliver value across traditional organizational silos, largely eliminating handoffs and delays.

The term "Agile testing quadrants" refers to a concept introduced by Brian Marick — a software testing expert, Extreme Programming (XP) proponent, and Agile Manifesto co-author who helped pioneer agile testing — to help teams and testers think systematically about the different types of testing they need to perform within an Agile development environment. At scale, many types of tests are required to ensure quality: tests for code, interfaces, security, stories, larger workflows, and so on. The quadrants describe a matrix — four quadrants defined across two axes — that guides the reasoning behind these tests.

Agile Testing: Quadrants
Q1 – Contains unit and component tests, built using test-driven development (TDD); a minimal test-first sketch follows the quadrants below.
Q2 – Feature-level and capability-level acceptance tests that confirm the aggregate behavior of user stories. The team automates these tests using BDD techniques.
Q3 – Contains exploratory tests, user acceptance tests, scenario-based tests, and final usability tests. These tests are often manual.
Q4 – Tests that verify whether the system meets its non-functional requirements (NFRs), such as load and performance testing.
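To make the "test-first" idea behind Q1 and TDD concrete, here is a minimal pytest sketch. It is illustrative only — the function, file, and test names are hypothetical and not taken from any specific project; in a real codebase the tests would live in their own file and would be written (and fail) before the production code exists.

Python
# tdd_sketch.py — a single-file illustration of the test-first flow.
# In practice the tests below are written first and fail ("red"),
# and only then is apply_discount implemented to make them pass ("green").
import pytest


def apply_discount(price: float, percent: float) -> float:
    """Production code written only after the tests below were red."""
    if percent < 0:
        raise ValueError("discount percent cannot be negative")
    return price * (1 - percent / 100)


def test_discount_is_applied():
    # Given a price, when a 10% discount is applied, then the total drops accordingly
    assert apply_discount(price=100.0, percent=10) == pytest.approx(90.0)


def test_negative_discount_is_rejected():
    # The acceptance criterion is pinned down before the implementation exists
    with pytest.raises(ValueError):
        apply_discount(price=100.0, percent=-5)

Run with pytest; the same structure scales up to the automated acceptance tests in Q2 when driven through BDD-style scenarios.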
I have been using a new open-source platform, API Logic Server (available on GitHub), to deliver API microservices for a client. I wanted to build a complete mobile banking API from start to finish based on the old TPC benchmark. This includes declarative business logic (a.k.a. spreadsheet-like rules), security, a react-admin UI, and OpenAPI (Swagger) documentation. API Logic Server (ALS) creates executable projects that you can extend in your IDE. It is an open-source Python platform based on SQLAlchemy 2.0, Flask, safrs JSON:API, react-admin, and LogicBank (a declarative, spreadsheet-like rules engine).

ChatGPT-SQL Model
I started by asking ChatGPT to "generate a banking DDL based on the old TPC benchmark for MySQL". Here is the DDL that was generated: While ChatGPT gave me a usable DDL, I then asked ChatGPT to "add columns for deposits, withdrawals, and image to Transaction" to allow the declarative rules to do the heavy lifting.

Transaction:
Deposit DECIMAL(15,2)
Withdrawal DECIMAL(15,2)
Image (checks and withdrawal slips) TEXT

Authentication:
Customer UserName VARCHAR(64)
Customer Password (hash) VARCHAR(64)

API Logic Server (ALS)
This is a full-featured open-source Python platform (like Django) to create a complete runtime API and a react-admin back-office UI solution. The command line feature of ALS made the creation of a running server with a multi-page react-admin UI and an OpenAPI a snap. The command line feature will read the SQL "banking" schema and create a customizable project with all the wiring needed for SQLAlchemy and a full RESTful JSON:API. ALS uses a command line approach to connect to the database and create all the running components:

Shell
ApiLogicServer create --project_name=tpc --db_url=mysql+pymysql://root:p@localhost:3306/banking

This creates a project you can open in your IDE to run and customize.

Rules
Declarative rules are spreadsheet-like expressions that automate backend multi-table derivations and constraints. Rules automatically execute as part of your API, making it a service. They dramatically reduce the amount of code you'd expect to write manually. Rules are entered in your IDE. They are extensible with Python and are debugged with the IDE debugger. Rules provide automatic re-use across our various TPC use cases — handling deposits and withdrawals, maintaining account balances, preventing overdrafts, and processing balance transfers. The magic is LogicBank (an open-source, spreadsheet-like rules engine on GitHub) that handles the ordering and execution at runtime and integrates directly with SQLAlchemy. We start the process by writing our logic in a business user-friendly way:

Derive: Account balance is the sum of Transaction.TotalAmount
Constraint: Account.AcctBalance cannot be less than zero
Constraint: Transaction.Deposit and Transaction.Withdrawal must be greater than zero
Formula: Transaction.TotalAmount is Deposit less Withdrawal
Constraint: Customers can only transfer between their own accounts

Adding Rules
I am using VSCode. The command-line generated code is broken up into folders like database, api, logic, security, devops, etc. Under logic/declare_logic.py, we convert our design rules into actual declarative rules. Code completion makes this a breeze. Python plus your IDE provides a domain-specific language for business logic. Observe that rules are simply a formalization of our design above — an executable design.
Python
Rule.sum(derive=models.Account.AcctBalance, as_sum_of=models.Transaction.TotalAmount)
Rule.constraint(validate=models.Account,
                as_condition=lambda row: row.AcctBalance >= 0,
                error_msg="Account balance {row.AcctBalance} cannot be less than zero")
Rule.formula(derive=models.Transaction.TotalAmount,
             as_expression=lambda row: row.Deposit - row.Withdrawal)

Automated React-Admin UI
ALS created a react-admin back-office multi-table application for all the tables in the model. This allowed me to add a customer, checking and savings accounts, and sample deposit transactions, test the rules (e.g., sums, constraints, formulas), and transfer funds.

React-Admin Back Office UI

OpenAPI (Swagger)
ALS will also generate OpenAPI (Swagger) documentation. This uses the safrs JSON:API, which enables clients to specify the child tables and columns to return (a self-service API, much like GraphQL). This API allows the front-end developer to show the customer information, all their accounts, and a list of transactions (deposits and withdrawals) in a single API request. Another nice feature is that each row returns a checksum, which is used to support optimistic locking.

Open API (Swagger)

Transfer Funds
The heart of the TPC benchmark was moving funds between two accounts in a single transaction. In this example, we let the rules do the formulas, derivations, validations, and constraints, but we need an API to POST the JSON. One approach is to use api/custom_api.py to build a Python class that transfers funds from one account to another. Another approach is to ask ChatGPT to "generate the transfer funds SQLAlchemy code" — so I added a new Transfer table and a commit event rule to do the same work. Rules automatically adjust the firing order (formulas, sums, validations, and then the commit row event). The code below is entered in your IDE, providing code completion, debugging, etc.

Python
def fn_transfer_funds(row: models.Transfer, old_row: models.Transfer, logic_row: LogicRow):
    if logic_row.isInsert():
        fromAcctId = row.FromAccountID
        toAcctId = row.ToAccountID
        amount = row.Amount

        from_trans = models.Transaction()
        from_trans.TransactionID = nextTransId()
        from_trans.AccountID = fromAcctId
        from_trans.Withdrawal = amount
        from_trans.TransactionType = "Transfer From"
        from_trans.TransactionDate = date.today()
        session.add(from_trans)

        to_trans = models.Transaction()
        to_trans.TransactionID = nextTransId()
        to_trans.AccountID = toAcctId
        to_trans.Deposit = amount
        to_trans.TransactionType = "Transfer To"
        to_trans.TransactionDate = date.today()
        session.add(to_trans)

        print("Funds transferred successfully!")

Rule.commit_row_event(on_class=models.Transfer, calling=fn_transfer_funds)

Security (Authentication/Authorization)
Since this is a multi-tenant model, the declarative security model needs roles and filters that authorize different role players for specific CRUD tasks. The role-based access control requires a separate data model for login, roles, and user roles. We also need an authentication process to validate users logging in to the mobile banking system. ALS asks that you initialize the security model using the command line tool (ApiLogicServer add-auth), which creates all the necessary components.
Python
DefaultRolePermission(to_role=Roles.customer, can_read=True, can_update=True, can_insert=True, can_delete=False)

Grant(on_entity=models.Customer,
      to_role=Roles.customer,
      can_delete=False,
      filter=lambda: models.Customer.CustomerID == Security.current_user().CustomerID)

Docker Container
The devops/docker folder has shell scripts to build and deploy a running Docker image that can be deployed to the cloud in a few clicks. Just modify the docker-compose properties for your database and security settings.

Summary
I was impressed with API Logic Server's ability to create all the running API components from a single command line request. Using ChatGPT to get started, and even to iterate over the dev lifecycle, was seamless. The front-end developers can begin writing the login (auth) and use the API calls from the OpenAPI (Swagger) documentation while the final logic and security are being instrumented. Business users can run the screens and collaborate to ensure the real requirements are identified and met. While I am new to the Python language, this felt more like a DSL (domain-specific language) with code completion and a well-organized code space. The ALS documentation provides great help and tutorials to understand how to deliver your own logic and API. The GitHub source can be found here.
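To give a feel for the self-service API described in the OpenAPI (Swagger) section above, here is a hedged sketch of a client call against the generated JSON:API. The host, port, resource name, relationship names, and query parameters below are assumptions based on standard JSON:API conventions rather than values taken from the generated project — check the generated Swagger page for the actual names.

Python
# Hypothetical client call against the generated JSON:API endpoint.
# BASE_URL, the resource name, and the relationship names are assumptions.
import requests

BASE_URL = "http://localhost:5656/api"  # assumed local API Logic Server address

response = requests.get(
    f"{BASE_URL}/Customer/1",
    params={
        # JSON:API-style self-service parameters: fetch child rows and
        # restrict the returned columns in a single request.
        "include": "AccountList,AccountList.TransactionList",
        "fields[Customer]": "FirstName,LastName",
    },
    headers={"Accept": "application/vnd.api+json"},
)
response.raise_for_status()
print(response.json()["data"])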
Writing clean, understandable, easy-to-support, and maintainable code is hard and requires many years of experience. At least, we're used to thinking this way. What if there were a way to write such code consciously and without spending years and years developing these skills?

Functions, Functions Everywhere…
If we look into modern code, we'll see methods and functions. A lot of them. They are the bread and butter of modern software, the basic building block. The quality of our functions almost completely defines the quality of our software. The problem is that nobody tells us how to write functions and methods. Of course, we can see tons and tons of articles and books that tell us "do that" and "don't do this" while writing functions. But how exactly should we do that and not do this? Why should or shouldn't we do this and that? There were no answers to those questions. (Note that the discussion below uses the terms "function" and "method" interchangeably because, for this discussion, the difference between methods and functions is irrelevant.)

Less Art, More Engineering
Writing code consciously means that we clearly understand how to write it and, more importantly, why it should be written in a specific way. Doing something consciously is possible only if we clearly understand the internals. With our current coding practices, functions are considered a black box — an atomic, indivisible element — and the question "what is inside the function" is carefully ignored. Let's break this tradition and define that a function has a specific structure. Interestingly, the idea of defining the function structure is not new; it just was not applied to regular code. Instead, writing tests using the "Given/When/Then" template is quite standard.

Standard Function Structure
Before providing a more formal definition, I'd like to walk through a quite typical piece of traditional imperative Java code shown below:

Java
public Comment.Id addComment(User.Id userId, Publication.Id publicationId, String commentText) {
    if (userId == null || userService.find(userId) == null) {
        throw new UnknownUserException();
    }

    if (publicationId == null || publicationService.find(publicationId) == null) {
        throw new UnknownPublicationException();
    }

    if (commentText == null) {
        throw new InvalidComment();
    }

    var newComment = Comment.newComment(userId, publicationId, commentText);

    var commentId = commentService.addComment(newComment);

    return commentId;
}

The part of the function between lines 2 and 12 performs routine parameter checks typical for methods/functions that deal with raw/unchecked input. Then, the part of the function at line 14 prepares intermediate data. Finally, the part of the function at line 16 performs the essential actions, i.e., does the things that are declared by the method/function name. There is another, less obvious, but no less essential part spread across the whole function body: lines 3, 7, 11, and 18, which return an error or the actual calculated value.

Let's call these parts "phases" and give them names according to what they implement inside this function (this is a crucial moment; I'll return to it shortly). In total, we have 3+ phases:

The first phase is Validation — it is responsible for checking function arguments. It also defines the function contract (in math, we would say that it defines the function domain).
The second phase is Consolidation — it is responsible for preparing necessary intermediate data, i.e., creating new objects, calculating or retrieving necessary data from external sources, etc.
This phase uses validated function parameters. For convenience, let's call the prepared/retrieved/calculated data and the validated function parameters Data Dependencies.
The third phase is Action — it is responsible for performing the things for which the function was created in the first place.
The last (3+) phase is Reaction — its purpose is to adapt the value(s) or knowledge that exists inside the function to the contract. This phase is usually spread across the function body and usually has two forms — one for the successful response and one for error reporting. For this reason, I'm somewhat reluctant to call it a full-fledged phase, hence the "+" in the number of phases above.

With these names in mind, we are almost ready to write a more formal definition of the function structure. The last necessary element is the understanding that not every function contains all phases. So, the Function Structure consists of:

Zero or one Validation phase, followed by
Zero or one Consolidation phase, followed by
Zero or one Action phase
Zero or more Reaction phases intermixed with the phases mentioned above

Finally, let's return to the note above: the responsibilities of Validation and Consolidation are defined relative to the Action phase. That is, we have the function named "addComment()", but the code in Validation and Consolidation does not add any comments. Instead, it validates parameters and collects data dependencies. If we move the code from Validation into a dedicated function named (for example) "validateAddCommentParameters()", then the same code becomes the Action, because it performs the actions for which that function was created. The same happens if we move the code from the Consolidation phase to a dedicated method with an appropriate name.

Analyzing Function Structure
One immediate result of splitting the function into phases is that it becomes much more transparent for analysis and code reviews. Each phase has a clearly defined purpose, and the phases go in a defined order. Even just writing/refactoring code with the structure described above in mind makes code better structured and easier to understand. An interesting observation: since each phase has a dedicated responsibility, a function that has the Validation and/or Consolidation phases breaks the single responsibility principle! What is interesting here is not the fact that we've discovered a code smell. Most seasoned Java developers would say that the function is somewhat long, but most of them would not be able to answer what exactly is wrong with the code (me too, BTW). By introducing structure, we've made the issue easy to spot even for a junior developer. So, if the presence of these phases is an issue, how can we solve it? Let's remember that each phase is defined relative to the Action phase.
Hence, by extracting Validation and Consolidation into dedicated functions, we can avoid mixing different responsibilities inside one function:

Java
public Comment.Id addComment(User.Id userId, Publication.Id publicationId, String commentText) {
    validateAddCommentParameters(userId, publicationId, commentText);

    return commentService.addComment(makeComment(userId, publicationId, commentText));
}

private static Comment makeComment(User.Id userId, Publication.Id publicationId, String commentText) {
    return Comment.newComment(userId, publicationId, commentText);
}

private void validateAddCommentParameters(User.Id userId, Publication.Id publicationId, String commentText) {
    if (userId == null || userService.find(userId) == null) {
        throw new UnknownUserException();
    }
    if (publicationId == null || publicationService.find(publicationId) == null) {
        throw new UnknownPublicationException();
    }
    if (commentText == null) {
        throw new InvalidComment();
    }
}

Notice that once Validation and Consolidation become dedicated methods, they turn into regular steps of the Action phase. This is the consequence of the relativity of the definition of phase responsibility. The refactoring is quite straightforward, but it fundamentally changes the properties of the code:

All three functions now consist of the Action phase only (plus Reaction, of course)
Each function is focused on its own task, with no more distraction from the main function purpose
Each function describes, step by step, what it does; this simplifies understanding of the code and its further modification, support, and maintenance

Observing Abstraction Layering
As mentioned here, strict layering of abstraction is essential, so it is worth applying this requirement to the code as well. Although this is a "requirement," it is in fact a convenient tool that enables a more in-depth understanding of our code and helps find design issues. Applying this requirement to the Consolidation stage reveals an interesting property: each data dependency is independent of the others. If this is not the case, then most likely we have lower-level abstraction details (dependencies) leaking to the upper level. For example:

Java
...
var comment = commentService.find(commentId);
var commentStats = commentStatsService.find(comment.statsId());
...

It's quite obvious that the internals of the comment storage are leaking to the upper level here. However, the independence of data dependencies is useful not only for detecting design issues. It allows natural, effortless parallelism if the code is written in a functional style (we'll take a look at this property below). Another typical design issue manifests itself as "continuous Consolidation":

Java
...
var value1 = service1.find(...);
...
var value2 = service2.find(value1.field1());
...
var value3 = service3.find(value2.field2());
...

It's not so different from the issue above, but it is usually observed at the edge between Consolidation and Action. This issue makes it difficult to draw a boundary between phases and exposes a hidden design issue — mixing different layers of abstraction.

Writing New Code
Although I've started from existing code and then refactored it, function structuring paves the way for conveniently writing new code as well.
Again, nothing radically new, just a "divide and conquer" top-down strategy:

Write each function as a sequence of steps
Split functionality as much as necessary until you can implement each step with a call to an existing function/method, or implement it using a language construct within a single level of nesting

Of course, this is not a strict rule; there are always different cases and different requirements.

Switch to Functional Code
The code above is typical imperative code, with all the issues specific to such code, including the main one — loss of context. The code above could be written in a functional style, which is much better at preserving context. A direct rewrite of the example above (using the core part of the Pragmatic library) results in the following code:

Java
public Result<Comment.Id> addComment(Option<User.Id> userId, Option<Publication.Id> publicationId, Option<String> commentText) {
    return all(
        userId.toResult(Errors.UNKNOWN_USER).flatMap(userService::find),
        publicationId.toResult(Errors.UNKNOWN_PUBLICATION).flatMap(publicationService::find),
        commentText.toResult(Errors.INVALID_COMMENT)
    ).map((user, publication, comment) -> Comment.newComment(user.id(), publication.id(), comment))
     .flatMap(commentService::addComment);
}

Perhaps not ideal, although the lack of the typical null-checking noise makes the code much more concise. Obviously, the direct rewrite didn't change the structure of the function, so it suffers from the same issue as the imperative version — mixed phases (and responsibilities). A simple refactoring addresses this issue:

Java
public Result<Comment.Id> addComment(Option<User.Id> userId, Option<Publication.Id> publicationId, Option<String> commentText) {
    return validateAndLoad(userId, publicationId, commentText)
        .map(SimpleCallFPRefactored::makeComment)
        .flatMap(commentService::addComment);
}

private static Comment makeComment(User user, Publication publication, String comment) {
    return Comment.newComment(user.id(), publication.id(), comment);
}

private Mapper3<User, Publication, String> validateAndLoad(Option<User.Id> userId, Option<Publication.Id> publicationId, Option<String> commentText) {
    return all(
        userId.toResult(Errors.UNKNOWN_USER).flatMap(userService::find),
        publicationId.toResult(Errors.UNKNOWN_PUBLICATION).flatMap(publicationService::find),
        commentText.toResult(Errors.INVALID_COMMENT)
    );
}

The refactored version remains concise enough, but now it's much cleaner. A few important observations about the functional version:

It preserves much more context — it is clear from the method signature that it accepts potentially missing values and may return an error
There is basically no way to accidentally omit checking the input; the resulting code just does not compile
The functional version explicitly relies on the independence of the data dependencies

The last point is essential because it exposes the inherent parallelism in the code, i.e., the parts that can naturally be done in parallel.
With minimal changes, the functional version can be made asynchronous:

Java
public Promise<Comment.Id> addComment(Option<User.Id> userId, Option<Publication.Id> publicationId, Option<String> commentText) {
    return validateAndLoad(userId, publicationId, commentText)
        .map(this::makeComment)
        .flatMap(commentService::addComment);
}

private Comment makeComment(User user, Publication publication, String comment) {
    return Comment.newComment(user.id(), publication.id(), comment);
}

private Mapper3<User, Publication, String> validateAndLoad(Option<User.Id> userId, Option<Publication.Id> publicationId, Option<String> commentText) {
    return all(
        resolved(userId.toResult(Errors.UNKNOWN_USER)).flatMap(userService::find),
        resolved(publicationId.toResult(Errors.UNKNOWN_PUBLICATION)).flatMap(publicationService::find),
        resolved(commentText.toResult(Errors.INVALID_COMMENT))
    );
}

It is worth emphasizing that this does not just perform the processing asynchronously; it performs the two validation steps in parallel. This transformation required very little effort and preserved the code's clarity and maintainability.

Conclusion
Don't take the considerations above as scripture. My goal is to show how powerful the introduction of structure into the function can be. You can introduce your own structure and rules that better fit your projects and your requirements. It's hard to overestimate the value of writing code consciously, with a clear understanding of how to write it and, more importantly, why. Function structuring enables us to achieve this. Although the example code above uses Java, function structuring is applicable to the majority of languages that enable users to write functions and/or methods.
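Since the article closes by noting that function structuring applies to most languages, here is a minimal Python sketch of the same Validation/Consolidation/Action split. It is my own illustration, not code from the article; the service objects and exception types are hypothetical stand-ins.

Python
# A minimal sketch of the same phase split in Python; services and exceptions
# are hypothetical placeholders.
class UnknownUserError(Exception):
    pass


class InvalidCommentError(Exception):
    pass


def add_comment(user_service, comment_service, user_id, publication_id, text):
    # Action phase only: every step is a call to a function with a single purpose.
    _validate_add_comment_args(user_service, user_id, text)
    comment = _make_comment(user_id, publication_id, text)  # Consolidation, extracted
    return comment_service.add_comment(comment)


def _validate_add_comment_args(user_service, user_id, text):
    # Validation phase, extracted into its own function.
    if user_id is None or user_service.find(user_id) is None:
        raise UnknownUserError(user_id)
    if text is None:
        raise InvalidCommentError()


def _make_comment(user_id, publication_id, text):
    # Consolidation phase: prepare the data dependency the Action needs.
    return {"user_id": user_id, "publication_id": publication_id, "text": text}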
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report

Originally, the term "data pipeline" was focused primarily on the movement of data from one point to another — a technical mechanism to ensure data flows from transactional databases to destinations such as data warehouses, or to aggregate this data for analysis. Fast forward to the present day: data pipelines are no longer seen as IT operations but as a core component of a business's transformation model. New cloud-based data orchestrators are an example of this evolution; they allow data pipelines to integrate seamlessly with business processes, and they have made it easier for businesses to set up, monitor, and scale their data operations. At the same time, data repositories are evolving to support both operational and analytical workloads on the same engine.

Figure 1: Retail business replenishment process

Consider a replenishment process for a retail company, a mission-critical business process, in Figure 1. The figure is a clear example of where the new data pipeline approach is having a transformational impact on the business. Companies are evolving from corporate management applications to new data pipelines that include artificial intelligence (AI) capabilities to create greater business impact. Figure 2 demonstrates an example of this.

Figure 2: Retail replenishment data pipeline

We no longer see a data process based on data movement; rather, we see a business process that includes machine learning (ML) models or integration with distribution systems. It will be exciting to see how data pipelines evolve with the emergence of the new generative AI.

Data Pipeline Patterns
All data pipeline patterns are composed of the following stages, although each one has a workflow and use cases that make them different:

Extract – Retrieve data from the source system without modifying it. Data can be extracted from several sources such as databases, files, APIs, streams, and more.
Transform – Convert the extracted data into final structures that are designed for analysis or reporting. The transformed data is stored in an intermediate staging area.
Load – Load the transformed data into the final target database.

Currently, after the evolution of data pipelines, these activities are known as data ingestion, and they determine each pattern, as we will see below. Here are some additional activities and components that are now part and parcel of modern data pipelines (a minimal sketch follows this list):

Data cleaning is a crucial step in the data pipeline process that involves identifying and correcting inconsistencies and inaccuracies in datasets, such as removing duplicate records or handling missing values.
Data validation ensures the data being collected or processed is accurate, reliable, and meets the specified criteria or business rules. This includes checking whether the data is of the correct type, falls within a specified range, and that all required data is present and not missing.
Data enrichment improves the quality, depth, and value of the dataset by adding relevant supplementary information, drawn from external sources, that was not originally present.
Machine learning can help enhance various stages of the pipeline, from data collection to data cleaning, transformation, and analysis, making it more efficient and effective.
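As a minimal illustration of these stages, the sketch below strings together extract, clean, validate, transform, and load steps for a small batch of CSV rows. The file name, column names, and the in-memory "warehouse" are hypothetical placeholders, not a specific product's API.

Python
# Illustrative batch pipeline; the source file, columns, and destination
# are hypothetical placeholders.
import csv
from typing import Iterable


def extract(path: str) -> Iterable[dict]:
    # Extract: read source rows without modifying them.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def clean(rows: Iterable[dict]) -> Iterable[dict]:
    # Data cleaning: drop duplicate records by key.
    seen = set()
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            yield row


def validate(rows: Iterable[dict]) -> Iterable[dict]:
    # Data validation: keep only rows that satisfy a simple business rule.
    for row in rows:
        if row.get("amount") and float(row["amount"]) >= 0:
            yield row


def transform(rows: Iterable[dict]) -> Iterable[dict]:
    # Transform: reshape rows into the structure the target expects.
    for row in rows:
        yield {"order_id": row["order_id"], "amount_cents": int(float(row["amount"]) * 100)}


def load(rows: Iterable[dict], destination: list) -> None:
    # Load: stand-in for a warehouse or table write.
    destination.extend(rows)


if __name__ == "__main__":
    warehouse: list = []
    load(transform(validate(clean(extract("orders.csv")))), warehouse)
    print(f"loaded {len(warehouse)} rows")

In an ETL pipeline, the transform step runs in a staging layer before the load; in ELT, the same logic would run as SQL inside the warehouse after a raw load — the patterns described next.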
Extract, Transform, Load
Extract, transform, load (ETL) is a fundamental process pattern in data warehousing that involves moving data from the source systems to a centralized repository, usually a data warehouse. In ETL, all the load related to the transformation and storage of the raw data is executed in a layer before the target system.

Figure 3: ETL data pipeline

The workflow is as follows:

Data is extracted from the source system.
Data is transformed into the desired format in an intermediate staging area.
Transformed data is loaded into the data warehouse.

When to use this pattern:

Target system performance – If the target database, usually a data warehouse, has limited resources and poor scalability, we want to minimize the impact on its performance.
Target system capacity – When the target system has limited storage capacity or the price per GB is very high, we are interested in transforming and storing the raw data in a cheaper layer.
Pre-defined structure – When the structure of the target system is already defined.

ETL Use Case
This example is the classic ETL for an on-premises system where, because of the data warehouse's computational and storage capacity, we are neither interested in storing the raw data nor in executing the transformation in the data warehouse itself. It is more economical and efficient when the on-premises solution is not highly scalable, or is scalable only at a very high cost.

Figure 4: ETL sales insights data pipeline

Extract, Load, Transform
Modern cloud-based data warehouses and data lakes, such as Snowflake or BigQuery, are highly scalable and optimized for in-house processing, which allows them to handle large-scale transformations more efficiently in terms of performance and more cheaply in terms of cost. In extract, load, transform (ELT), the data retrieved from the source systems is loaded directly, without transformation, into a raw layer of the target system. The transformations are then performed inside the target system. This pattern is probably the most widely used in modern data stack architectures.

Figure 5: ELT data pipeline

The workflow is as follows:

Data is extracted from the source system.
Data is loaded directly into the data warehouse.
Transformation occurs within the data warehouse itself.

When to use this pattern:

Cloud-based modern warehouse – Modern data warehouses are optimized for in-house processing and can handle large-scale transformations efficiently.
Data volume and velocity – When handling large amounts of data or near real-time processing.

ELT Use Case
New cloud-based data warehouse and data lake solutions are high-performing and highly scalable. In these cases, the data repositories are better suited to the transformation work than external processes. The transformation process can take advantage of new features and run data transformation queries inside the data warehouse faster and at a lower cost.

Figure 6: ELT sales insights data pipeline

Reverse ETL
Reverse ETL is a new data pattern that has grown significantly in recent years and has become fundamental for businesses. It is composed of the same stages as a traditional ETL, but functionally, it does just the opposite: it takes data from the data warehouse or data lake and loads it into the operational system. Nowadays, analytical solutions generate information with differential value for businesses, and do so in a very agile manner. Bringing this information back into operational systems makes it actionable across other parts of the business in a more efficient and probably higher-impact way.
Figure 7: Reverse ETL data pipeline

The workflow is as follows:

Data is extracted from the data warehouse.
Data is transformed into the desired format in an intermediate staging area.
Data is loaded directly into the operational system.

When to use this pattern:

Operational use of analytical data – to send insights back to operational systems to drive business processes
Near real-time business decisions – to send insights back to systems that can trigger near real-time decisions

Reverse ETL Use Case
One of the most important things in e-commerce is being able to predict which items your customers are interested in; this type of analysis requires different sources of information, both historical and real-time. The data warehouse contains historical and real-time data on customer behavior, transactions, website interactions, marketing campaigns, and customer support interactions. The reverse ETL process enables e-commerce companies to operationalize the insights gained from data analysis and take targeted actions to enhance the shopping experience and increase sales.

Figure 8: Reverse ETL customer insights data pipeline

The Rise of Real-Time Data Processing
As businesses become more data-driven, there's an increasing need to have actionable information as quickly as possible. This need has driven the transition from batch processing to real-time processing, which allows data to be processed immediately as it arrives. The advent of new technological tools and platforms capable of handling real-time data processing, such as Apache Kafka, Apache Pulsar, and Apache Flink, has made it possible to build real-time data pipelines. Real-time analytics has become crucial for scenarios like fraud detection, IoT, edge computing, recommendation engines, and monitoring systems. Combined with AI, it allows businesses to make automatic, on-the-fly decisions.

The Integration of AI and Advanced Analytics
Advancements in AI, particularly in areas like generative AI and large language models (LLMs), will transform data pipeline capabilities such as enrichment, data quality, data cleansing, anomaly detection, and transformation automation. Data pipelines will evolve exponentially in the coming years, becoming a fundamental part of the digital transformation, and companies that know how to take advantage of these capabilities will undoubtedly be in a much better position. Some of the activities where generative AI will be fundamental and will change the value of, and way of working with, data pipelines include:

Data cleaning and transformation
Anomaly detection
Enhanced data privacy
Real-time processing

AI and Advanced Analytics Use Case
E-commerce platforms receive many support requests from customers every day, including a wide variety of questions and responses written in different conversational styles. Increasing the efficiency of the chatbot is important for improving the customer experience, and many companies decide to implement a chatbot using a GPT model. In this case, we need to provide all the information from questions, answers, and technical product documentation that is available in different formats. The new generative AI and LLM models not only allow us to provide innovative solutions for interacting with humans, but also increase the capabilities of our data pipelines for tasks such as data cleaning or transcription. Training the GPT model requires clean and preprocessed text data.
This involves removing any personally identifiable information, correcting spelling and grammar mistakes, and removing any irrelevant information. A GPT model can be trained to perform these tasks automatically.

Figure 9: ETL data pipeline with AI for chatbot content ingestion

Conclusion
Data pipelines have evolved a lot in the last few years, initially with the advent of streaming platforms and later with the explosion of the cloud and new data solutions. This evolution means that every day they have a greater impact on business value, moving from a data movement solution to a key element of business transformation. The explosive growth of generative AI solutions in the last year has opened up an exciting path, as they have a significant impact on all stages of data pipelines; therefore, the near future is undoubtedly linked to AI. Such a disruptive evolution requires organizations, teams, and engineers to adapt so they can use the full potential of the technology. The data engineer role must evolve to acquire more business and machine learning skills. This is a new digital transformation, and perhaps the most exciting and complex movement in recent decades.
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report

Data management is an ever-changing landscape, but throughout its history, a few use cases have driven most of the value and hence the majority of innovation. The following is a list of the key features enabled by effective data management:

Informed decision-making
Regulatory compliance
Improved efficiency
Data quality and security
Competitive advantage

As data volume within organizations has scaled ever larger, the underlying technologies have had to evolve and adapt to keep up with the ever-increasing demand imposed by such growth. Traditionally, the majority of data was consolidated into a centrally managed platform known as a data warehouse. However, over the last decade, new technologies and data strategies have emerged in an attempt to provide more cost-effective solutions. Two new paradigms have emerged as alternatives to the traditional data warehouse stack: the data lake and the data lakehouse. This article will outline what each of these data management strategies entails and how they map to various selection criteria such as cost, data volume, data integration, security and compliance, ease of use, and a number of other pivotal requirements.

Data Warehouse vs. Data Lake vs. Data Lakehouse
Data warehouses played a crucial role in data-driven organizations for years, supporting business intelligence and historical data analysis. However, as data volumes grew, their integrated storage couldn't scale cost-effectively. This led to the emergence of data lakes, shifting the focus to scalable object storage over highly optimized solutions. Data lakes enabled storing vast amounts of data, including unstructured or semi-structured data. However, ingestion efficiency and integration with traditional tools posed challenges. In 2019, the term "data lakehouse" was introduced to bridge the gap between data warehouses and data lakes. The goal is a unified platform for structured and unstructured data, fostering collaboration among data professionals.
The table below summarizes the main decision points and how each architecture addresses (or doesn't address) each item:

Data Management Architecture Feature Comparison

| Criteria | Data Warehouse | Data Lake | Data Lakehouse |
|---|---|---|---|
| Data type support | Primarily structured | Diverse (structured, semi-structured, unstructured) | Diverse (structured, semi-structured, unstructured) |
| Schema enforcement | Enforced schema | Schema-on-read | Structured and flexible |
| Data processing | High-performance SQL | Flexibility for exploration, ad hoc analysis | Both high-performance SQL and exploration |
| Data integration | Structured ETL | Supports batch and real-time ingestion | Supports batch and real-time ingestion |
| Data storage | Structured, columnar | Raw and native format | Raw and structured format |
| Data quality and governance | Strong governance | Requires careful management | Supports governance with flexibility |
| Use cases | Structured analytics, complex reporting | Data exploration, machine learning, raw data processing | Combines structured analytics and data exploration |
| Query performance | High-speed, low latency | Varied, depending on tools and tuning | High-performance with flexibility |
| Historical analysis | Yes | Yes | Yes |
| Scalability | Limited for very large data | Scales horizontally | Scales for data growth |
| Cost-effectiveness | Can be expensive | Cost-effective for storing raw data | Balances cost and performance |
| Regulatory compliance | Often supported | Requires implementation | Supports compliance measures |
| Vendor ecosystem | Well-established | Varied and expanding | Evolving and expanding |
| User profiles | Data analysts, business intelligence | Data engineers and scientists, analysts | Data engineers and scientists, analysts |
| Real-time analytics | Possible but limited | Varies depending on tools | Supports real-time analytics |
| Schema evolution | Requires schema changes | Flexible with schema evolution | Supports both schema changes and structure |
| Data exploration | Limited capability | Flexible for exploration | Supports both analytics and exploration |
| Hybrid architecture | Can be integrated with data lakes | Can be combined with data warehouses | Combines elements of both |

Table 1

Data Warehouse
Data warehouses excel at processing structured data with a well-defined schema. With these restrictions, a data warehouse can offer highly efficient querying capabilities. Furthermore, they have strong integration with business intelligence tooling and robust integrated support for data quality and governance.
The following table gives an overview of data warehouse aspects and how they may benefit or detract from a given use case:

Data Warehouse Aspect Coverage

| Aspect | Benefits | Weaknesses |
|---|---|---|
| Structured data | Efficient storage and management | Limited support for unstructured data |
| Optimized queries | High-performance querying | Expensive |
| Data consistency | Enforced data consistency | Inflexible schema |

Table 2

Benefits of Using a Data Warehouse
Data warehouses provide several key advantages:

Excel in efficiently storing and managing structured data, making complex analytics accessible through predefined schemas that enhance user-friendliness
Offer high-performance querying capabilities, enabling the execution of complex analytical tasks and scaling to maintain query speed as data volumes expand
Prioritize data consistency by enforcing structured schemas and implementing robust data governance measures, ensuring data integrity and reliability, making them a reliable single source of truth for decision-making within organizations

Limitations of Using a Data Warehouse
The weaknesses of a data warehouse revolve around cost, an inflexible schema, and limited support for unstructured data. Implementing and maintaining a data warehouse can be expensive, with substantial initial setup and ongoing operational costs. Its reliance on a predefined schema makes it less adaptable to changes in data structure or the inclusion of new data sources, potentially hindering agility. Additionally, data warehouses are primarily designed for structured data, which limits their ability to efficiently handle unstructured or semi-structured data, potentially missing out on valuable insights from diverse data sources.

Data Lake
The data lake architecture evolved as a response to the rising costs of operating a data warehouse. A primary goal of this design was to lower the bar, in terms of cost, for storing vast amounts of data. Although data lakes provide a low price point for storage, they lack some of the integrations and features that have been developed in data warehouses over the years. Below are some of the trade-offs to consider when building a data lake:

Data Lake Aspect Coverage

| Aspect | Benefits | Limitations |
|---|---|---|
| Scalability | Highly scalable, handles massive data volumes | Data quality concerns |
| Cost-effectiveness | Cost-effective for storing raw data | Complexity in data processing |
| Storage of raw and unstructured data | Accommodates diverse data types | Potential data silos |

Table 3

Benefits of Using a Data Lake
A data lake architecture offers distinct advantages for organizations seeking to harness their data effectively:

Provides exceptional scalability, effortlessly accommodating massive data volumes as businesses grow
Proves highly cost-effective, offering a budget-friendly solution for storing raw data in its native format
Excels at storage, allowing organizations to effortlessly ingest and manage diverse data types, including unstructured and semi-structured data

This versatility enables businesses to leverage their entire data ecosystem, promoting innovation and data-driven decision-making while keeping costs in check.

Limitations of Using a Data Lake
Despite its strengths, a data lake architecture is not without its challenges. It often introduces complexity in data processing, as the flexibility it offers can lead to difficulties in data organization, quality assurance, and integration.
Moreover, there is a risk of data silos within a data lake, where data may become fragmented and less accessible, hindering the ability to derive valuable insights. Data discovery also becomes a concern. To maximize the benefits of a data lake, organizations must carefully plan their data governance and integration strategies to mitigate these challenges effectively.

Data Lakehouse
The data lakehouse paradigm seeks to balance the benefits and trade-offs of a data warehouse and a data lake. This is accomplished by providing an integrated solution on top of what were traditionally data lake components. The goal is to provide the scalability, flexibility, and cost benefits of a data lake while still offering the performance, data governance, and user-friendliness of a data warehouse.

Data Lakehouse Aspect Coverage

| Aspect | Benefits | Limitations |
|---|---|---|
| Hybrid architecture | Combines data warehouse and data lake capabilities | Architectural complexity |
| Cost-to-performance flexibility | Offers cost-effective scalability with high performance | Potential performance issues |
| Real-time analytics | Supports real-time analytics | Evolving technology landscape |

Table 4

Benefits of Using a Data Lakehouse
A data lakehouse architecture presents a compelling solution for organizations aiming to unlock the full potential of their data. By seamlessly combining the robust features of a data warehouse and the flexibility of a data lake, it offers a comprehensive data management ecosystem. One of its standout advantages lies in its cost-to-performance flexibility, allowing businesses to balance their data storage and processing needs efficiently, optimizing both cost-effectiveness and performance. Additionally, the data lakehouse empowers organizations with real-time analytics capabilities, enabling them to make data-driven decisions and respond swiftly to changing trends and opportunities. This amalgamation of features positions the data lakehouse as a versatile and powerful solution for modern data management and analytics needs.

Limitations of Using a Data Lakehouse
A data lakehouse does come with certain limitations. One key concern is architectural complexity, as the integration of these diverse features can lead to intricate data management structures, requiring thorough planning and management. Potential performance issues may arise due to the combination of features, and organizations must carefully optimize their data processing to prevent bottlenecks. Additionally, the ever-evolving technology landscape means that staying up to date with the latest advancements and best practices is essential for maximizing the benefits of a data lakehouse. Despite these limitations, its capacity to provide a comprehensive data solution often outweighs these challenges for organizations seeking to harness the full potential of their data assets.

The Future of Data Storage
The future of data management and storage is poised to undergo transformative changes driven by evolving trends. One of the pivotal developments is the growing emphasis on interoperability between existing data architectures, including data warehouses, data lakes, and data lakehouses. Organizations are recognizing the need to seamlessly integrate these technologies to harness the full spectrum of their data assets efficiently. Simultaneously, data governance and data quality are becoming paramount concerns, driven by the exponential growth of data volumes and the increasing importance of compliance and data accuracy.
As organizations navigate this landscape, they are likely to adopt comprehensive data governance strategies, leveraging automation and AI-powered tools to enhance data quality, traceability, and privacy. Overall, the future of data management and storage will revolve around achieving a harmonious synergy between diverse data architectures, underpinned by robust data governance practices to ensure the reliability and integrity of data assets in an ever-evolving digital ecosystem.

Evolving Technologies
Machine learning and AI technologies will play a pivotal role in automating data processing, analysis, and decision-making, enabling organizations to derive deeper insights from their data assets. Moreover, the rise of edge computing and the Internet of Things (IoT) will necessitate real-time data management capabilities, prompting the adoption of cloud-native solutions and distributed data architectures. As data privacy and security concerns grow, robust data governance frameworks will become imperative, ensuring that organizations maintain compliance with evolving regulations while safeguarding sensitive data. Collaboration across departments and data-driven cultures will be pivotal, with data democratization empowering a broader range of employees to harness data for informed decision-making. In this dynamic landscape, the ability to adapt swiftly to emerging technologies and data management trends will be the cornerstone of success in the data-driven future.

Hybrid Solutions
Hybrid solutions in data management architecture overcome the limitations of individual storage types. Such hybrid solutions are becoming more popular and are starting to give rise to entirely new designs. A model that exemplifies this concept involves not just the separation of compute and storage, as often seen in data lakes, but also a distinct storage platform integrated separately from the compute layer. This has played out most visibly in the emergence of open table formats such as Iceberg, Hudi, and Delta Lake.

Conclusion
The decision between a data warehouse, data lake, or data lakehouse involves a complex set of trade-offs. Data warehouses excel in structured analytics but may lack flexibility for diverse data types. Data lakes offer versatility but require careful data governance. The emerging data lakehouse concept seeks to balance these trade-offs by combining features of both, offering a unified platform; however, this choice is not one-size-fits-all. Organizations must weigh their specific business needs and adapt their data management strategies accordingly, considering factors such as data type diversity, scalability, cost, and the evolving technology landscape. The key lies in making informed decisions that align with current and future data requirements and recognizing the importance of ongoing adaptation in the dynamic world of data management.
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report

Data-driven design is a game changer. It uses real data to shape designs, ensuring products match user needs and deliver user-friendly experiences. This approach fosters constant improvement through data feedback and informed decision-making for better results. In this article, we will explore the importance of data-driven design patterns and principles, and we will look at an example of how the data-driven approach works with artificial intelligence (AI) and machine learning (ML) model development.

Importance of Data-Driven Design
Data-driven design is crucial because it uses real data to inform design decisions. This approach ensures that designs are tailored to user needs, resulting in more effective and user-friendly products. It also enables continuous improvement through data feedback and supports informed decision-making for better outcomes. Data-driven design includes the following:

Data visualization – Aids designers in comprehending trends, patterns, and issues, leading to effective design solutions.
User-centricity – Data-driven design begins with understanding users deeply. Gathering data about user behavior, preferences, and challenges enables designers to create solutions that precisely meet user needs.
Iterative process – Design choices are continuously improved through data feedback. This iterative method ensures designs adapt and stay aligned with user expectations over time.
Measurable outcomes – Data-driven design targets measurable achievements, like enhanced user engagement, conversion rates, and satisfaction.

That is the theory; let's reinforce it with good examples of products based on data-driven design:

Netflix uses data-driven design to predict what content its customers will enjoy. It analyzes daily plays, subscriber ratings, and searches, ensuring its offerings match user preferences and trends.
Uber uses data-driven design by collecting and analyzing vast amounts of data from rides, locations, and user behavior. This helps it optimize routes, estimate fares, and enhance user experiences. Uber continually improves its services by leveraging data insights based on real-world usage patterns.
Waze uses data-driven design by analyzing real-time GPS data from drivers to provide accurate traffic updates and optimal route recommendations. This data-driven approach ensures users have the most up-to-date and efficient navigation experience based on current road conditions and user behavior.

Common Data-Driven Architectural Principles and Patterns
Before we jump into data-driven architectural patterns, let's clarify what data-driven architecture is and what its fundamental principles are.

Data-Driven Architectural Principles
Data-driven architecture involves designing and organizing systems, applications, and infrastructure with a central focus on data as a core element. Within this architectural framework, decisions concerning system design, scalability, processes, and interactions are guided by insights and requirements derived from data. Fundamental principles of data-driven architecture include:

Data-centric design – Data is at the core of design decisions, influencing how components interact, how data is processed, and how insights are extracted.
Real-time processing – Data-driven architectures often involve real-time or near real-time data processing to enable quick insights and actions.
Integration of AI and ML – The architecture may incorporate AI and ML components to extract deeper insights from data. Event-driven approach – Event-driven architecture, where components communicate through events, is often used to manage data flows and interactions. Data-Driven Architectural Patterns Now that we know the key principles, let's look into data-driven architecture patterns. Distributed data architecture patterns include the data lakehouse, data mesh, data fabric, and data cloud. Data Lakehouse Data lakehouse allows organizations to store, manage, and analyze large volumes of structured and unstructured data in one unified platform. Data lakehouse architecture provides the scalability and flexibility of data lakes, the data processing capabilities, and the query performance of data warehouses. This concept is perfectly implemented in Delta Lake. Delta Lake is an extension of Apache Spark that adds reliability and performance optimizations to data lakes. Data Mesh The data mesh pattern treats data like a product and sets up a system where different teams can easily manage their data areas. The data mesh concept is similar to how microservices work in development. Each part operates on its own, but they all collaborate to make the whole product or service of the organization. Companies usually use conceptual data modeling to define their domains while working toward this goal. Data Fabric Data fabric is an approach that creates a unified, interconnected system for managing and sharing data across an organization. It integrates data from various sources, making it easily accessible and usable while ensuring consistency and security. A good example of a solution that implements data fabric is Apache NiFi. It is an easy-to-use data integration and data flow tool that enables the automation of data movement between different systems. Data Cloud Data cloud provides a single and adaptable way to access and use data from different sources, boosting teamwork and informed choices. These solutions offer tools for combining, processing, and analyzing data, empowering businesses to leverage their data's potential, no matter where it's stored. Presto exemplifies an open-source solution for building a data cloud ecosystem. Serving as a distributed SQL query engine, it empowers users to retrieve information from diverse data sources such as cloud storage systems, relational databases, and beyond. Now we know what data-driven design is, including its concepts and patterns. Let's have a look at the pros and cons of this approach. Pros and Cons of Data-Driven Design It's important to know the strong and weak areas of the particular approach, as it allows us to choose the most appropriate approach for our architecture and product. Here, I gathered some pros and cons of data-driven architecture: PROS AND CONS OF DATA-DRIVEN DESIGN Pros Cons Personalized experiences: Data-driven architecture supports personalized user experiences by tailoring services and content based on individual preferences. Privacy concerns: Handling large amounts of data raises privacy and security concerns, requiring robust measures to protect sensitive information. Better customer understanding: Data-driven architecture provides deeper insights into customer needs and behaviors, allowing businesses to enhance customer engagement. Complex implementation: Implementing data-driven architecture can be complex and resource-intensive, demanding specialized skills and technologies. 
Informed decision-making: Data-driven architecture enables informed and data-backed decision-making, leading to more accurate and effective choices. Dependency on data availability: The effectiveness of data-driven decisions relies on the availability and accuracy of data, leading to potential challenges during data downtimes. Table 1 Data-Driven Approach in ML Model Development and AI A data-driven approach in ML model development involves placing a strong emphasis on the quality, quantity, and diversity of the data used to train, validate, and fine-tune ML models. A data-driven approach involves understanding the problem domain, identifying potential data sources, and gathering sufficient data to cover different scenarios. Data-driven decisions help determine the optimal hyperparameters for a model, leading to improved performance and generalization. Let's look at the example of the data-driven architecture based on AI/ML model development. The architecture represents the factory alerting system. The factory has cameras that shoot short video clips and photos and send them for analysis to our system. Our system has to react quickly if there is an incident. Below, we share an example of data-driven architecture using Azure Machine Learning, Data Lake, and Data Factory. This is only an example, and there are a multitude of tools out there that can leverage data-driven design patterns. The IoT Edge custom module captures real-time video streams, divides them into frames, and forwards results and metadata to Azure IoT Hub. The Azure Logic App watches IoT Hub for incident messages, sending SMS and email alerts, relaying video fragments, and inferencing results to Azure Data Factory. It orchestrates the process by fetching raw video files from Azure Logic App, splitting them into frames, converting inferencing results to labels, and uploading data to Azure Blob Storage (the ML data repository). Azure Machine Learning begins model training, validating data from the ML data store, and copying required datasets to premium blob storage. Using the dataset cached in premium storage, Azure Machine Learning trains, validates model performance, scores against the new model, and registers it in the Azure Machine Learning registry. Once the new ML inferencing module is ready, Azure Pipelines deploys the module container from Container Registry to the IoT Edge module within IoT Hub, updating the IoT Edge device with the updated ML inferencing module. Figure 1: Smart alerting system with data-driven architecture Conclusion In this article, we dove into data-driven design concepts and explored how they merge with AI and ML model development. Data-driven design uses insights to shape designs for better user experiences, employing iterative processes, data visualization, and measurable outcomes. We've seen real-world examples like Netflix using data to predict content preferences and Uber optimizing routes via user data. Data-driven architecture, encompassing patterns like data lakehouse and data mesh, orchestrates data-driven solutions. Lastly, our factory alerting system example showcases how AI, ML, and data orchestrate an efficient incident response. A data-driven approach empowers innovation, intelligent decisions, and seamless user experiences in the tech landscape. This is an article from DZone's 2023 Data Pipelines Trend Report.For more: Read the Report
In today's digital landscape, it's not just about building functional systems; it's about creating systems that scale smoothly and efficiently under demanding loads. But as many developers and architects can attest, scalability often comes with its own unique set of challenges. A seemingly minute inefficiency, when multiplied a million times over, can cause systems to grind to a halt. So, how can you ensure your applications stay fast and responsive, regardless of the demand? In this article, we'll delve deep into the world of performance optimization for scalable systems. We'll explore common strategies that you can weave into any codebase, be it front end or back end, regardless of the language you're working with. These aren't just theoretical musings; they've been tried and tested in some of the world's most demanding tech environments. Having been a part of the team at Facebook, I've personally integrated several of these optimization techniques into products I've helped bring to life, including the lightweight ad creation experience in Facebook and the Meta Business Suite. So whether you're building the next big social network, an enterprise-grade software suite, or just looking to optimize your personal projects, the strategies we'll discuss here will be invaluable assets in your toolkit. Let's dive in. Prefetching Prefetching is a performance optimization technique that revolves around the idea of anticipation. Imagine a user interacting with an application. While the user performs one action, the system can anticipate the user's next move and fetch the required data in advance. This results in a seamless experience where data is available almost instantly when needed, making the application feel much faster and responsive. Proactively fetching data before it's needed can significantly enhance the user experience, but if done excessively, it can lead to wasted resources like bandwidth, memory, and even processing power. Facebook employs pre-fetching a lot, especially for their ML-intensive operations such as "Friends suggestions." When Should I Prefetch? Prefetching involves the proactive retrieval of data by sending requests to the server even before the user explicitly demands it. While this sounds promising, a developer must ensure the balance is right to avoid inefficiencies. A. Optimizing Server Time (Backend Code Optimizations) Before jumping into prefetching, it's wise to ensure that the server response time is optimized. Optimal server time can be achieved through various backend code optimizations, including: Streamlining database queries to minimize retrieval times. Ensuring concurrent execution of complex operations. Reducing redundant API calls that fetch the same data repeatedly. Stripping away any unnecessary computations that might be slowing down the server response. B. Confirming User Intent The essence of prefetching is predicting the user's next move. However, predictions can sometimes be wrong. If the system fetches data for a page or feature the user never accesses, it results in resource wastage. Developers should employ mechanisms to gauge user intent, such as tracking user behavior patterns or checking active engagements, ensuring that data isn't fetched without a reasonably high probability of being used. How To Prefetch Prefetching can be implemented using any programming language or framework. For the purpose of demonstration, let's look at an example using React. Consider a simple React component. 
As soon as this component finishes rendering, an AJAX call is triggered to prefetch data. When a user clicks a button in this component, a second component uses the prefetched data:

JavaScript

import React, { useState, useEffect } from 'react';
import axios from 'axios';

function PrefetchComponent() {
  const [data, setData] = useState(null);
  const [showSecondComponent, setShowSecondComponent] = useState(false);

  // Prefetch data as soon as the component finishes rendering
  useEffect(() => {
    axios.get('https://api.example.com/data-to-prefetch')
      .then(response => {
        setData(response.data);
      });
  }, []);

  return (
    <div>
      <button onClick={() => setShowSecondComponent(true)}>
        Show Next Component
      </button>
      {showSecondComponent && <SecondComponent data={data} />}
    </div>
  );
}

function SecondComponent({ data }) {
  // Use the prefetched data in this component
  return (
    <div>
      {data ? <div>Here is the prefetched data: {data}</div> : <div>Loading...</div>}
    </div>
  );
}

export default PrefetchComponent;

In the code above, the PrefetchComponent fetches data as soon as it's rendered. When the user clicks the button, SecondComponent gets displayed, which uses the prefetched data.
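Building on the "Confirming User Intent" point above, a hover event is often a strong enough signal to justify prefetching. The sketch below is a minimal, illustrative variation (not part of the original example) that waits for the user to hover over a link before fetching; the endpoint is a hypothetical placeholder.

JavaScript

import React, { useState } from 'react';
import axios from 'axios';

function HoverPrefetchLink() {
  const [prefetchedData, setPrefetchedData] = useState(null);

  // Hovering is treated as a signal of intent: start fetching before the click
  const handleMouseEnter = () => {
    if (!prefetchedData) {
      axios.get('https://api.example.com/next-page-data') // hypothetical endpoint
        .then(response => setPrefetchedData(response.data));
    }
  };

  return (
    <a href="/next-page" onMouseEnter={handleMouseEnter}>
      Go to next page
    </a>
  );
}

export default HoverPrefetchLink;

Guarding the request behind a user signal keeps the bandwidth cost low when the prediction turns out to be wrong.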
Memoization

In the realm of computer science, "Don't repeat yourself" isn't just a good coding practice; it's also the foundation of one of the most effective performance optimization techniques: memoization. Memoization capitalizes on the idea that re-computing certain operations can be a drain on resources, especially if the results of those operations don't change frequently. So, why redo what's already been done? Memoization optimizes applications by caching computation results. When a particular computation is needed again, the system checks if the result exists in the cache. If it does, the result is retrieved directly from the cache, skipping the actual computation. In essence, memoization involves creating a memory (hence the name) of past results. This is especially useful for functions that are computationally expensive and are called multiple times with the same inputs. It's akin to a student solving a tough math problem and jotting down the answer in the margin of their book. If the same question appears on a future test, the student can simply reference the margin note rather than work through the problem all over again.

When Should I Memoize?

Memoization isn't a one-size-fits-all solution. In certain scenarios, memoizing might consume more memory than it's worth. So, it's crucial to recognize when to use this technique:

When the data doesn't change very often: Functions that return consistent results for the same inputs, especially if these functions are compute-intensive, are prime candidates for memoization. This ensures that the effort taken to compute the result isn't wasted on subsequent identical calls.
When the data is not too sensitive: Security and privacy concerns are paramount. While it might be tempting to cache everything, it's not always safe. Data like payment information, passwords, and other personal details should never be cached. However, more benign data, like the number of likes and comments on a social media post, can safely be memoized to improve performance.

How To Memoize

Using React, we can harness the power of hooks like useCallback and useMemo to implement memoization. Let's explore a simple example:

JavaScript

import React, { useState, useCallback, useMemo } from 'react';

function ExpensiveOperationComponent() {
  const [input, setInput] = useState(0);
  const [count, setCount] = useState(0);

  // A hypothetical expensive operation
  const expensiveOperation = useCallback((num) => {
    console.log('Computing...');
    // Simulating a long computation
    for (let i = 0; i < 1000000000; i++) {}
    return num * num;
  }, []);

  const memoizedResult = useMemo(() => expensiveOperation(input), [input, expensiveOperation]);

  return (
    <div>
      <input value={input} onChange={e => setInput(Number(e.target.value))} />
      <p>Result of Expensive Operation: {memoizedResult}</p>
      <button onClick={() => setCount(count + 1)}>Re-render component</button>
      <p>Component re-render count: {count}</p>
    </div>
  );
}

export default ExpensiveOperationComponent;

In the above example, the expensiveOperation function simulates a computationally expensive task. We've used the useCallback hook to ensure that the function doesn't get redefined on each render. The useMemo hook then stores the result of expensiveOperation so that if the input doesn't change, the computation doesn't run again, even if the component re-renders.

Concurrent Fetching

Concurrent fetching is the practice of fetching multiple sets of data simultaneously rather than one at a time. It's similar to having several clerks working at a grocery store checkout instead of just one: customers get served faster, queues clear more quickly, and overall efficiency improves. In the context of data, since many datasets don't rely on each other, fetching them concurrently can greatly accelerate page load times, especially when dealing with intricate data that requires more time to retrieve.

When To Use Concurrent Fetching?

When each dataset is independent and complex to fetch: If the datasets being fetched have no dependencies on one another and they take significant time to retrieve, concurrent fetching can help speed up the process.
Use mostly in the back end and carefully in the front end: While concurrent fetching can work wonders in the back end by improving server response times, it must be employed judiciously in the front end. Overloading the client with simultaneous requests might hamper the user experience.
Prioritizing network calls: If data fetching involves several network calls, it's wise to prioritize one major call and handle it in the foreground, concurrently processing the others in the background. This ensures that the most crucial data is retrieved first while secondary datasets load simultaneously.

How To Use Concurrent Fetching

In PHP, with the advent of modern extensions and tools, concurrent processing has become simpler. Here's a basic example built around a concurrent {} block. Note that this block syntax is not part of standard PHP; treat it as illustrative pseudocode for the pattern. In practice, you would reach for an async library (such as amphp or ReactPHP) or run the calls in parallel with curl_multi:

PHP

<?php

use Concurrent\TaskScheduler;

require 'vendor/autoload.php';

// Assume these are some functions that fetch data from various sources
function fetchDataA() {
    // Simulated delay
    sleep(2);
    return "Data A";
}

function fetchDataB() {
    // Simulated delay
    sleep(3);
    return "Data B";
}

$scheduler = new TaskScheduler();

$result = concurrent {
    "a" => fetchDataA(),
    "b" => fetchDataB(),
};

echo $result["a"]; // Outputs: Data A
echo $result["b"]; // Outputs: Data B

?>

In the example, fetchDataA and fetchDataB represent two data retrieval functions. By running them concurrently, the total time it takes to fetch both datasets shrinks to roughly the duration of the slower call rather than the sum of both.
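On the front end, the same effect can be achieved with plain JavaScript promises. The sketch below is an illustrative addition (the endpoints are hypothetical) that fires two independent requests at once and waits for both:

JavaScript

// Fetch two independent datasets at the same time instead of sequentially
async function loadDashboardData() {
  const [userResponse, statsResponse] = await Promise.all([
    fetch('https://api.example.com/user'),   // hypothetical endpoint
    fetch('https://api.example.com/stats')   // hypothetical endpoint
  ]);

  const user = await userResponse.json();
  const stats = await statsResponse.json();

  return { user, stats };
}

loadDashboardData().then(({ user, stats }) => {
  console.log(user, stats);
});

As with the PHP example, the total wait is dictated by the slower of the two requests, not by their sum.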
Lazy Loading

Lazy loading is a design pattern wherein data or resources are deferred until they're explicitly needed. Instead of pre-loading everything up front, you load only what's essential for the initial view and then fetch additional resources as and when they're needed. Think of it as a buffet where you only serve dishes when guests specifically ask for them, rather than keeping everything out all the time. A practical example is a modal on a web page: the data inside the modal isn't necessary until a user decides to open it by clicking a button. By applying lazy loading, we can hold off on fetching that data until the very moment it's required.

How To Implement Lazy Loading

For an effective lazy loading experience, it's essential to give users feedback that data is being fetched. A common approach is to display a spinner or a loading animation during the data retrieval process. This ensures that the user knows their request is being processed, even if the data isn't instantly available.

Lazy Loading Example in React

Let's illustrate lazy loading using a React component. This component will fetch data for a modal only when the user clicks a button to view the modal's contents:

JavaScript

import React, { useState } from 'react';

function LazyLoadedModal() {
  const [data, setData] = useState(null);
  const [isLoading, setIsLoading] = useState(false);
  const [isModalOpen, setIsModalOpen] = useState(false);

  const fetchDataForModal = async () => {
    // Open the modal immediately so the loading state is visible to the user
    setIsModalOpen(true);
    setIsLoading(true);

    // Simulating an AJAX call to fetch data
    const response = await fetch('https://api.example.com/data');
    const result = await response.json();

    setData(result);
    setIsLoading(false);
  };

  return (
    <div>
      <button onClick={fetchDataForModal}>
        Open Modal
      </button>

      {isModalOpen && (
        <div className="modal">
          {isLoading ? (
            <p>Loading...</p> // Spinner or loading animation can be used here
          ) : (
            <p>{data}</p>
          )}
        </div>
      )}
    </div>
  );
}

export default LazyLoadedModal;

In the above example, the data for the modal is fetched only when the user clicks the "Open Modal" button. Until then, no unnecessary network request is made. While the data is being fetched, a loading message (or spinner) is displayed to indicate to the user that their request is in progress.
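Lazy loading applies to code as well as data. As a complementary illustration (not part of the original example), the sketch below defers downloading a component's JavaScript bundle until the first time it is rendered, using React.lazy and Suspense; HeavyChart is a hypothetical module with a default export.

JavaScript

import React, { Suspense, useState, lazy } from 'react';

// The chart's code is only downloaded when the component is first rendered
const HeavyChart = lazy(() => import('./HeavyChart')); // hypothetical module

function ReportPage() {
  const [showChart, setShowChart] = useState(false);

  return (
    <div>
      <button onClick={() => setShowChart(true)}>Show chart</button>

      {showChart && (
        <Suspense fallback={<p>Loading chart...</p>}>
          <HeavyChart />
        </Suspense>
      )}
    </div>
  );
}

export default ReportPage;

The same feedback principle applies here: the Suspense fallback plays the role of the spinner while the deferred code is being loaded.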
Conclusion

In today's fast-paced digital world, every millisecond counts. Users demand rapid responses, and businesses can't afford to keep them waiting. Performance optimization is no longer just a 'nice-to-have' but an absolute necessity for anyone serious about delivering a top-tier digital experience. Through techniques such as prefetching, memoization, concurrent fetching, and lazy loading, developers have a robust arsenal at their disposal to fine-tune and enhance their applications. These strategies, while diverse in their applications and methodologies, share a common goal: to ensure applications run as efficiently and swiftly as possible. However, it's important to remember that no single strategy fits all scenarios. Each application is unique, and performance optimization requires a judicious blend of understanding the application's needs, recognizing the users' expectations, and applying the right techniques effectively. It's an ongoing journey of refinement and learning.

What Is BFF?

The Backend for Frontend (BFF) design pattern involves creating a backend service layer specifically tailored to the requirements of a particular frontend application or a set of closely related frontends. While traditionally this approach has been contrasted with a monolithic backend serving multiple frontends, it's worth noting that a BFF can indeed serve multiple frontends, especially when tools like GraphQL (GQL) are utilized. The key is that these frontends have similar requirements and data needs. Regardless of the number of frontends, the primary advantage of the BFF is its ability to be optimized for the specific needs and context of its consumer(s). Here is an example of what an architecture including the BFF pattern could look like:

Controllers: These are the entry points for incoming client requests. Each controller handles a specific set of endpoints, ensuring a clean separation of concerns. For instance, a ProductController might handle all product-related operations for the frontends.
Services: Behind the controllers, we have service layers that perform business logic. These services coordinate a range of operations, ensuring seamless alignment between the data's DTOs and the front end's requirements. Additionally, they can leverage multithreading to enhance data request performance. For instance, the ProductService might coordinate retrieving product details, calculating promotions or discounts, and interfacing with inventory management. Within this service, one could expect methods like findProductById, applyDiscountToProduct, or getProductInventoryStatus.
Data Mapping: Within the services, specialized mapping functions transform data between the domain model and the DTOs (Data Transfer Objects) that the API returns. This ensures that the front end receives data in the most appropriate format, tailored to its needs.
Repositories: The repositories interact directly with our data sources, abstracting away the specifics of data retrieval. For example, a ProductRepository might house methods for retrieving, storing, or modifying product information in the database, fetching related documents for the product, or interfacing with partner APIs.
Error Handling: Throughout the architecture, standardized error handling mechanisms ensure that any issues are captured and reported back to the client in a consistent manner.

This architecture promotes separation of concerns, making the BFF flexible and maintainable. Any interface could be easily added or modified without affecting the front end. (A minimal code sketch of this layering appears at the end of this article.)

Benefits and Trade-Offs

Here are a few features with their benefits and trade-offs.

Avoid Coupling

Benefits

Framework Independence: A BFF can be implemented in a different technology or framework than the front end or other BFFs, allowing developers to select the most appropriate tool for each specific front end. This becomes especially crucial in an era with a plethora of frontend frameworks and their potentially short lifespans.
Decoupling Functional Code: Separating the backend-for-frontend from the frontend itself prevents tight coupling between functional logic and the frontend template, allowing each to evolve separately. Such tight coupling is an unfavorable pattern seen in numerous front-end projects, often resulting in complex systems that are challenging to migrate.

Trade-Offs

Resource Flexibility: Implementing BFF often requires more versatile resources.
The BFF may not use the same technology stack as the front end, necessitating developers to be skilled in multiple technologies or for teams to collaborate closely. Potential Functional Code Leakage: If not designed carefully, BFFs can start integrating too much business logic that ideally belongs to the primary API. This can lead to challenges in maintaining consistency and can duplicate logic across multiple BFFs. On this specific note, Behavior Driven Development can be invaluable. By employing tools like Karate or Selenium, you can discern the differences in implementation. Network Optimization Benefits Tailored Data Retrieval: By understanding exactly what the front end requires, a BFF can ensure that only necessary data is retrieved and sent, avoiding over-fetching or under-fetching of data. Leveraging Tools: With the BFF pattern, there’s an opportunity to use tools like GraphQL, which allows the front end to specify the exact data structures it requires. Trade-Offs Unnecessary calls: Improper application of the pattern could result in unnecessary calls, particularly if developers overlook design considerations, leading to network congestion. However, it’s worth highlighting that in the absence of BFF, such a scenario would have led to I/O overload. Data Segregation Benefits Custom Data Storage: BFFs allow for data to be stored in a way that is specifically optimized for the front end’s needs. For instance, data that supports hot configurations or client-specific settings can be stored separately. Trade-Offs Risk of Data Leaks: There’s a heightened risk of exposing sensitive data if not managed appropriately, as the BFF might interact with multiple data sources or expose data that’s tailored to front-end needs. Security Management Benefits Tailored Security Protocols: BFF enables fine-tuned security implementations, supporting both authorization logic and functional segregation. This ensures data protection and only exposes necessary data to the frontend, without restriction to primary APIs. Trade-Offs Reliance on API Security: While BFF handles frontend-specific security, the primary API still must implement basic security mechanisms. This means that the API exposes data without frontend-specific security but should still use basic methods like authentication. Quality Through Testing Benefits Focused Test Scenarios: With a BFF, tests can target specific scenarios and use cases unique to each front. This results in more accurate and relevant test cases, ensuring that the front end receives precisely what it expects. Rapid Feedback Loop: Since the BFF is tailored to the front end’s needs, automated testing can provide quicker feedback to developers. This can lead to faster iteration and more efficient debugging. Often, the adoption of unit tests is overlooked in frontend frameworks, given the lack of a dominant testing solution. This contrasts with frameworks typically favored for BFF, which tend to encourage and simplify unit test implementation. Enhanced End-to-End Testing: The BFF allows for end-to-end tests that closely mimic the real-world user experience. By simulating frontend requests, testers can gauge the entire data flow, from the BFF to the primary backend. While one could contend that these aren’t genuine end-to-end tests, their existence, easier maintenance, and reduced likelihood of becoming flaky make them invaluable. Trade-Offs Duplication of Efforts: There could be overlaps between the tests for the main backend and the BFF or even the front end. 
This redundancy might lead to wasted resources and time if not managed correctly. Maintenance Overhead: As the front end evolves, so will its requirements. The BFF’s tests must be continuously updated to reflect these changes, which could increase the maintenance burden. Risk of Over-Reliance: Teams might be tempted to overly rely on the BFF’s tests and overlook or downplay the significance of broader integration tests or tests on the main backend. Conclusion The BFF pattern has emerged as an innovative approach to bridge the gap between backend services and frontends, offering customization and efficiency tailored to the specific needs of each frontend or a set of closely related frontends. Its benefits, from streamlined network optimization to enhanced security protocols and focused testing scenarios, have been increasingly recognized in today’s fast-paced software development landscape. However, like any architectural pattern, it comes with its trade-offs, which necessitates a well-informed and judicious adoption strategy. By understanding its strengths and weaknesses and aligning them with project requirements, development teams can leverage the BFF pattern to achieve more responsive, maintainable, and efficient applications. As the software ecosystem continues to evolve, patterns like BFF exemplify the industry’s drive towards more modular, adaptable, and user-centric solutions.
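To make the controller, service, mapping, and repository layering described at the beginning of this article more concrete, here is a minimal, hypothetical sketch of a single BFF endpoint in Node.js with Express. It assumes Node.js 18+ (for the built-in fetch) and an installed Express dependency; the backend URLs, DTO fields, and port are illustrative assumptions rather than anything prescribed by the pattern itself.

JavaScript

const express = require('express');

// Repository layer: interacts directly with data sources and partner APIs
// (URLs are illustrative assumptions)
const productRepository = {
  async getById(id) {
    const response = await fetch(`https://catalog.internal.example.com/products/${id}`);
    return response.json();
  },
  async getInventoryStatus(id) {
    const response = await fetch(`https://inventory.internal.example.com/stock/${id}`);
    return response.json();
  }
};

// Data mapping: transform domain data into the DTO this front end expects
function toProductDto(product, inventory) {
  return {
    id: product.id,
    displayName: product.name,
    price: product.price,
    inStock: inventory.available > 0
  };
}

// Service layer: business logic that coordinates repository calls concurrently
const productService = {
  async findProductById(id) {
    const [product, inventory] = await Promise.all([
      productRepository.getById(id),
      productRepository.getInventoryStatus(id)
    ]);
    return toProductDto(product, inventory);
  }
};

// Controller layer: the HTTP entry point, with standardized error handling
const app = express();

app.get('/bff/products/:id', async (req, res) => {
  try {
    res.json(await productService.findProductById(req.params.id));
  } catch (err) {
    res.status(502).json({ error: 'Upstream service unavailable' });
  }
});

app.listen(3000, () => console.log('BFF listening on port 3000'));

The point here is the shape of the layering rather than the specific stack: the same separation could just as well be implemented in Java, Kotlin, or whichever backend technology the team prefers.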
Coined quite recently, the term microfrontend designates for a GUI (Graphical User Interface) what the term microservice designates for classical services, i.e., the process of decomposing an application's different parts and components. More importantly, it applies not only to GUIs in general but to a more specific category of GUIs: SPAs (Single Page Applications). This matters because, while several techniques already exist for separating the different parts and components of a general web application, when it comes to SPAs the story becomes a bit more difficult. As a matter of fact, separating the different parts and components of a general web application often means separating its different pages. This process becomes trickier for SPAs, as it concerns the separation of the different visual fragments of the application's single page. This requires a finer granularity and a more intimate orchestration of the content elements. The microfrontend concept adds more complexity to the web application development field, which is already fairly complex by itself. The SPA model, as well as the emergence of the so-called JavaScript or TypeScript-based web application platforms and frameworks, brought to the picture a high degree of intricacy, requiring developers to have a vast amount of background knowledge, from HTML and CSS to advanced aspects of Angular, React, Node, Vue, and jQuery. In the Java world, a new category of software developers has come to light: full-stack developers, who not only need to deal with the grief of mastering Java, be it standard or enterprise, and all its underlying sub-technologies like Servlet, REST, CDI, JPA, JMS, and many others, currently placed under the auspices of Jakarta EE, but who are also increasingly required to master things like WebPack, SystemJS, Bower, Gulp, Yeoman, and others, not to mention Spring, Quarkus, Micronaut, or Helidon. In former times, when dinosaurs still populated the Earth, enterprise-grade Java application development required knowledge of only a single technology: Java, possibly with its enterprise extensions, named successively J2EE, Java EE, and finally Jakarta EE. Unless they were built on Spring, applications and services were deployed on Jakarta EE-compliant application servers like GlassFish, Payara, WildFly, JBoss, WebLogic, WebSphere, etc. These application servers provided, out of the box, all the required implementations of the above-mentioned specifications. Among these specifications, Jakarta Faces (formerly called JSF: JavaServer Faces) was meant to offer a framework that facilitates and standardizes the development of web applications in Java. Jakarta Faces' history goes back to 2001 and its initial JSR (Java Specification Request) 127. At that time, another web framework, known under the name of Struts and available under an Apache open-source license, was widely popular. As sometimes happens in the web frameworks space, the advent of Jakarta Faces was perceived by the Apache community as being in conflict with Struts and, in order to resolve this alleged conflict, a long and heavy negotiation process of several years between Sun Microsystems and the Apache community was required. Finally, Sun agreed to lift the restrictions preventing JSRs from being independently implemented under an open-source license, and the first implementation, named RI (Reference Implementation), was provided in 2003.
Jakarta Faces was generally well received despite a market crowded with competitors. Its RI was followed by other implementations over the years, starting with Apache MyFaces in early 2004 and continuing with RedHat RichFaces in 2005, PrimeTek PrimeFaces in 2008, ICEsoft ICEfaces and Oracle ADF Faces in 2009, OmniFaces in 2012, etc. The specifications have evolved as well, from 1.0, released in 2001, to 4.0, released in 2022. Hence, it took more than 20 years of history to arrive at the latest Jakarta Faces release, 4.0, part of the Jakarta EE 10 specifications, whose reference implementation is named Mojarra. Software history is sometimes convoluted. In 2010, Oracle acquired Sun Microsystems and became the owner of the Java trademark. Throughout their time under Oracle's stewardship, the Java EE specifications remained in a kind of status quo before becoming Eclipse Jakarta EE. The company never really managed to set up a dialogue with the users, communities, and working groups involved in the recognition and promotion of Java enterprise-grade services. Their evolution requests and expectations were ignored by the vendor, which didn't know how to handle its new responsibility as the Java/Jakarta EE owner. Little by little, this led to a guarded reaction from software architects and developers, who began to prefer and adopt technological alternatives to application servers. While trying to find alternatives to Jakarta EE and to remedy issues like the apparent heaviness and the high prices of application servers, many software professionals adopted Spring Boot as a development platform. And since they needed Jakarta EE implementations for even basic web applications, they deployed these applications in open-source servlet engines like Tomcat, Jetty, or Undertow. For more advanced features than just servlets, like JPA or JMS, Spring Boot provides integration with ActiveMQ or Hibernate. And should even more advanced features be required, JTA for example, these software professionals went fishing on the internet for free third-party implementations like Atomikos and, in the absence of an official integration, tried to integrate these features into their servlet engine themselves, with all the risks that this entails. Other solutions, closer to real Jakarta EE alternatives, have emerged as well and, among them, Netty, Quarkus, and Micronaut are the best known and most popular. All these solutions were based on a couple of software design principles, like single concern, discrete boundaries, transportability across runtimes, auto-discovery, etc., which have been known since the dawn of time. But because the software industry continuously needs new names, the name found for these alternative solutions was "microservices." More and more microservice architecture-based applications appeared over the following years, to such an extent that the word "microservice" became one of the most common buzzwords in the software industry. In order to optimize and standardize microservices technology, the Eclipse Foundation decided to apply to microservices the same process that was used to design the Jakarta EE specifications. Eclipse MicroProfile was born. But all these convolutions have definitely impacted web framework technologies.
While the vast majority of Java enterprise-grade applications were using Jakarta Faces for their web tier, switching from a software architecture based on Jakarta EE-compliant application servers to microservices resulted in a phasing-out of these architectures in favor of more lightweight ones, often based on the Eclipse MicroProfile specifications. And since Jakarta Faces components needed an application server to be deployed on, other, lighter alternatives, based on JavaScript or TypeScript libraries like Angular, Vue, ExtJS, jQuery, and others, have been adopted to make up for its absence. Nowadays, most Java enterprise applications adopt the software architecture depicted below: While these microservices might be implemented using different frameworks like Spring Boot, the most natural choice is probably Quarkus. As a matter of fact, Quarkus is one of the most attractive Eclipse MicroProfile implementations, not only thanks to its high degree of compliance with the specifications but also due to its extensions and its capacity to generate native code, which has earned it the "Supersonic Subatomic Java" tagline. As for the front end, it typically might be implemented in Angular. To achieve such an implementation, two development teams are generally required:

A frontend team specialized in TypeScript, Angular, CSS, and HTML development, using Node.js as a deployment platform, NPM as a build tool, Bower as a dependency manager, Gulp as a streaming build system, Karma and Jasmine for testing, WebPack as a code bundler, and probably many others.
A backend team specialized in Java development using the Eclipse MicroProfile specifications, as well as different Jakarta EE implementations of sub-technologies like Jakarta REST, Jakarta Persistence, Jakarta Messaging, Jakarta Security, Jakarta JSON Binding, etc.

A single team of full-stack developers covering all the above-mentioned fields and technologies might also do it, but this is less usual. In any case, as you can observe, it becomes quite difficult to build a Java enterprise-grade project team, as it requires at least two categories of profiles and, given the complexity of these technologies, the profiles in question should preferably be senior. This situation sharply contrasts with former times, when the front end could be implemented using Jakarta Faces and, hence, a single Java development team was able to take charge of such an enterprise-grade project. Jakarta Faces is a great web framework whose implementations offer hundreds of ready-to-use widgets and other visual controls. Compared with Angular, where the visual components are part of external libraries like Material, NG-Bootstrap, Clarity, Kendo, Nebular, and many others, Jakarta Faces implementations not only provide far more widgets and features but are also covered by official specifications (JSR 372) and, in this respect, they are standard, as opposed to the mentioned libraries, which evolve with their authors' prevailing moods, without any guarantee of consistency and stability. One of the criteria behind many organizations' decision to switch from Jakarta Faces web applications to JavaScript/TypeScript frameworks was client-side rendering. Server-side rendering, which is the way Jakarta Faces works, was considered less performant than the client-side rendering provided by browser-based applications.
This argument has to be taken with a grain of salt: Client-side rendering means rendering pages directly in the browser with JavaScript. All logic, data fetching, templating, and routing are handled by the client. The primary downside of this rendering type is that the amount of JavaScript required tends to grow as an application grows, which can have negative effects on a page's capacity to consistently respond to user inputs. This becomes especially difficult with the addition of new JavaScript libraries, polyfills, and third-party code, which compete for processing power and must often be processed before a page's content can be rendered. Server-side rendering generates the full HTML for a page on the server in response to navigation. This avoids additional round-trips for data fetching and templating on the client, since it's handled before the browser gets a response. Server-side rendering generally reduces the time required for the page content to become visible. It makes it possible to avoid sending lots of JavaScript to the client. This helps to reduce a page's TBT (Total Blocking Time), which can also lead to a lower average response time, as the main thread is not blocked as often during page load. When the main thread is blocked less often, user interactions have more opportunities to run sooner. With server-side rendering, users are less likely to be left waiting for CPU-bound JavaScript to run before they can access a page. Accordingly, the claim that server-side rendering is bad while client-side rendering is better is just a myth. However, there is one potential trade-off here: generating pages on the server takes time, which may result in a higher TTFB (Time to First Byte). This is the time between the instant the user clicks and the instant the first content byte arrives. And even admitting that this metric impacts more important ones, like requests per second, latency, or uptime, it's difficult to assert that the web application's average response time is affected in a way users would actually notice. Consequently, it appears clearly from this analysis that developing Java web applications using server-side rendering frameworks like Jakarta Faces not only does not lead to less performant applications, but is also much simpler and less expensive. This approach doesn't require as many different technology stacks as its JavaScript/TypeScript-based alternatives. Development teams don't need several categories of profiles, and the same developer can directly contribute to both the front end and the back end without having to operate any paradigm switch. This last argument is all the more important as Java developers, concerned with things like multi-threading, transaction management, security, etc., aren't always comfortable with programming languages that were designed to run in a browser. So the good news here is that if, like me, you're nostalgic for Jakarta Faces, from now on you can start implementing your frontends with it without the need for any Jakarta EE-compliant application server. That's because a Jakarta Faces extension is available for Quarkus, our famous Supersonic Subatomic Java platform, allowing you to write beautiful frontends like in the good old days. Melloware Inc. provides a PrimeFaces extension for Quarkus, as described here. In the mentioned Git repository, you'll find a showcase application that demonstrates, with consistent code examples, how to use every single PrimeFaces widget.
Please follow the guide in the README.md file to build and run the showcase both on an application server, like WildFly, and on Quarkus. Then tell me what it feels like! Now, to come back to the microfrontend notion, which was our main concern at the beginning of this post: Michael Geers has written a well-documented article, as well as a book, in which he illustrates the most modern trends for building rich and powerful SPAs. But far from really demystifying the concept, these works show how complex the microfrontend topic is by offering us an extensive journey into a new world populated by strange creatures like Self-Contained Systems (SCS), Verticalized Systems, or the Documents to Applications Continuum. Without pretending to be able to clarify how all these new paradigms fit into the overall landscape of web application development, if I had to sum up in a single statement what microfrontends essentially are, I'd define them by quoting Michael: A composition of features which are owned by independent teams. Each team has a distinct area of business or mission it cares about and specializes in. A team is cross functional and develops its features end-to-end, from database to user interface. The figure below tries to illustrate this concept. After reading this definition, I can't help thinking that it fits the Jakarta Faces custom components concept remarkably well: as its name implies, it lets you create brand-new custom visual components that you can plug into your applications and that different independent teams can own and specialize in. As luck would have it! :-)
This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report The cloud is seamlessly integrated with almost all aspects of life, like business, personal computing, social media, artificial intelligence, Internet of Things, and more. In this article, we will dive into clouds and discuss their optimal suitability based on different types of organizational or individual needs. Public vs. Private Cloud Evaluation Mixing and matching cloud technologies provides a lot of ease and flexibility, but it comes with a lot of responsibilities, too. Sometimes it is difficult to make a choice between the types of cloud available, i.e., public, private, or hybrid cloud. An evaluation based on providers, cloud, and project demand is very crucial to selecting the right type. When evaluating a public and private cloud, it is important to consider the factors listed in Table 1: PUBLIC VS. PRIVATE CLOUD Public Cloud Private Cloud Best use cases Good for beginners Testing new features with minimal cost and setup Handling protected data and industry compliance Ensuring customized security Dedicated resources Cost Pay per usage Can be expensive Workload Suitable for fluctuating workloads Offers scalability and flexibility Suitable for predictable workloads Data confidentiality requirements (e.g., hospitals with confidential patient data) Infrastructure Shared infrastructure Can use existing on-premises investments Services Domain-specific services, including healthcare, education, finance, and retail Industry-specific customization options, including infrastructure, hardware, software stack, security, and access control Presence Global Reduces latency for geographically distributed customers Effective for limited geographic audiences and targeted client needs Table 1 Hybrid Cloud Both public and private clouds are useful in various ways, so it is possible to choose both to gain maximum advantages. This approach is achieved by adopting a hybrid cloud. Let's understand some of the key factors to consider: Hybrid clouds are suitable when the workload is both predicted or variable. A public cloud provides scalability and on-demand resources during peak seasons, while a private cloud handles base workload during off-seasons. To save money, public clouds can be shut off during non-peak seasons and store non-sensitive data. Private clouds generally cost more, but it is necessary for storing sensitive data. Private clouds are used for confidential data; non-regulated or non-sensitive information can be stored in a public cloud. Hybrid cloud is suitable for businesses operating in multiple regions. Private clouds serve specific regions, while public cloud providers offer global reach and accessibility for other services. Before adopting a hybrid approach, thorough analysis should be done, keeping factors such as workload patterns, budget, and compliance needs in consideration. Figure 1: Hybrid cloud combines features of public and private cloud Key Considerations for DBAs and Non-DBAs To achieve overall operational efficiency, it is essential for DBAs and non-DBAs to understand the key considerations related to database management. These considerations will help in effective collaboration, streamlined processes, and optimized data usage within an organization. Cost Optimization Cost is one of the major decision-making factors for everyone. It is very crucial to consider cost optimization strategies surrounding data, backup and archival strategies, and storage. 
Data is one of the most important factors when it comes to cost saving. It is always good to know your data patterns and to understand who the end user is so the classification of data is optimized for storage. No duplicates in data means no extra storage used. Also, an in-depth understanding of the types of storage available is required in order to get maximum benefit within budget. Classify your data into a structured or unstructured format. It is important for DBAs to analyze data that is no longer actively used but might be needed in the future. By moving this data to archival storage, DBAs can effectively save primary storage space. Implementing efficient backup strategies can help minimize redundant storage requirements, hence less cost. Data, storage, and cost are directly proportional to each other, so it is important to review these three for maximum benefits and performance with minimum costs from cloud providers. Available storage options include object storage, block storage, thin provisioning, and tiered storage. Figure 2: Data is directly proportional to the storage used and cost savings Performance To optimize storage speed, cloud providers use technologies like CDNs. Network latency can be reduced through strategies such as data compression, CDNs, P2P networking, edge computing, and geographic workload distribution. Larger memory capacity improves caching and overall performance. Computing power also plays a vital role. Factors like CPU-intensive tasks and parallel processing should be considered. GPUs or TPUs offer improved performance for intensive workloads in machine learning, data analytics, and video processing. Disaster Recovery Data should be available to recover after a disaster. If choosing a private cloud, be ready with backup and make sure you will be able to recover! It's important to distribute data, so if one area is affected, other locations can serve and business can run as usual. Security Cloud providers have various levels of data protection: With multi-factor authentication, data will be secure by adding an extra layer of verification. A token will be sent to a mobile device, via email, or by a preferred method like facial recognition. The right data should be accessible by the right consumer. To help these restrictions, technologies like identity and access management or role-based access control assign permission to the users based on assigned roles. Virtual private networks could be your savior. They provide secure private network connections over public networks and encrypted tunnels between the consumer's device and the cloud that will protect data from intruders. With encryption algorithms, clouds protect data at rest and in transit within infrastructure. However, it is always a good idea for a DBA and non-DBA to configure encryption settings within an app to achieve an organization's required security. Scalability When working with a cloud, it is important to understand how scalability is achieved: Cloud providers deploy virtual instances of servers, storage, and networks, which result in faster provisioning and allocation of virtual resources on demand. Serverless computing allows developers to focus on writing code. Infrastructure, scaling, and other resources required to handle incoming requests will be handled by cloud providers. Cloud providers suggest horizontal scaling instead of vertical scaling. 
By adding more servers instead of upgrading hardware in existing machines, the cloud will develop a distributed workload, which increases capacity. Vendor Lock-In For organizations looking for flexibility and varying cloud deployments, the risk of vendor lock-in can be limiting. To minimize this risk, implementing a hybrid cloud approach enables the distribution of data, flexibility, and easy migration. Using multiple cloud providers through a hybrid model helps avoid dependence on a single vendor with diverse capabilities. Open data formats and vendor-independent storage solutions will help in easy porting. In addition, containerization technologies for applications allow flexible vendor selection. It is essential for organizations to consider exit strategies, including contractual support, to ensure smooth transitions between vendors and to reduce challenges. Next Steps Cloud computing is a huge revolution for not only the IT industry but also for individuals. Here are some next steps based on the features and extensive usage. Extensive Acceptance Cloud computing is a long-term deal. It offers flexibility and the ability for developers to focus on code alone, even if they don't have prior knowledge or a dedicated team to maintain infrastructure. Other benefits include increased innovation, since most of the hustle is taken care by the cloud providers, and little-to no downtime, which is great for businesses that operate 24/7. Database Options To Mix and Match in the Cloud When we talk about cloud computing, there are many databases available. The following are some popular database options: DATABASE OPTIONS NoSQL Relational Serverless Managed Database Services Purpose Structured and unstructured data Structured and complex queries Unpredictable workloads Simplify database management Pros Scalability, flexibility, pairs well with cloud computing Strong data consistency, robust query capabilities Pay per usage, server management Scaling, automated backups, maintenance, cost-effective Cons Data consistency Scalability, fixed schema Vendor lock-in Dependent on providers Prevalence Will grow in popularity Will continue to stick around Will grow in popularity Will be a secure choice Examples MongoDB, Cassandra MySQL, Postgres, Oracle Google Cloud Firestore, Amazon Aurora Serverless Table 2 Conclusion It is important to note that there are scenarios where a private cloud might be preferred, such as when strict data security and compliance requirements exist, or when an organization needs maximum control over infrastructure and data. Each organization should evaluate its specific needs and consider a hybrid cloud approach, as well. Cloud providers often introduce new instances, types, updates, and features, so it is always good to review their documentation carefully for the most up-to-date information. By mindfully assessing vendor lock-in risks and implementing appropriate strategies, businesses can maintain flexibility and control over their cloud deployments while minimizing the challenges associated with switching cloud providers in the future. In this article, I have shared my own opinions and experiences as a DBA — I hope it offered additional insights and details about cloud options that can help to improve performance and cost savings based on individual objectives. This is an article from DZone's 2023 Database Systems Trend Report.For more: Read the Report