Application Architecture Design Principles
By Ray Elenteny
This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report

Designing an application architecture is never complete. Regularly, all decisions and components need to be reviewed, validated, and possibly updated. Stakeholders require that complex applications be delivered more quickly. It's a challenge for even the most senior technologists. A strategy is required, and it needs to be nimble. Strategy combines processes, which aid in keeping a team focused, with principles and patterns, which provide best practices for implementation. Regardless, it's a daunting task requiring organizational commitment.

Development, Design, and Architectural Processes

Applications developed without any process are chaos. A team that invents its own process and sticks to it is much better off than a team using no process. At the same time, holding a project hostage to a process can be just as detrimental. Best practices and patterns are developed over multiple years of teams looking for better ways to produce quality software in a timely manner. Processes are the codification of those best practices and patterns. By codifying best practices and patterns into processes, the processes can be scaled out to more organizations and teams.

For example, when an organization selects a development process, a senior leader may subscribe to a test-first development pattern. It becomes much easier for an organization to adopt a pattern by finding a process that outlines how the pattern is organizationally implemented. In the case of the test-first development pattern, test-driven development (TDD) may be selected as the development process. Another technical leader in the same organization may choose to lead their team using domain-driven design (DDD), a pattern by which software design is communicated across technical teams as well as other stakeholders. Can these two design philosophies coexist? Yes, they can. Here, TDD defines how software is constructed, while DDD defines the concepts that describe the software.

Software architecture works to remain neutral to specific development and design processes; it is the specification of how an abstract pattern is implemented. The term "abstract pattern" is used because most software architecture patterns can be applied across any development process and across any tech stack. For example, many architectures employ inversion of control (or dependency injection). How Java, JavaScript, C#, etc. implement inversion of control is specific to the tech stack, but it accomplishes the same goal.

Avoiding Dogmatic Adherence

Regardless of development, design, or architectural process, it's key that strict adherence to a given process does not become the end goal. Unfortunately, this happens more often than it should. Remember that the intent of a process is to codify best practices in a way that allows teams to scale using the same goals and objectives. To that end, when implementing processes, here are some points to consider:

- There's no one size fits all.
- Allow culture to mold the process.
- Maturity takes time.
- Keep focused on what you're really doing — building quality software in a timely manner.

Cross-Cutting Concerns

Software architecture can be designed, articulated, and implemented in several ways. Regardless of approach, most software architecture plans address two key points: simplicity and evolution. Simplicity is a relative term in that an architectural approach needs to be easily understood within the context of the business domain.
Team members should look at an architectural plan and say, "Of course, that's the obvious design." It may have taken several months to develop the plan, but a team responding in this manner is a sign that the plan is on the right track. Evolution is very important and can be the trickiest aspect of an architectural plan. It may sound difficult, but an architectural plan should be able to last ten-plus years. That may be challenging to comprehend, but with the right design principles and patterns in place, it's not as challenging as one might think. At its core, good software architecture does its best to not paint itself into a corner.

Figure 1 below contains no new revelations. However, each point is critical to a lasting software architecture:

- Building architecture that endures. This is the end goal. It entails using patterns that support the remaining points.
- Multiple platform and deployment support. The key here is that what exists today will very likely look different five years from now. An application needs to be readily able to adapt to changes in platform and deployment models, wherever the future takes it.
- Enforceable, standard patterns and compliance. Not that there's nothing new, but the software industry has decades of patterns to adopt and compliance initiatives to adhere to. Changes in both are gradual, so keeping an eye on the horizon is important.
- Reuse and extensibility from the ground up. Implementation patterns for reuse and extensibility will vary, but these points have been building blocks for many years.
- Collaboration with independent, external modules. The era of microservices helps enforce this principle. Watch for integrations that get convoluted; that is a red flag for the architecture.
- Evolutionary module compatibility and upgrade paths. Everything in a software's architecture will evolve. Consider how compatibility and upgrades are managed.
- Design for obsolescence. Understand that many components within a software's architecture will eventually need to be totally replaced. At the beginning of each project or milestone, ask the question, "How much code are we getting rid of this release?" The effect of regular code pruning is no different than the effect of pruning plants.

Figure 1: Key architectural principles

Developing microservices is a combination of following these key architectural principles along with segmenting components into areas of responsibility. Microservices provide a unit of business functionality. Alone, they provide little value to a business. It's in the assembly of and integration with other microservices that business value is realized. Good microservices assembly and integration implementations follow a multi-layered approach.

Horizontal and Vertical Slices

Simply stated, slicing an application is about keeping things where they belong. In addition to adhering to relevant design patterns in a codebase, slicing an application applies the same patterns at the application level. Consider an application architecture as depicted by a Lego® brick structure in the figure below:

Figure 2: Microservices architecture

Each section of bricks is separated by a thin Lego® brick, indicating a strict separation of responsibility between each layer. Layers interact only through provided contracts/interfaces. Figure 2 depicts three layers, each having a distinct purpose.
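As a minimal illustration of such a layer contract (the names here are hypothetical, not taken from the report), the layer above depends only on an interface, so the implementation behind it can be replaced without breaking callers:

```java
// Hypothetical contract exposed by the business-service layer. Callers in the layer
// above depend only on this interface, never on the implementation class.
interface OrderService {
    OrderConfirmation placeOrder(OrderRequest request);
}

// One possible implementation living inside the business layer; it can be swapped
// for an enhanced version with no disruption to the layer above.
class DefaultOrderService implements OrderService {
    @Override
    public OrderConfirmation placeOrder(OrderRequest request) {
        // validate, reserve inventory, charge payment, etc.
        return new OrderConfirmation(request.orderId());
    }
}

// Simple value types used by the contract.
record OrderRequest(String orderId) {}
record OrderConfirmation(String orderId) {}
```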
Whether it be integration with devices such as a laptop or tablet, or microservices integrating with other microservices, the point at which service requests are received remains logically the same. Here, there are several entry points ranging from web services and messaging services to an event bus.

Horizontal Slices

Horizontal slices of an application architecture are layers where, starting from the bottom, each layer provides services to the next layer. Typically, each layer of the stack refines the scope of underlying services to meet business use case logic. There can be no assumptions by services in lower layers about how the services above interact with them. As mentioned, this is done with well-defined contracts. In addition, services within a layer interact with one another through that layer's contracts. Maintaining strict adherence to contracts allows components at each layer to be replaced with new or enhanced versions with no disruption in interoperability.

Figure 3: Horizontal slices

Vertical Slices

Vertical slices are where everything comes together. A vertical slice is what delivers an application business objective. A vertical slice starts with an entry point that drills through the entire architecture. As depicted in Figure 4, business services can be exposed in multiple ways. Entry points are commonly exposed through some type of network protocol. However, there are cases where a network protocol doesn't suffice. In these cases, a business service may offer a native library supporting direct integration. Regardless of the use case, strict adherence to contracts must be maintained.

Figure 4: Vertical slices

Obvious, Yet Challenging

Microservices have become a predominant pattern by which large applications are assembled. Each microservice is concerned with a very specific set of functionality. By their very nature, microservices dictate that well-defined contracts are in place with which other microservices and systems can integrate. Microservices that are designed and implemented for cloud-native deployments can leverage cloud-native infrastructure to support several of the patterns discussed.

The patterns and diagrams presented here will look obvious to most. As mentioned, good architecture is "obvious." The challenge is adhering to it. Often, the biggest enemy of adherence is time. The pressure to meet delivery deadlines is real, and that is where cracks in the contracts appear. Given the multiple factors in play, there are times when compromises need to be made. Make a note, create a ticket, add a comment, and leave a trail so that the compromise gets addressed as quickly as possible.

Well-designed application architecture married with good processes supports longevity, which from a business perspective provides an excellent return on investment. Greenfield opportunities are fewer than opportunities to evolve existing applications. Regardless, bringing this all to bear can look intimidating. The key is to start somewhere. As a team, develop a plan and "make it so"!
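The inversion-of-control point raised earlier can be made concrete with a small, framework-free Java sketch. The class and interface names are hypothetical and not from the article; the intent is only to show constructor injection, where the dependency is supplied from the outside instead of constructed in place.

```java
// Constructor injection: the consumer declares what it needs, and the wiring happens
// at the edge of the application (or in a DI container such as Spring).
interface PaymentGateway {
    boolean charge(String accountId, long amountCents);
}

class InvoiceProcessor {
    private final PaymentGateway gateway;

    // The dependency is injected, not constructed here, so a test can pass a fake.
    InvoiceProcessor(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    boolean settle(String accountId, long amountCents) {
        return gateway.charge(accountId, amountCents);
    }
}

class Wiring {
    public static void main(String[] args) {
        PaymentGateway gateway = (accountId, amountCents) -> true; // stub implementation
        InvoiceProcessor processor = new InvoiceProcessor(gateway);
        System.out.println(processor.settle("acct-42", 1_999));
    }
}
```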
10 Things to Know When Using SHACL With GraphDB
By Henriette Harmse
Today I have one of those moments where I am absolutely sure that if I do not write this down, I will forget how to do this next time. For one of the projects I am working on, we need to do SHACL validation of RDF data that will be stored in Ontotext GraphDB. Here are the 10 things I needed to learn in doing this. Some of these are rather obvious, but some were less than obvious to me.

Number 1: To be able to do SHACL validation, your repository needs to be configured for SHACL when you create your repository. This cannot be done after the fact.

Number 2: It seems to be better to import your ontology (or ontologies) and data into different graphs. This is useful when you want to re-import your ontology (or ontologies) or your data, because then you can replace a specific named graph completely. This was very useful for me while prototyping. Screenshot below:

Number 3: SHACL shapes are imported into the named graph http://rdf4j.org/schema/rdf4j#SHACLShapeGraph by default. At configuration time, you can provide a different named graph or graphs for your SHACL shapes.

Number 4: To find the named graphs in your repository, you can do the following SPARQL query:

```sparql
select distinct ?g
where {
  graph ?g { ?s ?p ?o }
}
```

You can then query a specific named graph as follows:

```sparql
select *
from <myNamedGraph>
where {
  ?s ?p ?o .
}
```

Number 5: However, getting the named graphs does not return the SHACL named graph. On StackOverflow someone suggested that SHACL shapes can be retrieved using:

http://address:7200/repositories/myRepo/rdf-graphs/service?graph=http://rdf4j.org/schema/rdf4j#SHACLShapeGraph

However, this did not work for me. Instead, the following code worked reliably:

```java
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.WriterConfig;
import org.eclipse.rdf4j.rio.helpers.BasicWriterSettings;

import java.util.stream.Collectors;

public class RetrieveShaclShapes {
    public static void main(String[] args) {
        String address = args[0];        /* i.e. http://localhost/ */
        String repositoryName = args[1]; /* i.e. myRepo */

        HTTPRepository repository = new HTTPRepository(address, repositoryName);
        try (RepositoryConnection connection = repository.getConnection()) {
            Model statementsCollector = new LinkedHashModel(
                    connection.getStatements(null, null, null, RDF4J.SHACL_SHAPE_GRAPH)
                            .stream()
                            .collect(Collectors.toList()));
            Rio.write(statementsCollector, System.out, RDFFormat.TURTLE,
                    new WriterConfig().set(BasicWriterSettings.INLINE_BLANK_NODES, true));
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
```

...using the following dependency in the pom.xml:

```xml
<dependency>
    <groupId>org.eclipse.rdf4j</groupId>
    <artifactId>rdf4j-client</artifactId>
    <version>4.2.3</version>
    <type>pom</type>
</dependency>
```

Number 6: Getting the above code to run was not obvious since I opted to use a fat jar. I encountered an "org.eclipse.rdf4j.rio.UnsupportedRDFormatException: Did not recognise RDF format object" error. RDF4J uses the Java Service Provider Interface (SPI), which uses a file in META-INF/services of the jar to register parser implementations. The maven-assembly-plugin I used to generate the fat jar causes different jars to overwrite META-INF/services, thereby losing registration information.
The solution is to use the maven-shade-plugin, which merges META-INF/services rather than overwriting it. In your pom you need to add the following to your plugins configuration:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.4.1</version>
    <executions>
        <execution>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```

You can avoid this problem by using the separate jars rather than a single fat jar.

Number 7: Importing a new shape into the SHACL shape graph will cause new shape information to be appended. It will not replace the existing graph, even when you have both the "Enable replacement of existing data" and "I understand that data in the replaced graphs will be cleared before importing new data." options enabled, as seen in the next screenshot:

To replace the SHACL named graph, you need to clear it explicitly by running the following SPARQL command:

```sparql
clear graph <http://rdf4j.org/schema/rdf4j#SHACLShapeGraph>
```

For myself, I found it easier to update the SHACL shapes programmatically. Note that I made use of the default SHACL named graph:

```java
import org.eclipse.rdf4j.model.vocabulary.RDF4J;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;
import org.eclipse.rdf4j.rio.RDFFormat;

import java.io.File;

public class UpdateShacl {
    public static void main(String[] args) {
        String address = args[0];        /* i.e. http://localhost/ */
        String repositoryName = args[1]; /* i.e. myRepo */
        String shacl = args[2];
        File shaclFile = new File(shacl);

        HTTPRepository repository = new HTTPRepository(address, repositoryName);
        try (RepositoryConnection connection = repository.getConnection()) {
            connection.begin();
            connection.clear(RDF4J.SHACL_SHAPE_GRAPH);
            connection.add(shaclFile, RDFFormat.TURTLE, RDF4J.SHACL_SHAPE_GRAPH);
            connection.commit();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
```

Number 8: Programmatically, you can delete a named graph using this code and the same Maven dependency as we used above:

```java
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class ClearGraph {
    public static void main(String[] args) {
        String address = args[0];        /* i.e. http://localhost/ */
        String repositoryName = args[1]; /* i.e. myRepo */
        String graph = args[2];          /* i.e. http://rdf4j.org/schema/rdf4j#SHACLShapeGraph */

        ValueFactory valueFactory = SimpleValueFactory.getInstance();
        IRI graphIRI = valueFactory.createIRI(graph);

        HTTPRepository repository = new HTTPRepository(address, repositoryName);
        try (RepositoryConnection connection = repository.getConnection()) {
            connection.begin();
            connection.clear(graphIRI);
            connection.commit();
        }
    }
}
```

Number 9: If you update the shape graph with constraints that are violated by your existing data, you will need to fix your data first before you can upload your new shape definition.

Number 10: When uploading SHACL shapes, unsupported features fail silently. I had the idea of adding human-readable information to the shape definition to make it easier for users to understand validation errors. Unfortunately, "sh:name" and "sh:description" are not supported by GraphDB versions 10.0.2 and 10.2.0.
Moreover, it fails silently. In the Workbench, it will show that the shape loaded successfully, as seen in the next screenshot. However, in the logs I noticed warnings. As these are logged as warnings, I was expecting my shape to have loaded fine, except that triples pertaining to "sh:name" and "sh:description" would be skipped. However, my shape did not load at all. You can find the list of supported SHACL features here.

Conclusion

This post may come across as being critical of GraphDB. However, that is not the intention. I think it is rather a case of growing pains that are still being experienced around SHACL (and ShEx, I suspect) adoption. Resources that have been helpful for me in resolving issues are the GraphDB documentation and RDF4J, on which GraphDB is built.
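As a small companion to the snippets above, the named-graph query from Number 4 can also be run programmatically with the same rdf4j-client dependency. This is a minimal sketch; the repository address and name are placeholders, as in the article's own examples.

```java
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;

public class ListNamedGraphs {
    public static void main(String[] args) {
        String address = args[0];        /* i.e. http://localhost/ */
        String repositoryName = args[1]; /* i.e. myRepo */

        HTTPRepository repository = new HTTPRepository(address, repositoryName);
        try (RepositoryConnection connection = repository.getConnection()) {
            TupleQuery query = connection.prepareTupleQuery(QueryLanguage.SPARQL,
                    "select distinct ?g where { graph ?g { ?s ?p ?o } }");
            TupleQueryResult result = query.evaluate();
            try {
                // Prints each named graph IRI; the SHACL shape graph will not appear here,
                // which is exactly the behaviour described in Number 5.
                while (result.hasNext()) {
                    BindingSet bindings = result.next();
                    System.out.println(bindings.getValue("g"));
                }
            } finally {
                result.close();
            }
        }
    }
}
```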

Orchestration Pattern: Managing Distributed Transactions
As organizations migrate to the cloud, they desire to exploit this on-demand infrastructure to scale their applications. But such migrations are usually complex and need established patterns and control points to manage. In my previous blog posts, I covered a few of the proven designs for cloud applications. In this article, I'll introduce the Orchestration Pattern (also known as the Orchestrator Pattern) to add to the list. This technique allows the creation of scalable, reliable, and fault-tolerant systems. The approach can help us manage the flow and coordination among components of a distributed system, predominantly in a microservices architecture. Let's dive into a problem statement to see how this pattern works.

Problem Context

Consider a legacy monolithic retail e-commerce website. This complex monolith consists of multiple subdomains such as shopping baskets, inventory, payments, etc. When a client sends a request, the website performs a sequence of operations to fulfil the request. In this traditional architecture, each operation can be described as a method call. The biggest challenge for the application is scaling with demand. So, the organisation decided to migrate this application to the cloud. However, the monolithic approach that the application uses is too restricted and would limit scaling even in the cloud. Adopting a lift-and-shift approach to perform the migration would not reap the real benefits of the cloud.

Thus, a better migration would be to refactor the entire application and break it down by subdomains. The new services must be deployed and managed individually. The new system comes with all the improvements of distributed architecture. These distributed and potentially stateless services are responsible for their own subdomains. But the immediate question is how to manage a complete workflow in this distributed architecture. Let us try to address this question in the next section and explore more about the Orchestration Pattern.

Figure: Monolithic application migration to the cloud

What Is the Orchestration Pattern?

We have designed an appropriate architecture where all services operate within their bounded context. However, we still need a component that is aware of the entire business workflow. The missing element is responsible for generating the final response by communicating with all of the services. Think of it like an orchestra with musicians playing their instruments. In an orchestra, a central conductor coordinates and aligns the members to produce a final performance. The Orchestration Pattern also introduces a centralized controller or service known as the orchestrator, similar to a central conductor. The orchestrator does not perform business logic but manages complex business flows by calling independently deployed services, handling exceptions, retrying requests, maintaining state, and returning the final response.

Figure: Orchestrator Pattern

The figure above illustrates the pattern. It has three components: the orchestrator or central service, the business services that need coordination, and the communication channel between them. It is an extension of the Scatter-Gather pattern but involves a sequence of operations instead of executing a single task in parallel. Let's examine a use case to understand how the pattern works.

Use Case

Many industries, such as e-commerce, finance, healthcare, telecommunications, and entertainment, widely use the orchestrator pattern with microservices. By now, we also have a good understanding of the pattern.
In this section, I will talk about payment processing, which is relevant in many contexts, to detail the pattern in action. Consider a payment gateway system that mediates between a merchant and a customer bank. The payment gateway aims to facilitate secure transactions by managing and coordinating multiple participating services. When the orchestrator service receives a payment request, it triggers a sequence of service calls in the following order:

- Firstly, it calls the payment authorization service to verify the customer's payment card, the amount going out, and bank details. The service also confirms the merchant's bank and its status.
- Next, the orchestrator invokes the Risk Management Service to retrieve the transaction history of the customer and merchant to detect and prevent fraud.
- After this, the orchestrator checks for Payment Card Industry (PCI) compliance by calling the PCI Compliance Service. This service enforces the mandated security standards and requirements for cardholder data. Credit card companies require all online transactions to comply with these security standards.
- Finally, the orchestrator calls another microservice, the Transaction Service. This service converts the payment to the merchant's preferred currency if needed. The service then transfers funds to the merchant's account to settle the payment transaction.

Figure: Payment Gateway System Flow

After completing all the essential steps, the Orchestrator Service responds with a transaction completion status. At this point, the calling service may send a confirmation email to the buyer. The complete flow is depicted in the above diagram. It is important to note that this orchestration service is not just a simple API gateway that calls the APIs of different services. Instead, it is the only service with the complete context, and it manages all the steps necessary to finish the transaction. If we want to add another step, for example, the introduction of a new compliance requirement by the government, all we need to do is create a new service that ensures compliance and add it to the orchestration service. It's worth noting that the new addition may not affect the other services; they may not even be aware of it.

Implementation Details

The previous section demonstrated a practical use case for managing services using an orchestrator. However, below are a few tactics that can be used while implementing the pattern:

Services vs. Serverless
Mostly, following this pattern means having business logic that spreads across many services. However, there are specific situations when not all the business steps require execution, or only a few steps are necessary. Should these steps be deployed as functions instead of services in these scenarios? Events usually trigger functions, which shut down once they complete their job. Such an infrastructure can save us money compared to a service that remains active continuously and performs minimal tasks.

Recovery From Transient Failures
The orchestration pattern implementation can be challenging because it involves coordinating multiple services and workflows, which requires a different approach to designing and managing software systems than traditional monolithic architectures. The implementation must be able to handle potential transient failures, such as a network failure, service failure, or database failure. Below are a few ways to cater to such issues:

Retry Mechanism
Implementing a retry mechanism can improve resiliency when a service operation fails.
The retry mechanism should configure the number of retries allowed, the delay between retries, and the conditions under which to attempt retries (a minimal sketch appears at the end of this section).

Circuit Breaker Pattern
In case a service fails, the orchestrator must detect the failure, isolate the failed service, and give it a chance to recover. It can help the service heal without disruption and avoid complete system failure.

Graceful Degradation
If a service fails and becomes unavailable, the rest of the services should continue to operate. The orchestrator should look for fallback options to minimize the impact on end users, such as previously cached results or an alternate service.

Monitoring and Alerting
The entire business flow is distributed among various services when we operate with the Orchestration Pattern. Therefore, an effective monitoring and alerting solution is mandatory to trace and debug any failures. The solution must be capable of detecting any issues in real time and taking appropriate actions to mitigate the impact. It includes implementing auto-recovery strategies, such as restarting failed services or switching to a backup service, and setting up alerts to notify the operations team when exceptions occur. The logs generated by the orchestrator are also valuable for the operations team to troubleshoot errors. We can operate smoothly and meet user needs by proactively identifying and resolving issues.

Orchestration Service Failure
Finally, we must prepare for scenarios where the orchestrator itself fails while processing requests. For instance, in our payment gateway example, imagine a scenario where the orchestrator calls the Transaction Service to transfer the funds but crashes or loses the connection before getting a successful response for the completed transaction. It could lead to a frustrating user experience, with the risk of the customer being charged twice for the same product. To prevent such failure scenarios, we can adopt one of the following solutions:

Service Replication
Replicate the orchestration service across multiple nodes. The service can automatically fail over to a backup node when needed. With a load balancer that can detect and switch to the available node, the replication guarantees seamless service and prevents disruptions to the user.

Data Replication
Not only should we replicate the service, but we should also replicate the data to ensure data consistency. It enables the backup node to take over seamlessly without any data loss.

Request Queues
Implement queues as a buffer for requests when the orchestration service is down. The queue can hold incoming requests until the service is available again. Once the backup node is up and running, it can retrieve the requests from the queue buffer and process them in the correct order.

Why Use the Orchestration Pattern

The pattern comes with the following advantages:

- Orchestration makes it easier to understand, monitor, and observe the application, resulting in a better understanding of the core part of the system with less effort.
- The pattern promotes loose coupling. Each downstream service exposes an API interface and is self-contained, without any need to know about the other services.
- The pattern simplifies business workflows and improves the separation of concerns. Each service participates in a long-running transaction without any need to know about it.
- The orchestrator service can decide what to do in case of failure, making the system fault-tolerant and reliable.
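Here is a minimal, generic sketch of the retry-with-backoff idea mentioned above. The attempt count, delays, and the decision to treat every exception as transient are assumptions for illustration, not a production policy.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public final class Retry {
    // Retries the call up to maxAttempts times (assumed >= 1), doubling the delay
    // after each failed attempt. A real orchestrator would inspect the failure and
    // only retry errors it knows to be transient.
    public static <T> T withBackoff(Callable<T> call, int maxAttempts, Duration initialDelay)
            throws Exception {
        Duration delay = initialDelay;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // give up after the final attempt
                }
                Thread.sleep(delay.toMillis());
                delay = delay.multipliedBy(2);
            }
        }
        throw last;
    }
}
```

A caller in the orchestrator might wrap a single step, for example `Retry.withBackoff(() -> riskService.assess(request), 3, Duration.ofMillis(200))`.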
Important Considerations

The primary goal of this architectural pattern is to decompose the entire business workflow into multiple services, making it more flexible and scalable. Because of this, it's crucial to analyse and comprehend the business processes in detail before implementation. A poorly defined and overly complicated business process will lead to a system that is hard to maintain and scale.

Secondly, it's easy to fall into the trap of adding business logic to the orchestration service. Sometimes it's inevitable because certain functionalities are too small to justify a separate service. But the risk here is that if the orchestration service becomes too intelligent and performs too much business logic, it can evolve into a monolithic application that also happens to talk to microservices. So, it's crucial to keep track of every addition to the orchestration service and ensure that its work remains within the boundaries of orchestration. Maintaining the scope of the orchestration service will prevent it from becoming a burden on the system, leading to decreased scalability and flexibility.

Summary

Numerous organizations are adopting microservice patterns to handle their complex distributed systems. The orchestration pattern plays a vital role in designing and managing these systems. By centralizing control and coordination, the orchestration pattern enhances agility, scalability, and resilience, making it an essential tool for organizations looking to modernize their infrastructure.
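To make the orchestrator's role concrete, here is a minimal, hypothetical Java sketch of the payment flow described above. The service interfaces and method names are illustrative only; in a real system each call would go over the network to an independently deployed service.

```java
// Hypothetical service interfaces; in a real deployment each would be a remote call
// (REST, gRPC, messaging) to a separate microservice.
interface AuthorizationService { void authorize(PaymentRequest request); }
interface RiskService          { void assess(PaymentRequest request); }
interface ComplianceService    { void verify(PaymentRequest request); }
interface TransactionService   { String settle(PaymentRequest request); }

record PaymentRequest(String customerId, String merchantId, long amountCents) {}

class PaymentOrchestrator {
    private final AuthorizationService authorization;
    private final RiskService risk;
    private final ComplianceService compliance;
    private final TransactionService transaction;

    PaymentOrchestrator(AuthorizationService a, RiskService r,
                        ComplianceService c, TransactionService t) {
        this.authorization = a;
        this.risk = r;
        this.compliance = c;
        this.transaction = t;
    }

    // The orchestrator holds the workflow, not the business logic: it sequences the
    // calls and turns the final service response into the completion status.
    String process(PaymentRequest request) {
        authorization.authorize(request);   // step 1: verify card, amounts, banks
        risk.assess(request);               // step 2: fraud / risk checks
        compliance.verify(request);         // step 3: PCI compliance checks
        return transaction.settle(request); // step 4: currency conversion + transfer
    }
}
```

Adding a new step, such as a new government compliance check, would mean adding one more interface and one more call here, leaving the existing services untouched.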

By Gaurav Gaur
What Is Advertised Kafka Address?
Let's start with the basics. After successfully starting a Redpanda or Apache Kafka® cluster, you want to stream data into it right away. No matter what tool and language you choose, you will immediately be asked for a list of bootstrap servers for your client to connect to. This bootstrap server is just for your client to initiate the connection to one of the brokers in the cluster, which will then provide your client with an initial set of metadata. The metadata tells the client which brokers are currently available and which broker hosts the leader of each partition, so that the client can initiate a direct connection to all brokers individually. The diagram below will give you a better idea.

The client figures out where to stream the data based on the info given. Depending on the number of partitions and where they are hosted, your client will push to or pull from the partitions via their host brokers. Both the Kafka address and the advertised Kafka address are needed. The Kafka address is used by Kafka brokers to locate each other, and the advertised address is used by the client to find them. In this post, we'll help you understand the advertised Kafka address, how to use it in Docker and Kubernetes (K8s), and how to debug it.

When To Use the Kafka Address and Advertised Kafka Address

When starting up your Redpanda cluster, the Kafka address is used to bind the Redpanda service to its host and use the established endpoint to start accepting requests:

```
{LISTENER_NAME}://{HOST_NAME}:{PORT}
```

The broker uses the advertised Kafka address in the metadata, so your client will take that address to locate other brokers. To set it, use --kafka-addr and --advertise-kafka-addr with RPK, or kafka_api and advertised_kafka_api inside /etc/redpanda/redpanda.yaml for each broker.

Until this point, everything seems straightforward, and you might start to wonder whether the Kafka address and advertised Kafka address are actually redundant. It starts to get tricky when your client has no visibility into the cluster host: if you pass the same internal address to the client, it won't be able to resolve it. So, we need to modify the advertised Kafka address to let the Kafka client understand and reach the cluster from outside (i.e., via an external IP).

How To Use the Kafka Address and Advertised Kafka Address in Docker (Containers)

Another problem often comes up while running Redpanda brokers in Docker (containers). The same applies to other more complex network topologies. But fear not, you already know the mechanics. All you need to do is put the right address for clients that reside in different places. When running a Docker container, it creates its own network by default, but in the case where you need to have multiple containers communicating, you will need to set up a network (sometimes even multiple layers of network) by bridging them together.

We know that the Kafka address is used for binding to the host, so we'll just use 0.0.0.0, as it will bind to all interfaces in the host, with any port of your choosing (do not use an already occupied port). An example could be 0.0.0.0:9092 and 0.0.0.0:9095 for each broker running in a Docker container. Each container registers a name in the network; if your client is trying to access a broker within the network, all you need to do is set the advertised Kafka address to the broker's registered hostname in the network. For example, if your first Redpanda container registered its name as Building-A, you can set the advertised Kafka address to Building-A:9092.
For clients outside of the Docker network, which don't have access to the network's routing table, the advertised Kafka address will need to be set to the host that the Docker containers are running on, so the client can find it. And don't forget that you also need to expose the port and associate it with the host. But what happens if you have clients in both places that want to access the cluster at the same time? Simple: add multiple listeners! Each listener will return a set of advertised Kafka addresses for clients in a different environment. Here's a diagram for you.

Using the Kafka Address and Advertised Kafka Address in K8s

Since Kubernetes is a platform that orchestrates containers, the concept is very similar to running Docker containers, but on a larger scale. In a typical Redpanda cluster, you will want to install a single Redpanda broker on an individual worker node. All the pods running in K8s get assigned an internal address that is only visible inside the Kubernetes environment; if the client is running outside of Kubernetes, it will need a way to find the brokers. So you can use a NodePort to expose the port and use the public IP address of the hosting worker node.

For the Kafka address, as usual, just bind it to the local container, for example, 0.0.0.0:9092 and 0.0.0.0:9095. As for the advertised Kafka address, we will need to set two listeners: one for internal connections, and one for external. For internal clients, we can simply use the generated internal service name. For example, if your service name is set to Building-A, the advertised Kafka address will be internal://Building-A:9092. For the external listener, use the hosting worker node's public IP (or domain name) with the port exposed via NodePort, where you will be assigned a new port. For example, if your first worker node has the public IP (domain) XXX-Blvd, and the new port assigned is 39092, you can set the advertised Kafka address to external://XXX-Blvd:39092.

How To Debug the Advertised Kafka Address

When you are able to connect to your cluster with Redpanda Keeper (rpk) but your client throws errors like "ENOTFOUND", check whether the advertised_kafka_api is correctly set, with an address that can be resolved by your client:

```shell
> curl localhost:9644/v1/node_config
{"advertised_kafka_api":[{"name":"internal","address":"0.0.0.0","port":9092},{"name":"external","address":"192.186.0.3","port":19092}]....}
```

If you are running Docker, find out which port 9644 was exposed from Docker:

```shell
> docker port redpanda-0
8081/tcp -> 0.0.0.0:18081
9644/tcp -> 0.0.0.0:19644
18082/tcp -> 0.0.0.0:18082
19092/tcp -> 0.0.0.0:19092
```

And cURL:

```shell
> curl localhost:19644/v1/node_config
{"advertised_kafka_api":[{"name":"internal","address":"redpanda-0","port":9092},{"name":"external","address":"localhost","port":19092}]....}
```

If you are running Kubernetes, find out what the exposed admin port is.
```shell
> kubectl get svc redpanda-external -n redpanda
NAME                TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)
redpanda-external   NodePort   10.100.87.34   <none>        9644:31644/TCP,9094:31092/TCP,8083:30082/TCP,8084:30081/TCP   3h53m
```

```shell
> kubectl get pod -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName -n redpanda
NAME         NODE
redpanda-0   ip-1-168-57-208.xx.compute.internal
redpanda-1   ip-1-168-1-231.xx.compute.internal
redpanda-2   ip-1-168-83-90.xx.compute.internal
```

```shell
> kubectl get nodes -o=custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="ExternalIP")].address'
NAME                              IP
ip-1.us-east-2.compute.internal   3.12.84.230
ip-1.us-east-2.compute.internal   3.144.255.61
ip-1.us-east-2.compute.internal   3.144.144.138
```

And cURL:

```shell
> curl 3.12.84.230:31644/v1/node_config
{"advertised_kafka_api":[{"name":"internal","address":"redpanda-1.redpanda.redpanda.svc.cluster.local.","port":9093},{"name":"default","address":"3.12.84.230","port":31092}].....}
```

Lastly, check that you are connecting to the correct listener (port). And you're all done!

Conclusion

If you made it this far, you should now have a better understanding of what the advertised Kafka address is and how you can use it in Docker and K8s. To learn more about Redpanda, check out our documentation and browse the Redpanda blog for tutorials and guides on how to easily integrate with Redpanda. For a more hands-on approach, take Redpanda's free Community edition for a test drive!
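To close the loop from the client side, here is a minimal Java producer that connects through the external listener. The topic name and the XXX-Blvd:39092 address are the placeholder values used above; any Kafka-protocol client works the same way against Redpanda, and this sketch assumes the kafka-clients library is on the classpath.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExternalClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The bootstrap address must be one the client can resolve; for the external
        // listener above, that is the advertised address, e.g. XXX-Blvd:39092.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "XXX-Blvd:39092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // After the bootstrap connection, the metadata returned by the broker tells
            // the producer which broker hosts the partition leader for this topic.
            producer.send(new ProducerRecord<>("test-topic", "key", "hello"));
            producer.flush();
        }
    }
}
```

If this client runs inside the Docker network or the Kubernetes cluster instead, the only change is the bootstrap address, e.g. the internal listener Building-A:9092.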

By Christina Lin
Assessment of Scalability Constraints (and Solutions)
This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report

Our approach to scalability has gone through a tectonic shift over the past decade. Technologies that were staples in every enterprise back end (e.g., IIOP) have vanished completely with a shift to approaches such as eventual consistency. This shift introduced some complexities with the benefit of greater scalability. The rise of Kubernetes and serverless further cemented this approach: spinning up a new container is cheap, turning scalability into a relatively simple problem. Orchestration changed our approach to scalability and facilitated the growth of microservices and observability, two key tools in modern scaling.

Horizontal to Vertical Scaling

The rise of Kubernetes correlates with the microservices trend, as seen in Figure 1. Kubernetes heavily emphasizes horizontal scaling, in which replications of servers provide scaling, as opposed to vertical scaling, in which we derive performance and throughput from a single host (many machines vs. few powerful machines).

Figure 1: Google Trends chart showing correlation between Kubernetes and microservices (Data source: Google Trends)

In order to maximize horizontal scaling, companies focus on the idempotency and statelessness of their services. This is easier to accomplish with smaller isolated services, but the complexity shifts in two directions:

- Ops – managing the complex relations between multiple disconnected services
- Dev – quality, uniformity, and consistency become an issue

Complexity doesn't go away because of a switch to horizontal scaling. It shifts to a distinct form handled by a different team, such as network complexity instead of object graph complexity. The consensus of starting with a monolith isn't just about the ease of programming. Horizontal scaling is deceptively simple thanks to Kubernetes and serverless. However, this masks a level of complexity that is often harder to gauge for smaller projects.

Scaling is a process, not a single operation; processes take time and require a team. A good analogy is physical traffic: we often reach a slow junction and wonder why the city didn't build an overpass. The reason could be that an overpass would ease the jam at the current junction but create a much bigger traffic jam down the road. The same is true for scaling a system — all of our planning might make matters worse, meaning that a faster server can overload a node in another system. Scalability is not performance!

Scalability vs. Performance

Scalability and performance can be closely related, in which case improving one can also improve the other. However, in other cases, there may be trade-offs between scalability and performance. For example, a system optimized for performance may be less scalable because it may require more resources to handle additional users or requests. Meanwhile, a system optimized for scalability may sacrifice some performance to ensure that it can handle a growing workload. To strike a balance between scalability and performance, it's essential to understand the requirements of the system and the expected workload. For example, if we expect a system to have a few users, performance may be more critical than scalability. However, if we expect a rapidly growing user base, scalability may be more important than performance. We see this expressed perfectly with the trend towards horizontal scaling.
Modern Kubernetes systems usually focus on many small VM images with a limited number of cores, as opposed to powerful machines/VMs. A system focused on performance would deliver better performance using a few high-performance machines.

Challenges of Horizontal Scale

Horizontal scaling brought with it a unique level of problems that birthed new fields in our industry: platform engineers and SREs are prime examples. The complexity of maintaining a system with thousands of concurrent server processes is fantastic. Such a scale makes it much harder to debug and isolate issues. The asynchronous nature of these systems exacerbates this problem. Eventual consistency creates situations we can't realistically replicate locally, as we see in Figure 2. When a change needs to occur on multiple microservices, they create an inconsistent state, which can lead to invalid states.

Figure 2: Inconsistent state may exist between wide-sweeping changes

Typical solutions used for debugging dozens of instances don't apply when we have thousands of instances running concurrently. Failure is inevitable, and at these scales, it usually amounts to restarting an instance. On the surface, orchestration solved the problem, but the overhead and resulting edge cases make fixing such problems even harder.

Strategies for Success

We can answer such challenges with a combination of approaches and tools. There is no "one size fits all," and it is important to practice agility when dealing with scaling issues. We need to measure the impact of every decision and tool, then form decisions based on the results. Observability serves a crucial role in measuring success. In the world of microservices, there's no way to measure the success of scaling without such tooling. Observability tools also serve as a benchmark to pinpoint scalability bottlenecks, as we will cover soon enough.

Vertically Integrated Teams

Over the years, developers have tended to silo themselves based on expertise, and as a result, we formed teams to suit these specialties. This is problematic. An engineer making a decision that might affect resource consumption, or might impact such a tradeoff, needs to be educated about the production environment. When building a small system, we can afford to ignore such issues, but as scale grows, we need a heterogeneous team that can advise on such matters. By assembling a full-stack team that is feature-driven and small, the team can handle all the different tasks required. However, this isn't a balanced team. Typically, a DevOps engineer will work with multiple teams simply because there are far more developers than DevOps engineers. This is logistically challenging, but the division of work makes more sense this way. As a particular microservice fails, responsibilities are clear, and the team can respond swiftly.

Fail-Fast

One of the biggest pitfalls to scalability is the fail-safe approach. Code might fail subtly and run in non-optimal form. A good example is code that tries to read a response from a website. In case of failure, we might return cached data to facilitate a fail-safe strategy. However, since the delay still happens, we still wait for the response. It seems like everything is working correctly thanks to the cache, but the performance sits at the timeout boundaries. This delays the processing. With asynchronous code, this is hard to notice and doesn't put an immediate toll on the system. Thus, such issues can go unnoticed.
A request might succeed in the testing and staging environments but always fall back to the fail-safe process in production. Failing fast includes several advantages for these scenarios:

- It makes bugs easier to spot in the testing phase. Failure is relatively easy to test, as opposed to durability.
- A failure will trigger fallback behavior faster and prevent a cascading effect.
- Problems are easier to fix, as they are usually in the same isolated area as the failure.

API Gateway and Caching

Internal APIs can leverage an API gateway to provide smart load balancing, caching, and rate limiting. Typically, caching is the most universal performance tip one can give. But when it comes to scale, failing fast might be even more important. In typical cases of heavy load, the division of users is stark. By limiting the heaviest users, we can dramatically shift the load on the system.

Distributed caching is one of the hardest problems in programming. Implementing a caching policy over microservices is impractical; we need to cache within an individual service and use the API gateway to alleviate some of the overhead. Level 2 caching is used to store database data in RAM and avoid DB access. This is often a major performance benefit that tips the scales, but sometimes it doesn't have an impact at all. Stack Overflow recently discovered that database caching had no impact on their architecture, and this was because higher-level caches filled in the gaps and grabbed all the cache hits at the web layer. By the time a call reached the database layer, it was clear this data wasn't in cache. Thus, they always missed the cache, and it had no impact. Only overhead. This is where caching in the API gateway layer becomes immensely helpful. This is a system we can manage centrally and control, unlike the caching in an individual service that might get polluted.

Observability

What we can't see, we can't fix or improve. Without a proper observability stack, we are blind to scaling problems and to the appropriate fixes. When discussing observability, we often make the mistake of focusing on tools. Observability isn't about tools — it's about questions and answers. When developing an observability stack, we need to understand the types of questions we will have for it and then provide two means to answer each question. It is important to have two means: observability is often unreliable and misleading, so we need a way to verify its results. However, if we have more than two ways, it might mean we over-observe a system, which can have a serious impact on costs.

A typical exercise to verify an observability stack is to hypothesize common problems and then find two ways to solve them. For example, for a performance problem in microservice X:

- Inspect the logs of the microservice for errors or latency — this might require adding a specific log for coverage.
- Inspect Prometheus metrics for the service.

Tracking a scalability issue within a microservices deployment is much easier when working with traces. They provide a context and a scale. When an edge service runs into an N+1 query bug, traces show that almost immediately when they're properly integrated throughout.

Segregation

One of the most important scalability approaches is the separation of high-volume data. Modern business tools save tremendous amounts of metadata for every operation. Most of this data isn't applicable to the day-to-day operations of the application. It is metadata meant for business intelligence, monitoring, and accountability.
We can stream this data to remove the immediate need to process it. We can store such data in a separate time-series database to alleviate the scaling challenges on the current database.

Conclusion

Scaling in the age of serverless and microservices is a very different process than it was a mere decade ago. Controlling costs has become far harder, especially observability costs, which in the case of logs often exceed 30 percent of the total cloud bill. The good news is that we have many new tools at our disposal, including API gateways, observability, and much more. By leveraging these tools with a fail-fast strategy and tight observability, we can iteratively scale the deployment. This is key, as scaling is a process, not a single action. Tools can only go so far, and often we can overuse them. In order to grow, we need to review and even eliminate unnecessary optimizations if they are not applicable.

This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report
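As a hedged illustration of the fail-fast point above, here is a short sketch using Java's built-in HttpClient. The URL, timeouts, and cache value are hypothetical; the intent is only to show explicit, short timeouts so the fallback path is taken quickly and the degraded dependency becomes visible instead of hiding behind the cache.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class QuoteFetcher {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofMillis(300))   // fail fast on connect
            .build();

    // Returns the remote response, or the cached value quickly if the call fails.
    String fetchOrFallback(String cached) {
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/quote"))
                .timeout(Duration.ofMillis(500))      // fail fast on the response as well
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            // The fallback is still used, but the failure surfaces fast and can be logged
            // or counted by the observability stack instead of sitting at timeout boundaries.
            return cached;
        }
    }
}
```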

By Shai Almog
NoSQL vs SQL: What, Where, and How
As a beginner, it is essential to understand the two most commonly used types of databases: SQL and NoSQL. In this article, I have tried my best to provide a comprehensive guide that will help beginners understand the differences between SQL and NoSQL, their use cases, and the scenarios in which each performs better than the other. The information here will provide you with an overview of SQL and NoSQL databases and highlight the advantages and disadvantages of each. By the end of this article, you will be able to make an informed decision on which type of database to use for your project. Whether you are a software developer, a data analyst, or a business owner looking to store and manage your data, this information is valuable and relevant to you. So, let's dive in and explore the world of SQL and NoSQL databases.

Facts About SQL and NoSQL

- SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce after learning about the relational model from Edgar F. Codd in the early 1970s.
- The term NoSQL was used by Carlo Strozzi in 1998.
- Oracle brought the first commercial relational database to market in 1979, followed by DB2, SAP Sybase ASE, and Informix.
- NoSQL databases are not a replacement for relational databases but rather offer an alternative solution for certain use cases.
- SQL databases offer a high degree of data consistency and transactional support, making them a popular choice for applications that require data integrity and reliability.
- NoSQL databases are often horizontally scalable, meaning they can easily distribute data across multiple servers, allowing for greater scalability.
- The CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that any distributed data store can provide only two of the three guarantees: consistency, availability, and partition tolerance.
- There are no hard and fast rules for when to use SQL or NoSQL, and the best choice for a particular project will depend on the specific needs and constraints of the project.
- SQL databases are generally more widely used than NoSQL databases. According to a survey by DB-Engines, the top five most popular databases in terms of popularity and usage are all SQL databases (Oracle, MySQL, Microsoft SQL Server, PostgreSQL, and SQLite).

Real-World Applications That Use SQL or NoSQL

- Twitter uses a NoSQL database (Cassandra) to store and manage the massive amount of data generated by its users. They say, "Our geo team uses it to store and query their database of places of interest. The research team uses it to store the results of data mining done over our entire user base."
- Netflix uses a combination of SQL and NoSQL databases to store and manage data related to its streaming service. The company uses a SQL database (MySQL) to store structured, transactional data such as subscriber information and billing records, and a NoSQL database (Cassandra) to store data related to user interactions and recommendations.
- LinkedIn uses a combination of SQL and NoSQL databases to store and manage data related to its professional networking platform. Espresso is LinkedIn's online, distributed, fault-tolerant NoSQL database that currently powers approximately 30 LinkedIn applications, including Member Profile, InMail (LinkedIn's member-to-member messaging system), portions of the Homepage, and mobile applications.
- Facebook uses MySQL, an open-source database developed by Oracle, as a primary database that powers some of Facebook's most important workloads.
They introduced MyRocks, a new MySQL database engine, with the goal of improving space and write efficiency beyond what was possible with compressed InnoDB.
- Stack Overflow uses SQL Server. Nick Craver wrote in one of his blogs that Stack Overflow uses SQL Server as its single source of truth. All data in Elastic and Redis comes from SQL Server. They run two SQL Server clusters with AlwaysOn Availability Groups.

Use Cases for SQL and NoSQL in Different Businesses

SQL:
- Financial systems
- Customer relationship management (CRM) systems
- Inventory management systems
- Human resources (HR) systems
- Data warehousing and business intelligence (BI) systems

NoSQL:
- Social media networks
- E-commerce websites
- Real-time analytics systems
- Mobile app backends
- Content management systems (CMS)

These are just a few examples, and there are many other use cases for both SQL and NoSQL. The best technology for a particular project will depend on the specific needs and constraints of the project.

Database in the Cloud

Most major cloud providers offer a variety of SQL and NoSQL databases as a service. Here are a few examples of the types of databases offered by some of the major cloud providers:

Amazon Web Services (AWS):
- SQL: Amazon RDS (MySQL, PostgreSQL, Oracle, Microsoft SQL Server)
- NoSQL: Amazon DynamoDB (key-value), Amazon DocumentDB (document), Amazon Neptune (graph)

Microsoft Azure:
- SQL: Azure SQL Database (relational), Azure Database for MySQL, Azure Database for PostgreSQL
- NoSQL: Azure Cosmos DB (multi-model), Azure Table Storage (key-value)

Google Cloud Platform:
- SQL: Cloud SQL (MySQL, PostgreSQL)
- NoSQL: Cloud Firestore (document), Cloud Bigtable (wide-column), Cloud Datastore (document)

| Database | Creator | Type | Cloud Provider |
| --- | --- | --- | --- |
| MySQL | Oracle | Relational | Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure |
| Oracle | Oracle | Relational | Amazon Web Services (AWS), Microsoft Azure |
| PostgreSQL | PostgreSQL Global Development Group | Relational | Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure |
| Microsoft SQL Server | Microsoft | Relational | Amazon Web Services (AWS), Microsoft Azure |
| MongoDB | MongoDB Inc. | Document | Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure |
| Cassandra | Apache | Columnar | Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure |
| Couchbase | Couchbase Inc. | Document | Amazon Web Services (AWS) |
| Redis | Salvatore Sanfilippo | Key-value | Amazon Web Services (AWS), Microsoft Azure |
| Neo4j | Neo4j Inc. | Graph | Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure |

Best Practices on Selecting Between SQL and NoSQL

When choosing between SQL and NoSQL for a particular project, there are a few best practices to keep in mind (this is not an exhaustive list):

- Understand the specific needs and constraints of your project. This will help you determine which technology is the best fit.
- Consider the type and structure of the data you are working with. SQL is well-suited for structured, transactional data with well-defined relationships, while NoSQL is better for handling unstructured, high-volume data with less-defined relationships. (Again, your project and use case will decide this.)
- Evaluate the scalability and performance requirements of your application. You may have heard that NoSQL databases are generally more scalable and performant than SQL databases, but this may not always be the case.
Tools To Help Decide To help decide between SQL and NoSQL for an enterprise application, you might consider using tools such as database performance benchmarking tools, database design and modeling tools, and database management and monitoring tools. Some examples of these types of tools include: MySQL Workbench MongoDB Compass DataGrip DBeaver Redis Desktop Manager Causes of Database Implementation Failures Poorly designed data models or schemas that do not meet the needs of the application Inadequate performance testing or optimization, resulting in poor database performance Lack of robust backup and recovery processes, leading to data loss or corruption Insufficient planning or resources for database maintenance and support Common Failures and Exceptions Connection failures - When there is a problem establishing a connection to the database, such as when the database server is not running or the connection details are incorrect Resolution: Establishing robust connection management and retry strategies to handle connection failures (a minimal sketch follows this list) Query failures - A problem while executing a query, such as when the query syntax is invalid or the query takes too long to execute Resolution: Debugging and optimizing queries to improve performance Transaction failures - A problem with a database transaction, such as a transaction being canceled or rolled back due to a deadlock or a constraint violation Resolution: Implementing proper transaction management to minimize the risk of transaction failures Data corruption - A problem with the data stored in the database, such as when data becomes corrupted or is lost due to a hardware failure or a software bug Resolution: Implementing backup and recovery strategies to mitigate the risk of data loss or corruption Performance issues - Queries running slowly or the database consuming too many resources Resolution: Monitoring and tuning the database to identify and address performance issues
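As a concrete example of the connection-failure resolution above, here is a minimal retry-with-backoff sketch around a JDBC connection attempt. The JDBC URL and credentials are placeholders, and a real implementation would typically add jitter, cap the backoff, or delegate to a connection pool's built-in retry handling; this only illustrates the idea.
Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectionRetry {

    // Tries to open a connection, backing off between attempts.
    static Connection connectWithRetry(String url, String user, String password,
                                       int maxAttempts, long initialBackoffMillis) throws SQLException {
        SQLException lastFailure = null;
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return DriverManager.getConnection(url, user, password);
            } catch (SQLException e) {
                lastFailure = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw lastFailure;
                }
                backoff *= 2; // exponential backoff
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = connectWithRetry(
                "jdbc:postgresql://localhost:5432/appdb", "app", "secret", 5, 500)) {
            System.out.println("Connected: " + conn.getMetaData().getDatabaseProductName());
        }
    }
}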
Deployment Architectures for Databases Standalone server: In this architecture, the database is installed on a single server and accessed directly by the application. This is the simplest and easiest deployment option, but it is not suitable for high-scale or high-availability applications. Replication: Here, the database is deployed on multiple servers, with each server hosting a copy of the data. The servers are configured in a replica set, and one of the servers is designated as the primary. Applications write to the primary and the data is automatically replicated to the other servers. This provides improved availability and fault tolerance but does not offer horizontal scalability. Sharding: As with replication, the database is deployed on multiple servers, but here the data is partitioned across the servers rather than copied. The partitions are called shards, and the servers are organized into a sharded cluster. Applications write to the cluster and the data is automatically routed to the appropriate shard. This style provides improved scalability and performance while requiring additional configuration and management. Cloud-managed service: A cloud provider manages the database, which is accessed through APIs. This may be the easiest way to deploy and manage a database. On the other hand, it might be more expensive, and it offers less control and customization than the other options. What Can Cause Performance Issues in a Database Insufficient resources Poorly designed queries Indexing issues Unoptimized schemas Sharding issues Network latency or bandwidth My Personal Experience Using SQL and NoSQL I was part of an enterprise API development team that initially used a SQL database. Later, when our organization adopted NoSQL, we moved to it assuming that we would scale easily and everything else would be smooth. However, we started facing challenges with scale, performance, indexing, and more. One of the challenges of using NoSQL databases is that they often lack the strong data consistency guarantees that are provided by relational databases. You need to keep "eventual consistency" in mind in a distributed environment. This means that it is possible for data to become inconsistent or outdated in certain scenarios, such as when multiple clients are updating the same data simultaneously. As beginners, we never thought of this scenario, and we gradually learned and redesigned the database architecture to move from records to documents. NoSQL databases are designed to handle large amounts of data and high read and write throughput, but optimizing their performance requires a deep understanding of the database's architecture and configuration settings. A shift away from a relational-only mindset is required, where a database is simply a place to store data in one specific structure. Think of moving from a stored procedure full of business logic to app-only business logic: there will be no logic inside the database. One has to be better at data modeling and designing indexes to use NoSQL to its full potential. Where To Go From Here SQL: The official SQL website W3Schools SQL tutorial Codecademy SQL course NoSQL: NoSQL Wikipedia page MongoDB NoSQL tutorial Cassandra NoSQL tutorial Both NoSQL and SQL databases have their unique advantages and limitations, and the choice between them depends on the specific requirements and use cases of a particular project. It is important to carefully consider the trade-offs and benefits of each option before making a decision. Note: The information and examples provided in these topics are for educational purposes only. The specific database implementation and deployment may vary depending on the specific requirements and constraints of the application. It is important to carefully plan and design the database deployment and architecture to ensure optimal performance and scalability. I hope you've learned as much as I have. Until next time: sharing is caring.

By Manas Dash CORE
[DZone Survey] Share Your Expertise and Take our 2023 Web, Mobile, and Low-Code Apps Survey
[DZone Survey] Share Your Expertise and Take our 2023 Web, Mobile, and Low-Code Apps Survey

Do you consider yourself a developer? If "yes," then this survey is for you. We need you to share your knowledge on web and mobile development, how (and if) you leverage low code, scalability challenges, and more. The research covered in our Trend Reports depends on your feedback and helps shape the report. Our April Trend Report this year focuses on you, the developer. This Trend Report explores development trends and how they relate to scalability within organizations, highlighting application challenges, code, and more. And this is where we could use your insights! We're asking for ~10 minutes of your time to share your experience. Let us know your thoughts on the future of the developer! And enter for a chance to win one of 10 $50 gift cards! Take Our Survey Over the coming weeks, we will compile and analyze data from hundreds of DZone members to help inform the "Key Research Findings" for our upcoming April Trend Report, Development at Scale: An Exploration of Mobile, Web, and Low-Code Applications. Your responses help shape the narrative of our Trend Reports, so we cannot do this without you. The DZone Publications team thanks you in advance for all your help!

By Caitlin Candelmo
REST vs. Messaging for Microservices
REST vs. Messaging for Microservices

This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report A microservices architecture is an established pattern for building a complex system that consists of loosely coupled modules. It is one of the most talked-about software architecture trends in the last few years. It seems to be a surprisingly simple idea to break a large, interdependent system into many small, lightweight modules that can make software management easier. Here's the catch: After you have broken down your monolith application into small modules, how are you supposed to connect them together in a meaningful way? Unfortunately, there is no single right answer to this question, but as is so often the case, there are a few approaches that depend on the application and the individual use case. Two common protocols used in microservices are HTTP request/response with resource APIs and lightweight asynchronous messaging when communicating updates across several microservices. Let's explore these protocols. Types of Communication Microservices can communicate through many different modes of communication, each one targeting a different use case. These types of communications can be primarily classified in two dimensions. The first dimension defines if the communication protocol is synchronous or asynchronous:
SYNCHRONOUS vs. ASYNCHRONOUS COMMUNICATION
Aspect | Synchronous | Asynchronous
Communication pattern | The client sends a request and waits for a response from the server. | Communication is not in sync, which means it does not happen in real time.
Protocols | HTTP/HTTPS | AMQP, MQTT
Coupling | The client code can only continue its task further when it receives the server response. | In the context of distributed messaging, coupling implies that request processing will occur at an arbitrary point in time.
Failure isolation | It requires the downstream server to be available or the request fails. | If the consumer fails, the sender can still send messages. The messages will be picked up when the consumer recovers.
Table 1
The second dimension defines if the communication has a single receiver or multiple receivers:
COMMUNICATION VIA SINGLE vs. MULTIPLE RECEIVERS
Aspect | Single Receiver | Multiple Receivers
Communication pattern | Point-to-point communication that delivers a message to exactly one consumer reading from the channel; the message is processed only once. | Communication from the sender is available to multiple receivers.
Example | Well-suited for sending asynchronous commands from one microservice to another. | The publish/subscribe mechanism, where a publisher publishes a message to a channel and multiple subscribers/receivers can subscribe to the channel to receive the message asynchronously.
Table 2
The most common type of communication between microservices is single-receiver communication with a synchronous protocol like HTTP/HTTPS when invoking a REST API. Microservices typically use messaging protocols for asynchronous communication with one another. This asynchronous communication may involve a single receiver or multiple receivers depending on the application's needs. Representational State Transfer Representational state transfer (REST) is a popular architectural style for request and response communication, and it can serve as a good example of the synchronous communication type. It is based on the HTTP protocol, embracing verbs such as GET, POST, PUT, DELETE, etc. In this communication pattern, the caller waits for a response from the server. Figure 1: REST API-based communication
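To illustrate this blocking request/response style, here is a minimal sketch of a synchronous REST call using the JDK's built-in java.net.http.HttpClient. The endpoint URL is a placeholder chosen for illustration; the point is simply that the calling thread waits on send() until the server responds or the call fails.
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SynchronousRestCall {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/orders/42")) // placeholder endpoint
                .GET()
                .build();

        // The caller blocks here until the response (or an error) comes back.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}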
REST is the most commonly used architectural style for communication between services, but heavy reliance on this type of communication has some negative consequences when it comes to a microservices architecture: Multiple round trips (latency) – The client often needs to make multiple round trips to the server to fetch all the data it requires. Each endpoint specifies a fixed amount of data, and in many cases, that data is only a subset of what a client needs to populate their page. Blocking – When invoking a REST API, the client is blocked while waiting for a server response. This may hurt application performance if the application thread could otherwise be processing other concurrent requests. Tight coupling – The client and server need to know about each other. This increases complexity over time and reduces portability. Messaging Messaging is widely used in a microservices architecture and follows the asynchronous protocol. In this pattern, a service sends a message without waiting for a response, and one or more services process the message asynchronously. Asynchronous messaging provides many benefits but also brings challenges such as idempotency, message ordering, poison message handling, and the complexity of the message broker, which must be highly available. It is important to note the difference between asynchronous I/O and the asynchronous protocol. Asynchronous I/O means that the calling thread is not blocked while the I/O operations are executed. This is an implementation detail in terms of the software design. The asynchronous protocol means the sender does not wait for a response. Figure 2: Messaging-based communication Asynchronous messaging has some advantages over synchronous messaging: Loose coupling – The message producer does not need to know about the consumer(s). Multiple subscribers – Using a publisher/subscriber (pub/sub) model, multiple consumers can subscribe to receive events. Resiliency or failure isolation – If the consumer fails, the producer can still send messages. The messages will be picked up when the consumer recovers from failure. This is especially useful in a microservices architecture because each microservice has its own lifecycle. Non-blocking – The producers and consumers can send and process messages at their own pace. Though asynchronous messaging has many advantages, it comes with some tradeoffs: Tight coupling with the messaging infrastructure – Using a particular vendor/messaging infrastructure may cause tight coupling with that infrastructure. It may become difficult to switch to another vendor/messaging infrastructure later. Complexity – Handling asynchronous messaging may not be as easy as designing a REST API. Duplicate messages must be handled by de-duplicating or making the operations idempotent. It is hard to implement request-response semantics using asynchronous messaging. To send a response, another queue and a way to correlate request and response messages are both needed. Debugging can also be difficult as it is hard to identify which request in Service A caused the wrong behavior in Service B.
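As one way to handle the duplicate-message tradeoff just mentioned, here is a minimal sketch of an idempotent consumer that remembers the message IDs it has already processed. The in-memory set is only for illustration; a real service would typically persist processed IDs, or rely on a unique constraint in its datastore, so that de-duplication survives restarts.
Java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentConsumer {

    // IDs of messages that have already been handled (illustrative in-memory store)
    private final Set<String> processedMessageIds = ConcurrentHashMap.newKeySet();

    public void onMessage(String messageId, String payload) {
        // add() returns false if the ID was already present, i.e., a duplicate delivery
        if (!processedMessageIds.add(messageId)) {
            System.out.println("Skipping duplicate message " + messageId);
            return;
        }
        handle(payload);
    }

    private void handle(String payload) {
        // business logic goes here; it runs at most once per message ID
        System.out.println("Processing " + payload);
    }
}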
Asynchronous messaging has matured into a number of messaging patterns. These patterns apply to scenarios when several parts of a distributed system must communicate with one another in a dependable and scalable way. Let's take a look at some of these patterns. Pub/Sub Pattern The pub/sub pattern implies that a publisher sends a message to a channel on a message broker. One or more subscribers subscribe to the channel and receive messages from the channel in an asynchronous manner. This pattern is useful when a microservice needs to broadcast information to a significant number of consumers. Figure 3: Pub/sub pattern The pub/sub pattern has the following advantages: It decouples publishers and subscribers that need to communicate. Publishers and subscribers can be managed independently, and messages can be managed even if one or more subscribers are offline. It increases scalability and improves the responsiveness of the publisher. The publisher can quickly publish a message to the input channel, then return to its core processing responsibilities. The messaging infrastructure is responsible for ensuring messages are delivered to interested subscribers. It provides separation of concerns for microservices. Each microservice can focus on its core responsibilities, while the message broker handles everything required to reliably route messages to multiple subscribers. There are a few disadvantages of using this pattern: The pub/sub pattern introduces high semantic coupling in the messages passed by the publishers to the subscribers. Once the structure of the data is established, it is often difficult to change. To change the message structure, all subscribers must be altered to accept the changed format. This can be difficult or impossible if the subscribers are external. Another drawback of the pub/sub pattern is that it is difficult to gauge the health of subscribers. The publisher does not have knowledge of the health status of the systems listening to the messages. As a pub/sub system scales, the broker often becomes a bottleneck for message flow. Load surges can slow down the pub/sub system, and subscribers can get a spike in response time. Queue-Based Pattern In the queue-based pattern, a sender posts a message to a queue containing the data required by the receiver. The queue acts as a buffer, storing the message until it is retrieved by the receiver. The receiver retrieves messages from the queue and processes them at its own pace. This pattern is useful for any application that uses services that are subject to overloading. Figure 4: Queue-based pattern The queue-based pattern has the following advantages: It can help maximize scalability because both the number of queues and the number of services can be scaled to meet demand. It can help maximize availability. Delays arising in the producer or consumer won't have an immediate or direct impact on the services, which can continue to post messages to the queue even when the consumer isn't available or is under heavy load. There are some disadvantages of using this pattern: When a consumer receives a message from the queue, the message is no longer available in the queue. If a consumer fails to process the message, the message is lost and may need a rollback in the consumer. Message queues do not come out of the box. We need to create, configure, and monitor them. This can cause operational complexity when systems are scaled up.
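To make the pub/sub pattern concrete, here is a minimal JMS sketch that publishes an event to a topic and registers an asynchronous subscriber. It assumes an ActiveMQ broker running locally; the broker URL, topic name, and payload are placeholders, and the same structure applies to the queue-based pattern if you swap createTopic for createQueue.
Java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PubSubSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // assumed local broker

        Connection connection = factory.createConnection();
        connection.start();

        // Separate sessions for subscriber and publisher (a JMS session is meant for a single thread)
        Session subSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = subSession.createTopic("order-events"); // placeholder topic name

        // Subscriber: receives messages asynchronously via a listener
        subSession.createConsumer(topic).setMessageListener(message -> {
            try {
                System.out.println("Received: " + ((TextMessage) message).getText());
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // Publisher: fire-and-forget; it does not wait for subscribers to process the message
        Session pubSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = pubSession.createProducer(pubSession.createTopic("order-events"));
        producer.send(pubSession.createTextMessage("{\"orderId\":42,\"status\":\"CREATED\"}"));

        Thread.sleep(1000); // give the listener a moment before shutting down this demo
        connection.close();
    }
}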
Keys To Streamlined Messaging Infrastructure Asynchronous communication is usually managed through a message broker. There are some factors to consider when choosing the right messaging infrastructure for asynchronous communication: Scalability – the ability to scale automatically when there is a load surge on the message broker Data persistence – the ability to recover messages in case of reboot/failure Consumer capability – whether the broker can manage one-to-one and/or one-to-many consumers Monitoring – whether monitoring capabilities are available Push and pull queue – the ability to handle push and pull delivery by message queues Security – proper authentication and authorization for messaging queues and topics Automatic failover – the ability to connect to a failover broker automatically when one broker fails, without impacting the publisher/consumer Conclusion More and more, microservices are becoming the de facto approach for designing scalable and resilient systems. There is no single approach for all communications between microservices. While RESTful APIs provide a request-response model to communicate between services, asynchronous messaging offers a more scalable producer-consumer relationship between different services. And although microservices can communicate with each other via both messaging and REST APIs, messaging architectures are ideal for improving agility and moving quickly. They are commonly found in modern applications that use microservices or any application that has decoupled components. When it comes to choosing the right style of communication for your microservices, be sure to match the needs of the consumer with one or more communication types to offer a robust interface for your services. This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report

By Swathi Prasad
Full Lifecycle API Management Is Dead
Full Lifecycle API Management Is Dead

This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report As organizations look to enable integration, innovation, and digital experiences through their IT teams, they often build APIs and expose them by leveraging a full-lifecycle API management system. Historically, these API management systems provided tooling such as: Defining an API (e.g., Swagger, OpenAPI Spec, RAML) API testing API scaffolding and implementation Specifications for quota and usage policies/plans Documentation An API portal These API management systems were often delivered as a fully integrated stack with a fancy UI, role-based access control, and push-button mechanisms to accomplish the lifecycle management functions. While this all sounds very nice, there are some realities we face as organizations look to modernize their application and API delivery engines. An API management platform does not exist in a vacuum. DevOps philosophies have influenced organizational structures, automation, and self-service. Any API management system must fit within a modern development environment that is often multi-language, multi-platform, and multi-cloud. This infrastructure must also fit natively with Git-based deployment workflows (GitOps), including systems built for CI/CD. Avoid Yet Another Silo (YAS) Although developer productivity can be difficult to measure, useful proxy metrics include the following: Lead time to make code changes in production Number of deployments to production per week Traditionally, developers write code, create services, build APIs, and then hand them off to operations to deploy and operate those services and APIs. The silos between development, infrastructure, security, and network teams often lead to complex synchronization points, handoffs, and a lot of waiting. This slows down code changes and deployments to production. Figure 1: Siloed handoffs between teams cause a slowdown in delivery to production Large monolithic software systems can further this problem by forcing their own silos within each of the organizational silos. They have their own proprietary UIs, require specialized skills or privileges to operate, and are often owned by specific teams. If you need something from the large monolithic software system, you typically need to open a ticket to signal to the team that owns the system that they need to make a change. In practice, traditional full lifecycle API management systems create silos by forcing users into an all-or-nothing set of tools for defining, implementing, testing, and exposing APIs, even if these differ from what a development team wants to use. These systems are very difficult to automate and integrate with other parts of the software delivery system, and they are usually guarded by some API management team that is responsible for configuring and deploying APIs. This centralization, from both a technology and an organizational standpoint, creates bottlenecks that slow down delivery in a modern DevOps-minded organization. Favor Automation Over Point-and-Click UIs Most traditional full lifecycle API management systems do have some role-centric capabilities, like role-based UIs and tools for specific personas. One principle prevalent in modern DevOps implementations is eliminating manual or repetitive tasks using automation. We cannot expect users to log into a system that runs tests, a totally different system to manage APIs, and yet another system to do a deployment.
Figure 2: We should reduce multiple, manual, point-and-click UIs in favor of automation Ideally, we would automate a lot of these steps so a developer can go to a single self-service UI for anything related to software development and deployment. Any functionality we would like, including traditional API management and each of its "full lifecycle" functionalities, should be automatable. With a lot of the functionality in modern API management locked into proprietary UIs, automation is often very challenging and brittle, if accomplished at all. The API Lifecycle Is The Software Development Lifecycle The API lifecycle is often centered around design, implementation, testing, control, and consumption. Does this sound familiar? It should — because it's exactly what we do with any software we write. When developers create APIs, they use software to do so. The API lifecycle is the software development lifecycle. Trying to treat the lifecycle of APIs differently from the rest of our software development practices creates inconsistencies, fragmentation, and friction. For example, when we create an API, we may need to develop it, test it, and will probably eventually need to notify users when we need to retire it. We need the same capabilities for internal services, libraries, and other system components. Although there may be some slight differences, should these be separate and different processes? Should these be completely different sets of tools? Trying to duplicate what is already necessary for the software development lifecycle with substandard and proprietary tools specific for API management causes adoption, governance, and bifurcation issues. Use an Internal Developer Platform As organizations attempt to improve developer productivity by shifting left and giving developers more responsibility and control over building and running their services and APIs, we've seen an emergence in platform teams responsible for building workflows and toolchains that enable self-service. These workflows get boiled down to "golden paths" that developers can easily follow and that automate a lot of the tasks around bootstrapping new projects, documenting their software, enforcing access/security policies, and controlling deployment rollouts. This developer-focused self-service platform is known as an Internal Developer Platform (IDP) and aims to cover the operational necessities of the entire lifecycle of a service. Although many teams have built their own platforms, there are some good open-source frameworks that go a long way to building an IDP. For example, Backstage is a popular open-source project used to build IDPs. Platform engineering teams typically have a lot of flexibility picking the best of breed tools for developers that support multiple types of languages and developer frameworks. Plus, these tools can be composed through automation and don't rely on proprietary vendor UIs. Platform engineering teams also typically build their platform around container technology that can be used across multiple clusters and stretch into on-premises deployments as well as the public cloud. These IDPs insulate from vendor lock-in whether that's a particular public cloud or vendor. For example, here's a very common scenario that I've run into numerous times: An organization bought into a full-lifecycle API management vendor and finds itself in a situation where their modernization efforts are centered around containers and Kubernetes, GitOps, and CI/CD. 
They find the API management vendor may have strong tools around API design; however, runtime execution, the API portal, and analytics features are lagging, outdated, or cannot be automated with the rest of the container platform via GitOps. They often wish to use a different API gateway technology based on more modern open-source proxies like Envoy Proxy but are locked into a tightly integrated yet outdated gateway technology with their current vendor. Instead, these organizations should opt to use newer proxy technologies, select more developer-friendly API testing tools, tie API analytics into their existing streaming and analytics efforts, and rely on tools like Backstage to tie all of this together. By doing so, they would reduce silos centered around vendor products, leverage best-of-breed tools, and automate these tools in a way that preserves governance and prescribed guard rails. These platforms can then support complex deployment strategies like multi-cluster, hybrid, and multi-cloud deployments. Conclusion Managing APIs will continue to be an important aspect of software development, but it doesn't happen in a vacuum. Large monolithic full lifecycle API management stacks are outdated, don't fit in with modern development practices, and cause silos when we are trying to break down silos. Choosing best-of-breed tools for API development and policy management allows us to build a powerful software development platform (an IDP) that improves developer productivity, reduces lock-in, and allows organizations to deploy APIs and services across containers and cloud infrastructure, whether on-premises or in any public cloud. This is an article from DZone's 2023 Software Integration Trend Report. For more: Read the Report

By Christian Posta
The Path From APIs to Containers
The Path From APIs to Containers

This is an article from DZone's 2023 Software Integration Trend Report.For more: Read the Report In recent years, the rise of microservices has drastically changed the way we build and deploy software. The most important aspect of this shift has been the move from traditional API architectures driven by monolithic applications to containerized microservices. This shift not only improved the scalability and flexibility of our systems, but it has also given rise to new ways of software development and deployment approaches. In this article, we will explore the path from APIs to containers and examine how microservices have paved the way for enhanced API development and software integration. The Two API Perspectives: Consumer and Provider The inherent purpose of building an API is to exchange information. Therefore, APIs require two parties: consumers and providers of the information. However, both have completely different views. For an API consumer, an API is nothing more than an interface definition and a URL. It does not matter to the consumer whether the URL is pointing to a mainframe system or a tiny IoT device hosted on the edge. Their main concern is ease of use, reliability, and security. An API provider, on the other hand, is more focused on the scalability, maintainability, and monetization aspects of an API. They also need to be acutely aware of the infrastructure behind the API interface. This is the place where APIs actually live, and it can have a lot of impact on their overall behavior. For example, an API serving millions of consumers would have drastically different infrastructure requirements when compared to a single-consumer API. The success of an API offering often depends on how well it performs in a production-like environment with real users. With the explosion of the internet and the rise of always-online applications like Netflix, Amazon, Uber, and so on, API providers had to find ways to meet the increasing demand. They could not rely on large monolithic systems that were difficult to change and scale up as and when needed. This increased focus on scalability and maintainability, which led to the rise of microservices architecture. The Rise of Microservices Architecture Microservices are not a completely new concept. They have been around for many years under various names, but the official term was actually coined by a group of software architects at a workshop near Venice in 2011/2012. The goal of microservices has always been to make a system flexible and maintainable. This is an extremely desirable target for API providers and led to the widespread adoption of microservices architecture styles across a wide variety of applications. The adoption of microservices to build and deliver APIs addressed several challenges by providing important advantages: Since microservices are developed and deployed independently, they allow developers to work on different parts of the API in parallel. This reduces the time to market for new features. Microservices can be scaled up or down to meet the varying demands of specific API offerings. This helps to improve resource use and cost savings. There is a much better distribution of API ownership as different teams can focus on different sets of microservices. By breaking down an API into smaller and more manageable services, it becomes theoretically easier to manage outages and downtimes. This is because one service going down does not mean the entire application goes down. 
The API consumers also benefit due to the microservices-based APIs. In general, consumer applications can model better interactions by integrating a bunch of smaller services rather than interfacing with a giant monolith. Figure 1: APIs perspectives for consumer and provider Since each microservice has a smaller scope when compared to a monolith, there is less impact on the client application in case of changes to the API endpoints. Moreover, testing for individual interactions becomes much easier. Ultimately, the rise of microservices enhanced the API-development landscape. Building an API was no longer a complicated affair. In fact, APIs became the de facto method of communication between different systems. Nonetheless, despite the huge number of benefits provided by microservices-based APIs, they also brought some initial challenges in terms of deployments and managing dependencies. Streamlining Microservices Deployment With Containers The twin challenges of deployment and managing dependencies in a microservices architecture led to the rise in container technologies. Over the years, containers have become increasingly popular, particularly in the context of microservices. With containers, we can easily package the software with its dependencies and configuration parameters in a container image and deploy it on a platform. This makes it trivial to manage and isolate dependencies in a microservices-based application. Containers can be deployed in parallel, and each deployment is predictable since everything that is needed by an application is present within the container image. Also, containers make it easier to scale and load balance resources, further boosting the scalability of microservices and APIs. Figure 2 showcases the evolution from monolithic to containerized microservices: Figure 2: Evolution of APIs from monolithic to containerized microservices Due to the rapid advancement in cloud computing, container technologies and orchestration frameworks are now natively available on almost all cloud platforms. In a way, the growing need for microservices and APIs boosted the use of containers to deploy them in a scalable manner. The Future of Microservices and APIs Although APIs and microservices have been around for numerous years, they have yet to reach their full potential. Both are going to evolve together in this decade, leading to some significant trends. One of the major trends is around API governance. Proper API governance is essential to make your APIs discoverable, reusable, secure, and consistent. In this regard, OpenAPI, a language-agnostic interface to RESTful APIs, has more or less become the prominent and standard way of documenting APIs. It can be used by both humans and machines to discover and understand an API's capabilities without access to the source code. Another important trend is the growth in API-powered capabilities in the fields of NLP, image recognition, sentiment analysis, predictive analysis, chatbot APIs, and so on. With the increased sophistication of models, this trend is only going to grow stronger, and we will see many more applications of APIs in the coming years. The rise of tools like ChatGPT and Google Bard shows that we are only at the beginning of this journey. A third trend is the increased use of API-driven DevOps for deploying microservices. With the rise of cloud computing and DevOps, managing infrastructure is an extremely important topic in most organizations. 
API-driven DevOps is a key enabler for Infrastructure as Code tools to provision infrastructure and deploy microservices. Under the covers, these tools rely on APIs exposed by the platforms. Apart from major ones, there are also other important trends when it comes to the future of microservices and APIs: There is a growing role of API enablement on the edge networks to power millions of IoT devices. API security practices have become more important than ever in a world of unprecedented integrations and security threats. API ecosystems are expanding as more companies develop a suite of APIs that can be used in a variety of situations to build applications. Think of API suites like Google Maps API. There is an increased use of API gateways and service meshes to improve reliability, observability, and security of microservices-based systems. Conclusion The transition from traditional APIs delivered via monolithic applications to microservices running on containers has opened up a world of possibilities for organizations. The change has enabled developers to build and deploy software faster and more reliably without compromising on the scalability aspects. They have made it possible to build extremely complex applications and operate them at an unprecedented scale. Developers and architects working in this space should first focus on the key API trends such as governance and security. However, as these things become more reliable, they should explore cutting-edge areas such as API usage in the field of artificial intelligence and DevOps. This will keep them abreast with the latest innovations. Despite the maturity of the API and microservices ecosystem, there is a lot of growth potential in this area. With more advanced capabilities coming up every day and DevOps practices making it easier to manage the underlying infrastructure, the future of APIs and microservices looks bright. References: "A Brief History of Microservices" by Keith D. Foote "The Future of APIs: 7 Trends You Need to Know" by Linus Håkansson "Why Amazon, Netflix, and Uber Prefer Microservices over Monoliths" by Nigel Pereira "Google Announces ChatGPT Rival Bard, With Wider Availability in 'Coming Weeks'" by James Vincent "Best Practices in API Governance" by Janet Wagner "APIs Impact on DevOps: Exploring APIs Continuous Evolution," xMatters Blog This is an article from DZone's 2023 Software Integration Trend Report.For more: Read the Report

By Saurabh Dashora CORE
Implementing PEG in Java
Implementing PEG in Java

In Part 1 of the series on PEG implementation, I explained the basics of Parsing Expression Grammar (PEG) and how to implement it in JavaScript. This second part of the series is focused on implementation in Java using the parboiled library. We will try to build the same example for parsing arithmetic expressions but using different syntax and APIs. QuickStart parboiled is a lightweight and easy-to-use library to parse text input based on formal rules defined using Parsing Expression Grammar. Unlike other parsers that use an external grammar definition, parboiled provides a quick DSL (domain-specific language) to define grammar rules that can be used to generate parser rules at runtime. This approach helps to avoid separate parsing and lexing phases and also does not require additional build steps. Installation The parboiled library is packaged as a two-level set of dependencies. There is a core artifact and two implementation artifacts for Java and Scala support. Both the Java and Scala artifacts depend on the core and can be used independently in their respective environments. They are available as Maven dependencies and can be downloaded from Maven Central with the coordinates below:
XML
<dependency>
    <groupId>org.parboiled</groupId>
    <artifactId>parboiled-java</artifactId>
    <version>1.4.1</version>
</dependency>
Defining the Grammar Rules Let's take the same example we used earlier to define rules to parse arithmetic expressions.
Expression ← Term ((‘+’ / ‘-’) Term)*
Term ← Factor ((‘*’ / ‘/’) Factor)*
Factor ← Number / ‘(’ Expression ‘)’
Number ← [0-9]+
With the help of the integrated DSL, these rules can be defined as follows.
Java
public class CalculatorParser extends BaseParser {

    Rule Expression() {
        return Sequence(Term(), ZeroOrMore(AnyOf("+-"), Term()));
    }

    Rule Term() {
        return Sequence(Factor(), ZeroOrMore(AnyOf("*/"), Factor()));
    }

    Rule Factor() {
        return FirstOf(Number(), Sequence('(', Expression(), ')'));
    }

    Rule Number() {
        return OneOrMore(CharRange('0', '9'));
    }
}
If we take a closer look at the example, the parser class inherits all the DSL functions from its parent class, BaseParser. It provides various builder methods for creating different types of Rules. By combining and nesting those, you can build your custom grammar rules. There needs to be a starting rule that recursively expands to terminal rules, which are usually literals and character classes. Generating the Parser parboiled's createParser API takes the DSL input and generates a parser class by enhancing the byte code of the existing class at runtime using the ASM utils library.
Java
CalculatorParser parser = Parboiled.createParser(CalculatorParser.class);
Using the Parser The generated parser is then passed to a parse runner, which lazily initializes the rule tree on the first run and reuses it for subsequent runs.
Java
String input = "1+2";
ParseRunner runner = new ReportingParseRunner(parser.Expression());
ParsingResult<?> result = runner.run(input);
One thing to be aware of is that both the generated parser and the parse runner are not thread-safe, so we need to keep them in a minimal scope and avoid sharing them across multiple threads. Understanding the Parse Result/Tree The output parse result encapsulates information about parse success or failure. A successful run generates a parse tree with the appropriate labels and text fragments. ParseTreeUtils can be used to print the whole or a partial parse tree based on passed filters.
Java
String parseTreePrintOut = ParseTreeUtils.printNodeTree(result);
System.out.println(parseTreePrintOut);
For more fine-grained control over the parse tree, you can use the visitor API and traverse the tree to collect the required information out of it. Sample Implementation There are some sample implementations available with the library itself. They include samples for calculators, Java, SPARQL, and time formats. Visit this GitHub repository for more. Conclusion As we observed, it is very quick and easy to build and use a parser with the parboiled library. However, there might be some use cases that can lead to performance and memory issues when using it on large input with a complex rule tree. Therefore, we need to be careful about complexity and ambiguity while defining the rules.

By Vinod Pahuja
Using Swagger for Creating a PingFederate Admin API Java Wrapper
Using Swagger for Creating a PingFederate Admin API Java Wrapper

In my previous articles listed below, I have shown how to use Swagger, especially the Springdoc implementation, for doing the code first/bottom-up approach. OpenAPI 3 Documentation With Spring Boot Doing More With Springdoc-OpenAPI Extending Swagger and Spring Doc Open API This time I am writing about the design first/top-down approach. I am not writing about the usual generated Java server and, say, the associated Angular TypeScript client code; but first, some background context. Background Some time back I had the opportunity to use PingFederate to solve a business problem for a client of mine (no details due to NDAs). This involved working with the US government's SSN verification web service and leveraging OIDC for this purpose. The actual code I wrote was just a few Spring Boot classes. The project was more about architecture, integration, infrastructure, etc. When working on this project, I created a side utility. Highlights This is the first time in the PingFed world such a utility has been created. There are some innovative concepts in it. Creating it had some challenges. We will discuss them along with how they were overcome. What Does This Article Offer to the Reader? Speeds up getting the reader started on PingFederate Introduces my utility, which helps in meeting the above objective Also showcases two sample applications that demonstrate the Authorization Code Flow: These sample applications are used to demonstrate the effectiveness of our PingFederate configuration. Of particular interest to the reader will be the application that demonstrates my attempt at the authorization code flow using the BFF pattern for the Spring Boot and Angular applications. Note: While these sample applications have been tuned for PingFederate, it should be easy to tweak them for other OIDC providers like Okta, Auth0, etc. Also note: When working on my client's project, there was no front end. It was a machine-to-machine communication project. That said, for most readers, it would be more relevant to have a front end in the examples. Therefore, the two examples do have a front end. A Quick Swagger Recap Swagger supports both the code first/bottom-up and design first/top-down approaches. A Swagger document can be created by using: Swagger Editor Code-first libraries like springdoc, SpringFox, Swagger Core, and related libraries that can introspect the actual code The Swagger YAML/JSON document can be visualized using the Swagger UI. This UI is also exposed by the springdoc and SpringFox libraries. Swagger Codegen can be used to generate server/client code. Lastly, there is SwaggerHub, which leverages all the Swagger tools and offers much more when using the design first/top-down approach. What Is PingFederate? PingFederate describes itself as follows: "PingFederate is an enterprise federation server that enables user authentication and single sign-on. It serves as a global authentication authority that allows customers, employees, and partners to securely access all the applications they need from any device. PingFederate easily integrates with applications across the enterprise, third-party authentication sources, diverse user directories, and existing IAM systems, all while supporting current and past versions of identity standards like OAuth, OpenID Connect, SAML, and WS-Federation. It will connect everyone to everything." In my limited context, I used it for OIDC and OAuth purposes. While on the subject of PingFederate, it is not a free product.
That said, you can always download and use the latest version of Ping products for free. Trial license files are available, and I was able to keep getting new trial license files as needed. I found it very easy to learn. I used PingFederate because, in my client project, some requirements were met better by PingFederate than, say, its cloud-based alternative. What Is the Problem We Are Trying To Solve? Problem definition: The PingFederate Admin API can be used to automate its setup configuration, in addition to doing it manually through the admin console. The lack of any programmatic language wrapper makes it hard to administer/configure automatically. Elaborating on the point, just to illustrate the problem: AWS provides SDKs in various programming languages. These SDKs sit on top of the underlying web service API. AWS SDKs It's always easier to use the AWS SDK than to work with the underlying web services using Postman/cURL. Similarly, for PingFederate, a Java wrapper was achieved. Note: This has been done for the first time in the PingFederate world. :) It is also possible to achieve this in other languages if needed. Is This All That We Did? Did we simply run a Maven-based code generator that reads the Swagger specification of the PingFederate Admin API, generate some code, and use it? Yes and no. High-Level Solutioning Here, we have two flows represented by blue and green arrows. The blue arrows demonstrate: The use of Swagger Core and related code-first annotation-based libraries, causing the automatic generation of the Swagger YAML/JSON Admin API document; this is part of PingFederate itself. This Swagger document is leveraged by the code generator to generate actual code. In our case, we are generating Java REST client code. The green arrows demonstrate: The user interacts with our library: additional convenience code and a particular RestTemplate interceptor. This in turn invokes the generated code. Finally, the PingFederate Admin API is invoked, which changes/configures PingFederate. Hurdle in getting this to work: The generated code was not usable in some scenarios. Read more about that and the adopted solution in these Swagger notes on GitHub. In addition to the general approach used, we had to innovate further and resolve the hurdles. That's where the interceptor was leveraged. How To Set Up Follow the steps in this GitHub repo. There is a README.md and a Setup.md. To summarize, these are the steps: Clone the project. Maven-build the project. Download the ZIP files and license files of PingFederate and PingDirectory. Also download a MySQL connector JAR file. Verify the downloads. Configure MySQL root user credentials. Install and start PingDirectory and PingFederate using the provided Ant script. Launch the PingFederate Admin console for the first time. Maven-build the project with the additional option of generating the Admin API Client code. Use the generated Admin API Client code to administer PingFederate. The code is available on the Git repository.
However, let's discuss some code below for better visualization: Java public void setup() throws NoSuchAlgorithmException, KeyManagementException, FileNotFoundException, IOException { String ldapDsId="MyLDAP"; String formAdapterid="HTMLFormAdapter"; String passwordValidatorId="PasswordValidator"; String atmId1="testingATM1"; String policyId1="testingpolicy1"; String ldapAttributeSourceId="mypingfedldapds"; String atmId2="testingATM2"; Properties mySqlProps = PropertiesUtil.loadProps(new File("../mysql.properties")); this.setupDb(mySqlProps); new LdapCreator(core) .createLdap(ldapDsId, "MyLdap", "localhost", "cn=Directory Manager", "manager"); PasswordCredentialValidator passwordCredentialValidator = new PasswordCredentialValidatorCreator(core) .createPasswordCredentialValidator( ldapDsId, passwordValidatorId, passwordValidatorId, "uid=${username}"); IdpAdapter idpAdapter1 = new IdpAdapterCreator(core) .createIdpAdapter( passwordValidatorId, formAdapterid, new String[] {"givenName", "mail", "sn", "uid"}, new String[]{"uid"}, "uid"); IdpAdapterMapping createdIdpAdapterMapping = new IdpAdapterMappingCreator(core).createIdpAdapterGrantMapping(formAdapterid, "username"); new JwtAtmCreator(core) .createJWTATM( atmId1, "jwtatm1", 120, 1, AutomationSharedConstants.AtmOauth_PersistentGrantUserKeyAttrName, "iat", "nbf"); new AtmMappingCreator(core) .createTokenMappings( "jwtatm1mapping", AccessTokenMappingContext.TypeEnum.IDP_ADAPTER, formAdapterid, atmId1, new AccessTokenMappingAttribute(null, AutomationSharedConstants.AtmOauth_PersistentGrantUserKeyAttrName, SourceTypeIdKey.TypeEnum.OAUTH_PERSISTENT_GRANT, "USER_KEY"), new AccessTokenMappingAttribute(null, "iat", SourceTypeIdKey.TypeEnum.EXPRESSION, "#iat=@org.jose4j.jwt.NumericDate@now().getValue()"), new AccessTokenMappingAttribute(null, "nbf", SourceTypeIdKey.TypeEnum.EXPRESSION, "#nbf = @org.jose4j.jwt.NumericDate@now(), #nbf.addSeconds(10), #nbf = #nbf.getValue()") ); new JwtAtmCreator(core) .createJWTATM(atmId2, "jwtatm2", 5, 2, "iss", "sub", "aud", "nbf", "iat"); new AtmMappingCreator(core) .createTokenMappings("jwtatm2mapping", AccessTokenMappingContext.TypeEnum.CLIENT_CREDENTIALS, null, atmId2, new AccessTokenMappingAttribute(null, "iss", SourceTypeIdKey.TypeEnum.EXPRESSION, "#value = #this.get(\"context.HttpRequest\").getObjectValue().getRequestURL().toString(), #length = #value.length(), #length = #length-16, #iss = #value.substring(0, #length)"), new AccessTokenMappingAttribute(null, "sub", SourceTypeIdKey.TypeEnum.TEXT, "6a481348-42a1-49d7-8361-f76ebd23634b"), new AccessTokenMappingAttribute(null, "aud", SourceTypeIdKey.TypeEnum.TEXT, "https://apiauthete.ssa.gov/mga/sps/oauth/oauth20/token"), new AccessTokenMappingAttribute(null, "nbf", SourceTypeIdKey.TypeEnum.EXPRESSION, "#nbf = @org.jose4j.jwt.NumericDate@now(), #nbf.addSeconds(10), #nbf = #nbf.getValue()"), new AccessTokenMappingAttribute(null, "iat", SourceTypeIdKey.TypeEnum.EXPRESSION, "#iat=@org.jose4j.jwt.NumericDate@now().getValue()") ); new ScopesCreator(core).addScopes("email", "foo", "bar"); new ClientCreator(core) .createClient( AutomationSharedConstants.AuthCodeClientId, AutomationSharedConstants.AuthCodeClientId, AutomationSharedConstants.AuthCodeClientSecret, atmId1, true, null, "http://"+AutomationSharedConstants.HOSTNAME+":8080/oidc-hello|http://"+AutomationSharedConstants.HOSTNAME+":8081/login/oauth2/code/pingfed", GrantTypesEnum.AUTHORIZATION_CODE, GrantTypesEnum.ACCESS_TOKEN_VALIDATION); new ClientCreator(core) .createClient( "manual2", "manual2", "secret", 
atmId2, true, null, "", GrantTypesEnum.CLIENT_CREDENTIALS); Pair<String, String[]>[] scopesToAttributes=new Pair[] { Pair.with("email", new String[] {"email", "family_name", "given_name"}) }; new OpenIdConnectPolicyCreator(core) .createOidcPolicy( atmId1, policyId1, policyId1, false, false, false, 5, new Triplet [] { Triplet.with("email", true, true), Triplet.with("family_name", true, true), Triplet.with("given_name", true, true)}, AttributeSource.TypeEnum.LDAP, ldapDsId, ldapAttributeSourceId, "my pingfed ldap ds", SourceTypeIdKey.TypeEnum.LDAP_DATA_STORE, new Pair[] { Pair.with("sub", "Subject DN"), Pair.with("email", "mail"), Pair.with("family_name", "sn"), Pair.with("given_name", "givenName") }, scopesToAttributes, true, true, "uid=${"+AutomationSharedConstants.AtmOauth_PersistentGrantUserKeyAttrName+"}", "/users?uid=${"+AutomationSharedConstants.AtmOauth_PersistentGrantUserKeyAttrName+"}"); } The above is an actual code snippet used by me to administer the PingFederate. As an example, let's look at what is happening in the LdapCreator class createLdap method. Java public DataStore createLdap(String id, String name, String hostName, String userDn, String password) { DataStoresApi dataStoresApi= new DataStoresApi(core.getApiClient()); core.setRequestTransformBeans(new TransformBean("type",type->TypeEnum.LDAP.name())); core.setResponseTransformBeans(new TransformBean("type",type->type.charAt(0)+type.substring(1) .toLowerCase()+"DataStore")); LdapDataStore ldapDataStore = new LdapDataStore(); List<String> hostNames = addStringToNewList(hostName); ldapDataStore.setHostnames(hostNames); ldapDataStore.setType(TypeEnum.LDAP); ldapDataStore.setId(id); ldapDataStore.setName(name); ldapDataStore.setLdapType(LdapTypeEnum.PING_DIRECTORY); ldapDataStore.setUserDN(userDn); ldapDataStore.setPassword(password); DataStore createdDataStore = dataStoresApi. createDataStore(ldapDataStore, false); return createdDataStore; } LdapCreator is a layer that was written on top of the generated code. The classes DataStoresApi, LdapDataStore, and DataStore are the classes from the generated code. In the createLdap method, the lines below are how we instruct the interceptor to transform the request and response. Java core.setRequestTransformBeans(new TransformBean("type",type->TypeEnum.LDAP.name())); core.setResponseTransformBeans(new TransformBean("type", type->type.charAt(0)+type.substring(1).toLowerCase()+"DataStore")); (Again, you can read more about that from the previous link to the Swagger notes on GitHub.) It did something. How do we know it really worked? Does It Really Work? The code base in the repository also contains example code that demonstrates Authorization Code Flow. The example code projects can be set up and run using their Readme.md. The example code projects also serve the purpose of demonstrating that our PingFederate setup worked, in addition to being hopefully useful. The Example Code Projects There are two examples: simple-oidc-check springboot.oidc.with.angular The example simple-oidc-check is a roll-your-own example. It will demonstrate the Authorization Code Flow and also the Client Credentials grant flow. It can be used to better understand many different concepts including JEE and OIDC. There are some concepts there that might raise your eyebrows and are not so often seen. The example springboot.oidc.with.angular is an Authorization Code Flow BFF pattern implementation. This is often considered the most secure approach because the access token is kept only at the back end. 
The access token never reaches the JavaScript/HTML layer. This and other approaches are also discussed in the example code Readme.md. Supported Versions The versions of PingFederate supported by this utility are detailed here. Future Vision I created this utility mainly because it helped me stand up my PingFed PoCs rapidly when working on a client project. I will try maintaining it as long as it does not tax me too much and PingFederate itself does not provide similar solutions. I can already think of some more improvements and enhancements. I can be encouraged to maintain and carry on with it with stars, likes, clones, etc. on the Git repository.

By Raghuraman Ramaswamy CORE
