Software Integration
Seamless communication — that, among other consequential advantages, is the ultimate goal when integrating your software. And today, integrating modern software means fusing various applications and/or systems — often across distributed environments — with the common goal of unifying isolated data. This effort often signifies the transition of legacy applications to cloud-based systems and messaging infrastructure via microservices and REST APIs. So what's next? Where is the path to seamless communication and nuanced architecture taking us? Dive into our 2023 Software Integration Trend Report and fill the gaps in modern integration practices by exploring trends in APIs, microservices, and cloud-based systems and migrations. You have to integrate to innovate!
Distributed SQL Essentials
Advanced Cloud Security
The article will cover the following topics: Why is Envoy proxy required? Introducing Envoy proxy Envoy proxy architecture with Istio Envoy proxy features Use cases of Envoy proxy Benefits of Envoy proxy Demo video - Deploying Envoy in K8s and configuring as a load balancer Why Is Envoy Proxy Required? Challenges are plenty for organizations moving their applications from monolithic to microservices architecture. Managing and monitoring the sheer number of distributed services across Kubernetes and the public cloud often exhausts app developers, cloud teams, and SREs. Below are some of the major network-level operational hassles of microservices, which shows why Envoy proxy is required. Lack of Secure Network Connection Kubernetes is not inherently secure because services are allowed to talk to each other freely. It poses a great threat to the infrastructure since an attacker who gains access to a pod can move laterally across the network and compromise other services. This can be a huge problem for security teams, as it is harder to ensure the safety and integrity of sensitive data. Also, the traditional perimeter-based firewall approach and intrusion detection systems will not help in such cases. Complying With Security Policies Is a Huge Challenge There is no developer on earth who would enjoy writing security logic to ensure authentication and authorization, instead of brainstorming business problems. However, organizations who want to adhere to policies such as HIPAA or GDPR, ask their developers to write security logic such as mTLS encryption in their applications. Such cases in enterprises will lead to two consequences: frustrated developers, and security policies being implemented locally and in silos. Lack of Visibility Due to Complex Network Topology Typically, microservices are distributed across multiple Kubernetes clusters and cloud providers. Communication between these services within and across cluster boundaries will contribute to a complex network topology in no time. As a result, it becomes hard for Ops teams and SREs to have visibility over the network, which impedes their ability to identify and resolve network issues in a timely manner. This will lead to frequent application downtime and compromised SLA. Complicated Service Discovery Services are often created and destroyed in a dynamic microservices environment. Static configurations provided by old-generation proxies are ineffective in keeping track of services in such an environment. This makes it difficult for application engineers to configure communication logic between services because they have to manually update the configuration file whenever a new service is deployed or deleted. It leads to application developers spending more of their time configuring the networking logic rather than coding the business logic. Inefficient Load Balancing and Traffic Routing It is crucial for platform architects and cloud engineers to ensure effective traffic routing and load balancing between services. However, it is a time-consuming and error-prone process for them to manually configure routing rules and load balancing policies for each service, especially when they have a fleet of them. Also, traditional load balancers with simple algorithms would result in inefficient resource utilization and suboptimal load balancing in the case of microservices. All these lead to increased latency, and service unavailability due to improper traffic routing. 
With the rise in the adoption of microservices architecture, there was a need for a fast, intelligent proxy that can handle the complex service-to-service connection across the cloud. Introducing Envoy Proxy Envoy is an open-source edge and service proxy, originally developed by Lyft to facilitate their migration from a monolith to cloud-native microservices architecture. It also serves as a communication bus for microservices (refer to Figure 1 below) across the cloud, enabling them to communicate with each other in a rapid, secure, and efficient manner. Envoy proxy abstracts network and security from the application layer to an infrastructure layer. This helps application developers simplify developing cloud-native applications by saving hours spent on configuring network and security logic. Envoy proxy provides advanced load balancing and traffic routing capabilities that are critical to run large, complex distributed applications. Also, the modular architecture of Envoy helps cloud and platform engineers to customize and extend its capabilities. Figure 1: Envoy proxy intercepting traffic between services Envoy Proxy Architecture With Istio Envoy proxies are deployed as sidecar containers alongside application containers. The sidecar proxy then intercepts and takes care of the service-to-service connection (refer to Figure 2 below) and provides a variety of features. This network of proxies is called a data plane, and it is configured and monitored from a control plane provided by Istio. These two components together form the Istio service mesh architecture, which provides a powerful and flexible infrastructure layer for managing and securing microservices. Figure 2: Istio sidecar architecture with Envoy proxy data plane Envoy Proxy Features Envoy proxy offers the following features at a high level. (Visit Envoy docs for more information on the features listed below.) Out-of-process architecture: Envoy proxy runs independently as a separate process apart from the application process. It can be deployed as a sidecar proxy and also as a gateway without requiring any changes to the application. Envoy is also compatible with any application language like Java or C++, which provides greater flexibility for application developers. L3/L4 and L7 filter architecture: Envoy supports filters and allows customizing traffic at the network layer (L3/L4) and at the application layer (L7). This allows for more control over the network traffic and offers granular traffic management capabilities such as TLS client certificate authentication, buffering, rate limiting, and routing/forwarding. HTTP/2 and HTTP/3 support: Envoy supports HTTP/1.1, HTTP/2, and HTTP/3 (currently in alpha) protocols. This enables seamless communication between clients and target servers using different versions of HTTP. HTTP L7 routing: Envoy's HTTP L7 routing subsystem can route and redirect requests based on various criteria, such as path, authority, and content type. This feature is useful for building front/edge proxies and service-to-service meshes. gRPC support: Envoy supports gRPC, a Google RPC framework that uses HTTP/2 or above as its underlying transport. Envoy can act as a routing and load-balancing substrate for gRPC requests and responses. Service discovery and dynamic configuration: Envoy supports service discovery and dynamic configuration through a layered set of APIs that provide dynamic updates about backend hosts, clusters, routing, listening sockets, and cryptographic material. 
This allows for centralized management and simpler deployment, with options for DNS resolution or static config files. Health checking: For building an Envoy mesh, service discovery is treated as an eventually consistent process. Envoy has a health-checking subsystem that can perform active and passive health checks to determine healthy load-balancing targets. Advanced load balancing: Envoy's self-contained proxy architecture allows it to implement advanced load-balancing techniques, such as automatic retries, circuit breaking, request shadowing, and outlier detection, in one place, accessible to any application. Front/edge proxy support: Using the same software at the edge provides benefits such as observability, management, and identical service discovery and load-balancing algorithms. Envoy's feature set makes it well-suited as an edge proxy for most modern web application use cases, including TLS termination, support for multiple HTTP versions, and HTTP L7 routing. Best-in-class observability: Envoy provides robust statistics support for all subsystems and supports distributed tracing via third-party providers, making it easier for SREs and Ops teams to monitor and debug problems occurring at both the network and application levels. Given its powerful set of features, Envoy proxy has become a popular choice for organizations to manage and secure multicloud and multicluster apps. In practice, it has two main use cases. Use Cases of Envoy Proxy Envoy proxy can be used as both a sidecar service proxy and a gateway. Envoy Sidecar Proxy As we have seen in the Istio architecture, Envoy proxy constitutes the data plane and manages the traffic flow between services deployed in the mesh. The sidecar proxy provides features such as service discovery, load balancing, and traffic routing, and offers visibility and security to the network of microservices. Envoy as API Gateway Envoy proxy can be deployed as an API gateway and as an ingress (please refer to the Envoy Gateway project). Envoy Gateway is deployed at the edge of the cluster to manage external traffic flowing into the cluster and between multicloud applications (north-south traffic). Envoy Gateway helps application developers who previously had to hand-configure Envoy proxy (Istio-native) as an API and ingress controller or purchase a third-party solution like NGINX. With it, they have a central location to configure and manage ingress and egress traffic and apply security policies such as authentication and access control. Below is a diagram of the Envoy Gateway architecture and its components. Envoy Gateway architecture (Source) Benefits of Envoy Proxy Envoy's ability to abstract network and security layers offers several benefits for IT teams such as developers, SREs, cloud engineers, and platform teams. Following are a few of them. Effective Network Abstraction The out-of-process architecture of Envoy helps it abstract the network layer from the application into its own infrastructure layer. This allows for faster deployment for application developers, while also providing a central plane to manage communication between services. Fine-Grained Traffic Management With its support for the network (L3/L4) and application (L7) layers, Envoy provides flexible and granular traffic routing, such as traffic splitting, retry policies, and load balancing.
Ensure Zero Trust Security at L4/L7 Layers Envoy proxy helps to implement authentication among services inside a cluster with stronger identity verification mechanisms like mTLS and JWT. You can achieve authorization at the L7 layer with Envoy proxy easily and ensure zero trust. (You can implement AuthN/Z policies with Istio service mesh — the control plane for Envoy.) Control East-West and North-South Traffic for Multicloud Apps Since enterprises deploy their applications into multiple clouds, it is important to understand and control the traffic or communication in and out of the data centers. Since Envoy proxy can be used as a sidecar and also an API gateway, it can help manage east-west traffic and also north-south traffic, respectively. Monitor Traffic and Ensure Optimum Platform Performance Envoy aims to make the network understandable by emitting statistics, which are divided into three categories: downstream statistics for incoming requests, upstream statistics for outgoing requests, and server statistics for describing the Envoy server instance. Envoy also provides logs and metrics that provide insights into traffic flow between services, which is also helpful for SREs and Ops teams to quickly detect and resolve any performance issues. Video: Get Started With Envoy Proxy Deploying Envoy in k8s and Configuring as Load Balancer The below video discusses different deployment types and their use cases, and it shows a demo of Envoy deployment into Kubernetes and how to set it as a load balancer (edge proxy).
Cucumber is a well-known Behavior-Driven Development (BDD) framework that allows developers to implement end-to-end testing. The combination of Selenium with Cucumber provides a powerful framework that will enable you to create functional tests in an easy way. It allows you to express acceptance criteria in language that business people can read and understand, along with the steps to take to verify that they are met. The Cucumber tests are then run through a browser-like interface that allows you to see what's happening in your test at each step. This Cucumber Selenium tutorial will walk you through the basics of writing test cases with Cucumber in Selenium WebDriver. If you are not familiar with Cucumber, this Cucumber BDD tutorial will also give you an introduction to its domain-specific language (DSL) and guide you through writing your first step definitions, setting up Cucumber with Selenium WebDriver, and automating web applications using Cucumber, Selenium, and TestNG framework. Why Use Cucumber.js for Selenium Automation Testing? Cucumber.js is a tool that is often used in conjunction with Selenium for automating acceptance tests. It allows you to write tests in a natural language syntax called Gherkin, which makes it easy for non-technical team members to understand and write tests. Cucumber with Selenium is one the easiest-to-use combinations. Here are a few benefits of using Selenium Cucumber for automation testing: Improved collaboration: Since Cucumber.js tests are written in a natural language syntax, they can be easily understood by all members of the development team, including non-technical stakeholders. This can improve collaboration between different team members and ensure that everyone is on the same page. Easier to maintain: Cucumber.js tests are organized into features and scenarios, which makes it easy to understand the structure of the tests and locate specific tests when you need to make changes. Better documentation: The natural language syntax of Cucumber.js tests makes them a good source of documentation for the functionality of your application. This can be especially useful for teams that follow a behavior-driven development (BDD) approach. Greater flexibility: Cucumber.js allows you to write tests for a wide variety of applications, including web, mobile, and desktop applications. It also supports multiple programming languages, so you can use it with the language that your team is most familiar with. To sum it up, Cucumber with Selenium is a useful tool for automating acceptance tests, as it allows you to write tests in a natural language syntax that is easy to understand and maintain and provides greater flexibility and improved collaboration. Configure Cucumber Setup in Eclipse and IntelliJ [Tutorial] With the adoption of Agile methodology, a variety of stakeholders, such as Quality Assurance professionals, technical managers, and program managers, including those without technical backgrounds, have come together to work together to improve the product. This is where the need to implement Behavior Driven Development (BDD) arises. Cucumber is a popular BDD tool that is used for automated testing. In this section, we will explain how to set up Cucumber in Eclipse and IntelliJ for automated browser testing. How To Configure Cucumber in Eclipse In this BDD Cucumber tutorial, we will look at the instructions on how to add necessary JARs to a Cucumber Selenium Java project in order to set up Cucumber in Eclipse. 
It is similar to setting up the TestNG framework and is useful for those who are just starting with Selenium automation testing. Open Eclipse by double-clicking, then select workspace and change it anytime with the Browse button. Click Launch and close the Welcome window; not needed for the Cucumber Selenium setup. To create a new project, go to File > New > Java Project. Provide information or do what is requested, then enter a name for your project and click the Finish button. Right-click on the project, go to Build Path > Configure Build Path. Click on the button labeled "Add External JARs." Locate Cucumber JARs and click Open. Add required Selenium JARs for Cucumber setup in Eclipse. They will appear under the Libraries tab. To import the necessary JARs for the Cucumber setup in Eclipse, click 'Apply and Close.' The imported JARs will be displayed in the 'Referenced Libraries' tab of the project. How To Install Cucumber in IntelliJ In this part of the IntelliJ Cucumber Selenium tutorial, we will demonstrate how to set up Cucumber with IntelliJ, a widely used Integrated Development Environment (IDE) for Selenium Cucumber Java development. Here are the steps for configuring Cucumber in IntelliJ. To create a new project in IntelliJ IDEA, open the IntelliJ IDE and go to File > New > Project. To create a new Java project in IntelliJ IDEA, select Java and click Next. To complete the process of creating a new project in IntelliJ IDEA, name your project and click Finish. After creating a new project in IntelliJ IDEA, you will be prompted to choose whether to open the project in the current window or a new one. You can select 'This Window.' After creating a new project in IntelliJ IDEA, it will be displayed in the project explorer. To import the necessary JARs for Selenium and Cucumber in IntelliJ IDEA, go to File > Project Structure > Modules, similar to how it was done in Eclipse. To add dependencies to your project, click the '+' sign at the bottom of the window and select the necessary JARs or directories. To add Selenium to Cucumber, add the selenium-java and selenium dependencies JARs to the project. Click Apply and then OK. Similar to how Cucumber Selenium is set up in Eclipse, the imported JARs will be located in the "External Libraries" section once they have been added. For a step-by-step guide on using Cucumber for automated testing, be sure to check out our Cucumber testing tutorial on Cucumber in Eclipse and IntelliJ. This article provides detailed instructions and tips for setting up Cucumber in these two popular Java IDEs. Cucumber.js Tutorial With Examples for Selenium JavaScript Using a BDD framework has its own advantages, ones that can help you take your Selenium test automation a long way. Not to sideline, these BDD frameworks help all of your stakeholders to easily interpret the logic behind your test automation script. Leveraging Cucumber.js for your Selenium JavaScript testing can help you specify an acceptance criterion that would be easy for any non-programmer to understand. It could also help you quickly evaluate the logic implied in your Selenium test automation suite without going through huge chunks of code. Cucumber.js, a Behaviour Driven Development framework, has made it easier to understand tests by using a given-when-then structure. For example- Imagine you have $1000 in your bank account. You go to the ATM machine and ask for $200. If the machine is working properly, it will give you $200, and your bank account balance will be $800. 
The machine will also give your card back to you. How To Use Annotations in Cucumber Framework in Selenium [Tutorial] The Cucumber Selenium framework is popular because it uses natural language specifications to define test cases. It allows developers to write test scenarios in plain language, which makes it easier for non-technical stakeholders to understand and review. In Cucumber, test scenarios are written in a feature file using annotations, which are keywords that mark the steps in the scenario. The feature file is then translated into code using step definition files, which contain methods that correspond to each step in the feature file. Cucumber Selenium annotations are used to mark the steps in the feature file and map them to the corresponding methods in the step definition file. There are three main types of annotations in Cucumber: Given, When, and Then. Given annotations are used to set up the initial state of the system under test. When annotations are used to describe the actions that are being tested, and Then annotations are used to verify that the system is in the expected state after the actions have been performed. Cucumber Selenium also provides a number of "hooks," which are methods that are executed before or after certain events in the test execution process. For example, a "before" hook might be used to set up the test environment, while an "after" hook might be used to clean up resources or take screenshots after the test has been completed. There are several types of hooks available in Cucumber, including "before" and "after" hooks, as well as "beforeStep" and "afterStep" hooks, which are executed before and after each step in the test scenario. In this tutorial, we will go over the different types of annotations and hooks that are available in Cucumber and how to use them to write effective test scenarios. We will also look at some best practices for organizing and maintaining your Cucumber Selenium tests. How To Perform Automation Testing With Cucumber and Nightwatch JS To maximize the advantages of test automation, it is important to choose the right test automation framework and strategy so that the quality of the project is not compromised while speeding up the testing cycle, detecting bugs early, and quickly handling repetitive tasks. Automation testing is a crucial aspect of software development, allowing developers to quickly and efficiently validate the functionality and performance of their applications. Cucumber is a testing framework that uses natural language specifications to define test cases, and Nightwatch.js is a tool for automating tests for web applications. Behavior Driven Development (BDD) is a technique that clarifies the behavior of a feature using simple, user-friendly language. This approach makes it easy for anyone to understand the requirements, even those without technical expertise. DSL is also used to create automated test scripts. Cucumber allows users to write scenarios in plain text using Gherkin syntax. This syntax is made up of various keywords, including Feature, Scenario, Given, When, Then, and And. The feature is used to describe the high-level functionality, while Scenario is a collection of steps for Cucumber to execute. Each step is composed of keywords such as Given, When, Then, and And, all of which serve a specific purpose. A Gherkin document is stored in a file with a .feature extension. 
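To make the annotations and hooks described above concrete, here is a minimal sketch of a Java step definition class. It assumes the cucumber-java, selenium-java, and TestNG dependencies discussed in the setup sections; the Gherkin wording, page URL, and element locators are illustrative assumptions rather than part of the original tutorial.

Java

package steps;

import io.cucumber.java.After;
import io.cucumber.java.Before;
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import static org.testng.Assert.assertEquals;

public class LoginSteps {

    private WebDriver driver;

    // "before" hook: set up the test environment before each scenario
    @Before
    public void setUp() {
        driver = new ChromeDriver();
    }

    // Given: establish the initial state of the system under test
    @Given("the user is on the login page")
    public void userIsOnLoginPage() {
        driver.get("https://example.com/login"); // illustrative URL
    }

    // When: describe the action being tested
    @When("the user logs in as {string}")
    public void userLogsInAs(String username) {
        driver.findElement(By.id("username")).sendKeys(username);
        driver.findElement(By.id("login-button")).click();
    }

    // Then: verify the system reached the expected state
    @Then("the dashboard should greet {string}")
    public void dashboardShouldGreet(String expectedName) {
        String greeting = driver.findElement(By.id("greeting")).getText();
        assertEquals(greeting, "Welcome, " + expectedName);
    }

    // "after" hook: clean up resources after each scenario
    @After
    public void tearDown() {
        driver.quit();
    }
}

Each annotated method maps to one Gherkin step: Cucumber matches the step text against the expression in the annotation and injects the {string} parameter into the method.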
Automation Testing With Selenium, Cucumber, and TestNG Automation testing with Selenium, Cucumber, and TestNG is a powerful combination for testing web applications. Selenium is an open-source tool for automating web browsers, and it can be used to automate a wide range of tasks in a web application. Cucumber is a tool for writing and executing acceptance tests, and it can be used to define the expected behavior of a web application in a simple, human-readable language. TestNG is a testing framework for Java that can be used to organize and run Selenium tests. One of the benefits of using Selenium, Cucumber, and TestNG together is that they can be integrated into a continuous integration and delivery (CI/CD) pipeline. This allows developers to automatically run tests as part of their development process, ensuring that any changes to the codebase do not break existing functionality. To use Selenium, Cucumber, and TestNG together, you will need to install the necessary software and dependencies. This typically involves installing Java, Selenium, Cucumber, and TestNG. You will also need to set up a project in your development environment and configure it to use Selenium, Cucumber, and TestNG. Once you have set up your project, you can begin writing tests using Selenium, Cucumber, and TestNG. This involves creating test cases using Cucumber's Given-When-Then syntax and implementing those test cases using Selenium and TestNG. You can then run your tests using TestNG and analyze the results to identify any issues with the application. To sum it up, automation testing with Selenium, Cucumber, and TestNG can be a powerful tool for testing web applications. By integrating these tools into your CI/CD pipeline, you can ensure that your application is tested thoroughly and continuously, helping you to deliver high-quality software to your users. How To Integrate Cucumber With Jenkins Cucumber is a tool for writing and executing acceptance tests, and it is often used in conjunction with continuous integration (CI) tools like Jenkins to automate the testing process. By integrating Cucumber with Jenkins, developers can automatically run acceptance tests as part of their CI pipeline, ensuring that any changes to the codebase do not break existing functionality. To use Cucumber Selenium with Jenkins, you will need to install the necessary software and dependencies. This typically involves installing Java, Cucumber, and Jenkins. You will also need to set up a project in your development environment and configure it to use Cucumber. Once you have set up your project and configured Cucumber, you can begin writing acceptance tests using Cucumber's Given-When-Then syntax. These tests should define the expected behavior of your application in a simple, human-readable language. To run your Cucumber Selenium tests with Jenkins, you will need to set up a Jenkins job and configure it to run your tests. This typically involves specifying the location of your Cucumber test files and any necessary runtime parameters. You can then run your tests by triggering the Jenkins job and analyzing the results to identify any issues with the application. In conclusion, integrating Cucumber with Jenkins can be a powerful way to automate the acceptance testing process. By running your tests automatically as part of your CI pipeline, you can ensure that your application is tested thoroughly and continuously, helping you to deliver high-quality software to your users. 
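As a rough sketch of how the Selenium, Cucumber, and TestNG pieces fit together, the runner below wires Cucumber into TestNG so the same suite can be triggered locally or from a CI job such as Jenkins. The feature path, glue package, and report location are assumptions chosen for illustration.

Java

package runners;

import io.cucumber.testng.AbstractTestNGCucumberTests;
import io.cucumber.testng.CucumberOptions;

// TestNG picks up this class and runs every scenario found under the feature path.
@CucumberOptions(
        features = "src/test/resources/features",    // assumed location of .feature files
        glue = "steps",                               // package containing the step definitions
        plugin = {"pretty", "html:target/cucumber-report.html"}
)
public class RunCucumberTest extends AbstractTestNGCucumberTests {
}

A CI job can then simply invoke the project's test goal (for example, mvn test) and archive the generated report.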
Top Five Cucumber Best Practices for Selenium Automation Cucumber is a popular tool for writing and executing acceptance tests, and following best practices can help ensure that your tests are effective and maintainable. Here you can look at the five best practices for using Cucumber to create robust and maintainable acceptance tests. 1. Keep acceptance tests simple and focused: Cucumber tests should define the expected behavior of your application in a clear and concise manner. Avoid including unnecessary details or extraneous test steps. 2. Use the Given-When-Then syntax: Cucumber's Given-When-Then syntax is a helpful way to structure your acceptance tests. It helps to clearly define the pre-conditions, actions, and expected outcomes of each test. 3. Organize tests with tags: Cucumber allows you to use tags to label and group your tests. This can be useful for organizing tests by feature or by the level of testing (e.g. unit, integration, acceptance). 4. Avoid testing implementation details: Your acceptance tests should focus on the expected behavior of your application rather than the specific implementation details. This will make your tests more robust and easier to maintain. 5. Use data tables to test multiple scenarios: Cucumber's data tables feature allows you to test multiple scenarios with a single test. This can be a useful way to reduce the number of test cases you need to write. By following these best practices when using Cucumber, you can create effective and maintainable acceptance tests that help ensure the quality of your application.
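As a brief illustration of best practice #5 above, here is a hedged sketch of a step definition that consumes a Cucumber data table; the column names and the registerUser helper are hypothetical, and the table itself would live in the corresponding .feature file.

Java

package steps;

import io.cucumber.datatable.DataTable;
import io.cucumber.java.en.When;
import java.util.List;
import java.util.Map;

public class RegistrationSteps {

    // Matches a Gherkin step that is followed by a table with "email" and "password" columns.
    @When("the following users register")
    public void theFollowingUsersRegister(DataTable table) {
        // Each row (minus the header) becomes a Map keyed by the column names.
        List<Map<String, String>> rows = table.asMaps();
        for (Map<String, String> row : rows) {
            registerUser(row.get("email"), row.get("password")); // hypothetical helper
        }
    }

    private void registerUser(String email, String password) {
        // In a real suite this would drive the UI with Selenium or call an API.
        System.out.printf("registering %s%n", email);
    }
}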
The console.log function — the poor man’s debugger — is every JavaScript developer’s best friend. We use it to verify that a certain piece of code was executed or to check the state of the application at a given point in time. We may also use console.warn to send warning messages or console.error to explain what happened when things have gone wrong. Logging makes it easy to debug your app during local development. But what about debugging your Node.js app while it’s running in a hosted cloud environment? The logs are kept on the server, to which you may or may not have access. How do you view your logs then? Most companies use application performance monitoring tools and observability tools for better visibility into their hosted apps. For example, you might send your logs to a log aggregator like Datadog, Sumo Logic, or Papertrail where logs can be viewed and queried. In this article, we’ll look at how we can configure an app that is hosted on Render to send its system logs to Papertrail by using Render Log Streams. By the end, you’ll have your app up and running — and logging — in no time. Creating Our Node.JS App and Hosting It With Render Render is a cloud hosting platform made for developers by developers. With Render, you can easily host your static sites, web services, cron jobs, and more. We’ll start with a simple Node.js and Express app for our demo. You can find the GitHub repo here. You can also view the app here. To follow along on your machine, fork the repo so that you have a copy running locally. You can install the project’s dependencies by running yarn install, and you can start the app by running yarn start. Easy enough! Render Log Streams demo app Now it’s time to get our app running on Render. If you don’t have a Render account yet, create one now. It’s free! Once you’re logged in, click the “New” button and then choose the “Web Service” option from the menu. Creating a new web service This will take you to the next page where you’ll select the GitHub repo you’d like to connect. If you haven’t connected your GitHub account yet, you can do so here. And if you have connected your GitHub account but haven’t given Render access to your specific repo yet, you can click the “Configure account” button. This will take you to GitHub, where you can grant access to all your repos or just a selection of them. Connecting your GitHub repo Back on Render, after connecting to your repo, you’ll be taken to a configuration page. Give your app a name (I chose the same name as my repo, but it can be anything), and then provide the correct build command (yarn, which is a shortcut for yarn install) and start command (yarn start). Choose your instance type (free tier), and then click the “Create Web Service” button at the bottom of the page to complete your configuration setup. Configuring your app With that, Render will deploy your app. You did it! You now have an app hosted on Render’s platform. Log output from your Render app’s first deployment Creating Our Papertrail Account Let’s now create a Papertrail account. Papertrail is a log aggregator tool that helps make log management easy. You can create an account for free — no credit card is required. Once you’ve created your account, click on the “Add your first system” button to get started. Adding your first system in Papertrail This will take you to the next page which provides you with your syslog endpoint at the top of the screen. 
There are also instructions for running an install script, but in our case, we don’t actually need to install anything! So just copy that syslog endpoint, and we’ll paste it in just a bit. Syslog endpoint Connecting Our Render App to Papertrail We now have an app hosted on Render, and we have a Papertrail account for logging. Let’s connect the two! Back in the Render dashboard, click on your avatar in the global navigation, then choose “Account Settings” from the drop-down menu. Render account settings Then in the secondary side navigation, click on the “Log Streams” tab. Once on that page, you can click the “Add Log Stream” button, which will open a modal. Paste your syslog endpoint from Papertrail into the “Log Endpoint” input field, and then click “Add Log Stream” to save your changes. Adding your log stream You should now see your Log Stream endpoint shown in Render’s dashboard. Render Log Stream dashboard Great! We’ve connected Render to Papertrail. What’s neat is that we’ve set up this connection for our entire Render account, so we don’t have to configure it for each individual app hosted on Render. Adding Logs to Our Render App Now that we have our logging configured, let’s take it for a test run. In our GitHub repo’s code, we have the following in our app.js file: JavaScript app.get('/', (req, res) => { console.log('Log - home page'); console.info('Info - home page'); console.warn('Warn - home page'); console.error('Error - home page'); console.debug('Debug - home page'); return res.sendFile('index.html', { root: 'public' }); }); When a request is made to the root URL of our app, we do a bit of logging and then send the index.html file to the client. The user doesn’t see any of the logs since these are server-side rather than client-side logs. Instead, the logs are kept on our server, which, again, is hosted on Render. To generate the logs, open your demo app in your browser. This will trigger a request for the home page. If you’re following along, your app URL will be different from mine, but my app is hosted here. Viewing Logs in Papertrail Let’s go find those logs in Papertrail. After all, they were logged to our server, but our server is hosted on Render. In your Papertrail dashboard, you should see at least two systems: one for Render itself, which was used to test the account connection, and one for your Render app (“render-log-stream-demo” in my case). Papertrail systems Click on the system for your Render app, and you’ll see a page where all the logs are shown and tailed, with the latest logs appearing at the bottom of the screen. Render app logs in Papertrail You can see that we have logs for many events, not just the data that we chose to log from our app.js file. These are the syslogs, so you also get helpful log data from when Render was installing dependencies and deploying your app! At the bottom of the page, we can enter search terms to query our logs. We don’t have many logs here yet, but where you’re running a web service that gets millions of requests per day, these log outputs can get very large very quickly. Searching logs in Papertrail Best Practices for Logging This leads us to some good questions: Now that we have logging set up, what exactly should we be logging? And how should we be formatting our logs so that they’re easy to query when we need to find them? What you’re logging and why you’re logging something will vary by situation. You may be adding logs after a customer issue is reported that you’re unable to reproduce locally. 
By adding logs to your app, you can get better visibility into what’s happening live in production. This is a reactive form of logging in which you’re adding new logs to certain files and functions after you realize you need them. As a more proactive form of logging, there may be important business transactions that you want to log all the time, such as account creation or order placement. This will give you greater peace of mind that events are being processed as expected throughout the day. It will also help you see the volume of events generated in any given interval. And, when things do go wrong, you’ll be able to pinpoint when your log output changed. How you format your logs is up to you, but you should be consistent in your log structure. In our example, we just logged text strings, but it would be even better to log our data in JSON format. With JSON, we can include key-value pairs for all of our messages. For each message, we might choose to include data for the user ID, the timestamp, the actual message text, and more. The beauty of JSON is that it makes querying your logs much easier, especially when viewing them in a log aggregator tool that contains thousands or millions of other messages. Conclusion There you have it — how to host your app on Render and configure logging with Render Log Streams and Papertrail. Both platforms only took minutes to set up, and now we can manage our logs with ease. Keep in mind that Render Log Streams let you send your logs to any of several different log aggregators, giving you lots of options. For example, Render logs can be sent to Sumo Logic. You just need to create a Cloud Syslog Source in your Sumo Logic account. Or, you can send your logs to Datadog as well. With that, it’s time for me to log off. Thanks for reading, Happy coding, and happy logging!
The Java collection framework provides a variety of classes and interfaces, such as lists, sets, queues, and maps, for managing and storing collections of related objects. In this blog, we go over best practices and tips for using the Java collection framework effectively. What Is a Collection Framework? The Java collection framework is a key element of Java programming. To effectively use the Java collection framework, consider factors like utilizing the enhanced for loop and generics, avoiding raw types, and selecting the right collection. Choosing the Right Collection for the Task Each collection class has its own distinct set of qualities and is made to be used for a particular function. Following are some descriptions of each kind of collection: List: The ArrayList class is the most widely used list implementation in Java, providing resizable arrays when it is unknown how large the collection will be. Set: The HashSet class is the most popular implementation of a set in Java, providing uniqueness with a hash-table-based implementation. Queue: The LinkedList class is the most popular Java implementation of a queue, allowing elements to be accessed in a specific order. Map: The HashMap class of Java is the most popular map implementation for storing and retrieving data based on distinct keys. Factors to Consider While Choosing a Collection Type of data: Different collections may be more suitable depending on the kind of data that will be handled and stored. Ordering: A list or queue is preferable to a set or map when the order of items is important. Duplicate elements: A set or map may be a better option than a list or queue if duplicate elements are not allowed. Performance: Different collections have different performance characteristics. By picking the right collection, you can improve the performance of your code. Examples of Use Cases for Different Collections Lists: Lists allow for the storage and modification of ordered data, such as a to-do list or shopping list. Set: A set can be used to store unique items, such as email addresses. Queue: A queue can be used to access elements in a specific order, such as handling jobs in the order they are received. Map: A map can be used to store and access data based on unique keys, such as user preferences. Selecting the right collection for a Java application is essential, taking into account data type, ordering, duplicate elements, and performance requirements. This will increase code effectiveness and efficiency. Using the Correct Methods and Interfaces In this section, the various methods and interfaces that the collection framework provides will be covered, along with some tips on how to effectively use them. Choosing the Right Collection: The collection framework provides a variety of collection types to improve code speed and readability, such as lists, sets, queues, maps, and deques. Using Iterators: Iterators are crucial for traversing collections, but if the underlying collection is modified during iteration, they fail and throw a ConcurrentModificationException. Use a CopyOnWriteArrayList or ConcurrentHashMap to avoid this. Using Lambda Expressions: Lambda expressions in Java 8 allow programmers to write code that can be used as an argument to a method and can be combined with the filter() and map() methods of the Stream API to process collections. Using the Stream API: The Stream API is a powerful feature in Java 8 that enables functional-style collection processing that is lazy and can be parallelized, resulting in better performance.
Using Generics: Generics are a powerful feature introduced in Java 5 that allows you to write type-safe code. They are especially useful when working with collections, as they allow you to specify the types of elements that a collection can contain. When more flexibility is needed, for example in method parameters, the wildcard operator (?) can be used alongside generics. The Java collection framework provides methods and interfaces to improve code efficiency, readability, and maintainability. Iterators, lambda expressions, the Stream API, and generics can be used to improve performance and avoid common pitfalls. Best Practices for Collection Usage In this section, we will explore some important best practices for collection usage. Proper Initialization and Declaration of Collections Collections should be initialized correctly before use to avoid null pointer exceptions. Use the appropriate interface or class to declare the collection for uniqueness or order. Using Generics to Ensure Type Safety Generics provide type safety by allowing us to specify the type of objects that can be stored in a collection, allowing us to catch type mismatch errors at compile time. When declaring a collection, specify the type using angle brackets (<>). For example, List<String> ensures that only String objects can be added to the list. Employing the Appropriate Interfaces for Flexibility The Java collection framework provides a variety of interfaces, allowing us to easily switch implementations and take advantage of polymorphism to write code that is more modular and reusable. Understanding the Behavior of Different Collection Methods It is important to understand the behavior of collection methods to use them effectively. To gain a thorough understanding, consult Java documentation or reliable sources. Understanding the complexity of operations like contains() and remove() can make a difference in code performance. Handling Null Values and Empty Collections To prevent unexpected errors or undesirable behavior, it's crucial to handle null values and empty collections properly. Check that collections are not null and have the required data to prevent errors. Memory and Performance Optimization In this section, we will explore techniques and best practices to optimize memory utilization and enhance the performance of collections in Java as follows: 1. Minimizing the Memory Footprint With the Right Collection Implementation Memory usage can be significantly decreased by selecting the best collection implementation for the job. When frequent random access is required, for instance, using an ArrayList rather than a LinkedList can reduce memory overhead. 2. Efficient Iteration Over Collections It is common practice to iterate over collections, so picking the most effective iteration strategy is crucial. In comparison to conventional loops, using iterator-based loops or enhanced for-each loops can offer better performance. 3. Considering Alternative Collection Libraries for Specific Use Cases The Java collection framework offers a wide range of collection types, but in some cases, alternative libraries like Guava or Apache Commons Collections can provide additional features and better performance for specific use cases. 4. Utilizing Parallel Processing With Collections for Improved Performance With the advent of multi-core processors, leveraging parallel processing techniques can enhance the performance of operations performed on large collections. The Java Stream API provides support for parallel execution, allowing for efficient processing of data in parallel, as the sketch below illustrates.
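Here is a short, self-contained sketch of several of the points above: generic type safety, safe iteration while the collection is being modified, and lazy and parallel Stream processing. The list contents are arbitrary examples, not data from the original article.

Java

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

public class CollectionTips {
    public static void main(String[] args) {
        // Generics: only Strings can be added; type mismatches fail at compile time.
        List<String> cities = new CopyOnWriteArrayList<>(
                List.of("Oslo", "Lima", "Cairo", "Osaka"));

        // CopyOnWriteArrayList lets us iterate safely even while the list is modified,
        // avoiding ConcurrentModificationException (the iterator sees a snapshot).
        for (String city : cities) {
            if (city.startsWith("O")) {
                cities.add(city.toLowerCase()); // safe: does not affect the current iteration
            }
        }

        // Stream API: lazy, declarative processing with filter() and map().
        List<Integer> lengths = cities.stream()
                .filter(c -> c.length() > 4)
                .map(String::length)
                .collect(Collectors.toList());

        // Parallel stream: the same kind of pipeline spread across available cores.
        long total = cities.parallelStream()
                .mapToInt(String::length)
                .sum();

        System.out.println(lengths + " / total=" + total);
    }
}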
Tips and Tricks for Effective Collection Usage Using the Right Data Structures for Specific Tasks Each data structure comes with its own advantages and disadvantages, so choose the one that fits the task at hand to make wise decisions and improve performance. Making Use of Utility Methods in the Collections Class The Collections class in Java provides utility methods to simplify and streamline collection operations, such as sorting, searching, shuffling, and reversing. Leveraging Third-Party Libraries and Frameworks for Enhanced Functionality The Java collection framework provides a wide range of data structures, but third-party libraries and frameworks can provide more advanced features and unique data structures. These libraries can boost productivity, give access to more powerful collection options, and address use cases that the built-in Java collections cannot. Optimizing Collections for Specific Use Cases Immutable collections offer better thread safety and can be shared without defensive copying. Sizing dynamic collections appropriately (for example, giving an ArrayList an initial capacity) can prevent frequent resizing and enhance performance. Specialized collections like HashSet or TreeMap can improve efficiency for unique or sorted elements. Optimize collections to improve performance, readability, and maintainability. Conclusion In this blog post, we have covered best practices and tips for using the Java collection framework effectively. To sum up, the Java collection framework is a crucial component of Java programming. You can use the collection framework effectively and create more effective, maintainable code by adhering to these best practices.
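Before moving on, here is a small sketch illustrating the utility-method and immutability tips from this section, using only the JDK's built-in Collections helpers; the data is arbitrary.

Java

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UtilityMethodDemo {
    public static void main(String[] args) {
        List<String> tasks = new ArrayList<>(List.of("deploy", "build", "test"));

        // Sorting, searching, and reversing via the Collections class.
        Collections.sort(tasks);                           // [build, deploy, test]
        int idx = Collections.binarySearch(tasks, "test"); // requires a sorted list
        Collections.reverse(tasks);                        // [test, deploy, build]

        // An unmodifiable view: readers can share it without defensive copying.
        List<String> readOnlyTasks = Collections.unmodifiableList(tasks);

        System.out.println(readOnlyTasks + ", index of 'test' was " + idx);
        // readOnlyTasks.add("release"); // would throw UnsupportedOperationException
    }
}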
The reason why we need a Transactional Outbox is that a service often needs to publish messages as part of a transaction that updates the database. Both the database update and the sending of the message must happen within a transaction. Otherwise, if the service doesn't perform these two operations atomically, a failure could leave the system in an inconsistent state. The source code for this article is available in the GitHub repository. In this article, we will implement the pattern using Reactive Spring and Kotlin with Coroutines. Here is the full list of dependencies used: Kotlin with Coroutines, Spring Boot 3, WebFlux, R2DBC, Postgres, MongoDB, Kafka, Grafana, Prometheus, Zipkin, and Micrometer for observability. The Transactional Outbox pattern solves the problem with the naive implementation, where the transaction updates the database table, then publishes a message to the broker, and finally commits. The problem is this: if the last step of the transaction fails, the transaction rolls back the database changes, but the event has already been published to the broker. So, we need a way to guarantee that the database write and the publish to the broker either both happen or neither does. The idea is to save the order to the orders table and, in the same transaction, save the event to the outbox table, then commit. Afterwards, we have to publish the saved events from the outbox table to the broker. We have two ways to do that: a CDC (change data capture) tool like Debezium, which continuously monitors your database and lets any of your applications stream every row-level change in the same order it was committed, or a polling publisher. For this project, we use a polling publisher. I highly recommend Chris Richardson's book Microservices Patterns, where the Transactional Outbox pattern is explained very well. One more important thing: we have to be ready for cases where the same event is published more than once, so the consumer must be idempotent. Idempotence describes the reliability of messages in a distributed system, specifically the reception of duplicated messages. Because of retries or message broker features, a message sent once can be received multiple times by consumers. A service is idempotent if processing the same event multiple times results in the same state and output as processing that event just a single time. The reception of a duplicated event does not change the application state or behavior. Most of the time, an idempotent service detects these events and ignores them. Idempotence can be implemented using unique identifiers (see the short consumer sketch below). So, let's implement it. The business logic of our example microservice is simple: orders with product shop items, which means two tables for simplicity, plus an outbox table, of course. An outbox table usually has a data field in which we store serialized events. The most common format is JSON, but it's up to you and the concrete microservice. In the data field we can store state changes, or we can simply store the full, last-updated order domain entity each time; of course, state changes take up much less space, but again, it's up to you. Other fields in the outbox table usually include the event type, timestamp, version, and other metadata. It depends on each concrete implementation, but this is often the required minimum. The version field is used for concurrency control. All UIs are available at the following URLs: Swagger UI, Grafana UI, Zipkin UI, Kafka UI, Prometheus UI.
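The idempotent-consumer idea mentioned above can be sketched in a few lines. The example below is plain Java rather than the article's Kotlin stack, and the event type and in-memory ID store are illustrative assumptions only; a real consumer would persist processed event IDs (for example, in a dedicated table updated in the same transaction as the state change).

Java

import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Minimal illustration of duplicate detection using unique event identifiers.
public class IdempotentOrderEventConsumer {

    // Stand-in for a durable store of already-processed event IDs.
    private final Set<UUID> processedEventIds = ConcurrentHashMap.newKeySet();

    public void handle(UUID eventId, String payload) {
        // add() returns false if the ID was already present, i.e. a duplicate delivery.
        if (!processedEventIds.add(eventId)) {
            // Duplicate: ignore it so the application state does not change twice.
            return;
        }
        applyToReadModel(payload);
    }

    private void applyToReadModel(String payload) {
        // In the article's service this could, for example, update the MongoDB view of the order.
        System.out.println("applied event: " + payload);
    }
}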
The docker-compose file for this article has Postgres, MongoDB, zookeeper, Kafka, Kafka-ui, Zipkin, Prometheus, and Grafana, For local development run: use make local or make develop, first run only docker-compose, second same include the microservice image. YAML version: "3.9" services: microservices_postgresql: image: postgres:latest container_name: microservices_postgresql expose: - "5432" ports: - "5432:5432" restart: always environment: - POSTGRES_USER=postgres - POSTGRES_PASSWORD=postgres - POSTGRES_DB=microservices - POSTGRES_HOST=5432 command: -p 5432 volumes: - ./docker_data/microservices_pgdata:/var/lib/postgresql/data networks: [ "microservices" ] zoo1: image: confluentinc/cp-zookeeper:7.3.0 hostname: zoo1 container_name: zoo1 ports: - "2181:2181" environment: ZOOKEEPER_CLIENT_PORT: 2181 ZOOKEEPER_SERVER_ID: 1 ZOOKEEPER_SERVERS: zoo1:2888:3888 volumes: - "./zookeeper:/zookeeper" networks: [ "microservices" ] kafka1: image: confluentinc/cp-kafka:7.3.0 hostname: kafka1 container_name: kafka1 ports: - "9092:9092" - "29092:29092" - "9999:9999" environment: KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka1:19092,EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092,DOCKER://host.docker.internal:29092 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT,DOCKER:PLAINTEXT KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL KAFKA_ZOOKEEPER_CONNECT: "zoo1:2181" KAFKA_BROKER_ID: 1 KAFKA_LOG4J_LOGGERS: "kafka.controller=INFO,kafka.producer.async.DefaultEventHandler=INFO,state.change.logger=INFO" KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_JMX_PORT: 9999 KAFKA_JMX_HOSTNAME: ${DOCKER_HOST_IP:-127.0.0.1} KAFKA_AUTHORIZER_CLASS_NAME: kafka.security.authorizer.AclAuthorizer KAFKA_ALLOW_EVERYONE_IF_NO_ACL_FOUND: "true" depends_on: - zoo1 volumes: - "./kafka_data:/kafka" networks: [ "microservices" ] kafka-ui: image: provectuslabs/kafka-ui container_name: kafka-ui ports: - "8086:8080" restart: always environment: - KAFKA_CLUSTERS_0_NAME=local - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka1:19092 networks: [ "microservices" ] zipkin-all-in-one: image: openzipkin/zipkin:latest restart: always ports: - "9411:9411" networks: [ "microservices" ] mongo: image: mongo restart: always ports: - "27017:27017" environment: MONGO_INITDB_ROOT_USERNAME: admin MONGO_INITDB_ROOT_PASSWORD: admin MONGODB_DATABASE: bank_accounts networks: [ "microservices" ] prometheus: image: prom/prometheus:latest container_name: prometheus ports: - "9090:9090" command: - --config.file=/etc/prometheus/prometheus.yml volumes: - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro networks: [ "microservices" ] node_exporter: container_name: microservices_node_exporter restart: always image: prom/node-exporter ports: - '9101:9100' networks: [ "microservices" ] grafana: container_name: microservices_grafana restart: always image: grafana/grafana ports: - '3000:3000' networks: [ "microservices" ] networks: microservices: name: microservices The Postgres database schema for this project is: Orders domain REST Controller has the following methods: Kotlin @RestController @RequestMapping(path = ["/api/v1/orders"]) class OrderController(private val orderService: OrderService, private val or: ObservationRegistry) { @GetMapping @Operation(method = "getOrders", summary = "get order with pagination", operationId = "getOrders") suspend fun getOrders( @RequestParam(name = "page", defaultValue = "0") page: Int, @RequestParam(name = "size", defaultValue 
= "20") size: Int, ) = coroutineScopeWithObservation(GET_ORDERS, or) { observation -> ResponseEntity.ok() .body(orderService.getAllOrders(PageRequest.of(page, size)) .map { it.toSuccessResponse() } .also { response -> observation.highCardinalityKeyValue("response", response.toString()) } ) } @GetMapping(path = ["{id}"]) @Operation(method = "getOrderByID", summary = "get order by id", operationId = "getOrderByID") suspend fun getOrderByID(@PathVariable id: String) = coroutineScopeWithObservation(GET_ORDER_BY_ID, or) { observation -> ResponseEntity.ok().body(orderService.getOrderWithProductsByID(UUID.fromString(id)).toSuccessResponse()) .also { response -> observation.highCardinalityKeyValue("response", response.toString()) log.info("getOrderByID response: $response") } } @PostMapping @Operation(method = "createOrder", summary = "create new order", operationId = "createOrder") suspend fun createOrder(@Valid @RequestBody createOrderDTO: CreateOrderDTO) = coroutineScopeWithObservation(CREATE_ORDER, or) { observation -> ResponseEntity.status(HttpStatus.CREATED).body(orderService.createOrder(createOrderDTO.toOrder()).toSuccessResponse()) .also { log.info("created order: $it") observation.highCardinalityKeyValue("response", it.toString()) } } @PutMapping(path = ["/add/{id}"]) @Operation(method = "addProductItem", summary = "add to the order product item", operationId = "addProductItem") suspend fun addProductItem( @PathVariable id: UUID, @Valid @RequestBody dto: CreateProductItemDTO ) = coroutineScopeWithObservation(ADD_PRODUCT, or) { observation -> ResponseEntity.ok().body(orderService.addProductItem(dto.toProductItem(id))) .also { observation.highCardinalityKeyValue("CreateProductItemDTO", dto.toString()) observation.highCardinalityKeyValue("id", id.toString()) log.info("addProductItem id: $id, dto: $dto") } } @PutMapping(path = ["/remove/{orderId}/{productItemId}"]) @Operation(method = "removeProductItem", summary = "remove product from the order", operationId = "removeProductItem") suspend fun removeProductItem( @PathVariable orderId: UUID, @PathVariable productItemId: UUID ) = coroutineScopeWithObservation(REMOVE_PRODUCT, or) { observation -> ResponseEntity.ok().body(orderService.removeProductItem(orderId, productItemId)) .also { observation.highCardinalityKeyValue("productItemId", productItemId.toString()) observation.highCardinalityKeyValue("orderId", orderId.toString()) log.info("removeProductItem orderId: $orderId, productItemId: $productItemId") } } @PutMapping(path = ["/pay/{id}"]) @Operation(method = "payOrder", summary = "pay order", operationId = "payOrder") suspend fun payOrder(@PathVariable id: UUID, @Valid @RequestBody dto: PayOrderDTO) = coroutineScopeWithObservation(PAY_ORDER, or) { observation -> ResponseEntity.ok().body(orderService.pay(id, dto.paymentId).toSuccessResponse()) .also { observation.highCardinalityKeyValue("response", it.toString()) log.info("payOrder result: $it") } } @PutMapping(path = ["/cancel/{id}"]) @Operation(method = "cancelOrder", summary = "cancel order", operationId = "cancelOrder") suspend fun cancelOrder(@PathVariable id: UUID, @Valid @RequestBody dto: CancelOrderDTO) = coroutineScopeWithObservation(CANCEL_ORDER, or) { observation -> ResponseEntity.ok().body(orderService.cancel(id, dto.reason).toSuccessResponse()) .also { observation.highCardinalityKeyValue("response", it.toString()) log.info("cancelOrder result: $it") } } @PutMapping(path = ["/submit/{id}"]) @Operation(method = "submitOrder", summary = "submit order", operationId = "submitOrder") 
suspend fun submitOrder(@PathVariable id: UUID) = coroutineScopeWithObservation(SUBMIT_ORDER, or) { observation -> ResponseEntity.ok().body(orderService.submit(id).toSuccessResponse()) .also { observation.highCardinalityKeyValue("response", it.toString()) log.info("submitOrder result: $it") } } @PutMapping(path = ["/complete/{id}"]) @Operation(method = "completeOrder", summary = "complete order", operationId = "completeOrder") suspend fun completeOrder(@PathVariable id: UUID) = coroutineScopeWithObservation(COMPLETE_ORDER, or) { observation -> ResponseEntity.ok().body(orderService.complete(id).toSuccessResponse()) .also { observation.highCardinalityKeyValue("response", it.toString()) log.info("completeOrder result: $it") } } } As I mentioned earlier, the main idea of the transactional outbox implementation is that, in the first step, we write to the orders and outbox tables in one transaction and commit it. As an additional, optional optimization, we can, in the same method, publish the event and delete it from the outbox table right after the transaction commits successfully. If publishing to the broker or deleting from the outbox table fails at this point, that's OK, because the polling publisher runs as a scheduled process anyway. It's a small optimization and improvement, and it's not mandatory for implementing the outbox pattern. Try both variants and choose the best one for your case. In our case, we are using Kafka, so we have to keep the producer acks setting in mind. When acks=0, producers consider messages as "written successfully" the moment the message is sent, without waiting for the broker to accept it at all. If the broker goes offline or an exception happens, we won't know and will lose data, so be careful with this setting and don't use acks=0. When acks=1, producers consider messages as "written successfully" when the message is acknowledged only by the leader. When acks=all, producers consider messages as "written successfully" when the message is accepted by all in-sync replicas (ISR).
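To tie the acks discussion to concrete configuration, here is a minimal plain-Java producer setup; the Spring/reactive setup in the article would set the same properties through configuration. The topic name, key, and bootstrap address are assumptions for illustration.

Java

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OutboxKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // acks=all: the leader waits for all in-sync replicas before acknowledging,
        // which matters when the outbox record is deleted only after a successful publish.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotent producer avoids duplicates introduced by producer retries;
        // consumer-side idempotence is still required, as discussed earlier.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("order-events", "orderId-123",
                    "{\"eventType\":\"ORDER_CREATED\"}"));
        }
    }
}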
In the simplified sequence diagram for service layer business logic, steps 5 and 6 are optional and not required optimization because we have polling publisher anyway: The order service implementation: Kotlin interface OrderService { suspend fun createOrder(order: Order): Order suspend fun getOrderByID(id: UUID): Order suspend fun addProductItem(productItem: ProductItem) suspend fun removeProductItem(orderID: UUID, productItemId: UUID) suspend fun pay(id: UUID, paymentId: String): Order suspend fun cancel(id: UUID, reason: String?): Order suspend fun submit(id: UUID): Order suspend fun complete(id: UUID): Order suspend fun getOrderWithProductsByID(id: UUID): Order suspend fun getAllOrders(pageable: Pageable): Page<Order> suspend fun deleteOutboxRecordsWithLock() } Kotlin @Service class OrderServiceImpl( private val orderRepository: OrderRepository, private val productItemRepository: ProductItemRepository, private val outboxRepository: OrderOutboxRepository, private val orderMongoRepository: OrderMongoRepository, private val txOp: TransactionalOperator, private val eventsPublisher: EventsPublisher, private val kafkaTopicsConfiguration: KafkaTopicsConfiguration, private val or: ObservationRegistry, private val outboxEventSerializer: OutboxEventSerializer ) : OrderService { override suspend fun createOrder(order: Order): Order = coroutineScopeWithObservation(CREATE, or) { observation -> txOp.executeAndAwait { orderRepository.insert(order).let { val productItemsEntityList = ProductItemEntity.listOf(order.productsList(), UUID.fromString(it.id)) val insertedItems = productItemRepository.insertAll(productItemsEntityList).toList() it.addProductItems(insertedItems.map { item -> item.toProductItem() }) Pair(it, outboxRepository.save(outboxEventSerializer.orderCreatedEventOf(it))) } }.run { observation.highCardinalityKeyValue("order", first.toString()) observation.highCardinalityKeyValue("outboxEvent", second.toString()) publishOutboxEvent(second) first } } override suspend fun addProductItem(productItem: ProductItem): Unit = coroutineScopeWithObservation(ADD_PRODUCT, or) { observation -> txOp.executeAndAwait { val order = orderRepository.findOrderByID(UUID.fromString(productItem.orderId)) order.incVersion() val updatedProductItem = productItemRepository.upsert(productItem) val savedRecord = outboxRepository.save( outboxEventSerializer.productItemAddedEventOf( order, productItem.copy(version = updatedProductItem.version).toEntity() ) ) orderRepository.updateVersion(UUID.fromString(order.id), order.version) .also { result -> log.info("addOrderItem result: $result, version: ${order.version}") } savedRecord }.run { observation.highCardinalityKeyValue("outboxEvent", this.toString()) publishOutboxEvent(this) } } override suspend fun removeProductItem(orderID: UUID, productItemId: UUID): Unit = coroutineScopeWithObservation(REMOVE_PRODUCT, or) { observation -> txOp.executeAndAwait { if (!productItemRepository.existsById(productItemId)) throw ProductItemNotFoundException(productItemId) val order = orderRepository.findOrderByID(orderID) productItemRepository.deleteById(productItemId) order.incVersion() val savedRecord = outboxRepository.save(outboxEventSerializer.productItemRemovedEventOf(order, productItemId)) orderRepository.updateVersion(UUID.fromString(order.id), order.version) .also { log.info("removeProductItem update order result: $it, version: ${order.version}") } savedRecord }.run { observation.highCardinalityKeyValue("outboxEvent", this.toString()) publishOutboxEvent(this) } } override suspend fun 
pay(id: UUID, paymentId: String): Order = coroutineScopeWithObservation(PAY, or) { observation -> txOp.executeAndAwait { val order = orderRepository.getOrderWithProductItemsByID(id) order.pay(paymentId) val updatedOrder = orderRepository.update(order) Pair(updatedOrder, outboxRepository.save(outboxEventSerializer.orderPaidEventOf(updatedOrder, paymentId))) }.run { observation.highCardinalityKeyValue("order", first.toString()) observation.highCardinalityKeyValue("outboxEvent", second.toString()) publishOutboxEvent(second) first } } override suspend fun cancel(id: UUID, reason: String?): Order = coroutineScopeWithObservation(CANCEL, or) { observation -> txOp.executeAndAwait { val order = orderRepository.findOrderByID(id) order.cancel() val updatedOrder = orderRepository.update(order) Pair(updatedOrder, outboxRepository.save(outboxEventSerializer.orderCancelledEventOf(updatedOrder, reason))) }.run { observation.highCardinalityKeyValue("order", first.toString()) observation.highCardinalityKeyValue("outboxEvent", second.toString()) publishOutboxEvent(second) first } } override suspend fun submit(id: UUID): Order = coroutineScopeWithObservation(SUBMIT, or) { observation -> txOp.executeAndAwait { val order = orderRepository.getOrderWithProductItemsByID(id) order.submit() val updatedOrder = orderRepository.update(order) updatedOrder.addProductItems(order.productsList()) Pair(updatedOrder, outboxRepository.save(outboxEventSerializer.orderSubmittedEventOf(updatedOrder))) }.run { observation.highCardinalityKeyValue("order", first.toString()) observation.highCardinalityKeyValue("outboxEvent", second.toString()) publishOutboxEvent(second) first } } override suspend fun complete(id: UUID): Order = coroutineScopeWithObservation(COMPLETE, or) { observation -> txOp.executeAndAwait { val order = orderRepository.findOrderByID(id) order.complete() val updatedOrder = orderRepository.update(order) log.info("order submitted: ${updatedOrder.status} for id: $id") Pair(updatedOrder, outboxRepository.save(outboxEventSerializer.orderCompletedEventOf(updatedOrder))) }.run { observation.highCardinalityKeyValue("order", first.toString()) observation.highCardinalityKeyValue("outboxEvent", second.toString()) publishOutboxEvent(second) first } } @Transactional(readOnly = true) override suspend fun getOrderWithProductsByID(id: UUID): Order = coroutineScopeWithObservation(GET_ORDER_WITH_PRODUCTS_BY_ID, or) { observation -> orderRepository.getOrderWithProductItemsByID(id).also { observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun getAllOrders(pageable: Pageable): Page<Order> = coroutineScopeWithObservation(GET_ALL_ORDERS, or) { observation -> orderMongoRepository.getAllOrders(pageable).also { observation.highCardinalityKeyValue("pageResult", it.toString()) } } override suspend fun deleteOutboxRecordsWithLock() = coroutineScopeWithObservation(DELETE_OUTBOX_RECORD_WITH_LOCK, or) { observation -> outboxRepository.deleteOutboxRecordsWithLock { observation.highCardinalityKeyValue("outboxEvent", it.toString()) eventsPublisher.publish(getTopicName(it.eventType), it) } } override suspend fun getOrderByID(id: UUID): Order = coroutineScopeWithObservation(GET_ORDER_BY_ID, or) { observation -> orderMongoRepository.getByID(id.toString()) .also { log.info("getOrderByID: $it") } .also { observation.highCardinalityKeyValue("order", it.toString()) } } private suspend fun publishOutboxEvent(event: OutboxRecord) = coroutineScopeWithObservation(PUBLISH_OUTBOX_EVENT, or) { observation -> try { log.info("publishing 
outbox event: $event") outboxRepository.deleteOutboxRecordByID(event.eventId!!) { eventsPublisher.publish(getTopicName(event.eventType), event.aggregateId.toString(), event) } log.info("outbox event published and deleted: $event") observation.highCardinalityKeyValue("event", event.toString()) } catch (ex: Exception) { log.error("exception while publishing outbox event: ${ex.localizedMessage}") observation.error(ex) } } } Order and product items Postgres repositories are a combination of CoroutineCrudRepository and custom implementation using DatabaseClient and R2dbcEntityTemplate, supporting optimistic and pessimistic locking, depending on method requirements. Kotlin @Repository interface OrderRepository : CoroutineCrudRepository<OrderEntity, UUID>, OrderBaseRepository @Repository interface OrderBaseRepository { suspend fun getOrderWithProductItemsByID(id: UUID): Order suspend fun updateVersion(id: UUID, newVersion: Long): Long suspend fun findOrderByID(id: UUID): Order suspend fun insert(order: Order): Order suspend fun update(order: Order): Order } @Repository class OrderBaseRepositoryImpl( private val dbClient: DatabaseClient, private val entityTemplate: R2dbcEntityTemplate, private val or: ObservationRegistry ) : OrderBaseRepository { override suspend fun updateVersion(id: UUID, newVersion: Long): Long = coroutineScopeWithObservation(UPDATE_VERSION, or) { observation -> dbClient.sql("UPDATE microservices.orders SET version = (version + 1) WHERE id = :id AND version = :version") .bind(ID, id) .bind(VERSION, newVersion - 1) .fetch() .rowsUpdated() .awaitSingle() .also { log.info("for order with id: $id version updated to $newVersion") } .also { observation.highCardinalityKeyValue("id", id.toString()) observation.highCardinalityKeyValue("newVersion", newVersion.toString()) } } override suspend fun getOrderWithProductItemsByID(id: UUID): Order = coroutineScopeWithObservation(GET_ORDER_WITH_PRODUCTS_BY_ID, or) { observation -> dbClient.sql( """SELECT o.id, o.email, o.status, o.address, o.version, o.payment_id, o.created_at, o.updated_at, |pi.id as productId, pi.price, pi.title, pi.quantity, pi.order_id, pi.version as itemVersion, pi.created_at as itemCreatedAt, pi.updated_at as itemUpdatedAt |FROM microservices.orders o |LEFT JOIN microservices.product_items pi on o.id = pi.order_id |WHERE o.id = :id""".trimMargin() ) .bind(ID, id) .map { row, _ -> Pair(OrderEntity.of(row), ProductItemEntity.of(row)) } .flow() .toList() .let { orderFromList(it) } .also { log.info("getOrderWithProductItemsByID order: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun findOrderByID(id: UUID): Order = coroutineScopeWithObservation(FIND_ORDER_BY_ID, or) { observation -> val query = Query.query(Criteria.where(ID).`is`(id)) entityTemplate.selectOne(query, OrderEntity::class.java).awaitSingleOrNull()?.toOrder() .also { observation.highCardinalityKeyValue("order", it.toString()) } ?: throw OrderNotFoundException(id) } override suspend fun insert(order: Order): Order = coroutineScopeWithObservation(INSERT, or) { observation -> entityTemplate.insert(order.toEntity()).awaitSingle().toOrder() .also { log.info("inserted order: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun update(order: Order): Order = coroutineScopeWithObservation(UPDATE, or) { observation -> entityTemplate.update(order.toEntity()).awaitSingle().toOrder() .also { log.info("updated order: $it") observation.highCardinalityKeyValue("order", it.toString()) } } } Kotlin 
interface ProductItemBaseRepository { suspend fun insert(productItemEntity: ProductItemEntity): ProductItemEntity suspend fun insertAll(productItemEntities: List<ProductItemEntity>): List<ProductItemEntity> suspend fun upsert(productItem: ProductItem): ProductItem } @Repository class ProductItemBaseRepositoryImpl( private val entityTemplate: R2dbcEntityTemplate, private val or: ObservationRegistry, ) : ProductItemBaseRepository { override suspend fun upsert(productItem: ProductItem): ProductItem = coroutineScopeWithObservation(UPDATE, or) { observation -> val query = Query.query( Criteria.where("id").`is`(UUID.fromString(productItem.id)) .and("order_id").`is`(UUID.fromString(productItem.orderId)) ) val product = entityTemplate.selectOne(query, ProductItemEntity::class.java).awaitSingleOrNull() if (product != null) { val update = Update .update("quantity", (productItem.quantity + product.quantity)) .set("version", product.version + 1) .set("updated_at", LocalDateTime.now()) val updatedProduct = product.copy(quantity = (productItem.quantity + product.quantity), version = product.version + 1) val updateResult = entityTemplate.update(query, update, ProductItemEntity::class.java).awaitSingle() log.info("updateResult product: $updateResult") log.info("updateResult updatedProduct: $updatedProduct") return@coroutineScopeWithObservation updatedProduct.toProductItem() } entityTemplate.insert(ProductItemEntity.of(productItem)).awaitSingle().toProductItem() .also { productItem -> log.info("saved productItem: $productItem") observation.highCardinalityKeyValue("productItem", productItem.toString()) } } override suspend fun insert(productItemEntity: ProductItemEntity): ProductItemEntity = coroutineScopeWithObservation(INSERT, or) { observation -> val product = entityTemplate.insert(productItemEntity).awaitSingle() log.info("saved product: $product") observation.highCardinalityKeyValue("product", product.toString()) product } override suspend fun insertAll(productItemEntities: List<ProductItemEntity>) = coroutineScopeWithObservation(INSERT_ALL, or) { observation -> val result = productItemEntities.map { entityTemplate.insert(it) }.map { it.awaitSingle() } log.info("inserted product items: $result") observation.highCardinalityKeyValue("result", result.toString()) result } }

The important detail here is handling the case where multiple pod instances process the outbox table in parallel. We have idempotent consumers, but we still want to avoid processing the same outbox records more than once. To prevent multiple instances from selecting and publishing the same events, we use FOR UPDATE SKIP LOCKED. This query selects a batch of outbox events; if another instance has already locked some of those records, the current instance skips them and selects the next available unlocked rows, and so on.
Kotlin @Repository interface OutboxBaseRepository { suspend fun deleteOutboxRecordByID(id: UUID, callback: suspend () -> Unit): Long suspend fun deleteOutboxRecordsWithLock(callback: suspend (outboxRecord: OutboxRecord) -> Unit) } class OutboxBaseRepositoryImpl( private val dbClient: DatabaseClient, private val txOp: TransactionalOperator, private val or: ObservationRegistry, private val transactionalOperator: TransactionalOperator ) : OutboxBaseRepository { override suspend fun deleteOutboxRecordByID(id: UUID, callback: suspend () -> Unit): Long = coroutineScopeWithObservation(DELETE_OUTBOX_RECORD_BY_ID, or) { observation -> withTimeout(DELETE_OUTBOX_RECORD_TIMEOUT_MILLIS) { txOp.executeAndAwait { callback() dbClient.sql("DELETE FROM microservices.outbox_table WHERE event_id = :eventId") .bind("eventId", id) .fetch() .rowsUpdated() .awaitSingle() .also { log.info("outbox event with id: $it deleted") observation.highCardinalityKeyValue("id", it.toString()) } } } } override suspend fun deleteOutboxRecordsWithLock(callback: suspend (outboxRecord: OutboxRecord) -> Unit) = coroutineScopeWithObservation(DELETE_OUTBOX_RECORD_WITH_LOCK, or) { observation -> withTimeout(DELETE_OUTBOX_RECORD_TIMEOUT_MILLIS) { txOp.executeAndAwait { dbClient.sql("SELECT * FROM microservices.outbox_table ORDER BY timestamp ASC LIMIT 10 FOR UPDATE SKIP LOCKED") .map { row, _ -> OutboxRecord.of(row) } .flow() .onEach { log.info("deleting outboxEvent with id: ${it.eventId}") callback(it) dbClient.sql("DELETE FROM microservices.outbox_table WHERE event_id = :eventId") .bind("eventId", it.eventId!!) .fetch() .rowsUpdated() .awaitSingle() log.info("outboxEvent with id: ${it.eventId} published and deleted") observation.highCardinalityKeyValue("eventId", it.eventId.toString()) } .collect() } } } }

The polling producer is a scheduled process that, at the configured interval, does the same publish-and-delete job described earlier and reuses the same service method:

Kotlin @Component @ConditionalOnProperty(prefix = "schedulers", value = ["outbox.enable"], havingValue = "true") class OutboxScheduler(private val orderService: OrderService, private val or: ObservationRegistry) { @Scheduled(initialDelayString = "\${schedulers.outbox.initialDelayMillis}", fixedRateString = "\${schedulers.outbox.fixedRate}") fun publishAndDeleteOutboxRecords() = runBlocking { coroutineScopeWithObservation(PUBLISH_AND_DELETE_OUTBOX_RECORDS, or) { log.debug("starting scheduled outbox table publishing") orderService.deleteOutboxRecordsWithLock() log.debug("completed scheduled outbox table publishing") } } companion object { private val log = LoggerFactory.getLogger(OutboxScheduler::class.java) private const val PUBLISH_AND_DELETE_OUTBOX_RECORDS = "OutboxScheduler.publishAndDeleteOutboxRecords" } }

Usually, the transactional outbox is needed to guarantee data consistency between microservices; here, for simplicity, consumers in the same microservice process the events and save them to MongoDB. One more important detail: because we process Kafka events in multiple consumer processes, events may be handled out of order. Kafka's message keys help us here, since messages with the same key are always routed to the same partition; if your broker does not provide this guarantee, you have to handle ordering manually. Consider, for example, a consumer that tries to process event #6 before events #4 and #5 have been processed.
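The EventsPublisher calls above already pass the aggregate ID as the record key. As a minimal illustration of why that matters, here is a sketch using the plain kafka-clients API (the helper is hypothetical and not part of the project): records with the same key always hash to the same partition, so all events of one order keep their relative order for a single consumer.

Kotlin
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord

// Hypothetical helper: 'payload' is the serialized outbox record for one order.
fun publishOrderEvent(
    producer: KafkaProducer<String, ByteArray>,
    topic: String,
    aggregateId: String,
    payload: ByteArray
) {
    // The key (aggregateId) determines the partition, so every event of this order
    // lands in the same partition and is consumed in the order it was produced.
    val record = ProducerRecord(topic, aggregateId, payload)
    producer.send(record) { metadata, ex ->
        if (ex != null) {
            // nothing is lost: the polling publisher will pick the outbox row up again
            println("publish failed for order $aggregateId: ${ex.message}")
        } else {
            println("published to ${metadata.topic()}-${metadata.partition()}@${metadata.offset()}")
        }
    }
}

Even with keyed partitioning, the retry flow described below can still deliver events out of order, which is why the version check on the consumer side remains necessary.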
To guard against out-of-order processing, the outbox events carry a domain entity version field, so we can simply look at the version and validate it: if our database has the order at version #3 and we are processing an event with version #6, we first have to wait for events #4 and #5 and process them. Of course, the details depend on the concrete business logic of each application; the point here is only that this situation can occur. One more important detail is retry topics. If we need to retry processing of messages, it is better to create a dedicated retry topic and handle the retries there, including how many times to retry and whatever other advanced logic your case requires. In the example, we have two listeners, one of which processes the retry topic:

Kotlin @Component class OrderConsumer( private val kafkaTopicsConfiguration: KafkaTopicsConfiguration, private val serializer: Serializer, private val eventsPublisher: EventsPublisher, private val orderEventProcessor: OrderEventProcessor, private val or: ObservationRegistry, ) { @KafkaListener( groupId = "\${kafka.consumer-group-id:order-service-group-id}", topics = [ "\${topics.orderCreated.name}", "\${topics.productAdded.name}", "\${topics.productRemoved.name}", "\${topics.orderPaid.name}", "\${topics.orderCancelled.name}", "\${topics.orderSubmitted.name}", "\${topics.orderCompleted.name}", ], id = "orders-consumer" ) fun process(ack: Acknowledgment, consumerRecord: ConsumerRecord<String, ByteArray>) = runBlocking { coroutineScopeWithObservation(PROCESS, or) { observation -> try { observation.highCardinalityKeyValue("consumerRecord", getConsumerRecordInfoWithHeaders(consumerRecord)) processOutboxRecord(serializer.deserialize(consumerRecord.value(), OutboxRecord::class.java)) ack.acknowledge() log.info("committed record: ${getConsumerRecordInfo(consumerRecord)}") } catch (ex: Exception) { observation.highCardinalityKeyValue("consumerRecord", getConsumerRecordInfoWithHeaders(consumerRecord)) observation.error(ex) if (ex is SerializationException || ex is UnknownEventTypeException || ex is AlreadyProcessedVersionException) { log.error("ack not serializable, unknown or already processed record: ${getConsumerRecordInfoWithHeaders(consumerRecord)}") ack.acknowledge() return@coroutineScopeWithObservation } if (ex is InvalidVersionException || ex is NoSuchElementException || ex is OrderNotFoundException) { publishRetryTopic(kafkaTopicsConfiguration.retryTopic.name, consumerRecord, 1) ack.acknowledge() log.warn("ack concurrency write or version exception ${ex.localizedMessage}") return@coroutineScopeWithObservation } publishRetryTopic(kafkaTopicsConfiguration.retryTopic.name, consumerRecord, 1) ack.acknowledge() log.error("ack exception while processing record: ${getConsumerRecordInfoWithHeaders(consumerRecord)}", ex) } } } @KafkaListener(groupId = "\${kafka.consumer-group-id:order-service-group-id}", topics = ["\${topics.retryTopic.name}"], id = "orders-retry-consumer") fun processRetry(ack: Acknowledgment, consumerRecord: ConsumerRecord<String, ByteArray>): Unit = runBlocking { coroutineScopeWithObservation(PROCESS_RETRY, or) { observation -> try { log.warn("processing retry topic record >>>>>>>>>>>>> : ${getConsumerRecordInfoWithHeaders(consumerRecord)}") observation.highCardinalityKeyValue("consumerRecord", getConsumerRecordInfoWithHeaders(consumerRecord)) processOutboxRecord(serializer.deserialize(consumerRecord.value(), OutboxRecord::class.java)) ack.acknowledge() log.info("committed retry record: ${getConsumerRecordInfo(consumerRecord)}") } catch (ex: Exception)
{ observation.highCardinalityKeyValue("consumerRecord", getConsumerRecordInfoWithHeaders(consumerRecord)) observation.error(ex) val currentRetry = String(consumerRecord.headers().lastHeader(RETRY_COUNT_HEADER).value()).toInt() observation.highCardinalityKeyValue("currentRetry", currentRetry.toString()) if (ex is InvalidVersionException || ex is NoSuchElementException || ex is OrderNotFoundException) { publishRetryTopic(kafkaTopicsConfiguration.retryTopic.name, consumerRecord, currentRetry) log.warn("ack concurrency write or version exception ${ex.localizedMessage},record: ${getConsumerRecordInfoWithHeaders(consumerRecord)}") ack.acknowledge() return@coroutineScopeWithObservation } if (currentRetry > MAX_RETRY_COUNT) { publishRetryTopic(kafkaTopicsConfiguration.deadLetterQueue.name, consumerRecord, currentRetry + 1) ack.acknowledge() log.error("MAX_RETRY_COUNT exceed, send record to DLQ: ${getConsumerRecordInfoWithHeaders(consumerRecord)}") return@coroutineScopeWithObservation } if (ex is SerializationException || ex is UnknownEventTypeException || ex is AlreadyProcessedVersionException) { ack.acknowledge() log.error("commit not serializable, unknown or already processed record: ${getConsumerRecordInfoWithHeaders(consumerRecord)}") return@coroutineScopeWithObservation } log.error("exception while processing: ${ex.localizedMessage}, record: ${getConsumerRecordInfoWithHeaders(consumerRecord)}") publishRetryTopic(kafkaTopicsConfiguration.retryTopic.name, consumerRecord, currentRetry + 1) ack.acknowledge() } } } private suspend fun publishRetryTopic(topic: String, record: ConsumerRecord<String, ByteArray>, retryCount: Int) = coroutineScopeWithObservation(PUBLISH_RETRY_TOPIC, or) { observation -> observation.highCardinalityKeyValue("topic", record.topic()) .highCardinalityKeyValue("key", record.key()) .highCardinalityKeyValue("offset", record.offset().toString()) .highCardinalityKeyValue("value", String(record.value())) .highCardinalityKeyValue("retryCount", retryCount.toString()) record.headers().remove(RETRY_COUNT_HEADER) record.headers().add(RETRY_COUNT_HEADER, retryCount.toString().toByteArray()) mono { publishRetryRecord(topic, record, retryCount) } .retryWhen(Retry.backoff(PUBLISH_RETRY_COUNT, Duration.ofMillis(PUBLISH_RETRY_BACKOFF_DURATION_MILLIS)) .filter { it is SerializationException }) .awaitSingle() } } The role of the orders events processor at this microservice is validating the version of the events and updating MongoDB: Kotlin interface OrderEventProcessor { suspend fun on(orderCreatedEvent: OrderCreatedEvent) suspend fun on(productItemAddedEvent: ProductItemAddedEvent) suspend fun on(productItemRemovedEvent: ProductItemRemovedEvent) suspend fun on(orderPaidEvent: OrderPaidEvent) suspend fun on(orderCancelledEvent: OrderCancelledEvent) suspend fun on(orderSubmittedEvent: OrderSubmittedEvent) suspend fun on(orderCompletedEvent: OrderCompletedEvent) } @Service class OrderEventProcessorImpl( private val orderMongoRepository: OrderMongoRepository, private val or: ObservationRegistry, ) : OrderEventProcessor { override suspend fun on(orderCreatedEvent: OrderCreatedEvent): Unit = coroutineScopeWithObservation(ON_ORDER_CREATED_EVENT, or) { observation -> orderMongoRepository.insert(orderCreatedEvent.order).also { log.info("created order: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(productItemAddedEvent: ProductItemAddedEvent): Unit = coroutineScopeWithObservation(ON_ORDER_PRODUCT_ADDED_EVENT, or) { observation -> val order = 
orderMongoRepository.getByID(productItemAddedEvent.orderId) validateVersion(order.id, order.version, productItemAddedEvent.version) order.addProductItem(productItemAddedEvent.productItem) order.version = productItemAddedEvent.version orderMongoRepository.update(order).also { log.info("productItemAddedEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(productItemRemovedEvent: ProductItemRemovedEvent): Unit = coroutineScopeWithObservation(ON_ORDER_PRODUCT_REMOVED_EVENT, or) { observation -> val order = orderMongoRepository.getByID(productItemRemovedEvent.orderId) validateVersion(order.id, order.version, productItemRemovedEvent.version) order.removeProductItem(productItemRemovedEvent.productItemId) order.version = productItemRemovedEvent.version orderMongoRepository.update(order).also { log.info("productItemRemovedEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(orderPaidEvent: OrderPaidEvent): Unit = coroutineScopeWithObservation(ON_ORDER_PAID_EVENT, or) { observation -> val order = orderMongoRepository.getByID(orderPaidEvent.orderId) validateVersion(order.id, order.version, orderPaidEvent.version) order.pay(orderPaidEvent.paymentId) order.version = orderPaidEvent.version orderMongoRepository.update(order).also { log.info("orderPaidEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(orderCancelledEvent: OrderCancelledEvent): Unit = coroutineScopeWithObservation(ON_ORDER_CANCELLED_EVENT, or) { observation -> val order = orderMongoRepository.getByID(orderCancelledEvent.orderId) validateVersion(order.id, order.version, orderCancelledEvent.version) order.cancel() order.version = orderCancelledEvent.version orderMongoRepository.update(order).also { log.info("orderCancelledEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(orderSubmittedEvent: OrderSubmittedEvent): Unit = coroutineScopeWithObservation(ON_ORDER_SUBMITTED_EVENT, or) { observation -> val order = orderMongoRepository.getByID(orderSubmittedEvent.orderId) validateVersion(order.id, order.version, orderSubmittedEvent.version) order.submit() order.version = orderSubmittedEvent.version orderMongoRepository.update(order).also { log.info("orderSubmittedEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } override suspend fun on(orderCompletedEvent: OrderCompletedEvent): Unit = coroutineScopeWithObservation(ON_ORDER_COMPLETED_EVENT, or) { observation -> val order = orderMongoRepository.getByID(orderCompletedEvent.orderId) validateVersion(order.id, order.version, orderCompletedEvent.version) order.complete() order.version = orderCompletedEvent.version orderMongoRepository.update(order).also { log.info("orderCompletedEvent updatedOrder: $it") observation.highCardinalityKeyValue("order", it.toString()) } } private fun validateVersion(id: Any, currentDomainVersion: Long, eventVersion: Long) { log.info("validating version for id: $id, currentDomainVersion: $currentDomainVersion, eventVersion: $eventVersion") if (currentDomainVersion >= eventVersion) { log.warn("currentDomainVersion >= eventVersion validating version for id: $id, currentDomainVersion: $currentDomainVersion, eventVersion: $eventVersion") throw AlreadyProcessedVersionException(id, eventVersion) } if ((currentDomainVersion + 1) < eventVersion) { log.warn("currentDomainVersion + 1) < eventVersion 
validating version for id: $id, currentDomainVersion: $currentDomainVersion, eventVersion: $eventVersion") throw InvalidVersionException(eventVersion) } } } The MongoDB repository code is quite simple: Kotlin interface OrderMongoRepository { suspend fun insert(order: Order): Order suspend fun update(order: Order): Order suspend fun getByID(id: String): Order suspend fun getAllOrders(pageable: Pageable): Page<Order> } @Repository class OrderMongoRepositoryImpl( private val mongoTemplate: ReactiveMongoTemplate, private val or: ObservationRegistry, ) : OrderMongoRepository { override suspend fun insert(order: Order): Order = coroutineScopeWithObservation(INSERT, or) { observation -> withContext(Dispatchers.IO) { mongoTemplate.insert(OrderDocument.of(order)).awaitSingle().toOrder() .also { log.info("inserted order: $it") } .also { observation.highCardinalityKeyValue("order", it.toString()) } } } override suspend fun update(order: Order): Order = coroutineScopeWithObservation(UPDATE, or) { observation -> withContext(Dispatchers.IO) { val query = Query.query(Criteria.where(ID).`is`(order.id).and(VERSION).`is`(order.version - 1)) val update = Update() .set(EMAIL, order.email) .set(ADDRESS, order.address) .set(STATUS, order.status) .set(VERSION, order.version) .set(PAYMENT_ID, order.paymentId) .set(PRODUCT_ITEMS, order.productsList()) val options = FindAndModifyOptions.options().returnNew(true).upsert(false) val updatedOrderDocument = mongoTemplate.findAndModify(query, update, options, OrderDocument::class.java) .awaitSingleOrNull() ?: throw OrderNotFoundException(order.id.toUUID()) observation.highCardinalityKeyValue("order", updatedOrderDocument.toString()) updatedOrderDocument.toOrder().also { orderDocument -> log.info("updated order: $orderDocument") } } } override suspend fun getByID(id: String): Order = coroutineScopeWithObservation(GET_BY_ID, or) { observation -> withContext(Dispatchers.IO) { mongoTemplate.findById(id, OrderDocument::class.java).awaitSingle().toOrder() .also { log.info("found order: $it") } .also { observation.highCardinalityKeyValue("order", it.toString()) } } } override suspend fun getAllOrders(pageable: Pageable): Page<Order> = coroutineScopeWithObservation(GET_ALL_ORDERS, or) { observation -> withContext(Dispatchers.IO) { val query = Query().with(pageable) val data = async { mongoTemplate.find(query, OrderDocument::class.java).collectList().awaitSingle() }.await() val count = async { mongoTemplate.count(Query(), OrderDocument::class.java).awaitSingle() }.await() PageableExecutionUtils.getPage(data.map { it.toOrder() }, pageable) { count } .also { observation.highCardinalityKeyValue("pageResult", it.pageable.toString()) } } } } More details and source code of the full project you can find here in the GitHub repository. In real-world applications, we have to implement many more necessary features, like k8s health checks, rate limiters, etc. Depending on the project, it can be implemented in different ways. For example, you can use Kubernetes and Istio for some of them. I hope this article is useful and helpful, and am happy to receive any feedback or questions. Feel free to contact me by email or any messengers :)
To get more clarity about ISR in Apache Kafka, we should first look carefully at the replication process in the Kafka broker. In short, replication means having multiple copies of our data spread across multiple brokers. Maintaining the same copies of data on different brokers makes high availability possible: if one or more brokers in a multi-node Kafka cluster go down or become unreachable, the remaining brokers can still serve requests. For this reason, we have to specify how many copies of the data we want to maintain in the multi-node Kafka cluster when creating a topic. This is called the replication factor, and that is why it cannot be more than one when creating a topic on a single-node Kafka cluster. The number of replicas specified at topic creation can be changed later based on node availability in the cluster. On a single-node Kafka cluster, however, we can still have more than one partition in the broker, because each topic can have one or more partitions. Partitions are nothing but subdivisions of a topic spread across the brokers in the cluster, and each partition holds the actual data (messages). Internally, each partition is a single log file to which records are written in an append-only fashion. Based on the number provided at creation time, the topic is internally split into that many partitions. Thanks to partitioning, messages can be distributed in parallel among several brokers in the cluster. Kafka scales to accommodate several consumers and producers at once by employing this parallelism, and the partitioning technique enables linear scaling for both consumers and producers. Even though more partitions in a Kafka cluster provide higher throughput, they come with pitfalls too: more partitions mean more file handles, because each partition maps to a directory in the broker's file system. With replication and partitions covered, the ISR is easier to understand. The ISRs are simply a partition's replicas that are “in sync” with the leader, and the leader is just the replica to which all requests from clients and other Kafka brokers for that partition go. Replicas that are not the leader are termed followers, and a follower that is in sync with the leader is called an ISR (in-sync replica). For example, if we set a topic's replication factor to 3, Kafka will store the topic-partition log in three different places and will only consider a record committed once all three of these replicas have confirmed that they have written the record to their log and sent the acknowledgment back to the leader. In a multi-broker (multi-node) Kafka cluster (please click here to read how a multi-node Kafka cluster can be created), one replica of each partition is elected as the leader, and the broker hosting it handles all read and write requests for that partition, while the followers (replicas on other brokers) passively replicate the leader to achieve data consistency. Each partition can have only one leader at a time, and that leader handles all reads and writes of records for the partition. Followers replicate the leader and take over if it dies. By leveraging Apache ZooKeeper, Kafka keeps track of each partition's replicas, and if the leader of a partition fails (due to an outage of that broker), Kafka chooses one of the in-sync replicas as the new leader.
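As a small illustration of these settings (a sketch assuming the standard Kafka AdminClient API; the topic name and values are only examples), this is roughly how a topic with three partitions, a replication factor of three, and a min.insync.replicas of two could be created. The min.insync.replicas setting is discussed in the next paragraph.

Kotlin
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.admin.AdminClientConfig
import org.apache.kafka.clients.admin.NewTopic
import org.apache.kafka.common.config.TopicConfig

fun createReplicatedTopic(bootstrapServers: String) {
    val props = mapOf<String, Any>(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG to bootstrapServers)
    AdminClient.create(props).use { admin ->
        // 3 partitions for parallelism; replication factor 3 means one leader
        // and two follower replicas on different brokers for each partition
        val topic = NewTopic("orders", 3, 3.toShort())
            // with acks=all, at least 2 replicas must confirm a write before it is acknowledged
            .configs(mapOf(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG to "2"))
        admin.createTopics(listOf(topic)).all().get()
    }
}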
When all of the ISRs for a partition have written a record to their log, the record is said to be “committed,” and consumers can only read committed records. The minimum in-sync replica count (min.insync.replicas) specifies how many replicas must be available and acknowledge a write (when the producer uses acks=all) for the producer to send records to a partition successfully. A high minimum gives better durability, but it can have an adverse effect on availability: if fewer than the minimum number of in-sync replicas are available, producers can no longer publish. For example, if we have a three-node operational Kafka cluster with the minimum in-sync replicas configured as three, and one node goes down or becomes unreachable, the remaining two nodes will not be able to accept any data/messages from producers, because only two in-sync replicas are available across the brokers. The third replica, which lived on the dead or unavailable broker, cannot acknowledge to the leader that it is in sync with the latest data the way the other two live replicas on the available brokers can. I hope you have enjoyed this read; please like and share if you found it valuable.
We want to save our thumbnail data to a database so that we can render our pictures to a nice HTML gallery page and finish the proof of concept for our Google Photos clone! Which database should we use and why? Which Java database API? What database tools will make our lives easier along the way? Find out in this episode of Marco Codes!

What’s in the Video

00:00 Intro: We'll cover what the plan is for this episode: to add database capabilities to our Google Photos clone, which currently only works with files, but doesn't store their metadata in a database table.

00:52 Where We Left Off: Before jumping straight into implementing database and ORM features, we will do a quick code recap of the previous episodes, to remind ourselves how the image scanning and conversion process currently works.

01:46 Setup: Whenever we want to do something with databases and Java, we need a couple of (in this case) Maven dependencies. More specifically we want to make sure to add the H2 database to our project, which we will use for production, not just for testing! We'll also add the HikariCP connection pool to it - something I do by default in every project and which is usually done automatically by frameworks like Spring Boot.

04:38 Writing a Database Schema: Here, I present my current approach when doing Java database work: making sure the database schema is hand-written, thinking through table names, column names, types, etc. Hence, we'll start writing a schema.sql file for our new "media" table during this section.

10:08 Creating a DataSource: Having created the schema, we'll need to create a DataSource next. As we're using HikariCP, we'll follow its documentation pages to set up the DataSource. We'll also make sure the schema.sql file written earlier gets automatically executed whenever we run our application.

12:46 Saving Thumbnail Data: It's finally time to not just render thumbnail files on disk, but also save information about the generated thumbnails and original images in our brand-new database table! We'll use plain JDBC to do that and talk about its advantages and disadvantages.

14:00 Refactoring Maneuver: Sometimes you just need to _see_ certain things that are very hard to explain in words. To clean up our program, we will have to change a couple of method signatures and move parameters up and down throughout the file.

16:21 Extracting Image Creation Dates: At the moment, we don't properly detect the image creation date from its metadata. We'll talk about how to implement this in the future and why we'll stick with the file creation date for now.

17:10 Avoiding Duplication: We'll also need to handle duplicates. If we re-run our program several times, we don't want to store the image metadata multiple times in our tables. Let's fix this here.

19:04 Inspecting H2 File DBs: In case you don't know how to access H2 file databases, we will spend some time showing you how to do that from inside IntelliJ IDEA and its database tool window.

21:23 Rendering HTML Output: Last but not least, we'll need to render all the information from our database to a nice, little HTML page, so we can actually browse our thumbnails! As a bonus point, this will be the simplest and probably dirtiest implementation of such an HTML page you've seen for a while - but it works!

30:30 What’s Next? Did you like what you saw? Which feature should we implement next? Let me know!
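The episode itself works in plain Java; purely as a rough sketch of the pieces described above (file-based H2, a HikariCP pool, a hand-written schema.sql, a plain JDBC insert, and duplicate avoidance), and written in Kotlin for brevity, the wiring could look something like this. The table and column names are illustrative, not taken from the video.

Kotlin
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import java.time.LocalDateTime

fun main() {
    val config = HikariConfig().apply {
        // file-based H2 database; INIT runs the hand-written schema on every startup
        // (assumes a schema.sql in the working directory using CREATE TABLE IF NOT EXISTS)
        jdbcUrl = "jdbc:h2:file:./media;INIT=RUNSCRIPT FROM './schema.sql'"
        maximumPoolSize = 4
    }
    HikariDataSource(config).use { dataSource ->
        dataSource.connection.use { conn ->
            // MERGE keyed on the (assumed unique) path column avoids duplicate rows on re-runs
            conn.prepareStatement(
                "MERGE INTO media (path, thumbnail_path, creation_date) KEY (path) VALUES (?, ?, ?)"
            ).use { stmt ->
                stmt.setString(1, "/photos/IMG_0001.jpg")
                stmt.setString(2, "/thumbnails/IMG_0001.jpg")
                stmt.setObject(3, LocalDateTime.now())
                stmt.executeUpdate()
            }
        }
    }
}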
In many places, you can read that Podman is a drop-in replacement for Docker. But is it as easy as it sounds? In this blog, you will start with a production-ready Dockerfile and execute the Podman commands just like you would do when using Docker. Let’s investigate whether this works without any problems! Introduction Podman is a container engine, just as Docker is. Podman, however, is a daemonless container engine, and it runs containers by default as rootless containers. This is more secure than running containers as root. The Docker daemon can also run as a non-root user nowadays. Podman advertises on its website that Podman is a drop-in replacement for Docker. Just add alias docker=podman , and you will be fine. Let’s investigate whether it is that simple. In the remainder of this blog, you will try to build a production-ready Dockerfile for running a Spring Boot application. You will run it as a single container, and you will try to run two containers and have some inter-container communication. In the end, you will verify how volumes can be mounted. One of the prerequisites for this blog is using a Linux operating system. Podman is not available for Windows. The sources used in this blog can be found at GitHub. The Dockerfile you will be using runs a Spring Boot application. It is a basic Spring Boot application containing one controller which returns a hello message. Build the jar: Shell $ mvn clean verify Run the jar: Shell $ java -jar target/mypodmanplanet-0.0.1-SNAPSHOT.jar Check the endpoint: Shell $ curl http://localhost:8080/hello Hello Podman! The Dockerfile is based on a previous blog about Docker best practices. The file 1-Dockerfile-starter can be found in the Dockerfiles directory. Dockerfile FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder WORKDIR application ARG JAR_FILE COPY target/${JAR_FILE} app.jar RUN java -Djarmode=layertools -jar app.jar extract FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f WORKDIR /opt/app RUN addgroup --system javauser && adduser -S -s /usr/sbin/nologin -G javauser javauser COPY --from=builder application/dependencies/ ./ COPY --from=builder application/spring-boot-loader/ ./ COPY --from=builder application/snapshot-dependencies/ ./ COPY --from=builder application/application/ ./ RUN chown -R javauser:javauser . USER javauser ENTRYPOINT ["java", "org.springframework.boot.loader.JarLauncher"] Prerequisites Prerequisites for this blog are: Basic Linux knowledge, Ubuntu 22.04 is used during this post; Basic Java and Spring Boot knowledge; Basic Docker knowledge; Installation Installing Podman is quite easy. Just run the following command. Shell $ sudo apt-get install podman Verify the correct installation. Shell $ podman --version podman version 3.4.4 \You can also install podman-docker, which will create an alias when you use docker in your commands. It is advised to wait for the conclusion of this post before you install this one. Build Dockerfile The first thing to do is to build the container image. Execute from the root of the repository the following command. Shell $ podman build . 
--tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/1-Dockerfile-starter --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder [2/2] STEP 1/10: FROM eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Error: error creating build container: short-name "eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf" This returns an error while retrieving the base image. The error message refers to /etc/containers/registries.conf. The following is stated in this file. Plain Text # For more information on this configuration file, see containers-registries.conf(5). # # NOTE: RISK OF USING UNQUALIFIED IMAGE NAMES # We recommend always using fully qualified image names including the registry # server (full dns name), namespace, image name, and tag # (e.g., registry.redhat.io/ubi8/ubi:latest). Pulling by digest (i.e., # quay.io/repository/name@digest) further eliminates the ambiguity of tags. # When using short names, there is always an inherent risk that the image being # pulled could be spoofed. For example, a user wants to pull an image named # `foobar` from a registry and expects it to come from myregistry.com. If # myregistry.com is not first in the search list, an attacker could place a # different `foobar` image at a registry earlier in the search list. The user # would accidentally pull and run the attacker's image and code rather than the # intended content. We recommend only adding registries which are completely # trusted (i.e., registries which don't allow unknown or anonymous users to # create accounts with arbitrary names). This will prevent an image from being # spoofed, squatted or otherwise made insecure. If it is necessary to use one # of these registries, it should be added at the end of the list. To conclude, it is suggested to use a fully qualified image name. This means that you need to change the lines containing: Dockerfile eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Into: Dockerfile docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f You just add docker.io/ to the image name. A minor change, but already one difference compared to Docker. The image name is fixed in file 2-Dockerfile-fix-shortname, so let’s try building the image again. Shell $ podman build . --tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/2-Dockerfile-fix-shortname --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder Trying to pull docker.io/library/eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f... 
Getting image source signatures Copying blob 72ac8a0a29d6 done Copying blob f56be85fc22e done Copying blob f8ed194273be done Copying blob e5daea9ee890 done [2/2] STEP 1/10: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f Error: error creating build container: writing blob: adding layer with blob "sha256:f56be85fc22e46face30e2c3de3f7fe7c15f8fd7c4e5add29d7f64b87abdaa09": Error processing tar file(exit status 1): potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/shadow): Check /etc/subuid and /etc/subgid: lchown /etc/shadow: invalid argument Now there is an error about potentially insufficient UIDs or GIDs available in the user namespace. More information about this error can be found here. It is very well explained in that post, and it is too much to repeat all of this in this post. The summary is that the image which is trying to be pulled, has files owned by UIDs over 65.536. Due to that issue, the image would not fit into rootless Podman’s default UID mapping, which limits the number of UIDs and GIDs available. So, how to solve this? First, check the contents of /etc/subuid and /etc/subgid. In my case, the following is the output. For you, it will probably be different. Shell $ cat /etc/subuid admin:100000:65536 $ cat /etc/subgid admin:100000:65536 The admin user listed in the output has 100.000 as the first UID or GID available, and it has a size of 65.536. The format is user:start:size. This means that the admin user has access to UIDs or GIDs 100.000 up to and including 165.535. My current user is not listed here, and that means that my user can only allocate 1 UID en 1 GID for the container. That 1 UID/GID is already taken for the root user in the container. If a container image needs an extra user, there will be a problem, as you can see above. This can be solved by adding UIDs en GIDs for your user. Let’s add values 200.000 up to and including 265.535 to your user. Shell $ sudo usermod --add-subuids 200000-265535 --add-subgids 200000-265535 <replace with your user> Verify the contents of both files again. The user is added to both files. Shell $ cat /etc/subgid admin:100000:65536 <your user>:200000:65536 $ cat /etc/subuid admin:100000:65536 <your user>:200000:65536 Secondly, you need to run the following command. Shell $ podman system migrate Try to build the image again, and now it works. Shell $ podman build . --tag mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f Dockerfiles/2-Dockerfile-fix-shortname --build-arg JAR_FILE=mypodmanplanet-0.0.1-SNAPSHOT.jar [1/2] STEP 1/5: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f AS builder Trying to pull docker.io/library/eclipse-temurin@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f... 
Getting image source signatures Copying blob f56be85fc22e done Copying blob f8ed194273be done Copying blob 72ac8a0a29d6 done Copying blob e5daea9ee890 done Copying config c74d412c3d done Writing manifest to image destination Storing signatures [1/2] STEP 2/5: WORKDIR application --> d4f0e970dc1 [1/2] STEP 3/5: ARG JAR_FILE --> ca97dcd6f2a [1/2] STEP 4/5: COPY target/${JAR_FILE} app.jar --> 58d88cfa511 [1/2] STEP 5/5: RUN java -Djarmode=layertools -jar app.jar extract --> 348cae813a4 [2/2] STEP 1/10: FROM docker.io/eclipse-temurin:17.0.6_10-jre-alpine@sha256:c26a727c4883eb73d32351be8bacb3e70f390c2c94f078dc493495ed93c60c2f [2/2] STEP 2/10: WORKDIR /opt/app --> 4118cdf90b5 [2/2] STEP 3/10: RUN addgroup --system javauser && adduser -S -s /usr/sbin/nologin -G javauser javauser --> cd11f346381 [2/2] STEP 4/10: COPY --from=builder application/dependencies/ ./ --> 829bffcb6c7 [2/2] STEP 5/10: COPY --from=builder application/spring-boot-loader/ ./ --> 2a93f97d424 [2/2] STEP 6/10: COPY --from=builder application/snapshot-dependencies/ ./ --> 3e292cb0456 [2/2] STEP 7/10: COPY --from=builder application/application/ ./ --> 5dd231c5b51 [2/2] STEP 8/10: RUN chown -R javauser:javauser . --> 4d736e8c3bb [2/2] STEP 9/10: USER javauser --> d7a96ca6f36 [2/2] STEP 10/10: ENTRYPOINT ["java", "org.springframework.boot.loader.JarLauncher"] [2/2] COMMIT mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT --> 567fd123071 Successfully tagged localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 567fd1230713f151950de7151da82a19d34f80af0384916b13bf49ed72fd2fa1 Verify the list of images with Podman just like you would do with Docker: Shell $ podman images REPOSITORY TAG IMAGE ID CREATED SIZE localhost/mydeveloperplanet/mypodmanplanet 0.0.1-SNAPSHOT 567fd1230713 2 minutes ago 209 MB Is Podman a drop-in replacement for Docker for building a Dockerfile? No, it is not a drop-in replacement because you needed to use the fully qualified image name for the base image in the Dockerfile, and you needed to make changes to the user namespace in order to be able to pull the image. Besides these two changes, building the container image just worked. Start Container Now that you have built the image, it is time to start a container. Shell $ podman run --name mypodmanplanet -d localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The container has started successfully. Shell $ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 27639dabb573 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 18 seconds ago Up 18 seconds ago mypodmanplanet You can also inspect the container logs. Shell $ podman logs mypodmanplanet . 
____ _ __ _ _ /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ \\/ ___)| |_)| | | | | || (_| | ) ) ) ) ' |____| .__|_| |_|_| |_\__, | / / / / =========|_|==============|___/=/_/_/_/ :: Spring Boot :: (v3.0.5) 2023-04-22T14:38:05.896Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : Starting MyPodmanPlanetApplication v0.0.1-SNAPSHOT using Java 17.0.6 with PID 1 (/opt/app/BOOT-INF/classes started by javauser in /opt/app) 2023-04-22T14:38:05.898Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : No active profile set, falling back to 1 default profile: "default" 2023-04-22T14:38:06.803Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http) 2023-04-22T14:38:06.815Z INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat] 2023-04-22T14:38:06.816Z INFO 1 --- [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.7] 2023-04-22T14:38:06.907Z INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext 2023-04-22T14:38:06.910Z INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 968 ms 2023-04-22T14:38:07.279Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path '' 2023-04-22T14:38:07.293Z INFO 1 --- [ main] c.m.m.MyPodmanPlanetApplication : Started MyPodmanPlanetApplication in 1.689 seconds (process running for 1.911) Verify whether the endpoint can be accessed. Shell $ curl http://localhost:8080/hello curl: (7) Failed to connect to localhost port 8080 after 0 ms: Connection refused That’s not the case. With Docker, you can inspect the container to see which IP address is allocated to the container. Shell $ podman inspect mypodmanplanet | grep IPAddress "IPAddress": "", It seems that the container does not have a specific IP address. The endpoint is also not accessible at localhost. The solution is to add a port mapping when creating the container. Stop the container and remove it. Shell $ podman stop mypodmanplanet mypodmanplanet $ podman rm mypodmanplanet 27639dabb5730d3244d205200a409dbc3a1f350196ba238e762438a4b318ef73 Start the container again, but this time with a port mapping of internal port 8080 to an external port 8080. Shell $ podman run -p 8080:8080 --name mypodmanplanet -d localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify again whether the endpoint can be accessed. This time it works. Shell $ curl http://localhost:8080/hello Hello Podman! Stop and remove the container before continuing this blog. Is Podman a drop-in replacement for Docker for running a container image? No, it is not a drop-in replacement. Although it was possible to use exactly the same commands as with Docker, you needed to explicitly add a port mapping. Without the port mapping, it was not possible to access the endpoint. Volume Mounts Volume mounts and access to directories and files outside the container and inside a container often lead to Permission Denied errors. In a previous blog, this behavior is extensively described for the Docker engine. It is interesting to see how this works when using Podman. You will map an application.properties file in the container next to the jar file. The Spring Boot application will pick up this application.properties file. 
The file configures the server port to port 8082, and the file is located in the directory properties in the root of the repository. Properties files server.port=8082 Run the container with a port mapping from internal port 8082 to external port 8083 and mount the application.properties file into the container directory /opt/app where also the jar file is located. The volume mount has the property ro in order to indicate that it is a read-only file. Shell $ podman run -p 8083:8082 --volume ./properties/application.properties:/opt/app/application.properties:ro --name mypodmanplanet localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify whether the endpoint can be accessed and whether it works. Shell $ curl http://localhost:8083/hello Hello Podman! Open a shell in the container and list the directory contents in order to view the ownership of the file. Shell $ podman exec -it mypodmanplanet sh /opt/app $ ls -la total 24 drwxr-xr-x 1 javauser javauser 4096 Apr 15 10:33 . drwxr-xr-x 1 root root 4096 Apr 9 12:57 .. drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 BOOT-INF drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 META-INF -rw-r--r-- 1 root root 16 Apr 15 10:24 application.properties drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 org With Docker, the file would have been owned by your local system user, but with Podman, the file is owned by root. Let’s check the permissions of the file on the local system. Shell $ ls -la total 12 drwxr-xr-x 2 <myuser> domain users 4096 apr 15 12:24 . drwxr-xr-x 8 <myuser> domain users 4096 apr 15 12:24 .. -rw-r--r-- 1 <myuser> domain users 16 apr 15 12:24 application.properties As you can see, the file on the local system is owned by <myuser>. This means that your host user, who is running the container, is seen as a user root inside of the container. Open a shell in the container and try to change the contents of the file application.properties. You will notice that this is not allowed because you are a user javauser. Shell $ podman exec -it mypodmanplanet sh /opt/app $ vi application.properties /opt/app $ whoami javauser Stop and remove the container. Run the container, but this time with property U instead of ro. The U suffix tells Podman to use the correct host UID and GID based on the UID and GID within the container to change the owner and group of the source volume recursively. Shell $ podman run -p 8083:8082 --volume ./properties/application.properties:/opt/app/application.properties:U --name mypodmanplanet localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Open a shell in the container, and now the user javauser is the owner of the file. Shell $ podman exec -it mypodmanplanet sh /opt/app $ ls -la total 24 drwxr-xr-x 1 javauser javauser 4096 Apr 15 10:41 . drwxr-xr-x 1 root root 4096 Apr 9 12:57 .. drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 BOOT-INF drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 META-INF -rw-r--r-- 1 javauser javauser 16 Apr 15 10:24 application.properties drwxr-xr-x 1 javauser javauser 4096 Apr 9 12:57 org On the local system, a different UID and GID than my local user have taken ownership. Shell $ ls -la properties/ total 12 drwxr-xr-x 2 <myuser> domain users 4096 apr 15 12:24 . drwxr-xr-x 8 <myuser> domain users 4096 apr 15 12:24 .. -rw-r--r-- 1 200099 200100 16 apr 15 12:24 application.properties This time, changing the file on the local system is not allowed, but it is allowed inside the container for user javauser. Is Podman a drop-in replacement for Docker for mounting volumes inside a container? 
No, it is not a drop-in replacement. The file permissions function is a bit different than with the Docker engine. You need to know the differences in order to be able to mount files and directories inside containers. Pod Podman knows the concept of a Pod, just like a Pod in Kubernetes. A Pod allows you to group containers. A Pod also has a shared network namespace, and this means that containers inside a Pod can connect to each other. More information about container networking can be found here. This means that Pods are the first choice for grouping containers. When using Docker, you will use Docker Compose for this. There exists something like Podman Compose, but this deserves a blog in itself. Let’s see how this works. You will set up a Pod running two containers with the Spring Boot application. First, you need to create a Pod. You also need to expose the ports you want to be accessible outside of the Pod. This can be done with the -p argument. And you give the Pod a name, hello-pod in this case. Shell $ podman pod create -p 8080-8081:8080-8081 --name hello-pod When you list the Pod, you notice that it already contains one container. This is the infra container. This infra container holds the namespace in order that containers can connect to each other, and it enables starting and stopping containers in the Pod. The infra container is based on the k8s.gcr.io/pause image. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Created 3 seconds ago aac3420b3672 1 $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 4 minutes ago Created 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra Create a container mypodmanplanet-1 and add it to the Pod. By means of the --env argument, you change the port of the Spring Boot application to port 8081. Shell $ podman create --pod hello-pod --name mypodmanplanet-1 --env 'SERVER_PORT=8081' localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env Start the Pod. Shell $ podman pod start hello-pod Verify whether the endpoint can be reached at port 8081 and verify that the endpoint at port 8080 cannot be reached. Shell $ curl http://localhost:8081/hello Hello Podman! $ curl http://localhost:8080/hello curl: (56) Recv failure: Connection reset by peer Add a second container mypodmanplanet-2 to the Pod, this time running at the default port 8080. Shell $ podman create --pod hello-pod --name mypodmanplanet-2 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Verify the Pod status. It says that the status is Degraded. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Degraded 9 minutes ago aac3420b3672 3 Take a look at the containers. Two containers are running, and a new container is just created. That is the reason the Pod has the status Degraded. Shell $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 11 minutes ago Up 2 minutes ago 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra 321a62fbb4fc localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env 3 minutes ago Up 2 minutes ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-1 7b95fb521544 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT About a minute ago Created 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-2 Start the second container and verify the Pod status. The status is now Running. 
Shell $ podman start mypodmanplanet-2 $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Running 12 minutes ago aac3420b3672 3 Both endpoints can now be reached. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! Verify whether you can access the endpoint of container mypodmanplanet-1 from within mypodmanplanet-2. This also works. Shell $ podman exec -it mypodmanplanet-2 sh /opt/app $ wget http://localhost:8081/hello Connecting to localhost:8081 (127.0.0.1:8081) saving to 'hello' hello 100% |***********************************************************************************************************************************| 13 0:00:00 ETA 'hello' saved Cleanup To conclude, you can do some cleanup. Stop the running Pod. Shell $ podman pod stop hello-pod The Pod has the status Exited now. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS dab9029ad0c5 hello-pod Exited 55 minutes ago aac3420b3672 3 All containers in the Pod are also exited. Shell $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES aac3420b3672 k8s.gcr.io/pause:3.5 56 minutes ago Exited (0) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp dab9029ad0c5-infra 321a62fbb4fc localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env 48 minutes ago Exited (143) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-1 7b95fb521544 localhost/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 46 minutes ago Exited (143) About a minute ago 0.0.0.0:8080-8081->8080-8081/tcp mypodmanplanet-2 Remove the Pod. Shell $ podman pod rm hello-pod The Pod and the containers are removed. Shell $ podman pod ps POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS $ podman ps --all CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES Conclusion The bold statement that Podman is a drop-in replacement for Docker is not true. Podman differs from Docker on certain topics like building container images, starting containers, networking, volume mounts, inter-container communication, etc. However, Podman does support many Docker commands. The statement should be Podman is an alternative to Docker. This is certainly true. It is important for you to know and understand the differences before switching to Podman. After this, it is definitely a good alternative.
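One closing note on the Pod concept: because a Podman Pod mirrors a Kubernetes Pod, Podman can also export a Pod as a Kubernetes manifest and recreate it from that manifest later. This is outside the scope of the comparison above, but a brief sketch looks like this (the file name is arbitrary):

Shell
$ podman generate kube hello-pod > hello-pod.yaml
$ podman play kube hello-pod.yaml

This can be a convenient bridge when the same containers eventually need to run on a Kubernetes cluster.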
This post explains how to launch an Amazon EMR cluster and deploy a Kedro project to run a Spark job. Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform for applications built using open-source big data frameworks, such as Apache Spark, that process and analyze vast amounts of data with AWS.

1. Set up the Amazon EMR Cluster

One way to install Python libraries onto Amazon EMR is to package a virtual environment and deploy it. To do this, the virtual environment needs to be built in the same Amazon Linux 2 environment used by Amazon EMR. We used the following example Dockerfile to package our dependencies on an Amazon Linux 2 base:

Dockerfile
FROM --platform=linux/amd64 amazonlinux:2 AS base

RUN yum install -y python3

ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install venv-pack==0.2.0 && \
    python3 -m pip install -r /tmp/requirements.txt

RUN mkdir /output && venv-pack -o /output/pyspark_deps.tar.gz

FROM scratch AS export
COPY --from=base /output/pyspark_deps.tar.gz /

Note: The Docker BuildKit backend is necessary to run this Dockerfile (make sure it is enabled). Run the Dockerfile using the following command:

DOCKER_BUILDKIT=1 docker build --output . <output-path>

This will generate a pyspark_deps.tar.gz file at the <output-path> specified in the command above. Use this command if your Dockerfile has a different name:

DOCKER_BUILDKIT=1 docker build -f Dockerfile-emr-venv --output . <output-path>

2. Set up CONF_ROOT

The kedro package command only packages the source code, yet the conf directory is essential for running any Kedro project. To make it available to Kedro separately, its location can be controlled by setting CONF_ROOT. By default, Kedro looks at the root conf folder for all its configuration (catalog, parameters, globals, credentials, logging) to run the pipelines, but this can be customised by changing CONF_ROOT in settings.py.

For Kedro versions < 0.18.5: Change CONF_ROOT in settings.py to the location where the conf directory will be deployed. It can be anything, e.g., ./conf or /mnt1/kedro/conf.

For Kedro versions >= 0.18.5: Use the --conf-source CLI parameter directly with kedro run to specify the path. CONF_ROOT does not need to be changed in settings.py.

3. Package the Kedro Project

Package the project using the kedro package command from the root of your project folder. This creates a .whl file in the dist folder that will be used when doing spark-submit to the Amazon EMR cluster to specify the --py-files option referring to the source code.

4. Create .tar for conf

As described above, the kedro package command only packages the source code, yet the conf directory is essential for running any Kedro project. Therefore, it needs to be deployed separately as a tar.gz file. It is important to note that the contents inside the folder need to be zipped, not the conf folder itself. Use the following command to zip the contents of the conf directory and generate a conf.tar.gz file containing catalog.yml, parameters.yml, and the other files needed to run the Kedro pipeline. It will be used with spark-submit for the --archives option to unpack the contents into a conf directory.

tar -czvf conf.tar.gz --exclude="local" conf/*

5. Create an Entrypoint for the Spark Application

Create an entrypoint.py file that the Spark application will use to start the job.
This file can be modified to take arguments and can then be run using only main(sys.argv) after removing the params array:

python entrypoint.py --pipeline my_new_pipeline --params run_date:2023-02-05,runtime:cloud

This would mimic the exact kedro run behaviour.

Python
import sys

from proj_name.__main__ import main

if __name__ == "__main__":
    """
    These params could be used as *args to test pipelines locally.
    The example below will run `my_new_pipeline` using `ThreadRunner`,
    applying a set of params:

    params = [
        "--pipeline", "my_new_pipeline",
        "--runner", "ThreadRunner",
        "--params", "run_date:2023-02-05,runtime:cloud",
    ]
    main(params)
    """
    main(sys.argv)

6. Upload Relevant Files to S3

Upload the relevant files to an S3 bucket (Amazon EMR should have access to this bucket) in order to run the Spark job. The following artifacts should be uploaded to S3 (a sketch of the corresponding upload commands follows at the end of this post):

- The .whl file created in step #3
- The virtual environment tar.gz created in step #1 (e.g., pyspark_deps.tar.gz)
- The .tar file for the conf folder created in step #4 (e.g., conf.tar.gz)
- The entrypoint.py file created in step #5

7. spark-submit to the Amazon EMR Cluster

Use the following spark-submit command as a step on Amazon EMR running in cluster mode. A few points to note:

- pyspark_deps.tar.gz is unpacked into a folder named environment.
- Environment variables are set referring to libraries unpacked in the environment directory above, e.g., PYSPARK_PYTHON=environment/bin/python.
- The conf directory is unpacked to the folder specified after the # symbol (s3://{S3_BUCKET}/conf.tar.gz#conf).

Note the following:

- Kedro versions < 0.18.5: The folder location/name after the # symbol should match CONF_ROOT in settings.py.
- Kedro versions >= 0.18.5: You could follow the same approach as earlier. However, Kedro now provides the flexibility to pass the configuration location through the --conf-source CLI parameter instead of setting CONF_ROOT in settings.py. Therefore, --conf-source could be specified directly in the CLI parameters, and step 2 can be skipped completely.

Shell
spark-submit --deploy-mode cluster --master yarn \
  --conf spark.submit.pyFiles=s3://{S3_BUCKET}/<whl-file>.whl \
  --archives=s3://{S3_BUCKET}/pyspark_deps.tar.gz#environment,s3://{S3_BUCKET}/conf.tar.gz#conf \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=environment/bin/python \
  --conf spark.yarn.appMasterEnv.<env-var-here>={ENV} \
  --conf spark.executorEnv.<env-var-here>={ENV} \
  s3://{S3_BUCKET}/entrypoint.py --env base --pipeline my_new_pipeline --params run_date:2023-03-07,runtime:cloud

Summary

This post describes the sequence of steps needed to deploy a Kedro project to an Amazon EMR cluster:

1. Set up the Amazon EMR cluster
2. Set up CONF_ROOT (optional for Kedro versions >= 0.18.5)
3. Package the Kedro project
4. Create a .tar for conf
5. Create an entrypoint for the Spark application
6. Upload relevant files to S3
7. spark-submit to the Amazon EMR cluster

Kedro supports a range of deployment targets, including Amazon SageMaker, Databricks, Vertex AI and Azure ML, and our documentation additionally includes a range of approaches for single-machine deployment to a production server.
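As referenced in step #6 above, the exact upload commands are not listed; the following is just a sketch using the AWS CLI, where the bucket name and file names are placeholders:

Shell
# Sketch only: upload the artifacts from steps #1, #3, #4, and #5 to S3.
# Replace {S3_BUCKET} and the file names with your own values.
aws s3 cp dist/<whl-file>.whl s3://{S3_BUCKET}/<whl-file>.whl
aws s3 cp pyspark_deps.tar.gz s3://{S3_BUCKET}/pyspark_deps.tar.gz
aws s3 cp conf.tar.gz s3://{S3_BUCKET}/conf.tar.gz
aws s3 cp entrypoint.py s3://{S3_BUCKET}/entrypoint.py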
In this post, we'll delve into the fascinating world of operator overloading in Java. Although Java doesn't natively support operator overloading, we'll discover how Manifold can extend Java with that functionality. We'll explore its benefits, limitations, and use cases, particularly in scientific and mathematical code. We will also explore three powerful features provided by Manifold that enhance Java's default type safety while enabling impressive programming techniques. We'll discuss unit expressions, type-safe reflection coding, and fixing methods like equals during compilation. Additionally, we'll touch upon a solution that Manifold offers to address some limitations of the var keyword. Let's dive in! Before we begin, as always, you can find the code examples for this post and other videos in this series on my GitHub page. Be sure to check out the project, give it a star, and follow me on GitHub to stay updated!

Arithmetic Operators

Operator overloading allows us to use familiar mathematical notation in code, making it more expressive and intuitive. While Java doesn't support operator overloading by default, Manifold provides a solution to this limitation. To demonstrate, let's start with a simple Vector class that performs vector arithmetic operations. In standard Java code, we define variables, accept them in the constructor, and implement methods like plus for vector addition. However, this approach can be verbose and less readable.

Java
public class Vec {
    private float x, y, z;

    public Vec(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public Vec plus(Vec other) {
        return new Vec(x + other.x, y + other.y, z + other.z);
    }
}

With Manifold, we can simplify the code significantly. Using Manifold's operator overloading features, we can directly add vectors together using the + operator, like so:

Java
Vec vec1 = new Vec(1, 2, 3);
Vec vec2 = new Vec(1, 1, 1);
Vec vec3 = vec1 + vec2;

Manifold seamlessly maps the operator to the appropriate method invocation, making the code cleaner and more concise. This fluid syntax resembles mathematical notation, enhancing code readability. Moreover, Manifold handles reverse notation gracefully. If we reverse the order of the operands, such as a scalar plus a vector, Manifold swaps the order and performs the operation correctly. This flexibility enables us to write code in a more natural and intuitive manner. Let's say we add this to the Vec class:

Java
public Vec plus(float other) {
    return new Vec(x + other, y + other, z + other);
}

This will make all these lines valid:

Java
vec3 += 5.0f;
vec3 = 5.0f + vec3;
vec3 = vec3 + 5.0f;
vec3 += Float.valueOf(5.0f);

In this code, we demonstrate that Manifold can swap the order to invoke Vec.plus(float) seamlessly. We also show that support for the += operator comes for free with the plus method. As implied by the previous code, Manifold also supports primitive wrapper objects, specifically in the context of autoboxing. In Java, primitive types have corresponding wrapper objects. Manifold handles the conversion between primitives and their wrapper objects seamlessly, thanks to autoboxing and unboxing. This enables us to work with objects and primitives interchangeably in our code. There are caveats to this, as we will find out.

BigDecimal Support

Manifold goes beyond simple arithmetic and supports more complex scenarios. For example, the manifold-science dependency includes built-in support for BigDecimal arithmetic.
BigDecimal is a Java class used for precise calculations involving large numbers or financial computations. By using Manifold, we can perform arithmetic operations with BigDecimal objects using familiar operators, such as +, -, *, and /. Manifold's integration with BigDecimal simplifies code and ensures accurate calculations. The following code is legal once we add the right set of dependencies, which add method extensions to the BigDecimal class:

Java
var x = new BigDecimal(5L);
var y = new BigDecimal(25L);
var z = x + y;

Under the hood, Manifold adds the applicable plus, minus, times, etc. methods to the class. It does so by leveraging class extensions, which I discussed before.

Limits of Boxing

We can also extend existing classes to support operator overloading. Manifold allows us to extend classes and add methods that accept custom types or perform specific operations. For instance, we can extend the Integer class and add a plus method that accepts BigDecimal as an argument and returns a BigDecimal result. This extension enables us to perform arithmetic operations between different types seamlessly. The goal is to get this code to compile:

Java
var z = 5 + x + y;

Unfortunately, this won't compile with that change. The number five is a primitive, not an Integer, and the only way to get that code to work would be:

Java
var z = Integer.valueOf(5) + x + y;

This isn't what we want. However, there's a simple solution. We can create an extension to BigDecimal itself and rely on the fact that the order can be swapped seamlessly. This means that this simple extension can support the 5 + x + y expression without a change:

Java
@Extension
public class BigDecimalExt {
    public static BigDecimal plus(@This BigDecimal b, int i) {
        return b.plus(BigDecimal.valueOf(i));
    }
}

List of Arithmetic Operators

So far, we have focused on the plus operator, but Manifold supports a wide range of operators. The following table lists the operators and the method each maps to:

Operator | Method
+, +=    | plus
-, -=    | minus
*, *=    | times
/, /=    | div
%, %=    | rem
-a       | unaryMinus
++       | inc
--       | dec

Notice that the increment and decrement operators don't have a distinction between the prefix and postfix positioning. Both a++ and ++a would lead to the inc method.

Index Operator

The support for the index operator took me completely off guard when I looked at it. This is a complete game-changer… The index operator is the square brackets we use to get an array value by index. To give you a sense of what I'm talking about, this is valid code in Manifold:

Java
var list = List.of("A", "B", "C");
var v = list[0];

In this case, v will be "A" and the code is the equivalent of invoking list.get(0). The index operator seamlessly maps to get and set methods. We can do assignments as well using the following:

Java
var list = new ArrayList<>(List.of("A", "B", "C"));
var v = list[0];
list[0] = "1";

Notice I had to wrap the List in an ArrayList since List.of() returns an unmodifiable List. But this isn't the part I'm reeling about. That code is "nice." This code is absolutely amazing:

Java
var map = new HashMap<>(Map.of("Key", "Value"));
var key = map["Key"];
map["Key"] = "New Value";

Yes! You're reading valid code in Manifold. An index operator is used to look up in a map. Notice that a map has a put() method and not a set method. That's an annoying inconsistency that Manifold fixed with an extension method. We can then use an object to look up within a map using the operator.
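Since the index operator simply maps to get and set methods, we can make our own classes indexable as well. The following is a small sketch of my own, not taken from the Manifold documentation, assuming the [] operator resolves to matching get and set methods as described above:

Java
// Sketch: a simple fixed-size row of cells that becomes indexable
// because it exposes get(int) and set(int, value).
public class Row {
    private final String[] cells;

    public Row(int size) {
        this.cells = new String[size];
    }

    // row[i] reads through this method
    public String get(int index) {
        return cells[index];
    }

    // row[i] = value writes through this method
    public void set(int index, String value) {
        cells[index] = value;
    }
}

Usage would then look like this:

Java
var row = new Row(3);
row[0] = "A";        // equivalent to row.set(0, "A")
var first = row[0];  // equivalent to row.get(0)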
Relational and Equality Operators

We still have a lot to cover… Can we write code like this (referring to the Vec object from before)?

Java
if(vec3 > vec2) {
    // …
}

This won't compile by default. However, if we add the Comparable interface to the Vec class, this will work as expected:

Java
public class Vec implements Comparable<Vec> {
    // …

    public double magnitude() {
        return Math.sqrt(x * x + y * y + z * z);
    }

    @Override
    public int compareTo(Vec o) {
        return Double.compare(magnitude(), o.magnitude());
    }
}

These >=, >, <, <= comparison operators will work exactly as expected by invoking the compareTo method. But there's a big problem. You will notice that the == and != operators are missing from this list. In Java, we often use these operators to perform pointer comparisons. This makes a lot of sense in terms of performance, and we wouldn't want to change something so inherent in Java. To avoid that, Manifold doesn't override these operators by default. However, we can implement the ComparableUsing interface, which is a sub-interface of the Comparable interface. Once we do that, the == and != operators will use the equals method by default. We can override that behavior by overriding the method equalityMode(), which can return one of these values:

- CompareTo — will use the compareTo method for == and !=
- Equals (the default) — will use the equals method
- Identity — will use pointer comparison, as is the norm in Java

That interface also lets us override the compareToUsing(T, Operator) method. This is similar to the compareTo method but lets us create operator-specific behavior, which might be important in some edge cases.

Unit Expressions for Scientific Coding

Notice that unit expressions are experimental in Manifold, but they are one of the most interesting applications of operator overloading in this context. Unit expressions are a new type of operator that significantly simplifies and enhances scientific coding while enforcing strong typing. With unit expressions, we can define notations for mathematical expressions that incorporate unit types. This brings a new level of clarity and type safety to scientific calculations. For example, consider a distance calculation where speed is defined as 100 miles per hour. By multiplying the speed (miles per hour) by the time (hours), we can obtain the distance, as such:

Java
Length distance = 100 mph * 3 hr;

Force force = 5kg * 9.807 m/s/s;
if(force == 49.035 N) {
    // true
}

Unit expressions allow us to express numeric values (or variables) along with their associated units. The compiler checks the compatibility of units, preventing incompatible conversions and ensuring accurate calculations. This feature streamlines scientific code and enables powerful calculations with ease. Under the hood, a unit expression is just a conversion call. The expression 100 mph is converted to:

Java
VelocityUnit.postfixBind(Integer.valueOf(100))

This expression returns a Velocity object. The expression 3 hr is similarly bound to the postfix method and returns a Time object. At this point, the Manifold Velocity class has a times method, which, as you recall, is an operator, and it's invoked on both results:

Java
public Length times( Time t ) {
    return new Length( toBaseNumber() * t.toBaseNumber(), LengthUnit.BASE, getDisplayUnit().getLengthUnit() );
}

Notice that the class has multiple overloaded versions of the times method that accept different object types. A Velocity times Mass will produce Momentum. A Velocity times Force results in Power.
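To make that last point concrete, here is a sketch with hypothetical types of my own (not the manifold-science classes): the overload selected by the right-hand operand determines the type of the whole expression.

Java
// Sketch only: overload resolution on times() decides the result type.
record Duration(double seconds) { }
record Distance(double meters) { }
record Mass(double kilograms) { }
record Momentum(double kgMetersPerSecond) { }

final class Speed {
    private final double metersPerSecond;

    Speed(double metersPerSecond) { this.metersPerSecond = metersPerSecond; }

    Distance times(Duration d) {           // speed * time -> distance
        return new Distance(metersPerSecond * d.seconds());
    }

    Momentum times(Mass m) {               // speed * mass -> momentum
        return new Momentum(metersPerSecond * m.kilograms());
    }
}

// With the Manifold plugin enabled, the two lines below resolve to different overloads:
// Distance d = new Speed(10) * new Duration(3);
// Momentum p = new Speed(10) * new Mass(2);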
Many units are supported as part of this package even in this early experimental stage, check them out here. You might notice a big omission here: Currency. I would love to have something like: Java var sum = 50 USD + 70 EUR; If you look at that code, the problem should be apparent. We need an exchange rate. This makes no sense without exchange rates and possibly conversion costs. The complexities of financial calculations don’t translate as nicely to the current state of the code. I suspect that this is the reason this is still experimental. I’m very curious to see how something like this can be solved elegantly. Pitfalls of Operator Overloading While Manifold provides powerful operator overloading capabilities, it's important to be mindful of potential challenges and performance considerations. Manifold's approach can lead to additional method calls and object allocations, which may impact performance, especially in performance-critical environments. It's crucial to consider optimization techniques, such as reducing unnecessary method calls and object allocations, to ensure efficient code execution. Let’s look at this code: Java var n = x + y + z; On the surface, it can seem efficient and short. It physically translates to this code: Java var n = x.plus(y).plus(z); This is still hard to spot but notice that in order to create the result, we invoke two methods and allocate at least two objects. A more efficient approach would be: Java var n = x.plus(y, z); This is an optimization we often do for high-performance matrix calculations. You need to be mindful of this and understand what the operator is doing under the hood if performance is important. I don’t want to imply that operators are inherently slower. In fact, they’re as fast as a method invocation, but sometimes the specific method invoked and the number of allocations are unintuitive. Type Safety Features The following aren’t related to operator overloading, but they were a part of the second video, so I feel they make sense as part of a wide-sweeping discussion on type safety. One of my favorite things about Manifold is its support of strict typing and compile time errors. To me, both represent the core spirit of Java. JailBreak: Type-Safe Reflection @JailBreak is a feature that grants access to the private state within a class. While it may sound bad, @JailBreak offers a better alternative to using traditional reflection to access private variables. By jailbreaking a class, we can access its private state seamlessly, with the compiler still performing type checks. In that sense, it’s the lesser of two evils. If you’re going to do something terrible (accessing private state), then at least have it checked by the compiler. In the following code, the value array is private to String, yet we can manipulate it thanks to the @JailBreak annotation. This code will print “Ex0osed…”: Java @Jailbreak String exposedString = "Exposed..."; exposedString.value[2] = '0'; System.out.println(exposedString); JailBreak can be applied to static fields and methods as well. However, accessing static members requires assigning null to the variable, which may seem counterintuitive. Nonetheless, this feature provides a more controlled and type-safe approach to accessing the internal state, minimizing the risks associated with using reflection. Java @Jailbreak String str = null; str.isASCII(new byte[] { 111, (byte)222 }); Finally, all objects in Manifold are injected with a jailbreak() method. 
This method can be used like this (notice that fastTime is a private field):

Java
Date d = new Date();
long t = d.jailbreak().fastTime;

Self Annotation: Enforcing Method Parameter Type

In Java, certain APIs accept objects as parameters, even when a more specific type could be used. This can lead to potential issues and errors at runtime. However, Manifold introduces the @Self annotation, which helps enforce the type of object passed as a parameter. By annotating the parameter with @Self, we explicitly state that only the specified object type is accepted. This ensures type safety and prevents the accidental use of incompatible types. With this annotation, the compiler catches such errors during development, reducing the likelihood of encountering issues in production. Let's look at the MySizeClass from my previous posts:

Java
public class MySizeClass {
    int size = 5;

    public int size() {
        return size;
    }

    public void setSize(int size) {
        this.size = size;
    }

    public boolean equals(@Self Object o) {
        return o != null && ((MySizeClass)o).size == size;
    }
}

Notice I added an equals method and annotated the argument with @Self. If I remove the @Self annotation, this code will compile:

Java
var size = new MySizeClass();
size.equals("");
size.equals(new MySizeClass());

With the @Self annotation, the string comparison will fail during compilation.

Auto Keyword: A Stronger Alternative to Var

I'm not a huge fan of the var keyword. I feel it didn't simplify much, and the price is coding to an implementation instead of to an interface. I understand why the devs at Oracle chose this path. Conservative decisions are the main reason I find Java so appealing. Manifold has the benefit of working outside of those constraints, and it offers a more powerful alternative called auto. auto can be used in fields and method return values, making it more flexible than var. It provides a concise and expressive way to define variables without sacrificing type safety. auto is particularly useful when working with tuples, a feature not yet discussed in this post. It allows for elegant and concise code, enhancing readability and maintainability. You can effectively use auto as a drop-in replacement for var.

Finally

Operator overloading with Manifold brings expressive and intuitive mathematical notation to Java, enhancing code readability and simplicity. While Java doesn't natively support operator overloading, Manifold empowers developers to achieve similar functionality and use familiar operators in their code. By leveraging Manifold, we can write more fluid and expressive code, particularly in scientific, mathematical, and financial applications. The type-safety enhancements in Manifold make Java more… well, "Java-like." They let Java developers build upon the strong foundation of the language and embrace a more expressive, type-safe programming paradigm. Should we add operator overloading to Java itself? I'm not in favor. I love that Java is slow, steady, and conservative. I also love that Manifold is bold and adventurous. That way, I can pick it when I'm doing a project where this approach makes sense (e.g., a startup project) but pick standard conservative Java for an enterprise project.