Software Integration
Seamless communication — that, among other consequential advantages, is the ultimate goal when integrating your software. And today, integrating modern software means fusing various applications and/or systems — many times across distributed environments — with the common goal of unifying isolated data. This effort often signifies the transition of legacy applications to cloud-based systems and messaging infrastructure via microservices and REST APIs. So what's next? Where is the path to seamless communication and nuanced architecture taking us? Dive into our 2023 Software Integration Trend Report and fill the gaps in modern integration practices by exploring trends in APIs, microservices, and cloud-based systems and migrations. You have to integrate to innovate!
Microservice architecture is a popular software design pattern that involves dividing a large application into smaller, independent services that can be developed, deployed, and scaled autonomously. Each microservice is responsible for a specific function within the larger application and communicates with other services via well-defined interfaces, often leveraging lightweight communication protocols such as REST or message queues.

Advantages of Microservice Architecture

Microservice architecture provides several advantages. Here are some benefits of using microservices:

• Maintainability: Smaller services are more manageable and easier to understand, leading to more stable and reliable applications.
• Technology Diversity: Microservices can be built using different technologies, providing greater flexibility and innovation in the development process.
• Scalability: Microservices can be scaled independently of each other, making it easier to allocate resources based on demand.
• Flexibility: Microservices can be developed and deployed independently, allowing for more agile development processes and faster time to market.
• Resilience: Microservices are independent of each other, so failures in one service do not necessarily affect other services.

Disadvantages of Microservice Architecture

Microservice architecture also has its own set of challenges and drawbacks, which include:

• Complexity: Complexity increases as the number of microservices grows, making it harder to manage and coordinate interactions between services and more challenging to test and debug a distributed system.
• Skilled developers: Working with microservice architecture requires skilled developers who can identify the microservices and manage their inter-communication effectively.
• Deployment: The independent deployment of microservices can be complicated, requiring additional coordination and management.
• Network usage: Microservices can be costly in terms of network usage as they need to interact with each other, resulting in network latency and increased traffic.
• Security: Microservices may be less secure than monolithic applications because inter-service communication happens over the network, which can be vulnerable to security breaches.
• Debugging: Debugging can be difficult as control flows over multiple microservices, and identifying where and why an error occurred can be a challenging task.

Difference Between Monolithic and Microservice Architecture

Monolithic architecture refers to a traditional approach where an entire application is built as a single, unified system, with all functions tightly coupled together. In this approach, changes to one part of the application can have a ripple effect on other parts, making it difficult to scale and maintain over time. In addition, a monolith is typically deployed as a single, large application, and it can be difficult to add new features or make changes to the system without affecting the entire application. In contrast, microservice architecture is a more modern approach where an application is broken down into smaller, independent services that can be developed, deployed, and scaled independently of each other. Each microservice focuses on a specific function within the larger application, and communication between services is typically accomplished through lightweight communication protocols such as REST or message queues.
This approach offers more flexibility and scalability, as changes to one microservice do not necessarily affect the other microservices.

Why Golang for Microservices?

Go, also known as Golang, is well suited to developing microservices thanks to its unique characteristics. Here are some reasons why you should consider Go for building microservices:

• Performance: Go is a fast and efficient language, which makes it a good fit for microservices that need to handle high volumes of traffic and requests. It also provides excellent runtime performance.

Benchmarks Game test: fannkuch-redux
source   secs     mem      gz     cpu secs
Go       8.25     10,936   969    32.92
Java     40.61    39,396   1257   40.69
Python   913.87   11,080   385    913.83

Benchmarks Game test: n-body
source   secs     mem      gz     cpu secs
Go       6.36     11,244   1200   6.37
Java     7.46     39,992   1430   7.5
Python   383.12   11,096   1196   383.11

Source: Benchmarks Game

• Concurrency: Go provides built-in support for concurrency, which allows it to handle multiple requests simultaneously, making it a great choice for building scalable microservices.
• Small footprint: Go binaries are lightweight and have a small footprint, which makes microservices easier to deploy and manage in a containerized environment.
• Simple and readable syntax: Go has a simple and intuitive syntax that is easy to learn and read. This makes it a great language for building maintainable microservices that are easy to understand and modify.
• Strong standard library: Go has a strong standard library that provides many useful built-in packages for building microservices, such as networking, serialization, and logging.
• Large community: Go has a large and active community that provides support, resources, and third-party libraries for building microservices.
• Cross-platform support: Go is designed to be cross-platform, which means you can build and deploy microservices on a variety of operating systems and environments.

Overall, Go is an excellent choice for building microservices that require high performance, scalability, and efficiency. Its simple syntax, strong standard library, and large community make it a popular choice among developers, and its small footprint and cross-platform support make microservices easy to deploy and manage in a containerized environment.

Golang Frameworks

Here are a few frameworks built in Golang:

• Go Micro – Go Micro is an open-source platform designed for API-driven development and one of the most popular remote procedure call (RPC) frameworks for Go. It provides load balancing, synchronous and asynchronous communication, message encoding, service discovery, and RPC client/server packages.
• Go Kit – Go Kit is a library that fills the gaps left by the standard library and provides solutions for most operational and infrastructure concerns. It makes Go a first-class language for writing microservices in any organization.

These frameworks provide a variety of features and functionalities that can help developers build and manage microservices more efficiently. Choosing the right framework depends on the specific requirements of the project and the expertise of the development team.

Why Do Companies Use Golang?

There are two main reasons why Google's programming language is used by companies of various tiers:

• Designed for multi-core processing: Golang was specifically designed for cloud computing and takes advantage of modern hardware's parallelism and concurrency capabilities. Golang's goroutines and channel-based approach make it easy to utilize all available CPU cores and handle parallel I/O without adding complexity to development.
• Built for big projects: Golang's code is simple and easy to read, making it ideal for large-scale projects.

A minimal example of what this looks like in practice is sketched below.
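To make the concurrency and small-footprint points concrete, here is a minimal sketch of a Go microservice exposing a single health-check endpoint over HTTP, using only the standard library. The package layout, port, and /healthz path are illustrative assumptions, not taken from the article:

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// health is the response body for the service's health-check endpoint.
type health struct {
	Status string `json:"status"`
}

func main() {
	mux := http.NewServeMux()

	// net/http serves each incoming request in its own goroutine, so even
	// this tiny service handles concurrent requests across all CPU cores
	// without any extra concurrency code.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(health{Status: "ok"})
	})

	log.Println("listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", mux))
}

Compiled with `go build`, this produces a single static binary, which is what makes Go services so convenient to package into small container images.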
Real-Life Examples of Using Go to Build Microservices

Here are some real-life examples of companies using Go to build microservices:

• Netflix: Netflix uses Go for several of its microservices, including its platform engineering and cloud orchestration services. Go's performance and concurrency features help Netflix handle the scale and complexity of its infrastructure.
• Docker: Docker is a popular containerization platform that uses Go to power many of its core components, including the Docker daemon and the Docker CLI. Go's small footprint and cross-platform support make it well suited for building containerized applications.

Conclusion

Microservices have become a popular choice for startups and large companies alike due to their scalability, sustainability, and ability to be deployed independently. Among programming languages, Go stands out as an excellent choice for microservices: its easy-to-write code, security posture, fast execution speed, and low entry threshold make it a strong fit for this architecture. Go's performance, concurrency, and simplicity make it an increasingly popular option for building scalable and reliable microservice architectures.
Do you consider yourself a developer? If "yes," then this survey is for you. We need you to share your knowledge on web and mobile development, how (and if) you leverage low code, scalability challenges, and more. The research covered in our Trend Reports depends on your feedback and helps shape each report. Our April Trend Report this year focuses on you, the developer. It explores development trends and how they relate to scalability within organizations, highlighting application challenges, code, and more. And this is where we could use your insights! We're asking for ~10 minutes of your time to share your experience. Let us know your thoughts on the future of the developer! And enter for a chance to win one of 10 $50 gift cards! Take Our Survey. Over the coming weeks, we will compile and analyze data from hundreds of DZone members to help inform the "Key Research Findings" for our upcoming April Trend Report, Development at Scale: An Exploration of Mobile, Web, and Low-Code Applications. Your responses help shape the narrative of our Trend Reports, so we cannot do this without you. The DZone Publications team thanks you in advance for all your help!
ChatGPT has taken the world by storm, and this week, OpenAI released the ChatGPT API. I've spent some time playing with ChatGPT in the browser, but the best way to really get on board with these new capabilities is to try building something with it. With the API available, now is that time.

This was inspired by Greg Baugues's implementation of a chatbot command line interface (CLI) in 16 lines of Python. I thought I'd start by trying to build the same chatbot but using JavaScript. (It turns out that Ricky Robinett also had this idea and published his bot code here. It's pleasing to see how similar the implementations are!)

The Code

It turns out that Node.js requires a bit more code to deal with command line input than Python, so where Greg's version was 16 lines, mine takes 31. Having built this little bot, I'm no less excited about the potential for building with this API though. Here's the full code; I'll explain what it is doing further down.

import { createInterface } from "node:readline/promises";
import { stdin as input, stdout as output, env } from "node:process";
import { Configuration, OpenAIApi } from "openai";

const configuration = new Configuration({ apiKey: env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);
const readline = createInterface({ input, output });

const chatbotType = await readline.question(
  "What type of chatbot would you like to create? "
);
const messages = [{ role: "system", content: chatbotType }];
let userInput = await readline.question("Say hello to your new assistant.\n\n");

while (userInput !== ".exit") {
  messages.push({ role: "user", content: userInput });
  try {
    const response = await openai.createChatCompletion({
      messages,
      model: "gpt-3.5-turbo",
    });
    const botMessage = response.data.choices[0].message;
    if (botMessage) {
      messages.push(botMessage);
      userInput = await readline.question("\n" + botMessage.content + "\n\n");
    } else {
      userInput = await readline.question("\nNo response, try asking again\n");
    }
  } catch (error) {
    console.log(error.message);
    userInput = await readline.question("\nSomething went wrong, try asking again\n");
  }
}
readline.close();

When you run this code, you get an interactive chat session in your terminal. Let's dig into how it works and how you can build your own.

Building a Chatbot

You will need an OpenAI platform account to interact with the ChatGPT API. Once you have signed up, create an API key from your account dashboard. As long as you have Node.js installed, the only other thing you'll need is the openai Node.js module.

Let's start a Node.js project and create this CLI application. First, create a directory for the project, change into it, and initialize it with npm:

mkdir chatgpt-cli
cd chatgpt-cli
npm init --yes

Install the openai module as a dependency:

npm install openai

Open package.json and add the key "type": "module" to the configuration, so we can build this as an ES module, which allows us to use top-level await. Create a file called index.js and open it in your editor.

Interacting With the OpenAI API

There are two parts to the code: dealing with input and output on the command line, and dealing with the OpenAI API. Let's start by looking at how the API works. First, we import two objects from the openai module, Configuration and OpenAIApi. The Configuration class is used to create a configuration that holds the API key; you can then use that configuration to create an OpenAIApi client.
import { env } from "node:process";
import { Configuration, OpenAIApi } from "openai";

const configuration = new Configuration({ apiKey: env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);

In this case, we'll store the API key in the environment and read it with env.OPENAI_API_KEY.

To interact with the API, we now use the OpenAI client to create chat completions for us. OpenAI's text-generating models don't actually converse with you; they are built to take input and come up with plausible-sounding text that would follow that input, a completion. With ChatGPT, the model is configured to receive a list of messages and then come up with a completion for the conversation. Messages in this system can come from one of three different entities: the "system," the "user," and the "assistant." The "assistant" is ChatGPT itself, the "user" is the person interacting, and the system allows the program (or the user, as we'll see in this example) to provide instructions that define how the assistant behaves. Changing the system prompt to alter how the assistant behaves is one of the most interesting things to play around with and allows you to create different types of assistants.

With our openai object configured as above, we can create messages to send to an assistant and request a response like this:

const messages = [
  { role: "system", content: "You are a helpful assistant" },
  {
    role: "user",
    content: "Can you suggest somewhere to eat in the centre of London?",
  },
];
const response = await openai.createChatCompletion({
  messages,
  model: "gpt-3.5-turbo",
});
console.log(response.data.choices[0].message);
// => "Of course! London is known for its diverse and delicious food scene..."

As the conversation goes on, we can add the user's questions and the assistant's responses to the messages array, which we send with each request. That gives the bot the history of the conversation, the context it can build further answers on. To create the CLI, we just need to hook this up to user input in the terminal.

Interacting With the Terminal

Node.js provides the Readline module, which makes it easy to receive input and write output to streams. To work with the terminal, those streams will be stdin and stdout. We can import stdin and stdout from the node:process module, renaming them to input and output to make them easier to use with Readline. We also import the createInterface function from node:readline/promises:

import { createInterface } from "node:readline/promises";
import { stdin as input, stdout as output } from "node:process";

We then pass the input and output streams to createInterface, which gives us an object we can use to write to the output and read from the input, all with the question function:

const readline = createInterface({ input, output });
const chatbotType = await readline.question(
  "What type of chatbot would you like to create? "
);

The above code hooks up the input and output streams. The readline object is then used to post the question to the output and return a promise. When the user replies by writing into the terminal and pressing return, the promise resolves with the text that the user wrote.

Completing the CLI

With both of those parts, we can write all of the code. Create a new file called index.js and enter the code below.
We start with the imports we described above:

import { createInterface } from "node:readline/promises";
import { stdin as input, stdout as output, env } from "node:process";
import { Configuration, OpenAIApi } from "openai";

Then we initialize the API client and the Readline module:

const configuration = new Configuration({ apiKey: env.OPENAI_API_KEY });
const openai = new OpenAIApi(configuration);
const readline = createInterface({ input, output });

Next, we ask the first question of the user: "What type of chatbot would you like to create?" We will use the answer to create a "system" message in a new array of messages that we will continue to add to as the conversation goes on.

const chatbotType = await readline.question(
  "What type of chatbot would you like to create? "
);
const messages = [{ role: "system", content: chatbotType }];

We then prompt the user to start interacting with the chatbot and start a loop that says: while the user input is not equal to the string ".exit", keep sending that input to the API. If the user enters ".exit", the program will end, like in the Node.js REPL.

let userInput = await readline.question("Say hello to your new assistant.\n\n");

while (userInput !== ".exit") {
  // loop
}

readline.close();

Inside the loop, we add the userInput to the messages array as a "user" message. Then, within a try/catch block, we send it to the OpenAI API. We set the model as "gpt-3.5-turbo", which is the underlying name for ChatGPT. When we get a response from the API, we get the message out of the response.data.choices array. If there is a message, we store it as an "assistant" message in the array of messages and output it to the user, waiting for their input again using readline. If there is no message in the response from the API, we alert the user and wait for further user input. Finally, if there is an error making a request to the API, we catch the error, log the message, and tell the user to try again.

while (userInput !== ".exit") {
  messages.push({ role: "user", content: userInput });
  try {
    const response = await openai.createChatCompletion({
      messages,
      model: "gpt-3.5-turbo",
    });
    const botMessage = response.data.choices[0].message;
    if (botMessage) {
      messages.push(botMessage);
      userInput = await readline.question("\n" + botMessage.content + "\n\n");
    } else {
      userInput = await readline.question("\nNo response, try asking again\n");
    }
  } catch (error) {
    console.log(error.message);
    userInput = await readline.question(
      "\nSomething went wrong, try asking again\n"
    );
  }
}

Put that all together and you have your assistant. The full code is at the top of this post or on GitHub. You can now run the assistant by passing it your OpenAI API key as an environment variable on the command line:

OPENAI_API_KEY=YOUR_API_KEY node index.js

This will start your interaction with the assistant, starting with it asking what kind of assistant you want. Once you've declared that, you can start chatting with it.

Experimenting Helps Us to Understand

Personally, I'm not actually sure how useful ChatGPT is. It is clearly impressive; its ability to return text that reads as if it was written by a human is incredible. However, it returns content that is not necessarily correct, regardless of how confidently it presents that content. Experimenting with ChatGPT is the only way that we can try to understand what it is useful for, thus building a simple chatbot like this gives us grounds for that experiment.
Learning that the system commands can give the bot different personalities and make it respond in different ways is very interesting. You might have heard, for example, that you can ask ChatGPT to help you with programming, but you could also specify a JSON structure and effectively use it as an API as well. As you experiment with that, though, you will likely find that it should not be an information API, but rather something you can use to understand natural language text and turn it into a JSON object. To me, this is exciting, as it means that ChatGPT could help create more natural voice assistants that can translate meaning from speech better than the existing crop that expects commands to be given in a more exact manner. I still have experimenting to do with this idea, and having this tool gives me that opportunity.

This Is Just the Beginning

If experimenting with this technology is how we come to understand what we can build with it, and what we should or should not build with it, then making it easier to experiment is the next goal. My next step is to expand this tool so that it can save, interact with, and edit multiple assistants, so that you can continue to work with them and improve them over time. In the meantime, you can check out the full code for this first assistant on GitHub, and follow the repo to keep up with improvements.
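As a final, hedged illustration of the JSON idea mentioned above (the prompt wording, the field names, and the parsing are my own assumptions and are not part of the original bot), the same createChatCompletion call used throughout this post can be pointed at structured extraction by changing only the system message:

const messages = [
  {
    role: "system",
    content:
      "You extract booking requests from user messages. " +
      'Reply only with JSON of the form {"date": string, "partySize": number}.',
  },
  { role: "user", content: "Table for four this Friday evening, please." },
];

const response = await openai.createChatCompletion({
  messages,
  model: "gpt-3.5-turbo",
});

// The reply should now be machine-readable, but guard the parse:
// as noted above, the model's output is plausible, not guaranteed correct.
let booking = null;
try {
  booking = JSON.parse(response.data.choices[0].message.content);
} catch {
  // fall back to null if the model did not return valid JSON
}
console.log(booking);

Guarding the JSON.parse call matters precisely because of the caveat above: the model may not always comply with the requested structure.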
The benefits of adopting cloud-native practices have been talked about by industry professionals ad nauseam, with everyone extolling their ability to lower costs, easily scale, and fuel innovation like never before. Easier said than done. Companies want to go cloud-native, but they're still trying to solve the metaphorical "security puzzle" that's at the heart of Kubernetes management. They start with people and processes, shift corporate culture to match cloud-native security best practices, and then sort of hit a wall. They understand they need a way to embed security into standard developer workflows and cluster deployments, but creating continuous and secure GitOps is—in a word—hard.

What Makes Kubernetes Security So Difficult?

For starters, Kubernetes is usually managed by developers. They can spin up clusters quickly, but they may not be responsible for managing or securing them. Enter the Site Reliability Engineers (SREs), who feel the need to slow down developer velocity to better manage clusters and make sure they know what's going into production. Though not ideal, slowing things down is often seen as a reasonable tradeoff for eliminating unmanaged clusters and the kinds of risks that make IT security professionals shudder.

Another challenge of Kubernetes security is that its default configurations aren't very secure. Kubernetes is built for speed, performance, and scalability. Though each piece of the Kubernetes puzzle has security capabilities built in, they're often not turned on by default. That usually means developers forgo security features to move faster. As if that weren't challenging enough, Kubernetes has complex privilege-management features that can easily become everyone's greatest pain point. Kubernetes comes with its own Role-Based Access Control (RBAC), but it adds yet another layer of complexity: any change made to one piece of the Kubernetes puzzle needs to be reflected in the others.

Four Tenets of Kubernetes Security

Although all these challenges roll up to something pretty daunting, all hope is not lost. Enter the four tenets of Kubernetes security: easy-to-digest best practices you can implement—today—that will make your Kubernetes and cloud-native infrastructure more secure and help you manage your cyber exposure.

Tenet #1: Manage Kubernetes Misconfigurations With Solid Policies

Applying policies and associated industry benchmarks consistently throughout the development lifecycle is an essential first step in Kubernetes security. For example, you might have a policy that says you're not allowed to run any containers with root privileges in your clusters, or that your Kubernetes API server cannot be accessible to the public. If your policies are violated, that means you have misconfigurations that will lead to unacceptable security risks. Leveraging policy frameworks, like OPA, and industry benchmarks, like CIS, can help harden Kubernetes environments and prevent misconfigurations from going into production. Set your policies first, and don't just park them on a shelf. Revisit them on a regular schedule to ensure they continue to suit your needs.

Tenet #2: Implement Security Guardrails

Kubernetes security starts in the development process. Most teams developing for cloud-native platforms are using Infrastructure as Code (IaC) to provision and configure their systems. They should expand that to policy as code, which means you and your team can apply the same security policies across the software development lifecycle. For example, developers can scan their code for security problems on their workstations, in CI/CD pipelines, in container image registries, and in the Kubernetes environment itself. Leveraging developer-friendly tools can help seamlessly integrate security into the development process. Open-source IaC static code scanners, like Terrascan, can help ensure only secure code enters your environment.
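To make these guardrails concrete, here is a minimal sketch of the kind of workload configuration a "no root containers" policy (Tenet #1) would enforce. The Deployment name and image are illustrative assumptions; the securityContext fields themselves are standard Kubernetes settings:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: registry.example.com/payments-api:1.0.0   # illustrative image
          securityContext:
            runAsNonRoot: true                # refuse to start if the image runs as root
            allowPrivilegeEscalation: false   # block setuid-style escalation
            readOnlyRootFilesystem: true      # keep the container filesystem immutable

A policy engine such as OPA, or a scanner such as Terrascan, can then flag or reject any manifest that omits settings like these before it ever reaches a cluster.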
Tenet #3: Understand and Remediate Container Image Vulnerabilities

Remediating container image vulnerabilities can be a challenge because it can be hard to see what's actually running in your environment. Kubernetes dashboards don't tell you much; you need to know what your images contain and how they're built. Many developers use common base images and call it good. Unfortunately, unscanned base images can leave you vulnerable, and the risks are compounded at the registry and host levels. Developers usually skip this sort of image scanning because it slows them down. The fact is, if they're not in the habit of being on the lookout for outdated OS images, misconfigured settings, permission issues, embedded credentials and secrets, deprecated or unverified packages, unnecessary services, and exposed ports, they're just handing off the problem to someone else. Yes, their work may go more quickly, but their practices are slowing the entire software-delivery process.

Some common security risks:

• In images: Outdated packages, embedded secrets, and the use of untrusted images.
• In registries: Stale images, lax authentication, and lack of testing.
• In hosts: Lax network policies, weak access controls, unencrypted data volumes and storage, and insecure host infrastructure.

Tenet #4: Exposure Management

Once you've implemented the first three tenets, it's time to start looking at your infrastructure holistically. Not all policies apply in all cases, so how are you applying your exclusions? Not every vulnerability is critical, so how do you prioritize fixes and automate remediation? These questions can guide you as you work toward becoming less reactive and more proactive. At the end of the day, visibility is central to managing security in Kubernetes and cloud-native environments. You need to be able to recognize when your configurations have drifted from your secure baselines and readily identify failing policies and misconfigurations. Only then can you get the full picture of your attack surface.

Where Do You Go From Here? The Journey to Comprehensive Cloud Security

You might be feeling a little overwhelmed by the sheer number of attack vectors in cloud environments. But with a solid game plan and the best practices described above taken to heart, cloud-native security becomes less about the destination (your runtimes) and more about the journey: the entire development lifecycle. Shifting left to catch vulnerabilities using policy as code is a great first step toward ensuring the reliability and security of your cloud assets and wrangling all the pieces of the Kubernetes puzzle.
I first ran into the concept of Continuous Integration (CI) when the Mozilla project launched. It included a rudimentary build server as part of the process, and this was revolutionary at the time. I was maintaining a C++ project that took two hours to build and link. We rarely went through a clean build, which created compounding problems as bad code was committed into the project.

A lot has changed since those old days. CI products are all over the place and, as Java developers, we enjoy a richness of capabilities like never before. But I'm getting ahead of myself. Let's start with the basics.

Continuous Integration is a software development practice in which code changes are automatically built and tested in a frequent and consistent manner. The goal of CI is to catch and resolve integration issues as soon as possible, reducing the risk of bugs and other problems slipping into production. CI often goes hand in hand with Continuous Delivery (CD), which aims to automate the entire software delivery process, from code integration to deployment in production. The goal of CD is to reduce the time and effort required to deploy new releases and hotfixes, enabling teams to deliver value to customers faster and more frequently. With CD, every code change that passes the CI tests is considered ready for deployment, allowing teams to deploy new releases at any time with confidence. I won't discuss continuous delivery in this article, but I will come back to it, as there's a lot to discuss. I'm a big fan of the concept, but there are some things we need to monitor.

Continuous Integration Tools

There are many powerful continuous integration tools. Here are some commonly used ones:

• Jenkins: Jenkins is one of the most popular CI tools, offering a wide range of plugins and integrations to support various programming languages and build tools. It is open source and offers a user-friendly interface for setting up and managing build pipelines. It's written in Java and was often my "go to" tool. However, it's a pain to manage and set up. There are some "Jenkins as a service" solutions that also clean up its user experience, which is somewhat lacking.
• Travis CI: Travis CI is a cloud-based CI tool that integrates well with GitHub, making it an excellent choice for GitHub-based projects. Since it predated GitHub Actions, it became the default for many open-source projects on GitHub.
• CircleCI: CircleCI is a cloud-based CI tool that supports a wide range of programming languages and build tools. It offers a user-friendly interface, but its big selling point is the speed of its builds and delivery.
• GitLab CI/CD: GitLab is a popular source code management tool that includes built-in CI/CD capabilities. The GitLab solution is flexible yet simple, and it has gained some industry traction even outside of the GitLab sphere.
• Bitbucket Pipelines: Bitbucket Pipelines is a cloud-based CI tool from Atlassian that integrates seamlessly with Bitbucket, their source code management tool. Since it's an Atlassian product, it provides seamless JIRA integration and very fluid, enterprise-oriented functionality.

Notice I didn't mention GitHub Actions, which we will get to shortly. There are several factors to consider when comparing CI tools:

• Ease of use: Some CI tools have a simple setup process and user-friendly interface, making it easier for developers to get started and manage their build pipelines.
• Integration with Source Code Management (SCM): Many CI tools integrate directly with platforms such as GitHub, GitLab, and Bitbucket.
This makes it easier for teams to automate their build, test, and deployment processes.
• Support for different programming languages and build tools: Different CI tools support different programming languages and build tools, so it's important to choose a tool that is compatible with your development stack.
• Scalability: Some CI tools are better suited to larger organizations with complex build pipelines, while others are better suited to smaller teams with simpler needs.
• Cost: CI tools range in cost from free and open source to commercial tools that can be expensive, so it's important to choose a tool that fits your budget.
• Features: Different CI tools offer distinct features, such as real-time build and test results, support for parallel builds, and built-in deployment capabilities.

In general, Jenkins is known for its versatility and extensive plugin library, making it a popular choice for teams with complex build pipelines. Travis CI and CircleCI are known for their ease of use and integration with popular SCM tools, making them a good choice for small to medium-sized teams. GitLab CI/CD is a popular choice for teams using GitLab for their source code management, as it offers integrated CI/CD capabilities. Bitbucket Pipelines is a good choice for teams using Bitbucket for their source code management, as it integrates seamlessly with the platform.

Cloud vs. On-Premise

The hosting of agents is an important factor to consider when choosing a CI solution. There are two main options for agent hosting: cloud-based and on-premise.

• Cloud-based: Cloud-based CI solutions, such as Travis CI, CircleCI, GitHub Actions, and Bitbucket Pipelines, host the agents on their own servers in the cloud. This means you don't have to worry about managing the underlying infrastructure, and you can take advantage of the scalability and reliability of the cloud.
• On-premise: On-premise CI solutions, such as Jenkins, allow you to host the agents on your own servers. This gives you more control over the underlying infrastructure, but it also requires more effort to manage and maintain the servers.

When choosing a CI solution, it's important to consider your team's specific needs and requirements. For example, if you have a large and complex build pipeline, an on-premise solution such as Jenkins may be a better choice, as it gives you more control over the underlying infrastructure. On the other hand, if you have a small team with simple needs, a cloud-based solution such as Travis CI may be a better choice, as it is easy to set up and manage.

Agent Statefulness

Statefulness determines whether the agents retain their data and configurations between builds:

• Stateful agents: Some CI solutions, such as Jenkins, allow for stateful agents, which means the agents retain their data and configurations between builds. This is useful when you need to persist data between builds, such as when you're using a database or running long-running tests.
• Stateless agents: Other CI solutions, such as Travis CI, use stateless agents, which means the agents are recreated from scratch for each build. This provides a clean slate for each build, but it also means you need to manage any persisted data and configurations externally, such as in a database or cloud storage.

There's a lively debate among CI proponents regarding the best approach. Stateless agents provide a clean and easy-to-reproduce environment. I choose them for most cases and think they are the better approach.
Stateless agents can be more expensive too, as they are slower to set up. Since we pay for cloud resources, the cost can add up. But the main reason some developers prefer stateful agents is the ability to investigate. With a stateless agent, when a CI process fails, you are usually left with no means of investigation other than the logs. With a stateful agent, we can log into the machine and try to run the process manually on that machine. We might reproduce an issue that failed and gain insight thanks to that. A company I worked with chose Azure over GitHub Actions because Azure allowed for stateful agents. This was important to them when debugging a failed CI process. I disagree with that, but that's a personal opinion. I feel I spent more time troubleshooting bad agent cleanup than I benefited from investigating a bug. But that's a personal experience, and some smart friends of mine disagree.

Repeatable Builds

Repeatable builds refer to the ability to produce the exact same software artifacts every time a build is performed, regardless of the environment or the time the build is performed. From a DevOps perspective, having repeatable builds is essential to ensuring software deployments are consistent and reliable. Intermittent failures are the bane of DevOps everywhere, and they are painful to track. Unfortunately, there's no easy fix. As much as we'd like it, some flakiness finds its way into projects of reasonable complexity. It is our job to minimize this as much as possible. There are two blockers to repeatable builds:

• Dependencies: If we don't use specific versions for dependencies, even a small change can break our build.
• Flaky tests: Tests that fail occasionally for no obvious reason are the absolute worst.

When defining dependencies, we need to focus on specific versions. There are many versioning schemes but, over the past decade, the standard three-number semantic versioning took over the industry. This scheme is immensely important for CI, as its usage can significantly impact the repeatability of a build, e.g., with Maven, we can do:

<dependency>
    <groupId>group</groupId>
    <artifactId>artifact</artifactId>
    <version>2.3.1</version>
</dependency>

This is very specific and great for repeatability. However, it might become out of date quickly. We can replace the version number with LATEST or RELEASE, which will automatically get the current version. This is bad, as the builds will no longer be repeatable. However, the hard-coded three-number approach is also problematic. It's often the case that a patch version represents a security fix for a bug. In that case, we would want to pick up the latest patch release but not newer minor or major versions: for the previous example, I would want to use version 2.3.2 implicitly and not 2.4.1. This trades off some repeatability for minor security updates and bug fixes. A better way would be to use the Maven Versions Plugin and periodically invoke the mvn versions:use-latest-releases command. This updates the versions to the latest to keep our project up to date.

That is the straightforward part of repeatable builds. The difficulty is in the flaky tests. This is such a common pain that some projects define a "reasonable amount" of failed tests, and some projects re-run the build multiple times before acknowledging failure. A major cause of test flakiness is state leakage. Tests might fail because of subtle side effects left over from a previous test. Ideally, a test should clean up after itself, so each test will run in isolation, as sketched in the example below.
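Here is a minimal JUnit 5 sketch of that cleanup idea (the class, field, and test names are illustrative, not taken from this article): shared state is wiped after each test so that results do not depend on execution order.

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;

class CleanupExampleTest {

    // Shared mutable state; without cleanup it leaks between tests
    // and makes failures depend on the order in which tests run.
    private static final List<String> createdLogins = new ArrayList<>();

    @AfterEach
    void cleanUp() {
        // Undo whatever this test created so every test starts from a known state.
        createdLogins.clear();
    }

    @Test
    void registersASingleUser() {
        createdLogins.add("activate-account");
        assertEquals(1, createdLogins.size());
    }

    @Test
    void startsEmptyRegardlessOfOrder() {
        // Passes whether it runs before or after the test above, thanks to the cleanup.
        assertEquals(0, createdLogins.size());
    }
}

Remove the @AfterEach method and the second test starts failing whenever it happens to run after the first one, which is exactly the order-dependent flakiness described here.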
In a perfect world, we would run every test in a completely isolated, fresh environment, but this isn't practical. It would mean tests would take too long to run, and we would need to wait a great deal of time for the CI process. We can write tests with various isolation levels; sometimes we need complete isolation and might need to spin up a container for a test. But most times we don't, and the difference in speed is significant. Cleaning up after tests is very challenging. Sometimes state leaks from external tools, such as the database, can cause a flaky test failure. To ensure repeatability of failure, it is a common practice to sort the test cases consistently; this ensures future runs of the build will execute in the same order. This is a hotly debated topic. Some engineers believe this encourages buggy tests and hides problems we can only discover with a random order of tests. From my experience, this did find bugs in the tests, but not in the code. My goal isn't to build perfect tests, so I prefer running the tests in a consistent order, such as alphabetic ordering. It is important to keep statistics of test failures and never simply press retry. By tracking the problematic tests and the order of execution for a failure, we can often find the source of the problem. Most times, the root cause of the failure is faulty cleanup in a prior test, which is why the order matters and its consistency is also important.

Developer Experience and CI Performance

We're here to develop a software product, not a CI tool. The CI tool is here to make the process better. Unfortunately, many times the experience with the CI tool is so frustrating that we end up spending more time on logistics than actually writing code. Often, I spent days trying to pass a CI check so I could merge my changes. Every time I got close, another developer would merge their change first and break my build. This contributes to a less than stellar developer experience, especially as a team scales and we spend more time in the CI queue than merging our changes. There are many things we can do to alleviate these problems:

• Reduce duplication in tests: Over-testing is a common symptom that we can detect with coverage tools.
• Eliminate flaky tests: I know deleting or disabling tests is problematic. Don't do that lightly. But if you spend more time debugging the test than debugging your code, its value is debatable.
• Allocate additional or faster machines for the CI process.
• Parallelize the CI process: We can parallelize some build types and some tests.
• Split the project into smaller projects: Notice that this doesn't necessarily mean microservices.

Ultimately, this connects directly to the productivity of the developers. But we don't have profilers for these sorts of optimizations; we have to measure each time, and this can be painstaking.

GitHub Actions

GitHub Actions is a Continuous Integration/Continuous Delivery (CI/CD) platform built into GitHub. It is stateless, although it allows self-hosting of agents to some degree. I'm focusing on it since it's free for open-source projects and has a decent free quota for closed-source projects. This product is a relatively new contender in the field and not as flexible as most of the other CI tools mentioned before. However, it is very convenient for developers thanks to its deep integration with GitHub and stateless agents.
To test GitHub Actions, we need a new project, which, in this case, I generated using JHipster with the configuration seen here: I created a separate project that demonstrates the use of GitHub Actions here. Notice you can follow this with any project; although we include Maven instructions in this case, the concept is very simple. Once the project is created, we can open the project page on GitHub and move to the "Actions" tab. We will see something like this: In the bottom right corner, we can see the "Java with Maven" project type. Once we pick this type, we move to the creation of a maven.yml file as shown here:

Unfortunately, the default maven.yml suggested by GitHub includes a problem. This is the code it suggests:

name: Java CI with Maven

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up JDK 11
      uses: actions/setup-java@v3
      with:
        java-version: '11'
        distribution: 'temurin'
        cache: maven
    - name: Build with Maven
      run: mvn -B package --file pom.xml

    # Optional: Uploads the full dependency graph to GitHub to improve the quality of Dependabot alerts this repository can receive
    - name: Update dependency graph
      uses: advanced-security/maven-dependency-submission-action@571e99aab1055c2e71a1e2309b9691de18d6b7d6

The last three lines update the dependency graph. But this feature fails, or at least it failed for me. Removing them solved the problem. The rest of the code is a standard YAML configuration. The pull_request and push lines near the top of the file declare that builds will run on a pull request and on a push to master. This means we can run our tests on a pull request before merging. If the tests fail, we will not merge. We can disallow merging with failed tests in the project settings. Once we commit the YAML file, we can create a pull request and the system will run the build process for us. This includes running the tests, since the "package" target in Maven runs tests by default. The code that invokes the tests is in the line starting with "run" near the end. This is effectively a standard Unix command line. Sometimes it makes sense to create a shell script and run it from the CI process. It's sometimes easier to write a good shell script than deal with all the YAML files and configuration settings of various CI stacks. It's also more portable if we choose to switch the CI tool in the future. Here, we don't need it since Maven is enough for our current needs.

We can see the successful pull request here: To test this out, we can add a bug to the code by changing the "/api" endpoint to "/myapi". This produces the failure shown below. It also triggers an error email sent to the author of the commit. When such a failure occurs, we can click the "Details" link on the right side. This takes us directly to the error message you see here: Unfortunately, this is typically a useless message that does not help resolve the issue. However, scrolling up will show the actual failure, which is usually conveniently highlighted for us as seen here:

Note: there are often multiple failures, so it would be prudent to scroll up further. In this error, we can see the failure was an assertion in line 394 of AccountResourceIT, which you can see here. The line numbers do not match.
In this case, line 394 is the last line of the method:

@Test
@Transactional
void testActivateAccount() throws Exception {
    final String activationKey = "some activation key";
    User user = new User();
    user.setLogin("activate-account");
    user.setEmail("activate-account@example.com");
    user.setPassword(RandomStringUtils.randomAlphanumeric(60));
    user.setActivated(false);
    user.setActivationKey(activationKey);
    userRepository.saveAndFlush(user);

    restAccountMockMvc.perform(get("/api/activate?key={activationKey}", activationKey)).andExpect(status().isOk());

    user = userRepository.findOneByLogin(user.getLogin()).orElse(null);
    assertThat(user.isActivated()).isTrue();
}

This means the assert call failed: isActivated() returned false and failed the test. This should help a developer narrow down the issue and understand the root cause.

Going Beyond

As we mentioned before, CI is about developer productivity. We can go much further than merely compiling and testing. We can enforce coding standards, lint the code, detect security vulnerabilities, and much more. In this example, let's integrate SonarCloud, which is a powerful code analysis tool (linter). It finds potential bugs in your project and helps you improve code quality.

SonarCloud is a cloud-based version of SonarQube that allows developers to continuously inspect and analyze their code to find and fix issues related to code quality, security, and maintainability. It supports various programming languages such as Java, C#, JavaScript, Python, and more. SonarCloud integrates with popular development tools such as GitHub, GitLab, Bitbucket, Azure DevOps, and more. Developers can use SonarCloud to get real-time feedback on the quality of their code and improve the overall code quality. On the other hand, SonarQube is an open-source platform that provides static code analysis tools for software developers. It provides a dashboard that shows a summary of the code quality and helps developers identify and fix issues related to code quality, security, and maintainability. SonarCloud and SonarQube provide similar functionalities, but SonarCloud is a cloud-based service and requires a subscription, while SonarQube is an open-source platform that can be installed on-premise or on a cloud server. For simplicity's sake, we will use SonarCloud, but SonarQube should work just fine.

To get started, we go to sonarcloud.io and sign up, ideally with our GitHub account. We are then presented with an option to add a repository for monitoring by SonarCloud as shown here: When we select the "Analyze new page" option, we need to authorize access to our GitHub repository. The next step is selecting the projects we wish to add to SonarCloud as shown here: Once we select them and proceed to the setup process, we need to pick the analysis method. Since we use GitHub Actions, we need to pick that option in the following stage as seen here: Once this is set, we enter the final stage within the SonarCloud wizard as seen in the following image. We receive a token that we can copy (entry 2 that is blurred in the image); we will use it shortly. Notice there are also default instructions to use with Maven that appear once you click the button labeled "Maven."

Going back to the project in GitHub, we can move to the "Project Settings" tab (not to be confused with the account settings in the top menu).
Here we select "Secrets and variables" as shown here: In this section, we can add a new repository secret, specifically the SONAR_TOKEN key and the value we copied from SonarCloud, as you can see here:

GitHub Repository Secrets are a feature that allows developers to securely store sensitive information associated with a GitHub repository, such as API keys, tokens, and passwords, which are required to authenticate and authorize access to various third-party services or platforms used by the repository. The concept behind GitHub Repository Secrets is to provide a secure and convenient way to manage and share confidential information, without having to expose the information publicly in code or configuration files. By using secrets, developers can keep sensitive information separate from the codebase and protect it from being exposed or compromised in case of a security breach or unauthorized access. GitHub Repository Secrets are stored securely and can only be accessed by authorized users who have been granted access to the repository. Secrets can be used in workflows, actions, and other scripts associated with the repository. They can be passed as environment variables to the code, so it can access and use the secrets in a secure, reliable way. Overall, GitHub Repository Secrets provide a simple and effective way for developers to manage and protect confidential information associated with a repository, helping ensure the security and integrity of the project and the data it processes.

We now need to integrate this into the project. First, we need to add these two lines to the pom.xml file. Notice that you need to update the organization name to match your own. These should go into the <properties> section of the XML:

<sonar.organization>shai-almog</sonar.organization>
<sonar.host.url>https://sonarcloud.io</sonar.host.url>

Notice that the JHipster project we created already has SonarQube support, which should be removed from the pom file before this code will work. After this, we can replace the "Build with Maven" portion of the maven.yml file with the following version (note the closing double braces around the secrets, which are required for valid workflow syntax):

    - name: Build with Maven
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Needed to get PR information, if any
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
      run: mvn -B verify org.sonarsource.scanner.maven:sonar-maven-plugin:sonar -Dsonar.projectKey=shai-almog_HelloJHipster package

Once we do that, SonarCloud will provide reports for every pull request merged into the system as shown here: We can see a report that includes the list of bugs, vulnerabilities, smells, and security issues. Clicking any one of those issues leads us to something like this: Notice that we have tabs that explain exactly why the issue is a problem, how to fix it, and more. This is a remarkably powerful tool that serves as one of the most valuable code reviewers in the team.

Two additional interesting elements we saw before are the coverage and duplication reports. SonarCloud expects that tests will have 80% code coverage (trigger 80% of the code in a pull request); this is high and can be configured in the settings. It also points out duplicate code, which might indicate a violation of the Don't Repeat Yourself (DRY) principle.

Finally, CI is a huge subject with many opportunities to improve the flow of your project. We can automate the detection of bugs, streamline artifact generation, automate delivery, and so much more. In my humble opinion, the core principle behind CI is developer experience. It's here to make our lives easier.
When it is done badly, the CI process can turn this amazing tool into a nightmare. Passing the tests becomes an exercise in futility. We retry again and again until we can finally merge. We wait for hours to merge because of slow, crowded queues. This tool that was supposed to help becomes our nemesis. This shouldn’t be the case. CI should make our lives easier, not the other way around.
In this blog post, you will be using AWS Controllers for Kubernetes on an Amazon EKS cluster to put together a solution wherein data from an Amazon SQS queue is processed by an AWS Lambda function and persisted to a DynamoDB table.

AWS Controllers for Kubernetes (also known as ACK) leverage Kubernetes Custom Resources and Custom Resource Definitions and give you the ability to manage and use AWS services directly from Kubernetes without needing to define resources outside of the cluster. The idea behind ACK is to enable Kubernetes users to describe the desired state of AWS resources using the Kubernetes API and configuration language. ACK will then take care of provisioning and managing the AWS resources to match the desired state. This is achieved by using service controllers that are responsible for managing the lifecycle of a particular AWS service. Each ACK service controller is packaged into a separate container image published in a public repository. There is no single ACK container image; instead, there are container images for each individual ACK service controller that manages resources for a particular AWS API. This blog post will walk you through how to use the SQS, DynamoDB, and Lambda service controllers for ACK.

Prerequisites

To follow along step by step, in addition to an AWS account, you will need to have the AWS CLI, kubectl, and Helm installed. There are a variety of ways in which you can create an Amazon EKS cluster. I prefer using the eksctl CLI because of the convenience it offers. Creating an EKS cluster using eksctl can be as easy as this:

eksctl create cluster --name my-cluster --region region-code

For details, refer to Getting started with Amazon EKS – eksctl.

Clone this GitHub repository and change into the right directory:

git clone https://github.com/abhirockzz/k8s-ack-sqs-lambda
cd k8s-ack-sqs-lambda

Ok, let's get started!
Setup the ACK Service Controllers for AWS Lambda, SQS, and DynamoDB

Install ACK Controllers

Log into the Helm registry that stores the ACK charts:

aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws

Deploy the ACK service controller for AWS Lambda using the lambda-chart Helm chart:

RELEASE_VERSION_LAMBDA_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/lambda-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4)
helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/lambda-chart "--version=${RELEASE_VERSION_LAMBDA_ACK}" --generate-name --set=aws.region=us-east-1

Deploy the ACK service controller for SQS using the sqs-chart Helm chart:

RELEASE_VERSION_SQS_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/sqs-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4)
helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/sqs-chart "--version=${RELEASE_VERSION_SQS_ACK}" --generate-name --set=aws.region=us-east-1

Deploy the ACK service controller for DynamoDB using the dynamodb-chart Helm chart:

RELEASE_VERSION_DYNAMODB_ACK=$(curl -sL "https://api.github.com/repos/aws-controllers-k8s/dynamodb-controller/releases/latest" | grep '"tag_name":' | cut -d'"' -f4)
helm install --create-namespace -n ack-system oci://public.ecr.aws/aws-controllers-k8s/dynamodb-chart "--version=${RELEASE_VERSION_DYNAMODB_ACK}" --generate-name --set=aws.region=us-east-1

Now, it's time to configure the IAM permissions for the controllers to invoke Lambda, DynamoDB, and SQS.

Configure IAM Permissions

Create an OIDC Identity Provider for Your Cluster

For the steps below, replace the EKS_CLUSTER_NAME and AWS_REGION variables with your cluster name and region.
export EKS_CLUSTER_NAME=demo-eks-cluster
export AWS_REGION=us-east-1
eksctl utils associate-iam-oidc-provider --cluster $EKS_CLUSTER_NAME --region $AWS_REGION --approve
OIDC_PROVIDER=$(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f2- | cut -d '/' -f2-)

Create IAM Roles for Lambda, SQS, and DynamoDB ACK Service Controllers

ACK Lambda Controller

Set the following environment variables:

ACK_K8S_SERVICE_ACCOUNT_NAME=ack-lambda-controller
ACK_K8S_NAMESPACE=ack-system
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

Create the trust policy for the IAM role:

read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}"
        }
      }
    }
  ]
}
EOF
echo "${TRUST_RELATIONSHIP}" > trust_lambda.json

Create the IAM role:

ACK_CONTROLLER_IAM_ROLE="ack-lambda-controller"
ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK lambda controller deployment on EKS cluster using Helm charts"
aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_lambda.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}"

Attach the IAM policy to the IAM role:

# we are getting the policy directly from the ACK repo
INLINE_POLICY="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/lambda-controller/main/config/iam/recommended-inline-policy)"
aws iam put-role-policy \
    --role-name "${ACK_CONTROLLER_IAM_ROLE}" \
    --policy-name "ack-recommended-policy" \
    --policy-document "${INLINE_POLICY}"

Attach ECR permissions to the controller IAM role. These are required since Lambda functions will be pulling images from ECR.

aws iam put-role-policy \
    --role-name "${ACK_CONTROLLER_IAM_ROLE}" \
    --policy-name "ecr-permissions" \
    --policy-document file://ecr-permissions.json

Associate the IAM role to a Kubernetes service account:

ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text)
export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN
kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN

Repeat the steps for the SQS controller.
ACK SQS Controller Set the following environment variables: ACK_K8S_SERVICE_ACCOUNT_NAME=ack-sqs-controller ACK_K8S_NAMESPACE=ack-system AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) Create the trust policy for the IAM role: read -r -d '' TRUST_RELATIONSHIP <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}" } } } ] } EOF echo "${TRUST_RELATIONSHIP}" > trust_sqs.json Create the IAM role: ACK_CONTROLLER_IAM_ROLE="ack-sqs-controller" ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK sqs controller deployment on EKS cluster using Helm charts" aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_sqs.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}" Attach IAM policy to the IAM role: # for sqs controller, we use the managed policy ARN instead of the inline policy (unlike the Lambda controller) POLICY_ARN="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/sqs-controller/main/config/iam/recommended-policy-arn)" aws iam attach-role-policy --role-name "${ACK_CONTROLLER_IAM_ROLE}" --policy-arn "${POLICY_ARN}" Associate the IAM role to a Kubernetes service account: ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text) export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN Repeat the steps for the DynamoDB controller. 
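A quick optional sanity check at this point: confirm that the IRSA annotation actually landed on the controller's service account (the same check works for the Lambda and DynamoDB controllers by swapping the service account name): kubectl describe serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME | grep eks.amazonaws.com/role-arn If the annotation is missing, re-run the kubectl annotate command above before moving on.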
ACK DynamoDB Controller Set the following environment variables: ACK_K8S_SERVICE_ACCOUNT_NAME=ack-dynamodb-controller ACK_K8S_NAMESPACE=ack-system AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) Create the trust policy for the IAM role: read -r -d '' TRUST_RELATIONSHIP <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": "system:serviceaccount:${ACK_K8S_NAMESPACE}:${ACK_K8S_SERVICE_ACCOUNT_NAME}" } } } ] } EOF echo "${TRUST_RELATIONSHIP}" > trust_dynamodb.json Create the IAM role: ACK_CONTROLLER_IAM_ROLE="ack-dynamodb-controller" ACK_CONTROLLER_IAM_ROLE_DESCRIPTION="IRSA role for ACK dynamodb controller deployment on EKS cluster using Helm charts" aws iam create-role --role-name "${ACK_CONTROLLER_IAM_ROLE}" --assume-role-policy-document file://trust_dynamodb.json --description "${ACK_CONTROLLER_IAM_ROLE_DESCRIPTION}" Attach the IAM policy to the IAM role: # for the dynamodb controller, we use the managed policy ARN (as with the SQS controller) rather than the inline policy used for the Lambda controller POLICY_ARN="$(curl https://raw.githubusercontent.com/aws-controllers-k8s/dynamodb-controller/main/config/iam/recommended-policy-arn)" aws iam attach-role-policy --role-name "${ACK_CONTROLLER_IAM_ROLE}" --policy-arn "${POLICY_ARN}" Associate the IAM role with a Kubernetes service account: ACK_CONTROLLER_IAM_ROLE_ARN=$(aws iam get-role --role-name=$ACK_CONTROLLER_IAM_ROLE --query Role.Arn --output text) export IRSA_ROLE_ARN=eks.amazonaws.com/role-arn=$ACK_CONTROLLER_IAM_ROLE_ARN kubectl annotate serviceaccount -n $ACK_K8S_NAMESPACE $ACK_K8S_SERVICE_ACCOUNT_NAME $IRSA_ROLE_ARN Restart ACK Controller Deployments and Verify the Setup Restart the ACK service controller Deployments using the following commands. This will update the service controller Pods with the IRSA environment variables. Get the list of ACK service controller Deployments: export ACK_K8S_NAMESPACE=ack-system kubectl get deployments -n $ACK_K8S_NAMESPACE Restart the Lambda, SQS, and DynamoDB controller Deployments: DEPLOYMENT_NAME_LAMBDA=<enter deployment name for lambda controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_LAMBDA DEPLOYMENT_NAME_SQS=<enter deployment name for sqs controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_SQS DEPLOYMENT_NAME_DYNAMODB=<enter deployment name for dynamodb controller> kubectl -n $ACK_K8S_NAMESPACE rollout restart deployment $DEPLOYMENT_NAME_DYNAMODB List the Pods for these Deployments and verify that the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables exist for each Kubernetes Pod using the following commands: kubectl get pods -n $ACK_K8S_NAMESPACE LAMBDA_POD_NAME=<enter Pod name for lambda controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $LAMBDA_POD_NAME | grep "^\s*AWS_" SQS_POD_NAME=<enter Pod name for sqs controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $SQS_POD_NAME | grep "^\s*AWS_" DYNAMODB_POD_NAME=<enter Pod name for dynamodb controller> kubectl describe pod -n $ACK_K8S_NAMESPACE $DYNAMODB_POD_NAME | grep "^\s*AWS_" Now that the ACK service controllers have been set up and configured, you can create AWS resources! Create SQS Queue, DynamoDB Table, and Deploy the Lambda Function Create SQS Queue In the file sqs-queue.yaml, replace the us-east-1 region with your preferred region as well as the AWS account ID.
This is what the ACK manifest for the SQS queue looks like: apiVersion: sqs.services.k8s.aws/v1alpha1 kind: Queue metadata: name: sqs-queue-demo-ack annotations: services.k8s.aws/region: us-east-1 spec: queueName: sqs-queue-demo-ack policy: | { "Statement": [{ "Sid": "__owner_statement", "Effect": "Allow", "Principal": { "AWS": "AWS_ACCOUNT_ID" }, "Action": "sqs:SendMessage", "Resource": "arn:aws:sqs:us-east-1:AWS_ACCOUNT_ID:sqs-queue-demo-ack" }] } Create the queue using the following command: kubectl apply -f sqs-queue.yaml # list the queue kubectl get queue Create DynamoDB Table This is what the ACK manifest for the DynamoDB table looks like: apiVersion: dynamodb.services.k8s.aws/v1alpha1 kind: Table metadata: name: customer annotations: services.k8s.aws/region: us-east-1 spec: attributeDefinitions: - attributeName: email attributeType: S billingMode: PAY_PER_REQUEST keySchema: - attributeName: email keyType: HASH tableName: customer You can replace the us-east-1 region with your preferred region. Create a table (named customer) using the following command: kubectl apply -f dynamodb-table.yaml # list the tables kubectl get tables Build Function Binary and Create Docker Image GOARCH=amd64 GOOS=linux go build -o main main.go aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws docker build -t demo-sqs-dynamodb-func-ack . Create a private ECR repository, tag and push the Docker image to ECR: AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com aws ecr create-repository --repository-name demo-sqs-dynamodb-func-ack --region us-east-1 docker tag demo-sqs-dynamodb-func-ack:latest $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest docker push $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest Create an IAM execution Role for the Lambda function and attach the required policies: export ROLE_NAME=demo-sqs-dynamodb-func-ack-role ROLE_ARN=$(aws iam create-role \ --role-name $ROLE_NAME \ --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}' \ --query 'Role.[Arn]' --output text) aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole Since the Lambda function needs to write data to DynamoDB and invoke SQS, let's add the following policies to the IAM role: aws iam put-role-policy \ --role-name "${ROLE_NAME}" \ --policy-name "dynamodb-put" \ --policy-document file://dynamodb-put.json aws iam put-role-policy \ --role-name "${ROLE_NAME}" \ --policy-name "sqs-permissions" \ --policy-document file://sqs-permissions.json Create the Lambda Function Update function.yaml file with the following info: imageURI - The URI of the Docker image that you pushed to ECR, e.g., <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest role - The ARN of the IAM role that you created for the Lambda function, e.g., arn:aws:iam::<AWS_ACCOUNT_ID>:role/demo-sqs-dynamodb-func-ack-role This is what the ACK manifest for the Lambda function looks like: apiVersion: lambda.services.k8s.aws/v1alpha1 kind: Function metadata: name: demo-sqs-dynamodb-func-ack annotations: services.k8s.aws/region: us-east-1 spec: architectures: - x86_64 
name: demo-sqs-dynamodb-func-ack packageType: Image code: imageURI: AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/demo-sqs-dynamodb-func-ack:latest environment: variables: TABLE_NAME: customer role: arn:aws:iam::AWS_ACCOUNT_ID:role/demo-sqs-dynamodb-func-ack-role description: A function created by ACK lambda-controller To create the Lambda function, run the following command: kubectl create -f function.yaml # list the function kubectl get functions Add SQS Trigger Configuration Add SQS trigger which will invoke the Lambda function when an event is sent to the SQS queue. Here is an example using AWS Console: Open the Lambda function in the AWS Console and click on the Add trigger button. Select SQS as the trigger source, select the SQS queue, and click on the Add button. Now you are ready to try out the end-to-end solution! Test the Application Send a few messages to the SQS queue. For the purposes of this demo, you can use the AWS CLI: export SQS_QUEUE_URL=$(kubectl get queues/sqs-queue-demo-ack -o jsonpath='{.status.queueURL}') aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user1@foo.com --message-attributes 'name={DataType=String, StringValue="user1"}, city={DataType=String,StringValue="seattle"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user2@foo.com --message-attributes 'name={DataType=String, StringValue="user2"}, city={DataType=String,StringValue="tel aviv"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user3@foo.com --message-attributes 'name={DataType=String, StringValue="user3"}, city={DataType=String,StringValue="new delhi"}' aws sqs send-message --queue-url $SQS_QUEUE_URL --message-body user4@foo.com --message-attributes 'name={DataType=String, StringValue="user4"}, city={DataType=String,StringValue="new york"}' The Lambda function should be invoked and the data should be written to the DynamoDB table. Check the DynamoDB table using the CLI (or AWS console): aws dynamodb scan --table-name customer Clean Up After you have explored the solution, you can clean up the resources by running the following commands: Delete SQS queue, DynamoDB table and the Lambda function: kubectl delete -f sqs-queue.yaml kubectl delete -f function.yaml kubectl delete -f dynamodb-table.yaml To uninstall the ACK service controllers, run the following commands: export ACK_SYSTEM_NAMESPACE=ack-system helm ls -n $ACK_SYSTEM_NAMESPACE helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the sqs chart> helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the lambda chart> helm uninstall -n $ACK_SYSTEM_NAMESPACE <enter name of the dynamodb chart> Conclusion and Next Steps In this post, we have seen how to use AWS Controllers for Kubernetes to create a Lambda function, SQS, and DynamoDB table and wire them together to deploy a solution. All of this (almost) was done using Kubernetes! I encourage you to try out other AWS services supported by ACK. Here is a complete list. Happy building!
In today's world of real-time data processing and analytics, streaming databases have become an essential tool for businesses that want to stay ahead of the game. These databases are specifically designed to handle data that is generated continuously and at high volumes, making them well suited for use cases such as the Internet of Things (IoT), financial trading, and social media analytics. However, with so many options available in the market, choosing the right streaming database can be a daunting task. This post helps you understand what streaming SQL and streaming databases are, when and why to use them, and some key factors that you should consider when choosing the right streaming database for your business. Learning Objectives You will learn the following throughout the article: What streaming data is (event stream processing) What the streaming SQL method is Streaming database features and use cases The top 5 streaming databases (both open-source and SaaS) Criteria for choosing a streaming database Let's quickly review streaming data, streaming SQL, and streaming databases in the next few sections. What Is Streaming Data? A stream is a sequence of events/data elements made available over time. Data streaming is a method of processing and analyzing data in real-time as it's generated by various sources such as sensors, e-commerce purchases, web and mobile applications, social networks, and many more. It involves the continuous and persistent collection, processing, and delivery of data in the form of events or messages. You can ingest data from different data sources, such as message brokers (Kafka, Redpanda, Kinesis, Pulsar) or databases (MySQL, PostgreSQL) using their Change Data Capture (CDC), which is the process of identifying and capturing data changes. What Is Streaming SQL? Once you collect data, you can store it in a streaming database (explained in the next section), where it can be processed and analyzed using SQL queries. Streaming SQL is a technique for processing real-time data streams using SQL queries. It allows businesses to use the same SQL language they use for batch processing to query and process data streams in real-time. The data can be transformed, filtered, and aggregated in real-time from the stream into more useful outputs like materialized views (CREATE MATERIALIZED VIEW) to provide insights and enable automated decision-making. Materialized views are typically used in situations where a complex query needs to be executed frequently or where the query result takes a long time to compute. By precomputing the result and storing it in a materialized view (a virtual table), queries can be executed more quickly and with less overhead. Databases such as PostgreSQL and Microsoft SQL Server support materialized views, and streaming databases such as RisingWave and Materialize keep them automatically up to date as new data arrives. One of the key benefits of streaming SQL is that it allows businesses to leverage their existing SQL skills and infrastructure to process real-time data. This can be more efficient than having to learn new programming languages, such as Java or Scala, or new tools to work with data streams. What Is a Streaming Database? A streaming database, also known as a real-time database, is a database management system that is designed to handle a continuous stream of data in real-time. It is optimized for processing and storing large volumes of data that arrive in a continuous and rapid stream.
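To make the streaming-SQL idea concrete, here is a minimal sketch of an incrementally maintained materialized view. The source name, column names, and exact DDL are hypothetical, and the syntax varies slightly between engines such as RisingWave and Materialize; it only illustrates the pattern of continuously aggregating a stream with plain SQL.

SQL
-- Assumes a stream/source named click_events has already been declared
-- (e.g., on top of a Kafka topic); that DDL differs per engine.
-- The view is maintained incrementally as new events arrive; no manual REFRESH is needed.
CREATE MATERIALIZED VIEW clicks_per_user AS
SELECT user_id, COUNT(*) AS click_count
FROM click_events
GROUP BY user_id;

-- Query it like an ordinary table.
SELECT user_id, click_count
FROM clicks_per_user
ORDER BY click_count DESC
LIMIT 10;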
A streaming database uses the same declarative SQL and the same abstractions (tables, columns, rows, views, indexes) as a traditional database. Unlike traditional databases, where data is stored in tables matching the structure of the writes (inserts, updates) and all the computation work happens on read queries (selects), streaming databases operate on a continuous basis, processing data as it arrives and saving the results to persistent storage in the form of materialized views. This allows for immediate analysis and response to real-time events, enabling businesses to make decisions and take actions based on the most up-to-date information. Streaming databases typically use specialized data structures and algorithms that are optimized for fast and efficient data processing. They also support complex event processing (CEP) and other real-time analytics tools to help businesses gain insights and extract value from the data in real-time. One of the features unique to streaming databases is the ability to incrementally update materialized views. What Can You Do With a Streaming Database? Here are some of the things you can do with a streaming database: Collect and transform data from different streams/data sources, such as Apache Kafka. Create materialized views for the data that needs to be incrementally aggregated. Query complex stream data with simple SQL syntax. After aggregating and analyzing real-time data streams, use real-time analytics to trigger downstream applications. 5 Top Streaming Databases There are various streaming databases available, each offering numerous features. Below are the 5 top streaming databases (both open-source and SaaS); note that they are not listed in any particular order of popularity or use. RisingWave. Materialize. Amazon Kinesis. Confluent. Apache Flink. How To Select Your Streaming Database Choosing the right streaming data platform can be a challenging task, as there are several factors to consider. Here are some key considerations to keep in mind when selecting a streaming data platform: Data Sources: Consider the types of data sources that the platform can ingest and process. Make sure the platform can handle the data sources you need. Kafka, Redpanda, Apache Pulsar, AWS Kinesis, and Google Pub/Sub are the most commonly used stream sources/message brokers, along with databases such as PostgreSQL or MySQL. Scalability: Consider the ability of the platform to scale as your data needs grow. Some platforms may be limited in their ability to scale, while others can handle large volumes of data and multiple concurrent users. Make sure that the scaling process can be completed almost instantaneously without interrupting data processing. For example, the open-source project RisingWave dynamically partitions data across compute nodes using a consistent hashing algorithm. These compute nodes collaborate by computing their unique portion of the data and then exchanging output with each other. Cloud-hosted streaming data platforms typically support auto-scaling out of the box, so this is less of an issue there. Integration: Consider the ability of the platform to integrate with other systems and tools, such as BI and data analytics platforms you are currently using or plan to use in the future. Make sure the platform supports the protocols and APIs that you need to connect with your other systems. RisingWave integrates with many BI services, including Grafana, Metabase, Apache Superset, and so on.
Performance: Consider the speed and efficiency of the platform. Some platforms may perform better than others in terms of query speed, data processing, and analysis. Therefore, you need to select a streaming database that can extract, transform and load millions of records in seconds. The key performance indicators (KPIs) for streaming data platforms are event rate, throughput (event rate times event size), latency, reliability, and the number of topics (for pub-sub architectures). Sometimes compared to JVM-based systems, a platform designed with a low-level programming language such as Rust can be super fast. Security: Consider the security features of the platform, such as access controls, data encryption, and compliance certifications, to ensure your data is protected. Ease of Use: Consider the ease of use of the platform, including its user interface, documentation, and support resources. Make sure the platform is easy to use and provides adequate support for your team. Cost: Consider the cost of the platform, including licensing fees, maintenance costs, and any additional hardware or software requirements. Make sure the platform fits within your budget and provides a good return on investment. Summary In summary, streaming databases offer several unique features, including real-time data processing, event-driven architecture, continuous processing, low latency, scalability, support for various data formats, and flexibility. These features enable faster insights, better decision-making, and more efficient use of data in real-time applications. The best streaming database for your use case will depend on your specific requirements, such as supported data sources, volume and velocity, data structure, scalability, performance, integration, and cost. It's important to carefully evaluate each option based on these factors to determine the best fit for your organization.
When it comes to managing large amounts of data in a distributed system, Apache Cassandra and Apache Pulsar are two names that often come up. Apache Cassandra is a highly scalable NoSQL database that excels at handling high-velocity writes and queries across multiple nodes. It is an ideal solution for use cases such as user profile management, product catalogs, and real-time analytics. Apache Pulsar is a distributed messaging and streaming platform created to manage data in motion. It can handle standard messaging workloads and more complex streaming use cases, including real-time data processing and event-driven architectures. This article covers the main steps of building a Spring Boot and React-based web application that interacts with Pulsar and Cassandra, displaying stock data live as it is received. This is not a complete tutorial; it only covers the most important steps. You can find the complete source code for the application on GitHub. You will learn how to: Set up Cassandra and Pulsar instances using DataStax Astra DB and Astra Streaming. Publish and consume Pulsar messages in a Spring Boot application. Store Pulsar messages in Cassandra using a sink. View live and stored data in React using the Hilla framework by Vaadin. Used Technologies and Libraries Apache Cassandra (with Astra DB) Apache Pulsar (with Astra Streaming) Spring Boot Spring for Apache Pulsar Spring Data for Apache Cassandra React Hilla AlphaVantage API Requirements Java 17 or newer Node 18 or newer Intermediate Java skills and familiarity with Spring Boot Storing Sensitive Data in Spring Boot Much of the setup for Cassandra and Pulsar is configuration-based. While it might be tempting to put the configuration in application.properties, it is not a smart idea as the file is under source control, and you may unintentionally reveal secrets. Instead, create a local config/local/application.properties configuration file and add it to .gitignore to ensure it does not leave your computer. The settings from the local configuration file will be automatically applied by Spring Boot: mkdir -p config/local touch config/local/application.properties echo " # Contains secrets that shouldn't go into the repository config/local/" >> .gitignore You may provide Spring Boot with the options as environment variables when using it in production. Setting Up Cassandra and Pulsar Using DataStax Astra Both Apache technologies used in this article are open-source projects and can be installed locally. However, using cloud services to set up the instances is a simpler option. In this article, we set up the data infrastructure required for our example web application using DataStax free tier services. Begin by logging in to your existing account or signing up for a new one on Astra DataStax’s official website, where you will be required to create the database and streaming service separately. Cassandra Setup Start by clicking “Create Database” from the official Astra DataStax website. Sinking data from a stream into Astra DB requires that both services are deployed in a region that supports both Astra Streaming and Astra DB: Enter the name of your new database instance. Select the keyspace name. (A keyspace stores your group of tables, a bit like a schema in relational databases.) Select a cloud provider and region. Note: For the demo application to work, you need to deploy the database service in a region that supports streaming too.
Select “Create Database.” Cassandra: Connecting to the Service Once the database service has been initialized, you need to generate a token and download the “Secure Connection Bundle” that encrypts the data transfer between the app and the cloud database (mTLS). Navigate to the DB dashboard “Connect” tab sheet, where you will find the button to generate a one-time token (please remember to download it) and the bundle download button: spring.cassandra.schema-action=CREATE_IF_NOT_EXISTS spring.cassandra.keyspace-name=<KEYSPACE_NAME> spring.cassandra.username=<ASTRADB_TOKEN_CLIENT_ID> spring.cassandra.password=<ASTRADB_TOKEN_SECRET> # Increase timeouts when connecting to Astra from a dev workstation spring.cassandra.contact-points=<ASTRADB_DATACENTER_ID> spring.cassandra.port=9042 spring.cassandra.local-datacenter=<ASTRADB_REGION> datastax.astra.secure-connect-bundle=<secure-connect-astra-stock-db.zip> Cassandra parameters for application.properties. Pulsar Setup Start by clicking “Create Stream” from the main Astra DataStax page: Enter the name for your new streaming instance. Select a provider and region. Note: Remember to use the same provider and region you used to create the database service. Select “Create Stream.” Pulsar: Enabling Auto Topic Creation In addition to getting the streaming service up and running, you will also need to define the topic that is used by the application to consume and produce messages. You can create a topic explicitly using the UI, but a more convenient way is to enable the “Allow Auto Topic Creation” setting for the created instance: Click on the newly created stream instance, navigate to the “Namespace and Topics” tab sheet, and click “Modify Namespace.” Navigate to the “Settings” tab located under the default namespace (not the top-level “Settings” tab) and scroll all the way down. Change “Allow Topic Creation” to “Allow Auto Topic Creation.” Changing this default setting will allow the application to create new topics automatically without any additional admin effort in Astra. With this, you have successfully established the infrastructure for hosting your active and passive data. Pulsar: Connecting to the Service Once the streaming instance has been set up, you need to create a token to access the service from your app. Most of the necessary properties are located on the “Connect” tab sheet of the “Streaming dashboard.” The “topic-name” input is found in the “Namespaces and Topics” tab sheet: ## Client spring.pulsar.client.service-url=<Broker Service URL> spring.pulsar.client.auth-plugin-class-name=org.apache.pulsar.client.impl.auth.AuthenticationToken spring.pulsar.client.authentication.token=<Astra_Streaming_Token> ## Producer spring.pulsar.producer.topic-name=persistent://<TENANT_NAME>/default/<TOPIC_NAME> spring.pulsar.producer.producer-name=<name of your choice> ## Consumer spring.pulsar.consumer.topics=persistent://<TENANT_NAME>/default/<TOPIC_NAME> spring.pulsar.consumer.subscription-name=<name of your choice> spring.pulsar.consumer.consumer-name=<name of your choice> spring.pulsar.consumer.subscription-type=key_shared Pulsar parameters for application.properties. Publishing Pulsar Messages From Spring Boot The Spring for Apache Pulsar library takes care of setting up Pulsar producers and consumers based on the given configuration. In the application, the StockPriceProducer component handles message publishing. To fetch stock data, it makes use of an external API call before publishing it to a Pulsar stream using a PulsarTemplate.
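The article does not show the StockPrice class itself, but having a rough idea of its shape helps when reading the producer, consumer, and sink sections. Below is a minimal, hypothetical sketch using Spring Data for Apache Cassandra; the field names are assumed from the grid columns shown later, and the @Table("stock_price") mapping is the one mentioned in the sink configuration. The real class in the GitHub repository may differ.

Java
import java.time.Instant;
import org.springframework.data.cassandra.core.cql.PrimaryKeyType;
import org.springframework.data.cassandra.core.mapping.PrimaryKeyColumn;
import org.springframework.data.cassandra.core.mapping.Table;

// Hypothetical sketch: maps one stock quote to a row in the stock_price table.
@Table("stock_price")
public class StockPrice {

    @PrimaryKeyColumn(name = "ticker", type = PrimaryKeyType.PARTITIONED)
    private String ticker;

    @PrimaryKeyColumn(name = "time", type = PrimaryKeyType.CLUSTERED)
    private Instant time;

    private double open;
    private double high;
    private double low;
    private double close;
    private long volume;

    // Getters and setters omitted for brevity.
}

An entity along these lines is what the Pulsar JSON schema serializes on the producer side and what a Spring Data repository (or the StockPriceService used by the endpoint) reads back from the table that the Astra DB sink fills.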
Autowire the PulsarTemplate into the class and save it to a field: Java @Component public class StockPriceProducer { private final PulsarTemplate<StockPrice> pulsarTemplate; public StockPriceProducer(PulsarTemplate<StockPrice> pulsarTemplate) { this.pulsarTemplate = pulsarTemplate; } //... } Then use it to publish messages: Java private void publishStockPrices(Stream<StockPrice> stockPrices) { // Publish items to Pulsar with 100ms intervals Flux.fromStream(stockPrices) // Delay elements for the demo, don't do this in real life .delayElements(Duration.ofMillis(100)) .subscribe(stockPrice -> { try { pulsarTemplate.sendAsync(stockPrice); } catch (PulsarClientException e) { throw new RuntimeException(e); } }); } You need to configure the schema for the custom StockPrice type. In Application.java, define the following bean: Java @Bean public SchemaResolver.SchemaResolverCustomizer<DefaultSchemaResolver> schemaResolverCustomizer() { return (schemaResolver) -> schemaResolver.addCustomSchemaMapping(StockPrice.class, Schema.JSON(StockPrice.class)); } Consuming Pulsar Messages in Spring Boot The Spring for Apache Pulsar library comes with a @PulsarListener annotation for a convenient way of listening to Pulsar messages. Here, the messages are emitted to a Project Reactor Sink so the UI can consume them as a Flux: Java @Service public class StockPriceConsumer { private final Sinks.Many<StockPrice> stockPriceSink = Sinks.many().multicast().directBestEffort(); private final Flux<StockPrice> stockPrices = stockPriceSink.asFlux(); @PulsarListener private void stockPriceReceived(StockPrice stockPrice) { stockPriceSink.tryEmitNext(stockPrice); } public Flux<StockPrice> getStockPrices() { return stockPrices; } } Creating a Server Endpoint for Accessing Data From React The project uses Hilla, a full-stack web framework for Spring Boot. It manages websocket connections for reactive data types and allows type-safe server communication. The client may utilize the matching TypeScript methods created by the StockPriceEndpoint to fetch data: Java @Endpoint @AnonymousAllowed public class StockPriceEndpoint { private final StockPriceProducer producer; private final StockPriceConsumer consumer; private final StockPriceService service; StockPriceEndpoint(StockPriceProducer producer, StockPriceConsumer consumer, StockPriceService service) { this.producer = producer; this.consumer = consumer; this.service = service; } public List<StockSymbol> getSymbols() { return StockSymbol.supportedSymbols(); } public void produceDataForTicker(String ticker) { producer.produceStockPriceData(ticker); } public Flux<StockPrice> getStockPriceStream() { return consumer.getStockPrices(); } public List<StockPrice> findAllByTicker(String ticker) { return service.findAllByTicker(ticker); } } Displaying a Live-Updating Chart in React The DashboardView has an Apex Chart candle stick chart for displaying the stock data. It’s bound to a state of type ApexAxisChartSeries: TypeScript const [series, setSeries] = useState<ApexAxisChartSeries>([]); The view uses a React effect hook to call the server endpoint and subscribe to new data. It returns a disposer function to close the websocket when it is no longer needed: TypeScript useEffect(() => { const subscription = StockPriceEndpoint .getStockPriceStream() .onNext((stockPrice) => updateSeries(stockPrice)); return () => subscription.cancel(); }, []); The series is bound to the template. 
Because the backend and frontend are reactive, the chart is automatically updated any time a new Pulsar message is received: HTML <ReactApexChart type="candlestick" options={options} series={series} height={350} /> Persisting Pulsar Messages to Cassandra Sinking Pulsar messages to Astra DB can be useful in scenarios where you need a reliable, scalable, and secure platform to store event data from Pulsar for further analysis, processing, or sharing. Perhaps you need to retain a copy of event data for compliance and auditing purposes, need to store event data from multiple tenants in a shared database, or have some other use case. Astra Streaming offers numerous fully managed Apache Pulsar connectors you can use to persist event data to various databases and third-party solutions, like Snowflake. In this article, we are persisting the stream data into Astra DB. Creating a Sink Start by selecting the “Sink” tab sheet from the Astra streaming dashboard. Select the “default” namespace: From the list of available “Sink Types,” choose “Astra DB.” Give the sink a name of your choice. Select the “stock-feed” topic, which will be available once you have published messages to that topic from your app. After selecting the data stream input, select the database where you want to persist Pulsar messages: To enable table creation, paste the Astra DB token with valid roles. Keyspaces appear once a valid token is entered; choose the keyspace name that was used to create the database. Then enter the table name. Note: This needs to match the @Table("stock_price") annotation value you use in the StockPrice.java class to read back the data. Next, you need to map the properties from the Pulsar message to the database table columns. Property fields are automatically mapped in our demo application, so you can simply click “Create” to proceed. If you were, for instance, persisting only a portion of the data to the database, opening the schema definition would enable you to view the property names employed and create a custom mapping between the fields. After the sink is created, the initialization process will begin, after which the status will change to “active.” You’re now done: stock data is automatically persisted into your database for easy access by the application. The sink dashboard provides access to sink log files in the event of an error. Displaying Cassandra Data in a Table The historical data that is stored in Cassandra is displayed in a data grid component. The DetailsView contains a Vaadin Grid component that is bound to an array of StockPrice objects which are kept in a state variable: TypeScript const [stockData, setStockData] = useState<StockPrice[]>([]); The view has a dropdown selector for selecting the stock you want to view. When the selection is updated, the view fetches the data for that stock from the server endpoint: TypeScript async function getDataFor(ticker?: string) { if (ticker) setStockData(await StockPriceEndpoint.findAllByTicker(ticker)); } The stockData array is bound to the grid in the template. GridColumn elements define the properties that the columns map to: HTML <Grid items={stockData} className="flex-grow"> <GridColumn path="time" ></GridColumn> <GridColumn path="open" ></GridColumn> <GridColumn path="high" ></GridColumn> <GridColumn path="low" ></GridColumn> <GridColumn path="close" ></GridColumn> <GridColumn path="volume" ></GridColumn> </Grid> Conclusion In this article, we showed how you can build a scalable real-time application using an open-source Java stack.
You can clone the completed application and use it as a base for your own experiments.
In this article, I will look at Specification by Example (SBE) as explained in Gojko Adzic’s book of the same name. It’s a collaborative effort between developers and non-developers to arrive at textual specifications that are coupled to automated tests. You may also have heard of it as behavior-driven development or executable specifications. These are not synonymous concepts, but they do overlap. It's a common experience in any large, complex project: crucial features do not behave as intended. Something was lost in the translation between intention and implementation, i.e., business and development. Inevitably we find that we haven’t built quite the right thing. Why wasn’t this caught during testing? Obviously, we’re not testing enough, or we're testing the wrong things. Can we make our tests more insightful? Some enthusiastic developers and SBE adepts jump to the challenge. Didn’t you know you can write all your tests in plain English? Haven’t you heard of the Gherkin syntax? One of them demonstrates the canonical Hello World of executable specifications, using Cucumber for Java. Gherkin Scenario: Items priced 100 euro or more are eligible for 5% discount for loyal customers Given Jill has placed three orders in the last six months When she looks at an item costing 100 euros or more Then she is eligible for a 5% discount Everybody is impressed. The Product Owner greenlights a proof of concept to rewrite the most salient test in Gherkin. The team will report back in a month to share their experiences. The other developers brush up their Cucumber skills but find they need to write a lot of glue code. It’s repetitive and not very DRY. Like the good coders they are, they make it more flexible and reusable. Gherkin Scenario: discount calculator for loyal customers Given I execute a POST call to /api/customer/12345/orders?recentMonths=6 Then I receive a list of 3 OrderInfo objects And a DiscountRequestV1 message for customer 12345 is put on the queue /discountcalculator [ you get the message ] Reusable, yes; readable, no. They’re right to conclude that the textual layer offers nothing, other than more work. It has zero benefits over traditional code-based tests. Business stakeholders show no interest in these barely human-readable scripts, and the developers quickly abandon the effort. It’s About Collaboration, Not Testing The experiment failed because it tried to fix the wrong problem. It failed because better testing can’t repair a communication breakdown on the way from intended functionality to implementation. SBE is about collaboration. It is not a testing approach. You need this collaboration to arrive at accurate and up-to-date specifications. To be clear, you always have a spec (like you always have an architecture). It may not always be a formal one. It can be a mess that only exists in your head, which is only acceptable if you’re a one-person band. In all other cases, important details will get lost or confused in the handover between disciplines. The word handover has a musty smell to it, reminiscent of old-school Waterfall: the go-to punchbag for everything we did wrong in the past, but also an antiquated approach that few developers under the age of sixty have any real experience with. Today we’re Agile and multi-disciplinary. We don’t have specialists who throw documents over the fence of their silos. It is more nuanced than that, now as well as in 1975. Waterfall didn’t prohibit iteration. You could always go back to an earlier stage.
Likewise, the definition of a modern multi-disciplinary team is not a fungible collection of Jacks and Jills of all trades. Nobody can be a Swiss army knife of IT skills and business domain knowledge. But one enduring lesson from the past is that we can’t produce flawless and complete specifications of how the software should function, before writing its code. Once you start developing, specs always turn out over-complete, under-complete, and just plain wrong in places. They have bugs, just like code. You make them better with each iteration. Accept that you may start off incomplete and imprecise. You Always Need a Spec Once we have built the code according to spec (whatever form that takes), do we still need that spec, as an architect’s drawing after the house was built? Isn’t the ultimate truth already in the code? Yes, it is, but only at a granular level, and only accessible to those who can read it. It gives you detail, but not the big picture. You need to zoom out to comprehend the why. Here’s where I live: This is the equivalent of source code. Only people who have heard of the Dutch village of Heeze can relate this map to the world. It’s missing the context of larger towns and a country. The next map zooms out only a little, but with the context of the country's fifth-largest city, it’s recognizable to all Dutch inhabitants. The next map should be universal. Even if you can’t point out the Netherlands on a map, you must have heard of London. Good documentation provides a hierarchy of such maps, from global and generally accessible to more detailed, requiring more domain-specific knowledge. At every level, there should be sufficient context about the immediately connecting parts. If there is a handover at all, it’s never of the kind: “Here’s my perfect spec. Good luck, and report back in a month”. It’s the finalization of a formal yet flexible document created iteratively with people from relevant disciplines in an ongoing dialogue throughout the development process. It should be versioned, and tied to the software that it describes. Hence the only logical place is together with the source code repository, at least for specifications that describe a well-defined body of code, a module, or a service. Such specs can rightfully be called the ultimate source of truth about what the code does, and why. Because everybody was involved and invested, everybody understands it, and can (in their own capacity) help create and maintain it. However, keeping versioned specs with your software is no automatic protection against mismatches, when changes to the code don’t reflect the spec and vice versa. Therefore, we make the spec executable, by coupling it to testing code that executes the code that the spec covers and validates the results. It sounds so obvious and attractive. Why isn’t everybody doing it if there’s a world of clarity to be gained? There are two reasons: it’s hard and you don’t always need SBE. We routinely overestimate the importance of the automation part, which puts the onus disproportionally on the developers. It may be a deliverable of the process, but it’s only the collaboration that can make it work. More Art Than Science Writing good specifications is hard, and it’s more art than science. If there ever was a need for clear, unambiguous, SMART writing, executable specifications fit the bill. Not everybody has a talent for it. As a developer with a penchant for writing, I flatter myself that I can write decent spec files on my own. 
But I shouldn’t – not without at least a good edit from a business analyst. For one, I don’t know when my assumptions are off the mark, and I can’t always keep technocratic wording from creeping in. A process that I favor and find workable is when a businessperson drafts acceptance criteria, which form the input to features and scenarios. Together with a developer, they are refined: adding clarity and edge cases, and removing duplication and ambiguity. Only then can they be rigorous enough to be turned into executable spec files. Writing executable specifications can be tremendously useful for some projects and a complete waste of time for others. It’s not at all like unit testing in that regard. Some applications are huge but computationally simple. These are the enterprise behemoths with their thousands of endpoints and hundreds of database tables. Their code is full of specialist concepts specific to the world of insurance, banking, or logistics. What makes these programs complex and challenging to grasp is the sheer number of components and the specialist domain they relate to. The math in Fintech isn’t often that challenging. You add, subtract, multiply, and watch out for rounding errors. SBE is a good candidate to make the complexity of all these interfaces and edge cases manageable. Then there’s software with a very simple interface behind which lurks some highly complex logic. Consider a hashing algorithm, or any cryptographic code, that needs to be secure and performant. Test cases are simple. You can tweak the input string, seed, and log rounds, but that’s about it. Obviously, you should test for performance and resource usage. But all that is best handled in a code-based test, not Gherkin. This category of software is the world of libraries and utilities. Their concepts stay within the realm of programming and IT. They relate less directly to concepts in the real world. As a developer, you don’t need a business analyst to explain the why. You can be your own. No wonder so many Open Source projects are of this kind.
Setting up a VPN server to allow remote connections can be challenging if you are doing it for the first time. In this post, I will guide you through the steps to set up your own VPN server and connect to it using a VPN client. Additionally, I will show how to set up a free Radius server and a plugin to implement multi-factor authentication for additional security. 1. Installation of the OpenVPN Server on Linux (Using CentOS Stream 9) # yum update # curl -O https://raw.githubusercontent.com/angristan/openvpn-install/master/openvpn-install.sh # chmod +x openvpn-install.sh # ./openvpn-install.sh Accept the defaults for the OpenVPN installation and, at the end, provide a client name, e.g., demouser. I have chosen a passwordless client, but if you want, you can also add an additional password to protect your private key. Shell Client name: demouser Do you want to protect the configuration file with a password? (e.g. encrypt the private key with a password) 1) Add a passwordless client 2) Use a password for the client Select an option [1-2]: 1 ... ... The configuration file has been written to /root/demouser.ovpn. Download the .ovpn file and import it in your OpenVPN client. Finally, a client configuration file is ready to be imported into the VPN client. 2. Installation of the OpenVPN Client for Windows Download the OpenVPN client software. Install the OpenVPN client: Once the installation is finished, we can import the configuration file demouser.ovpn which was generated on the OpenVPN server, but before importing, we need to modify the IP address of our OpenVPN server within this file: client proto udp explicit-exit-notify remote 192.168.0.150 1194 dev tun resolv-retry infinite nobind persist-key persist-tun ... By default, the remote IP will be your public IP address, which is what you want if your VPN server is on your local network and you need remote access from outside that network. You can leave the public IP address in the config, but then you will have to open up the correct port and set the routing on your internet access point. Finally, we can test the VPN connection. The first time, the connection will probably fail because the firewall on the OpenVPN Linux server is blocking access. To quickly test this, we can just disable the firewall using the command: # systemctl stop firewalld Alternatively, configure the Linux firewall for OpenVPN connectivity: # sudo firewall-cmd --add-service=openvpn # sudo firewall-cmd --permanent --add-service=openvpn # sudo firewall-cmd --add-masquerade # sudo firewall-cmd --permanent --add-masquerade # sudo firewall-cmd --permanent --add-port=1194/udp # sudo firewall-cmd --reload Now the connection should work: On the Windows client, you should now also get an additional VPN adapter configured with a default IP address of 10.8.0.2 (this subnet is defined within the file /etc/openvpn/server.conf). 3. How To Use Radius With OpenVPN First, we will install the IBM Security Verify Gateway for Radius on a Windows machine. This package can be downloaded from the IBM Security AppExchange (you will need to use your IBMid to log in). Extract and run the installation using setup_radius.exe. Edit the Radius configuration file c:\Program Files\IBM\IbmRadius\IbmRadiusConfig.json: Find the clients section in the configuration file. The default file has three example client definitions. Delete these definitions and replace them with the single definition shown below. This definition will match any Radius client connecting from the network used by the test machines.
The secret authenticates the client. JSON { "address":"::", "port":1812, /* "trace-file":"c:/tmp/ibm-auth-api.log", "trace-rollover":12697600, */ "ibm-auth-api":{ "client-id":"???????", "obf-client-secret":"???????", /* See IbmRadius -obf "the-secret" */ "protocol":"https", "host":"???????.verify.ibm.com", "port":443, "max-handles":16 }, "clients":[ { "name": "OpenVPN", "address": "192.168.0.0", "mask": "255.255.0.0", "secret": "Passw0rd", "auth-method": "password-and-device", "use-external-ldap": false, "reject-on-missing-auth-method": true, "device-prompt": "A push notification has been sent to your device:[%D].", "poll-device": true, "poll-timeout": 60 } ] } Complete the client-id, obf-client-secret, and host fields with the correct information to point to your IBM Verify SaaS API. Before we can do this, we will need to set up API access in IBM Verify SaaS. Log in to your environment or go for a trial account if you don’t have one. From the main menu, select Security > API Access > Add API client and create a new API client: Specify the entitlements by selecting the check boxes from the list: Authenticate any user Read authenticator registrations for all users Read users and groups Read second-factor authentication enrollment for all users Click Next on the following screens and, finally, give the API client a name, e.g., MFA-Client. A Client ID and Secret will automatically be created for you. Use this information to complete the Radius config. Use the c:\Program Files\IBM\IbmRadius\IbmRadius.exe -obf command to generate the obfuscated secret value. Save the file and close the editor. Finally, configure the IBM Radius service to start up automatically and start the service: Test Radius authentication using the Radius tool NTRadPing. You should get a push notification on the IBM Verify app on the mobile device. (Make sure you test with a user ID that is known in IBM Verify SaaS and is enrolled for OTP.) 4. Install the OpenVPN Radius Plugin Log in to the Linux OpenVPN server and launch the following commands: # wget https://www.nongnu.org/radiusplugin/radiusplugin_v2.1a_beta1.tar.gz # tar -xvf radiusplugin_v2.1a_beta1.tar.gz # cd radiusplugin_v2.1a_beta1 # yum install libgcrypt libgcrypt-devel gcc-c++ # make Copy the Radius plugin files to /etc/openvpn: # cp /root/radiusplugin_v2.1a_beta1/radiusplugin.cnf /etc/openvpn # cp /root/radiusplugin_v2.1a_beta1/radiusplugin.so /etc/openvpn Edit the file /etc/openvpn/server.conf and add the following line to activate the Radius plugin: plugin /etc/openvpn/radiusplugin.so /etc/openvpn/radiusplugin.cnf Edit the file /etc/openvpn/radiusplugin.cnf, modify the IP address of the Radius server, and set the sharedsecret to Passw0rd (this is the secret that was also configured on the Radius server side). Make sure to set nonfatalaccounting=true because the Radius server does not support Radius accounting. ... NAS-IP-Address=<IP Address of the OpenVPN Server> ... nonfatalaccounting=true ... Server { # The UDP port for RADIUS accounting. acctport=1813 # The UDP port for RADIUS authentication. authport=1812 # The name or ip address of the RADIUS server. name=<IP Address of the RADIUS Server> # How many times should the plugin send the request if there is no response? retry=1 # How long should the plugin wait for a response? wait=60 # The shared secret.
sharedsecret=Passw0rd } Save the file and restart the OpenVPN server using the command: # systemctl restart openvpn-server@server.service Finally, edit the OpenVPN client file demouser.ovpn and add the line auth-user-pass: client proto udp auth-user-pass explicit-exit-notify remote 192.168.0.150 1194 dev tun resolv-retry infinite nobind persist-key persist-tun ... This will allow the user to enter a username and password when initiating the VPN connection. These credentials will be authenticated against the IBM Verify SaaS directory, and this should result in a challenge request on the IBM Verify mobile app. The wait=60 setting allows the plugin to wait for a response from the user, who has to respond to the challenge using the IBM Verify app on their phone. If you prefer to use a TOTP challenge instead, you can modify the Radius configuration file on Windows (IBMRadiusConfig.json) and set the auth-method to password-and-totp. Then you can open the client VPN connection and use 123456:password instead of the normal password.
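If the VPN connection fails after enabling the plugin, the OpenVPN server journal on the Linux host is the first place to look for Radius-related errors. Assuming the same systemd unit name used above, you can follow it with:

# journalctl -f -u openvpn-server@server.service

Authentication attempts and any radiusplugin errors (for example, a shared secret mismatch or an unreachable Radius server) should show up in this log.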