Software Integration
Seamless communication — that, among other consequential advantages, is the ultimate goal when integrating your software. And today, integrating modern software means fusing various applications and/or systems — often across distributed environments — with the common goal of unifying isolated data. This effort often signifies the transition of legacy applications to cloud-based systems and messaging infrastructure via microservices and REST APIs. So what's next? Where is the path to seamless communication and nuanced architecture taking us? Dive into our 2023 Software Integration Trend Report and fill the gaps in modern integration practices by exploring trends in APIs, microservices, and cloud-based systems and migrations. You have to integrate to innovate!
A few years in software development do not equate to a linear amount of knowledge and information. The same is true for the .NET ecosystem. The first release of .NET 1.0 saw the light of day in January 2002. The significance of the event was not the palindrome year: a new paradigm was offered to traditional C++ MFC, VB, and classic ASP developers, supporting two main new languages, C# and VB.NET. It started as a proprietary, closed-source Windows technology, primarily appealing to companies on the Microsoft Windows stack and tied to a single IDE, Visual Studio. While .NET was a step forward, the pace of change and the options outside its designated use were abysmal. The situation took a sharp turn in 2016 with the introduction of .NET Core. Starting from the heart of the .NET ecosystem, ASP.NET, the change spread through the entire platform, leading to a complete re-imagination and makeover of the runtime and the language. Open source, cross-platform, and free for commercial use, .NET and C# became viable options for many projects that would traditionally have gone with another platform and language. From web to mobile, from desktop to backend systems, in any cloud environment, .NET is a solid and viable option with an outstanding developer experience and rich offerings. Common Challenges When Uncle Ben told Peter, "With great power comes great responsibility" in the 2002 Spider-Man movie, he was not referring to the newly emerging .NET platform. But the phrase applies to the challenges any developer on any platform using any language will eventually face. And .NET is not an exception. Here are just a few things that can go wrong. Cross-platform software development often takes place on a single platform, be it Windows, Linux, or macOS. The software then needs to be deployed and executed on a platform that might not be a one-to-one match with the development or testing environment. And while technologies such as Docker containers have made it simpler to "ship your development environment to production," they are still not a bulletproof solution. The differences between environments still pose a risk of discrepancies that lead to bugs. Cloud environments pose even more significant challenges, with infrastructure and code executing remotely, outside our reach and control, whether on Azure App Service, AWS Fargate, or GCP Cloud Functions. These services provide foundational troubleshooting but cannot cater to the specifics of the application and its use cases, and they usually require additional, intrusive services for deeper troubleshooting. Troubleshooting Options .NET developers have a few options for troubleshooting problems in production .NET-based systems. Here are a few of them: Crash dumps. Analyzing crash dumps is a skill that only a few possess. A more significant challenge is that the analysis can pinpoint why the code crashed, but not what led to that critical moment. Metrics and counters. Emitting metric and counter values from the code, to be collected and visualized, allows better insight into remotely executing code. But it lacks the dials and knobs needed to dynamically adjust the scope of focus. For example, emitting the values of one or more specific variables within a particular method is not an option. Logs. Logging is one of the oldest and most common techniques for helping developers identify issues in running code. 
The code is decorated with additional instructions that emit information into a sink, such as a file, a remote service, or any other destination where the logs are retained for a defined period. This option provides a lot of information, but it also has drawbacks. Unnecessary amounts of data irrelevant to the outage or the bug being investigated are stored and muddy the water. In a scaled-out solution, these logs are multiplied. And when they are consolidated into a single service, such as Application Insights, accurate filtering is required to separate the wheat from the chaff, not to mention the price tag associated with processing and storing those logs. But one of the most significant drawbacks of static logging is the inability to adjust what is logged and the duration of the logging session. Snapshot debugging. This is the ability to debug remotely executed code within the IDE. When an exception is thrown, a debug snapshot is collected from the running application and sent to Application Insights. The stored information includes the stack trace. In addition, a minidump can be obtained in Visual Studio to enhance visibility into why an exception occurred in the first place. But this is still a reactive solution to a problem that requires a more proactive approach. Dynamic Logging and Observability Meet dynamic logging with Lightrun. With dynamic logging, the code does not have to be instrumented at every spot where we want to gain visibility at run time. Instead, the code is left as-is. The magic ingredient is the agent enabling dynamic logging and observability, which is added to the solution in the form of a NuGet package. Once included, it is initialized once in your application's startup code. And that is it. From there, the Lightrun agent takes care of everything else: connecting to your application at any time to retrieve logs on an ad hoc basis, without unnecessary code modification/instrumentation or going through the rigorous cycle of code change approval, testing, and deployment. Logs, metrics, and snapshots can be collected and presented on demand, all without racking up the substantial bill otherwise incurred with static logging and metrics storage, or leaving the comfort of your preferred .NET IDE: VS Code, Rider, or Visual Studio (coming soon). The results are immediately available in the IDE or streamed to observability services such as Sentry, New Relic, Datadog, and many other services in the application performance monitoring space. Summary .NET offers a range of software development options with rich support for libraries and platforms. With extra help from Lightrun, a symbiotic relationship takes code troubleshooting to the next level, where investigating and resolving code deficiencies does not have to be a lengthy and costly saga.
There is a lot of buzz within the software testing and development communities about ChatGPT and the role of generative AI in testing. Some of the opinion pieces, webinars, and videos focus on the potentially beneficial applications of generative AI for testing speed and quality. However, many focus on how ChatGPT might impact a tester's future job prospects and security. Some go as far as questioning whether ChatGPT will replace the role of manual testers and SDETs. The popularity of this type of content is natural, given the anxiety associated with wavering job security in tech and the hunger of testers to learn the latest tools and skills. Meanwhile, some organizations still view testing as a problematic cost center. The appeal of automation and AI for these managers, in turn, becomes cost avoidance while still going through the motions of "QA." How well-founded is the speculation that ChatGPT might "replace" testing or, at least, replace testers? And, for senior management, how should you think about the role of AI-based automation in your organization's future QA investment? A Worry as Old as Automation Itself The view that newer technologies and newly skilled workers will render existing jobs redundant is not new. The term "luddite," for instance, stems from a movement of textile workers in early 19th-century England who opposed – and smashed – new textile machinery like looms, seeing it as a threat to the value of their highly skilled work. Within software testing, these same concerns emerged some 10+ years ago when marketers presented test execution automation as a "silver bullet" for testing efficiencies and bloated testing budgets. And now, today, ChatGPT and generative AI bring the promise of automating the automation itself: the labor performed by SDETs and engineers. These toilsome tasks might include writing test scripts and Gherkin specifications or sourcing test data and analyzing requirements. Let's assume that ChatGPT and generative AI can automate these tasks. For the sake of this article, let's park the sizeable challenges surrounding the data used to train ML/AI models, questions of test coverage, and issues associated with governance, bias, and more. What room will be left for the testers and SDETs who have spent years honing their skills and expertise if intelligent automation can produce test automation at a fraction of the time and cost? The Testing Discipline Is Going Nowhere Yet, automating test execution did not automate away testing or QA. In fact, it created a raft of new and complex processes, requiring new skills, tools, and dedicated roles like the SDET. No matter how much testing has been "automated," there has always been more to automate and maintain. Testing exhaustively is more of an impossible goal today than it was a decade ago. The average test team has automated just 15-20% of all tests, reflecting this ever-growing testing requirement. The total number of possible test scenarios that could be automated just keeps growing. To understand this ever-growing "to-do" list, we must consider the relationship between speed and complexity: how doing individual tasks faster tends to create more tasks to complete, however paradoxical this might at first appear. Moore's Law and More to Test One appeal of automation is the time and cost efficiency by which it executes high volumes of tasks. 
In testing, automation has been well suited to repetitive and recurring tasks like regression testing. ChatGPT and generative AI promise to expand the scope beyond scenarios pre-defined by humans, further reducing the labor associated with creating and running tests. Yet, the efficiency gained through automation is not isolated to testing. If it were, testing could hope to close the gap in exhaustive test coverage. Instead, the testing requirement continues to grow in size and complexity. As QA becomes faster in its individual processes, so do software design, development, and every other constituent part of the SDLC. These individual efficiencies enable faster software delivery, shipping changes of greater functional size and logical complexity, which leaves less time to test. The number of logical combinations to test will continue to grow, as will the need to maintain and refactor ever-growing volumes of historical code and tests. No matter how much you optimize, your delivery processes will also grow in complexity. As Moore's law reminds us, this growing complexity of software systems will not slow down. One benefit of automation is that we can offload this complexity and the associated workload to machines that are constantly growing in processing power. As a result, they can execute ever more tasks ever faster. Yet, this efficiency, in turn, brings more complexity and, with it, more work to do. The requirement to optimize and go faster will never go away, while the sum total of work in front of us will never in itself decrease. How Will ChatGPT Sit Alongside Your Testing? It stands to reason, then, that ChatGPT and generative AI will not "replace" testing or remove the need to invest in QA. Instead, like test execution automation before it, generative AI will provide a useful tool for moving faster. Yet, there will always be a need for more work, and at least a constant (if not greater) need for human input. Testers' time might be applied less to repetitive tasks like scripting, but new processes will fill the void. Meanwhile, the creativity and critical thinking offered by testers will not diminish in value as these repetitive processes are automated; such creativity should be given greater freedom. At the same time, your testers will have vital insight into how generative AI should be used in your organization. Nothing is adopted overnight, and identifying the optimal applications of tools like ChatGPT will be an ongoing conversation, just as the testing community has continually explored and improved practices for getting the most out of test automation frameworks. Lastly, as the volume of possible test scenarios grows, automation and AI will need a human steer in knowing where to target their efforts, even as we can increasingly use data to target test generation. How Can You Use ChatGPT in Your Testing? So, what can you do to get the most out of ChatGPT in your testing? If you are a tester or SDET, approach generative AI like you approach test automation tools and frameworks. Skill up and learn new technologies, considering which are the best fit for your organization. There is already a range of resources to learn how to leverage ChatGPT in testing, even if many are still quite speculative. These new skills will not only help you remain competitive within an increasingly crowded job market but also help you become a leader of beneficial change at your organization. 
If you are a QA manager or director, give your teams the time, space, and investment needed to experiment and explore these new tools and techniques. Change and improvement in testing practices are often blocked because teams are too busy pumping out deliverables using outdated techniques. Rather than hope that generative AI will provide a silver bullet to these growing inefficiencies, sacrifice a fraction of your teams' available testing time to learn about new tools, and consider how they can best be applied. Remember that your solutions will lie in "engineering augmented by AI," just as test automation offered an accelerator – but not a replacement – to Quality Engineering.
Java developers have often envied JavaScript for its ease of parsing JSON. Although Java offers more robustness, it tends to involve more work and boilerplate code. Thanks to the Manifold project, Java now has the potential to outshine JavaScript in parsing and processing JSON files. Manifold is a revolutionary set of language extensions for Java that completely changes the way we handle JSON (and much more). Getting Started With Manifold The code for this tutorial can be found on my GitHub page. Manifold is relatively young but already vast in its capabilities. You can learn more about the project on their website and Slack channel. To begin, you'll need to install the Manifold plugin, which is currently only available for JetBrains IDEs. The project supports the LTS releases of Java as well as the latest JDK 19. We can install the plugin from IntelliJ IDEA's settings UI by navigating to the marketplace and searching for Manifold. The plugin makes sure the IDE doesn’t collide with the work done by the Maven/Gradle plugin. Manifold consists of multiple smaller projects, each offering a custom language extension. Today, we'll discuss one such extension, but there's much more to explore. Setting Up a Maven Project To demonstrate Manifold, we'll use a simple Maven project (it also works with Gradle). We first need to paste the current Manifold version from their website and add the necessary dependencies. The main dependency for JSON is the manifold-json-rt dependency. Other dependencies can be added for YAML, XML, and CSV support. We need to add this to the pom.xml file in the project. I'm aware of the irony where the boilerplate reduction for JSON starts with a great deal of configuration in the Maven build script. But this is configuration, not "actual code," and it's mostly copy and paste. Note that if you want to reduce this boilerplate, the equivalent Gradle configuration is terse by comparison. This line needs to go into the properties section: <manifold.version>2023.1.5</manifold.version> The dependencies we use are these: <dependencies> <dependency> <groupId>systems.manifold</groupId> <artifactId>manifold-json-rt</artifactId> <version>${manifold.version}</version> </dependency> The compilation plugin is the boilerplate that weaves Manifold into the bytecode and makes it seamless for us. It’s the last part of the pom setup: <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <version>3.8.0</version> <configuration> <source>19</source> <target>19</target> <encoding>UTF-8</encoding> <compilerArgs> <!-- Configure manifold plugin--> <arg>-Xplugin:Manifold</arg> </compilerArgs> <!-- Add the processor path for the plugin --> <annotationProcessorPaths> <path> <groupId>systems.manifold</groupId> <artifactId>manifold-json</artifactId> <version>${manifold.version}</version> </path> </annotationProcessorPaths> </configuration> </plugin> </plugins> </build> With the setup complete, let's dive into the code. Parsing JSON With Manifold We place a sample JSON file in the project directory under the resources hierarchy. I placed this file under src/main/resources/com/debugagent/json/Test.json: { "firstName": "Shai", "surname": "Almog", "website": "https://debugagent.com/", "active": true, "details":[ {"key": "value"} ] } After we refresh the Maven project in the main class, you'll notice that a new Test class appears. This class is dynamically created by Manifold based on the JSON file. If you change the JSON and refresh Maven, everything updates seamlessly. 
It’s important to understand that Manifold isn’t a code generator. It compiles the JSON we just wrote into bytecode. The Test class comes with several built-in capabilities, such as a type-safe builder API that lets you construct JSON objects using builder methods. You can also generate nested objects and convert the JSON to a string by using the write() and toJson() methods. This means we can now write: Test test = Test.builder().withFirstName("Someone") .withSurname("Surname") .withActive(true) .withDetails(List.of( Test.details.detailsItem.builder(). withKey("Value 1").build() )) .build(); which will print out the following JSON: { "firstName": "Someone", "surname": "Surname", "active": true, "details": [ { "key": "Value 1" } ] } We can similarly read a JSON file using code such as this: Test readObject = Test.load().fromJson(""" { "firstName": "Someone", "surname": "Surname", "active": true, "details": [ { "key": "Value 1" } ] } """); Note the use of the Java 15 text block syntax for writing a long string. The load() method returns an object that includes various APIs for reading the JSON. In this case, it is read from a String, but there are APIs for reading it from a URL, a file, etc. Manifold supports various formats, including CSV, XML, and YAML, allowing you to generate and parse any of these formats without writing any boilerplate code or sacrificing type safety. To add that support, we need to add additional dependencies to the pom.xml file: <dependency> <groupId>systems.manifold</groupId> <artifactId>manifold-csv-rt</artifactId> <version>${manifold.version}</version> </dependency> <dependency> <groupId>systems.manifold</groupId> <artifactId>manifold-xml-rt</artifactId> <version>${manifold.version}</version> </dependency> <dependency> <groupId>systems.manifold</groupId> <artifactId>manifold-yaml-rt</artifactId> <version>${manifold.version}</version> </dependency> With these additional dependencies, the same data can be written out in each of these formats. With test.write().toCsv(), the output would be: "firstName","surname","active","details" "Someone","Surname","true","[manifold.json.rt.api.DataBindings@71070b9c]" Notice that the Comma Separated Values (CSV) output doesn’t include hierarchy information. That’s a limitation of the CSV format and not the fault of Manifold. With test.write().toXml(), the output is familiar and surprisingly concise: <root_object firstName="Someone" surname="Surname" active="true"> <details key="Value 1"/> </root_object> With test.write().toYaml(), we again get a familiar printout: firstName: Someone surname: Surname active: true details: - key: Value 1 Working With JSON Schema Manifold also works seamlessly with JSON schema, allowing you to enforce strict rules and constraints. This is particularly useful when working with dates and enums. Manifold creates/updates bytecode that adheres to the schema, making it much easier to work with complex JSON data. 
This schema is copied and pasted from the Manifold GitHub project: { "$schema": "http://json-schema.org/draft-07/schema#", "$id": "http://example.com/schemas/User.json", "type": "object", "definitions": { "Gender": { "type": "string", "enum": ["male", "female"] } }, "properties": { "name": { "type": "string", "description": "User's full name.", "maxLength": 80 }, "email": { "description": "User's email.", "type": "string", "format": "email" }, "date_of_birth": { "type": "string", "description": "Date of uses birth in the one and only date standard: ISO 8601.", "format": "date" }, "gender": { "$ref" : "#/definitions/Gender" } }, "required": ["name", "email"] } It’s a relatively simple schema, but I’d like to turn your attention to several things here. It defines name and email as required. This is why, when we try to create a User object using a builder in Manifold, the builder() method requires both parameters: User.builder("Name", "email@domain.com") That is just the start. The schema includes a date. Dates are a painful prospect in JSON; standardization is poor and fraught with issues. The schema also includes a gender field, which is effectively an enum. This is all converted to type-safe semantics using common Java classes such as LocalDate: User u = User.builder("Name", "email@domain.com") .withDate_of_birth(LocalDate.of(1999, 10, 11)) .withGender(User.Gender.male) .build(); That can be made even shorter with static imports, but the gist of the idea is clear. JSON is effectively native to Java in Manifold. The Tip of The Iceberg Manifold is a powerful and exciting project. It revolutionizes JSON parsing in Java, but that’s just one tiny portion of what it can do! We've only scratched the surface of its capabilities in this post. In the next article, we'll dive deeper into Manifold and explore some additional unexpected features. Please share your experience and thoughts about Manifold in the comments section. If you have any questions, don't hesitate to ask.
Artificial intelligence (AI) has permeated our lives in a myriad of ways, making everyday tasks easier, more efficient, and personalized. One of the most significant applications of AI is in recommender systems, which have become an integral part of our digital experiences. From suggesting movies on streaming platforms to proposing products on e-commerce websites, AI-based recommender systems have revolutionized content consumption and online shopping. This article delves into the inner workings of AI-based recommender systems, exploring their different types, algorithms, and challenges. We will also discuss the potential future developments in this field. Understanding Recommender Systems A recommender system is a sophisticated algorithm that analyzes user preferences, behavior, and other contextual factors to provide personalized recommendations. These systems enable businesses to offer relevant content or products to users, improving user experience and engagement. Recommender systems have become increasingly popular due to the exponential growth of digital content and the need to filter through the vast amount of information available to users. By presenting users with relevant content or products, recommender systems help users make choices more efficiently and drive customer satisfaction. Types of Recommender Systems AI-based recommender systems can be broadly classified into three categories: 1. Content-Based Filtering These systems recommend items based on their features and the user's preferences or past behavior. For instance, if a user has watched action movies in the past, the system will recommend more action movies for that user. Content-based filtering relies on analyzing item features and user preferences to generate recommendations. 2. Collaborative Filtering Collaborative filtering systems make recommendations based on the collective behavior of users. There are two main types of collaborative filtering: User-User Collaborative Filtering: This method finds users who have similar preferences or behavior and recommends items that these similar users have liked or interacted with in the past. Item-Item Collaborative Filtering: This approach identifies items that are similar to the ones the user has liked or interacted with and recommends these similar items to the user. 3. Hybrid Recommender Systems These systems combine content-based and collaborative filtering techniques to provide more accurate and diverse recommendations. By leveraging the strengths of both methods, hybrid systems can overcome the limitations of each individual approach. Key Algorithms Used in AI-Based Recommender Systems There are several algorithms used in building AI-based recommender systems, some of which are: Matrix Factorization This technique reduces the dimensionality of the user-item interaction matrix by finding latent factors that explain the observed interactions. Matrix factorization methods, such as Singular Value Decomposition (SVD), are widely used in collaborative filtering systems. Deep Learning Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), can be used to analyze and extract features from the content of items, enabling content-based filtering systems to generate more accurate recommendations. Nearest Neighbors The k-nearest neighbors (k-NN) algorithm is a popular choice for collaborative filtering systems, as it can quickly identify similar users or items based on their interactions. 
The algorithm calculates the similarity between users or items and recommends the most similar ones to the user. Reinforcement Learning Some recommender systems use reinforcement learning techniques like Q-learning and Deep Q-Networks (DQN) to learn the best recommendations by continually updating their models based on user feedback and interactions. Challenges in AI-Based Recommender Systems Despite their widespread success, AI-based recommender systems still face several challenges: Cold Start Problem When a new user or item is introduced to the system, there is limited information about their preferences or features, making it difficult to generate accurate recommendations. This is known as the cold start problem. One solution to this issue is incorporating demographic information, social network data, or other contextual factors to generate initial recommendations. Scalability As the number of users and items increases, the computational complexity of the recommender system grows, posing challenges in terms of processing power and storage requirements. However, techniques such as matrix factorization, approximate nearest neighbor search, and distributed computing can help address scalability issues. Diversity and Serendipity Recommender systems may become too focused on providing similar content or products, leading to a lack of diversity in recommendations. This can result in users being trapped in a so-called filter bubble, where they are only exposed to content that aligns with their existing preferences. To overcome this, systems can be designed to incorporate diversity and serendipity, providing users with unexpected recommendations that may be of interest. Privacy and Security AI-based recommender systems rely on user data to generate recommendations, raising concerns about user privacy and the security of personal information. To mitigate these risks, methods such as anonymization, data encryption, and federated learning can be employed. The Future of AI-Based Recommender Systems As AI and machine learning technologies continue to advance, we can expect recommender systems to evolve in several ways: Context-Aware Recommendations Future recommender systems will likely take into account more contextual information, such as user location, device, time of day, and other situational factors, to generate more relevant recommendations. Explainable AI Users may demand more transparency and interpretability from AI-based recommender systems. Therefore, developing models that can provide clear explanations for their recommendations will be crucial in building trust and fostering user engagement. Multimodal Recommendations Recommender systems may begin incorporating multiple data types, such as text, images, and audio, to understand user preferences and item features better, leading to more accurate and diverse recommendations. Cross-Domain Recommendations AI-based recommender systems could be developed to provide recommendations across different domains, such as suggesting movies based on a user's favorite books or recommending travel destinations based on their preferred activities. Conclusion AI-based recommender systems have become an essential part of our digital lives, helping us navigate the overwhelming amount of content and products available online. By understanding the underlying algorithms and techniques, as well as the challenges and potential future developments, we can better appreciate the power and value of these systems. 
As AI technology continues to evolve, we can expect recommender systems to become even more accurate, personalized, and diverse, further enhancing our digital experiences.
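To ground the collaborative filtering approach described above, here is a minimal, illustrative sketch of item-item scoring with cosine similarity. It is not tied to any particular library or production system; the toy ratings data, the function names, and the scoring scheme are invented for the example, and a real recommender would operate on far larger, sparser matrices.
Python
import math
from collections import defaultdict

# Toy user -> {item: rating} data; in practice this comes from interaction logs.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "avatar": 4},
    "carol": {"titanic": 5, "notebook": 4, "inception": 2},
}

def item_vectors(ratings):
    """Pivot user->item ratings into item -> {user: rating} vectors."""
    items = defaultdict(dict)
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items[item][user] = score
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Score unseen items by their similarity to items the user already rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = defaultdict(float)
    for unseen in items:
        if unseen in seen:
            continue
        for liked, rating in seen.items():
            scores[unseen] += rating * cosine(items[unseen], items[liked])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("alice", ratings))  # e.g. [('avatar', ...), ('notebook', ...)]
The same structure underlies more sophisticated approaches: matrix factorization replaces the raw item vectors with learned latent factors, and hybrid systems blend these similarity scores with content-based features.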
Developing scalable and reliable applications is a labor of love. A cloud-native system might consist of unit tests, integration tests, build tests, and a full pipeline for building and deploying applications at the click of a button. A number of intermediary steps might be required to ship a robust product. With distributed and containerized applications flooding the market, so too have container orchestration tools like Kubernetes. Kubernetes allows us to build distributed applications across a cluster of nodes, with fault tolerance, self-healing, and load balancing — plus many other features. Let’s explore some of these tools by building a distributed to-do list application in Node.js, backed by the YugabyteDB distributed SQL database. Getting Started A production deployment will likely involve setting up a full CI/CD pipeline to push containerized builds to the Google Container Registry to run on Google Kubernetes Engine or similar cloud services. For demonstration purposes, let’s focus on running a similar stack locally. We’ll develop a simple Node.js server, which is built as a Docker image to run on Kubernetes on our machines. We’ll use this Node.js server to connect to a YugabyteDB distributed SQL cluster and return records from a REST endpoint. Installing Dependencies We begin by installing some dependencies for building and running our application. Docker Desktop: Docker is used to build container images, which we’ll host locally. Minikube: this creates a local Kubernetes cluster for running our distributed application. YugabyteDB Managed: next, we create a YugabyteDB Managed account and spin up a cluster in the cloud. YugabyteDB is PostgreSQL-compatible, so you can also run a PostgreSQL database elsewhere or run YugabyteDB locally if desired. For high availability, I’ve created a 3-node database cluster running on AWS, but for demonstration purposes, a free single-node cluster works fine. Seeding Our Database Once our database is up and running in the cloud, it’s time to create some tables and records. YugabyteDB Managed has a cloud shell that can be used to connect via the web browser, but I’ve chosen to use the YugabyteDB client shell on my local machine. Before connecting, we need to download the root certificate from the cloud console. I’ve created a SQL script that creates a todos table and some records. SQL CREATE TYPE todo_status AS ENUM ('complete', 'in-progress', 'incomplete'); CREATE TABLE todos ( id serial PRIMARY KEY, description varchar(255), status todo_status ); INSERT INTO todos (description, status) VALUES ( 'Learn how to connect services with Kuberenetes', 'incomplete' ), ( 'Build container images with Docker', 'incomplete' ), ( 'Provision multi-region distributed SQL database', 'incomplete' ); We can use this script to seed our database. Shell > ./ysqlsh "user=admin \ host=<DATABASE_HOST> \ sslmode=verify-full \ sslrootcert=$PWD/root.crt" -f db.sql With our database seeded, we’re ready to connect to it via Node.js. Build a Node.js Server It’s simple to connect to our database with the node-postgres driver. YugabyteDB has built on top of this library with the YugabyteDB Node.js Smart Driver, which comes with additional features that unlock the powers of distributed SQL, including load-balancing and topology awareness. 
Shell > npm install express > npm install @yugabytedb/pg JavaScript const express = require("express"); const App = express(); const { Pool } = require("@yugabytedb/pg"); const fs = require("fs"); let config = { user: "admin", host: "<DATABASE_HOST>", password: "<DATABASE_PASSWORD>", port: 5433, database: "yugabyte", min: 5, max: 10, idleTimeoutMillis: 5000, connectionTimeoutMillis: 5000, ssl: { rejectUnauthorized: true, ca: fs.readFileSync("./root.crt").toString(), servername: "<DATABASE_HOST>", }, }; const pool = new Pool(config); App.get("/todos", async (req, res) => { try { const data = await pool.query("select * from todos"); res.json({ status: "OK", data: data?.rows }); } catch (e) { console.log("error in selecting todos from db", e); res.status(400).json({ error: e }); } }); App.listen(8000, () => { console.log("App listening on port 8000"); }); Containerizing Our Node.js Application To run our Node.js application in Kubernetes, we first need to build a container image. Create a Dockerfile in the same directory. Dockerfile FROM node:latest WORKDIR /app COPY . . RUN npm install EXPOSE 8000 ENTRYPOINT [ "npm", "start" ] All of our server dependencies will be built into the container image. To run our application using the npm start command, update your package.json file with the start script. JSON … "scripts": { "start": "node index.js" } … Now, we’re ready to build our image with Docker. Shell > docker build -t todo-list-app . Sending build context to Docker daemon 458.4MB Step 1/6 : FROM node:latest ---> 344462c86129 Step 2/6 : WORKDIR /app ---> Using cache ---> 49f210e25bbb Step 3/6 : COPY . . ---> Using cache ---> 1af02b568d4f Step 4/6 : RUN npm install ---> Using cache ---> d14416ffcdd4 Step 5/6 : EXPOSE 8000 ---> Using cache ---> e0524327827e Step 6/6 : ENTRYPOINT [ "npm", "start" ] ---> Using cache ---> 09e7c61855b2 Successfully built 09e7c61855b2 Successfully tagged todo-list-app:latest Our application is now packaged and ready to run in Kubernetes. Running Kubernetes Locally With Minikube To run a Kubernetes environment locally, we’ll run Minikube, which creates a Kubernetes cluster inside of a Docker container running on our machine. Shell > minikube start That was easy! Now we can use the kubectl command-line tool to deploy our application from a Kubernetes configuration file. Deploying to Kubernetes First, we create a configuration file called kubeConfig.yaml, which will define the components of our cluster. Kubernetes deployments are used to keep pods running and up-to-date. Here, we’re creating a deployment of three replica pods running the todo-list-app container image that we’ve already built with Docker. YAML apiVersion: apps/v1 kind: Deployment metadata: name: todo-app-deployment labels: app: todo-app spec: selector: matchLabels: app: todo-app replicas: 3 template: metadata: labels: app: todo-app spec: containers: - name: todo-server image: todo-list-app ports: - containerPort: 8000 imagePullPolicy: Never In the same file, we’ll create a Kubernetes service, which is used to set the networking rules for your application and expose it to clients. YAML --- apiVersion: v1 kind: Service metadata: name: todo-app-service spec: type: NodePort selector: app: todo-app ports: - name: todo-app-service-port protocol: TCP port: 8000 targetPort: 8000 nodePort: 30100 Let’s use our configuration file to create our todo-app-deployment and todo-app-service. This will create a networked cluster, resilient to failures and orchestrated by Kubernetes! 
Shell > kubectl create -f kubeConfig.yaml Accessing Our Application in Minikube Shell > minikube service todo-app-service --url Starting tunnel for service todo-app-service. Because you are using a Docker driver on darwin, the terminal needs to be open to run it. We can find the tunnel port by executing the following command. Shell > ps -ef | grep docker@127.0.0.1 503 2363 2349 0 9:34PM ttys003 0:00.01 ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -N docker@127.0.0.1 -p 53664 -i /Users/bhoyer/.minikube/machines/minikube/id_rsa -L 63650:10.107.158.206:8000 The output indicates that our tunnel is running at port 63650. We can access our /todos endpoint via this URL in the browser or via a client. Shell > curl -X GET http://127.0.0.1:63650/todos -H 'Content-Type: application/json' {"status":"OK","data":[{"id":1,"description":"Learn how to connect services with Kuberenetes","status":"incomplete"},{"id":2,"description":"Build container images with Docker","status":"incomplete"},{"id":3,"description":"Provision multi-region distributed SQL database","status":"incomplete"}]} Wrapping Up With a distributed infrastructure in place in our application and database tiers, we’ve developed a system built to scale and survive. I know, I know, I promised you the most resilient to-do app the world has ever seen and didn’t provide a user interface. Well, that’s your job! Extend the API service we’ve developed in Node.js to serve the HTML required to display our list. Look out for more from me on Node.js and distributed SQL — until then, keep on coding!
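Before (or after) adding that UI, you can also see the fault tolerance and scaling described earlier in action with a few standard kubectl commands; the resource names below are the ones defined in kubeConfig.yaml above.
Shell
# List the three replica pods created by the deployment
> kubectl get pods -l app=todo-app

# Scale the deployment out to five replicas
> kubectl scale deployment todo-app-deployment --replicas=5

# Delete one of the pods and watch Kubernetes recreate it (self-healing)
> kubectl delete pod <name-of-one-pod>
> kubectl get pods -l app=todo-app --watch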
Have you ever wondered if people can take advantage of vulnerabilities present in your code and exploit it in different ways, like selling or sharing exploits, creating malware that can destroy your functionality, launching targeted attacks, or even engaging in cyber attacks? These attacks mostly happen through known vulnerabilities present in the code, known as CVEs (Common Vulnerabilities and Exposures). In 2017, a malicious ransomware attack, WannaCry, wrought havoc by infiltrating over 300,000 computers in more than 150 nations. The assailants were able to utilize a flaw in the Microsoft Windows operating system, which had been designated a CVE identifier (CVE-2017-0144), to infect the computers with the ransomware. The ransomware encrypted users’ files and demanded a ransom payment in exchange for the decryption key, causing massive disruptions to businesses, hospitals, and government agencies. The attack’s total cost was estimated to have been in the billions of dollars. When you have thousands of packages in use for a single functionality, it can be daunting to track every package utilized in your code and determine whether it is vulnerable. How do you ensure that your code is secure and cannot be abused in any way? What if you automatically got notified with a Slack alert — as soon as a new vulnerability is detected — in one of your many Git repositories? What if an issue automatically got created, which could be used to monitor similar issues daily? This is where Dependabot enters the picture. Dependabot can be integrated with various development tools and platforms such as GitHub, GitLab, Jenkins, and Travis CI, among others. It also supports a wide range of programming languages, and it can be used with Docker to examine Docker images for outdated dependencies. As a result, it is a versatile tool for managing dependencies and keeping projects current with the most recent security patches and bug fixes. To maintain the safety and security of your program dependencies, the use of Dependabot notifications is critical. Dependabot automates the process of scanning your code repositories for vulnerabilities and out-of-date dependencies. Dependabot alerts are notifications sent when a vulnerability is discovered in one of your dependencies. They are meant to keep you informed about any potential security risks that may emerge. Dependabot alerts can only be viewed by individuals with admin access to a repository; users and teams who are explicitly given access also have permission to view and manage Dependabot or secret scanning alerts. However, most people will not have access to Dependabot alerts for the component or microservice which they are working on. How can a developer learn about CVEs present in the code? How can they remediate vulnerabilities if they are unaware of them? In this scenario, we can use a combination of the GitHub API, webhooks, and Tekton pipelines to our advantage. You can leverage the IBM Cloud toolchain to create an automation that creates issues in the repository where the CVE was identified and can also close the issue once the CVE has been remediated. This way, developers can keep track of the vulnerabilities present in the code and have a clearer picture, which aids them in remaining compliant. Flow Diagram The Pipeline Implementation GitHub can send POST requests to a webhook for a number of event types, such as repository vulnerability alerts (aka Dependabot alerts), which is useful for our case. 
Creating a webhook with this event selected will send POST requests whenever a new vulnerability is detected or remediated. The webhook can act as a pipeline trigger. GitHub Hooks Configuration Before implementing the pipeline, it is important to understand the payload of the POST request and how it can be utilized. The payload contains an “action” key and an “alert” key. The “action” key indicates whether it is a remediation alert or a creation alert, while the “alert” key contains the important details such as the affected package name, range, severity, and suggested fix. Git Issue To utilize this information and alert the team, a pipeline can be created with a generic webhook trigger, which fires whenever a request is sent to the webhook. The pipeline can extract the affected package from the payload and check if an issue for the CVE already exists in the repository. If not, it will create an issue with all the required details to identify the CVE and provide a suggested fix, if any. The pipeline can also add IBM-recommended due dates based on the severity. Once the developer works on remediation and the changes are reflected in the default branch, Dependabot will send a request to the webhook with the action being “resolve.” The pipeline can extract the affected package name from the payload and check if the issue is open in the repository. If yes, it will automatically close the issue and add a comment saying, “Vulnerability has been remediated, closing the issue.” Automatic Git Issue Closure Additionally, the pipeline can be configured to send a Slack alert whenever an issue is created or resolved based on the team’s requirements. This pipeline can work at either the repository or the organization level, tracking all the repositories inside an organization if the webhook is integrated at the organization level. Slack Alert Overall, implementing this pipeline can help developers stay compliant and ensure safety and security. However, it is important to follow best practices as a team to ensure the effectiveness of the pipeline. Don’t miss out on the next blog post about setting up the automation. Subscribe to my page and receive instant notifications as soon as I publish it, so you can stay ahead of the game and keep your skills sharp!
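As a rough illustration of the issue-management logic described above (not the actual Tekton pipeline code), the following sketch shows how a webhook payload could be turned into GitHub issue operations. The GITHUB_TOKEN environment variable, the handle_alert function, and the field names read from the "alert" object are assumptions made for this example; the real payload fields and pipeline steps may differ.
Python
import os
import requests  # assumed to be available in the pipeline image

GITHUB_API = "https://api.github.com"
TOKEN = os.environ["GITHUB_TOKEN"]  # token with issues access, injected by the toolchain (assumption)
HEADERS = {"Authorization": f"Bearer {TOKEN}",
           "Accept": "application/vnd.github+json"}

def handle_alert(repo: str, payload: dict) -> None:
    """Create or close a Git issue for a Dependabot alert webhook payload.

    `payload` is the JSON body GitHub posts to the webhook; only the
    "action" and "alert" keys described above are used, and the field
    names inside "alert" are illustrative assumptions.
    """
    action = payload.get("action")
    alert = payload.get("alert", {})
    package = alert.get("affected_package_name", "unknown-package")
    title = f"CVE alert: {package}"

    # Look for an existing open issue for this package.
    issues = requests.get(f"{GITHUB_API}/repos/{repo}/issues",
                          headers=HEADERS, params={"state": "open"}).json()
    existing = next((i for i in issues if i["title"] == title), None)

    if action == "create" and existing is None:
        body = (f"Severity: {alert.get('severity', 'n/a')}\n"
                f"Affected range: {alert.get('affected_range', 'n/a')}\n"
                f"Fixed in: {alert.get('fixed_in', 'n/a')}")
        requests.post(f"{GITHUB_API}/repos/{repo}/issues",
                      headers=HEADERS, json={"title": title, "body": body})
    elif action == "resolve" and existing is not None:
        number = existing["number"]
        requests.post(f"{GITHUB_API}/repos/{repo}/issues/{number}/comments",
                      headers=HEADERS,
                      json={"body": "Vulnerability has been remediated, closing the issue."})
        requests.patch(f"{GITHUB_API}/repos/{repo}/issues/{number}",
                       headers=HEADERS, json={"state": "closed"})
In a real toolchain, this logic would run as a pipeline task triggered by the generic webhook, with the repository name and Slack notification settings supplied as pipeline properties.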
In this blog post, you will be using the aws-lambda-go library along with the AWS Go SDK v2 for an application that will process records from an Amazon SNS topic and store them in a DynamoDB table. You will also learn how to use Go bindings for AWS CDK to implement “Infrastructure-as-code” for the entire solution and deploy it with the AWS CDK CLI. The code is available on GitHub. Introduction Amazon Simple Notification Service (SNS) is a highly available, durable, and scalable messaging service that enables the exchange of messages between applications or microservices. It uses a publish/subscribe model where publishers send messages to topics, and subscribers receive messages from topics they are interested in. Clients can subscribe to the SNS topic and receive published messages using a supported endpoint type, such as Amazon Kinesis Data Firehose, Amazon SQS, AWS Lambda, HTTP, email, mobile push notifications, and mobile text messages (SMS). The AWS Lambda and Amazon SNS integration enables developers to build event-driven architectures that can scale automatically and respond to changes in real time. When a new message is published to an SNS topic, it can trigger a Lambda function (Amazon SNS invokes your function asynchronously with an event that contains a message and metadata) which can perform a set of actions, such as processing the message, storing data in a database, sending emails or SMS messages, or invoking other AWS services. Prerequisites Before you proceed, make sure you have the Go programming language (v1.18 or higher) and AWS CDK installed. Clone the project and change to its directory: Shell git clone https://github.com/abhirockzz/sns-lambda-events-golang cd sns-lambda-events-golang Use CDK To Deploy the Solution To start the deployment, simply invoke cdk deploy and wait for a bit. You will see a list of resources that will be created and will need to provide your confirmation to proceed. Shell cd cdk cdk deploy # output Bundling asset SNSLambdaGolangStack/sns-function/Code/Stage... ✨ Synthesis time: 5.94s This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening). Please confirm you intend to make the following modifications: //.... omitted Do you wish to deploy these changes (y/n)? y This will start creating the AWS resources required for our application. If you want to see the AWS CloudFormation template which will be used behind the scenes, run cdk synth and check the cdk.out folder. You can keep track of the progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > SNSLambdaGolangStack Once all the resources are created, you can try out the application. You should have: A Lambda function An SNS topic A DynamoDB table Along with a few other components (like IAM roles, etc.) Verify the Solution You can check the table and SNS info in the stack output (in the terminal or the Outputs tab in the AWS CloudFormation console for your stack): Send a few messages to the SNS topic. 
For the purposes of this demo, you can use the AWS CLI: Shell export SNS_TOPIC_ARN=<enter the SNS topic ARN from the CloudFormation output> aws sns publish --topic-arn $SNS_TOPIC_ARN --message "user1@foo.com" --message-attributes 'name={DataType=String, StringValue="user1"}, city={DataType=String,StringValue="seattle"}' aws sns publish --topic-arn $SNS_TOPIC_ARN --message "user2@foo.com" --message-attributes 'name={DataType=String, StringValue="user2"}, city={DataType=String,StringValue="new delhi"}' aws sns publish --topic-arn $SNS_TOPIC_ARN --message "user3@foo.com" --message-attributes 'name={DataType=String, StringValue="user3"}, city={DataType=String,StringValue="new york"}' You can also use the AWS console to send SNS messages. Check the DynamoDB table to confirm that the message data has been stored. You can use the AWS console or the AWS CLI: aws dynamodb scan --table-name <enter the table name from cloudformation output> Don’t Forget To Clean Up Once you’re done, to delete all the services, simply use: Shell cdk destroy #output prompt (choose 'y' to continue) Are you sure you want to delete: SNSLambdaGolangStack (y/n)? You were able to set up and try the complete solution. Before we wrap up, let’s quickly walk through some of the important parts of the code to get a better understanding of what’s going on behind the scenes. Code Walk Through Some of the code (error handling, logging, etc.) has been omitted for brevity since we only want to focus on the important parts. CDK You can refer to the CDK code here. We start by creating a DynamoDB table: Go table := awsdynamodb.NewTable(stack, jsii.String("dynamodb-table"), &awsdynamodb.TableProps{ PartitionKey: &awsdynamodb.Attribute{ Name: jsii.String("email"), Type: awsdynamodb.AttributeType_STRING}, }) table.ApplyRemovalPolicy(awscdk.RemovalPolicy_DESTROY) Then, we handle the Lambda function (CDK will take care of building and deploying the function) and make sure we provide it appropriate permissions to write to the DynamoDB table. Go function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("sns-function"), &awscdklambdagoalpha.GoFunctionProps{ Runtime: awslambda.Runtime_GO_1_X(), Environment: &map[string]*string{"TABLE_NAME": table.TableName()}, Entry: jsii.String(functionDir), }) table.GrantWriteData(function) Then, we create the SNS topic and add that as an event source to the Lambda function. Go snsTopic := awssns.NewTopic(stack, jsii.String("sns-topic"), nil) function.AddEventSource(awslambdaeventsources.NewSnsEventSource(snsTopic, nil)) Finally, we export the SNS topic and DynamoDB table name as CloudFormation outputs. Go awscdk.NewCfnOutput(stack, jsii.String("sns-topic-name"), &awscdk.CfnOutputProps{ ExportName: jsii.String("sns-topic-name"), Value: snsTopic.TopicName()}) awscdk.NewCfnOutput(stack, jsii.String("dynamodb-table-name"), &awscdk.CfnOutputProps{ ExportName: jsii.String("dynamodb-table-name"), Value: table.TableName()}) Lambda Function You can refer to the Lambda function code here. The Lambda function handler iterates over each SNS record and, for each of them: stores the message body in the primary key attribute (email) of the DynamoDB table; the rest of the message attributes are stored as-is. 
Go func handler(ctx context.Context, snsEvent events.SNSEvent) { for _, record := range snsEvent.Records { snsRecord := record.SNS item := make(map[string]types.AttributeValue) item["email"] = &types.AttributeValueMemberS{Value: snsRecord.Message} for attrName, attrVal := range snsRecord.MessageAttributes { fmt.Println(attrName, "=", attrVal) attrValMap := attrVal.(map[string]interface{}) dataType := attrValMap["Type"] val := attrValMap["Value"] switch dataType.(string) { case "String": item[attrName] = &types.AttributeValueMemberS{Value: val.(string)} } } _, err := client.PutItem(context.Background(), &dynamodb.PutItemInput{ TableName: aws.String(table), Item: item, }) } } Wrap Up In this blog, you saw an example of how to use Lambda to process messages sent to SNS and store them in DynamoDB, thanks to the SNS and Lambda integration. The entire infrastructure lifecycle was automated using AWS CDK. All this was done using the Go programming language, which is well supported across DynamoDB, AWS Lambda, and AWS CDK. Happy building!
Building a cluster of single-board mini-computers is an excellent way to explore and learn about distributed computing. With the scarcity of Raspberry Pi boards, and the prices starting to get prohibitive for some projects, alternatives such as Orange Pi have gained popularity. In this article, I’ll show you how to build a (surprisingly cheap) 4-node cluster packed with 16 cores and 4GB RAM to deploy a MariaDB replicated topology that includes three database servers and a database proxy, all running on a Docker Swarm cluster and automated with Ansible. This article was inspired by a member of the audience who asked my opinion about Orange Pi during a talk I gave in Colombia. I hope this completes the answer I gave you. What Is a Cluster? A cluster is a group of computers that work together to achieve a common goal. In the context of distributed computing, a cluster typically refers to a group of computers that are connected to each other and work together to perform computation tasks. Building a cluster allows you to harness the power of multiple computers to solve problems that a single computer cannot handle. For example, a database can be replicated in multiple nodes to achieve high availability—if one node fails, other nodes can take over. It can also be used to implement read/write splitting to make one node handle writes, and another reads in order to achieve horizontal scalability. What Is Orange Pi Zero2? The Orange Pi Zero2 is a small single-board computer that runs on the ARM Cortex-A53 quad-core processor. It has 512MB or 1GB of DDR3 RAM, 100Mbps Ethernet, Wi-Fi, and Bluetooth connectivity. The Orange Pi Zero2 is an excellent choice for building a cluster due to its low cost, small size, and good performance. The only downside I found was that the Wi-Fi connection didn’t seem to perform as well as with other single-board computers. From time to time, the boards disconnect from the network, so I had to place them close to a Wi-Fi repeater. This could be a problem with my setup or with the boards. I’m not entirely sure. Having said that, this is not a production environment, so it worked pretty well for my purposes. What You Need Here are the ingredients: Orange Pi Zero2: I recommend the 1GB RAM variant and try to get at least 4 of them. I recently bought 4 of them for €30 each. Not bad at all! Give it a try! MicroSD cards: One per board. Try to use fast ones — it will make quite a difference in performance! I recommend at least 16GB. For reference, I used SanDisk Extreme Pro Micro/SDXC with 32GB, which offers a write speed of 90 MB/s and reads at 170 MB/s. A USB power hub: To power the devices, I recommend a dedicated USB power supply. You could also just use individual chargers, but the setup will be messier and require a power strip with as many outlets as devices as you have. It’s better to use a USB multi-port power supply. I used an Anker PowerPort 6, but there are also good and cheaper alternatives. You’ll have to Google this too. Check that each port can supply 5V and at least 2.4A. USB cables: Each board needs to be powered via a USB-C port. You need a cable with one end of type USB-C and the other of the type your power hub accepts. Bolts and nuts: To stack up the boards. Heat sinks (optional): These boards can get hot. I recommend getting heat sinks to help with heat dissipation. 
Materials needed for building an Orange Pi Zero2 cluster Assembling the Cluster One of the fun parts of building this cluster is the physical assembly of the boards on a case or some kind of structure that makes them look like a single manageable unit. Since my objective here is to keep the budget as low as possible, I used cheap bolts and nuts to stack the boards one on top of the other. I didn’t find any ready-to-use cluster cases for the Orange Pi Zero2. One alternative is to 3D-print your own case. When stacking the boards together, keep an eye on the antenna placement. Avoid crushing the cable, especially if you installed heat sinks. An assembled Orange Pi Zero2 cluster with 4 nodes Installing the Operating System The second step is to install the operating system on each microSD card. I used Armbian bullseye legacy 4.9.318. Download the file and use a tool like balenaEtcher to make bootable microSD cards. Download and install this tool on your computer. Select the Armbian image file and the drive that corresponds to the microSD card. Flash the image and repeat the process for each microSD card. Configuring the Orange Pi Wi-Fi Connection (Headless) To configure the Wi-Fi connection, Armbian includes the /boot/armbian_first_run.txt.template file, which allows you to configure the operating system when it runs for the first time. The template includes instructions, so it’s worth checking. You have to rename this file to armbian_first_run.txt. Here’s what I used: Plain Text FR_general_delete_this_file_after_completion=1 FR_net_change_defaults=1 FR_net_ethernet_enabled=0 FR_net_wifi_enabled=1 FR_net_wifi_ssid='my_connection_id' FR_net_wifi_key='my_password' FR_net_wifi_countrycode='FI' FR_net_use_static=1 FR_net_static_gateway='192.168.1.1' FR_net_static_mask='255.255.255.0' FR_net_static_dns='192.168.1.1 8.8.8.8' FR_net_static_ip='192.168.1.181' Use your own Wi-Fi details, including connection name, password, country code, gateway, mask, and DNS. I wasn’t able to read the SD card from macOS. I had to use another laptop with Linux on it to make the changes to the configuration file on each SD card. To mount the SD card on Linux, run the following command before and after inserting the SD card and see what changes: Shell sudo fdisk -l I created a Bash script to automate the process. The script accepts the IP address to set as a parameter. For example: Shell sudo ./armbian-setup.sh 192.168.1.181 I ran this command on each of the four SD cards, changing the IP address from 192.168.1.181 to 192.168.1.184. Connecting Through SSH Insert the flashed and configured microSD cards into each board and turn the power supply on. Be patient! Give the small devices time to boot. It can take several minutes the first time you boot them. An Orange Pi cluster running Armbian Use the ping command to check whether the devices are ready and connected to the network: Shell ping 192.168.1.181 Once they respond, connect to the mini-computers through SSH using the root user and the IP address that you configured. For example: Shell ssh root@192.168.1.181 The default password is: Plain Text 1234 You’ll be presented with a wizard-like tool to complete the installation. Follow the steps to finish the configuration and repeat the process for each board. Installing Ansible Imagine you want to update the operating system on each machine. You’d have to log into a machine, run the update command, and end the remote session. Then repeat for each machine in the cluster. A tedious job even if you have only 4 nodes. 
Ansible is an automation tool that allows you to run a command on multiple machines with a single call. You can also create a playbook, a file that contains commands to be executed on a set of machines defined in an inventory. Install Ansible on your working computer and generate a configuration file: Shell sudo su ansible-config init --disabled -t all > /etc/ansible/ansible.cfg exit In the /etc/ansible/ansible.cfg file, set the following properties (enable them by removing the semicolon): Plain Text host_key_checking=False become_allow_same_user=True ask_pass=True This will make the whole process easier. Never do this in a production environment! You also need an inventory. Edit the /etc/ansible/hosts file and add the Orange Pi nodes as follows: Plain Text ############################################################################## # 4-node Orange Pi Zero 2 cluster ############################################################################## [opiesz] 192.168.1.181 ansible_user=orangepi hostname=opiz01 192.168.1.182 ansible_user=orangepi hostname=opiz02 192.168.1.183 ansible_user=orangepi hostname=opiz03 192.168.1.184 ansible_user=orangepi hostname=opiz04 [opiesz_manager] opiz01.local ansible_user=orangepi [opiesz_workers] opiz[02:04].local ansible_user=orangepi In the ansible_user variable, specify the username that you created during the installation of Armbian. Also, change the IP addresses if you used different ones. Setting up a Cluster With Ansible Playbooks A key feature of a computer cluster is that the nodes are logically interconnected in some way. Docker Swarm is a container orchestration tool that will convert your arrangement of Orange Pi devices into a real cluster. You can later deploy any kind of server software, and Docker Swarm will automatically pick one of the machines to host it. To make the process easier, I have created a set of Ansible playbooks to further configure the boards, update the packages, reboot or power off the machines, install Docker, set up Docker Swarm, and even deploy a replicated MariaDB database with a database proxy. Clone or download this GitHub repository: Shell git clone https://github.com/alejandro-du/orange-pi-zero-cluster-ansible-playbooks.git Let’s start by upgrading the Linux packages on all the boards (a minimal sketch of what a playbook like this looks like inside appears at the end of this section): Shell ansible-playbook upgrade.yml --ask-become-pass Now configure the nodes to have an easy-to-remember hostname with the help of Avahi, and configure the LED activity (the red LED activates on SD card activity): Shell ansible-playbook configure-hosts.yml --ask-become-pass Reboot all the boards: Shell ansible-playbook reboot.yml --ask-become-pass Install Docker: Shell ansible-playbook docker.yml --ask-become-pass Set up Docker Swarm: Shell ansible-playbook docker-swarm.yml --ask-become-pass Done! You have an Orange Pi cluster ready for fun! Deploying MariaDB on Docker Swarm I have to warn you here: I don’t recommend running a database on container orchestration software (Docker Swarm, Kubernetes, and others) unless you are willing to put a lot of effort into it. This article is a lab, a learning exercise. Don’t do this in production! Now let’s get back to the fun… Run the following to deploy one MariaDB primary server, two MariaDB replica servers, and one MaxScale proxy: Shell ansible-playbook mariadb-stack.yml --ask-become-pass The first time you do this, it will take some time. Be patient.
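While the deployment runs, it is worth peeking at what these playbooks roughly look like. The following is a minimal sketch of an upgrade playbook, not the actual upgrade.yml from the repository: it simply upgrades the apt packages on every host in the [opiesz] group defined in the inventory.

YAML
# Illustrative sketch only; not the repository's actual upgrade.yml.
# Upgrades the apt packages on every host in the [opiesz] inventory group.
- name: Upgrade packages on all Orange Pi nodes
  hosts: opiesz
  become: true                 # become root, hence the --ask-become-pass flag
  tasks:
    - name: Update the apt cache and upgrade all packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

The other playbooks follow roughly the same pattern: a hosts selector from the inventory plus a list of tasks (installing Docker, initializing the swarm on the manager, joining the workers, and so on).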
SSH into the manager node: Shell ssh orangepi@opiz01.local Inspect the nodes in the Docker Swarm cluster: Shell docker node ls Inspect the MariaDB stack: Shell docker stack ps mariadb A cooler way to inspect the containers in the cluster is by using the Docker Swarm Visualizer. Deploy it as follows: Shell docker service create --name=viz --publish=9000:8080 --constraint=node.role==manager --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock alexellis2/visualizer-arm:latest On your working computer, open a web browser and go to the port published above on the manager node (http://opiz01.local:9000). You should see all the nodes in the cluster and the deployed containers. Docker Swarm Visualizer showing MariaDB deployed MaxScale is an intelligent database proxy with tons of features. For now, let’s see how to connect to the MariaDB cluster through this proxy. Use a tool like DBeaver, DbGate, or even a database extension for your favorite IDE. Create a new database connection using the following connection details: Host: opiz01.local Port: 4000 Username: user Password: password Create a new table: MariaDB SQL USE demo; CREATE TABLE messages( id INT PRIMARY KEY AUTO_INCREMENT, content TEXT NOT NULL ); Insert some data: MariaDB SQL INSERT INTO messages(content) VALUES ("It works!"), ("Hello, MariaDB"), ("Hello, Orange Pi"); When you execute this command, MaxScale sends it to the primary server. Now read the data: MariaDB SQL SELECT * FROM messages; When you execute this command, MaxScale sends it to one of the replicas. This division of reads and writes is called read-write splitting. The MaxScale UI showing a MariaDB cluster with replication and read-write splitting You can also access the MaxScale UI. Use the following credentials: Username: admin Password: mariadb Watch the following video if you want to learn more about MaxScale and its features. You won’t regret it!
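As a final sanity check of the read-write splitting described above, you can ask the database which server actually answered a query. With MaxScale's default readwritesplit behavior, plain SELECT statements should be answered by one of the replicas, while statements inside an explicit read-write transaction are routed to the primary:

MariaDB SQL
-- Run this a few times over the MaxScale connection (port 4000):
-- the reported hostname should be one of the replicas.
SELECT @@hostname;

-- Inside an explicit transaction, the same query is routed to the primary.
START TRANSACTION;
SELECT @@hostname;
COMMIT;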
With the growth of application modernization demands, monolithic applications have been refactored into cloud-native microservices and serverless functions with lighter, faster, and smaller application portfolios over the past years. This was not only about rewriting applications; the backend data stores were also redesigned in terms of dynamic scalability, high performance, and flexibility for event-driven architecture. For example, traditional data structures in relational databases started to give way to a new approach that enables the storage and retrieval of key-value and document data structures using NoSQL databases. However, faster modernization presents more challenges for Java developers in terms of steep learning curves when adopting new technologies while retaining their current skill sets and experience. For instance, Java developers may need to rewrite all existing Java applications in Golang or JavaScript for new serverless functions and learn new APIs or SDKs to process dynamic data records in the new modernized serverless applications. This article will take you through a step-by-step tutorial on how Quarkus enables Java developers to implement serverless functions on AWS Lambda that process dynamic data on AWS DynamoDB. Quarkus not only enables developers to optimize Java applications for superfast startup times (e.g., milliseconds) and tiny memory footprints (e.g., less than 100 MB) for serverless applications, but it also offers a number of AWS extensions to deploy Java applications to AWS Lambda and access AWS DynamoDB directly without a steep learning curve. Creating a New Serverless Java Project Using Quarkus We’ll use the Quarkus command to generate a new project with required files such as the Maven Wrapper, Dockerfiles, configuration properties, and sample code. Find more information about the benefits of the Quarkus command (CLI) here. Run the following Quarkus command in your working directory. Shell quarkus create piggybank --java=17 You need to use JDK 17 since it is currently the latest version that AWS Lambda supports as a default Java runtime (Corretto). Let’s start Quarkus Live Coding, also known as Quarkus dev mode, using the following command. Shell cd piggybank && quarkus dev Developing Business Logic for Piggybank Now let's add a couple of Quarkus extensions to create a DynamoDB entity and relevant abstract services using the following Quarkus command in the piggybank directory. Shell quarkus ext add amazon-dynamodb resteasy-reactive-jackson The output should look like this. Java [SUCCESS] ✅ Platform io.quarkus.platform:quarkus-amazon-services-bom has been installed [SUCCESS] ✅ Extension io.quarkiverse.amazonservices:quarkus-amazon-dynamodb has been installed [SUCCESS] ✅ Extension io.quarkus:quarkus-resteasy-reactive-jackson has been installed Creating an Entity Class You will create a new data model (Entry.java) file to define the Java attributes that map to the fields in DynamoDB. The Java class should look like the following code snippet (you can find the solution in the GitHub repository): Java @RegisterForReflection public class Entry { public Long timestamp; public String accountID; ... public Entry() {} public static Entry from(Map<String, AttributeValue> item) { Entry entry = new Entry(); if (item != null && !item.isEmpty()) { entry.setAccountID(item.get(AbstractService.ENTRY_ACCOUNTID_COL).s()); ... } return entry; } ... 
} The @RegisterForReflection annotation instructs Quarkus to keep the class and its members during the native compilation. Find more information here. Creating an Abstract Service Now you will create a new AbstractService.java file containing helper methods that prepare the DynamoDB request objects for reading and adding items to the table. The code snippet should look like this (find the solution in the GitHub repository): Java public class AbstractService { public String accountID; ... public static final String ENTRY_ACCOUNTID_COL = "accountID"; ... public String getTableName() { return "finance"; } protected ScanRequest scanRequest() { return ScanRequest.builder().tableName(getTableName()) .attributesToGet(ENTRY_ACCOUNTID_COL, ENTRY_DESCRIPTION_COL, ENTRY_AMOUNT_COL, ENTRY_BALANCE_COL, ENTRY_DATE_COL, ENTRY_TIMESTAMP, ENTRY_CATEGORY).build(); } ... } Adding a Business Layer for REST APIs Create a new EntryService.java file that extends the AbstractService class and will be the business layer of your application. This logic stores and retrieves the entry data from DynamoDB synchronously. The code snippet should look like this (solution in the GitHub repository): Java @ApplicationScoped public class EntryService extends AbstractService { @Inject DynamoDbClient dynamoDB; public List<Entry> findAll() { List<Entry> entries = dynamoDB.scanPaginator(scanRequest()).items().stream() .map(Entry::from) .collect(Collectors.toList()); entries.sort((e1, e2) -> e1.getDate().compareTo(e2.getDate())); BigDecimal balance = new BigDecimal(0); for (Entry entry : entries) { balance = balance.add(entry.getAmount()); entry.setBalance(balance); } return entries; } ... } Creating REST APIs Now you'll create a new EntryResource.java file to implement the REST APIs that get and post the entry data from and to DynamoDB. The code snippet should look like the below (solution in the GitHub repository): Java @Path("/entryResource") public class EntryResource { SimpleDateFormat piggyDateFormatter = new SimpleDateFormat("yyyy-MM-dd+HH:mm"); @Inject EntryService eService; @GET @Path("/findAll") public List<Entry> findAll() { return eService.findAll(); } ... } Verify the Business Services Locally First, we need to install a local DynamoDB instance that the piggy bank services can access. There are a variety of ways to stand up a local DynamoDB, such as downloading an executable JAR file, running a container image, or adding it as an Apache Maven dependency. Here, you will use Docker Compose to install and run DynamoDB locally. Find more information here. Create the following docker-compose.yml file in your local environment. YAML version: '3.8' services: dynamodb-local: command: "-jar DynamoDBLocal.jar -sharedDb -dbPath ./data" image: "amazon/dynamodb-local:latest" container_name: dynamodb-local ports: - "8000:8000" volumes: - "./docker/dynamodb:/home/dynamodblocal/data" working_dir: /home/dynamodblocal Then, run the following command. Shell docker-compose up The output should look like this. 
Shell [+] Running 2/2 ⠿ Network quarkus-piggybank_default Created 0.0s ⠿ Container dynamodb-local Created 0.1s Attaching to dynamodb-local dynamodb-local | Initializing DynamoDB Local with the following configuration: dynamodb-local | Port: 8000 dynamodb-local | InMemory: false dynamodb-local | DbPath: ./data dynamodb-local | SharedDb: true dynamodb-local | shouldDelayTransientStatuses: false dynamodb-local | CorsParams: null dynamodb-local | Creating an Entry Table Locally Run the following AWS DynamoDB API command to create the finance table (which holds the entries) in the running DynamoDB container. Shell aws dynamodb create-table --endpoint-url http://localhost:8000 --table-name finance --attribute-definitions AttributeName=accountID,AttributeType=S AttributeName=timestamp,AttributeType=N --key-schema AttributeName=timestamp,KeyType=HASH AttributeName=accountID,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 --table-class STANDARD Adding DynamoDB Client Configurations DynamoDB clients can be configured in the application.properties file. You also need to add a proper implementation of the sync client to the classpath. By default, the extension uses the java.net.URLConnection HTTP client. Open the pom.xml file and copy the following dependency right after the quarkus-amazon-dynamodb dependency. XML <dependency> <groupId>software.amazon.awssdk</groupId> <artifactId>url-connection-client</artifactId> </dependency> Then, add the following key and value to application.properties to point the application at your local DynamoDB endpoint. Java %dev.quarkus.dynamodb.endpoint-override=http://localhost:8000 Starting Quarkus Live Coding Now you should be ready to verify the Piggybank application using Quarkus dev mode and the local DynamoDB. Start Quarkus dev mode using the following command. Shell quarkus dev The output should end like this. Shell __ ____ __ _____ ___ __ ____ ______ --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \ --\___\_\____/_/ |_/_/|_/_/|_|\____/___/ (powered by Quarkus xx.xx.xx) [io.quarkus] (Quarkus Main Thread) Profile dev activated. Live Coding activated. 2023-04-30 21:14:49,824 INFO [io.quarkus] (Quarkus Main Thread) Installed features: [amazon-dynamodb, cdi, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertx] -- Tests paused Press [r] to resume testing, [o] Toggle test output, [:] for the terminal, [h] for more options> Run the following curl commands to insert several expense items into the piggy bank (the finance table). Shell curl -X POST http://localhost:8080/entryResource -H 'Content-Type: application/json' -d '{"accountID": "Food", "description": "Shrimp", "amount": "-20", "balance": "0", "date": "2023-02-01"}' curl -X POST http://localhost:8080/entryResource -H 'Content-Type: application/json' -d '{"accountID": "Car", "description": "Flat tires", "amount": "-200", "balance": "0", "date": "2023-03-01"}' curl -X POST http://localhost:8080/entryResource -H 'Content-Type: application/json' -d '{"accountID": "Payslip", "description": "Income", "amount": "2000", "balance": "0", "date": "2023-04-01"}' curl -X POST http://localhost:8080/entryResource -H 'Content-Type: application/json' -d '{"accountID": "Utilities", "description": "Gas", "amount": "-400", "balance": "0", "date": "2023-05-01"}' Verify the stored data using the following command. Shell curl http://localhost:8080/entryResource/findAll The output should look like this. 
JSON [{"accountID":"Food","description":"Shrimp","amount":"-20","balance":"-30","date":"2023-02-01"},{"accountID":"Drink","description":"Wine","amount":"-10","balance":"-10","date":"2023-01-01"},{"accountID":"Payslip","description":"Income","amount":"2000","balance":"1770","date":"2023-04-01"},{"accountID":"Car","description":"Flat tires","amount":"-200","balance":"-230","date":"2023-03-01"},{"accountID":"Utilities","description":"Gas","amount":"-400","balance":"1370","date":"2023-05-01"}] You can also find a certain expense based on accountID. Run the following curl command again. Shell curl http://localhost:8080/entryResource/find/Drink The output should look like this. JSON {"accountID":"Drink","description":"Wine","amount":"-10","balance":"-10","date":"2023-01-01"} Conclusion You learned how Quarkus enables developers to write serverless functions that connect NoSQL databases to process dynamic data. To stand up local development environments, you quickly ran the local DynamoDB image using the docker-compose command as well. Quarkus also provide various AWS extensions including amazon-dynamodb to access the AWS cloud services directly from your Java applications. Find more information here. In the next article, you’ll learn how to create a serverless database using AWS DynamoDB and build and deploy your local serverless Java functions to AWS Lambda by enabling SnapStart.
Caches are very useful software components that every engineer should know about. They are a cross-cutting concern that applies to all tech areas and architecture layers, such as operating systems, data platforms, backends, frontends, and other components. In this article, we are going to describe what a cache is and explain specific use cases, focusing on the frontend and the client side. What Is a Cache? A cache can be defined, in a basic way, as an intermediate memory between the data consumer and the data producer that stores and provides data that will be accessed many times by the same or different consumers. It is a transparent layer for the data consumer: apart from the improved performance, the consumer does not notice it. Usually, the reusability of the data provided by the data producer is the key to taking advantage of the benefits of a cache. Performance is the other reason to use a cache system, such as in-memory databases that provide a high-performance solution with low latency, high throughput, and concurrency. For example, how many people query the weather on a daily basis, and how many times do they repeat the same query? Let's suppose that there are 1,000 people in New York checking the weather and 50% repeat the same query twice per day. In this scenario, if we can store the result of the first query as close as possible to the user's device, we achieve two benefits: a better user experience, because the data is provided faster, and a reduced number of queries to the data producer/server side. The outcome is a better user experience and a solution that will support more concurrent users on the platform. At a high level, there are two caching strategies that we can apply in a complementary way: Client/Consumer side: The cached data is stored on the consumer or user side, usually in the browser's memory when we are talking about web solutions (also called a private cache). Server/Producer side: The cached data is stored in the components of the data producer's architecture. Caches, like any other solution, have a series of advantages that we are going to summarize: Application performance: They provide faster response times because they can serve data more quickly. Reduced load on the server side: When we apply a cache in front of a system and reuse a piece of data, we avoid queries/requests to the following layer. Scalability and cost improvement: As cached data gets closer to the consumer, we increase the scalability and performance of the solution at a lower cost. Components closer to the client side are more scalable and cheaper for three main reasons: These components are focused on performance and availability but offer weaker consistency. They hold only part of the information: the data used most by the users. In the case of the browser's local cache, there is no cost for the data producer. The big challenges of caching are data consistency and data freshness, that is, how the data is kept synchronized and up to date across the organization. Depending on the use case, the requirements will be more or less restrictive: caching images is very different from caching inventory stock or sales behavior. Client-Side Caches Speaking about the client-side cache, there are different types of cache that we are going to analyze a little bit in this article: HTTP Caching: This caching type is an intermediate cache system, as it depends partially on the server. Cache API: This is a browser API that allows us to cache requests in the browser. 
Custom Local Cache: The front-end app controls the cache storage, expiration, invalidation, and updates. HTTP Caching It caches the HTTP requests for any resource (CSS, HTML, images, video, etc.) in the browser, and it manages everything related to storage, expiration, validation, fetching, etc., from the front end. From the application's point of view it is almost transparent: the app makes a request in the regular way and the browser does all the "magic." The caching is controlled with HTTP headers: on the server side, cache-specific headers are added to the HTTP response, for example, "Expires: Tue, 30 Jul 2023 05:30:22 GMT." The browser then knows this resource can be cached, and the next time the client (application) requests the same resource, if the request time is before the expiration date, the request is not made and the browser returns the local copy of the resource. It also allows you to control how responses are distinguished, as the same URL can generate different responses (and their caches should be handled differently). For example, for an API endpoint that returns some data (i.e., http://example.com/my-data), we could use a request header such as Accept to specify whether we want the response in JSON or CSV. Therefore, the cached response should be stored depending on the request header(s). For that, the server should set a response header such as Vary: Accept (or Vary: Accept-Language for language-dependent responses) to let the browser know the cache depends on that value. There are a lot of different headers to control the cache flow and behavior, but it is not the goal of this article to go deep into them; they will probably be addressed in another article. As we mentioned before, this caching type needs the server to set the resources' expiration, validation, etc. So this is not a pure frontend caching method, but it is one of the simplest ways to cache the resources the front-end application uses, and it is complementary to the other approaches we will mention below. Since it is an intermediate cache, we can even delegate it to a "piece" between the client and the server, for example, a CDN or a reverse proxy (such as Varnish). Cache API It is quite similar to the HTTP caching method, but in this case, we control which requests are stored in or retrieved from the cache. We have to manage the cache expiration ourselves (and that's not easy, because these caches were designed to live "forever"). Even though these APIs are available in windowed contexts, they are very much oriented toward usage in a worker context. This cache is well suited to offline applications. On the first request, we can fetch and cache all the resources needed (images, CSS, JS, etc.), allowing the application to work offline. It is very useful in mobile applications, for example with the use of maps for our GPS systems in addition to weather data. This allows us to have all the information for our hiking route even if we have no connection to the server. Here is one example of how it works in a windowed context: const url = 'https://catfact.ninja/breeds' caches.open('v1').then((cache) => { cache.match(url).then((res) => { if (res) { console.log('it is in cache') res.json().then((data) => console.log(data)) } else { console.log('it is NOT in cache') fetch(url).then((res) => { cache.put(url, res.clone()) }) } }) }) Custom Local Cache In some cases, we will need more control over the cached data and its invalidation (not just expiration). Cache invalidation is more than just checking the max-age of a cache entry. Imagine the weather app we mentioned above. 
This app allows the users to update the weather to reflect the real weather in a place. The app needs to make a request per city and transform the temperature values from ºF to ºC (this is a simple example; the calculations can be more expensive in other use cases). To avoid making requests to the server (even if they are cached), we can make all the requests the first time, put all the data together in a data structure that is convenient for us, and store it, for example, in the browser's IndexedDB, in localStorage, in sessionStorage, or even in memory (not recommended). The next time we want to show the data, we can get it from this cache, and not just the resource data but also the computation we did, saving network and computation time. (A minimal sketch of this idea is shown at the end of the article.) We can control the expiration of the cache by storing the issue time next to the data, and we can also control the cache invalidation. Imagine now that the user updates a value in their browser. We can simply invalidate the cache and redo the requests and calculations the next time, or go further and update our local cache with the new data. Or another user can change the value, and the server will send an event to notify all clients of the change; for example, using WebSockets, our front-end application can listen to these events and invalidate the cache or just update it. This kind of cache requires work on our side to check the caches, handle the events that can invalidate or update them, etc., but it fits very well in a hexagonal architecture where the data is consumed from the API using a port adapter (repository) that can listen to domain events to react to changes and invalidate or update some caches. This is not a generic cache solution; we need to think about whether it fits our use case, as it requires work on the front-end application side to invalidate the caches or to emit and handle data change events. In most cases, HTTP caching is enough. Conclusion Having a cache solution and a good caching strategy is a must in any software architecture; without one, our solution will be incomplete and probably not optimized. Caches are our best friends, mostly in high-performance scenarios. It may seem that the technical cache invalidation process is the challenge, but the biggest challenge is to understand the business scenarios and use cases in order to identify the requirements in terms of data freshness and consistency; these allow us to design and choose the best strategy. We will talk about other cache approaches for databases, backends, and in-memory databases in the next articles.
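To make the custom local cache described above more concrete, here is a minimal sketch of the idea: the processed data is stored together with its issue time, checked for freshness before use, and invalidated when a domain event arrives. The function names, the endpoint URL, and the 10-minute max age are illustrative assumptions, not part of the original example.

JavaScript
const MAX_AGE_MS = 10 * 60 * 1000; // treat cached data as stale after 10 minutes

async function getWeather(city) {
  const key = `weather:${city}`;
  const cached = localStorage.getItem(key);
  if (cached) {
    const { issuedAt, data } = JSON.parse(cached);
    if (Date.now() - issuedAt < MAX_AGE_MS) {
      return data; // fresh enough: no network request and no recomputation
    }
  }
  const res = await fetch(`https://example.com/api/weather?city=${encodeURIComponent(city)}`);
  const raw = await res.json();
  // Store the transformed data (e.g., the ºF to ºC conversion) so the computation is cached too
  const data = { ...raw, temperatureC: (raw.temperatureF - 32) * 5 / 9 };
  localStorage.setItem(key, JSON.stringify({ issuedAt: Date.now(), data }));
  return data;
}

// Invalidate (or update) the entry when the server notifies a change, e.g., over a WebSocket
function onWeatherChanged(city) {
  localStorage.removeItem(`weather:${city}`);
}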