A Roadmap to True Observability
Employing cloud services can incur a great deal of risk if not planned and designed correctly. In fact, this is really no different from the challenges inherent in a single on-premises data center implementation. Power outages and network issues are common examples of challenges that can put your service — and your business — at risk. For AWS cloud services, we have seen large-scale regional outages that are documented on the AWS Post-Event Summaries page. For a broader look at other cloud providers and services, the danluu/post-mortems repository provides a more holistic view of the cloud in general. It's time for service owners relying (or planning) on a single region to think hard about the best way to design resilient cloud services. While I will utilize AWS for this article, that is solely because of my level of expertise with the platform and not because one cloud platform should be considered better than another.

A Single-Region Approach Is Doomed to Fail

A cloud-based service implementation can be designed to leverage multiple availability zones. Think of availability zones as distinct locations within a specific region that are isolated from the other availability zones in that region. Consider the following cloud-based service running on AWS inside the Kubernetes platform:

Figure 1: Cloud-based service utilizing Kubernetes with multiple availability zones

In Figure 1, inbound requests are handled by Route 53, arrive at a load balancer, and are directed to a Kubernetes cluster. The controller routes requests to the service, which has three instances running, each in a different availability zone. For persistence, an Aurora Serverless database has been adopted. While this design protects against the loss of one or two availability zones, the service is considered at risk when a region-wide outage occurs, similar to the AWS outage in the US-EAST-1 region on December 7th, 2021. A common mitigation strategy is to implement stand-by patterns that can become active when unexpected outages occur. However, these stand-by approaches can lead to bigger issues if they are not consistently participating by handling a portion of all requests.

Transitioning to More Than Two

With single-region services at risk, it's important to understand how best to proceed. For that, we can draw upon the simple example of a trucking business. If you have a single driver who operates a single truck, your business is down whenever the truck or driver is unable to fulfill their duties. The immediate thought here is to add a second truck and driver. However, the better answer is to increase the fleet by two, so that the business can keep operating even if a second, unexpected issue arises while the first is still being dealt with. This is known as the "n + 2" rule, which becomes important when there are expectations set between you and your customers. For the trucking business, it might be a guaranteed delivery time. For your cloud-based service, it will likely be measured in service-level objectives (SLOs) and service-level agreements (SLAs). It is common to set SLOs as four nines, meaning your service is operating as expected 99.99% of the time.
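Translating an availability target into an allowed amount of downtime (the error budget) is simple arithmetic: multiply the allowed-failure fraction by the length of the time window. A minimal sketch in Java, assuming an average month of roughly 30.44 days:

Java
/** Minimal sketch: turning an availability target into an error budget (allowed downtime). */
public class ErrorBudget {

    static double budgetSeconds(double availabilityTarget, double windowSeconds) {
        return (1.0 - availabilityTarget) * windowSeconds;
    }

    public static void main(String[] args) {
        double fourNines = 0.9999;
        System.out.printf("Day:   %.2f seconds%n", budgetSeconds(fourNines, 24 * 3600));         // 8.64
        System.out.printf("Week:  %.2f seconds%n", budgetSeconds(fourNines, 7 * 24 * 3600));     // 60.48
        // The monthly figure depends on the assumed month length; an average month (~30.44 days) is used here.
        System.out.printf("Month: %.2f seconds%n", budgetSeconds(fourNines, 30.44 * 24 * 3600)); // ~263
    }
}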
For a four-nines SLO, this translates to the following error budgets, or downtime, for the service:

Month = 4 minutes and 21 seconds
Week = 1 minute and 0.48 seconds
Day = 8.6 seconds

If your SLAs include financial penalties, implementing the n + 2 rule becomes critical to making sure your services remain available in the wake of an unexpected regional outage. Remember, that December 7, 2021 outage at AWS lasted more than eight hours. The cloud-based service from Figure 1 can be expanded to employ a multi-region design:

Figure 2: Multi-region cloud-based service utilizing Kubernetes and multiple availability zones

With a multi-region design, requests are handled by Route 53 but are directed to the best region to handle the request. The ambiguous term "best" is used intentionally, as the criteria could be based upon geographical proximity, least latency, or both. From there, the in-region Kubernetes cluster handles the request — still with three different availability zones. Figure 2 also introduces the observability layer, which provides the ability to monitor cloud-based components and establish SLOs at the country and regional levels. This will be discussed in more detail shortly.

Getting Out of the Toil Game

Google Site Reliability Engineering's Eric Harvieux defined toil as noted below:

"Toil is the kind of work that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows."

When designing services that run in multiple regions, the amount of toil that exists with a single region becomes dramatically larger. Consider the example of creating a manager-approved change request every time code is deployed into the production instance. In the single-region example, the change request might be a bit annoying, but it is something a software engineer is willing to tolerate. Now, with two additional regions, this translates to three times the number of change requests, each requiring at least one human-based approval. An attainable and desirable end state should still include change requests, but these requests should become part of the continuous delivery (CD) lifecycle and be created automatically. Additionally, the observability layer introduced in Figure 2 should be leveraged by the CD tooling in order to monitor deployments — rolling back in the event of any unforeseen circumstances. With this approach, the need for human-based approvals is diminished, and unnecessary toil is removed from both the software engineer requesting the deployment and the approving manager.

Harnessing the Power of Observability

Observability platforms measure a system's state by leveraging metrics, logs, and traces. This means that a given service can be measured by the outputs it provides. Leading observability platforms go a step further and allow for the creation of synthetic API tests that can be used to exercise resources for a given service. Tests can include assertions that introduce expectations — like a particular GET request responding with an expected response code and payload within a given time period; otherwise, the test will be marked as failed. SLOs can be attached to each synthetic test, and each test can be executed in multiple geographical locations, all monitored from the observability platform. Taking this approach gives service owners the ability to understand service performance from multiple entry points.
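To make that concrete, and independent of any particular observability vendor, the assertion logic of such a synthetic check boils down to something like the sketch below; the endpoint URL and the 500 ms threshold are hypothetical values, not taken from the article:

Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

/** Minimal sketch of a synthetic check: a GET request must return 200 with a body within 500 ms. */
public class SyntheticCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://example.com/api/health")) // hypothetical endpoint
                .timeout(Duration.ofMillis(500))                       // hard timeout also fails the check
                .GET()
                .build();

        Instant start = Instant.now();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = Duration.between(start, Instant.now()).toMillis();

        // The assertions: expected status code, non-empty payload, and a latency budget
        boolean passed = response.statusCode() == 200
                && !response.body().isEmpty()
                && elapsedMs <= 500;
        System.out.println(passed ? "PASS" : "FAIL (" + response.statusCode() + ", " + elapsedMs + " ms)");
    }
}

An observability platform would run a check like this from several geographical locations on a schedule and attach an SLO to the pass rate.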
With the multi-region model, tests can be created and performance thereby monitored at the regional and global levels separately, producing a high degree of certainty about the level of performance being delivered in each region. In every case, the power of observability can remove the need for the manual, human-based change approvals noted above.

Bringing It All Together

From the 10,000-foot level, the multi-region service implementation from Figure 2 can be placed onto a United States map. In Figure 3, the database connectivity is mapped to demonstrate the inter-region communication, while the observability and cloud metrics data are gathered from AWS and the observability platform globally.

Figure 3: Multi-region service adoption placed near the respective AWS regions

By implementing the n + 2 rule, service owners have peace of mind that their service is fully functional in three regions. In this scenario, the implementation is prepared to survive two complete region outages. As an example, the eight-hour AWS outage referenced above would not have an impact on the service's SLOs/SLAs during the time when one of the three regions is unavailable.

Charting a Plan Toward Multi-Region

Implementing a multi-region footprint for your service without increasing toil is possible, but it does require planning. Some high-level action items are noted below:

Understand your persistence layer – Understanding your persistence layer early on is key. If multiple write regions are not a possibility, alternative approaches will be required.
Adopt Infrastructure as Code – The ability to define your cloud infrastructure via code is critical to eliminating toil and increasing the ability to adopt additional regions, or even zones.
Use containerization – The underlying service is best when containerized. Build the container you wish to deploy during the continuous integration stage and scan every layer of the container for vulnerabilities for added safety.
Reduce time to deploy – Get into the habit of releasing often, as it only makes your team stronger.
Establish SLOs and synthetics – Take the time to set SLOs for your service and write synthetic tests to constantly measure your service — across every environment.
Automate deployments – Leverage observability during the CD stage to deploy when a merge-to-main event occurs. If a dev deploys and no alerts are emitted, move on to the next environment and continue all the way to production.

Conclusion

It's important to understand the limitations of the platform where your services are running. Leveraging a single region offered by your cloud provider is only successful as long as there are zero region-wide outages. Based upon prior history, region-wide outages are certain to happen again, so a single-region design is no longer good enough; no cloud provider is ever going to be 100% immune from a region-wide outage. A better approach is to utilize the n + 2 rule and increase the number of regions your service runs in by two. In taking this approach, the service will still be able to respond to customer requests in the event of not only one regional outage but also any form of outage in a second region where the service is running. By adopting the n + 2 approach, there is a far better chance of meeting the SLAs set with your customers. Getting to this point will certainly present challenges but should also provide the opportunity to cut down (or even eliminate) toil within your organization.
In the end, your customers will benefit from increased service resiliency, and your team will benefit from significant productivity gains. Have a really great day!

Resources:

- AWS Post-Event Summaries, AWS
- Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region, AWS
- danluu/post-mortems, GitHub
- "Identifying and Tracking Toil Using SRE Principles" by Eric Harvieux, 2020
- "Failure Recovery: When the Cure Is Worse Than the Disease" by Guo et al., 2013
AIOps applies AI to IT operations, enabling agility, early issue detection, and proactive resolution to maintain service quality. AIOps integrates DataOps and MLOps, enhancing efficiency, collaboration, and transparency. It aligns with DevOps for application lifecycle management and automation, optimizing decisions throughout DataOps, MLOps, and DevOps. Observability for IT operations is a transformative approach that provides real-time insights, proactive issue detection, and comprehensive performance analysis, ensuring the reliability and availability of modern IT systems.

Why AIOps Is Fundamental to Modern IT Operations

AIOps streamlines operations by automating problem detection and resolution, leading to increased IT staff efficiency, outage prevention, improved user experiences, and optimized utilization of cloud technologies. The major contributions of AIOps are shared in Table 1:

Table 1: Contributions of AIOps (key functions and what they do)

Event correlation: Uses rules and logic to filter and group event data, prioritizing service issues based on KPIs and business metrics.
Anomaly detection: Identifies normal and abnormal behavior patterns, monitoring multiple services to predict and mitigate potential issues.
Automated incident management: Aims to automate all standardized, high-volume, error-sensitive, audit-critical, repetitive, multi-person, and time-sensitive tasks, while preserving human involvement in low-ROI and customer support-related activities.
Performance optimization: Analyzes large datasets employing AI and ML, proactively ensuring service levels and identifying the root causes of issues.
Enhanced collaboration: Fosters collaboration between IT teams, such as DevOps, by providing a unified platform for monitoring, analysis, and incident response.

How Does AIOps Work?

AIOps involves the collection and analysis of vast volumes of data generated within IT environments, such as network performance metrics, application logs, and system alerts. AIOps uses these insights to detect patterns and anomalies, providing early warnings for potential issues. By integrating with other DevOps practices, such as DataOps and MLOps, it streamlines processes, enhances efficiency, and ensures a proactive approach to problem resolution. AIOps is a crucial tool for modern IT operations, offering the agility and intelligence required to maintain service quality in complex and dynamic digital environments.

Figure 1: How AIOps works

Popular AIOps Platforms and Key Features

Leading AIOps platforms are revolutionizing IT operations by seamlessly combining AI and observability, enhancing system reliability, and optimizing performance across diverse industries. The following tools are just a few of many options:

Prometheus acts as an efficient AIOps platform by capturing time-series data, monitoring IT environments, and providing anomaly alerts.
OpenNMS automatically discovers, maps, and monitors complex IT environments, including networks, applications, and systems.
Shinken enables users to monitor and troubleshoot complex IT environments, including networks and applications.

The key features of these platforms and the role they play in AIOps are shared in Table 2:

Table 2: Key features of AIOps platforms and the corresponding tasks

Visibility: Provides insight into the entire IT environment, allowing for comprehensive monitoring and analysis.
Monitoring and management: Monitors the performance of IT systems and manages alerts and incidents.
Performance: Measures and analyzes system performance metrics to ensure optimal operation.
Functionality: Ensures that the AIOps platform offers a range of functionalities to meet various IT needs.
Issue resolution: Utilizes AI-driven insights to address and resolve IT issues more effectively.
Analysis: Analyzes data and events to identify patterns, anomalies, and trends, aiding in proactive decision-making.

Observability's Role in IT Operations

Observability plays a pivotal role in IT operations by offering the means to monitor, analyze, and understand the intricacies of complex IT systems. It enables continuous tracking of system performance, early issue detection, and root cause analysis. Observability data empowers IT teams to optimize performance, allocate resources efficiently, and ensure a reliable user experience. It supports proactive incident management, compliance monitoring, and data-driven decision-making. In a collaborative DevOps environment, observability fosters transparency and enables teams to work cohesively toward system reliability and efficiency. Data sources like logs, metrics, and traces play a crucial role in observability by providing diverse and comprehensive insights into the behavior and performance of IT systems.

Table 3 (roles of data sources: logs, metrics, traces): Event tracking, Root cause analysis, Anomaly detection, Compliance and auditing, Performance monitoring, Threshold alerts, Capacity planning, End-to-end visibility, Latency analysis, Dependency mapping

Challenges of Observability

Observability is fraught with multiple technical challenges. Accidental invisibility takes place where critical system components or behaviors are not being monitored, leading to blind spots in observability. The challenge of insufficient source data can result in incomplete or inadequate observability, limiting the ability to gain insights into system performance. Dealing with multiple information formats poses difficulties in aggregating and analyzing data from various sources, making it harder to maintain a unified view of the system.

Popular Observability Platforms and Key Features

Observability platforms offer a set of key capabilities essential for monitoring, analyzing, and optimizing complex IT systems.

OpenObserve provides scheduled and real-time alerts and reduces operational costs.
Vector allows users to collect and transform logs, metrics, and traces.
The Elastic Stack — comprising Elasticsearch, Kibana, Beats, and Logstash — can search, analyze, and visualize data in real time.

The capabilities of observability platforms include real-time data collection from various sources such as logs, metrics, and traces, providing a comprehensive view of system behavior. They enable proactive issue detection, incident management, root cause analysis, system reliability aid, and performance optimization. Observability platforms often incorporate machine learning for anomaly detection and predictive analysis. They offer customizable dashboards and reporting for in-depth insights and data-driven decision-making. These platforms foster collaboration among IT teams by providing a unified space for developers and operations to work together, fostering a culture of transparency and accountability.
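As a toy illustration of the anomaly detection capability mentioned above, the simplest statistical approach flags samples that deviate strongly from the mean of a metric series. Production platforms use far more sophisticated models, and the latency values below are made up for the example:

Java
import java.util.List;

/** Toy example: flag latency samples more than two standard deviations from the mean. */
public class AnomalyDetection {
    public static void main(String[] args) {
        List<Double> latenciesMs = List.of(120.0, 118.0, 125.0, 121.0, 119.0, 450.0, 122.0);

        double mean = latenciesMs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = latenciesMs.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .average().orElse(0);
        double stdDev = Math.sqrt(variance);

        // Report every sample whose deviation from the mean exceeds the threshold
        for (double sample : latenciesMs) {
            if (stdDev > 0 && Math.abs(sample - mean) > 2 * stdDev) {
                System.out.printf("Anomaly: %.1f ms (mean %.1f ms, stddev %.1f ms)%n", sample, mean, stdDev);
            }
        }
    }
}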
Leveraging AIOps and Observability for Enhanced Performance Analytics Synergizing AIOps and observability represents a cutting-edge strategy to elevate performance analytics in IT operations, enabling data-driven insights, proactive issue resolution, and optimized system performance. Observability Use Cases Best Supported by AIOps Elevating cloud-native and hybrid cloud observability with AIOps: AIOps transcends the boundaries between cloud-native and hybrid cloud environments, offering comprehensive monitoring, anomaly detection, and seamless incident automation. It adapts to the dynamic nature of cloud-native systems while optimizing on-premises and hybrid cloud operations. This duality makes AIOps a versatile tool for modern enterprises, ensuring a consistent and data-driven approach to observability, regardless of the infrastructure's intricacies. Seamless collaboration of dev and ops teams with AIOps: AIOps facilitates the convergence of dev and ops teams in observability efforts. By offering a unified space for data analysis, real-time monitoring, and incident management, AIOps fosters transparency and collaboration. It enables dev and ops teams to work cohesively, ensuring the reliability and performance of IT systems. Challenges To Adopting AIOps and Observability The three major challenges to adopting AIOps and observability are data complexity, integration complexity, and data security. Handling the vast and diverse data generated by modern IT environments can be overwhelming. Organizations need to manage, store, and analyze this data efficiently. Integrating AIOps and observability tools with existing systems and processes can be complex and time-consuming, potentially causing disruptions if not executed properly. The increased visibility into IT systems also raises concerns about data security and privacy. Ensuring the protection of sensitive information is crucial. Impacts and Benefits of Combining AIOps and Observability Across Sectors The impacts and benefits of integrating AIOps and observability transcend industries, enhancing reliability, efficiency, and performance across diverse sectors. It helps in improved incident response by using machine learning to detect patterns and trends, enabling proactive issue resolution, and minimizing downtime. Predictive analytics anticipates capacity needs and optimizes resource allocation in advance, which ensures uninterrupted operations. Full-stack observability leverages data from various sources — including metrics, events, logs, and traces (MELT) — to gain comprehensive insights into system performance, supporting timely issue identification and resolution. MELT capabilities are the key drivers where metrics help pinpoint issues, events automate alert prioritization, logs aid in root cause analysis, and traces assist in locating problems within the system. All contribute to improved operational efficiency. APPLICATION SCENARIOS OF COMBINING AIOPS AND OBSERVABILITY Industry Sectors Key Contributions Finance Enhance fraud detection, minimize downtime, and ensure compliance with regulatory requirements, thus safeguarding financial operations. Healthcare Improve patient outcomes by guaranteeing the availability and performance of critical healthcare systems and applications, contributing to better patient care. Retail Optimize supply chain operations, boost customer experiences, and maintain online and in-store operational efficiency. 
Manufacturing Enhance the reliability and efficiency of manufacturing processes through predictive maintenance and performance optimization. Telecommunications Support network performance to ensure reliable connectivity and minimal service disruptions. E-commerce Real-time insights into website performance, leading to seamless shopping experiences and improved conversion rates. Table 4

The application scenarios of combining AIOps and observability span diverse industries, showcasing their transformative potential in improving system reliability, availability, and performance across the board.

Operational Guidance for AIOps Implementation

Operational guidance for AIOps implementation offers a strategic roadmap to navigate the complexities of integrating AI into IT operations, ensuring successful deployment and optimization.

Figure 2: Steps for implementing AIOps

The Future of AIOps in Observability: The Road Ahead

AIOps' future in observability promises to be transformative. As IT environments become more complex and dynamic, AIOps will play an increasingly vital role in ensuring system reliability and performance and will continue to evolve, integrating with advanced technologies like cognitive automation, natural language understanding (NLU), large language models (LLMs), and generative AI.

Table 5: Impact areas, the role of AIOps, its synergy with cognitive automation, and LLM/generative AI integration

Data collection and analysis: Collects and analyzes a wide range of IT data, including performance metrics, logs, and incidents. With cognitive automation, it can process unstructured data such as emails, documents, and images; with LLMs and generative AI, it can predict potential issues based on historical data patterns and generate reports.
Incident management: Automatically detects, prioritizes, and responds to IT incidents. With cognitive automation, it can extract relevant information from incident reports and suggest or implement appropriate actions; with LLMs and generative AI, it can understand an incident's context and generate appropriate responses.
Root cause analysis: Identifies root causes of incidents. With cognitive automation, it can access historical documentation and knowledge bases to offer detailed explanations and solutions; with LLMs and generative AI, it can provide recommendations for resolving issues by analyzing historical data.
NLU: Uses NLU to process user queries and understand context. With cognitive automation, it can engage in natural language conversations with IT staff or end users, improving user experiences; with LLMs and generative AI, it can power chatbots and virtual IT assistants, offering user-friendly interaction and support to answer queries and provide guidance.

Conclusion

The fusion of AI/ML with AIOps has ushered in a new era of observability. IT operations are constantly evolving, and so is the capability to monitor, analyze, and optimize performance. In the age of AI/ML-driven observability, our IT operations won't merely survive, but will thrive, underpinned by data-driven insights, predictive analytics, and an unwavering commitment to excellence.

References:

- OpenNMS repositories, GitHub
- OpenObserve repositories, GitHub
- OpsPAI/awesome-AIOps, GitHub
- Precompiled binaries and Docker images for Prometheus components
- Shinken documentation
This blog post focuses on optimizing the size of JVM Docker images. It explores various techniques such as multi-stage builds, jlink, jdeps, and experimenting with base images. By implementing these optimizations, deployments can be faster, and resource usage can be optimized.

The Problem

Since Java 11, there is no pre-bundled JRE provided. As a result, basic Dockerfiles without any optimization can result in large image sizes. In the absence of a provided JRE, it becomes necessary to explore techniques and optimizations to reduce the size of JVM Docker images. Now, let's take a look at the simplest version of the Dockerfile for our application and see what's wrong with it. The project we will use in all the examples is Spring Petclinic. The simplest Dockerfile for our project looks like this (note: do not forget to build your JAR file first):

Dockerfile
FROM eclipse-temurin:17
VOLUME /tmp
COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar app.jar

After we have built the JAR file of our project, let's build our Docker image and compare the sizes of our JAR file and the created Docker image.

Shell
docker build -t spring-pet-clinic/jdk -f Dockerfile .
docker image ls spring-pet-clinic/jdk
# REPOSITORY              TAG     IMAGE ID      CREATED         SIZE
# spring-pet-clinic/jdk   latest  3dcd0ab89c3d  23 minutes ago  465MB

If we look at the SIZE column, we can see that the size of our Docker image is 465MB! That's a lot, you might think, but maybe it's because our JAR is pretty big? In order to verify this, let's take a look at the size of our JAR file using the following command:

Shell
ls -lh target/spring-petclinic-3.1.0-SNAPSHOT.jar | awk '{print $9, $5}'
# target/spring-petclinic-3.1.0-SNAPSHOT.jar 55M

According to the output of our command, you can see that the size of our JAR file is only 55MB. If we compare it to the size of the built Docker image, our JAR file is almost nine times smaller! Let's move on to analyze the reasons and how to make it smaller.

What Are the Reasons for Big Docker Images, and How To Reduce Them?

Before we move on to the optimization of our Docker image, we need to find out what exactly is causing it to be so relatively large. To do this, we will use a tool called Dive, which is used for exploring a Docker image and its layer contents and for discovering ways to shrink the size of your Docker/OCI image. To install Dive, follow the guide in their README. Now, let's find out why our Docker image is this large by exploring its layers with this command: dive spring-pet-clinic/jdk (instead of spring-pet-clinic/jdk, use your own Docker image name). Its output may feel a little bit overwhelming, but don't worry, we will explore it together. For our purpose, we are mostly interested only in the top left part, which lists the layers of our Docker image. We can navigate between layers using the arrow keys. Now, let's find out which layers our Docker image consists of. Remember, these are the layers of the Docker image built from our basic Dockerfile. The first layer is our operating system; by default, it is Ubuntu. The next one installs tzdata, curl, wget, locales, and some other utilities, which takes 50MB! The third layer is our entire Eclipse Temurin 17 JDK, and it takes 279MB, which is pretty big. And the last one is our built JAR, which takes 58MB.
Now that we understand what our Docker image consists of, we can see that a big part of it includes the entire JDK and things such as timezones, locales, and various utilities, which is unnecessary. The first optimization for our Docker images is to use the jlink tool, included since Java 9 along with modularity. With jlink, we can create a custom Java runtime that includes only the necessary components, resulting in a smaller final image. Now, let's take a look at our new Dockerfile incorporating the jlink tool, which, in theory, should produce a smaller image than the previous one.

Dockerfile
# Example of custom Java runtime using jlink in a multi-stage container build
FROM eclipse-temurin:17 as jre-build

# Create a custom Java runtime
RUN $JAVA_HOME/bin/jlink \
    --add-modules ALL-MODULE-PATH \
    --strip-debug \
    --no-man-pages \
    --no-header-files \
    --compress=2 \
    --output /javaruntime

# Define your base image
FROM debian:buster-slim
ENV JAVA_HOME=/opt/java/openjdk
ENV PATH "${JAVA_HOME}/bin:${PATH}"
COPY --from=jre-build /javaruntime $JAVA_HOME

# Continue with your application deployment
RUN mkdir /opt/app
COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]

To understand how our new Dockerfile works, let's walk through it:

We use a multi-stage Docker build in this Dockerfile, and it consists of two stages.
For the first stage, we use the same base image as in the previous Dockerfile. We also employ the jlink tool to create a custom JRE, including all Java modules via --add-modules ALL-MODULE-PATH.
The second stage uses the debian:buster-slim base image and sets the environment variables for JAVA_HOME and PATH. It copies the custom JRE created in the first stage into the image.
The Dockerfile then creates a directory for the application, copies the application JAR file into it, and specifies a command to run the Java application when the container starts.

Let's now build our container image and find out how much smaller it has become.

Shell
docker build -t spring-pet-clinic/jlink -f Dockerfile_jlink .
docker image ls spring-pet-clinic/jlink
# REPOSITORY                TAG     IMAGE ID      CREATED      SIZE
# spring-pet-clinic/jlink   latest  e7728584dea5  1 hours ago  217MB

Our new container image is 217MB in size, which is two times smaller than our previous one.

Stripping Container Image Size Even More Using the Java Dependency Analysis Tool (jdeps)

What if I told you that the size of our container image can be made even smaller? When paired with jlink, you can also use the Java Dependency Analysis Tool (jdeps), first introduced in Java 8, to understand the static dependencies of your applications and libraries. In our previous example, we set the jlink --add-modules parameter to ALL-MODULE-PATH, which adds all existing Java modules to our custom JRE, and obviously, we don't need to include every module. Instead, we can use jdeps to analyze the project's dependencies and leave out any unused modules, further reducing the image size.
Let's take a look at how to use jdeps in our Dockerfile:

Dockerfile
# Example of custom Java runtime using jlink in a multi-stage container build
FROM eclipse-temurin:17 as jre-build

COPY target/spring-petclinic-3.1.0-SNAPSHOT.jar /app/app.jar
WORKDIR /app

# List jar modules
RUN jar xf app.jar
RUN jdeps \
    --ignore-missing-deps \
    --print-module-deps \
    --multi-release 17 \
    --recursive \
    --class-path 'BOOT-INF/lib/*' \
    app.jar > modules.txt

# Create a custom Java runtime
RUN $JAVA_HOME/bin/jlink \
    --add-modules $(cat modules.txt) \
    --strip-debug \
    --no-man-pages \
    --no-header-files \
    --compress=2 \
    --output /javaruntime

# Define your base image
FROM debian:buster-slim
ENV JAVA_HOME=/opt/java/openjdk
ENV PATH "${JAVA_HOME}/bin:${PATH}"
COPY --from=jre-build /javaruntime $JAVA_HOME

# Continue with your application deployment
RUN mkdir /opt/server
COPY --from=jre-build /app/app.jar /opt/server/
CMD ["java", "-jar", "/opt/server/app.jar"]

Even without going into details, you can see that our Dockerfile has become much larger. Now let's analyze each piece and what it is responsible for:

We still use a multi-stage Docker build.
We copy our built Java app and set WORKDIR to /app.
The first RUN instruction unpacks the JAR file, making its contents accessible for the jdeps tool.
The second RUN instruction runs the jdeps tool on the extracted JAR file to analyze its dependencies and create a list of required Java modules. Here's what each option does:
--ignore-missing-deps: Ignores any missing dependencies, allowing the analysis to continue.
--print-module-deps: Specifies that the analysis should print the module dependencies.
--multi-release 17: Indicates that the application JAR is compatible with multiple Java versions, in our case, Java 17.
--recursive: Performs a recursive analysis to identify dependencies at all levels.
--class-path 'BOOT-INF/lib/*': Defines the classpath for the analysis, instructing jdeps to look in the BOOT-INF/lib directory within the JAR file.
app.jar > modules.txt: Redirects the output of the jdeps command to a file named modules.txt, which will contain the list of Java modules required by the application.
Then, we replace the ALL-MODULE-PATH value of the --add-modules jlink parameter with $(cat modules.txt) to include only the necessary modules.
The "Define your base image" section stays the same as in the previous Dockerfile.
The "Continue with your application deployment" section was modified to COPY our JAR file from the previous stage.

The only thing left to do is to see how much the container image has shrunk using our latest Dockerfile:

Shell
docker build -t spring-pet-clinic/jlink_jdeps -f Dockerfile_jdeps .
docker image ls spring-pet-clinic/jlink_jdeps
# REPOSITORY                      TAG     IMAGE ID      CREATED      SIZE
# spring-pet-clinic/jlink_jdeps   latest  d24240594f1e  3 hours ago  184MB

So, by using only the modules we need to run our application, we reduced the size of our container image by another 33MB, not a lot, but still nice.

Conclusion

Let's take another look, using Dive, at how our Docker images have shrunk after our optimizations. Instead of using the entire JDK, we built a custom JRE using the jlink tool and the debian:buster-slim base image, which significantly reduced our image size. And, as you can see, we no longer carry unnecessary baggage such as timezones, locales, a big OS, and the entire JDK. We include only what we use and need.

Dockerfile_jlink

Here, we went even further and passed only the Java modules we actually use to our JRE, making the built JRE even smaller, thus reducing the size of the entire final image.
Dockerfile_jdeps

In conclusion, reducing the size of JVM Docker images can significantly optimize resource usage and speed up deployments. Employing techniques like multi-stage builds, jlink, jdeps, and experimenting with base images can make a substantial difference. While the size reduction might seem minimal in some cases, the cumulative effect can be significant, especially in environments where multiple containers are running. Thus, optimizing Docker images should be a key consideration in any application development and deployment process.
Hello everyone! In this article, I want to share my knowledge and opinion about the data types that are often used as identifiers. Today we will touch on two topics at once: measurements of search speed by key, and the data types used for the key on the database side. I will use a PostgreSQL database and a demo Java service to compare query speeds.

UUID and ULID

Why do we need these seemingly obscure types for IDs? I won't talk about distributed systems, connectivity of services, sensitive data, and the like. If someone is interested in this, they can Google it - at the moment we are interested in performance. As the name of this section suggests, we will talk about two types of keys: UUID and ULID. UUID has long been known to everyone, but ULID may be unfamiliar to some. The main advantage of ULID is that it is monotonically increasing and is a sortable type. Naturally, these are not all the differences. Personally, I also like the fact that there are no special characters in it. A small digression: I noticed a long time ago that many teams use the varchar(36) data type to store UUIDs in a PostgreSQL database, and I don't like this, since this database has a corresponding data type for UUID. A little later, we will see which type is preferable on the database side. Therefore, we will look not only at a comparison of the two data types on the backend side but also at the difference when storing UUIDs in different formats on the database side.

Comparison

So let's start comparing things. A UUID is 36 characters long and takes up 128 bits of memory. A ULID is 26 characters long and also takes up 128 bits of memory. For my examples, I created two tables in the database, each with three fields:

SQL
CREATE TABLE test.speed_ulid
(
    id      varchar(26) PRIMARY KEY,
    name    varchar(50),
    created timestamp
);

CREATE TABLE test.speed_uuid
(
    id      varchar(36) PRIMARY KEY,
    name    varchar(50),
    created timestamp
);

For the first comparison, I stored the UUID in varchar(36) format, as is often done. I recorded 1,000,000 rows in each of the tables. The test case will consist of 100 requests using identifiers previously pulled from the database; that is, when calling the test method, we will access the database 100 times and retrieve the entity by key. The connection will be created and warmed up before measurement. We will conduct two test runs and then 10 effective iterations. For your convenience, I will provide a link to the Java code at the end of the article. The measurements were taken on a standard MacBook Pro laptop and not on a dedicated server, but I don't believe there would be a significant difference in the results other than increased time spent on network traffic between the database and the backend. Here is some background information:

# CPU: I9-9980HK
# CPU count: 16
# RAM: 32GB
# JMH version: 1.37
# VM version: JDK 11.0.12, Java HotSpot(TM) 64-Bit Server VM, 11.0.12+8-LTS-237
# DB: PostgreSQL 13.4, build 1914, 64-bit

The queries that will be used to obtain an entity by key:

SQL
SELECT * FROM test.speed_ulid WHERE id = ?
SELECT * FROM test.speed_uuid WHERE id = ?

Measurement Results

Let's look at the measurement results. Let me remind you that each table has 1,000,000 rows.

Both Types of Identifiers Are Stored in the Database as varchar

I ran this test several times, and the result was about the same: either the ULID was a little faster, or the UUID was. In percentage terms, the difference is practically zero, so you might conclude that there is no difference between these types at all.
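For reference, the general shape of such a JMH benchmark is sketched below. This is a simplified illustration rather than the project's actual code (that is linked at the end of the article); the JDBC URL and credentials are placeholders:

Java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2)
@Measurement(iterations = 10)
@Fork(1)
public class UlidLookupBenchmark {

    private Connection connection;
    private List<String> ids = new ArrayList<>();

    @Setup
    public void setUp() throws Exception {
        // Placeholder connection settings - adjust to your environment
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "postgres", "postgres");
        // Pull 100 existing identifiers up front so the benchmark measures only the lookups
        try (PreparedStatement ps = connection.prepareStatement(
                     "SELECT id FROM test.speed_ulid LIMIT 100");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                ids.add(rs.getString(1));
            }
        }
    }

    @Benchmark
    public void findByUlid(Blackhole blackhole) throws Exception {
        // 100 primary-key lookups per invocation; the UUID variant is identical
        // except that it queries test.speed_uuid
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT * FROM test.speed_ulid WHERE id = ?")) {
            for (String id : ids) {
                ps.setString(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        blackhole.consume(rs.getString("name"));
                    }
                }
            }
        }
    }

    @TearDown
    public void tearDown() throws Exception {
        connection.close();
    }
}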
I would say that this only holds as long as we do not use other data types on the database side.

UUID as uuid, ULID as varchar in DB

For the next test, I changed the data type from varchar(36) to uuid in the test.speed_uuid table. In this case, the difference is obvious: 4.5% in favor of UUID. As you can see, it makes sense to use the uuid data type on the database side when there is a type of the same name on the service side. The index for this format is very well optimized in PostgreSQL and shows good results. Well, now we can definitely part ways. Or not? If you look at the index search query plan in the case when we use varchar, you can see the following: ((id)::text = '01HEE5PD6HPWMBNF7ZZRF8CD9R'::text). In general, comparing two text values is a rather slow operation, so maybe there is no need to store the ID in this format. Or are there other ways to speed up key comparison? First, let's create another index of the kind "hash" for the table with ULID.

SQL
create index speed_ulid_id_index on test.speed_ulid using hash (id);

Let's look at the execution plan for our query: we will see that the database uses the hash index, and not a btree, in this case. Let's run our test and see what happens.

varchar + index(hash) for ULID, uuid for UUID

This combination gave an increase of 2.3% relative to uuid and its native index. I'm not sure that keeping two indexes on one field can really be justified, so it's worth considering whether there's more you can do. And here it's worth looking into the past and remembering how uuid or other string identifiers used to be stored. That's right: either text or a byte array. So let's try this option: I removed all the indexes for the ULID, cast it to bytea, and recreated the primary key.

bytea for ULID, uuid for UUID

As a result, we got approximately the same result as in the previous run with an additional index, but I personally like this option better.

Measurement result with 2,000,000 rows in the database:

Measurement result with 3,000,000 rows in the database:

I think there is no point in continuing the measurements further. The pattern remains: ULID saved as bytea slightly outperforms UUID saved as uuid in the DB. Compared with the data from the first measurements, it is clear that with these small manipulations you can gain about 9% in performance relative to the plain varchar approach. So, if you have read this far, I assume the article was interesting to you and you have already drawn some conclusions for yourself. It is worth noting that the measurements were made under ideal conditions for both the backend and the database: there were no parallel processes writing to the database, changing records, or performing complex calculations on the back-end side.

Conclusions

Let's go over the material. What did you learn that was useful?

Do not neglect the uuid data type on the PostgreSQL side. Perhaps someday extensions for ULID will appear in this database, but for now, we have what we have.
Sometimes it is worth creating an additional index of the desired type manually, but there is an overhead to consider.
If you are not afraid of a bit of extra work - namely, writing your own converters for types - then you should try bytea if there is no corresponding type for your identifier on the database side (a sketch of such a conversion follows below).

What type of data should be used for the primary key, and in what format should it be stored? I don't have a definite answer to these questions: it all depends on many factors.
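To illustrate what such a converter involves, here is a minimal, self-contained sketch that decodes a 26-character Crockford Base32 ULID string into the 16 bytes you would bind to a bytea column. It is an illustration of the idea, not the project's actual converter; in a real service you would more likely wrap this in a JPA AttributeConverter or use an existing ULID library:

Java
import java.math.BigInteger;

/** Minimal sketch: converting a 26-char ULID string into a 16-byte array for a bytea column. */
public final class UlidBytes {

    // Crockford Base32 alphabet used by ULID (no I, L, O, or U)
    private static final String CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    /** Decodes a Crockford Base32 ULID (26 characters) into its 128-bit binary form. */
    public static byte[] toBytes(String ulid) {
        if (ulid == null || ulid.length() != 26) {
            throw new IllegalArgumentException("ULID must be 26 characters");
        }
        BigInteger value = BigInteger.ZERO;
        for (char c : ulid.toUpperCase().toCharArray()) {
            int digit = CROCKFORD.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid ULID character: " + c);
            }
            value = value.shiftLeft(5).or(BigInteger.valueOf(digit));
        }
        byte[] raw = value.toByteArray();      // may be shorter or longer than 16 bytes
        byte[] out = new byte[16];
        int copy = Math.min(raw.length, 16);
        System.arraycopy(raw, raw.length - copy, out, 16 - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // The identifier taken from the query plan shown earlier in the article
        byte[] bytes = toBytes("01HEE5PD6HPWMBNF7ZZRF8CD9R");
        System.out.println("Decoded to " + bytes.length + " bytes");
    }
}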
It is also worth noting that a competent choice of data type for ID, and not only for it, can at some point play an important role in your project. I hope this article was useful to you. Good luck! Project on GitHub
In contemporary web development, a recurring challenge revolves around harmonizing the convenience and simplicity of using a database with a web application. My name is Viacheslav Aksenov, and in this article, I aim to explore several of the most popular approaches for integrating databases and web applications within the Kubernetes ecosystem. These examples are examined within the context of a testing environment, where constraints are more relaxed. However, these practices can serve as a foundation applicable to production environments as well. One Service, One Database. Why? Running a database alongside a microservice aligns with the principles outlined in the Twelve-Factor App methodology. One key factor is "Backing Services" (Factor III), which suggests treating databases, message queues, and other services as attached resources to be attached or detached seamlessly. By co-locating the database with the microservice, we adhere to the principle of having a single codebase that includes the application and its dependencies, making it easier to manage, scale, and deploy. Additionally, it promotes encapsulation and modularity, allowing the microservice to be self-contained and portable across different environments, following the principles of the Twelve-Factor App. This approach enhances the maintainability and scalability of the entire application architecture. For this task, you can leverage various tools, and one example is using KubeDB. What Is KubeDB? KubeDB is an open-source project that provides a database management framework for Kubernetes, an open-source container orchestration platform. KubeDB simplifies the deployment, management, and scaling of various database systems within Kubernetes clusters. We used the following benefits from using this tool: Database operators: Postgres operator to simplify the process of deploying and managing database instances on Kubernetes. Monitoring and alerts: KubeDB integrates with monitoring and alerting tools like Prometheus and Grafana, enabling you to keep an eye on the health and performance of your database instances. Security: KubeDB helps you set up secure access to your databases using authentication mechanisms and secrets management. And it is very easy to set up the deployment. 
deployment.yaml:

YAML
apiVersion: kubedb.com/v1alpha2
kind: PostgreSQL
metadata:
  name: your-postgresql
spec:
  version: "11"
  storageType: Durable
  storage:
    storageClassName: <YOUR_STORAGE_CLASS>
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
  terminationPolicy: WipeOut
  databaseSecret:
    secretName: your-postgresql-secret
  databaseURLFromSecret: true
  replicas: 1
  users:
    - name: <YOUR_DB_USER>
      passwordSecret:
        secretName: your-postgresql-secret
        passwordKey: password
      databaseName: <YOUR_DB_NAME>

Then, you can use the credentials and properties of this database to connect your service's pod to it with deployment.yaml:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-microservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your-microservice
  template:
    metadata:
      labels:
        app: your-microservice
    spec:
      containers:
        - name: your-microservice-container
          image: your-microservice-image:tag
          ports:
            - containerPort: 80
          env:
            - name: DATABASE_URL
              value: "postgres://<YOUR_DB_USER>:<YOUR_DB_PASSWORD>@<YOUR_DB_HOST>:<YOUR_DB_PORT>/<YOUR_DB_NAME>"
---
apiVersion: v1
kind: Service
metadata:
  name: your-microservice-service
spec:
  selector:
    app: your-microservice
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

And if, for some reason, you are not ready to use KubeDB or don't require the full functionality of their product, you can use a PostgreSQL container as a sidecar for your test environment.

Postgres Container as a Sidecar

In the context of Kubernetes and databases like PostgreSQL, a sidecar is a separate container that runs alongside the main application container within a pod. The sidecar pattern is commonly used to enhance or extend the functionality of the main application container without directly impacting its core logic. Let's see an example of a configuration for a small Spring Boot Kotlin service that handles cat names.

deployment.yaml:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cat-svc
  labels:
    app: cat-svc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cat-svc
  template:
    metadata:
      labels:
        app: cat-svc
        type: http
    spec:
      containers:
        - name: cat-svc
          image: cat-svc:0.0.1
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 30
            timeoutSeconds: 10
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 60
            timeoutSeconds: 10
            periodSeconds: 30
          env:
            - name: PLACES_DATABASE
              value: localhost:5432/cats
            - name: POSTGRES_USER
              value: pwd
            - name: POSTGRES_PASSWORD
              value: postgres
        - name: cat-postgres
          image: postgres:11.1
          ports:
            - name: http
              containerPort: 5432
              protocol: TCP
          env:
            - name: POSTGRES_USER
              value: pwd
            - name: POSTGRES_PASSWORD
              value: postgres
            - name: POSTGRES_DB
              value: cats

Dockerfile
FROM gradle:8.3.0-jdk17
COPY . .
EXPOSE 8080
CMD ["gradle", "bootRun"]

And for a local run, it is possible to use docker-compose with the following configuration.
docker-compose.yaml:

YAML
version: '3.8'
services:
  cat-postgres:
    image: postgres:12.13
    restart: always
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres
      POSTGRES_DB: cats
#    volumes:
#      - ./init.sql:/docker-entrypoint-initdb.d/create_tables.sql - if you want to run any script before an app
#      - ./db-data/:/var/lib/postgresql/data/
  service:
    image: cat-svc:0.0.1
    restart: always
    ports:
      - '8080:8080'
    environment:
      SPRING_PROFILES_ACTIVE: prod
      PLACES_DATABASE: cat-postgres:5432/cats
      POSTGRES_PASSWORD: postgres
      POSTGRES_USER: postgres

Migrations

The big thing that has to be decided before using this approach is the migration question. The best option here is to delegate the migration process to a tool that can work within your app infrastructure. For example, in the Java world, you could use Flyway or Liquibase. Flyway is a popular open-source database migration tool. It allows you to version control your database schema and apply changes in a structured manner. Flyway supports multiple databases, including PostgreSQL, MySQL, and Oracle. Liquibase is an open-source database migration tool that supports tracking, managing, and applying database changes. It provides a way to define database changes using XML, YAML, or SQL, and it supports various databases.

Pros of Using a PostgreSQL Sidecar in Kubernetes

Separation of concerns: Sidecars allow you to separate specific functionalities (e.g., database migrations, backups) from the main application logic.
Compliance with microservice architecture.
Simplified deployment: Sidecars can be deployed and managed alongside the main application using the same deployment configurations, simplifying the overall deployment process.
No separate test database: You don't need to maintain a separate database for the test environment, which reduces the complexity of tests (you don't need to think about collisions while many CI runs execute tests against the same table).

Cons of Using a PostgreSQL Sidecar in Kubernetes

Resource overhead: Running additional containers consumes resources (CPU, memory) on the node, which may impact the overall performance and resource utilization of the Kubernetes cluster. It's best to use as few resources as possible.
Startup order: The main application may become dependent on the sidecar for certain functionalities, potentially leading to issues if there are discrepancies or version mismatches between the main application and the sidecar. Arranging containers in a specific order without additional configuration can be somewhat challenging. However, this shouldn't pose a problem in test environments due to the quick startup of the PostgreSQL container. In most scenarios, the PostgreSQL container will start before any of your business applications. Even if the application attempts to run before PostgreSQL is ready, it will encounter a failure and be automatically restarted by the default Kubernetes mechanism until the database becomes available.
Learning curve: Adopting the sidecar pattern may require a learning curve for development and operations teams, particularly if they are new to the concept of containerized sidecar architectures. Once the setup is complete, new team members should encounter no issues with this approach.

Conclusion

In conclusion, the choice between using KubeDB and the PostgreSQL sidecar approach for integrating web applications and databases in a test environment ultimately depends on your specific requirements and preferences.
KubeDB offers a comprehensive solution with Kubernetes-native features, streamlining the management of databases alongside web services. On the other hand, the PostgreSQL sidecar approach provides flexibility and fine-grained control over how databases and web applications interact. Whether you opt for the simplicity and seamless integration provided by KubeDB or the customization potential inherent in the sidecar pattern, both approaches lay a solid foundation for test environments. The key lies in understanding the unique demands of your project and selecting the method that aligns best with your development workflow, scalability needs, and overall application architecture. Whichever path you choose, the insights gained from exploring these approaches in a test setting can pave the way for a robust and efficient integration strategy in your production environment.
What Is a Micro Frontend? Micro frontends are an architectural approach that extends the concept of microservices to the front end of web applications. In a micro frontend architecture, a complex web application is broken down into smaller, independently deployable, and maintainable units called micro frontends. Each micro frontend is responsible for a specific part of the user interface and its related functionality. Key characteristics and concepts of micro frontends include: Independence: Micro frontends are self-contained and independently developed, tested, and deployed. This autonomy allows different teams to work on different parts of the application with minimal coordination. Technology agnostic: Each micro frontend can use different front-end technologies (e.g., Angular, React, Vue.js) as long as they can be integrated into the parent or Shell application. Isolation: Micro frontends are isolated from each other, both in terms of code and dependencies. This isolation ensures that changes in one micro frontend do not impact others. Integration: A container or Shell application is responsible for integrating and orchestrating the micro frontends. It provides the overall structure of the user interface and handles routing between micro frontends. Independent deployment: Micro frontends can be deployed independently, allowing for continuous delivery and faster updates. This reduces the risk of regression issues and accelerates the release cycle. Loose coupling: Micro frontends communicate through well-defined APIs and shared protocols, such as HTTP, allowing them to be loosely coupled. This separation of concerns simplifies development and maintenance. User Interface composition: The container application assembles the user interface by composing the micro frontends together. This composition can be done on the server-side (Server-Side Includes) or client-side (Client-Side Routing). Scaling and performance: Micro frontends enable the horizontal scaling of specific parts of an application, helping to optimize performance for frequently accessed areas. Decentralized teams: Different teams or development groups can work on individual micro frontends. This decentralization is advantageous for large or distributed organizations. Micro frontend architectures are particularly useful in large, complex web applications, where a monolithic approach might lead to development bottlenecks, increased complexity, and difficulties in maintaining and scaling the application. By using micro frontends, organizations can achieve greater flexibility, agility, and maintainability in their front-end development processes, aligning with the broader trend of microservices in the world of software architecture. Micro Frontends Hosted Into a Single Shell UI Let's look at how two Angular micro frontends can be hosted into a single Shell UI. To host two Angular micro frontends in a single Shell Angular UI, you can use a micro frontend framework like single-spa or qiankun to achieve this. These frameworks enable you to integrate multiple independently developed micro frontends into a single application Shell. Here’s a high-level overview of how to set up such an architecture: 1. Create the Shell Angular Application Set up your Shell Angular application as the main container for hosting the micro frontends. You can create this application using the Angular CLI or any other preferred method. 2. Create the Micro Frontends Create your two Angular micro frontends as separate Angular applications. 
Each micro frontend should have its own routing and functionality. 3. Configure Routing for Micro Frontends In each micro frontend application, configure the routing so that each micro frontend has its own set of routes. You can use Angular routing for this. 4. Use a Micro Frontend Framework Integrate a micro frontend framework like single-spa or qiankun into your Shell Angular application. Here's an example of how to use single-spa in your Shell Angular application. Install single-spa:

npm install single-spa

Shell Angular Application Code: In your Shell Angular application, configure single-spa to load the micro frontends.

import { registerApplication, start } from 'single-spa';

// Register the micro frontends
registerApplication({
  name: 'customer-app',
  app: () => System.import('customer-app'), // Load customer-app
  activeWhen: (location) => location.pathname.startsWith('/customer-app'),
});

registerApplication({
  name: 'accounts-app',
  app: () => System.import('accounts-app'), // Load accounts-app
  activeWhen: (location) => location.pathname.startsWith('/accounts-app'),
});

// Start single-spa
start();

5. Host Micro Frontends Configure your Shell Angular application's routing to direct to the respective micro frontends based on the URL. For example, when a user accesses /customer-app, the shell should load the customer micro frontend, and for /accounts-app, it should load the accounts micro frontend. 6. Development and Build Develop and build your micro frontends separately. Each should be a standalone Angular application. 7. Deployment Deploy the Shell Angular application along with the micro frontends, making sure they are accessible from the same domain. With this setup, your Shell Angular application acts as the main container for hosting the micro frontends, and you can navigate between them within the shell's routing. This gives you a single Angular UI that hosts multiple micro frontends, each with its own functionality.
In my previous article, I hinted at explaining how Ansible can be used to expose applications running inside a high availability K8s cluster to the outside world. This post will show how this can be achieved using a K8s ingress controller and load balancer. This example uses the same setup as last time around: virtual machines running under the default Windows hypervisor (Hyper-V). To make room for the addition of a proxy, each VM had to give up some RAM. With the exception of the initial master and the Ansible runner, each of the remaining nodes received an allocation of 2000MB. A new version of the sample project is available at GitHub with a new playbook called k8s_boot.yml. This playbook boots up the entire cluster instead of having to run multiple playbooks one after the other. It configures the cluster according to the specification of the inventory file. The flow of execution could be better, but I changed the underlying playbooks as little as possible so readers of previous posts can still find their way. Since the architecture of this post might seem complex at first encounter, an architectural diagram is included towards the very bottom to clarify the landscape. Master and Commanders In the previous article, I alluded to the fact that a high availability cluster requires multiple co-masters to provide backup should the current master act up. We will start off by investigating how this redundancy is used to establish high availability. The moment a co-master loses comms with the master, it nominates itself to become the next master. Each of the remaining masters then has to acknowledge its claim upon receiving news of its candidacy. However, another co-master can also notice the absence of the current master before receiving word of a candidacy and nominate itself. Should 50% of the vote be the requirement to assume control, it is possible for two control planes to each attract 50% and think itself the master. Such a cluster will go split-brain, with two masters orchestrating a bunch of very confused worker nodes. For this reason, K8s implements the Raft protocol, which follows the typical requirement that a candidate should receive a quorum of 50% + 1 before it gains the respect to boss all and sundry. Consequently, a high availability K8s cluster should always comprise an odd number of masters. For the project, this means that the inventory should always contain an even number of co-masters, with the initial master then assuring the odd total. The bootup playbook imports the older k8s_comasters.yml playbook into its execution to prepare and execute the well-known "kubeadm join" command on each of the co-masters:

kubeadm join k8scp:6443 --token 9ei28c.b496t8c4vbjea94h --discovery-token-ca-cert-hash sha256:3ae7abefa454d33e9339050bb26dcf3a31dc82f84ab91b2b40e3649cbf244076 --control-plane --certificate-key 5d89284dee1717d0eff2b987f090421fb6b077c07cf21691089a369781038c7b

Joining worker nodes to the cluster uses a similar join command but omits the --control-plane switch, as can be seen in k8s_workers.yml, also imported during bootup. After running the bootup playbook, the cluster will comprise both control-plane and worker nodes: Control At All Times At this point in time, all nodes refer to the original master by hostname, as can be seen from the "kubeadm init" command that starts the first master:

kubeadm init --pod-network-cidr 10.244.0.0/16 --control-plane-endpoint k8scp:6443 --upload-certs

Clearly, this node is currently the single point of failure of the cluster.
Should it fall away, the cluster's nodes will lose contact with each other. The Ansible scripts mitigate this by installing the kube config on all masters so kubectl commands can be run from any master by a designated user. Changing the DNS entry to map k8scp to one of the other control planes will hence restore service. While this is easy to do using the hosts file, additional complexities can arise when using proper DNS servers. Kubernetes orthodoxy, consequently, holds that a load balancer should be put in front of the cluster to spread traffic across each of the master nodes. A control plane that falls out will be removed from the duty roster by the proxy. None will be the wiser. HAProxy fulfills this role perfectly. The Ansible tasks that make this happen are:

- name: Install HAProxy
  become: true
  ansible.builtin.apt:
    name: haproxy=2.0.31-0ubuntu0.2
    state: present

- name: Replace line in haproxy.cfg1.
  become: true
  lineinfile:
    dest: /etc/haproxy/haproxy.cfg
    regexp: 'httplog'
    line: " option tcplog"

- name: Replace line in haproxy.cfg2.
  become: true
  lineinfile:
    dest: /etc/haproxy/haproxy.cfg
    regexp: 'mode'
    line: " mode tcp"

- name: Add block to haproxy.cfg1
  become: true
  ansible.builtin.blockinfile:
    backup: false
    path: /etc/haproxy/haproxy.cfg
    block: |-
      frontend proxynode
        bind *:80
        bind *:6443
        stats uri /proxystats
        default_backend k8sServers
      backend k8sServers
        balance roundrobin
        server cp {{ hostvars['host1']['ansible_host'] }}:6443 check
      {% for item in comaster_names -%}
        server {{ item }} {{ hostvars[item]['ansible_host'] }}:6443 check
      {% endfor -%}
      listen stats
        bind :9999
        mode http
        stats enable
        stats hide-version
        stats uri /stats

- name: (Re)Start HAProxy service
  become: true
  ansible.builtin.service:
    name: haproxy
    enabled: true
    state: restarted

The execution of this series of tasks is triggered by the addition of a dedicated server to host HAProxy to the inventory file. Apart from installing and registering HAProxy as a system daemon, this snippet ensures that all control-plane endpoints are added to the duty roster. Not shown here is that the DNS name (k8scp) used in the "kubeadm join" command above is mapped to the IP address of the HAProxy during bootup. Availability and Accessibility Up to this point, everything we have seen constitutes the overhead required for high-availability orchestration. All that remains is to do a business Deployment and expose a K8s service to track its pods on whichever node they may be scheduled:

kubectl create deployment demo --image=httpd --port=80
kubectl expose deployment demo

Let us scale this deployment to two pods, each running an instance of the Apache web server: This two-pod deployment is fronted by the demo Service. The other Service (kubernetes) is automatically created and allows access to the API server of the control plane. In a previous DZone article, I explained how this API can be used for service discovery. Both services are of type ClusterIP. This is a type of load balancer, but its backing httpd pods will only be accessible from within the cluster, as can be seen from the absence of an external IP. Kubernetes provides various other service types, such as NodePort and LoadBalancer, to open up pods and containers for outside access. A NodePort opens up access to the service on each node. Although it is possible for clients to juggle IP addresses should a node fall out, the better way is to use a LoadBalancer. Unfortunately, Kubernetes does not provide a LoadBalancer implementation itself; it is typically provided by cloud providers.
Similarly, an on-premises or bare-metal cluster has to find and run its own. Alternatively, its clients have to make do as best they can by using NodePorts or implementing their own discovery mechanism. We will follow the first approach by using MetalLB to slot K8s load balancing into our high availability cluster. This is a good solution, but it is not the best solution. Since every K8s deployment will be exposed behind its own LoadBalancer/Service, clients calling multiple services within the same cluster will have to register the details of multiple load balancers. Kubernetes provides the Ingress API type to counter this. It enables clients to request service using the HTTP(S) routing rules of the Ingress, much the way a proxy does it. Enough theory! It is time to see how Ansible can declare the presence of an Ingress Controller and LoadBalancer:

- hosts: masters
  gather_facts: yes
  connection: ssh
  vars_prompt:
    - name: "metal_lb_range"
      prompt: "Enter the IP range from which the load balancer IP can be assigned?"
      private: no
      default: 192.168.68.200-192.168.69.210
  tasks:
    - name: Installing Nginx Ingress Controller
      become_user: "{{ ansible_user }}"
      become_method: sudo
      # become: yes
      command: kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.5/deploy/static/provider/cloud/deploy.yaml
      run_once: true

    - name: Delete ValidatingWebhookConfiguration
      become_user: "{{ ansible_user }}"
      become_method: sudo
      # become: yes
      command: kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
      run_once: true

    - name: Install Metallb1.
      become_user: "{{ ansible_user }}"
      become_method: sudo
      become: yes
      shell: 'kubectl -n kube-system get configmap kube-proxy -o yaml > /home/{{ ansible_user }}/kube-proxy.yml'

    - name: Install Metallb2.
      become_user: "{{ ansible_user }}"
      become_method: sudo
      become: yes
      command: kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml

    - name: Prepare L2Advertisement.
      become_user: "{{ ansible_user }}"
      become_method: sudo
      copy:
        dest: "~/l2advertisement.yml"
        content: |
          apiVersion: metallb.io/v1beta1
          kind: L2Advertisement
          metadata:
            name: example
            namespace: metallb-system

    - name: Prepare address pool.
      become_user: "{{ ansible_user }}"
      become_method: sudo
      copy:
        dest: "~/address-pool.yml"
        content: |
          apiVersion: metallb.io/v1beta1
          kind: IPAddressPool
          metadata:
            name: first-pool
            namespace: metallb-system
          spec:
            addresses:
              - {{ metal_lb_range }}

    - pause: seconds=30

    - name: Load address pool
      become_user: "{{ ansible_user }}"
      become_method: sudo
      command: kubectl apply -f ~/address-pool.yml

    - name: Load L2Advertisement
      become_user: "{{ ansible_user }}"
      become_method: sudo
      command: kubectl apply -f ~/l2advertisement.yml
...

First off, it asks for a range of IP addresses that are available for use by the LoadBalancers. It subsequently installs the Nginx Ingress Controller and, lastly, MetalLB to load balance behind the Ingress. MetalLB uses either ARP (IPv4)/NDP (IPv6) or BGP to announce the MAC address of the network adaptor. Its pods attract traffic to the network. BGP is probably better, as it has multiple MetalLB speaker pods announcing. This might make for a more stable cluster should a node fall out. ARP/NDP only has one speaker attracting traffic. This causes a slight unresponsiveness should the master speaker fail and another speaker have to be elected. ARP is configured above because I do not have access to a router with a known ASN that can be tied into BGP.
Next, we prepare to boot the cluster by designating co-masters and an HAProxy instance in the inventory. Lastly, booting with the k8s_boot.yml playbook ensures the cluster topology as declared in the inventory file is enacted: Each node in the cluster has one MetalLB speaker pod responsible for attracting traffic. As stated above, only one will associate one of the available IP addresses with its MAC address when using ARP. The identity of this live wire can be seen at the very bottom of the Ingress Controller service description: Availability in Action We can now test cluster stability. The first thing to do is to install an Ingress:

kubectl create ingress demo --class=nginx --rule="www.demo.io/*=demo:80"

Browse the URL, and you should see one of the Apache instances returning a page stating "It works!": This IP address spoofing is pure magic. It routes www.demo.io to the Apache web server without it being defined using a DNS entry outside the cluster. The Ingress can be interrogated from kubectl: One sees that it can be accessed on one of the IP addresses entered during bootup. The same can also be confirmed using wget, the developer tools of any browser worth its salt, or by inspecting the ingress controller: Should the external IP remain in the pending state, Kubernetes could not provision the load balancers. The MetalLB site has a section that explains how to troubleshoot this. We confirmed that the happy case works, but does the web server regain responsiveness in case of failure? We start off by testing whether the Ingress Controller is a single point of failure by switching off the node where it runs: Kubernetes realized that the node was no longer in the cluster, terminated all the pods running on that node, and rescheduled them on the remaining worker node. This included the Ingress Controller. The website went down for a while, but Kubernetes eventually recovered service. In other words, orchestration in action! Next up, we remove the MetalLB speaker by taking down the node where it runs: Another speaker will step up to the task! What about HAProxy? It runs outside the cluster. Surely, this is the single point of failure. Well... Yes and no. Yes, because one loses connection to the control planes: No, because all that is required is to map the IP address of k8scp from that of the HAProxy to that of one of the masters. The project has an admin playbook to do this. Run it and wait for the nodes to stabilize into a ready state. Ingress still routes, MetalLB still attracts, and httpd still serves: Due to the HAProxy being IaC, it is also no trouble to boot a new proxy and slot out the faulty/crashed one. The playbook used above to temporarily switch traffic to a master can also be used during such a proxy replacement. Unfortunately, this requires human interaction, but at least the human knows what to monitor with the utmost care and how to quickly recover the cluster. Final Architecture The final architecture is as follows: Note that all the MetalLB speakers work as a team to provide load balancing for the Kubernetes Services and their Deployments. Conclusion There probably are other ways to install a high availability K8s cluster, but I like this double load balancer approach: HAProxy abstracts and encapsulates the redundancy of an odd number of control planes, i.e., it ensures 99.9999% availability for cluster-controlling commands coming from kubectl; MetalLB and the Nginx Ingress Controller work together to track the scheduling of business pods.
Keep in mind that the master can move a pod with its container(s) to any worker node, depending on failure and resource availability. In other words, the MetalLB LoadBalancer ensures continuity of business logic in case of catastrophic node failure. In our sample, the etcd key-value store is located as part of the control planes. This is called the stacked approach. The etcd store can also be removed from the control planes and hosted on its own nodes for increased stability. More on this here. Our K8s as Ansible project is shaping up nicely for use as a local or play cloud. However, a few things that one would expect in a cluster of industrial strength are still outstanding: role-based access control (RBAC); a service mesh to move security, observability, and reliability from the application into the platform; availability zones in different locations, each with its own set of HAProxy, control planes, and workers, separated from each other using a service mesh; secret management; running Ansible Lint against the playbooks to identify bad and insecure practices requiring rectification; and choking incoming traffic when a high load or failure rate is experienced, to allow business pods to continue serving or to recover gracefully. It should be noted, though, that nothing prevents one from adding these to one's own cluster.
Did you know that email Send Time Optimization (STO) can improve the open rate by up to 93%? Awesome! Or it might only be 10%. A slightly more credible case study claims that message delivery at the right time resulted in an open rate of 55%, a click rate of 30%, and a conversion rate of 13%. I'll take that increase any day if there's a positive ROI. Optimization can be applied to any number of problems. It can be applied to content, where it may be to the customer's benefit, just as it can be applied to price, where optimization can deliver the maximum possible price for merchants. Unfortunately, there's no way to know in advance what the results of any particular optimization will be without the right data. The only way to get that data is through science! Science + Data = Profit Let's consider the case of email conversion rates. If we're considering an email message sent to paying customers (we're not worried about deliverability or the 'Message From' text), the factors that can affect customer behavior are the five listed below. Think of these five as the variables in an algorithm, where some terms may have an infinite range of possible values, and when we put them all together, we get an impossibly complex set of potential interactions.
Customer segments
Message position in the email app
Subject line content
Message content
Call to action content
Data like that mentioned above suggests that varying the send time of any particular email message can have a significant impact on the conversion rate: the percentage of customers who open the email and click on the desired call to action link (buy, sell, join, etc.) within the email. How can we determine the best possible time to send an email? Problem: You and I work at a company that regularly sends email messages to our customers. Our SaaS app allows users to tell us the best time to deliver email messages, but not everyone has taken advantage, particularly our new customers. How can the best time to deliver email messages be determined for users who have not configured a preference? In our case, the best time means "results in the highest conversion rate." N.B. When speaking from our perspective, we refer to "send time." From the customer's perspective, we refer to "delivery time." Science! Have a look at your email app on your phone and desktop. My mobile Gmail account shows six messages before I scroll to see the rest (depending on whether there are message attachments). The desktop version shows 16 messages. Another mobile email app I use shows ten. The desktop web email app for Office 365 shows ten. We have an untested assumption that the time an email message is sent/delivered to a customer will impact the conversion rate. In other words, the higher a message is placed in the list of unread messages, the greater the chance of it being opened, the critical first step. If we propose a solution to the STO problem, we want our coworkers to be confident in our recommendations. We'll take our layman's assumption and cast it as a pair of hypotheses: null and alternative. The null hypothesis is the claim that no relationship exists between the two sets of data or variables being analyzed (the claim we are trying to disprove), and the alternative hypothesis is the hypothesis that we are trying to prove, which is accepted if we have sufficient evidence to reject the null hypothesis. Null hypothesis (send time makes no difference): There is no relationship between conversion rate and send time.
Alternative hypothesis (send time does make a difference): The conversion rate varies depending on the time an email is sent. The conversion rate is the percentage of customers who open the email and click on a call to action link within the email. The null and alternative hypotheses concern themselves with two variables: The independent variable is the email send time. The dependent variable is the conversion rate metric. There's also a third set of variables that, if not carefully controlled, will turn our experiment into rubbish: the confounding variables, which influence the dependent and independent variables and cause a spurious association. In our case, the confounding variables are the four listed below:
Customer segments
Subject line
Message content
Call to action content
There could be more confounders, like desktop or mobile apps, but the only variables we have control over are these four. Pro Tip: Choose values for the confounding variables that are likely to produce high open, read, and conversion rates: something highly desirable, low-friction, and free or low-cost. While you are making a real offer, our goal is to determine the best send time. N.B. It is important that none of the confounding variables change during the experiment. Experiment We will test our hypothesis using the scientific method, or something close to it. Our starting point is the independent variable: when should we send the messages during the experiment? We have two ways to tackle this problem: use our intuition or use our data. As mentioned previously, our SaaS app allows users to set a preference for email delivery time. If we query the preference data using the user's local time, we will get something like the normal distribution: Fig. 1 Normal distribution of preferred email delivery times; median: 0730 local. According to users with a preference, the median time for delivery is 7:30-ish. It's unfortunate that there's such a broad range of preferred times; five hours is a big window. Ideally, we want to send the messages one hour apart. Having a five-hour window means five customer segments with at least 1,000 each. The choice of how many independent variables (send times) to test boils down to the number of new customers that can participate in the experiment. In this case, we're a global company with about 30,000 new customers per month, and it usually takes a full month before half of them choose a preferred time. That leaves us with 15,000 spread out across the world, with about half of those in the United States. 7,000 is enough to test three independent variables. Ideally, the minimum number of customers per cohort is 1,000, so we can be fairly confident in the experiment's results. N.B. All times are local. We will send the messages three times: t1, t2, and t3. Where:
t1 is the initial email send time: 0500 local (two and a half hours before the peak time of 0730).
t2 is two hours after t1.
t3 is two hours after t2.
Fig. 2 Three cohorts, each with a two-hour window. This will give us a delivery time window of six hours, covering a large portion of the normal distribution of our existing customers. As the US covers six time zones, we'll have to do a bit of time arithmetic to arrive at the correct data center or cloud send time for each customer in each cohort so that the messages are sent at the correct local time. Important: Do not spread out the send times within the cohort's time window; try to send all messages so that they are delivered as close as possible to each of the three times, t1, t2, and t3.
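To make the time arithmetic above concrete, here is a minimal Python sketch of how scheduler-side send timestamps could be computed. It is only an illustration under assumed names: the cohort offsets, the customer record layout, and the helper function are hypothetical, not part of the experiment tooling described in this article.

from datetime import date, datetime, time
from zoneinfo import ZoneInfo  # Python 3.9+

# Cohort send times in the customer's local time (values assumed from the design above).
COHORT_LOCAL_TIMES = {
    "t1": time(5, 0),   # 0500 local
    "t2": time(7, 0),   # two hours after t1
    "t3": time(9, 0),   # two hours after t2
}

def utc_send_time(send_date: date, cohort: str, customer_tz: str) -> datetime:
    """Return the UTC timestamp at which to send so the message arrives
    at the cohort's local time in the customer's time zone."""
    local = datetime.combine(send_date, COHORT_LOCAL_TIMES[cohort],
                             tzinfo=ZoneInfo(customer_tz))
    return local.astimezone(ZoneInfo("UTC"))

# Example: the same t2 cohort slot lands at different UTC times per time zone.
if __name__ == "__main__":
    for tz in ("America/Los_Angeles", "America/New_York"):
        print(tz, utc_send_time(date(2024, 3, 5), "t2", tz).isoformat())

The point of the sketch is simply that the scheduler works in UTC while the cohort definition stays in local time, so one t2 batch fans out into several UTC send times across the six US time zones.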
Customer Segmentation, AKA Cohort Engineering We should consider a few other criteria when creating the customer cohorts. Our previous work on demographics shows that 90% of our customers live in metropolitan areas. We can use zip codes or geolocation to create the location predicate and split each metro into three groups. Are there any other criteria that might be useful?
Mobile vs. desktop users
Android vs. iOS
Windows vs. Mac
User-agent
Organization size
SaaS subscription plan
I'll leave it to you to decide how to slice and dice, but if you can, use whatever demo- or psychographic data you have and ensure that each of the three segments is well balanced. That will avoid trouble when we want to do some analytic exploration with the results. Launch in 3, 2, HOLD THE LAUNCH! Before we start sending messages, we need to be sure we have the right observability in place. We need to know more than a few of the key events, like message-was-opened and message-is-converted. We must also know when anything goes wrong, anywhere in the customer/experiment journey. We must also be alerted if certain failure conditions are met so we can stop the experiment before wasting the very real offer we want our real customers to accept. Be sure to include the actual send time. Another group of facts that may come in handy at some point is the network and geolocation data; perhaps a significant number of customers open messages while waiting in line for coffee, for example. When everything is ready, all that's left to do is push the big, red GO! button and collect the data. How long should you wait? Data Exploration and Analysis As you can see from the three graphs, the results drop to a trickle within 48 hours of beginning the experiment. This information, too, is highly valuable. It's safe to assume that each customer receives more messages as time passes, pushing the experimental message further down the list. This is where tracking each user's app or user-agent will allow you to correlate email app window size with message open rate. In addition to looking at each cohort individually, look at all three combined to see which patterns are common. For example, maybe all of the tracked customer rates (open, clicked web page link, clicked call to action link) decline just before lunch and remain low until the end of the day. Other patterns may be related to subscription plans or the customer's phone model. Perhaps iPhone users are busy getting a double soy chai latte in the 15 minutes before they start their day, and that's when they check their email apps most thoroughly. Finally, it may be that regardless of when the emails were sent, there is a definite peak in opening emails around 8:30 a.m. local time, or maybe 8:00 or 9:00. YMMV. Results Finally, the results are in, and they are unambiguous. You can clearly see that new customers have a preference for reading email at 8:24 a.m. From this day forward, you can set the default email send time for new customers to this time. Hooray! All that's left to do now is write up a paper, distribute it to your coworkers, and get approval to change the default send time, unless one of the business unit heads or product owners wants a meeting to discuss the results (all the results, including the analytic explorations and assumptions). Assumptions? Discussion/Assumptions We all make assumptions, or so I assume. I've never seen any research on this question.
Before you write your conclusion and send out the paper, now would be a good time to think about any assumptions made and include them in a discussion section. As an example, in preparing the cohorts, we tried to balance, as much as possible, all of the definitive customer attributes that we know about, like subscription plan, company size, and so on. However, we know that company culture can vary quite a bit. Some companies (perhaps most of them) may have fully adopted work-from-home. STO With Deep Learning Suppose the results of your experiment do not show a strong enough signal across some or all cohorts. Or, perhaps while you were exploring the experiment's result data, you noticed a strong signal from customers in Los Angeles using their iPhones. They would like to check their email later in the morning, maybe after a run on the beach or while sitting in traffic on the 405. You then look at your data from existing customers who have expressed a preference, also from LA, and also using their iPhones. This group also strongly prefers receiving emails between 9 and 10 a.m. Perhaps there are other strong correlations like this in your customer db; if so, you might be able to train a machine learning model to predict the best time to send emails to new users. How would you do this when you've never done ML training before? Is it even possible? Of course it is! Like most mid- to senior-level programmers, I have won so many battles with tough problems that I believe I can do just about anything with code. So, let's give it a shot. Even if your org's data set doesn't lend itself to training an ML model, you will gain important insights into the hard work our data science and engineering colleagues do, understand just how difficult we make their job by sending them lousy data, and learn some important concepts and vocabulary that will become a larger part of every programmer's job description in the coming years. Two Methods: Supervised or Unsupervised Learning The core of the problem is prediction: which 10- or 15-minute send time slot are new users most likely to choose? This problem can be addressed by two classes of ML models: supervised and unsupervised learning. It may turn out that the best path for you is to use unsupervised clustering to explore and understand your data and to hunt for similarities (clusters) in it. If so, you can use clustering to predict send times. Or you can continue to try the supervised learning path. What's the key difference? As far as you're concerned, supervised learning requires labeled data, and unsupervised learning does not, so unsupervised learning is a little easier. Fortunately, if your customer data includes their preferred email send time (or the equivalent for your use case), then you're halfway to having labeled data. Supervised Learning With XGBoost XGBoost is an excellent choice for this problem. There are numerous Python and R implementation tutorials directly related to our problem. It doesn't require massive computing resources, and it doesn't require optimization of the parameters or tuning. Perfect for beginners like you and me. I don't have the space in this article to walk you through each step, but I highly recommend the following tutorial in Python: Using XGBoost in Python. The tutorial is centered around the problem of predicting diamond prices. Running through the tut shouldn't take you more than an hour. How are diamond prices like customers' preferred email delivery times?
Actually, the business or data domain makes little difference to the ML model. Whether it is diamond data or customer data, the domain is not important to the problem. We have a bunch of data fields; some are relevant to the problem, some perhaps not. With the diamond data, we want to predict the price, while with the customer data, we want to predict the preferred send time. Yes, you and I know that there is a calculation that can be made using diamond data attributes, but the ML model doesn't know that and doesn't need to know that to achieve a reliable price prediction. The answers are in the data - or not. What is important are the attributes in the data. Do you have enough attributes/fields/columns in your customer data to answer a basic question: is a given new customer's data similar enough to an existing group of customers that the new user would likely choose that group's email send time? Our second ML model may help you answer that question. After working through the XGBoost tutorial, try the same with your data. You will likely have to iterate over a number of variations of the data you use. If that's not working out well, move on to clustering. Unsupervised Learning With Clustering What is clustering? Datacamp.com has a very good introduction to clustering: "Clustering is an unsupervised machine learning technique with many applications in pattern recognition, image analysis, customer analytics, market segmentation, social network analysis ... it is an iterative process of information discovery that requires domain expertise and human judgment used frequently to make adjustments to the data and the model parameters to achieve the desired result." Basically, your clustering model's chosen algorithm processes your data and produces a result set of two-dimensional vectors: X and Y coordinates for each record in the data. These can be presented as a table or as a visualization like this, where it's much easier to see the clusters: For now, start with K-means clustering, a widely used algorithm for clustering tasks due to its intuitiveness and ease of implementation. It is a centroid-based algorithm where the user must define the number of clusters they want to create. The number of clusters in our case is the number of send time slots we want predictions for, e.g., from 7 a.m. to 10 a.m. every 15 minutes, so 12 clusters (K = 12). Again, start with a tutorial and then move on to your customer data; I can highly recommend this tut from Kaggle: Customer-Segmentation with k-means. Once you're ready to use your own data, the overall workflow goes like this:
Explore your data and try to intuitively identify features that might show a correlation with existing users' preferred send times. You will probably have too much data for existing users and none for those without a preference.
Iterate over the attributes/columns/fields until some clustering appears.
Refine until you have enough clusters to cover a large portion of preferred send times.
If your data shows absolutely no clustering correlated with send times, you have two options: try to identify and collect data that would help to discern a preference, or accept defeat gracefully; sometimes, the null hypothesis wins.
If any of your datasets shows clusters correlated to send times, you have won the golden ticket! Next, you must derive the cluster centroids (find a real or synthetic vector at the center of each send-time cluster).
Collect a dataset of new users who have not expressed a send time preference.
For each of the new users, calculate their vectors using the same method used to create the cluster vectors.
Measure the difference (cosine similarity or Euclidean distance) between each new user's vector from the previous step and each cluster centroid; the centroid with the smallest difference gives your new optimal send time (a minimal sketch of this assignment step follows at the end of this section).
Over the next few weeks, check that set of new users to see if their preferred send time matches the predicted time. You can also test this method against a set of existing user data with preferred send times.
If, by some miracle, this all works out very nicely, then you have won the golden ticket! Try another experiment; only this time, send out a new experimental email message using the predicted email send time and see if the conversion rate is higher than during the first experiment.
Conclusion If you did some research online on this topic, you probably noticed that there are a lot of companies that make it their business to provide a solution to this problem; it's worth a lot of money to those who can solve it or increase the conversion rate enough to justify the expense. In addition to the economic benefit you could potentially deliver to your organization, you will be able to add a new and important section to your CV: created and conducted data science experiments in the area of message send time optimization that resulted in a 27% increase in conversion rate and increased ARR by 4%. Now, that is something worth working toward. Good luck!
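For readers who want to see the clustering and centroid-assignment steps of the workflow above in code, here is a minimal Python sketch. It assumes scikit-learn is available, and it invents a tiny random feature matrix plus the helper names (customer_features, predict_send_time_slot); none of this comes from the article's own tooling, and a real dataset would need proper feature engineering, scaling choices, and a cluster-to-slot mapping derived from actual member preferences.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix for existing customers with a preferred send time;
# columns might be engineered attributes (org size, device mix, time zone offset, ...).
rng = np.random.default_rng(42)
customer_features = rng.normal(size=(500, 4))

# 12 clusters = twelve 15-minute send-time slots between 7:00 and 10:00 (K = 12).
scaler = StandardScaler()
X = scaler.fit_transform(customer_features)
kmeans = KMeans(n_clusters=12, n_init=10, random_state=42).fit(X)

# In practice, label each cluster with the most common preferred slot of its members;
# here the labels are faked purely for illustration.
slot_labels = [f"{7 + (s * 15) // 60:02d}:{(s * 15) % 60:02d}" for s in range(12)]
cluster_to_slot = {c: slot_labels[c] for c in range(12)}

def predict_send_time_slot(new_customer_row):
    """Scale a new customer's features, find the nearest centroid (Euclidean distance),
    and return that cluster's send-time slot."""
    x = scaler.transform(new_customer_row.reshape(1, -1))
    distances = np.linalg.norm(kmeans.cluster_centers_ - x, axis=1)
    return cluster_to_slot[int(np.argmin(distances))]

print(predict_send_time_slot(rng.normal(size=4)))

The nearest-centroid lookup is the whole trick: once the clusters exist, predicting a send time for a new user is just a distance calculation against K = 12 centroids.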
Nowadays, in the agile software development lifecycle, continuous integration and continuous delivery enable software delivery workflows that span multiple teams and functions: development, quality assurance, operations, and security. What Are Software Design Patterns? Software design patterns are best practices that are followed in order to resolve common problems in development. By following software patterns, a development team can follow the same practices to build, deliver, and deploy code in a much more efficient and systematic way. Software design anti-patterns are the malpractices that harm the way software development is done. Continuous Integration Software Design Patterns and Anti-Patterns Continuous integration is an automated process that merges source code from multiple branches into a main branch, which is then used as the reference for deploying the development code to different environments. Using certain patterns keeps the code clean and deployment-ready. CI Pipeline Patterns and Anti-Patterns Version Controlling the Source Code The definition of standards plays an important role in the continuous integration chain. Several conventions make it possible to facilitate the understanding of an application's source code and software development lifecycle. Defining conventions, therefore, has a major impact both at the level of the individual or team and at the level of automated processes.
Continuous Integration Version Control Patterns:
Define better conventions to set better contexts for the development lifecycle.
Build on every change made to a commit, branch, merge, or pull request.
Add useful information to commit messages, use a proper branch naming convention, and standardize the application version.
Use pre- and post-actions on commits, merges, and pull requests.
Continuous Integration Version Control Anti-Patterns:
Limit builds to a few per sprint or per week; cherry-pick commits.
Use non-relevant branch names and meaningless commit messages.
Use different versions for different applications for a build.
Test most of the source code manually after packaging or deploying it.
Running Builds Periodically The build phase is the most important phase of the continuous integration cycle. In this phase, several validations and checks are required to make sure that the application has been packaged properly for deployment. Related Tutorial: Azure DevOps and Deploying a Mule Application into Cloudhub
Continuous Integration Build Patterns:
Use a fresh, isolated environment to build the application, and control the allocated resources to avoid impacting other builds.
Automatically release and deploy a new version on every new commit, branch, merge, or pull request.
Test the weekly builds to identify potential issues proactively instead of waiting for a code update.
Deploy a hotfix as soon as possible. Test the code in staging before moving it to production.
Deploy builds free of security vulnerabilities and sensitive data exposure; take action immediately if a severity is identified, and keep passwords out of the source code.
Lint and format code to make the source code more readable.
Run a set of tests automatically on each build, and run specific test sets periodically (see the build-gate sketch after this list).
Run the tests in the same pattern across different platforms, using the same set of test data to compare results.
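As a small illustration of the "lint and test on every build" pattern above, here is a hedged Python sketch of a build-gate script. The tool names (ruff, pytest) and the script itself are assumptions made for illustration; substitute whatever linter and test runner your pipeline actually uses.

#!/usr/bin/env python3
"""Minimal CI build gate: fail the build if linting or tests fail."""
import subprocess
import sys

# Assumed tooling; swap in your project's own linter and test runner.
STEPS = [
    ("lint", ["ruff", "check", "."]),
    ("unit tests", ["pytest", "-q", "--maxfail=1"]),
]

def main() -> int:
    for label, cmd in STEPS:
        print(f"--- running {label}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"build gate failed at step: {label}")
            return result.returncode
    print("build gate passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())

Wiring a script like this into the pipeline means every commit, branch, merge, and pull request is gated by the same checks, which is exactly the behavior the patterns above call for.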
Continuous Integration Build Anti-Patterns:
Always use the same environment without handling dependency issues; do not optimize resources for subsequent builds, potentially impacting other builds as well.
Start builds manually after every sprint or week, depending upon task allocation.
Schedule a hotfix directly to the production environment.
Add sensitive data, like usernames, passwords, and tokens, to configuration files.
Do not set code quality standards and semantics.
Run tests manually after deployment.
Run a test framework that would fail because of the status of the infrastructure.
Continuous Deployment Software Design Patterns and Anti-Patterns Continuous deployment enables operational workflows. The goal is to safely deliver artifacts into the different environments in a repeatable fashion with a lower error rate. The continuous deployment process helps automate the operational services of releasing, deploying, and monitoring applications. Validations and Release Management The delivery phase is an extension of the continuous integration phase, where the system automates deploying all code changes to a stable test environment in order to qualify the functionality and the working version of the source code before deployment. Learn more about the release pipeline using Azure DevOps.
Continuous Deployment Validation and Release Management Patterns:
Automate the verification and validation procedure of the released version of the software to include unit, integration, and regression testing.
Define an enterprise-standard release convention to manage versions and facilitate automation.
Deploy a hotfix when necessary; test the code in a pre-prod environment before moving it to production.
Continuous Deployment Validation and Release Management Anti-Patterns:
Use manual tests to verify and validate software.
Do not increment the application version; overwrite the previously existing versions instead.
Schedule the deployment of a hotfix; test directly in production.
Related Guide: Automation Testing in CI/CD Pipelines
Deployment Deployment is done once the feature is tested in a pre-production environment for any regression issues or uncaught errors in the platform.
Continuous Deployment Patterns:
Run the build process once while deploying to multiple target environments.
Deploy code to production but limit access to the new feature by using a feature flag.
Utilize automated provisioning code or scripts to automatically start and destroy environments.
Continuous Deployment Anti-Patterns:
Run the software build in every stage of the deployment pipeline.
Wait to commit the code until the feature development has been completed.
Rollback Rollback becomes very important if there's a deployment failure; it essentially means getting the system back to the previous working state.
Continuous Deployment Rollback Patterns:
Provide a single-command rollback of changes after an unsuccessful deployment.
Preserve the environment configuration changes.
Externalize all variable values from the application configuration as build/deployment properties.
Continuous Deployment Rollback Anti-Patterns:
Manually undo changes applied to roll back the deployed code.
Hardcode values inside the source code based on the target environments.
Documentation for Steps and Procedures Documentation is a significant component of the deployment process flow to share the relevant information with stakeholders at every team level.
Continuous Deployment Documentation Pattern: Define a standard of documentation that can be understood by every team.
Continuous Deployment Documentation Anti-Pattern: Keep the documentation restricted to specific teams.
Conclusion The CI/CD process is an integral part of the software delivery lifecycle. It gives an enterprise the power to release quality code that meets all standards and compliance requirements into the production environment. The CI/CD software patterns and anti-patterns are important to understand, as they offer immense potential to standardize the quality of code delivery. If the CI/CD process is established with the right principles, it will help reduce failures and shorten the time-to-market of the product.
Additional Resources
"DevOps tech: Continuous delivery"
"DevOps Metrics: Why, what, and how to measure success in DevOps"
Progressive Delivery Patterns and Anti-Patterns Refcard
Introduction to DevSecOps Refcard
The Essentials of GitOps Refcard
Getting Started With Feature Flags Refcard
Getting Started With Log Management Refcard
End-to-End Testing Automation Essentials Refcard
Coding is not just about strings of code — it's an art form, a creative endeavor where developers use Agile methodologies as their paintbrushes and palettes to craft software masterpieces. In this article, we'll explore how Agile strategies align with the creative spirit of developers, turning the process of coding into a journey of craftsmanship. The Agile Developer's Canvas Imagine the Agile Developer's Canvas as an artist's sketchbook. It's a place where developers lay the foundation for their creations. User stories, sprint planning, and retrospectives are the vivid colors on this canvas, guiding developers through the creative process. This visual representation ensures that the development journey is not just a technical process but a canvas filled with the aspirations and visions of the creators. Practical Considerations When filling the Agile Developer's Canvas, consider it not just as a planning tool but as a living document that evolves with the project. Regularly revisit and update the canvas during retrospectives to reflect new learnings and changes in project goals. This dynamic approach ensures that the canvas stays relevant and continues to guide the team through the entire creative journey. Encourage teams to use the Agile Developer's Canvas as a communication tool. Display it prominently in team spaces to serve as a constant reminder of the project's vision and priorities. This visual representation can align team members, fostering a shared understanding and commitment to the creative process. Refactoring as Sculpting Think of refactoring as a sculptor refining their masterpiece. Developers, like artists, continuously shape and refine their code to enhance its quality and elegance. It's not just about fixing bugs; it's about transforming the code into a work of art. This section encourages developers to view refactoring as a form of creative expression, providing practical tips for chiseling away imperfections and creating code that's not just functional but beautiful. Practical Considerations Incorporate refactoring into the definition of done (DoD) for user stories. This ensures that refactoring isn't treated as a separate task but is an integral part of delivering quality code with each iteration. Promote a mindset shift from viewing refactoring as merely fixing technical debt to considering it an opportunity for creative expression. Encourage developers to share their refactoring stories, celebrating instances where code was transformed into something more elegant and maintainable. Suggest organizing occasional "refactoring workshops" where team members collaboratively explore and refactor specific parts of the codebase. This not only enhances collective knowledge but also fosters a sense of shared responsibility for the overall craftsmanship of the code. The Symphony of Continuous Integration In the world of software development, Continuous Integration is like the conductor of a symphony, bringing harmony to the integration of various code contributions. Automated testing becomes the virtuoso, ensuring the quality of the software symphony. Real-world examples paint a picture of a well-coordinated orchestra of developers, each contributing to a harmonious composition of code that resonates with excellence. Practical Considerations Consider integrating Continuous Integration practices into sprint planning. Devote time to discuss integration strategies, potential challenges, and how each team member's contributions will harmonize within the larger development symphony. 
Highlight the role of feedback loops within the Continuous Integration process. Quick feedback on integration results helps developers course-correct early, preventing dissonance in the overall symphony. Encourage the use of visual dashboards displaying the status of builds and tests. This transparency allows team members to appreciate the collaborative effort and progress made in orchestrating a seamless software symphony. User Stories as Narrative Threads User stories are the threads that weave through the fabric of software development. Instead of dry technicalities, think of them as the characters in a story. This section delves into the art of crafting compelling user stories, transforming them from mere requirements into engaging narratives. By infusing creativity into these stories, developers can create software that tells a meaningful tale, capturing the attention and satisfaction of end-users. Practical Considerations Advocate for user story workshops that involve cross-functional team members. This collaborative approach ensures that diverse perspectives contribute to crafting meaningful narratives, fostering a sense of inclusivity in the creative process. Promote storytelling techniques within the team. Encourage developers to share their favorite user stories, emphasizing the impact on end-users and how each story contributes to the overarching narrative of the software. Consider implementing a "user story showcase" during sprint reviews, where developers present the narratives behind completed user stories. This not only celebrates achievements but also reinforces the connection between code and the meaningful stories it helps tell. Conclusion As developers embark on the Agile journey, they're not just writing code; they're creating something meaningful. The Agile Developer's Canvas, refactoring as sculpting, the symphony of continuous integration, and user stories as narrative threads provide the tools for developers to transcend the technicalities and embrace the artistry of their craft. This journey isn't just about delivering exceptional software; it's about contributing to the evolving narrative of software development as a creative and human-centered art form. By embracing this perspective, developers become not just coders but artists, each line of code a stroke on the canvas of digital innovation.