The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).
With AI, software development is experiencing a breakthrough phase driven by the continuous integration of state-of-the-art large language models (LLMs) such as GPT-4 and Claude Opus. These models extend beyond the role of traditional developer tools: they directly assist developers in translating verbal instructions into executable code across a variety of programming languages, which speeds up coding.

Code Generation

Enhancing Developer Productivity

LLMs understand context and generate code that follows best practices, making them very good at enhancing developer productivity. They work as an on-call assistant for developers, offering insights and alternatives that may elude even experienced programmers. This role matters most in large, complex projects, where the integration of different software modules can introduce subtle, sometimes undetectable bugs.

Training and Adaptation

LLMs will keep improving through feedback loops from real-world use, with models retrained on the corrections and suggestions of developers. This continuous training brings models closer to specific industry needs, further entrenching them in the core of software development processes.

Debugging and Bug Fixing With AI

Innovative Tools for Enhanced Accuracy

LLM integration into debugging and bug fixing is a radical change. Tools like Meta's SapFix and Microsoft's InferFix detect and fix errors automatically, saving time and reducing downtime. Such systems are designed to plug neatly into existing CI/CD pipelines, providing real-time feedback without interrupting the flow of development. Able to scan millions of lines of code, these AI-enhanced tools significantly reduce error rates by catching bugs at an early stage. Proactive bug detection helps maintain the health of the codebase and ensures bugs are resolved before they turn into major problems.

Customized Solutions

Flexibility is what enables LLMs to fit the needs of a given project. Whether matching different coding standards or particular programming languages, these models are versatile instruments in a developer's arsenal that can be trained to suit very granular needs.

Seamless CI/CD Integration

AI: The Catalyst for Reliable Deployments

LLMs are fast becoming a staple in CI/CD ecosystems and further improve the reliability of deployments. They automate code reviews and quality checks, helping ensure that only stable versions of applications make it to deployment; a sketch of such a review gate follows below. This raises the pace of deployment while raising the overall quality of software products.

Continuous Learning and Improvement

Integrating LLMs into CI/CD processes is not a one-time setup but part of a continuous improvement strategy. These models learn with every deployment, becoming more efficient over time and reducing the chance of deployment failures.

Closing the Gap Between Dev and Ops

By producing more homogeneous code and automating routine checks, LLMs bridge the traditional gap between development and operations teams. That synergy is central to modern DevOps practices, which aim to create a more collaborative and efficient environment.
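To make this concrete, here is a minimal, hypothetical sketch of a GitHub Actions job that gates a pull request on an LLM-backed review. The ai-review.py script, its flags, and the LLM_API_KEY secret are assumptions for illustration, not part of any product named above.

```yaml
# Hypothetical CI gate: block the merge when an LLM-backed review script
# reports critical findings. ai-review.py is an assumed helper that sends
# the diff to an LLM API and exits non-zero on blocking issues.
name: ai-code-review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so we can diff against the base branch
      - name: Collect the diff for this pull request
        run: git diff "origin/${{ github.base_ref }}...HEAD" > pr.diff
      - name: Ask the model to review the change
        env:
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}   # assumed secret name
        run: python ai-review.py --diff pr.diff --fail-on critical
```

Treating the model's verdict as one more required status check keeps the human reviewer in the loop while routine issues are caught automatically.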
Future Impact and Market Adoption of Large Language Models in Software Development

The future of software development is inherently tied to the advances made with LLMs. As these models mature, they will change roles within software teams and eventually reshape the processes, such as Agile and Scrum, that dominate today. Because LLMs can serve both as development tools and as abstraction tools, they make higher productivity likely, letting projects complete much faster and enabling companies to deliver software products sooner.

Market Adoption and Economic Implications

The potential economic impact of LLMs on software development is huge. If companies adopt these advances, productivity levels can rise substantially, resulting in cost savings across software development and maintenance. For instance, GitHub Copilot, when integrated into the development environment, helps produce code snippets and automates boilerplate translation, considerably reducing the time a developer spends on these tasks. Moreover, with their ability to generate test cases and assist in debugging, LLMs also reduce the resources required by these time-consuming but important processes.

Reshaping the Workforce

The nature of the workforce in the tech industry is also going to change as LLMs are integrated. Since these models will increasingly take on routine and repetitive tasks, the work of a software developer will move toward creativity and problem-solving. Developers should therefore re-skill to strengthen their competencies in machine learning, data science, and AI-driven tooling. As more routine coding is resolved by LLMs, the tasks in software development will expand to include more problem-solving, critical thinking, and strategic decision-making.

Conclusion

LLMs are no longer just tools; they are becoming an integral part of software development. Their impact on productivity, economic outcomes, and the nature of work in the tech industry is promising. Successful integration requires careful planning and continuous learning to adapt to these ever-evolving technologies.
As enterprises mature in their CI/CD journey, they tend to ship code faster, safely, and securely. One essential strategy DevOps teams apply is releasing code progressively to production, also known as canary deployment. Canary deployment is a battle-tested mechanism for safely releasing application changes, and it provides flexibility for business experiments. It can be implemented using open-source software like Argo Rollouts and Flagger. However, advanced DevOps teams want granular control over traffic and pod scaling while performing canary deployments in order to reduce overall costs. Many enterprises achieve advanced traffic management of canary deployments at scale using the open-source Istio service mesh, and we want to share our knowledge with the DevOps community through this blog. Before we get started, let us discuss the canary architecture implemented by Argo Rollouts and Istio.

Recap of Canary Implementation Architecture With Argo Rollouts and Istio

If you use Istio service mesh, all of your meshed workloads have an Envoy proxy sidecar attached to the application container in the pod, and an API gateway or Istio ingress gateway receives incoming traffic from outside. In such a setup, you can use Argo Rollouts to handle canary deployment. Argo Rollouts provides a CRD called Rollout to implement canary deployment; it is similar to a Deployment object and is responsible for creating, scaling, and deleting ReplicaSets in Kubernetes.

The canary deployment strategy starts by redirecting a small amount of traffic (say, 5%) to the newly deployed app. Based on specific criteria, such as acceptable resource utilization of the new canary pods, you gradually increase the traffic to 100%. The Istio sidecar handles the traffic for the baseline and canary as per the rules defined in the VirtualService resource. Since Argo Rollouts integrates natively with Istio, it overrides the VirtualService resource to increase the traffic to the canary pods.

Canary can be implemented using two methods: deploying new changes as a service or deploying new changes as a subset.

1. Deploying New Changes as a Service

In this method, we create a new service (called canary) and split the traffic from the Istio ingress gateway between the stable and canary services. You can refer to the YAML file for a sample implementation of deploying a canary with multiple services here. We have created two services called rollouts-demo-stable and rollouts-demo-canary. Each service will listen to HTTP traffic for the Argo Rollout resource called rollouts-demo. In the rollouts-demo YAML, we have specified the Istio virtual service resource and the logic to gradually increase the traffic weight from 20% to 40%, 60%, 80%, and eventually 100%. A minimal sketch of the corresponding traffic split follows.
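For illustration, here is a minimal sketch of what the Istio VirtualService for this two-service split could look like, reusing the rollouts-demo-stable and rollouts-demo-canary service names from above; the gateway name and the 80/20 weights are assumptions, and in practice Argo Rollouts rewrites the weights as the rollout progresses.

```yaml
# Illustrative only: split ingress traffic between the stable and canary
# Services. Argo Rollouts updates the weights at each canary step.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: rollouts-demo-vs
spec:
  gateways:
    - rollouts-demo-gateway   # assumed gateway name
  hosts:
    - "*"
  http:
    - route:
        - destination:
            host: rollouts-demo-stable
          weight: 80
        - destination:
            host: rollouts-demo-canary
          weight: 20
```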
2. Deploying New Changes as a Subset

In this method, you keep one service but create a new Deployment subset (the canary version) pointing to the same service. Traffic is split between the stable and canary deployment sets using Istio VirtualService and DestinationRule resources. Note that this blog thoroughly discusses the second method.

Implementing Canary Using Istio and Argo Rollouts Without Changing the Deployment Resource

There is a misunderstanding among DevOps professionals that Argo Rollouts is a replacement for the Deployment resource, and that services considered for canary deployment must have their Deployment configuration rewritten as a Rollout. That's not true. The Argo Rollout resource provides a section called workloadRef where existing Deployments can be referenced without significant changes to the Deployment or Service YAML. If you use a Deployment resource for a service in Kubernetes, you can reference it in the Rollout CRD, after which Argo Rollouts will manage the ReplicaSet for that service. We will use this concept to deploy a canary version using the second method: deploying new changes as a subset.

Argo Rollouts Configuration for Deploying New Changes Using a Subset

Let's say you have a Kubernetes service called rollouts-demo-svc and a deployment resource called rollouts-demo-deployment (code below). You need to follow three steps to configure the canary deployment.

Code for service.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-svc
  namespace: istio-argo-rollouts
spec:
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: rollouts-demo
```

Code for deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rollouts-demo-deployment
  namespace: istio-argo-rollouts
spec:
  replicas: 0   # this has to be made 0 once Argo Rollouts is active and functional
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
        - name: rollouts-demo
          image: argoproj/rollouts-demo:blue
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
```

Step 1: Set Up the Virtual Service and Destination Rule in Istio

Set up the virtual service by specifying the back-end destination for the HTTP traffic from the Istio gateway. In our virtual service rollouts-demo-vs2, we set the back-end service to rollouts-demo-svc but created two subsets (stable and canary) for the respective deployment sets. We have set the traffic weight rule so that 100% of the traffic goes to the stable version and 0% goes to the canary version. As Istio is responsible for the traffic split, we will see how Argo updates this VirtualService resource with the new traffic configuration specified in the canary specification.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: rollouts-demo-vs2
  namespace: istio-argo-rollouts
spec:
  gateways:
    - istio-system/rollouts-demo-gateway
  hosts:
    - "*"
  http:
    - name: route-one
      route:
        - destination:
            host: rollouts-demo-svc
            port:
              number: 80
            subset: stable
          weight: 100
        - destination:
            host: rollouts-demo-svc
            port:
              number: 80
            subset: canary
          weight: 0
```

Now we have to define the subsets in the destination rule. In rollout-destrule below, we define the canary and stable subsets, which the Argo Rollout resource rollouts-demo will reference.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: rollout-destrule
  namespace: istio-argo-rollouts
spec:
  host: rollouts-demo-svc
  subsets:
    - name: canary   # referenced in canary.trafficRouting.istio.destinationRule.canarySubsetName
      labels:        # labels will be injected with the canary rollouts-pod-template-hash value
        app: rollouts-demo
    - name: stable   # referenced in canary.trafficRouting.istio.destinationRule.stableSubsetName
      labels:        # labels will be injected with the stable rollouts-pod-template-hash value
        app: rollouts-demo
```

In the next step, we will set up the Argo Rollout resource.

Step 2: Set Up the Argo Rollout Resource

The Rollout spec should cover two important items in the canary strategy: declaring the Istio virtual service and destination rule, and providing the traffic-increment strategy.
You can learn more about the Argo Rollout spec. In our Argo Rollout resource, rollouts-demo, we have provided the deployment (rollouts-demo-deployment) in the workloadRef spec. In the canary spec, we reference the virtual service (rollouts-demo-vs2) and the destination rule (rollout-destrule) created in the earlier step. We have also specified the traffic rules to redirect 20% of the traffic to the canary pods and then pause for manual direction. We added this manual pause so that, in a production environment, the Ops team can verify whether all the vital metrics and KPIs of the canary pods, such as CPU, memory, latency, and throughput, are in an acceptable range. Once we manually promote the release (for example, with the Argo Rollouts kubectl plugin: kubectl argo rollouts promote rollouts-demo -n <<namespace>>), the canary traffic increases to 40%. We then wait 10 seconds before increasing the traffic to 60%. The process continues until the canary pods receive 100% of the traffic and the stable pods are deleted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: istio-argo-rollouts
spec:
  replicas: 5
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: rollouts-demo-vs2    # required
            routes:
              - route-one              # optional if there is a single route in VirtualService, required otherwise
          destinationRule:
            name: rollout-destrule     # required
            canarySubsetName: canary   # required
            stableSubsetName: stable   # required
      steps:
        - setWeight: 20
        - pause: {}
        - setWeight: 40
        - pause: {duration: 10}
        - setWeight: 60
        - pause: {duration: 10}
        - setWeight: 80
        - pause: {duration: 10}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rollouts-demo-deployment
```

Once you have deployed all the resources in steps 1 and 2, you can access the application through the Istio ingress IP from a browser. You can run the command below to see how Argo Rollouts handles the pods:

```shell
kubectl get pods -n <<namespace>>
```

Validating the Canary Deployment

Let's say developers have made new changes and created a new image that needs to be tested. For our case, we modify the Deployment manifest (rollouts-demo-deployment) by changing the image value from blue to red:

```yaml
spec:
  containers:
    - name: rollouts-demo
      image: argoproj/rollouts-demo:red
```

Once you deploy the updated rollouts-demo-deployment, Argo Rollouts understands that new changes have been introduced to the environment. It then starts creating new canary pods and allows 20% of the traffic to reach them. Now, if you analyze the virtual service spec by running the following command, you will see that Argo has updated the traffic percentage to the canary from 0% to 20% (as per the Rollout spec):

```shell
kubectl get vs rollouts-demo-vs2 -n <<namespace>> -o yaml
```

Gradually, 100% of the traffic is shifted to the new version, and the older/stable pods are terminated. In advanced cases, the DevOps team may want to control the scaling of canary pods: the idea is not to create all the pods per the replica count at each gradual traffic shift, but to create pods based on specific criteria. In those cases, we need a HorizontalPodAutoscaler (HPA) to handle the scaling of canary pods.

Scaling of Pods During Canary Deployment Using HPA

The Kubernetes HPA increases or decreases the number of pods based on load, and it can also control the scaling of pods during canary deployment. The HorizontalPodAutoscaler overrides the Rollout's behavior for scaling pods.
We have created and deployed the following HPA resource: hpa-rollout-example.

Note: The HPA creates a number of pods equal to the maximum of the minimum pods in the HPA resource and the replicas in the Rollout. This means that if the HPA resource specifies a minimum of 2 pods but the Rollout resource specifies 5 replicas, a total of 5 pods will be created. Similarly, if we update the replicas in the rollouts-demo resource to 1, HPA will create 2 pods. (We updated the replicas to 1 to test this scenario.)

In the HPA resource, we have referenced the Argo Rollout resource rollouts-demo. That means HPA is responsible for creating two replicas at the start. If CPU utilization exceeds 10%, more pods are created, up to a maximum of six replicas.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-rollout-example
  namespace: istio-argo-rollouts
spec:
  maxReplicas: 6
  minReplicas: 2
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollouts-demo
  targetCPUUtilizationPercentage: 10
```

When we deployed a canary, only two replicas were created at first (instead of the five specified in the Rollout).

Validating the Scaling of Pods by HPA With Synthetic Load

We can run the following command to generate load against a pod:

```shell
kubectl run -i --tty load-generator-1 --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://<<service name>>.<<namespace>>; done;"
```

Use the following command to observe the CPU utilization of the pods created by HPA:

```shell
kubectl get hpa hpa-rollout-example -n <<namespace>> --watch
```

Once the load exceeds the 10% target (in our case it reached 14%), new pods are created. Many other metrics, such as latency or throughput, can also be used by HPA as criteria for scaling pods up or down.

Video

Below is a video by Ravi Verma, CTO of IMESH, giving a walkthrough of advanced traffic management in canary deployments for enterprises at scale using Istio and Argo Rollouts.

Final Thought

As the pace of releasing software increases with the maturity of the CI/CD process, new complications will emerge, and so will new requirements for the DevOps team to tackle these challenges. Likewise, when the DevOps team adopts the canary deployment strategy, new scaling and traffic management challenges emerge in gaining granular control over the rapid release process and infrastructure cost.
The Trivial Answer

Most engineers know that we must have green builds because a red build indicates some kind of issue: a test did not pass, a tool found a vulnerability, or we managed to push code that couldn't even compile. Either way, it is bad. You might have noticed that this article is far from over, so there must be more to this. You are right!

What Does Green Mean Exactly?

We have already established that red means something is wrong, but can we say that green is the opposite? Does it guarantee that everything is working great, meets the requirements, and is ready to deploy? As usual, it depends. When your build turns green, we can say that:

- The code compiled (assuming you are using a language with a compiler).
- The existing (and executed) tests passed.
- The analyzers found no critical issues that needed to be fixed right away.
- We were able to push our binaries to an artifact storage or image registry.
- Depending on our setup, we might be ready to deploy our code at the moment.

Why am I still not saying anything definite about the state of the software even when the tests passed? It is because I am simply not sure whether a couple of important things are addressed by our theoretical CI system in this thought experiment. Let me list a couple of the factors I am worried about in the following sections.

Test Quality

I won't go deep into details, as testing and good-quality tests are bigger topics deserving more focus than I could squeeze in here. Still, when talking about test quality, we should at least mention the following as bullet points:

- Do we have sufficient test coverage?
- Are our test cases making strict assertions that can discover the issues we want to discover?
- Are we testing the things we should? Meaning: are we focusing on the most important requirements first instead of testing the easy parts?
- Are our tests reliable and, in general, following the F.I.R.S.T. principles?
- Are we running our tests with each build of the code they are testing?
- Are we aware of the test pyramid and following the related recommendations?

Augmenting these generic ideas, I would like to mention a few additional thoughts in a bit more detail. Please find them in the following sections.

What Kinds of Dependencies Are We Using in Our Tests?

In the lower layers of the test pyramid, we should prefer test doubles over real dependencies to help us focus on the test case and to let us generate the exceptional scenarios we need to cover in our code.

Do We Know What We Should Focus on for Each Layer of the Test Pyramid?

The test pyramid is not only about the number of tests we should have on each layer; it gives us an idea about their intent as well. For example, unit tests should test only a small unit (i.e., a single class) to prod and poke our code and see how it behaves in a wide variety of situations, assuming that everything else is working well. As we go higher, the focus moves to how our classes behave when they are integrated into a component, still relying on test doubles to eliminate any dependency (and any unknowns) related to the third-party components used by our code. Then, in the integration tests, we should focus on the integration of our components with their true dependencies to avoid any issues caused by the imperfections of the test doubles used in our lower-layer tests. In the end, the system tests can use an end-to-end mindset to observe how the whole system behaves from the end user's point of view.
Are Our Code Dependencies Following Similar Practices?

Hopefully, the dependency selection process considers the maturity and reliability of the dependencies as well as their functionality. This is very important because we must be able to trust that our dependencies are doing what they say they do. Thorough testing of a dependency can help us build this trust, while a lack of tests can do the opposite. My personal opinion is that I cannot expect my users to test my code when they pick my components as dependencies: not only can they not possibly do it well, but I won't know when their tests fail because my code contains a bug that I was supposed to find and fix when I released my component. For the same reason, when I am using a dependency, I think it is reasonable to expect that I should not have to test that dependency.

Having Repeatable Builds

It can be a great feeling when our build turns green after a hard day's work. It can give us pride, a feeling of accomplishment, or even closure, depending on the context. Yet it can be an empty promise, a lie that does very little good (other than generating a bunch of happy chemicals for our brain) if we cannot repeat it when we need to. Fortunately, there is a way to avoid these issues if we consider the following factors.

Using Reliable Tags

It is almost a no-brainer that we need to tag our builds to be able to get back to the exact version we used to build our software. This is a great start for our code at least, but we should keep in mind that nowadays it is almost impossible to imagine a project that starts from an empty directory and does everything on its own without using any dependencies. When using dependencies, we can make a choice between convenience and doing the right thing. On one hand, the convenient option lets us use the latest dependencies without doing anything: we just use the wildcard or the special version constant supported by our build tool to let it resolve the latest stable version during the build. On the other hand, we can pin down our dependencies; maybe we can even vendor them if we want to avoid some nasty surprises and keep a decent security posture. If we do the right thing, we will be able to repeat the build process using the exact same dependencies as before, giving us a better chance of producing the exact same artifact if needed. In the other case, we would be hard-pressed to do the same a month or two after the original build. In my opinion, this seriously undermines the usability of our tags and makes me trust the process less.

Using the Same Configuration

Being able to produce the same artifact when rebuilding the code is only half of the battle. We must also be able to repeat the same steps during the build and use the same application configuration for the deployments, so that we run our tests with the same code, the same configuration, and the same input.

It Shouldn't Start With the Main Branch

Although we are doing this work in order to have repeatable builds on the main branch, the process should not start there. If we want to be sure that the thing we are about to merge won't break the main build, we should at least try building it using the same tools and tests before we click merge. Luckily, Git branch protection rules are very good at this.
To avoid broken builds, we should make sure that:

- PRs cannot be merged without both the necessary approvals and a successful build validating everything the main build will validate as well.*
- The branch is up to date, meaning that it contains all changes from the main branch as well. Good code can still cause failures if the main branch contains incompatible changes.

*Note: Of course, this is not trivial to achieve. How can we test, for example, that the artifact will be successfully published to the registry containing the final, ready-to-deploy artifacts? Or how could we verify that we will be able to push the Git tag when we release using the other workflow? Still, we should do our best to minimize the number of differences, just like we do when we are testing our code. Using this approach, we can discover the slight incompatibilities of otherwise well-working changes before we merge them into main.

Why Do We Need Green Builds Then?

To be honest, green builds are not what we need. They are only the closest thing we have to what we actually need: a reliable indicator of working software. We need this indicator because we must be able to start developing the next feature or fix a production bug as soon as it is discovered. Without being 100% sure that the main branch contains working software, we cannot do either; first we would need to check whether the software still works and fix the build if it is broken. In many cases, broken builds are caused not by our own changes but by external factors: for example, without pinning down all dependencies, we cannot guarantee the same input for the build, so even green builds cannot be considered reliable indicators. This is true not only for code dependencies, but for any dependency we use in our tests as well. Of course, we cannot avoid every potential cause of failure; we can't do anything about security issues that are noticed after our initial build, and quite naturally these can still cause build failures. My point is that we should do our best in the areas we control, like the tests, where we can rely on test doubles for the lower layers of the test pyramid.

What Can You Do When Facing These Issues?

Work on improving build repeatability. You can:

- Consider pinning down all your dependencies to use the same components in your tests. This can be achieved by:
  - Using fixed versions instead of ranges in Maven or Gradle, and making sure the dependencies of your dependencies remain pinned, too, by checking whether their build files contain any ranges
  - Using SHA-256 manifest digests for Docker images instead of tag names (see the sketch after this list)
- Make sure that you are performing the same test cases as before by:
  - Following general testing best practices like the F.I.R.S.T. principles
  - Starting from the same initial state in the case of every other dependency (cleaning up database content, user accounts, etc.)
  - Performing the same steps (with similar or equivalent data)
- Make sure you always tag:
  - Your releases
  - Your application configuration
  - The steps of the build pipeline you have used for the build
- Apply strict branch protection rules.
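To make the Docker image case concrete, here is a minimal sketch of a Kubernetes container spec pinned by digest; the image name is a placeholder and the digest value is illustrative.

```yaml
# Illustrative only: a tag such as :1.4.2 can be re-pushed and silently
# change, while a sha256 digest always resolves to the same image bytes.
spec:
  containers:
    - name: app
      # Pinned by digest (placeholder value) instead of a mutable tag:
      image: registry.example.com/team/app@sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```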
What Should We Not Try to Fix?

We should keep in mind that this exercise is not about zealously working until we can just push a build button repeatedly and expect the exact same workflow to do the exact same thing every time like clockwork. This could be an asymptotic goal, but in my opinion, it shouldn't be. The goal is not to do the same thing and produce the exact same output, because we don't need that. We have already built the project, published all versioned binary artifacts, and saved all test results the first time around. Rebuilding and overwriting these can be harmful, because it can become a way to rewrite history, and then we could never trust our versioning or artifacts again. When a build step produces an artifact that is saved somewhere (be it a binary, a test report, some code scan findings, etc.), that artifact should be handled as a read-only archive and should never change once saved. Therefore, if someone kicks off a build from a previously successfully built tag, it is allowed (or even expected) to fail when the artifact uploads are attempted.

In Conclusion

I hope this article helped you realize that focusing on the letter of the law is less important than the spirit of the law. It does not matter that you had a green build if you are not able to demonstrate that your tagged software remained ready for deployment. At the end of the day, if you have a P1 issue in production, nobody will care that your software was ready to deploy in the past if you cannot show that it is still ready to deploy now, so that we can start working on the next increment without additional unexpected problems. What do you think about this? Let me know in the comments!
DevOps has become the groundwork for delivering top-notch applications quickly and efficiently in today's agile development. Its efficiency and speed can also introduce notable security threats if vulnerabilities are not managed properly: sixty percent of data breaches succeed because organizations fail to apply known, available patches before the weaknesses are exploited. This piece explores the key role vulnerability management plays in DevOps infrastructure, spotlighting methods to incorporate resilient defenses while retaining the swiftness and creativity provided by DevOps.

The Essentials of Vulnerability Management

Vulnerability management is an indispensable part of an organization's security measures. It is a preventive method that spots, examines, handles, and reviews security weaknesses in systems and software. Because unpatched vulnerabilities play a role in so many breaches, vulnerability management aims to reduce the opportunities threat actors have for exploitation. It consists of these key components:

- Identification: Scanning software and systems to find weaknesses
- Evaluation: Reviewing the severity and likely impact of the weaknesses found
- Treatment: Taking steps to remediate vulnerabilities
- Reporting: Recording and communicating the status and progress of security enhancement efforts

Common Types of Vulnerabilities in Software and Infrastructure

Weaknesses within infrastructure and software come in numerous forms. Common types include:

- Code flaws: Problems in software code that can be exploited
- Configuration issues: Errors in application or security configuration that leave systems exposed
- Outdated software: Running outdated versions with known flaws
- Weak authentication: Inadequate security measures that can be evaded very easily
- Third-party components: Weaknesses in external dependencies incorporated into the software

The Impact of Unmanaged Vulnerabilities on DevOps Workflows

Unmanaged weaknesses can significantly influence DevOps workflows and lead to critical repercussions, including attackers leveraging weaknesses to breach systems, causing severe financial and reputational damage. Operational disruption is another impact: when an attack targets a weak spot, it can disrupt services and lead to system downtime, affecting business continuity. An inability to handle vulnerabilities can also result in non-compliance with regulatory frameworks, attracting legal consequences. Lastly, an organization's market position and reliability are at stake; persistent security problems erode customers' and stakeholders' trust.

Integration of Vulnerability Management in DevOps

DevSecOps involves adding security measures to every software development phase. This method relies on cooperation among the operations, development, and security teams, making security a collective obligation. Focusing on security from the start enables identifying and resolving issues early, preventing them from becoming hurdles later and reducing the risk associated with security issues once the software is complete. CI/CD pipelines are pivotal here: they automate the integration and delivery of code changes. Including security scans in these pipelines allows for consistent vulnerability testing, making sure every modification is tested for security issues; a minimal sketch of such a pipeline follows.
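As one way to wire such scans into a pipeline, here is a minimal sketch of a GitLab CI configuration that pulls in GitLab's bundled SAST and dependency scanning job templates; the template paths follow GitLab's documented conventions but should be verified against your GitLab version.

```yaml
# Minimal sketch: include GitLab's maintained security-scan templates so
# every push is scanned alongside the normal build and test stages.
include:
  - template: Security/SAST.gitlab-ci.yml                 # static analysis of source code
  - template: Security/Dependency-Scanning.gitlab-ci.yml  # SCA for third-party components

stages:
  - build
  - test   # the included scan jobs attach to the test stage

build-job:
  stage: build
  script:
    - echo "compile and package the application here"
```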
Running these automated tests alongside normal tests gives developers quick feedback, so they can resolve issues before those issues become a roadblock to the development process.

Automated Tools

To manage security vulnerabilities, it is important to use automated tools. Different types of tools apply to different phases:

- Static application security testing (SAST): These tools check the code for weaknesses before it is deployed.
- Dynamic application security testing (DAST): DAST tools find vulnerabilities in an application or system that is already running.
- Software composition analysis (SCA): SCA tools check the external dependencies used in a project for known security problems.

In addition, GitLab CI, Jenkins, and CircleCI are tools that can easily weave security checks into the build process, allowing for constant observation and quick patching of any security risks.

Comprehensive Vulnerability Management Strategy

According to research by Positive Technologies, 84 percent of companies have high-risk vulnerabilities on their external networks. The research also found that more than half of these weaknesses could be eliminated simply by installing updates.

The first step in comprehensive vulnerability management is carrying out rigorous tests to discover the susceptibilities that exist within software and systems. Once these weaknesses have been identified, rank them in order of priority: focus on the most critical exposures first, as high-risk vulnerabilities can cause greater damage, and resolve lower risks later.

Setting up well-defined vulnerability management rules is indispensable. This guideline should draft how weaknesses will be identified, reported, and reviewed, and it should clarify the obligations of team members so everyone knows the role they play in maintaining security. A sophisticated framework includes:

- Patch management
- Regular vulnerability scans
- An incident response plan to manage any data breach that could happen regardless of safety measures

The value of an enlightened team cannot be overstated. To keep your DevOps team updated on new security risks and model procedures, regular training and awareness programs are needed. These programs should address how to discover susceptibilities in code and software, how security tools can be used efficiently, and the benefits of compliance with the vulnerability management policy. When your team has the right skills and knowledge, vulnerabilities will be preemptively identified and alleviated before they can be capitalized on.

Security is a continuous operation, not a task to be performed once. To adjust to new risks and weaknesses, security baselines should be regularly updated and maintained. This involves:

- Monitoring systems and software regularly
- Applying updates and patches promptly
- Revising security policies as necessary
- Conducting audits and assessments to ensure compliance with security standards and discover areas that need improvement

Embracing a Proactive Security Culture in DevOps

To maintain the integrity and security of cutting-edge software, effective vulnerability management in a DevOps environment is crucial. Integrating security practices into every phase of DevOps development helps organizations preemptively uncover and mitigate weaknesses, ensuring a robust and resilient infrastructure. DevOps continues to evolve, and so do cyber threats. To maintain reliability and stay safe from unfolding risks, prioritize security and embrace a culture of continuous progress.
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.

In today's cybersecurity landscape, securing the software supply chain has become increasingly crucial. The rise of complex software ecosystems and third-party dependencies has introduced new vulnerabilities and threats, making it imperative to adopt robust security measures. This article delves into the significance of a software bill of materials (SBOM) and DevSecOps practices for enhancing application security. We will cover key points such as the importance of software supply chain security, the role of SBOMs, the integration of DevSecOps, and practical steps to secure your software supply chain.

Understanding the Importance of Software Supply Chain Security

Software supply chain security encompasses the protection of all components and processes involved in the creation, deployment, and maintenance of software. This includes source code, libraries, development tools, and third-party services. As software systems grow more interconnected, the attack surface expands, making supply chain security a critical focus area. The software supply chain is vulnerable to various threats, including:

- Malicious code injection – attackers embedding malicious code into software components
- Dependency hijacking – compromising third-party libraries and dependencies
- Code tampering – making unauthorized modifications to source code
- Credential theft – stealing credentials to access and manipulate development environments

To combat these threats, a comprehensive approach to software supply chain security entails:

- Continuous monitoring and assessment – regularly evaluating the security posture of all supply chain components
- Collaboration and transparency – fostering open communication between developers, security teams, and third-party vendors
- Proactive threat management – identifying and mitigating potential threats before they can cause damage

The Importance of an SBOM and Why It Matters for Supply Chain Security

An SBOM is a detailed inventory of all components, dependencies, and libraries used in a software application. It provides visibility into the software's composition, enabling organizations to:

- Identify vulnerabilities – By knowing exactly what components are in use, security teams can swiftly identify which parts of the software are affected by newly discovered vulnerabilities, significantly reducing the time required for remediation and mitigating potential risks.
- Ensure compliance – Many regulations mandate transparency in software components to ensure security and integrity. An SBOM helps organizations adhere to these regulations by providing a clear record of all software components, demonstrating compliance, and avoiding potential legal and financial repercussions.
- Improve transparency – An SBOM allows all stakeholders, including developers, security teams, and customers, to understand the software's composition. This transparency fosters better communication, facilitates informed decision making, and builds confidence in the security and reliability of the software.
- Enhance supply chain security – Detailed insights into the software supply chain help organizations manage third-party risks more effectively. Having an SBOM allows for better assessment and monitoring of third-party components, reducing the likelihood of supply chain attacks and ensuring that all components meet security and quality standards.
Table 1. SBOM benefits and challenges

| Benefits | Challenges |
|----------|------------|
| Enhanced visibility of all software components | Creating and maintaining an accurate SBOM |
| Faster vulnerability identification and remediation | Integrating SBOM practices into existing workflows |
| Improved compliance with regulatory standards | Ensuring SBOM data accuracy and reliability across the entire software development lifecycle (SDLC) |

Regulatory and Compliance Aspects Related to SBOMs

Regulatory bodies increasingly mandate the use of SBOMs to ensure software transparency and security. Compliance with standards such as the Cybersecurity Maturity Model Certification (CMMC) and Executive Order 14028 on "Improving the Nation's Cybersecurity" emphasizes the need for comprehensive SBOM practices to ensure detailed visibility and accountability for software components. This enhances security by quickly identifying and mitigating vulnerabilities while ensuring compliance with regulatory requirements and maintaining supply chain integrity. SBOMs also facilitate rapid response to newly discovered threats, reducing the risk of malicious code introduction.

Creating and Managing SBOMs

Creating an SBOM involves generating a detailed inventory of all software components, dependencies, and libraries and maintaining it accurately throughout the SDLC to ensure security and compliance. General steps to create an SBOM include:

- Identify components – list all software components, including libraries, dependencies, and tools
- Document metadata – record version information, licenses, and source details for each component
- Automate SBOM generation – use automated tools to generate and update SBOMs
- Regular updates – continuously update the SBOM to reflect changes in the software

Several tools and technologies aid in managing SBOMs, such as:

- CycloneDX, a standard format for creating SBOMs
- OWASP Dependency-Check, which identifies known vulnerabilities in project dependencies
- Syft, which generates SBOMs for container images and filesystems

Best Practices for Maintaining and Updating SBOMs

Maintaining and updating an SBOM is crucial for ensuring the security and integrity of software applications. Let's review some best practices to follow.

Automate Updates

Automating the update process of SBOMs is essential to keeping them current and accurate. Automated tools can continuously monitor software components and dependencies, identifying any changes or updates needed to the SBOM. This practice reduces the risk of human error and ensures that the SBOM reflects the latest state of the software, which is critical for vulnerability management and compliance.

Implementation tips:

- Use automation tools like CycloneDX and Syft that integrate seamlessly with your existing development environment
- Schedule regular automated scans to detect updates or changes in software components
- Ensure that the automation process includes notification mechanisms to alert relevant teams of any significant changes

Practices to avoid:

- Relying solely on manual updates, which can lead to outdated and inaccurate SBOMs
- Overlooking the importance of tool configuration and updates to adapt to new security threats

Integrate Into CI/CD

Embedding SBOM generation into the continuous integration and continuous deployment (CI/CD) pipeline ensures that SBOMs are generated and updated automatically as part of the SDLC. This integration ensures that every software build includes an up-to-date SBOM, enabling developers to identify and address vulnerabilities early in the process; a minimal sketch of such a pipeline step follows.
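By way of example, here is a minimal sketch of a GitHub Actions job that regenerates a CycloneDX SBOM with Syft on every build; the installer URL and CLI flags follow Syft's documented conventions but should be verified before use.

```yaml
# Minimal sketch: rebuild the SBOM on every build and keep it as a build
# artifact, so each build ships with an up-to-date component inventory.
sbom:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install Syft
      run: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sudo sh -s -- -b /usr/local/bin
    - name: Generate a CycloneDX SBOM for the repository
      run: syft dir:. -o cyclonedx-json > sbom.cdx.json
    - name: Archive the SBOM alongside the build
      uses: actions/upload-artifact@v4
      with:
        name: sbom
        path: sbom.cdx.json
```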
Implementation tips:

- Define clear triggers within the CI/CD pipeline to generate or update SBOMs at specific stages, such as code commits or builds
- Use tools like Jenkins and GitLab CI that support SBOM generation and integrate with popular CI/CD platforms
- Train development teams on the importance of SBOMs and how to use them effectively within the CI/CD process

Practices to avoid:

- Neglecting the integration of SBOM generation into the CI/CD pipeline, which can lead to delays and missed vulnerabilities
- Failing to align SBOM practices with overall development workflows and objectives

Regular Audits

Conducting periodic audits of SBOMs is vital to verifying their accuracy and completeness. Regular audits help identify discrepancies or outdated information and ensure that the SBOM accurately reflects the software's current state. These audits should be scheduled based on the complexity and frequency of software updates.

Implementation tips:

- Establish a routine audit schedule, such as monthly or quarterly, depending on the project's needs
- Involve security experts in the auditing process to identify potential vulnerabilities and ensure compliance
- Use audit findings to refine and improve SBOM management practices

Practices to avoid:

- Skipping audits, which can lead to undetected security risks and compliance issues
- Conducting audits without a structured plan or framework, resulting in incomplete or ineffective assessments

DevSecOps and Its Role in Software Supply Chain Security

DevSecOps integrates security practices into the DevOps pipeline, ensuring that security is a shared responsibility throughout the SDLC. This approach enhances supply chain security by embedding security checks and processes into every stage of development.

Key Principles and Practices of DevSecOps

The implementation of key DevSecOps principles can bring several benefits and challenges to organizations adopting the practice.

Table 3. DevSecOps benefits and challenges

| Benefits | Challenges |
|----------|------------|
| Identifies and addresses security issues early in the development process | Requires a shift in mindset toward prioritizing security |
| Streamlines security processes, reducing delays and improving efficiency | Integrating security tools into existing pipelines can be complex |
| Promotes a culture of shared responsibility for security | Ensuring SBOM data accuracy and reliability |

Automation

Automation in DevSecOps involves integrating security tests and vulnerability scans into the development pipeline. By automating these processes, organizations can ensure consistent and efficient security checks, reducing human error and increasing the speed of detection and remediation of vulnerabilities. This is particularly important in software supply chain security, where timely identification of issues can prevent vulnerabilities from being propagated through dependencies.

Implementation tip: Use tools like Jenkins to automate security testing within your CI/CD pipeline.

Collaboration

Collaboration between development, security, and operations teams is essential in DevSecOps. This principle emphasizes breaking down silos and fostering open communication and cooperation among all stakeholders. Effective collaboration ensures that security considerations are integrated from the start, leading to more secure software development processes.

Implementation tip: Establish regular cross-team meetings and use collaboration tools to facilitate communication and knowledge sharing.
Continuous Improvement

Continuous improvement in DevSecOps involves regularly updating security practices based on feedback, new threats, and evolving technologies. This principle ensures that security measures remain effective and relevant and that they adapt to changes in the threat landscape and technological advancements.

Implementation tip: Use metrics and key performance indicators (KPIs) to evaluate the effectiveness of security practices and identify areas for improvement.

Shift-Left Security

Shift-left security involves integrating security early in the development process rather than addressing it at the end. This approach allows developers to identify and resolve security issues during the initial stages of development, reducing the cost and complexity of fixing vulnerabilities later.

Implementation tip: Conduct security training for developers and incorporate security testing tools into the development environment.

Application Security Testing in DevSecOps

Application security testing is crucial in DevSecOps to ensure that vulnerabilities are detected and addressed early. It enhances the overall security of applications by continuously monitoring and testing for potential threats. The following are different security testing methods that can be implemented:

- Static application security testing (SAST) analyzes source code for vulnerabilities.
- Dynamic application security testing (DAST) tests running applications for security issues.
- Interactive application security testing (IAST) combines elements of SAST and DAST for comprehensive testing.

Open-source tools and frameworks that facilitate application security testing include:

- SonarQube, a static code analysis tool
- OWASP ZAP, a dynamic application security testing tool
- Grype, a vulnerability scanner for container images and filesystems

Integrating Security Into CI/CD Pipelines

Integrating security into CI/CD pipelines is essential to ensure that security checks are consistently applied throughout the SDLC. By embedding security practices into the CI/CD workflow, teams can detect and address vulnerabilities early, enhancing the overall security posture of the application. Here are the key steps to achieve this, with a sketch of one such automated check after this list:

- Incorporate security tests into CI/CD workflows
- Use automated tools to scan for vulnerabilities during builds
- Continuously monitor for security issues and respond promptly
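As a concrete example of such an automated check, here is a minimal sketch of a GitHub Actions job that runs an OWASP ZAP baseline scan against a deployed staging environment; the staging URL is a placeholder, and the image and script names follow ZAP's documented Docker usage but should be verified.

```yaml
# Minimal sketch: run a passive ZAP baseline scan against staging and fail
# the job when alerts are raised, keeping the HTML report as an artifact.
dast-baseline:
  runs-on: ubuntu-latest
  steps:
    - name: Run OWASP ZAP baseline scan
      run: |
        docker run --rm -v "$PWD:/zap/wrk" ghcr.io/zaproxy/zaproxy:stable \
          zap-baseline.py -t https://staging.example.com -r zap-report.html
    - name: Keep the scan report
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: zap-report
        path: zap-report.html
```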
Automating Security Checks and Vulnerability Scanning

Automation ensures that security practices are applied uniformly, reducing the risk of human error and oversight of critical security vulnerabilities. Automated security checks can quickly identify vulnerabilities, allowing for faster remediation and reducing the window of opportunity for attackers to exploit weaknesses. DevSecOps emphasizes building security into every stage of development and automating it wherever possible, rather than treating it as an afterthought. Open-source CI/CD tools like Jenkins, GitLab CI, and CircleCI can integrate security tests into the pipeline. While automation offers significant benefits, there are scenarios where it may not be appropriate, such as:

- Highly specialized security assessments
- Context-sensitive analysis
- Initial setup and configuration
- False positives and negatives

Ensuring Continuous Security Throughout the SDLC

Implement continuous security practices to maintain a strong security posture throughout the SDLC, and regularly update security policies, tools, and practices to adapt to evolving threats. This proactive approach not only helps in detecting and mitigating vulnerabilities early but also ensures that security is integrated into every phase of development, from design to deployment. By fostering a culture of continuous security improvement, organizations can better protect their software assets and reduce the likelihood of breaches.

Practical Steps to Secure Your Software Supply Chain

Implementing robust security measures in your software supply chain is essential for protecting against vulnerabilities and ensuring the integrity of your software. Here are practical steps to achieve this:

Establishing a security-first culture:

☑ Implement training and awareness programs for developers and stakeholders
☑ Encourage collaboration between security and development teams
☑ Ensure leadership supports a security-first mindset

Implementing access controls and identity management:

☑ Implement least privilege access controls to minimize potential attack vectors
☑ Secure identities and manage permissions using best practices for identity management

Auditing and monitoring the supply chain:

☑ Continuously audit and monitor the supply chain
☑ Utilize open-source tools and techniques for monitoring
☑ Establish processes for responding to detected vulnerabilities

Key Considerations for Successful Implementation

To successfully implement security practices within an organization, it's crucial to consider both scalability and flexibility as well as the effectiveness of the measures employed. These considerations ensure that security practices can grow with the organization and remain effective against evolving threats.

Ensuring scalability and flexibility:

☑ Design security practices that can scale with your organization
☑ Adapt to changing threat landscapes and technological advancements using flexible tools and frameworks that support diverse environments

Measuring effectiveness:

☑ Evaluate the effectiveness of security efforts using key metrics and KPIs
☑ Regularly review and assess security practices
☑ Use feedback to continuously improve security measures

Conclusion

Securing the software supply chain is crucial in today's interconnected world. By adopting SBOM and DevSecOps practices using open-source tools, organizations can enhance their application security and mitigate risks. Implementing these strategies requires a comprehensive approach, continuous improvement, and a security-first culture. For further learning and implementation, explore the resources below and stay up to date with the latest developments in cybersecurity.

Additional resources:

- "Modern DevSecOps: Benefits, Challenges, and Integrations To Achieve DevSecOps Excellence" by Akanksha Pathak
- "Building Resilient Cybersecurity Into Supply Chain Operations: A Technical Approach" by Akanksha Pathak
- "Demystifying SAST, DAST, IAST, and RASP: A Comparative Guide" by Apostolos Giannakidis
- Software Supply Chain Security: Core Practices to Secure the SDLC and Manage Risk by Justin Albano, DZone Refcard
- Getting Started With CI/CD Pipeline Security by Sudip Sengupta and Collin Chau, DZone Refcard
- Getting Started With DevSecOps by Caroline Wong, DZone Refcard

This is an excerpt from DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.
Are you curious about what experienced practitioners are saying about AI and platform engineering, and their growing impact on development workflows? Look no further than DZone's latest event with PlatformCon 2024, where our global software community answers these vital questions in an expert panel on all things platform engineering, AI, and beyond.

What Developers Must Know About AI and Platform Engineering

Moderated by DZone Core member and Director of Data and AI at Silk, Kellyn Pot'Vin-Gorman, panelists Ryan Murray, Sandra Borda, and Chiradeep Vittal discussed the most probing questions and deliberations facing AI and platform engineering today. Check out the panel discussion in its entirety here. Important questions and talking points discussed include:

- How has AI transformed the platform engineering landscape?
- Examples of how AI has improved developer productivity within organizations
- What are some of the challenges you've faced when integrating AI into your development workflow, and how have those been addressed?
- What are some anti-patterns or caveats when integrating GenAI into engineering platforms and the SDLC more broadly?
- What are some practical steps or strategies for organizations looking to start incorporating AI into their platform engineering efforts?
- ...and more!
Whether you are on the cloud or setting up your AIOps pipeline, automation has simplified the setup, configuration, and installation of your deployments. Infrastructure as Code (IaC) plays an especially important role in setting up infrastructure: with IaC tools, you describe the desired configuration and state of your infrastructure in code. Popular IaC tools include Terraform, Pulumi, AWS CloudFormation, and Ansible, each offering different capabilities for automating the deployment and management of infrastructure both in the cloud and on-premises.

With the growing complexity of applications and a heightened focus on security in software development, tools like SonarQube and Mend have become increasingly relevant. As explained in my previous article, SonarQube is a code analysis tool aimed at helping developers produce high-quality code by spotting bugs and vulnerabilities across several programming languages. SonarQube integrates well into Continuous Integration/Continuous Deployment pipelines, producing continuous feedback while enforcing coding standards. Mend (formerly WhiteSource) handles software composition analysis (SCA), helping organizations manage and secure their open-source components. It is a security solution that integrates well with IaC tools to improve the security posture of infrastructure deployments: Mend automates vulnerability scanning and management for IaC code, allowing customers to catch security issues very early in the development cycle.

Terraform for Infrastructure as Code

Terraform is a HashiCorp-developed tool that enables developers and operations teams to define, provision, and manage infrastructure using a declarative language known as HashiCorp Configuration Language (HCL); HCL2 is the current version. Terraform is provider agnostic, offering the ability to manage resources across several cloud platforms and services with a single tool. Some of Terraform's standout features include:

- Declarative syntax: You describe what you want, and Terraform figures out how to create it.
- Plan and apply workflow: Terraform's plan command shows what changes will be made before actually applying them, reducing the risk of unintended modifications.
- State management: Terraform keeps track of your infrastructure's current state, enabling incremental changes and drift detection.
- Modularity: Reusable modules allow teams to standardize and share infrastructure elements across projects.

IaC Tools in the Ecosystem

Alongside Terraform, a number of other tools offer different capabilities depending on users' needs and target environments:

- AWS CloudFormation: Specifically designed for AWS, it provides deep integration with AWS services but lacks multi-cloud support.
- Azure Resource Manager (ARM) templates: Similar to CloudFormation, but for Azure resources
- Google Cloud Deployment Manager: Google Cloud's native IaC solution
- Pulumi: Allows developers to use familiar programming languages like Python, TypeScript, and Go to define infrastructure (see the short sketch below)
- Ansible: While primarily a configuration management tool, Ansible can also be used for infrastructure provisioning.
- Chef and Puppet: Configuration management tools that can be extended for infrastructure provisioning
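To make the declarative model concrete, here is a minimal sketch of the Pulumi approach mentioned above, written in Python. It assumes the `pulumi` and `pulumi_aws` packages are installed and AWS credentials are configured; the resource and tag names are purely illustrative.

```python
import pulumi
import pulumi_aws as aws

# Declare the desired state: a single S3 bucket with a tag.
# Pulumi compares this declaration against its recorded state and
# previews the diff (`pulumi preview`) before applying it (`pulumi up`),
# mirroring Terraform's plan/apply workflow.
bucket = aws.s3.Bucket("app-artifacts", tags={"environment": "dev"})

# Export an output so other stacks or scripts can reference it.
pulumi.export("bucket_name", bucket.id)
```

Running `pulumi up` twice in a row should report no changes the second time, since the declared and actual states already match.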
Enhancing Security With Mend

As IaC adoption grows, the demand for better security management grows with it. This is where Mend comes in, providing a robust solution for scanning and securing IaC code. Mend incorporates smoothly into the development process and continuously scans Terraform and other IaC code for security issues. The following are some of the ways Mend boosts security without compromising productivity:

- Automated scanning: Mend can scan your IaC code automatically in search of vulnerabilities, misconfigurations, and compliance issues.
- Early detection: Integrated with CI/CD pipelines, Mend spots security vulnerabilities early in the development phase, reducing the cost and effort of fixing them later.
- Custom policies: Teams can develop custom security policies to meet their specific needs and compliance requirements.
- Remediation guidance: Upon detecting a problem, Mend provides clear instructions on the steps needed to rectify it, helping developers address security concerns promptly.
- Compliance mapping: Identified issues are mapped to the requirements of different standards and regulations so that organizations can maintain compliance.
- Continuous monitoring: Even after deployment, Mend continues to monitor your infrastructure for new vulnerabilities or drift from secure configurations.
- Integration with DevOps tools: Mend integrates with popular version control systems, CI/CD platforms, and ticketing systems, making it part of existing workflows.

This proactive approach to security allows teams that adopt Mend in their IaC practices to move fast and innovate while significantly minimizing the risk of security breaches, misconfigurations, and compliance violations. Along with Terraform, Mend supports the following IaC environments and their configuration files:

- Bicep
- CloudFormation
- Kubernetes
- ARM Templates
- Serverless
- Helm

Integrate Mend With GitHub

Mend provides several integration options and tools that GitHub users can use to drive security and vulnerability management in their repositories.

Overview of Mend's Presence on GitHub

Mend for GitHub.com App
This GitHub App has both SCA and SAST capabilities. It can be installed directly from the GitHub Marketplace, allowing easy integration with your repositories.

Mend Bolt
Mend Bolt performs repository scans looking for vulnerabilities in open-source components. It is available free of cost as an app on the GitHub Marketplace and supports over 200 programming languages, with the following features:

- Scanning: Happens automatically after every push, detects vulnerabilities in open-source libraries, and has a five-scan-per-day limit per repository
- Opening issues for vulnerable open-source libraries
- Dependency tree management, along with visualization of dependency trees
- Checks for suggested fixes for vulnerabilities
- Integration with GitHub Checks that stops pull requests with new vulnerabilities from getting merged

Mend Toolkit
Mend maintains a GitHub organization, "mend-toolkit", containing various repositories that host integration knowledge bases, implementation examples, and tools. This includes:

- Mend implementation examples
- Mend SBOM Exporter CLI
- Parsing scripts for YAML files
- Import tools for SPDX or CSV SBOM into Mend

Mend Examples Repository
Under the mend-toolkit organization, there is a "mend-examples" repository with examples of several scanning and result-pulling techniques in Mend.
This includes, among other things:

- SCM integration
- Self-hosted repo setup integration
- CI/CD integration
- Examples of policy checks
- Prioritizing Mend scans by language
- Mend SAST and Mend SCA implementations

Set Up Mend for GitHub

In this article, you will learn how to set up Mend Bolt.

1. Install the Mend App
Go to the GitHub Marketplace. Click "Install" and select the repositories you want to scan. After selecting the repositories, click Install and complete the authorization.

2. Complete the Mend Registration
You'll be redirected to the Mend registration page. Complete the registration if you are a new Mend user and click Submit.

3. Merge the Configuration Pull Request
Mend will automatically create a pull request (PR) in your repository. This PR adds a .whitesource configuration file. Review the PR and merge it to initiate your first scan.

4. Customize Scan Settings
Open the .whitesource file in your repository and modify settings as needed. The key setting to enable IaC scans is enableIaC: true.

```json
{
  "scanSettings": {
    "enableIaC": true,
    "baseBranches": ["main"]
  },
  "checkRunSettings": {
    "vulnerableCheckRunConclusionLevel": "failure",
    "displayMode": "diff",
    "useMendCheckNames": true
  },
  "issueSettings": {
    "minSeverityLevel": "LOW",
    "issueType": "DEPENDENCY"
  }
}
```

Check the other configuration options (Configure Mend for GitHub.com for IaC). Note: IaC scans can only be performed on base branches.

```json
{
  "scanSettings": {
    "enableIaC": true,
    "baseBranches": ["main"]
  },
  "checkRunSettings": {
    "useMendCheckNames": true,
    "iacCheckRunConclusionLevel": "failure"
  }
}
```

Commit the changes to update your scan configuration.

5. Monitor and Review Results
Mend will now scan your repository on each push (limited to five scans per day per repo for Mend Bolt). Check the "Issues" tab in your GitHub repository for vulnerability reports, and review the Mend dashboard for a comprehensive overview of your security status.

6. Remediate Issues
For each vulnerability, Mend provides detailed information and suggested fixes. Create pull requests to update vulnerable dependencies based on Mend's recommendations.

7. Continuous Monitoring
Regularly review Mend scan results and GitHub issues, and keep your .whitesource configuration file updated as your security needs evolve.

You have successfully integrated Mend with GitHub, enabling automated security scanning and vulnerability management for your repositories. Along with GitHub, Mend supports GitHub Enterprise, GitLab, Bitbucket, and others; you can find the supported platforms in the Mend documentation.

Conclusion

The power of IaC tools like Terraform, combined with robust security solutions such as Mend, puts infrastructure management on very strong footing. These technologies and best practices help keep organizations safe while ensuring adaptability and scalability in modern, fast-moving digital environments. Indeed, as we continue raising the bar on what is possible with infrastructure automation, the importance of integrating security throughout the entire life cycle of our infrastructure cannot be overemphasized. Additional best practices, such as version control, modularization, appropriate access permissions, and auditing your code for compliance, provide added security for your IaC code.
In the software development domain, implementing robust and efficient processes helps meet the continuously evolving demands of the industry. The rapid deployment of both infrastructure and software is crucial to maintaining a competitive edge in the market. Adopting methodologies like Infrastructure as Code (IaC) and Continuous Integration/Continuous Deployment (CI/CD) is crucial for swift and consistent software delivery. These approaches transform how software and its infrastructure are built, tested, and deployed. This article explores the fundamental concepts of CI/CD and the differences in its application across two distinct areas: conventional software/application development and IaC.

Table of Contents
- CI/CD Meaning: An Overview
- What Is Continuous Integration (CI)?
- What Is Continuous Delivery (CD)?
- Benefits of CI/CD for Businesses
- CI/CD and Software Development
- CI/CD and Infrastructure as Code (IaC)
- Benefits of CI/CD in IaC
- Comparing CI/CD in Software Development vs. CI/CD in IaC
- Conclusion

CI/CD Meaning: An Overview

CI/CD refers to software development practices that help improve the efficiency, reliability, and speed of the development and deployment processes. They involve automated pipelines for software testing and deployment, meant to improve the product's time to market.

What Is Continuous Integration (CI)?

Continuous Integration involves regularly integrating code changes from multiple contributors into a shared repository. Developers submit their code changes to a version control system (VCS) like Git, where the changes are merged with the deployed product. Automated build and testing processes are triggered during the merge to ensure the new code integrates well with the existing codebase and does not introduce errors. As such, CI helps detect integration issues early, reduce bugs, and create a consistent and stable codebase.

What Is Continuous Delivery (CD)?

Continuous Delivery involves automating software delivery at various stages of the development or production environment. Once code changes pass the continuous integration phase, automated processes take over to deploy the application to different environments, including testing, staging, and production. Consequently, CD facilitates faster and more reliable software delivery, reduced manual intervention, and quick bug fixes and feature updates.

Benefits of CI/CD for Businesses

CI/CD is a key component of the DevOps methodology, also known as the "shift left" culture, that enables collaboration and communication between the development and operations teams to create a more efficient and streamlined software delivery process. Some of the key benefits of implementing CI/CD include:

- Improved metrics: CI/CD enhances software development metrics by automating the tracking of key performance indicators (KPIs), including insights into build success rates, test coverage, and deployment frequency.
- Better deployment: CI/CD implementation ensures smoother and more reliable release processes by automating key stages, such as integration, testing, and deployment. This helps businesses reduce the likelihood of deployment failures, improving the overall reliability of software releases.
- Reduced errors and downtime: Automation within the CI/CD pipeline significantly decreases the chance of human error during development and deployment. With automated testing, issues are detected early in the development cycle, reducing bugs and vulnerabilities in production.
- Increased agility: CI/CD establishes a more agile development environment, enabling businesses to respond quickly to changing market demands.
- Enhanced security: Automated security scans and checks integrated into the CI/CD pipeline help identify and address vulnerabilities early in development.
- Improved time to market: CI/CD speeds up the production process with automated pipelines for testing and deployment to various environments, resulting in faster bug fixes, quicker updates, and a more satisfied user base.

CI/CD and Software/Application Development

CI/CD helps streamline the development lifecycle and enhances software efficiency. CI/CD pipelines use detailed processes, standards, tools, and automation to rapidly integrate continuous phases (source, build, test, and deploy) with precision. Moreover, they are tailored to the specific needs of each project, ensuring efficiency and adaptability. Here's how the Software Development Lifecycle (SDLC) benefits from automated pipelines:

1. Source
The first step in a CI/CD pipeline is creating source code, with developers using language-specific tools like IDEs for code-checking features. This phase relies on code repositories and VCS tools like Git.

2. Build
The build process extracts source code, links components, and compiles them into an executable file. Build tools generate logs, identify errors, and notify developers. These tools are language dependent and can be integrated into integrated development environments (IDEs), supporting both source creation and building in a CI/CD pipeline.

3. Test
After static testing in the source code phase, the built code progresses to dynamic testing. This involves functional and unit testing for new features, regression testing to avoid breaking existing functionality, and additional integration, user acceptance, and performance tests.

4. Deploy
After successfully passing testing, the build becomes a candidate for deployment. Continuous delivery requires approval from human stakeholders prior to deployment, while continuous deployment deploys automatically once the test suite clears.

CI/CD and Infrastructure as Code

IaC involves managing, configuring, and provisioning infrastructure using code. Organizations represent their infrastructure as code, enabling version control and automated deployment through tools like Terraform and Ansible. In IaC, CI/CD helps optimize the management and provisioning of infrastructure through automation. CI/CD in IaC involves a streamlined process, from defining the infrastructure as code to implementing a robust CI/CD pipeline. Let's explore the steps to enable IaC and CI/CD:

Step 1: Define Infrastructure as Code
Define the desired infrastructure configuration using code, enabling automated provisioning and management. You can use tools like Terraform, Pulumi, and Crossplane.

Step 2: Create a CI/CD Pipeline
Build a CI/CD pipeline that automates the integration, testing, and deployment of IaC changes. You can use tools like GitLab CI/CD, Jenkins, or Azure DevOps.

Step 3: Integrate IaC and CI/CD
Integrate IaC practices with CI/CD processes to ensure a unified and automated approach to infrastructure management. This can be achieved by automatically deploying infrastructure changes within the CI/CD pipeline (see the sketch below for one way to gate such changes). This guarantees infrastructure alignment with the latest code modifications, maintaining consistency.
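As an illustration of what such a pipeline step might look like, here is a minimal Python sketch of a CI gate that runs a Terraform plan and fails the job if the plan would destroy any resources. It assumes the terraform CLI is installed and the working directory has already been initialized with terraform init; the destroy-blocking policy itself is just one example of a check you might enforce.

```python
import json
import subprocess
import sys

# Produce a plan file, then render it as JSON for inspection.
subprocess.run(["terraform", "plan", "-out=plan.tfplan"], check=True)
show = subprocess.run(
    ["terraform", "show", "-json", "plan.tfplan"],
    check=True, capture_output=True, text=True,
)
plan = json.loads(show.stdout)

# Collect every resource the plan would delete.
destroyed = [
    rc["address"]
    for rc in plan.get("resource_changes", [])
    if "delete" in rc["change"]["actions"]
]

if destroyed:
    print("Plan would destroy resources:", ", ".join(destroyed))
    sys.exit(1)  # non-zero exit fails the CI job and blocks the apply
print("Plan is destruction-free; safe to apply.")
```

A real pipeline would typically run this between the test and apply stages, executing terraform apply only when the gate passes.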
Benefits of CI/CD in IaC

Integrating CI/CD into IaC has several benefits, including cost reduction, enhanced agility, and faster time to market. Let's explore some of them below:

- Cost reduction: Efficient automation minimizes manual effort, leading to cost savings in infrastructure management.
- Better quality: Automated testing ensures consistent and error-free infrastructure configurations, enhancing overall quality.
- Enhanced agility: Quick, automated deployment of infrastructure changes enables increased adaptability and responsiveness.
- Synchronization between code and infrastructure: Software and IaC changes can be part of the same CI/CD pipeline. Both can be deployed using the same pipeline, ensuring the product and its environment are never out of sync.
- Fast time to market: Accelerated development and deployment cycles result in a faster time to market for applications and services.

Comparing CI/CD in Software/Application Development vs. CI/CD in IaC

CI/CD in software development and in Infrastructure as Code (IaC) share fundamental principles, enabling efficiency and collaboration. Let's explore the common aspects below:

- Automation: In IaC, CI/CD automates infrastructure provisioning and management through code; in software development, it automates the building, testing, and deployment of software.
- Version control: In IaC, CI/CD utilizes version control systems (e.g., Git) to manage changes in infrastructure code; software development employs version control for source code management.
- Collaboration: In IaC, CI/CD encourages collaboration by allowing multiple team members to contribute to infrastructure code; in software development, it promotes collaborative coding and joint development efforts.

The table below compares distinctive features of CI/CD in software/application development vs. IaC.

| Aspect | Software/Application Development | Infrastructure as Code (IaC) |
|---|---|---|
| Focus | Software/application building, testing, and deployment | Infrastructure provisioning and management |
| Code structure | Application source code | Infrastructure and configurations defined as code |
| Deployment process | Automating the build, test, and deployment stages for systematic release of applications | Automating the provisioning and configuration of infrastructure through code |
| Lifecycle management | Software Development Lifecycle (SDLC) | Maintaining and evolving infrastructure |
| Final output | Deployed and operational software applications | Configured and provisioned infrastructure |
| Tools | Jenkins, Travis CI, etc. | Terraform, Pulumi, Crossplane, etc. |

Conclusion

Implementing CI/CD benefits both traditional software development and Infrastructure as Code (IaC). While CI/CD practices streamline software development lifecycles, enhancing efficiency and reliability, their application in IaC optimizes infrastructure management through automation and consistency. Whether applied to software or infrastructure, the principles of CI/CD, such as automation, version control, and collaboration, contribute to the overall success and competitiveness of modern development practices.
Serverless computing has emerged as a transformative approach to deploying and managing applications. The promise is that by abstracting away the underlying infrastructure, developers can focus solely on writing code. While the benefits (scalability, cost efficiency, and performance) are clear, debugging serverless applications presents unique challenges. This post explores effective strategies for debugging serverless applications, particularly focusing on AWS Lambda. Before I proceed, I think it's important to disclose a bias: I am personally not a huge fan of serverless or PaaS after being burned badly by PaaS in the past. However, some smart people like Adam swear by it, so I try to keep an open mind.

Introduction to Serverless Computing

Serverless computing, often referred to as Function as a Service (FaaS), allows developers to build and run applications without managing servers. In this model, cloud providers automatically handle the infrastructure, scaling, and management tasks, enabling developers to focus purely on writing and deploying code. Popular serverless platforms include AWS Lambda, Azure Functions, and Google Cloud Functions.

In contrast, Platform as a Service (PaaS) offers a more managed environment where developers can deploy applications but still need to configure and manage some aspects of the infrastructure. PaaS solutions, such as Heroku and Google App Engine, provide a higher level of abstraction than Infrastructure as a Service (IaaS) but still require some server management.

Kubernetes, which we recently discussed, is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. While Kubernetes offers powerful capabilities for managing complex, multi-container applications, it requires significant expertise to set up and maintain. Serverless computing simplifies this by removing the need for container orchestration and management altogether.

The "catch" is twofold:

- Serverless programming removes the need to understand the servers, but it also removes the ability to rely on them, resulting in more complex architectures.
- Pricing starts off cheap, practically free, but it can escalate quickly, especially in the case of an attack or misconfiguration.

Challenges of Serverless Debugging

While serverless architectures offer real benefits, they also introduce unique debugging challenges. The primary issues stem from the inherent complexity and distributed nature of serverless environments. Here are some of the most pressing challenges.

Disconnected Environments

One of the major hurdles in serverless debugging is the lack of consistency between development, staging, and production environments. While traditional development practices rely on these separate environments to test and validate code changes, serverless architectures often complicate this process. The differences in configuration and scale between these environments can lead to bugs that appear only in production, making them difficult to reproduce and fix.

Lack of Standardization

The serverless ecosystem is highly fragmented, with various vendors offering different tools and frameworks. This lack of standardization can make it challenging to adopt a unified debugging approach. Each platform has its own set of practices and tools, requiring developers to learn and adapt to multiple environments. This is slowly evolving as some platforms gain traction, but since this is a vendor-driven industry, there are many edge cases.
Limited Debugging Tools

Traditional debugging tools, such as step-through debugging and breakpoints, are often unavailable in serverless environments. The managed and controlled nature of serverless functions restricts access to these tools, forcing developers to rely on alternative methods, such as logging and remote debugging.

Concurrency and Scale

Serverless functions are designed to handle high concurrency and to scale seamlessly. However, this can introduce issues that are hard to reproduce in a local development environment. Bugs that manifest only under specific concurrency conditions or high loads are particularly challenging to debug. Note that when I discuss concurrency here, I'm often referring to race conditions between separate services.

Effective Strategies for Serverless Debugging

Despite these challenges, several strategies can make serverless debugging more manageable. By leveraging a combination of local debugging, feature flags, staged rollouts, logging, idempotency, and Infrastructure as Code (IaC), developers can effectively diagnose and fix issues in serverless applications.

Local Debugging With IDE Remote Capabilities

While serverless functions run in the cloud, you can simulate their execution locally using tools like AWS SAM (Serverless Application Model). This involves setting up a local server that mimics the cloud environment, allowing you to run tests and perform basic trial-and-error debugging. To get started, you need to install Docker or Docker Desktop, create an AWS account, and set up the AWS SAM CLI. Deploy your serverless application locally using the SAM CLI, which enables you to run the application and simulate Lambda functions on your local machine. Configure your IDE for remote debugging, launch the application in debug mode, and connect your debugger to the local host. Set breakpoints to step through the code and identify issues.

Using Feature Flags for Debugging

Feature flags allow you to enable or disable parts of your application without deploying new code. This can be invaluable for isolating issues in a live environment. By toggling specific features on or off, you can narrow down the problematic areas and observe the application's behavior under different configurations. Implementing feature flags involves adding conditional checks in your code that control the execution of specific features based on the flag's status (see the sketch below). Monitoring the application with different flag settings helps identify the source of bugs and allows you to test fixes without affecting the entire user base. This is essentially "debugging in production": working on a new feature? Wrap it in a feature flag, which is effectively akin to wrapping the entire feature (client and server) in if statements. You can then enable it conditionally, globally or on a per-user basis. This means you can test the feature and enable or disable it based on configuration, without redeploying the application.
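As a toy illustration of those conditional checks, here is a minimal Python sketch. The flag name, environment variable, and checkout functions are all hypothetical; a production system would typically pull flag state from a configuration service or feature-management platform rather than environment variables.

```python
import os

# Hypothetical flag store sourced from an environment variable;
# real systems would query a config service or flag platform.
FLAGS = {
    "new_checkout_flow": os.environ.get("FF_NEW_CHECKOUT", "false") == "true",
}

def is_enabled(flag, user_id=None):
    """Check a flag. Per-user targeting (e.g., canary users) could be
    layered in here based on user_id."""
    return FLAGS.get(flag, False)

def legacy_checkout(request):
    return {"status": "ok", "flow": "legacy"}  # stable path

def new_checkout(request):
    return {"status": "ok", "flow": "new"}     # path under test

def handle_checkout(request, user_id=None):
    # The entire new feature sits behind one conditional check, so it
    # can be switched off without a redeploy if bugs surface.
    if is_enabled("new_checkout_flow", user_id):
        return new_checkout(request)
    return legacy_checkout(request)

if __name__ == "__main__":
    print(handle_checkout({"cart": []}))
```

Flag checks like this are cheap, but remember to remove stale flags once a feature has fully shipped.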
Staged Rollouts and Canary Deployments

Deploying changes incrementally can help catch bugs before they affect all users. Staged rollouts involve gradually rolling out updates to a small percentage of users before a full deployment. This allows you to monitor the performance and error logs of the new version in a controlled manner, catching issues early. Canary deployments take this a step further by deploying new changes to a small subset of instances (canaries) while the rest of the system runs the stable version. If issues are detected in the canaries, you can roll back the changes without impacting the majority of users. This method limits the impact of potential bugs and provides a safer way to introduce updates. It isn't perfect, as some demographics might be more reluctant to report errors. For server-side issues, however, it makes a lot of sense, since you can see the impact directly in server logs and metrics.

Comprehensive Logging

Logging is one of the most common and essential tools for debugging serverless applications. I have written and spoken a lot about logging in the past. By logging all relevant data points, including the inputs and outputs of your functions, you can trace the flow of execution and identify where things go wrong. However, excessive logging can increase costs, as serverless billing is often based on execution time and resources used. It's important to strike a balance between sufficient logging and cost efficiency. Implementing log levels and selectively enabling detailed logs only when necessary can help manage costs while providing the information needed for debugging. I talk about striking the delicate balance between debuggable code, performance, and cost with logs in the following video. Note that this is a general best practice and not specific to serverless.

Embracing Idempotency

Idempotency, a key concept from functional programming, ensures that functions produce the same result given the same inputs, regardless of how many times they are executed. This simplifies debugging and testing by ensuring consistent and predictable behavior. Designing your serverless functions to be idempotent involves ensuring that they do not have side effects that could alter the outcome when executed multiple times. For example, including timestamps or unique identifiers in your requests can help maintain consistency (see the sketch below). Regularly testing your functions to verify idempotency makes it easier to pinpoint discrepancies and debug issues. Testing is always important, but in serverless and other complex deployments it becomes critical. Awareness and embrace of idempotency allow for more testable code and easier-to-reproduce bugs.
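To make this concrete, here is a minimal Python sketch of an idempotent handler keyed on a caller-supplied unique request ID. The in-memory dictionary stands in for a shared store such as DynamoDB or Redis, and the event shape and "charge" side effect are purely illustrative.

```python
# Hypothetical dedupe store: in production this would be shared,
# durable storage (e.g., DynamoDB or Redis), not process memory.
_processed = {}

def handler(event, context=None):
    """Idempotent handler: repeated deliveries of the same request
    replay the stored result instead of redoing the side effect."""
    request_id = event["request_id"]  # unique ID supplied by the caller

    if request_id in _processed:
        return _processed[request_id]  # duplicate delivery: same result

    # The side effect (e.g., charging a card) runs exactly once per ID.
    result = {"status": "charged", "amount": event["amount"]}
    _processed[request_id] = result
    return result

if __name__ == "__main__":
    evt = {"request_id": "abc-123", "amount": 42}
    print(handler(evt))  # performs the work
    print(handler(evt))  # replays the recorded result
```

With this shape, a retry storm or duplicate queue delivery becomes harmless and far easier to reason about when debugging.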
Debugging a Lambda Application Locally With AWS SAM

Debugging serverless applications, particularly AWS Lambda functions, can be challenging due to their distributed nature and the limitations of traditional debugging tools. However, AWS SAM (Serverless Application Model) provides a way to simulate Lambda functions locally, enabling developers to test and debug their applications more effectively. I will use it as a sample to explore the process of setting up a local debugging environment, running a sample application, and configuring remote debugging.

Setting Up the Local Environment

Before diving into the debugging process, it's crucial to set up a local environment that can simulate the AWS Lambda environment. This involves a few key steps:

- Install Docker: Docker is required to run the local simulation of the Lambda environment. You can download Docker or Docker Desktop from the official Docker website.
- Create an AWS account: If you don't already have an AWS account, you need to create one. Follow the instructions on the AWS account creation page.
- Set up the AWS SAM CLI: The AWS SAM CLI is essential for building and running serverless applications locally. You can install it by following the AWS SAM installation guide.

Running the Hello World Application Locally

To illustrate the debugging process, let's use a simple "Hello World" application. The code for this application can be found in the AWS Hello World tutorial.

1. Deploy Locally
Use the SAM CLI to deploy the Hello World application locally:

```shell
sam local start-api
```

This command starts a local server that simulates the AWS Lambda cloud environment.

2. Trigger the Endpoint
Once the local server is running, you can trigger the endpoint using a curl command:

```shell
curl http://localhost:3000/hello
```

This sends a request to the local server, allowing you to test the function's response.

Configuring Remote Debugging

While running tests locally is a valuable step, it doesn't provide full debugging capabilities. To debug the application, you need to configure remote debugging. This involves several steps.

First, start the application in debug mode using the following SAM command:

```shell
sam local invoke -d 5858
```

This command pauses the application and waits for a debugger to connect.

Next, configure the IDE for remote debugging by setting it up to connect to the local host. This typically involves creating a new run configuration that matches the remote debugging settings. We can now set breakpoints in the code where we want execution to pause, allowing us to step through the code and inspect variables and application state just like in any other local application. We can test this by invoking the endpoint, e.g., using curl. With the debugger connected, we stop on the breakpoint like in any other tool:

```shell
curl http://localhost:3000/hello
```

The application will pause at the breakpoints you set, allowing you to step through the code.

Handling Debugger Timeouts

One significant challenge when debugging Lambda functions is the short timeout setting. Lambda functions are designed to execute quickly, and if they take too long, the costs can become prohibitive. By default, the timeout is set to a short duration, but you can configure it in the template.yaml file, e.g.:

```yaml
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambdaHandler
      Timeout: 60 # timeout in seconds
```

After updating the timeout value, re-issue the sam build command to apply the changes.

In some cases, running the application locally might not be enough. You may need to simulate running on the actual AWS stack to get more accurate debugging information. Solutions like SST (Serverless Stack) or MerLoc can help achieve this, though they are specific to AWS and relatively niche.

Final Word

Serverless debugging requires a combination of strategies to effectively identify and resolve issues. While traditional debugging methods may not always apply, leveraging local debugging, feature flags, staged rollouts, comprehensive logging, idempotency, and IaC can significantly improve your ability to debug serverless applications. As the serverless ecosystem continues to evolve, staying adaptable and continuously updating your debugging techniques will be key to success.

Debugging serverless applications, particularly AWS Lambda functions, can be complex due to their distributed nature and the constraints of traditional debugging tools. However, by leveraging tools like AWS SAM, you can simulate the Lambda environment locally and use remote debugging to step through your code. Adjusting timeout settings and considering advanced simulation tools can further enhance your debugging capabilities.
Zero-day threats are becoming more dangerous than ever. Recently, bad actors took over the TikTok accounts of celebrities and brands through a zero-day hack. In late May to early June, reports of high-profile TikTok users losing control of their accounts started to surface after they opened a direct message. The malware used in the attack was able to infect devices without users downloading or installing anything. TikTok appeared unaware of the extent of the damage. The company's spokesperson, Alex Haurek, said that the number of accounts compromised was "very small," but he declined to provide a specific number. He said the company has been working with the owners of the affected accounts to restore access and has implemented measures to make sure the problem does not happen again.

If a massive company with vast resources can fall victim to a serious zero-day attack, it follows that smaller companies find themselves in an even more vulnerable position. This underscores the importance of maximizing the integration of DevOps and security.

The Threat of Zero-Day Vulnerabilities

Zero-day vulnerabilities are security weaknesses or issues in software that have not yet been discovered, identified, and profiled. Nobody knows they exist, let alone how they work, and no security patches are available to address them. When threat actors discover them, they can launch attacks largely unhindered and unmitigated; most existing cyber defenses tend to be ineffective against such attacks.

TikTok is just one of the major organizations hit by zero-days. In 2017, Microsoft was rocked by a zero-day exploit in MS Word that led to compromised personal bank accounts. In 2020, at least two zero-day vulnerabilities that enabled remote attacks were discovered in Apple's iOS. That same year, the popular video-conferencing platform Zoom sustained a serious zero-day encounter: a vulnerability that made it possible for hackers to take over devices and access files.

However, the lack of information about these vulnerabilities does not mean there are no ways to detect, block, and mitigate them. Zero-days are stoppable, or at least mitigatable, with the right strategies and solutions. They are not easy to address, but defenders can significantly impede threat actors' efforts to attack undetected. One of the best ways to keep them at bay is through DevSecOps.

A Foundation for Modern IT Security

DevOps has been a favorite buzzword in the software development field for the past few years, but it eventually became apparent that security cannot be disregarded in the quest to optimize the software development process and accelerate time to market. Cyber threats have become increasingly aggressive and sophisticated, and it has become necessary to involve developers in building cyber protection. Separate security review solutions are not ineffective per se, but they cannot immediately address issues that lie in the software code itself. With this in mind, it's the developers who are in the best position to implement practices that emphasize security from the ground up while still keeping deliverability high.

For one, developers can adopt the shift-left principle, wherein security testing tools and processes are integrated into the CI/CD pipeline. They can identify and address vulnerabilities during the development phase as part of their standard routine, instead of undertaking a separate security testing phase. This removes many security issues before software is ever deployed.
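As a small illustration of shifting security left, here is a minimal Python sketch of a CI step that checks pinned dependencies against the open-source OSV vulnerability database (https://api.osv.dev). The dependency list is hypothetical; a real pipeline would parse it from a lockfile and run this check on every push.

```python
import json
import sys
import urllib.request

# Hypothetical pinned dependencies to audit; in a real pipeline these
# would be parsed from requirements.txt or a lockfile.
DEPENDENCIES = [("requests", "2.19.0"), ("flask", "2.3.2")]

OSV_URL = "https://api.osv.dev/v1/query"

def known_vulns(name, version):
    """Ask the OSV database for known vulnerabilities in one package."""
    payload = json.dumps({
        "package": {"name": name, "ecosystem": "PyPI"},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        OSV_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])

if __name__ == "__main__":
    failed = False
    for name, version in DEPENDENCIES:
        vulns = known_vulns(name, version)
        if vulns:
            failed = True
            ids = ", ".join(v["id"] for v in vulns)
            print(f"{name}=={version}: {ids}")
    # Non-zero exit fails the CI job, blocking the merge ("shift left").
    sys.exit(1 if failed else 0)
```

Because the job exits non-zero when anything is found, the merge is blocked until the dependency is upgraded, which is exactly the shift-left behavior described above.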
Developers can also embrace secure coding practices by following guidelines and standards like the OWASP Secure Coding Practices and the Open Project's Secure Coding Guidelines. This is also known as the principle of "security by design," wherein developers build software specifically to be resilient against both known and unknown vulnerabilities.

Additionally, DevOps teams can implement continuous vulnerability scanning to constantly check their code for possible weaknesses. This involves the use of vulnerability scanning tools throughout the development pipeline. It entails additional costs, but the security rewards are indisputable: the ability to detect vulnerabilities in real time enables rapid patching and remediation, preventing threat actors from spotting and exploiting the vulnerabilities first.

DevOps teams can also leverage Infrastructure as Code (IaC) to streamline secure cloud environment management. IaC enables the configuration and provisioning of infrastructure through code, which makes it easier to iterate on security configurations and check the code for issues before deployment. Security practices are baked into the code, ensuring the consistent implementation of security standards and mechanisms.

Moreover, DevOps teams can leverage containerization and microservice architectures to isolate applications and make zero-day attacks easier to contain. These do not necessarily prevent the emergence of zero-days, but they help control and resolve the problem. Each container runs in an isolated environment, which means that if a vulnerability is exploited, the compromise can be limited to the affected container. It is also faster to patch the affected container and conduct forensics to ensure that the same problem does not recur in other containers.

DevSecOps Best Practices

A successful DevSecOps strategy requires more than just tools. It is not enough to have security software and testing integrated into the entire development process; organizations should also adopt best practices such as continuous monitoring, regular testing and audits, and employee education.

It is important to use security tools that enable continuous monitoring, including automated and AI-driven services capable of comprehensively monitoring the development process for security issues. AI can also power robust vulnerability alert systems that employ contextualization to avoid security information overload and to make sure that the most crucial and urgent alerts are not buried under insignificant details such as false positives and logs of low-risk events.

It may sound redundant, but security audits and testing are not the same as continuous monitoring. Audits and testing are conducted periodically and target specific areas or functions. Continuous monitoring is an ongoing process that reveals trends and immediately discernible vulnerabilities, but it is not as thorough and in-depth as periodic penetration testing.

Lastly, DevOps teams should have high-level proficiency in security optimization. This requires them to undergo training and to collaborate closely with the security team.

DevSecOps From Day One

Instances of zero-day attacks are unlikely to drop. It is advisable to prepare for them and even anticipate the growing aggressiveness and cunning of threat actors in finding and exploiting vulnerabilities. It makes perfect sense to embrace DevSecOps, even though it may require a paradigm change for many organizations.
Organizations need to adopt new practices and invest in new tools and processes that proactively and more effectively address the security issues associated with zero-day vulnerabilities. DevSecOps is hardly foolproof, and it's possible that no amount of vigilance would have stopped the TikTok zero-day attack. However, organizations definitely have a better chance of avoiding unpredictable security issues if they integrate their security tools, streamline their security processes, implement continuous monitoring, and conduct regular penetration testing and security audits.
Boris Zaikin
Lead Solution Architect,
CloudAstro GmBH
Pavan Belagatti
Developer Evangelist,
SingleStore
Lipsa Das
Content Strategist & Automation Developer,
Spiritwish