DevOps and CI/CD

The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).

Latest Refcards and Trend Reports
Refcard #387: Getting Started With CI/CD Pipeline Security
Refcard #084: Continuous Integration Patterns and Anti-Patterns
Trend Report: DevOps

DZone's Featured DevOps and CI/CD Resources

Refcard #375: Cloud-Native Application Security
Refcard #145: Continuous Delivery Patterns and Anti-Patterns

What Is APIOps? How to Be Successful at It, by David Mckenna
How To Migrate Terraform State to GitLab CI/CD, by Anthony Neto
How To Create Jenkins Multibranch Pipeline, by Salman Khan
Test Design Guidelines for Your CI/CD Pipeline

To deliver software to the market faster, there is a critical need to onboard automated tests into your continuous delivery pipeline to verify the software adheres to the standards your customer expects. Your continuous delivery pipeline could also consist of many stages that should trigger these automated tests to verify defined quality gates before the software can move to the next stage and eventually into production (see Figure 1). Depending on the stage of your pipeline, your automated tests could range in complexity from unit, integration, and end-to-end tests to performance tests. When considering the quantity and complexity of tests, along with the possibility of having multiple stages in your pipeline, there can be many challenges when onboarding, executing, and evaluating the quality of your software before it is released.

This article describes some of these challenges. I will also provide some best practice guidelines on how your automated tests can follow a contract, helping you deliver your software to the market faster while maintaining quality. Following a contract helps to onboard your tests in a timely and more efficient manner. It also helps when others in your organization need to troubleshoot issues in the pipeline.

Strive to Run Any Test, Anywhere, Anytime, by Anyone!

Figure 1: Example Software Continuous Delivery Pipeline

Challenges

There are several challenges when onboarding your tests into your continuous delivery pipeline that could delay your organization from delivering software to market in a reliable manner.

Quantity of Technologies

Automated tests can be developed with many technologies. Examples include pytest, JUnit, Selenium, Cucumber, and more. These technologies might have competing installation requirements such as operating system levels, browser versions, third-party libraries, and more. Also, the infrastructure that hosts your pipeline may not have enough dedicated resources or elasticity to support this variety of technologies. It would be efficient to execute tests in any environment without having to worry about competing requirements.

Test Runtime Dependencies

Tests can also depend on a variety of inputs during runtime, including text files, images, and/or database tables, to name a few. Being able to access these input items can be challenging, as these inputs could be persisted in an external location that your test must retrieve during execution. These external repositories may be offline during runtime and cause unanticipated test failures.

Different Input Parameters

When onboarding and sharing your tests in your organization's CI/CD process, it is common for your tests to have input parameters that execute the test suite with different values. For example, your tests may have an input parameter that tells your test suite what environment to target when executing the automated tests. One test author may name this input parameter --base-url while another test author in your organization may name it --target. It would be advantageous to have a common signature contract, with the same parameter naming conventions, when onboarding into your organization's CI/CD process.
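As a purely hypothetical illustration (every script name, image name, and URL below is invented), the same setting can end up with a different name in every suite, while a shared contract standardizes on one environment variable:

# The problem: three suites, three different names for the same target URL.
./run-ui-tests.sh   --base-url https://staging.example.com
./run-api-tests.sh  --target   https://staging.example.com
./run-perf-tests.sh --host     https://staging.example.com

# With a common signature contract, every containerized suite reads the same
# agreed-upon environment variables instead.
docker run -e BASE_URL=https://staging.example.com -e TEST_LEVEL=smoke my-org/ui-tests
docker run -e BASE_URL=https://staging.example.com -e TEST_LEVEL=smoke my-org/api-tests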
Different Output Formats

The variety of technologies being used for your testing could produce different output formats by default. For example, your pytest test suite could generate plain text output while your Selenium test suite may produce HTML output. Standardizing on a common output format will assist in collecting, aggregating, reporting, and analyzing the results of executing all the tests onboarded into your enterprise CI/CD process.

Difficulties Troubleshooting

If a test fails in your CI/CD pipeline, it may delay moving your software out to market, so there will be a need to debug the test failure and remediate it quickly. Having informative logging enabled for each phase of your test will be beneficial when the failure is triaged by others in your organization, such as your DevOps team.

Guidelines

Containerize Your Tests

By now, you have heard of containers and their many benefits. The advantage of containerizing your tests is that you have an isolated environment with all your required technologies installed in the container. Also, having your tests containerized allows the container to be run in the cloud, on any machine, or in any continuous integration (CI) environment because the container is designed to be portable.

Have a Common Input Contract

Having your test container follow a common contract for input parameters helps with portability. It also reduces the friction to run that test by providing clarity to the consumer about what the test requires. When the tests are containerized, the input parameters should use environment variables. For example, the docker command below uses the -e option to define environment variables to be made available to the test container during runtime:

docker run -e BASE_URL=http://machine.com:80 -e TEST_USER=testuser -e TEST_PW=xxxxxxxxxxx -e TEST_LEVEL=smoke

Also, there could be a large quantity of test containers onboarded into your pipeline that will be run at various stages. Having a standard naming convention for your input parameters is beneficial when other individuals in your organization need to run your test container for debugging or exploratory purposes. For example, if tests need an input parameter that defines the user to use in the tests, have a common naming convention that all test authors follow, such as TEST_USER.

Have a Common Output Contract

As mentioned earlier, the variety of technologies being used by your tests could produce different output formats by default. Following a contract to standardize the test output helps when collecting, aggregating, and analyzing the test results across all test containers to see if the overall results meet your organization's software delivery guidelines. For example, say there are test containers using pytest, JUnit, Selenium, and Cucumber. If the contract said to produce output in xUnit format, then all the test results generated from running these containers could be collected and reported on in the same manner.

Provide Usage/Help Information

When onboarding your test container into your pipeline, you are sharing your tests with others in your organization, such as the DevOps and engineering teams that support the pipeline. Others in your organization might also want to use your test container as an example as they design their own. To assist others in the execution of your test container, having a common option to display help and usage information to the consumer would be beneficial. The help text could include:

- A description of what your test container is attempting to verify
- Descriptions of the available input parameters
- Descriptions of the output format
- One or two example command line executions of your test container
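To make the contract concrete, here is a minimal sketch of a container entrypoint that follows these guidelines. It assumes a pytest-based suite; the file paths, image name, and TEST_* variable names are illustrative, not a prescribed standard. It answers --help, reads the standardized environment variables, logs without secrets, and writes results in the common xUnit format:

#!/bin/sh
# entrypoint.sh: hypothetical test-container entrypoint that follows the contract.
set -eu

if [ "${1:-}" = "--help" ]; then
  cat <<'EOF'
Runs the checkout test suite against a target deployment.
Required environment variables:
  BASE_URL    Base URL of the deployment under test
  TEST_USER   Functional test user
  TEST_PW     Password for TEST_USER (never written to logs)
  TEST_LEVEL  smoke | regression | full
Output: xUnit XML written to /results/results.xml
Example:
  docker run -e BASE_URL=http://machine.com:80 -e TEST_USER=testuser \
             -e TEST_PW=xxxxxxxxxxx -e TEST_LEVEL=smoke my-org/checkout-tests
EOF
  exit 0
fi

# Structured, parseable log line; note that TEST_PW is deliberately not logged.
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) INFO level=${TEST_LEVEL} base_url=${BASE_URL} user=${TEST_USER} msg=\"starting tests\""

# Produce results in the common xUnit/JUnit XML format agreed on in the output contract.
exec pytest -m "${TEST_LEVEL}" --junitxml=/results/results.xml

A consumer could then run the container with --help to discover the contract, or pass the environment variables shown earlier to execute the suite.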
Informative Logging

Logging captures details about what took place during test execution at that moment in time. To assist with faster remediation when there is a failure, the following logging guidelines are beneficial:

- Implement a standard record format that is easy to parse by industry tooling for observability
- Use descriptive messages about the stages and state of the tests at that moment in time
- Ensure that there is NO sensitive data, such as passwords or keys, in the generated log files that might violate your organization's security policies
- Log API (Application Programming Interface) requests and responses to assist in tracking the workflow

Package Test Dependencies Inside of the Container

As mentioned earlier, tests can have various runtime dependencies such as input data, database tables, and binary inputs, to name a few. When these dependencies live outside of the test container, they may not be available at runtime. To onboard your test container into your pipeline more efficiently, building your input dependencies directly into your container ensures that they are always available. However, there are use cases where it may not make sense to do so. For example, you may need a large input dataset, gigabytes in size, in your test. In this case, it may make more sense to work with your DevOps team to have this input dataset available on a mounted filesystem that is made available in your container.

Setup and Teardown Resources

Automated tests may require the creation of resources during execution time. For example, there could be a requirement to create multiple Account resources in your shared deployment under test and perform multiple operations on them. If other tests running in parallel against the same deployment also perform related operations on the same Account resource, there could be unexpected errors. A test design strategy that creates an Account resource with a unique naming convention, performs the operation, asserts things were completed correctly, and then removes the Account resource at the end of the test reduces the risk of failure. This strategy ensures that there is a known state at the beginning and end of the test.
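A shell sketch of that strategy, assuming a hypothetical REST API exposed under BASE_URL (the /api/accounts endpoints, fields, and jq usage are illustrative only):

# Setup: create a uniquely named Account so parallel runs never collide on shared state.
ACCOUNT_NAME="test-account-$(date +%s)-$$"
ACCOUNT_ID=$(curl -sf -X POST "${BASE_URL}/api/accounts" \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"${ACCOUNT_NAME}\"}" | jq -r '.id')

# Teardown: always remove the Account, even if an assertion below fails.
trap 'curl -sf -X DELETE "${BASE_URL}/api/accounts/${ACCOUNT_ID}" || true' EXIT

# Perform the operation and assert it completed correctly.
curl -sf "${BASE_URL}/api/accounts/${ACCOUNT_ID}" | jq -e --arg n "${ACCOUNT_NAME}" '.name == $n'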
Have Code Review Guidelines

Code review is the process of evaluating new code by someone else on your team or in your organization before it is merged into the main branch and packaged for consumption by others. In addition to finding bugs much earlier, there are other benefits to having engineers review your test code before it is merged:

- Verify the test is following the correct input and output contracts before it is onboarded into your CI/CD pipeline
- Ensure there is appropriate logging enabled for readability and observability
- Establish that the tests have code comments and are well documented
- Ensure the tests have correct exception handling and the appropriate exit codes
- Evaluate the quantity of the input dependencies
- Promote collaboration by reviewing whether the tests satisfy the requirements

Conclusion

It is important to consider how your test design could impact your CI/CD pipeline and the delivery of your software to market in a timely manner while maintaining quality. Having a defined contract for your tests will allow you to onboard tests into your organization's software delivery pipeline more efficiently and reduce the rate of failure.

By Billy Dickerson
DevSecOps: The Broken or Blurred Lines of Defense

With the modern patterns and practices of DevOps and DevSecOps, it’s not clear who the front-line owners are anymore. Today, most organizations' internal audit processes have lots of toils and low efficacy. This is something John Willis, in his new role as Distinguished Researcher at Kosli, has referred to in previous presentations as “Security and Compliance Theatre.” It's a topic he has previously touched on with Dan Lines on the Dev Interrupted Podcast, also featured on DZone. In this talk, filmed at Exploring DevOps, Security, Audit compliance and Thriving in the Digital Age, John takes a deep dive into DevSecOps and what effective governance will look like as regulation and automation continue to have competing impacts on the way software is delivered. He'll ask how we came to be at the current pass with references to well-known risk and compliance failures at Equifax, Knight Capital, Capital One, and Solar Winds. Full Transcript So, if you think back in time to my evolution of trying to get people to change the way they think about what we do in security from a DevOps perspective, the Abraham Wald story, you have probably heard it; you just haven't heard it with his name. So, during World War Two, there were a bunch of mathematicians and statisticians whose job was to figure out how to do weight distribution and repair fighter planes that came back with bullet holes. This Abraham Wald one day woke up and said — and this is really the definition of survival bias — “Wait a minute, we're repairing the planes where the bullet holes are? They're the ones that are coming back. We should be repairing the planes where the bullet holes aren’t because they're the ones that aren't coming back.” I think that's a great metaphor for the way we think about security: maybe we're looking in the wrong place. And so I asked this meta question about three or four years ago, which I hope makes your brain hurt a little bit, but in the Abraham Wald way — which was “What if DevSecOps happened before DevOps?” Well, the world would be different. Because if you think about it — I'm pro-DevSecOps, and I think everybody should have a good DevSecOps reference architecture — basically what happened was we did all this DevOps work, and then we put an overlay of security on it, and that's good, it’s necessary. But maybe we already had this bias of bullet holes when we were thinking about that. What if we started with security? What if some security person said, “I’ve got a great way to do security. I'm going to call it DevSecOps!” and we started in that order? Could things be different? Would we be thinking differently, or have we not thought differently? So Shannon Lietz, who is one of my mentors — she wrote the DevSecOps Manifesto — coined the term “everyone is responsible for security.” We were talking at the break about these three lines of defense. So, I don't come from an auditor background. All I know is that I get brought into all these companies, and they would ask, “Hey John, look at our DevSecOps reference architecture!” And I’d go, “Well, that’s awesome,” and then we would have a conversation. “Yeah, we buy the three lines of defense model.” “Erm, yeah, that one is not so awesome!” Because Andrew Clay Shafer, in the earliest days of DevOps, pre-DevSecOps, made this beautiful character of what described the problem of the original DevOps problem statement. 
There was a wall of confusion — and some of you people look like you might be close to as old as I am — but there was a day when developers would throw their code over the wall, and operations would catch it and say, "It doesn't work!" And the other side would say, "No, you broke it; it works!" And this would go on for weeks and weeks and weeks, and Andrew would talk about figuring out a way to break that wall. And there were, in the original DevOps days, these beautiful stories of developers working together in collaboration, and there is a whole industry that's been built out of it. So, there is busting the wall, and it becomes a metaphor for any non-collaborative groups in an organization.

And so, when I was thinking about the problem statement that drove DevOps: where are we now with the problem statement of what happens in a large organization between the first line, second line, and third line? The way I view it when I have these conversations, the second line is, by definition, a buffer for the third line. The second line has no way to communicate with the first. And this is what "dev" and "ops" looked like 15 years ago. We didn't have the tools. We didn't even have the cognitive mapping to have those discussions. We didn't even know that we should be having those concerns.

In Investments Unlimited, we have a short description about how I'm not going to go to the Institute of Internal Auditors and say, "Hey, I'm John Willis; you've never heard of me; get rid of the three lines of defense!" That ain't happening. But what I am going to say is, just like we do a separation, can we reframe — and we did this a little bit in the book — the conversation of how we think about this? Why can't we create that DSL I talked about, where the second line can meet the first line in designer requirements?

And here's the kicker, right? I don't claim to be a genius about all things. But what I do know is, in every bank and every company I've walked into, what's the purpose of that second line? They basically make sure they do the job that they need to do. What's the job they need to do? Protect the brand. That's it, right? Everything falls on protecting the brand. When you're Equifax and you lose a $5 billion market cap in a day, or you're another company called Knight Capital, the second largest high-frequency trading company on the NYSE, which lost $440 million in 45 minutes and was out of business in 24 hours — that's what they're supposed to protect against. And our relationship now is to basically hide things from them. That's got to change. And that's why we get into the likes of Equifax, Knight Capital, and Capital One.

So what do you do? I had this idea when I started thinking about security. Have you ever been to RSA? You go into the exhibition hall at RSA and you're like, "I've gotta get out of here!" There are too many lights and too many vendors; this is just too confusing. It was almost impossible to come up with a taxonomy for security because there are just so many ways to discuss it and look at it. So, I started thinking about how I could make it simple. Could I come up with one? And like I always say — and I'll keep saying it until I get punched in the face for saying it — a post-cloud-native world, or using cloud-native as the marker, makes the conversation simpler. I'm not implying that security doesn't exist for legacy or mainframes; it certainly does. But we could have a simpler conversation if we just assumed there was a line where we could say everything to the right is cloud native.
And so with that, I will tell you that what we need to do and what we do are inconsistent. What we do today, when we have to prove we're safe, is usually some form of subjective audit: service tickets and records that tell stories about things, stories that might lead to screen prints. And then, to demonstrate it, we have our poor second line, our internal auditors or external auditors, try to figure out what all those subjective descriptions mean. And we need to do both: we need to be able to make a system that can prove we're safe and be very consistent with what we demonstrate. And that's the whole point of something like Kosli or, just in general, this idea of digitally signed, immutable data that represents what we say we do. So then the audit isn't a subjective 40-day conversation; it's a one-second look at the SHA, and we're done. So we move from implicit security models to explicit proof models, and we change subjective to objective and then verifiable.

Back to the cloud-native model: if you can accept there's a post-cloud-native world, then I can tell you that I can give you a simple taxonomy for thinking about security, without having to think about it in like 40 horizontal and 50 vertical ways. I worked with a couple of groups, and I started hearing from these CISOs, and I said, "I don't want to call it a taxonomy, but we can look at it as risk, defense, and trust, and we can look at it as a transition from subjective, to objective, to verifiable." So we went through, in the last presentation in 20 minutes, the risk from a change at the attestation. I didn't talk about continuous verification, but there are some really interesting products that are basically trying to use chaos-monkey-like tools to go beyond just breaking things to actually, for example, this port should never be open… let's just open the port. If this vulnerability should have never gotten through the pipeline, let's launch an application that has that vulnerability, right? So there's some really interesting continuous verification. I'll spend a little more on that.

But then, on defense, it's table stakes that you detect and respond by Azure and all that stuff. And then everybody's basically trying to build a data lake right now, a cyber data lake. That's the in, hip thing to do. I'm not making fun of it; it's required. But there's some real thought process that isn't happening about how you build a cyber data lake that isn't just a bunch of junk. So there are a couple of vendors and projects that are thinking about "Can we normalize data and ingest it as it comes out of the provider?" So, for example, you take a message that might come from Amazon; the same message might come from Google Cloud and might come from Oracle, and it might be the same thing, like increased privileges. But the message is completely different; there's no normalization. And so if you shove that all the way to the right into a cyber data lake, you're going to have a hard time figuring out what the message even is, let alone that each one of them has a different meta definition for the ID and all that stuff. And at some point, you really want to attach that to a NIST or a MITRE ATT&CK framework tactic. So, let's do all that on the left side, and there's some good work happening there. And then the trust thing is interesting too because the thing that we're seeing anybody could file the SDS.
When Mike said I sold a company to Docker, what I actually did is I had this crazy idea of "Could we do software-defined networking in containers?" And we did it. It was literally me and this guy who pretty much invented software-defined networking; we built it, and, as you know, this whole idea of how you do trust and build around it. If you think about SDN, it was changing a north-south paradigm of how traffic came in and out to an east-west one. If you looked at some of the traffic patterns going back 15, 20 years ago, 90% of your traffic was north-south. And then all of a sudden, the more you got into high-scale services, service mesh, all that stuff, it flipped. It went to 80% east-west. And we built a network around that.

Well, I believe we have to do that for trust now. And we already see evidence of this when we start getting into Kubernetes and clusters and stuff like that. We're seeing pods, like SPIFFE and SPIRE, some of the new service mesh stuff, ambient mesh — I am throwing out a lot of terms — but there is this possibility, instead of building this on-or-off north-south trust, to create ephemeral trust within a cluster, and it goes away. So, even things like secrets management — I think Vault is a great product today, but that stuff could happen really at the mesh level, where a secret just exists in this clustered pod or cluster for the life of the cluster. And by the way, you're in, or you're out, like you're authorized for that cluster. So, I think there's incredibly interesting stuff around what I call moving. And zero trust is table stakes, right? I'm talking about more of, let's really go to a level of trust where the world is going to be — I don't know if it's Kubernetes — but it's definitely going to be cluster-based compute, and then we could build our trust around that model. I know it sounds crazy, but hey.

So, risk differently. We talked about this in Investments Unlimited. This was the Capital One breach, which is fun for everybody except Capital One! Basically, this was the Struts 2 Jakarta vulnerability. Oh, wait a minute, this is actually Equifax, but that's fine. So what happened was there was a vulnerability in one of the Struts 2 libraries, which almost everybody uses, where even as an unauthorized system, you put a command in, and it runs. And if that was their system, you could do whatever you want; this is what I told you about the breach.

But this one's a little more interesting; this is Capital One's breach. If you follow aviation disasters, this is like the Air France 447 of computing, if that makes any sense to anybody. So what was interesting about this one is they were basically rolling IDSs, so there was this window of anywhere from seconds to about five minutes where an adversary could poke in. There was this woman, a crypto miner, who basically runs a billion curls a day, looking for one person to put up a proxy with the defaults on. And this team that was in a hurry got an exception and put up a proxy that left the defaults, and on this one proxy, the default bypass was on. So, this crypto miner got really lucky because the IDSs were rolling; they popped through, they hit that IP address — it's a hardwired address that anybody who has ever worked with Amazon Web Services knows is the metadata server — so the capitalone.com URL with ?url= pointed at the metadata server. And because they were in a hurry, they probably cut and pasted some VPC definitions from Stack Overflow.
And they were privileged, so they got right through, were able to dump the privileges, and assume super-user power. Meanwhile, some developers left 100 million business credit card applications in the S3 bucket. Here's where it gets really worse: for business credit cards, the PCI DSS requires Social Security numbers to be tokenized, but it doesn't require the corporation ID to be. I'm sure it's everywhere, but basically, half of the S corps are small businesses, and they use the Social Security number as the corporation ID. So again, there are just all these loopholes that happen. And that's called server-side request forgery.

I was actually brought into SolarWinds. One of the authors of the book worked for a big five, and they wanted to get the contract for the clean-up job. So I was brought in to talk about automated governance. Again, we can make fun of SolarWinds all day long, but every software company out there is basically as bad as they are. By the way, all the software that you're buying — now that I actually don't work for a software company, we're SaaS-based, we're good! — you look at what SolarWinds was, and it was terrible. The pipelines were just horrendous. And so I go in talking about advanced stuff, and they're like, "No, no, we've just got to get DevOps!" So they weren't really that interested. But I thought I'd be Johnny-on-the-spot and go in there and take this CrowdStrike MITRE ATT&CK framework analysis and say, "Alright, I'm going to really show these guys what they should use."

Because basically, what happened was the adversary got into the Microsoft compiler. These are supply chain attacks; they are the really scary ones, where they're not even going after you; they're going after whoever you're delivering stuff to. So they got into there. And by the way, they apparently rolled their logs after 18 months, so they don't even know how long the adversary was in there. They could have been there for years. So CrowdStrike did a really good analysis, and one of the ones that I just caught — in fact, it's in our demo, that's why I sent our demo — was that they weren't checking the image SHA. So what happened is the adversary got into MSBuild and started injecting nefarious code, so that when that product goes out to the Department of Defense or the bank, they've got this open backdoor out there. And a table-stakes attestation would be, if it's a clean image or a JAR file, doing a baseline SHA, being able to look before and after, and being able to see whether it should have been this; and there are other ways to detect it.

And the other thing that was really interesting, about why this idea of automated governance has to have an immutable and non-tamperable data store: they went in and actually created logs. That's really scary if they get to live in your company. And by the way, they're in your company right now; don't think they're not there now. They may not have found a way to do real damage, but you are incredibly naive if you don't think there are adversaries sitting in your corporation. There's polymorphic malware — I spent 20 minutes explaining how polymorphic malware works — they are in your company. The question is how hard it is, or what opportunities arise that, like Air France 447, allow them to get to the next step and then the next step. If they're really smart, this is where it gets really scary: they can actually tamper with the logs to remove the evidence that they did things.
One of the biggest things about Equifax: when all was said and done, the Equifax story is really interesting. I know a lot of people who worked at Equifax, and their external auditors are the thing that drove almost everybody to quit for the next two years after the breach: they wanted everybody to prove the negative and note the negative. In other words, they were like, you know what? We survived a nightmare, because they didn't change the data. That's the scary thing. It's one thing to have an adversary that dumps a bunch of confidential data out in the wild, and that's not good; it's going to hurt the brand. But you'll go out of business if they change your system-of-record data and they publish that. If you're a bank and I change your accounts... they were in Marriott for five years in that breach. So, if they're really smart — and there is evidence that they do this — not only might they mutate your data, they'll mutate the evidence of the data. That's why it has to be in an immutable, non-tamperable store.

Defense differently: again, I talked a lot about this. You have to think about your site; don't just build a cyber data lake. There are some really good opportunities to think about how you ingest at the provider level. And there are a couple of providers now building these interesting SDKs — it's called automated cloud governance, from a group out in New York called ONUG — where you can basically use these SDKs from Microsoft, Oracle, IBM, and Google, start building in the NIST metadata, and normalize the messages themselves. So by the time you get into the data lake, you're not doing an incredible amount of compute consumption to try to correlate.

And trust differently: zero trust is table stakes. But I think the really interesting stuff becomes, certainly in this space, NIST 800-207. And the good news is, when I first was writing this, SPIFFE and SPIRE were just external projects. Now they're actually built into Istio, Envoy, and the service mesh, so they're all there. But Sigstore is really interesting, a Merkle-tree-based solution that deserves to be looked at.

The thing that I'm trying to get Mike and James, and everybody, really excited about is what's coming down the pike. And here's the thing: in the past, we had this buffer. As IT, our first-line people knew that we were ahead of the adversaries and the auditors. We've got Kubernetes; they won't figure out all the dangers, ghosts, and dragons in Kubernetes until next year. We've been sort of living in that delayed buffer. Well, now what's happening is the people who write the Google stuff, like service mesh, Istio, and Envoy, are now writing, or getting contracted by NIST to write, the documentation. So now some of the incredibly dangerous stuff that's in Istio and Envoy, which is the service mesh part of Kubernetes, is well documented in English, easily read, and both the adversaries and the auditors can see that there's something called the blue-green deploy, which used to only happen at layer three. Basically, what happens now is that stuff can happen all the way up at layer seven. At layer three, that stuff was switch config, which is very hard for an adversary to get into your system and tamper with. But now an adversary just needs to find one leaky API or the YAML file, and they can basically say, "You know what? I'm going to take 1% of all the paid traffic and send it to some latest version." I ask people, "Have you ever heard of Envoy?
Do you even turn on the Envoy access log?” “What's that?” So that means there are banks that are running production, customer payment, or funds-based stuff in service mesh — and they have no evidence of that. So this is for people like Kosli and us, who want to get ahead of the curve, a treasure trove. It's stuff that's going to happen so fast, and people aren't even ready. And again, the takeaway is we had the luxury to think about the adversaries, how we don't think they'll figure that out because it's so advanced. The people who write Envoy, Istio, and that stuff are now writing this documentation on how it works. So the adversaries are not stupid. You know, when you tell them there's something called the blue-green deploy, they might not know what it is, but once they realize it's a reroute of traffic, then they'll know exactly what to do with it. By the way, that's a GPT-3 image; all I put in was Bacciagalupe as John Willis, and that's what I got. And the only thing I will say is — and this is probably worth a drink of water — I think the internet thinks I'm a clown. So that's OK! We've got some time for a Q&A, so I'll bring a couple of chairs up, and we can have a bit of a fireside chat. If you have any questions, put them in on the QR code on your lanyard. So before we get into the questions from the audience, I'd like to pick up on what you were saying about the network stuff because I have to say, when you started talking about this, Istio and Envoy, can we just stick with what we've got for now? And the more I started thinking about it, the more I thought, “Oh wait, hold on, this is quite interesting because, again, it goes back to the DevOps story because it's another example of things that used to be in another department in the business where the developers get so pissed off with it, they decide that we're going to put this in software now. So first, it was billed as security, its deployments, its cloud, and its containers. Time after time, we talk about everything being code, but it's really developers doing civil disobedience against other parts of the org in some way. So networking is one area, but some of the conversations I've had this week are also about data. Maybe you could say a bit about that? Oh, yeah. I mean, that's another thing. So one of the things that John Rzeszotarski implemented this thing and one of the first interesting conversations that happened after, and he had he built it, and our whole thought process was this is for the software supply chain. And it turns out one of the API development teams saw it and did a pull request, and said, “Hey, we built a system for API development workflow.” “Ooh, that's interesting!” Since this isn’t really a software-defined workflow, it's a workflow that shows evidence of the decisions you made in defining an API, right? Like leaky API, all that stuff. And that sort of opened up this idea that it's just a model for workflow evidence. And so I started thinking about what else are we doing. And right at the time, this concept of data ops was starting. Go back 15 or 20 years, let's say 20 years ago; there were a lot of big banks that the way they put software into production was they just put the software into production. And then DevOps came to CI and CD, and then DevOps like it'd be very rare and almost probably pretty close to a criminal if that happened in a large bank today. 
There are people that sort of bypass the system, but in general, I don't run into any financial organizations that don't have some form of a pipeline. But those same organizations have developers that stuff 145 million business credit cards into an S3 bucket through no operational pattern. And so this movement of data ops is ‘Could we do a workflow for how we move data?’ And we've been sort of doing the data warehouse forever, but now whenever a developer wants to get some data, there should be a process of how you tokenize it. Does it get ETL’d? What's the evidence so when it’s sitting in an S3 bucket? So imagine all the way that you're processing. They say that you're taking it from the raw data, maybe ETL process, maybe you're sort of tokenizing, maybe it's going through Kafka and doing this, this, and this, and it's winding up here. What if you were keeping these Kosli or just this attestational evidence all the way so that when it goes to the S3 bucket, you could reject it? Just like you could reject the build, fail to build? Or even better, have a scanner scanning all the S3 buckets looking for any data that doesn't have a verifiable meta of evidence. Again, the two worlds have to mature and meet together, and I think the more of the conversations that happen about data ops, the more it puts us in a better or anybody who's doing this model of that kind of evidence should naturally happen, and it could happen for it. I've seen people talk about doing it for modeling, for example, Monte Carlo modeling. What were the decisions that you made, and what's the data that shows? When the model runs like it's force majeure, right? I mean, at that point, once it's been trained, it's going to do what it does. Now if it does really bad stuff, at least I can show evidence that these are decisions that we made when we were building the model. This gentleman had a question. I know you; you gave a great presentation the other night! Thanks! I was just thinking about the information within the data, and the kind of situation that we are in is that the regulations keep changing; everything changes, right? So if we even have this tokenization or verification of the data that you're using, whatever that is in the architecture, if the regulations change, what are you going to do about it? That’s what I was thinking because if you don't scan for it, but if you know where it is, that means that you can go out and you can pick it out. So the GDPR regulations, OK, we can't keep it for six months anymore; it's only three. If you get the meta, it will tell you everything. Then you know where you have what, so you can actually change on the spot. Here's the beauty part of that: it's the same thing with software delivery, right? Remember, I said earlier, the beauty of having that DSL associated as an artifact in the evidence chain is because if the requirements today are that you had to have this, this, and this, and then in six months now, there's some executive order where we realize, oh, you had to have this, this and this, it's the point in time evidence because the artifact is part of the evidence. So when you're looking at that data or that record, the evidence said you only had to have this. Well, it's even more true with data. With data, you might have reclassified. I did some work with Nike — you want to talk about how interesting their data classification is in the cloud? 
You guys might not know who Michael Jordan is because you don't follow American basketball, but at the time, the ownership of his data at Nike was cherished like a bank's account data. The data about some of their big clients, so, data classification; but then how do you mature the meta around it? And I think that's a great point: if the policy changes so that it needs to go from six months to three months, and you have the meta tagged — which I think this model works really well for — then you could just scan around and say, "OK, we got rid of all the data that's been sitting around for four months, that used to be allowed for six months and now should be three months."

I think, just to add to all of this, I agree with everything that's been said. But we know from the SRE book and from Google that 70% of system outages are due to changes in a live system. We focus a lot on changes nowadays in the DevOps world around deployments, but there's so much change that isn't a deployment, right? There's a database migration, or a user is provisioned, or, you know, somebody needs to fix up a record in a transaction or something. It's just so much more. But it's the same thing, right? The data currently is siloed, ephemeral, and disconnected. And we talked about this the other day: what are the basics? And I'll just throw out the basic four — it's probably five, maybe six — but what are the basic four about an audit? When did the change happen? Who did the change? Who approved the change? And then, usually, some variant of: was it successful, and was there a backup plan? And that's whether it's data, whether it's a software artifact, or whether it's configuration. And again, when the auditors come in and they ask about this artifact, which is some library, we still, without something like Kosli or a solution like that, spend a lot of time grabbing a lot of stuff to prove it. But when they ask us that same question about databases, I can tell you the answer is chaos because, one, we don't use data ops as a model, and two, if we had data ops, we could actually be more aligned with giving the evidence of who made it. Those should be standard in any delivery of any workflow, whether it's an API, whether it's Monte Carlo modeling, whether it's data ops, or whether it's software delivery.

One hundred percent agree. But in the interest of getting through some of the other questions, we had a question from Axel which I think is quite interesting: where do you think the CSO should be put in an organization, both in terms of the formal setup but also the activities and where they do it from?

That's an interesting question. I had a great conversation during the break, so I'll give you my first uneducated answer. And it's not too uneducated, because of Mark Schwartz — he's another writer for IT Revolution, and he's written a bunch of books — one of his books is A Seat At The Table. And it's an interesting book that sort of asks, "Are you really an IT company if your CIO isn't really on a real seat at the table?" And actually, what I've done when I go into large companies — not that I'm talking to the CEO; I do actually get to work for CIOs quite often, but not CEOs, I don't dress well enough for that — the question I like to find out the answer to is "Where does your CIO sit today?" Do they sit at the kiddies' table or at the big grown-up table? Because if they're not at the grown-up table, I don't care how much you tell the world you're a data company or a software company — you're not a software company.
So he makes that point really well, and then he says — and this might offend some people — that even creating this category, a chief data officer, is basically an excuse that you're doing it terribly, because why isn't that part of the CIO? Is data not part of information technology? So, my only point is — the John Willis answer is — call it CIO or whatever you want to call it, but they all should be aligned. Why is security completely segregated? Compliance and risk are over here, the CISO's here, and the CIO's here — is security not information technology? Now, you pointed out that there are some requirements where they have to be firewalled, but then I go back to: John Willis doesn't say get rid of the three lines of defense; I say we have to reframe the way we do things. So if I can't change you structurally, I'm not going to get rid of the three lines of defense, but I'm going to ask you, until you kick me out of the building, "Why isn't the second line in designer requirements?" — every time I talk to you, until you either tell me to get lost or you finally say, "OK, they're going to start showing up, John." So, I think that somewhere in there is how you solve the problem where it's a hardwired regulation: you work around it by reframing the mindset and the collaboration.

But I think it's quite an interesting concept as well because I know some banks, even in this room, whose second line doesn't report internally; it reports to the board as an independent control function, which makes a lot of sense. But it's interesting that you would take information security as a control function externally rather than an internal cultural thing that you need to do.

Yeah, part of the legacy of our company. I'd say five years into the 10-year DevOps journey: oh my goodness, we forgot to bring security along. Our industry talks about the bleeding edge. I've seen CTOs at banks like Citibank say, "We need to be more like Google," in like the third slide. Fifteen slides later, they have a slide that says, "Do more with less." No, that's not how Google does its business! They don't do more with less! They hire incredibly expensive people. When a person tries to leave Google for a startup, they basically add about $1,000,000 to their yearly salary. So they don't do more with less. I was really surprised by the IT budget at places like JPMorgan. It's incredible how much money they spend; it's more than Google. So, a good friend of mine — I can't say who it is — but when you fire up a backup IBM mainframe, you immediately have to write a $1,000,000 check to IBM. And by the way, there are products like NetView — there are millions and millions of dollars that go into legacy budgets. But yes, the big banks — JPMorgan, Goldman Sachs — Goldman have been trying to figure out quantum for trading applications. They put an incredible amount of investment money into bleeding-edge tech. I was at Docker, and they were literally the first large financial institution that was going all in on figuring out how they could use containers for tier-one trading applications. So, they definitely do spend money.

Great, OK. So another question from an anonymous source — we have a whistleblower here in the room! How do we overcome skepticism and resistance among non-tech stakeholders who can't imagine life without a CAB?

I have some opinions! It all goes back to trust.
And there are actually a couple of really good books written by Scott Prugh, who is part of the Gene Kim tribe of people. There's a methodical way, and it all comes back to you just having to create the trust model. And it sounds simple, but it could be what we're talking about. One of the things I had to take out of the slide deck because I couldn't do it in 20 minutes; what got me really interested in working with Topo Pal is back in 2017 — he was the first fellow at Capital One — he wrote a blog article about how they did their pipelines. This is 2017, and it is a great article out there — if you want it, I can get a link to it; it’s still very relevant — he defined what he called 16 gates. And the idea was that they told the developers, “If you can provide evidence for these 16 things and we can pick it up, you don't have to go to the CAB.” So the first model is the way you get rid of the CAB is trusted data, right? And there are ways to create trust. I heard somebody say recently that their auditors don't want to hear anything about SHAs or anything like that. What are they thinking about when they're asking questions about funds? Because that tells you it's all encrypted. And if it's not, they’ve got way worse problems than worrying about what we do, you know? So it's how you frame things. If you go to a second line and you talk about SHAs and crypto, and we use Vault to do this, you're going to lose them. But if you try to explain it in a way that says, “The way we protect our system record data and data like our banking information is the same model we're using.” That reframes that conversation to, “Oh, I get it. Yeah, that makes sense.” I think we've got a question in the audience. There was a comment just before you started this about the trust model because I'm thinking that is what is important. If you skip the part about the governance coming down and we go back to DevOps, we need to have a little legitimacy. I think that developers need to have a mandate, or they need to feel a legitimacy to the auditors or the ones controlling them, that they can give away the data, they can give away the code, the 16 gates of trust kind of thing is really important. And I have an example if you want to hear it. I wrote a master's thesis on the security police in Norway because they had to do a complete reorg after the terror attacks that we had on 22 July a few years back. And my question to them was: how do you trust an organizational change? What they did was ask all the departments and department heads what they needed to work on. And they all said more money, more people, and then I'll fix it. And then they fired all of them. Literally, they had to apply for their own jobs. So the solution to all of this was that they asked everybody that worked on the very root level of the organization: what do you need to work? And they said, “Well, I need to talk to my colleague more. I need to sit in the same room. We need to establish the value chains from the bottom and then up.” So they did that, and they did it all internally without any external company auditing them. And it's a completely different matter. Don't even get me started on Dr. Deming because we will not end the day. But probably one of the greatest research projects of the 21st century in our industry is called Project Aristotle by Google. And they asked one question: how do you create great teams? 
And the single answer — although there was a ton of data, they talked to anthropologists, they talked to software engineers, they interviewed an incredible wealth to figure out this question — and the answer was psychological safety. And if you think about the umbrella of psychological safety, it includes everything you just talked about. Because if I'm a junior female worker in a corporation that's been around for 30 years, that has a bunch of fat old men like me, can that person say, “I don't think that's going to work,” and not get, “You’ve only been here for a week! How would you know?!” A psychologically safe person would say, “We need to take a look at that.” So, I'm not saying this for you, but it's easy to say we need to collaborate. But you can't have collaboration until you can take into account diversity and all these things that you can break down. And again, some of the best, strongest research that has ever happened in our industry comes out of something Google did. And there are some really great resources for people who just track psychological safety. I think it's number one. I'll get on my meta-horse — put me in front of the CEO and make me king for a day where they’re forced to listen to me, and there are two things I would tell them they have to do systemically in that organization. One is systemically pervasive psychological safety throughout the whole company. And second, I'd want them to pervasively create a systemic mindset around systems thinking. Those are the two things I would basically create, and I tell you, everything else will fall into place.

By Bruce Johnston
DevOps for Developers — Introduction and Version Control

I start some of my talks with a joke: back in my day, we didn't have monitoring or observability. We'd go to the server and give it a kick. Hear the HD spin? It's working! We didn't have DevOps. If we were lucky, we had some admins and a technician to solve hardware issues. That's it. In a small company, we would do all of that ourselves.

Today this is no longer practical. The complexity of deployment and scale is so big that it's hard to imagine a growing company without an engineer dedicated to operations. In this series, I hope to introduce you to some of the core principles and tools used by DevOps. This is an important skill that we need to master in a startup, where we might not have a DevOps role at all, or in a big corporation, where we need to communicate with the DevOps team and explain our needs or requirements.

What Is DevOps?

DevOps is a software development methodology that aims to bridge the gap between development and operations teams. It emphasizes collaboration and communication between these two teams to ensure the seamless delivery of high-quality software products. The core principles behind it are:

Continuous Integration and Continuous Delivery (CI/CD): CI/CD is one of the key principles of DevOps. It involves automated processes for building, testing, and deploying software. With CI/CD, developers can identify and fix bugs early in the development cycle, leading to faster and more reliable delivery of software. As a developer, CI/CD gives you a faster feedback loop, enabling you to make changes to the code and see the results in real time. This helps you quickly identify and fix any issues, which saves time and ensures that your code is always in a releasable state. Notice that CD stands for both Continuous Delivery and Continuous Deployment. This is a terribly frustrating acronym. The difference between the two is simple, though: deployment relies on delivery. We can't deploy an application unless it was built and delivered. The deployment aspect means that merging our commits into the main branch will result in a change to production at some point, without any user involvement.

Automation: Automation involves automating repetitive tasks such as building, testing, and deploying software. This helps to reduce the time and effort required to perform these tasks, freeing up developers to focus on more important work. As a developer, automation frees up your time and allows you to focus on writing code rather than on manual tasks. Additionally, automation helps reduce the risk of human error, ensuring that your code is always deployed correctly.

Collaboration and Communication: DevOps emphasizes collaboration and communication between development and operations teams. This helps ensure that everyone is on the same page and working towards a common goal. It also helps reduce the time and effort required to resolve any issues that may arise.

Platform Engineering

Recently there has been a rise in the field of platform engineering. This is somewhat confusing, as the overlap between the role of DevOps and a platform engineer isn't necessarily clear. However, they are two related but distinct fields within software development. While both are concerned with improving software delivery and operation processes, they have different focuses and approaches. Platform engineering is a discipline that focuses on building and maintaining the infrastructure and tools required to support the software development process.
This includes the underlying hardware, software, and network infrastructure, as well as the tools and platforms used by development and operations teams. In other words, DevOps is concerned with improving the way software is developed and delivered, while Platform Engineering is concerned with building and maintaining the platforms and tools that support that process. While DevOps and Platform Engineering complement each other, they serve different purposes. DevOps helps teams work together more effectively and deliver software faster, while Platform Engineering provides the infrastructure and tools needed to support that process.

Where Do We Start?

When learning DevOps, it is important to have a solid understanding of the tools and techniques commonly used in the field. Here are some of the most important tools and techniques to learn:

Version control systems: Understanding how to use version control systems, such as Git, is a key component of DevOps. Version control systems allow teams to track changes to their code, collaborate on projects, and roll back changes if necessary. I assume you know Git, so I will skip it and go directly to the next stage.

Continuous Integration (CI) and Continuous Deployment (CD) tools: CI/CD tools are at the heart of DevOps and are used to automate the build, test, and deployment of code. Popular CI/CD tools include Jenkins, Travis CI, CircleCI, and GitLab CI/CD. I will focus on GitHub Actions. It isn’t a popular tool in the DevOps space since it’s relatively limited, but for our needs as developers, it’s pretty great (a minimal workflow sketch follows this list).

Infrastructure as Code (IaC) tools: IaC tools let us manage our infrastructure as if it were source code. This makes it easier to automate the provisioning, configuration, and deployment of infrastructure. Popular IaC tools include Terraform, CloudFormation, and Ansible. I also like Pulumi, which lets you use regular programming languages, including Java, to describe the infrastructure.

Containerization: Containerization technologies, such as Docker, allow you to package and deploy applications in a consistent and portable way, making it easier to move applications between development, testing, and production environments.

Orchestration: Orchestration refers to the automated coordination and management of multiple tasks and processes, often across multiple systems and technologies. In DevOps, orchestration is used to automate the deployment and management of complex, multi-tier applications and infrastructure. Popular orchestration tools include Kubernetes, Docker Swarm, and Apache Mesos. These tools allow teams to manage and deploy containers, automate the scaling of applications, and manage the overall health and availability of their systems.

Monitoring and logging tools: Monitoring and logging tools allow you to keep track of the performance and behavior of your systems and applications. Popular monitoring tools include Nagios, Zabbix, and New Relic; Prometheus and Grafana are probably the most popular in this field in recent years. Popular logging tools include the ELK Stack (Elasticsearch, Logstash, and Kibana), Graylog, and Fluentd.

Configuration management tools: Configuration management tools, such as Puppet, Chef, and Ansible, allow you to automate the configuration and management of your servers and applications.

Cloud computing platforms: Cloud computing platforms, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), provide the infrastructure and services necessary for DevOps practices.
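As promised above, here is a minimal sketch of what a GitHub Actions workflow for CI could look like, stored as .github/workflows/ci.yml in the repository. It is only an illustration of the idea, not a prescription from this article: the Java version, the Temurin distribution, and the Maven wrapper invocation are assumptions that would change per project.

  name: CI
  on:
    push:
      branches: [ main ]
    pull_request:

  jobs:
    build:
      runs-on: ubuntu-latest
      steps:
        # Check out the repository and provision a JDK before building
        - uses: actions/checkout@v3
        - uses: actions/setup-java@v3
          with:
            distribution: 'temurin'
            java-version: '17'
        # Compile and run the test suite; assumes a Maven wrapper in the repo
        - run: ./mvnw --batch-mode verify

Every push and pull request triggers this workflow, and the result is reported back on the commit or pull request, which is exactly the fast feedback loop described above.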
In addition to these tools, it is also important to understand DevOps practices and methodologies, such as agile. Remember, the specific tools and techniques you need to learn will depend on the needs of your organization and the projects you are working on. However, by having a solid understanding of the most commonly used tools and techniques in DevOps, you will be well-prepared to tackle a wide range of projects and challenges. Most features and capabilities are transferable: if you learn CI principles in one tool, moving to another won’t be seamless, but it will be relatively easy.

Version Control

We all use Git, or at least I hope so. Git's dominance in version control has made it much easier to build solutions that integrate deeply. As developers, we primarily view Git as a version control system that helps us manage and track changes to our codebase. We use Git to collaborate with other developers, create and manage branches, merge code changes, and track issues and bugs. Git is an essential tool for developers as it allows us to work efficiently and effectively on code projects. DevOps engineers have a different vantage point: Git is viewed as a critical component of the CI/CD pipeline. In this context, Git is used as a repository to store code and other artifacts such as configuration files, scripts, and build files. DevOps professionals use Git to manage the release pipeline, automate builds, and manage deployment configurations. Git is an important part of the DevOps toolchain as it allows for the seamless integration of code changes into the CI/CD pipeline, ensuring the timely delivery of software to production.

Branch Protection

By default, GitHub projects allow anyone to commit changes to the main (master) branch. This is problematic in most projects. We usually want to prevent commits to that branch so we can control the quality of the mainline. This is especially true when working with CI, as a break in the master can stop the work of other developers. We can minimize this risk by forcing everyone to work on branches and submit pull requests to the master. This can be taken further with code review rules that require one or more reviewers. GitHub has highly configurable rules that can be enabled in the project settings (a scripted example follows the list below). Enabling branch protection on the master branch in GitHub provides several benefits, including:

Preventing accidental changes to the master branch: By enabling branch protection on the master branch, you can prevent contributors from accidentally pushing changes to the branch. This helps to ensure that the master branch always contains stable and tested code.

Enforcing code reviews: You can require that all changes to the master branch be reviewed by one or more people before they are merged. This helps to ensure that changes to the master branch are high quality and meet the standards of your team.

Preventing force pushes: Enabling branch protection on the master branch can prevent contributors from force-pushing changes to the branch, which can overwrite changes made by others. This helps to ensure that changes to the master branch are made intentionally and with careful consideration.

Enforcing status checks: You can require that certain criteria, such as passing tests or successful builds, are met before changes to the master branch are merged. This helps to ensure that changes to the master branch are of high quality and do not introduce new bugs or issues.
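As referenced above, these rules don’t have to be clicked through in the settings page; they can also be scripted, which is handy when you manage many repositories. The following is only a rough sketch of calling GitHub’s branch protection REST API through the gh CLI: the repository, branch, reviewer count, and status-check name are placeholders, and the exact JSON fields should be checked against the current API documentation before use.

  # Hypothetical example: require one approving review and a passing
  # "build" status check before anything can be merged into main.
  # <owner> and <repo> are placeholders for your own repository.
  gh api --method PUT repos/<owner>/<repo>/branches/main/protection \
    --input - <<'EOF'
  {
    "required_status_checks": { "strict": true, "contexts": ["build"] },
    "enforce_admins": true,
    "required_pull_request_reviews": { "required_approving_review_count": 1 },
    "restrictions": null
  }
  EOF

Keeping a script like this in version control also documents the policy itself, so the protection rules can be reviewed and changed the same way as any other code.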
Overall, enabling branch protection on the master branch in GitHub can help to ensure that changes to your codebase are carefully reviewed, tested, and of high quality. This can help to improve the stability and reliability of your software.

Working With Pull Requests

As developers, we find that working with branches and pull requests allows us to collect multiple separate commits and changes into a single feature. This is one of the first areas of overlap between our role as developers and the role of DevOps. Pull requests let us collaborate and review each other's code before merging it into the main branch. This helps identify issues and ensures that the codebase remains stable and consistent. With pull requests, the team can discuss and review code changes, suggest improvements, and catch bugs before they reach production. This is critical for maintaining code quality, reducing technical debt, and ensuring that the codebase is maintainable. The role of DevOps is to tune quality vs. churn. How many reviewers should we have for a pull request? Is a specific reviewer required? Do we require test coverage levels? DevOps needs to tune the ratio between developer productivity, stability, and churn. By increasing the reviewer count or forcing review by a specific engineer, we create bottlenecks and slow development. The flip side is a potential increase in quality. We decide on these metrics based on rules of thumb and best practices. But a good DevOps engineer will follow up with metrics that support an informed decision down the road. For example, if we force two reviewers, we can then look at the time it takes to merge a pull request, which will probably increase. But we can compare it to the number of regressions and issues after the policy took effect. That way, we have a clear and factual indication of the costs and benefits of a policy.

The second benefit of pull requests is their crucial role in the CI/CD process. When a developer creates a pull request, it triggers an automated build and testing process, which verifies that the code changes are compatible with the rest of the codebase and that all tests pass. This helps identify any issues early in the development process and prevents bugs from reaching production. Once the build and test processes are successful, the pull request can be merged into the main branch, triggering the release pipeline to deploy the changes to production. I will cover CI more in-depth in the next installment of this series.

Finally, I feel that the discussion of DevOps is often very vague. There are no hard lines between the role of a DevOps engineer and the role of a developer, since DevOps engineers are developers and are a part of the R&D team. They navigate the fine line between administration and development and need to satisfy the sometimes conflicting requirements on both ends. I think understanding their jobs and tools can help make us better developers, better teammates, and better managers. Next time, we’ll discuss building a CI pipeline using GitHub Actions, working on your artifacts, managing secrets, and keeping everything in check. Notice we won’t discuss continuous delivery in great detail at this stage because that would drag us into a discussion of deployment. I fully intend to circle back and discuss CD as well once we cover deployment technologies such as IaC, Kubernetes, Docker, etc.

By Shai Almog CORE
Is DevOps Dead?
Is DevOps Dead?

What Exactly Is DevOps Anyway? Way back in 2006, Amazon CTO Werner Vogels got the ball rolling when he famously said, “Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, throw it over, and then forget about it. Not at Amazon. You build it; you run it. This brings developers into contact with the day-to-day operation of their software.” While the “you build it, you run it” mantra has become synonymous with DevOps, the actual application has not been so clear for many organizations. Dozens of articles and Reddit posts would seem to indicate that there is a very big range of opinions on this topic. Is DevOps a cultural approach that bridges the gap between siloed ops and dev teams that get them to cooperate more closely? Perhaps it really is, as Werner probably intended, where software developers take full responsibility for infrastructure and operational processes. This would make it overlap a bit with the SRE role described by Benjamin Treynor Sloss, VP of Engineering, Google and founder of SRE, as “what you get when you treat operations as a software problem and you staff it with software engineers.” Or maybe it’s an over-used and abused job title that, in the real world, confuses job seekers, recruiters, and executives alike? As such, should the term DevOps be used in the hiring process, or perhaps recruiters should just list out all the required skills the candidate will need to have? And What of Platform Engineering? Is This Something New or Just a Variation of DevOps? Some might even argue that DevOps is just a sexier way to describe a Sysadmin role! Are companies hurting themselves by trying to force the organization to adopt DevOps without fully understanding what it’s meant to accomplish? Maybe true DevOps is only relevant for very advanced organizations like Amazon, Netflix, and other tech elites? And what is the most important ingredient for successful DevOps? Is it about excellent communication between dev and ops? Is it about using the best automation and tooling? Or perhaps great DevOps can only be achieved if a company has developers who are infrastructure and operations experts and can handle managing it all on top of their daily coding. Finally, assuming that there is some consensus that DevOps is dead, is it time to pay our respects and say farewell? In this fast and furious session, these questions and more are hotly debated by Andreas Grabner, DevOps Activist at Dynatrace, Angel Rivera, Principal Developer Advocate at CircleCI, Oshrat Nir, Head of Product Marketing at Armo, and Fabian Met, CTO at Fullstaq. Each of the panelists brings years of technology experience at leading companies and shares their unique perspectives and opinions on whether DevOps is dead or not. The debate is moderated by Viktor Farcic, a popular YouTube educator, Upbound Developer Advocate, CDF ambassador, and published author. So put your headphones on, grab some popcorn, and enjoy!

By Amit Eyal Govrin
What Is Continuous Testing?
What Is Continuous Testing?

Testing is a crucial part of the Software Development Lifecycle (SDLC). Testing should be included in every stage of the SDLC to get faster feedback and bake quality into the product. Test automation can get you excellent results if it is implemented and used in an efficient way, and continuous testing is the right approach. According to Markets and Markets, the continuous testing market is expected to grow at a compound annual growth rate of 15.9% during the forecast period of 2018-2023 and reach $2.41 billion by 2023, with 2017 considered the base year for estimating the market size. In this article, we will discuss what continuous testing is, how to implement it, and what benefits we can get out of it.

What Is Continuous Testing?

Continuous testing helps provide faster feedback in all stages of the Software Development Lifecycle (SDLC). In most SDLC cases, minimal automated tests are written at the core level, which increases the pressure on the top level of the test pyramid to perform manual exploratory testing. This hurts quality because catching an error after development is complete is very costly. The table below shows the cost to fix a bug at Google, and you can see it costs a whopping $5,000 when a bug is discovered in the “System Testing” phase. Continuous testing helps alleviate this fear of software failing by providing early feedback as soon as the code is committed to the repository. The main goal of continuous testing is to test early at all stages of the SDLC with automation, test as often as possible, and get faster feedback on the builds. You might know about Go/No-Go meetings, which are set up before every release; they help you find out whether you are headed in the right direction and decide whether you are good to release the application with the respective features to production or not. Continuous testing works similarly: it provides you with test results, and based on them you decide whether you can move to the next stage of development. Using continuous testing, we can fix all failures as soon as they occur, before moving on to the next stage, which eventually helps save time and money.

Why Is Continuous Testing Needed?

In one of my previous projects, we were working on a mobile application to be developed for iOS and Android platforms. The client wanted everything to be automated from the inception itself. Any bug leakage into production meant it would impact the business directly and cost millions of dollars. We were asked to present a plan for automation where testing would be carried out in every stage of development to minimize the risk of bug leakage. Hence, we decided to implement the test pyramid and create a CI/CD pipeline where testing would be done continuously at every stage. Here is a graphical representation of the CI/CD pipeline that can also be taken as a practical guide to implementing continuous testing in a project:

To bake quality into the product, we came up with a plan to introduce testing at every stage in the pipeline, and as soon as any red flag appears, it should be fixed before we move on to the next phase. So, as soon as the dev commits the code to the remote repository, the following scans would run:

Static code analysis: This will ensure the best coding practices are followed and alert us to code smells in case of any errors.
SecOps scan: This will scan the code and all the libraries used within the code for any security vulnerabilities and raise a red flag in case of “Critical,” “High,” “Medium,” or “Low” level vulnerabilities that should be taken care of.

Once the above scans are successful, the pipeline would move ahead and run the following tests in the development environment:

Unit tests
Integration tests
System tests
End-to-end tests

All of the above tests will ensure the code is working perfectly as expected. If any of the above tests fails, the pipeline will break and a red flag will be raised. It is the developer’s responsibility to fix those respective failing tests. It’s not about playing the blame game, but about finding the commit that broke the build and fixing it. The team will offer help to the developer to fix the issue. After all the above-mentioned tests pass successfully, the build will be deployed to the QA environment, where end-to-end automated tests will run on the QA build as a part of regression testing. Once the end-to-end automated tests pass on a QA build, QA will pick up the build and perform manual exploratory tests to uncover any further defects in the build. Once QA signs off on the build, it will eventually be deployed to the UAT environment, where further rounds of testing will be done by the UAT team of testers. Finally, after sign-off, the build will be deployed into production.

This plan worked for us tremendously, as we uncovered many issues in the first and second stages of the pipeline when the unit and integration tests were run. The static code analysis and SecOps scans helped us implement the best coding practices and fix vulnerable libraries, either by updating them to the latest version or by discarding them in favor of libraries that were less prone to vulnerabilities, and by updating them frequently so the code was less exposed to security risks. We also discovered issues in the manual exploratory testing; however, those were not that critical. Most of the issues were resolved in the initial phases, which provided us faster feedback.

Continuous Testing: The Need of the Hour — No Longer a Good-to-Have but a Must-Have in the SDLC

From my experience, the following points explain why continuous testing is needed:

Requirements change frequently: With requirements changing frequently, the code needs to change as well, and every change carries risk. There are two risks involved here: whether the changed code will work as expected, and whether the change impacts the existing code. With continuous testing, we can tackle both of these risks by setting up an automated pipeline, which will run the unit, integration, and, eventually, automated regression tests.

Continuous integration: With agile development in place, continuous integration has gained a lot of popularity, where developers merge their code to the main branch as often as possible to make it production ready. Continuous testing helps us here because, before merging takes place, the code goes through a pipeline where automated tests are run against the build. If there is a failure, the code does not merge and a red flag is raised.

Production ready: With continuous testing, we can be production ready, as all our checks and tests run on an automated pipeline as soon as the developer commits the code.

Reduce human errors: In the case of regression tests, an automated test can serve as documentation for the feature and help reduce human errors in testing.
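Before moving on to the advantages, here is a rough sketch of how the staged pipeline described above could be expressed in GitLab CI syntax. The stage names, job names, scripts, and deployment flags are hypothetical placeholders rather than the project’s actual configuration; the point is only that each stage must pass before the next one runs.

  stages:
    - static-analysis
    - security-scan
    - test
    - deploy-qa
    - e2e-qa

  static-code-analysis:
    stage: static-analysis
    script:
      - /bin/bash run-static-analysis.sh     # fails the pipeline on code smells

  secops-scan:
    stage: security-scan
    script:
      - /bin/bash run-security-scan.sh       # flags vulnerable dependencies

  unit-and-integration-tests:
    stage: test
    script:
      - /bin/bash run-unit-tests.sh
      - /bin/bash run-integration-tests.sh

  deploy-to-qa:
    stage: deploy-qa
    script:
      - /bin/bash deploy.sh --env qa

  regression-tests-on-qa:
    stage: e2e-qa
    script:
      - /bin/bash run-e2e-tests.sh --env qa

Because each stage only runs when the previous one succeeds, a failing scan or test breaks the pipeline early, which is exactly the “red flag” behavior described above.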
Advantages of Continuous Testing

Fast feedback: In the traditional software development process, the team had to wait for the tester’s feedback; the tester would test the build manually after the developer completed the feature, and the developer then had to rework the code to fix the issues, which was time-consuming and costly. With continuous testing, we can get faster feedback on newly committed code and save time and money.

Quality baked into the product: With all tests running in the automated pipeline — unit, integration, functional, security, performance, and end-to-end user journeys — we can be sure that quality is baked into the product itself and need not worry about releasing it to production.

Reduces bug leakage: Continuous testing helps eliminate the chances of bugs reaching the build by providing us with timely updates about software failures.

Minimizes risk: It also helps find risks, address them, and improve the quality of the product.

Important Types of Continuous Testing

Unit tests: These involve testing a piece of code in isolation — basically, testing every method written for the feature. The main objective of this test is to check that the code is working as expected, meaning all the functionalities, inputs, outputs, and performance of the code are as desired.

Integration tests: These involve testing two modules together. The goal of this test is to ensure the integration between the two components is working fine.

Regression tests: These are the most widely used tests and are used to ensure the existing functionality of the application is working as expected after the latest addition or modification to the code repository.

End-to-end journey tests: These tests are added to check the end-to-end working of the software. The goal of these tests is to ensure the end user is able to use the application end to end.

Future of Continuous Testing

With the ever-increasing demand for high-quality software and economies flourishing with digitalization at their core, continuous testing is considered an important aspect. A software company is required to respond to frequent changes happening daily in the SDLC, and continuous testing is the answer. The following points are the benefits of continuous testing:

To adapt to frequent changes in the software.
To achieve maximized automation in the delivery cycle and avoid loopholes in the process.
To minimize human errors.
To provide a cost-effective solution to the end customer.
To beat the competition and outperform competitors by releasing bug-free software.
To bake quality into the product.

As technology progresses, so does the need to upgrade the process. With continuous testing, the best results can be achieved.

Role of Cloud Services Platforms in Continuous Testing

Last year, I was working on a mobile application development project, which had iOS and Android versions to be rolled out. As it was planned to be rolled out in different regions around Germany, we studied mobile phone usage in the different areas of Germany and found that both Android phones and iPhones are used in all areas. Hence, we concluded that we would need a combination of at least six devices (three Android devices and three iPhones) to test our build: two devices to test the minimum supported versions, two devices with the latest versions, and two devices with random versions between the highest and minimum supported ones.
The hard part was getting these devices, as mobile phones and their versions are updated nearly every 2-3 months, so even if the organization invested and bought these devices, it would need to update them whenever a new version launches. Here, cloud platform services came to the rescue: we simply purchased the right plan as per our requirements and got access to the required real devices, which helped us test the applications smoothly by running automated and manual exploratory tests on real devices on the cloud platform. In today’s fast-paced world, there are multiple platforms where software runs, from browsers to mobile phones and tablets. As we release the application to production, we need to make sure it runs on all the desired platforms and fix the things we find that are not working. However, to do that, we need to test it on the respective devices/browsers to ensure it works hassle-free. This is possible but will cost money and time, as we would have to purchase the hardware and provide the required resources to make it work, from hiring engineers to setting up the infrastructure. As we are testing continuously, performing parallel runs on different browsers and their respective versions, or on different mobile devices with different OS versions, these services help us test continuously by providing the required devices, browsers/OS, and their variety of versions, so we catch bugs early and use the early feedback to fix the required issues and stop bug leakage.

Conclusion

Quality is a crucial part of software and needs to be baked into the software. Continuous testing helps us build the right product by implementing testing at every phase of the Software Development Lifecycle. We need to be production ready with every feature we build, which is necessary to get fast feedback with a fail-fast strategy. There are various testing types available that help us implement continuous testing using automated pipelines. Cloud services platforms provide the required infrastructure to keep up the pace of testing continuously.

By Faisal Khatri
A Beginner's Guide to Infrastructure as Code
A Beginner's Guide to Infrastructure as Code

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

Infrastructure as Code (IaC) is the practice of provisioning and managing infrastructure using code and software development techniques. The main idea behind IaC is to eliminate the need for manual infrastructure provisioning and configuration of resources such as servers, load balancers, or databases with every deployment. As infrastructure is now an integral part of the overall software development process and is becoming more tightly coupled with application delivery, it's important that we make it easier to deliver infrastructure changes. Using code to define and manage infrastructure and its configuration enables you to employ techniques like version control, testing, and automated deployments. This makes it easier to prevent all sorts of application issues, from performance bottlenecks to functionality failures. This article will explain how IaC works, highlighting both approaches, as well as the benefits and challenges of delivering infrastructure as code within a DevOps environment.

How Does IaC Work?

Historically, infrastructure management was mostly a manual process done by specialized system administrators. You needed to manually create virtual machines (VMs), manage their software updates, and configure their settings. This became highly expensive and time-consuming, especially with the rapid pace of modern software development. IaC evolved as a solution for scalable infrastructure management. It allows you to codify the infrastructure and create standardized, reusable, and shareable configurations. IaC also allows you to define infrastructure configurations in the form of a code file. For example, the following snippet demonstrates how you'd define the creation of an S3 bucket in AWS using CloudFormation:

  Resources:
    S3Bucket:
      Type: 'AWS::S3::Bucket'
      DeletionPolicy: Retain
      Properties:
        BucketName: DOC-EXAMPLE-BUCKET

As you define your Infrastructure as Code, you can implement the same practices that you use with application development code, like versioning, code review, and automated tests.

Figure 1: The IaC workflow

To implement IaC, you can use a variety of tools and technologies, like:

Configuration management tools that make sure the infrastructure is in the desired state that you previously defined, like Ansible, Chef, or Puppet.
Provisioning tools (e.g., CloudFormation templates) that allow you to define cloud resources in the form of a JSON or YAML file and provision that infrastructure on a cloud platform.
Containerization tools (e.g., Docker, Kubernetes) used to package applications and their dependencies into containers that can be run on any infrastructure.

Approaches to IaC

There are two different Infrastructure as Code approaches: imperative (procedural) IaC and declarative (functional) IaC. With the imperative method, the developer specifies the exact steps that the IaC needs to follow to create the configuration. The user commands the automation completely, which makes this method convenient for more specific use cases where full control is needed. The main advantage of the imperative method is that it allows you to automate almost every detail of the infrastructure configuration. This also means that you need a higher level of expertise to implement this type of automation, as it's mostly done by executing scripts directly on the system.
Here is an example of an imperative way to create an S3 bucket using the AWS CLI:

  aws s3api create-bucket --bucket my-new-bucket --region eu-central-1

When you run this command, the AWS CLI will create a new Amazon S3 bucket with the name my-new-bucket in the eu-central-1 region. With the declarative method, the developer specifies the desired outcome without providing the exact steps needed to achieve that state. The user describes how they want the infrastructure to look through a declarative language like JSON or YAML. This method helps with standardization, change management, and cloud delivery. Features can be released faster and with a significantly decreased risk of human error. Here's a simple example of declarative Infrastructure as Code using AWS CloudFormation:

  {
    "Resources": {
      "EC2Instance": {
        "Type": "AWS::EC2::Instance",
        "Properties": {
          "InstanceType": "t2.micro",
          "ImageId": "ami-0c94855bac71e",
          "KeyName": "my-key"
        }
      }
    }
  }

This script tells CloudFormation to create an EC2 instance of type t2.micro, and CloudFormation will take care of all the required steps to achieve that state with the specific properties you defined.

Figure 2: Overview of IaC approaches

The Benefits and Challenges of IaC

Infrastructure as Code is one of the key practices in DevOps and provides many benefits that save time and money, as well as reduce risk. But as with any tool, solution, or practice that you adopt in the organization, it is important to weigh the challenges that one may face when implementing IaC methods.

Codification
Benefits: Codifying infrastructure helps developers ensure that the infrastructure is configured well, without any unintended changes, whenever it is provisioned.
Challenges: Infrastructure as Code can reduce visibility into the infrastructure if it is not implemented or managed properly. Improved visibility can be achieved by making sure that the code is well-documented, easily accessible, standardized, simple, and properly tested.

Configuration drift
Benefits: IaC is idempotent, meaning that if a change happens that's not in sync with your defined IaC pipeline, the correction to the desired state occurs automatically.
Challenges: Avoiding manual changes directly in the console reduces challenges with configuration drift in relation to IaC implementation.

Version control
Benefits: Integrating with the version control system your team uses helps create trackable and auditable infrastructure changes and facilitates easy rollbacks.
Challenges: As the size and complexity of an organization's infrastructure grow, it can become more difficult to manage using IaC.

Testing and validation
Benefits: Infrastructure changes can be verified and tested as part of the delivery pipeline through common CI/CD practices like code reviews.
Challenges: Performing code reviews to ensure infrastructure consistency is not always enough — there are usually a variety of testing options specific to a use case.

Cost
Benefits: Automating time-consuming tasks like infrastructure configuration with IaC helps minimize costs and reallocate resources to more critical assignments.
Challenges: Extra costs can easily ramp up if everyone is able to create cloud resources and spin up new environments. This usually happens during the development and testing phases, where developers create resources that can be forgotten after some time. To prevent this, it's a good idea to implement billing limits and alarms.

Speed
Benefits: There may be a higher initial investment of time and effort to automate infrastructure delivery, but automating IaC brings faster and simpler procedures in the long run.
Challenges: For organizations that run simple workloads, the process of automating and managing IaC can become more burdensome than beneficial.

Error handling
Benefits: Automating with IaC eliminates human-made infrastructure errors and reduces misconfiguration errors by providing detailed reports and logs of how the infrastructure works.
Challenges: In complex infrastructure setups, it can be extremely challenging to debug and troubleshoot the infrastructure, especially when issues arise in production environments.

Security
Benefits: You can define and execute automated security tests as part of the delivery pipeline. Security experts can review the infrastructure changes to make sure they comply with the standards, and security policies can be codified and implemented as guardrails before deploying to the cloud.
Challenges: As IaC is a much more dynamic provisioning practice that can be used to optimize infrastructure management, it can be misused almost as easily. IaC can make it easier to unintentionally introduce security vulnerabilities, such as hard-coded credentials or misconfigured permissions.

Table 1: IaC benefits vs. challenges — factors to consider

Conclusion

With IaC, systems can be easily reproduced and reused; the processes are consistent. As DevOps culture becomes more pervasive, maintaining a strategic advantage through IaC will likely become an increasingly important goal. If you work at an organization that aims to implement Infrastructure as Code within its existing processes, it can be helpful to understand the benefits and challenges that your team might encounter. The trick is to walk a fine line between understanding your business' infrastructure needs and recognizing the potential for improvement that IaC offers. To ensure that IaC is implemented properly, it's important to start small. You want to gradually increase the complexity of the tasks, avoid over-complicating the codebase, continuously monitor the IaC implementation to identify areas for improvement and optimization, and continue to educate yourself about the different tools, frameworks, and best practices in IaC.

Check out these additional resources to continue learning about IaC:

Getting Started With IaC, DZone Refcard
IaC Security: Core DevOps Practices to Secure Your Infrastructure as Code, DZone Refcard
"Infrastructure-as-Code: 6 Best Practices for Securing Applications" by Jim Armstrong
"5 Best Practices for Infrastructure as Code" by Marija Naumovska
"Best Infrastructure as Code Tools (IaC): The Top 11 for 2023" by Florian Pialoux

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

By Marija Naumovska
Shift-Left: A Developer's Pipe(line) Dream?
Shift-Left: A Developer's Pipe(line) Dream?

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

The software development life cycle (SDLC) is broken. In fact, the implementation steps were flawed before the first project ever utilized Winston Royce's implementation steps.

Figure 1: Winston Royce software implementation steps (Waterfall Model)

Dr. Winston Royce stated "the implementation described above is risky and invites failure" in the same lecture notes that presented this very illustration back in 1970. Unfortunately, that same flaw has carried over into iterative development frameworks (like Agile) too. What is broken centers around when and how the quality validation aspect is handled. In Figure 1, this work was assumed to be handled in the testing phase of the cycle. The quality team basically sat idle (from a project perspective) until all the prior work was completed. Then, the same team had a mountain of source code which had to be tested and validated — putting the initiative in a position where it is difficult to succeed without any issues. If it were possible to extract a top 10 list of lessons learned over the past 50+ years in software development, I am certain that the consequence of placing quality validation late in the development lifecycle would be at the top of the list. This unfortunate circumstance has had a drastic impact on customer satisfaction — not to mention the livelihood of products, services, or entire business entities. Clearly, we are far past due for a "shift" in direction.

How Shift Left Is a Game Changer

Shift left is an approach that moves — and really scatters — the testing phase of the development lifecycle to earlier in the process. In fact, the "test early and often" approach was first mentioned by Larry Smith back in a 2001 Dr. Dobb's Journal post. To demonstrate how shift left is a game changer, consider the basic DevOps toolchain shown in Figure 2, which has become quite popular today:

Figure 2: DevOps toolchain

Shift-left adoption introduces a testing effort at every phase of the lifecycle, as is demonstrated in Figure 3:

Figure 3: DevOps toolchain with shift-left adoption

Employing a shift-left approach redistributes when the quality aspects are introduced into the lifecycle:

Plan – Expose fundamental flaws sooner by leveraging and validating specifications like OpenAPI.
Create – Establish unit and integration tests before the first PR is created.
Verify – Include regression and performance/load tests before the first consumer access is granted.
Package and so on – Assure the CI/CD pipelines perform automated test execution as part of the lifecycle. This includes end-to-end and sanity tests designed to validate changes introduced in the latter phases of the flow.

As a result, shift left yields the following benefits:

Defects in requirements, architecture, and design are caught near inception — saving unnecessary work.
The difficulty of trying to comprehend and organize a wide scope of use cases to validate is avoided by the "test early and often" approach inherent within shift left.
Performance expectations and realities are understood early, which could potentially drive design changes.
Breaking points in the solution are determined — before the first production request is made.
Design updates late in the lifecycle, which are often associated with unplanned costs, are avoided.
Higher staff levels required to fully test at the end of development are avoided by dedicating smaller staff levels to participate throughout the development lifecycle.
Shift Left in Action The "test early and often" approach often associated with shift left has the direct result of better quality for a given solution. Here are a few key points to expect when shift left is put into action. New Career Path for Quality Engineers One might expect the pure focus of shift left to be on quality engineers assigned to the initiative. That is not really the case. Instead, this change becomes the role of every engineer associated with the project. In fact, more organizations today are merging tasks commonly associated with quality engineers and assigning them to software engineers or DevOps engineers. Forward-looking quality engineers should evaluate which of these engineering roles best suits their short- and long-term goals when thinking about their role in shift left-driven organizations. Test Coverage In a "test early and often" approach, actual tests themselves are vital to the process. The important aspect is that the tests are functional — meaning they are written in some type of program code and not executed manually by a human. Some examples of functional tests that can adhere to shift left compliance are noted below: API tests created as a result of an API-first design approach or similar design pattern Unit tests introduced to validate isolated and focused segments of code Integration tests added to confirm interactions across different components or services Regression tests written to fully exercise the solution and catch any missed integrations Performance tests designed to determine how the solution will perform and establish benchmarks The more test coverage that exists, the better the solution will be. As a rule of thumb, API and unit tests should strive for 100% code coverage and the remaining tests should strive for 90% coverage since reaching full coverage is often not worth the additional costs required. Pipeline Configuration Adoption of the shift-left approach can require updates to the pipeline configuration as well. In addition to making sure the tests established above are part of the pipeline, sanity and end-to-end functional tests should be included. These sanity tests are often short-running tests to validate that each entry point into the solution is functioning as expected. The end-to-end functional tests are expected to handle the behavioral aspects of the application — validating that all of the core use cases of the solution are being exercised and completed within the expected benchmarks. Release Confidence The end result of shift left in action is a high degree of confidence that the release will be delivered successfully — minimizing the potential for unexpected bugs or missed requirements to be caught by the customer. This is a stark contrast to prior philosophies that grouped all the testing at the end of the lifecycle — with a hope that everything was considered, tested, and validated. Too Idealistic? Like any lifecycle or framework, there are some key challenges that must be addressed and accepted before shift left is adopted into an organization to avoid a "too idealistic" conclusion. Must Be a Top-Down Decision Shift left must be a top-down decision because the core philosophy changes that are being introduced extend beyond the team participating in the initiative. What this means is that managers of quality engineering staff will find themselves dedicating individuals to projects on a long-term basis instead of handling all the validation efforts after development has been completed. 
From a bigger-picture perspective, those quality engineers are likely going to find themselves adapting their role to become a software or DevOps engineer — reporting to a different management structure. Level-Set Expectations The shift left approach will also require expectations to be established and clarified early on. This builds upon the last challenge because each step of the lifecycle will likely require more time to complete. However, the overall time to complete the project should remain the same as the projects that achieve full and successful testing at the end of the lifecycle. It is vital to remember that defects found by shift left adoption will have less of an impact to resolve. This is because the testing is completed within a given phase, reducing the potential to address additional concerns that must be resolved in prior phases of the lifecycle. Sacrificing Quality Is No Longer an Option A risky approach often employed for initiatives that are pressured to meet an imposed deadline is to shorten the amount of time reserved for the quality and validation phase of the lifecycle. When shift left is adopted, this is no longer a valid option since a dedicated testing phase no longer exists. While this approach is never something that should be considered, aggressive decision-makers often would roll the dice and sacrifice the amount of time given to quality engineers to validate the initiative, hoping for a positive result. Conclusion Throughout my life, I have always been a fan of the blockbuster movies. You know those movies made with a large budget and a cast that includes a few popular actors. I was thinking about the 1992 Disney movie Aladdin, which builds upon a Middle Eastern folktale and features the comic genius of the late Robin Williams. I remember there being a scene where the genie gives Aladdin some information about the magical lamp. Immediately, the inspired main character races off before the genie can provide everything Aladdin really needs to know. It turns out to be a costly mistake. I feel like Dr. Winston Royce was the genie while the rest of us raced through the lifecycle without a desire to hear the rest of the story, like Aladdin. Decades and significant cost/time expenditures later, a new mindset has finally emerged which builds upon Royce's original thoughts. Regardless of your team's solution destiny, shift left should be strongly considered because this is something we should have been doing all along. Taking this approach provides the positive side effect of transforming dedicated quality engineers to software engineers who can participate at every step of the lifecycle being utilized. The biggest benefit is identifying defects earlier in the lifecycle, which should always translate into a significant cost savings when compared to finding and fixing defects later in the lifecycle. To succeed, implementing shift left must be a top-down decision — driven by an organization-wide change in mindset and supported by everyone. Not ever should a reduction in quality or performance testing be a consideration to shorten the time required to release a new feature, service, or framework. Have a really great day! References: "Shift Left," Devopedia "Successfully Implementing TDD/BDD to Enable Shift-Left Testing Approach" by Pavan Rayaprolu "Managing the Development of Large Software Systems" by Dr. Winston W. Royce "Shift-Left Testing" by Larry Smith This is an article from DZone's 2023 DevOps Trend Report.For more: Read the Report

By John Vester CORE
How To Build an Effective CI/CD Pipeline
How To Build an Effective CI/CD Pipeline

This is an article from DZone's 2023 DevOps Trend Report.For more: Read the Report Continuous integration/continuous delivery (CI/CD) pipelines have become an indispensable part of releasing software, but their purpose can often be misunderstood. In many cases, CI/CD pipelines are treated as the antidote to release woes, but in actuality, they are only as effective as the underlying release process that they represent. In this article, we will take a look at a few simple steps to create an effective CI/CD pipeline, including how to capture and streamline an existing release process, and how to transform that process into a lean pipeline. Capturing the Release Process A CI/CD pipeline is not a magic solution to all of our release bottlenecks, and it will provide minimal improvement if the underlying release process is faulty. For software, the release process is the set of steps that a team uses to get code from source code files into a packaged product that can be delivered to a customer. The process will reflect the business needs of each product and the team creating the product. While the specifics of a release process will vary — some may require certain security checks while others may need approvals from third parties — nearly all software release processes share a common purpose to: Build and package the source code into a set of artifacts Test the artifacts with various levels of scrutiny, including unit, integration, and end-to-end (E2E) tests Test the critical workflows of the product from an end user's perspective Deploy the artifacts into a production-like environment to smoke test the deployment Every team that delivers a product to a customer has some release process. This process can vary from "send the artifacts to Jim in an email so he can test them" to very rigid and formalized processes where teams or managers must sign off on the completion of each step in the process. Putting It Down on Paper Despite this variation, the first, most-critical step in developing an effective CI/CD pipeline is capturing the release process. The simplest way to do this is by drawing a set of boxes to capture the steps in the release process and drawing arrows from one step to another to show how the completion of one step initiates the start of another. This drawing does not have to be overly formal; it can be done on a sheet of paper, so long as the currently practiced process is captured. Figure 1 illustrates a simple release process that is common for many products: Figure 1: A basic release process - Capturing the steps of the current release process is the first step to creating a pipeline Speaking the Same Language Once the current release process has been captured, the next step is to formalize the process. When speaking about a release process, and eventually a CI/CD pipeline, it is important to use a common vernacular or domain language. For pipelines, the basic lexicon is: Step – A single action, such as Build, Unit Tests, or Staging, in the release process (i.e., the boxes). Stage – A single phase in the release process, containing one or more steps. Generally, stages can be thought of as the sequential columns in a pipeline. For example, Build is contained in the first stage, Unit Test in the second stage, and User Tests and Staging in the fifth stage. When there is only one step in a stage, the terms step and stage are often used synonymously. Pipeline – A set of ordered steps. Trigger – An event, such as a check-in or commit, that starts a single execution of the pipeline. 
Gate – A manual step that must be completed before all subsequent steps may start. For example, a team or manager may need to sign off on the completion of testing before the product can be deployed. A CI/CD pipeline is simply an automated implementation of a formalized release process. Therefore, if we wish to create an effective CI/CD pipeline, it's essential that we optimize our release process first. Optimizing the Release Process Since our CI/CD pipelines are a reflection of our release process, one of the best ways to create an effective pipeline is to optimize the release process itself before deriving a pipeline from it. There are three critical optimizations that we can make to a release process that pay dividends toward an effective pipeline: Streamline the process– We should minimize any bottlenecks or artificial steps that slow down our release process. Remove any unnecessary steps. Minimize the number of steps while fulfilling business needs. Simplify any complex steps. Remove or distribute steps that require a single point-of-contact. Accelerate long-running steps and run them in parallel with other steps. Automate everything– The ideal release process has no manual steps. While this is not always possible, we should automate every step possible. Consider tools and frameworks such as JUnit, Cucumber, Selenium, Docker, and Kubernetes. Capture the process for running each step in a script — i.e., running the build should be as easy as executing build.sh. This ensures there are no magic commands and allows us to run each step on-demand when troubleshooting or replicating the release process. Create portable scripts that can be run anywhere the release process is run. Do not use commands that will only work on specific, special-purpose environments. Version control the scripts, preferably in the same repository as the source code. Shorten the release cycle – We should release our product as often as possible. Even if the end deliverable is not shipped to the customer or user (e.g., we build every day but only release the product to the customer once a week), we should be frequently running our release process. If we currently execute the release process once a day, we should strive to complete it on every commit. Optimizing the release process ensures that we are building our CI/CD pipeline from a lean and efficient foundation. Any bloat in the release process will be reflected in our pipeline. Optimizing our release process will be iterative and will take continuous effort to ensure that we maintain a lean release process as more steps are added and existing steps become larger and more comprehensive. Building the Pipeline Once we have an optimized release process, we can implement our pipeline. There are three important pieces of advice that we should follow in order to create an effective CI/CD pipeline: Don't follow fads – There are countless gimmicks and fads fighting for our attention, but it is our professional responsibility to select our tools and technologies based on what is the most effective for our needs. Ubiquity and popularity do not guarantee effectiveness. Currently, options for CI/CD pipeline tools include GitHub Actions, GitLab CI/CD, and Jenkins. This is not a comprehensive list, but it does provide a stable starting point. Maintain simplicity– Each step should ideally run one script with no hard-coded commands in the pipeline configuration. The pipeline configuration should be thought of as glue and should contain as little logic as possible. 
For example, an ideal GitLab CI/CD configuration (.gitlab-ci.yml) for the release process in Figure 1 would resemble:

  build:
    stage: building
    script:
      - /bin/bash build.sh

  unit-tests:
    stage: unit-testing
    script:
      - /bin/bash run-unit-tests.sh

  integration-tests:
    stage: integration-testing
    script:
      - /bin/bash run-integration-tests.sh

  ...

  deploy:
    stage: deployment
    script:
      - /bin/bash deploy.sh --env production:443 --key ${SOME_KEY}

This ideal is not always possible, but this should be the goal we strive toward. Gather feedback – Our pipelines should not only produce artifacts, but they should also produce reports. These reports should include:

Test reports that show the total, passed, and failed test case counts
Reports that gauge the performance of our product under test
Reports that show how long the pipeline took to execute — overall and per step
Traceability reports that show which commits landed in a build and which tickets — such as Jira or GitHub tickets — are associated with a build

This feedback allows us to optimize not only our product, but the pipeline that builds it as well. By following these tips, we can build an effective pipeline that meets our business needs and provides our users and customers with the greatest value and least amount of friction.

Conclusion

CI/CD pipelines are not a magic answer to all of our release problems. While they are important tools that can dramatically improve the release of our software, they are only as effective as our underlying release processes. To create effective pipelines, we need to streamline our release processes and be vigilant so that our pipelines stay as simple and as automated as possible.

Further Reading:
Continuous Delivery by Jez Humble & David Farley
Continuous Delivery Patterns and Anti-Patterns by Nicolas Giron & Hicham Bouissoumer
Continuous Delivery Guide by Martin Fowler
Continuous Delivery Pipeline 101 by Juni Mukherjee
ContinuousDelivery.com

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

By Justin Albano CORE
Dev vs. Ops: Conflicted? So Are We. [Comic]
Dev vs. Ops: Conflicted? So Are We. [Comic]

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

By Daniel Stori CORE
Source Code Management for GitOps and CI/CD
Source Code Management for GitOps and CI/CD

This is an article from DZone's 2023 DevOps Trend Report. For more: Read the Report

GitOps has taken the DevOps world by storm since Weaveworks introduced the concept back in 2017. The idea is simple: use Git as the single source of truth to declaratively store and manage every component for a successful application deployment. This can include infrastructure as code (e.g., Terraform), policy documents (e.g., Open Policy Agent, Kyverno), configuration files, and more. Changes to these components are captured by Git commits and trigger deployments via CI/CD tools to reflect the desired state in Git. GitOps builds on recent shifts towards immutable infrastructure via declarative configuration and automation. By centrally managing declarative infrastructure components in Git, the system state is effectively tied to a Git commit, producing a versioned, immutable snapshot. This makes deployments more reliable and rollbacks trivial. As an added benefit, Git provides a comprehensive audit trail for changes and puts stronger guardrails in place to prevent drift in the system. Finally, it promotes a more consistent CI/CD experience, as all operational tasks are now fully captured via Git actions. Once the pipeline is configured, developers can expect a standard Git workflow to promote their changes to different environments. Even though the benefits of GitOps are well documented, best practices for implementing GitOps are still being formed. After all, the implementation details will depend on the nature of the existing code repositories, the size and makeup of the engineering teams, as well as practical needs for imperative changes (e.g., emergency rollback or break-glass procedures). In this article, we'll look at how to choose the best strategy for embracing GitOps with the aforementioned considerations in mind.

Source Code Management

The first consideration in adopting GitOps is deciding what artifacts to store in which Git repository. Some tech companies, such as Google and Facebook, are staunch supporters of monorepos, utilizing sophisticated tools to build and deploy code. Others take the opposite approach, using multiple repos to segregate applications or products to manage them separately. Fortunately, GitOps is not tied to a particular framework, but unless your organization already has robust tooling to deal with monorepo builds, such as Bazel, the general recommendation is to at least separate application source code from deployment artifacts.

Figure 1: Typical GitOps flow

The benefits of separating the two repositories are multifold:

Deployment cadence for application and infrastructure changes can be more easily separated and controlled. For example, application teams may want every commit to trigger a deployment in lower environments, whereas infrastructure teams may want to batch multiple configuration changes before triggering a deployment (or vice versa).
There may be compliance or regulatory requirements to separate who has access to deploy certain aspects of the application stack as a whole. Some organizations may only allow a Production Engineering or SRE team to trigger production deployments. Having separate repos makes access and audit trails easier to configure.
For applications with dependent components that necessitate deployment as a single unit, a separate configuration repo allows multiple application or external dependency repos to push changes independently. Then the CD tool can monitor a single configuration repo and deploy all the components at the same time.
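As a concrete illustration of that last point, here is a minimal sketch of an Argo CD Application manifest pointing at a configuration repo. Argo CD is just one of several GitOps controllers, and the repository URL, path, and names below are placeholders, not a recommendation from this article:

  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: my-app
    namespace: argocd
  spec:
    project: default
    source:
      # The deployment/configuration repo, kept separate from application source code
      repoURL: https://github.com/example-org/deployment-configs.git
      targetRevision: main
      path: my-app/production
    destination:
      server: https://kubernetes.default.svc
      namespace: my-app
    syncPolicy:
      automated:
        prune: true     # remove resources that were deleted from Git
        selfHeal: true  # revert manual drift back to the Git-defined state

The controller continuously compares the cluster state with what is committed at that path and reconciles any differences, so a Git commit becomes the only way changes reach the environment.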
With that said, the exact point of separation between what belongs in the application repo versus the deployment artifacts repo depends on the composition of your team. Small startups may expect application developers to be responsible for application code as well as other deployment artifacts (e.g., Dockerfile, Helm charts). In that case, keeping those manifests in the same repo and only keeping Terraform configs in another repo may make sense. For larger organizations with dedicated DevOps/infrastructure teams, those teams may own the Kubernetes components exclusively and maintain them separately from application code.

Deployment Repo Setup

The logical next question is: how many repos should hold deployment configs? The answer is driven by similar considerations as above, along with the infrastructure topology. For small teams with a small number of cloud accounts/projects, it will be easier to have a single repo hosting all deployment configs and triggering deployments to a small number of environments. As the infrastructure evolves, non-prod artifacts can be separated from prod repos. For mid-sized teams with slightly more sophisticated tooling and more complex cloud infrastructure (e.g., multiple projects nested under organizations, hybrid/multi-cloud), a repo per team may work well; this way, different controls can be implemented based on security or compliance needs. At the other end of the spectrum, a repo per service and environment provides the most flexibility in terms of controls for large teams with robust tooling.

Other CI/CD Considerations

The main goal of GitOps is to leverage the power of Git to store the single source of truth for the deployed state. In practice, however, what constitutes the versioned, immutable artifact will be determined by the state of each team's CI/CD tools. For teams with slow pipelines, it may be undesirable to trigger frequent updates. Likewise, for product teams that interface with external constituents, versioned releases may need to be coordinated with business partners. In such cases, designing the CI pipeline to update the configuration repo only on tagged releases may be beneficial. On the other hand, agile teams with robust CD tools for canary releases may favor frequent deployments to collect real-time user feedback.

Another critical component to call out with GitOps is secret management. Since secrets can't be checked into Git as plaintext, a separate framework is required to handle them. Some frameworks, like Bitnami Sealed Secrets, check encrypted data into Git and use a controller/operator in the deployed environment to decrypt the secrets. Others keep secrets out of Git entirely and leverage secret stores such as HashiCorp Vault (see the sketch after this section).

Finally, it's important to call out that even with GitOps, configuration drift may still occur. In fact, for small teams with immature tooling, it may be critical to leave some room for imperative changes. For time-critical outages, manual fixes may be necessary to restore the service before running through the official pipeline for a long-term fix. Before enforcing stringent policies, test rollback and restore strategies to ensure that GitOps does not hinder emergency fixes; a simple drift check, like the one sketched below, can help catch out-of-band changes afterward.
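On the secret-management point, here is a small sketch of the "keep secrets out of Git entirely" approach, reading a value from HashiCorp Vault's KV v2 engine via the hvac Python client. The Vault address, mount point, and secret path are placeholder assumptions; a Sealed Secrets-style setup would instead commit encrypted blobs to Git and decrypt them in-cluster.

```python
"""Fetch a secret from Vault at deploy time instead of storing it in Git (sketch)."""
import os

import hvac  # pip install hvac

client = hvac.Client(
    url=os.environ.get("VAULT_ADDR", "https://vault.example.internal:8200"),
    token=os.environ["VAULT_TOKEN"],  # injected by the pipeline, never committed
)

# Read the key-value pairs stored at secret/data/orders/database (KV v2 engine).
response = client.secrets.kv.v2.read_secret_version(
    path="orders/database", mount_point="secret"
)
db_password = response["data"]["data"]["password"]

# The manifests in Git only reference the secret by path; the value itself
# never appears in the repository or its history.
```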
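And on the drift point, a scheduled job can compare the live cluster against what the config repo declares. The sketch below assumes git and kubectl are on the PATH and relies on kubectl diff's documented exit codes (0 = no differences, 1 = differences found, anything higher = error); the repo URL and manifest directory are, again, illustrative.

```python
"""Detect configuration drift: live cluster vs. the deployment-config repo (sketch)."""
import subprocess
import sys
import tempfile

CONFIG_REPO = "git@example.com:platform/deploy-config.git"  # hypothetical repo
MANIFEST_DIR = "apps/orders"                                # hypothetical directory

with tempfile.TemporaryDirectory() as workdir:
    # Fetch the desired state from Git.
    subprocess.run(["git", "clone", "--depth", "1", CONFIG_REPO, workdir], check=True)

    # Ask the cluster how the live objects differ from the declared ones.
    result = subprocess.run(
        ["kubectl", "diff", "-f", f"{workdir}/{MANIFEST_DIR}"],
        capture_output=True,
        text=True,
    )

if result.returncode == 0:
    print("No drift: the cluster matches Git.")
elif result.returncode == 1:
    print("Drift detected:\n" + result.stdout)
    sys.exit(1)  # fail the job so someone reconciles or reverts the manual change
else:
    raise RuntimeError(f"kubectl diff failed: {result.stderr}")
```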
Conclusion

"GitOps is the best thing since configuration as code. Git changed how we collaborate, but declarative configuration is the key to dealing with infrastructure at scale, and sets the stage for the next generation of management tools." – Kelsey Hightower

GitOps applies the best practices application developers have learned over years of working with Git to declarative configuration and infrastructure. It produces a versioned, immutable state of the system, with an audit trail, via a tried-and-true system developers are already familiar with. In this article, we walked through some considerations for implementing GitOps in your organization based on team size, composition, and the state of your CI/CD tooling. For some, a monorepo holding everything in one place may work best; for others, granular controls enforced across multiple repos are a requirement. The good news is that GitOps is flexible enough to adapt as your needs change. There is a plethora of GitOps tooling available in the market today. Start with a simple tool and reap the benefits of GitOps as you grow your technology stack and boost developer productivity.

References:

• "5 GitOps Best Practices" by Alex Collins
• "Guide to GitOps," Weaveworks

By Yitaek Hwang CORE

Top DevOps and CI/CD Experts

expert thumbnail

Boris Zaikin

Senior Software Cloud Architect,
Nordcloud GmbH

Certified Software and Cloud Architect Expert who is passionate about building solutions and architectures that solve complex problems and bring value to the business. He has solid experience designing and developing complex solutions on the Azure, Google, and AWS clouds, and expertise in building distributed systems and frameworks based on Kubernetes, Azure Service Fabric, and similar technologies. His solutions run successfully in domains such as green energy, fintech, aerospace, and mixed reality. His areas of interest include enterprise cloud solutions, edge computing, high-load web APIs and applications, multitenant distributed systems, and Internet-of-Things solutions.
expert thumbnail

Pavan Belagatti

Developer Advocate,
Harness

DevOps influencer, tech storyteller, and guest author at various technology publications.
expert thumbnail

Nicolas Giron

Site Reliability Engineer (SRE),
KumoMind

Information Technology professional with 10+ years of expertise in application development, project management, system administration, and team supervision. Co-founder of KumoMind, a company aiming to share his deep expertise in open-source technologies and cloud computing.
expert thumbnail

Alireza Chegini

DevOps Architect / Azure Specialist,
Smartwyre

Alireza is a software engineer with more than 20 years of experience in software development. He started his career as a software developer, and in recent years he has transitioned into DevOps practices. Currently, he is helping companies and organizations move away from traditional development workflows and embrace a DevOps culture. Additionally, Alireza coaches organizations, as an Azure specialist, on their migration journey to the public cloud.

The Latest DevOps and CI/CD Topics

article thumbnail
Unlock the Full Potential of Git
Discover the Top Must-Know Commands with Examples.
March 31, 2023
by Shivam Bharadwaj
· 755 Views · 1 Like
article thumbnail
How to Use Buildpacks to Build Java Containers
This article will look under the hood of buildpacks to see how they operate and give tips on optimizing the default settings to reach better performance outcomes.
March 30, 2023
by Dmitry Chuyko
· 2,281 Views · 1 Like
article thumbnail
What Is Docker Swarm?
Managing containers at scale can be challenging, especially when running large, distributed applications. This is where Docker Swarm comes into play.
March 30, 2023
by Aditya Bhuyan
· 1,787 Views · 2 Likes
article thumbnail
How To Add Estimated Review Time and Context Labels To Pull Requests
The easiest way to cut down your code review time is as simple as letting developers know how long a review will take.
March 30, 2023
by Dan Lines CORE
· 1,534 Views · 2 Likes
article thumbnail
Simplifying Containerization With Docker Run Command
Here, you will learn how to use the docker run command to create and start a Docker container from an image with various configuration options.
March 30, 2023
by Ruchita Varma
· 1,908 Views · 1 Like
article thumbnail
Optimizing Machine Learning Deployment: Tips and Tricks
Machine learning models are only effective when deployed in a production environment; that is where machine learning deployment becomes indispensable.
March 30, 2023
by Chisom Ndukwu
· 2,470 Views · 1 Like
article thumbnail
Tackling the Top 5 Kubernetes Debugging Challenges
Bugs are inevitable and typically occur as a result of an error or oversight. Learn five Kubernetes debugging challenges and how to tackle them.
March 29, 2023
by Edidiong Asikpo
· 2,465 Views · 1 Like
article thumbnail
View the Contents of a Deployed Message Flow in IBM App Connect Enterprise
Three videos explaining how to view the contents of a deployed message flow and how to retrieve and import resources that are deployed to an Integration Server.
March 29, 2023
by Sanjay Nagchowdhury
· 1,536 Views · 1 Like
article thumbnail
Reconciling Java and DevOps with JeKa
This article takes the reader through the JeKa capabilities and how to use a single language for everything from dev to delivery.
March 29, 2023
by Jerome Angibaud
· 2,184 Views · 2 Likes
article thumbnail
Configuring Database Connection Parameters Dynamically In WSO2 EI 7.1.0
Configuring database connection parameters dynamically in WSO2 EI 7.1.0, aimed at integration developers who are looking to integrate databases in WSO2 EI 7.X.
March 29, 2023
by Suman Mohan
· 1,667 Views · 1 Like
article thumbnail
Building a REST API With AWS Gateway and Python
Build a REST API using AWS Gateway and Python with our easy tutorial. Build secure and robust APIs that developers will love to build applications for.
March 29, 2023
by Derric Gilling CORE
· 2,942 Views · 1 Like
article thumbnail
When Should We Move to Microservices?
Avoiding the small monolith antipattern. At what scale do microservices make sense? Avoid a solution worse than the problem, and understand the tradeoffs.
March 28, 2023
by Shai Almog CORE
· 2,354 Views · 2 Likes
article thumbnail
OpenShift Container Platform 3.11 Cost Optimization on Public Cloud Platforms
A developer gives a tutorial on optimizing the performance of OpenShift containers using some shell scripts. Read on to learn more!
March 28, 2023
by Ganesh Bhat
· 9,579 Views · 3 Likes
article thumbnail
Demystifying Multi-Cloud Integration
Explore this in-depth look at multi-cloud integration to discover comprehensive strategies and patterns for integrating cloud systems.
March 28, 2023
by Boris Zaikin CORE
· 2,822 Views · 1 Like
article thumbnail
Automated Performance Testing With ArgoCD and Iter8
In this article, readers will learn about AutoX, which allows users to launch performance experiments on Kubernetes apps, along with code and visuals.
March 28, 2023
by Alan Cha
· 8,468 Views · 4 Likes
article thumbnail
Navigating Progressive Delivery: Feature Flag Debugging Common Challenges and Effective Resolution
Lightrun's Conditional Snapshot and Logs allow developers to create virtual breakpoints dynamically without impacting performance or sacrificing security.
March 27, 2023
by Eran Kinsbruner
· 3,458 Views · 1 Like
article thumbnail
Application Architecture Design Principles
Strategy combines processes, principles, and patterns. It's a daunting task requiring organizational commitment. Learn a coordinated, cross-cutting approach.
March 27, 2023
by Ray Elenteny CORE
· 2,583 Views · 3 Likes
article thumbnail
Orchestration Pattern: Managing Distributed Transactions
Design flexible and scalable systems with careful consideration of error handling and communication between components.
March 27, 2023
by Gaurav Gaur CORE
· 3,063 Views · 2 Likes
article thumbnail
Demystifying the Infrastructure as Code Landscape
Are you confused about the infrastructure as code landscape? In this article, I help classify the IaC tools and give actionable advice on choosing a tool.
March 27, 2023
by Saurabh Dashora CORE
· 2,083 Views · 3 Likes
article thumbnail
The Power of Docker Images: A Comprehensive Guide to Building From Scratch
Here, we’ll explore Docker images, its benefits, the process of building Docker images from scratch, and the best practices for building a Docker image.
March 27, 2023
by Ruchita Varma
· 3,226 Views · 2 Likes
