We recently hosted another episode of our Continuous Discussions (#c9d9) podcast, in which a panel of experts discussed artifact repositories.
Our expert panel included: Baruch Sadogursky, J*, G* and public speaking geek; Eduardo Piairo, DBAdmin and DevOps; Robert Los, one of the co-founders of BrainStam; Philip Lombardi, Senior Software Engineer at Datawire.io; and our very own Anders Wallgren and Sam Fell.
During the episode, panelists discussed the benefits of artifact repositories as well as the challenges of enterprise repos in terms of traceability, distributed teams, and security and governance. Read on for what they had to say!
Benefits of Artifact Repositories
There are many benefits to using an artifact repository versus a file system, explains Sadogursky. “I would say that the biggest difference between an artifact repository and having some files thrown somewhere are two of these things: One, the file systems are stupid. We don’t know much about the files that are there and the file systems aren’t very friendly for automation. Two, when you take a binary artifact repository, it is kind of a file system storage, but with those two layers on top, you can have information about the artifacts, you can add to it, you can query it and you can automate around it for your deployment tool.”
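As a rough illustration of the point about metadata and automation, here is a minimal Python sketch of an in-memory store that keeps properties alongside each artifact and lets a deployment script query them. The class name `ArtifactStore`, its methods, the artifact names, and the property keys are all hypothetical; this is not the API of Artifactory or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    version: str
    content: bytes
    properties: dict = field(default_factory=dict)


class ArtifactStore:
    """Hypothetical in-memory store: binary files plus queryable metadata."""

    def __init__(self):
        self._artifacts = []

    def publish(self, name, version, content, **properties):
        artifact = Artifact(name, version, content, dict(properties))
        self._artifacts.append(artifact)
        return artifact

    def annotate(self, name, version, **properties):
        # Add metadata after the fact, e.g. once a test stage has passed.
        for artifact in self._artifacts:
            if (artifact.name, artifact.version) == (name, version):
                artifact.properties.update(properties)

    def query(self, **criteria):
        # Return artifacts whose properties match every given criterion.
        return [
            a for a in self._artifacts
            if all(a.properties.get(k) == v for k, v in criteria.items())
        ]


store = ArtifactStore()
store.publish("billing", "1.4.0", b"fake-binary", build="101", tested=False)
store.publish("billing", "1.5.0", b"fake-binary", build="102", tested=False)
store.annotate("billing", "1.5.0", tested=True)

# A deployment tool could ask only for artifacts that passed testing.
deployable = [(a.name, a.version) for a in store.query(tested=True)]
```

A plain directory of files could hold the same binaries, but the question "which builds passed testing?" would have to be answered by naming conventions rather than by a query.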
Piairo adds, “Having one central repository has helped us establish a communication contract between operations and dev. When you try to figure it out or establish configuration patterns, etc., it is very useful. Patterned communication establishes enforcement.”
Los sees artifact repositories as the heart of Continuous Delivery. “We think that an artifact repository is actually the center of the entire Continuous Delivery environment. We like to separate Continuous Integration from Continuous Deployment and what we like to do is that we build a process whether it’s Java based or Python based or some other base that results in artifacts that are stored on Artifactory.”
Artifact repositories can seriously benefit your development teams, says Lombardi: “My real feeling about artifact repositories is that they’re basically the common interface between developers and developers and then also developers and operations. Without them, developers were left to share source code around with each other. Developers can handle that, but there is no reproducibility there. There is no common way to share things. Really, the benefit comes in operations. You no longer have to have your developers thinking about how this stuff is going to get deployed, your ops team knows.”
Wallgren says that artifact repositories should be used much more. “A lot of us under-utilize artifact repositories. To some degree, they are absolutely necessary, but they are also not quite sufficient and I think we need a lot of other things around it. But, most of us are just not taking advantage of what’s there, which is a shame because there is a high correlation between good outcomes and being disciplined in your artifact manipulation.”
Artifact repositories now have positive effects on both the dev and ops sides, explains Fell. “The number one predictor of operations success is having versioned artifacts in an artifact repository. The idea that developers would be focusing on artifact repositories makes sense because we have been doing it forever. But now it is also on the operations side. To have one spot where you have a known version of something that has been built and tested, and all the energy and effort has gone into making it – that is the value of the repository.”
Challenges with Enterprise Repos: Dependency Management and Traceability
Artifact repositories can be beneficial to large, scalable systems, says Lombardi. “One of the biggest problems of any dependency system and any large scale software project is you end up either with a situation where you have circular dependencies or basically shadowed dependencies where you have two clashing versions of the same dependency for two different pieces of software on your system. How you end up resolving that becomes a tricky problem, but if you have a system base where you can set up your artifact repository to always point at one thing and say this is going to be the one that you resolve out of here, that really helps.”
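The clashing-version situation Lombardi describes can be sketched in a few lines of Python. This is only an illustration under assumptions: the component and library names are made up, and the `pinned` mapping stands in for whatever mechanism a real repository or build tool uses to say "always resolve this dependency to that version".

```python
def find_clashes(requirements):
    """Return dependencies that are requested in more than one version."""
    requested = {}
    for deps in requirements.values():
        for dep, version in deps.items():
            requested.setdefault(dep, set()).add(version)
    return {dep: vs for dep, vs in requested.items() if len(vs) > 1}


def resolve(requirements, pinned):
    """Resolve each dependency to one version; a pin wins over requests."""
    unpinned = sorted(set(find_clashes(requirements)) - set(pinned))
    if unpinned:
        # Refuse to guess: an unpinned clash must be decided by a human.
        raise ValueError(f"unresolved version clashes: {unpinned}")
    resolved = {}
    for deps in requirements.values():
        for dep, version in deps.items():
            resolved[dep] = pinned.get(dep, version)
    return resolved


# Hypothetical components that disagree about the version of "json-lib".
requirements = {
    "orders-service": {"json-lib": "1.2", "http-client": "4.5"},
    "billing-service": {"json-lib": "2.0", "http-client": "4.5"},
}
pinned = {"json-lib": "2.0"}

clashes = find_clashes(requirements)
resolved = resolve(requirements, pinned)
```

The sketch fails loudly when a clash has no pin, rather than silently picking whichever version happens to be seen last. It does not attempt to detect circular dependencies.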
“The challenge of artifact repositories in respect to dependencies is we’re really dealing with a world where the dependency information is stored separately from the artifact repository. As engineers and as developers we need to be a little more disciplined about our artifact management. For example, when you are building something, ask, ‘Why am I declaring this dependency?’ We tend not to do that.”
Sadogursky outlines the challenges with package management. “Package management is a surprisingly hard domain, it doesn’t look that way, it doesn’t feel that way. How hard can it be to grab a couple of files from a server and put them in some known place on your file system? But, for decades people couldn’t get it right. I did a talk about it that I called Dependency Management, Welcome to Hell. We go through all the different packaging types that we learned in JFrog when we added support for them to our artifact repository and binary and how they all suck – some of them in the same way, the others in various bizarre and different ways.”
It’s important to find a good balance, but this is also very challenging, according to Piairo. “The major challenge for me is getting a balance between the fundamental resource definition and the dependency mark. You need flexibility, you need to mount your system or application but you don’t want to put a lot of effort into doing that. Balance is very hard to get and is the main challenge I see.”
Fell stresses the importance of having a version of a service for dependency management. “If I think about the amount of dependency management you need upfront in a monolithic application versus the amount of dependencies you need at the end for microservices-type architectures – that dependency data you are talking about is living somewhere (whether it’s for run-time or compile-time) you still have to have that, that still needs to be held somewhere so when you are breaking your application apart you can manage your run-time dependencies for your end-users. Being able to have a version of some service is extremely important.”
Communication is crucial to ensuring high levels of traceability, says Los. “I think software development is largely about communication. If you have individuals who just start hacking around the program and you’re not discussing things with each other. That is why we have the daily stand up and scrums, we have the two-week discussions within the teams. If you don’t standardize and if you don’t communicate, you will end up with these problems.”
Challenges With Enterprise Repos: Distributed Teams, Apps, and Infrastructure
Los speaks on the benefits of distributed teams. “One of the benefits of distribution is that you can actually cache all your work and have it quickly available, especially the remote repositories. You can have them available fast to your developers and synchronize overnight, that can be a big help I think.”
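The caching idea can be illustrated with a small pull-through cache in Python. This is a sketch under assumptions: `CachingProxy` and `fake_remote` are hypothetical names, the remote call is simulated, and real repository managers also handle expiry, synchronization schedules, and storage, none of which is shown here.

```python
class CachingProxy:
    """Hypothetical local repository that caches artifacts from a remote one."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: path -> bytes
        self._cache = {}
        self.remote_fetches = 0

    def get(self, path):
        if path not in self._cache:
            # Cache miss: go to the remote once, then serve locally.
            self._cache[path] = self._fetch_remote(path)
            self.remote_fetches += 1
        return self._cache[path]


def fake_remote(path):
    # Stand-in for a slow network call to a remote repository.
    return f"contents of {path}".encode()


proxy = CachingProxy(fake_remote)
first = proxy.get("libs/example-lib-1.0.jar")
second = proxy.get("libs/example-lib-1.0.jar")  # served from the local cache
```

Only the first request reaches the remote; every later request for the same path is answered locally, which is where the speed-up for developers comes from.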
Advice per Piairo: be wise. “Dependencies in distribution means bottlenecks. Be wise in choosing your bottleneck. You can choose your CI server or your release server, but whatever you choose – be wise. Have a good CI server because you will need it.”
Lombardi stresses keeping artifacts close to where they are deployed. “Your locality is important. Make sure the stuff is close to whatever you’re deploying on. I have had the wonderful experience of running on Google Container Engine but using Docker.io to host the images and then finding that network connectivity for whatever reason between Google and Docker.io is less than satisfactory and your deployment fails. So, we moved everything into GCR and are super happy about that, because it’s just really fast and we never really have any issues with the link between the two.”
The performance of repositories is particularly important in distributed teams and apps, says Wallgren. “You have to pay attention to the performance of the repositories. Because if you’ve got a thousand nodes that are downloading a particular JAR file at once, or you have a Docker image you have to distribute to a thousand nodes, in order to do that you need to make sure that stuff scales and you are doing the IO properly and using zero-copy. It’s easy to bring a naively implemented repository to its knees.”
Install your repository locally, but make sure it can be synchronized anywhere around the world, suggests Sadogursky. “It’s extremely important for development teams to bring their artifact repository to the team because there is latency, and a slow artifact repository is as annoying as any other repository. You need to do something that takes forever to download, and then productivity, of course, takes a dive. It’s kind of a standard now that in a good productive working environment you have an artifact repository nearby, and the real question is how can you synchronize between all of them, how the team in Europe after they have produced an artifact and go to sleep, the team in the United States that is starting to work will get these artifacts as soon as possible. That’s where it’s important for your artifact repository to be installed locally, but to be able to synchronize in any possible topology and through all the network obstacles like firewalls.”
Fell brings the conversation back to the production side of things. “Never does something have to scale as much as it has to scale when you are releasing something to hundreds of thousands of machines.”
Challenges With Enterprise Repos: Security and Governance
Piairo gives straightforward advice on how to ensure security and governance in repos. “The less you have to manage access the better. If you have one system about security, it will be less problematic. You should use your company’s rules.”
Sadogursky discusses two main security concerns he sees in enterprise repositories. “We have two types of security concerns, first is the access control and the second is the content control. In access control, our need to separate the permission comes from different perspective, first it can be really who should see some stuff but when we’re talking about in-house environments, that’s really less of an issue because usually people are allowed to see and then use packages or artifacts from different repositories and different groups. The other aspect is the content trust, how can we know that what is inside our packages and our artifacts is actually secure?”
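One small, concrete piece of content trust is checking that an artifact's bytes have not changed since it was published. The Python sketch below assumes a SHA-256 digest was recorded at publish time; the function names and sample bytes are hypothetical. Note the limit of this check: a matching digest shows the content is unmodified, not that it is free of vulnerabilities.

```python
import hashlib
import hmac


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(content).hexdigest()


def verify_artifact(content: bytes, expected_sha256: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sha256_hex(content), expected_sha256.lower())


published = b"example artifact bytes"
recorded_digest = sha256_hex(published)  # assumed to be stored at publish time

tampered = published + b" plus injected bits"

intact_ok = verify_artifact(published, recorded_digest)
tampered_ok = verify_artifact(tampered, recorded_digest)
```

Answering the broader question of whether the contents are safe to use would additionally need signing and vulnerability scanning, which are outside this sketch.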
Fell advises making it easy for your developers to know what they can and cannot access. “When it comes down to it, implement stuff that makes it so that people don’t have to do any work. You don’t want to force them to go and put a bunch of directories on a file system and force them to use some weird naming convention. They just want to build code. They just want to know what is safe to deploy.”
Security and governance are especially important in financial services organizations, explains Los. “The first thing that we set up in our company was we connected to an Active Directory or to other kinds of authentication mechanisms so that you have clear control on who can do what and who can do role-based access to your stuff. Especially for financial organizations, it’s important that they have tight control and actually log who is doing what and deploying what and when, because organizations like financial groups need to report that.”
There are a lot of bad practices out there, laments Wallgren. “Most of us don’t proxy artifact repositories into the internet, so aside from wasting bandwidth because everyone is pulling out their own versions of their Spring JARs, you don’t have any governance over what dependencies are being injected into your build systems. There are pretty significant bad practices – it’s not a problem with the tooling, servers or clients – it’s a problem between the keyboard and the chair. We use it out of the box and it’s unencrypted and anyone can inject their own bits into it.”