The Power of JFrog Build Info (Build Metadata), Part I
This article will take a detailed look at the term "build-info", what it is all about, and why it will help us protect against attacks such as the SolarWinds Hack.
What Is the Concept Behind the Term “Build-info”?
Let's start at the very beginning and clarify the basic principle behind the term build-info. The term "build-info" has been in use for many years and was shaped by the company JFrog, among others. It refers to a particular type of repository.
This repository stores the information that describes the context that led to the creation of a binary file. With this information, you can now achieve a wide variety of things.
What Components Make Up Build-info?
The content of a build-info is not strictly defined. Instead, the guiding principle is: the more, the better. Of course, you have to proceed with caution here, too. All available parameters are collected: date and time, the system on which the process was run, which operating system was used at which patch level, active environment variables, compiler switches, and library versions.
The challenge is that you do not know in advance which information will later be helpful. For this reason, more rather than less should be saved.
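To make the list of parameters more tangible, here is a minimal sketch of how such context could be collected by hand. It is not how the JFrog tooling works internally; the function name `collect_build_context` and the allow-list of environment variables are assumptions made for illustration. The allow-list is one possible reading of "proceed with caution": environment variables often contain tokens or passwords that should not end up in stored metadata.

```python
import json
import os
import platform
import socket
from datetime import datetime, timezone


def collect_build_context(env_allowlist=("JAVA_HOME", "MAVEN_OPTS", "PATH")):
    """Gather a small, illustrative subset of the build context.

    Hypothetical helper, not part of any JFrog tool.
    """
    return {
        # Date and time of the build, in UTC.
        "started": datetime.now(timezone.utc).isoformat(),
        # The system on which the process was run.
        "host": socket.gethostname(),
        # Operating system and its version/patch information.
        "os": {
            "system": platform.system(),
            "release": platform.release(),
            "version": platform.version(),
        },
        # Only allow-listed variables are recorded (assumption: everything
        # else might contain secrets).
        "environment": {
            name: os.environ[name] for name in env_allowlist if name in os.environ
        },
    }


if __name__ == "__main__":
    print(json.dumps(collect_build_context(), indent=2))
```

Compiler switches and library versions are left out here because they depend on the build tool in use.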
Why Do We Actually Need a Build-info?
The task of a build-info is to enable the observation, or rather, the analysis of a past situation. There can be a variety of reasons for this. For example, it can be used to improve quality, or it can be the basis for reconstructing a cyber attack that has taken place. And with that, we come straight to the event that got everything rolling in the recent past.
Trigger: SolarWinds Hack
You will have heard or read something about it: the SolarWinds Hack, one of the most significant cyberattacks that has ever taken place. Here it was not the final target that was attacked directly, but a point in the supply chain. SolarWinds is a software company that provides a product for managing network infrastructure. The company has just over 300,000 customers worldwide, and this software's automatic update process was the target of the attack. It was not the update process itself that was compromised, but the creation of the binaries that are distributed through it. The attack took place in the company's CI pipeline, so that the newly created binaries were infected with each build. The pipeline was manipulated so that an additional component was added to the binary being generated. This component can be thought of as a kind of initial charge. As soon as it has been ignited or activated, further components are dynamically loaded. As a result, each infection took a different form. These files were then offered to all customers by means of the automatic update. Thus, over 15,000 systems were infiltrated within a short time.
Reaction: Executive Order of Cybersecurity
Since there were many well-known US companies, US organizations, and US government institutions among the victims, the question arose of how the US could counter such a threat in the future. The US government decided to begin with the complete cataloging of all software components in use, including all their constituent parts. This obligation to provide evidence was formulated in the "Executive Order of Cybersecurity" (formally Executive Order 14028, "Improving the Nation's Cybersecurity"). However, when I first heard about an executive order, I wasn't sure what that actually meant.
What Is an Executive Order?
An executive order is a decree of the U.S. President that regulates or changes internal affairs within the state apparatus. You can think of the U.S. President as the managing director of a company who can influence the company's internal processes and procedures. An executive order cannot change, suspend, restrict, or circumvent any applicable law. However, it can change the internal processes of the U.S. authorities very drastically. And that's exactly what happened with this Executive Order of Cybersecurity, which has directly impacted the U.S. economy. Every company that works directly or indirectly for the state must meet these requirements to continue doing business with the U.S. authorities.
What Is the Key Message of the Executive Order?
The Executive Order of Cybersecurity is a fairly long text. I would like to summarize its content here without guaranteeing legal correctness.
This order aims to record the software operated by or for the U.S. authorities in its entirety. This means that all components of the software used must be documented. This can be thought of as follows: If you want to bake a cake, you need a list of ingredients. However, listing only these direct ingredients would not meet the requirements, as knowledge of all elements down to the last link in the chain is required. In our example, an ingredient that consists of more than one component would have to provide a list of all its parts. This is how you get all the components that are needed for the cake. If we now transfer this to software, it is necessary to record all direct and indirect dependencies. The list of all components is then called an SBOM (Software Bill of Materials).
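The cake analogy can be sketched in a few lines of code. The following example is only an illustration of "direct and indirect dependencies": the dependency graph and the function `list_all_components` are invented for this purpose and do not represent a real SBOM format.

```python
from collections import deque

# Hypothetical dependency graph: component -> its direct dependencies.
DIRECT_DEPENDENCIES = {
    "cake": ["dough", "icing"],
    "dough": ["flour", "butter", "egg"],
    "icing": ["sugar", "butter"],
    "flour": [],
    "butter": [],
    "egg": [],
    "sugar": [],
}


def list_all_components(root, graph):
    """Return every direct and indirect dependency of root, sorted by name."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        component = queue.popleft()
        if component in seen:
            # Shared ingredients (here: butter) are listed only once.
            continue
        seen.add(component)
        # Follow the chain down to the last link.
        queue.extend(graph.get(component, []))
    return sorted(seen)


print(list_all_components("cake", DIRECT_DEPENDENCIES))
# prints ['butter', 'dough', 'egg', 'flour', 'icing', 'sugar']
```

A list containing only "dough" and "icing" would correspond to the direct ingredients. The full result is what the requirement described above asks for.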
And What Does That Mean for Me?
Now, I do not work directly for an agency of the U.S. government. But it will still reach me sooner or later because, indirectly, all possible economic sectors worldwide will be affected. It can also affect me in the private sector. If I have an open-source project that is used indirectly in this environment, it can only continue to be used there if I prepare the project accordingly. Long story short: one way or another, it will come our way.
What Are the General Requirements for a Build-info?
Let's get back to the build-info. A build-info is a superset of an SBOM (Software Bill of Materials). So the dependencies are cataloged, and additional information from the environment in which the binary is generated is recorded. However, there are also some requirements for such a build-info. By this, I mean properties that must be present in order to be able to deal with this information in a meaningful way.
The information that is collected must be quickly and easily accessible and therefore usable. A central storage location such as Artifactory is suitable here, as all binaries and dependencies are also stored there. You then also have all the meta-information of a respective binary file at this location.
When information is collected, it makes sense to store it in such a way that it can no longer be changed. This promotes trust in the data and gives particular certainty when evaluating a case.
The static data can also be enriched with current secondary data. In this case, I am talking about the vulnerabilities that can be found in the dependencies used. In contrast to the static information, this picture has to be constantly updated. In practical use, this means that the data must be stored in a machine-readable manner in order to enable the connection of other data sources. In the case of Artifactory, it is the data from JFrog Xray that is displayed here.
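One simple way to at least detect later changes is to record a cryptographic fingerprint of the data at the time it is stored. The following is a minimal sketch under that assumption. It does not describe how Artifactory protects its data, and the function names `fingerprint` and `is_unchanged` are hypothetical.

```python
import hashlib
import json


def fingerprint(build_info):
    """Return a SHA-256 hex digest over a canonical JSON form of the data."""
    # Sorting keys and fixing separators makes the digest independent of
    # key order and whitespace.
    canonical = json.dumps(build_info, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(build_info, recorded_fingerprint):
    """Check the data against a fingerprint that was recorded earlier."""
    return fingerprint(build_info) == recorded_fingerprint


if __name__ == "__main__":
    info = {"name": "demo-build", "number": "1"}
    recorded = fingerprint(info)

    info["number"] = "2"  # simulate a later modification
    print(is_unchanged(info, recorded))  # prints False
```

The fingerprint only helps if it is kept somewhere the stored data cannot be altered along with it.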
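The idea of connecting static build data with a changing vulnerability source can be sketched as a simple lookup. Everything in this example is invented for illustration: the component identifiers, the advisory IDs, and the shape of the feed. It is not the JFrog Xray API.

```python
# Static part: dependencies recorded at build time (hypothetical data).
BUILD_INFO = {
    "name": "demo-build",
    "number": "1",
    "modules": [
        {
            "id": "org.example:demo-app:1.0.0",
            "dependencies": [
                {"id": "org.example:lib-a:1.2.0"},
                {"id": "org.example:lib-b:3.4.1"},
            ],
        }
    ],
}

# Dynamic part: a feed that changes over time (hypothetical advisory IDs).
VULNERABILITY_FEED = {
    "org.example:lib-b:3.4.1": ["EXAMPLE-ADVISORY-0001"],
}


def enrich_with_vulnerabilities(build_info, vulnerability_feed):
    """Map every recorded dependency to the advisories known right now."""
    report = {}
    for module in build_info.get("modules", []):
        for dependency in module.get("dependencies", []):
            dep_id = dependency["id"]
            # An empty list means: nothing known at the time of the lookup.
            report[dep_id] = list(vulnerability_feed.get(dep_id, []))
    return report


print(enrich_with_vulnerabilities(BUILD_INFO, VULNERABILITY_FEED))
```

Because the build data itself never changes, the same lookup can be repeated whenever the feed is updated, and an old build can be re-evaluated against new findings.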
How Can a Build-info Be Generated?
You need two things: first, the free JFrog CLI to generate the build-info, and second, a place to store this information permanently. You can set up the latter very quickly with the Build-Info Repositories from Artifactory. But one step at a time.
The process for generating a build-info is as follows:
In the following example, we assume that we are going to build a Java project with Maven. Here you have to hook into the build process. You can do that quite easily with the JFrog CLI tools, which act as a wrapper around the build tool in use. In our example, that is Maven. To install the JFrog CLI tools, go to the website https://jfrog.com/getcli/ and choose the installer that suits your system. After installing and configuring the JFrog CLI tools, you can start the build via the CLI on the command line. After a successful build, you can find the build-info as a JSON file in the target directory. This file can be viewed with a text editor to get a feel for the wealth of information that is stored here. The file is regenerated with each build, which means that there is a 1:1 relationship between the build cycle and the binaries generated in the process. Using the CLI, you can then transfer this information to the Artifactory build repository and evaluate it there using a graphical user interface.
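Besides opening the JSON file in a text editor, you can also inspect it with a few lines of code. The sketch below works on an embedded sample instead of a real file. The field names (`name`, `number`, `started`, `modules`, `artifacts`, `dependencies`) are assumptions modeled on the general shape of a build-info file, so compare them with the file your own build produces before relying on them.

```python
import json

# Hypothetical, heavily shortened sample; a real file contains far more.
SAMPLE_BUILD_INFO = """
{
  "name": "demo-build",
  "number": "42",
  "started": "2021-06-01T10:15:30.000+0000",
  "modules": [
    {
      "id": "org.example:demo-app:1.0.0",
      "artifacts": [{"name": "demo-app-1.0.0.jar"}],
      "dependencies": [
        {"id": "org.example:lib-a:1.2.0", "scopes": ["compile"]},
        {"id": "org.example:lib-b:3.4.1", "scopes": ["test"]}
      ]
    }
  ]
}
"""


def summarize(build_info):
    """Condense a build-info document into a few key facts."""
    modules = build_info.get("modules", [])
    return {
        "build": f"{build_info.get('name')}#{build_info.get('number')}",
        "started": build_info.get("started"),
        "artifacts": [
            a["name"] for m in modules for a in m.get("artifacts", [])
        ],
        "dependencies": [
            d["id"] for m in modules for d in m.get("dependencies", [])
        ],
    }


summary = summarize(json.loads(SAMPLE_BUILD_INFO))
print(json.dumps(summary, indent=2))
```

To use this on a real build, replace `json.loads(SAMPLE_BUILD_INFO)` with `json.load` on the opened file from your target directory.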
How Can I Link Vulnerability Scans to the Build-info?
If you now look at the build-info within Artifactory (https://bit.ly/SvenYouTube), you also get the current information regarding the known vulnerabilities. This is an example of how the build-info can be enriched with further system information.
Since there is not enough space at this point to explain the generation and evaluation in detail, I refer to a demo project in which I have documented the steps in the README, including detailed explanations of the individual practical steps.
The URL for the demo project is https://github.com/Java-Workshops/JFrog-FreeTier-JVM.
Later I will add a pure step-by-step tutorial as a follow-up article.
Have fun trying it out. Cheers, Sven!
Published at DZone with permission of Sven Ruppert. See the original article here.