You Can’t Get Around Code Scanning if You Care About Open Source Licenses
Too often, Open Source Software (OSS) contains code that violates license agreements, and not having the appropriate code scanner may get you into legal trouble.
Today, every developer uses open source software (OSS) in their apps. If you’re developing modern software, you should probably be using a tool to help you track and comply with OSS licenses.
To properly discover what licenses you’re using, there’s no other way than to scan your code. That means checking every line of code across your deep dependencies for license information (ideally per-commit). Sounds like overkill, right? But for the past decade, code scanning has been a standard feature across every commercial tool that helps with OSS compliance. Plus, if you ever go through due diligence, that’s the level of detail auditors will expect and use against you.
At FOSSA, full code scanning was one of the first features we built. For our customers, or any company serious about compliance, a tool without this feature was a non-starter.
Why must you run code scanning if you want to be compliant, and how can you do so without slowing down development?
Code Scanning is How Nearly All Compliance Issues Are Found
To understand why code scanning is important, you first have to consider the immense variety of ways developers share code. The most obvious method is by explicitly including an OSS library (usually by declaring a dependency in a software build or package file). However, in software, it’s commonplace to casually copy files, code snippets, binaries, or entire modules inline without a reliable way of reporting it.
Every time code is casually shared, it passes on a slew of unknown license and copyright responsibilities for every subsequent developer that uses or spreads the code. Today, developers have no easy way to see what’s inside the code they get. As more code is used/written/shared, legal obligations and risks cascade across the community. Even if your developers diligently avoid casual code sharing, they likely rely on code that doesn’t — and if they’re using a modern language/build system, their tools are automatically pulling in thousands of OSS libraries from casual developers.
Code scanning is the only way to cover these cases, and these cases account for the majority of license violations.
When looking for tools to track your open source licenses, there are tons of free scripts and utilities to get a quick report — primarily by checking a single “package file” where developers describe the module and (hopefully) report the dominant license of their code. We call this package file parsing.
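To make that concrete, here is a minimal sketch of package file parsing in Python. It assumes an npm-style `package.json` with a `license` key; the function name `declared_license` and the example manifest are made up for illustration, and real utilities handle many more manifest formats.

```python
import json
from typing import Optional


def declared_license(manifest_text: str) -> Optional[str]:
    """Return the license declared in an npm-style manifest, or None."""
    manifest = json.loads(manifest_text)
    license_field = manifest.get("license")
    # The usual form is a plain string such as "MIT".
    if isinstance(license_field, str) and license_field.strip():
        return license_field.strip()
    # Older manifests sometimes use an object like {"type": "ISC", "url": "..."}.
    if isinstance(license_field, dict):
        return license_field.get("type")
    return None


# Hypothetical manifest for an imaginary package.
example_manifest = '{"name": "left-widget", "version": "1.0.0", "license": "MIT"}'
print(declared_license(example_manifest))  # prints: MIT
```

Note that the whole report comes from one self-declared field: the parser never opens a single source file.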
This data is useful but has serious blind spots, since it accounts for only the most obvious way developers include OSS code. Even if all OSS developers properly licensed their code:
- Package files that omit the license key or use a default (automatically assigned) one will report missing or completely incorrect licenses.
- Package files only express “top-level” licenses for the publisher’s code, and nothing for files, snippets, modules, or license headers included inline.
- Package files do NOT include raw copyright, notices, and other data needed for creating required disclosures, notices, and attributions.
- And much, much more…
These limitations aren’t fringe cases; they account for the bulk of how license SNAFUs enter a codebase. Undesirable code that compromises an entire product rarely comes from explicitly including a bad package. It usually arrives through deeply nested files or embedded sub-dependencies.
That’s why relying on just package file parsing is not only unreliable, it’s dangerous. Most compliance issues don’t come from the obvious stuff, which is why commercial tools must implement code scanning (and are typically the only ones that can afford to — it’s a lot of work to build & maintain!).
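As a rough illustration of the gap, the Python sketch below scans file contents rather than trusting the manifest. It is a deliberately tiny stand-in for a real scanner: it only looks for `SPDX-License-Identifier:` tags, whereas commercial tools also match full license texts, copyright lines, and notices. The function `scan_files`, the file paths, and the file contents are all hypothetical.

```python
import re
from typing import Dict, Set

# Matches tags such as "SPDX-License-Identifier: GPL-2.0-only" inside comments.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-() ]+)")


def scan_files(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each file path to the set of SPDX identifiers found in its text."""
    findings: Dict[str, Set[str]] = {}
    for path, text in files.items():
        found = {match.strip() for match in SPDX_TAG.findall(text)}
        if found:
            findings[path] = found
    return findings


# Hypothetical project: the manifest declares MIT, but a vendored file does not agree.
project_files = {
    "package.json": '{"name": "left-widget", "license": "MIT"}',
    "src/index.js": "// SPDX-License-Identifier: MIT\nmodule.exports = {};\n",
    "vendor/fastmath/core.c": (
        "/* SPDX-License-Identifier: GPL-2.0-only */\n"
        "int add(int a, int b) { return a + b; }\n"
    ),
}

findings = scan_files(project_files)
for path, licenses in sorted(findings.items()):
    print(path, sorted(licenses))
```

In this made-up project the manifest says MIT, yet the scan surfaces a GPL-2.0-only file copied into `vendor/`. That is exactly the kind of inline inclusion a package file never reports.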
Having code scanning is key, and is usually one of the first questions we’re asked when talking to someone procuring/evaluating FOSSA. But don’t just rely on this article; ask your lawyer.
Making Code Scanning Accessible
Code scanning is necessary but also intimidating because it adds a lot more data to manage. If your tool is doing full code scans, it’s doing an immense amount of work for you behind the scenes (on average ~1000x as much compared to package file parsing). As a developer, the last thing I want is to have to review tons of data in order to ship my product.
“Wait what? You want me to hire people to run this tool?” — A sad guy.
Modern developers need to move fast and have high standards for their tools; you can’t implement things that will get in their way. Unfortunately, most code scanning tools weren’t made to be run on a fast and continuous basis — their output is often large spreadsheets of technical data that require immense expertise and manual review. Trying to integrate this with ongoing development just isn’t worth it—it slows down engineers and requires a massive budget/buy-in. But somehow, you need to use that data to run an effective compliance process.
How can you get value from code scanning without creating more work for yourself and your team?
At FOSSA, we spent a significant amount of time figuring out how to make code scans compatible with a fast development workflow. On top of code scanning and package file parsing, we added a set of key features to keep compliance fast and automated:
- Static Analysis — To understand how modules are laid out and used in the code.
- License Inferencing — Analyzing the difference between package, declared, and vendorized licenses (from inline dependencies).
- Automatic Dual/Multi-License Handling — Automated policy approvals if package authors give a choice between different licenses.
- Iterative Scanning and Notifications — Focusing only on incremental changes to the codebase.
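To give a feel for the dual-license case, here is a small Python sketch under stated assumptions. It is not how FOSSA implements the feature. It assumes licenses are written as flat expressions like `MIT OR GPL-3.0-only`, and the approved list and the name `pick_approved_license` are invented for the example.

```python
from typing import FrozenSet, Optional

# Hypothetical company policy: licenses approved without manual review.
APPROVED: FrozenSet[str] = frozenset({"MIT", "Apache-2.0", "BSD-3-Clause"})


def pick_approved_license(
    expression: str, approved: FrozenSet[str] = APPROVED
) -> Optional[str]:
    """Return the first approved choice from a flat 'A OR B' expression.

    Expressions containing AND or WITH are not a free choice, so they
    return None and should go to manual review.
    """
    tokens = expression.replace("(", " ").replace(")", " ").split()
    if "AND" in tokens or "WITH" in tokens:
        return None
    choices = [token for token in tokens if token != "OR"]
    for choice in choices:
        if choice in approved:
            return choice
    return None


print(pick_approved_license("(MIT OR GPL-3.0-only)"))       # prints: MIT
print(pick_approved_license("GPL-2.0-only OR Apache-2.0"))  # prints: Apache-2.0
print(pick_approved_license("GPL-3.0-only"))                # prints: None
```

When the author offers a choice and at least one option is already approved, nobody needs to be interrupted. Everything else falls through to a human.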
These features help us take an immense amount of data from code scanning and only flag what’s relevant, allowing companies to go from scanning code once a quarter to dozens of times per day.
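One simple way to picture scanning only incremental changes is to fingerprint every file and rescan just the ones whose fingerprint moved. The Python sketch below does that with SHA-256 hashes. It is an illustration of the idea, not FOSSA’s actual mechanism, and the function names and file contents are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def files_to_rescan(
    previous: Dict[str, str], files: Dict[str, bytes]
) -> Tuple[List[str], Dict[str, str]]:
    """Return the new or changed paths plus the fingerprints to store for next time."""
    current = fingerprint(files)
    changed = sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
    return changed, current


# First scan: nothing is known yet, so every file is scanned.
first = {"src/app.py": b"print('hi')\n", "vendor/lib.c": b"int x;\n"}
changed_first, state = files_to_rescan({}, first)

# Later commit: one file is patched and one is added.
second = dict(first)
second["vendor/lib.c"] = b"int x; /* patched */\n"
second["vendor/new.c"] = b"int y;\n"
changed_second, state = files_to_rescan(state, second)

print(changed_first)   # prints: ['src/app.py', 'vendor/lib.c']
print(changed_second)  # prints: ['vendor/lib.c', 'vendor/new.c']
```

The expensive work then scales with the size of the change rather than the size of the codebase, which is what makes per-commit scanning realistic.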
All of this is fully configurable and integrated with workflow tools like GitHub, JIRA, Slack, and code review. You can customize every behavior, down to the depth we scan or even the types of files we consider, or choose from a set of standard settings (profiles) that correspond to your risk level. On our most limited profile, we’ll only scan files that *look* like they include license/copyright data.
With all that said, you can always turn off code scanning in FOSSA…
…but we’ve seen that code scanning can work really well with fast and complex development workflows — check out what SmartThings is doing!
Did I Convince You?
The open source community is incredible, and we rely on it every day here at FOSSA. However, it’s also notoriously casual about sharing code and properly reporting/tracking licensing data. It only takes one out of the one-thousand developers whose code you’re using to lack diligence and include a license violation.
If you’re running a compliance tool, just make sure it scans code. And of course, I encourage you to try FOSSA and see if it’s right for you.
Published at DZone with permission of Kevin Wang. See the original article here.
Opinions expressed by DZone contributors are their own.