npmGate — Lessons Learned Again
When 17 lines of code disappeared from NPM, several sites like Facebook and Spotify were at risk. While the issue has been resolved, there are several lessons to be learned from managing a large public open source repository.
No matter what your personal opinion is about who is at fault and what should be done about this, there are definitely a few lessons to be learned. Some will be new to you and old to others, but together they constitute a valuable step forward. If we had followed these ideas, npmGate might never have happened in the first place:
Don’t Fall for Lawyer Threats Immediately
In general, you can assume that people do not understand patent, copyright, or trademark law. This is exactly why companies like kik can threaten with lawyers, even if their claim is most likely unfounded. Both the developer of the kik package and npm, Inc. fell victim to this bullying tactic. A trademarked company name does not give you the sole right to use those letters in any context. The usage would have to be easy to confuse with the trademark. Just ask Apple how they prevent supermarkets from selling an apple. Or Cisco and their usage of IOS for their operating system. And don’t confuse that with iOS. And then check kik.com vs kik.de.
With these complexities on the trademark side, not to mention things like onboarding, user support, and so on, you can probably understand that...
A Public Repository is a Big Responsibility
Sonatype has been running the Central Repository, the largest repository for the Maven and wider Java ecosystem, for years now. We manage billions of downloads, millions of components, and many thousands of contributors. We supply lots of documentation, including a video series, and have strict guidelines and terms of service. Over the years we have learned many lessons and have long enforced things like signatures, minimal metadata, and proof of namespace ownership. You have to be very careful with your actions around managing the repository and always look out for the best interest of the community of users.
One of the aspects that definitely helps a lot is that:
Release Components are Immutable
The users of a repository rely on the fact that any component retrieved with a certain identifier is the same, no matter whether they retrieved it two years ago or yesterday, or will do so in a year’s time. The idea of unpublishing, as is possible in npm, always struck me as a bad idea that goes against this immutability concept. Without immutability you cannot, for example, guarantee that a build of your project running today produces the same output as it did a week ago... it might not even work. Not to mention what happens in a year’s time.
The npm community suffered from an even worse aspect though: a package was actually deleted. This breaks immutability and also opens the door to potential new packages of the same name with completely different characteristics. The npm team is frantically working on plugging that hole now.
And the same immutability principle applies to your own application releases. If the output of your build or release process is different, it should be a new component and e.g. use a new version number. It should never overwrite an existing component, not in the file system and also not in your own in-house repository.
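To make the immutability rule concrete, here is a minimal sketch of a store that refuses to overwrite an existing release. The class name `ImmutableRepository`, its methods, and the component name `my-lib` are hypothetical and only for illustration; this is not how the Central Repository or npm is implemented.

```python
class ImmutableRepository:
    """In-memory stand-in for a release repository (hypothetical sketch)."""

    def __init__(self):
        # Maps (name, version) to the released bytes.
        self._components = {}

    def publish(self, name, version, content):
        key = (name, version)
        if key in self._components:
            # A released identifier is never overwritten.
            raise ValueError(
                f"{name}@{version} already exists; publish a new version instead"
            )
        self._components[key] = bytes(content)

    def fetch(self, name, version):
        # Deliberately no unpublish or delete operation is offered.
        return self._components[(name, version)]


repo = ImmutableRepository()
repo.publish("my-lib", "1.0.0", b"first release")
# A changed build output becomes a new component with a new version number.
repo.publish("my-lib", "1.0.1", b"fixed release")
```

Publishing `my-lib` 1.0.0 a second time raises an error, so anyone who fetched 1.0.0 earlier keeps getting the same bytes.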
Namespace Separation Helps
The npm public registry does not force the usage of namespaces, which npm calls scopes. In the beginning the Central Repository did not use namespaces either, but their usage and enforcement as part of the deployment process and user validation has been a tremendous help for users. Without it, administration and onboarding of new users would not be possible at the scale of the Central Repository. In the case of the Central Repository, the groupId uses a naming convention that relies on reverse domain name patterns. For example, if I want to publish under the groupId com.foo, I have to provide proof that I own the foo.com domain name. Check out more in our video series Easy Publishing to the Central Repository. If I were to run the npm registry, I would force scope usage with a similar mechanism going forward.
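The reverse domain name convention can be sketched in a few lines. The function names and the set of verified domains below are hypothetical, and the sketch assumes the first two groupId segments are the reversed domain, which is a simplification (it ignores domains such as example.co.uk). It does not describe the actual validation process of the Central Repository.

```python
def domain_for_group_id(group_id):
    """Return the domain implied by a reverse-domain groupId.

    Assumption: the first two segments are the reversed domain,
    so 'com.foo.utils' maps to 'foo.com'.
    """
    parts = group_id.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a reverse-domain groupId: {group_id!r}")
    return f"{parts[1]}.{parts[0]}"


def may_publish(group_id, verified_domains):
    """Allow publishing only under a namespace whose domain the user has proven to own."""
    return domain_for_group_id(group_id) in verified_domains


# Hypothetical user who has proven ownership of foo.com.
verified = {"foo.com"}
print(may_publish("com.foo.utils", verified))
print(may_publish("com.bar", verified))
```

The first check succeeds because `com.foo.utils` maps to the verified domain, while `com.bar` is rejected.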
Usage of Libraries Comes at a Cost
Every developer knows that you don’t want to reinvent the wheel and should use libraries instead. We follow the Unix principles and stand on each other’s shoulders to make the creation of today’s complex applications possible at all. After all, you would not tell someone who wants to learn to drive a car to learn how to weld a frame first, so they can build their own car…
However, just like driving a rusty car with bad welding can cause you problems, using components of unknown quality can be disastrous. Developing applications includes a requirement to understand the characteristics of all the involved components. Java developers using Maven have known that for a long time and have access to tools like the dependency hierarchy view in M2Eclipse. Users of Nexus IQ for Eclipse even have access to security and license details for all components.
Other developers, in the npm or Python ecosystems, are not yet that lucky but essentially have the same problem. The good news is that we are working on helping them as well.
First and foremost however, this will need to be driven by a shift in mindset. Your application is not just the code you write… but everything you ship and use as part of your running application. And you are responsible for it all. So try to work towards understanding all the parts inside and be able to generate a full bill of materials.
And don’t tell me it gets easier with Docker. You just have one large container … with lots more stuff inside. You might want to talk to Twistlock to find out what they are up to.
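The bill of materials mentioned above is, at its simplest, a flat list of every component that ends up in your application, direct or transitive. The following is a minimal sketch that assumes the dependency tree is already available as nested dictionaries; the data layout and all component names are made up for illustration, and real tools work from build metadata instead.

```python
def bill_of_materials(component):
    """Flatten a nested dependency tree into a sorted list of unique 'name@version' entries."""
    seen = set()
    stack = [component]
    while stack:
        node = stack.pop()
        seen.add(f"{node['name']}@{node['version']}")
        # Transitive dependencies are part of what you ship, so walk them too.
        stack.extend(node.get("dependencies", []))
    return sorted(seen)


# Hypothetical application with one shared transitive dependency.
app = {
    "name": "my-app",
    "version": "1.0.0",
    "dependencies": [
        {
            "name": "web-framework",
            "version": "2.3.0",
            "dependencies": [{"name": "logging-lib", "version": "1.1.0"}],
        },
        {"name": "logging-lib", "version": "1.1.0"},
    ],
}

for entry in bill_of_materials(app):
    print(entry)
```

The shared `logging-lib` appears only once in the result, and the application itself is listed alongside its dependencies.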
Use a Repository Manager
A repository manager essentially allows you to run your own in-house repositories or registries as well as proxy public ones. Check out more in my white paper Concepts and Benefits of Repository Management. It has long been a well-known best practice in the Maven ecosystem to use a repository manager, and users of Gradle and other tools are slowly waking up to that fact as well.
Beyond that, any tool that relies on public repositories is a potential victim of upstream deletions and outages, and suffers from the network overhead and wasted time of repeated downloads. That includes npm, NuGet, Bower, Docker, and others. Luckily the Nexus Repository Manager OSS is freely available and can help you with all those repositories and technologies.
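The proxying idea can be illustrated with a small sketch: a component is fetched from upstream once, kept locally, and served from the local copy afterwards. The class `CachingProxy`, the dictionary standing in for a public registry, and the component name are all hypothetical; this only demonstrates the concept and is not how Nexus Repository Manager is implemented.

```python
class CachingProxy:
    """Serve components from a local cache and fall back to upstream on a miss (sketch)."""

    def __init__(self, fetch_upstream):
        # fetch_upstream is any callable taking (name, version) and returning bytes.
        self._fetch_upstream = fetch_upstream
        self._cache = {}

    def get(self, name, version):
        key = (name, version)
        if key not in self._cache:
            # Only the first request for a component touches the network.
            self._cache[key] = self._fetch_upstream(name, version)
        return self._cache[key]


# A plain dictionary stands in for the public registry.
upstream = {("my-lib", "1.0.0"): b"release bytes"}


def fetch_from_upstream(name, version):
    # Raises KeyError once the component has been removed upstream.
    return upstream[(name, version)]


proxy = CachingProxy(fetch_from_upstream)
proxy.get("my-lib", "1.0.0")          # first request populates the cache
del upstream[("my-lib", "1.0.0")]     # simulate an upstream deletion
still_there = proxy.get("my-lib", "1.0.0")
```

After the simulated deletion, builds that go through the proxy still receive the component, whereas a direct upstream request would fail.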
Supply Chain Management
At this stage it is undeniable — the software industry is growing up and supply chain management methodologies have become crucial. A mantra like “Use fewer, better components from trusted suppliers” applies to a car manufacturer outsourcing seat belts and other parts. But it also applies to a software developer using database abstraction layers and persistence frameworks, logging, and web frameworks or security and encryption libraries. The software supply chain tools around Nexus IQ Server can help you with the selection and management throughout the development cycle with CI server and IDE integration and lots more.
And Nexus Repository Manager can be your warehouse of components. And it is free and easy to install and run. Don’t wait – grab Nexus Repository Manager OSS 3 today!
So is this it? Are we done, and have we got this all under control? Not by a long shot. npm, Inc. already has plans to improve the situation. And there are a lot of further tools to be created to manage large dependency trees in various ecosystems, as well as to mature their repository formats and processes. And of course, there are trust- and security-related complexities that should be tackled as well. Exciting times!