Why Nexus and not Artifactory? Compliance, Standards, Security, and Quality
We (Sonatype) recently received some support requests from a company making a switch from Artifactory to Nexus. In the evaluation and system design phase, they were setting up Nexus to proxy their internal Artifactory instance and were having some trouble with the integration. Our support staff did some digging and the results were unexpected.
Before
I get into the details, I just want to say that I don't derive much
satisfaction from pointing out problems in Artifactory, and I won't
claim Nexus is perfect either, but we pay very detailed attention to
key areas like stability, performance and most importantly,
interoperability. Frankly, it isn't something I'd like to be spending
my time on, but I've read so much hyperbole from JFrog about how
configuring mirrorOf is "lazy and dirty", and so much trash talk about
Sonatype just being "all talk" that I think it is time to start
answering the criticism.
POM Rewriting and License Compliance
The customer was configuring their system to use the Procurement support in Nexus, and it was choking on validating the signatures of a lot of artifacts coming from their legacy Artifactory system. Upon investigation, we found that Artifactory completely rewrites the POM files, presumably as part of a new feature to strip out repository entries from the POMs. To see for yourself, compare the results of these two URLs:
http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom
and
http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom
Notice first that this POM has no repository element in it, so there is no need to modify the file at all. A closer look reveals that the POM being “proxied” by Artifactory is completely rewritten, with all comments removed and elements reordered. I personally don’t think it’s a good idea to muck around with files being proxied, but it’s probably fine assuming all the parsing is done correctly. It does introduce yet another place for things to go wrong, though. I mean, comments aren’t really that important, are they? Well, if you care about open source licensing, they are. Take a look at this POM from Central:
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>maven</artifactId>
<groupId>org.apache.maven</groupId>
<version>2.0.10</version>
</parent>
Now take a look at the first few lines of the same POM from JFrog's public repository:
<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<parent>
...
The License header of the file has been completely stripped away. I was pretty sure that this might be a violation of the
license itself, so I checked the Apache License at http://www.apache.org/licenses/LICENSE-2.0.
4.2 You must cause any modified files to carry prominent notices stating that You changed the files; and
4.3 You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works
I am not a lawyer, but I interpret this to mean that if you have this option turned on, and you are distributing these POMs to anyone else, you may be violating the license of the artifacts being proxied. This example is for the ASL, a rather liberal license as they go, but as an active participant in the ASF, I can tell you that the organization takes licensing issues very seriously. The fact is that headers from any POM would be dumped, and most licenses out there probably frown upon this. Some of you are going to shrug this off as a minor problem, and maybe it is, but this is the sort of minor issue that will make a legal compliance department go berserk. But stripping licenses off of POMs wasn't really the main issue; it was just something I stumbled on while trying to find a solution to the problem with PGP signatures.
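To make the comment-stripping concrete, here is a minimal Python sketch of how a naive parse-and-reserialize round trip can lose a license header. It is not Artifactory's code; the shortened POM text and the helper names (leading_comments, naive_rewrite) are assumptions made up for illustration. It relies only on the fact that Python's standard ElementTree parser discards comments by default.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for a POM that carries a license header.
ORIGINAL_POM = """<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.
-->
<project>
  <modelVersion>4.0.0</modelVersion>
</project>
"""


def leading_comments(pom_text):
    """Return the body of every XML comment that appears before <project."""
    head = pom_text.split("<project", 1)[0]
    return re.findall(r"<!--(.*?)-->", head, flags=re.DOTALL)


def naive_rewrite(pom_text):
    """Parse the POM and serialize it again, as a careless rewriter might.

    The default ElementTree parser drops comments, so the license header
    does not survive the round trip.
    """
    root = ET.fromstring(pom_text)
    return ET.tostring(root, encoding="unicode")


rewritten = naive_rewrite(ORIGINAL_POM)
print("header before rewrite:", bool(leading_comments(ORIGINAL_POM)))
print("header after rewrite: ", bool(leading_comments(rewritten)))
```

A check like leading_comments could be run against the same POM fetched from the upstream repository and from a proxy to spot headers that went missing along the way.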
POM Rewriting and PGP Signatures
Setting aside the license issue for a moment, let’s go back to the
procurement issue that was reported. Now try getting the signature file
for this artifact so you can validate it hasn’t been tampered with.
The asc file should have a GPG signature that was created with a
publicly accessible key. Click on the following URL on central to see
an example.
http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
Ok so far? (That's my signature, fwiw.) Here’s the crux of the issue.
Click on the same artifact in the Artifactory proxy of Central below:
http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
At the time of writing, I get:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
java.lang.IllegalArgumentException: Checksum type not found for path org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
org.artifactory.engine.DownloadServiceImpl.respondForChecksumRequest(DownloadServiceImpl.java:214)
org.artifactory.engine.DownloadServiceImpl.respond(DownloadServiceImpl.java:176)
org.artifactory.engine.DownloadServiceImpl.process(DownloadServiceImpl.java:122)
sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)
My valid signature that exists on Central can no longer be retrieved through the proxy. I have no doubt they will fix the crash. However, the problem still stands: how can you have a web of
trust that links back to the original developer, when proxies in the
middle are rewriting the artifact and stripping (or regenerating) the
pgp signature? Even if you trust your instance, how can you validate
the signature was correct for the inbound artifact before it was
rewritten? What if you’re proxying from someone else that happens to be
using Artifactory? Did they Trojan you or just unwittingly break the
web of trust?
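The reason a rewrite breaks the signature is that a detached OpenPGP signature covers a hash of the exact bytes the developer signed. The Python sketch below uses a plain SHA-256 comparison as a stand-in for real signature verification, which it is not; the two byte strings and the function names are hypothetical. An actual check would run something like gpg --verify apache-maven-2.0.9.pom.asc apache-maven-2.0.9.pom against the developer's public key.

```python
import hashlib

# Hypothetical bytes as published by the developer (and covered by the .asc).
original = (
    b"<!-- license header -->\n"
    b"<project>\n  <modelVersion>4.0.0</modelVersion>\n</project>\n"
)

# Hypothetical bytes served by a proxy that rewrote the file:
# same model, different bytes.
rewritten = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<project>\n  <modelVersion>4.0.0</modelVersion>\n</project>\n"
)


def digest(data: bytes) -> str:
    """Hex SHA-256 of the data; a signature is made over a digest like this."""
    return hashlib.sha256(data).hexdigest()


def same_bytes_as_signed(published: bytes, served: bytes) -> bool:
    """True only if the proxy served exactly the bytes that were signed."""
    return digest(published) == digest(served)


print(same_bytes_as_signed(original, original))
print(same_bytes_as_signed(original, rewritten))
```

Even a rewrite that keeps the POM semantically identical changes the digest, so the original signature can no longer vouch for what the proxy hands out.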
If you download things from the internet,
validating PGP signatures isn't something you should think about
doing, it is something you need to do. It is the only way
to guarantee that the artifacts from a remote repository are sound, and
Sonatype has invested a great deal of time into making sure that
artifacts added to the Central Maven repositories, the Apache
repositories, and the Codehaus repositories are all accompanied by
valid PGP signatures whose keys are on a public keyserver. In
addition to that, the ASF takes the idea of building a web of trust
very seriously. You shouldn't sign an ASF
release unless you've had your key signed by someone in the ASF's web
of trust at a key signing event (PGP keys are best signed only if you
can verify someone's identity face-to-face). It seems a
shame to throw away all of that work just to "clean" the POM of
repository elements.
Again, JFrog has written publicly that the
only reason this POM rewriting is necessary is because they think that
Maven is broken by design. But, their fix throws
away the web of trust that makes it possible to validate the contents
of a repository using original PGP keys from project
developers. We've considered similar changes in the
past, but because we are responsible for maintaining some of these
source repositories, we are forced to think about the ramifications of
our changes for the community. Building a
repository manager that just "throws out" PGP signatures for POMs seems
to me to be irresponsible when we're starting to gain traction on the
difficult job of making sure that new artifacts added to central have
PGP signatures.
Artifactory Produces Non-standard Indexes
We
also had some reports of odd indexing behavior. The original index
format was a Lucene 2.3 binary file zipped up in a convenient
archive. This created a problem because if you want to
upgrade to a newer version of Lucene, you can no longer produce the
older formatted version. Newer versions of Lucene cannot
generate backwards-compatible binary index files.
Because the community needed to maintain backwards-compatibility for
all older clients, the standard Index that is produced by the major
public repositories is now a new binary layout completely separate from
Lucene. All of this work was done in the Nexus Indexer
project, a separate, open-source project that has been available under
the Eclipse Public License (EPL) and which is already integrated into
all repository managers. This new .gz format is neutral and, in
addition, supports incremental indexes. The
indexes produced by Artifactory are using the old-style Lucene zip, but
with a newer version of Lucene. This means it is non-standard and is
not consumable by all IDE plugins or other index clients.
Another
problem we found was that the indexes presented by the "virtual" repos
(equivalent to Nexus group indexes) serve up only the index of the last
repository in the list. This means in an enterprise you cannot get an
index that contains all artifacts available to you, both internal and
external. While you can certainly use the
Artifactory search interface, the promise of a repository index is that
tools like m2eclipse and other Maven plugins can use this index to
quickly locate artifacts that contain particular classes or quickly
generate a list of versions for a particular artifactId.
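The difference between the two behaviours can be shown with a toy model in Python. This is only a sketch of the idea, not the Nexus Indexer or Artifactory implementation: the repository names, coordinates, and function names are all invented, and a real index holds far more than a set of coordinates.

```python
from typing import Dict, List, Set

# Hypothetical member repositories of one group ("virtual") repository,
# each mapped to the artifact coordinates its own index knows about.
MEMBER_INDEXES: Dict[str, Set[str]] = {
    "internal-releases": {"com.example:billing:1.2", "com.example:core:3.0"},
    "central-proxy": {"org.apache.maven:apache-maven:2.0.10"},
}
GROUP_ORDER: List[str] = ["internal-releases", "central-proxy"]


def last_member_only(order: List[str], indexes: Dict[str, Set[str]]) -> Set[str]:
    """The behaviour described above: only the last repository's index is served."""
    return set(indexes[order[-1]])


def merged_group_index(order: List[str], indexes: Dict[str, Set[str]]) -> Set[str]:
    """What an index client expects: the union of every member's index."""
    merged: Set[str] = set()
    for repo in order:
        merged |= indexes[repo]
    return merged


missing = merged_group_index(GROUP_ORDER, MEMBER_INDEXES) - last_member_only(
    GROUP_ORDER, MEMBER_INDEXES
)
print(sorted(missing))
```

In this toy setup the internal artifacts are exactly what an IDE would fail to find when only the last member's index is published.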
Because
it is important for all repository managers to produce interoperable
repository indexes, we've decided to donate
the Nexus Indexer code to the Apache Software Foundation.
The Nexus index is the standard format for a Maven
repository and is integrated into Archiva, Nexus, and Artifactory, so
it just makes sense that the code that created this index be
moved to an open, transparent community like
Apache. This will increase the visibility of the
Nexus Indexer code for people that actively participate in the Maven
community.
Artifactory Breaks Wagon
Maven and
the Maven Ant Tasks use something called the Wagon to transfer files to
and from a repository. It is the "transport abstraction that is used
in Maven's artifact and repository handling code", and it has providers
for SCP, HTTP, FTP, and file. Any time Maven sees a URL, the Maven
Wagon component handles the transfer. I won't go into the gory
details of this component, but one of the things that a repository
manager needs to do is provide some sort of file list for a
directory. All of the other protocols with Wagon
providers have some way to get a directory listing. The basic subset of HTTP that is supported by all web servers does not
have this command, so the HTTP wagon relies on the repository returning a
list of links to the folder's contents.
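As a rough illustration of what "a list of links" means for a client, the Python sketch below pulls file names out of an HTML directory listing. It is not the Wagon HTTP provider's actual code; the sample listing, the class name, and the filtering rules are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical directory listing, similar in spirit to what a plain
# web server returns for a repository folder.
LISTING = """<html><body>
<h1>Index of /maven2/org/apache/maven/apache-maven/2.0.10/</h1>
<a href="../">../</a>
<a href="apache-maven-2.0.10.pom">apache-maven-2.0.10.pom</a>
<a href="apache-maven-2.0.10.pom.asc">apache-maven-2.0.10.pom.asc</a>
</body></html>"""


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def file_list(html: str) -> List[str]:
    """Return relative links, skipping parent, sort, and absolute links."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        link
        for link in collector.links
        if not link.startswith(("?", "/", "..")) and "://" not in link
    ]


print(file_list(LISTING))
```

A client built this way has nothing to work with if the server answers a folder request with a redirect to a web UI instead of a page of links.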
Instead of returning
such a list of folder contents, Artifactory tries to redirect the
client to the UI. It doesn't return a file list,
and anything in Maven that relies on Wagon's ability to get a file list
will fail. In other words, anything in Maven or any Maven
plugin that uses the wagon.getFileList() interface will break when you are
using Artifactory. You can see it here:
[INFO] Scanning remote file system: http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] apache-maven-2.0.10-bin.tar.bz2
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.bz2.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz
[INFO] apache-maven-2.0.10-bin.tar.gz.asc
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.sha1
[INFO] apache-maven-2.0.10-bin.zip
[INFO] apache-maven-2.0.10-bin.zip.asc
[INFO] apache-maven-2.0.10-bin.zip.asc.md5
[INFO] apache-maven-2.0.10-bin.zip.asc.sha1
[INFO] apache-maven-2.0.10-bin.zip.md5
[INFO] apache-maven-2.0.10-bin.zip.sha1
[INFO] apache-maven-2.0.10-sources.jar
[INFO] apache-maven-2.0.10-sources.jar.asc
[INFO] apache-maven-2.0.10-sources.jar.asc.md5
[INFO] apache-maven-2.0.10-sources.jar.asc.sha1
[INFO] apache-maven-2.0.10-sources.jar.md5
[INFO] apache-maven-2.0.10-sources.jar.sha1
[INFO] apache-maven-2.0.10.jar
[INFO] apache-maven-2.0.10.jar.asc
[INFO] apache-maven-2.0.10.jar.asc.md5
[INFO] apache-maven-2.0.10.jar.asc.sha1
[INFO] apache-maven-2.0.10.jar.md5
[INFO] apache-maven-2.0.10.jar.sha1
[INFO] apache-maven-2.0.10.pom
[INFO] apache-maven-2.0.10.pom.asc
[INFO] apache-maven-2.0.10.pom.asc.md5
[INFO] apache-maven-2.0.10.pom.asc.sha1
[INFO] apache-maven-2.0.10.pom.md5
[INFO] apache-maven-2.0.10.pom.sha1
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32 seconds
[INFO] Finished at: Mon Jan 04 17:17:11 EST 2010
[INFO] Final Memory: 8M/47M
[INFO] ------------------------------------------------------------------------
C:\svn\staging-test>mvn validate
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building staging-test
[INFO] task-segment: [validate]
[INFO] ------------------------------------------------------------------------
[INFO] [wagon:list {execution: upload-javadoc}]
[INFO] Scanning remote file system: http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Error handling resource
Embedded error: Error transferring file
Server redirected too many times (20)
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
Summary
Last time I wrote something about Artifactory, the founders of the company came back and called me biased and not objective. Judge for yourself: I've presented some concrete facts in this post.
I have to tell you that the thing that really struck a chord with me and the other engineers at Sonatype was the idea that someone could write a blog post saying that Sonatype is "all talk". It just doesn't make any sense. As a corporation, we've poured resources into the foundational technologies that our competitors use. I spend a great deal of my time working on the Maven project and stopping Denial of Service attacks on Central. I'm on the PMC, and a lot of that time is spent trying to make Maven a better product. A lot of this work involves talking to our competitors about ways to improve Maven and related technologies. To hear someone come at us because we're "all talk" is, frankly, insulting given the hours (no, years) we've put into this open source community.