There were a lot of great activities and sessions at the recent Apache: Big Data North America in Vancouver, B.C. I enjoyed the technical depth of the sessions and meeting others who contribute to projects in the Apache Software Foundation (ASF). The sessions I attended were full of interesting technical content, with engineers sharing the work they were doing. The questions I received at the two sessions I presented on Apache Hive showed that the audience was engaged with the project, how it works, and the changes the community is making. It is always interesting to hear what others are working on and to share your work with people who appreciate and understand it.
From an ODPi perspective, I think the best thing about the conference was that so many people learned who we are and what we do. Many people I talked to had never heard of ODPi, but they were excited when we explained our goals and what we have done so far. And for many who had heard of us but were confused or concerned (e.g., Are we competing with the ASF? Are we another distribution? Why do we even need ODPi?), we were able to communicate our mission and answer their questions.
That is why I am excited about ODPi’s announcement at Apache: Big Data North America: ODPi has become a gold sponsor of the ASF. This is a great way for ODPi to extend its contributions to the ASF, which already include features and bug fixes in Apache Hadoop and related projects. ODPi, its member companies, and the ASF all want a stronger Hadoop ecosystem and community, and this is another way that ODPi can help achieve that goal.
The Hadoop ecosystem includes a number of companies distributing Hadoop, as well as many companies and projects providing applications that run on Hadoop. Because each distribution and application makes slightly different assumptions about how the software is installed, which versions of various packages are used, and how those packages are configured, it becomes very difficult, and sometimes impossible, for end users to mix and match software from different projects and companies. For applications that aim to run on multiple distributions, the test matrix grows ever larger, and so does the investment required to produce them. This situation does not help the ecosystem grow and thrive.
A fast-growing and healthy ecosystem benefits everyone involved in it: ASF big data projects, the companies that distribute those projects and sell products in the ecosystem, and end users. Giving end users assurance that the projects they adopt and purchase will work together speeds their adoption. Lowering the investment required of application builders will increase the number of applications written on top of Hadoop. As more end users and application builders enter the ecosystem, it will become more diverse, looking more like the picture below. A more diverse ecosystem will be a healthier ecosystem.
How does ODPi help with this? As I said in my keynote at Apache: Big Data, ODPi provides specifications for the Hadoop runtime and for Hadoop operations. The recently released runtime specification stipulates how compliant distributions should install and configure Apache Hadoop and which environment variables should be set to indicate the location of executables and JAR files. It also prohibits actions such as changing public APIs. We will soon release an operations specification for Apache Ambari that will set similar requirements. These specifications let application builders code and test with confidence, and they will let end users deploy many different applications on their compliant distributions. So let’s work together to accelerate the big data ecosystem!