Big Data Is Not About Access Using Web APIs
Data is widely seen as power, and all the technical elements — and many of the human elements — involved often magically align themselves in service of this power.
I’m neck deep in research around data and APIs right now, and after looking at 37 of the Apache data projects, it is pretty clear that web APIs are not a priority in this world. Some of the projects have web APIs, and there are a couple of projects that look to bridge several of the others with an aggregate or gateway API, but you can tell that the engineers behind the majority of these open-source projects are not concerned with access at this level. Many engineers will counter this point by saying that web APIs can’t handle the volume, and that this shows the concept isn’t applicable in all scenarios. I’m not saying web APIs should be used for the core functionality at scale; I’m saying that web APIs should be present to provide access to the resulting state of the core features for each of these platforms, whatever that is, which is something that web APIs excel at.
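To make that last point concrete, here is a minimal sketch of a read-only web API that exposes the resulting state of some heavy data job without touching the core processing path. It uses only the Python standard library. The job name, the numbers, the `/state` path, and the function names are all hypothetical, chosen purely for illustration; none of them come from any Apache project.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical summary left behind by a large batch or streaming job.
# A real system would read this from wherever the job stores its results.
RESULT_STATE = {
    "job": "daily-aggregation",  # made-up job name
    "status": "complete",
    "records_processed": 1250000,
}


def render_state(path):
    """Return (status_code, body_bytes) for a GET request on the given path."""
    if path == "/state":
        return 200, json.dumps(RESULT_STATE).encode("utf-8")
    return 404, json.dumps({"error": "not found"}).encode("utf-8")


class StateHandler(BaseHTTPRequestHandler):
    """Serves the resulting state as JSON; it never runs the job itself."""

    def do_GET(self):
        status, body = render_state(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the read-only API on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), StateHandler).serve_forever()
```

Calling `serve()` and then requesting `http://127.0.0.1:8080/state` would return the summary as JSON. The volume stays inside the data platform; only the distilled state crosses HTTP, which is the kind of access argued for above.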
From my vantage point, the lack of web APIs isn’t a technical limitation; it is a business and political choice. When it comes to big data, the objectives are always about access, but definitely not the wide-audience access that comes when you use HTTP and the web for API access. The objective is to aggregate, move around, and work with as much data as you possibly can amongst a core group of knowledgeable developers. Then, you distribute awareness, access, and usage to designated parties via distilled analysis, visualizations, or in some cases to other systems where the result can be accessed and put to use. Wide access to this data is not the primary objective, which carries forward much of the power and control we currently see around database-to-API efforts. Big data isn’t about democratization. Big data is about aggregating as much as you can and selling the distilled wisdom that comes from analysis or is derived from machine learning efforts.
I am not saying there is some grand conspiracy here. Wide access just isn’t the objective of big data folks. They have their marching orders, and the technology they develop reflects these marching orders. It reflects the influence money and investment have on the technology, and the ideology that drives how the tech is engineered and how the algorithms handle specific inputs and provide intended outputs. Big data is often sold as data liberation, democratization, and access to your data, building on much of what APIs have done in recent years. However, in the last couple of years, the investment model has shifted. The clients who are purchasing and implementing big data have evolved, and they aren’t your API access type of people. They don’t see wide access to data as a priority. You are either in the club and know how to use the Apache X technology, or you are sanctioned to receive one of the dashboard, analysis, visualization, or machine learning wisdom drips from the big data. Reaching a wide audience is not necessary.
For me, this isn’t some amazing revelation. It is just watching power do what power does in the technology space. We engineers like to think we have control over where technology goes, yet we are just cogs in the larger business wheel. We program the technology to do exactly what we are paid to do. We don’t craft liberating technology or the best performing technology. We assume engineer roles, with paychecks, and bosses who tell us what we should be building. This is how web APIs will fail. This is how web APIs will be rendered yesterday’s technology: not because they fail technically, but because the ideology of the hedge funds, enterprise groups, and surveillance capitalism organizations that are selling to law enforcement and the government will stop funding data systems that require wide access. The engineers will go along with it because it will be real time, evented, complex, and satisfying to engineer in our isolated development environments (IDE). I’ve been doing data since the 1980s, and in my experience, this is how data works. Data is widely seen as power, and all the technical elements and many of the human elements involved often magically align themselves in service of this power, whether they realize they are doing it or not.
Published at DZone with permission of Kin Lane, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.