The Data Lake, or the Scam of the Decade?
What if data lakes were just a scam?
I remember the years 2014-2015. Back then, we were talking about digital transformation, about data as the new oil. Data scientists were the new IT stars. It was the era of the data lake, the promise of a new data promised land. Yes... but also no.
Examining the Utility of Data Lakes
As an architect, when I want to defend a project idea, I show the business case, explain why the project is needed, examine the important use cases, and show which ones will be quick wins. In short, I start from the use cases and then define the project from its IT angle. And with the data lake, I'm doing it the other way around! "Yes, but we'll find the use cases along the way." Except that I know many data lake projects that have yielded nothing operational after two or three years, and no one has any idea what the real use cases of the data lake might be. It's sad, especially when you see customers who have never built a data lake but would like to, and who want to know in advance which use cases would finance the project. Literally tens of millions of dollars or euros have been spent on data lake projects for nothing!
A Bureaucratic Approach
But this is what you see when you're not in it! When you work on these projects, you attend endless meetings with a business project manager, a data lake project manager, and IT architects, where you talk about data modeling.
"But wouldn't Business Object X have this extra field somewhere by chance? Are we really sure that is fresh and of high quality?"
And, meanwhile, the business project manager keeps repeating that they just want a list of customers with 3 or 4 associated fields. Obviously, nobody listens to this request, as top-notch modeling of quality data is essential to know everything about everyone! And then there's me in all this, coughing the word "agility," hoping that someone will react... Because, yes, these projects are often managed in a V cycle, with the near certainty that both budget and deadlines will be completely blown. But we don't have enough Big Data experts, so... everything's fine.
Of course, when you are the star of IT, you naturally do everything better than everyone else. And so you integrate your data from the source systems with tools like Kafka. But importing data from the very old in-house mainframe turned out to be a huge effort.
If you had asked an integration specialist, they would have shown you a simple demo of change data capture (CDC) tools and integrated your data on the fly in less than five minutes. But we don't have enough Big Data experts, so... everything's fine.
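To make that point concrete, here is a minimal sketch of what a CDC setup can look like with Kafka Connect and Debezium, one common open-source CDC stack. The connector name, hostname, credentials, and table list below are illustrative assumptions, not values from any real project:

```json
{
  "name": "mainframe-db2-cdc",
  "config": {
    "connector.class": "io.debezium.connector.db2.Db2Connector",
    "database.hostname": "mainframe.example.com",
    "database.port": "50000",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.dbname": "LEGACYDB",
    "table.include.list": "SALES.CUSTOMERS",
    "topic.prefix": "legacy"
  }
}
```

Posted to the Kafka Connect REST API (`POST /connectors`), a configuration like this streams row-level changes from the source tables into Kafka topics, with no custom import code to write or maintain (assuming CDC is enabled on the source database).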
Take a data scientist, an architect, and a functional expert, put them in a room with an ideation coach, and keep them there all day. If use cases come out of it, then implement them as a top priority! Of course, you will miss a lot of things in the short term, but the innovation we discover, and hope to discover, is always adjacent to what we already know. I'll leave you to savor TEDx talks on the subject of innovation to convince yourself; the best of them probably comes from an Italian scientist who has analyzed the emergence of innovation in a scientific way.
And, of course, if locking four people in a room doesn't work, maybe you should give up?
Opinions expressed by DZone contributors are their own.