Are You Ready for the Future of Big Data?
This article is one of a 3-part summary about what the author learned from World of Watson (WoW).
A few weeks ago, I went to the IBM World of Watson conference in Las Vegas, NV. As one of the (roughly) 500 IBM Champions worldwide, I make this trip every year: my yearly migration, my regular “blue shot”, my commitment to the program. You get the gist. But who knows… What happens in Vegas stays in Vegas.
This article is one of a 3-part summary about what I learned from World of Watson (WoW). The articles are independent, so you can read them in any order. In this part, I cover concepts that explore the future of big data, data lakes, governance, and data wrangling. A more personal summary, with a few restaurant suggestions, can be read on my blog (JGP.net). The second part focuses on Informix, its usage, and how active it is in the IoT world.
Governance Is Everywhere
You could argue that this isn’t breaking news, but before this conference, I had the feeling that governance was wishful thinking: a separate project, not integrated into business processes. That is exactly how Vanguard started, more than 10 years ago, with its own DG (data governance) 1.0 tool.
Vanguard is a pioneer in the whole governance process. Like everyone, they started with Excel and other office tools, then switched to a fully integrated process using IBM InfoSphere Information Governance Catalog (IGC).
They now use Open IGC for its extensibility, and thanks to the path they pioneered, they have serious experience in metadata management and governance. As one of their 401(k) customers, I find that reassuring.
Speaking of metadata management, the "M" word is no longer a mystery. Proper governance requires clean and reliable metadata. For this reason, more and more tools now include metadata management features as a standard part of how they operate.
And it’s a good feeling to see that I am not the only person working on the subject.
This is more than a long-term vision: IBM demonstrated some of these concepts in its new Data Connect product.
Open IGC supports extensibility (hence its “Open” prefix). If you look closely, you won’t see Bedrock (more on that in a future post).
The industry increasingly needs this kind of integration: automatic metadata management is becoming key to governance.
Data Lenses and the Future of Machine Learning
Last week, Jane came to Carrboro High School wearing a green top with a pink skirt. Paul, who likes Jane, complimented her on her choice of clothes, but Julie, who witnessed the scene, thought he was making fun of her bestie, who’s going out with Philipp. Julie, who has a little crush on Paul (who only has eyes for Jane), reported the incident to Philipp. Of course, Philipp did not like that, and they went behind the gym to settle the issue.
So, from the students’ point of view, this was a normal high school drama, while for the administrative staff it was bullying, even though Philipp and Paul settled their differences with a game of “Magic: The Gathering”.
This was the example MIT Media Lab director Joichi Ito used to explain the concept of data lenses. Of course, this noble institution is not spending all this energy solving high school dramas, but rather on enhancing the output of analytics and, more precisely, on giving traditional Machine Learning techniques a sense of “job experience”.
So what’s a typical use case? Imagine an experienced cop (not starting any debate here). He has knowledge built on experience: he sees clues where we would see nothing, and he knows where to look for evidence. The idea of data lenses is to build a prism through which the machine sees data differently.
Joi Ito reminds us that we do not all have a PhD.
What does it mean concretely? You tint your Machine Learning model with the experience of the professional.
Cloud, Cognitive, and Analytics
These are the 3 keywords you should remember from World of Watson 2016.
IBM believes in Cloud, which some might say is not really surprising. I strongly believed in Cloud even before it was called Cloud. And really, as keynote speaker Tom Friedman, a three-time Pulitzer Prize winner, put it: “This ain’t no cloud, folks. This is a technological supernova, the explosion of a star. And we know what happens with the explosion of a star — it’s the center of everything”.
It is certainly a high-level view that needs to be drilled down into concrete implementations, but everybody is working in some kind of cloud. My strongest belief is that hybrid clouds will be the predominant architecture for the next 5 years. This means that your software needs to be aware of this and benefit from it. Not going for a shameless Zaloni promotion here, but this is exactly the idea behind Bedrock’s ability to archive data to the cloud when it goes cold.
Cognitive is just about AI. But AI stands less and less for artificial intelligence and more and more for augmented intelligence. Just as augmented reality displays additional information on your screen, augmented (aka extended) intelligence will help you make better decisions.
An example of augmented reality in Pokémon Go: a female Nidoran walking on my desk as I work on metadata architecture.
Thanks to smarter applications that can pre-analyze your data, your analytics will get smarter and more impactful. This brings me to the biggest insight of the conference: just as IBM did with Linux a few years ago (phasing out all their operating systems in favor of Linux), IBM now defines Spark as an Analytics Operating System.
Rob D. Thomas, VP of Product Development, IBM Analytics, and Adam Kocoloski, CTO for Data Services and co-founder of Cloudant, on Spark as an Analytics Operating System.
For me, this is a huge step forward, and it confirms that our choice of Apache Spark as the underlying transformation engine for Bedrock is the way to go. I look forward to embracing even more Spark features in our products (but I can’t share more for now).
Stay tuned. Not exactly everything that happens in Vegas stays in Vegas.
Published at DZone with permission of Jean Georges Perrin, DZone MVB.
Opinions expressed by DZone contributors are their own.