The Big Data Analytics Landscape
As a technologist and evangelist working in the big data marketplace, I'm excited by the new products my company brings to market, and by how this functionality helps bridge the gap to enterprise adoption. It's also a bit surreal to see the number of blogs and tweets on Big Data rise. There seems to be a new big data conference every week.
It's interesting to monitor other vendors in the marketplace and how they position their offerings. There is certainly a lot of clever marketing going on (which, I believe, will in time show a lack of substance), and some innovation too. You know who you are...but jump on the bandwagon. Just because you might have Hadoop and your database + analytics within the same rack, that doesn't mean they're integrated.
You see a lot of fervor from people discussing open source products. Those who know me know that I'm a long-time UNIX guru -- over 20 years, from the early BSD distributions, to working with UNIX on mainframes, to even getting Minix to work on PCs before Linux came along. Fun times, but I was young, free, and single, and enjoyed the technical challenge. I was also working in a research department in a university. The argument that Hadoop is free, easy to implement, and will one day replace data warehousing -- well, that doesn't ring true for me. Certainly it has a place and provides value, but it doesn't come at zero cost. Hortonworks and Cloudera provide distributions that reduce the installation, configuration, and management effort, but you now have multiple distributions starting to go in different directions. MapR, for example?
How many enterprises really want to get that involved in running and maintaining this infrastructure? Surely they should be focused on identifying new insights that provide business benefits or give greater competitive advantage. IT has an important role to play, but ultimately it will be the business users who need to leverage the platform to gain these insights.
It is no use gaining insights if you don't take action on them, either.
Insights gained from big data analytics should be fed into your existing EDW (if you have one), so that they enhance what you already have and the EDW gives you a better means of operationalizing the results.
To those people who think Hive is a replacement for SQL, I say: not yet it ain't. It doesn't provide the completeness or performance that a pure SQL engine can provide. You don't replace 30+ years of R&D that quickly...
To the NoSQL folks: this debate takes on religious fervour at times. NoSQL has a role, but I don't see it replacing the relational database overnight either.
In a previous role I managed a complex database environment that included a Big Data platform, for a company in the online gaming marketplace that operated very much 24x7, with limited downtime. It was bleeding edge at times, and growing very fast. If we had had Teradata Aster 5.0 then, my life would have been so much easier. We had an earlier release, but we learned a lot. We proved the value of SQL combined with the MapReduce programming paradigm. We saw the ease of scaling and the reliability. We delivered important insights into various types of fraud and took action on them, which earned kudos for the company and increased player trust, which is very important in an online marketplace. We were also able to leverage the platform for a novel ODS requirement, and had both workloads executing simultaneously along with various ad-hoc queries. I was also lucky, then and since, to meet real visionaries like Mayank and Tasso, which gives you confidence in the approach and the future direction.
When you think of big data analytics, it is not just about multi-structured data or new data sources. Using SQL/MR, for example, may be the most performant way to yield new insights from existing relational data. Also consider what 'grey data' already exists within your organisation; it may be easier to tap into that first, before sourcing new data feeds. The potential business value should drive that decision, though.
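To make the idea of applying a map-and-reduce step to ordinary relational rows a little more concrete, here is a minimal sketch in plain Python. It is not Aster's SQL/MR API and not how any particular product implements it; the rows, the `sessions_per_player` function, and the 30-minute gap are all made up for illustration. It counts sessions per player, the kind of order-dependent question that can be awkward to express in plain SQL.

```python
from collections import defaultdict

# Hypothetical relational rows: (player_id, event_time_in_seconds).
ROWS = [("p1", 0), ("p1", 40), ("p2", 10), ("p1", 2000), ("p2", 30)]


def map_phase(rows):
    """Emit (key, value) pairs keyed by player, as a mapper would."""
    for player_id, timestamp in rows:
        yield player_id, timestamp


def shuffle(pairs):
    """Group values by key, mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, timestamps, gap):
    """Count sessions: a new one starts when the idle time exceeds `gap`."""
    ordered = sorted(timestamps)
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > gap:
            sessions += 1
    return key, sessions


def sessions_per_player(rows, gap=1800):
    """Run map, shuffle, and reduce in sequence over the input rows."""
    grouped = shuffle(map_phase(rows))
    return dict(reduce_phase(key, values, gap) for key, values in grouped.items())


if __name__ == "__main__":
    print(sessions_per_player(ROWS))
```

In a real platform the grouping and distribution of work would be handled by the engine across many nodes; the sketch only shows the shape of the logic on a single machine.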
Don't underestimate the importance of having a discovery platform as you tackle these new Big Data challenges. Yes, you will probably need new people or, even better, to train existing analysts to take on these new skills and grow your own data scientists. The ease of this approach will depend on how feature-rich your discovery platform is: how many built-in, useful analytical functions it provides to get you started before you have to develop specific ones of your own.
Maybe I'm rambling with these comments, but help is at hand! We recently put together a short webinar, about 20 minutes in duration.
The Big Data Analytics Landscape: Trends, Innovations and New Business Value, featuring Gartner Research Vice President Merv Adrian and Teradata Aster Co-President Tasso Argyros. In the video, Merv and Tasso answer these questions and more, including how organizations can find the right solution to make smarter decisions, take calculated risks, and gain deeper insights than their industry peers.
- How do you cost-effectively harness and analyze new big data sources?
- How does the role of a data scientist differ from other analytic professionals?
- What skills does the data scientist need?
- What are the differences between Hadoop, MapReduce, and a Data Discovery Platform?
- How are these new sources of big data and analytic techniques and technology helping organizations find new truths and business opportunities?
What do you think?
Published at DZone with permission of Donal Daly, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.