Big Data and Analytics in 2017
Seven executives share their thoughts on where Big Data and analytics are going in the coming year.
Check out seven executives' points of view about where Big Data and analytics are going to go this year.
Andrew Brust, Senior Director, Market Strategy and Intelligence, Datameer, makes the following points.
Siloed features are going away. Whether it be data prep, data quality, visualization, or even predictive analytics, data companies are making acquisitions or organically adding features to make their products cover the data journey end-to-end. While that trend began this year, it will gain serious momentum in 2017. Look carefully, though, at which products will merely bolt on new features and which were designed and architected for an integrated experience from the start.
Hadoop (YARN and HDFS) is a super powerful distributed computing platform. Next year, an increasing number of companies will start using it as infrastructure for their own offerings, rather than as a directly exposed platform. These companies will treat Hadoop more like an operating system and less like an application, abstracting away Hadoop’s complexity and turning its cluster-based deployment into a mere implementation detail. Customers will get the power of the Hadoop platform and of the execution engines designed for it, without needing expertise in the platform or those engines themselves.
The Growth of Notebooks
Notebook platforms like Jupyter and Zeppelin are already extremely popular with developers, and that popularity will grow in 2017. Notebooks’ ability to fuse together code, data visualizations, and wiki-like text and media content makes them super-versatile. Next year, more conventional enterprise database platforms will be accommodated in notebook environments, and the ubiquity and standardization of notebooks will thus increase.
Python Versus R
The battle between Python and R to be data science’s preferred programming language is raging at full blast and shows no sign of letting up. Microsoft’s integration of R throughout its stack is helping that language immensely, but Python’s applicability to a great number of programming domains is working in its favor. Both languages have gargantuan ecosystems and countless packages and libraries, making them ever more capable. Next year, the competition will spur significant innovation on both sides. While there may not be a single winner, it’s hard to imagine either side really losing.
Enterprise developers and database administrators are increasingly interested in analytics technology and will bring true critical mass to these technologies in 2017. At the same time, analytics tools will become more accessible than ever before to technologists with enterprise developer and database skill sets.
Software-Defined Hadoop Clusters
Spinning up and tearing down Hadoop clusters in an automated fashion will become much more mainstream next year. The combination of scripting frameworks, standards for declarative architecture specifications, and the continuing rise of software container technology will democratize Hadoop provisioning, not just in the context of managed cloud Hadoop services, but also on Infrastructure as a Service cloud platforms and on-premises. Expect to see many more Hadoop clusters spun up expressly for specific, ephemeral workloads, and less use of monolithic Hadoop clusters that run 24/7.
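To make the idea of an ephemeral, declaratively specified cluster concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: ClusterSpec, FakeProvisioner, and ephemeral_cluster are invented names, and the provisioner merely records requests rather than calling any real cloud or Hadoop API. The point is the shape of the workflow: describe the cluster as data, create it for one workload, and guarantee that it is torn down afterward.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterSpec:
    """Declarative description of a cluster (all fields are hypothetical)."""
    name: str
    worker_count: int
    container_image: str


@dataclass
class FakeProvisioner:
    """Stand-in for a real provisioning API; it only records requests."""
    running: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def create(self, spec):
        self.running.add(spec.name)
        self.log.append(f"create {spec.name} workers={spec.worker_count}")

    def destroy(self, spec):
        self.running.discard(spec.name)
        self.log.append(f"destroy {spec.name}")


@contextmanager
def ephemeral_cluster(provisioner, spec):
    """Create a cluster for one workload and always tear it down afterward."""
    provisioner.create(spec)
    try:
        yield spec
    finally:
        # Runs even if the workload raises, so no cluster is left running.
        provisioner.destroy(spec)


def run_nightly_job(provisioner):
    """Run one hypothetical workload on a short-lived cluster."""
    spec = ClusterSpec(
        name="nightly-etl",
        worker_count=4,
        container_image="example/hadoop-worker",  # placeholder image name
    )
    with ephemeral_cluster(provisioner, spec) as cluster:
        provisioner.log.append(f"run job on {cluster.name}")
    return provisioner.log
```

Calling run_nightly_job(FakeProvisioner()) returns a log showing the cluster being created, used, and destroyed in that order. A real setup would replace FakeProvisioner with calls to whatever provisioning service or container orchestrator is in use.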
Art Landro, CEO, Sencha, sees the following.
In 2017, we will produce more data than ever before, creating new challenges around consuming that data to make strategic and tactical decisions.
More data was created in the last two years than in the previous 5,000 years of humanity. In 2017, we will create even more data in one year alone. The type of data created is expanding rapidly across a wide range of industries: biotech, energy, IoT, healthcare, automotive, space and deep sea exploration, cybersecurity, social media, telecom, consumer electronics, manufacturing, gaming, and entertainment, and the list goes on. Yet recent research has found that less than 0.5 percent of that data is actually being analyzed for operational decision making.
The focus in software development will be getting your hands around all that data and being able to use it either strategically, to make important long-term decisions, or in real time, to make operational decisions, because there is no value in the data being created if you can’t use it. To get ahead and stay ahead of the competition, it will be critical for organizations to be able to visualize and analyze the data for better decision making. Check out this article by Dan Gallo, Using Ext JS to Visualize and Interact with IoT Data, about how developers can leverage the power of D3 and IoT devices to build a great-looking dashboard using Ext JS.
Satyen Sangani, CEO, Alation, sees:
Big Data Wanes
Big data will wane as a term. The focus now turns from infrastructure to applications with specific purposes. Companies will look to applications and new business models for concrete value, rather than the more general idea that data can be useful at scale.
Resume Must-Have: "Data Analysis"
The ability to understand data will become the key marker in the emerging divide between the middle class and the managerial class. Where once a college degree might have been a clear divider, now those who continue to favor and profit from investment in analytical skills, science, and technology will see demand for their skills increase.
Venky Ganti, CTO, Alation, shares:
Data Lake Independence
Data lakes have enabled us to push all source data into one place and then try to organize and transform it into analyzable datasets later. This will change with the increasing adoption of technologies (e.g., Presto) that enable analysis of data at the source, without having to pull it all into a common store up front.
Managing the Sprawl
Self-service analytics technologies have put analysis into the hands of more users and, as a byproduct, led to the creation of derivative artifacts: additional datasets and reports, such as Tableau workbooks and Excel spreadsheets. These artifacts have taken on a life of their own. In 2017, we will see a set of technologies begin to emerge to help organize these self-service datasets and manage data sprawl. These technologies will combine automation with support for organic understanding, guided by well-thought-out but broadly applicable policies.
David Crawford, Engineer, Alation, believes:
Data autonomy and management play nice. Autonomy and order have traditionally been opposing forces in data management. In 2017, the tension will be relieved as the end-users of the data gain the ability to self-organize and collaborate, essentially bringing the best of both worlds.
Rick Fitz, SVP of IT Markets, Splunk, sees:
Analytics go mainstream. In 2017, we will see a major focus on analytics, with more IT professionals and engineers relying on emerging technologies like machine learning, automation, and predictive analytics to do higher-level work behind the scenes.
Aaron Kalb, Head of Product, Alation, agrees:
Self-service analytics rises. Self-service analytics will increasingly become a must-have, and analytical organizations will have to demonstrate their value by disintermediating the process of decision-makers accessing data, rather than trying to show they are useful by always being in the middle.
Opinions expressed by DZone contributors are their own.