How Big Data Tools and Technology Have Changed — In The Last Year

Four themes: 1) Spark supplanting MapReduce and Hadoop; 2) machine learning and clustered computing coming to the forefront; 3) the cloud enabling larger datasets at ever lower prices; and 4) new tools that make the analysis of large, disparate datasets even faster.

By Tom Smith · Jun. 28, 16 · Analysis

To gather insights for DZone's Big Data Research Guide, scheduled for release in August 2016, we spoke to 15 executives who have created big data solutions for their clients.

Here's who we talked to:

Uri Maoz, Head of U.S. Sales and Marketing, Anodot | Dave McCrory, CTO, Basho | Carl Tsukahara, CMO, Birst | Bob Vaillancourt, Vice President, CFB Strategies | Mikko Jarva, CTO Intelligent Data, Comptel | Sham Mustafa, Co-Founder and CEO, Correlation One | Andrew Brust, Senior Director Marketing Strategy, Datameer | Tarun Thakur, CEO/Co-Founder, Datos IO | Guy Yehiav, CEO, Profitect | Hjalmar Gislason, Vice President of Data, Qlik | Guy Levy-Yurista, Head of Product, Sisense | Girish Pancha, CEO, StreamSets | Ciaran Dynes, Vice President of Products, Talend | Kim Hanmark, Director, Professional Services, TARGIT | Dennis Duckworth, Director of Product Marketing, VoltDB.

We asked these executives, "How have big data tools and technologies changed in the past year?"

Here's what they told us:

  • Massive movement. 1) Look at what Spark has done to Hadoop – it has sucked the energy out of the older frameworks. 2) Google Cloud is bringing new applications from a machine learning and AI perspective; it's about the intelligence layer, not the data infrastructure. 3) The big data ecosystem is evolving, with new tools being added on top of Hadoop. Keep in mind the application developers, who are becoming more aware of big data.
  • We have a big-name data scientist on staff who knows multiple languages and platforms. We see a lot of companies using Apache Spark as their big data platform because it analyzes and hands off datasets more quickly. We are also able to see what three dozen clients are doing, spot what's trending, and create tests that gather information from employees about what knowledge they already have and what they still need to learn. Companies don't know what they don't know, and they don't know the value of the data they already have. The knowledge gap is huge, which is why we use testing to create standards and metrics. We provide value to clients by giving feedback on the positions they are looking to fill and the specific requirements and skill sets for those positions. Managers need to be trained in how to leverage the value of the information collected.
  • 1) Open source – NoSQL drove a lot of this, with Apache projects and clients cobbling together multiple open-source packages to solve a single problem via the SMACK stack, which you can do with our single solution. Glue code and developers are needed to stitch all of the different packages together, and we've lost the knowledge of how to create a solution from scratch. 2) The cloud has become a prominent deployment option for companies that don't want to be in the IT business – operating expense versus capital expense. Healthcare and financial services tend to trail due to privacy regulations, though those industries are leading the way with hybrid cloud solutions.
  • There are a lot of tools and platforms; however, you don’t know which one’s right for your business today, let alone tomorrow. Our smart execution solutions help you identify the solution that’s best for your jobs based on cost-based optimization.
  • Big data and Hadoop used to be the province of data scientists doing large-scale, complicated machine learning work on clustered computing. Last year, more developers made big data stores an interactive resource for the business. We need to make Hadoop accessible to people other than data scientists – from 10 or 20 users to hundreds and thousands of users. This need is driven by economics and by the volume of data, which is projected to increase 47-fold over the next 20 years. There is a greater ability to push ETL down to the Hadoop cluster, better economics made real in the organization, and more tools that make the environment ready for corporate scale.
  • Accelerating innovation with open-source Apache Heron and Beam. New applications continue to emerge focusing on streaming data; we use Kafka for this, and the cloud with AWS and Azure. There is also innovation with record services, MapR systems, and Hortonworks DataFlow.
  • 1) The increase in the number of tools enables scaling out from just a few members. There's a focus on Spark rather than MapReduce, and people are unsure what to invest in – what the go-to solution will be. 2) In-memory databases with Apache and NoSQL. Anything that flows to disk is slower, so we want to keep data in memory where we can look it up more quickly. 3) Machine learning is scaling up. This requires math and computational skills that people acquire through self-education. We use the term "smart" in marketing more; the scale of the data we are getting requires machine learning. There is an influx of smaller vendors doing interesting things while improving the UX – happier times with nicer interfaces.
  • 1) More companies are looking at big data because less technology is needed to access it. Platforms were expensive, but they're now in the cloud, with Dell, AWS, and others supporting business users. 2) Capture, facilitate, and use data on demand, and analyze it. A differentiator will be how much information you can catalyze across the different big data tools.
  • The pace of technology is changing quickly, with closer integration between the different technologies. Microsoft just integrated Spark into its big data application and clearly has a commitment to Apache Spark and Azure. There's strong maneuvering among the big cloud companies to provide the dominant BI solutions – the cloud game and the big data game go hand in hand. Ultimately you need to convert data into insights. While it's early, Microsoft is offering a low-end freemium BI tool, and some of the other players are creating integrated ecosystems.
  • IoT is overused and loosely defined. Use big data for machine learning algorithms. We have a client that puts raw materials into reactors full of sensors; they received a lot of data but were unable to understand the root causes of the changes in their output. Getting data isn't where the value is – you need to get to the root cause and make prescriptive changes where they need to be made. As you generate more data with more accuracy, you need tools at the end that show the actions you need to take.
  • The technical progress has not been revolutionary. The biggest changes have been around the tools we work with, the volume of data, the sources of data, and the types of data. There’s also been a significant change on the business front with companies investing in big data systems and populating their databases over the last six to eight years. Those companies are now looking for insights, value, and ROI. Big data has become “real” in the last two to three years.
  • Tools like Tableau and QlikView were more static. We want to provide granularity on top of big data so customers can get inside machine learning. We enable companies to look at the lower levels of the data from the bottom up and the top down. The tools are scalable and automated.
  • We moved from MapReduce and batch to streaming, real-time data ingestion using Kafka and Spark. There's interest in streaming-style technologies and in different levels of processing, analytics, and post-processing. There's more data, the cost of collection is going down, and the tools are maturing.
  • They’re changing all the time. We’ve moved from MapReduce to Spark and are exploring new tools as they come on the market. The tool we choose depends on the application. Tools become obsolete as well, so we keep a lookout for new ones all the time.
  • Big data has been the hot keyword for the last couple of years, and with more competition it has become cheaper and more efficient in the marketplace. The pricing models to host big data in the cloud have dropped significantly over the last five years, a direct result of competition. Competition drives innovation both in hosting data in the cloud and in reflecting that data visually.
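Several respondents describe the move from MapReduce to Spark, where the key difference is that intermediate data stays in memory instead of being written to disk between stages. As a rough illustration (plain Python, no Hadoop or Spark required; all function names and the sample input are invented for this sketch), here is the classic word count broken into the map, shuffle, and reduce phases those frameworks automate:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word, as a mapper would.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key. Hadoop MapReduce writes this
    # intermediate data to disk between jobs; Spark keeps it in memory,
    # which is the speedup several respondents mention.
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    # Reduce: sum the grouped counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Spark supplants MapReduce", "Spark keeps data in memory"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["spark"])  # 2
```

In Spark's RDD API the same pipeline collapses to roughly `textFile(...).flatMap(str.split).map(lambda w: (w.lower(), 1)).reduceByKey(add)`, with the engine handling the shuffle.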

What are the most significant changes you've observed in big data tools and technology in the past year?

Big data, Data science, Machine learning, IT

Opinions expressed by DZone contributors are their own.
