Keys to Working With Big Data
Read insights from 15 executives who have created big data solutions for clients, on topics ranging from data sources and data integration to innovation.
Here's who we talked to:
Uri Maoz, Head of U.S. Sales and Marketing, Anodot | Dave McCrory, CTO, Basho | Carl Tsukahara, CMO, Birst | Bob Vaillancourt, Vice President, CFB Strategies | Mikko Jarva, CTO Intelligent Data, Comptel | Sham Mustafa, Co-Founder and CEO, Correlation One | Andrew Brust, Senior Director Marketing Strategy, Datameer | Tarun Thakur, CEO/Co-Founder, Datos IO | Guy Yehiav, CEO, Profitect | Hjalmar Gislason, Vice President of Data, Qlik | Guy Levy-Yurista, Head of Product, Sisense | Girish Pancha, CEO, StreamSets | Ciaran Dynes, Vice President of Products, Talend | Kim Hanmark, Director, Professional Services, TARGIT | Dennis Duckworth, Director of Product Marketing, VoltDB.
We asked these executives, "What are the keys to working with big data?"
Here's what they told us:
- There’s a high volume of data to manage.
- The elastic nature of enterprise and e-commerce applications. eBay is using MongoDB to scale out horizontally.
- Resilience: recovery and management of data. Companies cannot afford any downtime.
- Define what you want to get from big data (e.g., algorithms for facial recognition). Financial services firms and investment houses may be building quantitative hedge funds that require analysis of many datasets and provide predictive analytics.
- Transactional data tends to be less voluminous. We partner with Teradata for big data analysis where we’re the fast front end. Clients are able to deploy our product to see when errors may be forthcoming since we’re able to provide fast, real-time analytics.
- The ability to pull data from anywhere, enrich data sets, and use spreadsheet functions with declarative models to model, structure, analyze, and visualize the data.
- As data sets grow, think about where big data is going. It’s difficult, slow, expensive, and rigid to put together large data warehouses. Large credit card companies load data at speed into Hadoop. We have ETL tools for information. The burden is to make it analytically ready to put on a platform. Multiple sources of data are available in a shorter period of time. It is difficult to process the data to get it ready for use in business decisions.
- Innovation in dealing with volume and elasticity. The current challenge is taming data drift: unpredictable changes to data semantics, structure, and infrastructure. Leverage new solutions to manage, watch, and deal with the changes.
- Integration: How many data sources do we need to integrate in the next quarter, six months, nine months? What’s the nature and number of data sources and the type of data within those sources? What are the skills needed to complete the project? What platform? Real-time streaming use cases currently account for 15 to 20% of projects; most projects are still traditional batch processing, but that will change.
- The big data arena, with its early-mover advantage, is huge and involves mostly structured data. There are a lot of transactions from a lot of sources that can be processed. Clients want to be able to take action on the data by compressing it into something that is actionable.
- Big data has different interpretations and meanings. What sets big data apart is data complexity and the number of disparate data sources being created. We have one client who started with five data sources, now has 18, and will have 30 next year, with IoT creating significant amounts of data.
- A lot of time is spent on integration. Data ingestion is the highest risk. Clients need to ingest their data in days, not weeks or months. We help customers go live faster and take action more quickly. We begin by looking at the gold nuggets first, and then look at the other, less important data later.
- More data coming faster. Systems connect to get the data from Hadoop or SAP HANA, with the amount of data varying based on project needs. Large-scale data collection, ingestion, analysis, and output.
- An IoT deployment or a company may be producing and collecting millions of data points without knowing what to do with them. They try to build a solution but can’t drill down and see the details. Insights are delayed: they take five days versus five minutes, and the delay costs the company millions of dollars.
- Ability to deal with different forms of data: unstructured, semi-structured, and structured. The ability to work with combinations of data is the way to drive the most value from the data.
- Crystallizing predictive analytics. Operationalizing big data in a corporate environment. Making automatically actionable decisions in real time.
- Understanding from the outset what you want to accomplish with the data. It’s important to work with your clients to get a good sense of what they are trying to measure so you can index and structure the data to their needs. Forethought and planning enable us to deliver results efficiently as well as produce visual reports and dashboards that are appealing to the client.
What else do you see as the keys to working with big data?