How Can I Get More Out Of Big Data?

Know what you’re going to do with the information you’re collecting and the insights you want to uncover.

To gather insights on the state of Big Data today, we spoke with 22 executives from 20 companies who are working in Big Data themselves or providing big data solutions to clients. Here’s who we talked to:

  • Nitin Tyagi, Vice President Enterprise Solutions, Cambridge Technology Enterprises
  • Ryan Lippert, Senior Marketing Manager and Sean Anderson, Senior Product Marketing Manager, Cloudera
  • Sanjay Jagad, Senior Manager, Product Marketing, Coho Data
  • Amy Williams, COO, Data Conversion Laboratory (DCL)
  • Andrew Brust, Senior Director Market Strategy and Intelligence, Datameer
  • Eric Haller, Executive Vice President, Experian DataLabs
  • Julie Lockner, Global Product Marketing, Data Platforms, Intersystems
  • Jim Frey, V.P. Strategic Alliances, Kentik
  • Eric Mizell, Vice President Global Engineering, Kinetica
  • Rob Consoli, Chief Revenue Officer, Liaison
  • Dale Kim, Senior Director of Industrial Solutions, MapR
  • Chris Cheney, CTO, MPP Global
  • Amit Satoor, Senior Director, Product and Solution Marketing, SAP
  • Guy Levy-Yurista, Head of Product, Sisense
  • Jon Bock, Vice President of Product and Marketing, Snowflake Computing
  • Bob Brodie, CTO, SUMOHeavy
  • Kim Hanmark, Director of Professional Services EMEA, TARGIT
  • Dennis Duckworth, Director of Product Marketing, VoltDB
  • Alex Gorelik, Founder and CEO and Todd Goldman, CMO, Waterline Data
  • Oliver Robinson, Director and Co-Founder, World Programming

We asked, "How can companies get more out of Big Data?" Here's what they told us:

  • Plan. Don’t integrate a lot of separate pieces, and don’t get bogged down in infrastructure concerns. There’s a wide variety of tools in advanced analytics, so ask what additional things could be done with your data: pattern recognition, anomaly detection, recommendations, and IoT data analysis.
  • Be aware of what your data is and where it’s coming from. Mixing the corporate crown jewels with crap from the internet is not smart. You have trusted and untrusted data; know the difference. Apply the same maturity, security, and provenance standards to all the data going into the data lake to ensure its value.
  • Take a step back to determine the value of cleansed data. Rather than taking an application-centric approach, take a data-centric approach, as Tesla has done. An application-centric approach limits your insights to the applications you are using. Ask what kind of data you want in order to improve, the way Tesla is doing.
  • Know what to do with the information collected – normalize, tag and add in the key data elements for analysis. Streamline work processes. Make the data more accessible to the outside world so it can be monetized or used for training.
  • 1) The real promise of big data is that you have access to a hugely valuable data store, with insights just waiting to be uncovered. Never stop analyzing. 2) Don’t go halfway. Make sure you have budget and resources committed to carry you through the full process of plan, design, build, and optimize. And make sure you have budgeted for training, and maybe some consultants here and there to help you stay on track and get over the nastier hurdles. 3) Don’t build it all yourself. There’s a tendency to say “it’s easy; we can just build it.” Our experience is that this rarely holds true. What ends up happening is you build the basics but never get to the stretch goals, as the real challenges of scaling, stability, and the inevitable wave of enhancement requests set in.
  • Pulling all your data into one place is a limiting, not a liberating, factor. Think about future-proofing your data. Have an ongoing process that seamlessly improves your data through automatic data discovery. Have a platform that tells you which columns to collect and which correlations are occurring.
  • Do projects that can be scaled and extended so more people can use them. Don’t target so narrowly that others cannot benefit from what you are doing. Don’t make the solution to the problem too complex. Step back and identify rational, flexible solutions on a flexible platform.
  • Let people find the data so they can make quick decisions based on it; a lot of data is still hidden in operational systems. There’s too much data to build a catalog manually. Three buckets of value: 1) better, faster decision making; 2) rationalization of the data, meaning understanding and eliminating duplicates; and 3) governance, meaning managing who has access rights to what data.
  • Invest in a big data team with fundamental disciplines like DevOps, database administrators, Linux administrators, and Java developers. Many tools are being wedged into use cases where they are not the best solution for the job. Keep it simple: crawl (6 to 12 months), walk (6 to 18 months), and run. Your speed is determined by your investment in the team and your willingness to tear down silos to share data.
  • Big data needs to become a more reliable product offering. Start small, show value, and grow. Projects often start with a bang, and then people see how daunting the challenge really is. Software stacks need to mature, change, and evolve. Storage is the linchpin. The cloud made compute easy. You need to move data around in a timely manner: fingerprinting, transferring, parsing, and adding value to turn it into tangible consumption models.
  • Invest heavily, and train the team to measure what problem you are solving. Do you have the right data? How are you measuring customer experience (CX)? What data is out there, and what data do you need to invest in? Don't invest in the technology before learning how to use it. Look at 451 Group’s map of databases. We are focusing our priorities on clients who share weblog data and Twitter data to address customers’ needs while they are in the store. That means integrating log-format data with Twitter data and interoperating with advanced analytics systems: connecting the old world with the new world.
  • Identify what you will benefit from. Start small with big data, and bring it in gently. Look at the technology that’s available, start prototyping, and see if it works for you.
  • Banks, retailers, and mobile networks are comfortable with the data network and analytics. We work in the most advanced field of analytics with the leaders in machine learning. Our scientists are PhDs and are at the top of their field. We differentiate on the quality of our scientists and the tools they are using. Because we know our own data, we can identify insights customers can’t tap on their own, such as those in card transaction data.
  • Strategy is first and foremost. Be in the open source ecosystem. Identify specific use cases. Consider machine learning and a 360-degree view of the customer. Create guidance around people and processes. Ensure people have the skills necessary to be successful. Have a process to deliver self-service business intelligence.
  • More use cases and proofs of concept.
  • While companies talk about big data, that doesn’t mean warehousing and analytics. Big data is hyperbole. Have a predictive need. Most data sets aren’t ready for analytics.
  • Understand the taxonomy of the data and what’s most important to the business. We hold sessions with companies to help them identify what they’re looking for without thinking about data. What questions are you trying to answer? Sometimes people have information they don’t realize they have. Ignore the technology and the data, and think about the questions you want to answer.
  • In a dynamic global marketplace, being a real-time business is a necessity. As data volume, velocity, and variety increase, the exploding mobile-enabled population now expects this information to be available at its fingertips, which calls for real time at every level of the business. Managing the daily transactions of your core business processes (for example, finance, sales, and production) in real time is one part of becoming a real-time business. You also need to be able to: 1) capture new data from sources like social media to enable one-to-one customer engagement, or connect directly to machines through sensors to get reliable ground-level information on what’s happening at each moment; 2) analyze all this data using advanced predictive models that enable more relevant decision making; and 3) access real-time business insights on any device for immediate action.
  • Companies are generally making good progress in their effort to become more data driven. A focus on how data has transformed and disrupted certain industries in the past few years has made companies aware that data is no longer a burden but a gift when used right. The key to deriving the value from this data is to venture out into the simple yet valuable use cases that the Data Science community has proven can give a quick ROI. Those are cases like monitoring customer churn, seasonality in product lines, clustering of data, etc. There are common development patterns that can be applied to data that will bring valuable information into existing analytical repositories.
  • Firstly, you don’t need ‘Big Data’ unless you have a lot of data to process or the data is highly unstructured. Often, problems can be solved using more traditional data processing engineering solutions. Many problems people think of as ‘big data’ aren’t really big data problems. However, you might predict that, while you are not necessarily ‘big data’ yet, you may need to scale horizontally and linearly. One great advantage of big data solutions is that the cost of running them broadly scales linearly, whereas traditional techniques can hit a brick wall where costs suddenly go exponential. The business justification for a Big Data project might simply be guaranteeing a linear cost-scaling model.
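
Several of the answers above, on trusted versus untrusted data, on normalizing and tagging what you collect, and on eliminating duplicates, describe steps that can be made concrete. The following is a minimal Python sketch, not any respondent's actual pipeline. It assumes records arrive as plain dictionaries paired with a source name, and `TRUSTED_SOURCES`, `TaggedRecord`, `normalize`, `fingerprint`, and `ingest` are hypothetical names chosen for illustration. It normalizes each record, tags it with its source and trust level, and uses a content hash as a fingerprint to drop duplicates, preferring the trusted copy when the same record arrives from both kinds of source.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical: sources treated as trusted (internal systems of record).
# Anything else, such as scraped web data, is tagged as untrusted.
TRUSTED_SOURCES = {"crm", "billing"}


@dataclass
class TaggedRecord:
    data: dict        # normalized record
    source: str       # where the record came from (provenance)
    trusted: bool     # whether the source is in TRUSTED_SOURCES
    fingerprint: str  # content hash used for duplicate detection


def normalize(record: dict) -> dict:
    """Trim and lowercase keys; trim and lowercase string values.

    Lowercasing every value is an assumption that suits fields such as
    emails or codes; a real pipeline would normalize field by field.
    """
    clean = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip().lower()
        clean[key.strip().lower()] = value
    return clean


def fingerprint(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys, so field order doesn't matter."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ingest(rows):
    """Normalize, tag, and de-duplicate an iterable of (source, record) pairs.

    Keeps one copy per fingerprint, preferring a trusted source over an
    untrusted one when the same record arrives from both.
    """
    by_fp = {}
    for source, record in rows:
        clean = normalize(record)
        fp = fingerprint(clean)
        trusted = source in TRUSTED_SOURCES
        existing = by_fp.get(fp)
        if existing is None or (trusted and not existing.trusted):
            by_fp[fp] = TaggedRecord(clean, source, trusted, fp)
    return list(by_fp.values())


rows = [
    ("web_scrape", {"Email": " Ann@Example.com ", "Plan": "pro"}),
    ("crm", {"email": "ann@example.com", "plan": "PRO"}),
    ("crm", {"email": "bob@example.com", "plan": "free"}),
]
for rec in ingest(rows):
    print(rec.source, rec.trusted, rec.data)
```

In this example the two "Ann" records normalize to the same content, so they collapse into a single record tagged with the trusted `crm` source. Exact-match fingerprinting only catches duplicates that become identical after normalization; near-duplicates would need fuzzier matching.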

How are you getting more out of big data?

Opinions expressed by DZone contributors are their own.
