
Unleash Data-Driven Decision-Making Through Agile Analytics


With the amount of data accessible to developers and data scientists growing rapidly, the need for greater agility in data analytics is picking up steam.


"It is a capital mistake to theorize before one has data." - Arthur Conan Doyle, author of Sherlock Holmes

Despite Arthur Conan Doyle's advice, theorizing to a greater or lesser extent is how the majority of business was conducted until the digital age. Whether you call it gut instinct or business smarts, the ability to spot trends and anticipate demand gives companies an edge over the competition. Now the digital age is taking the guesswork out of the process. Data is redefining decision-making on every front, from operations and engineering activities to research and engagement strategies.

In fact, the data economy is already a multi-billion-dollar industry generating employment for millions, and yet we're only just beginning to tap its potential. It's no accident that digital transformation is on every boardroom agenda. The secret to unlocking future prosperity in almost any business, whether an established enterprise or a digital native, lies in its data.

Big Data Is Big Business

Today, the key to successful business decision-making is data engineering.

An estimated 2.5 quintillion (2.5 × 10^18) bytes of data are generated on the internet every day!

And that figure is growing. So is the desire to put it to good business use. Utilizing vast repositories for storing data, otherwise known as data lakes, is now commonplace. These differ from traditional warehousing solutions in that they aim to present data in as "flat" a structure as possible, rather than in hierarchies of files and sub-folders, and to keep it in its native format. In other words, data lakes are primed for analytics.
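
To make "flat storage, native format" concrete, here is a minimal sketch of what analytics against a data lake can look like, assuming PySpark and a hypothetical S3 path (the bucket, dataset, and column names are illustrative only):

```python
# A minimal sketch, assuming PySpark; the lake path and column names
# below are hypothetical. In a data lake, records land in their native
# format under a flat layout, and analytics tools read them directly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-analytics").getOrCreate()

# Raw JSON events, stored exactly as produced (hypothetical path).
events = spark.read.json("s3://example-lake/raw/clickstream/")

# Schema is inferred when the data is read ("schema-on-read"),
# not imposed when it is written.
events.printSchema()

# Analysis can begin immediately against the raw data.
events.groupBy("page").count().orderBy("count", ascending=False).show(10)
```

Because the schema is inferred on read, new fields arriving from the source simply appear in the next query; nothing has to be re-modeled in a warehouse first.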

Drowning in Data

Data lakes have given rise to the concept of the "enterprise data bazaar," a useful term coined by 451 Research. In the enterprise data bazaar, or marketplace, self-service access to data combines with data governance to produce a powerful platform that enterprises can use to steer the future direction of the business. You can read more in the 451 Research report, Getting Value from the Data Lake.

Data lakes are not without their challenges, however. Gartner predicts that 80 percent are inefficient because their metadata management capabilities are ineffective.
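
Metadata is what tells consumers of the lake what a dataset contains, where it came from, and whether it can be trusted. As a hedged illustration of the minimum a catalog entry might carry (the field names below are hypothetical, not any particular product's schema):

```python
# A minimal, illustrative catalog entry. Field names are hypothetical
# and not tied to any particular metadata-management product.
catalog = [
    {
        "name": "raw/clickstream",
        "format": "json",                    # native format, as landed
        "owner": "web-platform-team",
        "source": "CDN edge logs",           # lineage: where it came from
        "ingested": "daily, 02:00 UTC",
        "sensitivity": "contains user IDs",  # drives governance policy
    },
]

def find_datasets(catalog, keyword):
    """Return entries whose name or source mentions the keyword --
    the discovery step that self-service access depends on."""
    return [entry for entry in catalog
            if keyword in entry["name"] or keyword in entry["source"]]

print(find_datasets(catalog, "clickstream"))
```

Without entries like these, a lake full of native-format files is unsearchable and untrustworthy, which is exactly the inefficiency Gartner describes.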

Data Engineering Puts Disparate Data to Work With Agile Analytics

IDC's Ritu Jyoti spells it out for enterprises, noting, "Data lakes are proving to be a highly useful data management architecture for deriving value in the DX era when deployed appropriately. However, most of the data lake deployments are failing, and organizations need to prioritize the business use case focus along with end-to-end data lake management to realize its full potential."

When we talk to customers, the business drivers for data engineering are clear. Businesses are crying out for quick access to the right data. They need relevant reports, delivered fast. They want to be able to analyze and predict business behaviors, and then take action in an agile fashion. Data growth shows no signs of slowing, and the business insights enterprises will gain are only as good as the data they put in. As data sets grow, enterprises need to be able to quickly and easily add new sources. Finally, efficiency is a consideration since the cost of data systems, as a percentage of IT spend, continues to grow.
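
That last pair of requirements, adding sources quickly while containing cost, is one reason many data engineering teams favor configuration-driven ingestion over per-source pipeline code. A minimal sketch, again assuming PySpark and hypothetical lake paths:

```python
# A minimal sketch of configuration-driven ingestion, assuming PySpark;
# source names and paths are hypothetical. Adding a source is a new
# config entry, not a new pipeline.
from pyspark.sql import SparkSession

SOURCES = [
    {"name": "clickstream", "format": "json",
     "path": "s3://example-lake/raw/clickstream/"},
    {"name": "orders", "format": "csv",
     "path": "s3://example-lake/raw/orders/"},
]

spark = SparkSession.builder.appName("ingest").getOrCreate()

for src in SOURCES:
    # The "header" option only matters for CSV; it is ignored elsewhere.
    df = (spark.read.format(src["format"])
          .option("header", "true")
          .load(src["path"]))
    # Land a curated, columnar copy for fast downstream analytics.
    df.write.mode("overwrite").parquet(
        f"s3://example-lake/curated/{src['name']}/")
```

Onboarding a new source then becomes a one-entry configuration change, which is the kind of agility the business drivers above demand.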

Extracting business value from these vast data volumes requires a rock-solid business strategy, a tried-and-tested approach, and deep technical and sector expertise. We have broken this down into four key phases for big data deployments:

  1. Assess and Qualify: First, the focus is on understanding the nature of the organization's data, formulating its big data strategies, and building the business case.

  2. Design: Next, big data workloads and solution architecture need to be assessed and defined according to the individual needs of the organization.

  3. Develop and Operationalize: Develop the technical approach for deploying and managing big data on-premises or in the cloud. The approach should take into account governance, security, privacy, risk, and accountability requirements.

  4. Maintain and Support: Big data deployments are like well-oiled engines, and they need to be maintained, integrated, and operationalized with additional data, infrastructure, and the latest techniques from the fields of analytics, AI, and ML.
