So, You Want to Be a Tech Visionary: Part I

Here is Part I of an executive guide to data lakes, how they work, and why you want one.

Greg Wood · Aug. 31, 16 · Opinion

You’ve heard it time and time again: cloud is the future. Those who don’t adopt modern Big Data practices will fall behind the pack. The next wave of IT disruption is right around the corner. And yet, at the same time, budgets are shrinking, demand is growing and pressure on the IT organization to show value is at an all-time high. As an executive, you have the full force of your business behind you and more options than ever to achieve both short- and long-term goals with business data. So many options, in fact, that the landscape has become a confusing, often contradictory mess of competing solutions.

Hadoop is one of the most widely adopted next-generation big data frameworks, and it is also one of the most confusing. MapR, Cloudera, or Hortonworks? Flume, Sqoop, Kafka, or NiFi? Spark or MapReduce? Each of the many offerings in the Hadoop ecosystem has its strengths and weaknesses, but many are unrealistically sold as a silver bullet for an array of business problems. Likewise, the data lake architecture, although younger than Hadoop, holds great promise. However, this architecture can also confuse business leaders as it becomes more pervasive in the market. So, where do you start when you need concrete, proven big data solutions?

Some Basics About Hadoop and Data Lakes

You may know nothing about data lakes or Hadoop, you might have heard of them in passing, or you might already be rolling out a pilot. This post is meant to serve equally as an introduction and a reminder of some of the strengths and basic uses of Hadoop and data lakes. To level-set, let me first define Hadoop and data lakes.

Hadoop

Hadoop often refers to the many interrelated big data software products created under the umbrella of the Apache Software Foundation. Hadoop has also come to refer to bundles of these products sold by third-party vendors such as MapR, Cloudera, and Hortonworks, among many others.

Data Lakes

Data lakes are architectures of (usually enterprise-level) data storage, management, and governance. In this architecture, raw data is ingested into the “lake,” where it resides in an unaltered state until it is needed by the organization; it can then be processed, enriched and extracted without losing fidelity or metadata surrounding the raw data.
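
To make the "unaltered until needed" idea concrete, here is a minimal Python sketch, not how any particular data lake product works. It assumes a lake that is just a local directory with hypothetical "raw" and "processed" zones; the function names `ingest_raw` and `process_to_json` and the metadata fields are illustrative assumptions, not part of Hadoop or any vendor API.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(lake_root: Path, source_name: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte in the raw zone, plus a metadata sidecar.

    The directory layout and metadata fields are assumptions for illustration.
    """
    digest = hashlib.sha256(payload).hexdigest()
    raw_dir = lake_root / "raw" / source_name
    raw_dir.mkdir(parents=True, exist_ok=True)

    # The raw file is written exactly as received: no parsing, no cleanup.
    raw_path = raw_dir / f"{digest}.bin"
    raw_path.write_bytes(payload)

    # Metadata lives next to the raw file so the origin stays traceable.
    metadata = {
        "source": source_name,
        "sha256": digest,
        "size_bytes": len(payload),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    (raw_dir / f"{digest}.meta.json").write_text(json.dumps(metadata, indent=2))
    return raw_path


def process_to_json(lake_root: Path, raw_path: Path) -> Path:
    """Derive a structured copy in the processed zone; the raw file is only read."""
    text = raw_path.read_bytes().decode("utf-8")
    rows = [line.split(",") for line in text.splitlines() if line.strip()]

    out_dir = lake_root / "processed" / raw_path.parent.name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}.json"
    out_path.write_text(json.dumps(rows))
    return out_path


if __name__ == "__main__":
    lake = Path(tempfile.mkdtemp())
    raw_file = ingest_raw(lake, "orders", b"id,amount\n1,9.50\n")
    processed_file = process_to_json(lake, raw_file)
    print("raw:", raw_file)
    print("processed:", processed_file)
```

The point of the sketch is the separation: processing writes a new, derived file, while the raw bytes and their metadata stay exactly as they arrived and can be reprocessed later in a different way.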

What Hadoop and Data Lakes Are NOT

Before jumping into what Hadoop and data lakes can do, here’s what they can’t do.

Hadoop Is Not a Drop-In Replacement for Traditional Database Systems

Whatever else Hadoop is, it is not simple. It is radically different from traditional Oracle and IBM implementations and, although it is amazingly powerful, it is not one-size-fits-all. The nuances would take a whole series of posts on their own, but for now, understand that there is a place and function for both Hadoop and an RDBMS in a cutting-edge IT organization, especially when highly transactional processes are common.

Data Lakes Are Not a Wholesale Replacement for Data Warehouse Architectures

As tempting as it may be to rip out all of your EDW architecture and transition to a data lake, this is equivalent to opening all of the floodgates at once before the dam is built. Like gushing water that floods its surroundings, this approach will flood data owners, IT managers, and other end users, and not in a good way. A steady, carefully planned transition may or may not involve completely removing EDWs, even at full implementation. This is why, in a thoughtful data lake architecture such as the one in the diagram below, EDWs may still be present. The EDW portion may be significantly smaller in this architecture than in an EDW-only implementation, but it may never be completely eliminated.

[Diagram: hybrid data lake and EDW architecture (data-lake-edw-hybrid.png)]

Data Lakes and Hadoop Are Not Set-It-and-Forget-It Systems

For very different reasons, Hadoop and a data lake architecture require expert, hands-on management throughout their lifecycles.

  • Hadoop is an ecosystem of Apache-managed open source projects. As such, it is constantly changing, evolving, and shifting. Staying abreast of the most current changes in each project is critical to long-term success.
  • Data lakes, if left unmanaged, can quickly become messy and unmanageable. They lose transparency into the processes and origins of data, and they grow in size and complexity until they are no longer efficient or cost-effective. This is where products such as Zaloni’s Bedrock data lake management platform can be leveraged to automate, manage, and govern the data lake.

Ready to find out the upsides to data lakes? Stay tuned for Part II.


Published at DZone with permission of Greg Wood, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
