
Waterline Data Brings Automated Data Cataloging to Hortonworks Data Platform through Integration with Apache Atlas


With Rapid Discovery, Governance and Time to Value for All Data Lake Assets, Customers Can Dramatically Accelerate Self-Service Analytics


Waterline Data, The Smart Data Catalog Company, today announces the integration of its Smart Data Catalog with Apache Atlas within Hortonworks Data Platform (HDP). The announcement is being made at Hadoop Summit, which runs June 28-30 in San Jose.

The Apache Atlas project provides a data governance framework and capabilities for Hadoop that address many compliance requirements. With the addition of Waterline Data's Smart Data Catalog, Apache Atlas users can replace manual metadata tagging with an automated process that rapidly classifies the data assets in their data lake, including new data as soon as it is created. Unlike catalogs that rely on scanning historical SQL logs, Waterline Data automatically catalogs every field of data in the data lake while capturing and learning from tribal knowledge.

Hortonworks positions HDP as the industry's only truly secure, enterprise-ready open source Apache Hadoop distribution built on a centralized architecture (YARN). HDP addresses the full range of data-at-rest needs, powers real-time customer applications, and delivers robust analytics that accelerate decision-making and innovation.

“We are very excited that Waterline Data has integrated their automated smart data cataloging capabilities with Apache Atlas, which brings added value to Waterline and Hortonworks users,” said Matt Morgan, Vice President of Product and Alliance Marketing, Hortonworks. “This helps customers rapidly organize their data lake, enabling more secure, compliant and optimal use of their data through Atlas.”

With this announcement, Waterline Data has earned the Governance Ready badge, adding to the HDP Certification and YARN integration certification it had previously earned.

This new integration allows common customers to:

  • Accelerate data discovery, governance and time to value through smart data discovery capabilities
  • Provide data engineers, data scientists and business analysts with secure self-service access to trusted, high-quality data for faster understanding and use
  • Automatically update Atlas with all the metadata Waterline uncovers
  • Facilitate data compliance and trust by discovering sensitive data and data lineage

Furthermore, as part of the company’s integration with Apache Atlas via HDP, Waterline Data will begin importing the data lineage information captured in Apache Atlas.

“No data lake can be opened up without proper data governance,” said Alex Gorelik, CEO of Waterline Data. “If compliance isn’t assured, the data simply isn’t usable. That’s why our new integration with Apache Atlas is so significant. As soon as organizations begin to realize they can replace manual tagging with rapid, automatic cataloging, we expect to see a dramatic rise in the adoption and expanded use of Hadoop.”



Opinions expressed by DZone contributors are their own.

