Waterline Data Brings Automated Data Cataloging to Hortonworks Data Platform through Integration with Apache Atlas

With Rapid Discovery, Governance and Time to Value for All Data Lake Assets, Customers Can Dramatically Accelerate Self-Service Analytics

Waterline Data, The Smart Data Catalog Company, today announced the integration of its Smart Data Catalog with Apache Atlas within Hortonworks Data Platform (HDP). The announcement is being made at Hadoop Summit, held in San Jose, June 28-30.

The Apache Atlas project provides a data governance framework and capabilities for Hadoop that address many compliance requirements. With the addition of Waterline Data’s Smart Data Catalog, Apache Atlas users can replace manual tagging of metadata with an automated process that rapidly classifies the data assets in their data lake, including new data as it is created. Unlike catalogs that scan historical SQL logs, Waterline Data automatically catalogs every field of data in the data lake while capturing and learning from tribal knowledge.

HDP is the industry's only truly secure, enterprise-ready open source Apache Hadoop distribution based on a centralized architecture (YARN). HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust analytics that accelerate decision-making and innovation.

“We are very excited that Waterline Data has integrated their automated smart data cataloging capabilities with Apache Atlas, which brings added value to Waterline and Hortonworks users,” said Matt Morgan, Vice President of Product and Alliance Marketing, Hortonworks. “This helps customers rapidly organize their data lake, enabling more secure, compliant and optimal use of their data through Atlas.”

With this announcement, Waterline Data has earned the Governance Ready badge. Waterline Data had previously earned HDP Certification and YARN integration certification.

This new integration allows customers of both companies to:

  • Accelerate data discovery, governance and time to value through smart data discovery capabilities
  • Provide data engineers, data scientists and business analysts with secure self-service access to trusted, high-quality data for faster understanding and use
  • Automatically update Atlas with all the metadata Waterline uncovers
  • Facilitate data compliance and trust by discovering sensitive data and data lineage

In addition, as part of its integration with Apache Atlas via HDP, Waterline Data will begin importing the data lineage information captured in Apache Atlas.

“No data lake can be opened up without proper data governance,” said Alex Gorelik, CEO of Waterline Data. “If compliance isn’t assured, the data simply isn’t usable. That’s why our new integration with Apache Atlas is so significant. As soon as organizations begin to realize they can replace manual tagging with rapid, automatic cataloging, we expect to see a dramatic rise in the adoption and expanded use of Hadoop.”

Topics:
big data, hadoop, apache atlas, data lake assets, metadata, hortonworks, HDP certification, YARN integration certification
