What’s New in HDP 2.6 for Enterprise Data Governance and Security? (Part 1)

Hortonworks continues to advance the Hortonworks Data Platform (HDP) as an integrated portfolio of enterprise security and governance products for big data. By building security and data governance into the platform, we ensure that these capabilities are administered consistently across all the components or data engines, and when new engines are added to the platform they inherit the same level of security and governance.

This is why we’ve responded to customer requests for more enterprise capabilities with substantial investment in security and data governance options. Hortonworks recently announced the general availability of HDP 2.6. This post reviews the new features and functionality introduced in Apache Atlas, Apache Ranger, and Apache Knox as part of HDP 2.6.

Data Governance

Hortonworks has been working alongside the Apache community on critical advancements for open metadata and governance via Apache Atlas. The vision for the Apache Atlas project is to provide core metadata-driven governance services for Hadoop and enterprise data ecosystems. Key enterprise metadata and governance features in Atlas include:

  • Data lineage/provenance visualization.
  • Data classification.
  • Metadata catalog and search.
  • Enterprise-ready, real-time metadata and lineage ingestion with Hive, Sqoop, and Storm/Kafka.
  • Extensible APIs for custom metadata ingestion and for registering custom models.
  • Apache Ranger integration for classification-based security.
  • A robust metadata repository with a flexible metamodel to capture technical, business, and operational metadata.
  • Out-of-the-box metadata models for Hive, Storm, Sqoop, HDFS, Kafka, and HBase.

Atlas 0.8.0, included with HDP 2.6, offers the following key enhancements.

Structured Higher-Level API (ATLAS-1223, ATLAS-1241, ATLAS-1234, ATLAS-1308)

Apache Atlas APIs have been re-platformed to v2 and significantly enhanced to make them easier for the community and partners to consume. The new, streamlined API makes it easier for users and partners to build extensions and to accomplish more with a more succinct set of calls. The community has also added Swagger-based API documentation, which will help onboard new users and help developers get up to speed faster on how to use the APIs. The earlier v1 APIs, available in releases up to HDP 2.5 (Apache Atlas 0.7.x), are deprecated as of HDP 2.6, and support for them will be discontinued in a future release.
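To make the v2 API concrete, here is a minimal Python sketch of a helper that prepares requests against the `/api/atlas/v2` path prefix using only the standard library. The host, port 21000, and the `admin`/`admin` basic-auth credentials are assumptions for a local test setup, not values from this article, and the helper names `v2_request` and `send` are hypothetical. Requests are built but not sent, so you can inspect them before pointing them at a real server.

```python
import base64
import json
import urllib.request

# Assumptions (not from the article): an Atlas server reachable at
# localhost on port 21000 and basic-auth credentials admin/admin.
ATLAS_URL = "http://localhost:21000"
V2_PREFIX = "/api/atlas/v2"


def v2_request(path, method="GET", body=None, user="admin",
               password="admin", base_url=ATLAS_URL):
    """Build (without sending) a urllib Request for an Atlas v2 endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + V2_PREFIX + path,
                                 data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example: prepare a call that lists all registered type definitions.
req = v2_request("/types/typedefs")
print(req.get_method(), req.full_url)
# Calling send(req) would perform the request against a running server.
```

Keeping request construction separate from sending makes the calls easy to log or test; in production you would likely also want TLS and Kerberos/SPNEGO rather than basic auth, depending on how your cluster is secured.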

Revamped Search User Experience With Basic/Advanced Search (ATLAS-1630)

The metadata catalog search experience has been streamlined to offer a more performant and efficient search interface. Atlas supports both basic search, which lets users search using a combination of entity type, classification, and name (including wildcard support), and advanced search using the Apache Atlas SQL-like query language (DSL).
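The two search modes map onto two v2 endpoints. The sketch below builds URLs for basic search (`/search/basic` with `typeName`, `classification`, and `query` parameters) and DSL search (`/search/dsl` with a `query` parameter), based on our reading of the Atlas v2 REST API. The server address is an assumption, and the `PII` classification and the `customers` table are hypothetical examples.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: an Atlas server on localhost:21000; adjust for your cluster.
ATLAS_V2 = "http://localhost:21000/api/atlas/v2"


def basic_search_url(type_name=None, classification=None, query=None,
                     limit=25, offset=0, exclude_deleted=True):
    """Basic search: any combination of entity type, classification and name."""
    params = {"limit": limit, "offset": offset,
              "excludeDeletedEntities": str(exclude_deleted).lower()}
    if type_name:
        params["typeName"] = type_name
    if classification:
        params["classification"] = classification
    if query:
        params["query"] = query  # may contain wildcards, e.g. "cust*"
    return f"{ATLAS_V2}/search/basic?{urlencode(params)}"


def dsl_search_url(dsl_query, limit=25, offset=0):
    """Advanced search: pass a query string written in the Atlas DSL."""
    params = {"query": dsl_query, "limit": limit, "offset": offset}
    return f"{ATLAS_V2}/search/dsl?{urlencode(params)}"


# Example: Hive tables tagged PII whose names start with "cust".
basic = basic_search_url(type_name="hive_table", classification="PII",
                         query="cust*")
print(parse_qs(urlparse(basic).query))

# Example: a DSL query that looks up a Hive table by exact name.
print(dsl_search_url('hive_table where name = "customers"'))
```

Basic search suits the common "type plus tag plus name" lookup, while the DSL is the escape hatch for richer filters; check the Atlas DSL documentation for the exact grammar your version supports.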

Separation of Lineage and Impact in Visualization (ATLAS-1667)

In HDP 2.6, the distinction between lineage and impact is shown clearly and visually for data assets. Lineage answers the question of where an entity originated (its source or provenance) and is represented by the upstream path through all data assets and processes leading up to the current data asset. Impact, on the other hand, answers the question of how a specific data asset is being used and which other (derivative or dependent) data assets it affects. Impact is shown via the downstream path through all data assets and processes leading out of the current data asset. Lineage and impact analysis, which are valuable enterprise features for forensic analysis, auditing, and compliance, are now even better in HDP 2.6: in the Lineage and Impact widget on an asset's detail page, lineage is shown with green arrows and impact with red arrows.
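The lineage/impact split is essentially a graph walk in two directions. This sketch assumes a lineage response shaped like Atlas's v2 lineage output (a `baseEntityGuid` plus `relations` with `fromEntityId`/`toEntityId`), fetched from `/lineage/{guid}` with `direction` set to `INPUT`, `OUTPUT`, or `BOTH`. Walking edges backwards from the base entity gives lineage (upstream); walking them forwards gives impact (downstream). The server address is assumed, and the sample GUIDs are readable placeholders rather than real UUIDs.

```python
from collections import defaultdict, deque

# Assumption: an Atlas server on localhost:21000.
ATLAS_V2 = "http://localhost:21000/api/atlas/v2"


def lineage_url(guid, direction="BOTH", depth=3):
    """direction: INPUT (lineage), OUTPUT (impact), or BOTH."""
    return f"{ATLAS_V2}/lineage/{guid}?direction={direction}&depth={depth}"


def _reachable(start, edges):
    """Breadth-first walk over an adjacency map, excluding the start node."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def split_lineage_and_impact(lineage_info):
    """Return (upstream, downstream) GUID sets relative to the base entity."""
    base = lineage_info["baseEntityGuid"]
    forward, backward = defaultdict(list), defaultdict(list)
    for rel in lineage_info.get("relations", []):
        forward[rel["fromEntityId"]].append(rel["toEntityId"])
        backward[rel["toEntityId"]].append(rel["fromEntityId"])
    return _reachable(base, backward), _reachable(base, forward)


# Hypothetical graph: raw_orders -> etl_process -> orders -> report_process -> orders_report
sample = {
    "baseEntityGuid": "orders",
    "relations": [
        {"fromEntityId": "raw_orders", "toEntityId": "etl_process"},
        {"fromEntityId": "etl_process", "toEntityId": "orders"},
        {"fromEntityId": "orders", "toEntityId": "report_process"},
        {"fromEntityId": "report_process", "toEntityId": "orders_report"},
    ],
}
upstream, downstream = split_lineage_and_impact(sample)
print(sorted(upstream), sorted(downstream))
```

Note that both sets include process entities as well as data sets, matching the idea that lineage and impact paths run "through all data assets and processes".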

Classification (Tag)-Based Policy Support for HDFS, Kafka, and HBase (ATLAS-1309)

Building on the classification-based security framework introduced in HDP 2.5 for Hive, the community has extended classification-based security workflow coverage across the ecosystem. Classification policies can now be applied to HDFS, Kafka, and HBase through the integration of Atlas tagging with Apache Ranger’s tag-based policies. This new capability provides unified policy authoring and reduces security administration overhead in large enterprises: policies remain simple to author, yet a single policy framework can be applied uniformly across multiple Hadoop components.
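On the Atlas side, the workflow starts by attaching the same classification to assets from different components. This sketch prepares calls to the v2 endpoint `POST /entity/guid/{guid}/classifications`, whose body is a list of classification objects. The server address is assumed, a `PII` classification type is assumed to exist already, and the GUIDs are hypothetical placeholders. Authoring the tag-based policy in Ranger is configured separately and is not shown.

```python
import json

# Assumptions: an Atlas server on localhost:21000, and a classification
# type named "PII" that has already been defined in Atlas.
ATLAS_V2 = "http://localhost:21000/api/atlas/v2"


def classify_request(entity_guid, *tags):
    """Return (method, url, json_body) that attaches classifications to an entity."""
    url = f"{ATLAS_V2}/entity/guid/{entity_guid}/classifications"
    body = [{"typeName": tag} for tag in tags]
    return "POST", url, json.dumps(body)


# Hypothetical GUIDs for an HDFS path, a Kafka topic, and an HBase table.
assets = ["guid-hdfs-path", "guid-kafka-topic", "guid-hbase-table"]
requests_to_send = [classify_request(guid, "PII") for guid in assets]
for method, url, body in requests_to_send:
    print(method, url, body)
```

Because all three assets now carry the same tag, one tag-based Ranger policy written against `PII` can govern them uniformly, instead of three resource-based policies, one per component.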

Knox SSO for Atlas UI (ATLAS-1244)

HDP 2.5 introduced enterprise-ready SSO capabilities for the Hadoop ecosystem by adding SAML v2-based SSO authentication via Apache Knox for the Apache Ambari and Apache Ranger UIs. In HDP 2.6, this framework has been extended to the Atlas UI, which now also participates in SSO via Apache Knox.

Manually Create/Update Entities to Support HDFS, HBase, Kafka, and Custom Entities (ATLAS-1193)

In HDP 2.6, the community has added the ability to create and update different types of entities in the UI. This feature enables manual addition, metadata maintenance, and curation of data assets, especially those for which built-in connectors or hooks are not yet available. Users can now register or update types (including custom types) with a REST call and then define and manage all of the metadata for entities of that type through a manual, form-based UI in Apache Atlas. Once created, these entities can be classified, and tag-based Ranger policies can be applied to them.
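As a sketch of the "register a type with a REST call" step, the code below builds a type-definition body for `POST /types/typedefs` and an entity body for `POST /entity`, following our understanding of the Atlas v2 type system. Deriving from the built-in `DataSet` type lets the new type take part in lineage. The server address is assumed, and the `ftp_file` type, its `server`/`path` attributes, and the sample values are all hypothetical, standing in for an asset with no built-in hook.

```python
import json

# Assumption: an Atlas server on localhost:21000 (adjust for your cluster).
ATLAS_V2 = "http://localhost:21000/api/atlas/v2"
TYPEDEFS_URL = f"{ATLAS_V2}/types/typedefs"  # POST: register new types
ENTITY_URL = f"{ATLAS_V2}/entity"            # POST: create or update an entity


def custom_type_def(name, attributes, super_types=("DataSet",)):
    """Type-definition body registering one entity type with string attributes."""
    attribute_defs = [
        {"name": attr, "typeName": "string", "isOptional": True,
         "cardinality": "SINGLE", "isUnique": False, "isIndexable": False}
        for attr in attributes
    ]
    return {
        "enumDefs": [], "structDefs": [], "classificationDefs": [],
        "entityDefs": [{"name": name, "superTypes": list(super_types),
                        "attributeDefs": attribute_defs}],
    }


def entity_body(type_name, qualified_name, name, **attrs):
    """Body for creating or updating a single entity of the given type."""
    attributes = {"qualifiedName": qualified_name, "name": name, **attrs}
    return {"entity": {"typeName": type_name, "attributes": attributes}}


# Hypothetical custom type for files landed by an FTP server with no Atlas hook.
type_payload = custom_type_def("ftp_file", ["server", "path"])
file_payload = entity_body("ftp_file", "ftp://files01/in/orders.csv@prod",
                           "orders.csv", server="files01", path="/in/orders.csv")
print("POST", TYPEDEFS_URL, json.dumps(type_payload))
print("POST", ENTITY_URL, json.dumps(file_payload))
```

Once the type is registered, the same entities can be created or edited through the form-based UI instead of `ENTITY_URL`; the REST body is shown here mainly to make the shape of the metadata explicit.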

Getting Started

That’s just a brief overview of the new features. Please check out the following links to learn more about HDP 2.6 data governance features and how to get started:

Published at DZone with permission of Srikanth Venkat, DZone MVB. See the original article here.
