HDF 3.1 Blog Series Part 2: Introducing the NiFi Registry


In this installment of the series, we'll talk about a net new enterprise service we added to HDF: the NiFi Registry, powered by Apache NiFi Registry.


Last week, we announced the GA of HDF 3.1, and to share more details about this milestone release, we started the HDF 3.1 Blog Series. In this installment of the series, we'll talk about a net new enterprise service we added to HDF: the NiFi Registry, powered by Apache NiFi Registry.

Interactive command and control is one of the cornerstones of the Apache NiFi project: NiFi users can manage and edit their production flows in real time. However, for those who prefer to put flow deployment through a software development lifecycle (SDLC) process for quality control, it is not as convenient. NiFi has long supported flow templates to facilitate SDLC use cases, but templates were never designed or optimized for that purpose: there is no easy version control mechanism, they are not user-friendly for sharing between multiple teams, and they provide no handling for sensitive properties. Now, with the NiFi Registry, you get version control, collaboration, and easy deployment, significantly shortening the SDLC process and accelerating flow deployment to achieve faster time to value.

In HDF 3.1, the NiFi Registry facilitates flow migration use cases by managing the whole lifecycle of a versioned flow as it moves from DEV to QA to PROD. The NiFi Registry is designed to be agnostic to the type of persisted artifact. As part of the HDF 3.1 release, flows are the only supported artifact, but processors, extensions, and referenceable datasets could all be persisted in the NiFi Registry down the road. The NiFi Registry can be installed via Ambari or outside of Ambari, depending on the preferred cluster provisioning mechanism.

Let's walk through a quick example to further understand the power of the NiFi Registry, and how exactly it can change the development lifecycle of NiFi flows.

First of all, open the NiFi Registry UI, and you can see a list of available buckets (see Fig.1). The intention behind buckets is flexibility: I can create any number of buckets that map to my desired categorization and grouping within my organization. They could be mapped to different teams, different deployment environments, different usage patterns, or different business units.

Notice that the buckets are not displayed in a hierarchical structure on the landing page; rather, the expectation is that I can view a list of recently added/modified artifacts. Versioned flows are one type of artifact; versioned processors, components, and referenceable datasets could all be possible in a subsequent release.

Fig.1: NiFi Registry UI & the concept of buckets

Now let's go ahead and create a new bucket, demo-bucket, with associated access policies (Fig.2). You can interact with a bucket via either the web browser or the REST API. In HDF 3.1, Ranger integration isn't supported yet, but it is certainly something we may address in a subsequent release. We do, however, support a number of authentication mechanisms: LDAP, Kerberos, certificates, etc.

Fig.2: new bucket and access policies
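For reference, the same bucket can also be created programmatically through the NiFi Registry REST API. The snippet below is only a minimal sketch under a few assumptions: an unsecured registry at the default http://localhost:18080, no authentication, and a placeholder description; adapt the base URL and auth to your own environment.

# Minimal sketch: create a bucket through the NiFi Registry REST API.
# Assumes an unsecured registry at http://localhost:18080 (adjust host/port/auth as needed).
import requests

REGISTRY_API = "http://localhost:18080/nifi-registry-api"  # assumed base URL

def create_bucket(name, description=""):
    """Create a bucket and return its identifier."""
    resp = requests.post(
        f"{REGISTRY_API}/buckets",
        json={"name": name, "description": description},
    )
    resp.raise_for_status()
    return resp.json()["identifier"]

if __name__ == "__main__":
    bucket_id = create_bucket("demo-bucket", "Bucket for the HDF 3.1 demo")
    print(f"Created demo-bucket with id {bucket_id}")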

Now let's have a look at how a NiFi instance can interact with a flow registry. First of all, add a NiFi Registry client under the global controller settings in your NiFi instance (Fig.3):

Fig. 3: NiFi Registry client under global controller settings
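If you prefer to script this step rather than click through the UI, the registry client can also be registered via the NiFi REST API. Treat the following as a hedged sketch: it assumes an unsecured NiFi instance at http://localhost:8080 and that this NiFi release exposes the /controller/registry-clients endpoint; names and URLs are placeholders.

# Minimal sketch: register a NiFi Registry client in NiFi via the REST API.
# Assumes an unsecured NiFi at http://localhost:8080; secured clusters need auth headers.
import requests

NIFI_API = "http://localhost:8080/nifi-api"  # assumed base URL

def add_registry_client(name, registry_url):
    """Add a registry client under Controller Settings and return its id."""
    payload = {
        "revision": {"version": 0},                      # new components start at revision 0
        "component": {"name": name, "uri": registry_url},
    }
    resp = requests.post(f"{NIFI_API}/controller/registry-clients", json=payload)
    resp.raise_for_status()
    return resp.json()["id"]

client_id = add_registry_client("demo-registry", "http://localhost:18080")
print(f"Registry client created: {client_id}")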

Flow version control is defined at the process group level, to be aligned with our PG-level multi-tenant authorization model. Create a simple flow, and choose 'start version control' on the given process group (Fig.4). When you save/update a versioned flow, a version number is automatically assigned (Fig.5).

Fig. 4: start version control for a given process group

Fig.5: save a versioned flow
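Starting version control can also be driven through the NiFi REST API, which is handy for CI/CD pipelines. The sketch below is an assumption-heavy outline rather than a verified recipe: it assumes the /versions/process-groups/{id} endpoint accepts a start-version-control request carrying the process group's current revision plus the target registry and bucket, and every ID is a placeholder.

# Minimal sketch: place a process group under version control via the NiFi REST API.
# All IDs below are placeholders; assumes an unsecured NiFi at http://localhost:8080.
import requests

NIFI_API = "http://localhost:8080/nifi-api"

def start_version_control(pg_id, registry_client_id, bucket_id, flow_name):
    # Fetch the process group first to obtain its current revision.
    pg = requests.get(f"{NIFI_API}/process-groups/{pg_id}").json()
    payload = {
        "processGroupRevision": pg["revision"],
        "versionedFlow": {
            "registryId": registry_client_id,
            "bucketId": bucket_id,
            "flowName": flow_name,
            "comments": "Initial version",
        },
    }
    resp = requests.post(f"{NIFI_API}/versions/process-groups/{pg_id}", json=payload)
    resp.raise_for_status()
    return resp.json()  # version control information, including the assigned version number

info = start_version_control("<pg-id>", "<registry-client-id>", "<bucket-id>", "demo-flow")
print(info["versionControlInformation"]["version"])  # e.g. 1 for the first save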

Once a flow is correctly configured under version control, you can see a green check mark in the upper left corner (Fig.6). The icon will change when you have uncommitted local edits.

Fig.6: green checkmark indicating a NiFi flow is under version control

Now you can go back to the Registry UI and open the change log (Fig.7) for more details. Not only do you know exactly who made each change and when it was made, you can also take actions on a versioned flow.

Fig.7: changelog of a versioned flow
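The same change log is available through the NiFi Registry REST API, so you can script audit checks or build tooling around it. As before, this is just a sketch with an assumed host/port and placeholder bucket and flow IDs.

# Minimal sketch: read the change log of a versioned flow from the NiFi Registry REST API.
# Placeholder bucket/flow IDs; assumes an unsecured registry at http://localhost:18080.
import requests

REGISTRY_API = "http://localhost:18080/nifi-registry-api"

def list_flow_versions(bucket_id, flow_id):
    """Return version metadata (version number, author, comments, timestamp)."""
    resp = requests.get(f"{REGISTRY_API}/buckets/{bucket_id}/flows/{flow_id}/versions")
    resp.raise_for_status()
    return resp.json()

for snapshot in list_flow_versions("<bucket-id>", "<flow-id>"):
    print(snapshot["version"], snapshot["author"], snapshot.get("comments", ""))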

Now you can go to any other NiFi environment (QA/STAGING/PROD) and import that versioned flow by simply dragging and dropping a process group onto the canvas and choosing 'import from...'.
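The drag-and-drop import has a REST equivalent as well, which is what makes promotion to QA/STAGING/PROD scriptable. The sketch below assumes that creating a process group with versionControlInformation in the request triggers the import; the registry client, bucket, and flow IDs refer to the target environment and are placeholders here.

# Minimal sketch: import a versioned flow into another NiFi environment via the REST API.
# Placeholder IDs and coordinates; assumes an unsecured target NiFi at http://prod-nifi:8080.
import requests

TARGET_NIFI_API = "http://prod-nifi:8080/nifi-api"  # assumed target environment

def import_versioned_flow(parent_pg_id, registry_client_id, bucket_id, flow_id, version):
    payload = {
        "revision": {"version": 0},
        "component": {
            "position": {"x": 0.0, "y": 0.0},
            "versionControlInformation": {
                "registryId": registry_client_id,
                "bucketId": bucket_id,
                "flowId": flow_id,
                "version": version,
            },
        },
    }
    resp = requests.post(
        f"{TARGET_NIFI_API}/process-groups/{parent_pg_id}/process-groups", json=payload
    )
    resp.raise_for_status()
    return resp.json()["id"]  # id of the newly imported process group

new_pg = import_versioned_flow("root", "<registry-client-id>", "<bucket-id>", "<flow-id>", 1)
print(f"Imported versioned flow into process group {new_pg}")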

Last but not least, NOT everything in your flow is persisted in the NiFi Registry. Sensitive properties are a good example: most people wouldn't want their sensitive properties (which are also likely environment-specific) to be pulled into another environment by a different team during the flow migration process. Let's quickly walk through those exceptions, as well as the underlying logic behind the scenes. Specifically, we are talking about the following categories:

Controller services

PG-level controller services will be persisted in the version-controlled PG, but higher-level controller services referenced in a version-controlled PG will NOT be carried over to the target NiFi environment.

In an example scenario, assume PG-A is placed on the root canvas with version control turned on. There are two PutHiveQL processors in PG-A: PutHiveQL-1 references the Hive connector defined at the root level (inherited by PG-A), and PutHiveQL-2 references the Hive connector defined within PG-A. When PG-A gets imported to PROD, PutHiveQL-2 will carry over the referenced Hive connector controller service, but PutHiveQL-1 will not. However, as long as you manually specify the appropriate Hive connector in PROD once, you do not have to do it again when upgrading PG-A to a newer version in PROD. Notice that even if you have a root-level Hive connector in PROD with the exact same name, that Hive connector will not be referenced by PutHiveQL-1 until you manually configure it. This is necessary to ensure the stability of flow migration to PROD, even when names are not unique.

Variables

Variables are handled differently than controller services. While a missing controller service produces a processor-level warning message, you won't get any warning if a higher-PG-level variable referenced by a processor is missing in your target NiFi environment. Therefore, during flow migration, all variables referenced in the versioned flow are imported into your target NiFi environment, but their hierarchical levels are flattened out unless there is an existing variable with the same key in the target environment.

In the previous example scenario, when PG-A gets imported to PROD, both the root-level variables and the PG-level variables would be imported to PROD as PG-level variables. But if you have pre-defined higher-level variables with the same names that could be inherited, those would be used instead.

Sensitive properties

Sensitive properties are not persisted as part of a versioned flow, since you want to avoid other folks accidentally pulling sensitive information into their NiFi instance in a shared multi-tenant environment. You need to configure those manually when importing a versioned flow for the first time, but you don't have to do it again when upgrading your flow to a newer version.

Remote process group URLs

RPG URLs are persisted as part of a versioned flow. The value is automatically carried over when importing a versioned flow for the first time. You can edit the URL in your target environment, and that value will not be overwritten when upgrading your flow to a newer version with a different RPG URL.

What's Next?

In the next installment of the HDF 3.1 Blog Series, we'll discuss the addition of Kafka 1.0 and the exciting new HDF and Kafka integrations in HDF 3.1.

