
New Features in Apache NiFi 1.5: Registry and Version Control

Come learn about the new Apache NiFi release, which adds a powerful Flow Registry and version control.


In 2018, some awesome new in-demand features came to my favorite Swiss Army Knife of IoT and enterprise development: Apache NiFi. Speaking of knives, for fun, say, “Apache NiFi” to Google Assistant.

OK, back to the awesome new release of Apache NiFi.

There are a couple of new processors that I want to highlight first. I like the new CountText processor; it's useful for counting elements of text documents, like words and lines. My example flow uses it, and I see some useful metrics gathering there. I also think some of these counts could be used as file validation checks to feed to machine learning algorithms: my files of type X are usually a certain number of lines and words, but not this time. I have come across a couple of use cases on file ingest in the past that could use this. In one example, a company was reading personnel files from an SFTP server. The first step in validation was checking that they received the proper number of lines — one person per line. On another occasion, a client would receive bad files via FTP that looked fine, but the last few records would be missing, so each file needed to contain a minimum number of characters. In yet another, they were counting words for legal documents.
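The validation rules described above can also be sketched in plain code. Here is a minimal Python example (the function name, thresholds, and sample data are all hypothetical, purely for illustration):

```python
def validate_personnel_file(text, expected_lines, min_chars):
    """Apply the ingest checks described above: one person per line,
    and a minimum total size to catch truncated files."""
    lines = [line for line in text.splitlines() if line.strip()]
    errors = []
    if len(lines) != expected_lines:
        errors.append(f"expected {expected_lines} lines, got {len(lines)}")
    if len(text) < min_chars:
        errors.append(f"file shorter than {min_chars} characters")
    return errors

# A file that is one record short and truncated:
print(validate_personnel_file("alice,100\nbob,200\n",
                              expected_lines=3, min_chars=40))
# → ['expected 3 lines, got 2', 'file shorter than 40 characters']
```

In a flow, the same checks could key off the attributes that CountText writes, routing bad files to a failure relationship.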

CountText writes its counts to flow file attributes, for example:

  • text.line.count
  • text.word.count

Another cool processor that I will talk about in greater detail in future articles is the much-requested Spark processor. The ExecuteSparkInteractive processor, with its Livy Controller Service, gives you a much better alternative to my hacky REST integration for calling Apache Spark batch and machine learning jobs.

There are a number of enhancements, new processors, and upgrades I'm excited about, but the main reason I am writing today is a new feature that enables an Agile SDLC with Apache NiFi: the Apache NiFi Registry. It's as simple as a quick git clone or download; then you use Apache Maven to build Apache NiFi Registry and start it. This process will become even easier with future Ambari integration for a CLI-free install.

To integrate the Registry with Apache NiFi, you need to add a Registry Client. It’s very simple to add the default local one — see below.

Accessing Apache NiFi Registry

By default, it will be running here: http://localhost:18080/nifi-registry/.

I did a quick install and did not set any security parameters. With the next HDF release, everything will be integrated and simple.

Accessing Apache NiFi Flow Registry API

As is the case with Apache NiFi itself, the new Apache NiFi Registry comes with a great REST API. It is well-documented and easy to follow, and it will allow for easy integration with all the popular DevOps automation tools.

  • http://localhost:18080/nifi-registry-api/buckets
  • http://localhost:18080/nifi-registry-api/items
  • http://localhost:18080/nifi-registry-api/tenants/user-groups
  • http://localhost:18080/nifi-registry-api/tenants/users
  • http://localhost:18080/nifi-registry-api/policies
  • http://localhost:18080/nifi-registry-api/access
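As a quick sketch, here is how a bucket could be created against that unsecured local install using only the Python standard library. The endpoint matches the buckets URL above; the minimal JSON body (just a name and description) is my assumption of the smallest useful payload:

```python
import json
import urllib.request

REGISTRY = "http://localhost:18080/nifi-registry-api"

def bucket_payload(name, description=""):
    # Minimal request body for creating a bucket; the Registry
    # assigns the identifier itself.
    return {"name": name, "description": description}

def create_bucket(name):
    req = urllib.request.Request(
        f"{REGISTRY}/buckets",
        data=json.dumps(bucket_payload(name)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# create_bucket("development")  # requires a running, unsecured Registry
```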

Example output:

{
  "identity": "anonymous",
  "anonymous": true,
  "resourcePermissions": {
    "buckets": {
      "canRead": true,
      "canWrite": true,
      "canDelete": true
    },
    "tenants": {
      "canRead": true,
      "canWrite": true,
      "canDelete": true
    },
    "policies": {
      "canRead": true,
      "canWrite": true,
      "canDelete": true
    },
    "proxy": {
      "canRead": true,
      "canWrite": true,
      "canDelete": true
    },
    "anyTopLevelResource": {
      "canRead": true,
      "canWrite": true,
      "canDelete": true
    }
  }
}
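For scripting against the API, the permission flags in that response are easy to check programmatically. A small sketch, using a trimmed copy of the output above (the helper function is hypothetical):

```python
import json

# Trimmed /access response from the unsecured install shown above.
access = json.loads("""
{
  "identity": "anonymous",
  "anonymous": true,
  "resourcePermissions": {
    "buckets": {"canRead": true, "canWrite": true, "canDelete": true}
  }
}
""")

def can_write(access, resource):
    # Default to False when a resource is absent from the response.
    perms = access["resourcePermissions"].get(resource, {})
    return perms.get("canWrite", False)

print(can_write(access, "buckets"))   # → True
print(can_write(access, "policies"))  # → False (not in the trimmed body)
```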

I added a few buckets to try it out.

After you have done that, you can start using it in Apache NiFi. It could not be easier.

Create or use an existing Process Group. Right-click it and pick Version > Start version control.

You then pick a Registry (if you have more than one) and a bucket. A bucket is a logical categorization of related flows. I created buckets for development, testing, and production. You then add a name, description, and comments for this particular flow and then hit Save.
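Since a bucket is just a logical grouping, you could filter the items listing by bucket from a script. A sketch over made-up sample data (the `bucketName` field is my assumption of what the items endpoint returns per flow):

```python
def flows_in_bucket(items, bucket_name):
    # Each item from /nifi-registry-api/items is assumed to carry
    # the name of the bucket it lives in.
    return [item["name"] for item in items
            if item.get("bucketName") == bucket_name]

sample_items = [
    {"name": "Nifi 1.5 Test", "bucketName": "development"},
    {"name": "Ingest Flow", "bucketName": "production"},
]
print(flows_in_bucket(sample_items, "development"))  # → ['Nifi 1.5 Test']
```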

You have just versioned a Process Group! You can now run Agile team development with Apache NiFi in your enterprise with familiar version control, team development, and isolation.

Now, you can edit your flow and see that it has changed.

You can now easily commit those changes or revert. To see what changed, pick Show local changes.

As you can see, you get a very slick display of what changed in each component.

Now, let’s jump to Apache NiFi Registry and see what happened.

The above screenshot shows that my flow (Nifi 1.5 Test) has been stored in bucket Tim and has three saved versions.

An example versioned test flow:

Now that your flow is version-controlled, others can import it into their workspace (depending on security).

You can choose from any of the versions based on your needs. For teams, this part is awesome:

You will know if there’s a newer version and you can pick that one if you wish. Or not. You can run many copies of the same flow with different variables and versions.

My next article will be about updates to integrating with Apache Spark via Apache Livy.

Change to another version:

Commit your local changes (or revert them):

Save your flow version to any bucket or registry that you have permissions to:

Your variable registry is per versioned process group:

This is the second version I am saving. Add some comments.

New subproject, processors, tasks, and services:

  • MoveHDFS processor
  • Kafka 1.0 processors
  • CSVRecordLookupService
  • New Graphite reporting task
  • Spark job executor with Apache Livy integration
  • FlattenJSON processor
  • DeleteMongo processor
  • CountText processor
  • Apache NiFi Registry

