Apache NiFi Overview
In this post, we explore the basics of NiFi and learn why it's a great tool in any data scientist's belt.
What Is Apache NiFi?
Apache NiFi is a robust open-source framework for data ingestion and distribution. It can propagate data content from virtually any source to virtually any destination.
NiFi is based on a programming paradigm called Flow-Based Programming (FBP). I’m not going to give a formal definition of Flow-Based Programming here. Instead, I will explain how NiFi works, and you can then relate it to the definition of Flow-Based Programming.
How NiFi Works
NiFi consists of atomic elements which can be combined into groups to build simple or complex dataflows.
NiFi has processors and process groups.
What Is a Processor in NiFi?
A processor is an atomic element in NiFi that performs one specific task.
At the time of writing, the latest version of NiFi ships with more than 280 processors, and each has its own responsibility.
For example, the GetFile processor can read a file from a specific location, whereas the PutFile processor can write a file to a particular location. Similarly, there are many other processors, each of which addresses a particular aspect of the data pipeline.
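To make the GetFile and PutFile idea concrete, here is a minimal Python sketch of two single-purpose steps: one that reads files from a directory and one that writes a file to another directory. This is only a conceptual illustration and not NiFi's API; the function names get_file and put_file and the file report.txt are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def get_file(input_dir: Path) -> List[Tuple[str, bytes]]:
    """Hypothetical GetFile-like step: read every file in a directory."""
    return [
        (path.name, path.read_bytes())
        for path in sorted(input_dir.iterdir())
        if path.is_file()
    ]


def put_file(output_dir: Path, name: str, content: bytes) -> Path:
    """Hypothetical PutFile-like step: write content to a target directory."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / name
    target.write_bytes(content)
    return target


# Wire the two steps together inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "in"
    destination = Path(tmp) / "out"
    source.mkdir()
    (source / "report.txt").write_text("hello nifi")

    for name, content in get_file(source):
        put_file(destination, name, content)

    copied = (destination / "report.txt").read_text()
```

Each function does exactly one job, which is the point of a processor: small steps that can be wired together in different ways.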
We have processors to get data from various data sources and processors to write data to various data sources.
The data source can be almost anything. It can be a SQL database server like Postgres, Oracle, or MySQL, or a NoSQL database like MongoDB, Couchbase, or HBase. It can also be a search engine like Solr or Elasticsearch, or a cache server like Redis. It can even be a messaging system like Kafka.
NiFi also has a rich set of processors to connect with Amazon AWS services like S3 buckets and DynamoDB.
NiFi has a processor for almost everything you need when you're working with data. We will go deep into the various types of processors available in NiFi in later posts. Even if you don’t find the right processor for your requirements, NiFi gives you a simple way to write your own custom processors.
Now let’s move on to the next term, FlowFile.
What Is a FlowFile in NiFi?
The actual data in NiFi propagates in the form of a FlowFile. The FlowFile can contain any data, say CSV, JSON, XML, or plain text, and it can even be SQL queries or binary data.
The FlowFile abstraction is the reason that NiFi can propagate any data from any source to any destination. A processor can process a FlowFile to generate a new FlowFile.
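The sketch below models this idea in plain Python. In NiFi, a FlowFile carries its content together with key/value attributes; the FlowFile class and the to_upper_processor function here are hypothetical stand-ins for illustration, not NiFi classes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFile:
    """Toy stand-in for a FlowFile: opaque bytes plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def to_upper_processor(flowfile: FlowFile) -> FlowFile:
    """Hypothetical processor: builds a new FlowFile and leaves the input untouched."""
    new_attributes = dict(flowfile.attributes)
    new_attributes["transformed"] = "uppercase"
    return FlowFile(content=flowfile.content.upper(), attributes=new_attributes)


# The content is just bytes, so CSV, JSON, XML, or binary data all fit.
csv_flowfile = FlowFile(b"id,name\n1,alice\n", {"filename": "users.csv"})
result = to_upper_processor(csv_flowfile)
```

Because the processor only sees bytes and attributes, the same wiring works no matter what kind of data is flowing through it.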
The next important term is connections.
What Is a Connection in NiFi?
In NiFi, processors can be connected to one another to create a dataflow. The link between two processors is called a connection. Each connection also acts as a queue for the FlowFiles that pass through it.
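A connection that doubles as a queue can be pictured with the following Python sketch. It is a simplified model under my own assumptions, with FlowFiles reduced to raw bytes; the Connection class and the run_upstream and run_downstream functions are hypothetical names.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class Connection:
    """Toy connection: a first-in, first-out queue between two processors."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def put(self, flowfile: bytes) -> None:
        self._queue.append(flowfile)

    def poll(self) -> Optional[bytes]:
        """Return the oldest queued item, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)


def run_upstream(lines: Iterable[str], out: Connection) -> None:
    """Hypothetical upstream processor: emits one item per input line."""
    for line in lines:
        out.put(line.encode("utf-8"))


def run_downstream(inp: Connection) -> List[str]:
    """Hypothetical downstream processor: drains the queue and transforms items."""
    results = []
    item = inp.poll()
    while item is not None:
        results.append(item.decode("utf-8").strip().upper())
        item = inp.poll()
    return results


connection = Connection()
run_upstream(["a\n", "b\n"], connection)
queued_before = len(connection)   # items wait here until downstream runs
processed = run_downstream(connection)
```

The upstream and downstream steps never call each other directly. They only share the queue, so each one can run at its own pace.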
The next one is the process group and input/output ports.
What Are Process Groups, Input Ports, and Output Ports in NiFi?
In NiFi, one or more processors can be connected and combined into a process group. When you have a complex dataflow, it’s better to combine processors into logical process groups. This makes the flows easier to maintain.
Process groups can have input and output ports which are used to move data between them.
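As a rough mental model, a process group can be thought of as a named bundle of processors with one way in and one way out. The Python sketch below illustrates that idea only; the ProcessGroup class, its receive method, and the two example groups are hypothetical and do not mirror NiFi's internals.

```python
from typing import Callable, List

# In this sketch, a processor is simply a function from text to text.
Processor = Callable[[str], str]


class ProcessGroup:
    """Toy process group: an ordered bundle of processors behind two ports."""

    def __init__(self, name: str, processors: List[Processor]) -> None:
        self.name = name
        self.processors = processors

    def receive(self, flowfile: str) -> str:
        """Input port: data enters here and runs through every processor."""
        for processor in self.processors:
            flowfile = processor(flowfile)
        # Output port: the returned value is what leaves the group.
        return flowfile


# Two logical groups, each hiding its own internal steps.
cleanse = ProcessGroup("cleanse", [str.strip, str.lower])
enrich = ProcessGroup("enrich", [lambda text: text + ",source=demo"])

# The output port of one group feeds the input port of the next.
final = enrich.receive(cleanse.receive("  Alice,42  "))
```

From the outside, each group looks like a single step, which is what keeps a large flow readable.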
The last term you should know is 'controller services.'
What Is a Controller Service in NiFi?
Controller services are shared services that can be used by processors. For example, processors that read data from or write data to a SQL database can share a controller service that holds the required DB connection details.
The controller service is not limited to DB connections.
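The sharing idea can be sketched in Python with an in-memory SQLite database standing in for a real SQL server. This is an illustrative assumption, not NiFi code; the DbConnectionService class, the put_rows and get_rows functions, and the users table are hypothetical.

```python
import sqlite3
from typing import List, Tuple


class DbConnectionService:
    """Toy controller service: owns the connection details in one place."""

    def __init__(self, database: str) -> None:
        self._connection = sqlite3.connect(database)

    def get_connection(self) -> sqlite3.Connection:
        return self._connection


def put_rows(service: DbConnectionService, rows: List[Tuple[int, str]]) -> None:
    """Hypothetical 'put' processor: writes rows using the shared service."""
    conn = service.get_connection()
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.commit()


def get_rows(service: DbConnectionService) -> List[Tuple[int, str]]:
    """Hypothetical 'get' processor: reads rows using the same service."""
    conn = service.get_connection()
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


# Both processors are configured with one service, not with their own credentials.
shared_service = DbConnectionService(":memory:")
put_rows(shared_service, [(1, "alice"), (2, "bob")])
fetched = get_rows(shared_service)
```

If the connection details change, only the service needs to be updated, and every processor that uses it picks up the change.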
To learn more about Apache NiFi, please visit my YouTube channel. I have created a playlist especially for beginners.
After finishing my YouTube tutorial, if you wish to dive deeper into advanced topics, you can opt for my Udemy course.
You can also take the same course on Skillshare for free using the referral link below.
Published at DZone with permission of Manoj G T. See the original article here.