What is NiFi – Part 1

Johannes Brucher

…is not only a specialist in Search & Analytics but is equally at home in SHI Publishing Solutions. Born in Bavaria in 1981, he studied bioinformatics with a focus on software engineering at the FH Weihenstephan in Freising. For more than 10 years he has enriched SHI with his expertise, advancing there from “Junior Consultant” to “Senior Technical Consultant”. Favourite file formats: WAV, MOV, jar

What is NiFi and how can it help move your data up a level? – Part 1

Apache NiFi is a powerful system for processing and distributing data across different systems. It automates data flows and can be seen as a data logistics platform.
Data can be processed in real time or in batches, and events can trigger specific tasks.

Apache NiFi supports powerful and scalable directed graphs of data routing. To build a data flow, we first need to understand NiFi’s core concepts.

NiFi’s fundamental design concepts are closely related to the main ideas of Flow Based Programming (FBP).

Here is a list of the main concepts and how they map to FBP:

| NiFi term | FBP term | Description |
|---|---|---|
| FlowFile | Information Packet | An object that moves through a flow from one configured processor to the next. A FlowFile consists of attributes (key/value pairs) and its associated content (bytes). |
| Processor | Black Box | Once a FlowFile reaches a processor, the actual work is performed. Each processor in NiFi’s ecosystem can, for example, transform, clean, format, enrich or aggregate the data a FlowFile transports, or send notifications about it. |
| Connection and Queue | Bounded Buffer | Connections are the glue between processors. A connection is directed and acts as a queue, allowing processors to interact at different rates. These queues can be prioritized dynamically and can have upper bounds on load, which enables back pressure. |
| Flow Controller (flow configuration) | Scheduler | Mainly the configuration side of a flow (scheduling, number of threads, FlowFile properties, …). The Flow Controller acts as the broker facilitating the exchange of FlowFiles between processors. |
| Process Group | Subnet | A process group is a specific set of processors and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow the creation of entirely new components simply by composing other components. |
| Input/Output Ports | Port | Ports are used to exchange data between process groups and/or NiFi instances (the latter for clustering and site-to-site connections). |
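
To make the FlowFile and connection concepts more tangible, here is a minimal Python sketch. It does not use NiFi’s API: the names `FlowFile`, `Connection`, `max_queued`, `offer` and `poll` are assumptions chosen for illustration. It only shows the shape of the idea, namely attributes plus content travelling through a bounded, directed queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: content bytes plus attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Directed, bounded queue between two processors (illustrative only)."""

    def __init__(self, max_queued: int):
        self.max_queued = max_queued  # assumed upper bound on queued FlowFiles
        self._queue = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, flow_file: FlowFile) -> bool:
        """Enqueue if there is room; False signals back pressure upstream."""
        if self.is_full():
            return False
        self._queue.append(flow_file)
        return True

    def poll(self):
        """Hand the oldest FlowFile to the downstream processor, if any."""
        return self._queue.popleft() if self._queue else None


connection = Connection(max_queued=2)
accepted = [
    connection.offer(FlowFile(b"row-%d" % i, {"filename": "part-%d.csv" % i}))
    for i in range(3)
]
print(accepted)  # [True, True, False]
```

The third FlowFile is refused because the queue has reached its bound. Rejecting the item is a simplification; as far as I know, NiFi itself stops scheduling the upstream processor until the queue drains.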

To illustrate the concepts above, figure 1 shows all basic components inside NiFi’s web-based user interface (version 1.3.0).

 

As you can see, flows are created visually by dragging and dropping NiFi components into NiFi’s root process group and connecting them so that they form a directed graph.

This flexibility lets you create workflows that not only move data from one node to another but also enrich, format, clean, aggregate and transform it to fit any company’s needs.

To achieve that flexibility, NiFi ships with a large set of default processors.
Each NiFi processor has one single task, much as a well-designed Java method should. This makes processors broadly reusable: they can be chained to build more complex processing steps.
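
The "one task per processor" idea can be sketched outside NiFi as plain function composition. The helper names below (`trim`, `lowercase`, `collapse_spaces`, `chain`) are hypothetical and only illustrate how small single-purpose steps combine into a larger one.

```python
from functools import reduce


def trim(text: str) -> str:
    """Single task: remove leading and trailing whitespace."""
    return text.strip()


def lowercase(text: str) -> str:
    """Single task: convert to lower case."""
    return text.lower()


def collapse_spaces(text: str) -> str:
    """Single task: squeeze runs of whitespace into one space."""
    return " ".join(text.split())


def chain(*processors):
    """Build one callable that applies the given steps from left to right."""
    def run(value):
        return reduce(lambda current, step: step(current), processors, value)
    return run


pipeline = chain(trim, lowercase, collapse_spaces)
result = pipeline("  Apache   NiFi  ")
print(result)  # apache nifi
```

Because each step does exactly one thing, the same functions can be reused in a different order or in a different chain without modification.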

Broadly, there are four types of processors, which:

–    Modify the attributes of FlowFiles
–    Modify the content of FlowFiles
–    Consume data from a data source
–    Send data to a target system/node
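
The four kinds of processors listed above can be sketched as four small functions. This is a conceptual illustration, not NiFi code: FlowFiles are modelled as plain dictionaries, and the names `consume`, `set_attribute`, `replace_content` and `send` are assumptions. They loosely mirror what standard processors such as GetFile, UpdateAttribute, ReplaceText and PutFile do.

```python
def consume(source):
    """Consume data from a source: one FlowFile-like dict per input line."""
    for number, line in enumerate(source, start=1):
        yield {
            "attributes": {"line.number": str(number)},
            "content": line.encode("utf-8"),
        }


def set_attribute(flow_file, key, value):
    """Modify attributes only; the content is left untouched."""
    return {**flow_file, "attributes": {**flow_file["attributes"], key: value}}


def replace_content(flow_file, old, new):
    """Modify content only; the attributes are left untouched."""
    return {**flow_file, "content": flow_file["content"].replace(old, new)}


def send(flow_file, sink):
    """Send data to a target; here the target is just a Python list."""
    sink.append(flow_file)


sink = []
for flow_file in consume(["id;name", "1;nifi"]):
    flow_file = set_attribute(flow_file, "mime.type", "text/csv")
    flow_file = replace_content(flow_file, b";", b",")
    send(flow_file, sink)

print([item["content"] for item in sink])  # [b'id,name', b'1,nifi']
```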

A full list of all available processors can be found in NiFi’s official documentation.

 

Important to know

An important aspect of flow-based programming is the idea of resource-constrained relationships between the black boxes. In NiFi, these are queues and processors, respectively. FlowFiles are routed from one processor to another through queues simply by passing a reference to the FlowFile (similar to the “Claim Check” pattern in EIP).
Because only references move through the queues, many processors can be chained in series without the routing itself becoming a performance concern.
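
The reference-passing idea can be illustrated with a small Python sketch of the Claim Check pattern. The `ContentStore` class and the `claim` key are hypothetical names and do not reflect NiFi’s internal implementation. The point is only that the queues hand over a small record while the large payload stays in one place.

```python
import itertools
from collections import deque


class ContentStore:
    """Hypothetical content store: keeps payloads and hands out claim ids."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count(1)

    def put(self, content: bytes) -> int:
        claim = next(self._ids)
        self._blobs[claim] = content
        return claim

    def get(self, claim: int) -> bytes:
        return self._blobs[claim]


store = ContentStore()
payload = b"x" * 1_000_000  # large content, stored once

# The FlowFile-like record carries attributes and a claim, not the bytes.
flow_file = {"attributes": {"filename": "big.bin"}, "claim": store.put(payload)}

first_queue, second_queue = deque(), deque()
first_queue.append(flow_file)
second_queue.append(first_queue.popleft())  # only the small record moves

received = second_queue.popleft()
same_object = store.get(received["claim"]) is payload
print(same_object)  # True
```

However many queues the record passes through, the payload is never duplicated; a processor that needs the bytes redeems the claim against the store.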

What comes next?

If none of the roughly 225 default processors offers a suitable solution, or if you need a custom connector, NiFi can be extended through several extension points.
All available extension points are listed here. The most important one, though, is certainly the ‘Processor Extension Point’, which will be the topic of the next part of this series.

What is NiFi – Part 2

What is NiFi – Part 3
