<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jannik Rebmann</title>
    <description>The latest articles on Forem by Jannik Rebmann (@jrebmann).</description>
    <link>https://forem.com/jrebmann</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1156284%2Fc5000739-c61b-4997-a577-8f47e4ba7a7f.jpg</url>
      <title>Forem: Jannik Rebmann</title>
      <link>https://forem.com/jrebmann</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jrebmann"/>
    <language>en</language>
    <item>
      <title>Workbench for Apache NiFi data flows</title>
      <dc:creator>Jannik Rebmann</dc:creator>
      <pubDate>Fri, 17 May 2024 09:20:58 +0000</pubDate>
      <link>https://forem.com/jrebmann/workbench-for-apache-nifi-data-flows-5796</link>
      <guid>https://forem.com/jrebmann/workbench-for-apache-nifi-data-flows-5796</guid>
      <description>&lt;p&gt;This article presents the concept and implementation of a universal workbench for &lt;a href="https://nifi.apache.org/"&gt;Apache NiFi&lt;/a&gt; data flows.&lt;/p&gt;

&lt;p&gt;The workbench is intended to improve the quality of your data flows by increasing their:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Testability&lt;/strong&gt; - &lt;em&gt;How to test the specified functionality of my data flows?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt; - &lt;em&gt;How to ensure the functionality of my data flows after changes?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt; - &lt;em&gt;How to define and test edge cases of my data flows?&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The article is primarily aimed at advanced Apache NiFi users, but is also of interest to beginners who are in the process of learning basic development concepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;Following up on my last post &lt;a href="https://dev.to/jrebmann/setup-a-secure-apache-nifi-cluster-in-kubernetes-1b86"&gt;Setup a secure Apache NiFi cluster in Kubernetes&lt;/a&gt;, I would now like to cover an important topic: the quality of Apache NiFi data flows.&lt;/p&gt;

&lt;p&gt;Before I began using Apache NiFi, I took a close look at its tools and concepts. While I was excited about its potential, I found the standards and tools for developing data flows lacking. To be honest, I was expecting something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cmzuxpgmc2da63etptf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1cmzuxpgmc2da63etptf.png" alt="The picture above is a partial screenshot of the nodejs project [gitex-flow](https://github.com/gitex-flow/gitex-flow-vscode) unit tests in visual studio code." width="542" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The green lights give me a sense of security and well-being. They signal that I can adapt and change the code without worrying about breaking its functionality. Unit tests also give me an entry point for understanding programs and developing new features.&lt;/p&gt;

&lt;p&gt;Although it is a bit challenging at first, &lt;a href="https://en.wikipedia.org/wiki/Test-driven_development"&gt;test-driven development&lt;/a&gt; is a very robust approach that prevents problems before they occur. Especially when extensions and refactorings are required after the first releases, the test-driven approach is unbeatable in my eyes.&lt;/p&gt;

&lt;p&gt;That doesn't mean there aren't other great approaches. Unfortunately, though, I couldn't find any established approaches or best practices for developing data flows in Apache NiFi. Apart from a few articles on unit testing custom processors and on using &lt;code&gt;GenerateFlowFile&lt;/code&gt; processors for basic &lt;a href="https://en.wikipedia.org/wiki/Smoke_testing_(software)"&gt;smoke testing&lt;/a&gt;, little has been written about safeguarding the functionality of data flows.&lt;/p&gt;

&lt;p&gt;Working in an environment that's always changing, where adapting functionality and ensuring reliability are key, I realized I needed to develop my own structured approach.&lt;/p&gt;

&lt;p&gt;That's when I came up with the idea of a &lt;em&gt;Data Flow Workbench&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The basic idea is straightforward:&lt;br&gt;
Simply encapsulate a data flow within a predefined and immutable context. Inputs and expected outputs remain the same, while only the data flow itself changes. This approach is similar to the principles of unit testing but needs to be tailored for data flow scenarios.&lt;/p&gt;
&lt;h2&gt;
  
  
  Concept
&lt;/h2&gt;

&lt;p&gt;Let's dive deeper into the concept of the Apache NiFi Workbench.&lt;/p&gt;

&lt;p&gt;Apache NiFi data flows can be organized within process groups, similar to organizing code within functions. A &lt;code&gt;ProcessGroup&lt;/code&gt; has defined inputs and outputs (represented by input and output ports) that are connected by processing pipelines of process groups and processors. A &lt;code&gt;FlowFile&lt;/code&gt; is the unit of data that flows through the pipelines; it consists of &lt;code&gt;attributes&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt;.&lt;/p&gt;
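
&lt;p&gt;To make these terms concrete, they can be modelled in a few lines of plain Python. This is only a simplified mental model, not NiFi's API: it assumes that a flow file is a dictionary of string attributes plus a byte string, and that a process group is a function from the flow file at its input port to the flow files at its output port. In NiFi itself, &lt;code&gt;FlowFile&lt;/code&gt; is a Java interface whose content lives in the content repository, and &lt;code&gt;append_world&lt;/code&gt; is a hypothetical example.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Simplified flow file: string attributes plus raw content bytes."""
    attributes: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


# Simplified process group: it takes the flow file arriving at its input
# port and returns the flow files it emits at its output port.
ProcessGroup = Callable[[FlowFile], List[FlowFile]]


def append_world(flow_file):
    """Hypothetical process group that appends " world" to the content."""
    return [FlowFile(dict(flow_file.attributes), flow_file.content + b" world")]


pg: ProcessGroup = append_world
outputs = pg(FlowFile({"mime.type": "text/plain"}, b"Hello"))
print(outputs[0].content)  # b'Hello world'
```
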

&lt;p&gt;Now you know everything you need to know about data flows in order to understand the following illustration, which shows the basic structure of the Workbench. The process group &lt;code&gt;PG main&lt;/code&gt; contains the data flow to be tested.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaecl57utzazca1h6xiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgaecl57utzazca1h6xiy.png" alt="Apache NiFi Workbench Concept" width="800" height="862"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Splitting each flow file into its attributes and its content as separate inputs allows both parts to be defined as separate files. The workbench requires the attributes as JSON, while the content can be in any file format. The flow files of all inputs are linked via the &lt;code&gt;assert.group.identifier&lt;/code&gt; attribute, which ensures that the correct parts are put together.&lt;/p&gt;

&lt;p&gt;An example input can look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;[input]&lt;/code&gt; Attributes&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;[input]&lt;/code&gt; Content&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;[input]&lt;/code&gt; Expected Attributes&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;[input]&lt;/code&gt; Expected Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;[attribute]&lt;/code&gt; &lt;code&gt;assert.group.identifier&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;test&lt;/td&gt;
&lt;td&gt;test&lt;/td&gt;
&lt;td&gt;test&lt;/td&gt;
&lt;td&gt;test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;[content]&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ "done": false }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hello&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ "done": true }&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hello world&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
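
&lt;p&gt;How the four inputs of the table are tied together by &lt;code&gt;assert.group.identifier&lt;/code&gt; can be sketched in plain Python, independently of NiFi. The sketch assumes that each input arrives as a tuple of an input name, its attributes and its content; &lt;code&gt;group_inputs&lt;/code&gt; and the input names are hypothetical and not part of the workbench. The arrival order is shuffled on purpose, because the grouping must not depend on it.&lt;/p&gt;

```python
import json
from collections import defaultdict

GROUP_IDENTIFIER_KEY = "assert.group.identifier"


def group_inputs(flow_files):
    """Collect the inputs of each test case via assert.group.identifier.

    flow_files is an iterable of (input name, attributes, content) tuples;
    the order in which they arrive does not matter.
    """
    cases = defaultdict(dict)
    for input_name, attributes, content in flow_files:
        cases[attributes[GROUP_IDENTIFIER_KEY]][input_name] = content
    return dict(cases)


# The example from the table above, deliberately in a shuffled arrival order.
flow_files = [
    ("expected content", {GROUP_IDENTIFIER_KEY: "test"}, "Hello world"),
    ("attributes", {GROUP_IDENTIFIER_KEY: "test"}, '{ "done": false }'),
    ("content", {GROUP_IDENTIFIER_KEY: "test"}, "Hello"),
    ("expected attributes", {GROUP_IDENTIFIER_KEY: "test"}, '{ "done": true }'),
]

case = group_inputs(flow_files)["test"]
print(json.loads(case["attributes"]), case["content"])  # {'done': False} Hello
```
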
&lt;h3&gt;
  
  
  Workflow
&lt;/h3&gt;

&lt;p&gt;As everywhere in software engineering, the quality of the software depends heavily on proper design. For this reason, a clear and modular structure of the data flows is strongly recommended. In particular, it makes sense to decouple the data sources and data targets from the processing logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---------------      -----------     ---------------
| Data Source |  --&amp;gt; | PG_main | --&amp;gt; | Data Target |
---------------      -----------     ---------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the flow is structured this way, the processing logic &lt;code&gt;PG_main&lt;/code&gt; can be integrated into the workbench as an isolated module, allowing us to define simulated inputs along with their corresponding expected outputs.&lt;/p&gt;
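
&lt;p&gt;Seen from outside, the isolated module behaves like a function under test, and the workbench like a test runner around it. The following plain-Python sketch only illustrates that analogy: &lt;code&gt;pg_main&lt;/code&gt; (chosen to satisfy the example table) and &lt;code&gt;run_workbench&lt;/code&gt; are hypothetical, and the attributes are assumed to be already parsed into dictionaries of strings.&lt;/p&gt;

```python
def pg_main(attributes, content):
    """Hypothetical stand-in for PG_main, chosen to satisfy the example table:
    it sets "done" to "true" and appends " world" to the content."""
    result = dict(attributes)
    result["done"] = "true"
    return result, content + " world"


def run_workbench(module, cases):
    """Feed each simulated input into module and collect the mismatches.

    Returns a dict mapping the group identifier to (actual, expected) for
    every failing case; an empty dict means that all cases passed.
    """
    failures = {}
    for identifier, case in cases.items():
        actual = module(case["attributes"], case["content"])
        expected = (case["expected_attributes"], case["expected_content"])
        if actual != expected:
            failures[identifier] = (actual, expected)
    return failures


cases = {
    "test": {
        "attributes": {"done": "false"},
        "content": "Hello",
        "expected_attributes": {"done": "true"},
        "expected_content": "Hello world",
    },
}
print(run_workbench(pg_main, cases))  # {}
```

&lt;p&gt;Because the inputs and expectations stay fixed, any change to the module that breaks the example shows up as an entry in the returned dictionary.&lt;/p&gt;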

&lt;p&gt;However, the workbench truly unleashes its full potential when paired with the Apache NiFi Registry. Once changes are made to &lt;code&gt;PG_main&lt;/code&gt;, they can be committed to version control and seamlessly integrated into production data flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;TL;DR&lt;/code&gt; This section explains the implementation of the workbench in more detail.&lt;br&gt;
If you'd like to test and explore the workbench yourself, you can jump directly to the conclusion to download the workbench template.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The workbench consists of two core modules (implemented as process groups):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Build FlowFile&lt;/code&gt;: Builds a &lt;code&gt;FlowFile&lt;/code&gt; from given &lt;code&gt;attributes&lt;/code&gt; and a specific &lt;code&gt;content&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Assert FlowFile&lt;/code&gt;: Compares the &lt;code&gt;attributes&lt;/code&gt; and the &lt;code&gt;content&lt;/code&gt; of two flow files; it fails if they differ and passes if they are equal.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The implementation of the two core modules is described in more detail in the following two sections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Process Group: &lt;code&gt;Build FlowFile&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This module merges the &lt;code&gt;attributes&lt;/code&gt; with their corresponding &lt;code&gt;content&lt;/code&gt; into a single &lt;code&gt;FlowFile&lt;/code&gt;. This sounds simple in common programming languages, but it is tricky in data flows, because there is no defined order in which the flow files arrive.&lt;/p&gt;

&lt;p&gt;NiFi provides a standard processor (&lt;code&gt;MergeContent&lt;/code&gt;) for this purpose. The &lt;code&gt;Create fragments&lt;/code&gt; processors set the attributes that the &lt;code&gt;MergeContent&lt;/code&gt; processor requires on the incoming flow files. After the merge, the MIME type is restored and some helper attributes are cleaned up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwhbzdkfrrybn7qo0j0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwhbzdkfrrybn7qo0j0k.png" alt="Module Build FlowFile" width="800" height="1268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Everything is pretty straightforward, but you may have noticed that I skipped the &lt;code&gt;JSON to attributes&lt;/code&gt; processor. Let's take a closer look. Unfortunately, there is no standard processor that turns all keys of a JSON object in the flow file content into attributes, so this step has to be implemented manually.&lt;br&gt;
This can be done with the &lt;code&gt;ExecuteScript&lt;/code&gt; processor, which executes user-defined code on incoming flow files. Several scripting languages are available, and I decided to use &lt;a href="https://en.wikipedia.org/wiki/Jython"&gt;Jython&lt;/a&gt; (Python running on the Java platform).&lt;/p&gt;

&lt;p&gt;The following code retrieves a flow file, reads and parses its content as JSON and adds the keys with their values as attributes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;org.apache.commons.io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IOUtils&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;java.nio.charset&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardCharsets&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;org.apache.nifi.processor.io&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InputStreamCallback&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JsonInputStreamCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;InputStreamCallback&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputStream&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;jsonStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;IOUtils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StandardCharsets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UTF_8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonStr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;flowFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flowFile&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;callback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JsonInputStreamCallback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;newFlowFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;newFlowFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putAllAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newFlowFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transfer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newFlowFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REL_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An error occured&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transfer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REL_FAILURE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
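
&lt;p&gt;The subtle part of this step is the value conversion, because NiFi attribute values are always strings. One possible mapping is sketched below in plain Python 3, so that it can be unit-tested outside NiFi: string values are kept as they are, and every other JSON value is serialized back to JSON text, so &lt;code&gt;false&lt;/code&gt; becomes &lt;code&gt;"false"&lt;/code&gt;. &lt;code&gt;json_to_attributes&lt;/code&gt; is a hypothetical helper, not part of the workbench template.&lt;/p&gt;

```python
import json


def json_to_attributes(text):
    """Map the top-level keys of a JSON object to string attribute values.

    Strings are kept as they are; every other JSON value (booleans, numbers,
    null, arrays, nested objects) is serialized back to JSON text.
    """
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object, got " + type(parsed).__name__)
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in parsed.items()
    }


print(json_to_attributes('{ "done": false, "count": 3, "name": "demo" }'))
# {'done': 'false', 'count': '3', 'name': 'demo'}
```

&lt;p&gt;Serializing non-string values as JSON keeps them reversible and makes booleans match the lowercase spelling used in the attribute files.&lt;/p&gt;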



&lt;h3&gt;
  
  
  Process Group: &lt;code&gt;Assert FlowFile&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The second core module compares the actual flow file against the expected one. The flow below first adds a hash of each file's content as an attribute and then compares all attributes, so that the content comparison is reduced to an attribute comparison.&lt;br&gt;
Some core attributes, such as &lt;code&gt;uuid&lt;/code&gt; and &lt;code&gt;filename&lt;/code&gt; (see &lt;a href="https://www.javadoc.io/doc/org.apache.nifi/nifi-utils/1.0.0/org/apache/nifi/flowfile/attributes/CoreAttributes.html"&gt;CoreAttributes&lt;/a&gt; for the complete list), are ignored in the comparison.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcfz690qkuao8vfqq4ct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcfz690qkuao8vfqq4ct.png" alt="Module Assert FlowFile" width="800" height="913"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The heart of this data flow is the &lt;code&gt;Assert FlowFile&lt;/code&gt; processor, which compares two flow files with each other. As Apache NiFi does not provide a standard processor for this either, this logic is also implemented using an &lt;code&gt;ExecuteScript&lt;/code&gt; processor.&lt;/p&gt;

&lt;p&gt;For each actual flow file, the script looks for the expected flow file with the same &lt;code&gt;assert.group.identifier&lt;/code&gt; attribute. If there is no matching file yet, the flow file is pushed back to the queue. Otherwise, the attributes of both files are compared and asserted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;java.util&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;org.apache.nifi.flowfile.attributes&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CoreAttributes&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;org.apache.nifi.processor&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;org.apache.nifi.processor.FlowFileFilter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilterResult&lt;/span&gt;

&lt;span class="n"&gt;FILE_TYPE_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;assert.file.type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;GROUP_IDENTIFIER_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;assert.group.identifier&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FlowFileTypeFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FlowFileFilter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FILE_TYPE_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GROUP_IDENTIFIER_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilterResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ACCEPT_AND_TERMINATE&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilterResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;REJECT_AND_CONTINUE&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilterResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ACCEPT_AND_TERMINATE&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FlowFileFilterResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;REJECT_AND_CONTINUE&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;getFlowFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;flowFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;FlowFileTypeFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;flowFiles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isEmpty&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;flowFiles&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;getAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HashMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flowFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttributes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FILE_TYPE_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attr&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CoreAttributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;attrName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;containsKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrName&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attrName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;map&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;actualAttrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;expectedAttrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getAttributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actualAttrs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expectedAttrs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The number of attributes differs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;actualAttrs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;expectedAttrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;actualAttrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Attribute &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; differs (actual / expected): {} / {}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actualAttrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;expectedAttrs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getFlowFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actual&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getFlowFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GROUP_IDENTIFIER_KEY&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;diffMsg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FILE_TYPE_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;diffMsg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;actual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;assert_message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;diffMsg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transfer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REL_FAILURE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transfer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REL_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Something went wrong&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The workbench presented in this article can be used for any isolated data flow in Apache NiFi. It therefore extends the toolset for developing and testing data flows and helps to improve their quality.&lt;/p&gt;

&lt;p&gt;Because the workbench accepts the attributes and the content of flow files as separate inputs, the concept can easily be extended into a test suite for Apache NiFi, which I will present in a follow-up article.&lt;/p&gt;

&lt;p&gt;If you want to examine the workbench in detail, feel free to download it as an Apache NiFi 1.25 template:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://jannikrebmann.de/files/blog/workbench/NiFi_Workbench.xml"&gt;Download Workbench&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>testing</category>
      <category>cleancode</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Setup a secure Apache NiFi cluster in Kubernetes</title>
      <dc:creator>Jannik Rebmann</dc:creator>
      <pubDate>Thu, 23 Nov 2023 19:27:52 +0000</pubDate>
      <link>https://forem.com/jrebmann/setup-a-secure-apache-nifi-cluster-in-kubernetes-1b86</link>
      <guid>https://forem.com/jrebmann/setup-a-secure-apache-nifi-cluster-in-kubernetes-1b86</guid>
      <description>&lt;p&gt;This article provides a detailed, step-by-step guide on setting up a secure Apache NiFi cluster with a NiFi Registry in Kubernetes, featuring the following capabilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;NiFi and the NiFi Registry are secured via HTTPS.&lt;/li&gt;
&lt;li&gt;Authentication of all services is handled via OpenID Connect (OIDC).&lt;/li&gt;
&lt;li&gt;The internal communication between the nodes is encrypted.&lt;/li&gt;
&lt;li&gt;The communication between the cluster and the NiFi Registry is encrypted and authenticated.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;In a world where ChatGPT is bringing artificial intelligence into our everyday lives, data integration becomes a key challenge. Ensuring that AI systems receive the right data at the right time will be crucial.&lt;/p&gt;

&lt;p&gt;This article addresses this challenge using Apache NiFi, a proven data integration system that was solving data integration problems long before the AI revolution.&lt;/p&gt;

&lt;p&gt;However, I do have one major criticism of Apache NiFi: The barrier to entry is relatively high. The process of setting up a secure cluster of nodes and connecting it to a secure NiFi registry can be time-consuming, especially for those new to the system.&lt;/p&gt;

&lt;p&gt;That is why I have decided to write this article to help everyone get started with this excellent system. I promise it will be worth it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You need a Linux system (I used Ubuntu 22.04.3 LTS) with the following software installed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://docs.docker.com/engine/install/"&gt;docker&lt;/a&gt;: Platform and runtime environment for container virtualization.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://minikube.sigs.k8s.io/docs/start/"&gt;minikube&lt;/a&gt;: A local Kubernetes environment for development.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://helm.sh/docs/intro/install/"&gt;helm&lt;/a&gt;: A package manager for organizing software and systems developed for Kubernetes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Preparations
&lt;/h2&gt;

&lt;p&gt;minikube is very easy to use: a single command starts a local Kubernetes cluster that offers all the features of Kubernetes except high scalability. Before starting it, we assign 4 CPUs and about 8 GB of memory to the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; minikube config &lt;span class="nb"&gt;set &lt;/span&gt;cpus 4

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; minikube config &lt;span class="nb"&gt;set &lt;/span&gt;memory 8184

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; minikube start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
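
&lt;p&gt;To confirm that the settings were picked up and the cluster is running, you can query minikube itself. This is a minimal check; the exact output depends on your minikube version:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; minikube config view

$&amp;gt; minikube status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;minikube config view&lt;/code&gt; lists the values set with &lt;code&gt;minikube config set&lt;/code&gt;. These values only apply when a cluster is created, so an existing cluster has to be removed with &lt;code&gt;minikube delete&lt;/code&gt; before the new CPU and memory settings take effect.&lt;/p&gt;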



&lt;h3&gt;
  
  
  1.1 Enable ingress in minikube
&lt;/h3&gt;

&lt;p&gt;To access the Apache NiFi services via URL later on, we also need to enable minikube's ingress controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; minikube addons &lt;span class="nb"&gt;enable &lt;/span&gt;ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
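
&lt;p&gt;Enabling the addon deploys the NGINX ingress controller as a pod. Before relying on it, you can wait until the controller is ready. This sketch assumes that your minikube version places the controller in the &lt;code&gt;ingress-nginx&lt;/code&gt; namespace with the label &lt;code&gt;app.kubernetes.io/component=controller&lt;/code&gt;. Older versions used &lt;code&gt;kube-system&lt;/code&gt; instead. Because the alias from section 1.2 is not set up yet, the command calls &lt;code&gt;minikube kubectl --&lt;/code&gt; directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; minikube kubectl -- wait --namespace ingress-nginx \
     --for=condition=ready pod \
     --selector=app.kubernetes.io/component=controller \
     --timeout=120s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;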



&lt;h3&gt;
  
  
  1.2 Integrate &lt;code&gt;kubectl&lt;/code&gt; for minikube
&lt;/h3&gt;

&lt;p&gt;Another useful feature of minikube is that it always ships with a matching &lt;code&gt;kubectl&lt;/code&gt; client. You can invoke it with &lt;code&gt;minikube kubectl&lt;/code&gt;, and it behaves exactly like a standalone installation of &lt;code&gt;kubectl&lt;/code&gt;. It is therefore convenient to define an alias for this command and to enable auto-completion for &lt;code&gt;kubectl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'alias kubectl="minikube kubectl --"'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'source &amp;lt;(kubectl completion bash)'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/.bashrc

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once these steps are done, &lt;code&gt;kubectl&lt;/code&gt; is set up to work against minikube. To test it, run the following command (auto-completion with &lt;code&gt;Tab&lt;/code&gt; should work as well):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; kubectl version &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1.3 Install cert-manager in minikube
&lt;/h3&gt;

&lt;p&gt;cert-manager is a framework for managing X.509 certificates within a Kubernetes cluster. It simplifies obtaining, renewing and using certificates.&lt;/p&gt;

&lt;p&gt;To protect the NiFi cluster from unauthorized access and to encrypt the communication between the NiFi nodes, we use cert-manager to issue the required certificates.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://cert-manager.io/docs/installation/#default-static-install"&gt;installation of the cert-manager&lt;/a&gt; can be done with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
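
&lt;p&gt;cert-manager takes a short while to become operational, and the helm charts in the following sections create cert-manager resources. As a precaution, you can wait for its deployments to become available. The namespace &lt;code&gt;cert-manager&lt;/code&gt; is the one created by the static manifest above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; kubectl wait --namespace cert-manager --for=condition=Available deployment --all --timeout=300s

$&amp;gt; kubectl get pods --namespace cert-manager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;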



&lt;h3&gt;
  
  
  1.4 Map NiFi domains to minikube
&lt;/h3&gt;

&lt;p&gt;The ingress controller routes requests by URL within the Kubernetes cluster, but those requests first have to reach minikube. Once the setup is complete, you will be able to open the following two addresses in your browser:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;nifi.example.org&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nifi-registry.example.org&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can use the following command to add both mappings to the &lt;code&gt;/etc/hosts&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | sudo tee -a /etc/hosts
# Map nifi.example.org and nifi-registry.example.org to minikube ip
`minikube ip`   nifi.example.org
`minikube ip`   nifi-registry.example.org
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
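
&lt;p&gt;To check that the entries contain the resolved IP address and not the literal text &lt;code&gt;`minikube ip`&lt;/code&gt;, you can resolve both names through the system resolver and compare the result with the output of &lt;code&gt;minikube ip&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; getent hosts nifi.example.org nifi-registry.example.org

$&amp;gt; minikube ip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The minikube IP address may change when the cluster is recreated. In that case, the two entries in &lt;code&gt;/etc/hosts&lt;/code&gt; have to be updated.&lt;/p&gt;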



&lt;h3&gt;
  
  
  1.5 Register OpenID Connect (OIDC) clients
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://openid.net/developers/how-connect-works/"&gt;OpenID Connect (OIDC)&lt;/a&gt; is a protocol for secure user authentication and information sharing, where the provider performs authentication on behalf of the application. OIDC has become a standard and is offered by many large platform providers such as Google, PayPal but also GitLab.&lt;br&gt;
Moreover, OIDC is gaining popularity within organizations. Solutions such as &lt;a href="https://www.keycloak.org/"&gt;Keycloak&lt;/a&gt; or &lt;a href="https://www.authelia.com/"&gt;Authelia&lt;/a&gt; offer a convenient ways to provide OpenId Connect on the basis of e.g. LDAP.&lt;/p&gt;

&lt;p&gt;Both Apache NiFi and the Apache NiFi Registry support OpenID Connect to authenticate their users. This means we have to register two clients with an OIDC provider.&lt;/p&gt;

&lt;p&gt;In this article I use GitLab as the OIDC provider, but any other provider can be used as well. Essentially only the provider's URLs change; the required information remains the same.&lt;/p&gt;

&lt;p&gt;With GitLab, registering an OIDC client is very easy: open your GitLab profile and create two new applications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NiFi OIDC Client:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Name&lt;/code&gt;: NiFi&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Redirect URI&lt;/code&gt;: &lt;a href="https://nifi.example.org/nifi-api/access/oidc/callback"&gt;https://nifi.example.org/nifi-api/access/oidc/callback&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Scopes&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;openid&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;email&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;NiFi Registry OIDC Client:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Name&lt;/code&gt;: NiFi Registry&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Redirect URI&lt;/code&gt;: &lt;a href="https://nifi-registry.example.org/nifi-registry-api/access/oidc/callback"&gt;https://nifi-registry.example.org/nifi-registry-api/access/oidc/callback&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Scopes&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;openid&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;email&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each registration, save the following information. You will need it later to configure the NiFi services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Placeholder&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discovery URL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;discovery_url&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://gitlab.com/.well-known/openid-configuration"&gt;https://gitlab.com/.well-known/openid-configuration&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Application ID&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;application_id&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;c9515c774fa1036cbcae5de455a23cc6ca7da54109a858f5b2c6869a89d40f08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secret&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;secret&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;90b45e16b759fa097461917e7ef3df2c79916b548e1dc44f8d1c2b2c8a8c5537&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitLab email&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;registered_email&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="mailto:john.doe@example.org"&gt;john.doe@example.org&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"The OIDC standard uses a discovery endpoint (&lt;code&gt;discovery_url&lt;/code&gt;) to supply clients with configuration information from the OIDC server. This endpoint URL consistently ends with &lt;code&gt;.well-known/openid-configuration&lt;/code&gt; but might have a unique prefix path depending on the provider.&lt;/p&gt;

&lt;p&gt;For instance, Keycloak includes additional realm information in its discovery URL: &lt;code&gt;https://{keycloakhost}:{keycloakport}/realms/{realm}/.well-known/openid-configuration&lt;/code&gt;.&lt;/p&gt;
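
&lt;p&gt;As an illustration, the Keycloak pattern can be expressed as a small shell function. The function name and the example host, port and realm below are hypothetical placeholders, not values used elsewhere in this article:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: builds a Keycloak discovery URL from host, port and realm
keycloak_discovery_url() {
  printf 'https://%s:%s/realms/%s/.well-known/openid-configuration\n' "$1" "$2" "$3"
}

# Example call with placeholder values
keycloak_discovery_url keycloak.example.org 8443 nifi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Whatever the provider, you can fetch the discovery document, for example with &lt;code&gt;curl -s https://gitlab.com/.well-known/openid-configuration&lt;/code&gt;. Before entering the URL into the NiFi configuration, check that the response is JSON containing standard fields such as &lt;code&gt;issuer&lt;/code&gt; and &lt;code&gt;authorization_endpoint&lt;/code&gt;.&lt;/p&gt;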
&lt;h2&gt;
  
  
  2. Set up an Apache NiFi cluster
&lt;/h2&gt;

&lt;p&gt;One major advantage of Kubernetes standardization is the availability of numerous preconfigured software packages, including entire software systems packaged as helm charts. Fortunately, there is a helm chart for Apache NiFi, which simplifies setting up an entire cluster.&lt;/p&gt;

&lt;p&gt;You can access this package through a public helm repository, which you can conveniently add to your local helm chart sources using these commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm repo add cetic https://cetic.github.io/helm-charts

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helm chart provides a wide range of configuration options, all documented in the associated GitHub project &lt;a href="https://github.com/cetic/helm-nifi"&gt;helm-nifi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The following configuration deploys a secure NiFi cluster with two nodes and OIDC authentication (the placeholders &lt;code&gt;&amp;lt;application_id&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;secret&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;registered_email&amp;gt;&lt;/code&gt; are defined in section 1.5):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;fullnameOverride&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.18.0&lt;/span&gt;
&lt;span class="na"&gt;replicaCount&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;sensitiveKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;changeMechangeMe&lt;/span&gt;
  &lt;span class="na"&gt;isNode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;webProxyHost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi.example.org&lt;/span&gt;
&lt;span class="na"&gt;certManager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="c1"&gt;## Uncomment the next two lines only if you have&lt;/span&gt;
  &lt;span class="c1"&gt;## installed the NiFi registry&lt;/span&gt;
  &lt;span class="c1"&gt;# caSecrets:&lt;/span&gt;
  &lt;span class="c1"&gt;#   - nifi-registry-ca&lt;/span&gt;
&lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;registered_email&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;oidc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;discoveryUrl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://gitlab.com/.well-known/openid-configuration&lt;/span&gt;
    &lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;application_id&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;secret&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;claimIdentifyingUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
    &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;registered_email&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;persistence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;subPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/app-root&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/nifi&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/backend-protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPS&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/affinity-mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;persistent&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cookie"&lt;/span&gt;
    &lt;span class="na"&gt;cert-manager.io/issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nifi-ca"&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nifi.example.org&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nifi.example.org&lt;/span&gt;
      &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi-example-crt-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
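
&lt;p&gt;Instead of replacing the placeholders by hand, you can substitute them with &lt;code&gt;sed&lt;/code&gt; after saving the configuration as &lt;code&gt;nifi_values.yaml&lt;/code&gt;. This sketch assumes that you have exported the values of the NiFi client from section 1.5 as the hypothetical shell variables &lt;code&gt;NIFI_CLIENT_ID&lt;/code&gt;, &lt;code&gt;NIFI_CLIENT_SECRET&lt;/code&gt; and &lt;code&gt;GITLAB_EMAIL&lt;/code&gt;. It also replaces the example &lt;code&gt;sensitiveKey&lt;/code&gt; with a random value generated by &lt;code&gt;openssl&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; sed -i \
     -e "s|&amp;lt;application_id&amp;gt;|$NIFI_CLIENT_ID|g" \
     -e "s|&amp;lt;secret&amp;gt;|$NIFI_CLIENT_SECRET|g" \
     -e "s|&amp;lt;registered_email&amp;gt;|$GITLAB_EMAIL|g" \
     -e "s|changeMechangeMe|$(openssl rand -hex 16)|" \
     nifi_values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run this only once, before the first deployment. NiFi uses the sensitive key to encrypt sensitive properties in the flow, so changing it later prevents NiFi from decrypting values encrypted with the old key.&lt;/p&gt;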



&lt;p&gt;Now you can deploy the Apache NiFi cluster with the given configuration file &lt;code&gt;nifi_values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; nifi_values.yaml nifi cetic/nifi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open your browser and enter the address &lt;code&gt;https://nifi.example.org&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Depending on your PC and internet connection, the download of all Docker images and the startup of the entire system can take several minutes.&lt;/p&gt;

&lt;p&gt;You can check the current progress by entering:&lt;/p&gt;


&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; kubectl get pods
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Once all pods are running and every container is reported as ready in the READY column, you should be able to access the service via the browser.&lt;/p&gt;

&lt;p&gt;You may have to accept your browser's certificate warning.&lt;/p&gt;
&lt;/blockquote&gt;
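
&lt;p&gt;Instead of polling &lt;code&gt;kubectl get pods&lt;/code&gt;, you can also block until all NiFi nodes are ready. This assumes that the chart deploys the nodes as a StatefulSet named after &lt;code&gt;fullnameOverride&lt;/code&gt;, that is, &lt;code&gt;nifi&lt;/code&gt;. The second command lists the certificates requested from cert-manager, which should eventually report &lt;code&gt;True&lt;/code&gt; in the READY column:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; kubectl rollout status statefulset/nifi --timeout=15m

$&amp;gt; kubectl get certificates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;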

&lt;h2&gt;
  
  
  3. Set up an Apache NiFi Registry
&lt;/h2&gt;

&lt;p&gt;As with the NiFi cluster, there is a helm chart for the NiFi Registry. Add its repository to your local helm sources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm repo add dysnix https://dysnix.github.io/charts/

&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm repo update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
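
&lt;p&gt;Like the NiFi chart, this chart offers more options than the configuration below uses. To look up the available chart versions and the default values, you can query the chart directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; helm search repo dysnix/nifi-registry --versions

$&amp;gt; helm show values dysnix/nifi-registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;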



&lt;p&gt;The following configuration deploys a secure NiFi registry with OIDC authentication (the placeholders &lt;code&gt;&amp;lt;application_id&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;secret&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;registered_email&amp;gt;&lt;/code&gt; are defined in section 1.5):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;fullnameOverride&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi-registry&lt;/span&gt;
&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.18.0&lt;/span&gt;
&lt;span class="na"&gt;security&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;needClientAuth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;registered_email&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;certManager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;replaceDefaultTrustStore&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;caSecrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nifi-ca&lt;/span&gt;
  &lt;span class="na"&gt;additionalDnsNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nifi-registry&lt;/span&gt;
&lt;span class="na"&gt;oidc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;discoveryUrl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://gitlab.com/.well-known/openid-configuration&lt;/span&gt;
  &lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;application_id&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;secret&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;claimIdentifyingUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
  &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;registered_email&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;persistence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/app-root&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/nifi-registry&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/backend-protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HTTPS&lt;/span&gt;
    &lt;span class="na"&gt;cert-manager.io/issuer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;nifi-registry-ca"&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi-registry.example.org&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
          &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
  &lt;span class="na"&gt;tls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nifi-registry.example.org&lt;/span&gt;
      &lt;span class="na"&gt;secretName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nifi-reistry-example-crt-secret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can deploy the Apache NiFi Registry with the given configuration file &lt;code&gt;nifi_reg_values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; nifi_reg_values.yaml nifi-reg dysnix/nifi-registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
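
&lt;p&gt;The next section relies on the CA secrets that cert-manager creates for both services. Assuming the secret names referenced in the two configurations above (&lt;code&gt;nifi-ca&lt;/code&gt; and &lt;code&gt;nifi-registry-ca&lt;/code&gt;), you can check that they exist before continuing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$&amp;gt; kubectl get secrets nifi-ca nifi-registry-ca
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If a secret does not exist yet, &lt;code&gt;kubectl&lt;/code&gt; reports a NotFound error for it. In that case, wait until the corresponding deployment has finished.&lt;/p&gt;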



&lt;p&gt;Open your browser and enter the address &lt;code&gt;https://nifi-registry.example.org&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Configure Apache NiFi
&lt;/h2&gt;

&lt;p&gt;Since we have enabled authentication with the NiFi Registry, the NiFi cluster must also authenticate itself to the registry. For this to happen, both services need to trust each other.&lt;/p&gt;

&lt;p&gt;To do this, uncomment the &lt;code&gt;caSecrets&lt;/code&gt; configuration block in &lt;code&gt;nifi_values.yaml&lt;/code&gt;. This imports the CA certificate of the newly deployed NiFi Registry into the truststore of the NiFi nodes. To apply the change, you need to deploy the configuration again. If you copied the complete &lt;code&gt;nifi_values.yaml&lt;/code&gt; from above, you can use the following command to uncomment the lines and redeploy the NiFi nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/#[^#]//'&lt;/span&gt; nifi_values.yaml &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; helm upgrade &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; nifi_values.yaml nifi cetic/nifi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
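&lt;p&gt;Before rewriting the file in place, you can preview what the &lt;code&gt;sed&lt;/code&gt; expression does by running it on a few sample lines without &lt;code&gt;-i&lt;/code&gt;. The expression &lt;code&gt;s/#[^#]//&lt;/code&gt; deletes, on every line, the first &lt;code&gt;#&lt;/code&gt; that is followed by a character other than &lt;code&gt;#&lt;/code&gt;, together with that character. The sketch below assumes that each commented-out line starts with a hash followed by one space; the sample lines and the secret name &lt;code&gt;nifi-registry-ca&lt;/code&gt; are hypothetical and not taken from the real &lt;code&gt;nifi_values.yaml&lt;/code&gt;.&lt;/p&gt;

```shell
#!/bin/sh
# Same expression as in the command above: delete the first '#' that is
# followed by a non-'#' character, together with that character.
uncomment() {
  sed 's/#[^#]//'
}

# Hypothetical excerpt: one regular line and a commented-out block.
# The real caSecrets entries in nifi_values.yaml may look different.
printf '%s\n' \
  'replicaCount: 2' \
  '# caSecrets:' \
  '#   - nifi-registry-ca' | uncomment

# Pitfall: without a space after '#', the first letter is removed as well.
printf '%s\n' '#caSecrets:' | uncomment
```

&lt;p&gt;Because the expression is applied to every line, any other single &lt;code&gt;#&lt;/code&gt; in the file (a regular comment or a &lt;code&gt;#&lt;/code&gt; inside a value) would be altered too, so the one-liner is only safe if the commented-out block is the only place where such characters occur.&lt;/p&gt;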



&lt;p&gt;After the cluster nodes have restarted, you need to add the NiFi Registry as a registry client in the NiFi settings. Open the burger menu at the top right of the &lt;a&gt;NiFi UI&lt;/a&gt; and select &lt;em&gt;"Controller Settings"&lt;/em&gt;. Go to the &lt;em&gt;"REGISTRY CLIENTS"&lt;/em&gt; tab and add a new client using the &lt;em&gt;"+"&lt;/em&gt; symbol.&lt;br&gt;
Give it a name and an optional description and save it.&lt;br&gt;
Then edit the newly created entry, go to the &lt;em&gt;"PROPERTIES"&lt;/em&gt; tab, and set the URL to &lt;code&gt;https://nifi-registry:18080&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzavdyufx87a2wmiiv9k0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzavdyufx87a2wmiiv9k0.png" alt="Register Registry" width="763" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The URL &lt;code&gt;https://nifi-registry:18080&lt;/code&gt; is the internal service address within the cluster. Do not use the external URL &lt;code&gt;https://nifi-registry.example.org&lt;/code&gt;; otherwise, node authentication will fail.&lt;/p&gt;
&lt;/blockquote&gt;
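&lt;p&gt;For reference, a Kubernetes Service can be addressed from inside the cluster under several DNS names. The short form &lt;code&gt;nifi-registry&lt;/code&gt; only resolves from pods in the same namespace as the service; from another namespace you need at least the namespace-qualified form. The sketch below assumes the default cluster domain &lt;code&gt;cluster.local&lt;/code&gt;; the function name &lt;code&gt;service_urls&lt;/code&gt; and the namespace &lt;code&gt;nifi&lt;/code&gt; are hypothetical. In general, the name you choose must also match the registry's TLS certificate, or hostname verification will fail.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: print the in-cluster URLs under which a Kubernetes Service is
# reachable. Assumes the default cluster domain "cluster.local".
service_urls() {
  svc=$1 ns=$2 port=$3
  echo "https://$svc:$port"                        # same namespace only
  echo "https://$svc.$ns:$port"                    # from any namespace
  echo "https://$svc.$ns.svc.cluster.local:$port"  # fully qualified
}

# Hypothetical namespace "nifi" for the registry service.
service_urls nifi-registry nifi 18080
```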

&lt;p&gt;Now only node authentication remains. Each NiFi node is represented as a separate user, which you can find by opening the burger menu at the top right of the &lt;a&gt;NiFi UI&lt;/a&gt; and selecting &lt;em&gt;"Users"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ivap226m8lmo52zenmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ivap226m8lmo52zenmu.png" alt="NiFi Users" width="551" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These exact users must now also be created and authorized in the NiFi Registry. To do this, go to the &lt;a href="https://nifi-registry.example.org"&gt;NiFi Registry UI&lt;/a&gt; and click &lt;em&gt;"LOGIN"&lt;/em&gt; in the upper right corner.&lt;/p&gt;

&lt;p&gt;Once GitLab has authenticated you, you are redirected back to the NiFi Registry, where your username should appear along with a &lt;em&gt;"Settings"&lt;/em&gt; icon.&lt;/p&gt;

&lt;p&gt;Click this icon and create a new &lt;em&gt;"Bucket"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xrsn5t0fy2tnvdwwyu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xrsn5t0fy2tnvdwwyu3.png" alt="Create Bucket" width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then switch to the &lt;em&gt;"USERS"&lt;/em&gt; tab and create the NiFi node users with read permission on &lt;em&gt;"Can manage buckets"&lt;/em&gt; and read, write, and delete permissions on &lt;em&gt;"Can proxy user requests"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5346syvpsunnx1k8v5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5346syvpsunnx1k8v5p.png" alt="Set User Permissions" width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If all steps were completed successfully, you should now see the "Sample Bucket" in the &lt;a&gt;NiFi UI&lt;/a&gt; when you import a &lt;em&gt;"Process Group"&lt;/em&gt; from the registry: drag and drop a &lt;code&gt;Process Group&lt;/code&gt; from the header menu onto the canvas and click &lt;code&gt;Import from Registry&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuufvdztf18jpheujm1t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuufvdztf18jpheujm1t8.png" alt="Import Process Group from Registry" width="764" height="566"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Limitations and Troubleshooting
&lt;/h2&gt;

&lt;p&gt;The Helm charts &lt;code&gt;cetic/nifi&lt;/code&gt; (&lt;code&gt;1.1.4&lt;/code&gt;) and &lt;code&gt;dysnix/nifi-registry&lt;/code&gt; (&lt;code&gt;1.1.4&lt;/code&gt;) are very helpful, but they have some limitations.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The latest NiFi and NiFi Registry Docker image versions that work with these charts are &lt;code&gt;1.18.0&lt;/code&gt;. Later images use Java 11, which changes the default truststore format from JKS to PKCS12 (a breaking change).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When deploying the NiFi Registry, a permission error may occur on the &lt;code&gt;auth-conf&lt;/code&gt; folder. This is caused by a &lt;a href="https://github.com/dysnix/charts/blob/eb70bc4270a4a6b491bc9cf3850b7e907e898c6d/dysnix/nifi-registry/templates/statefulset.yaml#L33"&gt;bug in the Helm chart&lt;/a&gt;, which does not set the permissions on this folder. Either set the permissions on this folder manually once, or add the folder to the &lt;code&gt;initContainers&lt;/code&gt; script.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Because cert-manager is used, certificates are issued for the fully qualified internal Kubernetes host name. This name is composed &lt;a href="https://github.com/dysnix/charts/blob/eb70bc4270a4a6b491bc9cf3850b7e907e898c6d/dysnix/nifi-registry/templates/cert-manager.yaml#L23"&gt;as follows&lt;/a&gt;:&lt;br&gt;
&lt;code&gt;{{ fullname }}-{{ replicaCount }}.{{ fullname }}-headless.{{ namespace }}.svc.cluster.local&lt;/code&gt;.&lt;br&gt;
Since the common name (CN) of a certificate may not exceed 64 characters, and the fixed parts of this name (&lt;code&gt;-&lt;/code&gt;, &lt;code&gt;.&lt;/code&gt;, &lt;code&gt;-headless.&lt;/code&gt; and &lt;code&gt;.svc.cluster.local&lt;/code&gt;) already take up 30 characters, this leads to the following length restriction:&lt;br&gt;
&lt;code&gt;2 * length(fullname) + length(replicaCount) + length(namespace) &amp;lt; 35 characters&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;sensitiveKey&lt;/code&gt; and &lt;code&gt;clientSecret&lt;/code&gt; secrets cannot be passed as Kubernetes secrets and have to be set directly in the values files. For this reason, these configuration files should not be pushed to a version control system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;An existing NiFi cluster cannot be scaled automatically by simply increasing &lt;code&gt;replicaCount&lt;/code&gt;. After the new nodes have been deployed, you must also add them manually to the configuration of all cluster nodes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
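&lt;p&gt;The name length restriction from the third limitation can be checked before deploying. The following sketch assembles the host name from the pattern quoted above and compares it against the 64-character CN limit. The function names &lt;code&gt;registry_cn&lt;/code&gt; and &lt;code&gt;cn_fits&lt;/code&gt;, the &lt;code&gt;fullname&lt;/code&gt; value &lt;code&gt;nifi-registry&lt;/code&gt;, and the namespaces &lt;code&gt;default&lt;/code&gt; and &lt;code&gt;nifi-prod&lt;/code&gt; are hypothetical examples.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: check the certificate common name (CN) derived from the pattern
# "<fullname>-<replica>.<fullname>-headless.<namespace>.svc.cluster.local".
# Function names and example values are hypothetical.

# Print the host name for a given fullname, replica, and namespace.
registry_cn() {
  printf '%s-%s.%s-headless.%s.svc.cluster.local' "$1" "$2" "$1" "$3"
}

# Succeed if the given name fits into the 64-character CN limit.
cn_fits() {
  cn=$1
  [ "${#cn}" -le 64 ]
}

for ns in default nifi-prod; do
  name=$(registry_cn nifi-registry 0 "$ns")
  if cn_fits "$name"; then verdict=ok; else verdict=too-long; fi
  echo "${#name} chars ($verdict): $name"
done
```

&lt;p&gt;With &lt;code&gt;fullname&lt;/code&gt; set to &lt;code&gt;nifi-registry&lt;/code&gt;, the namespace &lt;code&gt;default&lt;/code&gt; produces a name of exactly 64 characters, so even the two extra characters of &lt;code&gt;nifi-prod&lt;/code&gt; already exceed the limit.&lt;/p&gt;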

&lt;h2&gt;
  
  
  6. Conclusion
&lt;/h2&gt;

&lt;p&gt;This article showed how to set up a secure Apache NiFi cluster with Apache NiFi Registry integration and discussed the limitations and pitfalls of this setup. If everything worked, you can now start modeling data integration flows and get familiar with Apache NiFi's core functionality. With this foundation, you are well equipped to use the platform for your data integration tasks.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
