<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: ahmedkadeh</title>
    <description>The latest articles on Forem by ahmedkadeh (@ahmedkadeh).</description>
    <link>https://forem.com/ahmedkadeh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F872112%2F8a0aced1-6097-4862-8ee8-f32085223f5a.jpeg</url>
      <title>Forem: ahmedkadeh</title>
      <link>https://forem.com/ahmedkadeh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ahmedkadeh"/>
    <language>en</language>
    <item>
      <title>Serverless data processing in real-time with Lambda &amp; MSK 🚀</title>
      <dc:creator>ahmedkadeh</dc:creator>
      <pubDate>Wed, 08 Jun 2022 18:58:30 +0000</pubDate>
      <link>https://forem.com/ahmedkadeh/serverless-data-processing-in-real-time-with-lambda-msk-32c9</link>
      <guid>https://forem.com/ahmedkadeh/serverless-data-processing-in-real-time-with-lambda-msk-32c9</guid>
      <description>&lt;p&gt;In recent times, the serverless computing model has become an architectural approach increasingly popular. It enables faster development of modern applications, analytics and backend solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use serverless implementation ?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It helps developers to focus on code rather than spending time in the configuration, managing, or scaling of backend infrastructure.&lt;/p&gt;

&lt;p&gt;Serverless allows you to build and run applications without giving a thought about the server. The provisioning, administration, and scaling of the application server are all managed dynamically by the cloud provider.&lt;/p&gt;

&lt;p&gt;Moreover, using serverless services allows to reduce drastically the bill.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;An overview of some AWS serverless service&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W7M8YRmt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r2q9e761oyq1l7lu05az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W7M8YRmt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r2q9e761oyq1l7lu05az.png" alt="AWS Serverless Services" width="880" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this post, I would like to share with you how to use a full serverless architecture to meet an analytics use case.&lt;/p&gt;

&lt;p&gt;As a big fan of AWS, I had to do this post using the technologies offered by this cloud provider but you will surely find the equivalents on GCP/AZURE 😇&lt;/p&gt;

&lt;p&gt;Everyone knows how complex it is to configure and set up a KAFKA cluster. AWS already offered a service called MSK, but since April 2, 2022, it has offered a version of this service in serverless mode. According to AWS&lt;br&gt;
it is the ideal solution to launch a project on Kafka when you are not able to predict the volume of data produced. &lt;/p&gt;

&lt;p&gt;For this post I decided to implement this simple and effective solution :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IR_oZpk6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qmsyc1ev7faat6wp0dfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IR_oZpk6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qmsyc1ev7faat6wp0dfo.png" alt="Architecture" width="880" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This implementation will be split in 3 steps :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Getting started with the serverless MSK service&lt;/li&gt;
&lt;li&gt;Setting up the producer&lt;/li&gt;
&lt;li&gt;Setting up the consumer within a Lambda Function&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's go 🥷&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1 : Getting started with MSK service
&lt;/h2&gt;

&lt;p&gt;It's easy to deploy a Serverless MSK instance, it's just a matter of checking the serverless box and providing a network configuration&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uuTrFRQE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ipemvbdb3y15hr68i7y3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uuTrFRQE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ipemvbdb3y15hr68i7y3.png" alt="msk-configuration-1" width="880" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zzo8fjpc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cixaln526tzplwg1qib1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zzo8fjpc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cixaln526tzplwg1qib1.png" alt="msk-configuration-2" width="880" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the serverless is active, you will found the bootstrap servers endpoint : &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MjNIYL9F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfi5qnjjteb1vl6l77fi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MjNIYL9F--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kfi5qnjjteb1vl6l77fi.png" alt="msk-bs-server-1" width="880" height="170"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--a82hRKsH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/woe6pk0vrmurskkbrzcs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--a82hRKsH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/woe6pk0vrmurskkbrzcs.png" alt="msk-bs-server-2" width="880" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep this property near we will use it in the producer or consumer.&lt;/p&gt;

&lt;p&gt;Note that the serverless version supports only AWS IAM for client authentication and authorization.&lt;/p&gt;

&lt;p&gt;The default configuration applied on a serverless cluster &lt;a href="https://docs.aws.amazon.com/msk/latest/developerguide/serverless-config.html"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;⚠️ The network configuration is not editable once the instance is deployed. &lt;strong&gt;AWS MSK must be deployed on secured VPC&lt;/strong&gt; and can’t be exposed publicly. &lt;/p&gt;
&lt;h2&gt;
  
  
  Step 2 : Setting up the producer
&lt;/h2&gt;

&lt;p&gt;To simulate a KAFKA producer, we will use an EC2 instance.&lt;/p&gt;

&lt;p&gt;It must be assigned an IAM role (using IAM policy below) and a network configuration allowing it to communicate with the MSK cluster. The instance must be in the same VPC as the KAFKA cluster.&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Producer IAM Policy&lt;/u&gt; &lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Once the EC2 instance is ready, you will need to install java as well as the kafka tools to test the connection and create our topic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Java 11 installation
sudo yum -y install java-11

## KAFKA Tools installation
wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz
tar -xzf kafka_2.12-2.8.1.tgz
cd kafka_2.12-2.8.1/bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's create our topic&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export BS=[BOOTSTRAP_SERVER_URL]
./kafka-topics.sh --bootstrap-server $BS --command-config client.properties --create --topic salesdata.topic.1 --partitions 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let’s produce data to our MSK topic by using the kafka-console-producer.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Download the sample sales data JSONL File
wget https://raw.githubusercontent.com/ahmedkadeh31/websales-msk/master/data/sales-data.jsonl -P ~/data/

## Send data to the topic
./kafka-console-producer.sh --bootstrap-server $BS --producer.config client.properties --topic salesdata.topic.1 &amp;lt; ~/data/sales-data.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's check if our topic received the data :&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Cloud watch metrics&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x6LBzwGx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tpn682nnyjg7x43zb3z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x6LBzwGx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tpn682nnyjg7x43zb3z0.png" alt="aws-cloudwatch-msk" width="880" height="635"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;u&gt;Kafka consumer result&lt;/u&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./kafka-console-consumer.sh --bootstrap-server $BS --consumer.config client.properties --topic salesdata.topic.1 --group test-group-1 --from-beginning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j_nYT7es--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vi9dsnhv3pbtu0m0zmoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j_nYT7es--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vi9dsnhv3pbtu0m0zmoa.png" alt="aws-consumer-test" width="880" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ Step 2 done! &lt;/p&gt;
&lt;h2&gt;
  
  
  Step 3 : Setting up the consumer Lambda Function
&lt;/h2&gt;

&lt;p&gt;Now that our topic is fed, let's focus on our consumer. &lt;/p&gt;

&lt;p&gt;When a message will be pushed in the topic, a trigger will invoke the lambda function.&lt;/p&gt;

&lt;p&gt;And to be fully serverless, our function will write the data in DynamoDB.&lt;/p&gt;

&lt;p&gt;The first action to do is to create an IAM role for the lambda. &lt;/p&gt;

&lt;p&gt;The IAM role need the following policies :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The policy created at the first step (for the producer) &lt;/li&gt;
&lt;li&gt;AWSLambdaMSKExecutionRole&lt;/li&gt;
&lt;li&gt;AmazonDynamoDBReadWriteAccess&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next we create the DynamoDB table called 'Sales':&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vaO8ykNW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/58spsbuccr7ecyctc00p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vaO8ykNW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/58spsbuccr7ecyctc00p.png" alt="dynamodb-table-creation" width="880" height="619"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's code our lambda function ⚙️&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Once our function is in place, we can test it by using an example event&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;&lt;u&gt;Results&lt;/u&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;START RequestId: d1fd60fb-32ce-4bdb-bf16-590a8bf835c6 Version: $LATEST
[INFO]  2022-06-08T14:38:06.998Z        Found credentials in environment variables.
[INFO]  2022-06-08T14:38:07.046Z    d1fd60fb-32ce-4bdb-bf16-590a8bf835c6    Processing partition : mytopic-0
[INFO]  2022-06-08T14:38:07.046Z    d1fd60fb-32ce-4bdb-bf16-590a8bf835c6    Sales data in message :  {'OrderId': 'NT5a6j3D', 'OrderDate': '2022-06-05 14:30:48Z', 'CustomerEmail': 'Joseph_Moore9429@kideod.biz', 'CustomerFirstName': 'Joseph', 'CustomerLastName': 'Moore', 'ItemsQuantity': 23, 'OrdersAmount': '775,154,468Sk', 'CustomerPhone': '0-351-588-1823', 'ShippingStreet': 'Bush  Road, 2355', 'ShippingCity': 'Otawa', 'ShippingCountry': 'Vanuatu'}
[INFO]  2022-06-08T14:38:07.046Z    d1fd60fb-32ce-4bdb-bf16-590a8bf835c6    Saving data in DynamoDB table  : Sales
END RequestId: d1fd60fb-32ce-4bdb-bf16-590a8bf835c6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--s8MI89Gd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ioh2qcxxefe11njgfuir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--s8MI89Gd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ioh2qcxxefe11njgfuir.png" alt="dynamodb-result" width="880" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great it works 🎉&lt;/p&gt;

&lt;p&gt;The last step is to set up the MSK trigger in order to invoke the lambda function when a message is published.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T1AkXW4t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mluyzn3ezvmt4hbny0bc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T1AkXW4t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mluyzn3ezvmt4hbny0bc.png" alt="msk-trigger-configuration" width="880" height="782"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once configured, wait a few minutes and you can run the producer to send data.&lt;/p&gt;

&lt;p&gt;To check that the function is working properly, take a look at the CloudWatch logs and the last processing result of the MSK trigger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8rQ9Vpm2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fxg2zda1zqqycq2z1kpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8rQ9Vpm2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fxg2zda1zqqycq2z1kpj.png" alt="msk-trigger-status" width="880" height="316"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;It was a good experience, this architecture is easy and quick to set up. By using MSK Serverless, there is no longer any need to spend hours configuring KAFKA, which saved us a lot of time.&lt;/p&gt;

&lt;p&gt;But the use of MSK in Serverless mode has limitations, in particular on authentication and exposure to the public, which limits its use in production. However, I think it's interesting (and profitable) to use this service for small streaming applications, development environment or setting up a POOC.&lt;/p&gt;

&lt;p&gt;Thanks for reading &amp;amp; hope that helps! 🤝&lt;/p&gt;

&lt;p&gt;You can find the project ressources on my &lt;a href="https://github.com/ahmedkadeh31/websales-msk"&gt;Github&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>lambda</category>
      <category>python</category>
    </item>
  </channel>
</rss>
