<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: clasnake</title>
    <description>The latest articles on Forem by clasnake (@clasnake).</description>
    <link>https://forem.com/clasnake</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F92300%2F0b1a8a5e-46c5-42da-b37f-4785c2dff432.jpeg</url>
      <title>Forem: clasnake</title>
      <link>https://forem.com/clasnake</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/clasnake"/>
    <language>en</language>
    <item>
      <title>What is ISR (In-Sync Replicas) in Kafka?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Wed, 05 Mar 2025 14:29:11 +0000</pubDate>
      <link>https://forem.com/clasnake/what-is-isr-in-sync-replicas-in-kafka-55if</link>
      <guid>https://forem.com/clasnake/what-is-isr-in-sync-replicas-in-kafka-55if</guid>
      <description>&lt;h2&gt;
  
  
  What is ISR?
&lt;/h2&gt;

&lt;p&gt;ISR (In-Sync Replicas) is a fundamental concept in Kafka that represents a set of replicas that are in sync with the leader partition. This set includes the leader replica itself and all follower replicas that are actively syncing with the leader. The ISR mechanism is crucial for ensuring high availability and data consistency in Kafka.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ISR Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Basic Concepts
&lt;/h3&gt;

&lt;p&gt;Each partition's ISR list contains two types of replicas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leader Replica: The primary replica that handles all read and write requests&lt;/li&gt;
&lt;li&gt;Follower Replicas: Secondary replicas that replicate data from the leader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key characteristics of the ISR mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ISR membership is dynamic and automatically adjusts based on replica sync status&lt;/li&gt;
&lt;li&gt;Only replicas in the ISR are eligible to become the new leader&lt;/li&gt;
&lt;li&gt;With &lt;code&gt;acks=all&lt;/code&gt;, writes are considered successful only after all ISR replicas confirm&lt;/li&gt;
&lt;li&gt;ISR changes are persisted and propagated by the cluster controller (via ZooKeeper in older clusters, or the KRaft controller quorum in newer ones)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's a concrete example:&lt;br&gt;
Consider a partition with 3 replicas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replica on Broker-1 is the Leader&lt;/li&gt;
&lt;li&gt;Replica on Broker-2 is in sync (lag &amp;lt; 5s, within the allowed window)&lt;/li&gt;
&lt;li&gt;Replica on Broker-3 is lagging (20s behind, beyond the allowed window)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case, the ISR list only includes replicas on Broker-1 and Broker-2. The replica on Broker-3 is temporarily removed from ISR. Once it catches up, it will automatically rejoin the ISR list.&lt;/p&gt;
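&lt;p&gt;You can see this in practice with the stock &lt;code&gt;kafka-topics.sh&lt;/code&gt; tool, which prints each partition's current ISR (the topic name and broker address below are placeholders):&lt;/p&gt;

```shell
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic

# Sample output (abbreviated): Broker-3 has dropped out of the ISR
# Topic: my-topic  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2
```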

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqk08we0ro2qli2s704q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqk08we0ro2qli2s704q.png" alt="Kafka ISR" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. ISR Membership Rules
&lt;/h3&gt;

&lt;p&gt;Requirements to join ISR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follower must have fully caught up with the leader within the window set by &lt;code&gt;replica.lag.time.max.ms&lt;/code&gt; (lag is measured in time, not message count)&lt;/li&gt;
&lt;li&gt;Follower must maintain active fetch requests to the leader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conditions for removal from ISR:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replica falls behind beyond the allowed time threshold&lt;/li&gt;
&lt;li&gt;Broker hosting the replica fails&lt;/li&gt;
&lt;li&gt;Replica encounters synchronization errors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ISR Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Core Settings
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Maximum allowed time for replica lag
&lt;/span&gt;&lt;span class="py"&gt;replica.lag.time.max.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;

&lt;span class="c"&gt;# Minimum number of in-sync replicas required
&lt;/span&gt;&lt;span class="py"&gt;min.insync.replicas&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;

&lt;span class="c"&gt;# Whether to allow non-ISR replicas to become leader
&lt;/span&gt;&lt;span class="py"&gt;unclean.leader.election.enable&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Producer Settings
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ensure writes to all ISR replicas
&lt;/span&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;

&lt;span class="c"&gt;# Number of retry attempts
&lt;/span&gt;&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
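The same guarantees can be set programmatically when constructing a producer. A minimal sketch using the Java client's standard property names; the broker address is a placeholder:

```java
import java.util.Properties;

public class ProducerSettings {
    // Reliability-focused producer settings mirroring the properties above.
    // "localhost:9092" is a placeholder broker address.
    public static Properties reliableProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");  // wait for confirmation from every ISR replica
        props.put("retries", "3"); // retry transient send failures
        return props;
    }
}
```

These properties are passed directly to the `KafkaProducer` constructor.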



&lt;h2&gt;
  
  
  ISR in Practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Reliability
&lt;/h3&gt;

&lt;p&gt;When producers use &lt;code&gt;acks=all&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa03y0dcgo2fy5h2s0vi5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa03y0dcgo2fy5h2s0vi5.png" alt="Kafka Data Reliability" width="800" height="683"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Leader Election
&lt;/h3&gt;

&lt;p&gt;When the leader replica fails:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpibcnua3c0hdtmoql7th.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpibcnua3c0hdtmoql7th.png" alt="Kafka Leader Election" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Issues and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Frequent ISR Shrinking
&lt;/h3&gt;

&lt;p&gt;Common causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network latency spikes leading to sync timeouts&lt;/li&gt;
&lt;li&gt;High system load on follower nodes&lt;/li&gt;
&lt;li&gt;Extended GC pauses disrupting sync processes&lt;/li&gt;
&lt;li&gt;Disk I/O bottlenecks affecting write performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recommended solutions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Parameter Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Increase &lt;code&gt;replica.lag.time.max.ms&lt;/code&gt; for networks with higher latency&lt;/li&gt;
&lt;li&gt;Adjust &lt;code&gt;replica.fetch.wait.max.ms&lt;/code&gt; based on network characteristics&lt;/li&gt;
&lt;li&gt;Increase &lt;code&gt;replica.fetch.max.bytes&lt;/code&gt; to optimize sync efficiency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Follower Performance Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor and address CPU usage spikes&lt;/li&gt;
&lt;li&gt;Optimize memory allocation and usage&lt;/li&gt;
&lt;li&gt;Consider SSD storage for improved I/O performance&lt;/li&gt;
&lt;li&gt;Isolate Kafka brokers from other resource-intensive services&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;JVM Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select an appropriate GC algorithm for the workload&lt;/li&gt;
&lt;li&gt;Optimize heap size configuration&lt;/li&gt;
&lt;li&gt;Implement comprehensive GC monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Network Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure adequate network bandwidth&lt;/li&gt;
&lt;li&gt;Monitor and address network latency issues&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Data Loss Risks
&lt;/h3&gt;

&lt;p&gt;Critical scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ISR set reduction to a single replica&lt;/li&gt;
&lt;li&gt;Enabled unclean leader election&lt;/li&gt;
&lt;li&gt;Network partitioning events&lt;/li&gt;
&lt;li&gt;Unexpected traffic surges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prevention strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Replica Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain minimum of 2 in-sync replicas&lt;/li&gt;
&lt;li&gt;Disable unclean leader election&lt;/li&gt;
&lt;li&gt;Implement regular replica status monitoring&lt;/li&gt;
&lt;li&gt;Deploy balanced replica distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitoring Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement ISR size change monitoring&lt;/li&gt;
&lt;li&gt;Track replica synchronization metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Capacity Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain adequate resource headroom&lt;/li&gt;
&lt;li&gt;Monitor cluster metrics&lt;/li&gt;
&lt;li&gt;Plan proactive scaling&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
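&lt;p&gt;Several of these safeguards can also be applied per topic rather than cluster-wide, using the standard &lt;code&gt;kafka-configs.sh&lt;/code&gt; tool (topic name and broker address are placeholders):&lt;/p&gt;

```shell
# Require at least 2 in-sync replicas for writes to this topic
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config min.insync.replicas=2
```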

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The ISR mechanism is fundamental to Kafka's reliability and high availability. Successful implementation requires balancing data durability with performance requirements. Each deployment should be tuned according to specific use cases and operational requirements.&lt;/p&gt;

&lt;p&gt;Related Topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-what-is-topic-and-partition" rel="noopener noreferrer"&gt;What are Topics and Partitions in Kafka?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-consumer-rebalance" rel="noopener noreferrer"&gt;How Does Kafka Consumer Rebalancing Work?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Practice Message Queue Interview Questions Like LeetCode&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>pubsub</category>
      <category>kafka</category>
    </item>
    <item>
      <title>What is Zero Copy in Kafka?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Mon, 03 Mar 2025 01:55:52 +0000</pubDate>
      <link>https://forem.com/clasnake/what-is-zero-copy-in-kafka-490p</link>
      <guid>https://forem.com/clasnake/what-is-zero-copy-in-kafka-490p</guid>
      <description>&lt;h2&gt;
  
  
  What is Zero Copy?
&lt;/h2&gt;

&lt;p&gt;Zero Copy is a technique that eliminates unnecessary data copying between memory regions by the CPU. In Kafka, this technology optimizes data transfer from disk files to the network, reducing redundant data copies and improving transmission efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional Copy vs. Zero Copy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Traditional Copy Process
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4dj5tlrw3qontrngzeg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4dj5tlrw3qontrngzeg.png" alt="Traditional Copy Process" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The traditional data copy process involves 4 copies and 4 context switches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Disk --&amp;gt; Kernel Buffer&lt;/li&gt;
&lt;li&gt;Kernel Buffer --&amp;gt; Application Buffer&lt;/li&gt;
&lt;li&gt;Application Buffer --&amp;gt; Socket Buffer&lt;/li&gt;
&lt;li&gt;Socket Buffer --&amp;gt; NIC Buffer&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Zero Copy Process
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo73a81cwkpezbrj6ei6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo73a81cwkpezbrj6ei6c.png" alt="Zero Copy Process" width="800" height="213"&gt;&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Zero Copy requires only 2 copies and 2 context switches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Disk --&amp;gt; Kernel Buffer&lt;/li&gt;
&lt;li&gt;Kernel Buffer --&amp;gt; NIC Buffer&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Performance Benefits of Zero Copy
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reduced CPU Copy Operations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decreased from 4 copies to 2&lt;/li&gt;
&lt;li&gt;Lower CPU utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fewer Context Switches&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced from 4 switches to 2&lt;/li&gt;
&lt;li&gt;Decreased system call overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enhanced Data Transfer Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct data flow from page cache to NIC&lt;/li&gt;
&lt;li&gt;Elimination of intermediate buffers&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Zero Copy Implementation in Kafka
&lt;/h2&gt;

&lt;p&gt;Kafka's Zero Copy implementation builds on two operating-system mechanisms exposed through Java NIO: memory mapping (&lt;code&gt;mmap&lt;/code&gt;, via &lt;code&gt;MappedByteBuffer&lt;/code&gt;) and the &lt;code&gt;sendfile&lt;/code&gt; system call (via &lt;code&gt;FileChannel.transferTo&lt;/code&gt;). These mechanisms offer different advantages for optimizing data transfer efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. mmap (Memory Mapping)
&lt;/h3&gt;

&lt;p&gt;Memory mapping allows direct access to kernel space memory from user space, eliminating the need to copy data between kernel and user space. This method is particularly effective for small file transfers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Implementing memory mapping using MappedByteBuffer&lt;/span&gt;
&lt;span class="nc"&gt;FileChannel&lt;/span&gt; &lt;span class="n"&gt;fileChannel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RandomAccessFile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"rw"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getChannel&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;MappedByteBuffer&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fileChannel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;FileChannel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MapMode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;READ_WRITE&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fileChannel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. sendfile
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;sendfile&lt;/code&gt; system call (available since Linux kernel 2.2) transfers data directly between two file descriptors without passing through user space. It is well suited to large transfers and is exposed in Java NIO through &lt;code&gt;FileChannel.transferTo&lt;/code&gt;, which Kafka uses to send log data straight to the network.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Implementing Zero Copy using transferTo&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;transferTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;FileChannel&lt;/span&gt; &lt;span class="n"&gt;sourceChannel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FileInputStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getChannel&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="nc"&gt;FileChannel&lt;/span&gt; &lt;span class="n"&gt;destChannel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FileOutputStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getChannel&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;sourceChannel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;transferTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sourceChannel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;destChannel&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
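Since Kafka's zero-copy benefit comes from sending file data to the network, a sketch closer to its actual use writes to a channel such as an already-connected `SocketChannel`. This is illustrative only; it loops because `transferTo` may send fewer bytes than requested:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class ZeroCopySend {
    // Stream an entire file to a writable channel (e.g. a connected
    // SocketChannel) using transferTo. Loops because transferTo may
    // transfer fewer bytes than requested in one call.
    public static long sendFile(FileChannel source, WritableByteChannel target)
            throws IOException {
        long position = 0;
        long size = source.size();
        while (size > position) {
            position += source.transferTo(position, size - position, target);
        }
        return position; // total bytes sent
    }
}
```

Accepting any `WritableByteChannel` keeps the method testable against a file target while still covering the socket case Kafka cares about.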



&lt;h3&gt;
  
  
  Comparison of Implementation Methods
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;mmap:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Suitable for small files, supports random access&lt;/li&gt;
&lt;li&gt;Cons: Higher memory usage, potential page faults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;sendfile:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: Optimal for large files, more efficient Zero Copy&lt;/li&gt;
&lt;li&gt;Cons: Data cannot be inspected or modified in transit; access is strictly sequential&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Applications in Kafka
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Log File Transfer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Brokers use Zero Copy to efficiently send log files directly to consumers&lt;/li&gt;
&lt;li&gt;Leverages sendfile for high-performance bulk log transfer&lt;/li&gt;
&lt;li&gt;Significantly reduces memory usage and CPU overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Message Production and Consumption
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Optimizes network transfer for large batch message production&lt;/li&gt;
&lt;li&gt;Enables efficient data retrieval during batch consumption&lt;/li&gt;
&lt;li&gt;Uses mmap for offset and time index files, enabling fast message lookups&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Cluster Data Synchronization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Facilitates efficient data transfer from Leader to Follower replicas&lt;/li&gt;
&lt;li&gt;Reduces network overhead in cross-datacenter replication&lt;/li&gt;
&lt;li&gt;Accelerates large-scale data migration processes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Strategic Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the mechanism by access pattern and size: mmap tends to suit smaller files, sendfile larger sequential transfers&lt;/li&gt;
&lt;li&gt;Apply appropriate methods per use case: sendfile for log transfer, mmap for random access&lt;/li&gt;
&lt;li&gt;Balance memory usage and performance: monitor available system memory&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track key metrics: CPU usage, memory utilization, I/O wait times&lt;/li&gt;
&lt;li&gt;Set appropriate alerts: trigger at 70% CPU or 80% memory usage&lt;/li&gt;
&lt;li&gt;Identify bottlenecks through I/O wait time analysis&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configuration Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tune system parameters: adjust vm.max_map_count, file descriptors&lt;/li&gt;
&lt;li&gt;Optimize memory allocation: configure JVM heap size, reserve page cache memory&lt;/li&gt;
&lt;li&gt;Fine-tune socket buffer sizes based on workload&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Operational Safeguards&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor file descriptor leaks&lt;/li&gt;
&lt;li&gt;Plan capacity based on growth projections&lt;/li&gt;
&lt;li&gt;Implement robust backup strategies&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
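&lt;p&gt;For the system-parameter tuning mentioned above, typical commands look like the following (the values are examples to adjust for your hardware and workload):&lt;/p&gt;

```shell
# Raise the memory-map limit; brokers mmap many index files
sudo sysctl -w vm.max_map_count=262144

# Raise the open-file-descriptor limit for the broker process
ulimit -n 100000
```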

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Zero Copy is a fundamental technology behind Kafka's high performance. By minimizing data copies and context switches, it significantly improves data transfer efficiency. Success in implementation requires careful consideration of use cases and ongoing performance monitoring.&lt;/p&gt;

&lt;p&gt;Related Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/problems/mq-kafka-high-performance" rel="noopener noreferrer"&gt;Kafka's High-Performance Design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Practice Message Queue Interview Questions Like LeetCode&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>pubsub</category>
      <category>programming</category>
    </item>
    <item>
      <title>How Does Kafka Log Compaction Work?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Tue, 25 Feb 2025 12:40:41 +0000</pubDate>
      <link>https://forem.com/clasnake/how-does-kafka-log-compaction-work-5949</link>
      <guid>https://forem.com/clasnake/how-does-kafka-log-compaction-work-5949</guid>
      <description>&lt;h2&gt;
  
  
  What is Log Compaction?
&lt;/h2&gt;

&lt;p&gt;Log Compaction is Kafka's intelligent way of managing data retention. Instead of simply deleting old messages, it keeps the most recent value for each message key while removing outdated values. This approach is especially valuable when you need to maintain the current state of your data, such as with database changes or configuration settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqfbxdj351nlkss7bea5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqfbxdj351nlkss7bea5.png" alt="Kafka Log Compaction" width="800" height="980"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Log Compaction Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Log Storage Structure
&lt;/h3&gt;

&lt;p&gt;For compaction purposes, Kafka treats each partition's log as two portions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean Segment&lt;/strong&gt;: Data that has been compacted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dirty Segment&lt;/strong&gt;: New data waiting for compaction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Compaction Process
&lt;/h3&gt;

&lt;p&gt;The compaction process consists of two main phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scanning Phase&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans through all messages in the Dirty segment&lt;/li&gt;
&lt;li&gt;Creates an index of message keys and their latest positions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cleaning Phase&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preserves only the most recent record for each key&lt;/li&gt;
&lt;li&gt;Removes outdated duplicate records&lt;/li&gt;
&lt;li&gt;Maintains the original message sequence&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3. Compaction Triggers
&lt;/h3&gt;

&lt;p&gt;Compaction kicks in when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dirty (uncompacted) ratio exceeds &lt;code&gt;min.cleanable.dirty.ratio&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A record has been waiting longer than &lt;code&gt;max.compaction.lag.ms&lt;/code&gt; to be compacted&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Configure Log Compaction?
&lt;/h2&gt;

&lt;p&gt;Here's how to set up log compaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable log compaction
&lt;/span&gt;&lt;span class="py"&gt;log.cleanup.policy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;compact&lt;/span&gt;

&lt;span class="c"&gt;# Set compaction check interval
&lt;/span&gt;&lt;span class="py"&gt;log.cleaner.backoff.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;30000&lt;/span&gt;

&lt;span class="c"&gt;# Set compaction trigger threshold
&lt;/span&gt;&lt;span class="py"&gt;log.cleaner.min.cleanable.ratio&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;0.5&lt;/span&gt;

&lt;span class="c"&gt;# Set compaction thread count
&lt;/span&gt;&lt;span class="py"&gt;log.cleaner.threads&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
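&lt;p&gt;Compaction can also be enabled per topic at creation time, which is more common than changing broker-wide defaults (topic name, partition and replica counts, and the dirty-ratio value below are examples):&lt;/p&gt;

```shell
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic user-profiles --partitions 3 --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5
```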



&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;Log compaction is best suited for the following scenarios:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Database Change Records
&lt;/h3&gt;

&lt;p&gt;Example of user information updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial record: &lt;code&gt;key=1001, value=John&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Update record: &lt;code&gt;key=1001, value=John Smith&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;After compaction: &lt;code&gt;key=1001, value=John Smith&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. System Configuration Management
&lt;/h3&gt;

&lt;p&gt;Example of connection settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial config: &lt;code&gt;key=max_connections, value=100&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Updated config: &lt;code&gt;key=max_connections, value=200&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;After compaction: &lt;code&gt;key=max_connections, value=200&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. State Data Storage
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Maintain latest entity states&lt;/li&gt;
&lt;li&gt;Save storage space&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Important Considerations
&lt;/h2&gt;

&lt;p&gt;When using log compaction, keep these points in mind:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Messages Must Have Keys&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compaction operates on message keys, so every record needs one&lt;/li&gt;
&lt;li&gt;Brokers reject keyless messages sent to a compacted topic&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Impact on System Performance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compaction process consumes system resources&lt;/li&gt;
&lt;li&gt;Configure parameters appropriately&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Message Order Guarantees&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages with the same key stay in order&lt;/li&gt;
&lt;li&gt;Ordering between different keys isn't guaranteed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Kafka's log compaction offers a smart way to manage data retention. It's perfect for cases where you only need the latest state of your data, helping you save storage space while keeping your data accessible. When properly configured, it can significantly improve your Kafka cluster's efficiency.&lt;/p&gt;

&lt;p&gt;Related Topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-what-is-topic-and-partition" rel="noopener noreferrer"&gt;What are Topics and Partitions in Kafka?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-consumer-rebalance" rel="noopener noreferrer"&gt;How Does Kafka Consumer Rebalance Work?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Practice Message Queue Interview Questions Like LeetCode&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>pubsub</category>
    </item>
    <item>
      <title>What is a Kafka Consumer Group?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Mon, 24 Feb 2025 03:45:46 +0000</pubDate>
      <link>https://forem.com/clasnake/what-is-a-kafka-consumer-group-22bk</link>
      <guid>https://forem.com/clasnake/what-is-a-kafka-consumer-group-22bk</guid>
      <description>&lt;h2&gt;
  
  
  What is a Consumer Group?
&lt;/h2&gt;

&lt;p&gt;A Consumer Group is Kafka's mechanism for organizing consumers to collectively process messages from topics. It enables multiple Consumer instances to work together, providing horizontal scalability while ensuring partition-level ordering guarantees.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Concept
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F131vc4cyv4h9wusilsto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F131vc4cyv4h9wusilsto.png" alt="Kafka Consumer Group" width="800" height="725"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features of Consumer Groups
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Consumption Division
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each partition is consumed by only one consumer within a group&lt;/li&gt;
&lt;li&gt;A single consumer can handle multiple partitions&lt;/li&gt;
&lt;li&gt;Automatic load balancing between group members&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Consumer Offset Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Consumption Progress Tracking:
├── Auto-commit: enable.auto.commit=true
└── Manual commit: enable.auto.commit=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
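As a sketch, manual offset commits look like this with the Java client. The group id, broker address, topic name, and deserializer choices are placeholders, and the poll loop is shown in outline:

```java
import java.util.Properties;

public class ManualCommitConsumer {
    // Consumer configuration for manual offset commits; broker address,
    // group id, and deserializers are typical placeholder choices.
    public static Properties manualCommitProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-processing-group");
        props.put("enable.auto.commit", "false"); // we commit offsets ourselves
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    // Outline of the poll/commit loop with the Kafka client library:
    //   KafkaConsumer consumer = new KafkaConsumer(manualCommitProps());
    //   consumer.subscribe(Collections.singletonList("orders"));
    //   while (true) {
    //       ConsumerRecords records = consumer.poll(Duration.ofMillis(500));
    //       ... process records ...
    //       consumer.commitSync(); // commit only after processing succeeds
    //   }
}
```

Committing only after successful processing trades possible duplicate delivery for at-least-once guarantees, which is usually what order-processing workloads want.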



&lt;h3&gt;
  
  
  3. Consumer Group Isolation
&lt;/h3&gt;

&lt;p&gt;Different Consumer Groups operate independently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic-A
├── Consumer Group 1: Order Processing
└── Consumer Group 2: Order Analytics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Message Broadcasting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;One message needs processing by multiple systems:
├── Group 1: Order System
├── Group 2: Logistics System
└── Group 3: Analytics System
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Load Balancing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When single consumer capacity is insufficient:
Topic: "User Registration"
├── Consumer 1: Processes 25% of messages
├── Consumer 2: Processes 25% of messages
├── Consumer 3: Processes 25% of messages
└── Consumer 4: Processes 25% of messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Consumer Group Configuration Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Basic Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Consumer Group ID
&lt;/span&gt;&lt;span class="py"&gt;group.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;order-processing-group&lt;/span&gt;

&lt;span class="c"&gt;# Commit mode
&lt;/span&gt;&lt;span class="py"&gt;enable.auto.commit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Session timeout
&lt;/span&gt;&lt;span class="py"&gt;session.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;

&lt;span class="c"&gt;# Heartbeat interval
&lt;/span&gt;&lt;span class="py"&gt;heartbeat.interval.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Consumer Count Planning
&lt;/h3&gt;

&lt;p&gt;Here are the key principles for configuring Consumer count:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Minimum&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At least 1 Consumer is required&lt;/li&gt;
&lt;li&gt;A single Consumer will be assigned every partition of the subscribed topics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maximum&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should not exceed the total partition count&lt;/li&gt;
&lt;li&gt;Additional Consumers will remain idle&lt;/li&gt;
&lt;li&gt;Results in wasted system resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Formula: Consumers ≈ total message rate ÷ single Consumer throughput&lt;/li&gt;
&lt;li&gt;Adjust according to actual load&lt;/li&gt;
&lt;li&gt;Maintain roughly a 30% capacity buffer&lt;/li&gt;
&lt;/ul&gt;
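&lt;p&gt;One way to turn the sizing guidance above into numbers (a sketch only; the message rates and per-consumer capacities are hypothetical, and real sizing should be validated under load):&lt;/p&gt;

```python
import math

def recommended_consumers(partition_count, msgs_per_sec, capacity_per_consumer, buffer=0.3):
    # Consumers needed for the load plus a capacity buffer,
    # capped at the partition count (extra consumers would sit idle).
    needed = math.ceil(msgs_per_sec * (1 + buffer) / capacity_per_consumer)
    return min(max(needed, 1), partition_count)

# 12 partitions, 10,000 msg/s, each consumer handles ~2,000 msg/s
print(recommended_consumers(12, 10_000, 2_000))  # 7
```

&lt;p&gt;The cap at the partition count reflects the rule from the previous section: any Consumer beyond the partition count receives no assignment.&lt;/p&gt;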

&lt;h3&gt;
  
  
  3. Consumer Offset Commit Strategy
&lt;/h3&gt;

&lt;p&gt;Consumer Offset is a crucial mechanism for tracking consumption progress. Each Consumer must periodically commit its consumption position (offset) to Kafka to ensure correct recovery after restarts or failures.&lt;/p&gt;

&lt;p&gt;Two main commit strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-commit&lt;/strong&gt;: Handled automatically by Kafka client, simple but may lose messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual commit&lt;/strong&gt;: Developer controls commit timing, higher reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example of manual commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Manual commit example&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Poll messages with 100ms timeout&lt;/span&gt;
    &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Process message&lt;/span&gt;
        &lt;span class="n"&gt;processMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// Manual commit after batch processing&lt;/span&gt;
    &lt;span class="c1"&gt;// commitSync() blocks until commit succeeds or fails&lt;/span&gt;
    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitSync&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Code explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;poll()&lt;/code&gt; to batch fetch messages&lt;/li&gt;
&lt;li&gt;Process each message in loop&lt;/li&gt;
&lt;li&gt;Commit offset only after all messages are processed&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;commitSync()&lt;/code&gt; for guaranteed commit results&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach, while slightly slower, ensures no message loss and is suitable for scenarios requiring high data reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Issues and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Too Many Consumers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Having more Consumers than partitions leaves the extra Consumers idle, wasting resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain Consumer count at or below partition count&lt;/li&gt;
&lt;li&gt;If higher parallelism is needed, consider adding partitions&lt;/li&gt;
&lt;li&gt;Assess actual processing capacity requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Consumption Skew
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Some Consumers are overloaded while others are idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review partition assignment strategy&lt;/li&gt;
&lt;li&gt;Consider adding partitions for finer-grained load balancing&lt;/li&gt;
&lt;li&gt;Optimize message key distribution to avoid hot partitions&lt;/li&gt;
&lt;li&gt;Monitor Consumer capacity and load&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Duplicate Processing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Messages are processed multiple times, affecting business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement message idempotency&lt;/li&gt;
&lt;li&gt;Use manual offset commit strategy&lt;/li&gt;
&lt;li&gt;Set appropriate commit intervals&lt;/li&gt;
&lt;li&gt;Implement business-level deduplication&lt;/li&gt;
&lt;/ul&gt;
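&lt;p&gt;The idempotency idea from the list above can be sketched with a simple seen-ID check (illustrative Python; a production system would persist the seen IDs in a database or cache rather than in memory):&lt;/p&gt;

```python
processed_ids = set()

def handle_message(msg_id, payload, results):
    # Business-level deduplication: skip any message whose ID
    # has already been processed, so redeliveries are harmless.
    if msg_id in processed_ids:
        return False
    processed_ids.add(msg_id)
    results.append(payload)
    return True

out = []
handle_message("order-1", "create", out)
handle_message("order-1", "create", out)  # redelivered after a retry: ignored
print(out)  # ['create']
```

&lt;p&gt;With this guard in place, at-least-once delivery plus manual offset commits becomes safe: a message reprocessed after a crash simply hits the dedup check.&lt;/p&gt;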

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Consumer Groups are a key mechanism for Kafka's scalability and fault tolerance. Through proper configuration and usage of Consumer Groups, we can build more reliable message processing systems.&lt;/p&gt;

&lt;p&gt;Related Topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-consumer-rebalance" rel="noopener noreferrer"&gt;How Does Kafka Consumer Rebalance Work?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-producer-retry" rel="noopener noreferrer"&gt;How to Use Kafka Producer Retries?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Practice Message Queue Interview Questions Like LeetCode&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kafka</category>
      <category>programming</category>
      <category>eventdriven</category>
      <category>pubsub</category>
    </item>
    <item>
      <title>How to Use Kafka Producer Retries?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Fri, 21 Feb 2025 10:19:14 +0000</pubDate>
      <link>https://forem.com/clasnake/how-to-use-kafka-producer-retries-pp1</link>
      <guid>https://forem.com/clasnake/how-to-use-kafka-producer-retries-pp1</guid>
      <description>&lt;h2&gt;
  
  
  Why Do We Need Retries?
&lt;/h2&gt;

&lt;p&gt;In distributed systems, network failures and server outages are inevitable. Kafka is no exception. The Producer retry mechanism is designed specifically to handle these temporary failures gracefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does Producer Retry Work?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkx110lc8awqvkxnxm2sg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkx110lc8awqvkxnxm2sg.png" alt="Kafka producer retry mechanism" width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Configuration Parameters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. retries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Number of retry attempts
&lt;/span&gt;&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3                   # Retry 3 times&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. retry.backoff.ms
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Time between retries
&lt;/span&gt;&lt;span class="py"&gt;retry.backoff.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;100        # Base retry interval of 100ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Newer Kafka clients apply exponential backoff to retries (bounded by &lt;code&gt;retry.backoff.max.ms&lt;/code&gt;), so the actual wait time roughly doubles with each retry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1st retry: 100ms wait&lt;/li&gt;
&lt;li&gt;2nd retry: 200ms wait&lt;/li&gt;
&lt;li&gt;3rd retry: 400ms wait&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents unnecessary rapid retries during sustained failures.&lt;/p&gt;
&lt;/blockquote&gt;
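&lt;p&gt;The backoff schedule above amounts to doubling a base interval up to a cap (a sketch of the pattern, not the client's internal code; the 1000ms cap stands in for a &lt;code&gt;retry.backoff.max.ms&lt;/code&gt;-style bound):&lt;/p&gt;

```python
def backoff_ms(attempt, base_ms=100, cap_ms=1000):
    # Exponential backoff: the base interval doubles on each
    # attempt (0-indexed), bounded above by a cap.
    return min(base_ms * 2 ** attempt, cap_ms)

print([backoff_ms(a) for a in range(4)])  # [100, 200, 400, 800]
```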

&lt;h3&gt;
  
  
  3. delivery.timeout.ms
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Total timeout for message delivery
&lt;/span&gt;&lt;span class="py"&gt;delivery.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;120000  # Wait up to 2 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. enable.idempotence
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable idempotence to prevent duplicate messages during retries
&lt;/span&gt;&lt;span class="py"&gt;enable.idempotence&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Strongly recommended for production environments. This ensures each message is written exactly once, even when retries occur.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Real-World Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Network Hiccup
&lt;/h3&gt;

&lt;p&gt;Here's how the retry mechanism handles network instability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Send message ❌ Network timeout
↓ 
Wait 100ms and retry
↓ 
Retry successful ✅ Message delivered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: Broker Failover
&lt;/h3&gt;

&lt;p&gt;When a broker fails, Kafka automatically elects a new leader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Send message ❌ Leader unavailable
↓ 
Wait 100ms (while leader election happens)
↓ 
Retry sending ✅ New leader online, message delivered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
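&lt;p&gt;Both scenarios follow the same shape: try, wait, retry, until success or the retry budget is exhausted. A generic sketch of that loop (illustrative Python; the real producer does this internally, and &lt;code&gt;flaky_send&lt;/code&gt; here is a made-up stand-in for a transient failure):&lt;/p&gt;

```python
import time

def send_with_retries(send, retries=3, base_backoff=0.1):
    # Retry transient failures with exponential backoff;
    # re-raise once the retry budget is exhausted.
    for attempt in range(retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == retries:
                raise
            time.sleep(base_backoff * 2 ** attempt)

attempts = []
def flaky_send():
    attempts.append(1)
    if len(attempts) == 1:  # first try hits a "network timeout"
        raise ConnectionError("timeout")
    return "delivered"

result = send_with_retries(flaky_send)
print(result)  # delivered
```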



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enable Idempotence&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prevents message duplication during retries&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;enable.idempotence=true&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Reasonable Retry Limits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Based on your business requirements&lt;/li&gt;
&lt;li&gt;Avoid infinite retries&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Monitor Retry Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track retry counts&lt;/li&gt;
&lt;li&gt;Set up alerting thresholds&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;A well-configured retry mechanism ensures message reliability while avoiding performance issues from excessive retries. It's a crucial component of any robust messaging system.&lt;/p&gt;

&lt;p&gt;Related Topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-what-is-topic-and-partition" rel="noopener noreferrer"&gt;What are Kafka Topics and Partitions?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-consumer-rebalance" rel="noopener noreferrer"&gt;How Does Kafka Consumer Rebalancing Work?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>eventdriven</category>
      <category>kafka</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>How Does Kafka Consumer Rebalance Work?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Fri, 21 Feb 2025 03:40:51 +0000</pubDate>
      <link>https://forem.com/clasnake/how-does-kafka-consumer-rebalance-work-4h57</link>
      <guid>https://forem.com/clasnake/how-does-kafka-consumer-rebalance-work-4h57</guid>
      <description>&lt;h2&gt;
  
  
  What is Consumer Rebalance?
&lt;/h2&gt;

&lt;p&gt;When you run Kafka with multiple consumers, you'll need to handle Consumer Rebalance. A rebalance happens when Kafka reassigns which consumer reads from which partition, usually when consumers join or leave the consumer group. Think of it as redistributing work when people join or leave your team. While this keeps things running smoothly, rebalancing too often slows everything down.&lt;/p&gt;

&lt;p&gt;Here's a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Initial Consumer Group State:
Consumer 1 --&amp;gt; Partition 0, 1
Consumer 2 --&amp;gt; Partition 2, 3

After Consumer 2 crashes:
Consumer 1 --&amp;gt; Partition 0, 1, 2, 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why do we need this?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load balancing&lt;/li&gt;
&lt;li&gt;High availability&lt;/li&gt;
&lt;li&gt;Fault tolerance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Triggers a Rebalance?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Consumer Group Membership Changes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You add a new consumer&lt;/li&gt;
&lt;li&gt;A consumer shuts down normally&lt;/li&gt;
&lt;li&gt;A consumer crashes unexpectedly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Topic Subscription Changes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Topic deletion&lt;/li&gt;
&lt;li&gt;Partition count changes&lt;/li&gt;
&lt;li&gt;Consumer subscription changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Manual Trigger by Admin
&lt;/h3&gt;

&lt;h2&gt;
  
  
  Rebalance Process
&lt;/h2&gt;

&lt;p&gt;Let's break down what happens during a rebalance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase 1: Group Membership Change
├── Consumers send JoinGroup request
├── Group Coordinator selects leader
└── Returns member info to leader

Phase 2: Partition Assignment
├── Leader determines assignment plan
├── Sends SyncGroup request
└── All members receive assignments

Phase 3: Start Consuming
├── Consumers get their partitions
├── Commit old offsets
└── Begin consuming from new partitions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Partition Assignment Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Range Strategy (Default)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic-A: 4 partitions
├── Consumer-1: Partition 0, 1
└── Consumer-2: Partition 2, 3

Good: Assigns nearby partitions together
Bad: Some consumers might get more work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. RoundRobin Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic-A: 4 partitions
├── Consumer-1: Partition 0, 2
└── Consumer-2: Partition 1, 3

Good: Each consumer gets equal work
Bad: Partitions are spread out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
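&lt;p&gt;The two strategies above can be sketched side by side (illustrative Python, not the actual RangeAssignor/RoundRobinAssignor implementations, which operate per topic across a whole subscription):&lt;/p&gt;

```python
def range_assign(partitions, consumers):
    # Range: contiguous blocks of partitions per consumer;
    # when the split is uneven, earlier consumers get one extra.
    n, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, c in enumerate(consumers):
        size = n + (1 if extra > i else 0)
        out[c] = partitions[start:start + size]
        start += size
    return out

def round_robin_assign(partitions, consumers):
    # Round-robin: partitions dealt out one at a time.
    out = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        out[consumers[i % len(consumers)]].append(p)
    return out

print(range_assign([0, 1, 2, 3], ["c1", "c2"]))        # {'c1': [0, 1], 'c2': [2, 3]}
print(round_robin_assign([0, 1, 2, 3], ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
```

&lt;p&gt;With an odd partition count the difference shows up clearly: range gives the first consumer the extra partition, while round-robin keeps the spread as even as the deal allows.&lt;/p&gt;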



&lt;h3&gt;
  
  
  3. Sticky Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Characteristics:
├── Shares work fairly
├── Keeps working assignments if possible
└── Moves partitions only when needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Optimization Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Proper Timeout Settings
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example configuration&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"session.timeout.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"10000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"heartbeat.interval.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"3000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"max.poll.interval.ms"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"300000"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Avoid Frequent Rebalancing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Set the right heartbeat timing&lt;/li&gt;
&lt;li&gt;Process messages quickly&lt;/li&gt;
&lt;li&gt;Use Static Membership when possible&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Monitoring and Alerts
&lt;/h3&gt;

&lt;p&gt;Watch out for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rebalance frequency&lt;/li&gt;
&lt;li&gt;Rebalance duration&lt;/li&gt;
&lt;li&gt;Consumer lag&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Issues and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Frequent Rebalancing
&lt;/h3&gt;

&lt;p&gt;Why it happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slow message processing&lt;/li&gt;
&lt;li&gt;Long GC pauses&lt;/li&gt;
&lt;li&gt;Network instability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fix it by:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Increase session.timeout.ms
2. Tune GC parameters
3. Enable Static Membership
4. Optimize message processing logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Slow Rebalance Process
&lt;/h3&gt;

&lt;p&gt;The usual suspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too many group members&lt;/li&gt;
&lt;li&gt;Too many subscribed topics&lt;/li&gt;
&lt;li&gt;Too many partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Control consumer group size
2. Use multiple consumer groups
3. Optimize partition assignment strategy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Understanding Rebalance is key to maintaining a healthy Kafka cluster. You'll likely get asked about it as part of Kafka interview questions too. When running in production, make sure to monitor rebalance events closely, adjust configurations as needed, and keep a watchful eye on your metrics.&lt;/p&gt;

&lt;p&gt;Related Resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nootcode.com/knowledge/en/kafka-what-is-topic-and-partition" rel="noopener noreferrer"&gt;What are Kafka Topics and Partitions?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Practice Message Queue Interview Questions Like LeetCode&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>programming</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>What are Topics and Partitions in Kafka?</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Thu, 20 Feb 2025 08:38:23 +0000</pubDate>
      <link>https://forem.com/clasnake/what-are-topics-and-partitions-in-kafka-31i4</link>
      <guid>https://forem.com/clasnake/what-are-topics-and-partitions-in-kafka-31i4</guid>
      <description>&lt;h2&gt;
  
  
  What is a Topic?
&lt;/h2&gt;

&lt;p&gt;A Topic is Kafka's fundamental building block for organizing messages. It's essentially a named feed or channel through which messages flow. If Kafka were a post office, Topics would be like different mailboxes, each dedicated to a specific type of message.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Partition?
&lt;/h2&gt;

&lt;p&gt;Each Topic can be divided into multiple Partitions, which is a key feature for scalability. Think of it as splitting a busy highway into multiple lanes. Here's why Partitions are important:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Processing&lt;/strong&gt; - Each Partition operates independently, similar to multiple CPU cores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load Distribution&lt;/strong&gt; - Data is spread across your cluster, preventing single-server bottlenecks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Throughput&lt;/strong&gt; - Multiple Partitions enable concurrent operations for better performance&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Partition Storage Model
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic: "Order Messages"
├── Partition 0: [Order1] -&amp;gt; [Order2] -&amp;gt; [Order3]
├── Partition 1: [Order4] -&amp;gt; [Order5] -&amp;gt; [Order6]
└── Partition 2: [Order7] -&amp;gt; [Order8] -&amp;gt; [Order9]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each message in a Partition receives a unique offset number, which serves as its sequential identifier within that Partition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partition Replication Mechanism
&lt;/h3&gt;

&lt;p&gt;For fault tolerance, Kafka maintains multiple copies of each Partition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Leader Replica&lt;/strong&gt; - The primary copy that handles all read/write operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follower Replicas&lt;/strong&gt; - Backup copies that maintain synchronization and provide failover capability
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Partition 0
├── Leader (Server 1)
├── Follower (Server 2)
└── Follower (Server 3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Producer Assignment Strategies
&lt;/h3&gt;

&lt;p&gt;Producers use several strategies to distribute messages across Partitions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Round-Robin&lt;/strong&gt; - Distributes messages evenly across Partitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key-Based&lt;/strong&gt; - Routes messages with the same key to the same Partition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Logic&lt;/strong&gt; - Implements specific routing rules based on business requirements&lt;/li&gt;
&lt;/ol&gt;
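&lt;p&gt;Key-based routing boils down to hashing the key modulo the partition count, so equal keys always land on the same partition. A sketch of the idea (the Java client actually hashes keys with murmur2; &lt;code&gt;crc32&lt;/code&gt; here is just an illustrative stand-in):&lt;/p&gt;

```python
import zlib

def partition_for(key, num_partitions):
    # Deterministic hash of the key modulo the partition count:
    # the same key always maps to the same partition, which is
    # what preserves per-key ordering.
    if key is None:
        raise ValueError("keyless messages use round-robin or sticky assignment")
    return zlib.crc32(key.encode()) % num_partitions

# The same key always lands on the same partition:
print(partition_for("user-42", 3) == partition_for("user-42", 3))  # True
```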

&lt;h3&gt;
  
  
  Consumer Reading Patterns
&lt;/h3&gt;

&lt;p&gt;Consumer groups coordinate Partition reading through different assignment strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Range Assignment&lt;/strong&gt; - Allocates continuous Partition ranges to consumers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round-Robin Assignment&lt;/strong&gt; - Distributes Partitions evenly across consumers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sticky Assignment&lt;/strong&gt; - Maintains stable assignments to minimize rebalancing overhead&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Partition Sizing Guidelines&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calculate your expected message volume&lt;/li&gt;
&lt;li&gt;Consider your infrastructure capacity&lt;/li&gt;
&lt;li&gt;Formula: Partition count = (Target throughput/sec) ÷ (Single partition throughput)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Important Considerations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each Partition requires system resources&lt;/li&gt;
&lt;li&gt;Adding Partitions is straightforward, but removal is complex&lt;/li&gt;
&lt;li&gt;Excessive Partitions can impact cluster stability&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics to Watch&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consumer lag measurements&lt;/li&gt;
&lt;li&gt;Replica synchronization status&lt;/li&gt;
&lt;li&gt;Partition load distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
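&lt;p&gt;The sizing formula from point 1 can be applied directly (a sketch; the throughput figures below are hypothetical and should come from your own benchmarks):&lt;/p&gt;

```python
import math

def partition_count(target_throughput, per_partition_throughput):
    # Partition count = target throughput divided by what a single
    # partition sustains, rounded up, with a floor of one partition.
    return max(1, math.ceil(target_throughput / per_partition_throughput))

# Target 50 MB/s, each partition sustains ~10 MB/s
print(partition_count(50, 10))  # 5
```

&lt;p&gt;Since removing partitions later is complex, err toward this computed value plus modest headroom rather than a very large count up front.&lt;/p&gt;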

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Proper Topic and Partition design is fundamental to a well-performing Kafka deployment. Consider your specific use case, plan your capacity requirements, and choose configurations that align with your performance needs.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href="https://www.nootcode.com/problem-sets/message-queue-essentials" rel="noopener noreferrer"&gt;Message Queue Essentials&lt;/a&gt; to actively practice more Kafka interview questions. &lt;/p&gt;

</description>
      <category>kafka</category>
      <category>eventdriven</category>
      <category>programming</category>
    </item>
    <item>
      <title>Hello Dev.to!</title>
      <dc:creator>clasnake</dc:creator>
      <pubDate>Tue, 21 Jan 2025 03:38:28 +0000</pubDate>
      <link>https://forem.com/clasnake/hello-devto-50if</link>
      <guid>https://forem.com/clasnake/hello-devto-50if</guid>
      <description></description>
      <category>watercooler</category>
    </item>
  </channel>
</rss>
