<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tomer Ben David</title>
    <description>The latest articles on Forem by Tomer Ben David (@tomerbendavid).</description>
    <link>https://forem.com/tomerbendavid</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1528%2F47ca83b6-b329-434d-a98f-79851ae130ef.png</url>
      <title>Forem: Tomer Ben David</title>
      <link>https://forem.com/tomerbendavid</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tomerbendavid"/>
    <language>en</language>
    <item>
      <title>Choosing the Right Shortest Path Algorithm</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:29:10 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/choosing-the-right-shortest-path-algorithm-17f5</link>
      <guid>https://forem.com/tomerbendavid/choosing-the-right-shortest-path-algorithm-17f5</guid>
<description>&lt;p&gt;Shortest path problems on LeetCode vary by constraint. Graphs can be weighted or unweighted, pose single-source or all-pairs requirements, and edge costs can be positive or negative. &lt;/p&gt;

&lt;p&gt;Each specific situation has a corresponding algorithm. Understanding the constraints of the graph dictates the strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying the Graph
&lt;/h2&gt;

&lt;p&gt;Before writing code, verify the terrain.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Clear Graph
&lt;/h3&gt;

&lt;p&gt;The first question is whether every step costs the same. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Calculating degrees of separation in a social network or moving between cells in a maze means the costs are uniform.&lt;/li&gt;
&lt;li&gt;  Dealing with traffic where one road takes 5 minutes and another takes 50, flight prices, or physical effort means each step has its own cost. These are weighted graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Disguised Graph
&lt;/h3&gt;

&lt;p&gt;Sometimes the problem hides the graph.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Matrix:&lt;/strong&gt; A 2D grid where each cell is a node and valid moves are edges. If moving to an adjacent cell costs 1, it is a simple BFS.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;State transitions:&lt;/strong&gt; Consider Word Ladder. Each word is a node and a one character difference is the edge. Since every transform costs 1, this is a BFS problem.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Resource management:&lt;/strong&gt; Problems like Cheapest Flights Within K Stops are weighted graphs requiring you to track cost while adhering to state constraints.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Selection Logic
&lt;/h2&gt;

&lt;p&gt;Select the algorithm based on what the graph requires. &lt;/p&gt;

&lt;h3&gt;
  
  
  BFS
&lt;/h3&gt;

&lt;p&gt;If every step costs the same, use Breadth-First Search. The first time the search reaches a node, it has found the shortest path to that node. &lt;/p&gt;
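&lt;p&gt;A minimal sketch, assuming a binary grid where 0 is an open cell and 1 is a wall (the function name and grid encoding are illustrative). Because every move costs 1, the first arrival at the goal is the shortest path:&lt;/p&gt;

```python
from collections import deque

def shortest_steps(grid, start, goal):
    # BFS on a grid: each cell is a node, each unit move is an edge of cost 1.
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps  # first visit is the shortest path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if rows > nr > -1 and cols > nc > -1 and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return -1  # goal unreachable
```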

&lt;h3&gt;
  
  
  Dijkstra
&lt;/h3&gt;

&lt;p&gt;When roads have different lengths but no edge weight is negative, use Dijkstra. Once the search settles a node, no future path can improve on it, because extending any path with non-negative edges never makes it cheaper. &lt;/p&gt;
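&lt;p&gt;A minimal Dijkstra sketch using Python's heapq, assuming an adjacency list of (neighbor, weight) pairs (a common interview representation, not tied to any specific problem):&lt;/p&gt;

```python
import heapq

def dijkstra(n, adj, start):
    # adj[u] is a list of (v, weight) pairs; all weights must be non-negative.
    dist = [float("inf")] * n
    dist[start] = 0
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry; u was already settled with a better cost
        for v, w in adj[u]:
            if dist[v] > d + w:  # relaxation: found a cheaper route to v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```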

&lt;h3&gt;
  
  
  Bellman-Ford
&lt;/h3&gt;

&lt;p&gt;If an edge can have a negative cost, Dijkstra fails. Bellman-Ford handles negative weights and detects negative cycles, where a path keeps getting cheaper forever. &lt;/p&gt;
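&lt;p&gt;The detection idea can be sketched in a few lines: after $V-1$ relaxation passes, any edge that can still be relaxed lies on a path through a negative cycle (a minimal sketch over an edge list, names illustrative):&lt;/p&gt;

```python
def has_negative_cycle(n, edges, start):
    # edges: list of (u, v, weight). Standard Bellman-Ford relaxation passes.
    dist = [float("inf")] * n
    dist[start] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[v] > dist[u] + w:
                dist[v] = dist[u] + w
    # One extra pass: if any edge still improves a distance,
    # a negative cycle is reachable from the start node.
    return any(dist[v] > dist[u] + w for u, v, w in edges)
```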

&lt;h3&gt;
  
  
  Floyd-Warshall
&lt;/h3&gt;

&lt;p&gt;If the problem requires the shortest path from every node to every other node, use Floyd-Warshall. This checks every node as a possible layover to solve for all pairs.&lt;/p&gt;
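&lt;p&gt;The layover idea translates directly into the classic triple loop. A minimal sketch over a directed edge list (representation is illustrative):&lt;/p&gt;

```python
def floyd_warshall(n, edges):
    # edges: list of directed (u, v, weight) triples.
    INF = float("inf")
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    # Try every node k as a possible layover between every pair (i, j).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][j] > dist[i][k] + dist[k][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```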

&lt;h2&gt;
  
  
  Escalation of Power
&lt;/h2&gt;

&lt;p&gt;As graph rules become more complex, the algorithms become heavier. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  BFS is fastest but cannot handle weights.&lt;/li&gt;
&lt;li&gt;  Dijkstra handles weights but requires a priority queue and fails on negative costs.&lt;/li&gt;
&lt;li&gt;  Bellman-Ford handles negatives and detects cycles but relaxes every edge up to $V-1$ times, costing O(VE).&lt;/li&gt;
&lt;li&gt;  Floyd-Warshall solves all pairs but its triple nested loop costs O(V^3).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Brute Force Hack
&lt;/h2&gt;

&lt;p&gt;You do not always need the most efficient algorithm to pass the interview. If you struggle to implement the min-heap logic for Dijkstra, use Bellman-Ford as a brute force alternative. &lt;/p&gt;

&lt;p&gt;You do not need a priority queue. Take the core idea of edge relaxation.&lt;/p&gt;

&lt;p&gt;Each pass through all edges discovers shortest paths using one additional edge. The first pass finds shortest paths of one edge, the second pass finds shortest paths of two edges, and so on. Since a shortest path in a graph of $V$ nodes has at most $V-1$ edges, $V-1$ passes are enough to find every shortest path. It is two nested loops and handles everything Dijkstra can.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The brute force alternative
# n: number of nodes, edges: list of (u, v, weight)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;shortest_path_hack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inf&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="c1"&gt;# Its just a nested loop and you could pass the in without Dijkstra.
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;

    &lt;span class="c1"&gt;# No need to handle weighted and negative edges.
&lt;/span&gt;    &lt;span class="c1"&gt;# We skip this part of belman ford. Quick Win. Two Birds.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Originally published at: &lt;a href="https://looppass.mindmeld360.com/blog/choosing-shortest-path-algorithm/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/choosing-shortest-path-algorithm/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>interview</category>
      <category>career</category>
      <category>algorithms</category>
      <category>faang</category>
    </item>
    <item>
      <title>System Design Interview - Designing from Invariants</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Wed, 08 Apr 2026 06:50:13 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/system-design-interview-designing-from-invariants-3ede</link>
      <guid>https://forem.com/tomerbendavid/system-design-interview-designing-from-invariants-3ede</guid>
      <description>&lt;h2&gt;
  
  
  Designing from Invariants
&lt;/h2&gt;

&lt;p&gt;Software architecture is frequently treated as an exercise in connecting infrastructure components. We often reach for Kafka, Redis, or microservice boundaries as if they are the building blocks of the business logic itself. But when tools come before logic, the resulting design prioritizes infrastructure choices over the problem they are meant to solve.&lt;/p&gt;

&lt;p&gt;A high reliability system does not start with a distributed queue or a complex workflow engine. It starts with the core constraints: the invariants that make the system reliable. If you choose your infrastructure before you have defined the logic that keeps your data correct, you are building complexity on an undefined foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Distribution Trap
&lt;/h2&gt;

&lt;p&gt;Most designs become unmanageable because they assume every step of a business process must be distributed across new infrastructure from the beginning. &lt;/p&gt;

&lt;p&gt;In this style of design, the business logic is spread across a database, a queue, and a workflow engine. To answer a simple question like &lt;em&gt;"What is the state of this payment?"&lt;/em&gt;, you have to reconstruct the story from multiple logs. This introduces the Dual Write problem, where a database update succeeds but a message publish fails, before the system has even achieved its basic purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coherence as the Minimal Solution
&lt;/h2&gt;

&lt;p&gt;The strongest designs identify the &lt;strong&gt;Invariants&lt;/strong&gt; first. An invariant is a statement that must always be true for the business to be valid. For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cleared risk decision must never exist without an authoritative payment record."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the business rules require two things to change together to be valid, the simplest and most robust solution is to keep them in the same transaction. &lt;/p&gt;

&lt;p&gt;This logical anchor is the &lt;strong&gt;Transactional Center&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The core state machine for an important process should have one queryable home, usually a relational database like Postgres. By starting here, you eliminate entire classes of distributed system bugs. You can scale the system outward later, but the authority remains in one place.&lt;/p&gt;
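&lt;p&gt;A minimal sketch of the Transactional Center, using Python's sqlite3 in place of Postgres (table names, states, and the outbox topic are illustrative). The payment state change and the notification record commit in one transaction, or not at all:&lt;/p&gt;

```python
import sqlite3

# sqlite3 stands in for Postgres here; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")
conn.execute("INSERT INTO payments VALUES ('pay-1', 'pending')")

# The invariant: the state change and the record that notifies the rest of
# the system commit together, or not at all. No dual write, no lost event.
with conn:  # one transaction; commits on success, rolls back on error
    conn.execute("UPDATE payments SET state = 'cleared' WHERE id = 'pay-1'")
    conn.execute("INSERT INTO outbox (topic, payload) VALUES ('payment.cleared', 'pay-1')")

state = conn.execute("SELECT state FROM payments WHERE id = 'pay-1'").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

&lt;p&gt;Workers in the action plane can then poll the outbox table and publish to Kafka, knowing every event corresponds to committed truth.&lt;/p&gt;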

&lt;h2&gt;
  
  
  Scaling without Scattering
&lt;/h2&gt;

&lt;p&gt;Scaling should be a reaction to a requirement, not a default architecture. The &lt;strong&gt;Four Plane Model&lt;/strong&gt; provides a way to distribute workloads without losing the source of truth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plane 1 Transactional Truth
&lt;/h3&gt;

&lt;p&gt;This is the core. It owns the current state, the audit trail, and the records used to reliably notify the rest of the system. &lt;/p&gt;

&lt;h3&gt;
  
  
  Plane 2 Action Systems
&lt;/h3&gt;

&lt;p&gt;These are Kafka workers and background jobs. They &lt;strong&gt;react&lt;/strong&gt; to the truth committed in Plane 1. Asynchronous tasks like notifications or external fraud checks happen here without slowing down the core transaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plane 3 Real Time Reads
&lt;/h3&gt;

&lt;p&gt;When you need fast dashboards, move those reads to a specialized replica like ClickHouse. This keeps analytical traffic from overwhelming the transactional core.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plane 4 Historical Analytics
&lt;/h3&gt;

&lt;p&gt;This is for deep history and data science (BigQuery or Snowflake). It stays completely separate from the operational system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing Your Path
&lt;/h2&gt;

&lt;p&gt;The decision to distribute should always follow the logic of the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with a Transactional Center when
&lt;/h3&gt;

&lt;p&gt;Consistency is part of the business value. If a payment must be atomic with an order update, keep them together. This is the simplest possible solution and the most resilient to failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extend to Distributed Choreography when
&lt;/h3&gt;

&lt;p&gt;Domains are truly independent or you have reached a scale where a single database cannot handle the write volume. Use patterns like Sagas only when the local boundary can no longer support the technical requirements of the system.&lt;/p&gt;

&lt;p&gt;A resilient system starts by identifying the center. Ask one question: &lt;strong&gt;Where is the authority?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Originally published at: &lt;a href="https://looppass.mindmeld360.com/blog/system-design-transactional-center/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/system-design-transactional-center/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>architecture</category>
      <category>distributedsystems</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Memory Types in LangChain</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Sun, 15 Mar 2026 09:18:43 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/memory-types-in-langchain-4l0n</link>
      <guid>https://forem.com/tomerbendavid/memory-types-in-langchain-4l0n</guid>
      <description>&lt;h3&gt;
  
  
  Ever felt like your LLM needs a memory?
&lt;/h3&gt;

&lt;p&gt;LangChain felt the same thing. From full chat transcripts to summaries, entities, and vector backed recall, it gives you several ways to make a stateless model feel like it actually remembers what matters.&lt;/p&gt;

&lt;p&gt;Large Language Models are inherently stateless. Every request you send arrives as a blank slate with no recollection of what was discussed five minutes ago. To create a coherent conversation, the system must manually feed previous messages back into the model. &lt;/p&gt;

&lt;p&gt;LangChain provides several distinct patterns for managing this history. Choosing the right one is a balance between providing perfect context and managing the cost of every token.&lt;/p&gt;
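&lt;p&gt;Before looking at any pattern, it helps to see what "memory" mechanically is: replaying history into the next prompt. A minimal sketch with no library at all (function names are illustrative):&lt;/p&gt;

```python
# A stateless model only sees what you send it, so "memory" is just
# replaying the saved history into every new prompt.
history = []

def build_prompt(user_message):
    history.append(("Human", user_message))
    lines = [f"{role}: {text}" for role, text in history]
    lines.append("AI:")
    return "\n".join(lines)

def record_reply(ai_message):
    history.append(("AI", ai_message))

prompt = build_prompt("What is the capital of France?")
record_reply("Paris.")
# The second prompt carries the whole exchange, so the model can resolve "its".
prompt2 = build_prompt("And its population?")
```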

&lt;h3&gt;
  
  
  LangChain Memory Types
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;strong&gt;Transcript Pattern&lt;/strong&gt; for quick, high precision support tasks.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Window Pattern&lt;/strong&gt; for predictable, task oriented interactions.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Summary Pattern&lt;/strong&gt; for long, creative, or collaborative sessions.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Entity Pattern&lt;/strong&gt; for personal assistants that track user preferences.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Vector Retrieval Pattern&lt;/strong&gt; for knowledge intensive systems with vast histories.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Transcript Pattern
&lt;/h3&gt;

&lt;p&gt;The simplest way to maintain a conversation is through a direct buffer. This stores every word exactly as it was spoken in a sequential list.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every message from the user and every response from the AI is saved verbatim.&lt;/li&gt;
&lt;li&gt;The entire history is appended to the prompt for the next turn.&lt;/li&gt;
&lt;li&gt;It provides the model with the most accurate and raw context possible.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferMemory&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationBufferMemory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_context&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The capital of France is Paris.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_memory_variables&lt;/span&gt;&lt;span class="p"&gt;({})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An example of this is a customer support bot helping a user reset a password. The bot needs to remember the specific email address and the error code mentioned two sentences ago to provide a precise solution. While excellent for short interactions, this does not scale for long sessions where the prompt becomes massive.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Window Pattern
&lt;/h3&gt;

&lt;p&gt;To solve the scaling issue of a raw buffer, we can use a sliding window. This strategy only keeps the most recent portion of the conversation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system only remembers the last few interactions, defined by a fixed count.&lt;/li&gt;
&lt;li&gt;Older segments are discarded as new ones arrive.&lt;/li&gt;
&lt;li&gt;This keeps the prompt size and API costs predictable.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationBufferWindowMemory&lt;/span&gt;

&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationBufferWindowMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_context&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I live in London&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;London is a great city.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_context&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the weather like?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It is currently rainy in London.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A weather assistant is a perfect candidate for this pattern. If you ask for the forecast in London and then ask "What about tomorrow?", the bot only needs the most recent context to understand that you are still talking about London. It does not need to remember that you asked about the news ten minutes ago.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Summary Pattern
&lt;/h3&gt;

&lt;p&gt;For very long term dialogues, a summarization strategy is more effective. Instead of saving every word, the system maintains a running overview of the discussion.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After each interaction, the system updates a concise summary of the key points.&lt;/li&gt;
&lt;li&gt;Only this summary is sent to the primary model as context.&lt;/li&gt;
&lt;li&gt;It handles massive transcripts while keeping the context size relatively flat.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationSummaryMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationSummaryMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_context&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain the plot of Inception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inception is about dreams within dreams...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consider a creative writing assistant helping you plot a novel. Over several hours, you might discuss dozens of characters and plot points. Instead of feeding the whole transcript, the system carries a summary that tracks the main objective and the current state of the story.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Entity Pattern
&lt;/h3&gt;

&lt;p&gt;Some applications require remembering specific facts about people or technical concepts without carrying the entire dialogue.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system extracts key participants or topics mentioned in the chat.&lt;/li&gt;
&lt;li&gt;It builds a structured knowledge base about these specific items.&lt;/li&gt;
&lt;li&gt;Relevant facts are pulled from storage when the topic resurfaces.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConversationEntityMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationEntityMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_context&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Tomer and I use Kotlin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Nice to meet you Tomer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An example is a personalized coding coach. If you mention that you prefer a specific library like React or a particular cloud provider, the system stores that fact. When you later ask for a code sample, it automatically applies those preferences without needing to reread the original transcript.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Vector Retrieval Pattern
&lt;/h3&gt;

&lt;p&gt;The most advanced method involves treating the conversation like a database. This allows the model to recall information from any point in the history based on semantic relevance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Past message snippets are stored in a vector database.&lt;/li&gt;
&lt;li&gt;The system performs a search based on the current user query.&lt;/li&gt;
&lt;li&gt;It retrieves only the most relevant historical segments.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorStoreRetrieverMemory&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.docstore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryDocstore&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;embed_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IndexFlatL2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nc"&gt;InMemoryDocstore&lt;/span&gt;&lt;span class="p"&gt;({}),&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorStoreRetrieverMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the ideal choice for an AI researcher. If you are discussing a series of academic papers over several weeks, the model can pull a specific detail from a conversation you had ten days ago because it is semantically related to your current question.&lt;/p&gt;




&lt;p&gt;Originally published at: &lt;a href="https://looppass.mindmeld360.com/blog/langchain-memory-types/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/langchain-memory-types/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>coding</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Battle Between RAG and Long Context</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Fri, 13 Mar 2026 06:27:21 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/the-battle-between-rag-and-long-context-4ilc</link>
      <guid>https://forem.com/tomerbendavid/the-battle-between-rag-and-long-context-4ilc</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Large Language Models arrive with a fundamental limitation known as the knowledge cutoff. They are experts on the world as it existed during their training phase but they are completely blind to your private data or events that happened this morning. Whether it is an internal wiki or a complex codebase, the model cannot see what it was not trained on. To make these systems useful for building products, we have to solve the problem of context injection.&lt;/p&gt;

&lt;p&gt;The industry is currently split between two competing philosophies for solving this. One is a complex engineering pipeline while the other is a brute force architectural shift.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Engineering Complexity of Retrieval Augmented Generation
&lt;/h3&gt;

&lt;p&gt;Retrieval Augmented Generation is the established path for providing context. It works by turning your entire knowledge base into a searchable index. You break your documents into small pieces and store them in a vector database as numerical maps. When a user submits a query, the system performs a semantic search to find the most relevant snippets and hands them to the model for processing.&lt;/p&gt;

&lt;p&gt;This remains the essential strategy for massive datasets. If you have ten million technical specifications, you cannot possibly cram them all into a single prompt. This approach acts as a smart filter that protects the model from information overload. It is also more cost efficient for high volume systems because you only pay to process a few hundred words of context instead of millions of tokens every time. &lt;/p&gt;

&lt;p&gt;However, this method introduces a retrieval lottery. If your search logic fails to find the exact piece of information required, the model will never see it. You are essentially gambling that your search engine is smart enough to find the needle in a global haystack.&lt;/p&gt;
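&lt;p&gt;The chunk, embed, and search loop described above can be sketched in a few lines of dependency free Python. The toy &lt;code&gt;embed&lt;/code&gt; function below (a bag of words counter) is purely illustrative and stands in for a real embedding model:&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The "vector database": every chunk stored alongside its vector.
chunks = [
    "The load balancer was reconfigured on Monday.",
    "Latency spiked on Thursday afternoon.",
    "The cafeteria menu changes weekly.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Semantic search: rank every chunk against the query vector.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("why did latency spike on Thursday?", k=1))
```

&lt;p&gt;The retrieval lottery is visible even in this toy: if the query shares no vocabulary with the relevant chunk, that chunk never reaches the model.&lt;/p&gt;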

&lt;h3&gt;
  
  
  The Simplicity of Long Context Brute Force
&lt;/h3&gt;

&lt;p&gt;A newer alternative is to use models with massive context windows. Instead of building a complex database and retrieval pipeline, you simply paste your entire dataset directly into the prompt. This has been called the "no stack stack" because it removes the need for infrastructure like vector databases and embedding models entirely.&lt;/p&gt;

&lt;p&gt;The primary advantage here is global reasoning. When you give the model every word of the source material, you eliminate the risk of the retrieval lottery. This is superior for tasks that require seeing the whole picture. For example, if you are analyzing a series of incident reports from a distributed system to find a recurring pattern, you want the model to see every log entry simultaneously. In a traditional retrieval system, the search might pull out isolated errors but miss the subtle connection between a load balancer change on Monday and a latency spike on Thursday. By providing the entire history at once, you allow the model to detect deep architectural threads.&lt;/p&gt;

&lt;p&gt;The downside is the token tax. You pay the price for every word in your knowledge base on every single turn. These systems can also suffer from attention dilution. When you overwhelm a model with too much information, it may start to ignore or misinterpret details that are buried in the middle of a massive block of text.&lt;/p&gt;
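&lt;p&gt;To make the token tax concrete, here is a back of the envelope comparison. The price per token, corpus size, and query volume below are invented for illustration and are not real vendor numbers:&lt;/p&gt;

```python
# Hypothetical numbers, for illustration only.
price_per_million_tokens = 1.0      # dollars (assumed)
knowledge_base_tokens = 500_000     # whole corpus pasted on every turn
rag_context_tokens = 1_500          # a few retrieved snippets per turn
queries_per_day = 1_000

long_context_cost = knowledge_base_tokens / 1e6 * price_per_million_tokens * queries_per_day
rag_cost = rag_context_tokens / 1e6 * price_per_million_tokens * queries_per_day

print(f"long context: ${long_context_cost:.2f}/day, RAG: ${rag_cost:.2f}/day")
```

&lt;p&gt;Even with these made up figures, the gap is two to three orders of magnitude, which is why high volume systems tend to stay on the retrieval side.&lt;/p&gt;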

&lt;h3&gt;
  
  
  Navigating the Infinite Data Problem
&lt;/h3&gt;

&lt;p&gt;For many enterprise environments, the data lake is effectively infinite. A million tokens might sound like a lot, but it is a drop in the ocean compared to the size of a global corporate knowledge base. In these scenarios, retrieval is not just an option but a structural necessity. You cannot brute force a petabyte of data into a prompt regardless of how large the context window becomes.&lt;/p&gt;

&lt;p&gt;The choice comes down to the boundaries of your problem. You should use the long context approach for bounded datasets that require deep and interconnected reasoning across every page. You should stick with the engineering approach when you need to navigate vast libraries of information where efficiency and noise reduction are the highest priorities.&lt;/p&gt;




&lt;p&gt;Originally posted at: &lt;a href="https://looppass.mindmeld360.com/blog/rag-vs-long-context-strategy/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/rag-vs-long-context-strategy/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Comparing LangChain, CrewAI, and ADK</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Thu, 12 Mar 2026 08:14:40 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/comparing-langchain-crewai-and-adk-491j</link>
      <guid>https://forem.com/tomerbendavid/comparing-langchain-crewai-and-adk-491j</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In the current gold rush of Agentic AI, developers are often caught in Framework Fatigue. Every week, a new library claims to be the standard for building autonomous agents. &lt;/p&gt;

&lt;p&gt;The question isn't only which tool is most popular or which architecture looks best on paper. Different projects have different requirements, so the real challenge is finding the architecture that best matches your specific needs and your unique friction. &lt;/p&gt;

&lt;p&gt;You also have to balance this with your instincts about which framework might catch on as the de facto industry standard. If one of them wins, you want to be on the right side of that curve without sacrificing your specific goals today.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Coding and Building Your Own Orchestration
&lt;/h3&gt;

&lt;p&gt;Before we talk about ready made frameworks like LangChain or ADK, we have to acknowledge how the landscape has changed. In the era of AI coding, you don't necessarily need a massive library to get ahead. You can build your own bespoke orchestration layer that fits your project exactly.&lt;/p&gt;

&lt;p&gt;When you take the custom orchestration route, you are essentially solving three core technical challenges on your own terms.&lt;/p&gt;

&lt;p&gt;First is the &lt;strong&gt;Parsing Tax&lt;/strong&gt;. You need a way to ensure the AI returns structured data like JSON instead of just a paragraph of text. Today this is often solved with simple system prompts or native model features.&lt;/p&gt;

&lt;p&gt;Second is &lt;strong&gt;State Management&lt;/strong&gt;. You have to decide how the system remembers previous steps without overflowing the context window. &lt;/p&gt;

&lt;p&gt;Third is &lt;strong&gt;Loop Control&lt;/strong&gt;. You need a safety mechanism so an autonomous agent doesn't get stuck in a thought loop and burn through API credits.&lt;/p&gt;

&lt;p&gt;The choice today isn't about whether you can build an agent without a library. You definitely can, and for many uncommon projects, building your own thin orchestration is the best way to avoid unnecessary bloat.&lt;/p&gt;
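&lt;p&gt;To ground the three challenges above, here is a minimal sketch of a thin custom orchestration layer. The function names and the canned &lt;code&gt;fake_llm&lt;/code&gt; response are invented for this example; in a real system that call would hit an actual model API:&lt;/p&gt;

```python
import json

MAX_STEPS = 5      # Loop Control: hard cap on agent iterations
MAX_HISTORY = 6    # State Management: only the last N messages fit the context

def fake_llm(messages: list[dict]) -> str:
    # Stand-in for a real model call; here it "finishes" immediately.
    return json.dumps({"action": "finish", "answer": "42"})

def run_agent(task: str) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_STEPS):                     # Loop Control
        raw = fake_llm(history[-MAX_HISTORY:])     # State Management: trim context
        try:
            decision = json.loads(raw)             # Parsing Tax: demand JSON
        except json.JSONDecodeError:
            history.append({"role": "user", "content": "Reply with valid JSON."})
            continue
        if decision.get("action") == "finish":
            return decision["answer"]
        history.append({"role": "assistant", "content": raw})
    return "Gave up after too many steps."

print(run_agent("What is six times seven?"))
```

&lt;p&gt;Roughly thirty lines covers all three concerns, which is why a bespoke layer is often enough for uncommon projects.&lt;/p&gt;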

&lt;h3&gt;
  
  
  LangChain and the Modular Lego Set
&lt;/h3&gt;

&lt;p&gt;LangChain was the first to standardize the chaos. It treated AI workflows like a pipeline or a "Chain."&lt;/p&gt;

&lt;p&gt;The philosophy here is modularity. Everything is a &lt;strong&gt;component&lt;/strong&gt;, including prompts, models, output parsers, and tools. &lt;/p&gt;

&lt;p&gt;If you need to take a PDF, turn it into vectors, and ask a question, LangChain has a plug for every single part of that process.&lt;/p&gt;

&lt;p&gt;The critique for many is that it became a &lt;strong&gt;"Thick Platform."&lt;/strong&gt; The abstractions can sometimes be harder to debug than the raw code itself. It is a massive toolkit that occasionally forces you to learn the LangChain way instead of the standard software engineering way.&lt;/p&gt;

&lt;h3&gt;
  
  
  CrewAI and the Collaborative Storyteller
&lt;/h3&gt;

&lt;p&gt;As we moved from single chains to Multi Agent Systems, CrewAI arrived with a different mental model of Role Playing.&lt;/p&gt;

&lt;p&gt;The philosophy is simple. Don't just give an agent a tool but give it a job. You define a Researcher, a Writer, and a Manager.&lt;/p&gt;

&lt;p&gt;It is important to understand that CrewAI is actually built on top of LangChain. It uses the foundational pieces of LangChain to handle the heavy lifting of LLM communication and tool execution while adding the collaborative crew logic on top. It is best for content creation or complex research because it excels at delegating tasks between agents. In these scenarios, it feels less like coding a system and more like managing a crew.&lt;/p&gt;

&lt;p&gt;The critique is that because it sits on top of LangChain, it inherits all of that platform's complexity. It is excellent for story driven workflows but can feel like it has too much magic under the hood for high precision systems engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  ADK or Google’s Agent Development Kit
&lt;/h3&gt;

&lt;p&gt;ADK is the production first response. Unlike CrewAI, ADK is a standalone stack that doesn't rely on LangChain. It is a clean slate alternative.&lt;/p&gt;

&lt;p&gt;The philosophy treats agents as independent tools that you can plug into any system. It prioritizes writing real code and testing everything on your own machine before going live. While other frameworks can do hierarchy, ADK makes it a core structural primitive by treating entire agents as modular tools that a primary agent can call. It feels much like a system of nested microservices.&lt;/p&gt;

&lt;p&gt;This is the best case for enterprise environments where observability and Agent to Agent communication are critical. It’s optimized for Gemini but stays model agnostic via LiteLLM.&lt;/p&gt;

&lt;p&gt;The real strength here is that it treats an agent as a unit of deployment. This means the agent isn't just a variable in your code but a standalone service you can ship independently. For example, take a Pricing Agent. In a traditional library, that agent is just a function call inside your main application. If you want to update it, you have to redeploy your entire app. With ADK, that Pricing Agent is a standalone service with its own endpoint. You can update it, test it, or scale it without ever touching your main product code. It covers the entire engineering lifecycle, which includes professional evaluation, automated deployment, and production monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  One Weather Task and Three Different Mental Models
&lt;/h3&gt;

&lt;p&gt;To see the difference clearly, let's say we want an agent to check the weather and suggest an outfit. Each framework approaches this differently.&lt;/p&gt;

&lt;p&gt;With LangChain, you build a chain of thought. You create a weather tool, give it to an agent executor, and the system runs a loop until it reaches the final answer. You are essentially building a custom logic path.&lt;/p&gt;

&lt;p&gt;With CrewAI, you would hire a Weather Expert and a Fashion Stylist. You define their roles and backstories, then assign them a task to collaborate. The Researcher finds the data and the Stylist uses it. You are managing a team meeting.&lt;/p&gt;

&lt;p&gt;With ADK, you define a Weather Service as a tool. You create a Weather Agent as a modular unit. Because it is hierarchical, you might have a primary Assistant Agent that simply delegates the request to that specialized unit. In this model, these agents can behave like actual web services that you communicate with via REST APIs. You have total flexibility here. You can run every agent on a single monolithic server if your project is small, or have specialized agents living on different machines entirely. This allows your system to grow from a simple monolith into a network of independent services that you can update and scale one by one without touching the rest of the codebase. You are architecting for future growth instead of being locked into a single script.&lt;/p&gt;
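&lt;p&gt;Stripped of any framework, the "agents as tools" mental model reduces to something like the sketch below. All class names, method names, and the canned weather report are invented for illustration; a real system would replace the in-process calls with REST endpoints:&lt;/p&gt;

```python
class WeatherAgent:
    # A specialized unit. In production this would be its own service.
    def handle(self, request: str) -> str:
        # Canned response standing in for a real weather API call.
        return "14°C and raining"

class AssistantAgent:
    # The primary agent treats whole sub-agents as callable tools.
    def __init__(self) -> None:
        self.tools = {"weather": WeatherAgent()}

    def handle(self, request: str) -> str:
        if "weather" in request.lower():
            report = self.tools["weather"].handle(request)
            return f"It is {report}, so take a raincoat."
        return "I can only help with weather questions."

assistant = AssistantAgent()
print(assistant.handle("What is the weather like today?"))
```

&lt;p&gt;The point of the hierarchy is that swapping the in-process &lt;code&gt;WeatherAgent&lt;/code&gt; for a remote one changes only the tool registration, not the Assistant's logic.&lt;/p&gt;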

&lt;h3&gt;
  
  
  The Framework Paradox: Avoiding the J2EE Trap
&lt;/h3&gt;

&lt;p&gt;In software history, we often see a pendulum swing between "lightweight libraries" and "heavy platforms." For those who remember the early days of Enterprise Java, the term &lt;strong&gt;J2EE&lt;/strong&gt; often brings back memories of "Thick Platforms" that were so heavy you spent more time configuring the framework than writing the business logic.&lt;/p&gt;

&lt;p&gt;The risk with AI frameworks today is falling into that same trap. You start with a tool meant to simplify a task, but as the framework grows to cover every possible edge case, it introduces so much architectural weight that it becomes a burden. &lt;/p&gt;

&lt;p&gt;There is a delicate balance to strike. You want enough abstraction to be productive, but not so much that you lose sight of the underlying LLM calls. If you find yourself spending days trying to figure out how to "pass a variable the framework way" instead of just writing a function, you might be carrying too much weight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing the Right Path for Your Agent Architecture
&lt;/h3&gt;

&lt;p&gt;If you’ve followed my work at MindMeld360, you know I’m wary of Thick Platforms, but the truth is there is no single winner in this space yet. The industry is currently obsessed with finding the perfect library, but the real engineering task is matching the right abstraction level to each specific service you build.&lt;/p&gt;

&lt;p&gt;LangChain is a library of parts. CrewAI is a framework for behavior. ADK is a kit for modular systems.&lt;/p&gt;

&lt;p&gt;My advice is to start by playing with a custom and thin orchestration layer. You have to understand the problem space first and truly feel the pain that these frameworks are trying to solve. Once you gain your own intuition through a bespoke solution, you can incorporate existing libraries to handle the heavy lifting.&lt;/p&gt;

&lt;p&gt;Do not try to build your own massive agent library from scratch for production since these tools are already heavily used and battle tested. Instead, use a stage based approach to grow your experience.&lt;/p&gt;

&lt;p&gt;Start custom to feel the domain. Then build your next service with LangChain to see the ecosystem and the drawbacks for yourself.&lt;/p&gt;

&lt;p&gt;From there, you can choose the right tool for each job. Use LangChain when you want a common and widely supported library. Use CrewAI when you need a higher level of agent collaboration. Use ADK when you want to distribute your agents as independent services across a network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Closing Note
&lt;/h3&gt;

&lt;p&gt;By the time this post has been published, we probably already have 5 more libraries to explore! The pace of AI is relentless, but that’s not a bad thing; it just means more tools for us to master. More blog posts to come on those, so stay tuned! :)&lt;/p&gt;




&lt;p&gt;Originally published at: &lt;a href="https://looppass.mindmeld360.com/blog/ai-frameworks-langchain-crewai-adk/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/ai-frameworks-langchain-crewai-adk/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>engineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Load Balancing &amp; WebSockets (L4 vs L7)</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Mon, 09 Mar 2026 13:01:24 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/load-balancing-websockets-l4-vs-l7-5b94</link>
      <guid>https://forem.com/tomerbendavid/load-balancing-websockets-l4-vs-l7-5b94</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When you build a standard web app, load balancing is usually straightforward because every request is independent. You just spread the traffic around. But once you introduce WebSockets, everything changes. You are no longer dealing with quick requests. You are managing a persistent pipe that might stay open for hours.&lt;/p&gt;

&lt;p&gt;The first thing to understand is that WebSockets can work on either Layer 4 or Layer 7. There is no hard rule requiring one over the other. Every load balancer can pass the packets through. The difference is entirely in how the device treats the connection once it is established.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Layer 4 handles the traffic
&lt;/h2&gt;

&lt;p&gt;Since WebSockets are built on top of TCP, a Layer 4 load balancer can handle them perfectly. Think of this balancer as a high speed postman who only reads the house number on the envelope. He doesn't know he is routing WebSockets or HTTP. He just sees a raw TCP connection request on a specific port and blindly forwards that stream to a backend server.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;This approach works at the TCP level so it is incredibly efficient.&lt;/li&gt;
&lt;li&gt;The initial HTTP Upgrade request passes right through the load balancer. The backend server itself handles the handshake and the SSL termination.&lt;/li&gt;
&lt;li&gt;It can handle millions of simultaneous connections without breaking a sweat because it doesn't have to decrypt SSL or parse headers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The main downside mentioned in architectural circles is the NAT trap. Because a Layer 4 balancer only sees IP addresses and ports, it often relies on the source IP to keep the connection sticky. If you have thousands of users in a single office building all sharing one public IP address, the balancer might accidentally send every single one of them to the same backend server. That server will quickly get overwhelmed while the rest of your fleet sits idle.&lt;/p&gt;
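&lt;p&gt;A tiny simulation makes the NAT trap concrete. The hashing scheme below is a simplified stand-in for real source-IP affinity, not any particular balancer's algorithm:&lt;/p&gt;

```python
import zlib

servers = ["app-1", "app-2", "app-3"]

def pick_server(client_ip: str) -> str:
    # Layer 4 stickiness: route purely on a hash of the source IP.
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]

# A thousand office users behind one NAT gateway share one public IP,
# so every single session hashes to the same backend.
office_choices = {pick_server("203.0.113.7") for _ in range(1000)}
print(office_choices)
```

&lt;p&gt;The set contains exactly one server name: the entire building lands on the same backend while the other two sit idle.&lt;/p&gt;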

&lt;h2&gt;
  
  
  The intelligence of Layer 7
&lt;/h2&gt;

&lt;p&gt;A Layer 7 load balancer operates at the Application layer and actually understands the HTTP protocol. It is more like a sophisticated concierge who opens the mail to understand exactly who it is for and what they need. This balancer intercepts the traffic, decrypts the SSL, and reads the HTTP headers.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It explicitly sees the Upgrade and Connection headers that define a WebSocket.&lt;/li&gt;
&lt;li&gt;Because it reads the headers and cookies, it can route users based on session IDs rather than IP addresses. This completely avoids the NAT trap because every user has a unique cookie even if they share an IP.&lt;/li&gt;
&lt;li&gt;You can use path based routing to send specific types of traffic to different server groups. You could send chat traffic to one group and live feeds to another.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The performance trade off here is significant. The balancer has to maintain the state of the persistent WebSocket connection while continuously proxying the decrypted frames back and forth. This requires significantly more RAM and CPU than a simpler Layer 4 setup.&lt;/p&gt;
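&lt;p&gt;For reference, the Layer 7 behavior described above is roughly what a typical NGINX WebSocket proxy block configures. The upstream addresses and the &lt;code&gt;/ws/&lt;/code&gt; path are placeholders:&lt;/p&gt;

```nginx
upstream ws_backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 443 ssl;

    location /ws/ {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;                    # HTTP/1.1 is required for Upgrade
        proxy_set_header Upgrade $http_upgrade;    # forward the WebSocket handshake
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;                  # keep idle long-lived connections open
    }
}
```

&lt;p&gt;Because the proxy reads and rewrites these headers, it is doing exactly the per-connection bookkeeping that costs the extra RAM and CPU.&lt;/p&gt;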

&lt;h2&gt;
  
  
  The hybrid approach for global scale
&lt;/h2&gt;

&lt;p&gt;Many massive global applications like Discord and Slack do not choose just one layer. They use a hybrid approach that provides the best of both worlds. They place highly resilient hardware Layer 4 balancers at the network edge to absorb massive traffic spikes and defend against DDoS attacks.&lt;/p&gt;

&lt;p&gt;These edge balancers then distribute the traffic to an internal fleet of software based Layer 7 balancers like NGINX or HAProxy. This second fleet handles the smart routing and the persistence needed for the WebSocket lifecycle. This layered strategy provides the raw horsepower to handle the initial connection and the intelligence to manage the application state once it is established.&lt;/p&gt;




&lt;p&gt;Originally published at: &lt;a href="https://looppass.mindmeld360.com/blog/load-balancing-websockets-l4-l7/" rel="noopener noreferrer"&gt;https://looppass.mindmeld360.com/blog/load-balancing-websockets-l4-l7/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>network</category>
      <category>architecture</category>
      <category>cloud</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>How to Actually use Python's heapq for Kth Largest Problems</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Sun, 08 Mar 2026 09:41:45 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/how-to-actually-use-pythons-heapq-for-kth-largest-problems-5138</link>
      <guid>https://forem.com/tomerbendavid/how-to-actually-use-pythons-heapq-for-kth-largest-problems-5138</guid>
      <description>&lt;p&gt;If you're using Python for coding interviews, &lt;code&gt;heapq&lt;/code&gt; is your best choice for priority queues. But it has a massive quirk that trips up almost everyone. &lt;strong&gt;It only supports min heaps.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you try to use &lt;code&gt;heapq.heapify_max()&lt;/code&gt;, your code will raise an &lt;code&gt;AttributeError&lt;/code&gt; on most platforms (the max-heap functions only became public API in Python 3.14).&lt;/p&gt;

&lt;p&gt;So, how do you find the Kth &lt;em&gt;largest&lt;/em&gt; element if you only have a &lt;em&gt;min&lt;/em&gt; heap? &lt;/p&gt;

&lt;p&gt;There is a brute force way, and there is the way interviewers actually want to see.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brute force with negation
&lt;/h2&gt;

&lt;p&gt;Since &lt;code&gt;heapq&lt;/code&gt; always puts the smallest element at index 0, you can fake a max heap by making all your numbers negative. The largest positive number becomes the smallest negative number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;heapq&lt;/span&gt;

&lt;span class="n"&gt;nums&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;max_heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;heapq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heapify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_heap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The root is now -6
&lt;/span&gt;&lt;span class="n"&gt;largest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;max_heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works fine for small arrays. But if an interviewer asks you to get the top 100 values from a stream of a billion numbers, storing every single number in memory is extremely inefficient. You need a better strategy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The efficient Min heap strategy
&lt;/h2&gt;

&lt;p&gt;Instead of putting all the numbers into a max heap, put exactly &lt;code&gt;K&lt;/code&gt; numbers into a min heap.&lt;/p&gt;

&lt;p&gt;Think of it like keeping a running "Top 10" list. The root of a min heap (&lt;code&gt;heap[0]&lt;/code&gt;) is always the smallest element. If your heap is exactly size &lt;code&gt;K&lt;/code&gt;, the root is the smallest of your top &lt;code&gt;K&lt;/code&gt; numbers. &lt;/p&gt;

&lt;p&gt;As you stream through the rest of the data, if you see a new number that is bigger than your root, it belongs in the Top K. You kick the root out, and put the new number in.&lt;/p&gt;

&lt;p&gt;First, you start by creating a heap with only the first &lt;code&gt;K&lt;/code&gt; elements.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;heapq&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_kth_largest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Start our list with the first K elements
&lt;/span&gt;    &lt;span class="n"&gt;heap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;heapq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heapify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you iterate through the remaining numbers. If a new number is larger than the root of our heap, it means the root is no longer in the Top K. You replace it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Go through the rest of the numbers
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;heapq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heapreplace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, the root of your heap will be the Kth largest element overall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;heap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Interviewers Care
&lt;/h2&gt;

&lt;p&gt;This exact pattern solves the massive streaming data problem perfectly. &lt;/p&gt;

&lt;p&gt;Because you only ever store &lt;code&gt;K&lt;/code&gt; elements at a time, your Space Complexity is &lt;code&gt;O(K)&lt;/code&gt;. It takes virtually zero memory. &lt;/p&gt;

&lt;p&gt;Your Time Complexity is &lt;code&gt;O(N log K)&lt;/code&gt;. You look at every number once (&lt;code&gt;N&lt;/code&gt;), and occasionally do a heap replacement operation that takes logarithmic time based on the small size of &lt;code&gt;K&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So next time you are asked for the &lt;code&gt;K&lt;/code&gt; largest items, do not reach for a max heap. Use a min heap, cap it at size &lt;code&gt;K&lt;/code&gt;, and only let the big numbers in.&lt;/p&gt;
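&lt;p&gt;As a final sanity check, the standard library also bundles this exact pattern: &lt;code&gt;heapq.nlargest&lt;/code&gt; returns the top &lt;code&gt;K&lt;/code&gt; values in descending order, so the Kth largest is simply its last element. It is worth knowing even if an interviewer asks you to implement the heap logic yourself.&lt;/p&gt;

```python
import heapq

nums = [3, 2, 1, 5, 6, 4]
k = 2

top_k = heapq.nlargest(k, nums)   # the K largest values, descending
kth_largest = top_k[-1]
print(kth_largest)
```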

</description>
      <category>python</category>
      <category>interview</category>
      <category>career</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>Integrating Local GenAI into Desktop Applications: Lessons from RexIDE</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/integrating-local-genai-into-desktop-applications-lessons-from-rexide-1l14</link>
      <guid>https://forem.com/tomerbendavid/integrating-local-genai-into-desktop-applications-lessons-from-rexide-1l14</guid>
      <description>&lt;p&gt;&lt;strong&gt;How we navigated the engineering challenges of embedding local AI models and agentic CLIs directly into a native desktop environment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RexIDE started as a personal frustration.&lt;/p&gt;

&lt;p&gt;Modern IDEs are powerful, but they weren’t designed for a world where AI agents are &lt;em&gt;active participants&lt;/em&gt; in your workflow. They assume short lived commands, stateless tools, and human only context switching. That model breaks down the moment you introduce long running AI agents, real terminals, and multi project execution.&lt;/p&gt;

&lt;p&gt;This post walks through how RexIDE was designed, the tradeoffs behind its architecture, and why a &lt;strong&gt;local first, execution centric&lt;/strong&gt; approach became the core principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Persistent Terminal State
&lt;/h2&gt;

&lt;p&gt;The primary goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep context alive, across projects, terminals, and AI agents, without forcing the developer to think about infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That goal immediately shaped every technical decision that followed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Tradeoffs of Local Execution
&lt;/h2&gt;

&lt;p&gt;One of the earliest decisions was whether AI execution should happen in the cloud or directly on the developer’s machine. Cloud models offer excellent quality, but they introduce friction through API keys and billing management, trust concerns around proprietary code, and a heavy dependency on latency and availability.&lt;/p&gt;

&lt;p&gt;Local models remove those concerns entirely. They keep code on the machine, work offline, and feel instant when integrated correctly.&lt;/p&gt;

&lt;p&gt;RexIDE was designed &lt;strong&gt;local first by default&lt;/strong&gt;, with the option to layer in cloud models only when the user explicitly opts in. Privacy and control are the baseline, not premium features.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on Codex and the Recent Shift
&lt;/h2&gt;

&lt;p&gt;Recently, OpenAI launched the Codex desktop app, which meaningfully validates the direction RexIDE took early on: local execution with persistent context.&lt;/p&gt;

&lt;p&gt;Codex today focuses on a single toolchain, the Codex ecosystem, and does a solid job at solving the local, long running AI workflow problem within that scope.&lt;/p&gt;

&lt;p&gt;RexIDE takes a broader approach. Instead of committing to a single AI provider or tool, it was designed from the start to act as an orchestrator for multiple local AI CLIs across platforms, including Claude Code, Codex CLI, and OpenCode. All of these run locally on macOS, Windows, and Linux, side by side, inside the same execution centric environment.&lt;/p&gt;

&lt;p&gt;This reflects how many developers already work today: using multiple AI tools side by side, depending on the task at hand. The environment should adapt to that reality rather than force consolidation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Selection and Resource Constraints
&lt;/h2&gt;

&lt;p&gt;Running AI models locally isn’t free: CPU, memory, and energy usage matter, especially on a machine you actively work on. RexIDE intentionally uses multiple layers of local AI execution. It utilizes external local CLIs such as Claude Code, Codex CLI, and similar tools for full reasoning and agent-driven workflows, while also employing embedded lightweight local models for smaller, fast tasks like snippet analysis, summarization, and structural understanding directly inside the app.&lt;/p&gt;

&lt;p&gt;Instead of chasing the largest model possible, RexIDE follows a simple rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use the smallest model that reliably meets the task’s requirements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lightweight embedded models handle frequent, low-latency tasks without context switching, while heavier reasoning is delegated to specialized local CLIs that already excel at those workflows.&lt;/p&gt;

&lt;p&gt;Multiple model sizes were tested against real workflows including transcription, summarization, and code understanding while monitoring latency, sustained CPU usage, and memory pressure. The selected models stay well within acceptable resource bounds, ensuring they don’t interfere with compilers, editors, or other foreground tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Native PTY Execution and State Persistence
&lt;/h2&gt;

&lt;p&gt;Most IDEs optimize for editing, but RexIDE optimizes for execution. That means providing real terminals rather than simulated ones, maintaining long running processes that don’t reset when focus changes, and enabling AI agents that operate inside the same execution context as the developer.&lt;/p&gt;

&lt;p&gt;This approach eliminates a huge amount of mental overhead. You don’t restart tasks, re-explain context, or reconstruct state — everything stays alive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering Stateless Backend Boundaries
&lt;/h2&gt;

&lt;p&gt;RexIDE doesn’t require a backend to function, but it was designed with one in mind. If a backend were introduced, it would follow a few strict principles: stateless request handling, explicit separation between compute, user state, and storage, and strong session isolation to prevent data leakage.&lt;/p&gt;

&lt;p&gt;The client would remain the source of truth for execution context, with the backend acting only as an optional accelerator — never a dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resource Management and Background Throttling
&lt;/h2&gt;

&lt;p&gt;Performance isn’t something you optimize later; it is a core part of the user experience. RexIDE treats system resources with respect by ensuring heavy work runs off the main thread and AI workloads throttle when the app is backgrounded.&lt;/p&gt;

&lt;p&gt;If the tool ever feels like it’s “in the way,” it has failed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reversible Architectural Decisions
&lt;/h2&gt;

&lt;p&gt;Early design decisions are rarely perfect. RexIDE was built with reversibility in mind.&lt;/p&gt;

&lt;p&gt;Short, time boxed prototypes were preferred over long debates. Decisions were explicitly labeled as reversible or irreversible, which made it easier to move fast without locking the project into bad paths. That mindset allowed rapid iteration without accumulating architectural debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;RexIDE isn’t trying to be another editor with AI bolted on. It’s an execution environment where context persists, AI agents feel native, and the developer stays in control.&lt;/p&gt;

&lt;p&gt;Everything else is a consequence of that choice.&lt;/p&gt;

&lt;p&gt;If you’re building tools for developers today, the question isn’t whether to add AI — it’s &lt;strong&gt;where it lives, how much context it gets, and who ultimately controls it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;RexIDE represents one way to approach that problem.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>programming</category>
      <category>coding</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Lambda Pricing 2026 Guide</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Mon, 02 Feb 2026 13:16:55 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/aws-lambda-pricing-2026-guide-5dnf</link>
      <guid>https://forem.com/tomerbendavid/aws-lambda-pricing-2026-guide-5dnf</guid>
      <description>&lt;p&gt;AWS Lambda is the "serverless" gold standard for a service that lets you run code without managing any servers. You only pay for what you use, but if you don't understand the rules, your bill can grow surprisingly fast.&lt;/p&gt;

&lt;p&gt;Here is everything you need to know about Lambda pricing in a clear, simple guide for 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Two Main Costs: Requests and Duration
&lt;/h2&gt;

&lt;p&gt;AWS calculates your bill using two primary factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Requests:&lt;/strong&gt; You are charged for the total number of times your functions start running.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Duration:&lt;/strong&gt; You are charged for the time it takes your code to execute, rounded to the nearest &lt;strong&gt;1 millisecond&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Free Tier (The Good News)
&lt;/h3&gt;

&lt;p&gt;Every month, AWS gives you &lt;strong&gt;1 million requests&lt;/strong&gt; and &lt;strong&gt;400,000 GB-seconds&lt;/strong&gt; of compute time for free. The best part? This free allowance never expires.&lt;/p&gt;
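To make the free tier concrete, here is a minimal sketch of how a monthly bill is computed. The rates are the long-published x86 us-east-1 figures ($0.20 per million requests, $0.0000166667 per GB-second); they vary by region and architecture, so treat them as illustrative and check the official pricing page.

```java
import java.util.Locale;

// Illustrative Lambda bill estimator. Rates and free-tier numbers below are
// the published x86 figures at time of writing -- verify before relying on them.
public class LambdaCostEstimate {
    static final double PRICE_PER_REQUEST = 0.20 / 1_000_000;   // $0.20 per 1M requests
    static final double PRICE_PER_GB_SECOND = 0.0000166667;     // x86 duration rate
    static final long FREE_REQUESTS = 1_000_000;                // monthly free tier
    static final double FREE_GB_SECONDS = 400_000;              // monthly free tier

    static double monthlyCost(long requests, double avgDurationMs, int memoryMb) {
        // GB-seconds = invocations * duration (s) * memory (GB)
        double gbSeconds = requests * (avgDurationMs / 1000.0) * (memoryMb / 1024.0);
        double billableRequests = Math.max(0, requests - FREE_REQUESTS);
        double billableGbSeconds = Math.max(0.0, gbSeconds - FREE_GB_SECONDS);
        return billableRequests * PRICE_PER_REQUEST
             + billableGbSeconds * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // 5M requests/month, 120 ms average, 512 MB: 300,000 GB-s stays inside
        // the free compute tier, so only the extra 4M requests are billed.
        System.out.printf(Locale.US, "$%.2f%n", monthlyCost(5_000_000, 120, 512));
    }
}
```

Note how a realistic workload can sit entirely inside the compute free tier while still paying for requests.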

&lt;h2&gt;
  
  
  2. The "Cold Start" Cost Shift (New for 2025)
&lt;/h2&gt;

&lt;p&gt;A "cold start" happens when Lambda has to set up a new environment to run your code. This used to be a performance problem; now it's a budget problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Update:&lt;/strong&gt; As of August 2025, AWS now bills for the initialization (&lt;strong&gt;INIT&lt;/strong&gt;) phase of a cold start. Before this change, the setup time was mostly free. Now, it’s a recurring budget item, especially for heavy runtimes like &lt;strong&gt;Java&lt;/strong&gt; or &lt;strong&gt;C#&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Three Simple Ways to Save (Up to 34%)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tip 1: Switch to ARM (Graviton2)
&lt;/h3&gt;

&lt;p&gt;Most Lambda functions run on x86 processors by default. However, switching to ARM-based Graviton2 processors can deliver up to &lt;strong&gt;34% better price-performance&lt;/strong&gt;, at roughly &lt;strong&gt;20% lower cost per millisecond&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tip 2: "Right-Size" Your Memory
&lt;/h3&gt;

&lt;p&gt;When you give your function more memory (RAM), AWS automatically gives it more CPU power.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Too little memory:&lt;/strong&gt; Your code runs so slowly that you end up paying &lt;em&gt;more&lt;/em&gt; in duration charges.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Too much memory:&lt;/strong&gt; You might give your code more CPU than it can actually use.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Pro Tip:&lt;/strong&gt; Use tools like &lt;strong&gt;AWS Lambda Power Tuning&lt;/strong&gt; to find the "sweet spot" where speed and cost intersect.&lt;/li&gt;
&lt;/ul&gt;
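The counterintuitive part of right-sizing is that more memory can mean a smaller bill, because duration is billed in GB-seconds and more memory buys more CPU. A hypothetical comparison (the durations are made-up, the rate is the published x86 figure):

```java
// Hypothetical right-sizing comparison: duration is priced per GB-second, so a
// run that is 4x more expensive per millisecond but 5x faster costs less overall.
public class RightSizing {
    static final double PRICE_PER_GB_SECOND = 0.0000166667; // published x86 rate

    static double costPerInvocation(int memoryMb, double durationMs) {
        return (memoryMb / 1024.0) * (durationMs / 1000.0) * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        double starved = costPerInvocation(128, 1000); // too little CPU: slow
        double sized   = costPerInvocation(512, 200);  // 4x memory, 5x faster
        System.out.println(sized < starved);           // faster AND cheaper
    }
}
```

Tools like AWS Lambda Power Tuning automate exactly this comparison against your real function.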

&lt;h3&gt;
  
  
  Tip 3: The "Lambda-Less" Approach
&lt;/h3&gt;

&lt;p&gt;The cheapest Lambda is the one you don't run. Many AWS services—like &lt;strong&gt;API Gateway&lt;/strong&gt;, &lt;strong&gt;AppSync&lt;/strong&gt;, and &lt;strong&gt;EventBridge Pipes&lt;/strong&gt;—can talk directly to databases (DynamoDB) or queues (SQS) without needing a Lambda function in the middle. This eliminates compute costs and reduces latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Pro-Tip: Don't Spend Money Waiting
&lt;/h2&gt;

&lt;p&gt;For complex, multi-step workflows that need to "wait" for something to happen, don't use Lambda to manage the wait. Use &lt;strong&gt;AWS Step Functions&lt;/strong&gt; instead. You don’t pay for the time Step Functions sits idle, whereas a Lambda function would bill you for every second it spends waiting.&lt;/p&gt;




&lt;h3&gt;
  
  
  Citations &amp;amp; Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://aws.amazon.com/lambda/pricing/" rel="noopener noreferrer"&gt;AWS Lambda Pricing Official Page&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.cloudzero.com/blog/lambda-pricing/" rel="noopener noreferrer"&gt;CloudZero: AWS Lambda Pricing Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://edgedelta.com/company/knowledge-center/aws-lambda-cold-start-cost" rel="noopener noreferrer"&gt;EdgeDelta: Lambda Cold Start Costs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning" rel="noopener noreferrer"&gt;AWS Lambda Power Tuning Tool&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/awslabs/llrt" rel="noopener noreferrer"&gt;LLRT: Low Latency Runtime for Lambda&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://aws.amazon.com/step-functions/pricing/" rel="noopener noreferrer"&gt;AWS Step Functions Pricing&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>programming</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>The Almost Correct System</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Sun, 25 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/the-hidden-cost-of-almost-correct-systems-4hoa</link>
      <guid>https://forem.com/tomerbendavid/the-hidden-cost-of-almost-correct-systems-4hoa</guid>
      <description>&lt;p&gt;In modern service and cloud architectures, the most painful production failures aren’t usually caused by "bad code" in the traditional sense.&lt;/p&gt;

&lt;p&gt;They’re caused by &lt;strong&gt;good code making different assumptions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the reality of distributed systems. It’s uncomfortable to hear, especially if you’re a careful engineer who writes tests, handles errors, and thinks about edge cases. But once you see this pattern, you’ll start noticing it everywhere, from microservice outages to distributed deadlocks and system design interview questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Baseline: Why "Working Code" != A Working System
&lt;/h2&gt;

&lt;p&gt;We naturally test the things we can control: the client, the API, the database. We run integration tests between them. If every individual component returns the correct output for a given input, we say the code is "correct."&lt;/p&gt;

&lt;p&gt;In a simple, local program, this is the ground truth. If every function is correct, the program is correct. But in a distributed cloud architecture, this logic breaks down. You can have three "correct" services that, when combined, create a catastrophic failure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The failure isn’t usually inside your code, it’s in the space between your services.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each component is built with &lt;strong&gt;assumptions&lt;/strong&gt; about how the rest of the system behaves. When those assumptions don’t match, the system becomes fragile, even if every line of code is technically perfect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The assumption mismatch in practice
&lt;/h2&gt;

&lt;p&gt;Let’s look at something boring on purpose: &lt;strong&gt;timeouts&lt;/strong&gt;. Imagine this setup where every value looks reasonable on its own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Client timeout:&lt;/strong&gt; 2 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load balancer timeout:&lt;/strong&gt; 5 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend service timeout:&lt;/strong&gt; 30 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database timeout:&lt;/strong&gt; No limit&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Step-by-Step Failure
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;The client sends a request.&lt;/li&gt;
&lt;li&gt;The backend is slow today (cold cache, lock contention, etc.).&lt;/li&gt;
&lt;li&gt;After &lt;strong&gt;2 seconds&lt;/strong&gt;, the client gives up and retries.&lt;/li&gt;
&lt;li&gt;The original request is &lt;strong&gt;still running&lt;/strong&gt; in the backend (it has 28 seconds left).&lt;/li&gt;
&lt;li&gt;Now the backend is doing the same work twice.&lt;/li&gt;
&lt;li&gt;The database sees double load. Latency increases further.&lt;/li&gt;
&lt;li&gt;More clients retry. The system spirals.&lt;/li&gt;
&lt;/ol&gt;
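A back-of-envelope calculation shows how quickly duplicates stack up under these (hypothetical) timeouts: the client fires a fresh attempt every 2 seconds, while each abandoned attempt keeps running in the backend for up to 30 seconds.

```java
// Rough arithmetic for the spiral above: one new attempt per client-timeout
// window, each staying alive for the full backend timeout.
public class RetryAmplification {
    static long concurrentAttempts(long backendTimeoutMs, long clientTimeoutMs) {
        return backendTimeoutMs / clientTimeoutMs;
    }

    public static void main(String[] args) {
        // Up to 15 copies of the same work running at once for one logical request.
        System.out.println(concurrentAttempts(30_000, 2_000));
    }
}
```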

&lt;p&gt;No single component broke. The database didn't crash; the service didn't leak memory. The failure emerged from &lt;strong&gt;how their assumptions interacted.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bridging the gap with explicit contracts
&lt;/h2&gt;

&lt;p&gt;Every boundary in a system has a &lt;strong&gt;contract&lt;/strong&gt;, whether you wrote it down or not. We often rely on implicit contracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"This request finishes quickly"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Retries are safe"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"This operation runs once"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem is that when assumptions are implicit, different parts of the system invent their own version of reality. That’s where "almost correct" systems are born.&lt;/p&gt;

&lt;p&gt;If a client times out at 2 seconds, the backend &lt;em&gt;must&lt;/em&gt; know its work is no longer wanted. If a client retries, the operation &lt;em&gt;must&lt;/em&gt; be idempotent.&lt;/p&gt;
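The idempotency half of that contract is commonly implemented with an idempotency key supplied by the caller. A minimal sketch (all names here are illustrative, not a real API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of an idempotent operation: retries that reuse the caller's
// idempotency key replay the stored result instead of re-executing the work.
public class IdempotentCharge {
    private final Map<String, String> results = new ConcurrentHashMap<>();
    final AtomicInteger executions = new AtomicInteger(); // proves single execution

    public String charge(String idempotencyKey, int amountCents) {
        return results.computeIfAbsent(idempotencyKey, key -> {
            executions.incrementAndGet();          // the side effect runs once
            return "charged " + amountCents + " cents";
        });
    }
}
```

Calling `charge("key-1", 500)` twice returns the same receipt, and the side effect runs exactly once. In a real system the key-to-result map would live in durable storage with an expiry, not in memory.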

&lt;h2&gt;
  
  
  How to reason about boundaries
&lt;/h2&gt;

&lt;p&gt;To move from "Junior" to "Senior" systems thinking, you have to shift your primary question:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Junior-level thinking:&lt;/strong&gt; "Is my code correct?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Senior-level thinking:&lt;/strong&gt; "What assumptions does my code make, and who depends on them?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The longer a system lives, the more &lt;strong&gt;assumption drift&lt;/strong&gt; it accumulates. To combat this, you need to implement alignment strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Align Timeouts:&lt;/strong&gt; Each downstream call should have a &lt;em&gt;shorter&lt;/em&gt; timeout than the caller above it, so work stops before the caller gives up. Better still, use &lt;strong&gt;Deadline Propagation&lt;/strong&gt;, where the remaining time budget is passed along the request chain.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Make Operations Idempotent:&lt;/strong&gt; If a caller assumes they can retry safely, you must assume they &lt;em&gt;will&lt;/em&gt; retry multiple times.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Use Backpressure:&lt;/strong&gt; If you assume the system can handle X load, you must have a way to say "no" when X is exceeded, rather than slowing down for everyone.&lt;/li&gt;
&lt;/ol&gt;
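Deadline propagation can be sketched in a few lines (illustrative, not a real framework API): the caller's absolute deadline travels with the request, and every hop checks the remaining budget before doing work.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of deadline propagation: instead of each hop inventing its own
// timeout, the caller's deadline is passed down and checked at every hop.
public class DeadlineBudget {
    static boolean shouldAttempt(Instant deadline, Duration estimatedWork) {
        Duration remaining = Duration.between(Instant.now(), deadline);
        // Refuse work that cannot finish before the caller gives up.
        return remaining.compareTo(estimatedWork) >= 0;
    }

    public static void main(String[] args) {
        Instant deadline = Instant.now().plusSeconds(2); // the client's 2 s budget
        System.out.println(shouldAttempt(deadline, Duration.ofMillis(100))); // attempt
        System.out.println(shouldAttempt(deadline, Duration.ofSeconds(30))); // fail fast
    }
}
```

In practice the deadline rides along as request metadata (for example a header), so the backend from the earlier scenario would reject work the moment the client's 2-second budget expired, instead of grinding on for 28 more seconds.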

&lt;h2&gt;
  
  
  Why "almost correct" is worse than "broken"
&lt;/h2&gt;

&lt;p&gt;Failing loud is a feature. When a system crashes or returns a 500, you know exactly when and where it broke. Experienced engineers aim for this &lt;strong&gt;"fail fast"&lt;/strong&gt; behavior because it surfaces problems immediately.&lt;/p&gt;

&lt;p&gt;The danger comes from the impulse often seen in junior developers to "handle" every error by hiding it. This leads to the most dangerous state: the almost-correct system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Almost correct systems are quieter and more dangerous:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They pass unit tests.&lt;/li&gt;
&lt;li&gt;They survive staging.&lt;/li&gt;
&lt;li&gt;They fail only under specific load.&lt;/li&gt;
&lt;li&gt;They fail only when timing is unlucky.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These failures are hard to reproduce because &lt;strong&gt;no single line of code is wrong.&lt;/strong&gt; This is why postmortems often sound like: &lt;em&gt;"Everything behaved as designed... just not together."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A systems thinking checklist
&lt;/h2&gt;

&lt;p&gt;When designing or reviewing a system, don’t start with implementation details. Start with &lt;strong&gt;failure questions&lt;/strong&gt; to force assumptions into the open:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;Retries:&lt;/strong&gt; What retries this, and what is the retry budget?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Timeouts:&lt;/strong&gt; Who times out first? Does the work stop when they do?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Idempotency:&lt;/strong&gt; What happens if this exact request runs twice?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Partial Failure:&lt;/strong&gt; What happens if the DB update succeeds but the cache update fails?&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;State:&lt;/strong&gt; What state survives a crash, and what assumption does the &lt;em&gt;next&lt;/em&gt; run make about that state?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Great software isn’t built by eliminating bugs. It’s built by eliminating &lt;strong&gt;surprises&lt;/strong&gt;. These surprises don’t come from bad code; they come from assumptions that were never made explicit.&lt;/p&gt;




&lt;h3&gt;
  
  
  Citations &amp;amp; Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sre.google/sre-book/cascading-failures/" rel="noopener noreferrer"&gt;Google SRE Book: Chapter 22 - Addressing Cascading Failures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/message/5467D2/" rel="noopener noreferrer"&gt;Amazon DynamoDB 2015 Incident Post-mortem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/No_Silver_Bullet" rel="noopener noreferrer"&gt;Fred Brooks: No Silver Bullet — Accident vs. Essence in Software Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/" rel="noopener noreferrer"&gt;Notes on Distributed Systems for Young Bloods&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://rex.mindmeld360.com" rel="noopener noreferrer"&gt;https://rex.mindmeld360.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>testing</category>
      <category>softwarequality</category>
    </item>
    <item>
      <title>Java Memory Model Deep Dive: Visibility, Reordering, and the Truth About Volatile</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/java-memory-model-deep-dive-visibility-reordering-and-the-truth-about-volatile-58gd</link>
      <guid>https://forem.com/tomerbendavid/java-memory-model-deep-dive-visibility-reordering-and-the-truth-about-volatile-58gd</guid>
      <description>&lt;p&gt;In a single-threaded Java program, you are protected by a beautiful lie called &lt;strong&gt;as-if-serial semantics&lt;/strong&gt;. If you write &lt;code&gt;int x = 1; int y = 2;&lt;/code&gt;, the JVM and CPU can reorder those lines however they want to improve performance, but they promise that the &lt;em&gt;result&lt;/em&gt; will be exactly as if they ran in order. Inside that single thread, the reordering is invisible.&lt;/p&gt;

&lt;p&gt;As soon as you introduce a second thread, the lie falls apart. That second thread doesn't see the "as-if-serial" promise; it sees the raw memory as it updates. Code that looks perfectly logical can suddenly fail in ways that seem impossible. This is where the &lt;strong&gt;Java Memory Model (JMM)&lt;/strong&gt; comes in—it is the official "contract" that defines exactly when and how threads are allowed to see each other's changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Core Problem: Performance over Predictability
&lt;/h2&gt;

&lt;p&gt;Most developers assume the JVM executes code exactly line-by-line as written. In reality, the JVM and your CPU are obsessed with speed. To run faster, they perform optimizations that create two main issues: &lt;strong&gt;Reordering&lt;/strong&gt; and &lt;strong&gt;Visibility&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reordering: The "Out of Order" Execution
&lt;/h3&gt;

&lt;p&gt;The compiler or the CPU might decide to swap two instructions if it thinks the final result will be the same.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// What you wrote:&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// What the CPU might actually execute:&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a single thread, this swap doesn't matter. But if another thread is waiting for &lt;code&gt;flag&lt;/code&gt; to be true so it can read &lt;code&gt;a&lt;/code&gt;, it might see &lt;code&gt;flag == true&lt;/code&gt; before &lt;code&gt;a&lt;/code&gt; has actually been set to 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Visibility: The Cache Problem
&lt;/h3&gt;

&lt;p&gt;Modern CPUs have their own local caches (L1, L2, L3). When a thread updates a variable, it might only save that change in its local CPU cache to save time. Other threads, running on different CPU cores, will continue to read the old value from their own caches or main memory. The change is "invisible" to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Solution: "Happens-Before" (HB)
&lt;/h2&gt;

&lt;p&gt;The JMM doesn't promise that everything will always be in order. Instead, it provides a set of rules called the &lt;strong&gt;Happens-Before&lt;/strong&gt; relationship. &lt;/p&gt;

&lt;p&gt;Think of Happens-Before as a "visibility bridge." If Action A happens-before Action B, then any change made by Action A is guaranteed to be visible to the thread performing Action B.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Most Important Rules:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Program Order:&lt;/strong&gt; In a single thread, every action happens-before any action that comes later in the code.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Volatile Variable Rule:&lt;/strong&gt; A write to a &lt;code&gt;volatile&lt;/code&gt; field happens-before every subsequent read of that same field. (This is the "signal" we use to bridge threads).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Monitor Lock Rule:&lt;/strong&gt; Releasing a lock (&lt;code&gt;synchronized&lt;/code&gt;) happens-before any subsequent acquisition of that same lock.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Thread Life Cycle:&lt;/strong&gt; Calling &lt;code&gt;thread.start()&lt;/code&gt; happens-before any action in that thread. All actions in a thread happen-before a successful &lt;code&gt;thread.join()&lt;/code&gt; on that thread.&lt;/li&gt;
&lt;/ol&gt;
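Rule 2 is the one most often used to bridge threads. A minimal sketch of how a volatile write publishes a preceding plain write:

```java
// The volatile write to `flag` happens-before any subsequent read of `flag`,
// which makes the earlier plain write to `a` visible to the reading thread.
public class VolatileBridge {
    int a;                    // plain field
    volatile boolean flag;    // the "signal"

    void writer() {
        a = 1;                // ordered before the volatile write below
        flag = true;          // volatile write: publishes everything above it
    }

    Integer reader() {
        if (flag) {           // volatile read: pairs with the write
            return a;         // guaranteed to observe 1, never 0
        }
        return null;          // the flag is not yet visible
    }
}
```

Without `volatile` on `flag`, a reader could legally observe `flag == true` while still seeing the stale `a == 0`.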

&lt;h2&gt;
  
  
  3. The &lt;code&gt;volatile&lt;/code&gt; Modifier: A Modern Guide
&lt;/h2&gt;

&lt;p&gt;A common mistake is thinking &lt;code&gt;volatile&lt;/code&gt; is just for "disabling caches." It's more powerful than that. &lt;/p&gt;

&lt;p&gt;When you write to a &lt;code&gt;volatile&lt;/code&gt; variable, the JVM ensures two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Visibility:&lt;/strong&gt; The write is immediately flushed to main memory, and any subsequent read will pull the latest value.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Ordering (The Barrier):&lt;/strong&gt; The JVM prevents instructions from being reordered around the volatile read/write. It acts as a "memory barrier."&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What &lt;code&gt;volatile&lt;/code&gt; does NOT do: Atomicity
&lt;/h3&gt;

&lt;p&gt;This is the biggest landmine in Java. &lt;code&gt;volatile&lt;/code&gt; does &lt;strong&gt;not&lt;/strong&gt; make compound operations atomic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;++;&lt;/span&gt; &lt;span class="c1"&gt;// NOT THREAD-SAFE&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;count++&lt;/code&gt; is actually three steps: read, add 1, write. If two threads do this at the same time, they might both read the same value, add 1, and write the same result back, losing one increment. For this, you need &lt;code&gt;AtomicInteger&lt;/code&gt; or &lt;code&gt;synchronized&lt;/code&gt;.&lt;/p&gt;
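The standard fix looks like this: `AtomicInteger` collapses the read-add-write into a single atomic step, so no increments are lost even under contention.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SafeCounter {
    private final AtomicInteger count = new AtomicInteger();

    public void increment() {
        count.incrementAndGet(); // one atomic read-modify-write, no lost updates
    }

    public int get() {
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeCounter counter = new SafeCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // always 40000; a volatile int would lose updates
    }
}
```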

&lt;h2&gt;
  
  
  4. Unsafe Publication: The Half-Initialized Object
&lt;/h2&gt;

&lt;p&gt;One of the strangest bugs in Java is when a thread sees an object that is "half-initialized."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Thread A&lt;/span&gt;
&lt;span class="n"&gt;shared&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Helper&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Thread B&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;x&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Could print 0 instead of 42!&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because of reordering, the CPU might assign the memory address of the new &lt;code&gt;Helper&lt;/code&gt; object to the &lt;code&gt;shared&lt;/code&gt; variable &lt;em&gt;before&lt;/em&gt; the constructor has finished setting &lt;code&gt;x = 42&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to fix this (Safe Publication):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Make the &lt;code&gt;shared&lt;/code&gt; field &lt;code&gt;volatile&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Initialize it inside a &lt;code&gt;synchronized&lt;/code&gt; block.&lt;/li&gt;
&lt;li&gt;  Make the fields inside the object &lt;code&gt;final&lt;/code&gt;. The JMM gives special visibility guarantees to &lt;code&gt;final&lt;/code&gt; fields once the constructor finishes.&lt;/li&gt;
&lt;/ul&gt;
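The `final` option is the cheapest of the three. A sketch of a safely publishable `Helper`:

```java
// Safe publication via final fields: once the constructor returns, any thread
// that obtains a reference to this object is guaranteed to see x == 42, even
// through a plain (non-volatile) shared field and without any locks.
public class Helper {
    final int x; // final-field semantics: frozen at the end of construction

    Helper(int x) {
        this.x = x;
    }
}
```

The guarantee holds only if `this` does not escape the constructor before it finishes; publish the reference after construction completes.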

&lt;h2&gt;
  
  
  5. Double-Checked Locking (DCL)
&lt;/h2&gt;

&lt;p&gt;The classic way to create a lazy singleton safely is the Double-Checked Locking pattern. It relies heavily on &lt;code&gt;volatile&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;volatile&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt; &lt;span class="nf"&gt;getResource&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Resource&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// First check (no locking)&lt;/span&gt;
        &lt;span class="kd"&gt;synchronized&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// Second check (with locking)&lt;/span&gt;
                &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: We use the local variable &lt;code&gt;result&lt;/code&gt; to reduce the number of times we have to read the &lt;code&gt;volatile&lt;/code&gt; field, which is a small performance optimization.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Beyond Volatile: VarHandles (Java 9+)
&lt;/h2&gt;

&lt;p&gt;In modern Java, if you need even more control than &lt;code&gt;volatile&lt;/code&gt; provides, you can use the &lt;code&gt;VarHandle&lt;/code&gt; API. It allows you to choose exactly how much "strictness" you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Opaque:&lt;/strong&gt; Ensures the value isn't cached, but allows reordering.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Acquire/Release:&lt;/strong&gt; A lighter version of volatile that only enforces ordering in one direction.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Volatile:&lt;/strong&gt; The full-strength version we discussed.&lt;/li&gt;
&lt;/ul&gt;
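A small sketch of the acquire/release mode, which is often all a publish/consume pattern needs:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Acquire/release via VarHandle (Java 9+): setRelease orders earlier writes
// before the store, getAcquire orders later reads after the load -- a lighter
// pairing than full volatile, which also enforces a global ordering.
public class Cell {
    private int value;
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Cell.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int v) {
        VALUE.setRelease(this, v);           // release store
    }

    int read() {
        return (int) VALUE.getAcquire(this); // acquire load
    }
}
```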

&lt;h2&gt;
  
  
  Practical Checklist for Concurrent Code
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Is the variable shared?&lt;/strong&gt; If yes, it must be protected by &lt;code&gt;volatile&lt;/code&gt;, &lt;code&gt;Atomic&lt;/code&gt; classes, or a lock.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Are you doing more than a simple write?&lt;/strong&gt; If you are reading-then-writing (like &lt;code&gt;count++&lt;/code&gt;), &lt;code&gt;volatile&lt;/code&gt; is not enough. Use &lt;code&gt;AtomicInteger&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Is your object fully built?&lt;/strong&gt; Never let the &lt;code&gt;this&lt;/code&gt; reference "escape" from a constructor (e.g., by passing it to another thread) before the constructor is finished.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Can you use &lt;code&gt;final&lt;/code&gt;?&lt;/strong&gt; Always prefer &lt;code&gt;final&lt;/code&gt; fields. They are the simplest way to ensure thread-safety for data that doesn't change.&lt;/li&gt;
&lt;/ol&gt;
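&lt;p&gt;A minimal sketch of checklist item 2: &lt;code&gt;count++&lt;/code&gt; is a read-then-write, so even a &lt;code&gt;volatile&lt;/code&gt; field can lose updates under contention, while &lt;code&gt;AtomicInteger&lt;/code&gt; makes the whole step atomic:&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative counter: increments must be atomic read-modify-writes.
class Counter {
    private final AtomicInteger count = new AtomicInteger();

    int increment() {
        // incrementAndGet() performs the read, the add, and the write
        // as one atomic step, unlike count++ on a volatile int.
        return count.incrementAndGet();
    }

    int current() {
        return count.get();
    }
}
```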




&lt;h3&gt;
  
  
  Citations &amp;amp; Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://docs.oracle.com/javase/specs/jls/se17/html/jls-17.html" rel="noopener noreferrer"&gt;Java Language Specification, Chapter 17: Threads and Locks&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html" rel="noopener noreferrer"&gt;JSR-133 (Java Memory Model) FAQ&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/" rel="noopener noreferrer"&gt;Aleksey Shipilёv: JMM Pragmatics&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/VarHandle.html" rel="noopener noreferrer"&gt;VarHandle API Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Originally published at &lt;a href="https://rex.mindmeld360.com" rel="noopener noreferrer"&gt;https://rex.mindmeld360.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>java</category>
      <category>concurrency</category>
      <category>jvm</category>
      <category>multithreading</category>
    </item>
    <item>
      <title>Deduce, Don't Store</title>
      <dc:creator>Tomer Ben David</dc:creator>
      <pubDate>Tue, 23 Dec 2025 11:46:34 +0000</pubDate>
      <link>https://forem.com/tomerbendavid/deduce-dont-store-5adn</link>
      <guid>https://forem.com/tomerbendavid/deduce-dont-store-5adn</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;One of the most common sources of bugs in complex applications is stale state. When you store a value that depends on another piece of data, you create a requirement to keep those two values in sync. If you miss a single update path, your application enters an invalid state that can be incredibly difficult to debug.&lt;/p&gt;

&lt;p&gt;We want to be strict about state management: deduce state from the source of truth rather than storing it. While we implement this using Swift computed properties, it is important to note that this is not a language-specific trick. It is a fundamental engineering practice used in reliable systems across all platforms to eliminate data synchronization errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Danger of Cached Values
&lt;/h2&gt;

&lt;p&gt;In traditional application development, it is tempting to store boolean flags like &lt;code&gt;isLoaded&lt;/code&gt; or &lt;code&gt;isAuthorized&lt;/code&gt; as mutable properties. However, these flags are rarely the actual source of truth. The true state lives in your data collection or your active session token.&lt;/p&gt;

&lt;p&gt;By storing these flags separately, you are essentially caching a view of reality. If the data is cleared or the token expires, your stored flag becomes a lie. This leads to edge cases where the UI shows one thing while the underlying system is doing another.&lt;/p&gt;
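&lt;p&gt;A hypothetical sketch of how such a cached flag goes stale (the &lt;code&gt;SessionManager&lt;/code&gt; here is illustrative, not from a real codebase):&lt;/p&gt;

```swift
// Hypothetical sketch of the stale-flag anti-pattern: the stored
// `isAuthorized` flag caches a view of reality and silently goes stale.
final class SessionManager {
    private var token: String?

    // Cached flag: every code path that touches `token` must remember
    // to update it.
    private(set) var isAuthorized = false

    func logIn(token: String) {
        self.token = token
        isAuthorized = true
    }

    func expireToken() {
        token = nil
        // Bug: we forgot to reset `isAuthorized`. The flag is now a lie.
    }
}
```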

&lt;h2&gt;
  
  
  Computing State from the Source of Truth
&lt;/h2&gt;

&lt;p&gt;To ensure that the application always reflects the current reality, we prefer to compute state directly from the service that owns the data. Instead of updating a status variable whenever an action occurs, we query the state in real time.&lt;/p&gt;

&lt;p&gt;This approach ensures that there is only one place where an update can happen. Every other component simply observes or deduces its logic from that single point.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;/// Deducing state from a dependency&lt;/span&gt;
&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="kt"&gt;InventoryManager&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;StorageProvider&lt;/span&gt;

    &lt;span class="c1"&gt;/// The true source of truth is the actual data in storage&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;needsReplenishment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// We compute this every time it is needed&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;itemCount&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

    &lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;StorageProvider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By making &lt;code&gt;needsReplenishment&lt;/code&gt; a computed property, we eliminate the possibility of it ever being out of sync with the &lt;code&gt;storage&lt;/code&gt;. There is no &lt;code&gt;setNeedsReplenishment(true)&lt;/code&gt; method to call, which removes an entire category of logic errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolating Side Effects in the Deduction Loop
&lt;/h2&gt;

&lt;p&gt;Deducing state is not just about simple booleans. It is a philosophy that extends to complex UI transitions and background operations. When you need to decide whether to show a specific view or enable an action, you should deduce that decision from the current environment rather than from stored flags.&lt;/p&gt;

&lt;p&gt;In our core architecture, we use services that provide these environmental snapshots. For example, if we need to know if a permission is granted, we do not check a stored &lt;code&gt;isPermitted&lt;/code&gt; flag. Instead, we query a provider that evaluates the current system settings in real time. This ensures that the app reacts immediately to changes without needing complex synchronization code.&lt;/p&gt;
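&lt;p&gt;A sketch of such a provider (the names are illustrative; a real implementation would query the system's authorization APIs at the point marked below):&lt;/p&gt;

```swift
// Hypothetical sketch of a provider that evaluates permission state in
// real time instead of caching an `isPermitted` flag.
protocol PermissionProvider {
    func isGranted() -> Bool
}

struct NotificationPermissionProvider: PermissionProvider {
    // In a real app this closure would query the system's current
    // authorization status; injecting it keeps the sketch self-contained.
    let currentStatus: () -> Bool

    func isGranted() -> Bool {
        currentStatus() // evaluated fresh on every call, nothing to sync
    }
}
```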

&lt;h2&gt;
  
  
  Benefits of Reduced Complexity
&lt;/h2&gt;

&lt;p&gt;When you stop storing redundant state, your code becomes significantly more predictable. Your models become simpler because they no longer need to manage the lifecycle of cached values. Your tests also become more robust because you only need to mock the primary source of truth to verify hundreds of different deduced outcomes.&lt;/p&gt;
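&lt;p&gt;For example, a test can stub only the storage and then exercise every deduced outcome. The sketch below condenses the &lt;code&gt;InventoryManager&lt;/code&gt; from above and assumes &lt;code&gt;StorageProvider&lt;/code&gt; is a protocol exposing &lt;code&gt;itemCount&lt;/code&gt;:&lt;/p&gt;

```swift
// Sketch: because state is deduced, a test only needs to mock the
// source of truth. InventoryManager is condensed from the example above.
protocol StorageProvider {
    var itemCount: Int { get }
}

final class MockStorage: StorageProvider {
    var itemCount = 0
}

final class InventoryManager {
    private let storage: StorageProvider
    private let threshold = 10

    // Deduced, never stored: true whenever stock is below the threshold.
    var needsReplenishment: Bool { threshold > storage.itemCount }

    init(storage: StorageProvider) {
        self.storage = storage
    }
}
```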

&lt;p&gt;By removing the opportunity for state to become stale, we create a faster and more reliable experience for our users.&lt;/p&gt;

</description>
      <category>sre</category>
      <category>programming</category>
      <category>swift</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
