<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vignesh</title>
    <description>The latest articles on Forem by Vignesh (@spaceduck18).</description>
    <link>https://forem.com/spaceduck18</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817501%2F9e0ed59d-87e9-49b2-824d-5eaed8beb259.jpeg</url>
      <title>Forem: Vignesh</title>
      <link>https://forem.com/spaceduck18</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/spaceduck18"/>
    <language>en</language>
    <item>
      <title>Building a Serverless Semantic Deduplication Engine Under 500ms</title>
      <dc:creator>Vignesh</dc:creator>
      <pubDate>Tue, 10 Mar 2026 21:17:42 +0000</pubDate>
      <link>https://forem.com/spaceduck18/building-a-serverless-semantic-deduplication-engine-under-500ms-published-true-11e6</link>
      <guid>https://forem.com/spaceduck18/building-a-serverless-semantic-deduplication-engine-under-500ms-published-true-11e6</guid>
      <description>&lt;p&gt;As a first-year engineering student diving into distributed systems, I’ve noticed a major inefficiency in how growing engineering teams operate: they don't just duplicate code, they duplicate intent. One squad builds an "Auth Service" while another builds a "Login Backend." Because the two teams use different words, standard Jira or GitHub keyword searches never catch the overlap until the code has already shipped.&lt;/p&gt;

&lt;p&gt;I recently built &lt;strong&gt;Kanso&lt;/strong&gt;, a cloud-native semantic intelligence layer, to solve this. It integrates directly into the IDE and alerts developers when they start typing a feature that already exists elsewhere in the organization.&lt;/p&gt;

&lt;p&gt;Here is a breakdown of the event-driven software architecture patterns I used, and how I kept the entire round-trip semantic validation under a strict 500ms latency budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Architecture Pattern
&lt;/h3&gt;

&lt;p&gt;Kanso operates on a fully decoupled, serverless hub-and-spoke model. The IDE extensions (VS Code, Kiro) act as edge sensors, while the heavy lifting is handled by an AWS serverless backend.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(AWS Architecture Diagram)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp0uscl8ck3wmepk4hdf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp0uscl8ck3wmepk4hdf.png" alt=" " width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion:&lt;/strong&gt; Amazon API Gateway (with JWT Lambda Authorizers for multi-tenant isolation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute:&lt;/strong&gt; AWS Lambda (Python 3.11). Python was the pragmatic choice here for rapid integration with the Amazon Bedrock SDKs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings &amp;amp; Reasoning:&lt;/strong&gt; Amazon Bedrock (Titan for embeddings, Claude 3 Sonnet for reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Database:&lt;/strong&gt; Amazon OpenSearch Serverless (k-NN search)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solving the 500ms Edge Challenge
&lt;/h3&gt;

&lt;p&gt;Even as a student, I know that blocking the main thread of an IDE is a cardinal sin. For this tool to be usable, the time from a developer pausing their typing to receiving a validated semantic alert had to be practically imperceptible. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Edge Debouncing &amp;amp; Caching&lt;/strong&gt;&lt;br&gt;
Instead of firing an API call on every keystroke, the VS Code extension debounces input with a 500ms quiet window. I also added an embedding cache at the edge: if a developer deletes and retypes the same phrase, the extension pulls from the local cache rather than calling API Gateway again, which cuts redundant network calls.&lt;/p&gt;
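
&lt;p&gt;The debounce-and-cache idea can be sketched roughly as follows. This is a minimal illustration, not Kanso's actual code: the extension itself would presumably be written in TypeScript, and Python is used here only to match the backend language. The names &lt;code&gt;EdgeCache&lt;/code&gt;, &lt;code&gt;Debouncer&lt;/code&gt;, and &lt;code&gt;cache_key&lt;/code&gt; are hypothetical, as are the whitespace and case normalization and the LRU eviction policy.&lt;/p&gt;

```python
import hashlib
from collections import OrderedDict


def cache_key(text: str) -> str:
    """Normalize whitespace and case so retyped phrases map to one key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class EdgeCache:
    """Small LRU cache of backend results, keyed by normalized text."""

    def __init__(self, max_entries: int = 256) -> None:
        self._entries: OrderedDict = OrderedDict()
        self._max_entries = max_entries

    def get(self, text: str):
        key = cache_key(text)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, text: str, result) -> None:
        key = cache_key(text)
        self._entries[key] = result
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used


class Debouncer:
    """Reports readiness only after input has been quiet for quiet_ms."""

    def __init__(self, quiet_ms: int = 500) -> None:
        self._quiet_ms = quiet_ms
        self._last_edit_ms = None

    def record_edit(self, now_ms: int) -> None:
        self._last_edit_ms = now_ms  # every keystroke restarts the window

    def should_fire(self, now_ms: int) -> bool:
        if self._last_edit_ms is None:
            return False
        return now_ms - self._last_edit_ms >= self._quiet_ms
```

&lt;p&gt;In this sketch the caller checks &lt;code&gt;should_fire&lt;/code&gt; once typing pauses, looks the text up in the cache, and only calls the backend on a miss. Passing timestamps in explicitly keeps the debouncer easy to test without real timers.&lt;/p&gt;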

&lt;p&gt;&lt;strong&gt;2. Fast Vector Retrieval&lt;/strong&gt;&lt;br&gt;
Once the text reaches the backend, AWS Lambda calls the &lt;strong&gt;Titan&lt;/strong&gt; model to generate a vector embedding. To query historical initiatives quickly, I used &lt;strong&gt;OpenSearch Serverless&lt;/strong&gt;. Its k-NN (k-nearest neighbors) search lets the system scan thousands of partitioned, tenant-specific vectors in milliseconds.&lt;/p&gt;
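
&lt;p&gt;A tenant-scoped k-NN request might look something like the sketch below. The post does not say how tenants are partitioned, so filtering on a &lt;code&gt;tenant_id&lt;/code&gt; field is an assumption (separate indexes per tenant would work too). The field names &lt;code&gt;embedding&lt;/code&gt;, &lt;code&gt;tenant_id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, and &lt;code&gt;status&lt;/code&gt; and the helper &lt;code&gt;build_knn_query&lt;/code&gt; are hypothetical. The function only builds the search body and does not call AWS.&lt;/p&gt;

```python
def build_knn_query(vector, tenant_id, k=5):
    """Build an OpenSearch k-NN search body restricted to one tenant."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {  # hypothetical vector field name
                    "vector": list(vector),
                    "k": k,
                    # Filter inside the k-NN clause so only this tenant's
                    # vectors are considered as neighbors (assumed layout).
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        },
        "_source": ["title", "status"],  # hypothetical metadata fields
    }
```

&lt;p&gt;The resulting dictionary would be passed as the body of a search request against the vector index, with the embedding returned by Titan supplied as &lt;code&gt;vector&lt;/code&gt;.&lt;/p&gt;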

&lt;p&gt;&lt;strong&gt;3. The LLM Circuit Breaker&lt;/strong&gt;&lt;br&gt;
Vector similarity alone generates too many false positives (e.g., "Implement Stripe" vs. "Deprecate Stripe"). However, running every query through a Large Language Model is too slow for an IDE extension. &lt;/p&gt;

&lt;p&gt;My solution was a two-tiered filter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1:&lt;/strong&gt; OpenSearch returns the nearest vectors with their similarity scores. If the similarity score is below 85%, the Lambda drops the event immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2:&lt;/strong&gt; If (and only if) the score is 85% or higher, the payload is routed to &lt;strong&gt;Claude 3 Sonnet&lt;/strong&gt; via Bedrock. Claude acts as the final contextual validator, returning a strict JSON response (&lt;code&gt;is_duplicate: true/false&lt;/code&gt;) based on intent and status conflicts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern ensures that the slower, compute-heavy LLM is only invoked when absolutely necessary, preserving the overall latency budget.&lt;/p&gt;
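
&lt;p&gt;As a rough sketch, the two-tier gate could be written like this. Several things here are assumptions rather than details from the project: scores are taken to be normalized to the 0 to 1 range, a score of exactly 0.85 counts as passing, only the best hit is checked, and malformed LLM output is treated as "no alert". The function &lt;code&gt;check_duplicate&lt;/code&gt; and the injected &lt;code&gt;validate_with_llm&lt;/code&gt; callable (a stand-in for the Bedrock call to Claude) are hypothetical names.&lt;/p&gt;

```python
import json

SIMILARITY_THRESHOLD = 0.85  # the 85% cutoff described in the post


def check_duplicate(candidate, hits, validate_with_llm):
    """Return a verdict dict, or None when nothing passes Tier 1."""
    if not hits:
        return None
    best = max(hits, key=lambda hit: hit["score"])

    # Tier 1: drop low-similarity matches without ever calling the LLM.
    if SIMILARITY_THRESHOLD > best["score"]:
        return None

    # Tier 2: ask the LLM for a strict JSON verdict on the best match.
    raw = validate_with_llm(candidate, best)
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return None  # assumption: malformed output means no alert
    if not isinstance(verdict, dict):
        return None
    return {"is_duplicate": verdict.get("is_duplicate") is True, "match": best}
```

&lt;p&gt;Injecting the validator as a callable keeps the gate testable with a stub, and makes it easy to confirm that the LLM is never invoked for matches below the threshold.&lt;/p&gt;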

&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;By combining fast vector retrieval with conditional LLM reasoning, Kanso prevents redundant engineering effort in real time without lagging the developer's environment. &lt;/p&gt;

&lt;p&gt;The full project is currently a semi-finalist in the AWS 10,000 AIdeas Competition. As a first-year student, making it to the Top 300 would be incredible. If you found this architectural breakdown helpful, &lt;strong&gt;I would really appreciate your vote!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To support Kanso:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click the link below to visit my AWS Builder Center page.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log in&lt;/strong&gt; with your AWS or Amazon account.&lt;/li&gt;
&lt;li&gt;Hit the &lt;strong&gt;"Like"&lt;/strong&gt; button at the top of the article!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;👉 &lt;strong&gt;Vote here:&lt;/strong&gt; (&lt;a href="https://builder.aws.com/content/3ALxyTVN1wfLRqkvdh5s8BYwtcC/aideas-kanso-a-cloud-native-semantic-intelligence-layer-to-prevent-engineering-duplication" rel="noopener noreferrer"&gt;https://builder.aws.com/content/3ALxyTVN1wfLRqkvdh5s8BYwtcC/aideas-kanso-a-cloud-native-semantic-intelligence-layer-to-prevent-engineering-duplication&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Let me know in the comments how your team currently handles redundant engineering work!&lt;/p&gt;

&lt;p&gt;— Vignesh&lt;/p&gt;

</description>
      <category>awschallenge</category>
      <category>aideas10000</category>
      <category>serverless</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
