<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Derin Akay</title>
    <description>The latest articles on Forem by Derin Akay (@dxa204).</description>
    <link>https://forem.com/dxa204</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3745610%2F273f57e8-d1f5-418f-80c8-bdb16e27677c.png</url>
      <title>Forem: Derin Akay</title>
      <link>https://forem.com/dxa204</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dxa204"/>
    <language>en</language>
    <item>
      <title>Git Cluster RAG: Semantic Routing for Git History (Copilot CLI Challenge)</title>
      <dc:creator>Derin Akay</dc:creator>
      <pubDate>Fri, 06 Feb 2026 06:07:36 +0000</pubDate>
      <link>https://forem.com/dxa204/git-cluster-rag-semantic-routing-for-git-history-copilot-cli-challenge-4fd0</link>
      <guid>https://forem.com/dxa204/git-cluster-rag-semantic-routing-for-git-history-copilot-cli-challenge-4fd0</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/dxa204/git-cluster-rag.git" rel="noopener noreferrer"&gt;https://github.com/dxa204/git-cluster-rag.git&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built Git Cluster RAG, a command-line tool that uses K-Means clustering to "route" questions about a repository's history to the right context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Standard RAG (Retrieval-Augmented Generation) applications are "flat": every commit sits in one undifferentiated index. If you ask a question like "Why did we remove the notes file?", a standard vector search might retrieve unrelated commits just because they use similar wording. A flat index struggles to distinguish between code refactoring, documentation updates, and one-off cleanups.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;I used the GitHub Copilot CLI to build a pipeline that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingests commit history (messages + file diffs).&lt;/li&gt;
&lt;li&gt;Embeds the changes using sentence-transformers.&lt;/li&gt;
&lt;li&gt;Clusters the commits using K-Means.&lt;/li&gt;
&lt;li&gt;Routes user queries to the specific semantic cluster (e.g., "Cluster 0: Maintenance") before retrieving answers.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This "Cluster-Guided" approach ensures that when I ask about a deleted file, the system prioritizes "Cleanup" commits over "Feature" commits.&lt;/p&gt;
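&lt;p&gt;To make the routing step concrete, here is a minimal sketch of the idea and not the repository's actual code. It assumes the commits have already been embedded and clustered, and it uses hand-written 2-D vectors in place of real sentence-transformers embeddings. The names &lt;code&gt;route&lt;/code&gt;, &lt;code&gt;retrieve&lt;/code&gt;, &lt;code&gt;CENTROIDS&lt;/code&gt;, and &lt;code&gt;COMMITS&lt;/code&gt;, along with the cluster labels and commit messages, are hypothetical.&lt;/p&gt;

```python
import math

# Toy stand-ins for K-Means centroids (assumption: real ones would come
# from clustering sentence-transformers embeddings of the commits).
CENTROIDS = {
    "Maintenance": [0.9, 0.1],
    "Feature": [0.1, 0.9],
}

# Toy commits, each already tagged with the cluster it was assigned to.
COMMITS = [
    {"msg": "Remove notes file", "vec": [0.95, 0.05], "cluster": "Maintenance"},
    {"msg": "Tidy ignore rules", "vec": [0.8, 0.3], "cluster": "Maintenance"},
    {"msg": "Add chat command", "vec": [0.1, 0.95], "cluster": "Feature"},
]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def route(query_vec, centroids):
    """Pick the cluster whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(query_vec, centroids[label]))


def retrieve(query_vec, commits, centroids, top_k=2):
    """Route the query to one cluster, then rank only that cluster's commits."""
    label = route(query_vec, centroids)
    in_cluster = [c for c in commits if c["cluster"] == label]
    ranked = sorted(in_cluster, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return label, ranked[:top_k]


# Hypothetical embedding of "Why did we remove the notes file?"
QUERY = [1.0, 0.05]
label, hits = retrieve(QUERY, COMMITS, CENTROIDS)
print(label, [h["msg"] for h in hits])
```

&lt;p&gt;Because the search only ranks commits inside the routed cluster, a "Feature" commit cannot outrank a "Cleanup" commit for this query, whatever its raw similarity score. The trade-off is that a wrongly routed query never sees the right commits, so a real tool might want to fall back to the second-nearest cluster.&lt;/p&gt;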

&lt;h2&gt;
  
  
  Demo: Cluster-Guided Routing in Action
&lt;/h2&gt;

&lt;p&gt;In this video, you can see the tool ingesting the git history, identifying the clusters, and then correctly routing a specific query about a deleted file to the "Maintenance" cluster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/FY4GY0uqMxI" rel="noopener noreferrer"&gt;https://youtu.be/FY4GY0uqMxI&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;Building this project entirely with the Copilot CLI changed my workflow from "Stack Overflow searcher" to "Command Line architect."&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Scaffolding with Context&lt;br&gt;
I used the &lt;code&gt;@workspace /new&lt;/code&gt; command to generate the entire project structure (&lt;code&gt;ingest.py&lt;/code&gt;, &lt;code&gt;cluster.py&lt;/code&gt;, &lt;code&gt;chat.py&lt;/code&gt;) in one go. Instead of writing boilerplate, I could focus on the logic of the K-Means algorithm.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The "Agent" Workflow&lt;br&gt;
The standout feature for me was the &lt;code&gt;/init&lt;/code&gt; command. By running this, I was able to generate a &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt; file that taught Copilot the specific constraints of my project (e.g., "Always use 3 clusters", "Truncate diffs to 500 chars"). This effectively turned Copilot into a specialized teammate that knew my architecture, not just a generic code generator.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Frictionless Debugging&lt;br&gt;
When I hit syntax errors or needed to generate dummy git data for testing, I didn't leave the terminal. I used &lt;code&gt;gh copilot suggest&lt;/code&gt; to generate complex shell commands that created dummy commits, enabling me to test the clustering algorithm in seconds.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
  </channel>
</rss>
