<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: BALA PRAHARSHA MANNEPALLI</title>
    <description>The latest articles on Forem by BALA PRAHARSHA MANNEPALLI (@bala_praharshamannepalli).</description>
    <link>https://forem.com/bala_praharshamannepalli</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840551%2F66df7a32-c1d0-48bc-9cbd-75ccd34d185d.jpg</url>
      <title>Forem: BALA PRAHARSHA MANNEPALLI</title>
      <link>https://forem.com/bala_praharshamannepalli</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/bala_praharshamannepalli"/>
    <language>en</language>
    <item>
      <title>Why My AI Tutor Improved With Hindsight</title>
      <dc:creator>BALA PRAHARSHA MANNEPALLI</dc:creator>
      <pubDate>Mon, 23 Mar 2026 17:46:42 +0000</pubDate>
      <link>https://forem.com/bala_praharshamannepalli/why-my-ai-tutor-improved-with-hindsight-l05</link>
      <guid>https://forem.com/bala_praharshamannepalli/why-my-ai-tutor-improved-with-hindsight-l05</guid>
      <description>&lt;p&gt;Our AI tutor used to treat every submission like the first one. Once Hindsight let it connect attempts, it stopped merely explaining errors and started anticipating them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built (and Why It Wasn’t Working)
&lt;/h2&gt;

&lt;p&gt;I’ve been building a coding practice platform that behaves less like a judge and more like a tutor. You write code, run it, and get feedback. Instead of a binary pass/fail verdict, though, the system tries to explain what went wrong and nudge you toward a fix.&lt;/p&gt;

&lt;p&gt;At a high level, the system is pretty straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;frontend&lt;/strong&gt; where users write and submit code&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Python backend&lt;/strong&gt; that handles submissions and orchestration&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;execution layer&lt;/strong&gt; (sandboxed) that runs code safely&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;AI layer&lt;/strong&gt; that analyzes output and generates feedback&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;memory layer&lt;/strong&gt; where things got interesting&lt;/li&gt;
&lt;/ul&gt;
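
&lt;p&gt;Everything downstream depends on what the execution layer returns. Here is a minimal sketch, assuming it hands back the program’s combined output as a string. That matches how &lt;code&gt;classify_error&lt;/code&gt; treats &lt;code&gt;result&lt;/code&gt; later in this post. The &lt;code&gt;execute&lt;/code&gt; name matches the snippets below. The body is hypothetical, including the five-second timeout, and it is deliberately not a real sandbox.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import sys


def execute(code, timeout_seconds=5):
    """Run a submission in a separate Python process and return its output.

    Hypothetical helper. A subprocess with a timeout is NOT a real sandbox:
    a production execution layer also needs isolation (containers,
    resource limits, no network access), which this sketch does not provide.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError: submission exceeded the time limit"
    # Tracebacks are written to stderr; returning both streams lets later
    # steps search the text for error names such as "RecursionError".
    return completed.stdout + completed.stderr


# A recursive function with no base case: the returned text ends with a
# RecursionError traceback.
output = execute("def fact(n):\n    return n * fact(n - 1)\n\nfact(5)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Spawning a fresh interpreter per submission keeps a crashing or endlessly recursing program from taking the backend down with it. Real isolation is a separate problem.&lt;/p&gt;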

&lt;p&gt;The first four parts worked fine. Code executed, errors were captured, feedback was generated.&lt;/p&gt;

&lt;p&gt;But something felt off.&lt;/p&gt;

&lt;p&gt;The tutor wasn’t actually &lt;em&gt;teaching&lt;/em&gt;. It was just reacting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Stateless Feedback Is Useless
&lt;/h2&gt;

&lt;p&gt;Every time a user submitted code, the system treated it as a completely new event.&lt;/p&gt;

&lt;p&gt;It didn’t matter if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The user had already failed 3 times&lt;/li&gt;
&lt;li&gt;The mistake was identical to the previous attempt&lt;/li&gt;
&lt;li&gt;The hint we gave last time clearly didn’t help&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We still generated feedback like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You might want to check your recursion base case.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Again. And again. And again.&lt;/p&gt;

&lt;p&gt;From the system’s perspective, this was correct. From the user’s perspective, it was useless.&lt;/p&gt;

&lt;p&gt;I initially thought I could solve this with a bigger prompt—just shove previous attempts into the context window and let the model “figure it out.”&lt;/p&gt;

&lt;p&gt;That worked… until it didn’t:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context got bloated quickly&lt;/li&gt;
&lt;li&gt;Important patterns were lost in noise&lt;/li&gt;
&lt;li&gt;Cost and latency increased&lt;/li&gt;
&lt;li&gt;Behavior was inconsistent&lt;/li&gt;
&lt;/ul&gt;
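
&lt;p&gt;To make the bloat concrete, here is roughly what that first approach amounts to, using a hypothetical &lt;code&gt;build_naive_prompt&lt;/code&gt; helper. Every earlier attempt is pasted in verbatim, so the prompt grows with each submission whether or not those attempts are relevant.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_naive_prompt(code, result, attempts):
    """First approach (hypothetical): paste every earlier attempt verbatim."""
    sections = []
    for number, (old_code, old_result) in enumerate(attempts, start=1):
        sections.append(f"Attempt {number}:\n{old_code}\nOutput:\n{old_result}")
    sections.append(f"Current attempt:\n{code}\nOutput:\n{result}")
    return "\n\n".join(sections)


# The same failing attempt, repeated: prompt size grows linearly with history.
failing = (
    "def fact(n):\n    return n * fact(n - 1)",
    "RecursionError: maximum recursion depth exceeded",
)
for count in (1, 5, 20):
    prompt = build_naive_prompt(*failing, attempts=[failing] * count)
    print(count, "earlier attempts:", len(prompt), "characters")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in that prompt tells the model which attempts matter, so it has to rediscover the pattern on every call.&lt;/p&gt;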

&lt;p&gt;What I needed wasn’t more context.&lt;/p&gt;

&lt;p&gt;I needed memory that &lt;em&gt;meant something&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shift: Storing Attempts as First-Class Data
&lt;/h2&gt;

&lt;p&gt;The turning point was when I stopped thinking of submissions as isolated requests and started treating them as a sequence.&lt;/p&gt;

&lt;p&gt;Instead of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;feedback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;feedback&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I moved to something closer to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_submission&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;store_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_recent_events&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;feedback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;feedback&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks obvious in hindsight (no pun intended), but the change alone didn’t fix the problem.&lt;/p&gt;

&lt;p&gt;I had &lt;em&gt;history&lt;/em&gt;, but I didn’t have &lt;em&gt;learning&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The system still wasn’t connecting patterns. It just had more data to ignore.&lt;/p&gt;
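
&lt;p&gt;The &lt;code&gt;store_event&lt;/code&gt; and &lt;code&gt;get_recent_events&lt;/code&gt; helpers in the snippet above were left undefined. Below is a minimal in-memory sketch of them; it is hypothetical, not the platform’s storage layer. It also shows why history alone wasn’t enough: the helper returns the last few attempts regardless of what they contain.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict, deque

# Hypothetical in-memory store; a real platform would persist events.
# Only the newest MAX_EVENTS attempts per (user, problem) pair are kept.
MAX_EVENTS = 20
_events = defaultdict(lambda: deque(maxlen=MAX_EVENTS))


def store_event(event):
    """Append one attempt to its (user_id, problem_id) timeline."""
    _events[(event["user_id"], event["problem_id"])].append(event)


def get_recent_events(user_id, problem_id, limit=3):
    """Return up to `limit` (a positive int) of the newest attempts, oldest first."""
    # .get avoids creating an empty timeline just by reading.
    timeline = _events.get((user_id, problem_id), ())
    return list(timeline)[-limit:]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;handle_submission&lt;/code&gt; stores the event before reading history, the newest entry returned is the current attempt itself. &lt;code&gt;history[:-1]&lt;/code&gt; leaves only the earlier ones.&lt;/p&gt;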

&lt;h2&gt;
  
  
  Enter Hindsight: Turning History Into Behavior
&lt;/h2&gt;

&lt;p&gt;This is where I integrated &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;, which is essentially a structured memory layer for agents.&lt;/p&gt;

&lt;p&gt;Instead of dumping raw history into prompts, Hindsight lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store interactions as structured events&lt;/li&gt;
&lt;li&gt;Retrieve relevant past patterns&lt;/li&gt;
&lt;li&gt;Feed them back into decision-making in a controlled way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wired it into the submission pipeline so that every attempt becomes a memory entry.&lt;/p&gt;

&lt;p&gt;Conceptually, it looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;classify_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And on the next submission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;problem_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;feedback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference: I wasn’t passing &lt;em&gt;all history&lt;/em&gt;. I was passing &lt;em&gt;relevant history&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If a user repeatedly messed up recursion, the system would surface those specific attempts—not everything they’d ever done.&lt;/p&gt;

&lt;p&gt;If you’re curious how this works in detail, the &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation&lt;/a&gt; explains the retrieval and structuring patterns pretty clearly.&lt;/p&gt;
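
&lt;p&gt;Hindsight does the storing and retrieving for you, and its actual client API is not necessarily shaped like the conceptual calls above. To show only the retrieval idea, here is a toy in-memory stand-in: a hypothetical &lt;code&gt;InMemoryMemory&lt;/code&gt; class, not part of Hindsight. It accepts the same &lt;code&gt;store&lt;/code&gt; and &lt;code&gt;retrieve&lt;/code&gt; arguments used in this post. It returns only the given user’s entries whose metadata matches, newest first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field


@dataclass
class Entry:
    user_id: str
    input: str
    output: str
    metadata: dict


@dataclass
class InMemoryMemory:
    """Toy stand-in for the conceptual memory.store / memory.retrieve calls.

    Not Hindsight's client API: it only mirrors the call shapes in this post.
    """

    entries: list = field(default_factory=list)

    def store(self, user_id, input, output, metadata=None):
        # `input` shadows the builtin only to match the store() call above.
        self.entries.append(Entry(user_id, input, output, dict(metadata or {})))

    def retrieve(self, user_id, problem_id=None, filters=None, limit=3):
        wanted = dict(filters or {})
        if problem_id is not None:
            wanted["problem_id"] = problem_id
        matches = [
            entry
            for entry in self.entries
            if entry.user_id == user_id
            and all(entry.metadata.get(k) == v for k, v in wanted.items())
        ]
        # Newest matching entries first, capped at `limit`.
        return matches[::-1][:limit]


memory = InMemoryMemory()
memory.store("u1", "def fact(n): return n * fact(n - 1)",
             "RecursionError: maximum recursion depth exceeded",
             {"problem_id": "factorial", "error_type": "recursion"})
memory.store("u1", "xs = [1]; print(xs[3])", "IndexError: list index out of range",
             {"problem_id": "lists", "error_type": "index"})
patterns = memory.retrieve(user_id="u1", problem_id="factorial")  # only the recursion attempt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Exact metadata matching is the simplest possible notion of relevance. The point is that the caller decides what counts as relevant instead of leaving it to the model.&lt;/p&gt;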

&lt;h2&gt;
  
  
  The First Real Change in Behavior
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting.&lt;/p&gt;

&lt;p&gt;Before Hindsight:&lt;/p&gt;

&lt;p&gt;User writes incorrect recursion →&lt;br&gt;
System says: “Check base case.”&lt;/p&gt;

&lt;p&gt;User repeats same mistake →&lt;br&gt;
System says: “Check base case.”&lt;/p&gt;

&lt;p&gt;After Hindsight:&lt;/p&gt;

&lt;p&gt;User repeats mistake →&lt;br&gt;
System says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’re making the same recursion mistake as your previous attempt—your base case still doesn’t stop when n == 0. Try returning 1 there.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a completely different experience.&lt;/p&gt;

&lt;p&gt;It’s not just explaining the error. It’s &lt;em&gt;connecting the dots&lt;/em&gt;.&lt;/p&gt;
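
&lt;p&gt;What changed is one extra decision before the hint is written: does the current error match anything retrieval returned? The AI layer still writes the actual feedback. The deterministic &lt;code&gt;frame_feedback&lt;/code&gt; helper here is a hypothetical sketch of only that decision. It assumes each retrieved attempt carries the &lt;code&gt;error_type&lt;/code&gt; label described later in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def frame_feedback(current_error_type, hint, patterns):
    """Mention earlier attempts when the same error type shows up again.

    Hypothetical helper: `patterns` is assumed to be the retrieved earlier
    attempts, each a dict with an "error_type" label from the classifier.
    """
    previous = [p for p in patterns if p["error_type"] == current_error_type]
    if not previous or current_error_type == "other":
        # Nothing meaningful to connect to: plain, first-time feedback.
        return hint
    noun = "attempt" if len(previous) == 1 else "attempts"
    return (
        f"This is the same {current_error_type} error as in "
        f"{len(previous)} earlier {noun}. {hint}"
    )


earlier = [{"error_type": "recursion"}, {"error_type": "index"}]
message = frame_feedback("recursion", "Your base case still doesn't stop at n == 0.", earlier)
# message == "This is the same recursion error as in 1 earlier attempt. Your base case still doesn't stop at n == 0."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;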
&lt;h2&gt;
  
  
  Why This Worked (and My First Attempt Didn’t)
&lt;/h2&gt;

&lt;p&gt;The difference wasn’t just “adding memory.” It was &lt;em&gt;how&lt;/em&gt; memory was used.&lt;/p&gt;

&lt;p&gt;My earlier approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dump raw history into prompt&lt;/li&gt;
&lt;li&gt;Hope the model extracts patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hindsight approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store structured events&lt;/li&gt;
&lt;li&gt;Retrieve relevant ones&lt;/li&gt;
&lt;li&gt;Inject them intentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This aligns much more closely with how we’d design any other system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index data&lt;/li&gt;
&lt;li&gt;Query by relevance&lt;/li&gt;
&lt;li&gt;Use results predictably&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More broadly, this is the same pattern Vectorize describes in its &lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;agent memory architecture overview&lt;/a&gt;.&lt;/p&gt;
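
&lt;p&gt;“Inject them intentionally” is the step most easily hand-waved. Here is one way it could look, sketched with a hypothetical &lt;code&gt;build_feedback_prompt&lt;/code&gt; helper and assumed instruction wording. The retrieved attempts go into their own labeled, size-capped section, separate from the current submission, instead of being mixed into one long transcript.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_feedback_prompt(code, result, patterns, max_chars_per_attempt=400):
    """Build a prompt with a bounded, clearly labeled memory section.

    Hypothetical helper: `patterns` is assumed to be the retrieved attempts,
    newest first, each a dict with "code" and "error_type" keys.
    """
    parts = ["You are a programming tutor. Suggest one concrete next step."]
    if patterns:
        parts.append("Relevant earlier attempts by this student (newest first):")
        for i, p in enumerate(patterns, start=1):
            # Truncate each attempt so memory cannot crowd out the current code.
            snippet = p["code"][:max_chars_per_attempt]
            parts.append(f"[{i}] error_type={p['error_type']}\n{snippet}")
        parts.append("If the current error repeats one of these, say so explicitly.")
    parts.append(f"Current code:\n{code}")
    parts.append(f"Current output:\n{result}")
    return "\n\n".join(parts)


prompt = build_feedback_prompt(
    "def fact(n):\n    return n * fact(n - 1)",
    "RecursionError: maximum recursion depth exceeded",
    [{"code": "def fact(n):\n    return fact(n - 1)", "error_type": "recursion"}],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The per-attempt cap and the small retrieval limit keep the prompt bounded no matter how long a user’s history gets.&lt;/p&gt;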
&lt;h2&gt;
  
  
  A Small but Important Design Choice
&lt;/h2&gt;

&lt;p&gt;One thing that made a big difference: &lt;strong&gt;explicit error classification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of storing just raw outputs, I added a lightweight classifier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RecursionError&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IndexError&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allowed retrieval to be more targeted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recursion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, the system sometimes pulled irrelevant past attempts, which diluted feedback quality.&lt;/p&gt;

&lt;p&gt;This is one of those small, boring engineering decisions that ended up mattering more than expected.&lt;/p&gt;
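
&lt;p&gt;The substring checks above are a reasonable start, but they match the error name anywhere in the output. A program that merely prints “IndexError” would be misclassified. Below is a slightly sturdier drop-in variant; it is still a hypothetical sketch, and its extra categories are assumptions. It reads the exception name from the last non-empty line, which is where Python’s traceback ends if stderr is appended after stdout.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Assumed mapping from exception names to coarse categories; extend as needed.
ERROR_CATEGORIES = {
    "RecursionError": "recursion",
    "IndexError": "index",
    "KeyError": "lookup",
    "TypeError": "type",
    "ZeroDivisionError": "arithmetic",
}

# Matches a leading, optionally dotted name such as "IndexError" or "mod.CustomError".
_EXCEPTION_NAME = re.compile(r"^\w+(?:\.\w+)*")


def classify_error(result):
    """Classify by the exception named on the last non-empty line of output.

    Assumes stderr is appended after stdout, so an uncaught exception's
    final "Name: message" line comes last.
    """
    lines = [line.strip() for line in result.splitlines() if line.strip()]
    if not lines:
        return "other"
    match = _EXCEPTION_NAME.match(lines[-1])
    if match is None:
        return "other"
    # Keep only the class name, e.g. "mod.CustomError" becomes "CustomError".
    name = match.group(0).rsplit(".", 1)[-1]
    return ERROR_CATEGORIES.get(name, "other")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;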

&lt;h2&gt;
  
  
  What Still Doesn’t Work Well
&lt;/h2&gt;

&lt;p&gt;It’s not perfect.&lt;/p&gt;

&lt;p&gt;A few things are still rough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Overfitting to past mistakes&lt;/strong&gt;&lt;br&gt;
Sometimes the system assumes the user is repeating an error when they’re actually trying something new.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cold start problem&lt;/strong&gt;&lt;br&gt;
First-time users still get generic feedback.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory drift&lt;/strong&gt;&lt;br&gt;
Old mistakes can become irrelevant but still surface if they aren’t filtered out.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I haven’t fully solved these yet. I’m experimenting with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decay functions for memory relevance&lt;/li&gt;
&lt;li&gt;Weighting recent attempts higher&lt;/li&gt;
&lt;li&gt;Separating “resolved” vs “active” mistakes&lt;/li&gt;
&lt;/ul&gt;
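
&lt;p&gt;Neither fix is finished, but here is a minimal sketch of how the decay and resolved/active ideas could combine. It assumes each stored event carries a timestamp and a hypothetical &lt;code&gt;resolved&lt;/code&gt; flag. Each unresolved mistake contributes a weight that halves every three days, an arbitrary half-life chosen for illustration. Weights are summed per error type, so recent, repeated mistakes dominate and resolved ones drop out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict

DAY = 24 * 3600
# Assumed half-life: an unresolved mistake's weight halves every three days.
HALF_LIFE = 3 * DAY


def memory_weight(event, now, half_life=HALF_LIFE):
    """Exponential decay: 1.0 for a brand-new event, 0.5 after one half-life."""
    age = max(0.0, now - event["timestamp"])  # clamp future timestamps to age 0
    return 0.5 ** (age / half_life)


def active_error_scores(events, now, half_life=HALF_LIFE):
    """Sum decayed weights per error type, skipping resolved mistakes.

    Hypothetical event shape: {"timestamp": seconds, "error_type": str,
    "resolved": bool}, where "resolved" is set once the user later passes
    after failing with that error type.
    """
    scores = defaultdict(float)
    for event in events:
        if event.get("resolved", False):
            continue  # resolved mistakes no longer influence feedback
        scores[event["error_type"]] += memory_weight(event, now, half_life)
    return dict(scores)


events = [
    {"timestamp": 10 * DAY, "error_type": "recursion", "resolved": False},
    {"timestamp": 7 * DAY, "error_type": "recursion", "resolved": False},
    {"timestamp": 1 * DAY, "error_type": "index", "resolved": False},
    {"timestamp": 8 * DAY, "error_type": "index", "resolved": True},
]
print(active_error_scores(events, now=10 * DAY))
# {'recursion': 1.5, 'index': 0.125}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Retrieval could then favor the highest-scoring error types. That targets memory drift directly, since old weights fade, and keeps already-fixed mistakes out of the picture.&lt;/p&gt;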

&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;A few things I’d carry forward into any system like this:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. More context is not better context
&lt;/h3&gt;

&lt;p&gt;Dumping everything into a prompt is lazy and doesn’t scale. Targeted retrieval beat brute-force context stuffing on cost, latency, and consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Memory needs structure, not just storage
&lt;/h3&gt;

&lt;p&gt;Raw logs are not memory. Indexing, filtering, and shaping data is what makes it useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Small metadata decisions matter a lot
&lt;/h3&gt;

&lt;p&gt;That simple &lt;code&gt;error_type&lt;/code&gt; field improved retrieval quality more than any prompt tweak I tried.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Behavior &amp;gt; accuracy
&lt;/h3&gt;

&lt;p&gt;The biggest improvement wasn’t “better explanations.” It was &lt;em&gt;different behavior&lt;/em&gt;—the system started referencing past attempts.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Teaching requires continuity
&lt;/h3&gt;

&lt;p&gt;If your system doesn’t connect attempts, it’s not a tutor. It’s just a smarter compiler error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Leaves the System
&lt;/h2&gt;

&lt;p&gt;Right now, the tutor does something I didn’t expect it to do this early: it adapts.&lt;/p&gt;

&lt;p&gt;Not perfectly, not consistently—but enough that you notice.&lt;/p&gt;

&lt;p&gt;You can make the same mistake twice, and it won’t respond the same way.&lt;/p&gt;

&lt;p&gt;That alone changed how the system feels.&lt;/p&gt;

&lt;p&gt;It stopped being reactive and started being &lt;em&gt;contextual&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And that shift didn’t come from a better model.&lt;/p&gt;

&lt;p&gt;It came from giving the system a memory it could actually use.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
