<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rich Robertson</title>
    <description>The latest articles on Forem by Rich Robertson (@rich_robertson).</description>
    <link>https://forem.com/rich_robertson</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3839458%2Fd65f70e9-3030-41ef-a774-e248437d62bb.jpeg</url>
      <title>Forem: Rich Robertson</title>
      <link>https://forem.com/rich_robertson</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rich_robertson"/>
    <language>en</language>
    <item>
      <title>How I Built a Retrieval-Backed Chatbot to Replace My Resume Screening Step</title>
      <dc:creator>Rich Robertson</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:50:48 +0000</pubDate>
      <link>https://forem.com/rich_robertson/how-i-built-a-retrieval-backed-chatbot-to-replace-my-resume-screening-step-131n</link>
      <guid>https://forem.com/rich_robertson/how-i-built-a-retrieval-backed-chatbot-to-replace-my-resume-screening-step-131n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This is a condensed cross-post. &lt;a href="https://www.myrobertson.com/blog/how-i-built-the-askrich-chatbot-for-technical-screening" rel="noopener noreferrer"&gt;Read the full version on my site&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Recruiter conversations have a consistent problem: a resume gives breadth, but not the system details behind the impact statements. I built &lt;strong&gt;AskRich&lt;/strong&gt; to close that gap — a chatbot that lets hiring teams ask specific technical questions and get citation-backed answers grounded in my actual portfolio and writing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;AskRich is optimized for realistic recruiter workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-click prompt chips for common questions (architecture trade-offs, delivery scope, measurable outcomes)&lt;/li&gt;
&lt;li&gt;A freeform chat input for custom questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citations attached to every answer&lt;/strong&gt; — so a recruiter can verify the source instead of taking generated output at face value&lt;/li&gt;
&lt;li&gt;A lightweight, dependency-free web UI in plain JavaScript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The citation-backed output is the core product decision. Once someone can see &lt;em&gt;where&lt;/em&gt; an answer comes from, they tend to ask sharper follow-ups — and the conversation gets more useful faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture overview
&lt;/h2&gt;

&lt;p&gt;At a high level: a thin web client over a retrieval-backed chat API.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6r42efpgpap15k9lema.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6r42efpgpap15k9lema.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Worker supports three runtime modes: &lt;code&gt;upstream&lt;/code&gt; (proxy to retrieval API), &lt;code&gt;local&lt;/code&gt; (built-in corpus), and &lt;code&gt;openai&lt;/code&gt; (direct model path with retrieval-aware constraints). This lets me test and route independently without redeploying the client.&lt;/p&gt;
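&lt;p&gt;A minimal sketch of that dispatch (the names &lt;code&gt;resolveMode&lt;/code&gt;, &lt;code&gt;dispatch&lt;/code&gt;, and &lt;code&gt;BACKEND_MODE&lt;/code&gt; are illustrative, not the actual AskRich source):&lt;/p&gt;

```javascript
// Illustrative Worker-side mode dispatch: one env var selects the backend,
// so routing flips without touching or redeploying the client.
const MODES = ["upstream", "local", "openai"];

// Resolve the active mode from Worker config, falling back to the built-in corpus.
function resolveMode(env) {
  const mode = (env.BACKEND_MODE || "").toLowerCase();
  return MODES.includes(mode) ? mode : "local";
}

// Return whichever handler is registered for the configured mode.
function dispatch(env, handlers) {
  return handlers[resolveMode(env)];
}
```

&lt;p&gt;The real Worker wires each handler to the proxy, local-corpus, or model path; the point is that the selection lives entirely in Worker config.&lt;/p&gt;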

&lt;h2&gt;
  
  
  The part that actually made it better: a feedback loop
&lt;/h2&gt;

&lt;p&gt;The first version was anecdotal. I'd notice a weak answer, edit something, and hope for the best.&lt;/p&gt;

&lt;p&gt;The current version records structured events for every question, answer, and thumbs-up/thumbs-down interaction — all linked by stable event IDs. That lets me triage a specific low-rated answer with its exact question text, citation count, latency, and backend mode instead of debugging in the abstract.&lt;/p&gt;
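&lt;p&gt;Roughly, the event shapes look like this (field names are illustrative, not the exact schema):&lt;/p&gt;

```javascript
// A shared eventId ties a question/answer record to any later rating.
function makeAnswerEvent(eventId, question, answer, meta) {
  return {
    eventId,                                // stable ID linking answer and feedback
    question,                               // exact question text, kept for triage
    citationCount: answer.citations.length,
    latencyMs: meta.latencyMs,              // end-to-end answer latency
    mode: meta.mode,                        // which backend mode produced the answer
  };
}

// A thumbs-up/down event references the same eventId instead of copying the answer.
function makeFeedbackEvent(eventId, rating) {
  return { eventId, rating };               // rating: "up" or "down"
}

// Join a rating back to its answer so a low-rated item carries full context.
function triageRow(answerEvents, feedback) {
  const byId = new Map(answerEvents.map((e) => [e.eventId, e]));
  return { ...byId.get(feedback.eventId), rating: feedback.rating };
}
```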

&lt;p&gt;Triage classifies failures into four buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Corpus gap&lt;/strong&gt; — the content just isn't there&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval/ranking issue&lt;/strong&gt; — the right content exists but isn't surfaced&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt/format issue&lt;/strong&gt; — generation quality or response clarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Out-of-scope&lt;/strong&gt; — the question type needs routing or a guardrail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Changes are tested, compared against a baseline, and only promoted when they improve answer quality without regressing citation clarity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate limiting at the edge
&lt;/h2&gt;

&lt;p&gt;Rate limiting is enforced in the Cloudflare Worker before any chat execution. Client identity is derived from a one-way hash of request context (IP + origin + user-agent) — raw IPs aren't stored as persistent identifiers.&lt;/p&gt;

&lt;p&gt;Two guards run in sequence: an hourly quota and a burst interval. If either is exceeded, the API returns &lt;code&gt;429&lt;/code&gt; with &lt;code&gt;Retry-After&lt;/code&gt;. If KV storage is unavailable, the limiter degrades gracefully (fail-open) to preserve availability.&lt;/p&gt;
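&lt;p&gt;The two guards reduce to a small pure function, sketched below with illustrative thresholds (the real limits differ):&lt;/p&gt;

```javascript
// Illustrative limits; `state.timestamps` stands in for what the Worker
// reads from KV for one hashed client ID.
const LIMITS = { perHour: 30, minIntervalMs: 2000 };

// Returns { allowed, retryAfterSec? } without mutating state on rejection.
function checkLimits(state, nowMs, limits = LIMITS) {
  const windowStart = nowMs - 60 * 60 * 1000;
  const recent = state.timestamps.filter((t) => t > windowStart);

  // Guard 1: hourly quota.
  if (recent.length >= limits.perHour) {
    return { allowed: false, retryAfterSec: Math.ceil((recent[0] - windowStart) / 1000) };
  }
  // Guard 2: minimum interval between consecutive requests.
  const last = recent[recent.length - 1];
  if (last !== undefined) {
    const waitMs = limits.minIntervalMs - (nowMs - last);
    if (waitMs > 0) {
      return { allowed: false, retryAfterSec: Math.ceil(waitMs / 1000) };
    }
  }
  return { allowed: true, timestamps: recent.concat(nowMs) };
}

// Fail-open wrapper: if the KV read throws, allow the request.
async function guard(loadState, nowMs) {
  try {
    return checkLimits(await loadState(), nowMs);
  } catch {
    return { allowed: true }; // KV unavailable: preserve availability
  }
}
```

&lt;p&gt;On rejection, the Worker maps &lt;code&gt;retryAfterSec&lt;/code&gt; onto the &lt;code&gt;Retry-After&lt;/code&gt; header of the &lt;code&gt;429&lt;/code&gt; response.&lt;/p&gt;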

&lt;h2&gt;
  
  
  What I'd do next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tighten citation quality metrics and regression gating for high-frequency questions&lt;/li&gt;
&lt;li&gt;Promote successful A/B retrieval/prompt variants into default behavior&lt;/li&gt;
&lt;li&gt;Expand corpus gap-closing using the weekly triage workflow&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.myrobertson.com/ask-rich.html" rel="noopener noreferrer"&gt;Try AskRich →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ask it about architecture decisions, migration strategy, or platform delivery outcomes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://www.myrobertson.com/blog/how-i-built-the-askrich-chatbot-for-technical-screening" rel="noopener noreferrer"&gt;Full write-up with architecture diagrams and implementation detail on my site.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>architecture</category>
      <category>webdev</category>
      <category>career</category>
    </item>
    <item>
      <title>Backpressure in Distributed Systems: Stability, Correctness, and Graceful Degradation</title>
      <dc:creator>Rich Robertson</dc:creator>
      <pubDate>Mon, 23 Mar 2026 07:00:01 +0000</pubDate>
      <link>https://forem.com/rich_robertson/backpressure-in-distributed-systems-stability-correctness-and-graceful-degradation-2b4f</link>
      <guid>https://forem.com/rich_robertson/backpressure-in-distributed-systems-stability-correctness-and-graceful-degradation-2b4f</guid>
      <description>&lt;h2&gt;
  
  
  Backpressure in Distributed Systems: Stability, Correctness, and Graceful Degradation
&lt;/h2&gt;

&lt;p&gt;Distributed systems rarely fail because average demand is slightly above average capacity. They fail when arrival rates outrun service rates, queues accumulate faster than they can drain, and retries or fan-out amplify the initial disturbance.&lt;/p&gt;

&lt;p&gt;That is why backpressure matters. It is not just a throughput tweak or a queue setting; it is the control mechanism that lets a system say “not now” before it collapses, keeping throughput, latency, and resource usage inside a stable operating envelope (IBM, 2026; Reactive Streams, 2022).&lt;/p&gt;

&lt;h2&gt;
  
  
  Backpressure as Flow-Control Feedback
&lt;/h2&gt;

&lt;p&gt;Backpressure is best modeled as feedback that propagates downstream capacity limits upstream. Producers are not permitted to emit unbounded work; instead, the system signals when demand must be delayed, reduced, or rejected. Reactive Streams formalizes this requirement for asynchronous boundaries, where unchecked producers would otherwise force receivers to buffer unbounded data (Reactive Streams, 2022).&lt;/p&gt;

&lt;p&gt;The same control principle applies to RPC chains, event pipelines, brokered messaging, and service meshes.&lt;/p&gt;
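&lt;p&gt;The contract is easy to illustrate: a producer may emit only as much as the consumer has requested. (Reactive Streams itself is a JVM specification; this JavaScript sketch only mirrors the idea.)&lt;/p&gt;

```javascript
// Demand-driven publisher: emission is capped by outstanding consumer demand,
// so a slow consumer never has to buffer unbounded data.
function makePublisher(items) {
  return {
    subscribe(subscriber) {
      let demand = 0;
      let index = 0;
      let completed = false;
      subscriber.onSubscribe({
        request(n) {
          demand += n;
          // Emit only while the consumer has signaled capacity.
          while (demand > 0) {
            if (index >= items.length) break;
            demand -= 1;
            subscriber.onNext(items[index]);
            index += 1;
          }
          if (!completed) {
            if (index === items.length) {
              completed = true;
              subscriber.onComplete();
            }
          }
        },
      });
    },
  };
}
```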

&lt;h2&gt;
  
  
  Queueing Stability, Not Just Performance Tuning
&lt;/h2&gt;

&lt;p&gt;Queueing fundamentals explain why this matters: when offered load persistently exceeds effective service capacity, queue length and response time become nonlinear near saturation. IBM guidance highlights the practical implication: as utilization approaches limits, latency rises sharply, and backlog itself becomes a failure source through memory pressure, timeout cascades, and degraded dependency behavior (IBM, 2026; Amazon Web Services, 2022).&lt;/p&gt;

&lt;p&gt;This is why unbounded queues are hazardous. They often postpone visible failure while silently increasing tail latency and resource debt. AWS reliability guidance recommends failing fast and limiting queues specifically to avoid insurmountable backlog states (Amazon Web Services, 2022). A bounded queue is therefore not a concession; it is an explicit overload signal that enables timely corrective action.&lt;/p&gt;
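&lt;p&gt;In code, the difference between an unbounded and a bounded queue is one branch, but that branch is what turns backlog into a visible signal (a generic sketch, not any particular library):&lt;/p&gt;

```javascript
// Bounded FIFO that fails fast at capacity instead of buffering without limit.
// The rejection is the overload signal; depth doubles as a saturation metric.
class BoundedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  // Fail fast: refuse new work rather than grow past the bound.
  offer(item) {
    if (this.items.length >= this.capacity) return false;
    this.items.push(item);
    return true;
  }
  poll() {
    return this.items.shift();
  }
  get depth() {
    return this.items.length;
  }
}
```

&lt;p&gt;Callers that see &lt;code&gt;offer&lt;/code&gt; return &lt;code&gt;false&lt;/code&gt; can shed, retry with backoff, or surface a &lt;code&gt;429&lt;/code&gt;, all of which beat discovering the backlog through memory pressure and timeout cascades.&lt;/p&gt;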

&lt;h2&gt;
  
  
  Autoscaling Cannot Replace Immediate Overload Control
&lt;/h2&gt;

&lt;p&gt;Autoscaling is necessary but temporally mismatched with burst-driven overload. Requirements-driven studies of microservice autoscaling show that generic threshold-based policies can react slowly and allocate capacity suboptimally under dynamic workloads (Nunes et al., 2024). Scale-out decisions require observation windows, control decisions, scheduling, and startup latency. Overload can materialize much faster through retries, correlated bursts, and dependency regressions.&lt;/p&gt;

&lt;p&gt;Stable systems therefore combine medium-timescale capacity adaptation with millisecond-timescale flow control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational Backpressure Primitives
&lt;/h2&gt;

&lt;p&gt;Four mechanisms recur across robust architectures.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Admission control&lt;/strong&gt; limits accepted work to protect critical resources under contention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bounded buffering&lt;/strong&gt; exposes pressure instead of concealing it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load shedding&lt;/strong&gt; discards lower-priority work so critical paths stay within service objectives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive concurrency&lt;/strong&gt; adjusts in-flight work based on observed latency, making admission limits responsive to current system stress rather than static guesses (Amazon Web Services, n.d.-a; Amazon Web Services, n.d.-b; Netflix, 2025).&lt;/li&gt;
&lt;/ul&gt;
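&lt;p&gt;Adaptive concurrency is the least intuitive of the four, so here is a deliberately simplified AIMD sketch in the spirit of Netflix's concurrency-limits library (which is Java; none of these names come from it):&lt;/p&gt;

```javascript
// AIMD concurrency limiter: the in-flight cap grows by one on healthy latency
// and halves when observed latency crosses a threshold.
class AdaptiveLimiter {
  constructor({ initial = 10, min = 1, max = 200, latencyThresholdMs = 100 } = {}) {
    this.limit = initial;
    this.min = min;
    this.max = max;
    this.latencyThresholdMs = latencyThresholdMs;
    this.inFlight = 0;
  }
  // Admission check: reject once the current limit is reached.
  tryAcquire() {
    if (this.inFlight >= this.limit) return false;
    this.inFlight += 1;
    return true;
  }
  // Feed back the latency of a completed request to adjust the limit.
  release(latencyMs) {
    this.inFlight -= 1;
    if (latencyMs > this.latencyThresholdMs) {
      this.limit = Math.max(this.min, Math.floor(this.limit / 2)); // multiplicative decrease
    } else {
      this.limit = Math.min(this.max, this.limit + 1); // additive increase
    }
  }
}
```

&lt;p&gt;The admission limit is no longer a static guess; it tracks what the system can currently absorb.&lt;/p&gt;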

&lt;p&gt;Recent practice adds policy awareness to these primitives. Netflix reports service-level-prioritized shedding that preserves high-value requests while trimming less critical traffic only when needed. The objective is not indiscriminate reduction; it is selective reduction aligned with user and business criticality (Gancarz, 2024).&lt;/p&gt;

&lt;h2&gt;
  
  
  Asynchronous Pipelines and Backlog Liability
&lt;/h2&gt;

&lt;p&gt;Event-driven topologies improve decoupling and failure isolation, but they also enable producers to outpace consumers for prolonged intervals. Reactive Streams treats this as a first-order correctness constraint for asynchronous processing (Reactive Streams, 2022). AWS operational guidance reaches the same conclusion in queue-backed systems: durability benefits collapse if backlog growth is unmanaged during spikes or partial outages (Amazon Web Services, n.d.-b).&lt;/p&gt;

&lt;h2&gt;
  
  
  Fairness and Isolation in Shared Infrastructure
&lt;/h2&gt;

&lt;p&gt;Backpressure is also an isolation mechanism. In multi-tenant systems, aggregate stability is insufficient if one tenant or request class can consume disproportionate shared capacity. AWS links admission control and rate limiting directly to fairness and predictable performance in shared environments (Amazon Web Services, n.d.-a). Effective overload control therefore enforces both platform protection and workload isolation.&lt;/p&gt;
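&lt;p&gt;A common way to enforce that isolation is a token bucket per tenant, sketched here with illustrative rate and burst values:&lt;/p&gt;

```javascript
// Per-tenant token buckets: each tenant refills independently, so one noisy
// tenant exhausts only its own budget, not the shared capacity.
class TenantLimiter {
  constructor({ ratePerSec = 10, burst = 20 } = {}) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.buckets = new Map(); // tenantId -> { tokens, lastMs }
  }
  allow(tenantId, nowMs) {
    const b = this.buckets.get(tenantId) ?? { tokens: this.burst, lastMs: nowMs };
    // Refill proportionally to elapsed time, capped at the burst size.
    const elapsedSec = (nowMs - b.lastMs) / 1000;
    b.tokens = Math.min(this.burst, b.tokens + elapsedSec * this.ratePerSec);
    b.lastMs = nowMs;
    this.buckets.set(tenantId, b);
    if (b.tokens >= 1) {
      b.tokens -= 1;
      return true;
    }
    return false;
  }
}
```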

&lt;h2&gt;
  
  
  From Thresholds to Closed-Loop Regulation
&lt;/h2&gt;

&lt;p&gt;A consistent industry direction is visible: fixed thresholds are yielding to closed-loop regulation driven by observed latency, queue depth, concurrency, and service objectives. Autoscaling research, adaptive concurrency control, and prioritized shedding all converge on the same systems insight: stability under load must be actively governed (Nunes et al., 2024; Netflix, 2025; Gancarz, 2024).&lt;/p&gt;

&lt;p&gt;The engineering conclusion is straightforward. A system is not meaningfully scalable unless it can refuse or defer work when required. Designs that can only accept additional load are not resilient; they are temporarily lucky. Backpressure is what converts capacity uncertainty into disciplined behavior and graceful degradation (IBM, 2026; Reactive Streams, 2022; Nunes et al., 2024).&lt;/p&gt;




&lt;p&gt;I write more about distributed systems, platform architecture, and production engineering at my site:&lt;br&gt;
&lt;a href="https://www.myrobertson.com" rel="noopener noreferrer"&gt;https://www.myrobertson.com&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Web Services. (2022). REL05-BP04: Fail fast and limit queues. AWS Well-Architected Framework.&lt;/li&gt;
&lt;li&gt;Amazon Web Services. (n.d.-a). Fairness in multi-tenant systems. Amazon Builders’ Library.&lt;/li&gt;
&lt;li&gt;Amazon Web Services. (n.d.-b). Avoiding insurmountable queue backlogs. Amazon Builders’ Library.&lt;/li&gt;
&lt;li&gt;Gancarz, R. (2024, November 23). Netflix rolls out service-level prioritized load shedding to improve resiliency. InfoQ.&lt;/li&gt;
&lt;li&gt;IBM. (2026). WebSphere Application Server performance cookbook: Statistics. IBM.&lt;/li&gt;
&lt;li&gt;Netflix. (2025). concurrency-limits [Software repository]. GitHub.&lt;/li&gt;
&lt;li&gt;Nunes, J. P. K. S., Nejati, S., Sabetzadeh, M., &amp;amp; Nakagawa, E. Y. (2024). Self-adaptive, requirements-driven autoscaling of microservices. ACM/ArXiv.&lt;/li&gt;
&lt;li&gt;Reactive Streams. (2022). Reactive Streams 1.0.4.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>distributedsystems</category>
      <category>backend</category>
      <category>architecture</category>
      <category>reliability</category>
    </item>
  </channel>
</rss>
