<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Edwar Diaz</title>
    <description>The latest articles on Forem by Edwar Diaz (@botoom).</description>
    <link>https://forem.com/botoom</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F739122%2F7a4fce73-f218-44cf-938f-eb0b03ac9b90.png</url>
      <title>Forem: Edwar Diaz</title>
      <link>https://forem.com/botoom</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/botoom"/>
    <language>en</language>
    <item>
      <title>Why Ignoring Token Costs Can Kill Your AI Product (and How to Fix It)</title>
      <dc:creator>Edwar Diaz</dc:creator>
      <pubDate>Wed, 25 Mar 2026 23:21:31 +0000</pubDate>
      <link>https://forem.com/botoom/why-ignoring-token-costs-can-kill-your-ai-product-and-how-to-fix-it-2c64</link>
      <guid>https://forem.com/botoom/why-ignoring-token-costs-can-kill-your-ai-product-and-how-to-fix-it-2c64</guid>
      <description>&lt;p&gt;When building applications powered by LLMs from providers like OpenAI, Google, or Mistral AI, there’s a detail that often gets overlooked:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;token cost.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At small scale, it’s barely noticeable. But once your application starts getting real usage, token consumption grows quickly—and if you’re not measuring it, you can easily end up with a feature that costs more than the value it delivers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem with token usage
&lt;/h2&gt;

&lt;p&gt;Every interaction with an LLM typically involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input tokens (your prompt)&lt;/li&gt;
&lt;li&gt;output tokens (the model’s response)&lt;/li&gt;
&lt;li&gt;sometimes cache tokens, depending on the provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individually, these costs are small. But combined with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;longer prompts&lt;/li&gt;
&lt;li&gt;verbose outputs&lt;/li&gt;
&lt;li&gt;high request volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;they scale faster than most people expect.&lt;/p&gt;

&lt;p&gt;And there’s an important nuance here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Not all models cost the same, and not all tasks require the same type of model.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Model selection is a cost decision
&lt;/h2&gt;

&lt;p&gt;It’s common to default to the most capable model available, but that’s rarely the most efficient choice.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you don’t need a reasoning-heavy model for simple transformations&lt;/li&gt;
&lt;li&gt;you don’t need multimodal capabilities if you're only processing text&lt;/li&gt;
&lt;li&gt;many providers offer smaller or optimized variants (mini, nano, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choosing the right model affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where cost awareness becomes part of system design, not just an afterthought.&lt;/p&gt;
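&lt;p&gt;As a sketch of that design decision, task-to-model routing can start as a simple lookup table. The tier names and task categories below are placeholders, not any provider's real catalog:&lt;/p&gt;

```python
# Hypothetical task-to-model routing table. The tier names are
# placeholders -- substitute your provider's actual model IDs.
MODEL_TIERS = {
    "simple_transform": "small-model",       # cheap, fast: formatting, extraction
    "text_generation": "standard-model",     # general drafting and summarization
    "complex_reasoning": "reasoning-model",  # most capable, most expensive
}

def pick_model(task_type):
    """Route each task to the cheapest tier that can handle it,
    defaulting to the most capable tier for unknown tasks."""
    return MODEL_TIERS.get(task_type, "reasoning-model")
```

&lt;p&gt;Even a table this small makes the cost decision explicit instead of silently defaulting every call to the most expensive model.&lt;/p&gt;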




&lt;h2&gt;
  
  
  Why you should estimate costs early
&lt;/h2&gt;

&lt;p&gt;If you’re building anything beyond a prototype, you should be able to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how much does each request cost?&lt;/li&gt;
&lt;li&gt;what is the expected daily usage?&lt;/li&gt;
&lt;li&gt;what does that translate to monthly or yearly?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Frameworks and platforms like LangChain, Azure AI Foundry, or Amazon Bedrock usually expose token usage metrics (input/output/cache). That&amp;#39;s helpful, but incomplete.&lt;/p&gt;

&lt;p&gt;In many cases, you still need to map those numbers to actual pricing yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Calculating token costs
&lt;/h2&gt;

&lt;p&gt;If you already have token counts, the calculation is straightforward:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cost = (input_tokens / 1000 * input_price) + (output_tokens / 1000 * output_price)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
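&lt;p&gt;The same formula in Python, extended with a simple monthly projection. The prices below are illustrative placeholders, not real provider rates:&lt;/p&gt;

```python
# Illustrative per-1K-token prices in USD -- replace with the
# actual rates from your provider's pricing page.
INPUT_PRICE_PER_1K = 0.00015
OUTPUT_PRICE_PER_1K = 0.0006

def request_cost(input_tokens, output_tokens,
                 input_price=INPUT_PRICE_PER_1K,
                 output_price=OUTPUT_PRICE_PER_1K):
    """Cost of one request: tokens scaled to thousands, times the per-1K price."""
    return (input_tokens / 1000 * input_price) + (output_tokens / 1000 * output_price)

def monthly_cost(cost_per_request, requests_per_day, days=30):
    """Project a per-request cost to an expected monthly spend."""
    return cost_per_request * requests_per_day * days

per_request = request_cost(input_tokens=1200, output_tokens=400)
print(f"per request: ${per_request:.6f}")                                   # $0.000420
print(f"monthly at 5,000 req/day: ${monthly_cost(per_request, 5000):.2f}")  # $63.00
```

&lt;p&gt;Running numbers like these early answers the three questions above (per request, per day, per month) before the bill does.&lt;/p&gt;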



&lt;p&gt;The challenge is when you don’t have those token counts directly.&lt;/p&gt;

&lt;p&gt;In that case, you can approximate them by tokenizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the input text you send&lt;/li&gt;
&lt;li&gt;the expected output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you a reasonable baseline for estimation.&lt;/p&gt;
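&lt;p&gt;If you can&amp;#39;t run the provider&amp;#39;s tokenizer, a common rule of thumb is that English text averages roughly four characters per token. A rough sketch of that baseline:&lt;/p&gt;

```python
def approx_tokens(text):
    """Very rough token estimate using the ~4 characters-per-token
    rule of thumb for English text. For exact counts, use the
    provider's own tokenizer (e.g. tiktoken for OpenAI models)."""
    return max(1, len(text) // 4)

prompt = "Summarize the following support ticket in two sentences."
print(approx_tokens(prompt))  # prints 14
```

&lt;p&gt;The heuristic is only for ballpark budgeting; switch to the real tokenizer once model choice is fixed.&lt;/p&gt;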




&lt;h2&gt;
  
  
  Tools that make this easier
&lt;/h2&gt;

&lt;p&gt;There are a couple of tools that simplify this process.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Prices
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.llm-prices.com/" rel="noopener noreferrer"&gt;https://www.llm-prices.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This tool lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input token counts&lt;/li&gt;
&lt;li&gt;select specific models&lt;/li&gt;
&lt;li&gt;estimate cost per request&lt;/li&gt;
&lt;li&gt;define custom pricing if needed&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Token Budget Calculator
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://tokenbudget.edwardiaz.dev/" rel="noopener noreferrer"&gt;https://tokenbudget.edwardiaz.dev/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A more complete approach is to use a tool that combines token estimation with cost projection.&lt;/p&gt;

&lt;p&gt;With this kind of platform, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;paste input and output text&lt;/li&gt;
&lt;li&gt;automatically estimate token usage&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;calculate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost per request&lt;/li&gt;
&lt;li&gt;daily cost&lt;/li&gt;
&lt;li&gt;monthly cost&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;define request frequency (per day / per month)&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;It also allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compare across a large set of models (100+)&lt;/li&gt;
&lt;li&gt;filter by provider or capabilities&lt;/li&gt;
&lt;li&gt;sort by cost efficiency&lt;/li&gt;
&lt;li&gt;get a recommendation for the most cost-effective model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also provides API support, making it possible to integrate cost estimation directly into your own systems. This is especially useful if you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;track cost per request internally&lt;/li&gt;
&lt;li&gt;build usage dashboards&lt;/li&gt;
&lt;li&gt;enforce budgets or limits at the application level&lt;/li&gt;
&lt;/ul&gt;
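&lt;p&gt;The budget-enforcement idea can be sketched at the application level with a simple guard. This is an illustrative pattern only, not part of any tool&amp;#39;s real API:&lt;/p&gt;

```python
class TokenBudget:
    """Application-level spend guard: rejects requests once the
    accumulated cost would exceed a monthly limit. Illustrative only."""

    def __init__(self, monthly_limit_usd):
        self.limit = monthly_limit_usd
        self.spent = 0.0

    def charge(self, cost_usd):
        """Record a request's cost; return False if it would blow the budget."""
        if self.spent + cost_usd > self.limit:
            return False
        self.spent += cost_usd
        return True

budget = TokenBudget(monthly_limit_usd=50.0)
print(budget.charge(0.0004))  # True: well within budget
```

&lt;p&gt;In a real system the running total would live in shared storage (a database or cache) rather than in-process state.&lt;/p&gt;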




&lt;h2&gt;
  
  
  Planning for scale
&lt;/h2&gt;

&lt;p&gt;Once you start tracking token usage and costs, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;forecast infrastructure expenses&lt;/li&gt;
&lt;li&gt;define budgets&lt;/li&gt;
&lt;li&gt;prevent unexpected spikes&lt;/li&gt;
&lt;li&gt;choose models more intentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what turns an experimental feature into something sustainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tokens also impact rate limits
&lt;/h2&gt;

&lt;p&gt;Cost is only one side of the problem.&lt;/p&gt;

&lt;p&gt;Many providers enforce limits based on tokens, such as tokens per minute. If your prompts or outputs are too large, you may run into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;throttling&lt;/li&gt;
&lt;li&gt;increased latency&lt;/li&gt;
&lt;li&gt;failed requests under load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reducing token usage helps both with cost and system stability.&lt;/p&gt;
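&lt;p&gt;On the rate-limit side, a client-side tokens-per-minute guard can be sketched with a sliding window. Again, this is an illustrative pattern, not a provider SDK:&lt;/p&gt;

```python
import time
from collections import deque

class TokenRateLimiter:
    """Client-side tokens-per-minute (TPM) guard: tracks token usage
    in a 60-second sliding window and reports whether a new request
    would fit under the quota. Illustrative sketch."""

    def __init__(self, tokens_per_minute):
        self.tpm = tokens_per_minute
        self.events = deque()  # (timestamp, tokens) pairs

    def allow(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        # Discard usage that has aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.tpm:
            return False  # caller should wait or shrink the request
        self.events.append((now, tokens))
        return True
```

&lt;p&gt;Checking the window before each call turns provider throttling errors into a deliberate wait-or-shrink decision in your own code.&lt;/p&gt;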




&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;Understanding cost is the first step. The next one is optimization.&lt;/p&gt;

&lt;p&gt;In a follow-up post, I’ll go deeper into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt optimization techniques&lt;/li&gt;
&lt;li&gt;reducing token usage without losing quality&lt;/li&gt;
&lt;li&gt;practical ways to make LLM integrations more efficient&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re not measuring token usage, you’re making decisions without visibility.&lt;/p&gt;

&lt;p&gt;Tracking tokens, estimating costs, and choosing the right model are not optional if you care about building scalable AI systems.&lt;/p&gt;

&lt;p&gt;It’s a small investment early on that can save you a lot later.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tokenization</category>
      <category>aicost</category>
      <category>performance</category>
    </item>
    <item>
      <title>I Built a Context-Aware AI Browser Mentor Powered by GitHub Copilot CLI</title>
      <dc:creator>Edwar Diaz</dc:creator>
      <pubDate>Fri, 13 Feb 2026 20:58:37 +0000</pubDate>
      <link>https://forem.com/botoom/i-built-a-context-aware-ai-browser-mentor-powered-by-github-copilot-cli-4p5j</link>
      <guid>https://forem.com/botoom/i-built-a-context-aware-ai-browser-mentor-powered-by-github-copilot-cli-4p5j</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  I Built a Context-Aware AI Browser Mentor Powered by GitHub Copilot CLI
&lt;/h1&gt;

&lt;p&gt;What if GitHub Copilot CLI could see what you see, understand your context, and help you without breaking your workflow?&lt;/p&gt;

&lt;p&gt;That question led me to build &lt;strong&gt;DevMentorAI&lt;/strong&gt; — a browser extension that transforms Copilot CLI into a real-time AI mentor inside your browser.&lt;/p&gt;

&lt;p&gt;Built entirely with GitHub Copilot CLI — from extension to backend to landing page to release workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DevMentorAI&lt;/strong&gt; is a context-aware AI assistant that lives inside your browser and understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What page you're on&lt;/li&gt;
&lt;li&gt;What text you've selected&lt;/li&gt;
&lt;li&gt;What you're trying to write&lt;/li&gt;
&lt;li&gt;What you're troubleshooting&lt;/li&gt;
&lt;li&gt;What you want to improve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of you copying context into prompts by hand, DevMentorAI sends it to Copilot CLI automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✨ Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📄 Context capture from the current page&lt;/li&gt;
&lt;li&gt;📸 Screenshot understanding&lt;/li&gt;
&lt;li&gt;✍️ Grammar correction &amp;amp; rewriting&lt;/li&gt;
&lt;li&gt;🔄 Replace text directly inside inputs (emails, chats, forms)&lt;/li&gt;
&lt;li&gt;🛠 Works for development, DevOps, writing, learning, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No framework lock-in. No domain restriction. Just AI assistance anywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎥 Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🌐 Project Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Landing page&lt;br&gt;
&lt;a href="https://devmentorai.edwardiaz.dev/" rel="noopener noreferrer"&gt;https://devmentorai.edwardiaz.dev/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Installation Guide&lt;br&gt;
&lt;a href="https://devmentorai.edwardiaz.dev/installation" rel="noopener noreferrer"&gt;https://devmentorai.edwardiaz.dev/installation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub Repository&lt;br&gt;
&lt;a href="https://github.com/BOTOOM/devmentorai" rel="noopener noreferrer"&gt;https://github.com/BOTOOM/devmentorai&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backend NPX Package&lt;br&gt;
&lt;a href="https://www.npmjs.com/package/devmentorai-server" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/devmentorai-server&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extension Downloads&lt;br&gt;
&lt;a href="https://github.com/BOTOOM/devmentorai/releases" rel="noopener noreferrer"&gt;https://github.com/BOTOOM/devmentorai/releases&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ▶️ Full Walkthrough
&lt;/h3&gt;

&lt;p&gt;📺 Video:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/Z_MnW1hJubM"&gt;
  &lt;/iframe&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  ⚡ Feature Highlights
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Context-aware assistance
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv89lcr6gfzms39dmrqmb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv89lcr6gfzms39dmrqmb.gif" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Grammar correction replacing text directly in inputs
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxh70tdybvvkfudmdhqa.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxh70tdybvvkfudmdhqa.gif" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Installation from zero using NPX backend
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rt9xt32m4ovtmgkynje.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8rt9xt32m4ovtmgkynje.gif" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗 How It Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Extension captures page context + optional screenshot&lt;/li&gt;
&lt;li&gt;Sends to local backend&lt;/li&gt;
&lt;li&gt;Backend communicates with Copilot CLI&lt;/li&gt;
&lt;li&gt;AI response returned&lt;/li&gt;
&lt;li&gt;Optional direct replacement into page inputs&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a seamless AI workflow without leaving the browser.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Privacy &amp;amp; Security
&lt;/h2&gt;

&lt;p&gt;DevMentorAI runs locally and respects user control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No credentials required&lt;/li&gt;
&lt;li&gt;Uses your Copilot CLI session&lt;/li&gt;
&lt;li&gt;Backend runs locally via NPX&lt;/li&gt;
&lt;li&gt;Users control what context is shared&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🤖 My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;Using GitHub Copilot CLI was both enriching and fun. I discovered capabilities far beyond what I previously experienced using Copilot inside editors.&lt;/p&gt;

&lt;p&gt;I started with little experience using Copilot CLI, but by following the official documentation and experimenting with slash commands, I learned how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create custom agents&lt;/li&gt;
&lt;li&gt;Implement skills (including WXT extension knowledge)&lt;/li&gt;
&lt;li&gt;Use advanced TypeScript skills&lt;/li&gt;
&lt;li&gt;Plan and execute complex builds through the CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the most impressive features was agent-based planning mode. Copilot could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Plan the entire feature&lt;/li&gt;
&lt;li&gt;Execute it&lt;/li&gt;
&lt;li&gt;Iterate quickly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All from the terminal — lightweight and extremely fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 What surprised me most
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Copilot CLI enabled building an entire full-stack project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser extension&lt;/li&gt;
&lt;li&gt;Backend server&lt;/li&gt;
&lt;li&gt;Landing page&lt;/li&gt;
&lt;li&gt;Release workflows&lt;/li&gt;
&lt;li&gt;NPX package&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The planning file memory system was incredibly powerful.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧪 Workflow I Discovered
&lt;/h3&gt;

&lt;p&gt;As sessions grew large, context sometimes became less effective. I learned to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start a new session per major feature&lt;/li&gt;
&lt;li&gt;Refine functionality within that session&lt;/li&gt;
&lt;li&gt;Commit once complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This dramatically improved results.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛠 Problem Solving with Copilot
&lt;/h3&gt;

&lt;p&gt;Occasionally, Copilot would get stuck in a loop trying the same solution. When that happened, guiding it to consider alternative perspectives helped it resolve issues successfully.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Why This Matters
&lt;/h2&gt;

&lt;p&gt;DevMentorAI demonstrates a new paradigm:&lt;/p&gt;

&lt;p&gt;AI assistance that adapts to your context instead of forcing you to adapt to it.&lt;/p&gt;

&lt;p&gt;GitHub Copilot CLI made this possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  👤 Author
&lt;/h2&gt;

&lt;p&gt;Edwar Diaz&lt;br&gt;
DEV: &lt;a class="mentioned-user" href="https://dev.to/botoom"&gt;@botoom&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/BOTOOM" rel="noopener noreferrer"&gt;https://github.com/BOTOOM&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
  </channel>
</rss>
