<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Phasu  Yeneng</title>
    <description>The latest articles on Forem by Phasu  Yeneng (@kmusicman).</description>
    <link>https://forem.com/kmusicman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1381246%2Fe58a0893-0bc2-4a5d-9845-3bbe41076adf.jpeg</url>
      <title>Forem: Phasu  Yeneng</title>
      <link>https://forem.com/kmusicman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kmusicman"/>
    <language>en</language>
    <item>
      <title>9 Verified Tools to Stop Burning Claude Tokens Unnecessarily</title>
      <dc:creator>Phasu  Yeneng</dc:creator>
      <pubDate>Mon, 20 Apr 2026 14:03:30 +0000</pubDate>
      <link>https://forem.com/kmusicman/9-verified-tools-to-stop-burning-claude-tokens-unnecessarily-f9e</link>
      <guid>https://forem.com/kmusicman/9-verified-tools-to-stop-burning-claude-tokens-unnecessarily-f9e</guid>
      <description>&lt;p&gt;You're not using Claude more — you're just wasting more context.&lt;/p&gt;

&lt;p&gt;I went looking for real, working tools after seeing a widely-shared list that mixed legitimate repos with hallucinated ones. This article only covers tools I could verify on GitHub, organized by the type of waste they fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why tokens disappear faster than you expect
&lt;/h2&gt;

&lt;p&gt;Before the tools: understanding &lt;em&gt;where&lt;/em&gt; tokens actually go.&lt;/p&gt;

&lt;p&gt;Most developers assume their prompts are the main cost. They're not. The real culprits are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Verbose model output&lt;/strong&gt; — Claude explaining what it's about to do, then doing it, then summarizing what it did&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw CLI output&lt;/strong&gt; — dumping full &lt;code&gt;git log&lt;/code&gt;, &lt;code&gt;npm install&lt;/code&gt;, or test runner output directly into context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bloated CLAUDE.md&lt;/strong&gt; — this file loads on &lt;em&gt;every&lt;/em&gt; turn before Claude reads a single line of your code. A 5,000-token CLAUDE.md costs 5,000 tokens per message, before you've typed a word&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code navigation by content&lt;/strong&gt; — when Claude reads entire files to find one function instead of navigating by symbol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ghost tokens&lt;/strong&gt; — leftover context from earlier in the session that no longer contributes to the task but still costs money every turn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each category has a different fix. Here's what actually works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Category 1 — Shrink what Claude writes back
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/juliusbrussee/caveman" rel="noopener noreferrer"&gt;Caveman&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The simplest idea with surprisingly good results: make Claude talk like a caveman. Short words. No filler. Still technically accurate.&lt;/p&gt;

&lt;p&gt;It ships as a Claude Code skill that claims to cut ~65–75% of output tokens while preserving technical accuracy. The compression is aggressive but the signal stays intact: you get &lt;code&gt;fix auth bug in login.js line 42&lt;/code&gt; instead of three paragraphs explaining what the fix does.&lt;/p&gt;

&lt;p&gt;Works as a plugin for Cursor, Windsurf, Cline, and others too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# install as a Claude Code skill&lt;/span&gt;
&lt;span class="c"&gt;# see: https://github.com/juliusbrussee/caveman&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Long coding sessions where Claude's explanations are eating your context budget.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;a href="https://github.com/drona23/claude-token-efficient" rel="noopener noreferrer"&gt;claude-token-efficient&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Single &lt;code&gt;CLAUDE.md&lt;/code&gt; file. Drop it into your project. Done.&lt;/p&gt;

&lt;p&gt;It bakes response-terseness rules directly into Claude's instructions, forcing shorter output on heavy workflows without you having to prompt for it every time. No code changes, no new dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Projects where you want a permanent "be concise" baseline without running an extra tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  Category 2 — Compress what you send in
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/rtk-ai/rtk" rel="noopener noreferrer"&gt;RTK (Rust Token Killer)&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A CLI proxy that filters terminal output before it reaches Claude. Instead of Claude seeing 2,000 tokens of raw &lt;code&gt;git status&lt;/code&gt;, it sees ~200 tokens of the relevant parts.&lt;/p&gt;

&lt;p&gt;The hook transparently rewrites shell commands — &lt;code&gt;git status&lt;/code&gt; becomes &lt;code&gt;rtk git status&lt;/code&gt; — and Claude never sees the rewrite, just the compressed result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# without rtk&lt;/span&gt;
git status  → ~2,000 tokens raw output

&lt;span class="c"&gt;# with rtk&lt;/span&gt;
rtk git status  → ~200 tokens filtered output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claims 60–90% reduction on common dev commands. Single Rust binary, zero dependencies.&lt;/p&gt;
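
&lt;p&gt;To make the idea of filtering command output concrete, here is a toy filter in Python. It is not RTK's implementation (RTK is a Rust binary and its real filtering rules are not reproduced here). The helper name &lt;code&gt;compact_output&lt;/code&gt; and its defaults are hypothetical; it only drops blank lines, collapses consecutive repeats, and caps the length.&lt;/p&gt;

```python
def compact_output(raw: str, max_lines: int = 20) -> str:
    """Shrink command output: drop blanks, collapse repeats, cap the length."""
    kept = []
    previous = None
    repeat = 0
    for line in raw.splitlines():
        line = line.rstrip()
        if not line:
            continue  # blank lines carry no information
        if line == previous:
            repeat += 1  # count consecutive duplicates instead of keeping them
            continue
        if repeat:
            kept.append(f"  (previous line repeated {repeat} more times)")
            repeat = 0
        kept.append(line)
        previous = line
    if repeat:
        kept.append(f"  (previous line repeated {repeat} more times)")
    hidden = max(0, len(kept) - max_lines)
    if hidden:
        kept = kept[:max_lines] + [f"... {hidden} more lines omitted"]
    return "\n".join(kept)
```

&lt;p&gt;A generic filter like this is much cruder than a command-aware one, but it shows where the savings come from: most raw terminal output is repetition and padding.&lt;/p&gt;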

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams running agentic workflows where Claude executes a lot of shell commands.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;a href="https://github.com/chopratejas/headroom" rel="noopener noreferrer"&gt;Headroom&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;A context optimization layer that sits between your app and the LLM. Three compression modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SmartCrusher&lt;/strong&gt; — JSON compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CodeCompressor&lt;/strong&gt; — AST-aware code compression (understands structure, not just text)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kompress-base&lt;/strong&gt; — general text compression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AST-aware approach is the interesting one. It doesn't just truncate code — it understands which parts of a file are structurally relevant and compresses accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Applications (not just Claude Code) that programmatically build context before sending to any LLM.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;a href="https://github.com/nadimtuhin/claude-token-optimizer" rel="noopener noreferrer"&gt;claude-token-optimizer&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Reusable CLAUDE.md setup prompts that structure your documentation so Claude only loads what it needs per task.&lt;/p&gt;

&lt;p&gt;One real-world example from the repo: a RedwoodJS project reduced session start from 11,000 tokens down to 1,300 by restructuring which docs load when.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Projects with large documentation that Claude currently loads all at once.&lt;/p&gt;




&lt;h2&gt;
  
  
  Category 3 — MCP-level optimization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/ooples/token-optimizer-mcp" rel="noopener noreferrer"&gt;Token Optimizer MCP&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Intelligent token optimization for Claude Code via MCP. Claims 95%+ reduction through caching, compression, and smart tool intelligence — meaning it tracks which tools Claude actually uses and optimizes the tool definitions it sends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Claude Code users running MCP-heavy workflows.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;a href="https://glama.ai/mcp/servers/woling-dev/promptthrift-mcp" rel="noopener noreferrer"&gt;PromptThrift MCP&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Compresses conversation history using a local Gemma 4 model (runs on your machine, no extra API cost) or heuristic fallback. Key feature: &lt;strong&gt;pinned facts&lt;/strong&gt; — you can mark specific context as protected so it survives compression.&lt;/p&gt;

&lt;p&gt;Claims 70–90% reduction on conversation history while keeping critical context intact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Long multi-turn sessions where early context becomes expensive dead weight.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;a href="https://github.com/Mibayy/token-savior" rel="noopener noreferrer"&gt;Token Savior&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;MCP server specifically built for code navigation. Instead of Claude reading entire files to find a function, it indexes your codebase by symbol — functions, classes, imports, call graph — and navigates by pointer.&lt;/p&gt;
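
&lt;p&gt;The general idea of navigating by symbol can be sketched with Python's standard &lt;code&gt;ast&lt;/code&gt; module. This is an illustration only, not Token Savior's code. &lt;code&gt;index_symbols&lt;/code&gt; and &lt;code&gt;read_symbol&lt;/code&gt; are hypothetical helpers that map each function or class name to its line range, so a caller can read just those lines instead of the whole file.&lt;/p&gt;

```python
import ast


def index_symbols(source: str) -> dict:
    """Map each function/class name to its (first_line, last_line) span.

    Simplification: symbols that share a name overwrite each other.
    """
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name] = (node.lineno, node.end_lineno)
    return index


def read_symbol(source: str, name: str) -> str:
    """Return only the source lines that belong to one symbol."""
    first, last = index_symbols(source)[name]
    return "\n".join(source.splitlines()[first - 1:last])
```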

&lt;p&gt;Also includes a persistent memory engine that stores decisions, conventions, and session summaries in SQLite and re-injects them as a compact delta at session start.&lt;/p&gt;

&lt;p&gt;Claims 97% reduction on code navigation across 170+ real sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Large codebases where Claude spends significant tokens just finding the right file and function.&lt;/p&gt;




&lt;h2&gt;
  
  
  Category 4 — Clean up ghost tokens
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/alexgreensh/token-optimizer" rel="noopener noreferrer"&gt;Token Optimizer&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Targets "ghost tokens" — context that's still technically in the window but no longer relevant to the current task. Also helps survive context compaction without losing quality.&lt;/p&gt;

&lt;p&gt;More of a diagnostic and cleanup tool than a compression layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Sessions that run long and accumulate stale context from earlier tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The free fix most people skip
&lt;/h2&gt;

&lt;p&gt;Before installing anything: &lt;strong&gt;audit your CLAUDE.md&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;According to Claude Code's official documentation, CLAUDE.md is loaded into context at the start of every session, before Claude reads any of your code. Its tokens then ride along with every request that follows, which makes it the most expensive file in your project line for line.&lt;/p&gt;

&lt;p&gt;The recommended limit is &lt;strong&gt;under 200 lines&lt;/strong&gt;. If yours is longer, move sections into on-demand skill files that only load when invoked. Most CLAUDE.md files I've seen in the wild are 3–5x longer than they need to be.&lt;/p&gt;
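
&lt;p&gt;A rough way to run that audit is sketched below. It assumes about four characters per token, which is only a heuristic and not Claude's actual tokenizer, and &lt;code&gt;audit_claude_md&lt;/code&gt; is a hypothetical helper name. The 200-line threshold is the guideline mentioned above.&lt;/p&gt;

```python
LINE_LIMIT = 200       # the "under 200 lines" guideline mentioned above
CHARS_PER_TOKEN = 4    # assumption: rough heuristic, not Claude's real tokenizer


def audit_claude_md(text: str, turns: int = 50) -> dict:
    """Estimate how large a CLAUDE.md is and what it adds up to over a session."""
    line_count = len(text.splitlines())
    est_tokens = len(text) // CHARS_PER_TOKEN
    return {
        "lines": line_count,
        "over_limit": max(0, line_count - LINE_LIMIT),
        "est_tokens": est_tokens,
        # The file is part of every request, so the total scales with turn count.
        "est_tokens_per_session": est_tokens * turns,
    }
```

&lt;p&gt;Feed it the file contents, for example &lt;code&gt;audit_claude_md(open("CLAUDE.md", encoding="utf-8").read())&lt;/code&gt;, and treat the token figures as order-of-magnitude estimates only.&lt;/p&gt;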




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it fixes&lt;/th&gt;
&lt;th&gt;Claimed reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/juliusbrussee/caveman" rel="noopener noreferrer"&gt;Caveman&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Verbose model output&lt;/td&gt;
&lt;td&gt;~65–75% output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/drona23/claude-token-efficient" rel="noopener noreferrer"&gt;claude-token-efficient&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Verbose model output&lt;/td&gt;
&lt;td&gt;Drop-in terseness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/rtk-ai/rtk" rel="noopener noreferrer"&gt;RTK&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Raw CLI output in context&lt;/td&gt;
&lt;td&gt;60–90% on shell commands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/chopratejas/headroom" rel="noopener noreferrer"&gt;Headroom&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Input context size&lt;/td&gt;
&lt;td&gt;AST-aware compression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/nadimtuhin/claude-token-optimizer" rel="noopener noreferrer"&gt;claude-token-optimizer&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Doc structure / loading&lt;/td&gt;
&lt;td&gt;11k → 1.3k session start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/ooples/token-optimizer-mcp" rel="noopener noreferrer"&gt;Token Optimizer MCP&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MCP tool definitions&lt;/td&gt;
&lt;td&gt;95%+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://glama.ai/mcp/servers/woling-dev/promptthrift-mcp" rel="noopener noreferrer"&gt;PromptThrift MCP&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Conversation history&lt;/td&gt;
&lt;td&gt;70–90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/Mibayy/token-savior" rel="noopener noreferrer"&gt;Token Savior&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Code navigation&lt;/td&gt;
&lt;td&gt;~97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/alexgreensh/token-optimizer" rel="noopener noreferrer"&gt;Token Optimizer&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Ghost tokens / session health&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The numbers above come from each project's own documentation — treat them as upper bounds under ideal conditions, not guarantees.&lt;/p&gt;

&lt;p&gt;Pick one from Category 1 and one from Category 2. Those two changes alone will have the most impact on day-to-day cost for most workflows.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>tools</category>
    </item>
    <item>
      <title>Why your RAG chatbot fails in Thai — and how to fix it</title>
      <dc:creator>Phasu  Yeneng</dc:creator>
      <pubDate>Sun, 19 Apr 2026 10:08:22 +0000</pubDate>
      <link>https://forem.com/kmusicman/why-your-rag-chatbot-fails-in-thai-and-how-to-fix-it-3m72</link>
      <guid>https://forem.com/kmusicman/why-your-rag-chatbot-fails-in-thai-and-how-to-fix-it-3m72</guid>
      <description>&lt;h2&gt;
  
  
  Why your RAG chatbot fails in Thai — and how to fix it
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;A real-world walkthrough of how we built a customer service chatbot for a Thai e-commerce company — and the chunking problem nobody warns you about.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I started building a RAG (Retrieval-Augmented Generation) chatbot for a Thai e-commerce company, I made the same mistake every developer makes: I copied the LangChain quickstart example, set &lt;code&gt;chunk_size=500&lt;/code&gt;, and expected things to just work.&lt;/p&gt;

&lt;p&gt;They didn't.&lt;/p&gt;

&lt;p&gt;This is the story of why naive chunking fails for Thai text, what we built instead, and the full pipeline from PDF product manuals to chatbot answers — using Python, Qdrant, and OpenAI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Warns You About
&lt;/h2&gt;

&lt;p&gt;Most RAG tutorials are written with English in mind. The chunking logic looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Works fine for English
&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# or
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because English has clear word boundaries — spaces between every word. When you split on periods or character count, you still get coherent, searchable chunks.&lt;/p&gt;

&lt;p&gt;Thai is completely different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thai has no spaces between words.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This sentence — "ร้านค้าของเรามีสินค้าหลายหมวดหมู่ให้เลือกซื้อ" — means "Our store has many product categories to choose from." But to a naive chunker, it looks like one enormous, unsplittable blob. There are 7 meaningful words in there, with zero whitespace between them.&lt;/p&gt;

&lt;p&gt;Here's what happens when you embed that raw blob versus properly tokenized words:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input to embedding model&lt;/th&gt;
&lt;th&gt;What it sees&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ร้านค้าของเรามีสินค้าหลายหมวดหมู่ให้เลือกซื้อ&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One opaque token sequence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ร้านค้า | ของเรา | …&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Distinct word units&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The second form produces embeddings that actually capture the meaning of each concept — "store", "product", "category" — which leads to better retrieval when a user asks "มีสินค้าหมวดหมู่ไหนบ้าง" (what product categories are available?).&lt;/p&gt;
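
&lt;p&gt;A quick check with the standard library shows the gap. Whitespace splitting finds nine tokens in the English translation but only one in the Thai original. &lt;code&gt;whitespace_tokens&lt;/code&gt; is just an illustrative helper name.&lt;/p&gt;

```python
english = "Our store has many product categories to choose from."
thai = "ร้านค้าของเรามีสินค้าหลายหมวดหมู่ให้เลือกซื้อ"


def whitespace_tokens(text: str) -> list:
    """Naive tokenization: split on whitespace only."""
    return text.split()


print(len(whitespace_tokens(english)))  # 9
print(len(whitespace_tokens(thai)))     # 1
```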




&lt;h2&gt;
  
  
  The Pipeline We Built
&lt;/h2&gt;

&lt;p&gt;Here's the full architecture:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF product manuals / FAQ documents
    |
Python (PyMuPDF) → extract raw text
    |
Sentence splitting by '. '
    |
[Stored in MongoDB as raw sentences]
    |
Python → pythainlp tokenization
    |
OpenAI text-embedding-3-small
    |
Qdrant vector database (cosine similarity, 1536 dims)
    |
User query → tokenize → embed → search → top-7 chunks
    |
GPT-4o-mini + context → answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
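
&lt;p&gt;The search step in the diagram above comes down to ranking stored vectors by cosine similarity against the query vector. The pure-Python sketch below uses tiny 3-dimensional vectors and made-up chunk labels for readability. In the actual pipeline Qdrant performs this over 1536-dimensional embeddings, and &lt;code&gt;top_k&lt;/code&gt; is a hypothetical helper, not a Qdrant API.&lt;/p&gt;

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=7):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query, vec), text) for text, vec in chunks]
    return sorted(scored, reverse=True)[:k]


# Toy data: each chunk is (text, embedding).
chunks = [
    ("shipping policy", [0.9, 0.1, 0.0]),
    ("return policy", [0.1, 0.9, 0.0]),
    ("store hours", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
```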


&lt;p&gt;Let's walk through each step with real code. Here are the dependencies we'll use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# requirements.txt
&lt;/span&gt;&lt;span class="py"&gt;pymupdf&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.27.2.2&lt;/span&gt;
&lt;span class="py"&gt;pythainlp&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=5.2.0&lt;/span&gt;
&lt;span class="py"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=2.32.0&lt;/span&gt;
&lt;span class="py"&gt;qdrant-client&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=1.17.1&lt;/span&gt;
&lt;span class="py"&gt;pymongo&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;=4.10.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 1 — Extract Text from PDF
&lt;/h2&gt;

&lt;p&gt;We use &lt;code&gt;PyMuPDF&lt;/code&gt; (the &lt;code&gt;fitz&lt;/code&gt; library) instead of &lt;code&gt;PyPDF2&lt;/code&gt; because it handles Thai character encoding much more reliably.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/python/PdfToSentences.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pymupdf&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;  &lt;span class="c1"&gt;# PyMuPDF 1.27+ (legacy: import fitz)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_sentences_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pdf_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fitz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pdf_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Split on English period + space — works for mixed Thai/English documents
&lt;/span&gt;    &lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cleaned_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\u2022&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Remove bullet points
&lt;/span&gt;    &lt;span class="n"&gt;cleaned_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cleaned_text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cleaned_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;PyMuPDF&lt;/code&gt; over &lt;code&gt;PyPDF2&lt;/code&gt;?&lt;/strong&gt; Thai PDF documents often use non-standard font encodings. &lt;code&gt;PyMuPDF&lt;/code&gt; handles these much better — with &lt;code&gt;PyPDF2&lt;/code&gt; you'd frequently get garbled output or empty strings for Thai text blocks. Note: as of PyMuPDF 1.24+, the recommended import is &lt;code&gt;import pymupdf&lt;/code&gt; (the old &lt;code&gt;import fitz&lt;/code&gt; still works but is considered legacy).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why split on &lt;code&gt;'. '&lt;/code&gt; (period + space)?&lt;/strong&gt; Our documents are mixed Thai/English: product names, SKUs, and technical specs are often in English, while descriptions are Thai. The period-space split is a pragmatic middle ground that keeps each Thai paragraph as a single chunk instead of cutting it at an arbitrary point such as character 500.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ Limitation:&lt;/strong&gt; Formal Thai text often ends paragraphs with a line break rather than a period. If your PDFs have no periods at all, &lt;code&gt;text.split('. ')&lt;/code&gt; will return the entire document as one giant chunk, because the function above concatenates every page before splitting. In that case, use &lt;code&gt;pythainlp&lt;/code&gt;'s sentence tokenizer instead:&lt;/p&gt;


&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pythainlp.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sent_tokenize&lt;/span&gt;
&lt;span class="n"&gt;sentences&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sent_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crfcut&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;
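
&lt;p&gt;If you would rather stay dependency-free for this step, a simpler fallback is to split on line breaks whenever the text contains no period-space boundary. The sketch below is one assumed way to do that, not the code used in the pipeline, and &lt;code&gt;split_sentences&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def split_sentences(text: str) -> list:
    """Split on '. ' when present; otherwise fall back to line breaks."""
    separator = ". " if ". " in text else "\n"
    return [part.strip() for part in text.split(separator) if part.strip()]
```

&lt;p&gt;This is cruder than a trained sentence tokenizer, since it relies on the PDF's line breaks matching real paragraph boundaries.&lt;/p&gt;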




&lt;h2&gt;
  
  
  Step 2 — Thai Word Tokenization Before Embedding
&lt;/h2&gt;

&lt;p&gt;This is the most important step, and the one that differs most from English RAG.&lt;/p&gt;

&lt;p&gt;Before sending any Thai text to the embedding model, we tokenize it with &lt;code&gt;pythainlp&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# thai_tokenizer.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pythainlp.tokenize&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_tokenize&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;word_cut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;word_tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;newmm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Join with pipe separator so the embedding model sees distinct units
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;pythainlp&lt;/code&gt; uses a dictionary-based approach (&lt;code&gt;newmm&lt;/code&gt; engine) to segment Thai text into individual words:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Input:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"สินค้าอิเล็กทรอนิกส์ราคาถูกส่งฟรี"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Output:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"สินค้า|อิเล็กทรอนิกส์|ราคาถูก|ส่งฟรี"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the embedding model sees four distinct semantic units instead of one long string. The cosine similarity between "ส่งฟรี" (free shipping) and a user's query "จัดส่งฟรีไหม" (is shipping free?) will be much higher and more meaningful after proper tokenization.&lt;/p&gt;

&lt;p&gt;We also tried &lt;code&gt;attacut&lt;/code&gt; (a neural-network-based engine in &lt;code&gt;pythainlp&lt;/code&gt;) but settled on &lt;code&gt;newmm&lt;/code&gt; for its speed and dictionary coverage — important when your domain includes product jargon and Thai promotional phrases like "ลดราคา", "ส่งฟรี", "ผ่อนชำระ".&lt;/p&gt;
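
&lt;p&gt;For intuition about what a dictionary-based segmenter does, here is a toy greedy longest-match tokenizer. It is far simpler than &lt;code&gt;newmm&lt;/code&gt; (no real dictionary, no handling of ambiguous splits), and the four-word &lt;code&gt;LEXICON&lt;/code&gt; is a made-up stand-in for a real Thai dictionary.&lt;/p&gt;

```python
# Hypothetical mini-dictionary; a real segmenter ships tens of thousands of words.
LEXICON = {"สินค้า", "อิเล็กทรอนิกส์", "ราคาถูก", "ส่งฟรี"}


def longest_match(text: str, lexicon=LEXICON) -> list:
    """Greedy segmentation: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i != len(text):
        candidates = [word for word in lexicon if text.startswith(word, i)]
        # Unknown text falls back to one character so the loop always advances.
        token = max(candidates, key=len) if candidates else text[i]
        tokens.append(token)
        i += len(token)
    return tokens


print("|".join(longest_match("สินค้าอิเล็กทรอนิกส์ราคาถูกส่งฟรี")))
```

&lt;p&gt;With this lexicon the sample string splits into the same four words shown earlier. The quality of any dictionary-based approach depends on whether your domain vocabulary is actually in the dictionary.&lt;/p&gt;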




&lt;h2&gt;
  
  
  Step 3 — Generate and Store Embeddings
&lt;/h2&gt;

&lt;p&gt;We use OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; for embeddings — the current-generation model that replaced &lt;code&gt;text-embedding-ada-002&lt;/code&gt;. It scores 44% on the MIRACL multilingual benchmark vs 31.4% for the old model, and costs 5x less. The key is that we tokenize &lt;strong&gt;before&lt;/strong&gt; embedding — not after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ingest_embeddings.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;thai_tokenizer&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;word_cut&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_embedding&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ✅ Tokenize Thai text FIRST
&lt;/span&gt;    &lt;span class="n"&gt;tokenized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;word_cut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Then embed the tokenized version
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;      &lt;span class="c1"&gt;# store original for display
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# store original keyword
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;    &lt;span class="c1"&gt;# embed the tokenized version
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;sentences_collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we store the &lt;strong&gt;original&lt;/strong&gt; text (and the original keyword) alongside the vector, but create the embedding from the &lt;strong&gt;tokenized&lt;/strong&gt; keyword. This way, when a match is found, the chatbot returns the human-readable original sentence rather than the pipe-separated tokenized form.&lt;/p&gt;

&lt;p&gt;The embedding function itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# openai_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;MAX_INPUT_LENGTH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_INPUT_LENGTH&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Text too long&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# replaces text-embedding-ada-002
&lt;/span&gt;        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                 &lt;span class="c1"&gt;# if you change this, update Qdrant collection size too!
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4 — Qdrant as the Vector Store
&lt;/h2&gt;

&lt;p&gt;We use &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; running in Docker as our vector database. It's fast, lightweight, and the REST API is straightforward to call with Python's &lt;code&gt;requests&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# qdrant_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_rag_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/collections/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vectors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 1536 for text-embedding-3-small (default)
&lt;/span&gt;                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;QDRANT_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/collections/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/points/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;with_payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
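
&lt;p&gt;The module above creates the collection and searches it, but it does not show how points get into Qdrant. As a minimal sketch, assuming records shaped like the documents built in &lt;code&gt;ingest_embeddings.py&lt;/code&gt; and Qdrant's &lt;code&gt;PUT /collections/{name}/points&lt;/code&gt; upsert endpoint, a request body could be assembled as follows. The &lt;code&gt;build_upsert_body&lt;/code&gt; helper and the assumption that record IDs are integers or UUID strings are illustrative, not part of the original code:&lt;/p&gt;

```python
# Hypothetical helper: shape ingest records into a Qdrant upsert request body.
# Assumes each record has "id", "sentence", "keyword" and "embeded" keys,
# matching the documents built in ingest_embeddings.py.

VECTOR_NAME = "chatgpt_vector"  # must match the named vector in the collection


def build_upsert_body(records):
    """Return the JSON body for PUT /collections/{name}/points."""
    points = []
    for record in records:
        points.append({
            # Qdrant point IDs must be unsigned integers or UUID strings.
            "id": record["id"],
            # Named vectors are passed as {vector_name: [floats]}.
            "vector": {VECTOR_NAME: record["embeded"]},
            # The payload carries the original, human-readable text.
            "payload": {
                "sentence": record["sentence"],
                "keyword": record["keyword"],
            },
        })
    return {"points": points}


example = build_upsert_body([
    {"id": 1, "sentence": "original text", "keyword": "kw", "embeded": [0.1, 0.2]},
])
print(example["points"][0]["vector"])
```

&lt;p&gt;The resulting dictionary can then be sent with &lt;code&gt;requests.put(f"{QDRANT_URL}/collections/{collection_name}/points", json=body)&lt;/code&gt;, in the same style as the functions above.&lt;/p&gt;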



&lt;p&gt;Start Qdrant locally with one Docker command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-dt&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; VectorDB &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 6333:6333 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /your/path/storage:/qdrant/storage &lt;span class="se"&gt;\&lt;/span&gt;
  qdrant/qdrant:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;strong&gt;cosine similarity&lt;/strong&gt; as the distance metric. It compares the angle between vectors rather than their magnitude, and it is the metric OpenAI recommends for its embeddings. Those embeddings are normalized to unit length, so cosine similarity and Euclidean distance return the same ranking; the choice affects how the scores read, not which sentences come back.&lt;/p&gt;
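
&lt;p&gt;A quick way to sanity-check the metric choice: for vectors normalized to unit length (which OpenAI states its embeddings are), cosine similarity and Euclidean distance order the neighbours identically. The sketch below verifies this on toy three-dimensional vectors standing in for real 1536-dimensional embeddings; all helper names and values are illustrative:&lt;/p&gt;

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors standing in for real embeddings.
query = normalize([1.0, 2.0, 0.5])
docs = {
    "doc_a": normalize([1.0, 1.8, 0.4]),
    "doc_b": normalize([0.2, 0.1, 3.0]),
    "doc_c": normalize([2.0, 0.5, 0.5]),
}

# Highest cosine similarity first.
by_cosine = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
# Smallest Euclidean distance first.
by_euclidean = sorted(docs, key=lambda name: euclidean_distance(query, docs[name]))

print(by_cosine)
print(by_euclidean)
print(by_cosine == by_euclidean)
```

&lt;p&gt;The agreement follows from the identity that, for unit vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity.&lt;/p&gt;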




&lt;h2&gt;
  
  
  Step 5 — The RAG Query Flow
&lt;/h2&gt;

&lt;p&gt;When a user asks a question, here's what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chat_module.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_embedding&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qdrant_module&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;category_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Build a context-rich search query
&lt;/span&gt;    &lt;span class="n"&gt;search_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;สินค้า&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;category_name&lt;/span&gt;  &lt;span class="c1"&gt;# "Product [category]"
&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Embed the search query (tokenization happens upstream before this call)
&lt;/span&gt;    &lt;span class="n"&gt;question_embed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Search Qdrant for the top 7 most similar sentences
&lt;/span&gt;    &lt;span class="n"&gt;gpt_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt_vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question_embed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="n"&gt;search_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatgpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gpt_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Assemble context from the matched payloads
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_relevant_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The assembled context is then injected into GPT-4o-mini's system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use the attached context to answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s questions.
Answer only questions related to our company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s products and services:

&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

ภาษาที่ใช้ตอบกลับ User ให้ยึดจากภาษาของคำถามล่าสุดของ User เท่านั้น&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last Thai instruction tells the model: &lt;em&gt;"Reply in the same language as the user's most recent message."&lt;/em&gt; This handles the bilingual nature of our users — some ask in Thai, some in English, some mix both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6 — Question Classification Before RAG
&lt;/h2&gt;

&lt;p&gt;One non-obvious optimization: not every question needs a RAG lookup. We classify questions first with GPT-4o-mini to decide which path to take:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# chat_module.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;question_classification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;วิเคราะห์คำถามของ User ว่าเป็นคำถามประเภทไหน โดยให้ตอบเป็น JSON { &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: value }

    type 0 = ทักทาย / ไม่เกี่ยวกับสินค้าหรือบริการ
    type 1 = ถามเกี่ยวกับโปรโมชั่น / ส่วนลด / หมวดหมู่สินค้า
    type 2 = ถามเกี่ยวกับสาขา / พื้นที่จัดส่ง
    type 3 = ถามเกี่ยวกับข้อมูลสินค้าหรือบริการ  ← needs RAG
    type 4 = ถามทั่วไปเกี่ยวกับบริษัท&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



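&lt;p&gt;The prompt is written in Thai; in English, the types are: 0 = greeting or unrelated to products and services, 1 = promotions, discounts, or product categories, 2 = branches or delivery areas, 3 = product or service information, 4 = general questions about the company. Only &lt;code&gt;type 3&lt;/code&gt; (specific product info questions) triggers the full RAG pipeline. Promotion and branch questions (&lt;code&gt;type 1&lt;/code&gt; and &lt;code&gt;type 2&lt;/code&gt;) use structured data from a JSON catalog instead. Greetings (&lt;code&gt;type 0&lt;/code&gt;) go straight to the LLM without any retrieval at all.&lt;/p&gt;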

&lt;p&gt;This classification step saves both latency and API cost — you're not doing a vector search for "สวัสดีครับ" (hello).&lt;/p&gt;
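
&lt;p&gt;The routing described above can be sketched as a small dispatcher that consumes the classifier's &lt;code&gt;{"type": value}&lt;/code&gt; result. The &lt;code&gt;route_question&lt;/code&gt; function, the handler names, and the choice to send type 4 and any unexpected value to the direct LLM path are assumptions made for illustration; the original code for this step is not shown:&lt;/p&gt;

```python
# Hypothetical router: handler names and signatures are illustrative.
RAG_TYPE = 3            # product or service information
CATALOG_TYPES = (1, 2)  # promotions/categories and branches/delivery areas


def route_question(question, classification, handlers):
    """Pick a handler based on the classifier's {"type": value} result."""
    question_type = classification.get("type")
    if question_type == RAG_TYPE:
        # Full pipeline: embed, search Qdrant, inject context.
        return handlers["rag"](question)
    if question_type in CATALOG_TYPES:
        # Structured lookup in the JSON catalog, no vector search.
        return handlers["catalog"](question)
    # Type 0, and (by assumption) type 4 or a missing type, skip retrieval.
    return handlers["direct"](question)


# Stub handlers so the sketch runs without any API calls.
stub_handlers = {
    "rag": lambda q: "rag:" + q,
    "catalog": lambda q: "catalog:" + q,
    "direct": lambda q: "direct:" + q,
}

print(route_question("hello", {"type": 0}, stub_handlers))
print(route_question("product details", {"type": 3}, stub_handlers))
```

&lt;p&gt;Passing the handlers in as a dictionary keeps the router testable: the real RAG and catalog functions can be swapped for stubs, as done here.&lt;/p&gt;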




&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Tokenize before embedding, always.&lt;/strong&gt; The single biggest quality improvement came from running &lt;code&gt;pythainlp&lt;/code&gt; on every piece of text before it touches the embedding model — both at ingest time and at query time. Without this, retrieval quality was noticeably worse for Thai-only queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Use PyMuPDF, not PyPDF2.&lt;/strong&gt; For Thai PDF documents, &lt;code&gt;PyMuPDF&lt;/code&gt; is dramatically more reliable. &lt;code&gt;PyPDF2&lt;/code&gt; would silently drop or garble Thai characters from complex layouts. Also note: as of v1.24+, use &lt;code&gt;import pymupdf&lt;/code&gt; instead of the legacy &lt;code&gt;import fitz&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Store original text, embed tokenized text.&lt;/strong&gt; Users should see natural language in responses. Keep these as separate fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Sentence-level chunks beat character-level chunks for Thai.&lt;/strong&gt; Splitting at sentence boundaries (&lt;code&gt;.&lt;/code&gt;) gives the model coherent units of meaning rather than arbitrary fragments. A fixed &lt;code&gt;chunk_size=500&lt;/code&gt; cut, by contrast, can land in the middle of a Thai word, because Thai has no spaces between words to mark a safe place to break.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Question classification as a router saves money.&lt;/strong&gt; Not every user message needs vector search. A cheap classification step routes simple questions to a direct LLM call and complex ones to the full RAG pipeline.&lt;/p&gt;
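
&lt;p&gt;Point 4 is easy to see in a few lines of code. This sketch assumes, as described in that point, that the source text marks sentence ends with &lt;code&gt;.&lt;/code&gt;; the &lt;code&gt;split_sentences&lt;/code&gt; and &lt;code&gt;split_fixed&lt;/code&gt; helpers and the sample text are hypothetical, not the code used in the project:&lt;/p&gt;

```python
def split_sentences(text, delimiter="."):
    """Split text into sentence-level chunks at the delimiter, dropping empties."""
    chunks = []
    for part in text.split(delimiter):
        cleaned = part.strip()
        if cleaned:
            chunks.append(cleaned)
    return chunks


def split_fixed(text, chunk_size):
    """Naive fixed-size chunking, shown only for comparison."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Sample: "The shop is open every day. Free delivery in Bangkok.
# Cash on delivery is available."
sample = "ร้านเปิดทุกวัน. จัดส่งฟรีในกรุงเทพ. ชำระเงินปลายทางได้."

# Each chunk is one complete sentence.
print(split_sentences(sample))
# Fixed-size cuts ignore sentence and word boundaries entirely.
print(split_fixed(sample, 10))
```

&lt;p&gt;A plain split on &lt;code&gt;.&lt;/code&gt; will also break on decimals and abbreviations, so real documents may need a more careful boundary rule.&lt;/p&gt;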




&lt;h2&gt;
  
  
  The Stack at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PDF extraction&lt;/td&gt;
&lt;td&gt;PyMuPDF (&lt;code&gt;pymupdf&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;1.27.2.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thai tokenization&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pythainlp&lt;/code&gt; (&lt;code&gt;newmm&lt;/code&gt; engine)&lt;/td&gt;
&lt;td&gt;5.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding model&lt;/td&gt;
&lt;td&gt;OpenAI &lt;code&gt;text-embedding-3-small&lt;/code&gt; (1536d)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector database&lt;/td&gt;
&lt;td&gt;Qdrant + &lt;code&gt;qdrant-client&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;1.17.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;OpenAI GPT-4o-mini&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI SDK&lt;/td&gt;
&lt;td&gt;&lt;code&gt;openai&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.32.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python / FastAPI or Flask&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat history&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building RAG for Thai taught me that most of the "standard" chunking advice assumes English. Once you work with a language that has no word boundaries, the whole pipeline has to be rethought — from how you split sentences to how you normalize text before embedding.&lt;/p&gt;

&lt;p&gt;The good news: the fix is not complicated. A single tokenization step with &lt;code&gt;pythainlp&lt;/code&gt; before embedding makes a significant difference. The hard part is knowing you need it in the first place.&lt;/p&gt;

&lt;p&gt;If you're building RAG for other languages that are written without spaces between words, such as Japanese or Chinese, the same principle applies. Korean does use spaces, but a language-aware segmenter may still be worth testing there. Never assume your text has whitespace-delimited tokens. Always pre-process with a language-appropriate tokenizer before hitting your embedding model.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>python</category>
      <category>nlp</category>
      <category>ai</category>
    </item>
    <item>
      <title>25 Free Developer Tools That Run 100% in Your Browser</title>
      <dc:creator>Phasu  Yeneng</dc:creator>
      <pubDate>Sat, 18 Apr 2026 07:07:19 +0000</pubDate>
      <link>https://forem.com/kmusicman/25-free-developer-tools-that-run-100-in-your-browser-90</link>
      <guid>https://forem.com/kmusicman/25-free-developer-tools-that-run-100-in-your-browser-90</guid>
      <description>&lt;h1&gt;
  
  
  25 Free Developer Tools That Run 100% in Your Browser
&lt;/h1&gt;

&lt;p&gt;If you've ever pasted sensitive data into a random online tool and immediately regretted it — this post is for you.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;&lt;a href="https://toolsstack.cloud" rel="noopener noreferrer"&gt;toolsstack.cloud&lt;/a&gt;&lt;/strong&gt; — a collection of 25 free developer tools that run entirely in your browser. No backend. No account. No data ever leaves your device.&lt;/p&gt;

&lt;p&gt;Here's the full list, grouped by category.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔐 Security &amp;amp; Auth
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;a href="https://toolsstack.cloud/tools/jwt-decoder/" rel="noopener noreferrer"&gt;JWT Decoder&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Decode and inspect JWTs instantly: header, payload, expiry, and signature status. Useful when debugging auth issues without needing Postman or curl.&lt;/p&gt;
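
&lt;p&gt;To make the decoding step concrete, here is a minimal sketch of reading a JWT in the browser. It only decodes the token and does not verify the signature. The helper names &lt;code&gt;base64UrlToText&lt;/code&gt; and &lt;code&gt;decodeJwt&lt;/code&gt; are hypothetical, and this is not necessarily how the tool on toolsstack.cloud is implemented.&lt;/p&gt;

```javascript
// Sketch only: decodes a JWT, does NOT verify its signature.
// base64UrlToText and decodeJwt are hypothetical helper names.
function base64UrlToText(segment) {
  // base64url swaps "+" and "/" for "-" and "_" and drops the "=" padding.
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  // atob returns one character per byte; decode those bytes as UTF-8.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

function decodeJwt(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Expected header.payload.signature");
  }
  const header = JSON.parse(base64UrlToText(parts[0]));
  const payload = JSON.parse(base64UrlToText(parts[1]));
  // "exp" is expressed in seconds since the Unix epoch and may be absent.
  const expiresAt =
    typeof payload.exp === "number" ? new Date(payload.exp * 1000) : null;
  return { header, payload, expiresAt, signature: parts[2] };
}
```

&lt;p&gt;Calling &lt;code&gt;decodeJwt(token).expiresAt&lt;/code&gt; gives a &lt;code&gt;Date&lt;/code&gt; you can compare with the current time to see whether the token has expired.&lt;/p&gt;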

&lt;h3&gt;
  
  
  2. &lt;a href="https://toolsstack.cloud/tools/hash-generator/" rel="noopener noreferrer"&gt;Hash Generator&lt;/a&gt;
&lt;/h3&gt;

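&lt;p&gt;Generate MD5, SHA-1, SHA-256, and SHA-512 hashes from any text. Everything runs client-side, so nothing is sent to a server. The SHA family is available through the Web Crypto API; &lt;code&gt;crypto.subtle.digest()&lt;/code&gt; does not offer MD5, so that algorithm needs a JavaScript implementation.&lt;/p&gt;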

&lt;h3&gt;
  
  
  3. &lt;a href="https://toolsstack.cloud/tools/password-generator/" rel="noopener noreferrer"&gt;Password Generator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Cryptographically secure passwords with options for length, uppercase, numbers, and symbols. Uses &lt;code&gt;crypto.getRandomValues()&lt;/code&gt;, a cryptographically strong random source, rather than the predictable &lt;code&gt;Math.random()&lt;/code&gt;.&lt;/p&gt;
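
&lt;p&gt;As an illustration, a password generator built on &lt;code&gt;crypto.getRandomValues()&lt;/code&gt; could look roughly like the sketch below. The function names, the option names, and the symbol set are assumptions made for this example. The rejection-sampling step is one common way to avoid modulo bias, not necessarily what the site does.&lt;/p&gt;

```javascript
// Sketch only. In browsers globalThis.crypto is the Web Crypto API;
// the require() fallback is just for running the example in older Node.js.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Returns a uniformly distributed integer from 0 up to (but excluding) limit.
function randomIndex(limit) {
  const range = 2 ** 32;
  // Values at or above the cutoff would make some indexes slightly more
  // likely (modulo bias), so they are discarded and redrawn.
  const cutoff = range - (range % limit);
  const buffer = new Uint32Array(1);
  do {
    webCrypto.getRandomValues(buffer);
  } while (buffer[0] >= cutoff);
  return buffer[0] % limit;
}

// Option names (uppercase, numbers, symbols) are hypothetical.
function generatePassword(length, options = {}) {
  let alphabet = "abcdefghijklmnopqrstuvwxyz";
  if (options.uppercase) alphabet += "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  if (options.numbers) alphabet += "0123456789";
  if (options.symbols) alphabet += "!@#$%^*()-_=+[]{}";
  const chars = Array.from(
    { length },
    () => alphabet[randomIndex(alphabet.length)]
  );
  return chars.join("");
}
```

&lt;p&gt;For example, &lt;code&gt;generatePassword(24, { uppercase: true, numbers: true, symbols: true })&lt;/code&gt; returns a 24-character string drawn from all four character classes.&lt;/p&gt;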




&lt;h2&gt;
  
  
  📦 Data &amp;amp; Encoding
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4. &lt;a href="https://toolsstack.cloud/tools/json-formatter/" rel="noopener noreferrer"&gt;JSON Formatter &amp;amp; Validator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Format, minify, and validate JSON with syntax highlighting. Shows line numbers on errors. One of the most-used tools on the site.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;a href="https://toolsstack.cloud/tools/yaml-to-json/" rel="noopener noreferrer"&gt;YAML to JSON Converter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Paste YAML, get JSON. Or paste JSON, get YAML. Uses &lt;code&gt;js-yaml&lt;/code&gt; — supports anchors, multiline strings, and nested objects. Great for switching between Kubernetes manifests and API calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. &lt;a href="https://toolsstack.cloud/tools/json-to-csv/" rel="noopener noreferrer"&gt;JSON to CSV Converter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Convert JSON arrays to CSV with auto-detected column headers. Download the result directly. No server upload needed.&lt;/p&gt;
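
&lt;p&gt;A minimal sketch of the conversion, assuming "auto-detected column headers" means the union of keys across all rows. The function names &lt;code&gt;toCsvCell&lt;/code&gt; and &lt;code&gt;jsonToCsv&lt;/code&gt; are hypothetical, and the quoting follows the usual CSV convention of doubling inner quotes.&lt;/p&gt;

```javascript
// Sketch only: toCsvCell and jsonToCsv are hypothetical helper names.
function toCsvCell(value) {
  if (value === null || value === undefined) return "";
  // Nested objects and arrays are serialized as JSON text.
  const text =
    typeof value === "object" ? JSON.stringify(value) : String(value);
  // Quote cells containing a comma, a quote, or a line break,
  // and double any quotes inside them.
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function jsonToCsv(rows) {
  // Header detection: every key seen in any row, in first-seen order.
  const headers = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  const lines = [headers.map(toCsvCell).join(",")];
  for (const row of rows) {
    // Keys missing from a row become empty cells.
    lines.push(headers.map((key) => toCsvCell(row[key])).join(","));
  }
  return lines.join("\r\n");
}
```

&lt;p&gt;In a browser, the resulting string can be offered as a download by wrapping it in a &lt;code&gt;Blob&lt;/code&gt; and pointing a link at &lt;code&gt;URL.createObjectURL(blob)&lt;/code&gt;, which keeps the data on the device.&lt;/p&gt;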

&lt;h3&gt;
  
  
  7. &lt;a href="https://toolsstack.cloud/tools/base64-encoder/" rel="noopener noreferrer"&gt;Base64 Encoder / Decoder&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Encode and decode Base64 — supports both text and file input. Handles UTF-8 correctly (unlike some tools that break on non-ASCII characters).&lt;/p&gt;
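
&lt;p&gt;The UTF-8 problem comes from &lt;code&gt;btoa()&lt;/code&gt;, which only accepts characters in the 0 to 255 range. One common workaround, sketched below with hypothetical function names, is to convert the text to UTF-8 bytes first with &lt;code&gt;TextEncoder&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch only: encodeBase64Utf8 and decodeBase64Utf8 are hypothetical names.
function encodeBase64Utf8(text) {
  // Turn the string into UTF-8 bytes, then into the one-character-per-byte
  // "binary string" that btoa() expects.
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

function decodeBase64Utf8(base64) {
  // Reverse the steps: Base64 to binary string, to bytes, to UTF-8 text.
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}
```

&lt;p&gt;With these helpers, Thai text such as &lt;code&gt;สวัสดี&lt;/code&gt; survives a round trip, whereas passing it straight to &lt;code&gt;btoa()&lt;/code&gt; throws an error.&lt;/p&gt;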

&lt;h3&gt;
  
  
  8. &lt;a href="https://toolsstack.cloud/tools/image-to-base64/" rel="noopener noreferrer"&gt;Image to Base64&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Upload an image, get a Base64 data URI. Copy it directly into your HTML &lt;code&gt;&amp;lt;img src=""&amp;gt;&lt;/code&gt; or CSS &lt;code&gt;background-image&lt;/code&gt;. Useful for embedding small icons without an extra HTTP request.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. &lt;a href="https://toolsstack.cloud/tools/url-encoder/" rel="noopener noreferrer"&gt;URL Encoder &amp;amp; Decoder&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Encode/decode URLs with two modes: &lt;code&gt;encodeURIComponent&lt;/code&gt; (for query params) and &lt;code&gt;encodeURI&lt;/code&gt; (for full URLs). Also parses URLs into protocol, host, path, and individual query parameters.&lt;/p&gt;
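
&lt;p&gt;The difference between the two modes is easiest to see side by side. The snippet below is a small sketch using only built-in functions; the example URLs and the &lt;code&gt;parseUrl&lt;/code&gt; helper name are made up for illustration.&lt;/p&gt;

```javascript
// encodeURIComponent escapes reserved characters such as "/", "?" and "=",
// which makes it suitable for a single query parameter value.
const asComponent = encodeURIComponent("a b/c?d=e");
// asComponent is "a%20b%2Fc%3Fd%3De"

// encodeURI leaves URL structure characters alone and only escapes
// characters that are not allowed anywhere in a URL, such as spaces.
const asUri = encodeURI("https://example.com/my path?q=a b");
// asUri is "https://example.com/my%20path?q=a%20b"

// Sketch of the parsing step using the built-in URL class.
// parseUrl is a hypothetical helper name.
function parseUrl(input) {
  const url = new URL(input);
  return {
    protocol: url.protocol,
    host: url.host,
    path: url.pathname,
    // Note: Object.fromEntries keeps only the last value of a repeated key.
    query: Object.fromEntries(url.searchParams.entries()),
  };
}
```

&lt;p&gt;For instance, &lt;code&gt;parseUrl("https://example.com:8080/docs/page?lang=th")&lt;/code&gt; yields the protocol &lt;code&gt;https:&lt;/code&gt;, the host &lt;code&gt;example.com:8080&lt;/code&gt;, the path &lt;code&gt;/docs/page&lt;/code&gt;, and a query object with &lt;code&gt;lang&lt;/code&gt; set to &lt;code&gt;th&lt;/code&gt;.&lt;/p&gt;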

&lt;h3&gt;
  
  
  10. &lt;a href="https://toolsstack.cloud/tools/html-entity-encoder/" rel="noopener noreferrer"&gt;HTML Entity Encoder&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Encode special characters like &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;amp;&lt;/code&gt;, &lt;code&gt;"&lt;/code&gt; to HTML entities and back. Useful when embedding user-generated content or debugging XSS-safe output.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Developer Utilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  11. &lt;a href="https://toolsstack.cloud/tools/uuid-generator/" rel="noopener noreferrer"&gt;UUID / GUID Generator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Generate UUID v4 identifiers — single or bulk up to 1000 at once. Options for uppercase, no hyphens, or &lt;code&gt;{braces}&lt;/code&gt; GUID format. Uses &lt;code&gt;crypto.getRandomValues()&lt;/code&gt; for proper randomness.&lt;/p&gt;
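
&lt;p&gt;For reference, a version 4 UUID is 16 random bytes with the version and variant bits fixed. The sketch below shows one way to build that on top of &lt;code&gt;crypto.getRandomValues()&lt;/code&gt;; the function and option names are assumptions for this example. Modern browsers also provide &lt;code&gt;crypto.randomUUID()&lt;/code&gt; for the plain lowercase, hyphenated form.&lt;/p&gt;

```javascript
// Sketch only. In browsers globalThis.crypto is the Web Crypto API;
// the require() fallback is just for running the example in older Node.js.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Option names (hyphens, uppercase, braces) are hypothetical.
function uuidV4(options = {}) {
  const bytes = webCrypto.getRandomValues(new Uint8Array(16));
  // Keep the low 4 bits of byte 6 and set the high nibble to 4 (version 4).
  bytes[6] = (bytes[6] % 16) + 0x40;
  // Keep the low 6 bits of byte 8 and set the top two bits to "10" (variant).
  bytes[8] = (bytes[8] % 64) + 0x80;
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  const groups = [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ];
  let id = groups.join(options.hyphens === false ? "" : "-");
  if (options.uppercase) id = id.toUpperCase();
  return options.braces ? "{" + id + "}" : id;
}

// Bulk generation, capped at 1000 to match the limit described above.
function uuidBatch(count, options) {
  return Array.from({ length: Math.min(count, 1000) }, () => uuidV4(options));
}
```

&lt;p&gt;Here &lt;code&gt;uuidV4({ braces: true, uppercase: true })&lt;/code&gt; produces the brace-wrapped GUID style, and &lt;code&gt;uuidBatch(10)&lt;/code&gt; returns an array of ten identifiers.&lt;/p&gt;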

&lt;h3&gt;
  
  
  12. &lt;a href="https://toolsstack.cloud/tools/regex-tester/" rel="noopener noreferrer"&gt;Regex Tester&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Test regular expressions with live match highlighting, group capture display, and flags support. Faster than switching between your editor and a browser console.&lt;/p&gt;

&lt;h3&gt;
  
  
  13. &lt;a href="https://toolsstack.cloud/tools/cron-generator/" rel="noopener noreferrer"&gt;Cron Expression Generator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Build cron schedules with a visual builder — minute, hour, day, month, weekday. Shows a human-readable description and the next 5 run times. No more googling "cron every 15 minutes".&lt;/p&gt;

&lt;h3&gt;
  
  
  14. &lt;a href="https://toolsstack.cloud/tools/epoch-converter/" rel="noopener noreferrer"&gt;Epoch / Unix Timestamp Converter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Convert between Unix timestamps and human-readable dates in any timezone. Also shows the current timestamp live — useful for debugging API responses with &lt;code&gt;created_at&lt;/code&gt; fields.&lt;/p&gt;
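
&lt;p&gt;The conversion itself needs nothing beyond &lt;code&gt;Date&lt;/code&gt; and &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt;. The sketch below uses hypothetical function names, and the rule for telling seconds from milliseconds is a heuristic assumed for this example rather than something the tool is known to do.&lt;/p&gt;

```javascript
// Sketch only: function names are hypothetical.
function epochToDate(timestamp) {
  // Assumed heuristic: magnitudes of 1e12 and above are treated as
  // milliseconds, anything smaller as seconds.
  const millis = Math.abs(timestamp) >= 1e12 ? timestamp : timestamp * 1000;
  return new Date(millis);
}

function dateToEpochSeconds(date) {
  return Math.floor(date.getTime() / 1000);
}

// Formats a Date in a given IANA time zone, for example "Asia/Bangkok".
function formatInZone(date, timeZone) {
  return new Intl.DateTimeFormat("en-GB", {
    dateStyle: "medium",
    timeStyle: "long",
    timeZone,
  }).format(date);
}
```

&lt;p&gt;For example, &lt;code&gt;epochToDate(1700000000).toISOString()&lt;/code&gt; gives &lt;code&gt;2023-11-14T22:13:20.000Z&lt;/code&gt;, and &lt;code&gt;formatInZone()&lt;/code&gt; renders the same instant for whichever time zone you pass in.&lt;/p&gt;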

&lt;h3&gt;
  
  
  15. &lt;a href="https://toolsstack.cloud/tools/diff-checker/" rel="noopener noreferrer"&gt;Diff Checker&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Paste two blocks of text and see line-by-line differences highlighted in green/red. Good for quickly spotting config file changes or API response differences.&lt;/p&gt;

&lt;h3&gt;
  
  
  16. &lt;a href="https://toolsstack.cloud/tools/chmod-calculator/" rel="noopener noreferrer"&gt;chmod Calculator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Click checkboxes for owner/group/others read/write/execute permissions and see the numeric value update in real time. Never google "chmod 755 meaning" again.&lt;/p&gt;
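
&lt;p&gt;The arithmetic behind the checkboxes is small enough to show in full: read counts as 4, write as 2, and execute as 1, summed per owner, group, and others. The function names in this sketch are hypothetical.&lt;/p&gt;

```javascript
// Sketch only: function names and the input shape are hypothetical.
function permissionDigit(perm) {
  return (perm.read ? 4 : 0) + (perm.write ? 2 : 0) + (perm.execute ? 1 : 0);
}

// Builds the three-digit numeric mode from three checkbox groups.
function toOctal({ owner, group, others }) {
  return [owner, group, others].map(permissionDigit).join("");
}

// Converts a numeric mode such as "755" into the symbolic "rwxr-xr-x" form.
function toSymbolic(octal) {
  return Array.from(octal, (digit) => {
    const n = Number(digit);
    return (
      (n >= 4 ? "r" : "-") + (n % 4 >= 2 ? "w" : "-") + (n % 2 === 1 ? "x" : "-")
    );
  }).join("");
}
```

&lt;p&gt;So full access for the owner plus read and execute for everyone else comes out as &lt;code&gt;755&lt;/code&gt;, and &lt;code&gt;toSymbolic("644")&lt;/code&gt; returns &lt;code&gt;rw-r--r--&lt;/code&gt;.&lt;/p&gt;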




&lt;h2&gt;
  
  
  💻 Code &amp;amp; Text
&lt;/h2&gt;

&lt;h3&gt;
  
  
  17. &lt;a href="https://toolsstack.cloud/tools/css-minifier/" rel="noopener noreferrer"&gt;CSS Minifier&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Minify CSS and see the exact bytes saved. Pure client-side — paste your stylesheet, get minified output without uploading to any server.&lt;/p&gt;

&lt;h3&gt;
  
  
  18. &lt;a href="https://toolsstack.cloud/tools/sql-formatter/" rel="noopener noreferrer"&gt;SQL Formatter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Format and beautify SQL queries with proper indentation and keyword casing. Supports SELECT, INSERT, UPDATE, DELETE, JOIN, and subqueries.&lt;/p&gt;

&lt;h3&gt;
  
  
  19. &lt;a href="https://toolsstack.cloud/tools/markdown-converter/" rel="noopener noreferrer"&gt;Markdown Converter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Convert Markdown to HTML with a live preview. Useful for checking how your README or blog post will render before committing.&lt;/p&gt;

&lt;h3&gt;
  
  
  20. &lt;a href="https://toolsstack.cloud/tools/lorem-ipsum-generator/" rel="noopener noreferrer"&gt;Lorem Ipsum Generator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Generate placeholder text — choose paragraphs, sentences, or word count. Starts with the classic "Lorem ipsum dolor sit amet" or generates fully random Latin-ish text.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎨 Design &amp;amp; Visual
&lt;/h2&gt;

&lt;h3&gt;
  
  
  21. &lt;a href="https://toolsstack.cloud/tools/color-converter/" rel="noopener noreferrer"&gt;Color Converter&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Convert between HEX, RGB, HSL, and HSV instantly with a color picker. Copy any format with one click. Useful when your designer gives you a hex code but your CSS needs HSL.&lt;/p&gt;
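
&lt;p&gt;For the hex-to-HSL case mentioned above, the conversion goes through RGB first. The sketch below uses the standard RGB-to-HSL formulas; the function names are hypothetical, and it only handles six-digit hex codes.&lt;/p&gt;

```javascript
// Sketch only: hexToRgb and rgbToHsl are hypothetical helper names.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error("Expected a 6-digit hex color");
  const value = parseInt(match[1], 16);
  return {
    r: Math.floor(value / 65536),
    g: Math.floor(value / 256) % 256,
    b: value % 256,
  };
}

function rgbToHsl({ r, g, b }) {
  const rn = r / 255;
  const gn = g / 255;
  const bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  const l = (max + min) / 2;
  let h = 0;
  let s = 0;
  // For grays (delta of 0) hue and saturation stay at 0.
  if (delta !== 0) {
    s = delta / (1 - Math.abs(2 * l - 1));
    if (max === rn) h = ((gn - bn) / delta) % 6;
    else if (max === gn) h = (bn - rn) / delta + 2;
    else h = (rn - gn) / delta + 4;
    // Convert sextants to degrees and wrap negative hues into 0..360.
    h = (h * 60 + 360) % 360;
  }
  return {
    h: Math.round(h) % 360,
    s: Math.round(s * 100),
    l: Math.round(l * 100),
  };
}
```

&lt;p&gt;Chaining the two, &lt;code&gt;rgbToHsl(hexToRgb("#ff8800"))&lt;/code&gt; returns a hue of 32 with 100 percent saturation and 50 percent lightness, which maps to &lt;code&gt;hsl(32, 100%, 50%)&lt;/code&gt; in CSS.&lt;/p&gt;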

&lt;h3&gt;
  
  
  22. &lt;a href="https://toolsstack.cloud/tools/qr-code-generator/" rel="noopener noreferrer"&gt;QR Code Generator&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Generate QR codes for URLs, WiFi credentials, vCards, email, SMS, and more. Customize dot styles, eye shapes, colors, and upload a logo overlay. Download as PNG or SVG.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌏 Specialized
&lt;/h2&gt;

&lt;h3&gt;
  
  
  23. &lt;a href="https://toolsstack.cloud/tools/ip-lookup/" rel="noopener noreferrer"&gt;IP Lookup&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Shows your public IPv4 and IPv6 addresses with geolocation (country, city, ISP). Useful for verifying VPN connections or debugging network issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  24. &lt;a href="https://toolsstack.cloud/tools/pdf-text-extractor/" rel="noopener noreferrer"&gt;PDF Text Extractor&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Extract text from PDF files — supports both English and Thai. Uses pdf.js entirely in the browser. The PDF never gets uploaded anywhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  25. &lt;a href="https://toolsstack.cloud/tools/thai-slug/" rel="noopener noreferrer"&gt;Thai Text to URL Slug&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Convert Thai text to URL-friendly slugs using RTGS (Royal Thai General System of Transcription) romanization. For example, &lt;code&gt;สวัสดีครับ&lt;/code&gt; → &lt;code&gt;sawatdi-khrap&lt;/code&gt;. Useful for building SEO-friendly URLs for Thai-language content.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why client-side only?
&lt;/h2&gt;

&lt;p&gt;Many online developer tools send your data to a server. That can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your JWTs (with user data) go to someone else's server&lt;/li&gt;
&lt;li&gt;Your passwords may get logged somewhere&lt;/li&gt;
&lt;li&gt;Your internal API responses may be stored in someone's database&lt;/li&gt;
&lt;/ul&gt;

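&lt;p&gt;Every tool on toolsstack.cloud processes your input entirely in your browser using native browser APIs (&lt;code&gt;crypto.getRandomValues&lt;/code&gt;, &lt;code&gt;URL&lt;/code&gt;, &lt;code&gt;FileReader&lt;/code&gt;) and trusted open-source libraries loaded from a CDN, such as &lt;code&gt;js-yaml&lt;/code&gt; and &lt;code&gt;pdf.js&lt;/code&gt;. The one built-in exception is IP Lookup, which has to query a remote service because a browser cannot discover its own public IP address or geolocation locally.&lt;/p&gt;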

&lt;p&gt;&lt;strong&gt;The network tab stays clean.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;p&gt;Pure HTML + Vanilla JavaScript. No framework. No build step. No &lt;code&gt;node_modules&lt;/code&gt;. The entire site is static files on shared hosting.&lt;/p&gt;

&lt;p&gt;For tools that need libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;js-yaml@4.1.0&lt;/code&gt; — YAML parsing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pdf.js@4.4&lt;/code&gt; — PDF text extraction&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qrcode-generator&lt;/code&gt; — QR code rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else uses browser-native APIs.&lt;/p&gt;




&lt;p&gt;If you find any of these useful — or want to suggest a tool that's missing — feel free to drop a comment below.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://toolsstack.cloud" rel="noopener noreferrer"&gt;https://toolsstack.cloud&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>tools</category>
      <category>javascript</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
