<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ahsan Mehmood</title>
    <description>The latest articles on Forem by Ahsan Mehmood (@iamahsanmehmood).</description>
    <link>https://forem.com/iamahsanmehmood</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3813758%2F4e51aeb9-4998-4138-a021-26bb36662d98.jpg</url>
      <title>Forem: Ahsan Mehmood</title>
      <link>https://forem.com/iamahsanmehmood</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/iamahsanmehmood"/>
    <language>en</language>
    <item>
      <title>I Built the First Deterministic Urdu Compound Word Detector — Here's Why It Took a Full Library to Get There</title>
      <dc:creator>Ahsan Mehmood</dc:creator>
      <pubDate>Mon, 27 Apr 2026 18:32:27 +0000</pubDate>
      <link>https://forem.com/iamahsanmehmood/i-built-the-first-deterministic-urdu-compound-word-detector-heres-why-it-took-a-full-library-to-2l1o</link>
      <guid>https://forem.com/iamahsanmehmood/i-built-the-first-deterministic-urdu-compound-word-detector-heres-why-it-took-a-full-library-to-2l1o</guid>
      <description>&lt;p&gt;Urdu is spoken by over 230 million people. It is the national language of Pakistan, one of the 22 scheduled languages of India, and the lingua franca of a diaspora spanning three continents. And yet, if you try to build Urdu software today — real software, not a toy — you will hit the same wall every other developer hit before you: the tools do not exist.&lt;/p&gt;

&lt;p&gt;I hit that wall building &lt;a href="https://hamaariurdu.com" rel="noopener noreferrer"&gt;HamaariUrdu&lt;/a&gt;, an Urdu language learning platform. This post is about what I built to fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bugs that no library could fix
&lt;/h2&gt;

&lt;p&gt;I was not looking to build a library. I was looking to ship features. But the bugs kept piling up, and none of the available Urdu NLP libraries (UrduHack, URDUNLP, or anything else) could fix them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 1: Search returning zero results for words that are obviously in the database.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The database stored &lt;code&gt;ہے&lt;/code&gt; using the correct Urdu &lt;code&gt;ہ&lt;/code&gt; (U+06C1, Heh Goal). The user's keyboard typed Arabic &lt;code&gt;ه&lt;/code&gt; (U+0647, Heh). Both look &lt;strong&gt;completely identical&lt;/strong&gt; on screen in Naskh fonts. But &lt;code&gt;U+06C1 !== U+0647&lt;/code&gt;. Zero results. No error. No warning. Just silence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 2: String equality silently failing.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;قلم&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;قلم&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;// false — why?!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One of those strings was copied from Microsoft Word and contains an invisible ZWNJ (Zero Width Non-Joiner, U+200C) that Word inserts automatically. You cannot see it. Your editor does not show it. But the comparison fails.&lt;/p&gt;
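&lt;p&gt;One way to see what the comparison sees is to print each string's code points. The sketch below is only a debugging aid: the helper name &lt;code&gt;codePoints&lt;/code&gt; is made up for this example and is not part of any library.&lt;/p&gt;

```typescript
// Hypothetical debugging helper: list the Unicode code points in a string.
function codePoints(text: string): string {
  return Array.from(text)
    .map((ch) => 'U+' + (ch.codePointAt(0) ?? 0).toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

const typed = 'قلم';
const pasted = 'قل\u200Cم'; // same letters plus an invisible ZWNJ

console.log(typed === pasted); // false
console.log(codePoints(typed)); // U+0642 U+0644 U+0645
console.log(codePoints(pasted)); // U+0642 U+0644 U+200C U+0645
```

&lt;p&gt;Once the extra U+200C shows up in the dump, the failed comparison stops being a mystery.&lt;/p&gt;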

&lt;p&gt;&lt;strong&gt;Bug 3: TinyMCE destroying Izafat.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Urdu grammar, Izafat (اضافت) is a grammatical construction that links two words — like the English "of" but expressed as a marker on the first word. The marker is often an apostrophe-like character (U+2019, Right Single Quotation Mark).&lt;/p&gt;

&lt;p&gt;TinyMCE, a very popular rich text editor, silently converts U+2019 to &lt;code&gt;&amp;amp;rsquo;&lt;/code&gt; before saving. So a phrase written with the Izafat apostrophe, rather than with a Kasra as in &lt;code&gt;کتابِ&lt;/code&gt;, gets stored with an HTML entity in place of the marker. Every compound word lookup in the database then fails because the stored form doesn't match the queried form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug 4: Numbers overflowing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Urdu text frequently references South Asian scale: لاکھ (100,000), کروڑ (10,000,000), ارب (1,000,000,000). These are real everyday numbers in Pakistan — newspaper headlines, financial documents, government statistics.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Number.MAX_SAFE_INTEGER&lt;/code&gt; is 9,007,199,254,740,991 (2^53 − 1). A single کھرب (100 billion) still fits, but totals and products of such amounts can cross that limit, and beyond it a JavaScript &lt;code&gt;number&lt;/code&gt; can no longer represent every integer. JavaScript silently gives you the wrong answer.&lt;/p&gt;
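&lt;p&gt;A minimal sketch of where the boundary sits and how one might guard against crossing it. The constant names and the &lt;code&gt;scaled&lt;/code&gt; helper are assumptions for illustration, not the library's API; switching to &lt;code&gt;BigInt&lt;/code&gt; is another option when exact arithmetic on larger values is required.&lt;/p&gt;

```typescript
// South Asian units as plain numbers (exact, since each is far below 2^53).
const LAKH = 1e5;
const CRORE = 1e7;
const ARAB = 1e9;

// Hypothetical guard: multiply, but refuse results a number cannot hold exactly.
function scaled(count: number, unit: number): number {
  const value = count * unit;
  if (!Number.isSafeInteger(value)) {
    throw new RangeError('value exceeds Number.MAX_SAFE_INTEGER');
  }
  return value;
}

console.log(scaled(3, LAKH)); // 300000
console.log(scaled(25, CRORE)); // 250000000
console.log(scaled(2, ARAB)); // 2000000000

// Past the limit, neighbouring integers collapse into one value:
console.log(Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2); // true
```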

&lt;p&gt;&lt;strong&gt;Bug 5: Sorting broken for every Urdu word list.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Plain string sorting in JavaScript compares UTF-16 code units, and most default database collations are not Urdu-aware. The Urdu alphabet has 39 letters in a specific order that does not match either Unicode codepoint order or any Latin-derived collation. Every sorted word list was wrong.&lt;/p&gt;
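&lt;p&gt;To make the ordering problem concrete, here is a rough sketch of a comparator driven by an alphabet table. The letter inventory in &lt;code&gt;URDU_ORDER&lt;/code&gt; and the names &lt;code&gt;rank&lt;/code&gt; and &lt;code&gt;compareUrdu&lt;/code&gt; are assumptions for this example; it ignores diacritics and is not how urdu-tools implements sorting. Where ICU locale data is available, &lt;code&gt;Intl.Collator('ur')&lt;/code&gt; may be a better starting point.&lt;/p&gt;

```typescript
// Assumed alphabet order for illustration; adjust to the convention you follow.
const URDU_ORDER = 'آابپتٹثجچحخدڈذرڑزژسشصضطظعغفقکگلمنوہھءیے';

// Rank of each letter is its position in URDU_ORDER.
const RANK = new Map(
  Array.from(URDU_ORDER).map((ch, i) => [ch, i] as [string, number]),
);

// Unknown characters sort after all known letters, by code point.
function rank(ch: string): number {
  return RANK.get(ch) ?? 1000 + (ch.codePointAt(0) ?? 0);
}

function compareUrdu(a: string, b: string): number {
  const ra = Array.from(a).map(rank);
  const rb = Array.from(b).map(rank);
  const shared = Math.min(ra.length, rb.length);
  for (let i = 0; i !== shared; i++) {
    if (ra[i] !== rb[i]) return ra[i] - rb[i];
  }
  return ra.length - rb.length; // shorter word first when one is a prefix
}

const words = ['ثابت', 'تارا', 'ٹوپی', 'پانی'];
console.log([...words].sort()); // code unit order: ت ث پ ٹ
console.log([...words].sort(compareUrdu)); // alphabet order: پ ت ٹ ث
```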

&lt;p&gt;&lt;strong&gt;Bug 6 — the worst one: Compound words destroying every downstream NLP task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one deserves its own section.&lt;/p&gt;




&lt;h2&gt;
  
  
  The compound word problem
&lt;/h2&gt;

&lt;p&gt;Urdu مرکب الفاظ (compound words) are multi-word expressions that function as &lt;strong&gt;a single semantic unit&lt;/strong&gt; but are written with &lt;strong&gt;spaces between their parts&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;کتاب خانہ  →  library  (کتاب = book, خانہ = place)
بے عزت     →  disrespectful  (بے = without, عزت = honor)
خوش قسمت  →  fortunate  (خوش = well, قسمت = fate)
علم و عمل  →  knowledge and practice  (fixed expression)
محنت مشقت →  hard work  (synonym compound)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A naive tokenizer sees spaces and splits them. The result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input:   "اس نے کتاب خانہ بنایا"
                ↑ ↑
         space between compound components

Wrong:   ['اس', 'نے', 'کتاب', 'خانہ', 'بنایا']
         (5 tokens — "library" is split into "book" + "place")

Right:   ['اس', 'نے', 'کتاب‌خانہ', 'بنایا']
         (4 tokens — "library" is one semantic unit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The consequences ripple into every downstream NLP task:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;کتاب خانہ&lt;/code&gt; doesn't match &lt;code&gt;کتاب‌خانہ&lt;/code&gt; — zero results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;امورِ خانہ داری&lt;/code&gt; (household affairs) split into 3 unrelated tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sentiment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;بے عزت&lt;/code&gt; (disrespectful) vs &lt;code&gt;بے&lt;/code&gt; + &lt;code&gt;عزت&lt;/code&gt; — polarity lost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Translation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;رنگ برنگے&lt;/code&gt; (colorful) translated as "color" + unknown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Word count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every compound inflates the count with phantom tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why this is genuinely hard
&lt;/h3&gt;

&lt;p&gt;Urdu compound words span &lt;strong&gt;four different morphological strategies simultaneously&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy 1 — Affix-based:&lt;/strong&gt; One word contains a known derivational morpheme (prefix or suffix):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;کتاب + خانہ   →  library   (خانہ = "place of" suffix)
بے + عزت      →  disrespectful  (بے = "without" prefix)  
خوش + قسمت   →  fortunate  (خوش = "well" prefix)
کتاب + داری   →  librarianship  (داری = "keeping" suffix)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strategy 2 — Izafat:&lt;/strong&gt; A grammatical linking marker appears in the text, written or implied:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;کتابِ حسنہ    (the good book)  — Zer mark (◌ِ) on first word
روحِ رواں     (driving spirit) — Zer mark (◌ِ) again; Hamza-above (◌ٔ) replaces it after a final ہ, as in جذبۂ دل
علم و عمل     (knowledge and practice) — Vav-e-atf (و) connector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strategy 3 — Lexical:&lt;/strong&gt; Neither word is morphologically special. You simply have to &lt;em&gt;know&lt;/em&gt; these pairs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;محنت مشقت     (hard work — synonym compound)
رنگ برنگے     (colorful — echo compound)
صبر شکر       (patient gratitude — near-synonym pair)
انسائیکلوپیڈیا آف اسلام  (3-word fixed title)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strategy 4 — Chains:&lt;/strong&gt; Three or more words where each link is independently valid:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;امورِ خانہ داری  (household affairs — 3 words)
↑       ↑   ↑
izafat  affix  suffix

Decomposition:
امورِ + خانہ  →  izafat compound
خانہ + داری  →  affix compound
Merged:  امورِ خانہ داری  →  one 3-word compound
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No statistical model trained on general text reliably covers all four strategies. They operate at different linguistic levels and require different detection mechanisms.&lt;/p&gt;




&lt;h2&gt;
  
  
  The approach: three deterministic layers
&lt;/h2&gt;

&lt;p&gt;Every other Urdu compound detection library (where one even exists) treats this as a &lt;strong&gt;machine learning problem&lt;/strong&gt;. They feed training data into statistical models and hope the probabilities align.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Results change unpredictably between corpus versions&lt;/li&gt;
&lt;li&gt;You cannot explain &lt;em&gt;why&lt;/em&gt; a pair was or wasn't detected&lt;/li&gt;
&lt;li&gt;Edge cases (literary izafat, 3-word expressions, echo words) fail silently&lt;/li&gt;
&lt;li&gt;No deterministic guarantee across identical inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;urdu-tools takes the opposite approach.&lt;/strong&gt; Every detection is grounded in one of three verifiable, explainable rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw text
   │
   ├─► Layer 1 — Affix (UAWL)
   │       100+ known Urdu prefix/suffix morphemes
   │       خانہ  گاہ  پرست  بے  نا  خوش  شب  غم  …
   │
   ├─► Layer 2 — Izafat
   │       zer mark (◌ِ) · hamza-above (◌ٔ) · vav-e-atf (و)
   │       کتابِ حسنہ · روحِ رواں · علم و عمل
   │
   └─► Layer 3 — Lexicon
           3,262 root entries · N-word tails · greedy longest-match
           محنت مشقت · رنگ برنگے · انسائیکلوپیڈیا آف اسلام
               │
               └─► Span chaining
                       امورِ خانہ  +  خانہ داری  →  امورِ خانہ داری
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The same input always produces the same output, always with a reason.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To my knowledge, this is the first open-source implementation of deterministic, multi-layer, N-gram Urdu compound detection in any programming language.&lt;/p&gt;
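&lt;p&gt;The three layers can be pictured with a toy detector. This is a deliberately small sketch, not the library's implementation: the tables hold a handful of entries, only adjacent pairs are checked, vav-e-atf and span chaining are left out, and the order in which the layers are tried is an assumption. All names here (&lt;code&gt;classifyPair&lt;/code&gt;, &lt;code&gt;detectPairs&lt;/code&gt;) are hypothetical.&lt;/p&gt;

```typescript
type CompoundType = 'affix' | 'izafat' | 'lexicon';
interface Span { text: string; type: CompoundType; start: number; end: number }

// Tiny illustrative tables; real lists would be far larger.
const PREFIXES = new Set(['بے', 'نا', 'خوش']);
const SUFFIXES = new Set(['خانہ', 'گاہ', 'پرست']);
const LEXICON = new Set(['محنت مشقت', 'رنگ برنگے']);
const ZER = '\u0650';
const HAMZA_ABOVE = '\u0654';

// Each rule is a plain lookup, so the same pair always gets the same answer.
function classifyPair(first: string, second: string): CompoundType | null {
  if (PREFIXES.has(first) || SUFFIXES.has(second)) return 'affix';
  if (first.endsWith(ZER) || first.endsWith(HAMZA_ABOVE)) return 'izafat';
  if (LEXICON.has(`${first} ${second}`)) return 'lexicon';
  return null;
}

function detectPairs(text: string): Span[] {
  const tokens = text.trim().split(/\s+/);
  const spans: Span[] = [];
  let i = 0;
  while (tokens.length - 1 > i) {
    const type = classifyPair(tokens[i], tokens[i + 1]);
    if (type === null) {
      i += 1;
      continue;
    }
    spans.push({ text: `${tokens[i]} ${tokens[i + 1]}`, type, start: i, end: i + 1 });
    i += 2; // skip past the matched pair
  }
  return spans;
}

console.log(detectPairs('کتاب خانہ بہت اچھا ہے'));
console.log(detectPairs('محنت مشقت کے بغیر'));
```

&lt;p&gt;Because every match comes from a table hit or a visible marker, the &lt;code&gt;type&lt;/code&gt; field doubles as the explanation for why the pair was detected.&lt;/p&gt;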




&lt;h2&gt;
  
  
  Introducing urdu-tools
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/iamahsanmehmood/urdu-tools" rel="noopener noreferrer"&gt;github.com/iamahsanmehmood/urdu-tools&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A production-quality, zero-dependency Urdu text processing library. Available for TypeScript/JavaScript and C#/.NET, with identical APIs in both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @iamahsanmehmood/urdu-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dotnet add package UrduTools.Core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;392 tests passing. 85 C# tests. 90%+ coverage enforced in CI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The compound detection API
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;joinCompounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;splitCompounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;isCompound&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools/compound&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Detecting compounds
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Layer 1: Affix — خانہ is a known place-suffix&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب خانہ بہت اچھا ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{&lt;/span&gt;
&lt;span class="c1"&gt;//     text: 'کتاب خانہ',&lt;/span&gt;
&lt;span class="c1"&gt;//     type: 'affix',&lt;/span&gt;
&lt;span class="c1"&gt;//     components: ['کتاب', 'خانہ'],&lt;/span&gt;
&lt;span class="c1"&gt;//     start: 0,&lt;/span&gt;
&lt;span class="c1"&gt;//     end: 1&lt;/span&gt;
&lt;span class="c1"&gt;//   }]&lt;/span&gt;

&lt;span class="c1"&gt;// Layer 1: Affix — بے is a known privative prefix&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;بے عزت آدمی نہیں چاہیے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'بے عزت', type: 'affix', components: ['بے', 'عزت'], ... }]&lt;/span&gt;

&lt;span class="c1"&gt;// Layer 2: Izafat — standalone و (vav-e-atf) between content words&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم و عمل ضروری ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'علم و عمل', type: 'izafat', components: ['علم', 'و', 'عمل'], ... }]&lt;/span&gt;

&lt;span class="c1"&gt;// Layer 3: Lexicon — echo compound, neither word is an affix&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;رنگ برنگے پھول کھلے ہیں&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'رنگ برنگے', type: 'lexicon', components: ['رنگ', 'برنگے'], ... }]&lt;/span&gt;

&lt;span class="c1"&gt;// Lexicon: synonym compound&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;محنت مشقت کے بغیر کامیابی نہیں&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'محنت مشقت', type: 'lexicon', ... }]&lt;/span&gt;

&lt;span class="c1"&gt;// 3-word chain: izafat (zer on امورِ) + affix (داری suffix on خانہ)&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;امورِ خانہ داری چلانا مشکل ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'امورِ خانہ داری', type: 'affix', components: ['امورِ', 'خانہ', 'داری'], ... }]&lt;/span&gt;

&lt;span class="c1"&gt;// 3-word lexicon entry: greedy longest-match wins over any 2-word overlap&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;انسائیکلوپیڈیا آف اسلام کا حوالہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [{ text: 'انسائیکلوپیڈیا آف اسلام', type: 'lexicon', ... }]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The pipeline: join before tokenize
&lt;/h3&gt;

&lt;p&gt;The critical downstream use case — bind compounds &lt;em&gt;before&lt;/em&gt; tokenizing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;joinCompounds&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools/compound&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokenize&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب خانہ میں علم و عمل کی کتابیں ہیں&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// Without compound joining — naive tokenizer splits everything&lt;/span&gt;
&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → ['کتاب', 'خانہ', 'میں', 'علم', 'و', 'عمل', 'کی', 'کتابیں', 'ہیں']&lt;/span&gt;
&lt;span class="c1"&gt;//    ↑ split!                 ↑ split!&lt;/span&gt;

&lt;span class="c1"&gt;// With compound joining — semantic integrity preserved&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;joined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;joinCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → 'کتاب‌خانہ میں علم‌و‌عمل کی کتابیں ہیں'&lt;/span&gt;
&lt;span class="c1"&gt;//          ↑ ZWNJ (invisible, prevents tokenizer split)&lt;/span&gt;

&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;joined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → ['کتاب‌خانہ', 'میں', 'علم‌و‌عمل', 'کی', 'کتابیں', 'ہیں']&lt;/span&gt;
&lt;span class="c1"&gt;//    ↑ one token            ↑ one token  ✓&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ZWNJ (Zero Width Non-Joiner, U+200C) is invisible but meaningful — the tokenizer sees it and keeps the word intact.&lt;/p&gt;
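&lt;p&gt;Why the binder works with an ordinary whitespace tokenizer: in JavaScript regular expressions, &lt;code&gt;\s&lt;/code&gt; does not match U+200C, so a split on whitespace leaves the bound compound in one piece. The sketch below uses made-up helpers (&lt;code&gt;bind&lt;/code&gt;, &lt;code&gt;unbind&lt;/code&gt;, &lt;code&gt;whitespaceTokenize&lt;/code&gt;) and a hard-coded compound; it only illustrates the mechanism, not the library's &lt;code&gt;joinCompounds&lt;/code&gt; or &lt;code&gt;tokenize&lt;/code&gt;.&lt;/p&gt;

```typescript
const ZWNJ = '\u200C';

// Hypothetical helpers for illustration only.
const bind = (compound: string): string => compound.split(' ').join(ZWNJ);
const unbind = (bound: string): string => bound.split(ZWNJ).join(' ');
const whitespaceTokenize = (input: string): string[] => input.trim().split(/\s+/);

const sentence = 'اس نے کتاب خانہ بنایا';
const joined = sentence.replace('کتاب خانہ', bind('کتاب خانہ'));

console.log(whitespaceTokenize(sentence).length); // 5
console.log(whitespaceTokenize(joined).length); // 4
console.log(unbind(joined) === sentence); // true
```

&lt;p&gt;Note that this naive &lt;code&gt;unbind&lt;/code&gt; would also turn any ZWNJ that was already in the text into a space, so a real round trip needs to track which binders it inserted.&lt;/p&gt;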

&lt;h3&gt;
  
  
  Pair-level check
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;isCompound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;خانہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// → { matched: true,  type: 'affix'   }&lt;/span&gt;
&lt;span class="nf"&gt;isCompound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;محنت&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;مشقت&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// → { matched: true,  type: 'lexicon' }&lt;/span&gt;
&lt;span class="nf"&gt;isCompound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;اخلاقِ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;حسنہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → { matched: true,  type: 'izafat' }&lt;/span&gt;
&lt;span class="nf"&gt;isCompound&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;اچھا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;آدمی&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// → { matched: false, type: null      }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fine-grained control
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Use only specific layers&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;affix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;izafat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;lexicon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;detectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;affix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;izafat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;lexicon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Choose the binder character for joinCompounds&lt;/span&gt;
&lt;span class="nf"&gt;joinCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                      &lt;span class="c1"&gt;// ZWNJ U+200C (default, invisible)&lt;/span&gt;
&lt;span class="nf"&gt;joinCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;binder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nbsp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;  &lt;span class="c1"&gt;// Non-breaking space (visible)&lt;/span&gt;
&lt;span class="nf"&gt;joinCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;binder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wj&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;   &lt;span class="c1"&gt;// Word Joiner U+2060 (never line-breaks)&lt;/span&gt;

&lt;span class="c1"&gt;// Inverse — split back to spaces&lt;/span&gt;
&lt;span class="nf"&gt;splitCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب‌خانہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'کتاب خانہ'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The normalization pipeline
&lt;/h2&gt;

&lt;p&gt;A 12-layer normalization pipeline — the foundation that every other module builds on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fingerprint&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 — NFC&lt;/td&gt;
&lt;td&gt;Unicode canonical form&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 — NBSP&lt;/td&gt;
&lt;td&gt;Non-breaking space → regular space&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 — Alif Madda&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ا&lt;/code&gt; + &lt;code&gt;◌ٓ&lt;/code&gt; (U+0627 U+0653) → &lt;code&gt;آ&lt;/code&gt; (U+0622, precomposed)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4 — Numerals&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;٠–٩&lt;/code&gt; and &lt;code&gt;۰–۹&lt;/code&gt; → ASCII &lt;code&gt;0–9&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5 — Zero-width&lt;/td&gt;
&lt;td&gt;Strip ZWNJ, ZWJ, soft hyphen&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6 — Diacritics&lt;/td&gt;
&lt;td&gt;Strip zabar, zer, pesh, shadda, sukun, tanwin&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7 — Honorifics&lt;/td&gt;
&lt;td&gt;Strip Islamic honorific signs (ؐ ؑ ؒ ؓ ؔ)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8 — Hamza&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;أ&lt;/code&gt; → &lt;code&gt;ا&lt;/code&gt;, &lt;code&gt;ؤ&lt;/code&gt; → &lt;code&gt;و&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9 — Kashida&lt;/td&gt;
&lt;td&gt;Strip tatweel U+0640&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10 — Presentation forms&lt;/td&gt;
&lt;td&gt;Map Arabic presentation forms (U+FB50–FDFF, U+FE70–FEFF) to base chars&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11 — Punctuation trim&lt;/td&gt;
&lt;td&gt;Strip leading/trailing non-letter chars&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12 — Char normalize&lt;/td&gt;
&lt;td&gt;Arabic look-alikes → correct Urdu codepoints&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عِلمٌ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;// 'علم'  (layers 1–6: diacritics stripped)&lt;/span&gt;
&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;\u0627\u0653&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// 'آ'    (layer 3: Alif U+0627 + Madda U+0653 → precomposed U+0622)&lt;/span&gt;
&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم‌ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;// 'علمہے' (layer 5: ZWNJ stripped)&lt;/span&gt;
&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;نبیؐ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;// 'نبی'  (layer 7: honorific stripped)&lt;/span&gt;

&lt;span class="c1"&gt;// Full normalization for search indexing&lt;/span&gt;
&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;kashida&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;presentationForms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;punctuationTrim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;normalizeCharacters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// ي → ی, ك → ک, ه → ہ&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
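&lt;p&gt;For readers curious what a few of the default layers amount to, here is a rough approximation built only on standard JavaScript string methods. It is a sketch under assumptions, not the library's code: the function name &lt;code&gt;normalizeSketch&lt;/code&gt; is made up, and only layers 1 to 6 from the table are covered.&lt;/p&gt;

```typescript
// Hypothetical re-creation of a few default layers; not the library's code.
function normalizeSketch(input: string): string {
  return input
    .normalize('NFC') // layers 1 and 3: compose, e.g. U+0627 U+0653 becomes U+0622
    .replace(/\u00A0/g, ' ') // layer 2: NBSP to regular space
    .replace(/[\u0660-\u0669]/g, (d) => String(d.charCodeAt(0) - 0x0660)) // layer 4: Arabic-Indic digits
    .replace(/[\u06F0-\u06F9]/g, (d) => String(d.charCodeAt(0) - 0x06f0)) // layer 4: Extended Arabic-Indic digits
    .replace(/[\u200C\u200D\u00AD]/g, '') // layer 5: ZWNJ, ZWJ, soft hyphen
    .replace(/[\u064B-\u0652]/g, ''); // layer 6: tanwin, zabar, pesh, zer, shadda, sukun
}

console.log(normalizeSketch('ع\u0650لم\u064C') === 'علم'); // true
console.log(normalizeSketch('\u0627\u0653') === '\u0622'); // true
console.log(normalizeSketch('\u06F1\u06F2\u06F3')); // 123
```

&lt;p&gt;Chaining the steps in a fixed order keeps the result deterministic: two inputs that differ only in the stripped characters end up as the same string.&lt;/p&gt;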



&lt;h3&gt;
  
  
  The fingerprint function
&lt;/h3&gt;

&lt;p&gt;For client-side word comparison without database round-trips:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عِلمٌ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عَلم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// true (both normalize to 'علم')&lt;/span&gt;
&lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;نبیؐ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;نبی&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// true (honorific stripped)&lt;/span&gt;
&lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم‌&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nf"&gt;fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// true (ZWNJ stripped)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use this in HamaariUrdu to compare user input against stored words in a 110,000+ word dictionary without needing a round-trip to the database for every keystroke.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Arabic–Urdu confusion problem
&lt;/h2&gt;

&lt;p&gt;This is the &lt;strong&gt;single most common source of silent failures&lt;/strong&gt; in Urdu software, and no existing library addressed it.&lt;/p&gt;

&lt;p&gt;Three character pairs are &lt;strong&gt;visually identical&lt;/strong&gt; in many fonts and positions (ي and ی, for example, differ only in their isolated and final forms) but are different Unicode code points:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Visual&lt;/th&gt;
&lt;th&gt;Arabic codepoint&lt;/th&gt;
&lt;th&gt;Urdu codepoint&lt;/th&gt;
&lt;th&gt;Common source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ی&lt;/td&gt;
&lt;td&gt;ي U+064A&lt;/td&gt;
&lt;td&gt;ی U+06CC&lt;/td&gt;
&lt;td&gt;Arabic-layout keyboards, Arabic websites&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ک&lt;/td&gt;
&lt;td&gt;ك U+0643&lt;/td&gt;
&lt;td&gt;ک U+06A9&lt;/td&gt;
&lt;td&gt;Arabic-layout keyboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ہ&lt;/td&gt;
&lt;td&gt;ه U+0647&lt;/td&gt;
&lt;td&gt;ہ U+06C1&lt;/td&gt;
&lt;td&gt;Arabic text pasted into Urdu context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A user searching for &lt;code&gt;ہے&lt;/code&gt; typed with Arabic &lt;code&gt;ه&lt;/code&gt; finds &lt;strong&gt;zero results&lt;/strong&gt; in a database that stored it with Urdu &lt;code&gt;ہ&lt;/code&gt;. Both look identical on screen. No error. No warning. Zero results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;normalizeCharacters&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;normalizeCharacters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ي&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'ی'  (U+064A → U+06CC)&lt;/span&gt;
&lt;span class="nf"&gt;normalizeCharacters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ك&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'ک'  (U+0643 → U+06A9)&lt;/span&gt;
&lt;span class="nf"&gt;normalizeCharacters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ه&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'ہ'  (U+0647 → U+06C1)&lt;/span&gt;

&lt;span class="c1"&gt;// Apply before storage or search indexing:&lt;/span&gt;
&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;normalizeCharacters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
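&lt;p&gt;To make the mechanics concrete, here is a minimal sketch of what a lookalike mapping of this kind might do internally. It is not the library's implementation: &lt;code&gt;ARABIC_TO_URDU&lt;/code&gt; and &lt;code&gt;replaceArabicLookalikes&lt;/code&gt; are hypothetical names, and the table covers only the three pairs listed above.&lt;/p&gt;

```typescript
// Hypothetical lookup table: only the three pairs from the table above.
const ARABIC_TO_URDU: { [key: string]: string } = {
  '\u064A': '\u06CC', // ي → ی
  '\u0643': '\u06A9', // ك → ک
  '\u0647': '\u06C1', // ه → ہ
}

// Replace each Arabic lookalike with its Urdu code point; leave everything else untouched.
function replaceArabicLookalikes(text: string): string {
  let out = ''
  for (const ch of text.split('')) {
    const mapped = ARABIC_TO_URDU[ch]
    out += mapped === undefined ? ch : mapped
  }
  return out
}

const typed = '\u0647\u06D2'  // 'ہے' typed with Arabic heh (U+0647)
const stored = '\u06C1\u06D2' // 'ہے' stored with Urdu heh (U+06C1)

console.log(typed === stored)                          // false
console.log(replaceArabicLookalikes(typed) === stored) // true
```

&lt;p&gt;Running the same mapping over both the stored text and the query is what makes the comparison safe; normalizing only one side just moves the mismatch.&lt;/p&gt;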






&lt;h2&gt;
  
  
  Progressive search matching
&lt;/h2&gt;

&lt;p&gt;The search module tries 9 progressively more aggressive normalization layers until it finds a match. If no layer matches, it returns a result with &lt;code&gt;matched: false&lt;/code&gt; and full diagnostic info.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fuzzyMatch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getAllNormalizations&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عِلمٌ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → { matched: true, layer: 'strip-diacritics', normalizedQuery: 'علم', normalizedTarget: 'علم' }&lt;/span&gt;

&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;نبیؐ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;نبی&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → { matched: true, layer: 'strip-honorifics', ... }&lt;/span&gt;

&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;أحمد&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;احمد&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → { matched: true, layer: 'normalize-hamza', ... }&lt;/span&gt;

&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → { matched: false, layer: null, ... }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
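&lt;p&gt;The layering idea can be sketched in a few lines. This is an illustration under assumptions, not the library's code: it has three layers instead of nine, the layer functions and &lt;code&gt;layeredMatch&lt;/code&gt; are hypothetical, and the code point ranges (U+064B to U+0652 for diacritics, U+0610 to U+0613 for honorific signs) are my own choice for the demo.&lt;/p&gt;

```typescript
// Hypothetical layers: the real library has nine, with its own names and order.
function identity(s: string): string {
  return s
}
function stripDiacritics(s: string): string {
  return s.replace(/[\u064B-\u0652]/g, '') // fathatan through sukun
}
function stripHonorifics(s: string): string {
  return s.replace(/[\u0610-\u0613]/g, '') // Arabic honorific signs
}

const LAYERS = [
  { name: 'exact', apply: identity },
  { name: 'strip-diacritics', apply: stripDiacritics },
  { name: 'strip-honorifics', apply: stripHonorifics },
]

// Apply layers cumulatively to both sides and stop at the first layer where they agree.
function layeredMatch(query: string, target: string) {
  let q = query
  let t = target
  for (const layer of LAYERS) {
    q = layer.apply(q)
    t = layer.apply(t)
    if (q === t) {
      return { matched: true, layer: layer.name, normalizedQuery: q, normalizedTarget: t }
    }
  }
  return { matched: false, layer: null, normalizedQuery: q, normalizedTarget: t }
}

// 'عِلمٌ' against 'علم'
console.log(layeredMatch('\u0639\u0650\u0644\u0645\u064C', '\u0639\u0644\u0645').layer) // 'strip-diacritics'
// 'نبیؐ' against 'نبی'
console.log(layeredMatch('\u0646\u0628\u06CC\u0610', '\u0646\u0628\u06CC').layer)       // 'strip-honorifics'
```

&lt;p&gt;Reporting the layer that produced the match is the useful part of the design: it tells the caller how much normalization was needed before two strings agreed.&lt;/p&gt;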



&lt;p&gt;For database lookups, &lt;code&gt;getAllNormalizations()&lt;/code&gt; returns every normalized form to try:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;forms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getAllNormalizations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عِلمٌ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → ['عِلمٌ', 'عِلم', 'علم', ...]  (from most specific to most aggressive)&lt;/span&gt;

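&lt;span class="c1"&gt;// Wrapped in a function so the early return is valid; db stands in for your own data layer&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;lookup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;forms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;form&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;forms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;form&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;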
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fuzzy matching scores each candidate with a hybrid of Levenshtein distance and longest common subsequence (LCS), using a threshold of 0.5:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;fuzzyMatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتابیں&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;// → { candidate: 'کتابیں', score: ~0.7 }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Numbers — South Asian scale with bigint
&lt;/h2&gt;

&lt;p&gt;The South Asian number system has named units with no counterpart in Western number naming:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Urdu&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ہزار&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;لاکھ&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;کروڑ&lt;/td&gt;
&lt;td&gt;10,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ارب&lt;/td&gt;
&lt;td&gt;1,000,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;کھرب&lt;/td&gt;
&lt;td&gt;1,000,000,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;نیل&lt;/td&gt;
&lt;td&gt;1,000,000,000,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

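&lt;p&gt;The integer conversions (&lt;code&gt;numberToWords&lt;/code&gt;, &lt;code&gt;wordsToNumber&lt;/code&gt;) use &lt;code&gt;bigint&lt;/code&gt; throughout, because values at the top of this scale can exceed &lt;code&gt;Number.MAX_SAFE_INTEGER&lt;/code&gt; (9,007,199,254,740,991).&lt;br&gt;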
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toUrduNumerals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;wordsToNumber&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                      &lt;span class="c1"&gt;// 'صفر'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;// 'ایک سو'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="nx"&gt;_000n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                &lt;span class="c1"&gt;// 'ایک لاکھ'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="nx"&gt;_000_000n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;// 'ایک کروڑ'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000_000_000_000n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// 'ایک نیل'&lt;/span&gt;

&lt;span class="c1"&gt;// Ordinals with gender agreement&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;masculine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;  &lt;span class="c1"&gt;// 'پہلا'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;feminine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;   &lt;span class="c1"&gt;// 'پہلی'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;masculine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="c1"&gt;// 'گیارہواں'&lt;/span&gt;
&lt;span class="nf"&gt;numberToWords&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;feminine&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;  &lt;span class="c1"&gt;// 'گیارہویں'&lt;/span&gt;

&lt;span class="c1"&gt;// Currency&lt;/span&gt;
&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;505.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PKR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// 'پانچ سو پانچ روپے پچاس پیسے'&lt;/span&gt;
&lt;span class="nf"&gt;formatCurrency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// 'ایک ہزار روپے'&lt;/span&gt;

&lt;span class="c1"&gt;// Numeral conversion&lt;/span&gt;
&lt;span class="nf"&gt;toUrduNumerals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// '۲۰۲۴'&lt;/span&gt;

&lt;span class="c1"&gt;// Inverse — parse words back to number&lt;/span&gt;
&lt;span class="nf"&gt;wordsToNumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ایک کروڑ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// 10_000_000n&lt;/span&gt;
&lt;span class="nf"&gt;wordsToNumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پانچ سو پانچ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// 505n&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
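&lt;p&gt;Numeral conversion is the simplest piece to reason about. The sketch below assumes that Urdu digits are the Extended Arabic-Indic block (U+06F0 to U+06F9) and maps each ASCII digit by offset; &lt;code&gt;sketchToUrduNumerals&lt;/code&gt; is a hypothetical stand-in, not the library function.&lt;/p&gt;

```typescript
// Assumption: Urdu digits are the Extended Arabic-Indic block, U+06F0 (۰) to U+06F9 (۹).
const URDU_ZERO = 0x06f0

// Replace every ASCII digit with the Urdu digit at the same offset; other characters pass through.
function sketchToUrduNumerals(value: number | string): string {
  return String(value).replace(/[0-9]/g, function (digit: string): string {
    return String.fromCharCode(URDU_ZERO + Number(digit))
  })
}

console.log(sketchToUrduNumerals(2024)) // '۲۰۲۴'
```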






&lt;h2&gt;
  
  
  Canonical Urdu sorting
&lt;/h2&gt;

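&lt;p&gt;Urdu collation is not something you can rely on out of the box: support in databases and JavaScript runtimes depends on the locale data they ship with. The library sorts by the 39-letter Urdu alphabet order:&lt;br&gt;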
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ء ا ب پ ت ٹ ث ج چ ح خ د ڈ ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں و ہ ھ ی ے
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sortKey&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ک&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;           &lt;span class="c1"&gt;// → ['ا', 'ب', 'ک', 'ے']&lt;/span&gt;
&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;زبان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;اردو&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;بہترین&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;      &lt;span class="c1"&gt;// → ['اردو', 'بہترین', 'زبان']&lt;/span&gt;

&lt;span class="c1"&gt;// Use compare() as a comparator for any sorting context&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ک&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;compare&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// → ['ا', 'ک', 'ے']&lt;/span&gt;

&lt;span class="c1"&gt;// sortKey() for indexing — diacritics stripped before key generation&lt;/span&gt;
&lt;span class="nf"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// '030003091102280814'&lt;/span&gt;
&lt;span class="c1"&gt;// عِلم and عَلم sort to the same position&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
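&lt;p&gt;One way to build such a sort key, sketched under assumptions: strip diacritics, then replace each letter with its two-digit position in the alphabet above. &lt;code&gt;sketchSortKey&lt;/code&gt; and &lt;code&gt;sketchCompare&lt;/code&gt; are hypothetical, and the keys this produces are not the same format as the library's &lt;code&gt;sortKey()&lt;/code&gt; output.&lt;/p&gt;

```typescript
// The 39-letter order from above, written as escapes so the code points are unambiguous.
const URDU_ALPHABET = (
  '\u0621\u0627\u0628\u067E\u062A\u0679\u062B\u062C\u0686\u062D\u062E\u062F\u0688' + // ء ا ب پ ت ٹ ث ج چ ح خ د ڈ
  '\u0630\u0631\u0691\u0632\u0698\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063A' + // ذ ر ڑ ز ژ س ش ص ض ط ظ ع غ
  '\u0641\u0642\u06A9\u06AF\u0644\u0645\u0646\u06BA\u0648\u06C1\u06BE\u06CC\u06D2'   // ف ق ک گ ل م ن ں و ہ ھ ی ے
).split('')

// Two digits per letter: the zero-based alphabet position, or 99 for anything unknown.
function sketchSortKey(word: string): string {
  let key = ''
  for (const ch of word.replace(/[\u064B-\u0652]/g, '').split('')) {
    const index = URDU_ALPHABET.indexOf(ch)
    const rank = index === -1 ? 99 : index
    key += (rank + 100).toString().slice(1) // zero-padded to two digits
  }
  return key
}

// Keys are plain digit strings, so ordinary string comparison gives alphabet order.
function sketchCompare(a: string, b: string): number {
  return sketchSortKey(a).localeCompare(sketchSortKey(b))
}

const letters = ['\u06D2', '\u0627', '\u06A9', '\u0628'] // ے ا ک ب
console.log(letters.slice().sort(sketchCompare))          // ا ب ک ے
```

&lt;p&gt;Because the key is computed once per word, it can be stored in an indexed column and sorted by the database with ordinary byte-wise string comparison.&lt;/p&gt;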



&lt;p&gt;In C#, &lt;code&gt;UrduComparer&lt;/code&gt; implements &lt;code&gt;IComparer&amp;lt;string&amp;gt;&lt;/code&gt; for native LINQ integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Sorting&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"ے"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ا"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ک"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ب"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UrduComparer&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// ["ا", "ب", "ک", "ے"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Unicode-aware tokenization
&lt;/h2&gt;

&lt;p&gt;The tokenizer handles the edge cases that matter in real Urdu text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ngrams&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان ایک خوبصورت ملک ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → [&lt;/span&gt;
&lt;span class="c1"&gt;//   { text: 'پاکستان', type: 'urdu-word' },&lt;/span&gt;
&lt;span class="c1"&gt;//   { text: 'ایک',     type: 'urdu-word' },&lt;/span&gt;
&lt;span class="c1"&gt;//   { text: 'خوبصورت', type: 'urdu-word' },&lt;/span&gt;
&lt;span class="c1"&gt;//   { text: 'ملک',     type: 'urdu-word' },&lt;/span&gt;
&lt;span class="c1"&gt;//   { text: 'ہے',      type: 'urdu-word' },&lt;/span&gt;
&lt;span class="c1"&gt;// ]&lt;/span&gt;

&lt;span class="c1"&gt;// Sentence splitting — on ۔ (U+06D4) ؟ ! but NOT on ، or ؛&lt;/span&gt;
&lt;span class="nf"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پہلا جملہ۔ دوسرا جملہ؟ تیسرا جملہ!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → ['پہلا جملہ', 'دوسرا جملہ', 'تیسرا جملہ']&lt;/span&gt;

&lt;span class="c1"&gt;// The tokenizer preserves ZWNJ within words —&lt;/span&gt;
&lt;span class="c1"&gt;// so joinCompounds() output is one token per compound&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
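&lt;p&gt;The sentence-splitting rule in the comment above can be approximated with a single character class. This is a rough sketch, not the library's splitter: &lt;code&gt;sketchSentences&lt;/code&gt; is a hypothetical name, and it ignores cases a real splitter has to consider, such as terminators inside quotations.&lt;/p&gt;

```typescript
// Split on ۔ (U+06D4), ؟ (U+061F) and !, but not on ، (U+060C) or ؛ (U+061B).
function sketchSentences(text: string): string[] {
  const parts: string[] = []
  for (const piece of text.split(/[\u06D4\u061F!]/)) {
    const trimmed = piece.trim()
    if (trimmed !== '') parts.push(trimmed) // drop the empty tail after the last terminator
  }
  return parts
}

// The comma (،) stays inside its sentence; the full stop (۔) ends it.
console.log(sketchSentences('one\u060C two\u06D4 three\u061F four!'))
// [ 'one، two', 'three', 'four' ]
```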



&lt;p&gt;Key edge cases handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Izafat Kasra (U+0650) at word boundaries is not treated as a split point&lt;/li&gt;
&lt;li&gt;ZWNJ-bound compounds (output of &lt;code&gt;joinCompounds()&lt;/code&gt;) are kept as single tokens&lt;/li&gt;
&lt;li&gt;Mixed Urdu/Latin text is classified per-token&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Transliteration — 18 aspirated digraphs
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;toRoman&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fromRoman&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;toRoman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// 'pakistan'&lt;/span&gt;
&lt;span class="nf"&gt;toRoman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;بھارت&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// 'bharat'&lt;/span&gt;
&lt;span class="nf"&gt;toRoman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;چھوٹا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// 'chhota'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ten of the digraph rules (the transliterator is a left-to-right finite-state machine that tries digraphs before single letters):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Urdu&lt;/th&gt;
&lt;th&gt;Roman&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Urdu&lt;/th&gt;
&lt;th&gt;Roman&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;بھ&lt;/td&gt;
&lt;td&gt;bh&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;پھ&lt;/td&gt;
&lt;td&gt;ph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;تھ&lt;/td&gt;
&lt;td&gt;th&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;ٹھ&lt;/td&gt;
&lt;td&gt;Th&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;جھ&lt;/td&gt;
&lt;td&gt;jh&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;چھ&lt;/td&gt;
&lt;td&gt;chh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;دھ&lt;/td&gt;
&lt;td&gt;dh&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;ڈھ&lt;/td&gt;
&lt;td&gt;Dh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;کھ&lt;/td&gt;
&lt;td&gt;kh&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;گھ&lt;/td&gt;
&lt;td&gt;gh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;fromRoman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pakistan&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'پاکستان' (trie-based longest-prefix match)&lt;/span&gt;
&lt;span class="nf"&gt;fromRoman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bharat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// → 'بھارت'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
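&lt;p&gt;Digraph priority is easy to show in isolation. The sketch below scans left to right and, whenever a consonant is followed by ھ (U+06BE), emits the aspirated form from the table above instead of two separate letters. Everything here is hypothetical: &lt;code&gt;sketchToRoman&lt;/code&gt; is not the library function, and the single-letter table is a tiny assumed subset that makes no attempt to restore unwritten short vowels.&lt;/p&gt;

```typescript
const ASPIRATE = '\u06BE' // ھ (do-chashmi he)

// Base consonant → aspirated digraph, taken from the table above.
const DIGRAPHS: { [key: string]: string } = {
  '\u0628': 'bh',  // بھ
  '\u067E': 'ph',  // پھ
  '\u062A': 'th',  // تھ
  '\u0679': 'Th',  // ٹھ
  '\u062C': 'jh',  // جھ
  '\u0686': 'chh', // چھ
  '\u062F': 'dh',  // دھ
  '\u0688': 'Dh',  // ڈھ
  '\u06A9': 'kh',  // کھ
  '\u06AF': 'gh',  // گھ
}

// A deliberately tiny single-letter table (assumed values, just enough for the demo).
const SINGLES: { [key: string]: string } = {
  '\u0627': 'a', // ا
  '\u0628': 'b', // ب
  '\u062A': 't', // ت
  '\u0631': 'r', // ر
  '\u06A9': 'k', // ک
}

function sketchToRoman(text: string): string {
  const chars = text.split('')
  let out = ''
  let i = 0
  while (i !== chars.length) {
    const current = chars[i]
    const digraph = DIGRAPHS[current]
    if (digraph !== undefined) {
      if (chars[i + 1] === ASPIRATE) {
        out += digraph // digraph wins over the single-letter rule
        i += 2
        continue
      }
    }
    const single = SINGLES[current]
    out += single === undefined ? current : single
    i += 1
  }
  return out
}

console.log(sketchToRoman('\u06A9\u06BE'))                   // 'kh' for کھ
console.log(sketchToRoman('\u0628\u06BE\u0627\u0631\u062A')) // 'bhart' for بھارت
```

&lt;p&gt;The second result is 'bhart' rather than 'bharat' because Urdu normally leaves short vowels unwritten, so a pure letter table cannot recover them.&lt;/p&gt;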






&lt;h2&gt;
  
  
  InPage encoding — decoding 30 years of Urdu archives
&lt;/h2&gt;

&lt;p&gt;InPage was the dominant Urdu desktop publishing tool for decades. Millions of documents — newspapers, books, government archives — exist only in InPage format. The library decodes all three versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;decodeInpage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;detectEncoding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// Auto-detect InPage version and decode&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decodeInpage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;auto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// result.paragraphs → string[]  (Unicode Urdu text)&lt;/span&gt;
&lt;span class="c1"&gt;// result.version   → 'v1' | 'v2' | 'v3'&lt;/span&gt;

&lt;span class="c1"&gt;// Explicit version&lt;/span&gt;
&lt;span class="nf"&gt;decodeInpage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// 0x04-prefix byte-pair encoding (old InPage)&lt;/span&gt;
&lt;span class="nf"&gt;decodeInpage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// UTF-16LE with paragraph markers&lt;/span&gt;

&lt;span class="c1"&gt;// Detect encoding from buffer alone&lt;/span&gt;
&lt;span class="nf"&gt;detectEncoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// → 'utf-8' | 'utf-16le' | 'windows-1256' | 'inpage-v1v2' | 'inpage-v3' | 'unknown'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  String utilities
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;wordCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;charCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="nx"&gt;extractUrdu&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;decodeHtmlEntities&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// Reverse word order (not characters — preserves Arabic shaping)&lt;/span&gt;
&lt;span class="nf"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان ہندوستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;// → 'ہندوستان پاکستان'&lt;/span&gt;

&lt;span class="c1"&gt;// Truncate at word boundary&lt;/span&gt;
&lt;span class="nf"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;یہ ایک بہت لمبا جملہ ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'یہ ایک...'&lt;/span&gt;

&lt;span class="c1"&gt;// Count grapheme clusters (correct for combining diacritics)&lt;/span&gt;
&lt;span class="nf"&gt;charCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;عِلم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// → 3  (ع+ِ = 1 cluster, ل, م)&lt;/span&gt;

&lt;span class="c1"&gt;// Extract Urdu/Arabic segments from mixed text&lt;/span&gt;
&lt;span class="nf"&gt;extractUrdu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The word علم means knowledge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → ['علم']&lt;/span&gt;

&lt;span class="c1"&gt;// Decode HTML entities BEFORE normalize() — critical for TinyMCE/Quill content&lt;/span&gt;
&lt;span class="nf"&gt;decodeHtmlEntities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کتاب&amp;amp;rsquo;خانہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// → 'کتاب’خانہ'&lt;/span&gt;
&lt;span class="nf"&gt;decodeHtmlEntities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&amp;amp;nbsp;ہے&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;// → 'علم ہے'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last one (&lt;code&gt;decodeHtmlEntities&lt;/code&gt;) is the fix for the TinyMCE bug mentioned at the top. Always call it before normalizing text that came from a rich text editor.&lt;/p&gt;
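&lt;p&gt;To see why the order matters, here is a minimal sketch. It is not the library's implementation: &lt;code&gt;decodeEntitiesSketch&lt;/code&gt; and &lt;code&gt;normalizeSketch&lt;/code&gt; are hypothetical stand-ins, the entity table is deliberately tiny, and the ampersand is written as &lt;code&gt;\u0026&lt;/code&gt; only so the snippet can sit inside markup unescaped. If normalization runs first, the entity text stays wedged between the two words and they never split apart.&lt;/p&gt;

```typescript
// Hypothetical stand-ins for illustration; not the urdu-tools implementation.
// "\u0026" is the ampersand character, written as an escape sequence.
const NAMED_ENTITIES_SKETCH = new Map([
  ['nbsp', '\u00A0'],  // no-break space
  ['rsquo', '\u2019'], // right single quotation mark
  ['zwnj', '\u200C'],  // zero-width non-joiner
  ['amp', '\u0026'],
]);

// Decode a numeric reference, leaving out-of-range values untouched.
function fromCodePointSketch(codePoint: number, fallback: string): string {
  if (codePoint > 0x10ffff) return fallback;
  return String.fromCodePoint(codePoint);
}

function decodeEntitiesSketch(input: string): string {
  return input.replace(
    /\u0026(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (whole: string, body: string) => {
      if (body.startsWith('#x')) return fromCodePointSketch(parseInt(body.slice(2), 16), whole);
      if (body.startsWith('#')) return fromCodePointSketch(parseInt(body.slice(1), 10), whole);
      return NAMED_ENTITIES_SKETCH.get(body) ?? whole; // unknown names stay as written
    },
  );
}

// Simplified normalizer: strip Arabic diacritics and ZWNJ, collapse whitespace.
function normalizeSketch(input: string): string {
  return input
    .replace(/[\u064B-\u0652\u0670\u200C]/g, '')
    .replace(/\s+/g, ' ') // \s also matches the no-break space U+00A0
    .trim();
}

const fromEditorSketch = 'علم\u0026nbsp;ہے';
const normalizedFirst = normalizeSketch(fromEditorSketch).split(' ');
const decodedFirst = normalizeSketch(decodeEntitiesSketch(fromEditorSketch)).split(' ');
console.log(normalizedFirst.length); // 1: the entity text still joins the two words
console.log(decodedFirst.length);    // 2: the words separate once the entity is decoded
```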




&lt;h2&gt;
  
  
  Script and character analysis
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;isUrduChar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getScript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;classifyChar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isRTL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getUrduDensity&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@iamahsanmehmood/urdu-tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="nf"&gt;isUrduChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// true  — U+067E is Urdu-specific&lt;/span&gt;
&lt;span class="nf"&gt;isUrduChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ب&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// false — U+0628 is shared with Arabic&lt;/span&gt;
&lt;span class="nf"&gt;isUrduChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;۱&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// true  — U+06F1 Urdu numeral&lt;/span&gt;

&lt;span class="nf"&gt;getScript&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;// 'urdu'&lt;/span&gt;
&lt;span class="nf"&gt;getScript&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;مرحبا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;// 'arabic'&lt;/span&gt;
&lt;span class="nf"&gt;getScript&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello پاکستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// 'mixed'&lt;/span&gt;

&lt;span class="nf"&gt;classifyChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// 'urdu-letter'&lt;/span&gt;
&lt;span class="nf"&gt;classifyChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;َ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// 'diacritic'&lt;/span&gt;
&lt;span class="nf"&gt;classifyChar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;۱&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// 'numeral'&lt;/span&gt;

&lt;span class="nf"&gt;isRTL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="c1"&gt;// true&lt;/span&gt;
&lt;span class="nf"&gt;getUrduDensity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;پاکستان زندہ&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// 0.28&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
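&lt;p&gt;For intuition, the distinction between "Urdu" and "Arabic" in the examples above can be approximated with two character classes. This is a rough sketch under stated assumptions, not the library's code: the list of Urdu-extension code points is hand-picked for illustration, and &lt;code&gt;isUrduExtensionChar&lt;/code&gt; and &lt;code&gt;scriptSketch&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Assumption: code points Urdu uses beyond the basic Arabic alphabet
// (tteh, peh, tcheh, ddal, rreh, jeh, keheh, gaf, noon ghunna,
// heh doachashmee, heh goal, farsi yeh, yeh barree) plus the extended
// digits U+06F0..U+06F9. The table inside urdu-tools may differ.
const URDU_EXTENSION_SKETCH =
  /[\u0679\u067E\u0686\u0688\u0691\u0698\u06A9\u06AF\u06BA\u06BE\u06C1\u06CC\u06D2\u06F0-\u06F9]/;
const ARABIC_BLOCK_SKETCH = /[\u0600-\u06FF]/;
const LATIN_SKETCH = /[A-Za-z]/;

type ScriptSketch = 'urdu' | 'arabic' | 'latin' | 'mixed' | 'unknown';

function isUrduExtensionChar(ch: string): boolean {
  return URDU_EXTENSION_SKETCH.test(ch);
}

function scriptSketch(text: string): ScriptSketch {
  const hasLatin = LATIN_SKETCH.test(text);
  const hasArabicBlock = ARABIC_BLOCK_SKETCH.test(text);
  if (hasLatin) return hasArabicBlock ? 'mixed' : 'latin';
  // One extension letter is enough to call the text Urdu in this sketch.
  if (URDU_EXTENSION_SKETCH.test(text)) return 'urdu';
  return hasArabicBlock ? 'arabic' : 'unknown';
}

console.log(isUrduExtensionChar('پ'));       // true
console.log(isUrduExtensionChar('ب'));       // false
console.log(scriptSketch('پاکستان'));        // 'urdu'
console.log(scriptSketch('مرحبا'));          // 'arabic'
console.log(scriptSketch('Hello پاکستان'));  // 'mixed'
```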






&lt;h2&gt;
  
  
  C#/.NET — identical API, zero dependencies
&lt;/h2&gt;

&lt;p&gt;Every function is available in &lt;code&gt;UrduTools.Core&lt;/code&gt; with the same behavior. The C# package mirrors the TypeScript structure exactly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Normalization&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Compound&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Numbers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Sorting&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;UrduTools.Core.Search&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Normalize&lt;/span&gt;
&lt;span class="n"&gt;UrduNormalizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"عِلمٌ"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                          &lt;span class="c1"&gt;// "علم"&lt;/span&gt;
&lt;span class="n"&gt;UrduNormalizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"علم\u200C"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                      &lt;span class="c1"&gt;// "علم" (trailing ZWNJ removed)&lt;/span&gt;

&lt;span class="c1"&gt;// Compound detection&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;spans&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CompoundDetector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;DetectCompounds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"کتاب خانہ میں"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// spans[0].Text == "کتاب خانہ"&lt;/span&gt;
&lt;span class="c1"&gt;// spans[0].Type == CompoundType.Affix&lt;/span&gt;

&lt;span class="c1"&gt;// Numbers&lt;/span&gt;
&lt;span class="n"&gt;NumberToWords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// "ایک کروڑ"&lt;/span&gt;
&lt;span class="n"&gt;NumberToWords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;NumberOptions&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;Ordinal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Gender&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Gender&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Feminine&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;  &lt;span class="c1"&gt;// "پہلی"&lt;/span&gt;

&lt;span class="c1"&gt;// Sort&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"ے"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ا"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ک"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ب"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;UrduComparer&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// ["ا", "ب", "ک", "ے"]&lt;/span&gt;

&lt;span class="c1"&gt;// Progressive normalization for DB lookup&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;form&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;UrduMatcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetAllNormalizations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userInput&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;LookupAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Match&lt;/span&gt;
&lt;span class="n"&gt;UrduMatcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"عِلمٌ"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"علم"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;Matched&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// true, layer: StripDiacritics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
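&lt;p&gt;The progressive lookup loop above relies on an ordered list of candidate forms, from the raw input to more aggressively normalized versions. A minimal TypeScript sketch of that idea follows. Only the diacritic-stripping layer is named in this post; the letter-unification layer and all function names here are assumptions made for illustration.&lt;/p&gt;

```typescript
// Hypothetical layers, ordered from least to most aggressive.
function stripDiacriticsSketch(s: string): string {
  return s.replace(/[\u064B-\u0652\u0670]/g, '');
}

function unifyLettersSketch(s: string): string {
  // Map Arabic yeh and kaf to the forms Urdu text normally uses.
  return s.replace(/\u064A/g, '\u06CC').replace(/\u0643/g, '\u06A9');
}

// Return every distinct form, starting with the (NFC, trimmed) input itself.
function allNormalizationsSketch(input: string): string[] {
  const layers = [stripDiacriticsSketch, unifyLettersSketch];
  let current = input.normalize('NFC').trim();
  const forms = [current];
  for (const layer of layers) {
    current = layer(current);
    if (forms.indexOf(current) === -1) forms.push(current); // skip no-op layers
  }
  return forms;
}

// Try each form in order and stop at the first hit.
function firstHitSketch(
  input: string,
  lookup: (key: string) => string | undefined,
): string | undefined {
  for (const form of allNormalizationsSketch(input)) {
    const hit = lookup(form);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

const dictionarySketch = new Map([['علم', 'knowledge']]);
console.log(allNormalizationsSketch('عِلمٌ').length); // 2: the input, then the bare form
console.log(firstHitSketch('عِلمٌ', (key) => dictionarySketch.get(key))); // 'knowledge'
```

&lt;p&gt;Trying the least aggressive form first means an exact match always wins over a looser one, which keeps results predictable when the database stores both vocalized and bare spellings.&lt;/p&gt;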






&lt;h2&gt;
  
  
  Academic foundation
&lt;/h2&gt;

&lt;p&gt;The compound word detection module was built on peer-reviewed Urdu linguistics research. These three works directly informed the architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jabbar, A. (2016). "Urdu Compound Words Manufacturing a State of Art."&lt;/strong&gt;&lt;br&gt;
Provides the Urdu Affix Word List (UAWL) — the definitive catalog of Urdu derivational morphemes. The 100+ affix morphemes in Layer 1 (&lt;code&gt;AFFIX_SET&lt;/code&gt;, &lt;code&gt;PREFIX_SET&lt;/code&gt;, &lt;code&gt;SUFFIX_SET&lt;/code&gt;) are drawn from this work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rahman, M. "A Linguistic Classification of Urdu Compound Words."&lt;/strong&gt;&lt;br&gt;
Informed the typological distinctions between compound categories — specifically the Perso-Arabic vs. native Urdu origin split and vav-e-atf chain patterns. Shaped the &lt;code&gt;CompoundType&lt;/code&gt; taxonomy and izafat heuristics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"High Performance Stemming Algorithm to Handle Multi-Word Expressions."&lt;/strong&gt;&lt;br&gt;
Motivated the &lt;code&gt;joinCompounds()&lt;/code&gt; + &lt;code&gt;tokenize()&lt;/code&gt; pipeline design — the paper demonstrates that semantic integrity is best preserved by preventing erroneous splits at the input boundary, not by post-processing token sequences. Also reinforced N-gram scanning over bigram-only approaches.&lt;/p&gt;


&lt;h2&gt;
  
  
  Used in production
&lt;/h2&gt;

&lt;p&gt;This library is not a side project. It runs in three production systems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://hamaariurdu.com" rel="noopener noreferrer"&gt;HamaariUrdu&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Urdu language learning platform — normalization, search, compound detection, numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://pal.gov.pk" rel="noopener noreferrer"&gt;Pakistan Academy of Letters&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Government literary institution — normalization, search, sorting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dlp.gov.pk" rel="noopener noreferrer"&gt;Digital Library of PAL&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Government digital Urdu archive — normalization, search, encoding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;HamaariUrdu was the origin — the library was extracted from production code where these problems were first encountered and solved. PAL and DLP integrated later for their Urdu text search and archiving systems.&lt;/p&gt;


&lt;h2&gt;
  
  
  Live Playground
&lt;/h2&gt;

&lt;p&gt;Every function is interactive at &lt;strong&gt;&lt;a href="https://iamahsanmehmood.github.io/urdu-tools/" rel="noopener noreferrer"&gt;iamahsanmehmood.github.io/urdu-tools&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The playground has built-in compound reporting: if you find a compound the detector misses, or a pair it wrongly flags as a compound, you can report it directly from the UI, which opens a pre-filled GitHub issue in one click.&lt;/p&gt;


&lt;h2&gt;
  
  
  Contributing
&lt;/h2&gt;

&lt;p&gt;The compound lexicon (3,262 roots, expandable) is the highest-impact area for non-developer contributions. If you know Urdu, you can contribute without writing code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/urdu-js/src/compound/lexicon-data.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Format: ['rootWord', new Set(['tail1', 'tail2'])]&lt;/span&gt;

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;محنت&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;مشقت&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;علم&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;و ہنر&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;و عمل&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;کیمیا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;انسائیکلوپیڈیا&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;آف اسلام&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
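&lt;p&gt;To show how entries in this format might be consulted, here is a toy scanner. It is a sketch, not the detector itself (which has several layers): &lt;code&gt;findCompoundsSketch&lt;/code&gt;, &lt;code&gt;LEXICON_SKETCH&lt;/code&gt; and the two-word tail limit are assumptions. Because a tail can span more than one word, the scanner has to look ahead by more than one token, which is why bigram-only matching is not enough.&lt;/p&gt;

```typescript
// Hypothetical scanner for illustration; the real detector has more layers.
const LEXICON_SKETCH = new Map([
  ['محنت', new Set(['مشقت'])],
  ['علم', new Set(['و ہنر', 'و عمل', 'کیمیا'])],
]);

// Assumption: no tail in this toy lexicon is longer than two words.
const MAX_TAIL_WORDS = 2;

const SAMPLE_SENTENCE = 'علم و ہنر اور محنت مشقت';

function findCompoundsSketch(text: string): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const found: string[] = [];
  let i = 0;
  while (tokens.length > i) {
    const tails = LEXICON_SKETCH.get(tokens[i]);
    let consumed = 0;
    if (tails !== undefined) {
      // Try the longest candidate tail first so a two-word tail wins.
      for (let n = MAX_TAIL_WORDS; n >= 1; n -= 1) {
        const tailTokens = tokens.slice(i + 1, i + 1 + n);
        if (tailTokens.length !== n) continue; // ran off the end of the text
        const tail = tailTokens.join(' ');
        if (tails.has(tail)) {
          found.push(tokens[i] + ' ' + tail);
          consumed = n;
          break;
        }
      }
    }
    i += 1 + consumed; // skip past the tail so it is not re-scanned as a root
  }
  return found;
}

// Finds two compounds: a root with a two-word tail, then a root with a one-word tail.
console.log(findCompoundsSketch(SAMPLE_SENTENCE));
```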



&lt;p&gt;Full guide in &lt;a href="https://github.com/iamahsanmehmood/urdu-tools/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;CONTRIBUTING.md&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;strong&gt;&lt;a href="https://github.com/iamahsanmehmood/urdu-tools" rel="noopener noreferrer"&gt;github.com/iamahsanmehmood/urdu-tools&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;اردو سافٹ ویئر کو بہتر بنانے میں ہمارا ساتھ دیں۔&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Help us make Urdu software better.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: #urdu #nlp #typescript #dotnet #opensource&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>dotnet</category>
    </item>
    <item>
      <title>Building a Multi-Terminal Restaurant POS with C# .NET — Architecture &amp; Lessons</title>
      <dc:creator>Ahsan Mehmood</dc:creator>
      <pubDate>Mon, 09 Mar 2026 03:33:29 +0000</pubDate>
      <link>https://forem.com/iamahsanmehmood/building-a-multi-terminal-restaurant-pos-with-c-net-architecture-lessons-56e2</link>
      <guid>https://forem.com/iamahsanmehmood/building-a-multi-terminal-restaurant-pos-with-c-net-architecture-lessons-56e2</guid>
      <description>&lt;p&gt;Building a Point-of-Sale system sounds straightforward until you realize it needs to handle &lt;strong&gt;multiple terminals, thermal printers, kitchen displays, real-time table tracking, and never lose a transaction&lt;/strong&gt; — even when the network drops.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;RestoCare+&lt;/strong&gt; — a multi-terminal restaurant POS system — and in this post, I'll walk through the architecture decisions, the problems I ran into, and what I'd do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I spent 4+ years working as a supervisor at a restaurant in Islamabad. I saw firsthand how terrible most POS systems were — slow, unreliable, confusing for staff, and impossible to maintain.&lt;/p&gt;

&lt;p&gt;When I transitioned into software development, this was the first real product I wanted to build. Not because it was technically exciting, but because I &lt;strong&gt;deeply understood the problem&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Language:&lt;/strong&gt; C#&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework:&lt;/strong&gt; .NET Framework (WinForms for UI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; SQL Server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Printing:&lt;/strong&gt; ESC/POS commands via Windows Services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture:&lt;/strong&gt; Client-Server with centralized SQL database&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Terminal 1   │    │  Terminal 2   │    │  Terminal 3   │
│  (Cashier)    │    │  (Cashier)    │    │  (Manager)    │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       └───────────┬───────┴───────────────────┘
                   │
           ┌───────┴───────┐
           │   SQL Server   │
           │   (Central DB) │
           └───────┬───────┘
                   │
       ┌───────────┼───────────┐
       │           │           │
┌──────┴──┐  ┌────┴────┐  ┌───┴──────┐
│ Thermal  │  │ Kitchen  │  │ Receipt  │
│ Printer 1│  │ Display  │  │ Printer  │
└─────────┘  └─────────┘  └──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Design Decisions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Centralized Database, Not Peer-to-Peer
&lt;/h3&gt;

&lt;p&gt;Every terminal connects to a &lt;strong&gt;single SQL Server instance&lt;/strong&gt;. I considered SQLite per terminal with sync, but restaurants can't tolerate eventual consistency — if Terminal 1 marks Table 5 as occupied, Terminal 2 needs to see that immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Connection string points to central SQL Server&lt;/span&gt;
&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="n"&gt;connectionString&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ConfigurationManager&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionStrings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"RestoCareDB"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;ConnectionString&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Real-Time Table Management
&lt;/h3&gt;

&lt;p&gt;The table management system polls the database to keep all terminals in sync: each terminal reloads the floor plan every 3 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Table status refresh timer&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="n"&gt;Timer&lt;/span&gt; &lt;span class="n"&gt;_tableRefreshTimer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;InitializeTableSync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;_tableRefreshTimer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Timer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 3-second interval&lt;/span&gt;
    &lt;span class="n"&gt;_tableRefreshTimer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Elapsed&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;RefreshTableStatuses&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;_tableRefreshTimer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;RefreshTableStatuses&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;object&lt;/span&gt; &lt;span class="n"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ElapsedEventArgs&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
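    &lt;span class="c1"&gt;// Elapsed fires on a thread-pool thread, so run the query here and marshal&lt;/span&gt;
    &lt;span class="c1"&gt;// the UI update to the UI thread (assumes this class is a Form or Control).&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_tableRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetAllWithStatus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;BeginInvoke&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="p"&gt;)(()&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;UpdateTableUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;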
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why polling instead of SignalR/WebSockets?&lt;/strong&gt; Simplicity. In a restaurant with 3-5 terminals on a local network, a 3-second poll is good enough and dramatically simpler to debug when something goes wrong at 9 PM on a Friday night.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Thermal Printing via Windows Service
&lt;/h3&gt;

&lt;p&gt;Thermal receipt printers speak &lt;strong&gt;ESC/POS&lt;/strong&gt; — a binary command language. I built a dedicated Windows Service (&lt;code&gt;IMS Print Service&lt;/code&gt;) that handles all print jobs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ESC/POS control sequences: ESC E n toggles bold, ESC a 1 centers text&lt;/span&gt;
&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;boldOn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="m"&gt;0x1B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x01&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;boldOff&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="m"&gt;0x1B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x00&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;centerAlign&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="m"&gt;0x1B&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x61&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0x01&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;PrintReceipt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;printer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RawPrinter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_printerName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;centerAlign&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;boldOn&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"RESTAURANT NAME\n"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;boldOff&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,-&lt;/span&gt;&lt;span class="m"&gt;20&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; x&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Qty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\n"&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WriteText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;$"\nTOTAL: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s"&gt;\n"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;printer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CutPaper&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why a Windows Service?&lt;/strong&gt; Two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The print service runs independently — even if the POS UI crashes, queued prints still go through.&lt;/li&gt;
&lt;li&gt;Multiple terminals can send print jobs to the same service, which queues them to avoid conflicts.&lt;/li&gt;
&lt;/ol&gt;
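
&lt;p&gt;The queueing in the second point can be sketched roughly as below. This is a minimal illustration, not RestoCare+ code: &lt;code&gt;PrintJobQueue&lt;/code&gt; and the &lt;code&gt;sendToPrinter&lt;/code&gt; delegate are hypothetical names, and jobs are assumed to arrive as already-formatted receipt text. A single worker drains the queue, so only one job reaches the printer at a time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch: serializes print jobs coming from several terminals.
public sealed class PrintJobQueue : IDisposable
{
    private readonly BlockingCollection&amp;lt;string&amp;gt; _jobs = new BlockingCollection&amp;lt;string&amp;gt;();
    private readonly Task _worker;

    // sendToPrinter stands in for whatever actually writes to the device.
    public PrintJobQueue(Action&amp;lt;string&amp;gt; sendToPrinter)
    {
        _worker = Task.Factory.StartNew(() =&amp;gt;
        {
            // Blocks until a job arrives; the loop ends after CompleteAdding().
            foreach (var job in _jobs.GetConsumingEnumerable())
            {
                try { sendToPrinter(job); }
                catch (Exception ex) { Trace.TraceWarning("Print failed: " + ex.Message); }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any thread; returns as soon as the job is queued.
    public void Enqueue(string receiptText) =&amp;gt; _jobs.Add(receiptText);

    // Stops accepting jobs, lets the worker finish the backlog, then cleans up.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        _worker.Wait();
        _jobs.Dispose();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each terminal's request would call &lt;code&gt;Enqueue&lt;/code&gt; and return immediately, while the worker prints jobs in arrival order.&lt;/p&gt;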

&lt;h3&gt;
  
  
  4. Kitchen Display Integration
&lt;/h3&gt;

&lt;p&gt;When a waiter submits an order, it needs to appear on the kitchen display within seconds. The kitchen display is a separate WinForms app running on a screen in the kitchen, and it polls the database for new orders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Kitchen display polls for new orders&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;KitchenOrder&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;GetPendingOrders&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RestoCareContext&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Orders&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OrderStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pending&lt;/span&gt; 
                     &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Status&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;OrderStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InProgress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CreatedAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Include&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToList&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kitchen staff tap items as they're prepared, and the waiter's terminal updates in near real time, showing which items are ready.&lt;/p&gt;
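
&lt;p&gt;For illustration, the polling loop around a query like &lt;code&gt;GetPendingOrders&lt;/code&gt; might look like the sketch below. &lt;code&gt;OrderPoller&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt;, and &lt;code&gt;render&lt;/code&gt; are hypothetical names, and the 3-second default is an assumption taken from the polling interval mentioned in the takeaways:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: re-runs a query on a fixed interval and hands the
// result to the UI. Uses the WinForms timer, so Tick fires on the UI thread.
public sealed class OrderPoller&amp;lt;TOrder&amp;gt; : IDisposable
{
    private readonly System.Windows.Forms.Timer _timer;

    public OrderPoller(Func&amp;lt;List&amp;lt;TOrder&amp;gt;&amp;gt; fetch,
                       Action&amp;lt;List&amp;lt;TOrder&amp;gt;&amp;gt; render,
                       int intervalMs = 3000)
    {
        _timer = new System.Windows.Forms.Timer { Interval = intervalMs };
        _timer.Tick += (sender, e) =&amp;gt;
        {
            // A failed poll is logged and skipped; the next tick tries again.
            try { render(fetch()); }
            catch (Exception ex) { Trace.TraceWarning("Poll failed: " + ex.Message); }
        };
    }

    public void Start() =&amp;gt; _timer.Start();

    public void Dispose() =&amp;gt; _timer.Dispose();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the handler runs on the UI thread, the query briefly blocks the screen. That is usually acceptable for a small query on a local network, but a slow query would be better moved to a background task.&lt;/p&gt;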

&lt;h3&gt;
  
  
  5. Handling the Offline Scenario
&lt;/h3&gt;

&lt;p&gt;What happens when the network drops? This is critical in restaurants: you can't stop taking orders because the Wi-Fi went down.&lt;/p&gt;

&lt;p&gt;My approach: &lt;strong&gt;local queue with retry.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// If SQL Server is unreachable, queue locally&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;SubmitOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_orderRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;_kitchenService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NotifyNewOrder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SqlException&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;_localQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;_logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DB unreachable — order queued locally"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// Background task retries every 10 seconds&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
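
&lt;p&gt;The background retry mentioned in the comment above could be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: &lt;code&gt;RetryingQueue&lt;/code&gt; and the &lt;code&gt;trySave&lt;/code&gt; delegate are hypothetical, and the queue lives in memory only, so queued orders would be lost if the app exits before the database comes back:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: holds items that could not be saved and retries them.
public sealed class RetryingQueue&amp;lt;T&amp;gt;
{
    private readonly ConcurrentQueue&amp;lt;T&amp;gt; _pending = new ConcurrentQueue&amp;lt;T&amp;gt;();

    // trySave should catch its own exceptions and return false
    // while the database is still unreachable.
    private readonly Func&amp;lt;T, bool&amp;gt; _trySave;

    public RetryingQueue(Func&amp;lt;T, bool&amp;gt; trySave) { _trySave = trySave; }

    public int Count =&amp;gt; _pending.Count;

    public void Enqueue(T item) =&amp;gt; _pending.Enqueue(item);

    // Saves queued items oldest-first and stops at the first failure,
    // so the original order is preserved. Call from one thread only.
    public void Flush()
    {
        T item;
        while (_pending.TryPeek(out item))
        {
            if (!_trySave(item)) return;
            _pending.TryDequeue(out item);
        }
    }

    // Flushes, waits for the interval, and repeats until cancelled.
    public async Task RunAsync(TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            Flush();
            try { await Task.Delay(interval, token); }
            catch (OperationCanceledException) { return; }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the 10-second interval from the comment, the loop would be started once at app startup with something like &lt;code&gt;queue.RunAsync(TimeSpan.FromSeconds(10), cts.Token)&lt;/code&gt;. Surviving a restart would need the queue persisted to a local file or an embedded database.&lt;/p&gt;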



&lt;h2&gt;
  
  
  Mistakes I Made
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Not planning for menu changes
&lt;/h3&gt;

&lt;p&gt;My initial schema had menu items tightly coupled to orders. When the restaurant changed prices or renamed dishes, it broke historical reports. &lt;strong&gt;Fix:&lt;/strong&gt; I added a snapshot of the item at order time — &lt;code&gt;OrderItem&lt;/code&gt; stores its own &lt;code&gt;PriceAtTime&lt;/code&gt; and &lt;code&gt;NameAtTime&lt;/code&gt;.&lt;/p&gt;
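
&lt;p&gt;A rough sketch of the snapshot idea, assuming a simple &lt;code&gt;MenuItem&lt;/code&gt; shape. Only &lt;code&gt;PriceAtTime&lt;/code&gt; and &lt;code&gt;NameAtTime&lt;/code&gt; come from the fix described above; the other members and the &lt;code&gt;FromMenu&lt;/code&gt; helper are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical menu entry; the restaurant can edit these at any time.
public class MenuItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// An order line that copies what it needs from the menu when it is created.
public class OrderItem
{
    public int MenuItemId { get; set; }      // kept for reference only
    public string NameAtTime { get; set; }   // name as sold
    public decimal PriceAtTime { get; set; } // price as sold
    public int Qty { get; set; }

    // Computed from the snapshot, never from the live menu.
    public decimal Total =&amp;gt; PriceAtTime * Qty;

    public static OrderItem FromMenu(MenuItem menuItem, int qty)
    {
        return new OrderItem
        {
            MenuItemId = menuItem.Id,
            NameAtTime = menuItem.Name,
            PriceAtTime = menuItem.Price,
            Qty = qty
        };
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Later changes to a &lt;code&gt;MenuItem&lt;/code&gt; leave existing order lines untouched, so historical reports keep the names and prices that applied when the order was placed.&lt;/p&gt;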

&lt;h3&gt;
  
  
  2. Underestimating receipt formatting
&lt;/h3&gt;

&lt;p&gt;Thermal printers typically have a &lt;strong&gt;42-character line width&lt;/strong&gt; (for 80mm paper). I spent more time formatting receipts than I expected, and Arabic/Urdu text support was a whole adventure.&lt;/p&gt;
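
&lt;p&gt;As an example of the kind of helper this leads to, here is a minimal sketch that fits one item line into a fixed width. &lt;code&gt;ReceiptLine&lt;/code&gt; is a hypothetical name, and the code assumes one &lt;code&gt;char&lt;/code&gt; per printed column, which holds for plain ASCII but not necessarily for Arabic or Urdu text:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical sketch: builds a fixed-width receipt line.
public static class ReceiptLine
{
    public const int Width = 42; // characters per line on 80mm paper

    // Right-aligns quantity and amount, then pads or truncates the
    // item name so the whole line is exactly Width characters long.
    public static string Format(string name, int qty, string amount)
    {
        string right = " x" + qty + " " + amount;
        int nameWidth = Width - right.Length;
        if (nameWidth &amp;lt; 1)
            throw new ArgumentException("Quantity and amount do not fit on one line.");

        string left = name.Length &amp;gt; nameWidth
            ? name.Substring(0, nameWidth)
            : name.PadRight(nameWidth);
        return left + right;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Truncating long names is only one option; wrapping the name onto a second line is another, at the cost of a longer receipt.&lt;/p&gt;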

&lt;h3&gt;
  
  
  3. Not building user management from day one
&lt;/h3&gt;

&lt;p&gt;I added multi-user roles (cashier, manager, admin) later, and retrofitting permissions into an existing system is painful. &lt;strong&gt;Always plan roles early.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use .NET 8&lt;/strong&gt; instead of .NET Framework — better performance, cross-platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add offline-first architecture&lt;/strong&gt; from the start, not as an afterthought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build a web dashboard&lt;/strong&gt; alongside the desktop app for owners to check reports remotely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use message queues&lt;/strong&gt; (RabbitMQ or even a simple one) instead of polling for the kitchen display&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;RestoCare+ is running in production at real restaurants. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ 3-5 terminals simultaneously&lt;/li&gt;
&lt;li&gt;✅ Thermal receipt printing with proper formatting&lt;/li&gt;
&lt;li&gt;✅ Kitchen display with real-time order updates&lt;/li&gt;
&lt;li&gt;✅ Table management with status tracking&lt;/li&gt;
&lt;li&gt;✅ Daily sales reports and analytics&lt;/li&gt;
&lt;li&gt;✅ Menu management with category organization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Domain knowledge is your unfair advantage.&lt;/strong&gt; My 4 years in a restaurant made me a better POS developer than someone with 10 years of pure coding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Polling is fine for local networks.&lt;/strong&gt; Don't over-engineer with WebSockets when 3-second polling works perfectly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thermal printing is harder than it looks.&lt;/strong&gt; Budget extra time for this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the boring stuff well.&lt;/strong&gt; Boring features like user roles, audit logs, and error handling are what separate a demo from a product.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;I'm Ahsan Mehmood — a Full-Stack Developer and Co-Founder of &lt;a href="https://xechtech.com" rel="noopener noreferrer"&gt;XechTech&lt;/a&gt;. I share what I learn from building real-world software. Follow for more .NET, Flutter, and AI content.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Connect: &lt;a href="https://linkedin.com/in/iamahsanmehmood" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://github.com/iamahsanmehmood" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://xechtech.com" rel="noopener noreferrer"&gt;XechTech&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




</description>
      <category>dotnet</category>
      <category>csharp</category>
      <category>architecture</category>
      <category>database</category>
    </item>
    <item>
      <title>From Restaurant Supervisor to Technical Founder: Shipping 38+ Projects in 5 Years</title>
      <dc:creator>Ahsan Mehmood</dc:creator>
      <pubDate>Mon, 09 Mar 2026 03:25:50 +0000</pubDate>
      <link>https://forem.com/iamahsanmehmood/from-restaurant-supervisor-to-technical-founder-shipping-38-projects-in-5-years-1l4m</link>
      <guid>https://forem.com/iamahsanmehmood/from-restaurant-supervisor-to-technical-founder-shipping-38-projects-in-5-years-1l4m</guid>
      <description>&lt;p&gt;Hey Dev.to 👋 I'm Ahsan Mehmood — a Full-Stack Developer and Co-Founder of XechTech, based in Islamabad, Pakistan.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Journey
&lt;/h2&gt;

&lt;p&gt;In 2017, I was working as a restaurant supervisor. By 2021, I had pivoted into software development full-time. Today, I've shipped &lt;strong&gt;38+ production projects&lt;/strong&gt; across 8+ organizations, serving clients in Pakistan, the US, and Australia.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Build
&lt;/h2&gt;

&lt;p&gt;My core stack is &lt;strong&gt;.NET/C#&lt;/strong&gt;, but I work across the full spectrum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🖥️ &lt;strong&gt;Desktop&lt;/strong&gt;: Multi-terminal POS systems, payroll, accounting tools (C#, .NET, SQL Server)&lt;/li&gt;
&lt;li&gt;📱 &lt;strong&gt;Mobile&lt;/strong&gt;: Cross-platform apps with Flutter (Booktionary, EstiMate Pro, DLP App)&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Web&lt;/strong&gt;: React, Laravel, Node.js, TypeScript (pal.gov.pk, xechtech.com)&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI&lt;/strong&gt;: AutoCAD AI Agent, Gemini PC-Commander, LLM-powered workflows&lt;/li&gt;
&lt;li&gt;🏗️ &lt;strong&gt;Engineering&lt;/strong&gt;: 7+ structural tools for RPEQ-certified projects in Australia&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Company: XechTech
&lt;/h2&gt;

&lt;p&gt;I co-founded &lt;a href="https://xechtech.com" rel="noopener noreferrer"&gt;XechTech&lt;/a&gt; in 2021 with my partner Aaqib Saleem. We build high-end software solutions — from RestoCare+ (a restaurant POS system with thermal printing and kitchen displays) to AI-powered PDF automation tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Projects
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RestoCare+ POS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-terminal restaurant POS&lt;/td&gt;
&lt;td&gt;C#, .NET, SQL Server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EstiMate Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI-powered PDF automation&lt;/td&gt;
&lt;td&gt;Flutter, Dart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engineering Suite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7+ structural analysis tools&lt;/td&gt;
&lt;td&gt;C#, .NET, AutoCAD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PAL Website&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Government website&lt;/td&gt;
&lt;td&gt;PHP, Laravel, Flutter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Booktionary&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dictionary &amp;amp; book app&lt;/td&gt;
&lt;td&gt;Flutter, SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AutoCAD AI Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI-generated floor plans&lt;/td&gt;
&lt;td&gt;Python, Gemini API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I've Learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson from shipping 38+ projects: &lt;strong&gt;Solve real problems for real people.&lt;/strong&gt; The fanciest tech stack means nothing if it doesn't solve a pain point.&lt;/p&gt;

&lt;p&gt;I built RestoCare+ because I spent 4 years in a restaurant and knew exactly what was broken. I built engineering tools because construction firms in Australia needed calculations done faster. Every successful project started with understanding the problem deeply.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm currently focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI integration into traditional business software&lt;/li&gt;
&lt;li&gt;Growing XechTech's client base internationally&lt;/li&gt;
&lt;li&gt;Sharing what I've learned through writing here on Dev.to&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a developer in Pakistan building your career, or if you're transitioning into tech from another field — I'd love to connect. Drop a comment or find me on &lt;a href="https://linkedin.com/in/iamahsanmehmood" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; 🤝&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow me for posts about .NET, Flutter, AI integration, and building software businesses from Pakistan.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>startup</category>
      <category>beginners</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
