<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yurii Lozinskyi</title>
    <description>The latest articles on Forem by Yurii Lozinskyi (@yurii_lozinskyi).</description>
    <link>https://forem.com/yurii_lozinskyi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3703189%2F47afc5ee-74e7-4872-b299-3af97b3d17f0.jpg</url>
      <title>Forem: Yurii Lozinskyi</title>
      <link>https://forem.com/yurii_lozinskyi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yurii_lozinskyi"/>
    <language>en</language>
    <item>
      <title>When the Matrix Breaks: Failure Modes of Early Matching Systems</title>
      <dc:creator>Yurii Lozinskyi</dc:creator>
      <pubDate>Wed, 11 Feb 2026 14:53:57 +0000</pubDate>
      <link>https://forem.com/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6</link>
      <guid>https://forem.com/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6</guid>
      <description>&lt;p&gt;In previous articles, we discussed how to build a matching system without Big Tech resources, why matrices come before neural networks, when ML finally becomes justified, and why explainability is a survival mechanism.&lt;/p&gt;

&lt;p&gt;Now it’s time to talk about something less comfortable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How these systems actually break.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not with exceptions. Not with outages.&lt;br&gt;
But with slow, silent degradation.&lt;/p&gt;

&lt;p&gt;This article is about the failure modes that appear &lt;em&gt;before&lt;/em&gt; ML, and why recognizing them early matters more than adding another model.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Failure Mode: Matrix Saturation
&lt;/h2&gt;

&lt;p&gt;At some point, everything starts to look “kind of relevant.”&lt;/p&gt;

&lt;p&gt;Different requests produce similar top results. Explainability payloads look correct but uninformative. Users say: &lt;em&gt;“It always suggests the same profiles.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is matrix saturation. It usually happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dimensions are too coarse,&lt;/li&gt;
&lt;li&gt;feature buckets are too broad,&lt;/li&gt;
&lt;li&gt;context modifiers are missing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system technically works, but it has lost resolution.&lt;br&gt;
Adding ML here doesn’t fix the problem: the model just learns the same flat landscape.&lt;/p&gt;
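&lt;p&gt;One way to make saturation measurable is to compare the top-N lists returned for unrelated requests. The sketch below is illustrative only: the function names, request IDs, and profile IDs are assumptions, and Jaccard overlap is just one reasonable choice of similarity measure.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of profile IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def mean_topn_overlap(top_results):
    """Average pairwise overlap between the top-N lists of different requests."""
    pairs = list(combinations(top_results.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 results for three unrelated requests.
top_results = {
    "req_fintech": ["p1", "p2", "p3"],
    "req_gaming": ["p1", "p2", "p4"],
    "req_health": ["p1", "p2", "p3"],
}
overlap = mean_topn_overlap(top_results)
print(round(overlap, 2))  # 0.67
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A value that creeps toward 1.0 across requests that should differ suggests the matrix has lost resolution. What counts as “too high” depends on the domain and is not something this sketch decides.&lt;/p&gt;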

&lt;h2&gt;
  
  
  2. Failure Mode: Signal Dominance
&lt;/h2&gt;

&lt;p&gt;One signal quietly takes over.&lt;/p&gt;

&lt;p&gt;Every explanation looks like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Ranked highly because of X.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Other signals still exist, but they no longer matter.&lt;/p&gt;

&lt;p&gt;This often comes from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;improper normalization,&lt;/li&gt;
&lt;li&gt;early weight tuning,&lt;/li&gt;
&lt;li&gt;missing caps or decay functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before ML, this is already dangerous.&lt;br&gt;
After ML, it becomes irreversible.&lt;/p&gt;

&lt;p&gt;The model will learn that only one thing matters, even if it shouldn’t.&lt;/p&gt;
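&lt;p&gt;A simple dominance check is to compute how much of the final score each signal contributes. This is a minimal sketch assuming a weighted-sum scorer; the signal names, weights, and the unnormalized &lt;code&gt;follower_count&lt;/code&gt; value are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def contribution_shares(signals, weights):
    """Share of the weighted score contributed by each signal."""
    contributions = {name: weights[name] * value for name, value in signals.items()}
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: c / total for name, c in contributions.items()}


def dominant_signal(signals, weights):
    """Return (signal_name, share) for the largest contributor."""
    shares = contribution_shares(signals, weights)
    name = max(shares, key=shares.get)
    return name, shares[name]


# Hypothetical case: follower_count was never normalized to the 0..1 range.
signals = {"matrix_compatibility": 0.78, "semantic_similarity": 0.32, "follower_count": 48.0}
weights = {"matrix_compatibility": 0.5, "semantic_similarity": 0.3, "follower_count": 0.2}
name, share = dominant_signal(signals, weights)
print(name, round(share, 2))  # follower_count 0.95
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the same signal holds most of the share across most requests, normalization or a cap is probably missing.&lt;/p&gt;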

&lt;h2&gt;
  
  
  3. Failure Mode: Silent Bias Accumulation
&lt;/h2&gt;

&lt;p&gt;The system slowly favors a narrow subset of supply.&lt;/p&gt;

&lt;p&gt;No rule explicitly enforces it. The metrics appear stable, but diversity is declining.&lt;/p&gt;

&lt;p&gt;This happens because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;positive feedback loops reinforce visibility,&lt;/li&gt;
&lt;li&gt;negative signals are missing or ignored,&lt;/li&gt;
&lt;li&gt;UX choices shape behavior unintentionally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without explainability, this bias remains invisible.&lt;br&gt;
With ML, it becomes institutionalized.&lt;/p&gt;
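&lt;p&gt;Declining diversity can be tracked with an exposure-concentration metric, for example the share of impressions that goes to the k most-shown suppliers. The sketch below assumes impressions are logged as a flat list of supplier IDs; the data and the function name are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def top_k_exposure_share(impressions, k):
    """Fraction of all impressions that went to the k most-shown suppliers."""
    counts = Counter(impressions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Hypothetical impression logs: one supplier ID per time a profile was shown.
week_1 = ["s1", "s2", "s3", "s4", "s5", "s1", "s2", "s3", "s4", "s5"]
week_8 = ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s1"]
print(top_k_exposure_share(week_1, 2))  # 0.4
print(top_k_exposure_share(week_8, 2))  # 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Outcome metrics can stay flat while this number climbs, which is why it is worth watching separately.&lt;/p&gt;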

&lt;h2&gt;
  
  
  4. Failure Mode: Gaming the System
&lt;/h2&gt;

&lt;p&gt;Supply-side actors adapt faster than the system.&lt;/p&gt;

&lt;p&gt;They learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which fields matter,&lt;/li&gt;
&lt;li&gt;which keywords boost ranking,&lt;/li&gt;
&lt;li&gt;which signals are easy to fake.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;features lose meaning,&lt;/li&gt;
&lt;li&gt;similarity collapses,&lt;/li&gt;
&lt;li&gt;relevance becomes performative.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not malicious behavior. It’s rational optimization.&lt;br&gt;
If you don’t design for it, it will happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Failure Mode: Explainability Drift
&lt;/h2&gt;

&lt;p&gt;This one is subtle and dangerous.&lt;/p&gt;

&lt;p&gt;Explanations still sound reasonable, but they no longer reflect real scoring logic.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scoring logic evolved,&lt;/li&gt;
&lt;li&gt;explanations didn’t,&lt;/li&gt;
&lt;li&gt;versions diverged.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product teams can’t reproduce decisions,&lt;/li&gt;
&lt;li&gt;auditors lose confidence,&lt;/li&gt;
&lt;li&gt;trust erodes quietly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explainability without versioning is technical debt.&lt;/p&gt;
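&lt;p&gt;One possible way to version explanations is to stamp each one with a fingerprint of the scoring configuration that produced it. This is a sketch under the assumption that the scoring logic can be summarized by a weights dictionary; the field name &lt;code&gt;scoring_fingerprint&lt;/code&gt; and both helper functions are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json


def scoring_fingerprint(weights):
    """Stable short hash of the scoring configuration (here: only the weights)."""
    canonical = json.dumps(weights, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def build_explanation(reasons, weights):
    """Attach the fingerprint of the config that generated these reasons."""
    return {"why": list(reasons), "scoring_fingerprint": scoring_fingerprint(weights)}


def explanation_is_current(explanation, live_weights):
    """True only if the explanation came from the live scoring config."""
    return explanation["scoring_fingerprint"] == scoring_fingerprint(live_weights)


weights_v1 = {"matrix_compatibility": 0.6, "semantic_similarity": 0.4}
weights_v2 = {"matrix_compatibility": 0.3, "semantic_similarity": 0.7}
explanation = build_explanation(["Strong compatibility prior"], weights_v1)
print(explanation_is_current(explanation, weights_v1))  # True
print(explanation_is_current(explanation, weights_v2))  # False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A mismatch does not say which side is wrong, only that scoring and explanation have diverged and the decision can no longer be reproduced from the explanation alone.&lt;/p&gt;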

&lt;h2&gt;
  
  
  6. Why “Just Add ML” Makes This Worse
&lt;/h2&gt;

&lt;p&gt;When ML is added at this stage, it learns from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;saturated rankings,&lt;/li&gt;
&lt;li&gt;dominant signals,&lt;/li&gt;
&lt;li&gt;accumulated bias,&lt;/li&gt;
&lt;li&gt;gamed features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model doesn’t fix the system.&lt;br&gt;
It freezes its worst behaviors into weights.&lt;/p&gt;

&lt;p&gt;Now the problem is harder to see and harder to undo.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Designing Matrices That Expect to Break
&lt;/h2&gt;

&lt;p&gt;Healthy systems assume failure.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;monitoring signal distributions, not just outcomes,&lt;/li&gt;
&lt;li&gt;treating explainability as a contract, not a debug tool,&lt;/li&gt;
&lt;li&gt;embedding governance hooks early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Matrices are not temporary scaffolding.&lt;br&gt;
They are operational components.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. A Practical Checklist
&lt;/h2&gt;

&lt;p&gt;Ask these questions regularly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are top-N results diversifying over time?&lt;/li&gt;
&lt;li&gt;Do explanations meaningfully change across requests?&lt;/li&gt;
&lt;li&gt;Is any single signal dominating rankings?&lt;/li&gt;
&lt;li&gt;Can product teams reproduce decisions?&lt;/li&gt;
&lt;li&gt;Are new supply actors ever surfaced?&lt;/li&gt;
&lt;li&gt;Do explanations still match scoring logic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can’t answer these confidently, the matrix is already breaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Early matching systems rarely fail catastrophically.&lt;/p&gt;

&lt;p&gt;They fail quietly.&lt;br&gt;
They fail politely.&lt;br&gt;
They fail while still “working.”&lt;/p&gt;

&lt;p&gt;The teams that succeed are not the ones who rush to ML.&lt;br&gt;
They are the ones who understand &lt;em&gt;how their systems break&lt;/em&gt; — and design for it.&lt;/p&gt;

&lt;p&gt;Before you teach a system to learn,&lt;br&gt;
make sure it knows how to fail.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.linkedin.com/in/yuriylozinsky/" rel="noopener noreferrer"&gt;Yurii Lozinskyi - AI Delivery Lead &amp;amp; AI Practice Director&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Part 1.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;Building an AI Matching Engine Without Big Tech Resources&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 2.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;AI Matching: Matrix First, Neural Nets Later&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 3.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;From Matrix to Model: When Is It Finally Safe to Train ML?&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 4.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7"&gt;Explainability in AI Is Not a Feature. It’s a Survival Mechanism.&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 5.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6"&gt;When the Matrix Breaks: Failure Modes of Early Matching Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>matching</category>
      <category>explainability</category>
    </item>
    <item>
      <title>Explainability in AI Is Not a Feature. It’s a Survival Mechanism.</title>
      <dc:creator>Yurii Lozinskyi</dc:creator>
      <pubDate>Sat, 07 Feb 2026 19:54:36 +0000</pubDate>
      <link>https://forem.com/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7</link>
      <guid>https://forem.com/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7</guid>
      <description>&lt;h2&gt;
  
  
  If your AI system works but no one can explain why, it doesn’t really work.
&lt;/h2&gt;

&lt;p&gt;That statement may sound provocative, but it captures a hard-earned lesson from building AI-powered matching systems under real-world constraints. Many systems don’t fail because their models are inaccurate. They fail because no one (users, delivery teams, or auditors) can understand why a decision was made.&lt;/p&gt;

&lt;p&gt;Explainability is not a UI feature you add later.&lt;br&gt;&lt;br&gt;
It’s a survival mechanism.&lt;/p&gt;

&lt;p&gt;This article continues the thread from the previous ones:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;first we &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;built a matching system without Big Tech resources&lt;/a&gt;,
&lt;/li&gt;
&lt;li&gt;then we showed &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;why matrices come before neural networks&lt;/a&gt;,
&lt;/li&gt;
&lt;li&gt;then we discussed &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;when it’s finally safe to train ML&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now we address the next unavoidable question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you keep an AI system trustworthy once it starts making decisions?&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  1. Why explainability becomes unavoidable
&lt;/h2&gt;

&lt;p&gt;At some point, every AI-powered delivery or matching system reaches the same moment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system produces results.
&lt;/li&gt;
&lt;li&gt;Metrics look reasonable.
&lt;/li&gt;
&lt;li&gt;Accuracy appears acceptable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then someone asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why did the system choose &lt;em&gt;this&lt;/em&gt; option?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This question doesn’t come from engineers first.&lt;br&gt;&lt;br&gt;
It comes from product owners, business stakeholders, compliance teams, and end users.&lt;/p&gt;

&lt;p&gt;In matching systems like marketplaces, supplier selection, and regulated workflows, people don’t just want a score. They want a &lt;em&gt;reason&lt;/em&gt;. Without it, trust erodes quickly, even if the system is technically correct.&lt;/p&gt;

&lt;p&gt;Explainability becomes unavoidable the moment your system influences real decisions.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Explainability vs observability vs governance
&lt;/h2&gt;

&lt;p&gt;These concepts are often discussed together, but they solve different problems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Explainability&lt;/strong&gt; answers &lt;em&gt;why&lt;/em&gt; a specific decision was made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; shows &lt;em&gt;what&lt;/em&gt; is happening inside the system over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance&lt;/strong&gt; defines &lt;em&gt;what is allowed&lt;/em&gt;, &lt;em&gt;what is risky&lt;/em&gt;, and &lt;em&gt;who is accountable&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They form a layered stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   +--------------------+
   |     Governance     |
   |   (Rules, Risks)   |
   +---------+----------+
             |
   +---------+----------+
   |    Observability   |
   |  (Metrics, Drift)  |
   +---------+----------+
             |
   +---------+----------+
   |   Explainability   |
   |  (Why this match)  |
   +--------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;strong&gt;explainability&lt;/strong&gt;, observability becomes abstract.&lt;br&gt;
Without &lt;strong&gt;observability&lt;/strong&gt;, governance becomes blind.&lt;br&gt;
Without &lt;strong&gt;governance&lt;/strong&gt;, explainability is just storytelling.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Why matching systems need explainability more than most AI systems
&lt;/h2&gt;

&lt;p&gt;Matching is not classification.&lt;br&gt;
It’s not prediction.&lt;br&gt;
It’s &lt;strong&gt;multi-factor ranking under constraints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Users don’t ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is this prediction correct?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why is &lt;em&gt;this&lt;/em&gt; option higher than the others?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the system cannot answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why supplier A ranked above supplier B,&lt;/li&gt;
&lt;li&gt;why a campaign brief changed the ranking,&lt;/li&gt;
&lt;li&gt;why similar requests produced different results,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then users will bypass the system, even if it’s statistically “good”.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. What real explainability looks like
&lt;/h2&gt;

&lt;p&gt;Explainability is not a single number or a heatmap.&lt;br&gt;
It’s a &lt;strong&gt;structured explanation tied to signals&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example explainability payload for a match:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;campaign_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;influencer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;456&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score_breakdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matrix_compatibility_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_similarity_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caption_similarity_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_prediction_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.61&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;why&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Campaign.blog_type=expert aligns with influencer.social_status=micro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High semantic overlap in professional tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lower caption similarity due to missing niche terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit_meta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-12T10:32:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matching-v1.3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature_flags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caption_v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This payload doesn’t just show a score.&lt;br&gt;
It explains how the system reasoned.&lt;/p&gt;
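&lt;p&gt;A payload like this can be assembled by a small builder and checked for completeness before it is stored. The sketch below reuses the top-level keys from the example above; the function names and the required-key check are assumptions, not part of any specific library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timezone

REQUIRED_KEYS = {"campaign_id", "influencer_id", "score_breakdown", "why", "audit_meta"}


def build_payload(campaign_id, influencer_id, scores, reasons, model_version, feature_flags):
    """Assemble an explainability payload with the same top-level keys as the example."""
    return {
        "campaign_id": campaign_id,
        "influencer_id": influencer_id,
        "score_breakdown": dict(scores),
        "why": list(reasons),
        "audit_meta": {
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "model_version": model_version,
            "feature_flags": list(feature_flags),
        },
    }


def missing_keys(payload):
    """Top-level keys an explanation must carry but this payload lacks."""
    return REQUIRED_KEYS.difference(payload)


payload = build_payload(
    123,
    456,
    {"matrix_compatibility_score": 0.78, "semantic_similarity_score": 0.32},
    ["High semantic overlap in professional tags"],
    "matching-v1.3.0",
    ["caption_v2"],
)
print(sorted(missing_keys(payload)))  # []
print(sorted(missing_keys({"campaign_id": 123})))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Rejecting or flagging incomplete payloads at write time keeps the explanation tied to the decision, not reconstructed afterwards.&lt;/p&gt;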
&lt;h2&gt;
  
  
  5. Observability: seeing patterns, not just logs
&lt;/h2&gt;

&lt;p&gt;Explainability becomes powerful only when paired with observability.&lt;br&gt;
Good observability focuses on signal behavior, not just uptime or latency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distribution of individual scores over time,&lt;/li&gt;
&lt;li&gt;correlation between signals and outcomes,&lt;/li&gt;
&lt;li&gt;drift in embeddings or matrix usage,&lt;/li&gt;
&lt;li&gt;anomalies in ranking patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example instrumentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matching.matrix_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;matrix_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matching.semantic_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;semantic_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matching.explainability_gaps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;missing_explanations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These metrics allow teams to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the system behaving as designed?&lt;/li&gt;
&lt;li&gt;Which signals dominate decisions?&lt;/li&gt;
&lt;li&gt;Where does behavior diverge from expectations?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  6. Explainability enables governance and compliance
&lt;/h2&gt;

&lt;p&gt;In regulated or high-stakes environments, explainability is not optional.&lt;/p&gt;

&lt;p&gt;Auditors don’t want probabilities.&lt;br&gt;
They want rationales.&lt;/p&gt;

&lt;p&gt;Governance logic often depends on explainability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auditor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;include_full_decision_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;audit trails,&lt;/li&gt;
&lt;li&gt;historical decision reviews,&lt;/li&gt;
&lt;li&gt;risk analysis,&lt;/li&gt;
&lt;li&gt;regulatory compliance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without explainability, governance becomes guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Explainability and AI agents
&lt;/h2&gt;

&lt;p&gt;AI agents amplify the importance of explainability.&lt;/p&gt;

&lt;p&gt;A non-explainable agent output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Suggested&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;
&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.86&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A usable agent output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Suggested&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;
&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.86&lt;/span&gt;
&lt;span class="n"&gt;Reasons&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Strong&lt;/span&gt; &lt;span class="n"&gt;compatibility&lt;/span&gt; &lt;span class="n"&gt;prior&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Professional&lt;/span&gt; &lt;span class="n"&gt;semantic&lt;/span&gt; &lt;span class="n"&gt;alignment&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Low&lt;/span&gt; &lt;span class="n"&gt;risk&lt;/span&gt; &lt;span class="n"&gt;based&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;historical&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents without explanations are dangerous.&lt;br&gt;
They produce confident answers without accountability.&lt;/p&gt;

&lt;p&gt;Explainability turns agents from black boxes into collaborators.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. A real failure that explainability revealed
&lt;/h2&gt;

&lt;p&gt;In one deployment, aggregate metrics looked healthy.&lt;br&gt;
But users reported “odd” matches for specific campaign types.&lt;/p&gt;

&lt;p&gt;Explainability revealed that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embedding similarity dominated decisions in edge cases,&lt;/li&gt;
&lt;li&gt;compatibility priors were being overridden unintentionally,&lt;/li&gt;
&lt;li&gt;recent data drift affected only a subset of campaigns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix wasn’t a new model.&lt;br&gt;
It was correcting signal weighting and drift detection.&lt;br&gt;
Without explainability, the system would have failed silently.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Putting it all together
&lt;/h2&gt;

&lt;p&gt;Explainability is not something you add after ML.&lt;br&gt;
It’s part of the architecture that enables ML to be sustainable.&lt;/p&gt;

&lt;p&gt;It connects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;decision -&amp;gt; reasoning,&lt;/li&gt;
&lt;li&gt;reasoning -&amp;gt; observability,&lt;/li&gt;
&lt;li&gt;observability -&amp;gt; governance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In AI-powered delivery systems, explainability is not a “nice to have”.&lt;br&gt;
It’s what keeps systems trustworthy, auditable, and correctable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Machine learning can optimize decisions.&lt;br&gt;
Explainability ensures that those decisions withstand real-world scrutiny.&lt;/p&gt;

&lt;p&gt;If your system produces answers but cannot explain them, it may look intelligent, but it will eventually fail where it matters most.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.linkedin.com/in/yuriylozinsky/" rel="noopener noreferrer"&gt;Yurii Lozinskyi - AI Delivery Lead &amp;amp; AI Practice Director&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Part 1.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;Building an AI Matching Engine Without Big Tech Resources&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 2.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;AI Matching: Matrix First, Neural Nets Later&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 3.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;From Matrix to Model: When Is It Finally Safe to Train ML?&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 4.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7"&gt;Explainability in AI Is Not a Feature. It’s a Survival Mechanism.&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 5.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6"&gt;When the Matrix Breaks: Failure Modes of Early Matching Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>explainability</category>
      <category>observability</category>
    </item>
    <item>
      <title>From Matrix to Model: When Is It Finally Safe to Train ML?</title>
      <dc:creator>Yurii Lozinskyi</dc:creator>
      <pubDate>Wed, 04 Feb 2026 14:15:55 +0000</pubDate>
      <link>https://forem.com/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3</link>
      <guid>https://forem.com/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3</guid>
      <description>&lt;p&gt;Most teams don’t fail at machine learning because of bad models.&lt;br&gt;&lt;br&gt;
They fail because they try to train models &lt;strong&gt;before the system is ready to learn&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After shipping a matrix-based matching system, the next inevitable question appears:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Okay, when do we finally replace this with ML?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To answer that honestly, we need to step away from abstract ML theory and examine how real systems evolve.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. A concrete scenario: supplier selection in a marketplace
&lt;/h2&gt;

&lt;p&gt;Let’s ground this discussion in a real-world use case.&lt;/p&gt;

&lt;p&gt;Imagine a B2B marketplace that helps companies select service providers — agencies, vendors, or contractors.&lt;br&gt;&lt;br&gt;
The platform sits between two sides with very different expectations.&lt;/p&gt;

&lt;p&gt;On the demand side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some clients care about reputation and risk,&lt;/li&gt;
&lt;li&gt;others prioritize niche expertise,&lt;/li&gt;
&lt;li&gt;others want speed and flexibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the supply side, providers differ in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;size,&lt;/li&gt;
&lt;li&gt;maturity,&lt;/li&gt;
&lt;li&gt;credibility,&lt;/li&gt;
&lt;li&gt;communication style.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At launch, the platform has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no historical performance data,&lt;/li&gt;
&lt;li&gt;no clear notion of “successful” vs “failed” matches,&lt;/li&gt;
&lt;li&gt;no reliable feedback loops.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yet users still expect &lt;strong&gt;reasonable recommendations on day one&lt;/strong&gt;. (Details in &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;Article: Matrix-first matching&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;This is where many teams ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Shouldn’t we train a model?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And this is where most ML-first approaches break.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Why ML fails first in this scenario
&lt;/h2&gt;

&lt;p&gt;Before talking about solutions, it’s important to understand why ML struggles here.&lt;/p&gt;

&lt;p&gt;In early-stage marketplaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a rejected supplier does not mean “bad match”,&lt;/li&gt;
&lt;li&gt;a selected supplier does not mean “good match”,&lt;/li&gt;
&lt;li&gt;outcomes depend on off-platform conversations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From a data perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;labels are weak or missing,&lt;/li&gt;
&lt;li&gt;feedback is delayed or ambiguous,&lt;/li&gt;
&lt;li&gt;user behavior is heavily shaped by defaults and UI ordering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Training a model at this stage doesn’t produce intelligence.&lt;br&gt;&lt;br&gt;
It produces a confident replication of noise.&lt;/p&gt;

&lt;p&gt;The problem isn’t model choice.&lt;br&gt;&lt;br&gt;
The problem is &lt;strong&gt;learning from signals that don’t mean what we think they mean&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Why a compatibility matrix exists in the first place
&lt;/h2&gt;

&lt;p&gt;This is where explicit priors come in.&lt;/p&gt;

&lt;p&gt;Instead of asking the system to &lt;em&gt;learn&lt;/em&gt; relevance, we start by &lt;strong&gt;encoding expectations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conservative enterprise clients expect established suppliers,&lt;/li&gt;
&lt;li&gt;startups often prefer smaller, flexible providers,&lt;/li&gt;
&lt;li&gt;regulated industries prioritize reputation and compliance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These expectations can be expressed explicitly using a small set of stable features.&lt;/p&gt;

&lt;p&gt;A compatibility matrix does exactly that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it encodes domain knowledge,&lt;/li&gt;
&lt;li&gt;enforces product constraints,&lt;/li&gt;
&lt;li&gt;and produces consistent behavior without training data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Importantly, the matrix does &lt;strong&gt;not&lt;/strong&gt; predict outcomes.&lt;br&gt;&lt;br&gt;
It defines what is &lt;em&gt;reasonable&lt;/em&gt;.&lt;/p&gt;
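
&lt;p&gt;As a rough sketch, the expectations listed above could be written down as a nested lookup. The segment names, supplier profiles, and scores here are assumptions made up for illustration; a real matrix would come from domain experts.&lt;/p&gt;

```python
# Illustrative compatibility matrix: client segment x supplier profile -> expected fit in [0, 1].
# Segment names, profile names, and scores are assumptions made up for this sketch.
COMPATIBILITY = {
    "enterprise": {"established": 1.0, "boutique": 0.5, "freelancer": 0.2},
    "startup":    {"established": 0.5, "boutique": 0.9, "freelancer": 0.8},
    "regulated":  {"established": 0.9, "boutique": 0.4, "freelancer": 0.1},
}


def expected_fit(client_segment: str, supplier_profile: str) -> float:
    """Look up how reasonable this pairing is; raises KeyError for unknown values."""
    return COMPATIBILITY[client_segment][supplier_profile]


def rank_profiles(client_segment: str) -> list[str]:
    """Supplier profiles ordered from most to least reasonable for this client."""
    row = COMPATIBILITY[client_segment]
    return sorted(row, key=row.get, reverse=True)


print(rank_profiles("enterprise"))
print(rank_profiles("startup"))
```

&lt;p&gt;Nothing here is learned, so the same input always produces the same ordering, and every score can be traced to one cell that a person chose.&lt;/p&gt;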

&lt;h2&gt;
  
  
  4. The matrix as a stabilizing prior
&lt;/h2&gt;

&lt;p&gt;In the marketplace example, the matrix plays three critical roles.&lt;/p&gt;

&lt;p&gt;First, it enforces constraints.&lt;br&gt;&lt;br&gt;
High-risk suppliers are ranked lower for conservative clients rather than rejected outright.&lt;/p&gt;

&lt;p&gt;Second, it enables explainability.&lt;br&gt;&lt;br&gt;
The system can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This supplier ranks higher because their profile aligns with your request type.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Third — and most importantly — it shapes early behavior.&lt;/p&gt;

&lt;p&gt;Early interactions are not random.&lt;br&gt;&lt;br&gt;
They happen within a controlled decision space.&lt;/p&gt;

&lt;p&gt;That matters because &lt;strong&gt;early behavior becomes future data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If early matches are arbitrary, future training data will be arbitrary too.&lt;/p&gt;
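
&lt;p&gt;The first two roles can be sketched together: a soft multiplier that discourages a pairing without filtering it out, plus an explanation built from the same lookup. The style labels, risk labels, and multipliers below are hypothetical.&lt;/p&gt;

```python
# Sketch: the matrix discourages risky pairings with a multiplier instead of filtering them out,
# and the same lookup yields a human-readable reason. All names and numbers are hypothetical.
RISK_MULTIPLIER = {
    ("conservative", "high_risk"): 0.3,  # discouraged, but still eligible
    ("conservative", "low_risk"): 1.0,
    ("flexible", "high_risk"): 0.8,
    ("flexible", "low_risk"): 1.0,
}


def score_with_reason(base_score: float, client_style: str, supplier_risk: str) -> tuple[float, str]:
    """Apply the soft constraint and return (final score, explanation)."""
    multiplier = RISK_MULTIPLIER[(client_style, supplier_risk)]
    final = round(base_score * multiplier, 3)
    reason = (
        f"{supplier_risk} supplier for a {client_style} client: "
        f"base {base_score} x multiplier {multiplier} = {final}"
    )
    return final, reason


score, why = score_with_reason(0.9, "conservative", "high_risk")
print(score)
print(why)
```

&lt;p&gt;Because the supplier keeps a non-zero score, it can still surface when few alternatives exist, and the reason string is available for every ranking decision.&lt;/p&gt;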

&lt;h2&gt;
  
  
  5. When the system starts producing usable data
&lt;/h2&gt;

&lt;p&gt;Over time, something changes.&lt;/p&gt;

&lt;p&gt;The marketplace now observes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which suppliers were shown,&lt;/li&gt;
&lt;li&gt;which were shortlisted,&lt;/li&gt;
&lt;li&gt;which were contacted,&lt;/li&gt;
&lt;li&gt;which engagements progressed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crucially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;events are logged intentionally,&lt;/li&gt;
&lt;li&gt;success criteria are defined upfront,&lt;/li&gt;
&lt;li&gt;the matching logic remains stable during data collection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the moment many teams miss.&lt;/p&gt;

&lt;p&gt;ML becomes viable not when data exists,&lt;br&gt;&lt;br&gt;
but when data reflects &lt;strong&gt;intentional system behavior&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Data collected accidentally is rarely useful for learning.&lt;/p&gt;
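
&lt;p&gt;One possible shape for intentional logging is sketched below. The field names, the four funnel stages, and the &lt;code&gt;matrix_version&lt;/code&gt; tag are assumptions for illustration; the point is that each event records what the system did, not only what the user did.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical event schema: field names and funnel stages are assumptions for illustration.
FUNNEL_STAGES = ("shown", "shortlisted", "contacted", "progressed")


@dataclass(frozen=True)
class MatchEvent:
    request_id: str
    supplier_id: str
    stage: str            # one of FUNNEL_STAGES
    rank_position: int    # where the supplier appeared in the list (1-based)
    matrix_score: float   # score that produced the ranking
    matrix_version: str   # lets you separate data logged under different logic

    def __post_init__(self) -> None:
        if self.stage not in FUNNEL_STAGES:
            raise ValueError(f"unknown stage: {self.stage}")


def to_log_line(event: MatchEvent) -> str:
    """Serialize one event as a JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)


event = MatchEvent("req-1", "sup-42", "shortlisted", 2, 0.8, "v1")
print(to_log_line(event))
```

&lt;p&gt;Recording the rank position and the matrix version alongside each outcome is what later makes it possible to tell a real preference from an artifact of ordering or of a logic change.&lt;/p&gt;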

&lt;h2&gt;
  
  
  6. From matrix to model: the safe transition
&lt;/h2&gt;

&lt;p&gt;At this stage, teams often expect neural networks to be the next step.&lt;/p&gt;

&lt;p&gt;In practice, they rarely are.&lt;/p&gt;

&lt;p&gt;The first successful transition usually involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;learning-to-rank models,&lt;/li&gt;
&lt;li&gt;gradient-boosted trees,&lt;/li&gt;
&lt;li&gt;or simple linear models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The compatibility matrix does not disappear.&lt;br&gt;&lt;br&gt;
It becomes just another feature.&lt;/p&gt;

&lt;p&gt;The model learns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when the matrix over-penalizes,&lt;/li&gt;
&lt;li&gt;when exceptions occur,&lt;/li&gt;
&lt;li&gt;which signals matter more than expected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ML does not replace judgment.&lt;br&gt;&lt;br&gt;
It &lt;strong&gt;refines it&lt;/strong&gt;.&lt;/p&gt;
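
&lt;p&gt;A minimal sketch of the matrix as "just another feature" in a linear ranker. The feature names and weights are hypothetical and written by hand here; in practice the weights would be fitted on logged outcomes.&lt;/p&gt;

```python
# Sketch: the matrix score is one input among several to a linear ranker.
# Feature names and weights are hypothetical; real weights would be fitted on logged outcomes.
FEATURE_ORDER = ("matrix_score", "response_rate", "past_success_rate")
LEARNED_WEIGHTS = {"matrix_score": 0.6, "response_rate": 0.25, "past_success_rate": 0.15}


def to_feature_vector(features: dict[str, float]) -> list[float]:
    """Fix the feature order so training and serving agree."""
    return [features[name] for name in FEATURE_ORDER]


def linear_rank_score(features: dict[str, float]) -> float:
    """Weighted sum over the feature vector."""
    vector = to_feature_vector(features)
    return sum(LEARNED_WEIGHTS[name] * value for name, value in zip(FEATURE_ORDER, vector))


# The matrix is lukewarm on supplier B, but its observed behavior is strong.
supplier_a = {"matrix_score": 0.9, "response_rate": 0.2, "past_success_rate": 0.1}
supplier_b = {"matrix_score": 0.6, "response_rate": 0.9, "past_success_rate": 0.9}

print(round(linear_rank_score(supplier_a), 3))
print(round(linear_rank_score(supplier_b), 3))
```

&lt;p&gt;With these illustrative weights, supplier B outranks supplier A despite the lower matrix score. This is the kind of exception the matrix alone cannot express.&lt;/p&gt;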

&lt;h2&gt;
  
  
  7. Why synthetic data doesn’t fix this problem
&lt;/h2&gt;

&lt;p&gt;Some teams try to accelerate learning by generating synthetic data.&lt;/p&gt;

&lt;p&gt;In marketplaces — especially B2B or regulated ones — this is dangerous.&lt;/p&gt;

&lt;p&gt;Synthetic data assumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;known distributions,&lt;/li&gt;
&lt;li&gt;known success criteria,&lt;/li&gt;
&lt;li&gt;known user behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Early-stage systems have none of that.&lt;/p&gt;

&lt;p&gt;A model trained on synthetic outcomes optimizes for imagined users.&lt;br&gt;&lt;br&gt;
That’s worse than using a matrix.&lt;/p&gt;

&lt;p&gt;The matrix, while imperfect, stays honest.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. The full evolution path, revisited
&lt;/h2&gt;

&lt;p&gt;In this marketplace scenario, a healthy evolution looks like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — Explicit priors&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Matrix-based compatibility and explainable defaults.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — Instrumentation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Structured logging, defined outcomes, and feedback loops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — Hybrid ranking&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
ML learns residuals while the matrix remains a prior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 — ML dominance&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Models lead; matrices constrain edge cases.&lt;/p&gt;

&lt;p&gt;Skipping phases doesn’t accelerate this process.&lt;br&gt;&lt;br&gt;
It breaks it.&lt;/p&gt;
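
&lt;p&gt;The four phases can be sketched as one scoring function whose blend weight depends on the phase. The weights and the guardrail threshold are illustrative assumptions, not recommended values.&lt;/p&gt;

```python
from typing import Optional

# Illustrative blend weights per phase; Phase 2 adds logging, not a model, so its weight is 0.
MODEL_WEIGHT_BY_PHASE = {1: 0.0, 2: 0.0, 3: 0.3, 4: 0.9}
GUARDRAIL_FLOOR = 0.2  # below this matrix score, the model may not promote a match


def phase_score(phase: int, matrix_score: float, model_score: Optional[float] = None) -> float:
    """Blend matrix and model according to the current phase."""
    weight = MODEL_WEIGHT_BY_PHASE[phase]
    if model_score is None or weight == 0.0:
        return matrix_score  # matrix only
    blended = (1.0 - weight) * matrix_score + weight * model_score
    if GUARDRAIL_FLOOR > matrix_score:
        return min(blended, matrix_score)  # the matrix constrains edge cases
    return blended


print(phase_score(1, 0.8, 0.4))
print(round(phase_score(3, 0.8, 0.4), 3))
print(round(phase_score(4, 0.8, 0.4), 3))
print(phase_score(4, 0.1, 0.95))
```

&lt;p&gt;In the last call the model is enthusiastic about a pairing the matrix considers unreasonable, and the guardrail keeps the score at the matrix value.&lt;/p&gt;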

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;In real marketplaces, especially high-stakes or regulated ones,&lt;br&gt;&lt;br&gt;
ML is not the starting engine.&lt;/p&gt;

&lt;p&gt;It’s the turbocharger.&lt;/p&gt;

&lt;p&gt;If your system behaves sensibly &lt;em&gt;before&lt;/em&gt; you train a model, you’re not behind.&lt;br&gt;
You’re building the only kind of foundation that machine learning can learn from.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.linkedin.com/in/yuriylozinsky/" rel="noopener noreferrer"&gt;Yurii Lozinskyi - AI Delivery Lead &amp;amp; AI Practice Director&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Part 1.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;Building an AI Matching Engine Without Big Tech Resources&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 2.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;AI Matching: Matrix First, Neural Nets Later&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 3.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;From Matrix to Model: When Is It Finally Safe to Train ML?&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 4.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7"&gt;Explainability in AI: Not a Feature, but a Vital Mechanism&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 5.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6"&gt;When the Matrix Breaks: Failure Modes of Early Matching Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>matching</category>
      <category>architecture</category>
      <category>marketplaces</category>
    </item>
    <item>
      <title>AI Matching: Matrix First, Neural Nets Later</title>
      <dc:creator>Yurii Lozinskyi</dc:creator>
      <pubDate>Sat, 31 Jan 2026 23:37:27 +0000</pubDate>
      <link>https://forem.com/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280</link>
      <guid>https://forem.com/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280</guid>
      <description>&lt;h2&gt;
  
  
  How to get day-one relevance when you don't have data (and probably never did)
&lt;/h2&gt;

&lt;p&gt;Everyone wants an "AI-powered matching engine".&lt;/p&gt;

&lt;p&gt;In practice, this usually means one thing:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"We'll train a neural network and let it figure things out."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That sounds reasonable until you ask the first uncomfortable question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where exactly will the training data come from?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article is about that gap between ambition and reality.&lt;br&gt;&lt;br&gt;
It's about building matching systems &lt;strong&gt;before&lt;/strong&gt; you have Big Data, feedback loops, or ML infrastructure - and still delivering relevance from day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The real business problem: "Where do we get data to train a neural network?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's start with the problem most teams avoid articulating clearly.&lt;/p&gt;

&lt;p&gt;Neural networks do not fail because they are bad.&lt;br&gt;&lt;br&gt;
They fail because &lt;strong&gt;they need data that doesn't exist yet&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To train a meaningful matching model, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;historical matches&lt;/li&gt;
&lt;li&gt;outcomes (success/failure)
&lt;/li&gt;
&lt;li&gt;user behavior (clicks, acceptances, conversions)&lt;/li&gt;
&lt;li&gt;enough volume to avoid overfitting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Early-stage systems have none of that.&lt;/p&gt;

&lt;p&gt;This creates a paradox:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you need good matching to get users&lt;/li&gt;
&lt;li&gt;you need users to get data&lt;/li&gt;
&lt;li&gt;you need data to train matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams quietly ignore this and ship:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;random relevance&lt;/li&gt;
&lt;li&gt;overconfident AI labels&lt;/li&gt;
&lt;li&gt;or brittle rule engines disguised as "ML"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not a technical issue.&lt;br&gt;&lt;br&gt;
That's a &lt;strong&gt;product and architecture problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. A concrete use case: choosing the right marketing channel or agency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To make this tangible, let's define a clear use case.&lt;/p&gt;

&lt;p&gt;Imagine a company launching a new marketing campaign.&lt;br&gt;&lt;br&gt;
They want to choose &lt;strong&gt;the right advertising channel, agency, or influencer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Their constraints are realistic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;limited budget&lt;/li&gt;
&lt;li&gt;brand reputation at stake&lt;/li&gt;
&lt;li&gt;unclear expectations about what will work&lt;/li&gt;
&lt;li&gt;no historical performance data &lt;em&gt;in this exact setup&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the supply side (channels, agencies, influencers), you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different levels of reach&lt;/li&gt;
&lt;li&gt;different credibility&lt;/li&gt;
&lt;li&gt;different risk profiles&lt;/li&gt;
&lt;li&gt;different communication styles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The business question is not:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Which option is statistically similar to this campaign?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The real question is:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;"Which option best fits the expectations and constraints of this campaign?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's a &lt;strong&gt;compatibility problem&lt;/strong&gt;, not a similarity problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Why "just train a neural network" doesn't work here&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this point, someone usually says:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Let's just embed everything and train a model later."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That works only if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you already have outcomes&lt;/li&gt;
&lt;li&gt;you already have labels&lt;/li&gt;
&lt;li&gt;you already have scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our use case, you don't.&lt;/p&gt;

&lt;p&gt;Trying to use neural networks here leads to one of three failures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model overfits on tiny data&lt;/li&gt;
&lt;li&gt;The model outputs noise that looks confident
&lt;/li&gt;
&lt;li&gt;The team disables the model "temporarily", and it never comes back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The real issue is not lack of ML talent.&lt;br&gt;&lt;br&gt;
It's that &lt;strong&gt;the system has no prior understanding of what "fit" means&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So you need a prior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Reframing the problem: similarity vs compatibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the key conceptual shift.&lt;/p&gt;

&lt;p&gt;Most ML tooling is built around &lt;strong&gt;similarity&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cosine similarity&lt;/li&gt;
&lt;li&gt;Euclidean distance&lt;/li&gt;
&lt;li&gt;nearest neighbors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarity answers:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"How alike are these two things?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But matching in business systems rarely asks that question.&lt;/p&gt;

&lt;p&gt;Instead, it asks:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;"How appropriate is this option for this context?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's compatibility.&lt;/p&gt;

&lt;p&gt;Compatibility is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;asymmetric&lt;/li&gt;
&lt;li&gt;expectation-driven&lt;/li&gt;
&lt;li&gt;domain-specific&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it can be expressed &lt;strong&gt;explicitly&lt;/strong&gt;, without pretending to learn it from non-existent data.&lt;/p&gt;
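
&lt;p&gt;A small sketch of the difference, assuming toy vectors and a made-up directional lookup. Cosine similarity gives the same answer whichever way you ask; a compatibility lookup does not have to. The pair names and scores are hypothetical.&lt;/p&gt;

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Symmetric by construction: swapping a and b changes nothing."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical directional expectations: "how appropriate is the option for this context?"
APPROPRIATE_FOR = {
    ("corporate_campaign", "nano_influencer"): 0.2,  # the campaign rarely wants a nano voice
    ("nano_influencer", "corporate_campaign"): 0.9,  # the influencer would welcome the campaign
}


def compatibility(context: str, option: str) -> float:
    """Asymmetric: the score depends on who is asking."""
    return APPROPRIATE_FOR[(context, option)]


u, v = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(cosine_similarity(u, v) == cosine_similarity(v, u))
print(compatibility("corporate_campaign", "nano_influencer"))
print(compatibility("nano_influencer", "corporate_campaign"))
```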

&lt;p&gt;&lt;strong&gt;5. Solution: Compatibility Matrix (feature matrix, not ML)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now we get to the core idea.&lt;/p&gt;

&lt;p&gt;Instead of trying to &lt;em&gt;learn&lt;/em&gt; relevance, we &lt;strong&gt;encode domain knowledge as a matrix&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We define two small, stable feature spaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Campaign side&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;blog_type ∈ { corporate, brand_voice, expert, personal }&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how formal the communication should be&lt;/li&gt;
&lt;li&gt;how much authority is expected&lt;/li&gt;
&lt;li&gt;how much personal storytelling is acceptable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Supply side (agency / influencer / channel)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;social_status ∈ { celebrity, macro, micro, nano }&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;perceived authority&lt;/li&gt;
&lt;li&gt;reach expectations&lt;/li&gt;
&lt;li&gt;risk tolerance&lt;/li&gt;
&lt;li&gt;credibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now we define a compatibility matrix:&lt;br&gt;&lt;br&gt;
&lt;code&gt;compatibility[blog_type][social_status] → score ∈ [0..1]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This matrix answers:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Given this campaign style, how appropriate is this level of authority?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It is not a guess.&lt;br&gt;&lt;br&gt;
It is a &lt;strong&gt;product hypothesis&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Example: a simple 4×4 compatibility matrix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's make this concrete.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;           | celebrity | macro | micro | nano
-----------|-----------|-------|-------|------
corporate  | 1.0       | 0.8   | 0.4   | 0.2
brand_voice| 0.7       | 1.0   | 0.8   | 0.5
expert     | 0.6       | 0.9   | 1.0   | 0.7
personal   | 0.3       | 0.6   | 0.9   | 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Compatibility Matrix lookup (Day 1 matching)
&lt;/span&gt;&lt;span class="n"&gt;matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corporate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;brand_voice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;expert&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;personal&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;matrix_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;influencer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;O(1) lookup, 1000s of RPS without problems&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;influencers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;celebrity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;macro&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;micro&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nano&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;influencers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;influencer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;matrix&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Production usage
&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;matrix_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;corporate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;macro&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 0.8 ✅
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Corporate ↔ Macro: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this represents in business terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Corporate campaigns prioritize authority and low risk&lt;/li&gt;
&lt;li&gt;Personal storytelling thrives with relatable, smaller voices&lt;/li&gt;
&lt;li&gt;Expert campaigns value credibility over raw reach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Important clarification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These numbers are &lt;strong&gt;relative&lt;/strong&gt;, not absolute&lt;/li&gt;
&lt;li&gt;They don't predict success&lt;/li&gt;
&lt;li&gt;They define &lt;strong&gt;expected fit&lt;/strong&gt;, not outcomes&lt;/li&gt;
&lt;/ul&gt;
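
&lt;p&gt;Because the numbers only express relative fit, two cheap checks are worth having: that the matrix is complete and within range, and what ordering each row implies. The sketch below reuses the 4×4 values from the table above; the helper names &lt;code&gt;validate&lt;/code&gt; and &lt;code&gt;ranked_statuses&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
STATUSES = ["celebrity", "macro", "micro", "nano"]
MATRIX = {
    "corporate":   [1.0, 0.8, 0.4, 0.2],
    "brand_voice": [0.7, 1.0, 0.8, 0.5],
    "expert":      [0.6, 0.9, 1.0, 0.7],
    "personal":    [0.3, 0.6, 0.9, 1.0],
}


def validate(matrix: dict[str, list[float]]) -> None:
    """Fail fast if a row is incomplete or a score falls outside [0, 1]."""
    for blog_type, row in matrix.items():
        if len(row) != len(STATUSES):
            raise ValueError(f"{blog_type}: expected {len(STATUSES)} scores, got {len(row)}")
        for score in row:
            if score > 1.0 or 0.0 > score:
                raise ValueError(f"{blog_type}: score {score} is outside [0, 1]")


def ranked_statuses(blog_type: str) -> list[str]:
    """Statuses ordered by expected fit; only the order is meaningful."""
    pairs = sorted(zip(STATUSES, MATRIX[blog_type]), key=lambda pair: pair[1], reverse=True)
    return [status for status, _ in pairs]


validate(MATRIX)
print(ranked_statuses("expert"))
print(ranked_statuses("corporate"))
```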

&lt;p&gt;&lt;strong&gt;7. Why this works without data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this stage, a reasonable question arises:&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Isn't this just hard-coded logic?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes, and that's exactly the point.&lt;/p&gt;

&lt;p&gt;But it's &lt;strong&gt;structured&lt;/strong&gt;, &lt;strong&gt;graded&lt;/strong&gt;, and &lt;strong&gt;explicit&lt;/strong&gt;, unlike:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;binary rules&lt;/li&gt;
&lt;li&gt;if/else chains&lt;/li&gt;
&lt;li&gt;or fake ML&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A compatibility matrix gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic behavior&lt;/li&gt;
&lt;li&gt;explainable decisions&lt;/li&gt;
&lt;li&gt;controllable bias&lt;/li&gt;
&lt;li&gt;and stable early relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, it gives the system &lt;strong&gt;a worldview&lt;/strong&gt; before data exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. How this evolves into machine learning (without rewrites)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach is not anti-ML.&lt;br&gt;&lt;br&gt;
It's &lt;strong&gt;pre-ML&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As the system runs, you naturally collect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which matches were shortlisted&lt;/li&gt;
&lt;li&gt;which were accepted&lt;/li&gt;
&lt;li&gt;which led to engagement or conversion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, the transition is incremental.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Matrix only&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;score = compatibility_matrix[blog_type][social_status]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2: Hybrid&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;score = 0.7 * matrix_score + 0.3 * nn_prediction&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Phase 2: Matrix 70% + NN 30%
&lt;/span&gt;&lt;span class="n"&gt;matrix_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
&lt;span class="n"&gt;nn_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 0.75
&lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;matrix_score&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;nn_score&lt;/span&gt;  &lt;span class="c1"&gt;# 0.785
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 3: ML-dominant&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;code&gt;score = nn_prediction&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The matrix never disappears.&lt;br&gt;&lt;br&gt;
It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a baseline&lt;/li&gt;
&lt;li&gt;a regularizer&lt;/li&gt;
&lt;li&gt;a fallback for cold start&lt;/li&gt;
&lt;/ul&gt;
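
&lt;p&gt;A sketch of the "fallback for cold start" role, assuming the model signals a missing prediction with &lt;code&gt;None&lt;/code&gt;. The function name &lt;code&gt;hybrid_score&lt;/code&gt; and the default weight are illustrative; the 0.7/0.3 split simply mirrors the Phase 2 formula above.&lt;/p&gt;

```python
from typing import Optional


def hybrid_score(matrix_score: float, nn_prediction: Optional[float], nn_weight: float = 0.3) -> float:
    """Blend matrix and model; fall back to the matrix when the model has no prediction."""
    if nn_prediction is None:  # cold start: nothing learned yet for this pairing
        return matrix_score
    return (1.0 - nn_weight) * matrix_score + nn_weight * nn_prediction


print(round(hybrid_score(0.8, 0.75), 3))                  # Phase 2 blend
print(hybrid_score(0.8, None))                            # cold start, matrix only
print(round(hybrid_score(0.8, 0.75, nn_weight=1.0), 3))   # Phase 3, model only
```

&lt;p&gt;Moving between phases then becomes a change to &lt;code&gt;nn_weight&lt;/code&gt; rather than a rewrite, and new campaign types or suppliers still get a sensible score.&lt;/p&gt;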

&lt;p&gt;This is how production systems actually grow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Why this gives you day-one relevance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest hidden risk in matching systems is &lt;strong&gt;irrelevance at launch&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If users see poor matches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they don't interact&lt;/li&gt;
&lt;li&gt;you don't collect data&lt;/li&gt;
&lt;li&gt;your ML roadmap dies before it starts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A compatibility matrix avoids that trap.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reasonable defaults&lt;/li&gt;
&lt;li&gt;behavior aligned with business expectations&lt;/li&gt;
&lt;li&gt;trust from users&lt;/li&gt;
&lt;li&gt;and data that actually reflects intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without pretending you have Big Data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Day 1: 100% matrix, no training data needed
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_matches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;suppliers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.6&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;supplier&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;suppliers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;matrix_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;campaign_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;supplier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;min_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;supplier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Real metrics: 47 suppliers → 12 matches → 3% conversion
# O(n) scoring plus a sort of the matches, 1000s RPS, zero cold start
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Final takeaway&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If there's one idea worth remembering:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Similarity is a mathematical concept.&lt;br&gt;&lt;br&gt;
Compatibility is a business concept.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Neural networks are excellent at learning similarity, but only&lt;br&gt;&lt;br&gt;
&lt;em&gt;after&lt;/em&gt; the world gives you data.&lt;/p&gt;

&lt;p&gt;Compatibility matrices let you act &lt;strong&gt;before&lt;/strong&gt; that moment arrives.&lt;/p&gt;

&lt;p&gt;Matrix first.&lt;br&gt;&lt;br&gt;
Neural nets later.&lt;/p&gt;

&lt;p&gt;That's not a compromise. That's how real matching systems survive long enough to learn.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.linkedin.com/in/yuriylozinsky/" rel="noopener noreferrer"&gt;Yurii Lozinskyi - AI Delivery Lead &amp;amp; AI Practice Director&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Part 1.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;Building an AI Matching Engine Without Big Tech Resources&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 2.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;AI Matching: Matrix First, Neural Nets Later&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 3.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;From Matrix to Model: When Is It Finally Safe to Train ML?&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 4.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7"&gt;Explainability in AI: Not a Feature, but a Vital Mechanism&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 5.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6"&gt;When the Matrix Breaks: Failure Modes of Early Matching Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aiml</category>
      <category>matching</category>
      <category>marketplaces</category>
      <category>product</category>
    </item>
    <item>
      <title>Building an AI Matching Engine Without Big Tech Resources</title>
      <dc:creator>Yurii Lozinskyi</dc:creator>
      <pubDate>Fri, 09 Jan 2026 22:32:25 +0000</pubDate>
      <link>https://forem.com/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490</link>
      <guid>https://forem.com/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490</guid>
      <description>&lt;h2&gt;
  
  
  &lt;a href="https://pairfect.io" rel="noopener noreferrer"&gt;Pairfect IO&lt;/a&gt; Case Study + Practical Framework
&lt;/h2&gt;

&lt;p&gt;Most people think matching in marketplaces is just filters + sorting.  &lt;/p&gt;

&lt;p&gt;It isn’t.&lt;br&gt;
Matching is architecture. It's the mechanism that decides who should meet whom.&lt;br&gt;&lt;br&gt;
When matching fails, the entire marketplace collapses — no UX, no design, and no advertising budget can save it.&lt;/p&gt;

&lt;p&gt;This post is about how we built an AI-powered matching engine for &lt;strong&gt;Pairfect IO&lt;/strong&gt;, a marketplace connecting brands with influencers — &lt;strong&gt;without&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training data
&lt;/li&gt;
&lt;li&gt;behavioral signals
&lt;/li&gt;
&lt;li&gt;feedback loops
&lt;/li&gt;
&lt;li&gt;GPUs
&lt;/li&gt;
&lt;li&gt;ML ops stacks
&lt;/li&gt;
&lt;li&gt;Pinecone/Milvus/Weaviate
&lt;/li&gt;
&lt;li&gt;and without 200 ML engineers like LinkedIn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything ran on &lt;strong&gt;PostgreSQL + pgvector&lt;/strong&gt;, with explainability, determinism, and an evolution path.&lt;br&gt;&lt;br&gt;
If you're building a marketplace and need matching that works &lt;strong&gt;before you have Big Tech data&lt;/strong&gt; — this is for you.&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;Why Matching Is Harder Than It Looks&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Matching looks trivial from the outside. But production-grade matching is an &lt;strong&gt;outcome-driven system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Take LinkedIn. Their matching works because it learns from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;applications&lt;/li&gt;
&lt;li&gt;acceptance rates&lt;/li&gt;
&lt;li&gt;recruiter behavior&lt;/li&gt;
&lt;li&gt;network overlap&lt;/li&gt;
&lt;li&gt;engagement signals&lt;/li&gt;
&lt;li&gt;retention data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: LinkedIn doesn’t “guess relevance”. It &lt;strong&gt;learns relevance from outcomes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now contrast that with a seed-stage marketplace.&lt;/p&gt;

&lt;p&gt;Pairfect started with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no labeled data&lt;/li&gt;
&lt;li&gt;no behavioral data&lt;/li&gt;
&lt;li&gt;no interactions&lt;/li&gt;
&lt;li&gt;no click-through signals&lt;/li&gt;
&lt;li&gt;no embeddings graph&lt;/li&gt;
&lt;li&gt;no GPUs&lt;/li&gt;
&lt;li&gt;Postgres as the only accepted infra&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Completely different world.&lt;/p&gt;

&lt;p&gt;Yet a common mistake early teams make is trying to copy Big Tech architecture without Big Tech data. It doesn’t work.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Real Beginning: Constraints, Not Models
&lt;/h2&gt;

&lt;p&gt;Most teams begin matching by asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Which ML model should we use?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We started by asking a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What constraints make certain architectures impossible?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Below is a simplified version of our real constraint table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Self-funded&lt;/td&gt;
&lt;td&gt;No GPUs, no distributed systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Must run on Postgres&lt;/td&gt;
&lt;td&gt;Matching logic must be SQL-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No labels&lt;/td&gt;
&lt;td&gt;No LTR, no two-tower training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU only&lt;/td&gt;
&lt;td&gt;Lightweight embeddings only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MVP in 3 months&lt;/td&gt;
&lt;td&gt;Simple &amp;gt; complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need explainability&lt;/td&gt;
&lt;td&gt;No black-box ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sparse metadata&lt;/td&gt;
&lt;td&gt;Must extract from text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Minimal DevOps&lt;/td&gt;
&lt;td&gt;No vector DB clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table &lt;strong&gt;was the architecture&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Before we wrote a single line of code, we knew what we &lt;strong&gt;couldn’t&lt;/strong&gt; build.&lt;/p&gt;

&lt;p&gt;And ironically, that saved Pairfect, a self-funded startup.&lt;/p&gt;
&lt;h2&gt;
  
  
  Defining What “Good Match” Means (Critical &amp;amp; Often Missed)
&lt;/h2&gt;

&lt;p&gt;You cannot architect matching until you define what a &lt;strong&gt;good match means in your domain&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For LinkedIn, a “good match” means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;hired + retained&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For Pairfect, a “good match” meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic fit between campaign &amp;amp; influencer&lt;/li&gt;
&lt;li&gt;audience expectation alignment&lt;/li&gt;
&lt;li&gt;tone compatibility&lt;/li&gt;
&lt;li&gt;price compatibility&lt;/li&gt;
&lt;li&gt;content format alignment&lt;/li&gt;
&lt;li&gt;worldview alignment (yes, that matters for creators)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team cannot answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What constitutes a good match here?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;then any discussion of embeddings vs. rules vs. transformers is premature.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why We Didn’t Go Straight for SOTA Models
&lt;/h2&gt;

&lt;p&gt;We evaluated the standard architectural options. Most didn’t survive the constraint filter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Why Not (At MVP Stage)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rules-only&lt;/td&gt;
&lt;td&gt;Too rigid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pure embeddings&lt;/td&gt;
&lt;td&gt;Too noisy without deterministic anchors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM ranking&lt;/td&gt;
&lt;td&gt;Too slow + expensive on CPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning-to-Rank&lt;/td&gt;
&lt;td&gt;Needs labeled data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-tower&lt;/td&gt;
&lt;td&gt;Needs training data + GPUs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collaborative filtering&lt;/td&gt;
&lt;td&gt;Needs behavior data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph models&lt;/td&gt;
&lt;td&gt;Needs graph maturity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That left one viable category:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hybrid Matching&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not because it's “cool” — but because it’s &lt;strong&gt;appropriate for the stage&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  &lt;strong&gt;The Architecture: Hybrid Matching in Practice&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Our hybrid pipeline looked like this:&lt;/p&gt;

&lt;p&gt;Hard Filters → One-Hot Signals → Embeddings → Rank Fusion → Top-K&lt;/p&gt;

&lt;p&gt;Breakdown:&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1. Hard Filters&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Eliminate impossible cases upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;price&lt;/li&gt;
&lt;li&gt;language&lt;/li&gt;
&lt;li&gt;content format&lt;/li&gt;
&lt;li&gt;region&lt;/li&gt;
&lt;li&gt;campaign type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This removes obviously incompatible candidates before any scoring happens, so later stages never have to rank noise.&lt;/p&gt;

&lt;p&gt;Example (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;influencers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;language&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'en'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'eu'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt; &lt;span class="o"&gt;@&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'video'&lt;/span&gt;&lt;span class="p"&gt;]::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
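
&lt;p&gt;Hard filters only stay cheap if Postgres can answer them from indexes. A minimal sketch, assuming a reduced schema (the column types and index names here are illustrative, not Pairfect’s actual ones):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Assumed minimal schema; a no-op if an influencers table already exists.
CREATE TABLE IF NOT EXISTS influencers (
  influencer_id bigint PRIMARY KEY,
  price         numeric NOT NULL,
  language      text    NOT NULL,
  region        text    NOT NULL,
  format        text[]  NOT NULL
);

-- Equality columns first, range column (price) last,
-- so a single B-tree can serve the whole scalar part of the filter.
CREATE INDEX IF NOT EXISTS influencers_lang_region_price_idx
  ON influencers (language, region, price);

-- GIN supports the array containment operator (@&amp;gt;) used for content formats.
CREATE INDEX IF NOT EXISTS influencers_format_gin_idx
  ON influencers USING gin (format);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At MVP scale a sequential scan may be just as fast; running EXPLAIN on the filter query is the quickest way to check whether the indexes are worth keeping.&lt;/p&gt;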



&lt;h3&gt;
  
  
  &lt;strong&gt;2. One-Hot Signals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Encode domain knowledge explicitly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tone&lt;/li&gt;
&lt;li&gt;niche&lt;/li&gt;
&lt;li&gt;vertical&lt;/li&gt;
&lt;li&gt;channel&lt;/li&gt;
&lt;li&gt;creative style&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents “semantic nonsense” (e.g., matching a financial brand with a prank channel).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
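&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Assumes a campaigns table with id, tone, and vertical columns; 42 is a placeholder id.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;influencer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tone&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;tone_match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vertical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vertical&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vertical_match&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;influencers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;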
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;3. Embeddings&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We generated embeddings for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bios&lt;/li&gt;
&lt;li&gt;captions&lt;/li&gt;
&lt;li&gt;descriptions&lt;/li&gt;
&lt;li&gt;LLM summaries&lt;/li&gt;
&lt;/ul&gt;

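&lt;p&gt;We stored the embeddings in pgvector columns and compared them by cosine similarity. pgvector’s &amp;lt;=&amp;gt; operator returns cosine distance, so similarity is 1 minus that value.&lt;br&gt;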
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
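&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Assumes a campaigns table with id and bio_embedding columns; 42 is a placeholder id.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;influencer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bio_embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bio_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;semantic_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;influencers&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;campaigns&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;campaign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;semantic_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;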
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
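
&lt;p&gt;For completeness, here is a hedged sketch of the pgvector setup such a query relies on. The demo table name, the 3-dimensional vectors, and the index name are assumptions chosen to keep the example readable; a real embedding model will use hundreds of dimensions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical demo table; the dimension must match your embedding model.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS influencer_embeddings_demo (
  influencer_id bigint PRIMARY KEY,
  bio_embedding vector(3)
);

INSERT INTO influencer_embeddings_demo VALUES
  (1, '[0.9, 0.1, 0.0]'),
  (2, '[0.1, 0.9, 0.0]'),
  (3, '[0.7, 0.3, 0.1]')
ON CONFLICT DO NOTHING;

-- HNSW index with the cosine operator class (requires pgvector 0.5.0 or later).
CREATE INDEX IF NOT EXISTS influencer_embeddings_demo_hnsw
  ON influencer_embeddings_demo USING hnsw (bio_embedding vector_cosine_ops);

-- Ordering by the distance operator itself (ascending) lets the planner use the index;
-- the similarity score is still reported as 1 - distance.
SELECT influencer_id,
       1 - (bio_embedding &amp;lt;=&amp;gt; '[1, 0, 0]'::vector) AS semantic_score
FROM influencer_embeddings_demo
ORDER BY bio_embedding &amp;lt;=&amp;gt; '[1, 0, 0]'::vector
LIMIT 2;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a few thousand rows an exact scan is usually fast enough on CPU, so the index can reasonably wait until the table grows.&lt;/p&gt;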



&lt;h3&gt;
  
  
  &lt;strong&gt;4. Rank Fusion (RRF)&lt;/strong&gt;
&lt;/h3&gt;

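&lt;p&gt;This was surprisingly powerful.&lt;/p&gt;

&lt;p&gt;Reciprocal Rank Fusion (RRF) allowed us to merge multiple ranking signals into one stable ranking &lt;strong&gt;without training&lt;/strong&gt;. Each candidate’s fused score is the sum of its reciprocal ranks across all signals:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Score = Σ 1 / (k + rank_i)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here rank_i is the candidate’s position in the i-th ranking, and k is a smoothing constant (60 in the example below) that dampens the influence of the very top positions.&lt;/p&gt;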

&lt;p&gt;Example (simplified in SQL/CTE form):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;influencer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;semantic_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;tone_match&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;vertical_match&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;r3&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;influencer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;r3&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
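
&lt;p&gt;To see the fusion at work, here is a self-contained sketch with three made-up candidates. The scores are invented for illustration. One deliberate difference from the simplified query above: the binary signals use RANK() instead of ROW_NUMBER(), so candidates that tie on a 0/1 signal share a rank rather than being ordered arbitrarily.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative data only; in practice candidates come from the earlier pipeline stages.
WITH candidates (influencer_id, semantic_score, tone_match, vertical_match) AS (
  VALUES (1, 0.82, 1, 0),
         (2, 0.79, 1, 1),
         (3, 0.91, 0, 0)
),
ranked AS (
  SELECT influencer_id,
         -- influencer_id breaks ties so the ordering is deterministic
         ROW_NUMBER() OVER (ORDER BY semantic_score DESC, influencer_id) AS r1,
         RANK() OVER (ORDER BY tone_match DESC)     AS r2,
         RANK() OVER (ORDER BY vertical_match DESC) AS r3
  FROM candidates
)
SELECT influencer_id, r1, r2, r3,
       1.0 / (60 + r1) + 1.0 / (60 + r2) + 1.0 / (60 + r3) AS final_score
FROM ranked
ORDER BY final_score DESC, influencer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The fused order is 2, 1, 3. Influencer 3 has the highest semantic score but ends up last because it misses both explicit signals, which is exactly the anchoring effect the one-hot layer is meant to provide.&lt;/p&gt;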



&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no ML pipeline&lt;/li&gt;
&lt;li&gt;consistent behavior&lt;/li&gt;
&lt;li&gt;explainable scoring&lt;/li&gt;
&lt;li&gt;cheap to compute&lt;/li&gt;
&lt;li&gt;resistant to noisy embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Top-K Output&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Return a shortlist, not an infinite scroll.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Top 10 most compatible influencers
+ explanation layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;not&lt;/strong&gt; personalization; it is &lt;strong&gt;decision support&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Everything Ran on PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Our entire matching system ran on:&lt;/p&gt;

&lt;p&gt;PostgreSQL + pgvector + CPU&lt;/p&gt;

&lt;p&gt;Reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;infra should reduce risk, not increase it&lt;/li&gt;
&lt;li&gt;one system &amp;gt; five microservices&lt;/li&gt;
&lt;li&gt;fewer moving parts = fewer failures&lt;/li&gt;
&lt;li&gt;debugging in SQL is fast &amp;amp; deterministic&lt;/li&gt;
&lt;li&gt;product iteration &amp;gt; infra optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hot take:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;infra is not tooling, infra is liability&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Especially at the MVP stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Explainability Was a Feature, Not a Nice-to-Have&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We built full explainability into the matching layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why this recommendation&lt;/li&gt;
&lt;li&gt;which signals contributed&lt;/li&gt;
&lt;li&gt;how fusion scored them&lt;/li&gt;
&lt;li&gt;what would disqualify it&lt;/li&gt;
&lt;li&gt;how to override&lt;/li&gt;
&lt;/ul&gt;
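
&lt;p&gt;Because the fused score is a plain sum, each signal’s share can be returned alongside the result. A minimal sketch, assuming the per-signal ranks are already computed (the rank values and JSON key names below are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Illustrative ranks; in practice they come from the rank fusion step.
WITH ranked (influencer_id, r1, r2, r3) AS (
  VALUES (2, 3, 1, 1),
         (1, 2, 1, 2)
)
SELECT influencer_id,
       jsonb_build_object(
         'semantic', jsonb_build_object('rank', r1, 'contribution', round(1.0 / (60 + r1), 5)),
         'tone',     jsonb_build_object('rank', r2, 'contribution', round(1.0 / (60 + r2), 5)),
         'vertical', jsonb_build_object('rank', r3, 'contribution', round(1.0 / (60 + r3), 5))
       ) AS explanation
FROM ranked;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The resulting JSON can be shown in the UI or logged next to each recommendation, which answers “which signals contributed” without any extra tooling.&lt;/p&gt;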

&lt;p&gt;Trust matters in early marketplaces.&lt;/p&gt;

&lt;p&gt;LinkedIn can hide behind a black box.&lt;br&gt;&lt;br&gt;
Startups cannot.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution Path (Critical CTO Work)
&lt;/h2&gt;

&lt;p&gt;Founders often ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Will hybrid scale forever?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No. And it doesn’t need to.&lt;/p&gt;

&lt;p&gt;Our planned evolution path looked like this:&lt;/p&gt;

&lt;p&gt;Hybrid → Behavioral Signals → LTR → Two-Tower → Graph → RL → Agents&lt;/p&gt;

&lt;p&gt;Where each step unlocks the next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hybrid gives usable matching on Day 1&lt;/li&gt;
&lt;li&gt;behavior gives labels&lt;/li&gt;
&lt;li&gt;labels enable LTR&lt;/li&gt;
&lt;li&gt;scale enables encoders&lt;/li&gt;
&lt;li&gt;graph enables multi-objective optimization&lt;/li&gt;
&lt;li&gt;RL enables personalization&lt;/li&gt;
&lt;li&gt;agents enable reasoning&lt;/li&gt;
&lt;/ul&gt;
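
&lt;p&gt;The “behavior gives labels” step only works if outcomes are recorded from the start. One possible shape for that log, as a sketch: the table, its columns, and the outcome names are hypothetical, not Pairfect’s actual schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical table: one row per recommendation shown to a brand.
CREATE TABLE IF NOT EXISTS match_events (
  event_id      bigserial   PRIMARY KEY,
  campaign_id   bigint      NOT NULL,
  influencer_id bigint      NOT NULL,
  shown_rank    integer     NOT NULL,   -- position in the Top-K list
  final_score   numeric     NOT NULL,   -- fused score at serving time
  outcome       text        NOT NULL DEFAULT 'shown'
                CHECK (outcome IN ('shown', 'shortlisted', 'contacted', 'hired')),
  created_at    timestamptz NOT NULL DEFAULT now()
);

-- Later, outcomes can be mapped to graded relevance labels for Learning-to-Rank.
SELECT campaign_id,
       influencer_id,
       CASE outcome
         WHEN 'hired'       THEN 3
         WHEN 'contacted'   THEN 2
         WHEN 'shortlisted' THEN 1
         ELSE 0
       END AS relevance_label
FROM match_events;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Storing the shown rank and the score at serving time matters: without them it is hard to separate “this match was bad” from “this match was never seen”.&lt;/p&gt;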

&lt;p&gt;This is how &lt;strong&gt;marketplace intelligence actually grows&lt;/strong&gt; in the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Lessons
&lt;/h2&gt;

&lt;p&gt;Three lessons emerged from building Pairfect:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lesson 1&lt;/strong&gt; — Matching is not a model problem; it’s a business constraint problem &lt;br&gt;
&lt;strong&gt;Lesson 2&lt;/strong&gt; — Appropriate complexity wins at the MVP stage. Over-engineering extends time-to-market &lt;br&gt;
&lt;strong&gt;Lesson 3&lt;/strong&gt; — You don’t need Big Tech architecture without Big Tech data&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The goal is not to replicate LinkedIn.&lt;br&gt;&lt;br&gt;
The goal is to build a system &lt;strong&gt;honest about your stage&lt;/strong&gt; and &lt;strong&gt;prepared to evolve&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you’re building something similar
&lt;/h2&gt;

&lt;p&gt;Happy to discuss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;marketplace matching&lt;/li&gt;
&lt;li&gt;ranking architectures&lt;/li&gt;
&lt;li&gt;hybrid systems&lt;/li&gt;
&lt;li&gt;pgvector setups&lt;/li&gt;
&lt;li&gt;evolution paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DMs open.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://www.linkedin.com/in/yuriylozinsky/" rel="noopener noreferrer"&gt;Yurii Lozinskyi - AI Delivery Lead &amp;amp; AI Practice Director&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Part 1.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/building-an-ai-matching-engine-without-big-tech-resources-4490"&gt;Building an AI Matching Engine Without Big Tech Resources&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 2.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/ai-matching-matrix-first-neural-nets-later-2280"&gt;AI Matching: Matrix First, Neural Nets Later&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 3.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/from-matrix-to-model-when-is-it-finally-safe-to-train-ml-46o3"&gt;From Matrix to Model: When Is It Finally Safe to Train ML?&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 4.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/explainability-in-ai-is-not-a-feature-its-a-survival-mechanism-5al7"&gt;Explainability in AI: Not a Feature, but a Vital Mechanism&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Part 5.&lt;/strong&gt; &lt;a href="https://dev.to/yurii_lozinskyi/when-the-matrix-breaks-failure-modes-of-early-matching-systems-4nm6"&gt;When the Matrix Breaks: Failure Modes of Early Matching Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>marketplaces</category>
      <category>matching</category>
    </item>
  </channel>
</rss>
