<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: adaboese</title>
    <description>The latest articles on Forem by adaboese (@adaboese).</description>
    <link>https://forem.com/adaboese</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1196379%2Fa9898d50-fe92-4793-aaa7-ee8071544a11.jpg</url>
      <title>Forem: adaboese</title>
      <link>https://forem.com/adaboese</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/adaboese"/>
    <language>en</language>
    <item>
      <title>AI Content is Not Short-Term Arbitrage</title>
      <dc:creator>adaboese</dc:creator>
      <pubDate>Thu, 08 Feb 2024 14:34:45 +0000</pubDate>
      <link>https://forem.com/adaboese/ai-content-is-not-short-term-arbitrage-2dg6</link>
      <guid>https://forem.com/adaboese/ai-content-is-not-short-term-arbitrage-2dg6</guid>
      <description>&lt;p&gt;Whether you like it or not, AI is going to change the way we create content.&lt;/p&gt;

&lt;p&gt;Recently, there has been a flood of naysayers who claim that AI-generated content is short-term arbitrage. Like this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ahrefs.com/blog/ai-content-is-short-term-arbitrage/"&gt;https://ahrefs.com/blog/ai-content-is-short-term-arbitrage/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have a lot of respect for Ryan, but he is taking too narrow a view of the potential of AI-generated content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Content was never the moat
&lt;/h2&gt;

&lt;p&gt;First, let's address the elephant in the room: your writing style is not a moat. Even with today's capabilities, given a sufficient amount of direction, AI can mimic your writing style to a T. The moat has always been the data and unique insights. The content is just a way to get the data and insights to the users. This will remain true even when AI-generated content becomes the norm.&lt;/p&gt;

&lt;p&gt;The real question is whether AI-generated content can produce unique insights and data. The answer is yes, but not yet. We are getting there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Short-term arbitrage
&lt;/h2&gt;

&lt;p&gt;It is hard to talk about AI content without mentioning the recent "penalization" of websites that used AI-generated content. The "&lt;a href="https://twitter.com/jakezward/status/1728032634037567509"&gt;SEO heist&lt;/a&gt;" is the one that got the most attention, for both the rapid rise and fall of the website. So let's talk about it.&lt;/p&gt;

&lt;p&gt;Every single instance of a website getting "penalized" for use of AI content happened in the context of outputting &lt;em&gt;thousands&lt;/em&gt; of articles, getting a temporary boost, and then sinking. This comes as no surprise. Google &lt;a href="https://searchengineland.com/how-google-search-ranking-works-pandu-nayak-435395"&gt;publicly disclosed&lt;/a&gt; that they use multi-model evaluation algorithms to establish how to rank content. The first stage is a naive/cheap algorithm that basically gives any website the benefit of the doubt. I suppose at the time they never expected anyone to be adding thousands of articles... well, those thousands of articles are getting the benefit of the doubt across many keywords. It is no surprise, then, that once they enter the verification stage (which includes a broader spectrum of variables, including bounce rate, time on site, backlinks, and whatnot), the websites tank. So... don't do that. Build organically over time.&lt;/p&gt;

&lt;p&gt;Relatedly, contrary to what everyone on Reddit wants you to think, no one at Google is manually penalizing your website just because your tweet about how you got to the top using AI went viral. And if they do, you will be notified of it through the Google Search Console &lt;a href="https://support.google.com/webmasters/answer/9044175?hl=en"&gt;Manual Actions report&lt;/a&gt;, but the reason won't be "because you've used AI".&lt;/p&gt;

&lt;p&gt;Just based on common sense, Google wants only 3 things out of your content:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;factually accurate content&lt;/li&gt;
&lt;li&gt;unique insights or new data points&lt;/li&gt;
&lt;li&gt;content that users recognize as valuable [based on engagement, backlinks, etc]&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Achieve these 3 things and it won't matter whether you use a USD 100/hour copywriter, a USD 15/hour copywriter, or AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The state of AI content generation
&lt;/h2&gt;

&lt;p&gt;Most "AI content generators" are not yet where they would tick all 3 boxes, esp. if we compare to a good human copywriter. However, even if we are not there yet, we will be there very soon. Like, within 12 months at most. Not because of advancements in Large Language Models (LLMs) to be clear. I think LLMs are already there. What needs to catch up is orchestration of content generation.&lt;/p&gt;

&lt;p&gt;LLMs are already very capable of analyzing text and even drawing logical conclusions from provided data. They are good at storytelling. They are good at spotting logical inconsistencies, etc.&lt;/p&gt;

&lt;p&gt;What they are not good at is doing all of this in one go. The statistical model just doesn't have the capability to think that far ahead or to correct itself after output has been sent to users. This is why &lt;a href="https://www.promptingguide.ai/techniques/rag"&gt;Retrieval-Augmented Generation&lt;/a&gt; (RAG) and multi-stage output generation are necessary to produce something that resembles human-level output.&lt;/p&gt;

&lt;p&gt;Just to give a high-level idea, &lt;a href="https://aimd.app"&gt;AIMD&lt;/a&gt; uses 20+ different APIs to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;research the topic&lt;/li&gt;
&lt;li&gt;identify the most important data points about the topic&lt;/li&gt;
&lt;li&gt;aggregate supporting data&lt;/li&gt;
&lt;li&gt;validate the data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this work is done before even starting to write the outline of the article. These APIs include Google Search Console, "people also ask", keyword research tools, and many more.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am intentionally being vague about the APIs used beyond the obvious ones, because I don't want to give away the secret sauce.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then, as it writes, it keeps fact-checking everything against the context of the greater article. Once it is done, it goes through each section again, looking for logical inconsistencies, style inconsistencies, etc.&lt;/p&gt;

&lt;p&gt;The biggest downsides of this approach are that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;it takes a long time to generate an article&lt;/li&gt;
&lt;li&gt;it costs a lot more&lt;/li&gt;
&lt;li&gt;it is a lot more unstable (&lt;em&gt;many&lt;/em&gt; things tend to break when talking to so many services)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But all 3 of those problems are solvable. So it is just a matter of time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to spot a bad AI content generator?
&lt;/h2&gt;

&lt;p&gt;I curated a list of the AI content generators that are currently available. I have 25 on the list at the moment, and I keep a close eye on all of them.&lt;/p&gt;

&lt;p&gt;Without even looking at the output, the biggest tell-tale signs of a bad AI content generator are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;real-time output&lt;/strong&gt; – if the output is streamed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cheap price&lt;/strong&gt; – if you are paying less than ~1 USD per article&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fast output&lt;/strong&gt; – if the output is generated in a few minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of these 3 are true, then you are likely just paying for a ChatGPT wrapper. You get what you pay for.&lt;/p&gt;

&lt;p&gt;Not surprisingly, content produced using a zero-shot approach is going to include logical inconsistencies, style inconsistencies, and factual inaccuracies. It is not going to be able to reference its sources or make compelling arguments.&lt;/p&gt;

&lt;p&gt;Just as a benchmark, I am working &lt;em&gt;really&lt;/em&gt; hard to get AIMD to produce articles in under 10 minutes. And I am not there yet. The data queries alone take a good chunk of time. The first draft is usually cut by a good 40-60% for the sake of fact-checking and logical consistency. And then it goes through a few more iterations of fact-checking and logical-consistency checks.&lt;/p&gt;

&lt;p&gt;To be clear, I am not claiming that &lt;a href="https://aimd.app"&gt;https://aimd.app&lt;/a&gt; is there yet. I believe AIMD is the most advanced AI content generator out there, but even then you would be shooting yourself in the foot if you were to use its output without human editing. However, I am confident that we will be there within 12 months, and that then the moat will be the unique data and perspectives that you can provide to the AI, not the "packaging" of the content.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contentwriting</category>
    </item>
    <item>
      <title>Using Vector Embeddings to Overengineer 404 pages</title>
      <dc:creator>adaboese</dc:creator>
      <pubDate>Wed, 17 Jan 2024 15:55:30 +0000</pubDate>
      <link>https://forem.com/adaboese/using-vector-embeddings-to-overengineer-404-pages-47b1</link>
      <guid>https://forem.com/adaboese/using-vector-embeddings-to-overengineer-404-pages-47b1</guid>
      <description>&lt;p&gt;After spending a significant amount of time working with vector embeddings, I've started to see more and more use cases for them for every day problems. One of the most interesting ones I've seen is using vector embeddings to find the page that the user was looking for when they hit a 404 page.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a vector embedding?
&lt;/h2&gt;

&lt;p&gt;A vector embedding is a way to represent a word or phrase as a vector of numbers. This is useful because it allows us to do math on words and phrases. For example, we can find the word that is closest to another word by finding the one with the smallest distance between the two vectors.&lt;/p&gt;

&lt;p&gt;The Financial Times has a great &lt;a href="https://ig.ft.com/generative-ai/"&gt;interactive article&lt;/a&gt; that explains vector embeddings in more detail.&lt;/p&gt;
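&lt;p&gt;To make the "math on words" idea concrete, here is a minimal sketch using hand-made 2D vectors. The values are purely illustrative; real embedding models produce vectors with hundreds of dimensions:&lt;/p&gt;

```typescript
// Toy 2D "embeddings" (hypothetical values; real models output hundreds of dimensions).
const toyEmbeddings: { [word: string]: number[] } = {
  cat: [0.9, 0.1],
  dog: [0.85, 0.2],
  car: [0.1, 0.9],
};

// Euclidean distance between two vectors of equal length.
const distance = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));

// Find the word whose vector is closest to the query word's vector.
const closestWord = (query: string): string => {
  const queryVector = toyEmbeddings[query];
  let bestWord = '';
  let bestDistance = Infinity;
  for (const word of Object.keys(toyEmbeddings)) {
    if (word === query) {
      continue;
    }
    const d = distance(queryVector, toyEmbeddings[word]);
    if (bestDistance > d) {
      bestDistance = d;
      bestWord = word;
    }
  }
  return bestWord;
};
```

&lt;p&gt;With these toy values, "cat" comes out closest to "dog" rather than "car", which is exactly the kind of nearest-neighbor lookup the 404 trick relies on.&lt;/p&gt;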

&lt;h2&gt;
  
  
  How can we use vector embeddings to find the page that the user was looking for?
&lt;/h2&gt;

&lt;p&gt;We can use vector embeddings to find the page that the user was looking for by finding the page with the smallest distance between its vector and the vector of the user's query. In the context of a 404 page, the user's query is the URL that they were trying to access.&lt;/p&gt;

&lt;p&gt;It is surprisingly simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;we need to create a database of all the pages on our site&lt;/li&gt;
&lt;li&gt;we need to create a vector embedding for each page URL&lt;/li&gt;
&lt;li&gt;we need to create a vector embedding for the user's query&lt;/li&gt;
&lt;li&gt;we need to find the page with the smallest distance between the vector of the page and the vector of the user's query&lt;/li&gt;
&lt;/ol&gt;
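&lt;p&gt;The 4 steps above can be sketched end-to-end in a self-contained way. The &lt;code&gt;embed&lt;/code&gt; function below is a hypothetical stand-in (a simple letter-frequency counter) for a real embedding model such as Supabase/gte-small, and the page list reuses URLs from this site:&lt;/p&gt;

```typescript
// Hypothetical stand-in for a real embedding model: counts letter frequencies.
// A real setup would call a model such as Supabase/gte-small instead.
const embed = (text: string): number[] => {
  const vector = new Array(26).fill(0);
  for (const char of text.toLowerCase()) {
    const index = char.charCodeAt(0) - 97;
    if (index >= 0) {
      if (26 > index) {
        vector[index] += 1;
      }
    }
  }
  return vector;
};

// Cosine similarity: 1 means identical direction, 0 means unrelated.
const cosineSimilarity = (a: number[], b: number[]): number => {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; a.length > i; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

// Steps 1 and 2: build the "database" of page vectors.
const pages = [
  'https://aimd.app/',
  'https://aimd.app/blog',
  'https://aimd.app/blog/2024-01-15-top-seo-trends-for-2024-what-should-you-focus-on',
];
const database = pages.map((loc) => ({ loc, vector: embed(loc) }));

// Steps 3 and 4: embed the user's query and pick the most similar page.
const findNearestPage = (query: string): string => {
  const queryVector = embed(query);
  let bestLoc = '';
  let bestScore = -Infinity;
  for (const entry of database) {
    const score = cosineSimilarity(queryVector, entry.vector);
    if (score > bestScore) {
      bestScore = score;
      bestLoc = entry.loc;
    }
  }
  return bestLoc;
};
```

&lt;p&gt;Swapping the stand-in &lt;code&gt;embed&lt;/code&gt; for a real model is the only change needed: near-miss URLs (typos, renamed slugs) then land close to the page they were meant to reach.&lt;/p&gt;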

&lt;p&gt;In the case of AIMD, I am doing this all in-memory, but you could also do it in a database (e.g. &lt;a href="https://www.pinecone.io/"&gt;Pinecone&lt;/a&gt;). It all depends on how much data you have and how much compute you have available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deciding on a vector embedding model
&lt;/h2&gt;

&lt;p&gt;The first step is to decide on a vector embedding model. I am using &lt;a href="https://huggingface.co/Supabase/gte-small"&gt;Supabase/gte-small&lt;/a&gt; because it is a small model and it &lt;a href="https://huggingface.co/spaces/mteb/leaderboard"&gt;outperforms&lt;/a&gt; OpenAI's &lt;a href="https://platform.openai.com/docs/guides/embeddings/types-of-embedding-models"&gt;&lt;code&gt;text-embedding-ada-002&lt;/code&gt; model&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I wrote this abstraction that creates a vector embedding for a given text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;pipeline&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@xenova/transformers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;feature-extraction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Supabase/gte-small&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;pooling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mean&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;float32&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Expected embedding type to be float32&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a database of all the pages on our site
&lt;/h2&gt;

&lt;p&gt;The next step is to create a database of all the pages on our site.&lt;/p&gt;

&lt;p&gt;Let's assume that we have an array of all the pages on our site:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;SitemapEntry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;staticPages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SitemapEntry&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://aimd.app/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://aimd.app/blog&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://aimd.app/blog/2024-01-15-top-seo-trends-for-2024-what-should-you-focus-on&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://aimd.app/blog/2024-01-07-maximizing-article-visibility-understanding-and-applying-e-e-a-t-in-seo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then create a database of all the pages on our site by creating a vector embedding for each page URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;DatabaseEntry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Metadata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DatabaseEntry&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;staticPages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;entries&lt;/code&gt; is now a database of all the pages on our site.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding the page with the smallest distance between the vector of the page and the vector of the user's query
&lt;/h2&gt;

&lt;p&gt;The last step is to find the page with the smallest distance between the vector of the page and the vector of the user's query.&lt;/p&gt;

&lt;p&gt;Let's assume that we have a user's &lt;code&gt;query&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://aimd.app/blog/2024-01-17-using-vector-embeddings-to-overengineer-404-pages&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we need to create a vector embedding for the user's query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryVector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we need a way to measure how close two vectors are. For this, we can use &lt;a href="https://en.wikipedia.org/wiki/Cosine_similarity"&gt;cosine similarity&lt;/a&gt; (a higher value means the vectors are more similar):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;similarity&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compute-cosine-similarity&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we can find the page whose vector is most similar to the vector of the user's query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;closestEntry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;closestEntry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;closestEntry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;closestEntry&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;closestEntry.entry&lt;/code&gt; is now the page whose URL is most similar to the one the user was looking for.&lt;/p&gt;

&lt;p&gt;The best part is that this does not even need to be the exact page that the user was looking for, e.g. in case the page was removed. It will be whichever page has the most similar URL to the one the user was looking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using with Remix
&lt;/h2&gt;

&lt;p&gt;Just to complete the example, here is how you would use this with &lt;a href="https://remix.run/"&gt;Remix&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/routes/$.tsx&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;MetaFunction&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@remix-run/node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useLoaderData&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@remix-run/react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;LoaderFunctionArgs&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@remix-run/server-runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;findNearestUrl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#app/services/sitemap.server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MetaFunction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;404&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="nx"&gt;LoaderFunctionArgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nearestUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;findNearestUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;nearestUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useLoaderData&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;loader&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;404&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
       &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;Were you looking for &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nearestUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nearestUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;?&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;p&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;Route&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your 404 page will suggest the page that the user was most likely looking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of this in the wild
&lt;/h2&gt;

&lt;p&gt;Here is how that looks once deployed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aimd.app/blog/2024-01-07-maximizing-article-visibility-e-e-a-t-in-seo"&gt;2024-01-07-maximizing-article-visibility-e-e-a-t-in-seo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aimd.app/blog/2024-01-07-maximizing-article-visibility-understanding-applying-e-e-a-t-in-seo"&gt;2024-01-07-maximizing-article-visibility-understanding-applying-e-e-a-t-in-seo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aimd.app/blog/2024-01-07-maximizing-article-visibility-understanding-and-applying-eeat-in-seo"&gt;2024-01-07-maximizing-article-visibility-understanding-and-applying-eeat-in-seo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though none of these pages exist, they all produce a 404 page that links to the &lt;a href="https://aimd.app/blog/2024-01-07-maximizing-article-visibility-understanding-and-applying-e-e-a-t-in-seo"&gt;correct page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In practice, this type of hint will be most useful for pages that were removed or renamed, e.g. I have accidentally introduced numerous 404s on this site by changing the dates of the posts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do we even need 404 pages?
&lt;/h2&gt;

&lt;p&gt;This is a bit of a tangent, but I think it is worth mentioning that we might not even need 404 pages. Instead, we could just redirect the user to the page that they were looking for.&lt;/p&gt;

&lt;p&gt;Realistically, the only reason we have 404 pages is that we don't know what the user was looking for. But if we can use vector embeddings to find the page that the user was looking for, then we can just redirect them to that page.&lt;/p&gt;
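&lt;p&gt;Assuming the nearest-page lookup also returns its similarity score, the redirect-vs-404 decision could be sketched as a small policy function. The &lt;code&gt;0.9&lt;/code&gt; threshold and the result shape here are assumptions for illustration, not anything AIMD actually ships:&lt;/p&gt;

```typescript
type MatchResult = {
  url: string;
  similarity: number;
};

// Hypothetical policy: redirect only on a confident match, otherwise fall back
// to a 404 page that merely suggests the nearest URL.
const resolveMissingPage = (match: MatchResult) => {
  if (match.similarity > 0.9) {
    // 302 rather than 301, so a wrong guess is not cached permanently.
    return { status: 302, location: match.url };
  }

  return { status: 404, suggestion: match.url };
};
```

&lt;p&gt;Using a temporary (302) redirect keeps a bad guess from being cached by browsers and crawlers while the threshold is being tuned against real traffic.&lt;/p&gt;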

&lt;p&gt;I will be experimenting with this on AIMD in the future.&lt;/p&gt;

</description>
      <category>seo</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
