<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Siddharth Patel</title>
    <description>The latest articles on Forem by Siddharth Patel (@siddharth_patel_a86ce5ef5).</description>
    <link>https://forem.com/siddharth_patel_a86ce5ef5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3650156%2F87294242-9c19-473b-b3a6-127c255f7c50.png</url>
      <title>Forem: Siddharth Patel</title>
      <link>https://forem.com/siddharth_patel_a86ce5ef5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/siddharth_patel_a86ce5ef5"/>
    <language>en</language>
    <item>
      <title>Programmatic SEO Without Spam: A Scalable Framework That Actually Works</title>
      <dc:creator>Siddharth Patel</dc:creator>
      <pubDate>Mon, 15 Dec 2025 05:26:46 +0000</pubDate>
      <link>https://forem.com/siddharth_patel_a86ce5ef5/programmatic-seo-without-spam-a-scalable-framework-that-actually-works-311</link>
      <guid>https://forem.com/siddharth_patel_a86ce5ef5/programmatic-seo-without-spam-a-scalable-framework-that-actually-works-311</guid>
      <description>&lt;h2&gt;
  
  
  The Programmatic SEO Paradox: Scale vs. Quality
&lt;/h2&gt;

&lt;p&gt;Let's start with an uncomfortable truth: most programmatic SEO fails. Not because the methodology is flawed, but because 95% of implementations prioritize quantity over quality, creating the very content farms Google's algorithms were designed to destroy.&lt;/p&gt;

&lt;p&gt;But done right, when you combine systematic scale with genuine value, programmatic SEO becomes your unfair advantage. It lets you outmaneuver better-resourced competitors, dominate niche spaces, and build a sustainable organic moat.&lt;/p&gt;

&lt;p&gt;This guide isn't about churning out 10,000 thin pages. It's about building a quality-first programmatic system that creates genuinely helpful content at scale while maintaining, or even enhancing, your site's authority.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 1: What Programmatic SEO Actually Is (And Isn't)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Core Principle&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Programmatic SEO is creating content through systems, not just individual effort. It combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data analysis&lt;/li&gt;
&lt;li&gt;Template-based content creation&lt;/li&gt;
&lt;li&gt;Automated publishing&lt;/li&gt;
&lt;li&gt;Systematic optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What It's NOT:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not AI-generated spam&lt;/li&gt;
&lt;li&gt;Not duplicate content with swapped keywords&lt;/li&gt;
&lt;li&gt;Not doorway pages&lt;/li&gt;
&lt;li&gt;Not thin affiliate sites&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What It IS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalable content creation based on real user needs&lt;/li&gt;
&lt;li&gt;Data-driven topic selection&lt;/li&gt;
&lt;li&gt;Consistent quality through templates&lt;/li&gt;
&lt;li&gt;Efficient resource allocation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part 2: The Quality-First Framework
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Phase 1: Foundation Audit (Don't Scale a Broken Base)
&lt;/h4&gt;

&lt;p&gt;Before writing a single programmatic page, answer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Does your site already demonstrate E-E-A-T?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Established expertise in your niche?&lt;/li&gt;
&lt;li&gt;Quality backlink profile?&lt;/li&gt;
&lt;li&gt;Strong user engagement signals?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Do you have the technical foundation?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast hosting and CDN?&lt;/li&gt;
&lt;li&gt;Clean site architecture?&lt;/li&gt;
&lt;li&gt;Proper internal linking?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Do you have at least 5-10 truly excellent, manually created "pillar" pages?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These serve as your quality benchmark.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;If you answered "no" to any of these, fix them first.&lt;/strong&gt; Programmatic SEO amplifies what you have; it doesn't fix foundational issues.&lt;/p&gt;

&lt;h4&gt;
  
  
  Phase 2: Strategic Data Collection
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Identify Your Data Source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;unique&lt;/strong&gt;, &lt;strong&gt;proprietary&lt;/strong&gt;, or &lt;strong&gt;hard-to-access&lt;/strong&gt; data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Internal data&lt;/strong&gt;: Customer usage patterns, support ticket analysis, feature adoption rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curated data&lt;/strong&gt;: Manual research compiled into structured datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API data&lt;/strong&gt;: Public data processed with unique insights&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community data&lt;/strong&gt;: Aggregated user experiences or reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;: A project management tool might analyze:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10,000+ projects to identify "most common workflow bottlenecks"&lt;/li&gt;
&lt;li&gt;Time tracking data to show "optimal meeting duration by team size"&lt;/li&gt;
&lt;li&gt;Integration usage patterns to reveal "most valuable app combinations"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Structure for Scalable Insights&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your data should allow for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Comparison&lt;/strong&gt; (Tool A vs. Tool B)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Categorization&lt;/strong&gt; (By use case, industry, team size)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filtering&lt;/strong&gt; (By price, feature, integration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trend analysis&lt;/strong&gt; (Over time, by region)&lt;/li&gt;
&lt;/ul&gt;
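&lt;p&gt;To make this concrete, here is a minimal sketch of a dataset structured for the first three of those operations (trend analysis works the same way over a date field). The records, field names, and helpers are illustrative, not from a real product:&lt;/p&gt;

```python
# Illustrative structured dataset supporting comparison, categorization,
# and filtering; "tool", "category", "price", "year" are invented fields.
from collections import defaultdict

records = [
    {"tool": "Tool A", "category": "startup", "price": 12, "year": 2024},
    {"tool": "Tool B", "category": "startup", "price": 29, "year": 2024},
    {"tool": "Tool A", "category": "enterprise", "price": 49, "year": 2025},
]

def compare(records, tool_a, tool_b, field):
    """Comparison: average value of one field for two tools."""
    def avg(tool):
        vals = [r[field] for r in records if r["tool"] == tool]
        return sum(vals) / len(vals)
    return {tool_a: avg(tool_a), tool_b: avg(tool_b)}

def categorize(records, field):
    """Categorization: group records by any field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return dict(groups)

def filter_by(records, **criteria):
    """Filtering: keep records matching every criterion."""
    return [r for r in records
            if all(r[k] == v for k, v in criteria.items())]

print(compare(records, "Tool A", "Tool B", "price"))
print(sorted(categorize(records, "category")))
print(filter_by(records, year=2024))
```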

&lt;h4&gt;
  
  
  Phase 3: Template Design That Doesn't Look Templated
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;The 70/30 Rule&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;70% standardized content&lt;/strong&gt; (consistent structure, data presentation, formatting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30% unique value&lt;/strong&gt; (insights, analysis, commentary, specific examples)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Template Components That Add Value:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Introduction (Unique for each page)
   - Specific problem this page solves
   - Why this specific variation matters
   - Who it's specifically for

2. Data Presentation (Structured)
   - Comparison tables with sortable columns
   - Charts/graphs where appropriate
   - Key metrics clearly highlighted

3. Analysis Section (Unique)
   - What the data actually means
   - Surprising findings
   - Practical implications

4. Actionable Recommendations (Contextual)
   - Specific next steps based on the data
   - Tools/resources that help
   - Common pitfalls to avoid

5. Related Considerations (Dynamic)
   - Related but different scenarios
   - Edge cases worth mentioning
   - Future trends to watch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
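&lt;p&gt;One way to keep templates honest about the 70/30 split is to make the unique-value sections mandatory at render time. A hedged sketch, with hypothetical slot names:&lt;/p&gt;

```python
# Sketch of the 70/30 rule: a fixed page template (the standardized 70%)
# whose unique-value slots (the 30%) must be filled before rendering.
# Slot names are illustrative, not from the article.
from string import Template

PAGE_TEMPLATE = Template("""\
## $title

$intro

### The Data
$data_table

### What It Means
$analysis

### Next Steps
$recommendations
""")

def render_page(fields):
    """Refuse to render unless every unique-value slot is filled."""
    required_unique = ("intro", "analysis", "recommendations")
    missing = [k for k in required_unique if not fields.get(k, "").strip()]
    if missing:
        raise ValueError(f"unique sections missing: {missing}")
    return PAGE_TEMPLATE.substitute(fields)

page = render_page({
    "title": "Tool A vs Tool B",
    "intro": "Teams of 5-20 switching from spreadsheets...",
    "data_table": "| Feature | Tool A | Tool B |",
    "analysis": "Tool A wins on integrations; Tool B on price.",
    "recommendations": "Start with Tool B if budget is the constraint.",
})
print(page.splitlines()[0])
```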



&lt;p&gt;&lt;strong&gt;Avoiding the "Template Look":&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vary sentence structure within sections&lt;/li&gt;
&lt;li&gt;Use different data visualizations (tables, charts, timelines)&lt;/li&gt;
&lt;li&gt;Include unique images/screenshots where possible&lt;/li&gt;
&lt;li&gt;Add relevant anecdotes or mini-case studies&lt;/li&gt;
&lt;li&gt;Change section order based on importance&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Phase 4: The Production Pipeline
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Data Processing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw Data → Clean → Analyze → Structure → Enrich
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enrichment examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add difficulty scores&lt;/li&gt;
&lt;li&gt;Include popularity trends&lt;/li&gt;
&lt;li&gt;Append cost analysis&lt;/li&gt;
&lt;li&gt;Calculate time estimates&lt;/li&gt;
&lt;/ul&gt;
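&lt;p&gt;A toy version of that pipeline, with invented enrichment rules (a difficulty score and a time-estimate label):&lt;/p&gt;

```python
# Sketch of the Raw -> Clean -> Analyze -> Structure -> Enrich pipeline.
# The rows and the scoring rules are invented examples of "enrichment".

raw = [
    {"task": " set up sso ", "minutes": 90},
    {"task": "invite team", "minutes": 10},
    {"task": "", "minutes": 5},          # junk row to be cleaned out
]

def clean(rows):
    """Strip whitespace and drop rows with no task name."""
    rows = [dict(r, task=r["task"].strip()) for r in rows]
    return [r for r in rows if r["task"]]

def enrich(rows):
    """Append derived fields: a difficulty score and a time estimate."""
    for r in rows:
        r["difficulty"] = "hard" if r["minutes"] >= 60 else "easy"
        r["estimate"] = f'~{max(1, round(r["minutes"] / 30))} x 30min blocks'
    return rows

pipeline = enrich(clean(raw))
print(pipeline)
```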

&lt;p&gt;&lt;strong&gt;Step 2: Content Generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Human-in-the-loop workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System generates&lt;/strong&gt; first draft using templates + data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editor reviews&lt;/strong&gt; for coherence and insight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expert adds&lt;/strong&gt; unique commentary/analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality check&lt;/strong&gt; against pillar page standards&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Quality Gates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every page must pass:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Originality check&lt;/strong&gt;: Minimum 30% unique content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth threshold&lt;/strong&gt;: Minimum 800 words (unless data-heavy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value assessment&lt;/strong&gt;: Would this help someone make a decision?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-E-A-T alignment&lt;/strong&gt;: Does it demonstrate expertise?&lt;/li&gt;
&lt;/ul&gt;
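&lt;p&gt;The first two gates can be partially automated. The sketch below approximates the 30% originality check by comparing page text against the template's boilerplate vocabulary; the thresholds come from the list above, but the heuristic itself is an assumption:&lt;/p&gt;

```python
# Approximate automation of the originality and depth gates. The
# word-overlap heuristic is a stand-in for a real originality checker.

def passes_quality_gates(page_text, template_text, data_heavy=False):
    words = page_text.split()
    template_words = set(template_text.split())
    unique_words = [w for w in words if w not in template_words]

    originality = len(unique_words) / max(1, len(words))
    depth_ok = data_heavy or len(words) >= 800
    return {
        "originality_ok": originality >= 0.30,  # minimum 30% unique
        "depth_ok": depth_ok,                   # minimum 800 words
    }

template = "overview data analysis next steps"
page = " ".join(["data"] * 500 + ["insight"] * 400)
print(passes_quality_gates(page, template))
```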

&lt;h4&gt;
  
  
  Phase 5: Publishing Architecture
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;URL Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/use-case/[specific-variation]/
/compare/[tool-a]-vs-[tool-b]/
/industry/[industry]-[solution]/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Internal Linking Strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each programmatic page links to 2-3 pillar pages&lt;/li&gt;
&lt;li&gt;Pillar pages link to relevant programmatic pages&lt;/li&gt;
&lt;li&gt;Related programmatic pages interlink&lt;/li&gt;
&lt;li&gt;Maintain silo structure by topic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;XML Sitemap Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate sitemap for programmatic pages&lt;/li&gt;
&lt;li&gt;Regular updates as new pages publish&lt;/li&gt;
&lt;li&gt;Priority scoring based on quality metrics&lt;/li&gt;
&lt;/ul&gt;
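&lt;p&gt;Generating that separate sitemap is straightforward with the standard library; the URLs and quality scores here are placeholders:&lt;/p&gt;

```python
# Build a dedicated sitemap for programmatic pages, with priority derived
# from a (hypothetical) quality score. Uses only the stdlib.
import xml.etree.ElementTree as ET

pages = [
    {"loc": "https://example.com/compare/tool-a-vs-tool-b/", "quality": 0.9},
    {"loc": "https://example.com/use-case/remote-teams/", "quality": 0.6},
]

def build_sitemap(pages):
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for p in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = p["loc"]
        # Priority scoring based on quality metrics, with a 0.1 floor.
        ET.SubElement(url, "priority").text = f'{max(0.1, p["quality"]):.1f}'
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap(pages)
print(sitemap_xml[:60])
```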

&lt;h3&gt;
  
  
  Part 3: Real-World Examples (Without the Spam)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Example 1: SaaS Comparison Engine
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Traditional (Spammy) Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;500 "X vs Y" pages with swapped keywords&lt;/li&gt;
&lt;li&gt;Thin content (&amp;lt;300 words)&lt;/li&gt;
&lt;li&gt;No unique insights&lt;/li&gt;
&lt;li&gt;Obvious affiliate bias&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quality Programmatic Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data source&lt;/strong&gt;: Actual user reviews + feature analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template includes&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Real pros/cons from user data&lt;/li&gt;
&lt;li&gt;Integration compatibility matrix&lt;/li&gt;
&lt;li&gt;Pricing breakdown by team size&lt;/li&gt;
&lt;li&gt;Migration difficulty scoring&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Unique value&lt;/strong&gt;: "Based on analysis of 142 teams who switched..."&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Page count&lt;/strong&gt;: 50 highly comprehensive comparisons&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Example 2: Local Service Pages
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Traditional (Spammy) Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Plumber in [City]" for 1,000 cities&lt;/li&gt;
&lt;li&gt;Identical content with city names swapped&lt;/li&gt;
&lt;li&gt;Fake testimonials&lt;/li&gt;
&lt;li&gt;No local expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quality Programmatic Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data source&lt;/strong&gt;: Local licensing boards + review analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template includes&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Actual licensing requirements for that area&lt;/li&gt;
&lt;li&gt;Average pricing based on local data&lt;/li&gt;
&lt;li&gt;Common local issues (e.g., "old pipes in historic districts")&lt;/li&gt;
&lt;li&gt;Real local business hours/patterns&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Unique value&lt;/strong&gt;: "Unlike [Neighboring City], here you need..."&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Page count&lt;/strong&gt;: Only for areas you actually serve&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Example 3: Calculator/Resource Pages
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Traditional (Spammy) Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generic calculators with ads&lt;/li&gt;
&lt;li&gt;No explanation of formulas&lt;/li&gt;
&lt;li&gt;Thin supporting content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quality Programmatic Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data source&lt;/strong&gt;: Industry benchmarks + academic research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template includes&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Interactive calculator with multiple scenarios&lt;/li&gt;
&lt;li&gt;Formula explanation with assumptions&lt;/li&gt;
&lt;li&gt;Industry comparison data&lt;/li&gt;
&lt;li&gt;Actionable interpretation of results&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Unique value&lt;/strong&gt;: "Why standard calculations fail for [specific scenario]"&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Supporting content&lt;/strong&gt;: Detailed methodology page&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part 4: Quality Control Systems
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Automated Quality Metrics
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Track for every programmatic page:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Engagement Thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum time on page: 90 seconds&lt;/li&gt;
&lt;li&gt;Maximum bounce rate: 60%&lt;/li&gt;
&lt;li&gt;Minimum scroll depth: 50%&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Performance Metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core Web Vitals compliance&lt;/li&gt;
&lt;li&gt;Mobile usability scores&lt;/li&gt;
&lt;li&gt;Indexation rate&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SEO Health:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyword cannibalization alerts&lt;/li&gt;
&lt;li&gt;Internal linking saturation&lt;/li&gt;
&lt;li&gt;Orphan page detection&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
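&lt;p&gt;The engagement thresholds above translate directly into an automated check. In practice the metrics would come from your analytics API; the example rows here are invented:&lt;/p&gt;

```python
# Flag pages that miss the engagement thresholds listed above.

THRESHOLDS = {
    "time_on_page": 90,    # seconds, minimum
    "bounce_rate": 0.60,   # maximum
    "scroll_depth": 0.50,  # minimum
}

def flag_page(metrics):
    """Return the list of thresholds a page fails."""
    failures = []
    if not metrics["time_on_page"] >= THRESHOLDS["time_on_page"]:
        failures.append("time_on_page")
    if metrics["bounce_rate"] > THRESHOLDS["bounce_rate"]:
        failures.append("bounce_rate")
    if not metrics["scroll_depth"] >= THRESHOLDS["scroll_depth"]:
        failures.append("scroll_depth")
    return failures

print(flag_page({"time_on_page": 45, "bounce_rate": 0.72,
                 "scroll_depth": 0.8}))
```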

&lt;h4&gt;
  
  
  Human Review Schedule
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monthly&lt;/strong&gt;: Review 5% of programmatic pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quarterly&lt;/strong&gt;: Update data/references&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bi-annually&lt;/strong&gt;: Complete template refresh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Annually&lt;/strong&gt;: Prune underperforming pages (&amp;lt;10 visits/month for 6 months)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The "Would I Share This?" Test
&lt;/h4&gt;

&lt;p&gt;Every page should pass: "Would I genuinely share this with a colleague facing this specific problem?" If not, improve it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 5: Scaling Without Dilution
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Expansion Framework
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Existing Authority → New Related Topic → Quality Content → Measure → Expand Further
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expansion criteria:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Search demand&lt;/strong&gt; exists (1,000+ monthly searches)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can provide unique value&lt;/strong&gt; (data, expertise, perspective)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fits your site's expertise&lt;/strong&gt; (clear connection to pillars)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commercial potential&lt;/strong&gt; aligns with goals&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Velocity Management
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start slow&lt;/strong&gt;: 10-20 pages/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor quality signals&lt;/strong&gt;: No degradation in engagement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjust based on&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Crawl budget impact&lt;/li&gt;
&lt;li&gt;Indexation rate&lt;/li&gt;
&lt;li&gt;Overall site authority changes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Saturation Warning Signs
&lt;/h4&gt;

&lt;p&gt;Red flags that you're scaling too fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indexation rate drops below 80%&lt;/li&gt;
&lt;li&gt;Average position declines for existing pages&lt;/li&gt;
&lt;li&gt;Crawl errors increase significantly&lt;/li&gt;
&lt;li&gt;Overall site traffic plateaus while page count grows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part 6: Technical Implementation (Without Breaking Everything)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Architecture Decisions
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Option A: Subdirectory&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yoursite.com/programmatic/[pages]/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Tight topic integration, authority sharing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: Subdomain&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;programmatic.yoursite.com/[pages]/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Experimental approaches, very different content types&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option C: Separate Property&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;different-site.com/[pages]/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Completely different topics, risk isolation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;: Start with Option A unless you have specific reasons otherwise.&lt;/p&gt;

&lt;h4&gt;
  
  
  Performance Optimization
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caching strategy&lt;/strong&gt;: Separate cache for dynamic programmatic pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN configuration&lt;/strong&gt;: Edge computing for personalization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database optimization&lt;/strong&gt;: Read replicas for high-traffic query patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy loading&lt;/strong&gt;: Images, tables, and interactive elements&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Crawl Efficiency
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Googlebot Time = Limited Resource
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Prioritize important pages via internal links&lt;/li&gt;
&lt;li&gt;Use robots.txt strategically for pagination/search pages&lt;/li&gt;
&lt;li&gt;Implement proper canonicalization for filtered views&lt;/li&gt;
&lt;li&gt;Monitor crawl stats in Search Console weekly&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Part 7: Measuring Real Success (Vanity Metrics vs. Value Metrics)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What NOT to Focus On:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Raw page count&lt;/li&gt;
&lt;li&gt;Keyword rankings alone&lt;/li&gt;
&lt;li&gt;Impressions without clicks&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  What Actually Matters:
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: User Value Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversion rate from programmatic pages&lt;/li&gt;
&lt;li&gt;Engagement time compared to manual pages&lt;/li&gt;
&lt;li&gt;Support ticket reduction on covered topics&lt;/li&gt;
&lt;li&gt;User satisfaction scores (surveys, feedback)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: SEO Health Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawl efficiency (pages crawled vs. indexed)&lt;/li&gt;
&lt;li&gt;Keyword cannibalization incidents&lt;/li&gt;
&lt;li&gt;Domain authority distribution (not concentrated on few pages)&lt;/li&gt;
&lt;li&gt;Internal link equity flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Business Impact&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer acquisition cost reduction&lt;/li&gt;
&lt;li&gt;Support cost reduction&lt;/li&gt;
&lt;li&gt;Upsell/cross-sell attribution&lt;/li&gt;
&lt;li&gt;Competitive positioning improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The ROI Calculation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Programmatic SEO ROI = 
(Attributed Revenue - Production Costs) / Production Costs

Where:
Attributed Revenue = Conversions × Average Value × Attribution %
Production Costs = (Tooling + Labor + Hosting) / Number of Pages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Good target&lt;/strong&gt;: 300-500% ROI within 12 months&lt;/p&gt;
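&lt;p&gt;As a sanity check, the formula is easy to script; every number in the example call is made up:&lt;/p&gt;

```python
def programmatic_seo_roi(conversions, avg_value, attribution_pct,
                         tooling, labor, hosting):
    """ROI per the formula above; cost inputs are totals, not per-page."""
    attributed_revenue = conversions * avg_value * attribution_pct
    production_costs = tooling + labor + hosting
    return (attributed_revenue - production_costs) / production_costs

# e.g. 400 conversions at $150 average value, 50% attributed,
# against $6k total production costs (all figures invented)
roi = programmatic_seo_roi(400, 150, 0.5, 2_000, 3_000, 1_000)
print(f"{roi:.0%}")  # 400%
```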

&lt;h3&gt;
  
  
  Part 8 (Advanced): AI-Assisted Programmatic SEO
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Right Way to Use AI:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Data analysis at scale&lt;/li&gt;
&lt;li&gt;Template optimization through A/B testing&lt;/li&gt;
&lt;li&gt;Quality scoring automation&lt;/li&gt;
&lt;li&gt;Opportunity identification&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Wrong Way to Use AI:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Content generation without human oversight&lt;/li&gt;
&lt;li&gt;Keyword stuffing detection evasion&lt;/li&gt;
&lt;li&gt;Fake expertise creation&lt;/li&gt;
&lt;li&gt;Review/testimonial fabrication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is using AI as quality control and scaling assistant, not as content creation replacement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Part 9: The Pruning Principle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  When to Remove Programmatic Pages:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistent underperformance&lt;/strong&gt;: &amp;lt;10 visits/month for 6+ months&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data becomes obsolete&lt;/strong&gt;: Information is no longer accurate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality score declines&lt;/strong&gt;: Failing regular quality audits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannibalization issues&lt;/strong&gt;: Competing with better pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strategic shift&lt;/strong&gt;: No longer aligns with business focus&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to Prune Properly:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;301 redirect to most relevant page&lt;/li&gt;
&lt;li&gt;Update internal links pointing to removed page&lt;/li&gt;
&lt;li&gt;Remove from sitemaps&lt;/li&gt;
&lt;li&gt;Monitor for traffic recovery on target pages&lt;/li&gt;
&lt;li&gt;Document learnings for future projects&lt;/li&gt;
&lt;/ul&gt;
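&lt;p&gt;A pruning pass can be scripted against the first criterion (the traffic threshold); the page data and redirect targets below are illustrative:&lt;/p&gt;

```python
# Find pages below the traffic threshold for six straight months,
# emit a 301 redirect map for them, and keep the rest in the sitemap.

pages = {
    "/compare/tool-x-vs-tool-y/": {
        "monthly_visits": [3, 1, 4, 2, 0, 5],
        "redirect_to": "/compare/",        # most relevant page
    },
    "/compare/tool-a-vs-tool-b/": {
        "monthly_visits": [900, 950, 1020, 880, 990, 1100],
        "redirect_to": "/compare/",
    },
}

def prune(pages, threshold=10, months=6):
    redirects, keep = {}, []
    for url, info in pages.items():
        recent = info["monthly_visits"][-months:]
        if all(not v >= threshold for v in recent):
            redirects[url] = info["redirect_to"]   # 301 mapping
        else:
            keep.append(url)
    return redirects, keep

redirects, sitemap_urls = prune(pages)
print(redirects)
```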

&lt;h2&gt;
  
  
  Conclusion: The Sustainable Programmatic Mindset
&lt;/h2&gt;

&lt;p&gt;Programmatic SEO isn't about replacing human creativity; it's about systematizing human insight. The goal isn't more pages; it's more helpful pages.&lt;/p&gt;

&lt;p&gt;Remember this hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Value &amp;gt; Content Quality &amp;gt; Scalable Systems &amp;gt; Automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your programmatic SEO prioritizes automation over quality, you're building a house of cards. If it prioritizes user value first, you're building an asset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your Implementation Checklist:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with 10 pages&lt;/strong&gt; - Prove quality before scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Establish quality benchmarks&lt;/strong&gt; - What makes your manual pages successful?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build templates around value&lt;/strong&gt; - Not just around keywords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement rigorous QA&lt;/strong&gt; - Human review every page initially&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure what matters&lt;/strong&gt; - Engagement and conversions, not just rankings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prune aggressively&lt;/strong&gt; - Remove what doesn't work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterate constantly&lt;/strong&gt; - Improve templates based on data&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Ultimate Test:
&lt;/h3&gt;

&lt;p&gt;Six months from now, when you look at your programmatic pages, you shouldn't be able to tell they were created systematically. They should feel as valuable, unique, and helpful as your best manually created content.&lt;/p&gt;

&lt;p&gt;That's the difference between programmatic SEO and programmatic spam. One builds assets, the other builds liabilities.&lt;/p&gt;

&lt;p&gt;Start small. Prioritize quality. Measure rigorously. Scale carefully. That's how you build programmatic SEO that lasts.&lt;/p&gt;

</description>
      <category>seo</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How Embeddings Actually Improve SEO: A Practical Guide for Developers</title>
      <dc:creator>Siddharth Patel</dc:creator>
      <pubDate>Sun, 07 Dec 2025 13:52:18 +0000</pubDate>
      <link>https://forem.com/siddharth_patel_a86ce5ef5/how-embeddings-actually-improve-seo-a-practical-guide-for-developers-2mgl</link>
      <guid>https://forem.com/siddharth_patel_a86ce5ef5/how-embeddings-actually-improve-seo-a-practical-guide-for-developers-2mgl</guid>
      <description>&lt;p&gt;SEO is undergoing a fundamental shift. Modern search engines rely on vector embeddings to capture meaning beyond keywords. In other words, SEO today is about matching intent and context, not just exact words. Google’s AI-powered algorithms, from Hummingbird to RankBrain to BERT, all build on embedding representations of queries and content. In practice, this means SEO is “no longer about optimizing for exact words but for meaning, relationships, and relevance”. &lt;/p&gt;

&lt;p&gt;As developers and machine-learners, we can leverage the same techniques inside our sites: by turning text into vectors, we can measure semantic similarity, cluster topics, and uncover hidden keyword opportunities in a quantitative way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;At a high level, embeddings are numeric vectors that represent text (words, sentences, or whole pages) in a high-dimensional space. An embedding model takes text as input and outputs a list of floating-point numbers. These coordinates capture the semantics of the text: conceptually similar pieces of content get vectors that are close together. As one SEO specialist explains, “vector embedding is a method LLMs use to assess the relationships between different pieces of content. They are numerical representations of words, phrases, or documents in a multi-dimensional space”. &lt;/p&gt;

&lt;p&gt;For example, a model might map “project management software” and “team collaboration tools” to nearby points in space, even if they share no exact keywords.&lt;/p&gt;

&lt;p&gt;Vector embeddings turn language into geometry: terms like “dog,” “cat,” and “canine” end up near each other, while “hot dog” (the food) goes off in a totally different direction. The model learns this by analyzing usage contexts: since “dog” appears with “bark” and “leash,” it clusters with other pet concepts. In essence, embeddings give AI a way to “think” about meaning. Famous examples illustrate this: if you take the embedding for “king,” subtract “man,” and add “woman,” you get a vector very close to “queen”. Embeddings thus support vector arithmetic, analogies, and a continuous measure of how related any two texts are.&lt;/p&gt;

&lt;p&gt;Technically, there are many kinds of embedding models. Older models like Word2Vec or GloVe produce fixed word vectors, while newer models (BERT, GPT, or sentence-transformers) give contextual embeddings for whole sentences or documents. Models vary from small (tens of millions of parameters) to huge (billions) and from general-purpose to domain-specific. For instance, EmbeddingGemma is a 300M-parameter open model covering 100+ languages, while larger models like Qwen3-8B or Meta’s Llama-Embed (fine-tuned versions of popular LLMs) excel on broad tasks. In practice, developers can pick from many open-source and API models (Hugging Face hosts 100K+ embedding models), depending on accuracy, cost, and latency trade-offs. The key idea is that any good embedding model can turn your content into vectors that machines can compare.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Keyword Matching to Semantic Relevance
&lt;/h2&gt;

&lt;p&gt;How do embeddings change the SEO game? Traditionally, SEO relied on keyword counts (TF-IDF) and link signals. But TF-IDF treats each word as a separate “dimension” and fails to capture meaning: swapping the order of unrelated words doesn’t change the TF-IDF score, even if the meaning flips. For example, the sentences:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;“Bill ran from the giraffe towards the dolphin.”&lt;br&gt;
“Bill ran from the dolphin towards the giraffe.”&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;contain the same words but mean very different things. A simple bag-of-words metric would judge them nearly identical, which is clearly wrong. Embeddings solve this by capturing context and semantics. Semantic search engines now “go beyond simple keyword matching” to interpret intent. In practice, two pages that use different phrasing but discuss the same topic will have similar vector embeddings. So if a user searches “best laptop for gaming,” an embedding-based search can return pages about “high-performance gaming laptops” even if “best laptop” is not explicitly on the page. In short, embeddings let search engines surface relevant content even without exact keyword overlap.&lt;/p&gt;

&lt;p&gt;This shift has big implications for us as developers. Instead of checking keyword density, we’ll measure how close our content’s embeddings are to target queries. Cosine similarity is commonly used: a cosine score near 1 means two texts are very similar in meaning, whereas 0 means orthogonal (unrelated). In formula form, given two embedding vectors u and v,&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cosineSimilarity(u, v) = (u · v) / (|u| |v|)&lt;/code&gt;&lt;/p&gt;


&lt;p&gt;A high cosine similarity indicates our content and the query “point” in the same direction in embedding space.&lt;/p&gt;
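&lt;p&gt;As a quick sanity check, the formula is only a few lines of NumPy (the vectors here are toy values, not real embeddings):&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| |v|)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Same direction -> 1.0; orthogonal -> 0.0
print(cosine_similarity(np.array([2.0, 0.0]), np.array([5.0, 0.0])))  # 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```

&lt;p&gt;Note that the result ignores vector length: scaling either vector leaves the similarity unchanged, which is why it works across embedding models of different magnitudes.&lt;/p&gt;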

&lt;p&gt;In summary, the era of keyword-stuffed SEO is fading; the goal now is to cover concepts comprehensively. Search engines and AI-driven overviews rank pages not just on the words they contain but on the concepts they cover. For example, a guide on “project management” that also discusses related terms like “team collaboration,” “workflow automation,” and “productivity software” signals strong topical authority, because embeddings capture those semantic relationships. In this new reality, concise answers and FAQ nuggets (engineered for AI summarizers) combined with rich semantic coverage win the day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key SEO Use Cases for Embeddings
&lt;/h2&gt;

&lt;p&gt;(In this blog series I will write in detail about many of the technical implementations; take this one as an introductory post.)&lt;/p&gt;

&lt;p&gt;Embedding models unlock many practical workflows in SEO. Here are some core use cases to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Semantic Keyword Research &amp;amp; Clustering&lt;/strong&gt;: Instead of just mining exact match keywords, use embeddings to discover synonyms and related terms. For example, an embedding model might tell you that “eco-friendly baby wipes” is close in meaning to “natural wipes for newborns” or “biodegradable diapers”. This reveals long-tail keyword opportunities and topic clusters you may have missed. In practice, you can take a list of seed keywords, embed each one, and cluster them (e.g. with k-means or hierarchical clustering) to see which phrases naturally group together by intent. These clusters help you build comprehensive content plans: instead of writing many one-off posts, focus on robust articles that cover each semantic cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Intent and Topic Modeling&lt;/strong&gt;: Vector embeddings can automatically categorize queries by user intent. By clustering query embeddings, you’ll naturally separate informational queries (e.g. “what is X”) from transactional or navigational ones. This means you can tailor your pages to the user’s stage: think educational guides for learning intent, comparison pages for evaluation intent, and product pages for purchase intent. For example, grouping all “how-to” queries together lets you write a thorough tutorial, while a separate cluster of “best X for Y” suggests a buyer’s guide. Even without manual tagging, embeddings reveal “what users are really trying to do”, so you can align content format and depth to actual search behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Competitor &amp;amp; Content Gap Analysis&lt;/strong&gt;: Embeddings help you spot hidden competitors: other sites (even unexpected ones like forum threads or Q&amp;amp;A sites) that serve the same user intent. By embedding your target keywords and the top-ranking pages in a vector space, you can find pages that live in the same semantic neighborhood. These are the pages you actually compete with for eyeballs. For instance, you might find that a Reddit thread is capturing traffic for a keyword you thought was safe. Knowing this, you can study those pages: what additional subtopics or formats do they cover? You can then fill the gap on your site. In practice, compute embeddings for competitor URLs or titles, compare them to your pages, and focus on the ones with the highest similarity; those are the “semantic competitors” stealing your traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content Optimization &amp;amp; Internal Linking&lt;/strong&gt;: Embeddings shine at surfacing related concepts that should live together on a page. For example, if you’re writing about “electric vehicles,” embedding models might highlight terms like battery range, charging speed, EV tax incentives, and range anxiety as closely related concepts. Including sections on these topics makes your content more comprehensive and better aligned with users. Embeddings also tell us which subtopics users naturally group together: if a concept clusters far from the rest (say “EV range” vs. “green energy tariffs”), it may deserve its own page. In one case, embeddings revealed that “charging infrastructure” and “green energy” fell into different clusters, so they should not be shoehorned into a single article. Aligning your site structure to these clusters (linking related content, separating distinct topics) helps search engines understand your site’s architecture and improves clarity for users. In effect, embedding-based site audits can drive smarter siloing and internal linking strategies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Query and Content Matching (Vector Search)&lt;/strong&gt;: On the engineering side, you can build a vector search system for your content. For each page or piece of content, compute and store an embedding. When a user query arrives, embed the query and retrieve the nearest page vectors by cosine similarity, or use a vector database (like FAISS or Pinecone). This goes beyond keyword lookup: you’re matching on meaning. In pseudocode:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;page_vectors = [model.embed(text) for text in all_page_texts]
query_vector = model.embed(user_query)
scores = [cosine_similarity(query_vector, pv) for pv in page_vectors]
# Indices of the five highest-scoring pages
top_pages = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:5]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
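&lt;p&gt;Here is a runnable version of that ranking step, using toy 3-dimensional vectors in place of real model output (the names and values are illustrative):&lt;/p&gt;

```python
import numpy as np

def rank_pages(query_vector, page_vectors, k=5):
    # Normalize, score each page by cosine similarity, return top-k indices
    q = query_vector / np.linalg.norm(query_vector)
    scores = [float(np.dot(q, pv / np.linalg.norm(pv))) for pv in page_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy 3-d vectors standing in for real page embeddings
page_vectors = [np.array([1.0, 0.0, 0.0]),
                np.array([0.9, 0.1, 0.0]),
                np.array([0.0, 1.0, 0.0])]
query_vector = np.array([1.0, 0.05, 0.0])
print(rank_pages(query_vector, page_vectors, k=2))  # [0, 1]
```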



&lt;p&gt;This simple pipeline returns your most semantically relevant pages. You can use it for site search, related-article widgets, or even SEO analysis (e.g. verifying that a query matches your intended landing page). For large content sets, you’d use an approximate nearest-neighbor index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index = build_vector_index(page_vectors)
top_pages = index.search(query_vector, k=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
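&lt;p&gt;The two-call index interface above can be sketched with scikit-learn’s NearestNeighbors as a stand-in (this does exact brute-force search, not true ANN; a production system would swap in FAISS, hnswlib, or a hosted vector database). The toy vectors are illustrative:&lt;/p&gt;

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy 2-d vectors standing in for real page embeddings
page_vectors = np.array([[1.0, 0.0],
                         [0.8, 0.6],
                         [0.0, 1.0]])

# "Build the index" (exact search here; swap in FAISS/hnswlib for real ANN)
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(page_vectors)

query_vector = np.array([[0.9, 0.1]])  # must be 2-d: one row per query
distances, page_ids = index.kneighbors(query_vector)
print(page_ids[0])       # indices of the two closest pages
print(1 - distances[0])  # cosine similarities (metric returns 1 - similarity)
```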



&lt;p&gt;Tools like Hugging Face’s embeddings or libraries like LlamaIndex make it easy to plug in sentence-transformer models (BGE, E5, etc.) and perform semantic retrieval at scale.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Graphs and Entities&lt;/strong&gt;: Embeddings can link concepts and named entities across content. By embedding key phrases or entities, you can cluster them and see which pages share the same entities or topics. For example, if your site mentions many person or organization names, embedding those entities can help you automatically tag or relate pages. This can feed into structured data (schema) or knowledge graph features, improving how rich results connect users to your content.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Choosing Embedding Models
&lt;/h2&gt;

&lt;p&gt;There’s no one-size-fits-all model, so developers should weigh options. Some quick guidelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI and API Models&lt;/strong&gt;: Commercial APIs like OpenAI’s embeddings (text-embedding-ada-002) are easy to use and high quality, but per-query costs add up and every call requires a network round trip. They’re a good starting point for prototypes or occasional use. (This is what I am using at &lt;a href="https://llamarush.com" rel="noopener noreferrer"&gt;LLaMaRush&lt;/a&gt; until I hit 100 customers and 20,000 user blogs.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open-Source Transformers&lt;/strong&gt;: There are many free models on Hugging Face. Sentence-Transformers (SBERT) offers dozens of options (e.g. all-MiniLM for speed, or larger models for accuracy). Newer models like Meta’s Llama-Embed or Google’s EmbeddingGemma are promising. For example, EmbeddingGemma (300M parameters) supports 100+ languages and ranks top for its size. If you have GPU resources, large models like Qwen3-8B produce very rich embeddings. Conversely, smaller models (50M–500M params) are often “good enough” and much faster for real-time use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Domain/Task-Specific&lt;/strong&gt;: If your SEO niche is specialized (e.g. legal or medical), consider fine-tuned or custom models. You could even train embeddings on your own content. Tools like LlamaIndex allow fine-tuning embeddings (including on proprietary data). Otherwise, embedding models trained on broad web text tend to be surprisingly robust across domains.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vector Dimensionality and Format&lt;/strong&gt;: Check the output dimensionality: some models give 384-dim vectors (e.g. MiniLM) while others give 1024+. Cosine similarity works regardless of dimension, but higher-dimensional models may capture more nuance. Also, modern embeddings are dense (floating-point vectors), whereas older methods (TF-IDF, BM25) were sparse; modern semantic SEO relies on these dense vectors. Baseten’s model-selection guide notes that with 100K+ models on Hugging Face, the ideal choice often comes down to balancing inference speed against accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, I recommend starting with a well-known sentence transformer. For example, all-MiniLM-L6-v2 (384-dim, lightweight and fast) or all-MiniLM-L12-v2 (also 384-dim, slightly more accurate) are free and quick to run. These capture phrases and short passages nicely. For heavy-duty use, try OpenAI’s embeddings or a large open model (e.g. from the BGE family). The Baseten blog highlights models like Qwen3 and EmbeddingGemma as top picks if you need cutting-edge accuracy. Ultimately, experiment: compute sample similarities (e.g. does your site’s “title embedding” match the page content embedding?) to see which model best reflects your notion of relevance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It All Together: Example Workflow
&lt;/h2&gt;

&lt;p&gt;Let’s sketch a simple embedding-driven SEO pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Crawl &amp;amp; Collect Content&lt;/strong&gt;: Gather text from your pages (or use an existing content database).&lt;br&gt;
&lt;strong&gt;Note&lt;/strong&gt;: I will soon write a blog post on how to crawl your own site.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compute Page Embeddings&lt;/strong&gt;: Use your chosen model to embed page text (e.g. whole pages, or section by section). Store these vectors in a database or vector index.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keyword/Data Gathering&lt;/strong&gt;: Compile a list of target keywords, queries from Search Console, or competitor queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compute Query Embeddings&lt;/strong&gt;: Embed those keywords/queries with the same model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Similarity &amp;amp; Clustering&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each query, retrieve the nearest page vectors (via cosine similarity) to see which page best matches semantically.&lt;/li&gt;
&lt;li&gt;Cluster all query vectors to discover intent groups or topic clusters.&lt;/li&gt;
&lt;li&gt;Cluster page vectors to identify topical families of your content.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Insights &amp;amp; Optimization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identify gaps&lt;/strong&gt;: Are there high-volume query clusters with no close page embedding? That’s a content opportunity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimize pages&lt;/strong&gt;: For a given page, look at the terms (or other pages) closest to its embedding, and consider adding those related concepts or linking to those pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive analysis&lt;/strong&gt;: Embed top competitors’ pages for your target topics; see where they land relative to yours in vector space.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor &amp;amp; Iterate&lt;/strong&gt;: As queries shift, re-run embeddings periodically. Use embedding similarity as an additional ranking signal: pages closer to many relevant query vectors should be prioritized.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;A tiny code sketch (pseudocode):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Prepare content embeddings
page_texts = load_all_pages()
page_vectors = [embed_model.encode(text) for text in page_texts]
# Build an index for fast nearest-neighbor search
index = build_faiss_index(page_vectors)

# Given a new query
query_vec = embed_model.encode(user_query)
# Retrieve the top-3 semantically closest pages
# (an inner-product index on normalized vectors returns cosine scores directly)
scores, page_ids = index.search(query_vec, k=3)
for score, pid in zip(scores, page_ids):
    print(f"Page {pid} matched with similarity {score:.2f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And for keyword clustering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.cluster import KMeans

keywords = ["eco-friendly baby wipes", "biodegradable diapers", "natural newborn wipes", ...]
kw_vectors = [embed_model.encode(kw) for kw in keywords]
clusters = KMeans(n_clusters=3).fit(kw_vectors)
for keyword, label in zip(keywords, clusters.labels_):
    print(f"Keyword '{keyword}' is in cluster {label}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
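&lt;p&gt;To make the clustering step concrete end to end, here is a runnable toy version with made-up 2-d vectors standing in for real keyword embeddings (keywords, vectors, and cluster count are illustrative), grouping the results into a content plan:&lt;/p&gt;

```python
from collections import defaultdict
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-d vectors standing in for real keyword embeddings
keywords = ["eco-friendly baby wipes", "natural newborn wipes",
            "biodegradable diapers", "compostable diapers"]
kw_vectors = np.array([[0.9, 0.1], [0.85, 0.15],
                       [0.1, 0.9], [0.15, 0.85]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(kw_vectors)

# Group keywords by cluster: each group becomes one comprehensive article
groups = defaultdict(list)
for kw, label in zip(keywords, labels):
    groups[int(label)].append(kw)
for label, kws in sorted(groups.items()):
    print(label, kws)
```

&lt;p&gt;With real embeddings the vectors would be 384+ dimensional, but the grouping logic is identical: the two “wipes” keywords land in one cluster and the two “diapers” keywords in the other.&lt;/p&gt;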



&lt;p&gt;These snippets illustrate the logic; real implementations would handle batching, GPU acceleration, and data storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In summary, embeddings bring a huge advantage to SEO workflows. They let us quantify semantic similarity, surface latent topics, and optimize content for meaning. Rather than manually guessing synonyms or reading search query logs, a developer can programmatically find exactly what search engines are looking for. As one SEO technologist put it, embeddings allow us to “engineer the relevance of your content to perform better”.&lt;/p&gt;

&lt;p&gt;For example, our team’s tool &lt;a href="https://llamarush.com" rel="noopener noreferrer"&gt;LlamaRush&lt;/a&gt; (an “AI SEO co-founder”) uses similar ideas: it crawls your site and analytics, then uses embeddings to generate content suggestions and keyword strategies automatically. Whether or not you use a SaaS product, you can incorporate embeddings into your own projects. Start simple: pick a pre-trained embedding model (like a Sentence-Transformer), vectorize your pages, and run a few cosine searches on real queries. You’ll quickly see that related concepts light up in the results, giving you clear ideas for improvement.&lt;/p&gt;

&lt;p&gt;The era of purely keyword-based SEO is fading. As search evolves toward understanding intent, embracing embeddings is essential. By thinking in vectors, not just strings, we build content and tools that align with where search is headed.&lt;/p&gt;

&lt;p&gt;Thanks for reading! ❤️&lt;/p&gt;

</description>
      <category>seo</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
