<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: filtede98</title>
    <description>The latest articles on Forem by filtede98 (@filtede98).</description>
    <link>https://forem.com/filtede98</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862095%2Fab2c347c-90ef-42b7-ac76-18b275ef0599.jpg</url>
      <title>Forem: filtede98</title>
      <link>https://forem.com/filtede98</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/filtede98"/>
    <language>en</language>
    <item>
      <title>Stop using BeautifulSoup: Convert any webpage to clean Markdown in 1 second</title>
      <dc:creator>filtede98</dc:creator>
      <pubDate>Mon, 06 Apr 2026 19:14:56 +0000</pubDate>
      <link>https://forem.com/filtede98/stop-using-beautifulsoup-convert-any-webpage-to-clean-markdown-in-1-second-15mp</link>
      <guid>https://forem.com/filtede98/stop-using-beautifulsoup-convert-any-webpage-to-clean-markdown-in-1-second-15mp</guid>
      <description>&lt;p&gt;If you're still doing this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
  from bs4 import BeautifulSoup                                   
  import requests                                                                                                       

  response = requests.get("https://example.com")                                                                        
  soup = BeautifulSoup(response.text, "html.parser")              

  # Remove scripts, styles...                 
  for tag in soup(["script", "style", "nav", "footer"]):
      tag.decompose()

  text = soup.get_text()                      
  # Now clean up whitespace...                                                                                          
  lines = (line.strip() for line in text.splitlines())            
  text = '\n'.join(line for line in lines if line)                                                                      

  ...you're working way too hard, and get_text() throws away the structure: headings, tables, code blocks, and links are all gone.

  There's a better way

  One API call. Any URL. Clean Markdown back in under 1 second.   

  curl -X POST https://wtmapi.com/api/v1/convert \                
    -H "x-api-key: YOUR_KEY" \                                                                                          
    -H "Content-Type: application/json" \ 
    -d '{"url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map"}'          

  What you get back                       

  Instead of a blob of plain text, you get structured Markdown:                                                         

  # Array.prototype.map()                                                                                               

  The **map()** method of Array instances creates a new array
  populated with the results of calling a provided function                                                             
  on every element in the calling array.                          

  ## Syntax                               

  map(callbackFn)                                                                                                       
  map(callbackFn, thisArg)

  ## Examples                                                     

  const numbers = [1, 4, 9];                  
  const roots = numbers.map((num) =&amp;gt; Math.sqrt(num));
  // roots is now [1, 2, 3]

  Headings, code blocks, bold, links, tables — all preserved.
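
  Because the headings survive, the output can be cut into sections before it goes into a RAG index.
  A minimal sketch, assuming plain ATX headings (lines starting with #). split_by_headings and the
  sample text are illustrative names and data, not part of the WTM API:

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way so it is not read as a fence here
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_by_headings(markdown):
    """Split Markdown into (level, title, body) tuples, ignoring '#' lines inside code fences."""
    sections = []
    level, title, body = 0, "", []
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the section collected so far, unless it is completely empty
            if title or any(part.strip() for part in body):
                sections.append((level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    if title or any(part.strip() for part in body):
        sections.append((level, title, "\n".join(body).strip()))
    return sections


sample = "\n".join([
    "# Array.prototype.map()",
    "",
    "The **map()** method creates a new array.",
    "",
    "## Syntax",
    "",
    "map(callbackFn)",
])

for level, title, body in split_by_headings(sample):
    print(level, title, len(body))
```

  Each tuple keeps its heading level, so a chunker can attach the parent title to every chunk it stores.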

  BeautifulSoup get_text() vs WTM API

  ┌─────────────┬─────────────────────────┬───────────────────────────────┐                                             
  │             │      BeautifulSoup      │            WTM API            │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Output      │ Raw text                │ Structured Markdown           │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Headings    │ Lost                    │ Preserved (h1-h6)             │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Code blocks │ Lost                    │ Preserved with language hints │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Tables      │ Lost                    │ Converted to Markdown tables  │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Links       │ Lost                    │ Absolute URLs preserved       │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Setup       │ 10-50 lines of code     │ 1 API call                    │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Speed       │ Depends on your code    │ &amp;lt; 1 second                    │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Maintenance │ You maintain the parser │ Zero                          │                                             
  └─────────────┴─────────────────────────┴───────────────────────────────┘

  Python example                                                  

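  import requests

  # Ask the API to convert one page; "wtm_your_key" is a placeholder for your own key
  response = requests.post(
      "https://wtmapi.com/api/v1/convert",
      headers={
          "x-api-key": "wtm_your_key",
          "Content-Type": "application/json"
      },
      json={"url": "https://en.wikipedia.org/wiki/Mars"},
      timeout=30,  # requests waits indefinitely unless a timeout is set
  )
  response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body

  data = response.json()
  markdown = data["data"]["markdown"]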
  print(f"Got {data['data']['length']} chars in {data['meta']['response_time_ms']}ms")
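
  The lookups above assume every response has the data.markdown shape shown. A small guard can turn an
  unexpected payload into a clear error. This is a sketch: extract_markdown and the sample payload are
  hypothetical, only the keys used in the example above are taken from it, and error responses may
  look different:

```python
def extract_markdown(payload):
    """Return (markdown, length, response_time_ms) from a decoded convert response."""
    try:
        data = payload["data"]
        markdown = data["markdown"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected response shape: {payload!r}") from exc
    # Fall back to measuring the text when the optional fields are missing
    length = data.get("length", len(markdown))
    elapsed = (payload.get("meta") or {}).get("response_time_ms")
    return markdown, length, elapsed


text = "# Mars\n\nMars is the fourth planet from the Sun."
sample_payload = {
    "data": {"markdown": text, "length": len(text)},
    "meta": {"response_time_ms": 412},  # made-up timing, for illustration only
}

markdown, length, elapsed = extract_markdown(sample_payload)
print(f"Got {length} chars in {elapsed}ms")
```

  In the example above, the call would be extract_markdown(response.json()).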

  Works great with LangChain too              

  pip install langchain-wtmapi                                                                                          

  from langchain_wtmapi import WTMApiLoader                                                                             

  loader = WTMApiLoader(                                          
      urls=["https://docs.python.org/3/tutorial/"],                                                                     
      api_key="wtm_your_key",                 
  )                                                                                                                     
  docs = loader.load()                                            
  # Ready for your RAG pipeline                                                                                         

  When to still use BeautifulSoup                                                                                       

  To be fair, BeautifulSoup is still great when you need to:                                                            
  - Extract specific elements (e.g. all prices on a page)
  - Parse XML/RSS feeds                                                                                                 
  - Work offline without API calls                                                                                      
  - Have full control over the parsing logic

  But if you just need web content as Markdown — for RAG, content migration, documentation archival — an API call is
  simpler, faster, and gives you better output.                                                                         

  Try it free                                 

  Live demo at https://wtmapi.com — 3 free conversions without signing up. Free tier: 50 calls/month.

  What do you think? Would love to hear what URLs you test it on.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I built an API that converts any webpage to clean Markdown in under 1 second</title>
      <dc:creator>filtede98</dc:creator>
      <pubDate>Sun, 05 Apr 2026 10:21:01 +0000</pubDate>
      <link>https://forem.com/filtede98/i-built-an-api-that-converts-any-webpage-to-clean-markdown-in-under-1-second-3p5m</link>
      <guid>https://forem.com/filtede98/i-built-an-api-that-converts-any-webpage-to-clean-markdown-in-under-1-second-3p5m</guid>
      <description>&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;I was building a RAG pipeline and needed a way to feed web content into my LLM. The options were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copy-paste the text manually&lt;/strong&gt; — doesn't scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Beautiful Soup&lt;/strong&gt; — returns raw text, loses all structure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use a headless browser&lt;/strong&gt; — slow, expensive, complex to maintain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of them gave me what I actually needed: &lt;strong&gt;structured Markdown&lt;/strong&gt; that preserves headings, tables, code blocks, and links.&lt;/p&gt;

&lt;h2&gt;So I Built WTM API&lt;/h2&gt;

&lt;p&gt;One POST request. Any URL. Clean Markdown back in under 1 second.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
  curl -X POST https://wtmapi.com/api/v1/convert \                                                                      
    -H "x-api-key: YOUR_KEY" \                
    -H "Content-Type: application/json" \                                                                               
    -d '{"url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map"}'
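
  The same request can be built from Python with only the standard library. A sketch that assumes
  nothing beyond the endpoint, headers, and body in the curl command. build_convert_request is a
  hypothetical helper name, and the request is built here but not sent:

```python
import json
import urllib.request

API_URL = "https://wtmapi.com/api/v1/convert"


def build_convert_request(page_url, api_key):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_convert_request("https://example.com", "YOUR_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=30), then json.load() the response
```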

  What you get back:                      

  # Array.prototype.map()                                                                                               

  The **map()** method of Array instances creates a new array                                                           
  populated with the results of calling a provided function                                                             
  on every element in the calling array.

  ## Syntax                                                       

  map(callbackFn)                                                 
  map(callbackFn, thisArg)

  ## Examples                             

  const numbers = [1, 4, 9];                                                                                            
  const roots = numbers.map((num) =&amp;gt; Math.sqrt(num));
  // roots is now [1, 2, 3]                                                                                             

  Headings, code blocks with syntax hints, bold text, links — all preserved. Not just raw text.

  How It Works

  No headless browser. No Puppeteer. No Playwright.               

  The engine runs server-side on Cheerio, a lightweight HTML parser, and works in four steps:

  1. Minimal cleanup — removes only &amp;lt;nav&amp;gt;, &amp;lt;script&amp;gt;, &amp;lt;style&amp;gt;, cookie banners
  2. Recursive conversion — walks the DOM tree and converts each element to its Markdown equivalent                     
  3. URL resolution — relative links become absolute URLs         
  4. Table conversion — HTML tables become proper Markdown tables                                                       

  The result is a faithful conversion of the page content, not a lossy text extraction.                                 
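
  The four steps can be sketched in Python on a toy node tree. This is an illustration only, not the
  Cheerio-based engine: the dict-based nodes, to_markdown, and table_to_markdown are made up for the
  example, and cookie-banner removal is left out:

```python
from urllib.parse import urljoin

SKIPPED = {"nav", "script", "style"}  # step 1: elements dropped before conversion
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def to_markdown(node, base_url):
    """Recursively convert a toy DOM node (a str or a dict) to Markdown."""
    if isinstance(node, str):
        return node
    tag = node["tag"]
    if tag in SKIPPED:
        return ""
    if tag == "table":
        return table_to_markdown(node, base_url)
    # step 2: convert the children first, then wrap the result for this element
    inner = "".join(to_markdown(child, base_url) for child in node.get("children", []))
    if tag in HEADINGS:
        return "#" * int(tag[1]) + " " + inner + "\n\n"
    if tag == "p":
        return inner + "\n\n"
    if tag == "strong":
        return "**" + inner + "**"
    if tag == "a":
        # step 3: resolve a relative href against the page URL
        return "[" + inner + "](" + urljoin(base_url, node.get("href", "")) + ")"
    return inner  # unknown elements keep only their converted children


def table_to_markdown(node, base_url):
    """Step 4: the first row becomes the header, followed by a separator row."""
    rows = [
        [to_markdown(cell, base_url).strip() for cell in row["children"]]
        for row in node["children"]
    ]
    if not rows:
        return ""
    lines = ["| " + " | ".join(row) + " |" for row in rows]
    lines.insert(1, "| " + " | ".join("---" for _ in rows[0]) + " |")
    return "\n".join(lines) + "\n\n"


page = {"tag": "body", "children": [
    {"tag": "nav", "children": ["Home"]},
    {"tag": "h1", "children": ["Docs"]},
    {"tag": "p", "children": [
        "See the ",
        {"tag": "a", "href": "../guide/intro.html", "children": ["intro"]},
        ".",
    ]},
    {"tag": "table", "children": [
        {"tag": "tr", "children": [
            {"tag": "th", "children": ["Plan"]},
            {"tag": "th", "children": ["Calls"]},
        ]},
        {"tag": "tr", "children": [
            {"tag": "td", "children": ["Free"]},
            {"tag": "td", "children": ["50"]},
        ]},
    ]},
]}

print(to_markdown(page, "https://example.com/docs/api/index.html"))
```

  The nav text is dropped, the relative link comes out as https://example.com/docs/guide/intro.html,
  and the table becomes a three-line Markdown table with a --- separator row.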

  What I Used to Build It                                                                                               

  The entire stack runs on free tiers:                                                                                  

  ┌────────────┬────────────────────────────────────────────┐     
  │  Service   │                  Purpose                   │                                                           
  ├────────────┼────────────────────────────────────────────┤     
  │ Next.js 16 │ Framework (App Router)                     │                                                           
  ├────────────┼────────────────────────────────────────────┤     
  │ Supabase   │ Auth + PostgreSQL + Row Level Security     │
  ├────────────┼────────────────────────────────────────────┤
  │ Stripe     │ Subscription billing (Free/Pro/Enterprise) │
  ├────────────┼────────────────────────────────────────────┤                                                           
  │ Vercel     │ Hosting and deployment                     │                                                           
  ├────────────┼────────────────────────────────────────────┤                                                           
  │ Cheerio    │ HTML parsing engine                        │                                                           
  └────────────┴────────────────────────────────────────────┘     

  Total monthly cost: $0.                                         

  Pricing

  - Free: 50 calls/month (no credit card)     
  - Pro: $9/month — 10,000 calls                                                                                        
  - Enterprise: $49/month — 100,000 calls                         
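
  For comparing the paid tiers, the listed prices can be turned into a cost per 1,000 calls. A quick
  calculation that uses only the numbers above (cost_per_thousand is an illustrative helper):

```python
# (USD per month, calls included), copied from the pricing list
plans = {"Pro": (9, 10_000), "Enterprise": (49, 100_000)}


def cost_per_thousand(price_usd, calls):
    """Monthly price divided across the included calls, scaled to 1,000 calls."""
    return price_usd * 1000 / calls


for name, (price, calls) in plans.items():
    print(f"{name}: ${cost_per_thousand(price, calls):.2f} per 1,000 calls")
```

  That works out to $0.90 per 1,000 calls on Pro and $0.49 on Enterprise, if the full quota is used.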

  Try It Now

  There's a live demo on the site — 3 free conversions, no signup required. Paste any URL and see the output instantly.

  https://wtmapi.com                                              

  I'd love to hear what you think. What URLs would you test it on? What features would you want next?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>webdev</category>
      <category>api</category>
      <category>markdown</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
