Stop using BeautifulSoup: Convert any webpage to clean Markdown in 1 second

filtede98 — Mon, 06 Apr 2026 19:14:56 +0000

If you're still doing this:


python                                                                                                             
  from bs4 import BeautifulSoup                                   
  import requests                                                                                                       

  response = requests.get("https://example.com")                                                                        
  soup = BeautifulSoup(response.text, "html.parser")              

  # Remove scripts, styles...                 
  for tag in soup(["script", "style", "nav", "footer"]):
      tag.decompose()

  text = soup.get_text()                      
  # Now clean up whitespace...                                                                                          
  lines = (line.strip() for line in text.splitlines())            
  text = '\n'.join(line for line in lines if line)                                                                      

  ...you're working way too hard. Worse, get_text() flattens the page: headings, tables, code blocks, and links are all gone.

  There's a better way

  One API call. Any URL. Clean Markdown back in under 1 second.   

  curl -X POST https://wtmapi.com/api/v1/convert \                
    -H "x-api-key: YOUR_KEY" \                                                                                          
    -H "Content-Type: application/json" \ 
    -d '{"url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map"}'          

  What you get back                       

  Instead of a blob of plain text, you get structured Markdown:                                                         

  # Array.prototype.map()                                                                                               

  The **map()** method of Array instances creates a new array
  populated with the results of calling a provided function                                                             
  on every element in the calling array.                          

  ## Syntax                               

  map(callbackFn)                                                                                                       
  map(callbackFn, thisArg)

  ## Examples                                                     

  const numbers = [1, 4, 9];                  
  const roots = numbers.map((num) => Math.sqrt(num));
  // roots is now [1, 2, 3]

  Headings, code blocks, bold, links, tables — all preserved.

  BeautifulSoup vs WTM API                                                                                              

  ┌─────────────┬─────────────────────────┬───────────────────────────────┐                                             
  │             │      BeautifulSoup      │            WTM API            │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Output      │ Raw text                │ Structured Markdown           │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Headings    │ Lost                    │ Preserved (h1-h6)             │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Code blocks │ Lost                    │ Preserved with language hints │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Tables      │ Lost                    │ Converted to Markdown tables  │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤                                             
  │ Links       │ Lost                    │ Absolute URLs preserved       │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Setup       │ 10-50 lines of code     │ 1 API call                    │                                             
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Speed       │ Depends on your code    │ < 1 second                    │
  ├─────────────┼─────────────────────────┼───────────────────────────────┤
  │ Maintenance │ You maintain the parser │ Zero                          │                                             
  └─────────────┴─────────────────────────┴───────────────────────────────┘

  Python example                                                  

  import requests

  response = requests.post(
      "https://wtmapi.com/api/v1/convert",
      headers={"x-api-key": "wtm_your_key"},  # json= sets Content-Type for you
      json={"url": "https://en.wikipedia.org/wiki/Mars"},
      timeout=30,  # requests has no default timeout
  )
  response.raise_for_status()  # surface HTTP errors before parsing the body

  data = response.json()
  markdown = data["data"]["markdown"]
  print(f"Got {data['data']['length']} chars in {data['meta']['response_time_ms']}ms")
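If you want something a bit more defensive, here is a minimal sketch of a wrapper around the call above. It assumes the response shape shown in this example (`data.markdown`). The names `http_post_json` and `convert_to_markdown` are made up for illustration, and since error responses are not described in this post, the wrapper simply raises on anything unexpected. It uses only the standard library, and the HTTP function is injectable so you can test it without hitting the network.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

API_URL = "https://wtmapi.com/api/v1/convert"  # endpoint shown in the post

PostFn = Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]]


def http_post_json(url: str, headers: Dict[str, str], body: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body with the standard library and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read().decode("utf-8"))


def convert_to_markdown(page_url: str, api_key: str, post: PostFn = http_post_json) -> str:
    """Return the Markdown for page_url, or raise ValueError on an unexpected reply."""
    payload = post(API_URL, {"x-api-key": api_key}, {"url": page_url})
    try:
        markdown = payload["data"]["markdown"]
    except (KeyError, TypeError) as exc:
        # Assumption: any reply without data.markdown is treated as a failure.
        raise ValueError(f"unexpected response shape: {payload!r}") from exc
    if not isinstance(markdown, str):
        raise ValueError("'markdown' field is not a string")
    return markdown
```

Calling `convert_to_markdown("https://en.wikipedia.org/wiki/Mars", "wtm_your_key")` performs the real request. In tests, pass a fake `post` function that returns a canned dictionary.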

  Works great with LangChain too              

  pip install langchain-wtmapi                                                                                          

  from langchain_wtmapi import WTMApiLoader                                                                             

  loader = WTMApiLoader(                                          
      urls=["https://docs.python.org/3/tutorial/"],                                                                     
      api_key="wtm_your_key",                 
  )                                                                                                                     
  docs = loader.load()                                            
  # Ready for your RAG pipeline                                                                                         
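Because headings survive the conversion, one simple way to chunk a page for retrieval is to split on them. The following is a rough sketch, not part of the WTM API or the LangChain loader: `split_by_headings` is a hypothetical helper that works on any Markdown string, and it ignores `#` lines inside fenced code blocks so that code comments are not mistaken for headings.

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # Markdown code fence marker


def split_by_headings(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into one chunk per heading, keeping the heading as the title."""
    chunks: List[Dict[str, str]] = []
    title = ""             # text before the first heading gets an empty title
    body: List[str] = []
    in_fence = False       # True while we are inside a fenced code block

    def flush() -> None:
        text = "\n".join(body).strip()
        if title or text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()                       # close the previous chunk
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()                               # close the final chunk
    return chunks
```

Assuming the loader returns standard LangChain `Document` objects, you could feed it `docs[0].page_content` and embed each chunk together with its title.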

  When to still use BeautifulSoup                                                                                       

  To be fair, BeautifulSoup is still great when you need to:                                                            
  - Extract specific elements (e.g. all prices on a page)
  - Parse XML/RSS feeds                                                                                                 
  - Work offline without API calls                                                                                      
  - Have full control over the parsing logic

  But if you just need web content as Markdown — for RAG, content migration, documentation archival — an API call is
  simpler, faster, and gives you better output.                                                                         

  Try it free                                 

  Live demo at https://wtmapi.com — 3 free conversions without signing up. Free tier: 50 calls/month.

  What do you think? Would love to hear what URLs you test it on.

I built an API that converts any webpage to clean Markdown in under 1 second

filtede98 — Sun, 05 Apr 2026 10:21:01 +0000

## The Problem

I was building a RAG pipeline and needed a way to feed web content into my LLM. The options were:

- Copy-paste the text manually: doesn't scale
- Use Beautiful Soup: returns raw text, loses all structure
- Use a headless browser: slow, expensive, complex to maintain

None of them gave me what I actually needed: structured Markdown that preserves headings, tables, code blocks, and
links.

## So I Built WTM API

One POST request. Any URL. Clean Markdown back in under 1 second.


bash
  curl -X POST https://wtmapi.com/api/v1/convert \                                                                      
    -H "x-api-key: YOUR_KEY" \                
    -H "Content-Type: application/json" \                                                                               
    -d '{"url": "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map"}'

  What you get back:                      

  # Array.prototype.map()                                                                                               

  The **map()** method of Array instances creates a new array                                                           
  populated with the results of calling a provided function                                                             
  on every element in the calling array.

  ## Syntax                                                       

  map(callbackFn)                                                 
  map(callbackFn, thisArg)

  ## Examples                             

  const numbers = [1, 4, 9];                                                                                            
  const roots = numbers.map((num) => Math.sqrt(num));
  // roots is now [1, 2, 3]                                                                                             

  Headings, code blocks with syntax hints, bold text, links — all preserved. Not just raw text.

  How It Works

  No headless browser. No Puppeteer. No Playwright.               

  The engine runs server-side with Cheerio (lightweight HTML parser) and does:

  1. Minimal cleanup — removes only <nav>, <script>, <style>, cookie banners
  2. Recursive conversion — walks the DOM tree and converts each element to its Markdown equivalent                     
  3. URL resolution — relative links become absolute URLs         
  4. Table conversion — HTML tables become proper Markdown tables                                                       

  The result is a faithful conversion of the page content, not a lossy text extraction.                                 
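To make the four steps above concrete, here is a rough Python sketch of the same idea. It is not the actual engine, which uses Cheerio in JavaScript. It swaps in Python's standard-library `html.parser`, handles only headings, bold, inline code, links, paragraphs, and simple tables, and leaves out cookie-banner detection, lists, and code blocks. All names (`Node`, `TreeBuilder`, `convert`, `html_to_markdown`) are made up for illustration.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Union
from urllib.parse import urljoin

SKIPPED = {"nav", "script", "style"}  # step 1: minimal cleanup
VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags without a closer


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: List[Union["Node", str]] = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Build a small DOM-like tree; assumes reasonably well-formed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def find_all(node: Node, tag: str) -> List[Node]:
    """Collect descendants with the given tag, in document order."""
    found: List[Node] = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                found.append(child)
            found.extend(find_all(child, tag))
    return found


def convert(node: Union[Node, str], base: str) -> str:
    """Step 2: recursively map each element to a Markdown equivalent."""
    if isinstance(node, str):
        return re.sub(r"\s+", " ", node)  # collapse whitespace in text nodes
    if node.tag in SKIPPED:
        return ""
    if node.tag == "table":  # step 4: table conversion, first row is the header
        rows = [[convert(cell, base).strip() for cell in tr.children
                 if isinstance(cell, Node) and cell.tag in ("th", "td")]
                for tr in find_all(node, "tr")]
        rows = [row for row in rows if row]
        if not rows:
            return ""
        lines = ["| " + " | ".join(row) + " |" for row in rows]
        lines.insert(1, "| " + " | ".join(["---"] * len(rows[0])) + " |")
        return "\n\n" + "\n".join(lines) + "\n\n"
    inner = "".join(convert(child, base) for child in node.children)
    if re.fullmatch(r"h[1-6]", node.tag):
        hashes = "#" * int(node.tag[1])
        return f"\n\n{hashes} {inner.strip()}\n\n"
    if node.tag in ("strong", "b"):
        return f"**{inner.strip()}**"
    if node.tag == "code":
        return f"`{inner}`"
    if node.tag == "a":  # step 3: resolve relative links against the page URL
        href = urljoin(base, node.attrs.get("href") or "")
        return f"[{inner.strip()}]({href})"
    if node.tag == "p":
        return f"\n\n{inner.strip()}\n\n"
    return inner  # unknown tags: keep their converted children


def html_to_markdown(html: str, base: str) -> str:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    markdown = convert(builder.root, base)
    markdown = re.sub(r"[ \t]*\n[ \t]*", "\n", markdown)  # trim around line breaks
    return re.sub(r"\n{3,}", "\n\n", markdown).strip()    # collapse blank runs
```

Building a tree first and then recursing keeps each rule local to one tag, which is roughly what walking a parsed DOM looks like in any language.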

  What I Used to Build It                                                                                               

  The entire stack runs on free tiers:                                                                                  

  ┌────────────┬────────────────────────────────────────────┐     
  │  Service   │                  Purpose                   │                                                           
  ├────────────┼────────────────────────────────────────────┤     
  │ Next.js 16 │ Framework (App Router)                     │                                                           
  ├────────────┼────────────────────────────────────────────┤     
  │ Supabase   │ Auth + PostgreSQL + Row Level Security     │
  ├────────────┼────────────────────────────────────────────┤
  │ Stripe     │ Subscription billing (Free/Pro/Enterprise) │
  ├────────────┼────────────────────────────────────────────┤                                                           
  │ Vercel     │ Hosting and deployment                     │                                                           
  ├────────────┼────────────────────────────────────────────┤                                                           
  │ Cheerio    │ HTML parsing engine                        │                                                           
  └────────────┴────────────────────────────────────────────┘     

  Total monthly cost: $0.                                         

  Pricing

  - Free: 50 calls/month (no credit card)     
  - Pro: $9/month — 10,000 calls                                                                                        
  - Enterprise: $49/month — 100,000 calls                         

  Try It Now

  There's a live demo on the site — 3 free conversions, no signup required. Paste any URL and see the output instantly.

  https://wtmapi.com                                              

  I'd love to hear what you think. What URLs would you test it on? What features would you want next?

Forem: filtede98
