<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Fuad Hasan</title>
    <description>The latest articles on Forem by Fuad Hasan (@devfuad).</description>
    <link>https://forem.com/devfuad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3804300%2Fb4f45c67-db37-4f31-8f95-65a9d8f04573.png</url>
      <title>Forem: Fuad Hasan</title>
      <link>https://forem.com/devfuad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/devfuad"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Automates Literature Reviews — Here's How It Works Under the Hood</title>
      <dc:creator>Fuad Hasan</dc:creator>
      <pubDate>Tue, 03 Mar 2026 16:41:26 +0000</pubDate>
      <link>https://forem.com/devfuad/i-built-an-ai-that-automates-literature-reviews-heres-how-it-works-under-the-hood-5gpk</link>
      <guid>https://forem.com/devfuad/i-built-an-ai-that-automates-literature-reviews-heres-how-it-works-under-the-hood-5gpk</guid>
      <description>&lt;p&gt;If you've ever had to do a systematic literature review — the kind where you manually search databases, download 80 PDFs, read each one, and paste findings into a spreadsheet — you know it's one of the most brutal parts of academic research. It takes weeks, sometimes months.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6ob39z2231pb8de5905.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd6ob39z2231pb8de5905.png" alt="Research Room AI" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I built Research Room AI (&lt;a href="https://researchroomai.com" rel="noopener noreferrer"&gt;https://researchroomai.com&lt;/a&gt;) to eliminate that pain. You type in a research topic, and the platform finds relevant papers, downloads the full-text PDFs (open-access only), reads them cover-to-cover with AI, and spits out a structured, exportable table of methodology, findings, and limitations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What It Actually Does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core user flow is four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define your topic&lt;/strong&gt; — Enter your research subject + constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure full texts&lt;/strong&gt; — The system identifies and downloads legal open-access PDFs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI synthesis&lt;/strong&gt; — An LLM reads each paper and extracts structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export &amp;amp; analyze&lt;/strong&gt; — Results land in a clean dashboard; download as CSV&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The hard part isn't any single step — it's making all four work together reliably at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tech Stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 (App Router) + Tailwind CSS&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Auth:&lt;/strong&gt; Supabase Auth&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL via Prisma ORM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue:&lt;/strong&gt; BullMQ on Redis&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Worker:&lt;/strong&gt; Separate Node.js service (Docker)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI:&lt;/strong&gt; Groq (fast LLM inference)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage:&lt;/strong&gt; Cloudflare R2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Payments:&lt;/strong&gt; Paddle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;APIs:&lt;/strong&gt; OpenAlex, Semantic Scholar, Google Scholar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Hardest Problem: Finding and Downloading PDFs Reliably&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This was the most frustrating engineering challenge. Academic papers live across hundreds of different publishers, repositories, and paywalls. My approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search OpenAlex / Semantic Scholar for papers matching the topic — these APIs return rich metadata including DOIs and, crucially, open-access PDF URLs.&lt;/li&gt;
&lt;li&gt;Multi-source resolution — if the primary URL fails, fall back to Unpaywall, arXiv, PubMed Central, and institutional repositories.&lt;/li&gt;
&lt;li&gt;Compliance guardrails — only download PDFs explicitly flagged as open-access. No paywalled content, ever.&lt;/li&gt;
&lt;/ol&gt;
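&lt;p&gt;The multi-source fallback can be sketched like this (the resolver names and signatures here are illustrative, not the actual service code):&lt;/p&gt;

```javascript
// Hypothetical sketch of the multi-source fallback described above.
// Each resolver takes a DOI and returns an open-access PDF URL or null.
async function resolvePdfUrl(doi, resolvers) {
  for (const resolve of resolvers) {
    try {
      const url = await resolve(doi); // ask this source for a PDF URL
      if (url) return url;            // first hit wins
    } catch {
      // a failed source just falls through to the next one
    }
  }
  return null; // no open-access copy found anywhere
}
```

&lt;p&gt;Ordering the resolvers from richest metadata to last resort keeps the common case fast while still catching the long tail.&lt;/p&gt;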

&lt;p&gt;The PDF resolver service (worker/src/services/pdf-resolver.ts) handles retry logic, redirect chains, and content-type validation. A surprising number of "PDF links" actually serve HTML error pages — you have to check MIME types after download, not before.&lt;/p&gt;
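&lt;p&gt;A minimal sketch of that post-download check (the function name is hypothetical, but the magic-byte test is a standard way to catch HTML masquerading as a PDF):&lt;/p&gt;

```javascript
// Sketch: validate the bytes you actually received, not the headers.
// Many broken "PDF links" return an HTML error page with a 200 status.
function looksLikePdf(buf) {
  // Every valid PDF file starts with the magic bytes "%PDF-".
  if (buf.length >= 5) {
    return buf.subarray(0, 5).toString("latin1") === "%PDF-";
  }
  return false;
}
```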

&lt;p&gt;&lt;strong&gt;The Worker Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The main Next.js app and the AI processing worker are fully separate services. This was the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Next.js app stays fast and responsive — it just enqueues jobs&lt;/li&gt;
&lt;li&gt;The worker can be scaled independently and redeployed without touching the frontend&lt;/li&gt;
&lt;li&gt;Long-running AI tasks (reading a 40-page paper) don't block HTTP request cycles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Jobs flow through BullMQ queues backed by Redis. The worker picks up a job, downloads the PDF, sends the text to Groq for extraction, and writes structured results back to Postgres.&lt;/p&gt;

&lt;p&gt;Simplified processor flow:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;async function processJob(job) {
  const paper = await resolvePDF(job.data.doi);
  const text = await extractText(paper.pdfBuffer);
  const analysis = await groqAnalyzer.extract(text, job.data.topic);
  await prisma.paperResult.create({ data: analysis });
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Groq's LPU inference is key here — it's fast enough that users see results streaming in within a reasonable time, rather than waiting 20 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Database Schema Challenge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every literature review has a different set of columns. One researcher wants &lt;code&gt;sample_size&lt;/code&gt;, &lt;code&gt;study_design&lt;/code&gt;, &lt;code&gt;country&lt;/code&gt;. Another wants &lt;code&gt;model_accuracy&lt;/code&gt;, &lt;code&gt;dataset&lt;/code&gt;, &lt;code&gt;limitations&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My solution: store extracted fields as a flexible JSON blob alongside a set of review-level column definitions the user can configure. This gives relational integrity for project-level data while keeping per-paper results flexible.&lt;/p&gt;
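&lt;p&gt;A minimal sketch of how that projection could look at export time (the function and field names are illustrative):&lt;/p&gt;

```javascript
// Sketch: map a flexible per-paper JSON blob onto the columns the user
// configured for this review, so every CSV row has the same shape.
function projectRow(columns, extracted) {
  const row = {};
  for (const col of columns) {
    // A field the LLM did not extract becomes an empty cell instead of
    // breaking the export.
    row[col] = extracted[col] ?? "";
  }
  return row;
}
```

&lt;p&gt;Unconfigured fields in the blob simply stay out of the export, so two reviews with different column sets can share one table.&lt;/p&gt;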

&lt;p&gt;&lt;strong&gt;The Subscription Model&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Free tier gets 3 review projects with a 30-day trial window. Premium unlocks unlimited reviews:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0 — 3 reviews&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium Monthly&lt;/td&gt;
&lt;td&gt;$19/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium Yearly&lt;/td&gt;
&lt;td&gt;$149/year ($12.42/mo)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I used Paddle for billing because it handles global VAT/tax compliance out of the box, which would otherwise be a compliance nightmare for a solo founder selling to universities worldwide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons Learned&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Decouple your AI work from your web server immediately.&lt;br&gt;
I initially processed PDFs in a Next.js API route. The first time a user uploaded a 200-page paper, the request timed out. Move to an async queue from day one.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Academic APIs are inconsistent — build defensive parsers.&lt;br&gt;
OpenAlex returns null for fields you'd expect to always have values. Semantic Scholar uses a different schema entirely. Write adapters for each source and never trust a field to always exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rate limiting is not optional.&lt;br&gt;
Without per-user rate limits on the processing queue, a single determined user could burn through thousands of API credits in minutes. BullMQ's job throttling saved me here.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Full-text &amp;gt; abstract for quality extraction.&lt;br&gt;
Early versions only sent abstracts to the LLM. The quality of extracted methodology was poor. Sending the full paper text (chunked for context window limits) dramatically improved accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stripe vs. Paddle — for research/academic niches, Paddle wins.&lt;br&gt;
Universities and research institutions are often in the EU, UK, or APAC. Paddle being the Merchant of Record means they handle VAT calculation and invoice compliance, which academics often need for expense reimbursement.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
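&lt;p&gt;The chunking step from lesson 4 can be sketched like this (the function and its parameters are illustrative; production values may differ):&lt;/p&gt;

```javascript
// Sketch of overlapping chunking for context-window limits: each chunk
// repeats the tail of the previous one so sentences spanning a boundary
// are never lost.
function chunkText(text, chunkSize, overlap) {
  const chunks = [];
  let start = 0;
  while (text.length > start) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap; // step forward, keeping `overlap` chars
  }
  return chunks;
}
```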

&lt;p&gt;&lt;strong&gt;Try It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you do any kind of research — academic, market, scientific — give it a shot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.tourl"&gt;https://researchroomai.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The free tier gets you 3 full literature reviews with no credit card required.&lt;/p&gt;

&lt;p&gt;Happy to answer questions about any part of the architecture in the comments. Building AI tooling for academia is an underexplored niche with real pain to solve — the manual review process genuinely hasn't changed since the 1990s.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>automation</category>
      <category>saas</category>
      <category>researchtool</category>
    </item>
  </channel>
</rss>
