<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shah Pourazadi</title>
    <description>The latest articles on Forem by Shah Pourazadi (@shah_pourazadi_5da55a8db5).</description>
    <link>https://forem.com/shah_pourazadi_5da55a8db5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3446190%2F4511c68c-eb8b-400a-81df-87f3b030ef87.png</url>
      <title>Forem: Shah Pourazadi</title>
      <link>https://forem.com/shah_pourazadi_5da55a8db5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shah_pourazadi_5da55a8db5"/>
    <language>en</language>
    <item>
      <title>IdeaBot — a YouTube-driven Viral Topic Generator.</title>
      <dc:creator>Shah Pourazadi</dc:creator>
      <pubDate>Mon, 01 Sep 2025 02:33:45 +0000</pubDate>
      <link>https://forem.com/shah_pourazadi_5da55a8db5/ideabot-a-youtube-driven-viral-topic-generator-14bn</link>
      <guid>https://forem.com/shah_pourazadi_5da55a8db5/ideabot-a-youtube-driven-viral-topic-generator-14bn</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/brightdata-n8n-2025-08-13"&gt;AI Agents Challenge powered by n8n and Bright Data&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;IdeaBot — a YouTube-driven Viral Topic Generator.&lt;br&gt;
Give it any topic (e.g., “AI &amp;amp; Automation”) and it will:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Find trending YouTube results;&lt;/li&gt;
  &lt;li&gt;Pull transcripts and top comments;&lt;/li&gt;
  &lt;li&gt;Analyze patterns;&lt;/li&gt;
  &lt;li&gt;Generate suggested titles, short-form ideas, mini-scripts (3–5 lines), and social post drafts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Problem it solves: creators waste time guessing what will resonate. IdeaBot grounds ideation in what audiences are already engaging with (comments) and what’s working now (recent videos), then turns that signal into publish-ready prompts and snippets.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;Public Chat (n8n Chat Trigger): &lt;a href="https://flow.wemakeflow.com/webhook/ac48b635-40d1-4c83-aa5a-fbf2cb5ba546/chat" rel="noopener noreferrer"&gt;https://flow.wemakeflow.com/webhook/ac48b635-40d1-4c83-aa5a-fbf2cb5ba546/chat&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;n8n Workflow&lt;/h3&gt;

&lt;p&gt;Workflow JSON (Gist): &lt;a href="https://gist.github.com/azadishahab/677424d5a84f570ebbf2fb83544119b6" rel="noopener noreferrer"&gt;https://gist.github.com/azadishahab/677424d5a84f570ebbf2fb83544119b6&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Technical Implementation&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent &amp;amp; Model Setup&lt;/strong&gt;&lt;br&gt;
Chat entrypoint: When chat message received.&lt;/p&gt;

&lt;p&gt;Models: Google Gemini Chat Model nodes power the agents.&lt;/p&gt;

&lt;p&gt;Google Gemini Chat Model1 → explicitly set to models/gemini-2.5-flash-lite (drives the URL-builder agent).&lt;/p&gt;

&lt;p&gt;Google Gemini Chat Model → default Gemini chat model (drives parsing/summarization/repurposing agents).&lt;/p&gt;

&lt;p&gt;Agents (system instructions):&lt;/p&gt;

&lt;p&gt;AI Agent1 – SERP URL Builder. Generates a Google video search URL of the form&lt;br&gt;
https://www.google.com/search?q={query}&amp;amp;tbm=vid&amp;amp;gl={country}&lt;br&gt;
– {query} comes from the user prompt; {country} is a 2-letter country code (defaults to us if not specified).&lt;br&gt;
– Output contract: URL only (no extra text).&lt;/p&gt;
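&lt;p&gt;The builder's contract can be sketched as a small function (a hypothetical helper for illustration, not a node in the workflow):&lt;/p&gt;

```javascript
// Hypothetical helper mirroring AI Agent1's output contract: build the
// Google video SERP URL from the user's query plus an optional 2-letter
// country code defaulting to "us".
function buildSerpUrl(query, country) {
  const gl = (country || "us").toLowerCase();
  const params = new URLSearchParams({ q: query, tbm: "vid", gl: gl });
  // Parameters are percent-encoded and joined by URLSearchParams at runtime.
  return "https://www.google.com/search?" + params.toString();
}
```

&lt;p&gt;In the workflow this contract is enforced by the agent's system prompt rather than by code, which is why the "URL only, no extra text" rule matters downstream.&lt;/p&gt;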

&lt;p&gt;AI Agent2 – SERP Result Parser. Input: raw SERP payload. Task: extract YouTube video URLs and return them as an array.&lt;/p&gt;

&lt;p&gt;AI Agent – Transcript Summarizer. Input: video metadata/transcripts. Task: summarize each transcript into key notes for downstream repurposing.&lt;/p&gt;

&lt;p&gt;AI Agent3 – Content Repurposer. Input: transcript summaries + high-signal comments. Task: generate new, original ideas (publish-ready JSON in the final responders).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bright Data usage (nodes &amp;amp; flow)&lt;/strong&gt;&lt;br&gt;
Search (SERP):&lt;/p&gt;

&lt;p&gt;Node: Access and extract data from a specific URL (Bright Data Verified).&lt;/p&gt;

&lt;p&gt;serp_api1, country: us (default), url: {{$json.output}} (the URL built by AI Agent1).&lt;/p&gt;

&lt;p&gt;Responds via Respond to Chat (“Done searching Google…”) to keep the chat user informed.&lt;/p&gt;

&lt;p&gt;Video transcripts &amp;amp; metadata (YouTube – Video Posts dataset):&lt;/p&gt;

&lt;p&gt;Node: Extract structured data from a single URL2 → dataset “Youtube - Videos posts” (dataset_id: e.g., gd_lk56epmy2i5g7lzu0k).&lt;/p&gt;

&lt;p&gt;Input URLs: {{ $('Respond to Chat1').item.json.output.toJsonString() }} (the array of video URLs extracted earlier).&lt;/p&gt;

&lt;p&gt;Sort by views (desc) then likes (desc) → Limit to 2 top videos.&lt;/p&gt;

&lt;p&gt;Code node wraps those into { output: [{ url: ... }] } for consistent downstream shape.&lt;/p&gt;
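&lt;p&gt;The Sort → Limit → Code chain amounts to the following (a sketch; the views/likes/url field names are assumptions based on the dataset description):&lt;/p&gt;

```javascript
// Sketch of the Sort -> Limit -> Code steps as one function: sort by
// views desc, break ties by likes desc, keep the top N, and wrap the
// result in { output: [{ url }] } for a uniform downstream shape.
function topVideoUrls(videos, limit = 2) {
  const sorted = [...videos].sort(
    (a, b) => (b.views - a.views) || (b.likes - a.likes)
  );
  return { output: sorted.slice(0, limit).map((v) => ({ url: v.url })) };
}
```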

&lt;p&gt;YouTube comments (Comment Collector dataset) with polling:&lt;/p&gt;

&lt;p&gt;Node: Extract structured data from a single URL1 → dataset “Youtube - Comments” (dataset_id: e.g., gd_lk9q0ew71spt1mxywf).&lt;/p&gt;

&lt;p&gt;Snapshot polling loop: Edit Fields1 (capture snapshot_id) → Download the snapshot content → If status == "running" → Wait 6s → loop back to Download until done.&lt;/p&gt;
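&lt;p&gt;The polling loop reduces to a generic poll-until-done pattern (a sketch; fetchSnapshot is a hypothetical stand-in for the Download the snapshot content node):&lt;/p&gt;

```javascript
// Generic poll-until-done sketch of the snapshot loop above. fetchSnapshot
// is a hypothetical stand-in for the "Download the snapshot content" node;
// the 6000 ms default mirrors the Wait 6s node.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForSnapshot(fetchSnapshot, snapshotId, intervalMs) {
  for (;;) {
    const snap = await fetchSnapshot(snapshotId);
    if (snap.status !== "running") return snap; // finished (or failed): exit loop
    await sleep(intervalMs || 6000);
  }
}
```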

&lt;p&gt;Filter1: keep only comments with likes &amp;gt; 60 (noise reduction).&lt;/p&gt;

&lt;p&gt;Aggregate: consolidate high-signal comment_text for analysis.&lt;/p&gt;
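&lt;p&gt;Filter1 and Aggregate together behave like this (a sketch; the likes/comment_text field names follow the description above, not a confirmed dataset schema):&lt;/p&gt;

```javascript
// Sketch of the Filter1 + Aggregate pair: keep only high-signal comments
// (likes above the threshold) and collect their text for analysis.
function aggregateComments(comments, minLikes = 60) {
  return comments
    .filter((c) => c.likes > minLikes)
    .map((c) => c.comment_text);
}
```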

&lt;p&gt;&lt;strong&gt;Data shaping &amp;amp; analysis pipeline&lt;/strong&gt;&lt;br&gt;
SERP URL Builder (AI Agent1) → Bright Data SERP fetch → AI Agent2 extracts an array of YouTube URLs → Respond to Chat1 acknowledges URL collection.&lt;/p&gt;

&lt;p&gt;Video Posts dataset (transcripts/metadata) → Sort → Limit (2) → Code packaging → Respond to Chat3 (status update) → Comments dataset (with polling) → Filter1 (likes&amp;gt;60) → Aggregate (comment texts).&lt;/p&gt;

&lt;p&gt;Summarization branch: the Limit node also feeds AI Agent (Summarizer) to create concise transcript summaries.&lt;/p&gt;

&lt;p&gt;Merge:&lt;/p&gt;

&lt;p&gt;Aggregate1 collects summarizer outputs;&lt;/p&gt;

&lt;p&gt;Aggregate (comments) merges via Merge → Aggregate2 (aggregateAllItemData) to a single payload.&lt;/p&gt;

&lt;p&gt;Content generation: AI Agent3 (Repurposer) transforms summaries + comments into the final JSON ideas package.&lt;/p&gt;

&lt;p&gt;Final reply: Respond to Chat2 returns the JSON object to the user.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompting &amp;amp; contracts (highlights from node configs)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;URL Builder (Agent1): strict instruction to output only the correctly-formed SERP URL with tbm=vid and default gl=us.&lt;/p&gt;

&lt;p&gt;Parser (Agent2): extract YouTube URLs array from SERP results (no prose).&lt;/p&gt;

&lt;p&gt;Summarizer: “Summarize the video transcription, keep all important notes … used for content repurpose.”&lt;/p&gt;

&lt;p&gt;Repurposer (Agent3): “You are the Content Repurposer Agent… generate fresh, original content ideas based on video summaries + top comments.”&lt;/p&gt;

&lt;p&gt;Final schema: returned by Respond to Chat as a JSON payload (titles, short-form ideas, mini-scripts, post drafts).&lt;/p&gt;
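&lt;p&gt;For illustration, the payload might look like this (field names and values are assumptions; the actual schema is defined in the Repurposer prompt):&lt;/p&gt;

```javascript
// Illustrative shape of the final ideas payload returned to the chat.
// All field names and example values here are assumptions, not the
// workflow's confirmed schema.
const exampleIdeasPayload = {
  titles: ["5 AI Automations That Replace a Full-Time Job"],
  short_form_ideas: ["30-second demo: an inbox-triage agent built in n8n"],
  mini_scripts: [
    "Hook: Still sorting email by hand? Body: Here is a 3-node agent that does it. CTA: Template linked below."
  ],
  post_drafts: ["Spent the weekend teaching n8n to read my inbox. Here is what happened."]
};
```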

&lt;p&gt;&lt;strong&gt;Memory / conversation behavior&lt;/strong&gt;&lt;br&gt;
Workflow uses Chat Trigger (public) with responseNodes mode and several Respond to Chat status messages.&lt;/p&gt;

&lt;p&gt;There is no dedicated memory node in this export; each run is effectively stateless (refinements re-enter the flow). You can add an n8n Chat Memory Manager / Window Buffer Memory later if you want multi-turn refinement without re-scraping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notable safeguards &amp;amp; heuristics&lt;/strong&gt;&lt;br&gt;
Comment quality gate: likes &amp;gt; 60 to boost signal.&lt;/p&gt;

&lt;p&gt;Top-video cap: Limit 2 (fast, token-efficient).&lt;/p&gt;

&lt;p&gt;Polling loop: waits for Bright Data comment snapshots to complete before analysis.&lt;/p&gt;

&lt;p&gt;Code shaping: wraps arrays into { output: [...] } so downstream Bright Data nodes accept uniform input.&lt;/p&gt;

&lt;h3&gt;Bright Data Verified Node&lt;/h3&gt;

&lt;p&gt;How it’s used end-to-end:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SERP (video) fetch&lt;/strong&gt;&lt;br&gt;
Node: Access and extract data from a specific URL&lt;/p&gt;

&lt;p&gt;serp_api1; gl defaults to us; URL pattern https://www.google.com/search?q={query}&amp;amp;tbm=vid&amp;amp;gl={country} generated upstream.&lt;/p&gt;

&lt;p&gt;Output is handed to AI Agent to extract YouTube links.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Video Post (transcripts &amp;amp; metadata)&lt;/strong&gt;&lt;br&gt;
Node: Extract structured data from a single URL&lt;/p&gt;

&lt;p&gt;Dataset: e.g., gd_lk56epmy2i5g7lzu0k (“Youtube - Videos posts”)&lt;/p&gt;

&lt;p&gt;Flow: Sort (views, likes) → Limit top 2..5 → Code to produce {output:[{url:...}]} for downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comment Collector&lt;/strong&gt;&lt;br&gt;
Node: Extract structured data from a single URL&lt;/p&gt;

&lt;p&gt;Dataset: e.g., gd_lk9q0ew71spt1mxywf (“Youtube - Comments”)&lt;/p&gt;

&lt;p&gt;Snapshot poll: Edit Fields → Wait → Download snapshot content → If (status=="running") loop back → else continue.&lt;/p&gt;

&lt;p&gt;Quality: Filter (e.g., likes &amp;gt; 60) → Aggregate to merge comment text for analysis.&lt;/p&gt;

&lt;p&gt;This pairing (SERP → Video Post → Comment Collector) yields fresh, structured inputs resilient to blocking, enabling reliable analysis and ideation.&lt;/p&gt;

&lt;h2&gt;Journey&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Process:&lt;/strong&gt;&lt;br&gt;
Started from a clear target: ideas tied to real audience demand.&lt;/p&gt;

&lt;p&gt;Built a prompt→URL Builder so users can stay free-form while the system enforces SERP correctness (tbm=vid, gl default).&lt;/p&gt;

&lt;p&gt;Split data collection into videos (transcripts) and comments, then layered agents: summarize, pattern-find, repurpose, respond.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges &amp;amp; Solutions:&lt;/strong&gt;&lt;br&gt;
SERP parsing reliability: Solved by chaining a Bright Data SERP fetch with an LLM Structured Output Parser to normalize video URLs.&lt;/p&gt;

&lt;p&gt;Snapshot polling for comments: Implemented a Wait + If loop to poll until snapshot completion, then filtered by likes for signal.&lt;/p&gt;

&lt;p&gt;Token/length limits: Summarizer truncates transcripts; comments are filtered before aggregation.&lt;/p&gt;

&lt;p&gt;Keeping outputs actionable: A dedicated Repurposer prompt that forces new ideas to be inspired by (not copied from) summaries + comments, then formats to a strict JSON schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt;&lt;br&gt;
Enforcing tool contracts (I/O shapes per agent) makes multi-agent flows robust.&lt;/p&gt;

&lt;p&gt;Bright Data’s datasets + polling patterns are a clean fit for n8n; pairing them with lightweight LLM parsing yields dependable, real-time pipelines.&lt;/p&gt;

&lt;p&gt;A small amount of structure (sorting by views/likes, comment like-thresholds) dramatically improves idea quality and virality potential.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>n8nbrightdatachallenge</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
