DEV Community

Amit Wani


DecipherIt: Building a NotebookLM-Inspired AI Research Assistant powered by Bright Data

This is a submission for the Bright Data AI Web Access Hackathon

What I Built

DecipherIt is a cutting-edge AI-powered research assistant inspired by Google NotebookLM that revolutionizes how researchers, students, and professionals explore, analyze, and synthesize information from the web. The platform transforms any combination of documents, URLs, or topics into comprehensive research notebooks complete with AI-generated summaries, interactive Q&A capabilities, audio overviews, visual mindmaps, and automatically generated FAQs.

The Problem DecipherIt Solves

Traditional research is time-consuming and fragmented. Researchers often struggle with:

  • Information Overload: Sifting through countless sources manually
  • Geo-restrictions: Unable to access content from different regions
  • Bot Detection: Getting blocked when trying to scrape valuable data
  • Synthesis Challenges: Difficulty connecting insights across multiple sources
  • Accessibility: Converting research into different formats for various audiences

DecipherIt addresses these challenges by leveraging Bright Data's MCP Server to provide unrestricted, intelligent web access combined with advanced AI agents that can understand, synthesize, and present information in multiple formats.

Key Features

🔬 Deep Research - Conduct thorough research on any topic with AI-assisted analysis and synthesis
🔍 Multi-Source Research - Seamlessly integrate documents, URLs, and manual text into unified research spaces
🤖 AI-Powered Summaries - Generate comprehensive, well-structured research analyses using advanced AI agents
💬 Interactive Q&A - Chat with your research materials using natural language queries
🎧 Audio Overviews - AI-generated podcast-style audio summaries with multiple voices
❓ Smart FAQ Generation - Automatically create relevant FAQs from your research content
🧠 Visual Mindmaps - Generate interactive, hierarchical mindmaps to visualize research structure and connections
🌐 Global Web Access - Bypass geo-restrictions and bot detection using Bright Data's infrastructure

🔍 Detailed Feature Overview

🔬 Deep Research
DecipherIt's AI agents conduct comprehensive research by strategically planning data collection, discovering diverse sources through Bright Data's global search capabilities, and synthesizing information from multiple perspectives. The system can research any topic from current events to academic subjects, providing thorough analysis that rivals human researchers.

🔍 Multi-Source Research
Users can combine various input types in a single research project: upload documents (PDF, DOCX, PPTX, XLSX), add custom URLs for specific web content, input manual text for direct analysis, or simply enter topics for AI-driven discovery. All sources are processed and integrated into a unified research space.
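One way to picture the four input types is a small classifier that tags each raw input before it is processed. This is only a sketch: the `SourceInput` class, the `classify_source` function, and the topic-versus-text heuristic are hypothetical and not taken from the DecipherIt codebase. Only the list of document extensions comes from the description above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Document extensions listed in the post; everything else here is illustrative.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx"}


@dataclass(frozen=True)
class SourceInput:
    kind: str   # "document", "url", "text", or "topic"
    value: str


def classify_source(value: str, *, is_upload: bool = False) -> SourceInput:
    """Guess which of the four input types a raw value belongs to."""
    value = value.strip()
    if is_upload:
        suffix = PurePosixPath(value).suffix.lower()
        if suffix not in DOCUMENT_EXTENSIONS:
            raise ValueError(f"unsupported document type: {suffix or 'none'}")
        return SourceInput("document", value)
    parsed = urlparse(value)
    if parsed.scheme in {"http", "https"} and parsed.netloc:
        return SourceInput("url", value)
    # Hypothetical heuristic: a short single-line input is treated as a topic,
    # anything longer or multi-line as manual text.
    if "\n" not in value and len(value.split()) <= 12:
        return SourceInput("topic", value)
    return SourceInput("text", value)


if __name__ == "__main__":
    for raw in ["https://example.com/article", "solid state batteries"]:
        print(classify_source(raw))
```

Tagging inputs up front keeps the later stages simple: documents go to a converter, URLs to the scraper, and topics to the discovery flow.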

🤖 AI-Powered Summaries
Specialized CrewAI agents work together to create comprehensive research summaries. The Research Analyst synthesizes information from all sources, while the Content Writer crafts engaging, well-structured analyses that highlight key insights, trends, and connections across the research material.

💬 Interactive Q&A
Using vector embeddings and semantic search through the Qdrant vector database, users can ask natural language questions about their research content. The system answers in context by retrieving the most relevant passages from all processed sources, enabling deep exploration of the research material.

🎧 Audio Overviews
Audio overviews are generated on demand and turn research into podcast-style audio. The Podcast Script Generator agent writes a conversational script, which is then converted to high-quality audio using LemonFox TTS with multiple AI voices, making the research accessible in audio format.

❓ Smart FAQ Generation
AI agents automatically analyze research content to generate relevant, insightful questions and comprehensive answers. This feature helps users understand key aspects of their research topic and provides quick access to important information.

🧠 Visual Mindmaps
The Mindmap Creator agent analyzes research structure to generate interactive, hierarchical visualizations with up to 5 levels of depth. Built with react-mindmap-visualiser, these mindmaps help users understand complex topics at a glance and navigate research relationships visually.

🌐 Global Web Access
Powered by Bright Data's MCP Server, DecipherIt bypasses geo-restrictions and bot detection to access content from anywhere in the world. This ensures comprehensive research coverage and access to diverse, authoritative sources that traditional scraping methods cannot reach.

Demo

🚀 Live Demo: https://decipherit.xyz

Demo Credentials:

📂 GitHub Repository: mtwn105 / decipher-research-agent

Turn topics, links, and files into AI-generated research notebooks: summarize, explore, and ask anything.


Video Demo

📺 Watch DecipherIt in action:

The video demonstrates:

  • Setting up a new research notebook
  • Adding multiple sources (URLs, documents, text)
  • AI-powered research and analysis process
  • Exploring generated summaries and insights
  • Using interactive features like Q&A and mindmaps
  • Generating audio overviews

Screenshots

  • Landing page
  • Dashboard
  • Create new notebook
  • Create new notebook (2)
  • Notebook
  • Notebook summary
  • Notebook audio
  • Notebook chat
  • Notebook FAQ
  • Notebook mindmap

How It Works

  1. Input Your Research Sources: Enter any topic, upload documents, add custom URLs, or input manual text
  2. AI Planning: The system creates a strategic research plan using specialized AI agents
  3. Web Discovery: Bright Data's search engine finds relevant sources globally
  4. Intelligent Scraping: Bright Data extracts content and converts it to clean markdown format
  5. AI Analysis: Multiple AI agents analyze, synthesize, and create comprehensive summaries
  6. Multi-Format Output: Get research summaries, FAQs, visual mindmaps, and podcast-style audio overviews
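The six steps above can be sketched as a small async pipeline. Every function here is a hypothetical stand-in that returns canned data; in the real project these stages are CrewAI crews and Bright Data tool calls, so treat this as an outline of the data flow rather than the actual implementation.

```python
import asyncio

# Every function below is a hypothetical stand-in for a crew or a Bright Data call.


async def plan_queries(topic: str) -> list[str]:
    # Step 2: the planning crew produces targeted search queries (3 in the post).
    return [f"{topic} overview", f"{topic} recent developments", f"{topic} criticism"]


async def discover_links(query: str) -> list[str]:
    # Step 3: stands in for a search_engine call.
    return [f"https://example.com/{query.replace(' ', '-')}"]


async def scrape(url: str) -> dict:
    # Step 4: stands in for scrape_as_markdown.
    return {"url": url, "content": f"# Notes from {url}"}


async def analyze(pages: list[dict]) -> dict:
    # Steps 5 and 6: synthesis, reduced here to a one-line summary.
    return {
        "summary": f"Synthesized {len(pages)} sources",
        "sources": [page["url"] for page in pages],
    }


async def run_pipeline(topic: str) -> dict:
    queries = await plan_queries(topic)
    link_batches = await asyncio.gather(*(discover_links(q) for q in queries))
    # Flatten and de-duplicate while preserving discovery order.
    links = list(dict.fromkeys(url for batch in link_batches for url in batch))
    pages = await asyncio.gather(*(scrape(url) for url in links))
    return await analyze(list(pages))


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("solid state batteries")))
```

Discovery and scraping are the stages that benefit from running concurrently, since each query or URL is independent of the others.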

Tech Stack

Frontend

  • Next.js 15 with App Router
  • React 19 with concurrent features
  • TypeScript 5 for type safety
  • Tailwind CSS 4 for styling
  • Shadcn/ui component library
  • Better Auth for authentication
  • react-mindmap-visualiser for interactive mindmap visualization

Backend

  • Python 3.12 with FastAPI
  • CrewAI for multi-agent orchestration
  • Bright Data MCP Server for web access
  • Qdrant vector database for semantic search
  • SQLAlchemy with PostgreSQL
  • LemonFox TTS for audio generation

AI & ML Services

  • Google Gemini via OpenRouter for LLM capabilities
  • OpenAI Embeddings for semantic search
  • MarkItDown for document processing

CrewAI Crews Overview

DecipherIt employs a sophisticated multi-crew architecture powered by CrewAI:

Planning Crew

  • Agent: Web Scraping Strategy Expert
  • Task: Generate 3 targeted search queries for comprehensive topic coverage

Link Discovery Crew

  • Agent: Link Discovery Specialist
  • Task: Find authoritative sources using Bright Data search engine

Web Scraping Crew

  • Agent: Expert Web Scraping Engineer
  • Task: Extract clean markdown content from URLs using Bright Data scraper

Research Analysis Crew

  • Agent: Senior Research Analyst
  • Task: Synthesize multi-source data into comprehensive research insights

Content Creation Crew

  • Agents: Research Analyst + Content Writer
  • Tasks: Create engaging blog posts + Generate 10 detailed FAQs

Audio Overview Crew

  • Agents: Research Analyst + Conversation Planner + Script Writer
  • Tasks: Analyze content + Plan conversation + Generate 4-5 minute podcast transcript

Mindmap Generation Crew

  • Agents: Content Analyzer + Mindmap Creator
  • Tasks: Identify hierarchical themes + Build interactive visualizations (up to 5 levels)
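The 5-level limit can be illustrated with a plain tree and a function that drops anything deeper. This is a sketch under assumptions: `MindmapNode`, `limit_depth`, and the dictionary shape produced by `to_dict` are hypothetical, the root is counted as level 1, and the input format that react-mindmap-visualiser actually expects may differ.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 5  # The post states that mindmaps go up to 5 levels deep.


@dataclass
class MindmapNode:
    title: str
    children: list["MindmapNode"] = field(default_factory=list)


def limit_depth(node: MindmapNode, max_depth: int = MAX_DEPTH, level: int = 1) -> MindmapNode:
    """Return a copy of the tree with everything below max_depth dropped."""
    if level >= max_depth:
        return MindmapNode(node.title)
    return MindmapNode(
        node.title,
        [limit_depth(child, max_depth, level + 1) for child in node.children],
    )


def depth(node: MindmapNode) -> int:
    """Number of levels in the tree, counting the root as level 1."""
    return 1 + max((depth(child) for child in node.children), default=0)


def to_dict(node: MindmapNode) -> dict:
    """Serialize to a nested dict (hypothetical shape for a frontend component)."""
    return {"title": node.title, "children": [to_dict(c) for c in node.children]}


if __name__ == "__main__":
    tree = MindmapNode("AI research", [MindmapNode("Agents", [MindmapNode("Planning")])])
    print(to_dict(limit_depth(tree, max_depth=2)))
```

Enforcing the limit in code, rather than relying only on the prompt, guards against an agent that returns a deeper hierarchy than requested.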

Chat Response Crew

  • Agent: Decipher (Analytical Assistant)
  • Task: Answer questions using vector search and chat context with source citations

Architecture

Decipher-It Architecture

How I Used Bright Data's Infrastructure

Bright Data's MCP (Model Context Protocol) Server is the backbone of DecipherIt's web access capabilities. Here's how I integrated it:

1. Official MCP Server Integration

import os

from crewai_tools import MCPServerAdapter
from mcp import StdioServerParameters

# Launch the official Bright Data MCP server as a subprocess over stdio.
server_params = StdioServerParameters(
    command="pnpm",
    args=["dlx", "@brightdata/mcp"],
    env={
        # Both values are read from the environment; a missing variable raises KeyError.
        "API_TOKEN": os.environ["BRIGHT_DATA_API_TOKEN"],
        "BROWSER_AUTH": os.environ["BRIGHT_DATA_BROWSER_AUTH"],
    },
)

2. Two Core Tools Implementation

Search Engine Tool - For discovering relevant sources:

# Keep only the search tool for the link-collection agent.
# `tools` is assumed to be the tool list exposed by the MCP server adapter.
web_scraping_link_collector_tools = [
    tool for tool in tools if tool.name == "search_engine"
]

Scrape as Markdown Tool - For extracting clean content:

# Keep only the markdown scraper for the web-scraping agent.
# `tools` is assumed to be the tool list exposed by the MCP server adapter.
web_scraping_tools = [
    tool for tool in tools if tool.name == "scrape_as_markdown"
]

3. Multi-Agent Workflow

I created specialized CrewAI agents that leverage Bright Data's tools:

  • Link Collector Agent: Uses search_engine to find relevant sources based on research topics
  • Web Scraper Agent: Uses scrape_as_markdown to extract clean, structured content from discovered URLs
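Giving each agent only the tool it needs can be wrapped in a small helper that also fails loudly when an expected tool is missing, instead of silently handing the agent an empty list. This is a sketch, not code from the project: `FakeTool` is a stand-in for the real tool objects (only a `.name` attribute is assumed), and `select_tools` and `other_tool` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTool:
    """Stand-in for a tool object that exposes a .name attribute."""
    name: str


def select_tools(tools, wanted):
    """Return the tools whose name is in `wanted`, raising if any are missing."""
    wanted = set(wanted)
    selected = [tool for tool in tools if tool.name in wanted]
    missing = wanted - {tool.name for tool in selected}
    if missing:
        raise LookupError(f"MCP server did not expose: {sorted(missing)}")
    return selected


# "other_tool" is a made-up name used only to show that extra tools are filtered out.
available = [FakeTool("search_engine"), FakeTool("scrape_as_markdown"), FakeTool("other_tool")]
link_collector_tools = select_tools(available, ["search_engine"])
scraper_tools = select_tools(available, ["scrape_as_markdown"])
```

Restricting the tool list per agent keeps each agent's prompt focused and prevents, for example, the link collector from scraping pages itself.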

4. Parallel Processing for Scale

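import asyncio

# Create one scraping crew run per link. kickoff_async returns a coroutine,
# so the runs only start once they are gathered below.
web_scraping_tasks = []
for link in links:
    web_scraping_tasks.append(
        web_scraping_crew.kickoff_async(inputs={
            "url": link.url,
            "current_time": current_time,
        })
    )

# Run all scrapes concurrently (this line must sit inside an async function).
# Results come back in the same order as `links`.
web_scraping_results = await asyncio.gather(*web_scraping_tasks)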
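When the number of links grows, it can help to cap how many scrapes run at once and to keep one failure from discarding the other results. The following is a minimal sketch of that idea using only asyncio; `gather_bounded` and `fake_scrape` are hypothetical helpers, with `fake_scrape` standing in for a scraping crew run. It is not how DecipherIt currently does it.

```python
import asyncio


async def gather_bounded(coro_factories, limit=5):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results come back in input order. A failed job yields its exception
    object instead of cancelling the remaining jobs.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(
        *(run(factory) for factory in coro_factories),
        return_exceptions=True,
    )


async def fake_scrape(url):
    # Hypothetical stand-in for a scraping crew run.
    await asyncio.sleep(0)
    if "bad" in url:
        raise RuntimeError(f"could not scrape {url}")
    return f"markdown for {url}"


async def main():
    urls = ["https://example.com/a", "https://example.com/bad", "https://example.com/c"]
    results = await gather_bounded([lambda u=url: fake_scrape(u) for url in urls], limit=2)
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    return results, succeeded


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Passing factories rather than already-created coroutines means no work is created until a semaphore slot is free, and `return_exceptions=True` lets the caller decide what to do with partial results.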

5. Data Processing & AI Integration

Here's how the scraped data is processed and handed to the AI agents:

from datetime import datetime

# Process scraped content for AI analysis.
# create_embeddings, store_in_vector_db and research_content_crew are assumed
# to be defined elsewhere in the project.
async def integrate_scraped_data(web_scraping_results, links):
    scraped_data = []

    # Pair each link with its Bright Data result (both lists share the same order)
    for link, result in zip(links, web_scraping_results):
        scraped_data.append({
            "url": link.url,
            "content": result.raw,  # Clean markdown format
            "title": link.title
        })

    # Create vector embeddings for semantic search
    embeddings = await create_embeddings(scraped_data)

    # Store in Qdrant vector database
    await store_in_vector_db(embeddings, scraped_data)

    # Trigger AI analysis crew
    research_result = await research_content_crew.kickoff_async(inputs={
        "scraped_data": scraped_data,
        "current_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    })

    return research_result

Frontend integration with React and TypeScript:

import { useState } from "react";

// Research hook for managing AI-powered research.
// The Research and ResearchSource types are assumed to be defined elsewhere in the project.
const useResearch = () => {
  const [isLoading, setIsLoading] = useState(false);
  const [research, setResearch] = useState<Research | null>(null);

  const startResearch = async (sources: ResearchSource[]) => {
    setIsLoading(true);
    try {
      const response = await fetch("/api/research", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ sources }),
      });

      // fetch only rejects on network errors, so check the HTTP status explicitly
      if (!response.ok) {
        throw new Error(`Research request failed with status ${response.status}`);
      }

      const result = await response.json();
      setResearch(result);
    } catch (error) {
      console.error("Research failed:", error);
    } finally {
      setIsLoading(false);
    }
  };

  return { research, isLoading, startResearch };
};

6. Seamless Integration with AI Pipeline

The scraped data from Bright Data flows seamlessly into DecipherIt's multi-layered AI processing system:

Immediate Processing:

  • Vector embeddings created using OpenAI embeddings and stored in Qdrant for semantic search capabilities
  • Contextual analysis by Research Analyst agents to synthesize information from multiple sources
  • Automatic FAQ generation by analyzing content patterns and extracting key insights

On-Demand Generation:

  • Audio script creation when users request podcast-style overviews, with the finished script converted to speech by LemonFox TTS
  • Mindmap structure analysis for hierarchical visualization when users want visual representations
  • Interactive Q&A responses powered by vector similarity search through processed content
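The vector similarity search behind Q&A comes down to comparing a query embedding against stored chunk embeddings and keeping the closest matches. The sketch below does this in plain Python with toy 3-dimensional vectors. In DecipherIt the vectors come from OpenAI embeddings and the search runs inside Qdrant, so `cosine_similarity`, `top_k`, and the sample data are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, chunks, k=2):
    """chunks is a list of (text, vector); return the k best matches as (score, text)."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy vectors made up for illustration; real embeddings have hundreds of dimensions or more.
chunks = [
    ("Solid-state batteries use a solid electrolyte.", [0.9, 0.1, 0.0]),
    ("The conference takes place in Berlin.", [0.0, 0.2, 0.9]),
    ("Electrolyte choice affects charging speed.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of a question about battery electrolytes

if __name__ == "__main__":
    for score, text in top_k(query, chunks):
        print(f"{score:.3f}  {text}")
```

The retrieved chunks, together with their source URLs, are what a chat agent would receive as context so that it can answer with citations.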

Performance Improvements

Real-time web data access through Bright Data's infrastructure dramatically enhanced DecipherIt's AI system performance compared to traditional static data approaches:

🚀 Key Improvements

Real-Time Information Access: Unlike AI systems limited by training data cutoffs, DecipherIt's agents access current information including breaking news, latest research papers, and up-to-date statistics.

Global Content Discovery: Bright Data's search engine enables AI agents to discover diverse perspectives from global sources, access region-specific content, and find specialized publications that static systems cannot reach.

Clean Data Processing: Bright Data's scrape_as_markdown tool returns clean, structured content that the AI agents can process more effectively, improving analysis accuracy and reducing noise.

Future Enhancements

  • Interactive Mindmap: Enhanced mindmap features with expandable nodes, custom styling, and export options
  • Email Notifications: Automated email alerts for research completion and important updates
  • Mobile App: Native mobile experience for research on-the-go
  • More Robust Retry Mechanism: Improved background task handling with intelligent retry logic
  • Live Status Updates: Real-time agent activity monitoring and progress tracking
  • More Complex Scraping: Advanced web scraping capabilities for dynamic content and complex sites
  • Social Sign-ins: Integration with Google, GitHub, and other social authentication providers

Conclusion

DecipherIt demonstrates the power of combining Bright Data's robust web access infrastructure with advanced AI agents. By leveraging the Bright Data MCP Server, we've created a research assistant that can access, analyze, and synthesize information from across the global web without the typical limitations of traditional scraping methods.

The platform showcases how real-time web data access can dramatically improve AI system performance, making research faster, more comprehensive, and more reliable than ever before.

Special thanks to DEV.to and Bright Data for organizing this amazing hackathon that made DecipherIt possible! The opportunity to build with Bright Data's powerful infrastructure has been invaluable in bringing this project to life.
