<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shashank Chakraborty</title>
    <description>The latest articles on Forem by Shashank Chakraborty (@shashank0701byte).</description>
    <link>https://forem.com/shashank0701byte</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3703170%2F5c1a68c2-f820-481e-930a-00dbc79b0910.jpeg</url>
      <title>Forem: Shashank Chakraborty</title>
      <link>https://forem.com/shashank0701byte</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shashank0701byte"/>
    <language>en</language>
    <item>
      <title>Focus-Roast: Unleashing AI-Powered Shame to Conquer Procrastination</title>
      <dc:creator>Shashank Chakraborty</dc:creator>
      <pubDate>Tue, 13 Jan 2026 22:06:19 +0000</pubDate>
      <link>https://forem.com/shashank0701byte/focus-roast-unleashing-ai-powered-shame-to-conquer-procrastination-42ec</link>
      <guid>https://forem.com/shashank0701byte/focus-roast-unleashing-ai-powered-shame-to-conquer-procrastination-42ec</guid>
      <description>&lt;h1&gt;
  
  
  Focus-Roast: Unleashing AI-Powered Shame to Conquer Procrastination
&lt;/h1&gt;

&lt;p&gt;Tired of site blockers you politely ignore? We've all been there. Most productivity tools offer gentle nudges or a serene blank screen when you stray, making it all too easy to bypass them and dive back into the rabbit hole of distractions.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;Focus-Roast&lt;/strong&gt;: a novel, open-source approach to productivity that leverages artificial intelligence to ruthlessly bully you into staying on task. This isn't your grandma's site blocker; it's an uncompromising digital drill sergeant designed to make procrastination an uncomfortably public and embarrassing experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvia.placeholder.com%2F800x400%3Ftext%3DInsert%2BScreenshot%2Bof%2BFocus-Roast%2BRed%2BScreen%2Bwith%2BRoast" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvia.placeholder.com%2F800x400%3Ftext%3DInsert%2BScreenshot%2Bof%2BFocus-Roast%2BRed%2BScreen%2Bwith%2BRoast" alt="Focus-Roast Demo" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1: A mock-up of Focus-Roast in action, displaying a context-aware roast.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  😈 Core Functionality &amp;amp; Architecture
&lt;/h2&gt;

&lt;p&gt;Focus-Roast operates as a two-part system: a Python backend handles the AI heavy lifting, and a Chrome extension acts as the vigilant gatekeeper, intercepting your browsing habits. This architecture ensures your browsing data remains entirely local and private.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Context-Aware Roasts via Gemini AI:&lt;/strong&gt; At its heart, Focus-Roast integrates with &lt;strong&gt;Google's Gemini AI&lt;/strong&gt;. When you attempt to access a blacklisted site, the extension sends your predefined goal (e.g., "Study Calculus") and the offending URL (e.g., "instagram.com") to the backend. The AI then generates a savage, personalized insult tailored to your specific transgression.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Audio Guilt 🔊 (Text-to-Speech):&lt;/strong&gt; Not only does the roast appear on screen, but it's also immediately converted to speech and played aloud. Imagine trying to surreptitiously scroll through social media in a quiet office or library while Focus-Roast declares your "oxygen thief" status.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Gallery of Disappointment (Visual Deterrence):&lt;/strong&gt; To amplify the guilt, the system displays random GIFs featuring famously disappointed figures (e.g., Gordon Ramsay, characters from &lt;em&gt;The Office&lt;/em&gt;), providing a constant visual reminder of your failure to focus.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The "Walk of Shame" 🚶‍♂️ (Cognitive Friction):&lt;/strong&gt; There's no quick "Close" button. To regain access to the distracting site, you are forced to manually type the phrase: &lt;em&gt;"I am weak and lazy"&lt;/em&gt;. This deliberate act of self-admission creates significant cognitive friction, making you consciously confront your procrastination.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Shame Rank System 📉 (Gamified Accountability):&lt;/strong&gt; Focus-Roast keeps a persistent tally of your failures, categorizing your "shame rank":

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;0-2 attempts:&lt;/strong&gt; Safe... for now.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;3-5 attempts:&lt;/strong&gt; Certified Clown 🤡&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;10+ attempts:&lt;/strong&gt; Oxygen Thief 💀&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
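&lt;p&gt;To make that flow concrete, here is a minimal Python sketch of two of those pieces: turning the goal-plus-URL payload into a roast prompt, and the shame-rank tally. The function names and prompt wording are illustrative, not the project's actual code, and since the post leaves ranks for 6-9 attempts unspecified, the sketch keeps them at Clown.&lt;/p&gt;

```python
def build_roast_prompt(goal, blocked_url):
    # Hypothetical prompt shape for the Gemini call described above.
    return (
        f"The user promised to work on: {goal}. "
        f"They just tried to open {blocked_url}. "
        "Write one short, savage roast about this exact failure."
    )

def shame_rank(attempts):
    # Tiers from the post: 0-2 safe, 3-5 Certified Clown, 10+ Oxygen Thief.
    if attempts >= 10:
        return "Oxygen Thief 💀"
    if attempts >= 3:
        return "Certified Clown 🤡"
    return "Safe... for now."
```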
&lt;h2&gt;
  
  
  🛠️ Local Setup: Running Your Personal Roaster
&lt;/h2&gt;

&lt;p&gt;Focus-Roast is designed to run entirely locally, giving you full control over your data and eliminating any third-party server dependencies.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. The Brain (Backend) 🧠
&lt;/h3&gt;

&lt;p&gt;The backend is built with Python, leveraging &lt;code&gt;uvicorn&lt;/code&gt; and &lt;code&gt;FastAPI&lt;/code&gt; to serve the AI roasting logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/YOUR_USERNAME/focus-roast.git
&lt;span class="nb"&gt;cd &lt;/span&gt;focus-roast/backend

&lt;span class="c"&gt;# Install Python dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Configure your Gemini API Key&lt;/span&gt;
&lt;span class="c"&gt;# You'll need a free API key from Google AI Studio (aistudio.google.com).&lt;/span&gt;
&lt;span class="c"&gt;# Create a .env file in the 'backend' directory with your key:&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=AIzaSy..."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env

&lt;span class="c"&gt;# Start the roasting server&lt;/span&gt;
&lt;span class="c"&gt;# The --reload flag enables live reloading during development.&lt;/span&gt;
uvicorn main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure your &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; is correctly set in the &lt;code&gt;.env&lt;/code&gt; file. This key is crucial for the AI roast generation.&lt;/p&gt;
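&lt;p&gt;For the curious, the key-loading step is simple enough to sketch with the standard library alone. The backend most likely relies on the &lt;code&gt;python-dotenv&lt;/code&gt; package for this; the function below is just a stdlib illustration of what that parse amounts to:&lt;/p&gt;

```python
def parse_dotenv(text):
    # Stdlib sketch of a .env parse: KEY=VALUE lines into a dict,
    # skipping blanks and comments, stripping surrounding quotes.
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values
```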

&lt;h3&gt;
  
  
  2. The Trap (Chrome Extension) 🌐
&lt;/h3&gt;

&lt;p&gt;The frontend is a standard Chrome extension that monitors your browsing and communicates with your local backend.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open Chrome Extensions Manager:&lt;/strong&gt; Navigate to &lt;code&gt;chrome://extensions&lt;/code&gt; in your Chrome browser.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enable Developer Mode:&lt;/strong&gt; Toggle the "Developer mode" switch, usually found in the top-right corner.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Load Unpacked Extension:&lt;/strong&gt; Click the "Load unpacked" button.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Extension Directory:&lt;/strong&gt; Browse to and select the &lt;code&gt;focus-roast/extension&lt;/code&gt; folder from your cloned repository.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Focus-Roast icon should now appear in your browser's toolbar.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 How to Play (and Get Roasted)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Click the Extension Icon:&lt;/strong&gt; Locate and click the Focus-Roast icon in your Chrome toolbar.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enter Your Goal:&lt;/strong&gt; In the pop-up, clearly state your current productivity goal (e.g., "Finish my Resume," "Prepare for API interview").&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Click "Lock In 🔒":&lt;/strong&gt; This activates Focus-Roast.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Test Your Willpower:&lt;/strong&gt; Attempt to open a distracting site like Twitter, Instagram, Reddit, or YouTube.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Brace for Impact:&lt;/strong&gt; Experience the full force of AI-powered disappointment.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why Local? Privacy and Control
&lt;/h2&gt;

&lt;p&gt;One of the primary design decisions for Focus-Roast was to ensure complete privacy. By running the backend locally, your browsing habits and AI interactions never leave your machine. There are no external servers logging your activity, providing peace of mind alongside unparalleled productivity enforcement.&lt;/p&gt;

&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;This project is released under the &lt;strong&gt;MIT License&lt;/strong&gt;. Feel free to fork it, modify it, and contribute to making it even more potent (or, dare I say, even meaner). Your pull requests for new roast categories, improved AI prompts, or additional shame mechanisms are highly encouraged!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #productivity #ai #chrome-extension #python #gemini-api #fastapi #open-source #tutorial&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/Shashank0701-byte/roast-me" rel="noopener noreferrer"&gt;https://github.com/Shashank0701-byte/roast-me&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Tube-Code: Instantly Extract...</title>
      <dc:creator>Shashank Chakraborty</dc:creator>
      <pubDate>Mon, 12 Jan 2026 16:50:38 +0000</pubDate>
      <link>https://forem.com/shashank0701byte/tube-code-instantly-extract-5554</link>
      <guid>https://forem.com/shashank0701byte/tube-code-instantly-extract-5554</guid>
      <description>&lt;h1&gt;
  
  
  Tube-Code: Instantly Extract Code from YouTube Videos with AI
&lt;/h1&gt;

&lt;p&gt;Learning to code from YouTube tutorials is incredibly popular, but it comes with a hidden tax: the constant need to pause the video, switch tabs, and manually transcribe code snippets. This "Context Switching Tax" breaks your focus, slows down your learning, and frankly, is a frustrating waste of time. What if you could get runnable code directly from a video's transcript, without lifting a finger (or pressing the pause button)?&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;Tube-Code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tube-Code is a local-first browser extension that leverages AI to extract runnable code snippets directly from YouTube video transcripts. It's designed to eliminate friction, keep you in your learning flow, and give you the code you need, right when you need it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: The "Context Switching Tax"
&lt;/h2&gt;

&lt;p&gt;Developers are highly visual learners, and video tutorials have become an indispensable resource. However, this learning method introduces a significant bottleneck:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Lost Focus:&lt;/strong&gt; Every time you pause a video to type code, you disrupt your concentration.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Time Waste:&lt;/strong&gt; Studies (or anecdotal evidence, in this case!) suggest developers spend a significant portion of their learning time simply transcribing code. Manually retyping code is prone to errors and incredibly inefficient.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Broken Flow:&lt;/strong&gt; The constant back-and-forth between video and IDE shatters the immersive learning experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This constant context switching leads to frustration and slows down the overall learning process, making it harder to internalize new concepts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Seamless Code Extraction with Tube-Code
&lt;/h2&gt;

&lt;p&gt;Tube-Code tackles this problem head-on by providing an intelligent, automated way to extract code. Imagine a native-feeling button on your YouTube video player that, with a single click, places the relevant code snippet directly into your clipboard. That's Tube-Code.&lt;/p&gt;

&lt;p&gt;It achieves this by running a lightweight Python server on your local machine, designed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Intercept the Video ID:&lt;/strong&gt; The Chrome extension detects the current YouTube video and sends its ID to your local server.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fetch Raw Transcript:&lt;/strong&gt; Using &lt;code&gt;yt-dlp&lt;/code&gt;, the server fetches the raw transcript for the video. Crucially, this happens from your local machine, bypassing cloud IP blocks and bot detection that often hinder direct API access.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Process with AI:&lt;/strong&gt; The raw text is then fed into Google's Gemini 1.5 Flash model. This powerful AI identifies potential code snippets, cleans them up by removing conversational filler, speaker tags, and other non-code text, and formats them for immediate use.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Return Runnable Code:&lt;/strong&gt; The clean, formatted code is then sent back to your browser extension, ready to be copied to your clipboard with a single click.&lt;/li&gt;
&lt;/ol&gt;
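&lt;p&gt;Step 3 is where the interesting judgement happens. In the real project that call goes to Gemini; as a rough illustration of the kind of cleanup being asked for, here is a crude rule-based stand-in that drops bracketed cues and spoken filler (the real model additionally has to tell code apart from prose and reformat it, which no regex can do):&lt;/p&gt;

```python
import re

FILLER = ("um", "uh", "okay", "so")

def rough_clean(transcript):
    # Crude stand-in for the Gemini step: remove [Music]-style cues,
    # then drop obvious spoken filler words from each line.
    text = re.sub(r"\[.*?\]", "", transcript)
    lines = []
    for line in text.splitlines():
        words = [w for w in line.split() if w.lower().strip(",.") not in FILLER]
        if words:
            lines.append(" ".join(words))
    return "\n".join(lines)
```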




&lt;h2&gt;
  
  
  Tube-Code Architecture Breakdown
&lt;/h2&gt;

&lt;p&gt;Tube-Code employs a client-server architecture, keeping processing local for speed, privacy, and reliability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Client (The Eyes &amp;amp; Hands):&lt;/strong&gt; A Chrome Extension (Manifest V3) written in JavaScript. It injects a native-feeling button directly into the YouTube UI, handles user interaction, and communicates with the local server.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Server (The Brain):&lt;/strong&gt; A FastAPI application (Python) running on your local machine. This is the core logic engine, handling requests from the extension, coordinating transcript retrieval, and processing with AI.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Extraction (The Scraper):&lt;/strong&gt; &lt;code&gt;yt-dlp&lt;/code&gt;. This robust, open-source command-line program is an essential component for reliably fetching YouTube video transcripts, even in scenarios where direct API calls might be blocked.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Intelligence (The Editor):&lt;/strong&gt; Google Gemini 1.5 Flash. This powerful large language model is responsible for the heavy lifting of Natural Language Processing (NLP), distinguishing code from conversational text, and ensuring the extracted snippets are clean and runnable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The beauty of this local-first approach is enhanced privacy (your video history isn't sent to a third-party server) and resilience against platform changes, as &lt;code&gt;yt-dlp&lt;/code&gt; is constantly maintained to bypass detection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started: Install Tube-Code Today
&lt;/h2&gt;

&lt;p&gt;Ready to supercharge your YouTube learning? Follow these simple steps to get Tube-Code up and running on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before you begin, ensure you have the following installed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Python 3.10+&lt;/strong&gt;: Download and install from &lt;a href="https://www.python.org/downloads/" rel="noopener noreferrer"&gt;python.org&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Google Gemini API Key (Free Tier Available)&lt;/strong&gt;: You can obtain a free API key from the &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1. Setup the Backend (The Brain)
&lt;/h3&gt;

&lt;p&gt;This part involves cloning the repository, installing dependencies, and starting the local FastAPI server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository to your local machine&lt;/span&gt;
git clone https://github.com/YOUR_USERNAME/tube-code.git &lt;span class="c"&gt;# Replace YOUR_USERNAME with the actual repo path&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;tube-code/backend

&lt;span class="c"&gt;# Create a Python virtual environment (highly recommended)&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate &lt;span class="c"&gt;# On Windows, use `venv\Scripts\activate`&lt;/span&gt;

&lt;span class="c"&gt;# Install necessary Python dependencies&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Configure your Google Gemini API Key&lt;/span&gt;
&lt;span class="c"&gt;# Create a .env file in the `backend` directory&lt;/span&gt;
&lt;span class="c"&gt;# and add your API key as shown below:&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=AIzaSy..."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env
&lt;span class="c"&gt;# Make sure to replace "AIzaSy..." with your actual Gemini API Key!&lt;/span&gt;

&lt;span class="c"&gt;# Start the FastAPI server&lt;/span&gt;
&lt;span class="c"&gt;# The --reload flag will automatically restart the server on code changes&lt;/span&gt;
uvicorn main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Leave this terminal window open; the server needs to be running for the extension to work.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Install the Extension (The Eyes)
&lt;/h3&gt;

&lt;p&gt;Now, let's get the browser extension installed in Chrome.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Open Chrome and navigate to &lt;code&gt;chrome://extensions&lt;/code&gt;. You can type this directly into your address bar.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Enable Developer Mode&lt;/strong&gt;: In the top-right corner, toggle the "Developer mode" switch to &lt;code&gt;ON&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Load the Extension&lt;/strong&gt;: Click the "Load unpacked" button (usually on the top-left).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Select Extension Directory&lt;/strong&gt;: Navigate to the directory where you cloned Tube-Code earlier. Select the &lt;code&gt;extension&lt;/code&gt; subdirectory (e.g., &lt;code&gt;tube-code/extension&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! You should now see the Tube-Code extension listed.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Use Tube-Code
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Navigate to any YouTube video that contains code tutorials.&lt;/li&gt;
&lt;li&gt; A new "Get Code" button should appear near the video player (exact placement might vary based on YouTube UI updates).&lt;/li&gt;
&lt;li&gt; Click the "Get Code" button.&lt;/li&gt;
&lt;li&gt; Tube-Code will process the transcript. Once complete, the extracted code will be copied directly to your clipboard.&lt;/li&gt;
&lt;li&gt; Paste into your IDE and continue learning without interruption!&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Tube-Code is more than just a tool; it's a productivity enhancer designed to make learning from YouTube tutorials a smoother, more focused experience. By leveraging the power of local processing and advanced AI, we can reclaim valuable learning time and reduce the friction of manual transcription.&lt;/p&gt;

&lt;p&gt;Give Tube-Code a try and say goodbye to the "Context Switching Tax"!&lt;/p&gt;

&lt;p&gt;Feel free to contribute to the project, report issues, or suggest new features on the GitHub repository. Happy coding!&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/Shashank0701-byte/tube-code-backend?tab=readme-ov-file" rel="noopener noreferrer"&gt;https://github.com/Shashank0701-byte/tube-code-backend?tab=readme-ov-file&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building Interview prep ai</title>
      <dc:creator>Shashank Chakraborty</dc:creator>
      <pubDate>Sun, 11 Jan 2026 06:59:30 +0000</pubDate>
      <link>https://forem.com/shashank0701byte/building-interview-prep-ai-1bje</link>
      <guid>https://forem.com/shashank0701byte/building-interview-prep-ai-1bje</guid>
      <description>&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Building Interview Prep AI: A Deep Dive into Smart Learning and AI Feedback

## Introduction: Revolutionizing Technical Interview Preparation

The journey to landing a dream role in tech often hinges on mastering the technical interview. Traditional preparation methods—static flashcards, rote memorization, or generic practice questions—often fall short, failing to replicate the dynamic, pressure-filled environment of a real interview. This challenge sparked the idea for **Interview Prep AI**, a platform designed not just to help you *learn* the answers, but to *master* communicating them effectively.

This project was more than just a coding exercise; it was a deep dive into building an intelligent learning system. We aimed to create a comprehensive, feedback-driven ecosystem that empowers tech professionals to confidently articulate their knowledge. From leveraging AI for content generation to implementing sophisticated spaced repetition algorithms and real-time voice analysis, Interview Prep AI represents a modern approach to a timeless problem.

**✨ Experience it Live! ✨**
[**View the Live Application**](https://interview-prep-karo.netlify.app/)

---

## See Interview Prep AI in Action

Let's start with a quick visual tour of Interview Prep AI's core functionality. The following GIF showcases the end-to-end user journey: from efficiently creating a personalized interview deck to engaging in a review session, and finally, practicing aloud with intelligent feedback.

![Interview Prep AI Showcase GIF](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExaGc5c2R0dGZ5a2ZqZTV3cjV2c2w5eW5ocnZtZzB6Z2w0bHk2aW5oZiZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/your-gif-id/giphy.gif)

---

## Core Features: A Smarter, AI-Powered Workflow

Interview Prep AI is built around several key features, each designed to optimize the learning process and enhance interview readiness.

### 🤖 Build Custom Decks in Seconds

One of the biggest hurdles in interview preparation is knowing *what* to study. Interview Prep AI solves this by intelligently generating highly relevant question decks.

*   **Job Description Parsing:** Simply paste a link to a job description from any platform (LinkedIn, Indeed, etc.). Our AI analyzes the job requirements, responsibilities, and desired skills, then automatically generates a tailored set of interview questions specific to that role.
*   **Manual Deck Creation:** For those who prefer a hands-on approach, you can also manually build your own decks, adding questions and answers directly to suit your specific needs.
*   **Instant Relevance:** This feature ensures your study time is hyper-focused on what hiring managers are actually looking for, cutting down on generic, irrelevant prep.

### 🧠 Learn with Spaced Repetition (SRS)

At the heart of effective long-term learning is Spaced Repetition. Interview Prep AI integrates a sophisticated SRS algorithm to optimize your memory retention.

*   **Intelligent Scheduling:** Based on your performance during review sessions (how well you recall the answer), the system dynamically adjusts the intervals between your future reviews for each question.
*   **Long-Term Memory:** This scientifically backed approach helps move information from your short-term to long-term memory, ensuring that you don't just "cram" but truly understand and retain the material.
*   **Efficiency:** By prioritizing questions you struggle with and giving more time to those you know well, the SRS maximizes the efficiency of your study time.

### 🎙️ Practice Aloud, Get Real Feedback

Knowing an answer is one thing; articulating it clearly and confidently is another. Interview Prep AI addresses this critical aspect with AI-powered voice practice.

*   **Voice-Enabled Practice:** Speak your answers aloud, simulating a real interview scenario.
*   **Instant AI Feedback:** Our AI analyzes your spoken response (via speech-to-text and subsequent NLP processing) and provides immediate, constructive feedback on several key dimensions:
    *   **Content Accuracy &amp;amp; Relevance:** Does your answer directly address the question? Is it technically sound?
    *   **Clarity &amp;amp; Conciseness:** Are you getting to the point? Is your explanation easy to follow?
    *   **Completeness:** Have you covered all necessary aspects of the question without rambling?
    *   **Confidence &amp;amp; Fluency (Inferred):** While not a direct measure of tone, the AI can highlight areas where phrasing might be awkward or where a more direct approach could improve perceived confidence.
*   **Iterative Improvement:** This real-time feedback loop allows you to refine your communication style, improve your articulation, and build confidence with each practice session.

---

## Conclusion

Interview Prep AI is designed to be more than just a study tool; it's a personal AI interview coach that adapts to your learning style and helps you bridge the gap between knowing the material and confidently presenting it. By combining intelligent content generation, spaced repetition, and real-time voice feedback, we aim to transform how tech professionals approach their most critical interviews.

Give it a try and let us know your thoughts! Your feedback is invaluable as we continue to evolve this platform.

🔗 https://github.com/Shashank0701-byte/interview-prep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
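&lt;p&gt;One note on the spaced-repetition scheduling described above: the core update rule is small enough to sketch. The toy, SM-2-flavored version below (stretch the interval and nudge the ease factor up on a good recall, reset on a miss) is an illustration of the idea, not the platform's actual algorithm:&lt;/p&gt;

```python
def schedule(interval_days, ease, recalled):
    # Toy SM-2-flavored update: a good recall stretches the review gap
    # and nudges ease up; a miss resets the card to a 1-day interval.
    if not recalled:
        return 1, max(1.3, ease - 0.2)
    return max(1, round(interval_days * ease)), ease + 0.1
```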

</description>
      <category>ai</category>
      <category>interview</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building promobot: From Code to Content: Buildi...</title>
      <dc:creator>Shashank Chakraborty</dc:creator>
      <pubDate>Sat, 10 Jan 2026 16:38:58 +0000</pubDate>
      <link>https://forem.com/shashank0701byte/building-promobot-from-code-to-content-buildi-49f0</link>
      <guid>https://forem.com/shashank0701byte/building-promobot-from-code-to-content-buildi-49f0</guid>
      <description>&lt;h1&gt;
  
  
  From Code to Content: Building PromoBot, an AI-Powered Marketing Agent for Devs
&lt;/h1&gt;

&lt;p&gt;As developers, we pour our hearts and souls into building amazing projects. But once the code is written, a new, often dreaded, challenge arises: &lt;strong&gt;marketing&lt;/strong&gt;. Crafting unique launch announcements for Reddit, Dev.to, Twitter (X), and Peerlist – each with its own tone and audience – can be a time-consuming and demotivating chore.&lt;/p&gt;

&lt;p&gt;What if your project could market itself? What if your &lt;code&gt;README.md&lt;/code&gt; was all it took to kick off a multi-platform launch campaign tailored to each audience?&lt;/p&gt;

&lt;p&gt;That's the problem I set out to solve with &lt;strong&gt;PromoBot&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is PromoBot? The Autonomous Marketing Agent
&lt;/h2&gt;

&lt;p&gt;PromoBot is an autonomous "Marketing Agent" designed specifically for developers. It bridges the gap between your codebase and your community by automatically generating and publishing launch posts across multiple platforms.&lt;/p&gt;

&lt;p&gt;Here's the magic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Code to Content:&lt;/strong&gt; PromoBot reads your project's source code or, more commonly, your &lt;code&gt;README.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI-Powered Tone Adaptation:&lt;/strong&gt; Leveraging &lt;strong&gt;Google Gemini&lt;/strong&gt;, it crafts unique content tailored to the specific tone and style required by each platform.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Multi-Platform Publishing:&lt;/strong&gt; It then publishes these posts to &lt;strong&gt;Reddit, Dev.to, Twitter (X), and Peerlist&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;API-less Integration:&lt;/strong&gt; For platforms without robust public APIs (like X and Reddit), PromoBot employs &lt;strong&gt;Browser Automation (Playwright)&lt;/strong&gt; to seamlessly interact with the web interface.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Essentially, PromoBot turns your project's documentation into a dynamic marketing campaign, freeing you up to focus on what you do best: building.&lt;/p&gt;

&lt;h2&gt;
  
  
  Under the Hood: PromoBot's Event-Driven Architecture
&lt;/h2&gt;

&lt;p&gt;PromoBot is built with an Event-Driven, Microservices-like architecture, allowing for modularity, scalability, and resilience. Let's break down its core components:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 The Brain: Content Generation with Google Gemini
&lt;/h3&gt;

&lt;p&gt;At the heart of PromoBot's intelligence is &lt;strong&gt;Google Gemini 1.5 Flash&lt;/strong&gt;. Via a custom REST client, Gemini receives the project context (from your &lt;code&gt;README.md&lt;/code&gt;) and platform requirements. It's responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Understanding Project Context:&lt;/strong&gt; Analyzing the provided text to grasp the project's purpose, features, and target audience.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Tone Adaptation:&lt;/strong&gt; Dynamically adjusting the content's voice, style, and structure to match the distinct requirements of Reddit, Dev.to, X, and Peerlist.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Drafting Posts:&lt;/strong&gt; Generating the initial draft of the launch post for each platform.&lt;/li&gt;
&lt;/ul&gt;
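&lt;p&gt;A hypothetical sketch of how that per-platform tone adaptation can be driven from a single prompt builder (the style notes and function name here are invented for illustration; PromoBot's real prompts differ):&lt;/p&gt;

```python
# Invented per-platform style notes, for illustration only.
PLATFORM_STYLES = {
    "reddit": "conversational, no marketing speak, invite discussion",
    "dev.to": "tutorial tone, markdown headings, code-friendly",
    "x": "punchy, under 280 characters, one hook line",
    "peerlist": "professional, achievement-framed, concise",
}

def build_launch_prompt(platform, readme_text):
    # One builder, many tones: the platform picks the style instructions.
    style = PLATFORM_STYLES[platform]
    return (
        f"Write a launch post for {platform}. Style: {style}. "
        f"Base it only on this README:\n{readme_text}"
    )
```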

&lt;h3&gt;
  
  
  🦾 The Hands: Browser Automation with Playwright
&lt;/h3&gt;

&lt;p&gt;For platforms like Reddit, X, and Peerlist that either lack robust public APIs or actively restrict automated access, PromoBot uses &lt;strong&gt;Playwright (Python)&lt;/strong&gt;. Playwright acts as PromoBot's "hands," simulating real user interactions in a headless browser environment. This allows it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Navigate Websites:&lt;/strong&gt; Log in, find submission forms, and interact with UI elements.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Bypass Restrictions:&lt;/strong&gt; Implement strategies to mimic human browsing behavior, crucial for avoiding bot detection.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Publish Content:&lt;/strong&gt; Paste the AI-generated content and submit posts programmatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ⚡ The Nerves: Asynchronous Task Queues with Redis &amp;amp; Celery
&lt;/h3&gt;

&lt;p&gt;Publishing to multiple platforms, especially those involving browser automation, can be time-consuming and prone to transient failures. &lt;strong&gt;Redis&lt;/strong&gt; (as a message broker) and &lt;strong&gt;Celery&lt;/strong&gt; (as an asynchronous task queue) form PromoBot's "nerves," ensuring reliable execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Decoupled Operations:&lt;/strong&gt; Each publishing task (e.g., "post to Reddit," "post to X") is an independent Celery task.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Reliable Delivery:&lt;/strong&gt; If a task fails (e.g., network error, platform temporary issue), Celery can retry it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Concurrency:&lt;/strong&gt; Multiple tasks can run in parallel, speeding up the overall campaign.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 The Memory: Campaign Tracking with PostgreSQL
&lt;/h3&gt;

&lt;p&gt;To keep track of published campaigns, user sessions, and configurations, PromoBot utilizes &lt;strong&gt;PostgreSQL&lt;/strong&gt; as its relational database. &lt;strong&gt;SQLAlchemy&lt;/strong&gt; serves as the ORM, abstracting away raw SQL queries and providing an elegant way to interact with the database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Session Management:&lt;/strong&gt; Securely stores encrypted session cookies for universal authentication (explained below).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Campaign History:&lt;/strong&gt; Logs details of each launch, including which platforms were targeted and post statuses.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Configuration:&lt;/strong&gt; Stores platform-specific settings and user preferences.&lt;/li&gt;
&lt;/ul&gt;
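&lt;p&gt;A minimal sketch of what a campaign-history model could look like with SQLAlchemy (table and column names are illustrative; SQLite stands in for PostgreSQL so the example is self-contained):&lt;/p&gt;

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Campaign(Base):
    __tablename__ = "campaigns"
    id = Column(Integer, primary_key=True)
    project = Column(String, nullable=False)
    platform = Column(String, nullable=False)
    status = Column(String, default="pending")
    launched_at = Column(DateTime, default=datetime.datetime.utcnow)

# In production this would point at PostgreSQL, e.g. postgresql+psycopg2://...
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Campaign(project="promobot", platform="devto", status="published"))
    session.commit()
    count = session.query(Campaign).count()
```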

&lt;h3&gt;
  
  
  🧩 The Pattern: Extensibility with the Strategy Pattern
&lt;/h3&gt;

&lt;p&gt;To easily add support for new platforms in the future without modifying core logic, PromoBot employs the &lt;strong&gt;Strategy Pattern&lt;/strong&gt;. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Plugin-Based Architecture:&lt;/strong&gt; Each platform (Reddit, X, Dev.to, Peerlist) is treated as a separate "strategy" or plugin.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Decoupled Logic:&lt;/strong&gt; The main publishing orchestrator simply calls the &lt;code&gt;publish&lt;/code&gt; method on the appropriate strategy, abstracting away platform-specific implementation details.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Easy Expansion:&lt;/strong&gt; Adding a new platform merely requires implementing a new &lt;code&gt;PublisherStrategy&lt;/code&gt; class.&lt;/li&gt;
&lt;/ul&gt;
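&lt;p&gt;In code, the pattern might be sketched like this (&lt;code&gt;PublisherStrategy&lt;/code&gt; is named above; the concrete classes are illustrative stubs):&lt;/p&gt;

```python
from abc import ABC, abstractmethod

class PublisherStrategy(ABC):
    """Interface every platform plugin implements."""

    @abstractmethod
    def publish(self, content: str) -> str:
        ...

class DevToPublisher(PublisherStrategy):
    def publish(self, content: str) -> str:
        # A real implementation would call the Dev.to API.
        return f"dev.to: published {len(content)} chars"

class RedditPublisher(PublisherStrategy):
    def publish(self, content: str) -> str:
        # A real implementation would drive Playwright.
        return f"reddit: published {len(content)} chars"

def launch_campaign(strategies: list[PublisherStrategy], content: str) -> list[str]:
    # The orchestrator only knows the interface, never the platform details.
    return [s.publish(content) for s in strategies]

results = launch_campaign([DevToPublisher(), RedditPublisher()], "Hello, world!")
```

&lt;p&gt;Adding a new platform then means writing one new subclass and registering it, with no changes to the orchestrator.&lt;/p&gt;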

&lt;h2&gt;
  
  
  Key Features That Make PromoBot Shine
&lt;/h2&gt;

&lt;p&gt;Beyond its core architecture, PromoBot boasts several features designed to optimize your project launches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Universal Auth:&lt;/strong&gt; You log into Reddit/X/Peerlist just once. PromoBot securely saves the session state to the &lt;code&gt;secrets/&lt;/code&gt; directory, allowing subsequent runs to publish without re-authentication.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Context Aware:&lt;/strong&gt; PromoBot is smart. It navigates to your current project folder, reads your &lt;code&gt;README.md&lt;/code&gt;, and uses that as the primary source of truth for generating content. No need to copy-paste descriptions!&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Stealth Mode:&lt;/strong&gt; Publishing to social media platforms programmatically can trigger bot detection. PromoBot uses advanced browser flags and subtle interaction patterns via Playwright to bypass these mechanisms on X and Reddit, ensuring your posts go live.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic Tone Matching:&lt;/strong&gt; This is where the AI truly shines.

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reddit:&lt;/strong&gt; A casual, humble, "I built this, what do you think?" vibe, often encouraging feedback.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dev.to:&lt;/strong&gt; A technical, tutorial-style post, sharing insights into the build process or a specific problem solved.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Twitter (X):&lt;/strong&gt; Short, punchy, attention-grabbing tweets, heavily optimized for hashtags and quick consumption.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Peerlist:&lt;/strong&gt; A professional "Indie Hacker" tone, focusing on the problem solved, the tech, and the journey.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tech Stack Overview
&lt;/h2&gt;

&lt;p&gt;PromoBot is built with a modern, asynchronous, and robust Python stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Language:&lt;/strong&gt; Python 3.11 (leveraging its async capabilities)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frameworks:&lt;/strong&gt; Celery for task management, SQLAlchemy for ORM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker Compose for easily spinning up Redis and PostgreSQL.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Automation:&lt;/strong&gt; Playwright for headless browser control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Usage: Getting Started with PromoBot
&lt;/h2&gt;

&lt;p&gt;Ready to give PromoBot a spin? Once you have Docker Compose running Redis and Postgres, and your Python environment set up, using PromoBot is incredibly straightforward.&lt;/p&gt;

&lt;p&gt;Navigate to the root directory of any project you want to promote, and simply run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promobot all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command triggers PromoBot to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Read your current project's &lt;code&gt;README.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Generate platform-specific content using AI.&lt;/li&gt;
&lt;li&gt; Publish to all configured platforms (Reddit, Dev.to, X, Peerlist).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also run specific platform publishers if you only want to target certain channels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;PromoBot is more than just an automation tool; it's a strategic partner for developers who want to share their work effectively without the marketing overhead. By combining the power of AI with robust browser automation and an extensible architecture, PromoBot empowers you to turn your code into compelling content, tailored for every audience.&lt;/p&gt;

&lt;p&gt;Want to contribute or learn more about the internals? Check out the GitHub repository linked below.&lt;/p&gt;

&lt;p&gt;Happy coding, and happy promoting!&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/Shashank0701-byte/docuflow" rel="noopener noreferrer"&gt;https://github.com/Shashank0701-byte/docuflow&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>marketing</category>
      <category>automation</category>
    </item>
    <item>
      <title>How I Bypassed Google's Broken Python SDK to Build an AI Pipeline in Docker</title>
      <dc:creator>Shashank Chakraborty</dc:creator>
      <pubDate>Fri, 09 Jan 2026 20:17:35 +0000</pubDate>
      <link>https://forem.com/shashank0701byte/how-i-bypassed-googles-broken-python-sdk-to-build-an-ai-pipeline-in-docker-35n5</link>
      <guid>https://forem.com/shashank0701byte/how-i-bypassed-googles-broken-python-sdk-to-build-an-ai-pipeline-in-docker-35n5</guid>
      <description>&lt;p&gt;I spent the last week building DocuFlow, an event-driven data pipeline that automatically ingests PDF invoices and extracts structured financial data (Vendor, Date, Amount) using AI.&lt;/p&gt;

&lt;p&gt;The architecture was solid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Watcher Service&lt;/strong&gt; to detect new files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Redis &amp;amp; Celery&lt;/strong&gt; for asynchronous task queues.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PostgreSQL&lt;/strong&gt; for storage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Google Gemini 2.5 Flash&lt;/strong&gt; as the intelligence layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything worked perfectly on my local machine. But the moment I containerized it with Docker, everything crashed. 💥&lt;/p&gt;

&lt;p&gt;Here is the story of how a "simple" SDK versioning error nearly killed the project, and how I fixed it by ripping out the library and going raw.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: "404 Model Not Found"
&lt;/h2&gt;

&lt;p&gt;I was using the standard &lt;code&gt;google-generativeai&lt;/code&gt; Python library. In my Dockerfile, I was installing the latest dependencies.&lt;/p&gt;

&lt;p&gt;When I ran &lt;code&gt;docker compose up&lt;/code&gt;, my worker service threw this error immediately:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: 404 models/gemini-1.5-flash is not found for API version v1beta, or is not supported for generateContent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This made no sense. The model definitely exists. It worked on my laptop. Why was Docker failing?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause
&lt;/h2&gt;

&lt;p&gt;It turns out that Google is updating their GenAI SDKs so fast that version mismatches are common.&lt;/p&gt;

&lt;p&gt;My Docker container was pulling a slightly different version of the SDK than my local environment.&lt;/p&gt;

&lt;p&gt;The library was trying to hit a deprecated API endpoint (&lt;code&gt;v1beta&lt;/code&gt;) that didn't recognize the newer &lt;code&gt;gemini-2.5-flash&lt;/code&gt; model alias.&lt;/p&gt;

&lt;p&gt;I tried upgrading to the newer &lt;code&gt;google-genai&lt;/code&gt; library, but that introduced its own "library hell" with conflicting dependencies in my slim Docker image.&lt;/p&gt;

&lt;p&gt;I was stuck in dependency hell. 📉&lt;/p&gt;
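&lt;p&gt;For context, the usual first line of defense against this class of problem is pinning exact versions in &lt;code&gt;requirements.txt&lt;/code&gt; so local and Docker builds resolve identically (the version numbers below are illustrative, not the ones I used):&lt;/p&gt;

```plaintext
# Pin exact versions so "pip install" resolves the same everywhere.
google-generativeai==0.8.3
requests==2.32.3
```

&lt;p&gt;But pinning only freezes the mismatch in place; it doesn't fix an SDK that targets the wrong API endpoint.&lt;/p&gt;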

&lt;h2&gt;
  
  
  The Solution: The "Raw" Approach
&lt;/h2&gt;

&lt;p&gt;Instead of fighting pip and version numbers, I realized I didn't need the SDK. Under the hood, the SDK is just making HTTP requests.&lt;/p&gt;

&lt;p&gt;So, I fired the SDK. 🚫📦&lt;/p&gt;

&lt;p&gt;I rewrote the extraction engine using Python's standard requests library to hit the Gemini REST API directly. This gave me 100% control over the endpoint and the payload.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Here is the robust, Docker-proof implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import requests


def parse_invoice_with_rest(text):
    api_key = os.getenv("GEMINI_API_KEY")

    # Direct URL to the stable endpoint.
    # Note: using 'gemini-2.5-flash-latest' to pin the specific model version.
    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-latest:generateContent?key={api_key}"

    # Construct the payload manually.
    payload = {
        "contents": [{
            "parts": [{
                "text": f"Extract the Vendor, Date, and Total Amount from this OCR text: {text}. Return JSON only."
            }]
        }]
    }

    try:
        # Standard HTTP POST request.
        response = requests.post(
            url,
            headers={"Content-Type": "application/json"},
            json=payload,
        )

        if response.status_code != 200:
            print(f"Error: {response.text}")
            return None

        return response.json()

    except requests.RequestException as e:
        print(f"Connection failed: {e}")
        return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Why This Is Better
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Zero Dependency Hell:&lt;/strong&gt; I don't care if Google updates their Python SDK tomorrow. As long as the REST endpoint exists, my code works.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Lighter Containers:&lt;/strong&gt; I removed the heavy AI libraries from my &lt;code&gt;requirements.txt&lt;/code&gt;, making my Docker image smaller.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Debuggability:&lt;/strong&gt; When an error happens, I see the raw HTTP response code (400, 404, 500) instead of a cryptic Python stack trace.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Result: Green Logs 🟢
&lt;/h2&gt;

&lt;p&gt;After deploying the REST client, the pipeline processed the invoice in 1.8 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfha6uv5vyy1zmj0hhtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfha6uv5vyy1zmj0hhtw.png" alt="Green terminal logs showing successful extraction" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full pipeline now runs smoothly in Docker Compose, handling OCR (Tesseract), queuing (Redis), and AI extraction without a single library conflict.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you are building AI agents in production, especially in containerized environments, don't be afraid to bypass the "official" SDKs. Sometimes, a simple &lt;code&gt;curl&lt;/code&gt; or &lt;code&gt;requests.post&lt;/code&gt; is the most robust engineering decision you can make.&lt;/p&gt;

&lt;p&gt;Repo Link: &lt;a href="https://github.com/Shashank0701-byte/docuflow" rel="noopener noreferrer"&gt;https://github.com/Shashank0701-byte/docuflow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know if you've faced similar "SDK Hell" with other AI providers! 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docker</category>
      <category>google</category>
      <category>python</category>
    </item>
  </channel>
</rss>
