<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: AskPaul</title>
    <description>The latest articles on Forem by AskPaul (@askpaul).</description>
    <link>https://forem.com/askpaul</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3531659%2Fc1ad14f5-969f-447b-866d-6cf979776d40.png</url>
      <title>Forem: AskPaul</title>
      <link>https://forem.com/askpaul</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/askpaul"/>
    <language>en</language>
    <item>
      <title>OSpark: The All-Round Free AI Assistant Powered by MiroMind’s Open-Source MiroFlow Framework</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Mon, 08 Dec 2025 11:47:04 +0000</pubDate>
      <link>https://forem.com/askpaul/ospark-the-all-round-free-ai-assistant-powered-by-mirominds-open-source-miroflow-framework-2ppc</link>
      <guid>https://forem.com/askpaul/ospark-the-all-round-free-ai-assistant-powered-by-mirominds-open-source-miroflow-framework-2ppc</guid>
      <description>&lt;p&gt;Today marks an exciting milestone in the world of accessible AI: the official launch of &lt;strong&gt;OSpark&lt;/strong&gt;—a free, all-in-one AI assistant designed to simplify work, creativity, and daily life. What makes this release even more noteworthy? OSpark is built on the backbone of &lt;strong&gt;MiroMind&lt;/strong&gt;’s open-source MiroFlow framework, leveraging the predictive and reasoning power of one of the world’s top AI research ecosystems to deliver a seamless, powerful user experience.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why MiroMind’s MiroFlow Is the Engine Behind OSpark’s Excellence
&lt;/h2&gt;

&lt;p&gt;Before diving into OSpark’s features, it’s critical to highlight the role of MiroMind—the trailblazing AI project founded by visionary entrepreneur Tianqiao Chen and led by AI scientist Jifeng Dai. MiroMind has already made waves in the industry: it topped the global FutureX benchmark (a rigorous test for AI predictive capabilities) for two consecutive weeks, outperforming even closed-source commercial models by accurately forecasting complex scenarios like ATP tennis rankings and cryptocurrency price tiers.  &lt;/p&gt;

&lt;p&gt;At the core of MiroMind’s success is &lt;strong&gt;MiroFlow&lt;/strong&gt;—an open-source Agent framework built for tool-augmented reasoning, dynamic task scheduling, and long-haul research logic. MiroFlow isn’t just a “toolkit”; it’s a flexible, reproducible foundation that empowers developers to build AI agents that &lt;em&gt;think&lt;/em&gt;, &lt;em&gt;adapt&lt;/em&gt;, and &lt;em&gt;deliver results&lt;/em&gt;. On the GAIA validation set (a leading benchmark for deep research AI), MiroFlow achieved an impressive 82.4 score—surpassing many commercial APIs and setting a high bar for open-source AI performance.  &lt;/p&gt;

&lt;p&gt;For OSpark, MiroFlow’s capabilities are transformative. It’s not just about “adding AI”—it’s about integrating a framework that ensures OSpark’s features (from smart searches to real-time weather insights) are &lt;em&gt;accurate&lt;/em&gt;, &lt;em&gt;responsive&lt;/em&gt;, and &lt;em&gt;contextually relevant&lt;/em&gt;. MiroFlow’s ability to support multi-tool integration and extend any large language model (LLM) means OSpark can seamlessly tap into top-tier AI power (like ChatGPT, GPT-5, and Gemini 2.5 Flash) while remaining lightweight and user-friendly.  &lt;/p&gt;

&lt;h2&gt;
  
  
  OSpark: What You Can Do with MiroMind-Powered AI
&lt;/h2&gt;

&lt;p&gt;OSpark takes MiroFlow’s technical strength and packages it into a tool that anyone can use—no coding or AI expertise required. Let’s break down its standout features, all supercharged by MiroMind’s underlying framework:  &lt;/p&gt;

&lt;h3&gt;
  
  
  1. All-in-One AI Assistant: Smart, Efficient, and Intuitive
&lt;/h3&gt;

&lt;p&gt;Thanks to MiroFlow’s reasoning engine, OSpark turns chaotic information into clear answers. Whether you’re searching the web, summarizing long PDFs, or analyzing images, OSpark cuts through the noise:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Web/Doc Summarization&lt;/strong&gt;: Condenses hours of reading into key takeaways (rivaling tools like Monica for speed and accuracy).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Recognition&lt;/strong&gt;: Matches the precision of OpenAI’s GPT-5 Image and Google’s Gemini 2.5 Flash Image, letting you ask questions about photos or identify details in seconds.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Search&lt;/strong&gt;: Goes beyond basic results to deliver context-rich answers—perfect for work, study, or satisfying daily curiosity.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Real-Time Weather Guides: Predictive Insights for Your Day
&lt;/h3&gt;

&lt;p&gt;MiroMind’s predictive DNA shines here. OSpark doesn’t just show you the temperature—it uses MiroFlow’s ability to process live meteorological data and generate tailored advice:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get global weather updates in real time (e.g., 66°F and sunny in San Francisco, with 67% humidity).
&lt;/li&gt;
&lt;li&gt;Receive AI-driven tips: “Dress light, wear sunscreen, and take shade breaks—humidity will make 81°F feel warmer!”
&lt;/li&gt;
&lt;li&gt;Plan ahead with 6-day forecasts, all backed by MiroFlow’s reliable data integration and analysis.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. AI-Driven Image Generator: Turn Ideas into Visuals
&lt;/h3&gt;

&lt;p&gt;MiroFlow’s support for multi-modal AI lets OSpark’s image generator create stunning visuals in any style—from 3D Anime to Realism. Just describe what you want (e.g., “a surreal mini chef cooking inside an opened delivery box”), choose an aspect ratio (like 9:16 for social media), and hit “Generate.” No design skills needed—just your imagination, amplified by MiroMind’s creative AI power.  &lt;/p&gt;

&lt;h3&gt;
  
  
  4. AI-Powered Podcast Creation: Turn Content into On-the-Go Audio
&lt;/h3&gt;

&lt;p&gt;OSpark’s podcast tool leverages MiroFlow’s automation capabilities to convert URLs, YouTube videos, or long texts into polished audio in seconds. Here’s how it works:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paste a link (e.g., a blog post or YouTube tutorial) or text.
&lt;/li&gt;
&lt;li&gt;Choose your language (English by default) and style (e.g., Single Narrator).
&lt;/li&gt;
&lt;li&gt;Let MiroFlow handle the rest—turning your favorite content into a podcast you can listen to during commutes or workouts.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Top-Tier LLMs: Unlimited Access, Totally Free
&lt;/h3&gt;

&lt;p&gt;The best part? OSpark gives you unrestricted access to popular LLMs (ChatGPT, GPT-5, Gemini 3 Pro) at no cost. MiroFlow’s framework ensures these models run seamlessly, so you can chat, create, or problem-solve without subscriptions or hidden fees. It’s like having a premium AI toolkit—for free.  &lt;/p&gt;

&lt;h2&gt;
  
  
  See OSpark in Action
&lt;/h2&gt;

&lt;p&gt;Curious what OSpark looks like? Check out these product screenshots to visualize its clean, user-friendly interface:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flou2vqstqflrvdl5xg9k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flou2vqstqflrvdl5xg9k.jpg" alt="Image description" width="800" height="1730"&gt;&lt;/a&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2yoqlva9rn6l6fm8vmp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2yoqlva9rn6l6fm8vmp.jpg" alt="Image description" width="800" height="1730"&gt;&lt;/a&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy1nlqid5f18dqw5faun.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy1nlqid5f18dqw5faun.jpg" alt="Image description" width="800" height="1730"&gt;&lt;/a&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mi9pnpopuo1bimha4o2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mi9pnpopuo1bimha4o2.jpg" alt="Image description" width="800" height="1730"&gt;&lt;/a&gt;&lt;br&gt;
  &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqjhdo295ljvmh13w519.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqjhdo295ljvmh13w519.jpg" alt="Image description" width="800" height="1730"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Get OSpark Today—Powered by MiroMind’s Open-Source Innovation
&lt;/h2&gt;

&lt;p&gt;OSpark isn’t just another AI app—it’s a testament to what happens when cutting-edge open-source technology (MiroMind’s MiroFlow) meets user-centric design. Whether you’re a student, professional, or creative, OSpark simplifies how you interact with AI—making powerful tools accessible to everyone.  &lt;/p&gt;

&lt;p&gt;Ready to experience MiroMind-powered AI for free? Download OSpark now:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://ospark.ai/" rel="noopener noreferrer"&gt;Download OSpark&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;With OSpark and MiroMind, the future of AI isn’t just advanced—it’s &lt;em&gt;usable&lt;/em&gt;, &lt;em&gt;affordable&lt;/em&gt;, and &lt;em&gt;built for you&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>miromind</category>
      <category>ospark</category>
    </item>
    <item>
      <title>MiroThinker v1.0: Revolutionizing Open-Source Research Agents Through Interactive Scaling</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Wed, 19 Nov 2025 10:19:16 +0000</pubDate>
      <link>https://forem.com/askpaul/mirothinker-v10-revolutionizing-open-source-research-agents-through-interactive-scaling-1c79</link>
      <guid>https://forem.com/askpaul/mirothinker-v10-revolutionizing-open-source-research-agents-through-interactive-scaling-1c79</guid>
      <description>&lt;p&gt;&lt;em&gt;Breaking New Ground in AI Research Capabilities with Up to 600 Tool Calls Per Task&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Published by MiroMind Team | November 2025&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: The Dawn of Autonomous Research Intelligence
&lt;/h2&gt;

&lt;p&gt;The landscape of artificial intelligence is witnessing a profound transformation. We're moving beyond static text generation toward dynamic, tool-augmented agents capable of conducting sophisticated research autonomously. The ability to formulate hypotheses, retrieve and verify evidence, and synthesize insights across diverse information sources represents a new frontier in AI capability—one that demands more than just linguistic fluency.&lt;/p&gt;

&lt;p&gt;Proprietary systems like ChatGPT Agent and Claude Research have demonstrated near-human proficiency in literature review, comparative analysis, and reasoning-driven knowledge discovery. However, these systems remain closed, constraining transparency, reproducibility, and community-driven innovation. The open-source community has struggled to match their performance, facing limitations in model scale, context length, and interaction depth.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"MiroThinker v1.0 introduces a third dimension of scaling—interactive scaling—that enables sustained multi-turn reasoning through up to 600 tool calls per task within a 256K context window."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Enter MiroThinker v1.0, a groundbreaking open-source research agent that fundamentally reimagines how we approach AI research capabilities. Unlike previous approaches that focused solely on scaling model size or context length, MiroThinker explores &lt;strong&gt;interactive scaling&lt;/strong&gt; as a third critical dimension of performance improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Innovation: The Three Dimensions of Agent Scaling
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 represents a paradigm shift by systematically addressing three complementary scaling dimensions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Model Size Scaling
&lt;/h3&gt;

&lt;p&gt;Built on the robust Qwen2.5 and Qwen3 foundations, MiroThinker is available in three variants to accommodate diverse computational budgets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8B variant:&lt;/strong&gt; Optimized for efficiency while maintaining strong performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30B variant:&lt;/strong&gt; Balanced performance-to-compute ratio for most applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;72B variant:&lt;/strong&gt; State-of-the-art performance approaching commercial systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Context Length Scaling
&lt;/h3&gt;

&lt;p&gt;With a 256K context window, MiroThinker can maintain extensive conversation histories, complex reasoning chains, and comprehensive tool interaction records. This extended context enables the model to synthesize information across multiple documents and maintain coherent long-horizon planning.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Interactive Scaling (The Breakthrough)
&lt;/h3&gt;

&lt;p&gt;The most revolutionary aspect of MiroThinker is its systematic training for deeper and more frequent agent-environment interactions. Unlike traditional LLM test-time scaling that operates in isolation and risks degradation with longer reasoning chains, interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Performance Breakthrough:&lt;/strong&gt; MiroThinker-72B achieves up to 81.9% on GAIA, 37.7% on Humanity's Last Exam, 47.1% on BrowseComp, and 55.6% on BrowseComp-ZH, surpassing previous open-source agents and approaching GPT-5-high performance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ReAct Workflow Architecture
&lt;/h3&gt;

&lt;p&gt;MiroThinker operates under the ReAct (Reasoning and Acting) paradigm, implementing a sophisticated iterative loop of reasoning, tool invocation, and observation. The model maintains a trajectory history and alternates between generating internal thoughts and executing structured tool calls until task completion.&lt;/p&gt;

&lt;p&gt;The core workflow follows this pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Think:&lt;/strong&gt; Generate internal reasoning about the current state and next action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act:&lt;/strong&gt; Execute a structured tool invocation based on the reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observe:&lt;/strong&gt; Process the tool response and update internal understanding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat:&lt;/strong&gt; Continue until the task is resolved or termination criteria are met&lt;/li&gt;
&lt;/ol&gt;
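&lt;p&gt;As a rough illustration, the loop above can be sketched in a few lines of Python. All names here are illustrative assumptions for the sketch, not MiroThinker's actual interface:&lt;/p&gt;

```python
# Minimal, self-contained sketch of the Think/Act/Observe loop.
# "policy" stands in for the LLM; every name is illustrative, not MiroThinker's API.
def react_loop(task, policy, tools, max_calls=600):
    history = [("task", task)]
    for _ in range(max_calls):
        step = policy(history)                       # Think: reason about current state
        history.append(("thought", step["thought"]))
        if "final_answer" in step:                   # task resolved
            return step["final_answer"]
        result = tools[step["tool"]](step["args"])   # Act: execute a structured tool call
        history.append(("observation", result))      # Observe: record environment feedback
    return None  # termination criteria: call budget exhausted

# Toy policy: search once, then answer with the observed result.
def toy_policy(history):
    observations = [content for role, content in history if role == "observation"]
    if observations:
        return {"thought": "answer found", "final_answer": observations[-1]}
    return {"thought": "need evidence", "tool": "search", "args": "capital of France"}

tools = {"search": lambda query: "Paris"}
print(react_loop("What is the capital of France?", toy_policy, tools))  # prints: Paris
```

&lt;p&gt;In the real system the policy is the trained model itself, and the tool set includes the sandbox, file, and retrieval tools described below.&lt;/p&gt;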

&lt;h3&gt;
  
  
  Comprehensive Tool Interface
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Execution Environment
&lt;/h4&gt;

&lt;p&gt;MiroThinker employs a Linux sandbox that provides an isolated runtime for command and code execution. The agent can create sandbox instances and execute both shell commands and Python code within secure, controlled environments. This design ensures safe interaction with system-level resources while maintaining flexibility.&lt;/p&gt;

&lt;h4&gt;
  
  
  File Management
&lt;/h4&gt;

&lt;p&gt;The system implements bidirectional file transfer capabilities, supporting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload from local systems to sandbox environments&lt;/li&gt;
&lt;li&gt;Download from sandbox to local storage&lt;/li&gt;
&lt;li&gt;Direct retrieval of remote assets from URLs&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Information Retrieval
&lt;/h4&gt;

&lt;p&gt;Two sophisticated retrieval tools power MiroThinker's research capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Search Integration:&lt;/strong&gt; Returns structured search results for broad information gathering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Web Scraping:&lt;/strong&gt; Uses a lightweight LLM (Qwen3-14B) to extract task-relevant information from target URLs, serving as an efficient context management mechanism&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced Context Management
&lt;/h3&gt;

&lt;p&gt;To maximize the efficiency of the 256K context window and enable up to 600 tool calls per task, MiroThinker implements two key strategies:&lt;/p&gt;

&lt;h4&gt;
  
  
  Recency-Based Context Retention
&lt;/h4&gt;

&lt;p&gt;Rather than retaining all tool outputs (which would quickly overwhelm the context), the system preserves only the most recent tool responses while maintaining the complete sequence of thoughts and actions. This approach leverages the empirical observation that subsequent actions depend primarily on recent observations rather than distant ones.&lt;/p&gt;
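&lt;p&gt;A minimal sketch of this retention policy, assuming a trajectory stored as a list of role-tagged steps (the representation and parameter values are assumptions for illustration, not the paper's implementation):&lt;/p&gt;

```python
def retain_context(trajectory, keep_last_k=3):
    # Keep every thought/action step, but replace all except the K most recent
    # tool observations with a short placeholder to free context space.
    obs_indices = [i for i, step in enumerate(trajectory)
                   if step["role"] == "observation"]
    dropped = set(obs_indices[:-keep_last_k]) if len(obs_indices) > keep_last_k else set()
    return [{"role": "observation", "content": "[omitted]"} if i in dropped else step
            for i, step in enumerate(trajectory)]
```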

&lt;h4&gt;
  
  
  Result Truncation
&lt;/h4&gt;

&lt;p&gt;Long outputs from code execution and command tools are automatically truncated with clear indicators, preventing context overflow while preserving essential information.&lt;/p&gt;
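&lt;p&gt;Such truncation can be as simple as a length cap with an explicit marker. This is a sketch; the limit and marker text are assumptions, not the system's actual values:&lt;/p&gt;

```python
def truncate_result(text, max_chars=2000):
    # Cap long tool output, appending a clear indicator so the model
    # knows content was dropped rather than silently missing.
    if len(text) > max_chars:
        return text[:max_chars] + "\n...[output truncated]"
    return text
```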

&lt;blockquote&gt;
&lt;p&gt;"The recency-based retention strategy preserves reasoning traces while focusing attention on contextually relevant observations, freeing additional context for extended reasoning and deeper tool-use trajectories."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Data Construction: Building the MiroVerse Dataset
&lt;/h3&gt;

&lt;h4&gt;
  
  
  MultiDocQA Synthesis
&lt;/h4&gt;

&lt;p&gt;The team developed a sophisticated pipeline that transforms interlinked web documents into complex, multi-hop QA pairs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document Corpus Construction:&lt;/strong&gt; Diverse sources including Wikipedia and Common Crawl with preserved hyperlink structures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Graph Creation:&lt;/strong&gt; Connected subgraphs of related documents following internal hyperlinks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fact Extraction:&lt;/strong&gt; Key statements requiring cross-document reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraint Obfuscation:&lt;/strong&gt; Systematic transformation of facts into indirect constraints requiring deeper reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Agentic Trajectory Synthesis
&lt;/h4&gt;

&lt;p&gt;The team generated high-quality trajectory data through multiple complementary approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent Paradigms:&lt;/strong&gt; Both ReAct single-agent and MiroFlow multi-agent frameworks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Invocation Methods:&lt;/strong&gt; Traditional function calling and flexible Model Context Protocol (MCP)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diverse Model Integration:&lt;/strong&gt; Multiple leading LLMs including GPT-OSS and DeepSeek-V3.1&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Three-Stage Training Pipeline
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Stage 1: Agentic Supervised Fine-Tuning (SFT)
&lt;/h4&gt;

&lt;p&gt;The foundation stage establishes fundamental agentic behaviors through imitation learning on expert trajectories. The model learns to mimic complex multi-hop reasoning and tool use patterns, with rigorous filtering to ensure trajectory quality and consistency.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 2: Agentic Preference Optimization (DPO)
&lt;/h4&gt;

&lt;p&gt;Direct Preference Optimization refines decision-making by learning from preference pairs. Crucially, the team avoided rigid structural constraints, instead focusing on answer correctness as the primary ranking criterion to prevent systematic biases.&lt;/p&gt;

&lt;h4&gt;
  
  
  Stage 3: Agentic Reinforcement Learning (GRPO)
&lt;/h4&gt;

&lt;p&gt;The final stage employs Group Relative Policy Optimization with fully online policy training. This enables creative solution discovery and adaptation to diverse real-world environments through direct interaction and exploration. The system supports thousands of concurrent agentic rollouts with sophisticated reward design balancing correctness and format compliance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Results: Setting New Standards
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 demonstrates exceptional performance across multiple challenging benchmarks, establishing new state-of-the-art results for open-source research agents:&lt;/p&gt;

&lt;blockquote&gt;
&lt;h4&gt;
  
  
  Standout Achievements:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GAIA Benchmark:&lt;/strong&gt; 81.9% accuracy, surpassing MiniMax-M2 by 6.2 percentage points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Humanity's Last Exam:&lt;/strong&gt; 37.7% score, outperforming GPT-5-high by 2.5 points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrowseComp:&lt;/strong&gt; 47.1% accuracy, competitive with OpenAI DeepResearch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrowseComp-ZH:&lt;/strong&gt; 55.6% accuracy, setting new open-source records for Chinese benchmarks&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The results demonstrate that MiroThinker not only leads among open-source alternatives but approaches and sometimes exceeds the performance of leading commercial systems while maintaining complete transparency and reproducibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interactive Scaling: The Game-Changing Discovery
&lt;/h2&gt;

&lt;p&gt;Perhaps the most significant finding from the MiroThinker research is the empirical validation of interactive scaling as a fundamental dimension of agent performance improvement. The analysis reveals that research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Interactive scaling exhibits behaviors analogous to model size and context length scaling, establishing it as a third critical dimension for building next-generation research agents."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key insights from the interactive scaling analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Improvement:&lt;/strong&gt; Performance gains scale predictably with interaction depth across all benchmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Correction:&lt;/strong&gt; Environment feedback enables trajectory refinement and error correction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative Exploration:&lt;/strong&gt; Reinforcement learning drives discovery of novel solution paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustained Reasoning:&lt;/strong&gt; Extended interaction sequences maintain coherence and progress toward goals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reinforcement learning-trained models exhibit substantially longer and deeper interaction trajectories compared to their supervised fine-tuning counterparts, with corresponding improvements in task performance. This demonstrates that the capacity for extended, meaningful interaction with environments is not just beneficial but essential for advanced research capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implications for the Future of AI Research
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 represents more than just another capable AI model—it establishes a new framework for thinking about agent capabilities and scaling laws. The discovery that interactive scaling constitutes a third fundamental dimension alongside model size and context length has profound implications for future research directions.&lt;/p&gt;

&lt;p&gt;This breakthrough suggests that the path to human-level research capability may not require only larger models or longer contexts, but fundamentally different training approaches that emphasize iterative interaction with environments. The open-source nature of MiroThinker ensures that these advances can be studied, reproduced, and built upon by the entire research community.&lt;/p&gt;

&lt;p&gt;The model's ability to perform sustained multi-turn reasoning through hundreds of tool calls opens new possibilities for autonomous research workflows, from literature review and hypothesis generation to experimental design and result analysis. As these capabilities mature, we may witness the emergence of AI systems that can genuinely contribute to scientific discovery and knowledge advancement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 establishes a new paradigm for open-source research agents, demonstrating that it's possible to match and sometimes exceed the performance of proprietary systems while maintaining transparency and community accessibility. The introduction of interactive scaling as a third fundamental dimension of agent capability represents a conceptual breakthrough that will likely influence the direction of AI research for years to come.&lt;/p&gt;

&lt;p&gt;By systematically addressing model size, context length, and interaction depth, MiroThinker proves that the gap between open-source and commercial AI capabilities can be closed through thoughtful engineering and innovative training approaches. The model's exceptional performance across diverse benchmarks, combined with its comprehensive tool suite and sophisticated context management, positions it as a valuable resource for researchers, developers, and organizations seeking advanced AI research capabilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  Access MiroThinker v1.0
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Online Demo:&lt;/strong&gt; &lt;a href="https://dr.miromind.ai" rel="noopener noreferrer"&gt;https://dr.miromind.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Repository:&lt;/strong&gt; &lt;a href="https://github.com/MiroMindAI/MiroThinker" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/MiroThinker&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Weights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;72B: &lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B" rel="noopener noreferrer"&gt;huggingface.co/miromind-ai/MiroThinker-v1.0-72B&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;30B: &lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B" rel="noopener noreferrer"&gt;huggingface.co/miromind-ai/MiroThinker-v1.0-30B&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;8B: &lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B" rel="noopener noreferrer"&gt;huggingface.co/miromind-ai/MiroThinker-v1.0-8B&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dataset:&lt;/strong&gt; &lt;a href="https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1" rel="noopener noreferrer"&gt;MiroVerse v0.1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2511.11793" rel="noopener noreferrer"&gt;arXiv:2511.11793&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>miromind</category>
      <category>mirothinker</category>
      <category>askpaul</category>
    </item>
    <item>
      <title>🚀 MiroThinker v1.0 Launched: Pioneering a New Era of Interactive Scaling in AI Agents</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Mon, 17 Nov 2025 08:41:04 +0000</pubDate>
      <link>https://forem.com/askpaul/mirothinker-v10-launched-pioneering-a-new-era-of-interactive-scaling-in-ai-agents-4n60</link>
      <guid>https://forem.com/askpaul/mirothinker-v10-launched-pioneering-a-new-era-of-interactive-scaling-in-ai-agents-4n60</guid>
      <description>&lt;p&gt;&lt;strong&gt;November 13, 2025&lt;/strong&gt; marks an exciting milestone in AI research—the MiroMind AI team officially released &lt;strong&gt;MiroThinker v1.0&lt;/strong&gt;, an open-source research agent. This isn't just another large language model release; it represents a groundbreaking new approach to enhancing AI agent capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 Revolutionary Innovation: The Third Dimension of Scaling
&lt;/h2&gt;

&lt;p&gt;Throughout AI's evolution, we've witnessed two primary dimensions of performance improvement: &lt;strong&gt;model parameter scale&lt;/strong&gt; and &lt;strong&gt;context window length&lt;/strong&gt;. MiroThinker v1.0 boldly introduces a &lt;strong&gt;third dimension—Interactive Scaling&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Interactive Scaling?
&lt;/h3&gt;

&lt;p&gt;Interactive scaling is a systematic training methodology that teaches models to engage in deeper and more frequent interactions with their environment. Through environment feedback and external information acquisition, models can proactively correct errors and refine reasoning trajectories, significantly boosting task completion capabilities.&lt;/p&gt;

&lt;p&gt;Think of it like training a researcher. It's not simply about having them memorize more knowledge (parameter scale) or read longer documents at once (context length), but rather &lt;strong&gt;teaching them how to efficiently use tools, search for information, verify facts, and continuously adjust their research strategy during exploration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsspark.genspark.ai%2Fcfimages%3Fu1%3DpxXbUF0gkogyyr%252FZQUADNW8uFJ6luAGGjbGcXLqOV3eURbYpisIU4y7dpzgQYoTZx8gjz3D1uawlRVFutF2IZVFbeZ9UfHKzcyMhyjgav7maQN%252Burqf4jW%252FyENzjFdq0gexS1io%252FWYPxWKwkEtVIrf3eR4XGubfZ%26u2%3DvjgN5fG5NQbXrFsX%26width%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsspark.genspark.ai%2Fcfimages%3Fu1%3DpxXbUF0gkogyyr%252FZQUADNW8uFJ6luAGGjbGcXLqOV3eURbYpisIU4y7dpzgQYoTZx8gjz3D1uawlRVFutF2IZVFbeZ9UfHKzcyMhyjgav7maQN%252Burqf4jW%252FyENzjFdq0gexS1io%252FWYPxWKwkEtVIrf3eR4XGubfZ%26u2%3DvjgN5fG5NQbXrFsX%26width%3D1024" alt="Interactive Scaling Visualization" width="1024" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The experimental data is compelling: &lt;strong&gt;RL-tuned MiroThinker v1.0-30B exhibits far longer and deeper interaction trajectories than its SFT counterpart&lt;/strong&gt;. Across all four major benchmarks, while supervised fine-tuning models terminate after just a few tool calls, the RL model performs extended multi-turn reasoning, exploration, and information verification. This behavioral shift yields &lt;strong&gt;8-10 percentage point accuracy gains&lt;/strong&gt;, clearly demonstrating the positive correlation between interaction depth and performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  💪 Impressive Technical Specifications
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 showcases remarkable technical capabilities:&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Parameters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Window&lt;/strong&gt;: 256K tokens (supporting ultra-long document processing and deep analysis)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Call Capability&lt;/strong&gt;: Up to &lt;strong&gt;600 tool calls per task&lt;/strong&gt;—a substantial breakthrough among open-source research agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Scale Release&lt;/strong&gt;: Available in 8B, 30B, and 72B parameter versions to accommodate different computational budgets and application scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model Name&lt;/th&gt;
&lt;th&gt;Base Model&lt;/th&gt;
&lt;th&gt;Max Length&lt;/th&gt;
&lt;th&gt;Max Tool Calls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MiroThinker-v1.0-8B&lt;/td&gt;
&lt;td&gt;Qwen3-8B&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiroThinker-v1.0-30B&lt;/td&gt;
&lt;td&gt;Qwen3-30B-A3B-Thinking-2507&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiroThinker-v1.0-72B&lt;/td&gt;
&lt;td&gt;Qwen2.5-72B-Instruct&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Are 600 Tool Calls Important?
&lt;/h3&gt;

&lt;p&gt;For complex research tasks, agents need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search multiple information sources&lt;/li&gt;
&lt;li&gt;Cross-verify data accuracy&lt;/li&gt;
&lt;li&gt;Deep-dive into relevant literature&lt;/li&gt;
&lt;li&gt;Synthesize analysis and draw conclusions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Support for 600 tool calls&lt;/strong&gt; enables MiroThinker to conduct deep, multi-step information exploration and verification like a genuine researcher, without being cut short by call limits.&lt;/p&gt;
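&lt;p&gt;The budgeted loop this implies can be sketched in a few lines of Python. This is an illustrative stub, not MiroThinker's actual interface: the &lt;code&gt;decide&lt;/code&gt; and &lt;code&gt;synthesize&lt;/code&gt; methods and the stub tools are assumptions.&lt;/p&gt;

```python
# Illustrative sketch of a tool-call-budgeted research loop; the
# decide/synthesize methods and the stub tools are assumptions, not
# MiroThinker's real interface.
MAX_TOOL_CALLS = 600  # per-task budget reported for MiroThinker v1.0

def run_research_task(agent, tools, question, max_calls=MAX_TOOL_CALLS):
    """Let the agent pick tools until it decides to finish or the budget runs out."""
    notes = []
    for _ in range(max_calls):
        action, args = agent.decide(question, notes)
        if action == "finish":
            break
        notes.append((action, tools[action](args)))  # record each observation
    return agent.synthesize(question, notes)

class StubAgent:
    """Minimal agent: search once, then finish."""
    def decide(self, question, notes):
        return ("search", question) if not notes else ("finish", None)
    def synthesize(self, question, notes):
        return f"{len(notes)} observation(s) gathered"

result = run_research_task(StubAgent(), {"search": lambda q: "hit"}, "q")
```

&lt;p&gt;The point of the budget is that the loop, not the model's patience, decides when exploration stops; raising the cap directly lengthens the feasible interaction trajectory.&lt;/p&gt;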

&lt;h2&gt;
  
  
  📊 Outstanding Benchmark Performance
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 demonstrates strong performance across multiple authoritative benchmarks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HLE-Text&lt;/strong&gt; (Humanity's Last Exam - Text): &lt;strong&gt;37.7%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrowseComp&lt;/strong&gt;: &lt;strong&gt;47.1%&lt;/strong&gt; (approaching DeepResearch's 51.5%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BrowseComp-ZH&lt;/strong&gt; (Chinese-language BrowseComp): &lt;strong&gt;55.6%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GAIA-Text-103&lt;/strong&gt;: &lt;strong&gt;81.9%&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These results not only surpass previous open-source agents but, more importantly, &lt;strong&gt;significantly narrow the gap with commercial models like GPT-5-high&lt;/strong&gt;. Notably, on the HLE-Text benchmark, MiroThinker even outperforms GPT-5-high by &lt;strong&gt;2.5 percentage points&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsspark.genspark.ai%2Fcfimages%3Fu1%3DQLMRijC2Yo3cgMsxX%252BFZHJO7AAv%252BxwiFDpfIzUU858vRrEhfjBPGj0R9lJ%252BebGQsWUdDHXY1EXME%252F8ZnE9KnZ7jSsn5HwYOt6eSAkKxJHOImTsbKHD5gEtHO%252BHlRAFzFI00v9W0mfZFOQl8Ct4QvrsdNSL8WSRmaoS%252Fe0m9wPGoCKDmBzRM%253D%26u2%3Dq3IsMuCPjEqsNqhv%26width%3D1024" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsspark.genspark.ai%2Fcfimages%3Fu1%3DQLMRijC2Yo3cgMsxX%252BFZHJO7AAv%252BxwiFDpfIzUU858vRrEhfjBPGj0R9lJ%252BebGQsWUdDHXY1EXME%252F8ZnE9KnZ7jSsn5HwYOt6eSAkKxJHOImTsbKHD5gEtHO%252BHlRAFzFI00v9W0mfZFOQl8Ct4QvrsdNSL8WSRmaoS%252Fe0m9wPGoCKDmBzRM%253D%26u2%3Dq3IsMuCPjEqsNqhv%26width%3D1024" alt="Performance Comparison" width="1024" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Real-World Application Scenarios
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0's capabilities make it ideal for numerous use cases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Academic Research Assistant
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In-depth literature reviews and information synthesis&lt;/li&gt;
&lt;li&gt;Cross-disciplinary knowledge correlation analysis&lt;/li&gt;
&lt;li&gt;Research hypothesis verification and data collection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Business Intelligence Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Market trend research and competitive analysis&lt;/li&gt;
&lt;li&gt;Multi-source information cross-verification&lt;/li&gt;
&lt;li&gt;Strategic decision support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Technical Problem Solving
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complex technical documentation comprehension&lt;/li&gt;
&lt;li&gt;Multi-step problem diagnosis&lt;/li&gt;
&lt;li&gt;Solution exploration and evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Content Creation Research
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Deep background investigation&lt;/li&gt;
&lt;li&gt;Fact-checking and citation verification&lt;/li&gt;
&lt;li&gt;Multi-perspective information integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🌟 The Power of Open Source
&lt;/h2&gt;

&lt;p&gt;MiroThinker v1.0 is released under the &lt;strong&gt;MIT License&lt;/strong&gt;, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Completely Free to Use&lt;/strong&gt;: For both commercial and non-commercial projects&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Full Code Access&lt;/strong&gt;: Learn, modify, and customize as needed&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Community-Driven Development&lt;/strong&gt;: Contributions and collaboration welcome&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Transparent and Trustworthy&lt;/strong&gt;: All technical details publicly available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This open approach not only democratizes AI technology but also provides invaluable learning and innovation opportunities for the research community.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Quick Start
&lt;/h2&gt;

&lt;p&gt;Want to experience MiroThinker v1.0's powerful capabilities? There are several ways to get started:&lt;/p&gt;

&lt;h3&gt;
  
  
  Online Demo
&lt;/h3&gt;

&lt;p&gt;Visit the official demo site: &lt;a href="https://dr.miromind.ai/" rel="noopener noreferrer"&gt;https://dr.miromind.ai/&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Deployment
&lt;/h3&gt;

&lt;p&gt;Access the GitHub repository for complete installation instructions:&lt;br&gt;
👉 &lt;a href="https://github.com/MiroMindAI/MiroThinker" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/MiroThinker&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hugging Face Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B" rel="noopener noreferrer"&gt;MiroThinker-v1.0-8B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B" rel="noopener noreferrer"&gt;MiroThinker-v1.0-30B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B" rel="noopener noreferrer"&gt;MiroThinker-v1.0-72B&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔮 Looking Ahead
&lt;/h2&gt;

&lt;p&gt;The release of MiroThinker v1.0 marks an important milestone in AI agent development. The introduction of interactive scaling as a third dimension of performance improvement points us toward more general and intelligent AI systems.&lt;/p&gt;

&lt;p&gt;With the synergistic optimization of model scale, context length, and interaction depth, we have every reason to believe that &lt;strong&gt;future AI agents will handle increasingly complex, open-ended tasks, truly becoming powerful assistants in human research and creation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The MiroMind AI team is committed to sustained R&amp;amp;D investment and continuous improvement of the MiroThinker series, and looks forward to collaborating with developers and researchers worldwide to advance this technology.&lt;/p&gt;

&lt;h2&gt;
  
  
  📞 Contact Information
&lt;/h2&gt;

&lt;p&gt;If you have any questions or suggestions about MiroThinker v1.0, feel free to contact the MiroMind AI team through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📧 Email: &lt;a href="mailto:service@miromind.ai"&gt;service@miromind.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💬 GitHub: &lt;a href="https://github.com/MiroMindAI/" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🎮 Discord: &lt;a href="https://discord.com/invite/GPqEnkzQZd" rel="noopener noreferrer"&gt;Join the Community&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📱 WeChat: Search for MiroMind official account&lt;/li&gt;
&lt;li&gt;📝 RedNote: Follow MiroMind&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Reference Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/MiroMindAI/MiroThinker" rel="noopener noreferrer"&gt;MiroThinker GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/MiroMindAI/MiroThinker/blob/main/assets/MiroThinker_v1.0_Technical_Report.pdf" rel="noopener noreferrer"&gt;Technical Report PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B" rel="noopener noreferrer"&gt;Hugging Face Model Page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This is not just a model release—it's a pivotal step in the evolution of AI agents. Let's witness together how interactive scaling redefines the future of AI!&lt;/em&gt; 🎉&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Multi-Client Subscription Streaming Response System with MiroMind‘s MiroFlow</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Tue, 11 Nov 2025 12:26:41 +0000</pubDate>
      <link>https://forem.com/askpaul/building-a-multi-client-subscription-streaming-response-system-with-mirominds-miroflow-59jc</link>
      <guid>https://forem.com/askpaul/building-a-multi-client-subscription-streaming-response-system-with-mirominds-miroflow-59jc</guid>
      <description>&lt;p&gt;In the development of OSpark, a universal Agent product powered by MiroMind's open-source project MiroFlow, we encountered a critical challenge: enabling real-time, synchronized streaming responses across multiple clients while optimizing resource utilization. This article dives into the design and implementation of OSparkApi's multi-client subscription-based streaming response system, leveraging MiroFlow's flexible architecture to address the limitations of traditional single-client Server-Sent Events (SSE) solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Server-Sent Events (SSE) have become the backbone of real-time interaction in modern AI chat applications, enabling servers to push incremental updates to clients without constant polling. However, traditional SSE architectures face significant bottlenecks in two key scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-client synchronization&lt;/strong&gt;: When users access the same conversation across multiple endpoints (e.g., browser tabs, mobile apps, web interfaces), each client typically initiates a separate SSE connection, leading to redundant AI inference tasks and wasted computational resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Disconnection recovery&lt;/strong&gt;: If a user exits a conversation page while an AI task is still running, re-entering the page should resume real-time updates seamlessly—without restarting the task or losing intermediate state.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve these pain points, we designed a streaming response system that supports multi-client subscription, task sharing, and reliable reconnection—all built on the foundation of MiroMind's MiroFlow framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Challenges
&lt;/h2&gt;

&lt;p&gt;The design of a multi-client streaming system must address four critical requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource reuse&lt;/strong&gt;: Eliminate redundant AI inference by allowing multiple clients to share a single running task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event broadcasting&lt;/strong&gt;: Ensure consistent, real-time updates across all subscribed clients for the same conversation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connection management&lt;/strong&gt;: Gracefully handle client disconnections, reconnections, and cleanups without disrupting ongoing tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task lifecycle management&lt;/strong&gt;: Properly create, run, cancel, and clean up tasks while accounting for subscription states.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;p&gt;The system is composed of two core components—&lt;code&gt;TaskManager&lt;/code&gt; and &lt;code&gt;StreamHandler&lt;/code&gt;—working in tandem to manage tasks, subscriptions, and event distribution. This architecture decouples task execution from client communication, enabling flexible scaling and robust synchronization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. TaskManager (Task Orchestrator)
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;TaskManager&lt;/code&gt; serves as the central hub for managing asynchronous tasks and their subscribers. Its key responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task registry&lt;/strong&gt;: Maintaining a mapping of &lt;code&gt;thread_id&lt;/code&gt; (conversation identifier) to active AI tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Subscriber set management&lt;/strong&gt;: For each &lt;code&gt;thread_id&lt;/code&gt;, tracking a collection of subscriber queues (one per client connection).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event broadcasting&lt;/strong&gt;: Pushing task-generated events to all subscribed client queues.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Thread safety&lt;/strong&gt;: Using &lt;code&gt;asyncio.Lock&lt;/code&gt; to ensure safe concurrent access to shared data structures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Core data structures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Task registry: &lt;code&gt;Dict[str, asyncio.Task]&lt;/code&gt; (maps &lt;code&gt;thread_id&lt;/code&gt; to running tasks)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Subscriber collection: &lt;code&gt;Dict[str, Set[asyncio.Queue]]&lt;/code&gt; (maps &lt;code&gt;thread_id&lt;/code&gt; to subscriber queues)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
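&lt;p&gt;A minimal sketch of this bookkeeping, using the data structures above (the method names follow this article; everything else is illustrative):&lt;/p&gt;

```python
import asyncio
from typing import Dict, Set

# Simplified sketch of the registry/subscriber bookkeeping described above;
# method names follow the article, the rest is illustrative.
class TaskManager:
    def __init__(self):
        self.tasks: Dict[str, asyncio.Task] = {}               # thread_id -> running task
        self.subscribers: Dict[str, Set[asyncio.Queue]] = {}   # thread_id -> client queues
        self._lock = asyncio.Lock()

    async def subscribe_to_stream(self, thread_id: str) -> asyncio.Queue:
        async with self._lock:
            queue: asyncio.Queue = asyncio.Queue()
            self.subscribers.setdefault(thread_id, set()).add(queue)
            return queue

    async def unsubscribe_from_stream(self, thread_id: str, queue: asyncio.Queue) -> None:
        async with self._lock:
            self.subscribers.get(thread_id, set()).discard(queue)

    async def broadcast(self, thread_id: str, event: dict) -> None:
        async with self._lock:
            for queue in self.subscribers.get(thread_id, set()):
                queue.put_nowait(event)        # each client drains its own queue

async def demo():
    tm = TaskManager()
    q1 = await tm.subscribe_to_stream("thr_1")
    q2 = await tm.subscribe_to_stream("thr_1")
    await tm.broadcast("thr_1", {"type": "progress", "value": 42})
    return q1.get_nowait(), q2.get_nowait()

print(asyncio.run(demo()))
```

&lt;p&gt;One queue per client keeps slow consumers from blocking each other: the broadcast never waits, and each subscriber drains its own queue at its own pace.&lt;/p&gt;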

&lt;h4&gt;
  
  
  2. StreamHandler (Connection &amp;amp; Subscription Manager)
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; handles client connections, subscription logic, and event streaming. Its key functionalities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task existence check&lt;/strong&gt;: Verifying if a task for a given &lt;code&gt;thread_id&lt;/code&gt; is already running.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Subscription mechanism&lt;/strong&gt;: Allowing new clients to subscribe to existing tasks or initiate new tasks if none exist.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Event serialization &amp;amp; distribution&lt;/strong&gt;: Converting task events to SSE-compatible format and streaming them to subscribed clients.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Workflows
&lt;/h3&gt;

&lt;p&gt;The system’s behavior adapts to three primary user scenarios, ensuring seamless multi-client interaction and recovery.&lt;/p&gt;

&lt;h4&gt;
  
  
  Scenario 1: First Client Initiates a Request
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A client sends a chat request (without an existing &lt;code&gt;thread_id&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; checks with &lt;code&gt;TaskManager&lt;/code&gt; and confirms no active task exists for the new conversation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new background task is created to execute AI inference (powered by MiroFlow’s asynchronous processing capabilities).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The background task generates events (e.g., partial chat responses, status updates) and pushes them to an event queue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; streams events from the queue to the client in SSE format.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the first event is generated, the &lt;code&gt;thread_id&lt;/code&gt; is dynamically extracted and registered in &lt;code&gt;TaskManager&lt;/code&gt; alongside the running task.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Scenario 2: Subsequent Clients Subscribe to the Same Task
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A client sends a request with an existing &lt;code&gt;thread_id&lt;/code&gt; (e.g., from a second browser tab or mobile device).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; queries &lt;code&gt;TaskManager&lt;/code&gt; and detects an active task for the &lt;code&gt;thread_id&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The client subscribes to the existing stream via &lt;code&gt;TaskManager.subscribe_to_stream(thread_id)&lt;/code&gt;, which creates a new dedicated queue for the client.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All subsequent events from the task are broadcast to the new queue (alongside existing subscriber queues).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; streams events from the client’s queue to the new subscriber in real time.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Scenario 3: Client Disconnection &amp;amp; Reconnection
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A client disconnects (e.g., network loss, tab closure), triggering an &lt;code&gt;asyncio.CancelledError&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;code&gt;finally&lt;/code&gt; block of the streaming coroutine, &lt;code&gt;TaskManager.unsubscribe_from_stream(thread_id, queue)&lt;/code&gt; is called to remove the client’s queue from the subscriber set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The background task continues running, and other subscribed clients are unaffected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the client reconnects with the same &lt;code&gt;thread_id&lt;/code&gt; before the task completes:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;StreamHandler&lt;/code&gt; verifies the task is still active.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A new queue is created via &lt;code&gt;subscribe_to_stream()&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The client resumes receiving events from the current point in the stream.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="5"&gt;
&lt;li&gt;When all subscribers have unsubscribed and the task completes, the &lt;code&gt;TaskManager&lt;/code&gt; cleans up the task after a short delay (to accommodate late reconnections).&lt;/li&gt;
&lt;/ol&gt;
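&lt;p&gt;The disconnection-safe part of this flow reduces to a &lt;code&gt;try/finally&lt;/code&gt; around the queue-draining loop. The sketch below is illustrative; &lt;code&gt;MiniManager&lt;/code&gt; and the &lt;code&gt;None&lt;/code&gt; end-of-stream sentinel are assumptions standing in for the real &lt;code&gt;TaskManager&lt;/code&gt;:&lt;/p&gt;

```python
import asyncio, json

class MiniManager:
    """Stub exposing only the subscribe/unsubscribe surface used below."""
    def __init__(self):
        self.queues = set()
    async def subscribe_to_stream(self, thread_id):
        q = asyncio.Queue()
        self.queues.add(q)
        return q
    async def unsubscribe_from_stream(self, thread_id, q):
        self.queues.discard(q)

async def stream_events(manager, thread_id):
    """Drain a subscriber queue as SSE frames; always unsubscribe on exit."""
    queue = await manager.subscribe_to_stream(thread_id)
    try:
        while True:
            event = await queue.get()
            if event is None:              # assumed end-of-stream sentinel
                break
            yield "data: " + json.dumps(event) + "\n\n"
    finally:
        # Runs on normal completion and on asyncio.CancelledError from a
        # client disconnect, so remaining subscribers are unaffected.
        await manager.unsubscribe_from_stream(thread_id, queue)

async def collect(gen, out):
    async for frame in gen:
        out.append(frame)

async def demo():
    m = MiniManager()
    frames = []
    consumer = asyncio.create_task(collect(stream_events(m, "thr_1"), frames))
    await asyncio.sleep(0)                 # let the generator subscribe
    queue = next(iter(m.queues))
    queue.put_nowait({"type": "delta", "text": "hi"})
    queue.put_nowait(None)                 # task finished
    await consumer
    return frames, len(m.queues)

print(asyncio.run(demo()))
```

&lt;p&gt;Because cleanup lives in &lt;code&gt;finally&lt;/code&gt; rather than in a disconnect handler, every exit path removes the client's queue, and the background task keeps serving everyone else.&lt;/p&gt;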

&lt;h2&gt;
  
  
  Key Technical Implementations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Publish-Subscribe Event Broadcasting
&lt;/h3&gt;

&lt;p&gt;At the heart of the system is a publish-subscribe (pub/sub) pattern optimized for streaming:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When a task generates a new event, it is serialized into SSE format: &lt;code&gt;data: {JSON-serialized-event}\n\n&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;TaskManager&lt;/code&gt; iterates over all subscriber queues for the corresponding &lt;code&gt;thread_id&lt;/code&gt; and pushes the event to each queue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failed queue pushes (indicating a disconnected client) trigger automatic cleanup of the stale queue, ensuring the subscriber set remains efficient.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach guarantees that all subscribed clients receive identical events in real time, while avoiding overhead from dead connections.&lt;/p&gt;
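&lt;p&gt;A compact sketch of this serialize-and-prune broadcast (illustrative; a bounded queue stands in for a dead connection here):&lt;/p&gt;

```python
import asyncio, json

def sse_frame(event: dict) -> str:
    """Serialize one event in SSE wire format: a 'data:' line plus a blank line."""
    return "data: " + json.dumps(event) + "\n\n"

def broadcast(queues: set, event: dict) -> None:
    """Push a frame to every subscriber; prune queues that reject the push."""
    frame = sse_frame(event)
    stale = set()
    for q in queues:
        try:
            q.put_nowait(frame)
        except asyncio.QueueFull:          # slow or dead consumer
            stale.add(q)
    queues.difference_update(stale)        # later broadcasts skip stale queues

healthy = asyncio.Queue()
dead = asyncio.Queue(maxsize=1)
dead.put_nowait("backlog")                 # already full: the next push fails
subscribers = {healthy, dead}
broadcast(subscribers, {"type": "progress"})
```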

&lt;h3&gt;
  
  
  2. Asynchronous Task Management with asyncio
&lt;/h3&gt;

&lt;p&gt;Leveraging Python’s &lt;code&gt;asyncio.Task&lt;/code&gt; and MiroFlow’s async capabilities, the system manages background tasks with precision:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Task registration: When a new task is created, its &lt;code&gt;thread_id&lt;/code&gt; and &lt;code&gt;asyncio.Task&lt;/code&gt; object are stored in the task registry.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Task cancellation: Users can cancel tasks from any client, with &lt;code&gt;TaskManager&lt;/code&gt; propagating the cancellation signal to the background task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Delayed cleanup: After a task completes, it remains in the registry for a configurable window (e.g., 5 minutes) to support reconnections. The task is only removed once the window expires and no subscribers exist.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
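&lt;p&gt;The delayed-cleanup step can be expressed as a small coroutine (a sketch under the assumptions above; the grace window is configurable):&lt;/p&gt;

```python
import asyncio

async def schedule_cleanup(tasks, subscribers, thread_id, grace_seconds=300.0):
    """Keep a finished task registered for a grace window so late
    reconnections still find it; drop it only if nobody is subscribed
    when the window expires. The 300 s default mirrors the configurable
    window mentioned above."""
    await asyncio.sleep(grace_seconds)
    if not subscribers.get(thread_id):     # no live subscriber queues left
        tasks.pop(thread_id, None)
        subscribers.pop(thread_id, None)

async def demo():
    tasks = {"thr_1": "finished-task"}
    subscribers = {"thr_1": set()}         # everyone has unsubscribed
    await schedule_cleanup(tasks, subscribers, "thr_1", grace_seconds=0.01)
    return "thr_1" in tasks

print(asyncio.run(demo()))
```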

&lt;h3&gt;
  
  
  3. Dynamic thread_id Extraction
&lt;/h3&gt;

&lt;p&gt;For new conversations, the &lt;code&gt;thread_id&lt;/code&gt; is not available upfront—it is generated within the first event from the AI task. To handle this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;StreamHandler&lt;/code&gt; monitors the initial events from the task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Upon extracting the &lt;code&gt;thread_id&lt;/code&gt; from the first event, it immediately registers the task and &lt;code&gt;thread_id&lt;/code&gt; with &lt;code&gt;TaskManager&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Subsequent events are routed through the pub/sub mechanism to all subscribers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This dynamic registration ensures seamless task tracking even when &lt;code&gt;thread_id&lt;/code&gt; is not known at request time.&lt;/p&gt;
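&lt;p&gt;A minimal version of this first-event registration might look like the following (the event shape is an assumption for this sketch):&lt;/p&gt;

```python
import asyncio

async def register_on_first_event(events, register):
    """Forward events downstream; the first event that carries a thread_id
    triggers registration (the event shape here is an assumption)."""
    thread_id = None
    async for event in events:
        if thread_id is None and event.get("thread_id"):
            thread_id = event["thread_id"]
            register(thread_id)            # e.g. hand off to TaskManager
        yield event

async def demo():
    async def fake_events():
        yield {"type": "thread.created", "thread_id": "thr_abc"}
        yield {"type": "thread.item.added"}
    registered, seen = [], []
    async for event in register_on_first_event(fake_events(), registered.append):
        seen.append(event["type"])
    return registered, seen

print(asyncio.run(demo()))
```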

&lt;h3&gt;
  
  
  4. Robust Reconnection Support
&lt;/h3&gt;

&lt;p&gt;The system’s reconnection feature is designed to minimize user disruption:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A dedicated reconnection API allows clients to send a request with a previously used &lt;code&gt;thread_id&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;TaskManager&lt;/code&gt; checks if the task is still active (or within the cleanup window).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If valid, a new subscriber queue is created, and the client resumes streaming from the current event sequence.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If the task has completed and been cleaned up, the client receives a "task completed" signal with the full conversation history.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advantages &amp;amp; Value Propositions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource Efficiency&lt;/strong&gt;: By sharing a single AI inference task across multiple clients, the system reduces computational overhead by up to 80% in multi-device scenarios—critical for scaling AI Agent products.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-Time Synchronization&lt;/strong&gt;: All subscribed clients receive identical events simultaneously, ensuring consistent conversation states across devices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced User Experience&lt;/strong&gt;: Supports multi-tab/multi-device usage and seamless reconnection after network interruptions, eliminating frustration from lost progress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fault Tolerance&lt;/strong&gt;: Disconnections of individual clients do not affect ongoing tasks or other subscribers, ensuring system stability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MiroMind MiroFlow Synergy&lt;/strong&gt;: Built on MiroFlow’s flexible async framework, the system inherits MiroMind’s strengths in task orchestration and event handling, reducing development complexity.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Application Scenarios
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Device Sync&lt;/strong&gt;: Users can start a conversation on their phone and continue on their laptop, with real-time updates on both devices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Team Collaboration&lt;/strong&gt;: Team members can jointly monitor AI tasks (e.g., report generation, data analysis) and receive live progress updates.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unstable Network Environments&lt;/strong&gt;: Users in areas with spotty connectivity can reconnect and resume conversations without losing context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task Cancellation&lt;/strong&gt;: A user can initiate a task on one device and cancel it from another, with the cancellation propagated instantly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining MiroMind’s MiroFlow framework with a pub/sub-based architecture, we’ve built a streaming response system that redefines real-time multi-client interaction for AI Agent products. The decoupling of task execution (via &lt;code&gt;TaskManager&lt;/code&gt;) and client communication (via &lt;code&gt;StreamHandler&lt;/code&gt;) enables efficient resource reuse, robust synchronization, and seamless reconnection—addressing the core pain points of traditional SSE systems.&lt;/p&gt;

&lt;p&gt;This design not only enhances the user experience of OSpark but also provides a scalable foundation for future extensions, such as supporting more client types, adding event filtering, or integrating with real-time collaboration features. For teams building AI products with multi-client requirements, this architecture offers a proven, MiroMind-powered solution to deliver responsive, synchronized streaming experiences.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>performance</category>
      <category>ai</category>
      <category>backend</category>
    </item>
    <item>
      <title>OSpark: Building Event-Driven Streaming Responses with MiroMind's MiroFlow Foundation</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Fri, 31 Oct 2025 10:33:16 +0000</pubDate>
      <link>https://forem.com/askpaul/osparkapi-building-event-driven-streaming-responses-with-mirominds-miroflow-foundation-3kfc</link>
      <guid>https://forem.com/askpaul/osparkapi-building-event-driven-streaming-responses-with-mirominds-miroflow-foundation-3kfc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;OSparkApi, a universal intelligent agent orchestration system built upon MiroMindAI's open-source MiroFlow project, centers its innovation on one critical capability: sophisticated event-driven streaming response processing. This architecture, rooted in MiroFlow's robust foundations, enables seamless handling of dynamic agent interactions while maintaining exceptional flexibility and scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Architectural Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Event System: The Event-Driven Engine
&lt;/h3&gt;

&lt;p&gt;Leveraging MiroFlow's architectural principles, OSparkApi adopts an event-driven architecture where all state changes flow through well-defined event streams, creating a loosely coupled ecosystem where components interact via standardized events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Event Type Definitions:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ThreadStreamEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;ThreadCreatedEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ThreadUpdatedEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ThreadItemAddedEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
    &lt;span class="n"&gt;ThreadItemUpdated&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ThreadItemDoneEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ProgressUpdateEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
    &lt;span class="n"&gt;ErrorEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;NoticeEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;discriminator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous Event Streaming:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;_events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ThreadStreamEvent&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ThreadStreamEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This event system—enhanced from MiroFlow's original design—effectively decouples producers from consumers, significantly enhancing extensibility and allowing independent evolution of system components.&lt;/p&gt;
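&lt;p&gt;For completeness, the consuming side of such a queue can be sketched as a simple drain loop (the &lt;code&gt;None&lt;/code&gt; end-of-stream sentinel is an assumption of this sketch, not part of the shown API):&lt;/p&gt;

```python
import asyncio

async def drain_events(queue, handle):
    """Consume events until a None sentinel; the mirror image of the
    producer side, where AgentContext.stream() puts events on the queue.
    (The sentinel convention is an assumption for this sketch.)"""
    while True:
        event = await queue.get()
        if event is None:
            break
        handle(event)

async def demo():
    q = asyncio.Queue()
    q.put_nowait({"type": "progress_update", "value": 10})
    q.put_nowait(None)
    seen = []
    await drain_events(q, seen.append)
    return seen

print(asyncio.run(demo()))
```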

&lt;h3&gt;
  
  
  2. Store Layer: Persistence Abstraction
&lt;/h3&gt;

&lt;p&gt;Building on MiroFlow's flexible data handling, the Store layer provides a unified storage interface with support for multiple backend implementations, ensuring adaptability in data persistence strategies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract Storage Interface:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Generic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TContext&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ThreadMetadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;

    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_thread_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ThreadItem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ID Generation Strategy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_ID_PREFIXES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;default_generate_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;StoreItemType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_ID_PREFIXES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;item_type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design—extending MiroFlow's storage agnosticism—enables seamless switching between different storage backends without affecting core business logic.&lt;/p&gt;
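&lt;p&gt;As an illustration of that backend swapping, here is a minimal in-memory implementation of a Store-like interface. This is a sketch under simplifying assumptions: &lt;code&gt;ThreadMetadata&lt;/code&gt; is reduced to two fields and the method signatures drop the context parameter shown in the excerpt above.&lt;/p&gt;

```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ThreadMetadata:
    # Heavily simplified stand-in for the article's model
    id: str
    items: list = field(default_factory=list)


class Store(ABC):
    @abstractmethod
    async def load_thread(self, thread_id: str) -> ThreadMetadata: ...

    @abstractmethod
    async def add_thread_item(self, thread_id: str, item) -> None: ...


class InMemoryStore(Store):
    """Dict-backed backend; swapping in a database needs no caller changes."""

    def __init__(self) -> None:
        self._threads = {}

    async def load_thread(self, thread_id: str) -> ThreadMetadata:
        return self._threads.setdefault(thread_id, ThreadMetadata(id=thread_id))

    async def add_thread_item(self, thread_id: str, item) -> None:
        (await self.load_thread(thread_id)).items.append(item)


async def demo() -> ThreadMetadata:
    store = InMemoryStore()
    # Reuses the "thr_" prefix convention from the ID generation strategy
    tid = f"thr_{uuid.uuid4().hex[:16]}"
    await store.add_thread_item(tid, {"role": "user", "content": "hi"})
    return await store.load_thread(tid)


thread = asyncio.run(demo())
```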

&lt;h3&gt;
  
  
  3. Context System: Execution State Management
&lt;/h3&gt;

&lt;p&gt;AgentContext, a critical enhancement to MiroFlow's execution model, sits at the core of response processing: it maintains the complete execution context for thread operations and keeps state consistent across complex interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Data Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ThreadMetadata&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;request_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TContext&lt;/span&gt;

    &lt;span class="n"&gt;workflow_item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WorkflowItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;client_tool_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ClientToolCall&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Workflow Lifecycle Management:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workflow_item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkflowItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ThreadItemAddedEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workflow_item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;end_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expanded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ThreadItemDoneEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workflow_item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workflow_item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Context system—building on MiroFlow's state management capabilities—guarantees state consistency and operational atomicity throughout complex agent interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Thread System: Conversation Session Management
&lt;/h3&gt;

&lt;p&gt;Threads represent the core abstraction in OSparkApi, extending MiroFlow's session handling to encapsulate complete conversational sessions and provide a structured container for interaction history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread Data Model:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ThreadMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ThreadStatus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ActiveStatus&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rich Item Type Hierarchy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ThreadItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;UserMessageItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;AssistantMessageItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ClientToolCallItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
    &lt;span class="n"&gt;ServerToolCallItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WidgetItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;WorkflowItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;TaskItem&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;EndOfTurnItem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;discriminator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Thread system—enhanced from MiroFlow's conversation primitives—provides comprehensive lifecycle management for conversations, supporting diverse interaction patterns and content types.&lt;/p&gt;
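&lt;p&gt;The discriminated-union idea behind &lt;code&gt;ThreadItem&lt;/code&gt; can be sketched without Pydantic (which the project actually uses): a &lt;code&gt;type&lt;/code&gt; field selects the concrete item class, mirroring what &lt;code&gt;Field(discriminator="type")&lt;/code&gt; automates. The class names and registry below are illustrative stand-ins.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class UserMessageItem:
    type: str
    content: str


@dataclass
class WorkflowItem:
    type: str
    workflow: str


# Manual stand-in for Pydantic's discriminated union:
# the "type" field selects which concrete class to construct.
_ITEM_CLASSES = {"user_message": UserMessageItem, "workflow": WorkflowItem}


def parse_thread_item(raw: dict):
    cls = _ITEM_CLASSES[raw["type"]]
    return cls(**raw)


item = parse_thread_item({"type": "user_message", "content": "hello"})
```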

&lt;h2&gt;
  
  
  Processing Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Streaming Response Handling
&lt;/h3&gt;

&lt;p&gt;Agent response processing follows a standardized flow—evolved from MiroFlow's execution pipeline—with clear separation of concerns between components:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Response Handler Interface:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentResponseHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@abstractmethod&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunResultStreaming&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ThreadStreamEvent&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Core Processing Logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_agent_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunResultStreaming&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_events&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;thread_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_process_stream_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;thread_event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;thread_event&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Processing Capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content Streaming&lt;/strong&gt;: Real-time delivery of incremental text with annotation and formatting information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Invocation&lt;/strong&gt;: Complete lifecycle management for both client and server tool calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling&lt;/strong&gt;: Unified exception capture and structured error event generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Orchestration&lt;/strong&gt;: Task state tracking and progress updates throughout execution&lt;/li&gt;
&lt;/ul&gt;
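&lt;p&gt;The error-handling capability above can be sketched as follows, assuming a hypothetical event source: an exception raised mid-stream is converted into a structured error event instead of propagating to the client.&lt;/p&gt;

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ThreadStreamEvent:
    type: str
    data: str = ""


async def fake_agent_events():
    # Hypothetical source that fails partway through streaming
    yield ThreadStreamEvent("llm_response", "partial ")
    yield ThreadStreamEvent("llm_response", "answer")
    raise RuntimeError("tool timed out")


async def stream_with_error_capture(source) -> list:
    """Unified exception capture: failures become structured error events."""
    events = []
    try:
        async for event in source:
            events.append(event)
    except Exception as exc:
        events.append(ThreadStreamEvent("error", str(exc)))
    return events


events = asyncio.run(stream_with_error_capture(fake_agent_events()))
```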

&lt;h2&gt;
  
  
  Architectural Advantages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Principles
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Type Safety&lt;/strong&gt;: Comprehensive use of Pydantic—consistent with MiroFlow's design philosophy—ensures data consistency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async-First&lt;/strong&gt;: Native asynchronous support based on asyncio for high-performance operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven&lt;/strong&gt;: Decoupled architecture enables horizontal scalability and component isolation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abstract Layering&lt;/strong&gt;: Clear separation of responsibilities facilitates maintenance and evolution&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Extensibility Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Processor Factory&lt;/strong&gt;: Supports custom processors for different LLM providers, building on MiroFlow's flexibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Plugins&lt;/strong&gt;: Easy integration with new storage backends through abstract interfaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event Extension&lt;/strong&gt;: New functionality can be implemented by adding event types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config-Driven&lt;/strong&gt;: Flexible configuration management based on OmegaConf&lt;/li&gt;
&lt;/ul&gt;
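&lt;p&gt;The processor-factory idea can be sketched with a plain registry. The provider name, class, and config fields here are hypothetical, and the real project wires its configuration through OmegaConf rather than a dataclass:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ProcessorConfig:
    provider: str
    model: str


# Hypothetical registry mirroring the "processor factory" idea:
# each LLM provider registers a processor constructor.
_PROCESSORS = {}


def register_processor(provider: str):
    def wrap(cls):
        _PROCESSORS[provider] = cls
        return cls
    return wrap


@register_processor("openai")
class OpenAIProcessor:
    def __init__(self, config: ProcessorConfig) -> None:
        self.model = config.model


def build_processor(config: ProcessorConfig):
    """Look up the processor class by provider name and instantiate it."""
    return _PROCESSORS[config.provider](config)


proc = build_processor(ProcessorConfig(provider="openai", model="gpt-4o"))
```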

&lt;h3&gt;
  
  
  Enterprise-Grade Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt;: Comprehensive event logging and tracing capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Recovery&lt;/strong&gt;: Event-based checkpointing and resume functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency Control&lt;/strong&gt;: Thread-level state isolation for safe parallel execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permission Management&lt;/strong&gt;: User-based access control for secure multi-tenant operation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;OSparkApi's agent response processing architecture—built upon MiroMind's MiroFlow foundation—represents a mature evolution of modern streaming AI systems. By extending MiroFlow's robust core with a carefully crafted four-layer architecture (Event, Store, Context, and Thread), it delivers a high-performance, extensible, and type-safe response processing system.&lt;/p&gt;

&lt;p&gt;This architecture not only enhances MiroFlow's original capabilities but also provides a solid technical foundation for future AI application innovation. It demonstrates strong adaptability across real-time interactions, multimodality, and complex reasoning scenarios, positioning OSparkApi as a robust framework for the next generation of intelligent agent applications—all while honoring its MiroMind heritage.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>OSpark Orchestrator: Enabling Streaming Output for MiroMind's MiroFlow</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Fri, 17 Oct 2025 12:31:07 +0000</pubDate>
      <link>https://forem.com/askpaul/ospark-orchestrator-enabling-streaming-output-for-mirominds-miroflow-44pi</link>
      <guid>https://forem.com/askpaul/ospark-orchestrator-enabling-streaming-output-for-mirominds-miroflow-44pi</guid>
      <description>&lt;p&gt;As we embark on developing our universal Agent product OSpark, we've chosen to build upon MiroMindAI's open-source project MiroFlow. A key limitation we're addressing first is MiroFlow's lack of streaming output support. In this technical blog, we'll detail how we've enhanced MiroFlow to deliver real-time streaming capabilities through OSpark's orchestrator upgrade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Changes in the Orchestrator
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. New Task Guidance System
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_task_guidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chinese_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Provides detailed task execution guidance with support for Chinese context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Emphasizes comprehensive information collection and transparent reporting&lt;/li&gt;
&lt;li&gt;Offers specialized handling guidance for Chinese-language tasks&lt;/li&gt;
&lt;li&gt;Avoids premature conclusions while preserving all potential candidate answers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Streaming LLM Calling Capability
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_handle_llm_call_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Supports real-time streaming of LLM responses&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Returns real-time chunks of LLM-generated content&lt;/li&gt;
&lt;li&gt;Integrates tool call information extraction&lt;/li&gt;
&lt;li&gt;Supports streaming calls for both primary and sub-agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Streaming Summary Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_handle_summary_with_context_limit_retry_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Streaming version of summary generation with context limit retry support&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Makes the summary generation process visible in real-time&lt;/li&gt;
&lt;li&gt;Automatically handles context length exceeded issues&lt;/li&gt;
&lt;li&gt;Features an intelligent retry mechanism&lt;/li&gt;
&lt;/ul&gt;
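&lt;p&gt;One plausible shape for the context-limit retry is shown below. The truncation strategy is an assumption, and &lt;code&gt;fake_summarize&lt;/code&gt; stands in for a real LLM call that can reject oversized input:&lt;/p&gt;

```python
class ContextLimitError(Exception):
    """Stand-in for a provider's 'context length exceeded' error."""


def fake_summarize(text: str, limit: int = 20) -> str:
    # Pretend LLM call: rejects inputs longer than its context window
    if len(text) > limit:
        raise ContextLimitError
    return f"summary of {len(text)} chars"


def summarize_with_retry(text: str, max_retries: int = 3) -> str:
    """On context overflow, halve the input and retry (simplified)."""
    for _ in range(max_retries):
        try:
            return fake_summarize(text)
        except ContextLimitError:
            text = text[: len(text) // 2]
    raise RuntimeError("could not fit input into context window")


result = summarize_with_retry("x" * 60)
```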

&lt;h2&gt;
  
  
  Design of the StreamingOrchestrator
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Architectural Design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decorator Pattern&lt;/strong&gt;: Instead of replacing the existing orchestrator, we add streaming capabilities on top of it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven&lt;/strong&gt;: Implements streaming interactions based on a complete event system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task Management&lt;/strong&gt;: Supports task cancellation, status queries, and lifecycle management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Core Functionality
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_task_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;StreamEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Executes tasks and returns events in a streaming fashion&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three Types of Real-time Interactions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLM Response Streaming&lt;/strong&gt;: Returns LLM-generated content in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Execution Transparency&lt;/strong&gt;: Displays tool call start, execution, and completion statuses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Progress Updates&lt;/strong&gt;: Provides detailed execution steps and progress information&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Event System
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EventType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;TASK_STARTED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;           &lt;span class="c1"&gt;# Task initiation
&lt;/span&gt;    &lt;span class="n"&gt;TASK_PROGRESS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;         &lt;span class="c1"&gt;# Task progress updates
&lt;/span&gt;    &lt;span class="n"&gt;LLM_RESPONSE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;           &lt;span class="c1"&gt;# LLM-generated content
&lt;/span&gt;    &lt;span class="n"&gt;TOOL_CALL_START&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# Tool invocation beginning
&lt;/span&gt;    &lt;span class="n"&gt;TOOL_CALL_COMPLETE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Tool invocation finished
&lt;/span&gt;    &lt;span class="n"&gt;ERROR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;                         &lt;span class="c1"&gt;# Error occurrences
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Relationship Between the Two Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Collaboration Model: Decorator Pattern
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Streaming orchestrator creates an orchestrator instance internally
&lt;/span&gt;&lt;span class="n"&gt;orchestrator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Orchestrator&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Composition Relationship&lt;/strong&gt;: The streaming orchestrator contains an instance of the original orchestrator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function Enhancement&lt;/strong&gt;: Adds streaming capabilities on top of the original orchestrator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface Compatibility&lt;/strong&gt;: Maintains the original orchestrator's interface unchanged&lt;/li&gt;
&lt;/ul&gt;
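&lt;p&gt;The composition relationship can be sketched as follows; both classes are heavily simplified stand-ins for the real orchestrators. The streaming wrapper holds an orchestrator instance and surrounds its result with lifecycle events:&lt;/p&gt;

```python
import asyncio


class Orchestrator:
    """Stand-in for the original orchestrator: blocking, single result."""

    async def execute_task(self, task: str) -> str:
        return f"result for {task}"


class StreamingOrchestrator:
    """Wraps (does not replace) the original orchestrator, adding events."""

    def __init__(self) -> None:
        self.orchestrator = Orchestrator()  # composition, not inheritance

    async def execute_task_streaming(self, task: str):
        yield {"type": "task_started", "task": task}
        result = await self.orchestrator.execute_task(task)
        yield {"type": "task_complete", "result": result}


async def demo() -> list:
    events = []
    async for ev in StreamingOrchestrator().execute_task_streaming("demo"):
        events.append(ev)
    return events


events = asyncio.run(demo())
```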

&lt;h3&gt;
  
  
  2. Responsibility Division
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Responsibilities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Orchestrator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Core business logic: task execution, LLM calls, tool management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;StreamingOrchestrator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streaming interaction layer: event management, real-time feedback, task lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Data Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Request → StreamingOrchestrator → Create Orchestrator Instance → Execute Core Logic → Generate Streaming Events → Return in Real-time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
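&lt;p&gt;For the final "Return in Real-time" step, events have to be serialized onto the wire somehow. One common transport is Server-Sent Events; the sketch below is illustrative only — the post does not specify which wire format OSpark actually uses:&lt;/p&gt;

```python
import json


def to_sse(kind: str, payload: dict) -> str:
    """Format one streaming event as a Server-Sent Events (SSE) frame:
    an 'event:' line, a 'data:' line with a JSON body, and a blank line."""
    data = json.dumps({"event": kind, **payload})
    return f"event: {kind}\ndata: {data}\n\n"


frame = to_sse("llm_chunk", {"text": "Hello"})
print(frame, end="")
```

&lt;p&gt;A browser's &lt;code&gt;EventSource&lt;/code&gt; (or any SSE client) can then render each frame as it arrives, which is what makes the AI's progress visible in real time.&lt;/p&gt;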



&lt;h2&gt;
  
  
  Core Value
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. User Experience Enhancement
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transparent Execution&lt;/strong&gt;: Users can observe the AI's real-time thinking process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instant Feedback&lt;/strong&gt;: Real-time visibility of LLM responses, tool execution, and task progress&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Controllability&lt;/strong&gt;: Supports task cancellation and status monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Technical Architecture Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backward Compatibility&lt;/strong&gt;: All original functionalities are preserved, enabling gradual migration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separation of Concerns&lt;/strong&gt;: Clear separation between core logic and interaction experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt;: Event-driven architecture supports flexible feature expansion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Practical Applications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time AI Assistant&lt;/strong&gt;: Web interfaces that display the AI's thinking process in real time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging and Monitoring&lt;/strong&gt;: Developers can monitor execution processes and performance metrics in real time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Development&lt;/strong&gt;: Supports task cancellation, parameter adjustment, and result verification&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With the introduction of the StreamingOrchestrator, OSpark achieves a significant transformation from "black box" execution to "transparent" interaction. Using a decorator pattern design, we provide users with an improved interaction experience while maintaining backward compatibility, and offer developers enhanced observability and extensibility. This architecture not only addresses current real-time interaction needs but also lays a solid foundation for future feature expansion.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a Prediction Market App with MiroFlow: A Technical Deep Dive</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Wed, 08 Oct 2025 03:53:57 +0000</pubDate>
      <link>https://forem.com/askpaul/building-a-prediction-market-app-with-miroflow-a-technical-deep-dive-377e</link>
      <guid>https://forem.com/askpaul/building-a-prediction-market-app-with-miroflow-a-technical-deep-dive-377e</guid>
      <description>&lt;p&gt;As we develop &lt;a href="https://askpaul.ai" rel="noopener noreferrer"&gt;https://askpaul.ai&lt;/a&gt;, a cutting-edge prediction market application, we've integrated MiroMindAI's MiroFlow framework to power our market outcome predictions. This powerful open-source research agent framework has proven instrumental in handling the complex, multi-step internet research tasks that underpin accurate forecasting. In this technical blog post, we'll explore the architecture of MiroFlow (available at &lt;a href="https://github.com/MiroMindAI/MiroFlow" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/MiroFlow&lt;/a&gt;) and examine how its design enables robust predictive capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  MiroFlow Architecture Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Architectural Overview
&lt;/h3&gt;

&lt;p&gt;MiroFlow is a high-performance open-source research agent framework specifically engineered for executing complex multi-step internet research tasks, such as future event prediction. The project employs a modular design that supports multiple LLM providers and tool integrations, making it highly adaptable to diverse research scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Directory Structure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MiroFlow/
├── src/                    # Core source code
│   ├── core/              # Core orchestration logic
│   ├── llm/               # LLM client implementations
│   ├── tool/              # Tool management
│   ├── logging/           # Logging and tracing
│   └── utils/             # Utility functions
├── config/                # Configuration files
├── data/                  # Data files
├── logs/                  # Log files
├── utils/                 # Utility scripts
└── docs/                  # Documentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Architectural Components
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Core Orchestration Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/core/orchestrator.py&lt;/code&gt;&lt;/strong&gt;: Primary orchestrator responsible for coordinating main and sub-agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/core/pipeline.py&lt;/code&gt;&lt;/strong&gt;: Task execution pipeline handling end-to-end task flows&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. LLM Client Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/llm/client.py&lt;/code&gt;&lt;/strong&gt;: LLM client factory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/llm/provider_client_base.py&lt;/code&gt;&lt;/strong&gt;: Abstract base class defining LLM client interface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/llm/providers/&lt;/code&gt;&lt;/strong&gt;: Concrete LLM provider implementations

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;claude_anthropic_client.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt_openai_client.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mirothinker_sglang_client.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;qwen_sglang_client.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;And more&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
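&lt;p&gt;The split between &lt;code&gt;client.py&lt;/code&gt; (factory) and &lt;code&gt;provider_client_base.py&lt;/code&gt; (abstract base) is a classic pattern. Here is a minimal sketch of the idea — all names, methods, and provider strings below are assumptions for illustration, not MiroFlow's real API:&lt;/p&gt;

```python
from abc import ABC, abstractmethod


class ProviderClientBase(ABC):
    """Abstract base: every LLM provider exposes the same interface."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class OpenAIClient(ProviderClientBase):
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # a real client would call the API here


class AnthropicClient(ProviderClientBase):
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"


# Registry mapping config strings to concrete implementations.
_PROVIDERS: dict[str, type[ProviderClientBase]] = {
    "openai": OpenAIClient,
    "anthropic": AnthropicClient,
}


def create_client(provider: str) -> ProviderClientBase:
    """Factory: callers depend only on the base interface."""
    try:
        return _PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None


client = create_client("openai")
print(client.complete("ping"))
```

&lt;p&gt;Adding a new provider then means writing one subclass and one registry entry — the orchestration code never changes, which is what makes the multi-LLM support cheap to extend.&lt;/p&gt;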

&lt;h4&gt;
  
  
  3. Tool Management Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/tool/manager.py&lt;/code&gt;&lt;/strong&gt;: Tool manager with MCP protocol support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/tool/mcp_servers/&lt;/code&gt;&lt;/strong&gt;: MCP server implementations

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;audio_mcp_server.py&lt;/code&gt;: Audio processing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;browser_session.py&lt;/code&gt;: Browser sessions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;python_server.py&lt;/code&gt;: Python code execution&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reading_mcp_server.py&lt;/code&gt;: File reading&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;reasoning_mcp_server.py&lt;/code&gt;: Reasoning tools&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;searching_mcp_server.py&lt;/code&gt;: Search tools&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vision_mcp_server.py&lt;/code&gt;: Visual processing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. Logging &amp;amp; Tracing Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/logging/logger.py&lt;/code&gt;&lt;/strong&gt;: Logging system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/logging/task_tracer.py&lt;/code&gt;&lt;/strong&gt;: Task tracer recording execution steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Configuration Management
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;config/&lt;/code&gt;&lt;/strong&gt;: Configuration file directory

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;agent_*.yaml&lt;/code&gt;: Agent configurations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;benchmark/&lt;/code&gt;: Benchmarking configurations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tool/&lt;/code&gt;: Tool configurations&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  6. Utilities Layer
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src/utils/&lt;/code&gt;&lt;/strong&gt;: Core utility functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;utils/&lt;/code&gt;&lt;/strong&gt;: Evaluation and statistical tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Execution Flow Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Main Execution Flow:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Initialization Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loading configuration files&lt;/li&gt;
&lt;li&gt;Initializing LLM clients&lt;/li&gt;
&lt;li&gt;Creating tool managers&lt;/li&gt;
&lt;li&gt;Setting up logging and tracing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Task Execution Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Main agent receiving tasks&lt;/li&gt;
&lt;li&gt;Tool invocation and sub-agent coordination&lt;/li&gt;
&lt;li&gt;Result aggregation and formatting&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Result Output Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating final answers&lt;/li&gt;
&lt;li&gt;Saving execution logs&lt;/li&gt;
&lt;li&gt;Returning formatted results&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Multi-Agent Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Main Agent&lt;/strong&gt;: Responsible for overall task coordination&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub Agents&lt;/strong&gt;: Execute specific subtasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Agents&lt;/strong&gt;: Integrate various tools via MCP protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. LLM Provider Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI GPT series&lt;/li&gt;
&lt;li&gt;Anthropic Claude series&lt;/li&gt;
&lt;li&gt;MiroThinker (MiroMind's own open-source model)&lt;/li&gt;
&lt;li&gt;Qwen series&lt;/li&gt;
&lt;li&gt;DeepSeek series&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Tool Ecosystem
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search Tools&lt;/strong&gt;: Google search, web scraping&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Execution&lt;/strong&gt;: Python code running&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Processing&lt;/strong&gt;: Document reading, audio transcription&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser Automation&lt;/strong&gt;: Playwright integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning Tools&lt;/strong&gt;: Logical inference support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Configuration-Driven Design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hydra configuration management&lt;/li&gt;
&lt;li&gt;Environment variable support&lt;/li&gt;
&lt;li&gt;Dynamic configuration loading&lt;/li&gt;
&lt;/ul&gt;
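&lt;p&gt;To make "configuration-driven" concrete, here is a hypothetical Hydra-style fragment in the shape of a &lt;code&gt;config/agent_*.yaml&lt;/code&gt; file. The keys and values below are illustrative assumptions, not MiroFlow's actual schema:&lt;/p&gt;

```yaml
# Hypothetical agent configuration (illustrative only)
defaults:
  - tool: default          # compose in a tool config group

main_agent:
  llm:
    provider: anthropic    # selects a client via the LLM factory
    temperature: 0.3
  max_turns: 20

sub_agents:
  searcher:
    llm:
      provider: qwen
    tools: [searching, reading]
```

&lt;p&gt;With Hydra, any of these values can also be overridden from the command line (for example &lt;code&gt;main_agent.llm.provider=openai&lt;/code&gt;) without editing the file, which is what lets one codebase serve many research scenarios.&lt;/p&gt;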

&lt;h2&gt;
  
  
  Architectural Summary
&lt;/h2&gt;

&lt;p&gt;MiroFlow represents a well-designed modular AI agent framework with the following characteristics:&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Advantages:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;High Modularity&lt;/strong&gt;: Clear separation of concerns among components, enabling easy extensibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-LLM Support&lt;/strong&gt;: Unified client interface supporting various large language models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Tool Ecosystem&lt;/strong&gt;: Integration of diverse tools through MCP protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration-Driven&lt;/strong&gt;: Flexible configuration system supporting different scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive Logging&lt;/strong&gt;: Detailed execution records and error handling&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Technology Stack:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.12+&lt;/strong&gt;: Leveraging modern Python features&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hydra&lt;/strong&gt;: For configuration management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Protocol&lt;/strong&gt;: Standard for tool integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asynchronous Programming&lt;/strong&gt;: For high-performance concurrent processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic&lt;/strong&gt;: For data validation and serialization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Application Scenarios:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Future event prediction&lt;/li&gt;
&lt;li&gt;Complex research tasks&lt;/li&gt;
&lt;li&gt;Multi-step reasoning&lt;/li&gt;
&lt;li&gt;Automated information gathering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architectural design of MiroFlow embodies best practices in modern AI agent systems, ensuring both performance and maintainability while providing excellent extensibility—qualities that make it an ideal foundation for powering the predictive capabilities of our askpaul.ai platform.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Enhancing MiroFlow: Tracking Data Sources for AskPaul App</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Thu, 02 Oct 2025 06:16:15 +0000</pubDate>
      <link>https://forem.com/askpaul/enhancing-miroflow-tracking-data-sources-for-askpaul-app-53em</link>
      <guid>https://forem.com/askpaul/enhancing-miroflow-tracking-data-sources-for-askpaul-app-53em</guid>
      <description>&lt;p&gt;When building predictive market applications askpaul.ai, transparency in data sources is crucial for user trust and informed decision-making. Our platform leverages MiroMindAI's MiroFlow for market outcome predictions, but we faced a challenge: MiroFlow didn't natively support tracking and exposing the reference data sources used in predictions.&lt;/p&gt;

&lt;p&gt;To address this, we implemented modifications to MiroFlow (available at &lt;a href="https://github.com/MiroMindAI/MiroFlow" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/MiroFlow&lt;/a&gt;) that enable comprehensive tracking of tool usage across all agents involved in the prediction process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Modification Approach
&lt;/h2&gt;

&lt;p&gt;Our solution focused on capturing and preserving tool call data throughout the prediction process, with specific changes in two key files:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tracking Tool Calls in &lt;code&gt;src/core/orchestrator.py&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;We needed to ensure tool usage data persisted across multiple rounds of agent interactions rather than being reset each time.&lt;/p&gt;

&lt;h4&gt;
  
  
  Sub-agent Tool Call Tracking
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Execute tool calls
# Note: Do not reinitialize tool_calls_data to accumulate data across turns
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_sub_agent_tool_calls_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sub_agent_tool_calls_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;tool_calls_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_sub_agent_tool_calls_data&lt;/span&gt;
&lt;span class="n"&gt;all_tool_results_content_with_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then added identification and logging capabilities for sub-agent activities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store current turn's tool call data to logs
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Add sub-agent identification to each tool call
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;call_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;call_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub_agent_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sub_agent_name&lt;/span&gt;
    &lt;span class="c1"&gt;# Store only newly added data to avoid duplicates
&lt;/span&gt;    &lt;span class="n"&gt;current_turn_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_turn_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Main Agent Tool Call Tracking
&lt;/h4&gt;

&lt;p&gt;Similar modifications were made for the main agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 7. Execute tool calls (in sequence)
# Note: Do not reinitialize tool_calls_data to accumulate data across turns
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_main_agent_tool_calls_data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_main_agent_tool_calls_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;tool_calls_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_main_agent_tool_calls_data&lt;/span&gt;
&lt;span class="n"&gt;all_tool_results_content_with_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With corresponding logging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store main agent's tool call data to logs
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Add main agent identification to each tool call
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;call_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# Store only newly added data to avoid duplicates
&lt;/span&gt;    &lt;span class="n"&gt;current_turn_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls_data&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;call_data&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_turn_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Extending Logging Capabilities in &lt;code&gt;src/logging/task_tracer.py&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;To accommodate the new tracking data, we added a dedicated field in the task logging structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;step_logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;StepRecord&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store detailed data for all tool calls
&lt;/span&gt;&lt;span class="n"&gt;tool_calls_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Improvements
&lt;/h2&gt;

&lt;p&gt;These modifications delivered several critical enhancements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Persistence&lt;/strong&gt;: Tool call data is now stored in instance variables rather than temporary variables, preventing data loss between rounds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cross-turn Accumulation&lt;/strong&gt;: The system now maintains a complete history of tool usage across all interaction rounds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Identification&lt;/strong&gt;: Each tool call is clearly marked as originating from either the main agent or a specific sub-agent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated Logging&lt;/strong&gt;: All tool usage data is systematically stored in the task log for later retrieval and display.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Duplicate Prevention&lt;/strong&gt;: A deduplication mechanism ensures each tool call is recorded only once.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
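&lt;p&gt;The accumulate-tag-deduplicate pattern from the snippets above can be summarized in a standalone form. &lt;code&gt;TaskLog&lt;/code&gt; here is a minimal stand-in for MiroFlow's task tracer, and &lt;code&gt;record_turn&lt;/code&gt; mirrors the logic we added rather than quoting it verbatim:&lt;/p&gt;

```python
class TaskLog:
    """Minimal stand-in for the task tracer's log object."""

    def __init__(self) -> None:
        self.tool_calls_data: list[dict] = []


def record_turn(task_log: TaskLog, turn_calls: list[dict], agent_type: str) -> None:
    """Tag each call with its originating agent, then append only the
    calls not already present in the log (duplicate prevention)."""
    for call in turn_calls:
        call["agent_type"] = agent_type
    new_calls = [c for c in turn_calls if c not in task_log.tool_calls_data]
    task_log.tool_calls_data.extend(new_calls)


log = TaskLog()
turn = [{"tool": "google_search", "args": {"q": "ATP rankings"}}]
record_turn(log, turn, "main_agent")
record_turn(log, turn, "main_agent")  # replaying the same turn is a no-op
print(len(log.tool_calls_data))  # only one entry survives
```

&lt;p&gt;Because the log lives on the tracer rather than in a per-turn local variable, the full tool-call history survives every round and can be surfaced to users afterwards.&lt;/p&gt;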

&lt;p&gt;These changes have transformed MiroFlow into a more transparent prediction engine, allowing applications like askpaul.ai to display comprehensive data source information to users. This not only enhances user trust but also provides valuable insights into how predictions are formulated.&lt;/p&gt;

&lt;p&gt;By making these modifications open-source, we hope to contribute to the MiroMindAI community and help other developers building transparent AI-powered applications.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Deploying MiroThinker for AI-Powered Predictions in askpaul.ai</title>
      <dc:creator>AskPaul</dc:creator>
      <pubDate>Fri, 26 Sep 2025 11:54:44 +0000</pubDate>
      <link>https://forem.com/askpaul/deploying-mirothinker-for-ai-powered-predictions-in-askpaulai-2g3k</link>
      <guid>https://forem.com/askpaul/deploying-mirothinker-for-ai-powered-predictions-in-askpaulai-2g3k</guid>
      <description>&lt;p&gt;As we develop askpaul.ai, a prediction market application requiring accurate AI-powered event outcome forecasting, we needed a reliable and high-performance model to power our prediction engine. After evaluating several options, we chose MiroMind's MiroThinker model for its exceptional predictive capabilities. This blog outlines our deployment process on a CentOS system with GPU acceleration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure Setup
&lt;/h2&gt;

&lt;p&gt;Our deployment environment consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CentOS 8.3 operating system&lt;/li&gt;
&lt;li&gt;NVIDIA H20 GPU for accelerated computing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Python 3.12 Installation
&lt;/h3&gt;

&lt;p&gt;We started by installing Python 3.12, which provides the necessary runtime environment for our application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Installation commands for Python 3.12 on CentOS 8.3&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; gcc openssl-devel bzip2-devel libffi-devel
wget https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tgz
&lt;span class="nb"&gt;tar &lt;/span&gt;xzf Python-3.12.0.tgz
&lt;span class="nb"&gt;cd &lt;/span&gt;Python-3.12.0
./configure &lt;span class="nt"&gt;--enable-optimizations&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;make altinstall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. NVCC Installation
&lt;/h3&gt;

&lt;p&gt;To leverage GPU acceleration, we installed the NVIDIA CUDA Compiler (nvcc):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install CUDA toolkit containing nvcc&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf config-manager &lt;span class="nt"&gt;--add-repo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
&lt;span class="nb"&gt;sudo &lt;/span&gt;dnf &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; cuda-toolkit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note about CentOS 8.3 package management&lt;/strong&gt;: CentOS 8 and later versions use &lt;code&gt;dnf&lt;/code&gt; as the default package manager, which is an enhanced version of &lt;code&gt;yum&lt;/code&gt;. While &lt;code&gt;yum&lt;/code&gt; commands still work as aliases to &lt;code&gt;dnf&lt;/code&gt;, we recommend using &lt;code&gt;dnf&lt;/code&gt; directly for better performance and dependency resolution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Required Dependencies
&lt;/h2&gt;

&lt;p&gt;We installed the following Python packages to ensure proper functionality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3.12 &lt;span class="nb"&gt;install &lt;/span&gt;sglang pybase64 pydantic orjson uvicorn uvloop fastapi torch psutil zmq packaging Pillow openai partial_json_parser huggingface_hub transformers sentencepiece sgl_kernel dill compressed_tensors einops msgspec python-multipart pynvml torchao xgrammar openai_harmony
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These packages provide essential functionality including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web serving capabilities (uvicorn, fastapi)&lt;/li&gt;
&lt;li&gt;GPU-accelerated tensor operations (torch, torchao)&lt;/li&gt;
&lt;li&gt;Model management and inference (sglang, transformers, huggingface_hub)&lt;/li&gt;
&lt;li&gt;Data processing and serialization (orjson, msgspec, pybase64)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying MiroThinker
&lt;/h2&gt;

&lt;p&gt;With all prerequisites in place, we deployed the MiroThinker-32B-DPO-v0.2 model using sglang's server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;nohup &lt;/span&gt;python3.12 &lt;span class="nt"&gt;-m&lt;/span&gt; sglang.launch_server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model-path&lt;/span&gt; miromind-ai/MiroThinker-32B-DPO-v0.2 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--tp&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--dp&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--port&lt;/span&gt; 6666 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--chat-template&lt;/span&gt; qwen3_nonthinking.jinja &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; miromind.log &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command starts the server in the background with &lt;code&gt;nohup&lt;/code&gt;, ensuring it continues running even after logout. The model is deployed with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tensor parallelism (tp) set to 1&lt;/li&gt;
&lt;li&gt;Data parallelism (dp) set to 1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings are appropriate for our single GPU setup.&lt;/p&gt;

&lt;p&gt;For the nonthinking mode required by our prediction use case, we used the specialized template available at:&lt;br&gt;
&lt;a href="https://github.com/MiroMindAI/MiroThinker/blob/main/assets/qwen3_nonthinking.jinja" rel="noopener noreferrer"&gt;https://github.com/MiroMindAI/MiroThinker/blob/main/assets/qwen3_nonthinking.jinja&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deploying MiroThinker on our CentOS 8.3 system with an H20 GPU has significantly enhanced the prediction capabilities of askpaul.ai. The model's performance meets our expectations for accuracy and response time, making it an excellent fit for our prediction market application.&lt;/p&gt;

&lt;p&gt;The sglang framework provided a straightforward deployment path, and the MiroThinker model has proven to be reliable and efficient in our production environment. We're excited to continue leveraging this powerful combination as we expand the capabilities of askpaul.ai.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
