Forem: Amito Vrito

SynapseKit - A Production-Grade LLM Framework Built for Speed, Simplicity, and Scale

Amito Vrito — Thu, 23 Apr 2026 07:39:11 +0000

*https://github.com/SynapseKit/SynapseKit
https://synapsekit.github.io/synapsekit-docs/
*

SynapseKit is an async-first Python framework for building LLM applications - chains, agents, RAG pipelines, tool calling, and multi-agent orchestration. Two base dependencies. 48 built-in tools. 31 LLM providers. Designed for engineers who need production-grade tooling without production-grade complexity.

"The right abstraction disappears. You stop thinking about the framework and start thinking about the problem."

What SynapseKit Is
SynapseKit is an open-source Python framework for building applications powered by large language models. It covers the full surface area - from a single LLM call to multi-agent orchestration with cost guardrails - with a design philosophy that prioritizes speed, debuggability, and minimal abstraction.

The core principle: every layer of abstraction must earn its place by making the engineer faster, not by making the framework more flexible.

What ships in the box:

31 LLM providers - OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, and 25 more. Switch providers by changing one string.
48 built-in tools - 12 work with zero configuration. No pip install, no API key, no setup.
43 document loaders - PDF, HTML, CSV, JSON, Markdown, DOCX, and more. Standardized interface across all formats.
Multi-agent primitives - Sequential, parallel, supervisor, hierarchical, pipeline, and feedback loop patterns. All six supported out of the box.
MCP server support - Model Context Protocol integration for tool-rich agent deployments.
Cost guardrails - Built into the execution engine. Set a budget, the agent stops cleanly instead of burning your API credits.

Design Philosophy
Two Dependencies
SynapseKit's base install pulls two packages. Not 67. Not 43. Two.

SynapseKit: 2
· 48 MB RA
M · 80ms cold start
LangChain: 67 dependencies · 189 MB RAM · 2,400ms cold start
LlamaIndex: 43 dependencies · 112 MB RAM · 1,100ms cold start

Fewer dependencies means fewer version conflicts, faster installs, smaller container images, and cold starts that don't punish your users. In serverless deployments where every scale-from-zero event pays the cold start tax, 80ms vs 2.4 seconds is the difference between responsive and broken.

Async From the Ground Up
Every base class - BaseTool, BaseRetriever, BaseLLM - is async def by default. Not sync with an async wrapper bolted on. Not run_in_executor hiding a blocking call.

This matters because async correctness propagates. When the base class is async, every implementation is async. Contributors don't accidentally write sync tools. The framework never silently dispatches to a thread pool. At 50 concurrent requests, SynapseKit achieves 96.8% of theoretical throughput - near-baseline async efficiency.

Shallow Call Stacks
When something fails at 3am in production, the traceback is 8 lines, not 47. The agent loop is 47 lines of readable Python. No RunnableSequence.call chains, no middleware dispatch, no callback manager traversal. You read the error, you find the bug, you fix it.

One Tool Interface
Define a tool once with a JSON schema. Export to OpenAI format with .schema(). Export to Anthropic format with .anthropic_schema(). Same source of truth, zero duplication. One definition that works across all 31 providers.

What You Can Build
RAG Pipelines
from synapsekit import LLM, RAGPipeline, PDFLoader

docs = PDFLoader("reports/").load()
rag = RAGPipeline(docs=docs, llm=LLM("openai/gpt-4o"))
rag.build()

answer = await rag.query("What were Q3 revenue figures?")

Seven lines. Load, build, query. Chunking, embedding, indexing, retrieval, and generation - all handled. Switch to Anthropic by changing "openai/gpt-4o" to "anthropic/claude-sonnet-4-20250514". Nothing else changes.

Agents with Tools
Built-in tools for calculation, datetime, web search, file operations, and more. Define custom tools with a class and a JSON schema. The agent loop handles reasoning, tool selection, execution, and observation routing.

Multi-Agent Orchestration
The Crew and Task primitives support six orchestration patterns. Declare dependencies between tasks, not between agents. The framework handles execution order, context passing, and result aggregation.

from synapsekit import Crew, Task, Agent

researcher = Agent(name="researcher", tools=[search_tool])
writer = Agent(name="writer", tools=[])

research_task = Task(agent=researcher, description="Find latest data on X")
write_task = Task(agent=writer, description="Write report", context_from=[research_task])

crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = await crew.run()

Streaming
async for token in llm.stream("Explain quantum computing"):
print(token, end="")

First-class streaming with the cleanest API across any framework. No callback handlers, no special configuration.

Where SynapseKit Fits
SynapseKit is built for a specific engineer: the one building LLM-powered products that need to work reliably in production, not just in a notebook demo.

Use SynapseKit when:

You need fast cold starts (serverless, edge, CLI tools)
You want minimal dependency footprint in containerized deployments
You're building agent-heavy applications with multiple tools
You need to switch between LLM providers without rewriting code
You want cost controls built into the execution layer
Consider alternatives when:

You need LlamaIndex's advanced chunking strategies (SemanticSplitterNodeParser, KnowledgeGraphIndex)
You need LangChain's ecosystem breadth and community integrations
You need LangChain's ToolException error recovery pattern for complex agent loops
We publish these tradeoffs openly. The 30-notebook LLM Framework Showdown on Kaggle benchmarks SynapseKit against LangChain and LlamaIndex across 18 production dimensions - including the dimensions where SynapseKit loses. Honest benchmarking means publishing the uncomfortable numbers too.

The Vision
LLM frameworks today are where web frameworks were in 2010. Too many abstractions solving for flexibility instead of velocity. Too much ceremony for simple operations. Too many dependencies for production deployments.

SynapseKit is a bet on a different direction: that the best framework is the one that disappears. You think about your application logic, not about the framework's internal architecture. You debug your code, not the framework's middleware. You deploy with confidence because you understand every line between your function call and the LLM API.

The roadmap:

Evaluation harness - standardized benchmarks you can run against your own agents
Visual debugger - trace agent execution, tool calls, and token usage in real time
Plugin marketplace - community tools and integrations with a single install command
Enterprise features - audit logging, role-based access, deployment presets for AWS/GCP/Azure
SynapseKit is MIT-licensed, fully open source, and built in the open. Every design decision is documented. Every benchmark is reproducible. Every line of code is readable.

Get Started
pip install synapsekit

GitHub: github.com/SynapseKit/SynapseKit
Benchmarks: LLM Framework Showdown on Kaggle
Documentation: Ships with the package
Two dependencies. One pip install. Start building.

Engineers of AI

Why I Modelled My LLM Pipeline as a DAG Instead of a Chain — and What I Found Out

Amito Vrito — Thu, 16 Apr 2026 09:27:58 +0000

The problem with chains in production

Every major Python LLM framework gives you the same primitive: a chain.

LangChain's LCEL. LlamaIndex's pipeline. Haystack's components. They all model your pipeline as a linear sequence of steps — input flows through A, then B, then C, output comes out the end.

For a hello-world RAG demo, that's fine. For a production system, you hit the wall fast.

What chains can't express cleanly

Here's a real pipeline I needed to build:

Classify the incoming query
Based on classification: route to either semantic search, keyword search, or both in parallel
If both: merge and re-rank results
Generate response from ranked context
If any retrieval stage fails: surface a clear error, don't silently continue

Try expressing that as a chain. You end up with nested chains, manual asyncio.gather() calls outside the framework, try/except blocks swallowing exceptions to keep the chain going, and no clean way to express "step D depends on both B and C."

The abstraction is fighting you.

DAGs are the right model

A directed acyclic graph expresses all of this naturally.

Nodes are tasks. Edges are data dependencies. Execution is topologically ordered — a node fires when all its upstream dependencies have resolved.

from synapsekit import Pipeline

pipeline = Pipeline()
pipeline.add_node("classify", ClassifierNode())
pipeline.add_node("semantic_search", RAGNode(store=vector_store))
pipeline.add_node("keyword_search", BM25Node(index=bm25_index))
pipeline.add_node("rerank", RerankerNode())
pipeline.add_node("generate", LLMNode(model="gpt-4o"))

pipeline.add_edge("classify", "semantic_search")
pipeline.add_edge("classify", "keyword_search")
pipeline.add_edge("semantic_search", "rerank")
pipeline.add_edge("keyword_search", "rerank")
pipeline.add_edge("rerank", "generate")

result = await pipeline.run(query="explain async-native design")

semantic_search and keyword_search run concurrently. rerank waits for both. generate waits for rerank. The execution engine handles ordering. You describe the dependencies.

Failure propagation that actually works

In a chain, a failed step either kills the whole pipeline or gets swallowed silently.

In a DAG, failure has semantics. If semantic_search fails, rerank — which depends on it — is cancelled. generate — which depends on rerank — is also cancelled. You get a clear error naming the failed node and its dependents.

No silent degradation. Failure is explicit and traceable.

The async piece

Nodes with no dependency relationship between them run concurrently automatically. The execution engine handles asyncio.gather() at each topological level. You write individual async node functions. The graph handles orchestration.

class SemanticSearchNode(Node):
    async def execute(self, inputs):
        results = await self.vector_store.asearch(inputs["query"])
        return {"results": results}

No manual gather calls. The graph structure encodes the parallelism.

Is it worth the complexity?

For simple A → B → C pipelines with no branching: a chain is fine.

The moment you have parallel retrieval, conditional routing, or stages where failure isolation matters — the chain abstraction costs more than it saves.

SynapseKit is the framework I built around this model:
https://github.com/SynapseKit/SynapseKit
API Doc: https://synapsekit.github.io/synapsekit-docs/

10k PyPI downloads since launch. The engineers who need this know exactly why.

What does your production RAG architecture look like? Drop it in the comments.

Introducing SynapseKit: The Async-Native Python LLM Framework I Built Because LangChain's Async is Broken

Amito Vrito — Wed, 15 Apr 2026 11:44:29 +0000

Article:

Why I built another LLM framework

I know. Another one.

But hear me out — because the reason I built SynapseKit is specific, and it might be the same reason you've been frustrated too.

The async lie in Python LLM frameworks

LangChain has async support. LlamaIndex has async support. Haystack has async support.

Technically true. Practically — look at the source.

You'll find asyncio.get_event_loop().run_in_executor() wrapping sync functions. You'll find internal blocking IO disguised behind async def. You'll find ThreadPoolExecutor doing the actual work.

That's not async-native. That's sync code wearing an async costume.

For simple scripts and demos, it doesn't matter. For production services handling concurrent requests — FastAPI services, real-time RAG systems, high-throughput agent workflows — it matters enormously. You're paying the cost of threads AND the overhead of the async event loop, with none of the actual throughput benefits.

What SynapseKit does differently

I built the async layer first. Every IO operation — LLM calls, retrieval, embedding generation — is genuinely non-blocking from the ground up. There's no sync wrapper underneath.

import asyncio
from synapsekit import Pipeline, RAGNode, LLMNode

async def main():
    pipeline = Pipeline()
    pipeline.add_node("retrieve", RAGNode(vectorstore=my_store))
    pipeline.add_node("generate", LLMNode(model="gpt-4o"))
    pipeline.add_edge("retrieve", "generate")

    result = await pipeline.run(query="What is async-native design?")
    print(result)

asyncio.run(main())

Notice: no .run_in_executor(). No thread pool. Just async.

DAGs, not chains

The second architectural decision: pipelines are directed acyclic graphs, not linear chains.

Every major framework pushes you toward .pipe() or | operator chains. That works for the happy path. Production systems aren't the happy path.

In a real RAG system you might:

Retrieve from multiple sources in parallel
Route to different generation strategies based on query classification
Have fallback paths when a retrieval stage fails
Run a re-ranking step that depends on two upstream retrievers

A chain can't express that cleanly. A DAG can.

pipeline.add_node("classify", ClassifierNode())
pipeline.add_node("retrieve_docs", RAGNode(vectorstore=doc_store))
pipeline.add_node("retrieve_web", WebSearchNode())
pipeline.add_node("rerank", RerankerNode())
pipeline.add_node("generate", LLMNode())

pipeline.add_edge("classify", "retrieve_docs")
pipeline.add_edge("classify", "retrieve_web")
pipeline.add_edge("retrieve_docs", "rerank")
pipeline.add_edge("retrieve_web", "rerank")
pipeline.add_edge("rerank", "generate")

Both retrieval nodes run concurrently. The reranker waits for both. The LLM waits for the reranker. Topological ordering handles execution automatically.

The numbers

~10,000 PyPI downloads in ~20 days of active development. No Product Hunt. No HN. No launch post.

Developers found it through PyPI search. That told me the demand is real.

Try it

pip install synapsekit

GitHub: https://github.com/SynapseKit/SynapseKit

If you've been frustrated with async in your LLM stack — or you're building something where throughput actually matters — I'd genuinely love your feedback.

And if this resonates, a GitHub star helps surface it to other developers who are hitting the same walls.

SynapseKit is open source under Apache license. Built by, Senior AI Specialists and founder of EngineersOfAI.