Forem: Geri Máté

We're running our first hackathon: Build with VectorAI DB, win Claude subscriptions

Geri Máté — Thu, 09 Apr 2026 09:39:39 +0000

The Actian VectorAI DB Build Challenge is our first community hackathon, and we want to see what you build. Solo or team, beginner or experienced, local or cloud. If you've been looking for a reason to actually ship something with a vector database, this is it.

April 13-18, 2026 | Virtual | Register on DoraHacks

What you're building

An AI application that solves a real, tangible problem using Actian VectorAI DB. It can run on your laptop, on a server, in the cloud, wherever. The only rule: VectorAI DB has to be a core part of your stack, not something you bolted on at the end.

Your project also needs to go beyond basic similarity search. Pick at least one of these:

Hybrid Fusion - combine multiple search signals into one ranked result. Not just meaning, not just keywords. Both, fused together.

What that looks like in practice: A job board that ranks candidates by semantic fit ("backend engineer who gets distributed systems") AND keyword match ("Golang, Kubernetes") merged into one list using RRF or DBSF.

Filtered Search - pair vector search with structured filters on your data so results are actually useful, not just semantically close.

What that looks like in practice: A campus event finder that understands what you're looking for but also filters by date, location, and student org. So you're finding events you can go to, not just events that sound similar.

Named Vectors / Multimodal - store and search across different data types in the same collection. Text, images, audio, whatever fits your idea.

What that looks like in practice: A study tool where you search your notes by typing a question or uploading a diagram. Both hit the same knowledge base, just through different vector spaces.
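
For the Hybrid Fusion option, here is a minimal sketch of Reciprocal Rank Fusion (RRF) in plain Python. It assumes you already have two ranked lists of IDs, one from vector search and one from keyword search. The candidate IDs and the constant k=60 (a commonly used default) are illustrative, and this is not the VectorAI DB client API, which may well offer fusion out of the box.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["cand_7", "cand_2", "cand_9"]  # ranked by embedding similarity
keyword_hits = ["cand_2", "cand_5", "cand_7"]   # ranked by keyword match
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Candidates that appear near the top of both lists end up first, which is the behaviour the job board example relies on.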

Bonus points for running locally, on ARM, or offline. No fixed weight, judges' call.

Not sure what to build?

Some starting points, but don't let these limit you:

A RAG app over any dataset you actually care about (research papers, course notes, documentation, news)
A semantic search tool with smart filters (campus events, job listings, study materials)
A recommendation engine that combines meaning and metadata
An anomaly detection or monitoring system
An AI agent with vector-powered memory
A multimodal search tool across text and images
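
To make the "semantic search with smart filters" idea concrete, here's a rough in-memory sketch of the filter-then-rank pattern. The toy 3-dimensional vectors, the `org` and `date` fields, and the brute-force scan are assumptions for illustration only. In a real project your embedding model produces the vectors and VectorAI DB does the filtering and search, so check the repo README for the actual client calls.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec: Sequence[float], items: List[Dict],
                    filters: Dict[str, str], top_k: int = 3) -> List[Dict]:
    # Keep only items whose metadata matches every filter, then rank by similarity
    candidates = [
        item for item in items
        if all(item["meta"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return candidates[:top_k]


events = [
    {"id": "hack-night", "vector": [0.9, 0.1, 0.0], "meta": {"org": "cs-club", "date": "2026-04-14"}},
    {"id": "ml-talk", "vector": [0.8, 0.2, 0.1], "meta": {"org": "ai-society", "date": "2026-04-14"}},
    {"id": "bake-sale", "vector": [0.0, 0.1, 0.9], "meta": {"org": "cs-club", "date": "2026-04-14"}},
]
results = filtered_search([1.0, 0.0, 0.0], events, {"org": "cs-club"})
print([r["id"] for r in results])
```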

Getting started

The database runs in Docker and works natively on Mac (including Apple Silicon), Linux, and Windows. No Rosetta, no platform flags needed.

# From the root of the cloned repo, start the database
docker compose up

# Install the Python client
pip install actian-vectorai

Not sure where to begin? Start with the featured RAG example:

pip install -r examples/rag/requirements.txt
python examples/rag/rag_example.py

It walks you through building a complete retrieval-augmented generation app from scratch. You'll have something running in under 10 minutes.

VectorAI DB handles storage and search. You bring your own embedding model. A good default to start with is sentence-transformers/all-MiniLM-L6-v2: it's fast, lightweight, and works well for most text use cases.

pip install sentence-transformers

For the full API docs and more examples, check the repo README linked in Discord.

Prizes

🥇 1st place team: Claude Max 5x, 3 months per person

🥈 2nd place team: Claude Max 5x, 1 month per person

🥉 3rd place team: Claude Pro, 1 month per person

Teams of up to 4. Solo submissions welcome.

How we judge

Use of Actian VectorAI DB (30%): Is VectorAI DB doing real work in this app? Does the team know why they used it the way they did?
Real-world impact (25%): Does it solve something people actually care about? Would someone use this?
Technical execution (25%): Does it work? Is the code coherent and the architecture thought through?
Demo and presentation (20%): Can you explain what you built and why it matters?

How to submit

All submissions go through DoraHacks. You'll need a public GitHub or GitLab repo with a README, a working demo (video, Loom, or live link), and a short write-up covering what you built, why, and which technical requirement you used.

Results announced April 20 on Discord.

Join us

Discord for support, team formation, and progress sharing: discord.gg/432A2M63Py

Drop a comment if you're in. See you April 13.

Building Your First AI Agent Without Frameworks

Geri Máté — Fri, 13 Jun 2025 10:50:56 +0000

Want to understand how AI agents actually work? Let's build one from scratch before jumping into frameworks.

Most AI agent tutorials start with LangGraph or CrewAI, which are great tools, but they can make it hard to understand what's happening underneath.

An agent is really just a language model that can call functions. Once you understand that, frameworks make way more sense.

Today we're building a customer support system using OpenAI's API and Python. This will give you the fundamentals that make any agent framework easier to use and debug.

What we're building:

A routing system that decides which "specialist" handles each query
Function-calling agents that can search FAQs and analyze sentiment
Simple state management to track conversations
Logic to escalate to humans when needed

By the end, you'll understand how agents work under the hood, making you much more effective when you do use frameworks.

An Agent is Just an LLM with Tools

Seriously, that's all there is to it:

Language model with a specific job
Functions it can call
Logic to decide when to use them

Everything else is just orchestration.

Let's start with the simplest possible agent:

import json
import os
from typing import Callable, List

import openai

# Set up OpenAI (get your API key from https://platform.openai.com/api-keys)
openai.api_key = os.getenv("OPENAI_API_KEY")

class SimpleAgent:
    def __init__(self, name: str, role: str, tools: List[Callable[[str], str]]):
        self.name = name
        self.role = role
        self.tools = {tool.__name__: tool for tool in tools}

    def respond(self, message: str) -> str:
        # Create tool descriptions for the model
        tool_descriptions = []
        for name, func in self.tools.items():
            tool_descriptions.append({
                "type": "function",
                "function": {
                    "name": name,
                    "description": func.__doc__ or f"Function {name}",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "The input query"}
                        },
                        "required": ["query"]
                    }
                }
            })

        # Call OpenAI with function calling
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": self.role},
                {"role": "user", "content": message}
            ],
            tools=tool_descriptions,
            tool_choice="auto"  # Let the model decide when to use tools
        )

        # Handle function calls
        if response.choices[0].message.tool_calls:
            tool_call = response.choices[0].message.tool_calls[0]
            function_name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)

            # Execute the function
            if function_name in self.tools:
                result = self.tools[function_name](arguments["query"])
                return f"{self.name}: {result}"

        # Regular response if no function call
        return f"{self.name}: {response.choices[0].message.content}"

# Test it out
def search_faq(query: str) -> str:
    """Search the FAQ database for answers"""
    faqs = {
        "shipping": "Standard shipping takes 3-5 business days",
        "refund": "Refunds processed within 5-7 business days",
        "return": "Returns accepted within 30 days"
    }

    for topic, answer in faqs.items():
        if topic in query.lower():
            return answer
    return "No FAQ found for that topic"

# Create an FAQ agent
faq_agent = SimpleAgent(
    name="FAQ Assistant",
    role="You're a helpful FAQ assistant. Use the search_faq function to find answers to customer questions.",
    tools=[search_faq]
)

# Test it
print(faq_agent.respond("How long does shipping take?"))
# Typical output (assuming the model calls search_faq with a query that mentions shipping):
# FAQ Assistant: Standard shipping takes 3-5 business days

Done. You just built an AI agent. It understands questions, knows when to use its tool, and gives helpful answers.
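
One limitation worth knowing about: SimpleAgent hard-codes a single `query` string parameter for every tool. If your tools take different arguments, one option is to derive the schema from the function signature. The sketch below uses only the standard library. The `build_tool_schema` helper, the `lookup_order` tool, and the type mapping are assumptions for illustration, not part of the OpenAI SDK.

```python
import inspect
from typing import Any, Callable, Dict

# Map Python annotations to JSON Schema type names (anything else falls back to "string")
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style tool description from a function's signature."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the model must supply it
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or f"Function {func.__name__}",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def lookup_order(order_id: int, include_items: bool = False) -> str:
    """Look up an order by its numeric ID"""
    return f"Order {order_id}"


schema = build_tool_schema(lookup_order)
print(schema["function"]["parameters"])
```

With schemas like this, the dispatch line in `respond` would become `self.tools[function_name](**arguments)` instead of reading `arguments["query"]`.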

Adding More Specialists

Now let's add agents that handle other kinds of messages:

def analyze_sentiment(query: str) -> str:
    """Analyze the emotional tone of customer messages"""
    # Simple keyword approach - you could use Transformers (https://huggingface.co/docs/transformers/index) for a real sentiment model
    negative_words = ["angry", "frustrated", "terrible", "awful", "hate"]
    urgent_words = ["urgent", "immediately", "asap", "emergency"]

    query_lower = query.lower()

    if any(word in query_lower for word in urgent_words):
        return "URGENT: Customer needs immediate attention"
    elif any(word in query_lower for word in negative_words):
        return "NEGATIVE: Customer is frustrated, handle with care"
    else:
        return "NEUTRAL: Standard response appropriate"

def check_escalation_needed(query: str) -> str:
    """Determine if human escalation is needed"""
    escalation_triggers = [
        "speak to manager", "cancel account", "legal action", 
        "complaint", "lawsuit", "terrible service"
    ]

    if any(trigger in query.lower() for trigger in escalation_triggers):
        return "ESCALATE: Route to human agent immediately"
    else:
        return "CONTINUE: AI agent can handle this query"

# Create specialized agents
sentiment_agent = SimpleAgent(
    name="Sentiment Analyzer",
    role="You analyze customer emotions. Use analyze_sentiment to understand how the customer is feeling.",
    tools=[analyze_sentiment]
)

escalation_agent = SimpleAgent(
    name="Escalation Manager", 
    role="You decide when customers need human help. Use check_escalation_needed to evaluate queries.",
    tools=[check_escalation_needed]
)

The Router: Deciding Who Handles What

Here's where it gets interesting - we need something to decide which agent handles each message:

class AgentRouter:
    def __init__(self):
        self.agents = {
            "faq": faq_agent,
            "sentiment": sentiment_agent,
            "escalation": escalation_agent
        }
        self.conversation_history = []

    def route_query(self, query: str) -> str:
        """Decide which agent should handle this query"""

        # Save the conversation
        self.conversation_history.append({"role": "user", "content": query})

        # Basic routing - you could make this way smarter
        query_lower = query.lower()

        # Check for escalation triggers first
        if any(word in query_lower for word in ["manager", "complaint", "cancel", "lawsuit"]):
            agent_name = "escalation"
        # Check for emotional language
        elif any(word in query_lower for word in ["angry", "frustrated", "urgent", "terrible"]):
            agent_name = "sentiment"
        # Default to FAQ for standard questions
        else:
            agent_name = "faq"

        # Get response from the right agent
        agent = self.agents[agent_name]
        response = agent.respond(query)

        # Save that too
        self.conversation_history.append({"role": "assistant", "content": response})

        return f"[Routed to {agent_name.upper()}]\n{response}"

    def get_conversation_summary(self) -> str:
        """Get a summary of the conversation so far"""
        if not self.conversation_history:
            return "No conversation yet"

        summary = f"Conversation with {len(self.conversation_history)//2} exchanges:\n"
        for msg in self.conversation_history[-4:]:  # Last 2 exchanges
            role = "Customer" if msg["role"] == "user" else "Agent"
            summary += f"{role}: {msg['content']}\n"

        return summary

# Test the complete system
router = AgentRouter()

print("=== Customer Support Agent System ===\n")

# Test different types of queries
test_queries = [
    "How long does shipping take?",
    "I'm really frustrated with this terrible service!",
    "I want to speak to your manager right now!",
    "What's your return policy?"
]

for query in test_queries:
    print(f"Customer: {query}")
    response = router.route_query(query)
    print(f"{response}\n")

print("Conversation Summary:")
print(router.get_conversation_summary())

Making It Smarter: Let the AI Do the Routing

Keyword matching works, but we can do better. Let's use the LLM itself to make routing decisions:

class SmartRouter:
    def __init__(self):
        self.agents = {
            "faq": faq_agent,
            "sentiment": sentiment_agent, 
            "escalation": escalation_agent
        }
        self.conversation_history = []

    def smart_route(self, query: str) -> str:
        """Use AI to decide which agent should handle the query"""

        routing_prompt = f"""You're routing customer queries to specialists.

        Options:
        - faq: Standard questions about policies, shipping, returns
        - sentiment: Upset or frustrated customers  
        - escalation: Complex complaints or requests for managers

        Customer: "{query}"

        Which specialist? Just answer: faq, sentiment, or escalation"""

        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": routing_prompt}],
            temperature=0
        )

        agent_choice = response.choices[0].message.content.strip().lower()

        # Default to FAQ if something weird happens
        if agent_choice not in self.agents:
            agent_choice = "faq"

        # Get response from chosen agent
        agent_response = self.agents[agent_choice].respond(query)

        return f"[Smart routed to {agent_choice.upper()}]\n{agent_response}"

# Test smart routing
smart_router = SmartRouter()

print("=== Smart Routing Test ===\n")

smart_test_queries = [
    "My package is late and I'm getting married tomorrow!",
    "Do you accept international credit cards?", 
    "This is absolutely ridiculous, I want my money back immediately!",
    "Can I return something I bought 3 weeks ago?"
]

for query in smart_test_queries:
    print(f"Customer: {query}")
    response = smart_router.smart_route(query)
    print(f"{response}\n")
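
One fragile spot in the smart router: the check `agent_choice not in self.agents` silently falls back to FAQ whenever the model adds punctuation or extra words (for example `Escalation.`). A slightly more forgiving parser, as a sketch (the `parse_agent_choice` name is hypothetical):

```python
import re
from typing import Iterable, Optional


def parse_agent_choice(raw: Optional[str], valid: Iterable[str], default: str = "faq") -> str:
    """Return the first valid agent name mentioned in the model's reply."""
    valid_set = set(valid)
    # Split the reply into lowercase words, ignoring punctuation and whitespace
    for word in re.findall(r"[a-z]+", (raw or "").lower()):
        if word in valid_set:
            return word
    return default


agents = ["faq", "sentiment", "escalation"]
print(parse_agent_choice("Escalation.", agents))
print(parse_agent_choice("I would pick: sentiment", agents))
print(parse_agent_choice("not sure", agents))
```

In `smart_route`, this would replace the `strip().lower()` line and the membership check that follows it.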

Adding Memory: Making Conversations Actually Work

Real support conversations build on what happened before. Here's how to add memory:

from datetime import datetime, timezone

class MemoryAwareRouter:
    def __init__(self):
        self.agents = {
            "faq": faq_agent,
            "sentiment": sentiment_agent,
            "escalation": escalation_agent
        }
        self.conversation_memory = []
        self.customer_context = {
            "sentiment_history": [],
            "escalated": False,
            "resolved_issues": []
        }

    def process_with_memory(self, query: str) -> str:
        """Process query with full conversation context"""

        # Build context from earlier messages first, so the current message
        # isn't counted as history and "New conversation" can actually occur
        context = self._build_context()

        # Save current message
        self.conversation_memory.append({
            "role": "user",
            "content": query,
            "timestamp": datetime.now(timezone.utc).isoformat()
        })

        routing_prompt = f"""Previous conversation context:
        {context}

        Current message: "{query}"

        Which specialist should handle this?
        - faq: Standard questions
        - sentiment: Emotional customers
        - escalation: Complex issues or if already escalated

        Consider the conversation history. Answer: faq, sentiment, or escalation"""

        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": routing_prompt}],
            temperature=0
        )

        agent_choice = response.choices[0].message.content.strip().lower()
        if agent_choice not in self.agents:
            agent_choice = "faq"

        # Update customer context based on routing
        if agent_choice == "sentiment":
            self.customer_context["sentiment_history"].append("negative")
        elif agent_choice == "escalation":
            self.customer_context["escalated"] = True

        # Get enhanced response with context
        agent_response = self._get_contextual_response(agent_choice, query)

        # Add to memory
        self.conversation_memory.append({
            "role": "assistant", 
            "content": agent_response,
            "agent": agent_choice
        })

        return f"[Contextual routing to {agent_choice.upper()}]\n{agent_response}"

    def _build_context(self) -> str:
        """Build conversation context summary"""
        if not self.conversation_memory:
            return "New conversation"

        context = f"Conversation history: {len(self.conversation_memory)} messages\n"
        context += f"Customer escalated: {self.customer_context['escalated']}\n"
        context += f"Negative sentiment detected: {len(self.customer_context['sentiment_history'])} times\n"

        # Include last few exchanges
        recent = self.conversation_memory[-4:]
        for msg in recent:
            role = "Customer" if msg["role"] == "user" else f"Agent ({msg.get('agent', 'unknown')})"
            context += f"{role}: {msg['content'][:100]}...\n"

        return context

    def _get_contextual_response(self, agent_name: str, query: str) -> str:
        """Get response with conversation context"""
        agent = self.agents[agent_name]

        # Add context to the agent's response
        if self.customer_context["escalated"] and agent_name != "escalation":
            prefix = "[Customer previously escalated] "
        elif len(self.customer_context["sentiment_history"]) > 1:
            prefix = "[Customer has been frustrated multiple times] "
        else:
            prefix = ""

        response = agent.respond(query)
        return prefix + response

# Test memory-aware system
memory_router = MemoryAwareRouter()

print("=== Memory-Aware Conversation ===\n")

conversation_flow = [
    "What's your return policy?",
    "That's not good enough, I'm really frustrated!",
    "I want to speak to someone who can actually help me!",
    "Fine, what information do you need for the return?"
]

for query in conversation_flow:
    print(f"Customer: {query}")
    response = memory_router.process_with_memory(query)
    print(f"{response}\n")

What You Actually Built

You just created a complete customer support system using basic Python and OpenAI. Here's what you learned:

The fundamentals:

✅ Agents = LLM + functions + routing logic
✅ Function calling lets agents take actions
✅ Smart routing decides who handles what
✅ State management keeps conversations coherent
✅ Memory makes agents context-aware

Why this approach:

You'll understand what frameworks actually do for you
Easier to debug when things go wrong
You can customize behavior exactly how you want
The same pattern works with any LLM provider that supports function calling
Good foundation before learning frameworks

Making It Production Ready

To actually deploy this, you'd need:

The basics:

Error handling (APIs fail)
Database for conversation storage
Rate limiting (prevent abuse)
Proper logging

The nice-to-haves:

Real sentiment analysis model
Integration with your FAQ database
Actual escalation to humans (Slack API, email, etc.)
Analytics on what's working
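
For the conversation storage item, here's a minimal sketch using SQLite from the standard library. The table layout and function names are assumptions. A production system would more likely use a managed database and proper schema migrations, but the shape of the code stays similar.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               conversation_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, conversation_id: str, role: str, content: str) -> None:
    # Parameterized query: never interpolate user text into SQL
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, conversation_id: str, limit: int = 4) -> List[Tuple[str, str]]:
    # Fetch the newest `limit` messages, then return them oldest first
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id DESC LIMIT ?",
        (conversation_id, limit),
    ).fetchall()
    return rows[::-1]


conn = open_store()
save_message(conn, "c1", "user", "What's your return policy?")
save_message(conn, "c1", "assistant", "Returns accepted within 30 days")
print(load_history(conn, "c1"))
```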

When frameworks make sense:
Now you understand what LangGraph, CrewAI, and AutoGen do - they handle the routing and orchestration you just built manually. They're great when:

You need complex multi-step workflows
You want pre-built integrations and tools
You're working on a team that benefits from standardized patterns
You need features like human-in-the-loop or advanced state management

The key is knowing when the abstraction helps versus when you need more control.

The Real Lesson

AI agents are organized LLMs with specific jobs and the ability to call functions. The "multi-agent" part is smart routing and state management.

Understanding these fundamentals makes you better at using any framework because you know what's happening underneath. Start here, then use frameworks when their features solve real problems you're facing.

Built something cool with this? I'd love to see what you made - drop it in the comments!

How to Prevent AI Agents From Breaking in Production

Geri Máté — Fri, 06 Jun 2025 12:21:12 +0000

Deploying AI agents in production is trickier than most teams expect. What works perfectly in development often becomes a reliability nightmare once real traffic hits.

After looking at incident reports, some clear patterns emerge. The same few issues keep causing the majority of production failures.

42% of AI agent failures come from hallucinated API calls, and another 23% are GPU memory leaks. These aren't edge cases - they're systematic problems that need systematic solutions.

Here's what's actually breaking and how to prevent it.

Common failure patterns

Hallucinated API calls

LLMs generate code that looks correct but calls non-existent methods or deprecated endpoints. Traditional validation tools miss this because the code is syntactically valid - it just references APIs that don't exist in your environment.

Teams often spend significant time debugging what appear to be infrastructure issues when the root cause is the AI making incorrect assumptions about available APIs.

GPU memory leaks

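A known vulnerability in AMD, Apple, and Qualcomm GPUs (LeftoverLocals) can expose roughly 180MB of leftover GPU memory per LLM query. Strictly speaking that is a data leak rather than memory exhaustion, but AI workloads also suffer from ordinary leaks, where GPU allocations pile up across inference cycles. In Kubernetes environments, this can cascade across pods and eventually crash entire nodes.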

Standard monitoring often doesn't catch this until resource exhaustion is already occurring.

Cascading failures

AI agents are more interconnected than typical microservices. A single malformed operation can stall agent threads for extended periods, and recovery processes often reset accumulated context, leading to broader system failures.

Insufficient observability

Most teams monitor traditional infrastructure metrics but lack visibility into AI-specific behavior like GPU utilization patterns, token consumption, and model performance degradation.

Practical solutions

Constrain API generation

Instead of relying on post-generation validation, limit what the LLM can suggest in the first place by providing explicit API context:

# Extract what's actually available
global_deps = extract_imports(codebase)
local_deps = parse_function_calls(current_module)

# Tell the LLM what it can actually use
prompt = f"""
Available APIs: {global_deps}
Local functions: {local_deps}
Task: {user_request}
"""

Teams using dependency-constrained prompting report fewer API hallucinations. The approach is straightforward: if you don't tell the LLM about APIs that don't exist, it's less likely to invent them.
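
`extract_imports` and `parse_function_calls` in the snippet above are left undefined. One way to fill them in is Python's `ast` module. This is a sketch under assumptions: `extract_defined_functions` and `build_prompt` are hypothetical names, it handles a single source string, and a real codebase would need to walk many files.

```python
import ast
from typing import List


def extract_imports(source: str) -> List[str]:
    """Return the sorted names of modules imported in the given source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return sorted(names)


def extract_defined_functions(source: str) -> List[str]:
    """Return the sorted names of functions defined in the given source code."""
    return sorted(
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )


def build_prompt(source: str, user_request: str) -> str:
    # List what exists so the model has less room to invent APIs
    return (
        f"Available APIs: {', '.join(extract_imports(source))}\n"
        f"Local functions: {', '.join(extract_defined_functions(source))}\n"
        f"Task: {user_request}\n"
        "Only call functions and modules listed above."
    )


module_source = '''
import json
from datetime import timedelta

def load_config(path):
    return json.load(open(path))
'''
print(build_prompt(module_source, "Add a function that validates the config"))
```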

Implement GPU resource controls

Set explicit resource limits in your container orchestration:

resources:
  limits:
    nvidia.com/gpu: 1
    memory: "4Gi"
  requests:
    memory: "4Gi"
    cpu: "2"

Monitor GPU memory usage and restart containers before they crash:

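#!/bin/bash
# Restart the deployment when memory use on GPU 0 crosses the threshold (MiB)
THRESHOLD_MIB=7372  # ~90% of 8 GiB (8192 MiB)
while true; do
  vram_usage=$(nvidia-smi --id=0 --query-gpu=memory.used --format=csv,noheader,nounits)
  if [ "$vram_usage" -gt "$THRESHOLD_MIB" ]; then
    kubectl rollout restart deployment/ai-agent
    # Wait for the rollout to finish so restarts don't fire back to back
    kubectl rollout status deployment/ai-agent
  fi
  sleep 30
done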

This type of proactive monitoring has reduced OOM crashes in production environments.

Version AI components as units

AI agents consist of multiple interdependent components: models, vector databases, prompt templates, and configuration. These should be versioned and deployed together:

# ai-agent-chart/Chart.yaml
dependencies:
  - name: llm-model
    version: "1.2.3"
  - name: vector-db
    version: "0.9.1"
  - name: prompt-templates
    version: "2.1.0"

Deploying the entire bundle as a unit prevents version mismatches that can cause subtle but significant failures.
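
A cheap safety net on top of bundled deploys is a startup check that compares what is actually running against the pinned versions. A minimal sketch, assuming you can collect the running versions into a dict (how you do that depends on your stack, and the function name is hypothetical):

```python
from typing import Dict, List

# Versions pinned in the release bundle (mirrors the chart dependencies above)
EXPECTED_BUNDLE = {"llm-model": "1.2.3", "vector-db": "0.9.1", "prompt-templates": "2.1.0"}


def find_version_mismatches(expected: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """Describe every component whose running version differs from the bundle."""
    problems = []
    for name, version in sorted(expected.items()):
        actual = running.get(name)
        if actual is None:
            problems.append(f"{name}: missing (expected {version})")
        elif actual != version:
            problems.append(f"{name}: running {actual}, expected {version}")
    return problems


running = {"llm-model": "1.2.3", "vector-db": "0.9.0"}
for problem in find_version_mismatches(EXPECTED_BUNDLE, running):
    print(problem)
```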

Add AI-specific monitoring

Traditional APM tools don't capture AI-specific metrics. You need to track GPU utilization, token consumption, and model performance alongside business outcomes. OpenTelemetry provides a good foundation for this:

from opentelemetry import trace
import time

tracer = trace.get_tracer(__name__)

def ai_inference(prompt, user_id):
    with tracer.start_as_current_span("ai_inference") as span:
        start_time = time.time()

        span.set_attribute("prompt.length", len(prompt))
        span.set_attribute("user.id", user_id)

        # model and count_tokens are placeholders for your inference client and tokenizer
        response = model.generate(prompt)

        span.set_attribute("response.length", len(response))
        span.set_attribute("inference.duration", time.time() - start_time)
        span.set_attribute("tokens.consumed", count_tokens(prompt + response))

        return response

Correlating these metrics with infrastructure data helps identify when GPU pressure affects response quality.

Build resilient fallback systems

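Wrap external API calls in retries with exponential backoff (a circuit breaker adds a further layer by pausing calls to a dependency that keeps failing):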

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_external_api(endpoint, payload):
    response = requests.post(endpoint, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()
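
The tenacity decorator above retries with backoff. A circuit breaker is a related but different pattern: after repeated failures it stops calling the dependency for a cool-down period, then lets a trial call through. A minimal single-threaded sketch, with hypothetical class names and thresholds (production code would add locking and per-endpoint state):

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, skipping call")
            # Cool-down elapsed: allow one trial call (half-open state);
            # a single failure now re-opens the circuit
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def flaky() -> None:
    raise ConnectionError("upstream down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError as exc:
        print("rejected:", exc)
    except ConnectionError as exc:
        print("failed:", exc)
```

The first two calls fail and trip the breaker, and the third is rejected without touching the upstream service.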

Have a clear escalation path when AI components fail:

def ai_with_fallback(user_request):
    try:
        return ai_agent.process(user_request)
    except AIAgentError:
        return rule_based_handler.process(user_request)
    except Exception:
        escalate_to_human(user_request)
        return "Request escalated to support team"

Making AI agents production-ready

AI agents in production require the same operational discipline as any other critical system. The difference is that they have unique failure modes that traditional monitoring and deployment practices don't address.

Teams that succeed treat AI agents as complex distributed systems with proper observability, resource management, and graceful degradation. The ones that struggle try to deploy them like traditional applications.

The good news is that once you address these systematic issues, AI agents become much more predictable and reliable in production environments.

Deploy AI Agents Without Infrastructure Headaches

Geri Máté — Fri, 30 May 2025 11:17:40 +0000

Platform engineers have a new nightmare: explaining to their CTO why the AI agent deployment that worked perfectly in staging is now burning through $50,000/month in production. The Terraform config looks flawless. The security groups are properly configured. The ECS tasks are healthy. But somehow, the vector database is choking on embeddings, the LLM gateway is routing traffic to the wrong regions, and the workflow orchestration is stuck in an infinite retry loop.

Traditional IaC tools weren't built for this complexity.

Traditional IaC Can't Handle AI Workloads

When ChatGPT generates your Terraform config, it looks perfect. But deploy it and everything breaks:

# This looks right but will fail in production
resource "aws_security_group" "ai_agent" {
  name = "ai-agent-sg"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # ❌ Too permissive
  }
}

resource "aws_ecs_service" "ai_agent" {
  name            = "ai-agent"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.ai_agent.arn

  # ❌ Missing: vector DB networking, LLM provider configs, 
  # retry policies, cost controls, monitoring...
}

LLMs generating IaC are trained on public examples, not production systems. They miss vector database networking, multi-provider LLM failover, and other complexities that break under real traffic.

AI agents need an extra infrastructure layer on top of the traditional one:

Traditional Layer:         AI-Specific Layer:
- Compute (ECS/Lambda)     - Vector Database (Pinecone/Weaviate)
- Storage (S3/EBS)         - LLM Gateway (Multi-provider routing)
- Database (RDS)           - Workflow Orchestration (Temporal/Prefect)
- Networking (VPC/ALB)     - Model Serving & State Management

Each has its own failure modes and scaling patterns that traditional IaC treats as generic cloud resources.
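
To give a feel for what the "LLM Gateway" box does, here is a minimal sketch of ordered provider failover. The provider functions are stand-ins (assumptions), not real SDK calls. A real gateway would also handle timeouts, rate limits, and region routing.

```python
from typing import Callable, List, Tuple

Provider = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in real code, catch the provider's specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary(prompt: str) -> str:
    # Stand-in for a provider that is currently timing out
    raise TimeoutError("primary timed out")


def secondary(prompt: str) -> str:
    # Stand-in for a healthy backup provider
    return f"echo: {prompt}"


print(complete_with_failover("hello", [("primary", primary), ("secondary", secondary)]))
```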

What Actually Works

Pulumi for AI Infrastructure

Pulumi has native AI providers that treat vector databases and LLM gateways as real infrastructure. The trade-off? Your team needs to learn TypeScript/Python instead of HCL, and you're betting on a smaller ecosystem than Terraform's.

Alternative approaches:

Custom Terraform providers - Build your own for AI services (more work, but stays in Terraform)
Terraform + scripts - Use Terraform for basic infra, scripts for AI-specific parts
AWS CDK - Good if you're AWS-only

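// Package and resource names below are illustrative; look up the exact
// provider packages and class names in the Pulumi Registry before using them
import * as pinecone from "@pulumi/pinecone";
import * as temporal from "@pulumi/temporal";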

// Native vector database support
const vectorIndex = new pinecone.Index("knowledge-base", {
    name: "customer-support-kb",
    metric: "cosine",
    dimension: 1536,
    spec: {
        serverless: {
            cloud: "aws",
            region: "us-east-1"
        }
    }
});

// Workflow orchestration as code
const aiWorkflow = new temporal.Namespace("ai-workflows", {
    namespace: "customer-support",
    retention: "7d"
});

Temporal Handles Complex AI Workflows

Temporal manages the orchestration that AI agents need. Downsides: another system to operate, and your team needs to learn workflow concepts.

Alternatives:

Prefect - Similar to Temporal but more Python-native
Step Functions - AWS-native, simpler but less powerful
Kubernetes Jobs - If you want to stay close to K8s

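from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# search_knowledge_base, call_llm_with_context, and needs_human_review are
# placeholders for your own activities and review logic

@workflow.defn
class CustomerSupportAgent:
    def __init__(self) -> None:
        self.approved = False

    @workflow.signal
    def approve(self) -> None:
        # Sent by a human reviewer to unblock the workflow
        self.approved = True

    @workflow.run
    async def handle_request(self, user_query: str) -> str:
        # Survives infrastructure failures
        context = await workflow.execute_activity(
            search_knowledge_base,
            user_query,
            start_to_close_timeout=timedelta(seconds=30)
        )

        # Automatic retries with backoff (activities also need a timeout)
        response = await workflow.execute_activity(
            call_llm_with_context,
            {"query": user_query, "context": context},
            start_to_close_timeout=timedelta(seconds=60),
            retry_policy=RetryPolicy(maximum_attempts=3)
        )

        # Long-running workflows (hours/days/weeks)
        if needs_human_review(response):
            await workflow.wait_condition(lambda: self.approved)

        return response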

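Cost controls can live in the same Pulumi program. A Python sketch:

import pulumi
import pulumi_aws as aws

class CostOptimizedAI(pulumi.ComponentResource):
    def __init__(self, name: str, opts: pulumi.ResourceOptions = None):
        # The type token "custom:ai:CostOptimizedAI" is an arbitrary example
        super().__init__("custom:ai:CostOptimizedAI", name, None, opts)

        # Spot capacity for training
        self.training_cluster = aws.ecs.Cluster(
            f"{name}-training",
            capacity_providers=["FARGATE_SPOT"]
        )

        # Fixed capacity for production inference
        # (calculate_optimal_capacity is a placeholder for your own sizing logic)
        self.inference_service = aws.ecs.Service(
            f"{name}-inference",
            desired_count=self.calculate_optimal_capacity()
        )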

Security and Operational Considerations

API Key Management:

Use AWS Secrets Manager or Azure Key Vault for LLM API keys
Rotate keys on a schedule, automatically where your AI provider's API allows it
Never put API keys in your IaC code - use secret references

Rollback Strategy:

AI infrastructure changes can break in subtle ways
Always test rollbacks in staging first
Keep vector database backups before schema changes
Use blue-green deployments for model updates

Team Training:

Budget 2-4 weeks for engineers to learn Pulumi + Temporal
Start with one person, then spread knowledge
Document your AI infrastructure patterns for the team

Monitoring That Actually Matters

Regular monitoring misses what's important for AI systems. AI infrastructure spending is projected to hit $223 billion by 2028, so you need proper observability:

const aiMetrics = new aws.cloudwatch.Dashboard("ai-observability", {
    dashboardName: "ai-observability",
    dashboardBody: pulumi.jsonStringify({
        widgets: [{
            type: "metric",
            properties: {
                metrics: [
                    // Traditional metrics
                    ["AWS/ECS", "CPUUtilization"],
                    ["AWS/ECS", "MemoryUtilization"],

                    // AI-specific metrics that actually matter
                    // (custom namespaces: your application has to publish these)
                    ["AI/VectorDB", "QueryLatency"],
                    ["AI/LLM", "TokensPerSecond"],
                    ["AI/LLM", "ResponseQuality"],
                    ["AI/Workflow", "CompletionRate"],
                    ["AI/Cost", "DollarPerInteraction"]
                ],
                title: "AI System Health"
            }
        }]
    })
});

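// Alert on cost spikes
const costSpike = new aws.cloudwatch.MetricAlarm("ai-cost-spike", {
    comparisonOperator: "GreaterThanThreshold",
    evaluationPeriods: 1,
    namespace: "AI/Cost", // custom namespace used in the dashboard above
    metricName: "DollarPerInteraction",
    statistic: "Average",
    period: 300, // evaluate over 5-minute windows
    threshold: 0.50, // Alert if cost per interaction > $0.50
    alarmDescription: "AI infrastructure costs spiking"
});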

What Teams Are Seeing

People adopting AI-native infrastructure report significant improvements:

10-100x lower costs with serverless vector databases vs. provisioned capacity
Self-hosted models can cost significantly less than API-based solutions for high-volume workloads

Companies using Temporal for AI workflows report significantly reduced debugging time and improved reliability for long-running AI processes.

Start here:

Check your AI costs - How much are you spending compared to self-hosted options?
Pick one AI workflow to rebuild as a test
Try Pulumi with Pinecone - deploy a test vector database

Next month:

Move critical AI workflows to Temporal
Set up cost monitoring and alerts
Add AI-specific observability

Companies building reliable, cheap AI infrastructure stopped using traditional IaC tools. They switched to AI-native approaches that treat AI workloads properly.

Your call: Keep fighting with Terraform and burning money, or use patterns that actually work.

AI Deployment: Why Serverless is Perfect (and Terrible)

Geri Máté — Wed, 28 May 2025 10:40:29 +0000

Your AI agent works perfectly in development. You've tested the reasoning chains, the tool integrations are solid, and the responses are exactly what users need. Then you deploy to production and everything breaks.

The timeout kills your multi-step workflows after 15 minutes. Your bundle exceeds the 250MB limit because you need scikit-learn, pandas, and a vector database client. Cold starts take 6+ seconds while your models load, making real-time interactions impossible.

Sound familiar? You're not alone. One developer working on an e-commerce recommendation engine discovered that "scikit-learn and pandas libraries increased the size of my deployment package beyond the AWS Lambda package limits." Another found their TensorFlow model loading caused API calls to time out after 29 seconds.

Here's the thing: serverless isn't broken for AI. You're just hitting the boundaries of what it was designed for. Traditional serverless platforms were built for quick, stateless web requests—not long-running AI agent workflows that need to maintain context, load large models, and perform complex reasoning chains.

But before you abandon serverless entirely, understand this: for certain AI workloads, serverless is absolutely perfect. The question isn't whether to use serverless for AI—it's knowing when it works brilliantly and when it fails catastrophically.

When Serverless Shines for AI Deployments

Serverless excels in three specific AI scenarios that traditional infrastructure can't match.

Unpredictable Traffic Patterns

AI applications often experience extreme traffic variability. Your chatbot gets mentioned in a tweet and suddenly handles 1000x normal load. A content generation API processes 10 requests per hour during quiet periods, then 1000 requests during marketing campaigns.

Serverless platforms automatically scale from zero to thousands of concurrent executions without capacity planning. AWS Lambda's default quota is 1,000 concurrent executions per account and Region, and it can be raised on request. You pay only for actual compute time, not for idle servers waiting for the next AI inference request.

Event-Driven AI Processing

Many AI workflows fit perfectly into event-driven patterns. Document uploaded → extract text → summarize content. New customer signup → analyze preferences → generate personalized recommendations. Code commit → run AI code review → post feedback.

These discrete, triggered operations align with serverless strengths. Each event spawns an independent function execution that processes the task and terminates. No need to manage background services or polling mechanisms.

Simple Inference Tasks

Lightweight AI operations—sentiment analysis, text classification, simple embeddings generation—work excellently in serverless environments. These tasks typically complete within seconds, use manageable dependencies, and don't require complex state management.

A sentiment analysis API using a pre-trained model can process requests in under 100ms with warm starts, providing excellent user experience while benefiting from serverless cost efficiency.

The Serverless Reality Check

The problems start when your AI workloads bump against fundamental serverless constraints.

Timeout Limitations Kill Complex Workflows

AWS Lambda caps execution at 15 minutes maximum. Vercel Functions limits vary by plan: 60 seconds on Hobby, 300 seconds on Pro, 900 seconds on Enterprise. Cloudflare Workers allows unlimited wall-clock time but restricts CPU time to 5 minutes.

Multi-step AI agent workflows routinely exceed these limits. Consider a research agent that:

Searches multiple data sources (2-3 minutes)
Processes and analyzes findings (3-5 minutes)
Generates comprehensive report (5-8 minutes)
Formats and delivers output (1-2 minutes)

Total runtime: 11-18 minutes. The upper end exceeds even Lambda's 15-minute cap, and the whole range is far beyond the limits on Vercel's Hobby and Pro plans.
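The arithmetic for the research agent can be checked against each platform's cap. This is a minimal sketch that uses only the step durations and timeout figures quoted in this article; the dictionary and function names are made up for illustration.

```python
# Step durations in minutes from the research-agent example: (best case, worst case)
STEPS = {
    "search": (2, 3),
    "analyze": (3, 5),
    "report": (5, 8),
    "deliver": (1, 2),
}

# Timeout caps in minutes, as quoted in this article
TIMEOUTS = {
    "AWS Lambda": 15,
    "Vercel Hobby": 1,
    "Vercel Pro": 5,
    "Vercel Enterprise": 15,
}


def total_runtime(steps):
    """Return (best, worst) total runtime in minutes."""
    best = sum(low for low, _ in steps.values())
    worst = sum(high for _, high in steps.values())
    return best, worst


def fits(timeout_minutes, steps):
    """A workflow only fits if even its worst case stays within the cap."""
    _, worst = total_runtime(steps)
    return worst <= timeout_minutes


best, worst = total_runtime(STEPS)
print(f"Total runtime: {best}-{worst} minutes")
for platform, cap in TIMEOUTS.items():
    verdict = "fits" if fits(cap, STEPS) else "exceeds"
    print(f"{platform}: {verdict} the {cap}-minute cap")
```

Checking against the worst case rather than the average is deliberate: a workflow that usually finishes in time but is occasionally killed mid-run is harder to operate than one that never fits.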

Real-world example: AI agents performing "extract, transform, and load (ETL) jobs and content generation workflows such as creating PDF files or media transcoding require fast, scalable local storage to process large amounts of data quickly"—operations that frequently exceed serverless timeout constraints.

Bundle Size Problems Block AI Dependencies

Traditional serverless deployments face severe size restrictions:

AWS Lambda ZIP packages: 50MB compressed, 250MB uncompressed
Vercel Functions: 250MB uncompressed including layers
Cloudflare Workers: 3MB free, 10MB paid plans

Popular AI libraries routinely exceed these limits. Scikit-learn, pandas, numpy, and scipy together often surpass 250MB. Add a vector database client like Pinecone or Weaviate, plus an LLM SDK, and you're well beyond platform constraints.

The introduction of AWS Lambda container images (up to 10GB) fundamentally changes this landscape, but requires more complex deployment processes and sacrifices some serverless simplicity.

Cold Start Performance Destroys User Experience

AI workloads suffer dramatically from cold start penalties. Research on Java-based AI applications shows cold starts taking up to 6.99 seconds at the 99.9th percentile, while warm starts complete in just 33 milliseconds.

Loading TensorFlow models can cause initial API calls to time out after 29 seconds during cold starts, though subsequent warm function calls process images in under one second. This unpredictable performance makes serverless unsuitable for real-time AI interactions where users expect immediate responses.

The cold start penalty compounds with AI complexity: larger models, more dependencies, and initialization-heavy frameworks all extend startup times beyond acceptable user experience thresholds.

Making Serverless Work: Practical Patterns

You can work around serverless limitations with architectural patterns designed for AI workloads.

1. Workflow Suspension and Resume

Break long-running AI processes into discrete steps with state persistence between invocations. Each step saves progress to external storage, enabling the next function to continue from checkpoint.

// Step 1: Initial Analysis
export const analyzeInput = async (event) => {
  const analysis = await performAnalysis(event.input);

  // Save state to Redis/DynamoDB
  await saveState(event.workflowId, { 
    step: 'analysis',
    result: analysis,
    nextStep: 'generate'
  });

  // Trigger next step
  await triggerNextStep(event.workflowId);

  return { status: 'processing', workflowId: event.workflowId };
};

// Step 2: Content Generation  
export const generateContent = async (event) => {
  const state = await loadState(event.workflowId);
  const content = await generateFromAnalysis(state.result);

  await saveState(event.workflowId, {
    step: 'complete',
    finalResult: content
  });

  return { status: 'complete', result: content };
};

Because each step stays within the per-function timeout and checkpoints its progress to external storage, the workflow as a whole can run for as long as it needs.
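The same suspend-and-resume idea can be sketched in Python with a single dispatcher that runs one step per invocation. This is an illustration under stated assumptions: STATE_STORE is an in-memory stand-in for Redis or DynamoDB, and the step functions and their names are hypothetical placeholders for real analysis and generation calls.

```python
import json

# In-memory stand-in for Redis/DynamoDB; real deployments need external storage
STATE_STORE = {}


def save_state(workflow_id, state):
    STATE_STORE[workflow_id] = json.dumps(state)


def load_state(workflow_id):
    return json.loads(STATE_STORE[workflow_id])


def start_workflow(workflow_id, user_input):
    """Create the first checkpoint; no work happens yet."""
    save_state(workflow_id, {"input": user_input, "next_step": "analysis"})


def analyze(state):
    state["analysis"] = f"analysis of {state['input']}"
    state["next_step"] = "generate"
    return state


def generate(state):
    state["result"] = f"report based on {state['analysis']}"
    state["next_step"] = "complete"
    return state


STEP_HANDLERS = {"analysis": analyze, "generate": generate}


def run_next_step(workflow_id):
    """Simulate one function invocation: load checkpoint, run one step, save."""
    state = load_state(workflow_id)
    handler = STEP_HANDLERS.get(state["next_step"])
    if handler is None:  # workflow already complete, nothing to do
        return state
    state = handler(state)
    save_state(workflow_id, state)
    return state


start_workflow("wf-1", "quarterly sales data")
run_next_step("wf-1")          # invocation 1 runs the analysis step
final = run_next_step("wf-1")  # invocation 2 runs the generation step
print(final["next_step"], "-", final["result"])
```

Keeping the next step's name inside the checkpoint means a retried or duplicated invocation picks up from the last saved state instead of starting over.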

2. External State Management

AI agents require sophisticated state management beyond serverless stateless models. Externalize all persistent data to dedicated storage:

Redis/ElastiCache: Conversation context, short-term agent memory
PostgreSQL/MongoDB: Long-term user preferences, interaction history
Vector databases: Embeddings storage for semantic search and RAG

export const chatAgent = async (event) => {
  // Load conversation context. Redis stores strings, so parse the JSON
  // and fall back to an empty history on the first message.
  const raw = await redis.get(`chat:${event.userId}`);
  const context = raw ? JSON.parse(raw) : { messages: [] };

  // Process with context
  const response = await generateResponse(event.message, context);

  // Update conversation state with a one-hour TTL
  await redis.setex(`chat:${event.userId}`, 3600, JSON.stringify({
    messages: [...context.messages, event.message, response],
    lastActivity: Date.now()
  }));

  return response;
};

3. Container-Based Deployment

Use AWS Lambda container images to eliminate bundle size constraints. Include complete AI frameworks and pre-trained models within container deployments.

FROM public.ecr.aws/lambda/python:3.12

# Copy model files during build
COPY models/ ${LAMBDA_TASK_ROOT}/models/
COPY requirements.txt .

# --no-cache-dir keeps pip's download cache out of the image
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py ${LAMBDA_TASK_ROOT}

CMD ["app.lambda_handler"]

Container deployment enables 10GB packages while maintaining serverless operational benefits, though with increased deployment complexity.

4. Smart Cold Start Mitigation

Implement strategies to minimize cold start impact:

Model Pre-warming: Use scheduled invocations of the inference function to keep an execution environment, and the model loaded in it, warm:

// Scheduled every 5 minutes
export const keepWarm = async () => {
  const modelExists = await checkModelAvailability();
  if (!modelExists) {
    await downloadAndCacheModel();
  }
  return { status: 'model ready' };
};

Progressive Response: Return immediate acknowledgment, then stream results:

export const aiInference = async (event) => {
  // Immediate response
  const responseId = generateId();
  await sendInitialResponse(responseId);

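  // Hand the heavy work off (e.g. to a queue or an async invocation).
  // On Lambda, un-awaited work is frozen once the handler returns,
  // so processInBackground must enqueue the job, not run it in-process.
  await processInBackground(event.input, responseId);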

  return { responseId, status: 'processing' };
};

Platform-Specific Considerations

AWS Lambda: Enterprise-Grade with Complexity Trade-offs

Strengths: Longest timeouts (15 minutes), container support up to 10GB, mature ecosystem, Provisioned Concurrency for predictable performance.

Best for: Complex AI workflows, enterprise deployments requiring compliance and integration with AWS services.

Limitations: Cold start performance, complex configuration for container deployments.

Vercel Functions: Developer Experience with Timeout Constraints

Strengths: Excellent developer experience, edge distribution, Fluid Compute for extended durations.

Best for: Simple AI APIs, content generation workflows, applications prioritizing deployment simplicity.

Limitations: Aggressive timeout limits (60 seconds on free tier), bundle size restrictions persist.

Cloudflare Workers: Global Edge with Memory Constraints

Strengths: Global edge distribution, unlimited wall-clock time, recent CPU limit increases to 5 minutes.

Best for: Real-time AI inference requiring global distribution, lightweight AI operations.

Limitations: 128MB memory limit, 10MB maximum bundle size, V8 runtime restrictions.

When NOT to Use Serverless for AI

Certain AI workloads fundamentally conflict with serverless constraints:

Always-On AI Agents: Customer service bots, monitoring systems, and agents requiring continuous availability benefit from dedicated infrastructure avoiding cold start penalties.

Heavy Model Inference: Large language models that need more memory than the platform offers (Lambda tops out at 10GB) or specialized hardware (GPUs) exceed serverless platform capabilities.

Complex Multi-Agent Systems: Workflows requiring persistent communication between multiple AI agents, shared memory, or complex coordination patterns work better with traditional infrastructure.

High-Volume Production Workloads: Applications processing thousands of AI requests per minute may find dedicated infrastructure more cost-effective than per-invocation serverless pricing.

Hybrid Architectures: Best of Both Worlds

Most production AI systems benefit from hybrid approaches combining serverless and traditional infrastructure. AWS Step Functions provides excellent orchestration for these patterns:

Router Pattern

Use serverless functions as intelligent routers directing requests to appropriate processing infrastructure:

export const aiRouter = async (event) => {
  const complexity = analyzeRequestComplexity(event);

  if (complexity.simple) {
    return await processServerless(event);
  } else {
    return await queueForContainerProcessing(event);
  }
};

Hot/Cold Architecture

Maintain always-on infrastructure for baseline load, serverless for traffic spikes:

Containers handle predictable, consistent traffic
Serverless functions scale for demand peaks
Costs stay low because each traffic pattern runs on the infrastructure that suits it

Making the Right Choice for Your AI Deployment

Use this decision framework when evaluating serverless for AI workloads:

Choose Serverless When:

Execution time consistently under 10 minutes
Traffic patterns are unpredictable or bursty
Dependencies fit within platform bundle limits (or container deployment acceptable)
Workflow can be broken into discrete steps
Cold start latency is acceptable for use case

Choose Traditional Infrastructure When:

Workflows require 15+ minutes execution time
Always-on availability is critical
Memory requirements exceed 10GB
Complex multi-agent coordination needed
Consistent sub-second response times required

Consider Hybrid When:

Traffic patterns combine baseline and spike loads
Some workflows fit serverless constraints, others don't
Cost optimization across variable usage patterns is priority
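The checklist above can be encoded as a small function. This is a rough sketch: the thresholds come straight from the lists, but the field names, the order in which the rules are checked, and the choice to send the 10-15 minute grey zone to "hybrid" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    max_runtime_minutes: float
    memory_gb: float
    needs_always_on: bool = False
    needs_sub_second_latency: bool = False
    multi_agent_coordination: bool = False
    bursty_traffic: bool = False
    has_baseline_load: bool = False


def recommend(w: Workload) -> str:
    """Return 'serverless', 'traditional', or 'hybrid' for a workload."""
    # Any one of these puts the workload in the "traditional" column
    hard_blockers = (
        w.max_runtime_minutes >= 15
        or w.memory_gb > 10
        or w.needs_always_on
        or w.needs_sub_second_latency
        or w.multi_agent_coordination
    )
    if hard_blockers:
        return "traditional"
    # Baseline load plus spikes is the hybrid case
    if w.bursty_traffic and w.has_baseline_load:
        return "hybrid"
    if w.max_runtime_minutes < 10:
        return "serverless"
    # 10-15 minutes: too close to the timeout to rely on serverless alone
    return "hybrid"


print(recommend(Workload(max_runtime_minutes=2, memory_gb=1, bursty_traffic=True)))
print(recommend(Workload(max_runtime_minutes=20, memory_gb=4)))
```

A real assessment would weigh cost and team experience as well; the function only makes the article's thresholds explicit.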

The Bottom Line

Serverless isn't universally perfect or terrible for AI deployment—it's contextual. Simple, discrete AI operations work excellently in serverless environments, providing cost efficiency and automatic scaling. Complex, long-running AI agent workflows require architectural adaptations or alternative infrastructure.

The key is matching your specific AI workload characteristics to platform capabilities rather than forcing incompatible patterns. As serverless platforms continue evolving—container support, extended timeouts, better cold start performance—the viable use cases for serverless AI will expand.

Start by auditing your current AI deployment challenges against serverless constraints. If timeout limits, bundle sizes, or cold start performance block your use case, consider hybrid architectures or traditional infrastructure. If your workflows fit serverless patterns, you'll benefit from simplified operations and automatic scaling.

The serverless AI landscape changes rapidly. What's impossible today may be trivial next year. But right now, success depends on honest assessment of your requirements against current platform realities—not wishful thinking about what serverless should support.

5 Developer Pain Points Solved by Internal Developer Platforms

Geri Máté — Fri, 16 May 2025 12:03:56 +0000

Ever feel like you spend more time wrestling with tools than actually building stuff? You're not alone.

According to GitLab's research, developers waste up to 75% of their time maintaining toolchains rather than coding. Over 78% of DevOps professionals report losing between 25% and 100% of their time to keeping their toolchain running.

Traditional development is like being handed a giant bin of unsorted LEGO bricks and told to build a castle. You spend most of your time digging through the pile looking for the right pieces, and everyone builds differently.

Platform engineering is like getting those official LEGO kits with sorted pieces, clear instructions, and modular components. You still have creative freedom, but you're not wasting hours hunting for that one specific brick or reinventing foundations that have already been perfected.

I've spent years documenting developer workflows and watching teams struggle with the same problems over and over. Let's look at five major pain points and how Internal Developer Platforms (IDPs) actually solve them.

What's an Internal Developer Platform anyway?

Before diving in, a quick definition: an IDP is a self-service layer that sits on top of your infrastructure and tools, abstracting away complexity so developers can focus on building rather than configuring. Think of it as a unified interface for your entire development lifecycle.

No more jumping between 10+ tools just to deploy a simple feature.

Pain Point #1: Deployment Bottlenecks

The Problem

How long does it take your team to get code from commit to production? For most teams, it's days or weeks. Elite teams deploy in under a day.

The bottleneck isn't usually the code—it's the deployment process itself. When deployments require specialized knowledge or manual steps, everything slows down. If the one person who knows how to deploy is on vacation, you're stuck.

The Solution

IDPs provide self-service templates for deployments. Instead of developers needing to understand the underlying infrastructure, they get standardized workflows with the right guardrails.

With a platform approach, your team can:

Deploy without waiting for DevOps/platform teams
Use templates that enforce best practices
Automate the entire CI/CD pipeline
Deploy with a single click or command

Getting Started

You don't need a huge budget to implement this. Start with:

GitHub Actions or GitLab CI for automated pipelines
Docker (used by 59% of professional developers) for consistent environments
Standardized deployment scripts checked into your repo

Set up templates for your most common deployment types and build from there.

Pain Point #2: Context Switching Costs

The Problem

Each interruption costs developers 20+ minutes to regain focus. When developers have to switch between different tasks, tools, and contexts, productivity tanks.

The math is brutal. Suppose every build pulls all 10 engineers on a team out of focus for 10 minutes each: at $72/hour, that's $120 lost per build. With 50 builds per day and 22 working days, you're burning $132,000 monthly in lost productivity.
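To run the same estimate with your own numbers, the calculation fits in a few lines. The function name and parameters are made up for illustration; the inputs below are the figures from the paragraph above.

```python
def context_switch_cost(engineers, minutes_lost, hourly_rate, builds_per_day, working_days):
    """Return (cost per build, cost per month) in dollars."""
    # Total engineer-minutes lost per build, priced at the hourly rate
    per_build = engineers * minutes_lost * hourly_rate / 60
    per_month = per_build * builds_per_day * working_days
    return per_build, per_month


per_build, per_month = context_switch_cost(
    engineers=10, minutes_lost=10, hourly_rate=72, builds_per_day=50, working_days=22
)
print(f"${per_build:,.0f} per build, ${per_month:,.0f} per month")
```

The estimate is most sensitive to how many engineers a build actually interrupts, so that is the number to measure first.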

The 2024 State of Developer Productivity report found "time spent gathering project context" tied for the biggest productivity leak (26%).

The Solution

Platform engineering attacks this by creating unified interfaces and standardized workflows. Instead of switching between CI/CD tools, cloud consoles, monitoring dashboards, and ticketing systems, developers get a single interface.

Implementing an IDP gives you:

One portal for accessing all development resources
Integrated workflows that reduce tool-switching
Standardized processes that become muscle memory
Fewer interruptions due to missing context

Getting Started

For smaller teams, you can start with:

A centralized dashboard linking to your most-used tools
Consistent CLI tools that work across projects
Documentation that follows the same structure for all services
Automating workflows that currently require multiple tools

Pain Point #3: Environment Inconsistency

The Problem

"It works on my machine" might be the most frustrating phrase in software development. Environment inconsistencies waste countless hours on debugging issues that only appear in specific environments.

When dev, test, and production environments don't match, you're essentially testing different systems. Problems appear out of nowhere during deployment, and fixing them becomes a painful guessing game.

The Solution

IDPs provide standardized environment templates and self-service provisioning. This ensures consistency across all stages of development.

With a platform approach:

Every environment uses identical configurations
Developers can spin up environments on-demand
Configuration changes propagate consistently
Local development matches production

Getting Started

Begin with:

Docker for containerizing applications
Docker Compose for local development environments
Environment configuration stored as code
Automated environment provisioning scripts

Even small teams can implement these practices incrementally.

Pain Point #4: Cognitive Load from Multiple Tools

The Problem

Most teams juggle 6+ different tools, with 13% managing up to 14 different tools in their development chain. Each tool has its own interface, quirks, and mental model.

Learning and remembering how to use all these tools creates massive cognitive overhead, especially for new team members.

The Solution

Platform engineering streamlines development by providing standardized tools and interfaces. IDPs create a single point of entry for developers to access everything they need.

Implementing a platform approach gives you:

Uniform interfaces across different tools
Standardized workflows that work the same way everywhere
Simplified onboarding for new team members
Lower learning curve for daily tasks

Getting Started

Start by:

Auditing your current toolchain to identify redundancies
Creating consistent interfaces for your most-used tools
Building wrapper scripts that standardize common commands
Setting up a simple internal portal or wiki that provides single-point access
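A wrapper script can be as small as a lookup table that maps one standard verb to each project's real command. This is a minimal sketch: the project types and the commands in COMMANDS are hypothetical examples, so swap in whatever your teams actually run.

```python
import subprocess

# Hypothetical mapping from project type to the underlying tool commands
COMMANDS = {
    "node": {"build": ["npm", "run", "build"], "test": ["npm", "test"]},
    "python": {"build": ["python", "-m", "build"], "test": ["python", "-m", "pytest"]},
}


def resolve(project_type, action):
    """Translate a standard action into the project-specific command."""
    try:
        return COMMANDS[project_type][action]
    except KeyError:
        raise ValueError(f"unsupported: {project_type} {action}") from None


def run(project_type, action):
    """Run the resolved command and return its exit code."""
    return subprocess.call(resolve(project_type, action))


print(resolve("node", "test"))
```

Developers then learn one verb per task (build, test), and the mapping absorbs the differences between projects.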

Pain Point #5: Security & Compliance Overhead

The Problem

Security is crucial but often becomes a productivity killer. Manual security reviews, compliance checks, and remediations consume valuable development time and delay deployments.

When security is bolted on at the end rather than built in from the start, it creates friction and frustration.

The Solution

Platform engineering embraces "self-service with guardrails." IDPs build security into workflows rather than tacking it on afterward.

With a platform approach:

Security scanning happens automatically in pipelines
Compliance checks run continuously
Policy enforcement happens transparently
Developers get instant feedback on security issues

Getting Started

Even small teams can implement:

Pre-commit hooks for basic security checks
Automated vulnerability scanning in CI pipelines
Compliance-as-code using tools like OPA
Security templates for new projects
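A basic pre-commit security check can start as a regex scan for things that look like secrets. This is a sketch only: the patterns are rough heuristics chosen for illustration, they will miss real secrets and flag some harmless lines, and a dedicated scanner is the better long-term choice.

```python
import re

# A few common secret shapes; heuristics only, not a complete list
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_secret": re.compile(
        r"(?i)\b(?:api_key|secret|password)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = 'region = "eu-west-1"\nkey = "AKIAABCDEFGHIJKLMNOP"\n'
print(scan_text(sample))
```

To use it as a hook, run scan_text over each staged file and exit with a non-zero status when the findings list is not empty, which blocks the commit.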

Leveraging What You Already Have

The good news? You probably already have the foundation for platform engineering in place. The trick is connecting these pieces into a cohesive experience:

Your Git workflow can expand beyond code versioning to include configuration and Infrastructure as Code specs.

Those Docker containers you use for local development? With some standardization, they become the basis for consistent environments across your pipeline.

That CI/CD pipeline you built for testing? It can become the backbone of a self-service deployment platform.

The key isn't getting new tools—it's connecting what you have in smarter ways. Focus on eliminating the manual steps between these systems first, then build interfaces that make the process seamless.

What's your team's biggest development pain point? Let me know in the comments!

Streamlining Multi-Tenant Kubernetes: A Practical Implementation Guide for 2025

Geri Máté — Wed, 14 May 2025 14:55:05 +0000

Let's face it: running multiple applications on separate clusters is a resource nightmare. If you've got different teams or customers needing isolated environments, you're probably spending way more on infrastructure than you need to.

Multi-tenancy in Kubernetes offers a solution, but it comes with its own set of challenges. How do you ensure proper isolation? What about resource allocation? And the big one – security?

This guide provides practical steps for implementing multi-tenant Kubernetes that actually works in production environments. By the end, you'll have a roadmap for consolidating your infrastructure while maintaining isolation where it matters.

What Multi-Tenancy Actually Means in 2025

Multi-tenancy has become a bit of a buzzword, but at its core, it still means the same thing: multiple users sharing the same infrastructure. In Kubernetes, we typically see two flavors:

Multiple teams within an organization: Different departments or projects sharing a cluster, where team members have access through kubectl or GitOps controllers
Multiple customer instances: SaaS applications running customer workloads on shared infrastructure

The key tradeoffs haven't changed much over the years, either. You're always balancing:

Isolation: Keeping tenants from accessing or messing with each other's resources
Resource efficiency: Maximizing hardware utilization and reducing costs
Operational complexity: Making sure your team can actually manage this setup

What has changed are the tools and patterns. Pure namespace-based isolation is still common, but we've seen a shift toward more sophisticated approaches using hierarchical namespaces, virtual clusters, and service meshes. Let's start with the building blocks you'll need for a practical implementation.

For more details about how the platform approaches multi-tenancy, check the Kubernetes documentation.

The Building Blocks: Practical Implementation Guide

Namespace Configuration That Actually Works

Namespaces are your first line of defense in multi-tenancy. Here's a modern namespace configuration with isolation in mind:

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    tenant: tenant-a
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
    networking.k8s.io/isolation: enabled

This does a few key things:

Creates a dedicated namespace for the tenant
Labels it for easier filtering and policy targeting
Applies Pod Security Standards (the modern replacement for Pod Security Policies)
Marks it for network isolation (the label is only a marker; a NetworkPolicy still has to enforce it)

When organizing namespaces, many teams follow a pattern like {tenant}-{environment} (e.g., marketing-dev, marketing-prod). For SaaS applications, you might use customer IDs or similar identifiers.

The key thing to remember: namespaces alone aren't enough for true isolation. They're just containers for resources – you need additional controls to enforce boundaries.

RBAC That Actually Isolates Tenants

Role-Based Access Control (RBAC) is essential for preventing tenants from accessing each other's resources. Here's a pattern that works well in practice:

# Tenant admin role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-a
  name: tenant-admin
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "services", "deployments", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Binding for tenant admin
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-admin-binding
  namespace: tenant-a
subjects:
- kind: User
  name: tenant-a-admin
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-admin
  apiGroup: rbac.authorization.k8s.io

Notice a few important things here:

The role is scoped to a specific namespace (tenant-a)
It grants permissions for common resources but nothing cluster-wide
The binding associates a user with this role

The pattern is simple but effective: create a set of standard roles for each tenant (admin, developer, viewer), each scoped to the tenant's namespace(s).

One mistake I see teams make is being too generous with permissions. Start restrictive and loosen gradually as needed – it's much easier than trying to lock things down after a breach.

Network Policies That Actually Isolate Traffic

Network isolation is critical for multi-tenancy. By default, all pods in a Kubernetes cluster can talk to each other – not what you want in a multi-tenant environment.

Here's a practical network policy that isolates tenant traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-a
spec:
  podSelector: {}  # Applies to all pods in namespace
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          tenant: tenant-a
  - to:
    - namespaceSelector:
        matchLabels:
          common-services: "true"

This policy does two important things:

Allows ingress traffic only from the same tenant's namespace
Allows egress traffic only to the same tenant's namespace or to namespaces labeled as common services

The second part is particularly important – your tenants probably need access to shared services like monitoring, logging, or databases. By labeling those namespaces as common-services: "true", you create controlled exceptions to your isolation rules.

A common mistake is forgetting about DNS and other cluster services. Make sure your network policies allow access to kube-system services that tenants need, or you'll have some very confusing debugging sessions.
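One way to cover the DNS case is a second policy that allows egress to kube-system on port 53. The sketch below builds the manifest as a Python dict and prints it as JSON, which kubectl apply -f - accepts. The policy name allow-dns is made up, and the kubernetes.io/metadata.name selector assumes a cluster recent enough to set that label on namespaces automatically.

```python
import json


def dns_egress_policy(namespace):
    """Build a NetworkPolicy that lets every pod in `namespace` reach cluster DNS."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns", "namespace": namespace},
        "spec": {
            "podSelector": {},  # all pods in the namespace
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": "kube-system"
                                }
                            }
                        }
                    ],
                    # DNS uses UDP and falls back to TCP for large responses
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


print(json.dumps(dns_egress_policy("tenant-a"), indent=2))
```

Network policies are additive, so this rule combines with the tenant isolation policy rather than replacing it.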

Resource Quotas to Prevent Noisy Neighbors

One bad tenant can ruin the party for everyone by consuming all available resources. Resource quotas prevent this "noisy neighbor" problem:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20" 
    limits.memory: 40Gi
    persistentvolumeclaims: "20"
    services: "30"
    count/deployments.apps: "25"
    count/statefulsets.apps: "10"

This quota sets limits on:

CPU and memory consumption (both requests and limits)
Number of persistent volume claims (storage)
Number of services and workloads (deployments, statefulsets)

Setting appropriate quota sizes takes some experimentation. Monitor actual usage patterns and adjust accordingly – too restrictive and legitimate workloads fail, too loose and you're back to the noisy neighbor problem.

Pro tip: In addition to ResourceQuotas (which operate at namespace level), use LimitRanges to set default and maximum limits for individual containers. This prevents tenants from creating resource-hungry pods that still fit within their overall quota.
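A LimitRange for the tip above might look like the following, again built as a Python dict and printed as JSON for kubectl apply -f -. The object name container-limits and all CPU and memory values are illustrative assumptions; size them from your own usage data.

```python
import json


def container_limit_range(namespace, max_cpu="2", max_memory="4Gi"):
    """Build a LimitRange with per-container defaults and ceilings."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-limits", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Applied when a container declares no limits
                    "default": {"cpu": "500m", "memory": "512Mi"},
                    # Applied when a container declares no requests
                    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
                    # Hard ceiling for any single container
                    "max": {"cpu": max_cpu, "memory": max_memory},
                }
            ]
        },
    }


print(json.dumps(container_limit_range("tenant-a"), indent=2))
```

The defaults also matter for the quota itself: once a ResourceQuota covers CPU and memory, pods that omit requests or limits are rejected unless a LimitRange fills them in.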

Real-World Implementation Benefits

Industry reports on multi-tenant Kubernetes show clear benefits. Documented implementations typically see:

30-40% reduction in infrastructure costs by consolidating multiple single-tenant clusters
Significant decrease in time spent on cluster maintenance and updates
Improved resource utilization, often doubling from around 30-35% to 70% or more
Better standardization across development teams

However, implementation isn't without challenges. Common issues include:

Resistance from teams concerned about workload security and isolation
Migration complexity for existing applications
Learning curve for new multi-tenant tooling and workflows
Special accommodations needed for resource-intensive or security-sensitive workloads

This highlights an important point: multi-tenancy isn't all-or-nothing. Many successful implementations use a hybrid approach, keeping some high-security or high-performance workloads on dedicated clusters while consolidating standard workloads in shared environments.

Solving the Big Three Challenges

Challenge 1: Security Vulnerabilities

Cross-tenant data leakage and escalation attacks are the nightmare scenarios in multi-tenant environments. Here's a practical security checklist:

Enforce Pod Security Standards:

   apiVersion: v1
   kind: Namespace
   metadata:
     name: tenant-a
     labels:
       pod-security.kubernetes.io/enforce: restricted
       pod-security.kubernetes.io/enforce-version: v1.29

The "restricted" profile prevents pods from running as privileged, accessing host namespaces, or using dangerous capabilities.

Isolate tenant storage:
Use StorageClasses with tenant-specific access controls, or better yet, separate storage backends for sensitive data.
Implement regular security scanning:
Tools like Trivy, Falco, and Kube-bench can identify vulnerabilities in your multi-tenant setup.
Audit, audit, audit:
Enable audit logging and regularly review access patterns – many breaches are detected through unusual access patterns.

Challenge 2: Resource Contention

Even with resource quotas, you can still run into contention issues. Here are some practical solutions:

Pod Priority and Preemption:

   apiVersion: scheduling.k8s.io/v1
   kind: PriorityClass
   metadata:
     name: tenant-high-priority
   value: 1000000

Assign different priority classes to tenant workloads based on their importance.

Pod Anti-Affinity:

   affinity:
     podAntiAffinity:
       requiredDuringSchedulingIgnoredDuringExecution:
       - labelSelector:
           matchExpressions:
           - key: tenant
             operator: In
             values:
             - tenant-a
         topologyKey: "kubernetes.io/hostname"

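This prevents two pods labeled tenant: tenant-a from being scheduled on the same node, distributing the load. Because the rule is required rather than preferred, a tenant running more replicas than there are nodes will end up with Pending pods; use preferredDuringSchedulingIgnoredDuringExecution if that's a concern.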

Quality of Service Classes: Set appropriate QoS classes (Guaranteed, Burstable, BestEffort) for different tenant workloads to influence how they're treated under resource pressure.
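
If it's not obvious which class a pod ends up in, the rules can be sketched in a few lines of Go. This is a simplified model under stated assumptions: the Resources and Container types are made up for the example (they are not the Kubernetes API types), values are plain integers where 0 means "not set", and only CPU and memory are considered.

```go
package main

import "fmt"

// Resources holds CPU (millicores) and memory (MiB). Zero means "not set".
// Illustrative type, not the Kubernetes API type.
type Resources struct {
	CPU, Mem int64
}

// Container is a hypothetical, stripped-down container spec.
type Container struct {
	Requests, Limits Resources
}

// QoSClass derives the QoS class of a pod from its containers:
// BestEffort if nothing is set, Guaranteed if every container has
// CPU and memory limits with matching requests, Burstable otherwise.
func QoSClass(containers []Container) string {
	allUnset, guaranteed := true, true
	for _, c := range containers {
		req, lim := c.Requests, c.Limits
		if req.CPU != 0 || req.Mem != 0 || lim.CPU != 0 || lim.Mem != 0 {
			allUnset = false
		}
		// When only a limit is set, the request defaults to the limit.
		if req.CPU == 0 {
			req.CPU = lim.CPU
		}
		if req.Mem == 0 {
			req.Mem = lim.Mem
		}
		if lim.CPU == 0 || lim.Mem == 0 || req.CPU != lim.CPU || req.Mem != lim.Mem {
			guaranteed = false
		}
	}
	switch {
	case allUnset:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	full := Resources{CPU: 500, Mem: 256}
	fmt.Println(QoSClass([]Container{{Requests: full, Limits: full}}))
	fmt.Println(QoSClass([]Container{{Requests: full}}))
	fmt.Println(QoSClass([]Container{{}}))
}
```

Under resource pressure, BestEffort pods are the first candidates for eviction and Guaranteed pods the last, so a tenant's critical workloads should set requests equal to limits.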

Challenge 3: Operational Complexity

Managing dozens or hundreds of tenants manually isn't feasible. Here's how to simplify operations:

Automate tenant provisioning:
Create a standardized process for spinning up new tenant namespaces, applying policies, and setting quotas.
Use a tenant operator:
Tools like Capsule or the Multi-Tenant Operator can handle tenant lifecycle management, from creation to termination:

   apiVersion: tenancy.stakater.com/v1alpha1
   kind: Tenant
   metadata:
     name: tenant-a
   spec:
     owners:
     - name: tenant-a-admin
       kind: User
     namespaces:
     - tenant-a-dev
     - tenant-a-prod
     quota:
       hard:
         requests.cpu: '10'
         requests.memory: 20Gi
     resourcePooling: true
     namespacePrefix: tenant-a-

Implement tenant-aware monitoring:
Tag all metrics and logs with tenant identifiers to simplify debugging and enable tenant-specific dashboards.
Create self-service capabilities:
Build internal tools that let tenants manage their own resources within the constraints you define.

Wrapping Up: Is Multi-Tenancy Right for You?

Multi-tenant Kubernetes isn't a silver bullet, but it can significantly reduce costs and operational overhead when implemented correctly. Here's a quick checklist to decide if it's right for your organization:

✅ You have multiple teams or customers using similar infrastructure
✅ You're comfortable with the security implications of shared infrastructure
✅ You have the operational maturity to implement and maintain isolation
✅ The cost savings outweigh the increased complexity

The implementation patterns we've covered – namespace isolation, RBAC, network policies, and resource quotas – provide a solid foundation for most multi-tenant environments. Start small, perhaps with just two teams or customers, and expand as you gain confidence in your isolation mechanisms.

Remember, you don't have to go all-in on multi-tenancy. Many organizations use a hybrid approach, with shared clusters for most workloads and dedicated clusters for high-security or high-performance applications.

Whatever approach you choose, make sure your teams understand the boundaries and limitations of your multi-tenant setup. Technical controls are important, but so is user education – a confused tenant can unintentionally cause problems for everyone.

What's your experience with multi-tenant Kubernetes? Have you implemented any of these patterns, or do you have alternative approaches? Share your thoughts in the comments below.

Goodbye, 2023! dyrector.io’s Annual Recap

Geri Máté — Wed, 20 Dec 2023 11:04:33 +0000

2023 is coming to an end, which means it's time to revisit what happened with the team and the project of dyrector.io in the past 12 months.

January – Full Stack Highlighted dyrector.io

After the lengthy Christmas break, with full stomachs and a couple of extra kilograms, the real surprise caught us off guard: the Full Stack platform featured dyrector.io in its highlights.

Team-wise the most notable event was our Minus 30 hike in the pleasant January weather, which was a great occasion to have a chat about both technology related and unrelated things, and also to taste some pálinka.

February – dyrector.io Alpha Dropped

The first weeks of February were all about attending FOSDEM and the upcoming launch of dyrector.io on Product Hunt. On the day of the launch we made alpha access available.

Our Product Hunt launch turned out to be a shot at the buzzer, but we still did well. With a launch 6 hours into the voting, we reached the #11 spot. The same day we made a new release and a demo video. Busier than planned, but we pulled it off.

At the conference in Belgium, we were able to catch up with a lot of like-minded people eager to learn about open-source software.

At the same time, our teammate Levi showed up in the local cloud meetup scene as an organizer and a presenter, too. Another teammate of ours, Nándi, was interviewed in the podcast series of Uptime Community about DevOps, ChatGPT, and open-source.

March – Three (Hundred) Is the Magic Number

We doubled down on catering to a self-hosting audience in the first months of 2023, which helped us reach 300 stars on GitHub on the 3rd of March. We published a bunch of blog posts about self-hosting certain types of applications, which you can find here.

In March, we published our Awesome repository containing infrastructure-related questions. We find it useful when onboarding someone to a project where they will maintain infrastructure.

Another important event of the month was when Docker announced the end of Free Teams on Docker Hub. Backlash was inevitable and so was the organization backing out of their plans of monetizing Free Teams.

April – Adventures in the UK and Hungary

A portion of our team took a business trip to the UK to visit Hanover Displays at their HQ in Brighton. While Levi and Gopher were there, they paid a visit to the LEGO HQ for a meetup as well.

After the trip in the UK, we went out of the office for a few days of team building when we could unwind with the whole team.

Levi attended KubeCon in Amsterdam, too, which turned out to be the funniest way to reach 420 stars on GitHub on April 20th. Trust me, we didn’t plan this whatsoever.

May – 0.4.0 & Roadmap Published

After a Q1 busy with refactoring and making dyrector.io’s code more efficient, we started to make new releases faster. The first step was making 0.4.0, which didn’t deliver any significant changes to functionality, but it was important to accelerate our release cycle in the long run.

At the same time, we published our roadmap on GitHub and added new issues to the repository.

We also made some new friends: ConfigCat reviewed the platform on their blog.

June – Team Building in Croatia & 1000000000 Stars

Release 0.5.0 was a special moment for our team. It was the first version in months that included new features. To celebrate this special moment, we went to Croatia to finish working on the new version and chill at the sunny beach.

This was the perfect way to kick off our summer. After the trip to Croatia, we were able to consistently release on a bi-weekly basis, shipping new features again and again.

After 0.5.0 dropped, we passed 512 stars on GitHub, or 1000000000 in binary.

July – Automated Deployments With dyrector.io

One of the most significant features we added this year was the auto-deployment capability. The GitHub Actions compatible feature came out on July 14 in release 0.6.0.

A very pleasant surprise was when Nevo David mentioned dyrector.io in his blog post, which resulted in increased exposure and interest in the platform. In a few days we gained hundreds of stars on GitHub.

Even though it was the middle of the summer, we took no breaks. Between publishing new releases full of new features, we went to Lake Balaton to sail and Nándi and Geri even completed the Lake Balaton Cross Swimming.

At the end of July Levi attended WeAreDevelopers 2023 in Berlin.

August – dyrector.io Turns International

The most significant change was an internal one: our teammate Nándi moved to the Netherlands with his girlfriend. We officially became a remote-first company, while the rest of the team still showed up at the office every day. We had a goodbye party for him where we said farewell with a few cans of his favorite beverages for the road.

We launched dyrector.io on a new platform called Dev Hunt, which is an open-source Product Hunt alternative. With the help of our community, we were able to reach the #1 spot and the Developer Tool of the Week title that comes with it.

In other cloud-related news, HashiCorp announced it was changing its products' license, including Terraform’s, to the Business Source License. This sparked the founding of OpenTF, which was later renamed OpenTofu.

September – Product Hunt Launch #2

The majority of August was spent on preparations for our Product Hunt launch in September. The date was set – September 8th. We knew a product like ours only has a chance of a significant result on a Friday.

The result: #6 in the daily rankings, top 50 in the weekly with around 260 votes. Definitely an impressive result with a heavily developer-focused tool.

In the meantime, Levi took care of networking: he appeared in the Follow The Pattern podcast, attended InfoBip’s Shift conference in Croatia, and went to Kubernetes Community Days in Vienna.

October – darklens Enters the Scene

The biggest achievement of October in our household was a one-week sprint when more than half of the team was on vacation. Three teammates of ours joined forces, two developers and one marketer, to develop a complementary product to dyrector.io.

We named this tool darklens, which makes Docker logs and container settings available in your browser. A week after the sprint we launched darklens on Product Hunt for an impressive #14 spot with 140 upvotes.

November – Team Building in Portugal

Over the summer, the whole team was able to snag developer tickets to Web Summit in Lisbon. As soon as we got the confirmation, we started planning our travel to Portugal. With a little sightseeing and networking at the conference, the week we spent in Lisbon turned out to be a blast. We made a lot of new connections.

One of the coolest things of the year was when people found the invitation card for our CTF puzzle and came to our Discord channel or stopped by to say hi at Web Summit.

December – 0.10.0 Dropped

The latest release of dyrector.io, 0.10.0, dropped in early December. You can find out more about it on GitHub.

That’s it for 2023. So long, and thanks for all the fish!

This blogpost was written by the team of dyrector.io. dyrector.io is an open-source continuous delivery & deployment platform with version management.

Support us with a star on GitHub.

5 Use Cases When Containerization Is Absolutely Useless for You

Geri Máté — Thu, 30 Nov 2023 14:19:59 +0000

#1 Static, Unchanging Environments

If your application has minimal dependencies and operates consistently across different environments without the need for isolation, containerization may offer little benefit.

Example:

If your application will be the only process executed on the machine.

#2 Limited Scalability Needs

For applications with predictable and steady workloads that do not require rapid scaling or dynamic resource allocation, the overhead of containerization might outweigh the advantages.

Example:

Small scale IoT apps.

#3 Simple, Standalone Applications

In cases where your application is straightforward, lacks dependencies, and isn't part of a larger ecosystem with varied technologies, containerization may introduce unnecessary complexity.

Example:

Zero-dependency binaries. Debugging a host process is also more straightforward than debugging the same process inside a container.
Offline applications installed from an external medium, running without an internet connection.

#4 Resource-Constrained Environments

On systems with extremely limited resources, such as embedded devices or constrained hardware, the overhead of running containerization platforms might not be justified.

Example:

Microelectronics.

#5 Desktop Applications

Sounds exotic, huh? For a good reason. It would be very unusual to use containers for desktop applications. Though similar isolation techniques exist, they are not widespread.

Example:

cs_16_nosteam_portable.exe😅

If You Really Need to Containerize...

You can use dyrector.io to deploy and manage containerized services.

⭐ Star dyrector.io on GitHub:

https://github.com/dyrector-io/dyrectorio

Dagger 101: How to Get Started with Containerized CI Workflows

Geri Máté — Thu, 23 Nov 2023 11:04:26 +0000

Continuous Integration and Continuous Delivery are the secret sauces of shipping new features consistently and reliably to your software. However, the effectiveness of this process is closely tied to the tooling that orchestrates it. Some of the pain points of CI/CD systems are slow feedback loops, vendor lock-in, lack of abstraction, limited composability, or YAML itself. This is where Dagger comes into the spotlight, promising a more unified and accelerated path.

Introduction

The development and deployment process at dyrector.io has become faster each year as we adopt and integrate better tools and methods. However, we aim to further unify and accelerate it. Dagger's philosophy aligns with what we consider crucial for a truly rapid and seamless process:

Local testing: Enable developers to test their code instantly, locally
Programmable CI: Replace messy YAML-based, complex CI with code
Compatibility: If it runs in a container, you can add it to your pipeline
Portability: The same pipeline can run on your local machine, a CI runner, a dedicated server, or any container hosting service
Universal caching: Every operation is cached by default, and caching works the same everywhere

Currently, we have the option to use our own dyrector.io (we’ll refer to it as dyo many times in this blog post) Go CLI with our commands or Docker Compose with its YAML to spin up our stack for local testing, while we also maintain a GitHub Actions workflow for running end-to-end tests on GitHub. This setup lacks coherence, as we cannot use the specialized GitHub Actions workflow YAML locally or in a different CI/CD environment.

We want to get closer to being able to ship every single day, or even multiple times a day, as quickly as we possibly can, using the same tool running locally and in CI. Dagger feels like an actual innovation in CI/CD, and it seems it will enable us to do that. There is also a strong focus on getting feedback from the community and utilizing it when we’re designing and building something that people really need.

Setting up Dagger CI/CD

We would like to use Dagger locally with the dyo Go CLI, and for this we need the Dagger Go SDK for integration (there are many Dagger SDKs) and the Dagger Engine, which will run our pipelines. We developed a small proof of concept (POC) to evaluate if we could use our entire stack locally with Dagger. If this POC is successful, we plan to use the same setup in our GitHub workflow, essentially using GitHub Actions just to trigger the Dagger pipeline.

Steps to set up Dagger for our project:

Install the Dagger Go SDK (again, you can use any other Dagger SDK for your project, but we use Go)
Go to your existing project – in our case it is dyrectorio.

$ go get dagger.io/dagger
$ go mod tidy

Add local Dagger test to our Makefile
It is for simple and fast “make test” (similarly to our other commands).

# Shortcut for local testing (the recipe line must be indented with a tab)
.PHONY: test
test:
	go run golang/cmd/dagger/main.go

Create Dagger main.go
We already have dyo, dagent and crane in our golang/cmd, so put dagger here too.
Import Dagger SDK

Create a Dagger client using the SDK
This will allow you to interact with the Dagger Engine and create pipelines.

Create Dagger pipelines

Additional note:
We can also install the Dagger CLI if we want to, but this is an optional tool to interact with the Dagger Engine from the command-line – it has a nice terminal UI though, with parallel progress bars that are visually impressive if you are into that sort of thing.

Install the Dagger CLI

$ cd /usr/local
$ curl -L https://dl.dagger.io/dagger/install.sh | sh

Workflow Integration

As you will see, the “Dagger way” is a very “Docker-ish” way. No surprise: one of the co-founders of Dagger is Solomon Hykes, previously the founder and CTO of Docker.

To show you concrete code examples from our POC:

Import Dagger SDK
In our main.go:

import (
    "context"
    "dagger.io/dagger"
    …)

Create a Dagger client using the SDK

func initDaggerClient(ctx context.Context) *dagger.Client {
    client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stdout))
    if err != nil {
        panic(err)
    }
    return client
}

And we can call this initDaggerClient() function in our main() like this:

    ctx := context.Background()
    client := initDaggerClient(ctx)
    defer client.Close()

Run unit tests on our NestJS-based Crux backend:

func runCruxUnitTestPipeline(ctx context.Context, client *dagger.Client) {
    log.Info().Msg("Run crux unit test pipeline...")

    _, err := client.Container().From("node:20-alpine").
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "test"}).
        Stdout(ctx)
    if err != nil {
        panic(err)
    }

    log.Info().Msg("Crux unit test pipeline done.")
}

We can call this runCruxUnitTestPipeline() function in our main():

    runCruxUnitTestPipeline(ctx, client)

Running unit tests on our Next.js-based Crux UI frontend is very similar to the above code: we only need to change the host directory to “web/crux-ui/” and add a “.next” exclusion. Everything else remains the same:

    WithDirectory("/src", client.Host().Directory("web/crux-ui/"), dagger.ContainerWithDirectoryOpts{
        Exclude: []string{"node_modules", ".next"},
    }).

A slightly more advanced example runs our Crux backend in production mode (as we do for e2e tests) with a connected PostgreSQL DB service container:

func getEnv(envPath string) map[string]string {
    cruxEnv, err := godotenv.Read(envPath)
    if err != nil {
        panic(err)
    }
    return cruxEnv
}

func getCruxPostgres(client *dagger.Client, cruxEnv map[string]string) *dagger.Container {
    databaseURL := cruxEnv["DATABASE_URL"]
    parsedURL, err := url.Parse(databaseURL)
    if err != nil {
        panic(err)
    }
    postgresUsername := parsedURL.User.Username()
    postgresPassword, _ := parsedURL.User.Password()
    postgresDB := strings.TrimPrefix(parsedURL.Path, "/")

    dataCache := client.CacheVolume("data")

    cruxPostgres := client.Pipeline("crux-postgres").Container().From("postgres:14.2-alpine").
        WithMountedCache("/data", dataCache).
        WithEnvVariable("POSTGRES_USER", postgresUsername).
        WithEnvVariable("POSTGRES_PASSWORD", postgresPassword).
        WithEnvVariable("POSTGRES_DB", postgresDB).
        WithEnvVariable("PGDATA", "/data/postgres").
        WithExposedPort(5432)

    return cruxPostgres
}

func runCruxProd(ctx context.Context, client *dagger.Client, cruxPostgres *dagger.Container) *dagger.Container {
    crux := client.Pipeline("crux").Container().From("node:20-alpine")
    crux = crux.
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithServiceBinding("localhost", cruxPostgres).
        // WithEnvVariable("NOCACHE", time.Now().String()).
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "build"}).
        WithExec([]string{"npm", "run", "prisma:migrate"}).
        WithExec([]string{"npm", "run", "start:prod"})

    _, err := crux.Stdout(ctx)
    if err != nil {
        panic(err)
    }

    return crux
}

We can run the above code in our main() like this:

    cruxEnv := getEnv("web/crux/.env") 
    cruxPostgres := getCruxPostgres(client, cruxEnv) 
    runCruxProd(ctx, client, cruxPostgres)
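
If you're wondering what getCruxPostgres() actually pulls out of DATABASE_URL, here's a stdlib-only sketch of just the parsing step, without Dagger. The ParseDatabaseURL name and the sample connection string are made up for this example; your own .env will contain different values.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ParseDatabaseURL extracts the user, password and database name from a
// PostgreSQL connection URL, mirroring the parsing in getCruxPostgres().
func ParseDatabaseURL(raw string) (string, string, string, error) {
	parsed, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	password, _ := parsed.User.Password()
	// The path is "/<database>", query parameters are not part of it.
	db := strings.TrimPrefix(parsed.Path, "/")
	return parsed.User.Username(), password, db, nil
}

func main() {
	// Hypothetical connection string for illustration only.
	user, pass, db, err := ParseDatabaseURL("postgresql://crux:secret@localhost:5432/crux?schema=public")
	if err != nil {
		panic(err)
	}
	fmt.Println(user, pass, db)
}
```

These three values are what end up in the POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB environment variables of the service container, so the database matches what Crux expects to connect to.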

We would like to note that we built our POC with Dagger 0.8.x in September, so the Dagger code above reflects that. Even then, the new Services v2 API (which we will need for our complex e2e pipeline) was in development at Dagger in a separate feature branch, and they promised on their Discord that this new API, with some breaking changes, would be included in Dagger 0.9. It wasn’t just us asking for parallel long-running service containers. They kept their word: it is indeed included in Dagger 0.9.0, released at the end of October. Shouts to Team Dagger!

We put our POC on hold in October, but we have been keeping an eye on Services v2 developments and news. We will try out Services v2 in the near future and dedicate another blog post to whether we managed to solve our entire e2e pipeline with Dagger.

Dagger efficiently caches each step of the pipelines, automatically handling the caching of source code copies, containers and builds, and when developers configure it programmatically, it also caches mounted volumes such as database data, node_modules, and Go build-cache. Our logs provide clear examples of this on reruns without code modifications.

    copy web/crux/ CACHED
    > in host.directory web/crux/
    …
    pull docker.io/library/postgres:14.2-alpine CACHED
    > in crux-postgres > from postgres:14.2-alpine
    > in crux > service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    …
    exec docker-entrypoint.sh postgres
    > in crux > service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    [0.15s] PostgreSQL Database directory appears to contain a database; Skipping initialization
    …
    [0.30s] 2023-11-08 10::11.131 UTC [15] LOG:  database system is ready to accept connections
    ...
    exec docker-entrypoint.sh npm run build CACHED
    > in crux
    exec docker-entrypoint.sh npm run prisma:migrate CACHED
    > in crux
    exec docker-entrypoint.sh npm ci CACHED
    > in crux
    copy / /src CACHED
    > in crux
    exec docker-entrypoint.sh npm run start:prod
    > in crux
    [0.57s] > crux@0.7.0 start:prod
    [0.57s] > node dist/main
    [2.31s] [Nest] 33  - 11/07/2023, 14:24:13.142 AM     LOG [NestFactory] Starting Nest application...
    ...
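
To picture why most steps in the log above show up as CACHED on a rerun, here's a toy model in Go. It is not how Dagger implements caching internally; it only illustrates the general idea of keying a step on its command plus a digest of its inputs. The StepCache type and all names in it are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// StepCache memoizes pipeline steps by a key derived from the step's
// command and its input. A toy model for illustration only.
type StepCache struct {
	results map[[32]byte]string
	Runs    int // number of steps that actually executed
}

func NewStepCache() *StepCache {
	return &StepCache{results: make(map[[32]byte]string)}
}

// Exec runs fn only if this (command, input) pair has not been seen before.
// It returns the step output and whether it was served from the cache.
func (c *StepCache) Exec(command, input string, fn func() string) (string, bool) {
	key := sha256.Sum256([]byte(command + "\x00" + input))
	if out, ok := c.results[key]; ok {
		return out, true
	}
	c.Runs++
	out := fn()
	c.results[key] = out
	return out, false
}

func main() {
	cache := NewStepCache()
	build := func() string { return "build output" }

	_, cached := cache.Exec("npm run build", "source-v1", build)
	fmt.Println("first run cached:", cached)
	_, cached = cache.Exec("npm run build", "source-v1", build)
	fmt.Println("rerun cached:", cached)
	_, cached = cache.Exec("npm run build", "source-v2", build)
	fmt.Println("after code change cached:", cached)
}
```

The same reasoning explains the commented-out NOCACHE environment variable in runCruxProd(): changing an input such as an env value changes the key, which forces the following steps to run again.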

Challenges and Lessons Learned

We were able to run most of our stack with Dagger 0.8.x, the Crux backend and the Crux UI frontend separately, but our entire e2e test will require Dagger 0.9.x with the Services v2 API, so that we can run Crux, Crux UI, Traefik and Kratos as long-running service containers for the Playwright e2e container.

If you want to know more about the Services v2, Dagger wrote a blog post about it here:

Dagger 0.9: Host-to-container, container-to-host, and other networking improvements: https://dagger.io/blog/dagger-0-9

Best Practices for Dagger CI/CD

The fact that we can write the CI/CD code in Go and in a Docker-like style had a refreshing effect on us. Here are some general tips:

Iterate small: Start with a small POC to understand how Dagger fits into your workflow before scaling up
Community engagement: Stay active in Dagger's community forums or Discord channels for support and to keep up with the latest developments
Documentation: Keep your Dagger configurations well-documented to ease onboarding and maintenance
Monitor and optimize: Regularly review the performance of your pipelines and optimize caching strategies for better efficiency

Conclusion

We have seen firsthand the transformative nature of Dagger and the flexibility of its programmable pipelines. It stands out as a forward-thinking solution, addressing typical CI/CD bottlenecks with a developer-centric approach. Since Dagger is relatively new and evolving, keeping an eye on updates and community feedback can help in adopting best practices as they emerge.

Dagger Resources

There's still a lot to learn about Dagger, so it might be worth checking out the following resources:

You can explore further on Dagger's official website: https://dagger.io
For those eager to dive deeper into Dagger's capabilities, the Dagger documentation is an excellent resource: https://docs.dagger.io
For absolute hackers: https://github.com/dagger/dagger
Dagger Discord community: https://discord.gg/dagger-io

The One API - DevHunt Digest #6

Geri Máté — Mon, 13 Nov 2023 09:48:53 +0000

DevHunt is the open-source platform where you can showcase your developer tool. Tools compete every week for the top spot. Here's a look at who's in the race this time.

Unified.to

Unified is an API platform meant to replace individual API integrations: developers integrate Unified once and get access to 127 integrations.

After signing up, I was immediately directed to the dashboard, which shows statistics about my integrations. Onboarding is easy, and I like that they point to the resources you'd need in case you get stuck. I also like that the documentation isn't hidden somewhere: navigate to the Help menu and you can check the docs.

At first glance documentation might feel weird with all the section namings, but I liked that you can navigate to the integration's documentation that you'd like to use. Pretty good tool in general!

Papermark

Papermark is an open-source DocSend alternative. Checking out the landing page, I'm not sure whether I like the dude with sign image. First impression: I don't care if you're looking for an investor. BUT, when I think about it, it's a good signal for expectations.

And let me tell you: Papermark is very good at what it's supposed to achieve. Send a pitch deck in PDF format and get analytics about it. I haven't tried setting it up for myself, but I might give it a try one day when I feel like it.

Another thing I liked about the landing page is the alternatives section in the footer. As a user you're probably not familiar with what you can do with Papermark, but especially if you're working in a startup, it's realistic that you're using some kind of tool to send decks and such. Pretty useful to have some comparison to other tools.

Task Badger

Task Badger is a monitoring solution for backend tasks and queues. It's a useful tool when you'd like to visualize your backend's performance.

Task Badger is designed for engineers and they included lots of examples to provide starting points. Right next to the sign up button, they included a button that directs users to documentation - another brownie point for Task Badger.

I like the docs, too, but I think they could be improved by turning some of the individual sections into separate, smaller sections. For example, the quick start guide can be broken down into separate sections for API and CLI users. I've found a weird thing though: this page of the documentation can't be accessed from the sections list or the table of contents, just through a link in the getting started guide. I wouldn't hide it.

Pontus

Pontus is a privacy-focussed AI tool. I can't try or look at it because you can only request a demo as of now.

Recombinant AI

Recombinant AI is a conversational IDE tool which, based on this demo video, can only be used with paid access to ChatGPT because it's essentially a ChatGPT plugin. It's not easy to find out more about the project, as the landing page itself isn't really informative about what you can do with Recombinant AI.

DailyDomains

DailyDomains is a simple tool for domain hoarding enthusiasts. I mean, 2023 was the first year I purchased a domain and I didn't stop there. I assume there are many people who just think about an idea and immediately buy the domain knowing well they'll never make the solution.

Anyway, DailyDomains takes it a step further. It'll suggest you a few domains and generate a business idea for it. I kind of like this approach! For the small price of $12/month, you can use it to brainstorm domains for your business idea, which is probably a gamechanger to any indie hacker struggling to name their thing.

My only question is: how come this has so few upvotes days into voting?

Squirrelsong

Squirrelsong is a low-contrast light and dark theme. You can find out more about the themes here. I recommend at least a look at this, because I tried it with Google Chrome and it looks great.

Maruti.io

Maruti.io is an API for open-source language models. Based on the landing page it's difficult to figure out what the purpose of this project is, but after looking around and checking out the launch, it seems like an MLOps platform that can be used via an API.

I think there's a lot to improve, because documentation is very rudimentary, you can see it for yourself here. And as someone who's not a native English speaker, I can understand how difficult it can be to write copy, but the lack of copy is a bigger problem than the quality of the copy.

Vite Plugin

Vite Plugin is an open-source plugin that removes React.js attributes. It's useful for excluding attributes like 'data-testid' used in testing. Options include specific file extensions, attributes, ignored folders, and files.

blogfactory.dev

Blog Factory is a blog post generator tool. Getting started needs a bit of fixing: when you click on the Get started button, you should be directed to the login page. Without logging in, you're stuck in the Create your first article flow where you can't do anything.

When you'd like to generate a blog post, you can specify a title and keywords, then set style-related options, including language, flavor (SEO friendly article, how-to guides, etc.) and writer. It has a persona option, too, but the only option for that is none as of now.

I gave it a test run to write a how-to blog post similar to the latest one on dyrector.io's blog, which discusses self-hosted GitHub runners.

Of course, it's not going to be as detailed as written by a human, and our case is very specific when it comes to GitHub runners, but I think there's potential in Blog Factory. It would be pretty dope to have a tool that can accelerate content writing for developer tools, because small teams and indie hackers usually can't find a way to consistently create new content.

NoCode Animations

NoCode Animations is an Anime.js animation tool for Bubble.

SkillAI

SkillAI is an AI generator tool that helps you design learning paths for skills you'd like to develop. You can input any skill you'd like, so I went with this below:

Bricks AI

Bricks AI is a tool that helps teams use business applications in a conversational way. Right now it can't be used; only a waiting list is available.

Bird Eats Bug

Bird Eats Bug is a tool that helps you manage bug reports and fixes more efficiently. One of the coolest things about this is the bug replay feature which allows you to recreate bugs that you missed tracking somehow.

Kropply

Kropply is a coding assistant tool that helps you discover bugs within your code. It works as a VS Code extension. It's compatible with some of the most popular languages: C#, C++, C, Java, Go, Rust, JS, TS, Python.

That's it for the weekly batch of developer tools that launched on DevHunt. What's your favorite project out of them? Leave it in the comments and show some love by casting a vote!

Why You Should Self-Host GitHub Runners – Or Stay Away from It

Geri Máté — Wed, 08 Nov 2023 12:23:24 +0000

GitHub Actions is the Alfred to your Batman. When you don’t feel like doing something or simply don’t have the capacity to handle various tasks, you can rely on GitHub Actions to automate workflows. You can take GitHub Actions to the next level by self-hosting runners, though. But should you? Let’s find out!

Why Self-Hosted GitHub Runners Are Beneficial

We’ve been managing dyrector.io’s code on GitHub for more than a year now. One thing we’ve always struggled with was slow GitHub Actions workflows. Here’s why we’ve been contemplating switching to self-hosting our GitHub Runners.

Speed

The default GitHub runner takes longer to execute as it initializes an ephemeral runner for each job in a workflow from scratch. This method, chosen by GitHub for its simplicity and security, has its merits. Compared to this, self-hosted runners remain active, bypassing the initialization phase for every job, thus providing quicker execution. This continuous availability demands proper management to ensure subsequent runs are not interfered with by remnants from previous executions.

Control

GitHub-hosted runners run on Ubuntu with a 2-core CPU, limiting parallel job executions to four. In a self-hosted scenario, we have the liberty to choose other OSes. We opted for Rocky Linux over Ubuntu because it is open source, enterprise-grade, and 100% compatible with Red Hat. This choice also allowed us to define the VM's hardware parameters, such as CPU, memory, and disk type and size. However, this freedom comes at the cost of increased maintenance overhead.

Debugging / Monitoring

Debugging is more challenging on GitHub-hosted runners, as only error messages and logs are retained. In contrast, self-hosted runners keep everything in the “_work” and “_diag” directories. Because the running VM is under our control, we can also monitor it in real time to understand precisely what is happening and which resources are being consumed.

As we look into the future and explore opportunities for further improvement:

Writing in YAML, especially for CI/CD purposes, often necessitates additional scripts to handle various build and runtime conditions in a workflow. This can result in a fragmented view of the process.
Alternatively, or in addition, leveraging the power of Dagger CI/CD could offer a more streamlined approach to creating workflows. Dagger CI/CD allows you to use real programming languages through the Dagger SDK.
For example, we have chosen to use the Dagger Go SDK, which enables the creation of unified workflows. These workflows can run seamlessly, whether it's locally, on GitHub-hosted runners, self-hosted runners, or other CI/CD frameworks, with minimal or no need for significant modifications. This approach entirely avoids the need for extensive YAML configurations, providing a more efficient and flexible way to manage your CI/CD pipelines.

A Few Reasons Why You Shouldn’t Self-Host Runners

Convenience

The default GitHub-hosted runners are free and scale automatically with the number of pull requests submitted in parallel, so you don't have to do anything: they are simply there, doing their job. We lose this default behavior if we go down the self-hosted route.

Setup/Maintenance

The initial setup involves a learning curve, and maintaining the runners can take a fair share of time. The effort lies less in setting up the runners themselves and more in maintaining, updating, and securing the VM(s), and in configuring the workflow correctly from the start so that it handles cleanup and teardown for every job and job step where necessary.

Security Concerns

Self-hosted runners may expose your environment to security risks if not configured and managed properly. Even GitHub recommends in its official docs that self-hosted runners be used only with private repositories. Here's a more detailed description of security measures for GitHub runners.

Setting Up Self-Hosted Runners

Ensure your system meets GitHub's minimum requirements, which include:

2-core CPU
7 GB RAM
14 GB SSD storage

We used a larger machine with the following specifications:

16 vCPUs
32 GiB memory
Initially, 16 GB SSD (later upgraded to 64 GB)

The upgrade was necessary due to the combined temporary space needs of our code, Node.js, and about 10 Docker containers, including a Playwright container for testing. Our runners resided on an additional data disk, leaving about 8 GB free on the system disk.

Instead of using multiple small VMs with one runner each, we chose to use one large VM hosting several parallel runners. This approach minimizes VM maintenance overhead and is designed to efficiently handle multiple parallel GitHub pull requests.

Future scaling is straightforward as setting up additional runners and/or VMs is not complicated; runners distribute workflow jobs based on common labels regardless of their VM location.

We set up our self-hosted runners with the steps below. We show actions-runner-001 here, but runners 002, 003, and so on were set up the same way.

Step 1: Create a new runner

In your GitHub repository’s Settings, click Actions in the left sidebar, then Runners, and finally New self-hosted runner. Select the OS image and architecture of your self-hosted runner machine. In our case, these are Linux and x64.

Step 2: Download the runner installer

# Create a folder
$ mkdir actions-runner-001 && cd actions-runner-001
# Download the latest runner package
$ curl -o actions-runner-linux-x64-2.311.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz
# Optional: Validate the hash
# On Rocky Linux you may need to install shasum once for this validation
$ sudo dnf update
$ sudo dnf install -y perl-Digest-SHA
$ echo "29fc8cf2dab4c195bb147384e7e2c94cfd4d4022c793b346a6175435265aa278  actions-runner-linux-x64-2.311.0.tar.gz" | shasum -a 256 -c
# Extract the installer
$ tar xzf ./actions-runner-linux-x64-2.311.0.tar.gz
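As a possible alternative to installing shasum, the same check can be sketched with sha256sum, which is part of GNU coreutils. The function name verify_sha256 is made up for this illustration and is not part of the runner package.

```shell
# Sketch: verify a downloaded file against an expected SHA-256 hash.
# verify_sha256 is a hypothetical helper name, not a runner command.
verify_sha256() {
  local expected="$1" file="$2"
  # sha256sum -c reads "<hash>  <file>" lines (two spaces) from stdin
  # and returns a non-zero status when the hash does not match.
  echo "${expected}  ${file}" | sha256sum -c - >/dev/null 2>&1
}

# Example usage with the archive downloaded above:
# verify_sha256 "29fc8cf2dab4c195bb147384e7e2c94cfd4d4022c793b346a6175435265aa278" \
#   actions-runner-linux-x64-2.311.0.tar.gz && echo "checksum OK"
```

The function only reports success or failure through its exit status, so it can be chained with && before the extract step.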

Step 3: Install runner dependencies (if needed)

We only need to do this step once per VM, not per runner. You can skip this step if your OS already contains these dependencies, but for Rocky Linux 9.2 it was necessary.

# Install dependencies (on Rocky Linux dotnet core 6 was missing by default) 
$ sudo ./bin/installdependencies.sh

We also installed Node.js, Go, and Docker on our VM for our workflow, but these are not runner dependencies, so we will not go into detail about them here.

Step 4: Configure the runner

# Create the runner and configure it
$ ./config.sh --url https://github.com/dyrector-io/dyrectorio --token <RUNNER_TOKEN>

During the configuration process, you can keep most settings at their default values, but we chose to make our runners easily identifiable by giving them unique names and adding extra labels. Initially, the configuration script provides a common name, but our objective was to test multiple runners on a single VM.

By default, a runner is tagged with three labels for Linux x64: self-hosted, Linux, and X64. However, you have the flexibility to specify additional labels during the initial configuration or later on the GitHub repository website. Unlike the default labels, you can add or remove these custom labels at any time. These labels come in handy for targeting specific groups of runners or individual runners within your workflow.
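To illustrate unique names and extra labels, here is a small dry-run sketch that only prints the config.sh command for each runner instead of executing it. It assumes the non-interactive flags of config.sh (--unattended, --name, --labels). The labels rocky and large-vm are invented for this example, and <RUNNER_TOKEN> stays a placeholder.

```shell
# Sketch: print (do not run) the configuration command for one runner.
# runner_config_cmd is a hypothetical helper; the labels passed to it
# below are example values, not the ones used by dyrector.io.
runner_config_cmd() {
  local index="$1" labels="$2"
  printf './config.sh --unattended --url %s --token %s --name %s --labels %s\n' \
    "https://github.com/dyrector-io/dyrectorio" "<RUNNER_TOKEN>" \
    "actions-runner-${index}" "${labels}"
}

# One command per runner directory on the same VM
for i in 001 002 003; do
  runner_config_cmd "$i" "rocky,large-vm"
done
```

Each printed command would be run from the matching runner directory after replacing the token placeholder with a real registration token.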

Step 5: Set pre-job script

The pre-job script is optional, but we needed it.
In the runner directory, create a .env file with this content:

ACTIONS_RUNNER_HOOK_JOB_STARTED=pre-job-script.sh

In the pre-job bash script file, you can add any VM-specific logic that should run before every job. It is important to end the script with “exit 0”, which signals that the script ran without errors. If the script returns any other value, the runner will not proceed with the job. You can also use this to your advantage for pre-checks.
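As an illustration of such a pre-check, the sketch below refuses to start a job when free disk space is low. The threshold, the checked directory, and the variable names MIN_FREE_KB and RUNNER_CHECK_DIR are assumptions for this example, not our actual script. Note that GitHub's documentation describes the hook value as an absolute path to the script, so a full path may be safer than a bare file name.

```shell
#!/usr/bin/env bash
# pre-job-script.sh (sketch): stop the job early when disk space is low.
# MIN_FREE_KB and RUNNER_CHECK_DIR are hypothetical names for this example.

free_kb() {
  # df -P prints one POSIX-format line per filesystem;
  # column 4 is the available space in KiB when -k is used.
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

pre_job_check() {
  local dir="$1" min_kb="$2" avail
  avail="$(free_kb "$dir")"
  if [ "$avail" -lt "$min_kb" ]; then
    echo "Only ${avail} KiB free in ${dir}, need ${min_kb}" >&2
    return 1
  fi
  return 0
}

main() {
  # Default: require 1 GiB free in the current directory
  pre_job_check "${RUNNER_CHECK_DIR:-$PWD}" "${MIN_FREE_KB:-1048576}"
}

# As the last line of the real hook script, propagate the result:
#   main; exit $?
```

With that last line enabled, a passing check ends the script with exit code 0, and a failing check returns a non-zero code so the job does not proceed.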

Step 6: Start the runner

You can start the runner with its run script ($ ./run.sh), but we wanted to run it as a service, so the service has to be installed first. On Rocky Linux, we also need to set the SELinux security context of the runsvc.sh file so that it operates correctly within the SELinux security policy (otherwise it is blocked). Setting the SELinux context and installing the service only need to be done once.

# Set SELinux context for the runsvc script to s0 (standard security level)
$ sudo chcon system_u:object_r:usr_t:s0 runsvc.sh
# Install the runner as a service
$ sudo ./svc.sh install

Now we can use the service with its start, stop, status commands.

# Start the runner service
$ sudo ./svc.sh start

After completing these steps, the runner and its status are now listed under "Runners" of the GitHub repository.

Step 7: Execute a workflow on self-hosted runners

In your workflow file, use the following YAML for each job, adjusting the label(s) as per your runner configuration:

runs-on: self-hosted
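The label does not have to be hard-coded. One possible approach, sketched here under assumptions, is to let an earlier job decide the label and publish it as an output. The check below compares the pull request's source repository with the target repository, which is a simplification chosen for this example. The function name and the HEAD_REPO and BASE_REPO variables are hypothetical.

```shell
# Sketch: choose a runner label for the jobs that follow.
# $1: repository the pull request comes from, $2: the target repository.
# select_runner_label, HEAD_REPO and BASE_REPO are hypothetical names.
select_runner_label() {
  local head_repo="$1" base_repo="$2"
  if [ "$head_repo" = "$base_repo" ]; then
    echo "self-hosted"      # branches of our own repository
  else
    echo "ubuntu-latest"    # forks go to GitHub-hosted runners
  fi
}

# Inside a workflow step, the result could be written as a step output:
# echo "runner=$(select_runner_label "$HEAD_REPO" "$BASE_REPO")" >> "$GITHUB_OUTPUT"
```

Later jobs can then reference that output in their runs-on field instead of a fixed label.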

Security Tips

Additional security measures for our public open-source repository:

We use a CODEOWNERS file for our repository.
In the repository settings, we have the "Require approval for all outside collaborators" option enabled instead of the default "Require approval for first-time contributors".
Before allowing any external pull requests to run, we check if any workflow files have been modified! (It is easy to spot if anything appears in .github/workflows, without much approval overhead)
We run our self-hosted GitHub runner on an isolated Azure VM in its own resource group.
We take care of updating the runner VM's OS to ensure it is always up to date from a security perspective.
We run external pull requests on a GitHub-hosted runner, while we run our own pull requests on our self-hosted runner. This is determined by a mandatory pre-job in our workflows, which assigns the appropriate "runs-on" label to the subsequent jobs based on the submitter's identity.
In the runner's “_diag” and “_work” directories, we can review diagnostic logs for both the workflow runs and the runner itself, as well as the checked-out code in the "workflows private" directory.
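The check for modified workflow files can be partly automated. The following is a minimal sketch, not our actual tooling: it reads changed file paths from standard input and reports any that live under .github/workflows. The function name and the branch name in the usage comment are assumptions.

```shell
# Sketch: read changed file paths on stdin, print those under
# .github/workflows/, and return 0 only if at least one was found.
# workflow_files_changed is a hypothetical helper name.
workflow_files_changed() {
  grep -E '^\.github/workflows/'
}

# Example usage in a checked-out pull request (branch name assumed):
# if git diff --name-only origin/develop...HEAD | workflow_files_changed; then
#   echo "Workflow files were modified, review before approving the run" >&2
# fi
```

Because the function relies on grep's exit status, it works equally well in an if statement or as a guard before approving a run.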

Conclusion

Self-hosted GitHub runners offer more freedom and a level of control that can significantly boost the efficiency of your development workflow. However, they come with the overhead of setup, maintenance, and potential security concerns. Assessing your project’s needs and your team’s capacity to manage self-hosted runners is crucial before diving in. With proper setup and management, self-hosted runners can indeed be a valuable asset to your development process.

This blog post was written by the team of dyrector.io. dyrector.io is an open-source continuous delivery & deployment platform with version management.

Support us with a star on GitHub.