Forem: Martin Muller 🇩🇪🇧🇷🇵🇹

How I Use OpenClaw as My AI-Powered Personal Operating System

Martin Muller 🇩🇪🇧🇷🇵🇹 — Wed, 01 Apr 2026 00:00:00 +0000

I've been running OpenClaw on a Hostinger VPS for about a month now, and it has fundamentally changed how I work. What started as "let me try this AI agent thing" turned into a powerful assistant that saves me hours every week by automating routine tasks — from GitHub issue triage to email management.

OpenClaw doesn't do everything for me (yet), but it handles a lot of the repetitive work that used to eat up my day. Here's a deep dive into every use case I've discovered so far.

What is OpenClaw?

OpenClaw is an open-source AI agent platform that runs on your own infrastructure. You connect it to your chat channels (Telegram, WhatsApp, Discord), give it access to your tools, and it becomes a persistent assistant that remembers context across sessions. Think of it as your own self-hosted AI employee that's always on.

My setup: Docker container on a Hostinger VPS, with Telegram as my primary interface. Most of my interactions — voice messages, text, files — go through Telegram. It feels like texting a colleague who never sleeps.

Use Case 1: Autonomous GitHub Issue Management

This is the killer feature for me. OpenClaw monitors my HalloCasa repositories every 2 hours via a heartbeat. When a new issue appears, it:

Notifies me on Telegram
Generates a detailed implementation plan using Cursor CLI with Claude Opus for planning
Posts the plan as a comment on the GitHub issue
Waits for my approval before implementing
Implements the fix using Cursor CLI with Composer
Creates a branch, commits, pushes, and opens a PR

In the first week alone, it planned and implemented fixes for currency display bugs, locale changes, phone validation, and filter cleanup — all with minimal intervention from me.

The workflow follows the "Factory" pattern I heard about at Agentic Conf Hamburg: don't just use AI to build features — build a production line that builds features for you.

Use Case 2: Email Management

OpenClaw reads and sends emails via Himalaya CLI (IMAP/SMTP). I just tell it what to do on Telegram:

"Tell the sports group leader we can't come today" → Searched my contacts, found the match, sent a polite cancellation.
"Email my tax advisor that I need to reschedule" → Found the firm in contacts, sent a professional German email.
"Reply to the kindergarten about scheduling a visit" → Continued an existing email thread with context-aware reply.

It sends HTML emails with a professional signature and handles German and English.

Use Case 3: Morning Briefing (Cron Job)

Every morning at 7:00 AM (my timezone, Europe/Berlin), OpenClaw runs an automated check:

Scans unread emails from the last 24 hours
Checks yesterday's emails for pending follow-ups
Cross-references with my Google Calendar for today and tomorrow
Sends me a concise summary on Telegram

This runs on Claude Haiku (cheap) and costs almost nothing. I wake up to a briefing instead of manually checking three different apps.
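For illustration, the schedule above corresponds to the standard five-field cron expression `0 7 * * *` evaluated in the Europe/Berlin timezone. How OpenClaw stores its cron jobs is not shown in this post, so the sketch below only demonstrates the timezone-aware check with the standard Intl API. The function name `isBriefingTime` is made up for this example.

```typescript
// Hypothetical helper: true when `now` is 07:00 wall-clock time in the given timezone.
// This mirrors what the cron expression "0 7 * * *" means when evaluated in Europe/Berlin.
export function isBriefingTime(now: Date, timeZone = 'Europe/Berlin'): boolean {
  const parts = new Intl.DateTimeFormat('en-GB', {
    timeZone,
    hour: '2-digit',
    minute: '2-digit',
    hour12: false,
  }).formatToParts(now)
  const hour = Number(parts.find((p) => p.type === 'hour')?.value)
  const minute = Number(parts.find((p) => p.type === 'minute')?.value)
  return hour === 7 && minute === 0
}
```

The same local time maps to different UTC instants in summer and winter (05:00 UTC vs 06:00 UTC), which is why the job is pinned to a named timezone rather than a fixed UTC offset.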

Use Case 4: Calendar Management

OpenClaw has full access to my Google Calendar via OAuth. Examples:

Conference schedule: I sent a PDF of the Agentic Conf Hamburg schedule with sessions I'd marked with red boxes. It extracted the image, identified the marked sessions using vision AI, fetched details from the conference website, and created 9 calendar events with full descriptions, speaker info, and links.
Creating meetings: "Create a meeting with Phillip P. for today at 18:00 CET" → Done.

Use Case 5: Contact Lookup

Connected to my iCloud contacts via CardDAV, OpenClaw searches by name, organization, or context. When I say "email the sports group leader," it finds the match and uses the right email. No manual lookup needed.

Use Case 6: LinkedIn Post Drafting

I've used it to draft LinkedIn posts for:

AWS Community Day Athens 2026: Looked up all speakers from the conference website, compiled them alphabetically, created a professional announcement post.
Agentic Conf Hamburg recap: Searched for organizers' backgrounds, found LinkedIn profiles, referenced specific talks, and wove in my personal highlights about the "Factory" concept.

It does the research, I do the personal touch.

Use Case 7: Research & Due Diligence

Questions that would normally cost 15 minutes of Googling:

"Is there an app for biometric passport photos with QR codes?" → Comparison of available apps, including recent regulation changes.
"There's a conference in my area tomorrow, what is it?" → Found the event, full program, speakers, venue, and registration link.

Use Case 8: Document Generation

Birthday invitation: Generated a themed HTML invitation card with embedded images and QR code, published as a shareable link.
Application form: Built a bilingual job application form, deployed to GitHub Pages so it works when shared via WhatsApp.
Kindergarten applications: Generated PDF application forms from templates.
Email signatures: Set up HTML signatures in Gmail (via API) and Apple Mail.

Use Case 9: Workspace & Config Management

OpenClaw manages its own configuration:

Version-controls its workspace files in a Git repo
Adjusts its own heartbeat intervals and model settings when I ask
Maintains daily memory notes and long-term memory files for context continuity

Use Case 10: Global Brain via PeachBase

OpenClaw is connected to PeachBase — a serverless vector database that acts as my persistent memory across all AI agents. Via MCP, OpenClaw can store and retrieve knowledge: personal info, project decisions, contacts, learnings.

When I tell OpenClaw something worth remembering, it stores it in PeachBase. When I ask a question weeks later — even from a different agent like Cursor — the knowledge is there. It's the shared brain that ties everything together.

I wrote a dedicated post about this: My Global Brain with PeachBase.

Use Case 11: Multi-Channel Communication

I talk to OpenClaw primarily via Telegram, but it also:

Sends emails on my behalf (Himalaya/SMTP)
Is connected to WhatsApp
Can deliver cron job results to specific Telegram chats

Use Case 12: Writing This Blog Post

Meta moment: this very blog post was drafted by OpenClaw. I sent a voice message on Telegram (in German): "I want to write about OpenClaw and how I've been using it — go through all your history and find the use cases, in English please."

It scanned 23 days of daily memory notes, extracted every use case, researched the blog repo format, and produced a full draft in the right Gatsby frontmatter format — all in one turn. I just reviewed and tweaked.

This is the "Factory" idea in action: I didn't write a blog post. I told my agent to write one, and it had all the context it needed because it was there for every use case.

Security: It's a Spectrum

Giving an AI agent access to your email, contacts, calendar, and GitHub is powerful — but it's also a security conversation you need to have with yourself.

OpenClaw treats security as a spectrum, not a binary. You can start wide-open for convenience and tighten as you go. Here's what's available and what I use:

Sandboxing

OpenClaw supports Docker-based sandboxing where tool execution (shell commands, file reads/writes) runs inside an isolated container instead of directly on the host. You can choose:

"off" — everything runs on the host (my current setup, maximum convenience)
"non-main" — only non-main sessions (group chats, webhooks) are sandboxed
"all" — every session runs sandboxed

I'm still on "off" because I'm the only user and I trust the agent boundary. But if you're running OpenClaw on a shared machine or exposing it to group chats, sandboxing is a must.

Tool Policy

You can allowlist or denylist specific tools per agent. For example, you could disable exec (shell access) entirely and only allow read/write — or restrict which commands can run. OpenClaw also has an elevated exec model (think sudo) where dangerous commands require explicit approval.

Secrets

One thing I'd do differently: my early setup had API keys in config files. OpenClaw supports environment variables and secret refs instead. Move your credentials out of plaintext config.

The Human-in-the-Loop Rule

My most important security measure isn't technical — it's a rule in my agent's config: always ask before sending emails. After one incident where the agent sent an email I hadn't approved, I added a hard rule. The agent now shows me a draft and waits for "OK" before any outbound communication. This applies to anything that "leaves the machine" — emails, social posts, webhooks.
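The rule above lives in the agent's instructions, but the same idea can be enforced in code. The following is a minimal sketch of such an approval gate, not how OpenClaw implements it. `Draft`, `askUser`, `send`, and `sendWithApproval` are hypothetical names chosen for this example.

```typescript
// Sketch of an approval gate for outbound actions. All names are hypothetical.
type Draft = { to: string; subject: string; body: string }

export async function sendWithApproval(
  draft: Draft,
  askUser: (preview: string) => Promise<string>, // e.g. posts the draft to chat and waits for a reply
  send: (draft: Draft) => Promise<void>, // the real side effect (SMTP, webhook, ...)
): Promise<'sent' | 'cancelled'> {
  const preview = `To: ${draft.to}\nSubject: ${draft.subject}\n\n${draft.body}`
  const reply = await askUser(preview)
  // Only an explicit "OK" releases the message; any other reply cancels it.
  if (reply.trim().toLowerCase() !== 'ok') return 'cancelled'
  await send(draft)
  return 'sent'
}
```

Putting the check in front of the side effect means a prompt-injected instruction cannot skip it, because the only path to `send` runs through the user's reply.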

Network

The Gateway port (18789) should never be public. Mine is localhost-only inside Docker. If you need remote access, use Tailscale or an authenticated reverse proxy with TLS.

My Take

Security with AI agents is genuinely new territory. The threat model is different from traditional apps because the agent can be influenced by external content (prompt injection via emails, web pages). OpenClaw has tools to limit blast radius — sandboxing, tool policies, approval gates — but the most important thing is being deliberate about what you give access to and expanding gradually.

The Numbers

After a month of use, here's what surprised me:

Cost optimization matters: Running on Claude Opus 4.6 burns through API credits fast. Switching heartbeats to Haiku and lengthening the interval from 30 minutes to 2 hours made a huge difference.
Memory is everything: The daily notes + MEMORY.md system means I never have to re-explain context. It knows my projects, my contacts, my preferences.
HTML email was harder than expected: Getting Himalaya to send proper HTML emails with MML syntax took some trial and error. Plain text signatures don't have clickable links.
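As a rough check on the heartbeat change in the first point above: the interval alone, independent of the model switch, determines how many model calls happen per day. The helper name below is made up for this calculation.

```typescript
// Number of heartbeat runs per day for a given interval in minutes.
export function heartbeatsPerDay(intervalMinutes: number): number {
  return (24 * 60) / intervalMinutes
}

const before = heartbeatsPerDay(30) // 48 runs per day
const after = heartbeatsPerDay(120) // 12 runs per day
const reduction = 1 - after / before // 0.75, i.e. 75% fewer heartbeat calls
```

Going from 48 to 12 runs a day cuts heartbeat calls by three quarters before any per-token price difference between models is taken into account.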

What's Next

Blog post automation: Using OpenClaw to help draft and publish posts (like this one!)
Deeper GitHub integration: Auto-implementing approved plans without manual trigger
More cron jobs: Weather briefings, social media monitoring, calendar reminders

Conclusion

OpenClaw isn't just a chatbot. It's a personal operating system that happens to be powered by AI. The combination of persistent memory, tool access (email, calendar, contacts, GitHub, file system), and multi-channel communication makes it genuinely useful — not in a "cool demo" way, but in a "I saved 2 hours today" way.

If you're a developer comfortable with Docker and CLI tools, I highly recommend giving it a try. The learning curve is worth it.

🚀 I'm also looking for beta testers for PeachBase — the serverless vector DB I use as shared memory across all my AI agents. If you want to try it, sign up!

Thanks for reading! If you have questions about my setup, feel free to reach out. And if you'd like help setting up your own OpenClaw AI agent — whether it's configuration, tool integration, or building custom workflows — I'm available for consulting. Just drop me a message at office@martinmueller.dev or book a call at calendly.com/martinmueller_dev.

Strands TypeScript SDK - Building Production AI Agents

Martin Muller 🇩🇪🇧🇷🇵🇹 — Sat, 31 Jan 2026 00:00:00 +0000

Introduction

Building AI agents that work in production requires more than wrapping an LLM API. You need tool execution, streaming responses, cost management, and integration with existing systems. After evaluating several frameworks for ai-secure.dev, I chose the Strands TypeScript SDK from AWS.

Why Strands over alternatives?

Framework	Pros	Cons
LangChain	Feature-rich, large ecosystem	Heavy, complex abstractions
crewAI	Multi-agent orchestration, role-based agents	Python-focused, heavier runtime
Raw Anthropic/OpenAI API	Full control	Too low-level, no tool orchestration
Strands SDK	Lightweight, AWS-native, streaming-first	Newer, smaller community

Strands hits the sweet spot: enough abstraction to be productive, low-level enough to maintain control. It's what I used to build the security audit agent behind ai-secure.dev.

Agent Creation Basics

Creating an agent requires three things: a model, a system prompt, and tools.

import { Agent, tool } from '@strands-agents/sdk'
import { z } from 'zod'

const agent = new Agent({
  model, // BedrockModel or custom provider
  systemPrompt: `You are a security auditor...`,
  tools: [httpSecurityCheck, dnsLookup, browserNavigate],
})

// Invoke the agent
const response = await agent.invoke('Audit https://example.com')

// Or stream for real-time updates
for await (const event of agent.stream(prompt)) {
  // Handle events: text deltas, tool calls, metadata
}

The SDK handles the agentic loop: model generates response → tool calls extracted → tools executed → results fed back → repeat until done.
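To make that loop concrete, here is a conceptual sketch with a fake, synchronous model. It is not the Strands SDK implementation. `runLoop`, `ModelTurn`, `ToolFn`, and `Message` are hypothetical names, and a real loop would be asynchronous and stream its output.

```typescript
// Conceptual sketch of an agentic loop. NOT the Strands SDK implementation;
// all types and names here are hypothetical.
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text: string; toolCalls: ToolCall[] }
type ToolFn = (input: unknown) => string
type Message = { role: 'user' | 'assistant' | 'tool'; content: string }

export function runLoop(
  model: (history: Message[]) => ModelTurn,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxTurns = 10,
): string {
  const history: Message[] = [{ role: 'user', content: prompt }]
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(history) // model generates a response
    history.push({ role: 'assistant', content: reply.text })
    if (reply.toolCalls.length === 0) return reply.text // no tool calls left: done
    for (const call of reply.toolCalls) {
      // execute each requested tool and feed the result back into the history
      const tool = tools[call.name]
      const result = tool ? tool(call.input) : `Unknown tool: ${call.name}`
      history.push({ role: 'tool', content: result })
    }
  }
  throw new Error('maxTurns exceeded')
}
```

The `maxTurns` guard is a design choice for this sketch: without some upper bound, a model that keeps requesting tools would loop forever.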

Defining Tools

Tools are functions the agent can call. The tool() helper wraps them with Zod schema validation:

const calculatorTool = tool({
  name: 'calculator',
  description: 'Performs arithmetic. Params: operation, a, b',
  inputSchema: z.object({
    operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
    a: z.number(),
    b: z.number(),
  }),
  callback: (input) => {
    let result: number
    switch (input.operation) {
      case 'add': result = input.a + input.b; break
      case 'subtract': result = input.a - input.b; break
      // ...
    }
    return `Result: ${result}`
  },
})

For domain-specific agents, design tools around your use case. My security agent has tools like:

http_security_check - Headers, TLS inspection, redirect chain
dns_lookup - SPF/DMARC/CAA records
browser_navigate - Navigate and interact with pages
totp - Generate 2FA codes for authenticated scans

Complex tool example (abbreviated):

const httpSecurityCheckTool = tool({
  name: 'http_security_check',
  description: 'HTTP security analysis: headers, TLS cert, redirects',
  inputSchema: z.object({
    url: z.string().describe('URL to check'),
    method: z.enum(['GET', 'HEAD', 'OPTIONS']).optional(),
    includeTls: z.boolean().optional(),
  }),
  callback: async (input) => {
    // Make request, inspect TLS socket, check headers
    const securityHeaders = ['strict-transport-security', 'content-security-policy', ...]
    // ... implementation
    return JSON.stringify({ url, statusCode, securityHeaders, tls })
  },
})

Tools are the agent's "hands" - design them for your domain, not as generic utilities.

Custom Model Provider

The SDK includes BedrockModel for AWS Bedrock, but you can create custom providers. I built AnthropicModel for direct Anthropic API access with features like message caching:

export class AnthropicModel {
  constructor(config: AnthropicModelConfig) {
    this.client = new Anthropic({ apiKey: config.apiKey })
    this.config = {
      modelId: config.modelId || 'claude-sonnet-4-5-20250929',
      maxTokens: config.maxTokens || 16000,
      enableMessageCaching: config.enableMessageCaching ?? true,
    }
  }

  async *stream(messages, options) {
    // Convert messages to Anthropic format
    // Add cache_control blocks for cost reduction
    // Yield SDK-compatible events
  }
}

Message caching reduces input costs by up to 90% on repeated context, because cache reads are billed at a tenth of the regular input price. Add cache_control to strategic messages:

// Cache system prompt (reused every call)
request.system = [{
  type: 'text',
  text: systemPrompt,
  cache_control: { type: 'ephemeral', ttl: '1h' }
}]

// Cache last tool definition
tools[tools.length - 1].cache_control = { type: 'ephemeral', ttl: '1h' }

Cost tracking built into the model:

const MODEL_PRICING = {
  'claude-sonnet-4-5-20250929': { input: 3.00, output: 15.00, cacheRead: 0.30 },
  'claude-haiku-4-5-20251001': { input: 1.00, output: 5.00, cacheRead: 0.10 },
}

function calculateCost(modelId, inputTokens, outputTokens, cacheReadTokens) {
  const pricing = MODEL_PRICING[modelId]
  return (inputTokens * pricing.input + outputTokens * pricing.output 
          + cacheReadTokens * pricing.cacheRead) / 1_000_000
}
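To show where the 90% figure comes from, here is a typed, self-contained variant of the cost function above applied to 10,000 tokens of repeated context. The prices are the Sonnet 4.5 values from the pricing map. Cache writes are billed separately and are ignored here, as in the function above, so the first call in a session saves less. `costUsd` and `Pricing` are names made up for this sketch.

```typescript
// Prices in USD per 1M tokens.
type Pricing = { input: number; output: number; cacheRead: number }

const SONNET_4_5: Pricing = { input: 3.0, output: 15.0, cacheRead: 0.3 }

export function costUsd(
  pricing: Pricing,
  inputTokens: number,
  outputTokens: number,
  cacheReadTokens: number,
): number {
  return (
    (inputTokens * pricing.input +
      outputTokens * pricing.output +
      cacheReadTokens * pricing.cacheRead) /
    1_000_000
  )
}

// 10,000 tokens of repeated context, billed as regular input vs. as cache reads.
const uncached = costUsd(SONNET_4_5, 10_000, 0, 0) // about 0.03 USD
const cached = costUsd(SONNET_4_5, 0, 0, 10_000) // about 0.003 USD
const saving = 1 - cached / uncached // about 0.9
```

The saving applies only to the cached portion of the input. Output tokens and any uncached input are billed at the normal rate.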

Model Routing for Cost Optimization

Not every request needs your most powerful model. Route simple tasks to cheaper models:

function classifyTask(prompt: string) {
  const lower = prompt.toLowerCase()

  // Complex patterns → Sonnet
  const complexPatterns = [
    /security|vulnerabil|audit/i,
    /iso\s*27001|compliance/i,
    /investigate|analyze|assess/i,
  ]

  // Simple patterns → Haiku (3x cheaper than Sonnet per token at the prices listed in this post)
  const simplePatterns = [
    /^(hi|hello|hey)/i,
    /^(thanks|thank\s*you)/i,
    /^(yes|no|ok)/i,
  ]

  for (const pattern of complexPatterns) {
    if (pattern.test(prompt)) {
      return { complexity: 'complex', model: 'claude-sonnet-4-5' }
    }
  }

  for (const pattern of simplePatterns) {
    if (pattern.test(lower)) {
      return { complexity: 'simple', model: 'claude-haiku-4-5' }
    }
  }

  // URLs always complex (security audits need full power)
  if (prompt.includes('http://') || prompt.includes('https://')) {
    return { complexity: 'complex', model: 'claude-sonnet-4-5' }
  }

  return { complexity: 'complex', model: 'claude-sonnet-4-5' } // Default safe
}

Log cost comparisons in production to validate routing:

📊 Tokens: 15420 in, 2341 out | $0.0812 (sonnet-4-5)
   Alternative: $0.4102 (opus-4-5) → +$0.329 (+405%)

Streaming Architecture

For real-time UX, stream agent events via Server-Sent Events (SSE):

app.post('/invocations', async (req, res) => {
  // Assumes a JSON body parser and a `prompt` field in the request body
  const { prompt } = req.body

  res.setHeader('Content-Type', 'text/event-stream')
  res.setHeader('Cache-Control', 'no-cache')

  let totalTokens = 0
  let currentTool: string | undefined // name of the most recently started tool

  const sendEvent = (type: string, data: Record<string, unknown>) => {
    res.write(`data: ${JSON.stringify({ type, ...data })}\n\n`)
  }

  for await (const event of agent.stream(prompt)) {
    // Text streaming
    if (event.type === 'modelContentBlockDeltaEvent') {
      const delta = event.delta
      if (delta?.type === 'textDelta') {
        sendEvent('text', { content: delta.text })
      }
    }

    // Tool execution tracking
    if (event.type === 'modelContentBlockStartEvent') {
      const start = event.start
      if (start?.type === 'toolUseStart') {
        currentTool = start.name // remember it so tool_end can report the name
        sendEvent('tool_start', { tool: start.name })
      }
    }

    if (event.type === 'afterToolsEvent') {
      sendEvent('tool_end', { tool: currentTool })
    }

    // Token usage
    if (event.type === 'modelMetadataEvent') {
      totalTokens += event.usage?.totalTokens || 0
    }
  }

  sendEvent('done', { usage: { totalTokens } })
  res.end()
})

Key event types:

Event	When	Use
`modelContentBlockDeltaEvent`	Text/tool input streaming	Real-time display
`modelContentBlockStartEvent`	Tool call begins	Show "Analyzing..."
`afterToolsEvent`	Tool finished	Show result
`modelMetadataEvent`	Tokens counted	Cost tracking
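On the receiving side, the `data: <json>` frames have to be reassembled, because a network chunk can end in the middle of a frame. The following is a minimal client-side sketch under that assumption. `createSseParser` and `AgentEvent` are hypothetical names, and the parser only handles the single-line `data:` fields the server code above emits.

```typescript
// Incremental parser for `data: <json>\n\n` frames.
// Hypothetical client-side helper; only single-line `data:` fields are supported.
export type AgentEvent = { type: string; [key: string]: unknown }

export function createSseParser(onEvent: (event: AgentEvent) => void) {
  let buffer = ''
  return (chunk: string): void => {
    buffer += chunk
    let boundary = buffer.indexOf('\n\n')
    while (boundary !== -1) {
      const frame = buffer.slice(0, boundary)
      buffer = buffer.slice(boundary + 2) // keep any partial frame for the next chunk
      if (frame.startsWith('data: ')) {
        onEvent(JSON.parse(frame.slice('data: '.length)) as AgentEvent)
      }
      boundary = buffer.indexOf('\n\n')
    }
  }
}
```

Because the endpoint is a POST, the browser's EventSource (which only issues GET requests) does not apply. A fetch() call that reads the response body and passes decoded text chunks to the parser would be one way to consume it.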

OpenAI-Compatible Adapter

Why build this? During development, I needed to chat with my agent without building a UI first. By exposing an OpenAI-compatible endpoint, I could use Cline (VS Code extension) as my interface - instant chat UI for free.

This let me iterate on tools and prompts rapidly before touching frontend code.

export function createOpenAIAdapter(config) {
  const router = Router()

  router.get('/v1/models', (_, res) => {
    res.json({
      data: [{ id: config.modelName, owned_by: 'strands-agents' }]
    })
  })

  router.post('/v1/chat/completions', async (req, res) => {
    const { messages, stream } = req.body
    const prompt = extractPromptFromMessages(messages)

    const { agent } = config.createAgent()

    if (stream) {
      // Stream SSE chunks in OpenAI format
      res.setHeader('Content-Type', 'text/event-stream')
      for await (const event of agent.stream(prompt)) {
        // Convert to OpenAI chunk format
        res.write(`data: ${JSON.stringify(chunk)}\n\n`)
      }
      res.write('data: [DONE]\n\n')
    } else {
      // Collect and return
      const response = await agent.invoke(prompt)
      res.json({ choices: [{ message: { content: response } }] })
    }
  })

  return router
}

// Mount the adapter
app.use(createOpenAIAdapter({ modelName: 'security-agent', createAgent }))

Now point Cline at http://localhost:8080/v1 and it works.
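The adapter above leaves the "convert to OpenAI chunk format" step out. As a sketch of what that step could look like, the helper below wraps a text delta in the `chat.completion.chunk` shape that OpenAI-compatible clients expect when streaming. `toOpenAIChunkLine` is a hypothetical name, and the id and model values in the usage are placeholders.

```typescript
// Shape of one streamed chunk in the OpenAI chat completions format.
type ChatCompletionChunk = {
  id: string
  object: 'chat.completion.chunk'
  created: number // Unix timestamp in seconds
  model: string
  choices: {
    index: number
    delta: { role?: 'assistant'; content?: string }
    finish_reason: 'stop' | null
  }[]
}

// Hypothetical helper: wraps a text delta as an SSE line.
// Passing `null` as text produces the final chunk with finish_reason "stop".
export function toOpenAIChunkLine(
  id: string,
  model: string,
  text: string | null,
  created = Math.floor(Date.now() / 1000),
): string {
  const chunk: ChatCompletionChunk = {
    id,
    object: 'chat.completion.chunk',
    created,
    model,
    choices: [
      {
        index: 0,
        delta: text === null ? {} : { content: text },
        finish_reason: text === null ? 'stop' : null,
      },
    ],
  }
  return `data: ${JSON.stringify(chunk)}\n\n`
}
```

Inside the streaming branch, each text delta from the agent would be written with `res.write(toOpenAIChunkLine(id, config.modelName, delta.text))`, followed by one final call with `null` before the `[DONE]` line.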

Production Tips

Session management with TTL:

const sessions = new Map<string, Session>()
const SESSION_TTL_MS = 30 * 60 * 1000 // 30 min

setInterval(() => {
  const now = Date.now()
  for (const [id, session] of sessions) {
    if (now - session.lastAccessedAt > SESSION_TTL_MS) {
      sessions.delete(id)
    }
  }
}, 60 * 1000)

Issue tracking during scans:

const issueTrackerTool = tool({
  name: 'issue_tracker',
  description: 'Track problems during audit: auth failures, timeouts, etc.',
  inputSchema: z.object({
    type: z.enum(['auth_failed', 'access_denied', 'timeout', 'credentials_required']),
    title: z.string(),
    description: z.string(),
  }),
  callback: (input) => {
    session.issues.push(input)
    return `Issue tracked: ${input.title}`
  },
})

Include issues in the final report so users know what couldn't be tested.

Architecture Overview

┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Frontend   │────▶│   Agent Server   │────▶│    Tools    │
│  (Next.js)  │     │  (Strands SDK)   │     │ (http, dns, │
└─────────────┘     │                  │     │  browser)   │
      ▲             └────────┬─────────┘     └─────────────┘
      │                      │
   SSE Events           ┌────▼────┐
                        │  Model  │
                        │ Provider│
                        └─────────┘

Cost comparison (per 1M tokens):

Model	Input	Output	Cache Read	Best For
Haiku 4.5	$1.00	$5.00	$0.10	Simple queries, greetings
Sonnet 4.5	$3.00	$15.00	$0.30	Security audits, analysis
Opus 4.5	$5.00	$25.00	$0.50	Complex reasoning

With routing + caching, typical security audit costs ~$0.08-0.15 vs $0.40+ without.

Conclusion

The Strands TypeScript SDK provides a solid foundation for building production AI agents. Key takeaways:

Tools are everything - Design domain-specific tools, not generic utilities
Cache aggressively - Message caching saves 90% on repeated context
Route by complexity - Not every request needs your best model
Stream for UX - Users need to see progress during long operations
Track costs - Log token usage and compare models in production

The SDK handles the agentic loop so you can focus on domain logic. For ai-secure.dev, that meant security analysis - not prompt engineering infrastructure.

Questions or building your own agent? Connect on LinkedIn.

AWS Bedrock AgentCore - AI Agent Development from Local to Cloud

Martin Muller 🇩🇪🇧🇷🇵🇹 — Sat, 17 Jan 2026 00:00:00 +0000

Introduction

Building production-ready AI agents requires more than just a prompt and an LLM. You need infrastructure for state management, tool execution, and secure deployment. I recently built https://ai-secure.dev, a SaaS for automated security compliance audits, using AWS Bedrock AgentCore. This post explores how AgentCore simplifies the transition from a local prototype to a scalable cloud agent.

What is AWS Bedrock AgentCore?

AgentCore is AWS's managed runtime for AI agents. Think of it as "Fargate for AI agents" - you bring your container, AWS handles scaling, networking, and infrastructure.

Key features:

Container-based: Package agent as Docker image, push to ECR, deploy via CDK
VPC networking: Agents run in private subnets with NAT for outbound (Anthropic API, target websites)
AgentCore Browser: Managed Chromium browser for web automation - no Playwright/Puppeteer infra needed
Memory: Built-in conversation memory across sessions
Streaming: SSE responses for real-time progress updates

https://ai-secure.dev

A security compliance scanner that uses an AI agent to audit websites.

How it works:

User submits URL + selects security compliance framework (ISO 27001, NIST, SOC2, COBIT)

Agent navigates site using AgentCore Browser, shows real-time progress

Scan completes with summary

User views detailed report with findings + recommendations

Architecture:

┌─────────────┐     ┌────────────────────┐     ┌─────────────┐
│ Frontend    │────▶│ AgentCore          │────▶│ AgentCore   │
│ (Next.js)   │     │ Runtime            │     │ Browser     │
└─────────────┘     │ (your container)   │     │ (Chromium)  │
                    └────────────────────┘     └─────────────┘
                           │
                     ┌─────▼──────┐
                     │ Claude     │
                     │ (Anthropic)│
                     └────────────┘

Tech stack:

Component	Tech
Agent	Strands TypeScript SDK + Claude (Anthropic)
Browser	AgentCore Browser (cloud) / Playwright MCP (local)
Frontend	Next.js 16 + React 19 + Tailwind (Hosted on AWS ECS Fargate)
Auth	AWS Cognito
Database	DynamoDB
Infra	AWS CDK → AWS ECS Fargate + AWS Bedrock AgentCore Runtime
Payments	Stripe

Agent tools:

browser_navigate, browser_snapshot, browser_click, browser_type - web automation
http_security_check - headers, TLS, redirects
dns_lookup - SPF/DMARC/CAA records
totp - 2FA code generation for authenticated scans
issue_tracker - tracks problems during scan

Model routing for cost optimization:

// Simple tasks → Haiku (~3x cheaper than Sonnet per token)
// Complex tasks → Sonnet
const classification = classifyTask(prompt)
const model = classification.complexity === 'simple' 
  ? 'claude-haiku-4-5' 
  : 'claude-sonnet-4-5'

Learnings

Agent development:

Don't build an agent from scratch initially. Validate your use case works with existing agents (Cursor, Claude Code, Kiro). If they can't solve it, your custom agent probably won't either. Your goal: beat them on speed, accuracy, cost for your specific domain.
Develop locally first using Docker. Same container runs locally and in AgentCore.
Use Server-Sent Events (SSE) streaming for real-time progress - users need to see what the agent is doing.

Browser automation:

AgentCore Browser is a game-changer. No more managing Playwright/Puppeteer infrastructure.
For local dev, use existing MCP Docker images (playwright-mcp). Your custom implementation won't be better.
Browser tools need good error handling - pages don't always load, elements move, auth flows vary.
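One common way to handle flaky page loads is to wrap each browser action in a retry with backoff. The following is a generic sketch of that pattern, not code from ai-secure.dev. `withRetry` is a hypothetical helper and the default attempt count and delay are arbitrary.

```typescript
// Generic retry wrapper with exponential backoff. Hypothetical helper, not part of any SDK.
export async function withRetry<T>(
  action: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await action()
    } catch (error) {
      lastError = error
      if (attempt < attempts - 1) {
        // Delay doubles after each failure: baseDelayMs, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      }
    }
  }
  throw lastError
}
```

A browser tool callback could then run its navigation inside `withRetry(...)` and, if all attempts fail, return the error text to the agent instead of throwing, so the model can decide how to proceed.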

Cost optimization:

Model routing: Haiku for greetings/simple queries, Sonnet for audits
Message caching: 90% cost reduction on repeated context
Disable extended thinking unless needed (~$0.15/call)

Production gotchas:

AgentCore logs are separate from your app logs. Use CloudWatch SDK directly for application logging.
Use Infrastructure as Code (IaC) to deploy your agent to AgentCore.

Conclusion

AWS Bedrock AgentCore provides a robust foundation for building complex, stateful AI agents. By offloading the heavy lifting of runtime management and browser infrastructure to AWS, I was able to focus on the core logic of https://ai-secure.dev - creating high-quality security audits - rather than debugging infrastructure. If you're building agents that need to browse the web or maintain long-term state, AgentCore is a powerful accelerator.

Please give me feedback on LinkedIn: whether you find https://ai-secure.dev useful and how I can make it better, or whether you need help with your own AI agent project. More and more builders are curious about the specifics of how I build and deploy AI agents, so let me know if you would be interested in more blog posts on this, or perhaps a workshop or course on how I build my AI agents.

My next post will probably be about Agent Skills. I'm exploring them for another AI agent idea, where I want the agent to help me with marketing for https://ai-secure.dev. So make sure to subscribe to my blog's RSS feed.

SST, AWS CDK, AWS CloudFormation migration to Terraform

Martin Muller 🇩🇪🇧🇷🇵🇹 — Mon, 26 Aug 2024 15:15:41 +0000

Recently, a client approached me with an intriguing request: to migrate their SST (v.2 https://v2.sst.dev/) project to Terraform. This task presents several interesting challenges, particularly in mapping SST and CDK constructs to equivalent Terraform resources. Additionally, certain components like Lambda functions and static S3 React buckets require extra attention, as they involve uploading additional files such as Lambda function code or React builds.

Although I performed this migration specifically for SST, the process I'll outline is equally applicable to AWS CDK or AWS CloudFormation projects. If you're facing a similar challenge, this blog post aims to provide you with a solid starting point and valuable insights into the migration process.

First, let's discuss some possible motivations and non-motivations for migrating from SST, AWS CDK, or AWS CloudFormation to Terraform.

After that, I will describe the migration process in detail.

Motivations

My client's motivation was rather passive. As a newcomer to AWS, he used SST (v.2) to quickly deploy his application, which worked well for him. However, his company eventually decided to standardize on Terraform instead of SST. This highlights alignment as a motivation: if your company is using Terraform, you probably should too.

Another commonly cited reason is Terraform's superior state management. While Terraform deployments are faster than SST or AWS CDK, the difference doesn't feel significant to me.

There are undoubtedly more motivations; feel free to mention important ones I may have missed.

Non-Motivations

The following are reasons I wouldn't consider as motivations for migrating to Terraform:

The biggest one is the degree of abstraction, such as the one CDK Constructs provide. You would lose that when migrating to Terraform. Defining many basic resources in Terraform isn't particularly enjoyable. To counter this somewhat, you can use an AI tool in your IDE to speed up the definition process. Terraform modules also give you a way to abstract definitions, but it doesn't feel the same as CDK Constructs.

As SST and AWS CDK are already TypeScript-based, they make it really easy to define and handle the lifecycle of Lambda functions. In Terraform, you need to define helper functions to manage the Lambda function lifecycle, such as bundling the code, creating the deployment package, and so on. This was actually quite painful for my project, and I ended up with this rather inelegant script:

rm -rf lambda_function_payload.zip
cd ../functions
rm -rf dist
npx tsc
npm install
cp -r node_modules/ dist/node_modules
cd dist
zip -rFS lambda_function_payload.zip *
cp -r lambda_function_payload.zip ../../terraform
cd ../../terraform

This feels ugly and was no fun.

The deployment of a React app into an S3 bucket is similar. In SST and AWS CDK you can use the Bucket construct and let the framework handle the deployment. In Terraform, you need to manually deploy the React app to the S3 bucket and then invalidate the CloudFront cache.

Migration

In the next section, I will describe how the migration from SST, AWS CDK, or AWS CloudFormation to Terraform works.

Step 1: Deploy

Before you start the migration, make sure your deployment works as expected. If you have an SST project, follow its deployment instructions, for example npx sst deploy. After deploying, check that everything in the CloudFormation stacks works. This is very important, because this deployment will be your reference when validating the Terraform deployment.

Step 2: Generate Terraform from the CloudFormation template

Now that your deployment from step 1 created at least one CloudFormation stack, you can start the migration.

Via the AWS Console, grab each generated stack and let a chat AI like Claude from anthropic.com or ChatGPT.com generate Terraform from the CloudFormation template. The prompt could look like:

{
  "Resources": {
    "CustomResourceHandlerServiceRole41AEC181": {
      "Type": "AWS::IAM::Role",
      ...
}

Change the AWS CloudFormation to Terraform. Give back the full Terraform code!

Put the output from all those answers into the main.tf file.

Step 3: Cleanup the main.tf

Great, now we have all the AWS resources in the main.tf file. Unfortunately, the main.tf file also contains resources which we don't want or which are not useful, like the AWS CDK helper resources CustomResourceHandlerServiceRole... and CustomResourceHandler....

There might also be other resources which you should consider removing. For example, if you have several stacks and used variable referencing between the CloudFormation stacks, the AI translation to Terraform usually uses aws_ssm_parameter Terraform resources to replace them. But since all your resources are in one main.tf file, you don't need aws_ssm_parameter and can simply reference resources directly.
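As an optional variation on this cleanup, the CDK helper resources could be filtered out of the CloudFormation template before it is handed to the AI, so they never reach main.tf. The following is only a sketch of that idea. `dropHelperResources` is a hypothetical function, and the default prefix is taken from the logical IDs mentioned above. Resources that reference a dropped helper would still need manual attention.

```typescript
// Minimal shape of a CloudFormation template for this sketch.
type CfnTemplate = {
  Resources: Record<string, { Type: string; [key: string]: unknown }>
}

// Hypothetical pre-filter: drops resources whose logical ID starts with one of the prefixes.
export function dropHelperResources(
  template: CfnTemplate,
  prefixes: string[] = ['CustomResourceHandler'],
): CfnTemplate {
  const kept = Object.entries(template.Resources).filter(
    ([logicalId]) => !prefixes.some((prefix) => logicalId.startsWith(prefix)),
  )
  return { ...template, Resources: Object.fromEntries(kept) }
}
```

The function returns a new template object and leaves the input untouched, so the original export stays available for comparison.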

Step 4: Make the Lambdas work

Now comes the tricky part: making the Lambdas work. This depends on your project structure, such as where the Lambda function code is located and how it is organized. My advice is to keep it as similar as possible to your SST or AWS CDK project. Also, use source_code_hash so that Terraform hashes the Lambda function code and redeploys the function when the code changes. Like:

resource "aws_lambda_function" "my_lambda_function" {
  ...
  source_code_hash = filebase64sha256("my_lambda_function.zip")
}
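To see why a code change triggers a redeploy, it helps to know what the hash is: the base64-encoded SHA-256 digest of the file contents. Here is a minimal sketch in TypeScript, useful for checking in a build script whether the archive changed. The helper names are made up for illustration.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Base64-encoded SHA-256 digest of raw bytes.
function sha256Base64(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("base64");
}

// Same digest for a file on disk, for example the Lambda zip archive.
function fileSha256Base64(path: string): string {
  return sha256Base64(readFileSync(path));
}

// Any change to the bytes produces a different digest.
console.log(sha256Base64(Buffer.from("handler v1")));
console.log(sha256Base64(Buffer.from("handler v2")));
```

Calling fileSha256Base64 on the zip before and after a rebuild tells you whether Terraform will see a new source_code_hash.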

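What is left is the script that bundles the Lambda function code into a zip file. The file name passed to filebase64sha256 has to match the archive the script produces, which is lambda_function_payload.zip in the example below. For you, it could look something like this: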

rm -rf lambda_function_payload.zip
cd ../functions
rm -rf dist # the tsconfig.json has an "outDir": "./dist" configured where all the compiled js files will be stored
npm install # install dependencies first so the compiler can resolve them
npx tsc # compile the TypeScript code
cp -r node_modules/ dist/node_modules
cd dist
zip -rFS lambda_function_payload.zip *
cp -r lambda_function_payload.zip ../../terraform
cd ../../terraform

Deploy the Lambda function and check with the AWS Console if it is working as expected.

terraform apply

Step 5: Make the S3 React bucket work

This is described for a React app, but any other SPA or static site will work the same way. First, you need to build the React app:

cd ../frontend
npm install
npm run build

Copy the build to the S3 bucket and invalidate the CloudFront cache:

BUCKET_NAME=xyz-react-site-bucket
DISTRIBUTION_ID=EZ8RBY8ZM1234
aws s3 sync dist/ s3://$BUCKET_NAME --delete

aws cloudfront create-invalidation --distribution-id $DISTRIBUTION_ID --paths "/*"

Phew, that's it.

Step 6: Validating

Validate the Terraform deployment against your reference deployment from step 1. For example, if you have an API Gateway or Lambdas, make sure they behave the same in both deployments. The same goes for the React app in the S3 bucket.
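If you record a few responses from both deployments, the comparison can be automated. The following is a minimal sketch, assuming you have already captured status and body per request path. The type and function names and the example paths are made up for illustration.

```typescript
// One recorded response from a deployment.
type EndpointSnapshot = { status: number; body: string };

// Lists the paths where the candidate deployment differs from the reference.
function diffDeployments(
  reference: Record<string, EndpointSnapshot>,
  candidate: Record<string, EndpointSnapshot>,
): string[] {
  const differences: string[] = [];
  for (const [path, expected] of Object.entries(reference)) {
    const actual = candidate[path];
    if (!actual) {
      differences.push(`${path}: missing in candidate`);
    } else if (actual.status !== expected.status) {
      differences.push(`${path}: status ${expected.status} vs ${actual.status}`);
    } else if (actual.body !== expected.body) {
      differences.push(`${path}: body differs`);
    }
  }
  return differences;
}

// Hypothetical snapshots recorded from the step 1 deployment and the Terraform deployment.
const referenceSnapshots: Record<string, EndpointSnapshot> = {
  "/items": { status: 200, body: "[]" },
  "/health": { status: 200, body: "ok" },
};
const terraformSnapshots: Record<string, EndpointSnapshot> = {
  "/items": { status: 200, body: "[]" },
  "/health": { status: 502, body: "" },
};

// Reports the /health endpoint, whose status differs.
console.log(diffDeployments(referenceSnapshots, terraformSnapshots));
```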

Considerations

Sure, storing all those AWS resources in one main.tf file isn't Terraform best practice, but it is a good starting point. Once you have made sure that your Terraform deployment works, feel free to split the main.tf file into multiple files, as you are used to.

Conclusion

Migrating SST to Terraform was interesting. With the power of AI, it was a quick process, and combined with my years of experience I was able to migrate quickly. If you have a question or need help, please reach out to me.

AWS Bedrock Update from Claude v2.1 to Claude v3

Martin Muller 🇩🇪🇧🇷🇵🇹 — Sun, 21 Apr 2024 15:33:46 +0000

Recently, AWS made Claude v3 available on Bedrock. It comes in the Haiku and Sonnet flavors. Both are big improvements over the previous version. We recently updated our arcBot to use Claude v3 Sonnet and are very impressed with the results. Not only are the responses more intelligent, they also arrive much faster. In the next section I will describe how an update can go as smoothly as it did for us.

Upgrade

As mentioned above, we upgraded from Claude v2.1 to Claude v3 Sonnet for our gen AI database tool arcBot (Try it out!).

When you do this, there is a high risk that your answers will not be as good as before. But we were prepared, and the answers are even better now. How did we do it?

Simply with unit testing. I wrote a bunch of unit tests during the development with Claude v2.1, and they came in handy for the update to v3. It was basically just a matter of making those unit tests pass.
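The idea of using tests as a gate for a model upgrade can be sketched as a set of prompt cases that are run against both models. This is a minimal sketch with stubbed, synchronous models. The real Bedrock calls are asynchronous, and all names here are made up for illustration.

```typescript
// A model is anything that turns a prompt into an answer (stubbed here).
type ModelFn = (prompt: string) => string;
type PromptCase = { prompt: string; accept: (answer: string) => boolean };

// Returns the prompts whose answers the given model gets wrong.
function failingPrompts(model: ModelFn, promptCases: PromptCase[]): string[] {
  return promptCases
    .filter((promptCase) => !promptCase.accept(model(promptCase.prompt)))
    .map((promptCase) => promptCase.prompt);
}

const upgradeCases: PromptCase[] = [
  { prompt: "Create a customer table", accept: (answer) => answer.includes("customer") },
  { prompt: "How is the weather?", accept: (answer) => answer.includes("rephrase") },
];

// Hypothetical stand-ins for the current and the candidate model version.
const currentModel: ModelFn = (prompt) =>
  prompt.includes("table") ? '{"customer":{}}' : "Please rephrase!";
const candidateModel: ModelFn = (prompt) =>
  prompt.includes("table") ? '{"customer":{}}' : "It is sunny.";

console.log(failingPrompts(currentModel, upgradeCases)); // no failing prompts
console.log(failingPrompts(candidateModel, upgradeCases)); // the weather prompt fails
```

An upgrade is then a matter of adjusting prompts or settings until the list of failing prompts for the new model is empty.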

Conclusion

Using AWS gen AI offerings like Claude is super fun. It can be challenging to keep the quality of your answers stable across model versions. But with good test coverage, you can make sure that your answers are still as good as before, or even better. With our unit tests in place, we are confident that we can easily upgrade to a future version again or, if we wanted, change the LLM entirely. If you have any questions or thoughts about this, feel free to contact us :)!

Bonus - AB Picturer

Did you notice the cool blog title picture? It is actually one of two randomly selected pictures. I love writing blog posts and choosing nice pictures for them. But often I want to choose THE BEST picture, so I'm using A/B testing to find it. If you are curious about it, have a look at my AB Picturer Tool and give me feedback or, even better, become an engaged tester :).

AI Software Development with Ninox and arcBot

Martin Muller 🇩🇪🇧🇷🇵🇹 — Fri, 16 Feb 2024 18:15:33 +0000

AI is energizing the business landscape. Many simple and complex business processes can be greatly simplified or even fully automated with AI. My co-founder Jakob and I see it as an exciting challenge to make AI more accessible to small and medium-sized businesses. We believe we have found an exciting approach with the low-code platform Ninox and our AI chatbot arcBot. In this blog post I would like to introduce this approach.

Ninox.com

Ninox.com is a cloud-based low-code platform for building software solutions for small and medium-sized businesses. Ninox was founded in Germany in 2013 by Frank Böhmer. Here are some of the ways Ninox can support businesses:

Business process automation: With Ninox, businesses can automate their daily tasks and processes and thereby increase their efficiency.

Customizability: Because Ninox is a low-code platform, businesses can adapt their applications to their specific needs without advanced programming skills.

Integration: Ninox can be integrated with a wide range of other tools and platforms, which enables seamless data exchange between different systems.

Cost efficiency: Compared with developing custom software from scratch, using a low-code platform like Ninox can lead to considerable cost savings.

Fast implementation: With Ninox, businesses can build and roll out their applications quickly, which shortens the time to market.

A few months ago I built an MVP with Ninox. You can read about my experiences here. In the next chapter I explain how we use arcBot to simplify building your software solution in Ninox.

arcBot

arcBot is a ChatGPT-like AI for quickly creating and modifying Ninox tables. We use the AWS Bedrock API to generate the Ninox table responses. An example prompt for creating Ninox tables could look like this:

Create customer and product tables in a many to many relationship.

Cool, right? arcBot creates several tables for customers and products. Notably, arcBot can even set the inverse fields that are typical for Ninox. In the next prompt we want to add the last name to a customer.

Add last name to customer table

For the response, arcBot uses the Claude v2.1 model, which is provided through the AWS Bedrock API.

How do we make sure that the quality of the responses is as high as possible? We built in a feedback loop. After a response, the user can give feedback. This feedback is then used to improve arcBot.

I am also in close contact with other gen AI experts who help me improve the quality of arcBot's responses. I would like to take this opportunity to thank them. They are mainly AWS Community members and AWS employees who have already given me many valuable tips. Thank you very much 🙏

Ninox Connector

It is cool that we can create Ninox tables with arcBot, but how can we process the data further in Ninox? This is where the Ninox Connector comes into play. The Ninox Connector can read and update Ninox tables. The following picture shows the Ninox Connector.

The Ninox Connector needs some information such as the Ninox URL, Team ID, Database ID, and the Ninox API key. Currently the Ninox Connector is only available through Early Access, which you can easily get at https://app.arcbot.de/.

Discord Community

Become part of our Discord community. We already have several members who share valuable feedback and feature requests for arcBot with us.

Summary

In this blog post I showed you how we simplify creating and updating Ninox tables by combining Ninox and arcBot. We look forward to your feedback. Thank you for reading.

I love working on open-source projects. Many things are already freely available on github.com/mmuller88. If you like my work there and my blog posts, please consider supporting me.

And have a look at my site

Create a Next.js Server Component S3 Picture Uploader with SST

Martin Muller 🇩🇪🇧🇷🇵🇹 — Thu, 04 Jan 2024 09:44:04 +0000

I recently started exploring SST as an alternative to my favorite full-stack setup consisting of Projen, AWS CDK, and React. I have been thoroughly impressed with the experience so far. In this article, I will demonstrate how to create a Next.js App Router S3 Picture Uploader using SST.

SST

SST is a powerful framework that simplifies the development of serverless applications. It offers a straightforward and opinionated approach to defining serverless apps using TypeScript. Built on top of AWS CDK, SST handles the complexity of setting up your serverless infrastructure automatically. SST is an open-source framework and is completely free to use.

SST offers a variety of powerful constructs, including the NextjsSite construct. In the following section, I will provide more details about the NextjsSite construct, which greatly simplifies the process of deploying your frontend.

NextjsSite

The NextjsSite construct allows you to effortlessly deploy and manage a Next.js app with open-next, which is a great alternative to hosting Next.js on Vercel. Because it is defined within an SST App, you can easily integrate other AWS services, which makes it incredibly powerful. However, if you're using the latest and most advanced features of Next.js, such as Server Components with the Next.js App Router, you may encounter some challenges.

Server Components

Server Components are a new way to build with Next.js. They allow you to write parts of your application using React components that are served from the server. This approach offers several advantages, such as faster page loading and simplified setup without the need to manage client states with useState, useEffect, and similar hooks. However, working with Server Components may require adjusting your workflow and learning new concepts.

S3 Picture Uploader

In this section, I will demonstrate how to create an S3 picture uploader using the Next.js App Router and SST. We will use the NextjsSite construct to deploy the Next.js app and the Bucket construct to create the S3 bucket.

You can find all the code in my GitHub Repository.

Initialize SST NextjsSite

All the steps are taken from the official SST guide.

npx create-next-app@latest

Mainly choose the defaults. Then switch to the app folder and open it via VS Code.

cd <SST_PROJECT>
code .

Now run:

npx create-sst@latest
npm install

Before deploying to your AWS account, ensure that you have set up the correct credentials. I recommend using AWS IAM Identity Center to obtain temporary AWS CLI credentials, but you can also set up IAM user credentials or profiles. Once you have the credentials in place, run the following command:

npx sst deploy

Ensure that the AWS deployment is successful! View your CloudFront SiteUrl shown in the SST Output. Voila! You now have a running Next.js application on AWS with open-next 🤯. Let's take a closer look at what has been deployed in our AWS account because it's quite extensive!

To inspect the resources created by SST, go to the AWS Console and navigate to CloudFormation. Click on the newly created stack to view the details. You will find a set of helper Lambda Functions, the Lambda Function and Lambda URL for the Server Component, a CloudFront Distribution, and an S3 bucket that serves the static Next.js files.

Add S3 Picture Uploader

Ok let's go! We need to add an S3 bucket where we can upload the pictures to. Go to the sst.config.ts file and add a Bucket:

import { SSTConfig } from "sst"
import { Bucket, NextjsSite } from "sst/constructs"

export default {
 config(_input) {
  return {
   name: "sst-nextjs-s3-picture-uploader",
   region: "us-east-1",
  }
 },
 stacks(app) {
  app.stack(function Site({ stack }) {
   const bucket = new Bucket(stack, "public")
   const site = new NextjsSite(stack, "site", {
    permissions: [bucket],
    bind: [bucket],
   })

   stack.addOutputs({
    SiteUrl: site.url,
   })
  })
 },
} satisfies SSTConfig

The permissions: [bucket] property gives the NextjsSite read and write access to the S3 bucket. With bind: [bucket] you can use SST node variables such as Bucket.public.bucketName in your React code. Let's update the page.tsx to add the upload button and the AWS SDK S3 upload code. You might need to run npm run dev so that the SST node variable Bucket.public.bucketName gets recognized.

import { Bucket } from "sst/node/bucket"
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3"

export default async function Home() {
 async function upload(data: FormData) {
  "use server"

  const file: File | null = data.get("file") as unknown as File
  if (!file) {
   throw new Error("No file uploaded")
  }

  const bytes = await file.arrayBuffer()
  const buffer = Buffer.from(bytes)

  const client = new S3Client({ region: "us-east-1" })

  const command = new PutObjectCommand({
   Bucket: Bucket.public.bucketName,
   Key: file.name,
   Body: buffer,
   ACL: "public-read",
  })

  await client.send(command)

  console.log(`Uploaded ${file.name} to S3`)

  return { success: true }
 }

 return (
  <main className="flex min-h-screen flex-col items-center justify-between p-24">
   <form action={upload}>
    <input name="file" type="file" accept="image/png, image/jpeg" />
    <button type="submit">Upload</button>
   </form>
  </main>
 )
}

Do you see how seamlessly we can sneak in the AWS SDK S3 upload code? For me, that is mind-blowing compared with a client-side variant, where you couldn't do that so easily without exposing your AWS API credentials. But as server components run on the server side, we are safe. It offloads a lot of complexity. Super cool!
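One thing to keep in mind: the accept attribute on the file input only restricts the file picker in the browser, so the server action still receives whatever is posted. Here is a minimal sketch of a server-side check, assuming a 5 MiB size limit. The limit and the helper name validateUpload are made up for illustration.

```typescript
// Assumptions: only PNG and JPEG are wanted, and 5 MiB is an acceptable upper bound.
const ALLOWED_IMAGE_TYPES = ["image/png", "image/jpeg"];
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024;

// The subset of the File properties this check needs.
type UploadMeta = { name: string; type: string; size: number };

// Returns an error message, or null when the upload looks acceptable.
function validateUpload(file: UploadMeta): string | null {
  if (!ALLOWED_IMAGE_TYPES.includes(file.type)) {
    return `Unsupported type: ${file.type}`;
  }
  if (file.size === 0) {
    return "File is empty";
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return "File is too large";
  }
  return null;
}

console.log(validateUpload({ name: "cat.png", type: "image/png", size: 1024 }));
console.log(validateUpload({ name: "notes.pdf", type: "application/pdf", size: 1024 }));
```

Inside the upload server action you could call validateUpload with the name, type, and size of the posted file and throw before creating the PutObjectCommand if it returns a message.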

Let's deploy that! For more convenience, let's add a new command to the package.json: "deploy": "sst deploy",. Now run:

npm run deploy

Open the CloudFront SiteUrl. Now click on the upload button and check if you can see the file in S3.

By the way, for faster development you can also run locally with npm run dev. Just make sure to load AWS CLI credentials that allow access to the S3 bucket beforehand.

Conclusion

I'm still amazed at how nicely SST orchestrates the frontend with the backend. In this post, I described how you can start with SST and how to create an S3 picture uploader. I hope you learned something new. If you liked my post or want to correct me, please reach out to me :).

I am passionate about contributing to Open Source projects. You can find many of my projects on GitHub that you can already benefit from.

If you found this post valuable and would like to show your support, consider supporting me back. Your support will enable me to write more posts like this and work on projects that provide value to you.

And don't forget to visit my site

How to Perform Unit Testing on your AWS Bedrock AI Lambda

Martin Muller 🇩🇪🇧🇷🇵🇹 — Sat, 30 Dec 2023 13:58:48 +0000

Using the AWS Bedrock API for MVPs is incredibly enjoyable! I recently wrote an article on how you can make the LLM Claude respond in JSON. You can check it out here. While it's a lot of fun, testing your LLM settings and prompts can become tiresome and frustrating. In this article, I will explain how you can effectively unit test your Lambda function that calls the AWS Bedrock API. By being able to unit test your prompts, you can iterate quickly towards your desired MVP or project state.

Lambda Unit Testing

Unit testing a Lambda function is straightforward since it is essentially a function that executes code. To test it, you can simply call the function with the necessary arguments and verify the response. If you are using Node.js, Jest provides a convenient way to run your test code.
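To make that concrete, here is a minimal sketch of a handler that is called directly like any other async function, without a test framework. The event shape and the greeting logic are made up for illustration and are not the arcBot Lambda.

```typescript
// Simplified event and result shapes; a real handler would use the aws-lambda types.
type HandlerEvent = { body?: string };
type HandlerResult = { statusCode: number; body: string };

async function greetHandler(event: HandlerEvent): Promise<HandlerResult> {
  const payload = event.body ? JSON.parse(event.body) : {};
  const userName: unknown = payload.userName;
  if (typeof userName !== "string") {
    return { statusCode: 400, body: JSON.stringify({ error: "userName is required" }) };
  }
  return { statusCode: 200, body: JSON.stringify({ greeting: `Hello ${userName}` }) };
}

// "Testing" is just calling the function and inspecting the result.
async function runChecks(): Promise<void> {
  const ok = await greetHandler({ body: JSON.stringify({ userName: "Ninox" }) });
  console.log(ok.statusCode, ok.body);
  const bad = await greetHandler({});
  console.log(bad.statusCode, bad.body);
}

runChecks();
```

With Jest, the two calls in runChecks would each become a test with expect assertions on statusCode and body.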

Lambda Streaming Response & Unit Testing

The Lambda Streaming Response lets you pass on the streaming response from your LLM. Using the streaming response from your LLM is important because it gives the user early feedback. That is pretty much what happens when ChatGPT gives you a word-by-word streaming response.
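The difference between streaming and waiting for the full answer can be sketched with an async generator. This is only an illustration with a fake model and made-up names. It does not use the Lambda response stream or the Bedrock SDK.

```typescript
// Stand-in for an LLM that produces its answer piece by piece.
async function* fakeModelStream(answer: string): AsyncGenerator<string> {
  for (const word of answer.split(" ")) {
    // Simulated latency between chunks.
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield word + " ";
  }
}

// Forwards every chunk to `write` as soon as it arrives and returns the full text.
async function forwardStream(
  chunks: AsyncIterable<string>,
  write: (chunk: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const chunk of chunks) {
    write(chunk);
    fullText += chunk;
  }
  return fullText;
}

// The caller sees each word immediately instead of waiting for the whole answer.
forwardStream(fakeModelStream("Tables created for customers"), (chunk) => console.log(chunk));
```

In a streaming Lambda, the write callback is where each chunk would be written to the response stream.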

Lambda

Here is an excerpt of the Lambda function I want to unit test later.

async function handler(
  event: APIGatewayProxyEventV2WithRequestContext<APIGatewayEventRequestContextV2>,
  responseStream: lambdaStream.ResponseStream,
  ctx?: Context,
) {
  console.log(`event: ${JSON.stringify(event)}`);

  const body = event.body ? JSON.parse(event.body) : {}; // fall back to an empty object so the destructuring below cannot throw
  const { userInput, ninoxTables } = body;

  ...

  const bedrockParams: InvokeModelCommandInput = {
    modelId: 'anthropic.claude-v2:1',
    contentType: 'application/json',
    accept: '*/*',
    body: JSON.stringify({
      prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
      temperature: 0,
      ...
    }),
  };
  console.log(`bedrockParams: ${JSON.stringify(bedrockParams)}`);

  const command = new InvokeModelWithResponseStreamCommand(bedrockParams);

  //InvokeModelWithResponseStreamCommandOutput
  const response: any = await client.send(command);
  ...

  responseStream.end();
}

Unit Testing

And here is the unit test in a file living next to my Lambda function.

import { ArcbotLambdaInput, handler } from '../src/arcbot-stack.stream';

...

test('userInput: How is the weather?', async () => {
  const response = await handler(
    mockEvent({
      userInput: 'How is the weather?',
    }),
    //@ts-ignore
    '',
  );
  console.log(response);

  // do some validation
  expect(response.statusCode).toEqual(200);
  expect(JSON.parse(response.body)).toEqual({
    respond: 'I do not understand. Please rephrase!',
  });
});

test('Create a customer table', async () => {
  const response = await handler(
    mockEvent({
      userInput: 'Create a customer table',
    }),
    //@ts-ignore
    '',
  );

  // do some validation
  commonExpects(response);
  const body = JSON.parse(response.body) as Record<string, NinoxTable>;
  console.log(`body: ${JSON.stringify(body, null, 2)}`);

  const validationResult = z.record(NinoxTableSchema).safeParse(body);
  if (!validationResult.success) {
    console.log(validationResult.error.message);
  }
  expect(validationResult.success).toBeTruthy();

  expect(Object.entries(body)[0][0]).toBe('customer');
  expect(Object.entries(body)[0][1].caption).toBe('Customer');
});

const mockEvent = ({ userInput }: ArcbotLambdaInput) => {
  const event: lambda.APIGatewayProxyEventV2 = {
    version: '',
    ...
    body: JSON.stringify({ userInput } as ArcbotLambdaInput),
  };

  return event;
};

Conclusion

Unit testing your Lambda function that calls the AWS Bedrock API is crucial. It enables you to iterate quickly towards your desired MVP or project state. I hope this article has been helpful to you. If you have any questions or feedback, please don't hesitate to reach out to me.

I am passionate about contributing to Open Source projects. You can find many of my projects on GitHub that you can already benefit from.

And don't forget to visit my site

My first Experience with Powertools for AWS Lambda (TypeScript)

Martin Muller 🇩🇪🇧🇷🇵🇹 — Sun, 24 Dec 2023 14:16:50 +0000

Hi there,

I'm new to Powertools for AWS Lambda (TypeScript). Though I hear a lot of my DevOps friends say that it's a great tool for building serverless applications, I've never had the chance to use it myself. So I decided to give it a try and see what all the fuss is about. What can I say, I totally love it. In the next sections I will explain what it is and why I use it.

What is Powertools for AWS Lambda

Powertools for AWS Lambda is a collection of utilities, patterns, and best practices for writing AWS Lambda functions in Python, TypeScript, Java, and .NET. It includes logging, tracing, custom metrics, and more. The goal of this project is to enable developers to build scalable and robust serverless applications easily.

Why I am using AWS Lambda Powertools

I'm a huge DevOps fanboy, so I'm all in on techniques like Infrastructure as Code (IaC) and serverless. What I like about them is that they let me focus on the business logic of my software product rather than wasting time on setup. Powertools feels similar: it helps me implement goodies such as logging and tracing without spending much time.

My main motivation is the Powertools logger library. It writes my Lambda logs in a structured format, which enables features like log levels and log queries with AWS CloudWatch Logs Insights. So far that has been great, and it gives me powerful insight into my Lambda.

I'm using Powertools for my newest AI MVP. Part of the MVP is a Lambda where I call the AWS Bedrock API. Furthermore, I automatically validate the LLM response. For more details see https://martinmueller.dev/aws-bedrock-validation.

How to start?

The Powertools GitHub repository offers nice examples to integrate and learn from. For example, there are nice examples written in AWS CDK. Below I list the features I have gained some experience with.

Logger

The logger is super fun to use! It gives me powerful insight into my Lambdas. I'm still not sure when to use the different log levels, or which objects I should attach to a log entry and when. But I will keep learning from my DevOps friends and gather my own experience. Ultimately I will know better once I truly need insight via Logs Insights.
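To show why structured logs are so useful for querying, here is a dependency-free sketch that writes one JSON object per log line and respects a minimum log level. This is explicitly not the Powertools API. The function createJsonLogger and all field names are made up for illustration.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";
type LogFields = Record<string, unknown>;
const LOG_LEVELS: LogLevel[] = ["DEBUG", "INFO", "WARN", "ERROR"];

function createJsonLogger(
  minLevel: LogLevel,
  sink: (line: string) => void = (line) => console.log(line),
) {
  const write = (level: LogLevel, message: string, fields: LogFields = {}): void => {
    // Drop entries below the configured level.
    if (LOG_LEVELS.indexOf(level) < LOG_LEVELS.indexOf(minLevel)) return;
    // One JSON object per line, so every field can be queried later.
    sink(JSON.stringify({ level, message, timestamp: new Date().toISOString(), ...fields }));
  };
  return {
    debug: (message: string, fields?: LogFields) => write("DEBUG", message, fields),
    info: (message: string, fields?: LogFields) => write("INFO", message, fields),
    warn: (message: string, fields?: LogFields) => write("WARN", message, fields),
    error: (message: string, fields?: LogFields) => write("ERROR", message, fields),
  };
}

const sketchLogger = createJsonLogger("INFO");
sketchLogger.debug("prompt built", { promptLength: 812 }); // suppressed at INFO
sketchLogger.info("bedrock call finished", { modelId: "anthropic.claude-v2:1", durationMs: 2300 });
```

As a rule of thumb, put everything you might later want to filter or aggregate on into the extra fields rather than into the message string.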

Parameters

The parameters feature is super cool. It helps me easily access the parameters I defined in my AWS CDK stacks through its get functions. So far I have only used it for getting secrets from Secrets Manager, but I'm excited to use it for more, perhaps for app configuration.

Idempotency

The idempotency feature seems super useful. It could help to reduce the quite high costs of using the AWS Bedrock API. Unfortunately, idempotency currently can't be combined with the Lambda streaming response. Hopefully that will change in the future 🤞.
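The cost argument is easiest to see in a small sketch of the concept: identical requests reuse the first result instead of triggering another paid model call. This is not the Powertools idempotency utility. It uses an in-memory Map and made-up names purely for illustration.

```typescript
// Wraps an async function so repeated calls with the same payload reuse the first result.
function withIdempotency<I, O>(fn: (input: I) => Promise<O>): (input: I) => Promise<O> {
  const knownResults = new Map<string, Promise<O>>();
  return (input: I): Promise<O> => {
    // Assumption: the JSON form of the payload is a good enough idempotency key.
    const key = JSON.stringify(input);
    const cached = knownResults.get(key);
    if (cached) return cached;
    const result = fn(input).catch((error: unknown) => {
      knownResults.delete(key); // do not remember failures, allow a retry
      throw error;
    });
    knownResults.set(key, result);
    return result;
  };
}

// Hypothetical expensive call standing in for a Bedrock request.
let modelCalls = 0;
const askModel = withIdempotency(async (request: { prompt: string }) => {
  modelCalls++;
  return `answer to: ${request.prompt}`;
});

askModel({ prompt: "Create a customer table" });
askModel({ prompt: "Create a customer table" }); // served from the map
askModel({ prompt: "Add last name to customer table" });
console.log(modelCalls); // 2
```

A Lambda cannot rely on in-memory state across invocations, so a real setup needs an external store for the results.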

Resources to learn utilizing Powertools

Here's a list of examples I found helpful to learn how to apply Powertools:

Examples from the GitHub Repo:

https://github.com/aws-powertools/powertools-lambda-typescript

Powertools has an amazing Discord community. Make sure not to miss it!

Discord

Lee Gilmore:

https://github.com/leegilmorecode/embedded-aws-cloudwatch-dashboards/tree/main

Conclusion

Working with Powertools for AWS Lambda (TypeScript) totally makes sense, and I love it. It will definitely be my default choice when developing my next Lambda. I still need to learn how to use it properly! If you have any feedback on how I can use Powertools better, please reach out to me!

I am passionate about contributing to Open Source projects. You can find many of my projects on GitHub that you can already benefit from.

And don't forget to visit my site

Automatically validate your AWS Bedrock LLM Responses

Martin Muller 🇩🇪🇧🇷🇵🇹 — Mon, 18 Dec 2023 06:55:41 +0000

Validating the response from your Large Language Model (LLM) is a critical step in the development process. It ensures that the response is in the correct format and contains the expected data. Manual evaluation can quickly become tiresome, especially when making frequent changes to your LLM setup. Automating or partially automating the validation process is highly recommended to save time and effort. In this post, I will discuss and demonstrate some ideas for how you can achieve this automation.

Before

When I refer to LLMs, I am talking about the use of existing foundation models through AWS Bedrock, such as Claude, Llama 2, and others. You can learn more about AWS Bedrock here. There are techniques you can use to enhance the response, such as prompt refinements, RAG (Retrieval-Augmented Generation, typically backed by a vector database), or fine-tuning.

Responses from Large Language Models (LLMs) are often non-deterministic, meaning that different responses can be generated even with the same prompt. However, this behavior can be adjusted to some extent using LLM parameters such as temperature.
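To give an intuition for what temperature does, here is a sketch of the general idea behind temperature in sampling: the model's token scores are divided by the temperature before being turned into probabilities. The scores below are made up, and how a particular hosted model applies the parameter internally is not covered here.

```typescript
// Turns raw token scores (logits) into probabilities, scaled by temperature.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Limit case: all probability mass on the highest-scoring token.
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, index) => (index === best ? 1 : 0));
  }
  const scaled = logits.map((logit) => logit / temperature);
  const maxScaled = Math.max(...scaled); // subtract the maximum for numerical stability
  const exps = scaled.map((value) => Math.exp(value - maxScaled));
  const sum = exps.reduce((total, value) => total + value, 0);
  return exps.map((value) => value / sum);
}

// Made-up scores for three candidate tokens.
const tokenScores = [1, 2, 3];
console.log(softmaxWithTemperature(tokenScores, 1)); // probability spread over all tokens
console.log(softmaxWithTemperature(tokenScores, 0.1)); // almost everything on the best token
console.log(softmaxWithTemperature(tokenScores, 0)); // greedy choice
```

The lower the temperature, the more often the same token wins, which makes repeated runs more alike and tests easier to write.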

Ideas

In the following sections I will present some ideas for creating automated tests for your LLM responses. I will also provide some examples of how I implemented these ideas in my own projects.

Validate the Shape

In many cases, the response may contain deterministic parts that can be used to partially validate it. For instance, I rely on Claude to provide a JSON response. I have taught Claude the schema of the JSON response, and by performing a schema validation test, I can verify if Claude adheres to the schema. Verifying a JSON schema is very simple.

Most programming languages have a library that can be used to validate a schema. For instance, in TypeScript I use the zod library to create and validate the schema. It looks like this:

import { z } from 'zod';

export const NinoxFieldSchema = z.strictObject({
  base: z
    .enum([
      'string',
      'boolean',
      ...
    ])
    .optional(),
  caption: z.string().optional(),
  captions: z.record(z.string()).optional(),
  required: z.boolean().optional(),
  order: z.number().optional(),
  ...
});

export type NinoxField = z.infer<typeof NinoxFieldSchema>;

export const NinoxTableSchema = z.strictObject({
  nextFieldId: z.number().optional(),
  caption: z.string().optional(),
  captions: z.record(z.string()).optional(),
  hidden: z.boolean().optional(),
  ...
});

export type NinoxTable = z.infer<typeof NinoxTableSchema>;

And as part of my unit tests:

test('check schema', async () => {
    ...

    const body = JSON.parse(response.body);

    const validationResult = NinoxTableSchema.safeParse(
        JSON.parse(body.json),
    );
    if (!validationResult.success) {
        console.log(validationResult.error.message);
    }
    expect(validationResult.success).toBeTruthy();
});

Validate Sub-Responses

In my current AI application, I use multiple LLM calls to generate the final response. While validating the entire response may be challenging, I can easily validate some of the sub-responses. For instance, one sub-response is deterministic and can be verified directly: it classifies the user's intent into a specific category. For example, if the user asks to create a table, the intent is classified as "create_table". That generates a deterministic sub-response in my AWS Lambda for the "create_table" intent. To test the accuracy of the classification, you can use well-known methods such as a train-validation-test split, which splits the training data into subsets for training and validation. I'll describe this technique in more detail in the next section.

Train-Validation-Test Split

The train-validation-test split is crucial for measuring the performance of the LLM. One method for such splits is k-fold cross-validation. I will try to explain this approach in simple words. Make sure to check the far more technical article from Everton Gomede, PhD, The Significance of Train-Validation-Test Split in Machine Learning!

For instance, you could use 90 percent of the data for training and 10 percent for validation. Then you can use the validation data to test the accuracy of the classification. Additionally, you can shift the 10 percent validation window across the data so that every example is used for validation once. I implemented a simple algorithm in TypeScript which helps me to calculate the accuracy of the classification:

import { test } from "@jest/globals"
import * as ArcbotStackStream from "../src/arcbot-stack.stream"
import {
 call_bedrock,
 generate_intent_identification_prompt,
 generate_table_identification_prompt,
 modify_table_prompt,
 relationship_json_prompt,
} from "../src/arcbot-stack.stream"
import {
 intentTrainingsData,
 modifyTableTrainingsData,
 oneToManyTrainingsData,
 tableIdentificationData,
} from "../src/training-data"

const runEvaluation = async <T extends { [s: string]: string[] }>(
 trainingData: T,
 jestSpy: jest.SpyInstance<T, [], any>,
 promptRefinement: (userInput: string) => Promise<string>,
 jsonResponse?: boolean
) => {
 const getTrainingAndEvaluationPermutations = (trainingsData: T) => {
  // Split trainings data into training and evaluation data
  const sliceTrainingsData = (fromPercentage: number, toPercentage: number) =>
   Object.entries(trainingsData).reduce(
    (acc, data) => {
     const evaluationSlice = data[1].slice(
      data[1].length * fromPercentage,
      data[1].length * toPercentage
     )
     const trainingSlice = data[1].filter((d) => !evaluationSlice.includes(d))
     return {
      training: { ...acc.training, [data[0]]: trainingSlice } as T,
      evaluations: {
       ...acc.evaluations,
       [data[0]]: evaluationSlice,
      } as T,
     }
    },
    {
     training: {} as T,
     evaluations: {} as T,
    }
   )
  const trainingPercentage = 0.9
  const validationPercentage = 1 - trainingPercentage

  // permute the training and evaluation data
  const trainingValidationPermutations = [
   sliceTrainingsData(0, 0 + validationPercentage),
  ]
  for (let i = 0 + validationPercentage; i < 1; i = i + validationPercentage) {
   trainingValidationPermutations.push(
    sliceTrainingsData(i, i + validationPercentage)
   )
  }

  console.log(
   `trainingValidationPermutations: ${JSON.stringify(
    trainingValidationPermutations
   )}`
  )
  return trainingValidationPermutations
 }

 let correctResponses = 0
 let wrongResponses = 0

 const trainingRecords = getTrainingAndEvaluationPermutations(trainingData)

 for (const trainingPermutation of trainingRecords) {
  console.log(`trainingPermutation=${JSON.stringify(trainingPermutation)}`)

  jestSpy.mockImplementation(() => trainingPermutation.training)

  for (const evaluationRecords of Object.entries(
   trainingPermutation.evaluations
  )) {
   for (const input of evaluationRecords[1]) {
    const intent_prompt = await promptRefinement(input)
    const response = await call_bedrock(intent_prompt, jsonResponse)

    let received = response

    // trim to JSON string
    if (jsonResponse) {
     received = JSON.stringify(JSON.parse(received))
    }

    console.log(`Expected ${evaluationRecords[0]}\nReceived ${received}`)

    if (evaluationRecords[0] === received) {
     correctResponses++
    } else {
     wrongResponses++
    }
   }
  }
 }
 console.log(
  ` correctResponses: ${correctResponses}\n wrongResponses: ${wrongResponses} \n ${
   correctResponses / (correctResponses + wrongResponses)
  } accuracy`
 )
}

test("evaluate one to many", async () => {
 const mockTrainingsData = jest.spyOn(
  ArcbotStackStream,
  "getOneToManyTrainingData"
 )

 const oneToManyPromptRefinement = async (userInput: string) =>
  relationship_json_prompt(
   {
    CUSTOMER: { caption: "Customer" },
    EMPLOYEE: { caption: "Employee" },
    INVOICE: { caption: "Invoice" },
   },
   userInput
  )

 await runEvaluation(
  oneToManyTrainingsData,
  mockTrainingsData,
  oneToManyPromptRefinement,
  true
 )
})

test("evaluate intent", async () => {
 const mockTrainingsData = jest.spyOn(
  ArcbotStackStream,
  "getIntentTrainingData"
 )

 const generate_intent_identification_promptRefinement = async (
  userInput: string
 ) => generate_intent_identification_prompt(userInput)

 await runEvaluation(
  intentTrainingsData,
  mockTrainingsData,
  generate_intent_identification_promptRefinement
 )
})

test("evaluate table identification", async () => {
 const mockTrainingsData = jest.spyOn(
  ArcbotStackStream,
  "getTableIdentificationData"
 )

 const generate_table_identification_promptRefinement = async (
  userInput: string
 ) => {
  const prompt = generate_table_identification_prompt(
   userInput,
   Object.keys(tableIdentificationData)
  )

  return prompt
 }

 await runEvaluation(
  tableIdentificationData,
  mockTrainingsData,
  generate_table_identification_promptRefinement
 )
})

test("evaluate modify table", async () => {
 const mockTrainingsData = jest.spyOn(
  ArcbotStackStream,
  "getModifyTableTrainingsData"
 )

 const modify_table_promptRefinement = async (userInput: string) => {
  const prompt = await modify_table_prompt({}, userInput)

  return prompt
 }

 await runEvaluation(
  modifyTableTrainingsData,
  mockTrainingsData,
  modify_table_promptRefinement,
  true
 )
})

I think the most interesting part here is the interface of the method getTrainingAndEvaluationPermutations(trainingData). It always expects the same input format and gives you back permuted training/validation splits of the input training data. The training data has to be a record of string lists:

<T extends { [s: string]: string[] }>

The key represents the expected result (the classification class or LLM output), and the value lists possible inputs that should lead to that result. The return value has this type:

{
 training: T
 evaluations: T
}[]

It is an Array representing the permutations. Each permutation has a training and evaluations slice.
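
To make the shape concrete, here is a minimal sketch of the types together with a simplified leave-one-out split. The names TrainingData, Permutation, and leaveOneOutPermutations are hypothetical and only illustrate the interface. This is not the implementation of getTrainingAndEvaluationPermutations used above, which may split the data differently.

```typescript
// Record of expected result -> example inputs that should produce it.
type TrainingData = { [key: string]: string[] }

// One permutation: the examples used for training and the ones held back for evaluation.
type Permutation<T extends TrainingData> = { training: T; evaluations: T }

// Hypothetical, simplified split: for every index i, hold out the i-th input
// of each class for evaluation and keep the remaining inputs for training.
// A class with no i-th input gets an empty evaluation list for that permutation.
function leaveOneOutPermutations<T extends TrainingData>(data: T): Permutation<T>[] {
  const source: TrainingData = data
  const longest = Math.max(0, ...Object.values(source).map((inputs) => inputs.length))
  const permutations: Permutation<T>[] = []

  for (let i = 0; i < longest; i++) {
    const training: TrainingData = {}
    const evaluations: TrainingData = {}
    for (const [expected, inputs] of Object.entries(source)) {
      training[expected] = inputs.filter((_, index) => index !== i)
      evaluations[expected] = inputs.filter((_, index) => index === i)
    }
    permutations.push({ training: training as T, evaluations: evaluations as T })
  }
  return permutations
}
```

Each element of the returned array can be consumed the same way as in the evaluation loop: mock the training data with the training slice, then run every input of the evaluations slice through the prompt.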

One training data example would be:

export const intentTrainingsData: { [key: string]: string[] } = {
 create_new_table: [
  "Create table to store invoices",
  "I need to store my customers information",
  "I need a table for my employees",
 ],
 modify_existing_table: [
  "Customers table should also have an address",
  "Add address to the customer table",
  "Invoice should have a date",
 ],
 link_two_tables: [
  "Customer should have multiple invoices",
  "Each employee should be responsible for multiple customers",
 ],
 do_not_know: [
  "How are you today?",
  "What is your name?",
  "What is the weather today?",
 ],
}

This training set teaches the model to recognize the user's intent. The permuted training/validation split would look like:

[
 {
  training: {
   create_new_table: [
    "Create table to store invoices",
    "I need to store my customers information",
   ],
   modify_existing_table: [
    "Customers table should also have an address",
    "Add address to the customer table",
   ],
   link_two_tables: [
    "Customer should have multiple invoices",
   ],
   do_not_know: [
    "How are you today?",
    "What is your name?",
    "What is the weather today?",
    "2 + 3",
   ],
  },
  evaluations: {
   create_new_table: [
    "I need a table for my employees",
   ],
   modify_existing_table: [
    "Invoice should have a date",
   ],
   link_two_tables: [
    "Customer should have multiple invoices",
    "Each employee should be responsible for multiple customers",
   ],
   do_not_know: [
    "How are you today?",
    "What is your name?",
   ],
  },
 },
 {
  training: {
   create_new_table: [
    "Create table to store invoices",
    "I need to store my customers information",
    "I need a table for my employees",
   ],
   modify_existing_table: [
    "Customers table should also have an address",
    "Add address to the customer table",
    "Invoice should have a date",
   ],
   link_two_tables: [
    "Each employee should be responsible for multiple customers",
   ],
   do_not_know: [
    "How are you today?",
    "What is your name?",
    "What is the weather today?",
    "2 + 3",
   ],
  },
  evaluations: {
   create_new_table: [
    "Create table to store invoices",
    "I need to store my customers information",
    "I need a table for my employees",
   ],
   modify_existing_table: [
    "Customers table should also have an address",
    "Add address to the customer table",
    "Invoice should have a date",
   ],
   link_two_tables: [
    "Customer should have multiple invoices",
    "Each employee should be responsible for multiple customers",
   ],
   do_not_know: [
    "What is the weather today?",
   ],
  },
 },
]

Golden Response

This is an idea from the AI community that shows promise, although I haven't personally tested it yet. The concept is to compare the actual response with a known-good "golden response" to check its correctness. The comparison itself can be done by the same Large Language Model (LLM): you ask it whether the two responses are identical or very similar. This approach holds potential and I'm eager to try it out soon.
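
As a rough sketch of how such a check could look: first try a cheap exact match after normalizing whitespace and case, and only ask the model to judge when the strings differ. Everything here is an assumption for illustration: the names normalize, buildJudgePrompt, and matchesGolden, the wording of the judge prompt, and the askModel callback, which stands in for whatever function sends a prompt to your LLM. It is untested against a real model.

```typescript
// Hypothetical callback: sends a prompt to an LLM and resolves with its text answer.
type AskModel = (prompt: string) => Promise<string>

// Collapse whitespace and case so trivial formatting differences do not count.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase()
}

// Assumed prompt wording; adjust it to your model and use case.
function buildJudgePrompt(golden: string, actual: string): string {
  return [
    "Compare the two responses below.",
    `Golden response: ${golden}`,
    `Actual response: ${actual}`,
    'Answer only "YES" if they mean the same, otherwise answer "NO".',
  ].join("\n")
}

// Returns true on an exact (normalized) match, otherwise defers to the model's verdict.
async function matchesGolden(
  golden: string,
  actual: string,
  askModel: AskModel
): Promise<boolean> {
  if (normalize(golden) === normalize(actual)) return true
  const verdict = await askModel(buildJudgePrompt(golden, actual))
  return normalize(verdict).startsWith("yes")
}
```

The exact-match shortcut keeps the number of model calls low, and the judge call only decides the cases where the wording differs.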

Thanks

I would like to express my gratitude to the AWS Community for their invaluable assistance in helping me.

A special thanks goes to Chris Miller for giving me a lot of thoughts and feedback on my validation approach, and to Neylson Crepalde for making me aware of the golden response validation method and explaining it to me.

Once again, thank you all for your support and contributions.

Conclusion

Working with AWS Bedrock AI is incredibly enjoyable. The field is constantly evolving, and there is always something new to learn. In this post, I explained how to partly validate your LLM responses.

I hope you found this post helpful, and I look forward to sharing more with you in the future.

I love to work on Open Source projects. A lot of my stuff you can already use on https://github.com/mmuller88 . If you like my work there and my blog posts, please consider supporting me on the:

And don't forget to visit my site

AWS Bedrock Claude 2.1 - Return only JSON

Martin Muller 🇩🇪🇧🇷🇵🇹 — Thu, 07 Dec 2023 18:49:36 +0000

Working with the AWS Bedrock API is an exhilarating experience! I came across an interesting business case where I needed to develop an AI MVP. The MVP generates JSON data based on a prompt and utilizes the anthropic.claude-v2:1 model in AWS Bedrock.

I encountered an issue where the response I received was not pure JSON. It contained additional characters that I couldn't get rid of, like:

" format: {\"one\":\"Supplier\",\"many\":\"Time\"}"

Seeking help from the AWS Community, I was able to find a solution to this problem. In this post, I will share the solution with you.

The Problem

Human: $YOUR_PROMPT . Answer in JSON format
Assistant:{

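This technique is known as "Put words in Claude’s mouth". Instead of letting Claude start its answer freely, you begin the Assistant turn yourself with an opening curly brace and let Claude generate the rest of the response from there, so the answer starts as JSON rather than with introductory text. While there may be alternative approaches to solving this issue, I am currently happy with this solution.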
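
A minimal sketch of the idea, kept independent of the Bedrock SDK: build the prompt so that it ends with the opening brace, then put that brace back in front of the completion before parsing, because the model only returns what comes after the prefilled text. The helper names buildJsonPrompt and completeJson are hypothetical, and the sample completion string is made up for illustration.

```typescript
// The character we put in Claude's mouth so the answer starts as a JSON object.
const PREFILL = "{"

// Claude 2.x text-completion prompts alternate "\n\nHuman:" and "\n\nAssistant:" turns.
function buildJsonPrompt(userPrompt: string): string {
  return `\n\nHuman: ${userPrompt} . Answer in JSON format\n\nAssistant:${PREFILL}`
}

// The completion continues after the prefill, so re-attach it before parsing.
function completeJson(completion: string): unknown {
  return JSON.parse(PREFILL + completion)
}

// Made-up completion, shaped like the model's continuation after "{".
const parsed = completeJson('"one":"Supplier","many":"Time"}')
console.log(parsed)
```

JSON.parse still throws if the model appends text after the closing brace, so it is worth keeping that call inside your own error handling.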

Thanks

I would like to express my gratitude to the AWS Community for their invaluable assistance in helping me resolve this issue.

A special thanks goes to Corvus Lee for providing the advice that ultimately solved the problem.

I would also like to thank Ken Collins for bringing the Claude 2 docs sheet to my attention.

Once again, thank you all for your support and contributions.

Conclusion

Working with AWS Bedrock AI is incredibly enjoyable. The field is constantly evolving, and there is always something new to learn. In this post, I demonstrated how to obtain a pure JSON response from AWS Bedrock Claude 2.1. As AI technology continues to advance rapidly, you may not encounter this issue in the future. However, if you are working with Claude 2.1 or newer, be sure to refer to the documentation for more information.

I hope you found this post helpful, and I look forward to sharing more with you in the future.

Java Spring CI/CD in AWS - The HalloCasa Journey

Martin Muller 🇩🇪🇧🇷🇵🇹 — Thu, 07 Dec 2023 18:39:57 +0000

A fresh start can be daunting, especially when it comes to revamping an existing Java Spring application. This was my journey with the HalloCasa real estate platform, and here's how we took it from deployment nightmares to smooth, cloud-native operations. If you would like to improve your Java Spring CI/CD experience as well, reach out to me!

Introducing HalloCasa

HalloCasa is a dynamic Real Estate Platform designed to allow real estate agents to create their own business card within a couple of minutes. It also serves as a directory and a referral network.

My motivations to work on HalloCasa were:

I had the chance to contribute to the development of a powerful product.
My expertise in AWS could be applied effectively.
I would be working with a co-founder who is not only business-savvy but also has a knack for generating compelling content in the real estate area. He has his own podcast at https://blog.hallocasa.com/podcasts/. Make sure to check it out!

The Deployment Dilemma

In the beginning, our deployment process could only be described as patchwork:

Setting up Maven, Tomcat, and MySQL locally.
Creating a war file and then SSHing into an EC2 VM to transfer it.
Manually restarting the Tomcat server each time.

Our NextJS frontend wasn't much different. We had to SSH into our EC2 VM and manually run “npm start”. And to make matters a bit more convoluted, our frontend was in BitBucket while the backend resided in AWS CodeCommit.

Deploying the New and Improved Way

Here’s how I revamped our deployment:

Shifting to GitHub

Migrated from AWS CodeCommit and BitBucket to GitHub.
Leveraged GitHub Actions for test builds and introduced a PR AI reviewer for efficiency.

Dockerization

Deployed our configuration as code, which is both clean and manageable. With docker-compose, local development became hassle-free. Gone were the days of setting up Maven, Tomcat, and MySQL locally.
Further utilized this to feed AWS ECS, making deployments consistent. Additionally, health checks were introduced to ensure that the application is up and running. If a health check fails, the application is restarted, which fixes the problem in most cases.

CDK Pipelines

Implemented a CI/CD staging pipeline that first deploys changes onto QA and subsequently to PROD. Ensured robustness by integrating testing with Postman. We set up a dedicated AWS account for each stage, ensuring clear demarcation and management.

DB Migration with Flyway

Adopted Flyway to empower our Java App to manage the database, eliminating the need to provide access to the database for schema modifications.

Monitoring Metrics

Introduced plenty of metrics to monitor our backend. For an informed decision on crucial metrics, I sought advice from ChatGPT about the most essential metrics for a Java application.
Our revamped production setup went live on 01/08/23, and the migration was seamless.

Reflecting on the Journey

The refurbishment of HalloCasa reaffirmed several beliefs:

Docker is a lifesaver. The portability and consistency it brings to applications are unparalleled.
CI/CD is a game-changer. It automates manual tasks and ensures faster, reliable deliveries.
AWS CDK is powerful. It simplifies cloud resource provisioning and management.
CDK Pipelines streamline deployments. With it, managing multiple stages of deployment becomes a breeze.

One revelation that stood out was how the lines between infrastructure and application seemed blurred in our setup. And this convergence was a positive one, indicating tight integration and consistency.

The HalloCasa journey is a testament to the power of continuous learning, adaptation, and innovation. Our users now enjoy a more robust platform, and we can sleep better at night, knowing that our deployment is smoother than ever. If you have a Java application that needs a bit of polishing, reach out to me!

I hope you enjoyed this post and I look forward to seeing you in the next one.
