<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: marcusmayo</title>
    <description>The latest articles on Forem by marcusmayo (@marcusmayo).</description>
    <link>https://forem.com/marcusmayo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3549315%2Fd732ee96-0496-413e-847b-5957b8567d6d.png</url>
      <title>Forem: marcusmayo</title>
      <link>https://forem.com/marcusmayo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/marcusmayo"/>
    <language>en</language>
    <item>
      <title>💭 PromptOps Policy Coach — From Metrics to Mechanisms You Can Trust</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Tue, 21 Oct 2025 16:13:09 +0000</pubDate>
      <link>https://forem.com/marcusmayo/promptops-policy-coach-from-metrics-to-mechanisms-you-can-trust-c5f</link>
      <guid>https://forem.com/marcusmayo/promptops-policy-coach-from-metrics-to-mechanisms-you-can-trust-c5f</guid>
      <description>&lt;p&gt;If you’ve ever tried to scale AI inside a big company, you already know the truth: models aren’t the hard part — &lt;strong&gt;trust is.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
And trust doesn’t show up because we ask for it; it shows up because we can &lt;strong&gt;measure&lt;/strong&gt; what’s happening and &lt;strong&gt;govern&lt;/strong&gt; how it behaves.&lt;/p&gt;

&lt;p&gt;Last week I shared &lt;em&gt;Why Metrics Matter&lt;/em&gt; — how velocity, predictability, and flow efficiency quietly fixed delivery pain on our AI teams.&lt;br&gt;&lt;br&gt;
Today I’m taking that one step further with &lt;strong&gt;PromptOps Policy Coach&lt;/strong&gt; — a platform that turns those same delivery insights into &lt;strong&gt;governable AI systems&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 This article is part of my &lt;strong&gt;Weekend AI Project Series&lt;/strong&gt;, where I turn weekend build ideas into production-ready AI systems.&lt;br&gt;&lt;br&gt;
👉 Read Part 1 — &lt;a href="https://dev.to/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1"&gt;Adventures in Vibe Coding&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;🎯 TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;What it is:&lt;/strong&gt; A policy Q&amp;amp;A system that runs one question through &lt;strong&gt;five&lt;/strong&gt; prompt frameworks and tracks cost/performance in real time.
&lt;/li&gt;
&lt;li&gt;💡 &lt;strong&gt;Why it exists:&lt;/strong&gt; To make AI &lt;strong&gt;consistent, explainable, and affordable&lt;/strong&gt; across HR/Legal/IT.
&lt;/li&gt;
&lt;li&gt;⚙️ &lt;strong&gt;What it proves:&lt;/strong&gt; Enterprise AI isn’t “just prompts.” It’s &lt;strong&gt;patterns + governance + metrics&lt;/strong&gt; working together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frameworks:&lt;/strong&gt; CRAFT, CRISPE, Chain-of-Thought, Constitutional AI, ReAct&lt;br&gt;&lt;br&gt;
&lt;strong&gt;RAG:&lt;/strong&gt; Custom numpy-based retrieval&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; &amp;lt; $0.01/query (OpenAI GPT-4o-mini)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Deploy:&lt;/strong&gt; Docker or Google Cloud Shell&lt;/p&gt;


&lt;h2&gt;💬 The backstory&lt;/h2&gt;

&lt;p&gt;In big orgs, three things kill AI rollouts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent outputs&lt;/strong&gt; — same question, five answers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runaway costs&lt;/strong&gt; — invisible API usage that eats budgets alive.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow adoption&lt;/strong&gt; — heavy infra that scares off internal teams.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So I standardized how the AI &lt;strong&gt;reasons&lt;/strong&gt;, made sources explicit with &lt;strong&gt;RAG&lt;/strong&gt;, and surfaced &lt;strong&gt;cost &amp;amp; performance&lt;/strong&gt; as first-class citizens. That became PromptOps.&lt;/p&gt;


&lt;h2&gt;⚙️ A quick tour&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1) One clean interface&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Ask a policy question. Pick a framework. Get the answer &lt;em&gt;and&lt;/em&gt; see what it cost. Simple and transparent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Five brains, one question&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ReAct&lt;/strong&gt; — think → act → observe
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRISPE&lt;/strong&gt; — helpful, human tone
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CRAFT&lt;/strong&gt; — exec-level structure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chain-of-Thought&lt;/strong&gt; — careful reasoning
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constitutional AI&lt;/strong&gt; — principles + self-checks&lt;/li&gt;
&lt;/ul&gt;
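The repo's actual prompt templates aren't reproduced in this post, but the "five brains, one question" idea can be sketched as a template registry keyed by framework name. The template wording below is illustrative, not the project's real prompts:

```python
# Hypothetical sketch: one question routed through five prompt frameworks.
# Template wording is illustrative, not the repo's actual prompts.
FRAMEWORKS = {
    "CRAFT": "Context: {context}\nRole: policy analyst\nAction: answer\nFormat: exec summary\nTask: {question}",
    "CRISPE": "Capacity: helpful HR assistant\nInsight: {context}\nStatement: {question}\nPersonality: warm, plain language",
    "Chain-of-Thought": "Using only this context:\n{context}\n\nQuestion: {question}\nThink step by step, then answer.",
    "Constitutional AI": "Principles: cite sources, admit uncertainty.\nContext: {context}\nQuestion: {question}\nAnswer, then self-check against the principles.",
    "ReAct": "Context: {context}\nQuestion: {question}\nLoop: Thought -> Action -> Observation, then give a final answer.",
}

def build_prompt(framework: str, question: str, context: str) -> str:
    """Render the chosen framework's template for a single query."""
    return FRAMEWORKS[framework].format(question=question, context=context)
```

The point of the registry shape is that every framework sees the identical question and retrieved context, so output differences are attributable to the framework alone.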

&lt;p&gt;&lt;strong&gt;3) RAG that’s boring on purpose&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Custom, numpy-based, dependency-light. Fast and deployable anywhere.&lt;/p&gt;
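The project's retrieval code isn't shown in this post; a minimal numpy-only sketch of the idea (normalize embeddings, rank chunks by cosine similarity) looks like this:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most cosine-similar to the query.

    doc_matrix: (n_chunks, dim) chunk-embedding matrix; query_vec: (dim,).
    """
    doc_norms = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_norms @ q                      # cosine similarity per chunk
    return np.argsort(scores)[::-1][:k].tolist()
```

Nothing here needs a vector database: a few hundred policy chunks fit in one in-memory matrix, which is what keeps the image small.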

&lt;p&gt;&lt;strong&gt;4) Live metrics&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tokens, cost, response time — per framework — because if you can’t &lt;em&gt;see&lt;/em&gt; it, you can’t &lt;em&gt;trust&lt;/em&gt; it.&lt;/p&gt;
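Per-query cost tracking is plain arithmetic once a price sheet is pinned. A sketch, where the per-million-token rates are assumptions to be checked against OpenAI's current pricing page:

```python
# Assumed GPT-4o-mini rates (USD per 1M tokens) -- verify against current pricing.
PRICE_IN, PRICE_OUT = 0.15, 0.60

def query_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one call, from the token counts the API reports back."""
    return prompt_tokens * PRICE_IN / 1e6 + completion_tokens * PRICE_OUT / 1e6

# e.g. a 1,200-token prompt with a 400-token answer stays well under $0.01
cost = query_cost(1200, 400)
```

Logging this per framework is what lets the dashboard compare the five approaches on cost as well as answer quality.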


&lt;h2&gt;🏗️ Architecture&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TB
    A[Company Docs] --&amp;gt; B[Chunk &amp;amp; Index]
    B --&amp;gt; C["Vector Search (numpy + embeddings)"]
    E[User Query] --&amp;gt; D[Multi-Framework Engine]
    C --&amp;gt; D
    D --&amp;gt; F[CRAFT]
    D --&amp;gt; G[CRISPE]
    D --&amp;gt; H[Chain-of-Thought]
    D --&amp;gt; I[Constitutional AI]
    D --&amp;gt; J[ReAct]
    F --&amp;gt; K[OpenAI GPT-4o-mini]
    G --&amp;gt; K
    H --&amp;gt; K
    I --&amp;gt; K
    J --&amp;gt; K
    K --&amp;gt; L[Response + Sources + Metrics]
    L --&amp;gt; M[Cost Tracking]
    L --&amp;gt; O[Dashboard]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;h2&gt;🧩 Engineering highlights&lt;/h2&gt;

&lt;p&gt;✅ Custom RAG &amp;gt; heavy stacks — smaller image, fewer headaches, clearer control.&lt;/p&gt;

&lt;p&gt;✅ Cloud Shell optimized — consistent demo environment (no local setup drama).&lt;/p&gt;

&lt;p&gt;✅ OpenAI v1 client — the current SDK surface, paired with GPT-4o-mini for low per-query cost.&lt;/p&gt;

&lt;p&gt;✅ Defensive code — guarded inputs and fallbacks, so demos degrade gracefully instead of crashing.&lt;/p&gt;

&lt;p&gt;Benchmarks: 2.4–8.4s response | $0.0001–$0.0002/query | &amp;lt;200MB footprint.&lt;/p&gt;

&lt;h2&gt;🚀 What it means for enterprise teams&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;HR/IT/Legal → consistent answers with source links&lt;/li&gt;
&lt;li&gt;Finance → predictable usage and spend&lt;/li&gt;
&lt;li&gt;Compliance → logs and auditability&lt;/li&gt;
&lt;li&gt;Product → compare frameworks and ship what users prefer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a working prototype of how AI governance should feel — transparent, fast, dependable.&lt;/p&gt;

&lt;h2&gt;🛠️ Quick start&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cloud Shell / Local&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/marcusmayo/machine-learning-portfolio.git
cd machine-learning-portfolio/prompt-ops-policy-coach
pip install -r requirements.txt
streamlit run app/enhanced_app.py \
  --server.port 8501 --server.address 0.0.0.0 \
  --browser.serverAddress localhost \
  --browser.gatherUsageStats false \
  --server.enableCORS false --server.enableXsrfProtection false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t policy-coach .
docker run -d -p 8080:8080 --name policy-coach-prod --env-file .env policy-coach
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;🧭 What’s next&lt;/h2&gt;

&lt;p&gt;Framework marketplace · SSO/RBAC · QA suite for prompt consistency · Cost optimizer · Kubernetes scaling.&lt;/p&gt;

&lt;h2&gt;💬 Final thought&lt;/h2&gt;

&lt;p&gt;If &lt;em&gt;Why Metrics Matter&lt;/em&gt; was about measuring, PromptOps is about governing.&lt;br&gt;
Measure → improve. Govern → trust.&lt;/p&gt;

&lt;p&gt;🧠 Read My AI Build Logs&lt;br&gt;
Medium → &lt;a href="https://medium.com/@mayo.marcus" rel="noopener noreferrer"&gt;https://medium.com/@mayo.marcus&lt;/a&gt;&lt;br&gt;
Dev.to → &lt;a href="https://dev.to/marcusmayo"&gt;https://dev.to/marcusmayo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📇 Connect&lt;br&gt;
LinkedIn → &lt;a href="https://lnkd.in/e9CBVihC" rel="noopener noreferrer"&gt;https://lnkd.in/e9CBVihC&lt;/a&gt;&lt;br&gt;
X / Twitter → &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;https://x.com/MarcusMayoAI&lt;/a&gt;&lt;br&gt;
Email → &lt;a href="//marcusmayo.ai@gmail.com"&gt;marcusmayo.ai@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💻 Portfolio&lt;br&gt;
Part 1 — &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio" rel="noopener noreferrer"&gt;https://github.com/marcusmayo/machine-learning-portfolio&lt;/a&gt;&lt;br&gt;
Part 2 — &lt;a href="https://github.com/marcusmayo/ai-ml-portfolio-2" rel="noopener noreferrer"&gt;https://github.com/marcusmayo/ai-ml-portfolio-2&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>devops</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>From ML Beginner to Production Engineer: How I’m Leveling Up My AI Projects</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Sun, 12 Oct 2025 16:26:59 +0000</pubDate>
      <link>https://forem.com/marcusmayo/from-ml-beginner-to-production-engineer-how-im-leveling-up-my-ai-projects-2glg</link>
      <guid>https://forem.com/marcusmayo/from-ml-beginner-to-production-engineer-how-im-leveling-up-my-ai-projects-2glg</guid>
      <description>&lt;p&gt;🎯 &lt;em&gt;From training toy models to shipping real ML systems — here’s what that journey really looks like.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most people start their ML learning journey in Jupyter notebooks. But when you want your model to serve real users, things get serious — and a lot more complex.&lt;/p&gt;

&lt;p&gt;Here’s how the levels break down 👇&lt;/p&gt;




&lt;h3&gt;🧩 Level 1 – Learning the Basics&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Clean datasets (Kaggle, UCI)&lt;/li&gt;
&lt;li&gt;Jupyter notebooks &amp;amp; visualization&lt;/li&gt;
&lt;li&gt;Simple metrics and evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;⚙️ Level 2 – Professional Data Science&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Handling messy, real-world data&lt;/li&gt;
&lt;li&gt;Organized code + config files&lt;/li&gt;
&lt;li&gt;Feature engineering &amp;amp; tuning&lt;/li&gt;
&lt;li&gt;Git for reproducibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;🚀 Level 3 – Machine Learning Engineering&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Containerized model APIs (Docker/FastAPI)&lt;/li&gt;
&lt;li&gt;MLflow for tracking + model registry&lt;/li&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Monitoring &amp;amp; scaling on AWS/GCP&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I'm documenting my path across these levels — moving from education to &lt;em&gt;execution&lt;/em&gt;.&lt;br&gt;&lt;br&gt;
The next phase: &lt;strong&gt;Level 4&lt;/strong&gt;, where models scale, retrain automatically, and support real users.&lt;/p&gt;




&lt;h2&gt;🧠 Read My AI Build Logs&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/marcusmayo"&gt;Weekend AI Project Series on Dev.to&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/in/marcusmayo" rel="noopener noreferrer"&gt;LinkedIn Articles&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;📫 Get In Touch&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://linkedin.com/in/marcusmayo" rel="noopener noreferrer"&gt;Connect with me&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;X / Twitter:&lt;/strong&gt; &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;@MarcusMayoAI&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Email:&lt;/strong&gt; &lt;a href="mailto:marcusmayo.ai@gmail.com"&gt;marcusmayo.ai@gmail.com&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Portfolio Part 1:&lt;/strong&gt; &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio" rel="noopener noreferrer"&gt;AI &amp;amp; MLOps Projects&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>ai</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>🎙️ What Building the AI Interview Analyzer Taught Me About Production ML</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Sat, 11 Oct 2025 12:36:22 +0000</pubDate>
      <link>https://forem.com/marcusmayo/what-building-the-ai-interview-analyzer-taught-me-about-production-ml-26ij</link>
      <guid>https://forem.com/marcusmayo/what-building-the-ai-interview-analyzer-taught-me-about-production-ml-26ij</guid>
      <description>&lt;p&gt;After shipping the AI Interview Analyzer on GCP&lt;br&gt;
, I realized that production-ready AI isn’t about adding more models — it’s about orchestrating them efficiently.&lt;/p&gt;

&lt;p&gt;This build used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI + Whisper for fast audio transcription&lt;/li&gt;
&lt;li&gt;RoBERTa + Toxic-BERT + mDeBERTa for tone and competency scoring&lt;/li&gt;
&lt;li&gt;Gemini 2.0 Flash for contextual feedback&lt;/li&gt;
&lt;li&gt;Compute Engine to handle large audio workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It taught me three truths about real ML deployment:&lt;br&gt;
1️⃣ Infrastructure matters more than model size.&lt;br&gt;
2️⃣ Feedback loops make AI useful, not just functional.&lt;br&gt;
3️⃣ Performance visibility (CloudWatch / GCP Monitoring) builds trust.&lt;/p&gt;

&lt;p&gt;Full article 👇&lt;br&gt;
🔗 &lt;a href="https://dev.to/marcusmayo/building-an-ai-powered-interview-analyzer-on-gcp-31ia"&gt;https://dev.to/marcusmayo/building-an-ai-powered-interview-analyzer-on-gcp-31ia&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📢 Follow my AI builds &amp;amp; insights:&lt;br&gt;
 🐦 &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;@MarcusMayoAI&lt;/a&gt;&lt;br&gt;
 | 🧠 &lt;a href="https://dev.to/marcusmayo"&gt;Dev.to/marcusmayo&lt;/a&gt;&lt;br&gt;
 | 💻 &lt;a href="https://lnkd.in/ezrSSDUR" rel="noopener noreferrer"&gt;GitHub/marcusmayo&lt;/a&gt;&lt;br&gt;
 | 💼 &lt;a href="https://lnkd.in/eNSvdtpH" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>🧠 The Simplest Neural Network That Actually Works</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Fri, 10 Oct 2025 21:59:09 +0000</pubDate>
      <link>https://forem.com/marcusmayo/the-simplest-neural-network-that-actually-works-3j54</link>
      <guid>https://forem.com/marcusmayo/the-simplest-neural-network-that-actually-works-3j54</guid>
      <description>&lt;p&gt;Before tackling multi-layer or transformer architectures, I built the simplest neural network I could — a single-layer perceptron to classify 0s and 1s from the MNIST dataset.&lt;/p&gt;

&lt;p&gt;Project Highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Framework: TensorFlow + Keras&lt;/li&gt;
&lt;li&gt;Architecture: 1 Dense layer, 1 neuron, sigmoid activation&lt;/li&gt;
&lt;li&gt;Optimizer: SGD&lt;/li&gt;
&lt;li&gt;Accuracy: 99.9% test accuracy&lt;/li&gt;
&lt;li&gt;Dataset: MNIST (filtered to digits 0 and 1)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key Takeaway:&lt;br&gt;
Even a one-layer model can teach core ML principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data normalization&lt;/li&gt;
&lt;li&gt;Gradient descent&lt;/li&gt;
&lt;li&gt;Binary cross-entropy&lt;/li&gt;
&lt;li&gt;Evaluation with precision, recall, and F1&lt;/li&gt;
&lt;/ul&gt;
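The notebook itself uses TensorFlow + Keras; the same single-neuron idea can be sketched framework-free in numpy to show exactly what is being learned. The toy 2-D clusters below stand in for the filtered MNIST digits:

```python
import numpy as np

# Single neuron with sigmoid activation, trained by gradient descent on
# binary cross-entropy -- a framework-free sketch on toy separable data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid output in (0, 1)
    grad = p - y                          # gradient of BCE w.r.t. the logit
    w -= lr * X.T @ grad / len(y)
    b -= lr * grad.mean()

accuracy = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
```

The `p - y` line is the whole trick: for sigmoid + cross-entropy the gradient with respect to the logit collapses to prediction minus label, which is why even a one-layer model demonstrates gradient descent cleanly.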

&lt;p&gt;Explore the notebook here 👇&lt;br&gt;
🔗 &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio/blob/main/simple-neural-network.ipynb" rel="noopener noreferrer"&gt;Simple Neural Network Project&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📢 Follow my AI builds &amp;amp; insights:&lt;br&gt;
 🐦 &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;@MarcusMayoAI&lt;/a&gt;&lt;br&gt;
 | 🧠 &lt;a href="https://dev.to/marcusmayo"&gt;Dev.to/marcusmayo&lt;/a&gt;&lt;br&gt;
 | 💻 &lt;a href="https://lnkd.in/ezrSSDUR" rel="noopener noreferrer"&gt;GitHub/marcusmayo&lt;/a&gt;&lt;br&gt;
 | 💼 &lt;a href="https://lnkd.in/eNSvdtpH" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>💡 What Serverless Design Taught Me About AI Cost Optimization</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Thu, 09 Oct 2025 17:36:46 +0000</pubDate>
      <link>https://forem.com/marcusmayo/what-serverless-design-taught-me-about-ai-cost-optimization-24f5</link>
      <guid>https://forem.com/marcusmayo/what-serverless-design-taught-me-about-ai-cost-optimization-24f5</guid>
      <description>&lt;p&gt;Building the Edenred Invoice Assistant&lt;br&gt;
 taught me something simple but powerful:&lt;/p&gt;

&lt;p&gt;Cost optimization is an architecture decision, not a finance one.&lt;/p&gt;

&lt;p&gt;When you design AI systems that know when not to run — like spinning down SageMaker endpoints when idle — you turn efficiency into intelligence.&lt;/p&gt;

&lt;p&gt;This approach saved 90% in AWS cost and improved uptime reliability.&lt;/p&gt;
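The decision behind that number is plain arithmetic: compare an always-on endpoint's monthly bill to paying only for the queries that fall through the fallback layer. The instance rate below is an assumed placeholder, not the project's actual bill:

```python
# Assumed numbers for illustration -- substitute your real rates.
ENDPOINT_HOURLY = 0.23          # small always-on inference instance, USD/hr
HOURS_PER_MONTH = 730

def monthly_cost(queries: int, fallback_ratio: float, per_inference: float = 0.0001) -> float:
    """Cost when the endpoint is torn down: only non-fallback queries
    pay per-inference; pattern-matched fallback answers are near-free."""
    return queries * (1 - fallback_ratio) * per_inference

always_on = ENDPOINT_HOURLY * HOURS_PER_MONTH            # idle or not, you pay
with_fallback = monthly_cost(10_000, fallback_ratio=0.95)
savings = 1 - with_fallback / always_on
```

Under these assumed rates the fallback design clears the ~90% savings mark easily, which is the architectural point: the model runs only when it has to.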

&lt;p&gt;You can explore the full architecture and lessons learned here 👇&lt;br&gt;
🔗 &lt;a href="https://dev.to/marcusmayo/edenred-invoice-assistant-serverless-ai-chatbot-for-invoice-payment-support-4bpn"&gt;https://dev.to/marcusmayo/edenred-invoice-assistant-serverless-ai-chatbot-for-invoice-payment-support-4bpn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;👋 Follow for real-world AI builds:&lt;br&gt;
 🐦 &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;@MarcusMayoAI&lt;/a&gt;&lt;br&gt;
 | 🧠 &lt;a href="https://dev.to/marcusmayo"&gt;Dev.to&lt;/a&gt;&lt;br&gt;
 | 💻 &lt;a href="https://github.com/marcusmayo" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;br&gt;
 | 💼 &lt;a href="https://www.linkedin.com/in/marcusmayo" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>architecture</category>
      <category>ai</category>
      <category>aws</category>
    </item>
    <item>
      <title>🤖 Edenred Invoice Assistant – Serverless AI Chatbot for Invoice &amp; Payment Support</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Mon, 06 Oct 2025 23:55:53 +0000</pubDate>
      <link>https://forem.com/marcusmayo/edenred-invoice-assistant-serverless-ai-chatbot-for-invoice-payment-support-4bpn</link>
      <guid>https://forem.com/marcusmayo/edenred-invoice-assistant-serverless-ai-chatbot-for-invoice-payment-support-4bpn</guid>
      <description>&lt;p&gt;🤖 Edenred Invoice Assistant – Serverless AI Chatbot for Invoice &amp;amp; Payment Support&lt;br&gt;
Part of my ongoing Weekend AI Project Series where I turn weekend experiments into production-grade AI systems.&lt;/p&gt;

&lt;p&gt;🎯 Problem &amp;amp; Goal&lt;br&gt;
Invoice and payment queries overwhelm finance teams — repetitive, predictable, and time-consuming.&lt;br&gt;
I wanted to build a serverless AI assistant that handles these inquiries in real-time, with zero infrastructure overhead and full cost control.&lt;/p&gt;

&lt;p&gt;☁️ Architecture Overview&lt;br&gt;
Tech Stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 AWS Lambda — Serverless compute for sub-second responses&lt;/li&gt;
&lt;li&gt;🧩 Amazon SageMaker — Model fine-tuning and batch training&lt;/li&gt;
&lt;li&gt;📦 S3 — Training data, model artifacts, and logs&lt;/li&gt;
&lt;li&gt;🔗 API Gateway — REST endpoint serving chatbot responses&lt;/li&gt;
&lt;li&gt;📊 CloudWatch — Monitoring and error tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend → API Gateway → AWS Lambda → SageMaker Inference (trained model)
                                   ↳ Fallback logic → Pre-trained responses (cost-optimized)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Key Design Principle:&lt;br&gt;
The SageMaker endpoint spins down post-training. Fallback logic (pattern-based responses) handles common queries — maintaining user experience while cutting costs by ~90%.&lt;/p&gt;

&lt;p&gt;💡 Features Implemented&lt;br&gt;
✅ Smart Response System:&lt;br&gt;
Handles invoice submissions, payment status, and account queries.&lt;/p&gt;

&lt;p&gt;✅ SageMaker Training Pipeline:&lt;br&gt;
Fine-tuned Hugging Face model for domain-specific language understanding.&lt;/p&gt;

&lt;p&gt;✅ Cost-Optimized Fallback Logic:&lt;br&gt;
Pattern-matched responses cover 95% of user queries without active inference cost.&lt;/p&gt;

&lt;p&gt;✅ Lambda Optimization:&lt;br&gt;
Pre-loaded model weights and response caching = sub-second latency.&lt;/p&gt;

&lt;p&gt;✅ Enterprise Readiness:&lt;br&gt;
CORS-enabled, IAM roles configured, and robust error handling for production-grade uptime.&lt;/p&gt;

&lt;p&gt;🧩 System Highlights&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Performance&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Response Time&lt;/td&gt;&lt;td&gt;&amp;lt; 1s average&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Accuracy (trained scenarios)&lt;/td&gt;&lt;td&gt;95%+&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Uptime&lt;/td&gt;&lt;td&gt;100% (fallback system)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cost Reduction&lt;/td&gt;&lt;td&gt;90% vs always-on SageMaker&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Deployment&lt;/td&gt;&lt;td&gt;Serverless (Lambda + API Gateway)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;📊 Chatbot in Action&lt;br&gt;
Demo Scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“How do I submit an invoice?”&lt;/li&gt;
&lt;li&gt;“Check payment status.”&lt;/li&gt;
&lt;li&gt;“Why was my invoice rejected?”&lt;/li&gt;
&lt;li&gt;“How do I update my bank details?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wue5uq5bmab8ujlk4f4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wue5uq5bmab8ujlk4f4.png" alt="AI Chatbot UI showing conversation flow and results" width="800" height="784"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧱 Fallback ensures continuity even if the model endpoint is offline.&lt;/p&gt;

&lt;p&gt;⚙️ Technical Implementation&lt;br&gt;
Lambda Function&lt;br&gt;
Handles logic, invokes SageMaker if active, else triggers fallback:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Fallback response dictionary (pattern-matched, no inference cost)
fallback_responses = {
    "invoice submission": "You can submit invoices via the Finance Portal under 'Upload Invoice'.",
    "payment status": "Payments are processed every Thursday. You can track them via your dashboard.",
    "bank details": "Update your bank info under Profile &amp;gt; Payment Settings.",
}

def handler(event, context):
    query = event["queryStringParameters"]["q"].lower()
    # Serve a canned answer when a known topic appears in the query
    for pattern, response in fallback_responses.items():
        if pattern in query:
            return response
    # Otherwise invoke the (possibly active) SageMaker endpoint
    return sagemaker_inference(query)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;🧱 Deployment Details&lt;br&gt;
Environment Setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;Dependencies in requirements.txt&lt;/li&gt;
&lt;li&gt;Deploy via AWS SAM CLI or Serverless Framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Train model → Save artifact to S3&lt;/li&gt;
&lt;li&gt;Deploy Lambda with model fallback logic&lt;/li&gt;
&lt;li&gt;Integrate API Gateway endpoint&lt;/li&gt;
&lt;li&gt;Configure CloudWatch for monitoring&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🧭 Key Learnings&lt;br&gt;
1️⃣ Cost vs. Performance: Smart fallbacks drastically reduce inference costs without hurting UX.&lt;br&gt;
2️⃣ Serverless Design: Lambda is ideal for low-frequency workloads with instant scale.&lt;br&gt;
3️⃣ MLOps Simplified: SageMaker pipelines streamline model iteration and retraining.&lt;br&gt;
4️⃣ Enterprise Fit: Combining AI logic with predictable cost makes adoption easier for finance teams.&lt;/p&gt;

&lt;p&gt;🧱 Repository &amp;amp; Access&lt;br&gt;
📁 Code: github.com/marcusmayo/ai-ml-portfolio-2&lt;/p&gt;

&lt;p&gt;🧠 Main Portfolio: github.com/marcusmayo/machine-learning-portfolio&lt;/p&gt;

&lt;p&gt;💬 Closing Thoughts&lt;br&gt;
Building Edenred Invoice Assistant reinforced one key idea:&lt;/p&gt;

&lt;p&gt;“Intelligent cost optimization is as valuable as model accuracy.”&lt;/p&gt;

&lt;p&gt;This project shows how to merge AI innovation with business pragmatism — a skill every ML product engineer needs.&lt;/p&gt;

&lt;p&gt;Follow for More:&lt;/p&gt;

&lt;p&gt;🧠 &lt;a href="https://dev.to/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1"&gt;Weekend AI Project Series&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💼 LinkedIn &lt;a href="https://linkedin.com/in/marcusmayo" rel="noopener noreferrer"&gt;Connect with me&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💻 GitHub &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio" rel="noopener noreferrer"&gt;Live Projects&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>aws</category>
      <category>python</category>
    </item>
    <item>
      <title>🚀 Weekend AI Project Series: Adventures in Vibe Coding</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Mon, 06 Oct 2025 17:12:30 +0000</pubDate>
      <link>https://forem.com/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1</link>
      <guid>https://forem.com/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1</guid>
      <description>&lt;h1&gt;🚀 Weekend AI Project Series: Adventures in Vibe Coding&lt;/h1&gt;

&lt;p&gt;
The &lt;strong&gt;Weekend AI Project Series&lt;/strong&gt; turns 48-hour builds into &lt;strong&gt;production-ready AI systems&lt;/strong&gt;.  
Each episode explores a new MLOps challenge — from architecture tradeoffs to cost optimization and deployment pipelines — with real code, measurable outcomes, and product thinking at the core.
&lt;/p&gt;

&lt;h3&gt;✅ Every project demonstrates:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Real-world ML pipelines&lt;/li&gt;
  &lt;li&gt;Cloud deployment (AWS / GCP)&lt;/li&gt;
  &lt;li&gt;MLOps best practices&lt;/li&gt;
  &lt;li&gt;Cost-optimization strategies&lt;/li&gt;
  &lt;li&gt;Lessons from production AI delivery&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;🎯 Recent Episodes&lt;/h2&gt;

&lt;p&gt;
  &lt;a href="https://dev.to/marcusmayo/building-an-ai-powered-interview-analyzer-on-gcp-113h"&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbc17s4te7du3wrtgbslo.png" alt="AI Interview Analyzer on GCP — Episode 1" width="800" height="533"&gt;
  &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://dev.to/marcusmayo/ederned-invoice-assistant-serverless-ai-chatbot-for-invoice-support-4g8n"&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqcnrme3q2rahjqktxk9.png" alt="Ederned Invoice Assistant on AWS — Episode 2" width="800" height="533"&gt;
  &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://dev.to/marcusmayo/the-simplest-neural-network-that-actually-works-3j54"&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3o26nwc165fo0ys2ia1w.png" alt="The Simplest Neural Network That Actually Works — Episode 3" width="800" height="800"&gt;
  &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://dev.to/marcusmayo/promptops-policy-coach-from-metrics-to-mechanisms-you-can-trust-c5f"&gt;
    &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy86lc7p9zchvrsxb5jez.png" alt="PromptOps Policy Coach — Episode 4" width="800" height="450"&gt;
  &lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;✨ Episode 5 drops this weekend — stay tuned for the next build in the Adventures in Vibe Coding series.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;🐙 All Projects on GitHub&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://github.com/marcusmayo/ai-ml-portfolio" rel="noopener noreferrer"&gt;AI + ML Portfolio (Part 1)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://github.com/marcusmayo/ai-ml-portfolio-2" rel="noopener noreferrer"&gt;AI + ML Portfolio (Part 2)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;🔗 Follow Along for Weekend AI Builds&lt;/h2&gt;

&lt;p&gt;
Each weekend = one real-world AI product built, shipped, and documented.  
Follow for hands-on experiments in &lt;strong&gt;AI Engineering&lt;/strong&gt;, &lt;strong&gt;MLOps&lt;/strong&gt;, and &lt;strong&gt;Product Strategy&lt;/strong&gt;.
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;📖 &lt;strong&gt;Weekend AI Project Series (Medium List):&lt;/strong&gt; &lt;a href="https://medium.com/@mayo.marcus/list/weekend-ai-project-series-adventures-in-vibe-coding-26527bc4d47b" rel="noopener noreferrer"&gt;medium.com/@mayo.marcus/list/weekend-ai-project-series-adventures-in-vibe-coding&lt;/a&gt;
&lt;/li&gt;
  &lt;li&gt;💡 &lt;strong&gt;Dev.to Hub:&lt;/strong&gt; &lt;a href="https://dev.to/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1"&gt;dev.to/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding&lt;/a&gt;
&lt;/li&gt;
  &lt;li&gt;💼 &lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://lnkd.in/e9CBVihC" rel="noopener noreferrer"&gt;lnkd.in/e9CBVihC&lt;/a&gt;
&lt;/li&gt;
  &lt;li&gt;🐦 &lt;strong&gt;X (Twitter):&lt;/strong&gt; &lt;a href="https://x.com/MarcusMayoAI" rel="noopener noreferrer"&gt;x.com/MarcusMayoAI&lt;/a&gt;
&lt;/li&gt;
  &lt;li&gt;🐙 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/marcusmayo" rel="noopener noreferrer"&gt;github.com/marcusmayo&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;✨ Turning weekend ideas into production-grade AI systems — one episode at a time.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>portfolio</category>
      <category>devops</category>
      <category>cloud</category>
      <category>ai</category>
    </item>
    <item>
      <title>🎙️ Building an AI-Powered Interview Analyzer on GCP</title>
      <dc:creator>marcusmayo</dc:creator>
      <pubDate>Mon, 06 Oct 2025 16:01:31 +0000</pubDate>
      <link>https://forem.com/marcusmayo/building-an-ai-powered-interview-analyzer-on-gcp-31ia</link>
      <guid>https://forem.com/marcusmayo/building-an-ai-powered-interview-analyzer-on-gcp-31ia</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A production-grade AI project that listens, scores, and coaches — built over a single weekend as part of my &lt;a href="https://www.linkedin.com/pulse/weekend-ai-project-series-adventures-vibe-coding-marcus-wubie" rel="noopener noreferrer"&gt;Weekend AI Project Series: Adventures in Vibe Coding&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI interviews are messy. Human feedback is subjective.&lt;br&gt;&lt;br&gt;
So I built a system that listens, transcribes, analyzes, and &lt;em&gt;mentors&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In this deep dive, I’ll show you how I:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployed a &lt;strong&gt;FastAPI backend&lt;/strong&gt; with &lt;strong&gt;Whisper ASR&lt;/strong&gt; for transcription
&lt;/li&gt;
&lt;li&gt;Integrated &lt;strong&gt;3 NLP models&lt;/strong&gt; (RoBERTa, Toxic-BERT, mDeBERTa) for sentiment and competency scoring
&lt;/li&gt;
&lt;li&gt;Added &lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt; for human-like feedback
&lt;/li&gt;
&lt;li&gt;Migrated from &lt;strong&gt;Cloud Run&lt;/strong&gt; to &lt;strong&gt;Compute Engine&lt;/strong&gt; for production workloads
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you’ll see how to turn a weekend experiment into a &lt;strong&gt;fully functional, production-ready AI application&lt;/strong&gt; — the kind of build that gets noticed by both engineers and hiring managers.  &lt;/p&gt;




&lt;p&gt;🚀 Project Overview&lt;/p&gt;

&lt;p&gt;This project demonstrates how to build a production-ready AI interview analysis system — one that evaluates communication quality, professionalism, and competency in recorded interviews.&lt;/p&gt;

&lt;p&gt;It combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎙️ Speech-to-text (ASR) using Whisper&lt;/li&gt;
&lt;li&gt;🧠 NLP scoring with RoBERTa, Toxic-BERT, and mDeBERTa&lt;/li&gt;
&lt;li&gt;🤖 Feedback generation with Gemini 2.0 Flash&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system produces quantitative scores, segment-level analytics, and contextual AI feedback — the kind that turns interview recordings into actionable coaching data.&lt;/p&gt;

&lt;p&gt;⚙️ Architecture Overview&lt;/p&gt;

&lt;p&gt;The pipeline runs on Google Cloud Compute Engine (n1-standard-16) with the following key components:&lt;/p&gt;

&lt;p&gt;Audio Upload → Whisper ASR → NLP Scoring → Ensemble Aggregation → Gemini Feedback → UI Visualization&lt;/p&gt;

&lt;p&gt;Components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frontend (HTML + JS):&lt;/strong&gt; handles uploads, displays scores and feedback.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FastAPI Backend (Python 3.11):&lt;/strong&gt; routes processing, manages inference requests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Whisper Models (ASR):&lt;/strong&gt; supports tiny → medium variants for speed/accuracy tradeoffs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NLP Models (Hugging Face):&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cardiffnlp/twitter-roberta-base-sentiment&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unitary/toxic-bert&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MoritzLaurer/mDeBERTa-v3-base-mnli-xnli&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM Feedback:&lt;/strong&gt; powered by Google Gemini 2.0 Flash for summarization and recommendations.&lt;/li&gt;
&lt;/ul&gt;
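&lt;p&gt;The flow above can be sketched as one function chaining the stages. This is an illustrative sketch, not the project’s actual code: the stage callables stand in for the real &lt;code&gt;utils/&lt;/code&gt; modules.&lt;/p&gt;

```python
# Illustrative sketch of the pipeline flow (stage callables are stand-ins
# for the real modules): Audio Upload -> Whisper ASR -> NLP Scoring ->
# Ensemble Aggregation -> Gemini Feedback.

def run_pipeline(audio_path, transcribe, score, aggregate, feedback):
    transcript = transcribe(audio_path)      # Whisper ASR
    scores = score(transcript)               # NLP model scoring
    overall = aggregate(scores)              # weighted ensemble
    advice = feedback(transcript, scores)    # Gemini summary
    return {"transcript": transcript, "scores": scores,
            "overall": overall, "feedback": advice}
```

Swapping any stage (say, a larger Whisper variant) then touches only one callable, which is what makes the tiny → medium model selection cheap to add.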

&lt;p&gt;🧠 Core ML Pipeline&lt;/p&gt;

&lt;p&gt;Here’s how each component works together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Transcription (ASR)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Whisper model transcribes the uploaded interview audio (MP3/WAV/M4A).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from faster_whisper import WhisperModel

model = WhisperModel("tiny", device="cuda")
segments, _ = model.transcribe("interview.m4a")
transcript = " ".join([s.text for s in segments])
&lt;/code&gt;&lt;/pre&gt;

&lt;ol start="2"&gt;
&lt;li&gt;NLP Scoring&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each transcript segment is passed through three different models:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from transformers import pipeline

sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")
competency = pipeline("zero-shot-classification", model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")

result = {
    "sentiment": sentiment(transcript[:512])[0]["score"],
    "toxicity": 1 - toxicity(transcript[:512])[0]["score"],  # inverted: higher = more professional
    "competency": competency(transcript[:512], ["leadership", "communication", "technical skill"]),
}
&lt;/code&gt;&lt;/pre&gt;
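&lt;p&gt;The segment-level analytics can be sketched like this; character-based chunking is an assumption for illustration (the real &lt;code&gt;timeline_analyzer&lt;/code&gt; may segment differently), and &lt;code&gt;score_fn&lt;/code&gt; stands in for any of the pipelines above.&lt;/p&gt;

```python
# Sketch of segment-level scoring: chunk the transcript and score each
# chunk separately to build a performance timeline. Character-based
# chunking is an assumption here, used only for illustration.

def segment_scores(transcript, score_fn, chunk_chars=512):
    chunks = [transcript[i:i + chunk_chars]
              for i in range(0, len(transcript), chunk_chars)]
    return [score_fn(chunk) for chunk in chunks]
```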

&lt;ol start="3"&gt;
&lt;li&gt;Ensemble Scoring System&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The scores are normalized and weighted across five dimensions:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;Weight&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Sentiment&lt;/td&gt;&lt;td&gt;0.25&lt;/td&gt;&lt;td&gt;Emotional tone&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Toxicity&lt;/td&gt;&lt;td&gt;0.20&lt;/td&gt;&lt;td&gt;Professionalism&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Competency&lt;/td&gt;&lt;td&gt;0.25&lt;/td&gt;&lt;td&gt;Skill fit&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Keywords&lt;/td&gt;&lt;td&gt;0.15&lt;/td&gt;&lt;td&gt;Domain-specific terms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Filler Words&lt;/td&gt;&lt;td&gt;0.15&lt;/td&gt;&lt;td&gt;Clarity of expression&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This produces an overall “Interview Fit Score” from 0 to 100.&lt;/p&gt;
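&lt;p&gt;As a sketch, the weighted aggregation can look like this (weights taken from the table above; it assumes each component score has already been normalized to the 0–1 range):&lt;/p&gt;

```python
# Sketch of the five-component weighted ensemble. Assumes each component
# score has already been normalized to [0, 1].
WEIGHTS = {"sentiment": 0.25, "toxicity": 0.20, "competency": 0.25,
           "keywords": 0.15, "filler_words": 0.15}

def interview_fit_score(components):
    """Return the 0-100 Interview Fit Score from normalized components."""
    return round(100 * sum(WEIGHTS[k] * components[k] for k in WEIGHTS), 1)
```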

&lt;ol start="4"&gt;
&lt;li&gt;AI Feedback (Gemini Integration)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After scoring, Gemini 2.0 Flash generates structured feedback:&lt;/p&gt;

&lt;p&gt;prompt = f"""&lt;br&gt;
You are an AI interviewer. Based on the following transcript and scores:&lt;br&gt;
{transcript[:2000]}&lt;br&gt;
Scores: {result}&lt;br&gt;
Provide:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;3 Strengths&lt;/li&gt;
&lt;li&gt;3 Areas for Improvement&lt;/li&gt;
&lt;li&gt;2 Next Steps
"""&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;response = gemini.generate_text(prompt)&lt;/p&gt;

&lt;p&gt;Output Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Strengths:&lt;/strong&gt; Excellent communication and positive tone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improvement:&lt;/strong&gt; Needs stronger technical examples.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Next Steps:&lt;/strong&gt; Practice STAR method; refine domain language.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧩 Visualization&lt;/p&gt;

&lt;p&gt;The frontend visualizes scores with color-coded progress bars and an NLP-driven performance timeline:&lt;/p&gt;

&lt;p&gt;📊 Dashboard Example&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31ppvwxpwqqvo795xkp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31ppvwxpwqqvo795xkp2.png" alt="📸 Figure 1: Scoring dashboard with live component breakdowns (Example dashboard showing sentiment, toxicity, and competency breakdowns)" width="530" height="707"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📋 AI Feedback Example&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf035bfel61n2pcxkje2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf035bfel61n2pcxkje2.png" alt="📸 Figure 2: AI-generated interview feedback and improvement plan (Gemini-powered strengths, improvement areas, and next steps)" width="533" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🧱 Deployment Details&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cloud:&lt;/strong&gt; Google Cloud Compute Engine&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Machine:&lt;/strong&gt; n1-standard-16 (16 vCPUs, 64GB RAM)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environment:&lt;/strong&gt; Dockerized FastAPI service&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage:&lt;/strong&gt; Local + Cloud Storage (optional for large files)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monitoring:&lt;/strong&gt; Basic logging via Cloud Logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: The system originally ran on Cloud Run, but due to the 32MB file upload limit, it was migrated to Compute Engine for unrestricted workloads.&lt;/p&gt;

&lt;p&gt;⚠️ Challenges &amp;amp; Fixes&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Issue&lt;/th&gt;&lt;th&gt;Root Cause&lt;/th&gt;&lt;th&gt;Resolution&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Cloud Run upload limit&lt;/td&gt;&lt;td&gt;32MB request cap&lt;/td&gt;&lt;td&gt;Migrated to Compute Engine VM&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Long Whisper inference&lt;/td&gt;&lt;td&gt;Model size vs. time&lt;/td&gt;&lt;td&gt;Added model selection (tiny → medium)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Flat score ranges&lt;/td&gt;&lt;td&gt;Heuristic-only scoring&lt;/td&gt;&lt;td&gt;Replaced with NLP-based segment scoring&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Dependency errors&lt;/td&gt;&lt;td&gt;Missing &lt;code&gt;faster_whisper&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Pinned requirements + venv isolation&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Frontend API mismatch&lt;/td&gt;&lt;td&gt;Response schema drift&lt;/td&gt;&lt;td&gt;Unified response format + error handling&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
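&lt;p&gt;The last fix, a unified response format, can be as small as one envelope that every endpoint returns, so the frontend parses a single schema on both success and error paths. A minimal sketch (field names are assumptions for illustration):&lt;/p&gt;

```python
# Minimal sketch of a unified API response envelope. One shape for both
# success and error paths keeps the frontend parser simple and stops
# schema drift. Field names here are assumptions, not the project's.

def api_response(ok, data=None, error=None):
    return {"ok": ok, "data": data, "error": error}
```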

&lt;p&gt;🔍 Key Learnings&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure matters&lt;/strong&gt; — serverless is not always production-friendly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Speed/accuracy tradeoff&lt;/strong&gt; — Whisper tiny can run roughly 8× faster than medium while keeping about 90% of the accuracy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Heuristics ≠ ML&lt;/strong&gt; — real models make insights meaningful.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UX is part of ML&lt;/strong&gt; — users need visible progress and clear outcomes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🧭 Future Roadmap&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;WhisperX Word-Level Analysis&lt;/strong&gt; → enables clickable word-level scoring visualization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Role-Aware Rubrics&lt;/strong&gt; → zero-shot matching between candidate responses and job descriptions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time SSE Updates&lt;/strong&gt; → show live progress of transcription and analysis in the UI.&lt;/li&gt;
&lt;/ul&gt;
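&lt;p&gt;The SSE wire format itself is simple: each message is an &lt;code&gt;event:&lt;/code&gt; line, a &lt;code&gt;data:&lt;/code&gt; line, and a blank line. A minimal formatter sketch for the roadmap item (event names are assumptions):&lt;/p&gt;

```python
import json

# Minimal Server-Sent Events formatter: one "event:" line, one "data:"
# line with a JSON payload, then a blank line to terminate the message.

def sse_event(event, data):
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

A streaming endpoint would yield one such message per pipeline stage so the UI can show live transcription and analysis progress.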

&lt;p&gt;🧰 Tech Stack Summary&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Category&lt;/th&gt;&lt;th&gt;Tools / Services&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Cloud&lt;/td&gt;&lt;td&gt;GCP Compute Engine&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Backend&lt;/td&gt;&lt;td&gt;FastAPI, Python 3.11&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ML&lt;/td&gt;&lt;td&gt;Whisper, RoBERTa, Toxic-BERT, mDeBERTa&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;LLM&lt;/td&gt;&lt;td&gt;Gemini 2.0 Flash&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Frontend&lt;/td&gt;&lt;td&gt;HTML + JS&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Infra&lt;/td&gt;&lt;td&gt;Docker, venv, Cloud Logging&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;📂 Project Structure&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;interview-predictor/
├── app.py                    # FastAPI backend
├── utils/
│   ├── asr_processor.py      # Whisper transcription
│   ├── nlp_analyzer.py       # NLP model scoring
│   ├── ensemble_scorer.py    # Weighted aggregation
│   ├── timeline_analyzer.py  # Segment analysis
│   └── llm_feedback.py       # Gemini integration
├── static/
│   └── index.html            # Frontend UI
├── requirements.txt          # Dependencies
└── Dockerfile                # Deployment setup
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;📁 GitHub&lt;/p&gt;

&lt;p&gt;🧠 Main portfolio: &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio" rel="noopener noreferrer"&gt;https://github.com/marcusmayo/machine-learning-portfolio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🚀 AI/ML portfolio (new repo): &lt;a href="https://github.com/marcusmayo/ai-ml-portfolio-2" rel="noopener noreferrer"&gt;https://github.com/marcusmayo/ai-ml-portfolio-2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(New projects will be added here as the portfolio expands.)&lt;/p&gt;

&lt;p&gt;💬 Closing Thoughts&lt;/p&gt;

&lt;p&gt;This project taught me that the hardest part of AI engineering isn’t model tuning — it’s designing systems that work under real-world constraints.&lt;/p&gt;

&lt;p&gt;If you’re an ML engineer, data scientist, or product builder exploring AI system design, this project is a great blueprint to start from.&lt;/p&gt;

&lt;p&gt;Connect &amp;amp; Collaborate&lt;br&gt;
I’m open to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🤝 AI Product Coaching&lt;/li&gt;
&lt;li&gt;🧠 Consulting on AI/ML System Design&lt;/li&gt;
&lt;li&gt;💼 Collaborations with startups &amp;amp; innovation teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Follow my work:&lt;br&gt;
🧠 &lt;a href="https://dev.to/marcusmayo/weekend-ai-project-series-adventures-in-vibe-coding-gk1"&gt;Weekend AI Project Series&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;💼 LinkedIn &lt;a href="https://linkedin.com/in/marcusmayo" rel="noopener noreferrer"&gt;Connect with me&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;💻 GitHub &lt;a href="https://github.com/marcusmayo/machine-learning-portfolio" rel="noopener noreferrer"&gt;Live Projects&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mlops</category>
      <category>python</category>
      <category>gcp</category>
    </item>
  </channel>
</rss>
