<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CIZO</title>
    <description>The latest articles on Forem by CIZO (@cizo).</description>
    <link>https://forem.com/cizo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3690067%2F12b7e93e-3c52-4a0b-b919-98e7e6f4267f.jpg</url>
      <title>Forem: CIZO</title>
      <link>https://forem.com/cizo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cizo"/>
    <language>en</language>
    <item>
      <title>We Built an AI That Doesn't Guess: Architecture for Industrial Component Sourcing</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Wed, 15 Apr 2026 10:55:30 +0000</pubDate>
      <link>https://forem.com/cizo/we-built-an-ai-that-doesnt-guess-architecture-for-industrial-component-sourcing-4kbn</link>
      <guid>https://forem.com/cizo/we-built-an-ai-that-doesnt-guess-architecture-for-industrial-component-sourcing-4kbn</guid>
      <description>&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj7osw3ddg97v81tjbjv.png" alt=" " width="800" height="533"&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI search systems follow a flow that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input → LLM → Results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a lot of domains, that's fine. Imprecise results in a playlist recommender or a content search tool are annoying. You try again.&lt;/p&gt;

&lt;p&gt;But &lt;a href="https://cizotech.com/how-we-built-an-ai-powered-industrial-sourcing-system/" rel="noopener noreferrer"&gt;we recently built an AI-powered sourcing system for industrial components&lt;/a&gt; — bolts, springs, fasteners — and in that domain, an imprecise result doesn't mean a slightly wrong recommendation. It means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wrong parameter → wrong part → real failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we threw out the standard architecture and built something different. This post is a technical breakdown of what we built, why, and the design principles behind it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: AI Guesses. Always.
&lt;/h2&gt;

&lt;p&gt;When a user types a vague query into a standard AI search system, the LLM fills in the blanks. It has to — that's what it does. It infers, interpolates, and presents a confident output.&lt;/p&gt;

&lt;p&gt;Here's the failure mode we kept running into during initial testing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User input:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"High load spring for a small space"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What an uncontrolled AI might assume:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load range: &lt;em&gt;something that seems high, based on training data patterns&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Material: &lt;em&gt;steel, probably&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Dimensions: &lt;em&gt;compact, probably&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Standard: &lt;em&gt;whatever seems relevant&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result looks technically precise. The specs look plausible. But they were never validated against engineering constraints. The system guessed, and it presented that guess with full confidence.&lt;/p&gt;

&lt;p&gt;In industrial procurement, this is the most dangerous type of error: &lt;strong&gt;the confident wrong answer.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Design Principle
&lt;/h2&gt;

&lt;p&gt;Early in this project, we had a realization that reframed the entire architecture:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI is excellent at understanding intent. AI is not reliable at making technical decisions.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These are two different cognitive tasks. We needed to split them — give AI the job it's genuinely good at, and give the deterministic system the job that requires precision.&lt;/p&gt;

&lt;p&gt;The architecture we landed on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input
    ↓
[AI Layer] — Intent extraction only. No spec decisions.
    ↓
[Parameter Structuring Layer] — Engineering logic maps intent to valid ranges
    ↓
[Controlled Search Layer] — Searches using structured constraints, not raw NL
    ↓
[Validation Layer] — Every candidate checked before surfacing
    ↓
Precise Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer-by-Layer Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: Natural Language Input
&lt;/h3&gt;

&lt;p&gt;Users express queries as engineers actually think on the floor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"I need a corrosion-resistant bolt for outdoor use"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"Spring for high load in a small space"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"Fastener for vibration-heavy environment"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No structured input required. No dropdowns or filter forms.&lt;/p&gt;




&lt;h3&gt;
  
  
  Layer 2: Intent Understanding — The AI's Actual Job
&lt;/h3&gt;

&lt;p&gt;This is the only layer where the LLM has autonomous authority — and it has one job: &lt;strong&gt;structured intent extraction.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the query &lt;code&gt;"I need a corrosion-resistant bolt for outdoor use"&lt;/code&gt;, the AI extracts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"product_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bolt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"use_case"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"outdoor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requirement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"corrosion_resistant"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The AI does &lt;strong&gt;not&lt;/strong&gt; produce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Material specs&lt;/li&gt;
&lt;li&gt;ISO/DIN standards&lt;/li&gt;
&lt;li&gt;Dimensional ranges&lt;/li&gt;
&lt;li&gt;Final search filters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI interprets. Nothing else happens at this layer.&lt;/p&gt;
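&lt;p&gt;A minimal sketch of how that boundary can be enforced, assuming a whitelist-based sanitizer (the &lt;code&gt;sanitize_intent&lt;/code&gt; name and field set are illustrative, not our production API): even when the model oversteps and volunteers a spec, it is stripped before anything downstream sees it.&lt;/p&gt;

```python
# Illustrative sketch: only intent fields may leave the AI layer.
ALLOWED_INTENT_FIELDS = {"product_type", "use_case", "requirement"}

def sanitize_intent(raw_llm_output: dict) -> dict:
    """Keep whitelisted intent keys; drop anything the model volunteered
    (materials, standards, dimensions). Missing fields stay explicitly
    open as None instead of being guessed."""
    intent = {k: v for k, v in raw_llm_output.items()
              if k in ALLOWED_INTENT_FIELDS}
    for field in ALLOWED_INTENT_FIELDS - intent.keys():
        intent[field] = None
    return intent

# Even if the model adds a material, the spec never escapes this layer:
extracted = sanitize_intent({
    "product_type": "bolt",
    "use_case": "outdoor",
    "requirement": "corrosion_resistant",
    "material": "steel",  # overstep -- discarded
})
```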




&lt;h3&gt;
  
  
  Layer 3: Parameter Structuring — Engineering Logic Takes Over
&lt;/h3&gt;

&lt;p&gt;This is the most important layer in the system.&lt;/p&gt;

&lt;p&gt;The extracted intent is passed to a structured parameter engine that maps intent to &lt;strong&gt;valid engineering possibilities&lt;/strong&gt; — not final values, but bounded ranges and candidate sets.&lt;/p&gt;

&lt;p&gt;For our bolt query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;use_case: outdoor + requirement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;corrosion_resistant&lt;/span&gt;
&lt;span class="na"&gt;→ material_candidates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stainless&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Steel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;A2"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stainless&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Steel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;A4"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;→ coating_candidates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Zinc&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;plated&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(not&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;recommended&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;outdoor)"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hot-dip&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;galvanized"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;None&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(A4&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;self-resistant)"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;→ standard_candidates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ISO&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;4017"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ISO&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;4018"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIN&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;931"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIN&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;933"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;→ exclusions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Carbon&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;steel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;without&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coating"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aluminum&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;(load&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;check&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;required)"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two rules this layer enforces absolutely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No parameter is ever assumed&lt;/strong&gt; — every value is derived from engineering rules or left as an open range&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalid combinations are excluded before search&lt;/strong&gt; — the system doesn't search for carbon steel outdoor bolts and then filter them out; it never includes them in the search space at all&lt;/li&gt;
&lt;/ol&gt;
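&lt;p&gt;In code, this layer reduces to a deterministic lookup over engineering rules. A sketch assuming an in-memory rule table (the table contents here are illustrative, not our real data):&lt;/p&gt;

```python
# Illustrative rule table: intent keys map to bounded candidate sets.
RULES = {
    ("outdoor", "corrosion_resistant"): {
        "material_candidates": ["stainless_a2", "stainless_a4"],
        "standard_candidates": ["ISO 4017", "DIN 933"],
        "exclusions": ["carbon_steel_uncoated"],
    },
}

def structure_parameters(intent: dict) -> dict:
    """Deterministic mapping from intent to valid ranges. No rule means
    no guess: the query fails loudly instead of falling back to the LLM."""
    key = (intent["use_case"], intent["requirement"])
    if key not in RULES:
        raise LookupError(f"no engineering rule for {key}; refusing to assume")
    return RULES[key]
```

The important property is the `LookupError` branch: an unknown combination is surfaced as a gap in the rules rather than interpolated.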




&lt;h3&gt;
  
  
  Layer 4: Controlled Search
&lt;/h3&gt;

&lt;p&gt;The search layer receives structured parameters — not the original natural language query. This is a deliberate architectural choice.&lt;/p&gt;

&lt;p&gt;Searching with raw NL queries allows semantic drift: the retrieval system surfaces items that are &lt;em&gt;linguistically related&lt;/em&gt; but &lt;em&gt;technically incompatible&lt;/em&gt;. Structured parameters eliminate this class of error entirely.&lt;/p&gt;

&lt;p&gt;Example search execution for the outdoor bolt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;search_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bolt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;material&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stainless_a2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stainless_a4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iso_4017&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;din_933&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application_compatibility&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outdoor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exclusions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;carbon_steel_uncoated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No fuzzy matching. No semantic retrieval on specs. Structured query against structured data.&lt;/p&gt;




&lt;h3&gt;
  
  
  Layer 5: Validation — Zero Tolerance
&lt;/h3&gt;

&lt;p&gt;Every candidate returned from Layer 4 goes through a validation pass before the user sees anything. This layer catches edge cases the parameter engine might have missed.&lt;/p&gt;

&lt;p&gt;Checks run per candidate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ Parameter compatibility (does this combination make engineering sense?)
✓ Standard compliance (does this part actually conform to the claimed standard?)
✓ Inventory / availability check (is this actually sourceable?)
✓ Constraint compatibility (no conflicts between specs)
✗ Reject if any check fails
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the layer that converts "probably right" into "confirmed right." Nothing passes without clearing all checks.&lt;/p&gt;
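&lt;p&gt;The reject-if-any-fails rule is nearly a one-liner. A sketch with stand-in check functions (the checks and SKUs below are illustrative):&lt;/p&gt;

```python
# Illustrative zero-tolerance pass: a candidate survives only if
# every check returns True -- no partial credit.
def validate(candidate: dict, checks) -> bool:
    return all(check(candidate) for check in checks)

def filter_candidates(candidates, checks):
    return [c for c in candidates if validate(c, checks)]

checks = [
    lambda c: c.get("material") in {"stainless_a2", "stainless_a4"},
    lambda c: c.get("in_stock", False),  # sourceability check
]
parts = [
    {"sku": "B-001", "material": "stainless_a4", "in_stock": True},
    {"sku": "B-002", "material": "carbon_steel", "in_stock": True},
    {"sku": "B-003", "material": "stainless_a2", "in_stock": False},
]
survivors = filter_candidates(parts, checks)  # only B-001 clears both checks
```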




&lt;h3&gt;
  
  
  Layer 6: Output
&lt;/h3&gt;

&lt;p&gt;The user receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact matching components&lt;/li&gt;
&lt;li&gt;Full technical specifications&lt;/li&gt;
&lt;li&gt;Availability + stock data&lt;/li&gt;
&lt;li&gt;Compatible variations (e.g., same bolt in A4 vs A2)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No "this should work." Only: &lt;strong&gt;this is the correct part.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Cutting Systems
&lt;/h2&gt;

&lt;p&gt;Three systems run continuously across the full pipeline:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern Learning Engine
&lt;/h3&gt;

&lt;p&gt;Stores validated parameter combinations from successful selections. When a query pattern is seen again, the system can reuse pre-validated mappings rather than re-deriving from scratch. Improves both speed and consistency over time.&lt;/p&gt;
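&lt;p&gt;A sketch of the caching idea, assuming an in-memory store keyed on a canonical serialization of the intent (names are illustrative; a persistent store would replace the dict in practice):&lt;/p&gt;

```python
import json

class PatternCache:
    """Illustrative pattern store: validated intent -> parameter mappings
    are memoized under an order-insensitive key, so a repeat query reuses
    the pre-validated result instead of re-deriving it."""

    def __init__(self):
        self._store = {}  # a persistent store would replace this dict

    @staticmethod
    def _key(intent: dict) -> str:
        return json.dumps(intent, sort_keys=True)  # canonical, order-insensitive

    def get(self, intent: dict):
        return self._store.get(self._key(intent))

    def put(self, intent: dict, validated_params: dict) -> None:
        self._store[self._key(intent)] = validated_params

cache = PatternCache()
cache.put({"use_case": "outdoor", "requirement": "corrosion_resistant"},
          {"material_candidates": ["stainless_a2", "stainless_a4"]})
# Same intent with a different key order still hits:
hit = cache.get({"requirement": "corrosion_resistant", "use_case": "outdoor"})
```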

&lt;h3&gt;
  
  
  Feedback &amp;amp; Correction Loop
&lt;/h3&gt;

&lt;p&gt;Incorrect matches (flagged by users or caught by monitoring) trigger rule updates in the parameter engine. The system gets more accurate with each correction, rather than repeating the same edge cases indefinitely.&lt;/p&gt;
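&lt;p&gt;The smallest useful version of a correction is an exclusion appended to the offending rule. A sketch (the helper name and rule shape are illustrative):&lt;/p&gt;

```python
def apply_correction(rules: dict, intent_key: tuple, bad_value: str) -> None:
    """Record a flagged mismatch as a rule exclusion, so the same wrong
    part can never be surfaced for that intent again. Idempotent."""
    exclusions = rules.setdefault(intent_key, {}).setdefault("exclusions", [])
    if bad_value not in exclusions:
        exclusions.append(bad_value)

rules = {}
apply_correction(rules, ("outdoor", "corrosion_resistant"), "carbon_steel_uncoated")
# Re-flagging the same mismatch does not duplicate the exclusion:
apply_correction(rules, ("outdoor", "corrosion_resistant"), "carbon_steel_uncoated")
```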

&lt;h3&gt;
  
  
  Quality Monitoring
&lt;/h3&gt;

&lt;p&gt;Tracks per-query accuracy, detects edge case clusters, and fires alerts when result quality metrics start to drift. Essential for maintaining reliability as the product catalog grows — an unchecked system will slowly degrade in accuracy without anyone noticing.&lt;/p&gt;
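&lt;p&gt;The drift check itself can be as simple as rolling-window accuracy with an alert floor. A sketch with illustrative thresholds:&lt;/p&gt;

```python
from collections import deque

class QualityMonitor:
    """Illustrative drift detector: rolling accuracy over the last
    `window` queries, alerting once it falls below `floor`."""

    def __init__(self, window: int = 100, floor: float = 0.95):
        self.results = deque(maxlen=window)
        self.floor = floor

    def record(self, correct: bool) -> bool:
        """Returns True when an alert should fire."""
        self.results.append(correct)
        accuracy = sum(self.results) / len(self.results)
        # Only alert on a full window, to avoid noisy cold starts.
        return len(self.results) == self.results.maxlen and self.floor > accuracy
```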




&lt;h2&gt;
  
  
  Design Principles We'd Apply to Any Similar System
&lt;/h2&gt;

&lt;p&gt;After shipping this, here are the principles we'd carry forward to any AI system in a precision-critical domain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Never let AI decide technical values autonomously&lt;/strong&gt;&lt;br&gt;
AI should narrow down possibilities; rules should finalize selections.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Structure before search&lt;/strong&gt;&lt;br&gt;
Never pass raw natural language directly to a search or retrieval layer. Always convert to structured parameters first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Validation is not optional&lt;/strong&gt;&lt;br&gt;
Every result must be verified before surfacing. This isn't overhead — it's the feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Separate intent understanding from technical decision-making&lt;/strong&gt;&lt;br&gt;
These are different cognitive tasks. Model your architecture to reflect that separation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Build for correction from day one&lt;/strong&gt;&lt;br&gt;
Your system will make mistakes. The question is whether it learns from them. Build feedback + correction loops before you need them.&lt;/p&gt;


&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Intent understanding&lt;/td&gt;
&lt;td&gt;OpenAI GPT-4o / Claude (via API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Node.js + Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parameter engine&lt;/td&gt;
&lt;td&gt;Custom rule-based system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Validation&lt;/td&gt;
&lt;td&gt;Rules + AI-assisted checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL + catalog systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory&lt;/td&gt;
&lt;td&gt;API integrations (client-specific)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The stack is conventional. The architecture around it is what matters.&lt;/p&gt;


&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;The failure mode for most AI systems in high-stakes domains is the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The AI is given too much autonomy over decisions it can't make reliably.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fix isn't a better model. It's a better architecture — one where AI does what it's genuinely good at (understanding language and intent), and deterministic systems do what they're genuinely good at (enforcing constraints and validating outputs).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI for understanding.
System for control.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're building AI for healthcare, manufacturing, logistics, fintech, or any domain where errors have real-world consequences — this separation isn't a nice-to-have. It's the thing that makes your system trustworthy in production.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by the team at &lt;a href="https://cizotech.com" rel="noopener noreferrer"&gt;CIZO&lt;/a&gt; — we build production-grade AI systems, &lt;a href="https://cizotech.com/services/" rel="noopener noreferrer"&gt;mobile apps&lt;/a&gt;, and &lt;a href="https://cizotech.com/iot-app-development-services/" rel="noopener noreferrer"&gt;IoT solutions&lt;/a&gt;. Say hi: &lt;a href="mailto:hello@cizotech.com"&gt;hello@cizotech.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>industrialai</category>
      <category>engineeringsystems</category>
      <category>ecommerceai</category>
      <category>productsearch</category>
    </item>
    <item>
      <title>How We Architected an AI Engine That Generates 100+ Ad Creatives From a Single Brand Brief</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Fri, 10 Apr 2026 06:23:10 +0000</pubDate>
      <link>https://forem.com/cizo/how-we-architected-an-ai-engine-that-generates-100-ad-creatives-from-a-single-brand-brief-3ppm</link>
      <guid>https://forem.com/cizo/how-we-architected-an-ai-engine-that-generates-100-ad-creatives-from-a-single-brand-brief-3ppm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96uddum24ur7u12bo6k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96uddum24ur7u12bo6k4.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A technical breakdown of the layered AI pipeline behind a scalable creative strategy system — and what developers can steal from it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;We recently completed an internal AI platform for a performance marketing team managing multiple brands simultaneously. Their core problem was simple to state, hard to solve: &lt;strong&gt;they needed dozens of ad creative variations per campaign, but creative production was slow, manual, and impossible to scale.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system we designed — an AI Creative Strategy Engine — now takes raw brand inputs and produces structured advertising assets (hooks, scripts, image ads, video concepts, UGC scripts) at volume. Here's how we built it, what architecture decisions we made, and what we'd do differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Space
&lt;/h2&gt;

&lt;p&gt;Performance marketing on Meta and TikTok is a volume game. You don't launch one ad — you launch 20–50 variations, let them compete, kill the losers, double down on winners, and repeat. The bottleneck was never strategy. It was &lt;strong&gt;production throughput&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The old workflow looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Brand Brief
    → Marketing Strategy Discussion (days)
    → Creative Team Brainstorming (days)
    → Copywriting &amp;amp; Script Writing (days)
    → Design / Video Production (days)
    → Limited Creative Variations (3–5)
    → A/B Testing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By the time you had testable assets, the market had moved. And scaling this linearly — hiring more writers, more designers — was not a viable answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: A Layered Intelligence System
&lt;/h2&gt;

&lt;p&gt;The key insight was to stop thinking about this as "AI helping humans write ads" and start thinking about it as &lt;strong&gt;a structured data pipeline where brand intelligence flows through transformation layers and emerges as deployable assets.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the full system architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│              BRAND INPUTS                   │
│  Website · Product Info · Personas          │
│  Onboarding Forms · Call Transcripts        │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│         BRAND INTELLIGENCE LAYER            │
│  Brand Voice Extraction                     │
│  Product Positioning                        │
│  Audience Understanding                     │
│  Messaging Framework                        │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│        CREATIVE INTELLIGENCE LAYER          │
│  Angle Mining                               │
│  Hook Framework Generation                  │
│  Emotional Trigger Analysis                 │
│  Winning Ad Pattern Library                 │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│            STRATEGY ENGINE                  │
│  Generates: Ad Hooks · Video Scripts        │
│  Creative Briefs · Campaign Concepts        │
│  Content Angles                             │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│          GOVERNANCE / QA LAYER              │
│  Brand Consistency Check                    │
│  Messaging Validation                       │
│  Quality Scoring                            │
│  Structured Formatting                      │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│          CREATIVE GENERATION                │
│  Static Image Ads · Video Ads               │
│  UGC Scripts · Creative Variants            │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│     CAMPAIGN DEPLOYMENT + FEEDBACK LOOP     │
│  Ad Performance → Winning Creatives         │
│  → Influence Future Strategy Generation     │
└─────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer has &lt;strong&gt;one job&lt;/strong&gt;. Outputs are structured. Nothing flows to the next stage without passing validation. Let's break each one down.&lt;/p&gt;
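&lt;p&gt;That contract can be made explicit in the orchestration code: each layer is a transform paired with a validation gate, and a failed gate halts the run instead of passing questionable data downstream. A sketch with illustrative layer names:&lt;/p&gt;

```python
def run_pipeline(payload: dict, layers) -> dict:
    """Run (transform, gate) pairs in order; stop hard on a failed gate."""
    for transform, gate in layers:
        payload = transform(payload)
        if not gate(payload):
            raise ValueError(f"validation gate failed after {transform.__name__}")
    return payload

# Illustrative first layer: brand extraction gated on a populated voice.
def extract_brand(p: dict) -> dict:
    return {**p, "brand_voice": "direct, empowering"}

layers = [(extract_brand, lambda p: bool(p.get("brand_voice")))]
result = run_pipeline({"website": "https://example.com"}, layers)
```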




&lt;h2&gt;
  
  
  Layer 1: Brand Intelligence Extraction
&lt;/h2&gt;

&lt;p&gt;This is the ingestion layer. We feed it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brand website (scraped + chunked)&lt;/li&gt;
&lt;li&gt;Product documentation&lt;/li&gt;
&lt;li&gt;Customer personas&lt;/li&gt;
&lt;li&gt;Onboarding form responses&lt;/li&gt;
&lt;li&gt;Sales/marketing call transcripts (Whisper-transcribed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM task here is &lt;strong&gt;extraction and structuring&lt;/strong&gt;, not generation. The prompt engineering goal is to produce a stable, reusable brand object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"brand_voice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Direct, empowering, slightly irreverent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"core_positioning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Recovery tech for serious athletes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"audience_segments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"seg_01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Competitive weekend warriors"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pain_points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"DOMS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"slow recovery"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"missed training days"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"language_patterns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"grind"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bounce back"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"next session"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messaging_pillars"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Speed of recovery"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Science-backed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Used by pros"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"avoid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Medical claims"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Before/after framing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Aggressive pricing language"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This brand object persists and is referenced by every downstream layer. &lt;strong&gt;Consistency comes from the data model, not from re-prompting every time.&lt;/strong&gt;&lt;/p&gt;
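One way to make that single source of truth explicit is a typed model. This is a sketch, not the production schema: the article's pipeline uses Pydantic for asset schemas, but plain dataclasses are enough to show the idea, and the field values below are the examples from the JSON above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudienceSegment:
    name: str
    pain_points: list[str]
    language_patterns: list[str]

@dataclass(frozen=True)
class BrandObject:
    """Single source of truth referenced by every downstream layer."""
    audience_segments: list[AudienceSegment]
    messaging_pillars: list[str]
    avoid: list[str]

brand = BrandObject(
    audience_segments=[AudienceSegment(
        name="amateur athletes",
        pain_points=["DOMS", "slow recovery", "missed training days"],
        language_patterns=["grind", "bounce back", "next session"],
    )],
    messaging_pillars=["Speed of recovery", "Science-backed", "Used by pros"],
    avoid=["Medical claims", "Before/after framing", "Aggressive pricing language"],
)
# Downstream layers take `brand` as an argument, never a re-prompted summary of it.
```

Because the object is frozen, no layer can quietly mutate the brand definition mid-pipeline.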




&lt;h2&gt;
  
  
  Layer 2: Creative Intelligence — Angle Mining
&lt;/h2&gt;

&lt;p&gt;Given the brand object, this layer generates a library of creative angles. An "angle" is a strategic lens through which to frame the product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pain-first&lt;/strong&gt;: Lead with the problem (DOMS is killing your gains)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social proof&lt;/strong&gt;: Lead with credibility (Used by 40,000 athletes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Curiosity&lt;/strong&gt;: Lead with a surprising claim (Most foam rollers are doing it wrong)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aspirational&lt;/strong&gt;: Lead with the outcome (What if you recovered overnight?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contrarian&lt;/strong&gt;: Challenge conventional wisdom (Ice baths might actually slow recovery)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each angle maps to a set of emotional triggers pulled from a pattern library built from high-performing historical ads (CTR, ROAS, thumbstop rate).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mine_angles&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern_library&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns ranked creative angles for a brand,
    scored against historical pattern performance.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_angle_mining_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brand_object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern_library&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;angles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_structured_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AngleSchema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rank_by_pattern_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;angles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pattern_library&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 3: Strategy Engine — Hook + Script Generation
&lt;/h2&gt;

&lt;p&gt;This is where volume happens. For each angle × audience segment combination, the engine generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3–5 ad hooks&lt;/strong&gt; (the first 3 seconds of a video or first line of copy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full video scripts&lt;/strong&gt; (15s, 30s, 60s variants)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static ad copy&lt;/strong&gt; (headline + body + CTA combinations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UGC creator briefs&lt;/strong&gt; (instructions for human creators)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combinatorial math starts working in your favour here: 5 angles × 3 audience segments × 4 hook variants yields &lt;strong&gt;60 unique concepts from a single brand brief&lt;/strong&gt; before you've touched image or video generation.&lt;/p&gt;
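The expansion is easy to see in code. A sketch, with stand-in angle and segment values:

```python
from itertools import product

angles = ["pain-first", "social proof", "curiosity", "aspirational", "contrarian"]
segments = ["amateur athletes", "weekend warriors", "physio clients"]
hook_variants = range(4)

# Every (angle, segment, hook variant) combination is a distinct concept.
concepts = [
    {"angle": a, "segment": s, "hook_variant": h}
    for a, s, h in product(angles, segments, hook_variants)
]
print(len(concepts))  # 5 x 3 x 4 = 60
```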

&lt;p&gt;&lt;strong&gt;Prompt structure for hook generation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;System:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;direct-response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;copywriter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;specialising&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;paid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;social.&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="err"&gt;Output&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ONLY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;JSON.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;No&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;preamble.&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Given&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;brand&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;context:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;brand_object&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;And&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;creative&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;angle:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;angle&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;And&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;audience&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;segment:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="err"&gt;Generate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ad&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;hooks.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;hook&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;must:&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;under&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;words&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Create&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;immediate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;pattern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;interrupt&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Match&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;brand&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;voice&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exactly&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Trigger&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;emotional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;lever:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;angle.trigger&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

      &lt;/span&gt;&lt;span class="err"&gt;Return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;as:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"video|static"&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
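Assembling that template is plain string formatting. The helper below is a hypothetical sketch (the article does not show its prompt builder); it interpolates real values where the template shows placeholders:

```python
import json

def build_hook_prompt(brand_object: dict, angle: dict, segment: dict) -> str:
    # Mirrors the template above: context first, then constraints, then output schema.
    return (
        f"Given this brand context: {json.dumps(brand_object)}\n"
        f"And this creative angle: {json.dumps(angle)}\n"
        f"And this audience segment: {json.dumps(segment)}\n\n"
        "Generate 5 ad hooks. Each hook must:\n"
        "- Be under 8 words\n"
        "- Create immediate pattern interrupt\n"
        "- Match the brand voice exactly\n"
        f"- Trigger the emotional lever: {angle['trigger']}\n\n"
        'Return as: {"hooks": [{"text": str, "rationale": str, "format": "video|static"}]}'
    )
```

Keeping the output schema inside the prompt is what lets the next stage validate against a fixed structure.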






&lt;h2&gt;
  
  
  Layer 4: Governance / QA Layer
&lt;/h2&gt;

&lt;p&gt;This is the layer most teams skip. It is the layer that makes the system production-safe.&lt;/p&gt;

&lt;p&gt;Every piece of generated content passes through three checks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Brand consistency scoring&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_brand_consistency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;brand_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns 0.0–1.0 score. Content below 0.75 is rejected and regenerated.
    Checks: voice match, avoid-list violations, pillar alignment.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
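The scorer's body is elided above. A deliberately simple rule-based sketch is below; the real implementation would more likely use an LLM judge or embedding similarity, and the penalty weights here are illustrative assumptions:

```python
def score_brand_consistency(content: str, brand_object: dict) -> float:
    """Toy scorer: penalise avoid-list violations, reward pillar coverage."""
    text = content.lower()
    score = 1.0
    for banned in brand_object.get("avoid", []):
        if banned.lower() in text:
            score -= 0.5  # hard penalty for any avoid-list hit
    pillars = brand_object.get("messaging_pillars", [])
    if pillars:
        hits = sum(1 for p in pillars if p.lower() in text)
        score -= 0.25 * (1 - hits / len(pillars))  # mild penalty for missing pillars
    return max(score, 0.0)
```

Even a crude version of this check catches the worst failures cheaply before any expensive regeneration.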



&lt;p&gt;&lt;strong&gt;2. Regulatory / compliance check&lt;/strong&gt;&lt;br&gt;
For this client: no unsubstantiated health claims, no before/after framing (platform policy), no superlatives without evidence.&lt;/p&gt;
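Rules like these can run as cheap pattern matching before any model-based check. A sketch of the client-specific rules described above; the regex patterns are illustrative assumptions, not the production rule set:

```python
import re

COMPLIANCE_RULES = [
    (r"\b(cures?|treats?|heals?)\b", "unsubstantiated health claim"),
    (r"\bbefore\b.{0,30}\bafter\b", "before/after framing"),
    (r"\b(best|fastest|number one)\b", "superlative without evidence"),
]

def compliance_violations(copy_text: str) -> list[str]:
    """Return the list of rule names the copy violates (empty list means it passes)."""
    text = copy_text.lower()
    return [reason for pattern, reason in COMPLIANCE_RULES if re.search(pattern, text)]
```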

&lt;p&gt;&lt;strong&gt;3. Structured formatting validation&lt;/strong&gt;&lt;br&gt;
Output must conform to the asset schema before being written to the creative library. A video script without a defined hook segment, body, and CTA does not pass.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreativeAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_script&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;static_copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ugc_brief&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;angle_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;segment_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;hook&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;cta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;brand_consistency_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rejected assets are automatically regenerated with failure reason injected into the prompt context. In practice, ~12% of first-pass outputs are rejected and regenerated successfully on retry.&lt;/p&gt;
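The reject-and-retry behaviour can be sketched as a small loop. Everything here is a stand-in for the real pipeline calls: `generate`, `run_checks`, and the attempt cap are assumptions.

```python
def generate_with_qa(brief, generate, run_checks, max_attempts: int = 3):
    """Regenerate failed assets, feeding failure reasons back into prompt context."""
    failure_context = []
    for _ in range(max_attempts):
        asset = generate(brief, failure_context)
        failures = run_checks(asset)
        if not failures:
            return asset
        failure_context.extend(failures)  # injected into the next generation prompt
    raise RuntimeError(f"QA failed after {max_attempts} attempts: {failures}")
```

The key detail is that retries are not blind: each attempt sees why the previous one was rejected.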




&lt;h2&gt;
  
  
  Layer 5: Creative Generation
&lt;/h2&gt;

&lt;p&gt;Approved strategy outputs flow into generation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Asset Type&lt;/th&gt;
&lt;th&gt;Tool / Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Static image ads&lt;/td&gt;
&lt;td&gt;DALL-E 3 / Stable Diffusion (finetuned on brand assets)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Short-form video&lt;/td&gt;
&lt;td&gt;Runway Gen-3 / Kling via API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UGC scripts&lt;/td&gt;
&lt;td&gt;Passed to human creator network as structured briefs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variant expansion&lt;/td&gt;
&lt;td&gt;GPT-4o for copy variations on approved hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The creative generation layer is intentionally &lt;strong&gt;modular&lt;/strong&gt;. We wrap each provider behind an interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CreativeGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Protocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CreativeBrief&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;GeneratedAsset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RunwayGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreativeGenerator&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CreativeBrief&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;GeneratedAsset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Runway-specific implementation
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StableDiffusionGenerator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CreativeGenerator&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CreativeBrief&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;GeneratedAsset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# SD-specific implementation
&lt;/span&gt;        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a better video model ships next month, you swap the implementation. The pipeline doesn't care.&lt;/p&gt;
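Concretely, swapping a provider becomes a one-line registry change. A self-contained sketch, assuming a simple string-keyed registry (the class names echo the snippet above; the generator bodies are stand-ins for the real API calls):

```python
from typing import Protocol

class CreativeBrief:
    def __init__(self, asset_kind: str, prompt: str):
        self.asset_kind = asset_kind
        self.prompt = prompt

class CreativeGenerator(Protocol):
    def generate(self, brief: CreativeBrief) -> str: ...

class RunwayGenerator:
    def generate(self, brief: CreativeBrief) -> str:
        return f"video asset for: {brief.prompt}"  # stand-in for the Runway API call

class StableDiffusionGenerator:
    def generate(self, brief: CreativeBrief) -> str:
        return f"image asset for: {brief.prompt}"  # stand-in for the SD API call

GENERATORS = {"video": RunwayGenerator, "static": StableDiffusionGenerator}

def generate_asset(brief: CreativeBrief) -> str:
    # Swapping a provider means changing one registry entry, not pipeline logic.
    return GENERATORS[brief.asset_kind]().generate(brief)
```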




&lt;h2&gt;
  
  
  Layer 6: The Feedback Loop
&lt;/h2&gt;

&lt;p&gt;This is the layer that turns a tool into a system that learns.&lt;/p&gt;

&lt;p&gt;After campaigns run, performance data (CTR, thumbstop rate, ROAS, hook retention) flows back and annotates the creative assets in the library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_pattern_library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CreativeAsset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CampaignMetrics&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Winning creatives (top quartile ROAS) are decomposed:
    - Angle type extracted
    - Hook structure tagged
    - Emotional trigger logged
    - Added to pattern library with performance weight
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;roas&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;WINNING_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decompose_winning_creative&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pattern_library&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;roas&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern library is what the Creative Intelligence Layer (Layer 2) references. &lt;strong&gt;The system gets better at generating hooks with every campaign that runs.&lt;/strong&gt; This compounding loop is the actual product moat.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;LLM Backbone&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="s"&gt;GPT-4o (strategy/scripts) + Claude (QA/governance)&lt;/span&gt;
&lt;span class="na"&gt;Orchestration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="s"&gt;LangChain / custom pipeline runner&lt;/span&gt;
&lt;span class="na"&gt;Image Generation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;DALL-E 3 + Stable Diffusion (brand-finetuned)&lt;/span&gt;
&lt;span class="na"&gt;Video Generation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;Runway Gen-3 API&lt;/span&gt;
&lt;span class="na"&gt;Speech-to-Text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s"&gt;OpenAI Whisper (for call transcript ingestion)&lt;/span&gt;
&lt;span class="na"&gt;Data Layer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;          &lt;span class="s"&gt;PostgreSQL + pgvector (embeddings for pattern library)&lt;/span&gt;
&lt;span class="na"&gt;API Layer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;           &lt;span class="s"&gt;FastAPI&lt;/span&gt;
&lt;span class="na"&gt;Frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;            &lt;span class="s"&gt;Next.js (internal dashboard)&lt;/span&gt;
&lt;span class="na"&gt;Queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;               &lt;span class="s"&gt;Redis + Celery (async generation jobs)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What We Learned / What We'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What worked well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The brand object as a persistent, reusable data structure. Every layer referencing a single source of truth eliminated inconsistency.&lt;/li&gt;
&lt;li&gt;The governance layer. Building QA in as a pipeline stage (not a manual review step) was the right call for production safety.&lt;/li&gt;
&lt;li&gt;Modular generator interfaces. We've already swapped two models since launch without touching pipeline logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What we underestimated:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt versioning.&lt;/strong&gt; Prompt changes break downstream output schemas. Treat prompts like code — version control them, test them, deploy them deliberately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency in multi-step pipelines.&lt;/strong&gt; When you chain 6 LLM calls sequentially, latency compounds. We ended up heavily parallelising Layers 2 and 3 and moving generation to async jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cost of regeneration.&lt;/strong&gt; The 12% rejection-and-regeneration rate adds up. A tighter governance check earlier in the pipeline (pre-generation rather than post) would have been cheaper.&lt;/li&gt;
&lt;/ul&gt;
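The latency point generalises: independent completions should be awaited together, not chained. A sketch with asyncio, where `call_llm` is a stand-in for a real completion request:

```python
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real completion request
    return f"output for {prompt}"

async def generate_all(prompts: list[str]) -> list[str]:
    # Chaining costs the sum of latencies; gather costs roughly the max.
    return await asyncio.gather(*[call_llm(p) for p in prompts])

results = asyncio.run(generate_all(["angle 1", "angle 2", "angle 3"]))
```

Only truly independent calls can be gathered this way; anything that consumes a previous step's output still has to wait for it.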

&lt;p&gt;&lt;strong&gt;🔄 What we'd architect differently:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move brand consistency scoring into the generation prompt context rather than as a post-generation filter. Prevention &amp;gt; correction.&lt;/li&gt;
&lt;li&gt;Build the feedback loop instrumentation in from sprint 1, not as a v2 feature. The pattern library is only as good as the data flowing into it.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Broader Point
&lt;/h2&gt;

&lt;p&gt;The system works because it was designed as a &lt;strong&gt;data pipeline&lt;/strong&gt;, not a chatbot with a marketing skin. Each layer has one responsibility, outputs are typed and validated, and the feedback loop creates compounding improvement over time.&lt;/p&gt;

&lt;p&gt;These are not novel engineering ideas — they are software engineering fundamentals applied to AI workflows. The teams shipping durable AI products are the ones who treat LLMs as components in a system, not as magic boxes that answer questions.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;At &lt;a href="https://cizotech.com" rel="noopener noreferrer"&gt;CIZO&lt;/a&gt;, we design and build AI-powered mobile applications — from architecture and LLM integration to deployment. If you're building an AI product and want to talk architecture, &lt;a href="https://cizotech.com/contact-us/" rel="noopener noreferrer"&gt;we're always up for a conversation&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Architecting a Zero-Touch AI Delivery System with Make.com, GPT-4 &amp; Google Workspace</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Thu, 02 Apr 2026 13:54:49 +0000</pubDate>
      <link>https://forem.com/cizo/architecting-a-zero-touch-ai-delivery-system-with-makecom-gpt-4-google-workspace-4hp2</link>
      <guid>https://forem.com/cizo/architecting-a-zero-touch-ai-delivery-system-with-makecom-gpt-4-google-workspace-4hp2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — We built a 5-workflow automation pipeline that triggers on Stripe payment, pulls user context from a data store, runs 15 sequential GPT-4 completions, assembles a branded Google Doc, and delivers it to the buyer — all in under 5 minutes, with zero manual intervention.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kpxrxxd185qov3i3ve4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kpxrxxd185qov3i3ve4.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  System Overview
&lt;/h2&gt;

&lt;p&gt;This is a production architecture breakdown of an AI-powered delivery system built for a personal brand products business. Two products, two independent pipelines, one shared data layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Voice Form] ──► [3 x Make.com Workflows] ──► [Make.com Data Store + Airtable]
                                                          │
                        ┌─────────────────────────────────┤
                        │                                 │
              [Stripe: Playbook]               [Stripe: Content Machine]
                        │                                 │
              [Playbook Workflow]           [Content Machine Workflow]
                        │                                 │
              [15 x GPT-4 Completions]      [Whisper → GPT-4 → Leonardo AI]
                        │                                 │
              [Google Doc Assembly]         [Google Doc + Slides + Drive]
                        │                                 │
                  [Email Delivery]               [Email Delivery]
                        │                                 │
                  [Airtable Log]                 [Airtable Log]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Entry Point 1 — The Voice Form (Top of Funnel)
&lt;/h2&gt;

&lt;p&gt;Before any purchase, users complete a Tally voice form. This is the data collection layer — it captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name, email, phone&lt;/li&gt;
&lt;li&gt;Niche and area of expertise&lt;/li&gt;
&lt;li&gt;Goals and target audience&lt;/li&gt;
&lt;li&gt;Tone of voice preferences&lt;/li&gt;
&lt;li&gt;Platform focus (LinkedIn, Instagram, YouTube, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On submission, &lt;strong&gt;three Make.com workflows fire simultaneously:&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow 1 — Data Store Write
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trigger: Tally form webhook
Action:  Write all form fields to Make.com Data Store
Key:     user_email (used as lookup key at purchase time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Workflow 2 — Airtable CRM Upsert
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trigger: Tally form webhook
Action:  Search Airtable for existing record by email
         → If found: UPDATE record with new form data
         → If not found: CREATE new record
Status:  Set to "New User" or "User Updated"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Workflow 3 — Additional Data Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trigger: Tally form webhook
Action:  Supplementary processing and storage logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why split into 3 workflows?&lt;/strong&gt;&lt;br&gt;
Separation of concerns. Each workflow has a single responsibility. If Airtable goes down, the data store write still succeeds. Easier to debug, easier to extend.&lt;/p&gt;


&lt;h2&gt;
  
  
  Entry Point 2 — Stripe Payment Webhooks
&lt;/h2&gt;

&lt;p&gt;Both main pipelines are payment-triggered. Stripe fires a &lt;code&gt;checkout.session.completed&lt;/code&gt; webhook into Make.com when a purchase completes.&lt;/p&gt;

&lt;p&gt;Each product has its own dedicated webhook endpoint → its own Make.com scenario. This keeps the pipelines fully independent — a failure in one never affects the other.&lt;/p&gt;


&lt;h2&gt;
  
  
  Pipeline 1 — The Playbook Workflow
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: User Validation &amp;amp; Context Retrieval
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Search Airtable by email → validate user exists
2. GET from Make.com Data Store using email as key
   → Retrieves all voice form answers stored at top of funnel
3. GET brand archetype reference doc from Google Docs
   → Used as a reference document in AI prompts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 2: 15 Sequential GPT-4 Completions
&lt;/h3&gt;

&lt;p&gt;This is the core of the Playbook pipeline. Each completion generates one section of the brand strategy document.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For each completion:
  1. Build prompt (user context + archetype reference + section instructions)
  2. POST to OpenAI /v1/chat/completions
  3. Parse and format response
  4. Sleep buffer (avoid rate limits)
  5. Store output variable for Google Doc assembly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The 15 sections generated:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Tone of Voice&lt;/td&gt;
&lt;td&gt;How the user communicates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Niche of Genius&lt;/td&gt;
&lt;td&gt;Their specific expertise area&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Claim to Fame&lt;/td&gt;
&lt;td&gt;Unique credibility statement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Tagline&lt;/td&gt;
&lt;td&gt;One-line brand statement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Buyer Persona&lt;/td&gt;
&lt;td&gt;Ideal client profile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Buyer Journey&lt;/td&gt;
&lt;td&gt;Client journey stages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Sales Navigator&lt;/td&gt;
&lt;td&gt;Strategic sales positioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Keywords&lt;/td&gt;
&lt;td&gt;SEO &amp;amp; content keywords&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;About Section&lt;/td&gt;
&lt;td&gt;3-part bio combining all sections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;LinkedIn Bio&lt;/td&gt;
&lt;td&gt;Platform-optimised profile copy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;YouTube Bio&lt;/td&gt;
&lt;td&gt;Channel description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Instagram Bio&lt;/td&gt;
&lt;td&gt;150-character profile copy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Facebook Bio&lt;/td&gt;
&lt;td&gt;Page description copy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Brand Archetype&lt;/td&gt;
&lt;td&gt;Personality archetype classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Content Pillars&lt;/td&gt;
&lt;td&gt;Core content themes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key prompt engineering consideration:&lt;/strong&gt;&lt;br&gt;
Each completion receives the outputs of previous completions as context. By completion 9 (About Section), the prompt includes tone of voice, niche, claim to fame, tagline, and buyer persona — creating a coherent, internally consistent document.&lt;/p&gt;
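&lt;p&gt;The chaining logic is roughly this. A sketch with a stubbed model call so it stays testable; &lt;code&gt;complete&lt;/code&gt; and the prompt wording are illustrative, and only the first three of the 15 sections are listed:&lt;/p&gt;

```python
import time

SECTIONS = ["Tone of Voice", "Niche of Genius", "Claim to Fame"]  # first 3 of 15

def generate_playbook(user_context, archetype_ref, complete, pause=0.0):
    """Run one completion per section, feeding prior outputs forward.

    `complete` is the model call (prompt in, text out), injected so the
    chaining logic can run without the OpenAI API. `pause` stands in
    for the sleep buffer between sequential completions.
    """
    outputs = {}
    for section in SECTIONS:
        prior = "\n".join(f"{k}: {v}" for k, v in outputs.items())
        prompt = (
            f"User context: {user_context}\n"
            f"Archetype reference: {archetype_ref}\n"
            f"Previously generated sections:\n{prior}\n"
            f"Write the '{section}' section."
        )
        outputs[section] = complete(prompt)
        time.sleep(pause)  # rate-limit buffer
    return outputs
```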
&lt;h3&gt;
  
  
  Step 3: Google Doc Assembly
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Copy branded Google Doc template (via Drive API)
2. Use Docs API to replace placeholder tokens with AI outputs
   e.g. {{TONE_OF_VOICE}} → generated content
3. Set document sharing permissions
4. Store Doc URL in Airtable record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
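&lt;p&gt;The placeholder replacement in step 2 behaves like plain token substitution. A sketch of the equivalent string operation (the real build issues replaceAllText requests through the Docs API; the token names are from the example above):&lt;/p&gt;

```python
def fill_template(template, values):
    """Replace {{TOKEN}} placeholders with generated content,
    mirroring the Docs API replaceAllText requests."""
    for token, text in values.items():
        template = template.replace("{{" + token + "}}", text)
    return template
```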

&lt;h3&gt;
  
  
  Step 4: Delivery &amp;amp; CRM Update
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Send delivery email with Google Doc link
2. Update Airtable record:
   - status: "Playbook Delivered"
   - playbook_url: [doc link]
   - delivered_at: [timestamp]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pipeline 2 — The Content Machine Workflow
&lt;/h2&gt;

&lt;p&gt;More complex than the Playbook. Involves file handling, transcription, multi-modal AI, and Drive folder management.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Email Validation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trigger: Tally form submission (video upload + contact details)
Action:  Search Airtable by email → validate and link to existing record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 2: File Routing Logic
&lt;/h3&gt;

&lt;p&gt;The system handles three input types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;CloudConvert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;MP3&lt;/span&gt;
  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt; &lt;span class="n"&gt;Whisper&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MP3&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;

&lt;span class="n"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;file_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt; &lt;span class="n"&gt;Whisper&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="nf"&gt;text &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skip&lt;/span&gt; &lt;span class="n"&gt;conversion&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;transcript_provided&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;true&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;provided&lt;/span&gt; &lt;span class="n"&gt;transcript&lt;/span&gt; &lt;span class="nf"&gt;directly &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skip&lt;/span&gt; &lt;span class="n"&gt;conversion&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters architecturally:&lt;/strong&gt; Different buyers submit different file types. The routing logic means the pipeline handles all cases gracefully without requiring users to pre-convert anything.&lt;/p&gt;
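&lt;p&gt;The routing above reduces to a small pure function. A sketch; the step names are illustrative labels, not Make.com module names:&lt;/p&gt;

```python
def route(file_type, transcript_provided=False):
    """Return the processing steps for a submission,
    mirroring the routing pseudocode above."""
    if file_type.startswith("video/"):
        return ["cloudconvert_to_mp3", "whisper_transcribe"]
    if file_type.startswith("audio/"):
        return ["whisper_transcribe"]  # skip conversion
    if transcript_provided:
        return ["use_transcript"]      # skip conversion + transcription
    raise ValueError(f"unsupported input: {file_type}")
```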

&lt;h3&gt;
  
  
  Step 3: Playbook PDF Processing (Optional)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IF user has Playbook PDF:
  1. Upload PDF to OpenAI Files API
  2. Extract content via file retrieval
  3. Delete file from OpenAI (cleanup)
  4. Use extracted content as brand reference in content prompts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the cross-product integration point — The Content Machine uses The Playbook's content to generate brand-consistent output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: GPT-4 Content Generation
&lt;/h3&gt;

&lt;p&gt;Six content outputs, each with a dedicated generation pass and an emoji-cleaning pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pass 1:  AI Content Strategy Overview    → Formatter → Emoji clean
Pass 2:  Newsletter Article              → Formatter → Emoji clean
Pass 3:  Blog Post                       → Formatter → Emoji clean
Pass 4:  Hashtag Set                     → Formatter → Emoji clean
Pass 5:  LinkedIn Carousel Copy          → Formatter → Emoji clean
Pass 6:  [Additional output]             → Formatter → Emoji clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why the emoji cleaning pass?&lt;/strong&gt;&lt;br&gt;
GPT-4 frequently inserts emojis in content outputs by default. For professional brand copy destined for a Google Doc, this needs stripping. A dedicated cleaning completion is cleaner than prompt engineering alone.&lt;/p&gt;
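&lt;p&gt;If you want a deterministic backstop as well, a regex pass over the main emoji code-point blocks can run after (or instead of) the cleaning completion. A sketch; the character ranges are an assumption, not the full Unicode emoji set:&lt;/p&gt;

```python
import re

# Main emoji blocks plus the variation selector -- an approximation,
# not the exhaustive Unicode emoji list.
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\u2B00-\u2BFF\uFE0F]")

def strip_emojis(text):
    """Deterministic emoji removal, collapsing the leftover spaces."""
    cleaned = EMOJI.sub("", text)
    return re.sub(r"  +", " ", cleaned).strip()
```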
&lt;h3&gt;
  
  
  Step 5: Google Drive Folder Management
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Search Airtable for existing Drive folder ID for this user
   IF exists: use existing folder IDs
   IF not exists:
     → Create main folder: "[User Name] - CIZO Content"
     → Create subfolder: "Outputs"
     → Store folder IDs in Airtable for future runs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;The folder ID persistence pattern&lt;/strong&gt; is important. Users can resubmit videos for new content packages. On second and subsequent runs, the system finds the existing folder and adds to it rather than creating duplicates.&lt;/p&gt;
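&lt;p&gt;In sketch form, the get-or-create pattern looks like this; &lt;code&gt;create_folder&lt;/code&gt; stands in for the Drive API call, and the CRM record is a plain dict:&lt;/p&gt;

```python
def get_or_create_folders(crm_record, create_folder):
    """Reuse Drive folder IDs stored in the CRM record, creating them
    only on the first run. `create_folder(name, parent)` stands in for
    the Drive API call and returns a new folder ID."""
    if crm_record.get("drive_folder_id"):
        return crm_record["drive_folder_id"], crm_record["drive_output_folder_id"]
    name = crm_record.get("name", "User")
    main_id = create_folder(f"{name} - CIZO Content", parent=None)
    output_id = create_folder("Outputs", parent=main_id)
    # Persist IDs so subsequent runs add to the same folder
    crm_record["drive_folder_id"] = main_id
    crm_record["drive_output_folder_id"] = output_id
    return main_id, output_id
```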
&lt;h3&gt;
  
  
  Step 6: Asset Generation &amp;amp; Upload
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Leonardo AI: Generate custom image from content theme prompt
2. Download image from Leonardo CDN
3. Upload image to user's Google Drive output folder
4. Upload MP3 audio to Google Drive output folder
5. Create LinkedIn carousel in Google Slides:
   a. Copy branded Slides template
   b. Apply custom brand colours via Slides API
   c. Populate slide content with carousel copy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 7: Google Doc Assembly &amp;amp; Delivery
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Copy branded Google Doc template
2. Populate with all generated content sections
3. Insert carousel link + image reference
4. Insert headshot if uploaded
5. Create Airtable record logging all output URLs and folder IDs
6. Generate short URL via Short.cm API
7. Send delivery email with short URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Layer — Airtable CRM Schema
&lt;/h2&gt;

&lt;p&gt;Every user interaction updates a central Airtable record. The key fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Users Table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;├── email (primary key)&lt;/span&gt;
&lt;span class="s"&gt;├── name, phone&lt;/span&gt;
&lt;span class="s"&gt;├── niche, goals, tone_preferences&lt;/span&gt;
&lt;span class="s"&gt;├── status [New User | Updated | Playbook Delivered | Content Delivered]&lt;/span&gt;
&lt;span class="s"&gt;├── playbook_url&lt;/span&gt;
&lt;span class="s"&gt;├── drive_folder_id&lt;/span&gt;
&lt;span class="s"&gt;├── drive_output_folder_id&lt;/span&gt;
&lt;span class="s"&gt;├── content_doc_url&lt;/span&gt;
&lt;span class="s"&gt;├── created_at, updated_at, delivered_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;drive_folder_id&lt;/code&gt; field is what enables the repeatable content system — once set, it persists across all future Content Machine runs for that user.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Architecture Decisions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Make.com Data Store as session cache&lt;/strong&gt;&lt;br&gt;
Rather than re-querying Airtable for form data at purchase time, the data store acts as a fast key-value cache keyed by email. Lower latency, simpler lookup, independent of CRM availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Sleep buffers between GPT-4 completions&lt;/strong&gt;&lt;br&gt;
With 15 sequential completions in the Playbook workflow, rate limit management is critical. Sleep modules between completions prevent 429 errors without requiring retry logic.&lt;/p&gt;
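&lt;p&gt;If you did need retry logic (say, outside Make.com), sleep buffers combine naturally with exponential backoff on 429s. A sketch with an injectable sleep so the logic stays testable; &lt;code&gt;RateLimited&lt;/code&gt; stands in for the client's rate-limit exception:&lt;/p&gt;

```python
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from the API."""

def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a completion call on rate limits with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimited:
            sleep(base_delay * (2 ** attempt))
    return fn()  # final attempt; let the exception propagate
```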

&lt;p&gt;&lt;strong&gt;3. File cleanup after OpenAI Files API use&lt;/strong&gt;&lt;br&gt;
PDFs uploaded to OpenAI's Files API are deleted immediately after content extraction. This keeps the account clean, avoids storage costs, and is better practice from a data minimisation perspective.&lt;/p&gt;
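&lt;p&gt;The cleanup guarantee is easiest to express as a try/finally around the extraction. A sketch where the three callables stand in for the Files API operations; wiring them to real client calls is left out:&lt;/p&gt;

```python
def extract_with_cleanup(upload, extract, delete, pdf_bytes):
    """Upload, extract, and always delete, so no file outlives the run.

    The callables stand in for the OpenAI Files API operations.
    """
    file_id = upload(pdf_bytes)
    try:
        return extract(file_id)
    finally:
        delete(file_id)  # runs even if extraction fails
```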

&lt;p&gt;&lt;strong&gt;4. Independent webhook endpoints per product&lt;/strong&gt;&lt;br&gt;
Stripe webhooks route to separate Make.com scenarios per product. This means product-specific logic changes never risk breaking the other pipeline, and each can be tested and deployed independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Folder ID persistence in Airtable&lt;/strong&gt;&lt;br&gt;
Storing Drive folder IDs after first creation turns a stateless workflow into a stateful one — without a database. The CRM becomes the state store.&lt;/p&gt;




&lt;h2&gt;
  
  
  Failure Modes &amp;amp; Considerations
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Handling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User buys without completing voice form&lt;/td&gt;
&lt;td&gt;Airtable record missing → system creates one with payment data only; AI outputs will be less personalised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI rate limit hit&lt;/td&gt;
&lt;td&gt;Sleep buffers reduce likelihood; Make.com retry logic handles transient failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudConvert job fails&lt;/td&gt;
&lt;td&gt;Workflow errors out; Make.com error handler can notify admin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Drive API quota&lt;/td&gt;
&lt;td&gt;Unlikely at this scale; monitor via Google Cloud Console&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate purchase (same email)&lt;/td&gt;
&lt;td&gt;Airtable upsert handles gracefully; new doc created and linked&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Estimated Build Scope
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Phase                    Hours
─────────────────────────────
Architecture design      6–8
Prompt engineering       8–10
Make.com workflow build  12–16
CRM schema &amp;amp; logic       4–6
Google Workspace APIs    6–8
Testing &amp;amp; QA             8–12
─────────────────────────────
Total                    44–60 hrs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What Would You Do Differently?
&lt;/h2&gt;

&lt;p&gt;A few things worth considering if rebuilding this today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replace Make.com with a custom Node.js service&lt;/strong&gt; for the 15-completion Playbook workflow — more control over retry logic, error handling, and execution time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a webhook queue&lt;/strong&gt; (e.g. via Inngest or Quirrel) between Stripe and Make.com to handle burst traffic gracefully&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream GPT-4 outputs&lt;/strong&gt; rather than waiting for full completion on each pass — would reduce total pipeline latency significantly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abstract the Google Doc templating&lt;/strong&gt; into a reusable service — currently tightly coupled to specific template IDs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The core insight here isn't the tools — it's the architecture pattern: &lt;strong&gt;capture context early, trigger on payment, personalise at generation time, deliver automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That pattern is reusable across a wide range of product businesses. Anywhere personalised document delivery is the bottleneck, this approach applies.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by the engineering team at &lt;a href="https://cizotech.com/" rel="noopener noreferrer"&gt;CIZO&lt;/a&gt; — we &lt;a href="https://cizotech.com/ai-ml-mobile-app-development-services/" rel="noopener noreferrer"&gt;build AI-powered mobile apps&lt;/a&gt; and automation systems. Open to questions in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>architecture</category>
      <category>openai</category>
    </item>
    <item>
      <title>Building an AI System That Generates UGC Ads in Minutes (Multi-Model Orchestration Explained)</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Tue, 24 Mar 2026 14:08:36 +0000</pubDate>
      <link>https://forem.com/cizo/building-an-ai-system-that-generates-ugc-ads-in-minutes-multi-model-orchestration-explained-4jf6</link>
      <guid>https://forem.com/cizo/building-an-ai-system-that-generates-ugc-ads-in-minutes-multi-model-orchestration-explained-4jf6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrap19siah2jaxkipc79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrap19siah2jaxkipc79.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Creating ad creatives is still one of the slowest parts of growth.&lt;/p&gt;

&lt;p&gt;Even today, the workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find UGC creators&lt;/li&gt;
&lt;li&gt;Ship products&lt;/li&gt;
&lt;li&gt;Wait for content&lt;/li&gt;
&lt;li&gt;Edit and publish&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This takes days (sometimes weeks).&lt;/p&gt;

&lt;p&gt;We wanted to change that.&lt;/p&gt;

&lt;p&gt;So we built a system that generates UGC-style video ads in under a minute using multiple AI models working together.&lt;/p&gt;

&lt;p&gt;This post breaks down how we built it — from architecture to attribution fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem We Were Solving
&lt;/h2&gt;

&lt;p&gt;We saw three major bottlenecks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Creative Production Doesn’t Scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every new ad required a full production cycle.&lt;br&gt;
This limits testing and slows down iteration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. AI Tools Are Fragmented&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most tools solve one part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image generation&lt;/li&gt;
&lt;li&gt;Video generation&lt;/li&gt;
&lt;li&gt;Script generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But not the entire pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Attribution Was Broken&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate install events&lt;/li&gt;
&lt;li&gt;Conflicting SDK signals&lt;/li&gt;
&lt;li&gt;Inflated metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which made optimization unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Overview
&lt;/h2&gt;

&lt;p&gt;We didn’t build “an AI feature.”&lt;/p&gt;

&lt;p&gt;We built a multi-model AI pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Components:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scenario API → Generates product visuals &amp;amp; variations&lt;/li&gt;
&lt;li&gt;Creatify API → Converts assets into video ads&lt;/li&gt;
&lt;li&gt;Custom Orchestration Layer → Manages flow, timing, and output&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Pipeline (Step-by-Step)
&lt;/h2&gt;

&lt;p&gt;Here’s what happens when a user generates an ad:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User uploads a product image&lt;/li&gt;
&lt;li&gt;Selects an AI actor&lt;/li&gt;
&lt;li&gt;Scenario API generates visual assets&lt;/li&gt;
&lt;li&gt;Creatify API renders video&lt;/li&gt;
&lt;li&gt;Orchestration layer combines everything&lt;/li&gt;
&lt;li&gt;Final ad is delivered&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this happens in under a minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hard Part: Orchestration
&lt;/h2&gt;

&lt;p&gt;The real challenge wasn’t calling APIs.&lt;/p&gt;

&lt;p&gt;It was managing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Async Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each AI model responds at different times.&lt;br&gt;
We had to design a system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Waits intelligently&lt;/li&gt;
&lt;li&gt;Handles failures gracefully&lt;/li&gt;
&lt;li&gt;Keeps latency low&lt;/li&gt;
&lt;/ul&gt;
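&lt;p&gt;In sketch form, the independent model calls can fan out concurrently while rendering waits on both. The coroutines below are stubs standing in for the Scenario and Creatify calls; real calls would add timeouts and retries:&lt;/p&gt;

```python
import asyncio

async def generate_ad(product_image, actor):
    """Fan out the asset and script calls in parallel, then render."""
    async def scenario_assets():
        await asyncio.sleep(0)  # simulated API latency
        return f"assets({product_image})"

    async def script():
        await asyncio.sleep(0)
        return f"script({actor})"

    # Independent calls run concurrently; rendering waits on both.
    assets, copy = await asyncio.gather(scenario_assets(), script())
    return f"video[{assets} + {copy}]"
```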

&lt;p&gt;&lt;strong&gt;2. Output Consistency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Different models → different outputs.&lt;/p&gt;

&lt;p&gt;We needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consistent visuals&lt;/li&gt;
&lt;li&gt;Cohesive storytelling&lt;/li&gt;
&lt;li&gt;Usable final ads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This required normalization and validation layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Speed Constraints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Target: &amp;lt; 60 seconds generation time&lt;/p&gt;

&lt;p&gt;This meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parallel processing where possible&lt;/li&gt;
&lt;li&gt;Efficient retries&lt;/li&gt;
&lt;li&gt;Minimal blocking operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Fixing Attribution (Critical Layer)
&lt;/h2&gt;

&lt;p&gt;While building the creative engine, we discovered a bigger issue:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The data layer was broken.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Issues:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta SDK + AppsFlyer conflicts&lt;/li&gt;
&lt;li&gt;Duplicate events&lt;/li&gt;
&lt;li&gt;Incorrect install tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We rebuilt the attribution system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set AppsFlyer as the single source of truth&lt;/li&gt;
&lt;li&gt;Removed conflicting signals&lt;/li&gt;
&lt;li&gt;Fixed event mapping: &lt;code&gt;start_trial&lt;/code&gt;, &lt;code&gt;purchase&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Enabled proper postbacks&lt;/li&gt;
&lt;/ul&gt;
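&lt;p&gt;The dedup side of that fix reduces to filtering on one source and one event ID. A sketch with illustrative field names, not the real SDK payloads:&lt;/p&gt;

```python
def dedupe_events(events):
    """Keep one event per event_id, and only AppsFlyer-sourced events,
    mirroring the single-source-of-truth fix."""
    seen = set()
    clean = []
    for e in events:
        if e["source"] != "appsflyer":
            continue  # drop conflicting SDK signals
        if e["event_id"] in seen:
            continue  # drop duplicates
        seen.add(e["event_id"])
        clean.append(e)
    return clean
```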

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Clean tracking&lt;/li&gt;
&lt;li&gt;Accurate reporting&lt;/li&gt;
&lt;li&gt;Better campaign optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Product Layer: Hiding Complexity
&lt;/h2&gt;

&lt;p&gt;Even with all this complexity, the product had to feel simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UX Principles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimal steps&lt;/li&gt;
&lt;li&gt;Fast feedback (instant previews)&lt;/li&gt;
&lt;li&gt;No technical configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The goal:&lt;/strong&gt;&lt;br&gt;
Hide complexity. Deliver power.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;UGC ads generated in minutes&lt;/li&gt;
&lt;li&gt;Unlimited creative variations&lt;/li&gt;
&lt;li&gt;Faster testing cycles&lt;/li&gt;
&lt;li&gt;Up to 96% cost reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Takeaway
&lt;/h2&gt;

&lt;p&gt;Most people think AI products are about models.&lt;/p&gt;

&lt;p&gt;They’re not.&lt;/p&gt;

&lt;p&gt;They’re about systems.&lt;/p&gt;

&lt;p&gt;AI models generate outputs.&lt;br&gt;
Orchestration creates value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project wasn’t just about automation.&lt;/p&gt;

&lt;p&gt;It was about building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A scalable creative engine&lt;/li&gt;
&lt;li&gt;A reliable attribution system&lt;/li&gt;
&lt;li&gt;A product that improves performance marketing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building with AI, focus less on individual models&lt;br&gt;
and more on how they work together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Case Study
&lt;/h2&gt;

&lt;p&gt;If you want the full breakdown (business + product + impact):&lt;br&gt;
👉 &lt;a href="https://cizotech.com/we-built-an-ai-that-creates-ugc-ads-in-minutes/" rel="noopener noreferrer"&gt;We Built an AI That Creates UGC Ads in Minutes&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Discuss
&lt;/h2&gt;

&lt;p&gt;Curious how others are handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-model orchestration?&lt;/li&gt;
&lt;li&gt;AI latency issues?&lt;/li&gt;
&lt;li&gt;Attribution challenges?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your thoughts 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>growth</category>
      <category>marketingstrategy</category>
      <category>adtech</category>
    </item>
    <item>
      <title>We Built AI That Qualifies Real Estate Leads in 5 Minutes</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Fri, 20 Mar 2026 10:09:49 +0000</pubDate>
      <link>https://forem.com/cizo/we-built-ai-that-qualifies-real-estate-leads-in-5-minutes-3jpo</link>
      <guid>https://forem.com/cizo/we-built-ai-that-qualifies-real-estate-leads-in-5-minutes-3jpo</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvgdk1atx4k9q79yeu3s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvgdk1atx4k9q79yeu3s.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most real estate systems don’t have a lead problem.&lt;/p&gt;

&lt;p&gt;They have a response problem.&lt;/p&gt;

&lt;p&gt;Leads come in from ads.&lt;br&gt;
But no one calls them fast enough.&lt;/p&gt;

&lt;p&gt;We recently built an AI system that fixes this — by calling and qualifying leads within minutes.&lt;/p&gt;

&lt;p&gt;Here’s how we designed it.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Real Problem Wasn’t Lead Quality
&lt;/h2&gt;

&lt;p&gt;Our client generates buyer leads through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Meta (Facebook) ads&lt;/li&gt;
&lt;li&gt;Google ads&lt;/li&gt;
&lt;li&gt;TikTok ads&lt;/li&gt;
&lt;li&gt;Website forms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On paper, everything looked good.&lt;/p&gt;

&lt;p&gt;But in reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leads weren’t being called quickly&lt;/li&gt;
&lt;li&gt;Agents made only 1–2 attempts&lt;/li&gt;
&lt;li&gt;Many leads were marked as “unreachable”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And there was no way to verify any of it.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We Needed to Solve
&lt;/h2&gt;

&lt;p&gt;We weren’t trying to &lt;a href="https://cizotech.com/" rel="noopener noreferrer"&gt;build a smart AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We needed a system that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call leads within 5 minutes&lt;/li&gt;
&lt;li&gt;Ask basic qualification questions&lt;/li&gt;
&lt;li&gt;Record outcomes in the CRM&lt;/li&gt;
&lt;li&gt;Follow up if there’s no response&lt;/li&gt;
&lt;li&gt;Create accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple. Fast. Reliable.&lt;/p&gt;
&lt;h2&gt;
  
  
  The System We Built
&lt;/h2&gt;

&lt;p&gt;Here’s the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lead enters CRM
   ↓
Trigger automation
   ↓
AI voice call placed
   ↓
Lead answers → qualification
   ↓
No answer → SMS fallback
   ↓
CRM updated with outcome
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every lead goes through this flow automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Voice Call (Core Layer)
&lt;/h2&gt;

&lt;p&gt;We used an AI voice agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm if the buyer is still interested&lt;/li&gt;
&lt;li&gt;Check if they already have an agent&lt;/li&gt;
&lt;li&gt;Ask availability for a follow-up call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key decision:&lt;br&gt;
Keep calls under 3 minutes&lt;/p&gt;

&lt;p&gt;This isn’t a sales call.&lt;br&gt;
It’s just qualification.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deterministic Qualification Logic
&lt;/h2&gt;

&lt;p&gt;Instead of “AI guessing,” we used strict rules:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qualified if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still looking to buy&lt;/li&gt;
&lt;li&gt;Not working with another agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not qualified if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not interested&lt;/li&gt;
&lt;li&gt;Already signed with an agent&lt;/li&gt;
&lt;li&gt;Requested no contact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made the system predictable and reliable.&lt;/p&gt;
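&lt;p&gt;Those rules translate directly into code. A sketch; the answer keys are assumptions about how the call outcomes are parsed:&lt;/p&gt;

```python
def qualify(answers):
    """Deterministic qualification: strict rules, no model judgement."""
    if answers.get("requested_no_contact"):
        return "not_qualified"
    if not answers.get("still_looking"):
        return "not_qualified"
    if answers.get("has_agent"):
        return "not_qualified"
    return "qualified"
```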
&lt;h2&gt;
  
  
  SMS Fallback Layer
&lt;/h2&gt;

&lt;p&gt;If the call is not answered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system sends an automated SMS&lt;/li&gt;
&lt;li&gt;Asks for a convenient callback time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures:&lt;br&gt;
Every lead gets at least one touchpoint&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;p&gt;We didn’t overcomplicate it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CRM: GoHighLevel&lt;/li&gt;
&lt;li&gt;Voice AI: VAPI&lt;/li&gt;
&lt;li&gt;Text-to-Speech: ElevenLabs&lt;/li&gt;
&lt;li&gt;Automation: Make.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool had a clear role.&lt;br&gt;
No unnecessary layers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Breakdown&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GoHighLevel (Trigger)
   ↓
Make.com (Orchestration)
   ↓
VAPI (Voice Call)
   ↓
AI Conversation
   ↓
Result returned → CRM update
   ↓
Optional SMS fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key was orchestration — not just AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Design Principles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Speed &amp;gt; Intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A fast response beats a perfect response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Short Conversations Win&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Long AI calls reduce engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Log Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every attempt is recorded in the CRM.&lt;/p&gt;

&lt;p&gt;This solves the “I never got the lead” problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Consistency Over Creativity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same logic. Every time. No variation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Changes
&lt;/h2&gt;

&lt;p&gt;Instead of selling raw leads, the business can now sell AI-qualified conversations.&lt;/p&gt;

&lt;p&gt;That’s a completely different value proposition.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We’re Building Next
&lt;/h2&gt;

&lt;p&gt;Phase 2 includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live call transfer to agents&lt;/li&gt;
&lt;li&gt;Smart agent routing&lt;/li&gt;
&lt;li&gt;Call tracking &amp;amp; recordings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Goal:&lt;br&gt;
Convert instantly when intent is high&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;This isn’t about AI replacing agents.&lt;/p&gt;

&lt;p&gt;It’s about fixing the first 5 minutes —&lt;br&gt;
where most deals are lost.&lt;/p&gt;

&lt;h2&gt;
  
  
  If You’re Building Something Similar
&lt;/h2&gt;

&lt;p&gt;If you're working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lead generation systems&lt;/li&gt;
&lt;li&gt;CRM automation&lt;/li&gt;
&lt;li&gt;AI voice workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Focus less on “AI intelligence”&lt;br&gt;
and more on response speed + system design.&lt;/p&gt;

&lt;p&gt;That’s where the real impact is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>automation</category>
    </item>
    <item>
      <title>Automating Cabinet Design: Converting Architectural Drawings into 3D Models with AI</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Fri, 13 Mar 2026 09:42:30 +0000</pubDate>
      <link>https://forem.com/cizo/automating-cabinet-design-converting-architectural-drawings-into-3d-models-with-ai-5a55</link>
      <guid>https://forem.com/cizo/automating-cabinet-design-converting-architectural-drawings-into-3d-models-with-ai-5a55</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2q71zn1cq99fw4p7fhl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2q71zn1cq99fw4p7fhl.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Architectural drawings already contain everything needed to build cabinet layouts.&lt;/p&gt;

&lt;p&gt;Dimensions.&lt;br&gt;
Cabinet placements.&lt;br&gt;
Appliance spacing.&lt;/p&gt;

&lt;p&gt;But most of this information exists in PDF drawings, which are designed for humans, not machines.&lt;/p&gt;

&lt;p&gt;When cabinet designs move into production, someone usually needs to manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;interpret the drawing&lt;/li&gt;
&lt;li&gt;rebuild the layout in CAD&lt;/li&gt;
&lt;li&gt;generate a 3D model&lt;/li&gt;
&lt;li&gt;verify measurements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That process is slow and repetitive.&lt;/p&gt;

&lt;p&gt;We recently worked on a system designed to automate this workflow by &lt;a href="https://cizotech.com/building-an-ai-system-that-converts-architectural-drawings-into-3d-models/" rel="noopener noreferrer"&gt;converting cabinet drawings directly into structured 3D data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The interesting part wasn’t the AI model itself.&lt;/p&gt;

&lt;p&gt;It was building the pipeline that makes the automation reliable.&lt;/p&gt;
&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;The system converts cabinet drawings into 3D models through a multi-stage pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PDF Drawing
   ↓
PDF → High Resolution Images
   ↓
View Region Detection
   ↓
Cabinet Detection (YOLO)
   ↓
Measurement Extraction (LLM + OCR)
   ↓
Coordinate Mapping
   ↓
3D Geometry Generation
   ↓
AutoCAD DWG Export

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage handles a specific problem in the workflow.&lt;/p&gt;

&lt;p&gt;Breaking the system into modules made it easier to debug and improve accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 — Converting PDF Drawings to Images
&lt;/h3&gt;

&lt;p&gt;Architectural PDFs are not ideal inputs for computer vision models.&lt;/p&gt;

&lt;p&gt;They contain vector data mixed with annotations, layers, and text.&lt;/p&gt;

&lt;p&gt;To simplify processing, we convert each page into a high-resolution PNG image (300 DPI).&lt;/p&gt;

&lt;p&gt;Higher resolution improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text extraction accuracy&lt;/li&gt;
&lt;li&gt;detection performance&lt;/li&gt;
&lt;li&gt;line segmentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small dimension labels become unreadable at lower resolutions, so image quality matters more than expected.&lt;/p&gt;
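
&lt;p&gt;The effect is easy to quantify. A rough sketch (the 1/8-inch label height and the OCR threshold are our illustrative assumptions, not values from the pipeline):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: pixel height of a small dimension label at different render DPIs.
# Assumes a 1/8-inch-tall label, a common size on architectural sheets.
def label_height_px(dpi, label_in=0.125):
    return dpi * label_in

# At 72 DPI the label is only 9 px tall; at 300 DPI it is 37.5 px,
# comfortably above the ~20 px that OCR engines typically want.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;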

&lt;h3&gt;
  
  
  Step 2 — View Region Detection
&lt;/h3&gt;

&lt;p&gt;A single architectural sheet usually contains multiple views:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;floor plans&lt;/li&gt;
&lt;li&gt;elevations&lt;/li&gt;
&lt;li&gt;section views&lt;/li&gt;
&lt;li&gt;cabinet details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Processing the entire page creates too much noise.&lt;/p&gt;

&lt;p&gt;Instead, we segment the sheet into visual regions and classify them.&lt;/p&gt;

&lt;p&gt;The system prioritizes the base floor plan, which typically contains cabinet placement information.&lt;/p&gt;

&lt;p&gt;This step reduces false detections later in the pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Cabinet Detection Using YOLO
&lt;/h3&gt;

&lt;p&gt;Once the relevant region is identified, we run an object detection model.&lt;/p&gt;

&lt;p&gt;We trained a YOLO model to detect cabinet-related objects such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;base cabinets&lt;/li&gt;
&lt;li&gt;upper cabinets&lt;/li&gt;
&lt;li&gt;tall cabinets&lt;/li&gt;
&lt;li&gt;appliances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each detection returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bounding box coordinates&lt;/li&gt;
&lt;li&gt;confidence score&lt;/li&gt;
&lt;li&gt;object label&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Low-confidence detections are filtered out before moving to the next stage.&lt;/p&gt;

&lt;p&gt;This step establishes where cabinets exist in the layout.&lt;/p&gt;
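
&lt;p&gt;The filtering step can be sketched in a few lines (the 0.5 threshold and the detection fields are illustrative assumptions, not the production values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: drop low-confidence YOLO detections before measurement extraction.
CONF_THRESHOLD = 0.5

def filter_detections(detections, threshold=CONF_THRESHOLD):
    return [d for d in detections if d["confidence"] &gt;= threshold]

detections = [
    {"label": "base_cabinet", "confidence": 0.91, "bbox": (120, 340, 260, 480)},
    {"label": "upper_cabinet", "confidence": 0.32, "bbox": (300, 60, 420, 180)},
]
kept = filter_detections(detections)  # only the 0.91 base_cabinet survives
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;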

&lt;h3&gt;
  
  
  Step 4 — Extracting Measurements
&lt;/h3&gt;

&lt;p&gt;Detection tells us where cabinets are, but not their size.&lt;/p&gt;

&lt;p&gt;Cabinet drawings include dimension annotations like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;30"
2'-6"
34 1/2"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These values may appear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rotated&lt;/li&gt;
&lt;li&gt;overlapping other text&lt;/li&gt;
&lt;li&gt;connected via leader lines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional OCR struggles with this.&lt;/p&gt;

&lt;p&gt;Instead, we combine OCR with a vision-enabled LLM.&lt;/p&gt;

&lt;p&gt;For each detected cabinet, the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Crops the region around the cabinet&lt;/li&gt;
&lt;li&gt;Sends the image to a vision model&lt;/li&gt;
&lt;li&gt;Requests structured measurements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cabinet_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"base"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;34.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To prevent errors, we added validation rules.&lt;/p&gt;

&lt;p&gt;If measurements fall outside expected cabinet ranges, the result is flagged for manual review.&lt;/p&gt;
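
&lt;p&gt;A minimal version of those validation rules might look like this (the ranges are our illustrative assumptions, not industry constants):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: flag extracted dimensions that fall outside expected ranges (inches).
EXPECTED_RANGES = {
    "width": (9, 48),
    "height": (12, 96),
    "depth": (12, 30),
}

def validate_measurements(m):
    flags = []
    for field, (lo, hi) in EXPECTED_RANGES.items():
        value = m.get(field)
        if value is None or value &gt; hi or lo &gt; value:
            flags.append(field)
    return flags  # an empty list means the result passes automated checks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any flagged field routes the cabinet to manual review instead of the 3D stage.&lt;/p&gt;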

&lt;h3&gt;
  
  
  Step 5 — Coordinate and Scale Detection
&lt;/h3&gt;

&lt;p&gt;Architectural drawings use scale references such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1/8" = 1'-0"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without interpreting this scale, cabinet positions remain in pixel space.&lt;/p&gt;

&lt;p&gt;The system identifies the scale marker and converts pixel distances into real-world coordinates.&lt;/p&gt;

&lt;p&gt;Each cabinet receives an X/Y/Z position relative to the drawing origin.&lt;/p&gt;

&lt;p&gt;This allows the layout to be reconstructed accurately in 3D space.&lt;/p&gt;
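
&lt;p&gt;For the scale shown above, the conversion is simple arithmetic. A sketch, assuming the 300 DPI rasterization from Step 1 and a parsed scale of 1/8" = 1'-0":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: convert pixel distances into real-world inches using the sheet scale.
def pixels_to_inches(pixels, dpi=300, scale_paper_in=0.125, scale_real_in=12.0):
    paper_inches = pixels / dpi  # distance on the printed sheet
    return paper_inches * (scale_real_in / scale_paper_in)

# Example: at 300 DPI, a 112.5 px run is 0.375 paper inches,
# which maps to 36 real-world inches (a standard 36" base cabinet).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;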

&lt;h3&gt;
  
  
  Step 6 — Generating the 3D Layout
&lt;/h3&gt;

&lt;p&gt;Once we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cabinet detections&lt;/li&gt;
&lt;li&gt;measurements&lt;/li&gt;
&lt;li&gt;real-world coordinates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we can generate 3D geometry.&lt;/p&gt;

&lt;p&gt;We implemented a viewer using Three.js where each cabinet becomes a parametric 3D object.&lt;/p&gt;

&lt;p&gt;This step is less about visualization and more about validating the pipeline.&lt;/p&gt;

&lt;p&gt;Architects can quickly review the generated layout and correct any misdetections.&lt;/p&gt;

&lt;p&gt;The goal isn’t perfect automation.&lt;/p&gt;

&lt;p&gt;It’s reducing manual modeling work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7 — Exporting to AutoCAD
&lt;/h3&gt;

&lt;p&gt;The final stage converts the generated geometry into DWG files.&lt;/p&gt;

&lt;p&gt;Using the AutoCAD SDK, the system exports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cabinet blocks&lt;/li&gt;
&lt;li&gt;correct dimensions&lt;/li&gt;
&lt;li&gt;real-world coordinates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If upstream data is correct, the export works reliably.&lt;/p&gt;

&lt;p&gt;Interestingly, this stage turned out to be one of the simplest parts of the system.&lt;/p&gt;

&lt;p&gt;Most of the complexity lies in interpreting drawings correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges We Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Drawings Are Inconsistent
&lt;/h3&gt;

&lt;p&gt;Architectural drawings vary widely depending on the designer.&lt;/p&gt;

&lt;p&gt;Annotation styles, measurement formats, and layout conventions are rarely standardized.&lt;/p&gt;

&lt;p&gt;The system needs to handle a wide range of variations.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Measurement Ambiguity
&lt;/h3&gt;

&lt;p&gt;Dimension labels are not always placed directly next to objects.&lt;/p&gt;

&lt;p&gt;They may refer to multiple cabinets or entire cabinet groups.&lt;/p&gt;

&lt;p&gt;Resolving these relationships requires contextual reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Legacy Drawings
&lt;/h3&gt;

&lt;p&gt;Older scanned drawings introduce additional problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blurred lines&lt;/li&gt;
&lt;li&gt;noisy backgrounds&lt;/li&gt;
&lt;li&gt;overlapping annotations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These reduce detection accuracy significantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current System Status
&lt;/h2&gt;

&lt;p&gt;The system is currently an MVP under active development.&lt;/p&gt;

&lt;p&gt;Performance is strong for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clean digital drawings&lt;/li&gt;
&lt;li&gt;modern architectural layouts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Edge cases remain for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scanned plans&lt;/li&gt;
&lt;li&gt;dense dimension annotations&lt;/li&gt;
&lt;li&gt;complex sheet compositions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even with these limitations, the system already provides meaningful time savings.&lt;/p&gt;

&lt;p&gt;Architects can start with a generated layout and correct it instead of creating it from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Projects like this highlight an important lesson about &lt;a href="https://cizotech.com/" rel="noopener noreferrer"&gt;AI systems&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The hardest part usually isn’t the model.&lt;/p&gt;

&lt;p&gt;It’s designing the workflow around it.&lt;/p&gt;

&lt;p&gt;Solving this problem required combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;document processing&lt;/li&gt;
&lt;li&gt;computer vision&lt;/li&gt;
&lt;li&gt;LLM interpretation&lt;/li&gt;
&lt;li&gt;geometry generation&lt;/li&gt;
&lt;li&gt;CAD integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individually, these technologies are powerful.&lt;/p&gt;

&lt;p&gt;Together, they create a system capable of automating a real engineering workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>automation</category>
    </item>
    <item>
      <title>Turning Cabinet Drawings into 3D Models with AI</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Thu, 05 Mar 2026 09:46:24 +0000</pubDate>
      <link>https://forem.com/cizo/turning-cabinet-drawings-into-3d-models-with-ai-de6</link>
      <guid>https://forem.com/cizo/turning-cabinet-drawings-into-3d-models-with-ai-de6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla737a9v6ahey8k07of4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fla737a9v6ahey8k07of4.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Construction and cabinet manufacturing still rely heavily on PDF drawings.&lt;/p&gt;

&lt;p&gt;Designers create them.&lt;br&gt;
Clients approve them.&lt;br&gt;
But when production begins, someone still needs to manually convert those drawings into CAD models or 3D layouts.&lt;/p&gt;

&lt;p&gt;That process is slow.&lt;/p&gt;

&lt;p&gt;And repetitive.&lt;/p&gt;

&lt;p&gt;So we asked a simple question:&lt;/p&gt;

&lt;p&gt;Can AI convert cabinet drawings directly into usable 3D data?&lt;/p&gt;

&lt;p&gt;This project explores how we built a system that reads cabinet drawings from PDFs and converts them into structured geometry that can generate DWG and 3D models.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Real Problem
&lt;/h2&gt;

&lt;p&gt;Cabinet drawings contain a lot of useful information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layout structure&lt;/li&gt;
&lt;li&gt;Cabinet boundaries&lt;/li&gt;
&lt;li&gt;Measurements&lt;/li&gt;
&lt;li&gt;Labels&lt;/li&gt;
&lt;li&gt;Door and drawer positions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But most of this information exists in visual form.&lt;/p&gt;

&lt;p&gt;Machines cannot easily interpret that.&lt;/p&gt;

&lt;p&gt;Traditional automation tools fail because they expect structured CAD data, not messy PDF drawings.&lt;/p&gt;

&lt;p&gt;So the challenge was:&lt;/p&gt;

&lt;p&gt;How do we convert visual architectural information into structured geometry?&lt;/p&gt;
&lt;h2&gt;
  
  
  System Overview
&lt;/h2&gt;

&lt;p&gt;The pipeline we designed combines computer vision, detection models, and language models.&lt;/p&gt;

&lt;p&gt;The workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF is converted into images&lt;/li&gt;
&lt;li&gt;Computer vision detects cabinets and components&lt;/li&gt;
&lt;li&gt;Text extraction captures measurements&lt;/li&gt;
&lt;li&gt;LLM interprets dimensions and structure&lt;/li&gt;
&lt;li&gt;Structured geometry is generated&lt;/li&gt;
&lt;li&gt;DWG / 3D models are produced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each step solves a specific problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1 — Detecting Cabinets with Computer Vision
&lt;/h3&gt;

&lt;p&gt;We trained a YOLO-based object detection model to identify cabinet components inside drawings.&lt;/p&gt;

&lt;p&gt;The model detects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base cabinets&lt;/li&gt;
&lt;li&gt;Wall cabinets&lt;/li&gt;
&lt;li&gt;Tall cabinets&lt;/li&gt;
&lt;li&gt;Appliances&lt;/li&gt;
&lt;li&gt;Structural boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why YOLO?&lt;/p&gt;

&lt;p&gt;Because it provides fast detection with high spatial accuracy, which is critical when working with architectural drawings.&lt;/p&gt;

&lt;p&gt;Once detected, the system extracts bounding boxes and spatial relationships between cabinets.&lt;/p&gt;

&lt;p&gt;This becomes the foundation for geometry reconstruction.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2 — Extracting Measurements
&lt;/h3&gt;

&lt;p&gt;Cabinet drawings include important measurements like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Width&lt;/li&gt;
&lt;li&gt;Height&lt;/li&gt;
&lt;li&gt;Depth&lt;/li&gt;
&lt;li&gt;Spacing between cabinets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We use OCR pipelines to extract measurement text from the drawing.&lt;/p&gt;

&lt;p&gt;But raw text is messy.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;W 36"
H 34 1/2"
D 24"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where AI interpretation becomes necessary.&lt;/p&gt;
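
&lt;p&gt;Well-formed labels like the ones above can still be normalized with plain rules before any model call; everything that fails falls through to the AI layer. A sketch (the pattern and field names are our assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: rule-based normalization for well-formed labels, fractions included.
# Lines that do not match return None and fall through to the LLM layer.
import re

LABEL = re.compile(r"([WHD])\s+(\d+)(?:\s+(\d+)/(\d+))?\"")

def parse_label(line):
    m = LABEL.fullmatch(line.strip())
    if m is None:
        return None
    axis, whole, num, den = m.groups()
    value = int(whole)
    if num is not None:
        value = value + int(num) / int(den)
    return axis, value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;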

&lt;h3&gt;
  
  
  Step 3 — Using an LLM to Understand Dimensions
&lt;/h3&gt;

&lt;p&gt;The extracted text is passed to an LLM layer that converts ambiguous measurements into structured data.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Raw text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;36 W x 34.5 H x 24 D
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Converted into structured format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
 width: 36,
 height: 34.5,
 depth: 24
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM also resolves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inconsistent labels&lt;/li&gt;
&lt;li&gt;missing context&lt;/li&gt;
&lt;li&gt;measurement formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step turns visual annotations into reliable numerical data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Reconstructing Cabinet Geometry
&lt;/h3&gt;

&lt;p&gt;Once we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cabinet detection&lt;/li&gt;
&lt;li&gt;dimensions&lt;/li&gt;
&lt;li&gt;layout relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can generate structured cabinet geometry.&lt;/p&gt;

&lt;p&gt;Each cabinet becomes a parametric object like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cabinet {
  type: Base
  width: 36
  height: 34.5
  depth: 24
  position: (x,y)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this structure we can generate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3D models&lt;/li&gt;
&lt;li&gt;AutoCAD DWG files&lt;/li&gt;
&lt;li&gt;manufacturing layouts&lt;/li&gt;
&lt;/ul&gt;
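
&lt;p&gt;Turning one of those parametric objects into 3D geometry is mostly bookkeeping. A sketch (field names mirror the Cabinet structure above; the z origin is the floor):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: expand a parametric cabinet record into its eight 3D box corners.
def cabinet_corners(cab):
    x, y = cab["position"]
    w, h, d = cab["width"], cab["height"], cab["depth"]
    return [
        (x + dx, y + dy, dz)
        for dx in (0, w)
        for dy in (0, d)
        for dz in (0, h)
    ]

corners = cabinet_corners(
    {"type": "Base", "width": 36, "height": 34.5, "depth": 24, "position": (0, 0)}
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;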

&lt;h3&gt;
  
  
  Step 5 — Exporting CAD and 3D Models
&lt;/h3&gt;

&lt;p&gt;The final step converts structured geometry into formats used by design tools.&lt;/p&gt;

&lt;p&gt;Outputs include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DWG files&lt;/li&gt;
&lt;li&gt;3D cabinet assemblies&lt;/li&gt;
&lt;li&gt;layout visualizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, designers can directly open the results inside CAD software.&lt;/p&gt;

&lt;p&gt;What previously required hours of manual work can now be generated automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Challenges We Faced
&lt;/h2&gt;

&lt;p&gt;Building this system exposed several real-world problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Drawing Variability
&lt;/h3&gt;

&lt;p&gt;No two cabinet drawings are identical.&lt;/p&gt;

&lt;p&gt;Different designers use different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;annotation styles&lt;/li&gt;
&lt;li&gt;measurement formats&lt;/li&gt;
&lt;li&gt;symbols&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://cizotech.com/" rel="noopener noreferrer"&gt;AI system&lt;/a&gt; must handle high variation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling and Measurement Accuracy
&lt;/h3&gt;

&lt;p&gt;Architectural drawings use scaled representations.&lt;/p&gt;

&lt;p&gt;We had to design logic that converts pixel measurements into real-world dimensions.&lt;/p&gt;

&lt;p&gt;Even small errors could break cabinet assembly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spatial Relationships
&lt;/h3&gt;

&lt;p&gt;Cabinets are not isolated objects.&lt;/p&gt;

&lt;p&gt;They depend on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;walls&lt;/li&gt;
&lt;li&gt;appliances&lt;/li&gt;
&lt;li&gt;adjacent cabinets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system must understand layout context, not just object detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;p&gt;Automating cabinet interpretation unlocks several possibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster cabinet design workflows&lt;/li&gt;
&lt;li&gt;Automated CAD generation&lt;/li&gt;
&lt;li&gt;Reduced manual drafting work&lt;/li&gt;
&lt;li&gt;Faster manufacturing preparation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the future, systems like this could process entire architectural plans, not just cabinets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Most industries still rely on documents created for humans, not machines.&lt;/p&gt;

&lt;p&gt;But with the right combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;computer vision&lt;/li&gt;
&lt;li&gt;detection models&lt;/li&gt;
&lt;li&gt;language models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we can convert visual design documents into structured data pipelines.&lt;/p&gt;

&lt;p&gt;Cabinet drawings are just one example.&lt;/p&gt;

&lt;p&gt;The bigger opportunity lies in automating how machines read and understand design documents.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>computervision</category>
      <category>automation</category>
    </item>
    <item>
      <title>We Integrated JobTread + CompanyCam into a Daily Reporting Workflow. Timing Was the Hard Part.</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Tue, 24 Feb 2026 16:13:19 +0000</pubDate>
      <link>https://forem.com/cizo/we-integrated-jobtread-companycam-into-a-daily-reporting-workflow-timing-was-the-hard-part-539o</link>
      <guid>https://forem.com/cizo/we-integrated-jobtread-companycam-into-a-daily-reporting-workflow-timing-was-the-hard-part-539o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2xo4arkkknq4qkxpvjy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2xo4arkkknq4qkxpvjy.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve built automations in the real world, you’ve probably had this moment.&lt;/p&gt;

&lt;p&gt;The workflow looks perfect in testing.&lt;br&gt;
The triggers fire.&lt;br&gt;
The messages appear.&lt;br&gt;
The demo is clean.&lt;/p&gt;

&lt;p&gt;Then production starts.&lt;/p&gt;

&lt;p&gt;And the system doesn’t “break.” It becomes unreliable.&lt;/p&gt;

&lt;p&gt;That’s what happened when we built an automation layer around &lt;strong&gt;JobTread&lt;/strong&gt; and &lt;strong&gt;CompanyCam&lt;/strong&gt; to produce daily job reports for leadership.&lt;/p&gt;

&lt;p&gt;The data already existed.&lt;/p&gt;

&lt;p&gt;Field teams were documenting work inside &lt;strong&gt;CompanyCam&lt;/strong&gt;. Job context lived in &lt;strong&gt;JobTread&lt;/strong&gt;. Photos and descriptions were already being captured.&lt;/p&gt;

&lt;p&gt;So on the surface, it looked like a simple problem.&lt;/p&gt;

&lt;p&gt;“Just summarize what’s already there.”&lt;/p&gt;

&lt;p&gt;But when you attempt that with naive event triggers, you discover the real problem isn’t summarization.&lt;/p&gt;

&lt;p&gt;It’s timing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why timing destroys most automations
&lt;/h2&gt;

&lt;p&gt;Most people build these workflows as a chain reaction.&lt;/p&gt;

&lt;p&gt;A photo gets uploaded → run the automation.&lt;br&gt;
A description is updated → run again.&lt;br&gt;
A job status changes → run again.&lt;/p&gt;

&lt;p&gt;This creates the illusion of “real-time reporting.”&lt;/p&gt;

&lt;p&gt;What it actually creates is reporting drift.&lt;/p&gt;

&lt;p&gt;The summary changes depending on when it runs. A late photo changes the output. A sync delay causes partial context. Two events fire close together and you generate duplicates. Leadership sees conflicting versions and stops trusting it.&lt;/p&gt;

&lt;p&gt;This is one of those automation failure modes that doesn’t show up as an error.&lt;/p&gt;

&lt;p&gt;Everything “works.”&lt;/p&gt;

&lt;p&gt;And that’s why it’s dangerous.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design change that stabilized everything
&lt;/h2&gt;

&lt;p&gt;We stopped treating reporting as an event reaction.&lt;/p&gt;

&lt;p&gt;We treated it as a deliberate system artifact.&lt;/p&gt;

&lt;p&gt;That means we introduced a controlled activation model.&lt;/p&gt;

&lt;p&gt;Instead of “run whenever anything changes,” the system generates one report when the reporting window is considered ready.&lt;/p&gt;

&lt;p&gt;One job.&lt;br&gt;
One day.&lt;br&gt;
One report.&lt;/p&gt;

&lt;p&gt;If you’re used to automation thinking, this might feel like less automation.&lt;/p&gt;

&lt;p&gt;In practice, it’s more dependable automation.&lt;/p&gt;

&lt;p&gt;Because now you have a boundary.&lt;/p&gt;

&lt;p&gt;You know what moment counts as “report time.”&lt;br&gt;
You know what gets included.&lt;br&gt;
You know what’s ignored.&lt;br&gt;
You stop fighting late updates.&lt;/p&gt;
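
&lt;p&gt;One concrete way to enforce that boundary is an idempotency key per job per reporting day. A sketch (names are ours; a production system would back this with a database constraint, not an in-memory set):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: one report per job per day, enforced with an idempotency key.
from datetime import date

generated = set()

def should_generate(job_id, report_date=None):
    key = (job_id, report_date or date.today().isoformat())
    if key in generated:
        return False  # a report already exists for this window
    generated.add(key)
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;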

&lt;h2&gt;
  
  
  Why structured AI output matters
&lt;/h2&gt;

&lt;p&gt;Once timing is controlled, the second thing that breaks trust is unstructured output.&lt;/p&gt;

&lt;p&gt;If you use AI as a generic summarizer, the output becomes a paragraph. That paragraph changes depending on phrasing. It may be accurate, but it’s not decision-ready.&lt;/p&gt;

&lt;p&gt;Operational reporting needs consistent sections.&lt;/p&gt;

&lt;p&gt;It needs to answer the same questions every day in roughly the same shape.&lt;/p&gt;

&lt;p&gt;So we treated the &lt;a href="https://cizotech.com/ai-product-development-services/" rel="noopener noreferrer"&gt;AI&lt;/a&gt; layer like a constrained renderer.&lt;/p&gt;

&lt;p&gt;Not a writer.&lt;/p&gt;

&lt;p&gt;We shaped the output around what leadership actually needs:&lt;/p&gt;

&lt;p&gt;What work was completed.&lt;br&gt;
What material was used.&lt;br&gt;
What issues happened.&lt;br&gt;
What’s blocked.&lt;br&gt;
What’s planned next.&lt;/p&gt;

&lt;p&gt;When output has structure, you get operational clarity.&lt;/p&gt;

&lt;p&gt;When output doesn’t, you get “AI text.” And people tune out.&lt;/p&gt;
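
&lt;p&gt;In practice, “constrained renderer” means asking the model to fill a fixed shape rather than write prose. A sketch of such a shape (the field names and values are ours, not the production schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "job_id": "JT-1042",
  "report_date": "2026-02-24",
  "work_completed": ["..."],
  "materials_used": ["..."],
  "issues": ["..."],
  "blocked": ["..."],
  "planned_next": ["..."]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;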

&lt;h2&gt;
  
  
  Why delivery channel matters
&lt;/h2&gt;

&lt;p&gt;A lot of automations “work” but don’t get used because they land in the wrong place.&lt;/p&gt;

&lt;p&gt;If leadership has to log into another tool to see the report, it becomes optional. Optional becomes ignored.&lt;/p&gt;

&lt;p&gt;So the report had to land where leadership already operates.&lt;/p&gt;

&lt;p&gt;When reports arrive consistently in the same channel, people start building habits around them.&lt;/p&gt;

&lt;p&gt;That’s when automation stops being a novelty and becomes part of operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual lesson
&lt;/h2&gt;

&lt;p&gt;Most automation tutorials teach you how to connect tools.&lt;/p&gt;

&lt;p&gt;Production automation is about controlling behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s about deciding:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When should the system generate a report?&lt;br&gt;
What counts as the reporting window?&lt;br&gt;
How do you prevent duplicates?&lt;br&gt;
What happens when data arrives late?&lt;br&gt;
How do you keep output stable?&lt;/p&gt;

&lt;p&gt;If you don’t answer those questions early, your automation will slowly collapse under timing variability.&lt;/p&gt;

&lt;p&gt;Not because the tools failed.&lt;/p&gt;

&lt;p&gt;Because the system never decided how it should behave when timing is imperfect.&lt;/p&gt;

&lt;p&gt;And timing is always imperfect.&lt;/p&gt;

&lt;p&gt;That’s what production teaches you.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>systemdesign</category>
      <category>ai</category>
      <category>software</category>
    </item>
    <item>
      <title>If Your AI Product Can’t Handle Deletion, It Can’t Handle Monetization</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Tue, 17 Feb 2026 10:27:45 +0000</pubDate>
      <link>https://forem.com/cizo/if-your-ai-product-cant-handle-deletion-it-cant-handle-monetization-46ee</link>
      <guid>https://forem.com/cizo/if-your-ai-product-cant-handle-deletion-it-cant-handle-monetization-46ee</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8dbk1pndvmo5kbkbvwq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8dbk1pndvmo5kbkbvwq.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There’s a structural issue most AI products ignore until it’s too late.&lt;/p&gt;

&lt;p&gt;Deletion.&lt;/p&gt;

&lt;p&gt;Not model quality. &lt;br&gt;
Not inference speed. &lt;br&gt;
Not embeddings.&lt;/p&gt;

&lt;p&gt;Deletion.&lt;/p&gt;

&lt;p&gt;While designing a monetized AI system recently, we ran into something that forced a complete rethink of the architecture.&lt;/p&gt;

&lt;p&gt;The product allowed users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat freely (ephemeral conversations for safety)&lt;/li&gt;
&lt;li&gt;Pay to preserve meaningful interactions&lt;/li&gt;
&lt;li&gt;Delete their account at any time&lt;/li&gt;
&lt;li&gt;Expect compliance-grade data handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individually, each requirement made sense.&lt;/p&gt;

&lt;p&gt;Together, they conflicted.&lt;/p&gt;

&lt;p&gt;Because the moment a user pays to preserve a conversation, you’ve created a retention contract.&lt;/p&gt;

&lt;p&gt;And if your deletion logic doesn’t respect that contract at the data layer, monetization becomes unstable.&lt;/p&gt;
&lt;h2&gt;
  
  
  Where Most Systems Go Wrong
&lt;/h2&gt;

&lt;p&gt;The naive implementation looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store conversations in a table&lt;/li&gt;
&lt;li&gt;Add a saved = true column&lt;/li&gt;
&lt;li&gt;Add subscription checks in business logic&lt;/li&gt;
&lt;li&gt;Prevent deletion via UI if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works during demos.&lt;/p&gt;

&lt;p&gt;It works in staging.&lt;/p&gt;

&lt;p&gt;It even works for the first few hundred users.&lt;/p&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TTL cleanup jobs run&lt;/li&gt;
&lt;li&gt;Subscriptions expire&lt;/li&gt;
&lt;li&gt;Account deletion triggers cascade rules&lt;/li&gt;
&lt;li&gt;Compliance requests arrive&lt;/li&gt;
&lt;li&gt;Billing records need audit consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And suddenly, your “saved” boolean means nothing.&lt;/p&gt;

&lt;p&gt;Deletion is not a UI concern.&lt;/p&gt;

&lt;p&gt;It is a structural authority problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Architectural Separation That Fixes It
&lt;/h2&gt;

&lt;p&gt;The system only stabilized when we separated:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Interaction objects&lt;/li&gt;
&lt;li&gt;Materialized persistence&lt;/li&gt;
&lt;li&gt;Retention authority&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The flow became explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Message
 ↓
ConversationThread (ephemeral, TTL governed)
 ↓
Message (ephemeral)
 ↓
User selects “Save”
 ↓
ChronicleAsset (materialized snapshot)
 ↓
Entitlement (retention authority)
 ↓
DeletionRequest → Entitlement Check → Cascade Rules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical insight:&lt;br&gt;
Threads are not the retention boundary. &lt;br&gt;
ChronicleAssets are.&lt;/p&gt;

&lt;p&gt;Once you define that boundary, deletion and monetization stop fighting each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chronicle as a Materialized Snapshot
&lt;/h2&gt;

&lt;p&gt;A saved conversation cannot remain a mutable thread.&lt;/p&gt;

&lt;p&gt;It must become its own immutable artifact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ChronicleAsset {
 chronicle_asset_id: UUID,
 source_thread_id: UUID,
 owner_user_id: UUID,
 snapshot_ref: ObjectStoreURI,
 created_at: timestamp,
 immutable: true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it structurally distinct from live interaction.&lt;/p&gt;

&lt;p&gt;Deletion can wipe threads safely.&lt;/p&gt;

&lt;p&gt;But ChronicleAssets are governed by entitlements.&lt;/p&gt;
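&lt;p&gt;A minimal sketch of that materialization step. The function name, the in-memory object store, and the field values are illustrative, not our production code; the point is that the snapshot is a copy, stored separately from the live thread:&lt;/p&gt;

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

# In-memory stand-in for an object store; a real system would use S3 or similar.
OBJECT_STORE = {}

@dataclass(frozen=True)  # frozen enforces immutability at the language level
class ChronicleAsset:
    chronicle_asset_id: str
    source_thread_id: str
    owner_user_id: str
    snapshot_ref: str
    created_at: str

def materialize_snapshot(thread_id, owner_id, messages):
    """Copy the ephemeral thread into an immutable, separately stored artifact."""
    snapshot_ref = f"chronicles/{thread_id}/{uuid.uuid4()}.json"
    OBJECT_STORE[snapshot_ref] = list(messages)  # a snapshot, not a reference
    return ChronicleAsset(
        chronicle_asset_id=str(uuid.uuid4()),
        source_thread_id=thread_id,
        owner_user_id=owner_id,
        snapshot_ref=snapshot_ref,
        created_at=datetime.now(timezone.utc).isoformat(),
    )

asset = materialize_snapshot("thread-1", "user-9", ["hi", "hello"])
print(asset.snapshot_ref in OBJECT_STORE)  # True
```

&lt;p&gt;Because the asset holds a copy under its own key, TTL cleanup can delete the source thread without touching anything the user paid to keep.&lt;/p&gt;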

&lt;h2&gt;
  
  
  Entitlement as Retention Authority
&lt;/h2&gt;

&lt;p&gt;Monetization must be enforced by data-level ownership rules — not UI locks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entitlement {
 entitlement_id: UUID,
 user_id: UUID,
 target_entity_type: "ChronicleAsset",
 target_entity_id: UUID,
 status: "active" | "expired" | "revoked",
 valid_until: timestamp
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deletion logic becomes:&lt;br&gt;
• If no entitlement → cascade delete&lt;br&gt;
• If active entitlement → preserve asset&lt;br&gt;
• If compliance override → apply regulated deletion&lt;/p&gt;
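&lt;p&gt;The three rules collapse into one decision function. The field names follow the Entitlement shape above; the outcome labels are hypothetical:&lt;/p&gt;

```python
from datetime import datetime, timezone, timedelta

def resolve_deletion(entitlement, compliance_override=False):
    """Decide what a DeletionRequest may touch, per the three rules above."""
    if compliance_override:
        return "regulated_delete"   # regulated erasure wins over retention
    if entitlement is None:
        return "cascade_delete"     # nothing paid for: threads and assets go
    if (entitlement["status"] == "active"
            and entitlement["valid_until"] > datetime.now(timezone.utc)):
        return "preserve_asset"     # retention contract still in force
    return "cascade_delete"         # expired or revoked entitlement

active = {"status": "active",
          "valid_until": datetime.now(timezone.utc) + timedelta(days=30)}
print(resolve_deletion(None))           # cascade_delete
print(resolve_deletion(active))         # preserve_asset
print(resolve_deletion(active, True))   # regulated_delete
```

&lt;p&gt;The key property is that this runs at the data layer, so a TTL job, an account cascade, and a compliance request all pass through the same authority check.&lt;/p&gt;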

&lt;p&gt;Without this structure, monetization will eventually contradict deletion.&lt;br&gt;
And when that happens, engineers are forced to debug philosophy using production data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for AI Systems Specifically
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cizotech.com/" rel="noopener noreferrer"&gt;AI products&lt;/a&gt; are not just inference systems.&lt;/p&gt;

&lt;p&gt;They are:&lt;br&gt;
• Memory systems&lt;br&gt;
• Identity systems&lt;br&gt;
• Retention systems&lt;br&gt;
• Authority systems&lt;/p&gt;

&lt;p&gt;Deletion exposes whether those systems are coherent.&lt;/p&gt;

&lt;p&gt;If your architecture cannot formally define:&lt;br&gt;
• What is ephemeral&lt;br&gt;
• What is materialized&lt;br&gt;
• What entity enforces retention&lt;br&gt;
• What overrides deletion&lt;br&gt;
• What must remain auditable&lt;/p&gt;

&lt;p&gt;Then you don’t have a production AI system.&lt;/p&gt;

&lt;p&gt;You have a prototype with pricing.&lt;/p&gt;

&lt;p&gt;Monetization is not a feature.&lt;/p&gt;

&lt;p&gt;It is a retention boundary decision.&lt;/p&gt;

&lt;p&gt;And if your AI product cannot survive deletion logic, it won’t survive scale.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>architecture</category>
      <category>saas</category>
    </item>
    <item>
      <title>When AI Systems Scale, Dashboards Start to Get in the Way</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Tue, 10 Feb 2026 05:41:13 +0000</pubDate>
      <link>https://forem.com/cizo/when-ai-systems-scale-dashboards-start-to-get-in-the-way-2fm9</link>
      <guid>https://forem.com/cizo/when-ai-systems-scale-dashboards-start-to-get-in-the-way-2fm9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffgm6hh0evlnblselxnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffgm6hh0evlnblselxnw.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As AI moves from experiments into real production systems, teams start to encounter a familiar pattern. It doesn’t show up during early demos or pilot phases. It appears later — once AI is embedded into workflows that people rely on every day.&lt;/p&gt;

&lt;p&gt;At that point, dashboards often stop being the center of the system. Over time, they become a source of friction.&lt;/p&gt;

&lt;p&gt;This isn’t an argument against dashboards. It’s a description of what tends to happen as &lt;a href="https://cizotech.com/ai-ml-mobile-app-development-services/" rel="noopener noreferrer"&gt;AI-driven systems&lt;/a&gt; grow in complexity and decision frequency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dashboards Are a Natural Starting Point&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dashboards work well early on.&lt;br&gt;
They provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visibility into system state&lt;/li&gt;
&lt;li&gt;Aggregated metrics and trends&lt;/li&gt;
&lt;li&gt;A clear place for humans to make decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When decisions are infrequent or low-risk, this setup is efficient. Humans review information, apply judgment, and trigger actions. Many early AI systems fit comfortably into this model, which is why dashboards become the default choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the Model Starts to Break&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As systems mature, the work changes.&lt;br&gt;
Teams begin to see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More decisions per day&lt;/li&gt;
&lt;li&gt;Increasingly conditional logic&lt;/li&gt;
&lt;li&gt;Time-sensitive actions with downstream impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this stage, dashboards don’t fail technically. The data is still accurate. The issue is operational.&lt;/p&gt;

&lt;p&gt;People spend more time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring screens&lt;/li&gt;
&lt;li&gt;Correlating signals across tools&lt;/li&gt;
&lt;li&gt;Acting as intermediaries between systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system technically works — but human attention becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From Monitoring to Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once decision volume crosses a certain threshold, teams usually stop asking how to visualize information better and start asking why someone needs to look at it at all.&lt;br&gt;
This is where the system begins to change shape.&lt;br&gt;
Instead of reporting state and waiting, parts of the system start to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger actions automatically&lt;/li&gt;
&lt;li&gt;Apply predefined rules&lt;/li&gt;
&lt;li&gt;Escalate exceptions&lt;/li&gt;
&lt;li&gt;Log outcomes for later review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dashboards don’t disappear, but they stop being the primary interface. Their role shifts toward oversight instead of direct control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents as an Architectural Response&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This transition often introduces what are commonly called “agents.”&lt;br&gt;
In practice, these aren’t chatbots or unconstrained autonomous systems. They are bounded execution units designed to reduce coordination overhead.&lt;/p&gt;

&lt;p&gt;An agent typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has access to relevant context&lt;/li&gt;
&lt;li&gt;Applies defined decision logic&lt;/li&gt;
&lt;li&gt;Takes action or escalates&lt;/li&gt;
&lt;li&gt;Reports what happened&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents emerge not because they’re trendy, but because dashboards alone don’t scale well once execution becomes the dominant concern.&lt;/p&gt;
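&lt;p&gt;One way to picture such a bounded execution unit is a single decide, act-or-escalate, report step. The five callables and the 0.8 confidence threshold below are assumptions for the sketch, not a real framework API:&lt;/p&gt;

```python
def run_agent(event, context, decide, act, escalate, log):
    """One bounded execution step: decide, then act or escalate, and always log."""
    decision = decide(event, context)
    if decision["confidence"] >= 0.8 and decision.get("action"):
        outcome = act(decision["action"])     # automated path
    else:
        outcome = escalate(event, decision)   # ambiguous cases go to a human
    log({"event": event, "decision": decision, "outcome": outcome})
    return outcome

# Stub wiring to show the shape; real handlers would call services.
audit = []
outcome = run_agent(
    event={"type": "stale_job"},
    context={},
    decide=lambda e, c: {"action": "restart", "confidence": 0.95},
    act=lambda a: f"did:{a}",
    escalate=lambda e, d: "queued_for_review",
    log=audit.append,
)
print(outcome)     # did:restart
print(len(audit))  # 1
```

&lt;p&gt;Note that the log call is unconditional: whether the agent acted or escalated, the outcome is recorded, which is what keeps oversight possible once dashboards move to the background.&lt;/p&gt;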

&lt;p&gt;&lt;strong&gt;What Changes When Agents Take Over Execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As agents move closer to the core workflow, several patterns tend to emerge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fewer interfaces:&lt;/strong&gt; Teams stop adding dashboards for every edge case.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clearer accountability:&lt;/strong&gt; Decisions are automated, escalated, or logged explicitly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lower cognitive load:&lt;/strong&gt; Humans focus on exceptions instead of constant monitoring.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More consistent behavior:&lt;/strong&gt; System outcomes depend less on who is watching at a given moment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dashboards still matter — they just stop being the system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Humans Don’t Disappear From the Loop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;None of the systems we’ve seen aimed for full automation.&lt;br&gt;
Humans remain essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Oversight and review&lt;/li&gt;
&lt;li&gt;Handling ambiguous or novel cases&lt;/li&gt;
&lt;li&gt;Defining policies and constraints&lt;/li&gt;
&lt;li&gt;Evaluating whether automation still makes sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As systems mature, human involvement becomes less frequent but more intentional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implications for Teams Building AI Systems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few practical lessons tend to follow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Design workflows around actions, not just views&lt;/li&gt;
&lt;li&gt;Treat dashboards as optional components, not architectural anchors&lt;/li&gt;
&lt;li&gt;Expect interfaces to evolve as decision complexity increases&lt;/li&gt;
&lt;li&gt;Avoid heavy UI investment before execution paths are clear&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every system needs agents. But beyond a certain scale, dashboards alone rarely hold up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing for System Behavior, Not Interfaces&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This shift isn’t a prediction about the future of software. It reflects how systems already behave once AI moves from analysis to execution.&lt;/p&gt;

&lt;p&gt;As responsibility shifts from people to systems, interfaces naturally become secondary. Teams that recognize this early spend less time managing dashboards — and more time improving how decisions actually get made.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Architecting AI Systems Before Features Exist</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Mon, 02 Feb 2026 11:13:46 +0000</pubDate>
      <link>https://forem.com/cizo/architecting-ai-systems-before-features-exist-2fbb</link>
      <guid>https://forem.com/cizo/architecting-ai-systems-before-features-exist-2fbb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnon4pukz8xw46f6vunh1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnon4pukz8xw46f6vunh1.png" alt=" " width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most AI failures aren’t model failures.&lt;/p&gt;

&lt;p&gt;They’re system failures caused by optimizing for capability before control.&lt;/p&gt;

&lt;p&gt;Before features, we focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolating inference from orchestration&lt;/li&gt;
&lt;li&gt;defining behavior under uncertainty&lt;/li&gt;
&lt;li&gt;bounding system responses&lt;/li&gt;
&lt;li&gt;designing explicit override paths&lt;/li&gt;
&lt;/ul&gt;
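&lt;p&gt;A rough sketch of what the last two items can mean in code: a bounded response with a defined abstain behavior under uncertainty and an explicit override path. Names and thresholds are illustrative, not a prescribed API:&lt;/p&gt;

```python
def bounded_output(raw_value, low, high, confidence,
                   min_confidence=0.7, override=None):
    """Clamp a model output, abstain when uncertain, and honor an override."""
    if override is not None:
        return override                       # human/policy override always wins
    if confidence < min_confidence:
        return None                           # defined behavior under uncertainty: abstain
    return max(low, min(high, raw_value))     # bound the system response

print(bounded_output(1.7, 0.0, 1.0, confidence=0.9))              # 1.0 (clamped)
print(bounded_output(0.5, 0.0, 1.0, confidence=0.2))              # None (abstain)
print(bounded_output(0.5, 0.0, 1.0, confidence=0.9, override=0.0))  # 0.0 (override)
```

&lt;p&gt;Because the model only ever produces &lt;code&gt;raw_value&lt;/code&gt;, it can be swapped out without touching the bounding or override logic.&lt;/p&gt;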

&lt;p&gt;Accuracy metrics don’t capture stability.&lt;/p&gt;

&lt;p&gt;Highly sensitive systems react to noise.&lt;br&gt;
Stable systems absorb it.&lt;/p&gt;

&lt;p&gt;In production, consistency matters more than precision.&lt;/p&gt;

&lt;p&gt;That consistency comes from architecture, not training.&lt;/p&gt;

&lt;p&gt;Models will change.&lt;br&gt;
Policies will evolve.&lt;br&gt;
Usage will drift.&lt;/p&gt;

&lt;p&gt;Systems designed for replacement survive those changes.&lt;/p&gt;

&lt;p&gt;Systems designed for demos don’t.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>machinelearning</category>
      <category>software</category>
    </item>
    <item>
      <title>Why High Accuracy Didn’t Save Our Real-Time AI System</title>
      <dc:creator>CIZO</dc:creator>
      <pubDate>Wed, 28 Jan 2026 17:32:03 +0000</pubDate>
      <link>https://forem.com/cizo/why-high-accuracy-didnt-save-our-real-time-ai-system-1eem</link>
      <guid>https://forem.com/cizo/why-high-accuracy-didnt-save-our-real-time-ai-system-1eem</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcci4np1anxx5xkfs7vyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcci4np1anxx5xkfs7vyi.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Accuracy was not the thing that broke our system.&lt;/p&gt;

&lt;p&gt;At least not in the way we expected.&lt;/p&gt;

&lt;p&gt;We were working on a real-time AI system that had to react to human movement during live sessions. The early versions looked great on paper. Accuracy metrics were strong. Benchmarks improved steadily.&lt;/p&gt;

&lt;p&gt;From a model perspective, everything was “working.”&lt;/p&gt;

&lt;p&gt;But once the system was used outside controlled conditions, problems started to appear.&lt;/p&gt;

&lt;p&gt;Lighting changed across sessions.&lt;br&gt;
Users moved slightly differently each time.&lt;br&gt;
Hardware behaved inconsistently over time.&lt;/p&gt;

&lt;p&gt;None of these were bugs. They were normal.&lt;/p&gt;

&lt;p&gt;The system, however, treated them like errors.&lt;/p&gt;

&lt;p&gt;Small variations triggered corrections.&lt;br&gt;
Feedback jittered.&lt;br&gt;
Outputs changed from moment to moment.&lt;/p&gt;

&lt;p&gt;Nothing was technically incorrect, but the behavior felt unreliable.&lt;/p&gt;

&lt;p&gt;That’s when we realized the real issue wasn’t accuracy.&lt;/p&gt;

&lt;p&gt;It was sensitivity.&lt;/p&gt;

&lt;p&gt;The system was reacting too quickly to noise. It was optimized to be precise frame-by-frame, but not stable across real usage.&lt;/p&gt;

&lt;p&gt;So we made a decision that felt wrong at first.&lt;/p&gt;

&lt;p&gt;We reduced model sensitivity.&lt;/p&gt;

&lt;p&gt;We smoothed signals over time.&lt;br&gt;
We raised confidence thresholds.&lt;br&gt;
We allowed small variations to pass without reaction.&lt;/p&gt;
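&lt;p&gt;Those three changes can be sketched together: exponential smoothing, a deadband for small variations, and a confidence gate. The constants are illustrative; the real values were tuned per deployment:&lt;/p&gt;

```python
class StabilizedSignal:
    """Smooth a noisy real-time signal instead of reacting to every frame."""

    def __init__(self, alpha=0.2, deadband=0.05, min_confidence=0.6):
        self.alpha = alpha                    # smoothing factor (0..1)
        self.deadband = deadband              # variations below this pass silently
        self.min_confidence = min_confidence  # raised confidence threshold
        self.value = None

    def update(self, raw, confidence):
        if confidence < self.min_confidence:
            return self.value                 # ignore low-confidence frames
        if self.value is None:
            self.value = raw
        elif abs(raw - self.value) > self.deadband:
            # Move toward the new reading over time instead of jumping to it.
            self.value += self.alpha * (raw - self.value)
        # Variations inside the deadband produce no reaction at all.
        return self.value

s = StabilizedSignal()
print(s.update(1.00, 0.9))  # 1.0
print(s.update(1.02, 0.9))  # 1.0  (within deadband: no correction)
print(s.update(1.02, 0.1))  # 1.0  (low confidence: ignored)
print(s.update(2.00, 0.9))  # 1.2  (smoothed step, not a jump to 2.0)
```

&lt;p&gt;Frame-by-frame, this is less “accurate” than echoing the raw reading; across a session, it is what made the output feel reliable.&lt;/p&gt;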

&lt;p&gt;On paper, accuracy metrics dropped.&lt;/p&gt;

&lt;p&gt;In production, usability improved almost immediately.&lt;/p&gt;

&lt;p&gt;The system stopped overcorrecting.&lt;br&gt;
Outputs became predictable.&lt;br&gt;
Users trusted it more.&lt;/p&gt;

&lt;p&gt;What surprised us most was how quickly behavior changed once trust returned. Coaches used the system consistently. Sessions lasted longer. Adoption improved without changing anything else.&lt;/p&gt;

&lt;p&gt;This experience changed how we think about production AI.&lt;/p&gt;

&lt;p&gt;Accuracy metrics answer an important question:&lt;br&gt;
“How often is the model correct under expected conditions?”&lt;/p&gt;

&lt;p&gt;Production systems need to answer a different one:&lt;br&gt;
“How does the system behave when conditions are wrong?”&lt;/p&gt;

&lt;p&gt;If a system behaves inconsistently when inputs are noisy, users will stop trusting it — even if the model is technically accurate.&lt;/p&gt;

&lt;p&gt;This pattern shows up far beyond sports.&lt;/p&gt;

&lt;p&gt;In healthcare, overly sensitive systems create alert fatigue.&lt;br&gt;
In operations platforms, false positives cause teams to ignore signals.&lt;br&gt;
In real-time tools, unstable behavior breaks workflows.&lt;/p&gt;

&lt;p&gt;In all of these cases, consistency matters more than precision.&lt;/p&gt;

&lt;p&gt;The takeaway for anyone building production AI is simple:&lt;/p&gt;

&lt;p&gt;Accuracy matters.&lt;br&gt;
But predictable behavior matters more.&lt;/p&gt;

&lt;p&gt;If a system cannot tolerate noise, misuse, and imperfect conditions, it won’t survive contact with real users.&lt;/p&gt;

&lt;p&gt;Benchmarks don’t reveal that.&lt;br&gt;
Production does.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
