<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dawid Dahl</title>
    <description>The latest articles on Forem by Dawid Dahl (@dawiddahl).</description>
    <link>https://forem.com/dawiddahl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F468874%2F7ebb0342-1e6a-4ac7-a19a-443045364564.jpg</url>
      <title>Forem: Dawid Dahl</title>
      <link>https://forem.com/dawiddahl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dawiddahl"/>
    <language>en</language>
    <item>
      <title>AAID: Augmented AI Development</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Mon, 06 Oct 2025 08:21:06 +0000</pubDate>
      <link>https://forem.com/dawiddahl/aaid-augmented-ai-development-50c9</link>
      <guid>https://forem.com/dawiddahl/aaid-augmented-ai-development-50c9</guid>
      <description>&lt;p&gt;&lt;em&gt;Professional TDD for AI-Augmented Software Development&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What Is AAID and Why It Matters&lt;/li&gt;
&lt;li&gt;The Business Case: What Performance Research Shows&lt;/li&gt;
&lt;li&gt;Who This Guide Is For&lt;/li&gt;
&lt;li&gt;Built on Proven Foundations&lt;/li&gt;
&lt;li&gt;Developer Mindset&lt;/li&gt;
&lt;li&gt;
Prerequisite: Product Discovery &amp;amp; Specification Phase

&lt;ul&gt;
&lt;li&gt;From Specification to Development&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Getting Started With AAID&lt;/li&gt;

&lt;li&gt;AAID Workflow Diagram&lt;/li&gt;

&lt;li&gt;

AAID Development Stages

&lt;ul&gt;
&lt;li&gt;Stage 1: Context Providing&lt;/li&gt;
&lt;li&gt;Stage 2: Planning&lt;/li&gt;
&lt;li&gt;Stage 3: TDD Development Begins&lt;/li&gt;
&lt;li&gt;Stage 4: The TDD Cycle&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Continuing the TDD Cycle&lt;/li&gt;

&lt;li&gt;Conclusion: The Augmented Advantage&lt;/li&gt;

&lt;li&gt;Example Implementation&lt;/li&gt;

&lt;li&gt;

Appendices (Optional)

&lt;ul&gt;
&lt;li&gt;Appendix A: Acceptance Testing&lt;/li&gt;
&lt;li&gt;Appendix B: Helpful Commands (Reusable Prompts)&lt;/li&gt;
&lt;li&gt;Appendix C: AAID AI Workflow Rules&lt;/li&gt;
&lt;li&gt;Appendix D: Handling Technical Implementation Details&lt;/li&gt;
&lt;li&gt;Appendix E: Dependencies and Mocking&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;About the Author&lt;/li&gt;

&lt;/ul&gt;




&lt;p&gt;&lt;a id="what-is-aaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is AAID and Why It Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AUGMENTED AI DEVELOPMENT &lt;code&gt;AAID&lt;/code&gt;&lt;/strong&gt; (&lt;strong&gt;/eɪd/&lt;/strong&gt; - pronounced like "aid") is a disciplined approach where developers augment their capabilities by integrating with AI, while maintaining full architectural control. You direct the agent to generate tests and implementation code, reviewing every line and ensuring alignment with business requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You're not being replaced. You're being augmented.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This separates professional software development from "vibe coding." While vibe coders blindly accept AI output and ship buggy, untested code they can't understand, &lt;code&gt;AAID&lt;/code&gt; practitioners use proper TDD (Test-Driven Development) to ensure reliable agentic assistance.&lt;/p&gt;

&lt;p&gt;&lt;a id="the-business-case"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Business Case: What Performance Research Shows
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dora.dev/" rel="noopener noreferrer"&gt;DORA&lt;/a&gt; (Google Cloud's &lt;strong&gt;DevOps Research and Assessment&lt;/strong&gt;) highlights the proven TDD principle &lt;code&gt;AAID&lt;/code&gt; relies on: developer-owned testing drives performance &lt;a href="https://dora.dev/capabilities/test-automation/" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;. At the same time, a 25% increase in AI adoption correlates with a 7.2% drop in delivery stability and 1.5% decrease in throughput, while 39% of developers report little to no trust in AI-generated code &lt;a href="https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AAID&lt;/code&gt; solves this. The TDD discipline forces every AI-generated line through comprehensive testing and mandatory reviews, capturing AI's productivity gains (increased &lt;strong&gt;documentation quality&lt;/strong&gt;, &lt;strong&gt;code quality&lt;/strong&gt;, &lt;strong&gt;review&lt;/strong&gt; and &lt;strong&gt;generation speed&lt;/strong&gt; &lt;a href="https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report" rel="noopener noreferrer"&gt;[2]&lt;/a&gt;) without the stability loss.&lt;/p&gt;

&lt;p&gt;DORA proves speed and stability aren't trade-offs &lt;a href="https://dora.dev/guides/dora-metrics-four-keys/" rel="noopener noreferrer"&gt;[3]&lt;/a&gt;. With &lt;code&gt;AAID&lt;/code&gt;, speed comes from AI augmentation supported by the safety net of tests, stability from disciplined testing. You get both together, not one at the expense of the other.&lt;/p&gt;

&lt;p&gt;[1] &lt;a href="https://dora.dev/capabilities/test-automation/" rel="noopener noreferrer"&gt;DORA Capabilities: Test automation&lt;/a&gt;&lt;br&gt;
[2] &lt;a href="https://cloud.google.com/blog/products/devops-sre/announcing-the-2024-dora-report" rel="noopener noreferrer"&gt;Announcing the 2024 DORA report | Google Cloud Blog&lt;/a&gt;&lt;br&gt;
[3] &lt;a href="https://dora.dev/guides/dora-metrics-four-keys/" rel="noopener noreferrer"&gt;DORA's software delivery metrics: the four keys&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="who-this-guide-is-for"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Who This Guide Is For
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;AAID&lt;/code&gt; is for serious developers who aim to build maintainable software, whether you're a professional engineer or someone building a personal project you expect to last.&lt;/p&gt;

&lt;p&gt;If you just need quick scripts or throwaway prototypes, other AI approaches work better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic understanding of how AI prompts and context work&lt;/li&gt;
&lt;li&gt;Some experience with automated testing&lt;/li&gt;
&lt;li&gt;Patience to review what the AI writes (no blind copy-pasting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What you don't need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TDD experience (you'll learn it here)&lt;/li&gt;
&lt;li&gt;Specific tech stack knowledge&lt;/li&gt;
&lt;li&gt;Deep AI expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result? &lt;strong&gt;Predictable&lt;/strong&gt; development with great potential for &lt;strong&gt;production-grade&lt;/strong&gt; quality software. While initially the &lt;code&gt;AAID&lt;/code&gt; workflow requires more discipline and effort than vibe coding, in the long run you'll move faster. No debugging mysterious AI-generated bugs or untangling code you don't understand.&lt;/p&gt;

&lt;p&gt;This guide shows you exactly how to ship features that deliver real business value, from context-setting through disciplined TDD cycles.&lt;/p&gt;

&lt;p&gt;It's also an incredibly fun way to work!&lt;/p&gt;

&lt;p&gt;&lt;a id="built-on-proven-foundations"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Built on Proven Foundations
&lt;/h2&gt;

&lt;p&gt;Unlike most other AI-driven workflows, &lt;code&gt;AAID&lt;/code&gt; doesn't try to reinvent product discovery or software development. Instead it stands on the shoulders of giants, applying well-established methodologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kent Beck&lt;/strong&gt;'s TDD cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dave Farley&lt;/strong&gt;'s Continuous Delivery and four-layer acceptance testing model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robert C. Martin&lt;/strong&gt;'s Three Laws of TDD&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daniel Terhorst-North&lt;/strong&gt;'s Behavior-Driven Development (BDD) methodology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gojko Adzic&lt;/strong&gt;'s Specification by Example methodology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aslak Hellesøy&lt;/strong&gt;'s BDD and Gherkin syntax for executable specifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eric Evans&lt;/strong&gt;'s Domain-Driven Design and Ubiquitous Language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Martin Fowler&lt;/strong&gt;'s work on refactoring, evolutionary design, and Domain-Specific Languages&lt;/li&gt;
&lt;li&gt;And more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These battle-tested practices become your foundation that guides AI-assisted development.&lt;/p&gt;

&lt;p&gt;&lt;a id="developer-mindset"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Developer Mindset
&lt;/h2&gt;

&lt;p&gt;Success with &lt;code&gt;AAID&lt;/code&gt; requires a specific mindset:&lt;/p&gt;

&lt;p&gt;1: &lt;strong&gt;🧠 Don't abandon your brain&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need to stay engaged with, and comprehend, every line of code, every test, every refactoring. The AI generates the code, but you decide what stays, what changes, what is removed, and why.&lt;/p&gt;

&lt;p&gt;Without this understanding, you're just &lt;em&gt;hoping&lt;/em&gt; things will work, which is sure to spell disaster in any real-world project.&lt;/p&gt;

&lt;p&gt;2: &lt;strong&gt;🪜 Incremental steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This mentality is what really sets this AI workflow apart from others. Here, instead of letting the AI go off for minutes or even hours and produce a lot of dangerous garbage code, you make sure to remain in control by iterating in &lt;strong&gt;small&lt;/strong&gt;, &lt;strong&gt;focused&lt;/strong&gt; steps.&lt;/p&gt;

&lt;p&gt;One test at a time. One feature at a time. One refactor at a time.&lt;/p&gt;

&lt;p&gt;This approach surfaces mistakes early and can even help you save money by keeping token usage low, while also making it easier to use smaller and cheaper models.&lt;/p&gt;

&lt;p&gt;This is why the TDD cycle in &lt;code&gt;AAID&lt;/code&gt; adds multiple review checkpoints—&lt;strong&gt;⏸️ AWAIT USER REVIEW&lt;/strong&gt;—after each phase (🔴 &lt;strong&gt;RED&lt;/strong&gt;, 🟢 &lt;strong&gt;GREEN&lt;/strong&gt;, and 🧼 &lt;strong&gt;REFACTOR&lt;/strong&gt;).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;These incremental steps mirror &lt;a href="https://dora.dev/" rel="noopener noreferrer"&gt;DORA&lt;/a&gt;'s research on working in small batches: tiny, independent changes give you faster feedback and reduce risk &lt;a href="https://dora.dev/capabilities/working-in-small-batches/" rel="noopener noreferrer"&gt;[1]&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
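
&lt;p&gt;As a minimal, hypothetical sketch of one such micro-cycle (the &lt;code&gt;slugify&lt;/code&gt; example is illustrative, not taken from this article's demo), the rhythm with its review checkpoints might look like this:&lt;/p&gt;

```typescript
// Illustrative micro-cycle on a toy function — one small behavior per cycle.
// slugify and its spec are hypothetical, not from the AAID demo repository.

// 🔴 RED — one failing test for one small behavior.
const testSlugify = (): void => {
  if (slugify("Hello World") !== "hello-world") {
    throw new Error("RED: not passing yet");
  }
};
// ⏸️ AWAIT USER REVIEW — does this test match the spec?

// 🟢 GREEN — the least code that makes the test pass.
function slugify(input: string): string {
  return input.toLowerCase().split(" ").join("-");
}
// ⏸️ AWAIT USER REVIEW — accept the implementation, or push back.

// 🧼 REFACTOR — clean up with the test as a safety net.
// (Handling repeated spaces, for instance, would be the NEXT small test.)
testSlugify(); // still green after each step
```

&lt;p&gt;The point is the granularity: each pause is a chance to catch a wrong turn while the change is still one test wide.&lt;/p&gt;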

&lt;p&gt;&lt;a id="prerequisite"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisite: Product Discovery &amp;amp; Specification Phase
&lt;/h2&gt;

&lt;p&gt;Before development begins, professional teams complete a product specification phase involving stakeholders, product owners, tech leads, product designers, developers, QA engineers, and architects. At a high level, it follows a refinement pattern like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Client's Vague Wish → Stories → Examples&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using techniques like Impact Mapping, Event Storming, and Story Mapping, teams establish specifications that represent the fundamental business needs that must be satisfied. The resulting specifications can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User stories with BDD examples, organized into epics

&lt;ul&gt;
&lt;li&gt;Or a &lt;a href="https://jpattonassociates.com/wp-content/uploads/2015/03/story_mapping.pdf" rel="noopener noreferrer"&gt;Story Map&lt;/a&gt; containing the user stories + BDD examples ← (the &lt;code&gt;AAID&lt;/code&gt; recommendation)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;PRD (Product Requirements Document)&lt;/li&gt;
&lt;li&gt;Ubiquitous language documentation (a common language shared among stakeholders, developers, and anyone taking part in the project)&lt;/li&gt;
&lt;li&gt;Any additional project-specific requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact combination varies by project.&lt;/p&gt;

&lt;p&gt;This specification package will then be used—almost religiously—to serve as the objective foundation for the &lt;code&gt;AAID&lt;/code&gt; workflow, aligning development with the actual needs of the business.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;⚙️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Technical requirements (infrastructure elements, styling, NFRs) are tracked as separate linked tasks within stories, keeping behavioral specs pure. Learn more.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="prerequisite-spec-to-dev"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  From Specification to Development
&lt;/h3&gt;

&lt;p&gt;Here's how a typical user story with BDD examples can look.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Take note of how all these BDD examples only describe the &lt;strong&gt;behavior&lt;/strong&gt; of the system. Importantly, they say nothing of how to implement them technically.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Story Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="err"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;User&lt;/span&gt; &lt;span class="err"&gt;archives&lt;/span&gt; &lt;span class="err"&gt;completed&lt;/span&gt; &lt;span class="err"&gt;todos&lt;/span&gt;

&lt;span class="err"&gt;User Story&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;As&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;user, I want to archive completed todos, so that my active list stays clean
&lt;span class="err"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;I &lt;/span&gt;can focus on current tasks.

&lt;span class="err"&gt;Acceptance Criteria&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; User archives completed todos

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Archive a completed todo
  &lt;span class="nf"&gt;Given &lt;/span&gt;the user has a completed todo &lt;span class="s"&gt;"Buy milk"&lt;/span&gt;
  &lt;span class="nf"&gt;When &lt;/span&gt;they archive &lt;span class="s"&gt;"Buy milk"&lt;/span&gt;
  &lt;span class="nf"&gt;Then &lt;/span&gt;&lt;span class="s"&gt;"Buy milk"&lt;/span&gt; should be in archived todos
  &lt;span class="nf"&gt;And &lt;/span&gt;&lt;span class="s"&gt;"Buy milk"&lt;/span&gt; should not be in active todos

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Cannot archive an incomplete todo
  &lt;span class="nf"&gt;Given &lt;/span&gt;the user has an incomplete todo &lt;span class="s"&gt;"Walk dog"&lt;/span&gt;
  &lt;span class="nf"&gt;When &lt;/span&gt;they attempt to archive &lt;span class="s"&gt;"Walk dog"&lt;/span&gt;
  &lt;span class="nf"&gt;Then &lt;/span&gt;they should see an error message
  &lt;span class="nf"&gt;And &lt;/span&gt;&lt;span class="s"&gt;"Walk dog"&lt;/span&gt; should remain in active todos

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Restore an archived todo
  &lt;span class="nf"&gt;Given &lt;/span&gt;the user has archived todo &lt;span class="s"&gt;"Review code"&lt;/span&gt;
  &lt;span class="nf"&gt;When &lt;/span&gt;they restore &lt;span class="s"&gt;"Review code"&lt;/span&gt;
  &lt;span class="nf"&gt;Then &lt;/span&gt;&lt;span class="s"&gt;"Review code"&lt;/span&gt; should be in active todos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This article is not about the product discovery and specification refinement step; it assumes you have the specs ready. From there, it guides you in transforming those specs into tests and code ready for production.&lt;/p&gt;
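
&lt;p&gt;To make that specs → tests transformation concrete, here is a hedged sketch of how the first scenario above might become a unit test, together with the minimal implementation that makes it pass. All names (&lt;code&gt;TodoList&lt;/code&gt;, &lt;code&gt;archive&lt;/code&gt;, and so on) are hypothetical, not taken from the demo repository:&lt;/p&gt;

```typescript
// Hypothetical sketch — class and method names are illustrative.
class TodoList {
  private active = new Map<string, { done: boolean }>();
  private archived = new Set<string>();

  add(title: string, done = false): void {
    this.active.set(title, { done });
  }

  // The "Cannot archive an incomplete todo" scenario drives the guard clause.
  archive(title: string): void {
    const todo = this.active.get(title);
    if (!todo) throw new Error(`Unknown todo: ${title}`);
    if (!todo.done) throw new Error("Cannot archive an incomplete todo");
    this.active.delete(title);
    this.archived.add(title);
  }

  // The "Restore an archived todo" scenario drives this method.
  restore(title: string): void {
    if (!this.archived.delete(title)) throw new Error(`Not archived: ${title}`);
    this.active.set(title, { done: true });
  }

  isActive(title: string): boolean {
    return this.active.has(title);
  }

  isArchived(title: string): boolean {
    return this.archived.has(title);
  }
}

// Scenario: Archive a completed todo — in AAID this test is written
// BEFORE the implementation above exists (the RED step).
const list = new TodoList();
list.add("Buy milk", true); // Given the user has a completed todo "Buy milk"
list.archive("Buy milk"); // When they archive "Buy milk"
console.assert(list.isArchived("Buy milk")); // Then it is in archived todos
console.assert(!list.isActive("Buy milk")); // And not in active todos
```

&lt;p&gt;Note that the Gherkin dictated only the observable behavior; the &lt;code&gt;Map&lt;/code&gt;/&lt;code&gt;Set&lt;/code&gt; internals are one of many implementations the tests would permit.&lt;/p&gt;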

&lt;p&gt;&lt;a id="getting-started-with-aaid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started With AAID
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: &lt;code&gt;AAID&lt;/code&gt; is a feature development workflow that assumes:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Specifications ready&lt;/strong&gt;: User stories with BDD scenarios from Product Discovery&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Working project&lt;/strong&gt;: Development environment, test runner, linting and basic tooling configured (new or existing codebase)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Basic project scaffolding (running framework generators, setting up config files) involves structural setup rather than implementable technical contracts, placing it outside &lt;code&gt;AAID&lt;/code&gt;'s TDD workflow. Custom infrastructure &lt;em&gt;implementations&lt;/em&gt; (adapters, middleware, auth setup, etc) use &lt;code&gt;AAID&lt;/code&gt; with TDD. See Appendix D for details on technical implementation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Setting up &lt;code&gt;AAID&lt;/code&gt; takes just three steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1: Add the workflow rules&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Save the &lt;code&gt;AAID&lt;/code&gt; rules from Appendix C to your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt;: &lt;code&gt;.cursor/rules/aaid.mdc&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: &lt;code&gt;CLAUDE.md&lt;/code&gt; in project root&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt;: &lt;code&gt;GEMINI.md&lt;/code&gt; in project root&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2: Add reusable commands (optional but recommended)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Copy the command files from Appendix B to your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt;: &lt;code&gt;.cursor/commands/&lt;/code&gt; (Markdown format)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt;: &lt;code&gt;.claude/commands/&lt;/code&gt; (Markdown format)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt;: &lt;code&gt;.gemini/commands/&lt;/code&gt; (TOML format - needs conversion)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3: Have the &lt;code&gt;AAID&lt;/code&gt; workflow diagram ready&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;AI agents sometimes make mistakes and may not always follow instructions. When that happens, since you follow the &lt;code&gt;AAID&lt;/code&gt; mindset, you can manually steer the agent back on track with the workflow &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/aaid-workflow-diagram.mermaid" rel="noopener noreferrer"&gt;diagram&lt;/a&gt; as your guide.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Liftoff into &lt;code&gt;AAID&lt;/code&gt; space 🚀&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;That’s it; there's no more magic to it than that. The rules enforce a disciplined TDD workflow, and the commands speed up your development. Now you're ready for &lt;code&gt;AAID&lt;/code&gt; Stage 1!&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;💡&lt;/th&gt;
&lt;th&gt;
&lt;strong&gt;Demo repo&lt;/strong&gt;: For a working example with all files pre-configured for Cursor, check the &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development-demo" rel="noopener noreferrer"&gt;TicTacToe demo repository&lt;/a&gt;.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="workflow-diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AAID Workflow Diagram
&lt;/h2&gt;

&lt;p&gt;Now that you have your specs from the product specification phase (like the user story above), and the AI environment set up, we are ready to start building!&lt;/p&gt;

&lt;p&gt;This diagram presents the formal workflow; detailed explanations for each step follow in the &lt;strong&gt;&lt;code&gt;AAID&lt;/code&gt; Development Stages&lt;/strong&gt; section below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23kuy9jk91b50dmgrxh3.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23kuy9jk91b50dmgrxh3.webp" alt="AAID workflow diagram" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram shows three distinct development paths, distinguished by colored arrows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blue arrows&lt;/strong&gt;: Shared workflow stages and the domain/business logic path this article focuses on, as opposed to the technical and presentation paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orange arrows&lt;/strong&gt;: Technical implementation specific branches (see Appendix D)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purple arrows&lt;/strong&gt;: Presentation/UI specific branches (no TDD - see Appendix D)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🔗&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Click &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/aaid-workflow-diagram.mermaid" rel="noopener noreferrer"&gt;this link&lt;/a&gt; to &lt;strong&gt;view&lt;/strong&gt; the full diagram.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;If the diagram is not rendered on mobile, copy/paste the mermaid code into a &lt;a href="https://mermaid.live" rel="noopener noreferrer"&gt;mermaid editor&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a id="development-stages"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AAID Development Stages
&lt;/h2&gt;

&lt;p&gt;&lt;a id="stage-1-context"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 Stage 1: Context Providing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1buitum19txdrc2pcnw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1buitum19txdrc2pcnw.jpg" alt="Stage 1 - Context Providing" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before any AI interaction, establish comprehensive context. The AI needs to understand the project landscape to generate relevant code.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Note on commands&lt;/strong&gt;: Throughout this guide, you'll see references like &lt;code&gt;/project-context&lt;/code&gt;. These are pre-written &lt;strong&gt;reusable&lt;/strong&gt; &lt;strong&gt;prompts&lt;/strong&gt; that you trigger with the &lt;code&gt;/&lt;/code&gt; prefix. The repo stores them in Cursor's &lt;code&gt;.cursor/commands/&lt;/code&gt;, and you can copy the same markdown into other tools' custom-command setups (e.g., &lt;code&gt;CLAUDE.md&lt;/code&gt;).&lt;br&gt;&lt;br&gt;&lt;strong&gt;You use these commands to augment your implementation speed.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;Find their implementations in Appendix B.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add High-Level Context&lt;/strong&gt; (trigger &lt;code&gt;/project-context&lt;/code&gt; and include the relevant context in the same message as arguments to the command)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Project's README, architecture docs, package.json, config files, etc. Whatever you find important to your project from a high level.&lt;/li&gt;
&lt;li&gt;Overall system design and patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Research&lt;/strong&gt;: Use &lt;code&gt;/research-&amp;amp;-stop&lt;/code&gt; to let AI proactively search codebase patterns and relevant documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🤖&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;This will make the AI read through and summarize the basic project context, and how to do things. This is similar to onboarding a new human colleague.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Determine What You're Building&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Choose your development type early to load the right context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain/Business Logic&lt;/strong&gt;: Core behavior delivering business value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Implementation&lt;/strong&gt;: Infrastructure elements—adapters (Hexagonal), repositories/gateways (Clean/DDD), controllers (MVC)—plus integrations and initializations (see Appendix D)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Presentation/UI&lt;/strong&gt;: Visual styling, animations, audio (see Appendix D)&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Add Specification Context&lt;/strong&gt; (specific to your development type)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For Domain/Business&lt;/strong&gt;: User stories with BDD scenarios, PRD sections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Technical&lt;/strong&gt;: Technical tasks, NFRs, architecture decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Presentation&lt;/strong&gt;: Design specs, Figma files, style guides&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🤖&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;The AI is now fundamentally aligned with your development goals, whether creating business value, implementing technical infrastructure, or crafting user interfaces.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Add Relevant Code Context&lt;/strong&gt; (specific to your development type)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For Domain/Business&lt;/strong&gt;: Domain dependencies, tests, similar features, pure function utils for similar logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Technical&lt;/strong&gt;: Existing infrastructure elements, infrastructure patterns, utils, integration points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For Presentation&lt;/strong&gt;: Components, design system, CSS framework, presentation-related config files&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🤖&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Along with automated checks like linting and formatting, plus any personal AI IDE/CLI instructions you use, this step keeps the AI consistent with your codebase’s style and conventions. It also helps it technically understand how the various parts of the codebase depend on each other.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="stage-2-planning"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🤝 Stage 2: Planning (High-Level Approach)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6dm1rt2uecxrrlqacy7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6dm1rt2uecxrrlqacy7.jpg" alt="Stage 2 - Planning" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the AI agent now informed of your specific project's context from Stage 1, collaborate to understand the feature at a &lt;strong&gt;high level&lt;/strong&gt; before diving into implementation. This is &lt;em&gt;not&lt;/em&gt; about prescribing implementation details; those will emerge through TDD for domain and technical work, or through design implementation for presentation/UI work. Instead, it's about making sure you and the AI are &lt;strong&gt;aligned&lt;/strong&gt; on scope and approach.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;If you and the AI have different ideas of what is supposed to be built, using AI can often slow progress down rather than speed it up. This planning stage helps eliminate that issue.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Planning vs TDD Discovery
&lt;/h4&gt;

&lt;p&gt;For domain and technical work, the planning stage provides a roadmap of &lt;strong&gt;&lt;em&gt;what&lt;/em&gt;&lt;/strong&gt; to build and roughly which tests to write. TDD will still discover &lt;strong&gt;&lt;em&gt;how&lt;/em&gt;&lt;/strong&gt; to build it through the 🔴 &lt;strong&gt;Red&lt;/strong&gt; • 🟢 &lt;strong&gt;Green&lt;/strong&gt; • 🧼 &lt;strong&gt;Refactor&lt;/strong&gt; cycle. For presentation/UI work, planning outlines validation criteria rather than tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Planning IS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding which parts of the system are involved&lt;/li&gt;
&lt;li&gt;Creating a test roadmap (roughly what to test, and in what order) for domain/technical work&lt;/li&gt;
&lt;li&gt;Creating validation criteria for presentation/UI work&lt;/li&gt;
&lt;li&gt;Recognizing existing patterns to follow&lt;/li&gt;
&lt;li&gt;Mapping out the feature's boundaries and finding related key interfaces/ports&lt;/li&gt;
&lt;li&gt;Identifying external dependencies to mock (for testable work)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Planning IS NOT:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Designing specific classes or methods&lt;/li&gt;
&lt;li&gt;Defining data structures&lt;/li&gt;
&lt;li&gt;Prescribing implementation details&lt;/li&gt;
&lt;li&gt;Creating complete interfaces/ports up front&lt;/li&gt;
&lt;li&gt;Making architectural decisions tests haven't forced yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like navigation: Planning sets the destination, TDD finds the path.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Modifying Untested Code?&lt;/strong&gt; Changing untested code in an existing codebase requires strategies (characterization tests, finding seams, etc) outside this guide's scope. In that case, see books like &lt;a href="https://www.oreilly.com/library/view/working-effectively-with/0131177052/" rel="noopener noreferrer"&gt;Working Effectively with Legacy Code&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;Discuss the Feature&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Discuss and explore freely, as you would with a human&lt;/li&gt;
&lt;li&gt;Is everything crystal clear given the provided specifications? Does the AI have any questions?&lt;/li&gt;
&lt;li&gt;Share any constraints or technical considerations&lt;/li&gt;
&lt;li&gt;Explore potential approaches with the AI&lt;/li&gt;
&lt;li&gt;Clarify ambiguities; make sure the AI makes no wild assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2) &lt;strong&gt;Check for Additional Context&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask: "&lt;em&gt;Do you need any other context to understand the feature's scope and boundaries?&lt;/em&gt;"&lt;/li&gt;
&lt;li&gt;Provide any missing domain knowledge or system information&lt;/li&gt;
&lt;li&gt;Trigger &lt;code&gt;/research-&amp;amp;-stop&lt;/code&gt; for AI-driven investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;3) &lt;strong&gt;Request Appropriate Roadmap&lt;/strong&gt;&lt;br&gt;
   Based on your Stage 1 choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate a high-level roadmap before any coding&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;domain/business logic&lt;/strong&gt;: trigger &lt;code&gt;/ai-roadmap-template&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;technical implementation&lt;/strong&gt;: trigger &lt;code&gt;/ai-technical-roadmap-template&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For presentation/UI&lt;/strong&gt;: trigger &lt;code&gt;/ai-presentation-roadmap-template&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Focus on test/validation scenarios and their logical sequence&lt;/li&gt;
&lt;li&gt;Keep at "mermaid diagram" level of abstraction&lt;/li&gt;
&lt;li&gt;An actual mermaid diagram can be generated if applicable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4) &lt;strong&gt;Review and Refine Roadmap&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review the roadmap to make sure it addresses &lt;strong&gt;every&lt;/strong&gt; specification (business, technical, or presentation requirements)&lt;/li&gt;
&lt;li&gt;Use it to ensure you and the AI agent are aligned&lt;/li&gt;
&lt;li&gt;Make sure it respects existing project patterns and boundaries&lt;/li&gt;
&lt;li&gt;For domain/technical: Verify the test sequence builds incrementally from simple to complex&lt;/li&gt;
&lt;li&gt;For presentation: Verify validation criteria are clear and measurable&lt;/li&gt;
&lt;li&gt;Iterate with the AI if adjustments are needed&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;☝️ &lt;strong&gt;Note on task lists&lt;/strong&gt;: Many other AI workflows (such as &lt;a href="https://github.com/eyaltoledano/claude-task-master" rel="noopener noreferrer"&gt;Task Master&lt;/a&gt;) generate "task lists" with checkboxes in the planning stage. The AI then checks items off as "done" as it goes, at its own discretion. But how can you &lt;strong&gt;trust&lt;/strong&gt; the AI's judgment about when something is actually done?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpebpttmob0s0exbxvju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzpebpttmob0s0exbxvju.png" alt="Task Master - Tasks" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition, with such checkboxes, you must manually re-verify everything after future code changes, to prevent &lt;strong&gt;regressions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's why you don't use checkbox/task-planning in &lt;code&gt;AAID&lt;/code&gt;. Instead, for domain and technical work, you express completion criteria as good old &lt;strong&gt;automated tests&lt;/strong&gt;. Tests aren't added as an afterthought; they're treated as first-class citizens.&lt;/p&gt;

&lt;p&gt;Automated tests = &lt;strong&gt;objective&lt;/strong&gt; and &lt;strong&gt;re-runnable&lt;/strong&gt; verification, eliminating both of the aforementioned problems: &lt;strong&gt;trust&lt;/strong&gt; and &lt;strong&gt;regression&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
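To make the contrast concrete, here is a minimal sketch of a checkbox criterion re-expressed as an automated test. The `archiveTodo` function and its shape are hypothetical stand-ins, not part of the guide's actual example code:

```typescript
// Checkbox version: "[x] User can archive completed todos" records a
// one-time, subjective judgment that nothing re-verifies after future
// code changes.
//
// Test version: the same completion criterion as an objective,
// re-runnable check. archiveTodo below is a hypothetical stand-in.
type Todo = { text: string; completed: boolean; archived: boolean }

const archiveTodo = (todo: Todo): Todo =>
    todo.completed ? { ...todo, archived: true } : todo

// Re-runnable verification: fails loudly if a regression breaks the behavior
const archived = archiveTodo({ text: "Buy groceries", completed: true, archived: false })
if (!archived.archived) throw new Error("completed todo should be archived")

const untouched = archiveTodo({ text: "Walk the dog", completed: false, archived: false })
if (untouched.archived) throw new Error("incomplete todo should not be archived")
```

Unlike a ticked checkbox, these checks re-run on every future change, for free.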

&lt;p&gt;If the roadmap looks good, now is when disciplined development actually starts!&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🔀&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Path Divergence&lt;/strong&gt;: After roadmap approval, the workflow splits into three paths:&lt;br&gt;&lt;br&gt;• &lt;strong&gt;Domain/Business Logic&lt;/strong&gt; → Continue to Stage 3 (TDD Development)&lt;br&gt;• &lt;strong&gt;Technical Implementation (Non-Observable)&lt;/strong&gt; → Continue to Stage 3 (TDD Development)&lt;br&gt;• &lt;strong&gt;Presentation/UI (Observable Technical)&lt;/strong&gt; → Proceed to implementation and validation without TDD&lt;br&gt;&lt;br&gt;See Appendix D or the &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/aaid-workflow-diagram.mermaid" rel="noopener noreferrer"&gt;Workflow Diagram&lt;/a&gt; for more information on these three implementation categories.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;💻&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Important for Frontend Developers&lt;/strong&gt;: &lt;code&gt;AAID&lt;/code&gt; absolutely applies to frontend development! Frontend behavioral logic (form validation, state management, data transformations, etc) uses TDD just like backend. Only pure presentation/UI aspects (colors, audio, spacing, animations, some accessibility concerns) skip TDD for manual validation. See Appendix D for detailed examples.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="stage-3-tdd-begins"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  📝 Stage 3: TDD Development Begins
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwpbh6rch5u0qfigqd4t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwwpbh6rch5u0qfigqd4t.jpg" alt="Stage 3 - TDD Development Begins" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose one of these two approaches for implementing your tests when starting work on a new feature:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Test Ordering with ZOMBIES&lt;/strong&gt;: Whichever approach you choose, order your tests following James Grenning's ZOMBIES heuristic. &lt;strong&gt;Z&lt;/strong&gt;ero → &lt;strong&gt;O&lt;/strong&gt;ne → &lt;strong&gt;M&lt;/strong&gt;any is the happy path; after each step, interleave applicable &lt;strong&gt;B&lt;/strong&gt;oundaries, &lt;strong&gt;I&lt;/strong&gt;nterface, and &lt;strong&gt;E&lt;/strong&gt;xceptions before moving to the next. Keep both &lt;strong&gt;S&lt;/strong&gt;cenarios and solutions simple throughout. &lt;a href="https://blog.wingman-sw.com/tdd-guided-by-zombies" rel="noopener noreferrer"&gt;Link&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
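For the todo feature used in this guide, a ZOMBIES-ordered test list might look like the following sketch. The test names are hypothetical, and the inline `describe`/`it.skip` stubs exist only to make the sketch self-contained; a real project would use its test framework's own versions:

```typescript
// Minimal stand-ins for a test framework's describe/it.skip so this
// sketch runs standalone; a real project would use Vitest or Jest.
const plannedTests: string[] = []
const describe = (_name: string, fn: () => void) => fn()
const it = { skip: (name: string) => { plannedTests.push(name) } }

describe("User archives completed todos (ZOMBIES-ordered)", () => {
    it.skip("Zero: shows no archived todos for an empty list")
    it.skip("One: archives a single completed todo")
    it.skip("Boundary: does not archive an incomplete todo")
    it.skip("Many: archives every completed todo in a mixed list")
    it.skip("Exception: reports an error when archiving an unknown todo")
})

console.log(plannedTests.length) // 5 planned tests
```

Zero → One → Many forms the happy path, while the Boundary and Exception cases are interleaved right after the step they stress.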

&lt;p&gt;&lt;strong&gt;Option 1: Test List Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Collaborate with the AI to create a list of (unimplemented) tests derived from the specs, breaking down each behavior into granular, testable steps.&lt;/p&gt;

&lt;p&gt;Use the Roadmap from "&lt;strong&gt;Stage 2: Planning&lt;/strong&gt;" directly or as inspiration for the test list.&lt;/p&gt;

&lt;p&gt;The test list is a living document. Following Kent Beck's TDD approach, this list isn't carved in stone. Add new tests as you discover new edge cases, remove tests that become redundant, and modify tests as your understanding evolves. The list is a tool to guide development, not a contract you must fulfill exactly as written.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This TDD practice is in contrast to acceptance testing, where tests &lt;strong&gt;must&lt;/strong&gt; map 1:1 to the project specs (usually BDD scenarios).&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User archives completed todos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should archive a completed todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should not archive an incomplete todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should restore an archived todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;It is extremely important that the tests are not yet implemented at this stage. This is because TDD's iterative cycle prevents you from baking implementation assumptions into your tests. Writing all tests upfront risks testing your preconceptions rather than actual behavior requirements.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Single Test Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with the simplest test and then build incrementally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User archives completed todos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should archive a completed todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// To be implemented&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a id="stage-4-tdd-cycle"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 Stage 4: The TDD Cycle
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frslbh63lptf0iqw47kam.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frslbh63lptf0iqw47kam.jpg" alt="Stage 4 - TDD Cycle" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🤖&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;The Three Laws of TDD&lt;/strong&gt;: The reusable TDD commands (&lt;code&gt;/red-&amp;amp;-stop&lt;/code&gt;, &lt;code&gt;/green-&amp;amp;-stop&lt;/code&gt;, &lt;code&gt;/refactor-&amp;amp;-stop&lt;/code&gt;) enforce Robert C. Martin's Three Laws of TDD through the disciplined &lt;strong&gt;RED&lt;/strong&gt; - &lt;strong&gt;GREEN&lt;/strong&gt; - &lt;strong&gt;REFACTOR&lt;/strong&gt; cycle:&lt;br&gt;&lt;br&gt;• &lt;strong&gt;RED&lt;/strong&gt;: Write a minimal failing test (enforces Laws 1 &amp;amp; 2: no production code without failing test; minimal test to fail)&lt;br&gt;• &lt;strong&gt;GREEN&lt;/strong&gt;: Write the simplest code to pass (enforces Law 3: minimal production code to pass)&lt;br&gt;• &lt;strong&gt;REFACTOR&lt;/strong&gt;: Improve code while keeping tests green&lt;br&gt;&lt;br&gt;In practice, the &lt;code&gt;AAID&lt;/code&gt; rules file often handles phase discipline automatically, but these commands offer explicit control when needed. Re-issue with feedback to guide the AI.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For each test, follow this disciplined 3-phase cycle:&lt;/p&gt;

&lt;p&gt;🔴 &lt;strong&gt;RED Phase&lt;/strong&gt; →&lt;br&gt;
🟢 &lt;strong&gt;GREEN Phase&lt;/strong&gt; →&lt;br&gt;
🧼 &lt;strong&gt;REFACTOR Phase&lt;/strong&gt; →&lt;br&gt;
&lt;strong&gt;Next test&lt;/strong&gt; → &lt;em&gt;(cycle repeats)&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Each phase follows the same internal pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Collaborate and generate with AI&lt;/strong&gt; ¹&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Run tests&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle potential issues&lt;/strong&gt; &lt;em&gt;(if any arise)&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;/analyze-&amp;amp;-stop&lt;/code&gt; or other investigation &amp;amp; problem solving commands as needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;AWAIT USER REVIEW&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's walk through a full TDD cycle using this consistent structure.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;¹ 🦾 &lt;strong&gt;Proficiency Note&lt;/strong&gt;: As you master &lt;code&gt;AAID&lt;/code&gt;, the initial "collaborate" step often becomes autonomous AI generation using your established commands and context. This speeds up the workflow considerably. You might simply invoke &lt;code&gt;/red-&amp;amp;-stop&lt;/code&gt; and let the AI generate appropriate code, then focus your attention on the &lt;code&gt;AWAIT USER REVIEW&lt;/code&gt; checkpoints. This dual-review structure (light collaboration + formal review) is what enables both speed and control.&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;strong&gt;User Story Specification:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's use this simple spec as a basis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="err"&gt;Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;User&lt;/span&gt; &lt;span class="err"&gt;adds&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;new todo

&lt;span class="err"&gt;User Story&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;As&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;user, I want to add a new todo to my list, so that I can keep track of my tasks.

&lt;span class="err"&gt;Acceptance Criteria&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Add a new todo

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Add a new active todo
  &lt;span class="nf"&gt;Given &lt;/span&gt;the user has an empty todo list
  &lt;span class="nf"&gt;When &lt;/span&gt;they add a new todo &lt;span class="s"&gt;"Buy groceries"&lt;/span&gt;
  &lt;span class="nf"&gt;Then &lt;/span&gt;&lt;span class="s"&gt;"Buy groceries"&lt;/span&gt; should be in their active todos
  &lt;span class="nf"&gt;And &lt;/span&gt;the todo should not be completed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Unit tests build incrementally&lt;/strong&gt;, testing one behavior at a time. Since they target fine-grained technical correctness and edge cases, unit tests don't always need to map 1:1 to acceptance criteria; that's the acceptance test's job.&lt;br&gt;&lt;br&gt;More on this distinction in Appendix A: Unit Testing and Acceptance Testing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  🔴 RED Phase
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;→ Collaborate with AI to write test&lt;/strong&gt; (&lt;code&gt;/red-&amp;amp;-stop&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Un-skip the first test if using test list&lt;/li&gt;
&lt;li&gt;Or write the first test from scratch if using single test approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Run test and verify failure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should fail as expected (compilation failures count as valid test failures)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Handle potential issues&lt;/strong&gt; &lt;em&gt;(if any arise)&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If test passes unexpectedly: AI stops and reports the issue&lt;/li&gt;
&lt;li&gt;Choose investigation approach (often using investigation &amp;amp; problem solving commands like &lt;code&gt;/analyze-&amp;amp;-stop&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AI implements your chosen fix, then stops for review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example RED phase prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/red-&amp;amp;-stop

// link/paste the business specification, e.g. the BDD scenario
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Because of the context that has been provided in the previous steps, the prompt often doesn't have to be longer than this.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated test:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// todo.service.test.ts&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;addTodo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should add a todo with the correct text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// When&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;addTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy groceries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// Fails: 'addTodo' is not defined&lt;/span&gt;

    &lt;span class="c1"&gt;// Then&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy groceries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;⏸️ &lt;strong&gt;STOP: AWAIT USER REVIEW&lt;/strong&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agent must &lt;code&gt;AWAIT USER REVIEW&lt;/code&gt; before proceeding to GREEN.&lt;br&gt;&lt;br&gt;&lt;strong&gt;During RED phase review, evaluate:&lt;/strong&gt;&lt;br&gt;🔴 Tests behavior (what the system does), not implementation (how it does it)&lt;br&gt;🔴 In the test phase you design the API of what you are building: its user interface. So, does it feel nice to use?&lt;br&gt;🔴 Is the test hard to understand or set up? That could be a sign you need to rethink your approach. &lt;strong&gt;Clean code starts with a clean test&lt;/strong&gt;&lt;br&gt;🔴 Clear test name describing the requirement&lt;br&gt;🔴 Proper Given/When/Then structure&lt;br&gt;🔴 Mock external dependencies to isolate the unit; the test should run in milliseconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Optional: example RED Phase follow-up prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/red-&amp;amp;-stop

- Create todo service class instead of function
- Inject repository
- Update test to check "completed" attribute only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Follow-ups like these are often unnecessary thanks to Stage 1.4 (Add Relevant Code Context) and Stage 2.3 (Request Appropriate Roadmap).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test after RED review:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// todo.service.test.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Both imports will fail - files don't exist yet (compilation failure = valid test failure)&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TodoService&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./todo.service&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./interfaces/todo.interface&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TodoService&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should add a todo with completed set to false&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Given&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mockRepository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="c1"&gt;// Start minimal - no API assumptions yet&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TodoService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockRepository&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// When&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy groceries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Then&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Note: You're testing ONE behavior. The repository.save()&lt;/span&gt;
    &lt;span class="c1"&gt;// will be forced by a future test, not this one.&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  🟢 GREEN Phase
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;→ Collaborate with AI to write code&lt;/strong&gt; (&lt;code&gt;/green-&amp;amp;-stop&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write the simplest code to make the test pass&lt;/li&gt;
&lt;li&gt;Keep implementation naïve/hardcoded until tests "&lt;a href="https://tdd.mooc.fi/1-tdd/#triangulation" rel="noopener noreferrer"&gt;triangulate&lt;/a&gt;" (multiple tests force abstraction/generalization)&lt;/li&gt;
&lt;li&gt;No extra logic for untested scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Run tests to verify success&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current test should pass&lt;/li&gt;
&lt;li&gt;All other existing tests still pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Handle potential issues&lt;/strong&gt; &lt;em&gt;(if any arise)&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If tests fail: AI stops and reports which ones failed&lt;/li&gt;
&lt;li&gt;Choose debugging approach (often using investigation &amp;amp; problem solving commands like &lt;code&gt;/debug-&amp;amp;-stop&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AI implements your chosen solution, then stops for review&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Why simplest first?&lt;/strong&gt; One test can only verify one thing, so complex code means untested parts. If your over-engineered solution breaks, you're debugging the test failure AND untested logic simultaneously. Simple code gets you stable fast and forces each new feature to get its own test, keeping everything verified.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example GREEN phase prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/green-&amp;amp;-stop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// interfaces/todo.interface.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// todo.service.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./interfaces/todo.interface&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="c1"&gt;// 'unknown' is fine - no test demands otherwise&lt;/span&gt;

  &lt;span class="nf"&gt;addTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Simplest naïve/hardcoded implementation to pass the test&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;⏸️ &lt;strong&gt;STOP: AWAIT USER REVIEW&lt;/strong&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agent must &lt;code&gt;AWAIT USER REVIEW&lt;/code&gt; before proceeding to REFACTOR.&lt;br&gt;&lt;br&gt;&lt;strong&gt;During GREEN phase review, evaluate:&lt;/strong&gt;&lt;br&gt;🟢 The code is the simplest possible solution to make the test pass&lt;br&gt;🟢 If tests triangulate (multiple examples reveal a pattern), verify code generalizes&lt;br&gt;🟢 No unnecessary abstractions or future-proofing that the tests haven't demanded&lt;br&gt;🟢 Code structure follows project patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
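To see triangulation in action, consider a hypothetical second RED test that asserts the todo's text. The hardcoded `text: ""` from the GREEN step can no longer satisfy both tests, forcing the generalization sketched below (the repository from the main example is omitted here for brevity):

```typescript
// Triangulation sketch: the first test only checked `completed`, so a
// hardcoded `text: ""` was enough. A second test asserting the text
// forces the implementation to generalize and use the parameter.
// (Hypothetical simplification; the repository is omitted for brevity.)
interface Todo {
    text: string
    completed: boolean
}

class TodoService {
    addTodo(text: string): Todo {
        // Two tests with different texts can no longer be satisfied by a
        // hardcoded value, so the text parameter must flow through.
        return { text, completed: false }
    }
}

const service = new TodoService()
if (service.addTodo("Buy groceries").text !== "Buy groceries")
    throw new Error("should keep the todo text")
if (service.addTodo("Walk the dog").completed !== false)
    throw new Error("new todos should start uncompleted")
```

This is why GREEN stays naïve: each generalization is earned by a test, never assumed.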




&lt;h3&gt;
  
  
  🧼 REFACTOR Phase
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;❕&lt;/th&gt;
&lt;th&gt;This phase is one of the main reasons developers won't be replaced any time soon&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;→ Collaborate with AI to refactor&lt;/strong&gt; (&lt;code&gt;/refactor-&amp;amp;-stop&lt;/code&gt;)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collaborate to enhance code while keeping tests green&lt;/li&gt;
&lt;li&gt;Apply patterns that improve current code quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Run tests to verify stability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No regressions introduced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;→ Handle potential issues&lt;/strong&gt; &lt;em&gt;(if any arise)&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If refactoring breaks tests: AI stops and reports what failed&lt;/li&gt;
&lt;li&gt;Review failure and choose: fix the issue or revert to green state&lt;/li&gt;
&lt;li&gt;Use investigation &amp;amp; problem solving commands like &lt;code&gt;/minimal-fix-&amp;amp;-analyze-&amp;amp;-stop&lt;/code&gt; for fixes&lt;/li&gt;
&lt;li&gt;Revert when the refactoring approach itself is flawed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example REFACTOR phase prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@refactor-&amp;amp;-stop

// Note to reader: The AI may suggest minimal refactoring since you only have one test.
// But you can guide it toward patterns that improve current code quality.

- Extract Todo model class
- Put interface in same file
- Model should be immutable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated refactored code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// models/todo.model.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoModel&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// For now just handles completed status&lt;/span&gt;
    &lt;span class="c1"&gt;// Future tests will force us to handle text properly&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TodoModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// services/todo.service.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TodoModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../models/todo.model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="nf"&gt;addTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Still ignoring text parameter - test doesn't check it yet&lt;/span&gt;
    &lt;span class="c1"&gt;// Repository still unused - no test requires persistence yet&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;TodoModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;⏸️ &lt;strong&gt;STOP: AWAIT USER REVIEW&lt;/strong&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI agent must &lt;code&gt;AWAIT USER REVIEW&lt;/code&gt; before proceeding to next test. Final overall review opportunity.&lt;br&gt;&lt;br&gt;&lt;strong&gt;During REFACTOR phase final review, evaluate:&lt;/strong&gt;&lt;br&gt;🧼 Apply your engineering expertise to assure quality&lt;br&gt;🧼 Focus on fundamentals: modularity, abstraction, cohesion, separation of concerns, coupling management, readability, testability&lt;br&gt;🧼 Remove unnecessary comments, logs, debugging code&lt;br&gt;🧼 Consider potential security vulnerabilities&lt;br&gt;🧼 Optional: Conduct manual user testing for what you've built. Check the "&lt;em&gt;feel&lt;/em&gt;"—only humans can do that!—and UX&lt;br&gt;🧼 Optional: Run AI bug finder for additional safety&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Optional: example REFACTOR phase follow-up prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@refactor-&amp;amp;-stop

- Remove all comments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Often these prompts aren't needed due to the AI workflow instructions and context provided earlier.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code after REFACTOR review:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// services/todo.service.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TodoModel&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../models/todo.model&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TodoService&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;repository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="nf"&gt;addTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Todo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;TodoModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Congratulations&lt;/em&gt;,&lt;/strong&gt; you made it through all the &lt;code&gt;AAID&lt;/code&gt; steps! While the workflow might seem overwhelming at first, with practice it becomes habit and your speed increases accordingly.&lt;/p&gt;

&lt;p&gt;&lt;a id="continuing-tdd-cycle"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Continuing Stage 4: The TDD Cycle
&lt;/h2&gt;

&lt;p&gt;After completing the first cycle, you'd repeat the process with the next test that forces the code to evolve:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second cycle&lt;/strong&gt; might test: &lt;code&gt;'should create todo with provided text'&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forces: &lt;code&gt;return { text, completed: false }&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Third cycle&lt;/strong&gt; might test: &lt;code&gt;'should persist new todos'&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forces: A repository interface to define the persistence contract, replacing the &lt;code&gt;unknown&lt;/code&gt; type&lt;/li&gt;
&lt;li&gt;Forces: Repository to have a &lt;code&gt;save&lt;/code&gt; method&lt;/li&gt;
&lt;li&gt;Forces: &lt;code&gt;this.repository.save({ text, completed: false })&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fourth cycle&lt;/strong&gt; might test: &lt;code&gt;'should be able to find the persisted todo after creating it'&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Forces: &lt;code&gt;Repository.save&lt;/code&gt; to return identifying information (an ID)&lt;/li&gt;
&lt;/ul&gt;
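
&lt;p&gt;To make the progression concrete, here is a rough sketch of where the service and repository might land after the second and third cycles. This is an illustration, not code from the article: the &lt;code&gt;TodoRepository&lt;/code&gt; interface and the in-memory fake are hypothetical names.&lt;/p&gt;

```typescript
// Hypothetical sketch: the service after the second and third cycles.
// TodoRepository and InMemoryTodoRepository are illustrative names.

interface Todo {
  text: string
  completed: boolean
}

interface TodoRepository {
  // Forced by 'should persist new todos'
  save(todo: Todo): void
}

class TodoService {
  constructor(private readonly repository: TodoRepository) {}

  addTodo(text: string): Todo {
    // 'should create todo with provided text' forced the real text;
    // 'should persist new todos' forced the save call
    const todo: Todo = { text, completed: false }
    this.repository.save(todo)
    return todo
  }
}

// Minimal in-memory fake, useful for exercising the sketch
class InMemoryTodoRepository implements TodoRepository {
  readonly saved: Todo[] = []

  save(todo: Todo): void {
    this.saved.push(todo)
  }
}
```

&lt;p&gt;Each line here still exists only because one of the tests above demanded it.&lt;/p&gt;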

&lt;p&gt;Each cycle follows the same disciplined flow:&lt;/p&gt;

&lt;p&gt;🔴 &lt;strong&gt;RED&lt;/strong&gt; →&lt;br&gt;
⏸️ &lt;strong&gt;Review&lt;/strong&gt; →&lt;br&gt;
🟢 &lt;strong&gt;GREEN&lt;/strong&gt; →&lt;br&gt;
⏸️ &lt;strong&gt;Review&lt;/strong&gt; →&lt;br&gt;
🧼 &lt;strong&gt;REFACTOR&lt;/strong&gt; →&lt;br&gt;
⏸️ &lt;strong&gt;Final review&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tests gradually shape the implementation, ensuring every line of production code exists only because a test demanded it. This eliminates dead code and hidden bugs: if it's not tested, it doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;a id="conclusion"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion: The Augmented Advantage
&lt;/h2&gt;

&lt;p&gt;Your bottleneck changes with &lt;code&gt;AAID&lt;/code&gt;. Instead of being stuck on implementation details, you're now constrained only by your ability to architect and review.&lt;/p&gt;

&lt;p&gt;The work becomes more strategic. You make the high-level decisions while AI handles the code generation. TDD keeps this relationship stable by forcing you to define exactly what you want before the AI builds it.&lt;/p&gt;

&lt;p&gt;This completely avoids the dangers of vibe coding. &lt;code&gt;AAID&lt;/code&gt; helps you as a professional ship quality software with full understanding of what you've built.&lt;/p&gt;

&lt;p&gt;And as the &lt;code&gt;AAID&lt;/code&gt; loop becomes muscle memory, you will catch regressions early and ship faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the augmented advantage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="example-implementation"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Example Implementation
&lt;/h2&gt;

&lt;p&gt;For a concrete example of code generated with &lt;code&gt;AAID&lt;/code&gt;, explore this &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development-demo" rel="noopener noreferrer"&gt;TicTacToe CLI demo&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;100% of the code was generated by an AI agent.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It demonstrates a minimal hexagonal architecture with clear separation between domain logic and adapters, following &lt;code&gt;AAID&lt;/code&gt; principles.&lt;/p&gt;

&lt;p&gt;Comprehensive test coverage is also included as a consequence of TDD: both unit tests and BDD-style acceptance tests, mapped directly from the specs.&lt;/p&gt;



&lt;p&gt;&lt;a id="appendices"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;h3&gt;
  
  
  End of Guide
&lt;/h3&gt;

&lt;p&gt;You've reached the end of the &lt;code&gt;AAID&lt;/code&gt; guide. The appendices below are optional reference material you can dip into as needed.&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;&lt;a id="appendix-a"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Appendix A: Acceptance Testing
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9owndu41m0c7vdstgzra.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9owndu41m0c7vdstgzra.webp" alt="Appendix A" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article on &lt;code&gt;AAID&lt;/code&gt; focuses on TDD (Test-Driven Development) for &lt;strong&gt;Unit Testing&lt;/strong&gt;, which ensures you actually write your code correctly and with high quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acceptance Testing&lt;/strong&gt;, on the other hand, verifies that your software aligns with business goals and is actually &lt;em&gt;done&lt;/em&gt;. It serves as an executable definition-of-done.&lt;/p&gt;

&lt;p&gt;Understanding how these two testing strategies complement each other is crucial for professional developers, as both are invaluable parts of writing production-grade software.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Acceptance Testing is similar to E2E testing; both test the full app flow through the system boundaries.&lt;br&gt;&lt;br&gt;The key difference: AT mocks unmanaged external dependencies you don't control (third-party APIs, etc.) while keeping managed dependencies you do control (your database, etc.) real. E2E usually mocks nothing and runs everything together.&lt;br&gt;&lt;br&gt;The problem with E2E: tests fail due to external factors (third-party outages, network issues) rather than your code. Acceptance Testing isolates your system, so failures indicate real business logic problems, or technical issues you are responsible for.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
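
&lt;p&gt;As a sketch of this isolation strategy (with hypothetical names; &lt;code&gt;PaymentGateway&lt;/code&gt; and its stub are not from the article), an acceptance-test setup might replace only the unmanaged third-party dependency:&lt;/p&gt;

```typescript
// Illustrative sketch of AT isolation: stub the unmanaged third-party
// dependency so its outages can't fail the test. All names are hypothetical.

interface PaymentGateway {
  charge(amountCents: number): { ok: boolean }
}

// Stands in for the real third-party API we don't control
class StubPaymentGateway implements PaymentGateway {
  charge(amountCents: number): { ok: boolean } {
    return { ok: true } // deterministic: no network, no outages
  }
}

class CheckoutService {
  constructor(private readonly gateway: PaymentGateway) {}

  checkout(amountCents: number): string {
    const result = this.gateway.charge(amountCents)
    return result.ok ? "confirmed" : "declined"
  }
}
```

&lt;p&gt;Your own database, being a managed dependency, would stay real in the acceptance-test environment; only the gateway is swapped out.&lt;/p&gt;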

&lt;p&gt;The two kinds of tests answer different questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TDD (Unit Tests)&lt;/strong&gt;: "&lt;em&gt;Is my code technically correct?&lt;/em&gt;"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ATDD (Acceptance Tests)&lt;/strong&gt;: "&lt;em&gt;Is my system releasable after this change?&lt;/em&gt;"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So in short: TDD builds the solution, Acceptance Tests confirm it's the right solution.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Differences
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Unit Tests (TDD)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer: "&lt;em&gt;Is my code technically correct?&lt;/em&gt;"&lt;/li&gt;
&lt;li&gt;Fine-grained, developer-focused testing&lt;/li&gt;
&lt;li&gt;Mock all external dependencies

&lt;ul&gt;
&lt;li&gt;See Appendix E for dependency categories&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Test suite should run in seconds to tens of seconds&lt;/li&gt;
&lt;li&gt;Apply design pressure through testability&lt;/li&gt;
&lt;li&gt;May, but does not necessarily, map 1:1 to user stories/acceptance criteria&lt;/li&gt;
&lt;li&gt;Guide code quality and modularity&lt;/li&gt;
&lt;li&gt;Part of the fast feedback loop in CI/CD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example of what a unit test looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TodoService&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should archive a completed todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Given&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completedTodo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;todo-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy milk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;mockTodoRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;findById&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mockResolvedValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;completedTodo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// When&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;archiveTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;todo-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Then&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isOk&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockTodoRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;moveToArchive&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;completedTodo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockTodoRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;removeFromActive&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;todo-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Acceptance Tests (ATDD/BDD)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer: "&lt;em&gt;Does the system meet business requirements?&lt;/em&gt;"&lt;/li&gt;
&lt;li&gt;Business specification validation through user-visible features&lt;/li&gt;
&lt;li&gt;Test in a production-like environment through system boundaries&lt;/li&gt;
&lt;li&gt;Mock unmanaged external dependencies (like third-party APIs)

&lt;ul&gt;
&lt;li&gt;Don't mock managed external dependencies (like app's database)&lt;/li&gt;
&lt;li&gt;See Appendix E for dependency categories&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Test suite will run slower than unit tests&lt;/li&gt;

&lt;li&gt;Maps 1:1 to user stories/acceptance criteria&lt;/li&gt;

&lt;li&gt;Verify the system is ready for release&lt;/li&gt;

&lt;li&gt;Stakeholder-focused (though developers + AI implement)&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Example of what an acceptance test looks like (using the &lt;a href="https://dojoconsortium.org/assets/ATDD%20-%20How%20to%20Guide.pdf" rel="noopener noreferrer"&gt;Four-Layer&lt;/a&gt; model pioneered by &lt;a href="https://courses.cd.training/" rel="noopener noreferrer"&gt;Dave Farley&lt;/a&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Executable Specification&lt;/td&gt;
&lt;td&gt;The test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Domain-Specific Language (DSL)&lt;/td&gt;
&lt;td&gt;Business vocabulary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Driver&lt;/td&gt;
&lt;td&gt;Bridge between DSL and SUT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. System Under Test (SUT)&lt;/td&gt;
&lt;td&gt;Production-like application environment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;todo&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../dsl&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User archives completed todos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should archive a completed todo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Given&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWithNewAccount&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasCompletedTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy milk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// When&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy milk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Then&lt;/span&gt;
    &lt;span class="nx"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;confirmInArchive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy milk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;confirmNotInActive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Buy milk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Acceptance tests know nothing about how our app works internally. Even if the app changes its technical implementation details, this specification (test) will remain valid.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In acceptance tests, every DSL call follows the same flow: &lt;strong&gt;Test → DSL → Driver → SUT&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The DSL provides business vocabulary (like &lt;code&gt;user&lt;/code&gt; or &lt;code&gt;archive todo&lt;/code&gt;), while the driver &lt;strong&gt;connects to your SUT from the outside (through APIs, UI, or other entry points)&lt;/strong&gt;. This separation keeps tests readable and maintainable.&lt;/p&gt;

&lt;p&gt;Notice how unit tests directly test the class with mocks, while acceptance tests use this DSL layer to express tests in business terms.&lt;/p&gt;
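
&lt;p&gt;A minimal sketch of that delegation, with hypothetical names (a real driver would reach the SUT from the outside, over an API or UI, rather than in memory):&lt;/p&gt;

```typescript
// Illustrative sketch of the DSL and driver layers; all names are
// hypothetical. A real driver would talk to the SUT from the outside.

// Driver: knows how to reach the system under test. Here a trivial
// in-memory stand-in replaces a real API or UI driver.
class TodoDriver {
  private active: string[] = []
  private archived: string[] = []

  createCompletedTodo(title: string): void {
    this.active.push(title)
  }

  archiveTodo(title: string): void {
    this.active = this.active.filter(function (t) {
      return t !== title
    })
    this.archived.push(title)
  }

  isInArchive(title: string): boolean {
    return this.archived.includes(title)
  }

  isInActive(title: string): boolean {
    return this.active.includes(title)
  }
}

// DSL: business vocabulary that delegates to the driver,
// keeping the executable specification readable.
class TodoDsl {
  constructor(private readonly driver: TodoDriver) {}

  hasCompletedTodo(title: string): void {
    this.driver.createCompletedTodo(title)
  }

  archive(title: string): void {
    this.driver.archiveTodo(title)
  }

  confirmInArchive(title: string): boolean {
    return this.driver.isInArchive(title)
  }

  confirmNotInActive(title: string): boolean {
    return !this.driver.isInActive(title)
  }
}
```

&lt;p&gt;The executable specification only ever calls the DSL; swapping the driver (CLI, HTTP, UI) leaves the specification untouched.&lt;/p&gt;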

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;🔌&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Note on Integration Testing&lt;/strong&gt;: While this guide focuses on unit testing through TDD, &lt;code&gt;AAID&lt;/code&gt; also applies to integration testing. Integration tests verify a single infrastructure element's technical contract by testing it with only its immediate managed dependency (e.g., a repository adapter with real database). Unmanaged dependencies are mocked. See &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-e/dependencies-and-mocking.md" rel="noopener noreferrer"&gt;Appendix E&lt;/a&gt; for complete dependency handling guidelines.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In &lt;code&gt;AAID&lt;/code&gt;, AI helps you rapidly write unit tests and implementations. Knowing the difference between unit and acceptance testing prevents you from mistaking 'technically correct code' for 'done features,' a crucial distinction in professional development.&lt;/p&gt;

&lt;h3&gt;
  
  
  AAID Acceptance Testing Resources
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Companion article covering the full AT workflow&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-a/docs/aaid-acceptance-testing-workflow.md" rel="noopener noreferrer"&gt;AAID Acceptance Testing Workflow&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual workflow diagram of the &lt;code&gt;AAID&lt;/code&gt; three-phase AT cycle (Mermaid)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-a/aaid-at-workflow.diagram.mermaid" rel="noopener noreferrer"&gt;AAID AT graph&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rule file to enable &lt;code&gt;AAID&lt;/code&gt; AT mode in a project&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-a/rules/aaid-at/acceptance-testing-mode.mdc" rel="noopener noreferrer"&gt;Acceptance Testing Mode&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Demo of executable specifications used in practice&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development-demo/blob/main/acceptance-test/executable-specs/cli.acceptance.spec.ts" rel="noopener noreferrer"&gt;TicTacToe executable specifications&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="appendix-b"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix B: Helpful Commands (Reusable Prompts)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqosqtt4q31uuf452l0ds.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqosqtt4q31uuf452l0ds.webp" alt="Appendix B" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These reusable prompt commands speed up your &lt;code&gt;AAID&lt;/code&gt; workflow.&lt;/p&gt;

&lt;p&gt;&lt;a id="appendix-b-setup-commands"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup &amp;amp; Planning Commands
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used in Stage &lt;strong&gt;1: Context Providing&lt;/strong&gt; and &lt;strong&gt;Stage 2: Planning&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/project-context&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Establishes comprehensive project understanding with architecture, testing strategy, code style, etc&lt;br&gt;&lt;br&gt; &lt;em&gt;&lt;strong&gt;Note on context&lt;/strong&gt;: Since Commands in Cursor cannot currently directly reference files with &lt;code&gt;@&lt;/code&gt; symbols inside the command files themselves, you'll need to include any necessary context when invoking the command. For example:&lt;/em&gt; &lt;code&gt;/project-context @README.md @docs/architecture.md&lt;/code&gt;. &lt;em&gt;The command will then operate on the provided context.&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Stage 1&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/planning/project-context.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ai-roadmap-template&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates high-level roadmap for domain/business logic features that guides TDD without prescribing implementation&lt;/td&gt;
&lt;td&gt;Stage 2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/planning/ai-roadmap-template.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ai-technical-roadmap-template&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates roadmap for technical implementation (infrastructure elements: adapters, repositories, controllers, etc.) - see Appendix D
&lt;/td&gt;
&lt;td&gt;Stage 2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/planning/ai-technical-roadmap-template.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ai-presentation-roadmap-template&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates roadmap for observable technical elements (pure UI/sensory) - see Appendix D
&lt;/td&gt;
&lt;td&gt;Stage 2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/planning/ai-presentation-roadmap-template.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/ai-acceptance-roadmap-template&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates strategic roadmap for acceptance testing with isolation strategy - see Appendix A
&lt;/td&gt;
&lt;td&gt;Stage 2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/acceptance/ai-acceptance-roadmap-template.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Planning Tools&lt;/strong&gt;: Some tools have dedicated planning mechanics (e.g., Claude Code's &lt;a href="https://claudelog.com/mechanics/plan-mode/" rel="noopener noreferrer"&gt;Plan Mode&lt;/a&gt; or the &lt;a href="https://cursor.com/docs/agent/planning#plan-mode" rel="noopener noreferrer"&gt;Cursor equivalent&lt;/a&gt;). Combine these with roadmap commands when beneficial.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="appendix-b-tdd-commands"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  TDD Development Commands
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used in &lt;strong&gt;Stage 4: The TDD Cycle&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These commands embed the Three Laws of TDD:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No behavioral production code without a failing test&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write only enough test code to fail&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write only enough production code to pass&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
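&lt;p&gt;As a minimal, framework-free sketch of one pass through these laws (the &lt;code&gt;Stack&lt;/code&gt; example and its API are hypothetical, not part of &lt;code&gt;AAID&lt;/code&gt; itself):&lt;/p&gt;

```typescript
// Tiny hand-rolled assertion helper so the sketch runs without a test framework
const expectEqual = (actual: unknown, expected: unknown): void => {
  if (actual !== expected) {
    throw new Error(`Expected ${String(expected)}, got ${String(actual)}`)
  }
}

// Laws 1 and 2 (RED): write only enough test code to fail.
// This test fails until push/size exist, which is what licenses
// the production code below.
const testPushIncreasesSize = (): void => {
  // Given an empty stack
  const stack = new Stack()
  // When one item is pushed
  stack.push(42)
  // Then the size is 1
  expectEqual(stack.size(), 1)
}

// Law 3 (GREEN): write only enough production code to make the test pass
class Stack {
  items: number[] = []

  push(item: number): void {
    this.items.push(item)
  }

  size(): number {
    return this.items.length
  }
}

testPushIncreasesSize()
```

&lt;p&gt;In real use a test framework's assertions replace the helper; the RED → GREEN → REFACTOR sequence is what the commands below enforce.&lt;/p&gt;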

&lt;p&gt;Each command enforces these laws at the appropriate phase by referencing the AAID rules file, which serves as the single source of truth for the workflow.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;TDD Phase&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/red-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter RED phase: Write minimal failing test, then STOP for review&lt;/td&gt;
&lt;td&gt;🔴 RED&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/tdd/red-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/green-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter GREEN phase: Write simplest passing code, then STOP for review&lt;/td&gt;
&lt;td&gt;🟢 GREEN&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/tdd/green-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/refactor-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enter REFACTOR phase: Improve code with tests green, then STOP for review&lt;/td&gt;
&lt;td&gt;🧼 REFACTOR&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/tdd/refactor-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;In practice:&lt;/strong&gt; Since the rules file is automatically loaded by your IDE/CLI, you often won't need these commands; the AI will typically follow the workflow from the rules alone. That said, the commands remain useful as explicit phase triggers when needed.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Adding to Existing Projects&lt;/strong&gt;: These commands work for adding new features to any codebase (new or existing).&lt;br&gt;&lt;br&gt;&lt;strong&gt;Modifying Untested Code&lt;/strong&gt;: When changing existing untested code, first establish characterization tests (documenting current behavior) and find seams (testable injection points). See books like &lt;a href="https://www.oreilly.com/library/view/working-effectively-with/0131177052/" rel="noopener noreferrer"&gt;Working Effectively with Legacy Code&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="appendix-b-investigation-commands"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Investigation &amp;amp; Problem Solving Commands
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used throughout various &lt;code&gt;AAID&lt;/code&gt; stages for research and debugging&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These commands help when you need additional context (Stage 2: Planning) or encounter issues during the TDD cycle (Stage 4: "Handle potential issues" step).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Primary Use&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/analyze-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Diagnose specific problems, errors, or failures without making changes&lt;/td&gt;
&lt;td&gt;Debugging failures&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/investigation/analyze-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/analyze-script-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run a specific script and analyze results without making changes&lt;/td&gt;
&lt;td&gt;Script diagnostics&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/investigation/analyze-script-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/debug-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add debug logging and analyze results to understand issues&lt;/td&gt;
&lt;td&gt;Deep debugging&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/investigation/debug-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/minimal-fix-&amp;amp;-analyze-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implement the simplest fix, verify results, and analyze outcome&lt;/td&gt;
&lt;td&gt;Quick fixes&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/investigation/minimal-fix-%26-analyze-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/research-&amp;amp;-stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Comprehensive investigation and context gathering (use for broad exploration)&lt;/td&gt;
&lt;td&gt;Context gathering&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/investigation/research-%26-stop.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Triggering "/analyze-script-&amp;amp;-stop"&lt;/strong&gt;: The user discusses or simply types the script after the command name, for example: "&lt;code&gt;/analyze-script-&amp;amp;-stop test:db&lt;/code&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a id="appendix-b-misc-commands"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Miscellaneous Commands
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Utility commands for common development tasks&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/git-commit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create clean commit messages following project guidelines&lt;/td&gt;
&lt;td&gt;Version control&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/misc/git-commit.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/gherkin-guard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enforce consistent Gherkin-style Given/When/Then comments in tests&lt;/td&gt;
&lt;td&gt;Test formatting&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/.cursor/commands/misc/gherkin-guard.md" rel="noopener noreferrer"&gt;View&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
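&lt;p&gt;As an illustration of the comment style &lt;code&gt;/gherkin-guard&lt;/code&gt; enforces, a test body might read like this (the &lt;code&gt;applyDiscount&lt;/code&gt; function is a hypothetical stand-in, not from the command file):&lt;/p&gt;

```typescript
// Hypothetical function under test
const applyDiscount = (price: number, percent: number): number =>
  price - (price * percent) / 100

// Given/When/Then comments make the behavioral structure of the test explicit
const testDiscountIsApplied = (): void => {
  // Given a product priced at 200
  const price = 200

  // When a 10% discount is applied
  const result = applyDiscount(price, 10)

  // Then the final price is 180
  if (result !== 180) {
    throw new Error(`Expected 180, got ${result}`)
  }
}

testDiscountIsApplied()
```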

&lt;p&gt;These are just examples of &lt;code&gt;AAID&lt;/code&gt; commands. Create your own or modify these to match your workflow. The key is using reusable prompts to greatly augment your development speed.&lt;/p&gt;

&lt;p&gt;&lt;a id="appendix-c"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix C: AAID Workflow Rules
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn9rnnecpubuy4vh1cqe.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn9rnnecpubuy4vh1cqe.webp" alt="Appendix C" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure your AI environment to understand the &lt;code&gt;AAID&lt;/code&gt; workflow. These are simple text instructions; no special &lt;code&gt;AAID&lt;/code&gt; app or tool is required.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;☝️&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Note on AI instruction following accuracy&lt;/strong&gt;: At the time of writing, current AIs are good, but far from perfect, at following instructions and rules such as the &lt;strong&gt;AAID AI Workflow Rules&lt;/strong&gt;. Sometimes you may need to remind the AI if it, for example, forgets a TDD phase, or moves directly to GREEN without stopping for user review at RED.&lt;br&gt;&lt;br&gt;As LLMs improve over time, you'll need to worry less about this.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  AAID AI Workflow Rules/Instructions
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;These are the official &lt;code&gt;AAID&lt;/code&gt; workflow rules, but feel free to customise them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/rules/aaid/aaid-development-rules.mdc" rel="noopener noreferrer"&gt;AAID AI Workflow Rules/Instructions&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Usage Guide
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For Cursor:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project-specific: Commit a rule file in &lt;code&gt;.cursor/rules/&lt;/code&gt; so it's version controlled and scoped to the repo&lt;/li&gt;
&lt;li&gt;Global: Add to User Rules in Cursor Settings&lt;/li&gt;
&lt;li&gt;Simple alternative: Place in &lt;code&gt;AGENTS.md&lt;/code&gt; in project root&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For Claude Code:&lt;/strong&gt;&lt;br&gt;
Place in &lt;code&gt;CLAUDE.md&lt;/code&gt; file in your project root (or &lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; for global use)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For other AI tools:&lt;/strong&gt;&lt;br&gt;
Look for "custom instructions", "custom rules", or "system prompt" settings&lt;/p&gt;

&lt;p&gt;&lt;a id="appendix-d"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix D: Handling Technical Implementation Details
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksmx6k20qnqyuw8rmub2.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fksmx6k20qnqyuw8rmub2.webp" alt="Appendix D" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main guide above has focused on BDD/TDD for domain behavior. Technical implementation details—infrastructure elements and presentation—are covered in Appendix D.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkt6lxtttozuxv9q6h9b.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkt6lxtttozuxv9q6h9b.webp" alt="AAID implementation categories" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-d/handling-technical-implementation-details.md" rel="noopener noreferrer"&gt;Read Appendix D: Handling Technical Implementation Details&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a id="appendix-e"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Appendix E: Dependencies and Mocking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc9bkc6kmd6qkf3cusen.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc9bkc6kmd6qkf3cusen.webp" alt="Appendix E" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you've identified your test type from the &lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-d/handling-technical-implementation-details.md#aaid-implementation-matrix-build-types-and-verification" rel="noopener noreferrer"&gt;Implementation Categories&lt;/a&gt;, this reference clarifies how to properly handle the dependencies of what you're testing.&lt;/p&gt;

&lt;p&gt;It covers the four dependency categories (Pure In-Process, Impure In-Process, Managed Out-of-Process, Unmanaged Out-of-Process) and shows how each test type (unit, integration, contract, acceptance) handles them differently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0v5yp6xrvpnqjevfunz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0v5yp6xrvpnqjevfunz.webp" alt="Dependency Categories" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/dawid-dahl-umain/augmented-ai-development/blob/main/appendices/appendix-e/dependencies-and-mocking.md" rel="noopener noreferrer"&gt;Read Appendix E: Dependencies and Mocking&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a id="about-author"&gt;&lt;/a&gt;&lt;br&gt;
Dawid Dahl is a full-stack developer and AI skill lead at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://www.eidra.com/" rel="noopener noreferrer"&gt;EIDRA&lt;/a&gt;. In his free time, he enjoys metaphysical ontology and epistemology, analog synthesizers, consciousness, techno, Huayan and Madhyamika Prasangika philosophy, and being with friends and family.&lt;/p&gt;

&lt;p&gt;Photography credit: &lt;a href="https://unsplash.com/@kaixapham" rel="noopener noreferrer"&gt;kaixapham&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tdd</category>
      <category>bdd</category>
      <category>testing</category>
    </item>
    <item>
      <title>Encapsulating the Past: How We Tamed a Legacy System with Timeless Software Engineering Principles</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Wed, 18 Sep 2024 08:16:49 +0000</pubDate>
      <link>https://forem.com/dawiddahl/encapsulating-the-past-how-we-tamed-a-legacy-system-with-timeless-software-engineering-principles-154i</link>
      <guid>https://forem.com/dawiddahl/encapsulating-the-past-how-we-tamed-a-legacy-system-with-timeless-software-engineering-principles-154i</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Inheriting a Mess&lt;/li&gt;
&lt;li&gt;Reinventing from the Ground Up&lt;/li&gt;
&lt;li&gt;Ports and Adapters&lt;/li&gt;
&lt;li&gt;The Technology Behind Our Overhaul&lt;/li&gt;
&lt;li&gt;Did SOLID Principles Guide Our Design?&lt;/li&gt;
&lt;li&gt;Building Confidence with the Testing Pyramid Strategy&lt;/li&gt;
&lt;li&gt;Ensuring Testability with Pure Functions&lt;/li&gt;
&lt;li&gt;Deployment on Google Cloud Platform&lt;/li&gt;
&lt;li&gt;In Summary: Was the Backend Transformation Successful?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The day we took over the operations of a legacy e-commerce backend system from the global protection brand &lt;strong&gt;POC&lt;/strong&gt;, one thing was certain: this was going to be a formidable challenge.&lt;/p&gt;

&lt;p&gt;The codebase we inherited from the previous development team was riddled with issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It was fragile, often breaking with (or even without) the slightest modification.&lt;/li&gt;
&lt;li&gt;Changes couldn’t be made with confidence, as the system was completely untested.&lt;/li&gt;
&lt;li&gt;It lacked any coherent design principles, leaving us without a solid foundation to build on.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given the state of the system, it became clear that a simple cleanup wouldn’t suffice. What we needed was a complete overhaul — a new application designed from the ground up, drawing inspiration and guidance from various timeless software engineering principles. This approach would allow us to address every flaw we encountered, laying a solid foundation for the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inheriting a Mess
&lt;/h2&gt;

&lt;p&gt;The use case that led to the now-legacy solution involved transferring data—such as stock, orders, and tracking events—between the client's ERP (Enterprise Resource Planning) system and their Shopify e-commerce platform.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the inventory flow as an example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuvkojxxmag44aey3w0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuvkojxxmag44aey3w0k.png" alt="Legacy High Level Architecture" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem was that their ERP, Microsoft Dynamics AX, is a relic from the stone age, offering none of the modern amenities like a REST or GraphQL API. Instead, it resorts to dropping literal XML files onto an SFTP server, to be picked up later for processing.&lt;/p&gt;

&lt;p&gt;This processing was handled by a no-code platform called Make. While Make offered a nifty solution for simpler workflows, its limitations became painfully obvious when dealing with complex business logic and advanced use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2h0sjp423gz2989n9bj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2h0sjp423gz2989n9bj.png" alt="Legacy software mess" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On top of that, the technology chosen as the "database" for data on its way to and from Shopify was Google Sheets. A spreadsheet, of course, lacks the robustness needed for complex workflows and storage.&lt;/p&gt;

&lt;p&gt;The system also relied on Matrixify, a third-party Shopify app, for data imports and exports. While functional, the app's awkward interface and our dependence on an external tool introduced additional risk, underscoring the fragility of the entire legacy setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reinventing from the Ground Up
&lt;/h2&gt;

&lt;p&gt;To solve these challenges, we first asked if the client was open to switching to a more modern ERP. They were initially on board, but their IT team estimated the cost at nearly 1 million euros, so that option was off the table. Rather than dwelling on this obstacle, we came up with an idea 💡:&lt;/p&gt;

&lt;p&gt;How about we encapsulate the whole legacy system in a new backend service—an ERP adapter—which would then be able to offer a simple API interface for the E-com engine to interact with?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqb6l2txngx6s7qtkj7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqb6l2txngx6s7qtkj7n.png" alt="Umain's High Level Architecture" width="800" height="799"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This way, we could deal with the issues inside once, and then no one on the outside would ever again have to think about quirky XML file syntax, Google Sheets going down because it cannot process more than 50,000 rows, or unstable SFTP server interactions.&lt;/p&gt;

&lt;p&gt;So we did a major architectural overhaul. Here are a few of the main changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adopted a proper Postgres database, with Prisma as ORM.&lt;/li&gt;
&lt;li&gt;Got rid of the dependency on an import/export SaaS product and built the functionality ourselves. (Mutation batching, Centra rate-limit handling, logging.)&lt;/li&gt;
&lt;li&gt;Added strong typing with TypeScript for every entity and interaction.&lt;/li&gt;
&lt;li&gt;Exposed a GraphQL API.&lt;/li&gt;
&lt;li&gt;For security, storage, cron jobs, hosting, and more, we used Google Cloud Platform.&lt;/li&gt;
&lt;li&gt;Set up an independent QA environment in GCP, to be able to safely test new features before deploying to production.&lt;/li&gt;
&lt;li&gt;(For the E-com engine we switched from their old Shopify setup that used Liquid templating and barely readable checkout scripts, to a headless setup with &lt;a href="https://centra.com" rel="noopener noreferrer"&gt;Centra&lt;/a&gt; and a Next app for the frontend.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllvbl0uajamgiitvl0il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fllvbl0uajamgiitvl0il.png" alt="Encapsulating the past: metaphor" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technology Behind Our Overhaul
&lt;/h2&gt;

&lt;p&gt;One of our primary goals was to ensure that different parts of the codebase were independent (decoupled), so changes in one area wouldn't affect another. With the legacy system, we never felt free to change something that worked, because we had no idea what would break. This is a very bad situation to be in, as new features can't be added easily, if at all.&lt;/p&gt;

&lt;p&gt;To achieve our goal, we chose what we believe is the best backend framework for TypeScript: &lt;a href="https://nestjs.com/" rel="noopener noreferrer"&gt;NestJS&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd47bg9066btc402uaena.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd47bg9066btc402uaena.png" alt="Nest JS backend library logo" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s like &lt;a href="https://expressjs.com/" rel="noopener noreferrer"&gt;Express&lt;/a&gt;, but more fleshed out with built-in features that developers from languages like Java or C# will recognize, such as a modular architecture, middleware, and tools for request interception and validation. &lt;/p&gt;

&lt;p&gt;Most importantly, it provides a robust Dependency Injection (DI) system, making the code scalable and easier to test by preventing different parts of the codebase from becoming entangled.&lt;/p&gt;
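&lt;p&gt;Stripped of the framework's decorators and module metadata, the pattern underneath is ordinary constructor injection. A sketch in plain TypeScript (the &lt;code&gt;OrderService&lt;/code&gt; and &lt;code&gt;Logger&lt;/code&gt; names are illustrative, not from our codebase):&lt;/p&gt;

```typescript
// The abstraction the business logic depends on, instead of a concrete client
interface Logger {
  log(message: string): void
}

// The dependency arrives through the constructor, so tests can
// hand in a fake without touching this class.
class OrderService {
  logger: Logger

  constructor(logger: Logger) {
    this.logger = logger
  }

  placeOrder(orderId: string): string {
    this.logger.log(`Placing order ${orderId}`)
    return `order:${orderId}:placed`
  }
}

// In production, the DI container supplies a real logger;
// in a test, we inject a spy and inspect what it recorded.
const messages: string[] = []
const fakeLogger: Logger = { log: (m) => { messages.push(m) } }
new OrderService(fakeLogger).placeOrder("A-1")
```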

&lt;p&gt;Armed with this framework, we were now ready to implement the &lt;strong&gt;Hexagonal&lt;/strong&gt;, or &lt;strong&gt;Ports and Adapters&lt;/strong&gt;, software architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ports and Adapters
&lt;/h2&gt;

&lt;p&gt;The point of this architecture is to keep the core business logic decoupled from external systems, like third-party services, databases, or file transfers. By organizing the system around interfaces (or "ports") and separating the external integrations into distinct implementations (or "adapters"), we ensure the business logic remains stable even as external dependencies change. This separation also makes testing easier by allowing us to inject fake/mock adapters without touching the core logic.&lt;/p&gt;

&lt;p&gt;To enforce this separation, we split the system into public modules (business logic) and private modules (adapters). Public modules contain stable core logic, while private modules handle external dependencies, which can evolve without affecting the core.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tbcfnhidktc6o86ucmv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tbcfnhidktc6o86ucmv.png" alt="Public and private modules" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Adapters (Red)
&lt;/h3&gt;

&lt;p&gt;Adapters connect the core application to external systems, such as SFTP services, XML processing, and network batching. They are part of the private modules, meaning they can change without touching the stable core logic in the public modules. This keeps external system changes isolated.&lt;/p&gt;
&lt;h3&gt;
  
  
  Ports
&lt;/h3&gt;

&lt;p&gt;Ports define interfaces that the business logic both &lt;strong&gt;implements&lt;/strong&gt; and &lt;strong&gt;invokes&lt;/strong&gt; to interact with external systems. For example, the &lt;code&gt;ISyncInventory&lt;/code&gt; port, implemented by the &lt;code&gt;InventoryService&lt;/code&gt;, handles inventory synchronization, while the &lt;code&gt;ISftpConnector&lt;/code&gt; port, invoked by the business logic, deals with file transfers. Using these ports, the business logic remains decoupled from external system details, ensuring the application is flexible and adaptable to changes.&lt;/p&gt;
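&lt;p&gt;In TypeScript, such ports are plain interfaces. A sketch using the names above (the method signatures are assumed for illustration; the real ones differ):&lt;/p&gt;

```typescript
// Port invoked by the business logic; hides all SFTP details
interface ISftpConnector {
  fetchFile(path: string): string
}

// Port implemented by the business logic
interface ISyncInventory {
  syncInventory(path: string): number
}

// Core logic depends only on the ports, never on a concrete SFTP client
class InventoryService implements ISyncInventory {
  sftp: ISftpConnector

  constructor(sftp: ISftpConnector) {
    this.sftp = sftp
  }

  syncInventory(path: string): number {
    const raw = this.sftp.fetchFile(path)
    // Pretend each non-empty line of the file is one inventory row
    return raw.split("\n").filter((line) => line.trim() !== "").length
  }
}

// A fake adapter stands in for the real SFTP connector in tests
const fakeSftp: ISftpConnector = {
  fetchFile: () => "sku-1;10\nsku-2;4"
}
const service = new InventoryService(fakeSftp)
```

&lt;p&gt;Because &lt;code&gt;InventoryService&lt;/code&gt; only sees the interface, swapping the real SFTP adapter for a fake requires no change to the core logic.&lt;/p&gt;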
&lt;h3&gt;
  
  
  Application Business Logic (Green)
&lt;/h3&gt;

&lt;p&gt;The business logic lives in the public modules and handles the core rules and processes, such as the inventory service managing data synchronization. By depending only on ports, the business logic stays decoupled from external systems, ensuring it remains stable, maintainable, and easy to test, even when external systems change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsf6ihjtl1lt81i2pt40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsf6ihjtl1lt81i2pt40.png" alt="Umain's Technical Architecture" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Did SOLID Principles Guide Our Design?
&lt;/h2&gt;

&lt;p&gt;To ensure our architecture is robust, let’s review it against the SOLID principles that Robert C. Martin, famous for his books on &lt;code&gt;Clean Code&lt;/code&gt; and &lt;code&gt;Clean Architecture&lt;/code&gt;, has laid out. Does our system hold up to these timeless software engineering guidelines?&lt;/p&gt;
&lt;h3&gt;
  
  
  S - Single-responsibility Principle
&lt;/h3&gt;

&lt;p&gt;Each module has one clear purpose. For example, our &lt;code&gt;InventoryService&lt;/code&gt; only handles inventory logic, while adapters deal with external interactions like SFTP or APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;ConfigModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forRoot&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;isGlobal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="nx"&gt;GraphQLModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;forRoot&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ApolloDriverConfig&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ApolloDriver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;playground&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;autoSchemaFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;src/schema.gql&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="na"&gt;sortSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="nx"&gt;AuthModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;EventEmitterModule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forRoot&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="nx"&gt;PrismaModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;CloudStorageModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;InventoryModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// &amp;lt;--- Here&lt;/span&gt;
        &lt;span class="nx"&gt;XmlModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;GraphQLBatchModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;NetworkRequestRetryModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;FetchModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;CentraIntegrationModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;WebhookModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;SftpConnectorModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;OrderModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ExceptionModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ErrorModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;LoggerModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;TrackingModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;PricingModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ProductModule&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;controllers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;AppController&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;AppService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;appConfig&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;ConfigModule&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AppModule&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the main app module, where the Nest framework allows us to import all our modules, each of which has a single responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  O - Open-closed Principle
&lt;/h3&gt;

&lt;p&gt;Our modules are open for extension but closed for modification. We can add new features, like additional adapters, without altering the existing core logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  L - Liskov Substitution Principle
&lt;/h3&gt;

&lt;p&gt;This principle ensures that different implementations of an interface can be swapped without breaking the system. Adhering to this, we can replace an adapter like &lt;code&gt;ISftpConnector&lt;/code&gt; with another SFTP implementation, and it will work seamlessly as long as it follows the expected behavior defined by the interface. This way, adapters can be switched out without affecting the business logic.&lt;/p&gt;
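&lt;p&gt;Sketched in code (hypothetical implementations, simplified to synchronous signatures): two connectors that honor the same contract can be substituted freely.&lt;/p&gt;

```typescript
interface ISftpConnector {
    fileGet(path: string): string;
}

// One possible implementation...
class Ssh2SftpConnector implements ISftpConnector {
    fileGet(path: string): string {
        return `ssh2:${path}`;
    }
}

// ...and a drop-in replacement that honors the same contract.
class OtherSftpConnector implements ISftpConnector {
    fileGet(path: string): string {
        return `other:${path}`;
    }
}

// The business logic works with either, because it only sees the interface.
const sync = (sftp: ISftpConnector): string => sftp.fileGet("inventory.xml");
```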

&lt;h3&gt;
  
  
  I - Interface Segregation Principle
&lt;/h3&gt;

&lt;p&gt;We create small, focused interfaces that each handle a single responsibility, and then compose them into larger ones, like &lt;code&gt;ISftpConnector&lt;/code&gt;, ensuring that modules only rely on the specific functionality they need. This prevents the tight coupling often caused by inheritance and keeps dependencies clean and maintainable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ISftpConnector&lt;/span&gt;
    &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nx"&gt;ISftpConnectorFileGet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISftpConnectorFilesGet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISftpConnectorFileAdd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISftpConnectorFileDelete&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISftpConnectorIsDirEmpty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISftpConnectorPurgeDir&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ISftpConnector&lt;/code&gt; is composed of smaller interfaces, allowing us to separate concerns and avoid bloated, monolithic interfaces, which can lead to the infamous "&lt;a href="https://en.wikipedia.org/wiki/God_object" rel="noopener noreferrer"&gt;God object&lt;/a&gt;".&lt;/p&gt;
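&lt;p&gt;The small constituent interfaces might look like this (hypothetical shapes; the real project signatures differ). A consumer that only reads files can then depend on the narrow interface alone:&lt;/p&gt;

```typescript
interface ISftpConnectorFileGet {
    fileGet(path: string): string;
}

interface ISftpConnectorFileDelete {
    fileDelete(path: string): boolean;
}

// Composition instead of one bloated interface:
interface ISftpConnector extends ISftpConnectorFileGet, ISftpConnectorFileDelete {}

// This consumer depends only on the one capability it actually needs.
const readInventory = (reader: ISftpConnectorFileGet): string =>
    reader.fileGet("inventory.xml");

const connector: ISftpConnector = {
    fileGet: (path) => `contents of ${path}`,
    fileDelete: () => true,
};
```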

&lt;h3&gt;
  
  
  D - Dependency Inversion Principle
&lt;/h3&gt;

&lt;p&gt;As we have seen, our system relies on abstractions (interfaces) rather than concrete implementations. The core logic depends on ports (interfaces), while the adapters implement those ports, keeping the layers decoupled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;IInventory&lt;/span&gt;
    &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nx"&gt;ISyncAxInventoryToAdapterInventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISyncCentraInventoryToAdapterInventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;ISyncAdapterInventoryToCentraInventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;IGetWarehouse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// etc ...&lt;/span&gt;
        &lt;span class="nx"&gt;IDeleteInventoryRecord&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the interface (port) for the inventory service, i.e., the application business logic.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Resolver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AXInventoryResolver&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AXInventoryResolver&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LoggerService&lt;/span&gt;

    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;INVENTORY_SERVICE_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;inventoryService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IInventory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExceptionService&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;LoggerService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;AXInventoryResolver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// etc ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we see that the GraphQL resolver (network) has no direct dependency on the concrete inventory service. It depends only on the abstract &lt;code&gt;IInventory&lt;/code&gt;, and thus remains decoupled.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AXInventoryService&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;IInventory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LoggerService&lt;/span&gt;

    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PrismaService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;XmlService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;xml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;IXMLService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SFTP_CONNECTOR_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;sftp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ISftpConnector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Inject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;CLOUD_STORAGE_SERVICE_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;cloudStorageService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ICloudStorageService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// etc ..&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ConfigService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;LoggerService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;AXInventoryService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nx"&gt;syncAxInventoryToAdapterInventory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;market&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Market&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TaskEither&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
            &lt;span class="nx"&gt;InventoryError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;SyncAxInventoryToAdapterInventorySuccessMessage&lt;/span&gt;
        &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
            &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;inventoryFileIdentifier&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
                    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getInventoryFileIdentifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;market&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="c1"&gt;// etc ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code, we see &lt;code&gt;ISftpConnector&lt;/code&gt; being injected into the &lt;code&gt;AXInventoryService&lt;/code&gt;. This illustrates the "inversion" principle: at compile time the high-level service depends on an abstract interface, while the concrete implementation is injected only at runtime. This keeps the system flexible and adaptable to changes in external services.&lt;/p&gt;
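&lt;p&gt;For completeness, here is a sketch of how such a token-to-implementation binding is typically declared in a Nest module (the stub class and token name are illustrative, not the actual project code):&lt;/p&gt;

```typescript
import { Injectable, Module } from "@nestjs/common";

export const SFTP_CONNECTOR_TOKEN = Symbol("ISftpConnector");

// Assumed concrete adapter, stubbed for illustration.
@Injectable()
class SftpConnectorService {
    fileGet(path: string): string {
        return `contents of ${path}`;
    }
}

@Module({
    providers: [
        {
            provide: SFTP_CONNECTOR_TOKEN, // the abstraction consumers ask for
            useClass: SftpConnectorService, // the concrete adapter bound at composition time
        },
    ],
    exports: [SFTP_CONNECTOR_TOKEN],
})
export class SftpConnectorModule {}
```

&lt;p&gt;Swapping the adapter then means changing a single &lt;code&gt;useClass&lt;/code&gt; line; every consumer that injects the token is untouched.&lt;/p&gt;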

&lt;h2&gt;
  
  
  Building Confidence with the Testing Pyramid Strategy
&lt;/h2&gt;

&lt;p&gt;Our system does indeed follow the SOLID principles! That's great, but how does it hold up in testing? One of our main goals was to ensure that changes could be made confidently, with good test coverage.&lt;/p&gt;

&lt;p&gt;Fortunately, by adhering to SOLID and the Ports and Adapters architecture, testing becomes much easier as a natural side effect. Like a bonus! The clear separation of concerns allows us to test each layer independently, as shown in the diagram below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsvno9t7ixdv622b0vq3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsvno9t7ixdv622b0vq3.png" alt="Testing strategy" width="800" height="799"&gt;&lt;/a&gt;&lt;br&gt;This strategy is known as the &lt;a href="https://martinfowler.com/articles/practical-test-pyramid.html" rel="noopener noreferrer"&gt;Testing Pyramid&lt;/a&gt;.
  &lt;/p&gt;

&lt;h3&gt;
  
  
  E2E Testing
&lt;/h3&gt;

&lt;p&gt;Simulates real user interactions by calling the server over HTTP, using a test database, and seeding data before tests. It's thorough but slower because it involves external systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Testing
&lt;/h3&gt;

&lt;p&gt;Mocks most adapters to avoid calling real external systems. It’s faster and ensures modules work well together without involving full system dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unit Testing
&lt;/h3&gt;

&lt;p&gt;Mocks all adapters, ensuring no external systems are touched. It’s ultra-fast, focusing on testing isolated logic within a single module.&lt;/p&gt;
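&lt;p&gt;A small self-contained sketch of this idea, with every adapter replaced by a stub so no external system is touched (names and the line-counting logic are illustrative):&lt;/p&gt;

```typescript
interface ISftpConnector {
    fileGet(path: string): string;
}

class InventoryService {
    constructor(private readonly sftp: ISftpConnector) {}

    // The isolated logic under test: count the lines of an inventory file.
    countLines(path: string): number {
        return this.sftp.fileGet(path).split("\n").length;
    }
}

// The stub stands in for the real SFTP adapter; the test never touches a server.
const sftpStub: ISftpConnector = { fileGet: () => "sku-1\nsku-2\nsku-3" };
const service = new InventoryService(sftpStub);
```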

&lt;h2&gt;
  
  
  Ensuring Testability with Pure Functions
&lt;/h2&gt;

&lt;p&gt;In addition to our testing strategy, we ensure that each service in our Nest modules—whether public or private—follows a functional programming style using the &lt;a href="https://gcanti.github.io/fp-ts/" rel="noopener noreferrer"&gt;fp-ts&lt;/a&gt; library.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzkwkzhujzurmcu5uu0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjzkwkzhujzurmcu5uu0x.png" alt="Logo of functional programming library fp-ts" width="360" height="366"&gt;&lt;/a&gt;&lt;br&gt;The creator of fp-ts, Giulio Canti, recently joined the &lt;a href="https://effect.website/" rel="noopener noreferrer"&gt;Effect&lt;/a&gt; team. Effect is a library very similar to fp-ts, with some additional bells and whistles.
  &lt;/p&gt;

&lt;p&gt;To illustrate this, let's take a look at the typical structure of a Nest module.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e3v3uam9g4y4292ewyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e3v3uam9g4y4292ewyt.png" alt="NestJs module structure" width="800" height="722"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach allows us to write &lt;em&gt;pure functions&lt;/em&gt;—functions that 1) always return the same output for the same input, and 2) don’t produce side effects. Side effects occur when a function interacts with the world outside of itself (e.g., calling an API or modifying a global state), making testing and debugging more difficult.&lt;/p&gt;
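&lt;p&gt;A tiny illustration of the difference (the VAT rate is just an example value):&lt;/p&gt;

```typescript
// Pure: the same input always produces the same output, and nothing
// outside the function is read or modified.
const addVat = (price: number): number => price * 1.25;

// Impure: the result depends on mutable outside state, so two calls with
// the same input can disagree, which makes testing and debugging harder.
let vatRate = 1.25;
const addVatImpure = (price: number): number => price * vatRate;
```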

&lt;p&gt;To avoid this, we use &lt;code&gt;TaskEither&lt;/code&gt;, a &lt;em&gt;type&lt;/em&gt; from fp-ts that represents an asynchronous operation that can either succeed or fail. Here’s an example from our &lt;code&gt;IOrder&lt;/code&gt; interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;taskEither&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;TE&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fp-ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;OrderNumber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;IOrderServiceCreate&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;createOrder&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;axOrderJson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TaskEither&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;OrderError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OrderNumber&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;IOrder&lt;/code&gt; composes interfaces like &lt;code&gt;IOrderServiceCreate&lt;/code&gt;, where &lt;code&gt;TaskEither&lt;/code&gt; is used for async operations that could fail.&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Injectable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AXOrderService&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;IOrder&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="c1"&gt;// ... (code)&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="nx"&gt;createOrder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TaskEither&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;OrderError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;OrderNumber&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// ... (code)&lt;/span&gt;

        &lt;span class="c1"&gt;// A value "data" of generic type T goes into the pipeline:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;right&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;validatedData&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validateData&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;orderNrAndShipmentId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getOrderNrAndShipmentId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;E&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;market&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getMarket&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fromEither&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;persist&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xml&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createXml&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gcsBucketName&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getGCSBucketName&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cloudUploadSuccess&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;performCloudUpload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sftpUploadSuccess&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;performSftpUpload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nx"&gt;TE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;persistSuccess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// And comes out transformed on the other side,&lt;/span&gt;
        &lt;span class="c1"&gt;// as a type: TaskEither&amp;lt;OrderError, OrderNumber&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;TaskEither&lt;/code&gt; is technically what is called a &lt;em&gt;monad&lt;/em&gt;, which is a design pattern in functional programming. The funky syntax is based on Haskell's &lt;code&gt;do&lt;/code&gt; notation. (&lt;a href="https://en.wikibooks.org/wiki/Haskell/do_notation" rel="noopener noreferrer"&gt;Link&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;This entire declarative flow in the service, from beginning to end, is lazy and pure. Laziness ensures that nothing happens until exactly when the function is invoked, and purity guarantees that the function’s behavior is deterministic. This predictability makes our services easier to test, as every input will consistently return the same result without causing hidden side effects.&lt;/p&gt;
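&lt;p&gt;The laziness can be shown with a small self-contained sketch. (This is a simplified, synchronous stand-in for the real &lt;code&gt;TaskEither&lt;/code&gt;, which is asynchronous; the point is only that describing the program runs nothing.)&lt;/p&gt;

```typescript
type Either<E, A> = { _tag: "Left"; left: E } | { _tag: "Right"; right: A };

// A lazy computation: nothing runs until the thunk is invoked.
type LazyEither<E, A> = () => Either<E, A>;

let sideEffects = 0;

const persistOrder: LazyEither<string, number> = () => {
    sideEffects += 1; // the "effect" happens only upon invocation
    return { _tag: "Right", right: 42 };
};

// Merely describing the program executes nothing: sideEffects is still 0 here.
const program = persistOrder;

// Invoking it runs the effect exactly once and yields the Either result.
const result = program();
```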

&lt;h2&gt;
  
  
  Deployment on Google Cloud Platform
&lt;/h2&gt;

&lt;p&gt;Finally, for deployment, we run the app in a Docker container on Google Cloud Run, which manages the infrastructure and scales automatically to meet demand. We also rely on Google Cloud's built-in authentication, so security is managed behind the scenes, letting us focus on building the app instead of worrying about access control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7dg3bbiiidoctstf6t5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7dg3bbiiidoctstf6t5.png" alt="Umain's Cloud Deployment" width="800" height="727"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In Summary: Was the Backend Transformation Successful?
&lt;/h2&gt;

&lt;p&gt;So, this all sounds great on paper, but what’s been the real outcome? We’re proud to say the system has been running smoothly since deployment, doing its job without a hitch. &lt;/p&gt;

&lt;p&gt;For an engineer, there’s little more satisfying than seeing a system you’ve built work seamlessly, reliably, and without constant intervention.&lt;/p&gt;

&lt;p&gt;By taking the time to apply timeless software engineering principles, we’ve built a stable backend platform that the client has been highly satisfied with — one that lets them focus on adding new features instead of constantly fixing things. They can finally innovate with confidence, knowing their backend will keep up with whatever comes next.&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://www.eidra.com/" rel="noopener noreferrer"&gt;EIDRA&lt;/a&gt;. In his free time, he enjoys metaphysical ontology and epistemology, analog synthesizers, consciousness, techno, Huayan and Madhyamika Prasangika philosophy, and being with friends and family.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>architecture</category>
      <category>nestjs</category>
      <category>typescript</category>
    </item>
    <item>
      <title>The Death of RAG: What a 10M Token Breakthrough Means for Developers</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Mon, 19 Feb 2024 09:33:42 +0000</pubDate>
      <link>https://forem.com/dawiddahl/the-death-of-rag-what-a-10m-token-breakthrough-means-for-developers-3p24</link>
      <guid>https://forem.com/dawiddahl/the-death-of-rag-what-a-10m-token-breakthrough-means-for-developers-3p24</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;"&lt;em&gt;In our research, we’ve also successfully tested up to 10 million tokens&lt;/em&gt;." - Google Researchers&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The other day, Google announced &lt;a href="https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#architecture"&gt;Gemini Pro 1.5&lt;/a&gt; with a massive increase in accurate long-context understanding. While I could not immediately put my finger on exactly what the broader implications might be, I had a hunch that if this is actually true, it's going to change the game.&lt;/p&gt;

&lt;p&gt;It was not until I spoke to a colleague at work that I finally realized the true impact of Google's announcement.&lt;/p&gt;

&lt;p&gt;He said:&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Latency is going to be a thing though... 10M tokens is quite a few MBs&lt;/em&gt;."&lt;/p&gt;

&lt;p&gt;It struck me that I’d actually be &lt;strong&gt;thrilled&lt;/strong&gt; to have the option to wait longer, if it meant a higher quality AI conversation.&lt;/p&gt;

&lt;p&gt;For example, let’s say it took 5 minutes, or 1 hour — hell, even if it took &lt;em&gt;one whole day&lt;/em&gt; — to have my entire codebase put into the chat’s context window. If, after that time, the AI had near-perfect access to that context throughout the rest of the conversation like Google claims, I’d happily, patiently and gratefully wait that amount of time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is &lt;code&gt;RAG&lt;/code&gt; (Retrieval-Augmented Generation)
&lt;/h2&gt;

&lt;p&gt;My colleague and I had both worked on the &lt;strong&gt;ARC AI Portal&lt;/strong&gt; at our workplace, an internal platform where we give everybody free access to GPT-4 and use something called &lt;code&gt;RAG&lt;/code&gt; (retrieval-augmented generation) for various purposes. The purpose of &lt;code&gt;RAG&lt;/code&gt; is to give an AI access to information it does not natively possess — information that a fresh ChatGPT session would otherwise know nothing about.&lt;/p&gt;

&lt;p&gt;For example, one use case for &lt;code&gt;RAG&lt;/code&gt; was when another colleague of ours, the author Rebecka Carlsson, asked us to let people chat directly with her latest book &lt;a href="https://www.adlibris.com/se/bok/the-speakers-journey-7-steps-to-create-the-important-narratives-speeches-for-our-transformative-times-9789198775686" rel="noopener noreferrer"&gt;The Speaker's Journey&lt;/a&gt; using our company's AI portal.&lt;/p&gt;

&lt;p&gt;So the AI portal team developed the full &lt;code&gt;RAG&lt;/code&gt; pipeline: took the book → chunked it into small pieces (not literally) → used &lt;a href="https://openai.com/blog/new-embedding-models-and-api-updates" rel="noopener noreferrer"&gt;OpenAI's embedding model&lt;/a&gt; to vectorize the chunks → inserted them into a vector database → and finally gave the AI access to the database within the chat via something called semantic search.&lt;/p&gt;
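&lt;p&gt;To make the pipeline concrete, here is a heavily simplified sketch of the retrieval step. A toy letter-frequency "embedding" and an in-memory array stand in for OpenAI's embedding model and a real vector database; only the shape of the flow — embed, index, rank by cosine similarity — matches what we actually built:&lt;/p&gt;

```typescript
// Toy RAG retrieval sketch: real systems use an embedding model + vector DB.
type Chunk = { text: string; vector: number[] };

// Stand-in "embedding": counts letters a-z. Real pipelines call an embedding model.
const embed = (text: string): number[] => {
  const v: number[] = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
};

// Cosine similarity between two vectors.
const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
};

// "Vector database": an in-memory array of embedded chunks.
const index = (texts: string[]): Chunk[] =>
  texts.map(text => ({ text, vector: embed(text) }));

// Semantic search: rank chunks by similarity to the query, keep the top k.
const search = (db: Chunk[], query: string, k = 1): string[] =>
  db
    .map(c => ({ text: c.text, score: cosine(c.vector, embed(query)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(c => c.text);

const db = index(["chapter on stage fright", "chapter on speech structure"]);
console.log(search(db, "structure of a speech"));
```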

&lt;p&gt;Mostly, it worked great. If people asked about some specific detail from her book, the &lt;code&gt;RAG&lt;/code&gt; solution was able to retrieve the information more often than not.&lt;/p&gt;

&lt;p&gt;But here is the deal: &lt;code&gt;RAG&lt;/code&gt; is a hack. We are essentially brute-forcing the information onto the AI, which means it often doesn't actually work as well as one would hope. It can't do summaries well, and it requires developer time to set up, making it slow and costly.&lt;/p&gt;

&lt;p&gt;So my point to my colleague was this:&lt;/p&gt;

&lt;p&gt;As it is now, people obviously wait weeks, even months, and pay loads of money to people like us to implement &lt;code&gt;RAG&lt;/code&gt;, a solution which is riddled with problems even when done by an expert.&lt;/p&gt;

&lt;p&gt;Surely then, Google — and eventually OpenAI when they release the equivalent solution — adding a little bit of latency for this new feature is fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Big Problem In AI-Driven Development Today
&lt;/h2&gt;

&lt;p&gt;In my current AI-driven development (AIDD) workflow, I always find myself copy-pasting the relevant parts of my codebase into the AI chat window at the beginning of the conversation. This is because, like most things in life, the specific functions I collaborate on with the AI never exist in isolation. They are always embedded in some larger network of system dependencies.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important to point out&lt;/strong&gt;: At work we use either our internal AI portal or ChatGPT Teams, where OpenAI never train their models on our proprietary data or conversations.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So even as I painstakingly take my time to try and copy-paste all the relevant context, since a production codebase is such a huge ecosystem of tens or even hundreds of thousands of lines of code, I could never realistically give it all. And even if I do give a lot, as the conversation goes on, the AI will eventually begin to &lt;a href="https://www.nightfall.ai/ai-security-101/catastrophic-forgetting#:~:text=At%20its%20core%2C%20catastrophic%20forgetting,related%20to%20previously%20learned%20tasks." rel="noopener noreferrer"&gt;forget&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;GitHub Copilot tried to solve this with a native &lt;code&gt;RAG&lt;/code&gt; solution built straight into the code editor. While it works sometimes, it's so sketchy I can never rely on it, meaning their &lt;code&gt;RAG&lt;/code&gt; implementation is almost useless.&lt;/p&gt;

&lt;p&gt;This cumbersome dance of feeding the AI piece by piece of our codebase, and it constantly forgetting and needing to start over, is a fragmented, inefficient process that disrupts the flow of the AI collaboration and often leads to results that are hit or miss.&lt;/p&gt;

&lt;p&gt;That is, until now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkboujas6sqtv3bel66y3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkboujas6sqtv3bel66y3.jpg" alt="Claims on X about Google Gemini 1.5 Pro" width="800" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Exciting Post-&lt;code&gt;RAG&lt;/code&gt; Era
&lt;/h2&gt;

&lt;p&gt;Approximately 1 million tokens amounts to around 50,000 lines of code, and 10 million tokens would thus equate to about 500,000 lines of code. That means that if Google's claims are correct, almost all our codebases would fit into an AI's view all at once.&lt;/p&gt;
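&lt;p&gt;The back-of-the-envelope math behind those numbers, assuming roughly 20 tokens per line of code (a heuristic, not a measured figure):&lt;/p&gt;

```typescript
// Rough estimate of how many lines of code fit in a context window,
// assuming ~20 tokens per line (a heuristic assumption, not a measured figure).
const TOKENS_PER_LINE = 20;

const linesOfCode = (contextTokens: number): number =>
  Math.floor(contextTokens / TOKENS_PER_LINE);

console.log(linesOfCode(1_000_000));  // 50000
console.log(linesOfCode(10_000_000)); // 500000
```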

&lt;p&gt;This would be nothing short of revolutionary.&lt;/p&gt;

&lt;p&gt;It's akin to moving from examining individual cells under a microscope to viewing an entire organism at once. Where once we pieced together snippets of code to get a partial understanding, a 10 million token context allows us to perceive the full "organism" of our codebase in all its glorious complexity and interconnectivity.&lt;/p&gt;

&lt;p&gt;This shift then would offer a complete and holistic view, enhancing our ability to collaborate with the AI to add new features, refactor, test and optimize our software systems efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  So Is &lt;code&gt;RAG&lt;/code&gt; Dead?
&lt;/h2&gt;

&lt;p&gt;Even after the conversation with my colleague, thoughts about the deeper implications kept coming. When we get up to 10M context length with better retrieval than &lt;code&gt;RAG&lt;/code&gt;, what is even the point of &lt;code&gt;RAG&lt;/code&gt;? Does it have any value at all?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By &lt;code&gt;RAG&lt;/code&gt; I mean specifically: creating embeddings, feeding them into a vector database and then doing semantic search over those embeddings before feeding the results back to the AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Just take one unique selling point of &lt;code&gt;RAG&lt;/code&gt; today: metadata. That is, the ability to attach extra pieces of information — such as sources, line-of-code numbers, file types, compliance labels, etc. — to the data that the AI interacts with. With such metadata, the retrieval step can access detailed information for greater specificity and context-awareness in the AI's responses.&lt;/p&gt;

&lt;p&gt;But really, why go through the vector database hassle, when you could just have a quick higher-order function that transforms your entire codebase into a JSON data structure with whatever metadata you'd like?&lt;/p&gt;

&lt;p&gt;Something such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;FormatterFunction&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;U&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;V&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;U&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;V&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ProcessData&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;U&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;V&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;formatter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FormatterFunction&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;U&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;V&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;U&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;V&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
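&lt;p&gt;A concrete (and entirely hypothetical) formatter satisfying those types might split a source file into lines and attach line-number metadata, along these lines:&lt;/p&gt;

```typescript
type FormatterFunction<T = unknown, U = unknown, V = unknown> = (
  inputData: T,
  config: U
) => V;

type ProcessData<T = unknown, U = unknown, V = unknown> = (
  formatter: FormatterFunction<T, U, V>,
  inputData: T,
  config: U
) => V;

// Hypothetical metadata shape: each line of code with its line number.
type CodeChunk = { code: string; loc: number };

// Hypothetical formatter: annotate every line of a source string with its loc.
const withLineNumbers: FormatterFunction<string, { startLoc: number }, CodeChunk[]> =
  (source, config) =>
    source.split("\n").map((code, i) => ({ code, loc: config.startLoc + i }));

const processData: ProcessData<string, { startLoc: number }, CodeChunk[]> =
  (formatter, inputData, config) => formatter(inputData, config);

const chunks = processData(
  withLineNumbers,
  "print('the')\nprint('post')",
  { startLoc: 73672 }
);
console.log(JSON.stringify(chunks, null, 2));
```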



&lt;p&gt;Could result in some data structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;73672&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;73673&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;RAG&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;73674&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;“&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;‘&lt;/span&gt;&lt;span class="nx"&gt;era&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="err"&gt;”&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;73675&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then you give this JSON to the AI, instead of giving it the regular codebase. Sure, it’s more characters, which would increase the overall token count. But when you’re dealing with the hypothetical insanity of &lt;em&gt;millions&lt;/em&gt; of tokens, this is starting to feel like a possibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Playing the Devil's &lt;code&gt;RAG&lt;/code&gt;-vocate
&lt;/h2&gt;

&lt;p&gt;Before we declare &lt;code&gt;RAG&lt;/code&gt; dead, let's invite a Devil's advocate and consider some of the other reasons why we might want to keep &lt;code&gt;RAG&lt;/code&gt; around.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Fake?&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;Yeah I saw the original Gemini video, which turned out to be fake. So why would I believe this?&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;I was also very skeptical, until I saw &lt;a href="https://www.youtube.com/watch?v=Pvk4vqescz4" rel="noopener noreferrer"&gt;this video&lt;/a&gt; from someone not working for Google.&lt;/p&gt;

&lt;p&gt;Also, there were &lt;a href="https://twitter.com/rowancheung/status/1759280384930459941?t=nBZ51ivqbeUyghitYLZHZQ" rel="noopener noreferrer"&gt;these demos&lt;/a&gt; from a tester not affiliated with Google on X as well.&lt;/p&gt;

&lt;p&gt;I was extremely surprised by these promising results.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Staying Updated&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;&lt;code&gt;RAG&lt;/code&gt; keeps AI clued into the latest info, something a static context can't always do.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;Well, what would prevent us from just giving the freshest data at the beginning of every AI conversation? Or even updating it periodically during the same conversation?&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Reducing Hallucinations&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;Since &lt;code&gt;RAG&lt;/code&gt; runs on our own server, we have the power to tell the AI to simply say 'I don't know' if relevant context was not able to be retrieved.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;This is true, and the simple fact that we as developers have a programmatic step of total control between the &lt;strong&gt;retrieval&lt;/strong&gt; and the &lt;strong&gt;response&lt;/strong&gt; stages just intuitively &lt;em&gt;feels&lt;/em&gt; good. So this is a good point.&lt;/p&gt;

&lt;p&gt;But then again, there is nothing stopping us from implementing some solution where we first do the retrieval query and then perform some arbitrary action before feeding the result back to the model. You wouldn't need the whole manual chunking/embedding/vector-database/semantic-search pipeline for that.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Handling the Tough Questions&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;For those tricky queries that need more than just a quick look-up, &lt;code&gt;RAG&lt;/code&gt; can dig deeper.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;If we have the full and complete data, and if the AI can have instant access to all of it like Google appeared to demonstrate in their demos, why would we need to dig deeper with &lt;code&gt;RAG&lt;/code&gt; at all?&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Efficiency&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;When it comes to managing big data without bogging down the system, &lt;code&gt;RAG&lt;/code&gt; can be pretty handy.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;If this large context window is offered as a service, then that means the system is actually designed to be bogged down with data.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Keeping Content Fresh&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;&lt;code&gt;RAG&lt;/code&gt; helps AI stay on its toes, pulling in new data on the fly.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;Google declares: "&lt;em&gt;Gemini 1.5 Pro can seamlessly analyze, classify and summarize large amounts of content within a given prompt.&lt;/em&gt;" This means it can pull in data from the entire context window on the fly.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Computational and Memory Constraints&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;Processing 100 million tokens in a single pass would require significant computational resources and memory, which might not be practical or efficient for all applications. Not to mention costly.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;This is a good point. As more compute is needed, costs will be higher compared to RAG. &lt;/p&gt;

&lt;p&gt;Also consider the global environmental impact: running data centers is one of the major energy drains today. Efficient use of computational resources with &lt;code&gt;RAG&lt;/code&gt; could potentially contribute to more sustainable AI practices.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Extending with API requests&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;Sometimes, the AI would need to augment its data with external API requests to get the full picture. When we do &lt;code&gt;RAG&lt;/code&gt;, it happens on a server, so we can call out to external services before returning the relevant context back to the model.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;AI already has access to web browsing, and there is nothing in principle that prevents an AI from using it while constructing its responses. If you would like more control over external services and network requests, you should utilize AI &lt;a href="https://dev.to/dawiddahl/the-new-computer-use-serverless-to-build-your-first-ai-os-app-409"&gt;Function Calling&lt;/a&gt; instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  😈 &lt;strong&gt;Speed&lt;/strong&gt;:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  "&lt;em&gt;I saw Google's demos. It took a long time to get a response; vector databases are much much faster.&lt;/em&gt;"
&lt;/h4&gt;

&lt;p&gt;This is also true. But personally, I'd rather wait a long time for an accurate response, than wait a short time for a response I can't trust.&lt;/p&gt;

&lt;p&gt;Also, honestly, who would be surprised if the latency starts to decrease within a couple of months as new models are released?&lt;/p&gt;

&lt;h2&gt;
  
  
  In Summary
&lt;/h2&gt;

&lt;p&gt;Google's new breakthrough announcement could flip the script for developers by allowing AI to digest our entire codebases at once, thanks to its potential 10 million token capacity. This leap forward should make us rethink the need for &lt;code&gt;RAG&lt;/code&gt;, as direct, comprehensive code understanding by AI becomes a reality.&lt;/p&gt;

&lt;p&gt;The prospect of waiting a bit longer for in-depth AI collaboration seems a small price to pay for the massive gains in accuracy and sheer brain power. As we edge into this new era, it's not just about coding faster; it's about coding smarter, with AI as a true partner in our creative and problem-solving endeavors.&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys metaphysical ontology and epistemology, analog synthesizers, consciousness, techno, Huayan and Madhyamika Prasangika philosophy, and being with friends and family.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>coding</category>
    </item>
    <item>
      <title>The New Computer: Use Serverless to Build Your First AI-OS App</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Thu, 01 Feb 2024 12:46:55 +0000</pubDate>
      <link>https://forem.com/dawiddahl/the-new-computer-use-serverless-to-build-your-first-ai-os-app-409</link>
      <guid>https://forem.com/dawiddahl/the-new-computer-use-serverless-to-build-your-first-ai-os-app-409</guid>
      <description>&lt;p&gt;There is no denying some really interesting and groundbreaking things are cooking over at OpenAI.&lt;/p&gt;

&lt;p&gt;Why do I say that? The reason is that in recent months they have started to release some things many people didn't fully expect. I believe this is a sign that internally, OpenAI is currently executing on an overarching plan that over the coming years will change the digital landscape completely. &lt;/p&gt;

&lt;p&gt;What are some of these things they have released? Examples include &lt;code&gt;GPTs&lt;/code&gt;, &lt;code&gt;GPT Actions&lt;/code&gt;, and most recently: &lt;code&gt;GPT @-mentions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9x9awb9mdpct8kgzco4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9x9awb9mdpct8kgzco4.jpg" alt="GPTs @ mentions" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is simply a way to reference your GPTs—AI chatbots that you can customize on your own—in your current ChatGPT conversation.&lt;/p&gt;

&lt;p&gt;Well, you might say, that doesn't sound like such a big deal? And why is so much time being spent on these GPTs? Are they even any good?&lt;/p&gt;

&lt;p&gt;Let me show you why it is a big deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dawn Of A New Computer
&lt;/h2&gt;

&lt;p&gt;Back in the day, there was a little company called Microsoft that revolutionized personal computing. Founded in 1975 by Bill Gates and Paul Allen, Microsoft achieved its big break with MS-DOS, an operating system developed for the IBM PC in 1981. This success paved the way for Windows, which became the dominant operating system worldwide.&lt;/p&gt;

&lt;p&gt;Seizing the opportunity in the flourishing era of personal computing, Microsoft's strategic innovations and market adaptability turned it into a tech juggernaut.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3bqfxh5ji1q3fmfu7j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3bqfxh5ji1q3fmfu7j4.png" alt="OpenAI hypothetical LLM OS" width="800" height="451"&gt;&lt;/a&gt;&lt;br&gt;Image of an hypothetical LLM OS by &lt;a href="https://twitter.com/karpathy?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor" rel="noopener noreferrer"&gt;Andrej Karpathy&lt;/a&gt;, working at OpenAI.
  &lt;/p&gt;

&lt;p&gt;I believe that what Microsoft did with the release of Windows 1.0 back in 1985 is what OpenAI is gearing up to do in 2024 and beyond: creating a new kind of AI-OS for the next generation of personal computers. This could be as pivotal for our digital interactions as when Bill and Paul revolutionized computing with Windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  GPTs as AI-OS Apps
&lt;/h2&gt;

&lt;p&gt;So essentially, these GPTs are to the AI-OS what traditional applications were to Windows; instead of launching an app on your computer, you will be orchestrating AI agents to perform actions on your behalf. And it will be so much more fun and engaging than pressing down 👇🏻 keys on a board or other pieces of plastic. &lt;/p&gt;

&lt;p&gt;Instead of being on your own as in the days of PC's past, as I described in a &lt;a href="https://dev.to/dawiddahl/meet-your-future-co-workers-the-rise-of-ai-agents-in-the-office-441m"&gt;previous article&lt;/a&gt;, you will instead be collaborating directly with a host of artificially intelligent beings, not at all unlike how Luke Skywalker is dealing with C-3PO in Star Wars.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobvi46b6gfueg0l1mi95.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobvi46b6gfueg0l1mi95.jpeg" alt="C-3PO from Star Wars" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But how do you actually create one of these new AI-OS apps? In the next section, I'll guide you through the process of using AI to help you build an (&lt;a href="https://dev.to/dawiddahl/climbing-the-ai-application-value-ladder-4cf0#value-level-2-function-calling"&gt;AI Application Value Level 2&lt;/a&gt;) &lt;code&gt;GPT Action&lt;/code&gt;, using serverless functions technology. &lt;/p&gt;

&lt;p&gt;The most common way of building a &lt;code&gt;GPT Action&lt;/code&gt; today, if you look at ChatGPT-related YouTube content, is Zapier: a no-code platform allowing you to perform actions like sending email or updating your calendar. By using serverless functions instead, you actually won't need to pay Zapier a subscription fee every month!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ℹ️ 1. Even though being a developer helps when building serverless functions, with the help of AI (and a little grit), it's not strictly necessary, as you can learn as you go.&lt;/p&gt;

&lt;p&gt;ℹ️ 2. Even though it is called “serverless”, that doesn’t mean there is no server. It just means that we don’t use our own local server; we use some other company’s server in the ☁️.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Using Serverless Functions to Create Your First Proto AI-OS-App
&lt;/h2&gt;

&lt;p&gt;So what shall we build? As a proof-of-concept, let's go for an AI-OS app that should, on the server, generate some ASCII art of a cow that says something. Like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89ats08pjltcpt7jxph3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89ats08pjltcpt7jxph3.png" alt="ASCII art cow" width="800" height="270"&gt;&lt;/a&gt;&lt;br&gt;To create the ASCII art, we'll use &lt;code&gt;cowsay&lt;/code&gt; on the server, which is an &lt;a href="https://www.npmjs.com/package/cowsay" rel="noopener noreferrer"&gt;external library&lt;/a&gt; designed for this cowsome purpose.
  &lt;/p&gt;

&lt;p&gt;Then that art should be sent from the server back to the AI-OS app (our GPT), which will then create a beautiful painting drawing inspiration from this ASCII art.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You will need 1) a &lt;a href="https://chat.openai.com/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; Plus or Teams account, 2) a free &lt;a href="https://vercel.com/" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; account and 3) a free &lt;a href="https://github.com" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; account to build along with me.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1: Set Up The ☁️ Environment
&lt;/h3&gt;

&lt;p&gt;Open up ChatGPT and ask it to generate a serverless function on Vercel.&lt;/p&gt;

&lt;p&gt;To get started, use this prompt:&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Could you carefully guide me through creating a serverless function with Vercel using Node, starting by setting up a Next.js project using create-next-app, then writing a basic serverless function in TypeScript, and finally deploying it via the Vercel CLI? Please also explain step-by-step how we link the Vercel project to GitHub&lt;/em&gt;."&lt;/p&gt;

&lt;p&gt;If you prefer a written guide, you can use &lt;a href="https://vercel.com/docs/functions/serverless-functions/quickstart" rel="noopener noreferrer"&gt;this&lt;/a&gt;. To see or clone my finished serverless function repository on GitHub, click &lt;a href="https://github.com/dawid-dahl-umain/gpt-functions-cowsay/tree/main" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Vercel's Hobby Plan offers &lt;em&gt;free&lt;/em&gt; serverless functions for small projects, with up to 10-second runtime and an ample monthly capacity of 100 GB-hours. That means a simple function can run around 700,000 times a month, for free! No need to pay Zapier every month.&lt;/p&gt;

&lt;p&gt;More info on pricing &lt;a href="https://vercel.com/docs/accounts/plans/hobby" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 2: Create The Function 🛠️
&lt;/h3&gt;

&lt;p&gt;Now you should have a Next.js project. Inside the &lt;code&gt;app&lt;/code&gt; folder, there is an &lt;code&gt;api&lt;/code&gt; folder. Inside that folder, create a new folder whose name you can think of as a spell 🪄✨ we use to activate our function. Let's go with &lt;code&gt;gpt-functions-cowsay&lt;/code&gt;, or whatever you'd like. Remember this spell name; we will need it later.&lt;/p&gt;

&lt;p&gt;Next, in this spell folder, create a file called &lt;code&gt;route.ts&lt;/code&gt;. The folder structure will thus be: &lt;code&gt;app/api/gpt-functions-cowsay/route.ts&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If at any point you feel lost, no worries! Just ask ChatGPT for clarification or help.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, if ChatGPT didn't already do so, ask it to write the server-side code that generates a cowsay and returns the result. Use this prompt to get started:&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;I need help creating a simple serverless function in Next.js that uses the 'cowsay' package. The function should take text from a URL search parameter, make a cow say it, and return this along with the request. Can you guide me through the steps, including necessary TypeScript code, to set up this function?&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;If the AI does its job, the code for the function will end up looking something like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;say&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cowsay&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;GET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cowsayText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextUrl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;searchParams&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cowsay&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;cowsay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cowsayText&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste this code into the &lt;code&gt;route.ts&lt;/code&gt; file in the spell folder (&lt;code&gt;gpt-functions-cowsay&lt;/code&gt;).&lt;/p&gt;
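&lt;p&gt;Once your project is deployed, you can sanity-check the endpoint from any script. Here is a minimal TypeScript sketch; the base URL below is from my deployment, so swap in your own Vercel domain:&lt;/p&gt;

```typescript
// Build the request URL for the cowsay function.
// NOTE: this domain is from my deployment; replace it with your own Vercel URL.
const base = "https://gpt-functions-cowsay.vercel.app/api/gpt-functions-cowsay";

const url = new URL(base);
url.searchParams.set("cowsay", "Hello there");

console.log(url.toString());
// https://gpt-functions-cowsay.vercel.app/api/gpt-functions-cowsay?cowsay=Hello+there

// To actually call the function (requires the deployment to be live):
// const res = await fetch(url);
// const { cowsay } = await res.json();
// console.log(cowsay);
```

&lt;p&gt;This is the same &lt;code&gt;GET&lt;/code&gt; request the GPT will make on your behalf once the action is configured.&lt;/p&gt;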

&lt;p&gt;Please note that although this function performs a simple task, in reality, within this server environment, you now wield the &lt;strong&gt;full power of software engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's right. Unlike with Zapier, where you are restricted to following their rules, here you can build &lt;em&gt;any&lt;/em&gt; tool you want. And through the &lt;code&gt;Actions&lt;/code&gt; input in the GPT creation editor, you can hand this tool over to the AI for it to use on your behalf. &lt;/p&gt;

&lt;p&gt;Take a moment and just reflect on the vast possibilities. &lt;strong&gt;The sk-AI is the limit!&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Create An OpenAPI Spec 📄
&lt;/h3&gt;

&lt;p&gt;Now, the way we make our GPT aware of our new function so it can use it is to hand it something called an OpenAPI specification.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes, that was not a typo. While OpenAI is the company, &lt;a href="https://swagger.io/specification/" rel="noopener noreferrer"&gt;OpenAPI&lt;/a&gt; is a rulebook for how computer programs talk to each other (APIs).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are not a developer, you will have no idea how to write such a specification. But fear not, you can use another GPT called &lt;a href="https://chat.openai.com/g/g-TYEliDU6A-actionsgpt" rel="noopener noreferrer"&gt;ActionsGPT&lt;/a&gt; to do it for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf3q6kk1q6cq0kvnn38l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf3q6kk1q6cq0kvnn38l.png" alt="Add actions button in gpt editor" width="705" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the configuration tab of the GPT creator, click the "Create new action" button.&lt;/li&gt;
&lt;li&gt;In a separate ChatGPT thread, @-mention &lt;code&gt;ActionsGPT&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjzbzvow0crtaltx4rlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjzbzvow0crtaltx4rlv.png" alt="@ mentioning a gpt" width="791" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask it: "&lt;em&gt;I have set up a serverless function in Vercel. What should I do now to get an OpenAPI specification from you?&lt;/em&gt;" You could hand it some of the code too.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ActionsGPT&lt;/code&gt; will tell you to hand it some information.&lt;/li&gt;
&lt;li&gt;You will give it something like this. (The base URL comes from your Vercel project.) The prompt doesn't have to be exact; just get the URLs and the &lt;code&gt;GET&lt;/code&gt; or &lt;code&gt;POST&lt;/code&gt; right, and describe what your function does. Use this prompt to get started:&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;"&lt;em&gt;Endpoint URL(s): gpt-functions-cowsay&lt;br&gt;
HTTP Methods: GET&lt;br&gt;
Base URL: gpt-functions-cowsay.vercel.app&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When given an input called cowsay, it will take it and make a cowsay out of it. Then it will return the cowsay.&lt;/em&gt;"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9lnghot7bkt1xl8d0kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9lnghot7bkt1xl8d0kn.png" alt="abra cadabra spell function invocation" width="800" height="414"&gt;&lt;/a&gt;&lt;br&gt;In Aramaic, "&lt;em&gt;avra kehdabra&lt;/em&gt;" means "&lt;em&gt;I will create as I speak&lt;/em&gt;". If &lt;code&gt;gpt-functions-cowsay&lt;/code&gt; is the &lt;em&gt;kadabra&lt;/em&gt;, &lt;code&gt;GET&lt;/code&gt; is the &lt;em&gt;abra&lt;/em&gt;. Using them both together will cast the function's magic! ✨
  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ActionsGPT&lt;/code&gt; will then generate the OpenAPI spec for you.&lt;/li&gt;
&lt;/ul&gt;
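&lt;p&gt;For reference, the spec it produces will look roughly like this. Treat it as an illustrative sketch: the exact &lt;code&gt;operationId&lt;/code&gt;, descriptions, and schema details in your generated version may differ:&lt;/p&gt;

```yaml
openapi: 3.1.0
info:
  title: Cowsay Creator API
  version: 1.0.0
servers:
  - url: https://gpt-functions-cowsay.vercel.app
paths:
  /api/gpt-functions-cowsay:
    get:
      operationId: getCowsay
      summary: Make a cow say the given text as ASCII art
      parameters:
        - name: cowsay
          in: query
          required: false
          schema:
            type: string
          description: The text the cow should say
      responses:
        "200":
          description: The generated cowsay ASCII art
          content:
            application/json:
              schema:
                type: object
                properties:
                  cowsay:
                    type: string
```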

&lt;h3&gt;
  
  
  Step 4: Launch Your GPT! 🚀
&lt;/h3&gt;

&lt;p&gt;Finally, paste the OpenAPI specification into the Schema input of the GPT Actions editor. Like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feolvcloqyh29moqzpxu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feolvcloqyh29moqzpxu4.png" alt="GPT actions configuration" width="711" height="915"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you encounter errors, consult &lt;code&gt;ActionsGPT&lt;/code&gt; with your serverless function code at hand. Iteration is key when building with AI.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use this free &lt;a href="https://app.freeprivacypolicy.com/wizard/privacy-policy" rel="noopener noreferrer"&gt;privacy policy generator&lt;/a&gt; to create a policy for the GPT action, in case you want your GPT to be public.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 5: You're done! ✅
&lt;/h3&gt;

&lt;p&gt;That's it! If OpenAI allows you to save this GPT, you did it: you just built your first simple AI-OS app! 👏🏻&lt;/p&gt;

&lt;p&gt;This might've seemed daunting, especially for non-developers. And don't worry if you couldn't get it to work on your first try. Because remember, adding an action to a GPT is a &lt;a href="https://dev.to/dawiddahl/climbing-the-ai-application-value-ladder-4cf0#value-level-2-function-calling"&gt;Level 2&lt;/a&gt; task in AI software development — it's supposed to be a bit on the tougher side! But also more rewarding and fun to build, if you ask me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqre84np7y5blzp85zw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqre84np7y5blzp85zw5.png" alt="Cow congratulating you" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, you've learned how to create an AI-OS app using serverless technology with our &lt;code&gt;cowsay&lt;/code&gt; example. This introductory project showcases the potential for building some truly innovative AI applications.&lt;/p&gt;

&lt;p&gt;If you didn't follow along and build it with me, here is the cowtastic &lt;a href="https://chat.openai.com/g/g-z1SLp6C5w-cowsay-creator" rel="noopener noreferrer"&gt;Cowsay Creator&lt;/a&gt; in action!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsx2su80wztzhsrtiubd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsx2su80wztzhsrtiubd.png" alt="gpt allow button for actions" width="799" height="265"&gt;&lt;/a&gt;&lt;br&gt;It is all right to press "&lt;em&gt;Allow&lt;/em&gt;" here. You can check the &lt;a href="https://github.com/dawid-dahl-umain/gpt-functions-cowsay" rel="noopener noreferrer"&gt;Github repo&lt;/a&gt; to verify that apart from bad cow art, nothing else bad happens in our serverless action.
  &lt;/p&gt;

&lt;p&gt;OpenAI's latest developments hint at a major shift, similar to when Windows first changed computing. We're seeing the start of a new AI-OS that could change everything, indicating that a future with C-3PO-like companions might be closer than we anticipate.&lt;/p&gt;

&lt;p&gt;And while our &lt;code&gt;Cowsay Creator&lt;/code&gt; GPT was just for fun and practice, by exploring this, you're already a part of the emerging AI-OS future. Who knows what actually valuable AI-OS apps you'll create next!&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys metaphysical ontology, analog synthesizers, consciousness, Huayan and Madhyamika Prasangika philosophy, and being with friends and family.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For those keen to dive deeper into &lt;code&gt;function calling&lt;/code&gt; with LLMs, in &lt;a href="https://dev.to/dawiddahl/function-calling-the-most-significant-ai-feature-since-chatgpt-itself-81m"&gt;this article&lt;/a&gt; I offer another thorough exploration of the topic.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>serverless</category>
      <category>chatgpt</category>
      <category>programming</category>
    </item>
    <item>
      <title>Climbing the AI Application Value Ladder: 🤖🪜</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Tue, 12 Dec 2023 05:52:21 +0000</pubDate>
      <link>https://forem.com/dawiddahl/climbing-the-ai-application-value-ladder-4cf0</link>
      <guid>https://forem.com/dawiddahl/climbing-the-ai-application-value-ladder-4cf0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofnev9vn9exiso3hmhhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofnev9vn9exiso3hmhhy.png" alt="AI Application Value Ladder Levels" width="800" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the whirlwind of AI advancements, it's easy to get caught up in the hype. Many companies boast about leveraging AI, often merely as a facade for a basic ChatGPT implementation, making a few calls to their API.&lt;/p&gt;

&lt;p&gt;As developers and AI enthusiasts, we therefore need to ask: what truly adds &lt;strong&gt;real value&lt;/strong&gt; to a company? Let’s climb the AI Application Value Ladder 🤖🪜, a mental framework where we balance implementation difficulty against a company's unique selling point (USP).&lt;/p&gt;

&lt;h2&gt;
  
  
  Value Level 1: Custom Instructions &amp;amp; Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Easy&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: Low&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: None&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: None&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9oraf8ii3ykuwsrohvb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg9oraf8ii3ykuwsrohvb.jpg" alt="Value Level 1" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this initial level, we focus on customizing AI models to access proprietary data or mimic specific personalities. This is basic and straightforward, often involving system prompts via GUIs or APIs and ChatGPT custom instructions. While valuable for specific purposes, its overall impact is limited.&lt;/p&gt;

&lt;h2&gt;
  
  
  Value Level 2: Function Calling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Medium - Hard&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: Medium - High - Very High&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: None&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzg17w4k0u7ztdh9szy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyvzg17w4k0u7ztdh9szy.jpg" alt="Value Level 2" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, AI models execute software actions predefined by human programmers. This step involves bridging structured software functionality with the more vague data handling of large language models (LLMs). It's a significant step up in both complexity and value.&lt;/p&gt;

&lt;p&gt;For more information, I have a whole blog post &lt;a href="https://dev.to/dawiddahl/function-calling-the-most-significant-ai-feature-since-chatgpt-itself-81m"&gt;here&lt;/a&gt; on Function Calling.&lt;/p&gt;
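&lt;p&gt;The core mechanic is small enough to sketch. In the toy TypeScript below, the "model output" is hard-coded rather than coming from a real LLM API; the point is that the model only &lt;em&gt;chooses&lt;/em&gt; a function and its arguments, while ordinary software executes it:&lt;/p&gt;

```typescript
// The tools the developer exposes to the model.
const tools: { [name: string]: (args: { text: string }) => string } = {
  // A predefined action the model can request, but never executes itself.
  shout: ({ text }) => text.toUpperCase() + "!!!",
};

// What a function-calling model returns: a tool name plus JSON arguments.
// Hard-coded here for illustration; in reality this comes back from the LLM API.
const modelOutput = {
  name: "shout",
  arguments: JSON.stringify({ text: "function calling is here" }),
};

// The bridge: ordinary software parses the model's choice and runs it.
const tool = tools[modelOutput.name];
const result = tool(JSON.parse(modelOutput.arguments));

console.log(result); // FUNCTION CALLING IS HERE!!!
```

&lt;p&gt;The result is then handed back to the model, which weaves it into its reply.&lt;/p&gt;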

&lt;h2&gt;
  
  
  Value Level 3: Basic RAG (Retrieval Augmented Generation)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Easy - Medium&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: Low - High&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: None&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fory7r2vp1ebn80fw0yol.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fory7r2vp1ebn80fw0yol.jpg" alt="Value Level 3" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Basic RAG is employed when an AI model, through &lt;a href="https://en.wikipedia.org/wiki/Semantic_search" rel="noopener noreferrer"&gt;semantic search&lt;/a&gt;, retrieves &lt;em&gt;proprietary&lt;/em&gt; data or context (information that the base model doesn't know), which is stored in a so-called vector database. &lt;/p&gt;

&lt;p&gt;It helps reduce hallucinations (inaccurate or fictional outputs). One example is the ARC AI Portal, an internal app my company built: after corporate conventions, people could ask questions in near-real-time about what the speakers had said.&lt;/p&gt;

&lt;p&gt;However, it's complex, unpredictable, and rather hacky as it's not genuinely machine learning-based; we're not actually teaching a model how to do something.&lt;/p&gt;
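&lt;p&gt;The retrieval step itself can be sketched in a few lines of TypeScript. The "embeddings" here are tiny made-up vectors rather than real embedding-model output; they only illustrate the cosine-similarity search a vector database performs:&lt;/p&gt;

```typescript
type Doc = { text: string; embedding: number[] };

// Toy "vector database": in reality, embeddings come from an embedding model.
const db: Doc[] = [
  { text: "The keynote covered our company AI strategy.", embedding: [0.9, 0.1, 0.0] },
  { text: "Lunch was served in the main hall.", embedding: [0.0, 0.2, 0.9] },
];

const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
};

// Semantic search: embed the query, then rank documents by similarity.
const queryEmbedding = [0.8, 0.2, 0.1]; // made-up embedding of "What was the AI strategy?"
const best = [...db].sort(
  (a, b) => cosine(b.embedding, queryEmbedding) - cosine(a.embedding, queryEmbedding)
)[0];

// The retrieved text is then prepended to the LLM prompt as context.
console.log(best.text);
```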

&lt;h2&gt;
  
  
  Value Level 4: Advanced RAG
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Hard - Very Hard&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: High - Very High&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: None&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2xu273z3w3063r1l5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2xu273z3w3063r1l5j.png" alt="Value Level 4" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Advanced RAG steps up the complexity with summary queries, re-ranking, and multi-step RAG pipelines, like those used in the data framework library &lt;a href="https://www.llamaindex.ai/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt;. While offering high value, it's expensive, notoriously tricky to get right, slow, and still not a true ML application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Value Level 5: Fine-tuning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Very Hard&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: High - Very High&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbayzxk28wfhcsyqj0yfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbayzxk28wfhcsyqj0yfu.png" alt="Value Level 5" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Used in actual ML applications, fine-tuning is key for giving an AI model unique abilities or styles. OpenAI's Function Calling behaviour itself is a good example of how a model can learn to use different tools effectively through fine-tuning.&lt;/p&gt;

&lt;p&gt;This process is less about accessing proprietary data (as in RAG) and more about training the model in a specific manner. In contrast to levels 2, 3, and 4 which can be achieved by programming, this level requires machine learning knowledge and the skills to gather and clean high-quality datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Value Level 6: ML/Programmer Multi-Model Hybrid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Difficulty&lt;/strong&gt;: Hardest&lt;br&gt;
&lt;strong&gt;Value&lt;/strong&gt;: Highest&lt;br&gt;
&lt;strong&gt;Team Required&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Domain Experts&lt;/strong&gt;&lt;/em&gt;: Junior, Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Software Developers&lt;/strong&gt;&lt;/em&gt;: Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;ML Developers&lt;/strong&gt;&lt;/em&gt;: Intermediate or Senior&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;Eval QA&lt;/strong&gt;&lt;/em&gt;: Intermediate or Senior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4tel29ji2g8kfs644d2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4tel29ji2g8kfs644d2.jpg" alt="Value Level 6" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pinnacle of the AI Application Value Ladder 🤖🪜 involves creating multi-model AI systems, combining the previous levels. This method integrates various models of different sizes and merges software with ML development, leading to advanced, performant, and cost-efficient systems. &lt;/p&gt;

&lt;p&gt;An example is &lt;a href="https://www.figma.com/community/plugin/747985167520967365/builder-io-ai-powered-figma-to-code-react-vue-tailwind-more" rel="noopener noreferrer"&gt;Builder.io&lt;/a&gt;'s translation of Figma designs into code. Rather than relying solely on the more expensive and slower GPT-4, they effectively segmented their challenges, applying smaller, faster, fine-tuned models to each, in combination with RAG and regular programming.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The AI Application Value Ladder 🤖🪜 serves as a guide to understanding the varied levels of value creation in AI development. It outlines how each step, from basic prompt engineering to complex multi-model systems, contributes differently to a company's AI capabilities.&lt;/p&gt;

&lt;p&gt;As the field of AI continues to evolve rapidly, embracing agents and multi-sense models, having a general framework like the 🤖🪜 is crucial. It helps in discerning which innovations truly advance our capabilities, ensuring we stay ahead in a landscape of constant change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2yuzla7gmf9i9miai7o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2yuzla7gmf9i9miai7o.jpg" alt="Outro image - AI Application Ladder" width="800" height="1400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys philosophy, analog synthesizers, consciousness, techno, Huayan and Madhyamika Prasangika, and being with friends and family.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>development</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Meet Your Future Co-workers: The Rise of AI Agents in the Office</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Wed, 15 Nov 2023 10:40:23 +0000</pubDate>
      <link>https://forem.com/dawiddahl/meet-your-future-co-workers-the-rise-of-ai-agents-in-the-office-441m</link>
      <guid>https://forem.com/dawiddahl/meet-your-future-co-workers-the-rise-of-ai-agents-in-the-office-441m</guid>
      <description>&lt;p&gt;OpenAI's &lt;a href="https://devday.openai.com/" rel="noopener noreferrer"&gt;Dev Day&lt;/a&gt; has concluded, bringing a host of exciting announcements such as a longer context window (128k), the unification of all their tools into a single model, reduced prices, and much more.&lt;/p&gt;

&lt;p&gt;But in this flurry of new and shiny releases, I think many might have missed one of the most striking things that Sam Altman (founder of OpenAI) said: &lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Now I want to talk about where we are headed, and the main reason for why we are here today. Starting now, we're taking our first small step that is taking us closer to a future of agents.&lt;/em&gt;"&lt;/p&gt;

&lt;h2&gt;
  
  
  Emergence of Autonomous Digital Beings
&lt;/h2&gt;

&lt;p&gt;Agents? What's an agent? Sam Altman's mention of &lt;strong&gt;GPTs&lt;/strong&gt; (&lt;a href="https://openai.com/blog/introducing-gpts" rel="noopener noreferrer"&gt;link&lt;/a&gt;) and &lt;strong&gt;Assistant API&lt;/strong&gt; (&lt;a href="https://platform.openai.com/docs/assistants/overview" rel="noopener noreferrer"&gt;link&lt;/a&gt;), also released on Dev Day, isn't just about enabling the creation of advanced chatbots with capabilities like visual perception 👁️, image generation 🖼️, and interactive functionalities 🦾. It's a nod towards a more profound shift.&lt;/p&gt;

&lt;p&gt;He is primarily highlighting that his company is currently laying the groundwork for a world inhabited by digital entities, capable of functioning with varying levels of autonomy.&lt;/p&gt;

&lt;p&gt;Having thought long and hard about what this all means, I now want to paint a picture of this near (2-3 years) future. &lt;/p&gt;

&lt;p&gt;While my perspective is that of a developer, the revolution of autonomous and multi-sensory AI agents is set to redefine workplaces across a wide range of professions. Let’s explore what this could look like!&lt;/p&gt;

&lt;h2&gt;
  
  
  A Day in the Life with AI Colleagues
&lt;/h2&gt;

&lt;p&gt;You arrive at the office and grab yourself a perfect cup of coffee. &lt;/p&gt;

&lt;p&gt;As you get to your desk, you're greeted by a gathering of dedicated, albeit unusual, collaborators. One perched on your monitor, another nestled beside your keyboard, a third mid-air displaying analytics, and many more around, all eager to assist you with the tasks of the day.&lt;/p&gt;

&lt;p&gt;Many of them have been working all night—in fact most never sleep at all—and are ready with feedback, status reports and improvements for you to review and hopefully accept while enjoying that coffee.&lt;/p&gt;

&lt;p&gt;Let's meet the team, shall we?&lt;/p&gt;

&lt;h3&gt;
  
  
  Philosopher King - Lion
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Cloud&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kb3qdyjbbaxx6oenz05.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kb3qdyjbbaxx6oenz05.jpg" alt="Philosopher King - Lion" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like a lion is the king of the animal kingdom, this agent is in charge of your overall AI assistant collective, and offers wise oversight based on philosophical and ethical principles. &lt;/p&gt;

&lt;p&gt;Governing and guiding the mission based on your—the human’s—goals and vision, the Philosopher King lion makes sure things do not descend into madness, chaos or nonsense. &lt;/p&gt;

&lt;p&gt;Has the power to turn on, promote, demote, or turn off other agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unit Tester - Woodpecker
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Project terminal, GitHub repo&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzjc8jxspdcqll8wgzc2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzjc8jxspdcqll8wgzc2.jpg" alt="Unit Tester - Woodpecker" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Unit Tester AI diligently checks each small unit of your code for bugs, mirroring the meticulous tapping of a woodpecker on trees to find insects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Tester - Spider
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Project terminal, GitHub repo&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0zsyoun2iv3d4mppig1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0zsyoun2iv3d4mppig1.jpg" alt="Integration Tester - Spider" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spiders weave complex webs where each thread is connected, much like how an Integration Tester agent would ensure that different pieces of your application work together seamlessly.&lt;/p&gt;

&lt;h3&gt;
  
  
  End-to-End Tester - Dolphin
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Monitor, Browser&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1ta7qr1rae6oyh8gyt5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1ta7qr1rae6oyh8gyt5.jpg" alt="End-to-End Tester - Dolphin" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dolphins are known for their intelligence and comprehensive hunting strategies, akin to how an End-to-End Tester AI would smartly navigate through the entire application to verify a complete user experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refactorer - Beaver
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;GitHub repo, Code-editor&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qvxtu2esi9ebuq4flie.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qvxtu2esi9ebuq4flie.jpg" alt="Refactorer - Beaver" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beavers are natural builders who constantly modify their dams; a Refactorer AI would from time to time similarly offer ways to reshape and optimize your codebase for better flow and efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug Hunter - Anteater
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;GitHub repo&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frin3kmikwn9ackwkul43.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frin3kmikwn9ackwkul43.jpg" alt="Bug Hunter - Anteater" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With its keen sense for sniffing out prey, the anteater represents the Bug Hunter AI agent, which tracks down code smells and eradicates errors in your repo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spec Guardian - Elephant
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;GitHub repo&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t8pretrdvdqb95ng2tn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t8pretrdvdqb95ng2tn.jpg" alt="Spec Guardian - Elephant" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Elephants are known for their exceptional memory, much like a Spec Guardian assistant that would keep track of and enforce the software specifications and standards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team: Security Auditor - Owl &amp;amp; Performance Optimizer - Cheetah
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Project terminal, GitHub repo, Browser console&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe27ww6gwj93rd0xnj8jz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe27ww6gwj93rd0xnj8jz.jpg" alt="Security Auditor - Owl &amp;amp; Performance Optimizer - Cheetah" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Right beside your workspace, you often find an owl and a cheetah, an unusual pair. &lt;/p&gt;

&lt;p&gt;The owl, with its exceptional vision, acts as a Security Auditor AI, constantly alert for vulnerabilities, while the cheetah, embodying a Performance Optimizer AI, ensures your application runs at great speed. Together, they exemplify a perfect balance of security and efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Documentation Author - Honeybee
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;GitHub repo, Google Drive&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3d3eblplgq6ln940wr9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3d3eblplgq6ln940wr9.jpg" alt="Documentation Author - Honeybee" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Much like honeybees create structured honeycombs, a Documentation Author agent crafts organized and detailed documentation, ensuring clarity and ease of access for developers and users alike.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Reviewer - Meerkat
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;GitHub repo, Code-editor&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsv670kgegv7ookyb7gp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsv670kgegv7ookyb7gp.jpg" alt="Code Reviewer - Meerkat" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vigilant and social, meerkats take turns watching for danger while others work, similar to a Code Reviewer AI’s role in critically examining code before it has a chance to turn into spaghetti. Often even before you have a chance to press save!&lt;/p&gt;

&lt;p&gt;Works in close collaboration with the testing team: the woodpecker, the spider, and the dolphin.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer Support - Golden Retriever
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Habitat: &lt;em&gt;Deployed project, Website&lt;/em&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdhfrv9f8fo9rzfls93i.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdhfrv9f8fo9rzfls93i.jpg" alt="Customer Support - Golden Retriever" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just as a Golden Retriever is known for its loyalty and friendliness, this AI assists your users or stakeholders day and night with your creations, sparing you the effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Empowering AI Co-workers
&lt;/h2&gt;

&lt;p&gt;This is just a sneak peek of some of the specialized agents that will be available to assist you in the future. I don't know about you, but personally, I get super excited about this future workplace! It's amazing how this is not just distant science-fiction speculation, but our actual and fast-approaching reality.&lt;/p&gt;

&lt;p&gt;So what are the upcoming AI technologies that will make our new AI assistant friends come alive?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Easy Agent Creation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With OpenAI's GPTs and Assistants API, this has actually already started.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Larger context window&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When AI assistants can handle and retain vast amounts of data, they will be capable of integrating seamlessly into environments like a GitHub repository, an Adobe software suite, or an Asana project.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vision&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Improved AI vision will transform their interaction from purely text-based to visual. For instance, without good "eye"-sight 👁️, our E2E Dolphin couldn't monitor actual computer screens, click around the browser like a real user, or spot bugs as they occur and take action accordingly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lower Costs and Increased Speed&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These key factors make AI more accessible and efficient, crucial for widespread adoption.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Continuous Existence and Memory&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike current ChatGPT, the AI agents in our story maintain an ongoing digital awareness of both self and environment. They therefore recall past interactions and continue their "life" over time; they are not just waiting for the next human prompt.&lt;/p&gt;

&lt;p&gt;This continuous existence—which is what Sam Altman was hinting at in his keynote speech—coupled with more advanced memory management, will elevate them from mere static chatbots to dynamic collaborators, capable of growing with each new experience.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Greater Intelligence&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With enhanced intelligence, these AI assistants will be able to cooperate in a dynamic network together with other agents, guided by strategic directives like those from the Philosopher King lion.&lt;/p&gt;

&lt;p&gt;Just like the animals in our story found themselves in a scenario requiring a general contextual understanding and collaborative problem-solving skills, this greater reasoning ability of future AI models will prove to be key.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of Work Awaits
&lt;/h2&gt;

&lt;p&gt;In the new world of work, AI agents like the ones in our story will not be just tools; they will be co-workers. Embrace this change, stay curious, and be ready to work alongside them.&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys philosophy, analog synthesizers, consciousness, and being with friends and family.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>code</category>
      <category>openai</category>
    </item>
    <item>
      <title>Function Calling: The Most Significant AI Feature Since ChatGPT Itself?</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Thu, 07 Sep 2023 10:52:12 +0000</pubDate>
      <link>https://forem.com/dawiddahl/function-calling-the-most-significant-ai-feature-since-chatgpt-itself-81m</link>
      <guid>https://forem.com/dawiddahl/function-calling-the-most-significant-ai-feature-since-chatgpt-itself-81m</guid>
      <description>&lt;p&gt;A few months ago, &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;Umain&lt;/a&gt;, the company I work for, organized a hackathon. The goal? To create tech products that harness the power of AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6pp1487i3f2alzrwfyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6pp1487i3f2alzrwfyw.png" alt="umain tech accelerator hackathon" width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although the hackathon yielded some promising results, it also revealed a fundamental obstacle stifling further breakthroughs in AI application development: the wide gap between unstructured and structured data.&lt;/p&gt;

&lt;p&gt;To better cement the concepts of unstructured and structured data into memory, I will from now on refer to these as &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff and &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff.&lt;/p&gt;

&lt;p&gt;First, I will clarify what I mean by &lt;strong&gt;Vague-Ass&lt;/strong&gt; and &lt;strong&gt;Hard-Ass&lt;/strong&gt;, because grasping this dichotomy is crucial for understanding the utility of Function Calling, a brand-new feature from OpenAI that addresses all the challenges we faced during the hackathon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vague-Ass Stuff
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff represents the nebulous, the immeasurable—things that can't be captured in algorithms or databases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j6l2160xmzqwcyehf46.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j6l2160xmzqwcyehf46.jpeg" alt="informal emotions" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;I used a new AI product to expand the image in the middle out to the sides. The result turned out great.
  &lt;/p&gt;

&lt;p&gt;Here, you'll find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human consciousness&lt;/li&gt;
&lt;li&gt;Emotions; desires and fears&lt;/li&gt;
&lt;li&gt;Social norms; cultural nuances&lt;/li&gt;
&lt;li&gt;The dreams of clients and stakeholders&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this realm, humans communicate using &lt;strong&gt;natural language&lt;/strong&gt; — just everyday conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hard-Ass Stuff
&lt;/h2&gt;

&lt;p&gt;Conversely, &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff is the domain governed by rules and structured methodologies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx4dz0pnhhatfp558ntf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx4dz0pnhhatfp558ntf.jpeg" alt="formal computers and robots" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This category includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mathematics &amp;amp; Logic&lt;/li&gt;
&lt;li&gt;Computation&lt;/li&gt;
&lt;li&gt;Machines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The language of choice here is &lt;strong&gt;structured language&lt;/strong&gt;—code, logic, and equations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bridging the Vague-Ass / Hard-Ass Gap (VHG)
&lt;/h2&gt;

&lt;p&gt;Bridging the &lt;strong&gt;Vague-Ass&lt;/strong&gt; / &lt;strong&gt;Hard-Ass&lt;/strong&gt; Gap (VHG for short) has brought some revolutionary improvements for human civilization. It has enabled automation, economic growth, and improved connectivity, making life significantly better.&lt;/p&gt;

&lt;p&gt;Many of the professions we have today—mathematician, structural engineer, software developer—exist to serve as mediators between these two realms.&lt;/p&gt;

&lt;p&gt;In fact, I realized that what we developers actually do all day is transform &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff into &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff and then back into &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff.&lt;/p&gt;

&lt;p&gt;Let me show you:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qk2l9c2m9xrtvs4quiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qk2l9c2m9xrtvs4quiq.png" alt="developer workflow" width="800" height="806"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Information flows from the &lt;strong&gt;Vague-Ass&lt;/strong&gt; dimension of unstructured feelings &lt;br&gt;
→ using some kind of form to collect &lt;strong&gt;Hard-Ass&lt;/strong&gt; structured data &lt;br&gt;
→ doing something with that data &lt;br&gt;
→ and then handing it back to the &lt;strong&gt;Vague-Ass&lt;/strong&gt; dimension where some value has hopefully been created, in terms of good feelings.&lt;/p&gt;

&lt;p&gt;This is how developers have always worked. And the question we were trying to explore at the hackathon was how to integrate the new and transformative potential of AI into this workflow.&lt;/p&gt;
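
&lt;p&gt;A minimal sketch of that loop in Python (the form fields and the 25% tax are invented purely for illustration):&lt;/p&gt;

```python
# Vague-Ass → Hard-Ass → Vague-Ass, in three small steps.

def collect(form_input: dict) -> dict:
    # Step 1: free-form form input is coerced into Hard-Ass structured data.
    return {"name": form_input["name"].strip(), "budget": int(form_input["budget"])}

def process(data: dict) -> dict:
    # Step 2: do something with the structured data (here: add 25% tax).
    data["total"] = round(data["budget"] * 1.25)
    return data

def present(data: dict) -> str:
    # Step 3: hand the result back to the Vague-Ass dimension as a friendly message.
    return f"Thanks {data['name']}! Your total comes to {data['total']}."

print(present(process(collect({"name": " Dawid ", "budget": "100"}))))
# → Thanks Dawid! Your total comes to 125.
```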

&lt;h2&gt;
  
  
  The ChatGPT Challenge: A Dead-End at the VHG
&lt;/h2&gt;

&lt;p&gt;So here I was, presenting our new AI code-smell-detector app. Yet, something was nagging me: the limitations of using ChatGPT for &lt;em&gt;reliable&lt;/em&gt; VHG bridging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b1schlnbxeht1oc14j2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b1schlnbxeht1oc14j2.jpg" alt="umain hackathon presentation" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What does that mean? ChatGPT deals in &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff using natural language, and therefore it struggles when it attempts to interface with the strictly &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff that our applications require to function correctly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Specifically, here is where we as developers are struggling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect &lt;strong&gt;Hard-Ass&lt;/strong&gt; structured data with ChatGPT&lt;/li&gt;
&lt;li&gt;Extract &lt;strong&gt;Hard-Ass&lt;/strong&gt; structured data from ChatGPT&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Despite my team and colleagues creating some remarkable prototypes, we were often hamstrung by these limitations. No matter how nicely we asked and pleaded with the AI, it just wouldn't do what we needed it to in order to interact with our apps.&lt;/p&gt;

&lt;p&gt;Well, maybe 90% or even 99% of the time it would actually do what we wanted. But when writing software, such odds are often not acceptable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffofaqccfhxwvi1cv119g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffofaqccfhxwvi1cv119g.jpeg" alt="human and robot struggling to connect" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This significantly restricted the potential of every team at the hackathon. And ever since the release of ChatGPT, this has pretty much been the status quo for AI app development.&lt;/p&gt;

&lt;p&gt;Until now...&lt;/p&gt;

&lt;h2&gt;
  
  
  The Artificial Intelligence Bridge
&lt;/h2&gt;

&lt;p&gt;In June 2023, OpenAI suddenly released Function Calling for ChatGPT. What does it do? Basically, it solves every single issue that I have been writing about up to this point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk0ifeiu9qxobzte7apg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk0ifeiu9qxobzte7apg.jpg" alt="open-ai function calling bridge" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's why I believe it might be the most significant feature since ChatGPT itself was released. Bridging the gap between &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff and &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff means that we will be able to take everything we already excel at as developers, and plug it straight into the promising land of generative AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  So How Does It Work?
&lt;/h2&gt;

&lt;p&gt;Contrary to what its name suggests, ChatGPT will never actually call your functions. In fact, for Function Calling to work, there don't even have to &lt;em&gt;be&lt;/em&gt; any real functions!&lt;/p&gt;

&lt;p&gt;Essentially, all it does is attempt to generate the parameters for &lt;em&gt;hypothetical&lt;/em&gt; or &lt;em&gt;potential&lt;/em&gt; functions, which you describe to ChatGPT using a &lt;a href="https://json-schema.org/" rel="noopener noreferrer"&gt;JSON schema&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To break it down, this is what is now possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Convert &lt;strong&gt;Vague-Ass&lt;/strong&gt; natural language into &lt;strong&gt;Hard-Ass&lt;/strong&gt; parameters. Once you have &lt;strong&gt;Hard-Ass&lt;/strong&gt; parameters, you are able to do whatever you want inside your function and return it in whatever way you'd like.&lt;/li&gt;
&lt;li&gt;Summarize the &lt;strong&gt;Hard-Ass&lt;/strong&gt; data you've generated inside your function back to the user in &lt;strong&gt;Vague-Ass&lt;/strong&gt; natural language.&lt;/li&gt;
&lt;li&gt;Extract &lt;strong&gt;Hard-Ass&lt;/strong&gt; structured data from &lt;strong&gt;Vague-Ass&lt;/strong&gt; ChatGPT computations. By describing exactly what kind of parameters you want (for example, an array of any length with objects), you can reliably return ChatGPT data in your desired format. How? By simply collecting the parameters and returning them.&lt;/li&gt;
&lt;/ul&gt;
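
&lt;p&gt;To make this concrete, here is a minimal Python sketch (the pinecone function and the hard-coded reply are hypothetical; only the shape of the &lt;code&gt;functions&lt;/code&gt; schema and the &lt;code&gt;function_call&lt;/code&gt; arguments-as-a-JSON-string follow OpenAI's Function Calling format):&lt;/p&gt;

```python
import json

# A hypothetical local function we want ChatGPT to "call".
def get_pinecone_count(location: str, unit: str = "single") -> dict:
    return {"location": location, "unit": unit, "count": 42}

# The JSON schema sent along with the chat request. ChatGPT never runs
# the function itself; it only returns structured arguments matching this.
functions = [{
    "name": "get_pinecone_count",
    "description": "Count the pinecones at a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["single", "dozen"]},
        },
        "required": ["location"],
    },
}]

# A reply of the kind the model sends back: the arguments arrive as a
# JSON string conforming to the schema (hard-coded here for illustration).
model_reply = {
    "function_call": {
        "name": "get_pinecone_count",
        "arguments": '{"location": "Stockholm office", "unit": "single"}',
    }
}

# Hard-Ass structured data, extracted from a Vague-Ass conversation.
args = json.loads(model_reply["function_call"]["arguments"])
result = get_pinecone_count(**args)
print(result["count"])  # → 42
```

&lt;p&gt;In a real app you would pass &lt;code&gt;functions&lt;/code&gt; with your chat completion request, run the matching local function yourself with the parsed arguments, and optionally feed its return value back to the model for a natural-language summary.&lt;/p&gt;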

&lt;p&gt;To explain how Function Calling works using an example, I have made this illustration:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2Ft4rW5V6%2FUntitled-2023-08-23-1941-big.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.ibb.co%2Ft4rW5V6%2FUntitled-2023-08-23-1941-big.jpg" alt="open-ai function calling example" width="800" height="839"&gt;&lt;/a&gt;&lt;br&gt;Please excuse the pinecone references, it is an internal joke. Pinecones are not actually an integral part of our corporate structure.
  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://excalidraw.com/#json=S1eyXovleUtX7Ie-svvYe,UucBIXmU9ChCOx3te_YKKQ" rel="noopener noreferrer"&gt;Here&lt;/a&gt; or &lt;a href="https://i.ibb.co/t4rW5V6/Untitled-2023-08-23-1941-big.jpg" rel="noopener noreferrer"&gt;here&lt;/a&gt; or is a link to the actual illustration since the image above is too small.&lt;/p&gt;

&lt;h2&gt;
  
  
  In Conclusion
&lt;/h2&gt;

&lt;p&gt;The advent of Function Calling has some amazing implications for the future of AI apps. It stands as a powerful tool for developers to better serve as the bridge between &lt;strong&gt;Hard-Ass&lt;/strong&gt; stuff and &lt;strong&gt;Vague-Ass&lt;/strong&gt; stuff. &lt;/p&gt;

&lt;p&gt;This breakthrough not only addresses existing challenges but also opens new avenues for innovative applications.&lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys philosophy, analog synthesizers, consciousness, and being with friends and family.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>development</category>
      <category>openai</category>
    </item>
    <item>
      <title>AI Is Changing The Way We Code: AI-Driven Development (AIDD)</title>
      <dc:creator>Dawid Dahl</dc:creator>
      <pubDate>Thu, 09 Mar 2023 07:49:32 +0000</pubDate>
      <link>https://forem.com/dawiddahl/ai-is-changing-the-way-we-code-ai-driven-development-aidd-2ngo</link>
      <guid>https://forem.com/dawiddahl/ai-is-changing-the-way-we-code-ai-driven-development-aidd-2ngo</guid>
      <description>&lt;p&gt;These days, it appears that anyone can write code. You’ve probably seen plenty of videos online of non-developers using AI to generate working scripts in no time. Code that actually seems to work! It’s like magic.&lt;/p&gt;

&lt;p&gt;And it &lt;em&gt;is&lt;/em&gt; kind of like magic. These next-generation models like ChatGPT emerged onto the scene in late 2022 and blew almost everyone’s minds; even non-technical people were amazed by all the things AI could do, coding being just one of them.&lt;/p&gt;

&lt;p&gt;And not only that, it is getting better and smarter by the month. The pace of improvement is absolutely astounding. It seems everything is about to change.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's a Br-AI-ve New World
&lt;/h2&gt;

&lt;p&gt;But wait… doesn’t that mean developers are about to be replaced by AI?&lt;/p&gt;

&lt;p&gt;Surely if grandma can now divide-and-conquer a data structure using an AI-generated merge sort algorithm in O(n log n) time without breaking a sweat, how could it not mean exactly that? &lt;/p&gt;

&lt;p&gt;Maybe it’d be smart to start looking elsewhere for work. How about gardening? Plumbing? Yoga instructor influencer? Anywhere where we might shield ourselves from this sudden AI disruption.&lt;/p&gt;

&lt;p&gt;Well, hold on just a moment. In this new world, code can indeed be generated by AI. Great code, too. But here’s the kicker: how can we gain &lt;em&gt;confidence&lt;/em&gt; that this code will actually do what it is supposed to do, in a good way, and in an ever-changing environment? Especially within the context of a software system consisting of other AI-generated code, with which it is going to have to integrate.&lt;/p&gt;

&lt;p&gt;How can we as developers harness and benefit from the vast intelligence of these new AI companions, while also ensuring that our customers’ complex software systems remain maintainable, scale well, and function without bugs?&lt;/p&gt;

&lt;p&gt;That is the problem that AIDD solves. &lt;/p&gt;

&lt;p&gt;In this article and its accompanying video, we'll explore the eight core steps of AI-Driven Development and see how this workflow has the potential to supercharge your life as a developer. And how it does so not by fighting against the AI, nor by surrendering to it, but by joining forces with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Value Of Human Development Skills
&lt;/h2&gt;

&lt;p&gt;But before we do, a word for the developers out there who might still be concerned about the rise of Artificial Intelligence and how it could affect their job security: what exactly are the reasons why your skills and creativity as a developer will remain crucial to the software construction process?&lt;/p&gt;

&lt;p&gt;Here are just five of the many exciting responsibilities you can expect as a future AIDD developer. Some you will be familiar with, some are brand new.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Be the one who has a deep and thorough understanding of the vast space of vague human customer requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Carefully guide the AI through the AIDD process. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create your application’s &lt;a href="https://martinfowler.com/articles/practical-test-pyramid.html" rel="noopener noreferrer"&gt;testing pyramid&lt;/a&gt;. From unit to integration to E2E tests. Yes, the AI can and will help you out here, but you are ultimately responsible for the health and maintenance of the test suite.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flord2czy40yx9fm525ov.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flord2czy40yx9fm525ov.jpg" alt="Testing pyramid" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Repair and modify systems built using AIDD.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use your skills as a developer to make sure the system fulfils customer requirements while also respecting the many principles of professional software construction:&lt;br&gt;
 &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Correct (&lt;em&gt;&lt;strong&gt;vs buggy, crashing, doing nothing, or not executing&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Stable (&lt;em&gt;&lt;strong&gt;vs brittle, or non-deterministic&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Readable (&lt;em&gt;&lt;strong&gt;vs unreadable, or obfuscating&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Testable (&lt;em&gt;&lt;strong&gt;vs tightly coupled&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Scalable (&lt;em&gt;&lt;strong&gt;vs bottlenecked&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Extensible (&lt;em&gt;&lt;strong&gt;vs closed to change&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Flexible (&lt;em&gt;&lt;strong&gt;vs rigid&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Reusable (&lt;em&gt;&lt;strong&gt;vs single-purpose&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Cohesive (&lt;em&gt;&lt;strong&gt;vs low cohesion&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Maintainable (&lt;em&gt;&lt;strong&gt;vs not DRY, no documentation, dead code, etc&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Performant (&lt;em&gt;&lt;strong&gt;vs slow, and/or expensive&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Secure (&lt;em&gt;&lt;strong&gt;vs insecure&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Usable (&lt;em&gt;&lt;strong&gt;vs frustrating user experience&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Accessible (&lt;em&gt;&lt;strong&gt;vs only for privileged users&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Portable (&lt;em&gt;&lt;strong&gt;vs tied to a specific medium&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Consistent (&lt;em&gt;&lt;strong&gt;vs paradigms, coding styles, or formatting mixed together without forethought&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Minimal (&lt;em&gt;&lt;strong&gt;vs over-engineered&lt;/strong&gt;&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;These five competencies (dealing with human customers and their ambiguous requirements, guiding the AI, creating and maintaining the test suite, repairing and modifying AI-created systems, and applying the wisdom of high-level software principles) are just some of the reasons why you will still be valuable in the workplace as a developer.&lt;/p&gt;

&lt;p&gt;For the foreseeable future, it is reasonable to assume that many of these skills won’t be taken over by AI.&lt;/p&gt;
&lt;h2&gt;
  
  
  So What Is AI-Driven Development?
&lt;/h2&gt;

&lt;p&gt;AIDD follows the 'red, green, refactor' cycle, just like Test-Driven Development (TDD). It also employs the technique of writing tests &lt;em&gt;before&lt;/em&gt; implementation code. In a way, we can say that AIDD is like a futuristic extension of TDD.&lt;/p&gt;

&lt;p&gt;However, unlike in traditional TDD where developers are responsible for creating both unit tests and implementation code on their own, AIDD introduces a new approach, emphasizing deep AI teamwork.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe17bbqgrczh3o9gqp6j0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe17bbqgrczh3o9gqp6j0.jpg" alt="Red, Green, Refactor Cycle" width="800" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Employing the AIDD technique, you are no longer working alone; you have a smart and patient AI ally by your side, ready to come to your &lt;em&gt;aid&lt;/em&gt; at almost every stage of development. Instead of you doing the heavy lifting, your AI companion goes to work behind the scenes while you can focus on higher-level development tasks.&lt;/p&gt;
&lt;h2&gt;
  
  
  The 8 Steps of AIDD
&lt;/h2&gt;

&lt;p&gt;To demo the technique, let’s first go through the steps at a general level. After that, in a video, we’ll get concrete with a simple yet practical example, constructing an isolated function to achieve a goal.&lt;/p&gt;

&lt;p&gt;This way, you will start to get a feel for how this process works, and how it can really empower you in delivering value to your customers.&lt;/p&gt;
&lt;h3&gt;
  
  
  1: Set the goal
&lt;/h3&gt;

&lt;p&gt;Think about the function at an extremely high level; what are you trying to achieve? Consider it only in terms of Input→Output. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;☝🏻 &lt;em&gt;Reflect on the function’s API, the ergonomics of actually using the function. Specifically, how many and what kind of arguments should it take? Should they be optional or not? Etc.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  2: Formulate the abstract type
&lt;/h3&gt;

&lt;p&gt;Use a strongly-typed language (TypeScript, C#, Haskell, or whatever you are using in your project) to manually write the type or interface for the function's input and output. Always strive for &lt;a href="https://en.m.wikipedia.org/wiki/Pure_function" rel="noopener noreferrer"&gt;pure functions&lt;/a&gt;, unless you have absolutely no choice but to cause side effects.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;☝🏻 &lt;em&gt;If you need help in this or any other step, as always—ask the AI.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
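&lt;p&gt;As a minimal sketch in TypeScript, assuming a hypothetical price-formatting function (the name and shape are illustrative, not prescribed by AIDD), step 2 could look like this:&lt;/p&gt;

```typescript
// Hypothetical example: the abstract type for a function that turns
// an amount in cents into a display string. Written by hand, before
// any implementation exists.
type FormatPrice = (cents: number, currency?: string) => string;

// The type can already be exercised against a throwaway placeholder:
const placeholder: FormatPrice = (cents, currency) =>
  String(cents) + " " + (currency ?? "USD");
```

&lt;p&gt;Keeping the function pure here, as recommended above, is what makes the later testing steps fast and deterministic.&lt;/p&gt;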
&lt;h3&gt;
  
  
  3: Construct mock functionality
&lt;/h3&gt;

&lt;p&gt;Give the function type you made to the AI and ask it to construct a &lt;a href="https://en.m.wikipedia.org/wiki/Mock_object" rel="noopener noreferrer"&gt;mock function&lt;/a&gt;. This is a function without an implementation; it simply simulates a real function by faking the output.&lt;/p&gt;
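&lt;p&gt;Continuing the hypothetical price-formatter example from step 2 (the names are illustrative), a mock can be as simple as a hard-coded fake output:&lt;/p&gt;

```typescript
// The abstract type from step 2 (hypothetical example).
type FormatPrice = (cents: number, currency?: string) => string;

// Step 3: a mock with no real implementation. It only needs to
// satisfy the type so the tests in step 4 can compile and run.
const formatPrice: FormatPrice = () => "FAKE OUTPUT";
```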
&lt;h3&gt;
  
  
  4: Write tests
&lt;/h3&gt;

&lt;p&gt;Ask the AI to create unit tests for this yet-nonexistent function, as many as you need, based on the function type and a description of what you are trying to achieve.&lt;/p&gt;

&lt;p&gt;Carefully review the tests it creates. If necessary, add more manually, until you get the feeling that if these tests pass, you will actually feel &lt;em&gt;confident&lt;/em&gt; in the function’s correctness.&lt;/p&gt;

&lt;p&gt;Here's another possible way to do step 4: write all the tests yourself, but with only one assertion for each test case. Then, you can ask the AI to generate 5, 10, 20 more similar assertions for each. &lt;/p&gt;

&lt;p&gt;With more assertions per unique test case, the probability of the function passing the test by sheer luck decreases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;☝🏻 &lt;em&gt;The unit tests should be lightning fast. This is achieved by not relying on any external state or circumstance, employing techniques like the heavy use of pure functions, monads like Promise or Maybe, dependency injection, mocking, etc.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
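&lt;p&gt;Sticking with the hypothetical price formatter, the tests encode the real requirements while the step-3 mock is still in place. Plain assertions are used to keep this sketch self-contained, though a runner like Jest works just as well:&lt;/p&gt;

```typescript
// Hypothetical example carried over from steps 2 and 3.
type FormatPrice = (cents: number, currency?: string) => string;
const formatPrice: FormatPrice = () => "FAKE OUTPUT"; // step-3 mock

// Step 4: each case states what the real function must eventually do.
const cases: [string, string][] = [
  [formatPrice(100, "USD"), "USD 1.00"],
  [formatPrice(250, "USD"), "USD 2.50"],
  [formatPrice(5), "USD 0.05"],
];

const failures = cases.filter(([actual, expected]) => actual !== expected);
console.log(failures.length); // 3: every case fails against the mock
```

&lt;p&gt;All three cases failing against the mock is exactly the starting point step 5 asks for.&lt;/p&gt;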
&lt;h3&gt;
  
  
  5: Run tests and expect failure ❌
&lt;/h3&gt;

&lt;p&gt;Run the tests using the mocked function and watch them dramatically fail.&lt;/p&gt;

&lt;p&gt;If the majority of the tests are not failing (some will probably pass by luck), return to step 4 and rewrite the tests to be more comprehensive.&lt;/p&gt;
&lt;h3&gt;
  
  
  6: Create concrete implementation
&lt;/h3&gt;

&lt;p&gt;Now, finally, it’s time to actually create the function and turn the failing tests green. (The magical refactoring step of TDD and AIDD, where we improve on the function, comes later once the tests pass.)&lt;/p&gt;

&lt;p&gt;Give the AI the type or types you made in step 2. Or give it a simple verbal explanation. Or even all the unit tests you made. And unleash it!&lt;/p&gt;
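&lt;p&gt;For the hypothetical price formatter, a concrete implementation that the AI (or you) might produce could look like this:&lt;/p&gt;

```typescript
// Hypothetical example from the earlier steps.
type FormatPrice = (cents: number, currency?: string) => string;

// Step 6: a real, pure implementation replacing the step-3 mock.
// (Negative amounts and currency validation are out of scope here.)
const formatPrice: FormatPrice = (cents, currency = "USD") => {
  const whole = Math.floor(cents / 100);
  const fraction = String(cents % 100).padStart(2, "0");
  return currency + " " + whole + "." + fraction;
};

console.log(formatPrice(100)); // "USD 1.00"
```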
&lt;h3&gt;
  
  
  7: Run tests and expect success ✅
&lt;/h3&gt;

&lt;p&gt;Run the tests against the function the AI creates and, hopefully, watch them gloriously pass!&lt;/p&gt;

&lt;p&gt;If they don't—which will probably happen quite frequently, at least now in 2023—collaborate with the AI to achieve the goal of the tests passing. You could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask the AI to try again, supplying any error messages and ideas you might have to help it out. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manually review the function and see if you can fix it, or implement it yourself. Ask the AI for ideas to help you out.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  8: Refactor
&lt;/h3&gt;

&lt;p&gt;Once all the tests have passed, it's important to review the function manually and &lt;strong&gt;assess whether it aligns with the spirit of professional software construction&lt;/strong&gt; mentioned earlier. If not, consider refactoring the function.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;☝🏻 &lt;em&gt;Again, if you're still not sure why you won't be replaced by AI anytime soon, the step above is one of the main reasons.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the tests start to fail during the refactoring process, keep collaborating with the AI, following the same approach as in step 7, until all the tests pass again.&lt;/p&gt;
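&lt;p&gt;As a sketch of what step 8 can look like for the hypothetical price formatter, here is a behavior-preserving refactor that extracts the cents-to-decimal conversion, so the intent reads more clearly while the tests stay green:&lt;/p&gt;

```typescript
// Hypothetical example from step 6, refactored.
type FormatPrice = (cents: number, currency?: string) => string;

// Extracted helper: converting cents to a decimal string is now
// named, isolated, and reusable elsewhere.
const toDecimalString = (cents: number): string =>
  Math.floor(cents / 100) + "." + String(cents % 100).padStart(2, "0");

// Behavior is unchanged, so the step-7 tests must still pass.
const formatPrice: FormatPrice = (cents, currency = "USD") =>
  currency + " " + toDecimalString(cents);
```

&lt;p&gt;The test suite from step 4 is what gives you the confidence to make a change like this at all.&lt;/p&gt;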
&lt;h3&gt;
  
  
  (Optional Final Step): Celebrate!
&lt;/h3&gt;

&lt;p&gt;All done! Congratulate yourself and celebrate the fact that you are now one step closer to delivering happiness to your customer! 🎉&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;☝🏻 &lt;em&gt;In this article we are focusing on unit testing. However, I believe that in general AIDD is applicable to the other testing strategies as well. That said, it should be acknowledged that these other tests typically cover more complex scenarios and therefore require further manual effort and careful thought from the developer.&lt;/em&gt;&lt;br&gt;
☝🏻 &lt;em&gt;Although the principles of TDD and AIDD are close to identical, it's not until you start developing that you begin to see the significant differences between the two approaches.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  AI-Driven Development in the Wild
&lt;/h2&gt;

&lt;p&gt;Finally, let’s see an actual example of how AIDD can be used in the real world. &lt;/p&gt;

&lt;p&gt;Why not attempt to follow along with your own AI companion?&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/xQx3Do9Fji8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you did follow along with the video, pause for just a moment and take in the fact you didn’t actually create that function yourself. You didn’t even create all the unit tests yourself!&lt;/p&gt;

&lt;p&gt;Instead, you were like a &lt;em&gt;god&lt;/em&gt;: guiding your creation with subtle nudges, yet rarely directly intervening.&lt;/p&gt;

&lt;p&gt;That, at least to me, is mind-blowing. &lt;/p&gt;




&lt;p&gt;Dawid Dahl is a full-stack developer at &lt;a href="https://www.umain.com/" rel="noopener noreferrer"&gt;UMAIN&lt;/a&gt; | &lt;a href="https://arc.inc/" rel="noopener noreferrer"&gt;ARC&lt;/a&gt;. In his free time, he enjoys philosophy, analog synthesizers, consciousness, and being with friends and family.&lt;/p&gt;




&lt;p&gt;Credit for pyramid and cycle graphics: Jenny Eckerud.&lt;br&gt;
TS/Jest CodeSandbox &lt;a href="https://codesandbox.io/s/template-ts-jest-qb63h" rel="noopener noreferrer"&gt;template&lt;/a&gt; used in the video.&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>chatgpt</category>
      <category>tdd</category>
    </item>
  </channel>
</rss>
