Forem: BrycePC

AWS DevOps Agent - AI-Based Incident Analysis Demo with "The Better Store"

BrycePC — Sat, 11 Apr 2026 09:02:04 +0000

In a previous series of articles I have described the design and implementation of a sample ECommerce serverless solution on AWS at: https://dev.to/brycepc/building-the-better-store-an-agile-cloud-native-ecommerce-system-on-aws-part-1-introduction-to-27ii. The solution's source code is also available at: https://github.com/TheBetterStore.

In this article I explore the new AWS DevOps Agent service, which became generally available in March 2026, to demonstrate its AI capabilities for automated root cause analysis of alarm-notified incidents. The agent draws on all sources of information available to it, including (but not limited to) CloudTrail events, API Gateway, application and database logs and metrics, X-Ray, its discovered implementation topology, and integrated source code repositories. The demonstration uses errors deliberately introduced into The Better Store. The results, as discussed in the conclusion, certainly look impressive for quickly analyzing errors in detail, which could otherwise take much manual effort!

Initial Setup

DevOps Agent may be considered a global service within AWS in that it monitors resources across all regions. Note, however, that at the time of writing the service's control plane for configuration is available in only 6 regions (us-east-1, us-west-2, ap-southeast-2, ap-northeast-1, eu-central-1, eu-west-1). Its initial setup is relatively simple and primarily involves:

Creating an AWS DevOps Agent space to define an application boundary to monitor. AWS allows multiple agent spaces to be created within an account, primarily to segregate investigation scope by team ownership: each on-call team has an Agent Space containing only the accounts (if cross-account access for an application is in play) and tools relevant to what they are responsible for.
Configuring capabilities including:
1. Secondary sources; e.g. other AWS accounts where cross-account resources are used, or Azure accounts where Azure resources may also be integrated for DevOps Agent monitoring.
2. Telemetry - external sources including Dynatrace, Datadog, Grafana, New Relic and Splunk may be integrated.
3. Pipeline; where source code repositories for applications being monitored may be configured to support DevOps Agent analyses.
4. MCP Servers to support agentic retrieval against additional sources of information which may be pertinent to an investigation. For example, integration of Atlassian MCP servers is supported, which can provide AWS DevOps Agent with additional information such as runbooks from an organization's Confluence pages.
5. Webhooks; to allow 2-way access between an Agent Space and 3rd-party applications and services.
Configuring access, including Operator access, which support teams will use for interacting with DevOps Agent and its analyses. By default, operator access is available to users as a short-term link from the AWS web console, to which they therefore require access. Alternatively, access can be integrated with IAM Identity Center or an external identity provider.
Configuring log delivery.
For our demonstration with TheBetterStore we will be configuring the following:
1. Pipeline - GitHub repositories used for TheBetterStore's services.
2. Webhooks - A separate Lambda function, as defined at tbs-devops-aiagent, will be subscribed to error alarms configured for TheBetterStore resources. On being triggered, it will invoke a webhook defined by AWS DevOps Agent, passing the alarm information to trigger an incident investigation.
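The alarm-to-webhook mapping in item 2 might look roughly like the sketch below. It is not the code in tbs-devops-aiagent. It assumes the alarm reaches the function through an SNS topic, and the `IncidentPayload` field names are hypothetical: the real webhook schema and authentication should be taken from the AWS DevOps Agent documentation. The HTTP POST itself is left out so that the sketch stays self-contained.

```typescript
// Subset of the JSON document that CloudWatch publishes to SNS on an alarm state change.
interface AlarmNotification {
  AlarmName: string;
  NewStateValue: string; // "OK", "ALARM" or "INSUFFICIENT_DATA"
  NewStateReason: string;
  StateChangeTime: string;
  AWSAccountId: string;
}

// Minimal shape of the SNS event that Lambda passes to a subscribed function.
interface SnsEvent {
  Records: { Sns: { Message: string } }[];
}

// HYPOTHETICAL webhook body: these field names are assumptions for illustration only.
interface IncidentPayload {
  title: string;
  description: string;
  occurredAt: string;
  accountId: string;
}

// Map one SNS message to a webhook payload.
// Returns null for non-ALARM transitions so that recoveries do not open investigations.
function toIncidentPayload(snsMessage: string): IncidentPayload | null {
  const alarm = JSON.parse(snsMessage) as AlarmNotification;
  if (alarm.NewStateValue !== "ALARM") {
    return null;
  }
  return {
    title: "CloudWatch alarm: " + alarm.AlarmName,
    description: alarm.NewStateReason,
    occurredAt: alarm.StateChangeTime,
    accountId: alarm.AWSAccountId,
  };
}

// Collect the payloads that a handler would POST to the agent's webhook URL.
function payloadsFor(event: SnsEvent): IncidentPayload[] {
  return event.Records
    .map((record) => toIncidentPayload(record.Sns.Message))
    .filter((payload): payload is IncidentPayload => payload !== null);
}
```

Keeping the mapping as a pure function means it can be unit tested without AWS access; the Lambda handler would then only need to send each payload to the configured webhook.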

Demo: Incident Simulation and DevOps Agent Analysis

For the demonstration, a bug was deliberately introduced into a new code branch in the tbs-app-order GitHub repository, to invalidate the key used for inserting data into a DynamoDB table. This causes an error to be thrown in the tbs-app-order-prod Lambda service when a user tries to purchase an item from TheBetterStore, triggering an alarm. The sequence of events is illustrated as follows:

1. Bug introduced

2. Feature is deployed via GitHub Actions Pipeline

3. TheBetterStore error is generated via a purchase:

4. CloudWatch alarm is triggered

5. DevOps Agent incident is created

DevOps Incident Response Analysis

Incidents generated by DevOps Agent are presented within the Operator web console in 3 separate tabs: an investigation timeline, a root cause analysis (RCA), and a mitigation plan. The Operator web console also always provides a Chat window, in which ad-hoc questions may be asked, with prompts for suggested questions also being provided (e.g. 'Summarise the investigation'). Each of these sections is described below.

Investigation timeline
Information within this tab describes how DevOps Agent queries underlying resources for its analysis. The first item displayed is a topology diagram of the AWS-deployed resources, and their relationships, which may be in scope for the analysis. This information is automatically discovered by AWS DevOps Agent, and it assists the analysis in the following ways:

Identifies in-scope components and blast radius
Traces impact paths through dependency chains
Helps pinpoint root cause
Shows what downstream services may be affected.

For our incident, the following is presented

Next, a time series of analysis events is listed, detailing how DevOps Agent queries different sources of information to reach its findings. Only a subset of these is shown below, as an example.

10:30:26 Started

10:30:48 Fetching data

Now I have a clear picture:

Alarm: tbs-app-order-prod-CreateOrder-Errors triggered at 2026-04-10T22:29:04Z
Metric: AWS/Lambda Errors for function tbs-app-order-prod-CreateOrderFunction-BZMzl0lYouog
Datapoint: 1 error at 2026-04-10T22:24:00Z (5-minute period)
Account: 048116471576, region: ap-southeast-2
The Order Service has:

API GW: v6jpsjputa
Lambda: CreateOrderFunction-BZMzl0lYouog
DynamoDB: tbs-app-order-prod-OrderTable-QYMTFX1MFY1E

Repository: TheBetterStore/tbs-app-order

I need to start investigating by checking the Lambda logs for what went wrong, looking at recent deployments or code changes, reviewing the Lambda metrics, checking CloudTrail for any infrastructure modifications, and examining the dependencies like DynamoDB and Stripe integration.

10:31:40 Fetching data

Now let me run multiple CloudTrail searches in parallel. I'll search by resource name for the key resources, and also search by event source for CloudFormation and CodeDeploy.

10:31:55 Fetching data

The actual log group name is /aws/lambda/tbs-app-order-prod-CreateOrderFunction (without the suffix). Let me query it now:

10:32:00 Fetching data

Let me now check GitHub workflow runs and CodeDeploy deployments:
1. Batch get deployments
2. Get github workflow run logs
3. Git log

10:32:28 Fetching data

The most recent successful deployment has a commit message "Introduce bug" — that's highly suspicious! Let me dig deeper:

1. Batch get deployments
2. Get github workflow run logs
3. Git log

10:32:42 Fetching data

Now let me look at the actual code changes in the "Introduce bug" commit:

Git log
Get github file

10:32:54 Observation

DynamoDB ValidationException: Missing the key OrderId in the item during CreateOrder invocations
Two CreateOrder Lambda invocations at 22:27:24Z and 22:28:23Z both failed with the same DynamoDB ValidationException: "One or more parameter values were invalid: Missing the key OrderId in the item". This indicates the Lambda code is attempting a DynamoDB PutItem (or similar) operation without including the required partition key "OrderId". Both invocations followed successful Stripe PaymentIntent creation (with orderId metadata set), suggesting the error occurs between payment intent creation and DynamoDB persistence. The Lambda platform reports both invocations as "success" (status code), meaning the function caught the error rather than crashing—but the application-level ERROR log triggered the CloudWatch alarm. Notably, both invocations used 512MB of the 512MB allocated memory, and the baseline window had zero invocations, so this is not a pre-existing pattern.
DynamoDB ValidationException thrown twice: 'Missing the key OrderId in the item' — the Lambda function is writing to DynamoDB without the required partition key
Source: cloudwatch

10:34:14 Root cause

Deployment of commit 4ba4473 renamed OrderId to OrderIdX in DynamoDB DTO mapping, breaking all CreateOrder writes
At 2026-04-10T07:16-07:19Z, GitHub Actions workflow run #24231398661 deployed commit 4ba4473 ('Introduce bug') from branch feature/aiagenttest to the CreateOrder Lambda function via CodeDeploy Blue/Green (deployment d-Q063WA0TH). The commit modified src/infrastructure/persistence/order-repository.ts, renaming the DynamoDB partition key field from 'OrderId' to 'OrderIdX' in the static toDto() method. Since 'OrderId' is the required partition key of the OrderTable (tbs-app-order-prod-OrderTable-QYMTFX1MFY1E), every DynamoDB PutItem call now fails with ValidationException because the item is missing the partition key. This causes 100% of CreateOrder invocations to fail. The deployment occurred ~15 hours before the first error was observed at 22:27Z, indicating no CreateOrder requests were made in the intervening period.

Investigation completed
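The failure mode that the agent identified can be illustrated with a small sketch. The `Order` shape and both `toDto` variants below are hypothetical reconstructions for illustration, not the actual contents of order-repository.ts; only the rename of `OrderId` to `OrderIdX` and the error text come from the investigation output above. `validateKey` is an in-memory stand-in for the partition key check that DynamoDB performs on a write.

```typescript
// Hypothetical domain object and DynamoDB item shape, for illustration only.
interface Order {
  orderId: string;
  customerId: string;
  totalCents: number;
}
type Item = Record<string, string | number>;

// Partition key of the order table, as reported in the investigation.
const PARTITION_KEY = "OrderId";

// Mapping as it would look before the errant commit: the key attribute is present.
function toDto(order: Order): Item {
  return { OrderId: order.orderId, CustomerId: order.customerId, TotalCents: order.totalCents };
}

// Mapping after the errant commit: the key attribute has been renamed to OrderIdX.
function toDtoWithBug(order: Order): Item {
  return { OrderIdX: order.orderId, CustomerId: order.customerId, TotalCents: order.totalCents };
}

// Stand-in for the validation DynamoDB applies to every item written to the table.
function validateKey(item: Item): void {
  if (!(PARTITION_KEY in item)) {
    throw new Error(
      "One or more parameter values were invalid: Missing the key " + PARTITION_KEY + " in the item"
    );
  }
}
```

Because the key attribute is absent from every item the buggy mapping produces, every write fails, which matches the 100% failure rate reported in the root cause.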

Root Cause
Information within this tab summarises the investigation and timeline findings, including the incident impact, root cause(s) and key findings.

Mitigation Plan
Information here focuses on steps to resolve the incident. For our example, the root cause was identified as a bug introduced into our GitHub source code repository. Steps are provided on how to revert (roll back) the errant commit and redeploy the solution to AWS.

The above steps also included sample AWS CLI commands that could be run to validate deployments and final service health (though they specified the incorrect region for our services in our test; this was as of April 2026, and is likely to be resolved in a future update).

Summarise The Investigation
Finally, the Chat feature was tried, specifically the suggested prompt 'Summarise the Investigation', which provided the following results:

This gave additional insights into the impact of the issue that could be helpful, including the value of the failed orders and a note of a reconciliation gap with Stripe that needs to be corrected. The chat window also suggested other, deeper queries that we could consider.

Conclusion

This demonstration highlighted how an incident triggered by a CloudWatch alarm was able to invoke automated analysis by DevOps Agent, which generated accurate mitigation steps and insights in less than 4 minutes from its commencement. Its analysis involved examination of multiple sources of information, including:

CloudTrail events, to determine whether infrastructure for any resources had been updated (it hadn't).
Examining the topology to understand the dependencies and architecture (and potential downstream effects) of resources. This identified that there could be a reconciliation gap between our DynamoDB data and Stripe (our card payments provider)!
Examining recent deployments of the affected resource in CloudFormation, which is used for its deployment.
Examining the impacted Lambda's application logs in CloudWatch.
Examining GitHub deployments and associated code changes.
Checking for errors prior to the change (to confirm the errors are new).
Recognising that the causing git commit was highly suspicious (though noting that this commit message was entered for the demonstration).
Concluding that the issue was caused by the code change, which inserted records into a DynamoDB table with the wrong key; but also noting that the Lambda was using all of its allocated memory in the process, as an additional issue to be addressed!
Generating a root cause analysis and mitigation steps.

Without this automation, a competent support engineer who is familiar with the application would likely be able to identify the issue by first examining the Lambda's application logs (since its errors are the source of the alarm) to see the error details. At this point, however, they may not be aware of the Lambda's recent deployment; if not, they would need to liaise with the Lambda's development team to identify and resolve the issue. The turnaround for this, from an engineer first receiving an alarm, would typically range from 30 minutes to 1-3 hours, during which the system in this example could be unavailable to users! It is also unlikely that analysis at this time would identify a potential memory issue, and whether an impact assessment covers, for example, the reconciliation gap with Stripe would depend on the maturity of the support processes in place.
It is also anticipated that DevOps Agent would be well placed to identify more obscure issues, where it is able to quickly select and scan the required sources of information; for example, slow RDS SQL queries leading to Lambda or API Gateway timeouts, or a new Service Control Policy being applied to the AWS account which unexpectedly results in permission errors. Such problems typically require time and AWS specialists to resolve.

The benefits of AWS DevOps Agent for incident analysis hence look fantastic, and the ability to add MCP Servers can also provide further information enrichment capabilities to organisations. However, there are a few factors that should be considered:

SRE maturity: The described solution is dependent on CloudWatch alarms being appropriately configured, to trigger incident investigations when there is a real problem.
Cost: Charges are based on the time that the agent spends on operational tasks such as investigations and chat queries, billed by the second. For deployments to us-east-1 this is USD $0.0083 per agent-second. An investigation taking 4 minutes (240 seconds) to execute, similar to our demonstration, would therefore cost 240 × $0.0083 = USD $1.992. As can be imagined, charges could accumulate rapidly if alarms are frequent! At this time, however, AWS DevOps Agent is included in the free tier plan for new customers, and a free trial is also provided for the first 2 months after first use of the service. Credits are also provided to customers on paid AWS Support plans. The AWS DevOps Agent pricing page should always be consulted for the latest reliable information.
Security: It is good to know that AWS DevOps Agent provides read-only actions against its AWS accounts to support queries and investigations; it does not support update actions, for example for remediation. There are however still important security considerations to be aware of, to ensure AWS-stored data is not leaked beyond intended audiences and an organisation's security policies are maintained. These include:
1. Separate agent spaces should be created with their own IAM roles to maintain isolation between application boundaries, and to prevent unintended access across different environments or teams. Authorisation should be configured for each DevOps Agent space via integration with IAM Identity Center or another external provider, to ensure only permitted users can access it.
2. While AWS DevOps Agent provides prompt injection safeguards to ensure read-only capabilities beyond opening of tickets and support cases, integrated external MCP servers may not offer the same protection, and these should be carefully reviewed before enabling. AWS provides further security recommendations for AWS DevOps Agent, which can be reviewed in the AWS DevOps Agent Security documentation listed in the references.
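The cost arithmetic under the Cost factor above can be reproduced with a short sketch. The rate is the us-east-1 figure quoted in this article; the function names and the alarm frequency used in the note below are hypothetical, and current pricing should always be checked before relying on these numbers.

```typescript
// Rate quoted in this article for us-east-1 (USD per agent-second); check current pricing.
const USD_PER_AGENT_SECOND = 0.0083;

// Cost of a single investigation or chat task of the given duration.
function investigationCostUsd(durationSeconds: number, ratePerSecond: number = USD_PER_AGENT_SECOND): number {
  if (durationSeconds < 0) {
    throw new RangeError("durationSeconds must not be negative");
  }
  return durationSeconds * ratePerSecond;
}

// Projected cost when investigations of a similar length recur every day.
function projectedCostUsd(investigationsPerDay: number, durationSeconds: number, days: number): number {
  return investigationsPerDay * days * investigationCostUsd(durationSeconds);
}
```

For the 4-minute investigation in this demonstration, `investigationCostUsd(240)` gives approximately 1.992. As a purely hypothetical projection, 10 such investigations per day over 30 days would come to approximately USD $597.60, which illustrates why noisy alarms are worth tuning before they are wired to the agent.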

References

What are DevOps Agent Spaces? - https://docs.aws.amazon.com/devopsagent/latest/userguide/about-aws-devops-agent-what-are-devops-agent-spaces.html
Best Practices for Deploying AWS DevOps Agent in Production - https://aws.amazon.com/blogs/devops/best-practices-for-deploying-aws-devops-agent-in-production/
What is a DevOps Agent Topology? - https://docs.aws.amazon.com/devopsagent/latest/userguide/about-aws-devops-agent-what-is-a-devops-agent-topology.html
AWS DevOps Agent Security - https://docs.aws.amazon.com/devopsagent/latest/userguide/aws-devops-agent-security.html

Disclaimer: The views and opinions expressed in this article are those of the author only.

Building “The Better Store” an agile cloud-native ecommerce system on AWS — Part 1: Introduction to Microservice Architecture

BrycePC — Sat, 11 Apr 2026 02:44:23 +0000

A resilient cloud native app built using Domain Driven Design and Microservice Architecture on AWS

Overview

A lot has changed in IT over the last 10–15 years, with the advent of Cloud Native Computing. Its rise has enabled organisations from the smallest of start-ups to large organisations to design and implement systems that can:

Automatically scale on demand (and paying only for what you use).
Provide system resilience; i.e. such that a service can recover on failure.
Support change agility; i.e. to allow the implementation of new features within a short timeframe with minimal risk of outage.

As a Senior Consultant with several years' experience specialising in Cloud-Native design, full-stack development and DevSecOps implementation on AWS within both start-up and large organisations, I have had much to keep up with and learn as new methodologies have emerged, predominantly around Microservice Architectures (MSA) and Cloud Native patterns. Many of these topics are still widely debated; e.g. how big should a microservice be? Should microservices be allowed to communicate with other microservices, and are shared databases across microservices okay?

As an endeavour to expand and share my own knowledge of the topics with examples to demonstrate and build upon, “The Better Store” has been developed as a sample open-source eCommerce system with web (SPA) frontend to illustrate Domain Driven Design (DDD), MSA, Cloud Native development and implementation on AWS, with the intention of evolving this over time. This article is Part One of a series of seven planned articles, which will cover the following topics with reference to The Better Store:

Part One: An introduction to Microservices Architecture (MSA) to realize the benefits of Cloud Native Computing, and Domain Driven Design (DDD) as a popular approach to assist in the formulation of an optimum MSA solution design for our given business domain.

Part Two: A focus on DDD Strategic Patterns, to assist in defining our business domain and appropriate Bounded Contexts, which help identify candidate microservices.

e.g. The Better Store context map

Figure 1. Sample Context Map for “The Better Store”, providing insights to potential decoupled microservices and their granularity/scope.

Part Three: Using DDD Tactical Patterns for further elaboration of The Better Store’s domain model and MSA design, with a focus on its Order Context.

e.g.

Figure 2: Illustrating a potential static view as a class diagram for designing an Order microservice, showing Domain (core logic), Infrastructure and Application Tiers

Part Four: Selecting Cloud Native Design patterns and AWS Services for implementation of The Better Store as an agile, highly scalable and resilient global solution; including at a high-level:

Decoupled services and communication styles
Database per Service
CQRS, and related data consistency patterns (i.e. Sagas, Aggregates and Event Driven Architecture)
API Gateway and composition
Global and auto-scaling architecture
Serverless implementation using AWS API Gateway and Lambda with NodeJS, Typescript, Inversify and the Onion Architecture

Figure 3: An envisaged implementation view of The Better Store

Part Five: Use of DevOps; specifically AWS Cloudformation combined with GitHub and AWS Pipelines for defining both applications and infrastructure as code, for fully-automated deployments.

Part Six: Development of the web frontend as a Single Page Application, using a JavaScript framework.

Figure 4: The Better Store home page, whereby users may browse products, and after signup or sign-in, may add these to cart and purchase (using test payment services only)

We envisage that our final MSA landscape may look like the following:

Figure 5: An envisaged conceptual architecture for The Better Store, illustrating decoupled microservices

Part Seven: Monitoring and Site Reliability Engineering; how to get the most out of AWS Cloudwatch for early identification of potential issues and remediation, scaling and obtaining business insights.

For Part One though, we’ll now look further at a microservices introduction, its challenges, and then what we would like to initially see for The Better Store.

An introduction to Microservices

One of the most well-known methodologies adopted in recent years to help realize Agility, Scalability and Resilience capabilities in the Cloud, particularly for medium to enterprise-sized applications, is Microservice Architecture (MSA). MSA takes the approach of designing applications as a composite of small, decoupled, stateless and independent services that are cohesive to a specific task, to provide the following advantages:

1. Services are kept small and cohesive to a specific function, to promote Change Agility and Scalability.
If a change is required to a specific application feature, e.g. cancelling an eCommerce order; only one independently-deployable service; e.g. Order Service that is specific to maintaining orders should require changing at the backend. This helps keep logic clear in one place, and the risk of a severe or unforeseen release defect is reduced when only 1 component needs to be released. This reduced change size and risk can (and should) allow changes to be deployed faster with reduced change management, allowing consumers to benefit from changes more quickly (while also further reducing risk as new changes are deployed in small increments). Smaller-sized services can also generally be started quicker by the Cloud platform when configured for horizontal scaling (i.e. starting multiple instances of a service to handle increased demand).
Note that adoption of MSA to compose an application as many microservices does add a level of complexity to the solution, especially concerning their interaction and when multiple services are being developed concurrently. Releasing small-sized changes frequently with the support of mature DevOps processes for automated testing and deployment is often considered a prerequisite, which requires initial investment for organisations that wish to adopt MSA.
2. Services are decoupled; to promote Change Agility, Scalability and Resilience, and to be Technology Agnostic.
We mentioned above that keeping services cohesive to a specific task helps reduce complexity and increase agility by keeping services small and independently-deployable, with reduced risk of impacting other services.
The potential risk of a change to a service impacting other services and the overall application is also highly-dependent on the degree of coupling that exists between services and resources. For example, the following illustrates a hypothetical implementation of an order service, which is responsible for first completing payment of an order, before sending instructions for its shipping fulfillment:

Figure 6: Illustration showing coupling between services and the database

Order Service here is dependent on (tightly coupled to) Fulfillment Service, meaning that any changes that are made to Fulfillment Service have the potential to impact Order Service. Regression testing of both services should be considered for any such changes. Furthermore, a runtime failure of Fulfillment Service would also likely impact Order Service and possibly the user’s experience. Additional coupling is also illustrated by the fact that both services share the same database. Any changes that are required to be made to the database for one service may also have consequences for the other service.
In both these cases, clear communication and collaboration is needed between teams that manage both services whenever changes are required.

On the other hand, an MSA-based alternative may look like the following:

Figure 7: Illustration showing decoupled services, as favoured for MSA

Here, Order Service has been decoupled from Fulfillment Service, such that its order confirmation messages are sent en route to Fulfillment Service via a publish-subscribe mechanism. The order service only needs to know how to send its messages to the messaging technology; for its order payment/confirmation use case it does not need to wait for a response from Fulfillment Service. The fulfillment service does not even need to be immediately available at the time messages are sent, as long as it is configured as a durable subscriber (for discussion in a future article).

The refactored solution has also decoupled the data store, such that each service has its own database. This not only gives the service’s development team full control of changes to its data store, with greatly reduced risk of impacting other services; the team can also select the database technology or vendor that is most suitable to them, providing polyglot persistence to the overall application. Similarly, teams have a choice of implementing their services in the programming language that is most suited to their skill sets and use cases, provided that it is supported by their Cloud provider.

3. Services are stateless; to promote scalability and resilience.
This means that services do not need to remember information of previous messages received from a client; each client request is required to be handled as though it were received from a client for the first time. This feature enables automated horizontal scaling and failover of services, whereby new service instances can be created and destroyed as needed based on demand or failure, and any new instance should be able to handle requests from clients that were served by another instance before.
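The publish-subscribe decoupling described under point 2 above can be sketched in memory as follows. This is a toy model under stated assumptions, not the messaging technology The Better Store uses: `DurableTopic` and its method names are hypothetical, and a real broker would persist the backlog rather than hold it in process memory.

```typescript
type Handler<T> = (message: T) => void;

// Toy in-memory topic with durable subscriptions: messages published while a
// subscriber is disconnected are held in a backlog and delivered on reconnect.
class DurableTopic<T> {
  private backlog: Record<string, T[]> = {};
  private handlers: Record<string, Handler<T> | undefined> = {};

  // Register a durable subscription; from now on messages are retained for it.
  subscribe(name: string): void {
    if (!(name in this.backlog)) {
      this.backlog[name] = [];
    }
  }

  // The publisher neither knows nor cares whether any subscriber is currently running.
  publish(message: T): void {
    for (const name of Object.keys(this.backlog)) {
      const handler = this.handlers[name];
      if (handler) {
        handler(message);
      } else {
        this.backlog[name].push(message);
      }
    }
  }

  // Attach a running consumer and deliver anything that accumulated while it was away.
  connect(name: string, handler: Handler<T>): void {
    this.subscribe(name);
    const pending = this.backlog[name];
    while (pending.length > 0) {
      handler(pending.shift() as T);
    }
    this.handlers[name] = handler;
  }

  // Simulate the consumer going offline; its subscription (and backlog) remain.
  disconnect(name: string): void {
    this.handlers[name] = undefined;
  }
}
```

In this model the order service would call `publish` and return immediately, while the fulfillment service calls `connect` whenever it starts; orders confirmed during a fulfillment outage are delivered once it reconnects.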

Challenges of Microservices

Promoting change agility, scalability and resilience of applications are major benefits provided by MSA and Cloud Native Computing. However, many of its design principles introduce larger challenges that are not present with traditional monolithic solutions, and debates on many of these topics continue to rage, including:

How big is a microservice, or rather how should the granularity of a microservice be decided?
What are best practices for ensuring that microservices are effectively decoupled within a business domain, for optimal change agility and minimal maintenance in the future?
How can shared databases be eliminated if direct communications or ‘chattiness’ between microservices should be avoided?
How can work be rolled back if there is a partial failure; e.g. if a fulfillment operation cannot be completed, how can a user’s payment best be refunded?
What are the best communication technologies and patterns for different use cases?
How can global solutions best be implemented, to provide either regional disaster recovery/failover, or maximum performance for serving consistent data to users no matter where they are?
What prerequisites are required for embarking on an MSA implementation?
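One commonly cited answer to the partial-failure question above is the compensating transaction, the idea behind the Saga pattern that Part Four lists. The sketch below is a minimal, synchronous illustration under stated assumptions: `SagaStep`, `runSaga` and the step names are hypothetical, and a production saga would also need persistence, idempotency and handling of failed compensations.

```typescript
// One step of a saga: an action plus the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => void;      // performs the step; throws on failure
  compensate: () => void;  // reverses the step if a later step fails
}

interface SagaResult {
  completed: boolean;
  log: string[];
}

// Run steps in order; on the first failure, compensate the completed steps in reverse order.
function runSaga(steps: SagaStep[]): SagaResult {
  const log: string[] = [];
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
      log.push("done:" + step.name);
    } catch (_error) {
      log.push("failed:" + step.name);
      for (let i = done.length - 1; i >= 0; i--) {
        done[i].compensate();
        log.push("compensated:" + done[i].name);
      }
      return { completed: false, log };
    }
  }
  return { completed: true, log };
}
```

With a payment step followed by a fulfillment step that throws, the payment's compensation (a refund) runs automatically, leaving no charge without a matching shipment.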

Indeed, catering for much of the above is a complex task and can require much initial investment, particularly for organisations that do not yet have mature DevOps, Agile teams and practices, or Site Reliability Engineering (SRE) capabilities. The change agility and scalability advantages of MSA are valuable for medium to large applications; for start-ups, however, they may be less important and may only add a restrictive overhead. Each of these challenges will be discussed further in the design of The Better Store.

Coming next in Part Two: Using DDD Strategic Patterns to design The Better Store

References

“Domain Driven Design, Tackling Complexity in the Heart of Software”, Evans, E., Addison-Wesley (2003)
“Patterns, Principles, and Practices of Domain-Driven Design”, Millett & Tune, Wiley & Sons (2015)
“Design Patterns for Cloud Native Applications”, Indrasiri & Suhothayan, O’Reilly (2021)
“DDD, Hexagonal, Onion, Clean, CQRS, … How I put it all together”, Graca, H., web (2017)
“Intro to Amazon EventBridge”, Beswick, J. (AWS), web video

Disclaimer: The views and opinions expressed in this article are those of the author only.

Building “The Better Store” — Part 4: Implementing a Microservices Architecture with Cloud Native Patterns and AWS Services

BrycePC — Wed, 18 Mar 2026 08:20:03 +0000

The previous two articles of this series have provided introductions to Domain Driven Design principles, and how they may be used for defining an appropriate Microservice Architecture for our sample ‘The Better Store’ cloud-native ECommerce system.

This article continues from the proposed DDD tactical design to formulate an implementation of our decomposed microservices, while describing and adopting popular cloud-native patterns to reap advantages that they provide.

The focus here will be on defining an implementation with the following feature scope, as defined in the previous Strategic Patterns section:

Scenario 1: OrderPurchased
When I submit valid Card details
And Payment is approved
Then My order details with cart details will be stored in the Order Repository
And The order details will be eventually persisted to the reporting database
And An electronic Receipt will be emailed to me
And A shipping order is sent for Fulfilment
And I will be directed back to the store’s home page, with a notice confirming the order number.

This logical flow is represented by the following:

Figure 1: Logical flow representing completion of the Order Purchased scenario.

So now that we have identified the resources and required interactions between them for implementing flows, we need to determine exactly how to implement these in a way that provides optimal scalability, resilience and performance while considering cost with AWS as the platform of choice. Some questions that we can start asking ourselves while looking at the above high-level implementation designs are:

What is the best backend hosting technology for compute and data services that scale quickly on demand, can be easily extended across regions for a global implementation, are resilient, and provide cost optimization for both development and production environments in terms of compute and operational/maintenance costs?
What are the best methods for enabling communications between services and data stores which provide scalability, resilience and cost optimization that accommodate both development and production environments?
What are the best methods for managing transactions and errors?

Microservice Architecture Patterns

A number of patterns and best practices which help to answer these questions have been defined for microservice architectures; Chris Richardson provides a great overview and illustration of these at his website https://microservices.io/. A cut-down version of this overview, illustrating the patterns considered for The Better Store, is shown below:

Figure 2. A summary of Microservice Patterns, highlighting key patterns for discussion with The Better Store in yellow.

The patterns used by The Better Store, and the reasons for choosing them, are described next.

Application Patterns

Decomposition
A. Decompose By Subdomain describes how Domain Driven Design may be used to decompose a business’s domain into decoupled subdomains or Bounded Contexts, each of which may be considered as a candidate for a microservice implementation. This topic has been the focus of our previous articles in this series.

B. The Self Contained Service pattern describes how services within an application are decoupled and can be updated and deployed with minimal risk of impacting other services. It also means that services ideally should not synchronously call other services, or resources that they do not own such as shared databases, as doing so can increase the risk of release issues.
Consider for example the following alternative implementations for order fulfilment requests upon receiving payment confirmation from a card merchant. The first illustrates tight coupling between Order and Fulfillment services and use of a shared database. In this topology, changes to either of the Order or Fulfilment services, or the shared database, risk impacting request processing, and consequently customers not receiving their items as expected if not appropriately remediated. Side effects can also include:

Unexpected cloud usage charges; for example an issue causing a request to wait and consequently time out for a downstream service call will also impact its upstream requesting services.
Larger system availability issues due to exhausted resources (for example AWS Lambda concurrency, or database connections) if requests are not able to complete.
Potential issues in error handling when errors are returned to the payment and receipt system, which would need to be considered.

Figure 3: Illustrating tight coupling between Order and Fulfilment services and use of a shared database.

An alternative, decoupled and more resilient solution is shown below, noting that the client does not require data to be returned in a response; its calls may be made asynchronously. This solution offers the following:

Confirm Payment calls from the Payment system are placed on a queue with ‘Guaranteed At Least Once’ delivery; and a successful response is ALWAYS returned to it. The client just needs to know that the message has been delivered successfully.
Both the Order and Fulfilment service receive requests asynchronously; it does not matter if they are not currently running, the messages will wait for them until they are next available.
If either service fails in processing a request then processing may be configured to retry a set number of times, after which the request may be placed on a “Dead Letter Queue” for appropriate error remediation.
Both Order and Fulfilment services in this way are effectively decoupled, including having their own databases, such that an error introduced to one service should minimise impact for the other.

Figure 4: An alternative decoupled solution
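The retry and Dead Letter Queue behaviour listed above can be illustrated with a minimal in-memory sketch. It is not how SQS is implemented; the `QueueMessage` shape, the `drainQueue` function and the order ids are hypothetical names chosen for illustration, and a real queue would wait between attempts rather than retrying immediately.

```typescript
// Hypothetical message shape; field names are illustrative only.
interface QueueMessage {
  id: string;
  body: string;
}

// A handler signals failure by throwing.
type MessageHandler = (msg: QueueMessage) => void;

// Delivers each message to the handler, retrying up to maxAttempts times
// before moving the message to the dead-letter list.
function drainQueue(
  queue: QueueMessage[],
  handler: MessageHandler,
  maxAttempts: number
): { processed: string[]; deadLetters: QueueMessage[] } {
  const processed: string[] = [];
  const deadLetters: QueueMessage[] = [];
  for (const msg of queue) {
    let done = false;
    for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
      try {
        handler(msg);
        done = true;
      } catch {
        // Swallow the error and retry; a real queue would redeliver later.
      }
    }
    if (done) processed.push(msg.id);
    else deadLetters.push(msg);
  }
  return { processed, deadLetters };
}

// Usage: "order-2" fails once (transient error), "order-3" always fails.
const attemptsById: Record<string, number> = {};
const flakyHandler: MessageHandler = (msg) => {
  const n = (attemptsById[msg.id] ?? 0) + 1;
  attemptsById[msg.id] = n;
  if (msg.id === "order-2" && n === 1) throw new Error("transient failure");
  if (msg.id === "order-3") throw new Error("permanent failure");
};

const drainResult = drainQueue(
  [
    { id: "order-1", body: "{}" },
    { id: "order-2", body: "{}" },
    { id: "order-3", body: "{}" },
  ],
  flakyHandler,
  3
);
// drainResult.processed holds order-1 and order-2; order-3 ends up in deadLetters.
```

Because the handler may see the same message more than once, it needs to be safe to repeat, which is the subject of the Idempotent Consumer pattern discussed later.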

A further note for the Better Store; this pattern has also been taken to describe how each service should own and define all of their application and infrastructure dependencies such as data storage, security resources etc; such that they can be deployed quickly and independently across environments. This includes for example having their own security roles, firewall definitions, databases (see below), SSL certificates and domain records defined; external/shared dependencies are kept to a minimum; such as the VPC in which they reside.
In a future DevSecOps article, Infrastructure as Code (IaC) using AWS Cloudformation will be described, for the creation of fully-encapsulated Cloudformation stacks which may be used to deploy instances of independent microservices, including resources that they require.

The example below illustrates the Order Cloudformation stack, which defines all of its required resources, and shared infrastructure stacks that it depends on.

Figure 5: The Order Service encapsulated for deployment as a self-contained AWS Cloudformation stack.

To conclude this section on Self Contained Services, we can name some candidate AWS services for implementing the resources described:

Guaranteed at-least-once queueing: SQS
Dead Letter Queues: SQS
Asynchronous messaging between services: SNS, EventBridge, DynamoDB Streams
Data stores: DynamoDB, RDS/RDS Aurora
Self-contained IaC: Cloudformation
Compute processing for processes of short duration, with fast scaling capabilities: Lambda

Application Architecture
C. Monolithic: where an application is built and deployed from a single source code repository. This may have advantages for smaller applications and startups while source code is new and small, while offering reduced complexity. However, its continued growth over time without checks can yield a system that is harder to change, scale and deploy, in a phenomenon coined the ‘Big Ball of Mud’ (Foote & Yoder).

D. MSA: as already discussed, is focused on change agility when decomposing applications into multiple services. It cares less about code reuse in contrast to earlier architectures; it may be that duplicate code can sometimes exist between services, but this does allow such code to be modified if required in an application, knowing that other applications will not be affected as a result.

Database Architecture
E. Database per Service: is another decoupling approach recommended for microservices. Traditional relational databases can grow large to support data models shared by multiple services, while they also enforce relational constraints and atomic transactions across tables to maintain data integrity.
Imposing the database/service pattern implies the following:

The database is split into objects specific for each microservice, which means breaking relational constraints and ACID transaction support otherwise provided.
The application’s architecture needs to be refactored to cater for the loss of these constraints to preserve data integrity. Patterns that may assist include ‘Saga’ and ‘Idempotent Consumer’, which are introduced below.

The advantage of Database per Service is again change agility; any changes that may be required to a database should generally only impact its owning microservice. This greatly reduces the risk of issues and the amount of regression testing that may otherwise be required when making changes. Furthermore, each service is free to use the database technology that is most suitable for its needs (aka polyglot persistence), for example:

The Order microservice is expected to use AWS DynamoDB, a serverless NoSQL database which scales well for high-demand, and is capable of replicating data across regions for potential future global scalability of the application.
The Reports microservice is expected to use AWS Aurora Serverless, to receive orders in batches, which supports complex relational queries using SQL to provide overnight reports. Its serverless nature is expected to provide cost optimisation for its low intended traffic, while any cold-starts in its activity will not impact users.

F. Saga is a pattern that addresses the problem of how to manage business transactions that span multiple services and/or databases, for example when implementing the Database per Service pattern described above, and where a distributed transaction (e.g. via a two-phase commit) is either too complex or not possible for error handling. It describes a process whereby such transactions are implemented as a sequence of partial transactions against each of the participant databases. If any single step of the transaction fails, then previous changes are to be rolled back by running compensating transactions in the reverse order.

An example of a choreography-based saga including compensating-rollback of transactions is given below (where system behaviour is asynchronously event-driven):

Figure 6: Saga pattern illustrating compensating actions to roll-back a transaction.
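The compensating behaviour can be sketched in a few lines of TypeScript. Note this is a simple in-process runner for illustration only, rather than the event-driven choreography shown in the figure; the `SagaStep` shape, `runSaga` and the step labels are hypothetical, and the sketch assumes that compensations themselves do not fail.

```typescript
// One saga step: a local transaction plus the compensating transaction that undoes it.
interface SagaStep {
  label: string;
  action: () => void; // throws if the local transaction fails
  compensate: () => void; // undoes a previously completed action
}

// Runs steps in order; on failure, compensates completed steps in reverse order.
function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
      log.push(`done:${step.label}`);
    } catch {
      log.push(`failed:${step.label}`);
      for (const prev of completed.reverse()) {
        prev.compensate();
        log.push(`undone:${prev.label}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: the third step fails, so the first two are undone, most recent first.
const sagaOutcome = runSaga([
  { label: "createOrder", action: () => {}, compensate: () => {} },
  { label: "takePayment", action: () => {}, compensate: () => {} },
  {
    label: "reserveStock",
    action: () => {
      throw new Error("out of stock");
    },
    compensate: () => {},
  },
]);
// sagaOutcome.log:
// done:createOrder, done:takePayment, failed:reserveStock, undone:takePayment, undone:createOrder
```

In a choreography-based saga the same ordering is achieved by each service reacting to failure events published by the others, rather than by a central loop.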

Application Infrastructure Patterns

Communication Patterns
G. Remote Procedure Invocations (RPI): refers to the use of standard synchronous request/reply protocols for inter-service communications, for example via REST or gRPC. These have advantages over Remote Procedure Calls (RPC’s) between services, which are dependent on a specific programming language being used between client and server; such as an SDK call between a NodeJS application and AWS’s NodeJS SDK.

Inter-service communications using standard protocols are sometimes necessary for processing of requests, and RPI’s use of the request/reply pattern allows this to be achieved simply. The pattern however does result in tight-coupling between services involved, as discussed for the Self-Contained Service pattern above.

_Candidate AWS services: API Gateway, AppSync_

H. Messaging: refers to the use of asynchronous message channels for inter-service communications, in contrast to synchronous Remote Procedure Invocations.
As described in the Self-Contained Service pattern above, use of asynchronous messaging is aimed at decoupling services and increasing overall system availability, such that a change to one service should generally be seamless to services that communicate with it.

The pattern also includes different types of communication; for example:

Notification; a sender sends a message to a recipient, and does not expect a reply.
Request/asynchronous response — where the recipient replies eventually. The sender does not block waiting.
Publish/subscribe — a service sends messages to 0, 1 or more subscribers. These may also be ‘durable consumers’; to guarantee they will receive messages eventually if they are not currently running.

_Candidate AWS services: SQS, EventBridge, SNS._
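As a rough illustration of the publish/subscribe style, the following in-memory sketch delivers each published event to zero, one or more subscribers, selected by a filter on the event's content. The `InMemoryEventBus` class, the `DomainEvent` shape and the event type names are hypothetical; managed services such as EventBridge or SNS additionally provide durability and retries, which this sketch does not.

```typescript
// Hypothetical event shape for illustration.
interface DomainEvent {
  type: string;
  detail: Record<string, unknown>;
}

type EventFilter = (e: DomainEvent) => boolean;
type EventSubscriber = (e: DomainEvent) => void;

class InMemoryEventBus {
  private subscriptions: { filter: EventFilter; subscriber: EventSubscriber }[] = [];

  subscribe(filter: EventFilter, subscriber: EventSubscriber): void {
    this.subscriptions.push({ filter, subscriber });
  }

  // Delivers the event to every matching subscriber and returns how many received it.
  publish(e: DomainEvent): number {
    let delivered = 0;
    for (const s of this.subscriptions) {
      if (s.filter(e)) {
        s.subscriber(e);
        delivered++;
      }
    }
    return delivered;
  }
}

// Usage: fulfilment only wants confirmed orders; reports wants every order event.
const bus = new InMemoryEventBus();
const fulfilmentInbox: DomainEvent[] = [];
const reportsInbox: DomainEvent[] = [];
bus.subscribe((e) => e.type === "OrderConfirmed", (e) => fulfilmentInbox.push(e));
bus.subscribe((e) => e.type.indexOf("Order") === 0, (e) => reportsInbox.push(e));

const confirmedCount = bus.publish({ type: "OrderConfirmed", detail: { orderId: "o-1" } });
const cancelledCount = bus.publish({ type: "OrderCancelled", detail: { orderId: "o-2" } });
const unmatchedCount = bus.publish({ type: "PaymentFailed", detail: { orderId: "o-3" } });
// confirmedCount is 2, cancelledCount is 1, unmatchedCount is 0.
```

The publisher does not know who is listening, which is what keeps the services decoupled.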

I. Idempotent Consumer: is a key pattern that requires full consideration when implementing microservices; it means that a service must be able to handle requests if received more than once with no side effect; i.e. the outcome of processing a request repeatedly must be the same as if only processed once.

The reason why this pattern is so important is that a number of AWS services guarantee ‘At Least Once’ message delivery to consumers; i.e. no messages will be lost, but duplicates may be received and the consuming service must be able to deal with these.

Such examples include:

SQS may redeliver a message to a consumer if it was previously received but has not been deleted (i.e. acknowledged as processed) before its Visibility Timeout period has elapsed.
Asynchronous requests may be automatically retried by some AWS services on encountering an error. For example, if an error is thrown from a Lambda function that was invoked asynchronously, Lambda will by default retry processing 2 further times in case the error was transient, and if still not successful it will place the request on a Dead Letter Queue if configured.
Failed message deliveries from EventBridge, SQS and SNS all may result in messages being retried, and being delivered to a Dead Letter Queue if a threshold has been exceeded.

The design of idempotent consumers does also have benefits for error handling; for example; if a single request contains 100 records in which only 1 fails; the request can be safely retried following correction of the error for the single record; resending of the other 99 records will not result in any change to the system.

Methods for implementing this pattern may include:

Ensuring that every request has a unique identifier and recording receipt of these in a data store when messages are received and processed. Any subsequent receipts of the messages may be ignored.
For some applications, designing requests to contain the full state of the record to be processed, such that repeating an update in the datastore for the record will not result in any further change.
Ensuring that requests have a timestamp included from the sending system, and only processing a record if this is newer than the timestamp last received by the consumer.
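Two of the methods above (recording unique message identifiers, and comparing timestamps) can be combined in a short sketch. The `OrderUpdate` fields and the `IdempotentOrderConsumer` class are hypothetical, and the in-memory `Set` and `Map` stand in for a durable data store; in a real service the check and the write would need to be atomic, for example via a conditional write.

```typescript
// Hypothetical message shape; field names are illustrative only.
interface OrderUpdate {
  messageId: string; // unique per message
  orderId: string;
  status: string;
  sentAt: number; // epoch milliseconds, set by the sender
}

class IdempotentOrderConsumer {
  private seen = new Set<string>();
  private orders = new Map<string, { status: string; sentAt: number }>();

  // Returns true if the message changed state, false if it was ignored.
  handle(msg: OrderUpdate): boolean {
    if (this.seen.has(msg.messageId)) return false; // duplicate delivery
    this.seen.add(msg.messageId);
    const current = this.orders.get(msg.orderId);
    if (current && current.sentAt >= msg.sentAt) return false; // stale update
    this.orders.set(msg.orderId, { status: msg.status, sentAt: msg.sentAt });
    return true;
  }

  statusOf(orderId: string): string | undefined {
    return this.orders.get(orderId)?.status;
  }
}

// Usage: a redelivered message and an out-of-date message are both ignored.
const consumer = new IdempotentOrderConsumer();
const paid: OrderUpdate = { messageId: "m-1", orderId: "o-1", status: "PAID", sentAt: 100 };
const shipped: OrderUpdate = { messageId: "m-2", orderId: "o-1", status: "SHIPPED", sentAt: 200 };
const outcomes = [
  consumer.handle(paid), // true: first delivery
  consumer.handle(shipped), // true: newer update
  consumer.handle(shipped), // false: duplicate messageId
  consumer.handle({ ...paid, messageId: "m-3" }), // false: older timestamp
];
// consumer.statusOf("o-1") is "SHIPPED"
```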

J. API Gateway: is often implemented in front of a service to act as a single entry point for its clients, to provide the following capabilities:

They define a service’s Published Language (refer to DDD Strategic Patterns) / interface contract via an open-standard specification such as Swagger or OpenAPI, for purposes of providing a shared understanding of required input data for requests and the expected output, between developers and its consumers. It is intended that these specifications provide all the information that its consumers require, the inner workings of the service do not need to be known.
They may serve simply as a proxy layer to underlying services, while offering additional capabilities such as authentication, authorisation, request throttling (e.g. to protect the system from unexpected surges in traffic), WAF and transport-based encryption.

Candidate AWS services: API Gateway, AppSync (supporting GraphQL)
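Request throttling of the kind mentioned above is commonly implemented with a token bucket, where a burst capacity is refilled at a steady rate. The following is a minimal sketch of that technique only; the `TokenBucket` class and its parameters are hypothetical and do not represent any gateway's actual configuration or API. Time is passed in explicitly to keep the example deterministic.

```typescript
class TokenBucket {
  private readonly capacity: number; // maximum burst size
  private readonly refillPerSecond: number; // steady-state request rate
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should be throttled.
  tryAcquire(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: a burst of 2 is allowed, the third request is throttled,
// and one more request is allowed after a second has passed.
const bucket = new TokenBucket(2, 1, 0);
const throttleDecisions = [
  bucket.tryAcquire(0), // true
  bucket.tryAcquire(0), // true
  bucket.tryAcquire(0), // false: bucket empty
  bucket.tryAcquire(1000), // true: one token refilled
  bucket.tryAcquire(1000), // false
];
```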

Observability
K. Metrics: provide a continuous stream of data points over time as a measure of the performance and health of an application and its resources, for monitoring and potential remediation. Example metrics include:

Counts of consumer requests and errors over time
Average, maximum and minimum request durations for request processing (latency) over time
CPU (%) and system memory (e.g. MB) used over time.

Candidate AWS services: Cloudwatch Metrics
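As a small illustration of the example metrics listed above, the sketch below reduces a set of request samples for one period into a count, an error count and average, minimum and maximum durations. The `RequestSample` and `MetricSummary` shapes are hypothetical; a monitoring service would compute such statistics per time period from published data points.

```typescript
interface RequestSample {
  durationMs: number;
  failed: boolean;
}

interface MetricSummary {
  count: number;
  errors: number;
  avgMs: number;
  minMs: number;
  maxMs: number;
}

// Summarises the samples for one period; returns undefined when there is no data.
function summariseRequests(samples: RequestSample[]): MetricSummary | undefined {
  if (samples.length === 0) return undefined;
  let totalMs = 0;
  let errors = 0;
  let minMs = Infinity;
  let maxMs = -Infinity;
  for (const s of samples) {
    totalMs += s.durationMs;
    if (s.failed) errors++;
    minMs = Math.min(minMs, s.durationMs);
    maxMs = Math.max(maxMs, s.durationMs);
  }
  return { count: samples.length, errors, avgMs: totalMs / samples.length, minMs, maxMs };
}

// Usage: three requests in the period, one of which failed.
const periodSummary = summariseRequests([
  { durationMs: 100, failed: false },
  { durationMs: 300, failed: false },
  { durationMs: 200, failed: true },
]);
// periodSummary: count 3, errors 1, avgMs 200, minMs 100, maxMs 300
```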

L. Log Aggregation: refers to a centralized logging service that aggregates logs from multiple service instances, for easy accessibility and analysis.

Figure 7: Screenshot of AWS Cloudwatch Insights, which allows log groups to be queried using a SQL-like syntax for fast analysis and troubleshooting.

Candidate AWS services: Cloudwatch Logs, Cloudwatch Insights, OpenSearch

M. Distributed Tracing: provides the ability to determine how a single request may traverse across multiple services for its processing within a distributed system, which is made possible by their allocation of a unique trace id when first received.
Distributed tracing provides the following benefits:

It enables developers to understand the flow of processing events for a request.
Can help identify performance bottlenecks at different processing points in the system.

Figure 8: Screenshot of an AWS X-ray trace for processing of a single request.

Candidate AWS services: X-Ray, OpenTelemetry
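The core idea of trace id propagation can be sketched as follows: the first service to receive a request allocates a trace id, and every downstream call carries it along so that the recorded spans can later be stitched together. The function names, `TraceContext` and `SpanRecord` are hypothetical, and the sequential id is a placeholder; real tracing systems generate ids in their own formats and propagate them via headers or message attributes.

```typescript
interface TraceContext {
  traceId: string;
}

interface SpanRecord {
  traceId: string;
  service: string;
  operation: string;
}

const recordedSpans: SpanRecord[] = [];
let traceCounter = 0;

// Reuses the incoming trace context, or starts a new trace at the system's entry point.
function ensureTrace(incoming?: TraceContext): TraceContext {
  return incoming ?? { traceId: `trace-${++traceCounter}` };
}

function handleFulfilmentRequest(ctx?: TraceContext): void {
  const trace = ensureTrace(ctx);
  recordedSpans.push({ traceId: trace.traceId, service: "fulfilment", operation: "scheduleDelivery" });
}

function handleOrderRequest(ctx?: TraceContext): TraceContext {
  const trace = ensureTrace(ctx);
  recordedSpans.push({ traceId: trace.traceId, service: "order", operation: "confirmOrder" });
  handleFulfilmentRequest(trace); // the downstream call carries the same trace id
  return trace;
}

// Usage: two independent requests produce two traces, each spanning both services.
const firstTrace = handleOrderRequest();
const secondTrace = handleOrderRequest();
const firstTraceServices = recordedSpans
  .filter((s) => s.traceId === firstTrace.traceId)
  .map((s) => s.service);
// firstTraceServices: ["order", "fulfilment"]
```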

N. Dashboards: provide a graphical collection of metrics for a defined portion of the system to give a holistic view of its behaviour.

The following provides an example specific to the Order service, providing a view of resources that it contains:

Figure 9: Cloudwatch Dashboard constructed specifically for monitoring resources belonging to the Order service.

Candidate AWS services: Cloudwatch Dashboards, OpenSearch (Kibana), Grafana

O. Alarms: These may be used to provide notifications to IT staff in cases where manual intervention is required when certain system metric thresholds are exceeded.

Examples include:

Request volumes are higher than the system’s capacity, causing throttling of some requests (throttling metric > 0).
Asynchronous requests have failed processing following x amount of retries, and have been placed in the configured Dead Letter Queue (which has an alarm threshold > 0 for a defined period).
Synchronous requests to a lambda are failing; where the lambda Error metric threshold is > 0.
Relational database CPU is > 90% for a defined period; vertical scaling may need to be considered.

Candidate AWS services: Cloudwatch Alarms, OpenSearch (Kibana), Grafana, SNS (notifications)
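The threshold checks in these examples can be sketched as a small evaluation function: an alarm fires when enough of the most recent data points breach a threshold. The `AlarmRule` fields are hypothetical names loosely modelled on common alarm settings, and the logic is a simplification (for instance, it ignores missing data).

```typescript
type AlarmState = "OK" | "ALARM";

interface AlarmRule {
  threshold: number; // a datapoint breaches when strictly greater than this
  evaluationPeriods: number; // how many of the most recent datapoints to inspect (>= 1)
  datapointsToAlarm: number; // how many of those must breach to raise the alarm
}

function evaluateAlarm(datapoints: number[], rule: AlarmRule): AlarmState {
  const recent = datapoints.slice(-rule.evaluationPeriods);
  const breaching = recent.filter((v) => v > rule.threshold).length;
  return breaching >= rule.datapointsToAlarm ? "ALARM" : "OK";
}

// Dead Letter Queue depth: alarm as soon as any message appears in the latest period.
const dlqRule: AlarmRule = { threshold: 0, evaluationPeriods: 1, datapointsToAlarm: 1 };
// Database CPU: alarm only if all of the last 3 periods are above 90%.
const cpuRule: AlarmRule = { threshold: 90, evaluationPeriods: 3, datapointsToAlarm: 3 };

const dlqState = evaluateAlarm([0, 0, 2], dlqRule); // "ALARM"
const cpuSpike = evaluateAlarm([95, 85, 97, 99], cpuRule); // "OK": only 2 of the last 3 breach
const cpuSustained = evaluateAlarm([85, 95, 97, 99], cpuRule); // "ALARM"
```

Requiring several breaching periods, as in the CPU rule, avoids notifying staff for short-lived spikes.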

Infrastructure Patterns

Deployment
P. Services/Host or VM: Involves deploying a number of services or potentially an entire system on a single host.
This may initially provide advantages of simplicity and efficient resource utilization in contrast to a Service/VM pattern, but it also has the following disadvantages:

Difficulty in isolating resource usage between services and reduced availability because of this; an errant service, or host issue will impact multiple services.
Potential difficulty/less efficiency in being able to horizontally-scale a single small service, when larger more resource-consuming resources also need to be included.
Maintenance of the underlying Operating System is typically the responsibility of the cloud account holder, including OS updates and security patching.
Horizontal scaling involves instantiating new VM’s, which due to the required startup of their OS and other underlying services can be slow.

Candidate AWS services: EC2 (shared or dedicated hosting)

Q. Service/Host or VM: Involves deploying single services into their own dedicated host VM’s. This provides advantages over Services/Host or VM, in that services are isolated from each other, at the cost of having to maintain and pay for additional hosts or VM’s.
Horizontal scaling of entire VM’s is slow, but potentially faster than if hosting multiple services/VM.

Candidate AWS services: EC2 (shared or dedicated hosting), Beanstalk

R. Service/Container: Involves packaging services as docker images, and deploying them into isolated docker containers.
Benefits of container vs VM deployments include:

Horizontal scaling of docker instances is much faster in contrast to starting new VM’s and their underlying OS’s.
The container image also encapsulates the runtime that the service requires, which provides portability and consistency for deployment of services into different environments. Note that unless serverless options are used, maintenance of the underlying host VM including its OS is still required.

Candidate AWS services: Beanstalk, ECS, Kubernetes, App Runner

S. Serverless: Refers to the deployment of services to compute platforms which hide their underlying server details; their cloud provider instead assumes responsibility for managing underlying hosts, their associated infrastructure, and OS patching.
Typically the implementor needs to only provide the amount of memory (GB) and/or the number of virtual CPU’s that are to be allocated to a service executable.

Candidate AWS services: ECS (Fargate), Kubernetes (Fargate), Lambda.
Of note, while the first 2 services above provide container-based hosting of services, Lambda provides Function as a Service (FaaS) capabilities, where each implementation provides a single compute function only. These functions are designed to run transactions of short duration, but can scale very quickly based on consumer demand.

Next, we will look at how some of these patterns may be used for our chosen Use Cases.

Implementation

On consideration of the microservice patterns and candidate AWS services for their implementation, we conclude here in defining an implementation view, as illustrated below:

Figure 10: Implementation view for Order Purchased scenario

Decisions made for this architecture include:

Order, fulfilment and reporting services will be implemented as separate decoupled AWS microservices; each of which will be defined as separate AWS Cloudformation stack instances (tbs-app-order-prod, tbs-app-reports-prod, tbs-app-fulfilment-prod) that are defined in their own GitHub repositories, matching the stack names. This aligns well to the Decompose by Subdomain and Self Contained Service patterns.
The Database per Service pattern will be used such that tbs-app-order and tbs-app-reports services have their own database that is best-suited for their use-case. The tbs-app-order service will implement DynamoDB as a highly-scalable and potentially global database that can accommodate high traffic volumes, where structured data and complex query capabilities are not required. The tbs-app-reports database will implement AWS Aurora Serverless v2, to subscribe to batched order confirmation updates, and allow these to be stored in a structured manner to allow complex queries for monthly reporting. As its traffic volumes are expected to be low and immediate responses are not required by clients (cold-starts of the database can be accommodated), scaling down to zero capacity will be used for cost optimization. Finally, it is expected that running of queries can target the database’s read-only endpoint, to not impact writing to the database (though again this is probably not required for its expected traffic).
Inter-service communications will be asynchronous using messaging where possible, primarily via AWS EventBridge. EventBridge offers similar capabilities to SNS for implementing the publish/subscribe pattern. It is a little slower, which is not deemed important for our asynchronous communications, and it has advantages over SNS including more integration options, content-based subscriptions (i.e. subscribers can select to receive events based on their content), and event storing (although this and event sourcing are not considered further here, as they have their own complexities). SQS is used for guaranteed message delivery, where it is integrated with API Gateway to receive payment confirmation messages from the payment system (Stripe). In this way the webhook that we configure Stripe to invoke has very high availability; success responses will always be returned to Stripe as messages are placed on the queue for processing.
Asynchronous services will implement the Idempotent Consumer pattern, to support the Guaranteed At Least Once delivery properties of SQS and EventBridge, and the automated retry (2 times) behaviour of asynchronously-triggered AWS Lambda functions if they throw an error. Dead Letter Queues (SQS) will be configured for SQS queues, EventBridge and Lambda functions where appropriate, to ensure that errored Lambda requests are not lost, and that retries by other services such as EventBridge do not continue forever!
Synchronous messaging via Remote Procedure Invocation will be implemented as RESTful API’s using API Gateway. Examples of its use include requests from the client website to retrieve and post data, where the client is dependent on information returned in the response.
AWS Cloudwatch will be used for monitoring metrics and logs of services and their associated resources, and providing monitoring dashboards and alarm capabilities. AWS X-Ray will be used for distributed tracing of requests received. Note other services such as OpenSearch and Managed Grafana are also available and may provide greater capabilities; Cloudwatch has been chosen due to its simplicity for implementation while providing sufficient capabilities at low cost for our needs.
Serverless resources will be used where possible for our implementations, for the reduced maintenance that would otherwise be required for managing and patching servers, their generally-faster horizontal scalability, and use of the Pay As You Go model which is generally favourable, especially for non-production systems! AWS Lambda is currently used to provide all compute functionality for The Better Store, as all of its request processing workloads are small and of very short duration (i.e. < 10 seconds, where AWS Lambda offers request processing for durations of up to 15 minutes).

To conclude, sample code (as Cloudformation templates and NodeJS implementations) for the services described here and other supporting stacks may be viewed on GitHub.

Coming soon in Part Five: Use of DevSecOps; specifically AWS Cloudformation for defining both applications and infrastructure as code, for fully-automated deployments.

References

“A pattern language for microservices”, Richardson, Chris. web (2023)
“Big Ball of Mud”, Foote & Yoder (University of Illinois), web (1999)
“The Better Store Documentation”, Cummock, B. web (2025)
“The Better Store Github Repository”, Cummock, B. web (2025)

Disclaimer: The views and opinions expressed in this article are those of the author only.

Building “The Better Store” an agile cloud-native ecommerce system on AWS — Part 3: DDD Tactical Patterns and App Architecture

BrycePC — Wed, 18 Mar 2026 07:32:58 +0000

Part Two of Building the Better Store described the use of DDD Strategic Patterns as tools for tackling complex requirements for a business system, by decomposing the problem domain to produce a high level design composed of decoupled ‘Bounded Contexts’; each of which may be considered as an initial blueprint for a Microservice.

This article continues from the conceptual view produced in part two, and becomes more technically-focused on implementation details using DDD Tactical Patterns.

Introducing DDD Tactical Patterns

DDD tactical patterns, also known as ‘model building blocks’, are used to help define static models for complex bounded contexts.
The main patterns and their relationships are illustrated as:

Figure 1: DDD Tactical Patterns

where each of the patterns may be described as below:

Figure 2: Table defining main tactical patterns, for designing a Bounded Context.

Example: Order Bounded Context

The Order bounded context was first introduced in my Part 2: Defining DDD Strategic Patterns article, as representing a core subdomain responsible for managing orders and payments within The Better Store. Its DDD strategic design included the following:

A. BDD Features

PurchaseProductsInCartFeature; order-related scenarios include: @ConfirmOrder; an Order consists of Products and their quantities in the cart, the Customer and associated email address, delivery address and shipping cost (if the order contains physical products). The customer is directed to the payment system for completion here.
ManageOrderFeature; order management scenarios including: @ViewOrder; allows details of a previously-created order to be retrieved from the system. @ViewOrderHistory; allows a list of previous orders created for a customer over the last 6 months to be retrieved.

B. Class Responsibility Collaboration

Figure 3: Collaboration Responsibility Card for the Order subdomain, illustrating relationships with other subdomains

Combining these strategic design outputs with the described tactical patterns, an initial draft high-level class diagram may be constructed as below:

Figure 4: High-level class diagram representing a static view of the Order bounded context, for a potential microservice implementation

Note that while this provides us with a good start for an object-oriented design of how we may wish to implement an Order microservice using an object-oriented language, we need at this point to consider an appropriate Application Architecture for structuring the service, using layering principles to help ensure the application can be easily extended and maintained into the future, while avoiding the potential Big Ball of Mud[7] anti-pattern! For this we will be using the Onion Architecture, as described next.

Application Architecture with Layering; Introducing Onion Architecture!

A layered application architecture is a standard technique used by software developers and application architects to structure application source code into abstract layers (or tiers); for example by splitting code into separate subdirectories, modules and/or namespaces within the application’s code repository based on their general concerns; such as presentation, business domain logic, and data access. An example of this topology is illustrated below:

Figure 5. Illustrating implementation and deployment of an n-tier application

This also promotes a top-down dependency model, whereby higher layers can only communicate with the layer directly below them; for example logic within the presentation layer cannot directly obtain data from the database via the data access layer; such queries must be via calls to the business layer.

Advantages of a layered architecture for realizing Separation of Concerns include:

Code complexity is reduced as it is organised within its area of concern. This allows application logic to be found and changed more easily, with reduced risk of impacting other areas of the application. For example, making changes to the user interface may be performed with little or no change or regression testing being required for other application layers.
Such an architecture may even render it possible for an application’s entire user interface, or underlying database product to be replaced within an application, with little or no change being required to its business logic.

However, while the decomposition of an application into layers helps reduce its complexity, it operates at a high level and does not necessarily include best practices for structuring code within the layers, or for aligning with artefact types modeled using Domain Driven Design. Each layer within a larger application can quickly become unwieldy and at risk of becoming a ‘Big Ball of Mud’ as the application grows, if left unchecked without adoption of further decomposition and decoupling best practices and standards, such as Inversion of Control and SOLID, by the development team. The Onion Architecture helps with the realization of these.

The Onion Architecture was first defined by Jeffrey Palermo [8] as a layered application dependency model, whereby outer layer components may be dependent on any of its lower-layer components, as illustrated in figure 6 below:

Figure 6: Layered application design with Onion Architecture

The architecture however exhibits the following differences to the n-tier layered architecture as described earlier:

It promotes Inversion of Control (IoC) to provide loose or interchangeable coupling of components. In this respect, each layer/circle within its model encapsulates internal implementation details and exposes an interface for outer layers to consume.
The inner components are comprised of domain entities and services as defined by our DDD tactical patterns to provide core business functionality. Also included are abstract interfaces, for example data access methods, as illustrated in the Application Core in figure 7. Their concrete methods however are implemented at the outermost infrastructure layer, using IoC. This is because business domain components should be agnostic to the technology implemented for a database, or other external dependency such as HTTP REST endpoints or a technology-specific adapter, and these should be able to be substituted with another product if needed. They may also be substituted with a mock component for automated testing purposes.
Application services are often implemented to provide an additional decoupled layer above domain services, for example to:
- Serve as a proxy to requests to domain services, but with inclusion of additional functionality such as authentication/authorisation, or request/response object transformation, to satisfy system requirements.
- Orchestrate calls to underlying domain services to meet specific use cases.
The Infrastructure layer typically includes:
- API servers such as REST API gateways, which manage and propagate externally-received requests to underlying application and domain service components.
- Presentation components, such as a web user interface, which may be dependent on calls to application and domain services (as well as API’s as described above).
- Repository components for accessing data stores; e.g. relational or NoSQL databases.
- Adapters for accessing external dependencies; for example a REST HTTP client or AWS SQS client, both with bespoke security, logging and error handling requirements included.
- Automated unit or integration tests. These can include dependency injection configurations which mock external dependencies, to provide fast and consistent test results within an isolated environment.

Details of IoC are out of scope for this article; further information on this and the related SOLID principles, which relate well to the technical implementation of The Better Store, is well-described in R. Jansen’s web article: Implementing SOLID and the onion architecture in Node.js with TypeScript and InversifyJS [10]. The following class diagrams and sample code however provide an example of how InversifyJS may be used within our NodeJS+Typescript OrderService implementation to define interface bindings to a concrete RestApiClient class for our main application, and the same interface bindings to a mock Payment API client class for a corresponding automated test application.

Figure 7: Illustrating the use of a REST API Client adapter interface within the Application Core layer, and a choice of decoupled concrete classes within the Infrastructure layer which may be configured for the application via an IoC. For example, RestApiClient may be configured to be used by the application for production deployment; whereas a separate test build may be configured to use MockPaymentsApiClient to enable automated testing of the application independent of the Payments solution.
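The substitution described in Figure 7 can be illustrated with a minimal sketch. Note that this uses plain constructor injection with a hand-written composition function rather than InversifyJS, so that it runs without any dependencies; an IoC container would perform the equivalent binding of interface to concrete class through its own configuration. The `IPaymentsApiClient` interface, its `getPaymentStatus` method and the `buildOrderService` function are hypothetical names for illustration, the calls are synchronous for brevity, and the real HTTP call is deliberately omitted.

```typescript
// Application Core: the abstraction is owned by the core, not by infrastructure.
interface IPaymentsApiClient {
  getPaymentStatus(orderId: string): string;
}

// Application Core: the service depends only on the interface.
class OrderService {
  private readonly payments: IPaymentsApiClient;

  constructor(payments: IPaymentsApiClient) {
    this.payments = payments;
  }

  isPaid(orderId: string): boolean {
    return this.payments.getPaymentStatus(orderId) === "PAID";
  }
}

// Infrastructure: production adapter (the HTTP call is omitted in this sketch).
class RestApiClient implements IPaymentsApiClient {
  getPaymentStatus(orderId: string): string {
    throw new Error(`HTTP call for ${orderId} is not implemented in this sketch`);
  }
}

// Infrastructure (test build): mock adapter returning canned responses.
class MockPaymentsApiClient implements IPaymentsApiClient {
  getPaymentStatus(orderId: string): string {
    return orderId === "order-1" ? "PAID" : "PENDING";
  }
}

// Composition root: the only place that knows which concrete class is used.
function buildOrderService(useMock: boolean): OrderService {
  const client: IPaymentsApiClient = useMock ? new MockPaymentsApiClient() : new RestApiClient();
  return new OrderService(client);
}

// Usage in an automated test: no Payments system is needed.
const testOrderService = buildOrderService(true);
const orderOnePaid = testOrderService.isPaid("order-1"); // true
const orderTwoPaid = testOrderService.isPaid("order-2"); // false
```

Because `OrderService` never references a concrete client class, swapping the payment integration or the test double requires no change to the Application Core.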

Using the Onion Architecture constructs described in the previous section, our Order microservice application architecture may be further refined as the following:

Figure 8: A high-level class diagram describing a potential Order microservice using the Onion Architecture, with swimlanes denoting its layers.

A complete AWS Serverless implementation of the architecture, using Node.js with TypeScript and Inversify for Dependency Injection, will be described in the future Part Six article; the following screenshot, however, provides a taster of what to expect in terms of its code scaffolding. Its code is available to view at: https://github.com/TheBetterStore/tbs-app-order.

Figure 9: Screenshot illustrating NodeJS+Typescript source code scaffolding to realise the described application architecture.

Coming next in Part Four: Building the Microservices Architecture with Cloud Native Design Patterns and AWS Services.

References

“Domain Driven Design, Tackling Complexity in the Heart of Software”, Evans, E., Addison-Wesley (2003)
“Patterns, Principles, and Practices of Domain-Driven Design”, Millett & Tune, Wiley & Sons (2015)
“Practical Event-Driven Microservices Architecture”, Rocha, H., Apress (2022)
“Building Microservices”, Newman, S., O’Reilly (2015)
“Domain Driven Design & Microservices for Architects”, Sakhuja, R., Udemy (2021)
“Microsoft Application Architecture Guide, 2nd ed”, Microsoft, web (2013)
“Big Ball of Mud”, Foote & Yoder (University of Illinois), web (1999)
“The Onion Architecture: part 1”, Palermo, J., web (2008)
“DDD, Hexagonal, Onion, Clean, CQRS, … How I put it all together”, Graca, H., web (2017)
“Implementing SOLID and the onion architecture in Node.js with TypeScript and InversifyJS”, Jansen, R., web (2018)
“Onion Architecture Let’s slice it like a Pro”, Kapoor, R., web (2022)

Disclaimer: The views and opinions expressed in this article are those of the author only.

Achieving organisation-scoped AWS Config compliance using Cloudformation Lambda Hooks

BrycePC — Tue, 17 Mar 2026 08:24:49 +0000

Overview

AWS CloudFormation Hooks is an existing AWS feature that helps ensure that AWS resources created or updated in accounts as Infrastructure as Code (IaC), via CloudFormation or CDK, comply with an organization’s defined standards. Examples of such checks include ensuring that RDS, S3 or other data storage resources are configured with at-rest encryption, that log groups have retention policies, and that EC2 instances are not publicly exposed. These rules typically align strongly with, and promote, AWS Well-Architected best practices.

AWS first introduced the ability to create custom Cloudformation hooks in 2022, using Java or Python code to allow organizations to also define their own rules. Implementation required the following steps:
i. Initialising an AWS Cloudformation Hooks project using the Cloudformation CLI.
ii. Implementing the hook’s handler logic to evaluate compliance of resources included within Cloudformation stacks.
iii. Packaging and registering the hook in Cloudformation.

Unfortunately, creating these hook projects is often not simple: much boilerplate code is typically required, and packaging and registering the hooks requires further effort.

Since November 2024, however, AWS has made available two new hook types which make these configurations easier:

Guard Hooks: These allow the specification of AWS CloudFormation Guard rules to set requirements for resources within CloudFormation templates. Pros: They use an open-source Domain Specific Language (DSL) to define rules simply, rather than needing code to be compiled and packaged, and they have no associated execution charges. Cons: The DSL may lack flexibility of logic, and carries a learning curve in contrast to a more general language for which in-house skills may already exist.
Lambda Hooks: Custom logic for interrogating resources within a CloudFormation change set or stack can also be implemented as lambda functions. Lambda hooks may then simply be configured to reference the lambda implementations. Pros: They provide further flexibility for implementing compliance logic in contrast to Guard Hooks, while also being simpler to implement and configure than Custom Hooks. Rule behaviour can be easily amended within the lambda function, without requiring reconfiguration of the hook in CloudFormation. Hook rules may also be implemented in any language that lambda supports. Cons: AWS does not charge for the hook, but standard lambda execution charges do apply.

While considering how Cloudformation hooks may assist with ensuring standards compliance with the previously-introduced “The Better Store” sample project, I have chosen to explore lambda Cloudformation hooks further here, and how they may be integrated with AWS Organizations/Stack Sets to automatically validate Cloudformation templates for all AWS accounts/regions within defined Organization Units (OU’s). This setup and its results are described next.

Solution Prerequisites
The following are required for implementing the organisation-level hooks for the example:

AWS Organizations has been configured, with the following being set on the Organizations Master Account:
- Within AWS Organizations/Services section of the AWS web UI, the service “AWS Account Management” is enabled.
- A separate ‘Tools’ account has been provisioned within the Organization, which will be used for performing organization-scoped Cloudformation deployments via StackSets.
- The “Tools” account has been configured as a ‘Delegated Administrator’ for Cloudformation stacksets within the AWS Organizations master account (via the Cloudformation Stacksets web UI).
- Target accounts which are to be automatically configured with Cloudformation hooks are provisioned within target Organization Units in AWS Organizations (e.g. ‘non-prod’ for all non-production workload accounts).
A deployment bucket is created within the tools account, to store built lambda artefacts and Cloudformation templates which deploy them. A bucket policy is defined which provides GetObject access to its objects, to all accounts defined within the AWS Organization.
A deployment user is configured in the “Tools” account, which has the following permissions:
- Write access to the above S3 deployment bucket.
- Cloudformation permissions
- AWS Organizations query permissions
A user access key is also securely created for this user to perform deployments, and an AWS CLI profile ‘thebetterstore-tools’ is configured to use it. NB it is expected that a productionized process would perform deployments within a DevOps pipeline, and that IAM roles, which are preferred over user access keys, would be used instead.
The AWS CLI is installed (examples assume we are using a Linux environment, though Windows can also be used with minor tweaks).
NodeJS v20+ (the example will be implemented in JavaScript, to target the NodeJS 20.x runtime).
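The organization-wide read access on the deployment bucket described in the prerequisites can be expressed with the `aws:PrincipalOrgID` condition key. The sketch below builds such a policy document as a plain JavaScript object. The bucket name and organization ID are placeholder values, the function name is hypothetical, and this is one possible way to express the policy rather than necessarily the one used for The Better Store.

```javascript
// Builds an S3 bucket policy granting s3:GetObject to any principal that
// belongs to the given AWS Organization (placeholder values are used below).
function buildDeployBucketPolicy(bucketName, organizationId) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'AllowOrgAccountsToReadArtefacts',
        Effect: 'Allow',
        Principal: '*',
        Action: 's3:GetObject',
        // Object-level permission, so the resource is every key in the bucket
        Resource: `arn:aws:s3:::${bucketName}/*`,
        // Restricts the wildcard principal to accounts within the organization
        Condition: {
          StringEquals: { 'aws:PrincipalOrgID': organizationId },
        },
      },
    ],
  };
}

const deployBucketPolicy = buildDeployBucketPolicy(
  'lambdacfnhooks-1234-deploybucket', // placeholder bucket name
  'o-exampleorgid'                    // placeholder organization ID
);
console.log(JSON.stringify(deployBucketPolicy, null, 2));
```

The resulting JSON could then be supplied as the bucket's policy document, for example within the setup template.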

Implementation

The following illustrations describe my target deployment architecture for lambda Cloudformation hooks in an organization, and the composition of components used:

Figure 1. High-level Lambda Hook Deployment via AWS Organizations / StackSets. Deployment of organization-specific Cloudformation StackSets has been delegated to a separate Tools account, which is also responsible for building the lambda hooks project and publishing these to the deployment bucket. A Hooks stackset automatically-deploys changes to the lambda hooks stack (tbs-devops-cfnhooks) across existing and/or new AWS accounts within specified target OU’s.

Figure 2: Illustrating the composition of deployed CfnHooks stacks, hook configuration scope, and the relationship of the hooks to target resources

where main components are implemented with the following solution scaffolding:

bin
— deploy-setup.sh
— deploy-stackset.sh
deploy-setup/
— template-setup.yaml
— template-stackset.yaml
tbs-devops-cfnhooks/
— cfnhooks-lambda/
— — app.js
— — package.json
— cfnhooks-loggroup/
— — app.js
— — package.json
— template.yaml

Further details of these are as follows:

A. Lambda Hooks Template (template.yaml)

This defines the CloudFormation hook resources, their corresponding lambda functions and required IAM roles to be implemented in target AWS accounts, as below:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  tbs-devops-cfnhooks

Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      Path: '/'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"

  LambdaHookExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      Path: '/'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'hooks.cloudformation.amazonaws.com'
            Action:
              - sts:AssumeRole
            Condition:
              StringEquals:
                "aws:SourceAccount": !Sub ${AWS::AccountId}
              ArnLike:
                "aws:SourceArn": !Sub "arn:aws:cloudformation:${AWS::Region}:${AWS::AccountId}:type/hook/*"
      Policies:
        - PolicyName: !Sub ${AWS::StackName}-LambdaHookExecutionRole
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: lambda:InvokeFunction
                Resource:
                  - !GetAtt LambdaCfnHookFunction.Arn
                  - !GetAtt LogGroupCfnHookFunction.Arn

  LambdaHook:
    Type: AWS::CloudFormation::LambdaHook
    Properties:
      Alias: Tbs::Devops::LambdaCfnHooks
      ExecutionRole: !GetAtt LambdaHookExecutionRole.Arn
      FailureMode: FAIL
      HookStatus: ENABLED
      LambdaFunction: !GetAtt LambdaCfnHookFunction.Arn
      TargetFilters:
        TargetNames:
          - AWS::Lambda::Function
        Actions:
          - CREATE
          - UPDATE
        InvocationPoints:
          - PRE_PROVISION
      TargetOperations:
        - STACK
        - RESOURCE
        - CHANGE_SET
        - CLOUD_CONTROL

  LogGroupHook:
    Type: AWS::CloudFormation::LambdaHook
    Properties:
      Alias: Tbs::Devops::LogGroupCfnHooks
      ExecutionRole: !GetAtt LambdaHookExecutionRole.Arn
      FailureMode: FAIL
      HookStatus: ENABLED
      LambdaFunction: !GetAtt LogGroupCfnHookFunction.Arn
      TargetFilters:
        TargetNames:
          - AWS::Logs::LogGroup
        Actions:
          - CREATE
          - UPDATE
        InvocationPoints:
          - PRE_PROVISION
      TargetOperations:
        - STACK
        - RESOURCE
        - CHANGE_SET
        - CLOUD_CONTROL

  LambdaCfnHookFunction:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - x86_64
      Code: cfnhooks-lambda/
      Handler: app.handler
      MemorySize: 256
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs20.x
      ReservedConcurrentExecutions: 10
      Timeout: 10

  LogGroupCfnHookFunction:
    Type: AWS::Lambda::Function
    Properties:
      Architectures:
        - x86_64
      Code: cfnhooks-loggroup/
      Handler: app.handler
      MemorySize: 256
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs20.x
      ReservedConcurrentExecutions: 10
      Timeout: 10

B. Sample Lambda code (cfnhooks-loggroup)
The following code illustrates the implementation of a LogGroup inspection lambda hook, which checks that log groups are encrypted with a KMS key, and that a retention period has been set:

export const handler = async (event, context) => {
  // Details of the resource being evaluated, as supplied by the CloudFormation hook
  const targetModel = event?.requestData?.targetModel;
  const targetName = event?.requestData?.targetName;

  // Default to success; overridden below if any check fails
  const response = {
    hookStatus: "SUCCESS",
    message: "LogGroup is correctly configured.",
    clientRequestToken: event.clientRequestToken
  };

  if (targetName === "AWS::Logs::LogGroup") {
    const retentionInDays = targetModel?.resourceProperties?.RetentionInDays;
    const kmsKeyId = targetModel?.resourceProperties?.KmsKeyId;

    // Collect every failed check so that all issues are reported together
    const errors = [];
    if (!retentionInDays) {
      errors.push("LogGroup RetentionInDays must be present.");
    }
    if (!kmsKeyId) {
      errors.push("LogGroup KmsKeyId must be present.");
    }

    if (errors.length > 0) {
      response.hookStatus = "FAILED";
      response.errorCode = "NonCompliant";
      response.message = errors.join("\n");
    }
  }
  return response;
};
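A handler of this shape can be exercised locally, before deployment, by passing it hand-built events. The sketch below restates the same checks as a synchronous helper so that it is self-contained; `evaluateLogGroup` and `sampleEvent` are names introduced here for illustration. The sample events contain only the fields that the handler reads, whereas real hook events carry additional fields, and the KMS key ARN is a placeholder.

```javascript
// Restates the LogGroup checks as a pure, synchronous function (hypothetical helper).
function evaluateLogGroup(event) {
  const targetName = event?.requestData?.targetName;
  const properties = event?.requestData?.targetModel?.resourceProperties;
  const errors = [];

  if (targetName === 'AWS::Logs::LogGroup') {
    if (!properties?.RetentionInDays) errors.push('LogGroup RetentionInDays must be present.');
    if (!properties?.KmsKeyId) errors.push('LogGroup KmsKeyId must be present.');
  }

  if (errors.length > 0) {
    return {
      hookStatus: 'FAILED',
      errorCode: 'NonCompliant',
      message: errors.join('\n'),
      clientRequestToken: event.clientRequestToken,
    };
  }
  return {
    hookStatus: 'SUCCESS',
    message: 'LogGroup is correctly configured.',
    clientRequestToken: event.clientRequestToken,
  };
}

// Builds a minimal event containing only the fields read above.
function sampleEvent(resourceProperties) {
  return {
    clientRequestToken: 'local-test-token',
    requestData: {
      targetName: 'AWS::Logs::LogGroup',
      targetModel: { resourceProperties },
    },
  };
}

const nonCompliantResult = evaluateLogGroup(
  sampleEvent({ LogGroupName: 'ANonCompliantLogGroup' })
);
const compliantResult = evaluateLogGroup(
  sampleEvent({
    LogGroupName: 'ACompliantLogGroup',
    RetentionInDays: 30,
    KmsKeyId: 'arn:aws:kms:ap-southeast-2:111122223333:key/example', // placeholder ARN
  })
);

console.log(nonCompliantResult.hookStatus); // FAILED
console.log(compliantResult.hookStatus);    // SUCCESS
```

Keeping the rule logic in a pure function like this also makes it straightforward to cover with unit tests, independent of the Lambda runtime.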

C. StackSet Template (deploy-setup/template-stackset.yaml):

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  This template creates a service-managed StackSet, which deploys the lambda hooks stack to accounts within the
  target organization units of this AWS Organization

Parameters:
  TargetOrgUnitIds:
    Description: Target organization units for deploying solution
    Type: CommaDelimitedList

  StackSetName:
    Type: String
    Default: tbs-devops-cfnhooks-stackset

  TemplateUrl:
    Description: S3 URL of the CF template which defines our SAM solution for lambda hooks
    Type: String

Resources:
  CfnLambdaHookStackset:
    Type: AWS::CloudFormation::StackSet
    Properties:
      AutoDeployment:
        Enabled: true
        RetainStacksOnAccountRemoval: false
      CallAs: DELEGATED_ADMIN # Required when deploying to delegated admin accounts for an org
      Capabilities: [CAPABILITY_IAM, CAPABILITY_NAMED_IAM]
      PermissionModel: SERVICE_MANAGED
      StackInstancesGroup:
        - Regions:
            - ap-southeast-2
          DeploymentTargets:
            OrganizationalUnitIds: !Ref TargetOrgUnitIds
      StackSetName: !Ref StackSetName
      TemplateURL: !Ref TemplateUrl

D. StackSet Deployment Bash Script

#!/bin/bash

stackName="tbs-devops-cfnhooks-stackset"
region="ap-southeast-2"
toolsAccountId="1234" # To replace with real value
targetOrgUnitIds="ou-234,ou-34534" # To replace with real values
deployBucket="lambdacfnhooks-${toolsAccountId}-deploybucket"
currentTime=$(date +"%Y%m%d%H%M%S")

cd ../tbs-devops-cfnhooks

# First we package our cfn-hooks lambda solution
aws cloudformation package --template-file template.yaml \
--s3-bucket $deployBucket --s3-prefix $stackName --region $region \
--output-template-file generated-template.yaml \
--profile thebetterstore-tools

# Next export our generated template to S3
aws s3 cp ./generated-template.yaml s3://$deployBucket/generated-template.yaml \
--profile thebetterstore-tools


# Next deploy the stackset, defined in IaC (Cfn) to our tools account, which will then manage deployment of the hooks to accounts
# within specified target OU's. NB use lastupdatedtime tag to ensure deployment/that changes are detected
cd ../deploy-setup
aws cloudformation deploy --template-file template-stackset.yaml --stack-name "$stackName" \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM --region "$region" \
--parameter-overrides TemplateUrl="https://${deployBucket}.s3.${region}.amazonaws.com/generated-template.yaml" \
TargetOrgUnitIds="${targetOrgUnitIds}" \
--tags lastupdatedtime="$currentTime" \
--profile thebetterstore-tools

Once deployment has completed, the following should be visible in the AWS CloudFormation UI of target accounts:

Finally, we can check our hook functionality by attempting to deploy a stack containing non-compliant resources. For this exercise, I attempted to deploy a new stack called tbs-devops-cfnhooktest in an AWS account belonging to a target Organization Unit, which comprised:

Resources:
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: ANonCompliantLogGroup

i.e. containing a log group missing the required retention period and KmsKeyId properties.

Attempting to deploy this resulted in the following errors being thrown by our new Cloudformation LogGroup hook:

Conclusion

Traditional methods for helping to ensure compliance of automatically-deployed resources using Cloudformation/CDK in AWS have generally included developer training, code reviews, ‘shift-left’ guards in deployment pipelines, and detective controls using AWS Config and/or SecurityHub. It is however still very easy for non-compliant configurations to be missed and creep in, particularly in cases where guards or other controls may not be available for an organization’s internal standards such as naming conventions. Security, reliability, performance and maintainability of deployed solutions can all suffer as a result of standards not being met.

The use of custom Lambda Hooks as preventative controls, however, looks very promising for tackling this challenge; they offer great flexibility for implementing whatever checks or naming conventions may be required, and for generating custom and meaningful errors when there are issues. Furthermore, their implementation and deployment across accounts within AWS Organizations proved fairly straightforward in this exercise. Targeting Lambda Hooks deployments against Organization Units within AWS Organizations, using ‘Service-Managed’ StackSets with automatic deployment enabled, also means that any new accounts created within these OUs will be configured automatically.

It may be that development OUs should be targeted initially, to ensure that compliance issues of stacks are caught during their maintenance, and not for example during an emergency hotfix against production! However, with the gradual implementation of the hooks across environments, hopefully AWS Config non-compliance alerts may become a thing of the past!

Code used in this post is also available via my public GitHub repository at: https://github.com/TheBetterStore/tbs-devpops-cfnhooks.

References

AWS CloudFormation Hooks concepts, AWS documentation (web)
Writing AWS CloudFormation Guard rules, AWS documentation (web)
AWS CloudFormation Hooks now support custom AWS Lambda functions, AWS documentation (web)
Activate trusted access for stack sets with AWS Organizations, AWS documentation (web)
Create a resource-based delegation policy with AWS Organizations, AWS documentation (web)
Building “The Better Store” an agile cloud-native ecommerce system on AWS — Part 1: Introduction to Microservice Architecture, Bryce Cummock (medium/web)

Disclaimer: The views and opinions expressed in this article are those of the author only.

Building “The Better Store” an agile cloud-native ecommerce system on AWS — Part 2: Defining DDD Strategic Patterns

BrycePC — Tue, 17 Mar 2026 07:59:20 +0000

In Part One of Building the Better Store, I provided an introduction of an envisaged eCommerce website, and how Domain Driven Design (DDD) and its Strategic Patterns may be used to help compose a decoupled Microservices Architecture (MSA) that would promote its future change agility, scalability and fault-tolerance.

This article focuses on illustrating and using these patterns for design of The Better Store.

Introducing Domain Driven Design

Domain Driven Design (DDD) is a methodology initially defined in 2003 in Eric Evans’s popular and well-received book ‘Domain-Driven Design: Tackling Complexity in the Heart of Software’ [1]. It is aimed at decomposing the complexity of a Problem Space (that is, a business’s functional domain/capabilities that require development) into a domain model comprised of one or more decoupled and cohesive Bounded Contexts; these are sets of well-organised definitions of objects and business rules that share the same ubiquitous language understood by both business stakeholders and technical staff, to offer easier interpretation and design of software. These Bounded Contexts define the problem’s Solution Space.

While DDD predates the advent of microservices by almost 10 years, its goals and defined patterns for designing systems as cohesive, decoupled and collaborative bounded contexts are strongly aligned with microservice architecture design principles, for implementing cohesive and decoupled services that realize advantages of change agility, resilience and scalability within cloud environments. As such, DDD has become a popular reference today when discussing microservice architectures.

Domain-driven design is aimed at promoting system design quality via the following methods:

A. Using Strategic Patterns to:

Distill the Problem Space, i.e. an organisation’s business capabilities, to identify features important to the business by:
- Using ‘Knowledge Crunching’ methods to extract relevant information and help identify and decompose the problem space into separate manageable subdomains.
- Defining a Shared/Ubiquitous Language (UL) for each subdomain that is understood by both business and technical stakeholders, which should subsequently be used for documentation and code artefacts (e.g. for the names of classes and their methods and attributes).
Produce a Solution Space comprising Bounded Contexts to demarcate and model solutions for identified subdomains. This should include a Context Map that defines relationships between the bounded contexts.

For example (with credit to Millett & Tune [3]):

Figure 1: Illustrating DDD Strategic Patterns for defining a Problem Space

Figure 2: DDD patterns for a corresponding Solution Space

B. Using Tactical Patterns to:

Create an effective Object-Oriented domain model for each Bounded Context, using defined patterns such as Entity, Aggregate, Value Object, Service, Factory and Repository.
Illustrate emerging patterns of domain events and event sourcing.

This article will next focus on application of Strategic Patterns for design of The Better Store. The application of tactical patterns will be the focus in my next article: Part 3: Defining DDD Tactical Patterns.

The Better Store & The Problem Space

As an initial step in the design of The Better Store, we would expect one or more appointed business domain experts to collaborate with the development team in ‘Knowledge Crunching’ sessions, using popular techniques such as ‘Event Storming’ for fast sharing and discussion of ideas for desired functional behaviour. Event Storming has been touted as having much success in recent years, for sharing ideas of system behaviour and scope between both technical and non-technical people [4].

The outputs of Event Storming workshops are highly visual diagrams conveying envisaged business contexts and behaviours. From here, main use cases may be chosen that will become part of the ‘Core Domain’ for implementation of a Minimal Viable Product (MVP). More explicit specifications for these can then be defined using tools such as:

Behaviour-Driven Development (BDD) specifications to focus on the behaviour of the main use cases oriented around specific scenarios.
Class Responsibility Collaborator (CRC) cards to define potential structural elements from these, and the interactions between them.

The application of these techniques will be illustrated at a very high-level below for an MVP definition of The Better Store. Further reading is recommended for those that wish to learn more of the techniques; the scope for each of these, in particular Event Storming and BDD is large!

NB: the methodologies chosen here have been selected as examples for The Better Store as a new application. Other techniques would undoubtedly also come into play for other scenarios; for example, exploring published models for similar applications, or referencing existing documentation for a current application that is to be enhanced or refactored.

A. Identifying System Behaviour Using Event Storming
Alberto Brandolini first introduced this methodology in his book Introducing Event Storming [3], whereby analysis of a business process involves bringing together the development team and the right business process expert(s) from applicable “knowledge silos” within the business as participants in collaboration workshops. Within the workshops, participants are requested by the workshop mediator to use different colour-coded sticky notes to model a flow of domain events, commands and external systems over time (from left to right) i.e. temporal modelling on a clear wall space. Online collaboration tools such as Miro are also available to simulate this activity for teams where participants are in remote locations.

Event storming is often performed in a number of phases as defined and controlled by the mediator as appropriate for the system and/or goal; for example for a high-level design of The Better Store we will derive these from Brandolini’s ‘Big Picture Workshop’ configuration to use:

Phase 1. Chaotic Exploration

All participants add orange notes to represent expected domain events (verbs in the past tense), and lilac notes to represent questions or unknowns (specific discussions should be kept short at this time; our focus here is coverage).

Phase 2. Timeline Enforcement

Events are rearranged to restrict them to a single timeline. This should result in further discussions and help identify gaps or unknowns.

Phase 3. Commands, People and Systems

Blue notes are added to represent commands, which may be called by external users or systems to generate the events.
Pink notes are then added to represent external systems that may be used.
Finally, yellow notes with a User icon are added to represent people that interact with the system.

Phase 4. Explicit Walkthrough

Different narrators take the lead for describing the behaviour for different portions of the system, for further review by other participants.

Phase 5. Identifying Aggregates

We can use yellow notes to denote potential ‘Aggregates’; that is, system entities that have their own identity id, sub-elements and transactional lifecycle (we will be discussing these more in the next Tactical Patterns article). An example is Order, which will be required to have its own unique id for storage within a database, alongside its selected product items as attributes (which would also be persisted when an order is created or updated in a data store).

Phase 6. Problems and Opportunities

Additional time for participants to add further questions (lilac notes), or opportunities (green notes).

Phase 7. Wrap up

Ensure photos of the board are taken (and notes are kept if hard to read from photos), to allow its referencing for analysis and modelling later.

The following provides an example output from a remotely-run (using Miro) Event Storming session for The Better Store.

Figure 3: Event Storming sample output for The Better Store
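To make the notion of an Aggregate from Phase 5 more concrete, the Order example can be sketched in code as below. This is only an illustration of the idea that an aggregate carries its own identity and owns its sub-elements; the class shape, field names and the rule for merging repeated products are assumptions made here, not the design used in The Better Store. Prices are held as whole cents to avoid floating-point rounding.

```javascript
// Sketch of an Order aggregate: it has its own identity and owns its line items,
// which are only ever changed through the aggregate itself.
class Order {
  constructor(orderId, customerId) {
    if (!orderId) throw new Error('An Order requires its own unique id');
    this.orderId = orderId;
    this.customerId = customerId;
    this.items = [];
  }

  // Adds a product to the order, merging quantities for a repeated product.
  addItem(productId, quantity, unitPriceCents) {
    if (quantity <= 0) throw new Error('Quantity must be positive');
    const existing = this.items.find((item) => item.productId === productId);
    if (existing) {
      existing.quantity += quantity;
    } else {
      this.items.push({ productId, quantity, unitPriceCents });
    }
  }

  // Total is derived from the items, so it cannot drift out of step with them.
  get totalCents() {
    return this.items.reduce(
      (sum, item) => sum + item.quantity * item.unitPriceCents,
      0
    );
  }
}

const order = new Order('order-001', 'customer-42');
order.addItem('product-a', 2, 1000);
order.addItem('product-b', 1, 500);
order.addItem('product-a', 1, 1000);

console.log(order.items.length); // 2
console.log(order.totalCents);   // 3500
```

When such an order is created or updated, the order and its items would be persisted together, which is the transactional lifecycle referred to in Phase 5.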

B. Defining Functional Behaviour with Behaviour Driven Development (BDD) Specifications

BDD is a software development process based on Test Driven Development (TDD), which focuses on explicitly defining functional scenarios and required behaviour, using its own ‘GWT’ (Given, When, Then) specification language. This format is also intended to provide a language that is understood by both business and technical stakeholders for defining requirements, oriented around user stories, or ‘features’. Examples for “The Better Store” generated as BDD specifications may include:

Using this method of capturing requirements removes the ambiguity that traditional requirements documentation can result in while also focusing on the domain language. The generation of these can also help with formulation of a subdomain’s Ubiquitous Language [2].
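As a hedged illustration of the Given/When/Then structure described above, a scenario might be captured as structured data and checked in plain JavaScript. The feature, the scenario wording, and the `addToCart` and `runScenario` functions are hypothetical and are not taken from The Better Store's own specifications.

```javascript
// Hypothetical behaviour under specification: returns a new cart with the
// product added, merging quantities if the product is already present.
function addToCart(cart, productId, quantity) {
  const existing = cart.find((line) => line.productId === productId);
  if (existing) {
    return cart.map((line) =>
      line.productId === productId
        ? { ...line, quantity: line.quantity + quantity }
        : line
    );
  }
  return [...cart, { productId, quantity }];
}

// One scenario expressed in Given / When / Then form.
const scenario = {
  feature: 'Shopping cart',
  name: 'Customer adds more of a product that is already in the cart',
  // Given a cart containing one unit of product-a
  given: () => [{ productId: 'product-a', quantity: 1 }],
  // When the customer adds two more units of product-a
  when: (cart) => addToCart(cart, 'product-a', 2),
  // Then the cart holds a single line with three units
  then: (cart) => cart.length === 1 && cart[0].quantity === 3,
};

// Runs the three steps in order and reports whether the expectation held.
function runScenario({ given, when, then }) {
  return then(when(given()));
}

const scenarioPassed = runScenario(scenario);
console.log(`${scenario.name}: ${scenarioPassed ? 'passed' : 'failed'}`);
```

In practice teams typically write such scenarios in a dedicated specification format and run them with a BDD tool such as Cucumber; the sketch only shows how the three steps relate to each other.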

C. Defining Functional Components with Class Responsibility Collaborator (CRC) cards
CRC Card modelling is an object-oriented technique that involves identifying the main system actors, e.g. from the Aggregates first deduced during Event Storming or entities noted in the BDD specifications above, and representing these visually for easy collaboration on separate cards.
Each CRC card should capture the following:

A class name, which represents a known concept within the domain and is easily understood by business and technical members (this will go into our Ubiquitous Language).
Class responsibilities.
Associated classes. A class often does not have sufficient information to fulfil its responsibilities and must collaborate with other classes to complete their task. Such collaboration may be either: i. a request for information from another class, or ii. a command to perform an action.

e.g.

Figure 4: CRC card examples for The Better Store

Distilling the problem space into subdomains

The above exercises, in particular Event Storming and CRC cards assist us with demarcating functional requirements into separate subdomains, identifying dependencies between them, and an initial Ubiquitous Language for each. As a final step in defining our problem space, we would like to further categorise these in order of importance to our business. The outcome of this exercise is to prioritise functionality for development focus, to ensure that tasks that provide the most competitive advantage receive the most attention.

Core Subdomains
These cover the most important part of the business that provides its competitive advantage. Identification of the core domain(s) here helps provide clarity of the software that should receive the greatest development focus. For The Better Store these have been identified as:

Supporting Subdomains
These provide supporting functions to the core domains. If possible, Commercial Off The Shelf (COTS) products should also be used.

Generic Subdomains
Generic subdomains provide common functionality that is not core to the business and could also be provided by COTS software (again freeing up developers to focus on the core areas).

The following diagram summarises the subdomains identified for our problem space, and their dependencies:

Figure 5: High-level domain model

Defining the Solution Space

The Solution Space provides a model for realizing the needs of the requirements given in the Problem Space, for example by defining appropriate Bounded Contexts (BC’s) for each subdomain to implement. Each bounded context is also provided with its own Ubiquitous Language; much of which should have been defined during analysis of the Problem Space for the belonging subdomains. In this way each bounded context is kept cohesive to a specific functional area.

Following the modelling of Bounded Contexts, the solution space also defines Context Mapping, using Integration Patterns to define the collaboration relationships between these. This is an important output which will highlight the degree of decoupling between services, and the shared knowledge required by the teams that own them.

DDD’s Integration Patterns are described below [1, 10]:

Figure 6: DDD integration patterns

The patterns are categorised and described below, together with their Context Mapping annotations; usage of these annotations is illustrated in the sample context mapping diagram for The Better Store that follows.

Symmetric Patterns: where 2 BC’s have related interdependencies.
- Separate Ways; BC’s are independent with no relationships between them. Teams can work at their own pace. This is an ideal scenario!
- Partnership; 2 BC’s are mutually dependent on each other; i.e., they are tightly coupled. Teams need to know the business models and UL of the other team. Changes to 1 BC need to be coordinated between teams. This is an anti-pattern and should be avoided where possible.
- Shared Kernel; A move to demarcate shared models used between BC’s. e.g., as a separate shared library and UL. This should be kept to a minimal number of contexts.
Asymmetric Patterns: where one BC is dependent on another. The dependent BC is termed Downstream (D), whereas the provider/host is termed Upstream (U). BC’s in D hence have knowledge of models in U BC’s.
- Customer-Supplier; An upstream BC exposes models specifically for the needs of a downstream BC; i.e., in a client/host relationship.
- Conformist; An upstream BC exposes models with no regards to any downstream BC. The downstream BC conforms to the upstream BC’s models.
- Anti-Corruption Layer (ACL); where a downstream BC is NOT conformist; an isolated transformation layer is used to protect the downstream BC from corruption, i.e. from using the upstream domain’s model. The ACL only has the knowledge of models from U and D to perform necessary mappings from U’s model to the downstream BC (D).
One-to-many Patterns
- Open-Host Service (OHS); the upstream provider/OHS offers common services to other BC’s. The downstream BC’s may choose to either conform to U, or use an ACL.
- Published Language (PL); The OHS provides a common language accepted by a downstream BC. These are denoted as OHS | PL for Upstream in a context mapping diagram. OpenAPI specifications for REST API’s are an example of these.
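
As a minimal sketch of the Anti-Corruption Layer pattern described above, the Python below translates a hypothetical upstream payment model into a downstream model expressed in the downstream BC’s own language. All names here (UpstreamPaymentRecord, OrderPayment, PaymentAcl, the field names and the status codes) are assumptions for illustration only; they are not taken from The Better Store’s source code.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical upstream (U) model, e.g. as returned by an external provider.
@dataclass(frozen=True)
class UpstreamPaymentRecord:
    txn_ref: str
    amt_cents: int
    ccy: str
    state_code: str  # assumed upstream codes: "S", "F", "P"


# Hypothetical downstream (D) model, expressed in the downstream BC's own UL.
@dataclass(frozen=True)
class OrderPayment:
    payment_id: str
    amount: Decimal
    currency: str
    status: str  # "PAID", "FAILED" or "PENDING"


class PaymentAcl:
    """Isolated translation layer: the only place that knows both models."""

    _STATUS_MAP = {"S": "PAID", "F": "FAILED", "P": "PENDING"}

    def to_order_payment(self, record: UpstreamPaymentRecord) -> OrderPayment:
        try:
            status = self._STATUS_MAP[record.state_code]
        except KeyError:
            # Reject unknown upstream values rather than leak them downstream.
            raise ValueError(f"Unmapped upstream state: {record.state_code!r}")
        return OrderPayment(
            payment_id=record.txn_ref,
            amount=Decimal(record.amt_cents) / Decimal(100),
            currency=record.ccy.upper(),
            status=status,
        )
```

With this shape, a change to the upstream field names or status codes would only require a change inside PaymentAcl; the downstream BC continues to work with OrderPayment.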

Note that we want to avoid the ‘Big Ball of Mud’: the anti-pattern that often results from unchecked growth of a monolith over time, where, without deliberate design practices, code becomes unstructured and very hard to extend and maintain.

A context mapping diagram for The Better Store may be modeled as:

Figure 7: A context map for The Better Store

Next Steps: Creating a Model Driven Design

The strategic patterns shown above have provided a high-level design of expected service decomposition as Bounded Contexts and the relations between them. Key outputs are:

- Decomposition of bounded contexts, with each having its own Ubiquitous Language and business rule definitions.
- Context mapping, to define the required relationships between them, with maximum decoupling favoured.
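
One possible way to keep the context mapping output reviewable is to record it as data and flag the relationships that imply shared models or tight coupling. The sketch below is an assumption-laden illustration: the Relationship type, the choice of Partnership and Shared Kernel as the patterns to review, and the context names are all hypothetical, and the entries are NOT the relationships shown in Figure 7.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    SEPARATE_WAYS = "Separate Ways"
    PARTNERSHIP = "Partnership"
    SHARED_KERNEL = "Shared Kernel"
    CUSTOMER_SUPPLIER = "Customer-Supplier"
    CONFORMIST = "Conformist"
    ACL = "Anti-Corruption Layer"
    OHS_PL = "Open-Host Service | Published Language"


@dataclass(frozen=True)
class Relationship:
    upstream: str    # for symmetric patterns the ordering is arbitrary
    downstream: str
    pattern: Pattern


# Assumption: these are the patterns treated as needing team review,
# since they imply mutual dependency or shared models.
REVIEW_PATTERNS = {Pattern.PARTNERSHIP, Pattern.SHARED_KERNEL}


def relationships_to_review(context_map: list) -> list:
    """Return relationships whose pattern implies tight coupling."""
    return [r for r in context_map if r.pattern in REVIEW_PATTERNS]


# Illustrative entries only; hypothetical context names.
context_map = [
    Relationship("Catalog", "Order", Pattern.OHS_PL),
    Relationship("PaymentProvider", "Payment", Pattern.ACL),
    Relationship("Order", "Payment", Pattern.PARTNERSHIP),
]

for rel in relationships_to_review(context_map):
    print(f"{rel.upstream} <-> {rel.downstream}: {rel.pattern.value}")
```

A list like this could be checked into the repository alongside the diagram, so that a newly introduced Partnership or Shared Kernel relationship is visible in review.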

As can be seen, DDD is a broad and involved methodology, and there are many tools available for decomposing the problem space to define the required outputs. Its full application is best suited to large and complex domains that are difficult to manage with other techniques [11]. However, I believe many of the techniques discussed here, such as Event Storming, CRC cards, Ubiquitous Language definitions and decoupled integration patterns, would also prove valuable for defining microservice architectures for less complex domains.

Coming next in Part Three: Defining DDD Tactical Patterns. This will explore the more technical and object-oriented modelling of bounded contexts to complete the solution space. We will also be drilling into the tactical patterns for the realization of cohesive and decoupled microservices for The Better Store.

References

1. “Domain Driven Design, Tackling Complexity in the Heart of Software”, Evans, E., Addison-Wesley (2003)
2. “Patterns, Principles, and Practices of Domain-Driven Design”, Millett & Tune, Wiley & Sons (2015)
3. “Introducing Event Storming”, Brandolini, A., LeanPub.com (2021)
4. “Event Storming: Collaborative Learning for Complex Domains”, Rayner, P., talk at SATURN (2017), https://www.youtube.com/watch?v=vf6xoi2d9VE
5. “Practical Event-Driven Microservices Architecture”, Rocha, H., Apress (2022)
6. “Open Agile Architecture”, The Open Group, https://pubs.opengroup.org/architecture/o-aa-standard/index.html (2020)
7. “Implementing Domain Driven Design”, Vernon, V., Addison-Wesley (2013)
8. “Building Microservices”, Newman, S., O’Reilly (2015)
9. “Big Ball of Mud”, Foote & Yoder (University of Illinois), web (1999)
10. “Domain Driven Design & Microservices for Architects”, Sakhuja, R., Udemy (2021)
11. “Microsoft Application Architecture Guide, 2nd ed”, Microsoft, web (2013)

Disclaimer: The views and opinions expressed in this article are those of the author only.
