Orchestrating Complex Serverless Workflows on AWS

TL;DR: Chaining Lambda functions directly makes your app hard to manage and easy to break. AWS Step Functions lets you control the steps in your app with built-in error handling and easy tracking. AWS EventBridge lets parts of your app send messages (events) to each other without being directly connected.
Pattern 1: Use Step Functions to run long tasks in the background while your app stays fast.
Pattern 2: Use EventBridge to start jobs automatically when something happens, like a new customer signing up.
These tools make your serverless app easier to grow, fix, and keep working well.

Table of Contents

  1. Introduction
  2. Why Do Orchestration and Events Matter?
  3. AWS Step Functions - Your Workflow Manager
  4. AWS EventBridge – Your Serverless Event Bus
  5. Pattern 1 - Asynchronous API Processing with Step Functions
  6. Pattern 2 - Event Driven Workflow Triggering with EventBridge
  7. Practical Tips
  8. Taking the Next Step
  9. References

Introduction

So, you've learned how to use AWS Lambda. You can create functions, call them using API Gateway, and save data in DynamoDB. That’s great! But what happens when your app starts getting bigger and more complex?

When one user action needs to do many things, like calling different services, handling errors well, and making sure everything happens in the right order, just linking Lambda functions can get messy. It can feel like a game of pinball, where you lose track of what’s happening.

When you try to handle state, retries, and errors across multiple Lambda functions, things get hard. You also need to see what’s going on when a process has many steps. That’s where AWS’s orchestration and event tools help.

Two tools are especially useful here: AWS Step Functions and AWS EventBridge.

  • EventBridge acts like a message system that lets different parts of your app (and other services) send and receive events without directly calling each other. This keeps your app flexible and able to handle changes or failures better.

  • Step Functions lets you create a visual workflow that shows the steps and how they connect, like a flowchart for your app.

This guide helps you go beyond basic Lambda.

We will look at two practical patterns using Step Functions and EventBridge. These patterns help you build stronger, easier-to-maintain, and more scalable serverless applications on AWS.

Why Do Orchestration and Events Matter?

Before we go into Step Functions and EventBridge, let’s talk about why these tools are important when your serverless apps grow.

Imagine you’re building a multi-step order system with just Lambda functions calling each other:

  1. ProcessOrder gets the order.
  2. It calls ValidateInventory.
  3. If inventory is fine, it calls ProcessPayment.
  4. If payment works, it calls ShipOrder.
  5. But if something fails, what do you do? Roll back? Tell the user? Retry?
  6. How do you know which step is running? Or if it’s finished?
  7. If ProcessPayment takes a long time, does the first function just wait and risk timing out?

Chaining Lambdas like this ties the functions too closely together, and the system becomes fragile. Error handling spreads across different functions, and keeping track of the whole process gets tricky. People call this the "Lambda Pinball" anti-pattern, because your logic bounces from function to function like a pinball in a machine.

Lambda Pinball Anti-Pattern Diagram

This is where orchestration and event-driven patterns help a lot:

  • Orchestration (Step Functions): It gives you one place to define and manage the workflow. Step Functions keep track of state between steps, handle retries and errors, and let you see what’s happening.

  • Event-Driven (EventBridge): It separates services. Instead of calling each other directly, functions send events like OrderPlaced, and other services listen for those events and act on them. This makes the system stronger: if one service is down, the others can still work. It’s also easier to add new features, since you can add a new listener for the same event without changing existing services.

Using Step Functions for workflows and EventBridge for events helps you build serverless systems that are easier to manage, grow, and handle failures.

AWS Step Functions - Your Workflow Manager

Think of AWS Step Functions as a tool to design and run workflows. You define the steps using JSON in the Amazon States Language. This definition creates a state machine: a system that controls how each step runs, keeps track of the current step, and handles errors and retries for you.

Basic Step Functions Workflow Diagram

Key Benefits:

  1. Automatic State Management: Step Functions keeps track of data between steps, so you don’t have to pass or store it manually.

  2. Built-in Error Handling: You can set rules to retry on temporary errors or catch specific errors right in the workflow, making error handling easier and centralized.

  3. Supports Long Tasks: Standard workflows can run for up to a year, which suits processes that take a long time or wait on human input, far beyond Lambda’s 15-minute timeout.

  4. Run Steps in Parallel: You can run several tasks at the same time and wait for all or some to finish before moving on.

  5. Direct AWS Service Calls: Step Functions can call many AWS services directly, like Lambda, SQS, DynamoDB, and others, with no extra code needed for simple calls.

  6. Clear Visibility: You get a visual view in the AWS console showing each step’s input, output, and errors, which helps a lot with debugging.

Here's a conceptual snippet of what a state machine definition might look like:

{
  "Comment": "A simple example of a Step Functions state machine",
  "StartAt": "ValidateInput",
  "States": {
    "ValidateInput": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ValidateLambda",
      "Next": "ProcessData"
    },
    "ProcessData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessLambda",
      "Retry": [{
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2
      }],
      "Catch": [{
        "ErrorEquals": ["States.TaskFailed"],
        "Next": "NotifyFailure"
      }],
      "End": true
    },
    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:MySNSTopic",
        "Message.$": "$.Cause"
      },
      "End": true
    }
  }
}

This simple example shows defining states (ValidateInput, ProcessData, NotifyFailure), linking them (Next), and adding retry/catch logic.
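To make the Retry block in ProcessData concrete, here is a small Python sketch of the wait times it produces. The helper name retry_delays is made up for illustration. It relies on the Amazon States Language rule that each retry waits IntervalSeconds multiplied by BackoffRate once for every retry already made, and that MaxAttempts counts retries, not the first try.

```python
def retry_delays(interval_seconds: float, backoff_rate: float, max_attempts: int) -> list:
    """Return the wait, in seconds, before each retry attempt."""
    # Retry number n (starting at 0) waits interval * rate ** n.
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Values taken from the ProcessData Retry block above.
delays = retry_delays(interval_seconds=2, backoff_rate=2, max_attempts=3)
print(delays)       # [2, 4, 8]
print(sum(delays))  # 14
```

So with these settings, a failing ProcessData task is retried after 2, 4, and 8 seconds before the error is treated as final and the Catch rules are evaluated.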

AWS EventBridge – Your Serverless Event Bus

Step Functions manages workflows you define, but EventBridge handles events you might not know about yet. It works like a central hub where events from AWS services, your apps, or external SaaS tools flow through and get routed to the right places automatically.

Basic EventBridge Event Bus Diagram

Key Benefits:

  1. Decoupling: Event producers don’t need to know who will handle the event, and handlers don’t need to know who sent it. They just send or listen for events. This makes your system more flexible and stronger.

  2. Content-Based Filtering: You can set rules to catch only certain events based on what’s inside them.

  3. Flexible Routing: One event can trigger many targets like Lambda, Step Functions, SQS, and more.

  4. Many Event Sources: EventBridge works with more than 90 AWS services and many SaaS tools. You can react to things like new S3 files, DynamoDB changes (delivered through DynamoDB Streams), or partner events from tools like Datadog.

  5. Schema Registry: Store and share event formats so teams understand them better and can even generate code for handling events.

Example:

Say users upload images to an S3 bucket. Instead of making S3 call your image processor directly, you can:

  1. Turn on EventBridge notifications for the bucket so S3 sends Object Created events to EventBridge.
  2. Create a rule that listens only for .jpg or .png files in certain folders.
  3. Set the rule’s target to your image processing Lambda or a Step Functions workflow for more steps.

Now, the S3 upload and image processing are separate. You can add more rules to send the same event to other services, like notifications or audits, without changing S3 or the processing function. This keeps your system flexible and easier to update.
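As a rough sketch of the rule in step 2, the event pattern could look like the Python dictionary below, which you would serialize to JSON when creating the rule. The bucket name is hypothetical, and this version filters on file extension only, not on folders. The matches_suffix_rule helper is just a simplified local check for illustration, not EventBridge's real matching engine.

```python
import json

# Hypothetical bucket name, for illustration only.
BUCKET = "my-upload-bucket"

# Event pattern for S3 "Object Created" events on .jpg or .png keys.
image_upload_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [BUCKET]},
        "object": {"key": [{"suffix": ".jpg"}, {"suffix": ".png"}]},
    },
}


def matches_suffix_rule(key: str, pattern: dict) -> bool:
    """Simplified local check of the key suffix filters only."""
    filters = pattern["detail"]["object"]["key"]
    return any(key.endswith(f["suffix"]) for f in filters)


print(json.dumps(image_upload_pattern, indent=2))
print(matches_suffix_rule("uploads/cat.png", image_upload_pattern))
print(matches_suffix_rule("uploads/notes.txt", image_upload_pattern))
```

Because the filtering lives in the rule, the image processor is never invoked for files it doesn't care about.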

Pattern 1 - Asynchronous API Processing with Step Functions

Sometimes, your API needs to start a task that takes a long time but still respond quickly to the user.

Example: A user asks for a detailed report that could take minutes to create.

In this case, the API starts a Step Functions workflow to handle the long process in the background and immediately returns a response saying the request is received. The workflow runs the report generation without making the user wait.

Architecture:

Asynchronous API Pattern Diagram

  1. The client sends a POST request to /generate-report through API Gateway.

  2. API Gateway starts the Step Functions workflow directly or via a quick Lambda.

  3. The workflow begins with the client’s input.

  4. API Gateway immediately sends back a 202 Accepted response with the execution ARN (the workflow run’s ID) so the client can check progress later.

  5. The Step Functions workflow runs these tasks:

  • Validate input with a Lambda.
  • Query data using Lambda or Fargate.
  • Format the report with Lambda.
  • Save the report to S3 with Lambda.
  • Optionally notify the user via Lambda or SNS when done or if it fails.

Why this helps:

  • The client doesn’t wait for the whole report to finish.
  • Step Functions handles retries and errors according to the Retry and Catch rules you define in the workflow.
  • The API stays light and scalable, while the heavy work runs separately.
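If you go with the "quick Lambda" option from step 2, the function could look roughly like this sketch. It assumes an API Gateway proxy integration, and the environment variable REPORT_STATE_MACHINE_ARN and the "report-" name prefix are made up for illustration. The optional sfn argument only exists so the handler can be tried with a fake client.

```python
import json
import os
import uuid


def handler(event, context, sfn=None):
    """Start the report workflow and return 202 without waiting for it."""
    if sfn is None:
        import boto3  # imported lazily so the sketch can run with a fake client
        sfn = boto3.client("stepfunctions")

    try:
        request = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "body must be JSON"})}

    response = sfn.start_execution(
        stateMachineArn=os.environ["REPORT_STATE_MACHINE_ARN"],  # assumed env var
        name=f"report-{uuid.uuid4()}",  # hypothetical naming scheme
        input=json.dumps(request),
    )
    # The client keeps the execution ARN and uses it to ask for status later.
    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": response["executionArn"]}),
    }
```

A separate status endpoint could pass that ARN to describe_execution to report whether the run is still in progress, succeeded, or failed.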

Pattern 2 - Event Driven Workflow Triggering with EventBridge

You can use events from different sources to automatically start complex workflows.

Example: When a new customer signs up and their info is added to a DynamoDB Customers table, start an onboarding workflow with multiple steps.

Architecture:

Event-Driven Workflow Diagram

Here’s a simple breakdown:

  1. The Customers DynamoDB table has Streams enabled to track changes.
  2. A Lambda function listens to the DynamoDB Stream and gets batches of changes.
  3. For each new customer (INSERT), the Lambda creates a custom event with the customer data and sends it to a custom EventBridge event bus.
  4. An EventBridge rule listens for events with source: myapp.customers and detail-type: CustomerCreated.
  5. The rule triggers a Step Functions workflow for onboarding.
  6. The Step Functions workflow runs steps like:
  • Add customer to CRM.
  • Send a welcome email.
  • Provision resources for the customer.
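Steps 2 to 4 could be sketched in Python along these lines. The bus name is hypothetical, the stream is assumed to include new images (NEW_IMAGE or NEW_AND_OLD_IMAGES), and the unwrap helper only handles string and number attributes to keep the example short. The optional events argument only exists so the handler can be tried with a fake client.

```python
import json

EVENT_BUS_NAME = "myapp-bus"  # hypothetical custom bus name


def unwrap(image: dict) -> dict:
    """Flatten DynamoDB-typed attributes; handles only S and N for brevity."""
    plain = {}
    for name, typed in image.items():
        if "S" in typed:
            plain[name] = typed["S"]
        elif "N" in typed:
            plain[name] = float(typed["N"])
    return plain


def build_entries(records: list) -> list:
    """Turn INSERT stream records into EventBridge PutEvents entries."""
    entries = []
    for record in records:
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE
        entries.append({
            "Source": "myapp.customers",
            "DetailType": "CustomerCreated",
            "Detail": json.dumps(unwrap(record["dynamodb"]["NewImage"])),
            "EventBusName": EVENT_BUS_NAME,
        })
    return entries


def handler(event, context, events=None):
    if events is None:
        import boto3  # imported lazily so the sketch can run with a fake client
        events = boto3.client("events")
    entries = build_entries(event["Records"])
    failed = 0
    # PutEvents accepts at most 10 entries per call.
    for start in range(0, len(entries), 10):
        response = events.put_events(Entries=entries[start:start + 10])
        failed += response.get("FailedEntryCount", 0)
    if failed:
        # Raising makes Lambda retry the batch, so consumers may see duplicates.
        raise RuntimeError(f"{failed} events were not accepted")
    return {"published": len(entries)}


# Event pattern the rule in step 4 would use.
CUSTOMER_CREATED_PATTERN = {
    "source": ["myapp.customers"],
    "detail-type": ["CustomerCreated"],
}
```

Retrying the whole batch on a partial failure is the simplest choice here, but it means the same CustomerCreated event can be published twice, so the onboarding steps should be safe to repeat.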

Why this works:

  • Customer creation is separated from onboarding logic.
  • The system reacts automatically to new customers.
  • You can add more rules or workflows easily without changing the original services.

Bonus:
You might connect DynamoDB Streams directly to Step Functions using EventBridge Pipes. Pipes can filter stream records (for example, INSERT only), so you can skip the Lambda unless you need custom transformation logic.

Practical Tips

  1. Cost Models:
    Step Functions Standard charges per state transition. Express charges based on the number of requests plus execution duration and memory used, which is often cheaper for many short tasks.
    EventBridge charges per event published to custom or partner event buses. AWS service events are usually free.

  2. Observability:
    Use CloudWatch Logs inside your Lambdas. Turn on AWS X-Ray tracing for Lambda and Step Functions to see the full flow of requests. Set up CloudWatch Metrics and Alarms to track failures and queue depths.

  3. Standard vs Express Workflows:
    Use Standard for long, reliable workflows (up to 1 year) where exactly-once execution matters.
    Use Express for fast, high-volume, short tasks (up to 5 minutes) where it’s okay if tasks run more than once and cost is a priority.

  4. Error Handling:
    Use Step Functions’ Retry blocks to handle temporary problems like network issues. Use Catch blocks to handle specific errors and run clean-up or notification tasks.

  5. Idempotency:
    Because events might arrive more than once, make sure tasks can safely run multiple times with the same input without causing problems. Check if the work is already done before acting.
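Here is a minimal Python sketch of that "check before acting" idea, using the event's id as the idempotency key. The in-memory store is only a stand-in so the example runs on its own; in a real system a durable store would be needed, for example a DynamoDB conditional write. The class and function names are made up for illustration.

```python
class InMemoryIdempotencyStore:
    """Stand-in for a durable store; remembers which keys were processed."""

    def __init__(self):
        self._seen = set()

    def claim(self, key: str) -> bool:
        """Return True the first time a key is claimed, False afterwards."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def send_welcome_email(event: dict, store, sent: list) -> bool:
    """Send the email once per event id; repeat deliveries are skipped."""
    if not store.claim(event["id"]):
        return False  # already handled, do nothing
    sent.append(event["detail"]["email"])  # placeholder for the real side effect
    return True


store = InMemoryIdempotencyStore()
sent = []
event = {"id": "evt-123", "detail": {"email": "new.customer@example.com"}}

print(send_welcome_email(event, store, sent))  # True
print(send_welcome_email(event, store, sent))  # False
print(len(sent))                               # 1
```

Claiming the key before doing the work keeps duplicates out, but a failure after the claim would leave the work undone, so a production version usually also records whether the task completed.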

Taking the Next Step

Using Step Functions for orchestration and EventBridge for event-driven workflows lets you build more powerful, scalable, and reliable serverless apps. The examples of asynchronous API handling and event-triggered workflows show just how these services solve real challenges.

Once you understand these, you can design complex systems that are easier to manage and adapt to changing needs.

Try implementing one of these patterns yourself. Explore the extensive AWS Serverless Patterns Collection for more inspiration and ready-to-deploy examples. And most importantly, share your experiences and questions in the comments below. Let's learn together :)!

References
