Forem: Victor Martinez

AWS + Google Cloud: A Step Toward True Multicloud—or Just a Convenient Patch?

Victor Martinez — Wed, 03 Dec 2025 03:51:15 +0000

For years we’ve circled around the same debate:
Is it better to build multicloud architectures using agnostic technologies, or to fully embrace a single provider and optimize everything around it?

The argument is familiar.
On one side, you have cost efficiency and the power of deeply integrated cloud services.
On the other, you have resilience—especially when outages like the well-known AWS Virginia incidents can ripple across a massive portion of the internet.

Every re:Invent, my attention usually gravitates toward new features: better serverless capabilities, smarter managed services, or breakthrough technologies that unlock new design patterns.
But this year, the announcement that stood out the most wasn’t a new product.
It was the collaboration between AWS and Google Cloud to simplify multicloud networking.

At a high level, it looks like a straightforward agreement: improve private connectivity between the two clouds, reduce complexity, and remove some historical pain points that made multicloud networking feel like a chore. Anyone who has tried to stitch AWS ↔ Google Cloud manually knows exactly what that pain feels like.

However, the implications run deeper.

This partnership opens the door to a possible future where cloud systems become increasingly agnostic, where architectural decisions are driven by capabilities—not by the limitations of network plumbing between providers.

Yet there’s another way to read this move:
Maybe this isn’t the birth of an open, collaborative multicloud era, but rather a strategic patch designed to satisfy current multicloud customers while strengthening a two-provider alliance.
If that’s the case, we might be watching the formation of a new “default” recipe in enterprise architectures:
AWS + Google Cloud as the dominant multicloud pair, quietly pushing other clouds to the periphery.

Whether this collaboration becomes a turning point in cloud interoperability or simply a convenient handshake between two giants remains to be seen.

What is certain is that the market has been demanding simpler, more stable, and less painful multicloud experiences—and this announcement suggests that the providers have finally started listening.

AWS Lambda Durable Functions: Is the Orchestration Problem Finally Solved?

Victor Martinez — Wed, 03 Dec 2025 03:33:25 +0000

The shift from monolithic architectures to microservices marked a historic inflection point in the evolution of cloud applications. When AWS Lambda arrived a few years later, it pushed this transformation even further by giving us a way to build extremely small, atomic services. A single request could now trigger a self-contained function that scaled independently, executed quickly, and required no infrastructure management. This simplicity changed how many teams approached software design.

Small transactional workloads became the new norm. The more we decomposed logic into granular functions, the more we gained: near-infinite scalability, cost efficiency, and an explosion of database services designed for serverless elasticity. Lambda freed developers from boilerplate infrastructure concerns and encouraged architectures where business logic lived in short, sharp bursts of execution.

Yet this victory came with a thorny side effect: orchestration complexity.

Once functions became tiny building blocks, connecting them grew harder. A single business process often required chaining together many functions, coordinating state transitions, handling retries, ensuring idempotency, and managing failure states across distributed components. In other words, reducing the complexity inside each function multiplied the complexity between functions.

For a while, AWS Step Functions emerged as the default answer. State machines offered a visual and declarative way to stitch Lambdas into workflows. It worked—up to a point. As real-world systems grew larger and more dynamic, developers ran into friction. State definitions became verbose, onboarding new engineers was difficult, and costs could rise quickly. Amazon itself acknowledged this challenge when Prime Video publicly described moving away from Step Functions due to cost and operational overhead.

AWS responded with improvements like Express Workflows, designed to reduce cost and increase throughput for high-frequency orchestrations. This softened some of the pain, but not the core tension: workflows were either too rigid or too expensive for many scenarios involving large-scale, fine-grained functional decomposition.

Now AWS is introducing durable functions for Lambda, a fresh attempt to tackle the orchestration puzzle. The idea is simple but powerful: let developers write orchestrations in code, using familiar programming constructs, and let AWS handle state persistence, progress tracking, and retries behind the scenes. It is an effort to eliminate the orchestration complexity without abandoning the advantages of serverless decomposition.
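
To make the "orchestration in code" idea concrete, here is a minimal sketch of the checkpoint-and-replay pattern that durable execution engines are generally built on. It is not the AWS SDK: `DurableContext`, `step`, and `process_order` are hypothetical names I made up for illustration, and the checkpoint store is an in-memory dict standing in for durable storage.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class DurableContext:
    """Toy checkpoint store (hypothetical, not an AWS API)."""

    def __init__(self, checkpoints: Optional[Dict[str, str]] = None) -> None:
        # A real platform would persist this outside the process.
        self.checkpoints: Dict[str, str] = {} if checkpoints is None else checkpoints

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # On replay, return the saved result instead of re-running the step.
        if name in self.checkpoints:
            return json.loads(self.checkpoints[name])
        result = fn()
        self.checkpoints[name] = json.dumps(result)
        return result


def process_order(ctx: DurableContext, order_id: str, calls: List[str]) -> dict:
    """Hypothetical workflow written as ordinary sequential code."""

    def reserve() -> dict:
        calls.append("reserve")  # records that the side effect really ran
        return {"order_id": order_id, "reserved": True}

    def charge() -> dict:
        calls.append("charge")
        return {"order_id": order_id, "charged": True}

    reservation = ctx.step("reserve_stock", reserve)
    payment = ctx.step("charge_card", charge)
    return {"reservation": reservation, "payment": payment}
```

Running `process_order` a second time against the same checkpoints returns the same result without repeating the side effects, which is what lets a workflow resume after an interruption.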

The ambition here is bold. If durable orchestrators can become easy to adopt, cost-efficient, and flexible enough for a wide range of business processes, they might unlock the long-promised vision of cloud applications running almost entirely on Lambda—where containers become the exception, not the rule.

My current take is cautiously optimistic. Durable Functions is a move in the right direction, but we are still a few years away from seeing an orchestration model that fully matches the maturity, ergonomics, and predictability required for large-scale, Lambda-only architectures. Serverless compute is no longer the challenge; distributed workflow management is. The platform that solves orchestration cleanly, seamlessly, and affordably will define the next era of cloud-native development.

Durable Functions may be one more step toward that future, perhaps a meaningful one, but the journey isn't over yet.

Modern Web Application Architecture on AWS: Patterns, Hosting, and APIs with GraphQL, WebSockets, and REST

Victor Martinez — Fri, 08 Nov 2024 16:05:17 +0000

Building web applications in the cloud is now standard, allowing developers to create robust, scalable, and easy-to-maintain systems. AWS offers a range of tools and architectures that simplify building and deploying these apps. In this post, we'll cover common architecture patterns, hosting options on AWS, and how to integrate with APIs like GraphQL, WebSockets, and REST to improve data access and interactivity.

Model-View-Controller (MVC): This traditional architecture is excellent for monolithic applications where the server handles most of the work. It's easy to start with but can become challenging to scale as the app grows.
Single-Page Applications (SPA): Using frameworks like Angular, React, or Vue, SPAs shift rendering to the client (browser), which reduces server load. However, they can be trickier to debug and deploy.
Server-Side Rendering (SSR): Technologies like Next.js and Nuxt.js combine server-side and client-side rendering, providing a modular setup that scales well.

AWS offers several hosting options to suit different architectures:

Static File Hosting: Use Amazon S3 and CloudFront to host static assets like HTML, JavaScript, and CSS. This is perfect for SPAs that only need to serve content without heavy processing.

Dynamic Applications with Containers or Serverless Functions:
Containers: These are ideal for more complex, stable applications that need controlled scaling. AWS allows you to set up scaling metrics and adjust resources, helping you manage costs effectively.
Serverless Functions (AWS Lambda): These functions automatically scale up or down and are cost-effective for apps with variable traffic. They're great for microservices, as AWS Lambda handles the infrastructure and scaling for you.

API Integrations with GraphQL, WebSockets, and REST
AWS makes it easy to use different API styles for data management and real-time communication. Depending on your app's needs, you can choose from:

GraphQL on AWS: AWS AppSync allows your apps to use GraphQL to fetch only the data they need in a single request. This reduces server interaction and is perfect for SPAs or SSRs that need to streamline data access.

WebSockets: For apps that require real-time updates, such as live notifications or data feeds, WebSockets on AWS API Gateway offer a scalable, serverless solution that integrates easily with Lambda or containers.

REST (Representational State Transfer): REST is a widely used standard for building APIs, and AWS provides the API Gateway REST API to connect your app to other services or backend logic using standard HTTP operations (GET, POST, PUT, DELETE). REST is great for apps that perform structured operations like creating, reading, updating, and deleting data.

Benefits: REST APIs are simple, easy to understand, and compatible with most programming languages and platforms. They are ideal for apps that benefit from caching to speed up responses.
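
As a rough illustration of the REST option, here is a sketch of a Lambda handler behind an API Gateway REST API. It assumes the Lambda proxy integration event shape (`httpMethod`, `pathParameters`, `body`); the `ITEMS` dict is a made-up in-memory stand-in for a real data store.

```python
import json

# Hypothetical in-memory data store, for illustration only.
ITEMS = {"1": {"id": "1", "name": "example"}}


def _response(status, payload):
    # Shape expected by the API Gateway Lambda proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": "" if payload is None else json.dumps(payload),
    }


def handler(event, context):
    """Map standard HTTP operations onto read, update, and delete."""
    method = event.get("httpMethod")
    item_id = (event.get("pathParameters") or {}).get("id")

    if method == "GET":
        item = ITEMS.get(item_id)
        if item is None:
            return _response(404, {"message": "not found"})
        return _response(200, item)
    if method == "PUT":
        body = json.loads(event.get("body") or "{}")
        ITEMS[item_id] = {"id": item_id, **body}
        return _response(200, ITEMS[item_id])
    if method == "DELETE":
        ITEMS.pop(item_id, None)
        return _response(204, None)
    return _response(405, {"message": "method not allowed"})
```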

Scalability and Cost Management Strategies
Container Scalability and Cost Efficiency: AWS helps optimize container costs through options such as Spot Instances, which suit interruption-tolerant workloads, and a choice of CPU architectures (x86 or Arm-based Graviton) for stable ones.
Cost Efficiency with Serverless: AWS Lambda only charges for what you use, making it a cost-effective choice for fluctuating workloads, allowing global deployment with minimal expenses.

Developer Tools and Resources
AWS offers tools like the Lambda Web Adapter, which makes it easier to deploy web apps on AWS Lambda. This simplifies migrating apps and breaking them down into microservices.

In summary, AWS provides a flexible set of tools and architecture patterns that, combined with APIs like REST, GraphQL, and WebSockets, enable developers to build interactive, scalable, and cloud-optimized applications. Your architecture, hosting, and API choice will depend on scalability needs, implementation complexity, real-time data requirements, and cost management.

The Journey to Multi-Region Infrastructure[2]: Implementing Disaster Recovery Patterns

Victor Martinez — Sat, 07 Sep 2024 01:32:54 +0000

In our previous post, we discussed the business implications of disaster recovery strategies. Let's investigate the technical aspects of implementing standard disaster recovery (DR) patterns. This post will focus on each pattern's architectural considerations, challenges, and implementation details.

1. Active/Passive

When it comes to the active/passive pattern, think of it as keeping a complete backup of your production system ready to spring into action. The key here is maintaining a full copy of your data and application that waits patiently in the wings.

To make this work, you must set up regular data synchronization processes. This typically involves database backups, but don't stop there. Consider implementing infrastructure-as-code practices to ensure you can quickly deploy your passive environment when needed. It's like having a well-rehearsed understudy ready to take the stage at a moment's notice.
However, this approach isn't without its challenges. Ensuring data consistency between your active and passive systems can be tricky.

You'll need to minimize data loss during failover, which means keeping a keen eye on your Recovery Point Objective (RPO). Automating the failover process is crucial to reduce manual intervention and potential human errors. Monitoring and testing are your best friends in this scenario. Implement regular backup integrity checks to ensure your understudy knows its lines.

Conduct periodic failover drills to validate your recovery procedures. Monitor your backup sizes and transfer times. This will help you optimize your Recovery Time Objective (RTO) and ensure you can recover quickly when disaster strikes.
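
The monitoring advice above can be sketched as two small checks. This is only an illustration: the function names are mine, and the transfer estimate assumes decimal gigabytes and a steady throughput, which real links rarely deliver.

```python
from datetime import datetime, timedelta


def rpo_breached(last_backup_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest backup is older than the RPO allows."""
    return (now - last_backup_at) > rpo


def transfer_minutes(backup_size_gb: float, throughput_mbps: float) -> float:
    """Rough time to copy a backup between regions.

    Assumes 1 GB = 8000 megabits and a constant link speed.
    """
    return backup_size_gb * 8000 / throughput_mbps / 60
```

For example, a 500 GB backup over a sustained 1000 Mbps link takes a little over an hour to move, and that time counts against your RTO before any restore work begins.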

2. Active/Active

Moving on to the Active/Active pattern, we discuss deploying your applications in two or more active regions. It's like having multiple stages for your performance, each capable of handling the whole show.
To achieve this, you'll need to implement a load balancer or global traffic manager for request routing. Think of it as your stage manager, directing the audience to the correct performance. You'll also need to set up bidirectional data replication between your active systems to keep everything in sync.

The technical challenges here are more complex. Ensuring data consistency across all active systems is like keeping multiple simultaneous performances perfectly synchronized. You'll also need to manage application versions and compatibility across regions, which can feel like coordinating costume changes across different time zones.
When it comes to scaling, think asymmetrically. You might want to design for a 70/30 or 80/20 traffic split between your primary and secondary regions.

Implement auto-scaling in your secondary regions to handle failover scenarios smoothly. And don't forget to consider multi-tenant architectures for efficient resource utilization - it's like optimizing your theatre seating for different types of performances.
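
To see what an asymmetric split means for auto-scaling, here is a small capacity sketch. The traffic figures and the per-instance throughput are hypothetical; the point is only the ratio between steady-state and failover capacity in the secondary region.

```python
import math
from typing import Tuple


def secondary_capacity_plan(
    total_rps: int, secondary_share_percent: int, rps_per_instance: int
) -> Tuple[int, int]:
    """Return (steady_state_instances, failover_instances) for the secondary region.

    Steady state serves only the secondary's share of traffic;
    during failover the secondary must serve all of it.
    """
    steady = math.ceil(total_rps * secondary_share_percent / (100 * rps_per_instance))
    failover = math.ceil(total_rps / rps_per_instance)
    return steady, failover
```

With 1000 requests per second, a 70/30 split, and 50 requests per second per instance, the secondary region runs 6 instances normally but needs 20 during a failover, so its auto-scaling limits have to allow more than a threefold increase.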

3. Routing (Multi-region/Multi-cloud)

The Routing pattern takes things global. Here, you're deploying your applications across multiple cloud providers or regions. It's like taking your show on a world tour, performing in different venues worldwide.
You must implement global traffic management with intelligent routing to make this work. It's not just about directing traffic anymore; it's about understanding the nuances of each location and making intelligent decisions about where to send your customers.
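
As a simplified picture of "intelligent routing", the sketch below picks the healthy endpoint with the lowest measured latency. The `Endpoint` type and its fields are assumptions for illustration; a managed DNS or traffic manager service would do this for you in practice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    name: str          # hypothetical region or cloud label
    healthy: bool      # result of the latest health check
    latency_ms: float  # latency measured from the client's location


def choose_endpoint(endpoints: List[Endpoint]) -> Endpoint:
    """Route to the fastest endpoint that is currently healthy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        raise RuntimeError("no healthy endpoints available")
    return min(healthy, key=lambda e: e.latency_ms)
```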

The challenges here are significant. You'll manage complex deployment pipelines across multiple environments, like coordinating opening nights in different countries simultaneously. Implementing efficient cross-region or cross-cloud data synchronization is crucial, and you'll need to ensure consistent application performance across diverse infrastructures.
Monitoring and observability become even more critical in this scenario. Implement distributed tracing across regions to track your global performance. Set up centralized logging and monitoring solutions to give you a bird's-eye view of your entire operation. You might even need to develop custom metrics for cross-region performance and availability to understand how your global system is truly performing.

Implementation Strategies
Your approach to data synchronization will depend on your chosen pattern. For Active/Passive setups, consider using database replication tools or backup/restore mechanisms. If you're going for Active/Active or Routing patterns, you'll want to implement real-time data replication or eventually consistent models, depending on your application requirements.

Your deployment processes need to be rock-solid. Utilize blue/green deployment strategies for zero-downtime updates. It's like changing the set without interrupting the performance. Implement canary releases for gradual rollouts across regions, letting you test the waters before diving in fully. Always have robust rollback procedures in place for multi-region deployments. Think of it as your safety net when things don't go according to plan.
Testing and validation are non-negotiable. Automate your failover and failback processes to eliminate human error. Implement chaos engineering practices to validate your system's resilience. It's like stress-testing your performance under the most challenging conditions. And don't forget to conduct regular cross-region disaster recovery exercises. Practice makes perfect, after all.
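
The gradual rollout with a rollback safety net could look something like this. It is a sketch under my own assumptions: `deploy`, `is_healthy`, and `rollback` are hypothetical callbacks you would wire to your own pipeline, and a full canary release would also shift only a small share of traffic before each health check.

```python
from typing import Callable, Iterable, List


def rollout(
    regions: Iterable[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Deploy region by region and stop at the first unhealthy one.

    On failure, every region touched so far is rolled back in reverse
    order and an empty list is returned. On success, the list of
    updated regions is returned.
    """
    done: List[str] = []
    for region in regions:
        deploy(region)
        done.append(region)
        if not is_healthy(region):
            for touched in reversed(done):
                rollback(touched)
            return []
    return done
```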

Technical Considerations Before Implementation
Before you embark on implementing these patterns, there are several critical technical considerations to keep in mind.
First, take a good, hard look at your application architecture. Evaluate how stateless your current application is and how tightly coupled your data is. You might want to consider refactoring towards microservices for improved modularity. It's like breaking down a complex orchestral piece into its individual instrument parts for easier management.
Data management is another crucial aspect. Assess your data volume and change rates to determine the optimal replication strategies. For multi-region active systems, you might want to consider eventual consistency models. It's a balancing act between data freshness and system performance.

Your network architecture needs careful planning, too—design for low-latency inter-region connectivity to keep your global system responsive. Implement secure VPN or direct connect solutions for data transfer to keep your information safe as it travels across regions.

The Journey to Multi-Region Infrastructure: Understanding Availability and Business Needs

Victor Martinez — Thu, 29 Aug 2024 03:13:40 +0000

When I decided to write about implementing a multi-region strategy, I quickly realized that this topic is far too complex to cover in a single blog post. Therefore, I've started a series of posts explaining the entire process for companies to achieve a successful infrastructure project with the highest possible availability in the cloud.

To begin this extensive process, we need to start from the business perspective and understand why we need greater availability. It is crucial to recognize that multi-region implementation is not a business or technological goal. Talking about multi-region means increasing the complexity of the platform, its operation, and consequently, the cost of the platform to solve a higher need: availability.

Let's shift the conversation to availability. Before diving into multi-region implementation, I recommend following this flow:
Measuring availability involves various metrics that give us a 360-degree view of the organization's state. However, these metrics can be complex to measure and even more challenging to understand within an organization. An organization should focus on mastering two key metrics: Mean Time to Recovery (MTTR) and Service Level Agreement (SLA).

When we talk about the availability percentage, we're referring to the SLA offered to our customers in the contract and the availability percentage we measure on our platform. These two values are intimately connected. To define SLAs within the company's offering, we must first separate the organization's domains or products. It's almost impossible to talk about all systems in a single value. Depending on the company's complexity, they can be divided geographically or by product (system).
For example, if product A has 99% availability and product B has 96%, and we offer our customers the average of 97.5%, they will expect at most 2.5% downtime. This becomes a problem we'll have to justify when an incident on product B brings us down to 96%. We can't simply respond to a customer with something like, "It's the average of all services." Additionally, stating availability per domain or product is not new and can be observed in both the value propositions and the status pages of major cloud providers:

AWS Lambda SLA
Microsoft Online Services SLAs

Distributing SLAs is normal practice for companies with clearly defined products, but it can be challenging in some cases when the company has yet to establish a product portfolio.
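
To put the product A and product B example in concrete terms, this small sketch converts an availability percentage into a downtime budget. The 30-day month is an assumption; use whatever period your contract defines.

```python
def allowed_downtime_hours(availability_percent: float, period_hours: float = 30 * 24) -> float:
    """Hours of downtime permitted in the period at a given availability."""
    return period_hours * (100 - availability_percent) / 100
```

Over a 30-day month, 99% allows about 7.2 hours of downtime and 96% allows about 28.8 hours, while the averaged 97.5% suggests 18 hours. A customer relying on product B could see over ten hours more downtime than the averaged figure promised.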

Mean Time to Recovery (MTTR) is another crucial point. It establishes our offer to restore service in extreme cases, such as natural disasters or significant physical or cloud infrastructure failures. In these scenarios, we're talking about major failures like a data center without power, an internet outage, failure of the entire information storage layer due to cloud provider issues, latency between provider services exceeding our response time, etc.

This time can be understood as the time to restore operational capabilities in the worst-case scenarios, from recovering information to the time for domain services to propagate the new public IP address.
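
One way to reason about this worst-case time is to add up the sequential recovery phases. The phase names and durations below are hypothetical placeholders, not measurements.

```python
from datetime import timedelta
from typing import Dict


def worst_case_recovery(phases: Dict[str, timedelta]) -> timedelta:
    """Total recovery time when the phases must run one after another."""
    return sum(phases.values(), timedelta())


# Hypothetical figures, for illustration only.
example_phases = {
    "restore_data": timedelta(minutes=45),
    "redeploy_services": timedelta(minutes=20),
    "dns_propagation": timedelta(minutes=5),
}
```

With these example figures the total is 70 minutes, and it shows which phase to attack first if the goal is a shorter recovery time.
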
Once these two points are established, we can discuss high availability schemes and potential architectural challenges in each scenario. But I'll leave that for the next blog post on architectural changes.

In conclusion, before jumping into multi-region implementation, it's crucial to understand your business needs, define clear SLAs for each product or domain, and establish realistic MTTR goals. These foundational steps will guide your journey towards a more resilient and available infrastructure.

Stay tuned for the next post in this series, where we'll discuss the architectural changes necessary to implement a multi-region strategy.

Apply critical path on microservices architectures.

Victor Martinez — Wed, 22 Mar 2023 19:33:09 +0000

The critical path method is a common technique in project management. It is handy for a large or complex project because it keeps us focused and reduces the risk of project delays. Is it possible to use these project practices in software architecture?

One of the goals of every software engineering team is to provide high availability. In other words, be up as much as you can. So here is the opportunity to talk about the "critical path." Let's transform this concept into good practice.

"Critical" means "most important or sensitive step." With that in mind, in a microservice architecture, we can identify the services that are essential for our customers. A simple example is a search service on a classified website or payment modules in e-commerce. Those services are usually the most transactional and the ones our users need while they are online. On the other hand, services that use queues and batch processes typically aren't critical. A good example is an invoice microservice; nobody loses their mind over receiving a receipt late.

To put this into practice, you can follow these three guidelines:

Keep it absurdly simple. We must scale these good practices and break a couple of rules if necessary. So, use only a few services, technologies, and programming languages, because every addition increases the risk of failure.

Last out, first priority. Every new technology or update should start on other services first. Even so, and more importantly, you must keep your critical path services continually updated.

Fewer microservices. A microservices mesh can produce a high level of reusability. However, this increases the number of critical services. So a microservice that is bigger than average could mean less communication and fewer essential services for us.
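
A quick calculation shows why fewer services on the critical path helps. The sketch assumes that a request needs every service on the path to be up and that failures are independent, which is a simplification.

```python
from math import prod
from typing import Iterable


def path_availability(service_availabilities: Iterable[float]) -> float:
    """Availability of a request path that needs every service to be up.

    Each value is a fraction between 0 and 1, for example 0.999.
    """
    return prod(service_availabilities)
```

Three services at 99.9% each give a path availability of roughly 99.7%, while ten such services drop it to about 99.0%. Every extra hop on the critical path lowers the ceiling.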

In summary, identifying our critical path is a great way to reduce failure and increase availability. As a result, we can keep the focus on our most important service and create a safe way to innovate. Remember:

"If everything is important, then nothing is."

― Patrick Lencioni.