<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Matt</title>
    <description>The latest articles on Forem by Matt (@matt0135).</description>
    <link>https://forem.com/matt0135</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838606%2F7a6fe9f0-82a2-4416-8d22-bd7cd565eab5.png</url>
      <title>Forem: Matt</title>
      <link>https://forem.com/matt0135</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/matt0135"/>
    <language>en</language>
    <item>
      <title>Beyond the Console: The Modern DevOps Guide to Architecting on AWS</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Sat, 11 Apr 2026 18:30:00 +0000</pubDate>
      <link>https://forem.com/matt0135/beyond-the-console-the-modern-devops-guide-to-architecting-on-aws-3aea</link>
      <guid>https://forem.com/matt0135/beyond-the-console-the-modern-devops-guide-to-architecting-on-aws-3aea</guid>
      <description>&lt;p&gt;The cloud landscape has changed dramatically over the last few development cycles. When I first started working with AWS, a lot of my day was spent clicking through the Management Console to provision resources or troubleshoot misconfigurations. Today, the role of a DevOps engineer looks completely different. We are no longer just the gatekeepers of infrastructure; we are the architects of internal developer platforms. &lt;/p&gt;

&lt;p&gt;Building on AWS today requires a mindset shift. It is about creating resilient, scalable systems that empower development teams to move faster without breaking things. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Shift to Platform Engineering
&lt;/h3&gt;

&lt;p&gt;Cloud engineering on AWS has evolved significantly from traditional sysadmin tasks. The days of logging into a terminal to manually tweak an EC2 instance or configure a database are long gone. Today, our focus is on building automated, self-healing systems. &lt;/p&gt;

&lt;p&gt;As DevOps engineers, we increasingly act as product managers for internal infrastructure. Our goal is to provide a reliable foundation that abstracts away the underlying complexity of AWS services. This shift toward platform engineering changes how we design, deploy, and maintain our cloud environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure as Code Maturity
&lt;/h3&gt;

&lt;p&gt;Writing Infrastructure as Code (IaC) is the absolute baseline for any serious cloud environment. Modern tools like Terraform, Pulumi, and the AWS Cloud Development Kit (CDK) allow us to treat our VPCs, IAM roles, and EKS clusters exactly like application code. We use version control, require peer reviews, and run automated tests before infrastructure changes ever hit production.&lt;/p&gt;

&lt;p&gt;Consider a scenario where a company needs to duplicate its entire production environment in a new AWS region for disaster recovery. If the original infrastructure was built via manual console clicks, this process takes weeks of painful discovery. With a mature IaC setup, deploying a complete replica to a new region is often as simple as updating a region variable and triggering a CI/CD pipeline.&lt;/p&gt;
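
&lt;p&gt;A minimal Terraform sketch of that region variable might look like the following. The variable name and default are illustrative assumptions, not taken from a real setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: variable name and default region are illustrative.
variable "aws_region" {
  description = "Region to deploy this environment into"
  type        = string
  default     = "us-east-1"
}

# Every resource created through this provider lands in the chosen region.
provider "aws" {
  region = var.aws_region
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A disaster recovery pipeline could then run &lt;code&gt;terraform apply -var="aws_region=us-west-2"&lt;/code&gt; against a separate state. In practice, region-specific values such as AMI IDs and availability zone names usually need to be parameterised as well.&lt;/p&gt;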

&lt;p&gt;This approach also makes automated security testing possible. We can run policy checks before a pull request is merged to catch misconfigurations early. This helps ensure that no one accidentally exposes an S3 bucket to the public internet or provisions an RDS instance without storage encryption.&lt;/p&gt;
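
&lt;p&gt;As a rough illustration, this is the kind of configuration a pre-merge policy check would typically expect to find next to every bucket. The bucket name is a hypothetical placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical bucket used only for illustration.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}

# Blocks every form of public access on the bucket above.
resource "aws_s3_bucket_public_access_block" "logs" {
  bucket = aws_s3_bucket.logs.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;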

&lt;h3&gt;
  
  
  Scaling With Multiple Accounts
&lt;/h3&gt;

&lt;p&gt;A single AWS account works fine for a new project, but it quickly becomes a tangled web of permissions as a company grows. Moving to a multi-account strategy using AWS Organizations and AWS Control Tower is a massive operational leap. &lt;/p&gt;

&lt;p&gt;Structuring your AWS environment across multiple accounts provides several concrete advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workloads are strictly isolated to limit the blast radius of security incidents.&lt;/li&gt;
&lt;li&gt;Service Control Policies (SCPs) enforce baseline security rules across the entire organization.&lt;/li&gt;
&lt;li&gt;Identity and Access Management (IAM) permissions become much easier to scope down to least privilege.&lt;/li&gt;
&lt;li&gt;Finance teams gain precise cost attribution based on account-level billing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of writing complex resource policies to prevent one team from modifying another team's Lambda functions, the account boundary provides strict isolation by default. This makes compliance audits much smoother and gives developers safe sandboxes to experiment in without risking production data.&lt;/p&gt;
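
&lt;p&gt;A Service Control Policy can itself be managed as code. The sketch below assumes the organization root ID is passed in as a variable; the policy name and the specific denied actions are only an example of a baseline rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: the root (or OU) ID is supplied by the caller.
variable "organization_root_id" {
  description = "ID of the organization root or OU to attach the policy to"
  type        = string
}

# Example baseline rule: nobody in member accounts may disable CloudTrail.
resource "aws_organizations_policy" "deny_cloudtrail_changes" {
  name = "deny-cloudtrail-changes"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyCloudTrailChanges"
      Effect   = "Deny"
      Action   = ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"]
      Resource = "*"
    }]
  })
}

# Attaching at the root applies the rule to every account beneath it.
resource "aws_organizations_policy_attachment" "root" {
  policy_id = aws_organizations_policy.deny_cloudtrail_changes.id
  target_id = var.organization_root_id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;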

&lt;h3&gt;
  
  
  Embedding Security and Compliance
&lt;/h3&gt;

&lt;p&gt;Security in AWS works best when it is embedded into every layer of the delivery process. Relying on manual security reviews at the end of a release cycle slows down development and frustrates engineers. Instead, security should be automated and invisible where possible.&lt;/p&gt;

&lt;p&gt;One major shift is moving away from static IAM access keys. By using OpenID Connect (OIDC) for CI/CD pipelines, tools like GitHub Actions can assume temporary IAM roles to deploy infrastructure. This eliminates the risk of long-lived AWS credentials being leaked in source code. &lt;/p&gt;
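
&lt;p&gt;A sketch of the IAM side of this setup is shown below. It assumes the GitHub OIDC provider is already registered in the account; the role name and repository are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: the GitHub OIDC provider already exists in this account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

resource "aws_iam_role" "github_deploy" {
  name = "github-actions-deploy" # hypothetical role name

  # Trust policy: only tokens issued by GitHub for the named repo and branch
  # may exchange themselves for temporary credentials.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = data.aws_iam_openid_connect_provider.github.arn }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Hypothetical repository and branch.
          "token.actions.githubusercontent.com:sub" = "repo:example-org/example-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Scoping the &lt;code&gt;sub&lt;/code&gt; condition to a specific repository and branch matters: without it, any workflow on GitHub that can request a token for this audience could attempt to assume the role.&lt;/p&gt;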

&lt;p&gt;Additionally, continuously checking your security posture with AWS Security Hub and Amazon GuardDuty provides automated threat detection. These tools act as an ever-watchful set of eyes, alerting the team to anomalous behavior like an EC2 instance communicating with a known malicious IP address.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making Cost an Engineering Metric
&lt;/h3&gt;

&lt;p&gt;AWS provides incredible flexibility, but leaving the meter running on unoptimized resources can quickly destroy an IT budget. Cloud cost optimization must be integrated directly into the engineering lifecycle rather than treated as an afterthought.&lt;/p&gt;

&lt;p&gt;Small architectural decisions compound heavily over time on AWS. For example, routing traffic bound for AWS services such as S3 through a NAT Gateway can rack up thousands of dollars in data processing fees. Swapping that architecture to use VPC Endpoints keeps the traffic on the AWS network, which can drastically reduce the monthly bill while improving security.&lt;/p&gt;
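
&lt;p&gt;For S3 specifically, the change can be as small as one gateway endpoint. This is a sketch under the assumption that the VPC ID, region, and private route table IDs are provided as variables (all three names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumptions: these inputs come from the surrounding network configuration.
variable "aws_region" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_route_table_ids" {
  type = list(string)
}

# Gateway endpoint: S3 traffic from the listed route tables bypasses the NAT Gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;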

&lt;p&gt;Embracing managed services and compute optimization also drives down costs. Migrating workloads from standard x86 instances to AWS Graviton processors often yields immediate price-performance benefits. By flagging untagged resources with a required-tags rule in AWS Config, teams can accurately trace these costs back to specific products or environments.&lt;/p&gt;
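
&lt;p&gt;One way to express a tagging check in Terraform is the AWS managed &lt;code&gt;REQUIRED_TAGS&lt;/code&gt; Config rule. The tag keys below are assumptions chosen for illustration, and the rule only evaluates resources once a Config recorder is running in the account:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Marks resources as non-compliant when the listed tag keys are missing.
# Assumption: an AWS Config recorder is already enabled in this account.
resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"

  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }

  # Illustrative tag keys; use whatever your cost reports group by.
  input_parameters = jsonencode({
    tag1Key = "CostCenter"
    tag2Key = "Environment"
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;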

&lt;h3&gt;
  
  
  Observability Beyond the Basics
&lt;/h3&gt;

&lt;p&gt;Traditional monitoring relies on answering whether a server is up or down, but modern cloud-native applications require much deeper observability. Knowing that an API Gateway is returning 500 errors is only the first step in debugging an outage. Engineers need to know exactly which microservice, database query, or third-party API caused the failure. &lt;/p&gt;

&lt;p&gt;Implementing tools like AWS X-Ray or OpenTelemetry allows teams to trace a single user request across the entire system. You can watch a request travel through an Application Load Balancer, trigger a container in ECS, and query an Aurora database. When an alert fires in the middle of the night, having this deep context readily available reduces the mean time to recovery drastically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building for Developer Experience
&lt;/h3&gt;

&lt;p&gt;Ultimately, the goal of a modern DevOps practice on AWS is to get out of the developers' way safely. Infrastructure teams should not be a bottleneck for application deployments. We achieve this by focusing heavily on Developer Experience (DevEx) and creating "golden paths."&lt;/p&gt;

&lt;p&gt;Golden paths are pre-approved, standardized templates for common architectures. If a developer needs to deploy a serverless application, they shouldn't need to become an expert in API Gateway integrations and IAM execution roles. They should be able to consume a self-service module that handles the heavy lifting.&lt;/p&gt;

&lt;p&gt;By wrapping these self-service tools in automated guardrails, we ensure that every new deployment is secure, tagged correctly, and highly available by default. This approach keeps development velocity high while maintaining the strict reliability that enterprise environments demand.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>aws</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Terraform State Files Explained: What They Are, Why They Exist, and Why They Scare Everyone</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:00:51 +0000</pubDate>
      <link>https://forem.com/matt0135/terraform-state-files-explained-what-they-are-why-they-exist-and-why-they-scare-everyone-4nfd</link>
      <guid>https://forem.com/matt0135/terraform-state-files-explained-what-they-are-why-they-exist-and-why-they-scare-everyone-4nfd</guid>
      <description>&lt;p&gt;If you have been using Terraform for more than a few months you have almost certainly done one of these things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accidentally committed a &lt;code&gt;terraform.tfstate&lt;/code&gt; file to git&lt;/li&gt;
&lt;li&gt;Seen a &lt;code&gt;terraform plan&lt;/code&gt; output that made no sense because the state was out of sync&lt;/li&gt;
&lt;li&gt;Watched a colleague delete a resource that Terraform then tried to recreate on the next apply&lt;/li&gt;
&lt;li&gt;Had a pipeline fail with "Error acquiring the state lock"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not beginner mistakes. They happen to experienced engineers who understand Terraform's syntax but have not fully internalised how state actually works under the hood.&lt;/p&gt;

&lt;p&gt;This post fixes that. By the end you will have a clear mental model of what the state file is, why it has to exist, what it actually contains, and where it breaks down.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does Terraform need a state file at all?
&lt;/h2&gt;

&lt;p&gt;This is the question most tutorials skip and it is the most important one to answer.&lt;/p&gt;

&lt;p&gt;When you write a Terraform resource block like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_support&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;enable_dns_hostnames&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform needs to answer three questions every time you run &lt;code&gt;terraform plan&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does this VPC already exist in AWS?&lt;/li&gt;
&lt;li&gt;If it exists, does it match what I have declared?&lt;/li&gt;
&lt;li&gt;If it does not match, what needs to change?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To answer question 1, Terraform has two options. It could query the AWS API on every run and scan your entire account looking for a VPC with matching attributes. Or it could maintain a record of what it has already created and use that as a reference point.&lt;/p&gt;

&lt;p&gt;Querying the full AWS API on every run sounds appealing but it does not scale. AWS does not expose a universal "find me the resource that matches these attributes" API. Every resource type has a different API shape. Some resources require multiple API calls to fully describe. And many attributes look identical across resources (two VPCs can have the same CIDR block). Terraform would have no reliable way to know which VPC it created versus which one already existed before it ran.&lt;/p&gt;

&lt;p&gt;So Terraform keeps a state file. It is a record that maps every resource block in your configuration to a specific real-world resource ID in your cloud account. When Terraform creates your VPC, it records the VPC ID (&lt;code&gt;vpc-0a1b2c3d4e5f&lt;/code&gt;) in the state file alongside the attributes the provider returned. On the next run it reads the state file, calls the AWS API to fetch the current attributes of &lt;code&gt;vpc-0a1b2c3d4e5f&lt;/code&gt; specifically, and diffs the result against your configuration.&lt;/p&gt;

&lt;p&gt;The state file is not a cache of your cloud. It is a mapping between your Terraform configuration and real infrastructure. That distinction matters a lot.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is actually inside a state file?
&lt;/h2&gt;

&lt;p&gt;The state file is a JSON document. You should open one at least once in your career. Here is a condensed version of what a single VPC resource looks like inside it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"terraform_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.7.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"managed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"main"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"provider[&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;registry.terraform.io/hashicorp/aws&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"instances"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"schema_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"arn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ec2:ap-south-1:123456789012:vpc/vpc-0a1b2c3d4e5f"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"cidr_block"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"enable_dns_hostnames"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"enable_dns_support"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vpc-0a1b2c3d4e5f"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"owner_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123456789012"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"main"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;id&lt;/code&gt; field is the anchor.&lt;/strong&gt; This is the AWS resource ID that Terraform uses to look up the real resource on every subsequent run. Without it Terraform cannot find the resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;All attributes are stored.&lt;/strong&gt; Not just the ones you declared. Terraform stores every attribute the provider returned after creation including computed attributes like &lt;code&gt;arn&lt;/code&gt; and &lt;code&gt;owner_id&lt;/code&gt; that you never wrote in your config. This is how it can detect drift on attributes you did not explicitly set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The schema version is tracked.&lt;/strong&gt; When a provider upgrades and changes its resource schema Terraform uses this to run state migrations automatically. This is why upgrading provider versions sometimes triggers state changes even when your infrastructure has not changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no history.&lt;/strong&gt; The state file is a snapshot of right now. There is no audit log of what changed or when. If you want history you need to enable versioning on your remote backend.&lt;/p&gt;




&lt;h2&gt;
  
  
  The three-way diff that terraform plan runs
&lt;/h2&gt;

&lt;p&gt;Understanding the state file properly means understanding the three-way diff that &lt;code&gt;terraform plan&lt;/code&gt; performs. Most people think of &lt;code&gt;plan&lt;/code&gt; as a simple "config vs cloud" comparison. It is not.&lt;/p&gt;

&lt;p&gt;It is actually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your .tf files  &amp;lt;--&amp;gt;  State file  &amp;lt;--&amp;gt;  Real AWS resources
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1: Config vs State&lt;/strong&gt;&lt;br&gt;
Terraform compares your &lt;code&gt;.tf&lt;/code&gt; files against the state file. This tells it which resources were added, removed, or changed in your configuration since the last apply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: State vs Cloud&lt;/strong&gt;&lt;br&gt;
For every resource that exists in the state file, Terraform calls the AWS API to fetch its current real-world attributes and compares them against what the state file recorded. Differences here indicate drift: someone changed something outside of Terraform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Produce a plan&lt;/strong&gt;&lt;br&gt;
Terraform combines both diffs to produce the final plan. A resource might need to change because you edited the config, because it drifted from the state, or both.&lt;/p&gt;

&lt;p&gt;This is why a &lt;code&gt;terraform plan&lt;/code&gt; that shows unexpected changes is almost always one of three things: the config changed, something changed your infrastructure outside of Terraform, or a provider upgrade changed how a resource is represented in state.&lt;/p&gt;


&lt;h2&gt;
  
  
  Where the state file breaks down
&lt;/h2&gt;

&lt;p&gt;The state file model works well when everything goes through Terraform. It breaks down at the edges.&lt;/p&gt;
&lt;h3&gt;
  
  
  Drift from out-of-band changes
&lt;/h3&gt;

&lt;p&gt;The state file only knows what Terraform did. If someone opens the AWS Console and changes a security group rule, adds a tag to an EC2 instance, or resizes an RDS instance, the state file has no idea. The next &lt;code&gt;terraform plan&lt;/code&gt; will either flag it as drift (if Terraform manages that attribute) or silently ignore it (if it does not).&lt;/p&gt;

&lt;p&gt;This is not a bug. It is a fundamental consequence of the state-file architecture. The state file is not a real-time reflection of your cloud. It is a record of what Terraform last did.&lt;/p&gt;
&lt;h3&gt;
  
  
  The import problem
&lt;/h3&gt;

&lt;p&gt;If you have existing AWS resources that were not created by Terraform you cannot just write a resource block and run &lt;code&gt;terraform apply&lt;/code&gt;. Terraform will try to create a new resource because it has no record of the existing one in its state.&lt;/p&gt;

&lt;p&gt;You have to import the resource first, either with the &lt;code&gt;terraform import&lt;/code&gt; command or, since Terraform 1.5, with an &lt;code&gt;import&lt;/code&gt; block in your configuration. Both associate the existing resource ID with an address in the state file. This works but it is tedious: you need to know the exact resource ID, and you still have to write a matching configuration block or the next &lt;code&gt;terraform plan&lt;/code&gt; will show a diff and potentially modify your resource.&lt;/p&gt;
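
&lt;p&gt;As a sketch, adopting the VPC from earlier in this post with a declarative &lt;code&gt;import&lt;/code&gt; block (available from Terraform 1.5) could look like this. The ID is the same placeholder used above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Tells Terraform that aws_vpc.main corresponds to an existing VPC.
import {
  to = aws_vpc.main
  id = "vpc-0a1b2c3d4e5f" # placeholder ID
}

# The configuration still has to match the real resource,
# otherwise the plan will propose changes after the import.
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The import then shows up in &lt;code&gt;terraform plan&lt;/code&gt; and is carried out on the next apply, which makes it reviewable in a pull request like any other change.&lt;/p&gt;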
&lt;h3&gt;
  
  
  State file as a single point of failure
&lt;/h3&gt;

&lt;p&gt;If you are using local state (the default) and you lose your state file your Terraform configuration is now disconnected from your real infrastructure. Terraform does not know any of those resources exist. It will try to create duplicates on the next apply which will either fail (for resources that enforce uniqueness) or succeed and create a mess.&lt;/p&gt;

&lt;p&gt;This is why local state is appropriate only for learning and why every production Terraform setup needs a remote backend. But that is a topic for the next post.&lt;/p&gt;
&lt;h3&gt;
  
  
  Sensitive values in state
&lt;/h3&gt;

&lt;p&gt;Terraform stores sensitive values in the state file in plain text. If your configuration creates an RDS instance with a password or a Secrets Manager secret the state file will contain those values. This is a well-known issue with no perfect solution. Encrypting the remote backend and tightly controlling access to it are the minimum baseline.&lt;/p&gt;
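
&lt;p&gt;A minimal sketch of that baseline with the S3 backend is shown below. The bucket, key, and table names are hypothetical and are assumed to exist already; access control on the bucket itself still has to be handled separately through IAM and bucket policies:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;terraform {
  backend "s3" {
    # Hypothetical names; backend blocks only accept literal values.
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "ap-south-1"

    # Server-side encryption for the state object at rest.
    encrypt = true

    # DynamoDB table used for state locking (assumed to exist).
    dynamodb_table = "example-terraform-locks"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;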


&lt;h2&gt;
  
  
  The commands that touch state directly
&lt;/h2&gt;

&lt;p&gt;Most engineers learn &lt;code&gt;terraform plan&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt; early. Fewer learn the state management commands that become essential when things go wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform state list&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Lists every resource tracked in the current state file. Useful for a quick audit of what Terraform knows about.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform state list
aws_vpc.main
aws_subnet.public
aws_internet_gateway.main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform state show &amp;lt;resource&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Shows the full recorded attributes of a specific resource. Useful for debugging drift or checking what Terraform thinks a resource looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform state show aws_vpc.main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform state rm &amp;lt;resource&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Removes a resource from the state file without destroying the real resource. Use this when you want Terraform to stop managing a resource. The resource continues to exist in AWS but Terraform forgets about it.&lt;/p&gt;
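
&lt;p&gt;From Terraform 1.7 onward the same result can be expressed in configuration with a &lt;code&gt;removed&lt;/code&gt; block. A sketch, assuming the &lt;code&gt;aws_vpc.main&lt;/code&gt; resource block has already been deleted from the config:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Drops aws_vpc.main from state on the next apply.
removed {
  from = aws_vpc.main

  lifecycle {
    # false = forget the resource but leave the real VPC untouched.
    destroy = false
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;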

&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform state mv &amp;lt;source&amp;gt; &amp;lt;destination&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Moves a resource within the state file. The most common use case is renaming a resource or moving it into a module without destroying and recreating it.&lt;/p&gt;
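
&lt;p&gt;Since Terraform 1.1 a rename can also be recorded declaratively with a &lt;code&gt;moved&lt;/code&gt; block. In this sketch the new name &lt;code&gt;primary&lt;/code&gt; is a hypothetical example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Records that the object tracked as aws_vpc.main is now aws_vpc.primary,
# so the plan shows a move instead of a destroy and recreate.
moved {
  from = aws_vpc.main
  to   = aws_vpc.primary
}

resource "aws_vpc" "primary" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;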

&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform import &amp;lt;resource&amp;gt; &amp;lt;id&amp;gt;&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Adds an existing AWS resource to the state file. This does not modify the resource. It just tells Terraform "this resource block in my config corresponds to this resource ID in AWS."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform refresh&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
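Updates the state file to match the real state of your infrastructure. This is essentially step 2 of the three-way diff run in isolation. The standalone command has been deprecated since Terraform 0.15.4. When you suspect drift, use &lt;code&gt;terraform plan -refresh-only&lt;/code&gt; to review what changed and &lt;code&gt;terraform apply -refresh-only&lt;/code&gt; to write those changes to state after confirmation.&lt;/p&gt;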




&lt;h2&gt;
  
  
  What most engineers get wrong about state
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Treating the state file as a source of truth.&lt;/strong&gt; The state file is a reference point, not truth. The real state of your infrastructure is in AWS. The state file can be stale, partial, or corrupted. Any serious workflow accounts for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never opening the state file.&lt;/strong&gt; The state file is a plain JSON document. Reading it when something goes wrong is one of the fastest ways to understand what Terraform actually knows. Do not treat it as a black box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring drift until it causes an incident.&lt;/strong&gt; Drift accumulates quietly. A security group rule changed here, a tag modified there. None of it breaks anything immediately. Then someone runs &lt;code&gt;terraform apply&lt;/code&gt; in a pipeline and Terraform "corrects" the drift at the worst possible time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storing state locally in any shared environment.&lt;/strong&gt; The moment more than one person or one pipeline touches the same infrastructure, local state becomes a race condition waiting to happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  The key mental model
&lt;/h2&gt;

&lt;p&gt;Think of the state file as a &lt;strong&gt;marriage certificate between your Terraform configuration and your real AWS resources&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The certificate does not prove the marriage is in good shape right now. It just records that the marriage happened and identifies both parties. You still have to do the work of keeping the relationship healthy. And if you lose the certificate things get complicated fast.&lt;/p&gt;

&lt;p&gt;Everything about Terraform state management flows from this: the need for remote backends, the import problem, drift detection, the sensitivity around who can access state and when. Once you have this mental model the rest of Terraform's state behaviour starts to make sense.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>devops</category>
      <category>terraform</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
