<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Yogesh VK</title>
    <description>The latest articles on Forem by Yogesh VK (@yogesh_vk).</description>
    <link>https://forem.com/yogesh_vk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3821385%2Fce580426-0152-47ef-a0df-c1df7d4f33bb.png</url>
      <title>Forem: Yogesh VK</title>
      <link>https://forem.com/yogesh_vk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/yogesh_vk"/>
    <language>en</language>
    <item>
      <title>Why AI and Automation Are Not Always the Right Answer in DevOps</title>
      <dc:creator>Yogesh VK</dc:creator>
      <pubDate>Fri, 10 Apr 2026 06:00:00 +0000</pubDate>
      <link>https://forem.com/yogesh_vk/why-ai-and-automation-are-not-always-the-right-answer-in-devops-18ch</link>
      <guid>https://forem.com/yogesh_vk/why-ai-and-automation-are-not-always-the-right-answer-in-devops-18ch</guid>
      <description>&lt;p&gt;In DevOps, AI, and platform engineering, speed without understanding can amplify failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Every engineering team reaches this moment sooner or later.&lt;/p&gt;

&lt;p&gt;A repetitive task appears, or a process feels slow, and someone asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can we just automate this?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On the surface, it sounds like the right instinct. DevOps, after all, was built on the idea of reducing manual effort and increasing reliability through automation.&lt;/p&gt;

&lt;p&gt;But in my experience, automation is not always the right first answer.&lt;/p&gt;

&lt;p&gt;Sometimes the process is slow because it contains necessary human judgment. Sometimes the repetition is actually a signal that the underlying system design needs improvement. And increasingly, with AI entering DevOps workflows, there is a temptation to automate decisions that should still remain human.&lt;/p&gt;

&lt;p&gt;The question is not whether something can be automated. The more important question is whether it should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automation Often Scales Existing Problems
&lt;/h2&gt;

&lt;p&gt;One of the most common mistakes teams make is automating a broken or poorly understood workflow.&lt;/p&gt;

&lt;p&gt;A manual deployment process may feel slow and frustrating. But if the underlying release steps are unclear, automating them simply means the same confusion now happens faster and at larger scale.&lt;/p&gt;

&lt;p&gt;In infrastructure systems, this can be dangerous.&lt;/p&gt;

&lt;p&gt;A pipeline that automatically pushes Terraform changes into production may look efficient, but if reviewers do not fully understand the blast radius of those changes, automation simply accelerates risk.&lt;/p&gt;

&lt;p&gt;The result is not better engineering. It is faster failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Difference Between Repetition and Judgment
&lt;/h2&gt;

&lt;p&gt;Not every repetitive task is suitable for automation.&lt;/p&gt;

&lt;p&gt;Some tasks are repetitive because they are operationally necessary. Others require human context and judgment, even if the steps appear similar.&lt;/p&gt;

&lt;p&gt;For example, reviewing a Terraform plan may seem repetitive. But what the reviewer is actually doing is not checking syntax. They are making decisions about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operational risk&lt;/li&gt;
&lt;li&gt;rollback impact&lt;/li&gt;
&lt;li&gt;customer-facing downtime&lt;/li&gt;
&lt;li&gt;security implications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not repetition. That is judgment.&lt;/p&gt;

&lt;p&gt;Automating the process without preserving that judgment layer often removes the most valuable part of the workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Makes This Even More Tempting
&lt;/h2&gt;

&lt;p&gt;The rise of AI in DevOps workflows makes this challenge even more relevant. AI tools can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize pull requests&lt;/li&gt;
&lt;li&gt;explain Terraform plans&lt;/li&gt;
&lt;li&gt;analyze logs&lt;/li&gt;
&lt;li&gt;suggest infrastructure changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are genuinely useful capabilities. But there is an important boundary.&lt;/p&gt;

&lt;p&gt;AI should help engineers understand systems better. It should not replace ownership of decisions.&lt;/p&gt;

&lt;p&gt;For example, using AI to explain a Terraform plan is helpful. Using AI to automatically approve and apply infrastructure changes is often the wrong answer. The operational responsibility still belongs to humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Good Automation Removes Toil, Not Thinking
&lt;/h2&gt;

&lt;p&gt;The best automation removes toil, not thought. Good examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automatic formatting checks&lt;/li&gt;
&lt;li&gt;CI validation pipelines&lt;/li&gt;
&lt;li&gt;policy enforcement&lt;/li&gt;
&lt;li&gt;environment cleanup schedules&lt;/li&gt;
&lt;li&gt;cost anomaly alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tasks are repetitive, rules-driven, and low in ambiguity. They benefit greatly from automation. What should remain human are workflows involving uncertainty, trade-offs, and accountability.&lt;/p&gt;
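&lt;p&gt;To show what "rules-driven and low in ambiguity" looks like in practice, here is a minimal Python sketch of a cost anomaly rule. The function name, the 1.5x threshold, and the choice to compare against the average of the previous days are illustrative assumptions, not taken from any particular billing tool.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from statistics import mean


def is_cost_anomaly(daily_costs: list[float], threshold: float = 1.5) -> bool:
    """Return True if the most recent day's spend exceeds `threshold` times the
    average of the preceding days (a deliberately simple, hypothetical rule)."""
    if not daily_costs:
        return False
    *history, today = daily_costs
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return today > threshold * mean(history)


# A stable week followed by a spike:
print(is_cost_anomaly([100.0, 110.0, 90.0, 100.0, 205.0]))  # prints True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing in this rule requires judgment, which is what makes it safe to automate. Deciding whether a flagged spike is expected (a launch, a migration) or a real problem is the part that stays with engineers.&lt;/p&gt;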

&lt;h2&gt;
  The Better Question to Ask
&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can we automate this?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What part of this process is pure toil, and what part requires human judgment?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction changes everything. Once you separate toil from judgment, automation becomes much safer and much more effective.&lt;/p&gt;
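&lt;p&gt;Applied to the Terraform plan review example above, that separation can be made explicit in code. The Python sketch below is a minimal illustration, assuming the plan has been exported with &lt;code&gt;terraform show -json&lt;/code&gt;: it treats creates and in-place updates as routine and routes anything that deletes a resource, including replacements, to a human. That policy, the function name, and the resource addresses are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def split_toil_and_judgment(plan: dict) -> tuple[list[str], list[str]]:
    """Split `resource_changes` from `terraform show -json` output into
    routine changes and changes that need a human decision.

    Illustrative policy: creates and in-place updates are routine; anything
    that deletes a resource (including a replacement) needs review.
    """
    routine, needs_review = [], []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing changes in real infrastructure
        if "delete" in actions:
            needs_review.append(rc["address"])
        else:
            routine.append(rc["address"])
    return routine, needs_review


# Hypothetical plan excerpt with three resources.
example_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_iam_role.ci", "change": {"actions": ["no-op"]}},
    ]
}
routine, needs_review = split_toil_and_judgment(example_plan)
print("Routine:", routine)
print("Needs human review:", needs_review)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real policy would probably also send IAM and network changes to a reviewer. The value is that the boundary between toil and judgment is written down and reviewable, rather than implied.&lt;/p&gt;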
&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;In DevOps and platform engineering, automation is incredibly powerful.&lt;/p&gt;

&lt;p&gt;But the goal should never be automation for its own sake. In my view, the goal is better systems.&lt;/p&gt;

&lt;p&gt;Sometimes that means automation. Sometimes it means first improving how well engineers understand their systems.&lt;/p&gt;

&lt;p&gt;And increasingly, with AI in the mix, it means being very deliberate about what we allow machines to decide on our behalf.&lt;/p&gt;

&lt;p&gt;Because not every slow process is a bad one. Some of them are where engineering judgment actually lives.&lt;/p&gt;

&lt;p&gt;Do you feel the same way?&lt;/p&gt;

&lt;p&gt;Originally published on Medium:&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://medium.com/@yogesh.vk/why-ai-and-automation-are-not-always-the-right-answer-in-devops-8c0cb5e439bf" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;medium.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>Using AI to Explain Terraform Plans to Humans</title>
      <dc:creator>Yogesh VK</dc:creator>
      <pubDate>Fri, 27 Mar 2026 06:16:00 +0000</pubDate>
      <link>https://forem.com/yogesh_vk/using-ai-to-explain-terraform-plans-to-humans-3dp4</link>
      <guid>https://forem.com/yogesh_vk/using-ai-to-explain-terraform-plans-to-humans-3dp4</guid>
      <description>&lt;p&gt;Turning raw infrastructure diffs into decisions engineers can actually understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  INTRODUCTION
&lt;/h2&gt;

&lt;p&gt;Terraform plans are incredibly precise. They show every resource change, attribute modification, and dependency update that will occur during an apply.&lt;br&gt;
But precision is not the same as clarity.&lt;br&gt;
For many engineers reviewing infrastructure changes, Terraform plans feel more like a wall of text than a meaningful explanation of what is about to happen. The information is there, but extracting the real implications often requires experience and careful reading.&lt;/p&gt;

&lt;p&gt;This is exactly where AI can become useful. Not by executing infrastructure changes, but by translating Terraform plans into something humans can reason about.&lt;/p&gt;
&lt;h2&gt;
  
  
  THE PROBLEM WITH RAW TERRAFORM PLANS
&lt;/h2&gt;

&lt;p&gt;Terraform's plan output is designed for correctness, not readability.&lt;br&gt;
It faithfully lists changes such as resource replacements, attribute updates, and dependency adjustments. While this is ideal for machines and precise workflows, it can make reviews difficult for humans, especially in larger environments.&lt;br&gt;
Even a routine plan might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hundreds of attribute updates&lt;/li&gt;
&lt;li&gt;nested resource changes&lt;/li&gt;
&lt;li&gt;implicit dependencies across modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What reviewers actually want to know is much simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What changed?&lt;/li&gt;
&lt;li&gt;Why does it matter?&lt;/li&gt;
&lt;li&gt;Is the risk acceptable?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Terraform itself does not answer those questions.&lt;/p&gt;
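&lt;p&gt;The raw material for the first question is in the plan, though. In the JSON form of a plan (&lt;code&gt;terraform show -json tfplan&lt;/code&gt;), each entry in &lt;code&gt;resource_changes&lt;/code&gt; carries an &lt;code&gt;actions&lt;/code&gt; list, and a replacement appears as both &lt;code&gt;delete&lt;/code&gt; and &lt;code&gt;create&lt;/code&gt; (in either order, depending on &lt;code&gt;create_before_destroy&lt;/code&gt;). A minimal Python sketch that groups resources by what will happen to them, using hypothetical resource addresses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def classify(actions: list[str]) -> str:
    """Reduce a JSON plan `actions` list to a single label."""
    if set(actions) == {"delete", "create"}:
        return "replace"  # ["delete", "create"] or ["create", "delete"]
    return actions[0]  # e.g. "create", "update", "delete", "no-op"


def what_changed(plan: dict) -> dict[str, list[str]]:
    """Group resource addresses by label, skipping resources that stay the same."""
    groups: dict[str, list[str]] = {}
    for rc in plan.get("resource_changes", []):
        label = classify(rc["change"]["actions"])
        if label in ("no-op", "read"):
            continue
        groups.setdefault(label, []).append(rc["address"])
    return groups


plan = {"resource_changes": [  # hypothetical resources
    {"address": "aws_eks_node_group.platform_nodes", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.assets", "change": {"actions": ["no-op"]}},
]}
print(what_changed(plan))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The other two questions, why it matters and whether the risk is acceptable, need context that this kind of grouping cannot provide.&lt;/p&gt;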
&lt;h2&gt;
  
  
  WHERE HUMAN REVIEW BREAKS DOWN
&lt;/h2&gt;

&lt;p&gt;Experienced engineers eventually develop an instinct for reading Terraform plans. They scan for dangerous signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resource replacement&lt;/li&gt;
&lt;li&gt;subnet or network changes&lt;/li&gt;
&lt;li&gt;IAM policy expansions&lt;/li&gt;
&lt;li&gt;scaling changes in compute clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But this intuition takes time to build, and even experienced reviewers can miss subtle interactions when reviewing large changes late in the day or under delivery pressure.&lt;br&gt;
The real problem isn't lack of information. It's cognitive load.&lt;/p&gt;

&lt;p&gt;Terraform tells us everything. Humans only need to understand the important parts.&lt;/p&gt;
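&lt;p&gt;Part of that reviewer instinct can be written down as a checklist. The Python sketch below scans &lt;code&gt;terraform show -json&lt;/code&gt; output for the signals listed above. The AWS resource-type prefixes and attribute names it checks are assumptions chosen for illustration and are deliberately broad; the IAM check only reports that a policy-related resource changed, not whether it actually grants more.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Deliberately broad AWS type prefixes (an assumption; adjust for your providers).
NETWORK_PREFIXES = ("aws_subnet", "aws_route", "aws_vpc", "aws_security_group", "aws_network_acl")
IAM_PREFIX = "aws_iam_"
SCALING_KEYS = ("scaling_config", "desired_capacity", "min_size", "max_size")


def dangerous_signals(plan: dict) -> list[str]:
    """List the signals experienced reviewers scan for in `terraform show -json` output."""
    warnings = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        actions = change["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # resource is unchanged
        address, rtype = rc["address"], rc["type"]
        before = change.get("before") or {}
        after = change.get("after") or {}
        if "delete" in actions and "create" in actions:
            warnings.append(f"{address}: will be replaced")
        if rtype.startswith(NETWORK_PREFIXES):
            warnings.append(f"{address}: network change")
        if rtype.startswith(IAM_PREFIX):
            warnings.append(f"{address}: IAM change, check for broader permissions")
        for key in SCALING_KEYS:
            if key in before and before.get(key) != after.get(key):
                warnings.append(f"{address}: {key} changes")
    return warnings


plan = {"resource_changes": [  # hypothetical resources
    {"address": "aws_subnet.private_a", "type": "aws_subnet",
     "change": {"actions": ["delete", "create"], "before": {}, "after": {}}},
    {"address": "aws_eks_node_group.platform_nodes", "type": "aws_eks_node_group",
     "change": {"actions": ["update"],
                "before": {"scaling_config": [{"desired_size": 3}]},
                "after": {"scaling_config": [{"desired_size": 6}]}}},
]}
for warning in dangerous_signals(plan):
    print(warning)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A checklist like this does not get tired late in the day, but it has no sense of context either, so its output is a starting point for the reviewer rather than a verdict.&lt;/p&gt;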
&lt;h2&gt;
  
  
  WHY AI IS GOOD AT THIS PROBLEM
&lt;/h2&gt;

&lt;p&gt;AI models are particularly good at summarizing structured text and identifying patterns.&lt;br&gt;
A Terraform plan contains many signals that AI can interpret effectively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which resources will be created, updated, or destroyed&lt;/li&gt;
&lt;li&gt;whether replacements will occur&lt;/li&gt;
&lt;li&gt;potential cost changes&lt;/li&gt;
&lt;li&gt;security-sensitive modifications&lt;/li&gt;
&lt;li&gt;large blast-radius changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of forcing humans to parse hundreds of lines of output, AI can produce a concise summary describing the operational impact.&lt;br&gt;
This transforms the Terraform plan from a raw diff into an explanation.&lt;/p&gt;
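&lt;p&gt;Before a plan is sent to a model, it usually helps to condense it. The Python sketch below, assuming the model only needs to know which top-level attributes changed, reduces the JSON plan to one line per changing resource. It deliberately includes attribute names but not values: the &lt;code&gt;terraform show&lt;/code&gt; documentation warns that its JSON output displays sensitive values in plain text, and those should not end up in a prompt. The function name and example resources are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def condense_plan(plan: dict) -> str:
    """Reduce `terraform show -json` output to one line per changing resource.

    Only top-level attribute names are listed, never their values, so secrets
    stored in the plan do not reach the model.
    """
    lines = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        actions = change["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        before = change.get("before") or {}
        after = change.get("after") or {}
        label = "replace" if set(actions) == {"delete", "create"} else actions[0]
        line = f"{label} {rc['address']}"
        if label in ("update", "replace"):
            changed = sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))
            if changed:
                line += f" [{', '.join(changed)}]"
        lines.append(line)
    return "\n".join(lines)


plan = {"resource_changes": [  # hypothetical resources
    {"address": "aws_eks_node_group.platform_nodes",
     "change": {"actions": ["delete", "create"],
                "before": {"instance_types": ["t3.large"], "node_group_name": "platform"},
                "after": {"instance_types": ["m5.large"], "node_group_name": "platform"}}},
    {"address": "aws_cloudwatch_log_group.app",
     "change": {"actions": ["create"], "before": None, "after": {"retention_in_days": 30}}},
]}
print(condense_plan(plan))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;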
&lt;h2&gt;
  
  
  AI AS A REVIEW ASSISTANT IN CI/CD
&lt;/h2&gt;

&lt;p&gt;A practical place to integrate this capability is within CI/CD pipelines.&lt;br&gt;
After generating a Terraform plan, a pipeline step can feed the plan output into an AI model. The model then produces a human-readable summary that is attached to the pull request.&lt;br&gt;
Instead of reviewing raw plan text alone, engineers see a structured explanation such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Risk Summary: This change replaces the EKS node group, which will trigger a rolling replacement of worker nodes.
Security Impact: No IAM policies were expanded.
Cost Impact: Estimated monthly increase: approximately $120 due to increased instance size.
Operational Notes: Node replacement may temporarily reduce cluster capacity during rollout.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This type of explanation does not replace the Terraform plan. It simply helps humans understand it faster.&lt;/p&gt;
&lt;h2&gt;
  
  
  USING GITHUB ACTIONS FOR AI-ASSISTED PLAN REVIEWS
&lt;/h2&gt;

&lt;p&gt;GitHub Actions provides a natural place to implement this pattern.&lt;br&gt;
A typical pipeline already includes steps like formatting, validation, and plan generation. Adding an AI analysis step is straightforward and can operate entirely in read-only mode.&lt;br&gt;
The workflow might look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run terraform plan&lt;/li&gt;
&lt;li&gt;Export plan output as JSON&lt;/li&gt;
&lt;li&gt;Send plan summary to an AI model&lt;/li&gt;
&lt;li&gt;Post a structured explanation as a pull request comment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key point is that the AI does not change infrastructure or execute Terraform commands. It only interprets the plan and produces a human-readable summary.&lt;br&gt;
This keeps the decision-making process firmly in human hands.&lt;/p&gt;
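&lt;p&gt;The middle steps of that workflow could be a small read-only script. The Python sketch below reads the exported plan, builds a prompt from the changed resources, and writes Markdown that a later step can post as a pull request comment. The &lt;code&gt;summarize&lt;/code&gt; argument is a hypothetical stand-in for whatever model API your team uses, and the prompt wording and file names are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from typing import Callable

PROMPT = (
    "Summarize this Terraform plan for a human reviewer under the headings "
    "Risk Summary, Security Impact, Cost Impact and Operational Notes. "
    "Refer to resources by address. Do not recommend applying or rejecting it.\n\n"
)


def review_plan(plan_path: str, summary_path: str, summarize: Callable[[str], str]) -> None:
    """Read a JSON plan, ask a model for a summary, and write Markdown for the PR.

    `summarize` is a placeholder for your own model call. The function only
    reads and writes local files; it never runs Terraform.
    """
    with open(plan_path) as f:
        plan = json.load(f)
    changes = [
        f"{'/'.join(rc['change']['actions'])} {rc['address']}"
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] not in (["no-op"], ["read"])
    ]
    summary = summarize(PROMPT + "\n".join(changes))
    with open(summary_path, "w") as f:
        f.write("## AI plan summary (advisory)\n\n" + summary + "\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling it as &lt;code&gt;review_plan("plan.json", "plan-summary.md", summarize=call_model)&lt;/code&gt;, where &lt;code&gt;call_model&lt;/code&gt; is your own function, keeps the model behind a single text-in, text-out boundary: it sees only a list of resource changes and never runs Terraform.&lt;/p&gt;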
&lt;h2&gt;
  
  
  WHY THIS IMPROVES INFRASTRUCTURE SAFETY
&lt;/h2&gt;

&lt;p&gt;When infrastructure reviews fail, it is rarely because Terraform produced incorrect output.&lt;br&gt;
Failures occur because reviewers misinterpret the impact or miss important signals hidden within large plans.&lt;br&gt;
AI-assisted explanations reduce that risk by highlighting the kinds of changes humans care about most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;replacements&lt;/li&gt;
&lt;li&gt;deletions&lt;/li&gt;
&lt;li&gt;network changes&lt;/li&gt;
&lt;li&gt;permission expansions&lt;/li&gt;
&lt;li&gt;scaling adjustments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI becomes a second set of eyes, helping reviewers focus their attention where it matters.&lt;/p&gt;
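&lt;p&gt;A second set of eyes can also miss things. One possible safeguard, not something the workflow above requires, is to check deterministically that the AI summary names every resource the plan deletes or replaces. A minimal Python sketch, with a hypothetical function name and resource addresses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def missing_from_summary(plan: dict, summary: str) -> list[str]:
    """Return addresses of resources the plan deletes or replaces that the
    AI-written summary never mentions by name."""
    high_impact = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]  # covers deletions and replacements
    ]
    return [address for address in high_impact if address not in summary]


plan = {"resource_changes": [  # hypothetical resources
    {"address": "aws_eks_node_group.platform_nodes", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete"]}},
]}
summary = "Risk Summary: aws_eks_node_group.platform_nodes will be replaced."
print(missing_from_summary(plan, summary))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Exact address matching is strict on purpose: asking the model to cite resource addresses lets good summaries pass, and any address the check reports is a cue to read that part of the raw plan.&lt;/p&gt;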
&lt;h2&gt;
  
  
  THE IMPORTANT BOUNDARY
&lt;/h2&gt;

&lt;p&gt;Even though AI can interpret plans effectively, it should never be allowed to execute them.&lt;br&gt;
Running terraform apply still requires human ownership and operational judgment. AI can explain consequences, but it cannot decide whether those consequences are acceptable.&lt;br&gt;
That boundary is what keeps AI useful rather than dangerous.&lt;/p&gt;
&lt;h2&gt;
  
  
  CLOSING THOUGHT
&lt;/h2&gt;

&lt;p&gt;Terraform already tells us what will change.&lt;br&gt;
AI helps answer the more useful question: What does this change actually mean?&lt;/p&gt;

&lt;p&gt;By turning raw infrastructure diffs into clear explanations, AI allows DevOps teams to review changes faster, understand risk better, and make more confident decisions.&lt;br&gt;
And that is exactly where AI belongs in infrastructure workflows: helping humans think more clearly, not replacing their judgment.&lt;/p&gt;

&lt;p&gt;How does your team review Terraform plans today: raw output, custom tooling, or something smarter?&lt;/p&gt;

&lt;p&gt;Originally published on Medium:&lt;br&gt;
&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://medium.com/@yogesh.vk/using-ai-to-explain-terraform-plans-to-humans-e631b264fafd" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;medium.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>ai</category>
      <category>devops</category>
      <category>terraform</category>
      <category>cicd</category>
    </item>
    <item>
      <title>5 Expensive Terraform Mistakes I Keep Seeing in Real Infrastructure and How AI can Help</title>
      <dc:creator>Yogesh VK</dc:creator>
      <pubDate>Fri, 20 Mar 2026 06:15:00 +0000</pubDate>
      <link>https://forem.com/yogesh_vk/5-expensive-terraform-mistakes-i-keep-seeing-in-real-infrastructure-and-how-ai-can-help-2m38</link>
      <guid>https://forem.com/yogesh_vk/5-expensive-terraform-mistakes-i-keep-seeing-in-real-infrastructure-and-how-ai-can-help-2m38</guid>
      <description>&lt;p&gt;Small infrastructure decisions that quietly turn into large cloud bills.&lt;/p&gt;

&lt;p&gt;Infrastructure-as-Code has dramatically improved how teams manage cloud environments. Terraform in particular has made it possible to define infrastructure in version-controlled, repeatable configurations.&lt;/p&gt;

&lt;p&gt;In theory, this should make infrastructure both predictable and efficient.&lt;/p&gt;

&lt;p&gt;In practice, however, Terraform does not automatically make systems cost-efficient. It simply makes infrastructure changes easier to reproduce. If inefficient patterns exist in the configuration, Terraform will reproduce them perfectly.&lt;/p&gt;

&lt;p&gt;Over time, small infrastructure decisions accumulate. Many of them appear harmless when introduced, but months later they become visible as unexpectedly large cloud bills.&lt;/p&gt;

&lt;p&gt;Here are some of the most common Terraform patterns I keep seeing that quietly drive up infrastructure costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Oversized Compute That Never Gets Revisited
&lt;/h2&gt;

&lt;p&gt;One of the most common patterns starts early in a project.&lt;/p&gt;

&lt;p&gt;During initial development, engineers often choose slightly larger instance types to avoid performance issues. It is safer to start with extra capacity rather than risk under-provisioning a critical service.&lt;/p&gt;

&lt;p&gt;The problem is that these instance sizes often remain unchanged long after workloads stabilize.&lt;/p&gt;

&lt;p&gt;Terraform makes it easy to define infrastructure once and leave it untouched. As long as systems continue running without obvious performance problems, there is little incentive to revisit instance sizing decisions.&lt;/p&gt;

&lt;p&gt;Over time, this leads to clusters and services running on instance types that are significantly larger than necessary.&lt;/p&gt;

&lt;p&gt;This issue is particularly visible in Kubernetes clusters, where node groups are frequently defined with conservative sizing assumptions. If workloads later become more efficient, the underlying infrastructure may remain over-provisioned indefinitely.&lt;/p&gt;
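&lt;p&gt;Revisiting sizing does not need to start as a large project. The Python sketch below shows the back-of-the-envelope reasoning for one instance family. It assumes you already have peak CPU utilization from your monitoring system, that CPU demand scales linearly with vCPU count, and a 60% utilization target; all three are simplifying assumptions, memory is ignored entirely, and the function name is hypothetical. The vCPU counts are those of the m5 family.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# vCPU counts for one general-purpose family (m5), smallest first.
M5_SIZES = [("m5.large", 2), ("m5.xlarge", 4), ("m5.2xlarge", 8), ("m5.4xlarge", 16)]


def suggest_size(current: str, peak_cpu_percent: float, target_percent: float = 60.0) -> str:
    """Suggest the smallest m5 size whose vCPUs would keep observed peak CPU at
    or below `target_percent`, assuming load scales linearly with vCPU count.
    Memory, network and burst behavior are deliberately ignored."""
    vcpus = dict(M5_SIZES)[current]
    needed = vcpus * peak_cpu_percent / target_percent  # vCPUs required at target
    for name, count in M5_SIZES:
        if count >= needed:
            return name
    return M5_SIZES[-1][0]  # even the largest size is not enough; flag for a human


print(suggest_size("m5.2xlarge", peak_cpu_percent=20))  # prints m5.xlarge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here a node group peaking at 20% CPU on m5.2xlarge instances would fit on m5.xlarge, halving its vCPU count. The output is a prompt to revisit the Terraform variable, not a change to apply automatically.&lt;/p&gt;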

&lt;h2&gt;
  2. Resource Replacements Triggered by Small Configuration Changes
&lt;/h2&gt;

&lt;p&gt;Terraform’s declarative model means that certain configuration changes require resources to be replaced rather than updated.&lt;/p&gt;

&lt;p&gt;For example, modifying attributes such as subnet associations, encryption settings, or instance types may cause Terraform to destroy and recreate a resource.&lt;/p&gt;

&lt;p&gt;While Terraform clearly reports these replacements in the plan output, the operational and financial impact is not always obvious during review.&lt;/p&gt;

&lt;p&gt;Replacing compute clusters, databases, or node groups can temporarily increase infrastructure usage, create additional storage snapshots, or trigger redeployment processes that consume additional resources.&lt;/p&gt;

&lt;p&gt;When these replacements happen frequently across environments, they can contribute to unexpectedly high infrastructure costs.&lt;/p&gt;

&lt;p&gt;This is one reason teams are starting to use AI-assisted plan analysis in CI pipelines: to highlight resource replacements and explain their operational impact before they are applied.&lt;/p&gt;

&lt;h2&gt;
  3. Logging and Observability Configurations That Grow Without Limits
&lt;/h2&gt;

&lt;p&gt;Terraform is often used to provision logging pipelines and observability systems. These systems are essential for debugging and monitoring production environments.&lt;/p&gt;

&lt;p&gt;However, logging configurations are frequently defined with very generous defaults.&lt;/p&gt;

&lt;p&gt;For example, teams may configure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;high verbosity log levels&lt;/li&gt;
&lt;li&gt;long retention periods&lt;/li&gt;
&lt;li&gt;large ingestion pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These settings are useful during development but are rarely revisited as systems mature.&lt;/p&gt;

&lt;p&gt;Because Terraform configurations remain stable over time, these logging pipelines can continue collecting massive volumes of data long after the original debugging needs have passed.&lt;/p&gt;

&lt;p&gt;In some environments, observability costs eventually exceed the cost of the infrastructure being monitored.&lt;/p&gt;
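&lt;p&gt;Retention is one setting that is easy to check at review time. In the AWS provider, &lt;code&gt;aws_cloudwatch_log_group&lt;/code&gt; has a &lt;code&gt;retention_in_days&lt;/code&gt; argument where 0 means log events never expire. The Python sketch below flags such log groups in a JSON plan, plus any retention above a threshold; the 90-day threshold, the function name, and the resource addresses are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assumption: anything kept longer than this deserves a second look.
MAX_RETENTION_DAYS = 90


def log_retention_findings(plan: dict, max_days: int = MAX_RETENTION_DAYS) -> list[str]:
    """Flag CloudWatch log groups in a Terraform JSON plan that never expire
    (retention_in_days of 0 or unset) or keep data longer than `max_days`."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc["type"] != "aws_cloudwatch_log_group":
            continue
        after = rc["change"].get("after")
        if not after:
            continue  # resource is being deleted
        days = after.get("retention_in_days") or 0
        if days == 0:
            findings.append(f"{rc['address']}: logs never expire")
        elif days > max_days:
            findings.append(f"{rc['address']}: retention {days} days exceeds {max_days}")
    return findings


plan = {"resource_changes": [  # hypothetical resources
    {"address": "aws_cloudwatch_log_group.app", "type": "aws_cloudwatch_log_group",
     "change": {"actions": ["create"], "after": {"retention_in_days": 0}}},
    {"address": "aws_cloudwatch_log_group.audit", "type": "aws_cloudwatch_log_group",
     "change": {"actions": ["update"], "after": {"retention_in_days": 3653}}},
    {"address": "aws_cloudwatch_log_group.api", "type": "aws_cloudwatch_log_group",
     "change": {"actions": ["create"], "after": {"retention_in_days": 30}}},
]}
for finding in log_retention_findings(plan):
    print(finding)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A finding is a question, not a verdict: an audit log group, for example, may have a compliance reason for long retention.&lt;/p&gt;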

&lt;h2&gt;
  4. Idle Infrastructure Environments
&lt;/h2&gt;

&lt;p&gt;Another common Terraform pattern involves environment duplication.&lt;/p&gt;

&lt;p&gt;Many organizations create separate environments for development, staging, integration testing, and experimentation. Terraform makes it easy to spin up these environments using identical modules.&lt;/p&gt;

&lt;p&gt;The problem is that these environments often remain running continuously even when they are rarely used.&lt;/p&gt;

&lt;p&gt;A staging environment that runs databases, compute nodes, load balancers, and storage resources can easily cost hundreds of dollars per month. Multiply that across multiple teams and environments, and the cost grows quickly.&lt;/p&gt;

&lt;p&gt;In many cases, these environments are only actively used during working hours.&lt;br&gt;
Automated scheduling policies or environment lifecycle management can dramatically reduce this waste, but these controls are rarely implemented initially.&lt;/p&gt;
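&lt;p&gt;The arithmetic behind that claim is simple. An environment used 10 hours a day on weekdays is needed for 50 of the 168 hours in a week, so stopping it outside that window avoids roughly 70% of its compute hours. A small Python helper for other schedules, assuming compute cost scales with running hours and ignoring storage, which keeps accruing while resources are stopped (the function name is hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly compute hours avoided by running an environment only
    on the given schedule instead of around the clock."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK


print(f"{scheduled_savings(10, 5):.0%}")  # prints 70%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;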

&lt;h2&gt;
  5. Storage That Quietly Accumulates
&lt;/h2&gt;

&lt;p&gt;Storage resources are particularly prone to long-term cost growth.&lt;/p&gt;

&lt;p&gt;Terraform configurations frequently create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;snapshots&lt;/li&gt;
&lt;li&gt;backups&lt;/li&gt;
&lt;li&gt;object storage buckets&lt;/li&gt;
&lt;li&gt;artifact repositories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because storage is relatively inexpensive per gigabyte, these resources often grow without much scrutiny.&lt;/p&gt;

&lt;p&gt;Over time, however, storage layers accumulate historical artifacts that are rarely accessed but continue to incur costs.&lt;/p&gt;

&lt;p&gt;Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;old database snapshots that were never cleaned up&lt;/li&gt;
&lt;li&gt;log archives retained indefinitely&lt;/li&gt;
&lt;li&gt;unused container images in registries&lt;/li&gt;
&lt;li&gt;artifact storage from old CI pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without lifecycle policies, these storage systems gradually become long-term archives rather than operational infrastructure.&lt;/p&gt;
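&lt;p&gt;The best fix is a lifecycle policy defined alongside the resource itself, such as S3 lifecycle rules or ECR lifecycle policies managed in Terraform. Where that is not yet in place, the underlying rule is easy to express. The Python sketch below, with hypothetical snapshot IDs, a hypothetical function name, and a 90-day retention period chosen for illustration, returns the snapshots such a policy would already have removed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timedelta, timezone


def stale_snapshots(snapshots: dict[str, datetime], retention_days: int, now: datetime) -> list[str]:
    """Return IDs of snapshots created before the retention cutoff, i.e. the
    ones a lifecycle policy with this retention would already have deleted."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(sid for sid, created in snapshots.items() if cutoff > created)


now = datetime(2026, 3, 20, tzinfo=timezone.utc)
snapshots = {  # hypothetical snapshot IDs and creation times
    "snap-old": datetime(2025, 6, 1, tzinfo=timezone.utc),
    "snap-recent": datetime(2026, 3, 1, tzinfo=timezone.utc),
}
print(stale_snapshots(snapshots, retention_days=90, now=now))  # prints ['snap-old']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;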

&lt;h2&gt;
  Why These Issues Are Hard to Detect
&lt;/h2&gt;

&lt;p&gt;The most expensive Terraform mistakes rarely appear as obvious misconfigurations.&lt;/p&gt;

&lt;p&gt;Instead, they emerge gradually as systems evolve.&lt;/p&gt;

&lt;p&gt;Each decision may appear reasonable in isolation. The instance size seems safe. The logging level helps debugging. The staging environment might be needed later.&lt;/p&gt;

&lt;p&gt;The problem is that Terraform faithfully preserves these decisions over time.&lt;/p&gt;

&lt;p&gt;Without regular review, infrastructure configurations slowly drift away from the actual needs of the system.&lt;/p&gt;

&lt;h2&gt;
  How AI Can Help Detect These Patterns Earlier
&lt;/h2&gt;

&lt;p&gt;AI cannot fix infrastructure architecture problems automatically. But it can help identify patterns that humans might overlook.&lt;/p&gt;

&lt;p&gt;For example, AI systems analyzing infrastructure configurations and Terraform plans, together with usage and billing data, can highlight signals such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compute resources that appear significantly over-provisioned&lt;/li&gt;
&lt;li&gt;environments that remain idle for long periods&lt;/li&gt;
&lt;li&gt;storage resources that grow continuously without being accessed&lt;/li&gt;
&lt;li&gt;resource replacements that may trigger unnecessary redeployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These insights allow teams to review infrastructure decisions earlier rather than discovering problems only when the monthly bill arrives.&lt;/p&gt;

&lt;h2&gt;
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;Terraform is an incredibly powerful tool for managing infrastructure. But like any automation system, it faithfully executes the decisions encoded within it.&lt;/p&gt;

&lt;p&gt;If inefficient patterns exist in the configuration, Terraform will reproduce them perfectly every time.&lt;/p&gt;

&lt;p&gt;The goal is not to avoid mistakes entirely. That is unrealistic in complex systems.&lt;/p&gt;

&lt;p&gt;The goal is to detect small inefficiencies early — before they accumulate into large and expensive infrastructure problems.&lt;/p&gt;

&lt;h2&gt;
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Cloud costs rarely explode because of one catastrophic decision.&lt;/p&gt;

&lt;p&gt;More often, they grow quietly from dozens of small infrastructure choices that were never revisited.&lt;/p&gt;

&lt;p&gt;Terraform gives us the power to manage infrastructure systematically. The challenge is making sure the systems we define remain aligned with how they are actually used.&lt;/p&gt;

&lt;p&gt;That requires continuous review, feedback, and sometimes a second set of eyes — whether human or machine.&lt;/p&gt;

&lt;p&gt;Originally published on Medium:&lt;br&gt;
&lt;a href="https://medium.com/@yogesh.vk/5-expensive-terraform-mistakes-i-keep-seeing-in-real-infrastructure-and-how-ai-can-help-9a4849ddfc91" rel="noopener noreferrer"&gt;https://medium.com/@yogesh.vk/5-expensive-terraform-mistakes-i-keep-seeing-in-real-infrastructure-and-how-ai-can-help-9a4849ddfc91&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>terraform</category>
      <category>aws</category>
    </item>
    <item>
      <title>AI for DevOps and Platform Engineering: Practical Use Cases That Actually Work</title>
      <dc:creator>Yogesh VK</dc:creator>
      <pubDate>Fri, 13 Mar 2026 04:53:55 +0000</pubDate>
      <link>https://forem.com/yogesh_vk/ai-for-devops-and-platform-engineering-practical-use-cases-that-actually-work-2a63</link>
      <guid>https://forem.com/yogesh_vk/ai-for-devops-and-platform-engineering-practical-use-cases-that-actually-work-2a63</guid>
      <description>&lt;p&gt;Moving beyond the hype to real workflows where AI is actually useful for DevOps and platform engineering teams today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;INTRODUCTION&lt;/strong&gt;&lt;br&gt;
AI is rapidly entering every corner of software engineering. DevOps and platform teams are no exception. New tools promise to generate infrastructure code, manage deployments, and even run operations autonomously.&lt;/p&gt;

&lt;p&gt;But many experienced infrastructure engineers react with skepticism.&lt;/p&gt;

&lt;p&gt;Infrastructure systems are complex, stateful, and deeply interconnected. Blind automation often introduces more risk than it removes. The question is not whether AI can be used in DevOps workflows — it is where it should be used, and where it should not.&lt;/p&gt;

&lt;p&gt;The most effective teams are not replacing engineers with AI. They are using AI to reduce cognitive load, surface hidden risks, and make better operational decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;THE SHIFT FROM AUTOMATION TO ASSISTED DECISION-MAKING&lt;/strong&gt;&lt;br&gt;
For years, DevOps focused heavily on automation. CI/CD pipelines automated builds, tests, deployments, and infrastructure provisioning. Infrastructure-as-Code tools like Terraform allowed teams to define environments in reproducible ways.&lt;/p&gt;

&lt;p&gt;AI introduces a new layer to this model.&lt;/p&gt;

&lt;p&gt;Instead of simply automating actions, AI can assist engineers in understanding the consequences of those actions. It becomes a reasoning layer that helps interpret complex systems rather than directly controlling them.&lt;/p&gt;

&lt;p&gt;In practice, this means AI is most valuable when it explains systems, analyzes changes, and highlights risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI FOR INFRASTRUCTURE CODE REVIEWS&lt;/strong&gt;&lt;br&gt;
Infrastructure changes often carry significant risk. A single change in Terraform can replace compute clusters, modify network boundaries, or expand IAM permissions.&lt;/p&gt;

&lt;p&gt;Traditional CI pipelines verify syntax and policy compliance, but they rarely explain the real impact of a change.&lt;/p&gt;

&lt;p&gt;AI can help fill this gap by reviewing Terraform plans and summarizing their implications. Instead of manually scanning hundreds of lines of plan output, engineers can see a concise explanation of what will change and why it matters.&lt;/p&gt;

&lt;p&gt;This turns infrastructure reviews into clearer conversations about risk and intent.&lt;/p&gt;

&lt;p&gt;Raw Terraform Plan (excerpt):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;  # aws_eks_node_group.platform_nodes must be replaced
-/+ resource "aws_eks_node_group" "platform_nodes" {
      ~ instance_types = [ # forces replacement
          - "t3.large",
          + "m5.large",
        ]
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;AI-Generated Explanation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Terraform Plan Summary
High Impact Change
- EKS node group "platform_nodes" will be replaced
- Worker nodes will be recreated
Operational Impact
- Pods will be rescheduled during node replacement
- Temporary capacity reduction possible
Cost Impact
- Instance type upgrade (t3.large → m5.large)
- Estimated monthly increase: ~$120
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;AI IN CI/CD PIPELINES&lt;/strong&gt;&lt;br&gt;
CI/CD pipelines are another natural integration point for AI.&lt;/p&gt;

&lt;p&gt;Modern pipelines already perform many automated checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;formatting validation&lt;/li&gt;
&lt;li&gt;policy enforcement&lt;/li&gt;
&lt;li&gt;dependency scanning&lt;/li&gt;
&lt;li&gt;infrastructure plan generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI can extend this pipeline by interpreting the results of those checks.&lt;/p&gt;

&lt;p&gt;For example, an AI step in a GitHub Actions workflow might analyze a Terraform plan and generate a structured summary highlighting resource replacements, cost changes, or security-sensitive updates.&lt;/p&gt;

&lt;p&gt;The pipeline still requires human approval before changes are applied. AI simply improves the context available to reviewers.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;name: Terraform Plan Review
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write # lets the job post the summary comment
jobs:
  terraform-plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false # keeps `terraform show -json` output clean
      # Cloud credentials for init/plan (for example via OIDC) are omitted here.
      - name: Terraform Init
        run: terraform init -input=false
      - name: Run Terraform Plan
        run: terraform plan -input=false -out=tfplan
      - name: Convert plan to JSON
        run: terraform show -json tfplan &amp;gt; plan.json
      - name: AI Plan Analysis
        # "ai-review" is a placeholder for your own summarization script or tool.
        run: |
          ai-review plan.json &amp;gt; plan-summary.md
      - name: Post summary to PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          path: plan-summary.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


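&lt;p&gt;The AI step reads the JSON plan and writes a human-readable summary, which is posted as a comment directly on the pull request. The workflow never runs &lt;code&gt;terraform apply&lt;/code&gt;, so the AI output can inform the review but cannot change infrastructure.&lt;/p&gt;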
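&lt;p&gt;To make that step less of a black box, it can help to extract the deterministic facts from the plan before any model sees it. The sketch below is one hypothetical way a tool like the placeholder &lt;code&gt;ai-review&lt;/code&gt; could pre-process &lt;code&gt;plan.json&lt;/code&gt;: it reads the &lt;code&gt;resource_changes[].change.actions&lt;/code&gt; field of Terraform's JSON plan output and lists replacements, deletions, and creations as Markdown. The function names are illustrative assumptions, in-place updates are skipped for brevity, and the plan JSON carries no pricing data, so a cost impact section would need a separate estimator.&lt;/p&gt;

```python
import json


def summarize_plan(plan):
    """Return a Markdown summary of high-impact changes in `terraform show -json` output."""
    replaced, destroyed, created = [], [], []
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        address = resource["address"]
        if "delete" in actions and "create" in actions:
            # ["delete", "create"] or ["create", "delete"] marks a replacement
            replaced.append(address)
        elif actions == ["delete"]:
            destroyed.append(address)
        elif actions == ["create"]:
            created.append(address)
        # "no-op", "read" and "update" changes are skipped in this sketch

    lines = ["## Terraform Plan Summary"]
    for title, addresses in (("Replaced", replaced), ("Destroyed", destroyed), ("Created", created)):
        if addresses:
            lines.append(f"### {title} ({len(addresses)})")
            lines.extend(f"- `{address}`" for address in addresses)
    if len(lines) == 1:
        lines.append("No resources are created, replaced, or destroyed.")
    return "\n".join(lines)


def summarize_plan_file(path):
    """Load a JSON plan file (for example plan.json) and summarize it."""
    with open(path) as f:
        return summarize_plan(json.load(f))
```

&lt;p&gt;One possible benefit of this split: the model is asked to explain facts that Terraform itself reported, and reviewers can check the AI explanation against that list instead of trusting it blindly.&lt;/p&gt;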

&lt;p&gt;&lt;strong&gt;AI FOR SHIFT-LEFT INFRASTRUCTURE SECURITY&lt;/strong&gt;&lt;br&gt;
DevSecOps practices encourage teams to identify security risks earlier in the development lifecycle. However, infrastructure security policies are often difficult to interpret or enforce consistently.&lt;/p&gt;

&lt;p&gt;AI can assist by analyzing infrastructure definitions and identifying potential issues before they reach production.&lt;/p&gt;

&lt;p&gt;For example, an AI assistant could flag:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;overly permissive IAM policies&lt;/li&gt;
&lt;li&gt;public exposure of internal services&lt;/li&gt;
&lt;li&gt;misconfigured storage access&lt;/li&gt;
&lt;li&gt;network boundary changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These insights can appear during pull request reviews or pipeline checks, allowing teams to address security concerns before deployment.&lt;/p&gt;
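&lt;p&gt;Some of these checks do not need a model at all, and running them deterministically gives both the AI and the reviewer a reliable baseline. As a minimal sketch, assuming the plan has been exported to JSON as in the workflow above, the hypothetical function below flags IAM policy statements that allow the literal action &lt;code&gt;"*"&lt;/code&gt; on the literal resource &lt;code&gt;"*"&lt;/code&gt;. The set of resource types is an assumption covering common AWS provider policy resources; &lt;code&gt;NotAction&lt;/code&gt;, partial wildcards such as &lt;code&gt;s3:*&lt;/code&gt;, and policies only known after apply are deliberately out of scope.&lt;/p&gt;

```python
import json

# Assumed set of AWS provider resource types whose "policy" attribute is an IAM policy JSON string
POLICY_TYPES = {"aws_iam_policy", "aws_iam_role_policy", "aws_iam_user_policy", "aws_iam_group_policy"}


def as_list(value):
    """IAM accepts a single value or a list for Statement, Action and Resource; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(plan):
    """Return (resource address, statement index) for each Allow statement granting "*" on "*"."""
    findings = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") not in POLICY_TYPES:
            continue
        # "after" is null for deletions; the policy may also be unknown until apply
        after = resource["change"].get("after") or {}
        policy_json = after.get("policy")
        if not policy_json:
            continue
        for index, statement in enumerate(as_list(json.loads(policy_json).get("Statement"))):
            if statement.get("Effect") != "Allow":
                continue
            if "*" in as_list(statement.get("Action")) and "*" in as_list(statement.get("Resource")):
                findings.append((resource["address"], index))
    return findings
```

&lt;p&gt;Findings like these could be appended to the same pull request comment, leaving the AI to explain the risk and suggest a fix rather than to detect the issue in the first place.&lt;/p&gt;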

&lt;p&gt;Example PR comment:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Infrastructure Security Review
Issue Detected
- S3 bucket allows public read access
Resource
aws_s3_bucket.website_assets
Risk
Public exposure of application assets.

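Suggested Fix
Add an aws_s3_bucket_public_access_block resource for this bucket:
  bucket                  = aws_s3_bucket.website_assets.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true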
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;AI FOR OBSERVABILITY AND INCIDENT RESPONSE&lt;/strong&gt;&lt;br&gt;
Operations teams often face the challenge of interpreting large volumes of monitoring data.&lt;/p&gt;

&lt;p&gt;Logs, metrics, and alerts can provide enormous amounts of information, but identifying the root cause of an issue still requires human reasoning.&lt;/p&gt;

&lt;p&gt;AI can assist by analyzing telemetry data and highlighting patterns that indicate emerging problems. Instead of scanning dashboards and logs manually, engineers receive summaries that connect signals across systems.&lt;/p&gt;

&lt;p&gt;Used carefully, this can reduce alert fatigue and accelerate incident investigation.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Raw logs:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR connection timeout db-primary
ERROR connection timeout db-primary
ERROR connection timeout db-primary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;AI explanation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alert Analysis
Pattern Detected
  Repeated connection failures to database cluster.
Likely Cause
  Database connection pool exhaustion.
Suggested Investigation
  Check RDS connection limits and application pool size.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


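&lt;p&gt;This is where AI helps day-to-day operations, as long as its output is treated as a starting point rather than a diagnosis: connection pool exhaustion is a plausible hypothesis, but timeouts like these can also come from network or database-side problems, so the engineer still confirms the cause before acting.&lt;/p&gt;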
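&lt;p&gt;Much of "highlighting patterns" comes down to collapsing repetitive telemetry before anyone, human or model, reads it. The sketch below is an illustrative assumption rather than any specific product's algorithm: it masks IPv4 addresses, millisecond durations, and standalone numbers so that near-identical log lines collapse into one template, then counts each template. A short list like this, instead of thousands of raw lines like the ones in the example above, could be what an AI summarizer receives.&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order; tune them for real log formats
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "{ip}"),  # IPv4 with optional port
    (re.compile(r"\b\d+(?:\.\d+)?ms\b"), "{ms}"),                   # durations such as 5000ms
    (re.compile(r"\b\d+\b"), "{n}"),                                # any remaining standalone number
]


def to_template(line):
    """Replace the variable parts of a log line with placeholders."""
    template = line.strip()
    for pattern, placeholder in MASKS:
        template = pattern.sub(placeholder, template)
    return template


def group_log_lines(lines):
    """Return (template, count) pairs, most frequent first; blank lines are ignored."""
    counts = Counter(to_template(line) for line in lines if line.strip())
    return counts.most_common()
```

&lt;p&gt;The trade-off is in the masking rules: aggressive masking merges more noise, but it can also merge lines that differ in meaningful ways, such as numbered hostnames, so the rules are worth reviewing per system.&lt;/p&gt;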

&lt;p&gt;&lt;strong&gt;WHERE AI SHOULD NOT BE USED&lt;/strong&gt;&lt;br&gt;
Despite its strengths, AI should not be allowed to control critical infrastructure operations without human oversight.&lt;/p&gt;

&lt;p&gt;Executing infrastructure changes, approving deployments, or modifying security policies are decisions that carry operational responsibility.&lt;/p&gt;

&lt;p&gt;AI can provide insight, but it cannot own the consequences of those decisions.&lt;/p&gt;

&lt;p&gt;The most effective DevOps teams treat AI as an assistant rather than an operator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BUILDING AI-AUGMENTED PLATFORM WORKFLOWS&lt;/strong&gt;&lt;br&gt;
The real opportunity is not replacing DevOps workflows, but enhancing them.&lt;/p&gt;

&lt;p&gt;A healthy AI-assisted platform might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI explanations for Terraform plans&lt;/li&gt;
&lt;li&gt;AI-generated summaries for infrastructure pull requests&lt;/li&gt;
&lt;li&gt;AI-assisted security analysis during CI/CD&lt;/li&gt;
&lt;li&gt;AI-powered analysis of observability data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each capability improves clarity and reduces cognitive load while preserving human ownership of operational decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLOSING THOUGHT&lt;/strong&gt;&lt;br&gt;
AI will undoubtedly influence how infrastructure systems are built and operated. But its greatest value will not come from replacing engineers.&lt;/p&gt;

&lt;p&gt;It will come from helping them understand increasingly complex systems.&lt;/p&gt;

&lt;p&gt;DevOps was originally about bringing development and operations closer together. The next phase may be about bringing human judgment and machine insight into better balance.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://medium.com/@yogesh.vk/ai-for-devops-and-platform-engineering-practical-use-cases-that-actually-work-efadc4a90f70?postPublishedType=initial" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;medium.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>terraform</category>
      <category>ai</category>
      <category>cicd</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
