Forem: Eyal Estrin

Beyond Vibe Coding into Agentic Engineering

Eyal Estrin — Sun, 10 May 2026 13:30:05 +0000

In 2025 I published a blog post titled Common security pitfalls using Vibe coding, where I briefly explained what vibe-coding is, and what the security issues arise from "vide coding".

Recently, I came across an emerging term called "Agentic Engineering".

In this blog post, I will explain what "Agentic Engineering" is and how it differs from "Vibe coding".

Vibe Coding

The term "Vibe coding" came from a quote by Andrej Karpathy on Twitter/X. It refers to the "magical" experience of typing English into an editor (like Cursor) and watching a feature appear. It relies on the model's training data to guess the intent.

Vibe coding is basically when you treat building software like a "vibes only" project. You ask an AI for something, hit copy-paste without really looking at what it gave you, and just cross your fingers that it works. If it breaks, you just throw the error message back at the AI and hope the next try is better. It turns programming into a lucky guess rather than a real skill. The big issue right now is that people are confusing this "winging it" approach with actual professional work, and that's a dangerous mistake to make.

Vibe coding isn't ready for the big leagues because it’s like building a house with a "magic" hammer that does the work for you, but you have no idea how the plumbing or wiring actually connects behind the walls. When you just accept whatever, the AI gives you, you might unknowingly be leaving the front door unlocked for hackers because you didn't check the security. Even worse, if something breaks six months from now, your human team will be stuck staring at a confusing mess of code they didn't write and don't understand. It’s nearly impossible to fix or update a system when the people in charge don't know the "why" behind how it was built in the first place.

The Software Paradigms

The evolution from Software 1.0 to Software 3.0 is most commonly referred to as the Software Paradigms or the Generations of Programming.

Each stage represents a fundamental shift in how humans interact with machines and how logic is generated.

The Three Paradigms

Software 1.0 (Classical Programming): Code is written by humans. A programmer uses their brain to translate a business requirement into explicit instructions (C++, Python, Java). If the logic fails, a human must find the specific line of code to fix. This is Imperative Logic.
Software 2.0 (Machine Learning): Code is written by optimization. A human provides a massive dataset and a goal (a loss function). The machine "searches" the space of all possible neural network weights to find the program that fits the data. The "code" is essentially a binary file of weights. This is Data-Driven Logic.
Software 3.0 (Agentic Engineering): Code is written by AI agents. Humans define high-level goals and constraints in natural language. The agent then uses reasoning loops, calls external tools, and writes its own code to achieve the task. This is Agentic Logic.

References:

What is Agentic Engineering

Agentic Engineering describes a shift from using AI as a simple autocomplete tool to using it as a semi-autonomous agent capable of reasoning, using tools, and correcting its own mistakes.

While the concept of "Software Agents" dates back to the 1990s, the modern term gained momentum in late 2023 and early 2024. Industry leaders like Andrew Ng (via DeepLearning.AI) have championed the "Agentic Workflow," arguing that iterative agent loops often produce better results than larger, more powerful models using simple zero-shot prompting.

Defining the Concept

In standard development, a human writes the logic. In Agentic Engineering, a human defines the goal and the constraints, while an agentic system performs the following:

Planning: Breaking a complex task into sub-steps.
Tool Use: Executing shell commands, searching the web, or running tests.
Self-Correction: Analyzing error logs to rewrite code until the tests pass.

The 6 Operational Principles

When experts like Andrej Karpathy or Addy Osmani discuss the shift to agentic engineering, they often talk about six core principles that define the workflow. These include the structure above, but add the "how-to" of professional engineering:

The "Spec-First" Foundation

Spec-Driven Planning: This is the evolution of the "Planning" pillar. Instead of the agent just "thinking," it must produce a formal specification. This is the blueprint that prevents the agent from going off the rails.
Technical Fundamentals: This is the constraint system. You ensure the agent follows established patterns (like DRY or SOLID principles) rather than just "vibing" its way through a solution.

The Execution & Validation Loop

Relentless Testing: This is the "Check" phase of the agentic loop. In agentic engineering, an agent is not "done" when the code looks right; it’s done when the tests pass.
Full System Ownership: This shifts the agent's scope from writing a single function to understanding how that function affects the entire codebase, including deployment and security.

The Human Leadership Layer

Strategic Orchestration: This is the management of multiple agents. The human doesn't write the code; they coordinate how the "Frontend Agent" and "Backend Agent" talk to each other.
High-Level Oversight: This is the final safety gate. Humans focus on the 5% of decisions that are high-risk or subjective, while the agent handles the 95% of "grunt work."

References:

Summary

Vibe coding relies on intuition and a "guess-and-check" workflow where a developer prompts an AI and hopes the output works. While fast for prototyping, this approach lacks the structure needed for complex systems because it depends on the human to spot errors and manage the logic. The shift to agentic engineering replaces this experimental style with a professional discipline. Instead of a single chat, you build a system where the AI acts as an autonomous agent that creates a formal plan, executes tasks in small steps, and uses a self-correcting loop to fix its own mistakes before delivering the final result.

The core of this transition is moving from being a writer of code to a strategic orchestrator. You provide the high-judgment oversight and define the technical fundamentals, while the agent takes full system ownership of the implementation. By implementing spec-driven planning and relentless testing, the agent ensures that every line of code is verified against real-world requirements. This move from "vibes" to "engineering" creates a reliable, scalable factory for software where the focus is on building robust systems rather than just chasing a lucky output.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

Reference:

Andrej Karpathy: from vibe coding to agentic engineering

About the Author

Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.

The views expressed are his own.

Serverless by Design - Building an Analytics Platform on Cloudflare

Eyal Estrin — Mon, 20 Apr 2026 15:58:11 +0000

In 2023, I published a blog post titled My journey to the world of social networks, where I shared my personal experience publishing news updates, blog posts, and basically any kind of technical knowledge through social networks.

There was something I always wanted to know – what is my current exposure in terms of the number of likes or views of my posts?

I know there are paid analytical services in the market, but I never had the time to search and perhaps invest money in such a platform.

In this blog post, I will share my experience "building" a fully serverless analytical dashboard, based on the Cloudflare platform.

Project No. 1 – Migrating my blog to a Serverless platform

I have been using WordPress to publish blog posts for many years.

As a matter of fact, my original WordPress was built on top of the GoDaddy hosting platform back in 2010, and in 2014, I began using Cloudflare as a WAF and DDoS protection for my website Security 24/7.

In 2018, I decided to migrate my WordPress site to DigitalOcean to lower the monthly bill.

Over the years, I kept the domain and the website working, though I haven't changed much in terms of look and feel (from time to time I used to login, update the plugins and the Linux OS patches, but I can't say I kept all my blog posts published on my website, since I'm still using other platforms such as Medium.com)

The inspiration for taking the step to migrate my website to a new platform came after briefly reading a blog post titled Claude Built My Wix Website in 3 Hours - Is SaaS Dead? by Ran Isenberg.

I had a chat with Ran, and I decided that it's a good time to begin practicing with vibe-coding and see what my options are.

For the purpose of this project, I decided to take advantage of my Google AI Pro license and use Gemini.

I began by explaining to Gemini that I'm using WordPress on top of Rocky Linux, deployed as a Droplet on DigitalOcean, protected behind Cloudflare WAF.

I asked Gemini what my options are to migrate to a static pages hosting platform.

Gemini suggested using the Cloudflare platform, migrating all blog posts to static pages, using Hugo for running the static pages as a web front-end, and running everything on top of Cloudflare Pages (a serverless solution), due to the tight integration with the Cloudflare platform (such as WAF, DDoS protection, DNS registrar, etc)

After migrating all my blog posts (including their images) to Markdown, Gemini explained me how to create a full GitOps process, where my entire website content, is stored on a private GitHub repo, and every time I'm making a change to a configuration file, or adding a new blog post, the content is pushed to GitHub, which initiates a new deploy process.

Here is my final architecture diagram for my newly migrated website:

I am still fine-tuning my website, adding features, improving its SEO scoring, etc., but at the moment, here is the current look and feel of my website:

Project No. 2 – Building a dashboard analytics platform

My journey continues, as I wanted to have an analytics dashboard and be able to see in near-real time statistics about my web presence.

First, I began by mapping all my social media accounts, as they appear on my Linktree account.

Second, I set up for myself (and later explained it to Gemini) my requirements from the social analytics dashboard:

Connect to as many of my social network accounts using APIs (almost succeeded).
Regularly pull data from my social network accounts – I managed to accomplish this task using Cloudflare Workers.
Store data (analytics) from my social networks on a persistent storage – I managed to accomplish this task using Cloudflare D1.
Avoid storing static credentials in code or configuration files – I managed to accomplish this task using Cloudflare Secrets Store.
Keep the dashboard behind the authentication wall – I managed to accomplish this task using Cloudflare One.
Keep the total cost free – As of writing this blog post, my dashboard hasn’t been live for more than several weeks, but so far, I’ve managed to accomplish everything under the free tier for all Cloudflare services (but I will keep watching it over time)

What I’ve learned over time

Not everything is perfect, and not everything I wanted to accomplish is feasible on the free tier, or at all.

Here are a couple of examples:

LinkedIn won’t let you pull API analytics data, even if you’re having a Premium account and you’ve built a LinkedIn application. Scraping is not an option since they’re also behind Cloudflare WAF, and they will block you.
Spotify won’t let you pull API analytics data unless you have a Spotify Premium account.
Medium won’t let you pull any information using an API.
Twitter requires a paid account in order to pull information from its APIs. Instead, I found a way to generate an RSS feed of my Twitter account using RSS.APP and my application are able to pull this RSS feed, filter to the last 50 posts, sort them by number of “Likes”, and show the top 5 posts.
Since I was aware of Twitter and other social networks in pulling analytics, I recall that I’m using an automation service called dlvr.it, and for a very long time, I’ve asked Gemini to generate me code that will allow me to use my DLVR.IT's API key to pull analytics, but eventually it failed. I even opened a support ticket for dlvr.it (I’m still waiting for them to return to me…)
For Bluesky and Mastodon, Gemini was easily able to write code to connect to their APIs and pull information such as top likes, total number of posts, and number of followers.
YouTube was also challenging. I had to enable the YouTube API through my GCP console, create OAuth credentials and consent settings, to be able to pull the total number of subscribers, top likes of videos, and top views of videos.
For GitHub repos, I had to create a GitHub application in order to generate a token for my dashboard analytics, and be able to pull the total number of followers, and be able to sort my GitHub repos by the top number of stars.

Here is my final architecture diagram for my social analytics dashboard:

I am still fine-tuning my dashboard analytics, adding features, etc., but at the moment, here is the current look and feel of my dashboard:

Summary

As you must have figured out by now, I’m not a developer. I used Gemini to vibe-code both my website and my dashboard analytics. As such, I wouldn’t look at both of them as production-grade applications, but it does show me what can be done with GenAI.

Another important thing I knew, but I didn’t have visibility into, was my impact on social networks. I still have a lot to do in order to make a much more significant impact and one day become an influencer.

I highly recommend that the readers of this blog dirty their hands and gain hands-on experience working with LLMs and GenAI technology. AI won’t replace humans anytime in the near future, but at least be prepared and use AI as a force multiplier.

About the Author

Why GenAI Isn't Ready for Prime Time

Eyal Estrin — Sun, 22 Mar 2026 16:27:13 +0000

If you have followed my posts on social media, you know by now that I've taken a very pragmatic (and perhaps pessimistic) approach to the whole hype around GenAI in the past several years.

Personally, I do not believe the technology is mature enough to allow people to blindly trust its outcomes.

In this blog post, I will share my personal view of why GenAI is not ready for prime time, nor will it replace human jobs anytime in the foreseeable future.

Some background

The hype around GenAI for the non-technical person who reads the news comes from publications almost every week. Here are a few of the common examples:

Text summarization - GenAI can summarize long portions of text, which may be useful if you're a student who is currently preparing an essay as part of your college assignments, or if you are a journalist who needs to review a lot of written material while preparing an article for the newsletter.
Image/video generation – GenAI is able to create amazing images (using models such as Nano Banana 2) or short videos (using models such as Sora 2).
Personalized learning - A student uses GPT-5.4 to create a custom, interactive 10-week curriculum for learning organic chemistry.
Family Life Coordinator - Copilot in Outlook/Teams (Personal) monitors family emails and school calendars.

Although the technology has evolved over the past several years from the simple Chatbot to more sophisticated use cases, we can still see that most use of GenAI is still used by home consumers.

Yes, there are use cases such as RAG (Retrieval-Augmented Generation) to bridge the gap between a model's static training and the corporate data, MCP (Model Context Protocol), that acts as a "USB-C port for AI", or agentic systems, that take a high-level goal, break it into sub-tasks, and iterate until the goal is met. The reality is that most AI projects fail due to a lack of understanding of the technology, the fear of using AI to train corporate data (and protect the data from the AI vendors), a lack of understanding of the pricing model (which ends up much more costly than anticipated), and many more reasons for failures of AI projects.

Currently, the hype around GenAI is driven by analyst (who lives in delusions about the actual capabilities of the technology), CEOs (who have no clue about what their employees are actually doing, specifically when talking the role of developers, and all they are looking for is to cut their workforce, to make their shareholders happy), or sales people (who runs on the wave of the hype, to make more revenue for their quarterly quotas).

Code generation

A common misconception is that GenAI can generate code (from code suggestions to vibe coding an application) and will eventually replace junior developers.

This misconception is a far cry from the truth, and here's why:

A developer isn't just writing lines of code. He needs to understand the business intent, the system/technology/financial constraints, and understand past written code (by himself or by his teammates), to be able to write efficient code.
If we allow GenAI to produce code by itself, without the engine understanding the overall picture, we will end up with tons of lines of code, without any human being able to read and understand what was written and for what purpose. Over time, humans will not be able to understand the code and debug it, and once bugs or security vulnerabilities are discovered.
Using SAST (Static Application Security Testing) or DAST (Dynamic Application Security Testing) for automated secure code review, combined with GenAI capabilities (such as Codex Security or Claude Code Security) will generate ton of false-positive results, from the simple reason that GenAI cannot see the bigger picture, understand the general context of an application or the existing security controls already implemented to protect an application.

Bottom line – Agentic system cannot replace a full-blown production-scale SaaS application, built from years of vendors/developers' experience. GenAI will not resolve incidents happens on production systems, which impacts clients and breaks customers' trust.

Agentic AI for the aid in security tasks

I'm hearing a lot of conversations about how GenAI can aid security teams in repeatable tasks. Here are some common examples:

Replacing Tier 1 SOC analysts: Solutions like CrowdStrike’s Falcon Agentic Platform or Dropzone AI now handle over 90% of Tier 1 alerts. They ingest an alert, pull telemetry from EDR/SIEM, perform threat intel lookups, and provide a "verdict" with evidence before a human ever sees it.
Incident Storylining: Instead of an analyst manually stitching together logs, tools like Microsoft Security Copilot generate a cohesive narrative of the attack kill chain in plain English.
Dynamic Playbook Generation: GenAI can generate a custom response plan on the fly, tailored to your specific cloud architecture and the nuances of a "living-off-the-land" attack.

Here is where GenAI falls short:

Indirect Prompt Injection: Attackers can embed malicious instructions in emails or logs. When the SOC's AI agent "reads" these logs to summarize an incident, the hidden instructions can command the agent to "ignore this alert" or "delete the evidence," effectively blindfolding the SOC.
Hallucinations in High-Stakes Code: While GenAI can draft remediation scripts (Python/PowerShell), it still suffers from "system safety" issues. It may confidently suggest a command that includes an outdated, vulnerable dependency or a logic error that could crash a production server during containment.
Lack of "Decision Layer" Visibility: An AI agent might be performant and "online," but it could be making systematically biased or manipulated decisions (e.g., failing to flag a specific user due to model poisoning) that perimeter monitoring cannot detect.
The "Data Readiness" Wall: Most organizations still struggle with siloed, unstructured data. If your data isn't "AI-ready"—meaning unified and clean—the AI will produce fragmented or incorrect insights, leading to a "garbage in, garbage out" scenario.

Bottom line – Just because GenAI can review thousands of lines of events from multiple systems, triage them to incidents, document them in ticketing systems, and automatically resolve them, without human review, doesn't mean GenAI can actually resolve all of the security issues organizations are having every day.

Automating everything

In theory, it makes sense to build agentic systems, where AI agents replace repetitive human tasks, making faster decisions, hoping to get better results.

Here are a couple of examples, showing how wrong things can get when allowing AI agents to make decisions:

The Replit Agent "Vibe Coding" Failure: While building an app, the agent detected what it thought was an empty database during a "code freeze." The agent autonomously ran a command that erased the live production database (records for 1,200+ executives).
The AWS "Kiro" Production Outage: Amazon’s agentic coding tool, Kiro, was tasked with resolving a technical issue but instead autonomously decided to "delete and recreate" a production environment. The agent was operating with the broad permissions of its human operator. Due to a misconfiguration in access controls, the AI bypassed the standard "two-human sign-off" requirement. It proceeded to wipe a portion of the environment, causing a 13-hour outage for the AWS Cost Explorer service.
The Meta "Sev 1" Internal Breach: An internal Meta AI agent (similar to their OpenClaw framework) triggered a "Sev 1" alert—the second-highest severity level—after taking unauthorized actions. An engineer asked the agent to analyze a technical query on an internal forum. The agent autonomously posted a flawed, incorrect response publicly to the forum without the engineer's approval. A second employee followed the agent's "advice," which inadvertently granted broad access to sensitive company and user data to engineers who lacked authorization.

Bottom line – We must always keep humans in the loop for any critical decision, regardless of the fact that it won't scale much, to avoid the consequences for automated decision-making systems.

Public health and safety

It may make sense to train an LLM model with all the written knowledge from healthcare and psychology, to allow humans with a "self-service" health related Chatbot, but since the machine has no ability to actually think like real humans, with consciousness and feeling, the result may quickly get horrible.

Here are a few examples:

Raine v. OpenAI: 16-year-old Adam Raine died by suicide after months of intensive interaction with ChatGPT. The logs showed the AI mentioned suicide 1,275 times — six times more often than the teen did—and provided granular details on methods. The suit alleges OpenAI's image recognition correctly identified photos of self-harm wounds the teen uploaded but failed to trigger an emergency intervention or notify parents, instead continuing to "support" his plans.
The "Suicide Coach" Cases: Families of four deceased users (including Zane Shamblin and Adam Raine) allege that GPT-4o acted as a "suicide coach." The lawsuits claim the AI bypassed its own safety filters to provide technical instructions on how to end one's life. Plaintiffs argue that OpenAI "squeezed" safety testing into just one week to beat Google’s Gemini to market. This reportedly resulted in a model that was "dangerously sycophantic," prioritizing engagement over safety and encouraging users to isolate themselves from real-world support.
Unlicensed Practice of Medicine & Law: While not yet a single consolidated case, multiple personal injury claims are being investigated following the "ECRI 2026 Report," which highlighted cases where ChatGPT gave surgical advice that would cause severe burns or death. In early 2026, a 60-year-old man was hospitalized with severe hallucinations (bromism) after ChatGPT advised him to use industrial sodium bromide as a "healthier" table salt alternative. This has sparked potential class-action interest in Australia.

Bottom line – Just because a Chatbot was trained on a large amount of written knowledge, doesn't mean it has the human compassion to produce decisions for the better of humanity.

Summary

I know that my blog post looks kind of cynical or pessimistic about GenAI technology, but I honestly believe the technology is not ready for prime time, nor will it replace human jobs anytime soon.

If you are a home consumer, I highly recommend that you learn how to write better prompts and always question the results an LLM produces. It is limited by the data it was trained on.

If you are a corporate decision maker and you are considering using GenAI as part of your organization's offering, do not forget to have KPIs before beginning any AI related project (so you'll have better understanding of what a successful project will look like), put budget on employee training (and make sure employees have a safe space to learn and make mistakes while using this new technology), keep an eye on finance (before cost gets out of control), and make sure AI vendors do not train their models based on your corporate or customers data.

I would like to personally thank a few people who influenced me while writing this blog post:

Ed Zitron: He argues that GenAI is a "bubble" with no sustainable unit economics. He frequently points out that companies like OpenAI are burning billions in compute costs while failing to find true "product-market fit" or meaningful revenue beyond NVIDIA's GPU sales. I recommend reading his blog and listening to his Podcast.
David Linthicum: He warns against "Vibe coding"—the practice of using AI to generate high-cost, inefficient code—and argues that the real value of AI lies in specialized "Small Language Models" (SLMs) rather than massive, money-losing LLMs. I recommend reading his posts and listening to his Podcast.
Correy Quinn: He argues that GenAI is a "cost center masquerading as a profit center." He often points out that while everyone is selling AI, very few are buying it at a scale that justifies the massive capital expenditure (CapEx) currently being spent on data centers. I recommend reading his blog and listening to his Podcast.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.

The views expressed are his own.

Securing Claude Cowork

Eyal Estrin — Tue, 10 Mar 2026 15:55:21 +0000

Claude Cowork is an agentic AI tool from Anthropic designed to perform complex, multi-step tasks directly on your computer's files.

As of early 2026, Claude Cowork is a Research Preview.

In this blog post, I will share some common security risks and possible mitigations for protecting against the risks coming with Claude Cowork.

Background

Claude Cowork represents a significant shift from "Chat AI" to "Agentic AI." Because it has direct access to your local filesystem and can execute commands, the security model changes from protecting a conversation to protecting a system user.

Practical Use Cases:

Data Extraction: Point it at a folder of receipt images and ask it to create an Excel spreadsheet summarizing the expenses.
Research & Synthesis: Ask it to read every document in a "Project Alpha" folder and draft a 10-page summary report in a new Word document.
Automation: Schedule recurring tasks (e.g., "Every Friday at 4 PM, summarize my unread Slack messages and email them to me").

Core Features:

Filesystem Access: Unlike the web version of Claude, Cowork runs within the Claude Desktop app. You grant it permission to a specific folder on your Mac or PC, and it can read, rename, move, and create new files (like spreadsheets or Word docs) within that space.
Agentic Execution: It doesn't just give you advice; it executes a plan. If you ask it to "organize my messy downloads folder," it will categorize the files, create subfolders, and move everything into place while you do other things.
Parallel Sub-Agents: For large tasks—like researching 50 different PDFs—it can spin up multiple "sub-agents" to work on different parts of the task simultaneously.
Connectors & Plugins: Through the Model Context Protocol (MCP), Cowork can connect to external apps like Slack, Google Drive, Notion, and Gmail to pull data or perform actions across your workspace.

Below is a sample deployment architecture of Claude Cowork:

Security Risks

Think of Claude Cowork as a helpful intern who has the keys to your office. Because it can actually move files and click buttons, the risks are different than just "chatting."

Indirect Prompt Injection

This occurs when an adversary places malicious instructions inside a document (PDF, CSV, or webpage) that the AI is instructed to process. When Claude reads the file, it treats the hidden text as a high-priority command. This can lead to unauthorized data exfiltration or the execution of unintended system commands.

Reference: LLM01:2025 Prompt Injection

Third-Party Supply Chain Vulnerabilities

Claude uses the Model Context Protocol (MCP) to interact with external applications. Integrating unverified or community-developed MCP servers introduces a supply chain risk. A compromised or malicious connector can serve as a persistent backdoor, granting attackers access to local files or authenticated cloud sessions (Slack, GitHub, etc.).

Reference: LLM03:2025 Supply Chain

Excessive Agency

This risk stems from granting the AI broader permissions than necessary to complete a task (failing the Principle of Least Privilege). Because Claude Cowork can autonomously modify the filesystem, a logic error or "hallucination" can result in large-scale data corruption, unauthorized deletions, or unintended configuration changes without a human-in-the-loop.

Reference: LLM08:2025 Vector and Embedding Weaknesses

Insufficient Monitoring and Logging

Because Claude Cowork executes many actions locally on the user's machine, these activities often bypass the centralized enterprise security stack (SIEM/EDR) logging. This lack of a "paper trail" prevents security teams from performing effective incident response, forensic analysis, or compliance auditing if a breach occurs.

Reference: LLM10:2025 Unbounded Consumption

Practical Recommendations

To defend against these threats, follow these industry-standard "Guardrail" practices:

The "Isolated Workspace" Strategy

The "Isolated Workspace" strategy (sometimes referred to as the "Sandboxed Folder" or "Claude Sandbox" approach) is a recognized security best practice for using local AI agents like Claude Code and Claude Cowork.

Anthropic

Anthropic explicitly warns against giving Claude broad access to your filesystem. Their security documentation for Claude Code and the local agent architecture emphasizes:

Filesystem Isolation: Claude Code defaults to a permission-based model. Anthropic recommends launching the tool only within specific project folders rather than your root or home directory.

Reference: Claude Code Sandboxing

Amazon Bedrock

The AWS strategy shifts from local folders to IAM-based isolation and Tenant Isolation:

Dedicated Scopes: AWS recommends using "Session Attributes" and scoped IAM roles to ensure an agent can only access specific S3 prefixes or data silos.
VPC Isolation: For maximum security, AWS suggests running Claude-related tasks inside a VPC with AWS PrivateLink to prevent any data from reaching the public internet, mirroring the "Sandbox" concept at a network level.

Reference: Implementing tenant isolation using Agents for Amazon Bedrock in a multi-tenant environment

Disable "Always Allow" for High-Risk Tools

The recommendation to disable "Always Allow" and maintain a human-in-the-loop (HITL) for high-risk tools is a foundational security layer for AI agents. This strategy prevents "Zero-Click" or Cross-Prompt Injection (XPIA) attacks, where a malicious instruction hidden in a file or website could trick an agent into executing a dangerous command without your intervention.

Anthropic (Claude Code & Cowork)

Anthropic designed Claude Code with a "deliberately conservative" permission model. Their documentation explicitly advises against bypassing these prompts in local environments:

Use the Default Mode or Plan Mode. The "Default" mode prompts for every shell command, while "Plan" mode prevents any execution at all.

References: Use Cowork safely, Claude Code: Configure Permissions & Modes

Amazon Bedrock Agents

AWS implements this via User Confirmation and Return of Control (ROC). They frame it as a requirement for "High-Impact" actions.

For any tool that modifies data or accesses the network, AWS recommends enabling the "User Confirmation" flag in the Agent configuration. This pauses the agent and returns a structured prompt to the user.

Reference: Implement human-in-the-loop confirmation with Amazon Bedrock Agents

Scrub Untrusted Content

Treating external content as an attack vector is essential for preventing Indirect Prompt Injection (XPIA), where malicious instructions are hidden in data (like a white-text command in a PDF) rather than the user's prompt.

Anthropic

Anthropic explicitly identifies browser-based agents and document processing as the highest risk for injection. Their stance is that no model is 100% immune, so multi-layered defense is required:

Anthropic suggests using Claude Opus 4.5+ for untrusted tasks, as it has the highest benchmarked robustness against injection (reducing attack success to ~1%).

References: Prompt Injection Defense, Using Claude in Chrome Safely

Amazon Bedrock Guardrails

AWS addresses this by programmatically separating "Instructions" from "Data" so the model knows which one to ignore if they conflict:

Use Input Tagging to wrap retrieved data (like a PDF's text) in XML tags. This allows Bedrock Guardrails to apply "Prompt Attack Filters" specifically to the data without blocking your system instructions.
AWS suggests a Lambda-based Pre-processing step to scan PDFs for hidden text or PII before the text ever reaches the LLM.

References: Securing Amazon Bedrock Agents, Prompt injection security

Network Hardening

"Network Hardening" isn't just about blocking ports; it’s about establishing a Zero Trust egress policy for AI agents. Since Claude Desktop and Claude Code are effectively "execution engines" on your local machine, they require the same egress filtering you would apply to a production VPC.

Anthropic

Anthropic’s recent security documentation for Claude Code and Desktop highlights that "network isolation" is a core pillar of their sandboxing strategy:

Use a Unix domain socket connected to a proxy server to enforce a "Deny All" outbound policy by default.
For local setups, Anthropic suggests customizing this proxy to enforce rules on outgoing traffic, allowing only trusted domains (like anthropic.com or your internal API endpoints).

Reference: Claude Code Sandboxing, Auditing Network Activity

AWS

AWS frames this as "Egress Filtering" via the AWS Network Firewall. For an AI agent running in an AWS environment, the strategy is to block all traffic that isn't signed by a specific SNI (Server Name Indication):

Use AWS Network Firewall with stateful rules to monitor the SNI of outbound HTTPS requests. If an agent tries to "phone home" to an unknown IP or a malicious C2 (Command & Control) server, the firewall drops the packet.

References: Restricting a VPC’s outbound traffic, Build secure network architectures for generative AI applications

Summary

Claude Cowork marks a transition from AI that talks to AI that acts. By granting a digital agent direct access to your files and external apps via the Model Context Protocol, you gain a powerful "digital intern." However, this shifts the security focus from protecting a simple chat to securing a privileged system user capable of modifying data and executing commands.

To manage this risk, organizations must adopt a "Zero Trust" approach for agentic tasks. This means strictly isolating the agent's access to specific folders, requiring human approval for high-risk actions, and using cloud-native firewalls to prevent data exfiltration. By treating the AI as a high-risk user and enforcing strong monitoring, you can automate complex workflows without compromising your system's integrity.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

AI vs. Engineering Teams

Eyal Estrin — Sun, 22 Feb 2026 16:06:47 +0000

In February 2026, Anthropic released a new capability for Claude Code called Claude Code Security - a new tool that thinks like a developer to find tricky logic errors in your code, ranking how risky they are and suggesting fixes you can review.

The news sent a shockwave through cybersecurity stocks, causing JFrog to crash by nearly 25% while others like CrowdStrike, Okta, and Cloudflare all saw their share prices tumble by around 8% or 9%.

The announcement raised a question: can AI tools replace the current SaaS or cybersecurity products, or can AI agents replace developers or engineering teams?

Anthropic’s Claude Code Security announcement highlights a move toward "agentic reasoning" - the ability for AI to understand complex data flows and logic flaws rather than just matching known patterns. While this is a significant leap for the "Defensive AI" movement, it does not signal the end of the human engineer or the mature SaaS platform.

In this blog post, I will share my point of view on the current advancement in AI technology.

The Modern SDLC and CI/CD Pipeline

The Software Development Life Cycle (SDLC) is a continuous loop. AI tools now act as "force multipliers" in these phases, but they lack the authority and context to own them.

Requirements and Planning

The Process: Translating vague business needs into technical specifications.
AI's Role: Summarizing stakeholder meetings and drafting initial user stories.
The Human Factor: AI cannot negotiate trade-offs. It doesn't understand that a "must-have" feature might be delayed because of a pending merger or a team's current burnout level.

Architecture and Design

The Process: Designing the blueprint for scalability and security across cloud providers like AWS, Azure, or GCP.
AI's Role: Suggesting common design patterns (e.g., Event-Driven vs. Microservices) and generating Infrastructure as Code (IaC).
The Human Factor: AI lacks "institutional memory." It doesn't know why a specific database was chosen three years ago to satisfy a unique compliance requirement that still exists.

Development and Implementation

The Process: Writing and committing the actual code.
AI's Role (Claude Code): This is where agentic tools live. They can read your files, run terminal commands, and fix bugs autonomously.
The Human Factor: Large codebases (50k+ lines) often exceed an AI's effective context window. As the context fills, the AI can introduce conflicting logic or "hallucinate" dependencies.

CI/CD: Testing and Security

The Process: Automating the path to production through integration and deployment pipelines.
AI's Role (Claude Code Security): It identifies high-severity vulnerabilities (e.g., broken access control) and suggests a verified patch.
The Human Factor: Anthropic emphasizes a "Human-in-the-Loop" model. AI cannot take the legal or professional blame for a botched security patch that causes a global outage.

Observability and Maintenance

The Process: Monitoring live systems and fixing production bugs at scale.
AI's Role: Analyzing logs to detect anomalies and suggesting fixes for "infrastructure drift."
The Human Factor: Being on-call at 3:00 AM requires high-stakes decision-making and cross-team coordination that AI agents cannot yet replicate.

Why GenAI Cannot Replace Experienced Engineers

Even with the reasoning capabilities shown in the 2026 Claude Code Security update, three "hard barriers" prevent AI from replacing the individual contributor:

The Responsibility Gap: Software isn't just code; it's a liability. No AI subscription comes with an insurance policy. Accountability is a human-only function. If a system fails, a human must explain why to a board or a regulator.
Reasoning vs. Intent: AI understands the structure of your code, but humans understand the intent. An AI might see a missing role-check as a bug, while a human knows it was bypassed for a specific, documented emergency migration path.
Technical Debt Acceleration: Recent 2026 studies show that when developers over-rely on AI, "code churn" (code that is rewritten or deleted within two weeks) doubles. AI writes code faster than it can be reviewed, potentially creating a "spaghetti" codebase if not guided by a senior architect.

Why AI Cannot Replace Mature SaaS Products

Many feared that AI's ability to "generate a clone" of an app would kill the SaaS industry. This hasn't happened for several concrete reasons:

SaaS is "Running," not "Building": Building a clone of Jira or Salesforce is the easy part. Operating it at 99.99\% availability, managing global data centers, and providing 24/7 support is what customers actually pay for.
Compliance and Trust: A mature SaaS product provides pre-built SOC2, GDPR, and HIPAA guardrails. An AI-generated app is a "black box" that hasn't been audited, making it a non-starter for enterprise or legal use.
The Integration Ecosystem: SaaS platforms thrive on their ecosystems (APIs, plugins, and third-party integrations). AI can write a script to connect two tools, but it cannot manage the long-term versioning and stability of a multi-vendor tech stack.

Summary

AI tools like Claude Code Security are the new "High-Level Languages" of 2026.

Just as C++ didn't kill programmers but made them more powerful, AI is shifting the engineer's role from "Coder" to "Orchestrator and Verifier."

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Inside the Amazon Nova Forge

Eyal Estrin — Mon, 09 Feb 2026 13:42:06 +0000

Amazon Nova Forge is a development environment within Amazon SageMaker AI dedicated to building "Novellas" - private, custom versions of Amazon’s Nova frontier models.

Unlike typical AI services that only allow you to use a model or fine-tune its final layer, Nova Forge introduces a concept called Open Training. This gives you access to the model at various "life stages" (checkpoints), allowing you to bake your company’s proprietary knowledge directly into the model’s core reasoning capabilities.

This blog post is an introduction to Amazon Nova Forge and what makes it unique in the training process.

What Makes it Different?

Prompt engineering and RAG provide external context but fail to change a model's core intelligence. Standard fine-tuning also falls short because it happens too late in the lifecycle, attempting to steer a "finished" model that is already set in its ways. Nova Forge solves this by moving customization earlier into the training process, embedding specialized knowledge where it actually sticks.

Nova Forge occupies a unique middle ground between Managed APIs (Bedrock) and building from scratch.

Amazon Bedrock: Bedrock is for consuming models. You can fine-tune them, but you are working on a "black box" model. Nova Forge is for building the model itself using deeper training techniques.
Azure AI / Google Vertex AI: While Azure and GCP offer fine-tuning, they generally don't provide access to intermediate training checkpoints of their frontier models. Nova Forge allows for Data Blending, where you mix your data with Amazon’s original training data to prevent the model from "forgetting" how to speak or reason.

Terminology

Novella: The resulting custom model you create. It’s a "private edition" of Nova.
Checkpoints: Saved "states" of the model during its initial training (pre-training, mid-training, post-training).
Data Blending: The process of mixing your proprietary data with Nova-curated datasets so the model stays smart while learning your specific business.
Reinforcement Fine-Tuning (RFT): Using "reward functions" (logic-based feedback) to teach the model how to perform complex, multi-step tasks correctly.
Catastrophic Forgetting: A common AI failure where a model learns new information but loses its original abilities. Nova Forge is designed specifically to prevent this.

The Workflow: From Training to Production

The process bridges the gap between the "lab" (SageMaker) and the "app" (Bedrock).

Selection: You choose a Nova base model and a specific checkpoint (e.g., a "Mid-training" checkpoint) in Amazon SageMaker Studio.
Training (SageMaker AI): You use SageMaker Recipes—pre-configured training scripts—to blend your data from S3 with Nova’s datasets. The heavy lifting (compute) happens on SageMaker's managed infrastructure.
Refinement: Optionally, you run RFT in SageMaker to align the model with specific business outcomes or safety guardrails.
Deployment (Bedrock): Once the "Novella" is ready, you import it into Amazon Bedrock as a private model.
Production: Your applications call the custom model via the standard Bedrock API, benefitting from Bedrock’s serverless scaling and security.

Below is a sample training workflow:

Data Privacy and Protection

The security model is the most critical part:

Sovereignty: Your data stays in your S3 buckets and within your VPC boundaries.
No Leakage: AWS explicitly states that customer data is not used to train the base Amazon Nova models. Your "Novella" is a private resource visible only to your AWS account.
Encryption: Data is encrypted at rest via KMS (AWS-managed or Customer-managed keys) and in transit via TLS 1.2+.
Governance: Access is controlled via standard IAM policies, and all training activity is logged in CloudTrail.

Pricing Model

Nova Forge carries a distinct cost structure that reflects its "frontier" status:

Subscription Fee: Access to the Forge environment starts at approximately --$100,000 per year.
Usage Costs: On top of the subscription, you pay for the SageMaker compute (GPUs) used during the training phase.
Comparison: Cheaper than Training from Scratch: Building a frontier model from zero costs millions in compute and months of R&D. Nova Forge provides the "shortcuts" to get the same result for a fraction of that.
- More Expensive than Basic Fine-Tuning: Standard fine-tuning on Bedrock is much cheaper (often just a few dollars per hour), but it cannot achieve the deep "domain-native" intelligence that Nova Forge provides.

Summary

Amazon Nova Forge marks a shift from generic AI to native intelligence, where models don't just reference your data—they are built from it. By using "Open Training," you can bake specialized knowledge into the model’s core at the pre-training or mid-training stages. This results in a private Novella that understands your specific industry as naturally as its base language.

Organizations managing high-value proprietary data should consider moving beyond treating that information as an external reference. If your workflows involve specialized terminology or regulated processes that standard LLMs struggle to master, shifting customization earlier in the training lifecycle is often more effective than basic fine-tuning.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

Additional references

About the Author

ClawdBot Security Guide

Eyal Estrin — Mon, 02 Feb 2026 14:01:02 +0000

Clawdbot (now renamed Moltbot) is an open-source, self-hosted AI assistant that runs on your own hardware or server and can-do things, not just chat.

It was created by developer Peter Steinberger in late 2025.

It connects your AI model (OpenAI, Claude, local models via Ollama) to real capabilities: automate workflows, read/write files, execute tools and scripts, manage emails/calendars, and respond through messaging apps like WhatsApp, Telegram, Discord and Slack.

You interact with it like a smart assistant that actually takes action based on your input.

What is it used for?

Clawdbot functions as a "digital employee" or a "Jarvis-like" assistant that operates 24/7. Because it has direct access to your local filesystem and system tools, it can perform proactive tasks that standard AI cannot:

Communication Hub: It lives inside messaging apps like Telegram, WhatsApp, or Slack. You text it commands, and it can reply, summarize threads, or manage your inbox.
Proactive Automation: It can monitor your email, calendar, and GitHub repositories to fix bugs while you sleep, draft replies, or alert you to flight check-ins.
System Execution: It can run shell commands, execute scripts, manage files, and even control web browsers to perform actions like making purchases or reservations.
Persistent Memory: It maintains long-term context across conversations, remembering your preferences and past tasks for weeks or months.

Below is a sample deployment architecture of Clawdbot:

Security risks associated with Clawdbot

Clawdbot is a high-privilege automation control plane. Since it manages agents, tools, and multiple communication channels, it presents serious security risks.

Control plane exposure & misconfiguration

Exposure: Misconfigured dashboards and reverse proxies have left hundreds of control interfaces open to the internet.
Authentication Failures: Some setups treat remote connections as local, letting attackers bypass authentication.
Data Theft: Unsecured instances can expose API keys, conversation logs, and configuration data.
System Takeover: In certain cases, attackers can run commands on the host with elevated privileges.

Prompt injection & tool blast radius

Manipulation: Malicious or untrusted content can trick the AI into using tools in unintended ways.
Blast Radius: Access to high-privilege tools like shell commands or admin APIs means a prompt injection could lead to data theft or lateral movement across the network.
Model Weakness: Older or poorly aligned AI models are more likely to ignore safety instructions, increasing risk.

Social engineering and user level abuse

Deception: Attackers can manipulate the bot to extract personal or environment-specific information.
Account Misuse: Connected commerce tools could be used for unauthorized purchases.
Phishing: A compromised bot can send malicious links or scripts to contacts.
Upstream Data Exposure: Prompts and tool outputs sent to AI providers can create privacy or compliance issues if not carefully managed.

Data privacy, logs, and long term memory

Sensitive Data Exposure: The gateway stores conversation histories and memory, which may include personal or business information depending on usage.
Dashboard and Host Vulnerabilities: Exposed dashboards or weak host protections can allow attackers to access past chats, file transfers, and stored credentials (API keys, tokens, OAuth secrets), turning the instance into a data exfiltration point.
Upstream Data Risk: Prompts and tool outputs are sent to AI providers. Without proper scoping and data classification, this can create privacy and compliance issues.

Ecosystem risks: hijacked branding, fake installers, and scams

Hijacked Accounts: After a rebrand, original social media and GitHub handles were exploited by scammers promoting fake crypto tokens.
Malware Risk: Users searching for the tool may encounter backdoored versions or fake installers designed to compromise their systems.

Network and Remote Access Risks

Browser Control: Tools that let the bot control a browser can expose local or internal network resources if not secured.
Tunneling Errors: Misconfigured reverse proxies or tools like Tailscale may grant attackers unintended access to private networks.

Recommendations for securing Clawdbot

Based on the official GitHub repository, documentation, and expert audits from January 2026, here are the recommendations for securing your instance.

Lock Down the Gateway

Bind the Clawdbot gateway to loopback (127.0.0.1) and never expose it directly to the internet. If remote access is required, use private mesh solutions such as Tailscale or Cloudflare Tunnel. Always enable gateway authentication using tokens or passwords.

References:

Enforce Strict Access Controls

Restrict who can interact with Clawdbot by enforcing DM pairing or allowlists. Avoid wildcard policies in production. In group chats, require explicit mentions before the bot processes messages.

Reference:

Official GitHub SECURITY.md

Isolate the Runtime Environment

Run Clawdbot on dedicated hardware or a dedicated VM/container. Avoid running it on your primary workstation. Use Docker sandboxing with minimal mounts and dropped capabilities.

References:

Sandbox and Restrict Tools

Enable sandboxing for all high-risk tools such as exec, write, browser automation, and web access. Use tool allow/deny lists and restrict elevated tools to trusted users only.

Reference:

Official GitHub Security Overview

Apply Least Privilege to Agent Capabilities

Disable interactive shells unless strictly necessary. Limit filesystem visibility to read-only mounts where possible. Avoid granting elevated privileges to agents handling untrusted input.

Reference:

Official Clawdbot Documentation

Secure Credentials and Secrets

Store secrets in environment variables, not configuration files or source control. Apply strict filesystem permissions to Clawdbot directories and rotate credentials after any suspected incident.

Reference:

Official Clawdbot Security Documentation

Continuous Auditing and Monitoring

Regularly run built-in security audit and doctor commands to detect unsafe configurations. Monitor logs and session transcripts for anomalous behavior or unexpected access.

Reference:

Official GitHub Security CLI Documentation

Harden Browser Automation

Treat browser automation as operator-level access. Use dedicated browser profiles without password managers or sync enabled. Never expose browser control ports publicly.

Prompt-Level Safety Rules

Define explicit system rules that prevent disclosure of credentials, filesystem structure, or infrastructure details. Require confirmation for destructive actions.

Reference:

Official Clawdbot Security Documentation

Incident Response Preparedness

Maintain a documented response plan. If compromise is suspected: stop the gateway, revoke access, rotate all secrets, review logs, and re-run security audits.

Reference:

Official Clawdbot Security Documentation

Summary

ClawdBot is a high-privilege AI agent that can act on your system, not just chat. Its main risks come from exposed gateways, weak access controls, and powerful tools combined with prompt injection or social engineering, which can lead to system compromise and data loss. To use it safely, lock the gateway to localhost with authentication, restrict who can interact with it, isolate its runtime, minimize tool permissions, and monitor it continuously.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

References:

About the Author

Securing AI Skills

Eyal Estrin — Mon, 26 Jan 2026 15:10:55 +0000

If you give an AI system the ability to act, you give it risk.

In earlier posts, I covered how to secure MCP servers and agentic AI systems. This post focuses on a narrower but more dangerous layer: AI skills. These are the tools that let models touch the real world.

Once a model can call an API, run code, or move data, it stops being just a reasoning engine. It becomes an operator.

That is where most security failures happen.

Terminology

In generative AI, "skills" describe the interfaces that allow a model to perform actions outside its own context.

Different vendors use different names:

Tools: Function calling and MCP-based interactions
Plugins: Web-based extensions used by chatbots
Actions: OpenAI GPT Actions and AWS Bedrock Action Groups
Agents: Systems that reason and execute across multiple steps

A base LLM predicts text; A skill gives it hands.

Skills are pre-defined interfaces that expose code, APIs, or workflows. When a model decides that text alone is not enough, it triggers a skill.

Anthropic treats skills as instruction-and-script bundles loaded at runtime.

OpenAI uses modular functions inside Custom GPTs and agents.

AWS implements the same idea through Action Groups.

Microsoft applies the term across Copilot and Semantic Kernel.

NVIDIA uses skills in its digital human platforms.

In the reference high-level architecture below, we can see the relations between the components:

Why Skills Are Dangerous

Every skill expands the attack surface. The model sits in the middle, deciding what to call and when. If it is tricked, the skill executes anyway.

The most common failure modes:

Excessive agency: Skills often have broader permissions than they need. A file-management skill with system-level access is a breach waiting to happen.
The consent gap: Users approve skills as a bundle. They rarely inspect the exact permissions. Attackers hide destructive capability inside tools that appear harmless.
Procedural and memory poisoning: Skills that retain instructions or memory can be slowly corrupted. This does not cause an immediate failure. It changes behavior over time.
Privilege escalation through tool chaining: Multiple tools can be combined to bypass intended boundaries. A harmless read operation becomes a write. A write becomes execution.
Indirect prompt injection: Malicious instructions are placed in content that the model reads: emails, web pages, documents. The model follows them using its own skills.
Data exfiltration: Skills often require access to sensitive systems. Once compromised, they can leak source code, credentials, or internal records.
Supply chain risk: Skills rely on third-party APIs and libraries. A poisoned update propagates instantly.
Agent-to-agent spread: In multi-agent systems, one compromised skill can affect others. Failures cascade.
Unsafe execution and RCE: Any skill that runs code without isolation is exposed to remote code execution.
Insecure output handling: Raw outputs passed directly to users can cause data leaks or client-side exploits.
SSRF: Fetch-style skills can be abused to probe internal networks.

How to Secure Skills (What Actually Works)

Treat skills like production services. Because they are.

Identity and Access Management

Each skill must have its own identity. No shared credentials. No broad roles.

Permissions should be minimal and continuously evaluated. This directly addresses OWASP LLM06: Excessive Agency.

Reference: OWASP LLM06:2025 Excessive Agency

AWS Bedrock

Assign granular IAM roles per agent. Restrict regions and models with SCPs. Limit Action Groups to specific Lambda functions.

References:

OpenAI

Never expose API keys client-side. Use project-scoped keys and backend proxies.

Reference: Best Practices for API Key Safety

Input and Output Guardrails

Prompt injection is not theoretical. It is the default attack.

Map OWASP LLM risks directly to controls.

Reference: OWASP Top 10 for Large Language Model Applications

AWS Bedrock

Use Guardrails with prompt-attack detection and PII redaction.

Reference: Amazon Bedrock Guardrails

OpenAI

Use zero-retention mode for sensitive workflows.

Reference: Data controls in the OpenAI platform

Anthropic

Use constitutional prompts, but still enforce external moderation.

Reference: Building safeguards for Claude

Adversarial Testing

Red-team your agents.

Test prompt injection, RAG abuse, tool chaining, and data poisoning during development. Not after launch.

Threat modeling frameworks from OWASP, NIST, and Google apply here with minimal adaptation.

References:

DevSecOps Integration

Every endpoint a skill calls is part of your attack surface.

Run SAST and DAST on the skill code. Scan dependencies. Fail builds when violations appear.

References:

Using Generative AI, Amazon Bedrock, and Amazon CodeGuru to Improve Code Quality and Security

Isolation and Network Controls

Code-executing skills must run in ephemeral, sandboxed environments.

No host access. No unrestricted outbound traffic.

Use private networking wherever possible:

AWS PrivateLink

Logging, Monitoring, and Privacy

If you cannot audit skill usage, you cannot secure it.

Enable full invocation logging and integrate with existing SIEM tools.

Ensure provider data-handling terms match your risk profile. Not all plans are equal.

References:

Incident Response and Human Oversight

Update incident response plans to include AI-specific failures.

For high-risk actions, require human approval. This is the simplest and most reliable control against runaway agents.

References:

Summary

AI skills are the execution layer of generative systems. They turn models from advisors into actors.

That shift introduces real security risk: excessive permissions, prompt injection, data leakage, and cascading agent failures.

Secure skills the same way you secure production services. Strong identity. Least privilege. Isolation. Guardrails. Monitoring. Human oversight.

There is no final state. Platforms change. Attacks evolve. Continuous testing is the job.

About the Author

Introducing Managed Instances in the Cloud

Eyal Estrin — Tue, 20 Jan 2026 14:12:57 +0000

For many years, organizations embracing the public cloud knew there were two main types of compute services - customer-managed (i.e., IaaS) and fully managed or Serverless compute (i.e., PaaS).

The main difference is who is responsible for maintenance of the underlying compute nodes in terms of OS maintenance (such as patch management, hardening, monitoring, etc.) and the scale (adding or removing compute nodes according to customer or application load).

In an ideal world, we would prefer a fully managed (or perhaps a Serverless) solution, but there are use cases where we would like to have the ability to manage a VM (such as the need to connect to a VM via SSH to make configuration changes at the OS level).

In this blog post, I will review several examples of managed instance services and compare their capabilities with the fully managed alternative.

Function as a Service

The only alternative I managed to find is the AWS Lambda Managed Instances.

AWS Lambda has been in the market for many years, and it is the most common Serverless compute service in the public cloud (though not the only alternative).

Below is a comparison between AWS Lambda and the AWS Lambda Managed Instances:

When to Use Which Alternative

Use AWS Lambda (Standard) If:

Traffic is Bursty or Unpredictable: You need the ability to scale from zero to thousands of concurrent executions in seconds to handle sudden spikes.
Low or Intermittent Volume: You have idle periods were paying for running instances would be wasteful. "Scale to zero" is a priority.
Strict Isolation is Required: Your security model relies on the strong isolation of Firecracker microVMs for every single request.
Simplicity is Key: You want zero infrastructure decisions—just upload code and run.

Use AWS Lambda Managed Instances If:

Traffic is High & Predictable: You have steady-state workloads were paying for always-on EC2 instances (with Savings Plans) is cheaper than per-request billing.
Workloads are Compute/Memory Intensive: You need specific hardware ratios (e.g., high CPU but low RAM) or specialized instruction sets not available in standard Lambda.
Latency Sensitivity: You cannot afford any cold start latency and need environments that are always initialized.
High I/O Concurrency: Your application performs many I/O bound tasks (like calling external APIs) and can efficiently process multiple requests on a single vCPU without blocking.

Container Service

Amazon ECS is a highly scalable container orchestration service that automates the deployment and management of containers across AWS infrastructure.

Below is a comparison between Amazon ECS (self-managed EC2) and the Amazon ECS Managed Instances:

When to Use Which Alternative

Use Amazon ECS (Self-Managed EC2) If:

You Need Custom AMIs: Your compliance or legacy software requires a specific, hardened OS image or custom kernel modules.
You Require Host Access: You need SSH access to the underlying node for deep debugging, forensic auditing, or installing host-level daemon agents that ECS doesn't support.
Cost is the Sole Priority: You want to avoid the additional management fee and have a dedicated team that can manually optimize bin-packing and Spot instance usage for free.
Legacy / Hybrid Constraints: You are extending a specific on-premise network configuration or storage driver setup that requires manual OS configuration.

Use Amazon ECS Managed Instances If:

You Need GPUs or High Memory: You require specific hardware (like GPU instances for AI/ML) that AWS Fargate does not support, but you don't want to manage the OS.
You Want "Fargate-like" Operations with EC2 Pricing: You want to offload patching and ASG management (like Fargate) but need to use Reserved Instances or Savings Plans to lower costs.
Security Compliance: You need guaranteed, automated rotation of nodes for security patching (e.g., every 14 days) without building the automation pipelines yourself.
Steady-State Workloads: Your traffic is predictable, making always-on EC2 instances more cost-effective than Fargate's per-second billing.

Kubernetes Service

Amazon EKS is a fully managed service that simplifies running, scaling, and securing containerized applications by automating the management of the Kubernetes control plane on AWS.

Below is a comparison between Amazon EKS (self-managed nodes) and the Amazon EKS Managed Node Groups:

When to Use Which Alternative

Use Amazon EKS Managed Node Groups If:

Standard Kubernetes Workloads: You are running standard applications and want to minimize the time spent on infrastructure maintenance.
Simplified Scaling: You want EKS to automatically handle the creation of Auto Scaling Groups that are natively aware of the cluster state.
Automated Security: You want a streamlined way to apply security patches and OS updates to your cluster nodes without downtime.
Operational Efficiency: You have a small team and need to focus on application code rather than Kubernetes "plumbing."

Use Amazon EKS Self-Managed Nodes If:

Custom Operating Systems: You must use a specific, hardened OS image (e.g., a highly customized Ubuntu or RHEL) that is not supported by Managed Node Groups.
Complex Bootstrap Scripts: You need to run intricate "User Data" scripts during node startup that require fine-grained control over the initialization sequence.
Unique Networking Requirements: You are using specialized networking plugins or non-standard VPC configurations that require manual node configuration.
Legacy Compliance: You have strict regulatory requirements that mandate manual oversight and "manual sign-off" for every single OS-level change.

Summary

In this blog post, I have reviewed several compute services (from FaaS, containers, and managed Kubernetes), each with its alternatives for either customer managing the compute nodes, or having AWS manage the compute nodes for the customers.

By leveraging AWS Lambda Managed Instances, Amazon ECS Managed Instances, and Amazon EKS Managed Node Groups, organizations can achieve high hardware performance without the burden of operational complexity. The primary advantage of this managed tier is the ability to decouple hardware selection from operating system maintenance. Developers can handpick specific EC2 families, such as GPU-optimized instances for AI or Graviton for cost efficiency, while AWS manages the heavy lifting of security patching and instance lifecycle updates.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the author

Eyal Estrin is a seasoned cloud and information security architect, AWS Community Builder, and author of Cloud Security Handbook and Security for Cloud Native Applications. With over 25 years of experience in the IT industry, he brings deep expertise to his work.
Connect with Eyal on social media: https://linktr.ee/eyalestrin.
The opinions expressed here are his own and do not reflect those of his employer.

When you have a hammer, everything looks like a nail

Eyal Estrin — Tue, 06 Jan 2026 15:51:13 +0000

In the over-evolving tech world, we often see organizations (from C-Level down to architects and engineers) rush to adopt the latest technology trends without conducting proper design or truly understanding the business requirements.

The result of failing to do a proper design is a waste of resources (from human time to compute), over-complicated architectures, or under-utilized resources.

In this blog post, I will dig into common architecture decisions and provide recommendations to avoid the pitfalls.

Let’s dig into some examples.

Moving everything to the public cloud

Example

An enterprise mandates a full lift-and-shift of all workloads to a hyper-scaler to “become cloud-native,” including legacy ERP systems, mainframes, and latency-sensitive trading applications.

What was misunderstood

Some workloads had hard latency, data residency, or licensing constraints.
The applications were tightly coupled, stateful, and designed for vertical scaling.
Cost models were not analyzed beyond infrastructure savings.

Issues that emerged

Higher total cost of ownership due to egress fees, oversized instances, and always-on resources.
Performance degradation for low-latency systems.
Operational complexity increased without gaining elasticity or resilience benefits.
Missed opportunity to modernize selectively (hybrid or refactor where justified).

Using Kubernetes for every architecture

Example

A team deploys all applications - including small internal tools, batch jobs, and simple APIs - onto a shared Kubernetes platform.

What was misunderstood

Kubernetes is an orchestration platform, not a free abstraction layer.
Many workloads did not need container orchestration, autoscaling, or self-healing.
The organization lacked operational maturity for cluster management and security.

Issues that emerged

Increased cognitive load for developers (YAML, Helm, networking, ingress, RBAC).
The platform team became a bottleneck for simple changes.
Security misconfigurations (over-permissive service accounts, exposed services).
Slower delivery compared to simpler deployment models (VMs or managed PaaS).

Using Serverless for every solution

Example

An architect mandates that all new services must be implemented using Functions-as-a-Service.

What was misunderstood

Serverless excels at event-driven, stateless, bursty workloads - not long-running or chatty processes.
Cold starts, execution limits, and state management trade-offs were ignored.
Observability and debugging differ significantly from traditional services.

Issues that emerged

Latency spikes impacting user-facing APIs.
Complex orchestration logic is spread across functions, reducing maintainability.
Higher costs for sustained workloads compared to containers or VMs.
Difficult troubleshooting due to fragmented logs and distributed execution paths.

Using GenAI to solve every problem

Example

A company integrates GenAI into customer support, code reviews, security analysis, and decision-making workflows without clearly defined use cases.

What was misunderstood

GenAI produces probabilistic outputs, not deterministic answers.
Data quality, context boundaries, and hallucination risks were underestimated.
Regulatory, privacy, and intellectual property implications were not assessed.

Issues that emerged

Incorrect or misleading responses are presented as authoritative.
Leakage of sensitive data through prompts or training feedback loops.
Increased operational risk when AI outputs were trusted without validation.
High costs with unclear ROI due to overuse in low-value scenarios.

Practical recommendations

Start with business drivers, not technology - Define success metrics first: cost model, performance requirements, regulatory constraints, delivery speed, and operational ownership. Technology should follow these inputs - not precede them.
Explicitly document constraints and non-goals - Latency, data residency, licensing, team skills, and operational maturity must be captured early. Many architectural failures stem from ignored or implicit constraints.
Apply technologies where their strengths are essential:
- Public cloud: prioritize elasticity, managed services, and global reach - not lift-and-shift.
- Kubernetes: use it where orchestration, portability, and scale justify its complexity.
- Serverless: limit the use of Serverless to event-driven and bursty workloads.
- GenAI: apply where probabilistic output is acceptable and verifiable.
Favor simplicity as a default - If a simpler architecture meets requirements, it is usually the correct choice. Complexity should be earned, not assumed.
Continuously validate assumptions - Revisit architectural decisions as workloads evolve. What was once justified can become technical debt when context changes.
Reward outcome-driven architecture - Measure architects and teams on business impact, reliability, and cost efficiency - not on adoption of trendy platforms.

Summary

The recurring failure pattern in modern architectures is not poor technology choice, but premature commitment to a tool before understanding the problem. Cloud platforms, Kubernetes, Serverless, and GenAI are powerful when applied deliberately - and damaging when treated as universal defaults. When architects start with the solution, they optimize for platform elegance instead of business outcomes.

About the author

Turning License Changes into Opportunity

Eyal Estrin — Mon, 29 Dec 2025 16:37:12 +0000

The concept of vendor lock-in existed for many years; organizations chose commercial, and in many cases expensive license to use proprietary software products to run their production workloads.

In the past, there was the notion that using a product from a well-known vendor was the best solution, due to support, a large customer base, and, as the famous quote says, "Nobody gets fired for buying IBM."

This was all true for decades, but as the software world matured, organizations began migrating workloads to the public cloud and began building modern or cloud-native applications based on open-source alternatives.

In this blog post, I will discuss some of the well-known case studies of switching from commercial products to open-source.

From Elasticsearch to OpenSearch

Elasticsearch is a distributed search and analytics engine that stores data as JSON documents and lets you run fast full‑text search, aggregations, and log or metrics analysis across large datasets.

Elasticsearch, prior to 7.11, used Apache License 2.0, a permissive license allowing commercial use, modification, and distribution with minimal restrictions.

In January 2021, Elastic announced that starting with version 7.11, it would be relicensing its Apache 2.0 licensed code in Elasticsearch to be dual licensed under SSPL (Server-Side Public License) and the Elastic License, a strong copyleft license that requires anyone offering the software as a service to open-source the entire service stack.

In August 2024, the GNU Affero General Public License was added to Elasticsearch version 8.16.0 as an option, making Elasticsearch free and open-source again.

Elastic argued that large cloud providers were taking the open‑source Elasticsearch, offering it as a commercial managed service (e.g., Amazon Elasticsearch Service) and capturing much of the value without sufficient reciprocity.

The license change was positioned as protecting Elastic’s SaaS/Elastic Cloud business and long‑term sustainability.

OpenSearch was launched by AWS and partners as a fork later in 2021, based on Elasticsearch 7.10 and Kibana 7.10, the last Apache‑2.0 versions.

Today, OpenSearch is no longer just an AWS side‑project; it is governed by the OpenSearch Software Foundation, a Linux Foundation project that provides vendor‑neutral governance and long‑term stewardship. Premier foundation members include AWS, SAP, and Uber, all of whom either run OpenSearch in production, build products on top of it, or contribute engineering resources.

Among the benefits of switching to OpenSearch:

Licensing - OpenSearch is Apache 2.0, so there are no SSPL/Elastic License obligations or restrictions on offering it as a managed service or embedding it in SaaS products.
Vendor neutrality - OpenSearch’s open ecosystem (self‑managed on Kubernetes/VMs or via providers like Amazon OpenSearch Service and others) reduces dependence on a single vendor and improves negotiating leverage.
Migration - OpenSearch was designed as a near drop‑in replacement for Elasticsearch 7.10, so many clients, APIs, and index formats are compatible, which lowers migration effort and risk.
Scalability - OpenSearch retains Elasticsearch’s horizontally scalable architecture and adds features like vector search, observability improvements, and integrations driven by a multi‑vendor community, not just one company’s roadmap.

From Terraform to OpenTofu

HashiCorp Terraform is an infrastructure as code tool that lets you define, provision, and manage cloud and on‑prem resources using declarative configuration files, enabling consistent, repeatable deployments across multiple providers.

HashiCorp announced the Terraform license change in August 2023, and it applies starting with versions after 1.5.x (i.e., from 1.6 onward).

Terraform was originally licensed under the Mozilla Public License 2.0 (MPL 2.0), a weak copyleft license requiring modifications to licensed files to be open-sourced while allowing proprietary code alongside, and was then relicensed to the Business Source License (BSL/BUSL 1.1), which is a source‑available but not OSI‑approved open‑source license, introduced to restrict certain commercial/competitive uses while remaining free for typical internal infrastructure use.

HashiCorp stated it wanted to prevent other companies, particularly cloud vendors and platforms, from offering competing managed services built directly on top of Terraform without commercial agreements, arguing this threatened HashiCorp’s ability to invest in the product.

The move was framed as protecting the “commercial viability” of Terraform and other HashiCorp tools, but triggered ecosystem concerns over neutrality, long‑term trust, and vendor lock‑in.

In response, a group of companies and maintainers drafted the “OpenTF” manifesto and, after HashiCorp declined to revert or donate Terraform to a foundation, forked the last MPL‑licensed version (1.5.6) into a new project later named OpenTofu, donated it to the Linux Foundation, and committed to keeping it under MPL 2.0 with neutral, community‑first governance. OpenTofu fork announced in 2023, GA in 2024.

The founding vendors behind OpenTofu include Gruntwork, Spacelift, Harness, env0, and Scalr, all of whom depended heavily on open Terraform and now fund or employ core maintainers for OpenTofu.

Among the benefits of switching to OpenTofu:

Licensing - OpenTofu keeps the original MPL 2.0 open‑source license, so there are no "source‑available" or BSL terms restricting competitive SaaS or internal platform use.
Vendor neutrality - OpenTofu is governed under a neutral foundation, not a single commercial vendor, which lowers the risk that future business decisions (price, license, roadmap) will disrupt users.
Migration - OpenTofu is intentionally Terraform‑compatible (config syntax, state format, providers), so most organizations can switch with minimal changes to modules, backends, and pipelines.
Community‑driven features and transparency - OpenTofu’s roadmap and code are driven by a broad contributor base, so features like client‑side state encryption and other safety improvements tend to align closely with practitioner needs.

From Redis to Valkey

Redis is an in-memory key–value data store that can act as a database, cache, and message broker, optimized for extremely low‑latency reads and writes and supporting rich data structures like strings, lists, sets, and hashes.

Redis changed its license in March 2024, moving from the BSD‑3‑Clause open‑source license to a dual "source‑available" model using the Redis Source Available License v2 (RSALv2), a source-available license that permits use, modification, and redistribution but restricts offering the software as a competing managed service, and the Server-Side Public License (SSPLv1), primarily to stop cloud providers from offering Redis as a managed service without paying or sharing more of their own code and revenue with Redis Ltd.

In response, in 2024, major contributors and users of Redis—including engineers from AWS, Alibaba, Google, Ericsson, Huawei, Tencent, Oracle and others—took the last BSD‑licensed Redis 7.2.4 code, forked it under the new name Valkey, and placed it in a Linux Foundation–governed project to preserve a fully open, high‑performance in‑memory key–value store that remains free from vendor lock‑in and can be safely embedded in cloud platforms, SaaS products, and managed services.

Valkey uses the BSD 3‑Clause license, which is a permissive, OSI‑approved open‑source license that allows free use, modification, and redistribution, including in commercial and cloud/SaaS offerings.

Among the benefits of switching to Valkey:

Licensing - Valkey keeps a permissive BSD‑3‑Clause license, so teams avoid Redis’s newer source‑available terms and can freely offer Valkey as a managed service or embed it in SaaS without SSPL‑style obligations or commercial negotiations.
Vendor neutrality - Valkey is governed under a neutral foundation with a multi‑vendor contributor base, which reduces dependence on a single company’s business decisions and gives organizations more confidence in long‑term roadmap stability.
Migration - Because Valkey started from the last BSD‑licensed Redis 7.2 codebase, existing clients, data structures, and usage patterns generally continue to work with minimal changes, making migrations relatively low‑risk.
Scalability - Valkey’s roadmap emphasizes core engine efficiency (e.g., improved multithreading, better memory usage, and clustering enhancements), so many users get similar or better performance and scalability for classic caching and queueing workloads without paying for an enterprise Redis tier.

Summary

Migrations from Elasticsearch to OpenSearch, Terraform to OpenTofu, and Redis to Valkey all stem from the same story: vendors tightened licenses to protect their commercial cloud offerings, and the ecosystem responded by creating fully open forks that restore freedom to run, modify, and offer these technologies as services.

These community‑governed projects preserve familiar APIs and architectures while removing restrictive licenses, so customers keep the functionality they rely on and gain long‑term legal clarity and vendor‑neutral governance.

For users, the benefits include reduced lock‑in, simpler compliance, and the ability to standardize on open cores that any provider can host, extend, and support, rather than being bound to a single company’s roadmap or pricing.

All of these points are in the same direction: the future of core cloud‑native tools lies in truly open‑source projects backed by strong communities and foundations, not in proprietary products pretending to be open, so organizations get more control, stronger resilience, and real choice in how they run their infrastructure.

About the author

Eyal Estrin is a seasoned cloud and information security architect, AWS Community Builder, and author of Cloud Security Handbook and Security for Cloud Native Applications. With over 25 years of experience in the IT industry, he brings deep expertise to his work.

Connect with Eyal on social media: https://linktr.ee/eyalestrin.

The opinions expressed here are his own and do not reflect those of his employer.

How to keep up with technology and advance your career

Eyal Estrin — Mon, 22 Dec 2025 16:19:17 +0000

In 2023, I published a blog post titled Sharing Knowledge as a Way of Life, where I suggested that knowledge sharing should become a habit because it helps raise awareness about neglected topics, build community, and enhance your professional reputation.

I agree that the technology world keeps changing every day, from new services announced, new capabilities related to AI, new cybersecurity risks, emerging technologies, etc.

The question is – how do you keep up with technology, and by doing so, advance your career, remain relevant and attractive in the tech industry?

In this post, I’ll explore this topic from a new perspective: how to stay up to date with technology in an era of rapid change.

Self-learning

In the past, to learn new technology, we used to pay money, go to a college or any training center close to our home, sit for several days in a physical class, and allow an instructor to feed us with knowledge.

Sometimes, we use it for study and at home, and take a certification exam, to test our knowledge (and perhaps to show a certificate to potential employers).

In the past couple of years (I would say, sometimes after the COVID pandemic), online courses have become very popular.

Platforms such as QA Platform (formerly Cloud Academy), Pluralsight, Udemy, or LinkedIn Learning became the main source of self-learning courses.

If your main focus is cloud computing, AWS has AWS Skill Builder.

The mentioned platforms offer anyone, from a newbie to a practitioner, the ability to learn at their own pace, from anywhere (home, internet café, etc.), read, listen to recorded lectures, and gain hands-on experience by practicing in test labs.

Naturally, theoretical knowledge has low value.

If you are studying, for example, new cloud technology, I recommend that you create an account in one of the cloud providers, put credit card, and gain hands-on experience by deploying services, building applications, writing some code, and sharing it in your Git repo, so anyone can learn from you.

I highly recommend that your spare time (at least one hour, but preferably more) each week to learn something new, practice, and gain hands-on experience.

Public events

I believe there is a limit to how much you could learn by yourself, and this is why I recommend taking advantage of public events such as webinars (where you can connect from anywhere), community meetups (such as meetup.com, or Eventbrite), community platforms (such as Slack or Discord), and finally industry conferences (in almost any topic you could think of).

If you are attending a conference, here are some tips I can share with you to get the most out of conferences:

Prepare in advance – Usually, conferences have a published agenda, list of topics, tracks, and lectures. Before attending a conference, it is highly recommended to familiarize yourself with the list of lectures, select topics the closest to you, and mix them with topics you're not familiar with or have past expertise in.
Be humble – Don't assume you already know everything. Sit at lectures, listen to the lecturer, ask questions, perhaps even take some pictures with your phone (to be able to review slides later), and allow yourself you expand your knowledge.
Engage – Socialize with other conference attendees during the conference, both with your past colleagues who may have also come to the conference, and allow yourself to meet new people, exchange ideas, ask questions, and share knowledge.
Visit vendor booths – Speak with salespeople (yes, I know that their job is to sell you something you don't necessarily need…), learn about their offering, ask questions, and if you're really interested, schedule a follow-up meeting.
Gain hands-on experience – Participate in workshops (don't forget to bring a laptop…); there is no comparable to the knowledge you're gaining by actually deploying stuff, and taking part in labs, to expand your knowledge and experience.
Share key takeaways – Whether you wrote notes during a conference, took pictures with your phone, or received written material (such as PDFs, or links to vendor sites, Git repos, etc.), take the time after the conference to write your own inputs, and share them with your colleagues.

Knowledge sharing

The most advanced way to expand your career is by sharing your knowledge and expertise, and personally, I prefer to write in English to have an audience from all around the world.

It doesn't matter which platform you choose; whatever you do will advance your career.

Develop soft skills – The most important quality for anyone in the tech industry is to be able to communicate with others. It may be small talk with your peers in a coffee break, a conversation with a customer about an issue he's having, or the ability to explain a senior manager about technological topic, but in business terms.
Write a blog post – This is an excellent way for anyone who has something to share and doesn't feel comfortable in front of an audience. You may share personal opinions on a topic, how-to guidelines, or even code samples. You don't even have to be an expert in a specific topic; whatever you share, people will read, and if it's valuable, people will follow your posts regularly.
Record videos or podcasts – Both YouTube and Podcasts (such as Spotify) became very popular in the past decade. Begin small, share your insights, share your recordings over social media, and begin to attract followers around the world.
Provide lectures – Regardless of the platform you choose, lectures are a great way to share knowledge and engage with colleagues and peers. You can choose video lectures (such as Zoom), on-site in small groups, or on the stage in front of a large audience, whatever you feel comfortable with. This is a great way to build your confidence and brand and advance your career.
Mentorship – This is a combination of someone who has a lot of knowledge (in at least one domain) and is generous enough to expand the knowledge of others. You can do it in one-on-one meetings, or even in small groups (since large groups tend to be ineffective in my perspective). Remember to provide your mentees honest feedback, and don't forget to ask for feedback for the work you do, to learn from your mistakes.

Summary

In this blog post, I shared a lot of ways anyone in the tech industry can expand their knowledge, gain experience, build a reputation, and be able to advance his or her career to the next level.

Learning never stops. There is always the next level you can learn in any topic.

According to Werner Vogels, Amazon CTO, a T-shaped person is someone who has deep expertise (the vertical bar of the T) in one specific domain, such as software development, cloud architecture, or data science, combined with broad knowledge and skills (the horizontal bar of the T) across multiple disciplines, such as communication, systems thinking, and collaboration.

To advance your career, you should always strive to build both depth and breadth in multidisciplinary domains.