<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shrestha Pandey</title>
    <description>The latest articles on Forem by Shrestha Pandey (@shresthapandey).</description>
    <link>https://forem.com/shresthapandey</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3775845%2Fa627b42c-6d80-4c14-ba70-55b0c2cbcc08.jpg</url>
      <title>Forem: Shrestha Pandey</title>
      <link>https://forem.com/shresthapandey</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shresthapandey"/>
    <language>en</language>
    <item>
      <title>Why Reasoning Models Changed Everything</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Fri, 10 Apr 2026 14:38:33 +0000</pubDate>
      <link>https://forem.com/shresthapandey/why-reasoning-models-changed-everything-5e1n</link>
      <guid>https://forem.com/shresthapandey/why-reasoning-models-changed-everything-5e1n</guid>
      <description>&lt;p&gt;&lt;em&gt;For years, making language models smarter meant making them bigger. Then someone asked a different question: what if, instead of training more, you let the model think longer?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In September 2024, OpenAI released o1. In January 2025, DeepSeek released R1. Together, these two models invalidated the assumption that had governed the entire field since 2020: that the path to better AI runs through bigger training runs.&lt;/p&gt;

&lt;p&gt;This assumption was backed by the Kaplan et al. scaling-laws paper, which showed that language-model performance follows a reliable power law in training compute. Add more parameters, more data, and more GPU-hours, and you get a more capable model. The field organized itself around this insight, and every major lab poured billions into pre-training.&lt;/p&gt;

&lt;p&gt;o1 and R1 showed that there’s a second dimension to scale that the field had largely ignored: compute at inference time. And it turns out that for tasks requiring multi-step reasoning, this second dimension can be just as powerful as the first, and far cheaper to exploit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What chain-of-thought actually is
&lt;/h2&gt;

&lt;p&gt;Chain-of-thought prompting has been around since 2022, when Wei et al. at Google Brain showed that simply asking a language model to “think step by step” before answering dramatically improved its performance on math and logic tasks. This was a genuinely surprising result: the model wasn’t being retrained; it was just prompted differently. The extra tokens the model generated while reasoning served as a scratchpad that improved its final answer.&lt;/p&gt;

&lt;p&gt;Transformers generate one token at a time, and each token is conditioned on all previous tokens. When a model solves a math problem by writing out steps, those steps become part of the context that informs the final answer. The model is using its own output as working memory. Without chain-of-thought, it has to compress all that computation into a single forward pass.&lt;/p&gt;
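
&lt;p&gt;A toy sketch makes the working-memory point concrete. The &lt;code&gt;next_token&lt;/code&gt; function below is a hypothetical stand-in for a real model's forward pass; the only thing the sketch shows is that every token the model emits is appended to the context and conditions everything generated after it.&lt;/p&gt;

```python
# Toy illustration of autoregressive generation: emitted reasoning tokens
# join the context and condition the final answer.
# "next_token" is a made-up stand-in for a model forward pass.

def next_token(context):
    # A real model would run a forward pass over the entire context;
    # here the continuation is scripted to keep the example tiny.
    script = {
        (): "17*3",
        ("17*3",): "= 51",
        ("17*3", "= 51"): "51+4",
        ("17*3", "= 51", "51+4"): "= 55",
        ("17*3", "= 51", "51+4", "= 55"): "ANSWER 55",
    }
    return script[tuple(context)]

context = []                 # a real prompt would seed this
for _ in range(5):
    context.append(next_token(context))  # own output becomes working memory

print(context[-1])           # ANSWER 55
```

&lt;p&gt;Each intermediate step exists only because the previous ones are in the context; drop the scratchpad tokens and the final answer has nothing to condition on.&lt;/p&gt;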

&lt;p&gt;Nobody had figured out how to train a model to do this reliably, until o1. Prompting a model to “think step by step” helps, but the quality of the reasoning is inconsistent: sometimes you get careful, structured reasoning, and sometimes you get verbose filler that doesn’t actually help. The model doesn’t know when to think hard and when not to.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reinforcement learning connection
&lt;/h2&gt;

&lt;p&gt;OpenAI’s o1 system card describes training the model with reinforcement learning to produce a chain of thought before answering. The core insight is that RL can teach the model how to think: specifically, to develop reasoning strategies that lead to correct answers on verifiable tasks like mathematics and code.&lt;/p&gt;

&lt;p&gt;According to OpenAI’s published description, o1 learns through RL to recognize and correct its mistakes in the middle of reasoning, break hard problems into simpler subproblems, and abandon approaches that aren’t working. These behaviors (self-correction, decomposition, backtracking) emerge from the training signal.&lt;/p&gt;

&lt;p&gt;The result is a model where more thinking time produces better answers. On the AIME 2024 benchmark (American Invitational Mathematics Examination), o1 solved roughly 74% of problems with a single sample per question. GPT-4o, on the same benchmark, scored around 9%. That gap comes from the model being trained to use its inference compute more productively.&lt;/p&gt;

&lt;h2&gt;
  
  
  DeepSeek R1: showing the mechanism
&lt;/h2&gt;

&lt;p&gt;DeepSeek R1, published in January 2025, matched o1’s performance on reasoning benchmarks, but that was the easy part. What made it significant was that the paper described the training recipe in depth, giving the rest of the field the ability to study and replicate it.&lt;/p&gt;

&lt;p&gt;They started with DeepSeek-V3-Base, a 671B-parameter pretrained model built on a Mixture-of-Experts architecture. Their first experiment, DeepSeek-R1-Zero, applied reinforcement learning directly to this base model, with no supervised fine-tuning beforehand. The reward signal was simple: the model is rewarded for producing a correct final answer, and for formatting its output with explicit reasoning inside &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tags.&lt;/p&gt;

&lt;p&gt;No human-labeled reasoning examples seeded the reward, and no preference-based reward model trained on human judgments scored the outputs. The reinforcement learning loop asked only simple, checkable questions: did you produce the correct final answer, and did you follow the required format?&lt;/p&gt;

&lt;p&gt;The results of R1-Zero were remarkable and, to be honest, a little unsettling. The model's average pass@1 score on AIME 2024 increased from 15.6% at the start of training to 71.0% by the end. More striking was what happened to the model's behavior during this process. The reasoning traces grew substantially longer as training progressed. &lt;/p&gt;

&lt;p&gt;The model spontaneously developed strategies like re-reading the problem from the beginning partway through a solution, checking its own work, and explicitly noting when it suspected an error. None of this was designed in. It emerged from optimizing for correct answers.&lt;/p&gt;

&lt;p&gt;The paper describes a notable emergent behavior where the model, while solving a math problem, pauses mid-reasoning, re-evaluates its approach, and switches to a different strategy to reach the correct answer. This emerges from reward training, where the model learns that revisiting its reasoning can sometimes lead to better outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The training algorithm: GRPO&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The RL algorithm DeepSeek used, Group Relative Policy Optimization (GRPO), is worth understanding, because it’s part of why this approach is tractable at scale.&lt;/p&gt;

&lt;p&gt;Standard reinforcement learning for language models (the approach used in earlier RLHF pipelines) relies on Proximal Policy Optimization (PPO). PPO requires an additional network called a critic: another large neural network that estimates the value of partially completed responses. The critic is typically the same size as the model being trained, which roughly doubles the memory and compute of each training step, since every step needs forward passes through both networks.&lt;/p&gt;

&lt;p&gt;GRPO, first introduced in the DeepSeekMath paper (2024), eliminates the critic. Instead of estimating value with a learned model, GRPO samples a group of responses for each prompt and scores each one. The group’s average reward serves as the baseline, and each response’s advantage (the signal telling the model whether that particular response was better or worse than expected) is computed as:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;A_i = (r_i - mean(r_1...r_G)) / std(r_1...r_G)&lt;/code&gt;&lt;/p&gt;
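
&lt;p&gt;As a minimal sketch of that formula (assuming scalar rewards, and using the population standard deviation; implementations differ on such details), the group-relative advantage looks like this:&lt;/p&gt;

```python
# Group-relative advantage per the formula above: normalize each
# response's reward against its own group's mean and standard deviation.
import statistics

def group_relative_advantages(rewards):
    """rewards: scalar rewards for the G sampled responses to one prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)     # population std; a detail that varies
    if std == 0:
        return [0.0 for _ in rewards]    # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]

# Four sampled answers: two correct (reward 1.0), two wrong (reward 0.0)
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# correct answers get a positive advantage, wrong ones a negative one
```

&lt;p&gt;No critic network appears anywhere; the group itself supplies the baseline.&lt;/p&gt;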

&lt;p&gt;This is essentially the REINFORCE algorithm from the early 1990s, revived at modern scale and with PPO’s clipping mechanism retained for training stability. Eliminating the critic cuts resource requirements roughly in half, which is enormous when your policy model has 671 billion parameters.&lt;/p&gt;

&lt;p&gt;The reward function had only two components: correctness rewards (whether the final answer matched the ground truth for math and coding problems) and format rewards (whether the model used the expected output structure).&lt;/p&gt;

&lt;p&gt;There were no process rewards and no step-by-step supervision. The implicit bet was that, given a correct-answer signal and plenty of freedom to explore, the model would find its own way to good reasoning.&lt;/p&gt;
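
&lt;p&gt;A rule-based reward of this kind fits in a few lines. The sketch below is illustrative, not DeepSeek's implementation: the square-bracket &lt;code&gt;[think]&lt;/code&gt; markers stand in for the paper's reasoning tags, and the 0.5/1.0 weights are made up.&lt;/p&gt;

```python
# Illustrative rule-based reward: accuracy plus format, no process supervision.
# The [think] markers and the 0.5 / 1.0 weights are assumptions for the sketch.
import re

def reward(completion, ground_truth):
    r = 0.0
    # format reward: reasoning enclosed in the expected markers
    if re.search(r"\[think\].*\[/think\]", completion, flags=re.DOTALL):
        r = r + 0.5
    # accuracy reward: exact match on whatever follows the reasoning
    final_answer = completion.rsplit("[/think]", 1)[-1].strip()
    if final_answer == str(ground_truth):
        r = r + 1.0
    return r

print(reward("[think]17*3 is 51, plus 4 is 55[/think]55", 55))   # 1.5
print(reward("[think]sloppy reasoning[/think]54", 55))           # 0.5
```

&lt;p&gt;Nothing in this scorer inspects the quality of the intermediate steps; the only pressure toward good reasoning is that good reasoning produces correct answers more often.&lt;/p&gt;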

&lt;h2&gt;
  
  
  Why R1-Zero wasn't the final model
&lt;/h2&gt;

&lt;p&gt;R1-Zero had problems. Because DeepSeek-V3-Base was pretrained on multilingual data, the model sometimes switched languages mid-reasoning: it would start a problem in English, shift into Chinese for a few sentences, then continue in English. The reasoning was often correct, but the output was unreadable. Readability suffered more generally, too: the thinking traces were sometimes clear internal monologues, and at other times nearly incoherent.&lt;/p&gt;

&lt;p&gt;DeepSeek R1 addressed this with a more involved training pipeline. They first collected a small number of high-quality chain-of-thought examples demonstrating the kind of structured, readable reasoning they wanted, and used them for supervised fine-tuning before RL began, giving the RL process a better starting point. They also added a language consistency reward to penalize mid-reasoning language switching. A subsequent round of rejection sampling and SFT on the RL model’s own outputs added coverage for non-reasoning tasks like writing and general question-answering. The result, DeepSeek-R1, performs comparably to OpenAI o1-1217 on reasoning benchmarks.&lt;/p&gt;

&lt;p&gt;DeepSeek also released smaller models (7B, 14B, and 32B parameters, among other sizes) trained by fine-tuning on reasoning traces generated by the full R1 model. The 32B distilled version outperforms o1-mini on several benchmarks. This is knowledge distillation applied to reasoning: smaller models learn reasoning patterns from larger ones without running the expensive RL training themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling just got a second dimension
&lt;/h2&gt;

&lt;p&gt;The pre-training scaling laws described by Kaplan et al. in 2020 showed a clean relationship: training compute in, model capability out. The Chinchilla paper (Hoffmann et al., 2022) refined this further, showing that for a fixed compute budget the optimal strategy is to train a smaller model on more data rather than a larger model on less data. These results organized the field for years.&lt;/p&gt;
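
&lt;p&gt;The shape of that relationship is easy to see numerically. The constants below are invented for illustration (Kaplan et al. fit the real ones); the point is that a power law means every equal multiplicative increase in compute buys the same multiplicative reduction in loss.&lt;/p&gt;

```python
# Illustrative power-law scaling curve: loss(C) = a * C**(-alpha).
# a and alpha are made-up constants, not Kaplan et al.'s fitted values.
a, alpha = 10.0, 0.05

def loss(compute):
    return a * compute ** (-alpha)

for c in [1e3, 1e6, 1e9]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
# every 1000x in compute multiplies loss by the same factor (about 0.71 here)
```

&lt;p&gt;On a log-log plot this is a straight line, which is exactly what made the pre-training curve so useful for planning training runs.&lt;/p&gt;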

&lt;p&gt;Reasoning models introduce a second scaling curve. OpenAI’s o1 blog says performance improves when the model gets more time to think. &lt;/p&gt;

&lt;p&gt;In “The Bitter Lesson” (2019), Richard Sutton argues that general methods which leverage computation tend, in the long run, to beat methods that encode human knowledge explicitly. Scaling pre-training compute was the first instance of this; scaling inference compute through learned reasoning is the second. We are not done scaling AI; we have just entered a new chapter of that story.&lt;/p&gt;

&lt;p&gt;That said, inference-time scaling works best on tasks with verifiable answers: math, formal proofs, code that can be tested. For these, checking whether the model's output is correct is straightforward.&lt;/p&gt;

&lt;p&gt;Extending the approach to open-ended tasks like writing, where there is no clear ground truth, is far harder and a very active area of research. The current generation of reasoning models is strongest in STEM and coding; how broadly the gains extend beyond those domains is still an open question.&lt;/p&gt;

&lt;h2&gt;
  
  
  This one is different
&lt;/h2&gt;

&lt;p&gt;Every few years someone claims AI has reached a major breakthrough, and it usually turns out to be an incremental improvement. This time is different. Models can now convert additional inference compute into better reasoning, which opens possibilities that simply weren’t available before.&lt;/p&gt;

&lt;p&gt;Tasks that previously needed human oversight, because models couldn’t be trusted to reason carefully, now work differently: you can spend more compute at inference time to get more reliable results. Simple tasks can run fast, while harder or higher-stakes problems can be given more time to think, making the tool far more flexible than models from a few years ago.&lt;/p&gt;

&lt;p&gt;DeepSeek releasing R1 with open weights under an MIT license also changes the economics. The distilled R1-32B model, quantized to run on a single high-end GPU, beats o1-mini on several math benchmarks. Researchers and smaller organizations with limited resources can now run this class of model themselves, instead of depending on a handful of proprietary APIs. That is a major change in who has access.&lt;/p&gt;

&lt;p&gt;Making models bigger and training them on more data helps them learn more, like reading lots of books makes someone more knowledgeable. But letting models “think longer” when answering questions helps them reason better and make fewer mistakes, like a person taking more time to work through a problem.&lt;/p&gt;

&lt;p&gt;Both of these improvements are important and now we have both. What we do with that combination is still an open question.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2501.12948" rel="noopener noreferrer"&gt;DeepSeek-R1 Paper&lt;/a&gt; — DeepSeek-AI, 2025&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://openai.com/index/learning-to-reason-with-llms" rel="noopener noreferrer"&gt;Learning to Reason with LLMs&lt;/a&gt; — OpenAI, 2024&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2402.03300" rel="noopener noreferrer"&gt;DeepSeekMath / GRPO Paper&lt;/a&gt; — Shao et al., 2024&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For more such developer content, visit:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://vickybytes.com" rel="noopener noreferrer"&gt;https://vickybytes.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vickybytes</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why Terraform Breaks After Day-1 And How Terraform Actions Fix It</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Wed, 01 Apr 2026 08:00:42 +0000</pubDate>
      <link>https://forem.com/shresthapandey/why-terraform-breaks-after-day-1-and-how-terraform-actions-fix-it-435c</link>
      <guid>https://forem.com/shresthapandey/why-terraform-breaks-after-day-1-and-how-terraform-actions-fix-it-435c</guid>
      <description>&lt;p&gt;Let me start with something most infrastructure engineers might not say out loud — Terraform solves Day-1 beautifully and then kinda leaves you hanging.&lt;/p&gt;

&lt;p&gt;You write your HCL, run &lt;code&gt;terraform apply&lt;/code&gt;, and everything is provisioned perfectly. The state file looks impeccable. But six months later that same infrastructure has been poked, patched, and manually changed, and it has silently drifted away from what Terraform thinks exists. No one realizes this until something breaks in production.&lt;/p&gt;

&lt;p&gt;This article is about that “gap” between provisioning and actually managing infrastructure across its entire lifetime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Day-2 Is Where Infrastructure Goes to Die (Slowly)
&lt;/h2&gt;

&lt;p&gt;Picture it: a full stack provisioned on AWS with Terraform, state clean, everything in sync. Time passes. A deployment fails, someone logs into the console and changes a security group rule, and the deployment succeeds. But the change is never documented, and no ticket is raised.&lt;/p&gt;

&lt;p&gt;Then the scheduled &lt;code&gt;terraform apply&lt;/code&gt; runs. Terraform sees the difference and resets the security group to its original state, and production breaks. Everyone is confused, because no code was changed.&lt;/p&gt;

&lt;p&gt;The root cause is that the tooling was never designed for this. Terraform’s core capability is provisioning infrastructure, not operating it.&lt;/p&gt;

&lt;p&gt;So what are teams doing for Day-2 operations? Most have a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bash scripts that contain parts nobody understands&lt;/li&gt;
&lt;li&gt;AWS Console changes that are made manually and never documented&lt;/li&gt;
&lt;li&gt;Ad-hoc Ansible runs that don't tie back to Terraform state in any way&lt;/li&gt;
&lt;li&gt;Lambda functions triggering other Lambda functions in chains nobody can trace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add it all up, and it is common for an organization to have over 30 different tools actively managing a single hybrid infrastructure estate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lifecycle Nobody Talks About Enough
&lt;/h2&gt;

&lt;p&gt;Infrastructure has four phases, and most of the industry focuses heavily on two of them.&lt;/p&gt;

&lt;p&gt;The first phase, or "Day-0", is the Build phase. The organisation designs its infrastructure and defines policies, in partnership with the platform and security teams. Nothing has been provisioned yet.&lt;/p&gt;

&lt;p&gt;The second phase, or "Day-1", is the Deploy phase. &lt;code&gt;terraform apply&lt;/code&gt; runs, infrastructure gets built, and application teams deploy their workloads. This is where Terraform shines.&lt;/p&gt;

&lt;p&gt;"Day-2" is the Manage phase. Patches are installed, configurations change, certificates are renewed, capacity is scaled, and compliance is checked. Day-2 can last for years, and it is where nearly all of the operational pain occurs. Terraform traditionally has no role in this phase.&lt;/p&gt;

&lt;p&gt;"Day-N" is the Decommission phase, where everything is torn down and cleaned up.&lt;/p&gt;

&lt;p&gt;For the last ten years the DevOps industry has focused on perfecting Day-1 tooling; Day-2 has been left with very few tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform Actions — What Changed in v1.14
&lt;/h2&gt;

&lt;p&gt;Terraform Actions shipped as stable functionality in Terraform v1.14, unveiled at HashiConf 2025. Providers can now define actions that go beyond CRUD: invoking a Lambda function, stopping an EC2 instance, invalidating a CloudFront cache, or triggering an Ansible playbook.&lt;/p&gt;

&lt;p&gt;Actions live in their own top-level &lt;code&gt;action&lt;/code&gt; block in your HCL. Terraform can execute them automatically on triggers during a resource's lifecycle, or you can invoke them manually from the CLI without running a complete &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can invoke an operational action (say, calling a Lambda to warm a cache) without Terraform re-evaluating the entire state of your infrastructure. That is a significant change in how you operate infrastructure day to day.&lt;/p&gt;

&lt;p&gt;The AWS provider currently has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;aws_lambda_invoke&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aws_ec2_stop_instance&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aws_cloudfront_create_invalidation&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Actions Actually Work — The Syntax
&lt;/h2&gt;

&lt;p&gt;There are two pieces. The action block itself, and the trigger that fires it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining an Action
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_invoke"&lt;/span&gt; &lt;span class="s2"&gt;"warm_cache"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_warmer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function_name&lt;/span&gt;
    &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform_action"&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the &lt;code&gt;config {}&lt;/code&gt; wrapper. Provider-specific arguments go inside &lt;code&gt;config&lt;/code&gt;, not directly in the action block. &lt;/p&gt;

&lt;p&gt;Meta-arguments like &lt;code&gt;count&lt;/code&gt; and &lt;code&gt;provider&lt;/code&gt; exist outside &lt;code&gt;config&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_invoke"&lt;/span&gt; &lt;span class="s2"&gt;"warm_cache"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;invoke_on_deploy&lt;/span&gt; &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;us_east_1&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cache_warmer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function_name&lt;/span&gt;
    &lt;span class="nx"&gt;payload&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform_action"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Triggering an Action on Resource Lifecycle Events
&lt;/h3&gt;

&lt;p&gt;This goes inside the resource's &lt;code&gt;lifecycle&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"api"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-api-handler"&lt;/span&gt;
  &lt;span class="c1"&gt;# ... rest of config&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;events&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_create&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;after_update&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lambda_invoke&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warm_cache&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two main things to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt; uses unquoted keywords — &lt;code&gt;after_create&lt;/code&gt; and &lt;code&gt;after_update&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;actions&lt;/code&gt; is plural and takes a list, not a single reference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also add a &lt;code&gt;condition&lt;/code&gt; to guard the action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;events&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_create&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ansible_playbook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;patch_instance&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enable_auto_patching&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;condition&lt;/code&gt; is false, the action is skipped completely. This is useful when the configuration should exist but only run in certain environments, like production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running Actions from the CLI
&lt;/h3&gt;

&lt;p&gt;This is where it gets useful for Day-2 workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Just plan the action, don't run it&lt;/span&gt;
terraform plan &lt;span class="nt"&gt;-invoke&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;action.aws_lambda_invoke.warm_cache

&lt;span class="c"&gt;# Actually run the action&lt;/span&gt;
terraform apply &lt;span class="nt"&gt;-invoke&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;action.aws_lambda_invoke.warm_cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform executes only that one action; nothing else in your configuration is evaluated or changed. Only one action can be invoked per command, so you cannot pass multiple &lt;code&gt;-invoke&lt;/code&gt; flags at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provisioning EC2 + Immediate Patching via Ansible Automation Platform
&lt;/h2&gt;

&lt;p&gt;One of the most compelling use cases links EC2 provisioning to automated patching through Ansible Automation Platform (AAP).&lt;/p&gt;

&lt;p&gt;The problem it solves is simple: an Ubuntu AMI built months ago usually has many security patches pending. If you provision EC2 instances and then patch each one manually, sooner or later (most likely within 30 days) an instance will slip through unpatched. The fix is to tie patching to the instance's Terraform lifecycle, so it cannot be missed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Terraform Side
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"instance_count"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"ubuntu_ami"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AMI ID — use a recent Ubuntu LTS, patching will handle the rest"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"aap_controller_url"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"aap_oauth_token"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"allow_instance_reboot"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bool&lt;/span&gt;
  &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_instance"&lt;/span&gt; &lt;span class="s2"&gt;"app_servers"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;count&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_count&lt;/span&gt;
  &lt;span class="nx"&gt;ami&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ubuntu_ami&lt;/span&gt;
  &lt;span class="nx"&gt;instance_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"t3.medium"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;key_name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_key_pair&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deployer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key_name&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allow_ssh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"app-server-${count.index}"&lt;/span&gt;
    &lt;span class="nx"&gt;ManagedBy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;events&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_create&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;after_update&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ansible_aap_job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;patch_servers&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;after_update&lt;/code&gt; event is critical: if an instance is replaced (an AMI update, an instance type change, or anything else that forces a new instance), the replacement gets patched automatically, with no manual intervention required.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Action Block
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"ansible_aap_job"&lt;/span&gt; &lt;span class="s2"&gt;"patch_servers"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;controller_url&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aap_controller_url&lt;/span&gt;
    &lt;span class="nx"&gt;oauth_token&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aap_oauth_token&lt;/span&gt;
    &lt;span class="nx"&gt;job_template_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"EC2 Linux Patching"&lt;/span&gt;
    &lt;span class="nx"&gt;extra_vars&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;vm_hosts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;instance&lt;/span&gt; &lt;span class="nx"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;app_servers&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;instance_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
          &lt;span class="nx"&gt;public_ip&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;public_ip&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;allow_reboot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allow_instance_reboot&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Credentials are stored in HCP Terraform's sensitive variable store. Instance IDs and IPs come straight from resource state at runtime, so AAP always gets current values.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Check your provider documentation for the exact argument names of the AAP action in the version you are using; the overall structure shown here stays the same.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Ansible Playbook
&lt;/h3&gt;

&lt;p&gt;AAP receives &lt;code&gt;vm_hosts&lt;/code&gt; as an extra variable, builds inventory dynamically, and patches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Patch EC2 Instances&lt;/span&gt;
  &lt;span class="na"&gt;hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;all&lt;/span&gt;
  &lt;span class="na"&gt;gather_facts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
  &lt;span class="na"&gt;become&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;

  &lt;span class="na"&gt;pre_tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Wait for SSH connectivity&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.wait_for_connection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;120&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Gather package facts&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.package_facts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;manager&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apt&lt;/span&gt;

  &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Update apt package index&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;update_cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
        &lt;span class="na"&gt;cache_valid_time&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Apply security patches&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.apt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dist&lt;/span&gt;
        &lt;span class="na"&gt;only_upgrade&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yes&lt;/span&gt;
      &lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patch_result&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Check if reboot is required&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.stat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/run/reboot-required&lt;/span&gt;
      &lt;span class="na"&gt;register&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reboot_required_file&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Reboot if needed and allowed&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.reboot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reboot_timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
        &lt;span class="na"&gt;post_reboot_delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;reboot_required_file.stat.exists&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;allow_reboot | default(false) | bool&lt;/span&gt;

  &lt;span class="na"&gt;post_tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify instance is up after patching&lt;/span&gt;
      &lt;span class="na"&gt;ansible.builtin.ping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/var/run/reboot-required&lt;/code&gt; is a file Ubuntu creates automatically when a package update (typically a kernel patch) requires a restart to take effect. The playbook checks for this file rather than blindly rebooting. And even then, it only reboots if &lt;code&gt;allow_reboot&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, which is controlled from your Terraform variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  AAP Job Template Configuration
&lt;/h3&gt;

&lt;p&gt;On the Ansible Automation Platform side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project&lt;/strong&gt; is a reference to the Git repo containing your playbook.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory&lt;/strong&gt; is built dynamically from the &lt;code&gt;vm_hosts&lt;/code&gt; value that Terraform passes in at run time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credentials&lt;/strong&gt;: the SSH private key is stored in AAP's credential vault and used to connect to the instances. This keeps a clean separation between what gets provisioned (Terraform) and how it is accessed (Ansible).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the Full Workflow Looks Like
&lt;/h2&gt;

&lt;p&gt;An engineer changes &lt;code&gt;instance_count&lt;/code&gt; from 2 to 5 and commits the change.&lt;/p&gt;

&lt;p&gt;On push, HCP Terraform detects the change and runs a plan. The plan shows that Terraform will create three new EC2 instances, and that it will submit an action request once those instances exist.&lt;/p&gt;

&lt;p&gt;After an engineer reviews and approves the plan, Terraform applies it, creating three EC2 instances in AWS. The &lt;code&gt;action_trigger&lt;/code&gt; then fires: Terraform calls AAP's API with the new instance IDs and public IP addresses to start the patching job.&lt;/p&gt;

&lt;p&gt;AAP builds a dynamic inventory from the data Terraform sent and waits until all three instances are reachable over SSH. It then runs &lt;code&gt;apt dist-upgrade&lt;/code&gt;, checks whether a reboot is required, and reboots if allowed. Once each instance is back online and responding, AAP reports success back to Terraform.&lt;/p&gt;

&lt;p&gt;When AAP reports back, Terraform marks the run complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Places Actions Are Immediately Useful
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CloudFront invalidation after S3 deployments&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudfront_create_invalidation"&lt;/span&gt; &lt;span class="s2"&gt;"bust_cache"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;distribution_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cloudfront_distribution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;website&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;paths&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_object"&lt;/span&gt; &lt;span class="s2"&gt;"site_bundle"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;events&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_update&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_cloudfront_create_invalidation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bust_cache&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lambda warm-up after deployments&lt;/strong&gt;: cold starts on the first production request after a deployment are a common source of latency and errors. Invoking the function immediately after deployment means real users never hit a cold container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_invoke"&lt;/span&gt; &lt;span class="s2"&gt;"warm_up"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;api_handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function_name&lt;/span&gt;
    &lt;span class="nx"&gt;payload&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"warmup"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"api_handler"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;events&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_create&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;after_update&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lambda_invoke&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warm_up&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stopping dev instances on demand&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="s2"&gt;"aws_ec2_stop_instance"&lt;/span&gt; &lt;span class="s2"&gt;"stop_dev"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;instance_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dev_server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;This action has no trigger attached; you run it on demand from the CLI:&lt;br&gt;
&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply &lt;span class="nt"&gt;-invoke&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;action.aws_ec2_stop_instance.stop_dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Chaining multiple actions&lt;/strong&gt;: &lt;code&gt;actions&lt;/code&gt; is a list, order is respected, and each action completes before the next starts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;action_trigger&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;events&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;after_create&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ansible_aap_job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;patch_servers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lambda_invoke&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;register_in_cmdb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_lambda_invoke&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notify_slack&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Things That Will Catch You Out
&lt;/h2&gt;

&lt;p&gt;A failed action blocks the run:&lt;/p&gt;

&lt;p&gt;Terraform waits for every triggered action to finish before it can complete the run. That gives you visibility into each action's status, but it also means that if AAP goes down right before a critical deployment, the run sits there waiting on it. Use &lt;code&gt;condition&lt;/code&gt; guards for actions that can safely be skipped without hurting the deployment.&lt;/p&gt;
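&lt;p&gt;As a sketch, a condition guard can look like the following. The &lt;code&gt;enable_patch_action&lt;/code&gt; variable is a made-up name for illustration, and the exact &lt;code&gt;condition&lt;/code&gt; syntax may differ across Terraform versions, so verify it against the current docs:&lt;/p&gt;

```hcl
variable "enable_patch_action" {
  type    = bool
  default = true
}

resource "aws_instance" "app_servers" {
  # ... existing arguments ...

  lifecycle {
    action_trigger {
      events    = [after_create, after_update]
      condition = var.enable_patch_action # flip to false to deploy without patching
      actions   = [action.ansible_aap_job.patch_servers]
    }
  }
}
```

&lt;p&gt;Setting &lt;code&gt;enable_patch_action = false&lt;/code&gt; lets a critical deployment go out even while AAP is unavailable, with patching re-enabled afterwards.&lt;/p&gt;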

&lt;p&gt;Idempotency is not a luxury:&lt;/p&gt;

&lt;p&gt;Every resource change fires the &lt;code&gt;after_update&lt;/code&gt; event, so your playbooks and Lambda handlers will run many times over the life of the infrastructure. Running &lt;code&gt;apt dist-upgrade&lt;/code&gt; twice is harmless; running a database migration twice is not. Design everything an action invokes to be safe to re-run from the start.&lt;/p&gt;

&lt;p&gt;Actions do not write to state:&lt;/p&gt;

&lt;p&gt;An action's execution leaves no record in the Terraform state file. The only evidence it ran is the run history in HCP Terraform and the logs of the other systems involved (AAP job history, CloudWatch, and so on). Plan your observability around those sources rather than state.&lt;/p&gt;

&lt;p&gt;Provider support is still growing:&lt;/p&gt;

&lt;p&gt;As of Terraform 1.14, the AWS provider supports a narrow set of action types. Check the Terraform Registry and provider changelogs before assuming an operation is available as an action.&lt;/p&gt;

&lt;p&gt;CLI invocation requires existing resources:&lt;/p&gt;

&lt;p&gt;If your action references &lt;code&gt;instance.id&lt;/code&gt; but the instance doesn't exist in state, the &lt;code&gt;-invoke&lt;/code&gt; option fails during the &lt;code&gt;plan&lt;/code&gt; phase. Invoke actions from the CLI only against resources that have already been provisioned.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Shift
&lt;/h2&gt;

&lt;p&gt;Almost all infrastructure management involves Day-2 operations.&lt;/p&gt;

&lt;p&gt;In the past, Day-2 operations lived in runbooks, in a Jenkins job only a few people understood, or in a bash script last modified years ago. They were reactive: something broke, and somebody did something.&lt;/p&gt;

&lt;p&gt;With Terraform Actions, Day-2 operations can live alongside the infrastructure they manage: same repository, same pull request workflow, same audit trail. Patch management becomes part of the same definition that provisions the infrastructure.&lt;/p&gt;

&lt;p&gt;That shift means fewer incidents at 2:00 AM.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Terraform Actions are stable as of Terraform CLI v1.14.0. Check &lt;a href="https://developer.hashicorp.com/terraform/language/invoke-actions" rel="noopener noreferrer"&gt;developer.hashicorp.com/terraform/language/invoke-actions&lt;/a&gt; for official documentation and your provider's registry page for supported action types.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Technical insights sourced from a community session on Terraform Day-2 operations.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For more developer content, visit &lt;a href="https://vickybytes.com" rel="noopener noreferrer"&gt;vickybytes.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vickybytes</category>
      <category>terraform</category>
      <category>aws</category>
      <category>devops</category>
    </item>
    <item>
      <title>Concurrency is Not Parallelism — And Most Developers Conflate Them</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Wed, 11 Mar 2026 05:33:10 +0000</pubDate>
      <link>https://forem.com/shresthapandey/concurrency-is-not-parallelism-and-most-developers-conflate-them-1d4k</link>
      <guid>https://forem.com/shresthapandey/concurrency-is-not-parallelism-and-most-developers-conflate-them-1d4k</guid>
      <description>&lt;p&gt;There's a quote I keep returning to whenever this topic comes up.&lt;/p&gt;

&lt;p&gt;Rob Pike, one of Go's creators, said it at Heroku's Waza conference back in 2012: &lt;em&gt;"Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That was fourteen years ago, and I still see developers confusing the two all the time.&lt;/p&gt;

&lt;p&gt;I'm not saying that as a criticism — I've mixed them up before too. These two ideas are very close to each other, and in a lot of languages they even use the same tools. On top of that, documentation across the industry has been using the terms interchangeably for years. When languages like Go, JavaScript, Python, Java, and Rust all handle things a little differently and each uses slightly different wording, it’s pretty easy to see why people mix them up.&lt;/p&gt;

&lt;p&gt;So here's my attempt at a proper technical explanation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Here: The Actual Definitions
&lt;/h2&gt;

&lt;p&gt;Concurrency is structural. A program is concurrent if it is written so that many operations can be in progress at the same time. That does not mean they all execute at exactly the same instant; it means they can &lt;strong&gt;take turns being in progress&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Parallelism, on the other hand, is about the &lt;strong&gt;actual execution&lt;/strong&gt; of tasks: it means they really are running at the same time. On a single-core processor, parallelism is impossible.&lt;/p&gt;

&lt;p&gt;The distinction is often summarized in the computer science literature as: "Concurrency means that two or more actions are in progress at the same time. Parallelism means that two or more actions are executed at the same moment."&lt;/p&gt;

&lt;p&gt;The phrase &lt;em&gt;in progress&lt;/em&gt; is doing a lot of work here. A task can be &lt;em&gt;in progress&lt;/em&gt; without actively executing: one task may be waiting on a database result while another is running on the CPU. That is how tasks can be &lt;em&gt;concurrent&lt;/em&gt; without ever executing simultaneously.&lt;/p&gt;

&lt;p&gt;Pike's full quote is worth reading once more:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Concurrency is about structure, parallelism is about execution. Concurrency provides a way to structure a solution to solve a problem that may — but not necessarily — be parallelizable."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One common misunderstanding: concurrent code is not automatically parallel. You can write code with many things in progress and still run it on a single-core CPU, where only one thing is actually happening at any instant. The tasks simply take turns making progress.&lt;/p&gt;
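&lt;p&gt;Here is a minimal Go sketch of that situation (the function name and step counts are just illustrative). &lt;code&gt;GOMAXPROCS(1)&lt;/code&gt; pins execution to a single context, so all the goroutines are in progress at once but only ever take turns:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runConcurrently starts several goroutines on a single execution context.
// They are concurrent (all in progress) but not parallel (only one is ever
// executing at any instant). Returns the total number of steps completed.
func runConcurrently(tasks, steps int) int {
	runtime.GOMAXPROCS(1) // one core: the tasks can only interleave

	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for t := 0; t != tasks; t++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i != steps; i++ {
				mu.Lock()
				done++
				mu.Unlock()
				runtime.Gosched() // yield: let another task make progress
			}
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(runConcurrently(3, 4)) // 12 steps total, never two at once
}
```

&lt;p&gt;Run it on a 16-core machine and the behavior is the same: the structure is concurrent, but the execution never is.&lt;/p&gt;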

&lt;p&gt;The reverse also exists: &lt;strong&gt;parallelism without concurrency&lt;/strong&gt;. Split one large calculation across several CPU cores and the parts execute simultaneously, but there is no independent-task structure and no interaction between the parts. It is one job, fanned out.&lt;/p&gt;
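&lt;p&gt;A sketch of that direction, again with illustrative names: one summation fanned out across worker goroutines, each writing only to its own slot, so the parts never interact with each other:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum splits a single calculation across `workers` goroutines,
// one chunk each. Parallelism without concurrency in Pike's sense:
// one job divided into parts, with no communication between them.
func parallelSum(nums []int, workers int) int {
	chunk := (len(nums) + workers - 1) / workers
	partial := make([]int, workers) // one private slot per worker
	var wg sync.WaitGroup
	for w := 0; w != workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		if lo > hi {
			lo = hi // more workers than chunks: empty slice
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, v := range nums[lo:hi] {
				partial[w] += v // no sharing, no coordination
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(parallelSum(nums, 4)) // 5050
}
```

&lt;p&gt;Each goroutine is a slice of the same single task, not an independent unit of work.&lt;/p&gt;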

&lt;p&gt;In other words, concurrency and parallelism are &lt;strong&gt;independent concepts&lt;/strong&gt;. While concurrency and parallelism often occur together in practice, they are not the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Everyone Mixes These Up
&lt;/h2&gt;

&lt;p&gt;First, &lt;strong&gt;threads blur the boundary&lt;/strong&gt;. Threads are used for both. The same Thread object can represent threads that alternate on one processor core (concurrency) or threads that genuinely run at the same moment on different cores (parallelism). The syntax is identical in both cases; the reality depends on the environment the code executes in. You can't tell which one you're getting just from the API.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;documentation is often imprecise&lt;/strong&gt;. If you glance at many references dealing with programming languages, you will find that terms like concurrent execution are used when, in fact, the author means parallel execution, and vice versa. Again, this is not specific to any particular ecosystem.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;async and await have deepened the confusion&lt;/strong&gt;. The keywords are now common in JavaScript, Python, C#, Rust, Swift, and Kotlin, and they describe code that juggles multiple tasks. The problem is that the same syntax shows up both in I/O-bound concurrency and in attempts to speed up CPU-heavy code, which actually requires parallelism. The code can look identical even though the runtime behavior is completely different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four actual combinations
&lt;/h2&gt;

&lt;p&gt;Rather than treating concurrency and parallelism as a single true-or-false property, it helps to recognize &lt;strong&gt;four distinct combinations&lt;/strong&gt; and to know which one your system actually falls into.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent but not parallel:&lt;/strong&gt; Many tasks exist at the same time but share a single CPU core, so only one task is executing at any instant; the system switches between them so that all of them make progress. The classic example is Node.js handling thousands of HTTP connections on a single thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel but not concurrent:&lt;/strong&gt; There is only one task, but it is split across many CPU cores, so different parts of it execute at the same time. The classic example is image rendering: a single task divided into independent chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent and parallel:&lt;/strong&gt; Several independent tasks exist, and the system executes several of them simultaneously. A good example is a Go web server that spreads goroutines across CPU cores: each request is an independent task, and many of them genuinely run at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neither concurrent nor parallel:&lt;/strong&gt; This is simple sequential execution. One task runs to completion before the next begins. Many programs start this way, and some remain this way because it keeps the system simple and predictable.&lt;/p&gt;

&lt;p&gt;Most web services fall into either the &lt;strong&gt;concurrent only&lt;/strong&gt; group or the &lt;strong&gt;concurrent and parallel&lt;/strong&gt; group. Knowing which one your system belongs to can make a big difference in how you reason about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How different runtimes actually do this
&lt;/h2&gt;

&lt;p&gt;These abstractions become clearer once you see how real runtimes implement them. Each ecosystem makes its own design choices, so &lt;strong&gt;concurrency can look very different from one language to the next&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  JavaScript / Node.js:
&lt;/h3&gt;

&lt;p&gt;JavaScript is &lt;strong&gt;single threaded&lt;/strong&gt;: the V8 engine runs your code on a single call stack, one frame at a time. Concurrency comes from the &lt;strong&gt;event loop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;event loop&lt;/strong&gt; is what lets JavaScript perform async operations. When you make an HTTP request, the operation is handed off to &lt;strong&gt;libuv&lt;/strong&gt;, which uses the operating system's async I/O facilities. While the request is in flight, your JavaScript keeps executing on the call stack. When the operation completes, its callback is queued, and the event loop runs it once the call stack is empty.&lt;/p&gt;

&lt;p&gt;Another thing that often catches developers out is the split between &lt;strong&gt;microtasks&lt;/strong&gt; and &lt;strong&gt;macrotasks&lt;/strong&gt;. Microtasks, such as settled Promise callbacks and &lt;code&gt;queueMicrotask&lt;/code&gt;, are drained completely before the next macrotask, such as &lt;code&gt;setTimeout&lt;/code&gt;, &lt;code&gt;setInterval&lt;/code&gt;, or an I/O callback, gets to run. That is exactly what the specification requires, not a bug, but it is often not what developers expect, and it is something you need to understand to use JavaScript well.&lt;/p&gt;

&lt;p&gt;MDN describes the event loop in simple terms: "The &lt;strong&gt;event loop enables asynchronous programming in JavaScript while remaining single threaded&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;Because of this design, &lt;strong&gt;Node.js&lt;/strong&gt; is concurrent rather than parallel. It can handle thousands of connections at once because most of those connections are simply waiting. The &lt;strong&gt;event loop&lt;/strong&gt; is very good at managing thousands of waiting connections, but that does not mean thousands of things are happening simultaneously.&lt;/p&gt;

&lt;p&gt;If you need &lt;strong&gt;parallelism&lt;/strong&gt;, you have to ask for it explicitly: &lt;strong&gt;Web Workers&lt;/strong&gt; in browsers or &lt;strong&gt;Worker Threads&lt;/strong&gt; in Node.js. Each worker gets its own execution context on a separate OS thread. Workers cannot touch the DOM or share object references; they communicate through &lt;strong&gt;structured cloning&lt;/strong&gt; or through &lt;strong&gt;SharedArrayBuffer with Atomics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means that &lt;strong&gt;CPU-intensive code in the main thread blocks the event loop&lt;/strong&gt;. It blocks everything else in Node.js from running as long as it is running. &lt;em&gt;async/await&lt;/em&gt; does not change this. &lt;em&gt;Worker Threads&lt;/em&gt; do.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// still blocking the main thread, async doesn't help here&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;heavyComputation&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// this runs on a separate OS thread&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Worker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;worker_threads&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./heavy-task.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;worker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Go:
&lt;/h3&gt;

&lt;p&gt;Go treats concurrency as a first-class language feature. Its basic construct is the &lt;strong&gt;goroutine&lt;/strong&gt;, a lightweight function that runs concurrently with the rest of the program. Goroutines are not OS threads: the Go runtime schedules them onto a smaller pool of threads using an &lt;strong&gt;M:N threading model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The payoff is memory. A goroutine starts with a stack of only about &lt;strong&gt;2 KB&lt;/strong&gt; that grows and shrinks dynamically, while an OS thread typically reserves about &lt;strong&gt;1 to 2 MB&lt;/strong&gt; up front. That is why you can run hundreds of thousands of goroutines at once without trouble; running that many threads would simply not be possible.&lt;/p&gt;

&lt;p&gt;The Go scheduler uses &lt;strong&gt;work stealing&lt;/strong&gt;: each processor &lt;strong&gt;P&lt;/strong&gt; has its own &lt;strong&gt;run queue&lt;/strong&gt; of goroutines, and a processor that runs out of work steals from another processor's queue. Since Go &lt;strong&gt;1.14&lt;/strong&gt; the scheduler is also &lt;strong&gt;preemptive&lt;/strong&gt;: the runtime can stop a goroutine at safe points, such as function calls or loop back edges, so no single goroutine can hold an OS thread indefinitely.&lt;/p&gt;

&lt;p&gt;When a goroutine blocks on &lt;strong&gt;I/O&lt;/strong&gt; or a channel receive, it parks, and another goroutine runs on the same OS thread. The code reads like a simple sequence of operations, but scheduling is happening underneath.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// goroutine parks here, OS thread stays free&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://api.example.com/a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://api.example.com/b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;fetchData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://api.example.com/c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOMAXPROCS&lt;/code&gt; controls how many OS threads can run Go code at the same time. Since Go 1.5 it defaults to the number of CPU cores. Goroutines are then distributed across those threads automatically.&lt;/p&gt;

&lt;p&gt;The Go team also pushes a specific coordination philosophy: “Do not communicate by sharing memory; instead, share memory by communicating.” Data moves through channels instead of shared variables. The goal is fewer race conditions by design, not just by careful discipline.&lt;/p&gt;
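
&lt;p&gt;That philosophy travels. Here is a rough Python sketch of the same hand-off style, with &lt;code&gt;queue.Queue&lt;/code&gt; standing in for a channel: the producer and consumer never touch each other's variables, they only pass values through the queue.&lt;/p&gt;

```python
import queue
import threading

def producer(ch):
    # Hand values over the "channel" instead of writing shared state.
    for i in range(3):
        ch.put(i)
    ch.put(None)  # sentinel: no more items

def consumer(ch, out):
    while True:
        item = ch.get()
        if item is None:
            break
        out.append(item * 10)

ch = queue.Queue()
out = []
threads = [
    threading.Thread(target=producer, args=(ch,)),
    threading.Thread(target=consumer, args=(ch, out)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [0, 10, 20]
```

&lt;p&gt;Because the payload is handed off rather than shared, no lock around the data itself is needed; the queue is the only synchronization point.&lt;/p&gt;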

&lt;h3&gt;
  
  
  Python:
&lt;/h3&gt;

&lt;p&gt;Python’s history here is almost a cautionary tale. Not because Python is bad, but because it shows how one design decision made early on can shape an ecosystem for decades, and how hard that decision is to change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GIL – Global Interpreter Lock&lt;/strong&gt; is a mutex in the CPython interpreter that ensures only one thread executes Python code at a time. It does not matter how many CPU cores you have: with the GIL, multiple threads cannot run Python code simultaneously in the CPython interpreter.&lt;/p&gt;

&lt;p&gt;The GIL exists because CPython manages memory with &lt;strong&gt;reference counting&lt;/strong&gt;. Without the lock, multiple threads could update an object's reference count at the same time and leave the interpreter's memory state inconsistent. The GIL kept the interpreter simple, stable, and easy to integrate with C code. The cost of that simplicity was that Python threads were only ever really useful for &lt;strong&gt;I/O-bound concurrency&lt;/strong&gt;; for CPU-bound work they did not help at all, and contention could even make things slower.&lt;/p&gt;

&lt;p&gt;So the ecosystem developed workarounds. &lt;strong&gt;asyncio&lt;/strong&gt; became the de facto standard for I/O-bound concurrency, built around an event loop much like Node.js. &lt;strong&gt;multiprocessing&lt;/strong&gt; became the standard for CPU-bound work, using separate processes, each with its own interpreter and its own GIL.&lt;/p&gt;
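
&lt;p&gt;The two workarounds, side by side, in a minimal sketch (&lt;code&gt;fake_io&lt;/code&gt; and &lt;code&gt;square&lt;/code&gt; are stand-ins for real work):&lt;/p&gt;

```python
import asyncio
import multiprocessing

async def fake_io(name):
    # I/O-bound stand-in: while this task sleeps, the event loop
    # runs the other tasks on the same single thread.
    await asyncio.sleep(0.01)
    return name

async def io_main():
    return await asyncio.gather(fake_io("a"), fake_io("b"))

def square(n):
    # CPU-bound stand-in: each worker process has its own
    # interpreter and its own GIL, so this really runs in parallel.
    return n * n

def cpu_main():
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])

if __name__ == "__main__":
    print(asyncio.run(io_main()))  # ['a', 'b']
    print(cpu_main())              # [1, 4, 9]
```

&lt;p&gt;The asyncio half is one thread switching between waiting tasks; the multiprocessing half is genuinely parallel, at the cost of copying data between processes.&lt;/p&gt;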

&lt;p&gt;All of this is changing, however, starting with &lt;strong&gt;Python 3.13 in October 2024&lt;/strong&gt;. &lt;strong&gt;PEP 703&lt;/strong&gt; introduced an optional "free-threaded" build of CPython that runs without the GIL. &lt;strong&gt;Python 3.14 in October 2025&lt;/strong&gt; took this further, adding an optional thread-safe incremental garbage collector that addresses the latency issues of the 3.13 build.&lt;/p&gt;

&lt;p&gt;Removing the GIL wasn’t simple. The approach uses &lt;strong&gt;biased reference counting&lt;/strong&gt;. Each object tracks an owning thread. The owning thread can use fast non-atomic operations, while other threads must use slower atomic ones. This prevents the cache thrashing that would occur if all reference count operations had to be atomic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU-bound multithreaded Python code can finally scale across cores&lt;/strong&gt; in the free-threaded build, at times even approaching linear speedups. This is just not possible in the original CPython.&lt;/p&gt;

&lt;p&gt;There is, however, a cost.&lt;/p&gt;

&lt;p&gt;The free-threaded build will be slightly slower for single-threaded code. This is shown by the &lt;em&gt;pyperformance&lt;/em&gt; benchmarks, where the free-threaded build is &lt;em&gt;1% slower on ARM (macOS aarch64)&lt;/em&gt; and &lt;em&gt;up to 8% slower on x86-64 Linux&lt;/em&gt; compared to the normal GIL build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GIL can also quietly return&lt;/strong&gt;. Note that if you import a C extension module that was not built with free-threading support and is missing the &lt;code&gt;Py_mod_gil&lt;/code&gt; slot, the interpreter will in fact re-enable the GIL instead of crashing. Your code will still work, but it will be serialized again.&lt;/p&gt;
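
&lt;p&gt;You can check which situation you are in at runtime. A small sketch: &lt;code&gt;sysconfig&lt;/code&gt; reports whether the build itself is free-threaded, and &lt;code&gt;sys._is_gil_enabled()&lt;/code&gt;, an underscore-prefixed API available from CPython 3.13, reports whether the GIL is actually active at this moment.&lt;/p&gt;

```python
import sys
import sysconfig

def gil_status():
    # True only on the free-threaded (3.13t / 3.14t) builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # The GIL can be re-enabled at runtime by an incompatible C extension,
    # so the build flag alone is not the whole story.
    if hasattr(sys, "_is_gil_enabled"):
        gil_active = sys._is_gil_enabled()
    else:
        gil_active = True  # pre-3.13: the GIL is always on
    return free_threaded_build, gil_active

print(gil_status())
```

&lt;p&gt;Logging this at startup is cheap insurance: a single incompatible extension can quietly serialize an otherwise free-threaded deployment.&lt;/p&gt;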

&lt;p&gt;It also requires &lt;strong&gt;explicit installation&lt;/strong&gt;. The default Python build still has the GIL. If you require free threading, you must use the &lt;strong&gt;3.13t or 3.14t builds&lt;/strong&gt;. Library support is still variable.&lt;/p&gt;

&lt;p&gt;The long-term plan is to remove the GIL entirely; current expectations put that around &lt;strong&gt;Python 3.20&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Java:
&lt;/h3&gt;

&lt;p&gt;Java's original approach was simple: one Java thread equals one operating system thread. It was solid and it worked, but it didn't scale. If your server has 10,000 simultaneous connections, that's 10,000 operating system threads and roughly 10-20 GB of pre-allocated stack memory before any real work happens. The workaround was "reactive programming": non-blocking frameworks, chains of &lt;code&gt;CompletableFuture&lt;/code&gt;, and reactive streams. They work, but writing and debugging them can be pretty painful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual threads in Java 21&lt;/strong&gt; - JEP 444, September 2023 - change this. Virtual threads are lightweight threads managed by the JVM and multiplexed onto a pool of platform threads. When a virtual thread blocks on I/O, it yields its underlying platform thread, and the JVM immediately runs another virtual thread on it. You write ordinary sequential blocking code, and the JVM does the scheduling behind the scenes.&lt;/p&gt;

&lt;p&gt;One detail worth knowing: &lt;strong&gt;virtual thread scheduling is cooperative by default&lt;/strong&gt;. A virtual thread yields when it reaches a blocking operation; a tight CPU loop will not yield until something blocks. Go's &lt;strong&gt;preemptive scheduler&lt;/strong&gt; behaves differently here, and the difference matters for CPU-bound code.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;CPU-bound parallelism&lt;/strong&gt;, Java uses &lt;strong&gt;ForkJoinPool&lt;/strong&gt; and &lt;strong&gt;StructuredTaskScope&lt;/strong&gt; (preview in Java 21, evolving in later versions) for the distribution of computation over multiple CPUs.&lt;/p&gt;

&lt;h2&gt;
  
  
  I/O-bound vs CPU-bound: the actual decision
&lt;/h2&gt;

&lt;p&gt;When you want to make a program faster, the real question is not “should I use threads?” The real question is &lt;strong&gt;what is actually slow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If the task is &lt;strong&gt;I/O bound&lt;/strong&gt;, the program spends most of its time waiting. It could be waiting for a network response, a database query, disk access, or an external API. During that time the CPU is often idle. In these cases &lt;strong&gt;concurrency&lt;/strong&gt; helps because the program can work on other tasks while one task is waiting. This is why platforms like &lt;strong&gt;Node.js&lt;/strong&gt; can handle many connections even with a single thread.&lt;/p&gt;

&lt;p&gt;But if the task is &lt;strong&gt;CPU bound&lt;/strong&gt;, the CPU is already busy doing calculations. Adding more async or concurrency will not make it faster. It mostly helps organize the code. What actually helps in this case is &lt;strong&gt;parallelism&lt;/strong&gt;, where multiple CPU cores work on the problem at the same time.&lt;/p&gt;
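
&lt;p&gt;The I/O-bound half of this is easy to demonstrate. In the sketch below, ten fake 50 ms "requests" run through a thread pool; because the threads spend their time waiting rather than computing, total wall time stays close to one request, not ten. Replace the sleep with a tight numeric loop and the speedup disappears on any GIL-bound runtime.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    # I/O-bound stand-in: the thread just waits and holds no CPU.
    time.sleep(0.05)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fake_request, range(10)))
elapsed = time.perf_counter() - start

# Ten 50 ms waits overlap, so wall time is close to 50 ms, not 500 ms.
print(results, round(elapsed, 2))
```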

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What you're building&lt;/th&gt;
&lt;th&gt;Right tool&lt;/th&gt;
&lt;th&gt;Won't help&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API handling many concurrent requests&lt;/td&gt;
&lt;td&gt;Concurrency — async, event loop, goroutines&lt;/td&gt;
&lt;td&gt;Blocking thread per request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch image or video processing&lt;/td&gt;
&lt;td&gt;Parallelism — Worker Threads, multiprocessing&lt;/td&gt;
&lt;td&gt;Sequential processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fanning out multiple DB queries&lt;/td&gt;
&lt;td&gt;Concurrency — Promise.all, asyncio.gather&lt;/td&gt;
&lt;td&gt;Querying one at a time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML inference on CPU&lt;/td&gt;
&lt;td&gt;Parallelism — multiprocessing, native thread pools&lt;/td&gt;
&lt;td&gt;Python threads pre-3.13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat server with thousands of idle connections&lt;/td&gt;
&lt;td&gt;Concurrency&lt;/td&gt;
&lt;td&gt;One OS thread per connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video encoding&lt;/td&gt;
&lt;td&gt;Parallelism — frames are independent, distribute them&lt;/td&gt;
&lt;td&gt;Single-threaded encoding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A server with &lt;strong&gt;10,000 concurrent connections&lt;/strong&gt; does not need 10,000 cores. At any given moment, most of those connections are idle, waiting on the network. Concurrency is enough for this problem. A &lt;strong&gt;video encoder&lt;/strong&gt; trying to encode 10,000 videos as fast as possible is a very different problem.&lt;/p&gt;

&lt;p&gt;Same number of tasks on paper, but completely different requirements in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What async/await actually does
&lt;/h2&gt;

&lt;p&gt;This is probably where the most confusion rears its ugly head in real-world development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;async/await&lt;/strong&gt; is just syntax for writing &lt;strong&gt;concurrent I/O-bound code&lt;/strong&gt; that reads like sequential code. When you &lt;strong&gt;await&lt;/strong&gt; something, the current task suspends and other tasks run; when the awaited operation completes, the original task resumes. That is it.&lt;/p&gt;

&lt;p&gt;What &lt;strong&gt;async/await&lt;/strong&gt; does not do is give you more CPU cores or allow you to run code in parallel. If you &lt;strong&gt;await&lt;/strong&gt; a &lt;strong&gt;CPU-heavy function&lt;/strong&gt;, the function runs on the same thread and blocks everything else on that thread until the function is done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# concurrent — total time ≈ the slowest single call
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# still blocking the event loop for the entire duration
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;slow&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10_000_000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simply marking a function &lt;strong&gt;async&lt;/strong&gt; and sprinkling &lt;strong&gt;await&lt;/strong&gt; through it does not help CPU-bound code; it is decoration. To handle CPU-bound code in Python, use &lt;strong&gt;ProcessPoolExecutor&lt;/strong&gt;: the work runs in separate processes, and each process has its own &lt;strong&gt;GIL&lt;/strong&gt;. On the &lt;strong&gt;free-threaded build of Python 3.14&lt;/strong&gt;, a &lt;strong&gt;ThreadPoolExecutor&lt;/strong&gt; works too, giving true parallelism across many threads.&lt;/p&gt;
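
&lt;p&gt;A common pattern is to keep async code for the I/O and hand the CPU-heavy part to a process pool from inside the event loop. A minimal sketch, with &lt;code&gt;crunch&lt;/code&gt; as a stand-in for real work:&lt;/p&gt;

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound work runs in a worker process, outside the event loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop stays free to serve I/O while the child computes.
        return await loop.run_in_executor(pool, crunch, 1000)

if __name__ == "__main__":
    print(asyncio.run(main()))  # 332833500
```

&lt;p&gt;The event loop never blocks: &lt;code&gt;run_in_executor&lt;/code&gt; returns an awaitable, so other coroutines keep running while the worker process does the arithmetic.&lt;/p&gt;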

&lt;h2&gt;
  
  
  What’s changed recently that matters
&lt;/h2&gt;

&lt;p&gt;A few things have changed over the last couple of years that are worth being aware of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python has finally killed the GIL&lt;/strong&gt;, at least optionally. Python 3.13, October 2024, shipped a free-threaded interpreter with the GIL off. Python 3.14, October 2025, took it further: free threading is now officially supported rather than experimental. For the first time in Python's history, threads can run CPU-intensive code in parallel. The plan to make the GIL-free build the default, and eventually the only build, stretches out toward Python 3.20.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Java 21 virtual threads&lt;/strong&gt; remove most of the reason to write reactive Java for I/O-intensive services. A thread per request is no longer a scalability concern: write sequential blocking code and let the JVM schedule it. Web API services are almost always I/O bound, so this is a big win.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rust's async story is settled&lt;/strong&gt;, and Tokio is the dominant runtime with a mature ecosystem. Rust's ownership model means data races are a compile-time error, not a runtime surprise. If your systems need to be both correct and fast, Rust is now a viable option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge runtimes have fundamentally changed the concept of scaling&lt;/strong&gt;. Cloudflare Workers and Deno Deploy run isolated event-loop instances all over the world. You scale by deploying to more locations, not by adding threads to one machine. Thinking of your server as a function running in 200 places at once is a real mental shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Rob Pike — "Concurrency is Not Parallelism" — Heroku Waza Conference, January 2012. go.dev/talks/2012/waza.slide&lt;/li&gt;
&lt;li&gt;Rob Pike — "Go Concurrency Patterns" — Google I/O, June 2012. go.dev/talks/2012/concurrency.slide&lt;/li&gt;
&lt;li&gt;MDN Web Docs — "JavaScript execution model" — developer.mozilla.org, updated 2025&lt;/li&gt;
&lt;li&gt;Node.js Documentation — "The Node.js Event Loop" — nodejs.org, 2026&lt;/li&gt;
&lt;li&gt;Go Documentation — Goroutines and Channels — go.dev&lt;/li&gt;
&lt;li&gt;Python PEP 703 — "Making the GIL Optional in CPython" — peps.python.org&lt;/li&gt;
&lt;li&gt;Python 3.14 Docs — "Free Threading" — docs.python.org/3.14&lt;/li&gt;
&lt;li&gt;JetBrains Blog — "Faster Python: Unlocking the GIL" — blog.jetbrains.com, December 2025&lt;/li&gt;
&lt;li&gt;JEP 444 — "Virtual Threads" — openjdk.org, Java 21&lt;/li&gt;
&lt;li&gt;Manning — "Concurrency vs Parallelism" — freecontent.manning.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For more such developer content, visit:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://vickybytes.com" rel="noopener noreferrer"&gt;https://vickybytes.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: Edited with AI Assistance &lt;/p&gt;

</description>
      <category>vickybytes</category>
      <category>programming</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>OpenClaw: The AI Agent That Actually Does Stuff - Part 2 - Building a Real Bot</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Fri, 27 Feb 2026 16:08:25 +0000</pubDate>
      <link>https://forem.com/shresthapandey/openclaw-the-ai-agent-that-actually-does-stuff-part-2-building-a-real-bot-26n7</link>
      <guid>https://forem.com/shresthapandey/openclaw-the-ai-agent-that-actually-does-stuff-part-2-building-a-real-bot-26n7</guid>
      <description>&lt;p&gt;If you read &lt;a href="https://dev.to/shresthapandey/openclaw-the-ai-agent-that-actually-does-stuff-part-1-51n4"&gt;Part 1&lt;/a&gt;, you know what OpenClaw is and why it matters. Now we’re going beyond theory. In this part, I’m going to build something real, and show you exactly what happened, including the challenging parts as well, and by the end, you’ll have a working bot that sends you an automated morning dev briefing on Telegram without you doing anything.&lt;/p&gt;

&lt;p&gt;That honesty is the whole point. If you want a tutorial that hides the rough parts, there are many of those. But this is not that.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;automated morning briefing&lt;/strong&gt; bot. Every day at 9am, it fetches the top dev and AI stories from the web, filters what matters, and sends you a clean summary directly on Telegram. Once you set it up, it just arrives.&lt;/p&gt;

&lt;p&gt;You don’t need a script running in a terminal or a cron job you configured manually. You just need your phone, buzzing with the briefing while you’re still chilling.&lt;/p&gt;

&lt;p&gt;That’s the automation part. That’s what makes this different from a chatbot.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need Before Starting
&lt;/h2&gt;

&lt;p&gt;Before you start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js v22 or higher:&lt;/strong&gt; run &lt;code&gt;node --version&lt;/code&gt; to check. Update from nodejs.org.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WSL2&lt;/strong&gt; if you're on Windows: OpenClaw doesn't support native Windows; more on this below.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Telegram account:&lt;/strong&gt; this is how you talk to your bot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An API key:&lt;/strong&gt; I'll cover the free options below.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal comfort:&lt;/strong&gt; you don't need to be an expert but you should be able to run commands without panicking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last one is important. OpenClaw is genuinely powerful but it’s not a one-click install. If the terminal feels unfamiliar right now, bookmark this and come back when you’re more comfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Windows? Set Up WSL2 First
&lt;/h2&gt;

&lt;p&gt;OpenClaw doesn't work on native Windows so you need WSL2 — a Linux environment that runs inside Windows. Takes about 5 minutes.&lt;/p&gt;

&lt;p&gt;Open PowerShell as Administrator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;wsl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--install&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your machine when it asks. Then open Ubuntu from the Start menu, set up your username and password, and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here, run everything in this Ubuntu terminal — not PowerShell, not Command Prompt. Just WSL2 Ubuntu.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API Key Situation
&lt;/h2&gt;

&lt;p&gt;This confused me more than any other part of this build, so I want to help you avoid the same frustration.&lt;/p&gt;

&lt;p&gt;OpenClaw needs an AI model to think, and that model needs API credits. Free tiers on OpenRouter and DeepSeek technically work but they hit limits fast, and with a weak free model the bot sometimes just explains what it would do instead of actually doing it. &lt;/p&gt;

&lt;p&gt;If you can add even $2–5 to OpenRouter or Anthropic, do it. It's enough for weeks of real usage and everything just works properly. With Claude Sonnet, you tell the bot "send me a briefing every morning at 9am" and it saves the scheduled job itself. With a free model you might need to set that config manually — I'll show you how either way.&lt;/p&gt;

&lt;p&gt;My advice: start free to understand how things work. Add credits when you're ready to go fully hands-off. The bot works fine in both cases. The only real difference is how much guidance it needs while you’re setting it up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing OpenClaw
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="c"&gt;# Need v22.x.x or above&lt;/span&gt;

npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create Your Telegram Bot First
&lt;/h2&gt;

&lt;p&gt;Do this before running the setup wizard so you have the token ready.&lt;br&gt;
Telegram → search &lt;code&gt;@BotFather&lt;/code&gt; → send &lt;code&gt;/newbot&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Name it whatever you want, give it a username ending in &lt;code&gt;_bot&lt;/code&gt;. BotFather gives you a token that looks like &lt;code&gt;123456789:ABCdef...&lt;/code&gt;. Copy it right away — you'll need it in the next step.&lt;/p&gt;
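
&lt;p&gt;As a quick sanity check, you can eyeball the token's shape: a numeric bot ID, a colon, then an alphanumeric secret. A hedged sketch (the exact secret length varies, so the pattern below is deliberately loose):&lt;/p&gt;

```javascript
// Loose sanity check for a BotFather token's shape: "digits:secret".
// It only catches obvious copy-paste accidents, nothing more.
function looksLikeBotToken(token) {
  return /^\d+:[A-Za-z0-9_-]{20,}$/.test(token.trim());
}
```

&lt;p&gt;Only Telegram can tell you whether the token is actually valid, but this catches a truncated paste before you get deep into the wizard.&lt;/p&gt;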
&lt;h2&gt;
  
  
  Running the Setup Wizard
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;--install-daemon&lt;/code&gt; part is important. It sets up the gateway as a proper background service that starts automatically when your machine boots. Without it you'd have to manually start everything every single time.&lt;/p&gt;

&lt;p&gt;Here's what to choose at each screen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding mode:&lt;/strong&gt; Manual&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gateway setup:&lt;/strong&gt; Local gateway. You'll see &lt;code&gt;gateway.bind: loopback&lt;/code&gt; on the next screen — this means the gateway is only accessible from your own machine, not your network. Leave it exactly like that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt; OpenRouter with &lt;code&gt;openrouter/deepseek/deepseek-chat&lt;/code&gt; if you're going free, or Anthropic with &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; if you have credits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Port and bind:&lt;/strong&gt; just press Enter both times, defaults are fine&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auth mode:&lt;/strong&gt; token. Some older guides still show &lt;code&gt;auth: none&lt;/code&gt; as an option but that got removed after a real security issue was found. Always pick token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tailscale:&lt;/strong&gt; skip it for now. It's for accessing your bot remotely from your phone over the internet. Useful later but adds complexity right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Messaging channel:&lt;/strong&gt; Telegram → paste your BotFather token&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DM policy&lt;/strong&gt; → Pairing. This generates a one-time code that permanently links your Telegram account as the bot owner. Nobody else can use your bot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills:&lt;/strong&gt; skip everything → Finished&lt;/p&gt;

&lt;p&gt;You'll see a huge list of skills — GitHub, Obsidian, Spotify, and a bunch of others. Ignore all of it for now. We're keeping this simple.&lt;/p&gt;
&lt;h2&gt;
  
  
  Pairing Your Telegram Account
&lt;/h2&gt;

&lt;p&gt;Now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Message your bot on Telegram. It'll reply with a pairing code like &lt;code&gt;F85VDARD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Run this in terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw pairing approve telegram F85VDARD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type &lt;code&gt;confirm&lt;/code&gt; in the TUI. Your account is linked now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Health Check, Then First Message
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this every time you change config. It catches problems early. If it reports no critical errors, go ahead.&lt;/p&gt;

&lt;p&gt;Now go to Telegram and just send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hello
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it replies to you, everything's connected. Gateway, model, Telegram — all working.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Bot
&lt;/h2&gt;

&lt;p&gt;Send this in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;give me a morning briefing with top dev and AI news from https://news.ycombinator.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what actually came back from my bot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Morning Briefing — Top Dev &amp;amp; AI News

1. Google API keys weren't secrets, but then Gemini changed the rules
   Gemini now treats API keys as sensitive, breaking older workflows.
   240 points | 4h ago

2. OpenSwarm – Multi-Agent Claude CLI Orchestrator for Linear/GitHub
   Tool to automate dev workflows using Claude agents.
   18 points | 3h ago

3. Self-improving software won't produce Skynet
   Analysis of why self-modifying code isn't an existential risk.
   13 points | 2h ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It browsed Hacker News, filtered what was relevant, and formatted it. I didn't write a single line of code. I sent one message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Automation
&lt;/h2&gt;

&lt;p&gt;Now this is the part that actually makes it an agent.&lt;/p&gt;

&lt;p&gt;Send this in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;every morning at 9am, fetch top dev and AI stories from [https://news.ycombinator.com](https://news.ycombinator.com/), summarize the most important ones and send me the briefing here on Telegram automatically
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're on a paid model, it saves the scheduled job itself. Check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.openclaw/cron/jobs.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the jobs array is empty because you're on a free model, set it manually. First grab your exact values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#Your username&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$USER&lt;/span&gt;

&lt;span class="c"&gt;#Your timezone&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/timezone
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note both outputs. Then run this, replacing the placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"
const fs = require('fs');
const jobs = {
version: 1,
jobs: [
  {
    id: 'morning-briefing',
    schedule: '0 9 * * *',
    timezone: 'YOUR_TIMEZONE_HERE',
    prompt: 'Fetch top 5 dev and AI stories from https://news.ycombinator.com, summarize them concisely and send the briefing to me on Telegram',
    channel: 'telegram',
    enabled: true
  }
 ]
};
fs.writeFileSync('/home/YOUR_USERNAME_HERE/.openclaw/cron/jobs.json', JSON.stringify(jobs, null, 2));
console.log('Done!');
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;YOUR_TIMEZONE_HERE&lt;/code&gt; with what &lt;code&gt;cat /etc/timezone&lt;/code&gt; printed — something like &lt;code&gt;Asia/Kolkata&lt;/code&gt;. Replace &lt;code&gt;YOUR_USERNAME_HERE&lt;/code&gt; with what &lt;code&gt;echo $USER&lt;/code&gt; printed.&lt;/p&gt;
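
&lt;p&gt;If you want to double-check the file you just wrote, here is a small sketch that validates the same shape. The field names simply mirror the snippet above; OpenClaw itself may accept more fields than this:&lt;/p&gt;

```javascript
// Sanity-check a jobs object of the shape written above: a version field
// plus jobs[] entries with an id, a 5-field cron schedule, and a prompt.
function validateJobs(data) {
  const errors = [];
  if (data.version !== 1) errors.push("version should be 1");
  const cronShape = /^(\S+\s+){4}\S+$/; // five whitespace-separated fields
  for (const job of data.jobs ?? []) {
    if (!job.id) errors.push("job missing id");
    if (!cronShape.test(job.schedule ?? "")) {
      errors.push("bad schedule: " + job.schedule);
    }
    if (!job.prompt) errors.push("job missing prompt");
  }
  return errors;
}
```

&lt;p&gt;A schedule like &lt;code&gt;0 9 * * *&lt;/code&gt; passes; a typo like &lt;code&gt;9am&lt;/code&gt; gets flagged before the gateway silently ignores it.&lt;/p&gt;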

&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done. Every morning at 9am the gateway runs the briefing and sends it to your Telegram. You're not doing anything. It just happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do You Need the Terminal Open for This?
&lt;/h2&gt;

&lt;p&gt;No. That's the whole point of &lt;code&gt;--install-daemon&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The gateway is a system service now. macOS runs it via launchd, Linux via systemd. It starts when your machine boots, runs silently in the background, and executes the cron job at 9am whether you're at your desk or not. Close the terminal. Restart your laptop. It doesn't matter.&lt;/p&gt;
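
&lt;p&gt;For intuition, the Linux side is conceptually just a small systemd unit. This is a hypothetical sketch; the unit OpenClaw actually writes may differ in name, paths, and options, so inspect the real one on your machine rather than copying this:&lt;/p&gt;

```ini
# Hypothetical sketch of a gateway unit, for intuition only. The unit
# OpenClaw actually installs may use different names, paths, and options,
# and the ExecStart command here is a guess.
[Unit]
Description=OpenClaw Gateway (illustrative)
After=network-online.target

[Service]
ExecStart=/usr/bin/env openclaw gateway run
Restart=on-failure

[Install]
WantedBy=default.target
```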

&lt;p&gt;The terminal was only for setup. The automation runs without it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;Most tutorials don't include this part. I'm including it because it's the most useful thing I can tell you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which model you use matters a lot.&lt;/strong&gt; With a free model, the bot gets what you're asking for but sometimes it just explains what it would do instead of doing it. With Claude Sonnet it actually does it. OpenClaw being model agnostic is genuinely useful — but that also means the experience is only as good as the model you give it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first Telegram reply is when it clicks.&lt;/strong&gt; Not because it's technically impressive — it's a message, not exactly magic. But because you realize everything is connected and running on your own machine. Your data, your gateway, your agent. That hits different than using some cloud tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's challenging at first and that's okay.&lt;/strong&gt; The setup has rough edges. The free model has real limits. But once you get through it and see how the pieces fit together, you start thinking about what else you could build. That's always a good sign.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Build Next
&lt;/h2&gt;

&lt;p&gt;Once you've got this working, the system is easy to extend.&lt;/p&gt;

&lt;p&gt;Add more sources — dev.to, GitHub release feeds, whatever RSS you follow. Add a Friday evening digest instead of just mornings. Connect the Obsidian skill so summaries go straight into your notes vault. Add weather via wttr.in — no API key needed.&lt;/p&gt;

&lt;p&gt;The skill system is what makes all this composable. Each skill is a SKILL.md file — readable, auditable, nothing hidden. And because OpenClaw uses the same AgentSkills spec as Claude Code, Cursor, and GitHub Copilot, there's a much bigger ecosystem available than just what's on ClawHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  To Wrap Up
&lt;/h2&gt;

&lt;p&gt;OpenClaw is worth the effort. It's not easy to set up and the free model has real limitations. But once it's running, it's genuinely different from everything else out there.&lt;/p&gt;

&lt;p&gt;Start with one workflow, one source, one schedule. Get comfortable with how it behaves. Then expand from there. &lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub + docs: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;github.com/openclaw/openclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Official site: &lt;a href="https://clawd.bot" rel="noopener noreferrer"&gt;clawd.bot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Security details: &lt;a href="https://docs.openclaw.ai/gateway/security" rel="noopener noreferrer"&gt;docs.openclaw.ai/gateway/security&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community Discord: &lt;a href="https://discord.gg/openclaw" rel="noopener noreferrer"&gt;discord.gg/openclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Health check: &lt;code&gt;openclaw doctor&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For such articles and developer content, visit:&lt;/strong&gt; &lt;a href="https://vickybytes.com/" rel="noopener noreferrer"&gt;https://vickybytes.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vickybytes</category>
      <category>openclaw</category>
      <category>automation</category>
      <category>ai</category>
    </item>
    <item>
      <title>OpenClaw: The AI Agent That Actually Does Stuff - Part 1</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Tue, 24 Feb 2026 09:30:01 +0000</pubDate>
      <link>https://forem.com/shresthapandey/openclaw-the-ai-agent-that-actually-does-stuff-part-1-51n4</link>
      <guid>https://forem.com/shresthapandey/openclaw-the-ai-agent-that-actually-does-stuff-part-1-51n4</guid>
      <description>&lt;p&gt;Let me be honest with you before we get into anything.&lt;/p&gt;

&lt;p&gt;I’ve been exploring developer tools for a while now and I've gotten pretty good at spotting how the hype cycles work. Something new drops, it trends on GitHub and Twitter, and then everyone piles onto it. I assumed OpenClaw — or Clawdbot, or Moltbot, whatever name you saw it under — was going to be exactly like that.&lt;/p&gt;

&lt;p&gt;Then I actually sat down, researched it, installed it, and spent a proper week using it as part of my daily workflow.&lt;/p&gt;

&lt;p&gt;And here we are.&lt;/p&gt;

&lt;p&gt;This blog is not just going to be a list of features. I want you to actually understand what OpenClaw is doing differently, how it works, what you can do with it, and what you should know if you’re just getting started with AI agents. By the end, you’ll know enough to decide whether this actually matters.&lt;/p&gt;

&lt;p&gt;Let’s begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Name Story (As It Changed Three Times)
&lt;/h2&gt;

&lt;p&gt;If you’ve seen Clawdbot, Moltbot, and OpenClaw and thought they were different projects, they’re not. It’s the same project, just renamed over time.&lt;/p&gt;

&lt;p&gt;In November 2025, Austrian developer Peter Steinberger built it as a small weekend experiment and called it &lt;strong&gt;Clawdbot&lt;/strong&gt;, a playful nod to Claude, with a lobster mascot to match. He open sourced it and didn’t expect much.&lt;/p&gt;

&lt;p&gt;In January 2026, it suddenly blew up, gaining around 60,000 GitHub stars in just a few days. Soon after, Anthropic raised a trademark concern because “Clawdbot” sounded too close to “Claude.”&lt;/p&gt;

&lt;p&gt;On January 27, it was renamed &lt;strong&gt;Moltbot&lt;/strong&gt;, keeping the lobster theme. A few days later, the community voted and the final name became &lt;strong&gt;OpenClaw&lt;/strong&gt;, which better reflects what it actually is.&lt;/p&gt;

&lt;p&gt;The current name is OpenClaw. Any tutorial mentioning Clawdbot or Moltbot is referring to the same project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is OpenClaw?
&lt;/h2&gt;

&lt;p&gt;Currently, most AI tools work like chatboxes. You open a tab, type a prompt, get a response, and then manually carry what you need over to wherever you’re actually doing your work. The AI lives on someone’s server, and you’re the one who connects its output to the real environment.&lt;/p&gt;

&lt;p&gt;OpenClaw takes a completely different approach.&lt;/p&gt;

&lt;p&gt;It’s a free, open-source AI agent you run yourself. Once you set it up, it lives on a machine you control — your laptop, desktop, a cloud VPS, even a Raspberry Pi 4. From then on, you communicate with it through the messaging apps you use every day: WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Microsoft Teams, Google Chat, Matrix; it supports all of them, so pick whichever you like.&lt;/p&gt;

&lt;p&gt;What distinguishes it is that it doesn’t only reply with text. It actually does things. It can browse the web, read and write files on your machine, run terminal commands, manage your calendar, draft and send emails, interact with GitHub, fill out forms, and chain all of that together in multi-step workflows that run on their own without manual intervention.&lt;/p&gt;

&lt;p&gt;In Steinberger’s words, it’s “AI that actually does things” — which differentiates it from tools that only explain how to do things.&lt;/p&gt;

&lt;p&gt;Most importantly, your data never leaves your machine unless you explicitly allow it to be sent somewhere. There’s no hidden policy change that suddenly alters how your conversations are handled. Your files, your history, your memory: all of it is stored locally in simple Markdown and YAML files that you can open in any text editor.&lt;/p&gt;

&lt;p&gt;Your machine, your data, your system.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;This is the most important section: it helps you understand the architecture — why it can do what it does, why security matters, and why it’s built the way it is. So instead of rushing, let’s break it down in a clear, simple way.&lt;/p&gt;

&lt;p&gt;OpenClaw is built from three main parts that work together.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Gateway — The Control Plane
&lt;/h3&gt;

&lt;p&gt;The Gateway is a Node.js daemon running constantly in the background, bound to &lt;code&gt;localhost&lt;/code&gt; only by default. It stays connected to all your messaging channels, sends each message to the right agent session, and handles requests one at a time for each session, in the exact order they arrive.&lt;/p&gt;

&lt;p&gt;That detail is more important than it might seem. By handling requests serially, the system avoids race conditions, where multiple tasks running at the same time might conflict over the same files or shared state.&lt;/p&gt;
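
&lt;p&gt;The serialization idea fits in a few lines of Node. This is a conceptual sketch, not OpenClaw's actual code: a per-session promise chain guarantees that, within one session, each task starts only after the previous one has finished.&lt;/p&gt;

```javascript
// Per-session serial queue: tasks for the same session never overlap,
// while tasks for different sessions still run independently.
const chains = new Map();

function enqueue(sessionId, task) {
  const prev = chains.get(sessionId) ?? Promise.resolve();
  // Chain the new task after the previous one; swallow earlier errors so
  // one failed task does not wedge the whole session forever.
  const next = prev.catch(() => {}).then(task);
  chains.set(sessionId, next);
  return next;
}
```

&lt;p&gt;Two messages to the same session can never race over shared files or state, because the second one's task simply is not started until the first resolves.&lt;/p&gt;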

&lt;p&gt;In addition to handling messages, the system also triggers a heartbeat every 30 minutes by default, or every hour if you're using Anthropic OAuth. This heartbeat allows the agent to check for any scheduled tasks, and perform monitoring on its own, even if you haven’t asked it anything recently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agent Runtime — The Thinking Loop
&lt;/h3&gt;

&lt;p&gt;The Agent Runtime is the AI loop. &lt;/p&gt;

&lt;p&gt;When a message arrives, the system first loads the session history and any long-term memory saved on disk as plain .md files, so it genuinely remembers what you told it last week. It then adds only the skills that are actually relevant to your current request, instead of loading every installed skill and slowing things down.&lt;/p&gt;

&lt;p&gt;After that, everything is sent to the AI model you have chosen, whether that is Claude, GPT-4o, DeepSeek, or a local model.&lt;/p&gt;

&lt;p&gt;If the model decides it needs to take an action, it sends back a tool call instead of a normal text reply. The system runs that tool, collects the result, and gives it back to the model. The model then decides if it has enough information or if it needs to take another step.&lt;/p&gt;

&lt;p&gt;This process continues until the model reaches a final answer, which is then streamed back to you.&lt;/p&gt;
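
&lt;p&gt;That loop is easy to picture in code. Here is a stripped-down sketch with a stubbed model and plain objects for tool calls; none of these names are OpenClaw's real API, it only illustrates the cycle described above.&lt;/p&gt;

```javascript
// Minimal agent loop: keep calling the model, executing any tool call it
// returns, and feeding the result back, until it produces a final answer.
async function agentLoop(model, tools, userMessage) {
  const messages = [{ role: "user", content: userMessage }];
  let remaining = 10; // hard cap so a confused model cannot loop forever
  while (remaining > 0) {
    remaining -= 1;
    const reply = await model(messages);
    if (reply.toolCall) {
      const result = await tools[reply.toolCall.name](reply.toolCall.args);
      messages.push({ role: "tool", content: result });
    } else {
      return reply.content; // final text answer, streamed back to the user
    }
  }
  throw new Error("too many steps");
}
```

&lt;p&gt;The important property is that the model, not your code, decides when to stop: it keeps requesting tools until it has enough information to answer.&lt;/p&gt;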

&lt;h3&gt;
  
  
  Skills — The Extension System
&lt;/h3&gt;

&lt;p&gt;OpenClaw comes with several built-in tools. It can control a real browser, which means it can actually click buttons, fill out forms, log in to websites, and interact with pages like a human. It can also read and write files, run shell commands, and handle scheduled jobs.&lt;/p&gt;

&lt;p&gt;On top of these tools, there is ClawHub, which is basically a shared community library of skills created by other users. It includes integrations for tools like Todoist, GitHub, Spotify, Obsidian, Home Assistant, Gmail, Google Calendar, Linear, and many others.&lt;/p&gt;

&lt;p&gt;That said, ClawHub is community-contributed and lightly vetted — more on this in the security section below.&lt;/p&gt;

&lt;p&gt;Skills themselves are simple. Each one is mainly a &lt;code&gt;SKILL.md&lt;/code&gt; documentation file, sometimes with an optional install script. The agent reads the documentation to understand what the skill can do, similar to onboarding a new teammate with clear written instructions instead of making them dig through code to figure things out.&lt;/p&gt;

&lt;p&gt;Another important detail is that the skill format OpenClaw uses is compatible with the AgentSkills spec, an open standard adopted by Claude Code, Cursor, VS Code, GitHub Copilot, and others. So you are not limited to just the skills available on ClawHub. It connects to a much larger ecosystem, giving you access to many more existing skills.&lt;/p&gt;
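
&lt;p&gt;For a feel of the format, here is a hypothetical &lt;code&gt;SKILL.md&lt;/code&gt; in the spirit of the spec. The frontmatter fields and wording are invented for illustration; check the AgentSkills spec for the real schema.&lt;/p&gt;

```markdown
---
name: hn-briefing            # hypothetical fields, not the official schema
description: Summarize top Hacker News stories when asked for a briefing.
---

# HN Briefing (illustrative example)

When the user asks for a news briefing, fetch https://news.ycombinator.com,
pick the most relevant dev and AI stories, and reply with a short numbered
summary that includes points and age for each story.
```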

&lt;h2&gt;
  
  
  What You Can Actually Build With It
&lt;/h2&gt;

&lt;p&gt;Here are some workflows developers are actually running:&lt;/p&gt;

&lt;p&gt;Repo management from your phone. Push to branches, check test results, and review recent commits from anywhere, without your laptop. If you have ever been traveling or commuting when something urgent happened in a repo, you already understand how useful this is.&lt;/p&gt;

&lt;p&gt;Overnight autonomous coding loops. At the end of the day, you describe what needs to be done, for example, refactor the auth module, write tests, and open a PR. By morning, the pull request is ready, along with the agent’s reasoning written in the comments. The documentation recommends using Claude for more complex, multi-step coding tasks because it handles reasoning better and is more resistant to prompt injection. Sonnet 4.6 is suggested when you want to manage costs, and Opus 4.6 when you need stronger performance.&lt;/p&gt;

&lt;p&gt;Proactive information briefings. You can connect skills to sources like GitHub issues, Hacker News, subreddits, or RSS feeds and track keywords related to your work. The agent filters the content, summarizes what matters, and sends you a digest at whatever schedule you choose. This works through the heartbeat system, so it runs automatically without you having to trigger it each time.&lt;/p&gt;

&lt;p&gt;Voice notes into structured notes. Send a voice message through Telegram, and the agent will transcribe it, pull out the key points, and save it to any tool you prefer. This process takes about 15 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters Even If You’re Just Starting Out
&lt;/h2&gt;

&lt;p&gt;Let me be honest about this. OpenClaw is not beginner friendly. You need Node.js v22 or higher, you should be comfortable using the terminal, and you need some basic understanding of networking. The documentation clearly says that if you are not confident with command line tools, this is not something you should run casually. That is not gatekeeping, it is just being realistic about the risks.&lt;/p&gt;

&lt;p&gt;That said, it is still worth learning about early in your AI journey.&lt;/p&gt;

&lt;p&gt;One big reason is transparency. OpenClaw is MIT licensed and fully open. You can read the code yourself. You can see how the Gateway handles and queues requests. You can inspect how skills are discovered and loaded. You can open the memory files on disk and see exactly what is stored. You can even look at how the system prompt is constructed before every LLM call. Here, nothing is a black box unless you choose not to look.&lt;/p&gt;

&lt;p&gt;That level of visibility helps you build a clear mental model of how agent systems actually work. You start to understand the moving parts, how memory is managed, how tools are selected, how decisions flow from one step to the next. And that understanding is not limited to this one project. As AI agents become more common in real software workflows, having that foundation will matter.&lt;/p&gt;

&lt;p&gt;If the terminal still feels intimidating, that is fine. Save it for later and revisit it when you feel more comfortable. The setup process will likely improve over time. But even before you run it yourself, the concepts behind it are worth understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security: The Non-Negotiable Part
&lt;/h2&gt;

&lt;p&gt;OpenClaw has wide access. It can read and write files, run shell commands, connect to messaging apps, check your calendar, and much more. That level of access means you need to be careful with how you set it up; security is not optional here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network access:&lt;/strong&gt; By default, the Gateway only binds to localhost, which is good. If you want to access it from your phone, use an SSH tunnel or Tailscale, which the project recommends. Do not expose port 18789 directly to the public internet. That is asking for trouble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication setup:&lt;/strong&gt; The &lt;code&gt;auth: none&lt;/code&gt; option was permanently removed in version 2026.1.29 after a real vulnerability was discovered. Crafted links could redirect your auth token to an attacker’s server, and the WebSocket server was not validating Origin headers properly.&lt;/p&gt;

&lt;p&gt;If you see any guide that still shows this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Removed — do not use

auth: none

# Use one of these instead

auth: token    # openclaw gateway token
auth: password # OPENCLAW_GATEWAY_PASSWORD=yourpassword
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update immediately if you are running an older version. Do not continue using outdated configs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ClawHub skills:&lt;/strong&gt; These are contributed by the community and are only lightly vetted. Before installing anything, read the SKILL.md file and any install.sh script. It just takes two minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt injection risks:&lt;/strong&gt; The system prompt guardrails are helpful, but they are not hard security boundaries. Real protection comes from locking down your inbound channels. Use allowlists where possible. Enable mention gating in group chats. Treat links and file attachments from unknown sources as potentially malicious.&lt;/p&gt;
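
&lt;p&gt;Locking down inbound channels is mostly ordinary filtering logic. A conceptual sketch of allowlisting plus mention gating (OpenClaw's real config and hooks will differ):&lt;/p&gt;

```javascript
// Conceptual inbound filter: only allowlisted senders get through, and in
// group chats the message must explicitly mention the bot by name.
function shouldHandle(msg, { allowlist, botName }) {
  if (!allowlist.has(msg.senderId)) return false; // allowlist gate
  if (msg.isGroup) {
    // mention gating: ignore group chatter that does not address the bot
    if (!msg.text.includes("@" + botName)) return false;
  }
  return true;
}
```

&lt;p&gt;The point is that these checks run before the model ever sees the message, which is a much harder boundary than any system-prompt instruction.&lt;/p&gt;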

&lt;p&gt;&lt;strong&gt;Run &lt;code&gt;openclaw doctor&lt;/code&gt; regularly:&lt;/strong&gt; There is a built-in diagnostic command called &lt;code&gt;openclaw doctor&lt;/code&gt;. Run it from time to time. It helps surface configuration mistakes early, before they turn into bigger security issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;What you need before installing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node.js v22+ (node --version)&lt;/li&gt;
&lt;li&gt;Terminal comfort (WSL2 on Windows — native Windows not supported)&lt;/li&gt;
&lt;li&gt;API key for your model (Claude recommended; GPT-4o or DeepSeek work too)&lt;/li&gt;
&lt;li&gt;Ideally a separate machine or cheap VPS (2GB RAM minimum, ~$4–5/month on Hetzner or DigitalOcean)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install -g openclaw@latest
openclaw onboard --install-daemon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--install-daemon&lt;/code&gt; flag installs the Gateway as a proper system service. On macOS it uses launchd, and on Linux it uses systemd. That means it will automatically start when your machine boots up. Do not skip this step, otherwise you will have to manually start it every time.&lt;/p&gt;

&lt;p&gt;The setup wizard guides you through the whole process. It helps you configure authentication, initialize your workspace, connect your first messaging channel, and install your first skill.&lt;/p&gt;

&lt;p&gt;If you are just getting started, use Telegram as your first channel. It is the simplest to set up. WhatsApp is supported too, but the linking process is more involved and can be a bit tricky the first time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended model config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;    &lt;span class="c1"&gt;# good balance of capability and cost&lt;/span&gt;
  &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;
  &lt;span class="c1"&gt;# For complex autonomous tasks: claude-opus-4-6&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this once after you finish setup, and make it a habit to run it again whenever you change the configuration.&lt;/p&gt;

&lt;p&gt;In the beginning, keep things simple. Start with one clear workflow, maybe a Telegram research assistant or a basic morning task manager. Use it for a while and understand how it responds, how it uses tools, and how it handles memory.&lt;/p&gt;

&lt;p&gt;Once you feel confident about how it behaves, then slowly expand its access and add more capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It's Headed
&lt;/h2&gt;

&lt;p&gt;On February 14, 2026, Steinberger announced that he is joining OpenAI and moving OpenClaw under an open source foundation. It will remain independent from any single company, stay MIT licensed, and operate with proper governance. The codebase is not going closed. It remains fully open.&lt;/p&gt;

&lt;p&gt;For something that is only a few months old, that kind of direction is rare. This does not look like a weekend side project that will fade away quietly. The core ideas behind it (local-first, model-agnostic, messaging-native, truly agent-driven, and fully open) are closely aligned with where modern developer tools are heading.&lt;/p&gt;

&lt;p&gt;Even if you are not planning to install it right now, it is worth understanding what it represents and how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub + docs: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;github.com/openclaw/openclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Official site: &lt;a href="https://clawd.bot/" rel="noopener noreferrer"&gt;clawd.bot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Security details: &lt;a href="https://docs.openclaw.ai/gateway/security" rel="noopener noreferrer"&gt;docs.openclaw.ai/gateway/security&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Community Discord: &lt;a href="https://discord.gg/openclaw" rel="noopener noreferrer"&gt;discord.gg/openclaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Health check: &lt;code&gt;openclaw doctor&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more articles and developer content, visit: &lt;a href="https://vickybytes.com" rel="noopener noreferrer"&gt;https://vickybytes.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vickybytes</category>
      <category>openclaw</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>What 20-Year Tech Veterans Said About Developer Skills for 2026</title>
      <dc:creator>Shrestha Pandey</dc:creator>
      <pubDate>Mon, 16 Feb 2026 17:13:21 +0000</pubDate>
      <link>https://forem.com/shresthapandey/what-20-year-tech-veterans-said-about-developer-skills-for-2026-2i5b</link>
      <guid>https://forem.com/shresthapandey/what-20-year-tech-veterans-said-about-developer-skills-for-2026-2i5b</guid>
      <description>&lt;p&gt;A few days back, I participated in a competition supported by &lt;strong&gt;VickyBytes&lt;/strong&gt;. That experience completely changed my perspective.&lt;/p&gt;

&lt;p&gt;Between the rounds, I had the opportunity to speak with several tech professionals, each with over 20 years of industry experience. Their insights on emerging technologies changed the way I think. They didn't just work in the industry; they witnessed multiple technology shifts over their careers.&lt;/p&gt;

&lt;p&gt;I asked them: "If you were starting today in 2026, what would you actually focus on?"&lt;/p&gt;

&lt;p&gt;What they told me was much more specific than what I expected. From these discussions, 10 technologies emerged as the most important for developers in 2026.&lt;/p&gt;

&lt;p&gt;Here's what I learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Agentic Orchestration &amp;amp; MCP
&lt;/h2&gt;

&lt;p&gt;Developers are now expected to build multi-agent systems using the Model Context Protocol (MCP) to integrate LLM models with various tools and data sources. It represents a major change from building a single AI chatbot to designing systems where multiple agents collaborate to perform specific tasks.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol provides a standardized way for modern AI models to communicate with tools and data sources. Instead of writing custom integration code for each tool, you implement an MCP server once, and any MCP-compatible AI can use it.&lt;/p&gt;

&lt;p&gt;This includes designing systems where different agents handle different tasks. For example: one agent monitors a data stream, another analyzes patterns, a third communicates with APIs, and a fourth makes decisions based on their input.&lt;/p&gt;

&lt;p&gt;The technical challenge is orchestration: knowing when agents should work in parallel, managing context windows across agents, handling errors, and debugging systems whose behaviour depends on the interactions between agents.&lt;/p&gt;
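
&lt;p&gt;A minimal sketch of the idea in plain Python with &lt;code&gt;asyncio&lt;/code&gt;, not a real MCP SDK: two independent agents run in parallel and a third decides on their combined output. The agent names, thresholds, and logic are invented for illustration.&lt;/p&gt;

```python
import asyncio

# Toy multi-agent pipeline: independent agents run concurrently,
# a decision agent waits for both results before acting.

async def monitor_agent():
    # Pretend to watch a data stream and report an anomaly score.
    await asyncio.sleep(0.01)
    return {"source": "stream", "anomaly_score": 0.82}

async def pattern_agent():
    # Pretend to analyze historical patterns.
    await asyncio.sleep(0.01)
    return {"source": "history", "trend": "rising"}

async def decision_agent(findings):
    # Decide based on what the other agents reported.
    score = findings[0]["anomaly_score"]
    trend = findings[1]["trend"]
    if score > 0.8 and trend == "rising":
        return "alert"
    return "ok"

async def main():
    # gather() runs the monitoring agents in parallel; results keep order.
    findings = await asyncio.gather(monitor_agent(), pattern_agent())
    return await decision_agent(findings)

print(asyncio.run(main()))  # -> alert
```

Real orchestration adds the hard parts the paragraph above mentions: per-agent error handling, timeouts, and context-window budgeting.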

&lt;p&gt;Positions like “AI Agent Architect” and “Multi-Agent Systems Engineer” are now appearing with competitive salary ranges, reflecting the shift in the job market.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Rust
&lt;/h2&gt;

&lt;p&gt;Rust has become an essential language for performance-critical and blockchain applications, mainly because of its memory-safety guarantees, as memory safety bugs are not acceptable in many fields.&lt;/p&gt;

&lt;p&gt;The ownership system is the feature that defines Rust. Every piece of memory has exactly one owner, and as soon as the owner goes out of scope, the memory is released automatically. To share data, you either transfer ownership or let someone borrow it for a defined period. Mutation of data shared across threads is checked for safety at compile time.&lt;/p&gt;

&lt;p&gt;What makes it stand out is that entire categories of bugs, such as use-after-free, buffer overflows, and null pointer dereferences, become compile-time errors instead of runtime failures. The language is built to keep these issues out of compiled code.&lt;/p&gt;
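
&lt;p&gt;A tiny example of ownership in action; the commented-out line is exactly the kind of use-after-move bug the compiler rejects:&lt;/p&gt;

```rust
fn main() {
    let owner = String::from("ownership"); // `owner` uniquely owns this heap buffer
    let n = owner.len();                   // the method call borrows `owner` briefly
    println!("{owner} is {n} bytes");      // still usable: only a borrow happened
    let new_owner = owner;                 // move: ownership transfers, nothing is copied
    // println!("{owner}");                // compile error: value used after move
    println!("{new_owner} is still {n} bytes");
} // `new_owner` goes out of scope here; the buffer is freed automatically, no GC
```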

&lt;p&gt;The learning process can be challenging. Initially, you might struggle with the compiler and try to understand its error messages. As time passes, things start to make sense. Everything feels clearer, and you start thinking about ownership and borrowing more naturally.&lt;/p&gt;

&lt;p&gt;The job market is strong. Roles in blockchain, embedded systems, cloud infrastructure, and gaming now specify Rust as a requirement, and senior positions often command high pay. Learning Rust’s memory model also helps you write better code in other languages, because it makes memory management clearer and more intentional.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Retrieval-Augmented Generation
&lt;/h2&gt;

&lt;p&gt;Expertise in vector databases like Pinecone or Weaviate, along with building retrieval pipelines, has become important for creating AI systems that use real-time and private data. RAG helps reduce AI hallucinations by making the model base its answers on real data instead of depending only on its training data.&lt;/p&gt;

&lt;p&gt;RAG architecture involves multiple components. First, documents are divided into chunks (around 200-500 words each) and then converted into vector embeddings using models that capture their meaning. These embeddings are stored in vector databases that are built to find similar content quickly, even across large amounts of data.&lt;/p&gt;
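
&lt;p&gt;The chunking step above can be sketched in a few lines of Python. A word-based splitter with overlap is one common, if simplistic, approach; real systems often split on sentences or tokens instead:&lt;/p&gt;

```python
# Minimal word-based chunker with overlap between neighbouring chunks,
# so context is not lost at chunk boundaries. Sizes are illustrative.

def chunk(text, size=200, overlap=20):
    words = text.split()
    chunks, start = [], 0
    while len(words) > start:
        chunks.append(" ".join(words[start:start + size]))
        start += size - overlap  # step back by `overlap` words each time
    return chunks

doc = " ".join(f"w{i}" for i in range(450))  # a fake 450-word document
pieces = chunk(doc)
print([len(p.split()) for p in pieces])  # [200, 200, 90]
```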

&lt;p&gt;When a query comes in, it gets converted into an embedding using the same model, and the vector database searches for the most similar chunks. Those chunks are sent to the model as context along with the query, so it can generate an answer based on real data.&lt;/p&gt;
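
&lt;p&gt;Here is the retrieval step in miniature, with a bag-of-words counter standing in for a real embedding model and a linear scan standing in for a vector database; only the shape of the pipeline is realistic:&lt;/p&gt;

```python
from collections import Counter
import math

def embed(text):
    # Stand-in "embedding": word counts instead of a learned dense vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Rust guarantees memory safety through ownership",
    "Go uses goroutines for lightweight concurrency",
    "Vector databases store embeddings for similarity search",
]
index = [(c, embed(c)) for c in chunks]  # "indexing": embed every chunk once

def retrieve(query, k=2):
    # Embed the query with the same model, rank chunks by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

top = retrieve("how do embeddings and similarity search work", k=1)
print(top[0])  # the vector-database chunk is the closest match
```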

&lt;p&gt;Retrieval pipelines often need to be hybrid because pure semantic search can miss exact matches while pure keyword search can miss conceptual relationships. Strong systems usually combine both, with a reranking step using a smaller model to improve the final results.&lt;/p&gt;
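
&lt;p&gt;One common way to merge keyword and semantic rankings into a single list is reciprocal rank fusion (RRF); the sketch below uses made-up document IDs and rankings:&lt;/p&gt;

```python
# Reciprocal rank fusion: each ranking contributes 1/(k + rank) per document,
# so documents that rank well in BOTH lists float to the top. k=60 is the
# constant commonly used in the RRF literature.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_c", "doc_b"]   # exact-match ranking
semantic_hits = ["doc_b", "doc_a", "doc_d"]  # embedding-similarity ranking

print(rrf([keyword_hits, semantic_hits]))  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

A reranking model, as mentioned above, would then rescore just this fused shortlist.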

&lt;p&gt;Choosing a vector database comes with tradeoffs. Managed solutions like Pinecone are easier to use, while self-hosted options like Weaviate or Qdrant give more control and can be cheaper at large scale. But the hard part isn’t only the tools, it’s understanding how embeddings work and how to write good prompts for RAG. Systems should be designed to recognize when the retrieved context isn’t enough, instead of giving wrong answers confidently.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Platform Engineering 2.0
&lt;/h2&gt;

&lt;p&gt;Developers are moving from traditional DevOps to Internal Developer Platforms (IDPs). IDPs allow self-service infrastructure and also include AI-driven protection mechanisms to prevent mistakes. This is the point where infrastructure stops being something every developer must understand in detail and becomes a product they can simply consume.&lt;/p&gt;

&lt;p&gt;The aim is to simplify the infrastructure complexity while still retaining flexibility. Developers should be able to deploy their services easily without having to know the details of Kubernetes. Monitoring should come ready to use. Security and compliance should be part of the system by default.&lt;/p&gt;

&lt;p&gt;Strong platforms give developers self-service tools that already follow the company’s best practices. Instead of handing developers cloud credentials and documentation, you give them simple interfaces that guide them toward the right choices and make incorrect ones difficult.&lt;/p&gt;

&lt;p&gt;Integration of AI guardrails is something that’s evolved in 2026. Platform teams are now creating systems to manage AI usage. They handle things like prompt management, rate-limiting for LLM calls, and preventing sensitive data from reaching external APIs.&lt;/p&gt;

&lt;p&gt;This also applies to model deployment. Fine-tuned models are versioned and deployed using the same pipelines as application code. A/B testing is built into the platform, and monitoring tracks errors automatically. If certain limits are exceeded, the system can roll back changes on its own.&lt;/p&gt;

&lt;p&gt;This is a mix of having strong infrastructure skills and a product mindset. You are creating services that other engineers use on a daily basis. Thus, it is important to understand how these services are used, where they fail, and how you can improve them based on usage patterns. It is not just about the technology, but whether people are using the platform and whether it helps them move faster.&lt;/p&gt;

&lt;p&gt;Platform engineering roles show up more often in job listings, and they often pay more than senior backend roles. The reason is simple: a small platform team can boost the productivity of the entire organization, making a much bigger impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Go (Golang) for Cloud-Native
&lt;/h2&gt;

&lt;p&gt;Go is still a top choice for building microservices and Kubernetes tools because it handles concurrency in a scalable way. Most of the cloud-native world like Kubernetes, Docker, Terraform, is built with Go, so if you work in cloud infrastructure, you’re very likely to run into it.&lt;/p&gt;

&lt;p&gt;The main strength of Go lies in goroutines, making concurrency simple. Instead of managing threads or dealing with complex patterns, you start a goroutine for each task. The Go runtime handles the complex part, distributing thousands of goroutines across a small number of operating system threads.&lt;/p&gt;

&lt;p&gt;This makes it much easier to build services that handle multiple requests at the same time. The code looks simple and organized, even while managing thousands of operations. The garbage collector is designed for low delays, even when the system is under heavy load. &lt;/p&gt;
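
&lt;p&gt;The pattern looks like this in practice: a minimal fan-out with &lt;code&gt;sync.WaitGroup&lt;/code&gt;, one goroutine per task, with placeholder task names standing in for real requests:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// runAll fans out one goroutine per task and waits for all of them:
// the everyday concurrency pattern behind most Go services.
func runAll(tasks []string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		go func(i int, t string) { // one lightweight goroutine per task
			defer wg.Done()
			results[i] = t + ":done" // each goroutine writes only its own slot
		}(i, t)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	fmt.Println(runAll([]string{"req-1", "req-2", "req-3"}))
	// [req-1:done req-2:done req-3:done]
}
```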

&lt;p&gt;The language is very simple. There’s no complex inheritance, no operator overloading and very little hidden behaviour. This keeps code easy to read even months or years later, and helps new members become productive faster.&lt;/p&gt;

&lt;p&gt;The tooling is strong and consistent. &lt;code&gt;go fmt&lt;/code&gt; removes arguments about code style. &lt;code&gt;go test&lt;/code&gt; handles testing and benchmarking. &lt;code&gt;go build&lt;/code&gt; creates a single static binary with no runtime dependencies, making deployment much easier.&lt;/p&gt;

&lt;p&gt;The ecosystem is well established. There are solid web frameworks, database drivers, gRPC support, and a reliable standard library. In most cases, what you need is already there, so you’re not constantly searching for missing pieces.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. TypeScript &amp;amp; Type-Safe Frontends
&lt;/h2&gt;

&lt;p&gt;TypeScript has become essential for building enterprise-grade web applications, with frameworks like Next.js and NestJS leading the way. What began as “JavaScript with types” is now the standard choice for serious web development.&lt;/p&gt;

&lt;p&gt;The most important advantage is that errors can be detected at compile time, not at runtime. Type checking ensures that functions are called properly, that properties of objects exist, and that values such as null or undefined are handled intentionally.&lt;/p&gt;

&lt;p&gt;The ecosystem has consolidated around TypeScript, and the tools are designed to integrate smoothly with it. Capabilities such as auto-completion, refactoring, and inline documentation become more accurate when type information is present in the code.&lt;/p&gt;

&lt;p&gt;Types can be used to reflect business rules, making sure certain values aren’t mixed up, that some actions only work on validated data, and that API responses match the expected structure. Editor integration changes how you write code. Autocomplete doesn’t just suggest function names, it also shows parameter types and return values. &lt;/p&gt;
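
&lt;p&gt;A small sketch of types as business rules; the &lt;code&gt;Order&lt;/code&gt; shape and status values are invented for illustration, but the compile-time guarantees are real:&lt;/p&gt;

```typescript
// A status can only be one of three literal values: any typo or
// unexpected state is rejected before the code ever runs.
type OrderStatus = "pending" | "paid" | "shipped";

interface Order {
  id: string;
  status: OrderStatus;
  totalCents: number; // integer cents, so float rounding never corrupts money
}

function canShip(order: Order): boolean {
  // Missing fields, null, or a misspelled status are compile-time errors here.
  return order.status === "paid";
}

const order: Order = { id: "o-1", status: "paid", totalCents: 1999 };
console.log(canShip(order)); // true
// canShip({ id: "o-2" });   // rejected by the compiler: missing fields
```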

&lt;p&gt;For teams, the advantages build over time. Types make the intent of the code clearer, which speeds up the reviews. Junior developers get quick feedback from the compiler, instead of finding errors later in production.&lt;/p&gt;

&lt;p&gt;The switch doesn’t have to happen all at once. You can move from JavaScript to TypeScript one file at a time and tighten the rules gradually. There’s no need to commit to full strict mode on day one. The job market shows the dominance of TypeScript. Most senior frontend roles now expect TypeScript experience. It’s not a niche skill anymore, it’s simply the baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Shift-Left Security (DevSecOps)
&lt;/h2&gt;

&lt;p&gt;Security is now integrated early in the development process. Developers must be proficient in automated threat modeling and secure code development right in their IDEs. This is a paradigm shift from security being the final gate to security being a natural part of the development process.&lt;/p&gt;

&lt;p&gt;This approach adds security checks throughout the development process. IDEs flag potential vulnerabilities as you code. Pre-commit hooks catch problems before they reach version control, and CI/CD pipelines run deeper scans before code review. Security becomes ongoing feedback instead of a last-minute check.&lt;/p&gt;

&lt;p&gt;Modern tooling covers several layers. Secret scanners catch API keys, passwords, or tokens in commits and history. Dependency scanners watch for known vulnerabilities and can automate updates. Static analysis tools spot issues like SQL injection or XSS directly in the code.&lt;/p&gt;
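
&lt;p&gt;A toy pre-commit-style secret scan shows the idea; the two regex patterns below are illustrative only, nowhere near the coverage of a real tool such as gitleaks:&lt;/p&gt;

```python
import re

# Two illustrative secret patterns: the AWS access-key-ID shape,
# and a hardcoded api_key assignment.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"(?i)api[_-]?key\s*=\s*['\"]\w{16,}['\"]"),
]

def scan(diff_text):
    # Return the 1-based line numbers that look like leaked secrets.
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append(lineno)
    return findings

diff = 'timeout = 30\napi_key = "abcdefgh12345678"\n'
print(scan(diff))  # [2] -- the hardcoded key is flagged before commit
```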

&lt;p&gt;Secure-by-default libraries make safety part of everyday coding. Database query builders block SQL injection by design, and HTTP clients manage authentication and rate limits automatically. Using these tools means security happens naturally, as part of normal development.&lt;/p&gt;

&lt;p&gt;The business case is clear. Fixing security problems after code is in production is far more expensive than catching them early. Data breaches can lead to huge costs from remediation, fines and lost customer trust. Investing in security tools is small compared to the potential losses.&lt;/p&gt;

&lt;p&gt;Security champions are developers trained in security who act as points of contact for the security team and help spread best practices across larger teams. Instead of the security team reviewing everything, each development team has someone with deeper security knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. WebAssembly (WASM)
&lt;/h2&gt;

&lt;p&gt;Languages such as C++ and Rust can now run almost as fast as native code in browsers. This opens the way for complex web-based AI and gaming applications. WebAssembly enables developers to bring their code and performance-critical paths to the web.&lt;/p&gt;

&lt;p&gt;The speed is quite remarkable. For heavy computations like 3D graphics, video encoding, scientific simulations, and cryptography, WebAssembly offers speeds that are almost comparable to native code, all while staying safely within the browser boundaries.&lt;/p&gt;

&lt;p&gt;Real-world use cases demonstrate this capability. Graphics design software, CAD tools, and image editors now run inside browsers at speeds that previously required desktop applications. Even game engines compile to WebAssembly, delivering near-console-quality performance right inside the browser.&lt;/p&gt;

&lt;p&gt;The security architecture is sound. WASM runs in the same sandbox as JavaScript, with no filesystem or network access unless explicitly granted. Unlike JavaScript, WASM modules explicitly list their imports and exports, which makes it easier to reason about their capabilities.&lt;/p&gt;

&lt;p&gt;In practice, WASM takes care of the heavy lifting, and JavaScript handles the UI and platform interactions. The separation is clean, with WASM exporting functions for computation and JavaScript providing access to browser APIs.&lt;/p&gt;

&lt;p&gt;The tooling has improved a lot. Emscripten compiles C and C++ to WASM with strong browser support. Rust has first-class WASM support with excellent tools, and other languages are gradually adding WASM targets as well.&lt;/p&gt;

&lt;p&gt;There are some limitations. WASM can’t access the DOM directly, so JavaScript is needed for UI work. Working with mixed WASM and JavaScript projects also means managing two languages and separate build systems.&lt;/p&gt;

&lt;p&gt;Beyond performance, WASM provides a secure sandbox for running untrusted code, and plugin systems can leverage WASM to add extensibility while keeping strong security boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Telemetry &amp;amp; Observability Engineering
&lt;/h2&gt;

&lt;p&gt;The focus in modern systems is moving beyond basic logging toward full observability, often using OpenTelemetry and AI-driven debugging. Traditional logs just aren’t enough for today’s complex distributed systems.&lt;/p&gt;

&lt;p&gt;Observability is built on structured telemetry: metrics to show what is happening, traces to show how requests are flowing through the system, and logs to provide additional information. These three pillars must be unified so that events can be correlated across the whole system.&lt;/p&gt;

&lt;p&gt;The model is becoming proactive instead of reactive. Instead of looking back at failures after they happen, observability should call attention to problems as they occur, warning of performance degradation before it becomes an outage.&lt;/p&gt;

&lt;p&gt;OpenTelemetry standardizes instrumentation and data formats, making telemetry platform-agnostic and avoiding vendor lock-in. Context propagation, passing trace IDs and other context through every service, is critical to reconstructing the whole picture of request processing.&lt;/p&gt;
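
&lt;p&gt;Context propagation in miniature, sketched without any real OpenTelemetry API: a trace ID minted once at the edge is attached to every structured log line, so lines from different services can be correlated later. The service names are invented:&lt;/p&gt;

```python
import json
import uuid

def log(service, message, trace_id):
    # Structured log line: machine-parseable, always carrying the trace ID.
    return json.dumps({"service": service, "trace_id": trace_id, "msg": message})

def handle_request():
    trace_id = str(uuid.uuid4())  # minted once, at the edge of the system
    return [
        log("gateway", "request received", trace_id),
        log("orders", "order validated", trace_id),  # same ID at the next hop
        log("billing", "charge created", trace_id),
    ]

for line in handle_request():
    print(line)
```

Grouping by `trace_id` now reconstructs the full request path, which is exactly what a trace backend does at scale.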

&lt;p&gt;The most important skill is knowing what to observe. Too much telemetry generates noise: unnecessary logs and metrics nobody looks at. Good observability is centered on the signals that indicate real issues or give insight into system behaviour.&lt;/p&gt;

&lt;p&gt;Instrumentation is now a core developer responsibility. Developers must decide which metrics matter, structure traces to provide actionable insights, and write logs that help debugging rather than adding unnecessary data.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. FinOps for Developers
&lt;/h2&gt;

&lt;p&gt;With the volatility of AI and cloud costs, developers now need to write code with costs in mind and include budget checks in their deployment processes. Infrastructure costs aren’t just an operations issue anymore, they’re part of development.&lt;/p&gt;

&lt;p&gt;The problem is that cloud costs can change and are hard to predict. Running AI models can become very expensive as usage grows. Auto-scaling can add resources quickly to handle load, which can lead to higher bills.&lt;/p&gt;

&lt;p&gt;Cost-aware development means knowing how technical choices affect money. Picking a database isn’t only about features, it’s about understanding how much it will cost. Choosing compute resources means balancing speed and performance against budget limits.&lt;/p&gt;

&lt;p&gt;Good teams include cost checks in their development pipeline. Tools can estimate costs before deployment. Some AI models are much more expensive than others, and this matters when millions of requests are involved.&lt;/p&gt;
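
&lt;p&gt;A toy budget gate illustrates the idea; the model names, per-token prices, and traffic numbers below are made-up placeholders, not real vendor pricing:&lt;/p&gt;

```python
# Estimate monthly LLM spend from expected traffic and fail the check
# if it exceeds the budget. All figures are illustrative placeholders.

PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.015}

def monthly_cost(model, requests_per_day, tokens_per_request):
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1000 * PRICE_PER_1K_TOKENS[model]

def budget_check(model, requests_per_day, tokens_per_request, budget):
    # The kind of gate a CI pipeline could run before deployment.
    cost = monthly_cost(model, requests_per_day, tokens_per_request)
    return {"cost": round(cost, 2), "within_budget": budget >= cost}

print(budget_check("large-model", 10_000, 1_500, budget=5_000))
# With these placeholder prices, the same traffic on "small-model" costs 30x less.
```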

&lt;p&gt;Auto-scaling must be set up carefully. Fast scaling reacts to traffic quickly but can raise costs a lot. Slow scaling saves money but can hurt performance. The right setup balances speed and cost while monitoring both metrics.&lt;/p&gt;

&lt;p&gt;Communication is as important as technical skill. Developers must be able to explain why some features are more expensive to run, and being able to estimate costs before building helps prioritize work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Common Thread
&lt;/h2&gt;

&lt;p&gt;These ten technologies share a common characteristic: they exist to handle complexity. Modern systems are distributed, AI-dependent, need strong security, and consume a lot of resources. Old ways of building software can’t always keep up with these demands.&lt;/p&gt;

&lt;p&gt;The change isn’t just about new tools, it’s about how software is created. Security has to be ongoing, not just a final step. Costs should be planned during the start, not added after issues appear. AI systems need careful orchestration, not just simple prompting. &lt;/p&gt;

&lt;p&gt;These changes aren’t short-term trends. They reflect a deeper shift in how software is designed, built, and managed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;These technologies can feel overwhelming at first. The key is to focus: pick one out of them that fits your current work. If you’re building AI systems, try RAG or MCP. For infrastructure work, look into Go or Platform Engineering. If you work with web apps, focus on TypeScript and security tools.&lt;/p&gt;

&lt;p&gt;Start by building something simple. Learning through tutorials is one thing, but learning through experience is what builds actual skills. Choose a project where you apply the technology to solve actual problems. You will encounter real-world problems, make errors, and learn through debugging.&lt;/p&gt;

&lt;p&gt;Learn from people who have already worked with these technologies. Seeing why they made certain choices, what issues they faced, and what they’d do differently gives insights that documentation alone can’t provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Technology keeps changing fast. These ten technologies matter because they solve real problems in large-scale production systems. They represent a new way of thinking about software, not just new tools in the old way of working.&lt;/p&gt;

&lt;p&gt;The skills you gain from learning them go beyond the tools themselves. Understanding Rust’s ownership model improves memory-management thinking in any language. Working with RAG systems builds knowledge of information retrieval and prompt design, useful across AI projects. Experience in Platform Engineering applies whenever you need to improve developer workflows.&lt;/p&gt;

&lt;p&gt;These technologies aren’t about keeping up with trends, they’re about giving you the skills to build the systems that matter today and in the future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Edited with AI assistance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>career</category>
      <category>developer</category>
    </item>
  </channel>
</rss>
