<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: IBM Developer</title>
    <description>The latest articles on Forem by IBM Developer (@ibmdeveloper).</description>
    <link>https://forem.com/ibmdeveloper</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F532%2F55bb82ef-84af-4295-b4a9-6731d5b83e1d.png</url>
      <title>Forem: IBM Developer</title>
      <link>https://forem.com/ibmdeveloper</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ibmdeveloper"/>
    <language>en</language>
    <item>
      <title>Java 25: What's new for developers?</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Fri, 10 Oct 2025 12:58:08 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/java-25-whats-new-for-developers-bjf</link>
      <guid>https://forem.com/ibmdeveloper/java-25-whats-new-for-developers-bjf</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/java-whats-new-java25/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; by Alex Soto Bueno.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Java 25 was released on September 16, 2025. This version is important because it is an LTS (Long Term Support) release, the next one after Java 21. As an LTS release, it will receive at least five years of support, updates, and bug fixes.&lt;/p&gt;

&lt;p&gt;This new version contains multiple enhancements, both for developers and for the runtime.&lt;/p&gt;

&lt;p&gt;In this article, we'll focus on stable and preview features for developers. We won't cover incubator features, as they will likely change in future releases. (The Vector API remains in the incubator stage for the tenth time, waiting for the release of Project Valhalla.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Major features and enhancements
&lt;/h2&gt;

&lt;p&gt;These are some of the most significant additions, changes, and previews in Java 25:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 507: Primitive Types in Patterns, &lt;code&gt;instanceof&lt;/code&gt;, and &lt;code&gt;switch&lt;/code&gt; (Third Preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You can use &lt;em&gt;primitive types&lt;/em&gt; in pattern matching, &lt;code&gt;instanceof&lt;/code&gt;, and &lt;code&gt;switch&lt;/code&gt; constructs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 511: Module Import Declarations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A module import lets you access every class from the packages that the module exports.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 512: Compact Source Files and Instance Main Methods&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You can write a &lt;code&gt;main&lt;/code&gt; method without needing to create a class.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 513: Flexible Constructor Bodies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;super&lt;/code&gt; and &lt;code&gt;this&lt;/code&gt; calls no longer have to be the first statement in a constructor, which makes things like validating arguments before construction easier.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 506: Scoped Values&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A simpler way to share immutable variables across a thread and its child threads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 510: Key Derivation Function API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides standard APIs in the JDK for deriving cryptographic keys (PBKDF2, Argon2, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 519: Compact Object Headers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reduces the size of object headers (metadata per object) to make objects use less memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 521: Generational Shenandoah Garbage Collector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Makes the Shenandoah GC use a generational model (young/old generations), improving GC performance, especially for programs with many short-lived allocations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JFR Enhancements&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Improvements to JDK Flight Recorder, like CPU-time profiling, cooperative sampling, method timing, and tracing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 502: Stable Values (Preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides a mechanism to define values that are set once and then treated as constants by the JVM, allowing optimizations beyond what final fields offer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JEP 470: PEM Encodings for Cryptographic Objects (Preview)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports encoding and decoding cryptographic keys and certificates in PEM format.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  New stable features for developers
&lt;/h2&gt;

&lt;p&gt;These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Module import declarations&lt;/li&gt;
&lt;li&gt;Flexible constructor bodies&lt;/li&gt;
&lt;li&gt;Scoped values&lt;/li&gt;
&lt;li&gt;Key derivation function (KDF) API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/java-whats-new-java25/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to learn about the stable and preview features in this new LTS release of Java...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>java</category>
    </item>
    <item>
      <title>Build context-aware AI apps using MCP</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Wed, 23 Jul 2025 10:46:32 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/build-context-aware-ai-apps-using-mcp-3mhg</link>
      <guid>https://forem.com/ibmdeveloper/build-context-aware-ai-apps-using-mcp-3mhg</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This tutorial was originally published on &lt;a href="https://developer.ibm.com/tutorials/mcp-watsonx/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Large language models (LLMs) have transformed how developers build applications, but they face a fundamental limitation: they operate in isolation from the data and tools that make applications truly useful. Whether it's accessing your company's database, reading files from your filesystem, or connecting to APIs, LLMs need a standardized way to interact with external systems.&lt;/p&gt;

&lt;p&gt;The Model Context Protocol (MCP) addresses these limitations by providing a standardization layer that lets AI agents be context-aware while integrating with external data and tools. Learn more about what MCP is, its client-server architecture components, and its real-world benefits in this “&lt;a href="https://www.ibm.com/think/topics/model-context-protocol" rel="noopener noreferrer"&gt;What is MCP?&lt;/a&gt;” article or in the &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer nofollow"&gt;MCP docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The following figure shows the typical MCP architecture, with MCP hosts, MCP clients, MCP servers, and your own data and tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa22ili0211qbtn3qn63u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa22ili0211qbtn3qn63u.png" alt=" " width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this comprehensive tutorial, we'll explore MCP and learn how to build a production-ready integration with IBM watsonx.ai, demonstrating how to create AI applications that can seamlessly connect to enterprise data and services.&lt;/p&gt;
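Before the full tutorial, it can help to see the core idea in miniature. The sketch below is a hypothetical, dependency-free stand-in for an MCP server's tool listing and tool calling, not the real MCP SDK: the tool name, fake database, and message shapes are invented for illustration, and real MCP implementations additionally handle transports, schemas, and sessions.

```python
import json

# Toy stand-in for an MCP-style server: it advertises tools and dispatches
# JSON-RPC-style requests. Everything here is illustrative, not the real SDK.

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_order_status(order_id: str) -> str:
    # Stand-in for a real enterprise data source (database, API, filesystem).
    fake_db = {"A-100": "shipped", "A-101": "processing"}
    return fake_db.get(order_id, "unknown")

def handle_request(raw: str) -> str:
    """Dispatch a request like {"method": "tools/call", "params": {...}}."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = sorted(TOOLS)
    elif req["method"] == "tools/call":
        p = req["params"]
        result = TOOLS[p["name"]](**p["arguments"])
    else:
        result = None
    return json.dumps({"id": req.get("id"), "result": result})

print(handle_request('{"id": 1, "method": "tools/list"}'))
print(handle_request('{"id": 2, "method": "tools/call", "params": {"name": "get_order_status", "arguments": {"order_id": "A-100"}}}'))
```

The point of the standardization is exactly this shape: any host that speaks the protocol can discover and call any registered tool, regardless of what data source sits behind it.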

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/tutorials/mcp-watsonx/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to learn how to build context-aware AI applications using MCP with Granite models...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>llm</category>
      <category>genai</category>
    </item>
    <item>
      <title>Build a RAG-powered assistant</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Wed, 16 Jul 2025 11:49:09 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/build-a-rag-powered-assistant-4h4h</link>
      <guid>https://forem.com/ibmdeveloper/build-a-rag-powered-assistant-4h4h</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This tutorial was originally published on &lt;a href="https://developer.ibm.com/tutorials/build-rag-assistant-md-documentation/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine you’re heads-down on a project, searching a GitHub repository’s Markdown files for that one small unit test command or an elusive detail about configuring an API. You’re flipping between READMEs, wikis, and scattered “docs” folders, losing time and patience. What if you could just &lt;em&gt;ask&lt;/em&gt; your documentation "How do I run a single unit test from the suite?" or "What’s the retry policy for the endpoint?" and get a precise, context-aware answer in seconds? This is where Retrieval-Augmented Generation (RAG) can help make your documentation conversational.&lt;/p&gt;

&lt;p&gt;In this tutorial, we’ll build an intelligent documentation assistant that lets you chat with your project’s Markdown documentation (those &lt;code&gt;.md&lt;/code&gt; files you see in GitHub, like READMEs). Using JavaScript, a tool called &lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, and the IBM &lt;a href="https://www.ibm.com/granite" rel="noopener noreferrer"&gt;Granite model&lt;/a&gt; via &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, we’ll create a &lt;a href="https://en.wikipedia.org/wiki/Command-line_interface" rel="noopener noreferrer"&gt;command-line interface (CLI)&lt;/a&gt; that connects to your GitHub repository, pulls in your documentation, and answers your questions in plain language. It’s like having a super-seasoned teammate who knows every word of your project’s docs, akin to a pair-programming buddy in your day-to-day workflow.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why Markdown?&lt;/em&gt; Markdown is the lingua franca of developer documentation: lightweight, readable, packed with critical info, and ubiquitous in GitHub repos, where it is the go-to format for project documentation. Our assistant makes these files interactive, saving you time and frustration. It’s the perfect starting point for a RAG-powered assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here’s what we’re building
&lt;/h2&gt;

&lt;p&gt;We’re creating a command-line assistant that lets you chat with any Markdown file, instantly turning your documentation into an interactive, AI-powered resource.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ask Questions, Get Answers&lt;/strong&gt;: Provide a public URL to a markdown file (like a README or guide from GitHub). The assistant downloads and processes it, so you can ask questions about its content in natural language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI-Powered, Contextual Responses&lt;/strong&gt;: When you ask a question, the assistant searches the document for the most relevant sections and uses a local large language model (IBM Granite 3.3 via Ollama) to generate accurate, context-aware answers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Complex Setup&lt;/strong&gt;: There’s no need to clone repositories, manage tokens, or set up databases. Just paste a markdown file URL and start chatting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Proof of Concept&lt;/strong&gt;: This demo focuses on a single markdown file to showcase the Retrieval-Augmented Generation (RAG) workflow. The design is simple, but the approach can be extended to entire documentation sites, web chat interfaces, or large-scale document search.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
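The retrieval half of that workflow can be sketched without any dependencies. The toy below splits a Markdown document at headings and uses word overlap as a stand-in for the vector similarity that LangChain and Granite embeddings provide in the actual tutorial; the sample document and scoring are invented for illustration.

```python
import re

# Dependency-free sketch of the retrieval step only: chunk a Markdown document
# at headings, then pick the chunk most relevant to the question. Word overlap
# stands in for embedding similarity so the flow is easy to see.

DOC = """
# MyProject
## Testing
Run a single unit test with: npm test -- --grep "name of test"
## API
The endpoint retries failed calls three times with exponential backoff.
"""

def split_into_chunks(markdown: str) -> list[str]:
    """Split a Markdown document into chunks at heading boundaries."""
    chunks = re.split(r"(?m)^#{1,6} ", markdown)
    return [c.strip() for c in chunks if c.strip()]

def score(question: str, chunk: str) -> int:
    """Toy relevance score: how many question words appear in the chunk."""
    q_words = set(re.findall(r"\w+", question.lower()))
    c_words = set(re.findall(r"\w+", chunk.lower()))
    return len(q_words & c_words)

def retrieve(question: str, markdown: str) -> str:
    """Return the most relevant chunk; a real RAG app passes it to the LLM."""
    return max(split_into_chunks(markdown), key=lambda c: score(question, c))

best = retrieve("How do I run a single unit test?", DOC)
print(best.splitlines()[0])  # heading of the winning chunk: "Testing"
```

In the real assistant, the winning chunks are placed into the model's prompt as context, which is what turns static documentation into something you can question.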

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/tutorials/build-rag-assistant-md-documentation/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to learn how to build a RAG-powered Markdown documentation assistant...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>genai</category>
    </item>
    <item>
      <title>Build a RAG-based AI assistant with Quarkus and LangChain</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Thu, 10 Jul 2025 17:16:28 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/build-a-rag-based-ai-assistant-with-quarkus-and-langchain-2mom</link>
      <guid>https://forem.com/ibmdeveloper/build-a-rag-based-ai-assistant-with-quarkus-and-langchain-2mom</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This tutorial was originally published on &lt;a href="https://developer.ibm.com/tutorials/build-ai-assistant-quarkus-langchain/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enterprise Java developers are familiar with building robust, scalable applications using frameworks like Spring Boot. However, integrating AI capabilities into these applications often involves complex orchestration, high memory usage, and slow startup times.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll discover how Quarkus combined with LangChain4j provides a seamless way to build AI-powered applications that start in milliseconds and consume minimal resources. &lt;a href="https://quarkus.io/" rel="noopener noreferrer nofollow"&gt;Quarkus&lt;/a&gt; is a Kubernetes-native Java stack tailored for &lt;a href="https://www.graalvm.org/" rel="noopener noreferrer nofollow"&gt;GraalVM&lt;/a&gt; and &lt;a href="https://openjdk.org/groups/hotspot/" rel="noopener noreferrer nofollow"&gt;OpenJDK HotSpot&lt;/a&gt;. Quarkus offers incredibly fast boot times, low RSS memory consumption, and a fantastic developer experience with features like live coding.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll build a smart document assistant that can ingest PDF documents, create embeddings, and answer questions about the content using retrieval augmented generation (RAG). The application will demonstrate enterprise-grade features like dependency injection, health checks, metrics, and hot reload development, all while integrating cutting-edge AI capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG and why you need it
&lt;/h2&gt;

&lt;p&gt;The retrieval augmented generation (RAG) pattern is a way to extend the knowledge of an LLM used in your applications. While models are pre-trained on large data sets, they have static and general knowledge with a specific knowledge cut-off date. They don't "know" anything beyond that date, nor do they contain any company- or domain-specific data you might want to use in your applications. The RAG pattern allows you to infuse knowledge via the context window of your models and bridges this gap.&lt;/p&gt;

&lt;p&gt;There are a few steps that need to happen. First, the domain-specific knowledge from documents (txt, PDF, and so on) is parsed, split, and converted into vectors with a so-called embedding model. The generated vectors are stored in a vector database. When a user queries the LLM, the vector store is searched with a similarity algorithm, and relevant content is added as context to the user prompt before it is passed on to the LLM. The LLM then generates the answer based on its foundational knowledge and the additional context augmented from the vector search. A high-level overview is shown in the following figure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffljhgu7j11d2pgw8hm84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffljhgu7j11d2pgw8hm84.png" alt=" " width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Learning objectives
&lt;/h2&gt;

&lt;p&gt;By the end of this tutorial, you'll be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up a Quarkus project with LangChain4j for AI integration&lt;/li&gt;
&lt;li&gt;Create declarative AI services using CDI and annotations&lt;/li&gt;
&lt;li&gt;Implement document ingestion and vector embeddings&lt;/li&gt;
&lt;li&gt;Build a RAG (Retrieval Augmented Generation) question-answering system&lt;/li&gt;
&lt;li&gt;Experience Quarkus development mode with instant hot reload&lt;/li&gt;
&lt;li&gt;Deploy as a native executable with sub-second startup times&lt;/li&gt;
&lt;li&gt;Monitor AI operations with built-in observability features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/tutorials/build-ai-assistant-quarkus-langchain/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to learn how to build your AI-powered document assistant...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>quarkus</category>
      <category>langchain</category>
      <category>ai</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Build Mainframe skills faster with watsonx Assistant for Z</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Mon, 07 Jul 2025 08:38:07 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/build-mainframe-skills-faster-with-watsonx-assistant-for-z-3pbm</link>
      <guid>https://forem.com/ibmdeveloper/build-mainframe-skills-faster-with-watsonx-assistant-for-z-3pbm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/awb-build-mainframe-skills-watsonx-assistant-z/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; by Sethulekshmi K&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Mainframes are known for handling large workloads with high scalability, reliability, and security. They continue to be essential for many industries, including most of the world’s top banks, airlines, and retailers.&lt;/p&gt;

&lt;p&gt;Over the past 20 years, Mainframes have evolved from legacy systems to modern platforms. Today, they support containerization, DevOps, artificial intelligence, machine learning, and generative AI. The latest IBM Z processors take this further by enabling fast, low-latency generative AI directly on the Mainframe.&lt;/p&gt;

&lt;p&gt;Many companies face a shortage of Mainframe experts. Experienced workers are retiring, new talent is hard to find, and Mainframes require specialized skills. The key question is: how can organizations attract new talent and pass on knowledge before it’s lost?&lt;/p&gt;

&lt;p&gt;According to Gartner, using generative AI with Mainframe systems can help address the skills gap. This creates an important opportunity for IBM to offer a solution.&lt;/p&gt;

&lt;p&gt;IBM watsonx Assistant for Z is a generative AI tool designed to help solve Mainframe challenges. This article explains what it can do and how it helps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simplify Mainframe management with IBM watsonx Assistant for Z
&lt;/h2&gt;

&lt;p&gt;IBM watsonx Assistant for Z is a chatbot designed to make Mainframe tasks easier. It provides expert guidance, automates complex steps, and helps both new and experienced IBM Z users.&lt;/p&gt;

&lt;p&gt;The assistant speeds up learning, answers technical questions, and can be customized with your company’s own information.&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes it different from other chatbots?
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Smarter conversations with AI
&lt;/h3&gt;

&lt;p&gt;watsonx Assistant for Z gives fast, accurate answers by using both IBM Z documentation and your company’s own knowledge. It uses &lt;a href="https://www.ibm.com/think/topics/retrieval-augmented-generation" rel="noopener noreferrer"&gt;retrieval augmented generation (RAG)&lt;/a&gt; to pull information from trusted sources, so responses are reliable and relevant.&lt;/p&gt;

&lt;p&gt;The system includes content from over 200 IBM Z products, including IBM Redbooks and technical papers. You can also add your own documents, such as internal guides and best practices, to customize the answers for your team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m0ej9z5mt45sbde81m5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9m0ej9z5mt45sbde81m5.png" alt="Conversational AI" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The assistant uses the granite-3-8b-instruct language model, which is trained on a large and carefully selected dataset. IBM follows strict rules for data quality, filtering out misleading content to improve accuracy and reduce errors in responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connects with your tools and systems
&lt;/h3&gt;

&lt;p&gt;watsonx Assistant for Z works with many platforms that support OpenAPI, such as Ansible, AIOps tools, z/OS Management Facility (z/OSMF), and ServiceNow. You can follow the assistant’s suggestions and run tasks directly from the chat without needing deep Mainframe knowledge.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/awb-build-mainframe-skills-watsonx-assistant-z/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to learn more about customizing watsonx Assistant for Z for your team, benefits and use cases, and how IBM’s CIO office used watsonx Assistant for Z to improve Mainframe performance and support new users.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>mainframe</category>
      <category>ai</category>
      <category>developertools</category>
      <category>automation</category>
    </item>
    <item>
      <title>Why sharding is essential to fine-tuning LLMs</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Wed, 02 Jul 2025 14:35:35 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/why-sharding-is-essential-to-fine-tuning-llms-4he4</link>
      <guid>https://forem.com/ibmdeveloper/why-sharding-is-essential-to-fine-tuning-llms-4he4</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/llms-sharding/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Training and fine-tuning large language models (LLMs) is becoming a central requirement for modern AI applications. As these models grow in size—from billions to hundreds of billions of parameters—the demands on computational resources have increased dramatically. Fine-tuning such models on a single GPU is no longer realistic due to memory limitations and training inefficiencies.  &lt;/p&gt;

&lt;p&gt;Sharding is the process of splitting a model’s data or components across multiple devices—such as GPUs or nodes—so that the training workload is distributed. By dividing the model’s parameters, gradients, and optimizer states into smaller “shards,” each device only needs to manage a fraction of the total, making it possible to train models that would not otherwise fit in memory. Sharding also enables parallel training, which speeds up the process and improves scalability.&lt;/p&gt;

&lt;p&gt;In this article, we explore the importance of sharding for scalable LLM fine-tuning, describe various sharding strategies, and provide practical guidance based on industry-standard tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why training and fine-tuning LLMs require sharding
&lt;/h2&gt;

&lt;p&gt;Training large language models (LLMs) involves handling substantial amounts of data and computation during each pass through the network. These passes are generally referred to as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forward pass&lt;/strong&gt;: When data flows through the model to generate predictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backward pass&lt;/strong&gt;: When the model computes how wrong the predictions were (loss) and adjusts internal weights accordingly through backpropagation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each training iteration requires tracking and updating several core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model parameters&lt;/strong&gt;: These are the &lt;em&gt;learnable weights&lt;/em&gt; of the neural network that determine the model’s behaviour. They are updated during training to minimize prediction errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradients&lt;/strong&gt;: These represent the &lt;em&gt;rate of change of the loss with respect to each model parameter&lt;/em&gt;. Gradients are computed during the backward pass and guide how the model updates its parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimizer states&lt;/strong&gt;: These are &lt;em&gt;internal values maintained by optimization algorithms&lt;/em&gt; like Adam or SGD. They help fine-tune how each parameter gets updated based on the gradient and previous updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While inference can be managed on a single GPU using techniques like offloading or quantization, training requires all three of these components to reside in GPU memory simultaneously. This can triple the memory requirement compared to inference. Without sharding, even relatively modest models (7B–13B parameters) can exceed the capabilities of high-end GPUs.&lt;/p&gt;
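A back-of-the-envelope calculation shows why the memory balloons, and why the overhead can even exceed a rough "triple" rule of thumb. The figures below assume one common mixed-precision Adam setup (fp16 weights and gradients, fp32 master weights plus two fp32 Adam moments, activations ignored); actual numbers vary with optimizer and precision choices.

```python
# Rough memory estimate for fine-tuning a 7B-parameter model with Adam in
# mixed precision. All byte counts are illustrative assumptions.

params = 7e9

fp16_weights   = params * 2        # 2 bytes per parameter
fp16_gradients = params * 2        # one gradient per parameter
fp32_master    = params * 4        # optimizer keeps an fp32 copy of weights
adam_moments   = params * 4 * 2    # first and second moment, fp32 each

total_bytes = fp16_weights + fp16_gradients + fp32_master + adam_moments
total_gb = total_bytes / 1e9

inference_gb = fp16_weights / 1e9  # weights alone, as for fp16 inference

print(f"training state: ~{total_gb:.0f} GB vs inference weights: ~{inference_gb:.0f} GB")
# → training state: ~112 GB vs inference weights: ~14 GB
```

Roughly 112 GB of training state for a 7B model does not fit on any single commodity GPU, which is exactly the gap that sharding the parameters, gradients, and optimizer states across devices is meant to close.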

&lt;p&gt;Moreover, sharding enables:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger batch sizes, improving convergence and model generalization.
&lt;/li&gt;
&lt;li&gt;Distributed compute workloads, reducing training time.
&lt;/li&gt;
&lt;li&gt;Better scalability across infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/llms-sharding/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; to see a DeepSpeed ZeRO example of scalable fine-tuning...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>genai</category>
    </item>
    <item>
      <title>How ColBERT works</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Wed, 02 Jul 2025 14:30:57 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/how-colbert-works-332f</link>
      <guid>https://forem.com/ibmdeveloper/how-colbert-works-332f</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/how-colbert-works/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;re-ranker&lt;/strong&gt; is a model or system that is used in information retrieval to reorder or refine a list of retrieved documents or items based on their relevance to a given query.&lt;/p&gt;

&lt;p&gt;In a typical retrieval pipeline, the process consists of two stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Initial retrieval&lt;/strong&gt;: A lightweight retriever (for example, BM25, or Best Matching 25, a fast sparse lexical retriever) that fetches a large set of candidate documents quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-ranking&lt;/strong&gt;: A more sophisticated and computationally expensive model that reorders the retrieved candidates to improve relevance and accuracy.&lt;/li&gt;
&lt;/ol&gt;
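The two-stage shape can be sketched in a few lines. Both scorers below are toys invented for illustration: in practice, stage one might be BM25 and stage two a cross-encoder or a late-interaction model like ColBERT.

```python
# Two-stage retrieval in miniature: a cheap first-pass scorer narrows the
# corpus, then a pricier scorer reorders only the survivors.

corpus = [
    "how to reset your password",
    "password policy and retry rules",
    "annual report on sales figures",
    "resetting a forgotten password step by step",
]

def cheap_score(query: str, doc: str) -> int:
    """Stage 1 stand-in: count shared words (fast, purely lexical)."""
    return len(set(query.split()) & set(doc.split()))

def expensive_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: shared-word count weighted by document focus."""
    return cheap_score(query, doc) / len(doc.split())

query = "reset password"

# Stage 1: keep the top 3 candidates; stage 2: re-rank only those.
candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:3]
reranked = sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)
print(reranked[0])  # how to reset your password
```

The economics are the point: the expensive model sees only a handful of documents per query, so its cost stays bounded no matter how large the corpus grows.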

&lt;p&gt;&lt;strong&gt;ColBERT (Contextualized Late Interaction over BERT)&lt;/strong&gt; is a retrieval model that is designed to strike a balance between the efficiency of traditional methods like BM25 and the accuracy of deep learning models like BERT, an open source deep learning model used for natural language understanding.&lt;/p&gt;

&lt;p&gt;The ColBERT re-ranker is especially effective in retrieval-augmented generation (RAG) pipelines, where precise and contextually rich document retrieval directly impacts the quality of generated answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of re-rankers
&lt;/h2&gt;

&lt;p&gt;Here are different types of re-rankers and their features.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Weaknesses&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Example use cases&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Traditional&lt;/td&gt;
&lt;td&gt;Fast, interpretable, lightweight&lt;/td&gt;
&lt;td&gt;Lack semantic understanding&lt;/td&gt;
&lt;td&gt;Basic search engines, initial filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-encoders&lt;/td&gt;
&lt;td&gt;High accuracy, deep interaction&lt;/td&gt;
&lt;td&gt;Computationally expensive&lt;/td&gt;
&lt;td&gt;Document ranking for QA, passage retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bi-encoders&lt;/td&gt;
&lt;td&gt;Efficient, scalable&lt;/td&gt;
&lt;td&gt;Less accurate for fine-grained queries&lt;/td&gt;
&lt;td&gt;Large-scale retrieval, first-pass ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Late Interaction Models&lt;/td&gt;
&lt;td&gt;Fine-grained, efficient&lt;/td&gt;
&lt;td&gt;Moderate computational cost&lt;/td&gt;
&lt;td&gt;RAG systems, conversational AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;Best of both worlds&lt;/td&gt;
&lt;td&gt;Integration complexity&lt;/td&gt;
&lt;td&gt;Enterprise search, hybrid RAG systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;ColBERT re-ranker&lt;/strong&gt; employs late interaction for scoring, which allows for efficient yet effective ranking of documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ColBERT works
&lt;/h2&gt;

&lt;p&gt;Unlike standard transformer-based retrievers, which calculate relevance scores by concatenating a query and a document into a single sequence, ColBERT uses late interaction. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The query and document embeddings are computed independently.&lt;/li&gt;
&lt;li&gt;The interaction happens later, during scoring, rather than during encoding.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach allows pre-computation of document embeddings, making retrieval much faster without significant loss in accuracy.&lt;/p&gt;
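The late-interaction scoring itself, often called MaxSim, is compact enough to sketch directly. The two-dimensional token embeddings below are invented toy values; real ColBERT embeddings come from BERT, are higher-dimensional, and are normalized before comparison.

```python
# MaxSim-style late interaction: each query token embedding is matched against
# its best document token embedding, and the per-token maxima are summed.
# Document embeddings can be precomputed and indexed offline.

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embs: list[list[float]], doc_embs: list[list[float]]) -> float:
    """Sum over query tokens of the max similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

query = [[1.0, 0.0], [0.0, 1.0]]              # two query token embeddings
doc_a = [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]  # matches both query tokens well
doc_b = [[0.2, 0.1], [0.1, 0.2]]              # matches neither well

print(round(maxsim_score(query, doc_a), 6))   # 1.7
print(round(maxsim_score(query, doc_b), 6))   # 0.4
```

Because the query and document sides only meet in these cheap max-and-sum operations, the document side can be encoded once and reused, which is what makes the approach fast enough for re-ranking at scale.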

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/how-colbert-works/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>llm</category>
      <category>nlp</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Why serving large language models is hard ... and how vLLM and KServe can help</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Mon, 23 Jun 2025 13:53:52 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/why-serving-large-language-models-is-hard-and-how-vllm-and-kserve-can-help-pd6</link>
      <guid>https://forem.com/ibmdeveloper/why-serving-large-language-models-is-hard-and-how-vllm-and-kserve-can-help-pd6</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/llms-inference-scaling-vllm-kserve/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My title "Fast inference and furious scaling," obviously inspired by the &lt;a href="https://en.wikipedia.org/wiki/Fast_%26_Furious" rel="noopener noreferrer"&gt;movies&lt;/a&gt;, is not just catchy but also captures the pace of generative AI technology. With new optimization techniques, tools emerging daily, and not enough "good first issues," generative AI is a rapidly evolving landscape that often leaves beginners struggling to find their footing.&lt;/p&gt;

&lt;p&gt;In this article, beginners who are new to the world of LLM inferencing and serving can learn why it's complicated and get a clearer idea of how to get started using two open source tools: &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; and &lt;a href="https://kserve.github.io/website/latest/" rel="noopener noreferrer"&gt;KServe&lt;/a&gt;. Instead of getting bogged down in technical details, this article focuses on the 'why' and 'how' of LLM inferencing and serving to give background context for those who want to participate, with resources for deeper dives linked along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does it mean to "serve" an LLM?
&lt;/h2&gt;

&lt;p&gt;Model serving boils down to making a pre-trained model usable. When you try a cloud-based service like ChatGPT, models have been made available for you to send prompts (that is, inference requests) and receive a response (that is, output). Those models are being served for you to consume. Behind the scenes, the model has been made available through an API.&lt;/p&gt;

&lt;p&gt;Sounds simple, right? Well, not quite.&lt;/p&gt;

&lt;p&gt;Serving LLMs isn't just about wrapping them in an API. Just like any memory-intensive software program, large models bring performance trade-offs, infrastructure constraints, and cost considerations.&lt;/p&gt;

&lt;p&gt;Here are just a few reasons that model serving is not simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Massive LLMs require massive resources&lt;/li&gt;
&lt;li&gt;LLMs are complex&lt;/li&gt;
&lt;li&gt;Scaling is hard&lt;/li&gt;
&lt;/ul&gt;
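&lt;p&gt;To make the first point concrete, here is a back-of-envelope sketch of KV-cache memory during transformer inference. The model dimensions are illustrative assumptions (roughly in the range of a 7B-parameter model), not measurements of any particular system.&lt;/p&gt;

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    """Bytes needed to cache attention keys and values.
    The factor of 2 covers the key tensor plus the value tensor."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Illustrative 7B-class configuration in fp16 (dtype_bytes=2).
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch=8)
print(f"KV cache for a batch of 8 at 4K context: {size / 1024**3:.0f} GiB")  # 16 GiB
```

&lt;p&gt;That memory is on top of the model weights themselves, which is why techniques like vLLM's PagedAttention focus on managing KV-cache memory efficiently.&lt;/p&gt;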

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/llms-inference-scaling-vllm-kserve/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
    </item>
    <item>
      <title>Optimize your JVM using memory management and garbage collection</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Thu, 19 Jun 2025 17:47:32 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/optimize-your-jvm-using-memory-management-and-garbage-collection-88m</link>
      <guid>https://forem.com/ibmdeveloper/optimize-your-jvm-using-memory-management-and-garbage-collection-88m</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/j-performance-tuning-jvm/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Java Virtual Machine (JVM) is the engine that runs your Java application. It handles memory allocation, garbage collection (GC), thread management, and JIT compilation.&lt;/p&gt;

&lt;p&gt;JVM performance tuning is the process of optimizing the Java Virtual Machine (JVM) configuration and behavior to improve the performance, scalability, and reliability of Java applications. If your JVM is not configured properly, you might experience high latency, out-of-memory errors, CPU spikes, slow response times, or application crashes.&lt;/p&gt;

&lt;p&gt;In this article, we’ll review two key performance tuning techniques: memory management and garbage collection. By optimizing your JVM with these two techniques, you will improve the performance, scalability, and reliability of your Java applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory management in JVMs
&lt;/h2&gt;

&lt;p&gt;In this article, we will explore these JVM flags:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;-Xmx: The maximum heap size, which you use to limit how much memory is used.&lt;/li&gt;
&lt;li&gt;-Xms: The initial heap size, which you use to pre-allocate memory to improve performance.&lt;/li&gt;
&lt;li&gt;-Xss: The thread stack size, which you use to control how much memory each thread can use.&lt;/li&gt;
&lt;li&gt;-XX and -Xlog:gc*:file: Advanced options that configure and log garbage collection, the automatic process of reclaiming memory from unused or unreachable objects to ensure efficient memory management and prevent leaks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Maximum heap size (-Xmx)
&lt;/h3&gt;

&lt;p&gt;The -Xmx flag sets the maximum heap memory that the JVM is allowed to use. The JVM will not allocate more heap memory than the value specified.&lt;/p&gt;

&lt;p&gt;For example, if you use -Xmx2G, then the JVM can use up to 2 gigabytes of heap memory.&lt;/p&gt;

&lt;p&gt;If your application needs more heap memory than what is specified during runtime, an OutOfMemoryError exception is thrown.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial heap size (-Xms)
&lt;/h3&gt;

&lt;p&gt;The -Xms flag sets the initial heap memory that is allocated when the JVM starts. By setting the initial heap memory, you help prevent dynamic memory allocation during early runtime.&lt;/p&gt;

&lt;p&gt;For example, if you specify -Xms512M, then the JVM starts with 512 megabytes of heap memory.&lt;/p&gt;

&lt;p&gt;To avoid heap resizing and improve performance, you can set -Xms to the same value as -Xmx.&lt;/p&gt;

&lt;h3&gt;
  
  
  Thread stack size (-Xss)
&lt;/h3&gt;

&lt;p&gt;The -Xss flag sets the stack memory size for each thread. Each thread has its own stack that is used for method calls and local variables.&lt;/p&gt;

&lt;p&gt;For example, if you specify -Xss1M, then each thread will be given 1 megabyte of stack memory.&lt;/p&gt;

&lt;p&gt;If you specify a size that is too small, then a StackOverflowError exception is thrown. If you specify a size that is too large, then you have fewer threads available due to high memory usage per thread.&lt;/p&gt;
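&lt;p&gt;These three flags are combined on the java launch command line. As a small sketch (the application jar name is hypothetical), the following Python snippet assembles such a command, setting -Xms equal to -Xmx as recommended above:&lt;/p&gt;

```python
def jvm_command(jar, max_heap="2g", initial_heap=None, stack="1m"):
    """Assemble a java launch command with heap and stack sizing flags."""
    # Defaulting -Xms to the -Xmx value avoids heap resizing at runtime.
    initial_heap = initial_heap or max_heap
    return ["java", f"-Xms{initial_heap}", f"-Xmx{max_heap}",
            f"-Xss{stack}", "-jar", jar]

print(" ".join(jvm_command("app.jar")))  # java -Xms2g -Xmx2g -Xss1m -jar app.jar
```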

&lt;h3&gt;
  
  
  Garbage collection (-XX or -Xlog:gc)
&lt;/h3&gt;

&lt;p&gt;Garbage Collection (GC) in Java is the process by which the Java Virtual Machine (JVM) automatically identifies and removes unused or unreachable objects from memory to free up space and ensure efficient memory management. Garbage collection helps prevent memory leaks, avoid dangling pointers, and reduce programmer error.&lt;/p&gt;

&lt;p&gt;Garbage collection does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finds objects that are no longer needed. These are objects that your code can no longer access (no active reference to them).&lt;/li&gt;
&lt;li&gt;Reclaims the memory used by these objects. This memory is then made available for new object creation.&lt;/li&gt;
&lt;li&gt;Optionally returns unused memory back to the OS (in some JVMs).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Java 8: &lt;code&gt;-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:&amp;lt;file&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;-XX flags are advanced options (as opposed to standard options or the -X non-standard options) and are used to enable, disable, or configure GC behavior, memory management, and more.&lt;/p&gt;

&lt;p&gt;Format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;-XX:+OptionName - Enable the option.&lt;/li&gt;
&lt;li&gt;-XX:-OptionName - Disable the option.&lt;/li&gt;
&lt;li&gt;-XX:OptionName=value - Set the option to a specific value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Java 9 onwards: &lt;code&gt;-Xlog:gc*:file=&amp;lt;file&amp;gt;&lt;/code&gt;&lt;/p&gt;
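&lt;p&gt;Once GC logging is enabled, each collection shows up as one line in the log file. As a rough illustration, here is a small Python sketch that extracts the pause duration and heap change from a Java 9+ unified GC log line; the sample line is synthetic, written in the unified-logging format:&lt;/p&gt;

```python
import re

# Matches lines like: "... Pause Young (...) 24M->4M(256M) 1.234ms"
PAUSE_RE = re.compile(r"Pause (\w+).*?(\d+)M->(\d+)M\((\d+)M\) ([\d.]+)ms")

# Synthetic sample line in the JDK unified-logging format.
line = "[2.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 1.234ms"

match = PAUSE_RE.search(line)
if match:
    kind, before, after, total, ms = match.groups()
    print(f"{kind} pause: {ms} ms, heap {before}M -> {after}M of {total}M")
```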

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/j-performance-tuning-jvm/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;...&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>jvm</category>
      <category>performance</category>
    </item>
    <item>
      <title>Top 5 Python blogs, articles, or tutorials on IBM Developer in the first half of 2025</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Tue, 10 Jun 2025 11:26:02 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/top-5-python-blogs-articles-or-tutorials-on-ibm-developer-in-the-first-half-of-2025-4kl5</link>
      <guid>https://forem.com/ibmdeveloper/top-5-python-blogs-articles-or-tutorials-on-ibm-developer-in-the-first-half-of-2025-4kl5</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;This blog was originally published on &lt;a href="https://developer.ibm.com/blogs/top-5-python-content-1h2025/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.ibm.com/languages/python/" rel="noopener noreferrer"&gt;Python&lt;/a&gt; is one of the most popular programming languages in the generative AI landscape. Whether you're implementing RAG pipelines, optimizing AI models, or taking advantage of AI agent frameworks, Python is front and center due to its simplicity and versatility.&lt;/p&gt;

&lt;p&gt;If you're ready to start building with Python, check out this curated list of our top 5 blogs, articles, and tutorials so far this year. They'll help you enhance your Python skills, stay ahead of the curve, and get inspired for your next generative AI project!&lt;/p&gt;

&lt;h2&gt;
  
  
  #5 Enhancing RAG performance with smart chunking strategies
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://developer.ibm.com/articles/awb-enhancing-rag-performance-chunking-strategies/" rel="noopener noreferrer"&gt;&lt;strong&gt;Enhancing RAG performance with smart chunking strategies&lt;/strong&gt;&lt;/a&gt;, you learn just how important chunking is when retrieving data in your RAG system. This article explains how you first need to understand the specific nature of your data and then select the right chunking strategy to improve the performance of your RAG system.&lt;/p&gt;

&lt;p&gt;Explore more content on IBM Developer about &lt;a href="https://developer.ibm.com/technologies/rag/" rel="noopener noreferrer"&gt;retrieval-augmented generation (RAG)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  #4 Cache augmented generation (CAG): Enhancing speed and efficiency in AI systems
&lt;/h2&gt;

&lt;p&gt;The article &lt;a href="https://developer.ibm.com/articles/awb-cache-rag-efficiency-speed-ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Cache augmented generation (CAG): Enhancing speed and efficiency in AI systems&lt;/strong&gt;&lt;/a&gt; explains what cache augmented generation (CAG) is, how it works, and its potential impact on AI systems.&lt;/p&gt;

&lt;p&gt;Explore more content on IBM Developer about &lt;a href="https://developer.ibm.com/technologies/rag/" rel="noopener noreferrer"&gt;retrieval-augmented generation (RAG)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  #3 Optimizing LLMs with cache augmented generation
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://developer.ibm.com/articles/awb-llms-cache-augmented-generation/" rel="noopener noreferrer"&gt;&lt;strong&gt;Optimizing LLMs with cache augmented generation&lt;/strong&gt;&lt;/a&gt;, you learn how cache augmented generation (CAG) and its integration with the Granite models helps to optimize LLM workflows by preloading knowledge and precomputing states, making it ideal for static data set applications with fast response needs.&lt;/p&gt;

&lt;p&gt;Explore more content on IBM Developer about &lt;a href="https://developer.ibm.com/technologies/large-language-models/" rel="noopener noreferrer"&gt;large language models (LLMs)&lt;/a&gt; and the &lt;a href="https://developer.ibm.com/components/granite-models/" rel="noopener noreferrer"&gt;Granite models&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  #2 Comparing AI agent frameworks: CrewAI, LangGraph, and BeeAI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://developer.ibm.com/articles/awb-comparing-ai-agent-frameworks-crewai-langgraph-and-beeai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Comparing AI agent frameworks: CrewAI, LangGraph, and BeeAI&lt;/strong&gt;&lt;/a&gt; explains the capabilities, features, and implementation considerations of leading AI agent frameworks focused on multi-agent collaboration and orchestration. The article shows how CrewAI, LangGraph, and BeeAI each offer powerful capabilities for implementing AI agent systems, with distinct advantages based on specific requirements and use cases.&lt;/p&gt;

&lt;p&gt;Check out an additional article, &lt;a href="https://developer.ibm.com/articles/awb-implementing-ai-agents-crewai-langgraph-and-beeai/" rel="noopener noreferrer"&gt;&lt;em&gt;Implementing AI agents with AI agent frameworks&lt;/em&gt;&lt;/a&gt;, for a hands-on look at implementing an AI agent in each of these three AI agent frameworks.&lt;/p&gt;

&lt;p&gt;Explore more content on IBM Developer about &lt;a href="https://developer.ibm.com/technologies/agentic-ai/" rel="noopener noreferrer"&gt;agentic AI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  #1 Create a LangChain AI Agent in Python using watsonx
&lt;/h2&gt;

&lt;p&gt;In the tutorial &lt;a href="https://developer.ibm.com/tutorials/awb-create-langchain-ai-agent-python-watsonx/" rel="noopener noreferrer"&gt;&lt;strong&gt;Create a LangChain AI Agent in Python using watsonx&lt;/strong&gt;&lt;/a&gt;, you learn how to use the LangChain Python package to build an AI agent that uses its custom tools to return a URL directing to NASA's Astronomy Picture of the Day.&lt;/p&gt;

&lt;p&gt;Explore more content on IBM Developer about &lt;a href="https://developer.ibm.com/technologies/agentic-ai/" rel="noopener noreferrer"&gt;agentic AI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Want more?
&lt;/h2&gt;

&lt;p&gt;Python is a simple and versatile open source programming language that's ideal for AI tasks. Now that you've seen what developers were looking to learn about Python in the first half of 2025, make sure you stay tuned to see what the rest of 2025 brings. We'll bring you the latest in Python concepts, step-by-step tutorials, and AI techniques!&lt;/p&gt;

&lt;p&gt;Check out more of our Python content on our &lt;a href="https://developer.ibm.com/languages/python/" rel="noopener noreferrer"&gt;Python&lt;/a&gt; hub page.&lt;/p&gt;

</description>
      <category>python</category>
      <category>genai</category>
    </item>
    <item>
      <title>Collecting and transporting JFR dumps from containerized environments</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Thu, 29 May 2025 12:54:43 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/collecting-and-transporting-jfr-dumps-from-containerized-environments-3b22</link>
      <guid>https://forem.com/ibmdeveloper/collecting-and-transporting-jfr-dumps-from-containerized-environments-3b22</guid>
      <description>&lt;p&gt;Authored by: Aditi Srinivas, Gireesh Punathil&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;This article was originally published on &lt;a href="https://developer.ibm.com/articles/j-jfr-dump/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;JDK Flight Recorder (JFR) was introduced in IBM Semeru Runtimes v11.0.27, v17.0.15 and v21.0.7 releases. JFR provides an industry standard, low overhead, continuous workload monitoring experience to the users of Semeru-based Java workloads that are running in containerized or conventional deployment targets.&lt;/p&gt;

&lt;p&gt;In containerized environments like Red Hat OpenShift, diagnosing performance issues in Java applications often requires capturing JFR data. However, collecting and transporting these recordings from ephemeral, isolated containers poses logistical and technical challenges.&lt;/p&gt;

&lt;p&gt;Traditional monitoring tools often assume direct or SSH access to the host, which is typically restricted in modern cloud-native environments. In addition, many container environments do not provide persistent volumes, so the diagnostic data needs to be extracted manually before the container is recycled. Without streamlined methods to securely extract and manage JFR files, root cause analysis can be delayed, affecting application reliability and supportability.&lt;/p&gt;

&lt;p&gt;To address these challenges, we need a well-defined set of tasks to be carried out to perform the application monitoring and performance analysis seamlessly. This article provides the steps to configure and record a JFR dump from a running Semeru application in a containerized environment (specifically an OpenShift environment) and transport it to your local system, with the assumption that you do not have SSH access to the pod that is running the container.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up your environment
&lt;/h2&gt;

&lt;p&gt;Make sure rsync is installed beforehand; install it in the Dockerfile itself when defining the image. This step is only required if your JFR dump is unusually large (more than 2GB).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# sudo apt install rsync&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, make sure you have the ability to execute shell scripts in the remote OpenShift container.&lt;/p&gt;

&lt;p&gt;Log in to OpenShift:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc login --token=&amp;lt;your_token&amp;gt; --server=&amp;lt;your_cluster_api&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Locate the target pod:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc get pods -n &amp;lt;namespace&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Locate the container within the pod:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc get pod &amp;lt;pod&amp;gt; -n &amp;lt;namespace&amp;gt; -o jsonpath='{.spec.containers[*].name}'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Issue sample commands or shell scripts inside the container:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc exec -n &amp;lt;namespace&amp;gt; &amp;lt;pod&amp;gt; -c &amp;lt;container&amp;gt; -- /bin/sh -c &amp;lt;cmd&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For example, to list files in the root directory:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc exec -n my-namespace my-pod -c my-container -- /bin/sh -c "ls /"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Print environment variables:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc exec -n my-namespace my-pod -c my-container -- /bin/sh -c "env"&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Collect a JFR dump
&lt;/h2&gt;

&lt;p&gt;JFR capabilities are enabled by default in most JVMs, so you don’t need to pass any special arguments to activate them.&lt;/p&gt;

&lt;p&gt;To initiate a JFR recording, you can use the jcmd utility, which allows you to interact with the running JVM process.&lt;/p&gt;

&lt;p&gt;Use the jcmd -l command to list the running JVMs, and note down the JVM process ID, which you will use later.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;# oc exec -n &amp;lt;namespace&amp;gt; &amp;lt;pod&amp;gt; -c &amp;lt;container&amp;gt; -- /bin/sh -c "jcmd -l"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Make sure to enclose the remote command in double quotes to avoid truncation.&lt;/p&gt;
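&lt;p&gt;When scripting these steps, it can help to build the oc exec invocation as an argument list. Here is a minimal Python sketch (namespace, pod, and container names are placeholders); passing the remote command as a single /bin/sh -c argument plays the same role as the double quotes above:&lt;/p&gt;

```python
import shlex

def oc_exec(namespace: str, pod: str, container: str, remote_cmd: str) -> list:
    """Build an `oc exec` command that runs remote_cmd inside a container."""
    # The whole remote command travels as one argument to /bin/sh -c,
    # which avoids the truncation that unquoted commands can suffer.
    return ["oc", "exec", "-n", namespace, pod, "-c", container,
            "--", "/bin/sh", "-c", remote_cmd]

cmd = oc_exec("my-namespace", "my-pod", "my-container", "jcmd -l")
print(shlex.join(cmd))
```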

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/j-jfr-dump/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>containers</category>
    </item>
    <item>
      <title>How servers become part of the Cloud</title>
      <dc:creator>IBM Developer</dc:creator>
      <pubDate>Tue, 13 May 2025 07:28:33 +0000</pubDate>
      <link>https://forem.com/ibmdeveloper/how-servers-become-part-of-the-cloud-3nc1</link>
      <guid>https://forem.com/ibmdeveloper/how-servers-become-part-of-the-cloud-3nc1</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;This tutorial was originally published on &lt;a href="https://developer.ibm.com/articles/awb-servers-become-part-of-cloud/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt; by Zack Grossbart&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Understanding how the cloud works is key to building stable and reliable cloud applications. Should you keep all your servers in one zone or spread them out? What causes capacity errors, and how do you fix them? Where is your workload actually running and what happens if that server fails? How do you ensure it's running on something stable?&lt;/p&gt;

&lt;p&gt;To design secure and resilient applications, you need to answer these questions. Knowing how a server becomes part of the cloud, and how IBM monitors servers, detects issues, and fixes them, is essential to trusting your workloads with us.&lt;/p&gt;

&lt;p&gt;If you’ve ever wondered what goes on behind the scenes to bring a new server into the cloud, this article is for you.&lt;/p&gt;

&lt;p&gt;Cloud resiliency and scale come from more than just stacking servers in a room. It takes careful planning, smart design, and coordination of both physical and logical components.&lt;/p&gt;

&lt;p&gt;The IBM Cloud is always evolving. We’re constantly adding capacity, launching new features, fixing servers, and upgrading infrastructure to give our customers the best experience possible.&lt;/p&gt;

&lt;p&gt;Bringing a new server into the cloud involves a lot of moving parts—from configuration and power to networking and placement. In this article, we’ll walk you through how IBM adds new servers to the VPC cloud: where they live, how they’re set up, how they’re connected, and how they become a trusted part of the IBM Cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building blocks of a Cloud
&lt;/h2&gt;

&lt;p&gt;To understand how a cloud works, we need to look at the two main components that make it all possible:   &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Physical components: The actual hardware and infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logical components: The software and systems that organize and manage that hardware.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Continue reading on &lt;a href="https://developer.ibm.com/articles/awb-servers-become-part-of-cloud/" rel="noopener noreferrer"&gt;IBM Developer&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>serverconfiguration</category>
      <category>clouddeployment</category>
      <category>cloudservers</category>
    </item>
  </channel>
</rss>
