Solutions are increasingly built using microservices architecture, leading to complex distributed systems. Monitoring these systems becomes challenging due to the diversity of tools, protocols, and data formats.
This blog focuses on:
- Explaining the basics of OpenTelemetry, its role in observability, and the current state of observability in the industry.
- Explaining how to instrument code and identify when to use manual and automatic instrumentation.
- Discussing the OpenTelemetry Collector and Connector, which are responsible for processing and forwarding telemetry data.
What is OpenTelemetry?
OpenTelemetry addresses these challenges by providing a unified framework for collecting, processing, and exporting telemetry data, enabling you to gain deep insights into your apps’ behavior.
For this guide, consider a modern coffee shop app with the following microservices:
- Order Service: Handles customer orders.
- Payment Service: Processes payments.
- Inventory Service: Manages stock levels.
- Notification Service: Sends order confirmations.
Each service operates independently, possibly written in different languages and deployed across various environments. When a customer places an order, the request traverses multiple services, making it essential to have a comprehensive observability solution to monitor and troubleshoot the system effectively.
Key Benefits of OpenTelemetry:
- Unified Instrumentation: Instrument your code once and send telemetry data to multiple backends without re-instrumentation.
- Vendor-Neutral: Avoid vendor lock-in by using standard APIs and protocols; you can switch platforms without re-instrumenting your entire solution.
- Unified Telemetry: Combines tracing, logging, and metrics into a single framework, enabling correlation of all data and establishing an open standard for telemetry data. Linking these signals helps you make better decisions.
- Community-Driven: Benefit from a vibrant open-source community contributing to continuous improvements.
- Improved Correlation: Easily correlate data across different telemetry signals for better insights.
Three Pillars of Observability
Telemetry is collected via instrumentation and flows through a pipeline that enriches, batches, and stores it for later analysis. Most observability tooling revolves around three categories of telemetry: logs, metrics, and traces.
While they share architectural similarities, such as instrumentation, ingestion, storage, and visualization, each type presents unique challenges and is best suited to answering different kinds of questions.
Logs
Logs are immutable, timestamped records of discrete events. Each log entry typically contains a message and optional structured metadata. However, coming up with a standardized log format is no easy task, since different pieces of information are critical for different types of software.
You can also use logging agents and protocols to forward logs to a central location for efficient storage. For example, consider a user placing an order in a microservices-based coffee shop app. The order-service logs a line like:
{
  "timestamp": "2025-06-01T08:43:12Z",
  "level": "INFO",
  "service": "order-service",
  "message": "New order placed: latte",
  "order_id": "ORD-20250601-001"
}
Metrics
Metrics give you a high-level view of the current state of your system. A metric is a single numerical value derived by applying a statistical measure to a group of events.
In other words, metrics represent an aggregate. This is useful because their compact representation allows us to graph how a system changes over time.
Different Metric Types:
- Counters: Total number of orders placed
- Gauges: Current number of in-progress orders
- Histograms: Distribution of order preparation times
- Summaries: Quantiles of response times
Coffee Shop Example: The order-service emits a metric like the following:
orders_placed_total{beverage="latte"} 1560
A Prometheus dashboard may show a sharp spike in latte orders, suggesting a promotional campaign is working or an anomaly is occurring.
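As a rough sketch, the order-service could emit such a counter (and a preparation-time histogram) with .NET’s System.Diagnostics.Metrics API, which OpenTelemetry collects once the SDK registers the meter; the meter and instrument names here are illustrative assumptions:

using System.Diagnostics.Metrics;

// Hypothetical meter for the order-service; the SDK must register it, e.g. via AddMeter("CoffeeShop.Orders").
var meter = new Meter("CoffeeShop.Orders");

// Counter: monotonically increasing total of orders placed.
var ordersPlaced = meter.CreateCounter<long>("orders_placed_total");

// Histogram: distribution of order preparation times in milliseconds.
var prepTime = meter.CreateHistogram<double>("order_preparation_time_ms");

// Record a latte order and its preparation time.
ordersPlaced.Add(1, new KeyValuePair<string, object?>("beverage", "latte"));
prepTime.Record(3200, new KeyValuePair<string, object?>("beverage", "latte"));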
Traces
To understand the larger context in a distributed solution, you must identify other related events, such as the specific request or transaction that initiated a log entry and the sequence of services involved in processing that request across the system.
Traces visualize the full journey of a single request across services. A trace consists of multiple spans, each representing a step in the request’s lifecycle. This makes it possible to reconstruct the journey of requests in the system.
Coffee Shop Example: A user places an order. The request flows through UI -> order-service -> payment-service -> inventory-service.
Each service adds a span with trace and span IDs, allowing you to view:
- Total request duration
- Which service caused a delay
- Any failed steps in the chain
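To make spans concrete, here is a minimal, self-contained .NET sketch; the ActivityListener merely stands in for a configured OpenTelemetry SDK, and the source and span names are illustrative:

using System.Diagnostics;

// A listener stands in for the OpenTelemetry SDK so that StartActivity returns recorded spans.
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == "CoffeeShop",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded
});

var source = new ActivitySource("CoffeeShop");

// Parent span: the incoming order request handled by the order-service.
using var order = source.StartActivity("PlaceOrder");

// Child span: the downstream payment call; it shares the parent's TraceId.
using (var payment = source.StartActivity("ChargeCard"))
{
    Console.WriteLine($"trace_id: {payment?.TraceId}, span_id: {payment?.SpanId}, parent: {payment?.ParentSpanId}");
}

In a real deployment the OpenTelemetry SDK plays the listener’s role and exports these spans over OTLP.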
Problems with the Current Observability Approach
Logs, metrics, and traces typically live in separate systems, with different formats and tooling. This fragmentation forces you to jump between dashboards and correlate data manually. Even with shared metadata like timestamps or service names, stitching information together remains time-consuming and error-prone.
Coffee Shop Example: Imagine a spike in failed order-service requests. You check metrics and see a high error rate. You then switch to logs, scan for failures, and try to match logs with trace IDs. Without consistent context, root cause analysis becomes guesswork.
Lack of Built-in Instrumentation in Open Source Software
Many open source libraries expose hooks but do not include native telemetry support. You must build and maintain custom adapters.
Problems this causes:
- Version Compatibility: Library updates may break adapters.
- Telemetry Loss: Converting data between formats can degrade signal quality.
- Engineering Overhead: Teams spend time wiring telemetry instead of building features.
Coffee Shop Example: If the inventory-service uses a third-party stock manager with no OpenTelemetry support, you must manually instrument it or depend on its observability hooks.
What OpenTelemetry is NOT
OpenTelemetry simplifies telemetry collection and export, but it doesn’t offer end-to-end observability out of the box. It’s a toolkit, not a monitoring platform.
| Not OpenTelemetry’s Job | Description |
|---|---|
| Data storage | OpenTelemetry exports data; it doesn’t store it. You’ll need systems like SigNoz, Prometheus, or Elasticsearch. |
| Visualization | No dashboards or charts are included. Use tools like Grafana, Jaeger, or Datadog. |
| Alerting | OpenTelemetry doesn’t generate alerts. Integrate it with systems that support alert rules. |
| Monitoring out-of-the-box | It doesn’t auto-instrument everything or provide prebuilt dashboards. You must configure and integrate. |
| Performance optimization | It helps identify bottlenecks, but doesn’t tune your app. |
OpenTelemetry standardizes how you collect logs, metrics, and traces. It enables observability, but doesn’t deliver it on its own. You still need storage, visualization, alerting, and analysis platforms to complete the picture.
Signals in OpenTelemetry
OpenTelemetry organizes observability data into three core signals:
| Signal | Purpose |
|---|---|
| Traces | Capture the lifecycle and flow of a request across services. |
| Metrics | Measure system and app performance over time. |
| Logs | Record discrete events and state changes in the app. |
Each signal is independent but can be correlated to provide richer observability. OpenTelemetry’s architecture ensures signal consistency and interoperability across programming languages through its official OpenTelemetry Specification.
OpenTelemetry Specification Components
- Common Terminology: Ensures a consistent vocabulary across implementations.
- API Specification: Provides language-agnostic interfaces to generate telemetry (traces, metrics, logs). APIs are backend-agnostic and enable portable instrumentation. For more information, refer to the Tracing API, Metrics API, and OpenTelemetry Logging.
- SDK Specification: Defines how SDKs process, sample, and export telemetry, ensuring consistent behavior across languages. For more information, refer to the Tracing SDK, Metrics SDK, and Logs SDK.
- Semantic Conventions: Standardizes names and attributes for telemetry data (e.g., HTTP status codes, DB queries).
- OpenTelemetry Protocol (OTLP): Describes a vendor-neutral transport protocol for sending telemetry data.
Why separate API from SDK?
The API–SDK split improves modularity, portability, and vendor neutrality:
- Library safety: A shared library (e.g., database driver) can safely include only the API, avoiding heavy SDK dependencies and avoiding conflicts in user apps.
- Portability: You can ship apps with OpenTelemetry APIs baked in, and let platform teams decide which SDK/exporter to use later.
- Flexibility: You can write your own SDK or replace components (e.g., use a custom sampler or exporter).
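As an illustration of the library-safety point above, a hypothetical shared library in .NET can depend only on the API surface (System.Diagnostics.ActivitySource); its spans are no-ops until a host app wires up an SDK that subscribes to the source. The library and source names below are illustrative:

using System.Diagnostics;

// Hypothetical shared library: references only the API (ActivitySource), never the SDK.
public static class CoffeeGrinder
{
    private static readonly ActivitySource Source = new("CoffeeShop.Grinder");

    public static void Grind(string beans)
    {
        // No-op unless the host app configures an SDK (or another listener)
        // that subscribes to this source, e.g. AddSource("CoffeeShop.Grinder").
        using var activity = Source.StartActivity("GrindBeans");
        activity?.SetTag("beans.variety", beans);

        // ... grinding logic ...
    }
}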
OpenTelemetry API vs SDK
| Feature | OpenTelemetry API | OpenTelemetry SDK |
|---|---|---|
| Purpose | Defines interfaces to generate telemetry | Implements logic to process and export telemetry |
| Responsibility | Exposes functions to create spans, metrics, logs | Manages batching, sampling, context, and export |
| Language-specific? | Yes | Yes |
| Included by default? | Yes, lightweight | No, must be explicitly added and configured |
| Default behavior | No-op | Active when configured |
| Used by | App and library developers | DevOps, SREs, platform engineers |
| Stability | High | May evolve with backends and exporter needs |
| Customizable | No | Yes (exporters, processors, samplers) |
For example, consider the following scenarios:
| Scenario | Best choice |
|---|---|
| Open-source library with tracing support | API only (lightweight, no deps) |
| Production microservice exporting to Grafana | API + SDK + OTLP Exporter |
| CLI tool needing optional debug tracing | API (enabled conditionally with SDK) |
How to Instrument Code with OpenTelemetry
OpenTelemetry supports multiple instrumentation approaches to capture telemetry from apps. Understanding these methods helps you choose the right approach based on your app’s complexity, development stage, and observability goals.
OpenTelemetry classifies instrumentation into three categories, often overlapping in practice:
| Category | Effort | Control | Customization | Code Changes |
|---|---|---|---|---|
| Automatic Instrumentation (Zero-Code) | Low | Limited | Minimal | None |
| Instrumentation Libraries | Moderate | Medium | Moderate | Minimal to moderate |
| Manual Instrumentation (Fully Code-Based) | High | Full | Full | Extensive |
OpenTelemetry provides three ways to capture telemetry from your app:
1. Automatic Instrumentation
Auto-instrumentation in .NET 8 is available via the OpenTelemetry .NET Auto-Instrumentation Agent, which instruments common libraries like ASP.NET Core, HttpClient, and SQL clients at runtime.
Ideal use case: Use this to quickly add observability to .NET services without modifying source code.
Example: orders-service (.NET 8, ASP.NET Core)
- Download and install the auto-instrumentation binaries from the .NET Auto-Instrumentation GitHub repository
- Run the app with the auto-instrumentation profiler enabled:
set OTEL_SERVICE_NAME=orders-service
set OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
set CORECLR_ENABLE_PROFILING=1
set CORECLR_PROFILER_PATH=C:\otel-dotnet-auto\OpenTelemetry.AutoInstrumentation.Native.dll
dotnet run
What it captures:
- HTTP requests and responses
- Outgoing HTTP/gRPC calls
- SQL queries via ADO.NET
Pros:
- No code changes required
- Fast onboarding
- Works well for ASP.NET Core, Entity Framework, and HttpClient
Cons:
- Limited to supported libraries
- Less control over span names and metadata
2. Library-Based Instrumentation
Library-based instrumentation uses the OpenTelemetry SDK and prebuilt instrumentations like AddAspNetCoreInstrumentation.
Ideal use case: You want to customize configuration and capture high-value signals without full manual control.
Example: menu-service (.NET 8, ASP.NET Core)
Install NuGet packages:
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
Configure in Program.cs:
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .SetResourceBuilder(
                ResourceBuilder.CreateDefault()
                    .AddService("menu-service"))
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddOtlpExporter(otlp =>
            {
                otlp.Endpoint = new Uri("http://otel-collector:4317");
            });
    });

var app = builder.Build();

app.MapGet("/", () => "Hello from Menu Service");

app.Run();
What it captures:
- Inbound ASP.NET Core request spans
- Outbound calls (HttpClient, gRPC)
- Custom span and resource metadata
Pros
- Easy to configure
- Integrates well with DI and hosting model
- Supports enrichment and filtering
Cons
- Requires adding code/configuration
- Less flexible than full manual instrumentation
3. Manual Instrumentation
Manual instrumentation lets you define custom spans for critical business logic (e.g., awarding loyalty points or calculating discounts).
Ideal use case: You need to trace domain-specific logic not covered by auto or library-based methods.
Example: loyalty-service (.NET 8 Worker Service)
Install packages:
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
Configure tracing in Program.cs:
using System.Diagnostics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("loyalty-service"))
            .AddSource("LoyaltyService")
            .AddOtlpExporter(options =>
            {
                options.Endpoint = new Uri("http://otel-collector:4317");
            });
    });

var app = builder.Build();

// Start the host so the OpenTelemetry tracer provider is created and listening
app.Start();

// Start a custom span manually
var source = new ActivitySource("LoyaltyService");
using (var activity = source.StartActivity("AwardLoyaltyPoints", ActivityKind.Internal))
{
    activity?.SetTag("customer.id", "cust-123");
    activity?.SetTag("points.awarded", 20);

    // Simulate business logic
    Console.WriteLine("Loyalty points awarded.");
}

// Stop the host; shutting down flushes pending spans to the exporter
await app.StopAsync();
What it captures:
- Custom spans for logic like point calculations
- Rich metadata (tags)
- Correlation with other telemetry (metrics/logs)
Pros:
- Full control over telemetry
- Capture domain-specific operations
- High value for debugging or performance tuning
Cons:
- Requires development effort
- Must manage span lifecycle correctly
- Potential for inconsistent usage without guidelines
Overlaps and Clarifications
- Instrumentation libraries sometimes provide automatic instrumentation after import, blurring the line between zero-code and code-based.
- Under the hood, all approaches use some form of libraries.
- Zero-code is broad and quick; libraries add customization; manual is full control.
Recommended Approach and Strategy
- Start with automatic instrumentation to gain immediate insight with minimal effort.
- Add instrumentation libraries where you need more coverage or framework-specific tracing.
- Use manual instrumentation for critical business logic or custom metrics requiring fine-grained control.
Why use the OpenTelemetry Collector?
The OpenTelemetry Collector is a vendor-agnostic, standalone service that simplifies telemetry management in production. It decouples telemetry generation from ingestion and export.
The Collector provides three core capabilities:
| Function | Description |
|---|---|
| Receive | Accepts telemetry from apps, agents, or other Collectors via OTLP or other supported protocols. |
| Process | Filters, enriches, transforms, batches, or samples telemetry data. |
| Export | Sends processed data to one or more observability backends. |
Key benefits
| Without Collector | With Collector |
|---|---|
| Apps must export data directly to each backend | Central point of control for all telemetry |
| Risk of tight coupling to backend protocols | Decouples app logic from backend details |
| Difficult to enforce consistent processing | Apply transformations consistently |
| No central routing or batching | Route and batch data efficiently |
Understanding OpenTelemetry Protocol (OTLP)
OTLP is the native telemetry transport used across OpenTelemetry. It standardizes how telemetry is serialized, transmitted, and received.
Key benefits:
- Unified: Handles traces, metrics, and logs in one format.
- Vendor-neutral: Reduces backend lock-in and removes custom exporters.
- Efficient: Uses gRPC and Protobuf for high-performance streaming.
- Extensible: Schema evolves without breaking compatibility.
- Integrated: Collector and most observability tools support OTLP out of the box.
OTLP transport options
| Transport | Encoding | Use case |
|---|---|---|
| gRPC | Protobuf | Default for performance and bi-directional streaming |
| HTTP/1.1 | JSON | Debugging, human-readable payloads |
| HTTP/2 | Protobuf | Efficient, firewall-friendly alternative to gRPC |
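In .NET, for example, the transport is a switch on the OTLP exporter options. This is a minimal sketch assuming the Collector endpoints used elsewhere in this post and an illustrative source name:

using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

// Default transport: gRPC + Protobuf on port 4317.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("CoffeeShop")
    .AddOtlpExporter(options =>
    {
        options.Protocol = OtlpExportProtocol.Grpc;
        options.Endpoint = new Uri("http://otel-collector:4317");

        // Firewall-friendly alternative: HTTP + Protobuf on port 4318.
        // options.Protocol = OtlpExportProtocol.HttpProtobuf;
        // options.Endpoint = new Uri("http://otel-collector:4318/v1/traces");
    })
    .Build();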
Example: Pre-OTLP vs OTLP
| Before OTLP | With OTLP |
|---|---|
| Prometheus exporter, Zipkin exporter, Fluentd plugin | One OTLP exporter and one Collector instance |
| Multiple exporters in each service | Centralized, simplified telemetry pipeline |
In OpenTelemetry, logs are a critical signal for observability. Any data that isn’t a trace or metric is categorized as a log. Events, for instance, are specialized log entries.
Unlike traces and metrics, which OpenTelemetry implements via dedicated APIs and SDKs, logging is designed to integrate with existing logging frameworks in various programming languages. Instead of requiring a brand-new logging API, OpenTelemetry provides a Logs Bridge API that links traditional logging systems with telemetry signals such as traces and metrics.
How Logging Works in OpenTelemetry
You instrument logging using the Logs Bridge API, which connects popular logging frameworks (like Serilog, ILogger, or log4net in .NET) to OpenTelemetry’s pipeline.
Key Components
- LoggerProvider: Factory for creating loggers.
- Logger: Used to create log entries (LogRecord).
- LogRecord: Represents a single log entry with metadata.
- LogRecordExporter: Sends logs to destinations like the OpenTelemetry Collector.
- LogRecordProcessor: Processes logs before they’re exported.
LogRecord Structure
A LogRecord typically includes:
- timestamp: When the log occurred.
- trace_id, span_id: Links to a trace/span for correlation.
- severity_text: e.g., INFO, WARNING, ERROR.
- body: The log message or structured content.
- attributes: Custom metadata (e.g., user.id, http.method).
Example Use Case: The coffee app has a /get_coffee endpoint. When a coffee request fails due to a missing ID, the app logs this event.
logger.error("Missing coffee ID", extra={"http.status_code": 400, "coffee_id": None})
This log entry can be linked to the trace of the request, helping correlate the failure with upstream service calls and backend metrics.
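As a hedged .NET sketch of the bridge: routing ILogger through OpenTelemetry makes each LogRecord carry the active trace and span IDs automatically. The service name, endpoint, and /get_coffee handler below are illustrative assumptions, and the OpenTelemetry and OTLP exporter packages are assumed to be installed:

using OpenTelemetry.Logs;
using OpenTelemetry.Resources;

var builder = WebApplication.CreateBuilder(args);

// Bridge ILogger to the OpenTelemetry logs pipeline.
builder.Logging.AddOpenTelemetry(options =>
{
    options.SetResourceBuilder(
        ResourceBuilder.CreateDefault().AddService("order-service"));
    options.IncludeFormattedMessage = true;
    options.AddOtlpExporter(otlp =>
    {
        otlp.Endpoint = new Uri("http://otel-collector:4317");
    });
});

var app = builder.Build();

app.MapGet("/get_coffee", (string? coffeeId, ILogger<Program> logger) =>
{
    if (coffeeId is null)
    {
        // Emitted inside the request's Activity, so the LogRecord is correlated with the trace.
        logger.LogError("Missing coffee ID, returning {StatusCode}", 400);
        return Results.BadRequest("Missing coffee ID");
    }
    return Results.Ok($"Brewing coffee {coffeeId}");
});

app.Run();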
Collector Configuration
The OpenTelemetry Collector decouples telemetry generation from backend concerns. It processes logs, traces, and metrics independently.
Collector Pipeline Example
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch: {}
exporters:
  logging:
    loglevel: debug
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
Collector Deployment Topologies
The OpenTelemetry Collector supports multiple deployment models, allowing you to tailor observability pipelines based on your architecture and scalability needs. Each topology serves different use cases—from tightly coupled microservices to centralized processing in large-scale environments.
Sidecar Deployment: The OpenTelemetry Collector runs as a sidecar alongside each application instance. This setup is common in containerized environments like Kubernetes, where the Collector is injected into each Pod.
Advantages:
- Low latency: The Collector runs on the same host or Pod, reducing network overhead for exporting telemetry data.
- Isolation: Each service has a dedicated Collector instance, ensuring telemetry data stays service-specific and avoids cross-contamination.
- Simplified trace correlation: Local logs, traces, and metrics can be more easily linked.
Ideal Use Case: Microservices architectures where services operate independently and require individual telemetry pipelines.
Node Agent Deployment: A single Collector instance runs per host or node. This is typically implemented as a Kubernetes DaemonSet or a similar system service in virtual machine environments.
Advantages:
- Centralized control per node: One Collector handles telemetry for all services on the same node.
- Resource-efficient: Fewer Collector instances are required compared to the sidecar model.
- System metrics access: Can collect host-level metrics (CPU, memory, disk, etc.) in addition to application telemetry.
Ideal Use Case:
- Suitable for clusters with many lightweight services that share node resources.
- Often used to monitor node-level infrastructure and runtime metrics alongside service-level data.
Standalone or Gateway Deployment: The Collector runs as a dedicated service, often behind a load balancer. Applications send telemetry data remotely to this central Collector (typically over OTLP).
Advantages:
- Scalability: A centralized Collector cluster can scale independently from application workloads.
- Simplified configuration management: Telemetry pipelines and transformations are managed in one place.
- Decoupling from application logic: Developers don’t need to worry about backend changes or exporter configurations.
Ideal Use Case:
- Best suited for large-scale systems with high telemetry volume.
- Useful for teams that want to offload all processing from applications and maintain a consistent observability architecture.
Benefits of OpenTelemetry Collectors:
- Separation of concerns: Developers emit logs; operators manage pipelines.
- Centralized management: All configuration is in one place.
- Resource efficiency: Offloads processing from app.
- No redeployments needed: Change pipelines without touching app code.
SigNoz with OpenTelemetry
SigNoz is a powerful observability platform built specifically for OpenTelemetry. It provides a seamless experience for collecting, storing, visualizing, and querying telemetry data, without vendor lock-in.
With OpenTelemetry, you collect signals (logs, metrics, and traces) from the coffee shop services. These signals are sent to the OpenTelemetry Collector, which processes and forwards them to SigNoz.
In our coffee shop microservices example, SigNoz plays the role of the observability backend, giving your team full visibility into traces, metrics, and logs generated by the app. Here’s how SigNoz helps the coffee shop:
- Traces: Visualize how an order moves through the system, from frontend-service to payment-service and inventory-service. Identify latency bottlenecks or failed calls.
- Metrics: Monitor key service-level indicators like espresso_orders_per_minute, latency, and error_rate without writing custom dashboards.
- Logs: Correlate logs with trace IDs and span IDs to troubleshoot order failures (e.g., inventory out-of-stock or payment declined).
For more information, refer to: