Understanding Observability with OpenTelemetry and Coffee

Solutions are increasingly built on a microservices architecture, which leads to complex distributed systems. Monitoring these systems is challenging because of the diversity of tools, protocols, and data formats involved.

This blog focuses on:

  • Explaining the basics of OpenTelemetry, its role in observability, and the current state of observability in the industry.
  • Explaining how to instrument code and identify when to use manual and automatic instrumentation.
  • Discussing the OpenTelemetry Collector and Connector, which are responsible for processing and forwarding telemetry data.

What is OpenTelemetry?

OpenTelemetry addresses these challenges by providing a unified framework for collecting, processing, and exporting telemetry data, enabling you to gain deep insight into your apps' behavior.

For this guide, consider a modern coffee shop app with the following microservices:

  • Order Service: Handles customer orders.
  • Payment Service: Processes payments.
  • Inventory Service: Manages stock levels.
  • Notification Service: Sends order confirmations.

A visual asset displaying the flow of data from Service -> OTel Collector -> Observability Backend

Each service operates independently, possibly written in different languages and deployed across various environments. When a customer places an order, the request traverses multiple services, making it essential to have a comprehensive observability solution to monitor and troubleshoot the system effectively.

Key Benefits of OpenTelemetry:

  • Unified Instrumentation: Instrument your code once and send telemetry data to multiple backends without re-instrumentation.
  • Vendor-Neutral: Avoid vendor lock-in by using standard APIs and protocols; you can switch platforms without re-instrumenting your entire solution.
  • Unified telemetry: Combines tracing, logging, and metrics into a single framework, enabling correlation across all telemetry data and establishing an open standard for it.
    • Linking these signals helps you make better decisions.
  • Community-Driven: Benefit from a vibrant open-source community contributing to continuous improvements.
  • Improved Correlation: Easily correlate data across different telemetry signals for better insights.

Three pillars of Observability

Telemetry is collected via instrumentation and flows through a pipeline that enriches, batches, and stores it for later analysis. Most observability tooling revolves around three categories of telemetry: logs, metrics, and traces.

While they share architectural similarities such as instrumentation, ingestion, storage, and visualization, each type presents unique challenges and is best suited to answering different kinds of questions.

Logs

Logs are immutable, timestamped records of discrete events. Each log entry typically contains a message and optional structured metadata. However, coming up with a standardized log format is no easy task, since different pieces of information are critical for different types of software.

Logging agents and protocols can forward logs to a central location for efficient storage. For example, consider a user placing an order in the microservices-based coffee shop app. The order-service logs a line like:

{  
  "timestamp": "2025-06-01T08:43:12Z",  
  "level": "INFO",  
  "service": "order-service",  
  "message": "New order placed: latte",  
  "order\_id": "ORD-20250601-001"  
}

Metrics

Metrics help you understand a high-level view of the current state of your system. A metric is a single numerical value derived by applying a statistical measure to a group of events.

In other words, metrics represent an aggregate. This is useful because their compact representation allows us to graph how a system changes over time.

Different Metric Types:

  • Counters: Total number of orders placed
  • Gauges: Current number of in-progress orders
  • Histograms: Distribution of order preparation times
  • Summaries: Quantiles of response times

Coffee Shop Example: The order-service emits a metric like:

orders_placed_total{beverage="latte"} 1560

A visual asset displaying Metrics agent sending data to Prometheus DB and then Grafana Dashboard.

A Prometheus dashboard may show a sharp spike in latte orders, suggesting a promotional campaign is working or an anomaly is occurring.
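
As a rough sketch of how such a counter could be emitted from the order-service, the .NET metrics API (System.Diagnostics.Metrics), which OpenTelemetry builds on, might be used like this. The meter and counter names here are illustrative assumptions:

using System.Diagnostics.Metrics;

// Meter and counter names are illustrative; align them with your own conventions.
var meter = new Meter("OrderService.Metrics");
var ordersPlaced = meter.CreateCounter<long>("orders_placed_total",
    description: "Total number of orders placed");

// Record one latte order; the "beverage" tag becomes a metric label in the backend.
ordersPlaced.Add(1, new KeyValuePair<string, object?>("beverage", "latte"));

// To export this, the meter must be registered with the OpenTelemetry SDK,
// e.g. .WithMetrics(m => m.AddMeter("OrderService.Metrics")) plus an exporter.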

Traces

To understand the larger context in a distributed solution, you must identify related events, such as the specific request or transaction that produced a log entry and the sequence of services involved in processing that request across the system.

Traces visualize the full journey of a single request across services. A trace consists of multiple spans, each representing a step in the request’s lifecycle. This makes it possible to reconstruct the journey of requests in the system.

Coffee Shop Example: A user places an order. The request flows through UI -> order-service -> payment-service -> inventory-service.

A visual asset displaying flow of request through UI-> order-service -> payment-service -> inventory-service.

Each service adds a span with trace and span IDs, allowing you to view:

  • Total request duration
  • Which service caused a delay
  • Any failed steps in the chain
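
As a minimal sketch of how two such spans might be created in .NET, assuming the OpenTelemetry SDK has already been configured to listen to an ActivitySource named "CoffeeShop" (the source name and tags are illustrative):

using System.Diagnostics;

var source = new ActivitySource("CoffeeShop");

// Parent span: the order-service handles the incoming order.
using (var order = source.StartActivity("PlaceOrder", ActivityKind.Server))
{
    order?.SetTag("beverage", "latte");

    // Child span: the outgoing call to the payment-service. It inherits the
    // parent's TraceId, so both steps appear in the same trace.
    using (var payment = source.StartActivity("ChargeCard", ActivityKind.Client))
    {
        payment?.SetTag("payment.amount", 4.50);
    }
}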

Problems with the Current Observability Approach

Logs, metrics, and traces typically live in separate systems, with different formats and tooling. This fragmentation forces you to jump between dashboards and correlate data manually. Even with shared metadata like timestamps or service names, stitching information together remains time-consuming and error-prone.

Coffee Shop Example: Imagine a spike in failed order-service requests. You check metrics and see a high error rate. You then switch to logs, scan for failures, and try to match logs with trace IDs. Without consistent context, root cause analysis becomes guesswork.

Lack of Built-in Instrumentation in Open Source Software

Many open source libraries expose hooks but do not include native telemetry support. You must build and maintain custom adapters.

Problems this causes:

  • Version Compatibility: Library updates may break adapters.
  • Telemetry Loss: Converting data between formats can degrade signal quality.
  • Engineering Overhead: Teams spend time wiring telemetry instead of building features.

Coffee Shop Example: If the inventory-service uses a third-party stock manager with no OpenTelemetry support, you must manually instrument it or depend on its observability hooks.

What OpenTelemetry is NOT

OpenTelemetry simplifies telemetry collection and export, but it doesn’t offer end-to-end observability out of the box. It’s a toolkit, not a monitoring platform.

OpenTelemetry does not handle:

  • Data storage: OpenTelemetry exports data; it doesn't store it. You'll need systems like SigNoz, Prometheus, or Elasticsearch.
  • Visualization: No dashboards or charts are included. Use tools like Grafana, Jaeger, or Datadog.
  • Alerting: OpenTelemetry doesn't generate alerts. Integrate it with systems that support alert rules.
  • Monitoring out of the box: It doesn't auto-instrument everything or provide prebuilt dashboards. You must configure and integrate.
  • Performance optimization: It helps identify bottlenecks, but doesn't tune your app.

OpenTelemetry standardizes how you collect logs, metrics, and traces. It enables observability, but doesn’t deliver it on its own. You still need storage, visualization, alerting, and analysis platforms to complete the picture.

Signals in OpenTelemetry

OpenTelemetry organizes observability data into three core signals:

  • Traces: Capture the lifecycle and flow of a request across services.
  • Metrics: Measure system and app performance over time.
  • Logs: Record discrete events and state changes in the app.

Each signal is independent but can be correlated to provide richer observability. OpenTelemetry’s architecture ensures signal consistency and interoperability across programming languages through its official OpenTelemetry Specification.

A visual asset displaying flow of data in OpenTelemetry Specifications

OpenTelemetry Specification Components

  • Common Terminology: Ensures a consistent vocabulary across implementations.
  • API Specification: Provides language-agnostic interfaces to generate telemetry (traces, metrics, logs). APIs are backend-agnostic and enable portable instrumentation.
  • SDK Specification: Defines how SDKs process, sample, and export telemetry. Ensures consistent behavior across languages.
  • Semantic Conventions: Standardizes names and attributes for telemetry data (e.g., HTTP status codes, DB queries).
  • OpenTelemetry Protocol (OTLP): Describes a vendor-neutral transport protocol for sending telemetry data.

Why separate API from SDK?

The API–SDK split improves modularity, portability, and vendor neutrality:

  • Library safety: A shared library (e.g., a database driver) can safely include only the API, avoiding heavy SDK dependencies and conflicts in user apps.
  • Portability: You can ship apps with OpenTelemetry APIs baked in, and let platform teams decide which SDK/exporter to use later.
  • Flexibility: You can write your own SDK or replace components (e.g., use a custom sampler or exporter).

OpenTelemetry API vs SDK

  • Purpose: the API defines interfaces to generate telemetry; the SDK implements the logic to process and export it.
  • Responsibility: the API exposes functions to create spans, metrics, and logs; the SDK manages batching, sampling, context, and export.
  • Language-specific: both the API and the SDK are implemented per language.
  • Included by default: the API is lightweight and included; the SDK must be explicitly added and configured.
  • Default behavior: the API is a no-op on its own; the SDK becomes active once configured.
  • Used by: app and library developers call the API; DevOps, SREs, and platform engineers configure the SDK.
  • Stability: the API is highly stable; the SDK may evolve with backend and exporter needs.
  • Customizable: the API is not; the SDK is (exporters, processors, samplers).

For example, consider the following scenarios:

  • Open-source library with tracing support: API only (lightweight, no dependencies).
  • Production microservice exporting to Grafana: API + SDK + OTLP exporter.
  • CLI tool needing optional debug tracing: API, with the SDK enabled conditionally.
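
To make the API/SDK split concrete, here is a hedged sketch: library code depends only on the API (an ActivitySource) and produces no-op spans until a host configures an SDK. The source name, service name, and console exporter (from the OpenTelemetry.Exporter.Console package) are assumptions for illustration:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Library side: depends only on the API. Without a configured SDK,
// StartActivity() returns null and instrumentation is a no-op.
var source = new ActivitySource("CoffeeShop.StockManager");

// Host side: the SDK decides how spans are processed and exported.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("inventory-service"))
    .AddSource("CoffeeShop.StockManager")
    .AddConsoleExporter()
    .Build();

// The same API call now produces a real span.
using var activity = source.StartActivity("ReserveBeans");
activity?.SetTag("beans.kg", 1);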

How to Instrument Code with OpenTelemetry

OpenTelemetry supports multiple instrumentation approaches to capture telemetry from apps. Understanding these methods helps choose the right approach based on your app’s complexity, development stage, and observability goals.

OpenTelemetry classifies instrumentation into three categories, often overlapping in practice:

  • Automatic Instrumentation (Zero-Code): low effort, limited control, minimal customization, no code changes.
  • Instrumentation Libraries: moderate effort, medium control, moderate customization, minimal to moderate code changes.
  • Manual Instrumentation (Fully Code-Based): high effort, full control, full customization, extensive code changes.

OpenTelemetry provides three ways to capture telemetry from your app:

1. Automatic Instrumentation

Auto-instrumentation in .NET 8 is available via the OpenTelemetry .NET Auto-Instrumentation Agent, which instruments common libraries like ASP.NET Core, HttpClient, and SQL clients at runtime.

Ideal use case: Use this to quickly add observability to .NET services without modifying source code.

Example: orders-service (.NET 8, ASP.NET Core)

  1. Download and install the auto-instrumentation binaries from the .NET Auto-Instrumentation GitHub repository.
  2. Run the app with the auto-instrumentation profiler enabled:
set OTEL_SERVICE_NAME=orders-service
set OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
set CORECLR_ENABLE_PROFILING=1
rem CORECLR_PROFILER must also be set to the profiler CLSID documented in the auto-instrumentation repository
set CORECLR_PROFILER_PATH=C:\otel-dotnet-auto\OpenTelemetry.AutoInstrumentation.Native.dll
dotnet run

What it captures:

  • HTTP requests and responses
  • Outgoing HTTP/gRPC calls
  • SQL queries via ADO.NET

Pros:

  • No code changes required
  • Fast onboarding
  • Works well for ASP.NET Core, Entity Framework, and HttpClient

Cons:

  • Limited to supported libraries
  • Less control over span names and metadata

2. Library-Based Instrumentation

Library-based instrumentation uses the OpenTelemetry SDK and prebuilt instrumentations like AddAspNetCoreInstrumentation.

Ideal use case: You want to customize configuration and capture high-value signals without full manual control.

Example: menu-service (.NET 8, ASP.NET Core)

Install NuGet packages:

dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

Configure in Program.cs:

using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .SetResourceBuilder(
                ResourceBuilder.CreateDefault()
                    .AddService("menu-service"))
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            .AddOtlpExporter(otlp =>
            {
                otlp.Endpoint = new Uri("http://otel-collector:4317");
            });
    });

var app = builder.Build();
app.MapGet("/", () => "Hello from Menu Service");
app.Run();

What it captures:

  • Inbound ASP.NET Core request spans
  • Outbound calls (HttpClient, gRPC)
  • Custom span and resource metadata

Pros

  • Easy to configure
  • Integrates well with DI and hosting model
  • Supports enrichment and filtering

Cons

  • Requires adding code/configuration
  • Less flexible than full manual instrumentation

3. Manual Instrumentation

Manual instrumentation lets you define custom spans for critical business logic (e.g., awarding loyalty points or calculating discounts).

Ideal use case: You need to trace domain-specific logic not covered by auto or library-based methods.

Example: loyalty-service (.NET 8 Worker Service)

Install packages:

dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

Configure tracing in Program.cs:

using OpenTelemetry.Trace;
using OpenTelemetry.Resources;
using System.Diagnostics;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerProviderBuilder =>
    {
        tracerProviderBuilder
            .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("loyalty-service"))
            .AddSource("LoyaltyService")
            .AddOtlpExporter(options =>
            {
                options.Endpoint = new Uri("http://otel-collector:4317");
            });
    });

var app = builder.Build();
await app.StartAsync(); // start the host so the configured TracerProvider begins listening

// Start a custom span manually
var source = new ActivitySource("LoyaltyService");

using (var activity = source.StartActivity("AwardLoyaltyPoints", ActivityKind.Internal))
{
    activity?.SetTag("customer.id", "cust-123");
    activity?.SetTag("points.awarded", 20);

    // Simulate business logic
    Console.WriteLine("Loyalty points awarded.");
}

await app.StopAsync(); // flush pending spans and shut down the exporter

What it captures:

  • Custom spans for logic like point calculations
  • Rich metadata (tags)
  • Correlation with other telemetry (metrics/logs)

Pros:

  • Full control over telemetry
  • Capture domain-specific operations
  • High value for debugging or performance tuning

Cons:

  • Requires development effort
  • Must manage span lifecycle correctly
  • Potential for inconsistent usage without guidelines

A visual asset displaying three ways to capture telemetry from your app and send it over OTLP

Overlaps and Clarifications

  • Instrumentation libraries sometimes provide automatic instrumentation after import, blurring the line between zero-code and code-based.
  • Under the hood, all approaches use some form of libraries.
  • Zero-code is broad and quick; libraries add customization; manual is full control.

Recommended Approach and Strategy

  1. Start with automatic instrumentation to gain immediate insight with minimal effort.
  2. Add instrumentation libraries where you need more coverage or framework-specific tracing.
  3. Use manual instrumentation for critical business logic or custom metrics requiring fine-grained control.

Why use OpenTelemetry Collector?

The OpenTelemetry Collector is a vendor-agnostic, standalone service that simplifies telemetry management in production. It decouples telemetry generation from ingestion and export.

The Collector provides three core capabilities:

  • Receive: Accepts telemetry from apps, agents, or other Collectors via OTLP or other supported protocols.
  • Process: Filters, enriches, transforms, batches, or samples telemetry data.
  • Export: Sends processed data to one or more observability backends.

A visual asset displaying information sent from OTel Collector to Prometheus, Jaeger and S3

Key benefits

  • Without a Collector, apps must export data directly to each backend; with one, you get a central point of control for all telemetry.
  • Without a Collector, apps risk tight coupling to backend protocols; the Collector decouples app logic from backend details.
  • Without a Collector, consistent processing is difficult to enforce; the Collector applies transformations consistently.
  • Without a Collector, there is no central routing or batching; the Collector routes and batches data efficiently.

Understanding OpenTelemetry Protocol (OTLP)

OTLP is the native telemetry transport used across OpenTelemetry. It standardizes how telemetry is serialized, transmitted, and received.

Key benefits:

  • Unified: Handles traces, metrics, and logs in one format.
  • Vendor-neutral: Reduces backend lock-in and removes the need for per-backend custom exporters.
  • Efficient: Uses gRPC and Protobuf for high-performance streaming.
  • Extensible: Schema evolves without breaking compatibility.
  • Integrated: Collector and most observability tools support OTLP out of the box.

OTLP transport options

  • gRPC with Protobuf: the default, for performance and bi-directional streaming.
  • HTTP/1.1 with JSON: debugging and human-readable payloads.
  • HTTP/2 with Protobuf: an efficient, firewall-friendly alternative to gRPC.
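
For instance, the .NET OTLP exporter can be switched from the default gRPC transport to Protobuf over HTTP. This is a sketch assuming the Collector's default ports (4317 for gRPC, 4318 for HTTP); the source name and endpoint are illustrative:

using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

// Illustrative: export traces over HTTP + Protobuf instead of the default gRPC,
// useful where proxies or firewalls block gRPC traffic.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource("CoffeeShop")
    .AddOtlpExporter(otlp =>
    {
        otlp.Endpoint = new Uri("http://otel-collector:4318/v1/traces"); // HTTP port and signal path
        otlp.Protocol = OtlpExportProtocol.HttpProtobuf;
    })
    .Build();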

Example: Pre-OTLP vs OTLP

  • Before OTLP: a Prometheus exporter, a Zipkin exporter, and a Fluentd plugin, with multiple exporters configured in each service.
  • With OTLP: one OTLP exporter per service and one Collector instance, giving a centralized, simplified telemetry pipeline.

In OpenTelemetry, logs are a critical signal for observability. Any data that isn’t a trace or metric is categorized as a log. Events, for instance, are specialized log entries.

Unlike traces and metrics, which OpenTelemetry implements via dedicated APIs and SDKs, logging is designed to integrate with existing logging frameworks in various programming languages. Instead of requiring a brand-new logging API, OpenTelemetry provides a Logs Bridge API that links traditional logging systems with telemetry signals such as traces and metrics.

How Logging Works in OpenTelemetry

You instrument logging using the Logs Bridge API, which connects popular logging frameworks (like Serilog, ILogger, or log4net in .NET) to OpenTelemetry’s pipeline.

Key Components

  • LoggerProvider: Factory for creating loggers.
  • Logger: Used to create log entries (LogRecord).
  • LogRecord: Represents a single log entry with metadata.
  • LogRecordExporter: Sends logs to destinations like the OpenTelemetry Collector.
  • LogRecordProcessor: Processes logs before they’re exported.

LogRecord Structure

A LogRecord typically includes:

  • timestamp: When the log occurred.
  • trace_id, span_id: Links to a trace/span for correlation.
  • severity_text: e.g., INFO, WARNING, ERROR.
  • body: The log message or structured content.
  • attributes: Custom metadata (e.g., user.id, http.method).

A visual asset displaying flow of logs from OpenTelemetry Logging module

Example Use Case: The coffee app has a /get_coffee endpoint. When a coffee request fails due to a missing ID, the app logs this event.

logger.error("Missing coffee ID", extra={"http.status_code": 400, "coffee_id": None})

This log entry can be linked to the trace of the request, helping correlate the failure with upstream service calls and backend metrics.
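
In .NET, the same idea might look roughly like the sketch below: a standard ILogger call is bridged into OpenTelemetry LogRecords and exported over OTLP. This assumes the OpenTelemetry.Extensions.Hosting and OpenTelemetry.Exporter.OpenTelemetryProtocol packages; the service name, endpoint, and route are placeholders:

using OpenTelemetry.Logs;
using OpenTelemetry.Resources;

var builder = WebApplication.CreateBuilder(args);

// Bridge ILogger into the OpenTelemetry log pipeline and export via OTLP.
builder.Logging.AddOpenTelemetry(logging =>
{
    logging.SetResourceBuilder(
        ResourceBuilder.CreateDefault().AddService("coffee-service"));
    logging.IncludeScopes = true; // keep ambient scope context on each LogRecord
    logging.AddOtlpExporter(otlp =>
    {
        otlp.Endpoint = new Uri("http://otel-collector:4317");
    });
});

var app = builder.Build();

app.MapGet("/get_coffee", (string? id, ILogger<Program> logger) =>
{
    if (string.IsNullOrEmpty(id))
    {
        // Structured fields become LogRecord attributes; if a trace is active,
        // trace_id and span_id are attached automatically for correlation.
        logger.LogError("Missing coffee ID, returning status {HttpStatusCode}", 400);
        return Results.BadRequest("Missing coffee ID");
    }

    return Results.Ok($"Brewing coffee {id}");
});

app.Run();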

Collector Configuration

The OpenTelemetry Collector decouples telemetry generation from backend concerns. It processes logs, traces, and metrics independently.

Collector Pipeline Example

receivers:
  otlp:
    protocols:
      grpc:


processors:
  batch: {}


exporters:
  logging:
    loglevel: debug


service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]

Collector Deployment Topologies

The OpenTelemetry Collector supports multiple deployment models, allowing you to tailor observability pipelines based on your architecture and scalability needs. Each topology serves different use cases—from tightly coupled microservices to centralized processing in large-scale environments.

Sidecar Deployment: The OpenTelemetry Collector runs as a sidecar alongside each application instance. This setup is common in containerized environments like Kubernetes, where the Collector is injected into each Pod.

A visual asset displaying Sidecar Deployment in OpenTelemetry Collector

Advantages:

  • Low latency: The Collector runs on the same host or Pod, reducing network overhead for exporting telemetry data.
  • Isolation: Each service has a dedicated Collector instance, ensuring telemetry data stays service-specific and avoids cross-contamination.
  • Simplified trace correlation: Local logs, traces, and metrics can be more easily linked.

Ideal for microservices architectures where services operate independently and require individual telemetry pipelines.

Node Agent Deployment: A single Collector instance runs per host or node. This is typically implemented as a Kubernetes DaemonSet or a similar system service in virtual machine environments.

A visual asset displaying Node Agent Deployment in OpenTelemetry Collector

Advantages:

  • Centralized control per node: One Collector handles telemetry for all services on the same node.
  • Resource-efficient: Fewer Collector instances are required compared to the sidecar model.
  • System metrics access: Can collect host-level metrics (CPU, memory, disk, etc.) in addition to application telemetry.

Ideal Use Case:

  • Suitable for clusters with many lightweight services that share node resources.
  • Often used to monitor node-level infrastructure and runtime metrics alongside service-level data.

Standalone or Gateway Deployment: The Collector runs as a dedicated service, often behind a load balancer. Applications send telemetry data remotely to this central Collector (typically over OTLP).

A visual asset displaying Standalone Deployment in OpenTelemetry Collector

Advantages:

  • Scalability: A centralized Collector cluster can scale independently from application workloads.
  • Simplified configuration management: Telemetry pipelines and transformations are managed in one place.
  • Decoupling from application logic: Developers don’t need to worry about backend changes or exporter configurations.

Ideal Use Case:

  • Best suited for large-scale systems with high telemetry volume.
  • Useful for teams that want to offload all processing from applications and maintain a consistent observability architecture.

Benefits of OpenTelemetry Collectors:

  • Separation of concerns: Developers emit logs; operators manage pipelines.
  • Centralized management: All configuration is in one place.
  • Resource efficiency: Offloads processing from the app.
  • No redeployments needed: Change pipelines without touching app code.

SigNoz with OpenTelemetry

SigNoz is a powerful observability platform built specifically for OpenTelemetry. It provides a seamless experience for collecting, storing, visualizing, and querying telemetry data, without vendor lock-in.

With OpenTelemetry, you collect signals (logs, metrics, and traces) from the coffee shop services. These signals are sent to the OpenTelemetry Collector, which processes and forwards them to SigNoz.

A visual asset displaying flow of metrics from OTel Collector to SigNoz

In our coffee shop microservices example, SigNoz plays the role of the observability backend, giving your team full visibility into traces, metrics, and logs generated by the app. Here’s how SigNoz helps the coffee shop:

  • Traces: Visualize how an order moves through the system, from frontend-service to payment-service and inventory-service. Identify latency bottlenecks or failed calls.
  • Metrics: Monitor key service-level indicators like espresso_orders_per_minute, latency, and error_rate without writing custom dashboards.
  • Logs: Correlate logs with trace IDs and span IDs to troubleshoot order failures (e.g., inventory out-of-stock or payment declined).

A screenshot displaying logs linked with traces in SigNoz

