Micrometer Observability in Spring Boot: Unified Metrics and Tracing Guide

Micrometer Observability in Spring Boot

Micrometer's Observation API has become the standard way to instrument Spring Boot applications. Introduced in Micrometer 1.10 and adopted throughout Spring Boot 3.x, it lets developers produce metrics and traces, and correlate log lines with them, through a single API. This eliminates the fragmented instrumentation patterns that plagued earlier monitoring setups, where each signal lived in its own silo.

In this guide, you will learn how to set up the observability stack from scratch, configure exporters for Prometheus and Grafana Tempo, create custom observations, tune sampling and cardinality, and build production-ready dashboards. By the end, you will have a complete pipeline that correlates metrics with distributed traces automatically.

Why Unified Observability Matters

Traditional monitoring required separate libraries for metrics (Micrometer), tracing (Brave or OpenTelemetry), and logging (SLF4J). Each had its own configuration, context propagation, and export pipeline. Consequently, correlating a latency spike on a dashboard with the specific trace that caused it was a manual, error-prone process that often ended in guesswork.

The Observation API solves this by providing a single entry point. When you create an observation, it automatically generates both a timer metric and a trace span. Furthermore, it propagates context so that log statements within the observation include the trace ID. This means one line of instrumentation code produces three correlated signals instead of three disconnected ones.
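
The fan-out described above can be pictured with a toy model. The sketch below is not Micrometer code: ToyObservation, ToyHandler, ToyTimerHandler, and ToySpanHandler are hypothetical names invented for illustration, and the real library does considerably more (scopes, error handling, context propagation). It only shows the idea that one instrumentation point notifies several handlers, one standing in for metrics and one for tracing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: these types are hypothetical and are not Micrometer classes.
interface ToyHandler {
    void onStart(String name);
    void onStop(String name, long elapsedNanos);
}

final class ToyObservation {
    // Runs the work once and reports the same lifecycle to every handler.
    static <T> T observe(String name, List<ToyHandler> handlers, Supplier<T> work) {
        handlers.forEach(h -> h.onStart(name));
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            handlers.forEach(h -> h.onStop(name, elapsed));
        }
    }
}

// Stands in for the metrics side: keeps one duration per completed observation.
final class ToyTimerHandler implements ToyHandler {
    final List<Long> recordedNanos = new ArrayList<>();
    public void onStart(String name) { }
    public void onStop(String name, long elapsedNanos) { recordedNanos.add(elapsedNanos); }
}

// Stands in for the tracing side: records span open and close events.
final class ToySpanHandler implements ToyHandler {
    final List<String> events = new ArrayList<>();
    public void onStart(String name) { events.add("span-start:" + name); }
    public void onStop(String name, long elapsedNanos) { events.add("span-end:" + name); }
}
```

Calling ToyObservation.observe("payment.process", List.of(timer, span), () -> "ok") leaves one duration in the timer handler and a start/end pair in the span handler, from a single call site.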

Figure: Unified observability dashboard correlating metrics with distributed traces.

Setting Up the Observability Stack

Start by adding the necessary dependencies to your Spring Boot 3.x project. The key dependency is micrometer-tracing-bridge-otel, which bridges Micrometer’s Observation API to OpenTelemetry for trace export.

<dependencies>
    <!-- Core observability -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-observation</artifactId>
    </dependency>

    <!-- Tracing bridge to OpenTelemetry -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-otel</artifactId>
    </dependency>

    <!-- Export traces to Zipkin/Tempo -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-zipkin</artifactId>
    </dependency>

    <!-- Export metrics to Prometheus -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

    <!-- Spring Boot Actuator -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
</dependencies>

Next, configure the application properties to enable observation and trace export. Note the sampling probability — running at 100% is fine in development but expensive at production volume:

# application.yml
management:
  observations:
    key-values:
      application: order-service
  tracing:
    sampling:
      probability: 1.0  # 100% in dev, lower in prod
  endpoints:
    web:
      exposure:
        include: health,prometheus,metrics
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true
    tags:
      application: order-service

spring:
  application:
    name: order-service

Sampling Strategy: The Cost Lever You Cannot Ignore

Trace volume scales directly with traffic, and exporting every span to a backend like Tempo gets expensive fast. A fixed probability such as 0.1 (10%) is a sensible starting point for high-traffic services, but it has a flaw: low-frequency errors may never be sampled. Therefore, production teams typically pair a modest head-based probability with tail-based sampling at the OpenTelemetry Collector, which decides whether to keep a trace after it completes — keeping all errors and slow requests while dropping the boring fast ones.
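
To make the tail-based idea concrete, here is a minimal sketch of the decision in plain Java. It is not Collector configuration and not an OpenTelemetry API: CompletedTrace and TailSamplingPolicy are hypothetical names, and the slow-request threshold is an assumed parameter. The point is only that the decision runs after the trace has finished, when its outcome and duration are known.

```java
import java.time.Duration;

// Hypothetical summary of a finished trace; a real Collector evaluates policies over span data.
record CompletedTrace(String traceId, boolean hasError, Duration duration) { }

final class TailSamplingPolicy {
    private final Duration slowThreshold;

    TailSamplingPolicy(Duration slowThreshold) {
        this.slowThreshold = slowThreshold;
    }

    // Keep every failed trace and every trace at or above the threshold; drop the rest.
    boolean keep(CompletedTrace trace) {
        if (trace.hasError()) {
            return true;
        }
        return trace.duration().compareTo(slowThreshold) >= 0;
    }
}
```

With a threshold of 500 ms, a 20 ms trace that failed is kept, a 900 ms success is kept, and a 20 ms success is dropped. A production policy would usually also keep a small random share of the fast successes as a baseline.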

The practical rule is to make sampling per-environment configurable so you can crank it to 1.0 temporarily while debugging an incident, then return to a cost-effective baseline. Because the sampling decision propagates through trace headers, all downstream services honor the same choice, keeping each trace complete rather than fragmented.
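
One way to see why a head-based decision can stay consistent is to derive it from the trace ID itself. The sketch below is a simplified illustration in the spirit of ratio-based samplers, not the code any particular library ships; TraceIdRatioSampler is a hypothetical class, and it assumes a 32-character lowercase hex trace ID whose low 64 bits are effectively random.

```java
final class TraceIdRatioSampler {
    private final double probability;
    private final long threshold;

    TraceIdRatioSampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        // Long.MAX_VALUE as a double is 2^63, so this is the cut-off inside the 63-bit range.
        this.threshold = (long) (probability * Long.MAX_VALUE);
    }

    boolean shouldSample(String traceIdHex) {
        if (traceIdHex.length() != 32) {
            throw new IllegalArgumentException("expected a 32-character hex trace ID");
        }
        if (probability >= 1.0) {
            return true; // sample everything, regardless of the ID
        }
        // Take the low 64 bits and drop one bit so the value is never negative.
        long low64 = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long nonNegative = low64 >>> 1;
        return nonNegative < threshold;
    }
}
```

Because the result depends only on the trace ID and the configured probability, two instances configured the same way reach the same answer for the same trace. In practice the decision is usually made once at the edge and carried downstream in the sampled flag rather than recomputed.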

Creating Custom Observations

The Observation API lets you instrument business logic with a clean, fluent interface. Each observation automatically creates a timer metric and a trace span. Additionally, you can attach key-value pairs that appear as both metric tags and span attributes, which is exactly where most teams introduce subtle bugs if they are careless about cardinality.

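import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final ObservationRegistry registry;
    private final PaymentGateway gateway;

    // Constructor injection: Spring Boot auto-configures the ObservationRegistry bean
    public PaymentService(ObservationRegistry registry, PaymentGateway gateway) {
        this.registry = registry;
        this.gateway = gateway;
    }

    public PaymentResult processPayment(PaymentRequest request) {
        return Observation.createNotStarted("payment.process", registry)
            .contextualName("process-payment")
            // Bounded values: safe to export as metric tags
            .lowCardinalityKeyValue("payment.method", request.getMethod().name())
            .lowCardinalityKeyValue("currency", request.getCurrency())
            // Unbounded value: recorded on the span only
            .highCardinalityKeyValue("order.id", request.getOrderId())
            .observe(() -> {
                gateway.validate(request);
                return gateway.charge(request);
            });
    }
}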

Low-cardinality key-values (bounded values like payment method) are attached to both the timer metric and the span, while high-cardinality key-values (unbounded values like order IDs) are attached only to the span. This distinction is critical: every unique combination of low-cardinality tag values produces a separate time series in Prometheus, so putting an order ID or user ID there can generate millions of series and quietly take your monitoring stack down.
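
The arithmetic behind that warning is a simple product. The helper below is a hypothetical back-of-the-envelope estimator, not part of Micrometer or Prometheus, and the distinct-value counts used with it are assumed figures for illustration.

```java
import java.util.Map;

final class SeriesEstimator {
    // Worst case: every combination of tag values shows up, so the counts multiply.
    static long worstCaseSeries(Map<String, Long> distinctValuesPerTag) {
        long total = 1;
        for (long count : distinctValuesPerTag.values()) {
            total = Math.multiplyExact(total, count); // throws on overflow instead of wrapping
        }
        return total;
    }
}
```

Assuming 5 payment methods and 30 currencies, the worst case is 150 tag combinations. Add an order.id tag with a million distinct values and the same metric reaches 150,000,000 combinations. A timer also exports several series per combination (count, sum, and any histogram buckets), so the real figure is higher still.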

The Cardinality Explosion in Practice

To make the danger concrete, imagine tagging http.server.requests with the raw request URI. An endpoint like /orders/{id} appears to be one route, but if the path variable leaks into the tag you get /orders/1, /orders/2, and so on — a brand-new series per order. Within hours, Prometheus memory balloons and queries slow to a crawl. The fix is to ensure the URI tag uses the templated path, which Spring Boot does by default, and to never promote an identifier to a metric tag.

// BAD: user ID as a metric tag -> unbounded series, cardinality explosion
Metrics.counter("login.attempts", "user", userId).increment();

// GOOD: bounded outcome as the tag; identity belongs on the span, not the metric
Metrics.counter("login.attempts", "result", success ? "success" : "failure")
       .increment();
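
When a tag value comes from user input or an upstream system, one defensive pattern is to pass it through an allow-list before it reaches the meter. The class below is a hypothetical helper sketched for illustration, not a Micrometer API; the allowed values and the "other" fallback are assumptions.

```java
import java.util.Set;

final class BoundedTag {
    private final Set<String> allowed;
    private final String fallback;

    BoundedTag(Set<String> allowed, String fallback) {
        this.allowed = Set.copyOf(allowed); // defensive, immutable copy
        this.fallback = fallback;
    }

    // Returns the raw value only if it is on the allow-list; anything else collapses to the fallback.
    String normalize(String raw) {
        return raw != null && allowed.contains(raw) ? raw : fallback;
    }
}
```

With an allow-list of CARD, PAYPAL, and BANK_TRANSFER, an unexpected value such as a user ID is recorded as "other", so the number of series for that tag can never exceed the size of the list plus one.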

Observation Conventions for Consistency

For larger teams, define conventions to ensure consistent naming across services. This approach promotes standardized dashboards and alerts, and it centralizes the cardinality rules so individual developers cannot accidentally violate them.

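import io.micrometer.common.KeyValue;
import io.micrometer.common.KeyValues;
import io.micrometer.observation.GlobalObservationConvention;
import io.micrometer.observation.Observation;

// PaymentObservationContext is an application-defined Observation.Context subclass
public class PaymentObservationConvention
        implements GlobalObservationConvention<PaymentObservationContext> {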

    @Override
    public String getName() {
        return "payment.process";
    }

    @Override
    public KeyValues getLowCardinalityKeyValues(PaymentObservationContext ctx) {
        return KeyValues.of(
            KeyValue.of("payment.method", ctx.getMethod()),
            KeyValue.of("payment.status", ctx.getStatus()),
            KeyValue.of("region", ctx.getRegion())
        );
    }

    @Override
    public boolean supportsContext(Observation.Context context) {
        return context instanceof PaymentObservationContext;
    }
}

Figure: Distributed tracing flow across Spring Boot microservices.

Automatic HTTP Instrumentation

Spring Boot 3.x automatically instruments HTTP server requests through the Observation API, along with outgoing calls made through a WebClient or RestClient (the latter available from Spring Boot 3.2), provided the client is built from the auto-configured builder bean. Therefore, you get request-duration metrics, error rates, and distributed trace propagation without writing any instrumentation code. The trace context travels between services using the W3C traceparent header, so a single request that touches five services produces one connected trace.

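import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestClient;

@RestController
@RequestMapping("/api/orders")
public class OrderController {

    private final OrderService orderService;
    private final RestClient restClient;

    // Build from the auto-configured RestClient.Builder so outgoing calls are observed
    public OrderController(OrderService orderService, RestClient.Builder restClientBuilder) {
        this.orderService = orderService;
        this.restClient = restClientBuilder.build();
    }

    @GetMapping("/{id}")
    public ResponseEntity<Order> getOrder(@PathVariable Long id) {
        // Automatically observed: http.server.requests metric + trace span
        Order order = orderService.findById(id);

        // RestClient calls are also auto-instrumented;
        // trace context propagates via W3C traceparent headers
        CustomerDetails customer = restClient.get()
            .uri("http://customer-service/api/customers/{id}", order.getCustomerId())
            .retrieve()
            .body(CustomerDetails.class);

        return ResponseEntity.ok(order.withCustomer(customer));
    }
}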
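
For readers who want to see what actually travels between services, the sketch below parses and rebuilds a traceparent header in plain Java. It is an illustration of the version 00 layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags), not the propagator Spring Boot uses internally; TraceParent is a hypothetical record, and it keeps only the sampled bit from the flags field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

record TraceParent(String version, String traceId, String parentId, boolean sampled) {

    // Layout of a version 00 header: version-traceid-parentid-flags, all lowercase hex.
    private static final Pattern FORMAT =
        Pattern.compile("([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed traceparent: " + header);
        }
        int flags = Integer.parseInt(m.group(4), 16);
        // The lowest flag bit carries the sampling decision made upstream.
        return new TraceParent(m.group(1), m.group(2), m.group(3), (flags & 0x01) != 0);
    }

    // Same trace ID and sampling decision, new parent ID: what a service sends to the next hop.
    TraceParent childOf(String newParentId) {
        return new TraceParent(version, traceId, newParentId, sampled);
    }

    String toHeader() {
        return version + "-" + traceId + "-" + parentId + "-" + (sampled ? "01" : "00");
    }
}
```

The trace ID stays constant across all hops while the parent ID changes at each one, which is what lets the backend stitch five services into a single trace.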

Building Production Dashboards

With metrics flowing to Prometheus and traces to Grafana Tempo, you can build dashboards that correlate the two. The key is exemplars: metric samples that carry a trace ID, so you can click a spike on a latency chart and jump straight to the trace behind it. Prometheus stores exemplars only when it is started with --enable-feature=exemplar-storage, so add that flag to the Prometheus command if the links do not appear in Grafana. This single feature collapses the typical “I see a p99 spike but which request was it?” investigation from minutes to seconds.

# docker-compose.yml — observability stack
services:
  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  tempo:
    image: grafana/tempo:2.4.0
    command: ["-config.file=/etc/tempo.yml"]
    volumes:
      - ./tempo.yml:/etc/tempo.yml
    ports:
      - "3200:3200"   # Tempo API
      - "9411:9411"   # Zipkin receiver

  grafana:
    image: grafana/grafana:10.4.0
    environment:
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/ds.yml

Advanced: Custom ObservationHandler

For specialized requirements, you can create a custom observation handler that reacts to lifecycle events — start, stop, error, and scope changes. This is useful for audit logging, custom SLI calculation, or forwarding events to an external system without scattering that logic through your business code.

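import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import io.micrometer.tracing.handler.TracingObservationHandler;
import org.springframework.stereotype.Component;

@Component
public class AuditObservationHandler
        implements ObservationHandler<PaymentObservationContext> {

    private final AuditLog auditLog;

    public AuditObservationHandler(AuditLog auditLog) {
        this.auditLog = auditLog;
    }

    @Override
    public void onStop(PaymentObservationContext context) {
        auditLog.record(AuditEntry.builder()
            .action("payment.processed")
            .status(context.getStatus())
            .traceId(traceIdOf(context))
            // Assumes the application-defined context tracks its own duration
            .duration(context.getDuration())
            .build());
    }

    // Micrometer Tracing stores span state under TracingContext; it is absent when tracing is off
    private static String traceIdOf(Observation.Context context) {
        TracingObservationHandler.TracingContext tracing =
            context.get(TracingObservationHandler.TracingContext.class);
        return tracing != null && tracing.getSpan() != null
            ? tracing.getSpan().context().traceId()
            : null;
    }

    @Override
    public boolean supportsContext(Observation.Context context) {
        return context instanceof PaymentObservationContext;
    }
}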

When NOT to Use Micrometer Observations (Trade-offs)

While the Observation API is powerful, there are scenarios where it adds unnecessary overhead. Avoid wrapping extremely hot loops or CPU-bound computations: the per-observation cost, though small, compounds at millions of iterations per second and can distort the very latency you are trying to measure. For simple in-memory work, a plain counter or gauge is cheaper and clearer.

Furthermore, if you run a non-Spring framework that already ships its own instrumentation, such as Quarkus or Micronaut, layering Micrometer observations on top can create duplicate spans and double-counted metrics. In those cases, prefer the framework’s native instrumentation. Finally, remember that observability is not free at the backend either: high sampling combined with high cardinality is the fastest route to a large bill and slow queries, so treat both as budgets to manage rather than dials to max out.

Figure: Production monitoring dashboard with correlated metrics and traces.

Key Takeaways

The Observation API in Spring Boot 3.x unifies metrics, traces, and log correlation behind a single instrumentation point.
Use lowCardinalityKeyValue for bounded values that become metric tags and highCardinalityKeyValue for identifiers that belong only on spans, to avoid cardinality explosions.
Tune sampling per environment, and add tail-based sampling at the Collector so you keep errors and slow traces without paying for everything.
Spring Boot 3.x auto-instruments HTTP and RestClient calls — custom observations are only needed for business logic.
Exemplars bridge the gap between metrics dashboards and individual traces for fast root-cause analysis.

In conclusion, Micrometer's Observation API is foundational for operating modern Spring Boot services. By unifying metrics, traces, and logs, guarding cardinality, and budgeting sampling deliberately, you can build systems that are observable, debuggable, and affordable. Start with the auto-instrumented basics, add custom observations only where business logic demands them, and continuously measure to confirm the signals are actually earning their keep.
