
OpenTelemetry Complete Guide: Unified Observability for Modern Applications

20 min read · by DevToolBox Team
TL;DR: OpenTelemetry is the CNCF open-source observability framework that unifies distributed traces, metrics, and logs. It provides a standardized API across languages, a vendor-agnostic Collector for data collection and export, supports auto and manual instrumentation, and is the foundation for building modern observability platforms.
Key Takeaways
  • OTel unifies three signals: traces, metrics, and logs with correlated context
  • Architecture splits into API (interfaces), SDK (implementation), and Collector (data pipeline)
  • Auto-instrumentation generates telemetry data with zero code changes
  • OTLP is the standard protocol supported by all major backends
  • Tail-based sampling reduces costs while retaining error traces
  • Kubernetes Operator simplifies in-cluster deployment and management

What Is OpenTelemetry?

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework for generating, collecting, and exporting telemetry data. Hosted by the CNCF, it was formed by merging OpenTracing and OpenCensus, providing a unified set of APIs, SDKs, and tools covering traces, metrics, and logs.

OpenTelemetry Architecture

OTel architecture consists of three layers: API (defines interfaces), SDK (provides implementation), and Collector (data pipeline).

Application Layer           Collector Layer            Backend Layer
+--------------------+     +--------------------+    +------------+
| OTel API           |     | Receivers          |    | Jaeger     |
|  TracerProvider    | --> |  otlp, jaeger,     | -> | Tempo      |
|  MeterProvider     |     |  prometheus, zipkin|    | Zipkin     |
|  LoggerProvider    |     +--------------------+    +------------+
+--------------------+     | Processors         |    | Prometheus |
| OTel SDK           |     |  batch, filter,    | -> | Mimir      |
|  SpanProcessor     |     |  attributes, sample|    | Datadog    |
|  MetricReader      |     +--------------------+    +------------+
|  LogRecordProcessor|     | Exporters          |    | New Relic  |
|  OTLP Exporter     |     |  otlp, prometheus, | -> | Grafana    |
+--------------------+     |  datadog, debug    |    | Loki       |
                           +--------------------+    +------------+

API Layer

The API defines zero-dependency interfaces (TracerProvider, MeterProvider, LoggerProvider). Library authors instrument against the API without pulling in specific implementations.
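The API/SDK split follows a familiar pattern: the API ships no-op defaults, and a registered implementation replaces them at runtime. A minimal illustrative sketch of that pattern (simplified TypeScript, not the actual OTel source):

```typescript
// Illustrative sketch of the API/SDK split (not actual OTel code).
// The "API" defines interfaces plus a no-op default; the "SDK"
// registers a real implementation at application startup.
interface Span { end(): void }
interface Tracer { startSpan(name: string): Span }

class NoopTracer implements Tracer {
  startSpan(_name: string): Span { return { end() {} }; }
}

let registered: Tracer = new NoopTracer();

// Library code depends only on this accessor (the "API").
export function getTracer(): Tracer { return registered; }

// Application code installs the concrete implementation (the "SDK").
export function setTracerProvider(tracer: Tracer): void { registered = tracer; }

// A library instrumented against the API works even with no SDK installed:
const span = getTracer().startSpan('lib.operation');
span.end(); // no-op until an SDK is registered
```

This is why library authors can safely depend on `@opentelemetry/api`: with no SDK present, every call degrades to a cheap no-op.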

SDK Layer

The SDK provides concrete implementations including Span processors, metric aggregators, and exporters. Application developers configure the SDK at the entry point.

Collector Layer

The Collector is a standalone service that receives, processes, and exports data. It decouples applications from backends, supporting batching, retries, and multi-destination export.

Three Signals: Traces, Metrics, and Logs

Distributed Traces

Traces record the complete path a request takes through a distributed system. A Trace is composed of multiple Spans forming a tree via parent-child relationships.

Trace: [trace_id: abc123]
|
+-- Span A: API Gateway (root span, 250ms)
|   attributes: http.method=GET, http.url=/api/orders
|   +-- Span B: Order Service (200ms)
|   |   +-- Span C: DB Query (45ms, db.system=postgresql)
|   |   +-- Span D: Cache Lookup (3ms, db.system=redis)
|   +-- Span E: Payment Service (35ms, status=ERROR)

Metrics

OTel Metrics defines Counter (monotonically increasing), Histogram (distribution), Gauge (instantaneous), and UpDownCounter (bidirectional).

Logs

OTel Logs integrates with existing frameworks (Log4j, SLF4J, Python logging) via a Bridge API, correlating log records with trace context.

Installation and Setup

Node.js

npm install @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-metrics-otlp-http
// tracing.ts - Initialize OpenTelemetry
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from
  '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from
  '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from
  '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from
  '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://localhost:4318/v1/metrics',
    }),
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'my-node-service',
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());

Python

pip install opentelemetry-api opentelemetry-sdk \
  opentelemetry-exporter-otlp opentelemetry-instrumentation

# Auto-instrument - no code changes needed:
opentelemetry-instrument \
  --service_name my-python-service \
  --traces_exporter otlp \
  --metrics_exporter otlp \
  --exporter_otlp_endpoint http://localhost:4317 \
  python app.py

Go

go get go.opentelemetry.io/otel \
  go.opentelemetry.io/otel/sdk \
  go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp

// main.go
import (
  "context"

  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
  "go.opentelemetry.io/otel/sdk/resource"
  sdktrace "go.opentelemetry.io/otel/sdk/trace"
  semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func initTracer() (func(context.Context) error, error) {
  exporter, err := otlptracehttp.New(
    context.Background(),
    otlptracehttp.WithEndpoint("localhost:4318"),
    otlptracehttp.WithInsecure(),
  )
  if err != nil { return nil, err }
  tp := sdktrace.NewTracerProvider(
    sdktrace.WithBatcher(exporter),
    sdktrace.WithResource(resource.NewWithAttributes(
      semconv.SchemaURL,
      semconv.ServiceNameKey.String("my-go-service"),
    )),
  )
  otel.SetTracerProvider(tp)
  return tp.Shutdown, nil
}

Java

# Download the Java agent and run with your app
curl -L -o opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/\
opentelemetry-java-instrumentation/releases/latest/\
download/opentelemetry-javaagent.jar

java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-java-service \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -jar myapp.jar

Auto-Instrumentation

Auto-instrumentation patches popular libraries to generate spans and propagate context without code changes. Commonly supported libraries:

  • Node.js: Express, Fastify, HTTP, gRPC, pg, mysql2, Redis, MongoDB, AWS SDK
  • Python: Flask, Django, FastAPI, requests, psycopg2, SQLAlchemy, Redis, Celery
  • Go: net/http, gRPC, database/sql, Gin, Echo
  • Java: Spring Boot, Servlet, JDBC, Hibernate, Kafka, gRPC, OkHttp
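Instrumentation scope is tunable without code changes. In Node.js, for example, individual instrumentations can be toggled via environment variables (variable names assume the conventions documented for `@opentelemetry/auto-instrumentations-node`; check your installed version):

```shell
# Enable only the instrumentations you need (comma-separated suffixes)
export OTEL_NODE_ENABLED_INSTRUMENTATIONS="http,express,pg"

# ...or keep everything on and disable noisy ones
export OTEL_NODE_DISABLED_INSTRUMENTATIONS="fs,dns"
```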

Manual Instrumentation

Manual instrumentation gives full control over telemetry data for business-specific spans, custom attributes, and metrics.

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service', '1.0.0');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      span.addEvent('order.validation.started');
      const order = await validateOrder(orderId);
      span.addEvent('order.validation.completed', {
        'order.items_count': order.items.length,
      });
      // Nested span for payment; end it in finally so it
      // closes even if chargePayment throws
      await tracer.startActiveSpan('processPayment',
        async (paymentSpan) => {
          try {
            paymentSpan.setAttribute(
              'payment.method', order.paymentMethod);
            await chargePayment(order);
          } finally {
            paymentSpan.end();
          }
        });
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message,
      });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

Custom Metrics

import { metrics } from '@opentelemetry/api';
const meter = metrics.getMeter('order-service', '1.0.0');

// Counter
const orderCounter = meter.createCounter('orders.processed.total',
  { description: 'Total orders processed', unit: 'orders' });

// Histogram
const durationHist = meter.createHistogram(
  'orders.processing.duration',
  { description: 'Processing time', unit: 'ms' });

// Observable Gauge
const activeGauge = meter.createObservableGauge(
  'orders.active.count',
  { description: 'Active orders' });
activeGauge.addCallback((r) => r.observe(getActiveCount()));

orderCounter.add(1, { 'order.type': 'standard' });
durationHist.record(245, { 'order.type': 'standard' });

Context Propagation

Context propagation links spans across services into complete traces. W3C Trace Context injects trace ID, span ID, and sampling flags via the traceparent header.

// W3C Trace Context header:
// traceparent: 00-<trace-id>-<parent-span-id>-<flags>

import { context, propagation } from '@opentelemetry/api';

// Inject context into outgoing request
function makeRequest(url: string) {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  return fetch(url, { headers });
}

// Extract context from incoming request
function handleRequest(req: Request) {
  const ctx = propagation.extract(
    context.active(), req.headers);
  return context.with(ctx, () => {
    return tracer.startActiveSpan('handle', (span) => {
      // child of caller span
      span.end();
    });
  });
}
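The traceparent header itself is easy to inspect. A self-contained parser for the W3C format, independent of the OTel SDK (for illustration only; in practice the propagator handles this):

```typescript
// Parse a W3C traceparent header: version-traceid-spanid-flags
// e.g. 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
interface TraceParent {
  version: string;
  traceId: string;   // 32 lowercase hex chars
  spanId: string;    // 16 lowercase hex chars
  sampled: boolean;  // lowest bit of the flags byte
}

function parseTraceparent(header: string): TraceParent | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/
    .exec(header.trim());
  if (!m) return null;
  const [, version, traceId, spanId, flags] = m;
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const tp = parseTraceparent(
  '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01');
// tp?.sampled === true
```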

Span Attributes and Events

Attributes are key-value metadata attached to a span; events are timestamped records of points in time within it. The OTel Semantic Conventions standardize common attribute names.

// Semantic Conventions examples:
// HTTP: http.request.method, http.response.status_code, url.full
// DB:   db.system, db.statement, db.operation.name
// RPC:  rpc.system, rpc.service, rpc.method

span.setAttribute('http.request.method', 'POST');
span.setAttribute('http.response.status_code', 200);
span.setAttribute('db.system', 'postgresql');
span.setAttribute('db.statement', 'SELECT * FROM orders WHERE id=?');

span.addEvent('cache.miss', {
  'cache.key': 'user:1234',
  'cache.backend': 'redis',
});

span.recordException(new Error('Connection timeout'));

Exporters

Exporters send telemetry data to backends. OTLP is the native protocol supported by all major backends.

  • OTLP (gRPC / HTTP): Recommended standard, supports all three signals
  • Jaeger: Legacy direct export (recent Jaeger versions ingest OTLP natively, so prefer OTLP)
  • Zipkin: Zipkin-compatible backends
  • Prometheus: Expose /metrics endpoint for scraping
  • Console/Debug: For development debugging
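Exporter selection and endpoints can also be set through the standard SDK environment variables defined by the OpenTelemetry specification, which most language SDKs honor:

```shell
# Standard OTel SDK environment variables (per the specification)
export OTEL_SERVICE_NAME="my-service"
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_METRICS_EXPORTER="otlp"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"   # or "grpc"
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production"
```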

Collector Configuration

The Collector is configured via YAML defining receivers, processors, exporters, and pipelines:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
  prometheus:
    config:
      scrape_configs:
        - job_name: app-metrics
          scrape_interval: 15s
          static_configs:
            - targets: ['app:9090']
  jaeger:
    protocols:
      thrift_http: { endpoint: 0.0.0.0:14268 }

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.target"] == "/health"'
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
    spike_limit_mib: 512

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls: { insecure: true }
  otlp/mimir:
    endpoint: mimir:4317
    tls: { insecure: true }
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [memory_limiter, filter, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/mimir]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [debug]

Running the Collector

# Docker
docker run -d --name otel-collector \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest

# Docker Compose
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Collector metrics

Sampling Strategies

Collecting every trace in high-traffic systems is impractical. Sampling strategies balance observability and cost.

Head-Based Sampling (SDK)

import {
  TraceIdRatioBasedSampler,
  ParentBasedSampler,
} from '@opentelemetry/sdk-trace-base';

// ParentBased: respect parent decision, sample 10% of roots
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});

const sdk = new NodeSDK({ sampler, /* ... */ });

Tail-Based Sampling (Collector)

Tail-based sampling decides in the Collector after seeing complete traces, ideal for retaining all error and high-latency traces.

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    policies:
      - name: error-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: latency-policy
        type: latency
        latency: { threshold_ms: 2000 }
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
      - name: string-attr-policy
        type: string_attribute
        string_attribute:
          key: priority
          values: [high, critical]

Integrating with Observability Backends

Grafana Stack (Tempo + Mimir + Loki)

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls: { insecure: true }
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]

Datadog

exporters:
  datadog:
    api:
      key: "${DD_API_KEY}"
      site: datadoghq.com
    traces:
      span_name_as_resource_name: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]

New Relic

exporters:
  otlp/newrelic:
    endpoint: otlp.nr-data.net:4317
    headers:
      api-key: "${NEW_RELIC_LICENSE_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/newrelic]

Kubernetes Deployment

The OpenTelemetry Operator is the recommended way to manage OTel in Kubernetes, providing CRDs for Collectors and auto-instrumentation injection.

# Install cert-manager + OTel Operator
kubectl apply -f https://github.com/cert-manager/cert-manager/\
releases/download/v1.14.0/cert-manager.yaml

helm repo add open-telemetry \
  https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-operator \
  open-telemetry/opentelemetry-operator \
  --namespace otel-system --create-namespace

Collector CRD (DaemonSet Mode)

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: otel-system
spec:
  mode: daemonset
  config:
    receivers:
      otlp:
        protocols:
          grpc: { endpoint: 0.0.0.0:4317 }
          http: { endpoint: 0.0.0.0:4318 }
    processors:
      batch: { timeout: 5s }
      k8sattributes:
        extract:
          metadata:
            - k8s.pod.name
            - k8s.namespace.name
            - k8s.deployment.name
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    exporters:
      otlp:
        endpoint: tempo.observability:4317
        tls: { insecure: true }
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp]

Auto-Instrumentation Injection

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: otel-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector.otel-system:4317
  propagators: [tracecontext, baggage]
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
---
# Annotate Deployment for auto-instrumentation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-node-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-nodejs: "true"
    spec:
      containers:
        - name: app
          image: my-node-app:latest

Best Practices

1. Follow Semantic Conventions

Use standard attribute names (e.g., http.request.method not method) for cross-service consistency and automatic backend parsing.

2. Set Resource Attributes

import { Resource } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
  ATTR_DEPLOYMENT_ENVIRONMENT_NAME,
} from '@opentelemetry/semantic-conventions';

const resource = new Resource({
  [ATTR_SERVICE_NAME]: 'order-service',
  [ATTR_SERVICE_VERSION]: '2.1.0',
  [ATTR_DEPLOYMENT_ENVIRONMENT_NAME]: 'production',
});

3. Control Span Granularity

Create spans for network calls, database operations, and critical business operations. Avoid creating spans in tight loops.

4. Handle Span Lifecycle Correctly

Always end spans in a finally block. Use startActiveSpan for async operations to maintain context.

5. Production Sampling

Never use AlwaysOn sampling in production. Start with ParentBased + TraceIdRatio(0.1), then add tail-based sampling in the Collector to retain error traces.

6. Use the Collector

The Collector provides buffering, retries, batching, and multi-destination export. Reduces network connections and resource usage on the application side.

7. Correlate All Three Signals

// Inject trace context into logs (Node.js + Winston)
import { trace, context } from '@opentelemetry/api';
import winston from 'winston';

const logger = winston.createLogger({
  format: winston.format.combine(
    winston.format((info) => {
      const span = trace.getSpan(context.active());
      if (span) {
        const ctx = span.spanContext();
        info.trace_id = ctx.traceId;
        info.span_id = ctx.spanId;
      }
      return info;
    })(),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()],
});
// Output: {"message":"Order processed",
//  "trace_id":"abc...","span_id":"def..."}

8. Set Collector Resource Limits

Configure memory_limiter to prevent OOM, set resources.limits in K8s, and monitor the Collector built-in metrics.
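In the Operator CRD, container limits sit alongside the memory_limiter processor. A sketch with illustrative values (keep `limit_mib` below the container memory limit so the processor triggers before the kernel OOM-killer does):

```yaml
# Illustrative resource limits on the Collector CRD
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  mode: daemonset
  resources:
    requests: { cpu: 100m, memory: 256Mi }
    limits: { cpu: "1", memory: 512Mi }   # memory_limiter's limit_mib should be lower
```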

Conclusion

OpenTelemetry is becoming the de facto standard for observability. Its vendor-neutral design lets you instrument once and export to any backend, unified context correlation across all three signals makes debugging distributed systems manageable, and the Collector's flexible pipelines make data processing straightforward. Start with auto-instrumentation for quick wins, then gradually add manual instrumentation, tune sampling, and deploy Collector pipelines to build a production-ready, full-stack observability platform.

