OBI Gives Incident Response the Request Context It Needs

OpenTelemetry Blog

When you're debugging a production LLM service at 2am, your traces show a spike in 500s from your inference endpoint. The aggregate error rate is up 12 percent, P99 latency jumped from 800ms to 3.2 seconds, and your on-call runbook tells you to check model health and gateway logs. What it doesn't tell you is whether this affects all traffic or just requests from a specific tenant, API key tier, or model version. That gap between "something is broken" and "this specific thing is broken for this specific segment" is where most of your mean time to diagnosis lives.

OpenTelemetry eBPF Instrumentation v0.7.0 addresses this by enabling HTTP header enrichment directly in spans without touching application code. You configure OBI to extract headers like X-Tenant-ID, X-Model-Version, or X-User-Tier and attach them as span attributes. When an error occurs, the trace now carries the context needed to immediately scope the problem. Instead of querying logs to correlate request IDs back to customer segments, the span itself tells you this was tenant_abc hitting gpt-4-turbo on the premium tier.
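In configuration terms, that enrichment is a short allowlist of header names. The sketch below shows the general shape only; the key names here are illustrative, not the actual OBI schema, so check the v0.7.0 release docs for the exact configuration surface:

```yaml
# Illustrative sketch -- section and key names are hypothetical;
# consult the OBI documentation for the real schema.
otel_traces_export:
  endpoint: http://otel-collector:4318
attributes:
  http_request_headers:      # hypothetical: headers to copy onto spans
    - X-Tenant-ID
    - X-Model-Version
    - X-User-Tier
```

The important property is that the list lives in one place at the infrastructure layer, not in each service's instrumentation code.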

This matters more for LLM systems than traditional services because the failure modes are less uniform. A prompt that triggers a safety filter, a context window overflow, or a rate limit breach often affects specific usage patterns rather than all traffic. If your RAG pipeline starts returning empty results, knowing whether it's isolated to queries with more than five retrieved chunks or specific to one document collection changes your diagnosis path entirely. Without request context in spans, you're aggregating across heterogeneous workloads and missing the signal.

The implementation uses eBPF to hook into network syscalls and inject span attributes before telemetry export. This means zero application changes and no SDK version bumps. For teams running polyglot stacks where some services use the Python OpenTelemetry SDK, others use LangChain tracing, and a few are still on legacy instrumentation, this is the only practical way to get consistent context enrichment across the board. You configure it once at the infrastructure layer rather than coordinating changes across six service repos.

The tradeoff is cardinality. If you enrich spans with high-cardinality headers like request IDs or session tokens, your trace backend storage costs will spike and query performance will degrade. The right pattern is enriching with bounded dimensions: tenant IDs if you have hundreds of tenants rather than millions, model versions, deployment zones, or API key tiers. For user-level debugging, keep request IDs in logs and use trace IDs to correlate, but don't put every unique identifier into span attributes.
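One way to make that boundary explicit is a small gate that partitions extracted values into bounded span attributes and unbounded log-only fields before anything is exported. This is a minimal sketch, not part of OBI itself, and the attribute keys are illustrative:

```python
# Hypothetical helper: only bounded-cardinality dimensions become span
# attributes; everything else stays in logs, correlated via trace ID.
BOUNDED_KEYS = {"tenant.id", "model.version", "deployment.zone", "api.tier"}

def partition_attributes(extracted: dict[str, str]) -> tuple[dict, dict]:
    """Split extracted header values into span attributes (bounded
    dimensions) and log-only fields (unbounded identifiers)."""
    span_attrs = {k: v for k, v in extracted.items() if k in BOUNDED_KEYS}
    log_fields = {k: v for k, v in extracted.items() if k not in BOUNDED_KEYS}
    return span_attrs, log_fields
```

For example, `partition_attributes({"tenant.id": "tenant_abc", "request.id": "r-9f2"})` keeps the tenant ID on the span and routes the request ID to logs.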

Where this breaks down is when the context you need isn't in HTTP headers. If your incident requires correlating to feature flags evaluated mid-request or to which vector index shard was queried, eBPF header extraction won't help. You still need application-level instrumentation for semantic context like retrieved document IDs, prompt token counts, or cache hit rates. OBI gives you the request envelope; you still own the payload.

For teams already running OpenTelemetry, the switching cost is a config file and a DaemonSet update. The value shows up the first time you can immediately filter traces to affected tenants instead of spending twenty minutes in log aggregation queries.