How OpenRouter and Grafana Cloud bring observability to LLM-powered applications
OpenRouter's Broadcast feature solves a specific problem that most LLM platform teams hit around month three of production: you're routing requests across multiple providers and models, costs are climbing faster than usage, and when something breaks, you're stuck correlating application logs with provider dashboards that each use different formats and retention policies. The integration with Grafana Cloud puts OpenTelemetry traces directly into your existing observability stack without requiring code changes or SDK instrumentation.
The core value proposition is infrastructure-level tracing for multi-model deployments. When you're using OpenRouter's unified API to route between GPT-4, Claude, Gemini, and open-source models, each request generates a trace with model selection, token counts, timing breakdowns, and cost data. This gets sent via OTLP to Grafana Cloud Traces automatically. No manual span creation, no custom logging, no parsing provider-specific response formats.
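To make the mechanism concrete, the OpenTelemetry GenAI semantic conventions suggest what such a span might carry. The article doesn't document OpenRouter's exact span schema, so the standard `gen_ai.*` keys below are from the OTel conventions while the cost and timing keys are illustrative assumptions:

```python
# Illustrative shape of one LLM request span. The gen_ai.* names follow
# OpenTelemetry GenAI semantic conventions; the cost and TTFT keys are
# assumptions, not a documented OpenRouter schema.
span_attributes = {
    "gen_ai.request.model": "openai/gpt-4",                # model the caller asked for
    "gen_ai.response.model": "anthropic/claude-3-sonnet",  # model that actually served it
    "gen_ai.usage.input_tokens": 1250,
    "gen_ai.usage.output_tokens": 340,
    "llm.cost_usd": 0.0193,                # hypothetical cost attribute
    "llm.time_to_first_token_ms": 420,     # hypothetical TTFT attribute
}

# A fallback or rerouted request is visible when the requested and
# serving models disagree.
used_fallback = (
    span_attributes["gen_ai.request.model"]
    != span_attributes["gen_ai.response.model"]
)
print(used_fallback)
```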
What this actually captures matters more than the mechanism. Each trace includes which model was requested versus which actually served the response, which is critical when you're using fallbacks or load balancing. You get input tokens, output tokens, and USD cost per request, which lets you build dashboards showing cost attribution by feature, user cohort, or API key without custom analytics infrastructure. Time to first token appears as a separate metric from total duration, addressing the specific UX concern that a streaming response feels broken if it doesn't start within two seconds, even when total generation is fast.
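With per-request cost on every trace, attribution reduces to a group-by over exported trace data. A minimal sketch, assuming traces have been exported as records with hypothetical `api_key` and `cost_usd` fields:

```python
from collections import defaultdict

# Hypothetical exported trace records; the field names are assumptions
# about what a trace export would contain, not a documented format.
traces = [
    {"api_key": "team-search", "cost_usd": 0.012},
    {"api_key": "team-search", "cost_usd": 0.008},
    {"api_key": "team-summarize", "cost_usd": 0.031},
]

# Aggregate spend per API key (stand-in for feature or cohort).
cost_by_key = defaultdict(float)
for t in traces:
    cost_by_key[t["api_key"]] += t["cost_usd"]

for key, cost in sorted(cost_by_key.items()):
    print(f"{key}: ${cost:.3f}")
```

In practice you'd run this kind of aggregation in a dashboard query rather than application code; the point is that the cost data arrives already attached to the request flow.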
The token cost visibility addresses a real gap. Most teams track LLM costs through provider billing dashboards or by parsing response headers and aggregating in application code. Provider dashboards update with billing lag and don't map to your internal cost centers. Application-level tracking requires maintaining token counting logic for each provider's format and keeping it in sync as providers change their APIs. OpenRouter already handles the routing layer, so it has authoritative token counts and can apply current pricing without your code needing updates when GPT-4 pricing changes.
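The cost math the routing layer performs is simple once it holds authoritative token counts. A sketch with made-up per-million-token prices (these are illustrative assumptions, not current OpenRouter rates):

```python
# USD per 1M tokens; illustrative numbers only, not real pricing.
PRICING = {
    "openai/gpt-4": {"input": 30.00, "output": 60.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute per-request USD cost from token counts and a pricing table."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = request_cost("openai/gpt-4", input_tokens=1000, output_tokens=500)
print(round(cost, 4))  # 0.06
```

The point of the gateway doing this is that only one pricing table needs updating when a provider changes rates, instead of every downstream application.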
Where this gets complicated is when you need request-level prompt and completion content in your traces. OpenTelemetry semantic conventions for GenAI include prompt and completion as span attributes, and the article shows these in trace screenshots, but production traces carrying full prompt text create data volume and PII concerns. Grafana Cloud Traces charges by span ingestion and storage, so teams with high request volumes need to calculate whether per-request tracing at scale fits their observability budget. A service doing 10 million LLM requests per month with average 500-token prompts captured in span attributes generates substantial trace data.
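The scale claim is easy to make concrete with back-of-envelope arithmetic, assuming roughly 4 bytes of English text per token (an approximation, not a measured figure):

```python
# Rough trace-volume estimate for prompt text stored in span attributes.
requests_per_month = 10_000_000
avg_prompt_tokens = 500
bytes_per_token = 4  # ~4 characters per English token, an assumption

prompt_bytes = requests_per_month * avg_prompt_tokens * bytes_per_token
print(f"~{prompt_bytes / 1e9:.0f} GB/month of prompt text alone")  # ~20 GB/month
```

That 20 GB/month is prompt content only, before completions, span metadata, or any compression the backend applies, but it shows why content capture is a budget line item rather than a free setting.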
The TraceQL examples shown are straightforward filters on service name, duration, and custom metadata. This works for basic cost analysis and latency monitoring, but more sophisticated analysis like "show me all requests where the model switched from primary to fallback and latency exceeded p95" requires understanding which span attributes OpenRouter actually populates and how model routing decisions appear in the trace structure. The article doesn't detail the full span schema or whether fallback attempts create separate spans or just attributes on a single span.
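For illustration, a filter in the style the article describes might look like the following in TraceQL. The `service.name` value and the `gen_ai.*` attribute names are assumptions about what OpenRouter populates, not a documented schema:

```
{ resource.service.name = "openrouter" && duration > 2s && span.gen_ai.response.model != "openai/gpt-4" }
```

This would surface slow requests that were not served by the nominal primary model, but the "fallback plus p95 latency" query the paragraph above imagines still depends on schema details the article doesn't provide.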
For teams already running Grafana Cloud for application observability, the integration cost is configuration time plus incremental trace storage. For teams not using Grafana, this creates a decision point about whether LLM observability alone justifies adding another platform or whether to build similar visibility using existing tools. The "no code changes" benefit assumes you're already using OpenRouter as your LLM gateway. If you're calling provider APIs directly, you'd need to migrate to OpenRouter first, which is a larger architectural change than adding observability.
The practical use cases described focus on operational visibility rather than quality evaluation. You can track costs, latency, error rates, and usage patterns, but not whether responses were factually correct, contextually appropriate, or met task requirements. This is infrastructure observability, not LLM evaluation. Teams still need separate tooling for response quality, hallucination detection, or safety monitoring. The value is in consolidating the operational metrics that were previously scattered across provider dashboards and application logs into a unified trace view that maps to your actual request flow through a multi-model routing layer.