How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Grafana Labs Blog

Monitoring LLM applications in production presents challenges that traditional APM tools weren't built to handle. When you're routing requests across multiple model providers, token consumption and costs matter as much as latency and error rates. Provider-specific rate limits, model-specific timeout characteristics, and the non-deterministic nature of generation failures create observability gaps that application-level logging struggles to fill.

OpenRouter's Broadcast feature addresses this by generating OpenTelemetry traces at the infrastructure layer for every API request passing through their unified LLM gateway. The implementation is straightforward: configure your Grafana Cloud OTLP endpoint once in the OpenRouter dashboard, and traces start flowing automatically. No SDK installation, no application code changes, no additional request latency.

Each trace captures the full lifecycle of an LLM request with attributes following OpenTelemetry semantic conventions for generative AI. You get model information including which model was requested versus which actually served the response, complete token counts for input and output, timing breakdowns showing time to first token and generation speed, per-request costs in USD, and termination reasons. Custom metadata like user IDs or feature flags that you attach to requests flows through as span attributes.
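To make that concrete, here is a minimal sketch of attaching custom metadata to an OpenRouter chat-completion request. The endpoint and OpenAI-compatible payload shape are standard; the `metadata` field name and its keys are assumptions for illustration, so check OpenRouter's docs for the exact mechanism Broadcast uses to propagate custom attributes onto spans:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, model: str, user_id: str, feature: str) -> dict:
    """Build an OpenRouter chat-completion payload with custom metadata."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical metadata keys; these would surface as span attributes.
        "metadata": {"user_id": user_id, "feature": feature},
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to OpenRouter with bearer-token auth."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Summarize this ticket.", "openai/gpt-4o",
                        user_id="u-42", feature="summarization")
```

Nothing else changes on the application side; the tracing itself happens at the gateway.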

The traces land in Grafana Cloud Traces backed by Tempo, queryable via TraceQL. This matters because it puts LLM observability in the same interface your team already uses for the rest of your infrastructure. You're not context-switching to a separate AI-specific monitoring tool.
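Since the traces are ordinary Tempo data, a first query is just an attribute filter. A sketch, assuming Broadcast emits the `gen_ai.*` attribute names from the OpenTelemetry semantic conventions:

```
{ span.gen_ai.request.model = "openai/gpt-4o" && span.gen_ai.usage.output_tokens > 1000 }
```

This finds unusually long GPT-4o generations using the same query interface you'd use for any other service's traces.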

The cost visibility use case is immediately practical. When you're using GPT-4o for some requests and Claude 3.5 Haiku for others, understanding spend attribution requires more than checking your provider bills at month-end. A TraceQL query filtering on custom metadata lets you break down costs by feature, environment, or customer segment. Teams use this to answer questions like whether the expensive model is actually delivering better outcomes for specific use cases, or whether batch workloads could shift to cheaper alternatives without quality degradation.
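As a sketch of such a breakdown, assuming the custom metadata lands under `span.metadata.*` and the per-request cost under a `gen_ai.usage.cost` attribute (both attribute names are assumptions, not confirmed Broadcast names):

```
{ span.metadata.feature = "summarization" && span.gen_ai.usage.cost > 0.01 }
```

On recent Tempo versions, TraceQL metrics can aggregate the same data over time, for example summing cost per model:

```
{ span.metadata.feature = "summarization" }
  | sum_over_time(span.gen_ai.usage.cost) by (span.gen_ai.request.model)
```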

Latency monitoring becomes more nuanced with LLM workloads. Total response time matters, but time to first token often matters more for user-perceived performance. A request that starts streaming tokens after 500ms but takes 10 seconds total feels more responsive than one that waits 3 seconds before streaming anything. OpenRouter traces capture both metrics, letting you build dashboards showing p95 time-to-first-token by model and set alerts when that p95 exceeds your threshold.
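A TraceQL metrics sketch of that p95 dashboard panel, assuming the time-to-first-token value is exposed as a span attribute (the `llm.time_to_first_token` name here is a placeholder, not a confirmed Broadcast attribute):

```
{ } | quantile_over_time(span.llm.time_to_first_token, 0.95) by (span.gen_ai.request.model)
```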

Error debugging benefits from having full request context in a single trace. When a request fails, you can see whether it hit a provider rate limit, timed out during generation, or returned a truncated response. The trace shows which provider actually handled the request if you're using fallbacks, and includes the finish reason attribute that indicates whether generation completed normally or stopped due to length limits or content filters.
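Two query sketches for those failure modes, hedged the same way: span status is standard TraceQL, while `gen_ai.response.finish_reasons` follows the OTel GenAI semantic conventions and the `"length"` value depends on what the provider reports:

```
{ status = error }
```

```
{ span.gen_ai.response.finish_reasons = "length" }
```

The first surfaces provider errors and timeouts; the second surfaces generations that were cut off by a length limit rather than completing normally.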

The infrastructure-layer approach scales better than application instrumentation as your LLM usage grows. When you're making calls from multiple services or experimenting with different models, maintaining consistent logging across your codebase becomes tedious. Having observability handled at the gateway means new services get full tracing automatically.

There are tradeoffs. You're limited to the attributes OpenRouter exposes, so if you need to trace through your entire application stack, including database queries and external API calls, you'll still need application-level instrumentation. The traces show what happened at the LLM API boundary but not what your application did before or after. For teams using OpenRouter as their primary LLM interface, though, this covers the most critical blind spots.

The integration works with standard Grafana Cloud credentials: your OTLP gateway endpoint, instance ID, and an API token with traces write permissions. Configuration happens entirely in the OpenRouter dashboard with a test connection button to verify the setup before enabling.
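Grafana Cloud's OTLP gateway authenticates with HTTP Basic auth, using the instance ID as the username and the token as the password. If you want to sanity-check your credentials outside the OpenRouter dashboard, a minimal sketch of building that header (the values below are placeholders; substitute your own instance ID, token, and regional gateway URL):

```python
import base64

# Hypothetical credentials - substitute your Grafana Cloud values.
instance_id = "123456"
api_token = "glc_example_token"

# Basic auth: base64("instance_id:token")
auth = base64.b64encode(f"{instance_id}:{api_token}".encode()).decode()
print(f"Authorization: Basic {auth}")
# Send traces to your region's gateway, e.g.
# https://otlp-gateway-prod-us-east-0.grafana.net/otlp
```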

For platform teams managing LLM infrastructure, this pattern of infrastructure-layer observability is worth considering regardless of whether you use OpenRouter specifically. Pushing tracing down to the gateway or proxy layer reduces instrumentation burden and ensures consistency across services.