Monitor Model Context Protocol (MCP) servers with OpenLIT and Grafana Cloud

Grafana Labs Blog

Model Context Protocol servers sit between your AI agents and the tools they need to call. When an agent decides it needs to query a database, fetch weather data, or search documents, that request flows through an MCP server. The protocol itself is straightforward, but production deployments quickly surface questions you can't answer without instrumentation: Why did that tool call take three seconds when it usually takes 300ms? Did the agent even receive the response? Which tool is hammering your rate limits?

OpenLIT offers auto-instrumentation for MCP servers with minimal code changes. The value proposition is simple: add openlit.init() to your server code, point it at an OTLP endpoint, and get distributed traces that span from agent request through tool execution. When paired with Grafana Cloud's managed Tempo and Prometheus backends, you get pre-built dashboards that surface tool invocation latency, error rates, and context window consumption without writing PromQL or TraceQL by hand.

The instrumentation captures spans at each protocol boundary. When a client calls list_tools, OpenLIT records the handshake latency. When the agent invokes search_documents, you get a span for the MCP layer plus child spans for any downstream API calls the tool makes. This matters because MCP servers often proxy requests to external services with their own latency characteristics. A slow Elasticsearch query or a third-party API timeout shows up as a distinct span, so you can tell whether the bottleneck is in your MCP server logic or the tool itself.

Context window tracking is particularly useful for cost control. Agents that query multiple tools in a single turn can accumulate large context payloads. OpenLIT exposes metrics like context window size and memory usage per request, which feed into Grafana dashboards showing 95th and 99th percentile resource consumption. If you're running MCP servers on Kubernetes with horizontal pod autoscaling, these metrics help you set accurate CPU and memory requests instead of guessing.
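On the Grafana side, a percentile panel over these metrics is a standard histogram_quantile query. The metric name below is a placeholder, not OpenLIT's actual metric name; substitute whatever your OpenLIT version exports.

```promql
# p95 context window size per tool over 5m windows.
# mcp_context_window_size_bucket is a hypothetical metric name.
histogram_quantile(
  0.95,
  sum(rate(mcp_context_window_size_bucket[5m])) by (le, tool)
)
```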

The zero-code instrumentation option via openlit-instrument is worth calling out. You can wrap an existing MCP server without modifying source code, which is useful for third-party tools or legacy services. The CLI accepts OTLP endpoint, service name, and environment tags as flags, so you can instrument a production server by changing the process invocation in your systemd unit or Kubernetes deployment spec.
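For a systemd-managed server, the change amounts to a drop-in override that swaps the process invocation. The flag names below are assumptions based on the description above; verify them against openlit-instrument --help before deploying.

```
# /etc/systemd/system/mcp-server.service.d/openlit.conf
# Hypothetical drop-in: wrap the existing invocation with
# openlit-instrument (flag names are assumptions; check the CLI help).
[Service]
ExecStart=
ExecStart=/usr/local/bin/openlit-instrument \
    --otlp-endpoint http://otel-collector:4318 \
    --service-name weather-mcp \
    --environment production \
    python /opt/mcp/server.py
```

The empty ExecStart= line clears the original command so the wrapped one replaces it rather than running alongside it.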

One practical consideration: OpenLIT uses OpenTelemetry's semantic conventions for LLM observability, which means span attributes follow a consistent schema. If you're already collecting traces from LangChain, LlamaIndex, or other agent frameworks, MCP spans will use the same attribute names for model, token counts, and error types. This consistency simplifies queries across your agentic stack. For example, a TraceQL query filtering on llm.system = "mcp" and status = error surfaces all MCP failures in the same Tempo backend that holds your LLM API errors, so one query language covers the whole agentic stack.
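Written out in TraceQL, that filter looks like the following; the attribute key follows the convention described above, but check your own spans for the exact names before relying on it:

```traceql
{ span.llm.system = "mcp" && status = error }
```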

The Grafana Cloud integration includes dashboards for tool performance, protocol health, and error tracking. Tool performance shows invocation counts and latency distributions per tool, so you can identify which tools are slow or overused. Protocol health tracks handshake failures and version mismatches, which matter when you're running multiple MCP server versions or clients with different protocol implementations. Error tracking aggregates exceptions by tool and error type, making it easier to prioritize fixes.

For teams running self-hosted observability stacks, OpenLIT supports any OTLP-compatible backend. You can point it at a local OpenTelemetry Collector, Jaeger, or Tempo instance by setting the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. The instrumentation layer remains the same, so you're not locked into Grafana Cloud if your requirements change.
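Retargeting the exporter is just the standard OTel environment variables. For example, to send to a local collector (4318 is the OTLP/HTTP default port; use 4317 for OTLP/gRPC), with a service name of your choosing:

```shell
# Point OpenLIT's exporter at a local OpenTelemetry Collector.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="weather-mcp"   # optional: tag the service
```

Because these are plain OTel variables, the same two lines work whether the destination is a Collector, Jaeger, or Tempo.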