Measure the business impact of every product change with Datadog Experiments

Datadog Blog

Datadog Experiments attempts to solve a real problem in modern experimentation platforms: the fragmentation between what users do, how systems perform under load, and what actually moves business metrics. Most teams run A/B tests in tools like Optimizely or LaunchDarkly, then jump to Datadog for performance metrics, back to Amplitude or Mixpanel for user behavior, and finally to their data warehouse for revenue impact. Datadog is betting that collapsing this workflow into a single platform will let teams ship experiments faster and catch performance regressions before they tank conversion rates.

The core value proposition is correlation at query time. When you're running an experiment, Datadog lets you see user-level behavioral events (clicks, page views, session duration) alongside infrastructure metrics (p95 latency, error rates, database query times) and business outcomes pulled directly from your warehouse (revenue per user, cart abandonment, subscription upgrades). This matters because the failure mode of most A/B tests isn't statistical—it's that variant B improves click-through rate but destroys page load time, and you don't notice until three days into the test when someone manually checks APM dashboards.
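
That failure mode is easy to sketch with synthetic data: a variant that wins on click-through but loses badly on p95 load time. Everything below is illustrative (the simulated rates, latencies, and helper functions are made up for the example, not Datadog's API), but it shows why looking at the two metrics side by side matters:

```python
import random

random.seed(0)

# Hypothetical per-user samples for two variants: whether the user clicked,
# and how long the page took to load. Variant B clicks more but loads slower.
def simulate(n, click_rate, load_mean_ms):
    return [
        {"clicked": random.random() < click_rate,
         "load_ms": random.gauss(load_mean_ms, 120)}
        for _ in range(n)
    ]

variants = {"A": simulate(5000, 0.11, 800), "B": simulate(5000, 0.13, 1400)}

def p95(values):
    """95th-percentile latency from raw samples (nearest-rank style)."""
    ordered = sorted(values)
    return ordered[int(0.95 * (len(ordered) - 1))]

for name, users in variants.items():
    ctr = sum(u["clicked"] for u in users) / len(users)
    latency = p95([u["load_ms"] for u in users])
    print(f"{name}: CTR={ctr:.1%}  p95 load={latency:.0f}ms")
```

Reading CTR alone, variant B looks like a clear winner; the p95 column tells the other half of the story, which is exactly the correlation the platform is selling.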

The warehouse-native metrics piece is particularly interesting. Rather than forcing you to instrument and send business events through Datadog's ingestion pipeline, you can query metrics directly from Snowflake, BigQuery, or Redshift. This means your source of truth for revenue, LTV, or churn stays in the warehouse where your finance team already trusts it, but you can still slice those metrics by experiment variant in real time. The tradeoff is query latency—warehouse queries are slower than hitting Datadog's time-series database, so you're looking at seconds rather than milliseconds for dashboard refreshes. For experiment analysis that's usually fine, but it rules out using these metrics for real-time decisioning or automated rollbacks.
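
The "slice warehouse metrics by variant" idea boils down to a join between an experiment-assignment table and whatever business table the warehouse already trusts. A minimal sketch, with sqlite3 standing in for Snowflake/BigQuery and invented table names (`experiment_assignments`, `orders`) that are not Datadog's schema:

```python
import sqlite3

# sqlite3 stands in for the warehouse here; the tables and columns are
# illustrative, not a real Datadog or warehouse schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE experiment_assignments (user_id TEXT, variant TEXT);
CREATE TABLE orders (user_id TEXT, revenue REAL);
INSERT INTO experiment_assignments VALUES
  ('u1','A'), ('u2','A'), ('u3','B'), ('u4','B');
INSERT INTO orders VALUES ('u1', 20.0), ('u3', 35.0), ('u4', 15.0);
""")

# Revenue per assigned user, sliced by variant. The raw revenue rows never
# leave the warehouse; only this aggregate crosses over for analysis.
rows = db.execute("""
SELECT a.variant,
       COALESCE(SUM(o.revenue), 0) / COUNT(DISTINCT a.user_id) AS revenue_per_user
FROM experiment_assignments a
LEFT JOIN orders o ON o.user_id = a.user_id
GROUP BY a.variant
ORDER BY a.variant
""").fetchall()

for variant, rpu in rows:
    print(variant, round(rpu, 2))  # A 10.0 / B 25.0 with this toy data
```

Note the LEFT JOIN: users who never ordered still count in the denominator, which is what keeps revenue-per-user honest for an experiment readout.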

The behavioral analytics component is essentially Datadog's answer to product analytics tools. You instrument events with their SDK, and it tracks user sessions, funnels, and retention cohorts. The differentiator is tight integration with RUM (Real User Monitoring). If variant B has a 15% drop in checkout completion, you can immediately pivot to see whether it correlates with JavaScript errors, slow API calls, or specific browser versions. This beats the typical workflow of exporting cohorts from your product analytics tool, then trying to match user IDs in Datadog APM.
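
Before pivoting to RUM data, you'd first want to confirm a funnel drop like that is real rather than noise. A standard two-proportion z-test covers it; the counts below are hypothetical and the function is a generic statistics sketch, not part of any Datadog SDK:

```python
from math import sqrt, erfc

def two_proportion_z(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p from the normal tail
    return z, p_value

# Hypothetical funnel counts: checkout completions out of sessions that
# reached checkout. Variant B completes ~15% less often than A.
z, p = two_proportion_z(success_a=1200, total_a=4000,
                        success_b=1020, total_b=4000)
print(f"z={z:.2f}, p={p:.4f}")
```

With counts this size the drop is far outside the noise, which is the point at which correlating against JavaScript errors or slow API calls becomes worth the engineer's time.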

The obvious question is whether this justifies consolidating yet another tool into Datadog. If you're already running RUM and APM in Datadog, the incremental cost and complexity are low: you're adding event instrumentation and warehouse connectors, not rearchitecting your observability stack. But if you're on New Relic or Grafana for infrastructure and happy with your current experimentation workflow, the switching cost is steep. You're not just migrating dashboards; you're changing how product and engineering teams collaborate on experiments.

The platform makes the most sense for teams where performance directly impacts conversion and you're already Datadog-heavy. E-commerce platforms, SaaS products with complex onboarding flows, and marketplaces where latency kills transactions are the sweet spot. If your experiments are mostly backend algorithm changes where user-facing performance is stable, the unified workflow matters less—you can keep running experiments in your existing platform and checking Datadog when something breaks.