Query fair usage in Grafana Cloud: What it is and how it affects your logs observability practice
Grafana Cloud's query fair usage policy sounds generous on paper—100x your monthly ingested log volume in free queries—but the billing calculation can bite you if you're not monitoring the right metrics. The formula is straightforward: logs billable GB equals the maximum of either your ingested GB or your queried GB divided by 100. Ingest 50 GB and query 7,000 GB? You're billed for 70 GB, not 50. That 40% cost increase appears at month-end with no warning if you're not watching your query usage ratio.
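The arithmetic is simple enough to sanity-check yourself. Here's a minimal sketch of the formula as described above (this is not Grafana's actual billing code, just the stated max-of-two-terms rule):

```python
def billable_log_gb(ingested_gb: float, queried_gb: float,
                    free_ratio: float = 100.0) -> float:
    """Billable logs GB = max(ingested GB, queried GB / free_ratio).

    Sketch of the fair usage formula; free_ratio is the 100x allowance.
    """
    return max(ingested_gb, queried_gb / free_ratio)


# The example from the text: ingest 50 GB, query 7,000 GB -> billed for 70 GB.
print(billable_log_gb(50.0, 7_000.0))  # 70.0
# Under the 100x threshold, ingest dominates and queries cost nothing extra:
print(billable_log_gb(50.0, 4_000.0))  # 50.0
```

The useful takeaway: queries are free right up until queried GB exceeds 100x ingested GB, and past that point every additional 100 GB queried adds 1 GB to your bill.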
The real problem is that query volume accumulates from sources you probably aren't thinking about. Misconfigured Grafana alerting rules are the primary culprit. A rule that queries one hour of data but runs every minute generates 1,440 executions daily, each one re-scanning that hour of logs, so the daily scanned volume scales with your ingest rate and how broad the rule's label matchers are. If you're using range queries instead of instant queries in your Loki alerting rules, you're compounding the problem: range queries execute multiple instant queries under the hood, multiplying your scanned bytes.
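To see how fast a single misconfigured rule adds up, here's a rough back-of-the-envelope estimator. It assumes (pessimistically) that every execution scans all logs ingested over its query range; real rules with selective label matchers scan less. The 2 GB/hour ingest rate below is a hypothetical:

```python
def daily_scanned_gb(eval_interval_s: float, query_range_s: float,
                     ingest_gb_per_hour: float) -> float:
    """Rough upper bound on how much log data one alert rule scans per day.

    Assumes each execution scans everything ingested across its query range.
    """
    executions_per_day = 86_400 / eval_interval_s
    gb_scanned_per_execution = (query_range_s / 3_600) * ingest_gb_per_hour
    return executions_per_day * gb_scanned_per_execution


# The misconfiguration from the text: 1m evaluation interval, 1h query range,
# against a hypothetical 2 GB/hour of matching logs:
print(daily_scanned_gb(60, 3_600, 2))  # 2880.0 GB/day from a single rule
```

At that rate, one rule burns through the free-query allowance of roughly 43 GB of daily ingest all by itself.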
The Loki query fair usage dashboard, auto-provisioned for Grafana Cloud customers, breaks down query sources into three categories: grafana-alerts (rules managed in Grafana Alerting), dashboards, and Explore/other. The dashboard shows query bytes and execution frequency per rule, including the originating query text and username when available. This is where you'll find the smoking gun—that one alert rule someone configured six months ago that's now responsible for 60% of your query volume.
The distinction between Grafana-managed alerts and Loki ruler alerts matters here. Rules you upload to Loki via cortextool or lokitool using the Grafana Cloud APIs don't appear under the grafana-alerts category. If you're running a hybrid alerting setup, you need to audit both systems separately. The Explore/other category captures ad-hoc queries from the Explore UI and non-Grafana sources like logcli, which can spike unpredictably if multiple engineers are debugging production issues simultaneously.
The billing calculation happens monthly at the stack level, which means you can't offset high query usage in one stack with low usage in another if you're running multiple Grafana instances. Early in the billing cycle, your usage ratio often looks alarming because you've accumulated query volume but haven't ingested a full month of logs yet. This creates false urgency, but it's still worth investigating spikes rather than assuming they'll normalize.
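The per-stack scoping is worth modeling explicitly, because the difference between per-stack and pooled billing is exactly where the overage hides. A sketch with two hypothetical stacks (the ingest and query figures are invented for illustration):

```python
def billable_gb(ingested_gb: float, queried_gb: float) -> float:
    # Fair usage formula: max of ingest and queries/100.
    return max(ingested_gb, queried_gb / 100)


# Hypothetical month: (ingested GB, queried GB) per stack.
stacks = {"prod": (50.0, 7_000.0), "staging": (50.0, 1_000.0)}

# Billing is applied per stack -- prod's overage can't borrow
# staging's unused query headroom:
per_stack = sum(billable_gb(i, q) for i, q in stacks.values())
print(per_stack)  # 120.0  (prod billed at 70, staging at 50)

# If usage were pooled across stacks (it isn't), the overage would vanish:
pooled = billable_gb(sum(i for i, _ in stacks.values()),
                     sum(q for _, q in stacks.values()))
print(pooled)  # 100.0  (max of 100 GB ingested, 8,000/100 queried)
```

The gap between those two numbers is why consolidating query-heavy workloads into one stack, or auditing each stack separately, matters.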
Practical mitigation starts with aligning alert rule intervals with query ranges. A rule with a 1m evaluation interval should query exactly 1m of data, not 1h. Switch all Loki alerting rules to instant queries—they execute once per evaluation and produce one data point per matched series, versus range queries that execute repeatedly across the time window. For frequently-run expensive queries, use Prometheus recording rules to pre-calculate results, though this only helps if you're already running Prometheus alongside Loki.
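The instant-versus-range distinction can be approximated numerically. The sketch below assumes a range query performs roughly one instant-style evaluation per step across its window, which is a simplification of how Loki actually splits queries, but it captures the multiplier:

```python
def range_query_evaluations(window_s: float, step_s: float) -> float:
    """Approximate number of instant evaluations a range query performs:
    one per step across the window, plus the endpoint.

    A simplification of Loki's real query splitting, for intuition only.
    """
    return window_s / step_s + 1


# A [1h] range query at a 1m step runs ~61 evaluations per rule execution,
# versus exactly 1 for an instant query at the same point in time:
print(range_query_evaluations(3_600, 60))  # 61.0
```

Combine that with the interval/range alignment fix and the savings compound: a 1m-interval rule switched from a 1h range query to a 1m instant query scans a small fraction of the bytes per evaluation.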
The cost management UI shows your query usage ratio in near real time, but billing is applied retrospectively. If you breach the 100x threshold mid-month, you won't see the financial impact until the next invoice. Set up your own alerting on the query usage ratio if you're operating near the threshold; waiting for the billing dashboard to show overages means you've already incurred the cost. For teams running high-volume logging, the difference between 90x and 110x query usage can mean thousands of dollars monthly, which makes query optimization a first-class operational concern alongside ingestion cost.
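A ratio check of the kind described above is trivial to implement once you have month-to-date ingested and queried volumes. The 90x warning level below is an arbitrary example, chosen to leave reaction time before the 100x billable threshold:

```python
def query_usage_ratio(queried_gb: float, ingested_gb: float) -> float:
    """Month-to-date queried GB divided by ingested GB.

    Overage billing begins once this exceeds the 100x free allowance.
    """
    if ingested_gb <= 0:
        return float("inf")
    return queried_gb / ingested_gb


def should_alert(queried_gb: float, ingested_gb: float,
                 warn_ratio: float = 90.0) -> bool:
    # Warn below the billable threshold so there's time to hunt down
    # the offending rule before the overage accrues.
    return query_usage_ratio(queried_gb, ingested_gb) >= warn_ratio


print(should_alert(4_700, 50))  # True  -- 94x, approaching the threshold
print(should_alert(3_000, 50))  # False -- 60x, comfortable headroom
```

Wiring this into whatever alerting you already run closes the loop: the same system that caused the query overage is the one best placed to warn you about it.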