Query fair usage in Grafana Cloud: What it is and how it affects your logs observability practice
Grafana Cloud's query fair usage policy operates on a straightforward 100x multiplier against your monthly log ingestion, but the billing mechanics reveal some nuances worth understanding before you hit an unexpected overage.
The core formula is simple: ingest 50 GB of logs per month and you can query up to 5,000 GB without additional charges. The billing calculation uses max(ingested_gb, queried_gb / 100), evaluated per stack at month end. This means if you ingest 50 GB but query 7,000 GB, you're billed for 70 GB rather than your actual 50 GB ingestion. The policy exists because Loki's query engine can scan massive datasets, and without constraints, a handful of poorly constructed queries can create disproportionate infrastructure load.
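That month-end calculation can be sketched in a few lines (the function name here is mine for illustration, not a Grafana API):

```python
def billed_gb(ingested_gb: float, queried_gb: float, multiplier: float = 100.0) -> float:
    """Month-end billable volume under the fair usage policy: you pay for
    whichever is larger, actual ingestion or query volume divided by the
    100x allowance."""
    return max(ingested_gb, queried_gb / multiplier)

# Within the allowance: 50 GB ingested, 4,000 GB queried -> billed on 50 GB
assert billed_gb(50, 4_000) == 50
# Over the allowance: 50 GB ingested, 7,000 GB queried -> billed on 70 GB
assert billed_gb(50, 7_000) == 70
```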
The timing aspect matters more than you might expect. Early in the billing cycle, your usage ratio (queried GB divided by ingested GB) often looks alarming because you haven't accumulated much ingestion volume yet; the denominator keeps growing throughout the month. Don't panic on day three when the ratio shows 80x if you're running steady ingestion.
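To sanity-check an early-month ratio, you can project it forward. A back-of-envelope sketch, assuming the query volume so far was a one-off burst (say, an incident investigation) and ingestion continues at its current daily rate; all numbers are illustrative:

```python
def month_end_ratio(day: int, ingested_gb: float, queried_gb: float,
                    days_in_month: int = 30) -> float:
    """Project the month-end query/ingest ratio, assuming ingestion stays
    steady and no further heavy querying occurs."""
    ingest_rate = ingested_gb / day                 # GB ingested per day so far
    month_end_ingest = ingest_rate * days_in_month  # projected total ingestion
    return queried_gb / month_end_ingest

# Day 3: 6 GB ingested, 480 GB queried -> an alarming 80x ratio today...
assert 480 / 6 == 80
# ...but projected over a full month of steady ingestion, only 8x.
assert month_end_ratio(day=3, ingested_gb=6, queried_gb=480) == 8.0
```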
The Loki query fair usage dashboard, auto-provisioned on hosted Grafana instances, breaks down consumption by source: dashboards, Grafana-managed alerts, and Explore queries. Pay particular attention to the Grafana-alerts category, which covers rules managed under Home > Alerts & IRM > Alerting. These are distinct from rules uploaded directly to Loki via cortextool or lokitool using the Cloud APIs, an important distinction when tracking down runaway query volume.
The most common culprit for overages is alerting rule misconfiguration. A typical antipattern: an alert rule querying one hour of data but evaluating every minute. That's a 60x amplification factor right there. The fix is switching to instant queries instead of range queries for Loki-based alert rules. An instant query executes once and produces one data point per matched series, while range queries are effectively multiple instant queries stitched together. More critically, align your evaluation interval with your query range. A rule running every minute should query exactly 1m of data, not 1h.
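The amplification is easy to quantify. A rough sketch, assuming a hypothetical stream ingesting 1 GB per hour:

```python
def daily_scan_gb(query_range_min: int, eval_interval_min: int,
                  ingest_gb_per_hour: float) -> float:
    """Approximate log volume (GB) a Loki alert rule scans per day: each
    evaluation reads query_range_min worth of logs, and the rule runs
    24*60 / eval_interval_min evaluations per day."""
    evals_per_day = 24 * 60 / eval_interval_min
    gb_per_min = ingest_gb_per_hour / 60
    return evals_per_day * query_range_min * gb_per_min

# Antipattern: 1h range query, evaluated every minute.
misconfigured = daily_scan_gb(query_range_min=60, eval_interval_min=1, ingest_gb_per_hour=1)
# Fixed: 1m range aligned with the 1m evaluation interval.
aligned = daily_scan_gb(query_range_min=1, eval_interval_min=1, ingest_gb_per_hour=1)

assert round(misconfigured) == 1440  # GB scanned per day
assert round(aligned) == 24          # GB scanned per day: the 60x difference
```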
For recording rules specifically, which run via the scheduler, verify they're not scanning excessive time windows for pre-aggregation. These can silently consume enormous query volume since they execute automatically in the background.
Query construction discipline becomes essential at scale. Front-load label selectors in LogQL to reduce the dataset before applying parser stages or line filters. A query like {namespace="prod"} |= "error" | json | status_code >= 500 scans far less data than the equivalent without the initial namespace selector. Time range selection matters too: defaulting to "last 30 days" in dashboards multiplies your scanned volume by 30 compared to a one-day window.
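The time-range effect is simple arithmetic. A sketch for a hypothetical panel over a stream producing 10 GB of matching logs per day, refreshed 100 times a day (numbers invented for illustration):

```python
def panel_daily_scan_gb(range_days: int, refreshes_per_day: int,
                        daily_log_gb: float) -> float:
    """Rough query volume one dashboard panel generates per day: each
    refresh re-scans range_days worth of matching logs."""
    return range_days * daily_log_gb * refreshes_per_day

# "Last 30 days" vs "last 24 hours" on the same panel:
assert panel_daily_scan_gb(30, 100, 10) == 30_000  # GB queried per day
assert panel_daily_scan_gb(1, 100, 10) == 1_000    # 30x less
```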
The Explore interface shows query cost estimates before execution, which is useful for ad-hoc investigation but doesn't help with automated dashboard and alert load. For ongoing monitoring, the Cost Management interface (which replaces the deprecated billing dashboard) shows your query usage ratio under Products > Logs. Track this weekly rather than waiting for the monthly invoice.
One practical approach: establish a query budget per team or service, allocating portions of your 100x allowance based on criticality. Treat query volume as a resource constraint similar to memory or CPU, not an unlimited commodity. This encourages teams to optimize their LogQL and challenge whether that dashboard really needs to query the last seven days on every page load.
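A budget allocation along those lines might look like this sketch (team names and weights are invented for illustration):

```python
def allocate_query_budget(monthly_ingest_gb: float,
                          weights: dict[str, float],
                          multiplier: float = 100.0) -> dict[str, float]:
    """Split the free query allowance (ingest * 100x) across teams in
    proportion to criticality weights."""
    total_allowance = monthly_ingest_gb * multiplier
    total_weight = sum(weights.values())
    return {team: total_allowance * w / total_weight
            for team, w in weights.items()}

# 50 GB/month ingestion -> 5,000 GB query allowance to divide up:
budget = allocate_query_budget(50, {"payments": 2, "checkout": 2, "internal-tools": 1})
assert budget == {"payments": 2000.0, "checkout": 2000.0, "internal-tools": 1000.0}
```

Publishing these numbers alongside the weekly ratio check gives each team a concrete target rather than a vague instruction to "query less."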