50 Articles

Continuous profiling with Alloy's eBPF-based CPU profiler and Pyroscope identified performance bottlenecks in TON blockchain validation code without any instrumentation. Cryptographic operations (SHA256, Ed25519) and data structure choices (std::map vs std::unordered_set) emerged as the primary optimization targets, and concrete examples show 2-20% speedups from targeted replacements and algorithmic improvements.

Grafana Labs Blog

The article explains how eBPF profilers symbolize Go binaries, that is, map raw memory addresses to function names, using Go's embedded .gopclntab section, which persists even in stripped binaries. Most other native languages lack this capability and require server-side symbolization. Understanding the process helps when debugging production profiling issues and explains why Go programs produce better profiling data out of the box.

Grafana Labs Blog
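
The address-to-name lookup described in the Go symbolization summary above can be illustrated with a minimal Python sketch. It is not the real .gopclntab format, which is a compact binary layout. The table contents, the `TEXT_END` bound, and the function names `symbolize` and `symbolize_stack` are all assumptions made for illustration; only the idea (binary search over sorted function start addresses) carries over.

```python
from bisect import bisect_right

# Hypothetical function table: (start address, function name), sorted by address.
# Real .gopclntab data is a compact binary format; this only models the lookup idea.
FUNC_TABLE = [
    (0x401000, "runtime.main"),
    (0x401200, "main.main"),
    (0x401480, "main.handleRequest"),
    (0x401900, "net/http.(*Server).Serve"),
]
TEXT_END = 0x402000  # assumed end of the executable text segment

_STARTS = [start for start, _ in FUNC_TABLE]


def symbolize(pc: int) -> str:
    """Return the function whose address range contains pc, or a hex fallback."""
    if pc < _STARTS[0] or pc >= TEXT_END:
        # Address is outside the known text range: leave it unsymbolized.
        return f"0x{pc:x}"
    # The last function starting at or before pc is the one that contains it.
    idx = bisect_right(_STARTS, pc) - 1
    return FUNC_TABLE[idx][1]


def symbolize_stack(pcs):
    """Symbolize every program counter in a sampled stack trace."""
    return [symbolize(pc) for pc in pcs]
```

Calling `symbolize_stack([0x401500, 0x401210, 0x401010])` resolves each sampled address to the enclosing function name. A profiler that cannot find such a table in the binary has only the raw addresses, which is why stripped binaries in other languages typically need symbol files on the server side.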

OpenLIT SDK provides automatic distributed tracing for AI agents in Grafana Cloud, capturing the full execution path, including planning, tool calls, LLM invocations, token usage, and costs. This enables root-cause analysis of non-deterministic agent behavior and cost optimization through unified observability dashboards. The integration requires minimal instrumentation (a single init() call) and works across multiple agent frameworks (CrewAI, OpenAI Agents, LangChain, etc.).

Grafana Labs Blog

Envoy's open source engineering practices (automated deprecation tracking, comprehensive CI/CD with sanitizers and fuzzing, runtime feature guards, and scaled test ownership models) provide concrete, adoptable patterns that reduce maintainer burden and improve backward compatibility across large codebases. The talk distills institutional practices that prevent entropy in projects of 2M+ lines and that apply to any CNCF project regardless of size.

CNCF Youtube
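
As a rough illustration of the runtime feature guard pattern named in the Envoy summary above, the Python sketch below gates a behavior change behind a flag that operators can override without a redeploy. The flag name, the `Runtime` class, and `parse_header` are hypothetical and not taken from Envoy; the sketch only shows the general shape of the pattern.

```python
# Hypothetical flag name and default; not taken from Envoy's codebase.
RUNTIME_DEFAULTS = {"features.lowercase_header_names": True}


class Runtime:
    """Resolves feature flags, letting operator overrides win over defaults."""

    def __init__(self, overrides=None):
        self._overrides = dict(overrides or {})

    def enabled(self, flag: str) -> bool:
        if flag in self._overrides:
            return self._overrides[flag]
        return RUNTIME_DEFAULTS[flag]


def parse_header(line: str, runtime: Runtime):
    """Split 'Name: value'; the new lowercase behavior sits behind a guard."""
    name, _, value = line.partition(":")
    name, value = name.strip(), value.strip()
    if runtime.enabled("features.lowercase_header_names"):
        # New behavior, on by default but reversible at runtime.
        return name.lower(), value
    # Old behavior, kept until the guard is retired.
    return name, value
```

With the default runtime, `parse_header("Content-Type: text/html", Runtime())` takes the new path. Passing `Runtime({"features.lowercase_header_names": False})` restores the old behavior, which is what makes a risky change cheap to roll back.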

André Silva presents a practical blueprint for securing open-source package pipelines: OIDC authentication to eliminate secrets, automated SBOM generation, cryptographic build attestations, bot-driven dependency updates, vulnerability scanning via CodeQL, and GitHub Actions hardening through hash pinning and permission restrictions. Together, these practices mitigate supply chain attacks, reduce the risk of manual error, and improve compliance and build integrity verification.

CNCF Youtube
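
The hash pinning practice mentioned in the supply chain summary above can be checked mechanically. The following Python sketch scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA. It is an illustrative line-based check, not a tool from the talk; the function name `unpinned_actions`, the sample workflow, and the `example-org/example-action` reference are assumptions, and a production check would parse the YAML properly.

```python
import re

# Matches "uses: owner/repo[/path]@ref" lines, with or without a leading "- ".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)")
# A full commit SHA is exactly 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str):
    """Return (line_number, action, ref) for each action not pinned to a full SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue
        action, ref = match.groups()
        if not FULL_SHA_RE.match(ref):
            # Tags and branches are mutable, so they count as unpinned.
            findings.append((lineno, action, ref))
    return findings


# Hypothetical workflow used only to exercise the check.
SAMPLE = """\
jobs:
  build:
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: example-org/example-action@0123456789abcdef0123456789abcdef01234567  # v1.2.3
"""

if __name__ == "__main__":
    for lineno, action, ref in unpinned_actions(SAMPLE):
        print(f"line {lineno}: {action}@{ref} is not pinned to a commit SHA")
```

In the sample, the tag reference `@v4` is flagged while the SHA-pinned action passes. Local actions referenced by path have no `@ref` and are skipped by this check.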

This SRE Weekly digest covers practical reliability engineering topics, including error budgets for ML systems (which tend to decay gradually rather than fail suddenly), enterprise budgets misallocated toward reactive failure response over prevention, and infrastructure automation patterns such as database migrations and graceful service restarts. The recurring theme is that SRE tooling and processes must adapt to new workload types and operational realities rather than applying traditional uptime-focused metrics uniformly.

SRE Weekly Blog
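
For readers unfamiliar with the error budget arithmetic behind the SRE Weekly summary above, here is a minimal Python sketch of the standard calculation. Treating a low-quality ML prediction as a "bad event" is an assumption made for illustration, as are the function name and the example numbers; the digest itself does not prescribe a formula.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the required good-event ratio, e.g. 0.99 for a 99% SLO.
    For an ML system, a "bad event" could be a prediction below a quality
    threshold rather than a failed request (an assumption for this sketch).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # The budget is the number of bad events the SLO tolerates in this window.
    allowed_bad = (1 - slo_target) * total_events
    return 1 - bad_events / allowed_bad


if __name__ == "__main__":
    # A 99% SLO over 10,000 events tolerates about 100 bad events.
    remaining = error_budget_remaining(0.99, 10_000, 25)
    print(f"{remaining:.0%} of the error budget remains")
```

Tracking this value over successive windows exposes a slow, steady burn, which is the gradual-decay pattern the digest associates with ML workloads.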