The Practices That Made Me Go 'Wow': What Every CNCF Project Ca... Rohit Agrawal & Kateryna Nezdolii

CNCF YouTube

Envoy's approach to deprecation management deserves more attention than it gets. Most projects handle breaking changes through documentation and maybe a changelog entry. Envoy built a system that automatically tracks deprecated features through their entire lifecycle and pings the author of the original PR two releases after the change lands, when the old code path is due for removal. This is more than good manners: it is a forcing function that prevents technical debt from accumulating silently.

The mechanism works through runtime feature guards, which are essentially feature flags with expiration dates baked in. When you introduce a behavior change, you wrap it in a guard that explicitly declares its deprecation timeline. The system then generates GitHub issues automatically when it's time to clean up the old code path. This transforms deprecation from a vague intention into a tracked work item with clear ownership.
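The guard-plus-timeline idea can be sketched in a few lines. This is a minimal illustration and not Envoy's actual implementation. The `FeatureGuard` class, the flag names, the owners, and the integer release numbers are all hypothetical. The two-release grace period mirrors the timeline described above, and the sketch assumes the new behavior is on by default and can be switched off at runtime.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FeatureGuard:
    """A runtime flag that records who added it and when cleanup is due."""
    name: str                # hypothetical flag name
    owner: str               # author of the PR that introduced the guard
    introduced_in: int       # release number in which the guard first shipped
    grace_releases: int = 2  # releases to wait before asking for cleanup


def is_enabled(guard: FeatureGuard, overrides: Dict[str, bool]) -> bool:
    """Assumption: new behavior is on unless a runtime override turns it off."""
    return overrides.get(guard.name, True)


def cleanup_issues(guards: List[FeatureGuard], current_release: int) -> List[str]:
    """Return one issue title per guard whose grace period has elapsed."""
    return [
        f"Remove guard '{g.name}' and its old code path (owner: @{g.owner})"
        for g in guards
        if current_release - g.introduced_in >= g.grace_releases
    ]


# Hypothetical guards: one old enough to clean up, one still in its grace period.
guards = [
    FeatureGuard("strict_header_validation", "alice", introduced_in=30),
    FeatureGuard("new_retry_budget", "bob", introduced_in=31),
]
for title in cleanup_issues(guards, current_release=32):
    print(title)
```

At release 32 only the first guard is reported, because the second has shipped in just one release. In a real setup the titles would be filed as tracker issues assigned to the owner rather than printed.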

What makes this pattern particularly valuable is how it handles the backward compatibility problem at scale. In a 2M+ line codebase with hundreds of contributors, you can't rely on institutional memory or manual tracking. The automation ensures that deprecated code actually gets removed rather than lingering indefinitely because nobody remembers why it's there or who owns it.

The CI pipeline setup is equally instructive. Envoy runs every build through multiple sanitizers (AddressSanitizer, MemorySanitizer, and ThreadSanitizer) as part of the standard pipeline. Together these catch out-of-bounds and use-after-free accesses, reads of uninitialized memory, and data races: classes of bugs that would otherwise only surface in production under specific conditions. The cost is longer CI times, but the tradeoff is clearly worth it for infrastructure software, where memory safety issues can cascade into security vulnerabilities.

Continuous fuzzing through OSS-Fuzz integration takes this further. Rather than fuzzing as an occasional activity, it runs constantly against the main branch. When it finds issues, they're treated as regular bugs in the issue tracker. This shifts fuzzing from a specialized security activity to a standard quality gate.
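To make the shape of a fuzz target concrete, here is a minimal sketch. It is not Envoy's harness and does not use OSS-Fuzz. Real fuzzing engines are coverage-guided, whereas this stand-in loop feeds plain random bytes. The `parse_header_line` function is a hypothetical piece of code under test, invented for illustration. The contract is what carries over: the target accepts arbitrary bytes, rejecting input is fine, and any other exception or broken invariant counts as a bug.

```python
import random
from typing import Tuple


def parse_header_line(data: bytes) -> Tuple[str, str]:
    """Hypothetical code under test: parse 'name: value' or raise ValueError."""
    text = data.decode("utf-8")  # UnicodeDecodeError is a subclass of ValueError
    name, sep, value = text.partition(":")
    if not sep or not name.strip():
        raise ValueError("malformed header line")
    return name.strip().lower(), value.strip()


def fuzz_target(data: bytes) -> None:
    """Fuzz entry point: rejection is fine, anything else unexpected is a bug."""
    try:
        name, _ = parse_header_line(data)
    except ValueError:
        return  # cleanly rejected input
    # Invariant that must hold for every accepted input.
    assert name and ":" not in name


def run_fuzz(iterations: int, seed: int = 0) -> int:
    """Stand-in driver: random bytes instead of a coverage-guided engine."""
    rng = random.Random(seed)
    for _ in range(iterations):
        size = rng.randint(0, 32)
        fuzz_target(bytes(rng.randrange(256) for _ in range(size)))
    return iterations


run_fuzz(1000)
```

Running the driver on every change, and filing any crash as an ordinary bug, is the small-scale version of the workflow described above.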

The test ownership model addresses a problem every large project eventually hits: who fixes flaky tests? Envoy assigns ownership at the test level, not just the code level. When a test fails, there's a clear owner responsible for either fixing it or disabling it with a tracked issue. This prevents the "not my test" problem where flaky tests get ignored until CI becomes unreliable.
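A test-level ownership record can be sketched as follows. This is an illustration under stated assumptions, not Envoy's tooling. The `OwnedTest` type, the test names, the owners, and the issue reference are all hypothetical. The point is that every failure resolves to a named owner, and a disabled test always carries a tracking issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OwnedTest:
    name: str                             # hypothetical test identifier
    owner: str                            # person accountable for this test
    tracking_issue: Optional[str] = None  # set only when the test is disabled


def next_action(test: OwnedTest, recent_runs: List[bool]) -> str:
    """Decide what happens next given recent pass (True) / fail (False) runs."""
    if test.tracking_issue is not None:
        return f"skip {test.name}: disabled, tracked in {test.tracking_issue}"
    if all(recent_runs):
        return f"run {test.name}: healthy"
    # Mixed results mean flaky; uniform failure means plainly broken.
    kind = "flaky" if any(recent_runs) else "failing"
    return (
        f"assign {test.name} ({kind}) to @{test.owner}: "
        "fix or disable with a tracked issue"
    )


flaky = OwnedTest("integration_test", "carol")
print(next_action(flaky, [True, False, True]))
```

Because the owner lives on the test record itself, the "who fixes this?" question is answered by a lookup, not a discussion.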

These practices aren't specific to Envoy's domain or scale. A project with 100K lines of code faces the same fundamental problems—deprecations that never complete, memory bugs that slip through, tests that nobody owns. The difference is that Envoy encoded solutions into tooling and process before the problems became unmanageable.

The key insight is treating maintainer burden as a first-class engineering problem. Most projects optimize for feature velocity or contributor experience. Envoy optimized for maintainability, recognizing that in a long-lived infrastructure project, the cost of maintenance eventually dominates. The automation isn't about replacing human judgment—it's about making the right decisions cheap and the wrong decisions expensive.

If you maintain any CNCF project, the question isn't whether these practices apply to you. It's which ones you implement first before your codebase reaches the size where retrofitting them becomes prohibitively expensive.
