Ingress NGINX is EOL: A practical guide for migrating to Kubernetes Gateway API

Datadog Blog

Ingress NGINX isn't actually reaching end-of-life, but the Kubernetes project has made it clear that Gateway API is the future. If you're running Ingress NGINX in production, you need a migration plan. The Gateway API graduated to GA in October 2023, and major cloud providers are pushing their managed implementations hard. More importantly, new features and improvements are flowing into Gateway API implementations while Ingress resources remain functionally frozen.

The core difference isn't just API design. Gateway API splits concerns between infrastructure operators and application developers through separate resource types. A GatewayClass defines the controller implementation, a Gateway configures listeners and TLS, and HTTPRoutes define routing rules. This separation means platform teams can enforce TLS policies and rate limiting at the Gateway level while developers manage their own routes without cluster-admin access. With Ingress, you're stuck with annotations that vary wildly between controllers and require careful RBAC gymnastics.
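The role split looks roughly like this in practice. A minimal sketch, assuming a platform-owned Gateway in an `infra` namespace and an app-team HTTPRoute; all names here (`shared-gateway`, `web-team`, `checkout-svc`) are illustrative, not from any real cluster:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway        # owned by the platform team
  namespace: infra
spec:
  gatewayClassName: example-class
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-cert   # TLS enforced here, not per-route
    allowedRoutes:
      namespaces:
        from: All             # app teams attach routes from their own namespaces
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout              # owned by the application team
  namespace: web-team
spec:
  parentRefs:
  - name: shared-gateway
    namespace: infra
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout-svc
      port: 8080
```

The RBAC win falls out of this shape: developers only need write access to HTTPRoutes in their own namespace, while the Gateway and its TLS configuration stay under platform control.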

Start your migration by running both controllers in parallel. Deploy a Gateway API implementation alongside your existing Ingress NGINX controller. Most teams choose NGINX Gateway Fabric, Envoy Gateway, or their cloud provider's managed option. Install it in a separate namespace and create a Gateway resource listening on a different IP or port initially. This lets you validate behavior without touching production traffic.
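A sketch of that parallel Gateway, assuming Envoy Gateway as the implementation and a placeholder IP. Whether `spec.addresses` is honored is implementation-dependent; many controllers assign an address via the underlying LoadBalancer service instead:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: migration-gateway
  namespace: gateway-system       # separate namespace from Ingress NGINX
spec:
  gatewayClassName: envoy-gateway # or your cloud provider's class
  addresses:
  - type: IPAddress
    value: 203.0.113.50           # placeholder; requests a distinct IP so prod traffic is untouched
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```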

The validation phase matters more than most teams expect. Gateway API's routing semantics differ subtly from Ingress. Path match types must be declared explicitly, and the PathPrefix type matches whole path segments rather than raw string prefixes, which breaks applications that relied on NGINX's looser prefix and trailing-slash behavior. Header-based routing works differently. Timeouts and retry policies that lived in annotations now become explicit fields with different defaults. Create a test HTTPRoute for one low-risk service and verify request logs, response times, and error rates match your Ingress baseline. Check your metrics for connection pool exhaustion and DNS resolution patterns, both of which can shift when changing proxies.
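A validation route for one low-risk service might look like the following sketch. The explicit `timeouts.request` field replaces what an NGINX annotation used to set implicitly; service and gateway names are assumptions for illustration:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo-validation
  namespace: default
spec:
  parentRefs:
  - name: migration-gateway
    namespace: gateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix   # segment-aware: /api matches /api/v1 but not /apiv1
        value: /api
    timeouts:
      request: 30s         # explicit, where Ingress NGINX derived this from annotations
    backendRefs:
    - name: echo-svc       # the low-risk service under test
      port: 8080
```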

Traffic shifting should happen service by service, not all at once. Use weighted DNS or an external load balancer to split traffic between your Ingress IP and Gateway IP. Start with 5% to the Gateway for a service with good observability. Monitor P99 latency, error rates, and connection counts for at least 24 hours before increasing the weight. Watch for subtle issues like WebSocket upgrades failing or gRPC streaming connections dropping. These often work fine in testing but break under production load patterns.
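The DNS or load-balancer split above moves traffic between the two proxies. Once requests land on the Gateway, the same gradual approach also works per service via weighted backendRefs, which is useful for canarying a backend change during the migration. A sketch with hypothetical service names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout-canary
spec:
  parentRefs:
  - name: migration-gateway
    namespace: gateway-system
  rules:
  - backendRefs:
    - name: checkout-stable
      port: 8080
      weight: 95           # proportions, not percentages: 95 of 100 total
    - name: checkout-canary
      port: 8080
      weight: 5
```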

The monitoring gap during migration is real. Your existing dashboards probably key off the nginx_ingress_controller_requests metric. Gateway API implementations expose different metric names and labels. Envoy Gateway uses envoy_http_downstream_rq_total with different label cardinality. NGINX Gateway Fabric has its own naming scheme. Build parallel dashboards before you shift traffic, not during an incident. Set up alerts on both metric sets with identical thresholds so you catch regressions immediately.
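Parallel alerts with identical thresholds can be sketched like this, assuming the Prometheus Operator's PrometheusRule CRD and Envoy Gateway's metric names. Verify the exact metric names and label cardinality your implementation exposes before relying on these expressions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-migration-alerts
spec:
  groups:
  - name: ingress-5xx
    rules:
    - alert: IngressNginx5xxRate
      expr: |
        sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m]))
          / sum(rate(nginx_ingress_controller_requests[5m])) > 0.01
      for: 5m
    - alert: Gateway5xxRate    # same 1% threshold, Envoy's metric names
      expr: |
        sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class="5"}[5m]))
          / sum(rate(envoy_http_downstream_rq_total[5m])) > 0.01
      for: 5m
```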

One practical gotcha: rate limiting and authentication. If you're using Ingress NGINX annotations for oauth2-proxy integration or rate limiting, you'll need to reimplement these using Gateway API's HTTPRoute filters or an external policy attachment mechanism. Most implementations support BackendTLSPolicy for upstream TLS but handle auth differently. Budget time to rebuild these integrations; they're not lift-and-shift.
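The policy-attachment path usually runs through an ExtensionRef filter pointing at a controller-specific CRD. In this sketch the `RateLimitPolicy` group and kind are hypothetical; every implementation ships its own policy resources, so check your controller's documentation for the real names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-limited
spec:
  parentRefs:
  - name: shared-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    filters:
    - type: ExtensionRef
      extensionRef:
        group: policy.example.com   # hypothetical group/kind, stands in for your controller's CRD
        kind: RateLimitPolicy
        name: api-rate-limit
    backendRefs:
    - name: api-svc
      port: 8080
```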

The migration timeline depends on your Ingress count, but expect three to six months for a production cluster with dozens of services. Rushing causes outages. Taking too long means maintaining two ingress stacks indefinitely, which is worse.