Kubernetes Deployment Not Rolling Back Properly After Failed Image Update on EKS
I've been banging my head against this for a few days now and could really use some help. I'm hitting an issue with my Kubernetes deployment on AWS EKS where a rolling update is not rolling back properly after a failed update. I have the following deployment configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /ready
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
```

I configured liveness and readiness probes to ensure the pods are healthy, but during my last deployment the new image caused the application to crash, and the new pods were marked as "not ready." I expected Kubernetes to roll back to the previous stable version, but instead it keeps the failed pods and doesn't initiate a rollback.

I checked the deployment history with `kubectl rollout history deployment/my-app`, which shows the failed revision, and I tried to manually initiate a rollback with `kubectl rollout undo deployment/my-app`, but it didn't fix the issue; the deployment is still stuck in a partially updated state. I also verified that the deployment has a revision history limit configured, but it still isn't reverting as expected.

The EKS cluster is running Kubernetes 1.21. I'm not seeing any specific error messages in the logs relating to the deployment failure, and the events associated with the pods show that they are continuously restarting.

Has anyone encountered this before? What steps can I take to ensure that the deployment rolls back automatically upon failure? Any insights on debugging this would be greatly appreciated.

For context: this is running on Linux, and I'm coming from a different stack and still getting comfortable with Kubernetes manifests, so best-practice pointers are welcome too.
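For reference, this is roughly the sequence of commands I've been running while trying to diagnose it (the pod name is just a placeholder for one of the crashing pods):

```sh
# Check what the rollout controller thinks is happening
kubectl rollout status deployment/my-app
kubectl rollout history deployment/my-app

# Inspect the crashing pods and their events
kubectl get pods -l app=my-app
kubectl describe pod <one-of-the-crashing-pods>

# Attempt the manual rollback (this is what didn't resolve it)
kubectl rollout undo deployment/my-app
```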