ReplicaSet ≠ High Availability (Until You Test This)
Pods fail, nodes go down: this walkthrough shows what actually happens, and how to fix it.
The 30-second summary:
Running your app in Kubernetes doesn’t mean it’s highly available. This walkthrough shows how ReplicaSets restore failed pods, handle node loss, and work with probes, all backed by real command-line examples. Adapted from Packt’s The Kubernetes Bible (Chapter 10).
8-minute read
Hands-on commands included
The Problem: One Dead Pod, and Your App Stalls
Let’s say you’ve got a stateless NGINX app deployed in a multi-node Kubernetes cluster using a ReplicaSet. You think you’re covered because there are 4 replicas. But then you:
delete a pod manually
drain one of the nodes
simulate a container failure
In all three cases, you’re expecting automatic recovery. But it’s not magic: it’s the ReplicaSet (and sometimes liveness probes) doing the heavy lifting.
Let’s walk through all three failure modes and see what Kubernetes does.
1. Pod Deletion? No Problem.
This scenario demonstrates how a ReplicaSet restores deleted pods to maintain the desired number of replicas.
Here's a step-by-step walkthrough:
Define the ReplicaSet manifest: Save the following YAML as nginx-replicaset-example.yaml:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset-example
  namespace: rs-ns
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
      environment: test
  template:
    metadata:
      labels:
        app: nginx
        environment: test
    spec:
      containers:
        - name: nginx
          image: nginx:1.17
          ports:
            - containerPort: 80
Create the namespace: The manifest above targets the rs-ns namespace, so create it before anything else.
kubectl create -f ns-rs.yaml
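The contents of ns-rs.yaml aren’t shown in this excerpt. A minimal manifest for the rs-ns namespace referenced above would look like the sketch below (alternatively, kubectl create namespace rs-ns achieves the same thing):
# ns-rs.yaml (assumed contents): creates the rs-ns namespace used by the ReplicaSet
apiVersion: v1
kind: Namespace
metadata:
  name: rs-ns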
Deploy the ReplicaSet: The manifest defines a ReplicaSet with 4 NGINX pods.
kubectl apply -f nginx-replicaset-example.yaml
Delete a pod manually: Simulate a pod failure by deleting one of the running pods.
kubectl delete pod <pod-name> -n rs-ns
Verify that the ReplicaSet restores the pod: The controller detects the change and automatically spins up a new pod to maintain the desired count.
kubectl get pods -n rs-ns
kubectl describe rs/nginx-replicaset-example -n rs-ns
Within seconds, the ReplicaSet controller notices the missing pod and recreates it to meet the declared replica count.
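One optional extra, not part of the original walkthrough: keep a watch running in a second terminal before deleting the pod, and you can see the replacement pod appear within seconds:
kubectl get pods -n rs-ns --watch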
Takeaway: ReplicaSets automatically maintain the number of desired pods, making recovery from manual deletions fast and hands-free.
2. Node Failure? Here's What Actually Happens
This scenario demonstrates how a ReplicaSet maintains high availability when a node goes down, by recreating its pods on the remaining healthy nodes.
Here's a step-by-step walkthrough:
Expose your app with a Service:
kubectl apply -f nginx-service.yaml
This creates a Service that gives you a single, stable endpoint in front of the ReplicaSet’s pods (a minimal manifest is sketched below).
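The nginx-service.yaml file isn’t reproduced in this excerpt. A minimal Service that matches the ReplicaSet’s pod labels and the nginx-service name used in the port-forward command below might look like this sketch:
# nginx-service.yaml (assumed contents): selects the ReplicaSet's pods by label
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: rs-ns
spec:
  selector:
    app: nginx
    environment: test
  ports:
    - port: 80
      targetPort: 80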
Forward traffic from your local machine to the Kubernetes Service:
kubectl port-forward svc/nginx-service 8080:80 -n rs-ns
curl localhost:8080
This confirms your service is working and traffic is flowing to the pods.
Check where the pods are currently running:
kubectl get pods -n rs-ns -o wide
This shows which node each pod is scheduled on.
Simulate node failure by cordoning and draining the node:
kubectl cordon kind-worker
Prevents new pods from being scheduled on this node.
kubectl drain kind-worker --ignore-daemonsets
Evicts all running pods from the node while ignoring daemonsets.
kubectl delete node kind-worker
Removes the node from the cluster to simulate a full node failure.
Within moments, the ReplicaSet detects the missing pods and spins up new ones on the remaining healthy nodes. Your Service automatically reroutes traffic to these new pods.
Verify that everything is still working:
kubectl get pods -n rs-ns -o wide
curl localhost:8080
You’ll see that traffic still flows and the app remains accessible. (If the port-forward session dropped because the pod it was bound to got evicted, simply re-run the kubectl port-forward command.)
Takeaway: The ReplicaSet ensures that the desired number of pod replicas is always maintained — even when a node goes offline. It handles pod rescheduling automatically, as long as there's sufficient capacity in your cluster.
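A note on “spread”: the manifest above doesn’t control pod placement, so all four replicas could land on the same node. If you want to nudge the scheduler to spread them, one standard option (an addition beyond the chapter’s example) is a topologySpreadConstraints block in the pod template spec, roughly like this:
# Illustrative addition to spec.template.spec: spread replicas across nodes by hostname
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: nginx
        environment: test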
3. Unhealthy Container? Probes Save the Day
Let’s see how Kubernetes handles an unhealthy container using liveness probes.
Here's a step-by-step walkthrough:
Add the following liveness probe to your ReplicaSet’s pod spec. It tells the kubelet to start checking container health 2 seconds after the container starts and to repeat the check every 2 seconds:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 2
  periodSeconds: 2
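For context, here’s roughly where the probe sits inside the pod template’s container spec. The cleanup section below refers to a ReplicaSet named nginx-replicaset-livenessprobe-example, so this sketch assumes that’s the manifest being edited; apart from the probe, it mirrors the earlier example:
# Fragment of spec.template.spec in nginx-replicaset-livenessprobe-example (sketch)
containers:
  - name: nginx
    image: nginx:1.17
    ports:
      - containerPort: 80
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 2
      periodSeconds: 2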
Apply your updated ReplicaSet manifest and wait for the pod to be up and running.
Simulate a container failure by deleting the default NGINX index file:
kubectl exec -it <pod-name> -n rs-ns -- rm /usr/share/nginx/html/index.html
Check what happens by describing the pod:
kubectl describe pod <pod-name> -n rs-ns
You’ll see Liveness probe failed events, followed by automatic container restarts.
Takeaway: The kubelet, not the ReplicaSet, manages container health. But when used with ReplicaSets, probes help create a resilient system that self-heals when a container goes bad.
Cleanup
You can delete the ReplicaSet and its pods:
kubectl delete rs/nginx-replicaset-livenessprobe-example -n rs-ns
Or just delete the controller, leaving pods untouched:
kubectl delete rs/nginx-replicaset-livenessprobe-example --cascade=orphan -n rs-ns
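If you created the rs-ns namespace only for this exercise, deleting it removes the Service and any remaining pods as well:
kubectl delete namespace rs-ns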
Key Takeaways
ReplicaSets guarantee pod replication and replacement—not health checking
Liveness probes enable kubelet to restart broken containers
Node failure recovery works if your cluster has enough capacity and replicas are spread
HA = ReplicaSets + Probes + Services, working in tandem
Based on Chapter 10 of The Kubernetes Bible, Second Edition