# Kubernetes Health Checks: Liveness, Readiness, and Startup Probes Explained
Kubernetes health probes are one of the most powerful — and most misunderstood — features of the platform. Configured correctly, they make your application self-healing. Configured incorrectly, they cause cascading restarts, dropped requests, and mysterious downtime.
## The Three Types of Probes

### Liveness Probe: "Is the process stuck?"
The liveness probe answers: should Kubernetes restart this container?
If the liveness probe fails, Kubernetes kills the container and starts a new one. This is useful for detecting deadlocks, infinite loops, or corrupted state that can't recover without a restart.
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```
When the liveness probe fails:

1. The container is killed (SIGTERM → SIGKILL after the grace period)
2. A new container is started
3. If it keeps failing, Kubernetes backs off between restarts (CrashLoopBackOff)
Common mistake: Making the liveness probe check dependencies (database, external APIs). If your database is slow, the liveness probe fails, Kubernetes restarts your app, and the restarted app reconnects to the already-stressed database, making things worse. This is one of the most common causes of cascading failures in Kubernetes.
Rule: Liveness probes should only check if the process itself is healthy, not its dependencies.
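A process-only liveness check can still catch deadlocks if the worker loop publishes a heartbeat that the handler inspects. A minimal sketch of this pattern (the function names and threshold here are illustrative, not from this article):

```python
import time

# The worker loop calls record_heartbeat() on every iteration; the liveness
# handler calls is_alive(). Both look only at process-internal state, so a
# slow database or flaky external API can never fail this probe.
_last_heartbeat = time.monotonic()

def record_heartbeat() -> None:
    global _last_heartbeat
    _last_heartbeat = time.monotonic()

def is_alive(max_silence_seconds: float = 30.0) -> bool:
    # A deadlocked or infinitely looping worker stops heartbeating, so the
    # silence grows past the threshold and the probe starts failing.
    return (time.monotonic() - _last_heartbeat) < max_silence_seconds
```

The liveness endpoint then returns 200 while `is_alive()` is true and 500 otherwise, and only a genuinely stuck process triggers a restart.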
### Readiness Probe: "Can this instance handle traffic?"
The readiness probe answers: should Kubernetes send traffic to this pod?
If the readiness probe fails, the pod is removed from the Service's endpoint list. It stays running — it just stops receiving new requests.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
  successThreshold: 1
```
When the readiness probe fails:

1. The pod is removed from the Service's endpoints
2. No new traffic is routed to the pod
3. Existing connections continue (graceful)
4. When the probe passes again, the pod is added back
This is where you check dependencies. If your database is down, the readiness probe should fail so traffic is routed to pods that can still serve (maybe from cache).
### Startup Probe: "Has the app finished initializing?"
The startup probe answers: is the container still starting up?
Until the startup probe succeeds, liveness and readiness probes are disabled. This is crucial for applications with long startup times.
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30  # 30 × 5s = 150s max startup time
```
Without a startup probe: If your app takes 60 seconds to start and your liveness probe has a 15-second initial delay, Kubernetes will kill the container before it finishes starting, causing a restart loop.
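Putting the three together: a sketch of how the probes from the sections above might coexist on one container. The endpoints and port follow the examples in this article; the numbers are starting points to tune for your app.

```yaml
# Sketch: all three probes on one container. The startup probe gates the
# other two, so liveness can stay aggressive without killing slow starts.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # up to 150s to finish initializing
livenessProbe:
  httpGet:
    path: /healthz       # process-only check
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready         # includes dependency checks
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
```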
## Designing Your Health Endpoints

### The `/healthz` Endpoint (Liveness)
Should be simple and fast:
```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
async def liveness():
    # Only check that the process is alive and responding
    return {"status": "alive"}
```
Do NOT include:

- Database checks
- External API checks
- Heavy computation
- File system checks (these might hang on NFS)
### The `/ready` Endpoint (Readiness)
Should verify the application can serve requests:
```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# db and redis stand in for your application's database and Redis clients
@app.get("/ready")
async def readiness():
    checks = {}

    # Check database connectivity
    try:
        await db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "failed"
        return JSONResponse({"status": "not ready", "checks": checks}, status_code=503)

    # Check cache connectivity
    try:
        redis.ping()
        checks["cache"] = "ok"
    except Exception:
        checks["cache"] = "failed"
        return JSONResponse({"status": "not ready", "checks": checks}, status_code=503)

    return {"status": "ready", "checks": checks}
```
## Configuration Guidelines

### Timing Parameters
| Parameter | Liveness | Readiness | Startup |
|---|---|---|---|
| `initialDelaySeconds` | App startup time | 0-5s | 0 |
| `periodSeconds` | 10-30s | 5-10s | 5-10s |
| `timeoutSeconds` | 3-5s | 3-5s | 3-5s |
| `failureThreshold` | 3-5 | 2-3 | 30+ |
| `successThreshold` | 1 | 1-2 | 1 |
### Formulas
Liveness failure detection time (worst case, from container start):

```
Detection = initialDelaySeconds + (periodSeconds × failureThreshold)
```

Example: 15 + (10 × 3) = 45 seconds

Maximum startup time:

```
Max startup = periodSeconds × failureThreshold
```

Example: 5 × 30 = 150 seconds
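The two formulas are simple enough to check with a few lines of arithmetic (the function names here are just for illustration):

```python
def liveness_detection_time(initial_delay: int, period: int, failure_threshold: int) -> int:
    # Worst-case seconds from container start until a restart is triggered
    return initial_delay + period * failure_threshold

def max_startup_time(period: int, failure_threshold: int) -> int:
    # Longest a startup probe will wait before the container is killed
    return period * failure_threshold

print(liveness_detection_time(15, 10, 3))  # 45 seconds
print(max_startup_time(5, 30))             # 150 seconds
```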
## Common Anti-Patterns

### 1. Liveness Probe Checks Database

```yaml
# BAD: a database outage causes every pod to restart
livenessProbe:
  httpGet:
    path: /health  # this endpoint checks the DB
```
Fix: Use /healthz (process-only) for liveness, /ready (with DB check) for readiness.
### 2. Same Endpoint for Both Probes

```yaml
# BAD: same check for liveness and readiness
livenessProbe:
  httpGet:
    path: /health
readinessProbe:
  httpGet:
    path: /health
```
Fix: Separate endpoints with different checks.
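Sketched in config, the fix for both of the anti-patterns above looks like this (paths and port as used elsewhere in this article):

```yaml
# Fix: distinct endpoints with distinct semantics
livenessProbe:
  httpGet:
    path: /healthz  # process-only check
    port: 8080
readinessProbe:
  httpGet:
    path: /ready    # includes dependency checks
    port: 8080
```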
### 3. Timeout Too Short

```yaml
# BAD: under load, a 1s timeout will fail
livenessProbe:
  httpGet:
    path: /healthz
  timeoutSeconds: 1
```
Fix: Set timeout to at least 3 seconds. Under load, even simple endpoints can be slow.
### 4. No Startup Probe for Slow Apps

```yaml
# BAD: app needs 60s to start; liveness kills it at 25s
livenessProbe:
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
```
Fix: Add a startup probe with generous failureThreshold.
### 5. Failure Threshold Too Low

```yaml
# BAD: a single failed check triggers a restart
livenessProbe:
  failureThreshold: 1
```
Fix: Use failureThreshold of 3+ to tolerate transient issues.
## Monitoring Your Probes
Set up external monitoring in addition to Kubernetes probes:
- External uptime monitoring — checks your service from outside the cluster
- Probe failure metrics — track how often probes fail (a leading indicator)
- Pod restart count — increasing restarts indicate probe misconfiguration
- CrashLoopBackOff alerts — something is fundamentally broken
## Conclusion
Kubernetes probes are your application's immune system. Liveness is the restart button for stuck processes. Readiness is the traffic light for overloaded pods. Startup is the patience for slow initializers. Get them right, and your application becomes self-healing. Get them wrong, and they become the source of your outages.