The Only 7 Website Monitoring Metrics That Actually Matter

More dashboards don't mean better monitoring. Most teams track too many metrics, creating noise that drowns out the signal. After years of helping teams set up monitoring, these are the only 7 metrics that consistently matter.

1. Availability (Uptime Percentage)

What it measures: Is your site reachable and responding correctly?

How to measure: External HTTP checks from multiple regions, every 1-5 minutes.

Thresholds:

| Level | Threshold | Action |
|-------|-----------|--------|
| Normal | > 99.9% (30-day rolling) | None |
| Warning | 99.5% – 99.9% | Review incidents |
| Critical | < 99.5% | Investigate root cause |

Why it matters: This is the foundation. Everything else is irrelevant if your site is down. Calculate your uptime honestly — partial outages, slow responses, and error pages all count.

Pro tip: Measure from at least 3 geographic regions. A single-location check can show 100% uptime while an entire continent can't reach your site.
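As a minimal sketch, the rolling uptime percentage in the table above can be computed directly from raw check results. Here each check is a pass/fail boolean, and (per the honesty rule above) slow responses and error pages are recorded as failures:

```python
def uptime_percent(checks: list[bool]) -> float:
    """30-day rolling uptime: share of checks that passed, as a percentage.

    Partial outages, timeouts, and error pages should all be recorded
    as False before this is called.
    """
    if not checks:
        return 0.0
    return 100.0 * sum(checks) / len(checks)

# 5-minute checks over 30 days = 8640 checks; 4 failures here
checks = [True] * 8636 + [False] * 4
print(round(uptime_percent(checks), 3))  # 99.954 — within the Normal band
```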

2. Response Time (TTFB — Time to First Byte)

What it measures: How long your server takes to start sending a response.

How to measure: External monitoring tracks this automatically with every check.

Thresholds:

| Level | Threshold | Action |
|-------|-----------|--------|
| Good | < 300ms | Optimal |
| Warning | 300ms – 1s | Investigate if trending up |
| Critical | > 1s | Performance degradation |
| Emergency | > 3s | Users are leaving |

Why it matters: TTFB is the earliest indicator of server-side problems. A gradual increase over days often signals a growing issue (memory leak, growing dataset, connection pool exhaustion) before it becomes an outage.

What to track: Monitor p50 (median) for typical experience and p99 for worst-case. If your p50 is 100ms but p99 is 5 seconds, 1% of your users are having a terrible time.
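The p50/p99 split can be sketched with a simple nearest-rank percentile (note this is one of several percentile conventions; real monitoring tools may interpolate between samples instead):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: p in 0..100 over response-time samples."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

# 98 fast requests and 2 slow ones: the median looks healthy,
# but p99 exposes the tail the article warns about.
ttfb_ms = [100.0] * 98 + [5000.0] * 2
print(percentile(ttfb_ms, 50))  # 100.0
print(percentile(ttfb_ms, 99))  # 5000.0
```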

3. Error Rate

What it measures: Percentage of requests that return errors (4xx, 5xx).

How to measure: Application metrics or log aggregation.

Thresholds:

| Level | 5xx Rate | 4xx Rate |
|-------|----------|----------|
| Normal | < 0.1% | < 2% |
| Warning | 0.1% – 0.5% | 2% – 5% |
| Critical | > 0.5% | > 10% |

Why it matters: A small, steady error rate is normal. A sudden spike means something broke. The distinction between 4xx (client errors) and 5xx (server errors) matters:

- Rising 4xx: possible broken links, API changes, or bot traffic
- Rising 5xx: your code or infrastructure is failing
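Splitting the two rates out of a batch of status codes is straightforward; a sketch against the thresholds above:

```python
def error_rates(status_codes: list[int]) -> tuple[float, float]:
    """Return (4xx_rate, 5xx_rate) as percentages of all requests."""
    total = len(status_codes)
    client_errors = sum(1 for s in status_codes if 400 <= s < 500)
    server_errors = sum(1 for s in status_codes if 500 <= s < 600)
    return 100 * client_errors / total, 100 * server_errors / total

codes = [200] * 990 + [404] * 8 + [500] * 2
rate_4xx, rate_5xx = error_rates(codes)
print(rate_4xx, rate_5xx)  # 0.8 0.2 — 4xx Normal, 5xx in the Warning band
```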

4. SSL Certificate Expiry

What it measures: Days until your SSL/TLS certificate expires.

How to measure: Daily SSL checks that validate the full certificate chain.

Thresholds:

| Level | Days Until Expiry | Action |
|-------|-------------------|--------|
| Normal | > 30 days | None |
| Warning | 14 – 30 days | Verify auto-renewal |
| Critical | 7 – 14 days | Renew manually if needed |
| Emergency | < 7 days | Renew immediately |

Why it matters: An expired certificate takes your entire site offline with a browser warning that scares away users. It's 100% preventable with monitoring.
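A daily check like this can be sketched with the standard library: `days_until_expiry` opens a TLS connection and reads the certificate's `notAfter` field, while `days_left` holds the date math so it can be exercised without a network call. (This is a simplified sketch; it checks only the leaf certificate's expiry, not the full chain.)

```python
import socket
import ssl
from datetime import datetime, timezone

def days_left(not_after, now=None):
    """Days from `now` until a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.getpeercert(),
    e.g. "Jan 31 12:00:00 2030 GMT".
    """
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def days_until_expiry(hostname: str, port: int = 443) -> int:
    """Connect over TLS and report days until the server cert expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_left(cert["notAfter"])

# Example usage (requires network):
# print(days_until_expiry("example.com"))
```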

5. DNS Resolution Time

What it measures: How long it takes to resolve your domain name to an IP address.

How to measure: DNS monitoring checks from multiple resolvers.

Thresholds:

| Level | Resolution Time |
|-------|-----------------|
| Good | < 50ms |
| Warning | 50 – 200ms |
| Critical | > 200ms or failure |

Why it matters: Slow DNS adds latency to every single page load, API call, and asset request. DNS failures make your site completely unreachable regardless of server health. Most teams don't monitor DNS until they have their first DNS-related outage.
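A rough local measurement can be taken by timing a lookup through the system resolver. Note the caveat: the OS and intermediate resolvers cache answers, so a dedicated DNS monitor querying multiple resolvers (as described above) gives a truer picture than this sketch:

```python
import socket
import time

def dns_resolution_ms(hostname: str) -> float:
    """Time one A/AAAA lookup via the system resolver, in milliseconds.

    Cached answers return almost instantly, so treat this as a
    lower bound rather than what a cold client experiences.
    """
    start = time.perf_counter()
    socket.getaddrinfo(hostname, None)
    return (time.perf_counter() - start) * 1000

# Example usage:
print(f"{dns_resolution_ms('localhost'):.1f}ms")
```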

6. Apdex Score

What it measures: User satisfaction based on response time.

How to calculate:

Apdex = (Satisfied + Tolerating/2) / Total

Where:
  Satisfied  = response time ≤ T (e.g., 500ms)
  Tolerating = T < response time ≤ 4T (e.g., up to 2000ms)
  Frustrated = response time > 4T

Thresholds:

| Score | Rating | Meaning |
|-------|--------|---------|
| 0.94+ | Excellent | Users are happy |
| 0.85 – 0.93 | Good | Mostly satisfied |
| 0.70 – 0.84 | Fair | Some users frustrated |
| 0.50 – 0.69 | Poor | Many users frustrated |
| < 0.50 | Unacceptable | Most users frustrated |

Why it matters: Apdex distills complex performance data into a single number that even non-technical stakeholders understand. "Our Apdex is 0.72" is more meaningful than "our p95 is 2.3 seconds."
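The formula above translates directly into code; a sketch with T = 500ms (the T value itself is a per-application choice):

```python
def apdex(response_times_ms: list[float], t_ms: float = 500.0) -> float:
    """Apdex = (satisfied + tolerating/2) / total.

    Satisfied:  response time <= T
    Tolerating: T < response time <= 4T
    Frustrated: response time > 4T (contributes nothing)
    """
    total = len(response_times_ms)
    satisfied = sum(1 for r in response_times_ms if r <= t_ms)
    tolerating = sum(1 for r in response_times_ms if t_ms < r <= 4 * t_ms)
    return (satisfied + tolerating / 2) / total

# 85 satisfied, 10 tolerating, 5 frustrated out of 100 requests
times = [200.0] * 85 + [1200.0] * 10 + [3000.0] * 5
print(round(apdex(times), 2))  # 0.9 — top of the "Good" band
```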

7. Uptime of Critical Dependencies

What it measures: Availability of services your application depends on.

What to monitor:

- Payment gateway (Stripe, PayPal)
- Email service (SendGrid, SES)
- CDN (Cloudflare, CloudFront)
- Database (connection availability)
- Third-party APIs
- DNS provider

Why it matters: Your application's availability is limited by its least reliable dependency. If your payment gateway is down, your checkout is down — even if your servers are perfectly healthy.
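A minimal dependency sweep can be sketched with the standard library. The endpoint URLs below are placeholders; substitute the health or status endpoints your providers actually expose:

```python
import urllib.request

# Placeholder endpoints — replace with your providers' real health URLs.
DEPENDENCIES = {
    "payment-gateway": "https://status.example-payments.test/health",
    "email-service": "https://status.example-mail.test/health",
}

def check(url: str, timeout: float = 5.0) -> bool:
    """True if the endpoint answers with a 2xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, timeouts, connection resets
        return False

def sweep(deps: dict) -> dict:
    """Run every dependency check and return {name: up?}."""
    return {name: check(url) for name, url in deps.items()}

# Example usage:
# for name, up in sweep(DEPENDENCIES).items():
#     print(f"{name}: {'✓' if up else '✗'}")
```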

Metrics You Can Probably Stop Tracking

These metrics seem important but rarely drive action:

| Metric | Why It's Overrated |
|--------|--------------------|
| CPU usage | High CPU isn't a problem unless it causes latency or errors |
| Memory usage | Likewise; it only matters when it causes OOM kills |
| Disk usage | Set a single alert at 85%; it doesn't need a dashboard panel |
| Request count | Interesting for capacity planning, not daily monitoring |
| Individual server health | Focus on service health, not individual instances |
| Build/deploy time | Track it, but don't put it on your monitoring dashboard |

Setting Up Your Dashboard

One dashboard, seven panels:

┌─────────────────────┐ ┌─────────────────────┐
│ UPTIME: 99.97%      │ │ RESPONSE: 142ms p50 │
│ ███████████████████▓ │ │ ▁▁▂▁▁▁▁▂▁▁▁▁▁▁▂▁  │
└─────────────────────┘ └─────────────────────┘
┌─────────────────────┐ ┌─────────────────────┐
│ ERRORS: 0.02%       │ │ SSL: 47 days left   │
│ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ │ │ ████████████░░░░░░  │
└─────────────────────┘ └─────────────────────┘
┌─────────────────────┐ ┌─────────────────────┐
│ DNS: 23ms avg       │ │ APDEX: 0.94         │
│ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ │ │ ████████████████░░  │
└─────────────────────┘ └─────────────────────┘
┌─────────────────────────────────────────────┐
│ DEPENDENCIES: Stripe ✓ CDN ✓ Email ✓ DB ✓  │
└─────────────────────────────────────────────┘

Conclusion

Seven metrics. One dashboard. That's all you need for effective website monitoring. Everything else is noise until one of these seven tells you something is wrong. Start here, and add complexity only when you have a specific question that these metrics can't answer.