How to Monitor Your SaaS Application: A Complete Guide

Running a SaaS application means your users expect 24/7 availability. Unlike a downloadable app that runs on the user's machine, every bug, every slow query, and every infrastructure hiccup directly affects your customers — and your revenue. Here's how to build monitoring that keeps your SaaS reliable.

The SaaS Monitoring Stack

A production SaaS application needs monitoring at five layers:

Layer 1: External Availability (User Perspective)

This is where you start. Can your users reach your application?

What to monitor:
- Main application URL (login page, dashboard)
- API endpoints (the ones your SPA or mobile app calls)
- Public pages (marketing site, docs, blog)
- CDN-served assets (JS bundles, images)
- Third-party dependencies (payment gateway, auth provider)

How to monitor:
- External HTTP checks every 1-5 minutes
- From multiple geographic regions (your users aren't all in one city)
- With keyword validation (catch error pages that return 200)
- SSL certificate expiry monitoring

Alert on: Downtime from 2+ regions for 2+ consecutive checks.
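
The alert rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a production checker: the function names, the region names, and the shape of the `history` data are all assumptions made for the example, and the HTTP request itself is left out so the logic stays easy to test.

```python
from typing import Dict, List


def check_passed(status_code: int, body: str, keyword: str) -> bool:
    """A check passes only if the status is 2xx and the expected keyword is present.

    The keyword test catches error pages that are served with a 200 status.
    """
    return 200 <= status_code < 300 and keyword in body


def failing_streak(results: List[bool]) -> int:
    """Count consecutive failures at the end of one region's result history."""
    streak = 0
    for passed in reversed(results):
        if passed:
            break
        streak += 1
    return streak


def should_alert(history: Dict[str, List[bool]],
                 min_regions: int = 2,
                 min_consecutive: int = 2) -> bool:
    """Alert when enough regions have each failed their most recent checks."""
    failing = [region for region, results in history.items()
               if failing_streak(results) >= min_consecutive]
    return len(failing) >= min_regions


# Hypothetical check history per region, oldest result first.
history = {
    "eu-west": [True, False, False],
    "us-east": [True, True, False],
    "ap-south": [False, False, False],
}
print(should_alert(history))  # True: eu-west and ap-south both failed twice in a row
```

Requiring agreement between regions and between consecutive checks filters out single-probe network blips, at the cost of alerting one check interval later.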

Layer 2: Application Performance (User Experience)

Your app might be "up" but painfully slow. Users don't distinguish between "down" and "too slow to use."

What to monitor:
- Response time (p50, p95, p99) per endpoint
- Error rate (5xx responses / total responses)
- Core Web Vitals (LCP, CLS, INP)
- API-specific metrics (authentication success rate, search latency)
- Background job processing time and queue depth

How to monitor:
- Application Performance Monitoring (APM) instrumentation
- Server-side request logging with timing
- Client-side Real User Monitoring (RUM) via JS snippet
- Synthetic monitoring for critical user journeys

Alert on: p99 > 3 seconds for 5 minutes, error rate > 1%.
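
To make the percentile and error-rate thresholds concrete, here is a small Python sketch. It assumes you already have the latencies and status codes for the last five minutes in memory. The nearest-rank percentile method and the function names are choices made for this example, and APM tools may compute percentiles differently.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes: List[int]) -> float:
    """Share of responses with a 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def performance_alert(latencies_ms: List[float], status_codes: List[int]) -> bool:
    """Apply the thresholds from the text to one five-minute window of samples."""
    return percentile(latencies_ms, 99) > 3000 or error_rate(status_codes) > 0.01


# Hypothetical samples from one five-minute window.
latencies_ms = [120, 95, 110, 3400, 130, 105, 98, 140, 115, 125]
status_codes = [200] * 97 + [500, 502, 503]

print(percentile(latencies_ms, 50))   # 115
print(percentile(latencies_ms, 99))   # 3400
print(performance_alert(latencies_ms, status_codes))
```

Notice how one slow request barely moves the median but dominates p99, which is why tail percentiles are the ones worth alerting on.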

Layer 3: Infrastructure (System Resources)

The servers, databases, and services running your app.

What to monitor:
- CPU usage (sustained > 80% = problem)
- Memory usage (approaching limit = OOM risk)
- Disk usage (> 85% = time to clean up or scale)
- Network I/O (bandwidth saturation)
- Database connections (pool exhaustion)
- Database query performance (slow query log)
- Cache hit rate (Redis/Memcached)
- Queue depth and processing rate

How to monitor:
- Server monitoring agent (installed on each instance)
- Database-specific monitoring (pg_stat_statements, slow query log)
- Cloud provider metrics (CloudWatch, GCP Monitoring)

Alert on: Resource usage > 80% sustained for 10+ minutes.
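
The word "sustained" matters here: a brief spike should not page anyone. One way to express that, sketched below in Python, is to track when the current breach started and alert only once it has lasted long enough. The function name and the `(timestamp, usage)` sample format are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def sustained_above(samples: List[Tuple[float, float]],
                    threshold: float = 80.0,
                    min_duration_s: float = 600.0) -> bool:
    """samples holds (unix_timestamp_seconds, usage_percent) pairs, oldest first.

    Returns True if usage has stayed above the threshold continuously,
    up to and including the latest sample, for at least min_duration_s seconds.
    """
    breach_start: Optional[float] = None
    for timestamp, usage in samples:
        if usage > threshold:
            if breach_start is None:
                breach_start = timestamp  # a new breach begins here
        else:
            breach_start = None  # any dip below the threshold resets the clock
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= min_duration_s


# One sample per minute: a single spike versus ten minutes of high CPU.
spike = [(t * 60.0, 95.0 if t == 5 else 40.0) for t in range(12)]
sustained = [(t * 60.0, 90.0) for t in range(11)]

print(sustained_above(spike))      # False
print(sustained_above(sustained))  # True
```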

Layer 4: Business Metrics (Revenue Impact)

Technical metrics don't tell you if the business is working.

What to monitor:
- Signup conversion rate (landing page → registered)
- Activation rate (registered → first meaningful action)
- Payment success rate (attempted → completed)
- Churn indicators (login frequency dropping)
- Feature usage (are new features being adopted?)

How to monitor:
- Analytics (Mixpanel, PostHog, or custom events)
- Database queries on subscription/payment tables
- Funnel tracking in your product

Alert on: Payment success rate drops below 95%, signup rate drops > 50% from baseline.
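
As a rough sketch, the two business alerts above reduce to simple ratios. The Python below assumes you can already count attempted and completed payments and compare current signups against a baseline figure. The function names and the sample numbers are invented for the example.

```python
from typing import List


def payment_success_rate(completed: int, attempted: int) -> float:
    """Completed payments as a share of attempted payments."""
    if attempted == 0:
        return 1.0  # assumption: no attempts means nothing has failed
    return completed / attempted


def signup_drop(current: float, baseline: float) -> float:
    """Fractional drop of current signups relative to the baseline (0.0 if not lower)."""
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - current) / baseline)


def business_alerts(completed: int, attempted: int,
                    signups_now: float, signups_baseline: float) -> List[str]:
    """Return the alert messages that fire for the given counts."""
    alerts = []
    if payment_success_rate(completed, attempted) < 0.95:
        alerts.append("payment success rate below 95%")
    if signup_drop(signups_now, signups_baseline) > 0.50:
        alerts.append("signups down more than 50% from baseline")
    return alerts


# 188 of 200 payments succeeded (94%); signups fell from 100 to 40 (a 60% drop).
print(business_alerts(188, 200, 40, 100))
```

How you define the baseline (same hour last week, trailing 7-day average, and so on) is a product decision that this sketch leaves open.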

Layer 5: Security Monitoring

SaaS applications are high-value targets.

What to monitor:
- Failed login attempts (brute force detection)
- Unusual API usage patterns (rate limit violations)
- SSL certificate changes (detect hijacking)
- Dependency vulnerabilities (npm audit, pip-audit)
- DNS record changes (unauthorized modifications)

How to monitor:
- Application-level rate limiting with logging
- DNS monitoring for record changes
- Safe Browsing status checks
- Regular dependency audits

Alert on: 100+ failed logins from one IP, DNS record change, SSL certificate mismatch.
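
For the failed-login rule, a sliding window per IP address is one straightforward approach. The sketch below is illustrative only: the class name is hypothetical, and the 10-minute window is an assumption, since the rule above does not specify a time span.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginTracker:
    """Counts failed logins per IP inside a sliding time window."""

    def __init__(self, limit: int = 100, window_s: float = 600.0) -> None:
        self.limit = limit
        self.window_s = window_s  # assumed window; tune to your traffic
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one failed login and return True if the IP has reached the limit."""
        events = self._failures[ip]
        events.append(now)
        # Drop failures that have aged out of the window.
        while events and now - events[0] > self.window_s:
            events.popleft()
        return len(events) >= self.limit


# Small limit so the behaviour is easy to see.
tracker = FailedLoginTracker(limit=3, window_s=60.0)
results = [tracker.record_failure("203.0.113.7", t) for t in (0.0, 10.0, 20.0)]
print(results)  # [False, False, True]
```

In a multi-server deployment the counters would need to live in shared storage rather than process memory, which this sketch does not attempt.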

Setting SLOs for Your SaaS

What Is an SLO?

A Service Level Objective (SLO) is your internal target for reliability. Set it stricter than your SLA (Service Level Agreement, the promise you make to customers) so that you notice trouble before you break that promise.

Metric	SLO	SLA
Availability	99.95%	99.9%
API latency (p95)	< 500ms	< 2 seconds
API error rate	< 0.1%	< 1%

How to Calculate Error Budget

Error budget = 1 - SLO

If SLO = 99.95%:
Error budget = 0.05% of a 30-day month (43,200 minutes) = 21.6 minutes/month

You can "spend" 21.6 minutes of downtime per month
before you breach your SLO.

When your error budget is running low:
- Freeze non-critical deployments
- Focus engineering effort on reliability
- Postpone risky changes
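
The error-budget arithmetic is easy to automate. The Python sketch below reproduces the 21.6-minute figure for a 30-day month and reports how much budget is left after some downtime. The function names and the 16.2 minutes of downtime are assumptions for the example, and what counts as "running low" is a threshold you would choose yourself.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Allowed downtime in minutes for a period of the given length."""
    return (1.0 - slo) * days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget_minutes(slo, days)
    return (budget - downtime_minutes) / budget


budget = error_budget_minutes(0.9995)
remaining = budget_remaining(0.9995, downtime_minutes=16.2)

print(f"budget: {budget:.1f} minutes")   # budget: 21.6 minutes
print(f"remaining: {remaining:.0%}")     # remaining: 25%
```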

Monitoring Your Tech Stack

Frontend (React, Vue, Next.js)

Core Web Vitals (LCP, CLS, INP)
JavaScript error tracking (Sentry, LogRocket)
Bundle size monitoring (Webpack bundle analyzer)
CDN cache hit rate

Backend API (Node.js, Python, Go)

Request rate, error rate, duration (RED metrics)
Endpoint-level performance breakdown
Database query timing
External API call timing
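
As an illustration of RED metrics with an endpoint-level breakdown, the Python sketch below aggregates a batch of request records. The `Request` record and the `red_metrics` function are hypothetical names for this example. In practice an APM agent or metrics library would collect these for you.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Request:
    endpoint: str
    status: int
    duration_ms: float


def red_metrics(requests: Iterable[Request], window_s: float) -> Dict[str, Dict[str, float]]:
    """Rate, Errors, Duration per endpoint for requests observed over window_s seconds."""
    grouped: Dict[str, List[Request]] = defaultdict(list)
    for request in requests:
        grouped[request.endpoint].append(request)

    summary: Dict[str, Dict[str, float]] = {}
    for endpoint, items in grouped.items():
        errors = sum(1 for r in items if r.status >= 500)
        summary[endpoint] = {
            "rate_per_s": len(items) / window_s,
            "error_rate": errors / len(items),
            "avg_duration_ms": sum(r.duration_ms for r in items) / len(items),
        }
    return summary


# Hypothetical requests seen during a 60-second window.
sample = [
    Request("/login", 200, 100.0),
    Request("/login", 500, 300.0),
    Request("/search", 200, 50.0),
]
for endpoint, metrics in red_metrics(sample, window_s=60.0).items():
    print(endpoint, metrics)
```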

Database (PostgreSQL, MySQL)

Active connections vs pool size
Slow queries (> 100ms)
Replication lag (if using replicas)
Table and index sizes (bloat detection)
Deadlock count

Cache (Redis, Memcached)

Hit rate (target: > 95%)
Memory usage vs max memory
Eviction rate
Connection count
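
For Redis, the hit rate can be derived from the `keyspace_hits` and `keyspace_misses` counters reported by the `INFO stats` command. The Python sketch below takes those counters as a plain dictionary, so it makes no assumption about which client library you use. The function name and the sample numbers are invented for the example.

```python
from typing import Mapping, Optional


def cache_hit_rate(stats: Mapping[str, int]) -> Optional[float]:
    """Hit rate from Redis INFO stats counters, or None if there were no lookups."""
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical counter values as they might appear in INFO stats output.
stats = {"keyspace_hits": 9400, "keyspace_misses": 600}
rate = cache_hit_rate(stats)

print(f"hit rate: {rate:.1%}")  # hit rate: 94.0%
print("below target" if rate is not None and rate < 0.95 else "ok")
```

These counters accumulate from server start, so to see the recent hit rate you would subtract an earlier snapshot from a later one before dividing.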

Queue (RabbitMQ, SQS, Celery)

Queue depth (growing = processing can't keep up)
Processing time per message
Dead letter queue size
Consumer count
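
A growing queue is easier to alert on as a trend than as a single number. The Python sketch below uses a deliberately simple rule: the depth never shrank between samples and ended higher than it started. The function name, the rule itself, and the minimum of five samples are assumptions chosen for illustration. A real system might fit a slope or compare enqueue and dequeue rates instead.

```python
from typing import Sequence


def backlog_growing(depths: Sequence[int], min_samples: int = 5) -> bool:
    """True if queue depth rose steadily across the sampled window.

    depths holds queue-depth readings taken at regular intervals, oldest first.
    """
    if len(depths) < min_samples:
        return False  # not enough data to call it a trend
    never_shrank = all(later >= earlier for earlier, later in zip(depths, depths[1:]))
    return never_shrank and depths[-1] > depths[0]


print(backlog_growing([10, 14, 20, 35, 50]))  # True: consumers are falling behind
print(backlog_growing([50, 20, 5, 30, 10]))   # False: bursts are being drained
```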

Incident Classification for SaaS

Severity	Criteria	Response Time	Example
SEV1	All users affected, revenue impacted	Immediate	App fully down
SEV2	Subset affected, core flow broken	15 minutes	Payments failing
SEV3	Non-core feature broken	1 hour	Reporting broken
SEV4	Minor issue, workaround exists	Next business day	UI cosmetic bug
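
If you want alerts to carry a severity automatically, the table can be encoded as a small decision function. The Python below is one possible reading of the criteria, not a definitive policy: the three boolean inputs and the order in which they are checked are assumptions, and real incidents often need human judgment.

```python
from enum import Enum


class Severity(Enum):
    """Severity levels, each carrying the response time from the table."""
    SEV1 = "Immediate"
    SEV2 = "15 minutes"
    SEV3 = "1 hour"
    SEV4 = "Next business day"


def classify(all_users_affected: bool,
             core_flow_broken: bool,
             workaround_exists: bool) -> Severity:
    """Map incident facts to a severity, checking the most severe case first."""
    if all_users_affected:
        return Severity.SEV1
    if core_flow_broken:
        return Severity.SEV2
    if not workaround_exists:
        return Severity.SEV3
    return Severity.SEV4


# Payments failing for a subset of users: a core flow is broken.
incident = classify(all_users_affected=False, core_flow_broken=True, workaround_exists=False)
print(incident.name, "respond within:", incident.value)
```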

The Minimum Viable Monitoring Stack

Starting from zero? Here's what to set up in order:

1. External uptime monitoring (Day 1)
   - Main URL + API + payment endpoint
   - Multi-region checks every 5 minutes
   - Email + Telegram/Slack alerts
2. Application error tracking (Week 1)
   - Sentry or equivalent for exception tracking
   - Source maps for meaningful stack traces
3. Server monitoring (Week 1)
   - Agent on each server for CPU/RAM/disk
   - Database connection monitoring
4. Status page (Week 1)
   - Public page for customer communication
   - Linked from footer and support docs
5. On-call rotation (Month 1)
   - At least 2 people rotating weekly
   - Escalation chain documented
6. Performance monitoring (Month 1-2)
   - APM instrumentation for latency tracking
   - Slow query logging
   - Core Web Vitals tracking
7. Business metric monitoring (Month 2-3)
   - Payment success rate tracking
   - Signup funnel monitoring
   - Key feature usage metrics

Conclusion

Monitoring a SaaS application is not a one-time setup — it's an ongoing practice that grows with your product. Start with external uptime monitoring (the cheapest, highest-impact investment), then layer in application performance, infrastructure metrics, and business KPIs as your team and product mature. The goal is always the same: know about problems before your customers do, fix them fast, and learn from every incident to prevent the next one.
