Cron Job Monitoring: How to Prevent Silent Failures
Your daily database backup cron job stopped running 3 weeks ago. Nobody noticed until the server crashed and the latest backup was from last month. Sound familiar?
Cron jobs are the silent workhorses of infrastructure — and their failures are equally silent. Unlike a web server crash that triggers immediate alerts, a cron job that stops running simply... doesn't run. No error. No alert. Just missing data that you discover at the worst possible time.
Why Cron Jobs Fail Silently
Common Failure Modes
- Server restart — crontab lost or service not started
- Disk full — job starts, fails to write output, exits silently
- Permission changes — script can't access files/databases after an update
- Dependency missing — a library or binary was removed during an update
- Timeout — job takes longer than expected and gets killed
- OOM kill — job uses too much memory and the OS kills it
- Lock file stale — previous run left a lock file, new runs skip
- Environment mismatch — works interactively, fails in cron (different PATH, env vars)
- Certificate expired — job calls an API with expired SSL
- Rate limiting — external API rejects requests
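The environment mismatch is the easiest of these to reproduce: cron starts jobs with a near-empty environment, so commands found through your interactive PATH can simply vanish. A minimal sketch of both the problem and the usual fix (the PATH value here is an example; adjust for your system):

```shell
#!/bin/sh
# Simulate cron's sparse environment: env -i strips all variables,
# so $PATH is whatever the shell falls back to (often just /usr/bin:/bin)
env -i /bin/sh -c 'echo "cron-like PATH: $PATH"'

# The usual fix: declare PATH explicitly at the top of every cron script
PATH=/usr/local/bin:/usr/bin:/bin
export PATH
echo "explicit PATH: $PATH"
```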
Why Traditional Monitoring Misses Them
Uptime monitoring checks if your website responds. Health checks verify your application is running. But neither detects that a background job silently stopped executing at 2 AM.
How Heartbeat Monitoring Works
Instead of checking if something is up, heartbeat monitoring checks if something happened. The concept is simple:
- Your cron job sends a "ping" (HTTP request) when it runs successfully
- The monitoring system expects to receive this ping on a schedule
- If the ping doesn't arrive within the expected window, an alert fires
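Conceptually, the receiving side is just a timestamp plus an age check. A self-contained shell sketch makes the interval/grace logic concrete; here the ping is simulated by touching a file, and the path and thresholds are illustrative, not part of any real service (`stat -c %Y` is GNU coreutils; macOS/BSD uses `stat -f %m`):

```shell
#!/bin/sh
# "Ping": the cron job touches a timestamp file instead of calling an HTTP endpoint
HEARTBEAT_FILE=/tmp/heartbeat-abc123
touch "$HEARTBEAT_FILE"

# "Checker": alert when the last ping is older than interval + grace
INTERVAL=3600   # job expected every hour
GRACE=600       # 10 minutes of acceptable delay
AGE=$(( $(date +%s) - $(stat -c %Y "$HEARTBEAT_FILE") ))
if [ "$AGE" -gt $((INTERVAL + GRACE)) ]; then
  echo "ALERT: heartbeat is ${AGE}s old"
else
  echo "OK: last ping ${AGE}s ago"
fi
```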
```shell
# Your cron job (before heartbeat monitoring)
0 * * * * /usr/local/bin/backup.sh

# Your cron job (with heartbeat monitoring)
0 * * * * /usr/local/bin/backup.sh && curl -s https://valpero.com/api/heartbeat/ping/abc123
```
That curl call at the end is the heartbeat. If the backup script fails (non-zero exit code), the `&&` prevents the curl from running, the expected ping never arrives, and the monitoring system alerts you.
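The short-circuit behaviour of `&&` is plain shell semantics and worth verifying once yourself (echo stands in for the curl ping):

```shell
# && runs its right-hand side only when the left side exits 0
false && echo "pinged"   # failure: nothing printed, no heartbeat sent
true  && echo "pinged"   # success: prints "pinged"
```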
What to Monitor
Critical Cron Jobs
| Job | Typical Schedule | Impact of Failure |
|---|---|---|
| Database backups | Hourly / Daily | Data loss risk |
| SSL certificate renewal | Daily check | Site outage |
| Log rotation | Daily | Disk full → crash |
| Report generation | Daily / Weekly | Business impact |
| Data sync / ETL | Hourly | Stale data |
| Cleanup tasks | Daily | Disk/DB bloat |
| Health checks | Every minute | Missed outages |
| Queue processing | Continuous | Backlog growth |
| Payment reconciliation | Daily | Financial discrepancy |
| Email digest sending | Daily | User engagement drop |
What to Include in the Heartbeat Ping
Don't just ping — include useful context:
```shell
# Basic: just ping on success
curl -s https://valpero.com/api/heartbeat/ping/abc123

# Better: include execution time and status
curl -s "https://valpero.com/api/heartbeat/ping/abc123?duration=${SECONDS}&status=ok"

# Best: send execution details
curl -s -X POST https://valpero.com/api/heartbeat/ping/abc123 \
  -H "Content-Type: application/json" \
  -d "{\"duration\": ${SECONDS}, \"records_processed\": ${COUNT}}"
```
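The `${SECONDS}` used above is a bash builtin (not POSIX sh): it counts seconds since the shell started, so resetting it to 0 just before the job turns it into a duration timer:

```shell
#!/bin/bash
SECONDS=0        # reset the bash builtin timer
sleep 2          # stand-in for the real job
echo "duration=${SECONDS}"
```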
Setting Up Heartbeat Monitoring
Step 1: Create a Heartbeat Monitor
Set up a monitor with:

- Name: "Production DB Backup" (human-readable)
- Expected interval: how often the job should run (e.g., every 1 hour)
- Grace period: how much delay is acceptable before alerting (e.g., 10 minutes)
Step 2: Add the Ping to Your Job
Add a curl/wget call at the end of your cron job, after the main task succeeds.
Step 3: Handle Failures Properly
```shell
#!/bin/bash
set -e  # Exit on any error

# Your actual job
pg_dump mydb > /backups/daily.sql
gzip /backups/daily.sql

# Only ping if everything succeeded
curl -fsS --retry 3 https://valpero.com/api/heartbeat/ping/abc123
```
Step 4: Configure Alerts
Set up alerts on the appropriate channels:

- Critical jobs (backups, payments): SMS + Telegram
- Important jobs (reports, syncs): Slack/Email
- Nice-to-have jobs (cleanup): Email only
Advanced Patterns
Wrapper Script
Create a reusable wrapper that handles logging, error capture, and heartbeat pinging:
```shell
#!/bin/bash
# heartbeat-wrapper.sh <heartbeat-id> <command...>
set -o pipefail  # make the pipeline report the command's exit code, not tee's

HEARTBEAT_ID=$1
shift

START=$(date +%s)
if "$@" 2>&1 | tee "/var/log/cron-${HEARTBEAT_ID}.log"; then
  DURATION=$(($(date +%s) - START))
  curl -fsS --retry 3 "https://valpero.com/api/heartbeat/ping/${HEARTBEAT_ID}?duration=${DURATION}"
else
  EXIT_CODE=$?
  echo "Job failed with exit code ${EXIT_CODE}" >> "/var/log/cron-${HEARTBEAT_ID}.log"
  # Don't ping — monitoring will alert on the missing heartbeat
fi
```
Usage:
```shell
0 * * * * /usr/local/bin/heartbeat-wrapper.sh abc123 /usr/local/bin/backup.sh
```
Start + End Pinging
For long-running jobs, ping at both the start and end:
```shell
# Ping "start" — we know the job attempted to run
curl -s "https://valpero.com/api/heartbeat/ping/abc123?status=start"

# Run the actual job
/usr/local/bin/heavy-etl-job.sh

# Ping "end" — we know the job completed
curl -s "https://valpero.com/api/heartbeat/ping/abc123?status=complete"
```
This lets you detect jobs that started but never finished (hung, killed, stuck).
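A complementary guard for the hung-job case is coreutils `timeout`, which kills the job once it exceeds a limit; the end ping then never fires, and the missing-heartbeat alert does its job. GNU timeout reports exit code 124 when the limit was hit:

```shell
#!/bin/sh
# timeout kills the command if it exceeds the limit (here: 1 second)
rc=0
timeout 1 sleep 5 || rc=$?
echo "exit code: $rc"   # 124 signals the job was killed at the limit
```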
Common Mistakes
- Pinging before the job runs — you'll never know if it actually succeeded
- No grace period — jobs that take variable time trigger false alerts
- Ignoring exit codes — `job.sh; curl ping` pings even on failure; use `&&`
- Not monitoring the monitor — if your heartbeat endpoint is down, all jobs appear failed
- Too many heartbeats — monitor important jobs, not every tiny script
Conclusion
Every cron job that matters should have a heartbeat monitor. It takes 60 seconds to set up and prevents the most frustrating type of failure — the one you don't discover until it's too late. If a job is important enough to schedule, it's important enough to monitor.