Cron

Scheduled jobs on the VM and what each one does.

Heartbeat's scheduler is plain cron on the VM. Four lines drive everything that has to happen on a clock.

| Cron | What it does | Entry |
| --- | --- | --- |
| 17 * * * * | Hourly ingest + compute (every metric). | scripts/refresh_all.py |
| 23 4 * * * | Daily μ/σ refresh for rolling_stat metrics. | python -m api.scripts.refresh_benchmarks |
| 30 4 * * * | Daily client-segmentation snapshot (writes heartbeat.client_segment_snapshot). | python -m api.scripts.refresh_segments |
| 45 4 * * * | Daily audit-cut registry sync + snapshot refresh for every (cut × window). | python -m api.scripts.refresh_audit |

What each job actually does

Hourly: refresh_all.py (:17)

POSTs {"metric_ids":"all","trigger":"cron"} to the local /api/metrics/recompute endpoint, then polls /api/metrics/recompute/{id} until done. The endpoint runs ingest and compute behind one job (single-flight, globally coalesced — see architecture › Refresh path). Phase transitions are printed to the cron log so you can see whether ingest or compute hung.

If the hourly tick lands while a user-initiated recompute is still running, the cron tick is silently absorbed (returns the existing job_id with coalesced=true).
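The trigger-then-poll flow can be sketched as below. The request body and the coalesced=true response come from the text; the exact field names for job status (phase, done) are assumptions, and the HTTP calls are injected as callables so the sketch runs without a live API.

```python
import time

def run_hourly_refresh(post, get, poll_interval=5):
    """Trigger the full recompute and poll the job until it finishes.

    post(path, body) and get(path) return parsed JSON; they are injected
    so this sketch runs offline. Status field names are hypothetical.
    """
    job = post("/api/metrics/recompute",
               {"metric_ids": "all", "trigger": "cron"})
    if job.get("coalesced"):
        # a recompute is already in flight; this tick is silently absorbed
        return job["job_id"], "coalesced"
    while True:
        status = get(f"/api/metrics/recompute/{job['job_id']}")
        print(f"phase={status['phase']}")  # phase transitions land in the cron log
        if status["phase"] == "done":
            return job["job_id"], "done"
        time.sleep(poll_interval)
```

Injecting the HTTP layer also makes the coalescing branch easy to exercise in isolation.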

Daily 04:23: refresh_benchmarks

Recomputes μ and σ over the rolling history window for every rolling_stat metric, writing the results into heartbeat.metric_registry.benchmark_mean and benchmark_stddev. Cockpit reads these to colour z-scores: amber when |z| ≥ 1, red when |z| ≥ 2.
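A minimal sketch of the μ/σ computation and the colour thresholds, assuming a sample standard deviation (the script's exact estimator is not stated) and an illustrative history window:

```python
import statistics

def zscore_band(value, mean, stddev):
    """Colour band per the thresholds above: amber at |z| >= 1, red at |z| >= 2."""
    if stddev == 0:
        return "green"
    z = (value - mean) / stddev
    return "red" if abs(z) >= 2 else "amber" if abs(z) >= 1 else "green"

# hypothetical rolling window for one metric
history = [104, 98, 101, 97, 100, 102]
mu = statistics.fmean(history)      # -> benchmark_mean
sigma = statistics.stdev(history)   # -> benchmark_stddev (sample stddev assumed)
```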

Daily 04:30: refresh_segments

Writes the daily client-segmentation snapshot (heartbeat.client_segment_snapshot) — the per-client tier tags (top / mid / micro) used by the audit pages and several B2B-shaped metrics. Decoupled from the activity-window metrics so the tier tag stays stable across short windows.
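The tier tagging amounts to bucketing each client by some activity measure. A sketch, with the caveat that the real inputs and cut-offs live inside refresh_segments; the thresholds and client values below are purely illustrative:

```python
def client_tier(value, top_cut=100_000, mid_cut=10_000):
    """Tag a client top / mid / micro. Cut-offs here are hypothetical."""
    if value >= top_cut:
        return "top"
    if value >= mid_cut:
        return "mid"
    return "micro"

# shape of the daily snapshot: one stable tier tag per client
snapshot = {cid: client_tier(v) for cid, v in
            {"acme": 250_000, "beta": 40_000, "carol": 900}.items()}
```

Because the snapshot is written once a day rather than derived per request, a client's tag does not flap when a short activity window moves.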

Daily 04:45: refresh_audit

Two phases:

  1. Sync registry from disk. Walks api/audit/cuts/<cut_id>.py, reads each module's META, upserts into heartbeat.audit_cut_registry. Removes rows for cuts whose files are gone.
  2. Recompute snapshots. For every (cut × window) in the registry, calls the cut's compute(conn, *, window) and UPSERTs the JSON payload into heartbeat.audit_cut_snapshot.
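The two phases can be sketched with the I/O injected, so the loop structure is visible without the real cut modules or database. load_cuts() stands in for walking api/audit/cuts/<cut_id>.py and importing each module; upsert_cut, prune, and store_snapshot stand in for the SQL:

```python
import json

def refresh_audit(load_cuts, upsert_cut, prune, windows, store_snapshot):
    """Sketch of both refresh_audit phases with injected I/O.

    load_cuts() -> {cut_id: (meta, compute)}. The real job reads each
    module's META and calls compute(conn, *, window); the conn argument
    is omitted here since the storage layer is faked.
    """
    cuts = load_cuts()
    # Phase 1: sync registry from disk
    for cut_id, (meta, _compute) in cuts.items():
        upsert_cut(cut_id, meta)
    prune(set(cuts))  # remove rows for cuts whose files are gone
    # Phase 2: recompute every (cut × window) snapshot
    for cut_id, (_meta, compute) in cuts.items():
        for window in windows:
            store_snapshot(cut_id, window, json.dumps(compute(window=window)))
```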

Without this job, new cuts surface as 404 on the audit page until the next 04:45 tick — bin/deploy.sh runs refresh_audit at the end specifically to avoid that gap.

Where the cron lines live

In the VM crontab (crontab -l), invoked via docker exec heartbeat_api .... There is no per-environment override — cron lives on the host, not in compose. To inspect / edit:

ssh ubuntu@13.62.60.156 'crontab -l'
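Assembled from the schedule table above, the crontab plausibly looks like this. The docker exec wrapper is stated in the text; whether refresh_all.py is run as a script or a module inside the container, and any log redirection, are assumptions:

```
17 * * * * docker exec heartbeat_api python scripts/refresh_all.py
23 4 * * * docker exec heartbeat_api python -m api.scripts.refresh_benchmarks
30 4 * * * docker exec heartbeat_api python -m api.scripts.refresh_segments
45 4 * * * docker exec heartbeat_api python -m api.scripts.refresh_audit
```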

Diagnostics when a tick goes wrong

# is the daemon up?
ssh ubuntu@13.62.60.156 'systemctl status cron'

# tail the cron log for the latest run
ssh ubuntu@13.62.60.156 'tail -200 /var/log/syslog | grep CRON'

# what happened to the api during the ingest phase?
ssh ubuntu@13.62.60.156 \
  'cd ~/heartbeat-dashboard && \
   docker compose -f docker-compose.prod.yml logs --since=1h api'

# is the active-job endpoint stuck?
ssh ubuntu@13.62.60.156 \
  'docker exec heartbeat_api curl -s localhost:8000/api/metrics/active-job'

If the api restarts with a recompute in flight, the job is simply dropped and the next cron tick re-runs from scratch. There is no persistent job ledger; that's an explicit design choice (decisions › 2026-04-30).