Discovery

Adding a metric through the metrics-discovery skill — the only sanctioned path.

New metrics enter the registry through the metrics-discovery skill, not via ad-hoc SQL. The skill is the discipline gate between "this would be a useful number" and a tile on the cockpit. It enforces a single ordering rule:

Money story first, SQL last. A metric earns its place by being tied to revenue gained or losses avoided, not by being technically possible.

This page describes what the skill does, what its outputs look like, and where ad-hoc shortcuts go wrong. The skill itself lives at .claude/skills/metrics-discovery/SKILL.md; invoke it with /metrics-discovery or any phrasing the skill description matches ("новая метрика" / "new metric", "что бы померить" / "what could we measure", "audit the cockpit").

What the skill does

A discovery session walks through five steps, in order. Skipping any of them is what produces vanity counters.

1. Inventory the live registry

Read heartbeat.metric_registry directly — what's already there, what's unverified, what's been retired in the recent commit history. The goal is to avoid duplicates before any new SQL is written.
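
A minimal sketch of what that inventory pass can look like, run against marts-db through psql. Only registry columns already shown elsewhere on this page are referenced; anything beyond them would be an assumption:

-- what is already registered, and under which code / period
SELECT metric_id, name, period, code
FROM heartbeat.metric_registry
ORDER BY metric_id;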

2. Probe the live source

For every candidate metric, the skill checks the dlt-landed schemas on marts-db for the columns it would need:

  • Is the relevant table populated?
  • Does <schema>._dlt_loads show recent successful loads?
  • Does the column the metric depends on actually exist with the semantics the candidate assumes?

If the data isn't landed — even if it's promised in pending sources — the candidate is parked there, not built off a wrong-but-landed surrogate. The webbank-first rule applies in full: fallback requires an explicit notes entry on the metric.
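
A hedged sketch of that probe for a hypothetical candidate built on a cards schema. The schema, table, and column names are illustrative, and the _dlt_loads columns follow dlt's default load table, so treat them as assumptions:

-- is the relevant table populated at all?
SELECT count(*) FROM cards.card_orders;

-- does the dlt load table show a recent successful load?
SELECT max(inserted_at) FROM cards._dlt_loads WHERE status = 0;

-- does the column the candidate depends on exist, and with what type?
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'cards' AND table_name = 'card_orders' AND column_name = 'decision';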

3. Write the JTBD + business rationale

For each candidate, write the money story explicitly:

  • Job to be done — the question this metric answers (e.g. "is the card-order approval rate falling below the level we need to sustain client onboarding?").
  • Red zone meaning — what loss in EUR or what unit of operational pain a red status corresponds to.
  • Action — what the owner does when this goes red. If the answer is "nothing" or "log a ticket", the metric does not belong on the cockpit.

This step is what separates "we could measure this" from "we should measure this". Without it, the cockpit fills up with vanity counters that no one acts on.

4. Get alignment before the SQL

The skill writes a dated candidates file under docs/metric-candidates-<YYYY-MM-DD>.md with the JTBD, source inventory, and proposed thresholds — and stops. Approval from Ivan / PM is required before any compute SQL is drafted.

This is the single most often-skipped step, and it is why the artefact file exists: a candidate that nobody approved still has a dated record sitting next to the candidates that did get approved, so the difference stays visible.

5. Draft value_sql, run it, sanity-check the number

Only after alignment does SQL get written. The skill drafts the value_sql against marts-db, runs it, and reports the number back — not just "the SQL parses", but "this is the actual current value".

The number is sanity-checked against:

  • the team's hand count (where one exists — the webbank-first rule exists because of a metric that diverged from one),
  • a Jasper-rendered report covering the same concept (Jasper is a source of definitions, not of data — see decisions),
  • the operator's domain expectation ("this can't be 22%").

Only after the number passes does the file land at db/metrics/<metric_id>.sql.
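
For the card-order approval-rate example from step 3, a value_sql draft might look like the sketch below. The schema, table, and column names are hypothetical; the real query would come out of the probe in step 2:

-- share of card orders approved in the current calendar month, as a percentage
SELECT round(
         100.0 * count(*) FILTER (WHERE status = 'approved')
               / nullif(count(*), 0),
         1)
FROM cards.card_orders
WHERE created_at >= date_trunc('month', current_date);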

Anatomy of a metric file

A metric file has a fixed shape:

-- comment header: one-line purpose, methodology notes, owner
INSERT INTO heartbeat.metric_registry (metric_id, name, period, value_sql, ...) VALUES (...);
UPDATE heartbeat.metric_registry SET code = 'APRC' WHERE metric_id = '...';
UPDATE heartbeat.metric_registry SET how_to_read = '...', methodology = '...', sources = '...' WHERE metric_id = '...';
-- one-shot threshold reset (norm_value / alert_value / target_value)

Shared scaffolding (db/metrics/_schema.sql, _pre.sql, _post.sql) wraps the per-metric files; db/build_metrics.sh concatenates everything, plus an auto-generated GC (garbage-collection) step, into the SQL stream piped into psql. The GC is what makes deletion work: removing a metric file removes its registry row on the next build, with no parallel manifest to maintain.
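
The GC statement is produced by db/build_metrics.sh, so its exact shape isn't reproduced here; a plausible sketch, assuming it keeps only the metric_ids for which a file still exists, would be:

-- auto-generated: drop registry rows whose db/metrics/<metric_id>.sql file was deleted
-- (the ids below are hypothetical; the real list is one id per file found in db/metrics/)
DELETE FROM heartbeat.metric_registry
WHERE metric_id NOT IN ('card_order_approval_rate', 'other_metric_id');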

Deploy after the file lands

Once the file is in db/metrics/, run the standard VM deploy:

ssh ubuntu@13.62.60.156 'cd ~/heartbeat-dashboard && ./bin/deploy.sh'

bin/deploy.sh runs db/build_metrics.sh → psql, then compute_metrics (writes the first metric_history row), then refresh_benchmarks (recomputes μ/σ for rolling_stat metrics). The new tile surfaces on the cockpit immediately — unverified, ownerless, awaiting the lifecycle's business handover.

Don't shortcut the deploy script — applying the SQL by hand without the compute + benchmark refresh leaves the new tile stuck on "no data yet" until the next hourly cron tick.

Common failure modes the skill prevents

  • Metric without a money story — the skill refuses to write SQL before step 3 lands.
  • Metric off a wrong-but-landed source — the skill checks pending-ingest-sources and the webbank-first rule; surrogate metrics are flagged, not silently shipped.
  • Threshold derived from current data — the skill won't write target_value = (some percentile of metric_history) even if asked. Thresholds come from outside the dashboard.
  • Duplicate of an existing metric — caught at step 1 by inventorying the live registry.
  • Vanity counter — the JTBD step requires a stated action on red. Counters with no action don't earn a tile.

See also

  • Metric lifecycle — what happens after a metric lands.
  • Validate — the deep-audit skill for a single metric whose number you don't trust.
  • Sources › pending — the parking lot for metric concepts blocked on data that isn't landed.