
Validate

Deep audit of a single metric — does its number deserve to be trusted?

The /metric-validate skill is the deep-audit counterpart to metrics-discovery. Where discovery asks "should this metric exist?", validate asks "does this metric's current number deserve to be trusted, and is the metric measuring something the team can act on?"

Use it whenever:

  • a number on the cockpit feels suspiciously good or bad,
  • you're about to verify a metric and need an independent cross-check,
  • an alert fired and you need to confirm whether the red is real before paging an owner,
  • the underlying source contract changed (new dlt schema, renamed column) and the metric needs a fresh smell test.

The skill operates on one metric_id at a time and writes a dated audit file under docs/metric-audits/<date>-<metric_id>.md.
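
Assuming the skill takes the metric_id as its only argument (the exact invocation may differ, and payment_processing_time is a hypothetical id), a run might look like:

    /metric-validate payment_processing_time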

What the skill checks

Five layers, in order:

1. Methodology

Read the metric's methodology, how_to_read, sources, period, and value_sql out of the registry. Does the SQL actually express what the methodology comment claims? Does the methodology match the JTBD that landed it? Discrepancies between "what the comment says" and "what the SQL does" are the most common source of distrusted numbers.
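
A minimal sketch of that first read, assuming the registry is queryable as a table named metric_registry (the doc only says "the registry"; the table name and metric_id are assumptions):

    -- Pull the registry row the skill audits; metric_registry and the
    -- metric_id are illustrative names.
    SELECT metric_id, methodology, how_to_read, sources, period, value_sql
    FROM metric_registry
    WHERE metric_id = 'payment_processing_time';

Reading value_sql side by side with methodology is the whole point of this layer: the two must describe the same computation.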

2. Live source freshness

Check <schema>._dlt_loads.inserted_at for the source schema. A fresh computed_at on the cockpit can paper over a stale source_as_of — the staleness signal exists for a reason, and validate cross-checks it directly against the dlt loads table rather than trusting the cached source_as_of on the history row.
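
A sketch of that cross-check, assuming webbank is the source schema and metric_history is the history table (both names appear elsewhere in this doc; the metric_id and the value column are hypothetical):

    -- Live freshness straight from dlt's loads table...
    SELECT max(inserted_at) AS live_source_as_of
    FROM webbank._dlt_loads;

    -- ...versus the cached values on the latest history row.
    SELECT source_as_of, computed_at
    FROM metric_history
    WHERE metric_id = 'payment_processing_time'
    ORDER BY computed_at DESC
    LIMIT 1;

If live_source_as_of is newer than the cached source_as_of, the metric is running on stale data even though computed_at looks fresh.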

3. Independent cross-check rebuild

Re-derive the same number in a different way — typically:

  • a hand count against the team's source-of-truth tables (webbank core.client_profile, client_profile_state_log, etc.),
  • a query that reaches the same answer through a different join path,
  • a Jasper-rendered report covering the same concept.

If the rebuild disagrees with metric_history, the audit file documents the gap and points at the SQL line responsible. If the rebuild agrees, the audit file documents the match and the methodology earns the right to claim "verified by cross-check".
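
A hedged sketch of such a rebuild, assuming the metric under audit counts active clients (the concept, join key, column names, and status values are all illustrative):

    -- Re-derive the number through a different path: the profile table
    -- cross-confirmed by the state log, instead of the metric's value_sql.
    SELECT count(*) AS rebuilt_value
    FROM core.client_profile cp
    WHERE cp.status = 'active'              -- assumed column and value
      AND EXISTS (
        SELECT 1
        FROM core.client_profile_state_log sl
        WHERE sl.client_id = cp.id          -- assumed join key
          AND sl.new_state = 'active'       -- assumed column and value
      );

    -- The number it has to match (value column name assumed):
    SELECT value
    FROM metric_history
    WHERE metric_id = 'active_clients'      -- hypothetical id
    ORDER BY computed_at DESC
    LIMIT 1;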

4. Business semantics of red

The most important layer, and the one that distinguishes validate from "the SQL runs":

  • What does red on this metric actually mean in business terms? ("4 minutes of payment processing time" — meaningful? "22% approval rate" — wrong-low or right-low?)
  • Is the direction right? An amber on a "passing rate" metric should mean "fewer pass than we want", not "more". Sign mistakes in norm_value / alert_value look like correct numbers until somebody asks why nothing ever fires (see the sketch after this list).
  • Does the period match the action horizon? A 30d window on a metric whose owner can only act on a 24h signal is a misdesign.
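
One way to catch the sign mistake mechanically, assuming the registry stores norm_value and alert_value as numeric thresholds (the table name and the higher-is-better convention are assumptions):

    -- On a higher-is-better metric such as a passing rate, the norm
    -- threshold should sit above the alert threshold. A row coming back
    -- here means the two are inverted and red can never fire.
    SELECT metric_id, norm_value, alert_value
    FROM metric_registry                    -- assumed table name
    WHERE metric_id = 'passing_rate'        -- hypothetical id
      AND norm_value < alert_value;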

5. Action surface

If the metric went red right now, what would the owner do? If the answer is "open a ticket", the metric is closer to a vanity counter than to a cockpit tile. The audit file calls this out — not as a reason to delete the metric, but as a reason to revisit the JTBD or rework the alert routing.

Outputs

The skill writes one file:

docs/metric-audits/<YYYY-MM-DD>-<metric_id>.md

Format:

  • Verdict. Pass / pass-with-caveat / fail.
  • What the metric claims to measure.
  • What it actually measures (after reading the SQL).
  • Live freshness. source_as_of, computed_at, dlt-loads cross-check.
  • Cross-check rebuild. SQL or hand-count, plus the number.
  • Red zone semantics. What red means in EUR / hours / clients.
  • Action. What the owner does on red.
  • Recommendation. Verify / fix SQL / fix thresholds / retire.

The file is committed to the repo as a permanent record. Even if the metric is later retired, its audit trail stays.

When validate is the right hammer

Validate is for one metric at a time, deeply. It is not the right tool for:

  • Daily triage of the cockpit — that's heartbeat-today.
  • Deciding whether to add a new metric — that's metrics-discovery.
  • Routine threshold tuning — that's the inline editor on the cockpit.
  • Bulk audit of every metric — open a candidates file and run validate on each one in sequence; there is no batch mode.

See also