# Validate

Deep audit of a single metric — does its number deserve to be trusted?
The `/metric-validate` skill is the deep-audit counterpart to `metrics-discovery`. Where discovery asks "should this metric exist?", validate asks "does this metric's current number deserve to be trusted, and is the metric measuring something the team can act on?"
Use it whenever:
- a number on the cockpit feels suspiciously good or bad,
- you're about to verify a metric and need an independent cross-check,
- an alert fired and you need to confirm whether the red is real before paging an owner,
- the underlying source contract changed (new dlt schema, renamed column) and the metric needs a fresh smell test.
The skill operates on one `metric_id` at a time and writes a dated audit file under `docs/metric-audits/<date>-<metric_id>.md`.
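As a minimal sketch of that naming convention (the helper name is hypothetical; the skill itself owns the real logic):

```python
from datetime import date
from pathlib import Path

def audit_path(metric_id: str, root: str = "docs/metric-audits") -> Path:
    """Dated audit file for one metric: <root>/<YYYY-MM-DD>-<metric_id>.md."""
    return Path(root) / f"{date.today().isoformat()}-{metric_id}.md"
```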
## What the skill checks
Five layers, in order:
### 1. Methodology
Read the metric's `methodology`, `how_to_read`, `sources`, `period`, and `value_sql` out of the registry. Does the SQL actually express what the methodology comment claims? Does the methodology match the JTBD that landed it? Discrepancies between "what the comment says" and "what the SQL does" are the most common source of distrusted numbers.
### 2. Live source freshness
Check `<schema>._dlt_loads.inserted_at` for the source schema. A fresh `computed_at` on the cockpit can paper over a stale `source_as_of` — the staleness signal exists for a reason, and validate cross-checks it directly against the dlt loads table rather than trusting the cached `source_as_of` on the history row.
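A sketch of that cross-check, with an in-memory SQLite table standing in for the warehouse's `<schema>._dlt_loads` (the schema prefix is dropped and the drift tolerance is an assumption):

```python
import sqlite3
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def check_freshness(conn: sqlite3.Connection, cached_source_as_of: str,
                    max_drift_hours: float = 1.0) -> tuple[str, bool]:
    """Cross-check the cached source_as_of against the dlt loads table.

    Returns (latest_load, trustworthy): trustworthy is False when the
    cached timestamp drifts from the actual newest load by more than
    max_drift_hours in either direction.
    """
    (latest,) = conn.execute("SELECT MAX(inserted_at) FROM _dlt_loads").fetchone()
    drift_h = abs(
        (datetime.strptime(latest, FMT)
         - datetime.strptime(cached_source_as_of, FMT)).total_seconds()
    ) / 3600
    return latest, drift_h <= max_drift_hours
```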
### 3. Independent cross-check rebuild
Re-derive the same number in a different way — typically:
- a hand count against the team's source-of-truth tables (webbank `core.client_profile`, `client_profile_state_log`, etc.),
- a query that reaches the same answer through a different join path,
- a Jasper-rendered report covering the same concept.
If the rebuild disagrees with `metric_history`, the audit file documents the gap and points at the SQL line responsible. If the rebuild agrees, the audit file documents the match and the methodology earns the right to claim "verified by cross-check".
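The agree/disagree decision can be sketched as a tolerance comparison (the function name and the 0.5% tolerance are illustrative, not the skill's actual defaults):

```python
def cross_check(history_value: float, rebuilt_value: float,
                rel_tol: float = 0.005) -> str:
    """Compare metric_history's number with an independently rebuilt one.

    Returns 'match' when the two agree within rel_tol, otherwise a 'gap'
    line with absolute and relative difference, ready to paste into the
    audit file.
    """
    if history_value == rebuilt_value:
        return "match"
    diff = abs(history_value - rebuilt_value)
    rel = diff / max(abs(history_value), abs(rebuilt_value))
    if rel <= rel_tol:
        return "match"
    return f"gap: {diff:.4g} absolute ({rel:.2%} relative)"
```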
### 4. Business semantics of red
The most important layer, and the one that distinguishes validate from "the SQL runs":
- What does red on this metric actually mean in business terms? ("4 minutes of payment processing time" — meaningful? "22% approval rate" — wrong-low or right-low?)
- Is the direction right? An amber on a "passing rate" metric should mean "fewer pass than we want", not "more". Sign mistakes in `norm_value`/`alert_value` look like correct numbers until somebody asks why nothing ever fires.
- Does the period match the action horizon? A `30d` window on a metric whose owner can only act on a `24h` signal is a misdesign.
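A sketch of how a sign mistake surfaces, assuming `norm_value` and `alert_value` are the green and red thresholds and direction is a per-metric flag (both are assumptions; the registry's actual semantics may differ):

```python
def classify(value: float, norm_value: float, alert_value: float,
             higher_is_better: bool = True) -> str:
    """Map a raw value to green/amber/red given the registry thresholds."""
    if higher_is_better:
        if value >= norm_value:
            return "green"
        return "amber" if value > alert_value else "red"
    if value <= norm_value:
        return "green"
    return "amber" if value < alert_value else "red"

def thresholds_sane(norm_value: float, alert_value: float,
                    higher_is_better: bool = True) -> bool:
    """A sign mistake usually shows up as thresholds ordered the wrong
    way round: red becomes unreachable and nothing ever fires."""
    if higher_is_better:
        return norm_value > alert_value
    return norm_value < alert_value
```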
### 5. Action surface
If the metric went red right now, what would the owner do? If the answer is "open a ticket", the metric is closer to a vanity counter than to a cockpit tile. The audit file calls this out — not as a reason to delete the metric, but as a reason to revisit the JTBD or rework the routing.
## Outputs
The skill writes one file:
`docs/metric-audits/<YYYY-MM-DD>-<metric_id>.md`

Format:
- Verdict. Pass / pass-with-caveat / fail.
- What the metric claims to measure.
- What it actually measures (after reading the SQL).
- Live freshness. `source_as_of`, `computed_at`, dlt-loads cross-check.
- Cross-check rebuild. SQL or hand count, plus the number.
- Red zone semantics. What red means in EUR / hours / clients.
- Action. What the owner does on red.
- Recommendation. Verify / fix SQL / fix thresholds / retire.
The file is committed to the repo as a permanent record. Even if the metric is later retired, its audit trail stays.
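A hypothetical skeleton of such an audit file, with placeholder values:

```markdown
# Metric audit: example_metric (2025-01-15)

- Verdict: pass-with-caveat
- Claims to measure: ...
- Actually measures: ...
- Live freshness: source_as_of ..., computed_at ..., dlt-loads cross-check ...
- Cross-check rebuild: ...
- Red zone semantics: ...
- Action on red: ...
- Recommendation: verify
```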
## When validate is the right hammer
Validate is for one metric at a time, deeply. It is not the right tool for:
- Daily triage of the cockpit — that's `heartbeat-today`.
- Deciding whether to add a new metric — that's `metrics-discovery`.
- Routine threshold tuning — that's the inline editor on the cockpit.
- Bulk audit of every metric — open a candidates file and run validate on each one in sequence; there is no batch mode.
## See also
- Metric lifecycle › stage 8 — validate is the mechanical version of the owner review step.
- Sources › webbank first — the rule the cross-check rebuild defers to.