fix(claude): walk audit findings tier by tier

2026-04-27 03:37:03 +02:00
parent 52256352ce
commit ad6da302c7
+87 -54
@@ -19,6 +19,10 @@ description: Run a deep, multi-lens review of existing code state (not a diff).
If the user invokes `/audit` with no arguments, infer the primary source tree from the project layout (Rust: `crates/*/src/`; TS/JS: `src/` or `packages/*/src/`; Python: the package directory). Ask briefly only if ambiguous.
## Communicating with the user
Phases are internal scaffolding for organizing this skill, not concepts the user needs to track. Do not announce them in user-facing text. No "Phase 3: validating findings before reporting", no "moving on to Phase 5", no "Phase 4 triage complete". Brief, plain progress notes are fine when warranted ("validating findings before reporting", "running the gate"), but they should describe the action, not name a phase.
## Phase 1: Context gather
Before spawning review agents:
@@ -38,19 +42,19 @@ Send a single message with multiple `Agent` tool uses, each `subagent_type: gene
The `Agent` tool accepts a `model: "sonnet" | "opus" | "haiku"` parameter. Pick deliberately - some lenses are pattern-matching (cheap), others are reasoning-heavy (expensive but worth it).
| lens | model | why |
| ----------------- | -------- | --------------------------------------------------------------------- |
| reuse | sonnet | pattern recognition across files, fits sonnet's strengths |
| quality | sonnet | structural critique, naming, dead code; sonnet is enough |
| efficiency | **opus** | needs reasoning about hot paths, allocations, asymptotic patterns |
| errors | **opus** | control-flow analysis, silent-failure detection wants careful reading |
| api | sonnet | visibility analysis, type design - mostly mechanical |
| bugs | **opus** | correctness reasoning is the place not to skimp |
| docs (opt-in) | haiku | "does the comment still match the code?" - cheap |
| tests (opt-in) | sonnet | gap analysis with semantic context |
| security (opt-in) | **opus** | high-stakes correctness, needs careful reading |
| a11y (opt-in) | sonnet | pattern matching with semantic context |
| deps (opt-in) | haiku | mostly file scanning |
The validation agent in Phase 3 also runs on **opus** - false negatives drop real findings, so this is the wrong place to economize.
@@ -59,6 +63,7 @@ These are defaults; if a project's lens is unusually subtle (e.g. obscure embedd
### Default lenses
Each lens prompt must include:
- One-paragraph project summary (language, domain, what the code does).
- The scope: exact file/directory list the agent must read.
- The lens's concrete focus (see below).
@@ -138,64 +143,92 @@ Classify each confirmed finding into one of four tiers:
This phase is **classification only**. Do NOT apply any fixes here, do NOT edit `TODO.md` here. Recording happens in the next phase, after the user has seen the proposed plan.
## Phase 5: Report and pause ## Phase 5: Report and apply tier by tier
Present a single structured report to the user. Layout: Don't dump every tier at once. The user shouldn't have to scroll back through a wall of findings to track decisions. Walk through one tier at a time: present, get approval, apply, commit, gate, then move to the next.
### Set up internal tracking
Before presenting anything, use `TaskCreate` to record one task per non-empty tier in the order below. The full finding set lives in those tasks, so you can hold detail internally and surface only the active tier to the user. Mark each tier's task complete as you finish it.
### Tier order
1. **Suggested backlog additions** - lock these in first. A single `TODO.md` append is cheap and ensures nothing is lost if a later code change goes sideways.
2. **Trivial fixes** - grouped by theme (e.g. "use existing helpers", "drop dead code"), one commit per theme.
3. **Substantive fixes** - one commit per logical change. Commit message explains the why.
4. **Needs discussion** - present each as: issue, two options, tradeoff. Apply only if the user gives specific direction.
Skip any tier that has zero items.
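The ordering and the skip rule above can be sketched in a few lines of Python. This is a minimal illustration only: `TIER_ORDER`, `tiers_to_present`, and the dict-of-lists shape for findings are assumptions made for the example, not something the skill defines.

```python
# Hypothetical sketch: tier names follow the list above; the data shape is assumed.
TIER_ORDER = [
    "Suggested backlog additions",
    "Trivial fixes",
    "Substantive fixes",
    "Needs discussion",
]


def tiers_to_present(findings_by_tier):
    """Return the non-empty tiers in presentation order, skipping empty ones."""
    return [tier for tier in TIER_ORDER if findings_by_tier.get(tier)]


# Example with placeholder findings: two tiers have items, one is empty, one is absent.
example = {
    "Trivial fixes": ["placeholder trivial finding"],
    "Needs discussion": [],
    "Suggested backlog additions": ["placeholder backlog entry"],
}
print(tiers_to_present(example))
# -> ['Suggested backlog additions', 'Trivial fixes']
```

Keeping the order in one list means the task-creation step and the presentation step can both iterate over the same sequence.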
### Opening the report
Only on the first non-empty tier, lead with a single summary line:
```
## Audit findings
Ran <N> lens(es) over <scope>. <K> raw findings, <V> confirmed by validation. Ran <N> lens(es) over <scope>. <K> raw findings, <V> confirmed. Working through them one tier at a time.
### Trivial fixes (will apply on go-ahead)
- file.rs:42 - <issue>. Fix: <one-line change>.
- ...
### Substantive fixes (will apply on go-ahead, separate commits)
- file.rs:120 - <issue>. Fix: <approach>.
- ...
### Needs discussion (no action yet, want your read)
- file.rs:220 - <issue>. Two options: (a) <option>, (b) <option>. Tradeoff: <one line>.
- ...
### Suggested backlog additions (will append to TODO.md on go-ahead)
- <one-line description, framed as a TODO entry>
- ...
### Patterns worth noting
- <cross-cutting observation, if any>
Ready to apply? Tell me which tiers to proceed with, or call out specific items to skip.
```
**Stop here and wait for the user's response.** Do not edit any files. Do not commit. Do not append to `TODO.md`. The user may: If there's a useful cross-cutting observation, mention it in one line here. Don't pad.
- Approve everything ("go ahead with all"). ### Format per tier
- Approve specific tiers ("apply trivials, hold substantives").
- Skip individual items ("skip the X one, apply the rest").
- Reject findings ("the API one isn't worth it, drop it").
- Ask for more detail on any item before deciding.
Match their direction precisely - don't slip auto-applied items past their stated scope. Items are **numbered 1..N within the tier**, resetting each tier so the user can say "skip 2 and 5" without ambiguity.
## Phase 6: Apply on confirmation
```
### <Tier name> (<count>)
Once the user has greenlit a tier or set of items: 1. file.rs:42 - <issue>. Fix: <one-line change>.
2. file.rs:88 - <issue>. Fix: <one-line change>.
3. ...
1. **Backlog additions** first (if approved). Append to `TODO.md`, single commit, before any code change. Ready to apply? "Go ahead" for all, or tell me which numbers to skip.
2. **Trivial wins** grouped by theme (e.g. "use existing helpers", "drop dead code"), one commit per theme.
```
3. **Substantive wins** one commit per logical change. Commit message explains the why.
4. **Needs-discussion items** only proceed if the user gave a specific direction during Phase 5; otherwise leave them.
After each commit (or once for a small scope), run the project's gate (`scripts/prepare.sh` / `pnpm check` / etc.) to confirm nothing broke. If it fails, fix the underlying issue rather than reverting or bypassing. For **Needs discussion**, expand each item:
End with a brief final summary: what was applied, what was backlogged, what's still open. If a finding got dropped during discussion, note it briefly so the user can confirm the bookkeeping.
```
### Needs discussion (<count>)
1. file.rs:220 - <issue>.
- Option a: <option>
- Option b: <option>
- Tradeoff: <one line>
2. ...
```
Stop after presenting a tier and wait for the user. They may:
- Approve all in the tier ("go ahead").
- Skip specific numbers ("skip 2 and 5, apply the rest").
- Reject the whole tier.
- Ask for more detail on a specific number before deciding.
Match their direction precisely. Don't slip auto-applied items past their stated scope.
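Because item numbers are 1-based and local to the tier being presented, resolving a reply such as "skip 2 and 5" comes down to filtering by position. A minimal sketch, assuming the skip numbers have already been extracted from the user's message; `approved_items` is a hypothetical helper, not part of the skill.

```python
def approved_items(items, skip=()):
    """Return the items left to apply after removing the 1-based numbers in `skip`."""
    skip_set = set(skip)
    out_of_range = sorted(n for n in skip_set if not 1 <= n <= len(items))
    if out_of_range:
        # Assumed policy: ask the user again rather than guess which item was meant.
        raise ValueError(f"no such item number(s) in this tier: {out_of_range}")
    return [
        item
        for number, item in enumerate(items, start=1)
        if number not in skip_set
    ]


# "skip 2 and 5, apply the rest" on a five-item tier
print(approved_items(["a", "b", "c", "d", "e"], skip=[2, 5]))
# -> ['a', 'c', 'd']
```

Rejecting unknown numbers, rather than ignoring them, is one way to keep the applied set inside the scope the user actually stated.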
### After approval, before the next tier
1. Apply the approved items in the current tier.
2. Commit per the tier's rule (one commit for the backlog append, one per theme for trivials, one per logical change for substantives).
3. Run the project's gate (`scripts/prepare.sh` / `pnpm check` / etc.). If it fails, fix the underlying issue rather than reverting or bypassing.
4. Mark the tier's task complete.
5. Present the next non-empty tier the same way.
### Public-API guard
A generic "go ahead" on Trivial or Substantive does NOT extend to items that touch public API surface (`pub` items, exported types, breaking signature changes). If such an item is sitting in those tiers, lift it into Needs discussion before presenting, so it gets explicit attention.
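One way to picture the guard is as a re-tiering pass that runs before anything is shown to the user. The sketch below is illustrative only: the `Finding` shape, the `touches_public_api` flag, and `apply_public_api_guard` are assumptions for the example, and how a finding is detected as touching public API is left out.

```python
from dataclasses import dataclass, replace

# Tiers where a generic "go ahead" would otherwise apply items without discussion.
GUARDED_TIERS = {"Trivial fixes", "Substantive fixes"}


@dataclass(frozen=True)
class Finding:
    # Hypothetical record; field names are assumptions for this sketch.
    location: str
    issue: str
    tier: str
    touches_public_api: bool = False


def apply_public_api_guard(findings):
    """Lift public-API findings out of the guarded tiers into Needs discussion."""
    return [
        replace(f, tier="Needs discussion")
        if f.touches_public_api and f.tier in GUARDED_TIERS
        else f
        for f in findings
    ]


findings = [
    Finding("file.rs:42", "rename exported function", "Trivial fixes", True),
    Finding("file.rs:88", "drop dead code", "Trivial fixes"),
]
print([f.tier for f in apply_public_api_guard(findings)])
# -> ['Needs discussion', 'Trivial fixes']
```

Running the pass before presentation means the tier counts the user sees already reflect the lift.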
### Closing
After the final tier, give a brief summary: what was applied, what was backlogged, what's still open. If items got dropped during discussion, note them so the user can confirm the bookkeeping.
## Do not
- **Edit any file before the user has seen the report and given a go-ahead.** This includes `TODO.md` and any code file. The validation pass is the last automatic step; everything after Phase 4 waits on the user. - **Edit any file before the user has approved the active tier.** This includes `TODO.md` and any code file. Validation is the last automatic step. Everything after waits on per-tier go-ahead.
- Auto-apply changes that affect public API surface even after a generic "go ahead" - if a finding touches `pub` items, surface it explicitly in Phase 5 under "Needs discussion". - Dump every tier at once or rely on a single big report. Walk tier by tier so the user only has to track one decision at a time.
- Surface phase numbers/names to the user. Phases are scaffolding for this skill, not vocabulary the user should have to learn.
- Auto-apply changes that affect public API surface even after a generic "go ahead". If a finding touches `pub` items, lift it into the Needs discussion tier before presenting.
- Stack findings from multiple lenses into one commit without clear grouping.
- Invent findings to fill space if a lens comes up empty. "Nothing to flag" is a valid outcome and should be reported as such.
- Re-raise items already in `TODO.md` / `BACKLOG.md`.