dotfiles/.claude/skills/audit/SKILL.md

---
name: audit
description: Run a deep, multi-lens review of existing code state (not a diff). Launches six specialized review agents in parallel - reuse, quality, efficiency, errors, api, bugs - then validates each finding before presenting. Optional scope (`/audit path/ path2/`) and optional lens subset (`/audit --lenses reuse,bugs`). Opt-in lenses for docs, tests, security, a11y, deps. Use when the user asks for a full review, deep review, codebase audit, cleanup pass, retrospective review, tech-debt sweep, or otherwise wants to surface issues across landed code - even if they say "clean up the project" or "look over the repo" without using the word "audit". Do NOT use for reviewing in-flight work (use /simplify), PR review (use /review or /code-review), or security-only review (use /security-review).
---

# /audit: Retrospective multi-lens codebase review

`/simplify` reviews a diff; `/audit` reviews current file state. Use it when issues may have accumulated before a review gate existed, when rolling onto an unfamiliar codebase, or when the user wants a deliberate "what's lurking?" sweep.

## Invocation

```
/audit                              # whole primary source tree
/audit crates/uitk/src/text/        # one directory
/audit src/auth.rs src/api.rs       # specific files
/audit --lenses reuse,bugs          # only those lenses
/audit src/foo/ --lenses docs,tests # both scope and lenses
```

If the user invokes `/audit` with no arguments, infer the primary source tree from the project layout (Rust: `crates/*/src/`; TS/JS: `src/` or `packages/*/src/`; Python: the package directory). Ask briefly only if ambiguous.
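
The argument shapes above can be sketched as a small parser. This is a minimal illustration in Python, not part of the skill itself: `parse_audit_args` and the two constant names are hypothetical, and scope inference for the no-argument case is left to the caller.

```python
from __future__ import annotations

import shlex

DEFAULT_LENSES = ["reuse", "quality", "efficiency", "errors", "api", "bugs"]
OPT_IN_LENSES = ["docs", "tests", "security", "a11y", "deps"]


def parse_audit_args(arg_string: str) -> tuple[list[str], list[str]]:
    """Split the text after `/audit` into (scope paths, lens names)."""
    tokens = shlex.split(arg_string)
    scope: list[str] = []
    lenses: list[str] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--lenses":
            if i + 1 >= len(tokens):
                raise ValueError("--lenses needs a comma-separated list")
            lenses = [n.strip() for n in tokens[i + 1].split(",") if n.strip()]
            i += 2
        else:
            scope.append(token)
            i += 1
    unknown = [n for n in lenses if n not in DEFAULT_LENSES + OPT_IN_LENSES]
    if unknown:
        raise ValueError(f"unknown lenses: {', '.join(unknown)}")
    # No --lenses flag means the six defaults. An empty scope means
    # "infer the primary source tree", which the caller handles.
    return scope, lenses or list(DEFAULT_LENSES)
```

An empty `scope` in the result signals that the primary source tree still has to be inferred from the project layout.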

## Phase 1: Context gather

Before spawning review agents:

1. **Identify scope** from args or inference (see above).
2. **Read `CLAUDE.md`** (project root, plus any in touched directories) and any memory index. Capture project-specific conventions to feed each agent as "do NOT flag these" directives (example: "no em-dashes", "we intentionally keep `cargo test --all-targets` off", "focus state uses two bools deliberately, scheduled for refactor in TODO.md").
3. **Read `TODO.md` / `BACKLOG.md` / equivalents** for items explicitly deferred. Agents must not re-raise known debt.
4. **Identify the gate script** (e.g. `scripts/prepare.sh`, `pnpm check`, `make test`) so fixes can be verified at the end.

The first three items (scope, conventions, deferred work) get fed into every agent prompt so they respect the project's existing shape. The gate script is held back for verification in Phase 6.

## Phase 2: Launch lens agents in parallel

Send a single message with multiple `Agent` tool uses, each with `subagent_type: general-purpose`. The default set is six lenses; if `--lenses <list>` was given, run exactly the lenses named in the list, whether default or opt-in.

### Model selection per lens

The `Agent` tool accepts a `model: "sonnet" | "opus" | "haiku"` parameter. Pick deliberately - some lenses are pattern-matching (cheap), others are reasoning-heavy (expensive but worth it).

| lens | model | why |
|------|-------|-----|
| reuse | sonnet | pattern recognition across files, fits sonnet's strengths |
| quality | sonnet | structural critique, naming, dead code; sonnet is enough |
| efficiency | **opus** | needs reasoning about hot paths, allocations, asymptotic patterns |
| errors | **opus** | control-flow analysis, silent-failure detection wants careful reading |
| api | sonnet | visibility analysis, type design - mostly mechanical |
| bugs | **opus** | correctness reasoning is the place not to skimp |
| docs (opt-in) | haiku | "does the comment still match the code?" - cheap |
| tests (opt-in) | sonnet | gap analysis with semantic context |
| security (opt-in) | **opus** | high-stakes correctness, needs careful reading |
| a11y (opt-in) | sonnet | pattern matching with semantic context |
| deps (opt-in) | haiku | mostly file scanning |

The validation agent in Phase 3 also runs on **opus** - false negatives drop real findings, so this is the wrong place to economize.

These are defaults; if a project's lens is unusually subtle (e.g. obscure embedded language, novel runtime), bump up.
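
The table can be held as a plain lookup. A minimal sketch in Python, assuming that "bump up" means one step along haiku → sonnet → opus; that ladder and the names `LENS_MODELS` and `model_for` are illustrative assumptions, not part of the `Agent` tool.

```python
# Defaults copied from the table above.
LENS_MODELS = {
    "reuse": "sonnet",
    "quality": "sonnet",
    "efficiency": "opus",
    "errors": "opus",
    "api": "sonnet",
    "bugs": "opus",
    "docs": "haiku",
    "tests": "sonnet",
    "security": "opus",
    "a11y": "sonnet",
    "deps": "haiku",
}
VALIDATION_MODEL = "opus"

# Assumed ordering for "bump up": cheapest to most capable.
_LADDER = ["haiku", "sonnet", "opus"]


def model_for(lens: str, bump_up: bool = False) -> str:
    """Return the default model for a lens, optionally one step stronger."""
    model = LENS_MODELS[lens]
    if bump_up:
        next_index = min(_LADDER.index(model) + 1, len(_LADDER) - 1)
        model = _LADDER[next_index]
    return model
```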

### Default lenses

Each lens prompt must include:
- One-paragraph project summary (language, domain, what the code does).
- The scope: exact file/directory list the agent must read.
- The lens's concrete focus (see below).
- Project conventions to skip (from Phase 1).
- Deferred TODO items to skip (from Phase 1).
- Explicit "skip" list: the other lenses' topics (so findings don't overlap).
- Output format: bulleted findings, each with **file:line** (or range), **the issue** (concrete, one line), **suggested fix** (one line).
- Word cap: 400-700 words per agent. Findings scale with scope, so give bigger caps when auditing whole repos, smaller when auditing a single file.
- "HIGH SIGNAL only. If you are not certain a finding is real, don't flag it. Don't invent findings to fill space. If the area is clean, say so."
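
One way to assemble those pieces into a prompt is sketched below. The function name, parameter names, and section wording are hypothetical; only the list of required ingredients comes from the checklist above.

```python
HIGH_SIGNAL = (
    "HIGH SIGNAL only. If you are not certain a finding is real, don't flag it. "
    "Don't invent findings to fill space. If the area is clean, say so."
)


def build_lens_prompt(*, lens, focus, summary, scope, conventions,
                      deferred, all_lenses, word_cap=500):
    """Join the required prompt sections for one lens agent."""

    def bullets(items):
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    # Every lens skips the topics owned by the other lenses in this run.
    others = ", ".join(name for name in all_lenses if name != lens) or "(none)"
    sections = [
        f"You are the `{lens}` lens of a codebase audit.",
        f"Project summary:\n{summary}",
        f"Scope (read exactly these):\n{bullets(scope)}",
        f"Focus:\n{focus}",
        f"Project conventions, do NOT flag these:\n{bullets(conventions)}",
        f"Deferred items, do NOT re-raise:\n{bullets(deferred)}",
        f"Skip topics owned by other lenses: {others}",
        "Output: bulleted findings, each with file:line (or range), "
        "the issue (one line), suggested fix (one line).",
        f"Word cap: {word_cap} words.",
        HIGH_SIGNAL,
    ]
    return "\n\n".join(sections)
```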

#### reuse

Duplicated logic, reinvented std / framework primitives, inline patterns that match an existing helper in the same codebase, inconsistent import paths (e.g. some files use top-level re-exports, others use deep paths). Flag the concrete duplication with file:line of each duplicate site. Don't propose new abstractions where no duplication exists yet.

#### quality

Structural and maintainability issues: redundant state (fields that duplicate each other, derived values cached unnecessarily, bools that encode the same thing as an adjacent enum), leaky abstractions (pub(crate) fields poked directly when a method would be cleaner), stringly-typed code, parameter sprawl, unnecessary comments (especially WHAT-not-WHY narration, section dividers, PR/commit/task references in code), nested conditionals 3+ levels deep that could flatten, dead code, brittle test fixtures. Skip anything a linter/formatter would catch - the gate handles those.

#### efficiency

Hot-path bloat (anything that runs per-frame / per-event / per-request / per-render): redundant allocations, repeated hashmap lookups, multiple tree walks where one would do, reconstructing immutable objects every call. Recurring no-op updates (state writes that trigger downstream invalidation even when the value didn't change). Unbounded growth in caches or maps. Overly broad operations (scanning entire collections to find one thing). Note "hot path" context per project - for GUI/game code it's paint/layout/event loops; for servers it's request handlers; for data pipelines it's per-record transforms.
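
To make "recurring no-op update" concrete, here is an illustrative sketch in Python. The `Label` widget and its `invalidations` counter are hypothetical stand-ins for whatever state and invalidation mechanism the audited project uses.

```python
class Label:
    """Hypothetical UI widget, used only for illustration."""

    def __init__(self, text: str) -> None:
        self._text = text
        self.invalidations = 0  # how many times a repaint was requested

    def set_text_naive(self, text: str) -> None:
        # Flagged pattern: every write invalidates, even if nothing changed.
        self._text = text
        self.invalidations += 1

    def set_text(self, text: str) -> None:
        # Suggested fix: compare first so unchanged values cost nothing downstream.
        if text == self._text:
            return
        self._text = text
        self.invalidations += 1
```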

#### errors

Error-handling hygiene. Silent failures (catch/Result discarded, unwrap/expect on fallible ops that could surface meaningful errors), inconsistent error propagation patterns within one codebase, `expect("...")` messages that don't explain why, panic locations that could be Result returns, missing error context at boundaries. Inspired by Anthropic's `silent-failure-hunter` agent.
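
A silent failure and its fix might look like the following. This is an illustrative Python sketch; `ConfigError` and both loader functions are hypothetical names, not taken from any audited project.

```python
import json


def load_config_silent(raw: str) -> dict:
    # Flagged pattern: the parse error is swallowed, so the caller cannot
    # tell an empty config from a corrupt one.
    try:
        return json.loads(raw)
    except Exception:
        return {}


class ConfigError(Exception):
    """Raised when a config source cannot be parsed."""


def load_config(raw: str, source: str) -> dict:
    # Suggested fix: catch the specific error and add context at the boundary.
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise ConfigError(f"could not parse config from {source}: {err}") from err
```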

#### api

Public API surface appropriateness. `pub` on items that could be `pub(crate)` (check whether external callers exist), missing `#[non_exhaustive]` on enums that will grow, doc-commented-but-private items (doc comment misplaced), trait methods with confusing defaults, constructors/builders inconsistent with the rest of the crate. Type design: invariants expressed via state instead of type (e.g. a pair of Option + bool that could be an enum). Inspired by Anthropic's `type-design-analyzer`.

#### bugs

Correctness issues: logic errors, off-by-one, missing bounds checks, wrong condition in if, incorrect loop termination, type confusion that compiles but is wrong, borrow patterns that compile but violate invariants (lifetimes too permissive / not permissive enough). Only flag with high confidence - "this might be wrong depending on inputs" is NOT a finding. Include language-specific bug profiles: Rust bugs often involve lifetimes/Send/Sync; JS/TS bugs often involve null/undefined, async Promise lifetimes, reference equality mistakes.

### Opt-in lenses

Enabled only via explicit `--lenses` containing their name.

#### docs

Public items without doc comments. Stale / rotted comments (code has moved on, comment hasn't). Outdated examples in doc comments. Missing module-level docs on non-trivial modules. Inspired by Anthropic's `comment-analyzer`.

#### tests

Coverage gaps (public API without tests), brittle fixtures (parallel arrays that should be tuples, over-complex setup), test-only code leaking into production, missing edge case assertions (empty input, single element, boundary values), assertions that don't match their descriptions. Inspired by Anthropic's `pr-test-analyzer`.

#### security

Common vulnerability patterns for the project type: injection (SQL / shell / template), hardcoded secrets, unsafe deserialization, missing input validation at trust boundaries, auth/session flaws, path traversal. Skip entirely for projects with no security surface (pure algorithm libraries, graphics code, offline tools).

#### a11y

UI accessibility: missing labels on inputs / buttons, colour-only signalling, tabindex / focus management, screen reader compatibility, keyboard-only navigation support. Only meaningful for UI-layer code.

#### deps

Dependency hygiene: duplicate deps at different versions, unused deps, feature flags that enable more than needed, dev-deps used in production code paths.

## Phase 3: Validation pass

Once lens agents return, do NOT present findings to the user yet. Launch a single validation agent with all raw findings as input:

> "Each finding below was flagged by a lens agent. For each one, confirm independently whether it's real by reading the referenced file(s) and the surrounding context. Classify each as: **confirmed** (high-confidence real issue), **misfire** (wrong reading of the code, semantics differ from what the agent thought), or **context-dependent** (real only under unstated assumptions - treat as misfire). Return the confirmed list, with the reasoning for any misfires you're dropping so the aggregator can double-check."

This mirrors the confidence-scoring approach in Anthropic's `/code-review`. Misfires are noisy; validation keeps signal high.

Skip validation only if the raw finding count is ≤3 and each one is obviously right (saves tokens when the audit turns up almost nothing).
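
The classification and the skip rule reduce to a small filter. A minimal sketch, assuming the validation agent's output has already been parsed into records; the `Verdict` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    location: str        # "file:line" as reported by the lens agent
    issue: str
    classification: str  # "confirmed", "misfire", or "context-dependent"
    reasoning: str = ""


def split_verdicts(verdicts):
    """Return (confirmed, dropped); context-dependent counts as a misfire."""
    confirmed = [v for v in verdicts if v.classification == "confirmed"]
    dropped = [v for v in verdicts if v.classification != "confirmed"]
    return confirmed, dropped


def may_skip_validation(raw_count: int, all_obviously_right: bool) -> bool:
    """Validation is skippable only for three or fewer obviously right findings."""
    return raw_count <= 3 and all_obviously_right
```

Keeping the `dropped` list, with its reasoning, is what lets the aggregator double-check the misfires before they disappear from the report.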

## Phase 4: Triage

Classify each confirmed finding into one of four tiers:

- **Trivial fix** - small local change, clear improvement, no judgment call (e.g. "use existing helper at file.rs:42 instead of inline arithmetic").
- **Substantive fix** - real value, more than a few lines, clear scope (e.g. "merge two near-duplicate functions into one walker").
- **Needs discussion** - chunky refactor, public API change, enum redesign, hot-path caching with lifetime gymnastics. Outcome shouldn't be assumed.
- **Backlog item** - real but larger than cleanup. Should land in `TODO.md` (or equivalent) so it's not lost.

This phase is **classification only**. Do NOT apply any fixes here, do NOT edit `TODO.md` here. Recording happens in the next phase, after the user has seen the proposed plan.
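
The four tiers can be modelled as an enum with a pure grouping step. A sketch under assumptions: the `Finding` type and function names are hypothetical, and routing public-API findings to "Needs discussion" follows the rule in the "Do not" section rather than anything stated in this phase.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TRIVIAL = "Trivial fix"
    SUBSTANTIVE = "Substantive fix"
    NEEDS_DISCUSSION = "Needs discussion"
    BACKLOG = "Backlog item"


@dataclass(frozen=True)
class Finding:
    location: str
    issue: str
    tier: Tier
    touches_public_api: bool = False


def effective_tier(finding: Finding) -> Tier:
    # Public API changes are never auto-applied, so they are surfaced
    # for discussion even if the fix itself looks small.
    if finding.touches_public_api and finding.tier in (Tier.TRIVIAL, Tier.SUBSTANTIVE):
        return Tier.NEEDS_DISCUSSION
    return finding.tier


def group_by_tier(findings):
    """Classification only: builds the report groups, edits nothing."""
    groups = {tier: [] for tier in Tier}
    for finding in findings:
        groups[effective_tier(finding)].append(finding)
    return groups
```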

## Phase 5: Report and pause

Present a single structured report to the user. Layout:

```
## Audit findings

Ran <N> lens(es) over <scope>. <K> raw findings, <V> confirmed by validation.

### Trivial fixes (will apply on go-ahead)
- file.rs:42 - <issue>. Fix: <one-line change>.
- ...

### Substantive fixes (will apply on go-ahead, separate commits)
- file.rs:120 - <issue>. Fix: <approach>.
- ...

### Needs discussion (no action yet, want your read)
- file.rs:220 - <issue>. Two options: (a) <option>, (b) <option>. Tradeoff: <one line>.
- ...

### Suggested backlog additions (will append to TODO.md on go-ahead)
- <one-line description, framed as a TODO entry>
- ...

### Patterns worth noting
- <cross-cutting observation, if any>

Ready to apply? Tell me which tiers to proceed with, or call out specific items to skip.
```

**Stop here and wait for the user's response.** Do not edit any files. Do not commit. Do not append to `TODO.md`. The user may:

- Approve everything ("go ahead with all").
- Approve specific tiers ("apply trivials, hold substantives").
- Skip individual items ("skip the X one, apply the rest").
- Reject findings ("the API one isn't worth it, drop it").
- Ask for more detail on any item before deciding.

Match their direction precisely - don't slip auto-applied items past their stated scope.
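
Matching the user's direction amounts to filtering by approved tier and by explicit skips. A minimal sketch, assuming the user's reply has already been interpreted into those two sets; the function name and the `(item_id, tier)` pair shape are hypothetical.

```python
def select_approved(findings, approved_tiers, skipped=()):
    """Return the ids of findings the user actually approved.

    findings: iterable of (item_id, tier) pairs, in report order.
    approved_tiers: tiers the user greenlit, e.g. {"trivial"}.
    skipped: item ids the user called out to skip or reject.
    """
    skipped = set(skipped)
    return [
        item_id
        for item_id, tier in findings
        if tier in approved_tiers and item_id not in skipped
    ]
```

For "apply trivials, hold substantives, skip the X one", `approved_tiers` would be `{"trivial"}` and `skipped` would contain X's id; anything outside that selection stays untouched.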

## Phase 6: Apply on confirmation

Once the user has greenlit a tier or set of items:

1. **Backlog additions** first (if approved). Append to `TODO.md`, single commit, before any code change.
2. **Trivial fixes** grouped by theme (e.g. "use existing helpers", "drop dead code"), one commit per theme.
3. **Substantive fixes** one commit per logical change. Commit message explains the why.
4. **Needs-discussion items** only proceed if the user gave a specific direction during Phase 5; otherwise leave them.

After each commit (or once for a small scope), run the project's gate (`scripts/prepare.sh` / `pnpm check` / etc.) to confirm nothing broke. If it fails, fix the underlying issue rather than reverting or bypassing.
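
The commit ordering and the gate check can be sketched as follows. The function names and the `(label, items)` plan shape are hypothetical; the gate command is whatever Phase 1 identified.

```python
import subprocess


def plan_commits(backlog, trivial_by_theme, substantive):
    """Order approved work as a list of (commit label, items) pairs."""
    plan = []
    if backlog:
        # Backlog additions land first, as one commit, before any code change.
        plan.append(("backlog", list(backlog)))
    for theme, items in trivial_by_theme.items():
        if items:
            plan.append((f"trivial: {theme}", list(items)))  # one commit per theme
    for item in substantive:
        plan.append(("substantive", [item]))  # one commit per logical change
    return plan


def gate_passes(command):
    """Run the project's gate command; True when it exits with status 0."""
    return subprocess.run(command, check=False).returncode == 0
```

A failing gate is a signal to fix the underlying issue before the next commit in the plan, not to revert or bypass.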

End with a brief final summary: what was applied, what was backlogged, what's still open. If a finding got dropped during discussion, note it briefly so the user can confirm the bookkeeping.

## Do not

- **Edit any file before the user has seen the report and given a go-ahead.** This includes `TODO.md` and any code file. The Phase 5 report is the last automatic step; validation and triage only read and classify, and everything in Phase 6 waits on the user.
- Auto-apply changes that affect public API surface even after a generic "go ahead" - if a finding touches `pub` items, surface it explicitly in Phase 5 under "Needs discussion".
- Stack findings from multiple lenses into one commit without clear grouping.
- Invent findings to fill space if a lens comes up empty. "Nothing to flag" is a valid outcome and should be reported as such.
- Re-raise items already in `TODO.md` / `BACKLOG.md`.
- Run `/audit` against a codebase that was just audited - diminishing returns.

## Parallelism note

All lens agents run in parallel (single message, multiple `Agent` tool uses). Six agents is the upper bound where parallelism still pays; above that, coordination overhead catches up. Keep opt-in lenses opt-in for this reason - running all 11 in parallel would be wasteful for most projects.

## When NOT to use this skill

- **Reviewing in-flight work** → `/simplify` against the diff.
- **PR review** → `/review` (built-in) or `/code-review:code-review` if that plugin is installed.
- **Security-only audit of a diff** → `/security-review`.
- **Back-to-back on the same code** → re-runs produce sharply diminishing returns.