Bug Audit Plan — rclone-jav (Python CLI + Brave Extension)
Customized from D:\DEV\Project\Goal\bug-audit-template.md. Tightened for this project: scope is chunked, "bug" is narrowed, reproduction recipe is required, independent verification is enforced via fresh-context agents with bounded contract context, intentional patterns are listed only when verified against current code or current doc.
All output artifacts (per-scope bugs-*.md files, bugs-candidates-*.md scratch, audit-snapshot-<ISO>.md, and the final verification.md) live under D:\DEV\Project\rclone-jav\. Do NOT write audit output under D:\DEV\Extensions\Production\rclone-jav\ (extension folder) or D:\DEV\Project\Goal\ (template home).
What counts as a bug (for THIS audit)
Include:
- Wrong result — code produces output that contradicts documented behavior, comment, or stated intent
- Data loss / corruption — cache.json, config.json, chrome.storage, or remote file content can become incorrect or lost
- Crash / unhandled exception — Python tracebacks, uncaught JS promise rejections that kill an operation
- Silent failure — operation appears to succeed but didn't (e.g. write claimed but file not changed)
- Contract violation — host RPC schema mismatch, manifest declaration mismatch, cache-version mismatch, fixture-driven expectation broken
- Race condition with observable user-visible effect — concurrent operations leading to one of the above
Exclude (out of scope for this audit — separate effort):
- Code style / formatting / linting
- Performance unless it causes timeout or hang
- Dead code / unused imports / unused variables
- Outdated comments (unless misleading enough to cause wrong-result)
- Security review (use /security-review instead)
- Documentation gaps (separate doc-debt pass)
- Refactor opportunities ("could be cleaner")
- Missing features → file in TODO.md, not bugs.md
Phrase findings as "every function reviewed for externally observable bugs." Internal helpers with no flow to RPC / UI / file system / network get reviewed only as part of their caller's flow, not as their own audit unit.
Scope chunks (run each as separate audit pass)
Five chunks. Each gets its own bugs-<chunk>.md file. Do NOT batch into one giant audit — context grows, hallucinations multiply.
| # | Chunk | Files in scope | Output |
|---|---|---|---|
| 1 | Python CLI | rc-jav.py + rcjav/*.py + tests/*.py + fixtures/run.py (all under D:\DEV\Project\rclone-jav\) | bugs-python.md |
| 2 | Native host | host\rcjav-host.py + host\install-host.ps1 + host\rcjav-host.bat + host\register-host.bat (under D:\DEV\Extensions\Production\rclone-jav\) | bugs-host.md |
| 3 | Extension SW + content | background.js + content.js + manifest.json (under D:\DEV\Extensions\Production\rclone-jav\) | bugs-extension-bg.md |
| 4 | Extension Options pages | src\options\* (under D:\DEV\Extensions\Production\rclone-jav\) | bugs-extension-options.md |
| 5 | Extension Popup + Bulk Check | src\popup\* + src\bulk-check\* (under D:\DEV\Extensions\Production\rclone-jav\) | bugs-extension-popup.md |
Tabvault extension (D:\DEV\Extensions\Production\tabvault\) is out of scope for this audit — separate project.
Explicit per-chunk excludes
Do NOT audit (read-only-if-needed-for-context, never report findings against):
- **/__pycache__/ — bytecode
- **/*.bak — historical snapshots (e.g. CLAUDE.md.bak, cache.json.bak)
- cache.json, config.json — runtime data, not code (their schema is auditable in docs/CACHE_CONTRACT.md)
- benchmarks/*.py — performance probes, not product
- mockups/*.html — design memory, not code
- wincatalog/ — user data dir
- README.md, TODO.md, AGENTS.md, CLAUDE.md, docs/*.md — docs (separate doc-debt pass)
- host/logs/* — runtime logs
- host/state/* — runtime state
- host/com.rcjav.host.json, host/allowed-extension-ids.json — generated/runtime config
- Per-project memory under C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\ — READ for rules, do NOT audit
Required reading before audit
Auditor MUST read (and reference findings against) the following intentional-pattern docs:
- D:\DEV\Project\rclone-jav\AGENTS.md — Python CLI session memory, ID normalization rules, defaults
- D:\DEV\Project\rclone-jav\CLAUDE.md (if present)
- D:\DEV\Project\rclone-jav\TODO.md — deferred work that's NOT a bug
- D:\DEV\Extensions\Production\rclone-jav\docs\CACHE_CONTRACT.md — cache schema + ID rules versioning
- D:\DEV\Extensions\Production\rclone-jav\AGENTS.md — extension session memory
- D:\DEV\Extensions\Production\rclone-jav\CLAUDE.md (if present)
- D:\DEV\Extensions\Production\rclone-jav\mockups\console-consolidation-claude.html — design rationale
- C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\*.md — per-project memory (version bump rule, install workflow, no hollow suggestions)
If a finding contradicts an explicit decision in these docs, it's NOT a bug — it's expected behavior. Mark as discarded — intentional per <doc:section> in the False Positives section.
Known intentional patterns (verified against current code or current doc)
Only patterns confirmed against the current snapshot belong here. If a pattern is suspected but unverified, leave it OFF this list — the auditor will surface it, the verifier will check the cited doc, and discard-as-intentional happens there. Stale assumptions on this list are dangerous — they actively shield real bugs in code that's been touched.
Python CLI (verified)
- extract_id() chops trailing single letters from filenames intentionally (e.g. IBW-902z → IBW-902) — see D:\DEV\Project\rclone-jav\AGENTS.md "ID normalization"
- JAV IDs canonicalized to at least 3 digits but keep wider widths (ABC-027, ABCDE-1167) — not a "leading zero" bug
- .ts ranks lowest among video containers in dupe keep ranking — AGENTS.md "Defaults from earlier sessions"
- VIP folders (ClearJAV default) win first in dupe keep ranking — same
- Cache loading falls back to empty cache when malformed top-level — intentional resilience, AGENTS.md "Recent decisions"
- Scan is always recursive — old --recursive / -R flag was removed intentionally
- extract_json_blob tolerates leading status lines + trailing noise — intentional for --basic output parsing
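The two ID rules at the top of this list can be illustrated with a minimal sketch. The function name `normalize_id_sketch` and its regex are hypothetical and exist only to show the documented outcomes; the real extract_id() in the CLI is not reproduced here and may accept more filename shapes.

```python
import re
from typing import Optional

# Hypothetical pattern: letters, optional hyphen, digits, one optional trailing letter.
_ID_RE = re.compile(r"^([A-Za-z]+)-?(\d+)[A-Za-z]?$")


def normalize_id_sketch(raw: str) -> Optional[str]:
    """Illustrative only; not the project's extract_id()."""
    match = _ID_RE.match(raw.strip())
    if match is None:
        return None
    prefix, digits = match.groups()
    # zfill pads to at least 3 digits and leaves wider numbers untouched.
    return f"{prefix.upper()}-{digits.zfill(3)}"


print(normalize_id_sketch("IBW-902z"))
print(normalize_id_sketch("ABCDE-1167"))
```

An auditor who sees the trailing letter dropped or a 4-digit number left unpadded is looking at the documented behavior, which is why neither belongs in a bug file.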
Native host (verified)
- stderr capture lives INSIDE rcjav-host.py via os.dup2 (not in rcjav-host.bat via 2>>) — the bat NOT redirecting stderr is the fix, not a missing-redirect bug. See comments at top of rcjav-host.bat.
- __port_disconnect__ is a synthetic action name for the rolling RPC log marker — not an actual RPC handler
- _shrink_response called twice (once in main loop, once inside write_message) — defense-in-depth, intentional
- client_req_id is None for RPCs originating from rclone-jav extension (only tabvault stamps it)
- Discord webhook rate-limit uses last-alert-ts.json shared across host process spawns — intentional anti-storm
- Host spawns fresh per connectNative call from each extension — intentional Chromium behavior, not a "leak"
Extension (verified against current files)
- chrome.runtime.lastError voided after several Chrome API calls — silences MV3 warning, intentional
- Native messaging 90s timeout in nativeCall — long enough for --quick on a slow remote
- web_accessible_resources for src/options/options.html and src/bulk-check/bulk-check.html ONLY (NOT popup.html) — explicit per mockups/console-consolidation-claude.html; popup is browser-action UI, doesn't need WAR
- Library Issues report-only kinds (resolution_*, quality_marker_not_resolution, missing_resolution, etc.) — user-chosen per session; not a "missing fix path" bug. Auto-rename only valid for bracket_id and nohyphen_id.
- No ID chip removed from sidebar; no_id outcomes not logged to recent activity — intentional
- Default landing pane = dupe-review — per mockup
- Setup pane lives in SUPPORT sidebar group — current intentional placement after earlier orphaning/restoration
- pcLabel empty string default — intentional, user opt-in
- 10-minute Discord webhook rate-limit — intentional anti-spam
- mkv / mp4 / wmv / avi format-preference defaults — intentional KEEP-ranking order
- Default cacheStaleHours = 24 — display only, doesn't change search results
- _rcjavSwInstanceId is a fresh UUID per SW startup — used to detect SW eviction mid-call, intentional design
Not on this list — let auditor surface (do NOT shield)
- DEFAULT_TARGET / DEFAULT_SOURCE hardcoded fallback values in rcjav/cli.py — these have been a regression source. Auditor checks current values vs config.json defaults vs AGENTS.md documented current state.
- CONFIG_PATH / CACHE_PATH / CANCEL_FLAG / DEFAULT_CATALOG path resolutions in rcjav/ package — .parent vs .parents[1] has been a bug. Verify each against current package layout.
- Any other path-resolution code that uses __file__ — same class of risk
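The .parent vs .parents[1] distinction is easy to misread, so here is a small sketch of what each resolves to for a module inside the rcjav/ package. The module path comes from this plan; the assumption that config.json sits beside the package at the project root is mine and must be checked against the current layout.

```python
from pathlib import PureWindowsPath

# Module path taken from this plan; config.json location is an assumption.
module_file = PureWindowsPath(r"D:\DEV\Project\rclone-jav\rcjav\cli.py")

package_dir = module_file.parent       # the rcjav package directory
project_root = module_file.parents[1]  # one level up, the project root

# If config.json lives at the project root (assumed), only parents[1] finds it.
config_from_parent = package_dir / "config.json"
config_from_root = project_root / "config.json"

print(config_from_parent)
print(config_from_root)
```

When auditing a path constant, the useful question is which of these two directories the target file actually lives in at the snapshot, not which expression looks more familiar.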
Snapshot preflight (MANDATORY — Phase 1 cannot start without it)
Before any audit chunk runs, capture D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md with:
# Audit Snapshot — <ISO timestamp>
## CLI repo (D:\DEV\Project\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
<output, or "(clean)" if no output>
## Extension repo (D:\DEV\Extensions\Production\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
<output, or "(clean)" if no output>
## Versions
- Extension manifest.json version: <X.Y.Z>
- Python: <python --version output>
- Node: <node --version output, for fixture runner>
- Brave: <version, if extension manual verification will be needed>
## Dirty-state policy
This audit accepts dirty working trees (option b). All file:line citations reference the snapshot AS-IS at this timestamp. No file edits during Phase 1 except audit docs (allowed-write list below).
Every bugs-*.md file MUST cite this snapshot ID in its header. If files change during audit, restart from a new snapshot.
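The per-repo part of the snapshot could be captured with a short script along these lines. This is a sketch under assumptions: the helper names are hypothetical, git is expected on PATH, and only the two repo sections of the template are covered (versions and the dirty-state policy are left to the operator).

```python
import subprocess


def git_output(repo: str, *args: str) -> str:
    """Run a git command inside repo and return stdout without trailing newlines."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, text=True, check=True,
    )
    # rstrip only: `git status --short` lines can start with a meaningful space.
    return result.stdout.rstrip()


def render_repo_section(title: str, repo: str, head: str, status: str) -> str:
    """Format one repo block the way the snapshot template lays it out."""
    return "\n".join([
        f"## {title} ({repo})",
        f"- git rev-parse HEAD: {head}",
        "- git status --short:",
        status or "(clean)",
    ])


def capture_repo_section(title: str, repo: str) -> str:
    """Collect HEAD and dirty state for one repo."""
    head = git_output(repo, "rev-parse", "HEAD")
    status = git_output(repo, "status", "--short")
    return render_repo_section(title, repo, head, status)
```

Calling `capture_repo_section("CLI repo", r"D:\DEV\Project\rclone-jav")` and the equivalent for the extension repo would yield the two blocks to paste into the snapshot file.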
Phase 1 allowed-write list (explicit)
During Phase 1 (audit), the ONLY files that may be created or modified are:
- D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md
- D:\DEV\Project\rclone-jav\bugs-candidates-<chunk>.md
- D:\DEV\Project\rclone-jav\bugs-<chunk>.md
Any other write = audit violation. Restart the chunk from snapshot.
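One way to enforce the allowed-write list is to compare the set of files changed during Phase 1 against the three permitted name patterns. The sketch below assumes the caller already has the list of changed paths (for example from each repo's dirty state); the function name is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

AUDIT_ROOT = PureWindowsPath(r"D:\DEV\Project\rclone-jav")
ALLOWED_NAME_PATTERNS = (
    "audit-snapshot-*.md",
    "bugs-candidates-*.md",
    "bugs-*.md",
)


def write_violations(changed_paths: list[str]) -> list[str]:
    """Return every changed path that is not on the Phase 1 allowed-write list."""
    violations = []
    for raw in changed_paths:
        path = PureWindowsPath(raw)
        # Audit docs must sit directly in the CLI project root.
        in_root = path.parent == AUDIT_ROOT
        name_ok = any(fnmatchcase(path.name, pat) for pat in ALLOWED_NAME_PATTERNS)
        if not (in_root and name_ok):
            violations.append(raw)
    return violations
```

A non-empty result means the chunk restarts from the snapshot, as stated above.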
bugs-candidates-<chunk>.md format (Phase 1 scratch)
This is the auditor's scratch space. Hedge language permitted here (and ONLY here). Theories, speculation, "this looks wrong" go in candidates first.
# Candidate Findings — <chunk> — <snapshot ID>
## Candidate C-1
- File: <path:line>
- Hunch: <one sentence, hedge language OK>
- Trace: <what code path led here>
- Question for verifier: <specific yes/no claim to verify>
- Contract refs needed: <list of doc paths verifier should read, or "none">
## Candidate C-2
...
Only CONFIRMED or PARTIAL candidates from verifier get promoted into bugs-<chunk>.md. REFUTED or NEEDS-INFO stay in candidates with verifier's response appended.
After Phase 1 chunk completes: bugs-candidates-<chunk>.md stays beside bugs-<chunk>.md. Optional archive under D:\DEV\Project\rclone-jav\audits\<date>\ — operator choice, not enforced.
bugs-<chunk>.md format (confirmed only)
# Bug Report — <chunk name> — <snapshot ID>
Snapshot: audit-snapshot-<ISO>.md
Required-reading docs read: [Y for each in list above]
Auditor agent: <type / fresh context confirmed Y/N>
---
## Severe (S)
Definition: data loss, crash, silent wrong result, contract violation that breaks user workflow.
### S-1
- **File:** `<absolute path>:<line>` (single line OR `:<start>-<end>` range)
- **Symptom (one sentence):** what the user / caller observes
- **Why it's a bug:** concrete reason citing the contract / doc / comment it violates. NO hedge language: "could", "might", "potentially", "in theory", "may cause", "possibly" — if you can't trace it concretely, demote to N or discard.
- **Reproduction:**
1. Input or state: `<exact value / command / RPC payload>`
2. Expected: `<what doc / comment / contract says should happen>`
3. Actual: `<what code actually does, traced through>`
- **Suggested fix sketch (optional, one-liner):** NOT to be implemented in audit phase
- **Verifier agent:** `<identifier, must be fresh-context>`
- **Verifier verdict:** CONFIRMED / PARTIAL (with revised repro)
- **Verifier confidence:** high / medium / low — low requires re-verification with different agent
- **Contract refs verifier read:** `<list>`
- **Mirror check needed in:** `<other chunk/file/RPC/schema if finding crosses a contract boundary, else "none">`
- **Status:** open
---
## Moderate (M)
Definition: degraded but observable behavior, recoverable error path missing, edge case mishandled.
<same field set>
---
## Light (L)
Definition: misleading log / error message, dev-only annoyance, minor input-validation gap.
<same field set>
---
## Needs Input (N)
Definition: looks suspicious but requires user / spec clarification before classifying.
### N-1
- **File:** ...
- **Question:** what specifically needs clarification
- **Why blocked:** what doc would resolve it but doesn't exist or is ambiguous
- **Status:** needs-input
---
## False Positives (discarded)
- `<file>:<line>` — initially flagged as `<what>`; discarded because `<reason, citing doc:section>`
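The no-hedge rule for confirmed bugs lends itself to a mechanical check. A minimal sketch, assuming the coordinator passes in the text of a single field (such as "Why it's a bug") rather than the whole file; the function name is hypothetical and the word list is the one quoted in the field set.

```python
import re

FORBIDDEN_HEDGES = ("could", "might", "potentially", "in theory", "may cause", "possibly")
_HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(h) for h in FORBIDDEN_HEDGES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(field_text: str) -> list[str]:
    """Return the forbidden hedge phrases found in one bug field, in order."""
    return [m.group(1).lower() for m in _HEDGE_RE.finditer(field_text)]


print(find_hedges("The write might fail and could lose data"))
```

Running this over an entire bugs file would also match the template line that lists the forbidden words, so scoping the check to field values keeps it from flagging the instructions themselves.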
Cross-chunk mirror check (narrowly scoped)
Mirror check fires ONLY when a confirmed bug crosses a contract boundary. Contract boundaries:
- Cache schema (docs/CACHE_CONTRACT.md)
- Host RPC payload/response shape
- Settings schema (chrome.storage.sync.settings ↔ host alerts-config.json)
- ID normalization rules shared between extension's id-extract.js and host's host_normalize_id and Python's rcjav/ids.py
- Fixture corpus expectations (Python + Node consumers in fixtures/)
When a bug entry hits one of those, add:
Mirror check needed in: <specific file/RPC/schema>
Default (no contract boundary touched) = no mirror check. Avoids spawning vague secondary audits.
Final verification (Phase 3) scans every confirmed bug for Mirror check needed in: and runs the requested check.
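The Phase 3 scan for mirror-check requests can be sketched as a line filter over a bug file's text. The function name is hypothetical, and the parsing assumes entries follow the field layout from the bug report format, with or without the bold markers.

```python
import re

# Tolerates the bold form "**Mirror check needed in:** value" and the plain form.
_MIRROR_RE = re.compile(r"Mirror check needed in:\**\s*(.+)")


def pending_mirror_checks(bug_file_text: str) -> list[str]:
    """Return every requested mirror-check target, skipping entries set to none."""
    targets = []
    for line in bug_file_text.splitlines():
        match = _MIRROR_RE.search(line)
        if match is None:
            continue
        target = match.group(1).strip().strip("`\"")
        if target.lower() != "none":
            targets.append(target)
    return targets
```

Each returned target becomes one scoped check in the named file, RPC, or schema, which keeps the secondary audit as narrow as the rule above intends.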
PHASE 1 — AUDIT
Per-chunk goal
/goal bugs-<chunk>.md exists in D:\DEV\Project\rclone-jav\, cites audit-snapshot-<ISO>.md, contains every file in scope chunk <N> reviewed for externally observable bugs, each bug has exact file:line citation, each bug has reproduction recipe (input/expected/actual), each bug verified by a fresh-context independent agent reading only cited contract docs, intentional patterns from "Known intentional patterns" list NOT flagged, no hedge language in confirmed bugs, bugs ranked S/M/L/N, mirror check noted where contract boundary touched, zero code changes made
Run the goal once per chunk (5 runs total). Do not batch.
Verifier protocol
For each candidate promoted from bugs-candidates-<chunk>.md, spawn a NEW agent (fresh context, no audit-history visibility) with this exact framing:
Read <file>:<line> and the surrounding function ONLY. The claim is: <symptom>.
The supposed reproduction is: input <X>, expected <Y>, actual <Z>.
Contract refs to read before judging: <list from candidate, max 3 docs>.
Reply with one of:
CONFIRMED — bug is real, repro matches
PARTIAL — symptom real, repro doesn't match exactly, suggest revised repro
REFUTED — code does <Z'> not <Z>; here's the trace
NEEDS-INFO — can't verify without <X>
Verifier MUST NOT see:
- Auditor's reasoning beyond the symptom/repro claim
- Other candidates in this chunk
- Other confirmed bugs in this or any other chunk
- Audit-internal memory or chat history
Otherwise it's a context-correlated rubber stamp, not independent verification.
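To keep the verifier's input bounded, the prompt can be built from a structure that has no place to put auditor reasoning. The sketch below is one possible shape: the `Claim` class and `build_verifier_prompt` are hypothetical names, and the wording mirrors the framing above in condensed form.

```python
from dataclasses import dataclass

MAX_CONTRACT_REFS = 3


@dataclass(frozen=True)
class Claim:
    """Only the fields the verifier is allowed to see."""
    location: str  # "<file>:<line>"
    symptom: str
    repro_input: str
    expected: str
    actual: str
    contract_refs: tuple = ()


def build_verifier_prompt(claim: Claim) -> str:
    """Render the bounded verifier prompt; refuse more than three contract docs."""
    if len(claim.contract_refs) > MAX_CONTRACT_REFS:
        raise ValueError("verifier gets at most 3 contract docs")
    refs = ", ".join(claim.contract_refs) or "none"
    return "\n".join([
        f"Read {claim.location} and the surrounding function ONLY. "
        f"The claim is: {claim.symptom}.",
        f"The supposed reproduction is: input {claim.repro_input}, "
        f"expected {claim.expected}, actual {claim.actual}.",
        f"Contract refs to read before judging: {refs}.",
        "Reply with one of: CONFIRMED, PARTIAL, REFUTED, NEEDS-INFO.",
    ])
```

Because the dataclass has no field for the auditor's trace or for sibling candidates, leaking them into the prompt requires a deliberate code change rather than a careless paste.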
Stop conditions per chunk
Restart the chunk with tighter framing if:
- Verifier rejects > 30% of confirmed-candidate attempts → "what counts as a bug" threshold is too loose
- Candidate count exceeds 50 in one chunk → scope too broad, split it
- Auditor reports a finding that a listed intentional pattern already covers → re-read this doc
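The two numeric stop conditions can be checked with a few lines of code. This sketch assumes a rejection means a REFUTED verdict and that NEEDS-INFO does not count as one; the plan does not spell that out, so treat it as an assumption. The function name is hypothetical.

```python
def chunk_restart_reasons(verdicts: list[str], candidate_count: int) -> list[str]:
    """Return the stop conditions a chunk has tripped (empty list means continue)."""
    reasons = []
    if verdicts:
        rejected = sum(1 for verdict in verdicts if verdict == "REFUTED")
        if rejected / len(verdicts) > 0.30:
            reasons.append("verifier rejection rate above 30%")
    if candidate_count > 50:
        reasons.append("more than 50 candidates in one chunk")
    return reasons


print(chunk_restart_reasons(["CONFIRMED"] * 6 + ["REFUTED"] * 4, 10))
```

The third condition, a finding that overlaps an intentional pattern, still needs a human or coordinator read of the list and is not covered here.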
PHASE 2 — FIX LOOP
One bug at a time, starting at S-1 of the highest-priority chunk, then M-1, then L-1. Skip N (needs-input) until user resolves.
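The fix order within one chunk could be derived from the bug IDs alone. The sketch below reads the rule as "all Severe, then all Moderate, then all Light, each in numeric order", which is my interpretation; chunk priority is not modeled because the plan leaves it to the operator. The function name is hypothetical.

```python
SEVERITY_ORDER = {"S": 0, "M": 1, "L": 2}


def fix_queue(bug_ids: list[str]) -> list[str]:
    """Order fixable bug IDs by severity then number; N entries are skipped."""
    fixable = [bug_id for bug_id in bug_ids if bug_id.split("-")[0] in SEVERITY_ORDER]

    def sort_key(bug_id: str) -> tuple:
        severity, number = bug_id.split("-")
        # int() so that S-10 sorts after S-2 rather than before it.
        return SEVERITY_ORDER[severity], int(number)

    return sorted(fixable, key=sort_key)


print(fix_queue(["M-2", "N-1", "S-1", "L-1", "M-1", "S-10", "S-2"]))
```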
Per-bug goal
/goal <BUG-ID> in <bugs-chunk.md> is marked "fixed", the fix is applied at the cited file:line, the bug's reproduction recipe now returns Expected not Actual, no other bugs.md entries were changed, no unrelated code was modified, any tests covering the affected code still pass (or new test added if none existed), version bump applied if extension files touched
Replace <BUG-ID> with the actual ID (e.g. S-1).
Fix verification gate
Before marking status: fixed:
- Re-run the bug's reproduction recipe — must now produce Expected, not Actual
- Per-file test re-run: if tests/ or fixtures/ cover the affected file, re-run them, all must pass
- If no test existed for the now-fixed behavior: write one, place under tests/ or fixtures/
- If extension code changed: bump manifest.json version (per feedback_extension_version_bump.md — one bump per user-requested update, visible reload-verification signal)
- Do NOT touch: any other bug entry, any file marked DO NOT FIX in code comments, any intentional pattern listed above
- Update the bug entry with Status: fixed and a Fix: line citing the new file:line of the change
After completing all fixes in a chunk
Run the chunk's full test suite, not just per-file tests. Catches cross-bug interactions (e.g. fix for S-1 in rcjav/cache.py interacts with fix for M-2 in rcjav/dupes.py).
PHASE 3 — FINAL VERIFICATION
/goal all bugs in bugs-*.md files under D:\DEV\Project\rclone-jav\ are marked "fixed", "skipped" (with reason), or "needs-input" (awaiting user); D:\DEV\Project\rclone-jav\verification.md exists confirming a final audit of every modified file finds no new bugs introduced by the fixes; verification.md lists each fixed BUG-ID + its commit/edit and the repro-now-passes proof; every "Mirror check needed in:" entry resolved (either no mirror bug found, or new bug filed in target chunk); manifest.json version is incremented appropriately
verification.md format
# Verification — <ISO date>
Original snapshot: audit-snapshot-<ISO>.md
Final snapshot: audit-snapshot-<final ISO>.md
## Fix summary
- S-1 (bugs-python.md): fixed at <file:line>. Repro now returns Expected (was Actual). Test added: <test path>.
- M-1 (bugs-extension-bg.md): fixed at <file:line>. Existing test <name> still passes.
- ...
## Mirror checks resolved
- S-3 mirror in bugs-host.md: scanned `handle_search` for same contract issue, NOT present.
- M-2 mirror in bugs-python.md: FOUND same issue → filed as M-7 in bugs-python.md, fixed at <file:line>.
## Skipped
- L-3 (bugs-host.md): skipped — `<reason>` (e.g. user decision, deferred to next audit)
## Needs input
- N-1 (bugs-extension-options.md): awaiting user clarification on <question>
## Final pass
- Files modified during fix phase: <list>
- Independent re-audit of those files: <date>, <verifier agent>, found 0 new bugs / found <N> new bugs (back to PHASE 1)
- All `bugs-*.md` files: zero entries with status `open`
- Extension manifest.json: version <X> → <Y> (bumped per shipped change)
- All existing tests pass: <test runner output summary>
- Fixture corpus runs: <Python runner + Node runner exit codes>
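The "zero entries with status open" line in the final pass can be backed by a small scan of each bugs file. A sketch, assuming status lines follow the bug entry format (bold or plain) and that the only acceptable end states are the three named in the Phase 3 goal; the function name is hypothetical.

```python
import re

# Matches "**Status:** open" as well as "Status: fixed".
_STATUS_RE = re.compile(r"Status:\**\s*([A-Za-z-]+)")
TERMINAL_STATUSES = {"fixed", "skipped", "needs-input"}


def unresolved_statuses(bug_file_text: str) -> list[str]:
    """Return every status value that is not fixed, skipped, or needs-input."""
    found = [m.group(1).lower() for m in _STATUS_RE.finditer(bug_file_text)]
    return [status for status in found if status not in TERMINAL_STATUSES]
```

An empty list for every bugs file is the condition the final-pass line claims; anything else names the statuses that still block Phase 3.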
ANTI-HALLUCINATION RULES (enforced — not optional)
- No bug without file:line — line range only acceptable if symptom is genuinely multi-line
- No bug without reproduction recipe with concrete input / expected / actual
- Verifier MUST be fresh-context — same agent re-reading the claim is not independent
- Verifier reads only cited contract docs, not the whole project memory pile — bounded context preserves independence
- One bug per fix session — no batch fixes even for "obviously similar" findings
- DO NOT FIX banners + intentional patterns are untouchable — listed in this doc + AGENTS.md / mockups
- Severity is criteria-based, not vibes-based — Severe = data loss/crash/silent-wrong; Moderate = degraded observable; Light = misleading message / minor
- Forbidden hedge language in confirmed bugs: "could be", "might", "potentially", "in theory", "may cause", "possibly". If you can't trace it concretely, demote to Needs Input or candidate scratch.
- No speculative race conditions — race must have observable user-visible repro recipe, not just "concurrent code path exists"
- Reference contracts, not preferences — bugs cite what code SHOULD do per a doc/comment/test, not what auditor thinks would be nicer
- No bug for missing feature — that's a TODO, goes in TODO.md not bugs.md
- Phase 1 is read-only except audit docs — see allowed-write list above
Final-pass readability checklist (run before any audit)
Before Phase 1 starts, re-read this doc and verify:
- Every "intentional pattern" line has been verified against current code OR cites a current doc that exists right now
- Any old memory/session claim that conflicts with current files has been removed or softened
- Phase 1 allowed-write list is explicit and current
- Candidates clearly separated from confirmed bugs (different files, different formats)
- Verifier prompt includes contract_refs: and does NOT include auditor reasoning
- Stop conditions are present (30% rejection, 50 candidates)
- Mirror check scope is narrowly defined (contract boundaries only)
- Excluded paths are current (no missing dirs, no dead refs)
If any check fails, fix this doc before starting audit.
NOTES
- Run audit goals from the CLI project root: cd D:\DEV\Project\rclone-jav && claude — even when auditing extension files, output stays in this folder
- Extension folder and CLI folder are separate git repos — verify with git status in each before audit so you're auditing a known snapshot
- Per-project memory at C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\ carries feedback rules — read those at audit start, they override default audit behavior
- The extension repo currently has uncommitted modifications (hybrid state from codex's roadmap work + later edits). Snapshot captures this state; option (b) accepts dirty + records what was dirty. No auto-stash.
Appendix — Recommended agent topology (Claude Code / multi-agent runners)
This appendix is OPTIONAL — the plan above is portable to any /goal-style runner. If you're running it in Claude Code or a similar multi-agent tool, this section describes how to map the independence + parallelism requirements onto explicit agent calls. Operators using a different runner can ignore this appendix without losing the plan's structure.
Role map
Main Coordinator (the session you start the audit from)
- Owns the snapshot file (audit-snapshot-<ISO>.md)
- Launches Chunk Auditor agents (parallel allowed)
- Collects produced bugs-candidates-<chunk>.md files
- Launches Verifier agents per candidate (or small batch)
- Promotes CONFIRMED / PARTIAL findings into bugs-<chunk>.md
- Drives Phase 2 fix loop one bug at a time
- Launches Final Re-Audit agents in Phase 3
- The only role with write access to multiple files
Chunk Auditor Agents (one per scope chunk)
- Canonical agent type: Explore (read-only, fast)
- Parallel allowed once snapshot is written
- Inputs: chunk file list, snapshot ID, required-reading docs, this plan's "Known intentional patterns" + "Not on this list — let auditor surface" sections
- Output: bugs-candidates-<chunk>.md ONLY (no confirmed-bug writes; coordinator promotes)
- Must cite file:line + candidate repro; hedge language permitted in candidates
- Must NOT: edit product code, edit another chunk's candidate file, write to confirmed bug files
Verifier Agents (fresh context per candidate, or small candidate batch from same file)
- Canonical agent type: Explore (read-only, blind)
- Fresh context — NO prior audit-history visibility
- Inputs (and ONLY these):
  - file:line of the claim
  - Symptom (one sentence)
  - Reproduction recipe
  - contract_refs: list (max 3 docs)
- Must NOT see: auditor reasoning, the candidate file as a whole, other candidates, other chunks' findings, this plan's hedge-language rules (verifier only verifies the specific claim)
- Output: one of CONFIRMED / PARTIAL (with revised repro) / REFUTED (with code trace) / NEEDS-INFO (with what's missing)
Fix Phase Agent (Phase 2)
- Canonical agent type: main coordinator context OR a single write-capable general-purpose agent
- Serial — one bug at a time
- No parallel fixes even for "obviously similar" bugs
- Inputs: the one bug entry being fixed, full file context, project memory
- Outputs: code edits, bug entry status update, test additions if needed
- Re-runs the bug's repro recipe and per-file tests before marking fixed
Final Re-Audit Agents (Phase 3)
- Canonical agent type: Explore (read-only)
- One per modified-file group or per chunk that had fixes
- Inputs: list of files modified during Phase 2, this plan
- Output: confirmation of no new bugs introduced, OR new bug entries if found (which loop back to Phase 1)
File-ownership rules (prevent merge collisions)
- Each Chunk Auditor owns ONLY its own bugs-candidates-<chunk>.md
- Each Verifier writes nothing to disk — returns a structured response to the coordinator
- Coordinator owns bugs-<chunk>.md, audit-snapshot-<ISO>.md, and verification.md
- Fix Phase Agent owns the code files being edited + the bug entry being marked fixed
- No two agents share write access to the same file at any time
Parallelism rules
- Phase 1: chunks may be audited in parallel ONLY after the snapshot is written. Parallel auditors must not edit product code or each other's output files. Coordinator dispatches all 5 chunk Agent calls in a single message for max throughput.
- Verifier dispatch: within a chunk, verifiers for distinct candidates may run in parallel. Verifiers for candidates that cite the SAME file must run sequentially (avoids verifier-context cross-contamination if a verifier loads file context that affects another).
- Phase 2: strictly serial. One bug per Agent call. No parallelism.
- Phase 3: re-audit agents may run in parallel by file group.
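The verifier dispatch rule (parallel across files, sequential within a file) amounts to grouping candidates by the file they cite. A minimal sketch, assuming each candidate is reduced to an ID plus its cited file path and that paths are compared as exact strings; the function name is hypothetical.

```python
from collections import defaultdict


def verifier_lanes(candidates: list[tuple[str, str]]) -> list[list[str]]:
    """Group (candidate_id, cited_file) pairs into lanes.

    Candidates inside one lane cite the same file and run one after another;
    separate lanes may be dispatched in parallel.
    """
    lanes: dict[str, list[str]] = defaultdict(list)
    for candidate_id, cited_file in candidates:
        lanes[cited_file].append(candidate_id)
    return list(lanes.values())


print(verifier_lanes([
    ("C-1", "rcjav/cache.py"),
    ("C-2", "rcjav/dupes.py"),
    ("C-3", "rcjav/cache.py"),
]))
```

Each lane still uses a fresh verifier per candidate; the grouping only controls ordering, not context reuse.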
Canonical Agent tool calls (Claude Code specific)
Coordinator-level pseudocode:
# Phase 1 — parallel chunk audit
Agent(subagent_type="Explore", description="Audit chunk 1 Python CLI",
      prompt="<chunk 1 inputs + this plan's required reading + intentional patterns + output target>")
Agent(subagent_type="Explore", description="Audit chunk 2 native host", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 3 ext SW+content", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 4 ext options", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 5 ext popup+bulk", prompt="<...>")
# all 5 dispatched in one message → run in parallel

# Phase 1 — verifier per candidate
for candidate in bugs-candidates-<chunk>.md:
    Agent(subagent_type="Explore", description=f"Verify {candidate.id}",
          prompt="<file:line + symptom + repro + contract_refs ONLY — no auditor reasoning>")

# Phase 2 — serial fix loop
for bug in confirmed_bugs_sorted_by_severity:
    Agent(subagent_type="general-purpose", description=f"Fix {bug.id}",
          prompt="<single bug entry + repro + verification gate rules>")
    # wait for completion, verify repro now passes, mark fixed

# Phase 3 — final re-audit
for modified_file_group in fix_phase_diff:
    Agent(subagent_type="Explore", description=f"Re-audit {modified_file_group}", prompt="<...>")
Anti-correlation rules (preserve verifier independence)
- Coordinator must NOT pass auditor reasoning to verifier — only the structured claim
- Coordinator must NOT pass the candidate file's full text to verifier — only the one candidate's fields
- Each verifier call is a fresh Agent invocation — never reuse a verifier agent across candidates
- If a verifier rejects a claim, do NOT immediately re-verify with another agent hoping for CONFIRMED — that's correlation-chasing. Demote the candidate to REFUTED, log in candidates file, move on.
- Track verifier rejection rate per chunk (see Stop Conditions). If rejection >30%, the auditor's threshold is wrong, not the verifiers'.