admin f7fc15b17c Sync working tree before initial Gitea push

Includes:
- cli.py path fix (parents[1]) for config/catalog resolution
- Library cleanup feature design docs (TODO.md, mockup)
- Audit + bug-queue markdowns from May 2026 reliability pass
- .gitignore expanded for transient artifacts

2026-05-26 22:35:42 +02:00


Bug Audit Plan — rclone-jav (Python CLI + Brave Extension)

Customized from D:\DEV\Project\Goal\bug-audit-template.md and tightened for this project: scope is chunked, "bug" is narrowly defined, a reproduction recipe is required, independent verification is enforced via fresh-context agents with bounded contract context, and intentional patterns are listed only when verified against current code or a current doc.

All output artifacts (per-scope bugs-*.md files, bugs-candidates-*.md scratch, audit-snapshot-<ISO>.md, and the final verification.md) live under D:\DEV\Project\rclone-jav\. Do NOT write audit output under D:\DEV\Extensions\Production\rclone-jav\ (extension folder) or D:\DEV\Project\Goal\ (template home).

What counts as a bug (for THIS audit)

Include:

Wrong result — code produces output that contradicts documented behavior, comment, or stated intent
Data loss / corruption — cache.json, config.json, chrome.storage, or remote file content can become incorrect or lost
Crash / unhandled exception — Python tracebacks, uncaught JS promise rejections that kill an operation
Silent failure — operation appears to succeed but didn't (e.g. write claimed but file not changed)
Contract violation — host RPC schema mismatch, manifest declaration mismatch, cache-version mismatch, fixture-driven expectation broken
Race condition with observable user-visible effect — concurrent operations leading to one of the above

Exclude (out of scope for this audit — separate effort):

Code style / formatting / linting
Performance unless it causes timeout or hang
Dead code / unused imports / unused variables
Outdated comments (unless misleading enough to cause wrong-result)
Security review (use /security-review instead)
Documentation gaps (separate doc-debt pass)
Refactor opportunities ("could be cleaner")
Missing features → file in TODO.md, not bugs.md

Phrase findings as "every function reviewed for externally observable bugs." Internal helpers with no flow to RPC / UI / file system / network get reviewed only as part of their caller's flow, not as their own audit unit.

Scope chunks (run each as separate audit pass)

Five chunks. Each gets its own bugs-<chunk>.md file. Do NOT batch into one giant audit — context grows, hallucinations multiply.

| # | Chunk | Files in scope | Output |
|---|-------|----------------|--------|
| 1 | Python CLI | `rc-jav.py` + `rcjav/*.py` + `tests/*.py` + `fixtures/run.py` (all under `D:\DEV\Project\rclone-jav\`) | `bugs-python.md` |
| 2 | Native host | `host\rcjav-host.py` + `host\install-host.ps1` + `host\rcjav-host.bat` + `host\register-host.bat` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-host.md` |
| 3 | Extension SW + content | `background.js` + `content.js` + `manifest.json` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-bg.md` |
| 4 | Extension Options pages | `src\options\*` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-options.md` |
| 5 | Extension Popup + Bulk Check | `src\popup\` + `src\bulk-check\` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-popup.md` |

Tabvault extension (D:\DEV\Extensions\Production\tabvault\) is out of scope for this audit — separate project.

Explicit per-chunk excludes

Do NOT audit (read-only-if-needed-for-context, never report findings against):

**/__pycache__/ — bytecode
**/*.bak — historical snapshots (e.g. CLAUDE.md.bak, cache.json.bak)
cache.json, config.json — runtime data, not code (their schema is auditable in docs/CACHE_CONTRACT.md)
benchmarks/*.py — performance probes, not product
mockups/*.html — design memory, not code
wincatalog/ — user data dir
README.md, TODO.md, AGENTS.md, CLAUDE.md, docs/*.md — docs (separate doc-debt pass)
host/logs/* — runtime logs
host/state/* — runtime state
host/com.rcjav.host.json, host/allowed-extension-ids.json — generated/runtime config
Per-project memory under C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\ — READ for rules, do NOT audit

Required reading before audit

Auditor MUST read (and reference findings against) the following intentional-pattern docs:

D:\DEV\Project\rclone-jav\AGENTS.md — Python CLI session memory, ID normalization rules, defaults
D:\DEV\Project\rclone-jav\CLAUDE.md (if present)
D:\DEV\Project\rclone-jav\TODO.md — deferred work that's NOT a bug
D:\DEV\Extensions\Production\rclone-jav\docs\CACHE_CONTRACT.md — cache schema + ID rules versioning
D:\DEV\Extensions\Production\rclone-jav\AGENTS.md — extension session memory
D:\DEV\Extensions\Production\rclone-jav\CLAUDE.md (if present)
D:\DEV\Extensions\Production\rclone-jav\mockups\console-consolidation-claude.html — design rationale
C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\*.md — per-project memory (version bump rule, install workflow, no hollow suggestions)

If a finding contradicts an explicit decision in these docs, it's NOT a bug — it's expected behavior. Mark as discarded — intentional per <doc:section> in the False Positives section.

Known intentional patterns (verified against current code or current doc)

Only patterns confirmed against the current snapshot belong here. If a pattern is suspected but unverified, leave it OFF this list — the auditor will surface it, the verifier will check the cited doc, and discard-as-intentional happens there. Stale assumptions on this list are dangerous — they actively shield real bugs in code that's been touched.

Python CLI (verified)

extract_id() chops trailing single letters from filenames intentionally (e.g. IBW-902z → IBW-902) — see D:\DEV\Project\rclone-jav\AGENTS.md "ID normalization"
JAV IDs canonicalized to at least 3 digits but keep wider widths (ABC-027, ABCDE-1167) — not a "leading zero" bug
.ts ranks lowest among video containers in dupe keep ranking — AGENTS.md "Defaults from earlier sessions"
VIP folders (ClearJAV default) win first in dupe keep ranking — same
Cache loading falls back to empty cache when malformed top-level — intentional resilience, AGENTS.md "Recent decisions"
Scan is always recursive — old --recursive/-R flag was removed intentionally
extract_json_blob tolerates leading status lines + trailing noise — intentional for --basic output parsing
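
The two ID rules above (drop a single trailing letter, pad to at least 3 digits while keeping wider widths) can be illustrated with a minimal sketch. This is not the project's `extract_id()`: the name `normalize_id_sketch`, the regex, and the uppercase-plus-hyphen output shape are assumptions made for illustration, and AGENTS.md remains the authority on the real rules.

```python
import re
from typing import Optional

# Hypothetical pattern: letters, optional hyphen, digits, optional single trailing letter.
_ID_RE = re.compile(r"^([A-Za-z]+)-?(\d+)([A-Za-z])?$")


def normalize_id_sketch(raw: str) -> Optional[str]:
    """Illustrative only: drop one trailing letter, pad digits to at least 3."""
    match = _ID_RE.match(raw.strip())
    if match is None:
        return None
    prefix, digits, _trailing_letter = match.groups()
    # zfill pads short numbers but never truncates wider ones.
    return f"{prefix.upper()}-{digits.zfill(3)}"
```

Under these assumptions, `IBW-902z` maps to `IBW-902` and `ABCDE-1167` keeps its four digits, which is why neither should be reported as a truncation or leading-zero bug.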

Native host (verified)

stderr capture lives INSIDE rcjav-host.py via os.dup2 (not in rcjav-host.bat via 2>>) — the bat NOT redirecting stderr is the fix, not a missing-redirect bug. See comments at top of rcjav-host.bat.
__port_disconnect__ is a synthetic action name for the rolling RPC log marker — not an actual RPC handler
_shrink_response called twice (once in main loop, once inside write_message) — defense-in-depth, intentional
client_req_id is None for RPCs originating from rclone-jav extension (only tabvault stamps it)
Discord webhook rate-limit uses last-alert-ts.json shared across host process spawns — intentional anti-storm
Host spawns fresh per connectNative call from each extension — intentional Chromium behavior, not a "leak"
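
For auditors unfamiliar with the `os.dup2` pattern named in the first item, here is a minimal sketch of in-process stderr capture. The function names and the log path handling are hypothetical; the real `rcjav-host.py` may differ in details such as rotation or error handling.

```python
import os


def redirect_stderr_to_file(log_path: str) -> int:
    """Point fd 2 at log_path and return a duplicate of the old stderr fd."""
    saved_fd = os.dup(2)  # keep the original so it can be restored later
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(log_fd, 2)  # fd 2 now writes to the log file
    finally:
        os.close(log_fd)  # fd 2 holds its own reference to the file
    return saved_fd


def restore_stderr(saved_fd: int) -> None:
    """Undo redirect_stderr_to_file."""
    os.dup2(saved_fd, 2)
    os.close(saved_fd)
```

Because the redirect happens at the file-descriptor level inside the process, a wrapper script that adds no `2>>` redirect is consistent with this design rather than evidence of lost stderr output.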

Extension (verified against current files)

chrome.runtime.lastError voided after several Chrome API calls — silences MV3 warning, intentional
Native messaging 90s timeout in nativeCall — long enough for --quick on a slow remote
web_accessible_resources for src/options/options.html and src/bulk-check/bulk-check.html ONLY (NOT popup.html) — explicit per mockups/console-consolidation-claude.html; popup is browser-action UI, doesn't need WAR
Library Issues report-only kinds (resolution_*, quality_marker_not_resolution, missing_resolution, etc.) — user-chosen per session; not a "missing fix path" bug. Auto-rename only valid for bracket_id and nohyphen_id.
The "No ID" chip was removed from the sidebar, and no_id outcomes are not logged to recent activity: both intentional
Default landing pane = dupe-review — per mockup
Setup pane lives in SUPPORT sidebar group — current intentional placement after earlier orphaning/restoration
pcLabel empty string default — intentional, user opt-in
10-minute Discord webhook rate-limit — intentional anti-spam
mkv / mp4 / wmv / avi format-preference defaults — intentional KEEP-ranking order
Default cacheStaleHours = 24 — display only, doesn't change search results
_rcjavSwInstanceId is a fresh UUID per SW startup — used to detect SW eviction mid-call, intentional design
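
The Discord rate limit appears in both lists above: as a 10-minute window and as a `last-alert-ts.json` file shared across host spawns. A minimal sketch of how a timestamp file can enforce such a window follows. The function name, the `last_alert_ts` key, and the JSON layout are assumptions; only the file name and the 10-minute window come from this plan.

```python
import json
import time
from pathlib import Path
from typing import Optional

WINDOW_SECONDS = 10 * 60  # the 10-minute window named in this plan


def should_send_alert(state_file: Path, now: Optional[float] = None) -> bool:
    """Return True (and record the time) if no alert was sent inside the window."""
    now = time.time() if now is None else now
    try:
        stored = json.loads(state_file.read_text(encoding="utf-8"))
        last = float(stored["last_alert_ts"])  # hypothetical key name
    except (OSError, ValueError, KeyError, TypeError):
        last = 0.0  # missing or unreadable state: treat as "never alerted"
    if now - last < WINDOW_SECONDS:
        return False
    state_file.write_text(json.dumps({"last_alert_ts": now}), encoding="utf-8")
    return True
```

Because the state lives on disk rather than in memory, a freshly spawned host process sees the previous alert time. A suppressed alert inside the window is therefore expected behavior, not a silent failure.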

Not on this list — let auditor surface (do NOT shield)

DEFAULT_TARGET / DEFAULT_SOURCE hardcoded fallback values in rcjav/cli.py — these have been a regression source. Auditor checks current values vs config.json defaults vs AGENTS.md documented current state.
CONFIG_PATH / CACHE_PATH / CANCEL_FLAG / DEFAULT_CATALOG path resolutions in rcjav/ package — .parent vs .parents[1] has been a bug. Verify each against current package layout.
Any other path-resolution code that uses __file__ — same class of risk
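
The `.parent` vs `.parents[1]` distinction is easy to misread, so here is a small sketch using `pathlib`. The module location is a stand-in for `__file__` inside the `rcjav/` package, and the assumption that `config.json` sits at the project root is illustrative; verify the real layout against the snapshot.

```python
from pathlib import PureWindowsPath

# Stand-in for __file__ inside the package (hypothetical location).
module_file = PureWindowsPath(r"D:\DEV\Project\rclone-jav\rcjav\cli.py")

package_dir = module_file.parent        # ...\rclone-jav\rcjav
project_root = module_file.parents[1]   # ...\rclone-jav

# If config.json lives at the project root, only parents[1] resolves it.
config_via_parent = package_dir / "config.json"
config_via_parents1 = project_root / "config.json"
```

When a module moves from the repo root into a package directory, every `__file__`-based path silently shifts one level down, which is the regression class the auditor should check for.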

Snapshot preflight (MANDATORY — Phase 1 cannot start without it)

Before any audit chunk runs, capture D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md with:

# Audit Snapshot — <ISO timestamp>

## CLI repo (D:\DEV\Project\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
  <output, or "(clean)" if no output>

## Extension repo (D:\DEV\Extensions\Production\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
  <output, or "(clean)" if no output>

## Versions
- Extension manifest.json version: <X.Y.Z>
- Python: <python --version output>
- Node: <node --version output, for fixture runner>
- Brave: <version, if extension manual verification will be needed>

## Dirty-state policy
This audit accepts dirty working trees (option b). All file:line citations reference the snapshot AS-IS at this timestamp. No file edits during Phase 1 except audit docs (allowed-write list below).

Every bugs-*.md file MUST cite this snapshot ID in its header. If files change during audit, restart from a new snapshot.
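
The repo sections of the snapshot can be generated rather than typed by hand. The sketch below is one possible approach, assuming `git` is on PATH; the helper names are hypothetical, and the Versions and Dirty-state sections would still be filled in separately.

```python
import subprocess


def git_output(repo_dir: str, *args: str) -> str:
    """Run a git command in repo_dir and return stdout without trailing whitespace."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        capture_output=True, text=True, check=True,
    )
    # rstrip only: leading spaces are significant in `git status --short`.
    return result.stdout.rstrip()


def render_repo_section(title: str, sha: str, status: str) -> str:
    """Format one repo block of the snapshot template."""
    status_text = status if status else "(clean)"
    indented = "\n".join("  " + line for line in status_text.splitlines())
    return (
        f"## {title}\n"
        f"- git rev-parse HEAD: {sha}\n"
        f"- git status --short:\n{indented}\n"
    )


def snapshot_repo(title: str, repo_dir: str) -> str:
    """Collect HEAD and short status for one repo."""
    return render_repo_section(
        title,
        git_output(repo_dir, "rev-parse", "HEAD"),
        git_output(repo_dir, "status", "--short"),
    )
```

Calling `snapshot_repo` once per repo (CLI and extension) yields the two blocks the template requires, with "(clean)" substituted when `git status --short` prints nothing.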

Phase 1 allowed-write list (explicit)

During Phase 1 (audit), the ONLY files that may be created or modified are:

D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md
D:\DEV\Project\rclone-jav\bugs-candidates-<chunk>.md
D:\DEV\Project\rclone-jav\bugs-<chunk>.md

Any other write = audit violation. Restart the chunk from snapshot.
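
A coordinator could enforce the allowed-write list mechanically before accepting any write. The following sketch is one way to express it; the function name and the character sets accepted for `<ISO>` and `<chunk>` are assumptions, not part of this plan.

```python
import re
from pathlib import PureWindowsPath

AUDIT_ROOT = PureWindowsPath(r"D:\DEV\Project\rclone-jav")

# File-name shapes from the allowed-write list. The exact characters permitted
# in <ISO> and <chunk> are assumptions made for this sketch.
_ALLOWED_NAMES = (
    re.compile(r"^audit-snapshot-[0-9A-Za-z_.+-]+\.md$"),
    re.compile(r"^bugs-candidates-[a-z0-9-]+\.md$"),
    re.compile(r"^bugs-[a-z0-9-]+\.md$"),
)


def is_allowed_phase1_write(path: str) -> bool:
    """True only for audit docs directly under the audit root."""
    candidate = PureWindowsPath(path)
    if candidate.parent != AUDIT_ROOT:
        return False
    return any(pattern.match(candidate.name) for pattern in _ALLOWED_NAMES)
```

Note that `verification.md` is rejected here on purpose: it belongs to Phase 3 and is not on the Phase 1 list.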

bugs-candidates-<chunk>.md format (Phase 1 scratch)

This is the auditor's scratch space. Hedge language permitted here (and ONLY here). Theories, speculation, "this looks wrong" go in candidates first.

# Candidate Findings — <chunk> — <snapshot ID>

## Candidate C-1
- File: <path:line>
- Hunch: <one sentence, hedge language OK>
- Trace: <what code path led here>
- Question for verifier: <specific yes/no claim to verify>
- Contract refs needed: <list of doc paths verifier should read, or "none">

## Candidate C-2
...

Only CONFIRMED or PARTIAL candidates from verifier get promoted into bugs-<chunk>.md. REFUTED or NEEDS-INFO stay in candidates with verifier's response appended.
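
The promotion rule reduces to a filter over verifier verdicts. A minimal sketch, with a hypothetical `Candidate` record that carries only the fields needed here:

```python
from dataclasses import dataclass

PROMOTABLE = {"CONFIRMED", "PARTIAL"}


@dataclass
class Candidate:
    candidate_id: str  # e.g. "C-1"
    verdict: str       # CONFIRMED / PARTIAL / REFUTED / NEEDS-INFO


def split_by_verdict(candidates):
    """Return (promoted, kept_in_scratch) following the promotion rule."""
    promoted = [c for c in candidates if c.verdict in PROMOTABLE]
    kept = [c for c in candidates if c.verdict not in PROMOTABLE]
    return promoted, kept
```

Anything in the second list stays in the candidates file with the verifier's response appended.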

After Phase 1 chunk completes: bugs-candidates-<chunk>.md stays beside bugs-<chunk>.md. Optional archive under D:\DEV\Project\rclone-jav\audits\<date>\ — operator choice, not enforced.

bugs-<chunk>.md format (confirmed only)

# Bug Report — <chunk name> — <snapshot ID>

Snapshot: audit-snapshot-<ISO>.md
Required-reading docs read: [Y for each in list above]
Auditor agent: <type / fresh context confirmed Y/N>

---

## Severe (S)

Definition: data loss, crash, silent wrong result, contract violation that breaks user workflow.

### S-1
- **File:** `<absolute path>:<line>` (single line OR `:<start>-<end>` range)
- **Symptom (one sentence):** what the user / caller observes
- **Why it's a bug:** concrete reason citing the contract / doc / comment it violates. NO hedge language: "could", "might", "potentially", "in theory", "may cause", "possibly" — if you can't trace it concretely, demote to N or discard.
- **Reproduction:**
    1. Input or state: `<exact value / command / RPC payload>`
    2. Expected: `<what doc / comment / contract says should happen>`
    3. Actual: `<what code actually does, traced through>`
- **Suggested fix sketch (optional, one-liner):** NOT to be implemented in audit phase
- **Verifier agent:** `<identifier, must be fresh-context>`
- **Verifier verdict:** CONFIRMED / PARTIAL (with revised repro)
- **Verifier confidence:** high / medium / low — low requires re-verification with different agent
- **Contract refs verifier read:** `<list>`
- **Mirror check needed in:** `<other chunk/file/RPC/schema if finding crosses a contract boundary, else "none">`
- **Status:** open

---

## Moderate (M)
Definition: degraded but observable behavior, recoverable error path missing, edge case mishandled.
<same field set>

---

## Light (L)
Definition: misleading log / error message, dev-only annoyance, minor input-validation gap.
<same field set>

---

## Needs Input (N)
Definition: looks suspicious but requires user / spec clarification before classifying.

### N-1
- **File:** ...
- **Question:** what specifically needs clarification
- **Why blocked:** what doc would resolve it but doesn't exist or is ambiguous
- **Status:** needs-input

---

## False Positives (discarded)
- `<file>:<line>` — initially flagged as `<what>`; discarded because `<reason, citing doc:section>`

Cross-chunk mirror check (narrowly scoped)

Mirror check fires ONLY when a confirmed bug crosses a contract boundary. Contract boundaries:

Cache schema (docs/CACHE_CONTRACT.md)
Host RPC payload/response shape
Settings schema (chrome.storage.sync.settings ↔ host alerts-config.json)
ID normalization rules shared between extension's id-extract.js and host's host_normalize_id and Python's rcjav/ids.py
Fixture corpus expectations (Python + Node consumers in fixtures/)

When a bug entry hits one of those, add:

Mirror check needed in: <specific file/RPC/schema>

Default (no contract boundary touched) = no mirror check. Avoids spawning vague secondary audits.

Final verification (Phase 3) scans every confirmed bug for Mirror check needed in: and runs the requested check.
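
The Phase 3 scan can be a plain text search over each report. The sketch below assumes entries follow the field syntax of the bug template (with or without bold markers) and that "none" marks the default case; the function name is hypothetical.

```python
import re

# Matches the "Mirror check needed in:" field with or without bold markers.
_MIRROR_RE = re.compile(
    r"^\s*-?\s*\**Mirror check needed in:\**\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def pending_mirror_checks(report_text: str) -> list[str]:
    """Return every mirror-check target in a report, skipping 'none' entries."""
    targets = []
    for match in _MIRROR_RE.finditer(report_text):
        target = match.group(1).strip("`\"' ")
        if target.lower() != "none":
            targets.append(target)
    return targets
```

Each returned target names the file, RPC, or schema that the final verification pass has to inspect.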

PHASE 1 — AUDIT

Per-chunk goal

/goal bugs-<chunk>.md exists in D:\DEV\Project\rclone-jav\, cites audit-snapshot-<ISO>.md, contains every file in scope chunk <N> reviewed for externally observable bugs, each bug has exact file:line citation, each bug has reproduction recipe (input/expected/actual), each bug verified by a fresh-context independent agent reading only cited contract docs, intentional patterns from "Known intentional patterns" list NOT flagged, no hedge language in confirmed bugs, bugs ranked S/M/L/N, mirror check noted where contract boundary touched, zero code changes made

Run the goal once per chunk (5 runs total). Do not batch.

Verifier protocol

For each candidate promoted from bugs-candidates-<chunk>.md, spawn a NEW agent (fresh context, no audit-history visibility) with this exact framing:

Read <file>:<line> and the surrounding function ONLY. The claim is: <symptom>.
The supposed reproduction is: input <X>, expected <Y>, actual <Z>.
Contract refs to read before judging: <list from candidate, max 3 docs>.

Reply with one of:
  CONFIRMED — bug is real, repro matches
  PARTIAL   — symptom real, repro doesn't match exactly, suggest revised repro
  REFUTED   — code does <Z'> not <Z>; here's the trace
  NEEDS-INFO — can't verify without <X>

Verifier MUST NOT see:

Auditor's reasoning beyond the symptom/repro claim
Other candidates in this chunk
Other confirmed bugs in this or any other chunk
Audit-internal memory or chat history

Otherwise it's a context-correlated rubber stamp, not independent verification.
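
One way to keep auditor reasoning out of the verifier's context is to build the prompt from the structured fields and nothing else. The sketch below follows the framing above; the function name and parameter names are hypothetical, and the reply options are condensed into one line.

```python
MAX_CONTRACT_REFS = 3  # "max 3 docs" from the verifier framing


def build_verifier_prompt(file_line, symptom, repro_input, expected, actual, contract_refs):
    """Assemble the verifier prompt from the structured claim fields only."""
    if len(contract_refs) > MAX_CONTRACT_REFS:
        raise ValueError(f"at most {MAX_CONTRACT_REFS} contract refs allowed")
    refs = ", ".join(contract_refs) if contract_refs else "none"
    return (
        f"Read {file_line} and the surrounding function ONLY. The claim is: {symptom}.\n"
        f"The supposed reproduction is: input {repro_input}, "
        f"expected {expected}, actual {actual}.\n"
        f"Contract refs to read before judging: {refs}.\n"
        "\n"
        "Reply with one of: CONFIRMED, PARTIAL, REFUTED, NEEDS-INFO.\n"
    )
```

Since the function has no parameter for the hunch or trace, there is no path by which candidate-scratch speculation can leak into the prompt.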

Stop conditions per chunk

Restart the chunk with tighter framing if:

Verifier rejects more than 30% of the candidates sent for verification → "what counts as a bug" threshold is too loose
Candidate count exceeds 50 in one chunk → scope too broad, split it
Auditor produces a finding that matches a listed intentional pattern → re-read this doc
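
The three stop conditions are simple enough to check after every verifier batch. A minimal sketch, assuming the coordinator tracks the four counters itself (all names are hypothetical):

```python
def chunk_restart_reasons(rejected: int, verified: int,
                          candidate_count: int, intentional_hits: int) -> list[str]:
    """Return the stop conditions that currently apply to a chunk (empty = keep going)."""
    reasons = []
    # Integer comparison avoids float rounding at exactly 30%.
    if verified > 0 and rejected * 100 > verified * 30:
        reasons.append("verifier rejection rate above 30%")
    if candidate_count > 50:
        reasons.append("more than 50 candidates in one chunk")
    if intentional_hits > 0:
        reasons.append("finding matches a known intentional pattern")
    return reasons
```

Both thresholds are strict: exactly 30% rejected or exactly 50 candidates does not trigger a restart under this reading of "> 30%" and "exceeds 50".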

PHASE 2 — FIX LOOP

One bug at a time, starting at S-1 of the highest-priority chunk, then M-1, then L-1. Skip N (needs-input) until user resolves.

Per-bug goal

/goal <BUG-ID> in <bugs-chunk.md> is marked "fixed", the fix is applied at the cited file:line, the bug's reproduction recipe now returns Expected not Actual, no other bugs.md entries were changed, no unrelated code was modified, any tests covering the affected code still pass (or new test added if none existed), version bump applied if extension files touched

Replace <BUG-ID> with the actual ID (e.g. S-1).

Fix verification gate

Before marking status: fixed:

Re-run the bug's reproduction recipe — must now produce Expected, not Actual
Per-file test re-run: if tests/ or fixtures/ cover the affected file, re-run them, all must pass
If no test existed for the now-fixed behavior: write one, place under tests/ or fixtures/
If extension code changed: bump manifest.json version (per feedback_extension_version_bump.md — one bump per user-requested update, visible reload-verification signal)
Do NOT touch: any other bug entry, any file marked DO NOT FIX in code comments, any intentional pattern listed above
Update the bug entry with Status: fixed and a Fix: line citing the new file:line of the change

After completing all fixes in a chunk

Run the chunk's full test suite, not just per-file tests. Catches cross-bug interactions (e.g. fix for S-1 in rcjav/cache.py interacts with fix for M-2 in rcjav/dupes.py).

PHASE 3 — FINAL VERIFICATION

/goal all bugs in bugs-*.md files under D:\DEV\Project\rclone-jav\ are marked "fixed", "skipped" (with reason), or "needs-input" (awaiting user); D:\DEV\Project\rclone-jav\verification.md exists confirming a final audit of every modified file finds no new bugs introduced by the fixes; verification.md lists each fixed BUG-ID + its commit/edit and the repro-now-passes proof; every "Mirror check needed in:" entry resolved (either no mirror bug found, or new bug filed in target chunk); manifest.json version is incremented appropriately

verification.md format

# Verification — <ISO date>

Original snapshot: audit-snapshot-<ISO>.md
Final snapshot: audit-snapshot-<final ISO>.md

## Fix summary
- S-1 (bugs-python.md): fixed at <file:line>. Repro now returns Expected (was Actual). Test added: <test path>.
- M-1 (bugs-extension-bg.md): fixed at <file:line>. Existing test <name> still passes.
- ...

## Mirror checks resolved
- S-3 mirror in bugs-host.md: scanned `handle_search` for same contract issue, NOT present.
- M-2 mirror in bugs-python.md: FOUND same issue → filed as M-7 in bugs-python.md, fixed at <file:line>.

## Skipped
- L-3 (bugs-host.md): skipped — `<reason>` (e.g. user decision, deferred to next audit)

## Needs input
- N-1 (bugs-extension-options.md): awaiting user clarification on <question>

## Final pass
- Files modified during fix phase: <list>
- Independent re-audit of those files: <date>, <verifier agent>, found 0 new bugs / found <N> new bugs (back to PHASE 1)
- All `bugs-*.md` files: zero entries with status `open`
- Extension manifest.json: version <X> → <Y> (bumped per shipped change)
- All existing tests pass: <test runner output summary>
- Fixture corpus runs: <Python runner + Node runner exit codes>

ANTI-HALLUCINATION RULES (enforced — not optional)

No bug without file:line — line range only acceptable if symptom is genuinely multi-line
No bug without reproduction recipe with concrete input / expected / actual
Verifier MUST be fresh-context — same agent re-reading the claim is not independent
Verifier reads only cited contract docs, not the whole project memory pile — bounded context preserves independence
One bug per fix session — no batch fixes even for "obviously similar" findings
DO NOT FIX banners + intentional patterns are untouchable — listed in this doc + AGENTS.md / mockups
Severity is criteria-based, not vibes-based — Severe = data loss/crash/silent-wrong; Moderate = degraded observable; Light = misleading message / minor
Forbidden hedge language in confirmed bugs: "could be", "might", "potentially", "in theory", "may cause", "possibly". If you can't trace it concretely, demote to Needs Input or candidate scratch.
No speculative race conditions — race must have observable user-visible repro recipe, not just "concurrent code path exists"
Reference contracts, not preferences — bugs cite what code SHOULD do per a doc/comment/test, not what auditor thinks would be nicer
No bug for missing feature — that's a TODO, goes in TODO.md not bugs.md
Phase 1 is read-only except audit docs — see allowed-write list above
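
The hedge-language rule lends itself to a mechanical check before a candidate is promoted. The sketch below scans text for the forbidden phrases listed above; the function name is hypothetical, and it should be run on filled-in entry fields only, since the template text itself quotes the forbidden words.

```python
import re

HEDGE_PHRASES = ("could be", "might", "potentially", "in theory", "may cause", "possibly")

_HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(phrase) for phrase in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(text: str) -> list[str]:
    """Return each forbidden hedge phrase found in text, lowercased, in order."""
    return [match.group(1).lower() for match in _HEDGE_RE.finditer(text)]
```

A non-empty result means the entry goes back to Needs Input or candidate scratch instead of a confirmed severity section.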

Final-pass readability checklist (run before any audit)

Before Phase 1 starts, re-read this doc and verify:

Every "intentional pattern" line has been verified against current code OR cites a current doc that exists right now
Any old memory/session claim that conflicts with current files has been removed or softened
Phase 1 allowed-write list is explicit and current
Candidates clearly separated from confirmed bugs (different files, different formats)
Verifier prompt includes contract_refs: and does NOT include auditor reasoning
Stop conditions are present (30% rejection, 50 candidates)
Mirror check scope is narrowly defined (contract boundaries only)
Excluded paths are current (no missing dirs, no dead refs)

If any check fails, fix this doc before starting audit.

NOTES

Run audit goals from the CLI project root: cd D:\DEV\Project\rclone-jav && claude — even when auditing extension files, output stays in this folder
Extension folder and CLI folder are separate git repos — verify with git status in each before audit so you're auditing a known snapshot
Per-project memory at C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\ carries feedback rules — read those at audit start, they override default audit behavior
The extension repo currently has uncommitted modifications (hybrid state from codex's roadmap work + later edits). Snapshot captures this state; option (b) accepts dirty + records what was dirty. No auto-stash.

Appendix — Recommended agent topology (Claude Code / multi-agent runners)

This appendix is OPTIONAL — the plan above is portable to any /goal-style runner. If you're running it in Claude Code or a similar multi-agent tool, this section describes how to map the independence + parallelism requirements onto explicit agent calls. Operators using a different runner can ignore this appendix without losing the plan's structure.

Role map

Main Coordinator (the session you start the audit from)

Owns the snapshot file (audit-snapshot-<ISO>.md)
Launches Chunk Auditor agents (parallel allowed)
Collects produced bugs-candidates-<chunk>.md files
Launches Verifier agents per candidate (or small batch)
Promotes CONFIRMED / PARTIAL findings into bugs-<chunk>.md
Drives Phase 2 fix loop one bug at a time
Launches Final Re-Audit agents in Phase 3
The only role with write access to multiple files

Chunk Auditor Agents (one per scope chunk)

Canonical agent type: Explore (read-only, fast)
Parallel allowed once snapshot is written
Inputs: chunk file list, snapshot ID, required-reading docs, this plan's "Known intentional patterns" + "Not on this list — let auditor surface" sections
Output: bugs-candidates-<chunk>.md ONLY (no confirmed-bug writes; coordinator promotes)
Must cite file:line + candidate repro; hedge language permitted in candidates
Must NOT: edit product code, edit another chunk's candidate file, write to confirmed bug files

Verifier Agents (fresh context per candidate, or small candidate batch from same file)

Canonical agent type: Explore (read-only, blind)
Fresh context — NO prior audit-history visibility
Inputs (and ONLY these):
- file:line of the claim
- Symptom (one sentence)
- Reproduction recipe
- contract_refs: list (max 3 docs)
Must NOT see: auditor reasoning, the candidate file as a whole, other candidates, other chunks' findings, this plan's hedge-language rules (verifier only verifies the specific claim)
Output: one of CONFIRMED / PARTIAL (with revised repro) / REFUTED (with code trace) / NEEDS-INFO (with what's missing)

Fix Phase Agent (Phase 2)

Canonical agent type: main coordinator context OR a single write-capable general-purpose agent
Serial — one bug at a time
No parallel fixes even for "obviously similar" bugs
Inputs: the one bug entry being fixed, full file context, project memory
Outputs: code edits, bug entry status update, test additions if needed
Re-runs the bug's repro recipe and per-file tests before marking fixed

Final Re-Audit Agents (Phase 3)

Canonical agent type: Explore (read-only)
One per modified-file group or per chunk that had fixes
Inputs: list of files modified during Phase 2, this plan
Output: confirmation of no new bugs introduced, OR new bug entries if found (which loop back to Phase 1)

File-ownership rules (prevent merge collisions)

Each Chunk Auditor owns ONLY its own bugs-candidates-<chunk>.md
Each Verifier writes nothing to disk — returns a structured response to the coordinator
Coordinator owns bugs-<chunk>.md, audit-snapshot-<ISO>.md, and verification.md
Fix Phase Agent owns the code files being edited + the bug entry being marked fixed
No two agents share write access to the same file at any time

Parallelism rules

Phase 1: chunks may be audited in parallel ONLY after the snapshot is written. Parallel auditors must not edit product code or each other's output files. Coordinator dispatches all 5 chunk Agent calls in a single message for max throughput.
Verifier dispatch: within a chunk, verifiers for distinct candidates may run in parallel. Verifiers for candidates that cite the SAME file must run sequentially (avoids verifier-context cross-contamination if a verifier loads file context that affects another).
Phase 2: strictly serial. One bug per Agent call. No parallelism.
Phase 3: re-audit agents may run in parallel by file group.
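
The verifier dispatch rule (parallel across files, sequential within a file) amounts to grouping candidates by cited file. A minimal sketch, assuming each candidate is a `(candidate_id, file_path)` pair and that Windows paths compare case-insensitively:

```python
from collections import defaultdict


def plan_verifier_lanes(candidates):
    """Group candidate ids into lanes: one lane per cited file.

    Lanes may run in parallel; ids inside a lane are verified one after another.
    """
    lanes = defaultdict(list)
    for candidate_id, file_path in candidates:
        lanes[file_path.lower()].append(candidate_id)
    return list(lanes.values())
```

Each verifier inside a lane is still a fresh invocation; the lane only controls ordering, not context reuse.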

Canonical Agent tool calls (Claude Code specific)

Coordinator-level pseudocode:

# Phase 1 — parallel chunk audit
Agent(subagent_type="Explore", description="Audit chunk 1 Python CLI",
      prompt="<chunk 1 inputs + this plan's required reading + intentional patterns + output target>")
Agent(subagent_type="Explore", description="Audit chunk 2 native host", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 3 ext SW+content", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 4 ext options", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 5 ext popup+bulk", prompt="<...>")
# all 5 dispatched in one message → run in parallel

# Phase 1 — verifier per candidate
for candidate in bugs-candidates-<chunk>.md:
    Agent(subagent_type="Explore", description=f"Verify {candidate.id}",
          prompt="<file:line + symptom + repro + contract_refs ONLY — no auditor reasoning>")

# Phase 2 — serial fix loop
for bug in confirmed_bugs_sorted_by_severity:
    Agent(subagent_type="general-purpose", description=f"Fix {bug.id}",
          prompt="<single bug entry + repro + verification gate rules>")
    # wait for completion, verify repro now passes, mark fixed

# Phase 3 — final re-audit
for modified_file_group in fix_phase_diff:
    Agent(subagent_type="Explore", description=f"Re-audit {modified_file_group}", prompt="<...>")

Anti-correlation rules (preserve verifier independence)

Coordinator must NOT pass auditor reasoning to verifier — only the structured claim
Coordinator must NOT pass the candidate file's full text to verifier — only the one candidate's fields
Each verifier call is a fresh Agent invocation — never reuse a verifier agent across candidates
If a verifier rejects a claim, do NOT immediately re-verify with another agent hoping for CONFIRMED — that's correlation-chasing. Demote the candidate to REFUTED, log in candidates file, move on.
Track verifier rejection rate per chunk (see Stop Conditions). If rejection >30%, the auditor's threshold is wrong, not the verifiers'.
