Bug Report — Native Host — audit-snapshot-2026-05-24T15-55Z.md
- Snapshot: audit-snapshot-2026-05-24T15-55Z.md
- Required-reading docs read: AGENTS.md / mockup / CACHE_CONTRACT.md / bug-audit-plan.md / project memory
- Auditor agent: fresh Explore agent (chunk 2 auditor)
- Verifier agents: fresh Explore agents per candidate, blind context, stricter contract-check prompt + external-vs-internal-input rule
Chunk 2 calibration note: Moderate verification yielded 2 confirmed bugs + 1 demoted (M→L) with 40% pure-rejection rate (2/5 REFUTED). Auditor's recurring weaknesses: (1) flagging gate logic that's fail-SAFE as if it were fail-OPEN (C-1), (2) ignoring browser/protocol-level caps when worrying about host-side validation (C-2). Stricter verifier prompt with external-input + protocol-spec checks caught both false positives. Light candidates were NOT verified per audit-plan stop condition (>30% rejection → halt L verification). See bugs-candidates-host.md for unverified L list (C-6, C-7, C-8, C-9, C-10, C-11) and Needs Input C-12.
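The stop-condition arithmetic in the calibration note can be checked with a minimal sketch. The function name and the `STOP_THRESHOLD` constant are illustrative assumptions; only the counts (2 refuted of 5 verified) and the >30% rule come from the note itself.

```python
def rejection_rate(refuted: int, verified: int) -> float:
    """Fraction of verified candidates that were purely rejected (REFUTED)."""
    if verified <= 0:
        raise ValueError("verified must be positive")
    return refuted / verified


# Chunk 2 Moderate tier: 2 confirmed + 1 demoted (M→L) + 2 refuted = 5 verified.
STOP_THRESHOLD = 0.30  # assumption: encodes the ">30% rejection" audit-plan rule

rate = rejection_rate(refuted=2, verified=5)
halt_light_verification = rate > STOP_THRESHOLD
print(f"{rate:.0%} rejected, halt L verification: {halt_light_verification}")
# prints: 40% rejected, halt L verification: True
```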
Severe (S)
(none flagged by auditor in this chunk)
Moderate (M)
M-1 — post_discord_alert blocks main message loop for up to 5 s
- File: `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:174-289` (`post_discord_alert` refactored into `_discord_post_worker` + `_build_discord_body` helpers + public `post_discord_alert` thin wrapper after M-4 fix; was line 174-217 pre-fix), with callsites in `handle_test_alerts_config` + 4 main-loop sites (conn_close abnormal, read_message exception, handler exception, write_message exception)
- Symptom (one sentence): When a handler exception or abnormal port close fires AND the Discord webhook URL is configured AND Discord is slow/unreachable, the main message loop blocks for up to 5 seconds inside `urllib.request.urlopen(timeout=5)`, delaying the failure response to the extension by the same 5 s.
- Why it's a bug: All 5 callsites of `post_discord_alert` execute on the main thread that runs the native messaging loop. Callsites 2-5 are rate-limited via `_alert_rate_limited()` (LAST_ALERT_FILE check at line 184-185), so only the FIRST exception per 10-minute window blocks. Callsite 1 (`handle_test_alerts_config`) deliberately deletes LAST_ALERT_FILE to bypass rate limiting (line 258) before calling `post_discord_alert`, so every Test (host) button click is a guaranteed 5 s main-thread block when Discord is slow. During the block, the extension's RPC promise hangs waiting for the response.
- Reproduction:
- Input: configure Discord webhook URL pointing at a slow/down endpoint (or kill network). Open Setup → Alerts → click Test (host).
- Expected: test fires asynchronously; UI returns immediately with "sent (still pending)" or similar
- Actual: Options page hangs ~5 s waiting for the host's RPC response, because host's main loop is blocked in urlopen
- Suggested fix sketch: spawn a background thread for `urlopen` (fire-and-forget), or use a 1 s timeout instead of 5 s, or move webhook delivery into a worker queue consumed by a dedicated thread. Mirror the extension-side webhook post pattern (which already uses `fetch().catch(...)` without blocking the SW event loop).
- Verifier agent: fresh Explore, blind context, stricter prompt
- Verifier verdict: CONFIRMED
- Verifier confidence: high
- Contract refs verifier read: native messaging response timing expectations; threading model of `main()`
- Mirror check needed in: extension-side `postDiscordAlert` in background.js (already non-blocking via fetch, but verify pattern consistency)
- Status: fixed
- Fix: `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:174-289`. Refactored `post_discord_alert` into a shared internal worker (`_discord_post_worker`) + helper (`_build_discord_body`). Two public modes: (a) `post_discord_alert(...)` spawns a daemon thread and returns immediately (used by 4 main-loop callsites: conn_close, read_error, handler_exception, write_error; each now passes an `alert_source` label for analytics); (b) `handle_test_alerts_config` builds the payload, spawns the same worker with event + holder, waits 6 s, and returns a synchronous pass/fail or the explicit timeout error `"Discord webhook timed out after 6s; background post may still complete (see events.log)"`. The worker logs every outcome via `log_event("discord_post", ok=, status=, error=, alert_kind=, alert_source=, elapsed_ms=)`, so visibility is preserved despite async execution. Error text is capped at 120 chars; the webhook URL and full payload are never logged. Main message loop no longer blocks on Discord. Manifest bumped 0.1.38 → 0.1.39. Python syntax verified via `py_compile`. Worker mechanics smoke-tested in isolation: bogus URL → 404 ok:False; bad domain → URLError ok:False with reason captured; fire-and-forget mode (no event/holder) → no raise. Test button still returns synchronous pass/fail for user experience.
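The two modes described in the M-1 fix can be sketched roughly as follows. This is not the code in `rcjav-host.py`: the names `_http_post`, `_post_worker`, `post_alert`, and `post_alert_and_wait` are hypothetical, and the injectable `send` parameter is an assumption added so the sketch can be exercised without a real webhook.

```python
import json
import threading
import time
import urllib.error
import urllib.request


def _http_post(url, body, timeout):
    """Default sender: POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def _post_worker(url, body, done=None, holder=None, send=_http_post):
    """Runs on a background thread; never raises into the caller."""
    started = time.monotonic()
    result = {"ok": False, "status": None, "error": None}
    try:
        result["status"] = send(url, body, 5.0)
        result["ok"] = 200 <= result["status"] < 300
    except urllib.error.HTTPError as exc:  # non-2xx response such as 404
        result["status"] = exc.code
        result["error"] = str(exc)[:120]
    except Exception as exc:  # URLError, socket timeout, and similar
        result["error"] = str(exc)[:120]  # error text capped at 120 chars
    result["elapsed_ms"] = int((time.monotonic() - started) * 1000)
    if holder is not None:
        holder.update(result)  # publish the result before signalling
    if done is not None:
        done.set()


def post_alert(url, payload, send=_http_post):
    """Fire-and-forget mode: start the worker and return immediately."""
    body = json.dumps(payload).encode("utf-8")
    thread = threading.Thread(
        target=_post_worker, args=(url, body), kwargs={"send": send}, daemon=True
    )
    thread.start()
    return thread


def post_alert_and_wait(url, payload, wait_s=6.0, send=_http_post):
    """Test-button mode: same worker, but wait briefly for a pass/fail."""
    body = json.dumps(payload).encode("utf-8")
    done, holder = threading.Event(), {}
    threading.Thread(
        target=_post_worker,
        args=(url, body, done, holder),
        kwargs={"send": send},
        daemon=True,
    ).start()
    if not done.wait(timeout=wait_s):
        return {"ok": False, "error": f"timed out after {wait_s:g}s; post may still complete"}
    return dict(holder)
```

In this shape the message loop only pays for thread creation; the blocking `urlopen` call lives entirely on the worker thread, and the Test path bounds its own wait instead of inheriting the socket timeout.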
M-2 — handle_scan returns success before _scan_worker can detect Popen failure
- File: `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2235-2264` (handle_scan) + `:2053-2110` (_scan_worker Popen path) + `:2211-2220` (_scan_worker exception path)
- Symptom (one sentence): When `subprocess.Popen` in `_scan_worker` fails (python missing, rc-jav.py path wrong, permission denied, etc.), `handle_scan` has already returned `{"ok": True, "started": True}` to the extension because the thread was started but had not yet executed Popen; the extension shows "scan started" for 1-2 seconds before the next `scan-progress` poll surfaces the actual error.
- Why it's a bug: `handle_scan` calls `thread.start()` at line 2263 then returns at line 2264 without waiting for Popen to succeed. If Popen raises (line 2092-2098), the worker's exception handler writes `scan_ok: false, error: ...` to SCAN_STATE_FILE (line 2211-2220), but the extension has already received `ok: true` and only learns of the failure on the next progress poll. Race window: short (1-2 s typically) but user-visible; the UI shows "scan started" then suddenly "scan failed" with a cryptic OS-level error.
- Reproduction:
- Input: trigger Rebuild Cache from extension while python is not on PATH (or rc-jav.py path mis-set, or cwd has permission issue)
- Expected: handle_scan returns an error immediately so extension can show clear message before any "started" state
- Actual: extension shows "scan started" briefly → next poll → "scan failed: FileNotFoundError" or similar OS error
- Suggested fix sketch: validate Popen preconditions synchronously in `handle_scan` before returning (python exists, rc-jav.py exists, cwd writable), OR use a sync event/queue from the worker to `handle_scan` so it can wait briefly for the first state-file write before returning.
- Verifier agent: fresh Explore, blind context, stricter prompt
- Verifier verdict: CONFIRMED
- Verifier confidence: very high (100%)
- Contract refs verifier read: _scan_worker exception path; SCAN_STATE_FILE write timing; handle_scan_progress detection logic
- Mirror check needed in: none — Popen race specific to scan path; other RPCs run handlers synchronously
- Status: fixed
- Fix: `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2053-2305`. Added a per-invocation `spawn_event` (threading.Event) + `spawn_result` dict, both passed from `handle_scan` into `_scan_worker`. The worker sets `spawn_result["spawn_ok"] = True` immediately after `subprocess.Popen` returns, OR `spawn_ok = False` + `error` on exception, then sets the event. `handle_scan` waits up to 500 ms via `spawn_event.wait(timeout=0.5)` then branches: spawn_ok=True → `{ok: true, started: true}`; spawn_ok=False → `{ok: false, started: false, error}`; timeout → `{ok: true, started: true, startup_pending: true}` (backward compatible: existing UI ignores the new key). The per-invocation holder isolates the handoff from globals (`_scan_proc`) and the state file (UI/progress surface), so cross-invocation contamination is impossible. Manifest bumped 0.1.36 → 0.1.37. Python syntax verified via `py_compile`. Threading harness smoke-tested in isolation: success → `{spawn_ok: True}` + event set; Popen fail (nonexistent binary) → `{spawn_ok: False, error: "[WinError 2] ..."}` + event set; slow Popen → event NOT set after 500 ms (timeout branch fires). All 3 cases behave correctly. Runtime repro verified via temporary instrumentation: injected `raise FileNotFoundError("simulated spawn fail")` immediately before the `subprocess.Popen` line in `_scan_worker`, reloaded the extension, triggered Rebuild Cache, and the UI showed `scan failed: FileNotFoundError: simulated spawn fail` synchronously with no misleading "scan started" flash. Instrumentation reverted post-test; manifest stayed at 0.1.37 because there was no code-of-record change. Note: the bad-rcjavPath test (point Setup → rcjavPath to a non-existent path) does NOT exercise this fix path; that goes through Popen success → rc-jav.py exits 2 → existing async exception handler. M-3 specifically targets Popen-itself-raising, which is reachable via Python-on-PATH missing, OS permission denied at spawn time, or analogous OS-level interference. Use the instrumented-raise technique for any future regression test.
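The spawn handshake described in the M-2 fix can be sketched as below. The functions `scan_worker_sketch` and `handle_scan_sketch` are simplified, hypothetical stand-ins for the real `_scan_worker` and `handle_scan`; progress parsing and state-file writes are omitted, and the command line is supplied by the caller.

```python
import subprocess
import sys
import threading


def scan_worker_sketch(cmd, spawn_event, spawn_result):
    """Report the Popen outcome first, then do the long-running part."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    except OSError as exc:  # FileNotFoundError, PermissionError, and similar
        spawn_result["spawn_ok"] = False
        spawn_result["error"] = f"{type(exc).__name__}: {exc}"
        spawn_event.set()
        return
    spawn_result["spawn_ok"] = True
    spawn_event.set()  # the handler may now answer the extension
    for _raw in proc.stderr:  # placeholder for progress parsing
        pass
    proc.wait()


def handle_scan_sketch(cmd, wait_s=0.5):
    """Start the worker, then wait briefly for its spawn verdict."""
    spawn_event = threading.Event()  # per-invocation, never shared
    spawn_result = {}
    threading.Thread(
        target=scan_worker_sketch,
        args=(cmd, spawn_event, spawn_result),
        daemon=True,
    ).start()
    if not spawn_event.wait(timeout=wait_s):
        # Verdict not in yet: keep the old optimistic answer, flag it as pending.
        return {"ok": True, "started": True, "startup_pending": True}
    if spawn_result.get("spawn_ok"):
        return {"ok": True, "started": True}
    return {"ok": False, "started": False, "error": spawn_result.get("error", "unknown")}
```

Because the event and result dict are created inside the handler, a late verdict from an earlier invocation cannot be mistaken for the current one, which is the isolation property the fix notes call out.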
Light (L)
L-1 — Stderr blocking read freezes progress display for up to 5 s on rc-jav stall
- File: `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2053-2227` (_scan_worker), specifically `:2101` (stderr iterator loop), `:2267-2275` (deferred kill)
- Symptom (one sentence): When rc-jav.py stalls mid-scan (e.g. rclone blocked on unresponsive remote), the `for raw in proc.stderr:` iterator at line 2101 blocks until either a stderr line arrives or proc exits; during that time the scan-state file is not updated, so the extension's progress display shows stale state for up to 5 s (until the deferred-kill mechanism forces proc.terminate).
- Why it's a bug (demoted from M to L): Originally flagged as M. Re-verifier confirmed the blocking is real, but no data loss occurs, cancel still works (delayed by up to 5 s as terminate fires), and no zombie process is left behind. Pure UX progress-freeze, not workflow-breaking.
- Reproduction:
- Input: rclone remote becomes unresponsive mid-scan
- Expected: progress display updates with "stalled, will cancel in N s" indicator, OR heartbeat that resumes when remote recovers
- Actual: progress frozen for 5 s, then deferred kill fires, scan marked complete with last-known progress
- Suggested fix sketch: add a watchdog timer that emits a heartbeat to SCAN_STATE_FILE every 1-2 s while stderr is silent, OR use non-blocking stderr reads with select/poll (cross-platform via threading)
- Verifier agent: fresh Explore, blind context, stricter prompt
- Verifier verdict: PARTIAL — symptom real, severity originally over-stated
- Verifier confidence: high (100%)
- Contract refs verifier read: cancel path; deferred-kill behavior; SCAN_STATE_FILE update timing
- Mirror check needed in: none
- Status: open
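One possible shape for the heartbeat option in the L-1 fix sketch, assuming the blocking stderr read is moved to a reader thread and the progress loop polls a queue with a timeout. The names `_pump`, `watch_stderr`, and the `write_state` callback (standing in for the SCAN_STATE_FILE writer) are hypothetical, and this bug is still open, so nothing here reflects shipped code.

```python
import queue
import subprocess
import sys
import threading
import time


def _pump(stream, q):
    """Reader thread: the blocking reads happen here, not in the progress loop."""
    for raw in stream:
        q.put(raw)
    q.put(None)  # sentinel: stream closed


def watch_stderr(proc, write_state, heartbeat_s=1.0):
    """Forward stderr lines to write_state; emit a heartbeat while stderr is silent."""
    q = queue.Queue()
    threading.Thread(target=_pump, args=(proc.stderr, q), daemon=True).start()
    last_line_at = time.monotonic()
    while True:
        try:
            raw = q.get(timeout=heartbeat_s)
        except queue.Empty:
            # No output for heartbeat_s seconds: report the stall instead of freezing.
            silent_s = round(time.monotonic() - last_line_at, 1)
            write_state({"heartbeat": True, "silent_s": silent_s})
            continue
        if raw is None:
            break
        last_line_at = time.monotonic()
        line = raw.decode("utf-8", "replace").rstrip()
        write_state({"heartbeat": False, "line": line})
    proc.wait()
```

A reader thread plus `queue.Queue.get(timeout=...)` is used here rather than `select`/`poll` because it behaves the same on Windows pipes, which matches the "cross-platform via threading" remark in the fix sketch.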
Needs Input (N)
(C-12 from candidates was N — _load_host_cache memoization key collision — left unverified per stop condition; candidate scratch retains it)
False Positives (discarded)
- `host/rcjav-host.py:1216-1221` (_path_in_allowed_prefixes case-sensitivity): flagged as Moderate "security bypass via uppercase remote". REFUTED. The gate is fail-SAFE, not fail-OPEN: a case mismatch causes the comparison to fail, which REJECTS the operation. No bypass possible. Verifier noted a related usability issue (legitimate uppercase paths get a confusing rejection), but that's a UX gap, not a security bug.
- `host/rcjav-host.py:306-316` (read_message unbounded length prefix): flagged as Moderate "DoS via 4 GiB length". REFUTED. Chrome native messaging protocol caps extension-to-host messages at 64 MiB browser-side per Chrome dev docs. Non-Brave processes cannot write to host stdin (it's piped by the browser into the host child process). The theoretical 4 GiB read cannot actually be triggered through any practical attack surface. Pure defensive-coding gap, not a real DoS.
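For reference, the defensive-coding gap noted for `read_message` could be closed with a bounded read along these lines. This is a sketch, not a proposed patch: `read_message_bounded` and `MAX_MESSAGE_BYTES` are hypothetical names, and the cap value simply mirrors the 64 MiB browser-side limit cited above. Native messaging frames each message as a 32-bit length in native byte order followed by UTF-8 JSON.

```python
import io
import json
import struct

# Assumption: mirror the browser-side cap cited in the finding above.
MAX_MESSAGE_BYTES = 64 * 1024 * 1024


def read_message_bounded(stream, max_bytes=MAX_MESSAGE_BYTES):
    """Read one length-prefixed JSON message; return None on clean EOF."""
    header = stream.read(4)
    if len(header) == 0:
        return None  # browser closed the pipe between messages
    if len(header) < 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack("=I", header)  # native byte order, unsigned 32-bit
    if length > max_bytes:
        # Refuse before allocating: the declared size is untrusted input.
        raise ValueError(f"declared length {length} exceeds cap {max_bytes}")
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated payload")
    return json.loads(payload.decode("utf-8"))
```

In a host process the stream would typically be `sys.stdin.buffer`; an `io.BytesIO` works for exercising the function in isolation.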