rclone-jav/bugs-extension-bg.md
admin f7fc15b17c Sync working tree before initial Gitea push
Includes:
- cli.py path fix (parents[1]) for config/catalog resolution
- Library cleanup feature design docs (TODO.md, mockup)
- Audit + bug-queue markdowns from May 2026 reliability pass
- .gitignore expanded for transient artifacts
2026-05-26 22:35:42 +02:00

Bug Report — Extension SW + content + manifest — audit-snapshot-2026-05-24T15-55Z.md

Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 3 auditor)
Verifier agents: fresh Explore agents per candidate, blind context

This file contains CONFIRMED + PARTIAL findings only. Candidate scratch lives in bugs-candidates-extension-bg.md. REFUTED / NEEDS-INFO candidates stay in scratch with verifier response appended.

Chunk 3 calibration note: S+M verification yielded 4 confirmed bugs with a 50% rejection rate. The auditor over-claimed by missing platform API contracts and code facts (Chrome connectNative keepalive, contextMenus contract, storage.local atomicity scope, Object.assign argument order, content.js function definitions). Light candidates were NOT verified, per the audit-plan stop condition. Revisit the chunk 3 Light (L) candidates only if needed; see bugs-candidates-extension-bg.md.

Cross-chunk re-rank note: Per bugs-fix-queue.md, this chunk's original severity labels were normalized against other chunks. Changes:

  • Original S-1 (recordRpc race) → M-6 in the queue. Demoted because it's diagnostic-log loss, not user-data loss.
  • Original M-1 (maybeNotifyHostError rate-limit race) → L-1 in the queue. The chunk's prior L-1 (Discord) is renumbered L-2 below to avoid a collision. Demoted because over-notification is annoying but recoverable and self-corrects after 10 min.
  • M-2 (context menu after SW eviction) → unchanged, kept M (queue M-2).

Severe (S)

Definition: data loss/corruption · wrong remote operation · persistent broken workflow no recovery · silent success when operation actually failed.

(none in this chunk after re-rank)


Moderate (M)

Definition: operation fails/hangs but user can retry · wrong persisted settings · diagnostic loss that materially blocks investigation · modal/workflow stuck until manual recovery · race causing stale/wrong visible results.

M-2 (queue) — Context menu missing after MV3 SW eviction

  • File: D:\DEV\Extensions\Production\rclone-jav\background.js:766-782 (ensureContextMenu) with callsites at :1019 (settings-changed), :1178 (onInstalled), :1179 (onStartup)
  • Symptom (one sentence): After the MV3 service worker evicts (~30s idle) and a new SW boots from a non-install/non-startup trigger (toolbar click, alarm, message), Chrome has no contextMenus registered and the user's "rclone-jav: Scan" / "rclone-jav: Search ..." entries silently disappear from right-click menus.
  • Why it's a bug: Per Chrome MV3 contract, chrome.contextMenus entries DO NOT persist across SW lifecycle boundaries — they must be re-created on each SW boot. ensureContextMenu is only invoked from: onInstalled (install/update), onStartup (browser boot), and the settings-changed message handler. None of these fire on routine SW evict→wake cycles.
  • Reproduction:
    1. Install extension. Right-click any page → context menu items present ✓
    2. Leave Brave idle for >30s with no extension activity. SW evicts.
    3. Click anything that wakes the SW NOT via onInstalled/onStartup/settings-changed (toolbar icon, alarm, content-script message). New SW boots.
    4. Expected: right-click context menu items still present
    5. Actual: items missing — must reload extension OR change a setting to restore
  • Suggested fix sketch: call ensureContextMenu() at top-level module init in background.js (runs every SW boot)
  • Verifier verdict: CONFIRMED — very high confidence (99%)
  • Contract refs verifier read: Chrome MV3 contextMenus lifecycle
  • Mirror check needed in: any other Chrome API state that must be re-registered per SW boot. chrome.alarms is persistent and chrome.commands is manifest-declared, so contextMenus is the outlier.
  • Status: fixed
  • Fix: D:\DEV\Extensions\Production\rclone-jav\background.js:1193 — added top-level ensureContextMenu(); call at module init scope (NOT inside any addListener / event handler). This runs on every SW evaluation: install, browser startup, idle wake, alarm wake, message wake — covering all paths the prior listener-bound calls missed. Existing onInstalled/onStartup listeners kept as defensive backup; ensureContextMenu calls chrome.contextMenus.removeAll first, so duplicate invocation is idempotent. Manifest bumped 0.1.35 → 0.1.36. JS syntax verified via node --check. Code-trace proof of placement: line 1193 is at module scope (preceded only by other top-level statements like addListener registrations); fires unconditionally on every fresh SW evaluation before any user-event handler. Runtime repro requires user test (reload extension → verify context menu appears → wait 30+ s for SW idle → trigger SW wake via toolbar icon or content script message → right-click any page → expect context menu items still present without needing reload).
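
The M-2 fix can be sketched as follows. This is a minimal illustration, not the extension's actual code: the body of `ensureContextMenu`, the menu ids, and the `makeFakeChrome` helper are all assumptions, and the fake only imitates the `removeAll` and `create` calls so that the sketch runs under Node. It takes the report's premise (menus must be re-created on each SW boot) as given. Because `removeAll` runs first, a repeated call is harmless whether or not earlier entries survived.

```javascript
// Hypothetical stand-in for ensureContextMenu(); ids are illustrative only.
function ensureContextMenu(api) {
  // Clear first so that repeated boots never try to register a duplicate id.
  api.contextMenus.removeAll(() => {
    api.contextMenus.create({ id: "rcjav-scan", title: "rclone-jav: Scan", contexts: ["page"] });
    api.contextMenus.create({ id: "rcjav-search", title: "rclone-jav: Search ...", contexts: ["selection"] });
  });
}

// Minimal in-memory fake of chrome.contextMenus (assumption: the real API
// reports duplicate ids via runtime.lastError; the fake throws instead).
function makeFakeChrome() {
  const items = new Map();
  return {
    items,
    contextMenus: {
      removeAll(callback) {
        items.clear();
        if (callback) callback();
      },
      create(props) {
        if (items.has(props.id)) throw new Error("duplicate id " + props.id);
        items.set(props.id, props);
      },
    },
  };
}

// In the service worker this would be a bare module-scope call,
// e.g. ensureContextMenu(chrome), outside every addListener handler.
const fakeChrome = makeFakeChrome();
ensureContextMenu(fakeChrome); // simulated SW boot #1
ensureContextMenu(fakeChrome); // simulated SW boot #2, after evict and wake
console.log([...fakeChrome.items.keys()]);
```

Calling the function twice stands in for two service-worker evaluations; the menu set ends up identical after each one.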

M-6 (queue) — recordRpc read-modify-write race loses log entries

Re-ranked from chunk S-1 to queue M-6 (diagnostic loss, not user data loss).

  • File: D:\DEV\Extensions\Production\rclone-jav\background.js:155-169 (recordRpc), callsites at :143, :318, :330, :343, :359
  • Symptom: When the native port disconnects with multiple inflight requests, the rolling RPC log loses entries because all pending rejects + the disconnect marker call recordRpc concurrently and each does non-atomic get-then-set on the same storage key.
  • Why it's a bug: recordRpc is async, but callers invoke it fire-and-forget. When port.onDisconnect rejects every pending entry in the same tick, each reject wrapper calls recordRpc concurrently. All callers read the same old array, each sets [newEntry, ...old], and the last set wins. chrome.storage.local gives no atomicity guarantee across a get followed by a set.
  • Reproduction:
    1. Native port disconnects while 3+ requests are inflight (host killed by AV during Check Library batch)
    2. Expected: all 3+ rejected requests + __port_disconnect__ marker land in chrome.storage.local[NATIVE_LOG_KEY]
    3. Actual: only one entry persists; the others silently disappear. Diagnostics → Native messaging log shows misleading picture exactly when user is investigating an outage.
  • Suggested fix sketch: serialize the recordRpc body on a promise chain: declare let _rpcLogLock = Promise.resolve(); once, then run each call as _rpcLogLock = _rpcLogLock.then(async () => { ... }). This is the same pattern the user already applied to _rcjavTrace in tabvault.
  • Verifier verdict: CONFIRMED — high confidence
  • Contract refs verifier read: Chrome storage.local API (no atomicity)
  • Mirror check needed in: options.js settings save flow, options-library-issues.js cache writes, activity log buffer, tabvault caller log (out-of-scope)
  • Status: fixed
  • Fix: D:\DEV\Extensions\Production\rclone-jav\background.js:155-180 — wrapped recordRpc body in promise-chain lock (_rpcLogLock = _rpcLogLock.then(async () => { ... })). Read-modify-write on chrome.storage.local[NATIVE_LOG_KEY] now serializes — concurrent callers chain instead of racing. Pattern mirrors tabvault _rcjavTrace lock and the M-2-follow-up ensureContextMenu lock for the same storage race class. maybeNotifyHostError(entry) still runs OUTSIDE the lock (its own rate-limit storage race is tracked separately as L-1 in the queue; not fixed here per one-bug-per-session rule). Manifest bumped 0.1.41 → 0.1.42. JS syntax verified via node --check. Lock mechanics smoke-tested in isolation with simulated chrome.storage.local (5 ms artificial latency on get/set, 5 concurrent writes): UNLOCKED variant stored only 1 of 5 entries (race confirmed); LOCKED variant stored all 5 entries in correct newest-first order. Mirror checks for options.js / options-library-issues.js storage writes deferred to Phase 3 final verification per audit plan.
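
The race and the promise-chain lock described for M-6 can be reproduced in isolation with something like the sketch below. It is not the code in background.js: `makeFakeStorage`, the key value `"nativeLog"`, and the split into `recordRpcUnlocked` / `recordRpcLocked` are assumptions made so the example runs under Node with a simulated 5 ms storage latency.

```javascript
// In-memory stand-in for chrome.storage.local with artificial latency.
function makeFakeStorage(latencyMs) {
  const data = {};
  const delay = () => new Promise((resolve) => setTimeout(resolve, latencyMs));
  return {
    async get(key) { await delay(); return { [key]: data[key] }; },
    async set(obj) { await delay(); Object.assign(data, obj); },
  };
}

const NATIVE_LOG_KEY = "nativeLog"; // hypothetical key value

// Racy read-modify-write: concurrent callers all read the same old array.
async function recordRpcUnlocked(storage, entry) {
  const got = await storage.get(NATIVE_LOG_KEY);
  const old = got[NATIVE_LOG_KEY] || [];
  await storage.set({ [NATIVE_LOG_KEY]: [entry, ...old] });
}

// Serialized variant: each call is appended to a single promise chain.
let _rpcLogLock = Promise.resolve();
function recordRpcLocked(storage, entry) {
  const run = _rpcLogLock.then(() => recordRpcUnlocked(storage, entry));
  // Swallow failures on the chain itself so one bad write cannot block later ones.
  _rpcLogLock = run.catch(() => {});
  return run;
}

async function demo() {
  const entries = ["req-1", "req-2", "req-3", "req-4", "req-5"];
  const racy = makeFakeStorage(5);
  await Promise.all(entries.map((e) => recordRpcUnlocked(racy, e)));
  const serial = makeFakeStorage(5);
  await Promise.all(entries.map((e) => recordRpcLocked(serial, e)));
  return {
    unlocked: (await racy.get(NATIVE_LOG_KEY))[NATIVE_LOG_KEY],
    locked: (await serial.get(NATIVE_LOG_KEY))[NATIVE_LOG_KEY],
  };
}

demo().then((result) => console.log(result));
```

The locked variant keeps all five entries newest-first, while the unlocked variant loses entries because every caller prepends to the same stale array. The caller still gets the original `run` promise back, so a failed write remains observable even though the chain itself continues.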

Light (L)

Definition: confusing UI · cosmetic stale state · diagnostic annoyance · non-blocking alert issue · two-click recoverable.

L-1 (queue) — maybeNotifyHostError rate-limit get-then-set race

Re-ranked from chunk M-1 to queue L-1.

  • File: D:\DEV\Extensions\Production\rclone-jav\background.js:188-193, callsites via recordRpc at :173
  • Symptom: During a host outage burst (port disconnects with 2+ inflight requests), the 10-minute rate-limit on Discord/notification alerts can fire 2-3 alerts within the same window because the get-then-set on HOST_ALERT_KEY is non-atomic.
  • Why it's a bug (demoted from M to L): Same race pattern as M-6, but the impact is over-notification not data loss. User receives extra alerts during one outage event — annoying but informative. Self-corrects after 10-min window. Not blocking. Not stuck workflow.
  • Reproduction:
    1. Port disconnects with 3 inflight requests
    2. Expected: 1 alert per 10-min window
    3. Actual: 3 alerts for the same incident
  • Suggested fix sketch: wrap get-then-set in Promise lock (same as M-6 fix; can share the lock)
  • Verifier verdict: CONFIRMED — high confidence
  • Mirror check needed in: same as M-6
  • Status: fixed
  • Fix: D:\DEV\Extensions\Production\rclone-jav\background.js:191-247 — added dedicated _hostAlertLock Promise-chain (NOT shared with _rpcLogLock per codex's note — different storage key, different invariant). Entire maybeNotifyHostError body now runs inside the lock: rate-limit read/check/write of HOST_ALERT_KEY, plus the notification create and Discord post that follow. Concurrent calls in the same tick (5+ pending requests rejected on onDisconnect) now properly chain — first caller writes the new lastTs, subsequent callers see the fresh ts and bail at the check. Manifest bumped 0.1.42 → 0.1.43. JS syntax verified via node --check. Lock + rate-limit smoke-tested in isolation with simulated chrome.storage.local (5ms latency): UNLOCKED → 5 of 5 concurrent calls fire alerts (bug confirmed); LOCKED → 1 of 5 concurrent calls fires (correct); LOCKED + 5 sequential within rate-limit window → 1 alert (rate-limit still enforced after the lock change).
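
A rough sketch of the L-1 approach, with the rate-limit check and the alert both inside a dedicated lock, might look like this. The factory `makeNotifier`, the key value, the injected `sendAlert` and `now` parameters, and the fake storage are all hypothetical; only the 10-minute window and the read/check/write order come from the report.

```javascript
const HOST_ALERT_KEY = "lastHostAlertTs"; // hypothetical key value
const RATE_LIMIT_MS = 10 * 60 * 1000;     // 10-minute window from the report

// In-memory stand-in for chrome.storage.local with artificial latency.
function makeFakeStorage(latencyMs) {
  const data = {};
  const delay = () => new Promise((resolve) => setTimeout(resolve, latencyMs));
  return {
    async get(key) { await delay(); return { [key]: data[key] }; },
    async set(obj) { await delay(); Object.assign(data, obj); },
  };
}

// Returns a maybeNotify(entry) function whose whole body runs under one lock.
function makeNotifier(storage, sendAlert, now = Date.now) {
  let lock = Promise.resolve();
  return function maybeNotify(entry) {
    const run = lock.then(async () => {
      const got = await storage.get(HOST_ALERT_KEY);
      const lastTs = got[HOST_ALERT_KEY] || 0;
      // Later callers see the timestamp written by the first and bail here.
      if (now() - lastTs < RATE_LIMIT_MS) return false;
      await storage.set({ [HOST_ALERT_KEY]: now() });
      await sendAlert(entry);
      return true;
    });
    lock = run.catch(() => {}); // keep the chain usable after a failure
    return run;
  };
}

// Usage: five concurrent errors in one tick should produce a single alert.
const sentAlerts = [];
const notify = makeNotifier(makeFakeStorage(5), async (e) => { sentAlerts.push(e); });
const burst = Promise.all(["e1", "e2", "e3", "e4", "e5"].map((e) => notify(e)));
burst.then((fired) => console.log(fired, sentAlerts));
```

Keeping this lock separate from the RPC-log lock matches the report's reasoning: the two chains guard different keys, so a slow alert post does not delay log writes.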

L-2 (queue, was chunk L-1) — Discord post failures have no passive UI surface

  • File: D:\DEV\Extensions\Production\rclone-jav\background.js:230-273 (postDiscordAlert), status write at :265-271
  • Symptom: Discord webhook failures are persisted to chrome.storage.local.lastDiscordSend but only visible by clicking Test buttons — no passive page-load display.
  • Why it's a bug (originally L): Diagnostic data not lost, just not surfaced passively. UX visibility gap.
  • Suggested fix sketch: on Setup pane render, read lastDiscordSend and show "Last alert: · ok|FAILED "
  • Verifier verdict: PARTIAL — symptom real, original "silent failure" framing wrong
  • Status: open
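
One possible shape for the passive status line is sketched below. The report does not show the stored record's fields, so the `{ ts, ok, error }` shape, the function name `formatLastDiscordSend`, and the exact wording of the output are assumptions for illustration only.

```javascript
// Assumed record shape: { ts: number (epoch ms), ok: boolean, error?: string }.
function formatLastDiscordSend(record) {
  if (!record || typeof record.ts !== "number") {
    return "Last alert: none recorded";
  }
  const when = new Date(record.ts).toISOString();
  if (record.ok) {
    return `Last alert: ${when} · ok`;
  }
  // Append the failure reason only when one was stored.
  const reason = record.error ? ` ${record.error}` : "";
  return `Last alert: ${when} · FAILED${reason}`;
}

console.log(formatLastDiscordSend({ ts: 0, ok: false, error: "HTTP 404" }));
console.log(formatLastDiscordSend(undefined));
```

On Setup pane render, the extension would read `lastDiscordSend` from chrome.storage.local and pass the value to a formatter like this before writing it into the page.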

Needs Input (N)

(none)


False Positives (discarded)

  • background.js:90, 100-114, 120-148, 307-365 — flagged as Severe "pending Map orphaned on SW eviction mid-call". REFUTED via Chrome connectNative keepalive contract: an open port keeps the MV3 SW alive; if port closes, onDisconnect rejects all pending (line 139) — no orphans. pulseKeepalive is defensive redundancy. Caveat: if Brave observed diverging, would become Brave-specific bug — not verified.

  • background.js:62-76 (mergeSettings) — flagged as Moderate. REFUTED. Auditor misread Object.assign({}, dv, sv) — defaults go FIRST so missing keys fill from defaults.

  • background.js:895-905 (contextMenu tab.id null) — flagged as Moderate. REFUTED via Chrome contextMenus contract: registered contexts guarantee non-null tab.id. extractIdFromTab also has defensive null check.

  • content.js (escapeOverlay undefined) — flagged as Moderate. REFUTED. Function IS defined at content.js:451. Auditor missed it.
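
The mergeSettings refutation above turns on `Object.assign` argument order, which a short check makes concrete. The setting names and values here are invented for the example; only the `Object.assign({}, dv, sv)` call shape comes from the report.

```javascript
// Later sources overwrite earlier ones, so defaults must come first.
function mergeSettings(dv, sv) {
  return Object.assign({}, dv, sv);
}

const defaults = { timeoutMs: 30000, notify: true }; // hypothetical defaults
const stored = { notify: false };                    // hypothetical stored values

const merged = mergeSettings(defaults, stored);
console.log(merged); // { timeoutMs: 30000, notify: false }
```

The stored `notify` value wins, and the missing `timeoutMs` key is filled from the defaults, which is the behavior the verifier confirmed.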