Compare commits

10 Commits

Author SHA1 Message Date
admin f7fc15b17c Sync working tree before initial Gitea push
Includes:
- cli.py path fix (parents[1]) for config/catalog resolution
- Library cleanup feature design docs (TODO.md, mockup)
- Audit + bug-queue markdowns from May 2026 reliability pass
- .gitignore expanded for transient artifacts
2026-05-26 22:35:42 +02:00
admin 8d6bdb81af Add Node-side fixture runner — both sides now exercise the corpus
Mirrors `content.js` normalizeId() in a self-contained
`fixtures/run-node.mjs`. Loads `query-extraction.json` and
`shared-normalization.json` and asserts each case the same way the
Python runner does.

content.js can't be imported directly — it lives inside an injected
IIFE in the extension — so the runner duplicates the regexes
(ID_RE_DASHED, ID_RE_UNDASHED, BUILTIN_ID_NORMALIZERS). Inline
comment + README update flag that they must be kept in sync.

Why this matters: `shared-normalization.json` now actually catches
cross-side drift. A case that passes one side but fails the other is
the canary — without a Node runner, the contract was aspirational.

Verified:
  $ node fixtures/run-node.mjs
  query-extraction.json     -> normalizeId (10 cases): 10 passed
  shared-normalization.json -> normalizeId (5 cases):  5 passed
  OK: all 15 cases passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 11:18:52 +02:00
admin b9a24b3fb5 Step 11: benchmark host fast-path, decision = keep
Adds benchmarks/host-fast-path.py and benchmarks/README.md.

The benchmark compares two paths for a cached single-ID search:
  1. fast-path: in-process dict walk inside the native host
     (handle_cached_search_fast in rcjav-host.py)
  2. subprocess: shell out to `rc-jav.py --search ID --cache --format json`

Idle baseline against the live 7124-file cache (5 queries × 5 iter):

  fast-path:   median 0.46ms  p95 0.61ms  max 0.72ms
  subprocess:  median 919ms   p95 1233ms  max 1385ms
  median speedup: 2000x

Decision: keep the fast path. The ~920ms subprocess cost is dominated
by Python interpreter startup + 1.3MB cache.json parse. That's
structural — it applies under idle Python too, not just when a scan
is running. The "Python actively scanning" condition from the original
roadmap doesn't change the verdict; it would only make the subprocess
path even slower while leaving the in-process path unaffected (the
fast path doesn't touch the scanning process).

The fast path is already correctly scoped — bails out for wildcards,
ranges, name searches, and --quick mode. Narrowing further would just
push more queries through the slow path with no upside.

Possible follow-up (not in scope here): memoize _load_host_cache with
mtime-based invalidation so the fast path doesn't reparse cache.json
on every call. Current per-call median (0.46ms) is already fast enough
that this is optional.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 11:12:33 +02:00
admin 66f82eb214 Add --print-rules-info CLI flag for host cache freshness lookup 2026-05-22 22:08:05 +02:00
admin 33c495ad57 Step 10j (Python side): cache contract + --reextract command
Implements the two-tier contract from docs/CACHE_CONTRACT.md (extension
repo, locked at step 9):

  cache_schema       on-disk shape; mismatch -> force rebuild
  id_rules           bumps when extraction rules change
  id_rules_signature sha256 over canonical rule text; catches drift
                     when the integer bump is forgotten

New constants in rcjav/cache.py:

  CACHE_SCHEMA_VERSION = 1
  ID_RULES_VERSION = 1     (the legacy "version: 3" cache reads as
                            id_rules: 0 after in-place migration)

New helpers:

  rcjav.ids.current_rules_signature()
      Sha256 over the canonical text of every rule that influences
      a jav_id: built-in regexes, BUILTIN_PART_RES, PART_RES (which
      captures user-added part patterns), FC2 handling.

  rcjav.cache.load_cache(signature=None)
      Reads cache.json. Legacy `version: 3` headers get an in-place
      header upgrade with no forced rescan; the cache is stamped as
      `id_rules: 0` + signature "legacy" so it surfaces as
      "stale by rules" in cache_state. Schema mismatch on the new
      header still forces a rebuild.

  rcjav.cache.cache_state(cache, signature)
      Classifies a cache as "fresh" / "stale_by_rules" /
      "schema_mismatch". Drives the three-state extension UX.

  rcjav.cache.stamp_current_rules(cache, signature)
      Updates id_rules and id_rules_signature in place. Called after
      a successful full scan or --reextract.

New CLI command:

  rc-jav.py --reextract

Walks `cache["remotes"][r]["files"]` against the live rule set and
updates `jav_id` in place. No rclone calls — fast path (seconds on
a 7k-file cache). Reports changed/unchanged/dropped per remote.
Stamps current rules into the saved cache.

--scan (full, no --scan-since) now also stamps current rules.
--scan --scan-since deliberately does NOT stamp: it only re-walks
recently-modified files, so older entries may still carry jav_ids
from previous rules; cache stays "stale by rules" until a full scan
or --reextract.

Verified:
  - python rc-jav.py --reextract --format json on the live 7124-file
    cache → 0 changes (existing IDs already canonical), cache.json
    rewritten with new header
  - cache_state on the post-migration cache → "fresh"
  - tests + fixtures + --help all pass

Extension-side (host's cache_status response + options-cache.js
three-state UX + Re-extract IDs button) ships in a separate commit
in the extension repo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 22:07:13 +02:00
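The two-tier contract in commit 33c495ad57 above can be illustrated with a minimal sketch. This is not the code in `rcjav/cache.py`: the header keys are assumed to sit at the top level of the cache dict, `rules_signature` is a hypothetical stand-in for `rcjav.ids.current_rules_signature()`, and "canonical" is assumed to mean sorted, newline-joined rule text.

```python
import hashlib

# Values taken from the commit message; header layout is an assumption.
CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def rules_signature(rule_texts):
    """SHA-256 over a canonical (sorted, newline-joined) form of the rule texts."""
    canonical = "\n".join(sorted(rule_texts)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def cache_state(cache, signature):
    """Classify a cache header as 'fresh', 'stale_by_rules', or 'schema_mismatch'."""
    if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"
    # Either a forgotten integer bump or a changed rule text marks the cache stale.
    if cache.get("id_rules") != ID_RULES_VERSION:
        return "stale_by_rules"
    if cache.get("id_rules_signature") != signature:
        return "stale_by_rules"
    return "fresh"
```

Under these assumptions a migrated legacy cache (`id_rules: 0`, signature `"legacy"`) classifies as `stale_by_rules`, which matches the behavior the commit message describes.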
admin 1cc2c38128 Step 10i: rc-jav.py becomes a thin shim; main() lives in rcjav/cli.py
The real entrypoint moved into rcjav/cli.py (845 lines: imports + the
remaining top-level glue + collectors + main()). rc-jav.py is now a
25-line shim that does:

  - `from rcjav import *` to re-export the package surface for callers
    that load this script via importlib.spec_from_file_location
    (tests/test_rules.py, fixtures/run.py, the native-messaging host
    via importlib).
  - `from rcjav.cli import main` and call it under `__main__`.

Verified all four entry points:
  - python rc-jav.py --help              → ok (legacy CLI invocation)
  - python -m rcjav.cli --help           → ok (package-direct)
  - python fixtures/run.py               → 17/17 cases pass
  - python -m unittest tests.test_rules  → 5/5 OK

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 22:01:52 +02:00
admin fb5700cdab Step 10h: extract renderers + file outputs into rcjav/output.py 2026-05-22 22:00:22 +02:00
admin 550482a7a2 Step 10g: extract library issues + renaming into rcjav/library.py 2026-05-22 21:54:49 +02:00
admin 90054e4d0b Step 10f: extract rclone subprocess wrappers into rcjav/rclone_io.py 2026-05-22 21:53:36 +02:00
admin 41f7c80f1b Step 10e: extract WinCatalog ingest into rcjav/catalog.py 2026-05-22 21:51:09 +02:00
33 changed files with 5549 additions and 1779 deletions
+8
@@ -3,5 +3,13 @@ __pycache__/
scan-cancel.flag
cache.json
cache.json.tmp
cache.json.bak
scan-progress.json
cleanup-revert-*.json
reports/
.claude/
.venv/
venv/
.vscode/
.idea/
wincatalog/
+1 -1
@@ -46,7 +46,7 @@ D:\DEV\Project\rclone-jav\
## Defaults from earlier sessions
- `cq:JAV` is the current remote root (after the rclone crypt config change moved it down a level)
- `default_target` in config.json = `["cq:JAV"]`
- `default_target` in config.json = `["cq:JAV"]` (hardcoded fallback in cli.py matches)
- `human_size()` formats to 2 decimals (e.g. `6.94 GiB`)
- After the 3-digit ID canonicalization change, run `python rc-jav.py --scan` to rebuild `cache.json` under the new padded keys.
- Duplicate KEEP ranking uses configurable VIP folders before source/size/format ranking. Default VIP folder is `ClearJAV`; video files there are treated as the trusted direct-rip copy.
+75
@@ -2,8 +2,83 @@
## Deferred
### Deferred Lights from 2026-05-24 audit
See `D:\DEV\Project\rclone-jav\bugs-fix-queue.md` rows L-2 through L-6. Cosmetic / UX polish: Discord passive UI surface, stderr 5s stale on rc-jav stall (also L-3 mentioned above), expectedId state leak, history chip during modal, Clear button modal stay-open. None block S/M workflows.
### Library Cleanup feature (preview-first, Phase 1 only)
Design mockup: `mockups/library-cleanup-claude.html`. Scope-fenced to deterministic transforms + junk-strip on names already containing resolution OR clear garbage tokens. **NO ffprobe / resolution probing** — Phase 2 is a separate future feature.
Volume from 2026-05-26 Library Issues export: 106 files in scope. 85 cleanup-tier (resolution data present, reshape only) + ~21 junk-strip (resolution still missing after strip; Phase 2 candidates).
**Locked design decisions:**
| # | Decision | Locked value |
|---|---|---|
| 1 | Preview flow | P2 dedicated Cleanup Plan modal + P3 JSON export as side button |
| 2 | Part-suffix canonical shape | `{ID} #partN [resolution].ext` — matches CANONICAL_RESOLUTION_RE (last bracket = resolution) AND existing extract_id `#partN` convention |
| 3 | `_PARTN` cosmetic normalize (9 files) | Optional group, default deselected — extract_id already handles both forms |
| 4 | copy_suffix `(N)` conflict policy | Auto-skip + default-uncheck if stripped form exists in cache; default-check if no conflict |
| 5 | Multi-pattern transforms on one file | Composite single row; reason field lists all applied transforms |
| 6 | Revert plan artifact | Auto-save `cleanup-revert-<ts>.json` on apply, in CLI repo root |
| 7 | Progress UI during apply | Full progress bar + ETA + cancel; mirrors scan-progress polling pattern |
| 8 | Placement in extension | Library Review pane: add "Generate Cleanup Plan" button alongside existing Duplicate Review + Library Issues buttons |
| 9 | Persistent ignore list | Per-file `filename_hygiene_ignore: true` flag in cache; scan honors; cleared on cache rebuild |
**Filter UX:** chips not stacked groups. Single-select radio-style chips with tabular-nums count badges. Tier-tinted active state (green=cleanup, yellow=strip, purple=optional, red=conflicts).
**Implementation order (single multi-session feature):**
1. **Python backend — transform + strip functions** in `rcjav/library.py`:
- `_part_suffix_to_canonical(filename)`: `[1080p].2of2.wmv` → `#part2 [1080p].wmv`
- `_copy_suffix_to_canonical(filename, folder_filenames)`: `[1080p] (1).mp4` → `[1080p].mp4`, returns `(new_name, conflict_with_path | None)`
- `_bare_suffix_to_canonical(filename)`: `.450p.wmv` → ` [450p].wmv`
- `_strip_empty_brackets(filename)`: `TYOD-232 [].wmv` → `TYOD-232.wmv`
- `_strip_quality_suffix(filename)`: `CESD-325.HD.mp4` → `CESD-325.mp4`
- `_strip_bitrate_bracket(filename)`: `MXGS-672 [396m].avi` → `MXGS-672.avi`
- `_partN_to_canonical(filename)`: `_PART1.mp4` → ` #part1.mp4` (optional)
2. **Plan builder** `build_cleanup_plan(cache, config)`:
- Walk cache.json
- For each file matching a cleanup-eligible pattern, generate row with `old_path`, `new_path`, `transform_kind`, `conflict_with` (cache pre-check), `size`, `composite_reasons[]`
- Returns structured plan dict + grouped counts for chip badges
3. **Host RPC** in `rcjav-host.py`:
- `cleanup_plan` action → calls `build_cleanup_plan`, returns JSON
- `cleanup_apply` action → spawns worker thread (M-3 pattern: per-invocation Event + result holder), executes renames serially via existing `rename_files_batch`, writes progress to state file, auto-saves revert plan
- `cleanup_progress` action → reads state file (mirrors scan-progress)
- `cleanup_cancel` action → sets flag, finishes current rclone moveto, stops
4. **Extension UI** in `src/options/options-library-issues.js`:
- New "Generate Cleanup Plan" button in Library Review pane
- New modal `#cleanup-plan-modal` with chip-row + scrollable row list + footer
- Chip click handler filters visible rows (CSS class toggle)
- Per-row checkbox state managed in JS Set keyed by old_path
- Apply click → RPC + progress polling + result summary modal
- JSON export button → trigger download blob from current plan
5. **Cache flag** for ignore list:
- `find_library_issues` skips entries with `filename_hygiene_ignore: true`
- Modal row gains "Mark as intentional" action (sets the flag via small RPC)
6. **Smoke test before ship:**
- Run plan generation against the real cache.json snapshot
- Verify all 7 transform functions produce expected canonical names for ~10 sample inputs each
- Verify conflict pre-check correctly flags HFD-197 case
- Verify revert plan round-trips (apply 5 renames, import revert plan, re-apply, end state = start state)
**Out of scope (explicitly skipped):**
- ffprobe resolution probing (Phase 2, separate feature)
- Quality-mapping editor (4 files, not worth own UI)
- Bare-name renames (~775 files; no resolution data to act on)
- Cross-remote moves
- Mid-rclone-call cancellation (would risk corrupt remote state)
**Estimated scope:** ~700-900 lines across Python backend + extension UI. Single multi-session feature, NOT a bug fix. Deserves its own audit pass after implementation (mirror the 3-phase audit pattern done in May 2026 for reliability bugs).
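A rough sketch of four of the planned transforms, to make the examples in step 1 concrete. Nothing here exists in `rcjav/library.py` yet: the function names drop the leading underscore to mark them as stand-ins, and the regexes are assumptions derived only from the sample filenames in this plan (for instance, only `HD` is treated as a quality token, and the `ABC-123` filename is invented).

```python
import re


def strip_empty_brackets(filename):
    """'TYOD-232 [].wmv' -> 'TYOD-232.wmv'."""
    return re.sub(r"\s*\[\s*\]", "", filename)


def strip_quality_suffix(filename):
    """'CESD-325.HD.mp4' -> 'CESD-325.mp4' (only '.HD' before the extension)."""
    return re.sub(r"\.HD(?=\.[^.]+$)", "", filename)


def bare_suffix_to_canonical(filename):
    """'ABC-123.450p.wmv' -> 'ABC-123 [450p].wmv'."""
    return re.sub(r"\.(\d{3,4}p)(?=\.[^.]+$)", r" [\1]", filename)


def part_suffix_to_canonical(filename):
    """'ABC-123 [1080p].2of2.wmv' -> 'ABC-123 #part2 [1080p].wmv'."""
    m = re.match(
        r"^(?P<stem>.*?)\s*(?P<res>\[\d{3,4}p\])\.(?P<n>\d+)of\d+(?P<ext>\.[^.]+)$",
        filename,
    )
    if not m:
        return filename  # not a part-suffix name; leave untouched
    # Canonical shape from decision 2: {ID} #partN [resolution].ext
    return f"{m['stem']} #part{m['n']} {m['res']}{m['ext']}"
```

Each function returns its input unchanged when the pattern does not match, which would let the plan builder detect "no transform applied" by comparing old and new names.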
(append below)
## Completed notes
- Scan progress cadence polish completed 2026-05-25: `rcjav/rclone_io.py` now decouples cancel checks from progress emission. Cancel checks run every 25 files. Progress emits on a dual gate: at least 25 files and 0.25s since the last emit, or a 1.0s heartbeat when the loop is still receiving rows. This does not solve full rclone stalls where no loop iterations occur.
- WinCatalog CSV/XML paths are normalized from `\` to `/` during catalog load.
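The scan-progress cadence noted above could look roughly like the sketch below. The class and function names are hypothetical and are not taken from `rcjav/rclone_io.py`; only the thresholds (25 files, 0.25s, 1.0s) come from the note. The caller is assumed to pass a monotonic timestamp once per processed row.

```python
class ProgressGate:
    """Decide when to emit progress: dual gate (files AND time) or a heartbeat."""

    def __init__(self, min_files=25, min_interval=0.25, heartbeat=1.0):
        self.min_files = min_files
        self.min_interval = min_interval
        self.heartbeat = heartbeat
        self._files_since_emit = 0
        self._last_emit = None  # initialized on the first row

    def should_emit(self, now):
        """Call once per processed file with a monotonic timestamp in seconds."""
        if self._last_emit is None:
            self._last_emit = now
        self._files_since_emit += 1
        elapsed = now - self._last_emit
        dual = self._files_since_emit >= self.min_files and elapsed >= self.min_interval
        beat = elapsed >= self.heartbeat
        if dual or beat:
            self._files_since_emit = 0
            self._last_emit = now
            return True
        return False


def should_check_cancel(files_seen, every=25):
    """Cancel checks are decoupled from emission and run on a fixed file count."""
    return files_seen % every == 0
```

Because the gate is only evaluated when a row arrives, it shares the limitation stated in the note: a full rclone stall produces no loop iterations and therefore no heartbeat.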
+64
@@ -0,0 +1,64 @@
# Audit Snapshot — 2026-05-24T15:55Z
## CLI repo (D:\DEV\Project\rclone-jav)
- git rev-parse HEAD: `8d6bdb81af75b5db1f9d71fce48deb0c859b462b`
- git status --short:
```
M AGENTS.md
M config.json
M rcjav/__init__.py
M rcjav/cli.py
M rcjav/library.py
?? bug-audit-plan.md
?? cache.json.bak
```
## Extension repo (D:\DEV\Extensions\Production\rclone-jav)
- git rev-parse HEAD: `0e230320a9d9a078e1414161dc69d80a8cc3f9ef`
- git status --short:
```
D CLAUDE.md.bak
M background.js
M content.js
M host/com.rcjav.host.json
M host/install-host.ps1
M host/rcjav-host.bat
M host/rcjav-host.py
M host/register-host.bat
M manifest.json
R bulk-check.css -> src/bulk-check/bulk-check.css
R bulk-check.html -> src/bulk-check/bulk-check.html
R bulk-check.js -> src/bulk-check/bulk-check.js
RM options-cache.js -> src/options/options-cache.js
RM options-diagnostics.js -> src/options/options-diagnostics.js
RM options-dupe-review.js -> src/options/options-dupe-review.js
RM options-library-issues.js -> src/options/options-library-issues.js
R options-profiles.js -> src/options/options-profiles.js
R options-rules-editors.js -> src/options/options-rules-editors.js
RM options.css -> src/options/options.css
RM options.html -> src/options/options.html
RM options.js -> src/options/options.js
R popup.css -> src/popup/popup.css
R popup.html -> src/popup/popup.html
R popup.js -> src/popup/popup.js
?? host/allowed-extension-ids.json
?? rclone-jav-library-issues-all-2026-05-24T15-52-01-714Z.json
?? src/options/options-shared.js
?? src/shared/
```
## Versions
- Extension manifest.json version: 0.1.32
- Python: 3.14.5
- Node: v24.14.0
- Brave: not captured (no manual extension verification needed during Phase 1)
## Dirty-state policy
This audit accepts dirty working trees (option b). All file:line citations reference the snapshot AS-IS at this timestamp. No file edits during Phase 1 except audit docs (per allowed-write list in bug-audit-plan.md).
## Notes on dirty state
- Extension repo has codex's roadmap refactor + later edits uncommitted. Renamed file paths (options/*, popup/*, bulk-check/*) reflect step 6/6b/6c/7a outcomes.
- Untracked `src/shared/` and `src/options/options-shared.js` are new module extractions — in audit scope.
- Untracked `rclone-jav-library-issues-all-*.json` is a user-generated export — OUT of audit scope (runtime artifact).
- CLI `cache.json.bak` is excluded per plan.
- Auditors must reference paths AS THEY EXIST ON DISK NOW, not as they appeared in git HEAD.
+66
@@ -0,0 +1,66 @@
# benchmarks/
Latency benchmarks for decisions in the rc-jav roadmap.
## host-fast-path.py — Step 11 decision
The native-messaging host has an in-process fast path
(`handle_cached_search_fast` in `rcjav-host.py`) that answers simple
cached single-ID searches without shelling out to `rc-jav.py`. Step 11
asked: is the fast path actually pulling its weight, or could we
delete it / narrow it down?
### Run
```
python benchmarks/host-fast-path.py [--queries Q1 Q2 ...] [--iterations N]
```
For the "Python actively scanning" condition, start `rc-jav.py --scan`
in a separate terminal first.
### Findings (idle baseline, 2026-05-23, 7124-file cache)
```
=== aggregate (5 queries × 5 iterations, idle Python) ===
fast-path total:
n=25 min=0.43ms median=0.46ms mean=0.48ms p95=0.61ms max=0.72ms
subprocess total:
n=25 min=880.56ms median=919.45ms mean=965.55ms p95=1232.93ms max=1385.37ms
median speedup: 2000.1x
p95 speedup: 2036.0x
```
### Decision: keep the fast path
The subprocess path costs ~920ms median per query even with the
interpreter doing nothing else — that's pure Python startup +
`json.loads` of the 1.3 MB cache.json. The fast path returns hits in
under 1ms. The 2000× speedup is structural (interpreter startup
overhead), not load-dependent, so it would apply equally under (a)
idle and (b) active-scan conditions.
The fast path is already correctly scoped — it bails out for
wildcards, ranges, name searches, and `--quick` mode (which forces a
live rclone hit). Narrowing further would just push more queries
through the slow path with no upside.
The "Python actively scanning" condition listed in the original
roadmap was framed as the case where the fast path's value would be
most obvious. The idle baseline already settles it; we don't need to
gate the decision on the active-scan measurement, though running it
remains a sanity check if cache.json grows substantially.
### What this benchmark does NOT cover
- Latency from inside the browser extension (popup) to the host. Adds
Brave's native-messaging protocol overhead on top of whichever path
the host takes — but the relative difference between paths is
preserved.
- Memory cost of the in-process cache load. The fast path loads
cache.json once per call today (no caching across calls inside the
host). A future optimization is to memoize `_load_host_cache` with
mtime-based invalidation; left for follow-up if needed.
- Cold-cache effects. `cache.json` is large enough that the OS page
cache matters; numbers above reflect a warm read. First call after
a reboot may be slower for both paths but proportionally so.
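The mtime-based memoization mentioned as a follow-up could be sketched as follows. This is an illustration only: `load_cache_memoized` is a hypothetical name, not the host's `_load_host_cache`, and using modification time plus file size as the invalidation key is an assumption.

```python
import json
import os

# Single-slot memo: path, invalidation key, and the parsed data.
_MEMO = {"path": None, "key": None, "data": None}


def load_cache_memoized(path):
    """Return parsed JSON for `path`, reparsing only when mtime or size changes."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    if _MEMO["path"] == path and _MEMO["key"] == key:
        return _MEMO["data"]  # file unchanged since the last parse
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    _MEMO.update(path=path, key=key, data=data)
    return data
```

Callers would share the returned dict, so they should treat it as read-only. The memo only helps if the host process lives across several calls.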
+151
@@ -0,0 +1,151 @@
"""Measure host fast-path vs subprocess rc-jav.py for cached single-ID search.

Step 11 of the console-consolidation roadmap asks: does the host's
`handle_cached_search_fast` actually save meaningful latency vs just
shelling out to `rc-jav.py --search ID --cache --format json`? If yes,
under what conditions (idle Python vs Python actively scanning)?

This script runs both paths N times against a set of query IDs and
reports min / median / mean / p95 / max in milliseconds.

Usage:
    python benchmarks/host-fast-path.py [--queries Q1 Q2 ...] [--iterations N]

To measure (b) Python-actively-scanning, kick off a `rc-jav.py --scan` in
another terminal, then run this script while the scan runs.

The fast-path implementation is replicated inline here (not imported
from the host module) so the benchmark is self-contained.
"""

from __future__ import annotations

import argparse
import json
import statistics
import subprocess
import sys
import time
from pathlib import Path

ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from rcjav.cache import load_cache  # noqa: E402
from rcjav.ids import current_rules_signature, normalize_id  # noqa: E402

DEFAULT_QUERIES = ["SSIS-001", "ABP-100", "FC2-1841460", "MIDD-500", "IBW-902"]
DEFAULT_ITERATIONS = 20


def fast_path_search(cache: dict, query: str) -> int:
    """Replicates handle_cached_search_fast minus the response shape.

    Returns hit count. Walks every remote's files[] looking for jav_id
    matching the normalized query (exact or `<id>#partN`).
    """
    norm = normalize_id(query)
    if not norm:
        return 0
    hits = 0
    for remote, entry in (cache.get("remotes") or {}).items():
        files = entry.get("files") or []
        for item in files:
            jid = item.get("jav_id", "")
            if jid == norm or (isinstance(jid, str) and jid.startswith(norm + "#part")):
                hits += 1
    return hits


def time_fast_path(query: str, iterations: int) -> list[float]:
    # The cache is loaded once, outside the timed loop, so each sample
    # measures only the in-process dict walk.
    sig = current_rules_signature()
    cache = load_cache(sig)
    out: list[float] = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        fast_path_search(cache, query)
        out.append((time.perf_counter() - t0) * 1000)
    return out


def time_subprocess(query: str, iterations: int) -> list[float]:
    cmd = [
        sys.executable,
        str(ROOT / "rc-jav.py"),
        "--search", query,
        "--cache",  # force cache mode (no rclone)
        "--format", "json",
        "--basic", "--no-color",
    ]
    out: list[float] = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        proc = subprocess.run(cmd, capture_output=True, text=True, encoding="utf-8", errors="replace")
        out.append((time.perf_counter() - t0) * 1000)
        if proc.returncode not in (0, 1):  # 1 = no hits, still valid
            sys.stderr.write(f"subprocess returned {proc.returncode}; stderr={proc.stderr[:200]!r}\n")
    return out


def percentile(values: list[float], p: float) -> float:
    # Linear interpolation between the two nearest ranks.
    if not values:
        return 0.0
    s = sorted(values)
    k = (len(s) - 1) * p
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)


def summarize(label: str, values: list[float]) -> None:
    if not values:
        print(f"  {label}: (no data)")
        return
    print(f"  {label}:")
    print(f"    n={len(values)} min={min(values):.2f}ms median={statistics.median(values):.2f}ms "
          f"mean={statistics.mean(values):.2f}ms p95={percentile(values, 0.95):.2f}ms max={max(values):.2f}ms")


def main() -> int:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--queries", nargs="+", default=DEFAULT_QUERIES,
                    help=f"JAV IDs to search (default: {DEFAULT_QUERIES})")
    ap.add_argument("--iterations", type=int, default=DEFAULT_ITERATIONS,
                    help=f"Iterations per query per path (default: {DEFAULT_ITERATIONS})")
    args = ap.parse_args()

    print("Host fast-path vs subprocess rc-jav.py benchmark")
    print(f"queries: {args.queries}")
    print(f"iterations per path: {args.iterations}")
    print(f"cache: {ROOT / 'cache.json'}")
    print()

    all_fast: list[float] = []
    all_sub: list[float] = []
    for q in args.queries:
        print(f"[{q}]")
        fast = time_fast_path(q, args.iterations)
        summarize("fast-path (in-process dict walk)", fast)
        sub = time_subprocess(q, args.iterations)
        summarize("subprocess rc-jav.py --search --cache", sub)
        all_fast.extend(fast)
        all_sub.extend(sub)
        if fast and sub:
            speedup = statistics.median(sub) / max(statistics.median(fast), 0.001)
            print(f"  speedup (median sub / median fast): {speedup:.1f}x")
        print()

    print("=== aggregate ===")
    summarize("fast-path total", all_fast)
    summarize("subprocess total", all_sub)
    if all_fast and all_sub:
        med_speedup = statistics.median(all_sub) / max(statistics.median(all_fast), 0.001)
        p95_speedup = percentile(all_sub, 0.95) / max(percentile(all_fast, 0.95), 0.001)
        print(f"  median speedup: {med_speedup:.1f}x")
        print(f"  p95 speedup: {p95_speedup:.1f}x")
    return 0


if __name__ == "__main__":
    sys.exit(main())
+525
@@ -0,0 +1,525 @@
# Bug Audit Plan — rclone-jav (Python CLI + Brave Extension)
Customized from `D:\DEV\Project\Goal\bug-audit-template.md`. Tightened for this project: scope is chunked, "bug" is narrowed, reproduction recipe is required, independent verification is enforced via fresh-context agents with bounded contract context, intentional patterns are listed only when verified against current code or current doc.
All output artifacts (per-scope `bugs-*.md` files, `bugs-candidates-*.md` scratch, `audit-snapshot-<ISO>.md`, and the final `verification.md`) live under `D:\DEV\Project\rclone-jav\`. Do NOT write audit output under `D:\DEV\Extensions\Production\rclone-jav\` (extension folder) or `D:\DEV\Project\Goal\` (template home).
---
## What counts as a bug (for THIS audit)
Include:
- **Wrong result** — code produces output that contradicts documented behavior, comment, or stated intent
- **Data loss / corruption** — cache.json, config.json, chrome.storage, or remote file content can become incorrect or lost
- **Crash / unhandled exception** — Python tracebacks, uncaught JS promise rejections that kill an operation
- **Silent failure** — operation appears to succeed but didn't (e.g. write claimed but file not changed)
- **Contract violation** — host RPC schema mismatch, manifest declaration mismatch, cache-version mismatch, fixture-driven expectation broken
- **Race condition with observable user-visible effect** — concurrent operations leading to one of the above
Exclude (out of scope for this audit — separate effort):
- Code style / formatting / linting
- Performance unless it causes timeout or hang
- Dead code / unused imports / unused variables
- Outdated comments (unless misleading enough to cause wrong-result)
- Security review (use `/security-review` instead)
- Documentation gaps (separate doc-debt pass)
- Refactor opportunities ("could be cleaner")
- Missing features → file in `TODO.md`, not `bugs.md`
Phrase findings as "every function reviewed for externally observable bugs." Internal helpers with no flow to RPC / UI / file system / network get reviewed only as part of their caller's flow, not as their own audit unit.
---
## Scope chunks (run each as separate audit pass)
Five chunks. Each gets its own `bugs-<chunk>.md` file. Do NOT batch into one giant audit — context grows, hallucinations multiply.
| # | Chunk | Files in scope | Output |
|---|---|---|---|
| 1 | **Python CLI** | `rc-jav.py` + `rcjav/*.py` + `tests/*.py` + `fixtures/run.py` (all under `D:\DEV\Project\rclone-jav\`) | `bugs-python.md` |
| 2 | **Native host** | `host\rcjav-host.py` + `host\install-host.ps1` + `host\rcjav-host.bat` + `host\register-host.bat` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-host.md` |
| 3 | **Extension SW + content** | `background.js` + `content.js` + `manifest.json` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-bg.md` |
| 4 | **Extension Options pages** | `src\options\*` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-options.md` |
| 5 | **Extension Popup + Bulk Check** | `src\popup\*` + `src\bulk-check\*` (under `D:\DEV\Extensions\Production\rclone-jav\`) | `bugs-extension-popup.md` |
Tabvault extension (`D:\DEV\Extensions\Production\tabvault\`) is **out of scope** for this audit — separate project.
### Explicit per-chunk excludes
Do NOT audit (read-only-if-needed-for-context, never report findings against):
- `**/__pycache__/` — bytecode
- `**/*.bak` — historical snapshots (e.g. `CLAUDE.md.bak`, `cache.json.bak`)
- `cache.json`, `config.json` — runtime data, not code (their schema is auditable in `docs/CACHE_CONTRACT.md`)
- `benchmarks/*.py` — performance probes, not product
- `mockups/*.html` — design memory, not code
- `wincatalog/` — user data dir
- `README.md`, `TODO.md`, `AGENTS.md`, `CLAUDE.md`, `docs/*.md` — docs (separate doc-debt pass)
- `host/logs/*` — runtime logs
- `host/state/*` — runtime state
- `host/com.rcjav.host.json`, `host/allowed-extension-ids.json` — generated/runtime config
- Per-project memory under `C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\` — READ for rules, do NOT audit
---
## Required reading before audit
Auditor MUST read (and reference findings against) the following intentional-pattern docs:
- `D:\DEV\Project\rclone-jav\AGENTS.md` — Python CLI session memory, ID normalization rules, defaults
- `D:\DEV\Project\rclone-jav\CLAUDE.md` (if present)
- `D:\DEV\Project\rclone-jav\TODO.md` — deferred work that's NOT a bug
- `D:\DEV\Extensions\Production\rclone-jav\docs\CACHE_CONTRACT.md` — cache schema + ID rules versioning
- `D:\DEV\Extensions\Production\rclone-jav\AGENTS.md` — extension session memory
- `D:\DEV\Extensions\Production\rclone-jav\CLAUDE.md` (if present)
- `D:\DEV\Extensions\Production\rclone-jav\mockups\console-consolidation-claude.html` — design rationale
- `C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\*.md` — per-project memory (version bump rule, install workflow, no hollow suggestions)
If a finding contradicts an explicit decision in these docs, it's NOT a bug — it's expected behavior. Mark as `discarded — intentional per <doc:section>` in the False Positives section.
---
## Known intentional patterns (verified against current code or current doc)
Only patterns confirmed against the current snapshot belong here. If a pattern is suspected but unverified, leave it OFF this list — the auditor will surface it, the verifier will check the cited doc, and discard-as-intentional happens there. **Stale assumptions on this list are dangerous** — they actively shield real bugs in code that's been touched.
### Python CLI (verified)
- `extract_id()` chops trailing single letters from filenames intentionally (e.g. `IBW-902z` → `IBW-902`) — see `D:\DEV\Project\rclone-jav\AGENTS.md` "ID normalization"
- JAV IDs canonicalized to at least 3 digits but keep wider widths (`ABC-027`, `ABCDE-1167`) — not a "leading zero" bug
- `.ts` ranks lowest among video containers in dupe keep ranking — `AGENTS.md` "Defaults from earlier sessions"
- VIP folders (`ClearJAV` default) win first in dupe keep ranking — same
- Cache loading falls back to empty cache when malformed top-level — intentional resilience, `AGENTS.md` "Recent decisions"
- Scan is always recursive — old `--recursive/-R` flag was removed intentionally
- `extract_json_blob` tolerates leading status lines + trailing noise — intentional for `--basic` output parsing
### Native host (verified)
- stderr capture lives INSIDE `rcjav-host.py` via `os.dup2` (not in `rcjav-host.bat` via `2>>`) — the bat NOT redirecting stderr is the fix, not a missing-redirect bug. See comments at top of `rcjav-host.bat`.
- `__port_disconnect__` is a synthetic action name for the rolling RPC log marker — not an actual RPC handler
- `_shrink_response` called twice (once in main loop, once inside `write_message`) — defense-in-depth, intentional
- `client_req_id` is `None` for RPCs originating from rclone-jav extension (only tabvault stamps it)
- Discord webhook rate-limit uses `last-alert-ts.json` shared across host process spawns — intentional anti-storm
- Host spawns fresh per `connectNative` call from each extension — intentional Chromium behavior, not a "leak"
### Extension (verified against current files)
- `chrome.runtime.lastError` voided after several Chrome API calls — silences MV3 warning, intentional
- Native messaging 90s timeout in `nativeCall` — long enough for `--quick` on a slow remote
- `web_accessible_resources` for `src/options/options.html` and `src/bulk-check/bulk-check.html` ONLY (NOT `popup.html`) — explicit per `mockups/console-consolidation-claude.html`; popup is browser-action UI, doesn't need WAR
- Library Issues report-only kinds (`resolution_*`, `quality_marker_not_resolution`, `missing_resolution`, etc.) — user-chosen per session; not a "missing fix path" bug. Auto-rename only valid for `bracket_id` and `nohyphen_id`.
- `No ID` chip removed from sidebar; `no_id` outcomes not logged to recent activity — intentional
- Default landing pane = `dupe-review` — per mockup
- Setup pane lives in SUPPORT sidebar group — current intentional placement after earlier orphaning/restoration
- `pcLabel` empty string default — intentional, user opt-in
- 10-minute Discord webhook rate-limit — intentional anti-spam
- `mkv` / `mp4` / `wmv` / `avi` format-preference defaults — intentional KEEP-ranking order
- Default `cacheStaleHours` = 24 — display only, doesn't change search results
- `_rcjavSwInstanceId` is a fresh UUID per SW startup — used to detect SW eviction mid-call, intentional design
### Not on this list — let auditor surface (do NOT shield)
- `DEFAULT_TARGET` / `DEFAULT_SOURCE` hardcoded fallback values in `rcjav/cli.py` — these have been a regression source. Auditor checks current values vs `config.json` defaults vs `AGENTS.md` documented current state.
- `CONFIG_PATH` / `CACHE_PATH` / `CANCEL_FLAG` / `DEFAULT_CATALOG` path resolutions in `rcjav/` package — `.parent` vs `.parents[1]` has been a bug. Verify each against current package layout.
- Any other path-resolution code that uses `__file__` — same class of risk
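
To make the `.parent` vs `.parents[1]` distinction concrete, here is a minimal sketch. The layout (`/repo/rcjav/cli.py`) and the `config.json` file name at each level are assumptions for illustration only; verify against the actual package layout before drawing conclusions.

```python
from pathlib import PurePosixPath

# Hypothetical layout: a module living one level below the repo root.
module_file = PurePosixPath("/repo/rcjav/cli.py")

package_dir = module_file.parent      # directory containing the module
repo_root = module_file.parents[1]    # one level above the package directory

# The same file name resolves to two different locations.
config_in_package = package_dir / "config.json"
config_in_root = repo_root / "config.json"

print(config_in_package)
print(config_in_root)
```

If the file actually lives at the repo root, the `.parent` form points at a path that does not exist, which is the class of regression this list asks the auditor to surface.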
---
## Snapshot preflight (MANDATORY — Phase 1 cannot start without it)
Before any audit chunk runs, capture `D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md` with:
```markdown
# Audit Snapshot — <ISO timestamp>
## CLI repo (D:\DEV\Project\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
<output, or "(clean)" if no output>
## Extension repo (D:\DEV\Extensions\Production\rclone-jav)
- git rev-parse HEAD: <sha>
- git status --short:
<output, or "(clean)" if no output>
## Versions
- Extension manifest.json version: <X.Y.Z>
- Python: <python --version output>
- Node: <node --version output, for fixture runner>
- Brave: <version, if extension manual verification will be needed>
## Dirty-state policy
This audit accepts dirty working trees (option b). All file:line citations reference the snapshot AS-IS at this timestamp. No file edits during Phase 1 except audit docs (allowed-write list below).
```
Every `bugs-*.md` file MUST cite this snapshot ID in its header. If files change during audit, restart from a new snapshot.
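
The per-repo part of the snapshot could be captured with a small helper along these lines. This is a sketch, not the project's tooling: the function names are hypothetical, and the timestamp format is assumed from snapshot IDs of the form `audit-snapshot-2026-05-24T15-55Z.md`.

```python
import subprocess
from datetime import datetime, timezone


def git_output(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return stdout without trailing whitespace."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, text=True, check=True,
    )
    # rstrip only: leading spaces in `git status --short` are significant.
    return result.stdout.rstrip()


def render_repo_section(title: str, repo: str, head: str, status: str) -> str:
    """Render one repo block in the snapshot layout shown above."""
    return "\n".join([
        f"## {title} ({repo})",
        f"- git rev-parse HEAD: {head}",
        "- git status --short:",
        status if status else "(clean)",
    ])


def snapshot_name(now: datetime) -> str:
    """Build the snapshot file name from a UTC timestamp (format is an assumption)."""
    return f"audit-snapshot-{now.astimezone(timezone.utc).strftime('%Y-%m-%dT%H-%MZ')}.md"


def capture_repo(title: str, repo: str) -> str:
    """Collect HEAD and status for one repo and render its section."""
    head = git_output(repo, "rev-parse", "HEAD")
    status = git_output(repo, "status", "--short")
    return render_repo_section(title, repo, head, status)
```

Calling `capture_repo("CLI repo", r"D:\DEV\Project\rclone-jav")` and the equivalent for the extension repo would produce the two repo sections; the Versions and Dirty-state sections still need to be filled in separately.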
---
## Phase 1 allowed-write list (explicit)
During Phase 1 (audit), the ONLY files that may be created or modified are:
- `D:\DEV\Project\rclone-jav\audit-snapshot-<ISO>.md`
- `D:\DEV\Project\rclone-jav\bugs-candidates-<chunk>.md`
- `D:\DEV\Project\rclone-jav\bugs-<chunk>.md`
Any other write = audit violation. Restart the chunk from snapshot.
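
One way to check the allowed-write list mechanically is to compare the changed paths against the three file-name patterns. The sketch below is illustrative; `is_allowed_write` and `violations` are hypothetical names, and it assumes the list of changed paths comes from elsewhere (for example from `git status`).

```python
from fnmatch import fnmatch
from pathlib import PureWindowsPath

AUDIT_ROOT = PureWindowsPath(r"D:\DEV\Project\rclone-jav")

# Mirrors the three entries in the allowed-write list above.
ALLOWED_PATTERNS = (
    "audit-snapshot-*.md",
    "bugs-candidates-*.md",
    "bugs-*.md",
)


def is_allowed_write(path: str) -> bool:
    """True if `path` sits directly in the audit root and matches an allowed pattern."""
    candidate = PureWindowsPath(path)
    if candidate.parent != AUDIT_ROOT:
        return False
    return any(fnmatch(candidate.name, pattern) for pattern in ALLOWED_PATTERNS)


def violations(changed_paths):
    """Return every changed path that is not on the allowed-write list."""
    return [p for p in changed_paths if not is_allowed_write(p)]
```

A non-empty result from `violations(...)` corresponds to the "restart the chunk from snapshot" case.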
---
## bugs-candidates-<chunk>.md format (Phase 1 scratch)
This is the auditor's scratch space. Hedge language permitted here (and ONLY here). Theories, speculation, "this looks wrong" go in candidates first.
```markdown
# Candidate Findings — <chunk> — <snapshot ID>
## Candidate C-1
- File: <path:line>
- Hunch: <one sentence, hedge language OK>
- Trace: <what code path led here>
- Question for verifier: <specific yes/no claim to verify>
- Contract refs needed: <list of doc paths verifier should read, or "none">
## Candidate C-2
...
```
Only CONFIRMED or PARTIAL candidates from verifier get promoted into `bugs-<chunk>.md`. REFUTED or NEEDS-INFO stay in candidates with verifier's response appended.
After Phase 1 chunk completes: `bugs-candidates-<chunk>.md` stays beside `bugs-<chunk>.md`. Optional archive under `D:\DEV\Project\rclone-jav\audits\<date>\` — operator choice, not enforced.
---
## bugs-<chunk>.md format (confirmed only)
```markdown
# Bug Report — <chunk name> — <snapshot ID>
Snapshot: audit-snapshot-<ISO>.md
Required-reading docs read: [Y for each in list above]
Auditor agent: <type / fresh context confirmed Y/N>
---
## Severe (S)
Definition: data loss, crash, silent wrong result, contract violation that breaks user workflow.
### S-1
- **File:** `<absolute path>:<line>` (single line OR `:<start>-<end>` range)
- **Symptom (one sentence):** what the user / caller observes
- **Why it's a bug:** concrete reason citing the contract / doc / comment it violates. NO hedge language: "could", "might", "potentially", "in theory", "may cause", "possibly" — if you can't trace it concretely, demote to N or discard.
- **Reproduction:**
1. Input or state: `<exact value / command / RPC payload>`
2. Expected: `<what doc / comment / contract says should happen>`
3. Actual: `<what code actually does, traced through>`
- **Suggested fix sketch (optional, one-liner):** NOT to be implemented in audit phase
- **Verifier agent:** `<identifier, must be fresh-context>`
- **Verifier verdict:** CONFIRMED / PARTIAL (with revised repro)
- **Verifier confidence:** high / medium / low — low requires re-verification with different agent
- **Contract refs verifier read:** `<list>`
- **Mirror check needed in:** `<other chunk/file/RPC/schema if finding crosses a contract boundary, else "none">`
- **Status:** open
---
## Moderate (M)
Definition: degraded but observable behavior, recoverable error path missing, edge case mishandled.
<same field set>
---
## Light (L)
Definition: misleading log / error message, dev-only annoyance, minor input-validation gap.
<same field set>
---
## Needs Input (N)
Definition: looks suspicious but requires user / spec clarification before classifying.
### N-1
- **File:** ...
- **Question:** what specifically needs clarification
- **Why blocked:** what doc would resolve it but doesn't exist or is ambiguous
- **Status:** needs-input
---
## False Positives (discarded)
- `<file>:<line>` — initially flagged as `<what>`; discarded because `<reason, citing doc:section>`
```
---
## Cross-chunk mirror check (narrowly scoped)
Mirror check fires ONLY when a confirmed bug crosses a contract boundary. Contract boundaries:
- **Cache schema** (`docs/CACHE_CONTRACT.md`)
- **Host RPC payload/response shape**
- **Settings schema** (chrome.storage.sync.settings ↔ host alerts-config.json)
- **ID normalization rules** shared between extension's `id-extract.js` and host's `host_normalize_id` and Python's `rcjav/ids.py`
- **Fixture corpus expectations** (Python + Node consumers in `fixtures/`)
When a bug entry hits one of those, add:
```
Mirror check needed in: <specific file/RPC/schema>
```
Default (no contract boundary touched) = no mirror check. Avoids spawning vague secondary audits.
Final verification (Phase 3) scans every confirmed bug for `Mirror check needed in:` and runs the requested check.
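
The Phase 3 scan could be sketched as a line-by-line search for the `Mirror check needed in:` field. The helper name is hypothetical, and the parsing assumes the field stays on a single line, as in the bug entry template.

```python
import re

# Matches both the plain form and the bold bullet form used in bug entries.
MIRROR_RE = re.compile(r"Mirror check needed in:\*{0,2}\s*(.+)", re.IGNORECASE)


def pending_mirror_checks(markdown: str) -> list[str]:
    """Return every mirror-check target that is not "none"."""
    targets = []
    for line in markdown.splitlines():
        match = MIRROR_RE.search(line)
        if not match:
            continue
        # Drop surrounding backticks or quotes from the field value.
        target = match.group(1).strip().strip('`"').strip()
        if target and target.lower() != "none":
            targets.append(target)
    return targets
```

Each returned target is one check that verification.md must list as resolved.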
---
## PHASE 1 — AUDIT
### Per-chunk goal
```
/goal bugs-<chunk>.md exists in D:\DEV\Project\rclone-jav\, cites audit-snapshot-<ISO>.md, contains every file in scope chunk <N> reviewed for externally observable bugs, each bug has exact file:line citation, each bug has reproduction recipe (input/expected/actual), each bug verified by a fresh-context independent agent reading only cited contract docs, intentional patterns from "Known intentional patterns" list NOT flagged, no hedge language in confirmed bugs, bugs ranked S/M/L/N, mirror check noted where contract boundary touched, zero code changes made
```
Run the goal **once per chunk** (5 runs total). Do not batch.
### Verifier protocol
For each candidate promoted from `bugs-candidates-<chunk>.md`, spawn a NEW agent (fresh context, no audit-history visibility) with this exact framing:
```
Read <file>:<line> and the surrounding function ONLY. The claim is: <symptom>.
The supposed reproduction is: input <X>, expected <Y>, actual <Z>.
Contract refs to read before judging: <list from candidate, max 3 docs>.
Reply with one of:
CONFIRMED — bug is real, repro matches
PARTIAL — symptom real, repro doesn't match exactly, suggest revised repro
REFUTED — code does <Z'> not <Z>; here's the trace
NEEDS-INFO — can't verify without <X>
```
Verifier MUST NOT see:
- Auditor's reasoning beyond the symptom/repro claim
- Other candidates in this chunk
- Other confirmed bugs in this or any other chunk
- Audit-internal memory or chat history
Otherwise it's a context-correlated rubber stamp, not independent verification.
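
A coordinator can enforce this by building the verifier prompt from a structure that has no field for auditor reasoning at all. The sketch below is an assumption about how that might look; `VerifierClaim` and `build_verifier_prompt` are hypothetical names, and the prompt text abbreviates the framing above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifierClaim:
    """Only the fields a verifier is allowed to see."""
    file_line: str
    symptom: str
    repro_input: str
    expected: str
    actual: str
    contract_refs: tuple = ()


def build_verifier_prompt(claim: VerifierClaim) -> str:
    """Render the verifier prompt from the structured claim and nothing else."""
    if len(claim.contract_refs) > 3:
        raise ValueError("verifier prompt accepts at most 3 contract refs")
    refs = ", ".join(claim.contract_refs) if claim.contract_refs else "none"
    return "\n".join([
        f"Read {claim.file_line} and the surrounding function ONLY. "
        f"The claim is: {claim.symptom}.",
        f"The supposed reproduction is: input {claim.repro_input}, "
        f"expected {claim.expected}, actual {claim.actual}.",
        f"Contract refs to read before judging: {refs}.",
        "Reply with one of: CONFIRMED, PARTIAL, REFUTED, NEEDS-INFO.",
    ])
```

Because the dataclass has no slot for the hunch or trace, passing auditor reasoning to the verifier requires a deliberate code change rather than an accidental copy-paste.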
### Stop conditions per chunk
Restart the chunk with tighter framing if:
- Verifier rejects > **30%** of confirmed-candidate attempts → "what counts as a bug" threshold is too loose
- Candidate count exceeds **50 in one chunk** → scope too broad, split it
- Auditor produces a finding flagged by an Intentional Pattern → re-read this doc
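
The two numeric stop conditions can be checked with a few lines of arithmetic. This is a sketch with a hypothetical function name; the third condition (a finding that matches an intentional pattern) needs human or agent judgment and is not covered.

```python
def stop_reasons(rejected: int, verified: int, candidate_count: int,
                 max_rejection: float = 0.30, max_candidates: int = 50) -> list[str]:
    """Return the numeric stop conditions a chunk has tripped (empty list = keep going)."""
    reasons = []
    if verified > 0 and rejected / verified > max_rejection:
        reasons.append(
            f"rejection rate {rejected / verified:.0%} exceeds {max_rejection:.0%}"
        )
    if candidate_count > max_candidates:
        reasons.append(
            f"{candidate_count} candidates exceeds {max_candidates}"
        )
    return reasons
```

For example, 4 rejections out of 8 verified candidates is a 50% rejection rate, which trips the 30% threshold regardless of how many candidates the chunk produced.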
---
## PHASE 2 — FIX LOOP
One bug at a time, starting at S-1 of the highest-priority chunk, then M-1, then L-1. Skip N (needs-input) until user resolves.
### Per-bug goal
```
/goal <BUG-ID> in <bugs-chunk.md> is marked "fixed", the fix is applied at the cited file:line, the bug's reproduction recipe now returns Expected not Actual, no other bugs.md entries were changed, no unrelated code was modified, any tests covering the affected code still pass (or new test added if none existed), version bump applied if extension files touched
```
Replace `<BUG-ID>` with the actual ID (e.g. `S-1`).
### Fix verification gate
Before marking `status: fixed`:
1. **Re-run the bug's reproduction recipe** — must now produce Expected, not Actual
2. **Per-file test re-run:** if `tests/` or `fixtures/` cover the affected file, re-run them, all must pass
3. **If no test existed for the now-fixed behavior:** write one, place under `tests/` or `fixtures/`
4. **If extension code changed:** bump `manifest.json` version (per `feedback_extension_version_bump.md` — one bump per user-requested update, visible reload-verification signal)
5. **Do NOT touch:** any other bug entry, any file marked DO NOT FIX in code comments, any intentional pattern listed above
6. Update the bug entry with `Status: fixed` and a `Fix:` line citing the new file:line of the change
### After completing all fixes in a chunk
Run the chunk's **full test suite**, not just per-file tests. Catches cross-bug interactions (e.g. fix for S-1 in `rcjav/cache.py` interacts with fix for M-2 in `rcjav/dupes.py`).
---
## PHASE 3 — FINAL VERIFICATION
```
/goal all bugs in bugs-*.md files under D:\DEV\Project\rclone-jav\ are marked "fixed", "skipped" (with reason), or "needs-input" (awaiting user); D:\DEV\Project\rclone-jav\verification.md exists confirming a final audit of every modified file finds no new bugs introduced by the fixes; verification.md lists each fixed BUG-ID + its commit/edit and the repro-now-passes proof; every "Mirror check needed in:" entry resolved (either no mirror bug found, or new bug filed in target chunk); manifest.json version is incremented appropriately
```
### verification.md format
```markdown
# Verification — <ISO date>
Original snapshot: audit-snapshot-<ISO>.md
Final snapshot: audit-snapshot-<final ISO>.md
## Fix summary
- S-1 (bugs-python.md): fixed at <file:line>. Repro now returns Expected (was Actual). Test added: <test path>.
- M-1 (bugs-extension-bg.md): fixed at <file:line>. Existing test <name> still passes.
- ...
## Mirror checks resolved
- S-3 mirror in bugs-host.md: scanned `handle_search` for same contract issue, NOT present.
- M-2 mirror in bugs-python.md: FOUND same issue → filed as M-7 in bugs-python.md, fixed at <file:line>.
## Skipped
- L-3 (bugs-host.md): skipped — `<reason>` (e.g. user decision, deferred to next audit)
## Needs input
- N-1 (bugs-extension-options.md): awaiting user clarification on <question>
## Final pass
- Files modified during fix phase: <list>
- Independent re-audit of those files: <date>, <verifier agent>, found 0 new bugs / found <N> new bugs (back to PHASE 1)
- All `bugs-*.md` files: zero entries with status `open`
- Extension manifest.json: version <X> → <Y> (bumped per shipped change)
- All existing tests pass: <test runner output summary>
- Fixture corpus runs: <Python runner + Node runner exit codes>
```
---
## ANTI-HALLUCINATION RULES (enforced — not optional)
1. **No bug without file:line** — line range only acceptable if symptom is genuinely multi-line
2. **No bug without reproduction recipe** with concrete input / expected / actual
3. **Verifier MUST be fresh-context** — same agent re-reading the claim is not independent
4. **Verifier reads only cited contract docs**, not the whole project memory pile — bounded context preserves independence
5. **One bug per fix session** — no batch fixes even for "obviously similar" findings
6. **DO NOT FIX banners + intentional patterns are untouchable** — listed in this doc + AGENTS.md / mockups
7. **Severity is criteria-based, not vibes-based** — Severe = data loss/crash/silent-wrong; Moderate = degraded observable; Light = misleading message / minor
8. **Forbidden hedge language in confirmed bugs:** "could be", "might", "potentially", "in theory", "may cause", "possibly". If you can't trace it concretely, demote to Needs Input or candidate scratch.
9. **No speculative race conditions** — race must have observable user-visible repro recipe, not just "concurrent code path exists"
10. **Reference contracts, not preferences** — bugs cite what code SHOULD do per a doc/comment/test, not what auditor thinks would be nicer
11. **No bug for missing feature** — that's a TODO, goes in `TODO.md` not `bugs.md`
12. **Phase 1 is read-only except audit docs** — see allowed-write list above
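
Rule 8 lends itself to a mechanical pre-check before a bug file is accepted. The following is a minimal sketch, assuming the phrase list from rule 8 and a hypothetical helper name; it flags lines for review and does not decide whether a finding is valid.

```python
import re

# Phrase list copied from rule 8 above.
HEDGE_PHRASES = ("could be", "might", "potentially", "in theory", "may cause", "possibly")

HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(phrase) for phrase in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(text: str) -> list[tuple[int, str]]:
    """Return (line number, phrase) for every forbidden hedge phrase in `text`."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in HEDGE_RE.finditer(line):
            hits.append((line_no, match.group(1).lower()))
    return hits
```

Run it against the filled-in entries only; the template text itself quotes the forbidden phrases and would be flagged.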
---
## Final-pass readability checklist (run before any audit)
Before Phase 1 starts, re-read this doc and verify:
1. Every "intentional pattern" line has been verified against current code OR cites a current doc that exists right now
2. Any old memory/session claim that conflicts with current files has been removed or softened
3. Phase 1 allowed-write list is explicit and current
4. Candidates clearly separated from confirmed bugs (different files, different formats)
5. Verifier prompt includes `contract_refs:` and does NOT include auditor reasoning
6. Stop conditions are present (30% rejection, 50 candidates)
7. Mirror check scope is narrowly defined (contract boundaries only)
8. Excluded paths are current (no missing dirs, no dead refs)
If any check fails, fix this doc before starting audit.
---
## NOTES
- Run audit goals from the CLI project root: `cd D:\DEV\Project\rclone-jav && claude` — even when auditing extension files, output stays in this folder
- Extension folder and CLI folder are separate git repos — verify with `git status` in each before audit so you're auditing a known snapshot
- Per-project memory at `C:\Users\admin\.claude\projects\D--DEV-Project-rclone-jav\memory\` carries feedback rules — read those at audit start, they override default audit behavior
- The extension repo currently has uncommitted modifications (hybrid state from codex's roadmap work + later edits). Snapshot captures this state; option (b) accepts dirty + records what was dirty. No auto-stash.
---
## Appendix — Recommended agent topology (Claude Code / multi-agent runners)
This appendix is OPTIONAL — the plan above is portable to any `/goal`-style runner. If you're running it in Claude Code or a similar multi-agent tool, this section describes how to map the independence + parallelism requirements onto explicit agent calls. Operators using a different runner can ignore this appendix without losing the plan's structure.
### Role map
**Main Coordinator** (the session you start the audit from)
- Owns the snapshot file (`audit-snapshot-<ISO>.md`)
- Launches Chunk Auditor agents (parallel allowed)
- Collects produced `bugs-candidates-<chunk>.md` files
- Launches Verifier agents per candidate (or small batch)
- Promotes CONFIRMED / PARTIAL findings into `bugs-<chunk>.md`
- Drives Phase 2 fix loop one bug at a time
- Launches Final Re-Audit agents in Phase 3
- The only role with write access to multiple files
**Chunk Auditor Agents** (one per scope chunk)
- Canonical agent type: `Explore` (read-only, fast)
- Parallel allowed once snapshot is written
- Inputs: chunk file list, snapshot ID, required-reading docs, this plan's "Known intentional patterns" + "Not on this list — let auditor surface" sections
- Output: `bugs-candidates-<chunk>.md` ONLY (no confirmed-bug writes; coordinator promotes)
- Must cite file:line + candidate repro; hedge language permitted in candidates
- **Must NOT:** edit product code, edit another chunk's candidate file, write to confirmed bug files
**Verifier Agents** (fresh context per candidate, or small candidate batch from same file)
- Canonical agent type: `Explore` (read-only, blind)
- Fresh context — NO prior audit-history visibility
- Inputs (and ONLY these):
- `file:line` of the claim
- Symptom (one sentence)
- Reproduction recipe
- `contract_refs:` list (max 3 docs)
- **Must NOT see:** auditor reasoning, the candidate file as a whole, other candidates, other chunks' findings, this plan's hedge-language rules (verifier only verifies the specific claim)
- Output: one of `CONFIRMED` / `PARTIAL` (with revised repro) / `REFUTED` (with code trace) / `NEEDS-INFO` (with what's missing)
**Fix Phase Agent** (Phase 2)
- Canonical agent type: main coordinator context OR a single write-capable `general-purpose` agent
- Serial — one bug at a time
- No parallel fixes even for "obviously similar" bugs
- Inputs: the one bug entry being fixed, full file context, project memory
- Outputs: code edits, bug entry status update, test additions if needed
- Re-runs the bug's repro recipe and per-file tests before marking fixed
**Final Re-Audit Agents** (Phase 3)
- Canonical agent type: `Explore` (read-only)
- One per modified-file group or per chunk that had fixes
- Inputs: list of files modified during Phase 2, this plan
- Output: confirmation of no new bugs introduced, OR new bug entries if found (which loop back to Phase 1)
### File-ownership rules (prevent merge collisions)
- Each Chunk Auditor owns ONLY its own `bugs-candidates-<chunk>.md`
- Each Verifier writes nothing to disk — returns a structured response to the coordinator
- Coordinator owns `bugs-<chunk>.md`, `audit-snapshot-<ISO>.md`, and `verification.md`
- Fix Phase Agent owns the code files being edited + the bug entry being marked fixed
- No two agents share write access to the same file at any time
### Parallelism rules
- **Phase 1:** chunks may be audited in parallel ONLY after the snapshot is written. Parallel auditors must not edit product code or each other's output files. Coordinator dispatches all 5 chunk Agent calls in a single message for max throughput.
- **Verifier dispatch:** within a chunk, verifiers for distinct candidates may run in parallel. Verifiers for candidates that cite the SAME file must run sequentially (avoids verifier-context cross-contamination if a verifier loads file context that affects another).
- **Phase 2:** strictly serial. One bug per Agent call. No parallelism.
- **Phase 3:** re-audit agents may run in parallel by file group.
### Canonical Agent tool calls (Claude Code specific)
Coordinator-level pseudocode:
```
# Phase 1 — parallel chunk audit
Agent(subagent_type="Explore", description="Audit chunk 1 Python CLI",
      prompt="<chunk 1 inputs + this plan's required reading + intentional patterns + output target>")
Agent(subagent_type="Explore", description="Audit chunk 2 native host", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 3 ext SW+content", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 4 ext options", prompt="<...>")
Agent(subagent_type="Explore", description="Audit chunk 5 ext popup+bulk", prompt="<...>")
# all 5 dispatched in one message → run in parallel

# Phase 1 — verifier per candidate
for candidate in bugs-candidates-<chunk>.md:
    Agent(subagent_type="Explore", description=f"Verify {candidate.id}",
          prompt="<file:line + symptom + repro + contract_refs ONLY — no auditor reasoning>")

# Phase 2 — serial fix loop
for bug in confirmed_bugs_sorted_by_severity:
    Agent(subagent_type="general-purpose", description=f"Fix {bug.id}",
          prompt="<single bug entry + repro + verification gate rules>")
    # wait for completion, verify repro now passes, mark fixed

# Phase 3 — final re-audit
for modified_file_group in fix_phase_diff:
    Agent(subagent_type="Explore", description=f"Re-audit {modified_file_group}", prompt="<...>")
```
### Anti-correlation rules (preserve verifier independence)
- Coordinator must NOT pass auditor reasoning to verifier — only the structured claim
- Coordinator must NOT pass the candidate file's full text to verifier — only the one candidate's fields
- Each verifier call is a fresh `Agent` invocation — never reuse a verifier agent across candidates
- If a verifier rejects a claim, do NOT immediately re-verify with another agent hoping for CONFIRMED — that's correlation-chasing. Demote the candidate to REFUTED, log in candidates file, move on.
- Track verifier rejection rate per chunk (see Stop Conditions). If rejection >30%, the auditor's threshold is wrong, not the verifiers'.
+158
@@ -0,0 +1,158 @@
# Candidate Findings — Extension SW + content + manifest — audit-snapshot-2026-05-24T15-55Z.md
Scope: background.js + content.js + manifest.json
Required-reading: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor: fresh Explore agent (read-only Phase 1)
---
## Candidate C-1: Race condition in maybeNotifyHostError rate-limiting
- File: background.js:188-193
- Hunch: Concurrent recordRpc calls could trigger multiple notifications within 10min due to get-then-set race.
- Trace: Two simultaneous host errors invoke recordRpc() → maybeNotifyHostError(). Both read HOST_ALERT_KEY before either writes. Both pass the now-lastTs check and both post.
- Question for verifier: Does code guarantee only one alert fires per 10min window under concurrent error paths?
- Contract refs needed: Race condition definition in bug-audit-plan.md; Chrome storage.local atomicity
---
## Candidate C-2: pending Map orphaned on SW eviction mid-call
- File: background.js:90, 124-148, 307-365
- Hunch: If SW evicts between request send and response, next instance has empty pending Map. Response arrives with no matching req_id.
- Trace: Send stores pending.set(reqId, {resolve, reject}). SW evicts. New SW has empty pending. Response at line 124-127 finds no match, dropped.
- Question for verifier: Does keepalive (20s pulse) reliably prevent SW eviction during full 90s timeout on slow remote?
- Contract refs needed: MV3 SW eviction timing vs NATIVE_CALL_TIMEOUT_MS (90s)
---
## Candidate C-3: mergeSettings shallow-merge loses missing nested keys
- File: background.js:62-76
- Hunch: Deep-merge is one level, but if stored.triggers is incomplete, Object.assign(dv, sv) loses keys not in sv.
- Trace: dv={autoPageLoad:true, autoKnownSites:false, …7 keys}. sv={autoPageLoad:true}. Result is only {autoPageLoad:true}.
- Question for verifier: Does incomplete settings blob correctly populate all missing triggers keys from defaults?
- Contract refs needed: DEFAULT_SETTINGS.triggers shape vs loaded partial settings
---
## Candidate C-4: Discord webhook URL regex insufficient
- File: background.js:232
- Hunch: Regex validates only schema+domain, not mandatory ID. URL https://discord.com/api/webhooks/ (no ID) passes validation.
- Trace: /^https:\/\/(?:discord\.com|discordapp\.com)\/api\/webhooks\//.test(url) matches prefix only.
- Question for verifier: Should regex enforce numeric ID after /api/webhooks/?
- Contract refs needed: Discord webhook URL format spec
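
The prefix-only behavior in the trace can be reproduced outside the extension. The sketch below ports the quoted pattern to Python's `re`; the stricter pattern is a hypothetical alternative (numeric ID followed by a token segment), not the extension's code and not a verified statement of Discord's format.

```python
import re

# Port of the pattern quoted in the trace: matches the prefix only.
PREFIX_ONLY = re.compile(r"^https://(?:discord\.com|discordapp\.com)/api/webhooks/")

# Hypothetical stricter check: numeric ID, then a non-empty token segment.
STRICTER = re.compile(
    r"^https://(?:discord\.com|discordapp\.com)/api/webhooks/\d+/[\w-]+$"
)

no_id = "https://discord.com/api/webhooks/"
with_id = "https://discord.com/api/webhooks/123456789012345678/abc-DEF_123"

print(bool(PREFIX_ONLY.match(no_id)), bool(STRICTER.match(no_id)))
print(bool(PREFIX_ONLY.match(with_id)), bool(STRICTER.match(with_id)))
```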
---
## Candidate C-5: postDiscordAlert silently swallows all errors
- File: background.js:215, 268-272
- Hunch: .catch(() => {}) suppresses Discord errors with no logging or diagnostics visibility.
- Trace: Line 215 swallows all exceptions. Function catches fetch errors at 268 and records in lastDiscordSend, but callers don't see them.
- Question for verifier: Should Discord errors be logged to native RPC log or is silent swallow intentional?
- Contract refs needed: AGENTS.md or alerting design docs
---
## Candidate C-6: contextMenu handler doesn't validate tab.id
- File: background.js:895-905
- Hunch: Handler checks if (!tab) but not if tab.id is null. Missing tab.id will fail silently in checkTab.
- Trace: Line 896 checks tab, not tab.id. Line 898 calls checkTab(tab) which calls extractIdFromTab(tab) which calls chrome.tabs.sendMessage(tab.id).
- Question for verifier: Can contextMenus.onClicked pass tab with null id?
- Contract refs needed: Chrome contextMenus API contract
---
## Candidate C-12: FC2 ID regex minimum 4 digits too strict
- File: src/shared/id-extract.js:16-20
- Hunch: FC2-PPV normalizer requires 4+ digits. Pages with FC2-PPV-123 silently fail to match.
- Trace: Line 17 regex /\bFC2-?PPV-?(\d{4,})\b/i. Line 19 /\bFC2-(\d{4,})\b/i. Both need 4+ digits minimum.
- Question for verifier: Is 4-digit minimum intentional or should it be 3+?
- Contract refs needed: Real FC2 ID formats in AGENTS.md
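
The digit-count behavior in the trace can be checked in isolation. This sketch ports the first quoted pattern to Python's `re`; the sample IDs are made up for illustration and say nothing about which lengths occur in practice.

```python
import re

# Port of the line-17 pattern quoted in the trace.
FC2_PPV_RE = re.compile(r"\bFC2-?PPV-?(\d{4,})\b", re.IGNORECASE)


def extract_fc2_digits(text: str):
    """Return the captured digit run, or None when the pattern does not match."""
    match = FC2_PPV_RE.search(text)
    return match.group(1) if match else None


print(extract_fc2_digits("FC2-PPV-123"))      # three digits
print(extract_fc2_digits("fc2ppv1234567"))    # seven digits, no hyphens
```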
---
## Candidate C-15: recordRpc read-modify-write race loses entries
- File: background.js:162-169
- Hunch: Concurrent recordRpc calls uncoordinated. Two errors get same old log, prepend differently, second write overwrites first.
- Trace: get(NATIVE_LOG_KEY), prepend entry, set. If two calls concurrent, second set overwrites first's entry.
- Question for verifier: Does Chrome storage.local.set serialize, or can concurrent calls lose entries?
- Contract refs needed: Chrome storage.local atomicity guarantees
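
The lost-update shape described in the trace can be illustrated with a Python `asyncio` analogy. `FakeStorage` is a stand-in with an awaitable get and set, not `chrome.storage.local`, so this shows the pattern only and proves nothing about Chrome's behavior; the lock-based variant is one hypothetical mitigation.

```python
import asyncio


class FakeStorage:
    """In-memory stand-in whose get/set each yield to the event loop once."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        await asyncio.sleep(0)
        return list(self._data.get(key, []))

    async def set(self, key, value):
        await asyncio.sleep(0)
        self._data[key] = list(value)


async def record_unsafe(storage, entry):
    """Read-modify-write with no coordination between callers."""
    log = await storage.get("log")
    log.insert(0, entry)
    await storage.set("log", log)


async def record_locked(storage, lock, entry):
    """Same operation, serialized through a shared lock."""
    async with lock:
        await record_unsafe(storage, entry)


async def demo():
    unsafe = FakeStorage()
    await asyncio.gather(record_unsafe(unsafe, "a"), record_unsafe(unsafe, "b"))

    safe = FakeStorage()
    lock = asyncio.Lock()
    await asyncio.gather(record_locked(safe, lock, "a"), record_locked(safe, lock, "b"))
    return await unsafe.get("log"), await safe.get("log")
```

In the unsafe run both callers read the empty log before either writes, so one entry is overwritten; with the lock, the second caller reads only after the first write has landed.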
---
## Candidate C-19: ensureContextMenu not called on SW init
- File: background.js:766-782
- Hunch: Context menu may not recreate after SW eviction if only called on settings changes.
- Trace: ensureContextMenu defined at line 766. Need to verify it's called at SW startup.
- Question for verifier: Is ensureContextMenu called during SW initialization?
- Contract refs needed: MV3 context menu persistence after SW eviction
---
## Candidate C-20: escapeOverlay function not found in content.js
- File: content.js (various lines use escapeOverlay but definition not visible)
- Hunch: showOverlay calls escapeOverlay at multiple points but function is not defined in the 509-line content.js.
- Trace: Lines 374-400 call escapeOverlay(...) but no definition found.
- Question for verifier: Is escapeOverlay defined in content.js or missing?
- Contract refs needed: content.js full file review; module dependencies
---
## Additional light candidates (lower priority, but noted):
C-7: nativePort assignment race (theory-level, JS atomic in practice)
C-9: Silent catch in recordActivity (observability trade-off)
C-11: Content message validation (trusted sender, acceptable design trade-off)
C-14: keepaliveTimer stale reference (uncommon race, low impact)
C-18: badgeSpinners leak under heavy load (unlikely, has onRemoved handler)
---
## Triage Summary
**High-priority for verification**: C-1, C-2, C-3, C-4, C-5, C-6, C-12, C-15, C-19, C-20
**Medium-priority (design review)**: C-7, C-9, C-11, C-14, C-18
**Focus areas for verifier**:
1. Concurrency safety of storage get-then-set patterns
2. Service worker eviction + pending request handling
3. Settings merge correctness with partial updates
4. Input validation in all entry points
---
## VERIFIER NOTES (appended after Phase 1 verification)
### C-2 (SW eviction + orphaned pending) — REFUTED
First verifier returned CONFIRMED, but did not consult Chrome `connectNative` keepalive contract.
Re-verifier (with explicit contract ref to https://developer.chrome.com/docs/extensions/develop/concepts/service-workers/lifecycle) returned REFUTED, high confidence.
Key finding: an open `connectNative` port keeps the MV3 SW alive per docs. If port closes, `onDisconnect` fires and rejects all pending (background.js:139). The orphaned-pending-Map scenario cannot occur under documented Chrome contract. The in-code `pulseKeepalive` is defensive redundancy, not load-bearing.
Caveat: if Brave is observed diverging from this contract, the symptom could manifest as a Brave-specific bug — would need a Brave-observed SW restart trace while a native port stayed active. NOT verified here.
Final status: REFUTED. Removed from bugs-extension-bg.md.
### C-15 (recordRpc race) — CONFIRMED
Verifier returned CONFIRMED, high confidence. Promoted to bugs-extension-bg.md as S-1.
### CHUNK 3 MODERATE VERIFICATION RESULTS (after stricter prompt)
- C-1 (maybeNotifyHostError rate-limit race) — CONFIRMED, M, promoted as M-1 in bugs-extension-bg.md
- C-3 (mergeSettings shallow-merge) — REFUTED. Auditor misread Object.assign arg order; defaults fill missing keys correctly. Discarded.
- C-5 (Discord errors swallowed) — PARTIAL, demoted M→L. lastDiscordSend storage write present; only passive UI display missing. Promoted as L-1.
- C-6 (contextMenu tab.id null) — REFUTED. Chrome contract guarantees non-null tab.id for registered contexts. extractIdFromTab also has defensive null check. Discarded.
- C-19 (ensureContextMenu post-eviction) — CONFIRMED, M (very high confidence). Promoted as M-2.
- C-20 (escapeOverlay undefined) — REFUTED. Function defined at content.js:451. Auditor missed it. Discarded.
CHUNK 3 CALIBRATION SUMMARY:
- Severe rejection: 1/2 = 50%
- Moderate rejection: 3/6 = 50%
- Combined: 4/8 = 50%
- Stop condition (>30% rejection) TRIGGERED. Chunk 3 audit should be restarted with tighter framing OR Light candidates should NOT be verified per current pass.
- Calibration learning: auditor over-claimed by reading code in isolation without checking platform API contracts (Chrome lifecycle, contextMenus, storage atomicity). Stricter verifier prompt with explicit contract requirement caught 3 of 4 false positives.
+151
@@ -0,0 +1,151 @@
# Candidate Findings — Extension Options pages — audit-snapshot-2026-05-24T15-55Z.md
Scope: `src/options/*` (+ `src/shared/*` if referenced)
Required-reading: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor: fresh Explore agent
---
## Candidate C-1
- **File:** `src/options/options.js:492-525` (SETTINGS_SCHEMA definition)
- **Hunch:** SETTINGS_SCHEMA includes all load/save keys; no asymmetry.
- **Trace:** Lines 191-192 in `load()` read `settings.siteAdapters` and `settings.idNormalizers`, and lines 234-235 in `save()` write them back. SETTINGS_SCHEMA at lines 518, 520 includes both keys. Schema is complete.
- **Question for verifier:** Confirm SETTINGS_SCHEMA includes all keys that load/save cycle uses.
- **Contract refs needed:** none
---
## Candidate C-2
- **File:** `src/options/options-dupe-review.js:561-628` (loadKeepRanking and save)
- **Hunch:** `loadKeepRanking()` runs once at module load (line 628), not when pane is activated. External changes to keep-ranking won't appear until page reload.
- **Trace:** Line 628 calls loadKeepRanking() at module top level. No listener on pane activation (options.js line 37) calls it again. If another tab changes keep-ranking via RPC, this pane won't refresh until reload.
- **Question for verifier:** Should pane activation re-fetch keep-ranking to catch external changes?
- **Contract refs needed:** none
---
## Candidate C-3
- **File:** `src/options/options.js:110-122` (openModal / closeModal)
- **Hunch:** No coordination between modals. Two rapid opens could show two modals simultaneously.
- **Trace:** `openModal(id)` adds "open" class and sets aria-hidden=false. No check if another modal is already open. Multiple modals could be marked "open" at the same time.
- **Question for verifier:** Should only one modal be visible at a time, or is simultaneous open allowed?
- **Contract refs needed:** none
---
## Candidate C-4
- **File:** `src/options/options.js:300` (setNote function)
- **Hunch:** `setNote()` calls `el.innerHTML = html` without sanitization. Assumes caller sanitizes before passing.
- **Trace:** All current callers (lines 308-337) build HTML with escapeHtml() before inserting. So current usage is safe. But setNote() is not a safe setter — it's a raw innerHTML setter.
- **Question for verifier:** Is setNote intended as a safe setter that escapes its input, or a raw setter expecting pre-sanitized input?
- **Contract refs needed:** none
---
## Candidate C-5
- **File:** `src/options/options.js:533-546` (sanitizeImportedSettings)
- **Hunch:** Array elements are not recursively validated. Example: `siteAdapters: [{ host: 123, selector: [] }]` would pass because the outer type is "array".
- **Trace:** SETTINGS_SCHEMA checks outer type (line 542) but not inner element types. Comment at line 491 says nested objects get recursive validation, but code doesn't implement it for arrays.
- **Question for verifier:** Should imported arrays validate element types, or is current lenient behavior acceptable?
- **Contract refs needed:** none
---
## Candidate C-6
- **File:** `src/options/options.js:210-286` (save function)
- **Hunch:** Save persists to chrome.storage but messages to background.js are fire-and-forget. No confirmation that background applied the settings.
- **Trace:** Line 256 awaits chrome.storage.sync (safe). Lines 261, 278 send messages without waiting for response. If background crashes, settings persist but running extension uses stale config.
- **Question for verifier:** Should save() wait for background.js acknowledgment?
- **Contract refs needed:** none
---
## Candidate C-7
- **File:** `src/options/options-cache.js:113-137` (renderCacheContractBanner)
- **Hunch:** Unrecognized cache_state values silently return empty string instead of showing error.
- **Trace:** Lines 118, 121, 131 test cache_state against known literals. If host sends unexpected state (e.g., `"unknown_state"`), no if matches and line 136 returns "". No error banner shown.
- **Question for verifier:** Should unrecognized cache_state trigger an error banner?
- **Contract refs needed:** none
---
## Candidate C-8
- **File:** `src/options/options.js:386-416` (export/import keep-ranking)
- **Hunch:** Export fails silently if keep-ranking RPC fails. Exported file has empty `_meta.host_config.keep_ranking`. On import, user gets warning but import proceeds anyway. Asymmetric: backup loses data without obvious indication.
- **Trace:** Lines 391-395 try-catch the RPC silently. If it fails, hostConfig.keep_ranking is never set. Export completes anyway (line 405-414). On import, user sees warning at line 560 but can proceed. Keep-ranking is lost in backup/restore cycle.
- **Question for verifier:** Should export fail or warn prominently if keep-ranking cannot be fetched?
- **Contract refs needed:** none
---
## Candidate C-9
- **File:** `src/options/options.js:351-372` (delete-enable-modal flow)
- **Hunch:** If user checks enableDelete and navigates away before confirming modal, box stays unchecked with no way to retry except page reload.
- **Trace:** Line 352-353 check→uncheck→open-modal. If user closes modal without confirming, checkbox is false and no path to re-open the modal exists. Would need page reload or clicking box again.
- **Question for verifier:** Should modal be re-openable from checked state without page reload?
- **Contract refs needed:** none
---
## Candidate C-10
- **File:** `src/options/options-library-issues.js:134-143` (makeReportRow rendering)
- **Hunch:** `entry.path.split("/")` could throw if entry.path is null or not a string.
- **Trace:** Line 134 `fname = entry.filename || entry.path.split("/").pop()`. If entry.filename is falsy and entry.path is null, .split() throws. Entry.path comes from RPC; malformed response could crash render.
- **Question for verifier:** Should there be null/type check for entry.path before .split()?
- **Contract refs needed:** none
---
## Candidate C-11
- **File:** `src/options/options.js:374-382` (input/change listeners)
- **Hunch:** Event listeners call updateSectionSummaries() which reads from DOM. If multiple panes render simultaneously and input fires from hidden pane, stale element reads could occur.
- **Trace:** Lines 374-375 listen for "input" and "change" on panes. Delegated check at line 378 ensures only active pane events fire, but race condition possible if multiple panes render concurrently.
- **Question for verifier:** Is there a render race if load() initializes all panes and events fire before page ready?
- **Contract refs needed:** none
---
## Candidate C-12
- **File:** `src/options/options-library-issues.js:120-131` (makeRow - library issues)
- **Hunch:** The makeRow function for bracket_id and nohyphen_id rows sets data-issue at line 123, which is later checked by _canRenameIdFixRow() line 60. If makeRow is called with malformed entry data, the row might be created but with missing data attributes, making it silently non-renamable.
- **Trace:** Line 120-131 creates row HTML and sets data-issue to entry.issue at line 123. The entry.issue comes from the response. If rendering bracket entries with undefined entry.issue, the row would be created but unclickable (no rename).
- **Question for verifier:** Should missing entry.issue in bracket/nohyphen entries trigger an error, or is silent disable acceptable?
- **Contract refs needed:** none
---
## Summary Stats
- **Total candidates:** 12
- **Severity breakdown:** L (7), M (4), N (1)
- **Areas affected:** stale state (1), modal visibility (1), HTML safety (2), validation (2), error handling (3), RPC failures (1), null safety (1), concurrency (1), missing attributes (1)
---
## VERIFIER NOTES (Phase 1 Moderate verification, stricter prompt + UI-inconvenience rule + storage-quota awareness)
- C-5 (array element validation) — CONFIRMED M. Auditor right; downstream crash in tryAdapters confirmed via content.js read. Promoted as M-1.
- C-6 (save fire-and-forget) — REFUTED. getSettings reads fresh from storage every call; no in-memory cache to invalidate. sendMessage wakes SW per MV3 spec.
- C-8 (export silent fail) — CONFIRMED M, high confidence. Backup→restore cycle silently loses user-typed config. Promoted as M-2.
- C-10 (entry.path crash) — REFUTED. Host's _cache_entry contract guarantees path always non-null string. Unreachable in normal operation.
CHUNK 4 CALIBRATION:
- Severe: 0 (none flagged)
- Moderate rejection: 2/4 = 50% (stop condition >30% triggered)
- Combined: 2/4 = 50%
- Auditor weaknesses: (1) flagging fire-and-forget message patterns without checking if downstream caches, (2) ignoring host-side schema contracts that prevent null/malformed data reaching JS
- L candidates NOT verified per stop condition. Revisit only if needed.
+97
@@ -0,0 +1,97 @@
# Candidate Findings — Extension Popup + Bulk Check — audit-snapshot-2026-05-24T15-55Z.md
Scope: src/popup/* + src/bulk-check/* + src/shared/id-extract.js
Required-reading: ext AGENTS.md / bug-audit-plan.md
Auditor: fresh Explore agent (Phase 1 audit)
## Candidate C-1: Popup closes before open-bulk-check message completes
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:563-565`
- **Hunch:** Popup window closes immediately after sending `open-bulk-check` message without callback, risking message loss if popup is killed before IPC serialization.
- **Trace:** Line 564 sends message with NO callback, line 565 immediately closes popup.
- **Question for verifier:** Does Chrome guarantee sendMessage() is queued before window.close() returns?
- **Contract refs needed:** Chrome Runtime API message passing guarantees
## Candidate C-2: Delete confirmation allows stale expectedId from prior modal session
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:318-372`
- **Hunch:** Global `expectedId` set when hit is selected. If modal closes without selection and reopens, old `expectedId` persists and could allow false-positive match.
- **Trace:** `expectedId` is global line 316, set in selectHit() line 353, but NOT reset in openDeleteModal().
- **Question for verifier:** Can modal open without selecting a hit, leaving stale `expectedId` that allows unintended validation?
- **Contract refs needed:** Modal state management
## Candidate C-3: Manual search captures t0 before SW eviction, causing misleading timings
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:452-470`
- **Hunch:** `t0 = performance.now()` captured at send time. If SW evicted/restarted during 90s timeout, callback fires with unbounded `total_ms`.
- **Trace:** Lines 452 and 468 show t0 captured and used to synthesize total_ms after callback.
- **Question for verifier:** When SW dies mid-request, can callback fire with unbounded total_ms contradicting host query time?
- **Contract refs needed:** SW lifecycle, AGENTS.md timeout handling
## Candidate C-4: History chip click executes search without checking modal state
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:423-425`
- **Hunch:** History chip click calls runManualSearch() unconditionally. If delete modal open with chosenHit, search executes while modal remains visible with stale state.
- **Trace:** Line 425 calls runManualSearch() with no check for open modals.
- **Question for verifier:** Should clicking history chip close delete modal first?
- **Contract refs needed:** Modal lifecycle specification
## Candidate C-5: Search Clear button does not close delete modal
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:479-484`
- **Hunch:** Clear button resets manualMode and calls runCheck() but does NOT close delete modal, leaving it open with stale chosenHit.
- **Trace:** Lines 479-484 show no $overlay.style.display = "none"
- **Question for verifier:** If user opens delete modal then clicks Clear, does modal remain open in invalid state?
- **Contract refs needed:** Modal lifecycle specification
## Candidate C-6: Bulk-check uses innerHTML with escaped fields (fragile pattern)
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\bulk-check\bulk-check.js:26-44`
- **Hunch:** All template fields are escaped currently, but innerHTML pattern is fragile. If future code adds unescaped response fields, XSS possible.
- **Trace:** Lines 32-44 show escapeHtml() called on fields, but innerHTML assignment could miss new fields.
- **Question for verifier:** Are all response fields (e.g., cache_meta, scanned_remotes) properly escaped?
- **Contract refs needed:** XSS contract, host response schema
## Candidate C-7: Profile selector change triggers search without canceling in-flight request
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:612-622`
- **Hunch:** Profile change triggers new search without AbortController. Old search callback could fire after new one, rendering stale (old-profile) results.
- **Trace:** Lines 620-621 call runManualSearch/runCheck with no request ID or AbortController.
- **Question for verifier:** If user changes profiles while search in-flight, can old callback render stale results?
- **Contract refs needed:** Message passing race condition contract
## Candidate C-8: Bulk-check does not warn when query count exceeds 250-query limit
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\bulk-check\bulk-check.js:13-18`
- **Hunch:** readBulkIds() deduplicates but does NOT enforce 250-query limit. User pastes 300 IDs, UI shows "300 unique IDs", host silently truncates to 250.
- **Trace:** No limit enforcement in readBulkIds(). Host limit at rcjav-host.py line 818: queries[:250]
- **Question for verifier:** Should UI warn when input exceeds 250 IDs?
- **Contract refs needed:** Host bulk_search limit contract
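A minimal sketch of how the host-side cap could report truncation instead of dropping entries silently. The helper name `cap_queries` and the returned dropped-count are assumptions for illustration; the trace above only confirms the `queries[:250]` slice in rcjav-host.py.

```python
MAX_BULK_QUERIES = 250  # matches the queries[:250] slice cited in the trace


def cap_queries(queries, limit=MAX_BULK_QUERIES):
    """Return (kept, dropped_count) so the caller can surface a warning.

    Hypothetical helper: the real host may structure its response differently.
    """
    kept = queries[:limit]
    return kept, len(queries) - len(kept)


if __name__ == "__main__":
    ids = [f"ABC-{n:03d}" for n in range(300)]
    kept, dropped = cap_queries(ids)
    print(len(kept), dropped)
```

With a dropped count in the response, the UI could show a warning rather than reporting the full pasted count as searched.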
## Candidate C-9: Undo modal button success feedback only visible for 1.2 seconds
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:536-543`
- **Hunch:** Button text changes to "✓ restored" but modal auto-closes after 1.2s. May be too brief for user to see success state.
- **Trace:** Line 536 sets button text to checkmark, line 541 closes modal after only 1200ms timeout.
- **Question for verifier:** Is 1.2s timeout intentional, or should success state remain visible longer?
- **Contract refs needed:** Undo UX specification
## Candidate C-10: Cache banner age uses Math.round, hiding near-boundary staleness
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:604`
- **Hunch:** Math.round(24.4) = 24, making 24.4h cache appear as 24h old. Should use Math.ceil for conservative rounding.
- **Trace:** Line 604 uses Math.round() on oldest.age_hours
- **Question for verifier:** Should cache age always round up to conservatively display staleness?
- **Contract refs needed:** Cache freshness display contract
---
## Summary: 10 candidates found
Most critical: C-1 (message race), C-2 (stale state), C-7 (search race)
---
## VERIFIER NOTES (Phase 1 Moderate verification, stricter prompt + UI-lifecycle rule + bulk-check window awareness)
- C-1 (open-bulk-check race) — REFUTED. Chrome runtime guarantees fire-and-forget delivery before sender unload.
- C-2 (stale expectedId) — PARTIAL → demoted M to L. Delete RPC uses chosenHit (reset every open); typing-validation just gates UI button; no wrong-delete possible. Promoted as L-1.
- C-4 (history chip during modal) — PARTIAL → demoted M to L. chosenHit is reference; delete still correct file. Cosmetic UI confusion only. Promoted as L-2.
- C-5 (Clear button modal) — CONFIRMED M, high confidence. Modal stays open; no Esc handler; delete still works but UX broken. Promoted as M-1.
- C-7 (profile selector race) — CONFIRMED M, high confidence. Real race, stale results stick. Promoted as M-2.
CHUNK 5 CALIBRATION:
- Severe: 0 (none flagged)
- Moderate raw rejection: 1/5 pure REFUTED, 2/5 demoted = 60% downgrade rate
- Combined: stop condition triggered
- Auditor weaknesses: flagging timing races without checking (a) platform contracts, (b) object reference vs rebuilt-from-current-results, (c) self-correcting cosmetic-only effects
- L candidates NOT verified per stop condition. Revisit only if needed.
+121
@@ -0,0 +1,121 @@
# Candidate Findings — Native host — audit-snapshot-2026-05-24T15-55Z.md
Scope: rcjav-host.py + rcjav-host.bat + register-host.bat + install-host.ps1
Required-reading: AGENTS.md / mockup / CACHE_CONTRACT.md / bug-audit-plan.md
Auditor: fresh Explore agent
---
## Candidate C-1
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:1216-1221
- Hunch: Path allowlist check is case-sensitive; if rclone remote names are case-insensitive, a path with different-case remote could bypass the security check.
- Trace: _path_in_allowed_prefixes at line 1216 normalizes path slashes but not case. Line 1219 compares path_norm == prefix without .lower(). If extension passes CQ:JAV/file and allowlist has cq:JAV, the check fails.
- Question for verifier: Are rclone remote names case-insensitive, and can a case-mismatch bypass the allowlist?
- Suggested severity: M (potential security bypass if rclone treats remotes case-insensitively)
- Contract refs needed: none
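To make the case-sensitivity question concrete, here is a sketch of a case-sensitive prefix gate. It is an assumption about the shape of `_path_in_allowed_prefixes`, not a copy of it; only the slash normalization and the exact-match comparison come from the trace.

```python
def path_in_allowed_prefixes(path, prefixes):
    """Case-sensitive allowlist check (illustrative reconstruction only)."""
    path_norm = path.replace("\\", "/")
    for prefix in prefixes:
        p = prefix.replace("\\", "/").rstrip("/")
        # Exact match, or a match on a full path-segment boundary.
        if path_norm == p or path_norm.startswith(p + "/"):
            return True
    return False


if __name__ == "__main__":
    allow = ["cq:JAV"]
    print(path_in_allowed_prefixes("cq:JAV/file.mp4", allow))
    print(path_in_allowed_prefixes("CQ:JAV/file.mp4", allow))
```

In this sketch a case mismatch makes the check return False, so the request is rejected rather than let through.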
## Candidate C-2
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:306-316
- Hunch: read_message() reads a 4-byte length prefix without validating max size; a sender could cause huge memory allocation.
- Trace: Line 312 reads length as unsigned int; line 313 reads exactly that many bytes with no cap. If msg_len = 0xFFFFFFFF, attempts to allocate 4 GiB.
- Question for verifier: Is there a practical max message size enforced by the browser before the host receives it?
- Suggested severity: M (DoS via huge length prefix; host could crash)
- Contract refs needed: none
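If a host-side cap were wanted as defense in depth, it could look roughly like the sketch below. The constant `MAX_MESSAGE_BYTES`, its value, and the error handling are assumptions; the real `read_message()` is not shown in this report.

```python
import io
import json
import struct

MAX_MESSAGE_BYTES = 64 * 1024 * 1024  # assumed cap, chosen for illustration


def read_message(stream, max_bytes=MAX_MESSAGE_BYTES):
    """Read one length-prefixed JSON message; return None on clean EOF."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    # Native messaging uses a 32-bit length in native byte order.
    (msg_len,) = struct.unpack("=I", header)
    if msg_len > max_bytes:
        raise ValueError(f"message length {msg_len} exceeds cap {max_bytes}")
    body = stream.read(msg_len)
    if len(body) < msg_len:
        raise ValueError("truncated message body")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    payload = json.dumps({"cmd": "ping"}).encode("utf-8")
    framed = io.BytesIO(struct.pack("=I", len(payload)) + payload)
    print(read_message(framed))
```

Rejecting before the body read means an oversized prefix never triggers the large allocation.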
## Candidate C-3
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:174-217 (post_discord_alert)
- Hunch: Discord webhook URL validation at line 182 checks format but not reachability; blocking urllib.request.urlopen at line 209 with 5-second timeout could delay RPC response if webhook is unreachable.
- Trace: Line 182 regex validates URL format. Line 209 calls urllib.request.urlopen with timeout=5. If URL is malformed or unreachable, the attempt blocks for up to 5 seconds.
- Question for verifier: Are webhook alerts fired on the main message loop, potentially blocking RPC response?
- Suggested severity: M (RPC delay if alerts fire synchronously on main loop)
- Contract refs needed: none
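One way to keep a slow webhook off the main message loop is to hand the blocking call to a daemon thread. The sketch below assumes a generic `post_fn` callable standing in for the real `urllib.request.urlopen` call; `post_alert_async` is a hypothetical name.

```python
import threading


def post_alert_async(post_fn, message):
    """Run a blocking post function on a daemon thread and return at once."""

    def worker():
        try:
            post_fn(message)
        except Exception:
            # An alert failure should never take down the host loop.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    sent = []
    t = post_alert_async(sent.append, "host error")
    t.join(timeout=5)
    print(sent)
```

The caller gets control back immediately; any rate limiting would still need to happen before the thread is spawned.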
## Candidate C-4
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2235-2264 (handle_scan)
- Hunch: handle_scan returns success before spawn thread completes, so if subprocess.Popen fails in _scan_worker, extension incorrectly sees "started": true.
- Trace: Line 2260-2264 spawns thread with .start() and immediately returns {"ok": True, "started": True}. If Popen fails in _scan_worker (line 2092), exception is caught and logged but message loop already returned.
- Question for verifier: If subprocess.Popen raises immediately, does extension see success but scan never started?
- Suggested severity: M (misleading response; race condition on success status)
- Contract refs needed: none
## Candidate C-5
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2053-2227 (_scan_worker)
- Hunch: Blocking for loop reading stderr (line 2101) with no timeout; if rc-jav hangs, progress updates freeze until 5-second deferred kill.
- Trace: Line 2101 `for raw in proc.stderr:` blocks on each line. No timeout. If rc-jav stalls, loop blocks. Deferred kill fires after 5 seconds (line 2297-2302).
- Question for verifier: Does stderr blocking cause observable 5-second stalls in progress updates if rc-jav hangs mid-output?
- Suggested severity: M (observable delay; progress freezes up to 5 seconds)
- Contract refs needed: none
## Candidate C-6
- File: D:\DEV\Extensions\Production\rclone-jav\host\install-host.ps1:47
- Hunch: Extension ID validation regex ^[a-p]{32}$ silently rejects invalid IDs without warning; user may not notice typos in manually-entered IDs.
- Trace: Line 47 filters IDs by regex; if ID doesn't match, it's silently skipped. No warning printed.
- Question for verifier: Should rejected IDs trigger a warning, or is silent skipping acceptable?
- Suggested severity: L (silent failure; user might not notice ID was rejected)
- Contract refs needed: none
## Candidate C-7
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:404-432 (run_rcjav)
- Hunch: Hardcoded PYTHON = "python" without verifying it exists on PATH; error message is generic if python is not found.
- Trace: Line 416 uses bare PYTHON string. Line 431 returns generic error str(e) if subprocess fails.
- Question for verifier: If python is not on PATH, is the error message clear enough to diagnose?
- Suggested severity: L (error is returned; message could be more specific)
- Contract refs needed: none
## Candidate C-8
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:1100-1131 (_patch_cache_remove_paths)
- Hunch: Orphaned .tmp files from failed cache writes are never cleaned up.
- Trace: Line 1126-1128 writes temp and replaces. If replace fails at line 1128, OSError caught at line 1129, but .tmp file remains on disk.
- Question for verifier: Is resource leak of orphaned .tmp files acceptable?
- Suggested severity: L (resource leak; no data loss)
- Contract refs needed: none
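A sketch of a temp-then-replace write that cleans up its temp file on failure. The function name `write_json_atomic` and the `.tmp` naming are assumptions; the actual `_patch_cache_remove_paths` code is not reproduced here.

```python
import json
import os
from pathlib import Path


def write_json_atomic(target, data):
    """Write JSON via a sibling .tmp file, removing the temp on any failure."""
    target = Path(target)
    tmp = target.with_name(target.name + ".tmp")
    try:
        tmp.write_text(json.dumps(data), encoding="utf-8")
        os.replace(tmp, target)
    finally:
        # After a successful replace the temp name no longer exists;
        # after a failed one this removes the orphan.
        tmp.unlink(missing_ok=True)
```

The `finally` block still lets the original OSError propagate, so the caller can log it as before.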
## Candidate C-9
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:323-356 (_shrink_response)
- Hunch: Placeholder "TBD" in line 343 suggests unresolved work, but code later resolves it; confusing but not a functional bug.
- Trace: Line 343 appends "structured TBD"; line 355 appends actual count. "TBD" is always replaced.
- Question for verifier: Can "TBD" ever remain in the final truncated_reason?
- Suggested severity: L (misleading message; no functional bug)
- Contract refs needed: none
## Candidate C-10
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2092-2110
- Hunch: Partial JSON lines from ungracefully terminated process are silently caught, potentially losing progress data.
- Trace: Lines 2104-2112 parse SCAN_START JSON; exceptions caught and pass-ed. If process killed mid-line, JSON decoder fails silently.
- Question for verifier: Can partial JSON lines from forced process termination cause stale progress to persist?
- Suggested severity: L (error silently caught; scan restarts cleanly on next invocation)
- Contract refs needed: none
## Candidate C-11
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:1259-1260 (handle_delete)
- Hunch: Rclone paths not pre-validated; malformed paths reach rclone command and fail there instead of being rejected early.
- Trace: Line 1258-1260 validates only local paths with Path.exists(). Rclone paths skip existence check. Allowlist check at line 1267 validates prefix only.
- Question for verifier: Should rclone path format be pre-validated, or is deferring to rclone error handling acceptable?
- Suggested severity: L (rclone rejects malformed paths; no silent failure)
- Contract refs needed: none
## Candidate C-12
- File: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:589-603 (_load_host_cache)
- Hunch: In-process cache memoization uses resolved path as key, but if user changes rcjav_path mid-session, stale cache from different cache.json could be served.
- Trace: Line 584 key = str(cache_path.resolve()) — key includes resolved path. If rcjav_path changes, cache_path changes and fresh entry loaded. But logic is subtle.
- Question for verifier: Can two different cache.json files from different rcjav_path values collide in the memoization key?
- Suggested severity: N (cache key includes resolved path; no obvious collision)
- Contract refs needed: none
---
## VERIFIER NOTES (Phase 1 Moderate verification, stricter prompt with external/internal-input rule)
- C-1 (case-sensitive allowlist) — REFUTED. Gate is fail-SAFE; case-mismatch rejects (no bypass). Usability gap noted but not a bug.
- C-2 (unbounded msg length) — REFUTED. Chrome NM protocol caps extension-to-host at 64 MiB browser-side. Only Brave can write to host stdin.
- C-3 (blocking Discord) — CONFIRMED M, high confidence. All 5 callsites on main thread; test bypasses rate limit. Promoted as M-1.
- C-4 (handle_scan premature success) — CONFIRMED M, very high confidence. 1-2s race window. Promoted as M-2.
- C-5 (stderr blocking) — PARTIAL → demoted M to L. 5s stale progress max, cancel works delayed, no data loss. Promoted as L-1.
CHUNK 2 CALIBRATION:
- Severe: 0 (none flagged)
- Moderate rejection: 2/5 = 40% (pure refute), 1 demoted
- Combined: 2/5 = 40% (stop condition >30% triggered)
- Auditor weaknesses: (1) flagging fail-safe gates as if fail-open, (2) ignoring protocol-level caps from upstream layers
+64
@@ -0,0 +1,64 @@
# Phase 3 Re-Audit Candidates — audit-2026-05-25T21-35Z (post-fix state)
Auditor: fresh Explore, blind context
Scope: 6 files modified during Phase 2 fixes
Looking for: bugs INTRODUCED by Phase 2 fixes (not pre-existing — those are in bugs-*.md)
## Findings: One introduced bug detected
### C-1: M-3 spawn_event race allows cancel to see _scan_proc = None
- **File**: D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2183-2190, 2376-2387, 2408-2410
- **Symptom**: If user calls cancel-scan within ~15 ms after a scan starts, _scan_proc may still be None when handle_scan_cancel reads it under the lock, causing the cancel to return "no scan running" and skip the cancel-flag file write. The scan continues uninterrupted.
- **Trace**:
1. _scan_worker spawns Popen at line 2176, enters try block
2. Sets `spawn_result["spawn_ok"] = True` (line 2186)
3. Sets `spawn_event.set()` (line 2188) — this wakes handle_scan which is waiting
4. handle_scan timeout fires (line 2376), reads `spawn_result.get("spawn_ok")` → True
5. handle_scan returns `{"ok": True, "scanning": True, "started": True}` (line 2378)
6. Meanwhile, worker thread hasn't yet executed line 2190: `_scan_proc = proc`
7. Extension receives ok:true and immediately sends cancel-scan RPC
8. handle_scan_cancel reads `_scan_proc` under lock (line 2410) and gets None
9. Line 2411 condition is true: `if not running: return ...` and never writes cancel flag
10. Scan continues because rc-jav.py never sees the cancel flag
- **Root cause**: spawn_event is signaled (line 2188) and handle_scan returns before _scan_proc is assigned (line 2190). The critical assignment is inside `with _scan_lock:` which prevents a true race on the read, but the signal happens outside the lock. A cancel arriving in that window sees stale None.
- **Repro**: Stress-test with rapid scan-start / cancel-scan pairs; observe: handle_scan returns ok:true, cancel-scan returns "no scan running" instead of cancelling, scan directory walk completes uninterrupted.
---
## Clear findings on other fixes (no issues):
### M-2 ensureContextMenu lock
✓ Correct. Lock is `_contextMenuLock = Promise.resolve()` at module scope (line 798). Each call chains via `.then()` (line 800). No nesting with other locks; isolated invariant (removeAll + create is atomic in chain). No stale closures — `async () => { ... }` captures its own scope. Top-level call (line 1235) + onInstalled/onStartup listeners are idempotent (removeAll first). **No bugs.**
### M-6 recordRpc lock
✓ Correct. Lock is `_rpcLogLock = Promise.resolve()` at line 169. Wraps get-then-set of NATIVE_LOG_KEY (lines 171-179). No rejection escape (catch block swallows, never re-throws). maybeNotifyHostError is called OUTSIDE the lock (line 184) as documented. No deadlock (independent from _hostAlertLock). **No bugs.**
### L-1 maybeNotifyHostError lock
✓ Correct. Lock is `_hostAlertLock = Promise.resolve()` at line 201. Wraps rate-limit read/check/write + notification + Discord post (lines 209-240). Separate from _rpcLogLock (different storage key). Called outside _rpcLogLock by recordRpc, so no nesting. On burst, only first caller's check passes; rest read fresh ts and bail (line 214). No rejection escape. **No bugs.**
### S-1 export handler
✓ Correct. Lines 403-440: if get-keep-ranking RPC fails, blocks export and shows error message. Checks both `!r.ok` and missing `keep_ranking` payload. Success path writes to payload._meta.host_config.keep_ranking (line 426). File uses `app: "rclone-jav"` not `rclonex` (line 423). **No bugs.**
### M-1 sanitizeImportedSettings validators
✓ Correct. Profiles validator (lines 592-596): accepts `{ name: string, source?: string[], target?: string[] }`. Uses `e.source || []` and `e.target || []` to handle missing fields. Consumer profileOverrides (background.js:407-408) safely does `prof.source || []` again. Validator passes profiles with missing source/target; profileOverrides then treats them as empty arrays. This is safe — the consumer never assumes source/target exist as properties. **No bugs.**
### M-5 popup _currentSearchId counter
✓ Correct. Module-level counter (line 294). runCheck bumps at entry (line 300), captures myId, compares in callback (line 307). runManualSearch bumps at entry (line 461), captures myId, compares in callback (line 475). Popup is recreated on each open; each session is isolated. Bumping before paused early-exit ensures older callbacks bail. **No bugs.**
### M-3 spawn_event signal timing
**BUG FOUND** (see C-1 above).
### M-4 Discord post-alert threaded fire-and-forget
✓ Correct. post_discord_alert (lines 242-268): checks rate limit via _alert_rate_limited() before spawning thread (line 257). Rate-limit file write (line 149) happens before thread spawn. On burst, only first post passes rate limit; rest return early without thread spawn. _discord_post_worker receives alert_source label (lines 262-263). All 4 main-loop callsites pass alert_source (lines 2682, 2701, 2739, 2810). **No bugs.**
### M-7 save_config retry
✓ Correct. Lines 186-196: Popen creates tmp file, writes JSON, tries replace. PermissionError triggers sleep(0.5) + one retry. On second PermissionError, re-raises (no infinite loop). Mirrors save_cache design. **No bugs.**
### Manifest version field
✓ Valid. Version is `"0.1.43"` (line 4). Valid semver. No trailing commas. Valid JSON confirmed. **No bugs.**
---
## Summary
**One introduced bug detected in M-3** spawn_event race condition. The remaining eight fixes (M-2, M-6, L-1, S-1, M-1, M-5, M-4, M-7) and manifest version are correct and safe.
+109
@@ -0,0 +1,109 @@
# Candidate Findings — Python CLI — audit-snapshot-2026-05-24T15-55Z.md
Scope: rc-jav.py, rcjav/*.py, tests/test_rules.py, fixtures/run.py
Required-reading docs read: AGENTS.md / TODO.md / bug-audit-plan.md
(Note: CACHE_CONTRACT.md does not exist; docs/ folder is absent.)
Auditor: fresh Explore agent
## Candidate C-1
- File: D:\DEV\Project\rclone-jav\rcjav\rclone_io.py:66
- Hunch: Accessing item["Path"] on rclone lsjson output may raise KeyError.
- Trace: quick_search_remote() at line 66 uses direct dict access item["Path"] without .get() fallback. If rclone output is malformed or omits Path, KeyError crashes the scan.
- Question for verifier: Should line 66 use item.get("Path") like line 77 does for Size/ModTime?
- Suggested severity: M
- Contract refs needed: none
## Candidate C-2
- File: D:\DEV\Project\rclone-jav\rcjav\library.py:257
- Hunch: Direct dictionary access f["path"] in find_library_issues() may raise KeyError on corrupted cache.
- Trace: find_library_issues() accesses f["path"] without .get(). Cache is written with path/size/mod_time/jav_id keys but no validation ensures all entries have these keys. Corrupted/legacy caches could be missing path.
- Question for verifier: Should line 257 use f.get("path") to handle missing keys gracefully like --reextract does at line 524?
- Suggested severity: M
- Contract refs needed: none
## Candidate C-3
- File: D:\DEV\Project\rclone-jav\rcjav\library.py:328-330
- Hunch: Direct dict access f["path"] and f["jav_id"] assumes cache entries are well-formed without validation.
- Trace: rename_file_in_remote() at line 328-330 uses direct key access. Line 330 tries fallback with "or f["jav_id"]" but would crash on line 328 if f["path"] is missing. Corrupted cache entries could cause KeyError.
- Question for verifier: Should these lines use f.get() with fallback instead of direct bracket access?
- Suggested severity: M
- Contract refs needed: none
## Candidate C-4
- File: D:\DEV\Project\rclone-jav\rcjav\cli.py:186-189
- Hunch: save_config() lacks Windows file-locking retry logic that save_cache() has.
- Trace: save_config() calls os.replace() without PermissionError handling. If Windows locks config.json, the replace fails. save_cache() (line 142-147) has explicit PermissionError handling with 0.5s retry. --save could report success but file write fails silently on Windows.
- Question for verifier: Should save_config() include the same PermissionError + retry as save_cache()?
- Suggested severity: M
- Contract refs needed: none
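The retry the trace attributes to save_cache() (catch PermissionError, wait 0.5s, try once more) could be factored roughly like this. `replace_with_retry` and its injectable `replace`/`sleep` parameters are hypothetical, added so the sketch can run without a locked file.

```python
import os
import time


def replace_with_retry(src, dst, retries=1, delay=0.5,
                       replace=os.replace, sleep=time.sleep):
    """Call os.replace, retrying on PermissionError a bounded number of times."""
    for attempt in range(retries + 1):
        try:
            replace(src, dst)
            return
        except PermissionError:
            if attempt == retries:
                raise  # out of retries: surface the failure to the caller
            sleep(delay)
```

Re-raising after the last attempt matters here: it is what stops a save path from reporting success when the write never landed.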
## Candidate C-5
- File: D:\DEV\Project\rclone-jav\rcjav\cli.py:131
- Hunch: DEFAULT_CATALOG path computed at module-load time; could resolve incorrectly if cwd differs.
- Trace: DEFAULT_CATALOG is set on line 131 using Path(__file__).resolve().parents[1] at import time. If rc-jav.py invoked from different cwd (Task Scheduler, cron), path resolution might be affected by symlinks or relative-path assumptions.
- Question for verifier: Does DEFAULT_CATALOG resolve to correct wincatalog/ across all invocation contexts?
- Suggested severity: L
- Contract refs needed: AGENTS.md
## Candidate C-6
- File: D:\DEV\Project\rclone-jav\rcjav\dupes.py:105-107
- Hunch: best_priority could be None if no entries match priority folders, masking misconfiguration.
- Trace: Line 105 builds prioritized list. Line 106 sets best_priority=None if empty. Line 107 filters for rank==None which yields empty list. Falls through to fallback, but absence of warning could hide config error.
- Question for verifier: Should a warning be logged when no duplicates match configured priority_folders?
- Suggested severity: L
- Contract refs needed: AGENTS.md
## Candidate C-7
- File: D:\DEV\Project\rclone-jav\rcjav\cli.py:797
- Hunch: Global mutation of DEFAULT_CATALOG/DEFAULT_SOURCE/DEFAULT_TARGET could cause reference bugs.
- Trace: Lines 438-440 reassign global DEFAULT_* from config.json. Line 797 passes mutated DEFAULT_CATALOG to _expand_catalog_paths(). Works correctly but the global-mutation pattern is fragile and could break if code is refactored.
- Question for verifier: Is the global reassignment pattern intentional, or should these be passed as parameters instead?
- Suggested severity: L
- Contract refs needed: AGENTS.md
## Candidate C-8
- File: D:\DEV\Project\rclone-jav\rcjav\ids.py:206-207
- Hunch: normalize_id() appends dummy extension; could fail on input with embedded dots.
- Trace: normalize_id() adds ".x" to call extract_id(). If input is "ABC-001.backup", stem operation treats .backup as extension, breaking the ID. Unlikely in practice but contract not clearly documented.
- Question for verifier: Should normalize_id() validate input format or handle embedded-dot cases?
- Suggested severity: L
- Contract refs needed: AGENTS.md
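For reference, this is how pathlib treats a dummy extension appended to an input that already contains a dot. It shows only the standard-library `stem` behavior; `stem_with_dummy_ext` is a hypothetical helper, and whether the real extract_id() strips further suffixes is not shown in this report.

```python
from pathlib import PurePath


def stem_with_dummy_ext(raw):
    """Append a dummy extension, then take the stem (pathlib semantics only)."""
    return PurePath(raw + ".x").stem


if __name__ == "__main__":
    print(stem_with_dummy_ext("ABC-001"))
    print(stem_with_dummy_ext("ABC-001.backup"))
```

`stem` removes only the final suffix, so the embedded `.backup` survives into whatever the extractor sees next.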
## Candidate C-9
- File: D:\DEV\Project\rclone-jav\rcjav\rclone_io.py:293
- Hunch: _stderr_thread.join() has no timeout; could hang if stderr thread deadlocks.
- Trace: Daemon thread reads stderr on lines 231-235. Line 293 calls join() without a timeout. If the thread hangs, the main thread blocks indefinitely. The timeout handling in cancel logic (lines 270, 284) uses proc.wait(timeout=3).
- Question for verifier: Should _stderr_thread.join() include a timeout?
- Suggested severity: L
- Contract refs needed: none
---
## Summary by Severity
- **Moderate (M)**: 4 candidates (KeyError risks in cache/rclone access, Windows file-locking issue)
- **Light (L)**: 5 candidates (path resolution edge case, missing priority-folder warning, global mutation, normalize_id contract, thread join timeout)
- **Severe (S)**: 0
- **Needs Input (N)**: 0
Top 3 by risk:
1. C-1: KeyError on rclone output could crash scan in quick mode
2. C-2: KeyError on cache.path could crash library-issues scan
3. C-4: Config write failure on Windows could silently corrupt config.json
---
## VERIFIER NOTES (Phase 1 Moderate verification, stricter prompt)
- C-1 (rclone KeyError on Path) — REFUTED. rclone lsjson contract guarantees Path. Direct access appropriate fail-fast.
- C-2 (library cache KeyError) — REFUTED. CACHE_CONTRACT.md + load_cache validation + FileEntry dataclass triple-guarantee path key. cli.py:526 .get pattern is for un-validated --reextract direct read.
- C-3 (rename_file KeyError) — REFUTED. Auditor conflated scalar caller args with iterated dict entries. f comes from cache (contract-guaranteed).
- C-4 (save_config no retry) — CONFIRMED M, high confidence. Promoted as M-1 in bugs-python.md. Real asymmetry vs save_cache.
CHUNK 1 CALIBRATION:
- Severe: 0 (none flagged)
- Moderate rejection: 3/4 = 75%
- Combined: 3/4 = 75% (stop condition >30% triggered)
- Auditor weakness: KeyError pattern-matching without upstream contract check
- L candidates NOT verified per stop condition. Same auditor weakness likely affects L list. Revisit only if needed.
@@ -0,0 +1,114 @@
# Bug Report — Extension SW + content + manifest — audit-snapshot-2026-05-24T15-55Z.md
Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 3 auditor)
Verifier agents: fresh Explore agents per candidate, blind context
This file contains CONFIRMED + PARTIAL findings only. Candidate scratch lives in `bugs-candidates-extension-bg.md`. REFUTED / NEEDS-INFO candidates stay in scratch with verifier response appended.
**Chunk 3 calibration note:** S+M verification yielded 4 confirmed bugs with 50% rejection rate. Auditor over-claimed by missing platform API contracts (Chrome connectNative keepalive, contextMenus contract, storage.local atomicity scope, Object.assign argument order, content.js function definitions). Light candidates were NOT verified per audit-plan stop condition. Revisit chunk 3 L only if needed; see `bugs-candidates-extension-bg.md`.
**Cross-chunk re-rank note:** Per `bugs-fix-queue.md`, this chunk's original severity labels were normalized against other chunks. Changes:
- Original S-1 (recordRpc race) → **M-6** in the queue. Demoted because it's diagnostic-log loss, not user-data loss.
- Original M-1 (maybeNotifyHostError rate-limit race) → **L-1 in the queue**, listed as L-1 below (the prior chunk L-1, Discord, is renumbered to L-2 to avoid a collision). Demoted because over-notification is annoying but recoverable and self-corrects after 10 min.
- M-2 (context menu after SW eviction) → unchanged, kept M (queue M-2).
---
## Severe (S)
Definition: data loss/corruption · wrong remote operation · persistent broken workflow no recovery · silent success when operation actually failed.
(none in this chunk after re-rank)
---
## Moderate (M)
Definition: operation fails/hangs but user can retry · wrong persisted settings · diagnostic loss that materially blocks investigation · modal/workflow stuck until manual recovery · race causing stale/wrong visible results.
### M-2 (queue) — Context menu missing after MV3 SW eviction
- **File:** `D:\DEV\Extensions\Production\rclone-jav\background.js:766-782` (ensureContextMenu) with callsites at `:1019` (settings-changed), `:1178` (onInstalled), `:1179` (onStartup)
- **Symptom (one sentence):** After the MV3 service worker evicts (~30s idle) and a new SW boots from a non-install/non-startup trigger (toolbar click, alarm, message), Chrome has no contextMenus registered and the user's "rclone-jav: Scan" / "rclone-jav: Search ..." entries silently disappear from right-click menus.
- **Why it's a bug:** Per Chrome MV3 contract, `chrome.contextMenus` entries DO NOT persist across SW lifecycle boundaries — they must be re-created on each SW boot. `ensureContextMenu` is only invoked from: `onInstalled` (install/update), `onStartup` (browser boot), and the `settings-changed` message handler. None of these fire on routine SW evict→wake cycles.
- **Reproduction:**
1. Install extension. Right-click any page → context menu items present ✓
2. Leave Brave idle for >30s with no extension activity. SW evicts.
3. Click anything that wakes the SW NOT via onInstalled/onStartup/settings-changed (toolbar icon, alarm, content-script message). New SW boots.
4. Expected: right-click context menu items still present
5. Actual: items missing — must reload extension OR change a setting to restore
- **Suggested fix sketch:** call `ensureContextMenu()` at top-level module init in background.js (runs every SW boot)
- **Verifier verdict:** CONFIRMED — very high confidence (99%)
- **Contract refs verifier read:** Chrome MV3 contextMenus lifecycle
- **Mirror check needed in:** any other Chrome API state that must be re-registered per SW boot — chrome.alarms persistent, chrome.commands manifest-declared. contextMenus is the outlier.
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\background.js:1193` — added top-level `ensureContextMenu();` call at module init scope (NOT inside any addListener / event handler). This runs on every SW evaluation: install, browser startup, idle wake, alarm wake, message wake — covering all paths the prior listener-bound calls missed. Existing onInstalled/onStartup listeners kept as defensive backup; `ensureContextMenu` calls `chrome.contextMenus.removeAll` first, so duplicate invocation is idempotent. Manifest bumped 0.1.35 → 0.1.36. JS syntax verified via `node --check`. Code-trace proof of placement: line 1193 is at module scope (preceded only by other top-level statements like addListener registrations); fires unconditionally on every fresh SW evaluation before any user-event handler. Runtime repro requires user test (reload extension → verify context menu appears → wait 30+ s for SW idle → trigger SW wake via toolbar icon or content script message → right-click any page → expect context menu items still present without needing reload).
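The remove-then-create shape described in the fix can be sketched as follows. This is a minimal illustration, not the code in background.js: the menu ids (`rcjav-scan`, `rcjav-search`) and the `contexts` values are assumptions, and the `contextMenus` API is passed in as a parameter so the idempotency claim can be exercised under Node with a stand-in.

```javascript
// Sketch only: ids and `contexts` values are assumptions, not copied from background.js.
function ensureContextMenu(menus) {
  // removeAll first, so a repeat call on the next SW boot does not duplicate entries.
  menus.removeAll(() => {
    menus.create({ id: "rcjav-scan", title: "rclone-jav: Scan", contexts: ["page"] });
    menus.create({ id: "rcjav-search", title: "rclone-jav: Search ...", contexts: ["selection"] });
  });
}

// In the extension this call would sit at module scope, outside any addListener callback:
//   ensureContextMenu(chrome.contextMenus);

// Tiny in-memory stand-in for chrome.contextMenus, for running the sketch outside the browser.
function makeFakeMenus() {
  const items = [];
  return {
    items,
    removeAll(cb) { items.length = 0; if (cb) cb(); },
    create(props) { items.push(props); },
  };
}

const fakeMenus = makeFakeMenus();
ensureContextMenu(fakeMenus); // first SW boot
ensureContextMenu(fakeMenus); // later wake: still two entries, no duplicates
console.log(fakeMenus.items.map((item) => item.title));
```

Passing the API object in keeps the function testable; the real call site would hand it `chrome.contextMenus`.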
### M-6 (queue) — recordRpc read-modify-write race loses log entries
**Re-ranked from chunk S-1 to queue M-6 (diagnostic loss, not user data loss).**
- **File:** `D:\DEV\Extensions\Production\rclone-jav\background.js:155-169` (recordRpc), callsites at `:143`, `:318`, `:330`, `:343`, `:359`
- **Symptom:** When the native port disconnects with multiple inflight requests, the rolling RPC log loses entries because all pending rejects + the disconnect marker call `recordRpc` concurrently and each does non-atomic get-then-set on the same storage key.
- **Why it's a bug:** `recordRpc` is `async`, but callers invoke it fire-and-forget. When `port.onDisconnect` rejects every pending entry in the same tick, each reject wrapper calls `recordRpc` concurrently. All of them read the same `old` array and each sets `[newEntry, ...old]`, so the last set wins. Chrome storage.local gives no atomicity guarantee across a separate get and set.
- **Reproduction:**
1. Native port disconnects while 3+ requests are inflight (host killed by AV during Check Library batch)
2. Expected: all 3+ rejected requests + `__port_disconnect__` marker land in `chrome.storage.local[NATIVE_LOG_KEY]`
3. Actual: only one entry persists; the others silently disappear. Diagnostics → Native messaging log shows misleading picture exactly when user is investigating an outage.
- **Suggested fix sketch:** wrap recordRpc body in `let _rpcLogLock = Promise.resolve(); _rpcLogLock = _rpcLogLock.then(async () => { ... })` chain. Same pattern user already applied to `_rcjavTrace` in tabvault.
- **Verifier verdict:** CONFIRMED — high confidence
- **Contract refs verifier read:** Chrome storage.local API (no atomicity)
- **Mirror check needed in:** options.js settings save flow, options-library-issues.js cache writes, activity log buffer, tabvault caller log (out-of-scope)
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\background.js:155-180` — wrapped recordRpc body in promise-chain lock (`_rpcLogLock = _rpcLogLock.then(async () => { ... })`). Read-modify-write on `chrome.storage.local[NATIVE_LOG_KEY]` now serializes — concurrent callers chain instead of racing. Pattern mirrors tabvault `_rcjavTrace` lock and the M-2-follow-up ensureContextMenu lock for the same storage race class. `maybeNotifyHostError(entry)` still runs OUTSIDE the lock (its own rate-limit storage race is tracked separately as L-1 in the queue; not fixed here per one-bug-per-session rule). Manifest bumped 0.1.41 → 0.1.42. JS syntax verified via `node --check`. Lock mechanics smoke-tested in isolation with simulated chrome.storage.local (5 ms artificial latency on get/set, 5 concurrent writes): UNLOCKED variant stored only 1 of 5 entries (race confirmed); LOCKED variant stored all 5 entries in correct newest-first order. Mirror checks for options.js / options-library-issues.js storage writes deferred to Phase 3 final verification per audit plan.
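The isolated smoke test described in the fix can be approximated with the sketch below. It assumes a simulated storage object with 5 ms latency (not the real `chrome.storage.local` signature) and a hypothetical value for `NATIVE_LOG_KEY`; only the lock shape follows the report.

```javascript
// Simulated storage with artificial latency; a stand-in, not the chrome.storage.local API.
function makeFakeStorage(latencyMs) {
  const data = {};
  const delay = () => new Promise((resolve) => setTimeout(resolve, latencyMs));
  return {
    async get(key) { await delay(); return { [key]: data[key] }; },
    async set(obj) { await delay(); Object.assign(data, obj); },
  };
}

const NATIVE_LOG_KEY = "nativeLog"; // hypothetical key value

// Unlocked: concurrent callers all read the same old array, and the last set wins.
async function recordUnlocked(storage, entry) {
  const got = await storage.get(NATIVE_LOG_KEY);
  const old = got[NATIVE_LOG_KEY] || [];
  await storage.set({ [NATIVE_LOG_KEY]: [entry, ...old] });
}

// Locked: each call chains onto the previous one, so read-modify-write runs one at a time.
let _rpcLogLock = Promise.resolve();
function recordLocked(storage, entry) {
  const run = _rpcLogLock.then(() => recordUnlocked(storage, entry));
  _rpcLogLock = run.catch(() => {}); // keep the chain usable even if one write rejects
  return run;
}

async function demoRpcLogLock() {
  const entries = ["a", "b", "c", "d", "e"];
  const unlockedStore = makeFakeStorage(5);
  await Promise.all(entries.map((e) => recordUnlocked(unlockedStore, e)));
  const lockedStore = makeFakeStorage(5);
  await Promise.all(entries.map((e) => recordLocked(lockedStore, e)));
  return {
    unlocked: (await unlockedStore.get(NATIVE_LOG_KEY))[NATIVE_LOG_KEY],
    locked: (await lockedStore.get(NATIVE_LOG_KEY))[NATIVE_LOG_KEY],
  };
}

demoRpcLogLock().then((result) => console.log(result));
```

Swallowing the rejection on the chain variable (while still returning `run` to the caller) is a design choice in this sketch: one failed write should not block every later log entry.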
---
## Light (L)
Definition: confusing UI · cosmetic stale state · diagnostic annoyance · non-blocking alert issue · two-click recoverable.
### L-1 (queue) — maybeNotifyHostError rate-limit get-then-set race
**Re-ranked from chunk M-1 to queue L-1.**
- **File:** `D:\DEV\Extensions\Production\rclone-jav\background.js:188-193`, callsites via `recordRpc` at `:173`
- **Symptom:** During a host outage burst (port disconnects with 2+ inflight requests), the 10-minute rate-limit on Discord/notification alerts can fire 2-3 alerts within the same window because the get-then-set on `HOST_ALERT_KEY` is non-atomic.
- **Why it's a bug (demoted from M to L):** Same race pattern as M-6, but the impact is over-notification not data loss. User receives extra alerts during one outage event — annoying but informative. Self-corrects after 10-min window. Not blocking. Not stuck workflow.
- **Reproduction:**
1. Port disconnects with 3 inflight requests
2. Expected: 1 alert per 10-min window
3. Actual: 3 alerts for the same incident
- **Suggested fix sketch:** wrap get-then-set in Promise lock (same as M-6 fix; can share the lock)
- **Verifier verdict:** CONFIRMED — high confidence
- **Mirror check needed in:** same as M-6
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\background.js:191-247` — added dedicated `_hostAlertLock` Promise-chain (NOT shared with `_rpcLogLock` per codex's note — different storage key, different invariant). Entire maybeNotifyHostError body now runs inside the lock: rate-limit read/check/write of `HOST_ALERT_KEY`, plus the notification create and Discord post that follow. Concurrent calls in the same tick (5+ pending requests rejected on onDisconnect) now properly chain — first caller writes the new lastTs, subsequent callers see the fresh ts and bail at the check. Manifest bumped 0.1.42 → 0.1.43. JS syntax verified via `node --check`. Lock + rate-limit smoke-tested in isolation with simulated chrome.storage.local (5ms latency): UNLOCKED → 5 of 5 concurrent calls fire alerts (bug confirmed); LOCKED → 1 of 5 concurrent calls fires (correct); LOCKED + 5 sequential within rate-limit window → 1 alert (rate-limit still enforced after the lock change).
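A rough sketch of the rate-limit check running inside its own lock, under stated assumptions: the store is a simplified async key/value stand-in rather than `chrome.storage.local`, the `HOST_ALERT_KEY` value is hypothetical, and `fire` stands in for the notification create plus Discord post.

```javascript
const HOST_ALERT_KEY = "lastHostAlertTs"; // hypothetical key value
const RATE_LIMIT_MS = 10 * 60 * 1000;     // the 10-minute window from the report

// Simplified async key/value stand-in with 5 ms latency per call.
function makeAlertStore() {
  const data = {};
  const tick = () => new Promise((resolve) => setTimeout(resolve, 5));
  return {
    async get(key) { await tick(); return data[key]; },
    async set(key, value) { await tick(); data[key] = value; },
  };
}

let _hostAlertLock = Promise.resolve();

// The whole read/check/write/fire sequence runs inside the lock; `now` is injectable for tests.
function maybeNotify(store, fire, now = Date.now) {
  const run = _hostAlertLock.then(async () => {
    const lastTs = (await store.get(HOST_ALERT_KEY)) || 0;
    if (now() - lastTs < RATE_LIMIT_MS) return false; // inside the window: bail
    await store.set(HOST_ALERT_KEY, now());
    await fire();
    return true;
  });
  _hostAlertLock = run.catch(() => {});
  return run;
}

async function demoHostAlertLock() {
  const store = makeAlertStore();
  let fired = 0;
  const fire = async () => { fired += 1; };
  // Five calls in the same tick, as when onDisconnect rejects five pending requests.
  await Promise.all([1, 2, 3, 4, 5].map(() => maybeNotify(store, fire)));
  return fired;
}

demoHostAlertLock().then((fired) => console.log("alerts fired:", fired));
```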
### L-2 (queue, was chunk L-1) — Discord post failures have no passive UI surface
- **File:** `D:\DEV\Extensions\Production\rclone-jav\background.js:230-273` (postDiscordAlert), status write at `:265-271`
- **Symptom:** Discord webhook failures are persisted to `chrome.storage.local.lastDiscordSend` but only visible by clicking Test buttons — no passive page-load display.
- **Why it's a bug (originally L):** Diagnostic data not lost, just not surfaced passively. UX visibility gap.
- **Suggested fix sketch:** on Setup pane render, read `lastDiscordSend` and show "Last alert: <ts> · ok|FAILED <reason>"
- **Verifier verdict:** PARTIAL — symptom real, original "silent failure" framing wrong
- **Status:** open
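One possible shape for the passive status line in the suggested fix. The record fields (`ts`, `ok`, `reason`) are assumptions about what `lastDiscordSend` holds; the stored format is not shown in this report.

```javascript
// Sketch: the { ts, ok, reason } record shape is assumed, not taken from background.js.
function formatLastDiscordSend(rec) {
  if (!rec || typeof rec.ts !== "number") return "Last alert: none recorded";
  const when = new Date(rec.ts).toISOString();
  const outcome = rec.ok ? "ok" : `FAILED ${rec.reason || "unknown reason"}`;
  return `Last alert: ${when} · ${outcome}`;
}

console.log(formatLastDiscordSend({ ts: Date.now(), ok: false, reason: "HTTP 404" }));
```

The Setup pane render would read `lastDiscordSend` from storage and write this string into a status element.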
---
## Needs Input (N)
(none)
---
## False Positives (discarded)
- `background.js:90, 100-114, 120-148, 307-365` — flagged as Severe "pending Map orphaned on SW eviction mid-call". REFUTED via Chrome `connectNative` keepalive contract: an open port keeps the MV3 SW alive; if port closes, `onDisconnect` rejects all pending (line 139) — no orphans. `pulseKeepalive` is defensive redundancy. Caveat: if Brave observed diverging, would become Brave-specific bug — not verified.
- `background.js:62-76` (mergeSettings) — flagged as Moderate. REFUTED. Auditor misread `Object.assign({}, dv, sv)` — defaults go FIRST so missing keys fill from defaults.
- `background.js:895-905` (contextMenu tab.id null) — flagged as Moderate. REFUTED via Chrome contextMenus contract: registered contexts guarantee non-null tab.id. `extractIdFromTab` also has defensive null check.
- `content.js` (escapeOverlay undefined) — flagged as Moderate. REFUTED. Function IS defined at content.js:451. Auditor missed it.
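The `mergeSettings` refutation rests on `Object.assign` argument order: later sources overwrite earlier ones. A small illustration with made-up setting names (`autoCheck`, `profile` are hypothetical):

```javascript
// Hypothetical default and stored values, for illustration only.
const dv = { autoCheck: true, profile: "default" };
const sv = { profile: "other" };

// Later sources win: stored values override defaults, and keys missing from sv fall back to dv.
const merged = Object.assign({}, dv, sv);
console.log(merged);
```

Because the target is a fresh `{}`, neither `dv` nor `sv` is mutated.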
@@ -0,0 +1,77 @@
# Bug Report — Extension Options pages — audit-snapshot-2026-05-24T15-55Z.md
Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 4 auditor)
Verifier agents: fresh Explore agents per candidate, blind context, stricter contract-check prompt + UI-inconvenience rule + chrome.storage.sync quota awareness
**Chunk 4 calibration note:** Moderate verification yielded 2 confirmed bugs with 50% rejection rate (2/4 REFUTED). Auditor's recurring weakness: flagging fire-and-forget message patterns as data-loss without tracing whether downstream reads from storage every call (no in-memory cache to invalidate). Also misjudged contract-guaranteed nullability (C-10 cited entry.path null crash without checking host's `_cache_entry` guarantees). Stricter verifier prompt + UI-inconvenience rule + storage-quota awareness caught both false positives. **Light candidates were NOT verified per audit-plan stop condition** (>30% rejection → halt L verification). See `bugs-candidates-extension-options.md` for unverified L list (C-1, C-2, C-3, C-4, C-7, C-9, C-11, C-12).
---
**Cross-chunk re-rank note:** Per `bugs-fix-queue.md`, this chunk's M-2 (export silently drops keep_ranking) was **promoted to Severe** because silent backup data loss fits the S criterion ("silent success when operation actually failed"). M-1 (sanitizeImportedSettings) remains Moderate. Severity sections below reflect post-rerank placement.
---
## Severe (S)
Definition: data loss/corruption · wrong remote operation · persistent broken workflow no recovery · silent success when operation actually failed.
### S-1 (queue) — Export silently drops keep_ranking when host RPC fails; backup→restore loses user-typed VIP/format prefs
**Re-ranked from chunk M-2 to queue S-1 (silent backup data loss).**
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\options\options.js:386-416` (export flow), import display at `:540-565`
- **Symptom:** When user clicks Export while host is unreachable, `get-keep-ranking` RPC fails, error swallowed by `try/catch {}`, hostConfig empty, export JSON has `_meta.host_config: {}`, export reports plain "exported." with no warning. On restore, user loses VIP folders / format prefs / size tolerance / tiebreak rules silently.
- **Why it's a bug:** Export try/catch wraps RPC at line ~393 with bare `catch {}` at line ~395. Falls through to line ~401 with `hostConfig = {}`. Status message at line ~415 hardcoded to "exported." regardless of RPC result. Import side reads `data?._meta?.host_config?.keep_ranking`; null shows informational "Not included" message but does NOT block confirm. User loses manually-configured ranking on restore.
- **Reproduction:**
1. Kill host process (or block network so RPC times out). Open Setup → Backup → Export.
2. Expected: export blocked with "Cannot export — keep_ranking unreachable" OR file marked partial AND status visibly warns.
3. Actual: download triggers, file has `_meta.host_config: {}`, status says "exported." User believes complete backup. Later imports → keep-ranking silently absent. User must re-type configuration.
- **Suggested fix sketch:** on RPC failure, EITHER block export with retry prompt OR write file with explicit `_meta.partial: true` flag + prominent "Partial export — keep_ranking unavailable" status; import side must treat absent keep_ranking as hard warning requiring explicit "proceed without" confirmation, not dismissable info.
- **Verifier verdict:** CONFIRMED — high confidence (95%)
- **Mirror check needed in:** any other RPC-sourced data in export `_meta.host_config` (currently only keep_ranking)
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\src\options\options.js:386-447` — export now blocks if `get-keep-ranking` RPC fails (any of: missing response, !ok, missing keep_ranking, value not an object, exception). Status shows specific failure reason + retry instruction. No file written on failure. Success path requires `r.keep_ranking` to be a valid object (not the literal `null`/array/etc.) and writes `_meta.host_config.keep_ranking` into the JSON. Status message updated to `"exported (settings + keep_ranking)."` to confirm both halves landed. Manifest bumped 0.1.32 → 0.1.33. JS syntax verified via `node --check`. Runtime proof confirmed both paths: failure → status "export blocked — host could not return keep ranking: Specified native messaging host not found. Confirm native host is running, then retry." + no file; success → file with full keep_ranking object (5 fields populated).
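The blocking conditions listed in the fix (missing response, `!ok`, missing `keep_ranking`, non-object value, exception) can be sketched as below. `sendRpc` is a hypothetical async wrapper around the native-messaging call, the `error` field on the response is an assumption, and the reason strings are invented for illustration.

```javascript
// Returns either the keep_ranking object or a reason the export should be blocked.
function checkKeepRankingResponse(r) {
  if (!r) return { ok: false, reason: "no response from host" };
  if (!r.ok) return { ok: false, reason: r.error || "host reported failure" };
  const kr = r.keep_ranking;
  if (kr === undefined) return { ok: false, reason: "keep_ranking missing" };
  if (kr === null || typeof kr !== "object" || Array.isArray(kr)) {
    return { ok: false, reason: "keep_ranking is not an object" };
  }
  return { ok: true, keepRanking: kr };
}

// `sendRpc` is hypothetical; it stands in for the message round-trip to the native host.
async function buildExport(settings, sendRpc) {
  let checked;
  try {
    checked = checkKeepRankingResponse(await sendRpc("get-keep-ranking"));
  } catch (err) {
    checked = { ok: false, reason: String(err && err.message ? err.message : err) };
  }
  if (!checked.ok) {
    // No file is produced on failure; the caller shows the reason and a retry hint.
    return { file: null, status: `export blocked: ${checked.reason}` };
  }
  const file = { settings, _meta: { host_config: { keep_ranking: checked.keepRanking } } };
  return { file, status: "exported (settings + keep_ranking)." };
}
```

Separating the pure response check from the async wrapper keeps each failure branch testable without a host.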
---
## Moderate (M)
### M-1 (queue) — sanitizeImportedSettings doesn't validate array element shape; downstream crash
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\options\options.js:533-546` (sanitizeImportedSettings), with downstream crash sites at `content.js:tryAdapters` (~line 57-73) and `src\shared\id-extract.js:applyNormalizers` (~line 22-31)
- **Symptom (one sentence):** A settings import with malformed array elements (e.g. `siteAdapters: [{ host: 123, selector: [] }]` or `idNormalizers: [{ re: null, fmt: 42 }]`) passes the import sanitizer because only the outer "array" type is checked — the bad data persists to chrome.storage.sync, then crashes content script's `tryAdapters` at `a.selector.split(",")` (TypeError on `.split` of an Array) the next time the user visits any web page.
- **Why it's a bug:** `sanitizeImportedSettings` (line 533-546) checks `_typeOf(v) !== expected` against SETTINGS_SCHEMA outer types only. The comment at line 491 claims nested objects get recursive validation but the code doesn't implement it for arrays. Result: idNormalizers crash is silently swallowed by `try/catch` in applyNormalizers (search degraded silently). siteAdapters crash is NOT wrapped — content script's tryAdapters throws TypeError, breaks ID extraction on the affected page, and subsequent extraction failures propagate to all auto-check + manual search via content scripts. Import modal does NOT preview adapter contents, so user has no chance to spot bad data before confirming.
- **Reproduction:**
1. Input: hand-craft a JSON export file with `settings.siteAdapters = [{host: ["array"], selector: 5}]`, import via Setup → Import Settings
2. Expected: import rejects entry with "siteAdapters[0] malformed" or strips the bad entry
3. Actual: import succeeds with status "imported." Subsequent web page navigations trigger content script auto-check → `tryAdapters` throws TypeError on `a.selector.split` → ID extraction breaks on that page → manual searches also fail because shared content path is broken
- **Suggested fix sketch:** add per-key element validation in sanitizeImportedSettings for siteAdapters (each element must be `{host: string, selector: string}`) and idNormalizers (each element must be `{re: string|RegExp, fmt: string}`). Drop malformed elements with a note in the import warning.
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** CONFIRMED
- **Verifier confidence:** medium-high (downstream impact varies — idNormalizers silently degrades, siteAdapters breaks content script)
- **Contract refs verifier read:** SETTINGS_SCHEMA outer-type behavior; tryAdapters string assumptions; applyNormalizers try/catch
- **Mirror check needed in:** profiles[] array (also passes outer type check); partPatterns[] array — same pattern likely applies
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\src\options\options.js:552-633` — added `ARRAY_ELEMENT_VALIDATORS` map keyed by setting name (covers siteAdapters, idNormalizers, partPatterns, knownSitePatterns, profiles). Each validator rejects only shapes that would crash consumers; tolerates extra unknown fields for forward-compat with future exports. `sanitizeImportedSettings` now filters malformed elements per-array — good ones pass through, bad ones get logged to `dropped[]` with their index (e.g. `siteAdapters[2](malformed)`); if ALL elements in an array are bad, the key falls through to defaults via mergeSettings instead of persisting empty array. Dropped list capped at 30 entries to keep import modal readable. Mirror check resolved in same commit (validators cover all four flagged keys). Manifest bumped 0.1.33 → 0.1.34. JS syntax verified via `node --check`. Validator unit-tested inline: 9 acceptable shapes accepted + 9 malformed shapes rejected (18/18 pass). Runtime repro requires user test (import a JSON with `siteAdapters: [{host: 123, selector: []}]` → expect modal shows `siteAdapters[0](malformed)` in Ignored keys row + the entry is NOT persisted to chrome.storage.sync; subsequent web page navigations should NOT crash content.js).
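A minimal sketch of per-element validation with a capped `dropped[]` list, covering two of the five keys. The element shapes follow the suggested fix sketch; the helper name `filterArraySetting` is hypothetical, and the actual validators in options.js may be stricter or looser.

```javascript
const isStr = (v) => typeof v === "string";

// Validators reject only shapes that would crash consumers; extra unknown fields are tolerated.
const ARRAY_ELEMENT_VALIDATORS = {
  siteAdapters: (e) => !!e && typeof e === "object" && isStr(e.host) && isStr(e.selector),
  idNormalizers: (e) =>
    !!e && typeof e === "object" && (isStr(e.re) || e.re instanceof RegExp) && isStr(e.fmt),
};
const MAX_DROPPED = 30; // cap so the import modal stays readable

// Hypothetical helper: returns the filtered array, or undefined when every element was bad.
function filterArraySetting(key, arr, dropped) {
  const validate = ARRAY_ELEMENT_VALIDATORS[key];
  if (!validate) return arr; // no element validator registered for this key
  const good = [];
  arr.forEach((el, i) => {
    if (validate(el)) good.push(el);
    else if (dropped.length < MAX_DROPPED) dropped.push(`${key}[${i}](malformed)`);
  });
  // undefined signals "omit the key" so defaults apply instead of persisting an empty array.
  return arr.length > 0 && good.length === 0 ? undefined : good;
}

const droppedDemo = [];
const keptDemo = filterArraySetting(
  "siteAdapters",
  [{ host: "a.example", selector: "h1" }, { host: 123, selector: [] }],
  droppedDemo
);
console.log(keptDemo, droppedDemo);
```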
(M-2 moved to Severe S-1 above per cross-chunk re-rank)
---
## Light (L)
(none promoted — chunk 4 L verification skipped per stop condition)
---
## Needs Input (N)
(none promoted)
---
## False Positives (discarded)
- `src/options/options.js:210-286` (save fire-and-forget settings-changed message) — flagged as Moderate "stale in-memory config in background after save". REFUTED. Background's `getSettings()` (background.js:74-77) reads from chrome.storage.sync on every call — no in-memory cache to invalidate. The `settings-changed` message is just an optimization for context-menu rebuild; if the message is lost (which MV3 doesn't actually do — `sendMessage` wakes the SW per spec), the next native call still reads fresh settings from storage. No persistent stale state.
- `src/options/options-library-issues.js:134-143` (entry.path null crash in makeReportRow) — flagged as Moderate "TypeError crashes Library Issues modal render". REFUTED via host-side schema contract: `rcjav/library.py:_cache_entry` always sets `"path": f.get("path", "")` (non-null string guarantee). Host's `handle_library_issues` passes through structurally-valid entries; the null-path scenario is unreachable in normal operation. If cache.json was directly corrupted with missing path keys, Python-side direct `f["path"]` access (library.py:257) would raise KeyError before any data reached the JS render layer. Severity if reachable would be CRITICAL (uncaught TypeError crashes modal — no outer try/catch), but the contract closes the attack surface. Note: defensive null check would be cheap if ever wanted as belt-and-suspenders.
@@ -0,0 +1,106 @@
# Bug Report — Extension Popup + Bulk Check — audit-snapshot-2026-05-24T15-55Z.md
Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: ext AGENTS.md / mockup / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 5 auditor)
Verifier agents: fresh Explore agents per candidate, blind context, stricter contract-check prompt + UI-lifecycle rule + bulk-check window-lifecycle awareness
**Chunk 5 calibration note:** Moderate verification yielded 2 confirmed bugs + 2 demoted (M→L) with 60% combined downgrade rate (1/5 pure REFUTED, 2/5 demoted). Auditor's recurring weakness: flagging timing/race theories as M without checking whether (a) Chrome platform contract gates the race, (b) underlying action object reference stays correct despite cosmetic UI staleness, (c) user can recover with one click. Stricter prompt + UI-lifecycle rule + "self-correcting cosmetic" filter caught all 3. **Light candidates were NOT verified per audit-plan stop condition** (>30% rejection → halt L verification). See `bugs-candidates-extension-popup.md` for unverified L list (C-3, C-6, C-8, C-9, C-10).
---
## Severe (S)
(none flagged by auditor in this chunk)
---
**Cross-chunk re-rank note:** Per `bugs-fix-queue.md`, this chunk's M-1 (Clear button leaves modal open) was **demoted to Light** because two-click recovery (Cancel button visible) makes it cosmetic + minor friction, not data loss or stuck workflow. M-2 (profile selector race) remains Moderate. Severity sections below reflect post-rerank placement.
---
## Moderate (M)
### M-5 (queue) — Profile selector race
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:605-625` (profile change handler) with runManualSearch at `:443-475` and runCheck around `:288-302`
- **Symptom (one sentence):** When the user switches profile while a manual search or auto-check is inflight, the handler fires a NEW search without canceling the OLD inflight RPC — if the old RPC's response arrives AFTER the new one (e.g., host was slow on first call, fast on second), the older Default-profile results render over the newer Other-profile results and stick until the user takes another action.
- **Why it's a bug:** No AbortController, no request ID gating, no UI-side request tracking, no profile-selector-disable-while-busy. `renderResult` unconditionally overwrites `lastResult` and the DOM with whatever response arrives (lines 237-286). Results are persistent — there's no auto-refresh that would self-correct. User sees wrong-profile results until they manually act. Codex's rule: wrong-data persisting until user action ≠ cosmetic ≠ self-correcting → M.
- **Reproduction:**
1. User on Default profile, searches `ABC-001`. RPC#1 inflight (host happens to be slow, ~250ms).
2. Before RPC#1 returns, user switches profile selector to "Other". Handler fires `runManualSearch()` for same query → RPC#2 inflight (host now fast, ~100ms).
3. RPC#2 returns first → renderResult shows Other-profile results.
4. RPC#1 returns later → renderResult overwrites with Default-profile results.
5. User sees Default-profile results under "Other" selector. Stays wrong until next action.
- **Suggested fix sketch:** track a monotonic `currentSearchId` in module scope. Each new search increments and captures its ID; callback compares its captured ID against the current — discards stale callbacks. Alternative: disable the profile selector + show busy indicator while inflight (simpler, but blocks legitimate fast follow-up actions).
- **Verifier agent:** fresh Explore, blind context, stricter prompt + UI-lifecycle rule
- **Verifier verdict:** CONFIRMED
- **Verifier confidence:** high (95%)
- **Contract refs verifier read:** runManualSearch / runCheck callback handling; profile change handler; renderResult lack of guards
- **Mirror check needed in:** any other UI control that triggers fresh searches while old may be inflight (history chip clicks at lines 423-425, search clear's runCheck call — both share the same lack of request-ID gating)
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:282-310` (runCheck) + `:443-475` (runManualSearch) — added module-level monotonic `_currentSearchId` counter. Both functions bump it synchronously on entry (BEFORE the `scanPaused` early-exit, per follow-up below) and capture their own id; sendMessage callbacks compare `myId !== _currentSearchId` and bail before any UI write if a newer search has started. Stale callbacks neither call setStatus, renderResult, pushHistory, nor any other UI mutator — they return silently so the newer search's render is final. Mirror check resolved in same commit: every entry point that triggers a fresh search (history chip clicks, search clear, profile selector, search-go button, Enter key, pause-while-inflight) funnels through runCheck or runManualSearch, so the gate covers all paths automatically. Manifest bumped 0.1.39 → 0.1.40, then 0.1.40 → 0.1.41 (follow-up: moved bump to BEFORE paused early-exit so pause-while-inflight is also gated — same race class). JS syntax verified via `node --check`. Logic unit-tested in isolation: 5/5 cases correct.
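The request-ID gate can be sketched in isolation as follows. `send` is a hypothetical stand-in for the message call to the background worker, the message shape is invented, and `rendered` stands in for the DOM writes that `renderResult` performs in popup.js.

```javascript
let _currentSearchId = 0;
const rendered = []; // stand-in for the UI; the real code writes to the page

// `send(message, callback)` is a hypothetical stand-in for the RPC round-trip.
function runManualSearch(query, profile, send) {
  const myId = ++_currentSearchId; // bump synchronously on entry, before any early exit
  send({ type: "search", query, profile }, (response) => {
    if (myId !== _currentSearchId) return; // a newer search started: drop this response silently
    rendered.push(response);
  });
}

// Capture callbacks so responses can be delivered out of order.
const pendingCallbacks = [];
const fakeSend = (msg, cb) => pendingCallbacks.push({ msg, cb });

runManualSearch("ABC-001", "Default", fakeSend); // RPC#1, slow
runManualSearch("ABC-001", "Other", fakeSend);   // RPC#2, fast
pendingCallbacks[1].cb({ profile: "Other", hits: [] });   // RPC#2 returns first and renders
pendingCallbacks[0].cb({ profile: "Default", hits: [] }); // RPC#1 returns late and is discarded
console.log(rendered.map((r) => r.profile));
```

This replays the reproduction steps: only the Other-profile response reaches the render stand-in.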
---
## Light (L)
### L-6 (queue) — Search Clear button leaves delete modal open with stale state; no Esc support
**Re-ranked from chunk M-1 to queue L-6.**
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:475-490` (Search Clear handler) with modal lifecycle at `:318-380`
- **Symptom:** When delete modal is open and user clicks Search Clear (×), the modal stays visible with stale confirm-input + chosenHit. No Esc-key handler exists. Only recovery is clicking the modal's Cancel button.
- **Why it's a bug (demoted from M to L):** Clear handler resets only `manualMode` and the search input. Modal close is NOT invoked. BUT: the delete RPC stays correct because `chosenHit` is a reference to the original hit (delete fires against the right file). Two-click recovery via Cancel button. No data loss, no wrong action, no stuck workflow. Cosmetic + minor friction.
- **Reproduction:**
1. Search FILE-A; click hit; click DELETE button
2. Click × (Search Clear) instead of Cancel
3. Expected: modal closes
4. Actual: modal stays visible; pressing Esc does nothing; click Cancel to recover
- **Suggested fix sketch:** Clear handler invokes closeDeleteModal before runCheck; add a document keydown listener so that Esc closes the modal
- **Verifier verdict:** CONFIRMED — high confidence (95%)
- **Mirror check needed in:** other modals (undo modal, profile modal): do they have Esc support?
- **Status:** open
### L-4 (queue, was chunk L-1) — expectedId not reset between delete-modal sessions; cosmetic state leak
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:318-380`
- **Symptom (one sentence):** Global `expectedId` persists across delete-modal open/close cycles — if user opens modal for FILE-A, cancels, then opens for FILE-B without first clicking a hit, the modal's confirm-input would validate against FILE-A's name briefly until selectHit(B) fires and updates it.
- **Why it's a bug (demoted from M to L):** Originally flagged as M "stale expectedId allows wrong delete." Verifier traced the actual delete path: the DELETE RPC at line 382 uses `chosenHit.full_path`, not `expectedId`. `chosenHit` is reset to null in `openDeleteModal` (line 321), so any "type-and-confirm" attempt without re-clicking a hit returns early at `if (!chosenHit) return;` (line 379). No wrong file can be deleted. The only effect is `expectedId` text showing stale value briefly during the open→re-open transition.
- **Reproduction:**
1. Open delete modal for FILE-A, click FILE-A hit (selectHit sets expectedId="FILE-A.mp4")
2. Cancel modal (close without delete)
3. Open delete modal for FILE-B WITHOUT clicking any hit yet
4. Expected: expectedId reset to empty string OR null; confirm-input disabled
5. Actual: expectedId still "FILE-A.mp4"; confirm-input checks typed text against stale value. No delete possible (chosenHit is null) but state is inconsistent.
- **Suggested fix sketch:** add `expectedId = "";` to openDeleteModal at line 321
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** PARTIAL — bug is real as state leak; severity over-stated by auditor
- **Verifier confidence:** high (95%)
- **Contract refs verifier read:** delete RPC gating; openDeleteModal init; chosenHit lifecycle
- **Mirror check needed in:** none (popup-local state)
- **Status:** open
### L-5 (queue, was chunk L-2) — History chip click during open delete modal leaves modal floating over fresh results
- **File:** `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js:423-425`
- **Symptom (one sentence):** Clicking a history chip while the delete modal is open fires `runManualSearch()` without closing the modal — fresh search results render behind the floating modal, which still shows the OLD search's hit selected for deletion.
- **Why it's a bug (demoted from M to L):** Originally flagged as M. Verifier confirmed `chosenHit` is a reference to the ORIGINAL hit object (set at selectHit, line 347), so if user confirms typing in the stale modal, delete fires correctly against the original file. No wrong delete. Only effect: UI confusion + stale modal over fresh results. User dismisses modal via Cancel button → recovers. Pure cosmetic + minor friction.
- **Reproduction:** open modal for FILE-A → click history chip for a different search → modal stays open over new results
- **Suggested fix sketch:** close modal in the history chip click handler before invoking runManualSearch (or gate the chip clicks while modal open)
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** PARTIAL — symptom real, severity over-stated
- **Verifier confidence:** high (90%+)
- **Contract refs verifier read:** chosenHit reference semantics; runManualSearch modal-state side-effects; renderResult scope
- **Mirror check needed in:** Search Clear's similar issue (covered by M-1)
- **Status:** open
---
## Needs Input (N)
(none promoted)
---
## False Positives (discarded)
- `src/popup/popup.js:563-565` (open-bulk-check fire-and-forget race) — flagged as Moderate "message lost if popup closes before IPC dispatch". REFUTED via Chrome runtime contract: `chrome.runtime.sendMessage(...)` queues the message in the runtime before the sender context unloads. window.close() cannot terminate the popup before the message is enqueued. Fire-and-forget pattern from a popup followed by window.close is documented-safe. Worst-case if anything went wrong: user clicks bulk-check icon again — fully retry-recoverable, not a permanent stuck state.
+100
@@ -0,0 +1,100 @@
# Phase 2 Fix Queue — audit-snapshot-2026-05-24T15-55Z.md
Compiled from `bugs-python.md`, `bugs-host.md`, `bugs-extension-bg.md`, `bugs-extension-options.md`, `bugs-extension-popup.md` after cross-chunk comparative re-rank. Per-chunk severity labels were assigned in isolation; this queue reflects normalized severity using the audit-plan + codex's re-rank criteria:
- **Severe:** data loss/corruption · wrong remote operation · persistent broken workflow with no reasonable recovery · silent success when operation actually failed
- **Moderate:** operation fails/hangs but user can retry · wrong persisted settings · diagnostic loss that materially blocks investigation · modal/workflow stuck until manual recovery · race causing stale/wrong visible results
- **Light:** confusing UI · cosmetic stale state · diagnostic annoyance · non-blocking alert issue · two-click recoverable
Per-chunk `bugs-*.md` files have been updated so their severity sections are consistent with this queue.
## Fix order (top-to-bottom)
| # | Original ID | Re-ranked | File | Fix boundary | One-line |
|---|---|---|---|---|---|
| ~~1~~ | ~~opts M-2~~ | ~~**S-1**~~ | ~~`bugs-extension-options.md`~~ | ~~Extension (Options)~~ | **FIXED v0.1.33** — Export now blocks on RPC failure; success path verified with populated keep_ranking |
| ~~2~~ | ~~opts M-1~~ | ~~**M-1**~~ | ~~`bugs-extension-options.md`~~ | ~~Extension (Options)~~ | **FIXED v0.1.34** — Element validators per array key; malformed elements dropped with index in modal; mirror check resolved in same commit |
| ~~3~~ | ~~bg M-2~~ | ~~**M-2**~~ | ~~`bugs-extension-bg.md`~~ | ~~Extension (background SW)~~ | **FIXED v0.1.36** — Top-level ensureContextMenu() call at module init runs on every SW evaluation |
| ~~4~~ | ~~host M-2~~ | ~~**M-3**~~ | ~~`bugs-host.md`~~ | ~~Host (Python)~~ | **FIXED v0.1.37** — Per-invocation spawn handoff via threading.Event + dict; handle_scan waits ≤500ms for Popen result |
| ~~5~~ | ~~host M-1~~ | ~~**M-4**~~ | ~~`bugs-host.md`~~ | ~~Host (Python)~~ | **FIXED v0.1.39** — Threaded worker fire-and-forget for real alerts; test RPC waits 6s for synchronous pass/fail; outcome logged with alert_source |
| ~~6~~ | ~~popup M-2~~ | ~~**M-5**~~ | ~~`bugs-extension-popup.md`~~ | ~~Extension (Popup)~~ | **FIXED v0.1.40** — Monotonic `_currentSearchId` gate; stale callbacks bail before any UI write |
| ~~7~~ | ~~bg S-1~~ | ~~**M-6**~~ | ~~`bugs-extension-bg.md`~~ | ~~Extension (background SW)~~ | **FIXED v0.1.42** — Promise-chain lock around recordRpc; 5/5 concurrent writes preserved in smoke test (vs 1/5 unlocked) |
| ~~8~~ | ~~python M-1~~ | ~~**M-7**~~ | ~~`bugs-python.md`~~ | ~~CLI (Python)~~ | **FIXED (no manifest bump — CLI only)** — Mirrored save_cache retry; 3/3 smoke tests pass |
| ~~9~~ | ~~bg M-1~~ | ~~**L-1**~~ | ~~`bugs-extension-bg.md`~~ | ~~Extension (background SW)~~ | **FIXED v0.1.43** — Dedicated `_hostAlertLock` around rate-limit + notification + Discord paths |
| 10 | bg L-1 | **L-2** | `bugs-extension-bg.md` | Extension (background SW) | Discord post failures have no passive UI surface |
| 11 | host L-1 | **L-3** | `bugs-host.md` | Host (Python) | Blocking stderr read leaves progress display stale for up to 5 s on rc-jav stall |
| 12 | popup L-1 | **L-4** | `bugs-extension-popup.md` | Extension (Popup) | Stale expectedId between delete modal sessions |
| 13 | popup L-2 | **L-5** | `bugs-extension-popup.md` | Extension (Popup) | History chip during open delete modal |
| 14 | popup M-1 | **L-6** | `bugs-extension-popup.md` | Extension (Popup) | Search Clear button leaves delete modal open (demoted — two-click recovery) |
## Summary
- **Severe: 1** (#1 keep_ranking export) — **FIXED v0.1.33**
- **Moderate: 7** (#2-#8)
- **Light: 6** (#9-#14)
- **Total confirmed bugs: 14**
**Remaining: 0 S · 0 M · 5 L** (9 fixed — all Severe + Moderate + 1 Light closed)
## Shipped versions log
Tracks manifest version bumps. Not every bump corresponds to a queue entry — some ship out-of-band fixes flagged ad-hoc.
| Version | Queue ID | What |
|---|---|---|
| 0.1.33 | S-1 | Export blocks on keep_ranking RPC failure |
| 0.1.34 | M-1 | sanitizeImportedSettings element validators |
| 0.1.35 | (out-of-band) | Branding follow-up: `_meta.app` and export filename `rclonex` → `rclone-jav`. No functional change. |
| 0.1.36 | M-2 | Context menu re-registered on every SW evaluation (top-level `ensureContextMenu()` call) |
| 0.1.37 | M-3 | handle_scan synchronously surfaces Popen failures via per-invocation Event/dict handoff (≤500 ms wait) |
| 0.1.38 | (M-2 follow-up) | ensureContextMenu Promise-chain lock + per-create try/catch — fixes "duplicate id" race introduced by M-2's top-level call |
| 0.1.39 | M-4 | post_discord_alert threaded fire-and-forget (real alerts) + sync wait with explicit timeout error (test RPC); outcome logged via discord_post event with alert_source |
| 0.1.40 | M-5 | Popup monotonic `_currentSearchId` gate; stale runCheck/runManualSearch callbacks bail before UI write |
| 0.1.41 | (M-5 follow-up) | Bump `_currentSearchId` BEFORE paused early-exit in runCheck/runManualSearch — closes same race class for pause-while-inflight |
| 0.1.42 | M-6 | Promise-chain lock around recordRpc — serializes concurrent storage.local read-modify-write |
| (no bump) | M-7 | `rcjav/cli.py` save_config gains PermissionError retry to match save_cache — CLI repo, no extension files touched |
| 0.1.43 | L-1 | Dedicated `_hostAlertLock` around maybeNotifyHostError; serializes rate-limit read/check/write + notification + Discord paths |
| 0.1.44 | (Phase 3 introduced-bug fix) | M-3 spawn race: reorder `_scan_proc = proc` BEFORE `spawn_event.set()` so cancel handler sees live proc reference |
| 0.1.45 | (M-6 mirror) | `recordActivity` race fix — same Promise-chain lock pattern as recordRpc; concurrent activity log writes now serialized |
## Fix-boundary summary
| Boundary | S | M | L | Notes |
|---|---|---|---|---|
| CLI repo (Python `D:\DEV\Project\rclone-jav\`) | 0 | 1 | 0 | M-7 save_config retry only |
| Host (Python at `D:\DEV\Extensions\Production\rclone-jav\host\`) | 0 | 2 | 1 | M-3, M-4, L-3 |
| Extension SW + content (`background.js` + `content.js` + `manifest.json`) | 0 | 2 | 2 | M-2, M-6, L-1, L-2 |
| Extension Options (`src/options/`) | 1 | 1 | 0 | S-1, M-1 |
| Extension Popup + Bulk (`src/popup/`, `src/bulk-check/`) | 0 | 1 | 3 | M-5, L-4, L-5, L-6 |
## Phase 2 rules (per audit-plan)
1. **One bug per fix session.** No batch fixes.
2. **Fix verification gate** before marking `status: fixed`:
- Re-run the bug's reproduction recipe → must produce Expected, not Actual
- Per-file test re-run for affected file
- If no test existed for the now-fixed behavior, write one
- If extension code OR host code changed (any file under `D:\DEV\Extensions\Production\rclone-jav\`): bump `manifest.json` version (one bump per fix unless user explicitly batches). CLI fix (separate repo) does not trigger manifest bump.
- Do NOT touch any other bug entry or any file marked DO NOT FIX
- Update bug entry with `Status: fixed` + `Fix:` line citing the new file:line
3. **After fixes in a chunk:** full chunk test suite re-run (not just per-file)
4. **Mirror checks:** S-1 (none), M-1 (profiles[], partPatterns[] same pattern), M-2 (none), M-6 (options.js storage writes, activity log buffer, tabvault caller log out-of-scope)
## Version bump policy for this queue
Each fix is its own user-requested update under the project's "one bump per shipped change" rule (see `feedback_extension_version_bump.md`). The manifest version chip is the user's reload-verification signal — they read it in `brave://extensions` after reload to confirm latest code is loaded.
- **Extension fix (background.js / src/options / src/popup / src/bulk-check / content.js / manifest.json itself) → bump `manifest.json`** (one bump per fix unless explicitly batched)
- **Host fix (`host/rcjav-host.py` or sibling files in `host/`) → bump `manifest.json` AS WELL.** Host folder is bundled inside the extension repo and ships together. User's reload habit is "reload extension + check version chip"; if only host changed, user has no other visible "latest version" signal in their normal workflow. Optionally also bump `VERSION` constant inside `rcjav-host.py` for forensic record (visible via Diagnostics card), but the manifest bump is the user-facing signal.
- **CLI fix (M-7 only — `D:\DEV\Project\rclone-jav\rcjav/*.py` or `rc-jav.py`) → no extension manifest bump.** CLI lives in a separate repo; extension folder is unchanged. CLI changes take effect on the next CLI invocation automatically. If forensic version tracking is desired, optionally bump a CLI-side version marker, but no manifest bump.
If user explicitly batches multiple fixes into one shipped change → one bump for the batch. Default = per-fix bump.
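The bump rule above comes down to a path check: anything under the extension folder (host included) triggers a manifest bump, and CLI-repo files do not. A minimal sketch of that check follows. The helper name `needs_manifest_bump` is hypothetical, the root path is copied from this document, and `is_relative_to` assumes Python 3.9 or newer.

```python
from pathlib import PureWindowsPath

# Extension repo root as quoted in this queue; host/ lives inside it.
EXTENSION_ROOT = PureWindowsPath(r"D:\DEV\Extensions\Production\rclone-jav")


def needs_manifest_bump(changed_paths):
    """Return True if any changed file lives under the extension repo.

    Hypothetical helper: CLI-repo files (a separate root) never match,
    so a CLI-only fix yields False.
    """
    return any(
        PureWindowsPath(p).is_relative_to(EXTENSION_ROOT) for p in changed_paths
    )
```

Batching is a human decision, so the sketch only answers "does this change set need a bump at all", not how many bumps to make.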
## Recommended pause
Per audit plan: pause before starting Phase 2. Confirm:
1. Severity re-rank looks right (compare side by side, not in isolation)
2. Fix-boundary distribution is acceptable (Extension Options has the only Severe — Options pane will need careful regression check after fix)
3. Decision on whether to fix all 14 or only the Severe + critical Moderates
Standing by.
+87
@@ -0,0 +1,87 @@
# Bug Report — Native Host — audit-snapshot-2026-05-24T15-55Z.md
Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: AGENTS.md / mockup / CACHE_CONTRACT.md / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 2 auditor)
Verifier agents: fresh Explore agents per candidate, blind context, stricter contract-check prompt + external-vs-internal-input rule
**Chunk 2 calibration note:** Moderate verification yielded 2 confirmed bugs + 1 demoted (M→L) with 40% pure-rejection rate (2/5 REFUTED). Auditor's recurring weaknesses: (1) flagging gate logic that's fail-SAFE as if it were fail-OPEN (C-1), (2) ignoring browser/protocol-level caps when worrying about host-side validation (C-2). Stricter verifier prompt with external-input + protocol-spec checks caught both false positives. **Light candidates were NOT verified per audit-plan stop condition** (>30% rejection → halt L verification). See `bugs-candidates-host.md` for unverified L list (C-6, C-7, C-8, C-9, C-10, C-11) and Needs Input C-12.
---
## Severe (S)
(none flagged by auditor in this chunk)
---
## Moderate (M)
### M-1 — post_discord_alert blocks main message loop for up to 5 s
- **File:** `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:174-289` (post_discord_alert refactored into `_discord_post_worker` + `_build_discord_body` helpers + public `post_discord_alert` thin wrapper after M-4 fix; was line 174-217 pre-fix), with callsites in `handle_test_alerts_config` + 4 main-loop sites (conn_close abnormal, read_message exception, handler exception, write_message exception)
- **Symptom (one sentence):** When a handler exception or abnormal port close fires AND the Discord webhook URL is configured AND Discord is slow/unreachable, the main message loop blocks for up to 5 seconds inside `urllib.request.urlopen(timeout=5)`, delaying the failure response to the extension by the same 5 s.
- **Why it's a bug:** All 5 callsites of `post_discord_alert` execute on the main thread that runs the native messaging loop. Callsites 2-5 are rate-limited via `_alert_rate_limited()` (LAST_ALERT_FILE check at line 184-185), so only the FIRST exception per 10-minute window blocks. Callsite 1 (`handle_test_alerts_config`) deliberately deletes LAST_ALERT_FILE to bypass rate limiting (line 258) before calling `post_discord_alert`, so every Test (host) button click is a guaranteed 5 s main-thread block when Discord is slow. During the block, the extension's RPC promise hangs waiting for the response.
- **Reproduction:**
1. Input: configure Discord webhook URL pointing at a slow/down endpoint (or kill network). Open Setup → Alerts → click Test (host).
2. Expected: test fires asynchronously; UI returns immediately with "sent (still pending)" or similar
3. Actual: Options page hangs ~5 s waiting for the host's RPC response, because host's main loop is blocked in urlopen
- **Suggested fix sketch:** spawn a background thread for `urlopen` (fire-and-forget), or use a 1 s timeout instead of 5 s, or move webhook delivery into a worker queue consumed by a dedicated thread. Mirror the extension-side webhook post pattern (which already uses `fetch().catch(...)` without blocking the SW event loop).
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** CONFIRMED
- **Verifier confidence:** high
- **Contract refs verifier read:** native messaging response timing expectations; threading model of `main()`
- **Mirror check needed in:** extension-side `postDiscordAlert` in background.js — already non-blocking (uses fetch), but verify pattern consistency
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:174-289` — refactored post_discord_alert into shared internal worker (`_discord_post_worker`) + helper (`_build_discord_body`). Two public modes: (a) `post_discord_alert(...)` spawns daemon thread, returns immediately (used by 4 main-loop callsites: conn_close, read_error, handler_exception, write_error — each now passes `alert_source` label for analytics); (b) `handle_test_alerts_config` builds payload, spawns same worker with event+holder, waits 6 s, returns synchronous pass/fail or explicit timeout error `"Discord webhook timed out after 6s; background post may still complete (see events.log)"`. Worker logs every outcome via `log_event("discord_post", ok=, status=, error=, alert_kind=, alert_source=, elapsed_ms=)` — visibility preserved despite async execution. Error text capped at 120 chars; never logs webhook URL or full payload. Main message loop no longer blocks on Discord. Manifest bumped 0.1.38 → 0.1.39. Python syntax verified via `py_compile`. Worker mechanics smoke-tested in isolation: bogus URL → 404 ok:False; bad domain → URLError ok:False with reason captured; fire-and-forget mode (no event/holder) → no raise. Test button still returns synchronous pass/fail for user experience.
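The two modes described in the fix can be sketched roughly as below. This is an illustration of the pattern only, not the code in `rcjav-host.py`: `post_worker`, `post_alert_async`, `post_alert_sync`, and the injected `send` callable (standing in for the blocking `urlopen` call) are hypothetical names.

```python
import threading
import time


def post_worker(send, payload, done=None, holder=None):
    """Run the blocking send off the main thread and record the outcome."""
    started = time.monotonic()
    try:
        status = send(payload)  # stands in for the blocking webhook POST
        result = {"ok": 200 <= status < 300, "status": status, "error": None}
    except Exception as exc:  # the worker must never raise out of its thread
        result = {"ok": False, "status": None, "error": str(exc)[:120]}
    result["elapsed_ms"] = int((time.monotonic() - started) * 1000)
    if holder is not None:
        holder.update(result)  # fill the holder BEFORE signalling
    if done is not None:
        done.set()
    return result


def post_alert_async(send, payload):
    """Fire-and-forget mode: returns immediately, caller never blocks."""
    t = threading.Thread(target=post_worker, args=(send, payload), daemon=True)
    t.start()
    return t


def post_alert_sync(send, payload, wait_s=6.0):
    """Test-button mode: wait briefly for pass/fail, else report a timeout."""
    done, holder = threading.Event(), {}
    threading.Thread(
        target=post_worker, args=(send, payload, done, holder), daemon=True
    ).start()
    if not done.wait(timeout=wait_s):
        return {"ok": False, "error": f"timed out after {wait_s:g}s; post may still complete"}
    return dict(holder)
```

Both modes share one worker, so outcome logging would only need to live in one place; the synchronous mode differs only in that it waits on the event with an explicit timeout.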
### M-2 — handle_scan returns success before _scan_worker can detect Popen failure
- **File:** `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2235-2264` (handle_scan) + `:2053-2110` (_scan_worker Popen path) + `:2211-2220` (_scan_worker exception path)
- **Symptom (one sentence):** When `subprocess.Popen` in `_scan_worker` fails (python missing, rc-jav.py path wrong, permission denied, etc.), `handle_scan` has already returned `{"ok": True, "started": True}` to the extension because the thread was started but had not yet executed Popen; extension shows "scan started" for 1-2 seconds before the next `scan-progress` poll surfaces the actual error.
- **Why it's a bug:** `handle_scan` calls `thread.start()` at line 2263 then returns at line 2264 without waiting for Popen to succeed. If Popen raises (line 2092-2098) the worker's exception handler writes `scan_ok: false, error: ...` to SCAN_STATE_FILE (line 2211-2220) — but the extension already received `ok: true` and only learns of the failure on the next progress poll. Race window: short (1-2 s typically) but user-visible — UI shows "scan started" then suddenly "scan failed" with cryptic OS-level error.
- **Reproduction:**
1. Input: trigger Rebuild Cache from extension while python is not on PATH (or rc-jav.py path mis-set, or cwd has permission issue)
2. Expected: handle_scan returns an error immediately so extension can show clear message before any "started" state
3. Actual: extension shows "scan started" briefly → next poll → "scan failed: FileNotFoundError" or similar OS error
- **Suggested fix sketch:** validate Popen preconditions synchronously in `handle_scan` before returning (python exists, rc-jav.py exists, cwd writable). OR use a sync event/queue from worker to handle_scan so it can wait briefly for the first state-file write before returning.
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** CONFIRMED
- **Verifier confidence:** very high (100%)
- **Contract refs verifier read:** _scan_worker exception path; SCAN_STATE_FILE write timing; handle_scan_progress detection logic
- **Mirror check needed in:** none — Popen race specific to scan path; other RPCs run handlers synchronously
- **Status:** fixed
- **Fix:** `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2053-2305` — added per-invocation `spawn_event` (threading.Event) + `spawn_result` dict, both passed from `handle_scan` into `_scan_worker`. Worker sets `spawn_result["spawn_ok"] = True` immediately after `subprocess.Popen` returns OR `spawn_ok = False` + `error` on exception, then sets event. `handle_scan` waits up to 500 ms via `spawn_event.wait(timeout=0.5)` then branches: spawn_ok=True → `{ok: true, started: true}`; spawn_ok=False → `{ok: false, started: false, error}`; timeout → `{ok: true, started: true, startup_pending: true}` (backward compatible — existing UI ignores the new key). Per-invocation holder isolates the handoff from globals (`_scan_proc`) and state file (UI/progress surface) so cross-invocation contamination is impossible. Manifest bumped 0.1.36 → 0.1.37. Python syntax verified via `py_compile`. Threading harness smoke-tested in isolation: success → `{spawn_ok: True}` + event set; Popen fail (nonexistent binary) → `{spawn_ok: False, error: "[WinError 2] ..."}` + event set; slow Popen → event NOT set after 500 ms (timeout branch fires). All 3 cases behave correctly. **Runtime repro verified** via temporary instrumentation (injected `raise FileNotFoundError("simulated spawn fail")` immediately before the `subprocess.Popen` line in `_scan_worker`, reloaded extension, triggered Rebuild Cache, UI showed `scan failed: FileNotFoundError: simulated spawn fail` synchronously with no misleading "scan started" flash). Instrumentation reverted post-test; manifest stayed at 0.1.37 because no code-of-record change. **Note:** the bad-rcjavPath test (point Setup → rcjavPath to non-existent path) does NOT exercise this fix path — that goes through Popen success → rc-jav.py exits 2 → existing async exception handler. M-3 specifically targets Popen-itself-raising, which is reachable via Python-on-PATH missing, OS permission denied at spawn time, or analogous OS-level interference. 
Use the instrumented-raise technique for any future regression test.
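The per-invocation handoff described in the fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the host's actual code: `scan_worker` and `handle_scan` here are reduced stand-ins that take the command as an argument, and the real worker streams stderr progress instead of calling `communicate()`.

```python
import subprocess
import sys
import threading


def scan_worker(cmd, spawn_event, spawn_result):
    """Spawn the child and report the Popen outcome before doing anything else."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    except OSError as exc:  # FileNotFoundError, PermissionError, ...
        spawn_result["spawn_ok"] = False
        spawn_result["error"] = f"{type(exc).__name__}: {exc}"
        spawn_event.set()
        return
    # Any shared reference to proc should be published before set(),
    # so a waiter that wakes up never sees a half-initialised state.
    spawn_result["spawn_ok"] = True
    spawn_event.set()
    proc.communicate()  # placeholder for the real progress-streaming loop


def handle_scan(cmd, wait_s=0.5):
    """Start the worker, then wait briefly for the spawn outcome."""
    spawn_event, spawn_result = threading.Event(), {}
    threading.Thread(
        target=scan_worker, args=(cmd, spawn_event, spawn_result), daemon=True
    ).start()
    if not spawn_event.wait(timeout=wait_s):
        # Slow spawn: stay backward compatible and flag it as pending.
        return {"ok": True, "started": True, "startup_pending": True}
    if spawn_result["spawn_ok"]:
        return {"ok": True, "started": True}
    return {"ok": False, "started": False, "error": spawn_result["error"]}
```

Because the event and dict are created per call, two overlapping scan requests cannot read each other's spawn result.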
---
## Light (L)
### L-1 — Stderr blocking read freezes progress display for up to 5 s on rc-jav stall
- **File:** `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py:2053-2227` (_scan_worker), specifically `:2101` (stderr iterator loop), `:2267-2275` (deferred kill)
- **Symptom (one sentence):** When rc-jav.py stalls mid-scan (e.g. rclone blocked on unresponsive remote), the `for raw in proc.stderr:` iterator at line 2101 blocks until either a stderr line arrives or proc exits — during which the scan-state file is not updated, so the extension's progress display shows stale state for up to 5 s (until the deferred-kill mechanism forces proc.terminate).
- **Why it's a bug (demoted from M to L):** Originally flagged as M. Re-verifier confirmed the blocking is real but: no data loss occurs, cancel still works (delayed by up to 5 s as terminate fires), zombie process not left behind. Pure UX progress-freeze, not workflow-breaking.
- **Reproduction:**
1. Input: rclone remote becomes unresponsive mid-scan
2. Expected: progress display updates with "stalled, will cancel in <N>s" indicator, OR heartbeat that resumes when remote recovers
3. Actual: progress frozen for 5 s, then deferred kill fires, scan marked complete with last-known progress
- **Suggested fix sketch:** add a watchdog timer that emits a heartbeat to SCAN_STATE_FILE every 1-2 s while stderr is silent, OR use non-blocking stderr reads with select/poll (cross-platform via threading)
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** PARTIAL — symptom real, severity originally over-stated
- **Verifier confidence:** high (100%)
- **Contract refs verifier read:** cancel path; deferred-kill behavior; SCAN_STATE_FILE update timing
- **Mirror check needed in:** none
- **Status:** open
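One way the suggested heartbeat could look is sketched below, assuming the blocking stderr iteration is moved into a helper thread and the worker polls a queue with a timeout. The names `pump_lines`, `watch_stderr`, `on_line`, and `on_heartbeat` are hypothetical; in the host, `on_heartbeat` would presumably refresh a timestamp in SCAN_STATE_FILE.

```python
import queue
import subprocess
import sys
import threading


def pump_lines(stream, q):
    """Blocking reader, isolated in its own thread."""
    for raw in stream:
        q.put(raw)
    q.put(None)  # sentinel: stream closed, child is done writing


def watch_stderr(proc, on_line, on_heartbeat, interval_s=1.0):
    """Forward stderr lines; emit a heartbeat whenever stderr stays silent."""
    q = queue.Queue()
    threading.Thread(target=pump_lines, args=(proc.stderr, q), daemon=True).start()
    while True:
        try:
            item = q.get(timeout=interval_s)
        except queue.Empty:
            on_heartbeat()  # stderr silent for interval_s seconds
            continue
        if item is None:
            break
        on_line(item)
    return proc.wait()
```

This keeps the existing line-by-line parsing intact and works on Windows, where `select` cannot be used on pipes.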
---
## Needs Input (N)
(C-12 from candidates was N — _load_host_cache memoization key collision — left unverified per stop condition; candidate scratch retains it)
---
## False Positives (discarded)
- `host/rcjav-host.py:1216-1221` (_path_in_allowed_prefixes case-sensitivity) — flagged as Moderate "security bypass via uppercase remote". REFUTED. The gate is fail-SAFE, not fail-OPEN: case-mismatch causes the comparison to fail, which REJECTS the operation. No bypass possible. Verifier noted a related usability issue (legitimate uppercase paths get confusing rejection) but that's a UX gap, not a security bug.
- `host/rcjav-host.py:306-316` (read_message unbounded length prefix) — flagged as Moderate "DoS via 4 GiB length". REFUTED. Chrome native messaging protocol caps extension-to-host messages at 64 MiB browser-side per Chrome dev docs. Non-Brave processes cannot write to host stdin (it's piped by the browser into the host child process). The theoretical 4 GiB read cannot actually be triggered through any practical attack surface. Pure defensive-coding gap, not a real DoS.
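If the defensive gap in the second item is ever closed, a bounded reader might look like the sketch below. It assumes the standard native-messaging framing (4-byte length in native byte order, then UTF-8 JSON); `MAX_MESSAGE_BYTES` and this `read_message` signature are illustrative and not taken from `rcjav-host.py`.

```python
import io
import json
import struct

# Assumed cap, chosen to mirror the browser-side limit cited above.
MAX_MESSAGE_BYTES = 64 * 1024 * 1024


def read_message(stream, max_bytes=MAX_MESSAGE_BYTES):
    """Read one frame: 4-byte native-order length prefix + UTF-8 JSON body."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # EOF: the browser closed the port
    (length,) = struct.unpack("=I", header)
    if length > max_bytes:
        raise ValueError(f"message length {length} exceeds cap {max_bytes}")
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated message body")
    return json.loads(body.decode("utf-8"))
```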
+67
@@ -0,0 +1,67 @@
# Bug Report — Python CLI — audit-snapshot-2026-05-24T15-55Z.md
Snapshot: audit-snapshot-2026-05-24T15-55Z.md
Required-reading docs read: AGENTS.md / TODO.md / CACHE_CONTRACT.md (at D:\DEV\Extensions\Production\rclone-jav\docs\CACHE_CONTRACT.md) / bug-audit-plan.md / project memory
Auditor agent: fresh Explore agent (chunk 1 auditor)
Verifier agents: fresh Explore agents per candidate, blind context, stricter contract-check prompt
This file contains CONFIRMED + PARTIAL findings only. Candidate scratch lives in `bugs-candidates-python.md`. REFUTED / NEEDS-INFO candidates stay in scratch with verifier response appended.
**Chunk 1 calibration note:** Moderate verification yielded 1 confirmed bug with 75% rejection rate (3/4 REFUTED). Auditor's recurring weakness: flagging `f["key"]` direct access as KeyError risk without checking the contract that guarantees the dict shape upstream (rclone lsjson schema, cache.json schema enforced by load_cache validation + CACHE_CONTRACT.md). Stricter verifier prompt with required contract-check caught all 3 false positives. **Light candidates were NOT verified per audit-plan stop condition** (>30% rejection → halt L verification). The Python auditor likely shares a similar pattern-matching weakness on L candidates — revisit only if needed. See `bugs-candidates-python.md` for unverified L list (C-5, C-6, C-7, C-8, C-9).
---
## Severe (S)
(none flagged by auditor in this chunk)
---
## Moderate (M)
### M-1 — save_config lacks Windows file-locking retry that save_cache has
- **File:** `D:\DEV\Project\rclone-jav\rcjav\cli.py:186-189` (save_config), with comparison at `rcjav/cache.py:142-147` (save_cache)
- **Symptom (one sentence):** When a user runs `--save` while config.json is briefly locked by antivirus, Windows Search indexer, or any reader, `os.replace(tmp, CONFIG_PATH)` raises uncaught PermissionError and the user sees a Python traceback — config write fails. `save_cache` for the same os.replace pattern has explicit PermissionError + 0.5s retry; `save_config` does not.
- **Why it's a bug:** Asymmetric protection. `save_cache` (cache.py:142-147):
```python
try: os.replace(tmp, CACHE_PATH)
except PermissionError: time.sleep(0.5); os.replace(tmp, CACHE_PATH)
```
`save_config` (cli.py:186-189):
```python
tmp.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
os.replace(tmp, CONFIG_PATH)
```
Single call site at cli.py:465 inside `--save` flag handler, NOT wrapped in try/except. Outer exception handler at cli.py:1000-1004 catches only KeyboardInterrupt. PermissionError propagates uncaught → Python traceback to user. On Windows with active AV (Defender, Avast, etc.), file-lock-during-replace is common.
- **Reproduction:**
1. Input: user runs `python rc-jav.py --save --target cq:JAV` while config.json is being read by another process (AV scan, Windows Search indexer reindexing, manual file open in editor)
2. Expected: write retries briefly + succeeds, OR clear "config write failed, retry" message
3. Actual: PermissionError raised from os.replace, uncaught, Python prints traceback `PermissionError: [WinError 32] The process cannot access the file because it is being used by another process`. tmp file may remain on disk. Config not persisted. User confused.
- **Suggested fix sketch:** copy save_cache's pattern verbatim — wrap os.replace in try/except PermissionError with 0.5s sleep + single retry
- **Verifier agent:** fresh Explore, blind context, stricter prompt
- **Verifier verdict:** CONFIRMED
- **Verifier confidence:** high (95%)
- **Contract refs verifier read:** save_cache implementation as comparison; outer exception handler scope
- **Mirror check needed in:** any other `os.replace` callsite in `rcjav/` package that writes user-visible config/state (search for `os.replace` in rcjav/ — only save_cache and save_config currently)
- **Status:** fixed
- **Fix:** `D:\DEV\Project\rclone-jav\rcjav\cli.py:186-194` — wrapped `os.replace(tmp, CONFIG_PATH)` in same try/except PermissionError + time.sleep(0.5) + retry pattern that save_cache uses (rcjav/cache.py:142-147). Now symmetric: both writers handle transient Windows file locks identically. Single retry (not infinite) — persistent locks still bubble PermissionError to caller, matching save_cache behavior. `time` already imported in cli.py:14 — no new import needed. **No manifest bump** — CLI repo only, no extension files touched. Python syntax verified via `py_compile`. Smoke-tested in isolation: (1) normal write produces correct file; (2) first os.replace raises PermissionError then succeeds on retry — final state correct, 0.5s sleep observed (2 calls, elapsed 0.50s); (3) persistent PermissionError on both attempts → bubbles up to caller (2 attempts, matches save_cache). Mirror check resolved: only save_cache and save_config use os.replace in rcjav/; both now have retry.
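The retry pattern shared by both writers can be sketched as a small helper. This is a sketch under assumptions, not the code in `cli.py` or `cache.py`: `replace_with_retry` and `save_json_atomic` are hypothetical names, and the injectable `replace` parameter exists only so the retry can be exercised without a real file lock.

```python
import json
import os
import time
from pathlib import Path


def replace_with_retry(tmp, dest, delay_s=0.5, replace=os.replace):
    """os.replace with a single retry on PermissionError (transient locks)."""
    try:
        replace(tmp, dest)
    except PermissionError:
        time.sleep(delay_s)
        replace(tmp, dest)  # a second failure propagates to the caller


def save_json_atomic(path, data, **retry_kwargs):
    """Write JSON to a sibling temp file, then swap it into place."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    replace_with_retry(tmp, path, **retry_kwargs)
```

Factoring the retry into one helper would also make the mirror check trivial: any future `os.replace` callsite either uses the helper or stands out in a search.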
---
## Light (L)
(none promoted — chunk 1 L verification skipped per stop condition)
---
## Needs Input (N)
(none promoted)
---
## False Positives (discarded)
- `rcjav/rclone_io.py:66` — flagged as Moderate "rclone KeyError on Path". REFUTED. rclone lsjson output contract guarantees `Path` field on every item per official docs. Direct `item["Path"]` access is appropriate fail-fast for contract violation. Lines 77-78's `.get()` pattern for Size/ModTime is defensive over-engineering for those fields, NOT evidence Path needs the same.
- `rcjav/library.py:257` — flagged as Moderate "library cache KeyError on path". REFUTED via 3 converging facts: (1) CACHE_CONTRACT.md mandates `path` key on every file entry, (2) `load_cache()` (cache.py:67-106) validates schema before find_library_issues runs — non-conformant caches get wiped via `_fresh_cache()`, (3) FileEntry dataclass + every cache write site explicitly emits `path`. The `.get()` pattern at cli.py:526 (`--reextract`) is defensive because that path reads cache.json directly without re-validation; library.py operates on already-validated data.
- `rcjav/library.py:328-330` — flagged as Moderate "rename_file KeyError on path/jav_id". REFUTED. `f` comes only from cache entries (`remote_data.get("files", [])`), which are contract-guaranteed to have `path`. Caller scalar args (`old_rel_path`, `new_rel_path`) are strings, not dicts. Line 330's `or f["jav_id"]` fallback is for `extract_id` returning None, NOT for missing key — correct design. Auditor conflated scalar caller args with iterated dict entries.
+18 -1
@@ -1,5 +1,22 @@
{
"default_target": [
"cq:JAV"
]
],
"filename_hygiene": {
"custom_rules": []
},
"keep_ranking": {
"priority_folders": [
"ClearJAV"
],
"size_tolerance_mib": 0,
"format_preference": [
"mkv",
"mp4",
"wmv",
"avi"
],
"tiebreak_res_tag": true,
"tiebreak_longer_name": true
}
}
+15 -7
@@ -40,13 +40,21 @@ The runner imports `rc-jav.py` in place, exercises `extract_id` against
## Running the extension side
No automated runner today. `content.js` lives inside an IIFE that the
browser injects into pages, so importing it from Node would require
either an extraction refactor or a duplicated copy of the regex. Until
that lands, treat `query-extraction.json` and `shared-normalization.json`
as the canonical specification: if you touch `ID_RE_DASHED`,
`ID_RE_UNDASHED`, or `BUILTIN_ID_NORMALIZERS` in content.js, eyeball
this corpus and confirm the cases still describe expected behavior.
```bash
node fixtures/run-node.mjs
```
The Node runner exercises `query-extraction.json` and
`shared-normalization.json` against a hand-mirrored copy of
`normalizeId` from `content.js`. Because `content.js` lives inside an
injected IIFE in the extension repo, it can't be imported directly —
the runner duplicates the regexes (`ID_RE_DASHED`, `ID_RE_UNDASHED`,
`BUILTIN_ID_NORMALIZERS`).
If you change any of those in `content.js`, mirror the change at the
top of `fixtures/run-node.mjs`. `shared-normalization.json` catches
silent cross-side drift because both Python and Node exercise it; a
case that passes Python but fails Node (or vice versa) is the canary.
## Adding a case
+102
@@ -0,0 +1,102 @@
// Node-side fixture runner — mirrors content.js normalizeId() for
// query-extraction.json + shared-normalization.json.
//
// IMPORTANT: this file *replicates* the regexes from content.js by hand.
// content.js lives inside an injected IIFE in the extension, so a real
// import isn't feasible without restructuring it. If you touch ID_RE_DASHED,
// ID_RE_UNDASHED, or BUILTIN_ID_NORMALIZERS in content.js, update them
// here too. fixtures/shared-normalization.json catches cross-side drift
// because Python and this runner both exercise it.
//
// Usage:
// node fixtures/run-node.mjs
//
// Exits non-zero on any fixture case failure.
import { readFileSync } from "node:fs";
import { fileURLToPath } from "node:url";
import { dirname, join } from "node:path";
const __dirname = dirname(fileURLToPath(import.meta.url));
// ---------- mirror of content.js ----------
const ID_RE_DASHED = /\b([A-Za-z][A-Za-z0-9]{1,})-(\d{2,7})[a-zA-Z]?\b/;
const ID_RE_UNDASHED = /\b([A-Za-z][A-Za-z0-9]{1,})(\d{3,5})[a-zA-Z]?\b/;
const BUILTIN_ID_NORMALIZERS = [
// FC2-PPV in any dash configuration: FC2PPV12345, FC2-PPV12345, FC2-PPV-12345
{ re: /\bFC2-?PPV-?(\d{4,})\b/i, fmt: "FC2-PPV-$1" },
// Some sites display FC2 IDs without the PPV segment: FC2-1841460.
{ re: /\bFC2-(\d{4,})\b/i, fmt: "FC2-PPV-$1" },
];
function applyNormalizers(text, userList = []) {
const all = [...userList, ...BUILTIN_ID_NORMALIZERS];
for (const n of all) {
let re;
try { re = n.re instanceof RegExp ? n.re : new RegExp(n.re, "i"); } catch { continue; }
const m = text.match(re);
if (m) {
return n.fmt.replace(/\$(\d)/g, (_, i) => m[+i] || "");
}
}
return null;
}
function normalizeId(text) {
if (!text) return null;
const fromNormalizer = applyNormalizers(text);
if (fromNormalizer) return fromNormalizer.toUpperCase();
let m = text.match(ID_RE_DASHED);
if (!m) m = text.match(ID_RE_UNDASHED);
if (!m) return null;
return `${m[1].toUpperCase()}-${m[2]}`;
}
// ---------- harness ----------
function load(name) {
return JSON.parse(readFileSync(join(__dirname, name), "utf8"));
}
function run(label, cases) {
let passed = 0;
let failed = 0;
for (const c of cases) {
const got = normalizeId(c.input);
if (got === c.expected) {
passed += 1;
} else {
failed += 1;
console.log(` FAIL [${label}] ${JSON.stringify(c.name)}`);
console.log(` input = ${JSON.stringify(c.input)}`);
console.log(` expected = ${JSON.stringify(c.expected)}`);
console.log(` got = ${JSON.stringify(got)}`);
}
}
return { passed, failed };
}
let totalPassed = 0;
let totalFailed = 0;
for (const [filename, fnLabel] of [
["query-extraction.json", "normalizeId"],
["shared-normalization.json", "normalizeId"],
]) {
const doc = load(filename);
const cases = doc.cases || [];
console.log(`\n${filename} -> node.${fnLabel} (${cases.length} cases)`);
const { passed, failed } = run(filename, cases);
totalPassed += passed;
totalFailed += failed;
console.log(` ${passed} passed | ${failed} failed`);
}
console.log();
if (totalFailed > 0) {
console.log(`FAILED: ${totalFailed} of ${totalPassed + totalFailed} cases`);
process.exit(1);
}
console.log(`OK: all ${totalPassed} cases passed`);
+603
@@ -0,0 +1,603 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>rclone-jav — Library Cleanup mockup (preview-first, no resolution probing)</title>
<style>
:root {
color-scheme: dark;
--bg: #0c0e10;
--shell: #14171a;
--panel: #181b1e;
--surface: #1f2327;
--line: #292e33;
--line-2: #3a4148;
--text: #e1e6eb;
--muted: #8a949d;
--blue: #6ec5ff;
--green: #7de4a0;
--yellow: #ffd36c;
--red: #ff9097;
--purple: #c5a9ff;
--orange: #ffb072;
}
* { box-sizing: border-box; }
body { margin:0; background:var(--bg); color:var(--text); font:13px/1.5 -apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif; }
main { padding:24px; max-width:1320px; margin:0 auto; }
h1 { margin:0 0 4px; font-size:24px; }
h2 { margin:28px 0 8px; font-size:17px; color:#f4f7fa; }
h3 { margin:0 0 6px; font-size:11px; text-transform:uppercase; color:#9ba6af; letter-spacing:0.04em; }
p { margin:0 0 10px; color:var(--muted); }
.intro { color:var(--muted); max-width:960px; margin:6px 0 18px; font-size:13px; }
code { font-family:Consolas,monospace; background:#1a1f24; padding:1px 5px; border-radius:3px; color:#cfdde5; font-size:11px; }
.meta-banner { display:flex; align-items:center; gap:10px; padding:10px 14px; background:#11181f; border:1px solid #1f2b35; border-radius:6px; margin-bottom:18px; font-size:12px; color:var(--muted); }
.meta-banner .dot { width:8px; height:8px; border-radius:50%; background:var(--green); box-shadow:0 0 0 3px rgba(125,228,160,0.15); }
.meta-banner b { color:#cfdde5; }
.status-grid { display:grid; grid-template-columns:repeat(3,minmax(0,1fr)); gap:12px; margin-bottom:24px; }
.status-card { background:#13171b; border:1px solid #232a30; border-radius:6px; padding:12px; }
.status-card h3 { color:#dce5ed; text-transform:none; letter-spacing:0; font-size:13px; margin-bottom:8px; }
.status-card.todo { border-left:3px solid var(--yellow); }
.status-card.work { border-left:3px solid var(--blue); }
.status-card.done { border-left:3px solid var(--green); }
.status-card ul { margin:0; padding-left:16px; color:var(--muted); font-size:12px; }
.status-card ul li { margin:2px 0; }
.status-card .num { color:var(--text); font-weight:700; font-size:18px; }
.legend { display:flex; gap:6px; flex-wrap:wrap; margin-bottom:14px; }
.pill { border-radius:12px; padding:3px 9px; font-size:11px; border:1px solid var(--line-2); background:#22272b; color:var(--text); }
.pill.green { color:var(--green); background:#143020; border-color:#245036; }
.pill.blue { color:var(--blue); background:#132837; border-color:#284b66; }
.pill.red { color:var(--red); background:#321618; border-color:#5b2228; }
.pill.yellow { color:var(--yellow); background:#332b16; border-color:#645228; }
.pill.orange { color:var(--orange); background:#3a2818; border-color:#7a4b25; }
.pill.purple { color:var(--purple); background:#241d35; border-color:#453363; }
.pill.muted { color:#9aa4ac; background:#1a1f24; border-color:#2c333a; }
.mock { border:1px solid #2c333a; border-radius:8px; background:var(--shell); overflow:hidden; margin-bottom:14px; }
.mock-head { padding:9px 14px; border-bottom:1px solid var(--line); background:#0f1214; display:flex; align-items:center; justify-content:space-between; }
.mock-head .title { color:#fff; font-weight:600; font-size:13px; }
.mock-head .sub { color:var(--muted); font-size:11px; }
.mock-body { padding:14px 16px; }
button { border:1px solid var(--line-2); border-radius:4px; padding:5px 10px; background:#252a2e; color:var(--text); font:inherit; cursor:default; font-size:11px; }
button.primary { background:#163923; color:#aaf3bf; border-color:#285b3a; }
button.live { background:#143247; color:#9fd9ff; border-color:#2e607f; }
button.danger { background:#3a191d; color:#ffb2b7; border-color:#722c33; }
button.ghost { background:transparent; color:#9aa4ac; border-color:#3a4148; }
button.warn { background:#3a3017; color:#ffd784; border-color:#645228; }
/* Filter chips */
.chip-row { display:flex; gap:6px; flex-wrap:wrap; padding:10px 14px; background:#0d1013; border-bottom:1px solid var(--line); font-size:11px; align-items:center; }
.chip { display:inline-flex; align-items:center; gap:8px; padding:4px 10px; border:1px solid #2a3138; border-radius:12px; background:#1a1f24; color:#9aa4ac; cursor:default; min-width:auto; }
.chip .cnt { font-variant-numeric:tabular-nums; min-width:22px; padding:0 6px; text-align:right; background:rgba(255,255,255,0.05); border-radius:9px; font-size:10px; font-weight:600; color:#888; }
.chip.active { background:#27313a; color:#fff; border-color:#36526a; }
.chip.active .cnt { background:rgba(157,204,255,0.12); color:#9dccff; }
/* outcome-tinted active chips */
.chip.t-cleanup.active { background:#143020; border-color:#245036; color:#9be3b3; }
.chip.t-cleanup.active .cnt { background:rgba(155,227,179,0.12); color:#9be3b3; }
.chip.t-strip.active { background:#332b16; border-color:#645228; color:#ffd784; }
.chip.t-strip.active .cnt { background:rgba(255,215,132,0.12); color:#ffd784; }
.chip.t-conflict.active { background:#321618; border-color:#5b2228; color:#ff9097; }
.chip.t-conflict.active .cnt { background:rgba(255,144,151,0.12); color:#ff9097; }
.chip.t-optional.active { background:#241d35; border-color:#453363; color:#c5a9ff; }
.chip.t-optional.active .cnt { background:rgba(197,169,255,0.12); color:#c5a9ff; }
.chip-row .sep { color:#3a4148; padding:0 2px; }
.chip-row .right { margin-left:auto; display:flex; gap:6px; }
.row { display:grid; grid-template-columns: 22px 1fr 22px 1fr 130px; gap:10px; padding:8px 10px; background:#101418; border:1px solid #1d2429; border-radius:4px; margin-top:5px; align-items:center; font-size:12px; }
.row.conflict { background:#231410; border-color:#43251c; }
.row.skip-default { opacity:0.65; }
.row .box { width:14px; height:14px; border:1px solid #4a5560; border-radius:3px; background:#0a0c0e; position:relative; }
.row .box.checked { background:#163923; border-color:#285b3a; }
.row .box.checked::after { content:"✓"; color:#aaf3bf; font-size:11px; position:absolute; top:-3px; left:1px; }
.row .arrow { color:#5d6772; text-align:center; }
.row .name { font-family:Consolas,monospace; font-size:11px; color:#cdd6dd; overflow:hidden; text-overflow:ellipsis; white-space:nowrap; }
.row .name.old { color:#a8b3bb; }
.row .name.new { color:#9be3b3; }
.row .name.new.conflict { color:var(--red); }
.row .meta { font-size:10px; color:var(--muted); display:flex; flex-direction:column; gap:2px; align-items:flex-end; }
.row .meta .tag { padding:1px 6px; border-radius:8px; font-size:9px; background:#1a1f24; color:#9aa4ac; border:1px solid #2a3138; }
.row .meta .tag.strip { background:#3a2818; color:#ffb072; border-color:#7a4b25; }
.row .meta .tag.transform { background:#132837; color:#9fd9ff; border-color:#2e607f; }
.row .meta .tag.conflict { background:#321618; color:#ff9097; border-color:#5b2228; }
.row .meta .tag.still-bare { background:#332b16; color:#ffd784; border-color:#645228; }
.row .name-stack { display:flex; flex-direction:column; gap:2px; min-width:0; }
.row .name-stack .secondary { font-family:Consolas,monospace; font-size:10px; color:#5d6772; overflow:hidden; text-overflow:ellipsis; white-space:nowrap; }
.reason { font-size:10px; color:#6b757d; margin-top:2px; padding-left:32px; font-family:Consolas,monospace; }
/* Plan summary footer */
.plan-footer { margin-top:18px; padding:12px; background:#0f1518; border:1px solid #1d2a30; border-radius:5px; display:flex; align-items:center; justify-content:space-between; }
.plan-footer .counts { font-size:12px; color:var(--muted); display:flex; gap:14px; }
.plan-footer .counts b { color:#fff; }
/* Decision table */
table.spec { width:100%; border-collapse:collapse; margin-top:10px; font-size:12px; }
table.spec th { text-align:left; padding:8px 10px; background:#181d22; color:#cfdde5; font-weight:600; border-bottom:1px solid #2a3138; font-size:11px; }
table.spec td { padding:8px 10px; border-bottom:1px solid #1c2126; color:var(--text); vertical-align:top; font-size:12px; }
table.spec tr:nth-child(even) td { background:#10141a; }
table.spec td.opt { font-family:Consolas,monospace; color:var(--blue); font-weight:600; }
table.spec td.small { color:var(--muted); font-size:11px; }
/* Option compare cards */
.opt-grid { display:grid; grid-template-columns: repeat(3, minmax(0,1fr)); gap:12px; margin-top:12px; }
.opt-card { background:#13171b; border:1px solid #232a30; border-radius:6px; padding:12px; }
.opt-card h3 { color:#dce5ed; text-transform:none; font-size:13px; letter-spacing:0; margin-bottom:6px; }
.opt-card .verdict { font-size:11px; margin-top:8px; }
.opt-card.rec { border-color:#285b3a; }
.opt-card .verdict b { color:#dce5ed; }
/* Plan modal frame */
.modal-shell { background:#181b1e; border:1px solid var(--line-2); border-radius:6px; box-shadow:0 8px 24px rgba(0,0,0,.55); overflow:hidden; }
.modal-head { display:flex; align-items:center; justify-content:space-between; background:#0f1214; padding:10px 14px; border-bottom:1px solid var(--line); }
.modal-head .title { color:#fff; font-weight:600; font-size:13px; }
.modal-head .sub { color:var(--muted); font-size:11px; margin-top:2px; }
.modal-head .x { color:#7a838c; font-size:14px; cursor:default; }
.modal-toolbar { display:flex; align-items:center; gap:8px; padding:8px 14px; background:#10141a; border-bottom:1px solid var(--line); font-size:11px; color:var(--muted); }
/* Apply progress */
.progress { background:#0a0c0e; border:1px solid var(--line-2); border-radius:4px; padding:10px 12px; }
.progress-bar { height:6px; background:#1a1f24; border-radius:3px; overflow:hidden; margin-top:6px; }
.progress-fill { height:100%; background:linear-gradient(90deg, var(--blue), var(--green)); width:42%; transition:width .3s; }
.progress-meta { display:flex; justify-content:space-between; font-size:11px; color:var(--muted); margin-top:6px; }
.progress-meta b { color:var(--text); }
/* Settings card for ignore-list */
.setting-card { background:#13171b; border:1px solid #232a30; border-radius:6px; padding:12px; margin-top:10px; }
.setting-card label { display:flex; align-items:center; gap:10px; color:var(--text); font-size:12px; }
.setting-card label .sublabel { display:block; color:var(--muted); font-size:11px; margin-left:24px; margin-top:2px; }
@media (max-width: 1000px) {
.opt-grid, .status-grid { grid-template-columns: 1fr; }
}
</style>
</head>
<body>
<main>
<h1>Library Cleanup — preview-first mockup</h1>
<p class="intro">Phase 1 only: deterministic transforms + junk-strip on names that already have resolution data, or have garbage trailing tokens. <b>No ffprobe</b>. No resolution-adding work. Real numbers from the 2026-05-26 Library Issues export.</p>
<div class="meta-banner">
<span class="dot"></span>
<span><b>Scope locked:</b> Phase 1 cleanup only. Phase 2 resolution probing is a separate session. Goal here: 85 cleanup-tier renames + ~21 junk-strips on missing-resolution names. <b>Total in scope: ~106 files.</b> Preview is mandatory before any rclone moveto runs.</span>
</div>
<!-- ============================================== -->
<h2>1 — Volume picture</h2>
<div class="status-grid">
<div class="status-card done">
<h3>Already clean</h3>
<p style="color:var(--muted); font-size:11px;">No work needed.</p>
<ul>
<li><b>0</b> bracket_id</li>
<li><b>0</b> nohyphen_id</li>
</ul>
</div>
<div class="status-card work">
<h3>Cleanup-tier (Phase 1a)</h3>
<p style="color:var(--muted); font-size:11px;">Already have resolution data. Just reshape.</p>
<ul>
<li><span class="num">64</span> <b>resolution_part_suffix</b> <code>RBD-394 [1080p].2of2.wmv</code></li>
<li><span class="num">18</span> <b>resolution_copy_suffix</b> <code>PIYO-005 [1080p] (1).mp4</code></li>
<li><span class="num">3</span> <b>resolution_bare_suffix</b> <code>REAL-487.450p.wmv</code></li>
<li style="margin-top:6px;color:#9be3b3;"><b>85</b> become fully canonical after Phase 1a</li>
</ul>
</div>
<div class="status-card todo">
<h3>Junk-strip (Phase 1b)</h3>
<p style="color:var(--muted); font-size:11px;">Strip leftover tags. Still missing resolution after.</p>
<ul>
<li><span class="num">2</span> empty brackets <code>[]</code></li>
<li><span class="num">5</span> <code>.HD</code> suffix (failed auto-label)</li>
<li><span class="num">6</span> <code>[396m]</code> bracket (bitrate, not resolution)</li>
<li><span class="num">~9</span> <code>_PARTN</code> → optional normalize</li>
<li style="margin-top:6px;color:#ffd784;"><b>~21</b> renamed, still need Phase 2 ffprobe later</li>
</ul>
</div>
</div>
<p style="color:var(--muted); font-size:12px;">The other 775 missing_resolution files (bare names like <code>ROYD-109.mp4</code>) need resolution data we don't have on hand. Out of scope for this cleanup session.</p>
<!-- ============================================== -->
<h2>2 — Preview flow options (P1 / P2 / P3)</h2>
<div class="opt-grid">
<div class="opt-card">
<h3>P1 — Inline rows in existing Library Issues modal</h3>
<p style="font-size:11px; color:var(--muted);">Add Old → New column to existing rows. Same modal. New "Cleanup Plan" filter chip on top of existing All / Found / Missing chips.</p>
<div class="verdict"><b>Trade:</b> no new surface, but modal is busy. 779 missing_resolution rows are already cramped; adding 106 cleanup rows compounds it.</div>
</div>
<div class="opt-card rec">
<h3>P2 — Dedicated Cleanup Plan modal <span class="pill green" style="margin-left:6px;">recommended</span></h3>
<p style="font-size:11px; color:var(--muted);">Click "Generate Cleanup Plan" in Library Review. Opens its own modal with only the ~106 affected files, grouped by transform kind. Per-group select-all, per-row toggle, Apply N button at bottom.</p>
<div class="verdict"><b>Trade:</b> focused UX, but a new modal to maintain. Mockup below uses this.</div>
</div>
<div class="opt-card">
<h3>P3 — External JSON plan + reimport</h3>
<p style="font-size:11px; color:var(--muted);">Export cleanup-plan-{ts}.json. User edits in text editor (deletes lines to skip). Re-import to apply.</p>
<div class="verdict"><b>Trade:</b> full audit trail + offline review, but high friction. Worth offering as a secondary "Export plan" button alongside P2.</div>
</div>
</div>
<!-- ============================================== -->
<h2>3 — Mockup: P2 Cleanup Plan modal</h2>
<p>The mockup below shows the dedicated modal populated with the actual library state. Rows are filterable by transform kind and each has its own checkbox. Conflict rows are unchecked by default and flagged.</p>
<div class="modal-shell">
<div class="modal-head">
<div>
<div class="title">Cleanup Plan</div>
<div class="sub">Phase 1 deterministic transforms + junk-strip. No resolution probing.</div>
</div>
<div class="x">×</div>
</div>
<div class="modal-toolbar">
<span><b style="color:#fff;">106</b> files in plan</span>
<span>·</span>
<span><b style="color:#9be3b3;">85</b> cleanup</span>
<span><b style="color:#ffd784;">21</b> junk-strip</span>
<span><b style="color:#ff9097;">2</b> conflicts</span>
<span style="margin-left:auto;">
<button class="ghost">Export plan (JSON)</button>
</span>
</div>
<!-- Filter chips replace the long stacked group headers -->
<div class="chip-row">
<span class="chip active"><span>All</span><span class="cnt">106</span></span>
<span class="sep">|</span>
<span class="chip t-cleanup"><span>part suffix</span><span class="cnt">64</span></span>
<span class="chip t-cleanup"><span>copy (N)</span><span class="cnt">18</span></span>
<span class="chip t-cleanup"><span>bare res</span><span class="cnt">3</span></span>
<span class="sep">|</span>
<span class="chip t-strip"><span>empty []</span><span class="cnt">2</span></span>
<span class="chip t-strip"><span>strip .HD</span><span class="cnt">5</span></span>
<span class="chip t-strip"><span>strip [Nm]</span><span class="cnt">6</span></span>
<span class="sep">|</span>
<span class="chip t-optional"><span>_PARTN</span><span class="cnt">9</span></span>
<span class="sep">|</span>
<span class="chip t-conflict"><span>conflicts</span><span class="cnt">2</span></span>
<span class="right">
<button class="ghost" style="font-size:10px;">Select all visible</button>
<button class="ghost" style="font-size:10px;">Deselect all visible</button>
</span>
</div>
<div style="padding:12px 16px;">
<div style="font-size:11px; color:var(--muted); margin-bottom:8px;">
Showing <b style="color:#fff;">All</b> · 106 rows (7 displayed below, rest virtualized in real impl). Click a chip to filter. Multi-select is not supported: chips are single-choice, radio-style.
</div>
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">RBD-394 [1080p].2of2.wmv</span>
<span class="secondary">cq:JAV/Q-U/R/RBD/ · part suffix</span>
</div>
<div class="arrow">→</div>
<div class="name new">RBD-394 #part2 [1080p].wmv</div>
<div class="meta">
<span class="tag transform">transform</span>
<span>2.63 GiB</span>
</div>
</div>
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">PIYO-005 [1080p] (1).mp4</span>
<span class="secondary">cq:JAV/... · copy (N) · no conflict</span>
</div>
<div class="arrow">→</div>
<div class="name new">PIYO-005 [1080p].mp4</div>
<div class="meta">
<span class="tag transform">drop (N)</span>
<span>5.84 GiB</span>
</div>
</div>
<div class="row conflict skip-default">
<div class="box"></div>
<div class="name-stack">
<span class="name old">HFD-197 [720p] (1).mp4</span>
<span class="secondary" style="color:#ff9097;">CONFLICT — HFD-197 [720p].mp4 already in cache</span>
</div>
<div class="arrow">→</div>
<div class="name new conflict">HFD-197 [720p].mp4 ✗</div>
<div class="meta">
<span class="tag conflict">conflict</span>
<span>2.91 GiB</span>
</div>
</div>
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">REAL-487.450p.wmv</span>
<span class="secondary">cq:JAV/Q-U/R/REAL/ · bare res</span>
</div>
<div class="arrow">→</div>
<div class="name new">REAL-487 [450p].wmv</div>
<div class="meta">
<span class="tag transform">wrap</span>
<span>2.52 GiB</span>
</div>
</div>
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">TYOD-232 [].wmv</span>
<span class="secondary">empty [] · auto-labeler leftover</span>
</div>
<div class="arrow">→</div>
<div class="name new">TYOD-232.wmv</div>
<div class="meta">
<span class="tag strip">strip</span>
<span class="tag still-bare">still missing res</span>
</div>
</div>
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">MXGS-672 [396m].avi</span>
<span class="secondary">[Nm] interpreted as bitrate, not resolution</span>
</div>
<div class="arrow">→</div>
<div class="name new">MXGS-672.avi</div>
<div class="meta">
<span class="tag strip">strip bracket</span>
<span class="tag still-bare">still missing res</span>
</div>
</div>
<div class="row skip-default">
<div class="box"></div>
<div class="name-stack">
<span class="name old">KV-118 - Aiba Reika_PART1.mp4</span>
<span class="secondary" style="color:#c5a9ff;">_PARTN — OPTIONAL, default unchecked (cosmetic; extract_id already handles)</span>
</div>
<div class="arrow">→</div>
<div class="name new">KV-118 - Aiba Reika #part1.mp4</div>
<div class="meta">
<span class="tag transform">cosmetic</span>
<span>3.40 GiB</span>
</div>
</div>
<div style="text-align:center; color:var(--muted); font-size:11px; padding:10px 0;">… 99 more rows in real plan (virtualized scrolling). Use chips above to narrow down.</div>
</div>
<div class="plan-footer">
<div class="counts">
<span><b>104</b> selected</span>
<span><b style="color:#ff9097;">2</b> skipped (conflicts)</span>
<span>est. <b>~2 min</b> apply time</span>
</div>
<div style="display:flex; gap:8px;">
<button class="ghost">Cancel</button>
<button class="warn">Save plan to disk (no apply)</button>
<button class="primary">Apply 104 renames</button>
</div>
</div>
</div>
<!-- ============================================== -->
<h2>4 — Per-row anatomy</h2>
<div class="mock">
<div class="mock-head">
<div>
<div class="title">One row, annotated</div>
<div class="sub">Five columns: checkbox · old name + folder context · arrow · new name · transform tag + size</div>
</div>
</div>
<div class="mock-body" style="padding:18px 22px;">
<div class="row">
<div class="box checked"></div>
<div class="name-stack">
<span class="name old">RBD-394 [1080p].2of2.wmv</span>
<span class="secondary">cq:JAV/Q-U/R/RBD/</span>
</div>
<div class="arrow">→</div>
<div class="name new">RBD-394 #part2 [1080p].wmv</div>
<div class="meta">
<span class="tag transform">transform</span>
<span>2.63 GiB</span>
</div>
</div>
<div style="margin-top:10px; font-size:11px; color:var(--muted); line-height:1.7;">
<div><b style="color:#fff;">checkbox</b> — default checked unless conflict detected; per-row toggle, per-group select-all</div>
<div><b style="color:#fff;">old name + remote folder</b> — folder context is muted so it stays scannable; full <code>full_path</code> in tooltip</div>
<div><b style="color:#fff;">arrow</b> — separator only, no interactivity</div>
<div><b style="color:#fff;">new name</b> — green text on safe transforms, red on conflicts</div>
<div><b style="color:#fff;">meta</b> — transform-kind tag + file size; flags like <code>conflict</code> or <code>still missing res</code> stack vertically</div>
</div>
</div>
</div>
<!-- ============================================== -->
<h2>5 — Conflict cases</h2>
<p>Cache-based conflict detection runs synchronously when building the plan. Real rclone-side recheck runs at apply time as belt-and-suspenders.</p>
<table class="spec">
<thead>
<tr><th style="width:30%;">Conflict case</th><th>Example</th><th>Plan default</th></tr>
</thead>
<tbody>
<tr>
<td><b>Target exists in cache</b></td>
<td><code>PIYO-005 [1080p] (1).mp4</code> → <code>PIYO-005 [1080p].mp4</code><br><span class="small">stripped form already in cache.json</span></td>
<td><span class="pill red">skip</span> Default-unchecked. Reason text: "Use Duplicate Review to decide which to keep."</td>
</tr>
<tr>
<td><b>Two plan rows target same new name</b></td>
<td>If <code>ABC-001 (1).mp4</code> AND <code>ABC-001 (2).mp4</code> both want <code>ABC-001.mp4</code></td>
<td><span class="pill red">skip both</span> Plan generator detects in-plan collision; flags both rows with "conflict-with-plan-row N."</td>
</tr>
<tr>
<td><b>Target appears at apply time</b></td>
<td>File renamed externally between plan generation and Apply click</td>
<td><span class="pill yellow">apply-time skip</span> The rclone lsf recheck finds the target already present; row reported as <code>skipped: target appeared</code> in summary modal.</td>
</tr>
<tr>
<td><b>rclone moveto error</b></td>
<td>Network glitch, permission, rclone bug</td>
<td><span class="pill yellow">apply-time fail</span> Row marked failed in summary. Other renames continue. User can re-run plan to retry.</td>
</tr>
</tbody>
</table>
<!-- ============================================== -->
<h2>6 — Apply progress + summary</h2>
<p>Bulk apply on ~104 files is roughly 1-2 minutes at typical rclone moveto latency. Progress + cancel needed.</p>
<div class="mock">
<div class="mock-head">
<div>
<div class="title">Applying renames…</div>
<div class="sub">Progress channel mirrors the existing scan-progress pattern</div>
</div>
<button class="danger">Cancel</button>
</div>
<div class="mock-body">
<div class="progress">
<div style="display:flex; justify-content:space-between; font-size:12px;">
<span style="color:#dce5ed;">Renaming <b style="font-family:Consolas,monospace;">NFDM-247 [720p].1of2.wmv</b></span>
<span style="color:var(--muted);">43 / 104</span>
</div>
<div class="progress-bar"><div class="progress-fill"></div></div>
<div class="progress-meta">
<span><b>41</b> succeeded · <b style="color:var(--red);">2</b> conflicts (skipped) · <b style="color:var(--yellow);">0</b> failed</span>
<span>elapsed 22s · est <b>32s</b> remaining</span>
</div>
</div>
<p style="margin-top:10px; font-size:11px; color:var(--muted);">Cancel waits for the current rclone moveto to complete before stopping. Partial application is safe — cache is patched per-rename, batch <code>save_cache</code> still fires at the end of the cancelled run.</p>
</div>
</div>
<h3 style="margin-top:18px;">Result summary modal (after apply)</h3>
<div class="mock" style="margin-top:6px;">
<div class="mock-head">
<div>
<div class="title">Cleanup complete</div>
<div class="sub">104 of 106 renames attempted · 102 succeeded · 2 conflicts auto-skipped</div>
</div>
<div class="x">×</div>
</div>
<div class="mock-body">
<div style="display:grid; grid-template-columns:repeat(4,minmax(0,1fr)); gap:8px; font-size:12px;">
<div style="background:#143020;border:1px solid #245036;border-radius:4px;padding:10px;">
<div style="color:#9be3b3;font-size:18px;font-weight:700;">102</div>
<div style="color:#9be3b3;font-size:11px;">succeeded</div>
</div>
<div style="background:#321618;border:1px solid #5b2228;border-radius:4px;padding:10px;">
<div style="color:#ff9097;font-size:18px;font-weight:700;">2</div>
<div style="color:#ff9097;font-size:11px;">conflicts (in-plan skip)</div>
</div>
<div style="background:#332b16;border:1px solid #645228;border-radius:4px;padding:10px;">
<div style="color:#ffd784;font-size:18px;font-weight:700;">0</div>
<div style="color:#ffd784;font-size:11px;">apply-time failures</div>
</div>
<div style="background:#132837;border:1px solid #2e607f;border-radius:4px;padding:10px;">
<div style="color:#9fd9ff;font-size:18px;font-weight:700;">~21</div>
<div style="color:#9fd9ff;font-size:11px;">still need resolution (Phase 2)</div>
</div>
</div>
<div style="margin-top:14px; display:flex; gap:8px; align-items:center;">
<button class="ghost">Save revert plan (cleanup-revert-{ts}.json)</button>
<button class="ghost">Re-scan Library Issues</button>
<span style="margin-left:auto; font-size:11px; color:var(--muted);">Cache patched + saved · scan re-suggests cleanup pass if any items remain</span>
</div>
</div>
</div>
<!-- ============================================== -->
<h2>7 — Ignore list (optional, per-file)</h2>
<p>After applying, some files might intentionally stay non-canonical. Tracking them prevents Library Issues from re-flagging the same file next scan. Per-file flag in cache, no UI editor needed beyond a "Mark as intentional" checkbox per row in the modal.</p>
<div class="setting-card">
<label>
<input type="checkbox" checked disabled style="cursor:default;">
<span>Persist "ignore" decisions per file</span>
</label>
<div class="sublabel">Adds <code>filename_hygiene_ignore: true</code> to cache entry when row is unchecked + marked Intentional. Library Issues scan skips these files going forward. Cleared on cache rebuild.</div>
</div>
<!-- ============================================== -->
<h2>8 — Decisions to lock before any code</h2>
<table class="spec">
<thead>
<tr><th>Decision</th><th>Options</th><th>Suggested default</th></tr>
</thead>
<tbody>
<tr>
<td><b>Preview flow</b></td>
<td>P1 inline · P2 dedicated modal · P3 JSON-only</td>
<td class="opt">P2 + P3 export as side button</td>
</tr>
<tr>
<td><b>Part-suffix canonical shape</b></td>
<td><code>#part2 [1080p]</code> · <code>[1080p] #part2</code> · <code>.2of2 [1080p]</code> · leave alone</td>
<td class="opt"><code>#part2 [1080p]</code><br><span class="small">resolution at end matches the canonical regex; <code>#partN</code> matches existing extract_id convention</span></td>
</tr>
<tr>
<td><b>_PARTN normalization (9 files)</b></td>
<td>Convert <code>_PART1</code> → <code>#part1</code> · Leave as-is</td>
<td class="opt">Optional group, default deselected<br><span class="small">extract_id already handles both; cosmetic only</span></td>
</tr>
<tr>
<td><b>copy_suffix conflict policy</b></td>
<td>Auto-skip + report · Auto-include + warn · User decides per-row</td>
<td class="opt">Auto-skip + default-uncheck<br><span class="small">prevents clobbering real dupes; user can override</span></td>
</tr>
<tr>
<td><b>Multi-pattern transforms</b></td>
<td>Composite single row · Sequential per pattern</td>
<td class="opt">Composite<br><span class="small">simpler review; reason field lists all applied transforms</span></td>
</tr>
<tr>
<td><b>Revert plan artifact</b></td>
<td>None · Auto-save JSON · Save on user opt-in</td>
<td class="opt">Auto-save to disk on apply<br><span class="small">cheap safety net; user can ignore if not needed</span></td>
</tr>
<tr>
<td><b>Progress UI during apply</b></td>
<td>None · Spinner · Full progress bar + ETA</td>
<td class="opt">Full bar + ETA + cancel<br><span class="small">apply takes 1-2 min; user needs visibility</span></td>
</tr>
<tr>
<td><b>Placement in extension</b></td>
<td>Library Review pane (add button) · New Setup card · Detached window</td>
<td class="opt">Library Review pane<br><span class="small">already where users go for library issues</span></td>
</tr>
<tr>
<td><b>Persistent ignore list</b></td>
<td>None · Per-file flag in cache · Pattern-based regex</td>
<td class="opt">Per-file flag<br><span class="small">simplest; losing the flags on cache rebuild is acceptable</span></td>
</tr>
</tbody>
</table>
<!-- ============================================== -->
<h2>9 — What's NOT in this mockup (scope-fenced)</h2>
<ul style="color:var(--muted); font-size:12px; line-height:1.8; margin:0 0 30px 18px;">
<li><b style="color:#dce5ed;">ffprobe / resolution probing</b> — Phase 2, separate mockup if/when needed</li>
<li><b style="color:#dce5ed;">Bare-name renames</b> (~775 missing_resolution files) — out of scope without resolution data</li>
<li><b style="color:#dce5ed;">Quality-mapping editor</b> (HD → 1080p config) — only 4 files affected; not worth own UI</li>
<li><b style="color:#dce5ed;">Bulk cancellation that aborts mid-rclone-call</b> — would risk corrupt remote state; not supported</li>
<li><b style="color:#dce5ed;">Cross-remote moves</b> — cleanup keeps files in same folder; only rename within remote</li>
<li><b style="color:#dce5ed;">Pattern-rule editor</b> — extends Library Issues custom_rules; future enhancement</li>
</ul>
</main>
</body>
</html>
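The "Revert plan artifact" row above defaults to auto-saving a JSON plan on apply. A minimal sketch of how such a plan could be derived, assuming a hypothetical `build_revert_plan` helper and result records shaped like `{remote, old_path, new_path, ok}`. The field names and the JSON layout are illustrative assumptions, not the extension's actual format.

```python
import json


def build_revert_plan(applied: list[dict]) -> list[dict]:
    """Invert successful renames, newest first, so a revert undoes them in reverse order."""
    plan = []
    for item in reversed(applied):
        if not item.get("ok"):
            continue  # failed or skipped renames left the remote untouched
        plan.append({
            "remote": item["remote"],
            "old_path": item["new_path"],  # the file now lives at the new name
            "new_path": item["old_path"],  # and should move back to the old one
        })
    return plan


def revert_plan_json(plan: list[dict]) -> str:
    """Serialize the plan; writing it to disk is left to the caller."""
    return json.dumps(plan, indent=2, ensure_ascii=False)


# Hypothetical apply results: one rename succeeded, one hit a conflict.
applied = [
    {"remote": "example-remote:", "old_path": "ABC-123_PART1.mp4",
     "new_path": "ABC-123 #part1.mp4", "ok": True},
    {"remote": "example-remote:", "old_path": "ABC-456 (1).mp4",
     "new_path": "ABC-456.mp4", "ok": False},
]
plan = build_revert_plan(applied)
```

Only renames that actually happened end up in the plan, which is why a skipped `copy_suffix` conflict would not need a revert entry.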
+13 -1750
File diff suppressed because it is too large
+71
@@ -6,6 +6,72 @@ find at the top level. Adding a new submodule does not change the
public surface — only this file does.
"""
from rcjav.model import FileEntry # noqa: F401
from rcjav.output import ( # noqa: F401
USE_ANSI,
ANSI_RESET,
ANSI_GREEN,
ANSI_RED,
ANSI_YELLOW,
ANSI_CYAN,
ANSI_DIM,
ANSI_BOLD,
set_use_ansi,
set_basic,
ansi,
console,
set_console_no_color,
strip_markup,
human_size,
BasicProgress,
make_progress,
render_banner,
render_search,
render_name_matches,
render_name_matches_plain,
render_dupes,
render_banner_plain,
render_search_plain,
render_dupes_plain,
write_txt,
write_csv,
describe_skipped_id,
dupes_to_obj,
write_json,
)
from rcjav.library import ( # noqa: F401
classify_filename_hygiene,
find_library_issues,
find_missing_resolution,
find_resolution_noncanonical,
rename_file_in_remote,
rename_files_batch,
)
from rcjav.rclone_io import ( # noqa: F401
RCLONE_BIN,
CANCEL_FLAG,
CANCEL_CHECK_INTERVAL,
DURATION_RE,
set_basic,
set_rclone_bin,
quick_search_remote,
choose_search_mode,
name_to_include_patterns,
name_match,
query_to_include_patterns,
remote_file_count,
parse_duration,
walk_remote,
)
from rcjav.catalog import ( # noqa: F401
CATALOG_COL_NAME,
CATALOG_COL_PATH,
CATALOG_COL_SIZE,
CATALOG_COL_DISC,
normalize_catalog_path,
load_catalog_csv,
load_catalog_xml,
load_catalogs,
)
from rcjav.dupes import ( # noqa: F401
DEFAULT_KEEP_RANKING,
set_keep_ranking,
@@ -19,11 +85,15 @@ from rcjav.dupes import ( # noqa: F401
from rcjav.cache import ( # noqa: F401
CACHE_PATH,
CACHE_VERSION,
CACHE_SCHEMA_VERSION,
ID_RULES_VERSION,
CACHE_STALE_HOURS,
load_cache,
save_cache,
cache_age_hours,
fmt_age,
cache_state,
stamp_current_rules,
)
from rcjav.ids import ( # noqa: F401
PRIMARY_ID_RE,
@@ -40,4 +110,5 @@ from rcjav.ids import ( # noqa: F401
normalize_id,
describe_id_match,
expand_range,
current_rules_signature,
)
+108 -20
@@ -2,10 +2,20 @@
This module owns the on-disk cache contract: where the file lives,
what the header looks like, and how mismatches are handled. The
current shape predates the two-tier `cache_schema` + `id_rules` split
documented in docs/CACHE_CONTRACT.md (extension repo) — step 10j
implements that contract; until then this is the legacy
`version: 3` reader.
contract is the two-tier `cache_schema` + `id_rules` model from
docs/CACHE_CONTRACT.md (extension repo).
cache_schema on-disk shape. Mismatch -> force rebuild.
id_rules integer; bumps when extraction rules change.
Mismatch -> mark stale, allow lazy re-extract.
id_rules_signature sha256 over canonical rule text (see
rcjav.ids.current_rules_signature). Belt-and-
braces drift check that catches a forgotten
`id_rules` bump.
Legacy users on `version: 3` get an in-place header upgrade with no
forced rescan; the cache is marked as `id_rules: 0` so it shows up
as "stale by rules" until they Re-extract IDs.
"""
from __future__ import annotations
@@ -19,29 +29,107 @@ from pathlib import Path
# Lives next to rc-jav.py at the repo root.
CACHE_PATH = Path(__file__).resolve().parents[1] / "cache.json"
CACHE_VERSION = 3 # bumped: extract_id handles bracket-wrapped IDs + no-hyphen fallback
CACHE_STALE_HOURS = 24
# Two-tier version contract (see docs/CACHE_CONTRACT.md):
CACHE_SCHEMA_VERSION = 1 # on-disk shape; bump = force rebuild
ID_RULES_VERSION = 1 # extraction rules; bump = mark stale (lazy re-extract)
# Legacy alias preserved for any external caller that still imports it.
# Maps to CACHE_SCHEMA_VERSION + ID_RULES_VERSION under the new contract.
CACHE_VERSION = 3
def _fresh_cache(signature: str = "unknown") -> dict:
return {
"cache_schema": CACHE_SCHEMA_VERSION,
"id_rules": ID_RULES_VERSION,
"id_rules_signature": signature,
"remotes": {},
}
def _migrate_legacy_v3(data: dict) -> dict:
"""Translate a legacy `version: 3` cache to the new header in place.
Sets `id_rules: 0` so the cache reads as "stale by rules" — user
sees the new amber state and can opt into a fast Re-extract without
a rclone re-scan.
"""
return {
"cache_schema": CACHE_SCHEMA_VERSION,
"id_rules": 0,
"id_rules_signature": "legacy",
"remotes": data.get("remotes", {}),
}
def load_cache(current_signature: str | None = None) -> dict:
"""Read and (if necessary) migrate cache.json.
`current_signature` is the value of `rcjav.ids.current_rules_signature()`
captured by the caller. It's only stamped into the header when this
function has to mint a *fresh* cache; when migrating legacy data we
deliberately stamp `"legacy"` so the cache reads as stale-by-rules.
"""
fresh_sig = current_signature or "unknown"
def load_cache() -> dict:
if not CACHE_PATH.exists():
return {"version": CACHE_VERSION, "remotes": {}}
return _fresh_cache(fresh_sig)
try:
data = json.loads(CACHE_PATH.read_text(encoding="utf-8"))
if (
not isinstance(data, dict)
or data.get("version") != CACHE_VERSION
or not isinstance(data.get("remotes"), dict)
):
if isinstance(data, dict) and "version" in data and data["version"] != CACHE_VERSION:
sys.stderr.write(
f"[warn] cache version mismatch (got {data['version']}, "
f"expected {CACHE_VERSION}); forcing full rescan.\n"
)
return {"version": CACHE_VERSION, "remotes": {}}
return data
except (json.JSONDecodeError, OSError):
return {"version": CACHE_VERSION, "remotes": {}}
return _fresh_cache(fresh_sig)
if not isinstance(data, dict) or not isinstance(data.get("remotes"), dict):
return _fresh_cache(fresh_sig)
# Legacy header: { "version": 3, "remotes": {...} } — migrate in place.
if "version" in data and "cache_schema" not in data:
if data.get("version") == 3:
return _migrate_legacy_v3(data)
sys.stderr.write(
f"[warn] unknown legacy cache version {data.get('version')!r}; "
f"rebuilding.\n"
)
return _fresh_cache(fresh_sig)
# New header: validate schema. Mismatch = force rebuild (per contract).
if data.get("cache_schema") != CACHE_SCHEMA_VERSION:
sys.stderr.write(
f"[warn] cache_schema mismatch (got {data.get('cache_schema')!r}, "
f"expected {CACHE_SCHEMA_VERSION}); forcing full rescan.\n"
)
return _fresh_cache(fresh_sig)
return data
def cache_state(cache: dict, current_signature: str) -> str:
"""Classify a cache dict against the live rule set.
Returns one of: "fresh", "stale_by_rules", "schema_mismatch".
"schema_mismatch" should normally never reach the caller — load_cache
already rebuilds. It's reported for diagnostics flows that read
cache.json directly without going through load_cache.
"""
if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
return "schema_mismatch"
rules_match = cache.get("id_rules") == ID_RULES_VERSION
sig_match = cache.get("id_rules_signature") == current_signature
return "fresh" if (rules_match and sig_match) else "stale_by_rules"
def stamp_current_rules(cache: dict, current_signature: str) -> None:
"""Stamp `id_rules` and `id_rules_signature` to current values in place.
Use after a successful re-extract or full scan completes against the
live rule set.
"""
cache["id_rules"] = ID_RULES_VERSION
cache["id_rules_signature"] = current_signature
def save_cache(cache: dict) -> None:
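The two-tier contract in this hunk can be exercised in isolation. The sketch below re-declares the constants and copies the logic of `_migrate_legacy_v3`, `cache_state`, and `stamp_current_rules` into a standalone script so it runs without the `rcjav` package. The digest value and the remote name are placeholders.

```python
CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def migrate_legacy_v3(data: dict) -> dict:
    """Rewrite a `version: 3` header; id_rules=0 forces the stale-by-rules state."""
    return {
        "cache_schema": CACHE_SCHEMA_VERSION,
        "id_rules": 0,
        "id_rules_signature": "legacy",
        "remotes": data.get("remotes", {}),
    }


def cache_state(cache: dict, current_signature: str) -> str:
    if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"
    rules_match = cache.get("id_rules") == ID_RULES_VERSION
    sig_match = cache.get("id_rules_signature") == current_signature
    return "fresh" if (rules_match and sig_match) else "stale_by_rules"


def stamp_current_rules(cache: dict, current_signature: str) -> None:
    cache["id_rules"] = ID_RULES_VERSION
    cache["id_rules_signature"] = current_signature


# Placeholder digest and remote name, for illustration only.
signature = "sha256:" + "0" * 64
legacy = {"version": 3, "remotes": {"example-remote:": {"files": []}}}

cache = migrate_legacy_v3(legacy)
state_after_migration = cache_state(cache, signature)

stamp_current_rules(cache, signature)  # what a completed re-extract would do
state_after_reextract = cache_state(cache, signature)

print(state_after_migration, state_after_reextract)
```

The migrated cache keeps its `remotes` payload untouched, so the only cost of the upgrade is the re-extract pass, not an rclone rescan.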
+178
@@ -0,0 +1,178 @@
"""WinCatalog ingest — CSV and XML.
Catalog entries are offline references (e.g. an exported disc index
from WinCatalog). They show up in dupe output but never participate
in keep ranking — rcjav.dupes filters Catalog-sourced entries out
before choosing a winner.
Warnings are written to stderr without rich markup; the calling
module owns terminal styling.
"""
from __future__ import annotations
import csv
import re
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from rcjav.ids import extract_id
from rcjav.model import FileEntry
CATALOG_COL_NAME = ("name", "file name", "filename", "title")
CATALOG_COL_PATH = ("path", "full path", "location", "folder")
CATALOG_COL_SIZE = ("size", "file size", "bytes", "size (bytes)")
CATALOG_COL_DISC = ("disc", "disc name", "disc label", "volume", "source", "catalog", "media")
def _warn(msg: str) -> None:
sys.stderr.write(f"WARN: {msg}\n")
def _pick_col(headers_lower: list[str], synonyms: tuple[str, ...]) -> str | None:
for s in synonyms:
if s in headers_lower:
return s
return None
def normalize_catalog_path(path: str) -> str:
"""Keep catalog paths display-compatible with rclone-style path consumers."""
p = (path or "").replace("\\", "/")
if p.startswith("//"):
return "//" + re.sub(r"/+", "/", p[2:])
return re.sub(r"/+", "/", p)
def load_catalog_csv(path: Path, skipped: list[tuple[str, str]]) -> list[FileEntry]:
"""Load a WinCatalog CSV export. Lenient about column names."""
entries: list[FileEntry] = []
with path.open("r", encoding="utf-8-sig", newline="") as f:
sample = f.read(4096)
f.seek(0)
try:
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
except csv.Error:
dialect = csv.excel
reader = csv.DictReader(f, dialect=dialect)
if not reader.fieldnames:
return entries
headers: dict[str, str] = {}
for h in reader.fieldnames:
hl = h.lower()
if hl not in headers:
headers[hl] = h
col_name = _pick_col(list(headers), CATALOG_COL_NAME)
col_path = _pick_col(list(headers), CATALOG_COL_PATH)
col_size = _pick_col(list(headers), CATALOG_COL_SIZE)
col_disc = _pick_col(list(headers), CATALOG_COL_DISC)
if not col_name and not col_path:
_warn(f"catalog CSV {path} has no Name/Path columns; skipping.")
return entries
for row in reader:
name = (row.get(headers[col_name]) if col_name else "") or ""
full_path = (row.get(headers[col_path]) if col_path else "") or ""
if not name and full_path:
name = Path(full_path).name
full_path = normalize_catalog_path(full_path)
if not name:
continue
jav_id = extract_id(name)
if not jav_id:
skipped.append((f"catalog:{path.name}", full_path or name))
continue
try:
size = int(row.get(headers[col_size], 0)) if col_size else 0
except (ValueError, TypeError):
size = 0
disc = (row.get(headers[col_disc]) if col_disc else "") or ""
# Encode disc label into "remote" so it surfaces in output.
remote_label = f"catalog:{disc}" if disc else f"catalog:{path.name}"
entries.append(FileEntry(
source="Catalog", remote=remote_label,
path=full_path or name, size=size, mod_time="",
jav_id=jav_id,
))
return entries
def _strip_xml_ns(tag: str) -> str:
"""Remove Clark-notation namespace {uri}local -> local."""
return tag.split("}")[-1] if "}" in tag else tag
def load_catalog_xml(path: Path, skipped: list[tuple[str, str]]) -> list[FileEntry]:
"""Load a WinCatalog XML export. Walks for any element with file-like attrs."""
entries: list[FileEntry] = []
tree = ET.parse(str(path))
root = tree.getroot()
def walk(node, disc_label: str, parent_path: str, _depth: int = 0):
if _depth > 500:
return
tag = _strip_xml_ns(node.tag).lower()
if tag in ("disc", "catalog", "source", "volume", "media"):
disc_label = node.get("name") or node.get("Name") or disc_label
if tag in ("file", "f"):
name = node.get("name") or node.get("Name") or node.findtext("Name") or ""
size_raw = node.get("size") or node.get("Size") or node.findtext("Size") or "0"
try:
size = int(size_raw)
except ValueError:
size = 0
full_path = normalize_catalog_path(f"{parent_path}/{name}" if parent_path else name)
jav_id = extract_id(name)
if jav_id:
entries.append(FileEntry(
source="Catalog",
remote=f"catalog:{disc_label}" if disc_label else f"catalog:{path.name}",
path=full_path, size=size, mod_time="", jav_id=jav_id,
))
else:
skipped.append((f"catalog:{disc_label or path.name}", full_path))
return
if tag in ("folder", "dir", "directory"):
folder_name = node.get("name") or node.get("Name") or ""
parent_path = normalize_catalog_path(f"{parent_path}/{folder_name}" if parent_path else folder_name)
for child in node:
walk(child, disc_label, parent_path, _depth + 1)
walk(root, "", "")
return entries
def _expand_catalog_paths(paths: list[str], default_paths: list[str] | None = None) -> list[Path]:
"""Expand any directories to their *.csv / *.xml children. Files passed through.
`default_paths` is the configured DEFAULT_CATALOG list; missing paths inside
that set are silently skipped (it's normal to not have a catalog dir).
Missing paths outside the default set produce a warning.
"""
defaults = {Path(d).resolve() for d in (default_paths or [])}
out: list[Path] = []
for p in paths:
cp = Path(p)
if cp.is_dir():
for child in sorted(cp.iterdir()):
if child.suffix.lower() in (".csv", ".xml") and child.is_file():
out.append(child)
elif cp.exists():
out.append(cp)
elif Path(p).resolve() not in defaults:
_warn(f"catalog path not found: {p}")
return out
def load_catalogs(paths: list[str], skipped: list[tuple[str, str]],
default_paths: list[str] | None = None) -> list[FileEntry]:
out: list[FileEntry] = []
for cp in _expand_catalog_paths(paths, default_paths=default_paths):
ext = cp.suffix.lower()
if ext == ".csv":
out.extend(load_catalog_csv(cp, skipped))
elif ext == ".xml":
out.extend(load_catalog_xml(cp, skipped))
else:
_warn(f"unknown catalog format '{ext}' for {cp}; skipping.")
return out
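Two small pieces of the catalog loader are easy to check on their own: the lenient header lookup and the path normalization. The sketch below copies `_pick_col` and `normalize_catalog_path` as written in the diff and runs them against a hypothetical export header row and two made-up Windows paths.

```python
import re

CATALOG_COL_NAME = ("name", "file name", "filename", "title")
CATALOG_COL_SIZE = ("size", "file size", "bytes", "size (bytes)")


def pick_col(headers_lower, synonyms):
    """Return the first synonym that appears verbatim among the lowercased headers."""
    for s in synonyms:
        if s in headers_lower:
            return s
    return None


def normalize_catalog_path(path):
    """Forward slashes, duplicate separators collapsed, leading UNC `//` preserved."""
    p = (path or "").replace("\\", "/")
    if p.startswith("//"):
        return "//" + re.sub(r"/+", "/", p[2:])
    return re.sub(r"/+", "/", p)


# Hypothetical header row from an export; first spelling of each header wins.
fieldnames = ["File Name", "Full Path", "Size (Bytes)"]
headers = {}
for h in fieldnames:
    headers.setdefault(h.lower(), h)

name_col = pick_col(list(headers), CATALOG_COL_NAME)
size_col = pick_col(list(headers), CATALOG_COL_SIZE)

local_path = normalize_catalog_path(r"D:\Movies\\ABC-123.mp4")
unc_path = normalize_catalog_path(r"\\nas\share\\ABC-123.mp4")
```

Because the lookup is by exact lowercased header, a column titled `Full Path` does not satisfy the `name` synonym; it would only be picked up through the separate path synonym list.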
+1011
File diff suppressed because it is too large
+32
@@ -231,6 +231,38 @@ def describe_id_match(display_query: str, matched_query: str, matched_id: str,
}
def current_rules_signature() -> str:
"""Sha256 over the canonical text of every rule that influences a jav_id.
Includes built-in regex sources, BUILTIN_PART_RES sources, and PART_RES
(which captures user-added part patterns applied by
`configure_part_patterns`). Output prefixed with `sha256:` so callers can
sniff the algorithm without re-deriving it.
Stable across invocations: dict is dumped with sort_keys=True. Bumping a
regex changes the digest; reordering BUILTIN_PART_RES also changes it
(order is part of the contract because part-detection short-circuits).
"""
import hashlib
import json as _json
data = {
"schema": 1, # bump when this signature schema itself changes
"primary": PRIMARY_ID_RE.pattern,
"compound": COMPOUND_ID_RE.pattern,
"fallback": FALLBACK_ID_RE.pattern,
"nohyphen": _NOHYPHEN_ID_RE.pattern,
"bracket": _BRACKET_ID_RE.pattern,
"variant": _VARIANT_SUFFIX_RE.pattern,
"xofy": _XOFY_PRIORITY_RE.pattern,
"resolution_tag": _RESOLUTION_TAG_RE.pattern,
"builtin_part_res": [r.pattern for r in BUILTIN_PART_RES],
"part_res": [r.pattern for r in PART_RES],
"fc2_handling": "fc2_to_ppv",
}
text = _json.dumps(data, sort_keys=True, ensure_ascii=False)
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
def expand_range(raw: str) -> list[str] | None:
"""Expand a bracket range like 'IPZZ-[820-860]' into individual ID strings.
Returns None if no range marker present."""
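The stability claims in the `current_rules_signature` docstring can be checked with a reduced rule set. This is a sketch using the same `json.dumps(..., sort_keys=True)` plus SHA-256 approach; the patterns are placeholders, not the project's real rules, and `rules_signature` is a hypothetical stand-in for the real function.

```python
import hashlib
import json
import re


def rules_signature(rules: dict) -> str:
    text = json.dumps(rules, sort_keys=True, ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


# Placeholder patterns, for illustration only.
part_a = re.compile(r"part(\d+)", re.IGNORECASE)
part_b = re.compile(r"(\d+)of(\d+)", re.IGNORECASE)

base = {
    "schema": 1,
    "primary": r"[A-Z]+-\d+",
    "part_res": [part_a.pattern, part_b.pattern],
}
# Same rules, different dict insertion order: sort_keys makes this irrelevant.
same_rules_other_key_order = {
    "part_res": [part_a.pattern, part_b.pattern],
    "primary": r"[A-Z]+-\d+",
    "schema": 1,
}
# Same patterns, different list order: lists are order-sensitive, so the digest moves.
parts_swapped = dict(base, part_res=[part_b.pattern, part_a.pattern])

sig_base = rules_signature(base)
sig_key_order = rules_signature(same_rules_other_key_order)
sig_swapped = rules_signature(parts_swapped)

# `.pattern` carries only the pattern text, not the compile flags.
flags_only_change_is_invisible = re.compile(r"part(\d+)").pattern == part_a.pattern
```

One consequence worth keeping in mind: since only `.pattern` is hashed, a change that touches compile flags alone (for example dropping `re.IGNORECASE`) would not move the digest, so such a change would still rely on a manual `id_rules` bump.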
+363
@@ -0,0 +1,363 @@
"""Library-issue detection (non-canonical filenames) + safe renaming.
Scans the cache (not the live remote) for files whose names violate
the canonical `{ID}[ - actress][ [resolution]].ext` shape:
- Bracket-wrapped IDs: `[REAL-779].mp4` -> `REAL-779.mp4`
- No-hyphen IDs: `MVSD312 [576p].avi` -> `MVSD-312 [576p].avi`
- Missing or non-canonical resolution tags (reported only; no rename is proposed)
`rename_file_in_remote` performs the rclone moveto and patches the
cache in place. `rename_files_batch` writes the cache once after a
batch of renames.
"""
from __future__ import annotations
import re
import subprocess
from collections import Counter
from pathlib import Path
from rcjav.cache import save_cache
from rcjav.ids import (
_BRACKET_ID_RE,
_NOHYPHEN_ID_RE,
COMPOUND_ID_RE,
FALLBACK_ID_RE,
PRIMARY_ID_RE,
extract_id,
)
from rcjav.output import human_size as _human_size
VIDEO_EXTS = {".avi", ".m4v", ".mkv", ".mov", ".mp4", ".mpeg", ".mpg", ".ts", ".webm", ".wmv"}
CANONICAL_RESOLUTION_RE = re.compile(r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\]$", re.IGNORECASE)
RESOLUTION_COPY_SUFFIX_RE = re.compile(r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\]\s*\((?P<copy>\d+)\)$", re.IGNORECASE)
RESOLUTION_PART_SUFFIX_RE = re.compile(
r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\][._ -]*(?P<part>\d+of\d+|part\d+|pt\d+)[.\s]*$",
re.IGNORECASE,
)
BARE_RESOLUTION_SUFFIX_RE = re.compile(r"(?:^|[._ -])(?P<resolution>\d{3,4}[pi]|4k|8k)$", re.IGNORECASE)
EMPTY_BRACKETS_RE = re.compile(r"\[\s*\]$")
BRACKET_TOKEN_SUFFIX_RE = re.compile(r"\[(?P<token>[^\]]+)\]$")
HD_QUALITY_SUFFIX_RE = re.compile(r"(?:^|[._ -])(?P<quality>hd|fhd|uhd|sd|fullhd)$", re.IGNORECASE)
MULTIPART_SUFFIX_RE = re.compile(r"(?:[._ -])(?P<part>\d+of\d+|part\d+|pt\d+|cd\d+|disc\d+|[ab])$", re.IGNORECASE)
def _issue(kind: str, *, source: str = "builtin", severity: str = "info", **extra) -> dict:
return {"kind": kind, "source": source, "severity": severity, **extra}
def _compile_custom_filename_rules(config: dict | None) -> list[dict]:
rules = ((config or {}).get("filename_hygiene") or {}).get("custom_rules") or []
compiled = []
for i, rule in enumerate(rules):
if not isinstance(rule, dict) or rule.get("enabled", True) is False:
continue
pattern = rule.get("pattern") or rule.get("match")
kind = rule.get("kind") or rule.get("name") or f"custom_rule_{i + 1}"
if not pattern:
continue
try:
compiled.append({
"name": rule.get("name") or kind,
"kind": kind,
"severity": rule.get("severity") or "info",
"target": rule.get("target") or "filename",
"regex": re.compile(pattern, re.IGNORECASE if rule.get("ignore_case", True) else 0),
})
except re.error:
continue
return compiled
def classify_filename_hygiene(filename: str, config: dict | None = None) -> dict:
"""Classify filename hygiene without proposing destructive changes."""
stem = Path(filename).stem
issues: list[dict] = []
has_resolution = False
resolution_style = "missing"
if m := CANONICAL_RESOLUTION_RE.search(stem):
has_resolution = True
resolution_style = "canonical"
issues.append(_issue("resolution_canonical", resolution=m.group("resolution").lower()))
elif m := RESOLUTION_COPY_SUFFIX_RE.search(stem):
has_resolution = True
resolution_style = "noncanonical"
issues.append(_issue(
"resolution_copy_suffix",
severity="cleanup",
resolution=m.group("resolution").lower(),
copy=m.group("copy"),
))
elif m := RESOLUTION_PART_SUFFIX_RE.search(stem):
has_resolution = True
resolution_style = "noncanonical"
issues.append(_issue(
"resolution_part_suffix",
severity="cleanup",
resolution=m.group("resolution").lower(),
part=m.group("part"),
))
elif m := BARE_RESOLUTION_SUFFIX_RE.search(stem):
has_resolution = True
resolution_style = "noncanonical"
issues.append(_issue(
"resolution_bare_suffix",
severity="cleanup",
resolution=m.group("resolution").lower(),
))
if not has_resolution:
issues.append(_issue("missing_resolution", severity="needs_probe"))
if EMPTY_BRACKETS_RE.search(stem):
issues.append(_issue("resolution_placeholder_empty", severity="needs_probe", token="[]"))
elif m := HD_QUALITY_SUFFIX_RE.search(stem):
issues.append(_issue("quality_marker_not_resolution", severity="needs_probe", token=m.group("quality")))
elif m := BRACKET_TOKEN_SUFFIX_RE.search(stem):
issues.append(_issue("suspicious_bracket_token", severity="needs_probe", token=m.group("token")))
if m := MULTIPART_SUFFIX_RE.search(stem):
issues.append(_issue("multipart_without_resolution", severity="needs_probe", part=m.group("part")))
for rule in _compile_custom_filename_rules(config):
target = rule["target"]
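# Only a bare filename reaches this function, so the "path" and "filename"
# targets both match against it; "stem" drops the extension first.
value = stem if target == "stem" else filename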
match = rule["regex"].search(value)
if match:
issues.append(_issue(
rule["kind"],
source="custom",
severity=rule["severity"],
name=rule["name"],
matched=match.group(0),
))
return {
"has_resolution": has_resolution,
"resolution_style": resolution_style,
"issues": issues,
}
def _bracket_to_canonical(filename: str) -> str:
"""[REAL-779].mp4 -> REAL-779.mp4 | [HODV-21076] Saki [1080p].mkv -> HODV-21076 Saki [1080p].mkv"""
stem = Path(filename).stem
suffix = Path(filename).suffix
bm = _BRACKET_ID_RE.match(stem)
if not bm:
return filename
inner = bm.group(1).strip()
rest = stem[bm.end():].strip()
new_stem = f"{inner} {rest}".strip() if rest else inner
return f"{new_stem}{suffix}"
def _nohyphen_to_canonical(filename: str) -> str:
"""MVSD312 [576p].avi -> MVSD-312 [576p].avi"""
stem = Path(filename).stem
suffix = Path(filename).suffix
m = _NOHYPHEN_ID_RE.match(stem)
if not m:
return filename
prefix = m.group(1).upper()
num_str = m.group(2)
rest = stem[m.end():]
return f"{prefix}-{num_str}{rest}{suffix}"
def _cache_entry(remote: str, f: dict, issue: str, **extra) -> dict:
path = f.get("path", "")
filename = Path(path).name
ext = Path(filename).suffix.lower()
sep = "" if remote.endswith("/") or not path else "/"
return {
"remote": remote,
"path": path,
"full_path": f"{remote}{sep}{path}",
"filename": filename,
"extension": ext,
"size": f.get("size", 0),
"size_human": _human_size(f.get("size", 0)),
"mod_time": f.get("mod_time", ""),
"jav_id": f.get("jav_id", ""),
"issue": issue,
**extra,
}
def find_missing_resolution(cache: dict, config: dict | None = None) -> dict:
"""Return cached video files missing a final bracketed [resolution] tag."""
items: list[dict] = []
by_extension: Counter[str] = Counter()
by_remote: Counter[str] = Counter()
for remote, remote_data in cache.get("remotes", {}).items():
for f in remote_data.get("files", []):
fname = Path(f.get("path", "")).name
ext = Path(fname).suffix.lower()
if ext not in VIDEO_EXTS:
continue
classification = classify_filename_hygiene(fname, config)
if classification["has_resolution"]:
continue
entry = _cache_entry(remote, f, "missing_resolution", **classification)
items.append(entry)
by_extension[ext] += 1
by_remote[remote] += 1
return {
"issue": "missing_resolution",
"source": "cache",
"count": len(items),
"by_extension": dict(sorted(by_extension.items())),
"by_remote": dict(sorted(by_remote.items())),
"items": items,
}
def find_resolution_noncanonical(cache: dict, config: dict | None = None) -> dict:
"""Return cached video files with resolution present but not in final [resolution] form."""
items: list[dict] = []
by_kind: Counter[str] = Counter()
by_extension: Counter[str] = Counter()
for remote, remote_data in cache.get("remotes", {}).items():
for f in remote_data.get("files", []):
fname = Path(f.get("path", "")).name
ext = Path(fname).suffix.lower()
if ext not in VIDEO_EXTS:
continue
classification = classify_filename_hygiene(fname, config)
if classification["resolution_style"] != "noncanonical":
continue
entry = _cache_entry(remote, f, "resolution_noncanonical", **classification)
items.append(entry)
by_extension[ext] += 1
for issue in classification["issues"]:
by_kind[issue["kind"]] += 1
return {
"issue": "resolution_noncanonical",
"source": "cache",
"count": len(items),
"by_kind": dict(sorted(by_kind.items())),
"by_extension": dict(sorted(by_extension.items())),
"items": items,
}
def find_library_issues(cache: dict, config: dict | None = None) -> dict:
"""Scan cache for files with non-canonical names.
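Returns a dict with these keys:
bracket_names, nohyphen_names: entries carrying a proposed canonical_name
missing_resolution, resolution_noncanonical: entries carrying the hygiene classification
missing_resolution_summary, resolution_noncanonical_summary: count plus per-extension, per-remote, or per-kind tallies
Each entry is built by _cache_entry: {remote, path, full_path, filename, extension,
size, size_human, mod_time, jav_id, issue, ...}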
"""
bracket: list[dict] = []
nohyphen: list[dict] = []
for remote, remote_data in cache.get("remotes", {}).items():
for f in remote_data.get("files", []):
fname = Path(f["path"]).name
stem = Path(fname).stem
if stem.startswith("[") and _BRACKET_ID_RE.match(stem):
bracket.append(_cache_entry(
remote, f, "bracket_id",
canonical_name=_bracket_to_canonical(fname),
))
elif (not PRIMARY_ID_RE.match(stem)
and not COMPOUND_ID_RE.match(stem)
and not FALLBACK_ID_RE.match(stem)
and _NOHYPHEN_ID_RE.match(stem)):
nohyphen.append(_cache_entry(
remote, f, "nohyphen_id",
canonical_name=_nohyphen_to_canonical(fname),
))
missing_resolution = find_missing_resolution(cache, config)
resolution_noncanonical = find_resolution_noncanonical(cache, config)
return {
"bracket_names": bracket,
"nohyphen_names": nohyphen,
"missing_resolution": missing_resolution["items"],
"missing_resolution_summary": {
"count": missing_resolution["count"],
"by_extension": missing_resolution["by_extension"],
"by_remote": missing_resolution["by_remote"],
},
"resolution_noncanonical": resolution_noncanonical["items"],
"resolution_noncanonical_summary": {
"count": resolution_noncanonical["count"],
"by_kind": resolution_noncanonical["by_kind"],
"by_extension": resolution_noncanonical["by_extension"],
},
}
def rename_file_in_remote(
remote: str,
old_rel_path: str,
new_rel_path: str,
cache: dict,
rclone_bin: str = "rclone",
save: bool = True,
) -> dict:
"""Rename one file via rclone moveto and patch cache.json.
Returns {"ok": True, "old_path": ..., "new_path": ...}
or {"ok": False, "error": ..., "conflict": bool}
Pass save=False when batching — caller is responsible for calling save_cache() once.
"""
sep = "" if remote.endswith("/") else "/"
old_full = f"{remote}{sep}{old_rel_path}"
new_full = f"{remote}{sep}{new_rel_path}"
check = subprocess.run(
[rclone_bin, "lsf", new_full],
capture_output=True, text=True,
)
if check.returncode == 0 and check.stdout.strip():
return {"ok": False, "error": f"Target already exists: {new_full}", "conflict": True}
result = subprocess.run(
[rclone_bin, "moveto", old_full, new_full],
capture_output=True, text=True,
)
if result.returncode != 0:
return {"ok": False, "error": (result.stderr or result.stdout).strip(), "conflict": False}
remote_data = cache.get("remotes", {}).get(remote)
if remote_data:
for f in remote_data.get("files", []):
if f["path"] == old_rel_path:
f["path"] = new_rel_path
f["jav_id"] = extract_id(Path(new_rel_path).name) or f.get("jav_id", "")
break
remote_data["skipped"] = [s for s in remote_data.get("skipped", []) if s != old_rel_path]
if save:
save_cache(cache)
return {"ok": True, "old_path": old_full, "new_path": new_full}
def rename_files_batch(
renames: list[dict],
cache: dict,
rclone_bin: str = "rclone",
) -> list[dict]:
"""Rename multiple files, writing cache once at the end.
Each item in renames: {remote, old_path, new_path}
Returns list of per-file results with old_path/new_path echoed back.
"""
results = []
cache_dirty = False
for r in renames:
res = rename_file_in_remote(
r["remote"], r["old_path"], r["new_path"],
cache, rclone_bin=rclone_bin, save=False,
)
res["old_path"] = r["old_path"]
res["new_path"] = r["new_path"]
results.append(res)
if res["ok"]:
cache_dirty = True
if cache_dirty:
save_cache(cache)
return results
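The resolution branch of `classify_filename_hygiene` tries four suffix patterns in a fixed order and stops at the first hit. The sketch below copies those four regexes from the diff and reduces the classifier to a hypothetical `resolution_style` helper that returns a label and the lowercased resolution. The sample filenames are invented.

```python
from __future__ import annotations

import re
from pathlib import Path

CANONICAL_RESOLUTION_RE = re.compile(r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\]$", re.IGNORECASE)
RESOLUTION_COPY_SUFFIX_RE = re.compile(
    r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\]\s*\((?P<copy>\d+)\)$", re.IGNORECASE
)
RESOLUTION_PART_SUFFIX_RE = re.compile(
    r"\[(?P<resolution>\d{3,4}[pi]|4k|8k)\][._ -]*(?P<part>\d+of\d+|part\d+|pt\d+)[.\s]*$",
    re.IGNORECASE,
)
BARE_RESOLUTION_SUFFIX_RE = re.compile(
    r"(?:^|[._ -])(?P<resolution>\d{3,4}[pi]|4k|8k)$", re.IGNORECASE
)

# Order matters: the first pattern that matches the stem decides the label.
CHECKS = (
    ("canonical", CANONICAL_RESOLUTION_RE),
    ("copy_suffix", RESOLUTION_COPY_SUFFIX_RE),
    ("part_suffix", RESOLUTION_PART_SUFFIX_RE),
    ("bare_suffix", BARE_RESOLUTION_SUFFIX_RE),
)


def resolution_style(filename: str) -> tuple[str, str | None]:
    stem = Path(filename).stem  # patterns are anchored at the end of the stem
    for label, regex in CHECKS:
        if m := regex.search(stem):
            return label, m.group("resolution").lower()
    return "missing", None


# Invented filenames covering each branch.
samples = [
    "ABC-123 [1080p].mp4",
    "ABC-123 [1080p] (2).mp4",
    "ABC-123 [720p] part2.mkv",
    "ABC-123 4K.mp4",
    "ABC-123.mp4",
]
results = {name: resolution_style(name) for name in samples}
```

Note that a plain ID such as `ABC-123` does not trip the bare-suffix pattern: the digits must be followed by `p` or `i`, so a three-digit ID number is not mistaken for a resolution.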
+495
@@ -0,0 +1,495 @@
"""All terminal rendering, plain-text formatting, and file outputs.
Owns the singleton `console` (rich.Console) plus the ANSI constants
used in --basic mode. `BASIC` is mirrored from rcjav.rclone_io so
both modules answer the same question (the setter here proxies).
"""
from __future__ import annotations
import csv
import json
import re
import sys
from dataclasses import asdict
from pathlib import Path
from rich.console import Console
from rich.panel import Panel
from rich.progress import (
BarColumn,
MofNCompleteColumn,
Progress,
SpinnerColumn,
TextColumn,
TimeElapsedColumn,
TimeRemainingColumn,
)
from rich.table import Table
from rich.text import Text
from rcjav import rclone_io as _rclone_io
from rcjav.dupes import (
decide_keep,
decide_keep_with_reason,
describe_dupe_risks,
)
from rcjav.ids import extract_id
from rcjav.model import FileEntry
# ---------- ANSI / plain-mode toggles ----------
USE_ANSI = True # disabled by --no-color
ANSI_RESET = "\033[0m"
ANSI_GREEN = "\033[32m"
ANSI_RED = "\033[31m"
ANSI_YELLOW = "\033[33m"
ANSI_CYAN = "\033[36m"
ANSI_DIM = "\033[2m"
ANSI_BOLD = "\033[1m"
def set_use_ansi(value: bool) -> None:
global USE_ANSI
USE_ANSI = bool(value)
def ansi(s: str, code: str) -> str:
return f"{code}{s}{ANSI_RESET}" if USE_ANSI else s
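# Singleton rich console. Rebound by set_console_no_color() when --no-color
# is passed (rich respects no_color=True everywhere). Code that copied the
# name earlier via `from rcjav.output import console` keeps the old instance,
# so look it up on the module at call time.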
console = Console()
def set_console_no_color() -> None:
global console
console = Console(no_color=True)
_RICH_TAG_RE = re.compile(r"\[/?[^\]]*\]")
def strip_markup(s: str) -> str:
return _RICH_TAG_RE.sub("", s)
# ---------- --basic mode flag (mirrored with rcjav.rclone_io) ----------
# Read dynamically as _rclone_io.BASIC so a single set_basic() call updates
# both this module's renderers and walk_remote's progress emission.
def set_basic(value: bool) -> None:
"""Toggle --basic mode for both renderers and rclone progress."""
_rclone_io.set_basic(value)
def _basic() -> bool:
return _rclone_io.BASIC
# ---------- size formatting ----------
def human_size(n: int) -> str:
nf = float(max(0, n))
for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
if nf < 1024:
return f"{int(nf)} B" if unit == "B" else f"{nf:.2f} {unit}"
nf /= 1024
return f"{nf:.2f} PiB"
# ---------- progress UI ----------
class BasicProgress:
"""Minimal stand-in for rich.Progress used when --basic is set."""
def __init__(self):
self._tasks: dict[int, dict] = {}
self._next = 0
self._last_print: dict[int, int] = {}
def __enter__(self):
return self
def __exit__(self, *exc):
for tid, t in self._tasks.items():
sys.stderr.write(f"{ansi('[done]', ANSI_GREEN)} {t['desc']} {t['done']}/{t['total']}\n")
return False
def add_task(self, description: str, total: int = 1) -> int:
tid = self._next
self._next += 1
desc = strip_markup(description)
self._tasks[tid] = {"desc": desc, "total": total, "done": 0}
self._last_print[tid] = 0
sys.stderr.write(f"{ansi('[start]', ANSI_CYAN)} {desc}\n")
return tid
def update(self, tid, total=None, description=None, **_):
t = self._tasks[tid]
if total is not None:
t["total"] = total
if description is not None:
t["desc"] = strip_markup(description)
def advance(self, tid, n: int = 1):
t = self._tasks[tid]
t["done"] += n
# In-place refresh every 5 files (or every file if total small).
step = 5 if t["total"] > 50 else 1
if t["done"] - self._last_print[tid] >= step or t["done"] == t["total"]:
counter = ansi(f"{t['done']}/{t['total']}", ANSI_CYAN)
line = f" {counter} {ansi(t['desc'], ANSI_DIM)}"
if sys.stderr.isatty():
sys.stderr.write(f"\r\033[K{line}")
if t["done"] == t["total"]:
sys.stderr.write("\n")
sys.stderr.flush()
elif t["done"] == t["total"]:
sys.stderr.write(line + "\n")
self._last_print[tid] = t["done"]
def make_progress():
if _basic():
return BasicProgress()
return Progress(
SpinnerColumn(),
TextColumn("{task.description}"),
BarColumn(),
MofNCompleteColumn(),
TimeElapsedColumn(),
TextColumn("eta"),
TimeRemainingColumn(),
console=console,
transient=False,
)
# ---------- rich renderers ----------
def render_banner(cache_meta: dict[str, dict], mode: str) -> Panel:
lines: list[Text] = []
lines.append(Text.from_markup(f"[bold]mode:[/] {mode}"))
if cache_meta:
for r, m in cache_meta.items():
if m["cached"]:
tag = f"CACHED {m['age']}" + (" STALE" if m["stale"] else "")
style = "yellow" if m["stale"] else "dim"
else:
tag = "FRESH SCAN"
style = "green"
lines.append(Text.from_markup(
f" [white]{r}[/] [{style}]{tag}[/] [dim]({m['file_count']} files)[/]"
))
body = Text("\n").join(lines)
return Panel(body, title="rc-jav", title_align="left", border_style="blue")
def render_search(matches: dict[str, list[FileEntry]], queries: list[str],
cache_meta: dict[str, dict]) -> None:
console.print(render_banner(cache_meta, mode="search"))
for q in queries:
hits = matches.get(q, [])
if not hits:
console.print(f"[bold red][{q}] NOT FOUND[/]")
console.print()
continue
title = f"[bold green][{q}] {len(hits)} hit(s)[/]"
tbl = Table(title=title, title_justify="left", show_lines=False,
border_style="green", expand=True)
tbl.add_column("Source", style="yellow", no_wrap=True)
tbl.add_column("Cache", no_wrap=True)
tbl.add_column("File", style="bold", overflow="fold")
tbl.add_column("Size", justify="right", no_wrap=True)
tbl.add_column("Path", style="dim", overflow="fold")
for e in sorted(hits, key=lambda x: (x.jav_id, x.path.lower())):
meta = cache_meta.get(e.remote, {})
if meta.get("cached"):
cache_tag = "[yellow][CACHED-STALE][/]" if meta.get("stale") else "[dim][CACHED][/]"
else:
cache_tag = "[green][FRESH][/]"
tbl.add_row(
e.source, cache_tag, Path(e.path).name,
f"{human_size(e.size)}\n[dim]({e.size:,} B)[/]",
e.full_path,
)
console.print(tbl)
console.print()
def render_name_matches(hits: list[FileEntry], tokens: list[str],
cache_meta: dict[str, dict]) -> None:
title = f"[bold green]Name match {tokens}: {len(hits)} hit(s)[/]"
if not hits:
console.print(f"[bold red]Name match {tokens} — NOT FOUND[/]")
return
tbl = Table(title=title, title_justify="left", show_lines=False,
border_style="green", expand=True)
tbl.add_column("Source", style="yellow", no_wrap=True)
tbl.add_column("Cache", no_wrap=True)
tbl.add_column("ID", style="bold cyan", no_wrap=True)
tbl.add_column("File", style="bold", overflow="fold")
tbl.add_column("Size", justify="right", no_wrap=True)
tbl.add_column("Path", style="dim", overflow="fold")
for e in sorted(hits, key=lambda x: (x.jav_id, x.path.lower())):
meta = cache_meta.get(e.remote, {})
if meta.get("cached"):
cache_tag = "[yellow][CACHED-STALE][/]" if meta.get("stale") else "[dim][CACHED][/]"
else:
cache_tag = "[green][FRESH][/]"
tbl.add_row(
e.source, cache_tag, e.jav_id, Path(e.path).name,
f"{human_size(e.size)}\n[dim]({e.size:,} B)[/]",
e.full_path,
)
console.print(tbl)
console.print()
def render_name_matches_plain(hits: list[FileEntry], tokens: list[str],
cache_meta: dict[str, dict]) -> str:
lines: list[str] = []
if not hits:
lines.append(ansi(f"Name match {tokens} — NOT FOUND", ANSI_RED))
return "\n".join(lines)
lines.append(ansi(f"Name match {tokens}: {len(hits)} hit(s)", ANSI_GREEN + ANSI_BOLD))
for e in sorted(hits, key=lambda x: (x.jav_id, x.path.lower())):
meta = cache_meta.get(e.remote, {})
if meta.get("cached"):
tag = ansi("[CACHED-STALE]", ANSI_YELLOW) if meta.get("stale") else ansi("[CACHED]", ANSI_DIM)
else:
tag = ansi("[FRESH]", ANSI_GREEN)
src = ansi(e.source, ANSI_YELLOW)
lines.append(f" {src} {tag} {ansi(e.jav_id, ANSI_CYAN)}")
lines.append(ansi(f" file: {Path(e.path).name}", ANSI_BOLD))
lines.append(f" size: {human_size(e.size)} ({e.size:,} bytes)")
lines.append(ansi(f" path: {e.full_path}", ANSI_DIM))
return "\n".join(lines)
def render_dupes(dupes: dict[str, list[FileEntry]],
skipped: list[tuple[str, str]],
variant_alerts: dict[str, list[FileEntry]] | None = None) -> None:
if not dupes:
console.print(Panel("[bold green]No duplicates found.[/]",
border_style="green"))
else:
console.print(f"[bold]Found {len(dupes)} duplicate ID group(s):[/]")
console.print()
total_reclaim = 0
for jav_id in sorted(dupes):
entries = dupes[jav_id]
keep = decide_keep(entries)
tbl = Table(title=f"[bold][{jav_id}][/]", title_justify="left",
show_lines=False, border_style="magenta", expand=True)
tbl.add_column("Action", no_wrap=True)
tbl.add_column("Source", style="yellow", no_wrap=True)
tbl.add_column("Size", justify="right", no_wrap=True)
tbl.add_column("Path", overflow="fold")
for e in sorted(entries, key=lambda x: (x.source != "Source", x.source == "Catalog", -x.size)):
if e.source == "Catalog":
action = "[cyan]CATALOG[/]"
elif e is keep:
action = "[green]KEEP[/]"
else:
action = "[red]DELETE?[/]"
total_reclaim += e.size
tbl.add_row(action, e.source,
f"{human_size(e.size)}\n[dim]({e.size:,} B)[/]",
e.full_path)
console.print(tbl)
console.print()
console.print(Panel(
f"[bold]Potential space reclaim if all DELETE? removed: "
f"[red]{human_size(total_reclaim)}[/][/]",
border_style="red"))
if skipped:
console.print()
tbl = Table(title=f"[dim]Skipped {len(skipped)} file(s) with no parseable ID[/]",
title_justify="left", show_lines=False, border_style="dim", expand=True)
tbl.add_column("Remote", style="dim", no_wrap=True)
tbl.add_column("Path", style="dim", overflow="fold")
for remote, path in skipped[:50]:
tbl.add_row(remote, path)
if len(skipped) > 50:
tbl.add_row("[dim]…[/]", f"[dim]+{len(skipped) - 50} more[/]")
console.print(tbl)
if variant_alerts:
console.print()
console.print(Panel(
f"[bold yellow]⚠ {len(variant_alerts)} variant alert(s) — manual review recommended[/]",
border_style="yellow"))
for bare_id, entries in sorted(variant_alerts.items()):
tbl = Table(title=f"[bold yellow][{bare_id}] — bare + variant coexist[/]",
title_justify="left", show_lines=False, border_style="yellow", expand=True)
tbl.add_column("ID", style="yellow", no_wrap=True)
tbl.add_column("Size", justify="right", no_wrap=True)
tbl.add_column("Path", overflow="fold")
for e in sorted(entries, key=lambda x: x.full_path):
eid = extract_id(Path(e.path).name) or e.jav_id
tbl.add_row(eid, human_size(e.size), e.full_path)
console.print(tbl)
console.print()
# ---------- plain renderers (--basic) ----------
def render_banner_plain(cache_meta: dict[str, dict], mode: str) -> str:
lines = [ansi(f"=== rc-jav ({mode}) ===", ANSI_BOLD)]
for r, m in cache_meta.items():
if m["cached"]:
tag = f"CACHED {m['age']}" + (" STALE" if m["stale"] else "")
tag_c = ansi(tag, ANSI_YELLOW if m["stale"] else ANSI_DIM)
else:
tag_c = ansi("FRESH SCAN", ANSI_GREEN)
count_str = ansi(f"({m['file_count']} files)", ANSI_DIM)
lines.append(f" {r} {tag_c} {count_str}")
return "\n".join(lines)
def render_search_plain(matches: dict[str, list[FileEntry]], queries: list[str],
cache_meta: dict[str, dict]) -> str:
lines: list[str] = []
if cache_meta:
lines.append(render_banner_plain(cache_meta, "search"))
lines.append("")
for q in queries:
hits = matches.get(q, [])
if not hits:
lines.append(ansi(f"[{q}] NOT FOUND", ANSI_RED))
lines.append("")
continue
lines.append(ansi(f"[{q}] {len(hits)} hit(s)", ANSI_GREEN + ANSI_BOLD))
for e in sorted(hits, key=lambda x: (x.jav_id, x.path.lower())):
meta = cache_meta.get(e.remote, {})
if meta.get("cached"):
tag = ansi("[CACHED-STALE]", ANSI_YELLOW) if meta.get("stale") else ansi("[CACHED]", ANSI_DIM)
else:
tag = ansi("[FRESH]", ANSI_GREEN)
src = ansi(e.source, ANSI_YELLOW)
lines.append(f" {src} {tag}")
lines.append(ansi(f" file: {Path(e.path).name}", ANSI_BOLD))
lines.append(f" size: {human_size(e.size)} ({e.size:,} bytes)")
lines.append(ansi(f" path: {e.full_path}", ANSI_DIM))
lines.append("")
return "\n".join(lines)
def render_dupes_plain(dupes, skipped, variant_alerts=None) -> str:
lines: list[str] = []
if not dupes:
lines.append(ansi("No duplicates found.", ANSI_GREEN))
else:
lines.append(ansi(f"Found {len(dupes)} duplicate ID group(s):", ANSI_BOLD))
lines.append("")
total_reclaim = 0
for jav_id in sorted(dupes):
entries = dupes[jav_id]
keep = decide_keep(entries)
lines.append(ansi(f"[{jav_id}]", ANSI_BOLD))
for e in sorted(entries, key=lambda x: (x.source != "Source", x.source == "Catalog", -x.size)):
if e.source == "Catalog":
mark = ansi("CATALOG ", ANSI_CYAN)
elif e is keep:
mark = ansi("KEEP ", ANSI_GREEN)
else:
mark = ansi("DELETE? ", ANSI_RED)
total_reclaim += e.size
src = ansi(f"{e.source:>8}", ANSI_YELLOW)
size_str = f"{human_size(e.size)} ({e.size:,} B)"
lines.append(f" {mark} {src} {size_str:>26} {e.full_path}")
lines.append("")
lines.append(ansi(f"Potential space reclaim if all DELETE? removed: {human_size(total_reclaim)}", ANSI_BOLD))
if skipped:
lines.append("")
lines.append(ansi(f"Skipped {len(skipped)} file(s) with no parseable ID:", ANSI_DIM))
for remote, path in skipped[:50]:
lines.append(ansi(f" {remote} {path}", ANSI_DIM))
if len(skipped) > 50:
lines.append(ansi(f" ... +{len(skipped) - 50} more", ANSI_DIM))
if variant_alerts:
lines.append("")
lines.append(ansi(f"{len(variant_alerts)} variant alert(s) — manual review required:", ANSI_YELLOW + ANSI_BOLD))
for bare_id, entries in sorted(variant_alerts.items()):
lines.append(ansi(f" [{bare_id}] bare + variant coexist", ANSI_YELLOW))
for e in sorted(entries, key=lambda x: x.full_path):
eid = extract_id(Path(e.path).name) or e.jav_id
lines.append(f" {ansi(eid, ANSI_YELLOW)} {human_size(e.size):>10} {e.full_path}")
return "\n".join(lines)
# ---------- file outputs ----------
def write_txt(path: Path, dupes, skipped):
path.write_text(render_dupes_plain(dupes, skipped), encoding="utf-8")
def write_csv(path: Path, dupes):
with path.open("w", newline="", encoding="utf-8") as f:
w = csv.writer(f)
w.writerow(["jav_id", "action", "source", "remote", "path", "full_path",
"size_bytes", "size_human", "mod_time"])
for jav_id in sorted(dupes):
entries = dupes[jav_id]
keep = decide_keep(entries)
for e in entries:
if e.source == "Catalog":
action = "CATALOG"
elif e is keep:
action = "KEEP"
else:
action = "DELETE?"
w.writerow([jav_id, action, e.source,
e.remote, e.path, e.full_path, e.size, human_size(e.size), e.mod_time])
def describe_skipped_id(remote: str, path: str) -> dict[str, str]:
"""Explain a common reason a path did not yield an ID."""
name = Path((path or "").replace("\\", "/")).name
reason = "No supported JAV ID at filename start"
hint = "Rename with a leading ID such as ABC-123 or add an ID normalizer/site-specific source."
if re.match(r"^\[[A-Za-z0-9-]+-\d+\]", name):
reason = "ID is wrapped in leading brackets"
hint = "Remove the leading brackets so the filename starts with the ID."
elif re.match(r"^[A-Za-z][A-Za-z0-9]+[\u2010-\u2015\u2212]\d+", name):
reason = "ID uses a non-ASCII dash"
hint = "Replace the separator with a normal hyphen."
elif re.match(r"^[A-Za-z][A-Za-z0-9]+\d+", name):
reason = "ID prefix and number have no hyphen"
hint = "Insert the ID hyphen, for example ABC-123."
return {"remote": remote, "path": path, "name": name, "reason": reason, "hint": hint}
def dupes_to_obj(dupes, skipped, variant_alerts=None) -> dict:
out = {"groups": {}, "skipped": [describe_skipped_id(r, p) for r, p in skipped],
"variant_alerts": []}
for jav_id in sorted(dupes):
entries = dupes[jav_id]
keep, keep_reason = decide_keep_with_reason(entries)
out["groups"][jav_id] = {
"keep": asdict(keep) | {"full_path": keep.full_path, "size_human": human_size(keep.size)},
"keep_reason": keep_reason,
"risks": describe_dupe_risks(jav_id, entries),
"delete_candidates": [asdict(e) | {"full_path": e.full_path, "size_human": human_size(e.size)}
for e in entries
if e is not keep and e.source != "Catalog"],
"catalog": [asdict(e) | {"full_path": e.full_path, "size_human": human_size(e.size)}
for e in entries if e.source == "Catalog"],
}
for bare_id, entries in sorted((variant_alerts or {}).items()):
out["variant_alerts"].append({
"bare_id": bare_id,
"files": [
asdict(e) | {"full_path": e.full_path, "size_human": human_size(e.size),
"detected_id": extract_id(Path(e.path).name) or e.jav_id}
for e in sorted(entries, key=lambda x: x.full_path)
],
})
return out
def write_json(path: Path, dupes, skipped, variant_alerts=None):
path.write_text(json.dumps(dupes_to_obj(dupes, skipped, variant_alerts), indent=2), encoding="utf-8")
@@ -0,0 +1,316 @@
"""rclone subprocess wrappers — listing, sizing, quick search.
Owns RCLONE_BIN and the cancel-flag protocol used by the
native-messaging host to interrupt scans. Errors go to stderr (no
rich markup); rc-jav.py owns terminal styling.
"""
from __future__ import annotations
import fnmatch
import json
import re
import subprocess
import sys
import threading
import time
from pathlib import Path
from rcjav.ids import RANGE_RE, expand_range, extract_id, normalize_id
from rcjav.model import FileEntry
RCLONE_BIN = "rclone"
# Written by the native-messaging host when the user clicks Cancel in the
# extension popup. walk_remote checks for it every CANCEL_CHECK_INTERVAL files
# and exits cleanly if found.
CANCEL_FLAG = Path(__file__).resolve().parents[1] / "scan-cancel.flag"
CANCEL_CHECK_INTERVAL = 25
PROGRESS_EMIT_MIN_FILES = 25
PROGRESS_EMIT_MIN_GAP_S = 0.25
PROGRESS_EMIT_MAX_GAP_S = 1.0
# Toggled from rc-jav.py main() when --basic is passed. Affects whether
# walk_remote emits machine-parseable progress lines on stderr.
BASIC = False
def set_basic(value: bool) -> None:
"""Toggle plain/machine-readable progress output for rclone walks."""
global BASIC
BASIC = bool(value)
def set_rclone_bin(path: str) -> None:
"""Override the rclone binary (default 'rclone' on PATH)."""
global RCLONE_BIN
RCLONE_BIN = path or "rclone"
def _err(msg: str) -> None:
sys.stderr.write(msg + "\n")
def quick_search_remote(remote: str, source_label: str,
patterns: list[str],
skipped: list[tuple[str, str]]) -> list[FileEntry]:
"""Run `rclone lsjson --include <pattern>` once per pattern. Bypass cache."""
out: list[FileEntry] = []
seen: set[tuple[str, str]] = set()
for pat in patterns:
cmd = [RCLONE_BIN, "lsjson", remote, "--files-only", "-R", "--include", pat]
proc = subprocess.run(cmd, capture_output=True, text=True,
encoding="utf-8", errors="replace")
if proc.returncode != 0:
_err(f"rclone lsjson --include failed for {remote}:\n{proc.stderr}")
sys.exit(proc.returncode)
for item in json.loads(proc.stdout or "[]"):
if item.get("IsDir"):
continue
path = item["Path"]
key = (remote, path)
if key in seen:
continue
seen.add(key)
jav_id = extract_id(Path(path).name)
if not jav_id:
skipped.append((remote, path))
continue
out.append(FileEntry(
source=source_label, remote=remote, path=path,
size=int(item.get("Size", 0)),
mod_time=item.get("ModTime", ""), jav_id=jav_id,
))
return out
def choose_search_mode(raw_queries: list[str], force_quick: bool, force_cache: bool) -> tuple[str, str]:
"""Decide quick vs cached. Returns (mode, reason)."""
if force_quick and force_cache:
return ("cached", "both --quick and --cache passed; preferring --cache (safer)")
if force_quick:
return ("quick", "forced via --quick")
if force_cache:
return ("cached", "forced via --cache")
if len(raw_queries) > 1:
return ("cached", f"multi-query ({len(raw_queries)} IDs) — cache batches them for free")
if not raw_queries:
return ("cached", "no queries")
q = raw_queries[0]
if RANGE_RE.search(q):
return ("cached", "range [N-M] — too many rclone calls otherwise")
if "*" in q or "?" in q:
return ("cached", "wildcard — cache match semantics are more reliable")
return ("quick", "single exact ID — live lookup is fastest")
def _escape_rclone_glob(s: str) -> str:
"""Escape rclone filter meta-chars so a literal token isn't interpreted as a
glob. rclone's filter syntax treats `*`, `?`, `[`, `{` specially; brackets
open a char-class that fails silently if the token contains `[` or `]`."""
out = []
for ch in s:
if ch in r"*?[]{}\\":
out.append("\\" + ch)
else:
out.append(ch)
return "".join(out)
def name_to_include_patterns(tokens: list[str]) -> list[str]:
"""Build rclone --include globs for each name token (substring match).
rclone applies these case-sensitively unless it is run with --ignore-case."""
pats: list[str] = []
for t in tokens:
if "*" in t or "?" in t:
pats.append(t)
else:
pats.append(f"*{_escape_rclone_glob(t)}*")
return pats
def name_match(stem: str, tokens: list[str]) -> bool:
"""Case-insensitive: True if ANY token matches stem (substring or fnmatch glob)."""
low = stem.lower()
for t in tokens:
tl = t.lower()
if "*" in tl or "?" in tl:
if fnmatch.fnmatchcase(low, tl):
return True
elif tl in low:
return True
return False
def query_to_include_patterns(raw: str) -> list[str]:
"""Turn a search query into one or more rclone --include globs.
Ranges expand to individual IDs; wildcards and exact IDs each map to a single glob."""
if RANGE_RE.search(raw):
expanded = expand_range(raw) or []
out: list[str] = []
for e in expanded:
out.extend(query_to_include_patterns(e))
return out
if "*" in raw or "?" in raw:
return [f"{raw}*"]
norm = normalize_id(raw)
if not norm:
return [f"{raw}*"]
prefix, _, digits = norm.rpartition("-")
if not digits.isdigit():
return [f"{norm}*"]
n = int(digits)
width = max(3, len(str(n)))
return [f"{prefix}-{n:0{width}d}*"]
def remote_file_count(remote: str) -> int:
"""Fast total file count via `rclone size --json`."""
cmd = [RCLONE_BIN, "size", "--json", remote]
proc = subprocess.run(cmd, capture_output=True, text=True,
encoding="utf-8", errors="replace")
if proc.returncode != 0:
_err(f"rclone size failed for {remote}:\n{proc.stderr}")
sys.exit(proc.returncode)
try:
return int(json.loads(proc.stdout).get("count", 0))
except (json.JSONDecodeError, ValueError):
return 0
DURATION_RE = re.compile(r"^\s*(\d+)\s*([smhd])\s*$", re.IGNORECASE)
def parse_duration(s: str) -> str | None:
"""Validate a duration suffix (`30m`, `24h`, `7d`, `90s`). Returns the
normalized form rclone accepts, or None if invalid. We don't compute a
timedelta; we pass the suffix straight to rclone --max-age."""
if not s:
return None
m = DURATION_RE.match(s)
if not m:
return None
return f"{m.group(1)}{m.group(2).lower()}"
def walk_remote(remote: str, source_label: str,
skipped: list[tuple[str, str]],
progress, task_id,
max_age: str | None = None,
_total_override: int | None = None) -> tuple[list[FileEntry], list[str]]:
"""Stream files from rclone lsf, ticking progress per file.
If max_age is set, pass --max-age to rclone so only recently-modified files
are returned (incremental scan).
_total_override: skip the internal remote_file_count probe (caller already did it).
`progress` is a rich.Progress (or BasicProgress) instance owned by the caller.
"""
if max_age:
total = 0
progress.update(task_id, total=1,
description=f"[cyan]{source_label}[/] {remote} (since {max_age})")
else:
if _total_override is not None:
total = _total_override
else:
total = remote_file_count(remote)
if BASIC:
sys.stderr.write("SCAN_REMOTE_COUNTED " + json.dumps({
"remote": remote, "total": total,
}) + "\n")
sys.stderr.flush()
progress.update(task_id, total=max(total, 1),
description=f"[cyan]{source_label}[/] {remote}")
cmd = [RCLONE_BIN, "lsf", "--files-only", "-R",
"--format", "pst", "--separator", "\t"]
if max_age:
cmd += ["--max-age", max_age]
cmd.append(remote)
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
text=True, encoding="utf-8", errors="replace")
entries: list[FileEntry] = []
local_skipped: list[str] = []
if proc.stdout is None:
raise RuntimeError("rclone stdout pipe unexpectedly None")
_stderr_chunks: list[str] = []
_stderr_thread = threading.Thread(
target=lambda: _stderr_chunks.append(proc.stderr.read() if proc.stderr else ""),
daemon=True,
)
_stderr_thread.start()
_cancelled = False
last_emit_n = 0
last_emit_ts = time.monotonic()
try:
for line in proc.stdout:
line = line.rstrip("\n").rstrip("\r")
if not line:
continue
parts = line.split("\t")
if len(parts) < 2:
continue
rel = parts[0]
try:
size = int(parts[1])
except ValueError:
size = 0
mod_time = parts[2] if len(parts) >= 3 else ""
jav_id = extract_id(Path(rel).name)
if not jav_id:
local_skipped.append(rel)
skipped.append((remote, rel))
else:
entries.append(FileEntry(
source=source_label, remote=remote, path=rel,
size=size, mod_time=mod_time, jav_id=jav_id,
))
progress.advance(task_id)
n = len(entries) + len(local_skipped)
if BASIC and n > 0 and n % CANCEL_CHECK_INTERVAL == 0:
if CANCEL_FLAG.exists():
try:
CANCEL_FLAG.unlink(missing_ok=True)
except OSError:
pass
proc.terminate()
try:
proc.wait(timeout=3)
except subprocess.TimeoutExpired:
proc.kill()
_cancelled = True
break
if BASIC and n > 0:
now = time.monotonic()
files_since_emit = n - last_emit_n
elapsed_since_emit = now - last_emit_ts
should_emit_progress = (
files_since_emit >= PROGRESS_EMIT_MIN_FILES
and elapsed_since_emit >= PROGRESS_EMIT_MIN_GAP_S
) or elapsed_since_emit >= PROGRESS_EMIT_MAX_GAP_S
if not should_emit_progress:
continue
sys.stderr.write("SCAN_FILE_PROGRESS " + json.dumps({
"remote": remote, "label": source_label,
"files": len(entries), "skipped": len(local_skipped),
"total": total,
}) + "\n")
sys.stderr.flush()
last_emit_n = n
last_emit_ts = now
except KeyboardInterrupt:
proc.terminate()
try:
proc.wait(timeout=3)
except subprocess.TimeoutExpired:
proc.kill()
raise
if _cancelled:
sys.stderr.write("SCAN_CANCELLED\n")
sys.stderr.flush()
sys.exit(0)
proc.wait()
_stderr_thread.join()
if proc.returncode != 0:
err = _stderr_chunks[0] if _stderr_chunks else ""
_err(f"rclone lsf failed for {remote}:\n{err}")
sys.exit(proc.returncode)
return entries, local_skipped
@@ -0,0 +1,83 @@
# Verification — Phase 3 — 2026-05-25
Original snapshot: `audit-snapshot-2026-05-24T15-55Z.md`
Final snapshot: same code-of-record + 13 manifest bumps (0.1.33 → 0.1.45 inclusive) + 1 CLI-only no-bump fix
## Fix summary (all bugs in audit queue)
### Severe (1/1 fixed)
- **S-1 (opts):** Export silently drops `keep_ranking` → backup data loss. **FIXED v0.1.33** at `src/options/options.js:386-447`. Export now blocks on `get-keep-ranking` RPC failure; success path writes `_meta.host_config.keep_ranking`. Verified: failure path shows clear block message + no file; success path produces JSON with full keep_ranking populated.
### Moderate (7/7 fixed)
- **M-1 (opts):** `sanitizeImportedSettings` element validation gap → content-script crash on malformed import. **FIXED v0.1.34** at `src/options/options.js:552-633`. Added `ARRAY_ELEMENT_VALIDATORS` for siteAdapters / idNormalizers / partPatterns / knownSitePatterns / profiles. Verified: malformed import shows `siteAdapters[0](malformed)` in modal; good entry survives; runtime test confirmed.
- **M-2 (bg):** Context menu missing after MV3 SW eviction. **FIXED v0.1.36** at `background.js:1193` (top-level `ensureContextMenu()` call). Verified: right-click on google.com after SW lifecycle shows full rclone-jav menu.
- **M-3 (host):** `handle_scan` returned success before Popen could fail. **FIXED v0.1.37** at `host/rcjav-host.py:2053-2306`. Per-invocation `threading.Event` + `spawn_result` dict; 500 ms wait; synchronous failure surfacing. Verified via instrumented `raise FileNotFoundError("simulated spawn fail")` runtime test — UI showed `scan failed: FileNotFoundError: simulated spawn fail` synchronously; instrumentation reverted.
- **M-4 (host):** `post_discord_alert` blocked main loop 5 s on slow Discord. **FIXED v0.1.39** at `host/rcjav-host.py:174-289`. Refactored to `_discord_post_worker` + `_build_discord_body`; real alerts threaded fire-and-forget; test RPC waits 6 s with explicit timeout error. Outcomes logged with `alert_source`. Verified runtime: Test (host) returned synchronous `HTTP 401` for bogus token + `HTTP 204` for valid; events.log has `discord_post` with all fields.
- **M-5 (popup):** Profile selector race overwrites with stale results. **FIXED v0.1.40 + 0.1.41 follow-up** at `src/popup/popup.js`. Monotonic `_currentSearchId` gate in `runCheck` + `runManualSearch`; bumped BEFORE paused early-exit. Verified: unit tests 5/5 pass; stale callbacks bail before any UI write.
- **M-6 (bg):** `recordRpc` race loses log entries. **FIXED v0.1.42** at `background.js:155-180`. Promise-chain lock around the body. Verified via simulated-storage smoke test: unlocked 1/5 preserved, locked 5/5 preserved.
- **M-7 (CLI):** `save_config` no retry — Windows AV lock crashes `--save`. **FIXED (no manifest bump — CLI repo only)** at `rcjav/cli.py:186-194`. Mirrored `save_cache`'s retry. Verified 3/3 smoke tests.
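
The M-7 retry can be illustrated with a minimal sketch. It assumes the save path writes a temp file and promotes it with `os.replace`, and that a transient lock surfaces as `PermissionError`; `save_json_atomic`, `retry_delay_s`, and the injectable `_replace` hook are hypothetical names for illustration, not the code in `rcjav/cli.py`.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def save_json_atomic(path: Path, data: dict, retry_delay_s: float = 0.2,
                     _replace=os.replace) -> None:
    """Write `data` to a temp file beside `path`, then swap it into place.

    One retry on PermissionError; a second failure propagates to the caller.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        try:
            _replace(tmp_name, path)
        except PermissionError:
            # Assumption: another process (for example a scanner) holds the
            # target briefly. Wait once, then retry exactly one time.
            time.sleep(retry_delay_s)
            _replace(tmp_name, path)
    except BaseException:
        # Best-effort cleanup so a failed save leaves no stray temp file.
        try:
            os.unlink(tmp_name)
        except OSError:
            pass
        raise
```

Keeping the retry count at one matches the "single retry, then re-raises" behavior noted in the scrutiny list; a persistent lock still fails loudly instead of looping.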
### Light (1/6 fixed; 5 deferred)
- **L-1 (bg):** `maybeNotifyHostError` rate-limit race → over-notification. **FIXED v0.1.43** at `background.js:191-247`. Dedicated `_hostAlertLock` around rate-limit + notification + Discord paths. Verified: unlocked 5/5 fire (bug confirmed), locked 1/5 fire (correct), sequential locked still rate-limited.
- **L-2 → L-6 (deferred):** cosmetic / UX polish — Discord visibility passive UI, stderr 5 s stale on rc-jav stall, expectedId state leak, history chip during modal, Clear button modal stays open. None block S/M user workflows; all have documented workarounds. Tracked in respective `bugs-*.md` files for a future polish pass.
## Phase 3 re-audit: bug introduced by M-3 fix, caught and fixed
Re-audit of Phase 2 modified files (background.js, options.js, popup.js, rcjav-host.py, cli.py, manifest.json) by a fresh-context Explore agent surfaced one introduced bug:
- **M-3 spawn race:** `_scan_worker` signaled `spawn_event` BEFORE assigning `_scan_proc = proc` under `_scan_lock`. A cancel arriving in the ~1-5 ms window between signal and assignment would read `_scan_proc = None`, return "no scan running", and never write the cancel flag, so the scan would run to completion with no way to interrupt it.
- **FIXED v0.1.44** at `host/rcjav-host.py:2186-2196`. Reordered: `_scan_proc = proc` (under `_scan_lock`) now happens BEFORE `spawn_event.set()`. handle_scan still gets the spawn-ok signal; cancel handler now sees a live `_scan_proc` reference.
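
The ordering fix can be sketched as follows. This is a simplified illustration, not the code in `host/rcjav-host.py`: the names `_scan_worker`, `_scan_proc`, `_scan_lock`, `spawn_event`, and `spawn_result` follow the report, while the response keys other than `startup_pending` and the exact error handling are assumptions.

```python
import subprocess
import threading

_scan_lock = threading.Lock()
_scan_proc = None  # handle a cancel handler would inspect under _scan_lock


def _scan_worker(cmd, spawn_event, spawn_result):
    global _scan_proc
    try:
        proc = subprocess.Popen(cmd)
    except OSError as exc:
        # Spawn failed: record why, then wake the waiting handler.
        spawn_result["error"] = f"{type(exc).__name__}: {exc}"
        spawn_event.set()
        return
    # Publish the handle FIRST, then signal. Anything that runs after the
    # event fires is guaranteed to see a live _scan_proc.
    with _scan_lock:
        _scan_proc = proc
    spawn_result["pid"] = proc.pid
    spawn_event.set()
    proc.wait()
    with _scan_lock:
        _scan_proc = None


def handle_scan(cmd, wait_s=0.5):
    """Start a scan and report spawn failures synchronously when possible."""
    spawn_event = threading.Event()  # per-invocation, never shared
    spawn_result = {}
    threading.Thread(target=_scan_worker,
                     args=(cmd, spawn_event, spawn_result),
                     daemon=True).start()
    if not spawn_event.wait(timeout=wait_s):
        return {"ok": True, "startup_pending": True}
    if "error" in spawn_result:
        return {"ok": False, "error": "scan failed: " + spawn_result["error"]}
    return {"ok": True, "pid": spawn_result["pid"]}
```

With the two blocks swapped (signal before assignment), a cancel handler that takes `_scan_lock` right after the event fires could still observe `None`, which is the window the re-audit flagged.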
No other introduced bugs found. Phase 2 fix code passed scrutiny on:
- Lock nesting (3 independent locks: `_rpcLogLock`, `_hostAlertLock`, `_contextMenuLock`, plus new `_activityLogLock` — no path holds two simultaneously)
- Closure capture (each lock chain `.then()` captures its own `entry`/`now`/etc.)
- Unhandled rejection paths (try/catch inside chains; one failure doesn't poison future calls)
- Thread leaks (M-4 rate-limit check runs BEFORE thread spawn)
- M-1 validator backward compat (v1 exports without `source`/`target` keys accepted via `|| []` consumer pattern)
- M-5 popup counter (popup is short-lived; counter doesn't need cross-session persistence)
- M-7 retry not infinite (single retry, then re-raises on persistent failure)
- Manifest JSON validity (semver, no trailing commas)
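
The thread-leak point for M-4 (rate-limit check before thread spawn) can be sketched like this. It is an illustration under assumptions: `post_alert_async`, `MIN_GAP_S`, and the `outcomes` list are hypothetical stand-ins, and the real `_discord_post_worker` is not reproduced here.

```python
import threading
import time

MIN_GAP_S = 60.0  # hypothetical rate-limit window, not taken from the source
_alert_lock = threading.Lock()
_last_alert_ts = None
outcomes = []  # stand-in for entries that would go to an events log


def post_alert_async(send, body, now=time.monotonic):
    """Start a background post and return its thread, or None if rate-limited."""
    global _last_alert_ts
    with _alert_lock:
        ts = now()
        if _last_alert_ts is not None and ts - _last_alert_ts < MIN_GAP_S:
            return None  # rejected before any thread is created
        _last_alert_ts = ts

    def worker():
        try:
            send(body)  # may block on a slow endpoint; the caller already returned
            outcomes.append(("ok", body))
        except Exception as exc:
            outcomes.append(("error", f"{type(exc).__name__}: {exc}"))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Because the check happens under the lock and ahead of `Thread(...)`, a burst of alerts inside the window costs a timestamp comparison each rather than one blocked thread each.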
## Mirror checks resolved
| Bug | Mirror flagged | Status |
|---|---|---|
| S-1 | other RPC-sourced `_meta.host_config` data | None exist beyond keep_ranking. Resolved (nothing to mirror). |
| M-1 | profiles[] + partPatterns[] | Both covered in same commit via `ARRAY_ELEMENT_VALIDATORS`. Resolved. |
| M-2 | other Chrome APIs needing re-register per SW boot | `chrome.alarms` persistent, `chrome.commands` manifest-declared. contextMenus is the outlier. Resolved. |
| M-3 | none flagged | n/a |
| M-4 | extension-side `postDiscordAlert` parity | Verified: extension-side already uses `fetch(...).catch(...)` fire-and-forget. Parity confirmed. Resolved. |
| M-5 | other search entry points | All 6 entry points (search-go, Enter, history chip, profile change, search-clear, pause-while-inflight) funnel through `runCheck`/`runManualSearch`. Covered. Resolved. |
| M-6 | options.js settings save / options-library-issues.js / activity log / tabvault | **`recordActivity` had same race — FIXED v0.1.45 at `background.js:613-635`** with dedicated `_activityLogLock`. options.js settings save is user-triggered (single SAVE click), low race risk. options-library-issues.js only does `set` (no get-then-set). Tabvault out-of-scope (separate project). Resolved. |
| M-7 | other `os.replace` callsites in rcjav/ | Only `save_cache` and `save_config` use `os.replace`. Both now have retry. Resolved. |
| L-1 | same as M-6 | Covered via `recordActivity` mirror fix above. Resolved. |
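
The get-then-set race behind M-6 and its `recordActivity` mirror can be reproduced in miniature. The actual fixes live in `background.js` and use a promise-chain lock; the sketch below is a Python `asyncio` analogue with a hypothetical `FakeStorage`, meant only to show why serializing the read-modify-write preserves every entry.

```python
import asyncio


class FakeStorage:
    """Hypothetical async key-value store; each call yields to the event loop."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        await asyncio.sleep(0)
        return list(self._data.get(key, []))

    async def set(self, key, value):
        await asyncio.sleep(0)
        self._data[key] = list(value)


async def record_unlocked(store, entry):
    # Read-modify-write with two suspension points: concurrent callers can all
    # read the same old list and then overwrite each other's writes.
    log = await store.get("log")
    log.append(entry)
    await store.set("log", log)


async def record_locked(store, lock, entry):
    # Holding the lock across the whole body serializes the callers.
    async with lock:
        await record_unlocked(store, entry)


async def run(locked: bool):
    store = FakeStorage()
    lock = asyncio.Lock()
    if locked:
        jobs = [record_locked(store, lock, i) for i in range(5)]
    else:
        jobs = [record_unlocked(store, i) for i in range(5)]
    await asyncio.gather(*jobs)
    return await store.get("log")
```

Running `asyncio.run(run(False))` loses entries because every writer starts from the same stale read, while `asyncio.run(run(True))` keeps all five, mirroring the smoke-test result reported for M-6.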
## Residual risk
1. **Deferred Lights (5).** Documented as cosmetic / UX polish. None block user workflows. Pass-through risk acceptable.
2. **M-3 timeout-path response shape change.** Added `startup_pending: true` on Popen timeout. Backward-compatible: existing UI ignores unknown keys. Any future UI work that parses this field must treat its current semantics as a fixed contract.
3. **Phase 3 re-audit blind spot.** Auditor instructed to focus on **introduced** bugs, not pre-existing. Pre-existing bugs in unmodified files (e.g. tabvault, rc-jav internals beyond cli.py) were not re-checked. Those remain in their respective audit-out-of-scope notes.
4. **Brave-specific divergence from Chrome contracts.** Several REFUTED candidates noted that if Brave is observed diverging from documented Chrome behavior (SW lifecycle, connectNative keepalive), some refuted bugs could re-emerge as Brave-specific. Not currently verified; flagged in chunk-3 candidate notes.
5. **Manifest version chip semantics.** Each fix bumped manifest per the project's reload-verification signal rule. Total 13 bumps (0.1.33 → 0.1.45 inclusive) covering: S-1, M-1, branding follow-up, M-2, M-3, M-2 follow-up (lock), M-4, M-5, M-5 follow-up (paused), M-6, L-1, M-3 follow-up (spawn race introduced + fixed in Phase 3), and M-6 mirror (recordActivity). M-7 had no bump (CLI repo only).
## Final pass
- **Files modified during Phase 2 + 3:**
- `D:\DEV\Extensions\Production\rclone-jav\background.js` (M-2 + M-6 + L-1 + M-2 follow-up + M-6 mirror)
- `D:\DEV\Extensions\Production\rclone-jav\src\options\options.js` (S-1 + M-1 + branding)
- `D:\DEV\Extensions\Production\rclone-jav\src\popup\popup.js` (M-5 + M-5 follow-up)
- `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py` (M-3 + M-4 + M-3 fix-of-fix)
- `D:\DEV\Extensions\Production\rclone-jav\manifest.json` (13 bumps: 0.1.33, 0.1.34, 0.1.35, 0.1.36, 0.1.37, 0.1.38, 0.1.39, 0.1.40, 0.1.41, 0.1.42, 0.1.43, 0.1.44, 0.1.45)
- `D:\DEV\Project\rclone-jav\rcjav\cli.py` (M-7)
- **Independent re-audit by fresh Explore agent:** 1 introduced bug found (M-3 spawn race) → fixed in v0.1.44. Final re-audit pass on the M-3 fix: trivial reorder of two adjacent blocks, no further issues.
- **All `bugs-*.md` files: zero entries with status `open` except deferred Lights** (L-2 through L-6 — 5 entries; intentionally deferred per Phase 2 close decision).
- **Extension `manifest.json` version: 0.1.45** (was 0.1.32 at audit start).
- **Test instrumentation residue check:** no `simulated spawn fail` / `M-3 TEST` / `REMOVE` markers remain in any source file.
- **JS syntax (`node --check`):** background.js, options.js, popup.js, options/* all pass.
- **Python syntax (`py_compile`):** rcjav-host.py, rcjav/cli.py pass.
## Verification verdict
**Phase 2 + 3 closed.** All Severe + Moderate fixed and runtime-verified. 1 of 6 Lights fixed (L-1, same bug class as M-6). 1 introduced bug surfaced and fixed during Phase 3 re-audit. All mirror checks resolved or scoped out. 5 cosmetic Lights deferred.
Next polish session (out of audit scope): L-2 through L-6 if/when prioritized.