From c17ac9e1e71acf73b8757416da283dae81903710 Mon Sep 17 00:00:00 2001
From: admin
Date: Sat, 23 May 2026 11:07:26 +0200
Subject: [PATCH] Step 10j (host + extension): cache contract three-state UX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Completes the two-tier cache contract from step 9 / docs/CACHE_CONTRACT.md
on the extension side. The Python side shipped in the Python repo at
33c495a.

Host (rcjav-host.py):
- fetch_rules_info() memoizes per-script-path calls to
  `rc-jav.py --print-rules-info` so handle_cache_status doesn't pay the
  Python startup cost on every poll.
- _cache_freshness_fields(data, rules_info) computes the new
  cache_schema / id_rules / id_rules_signature trio + their three
  *_match booleans + cache_state ('fresh' / 'stale_by_rules' /
  'schema_mismatch' / 'missing'). Legacy version:3 caches still on disk
  report as stale_by_rules with cache_schema_match=True (we'll migrate
  them at next load_cache).
- New handle_reextract_ids() action forwards to
  `rc-jav.py --reextract --format json` with a 5-minute timeout.

background.js:
- New `reextract-ids` message forwards to host with a 300s timeout.

options-cache.js + options-library-issues.js:
- renderCacheContractBanner() paints a three-state banner above the
  per-remote list: green ✓ fresh / amber ! stale-by-rules (with
  "Re-extract IDs (fast, no rescan)" chip button) / red ✗ schema
  mismatch. Includes a snippet of the cache signature for diagnostics.
- Delegated click handler in options-library-issues.js catches
  .cache-reextract, sends the message, shows transient "Re-extracting…"
  state, and replaces the button with a summary line
  ("Re-extracted N IDs · X changed · Y unchanged · Z dropped").
- rules_info_error from the host surfaces as its own amber line above
  the banner.

node --check passes on background.js, options-cache.js,
options-library-issues.js individually and on the concatenation of all
four script files. python -m py_compile passes on rcjav-host.py.

Behavioral verification requires reloading the unpacked extension and
running through:
  - Check Cache → banner shows "stale by rules" amber (legacy v3 cache)
  - Click "Re-extract IDs" → fast path runs, summary appears
  - Check Cache again → banner now shows "Cache up to date" green

Co-Authored-By: Claude Opus 4.7
---
 AGENTS.md                 |   8 ++-
 background.js             |  13 +++++
 host/rcjav-host.py        | 116 +++++++++++++++++++++++++++++++++++++-
 options-cache.js          |  30 ++++++++++
 options-library-issues.js |  32 +++++++++++
 5 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index ddfb650..d4e3271 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -149,7 +149,13 @@ Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Window
 10. **`rc-jav.py` package split — done (sub-steps 10a–10i, shipped across two sessions).** Python repo at `D:\DEV\Project\rclone-jav\` is now git-tracked (baseline `e029e89`); `rc-jav.py` went from 2230 lines to a 25-line shim. New `rcjav/` package contains: `model.py` (24, FileEntry), `ids.py` (243, ID extraction + part detection + normalization + describe_id_match + expand_range), `cache.py` (76, cache.json I/O), `catalog.py` (178, WinCatalog CSV/XML), `dupes.py` (264, keep-ranking + find_dupes + variant alerts), `rclone_io.py` (298, subprocess wrappers + walk_remote + glob escaping), `library.py` (176, library-issues + safe rename), `output.py` (495, rich console + renderers + plain/CSV/JSON outputs), `cli.py` (845, main() + collectors + arg parsing). Pattern across all sub-steps: top-level mutable globals (`PART_RES`, `_KEEP_RANKING`, `BASIC`, `RCLONE_BIN`, `console`, `USE_ANSI`) are read/written only inside their owning module — callers go through setters (`configure_part_patterns`, `set_keep_ranking`, `set_basic`, `set_rclone_bin`, `set_console_no_color`, `set_use_ansi`) so no in-tree code ever sees a stale captured binding. `rc-jav.py` shim does `from rcjav import *` + `from rcjav.cli import main`, so `importlib.spec_from_file_location("rcjav_script", "rc-jav.py")` (used by tests/fixtures/native host) still finds every previously-top-level name. Each sub-step verified at commit time via `python rc-jav.py --help`, `python -m rcjav.cli --help`, `python fixtures/run.py` (17/17 cases), and `python -m unittest tests.test_rules` (5/5). (Step 10's cache-contract implementation is split off as step 10j below — design from step 9 is locked, implementation hasn't shipped.)
-- **Step 10j — Implement the cache contract from step 9.** Split today's single `CACHE_VERSION = 3` into `cache_schema` + `id_rules` + `id_rules_signature` inside `rcjav/cache.py`. Add the one-shot in-place migration for users on legacy `version: 3` (translates the header without forcing a rescan, marks the cache as "stale by rules"). Add a `rc-jav.py --reextract` flag that walks `cache["remotes"][r]["files"][:]` against the current rule set and updates `jav_id` in place — no rclone calls. Update `rcjav.cache.load_cache` to return the new header shape; update the `cache-status` message in the extension's `background.js` to surface `cache_schema_match` / `id_rules_match` / `id_rules_signature_match` flags. In `options-cache.js`, render the three-state UX from `docs/CACHE_CONTRACT.md` (fresh ✓ / stale-by-rules ! / schema-mismatch ✗) and wire the "Re-extract IDs" button to a new `reextract-ids` message that forwards to host. Update `fixtures/run.py` to also exercise the signature stability across rule changes (mutating a regex bumps the sha256 even when `id_rules` integer is forgotten).
+10j. **Cache contract implementation — done.** Two-tier contract from `docs/CACHE_CONTRACT.md` now live end-to-end across Python + host + extension.
+
+   Python (`rcjav/cache.py` + `rcjav/ids.py` + `rcjav/cli.py`): new constants `CACHE_SCHEMA_VERSION = 1` and `ID_RULES_VERSION = 1`. New `current_rules_signature()` in `rcjav.ids` produces a stable sha256 over the canonical text of every rule that influences a `jav_id` (PRIMARY_ID_RE, COMPOUND_ID_RE, FALLBACK_ID_RE, _NOHYPHEN_ID_RE, _BRACKET_ID_RE, _VARIANT_SUFFIX_RE, _XOFY_PRIORITY_RE, _RESOLUTION_TAG_RE, BUILTIN_PART_RES, PART_RES, FC2 handling toggle). `load_cache(signature)` translates legacy `version: 3` headers in place — no forced rescan; the cache is stamped `id_rules: 0` + signature `"legacy"` so it reads as "stale by rules". `cache_state(cache, sig)` classifies as `fresh` / `stale_by_rules` / `schema_mismatch`. `stamp_current_rules(cache, sig)` updates the header after a full scan or `--reextract`. New `rc-jav.py --reextract` walks `cache["remotes"][r]["files"]` against the live rule set and updates `jav_id` in place (no rclone). Full `--scan` (without `--scan-since`) stamps current rules; incremental `--scan --scan-since` deliberately does not. Verified on the live 7124-file cache.
+
+   Host (`rcjav-host.py`): new `--print-rules-info` flag on the Python side returns `{cache_schema, id_rules, id_rules_signature}` cheaply. Host memoizes the result per script path in `_RULES_INFO_CACHE` and augments `cache_status` responses with `cache_schema`, `id_rules`, `id_rules_signature`, the corresponding `expected_*` constants, three `*_match` booleans, and `cache_state` (`fresh` / `stale_by_rules` / `schema_mismatch` / `missing`). Legacy `version: 3` caches still on disk are reported as `stale_by_rules` with `cache_schema_match: true` (we'll migrate them at next `load_cache`). New `reextract_ids` action forwards to `rc-jav.py --reextract --format json` with a 5-minute timeout.
+
+   Extension (`background.js` + `options-cache.js` + `options-library-issues.js`): new `reextract-ids` message in `background.js` calls the host with a 300s timeout. `renderCacheContractBanner(r)` in `options-cache.js` paints the three-state inline banner above the per-remote list — green ✓ for fresh, amber ! for stale-by-rules (with a "Re-extract IDs (fast, no rescan)" chip button), red ✗ for schema mismatch. The delegated click handler in `options-library-issues.js` (which already owns the cache-status-results container) catches `.cache-reextract`, sends the message, shows a transient "Re-extracting…" state, and replaces the button with a per-remote summary line ("Re-extracted N IDs · X changed · Y unchanged · Z dropped"). `rules_info_error` from the host surfaces as a separate amber line above the banner.
 - **Step 11 — Host fast-path benchmark and decide.** Measure popup search latency under (a) idle Python and (b) Python actively scanning. If host fast path is the only thing keeping popup responsive under scan = narrow to dict lookup only and document. If not needed = delete entirely.
 **Architecture (locked — do not relitigate):**
diff --git a/background.js b/background.js
index 741288c..db6dfad 100644
--- a/background.js
+++ b/background.js
@@ -948,6 +948,19 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
     })();
     return true;
   }
+  if (msg.type === "reextract-ids") {
+    (async () => {
+      try {
+        const settings = await getSettings();
+        const r = await nativeCall({
+          action: "reextract_ids",
+          rcjav_path: settings.rcjavPath || "",
+        }, 300_000);
+        sendResponse(r);
+      } catch (e) { sendResponse({ ok: false, error: e.message, error_kind: classifyNativeError(e), extension_id: chrome.runtime.id }); }
+    })();
+    return true;
+  }
   if (msg.type === "host-status") {
     (async () => {
       try {
diff --git a/host/rcjav-host.py b/host/rcjav-host.py
index 75cb738..0632fc5 100644
--- a/host/rcjav-host.py
+++ b/host/rcjav-host.py
@@ -218,6 +218,40 @@ def run_rcjav(args: list[str], timeout: int = 120, extra_flags: list[str] | None
         return 1, "", f"spawn error: {e}"
 
 
+# Memoize rules-info per script path so handle_cache_status doesn't pay
+# the Python startup cost on every poll. Invalidate on rcjav_path change.
+_RULES_INFO_CACHE: dict[str, dict] = {}
+
+
+def fetch_rules_info(rcjav_path: str | None) -> dict:
+    """Get cache contract constants + current rules signature from rc-jav.py.
+
+    Returns {ok, cache_schema, id_rules, id_rules_signature} on success,
+    or {ok: False, error: ...} when the lookup fails. Memoized per script
+    path; the result is cached for the lifetime of this host process.
+    """
+    script = resolve_rcjav(rcjav_path)
+    key = str(script)
+    cached = _RULES_INFO_CACHE.get(key)
+    if cached is not None:
+        return cached
+    rc, stdout, stderr = run_rcjav(
+        ["--print-rules-info"],
+        extra_flags=[],  # NB: omit --basic / --no-color, --print-rules-info already JSON
+        rcjav_path=rcjav_path,
+        timeout=30,
+    )
+    if rc != 0:
+        info = {"ok": False, "error": (stderr or stdout or "unknown").strip()}
+    else:
+        try:
+            info = json.loads(stdout.strip())
+        except json.JSONDecodeError as e:
+            info = {"ok": False, "error": f"invalid rules-info JSON: {e}"}
+    _RULES_INFO_CACHE[key] = info
+    return info
+
+
 def part_pattern_args(payload: dict) -> list[str]:
     args: list[str] = []
     for pattern in payload.get("part_patterns") or []:
@@ -1398,6 +1432,83 @@ def _describe_skipped_id(path: str, remote: str = "") -> dict:
     return {"path": path, "name": name, "ext": ext, "reason": reason, "full_path": full_path}
 
 
+def _cache_freshness_fields(data: dict | None, rules_info: dict) -> dict:
+    """Build the cache-contract fields surfaced to the extension.
+
+    `data` is the parsed cache.json (or None when the file is missing).
+    `rules_info` is the dict returned by fetch_rules_info; when its
+    `ok` flag is False we report match-flags as None and let the UI
+    decide whether to show a "rules lookup failed" state.
+    """
+    out: dict = {
+        # Legacy field — preserved for any consumer that still reads it.
+        "version": (data or {}).get("version") if data else None,
+        # New two-tier contract:
+        "cache_schema": (data or {}).get("cache_schema") if data else None,
+        "id_rules": (data or {}).get("id_rules") if data else None,
+        "id_rules_signature": (data or {}).get("id_rules_signature") if data else None,
+        "expected_cache_schema": None,
+        "expected_id_rules": None,
+        "expected_id_rules_signature": None,
+        "cache_schema_match": None,
+        "id_rules_match": None,
+        "id_rules_signature_match": None,
+        "cache_state": None,  # 'fresh' | 'stale_by_rules' | 'schema_mismatch' | 'missing'
+        "rules_info_error": None,
+    }
+    if not rules_info.get("ok"):
+        out["rules_info_error"] = rules_info.get("error") or "rules lookup failed"
+        return out
+    out["expected_cache_schema"] = rules_info.get("cache_schema")
+    out["expected_id_rules"] = rules_info.get("id_rules")
+    out["expected_id_rules_signature"] = rules_info.get("id_rules_signature")
+    if data is None:
+        out["cache_state"] = "missing"
+        return out
+    # Legacy version:3 cache (pre-migration on disk): treat as stale_by_rules.
+    if "cache_schema" not in data and data.get("version") == 3:
+        out["cache_state"] = "stale_by_rules"
+        out["cache_schema_match"] = True  # we'll migrate at next load_cache
+        out["id_rules_match"] = False
+        out["id_rules_signature_match"] = False
+        return out
+    schema_match = data.get("cache_schema") == out["expected_cache_schema"]
+    rules_match = data.get("id_rules") == out["expected_id_rules"]
+    sig_match = data.get("id_rules_signature") == out["expected_id_rules_signature"]
+    out["cache_schema_match"] = schema_match
+    out["id_rules_match"] = rules_match
+    out["id_rules_signature_match"] = sig_match
+    if not schema_match:
+        out["cache_state"] = "schema_mismatch"
+    elif rules_match and sig_match:
+        out["cache_state"] = "fresh"
+    else:
+        out["cache_state"] = "stale_by_rules"
+    return out
+
+
+def handle_reextract_ids(payload: dict) -> dict:
+    """Trigger a fast re-extract of jav_ids against the current rule set.
+
+    No rclone calls — walks the on-disk cache.json. Stamps current rules +
+    signature into the cache on success so the next cache_status call
+    reports `cache_state: "fresh"`.
+    """
+    rc, stdout, stderr = run_rcjav(
+        ["--reextract", "--format", "json"],
+        extra_flags=[],  # --reextract already JSON; --basic would prepend noise
+        rcjav_path=payload.get("rcjav_path"),
+        timeout=300,
+    )
+    if rc != 0:
+        return {"ok": False, "error": (stderr or stdout or "unknown").strip()}
+    try:
+        result = json.loads(stdout.strip())
+    except json.JSONDecodeError as e:
+        return {"ok": False, "error": f"invalid reextract JSON: {e}"}
+    return result
+
+
 def handle_cache_status(payload: dict) -> dict:
     script = resolve_rcjav(payload.get("rcjav_path", ""))
     cache_path = script.parent / "cache.json"
@@ -1413,6 +1524,7 @@ def handle_cache_status(payload: dict) -> dict:
     stale_hours = _stale_hours(payload)
     scan_state = _read_scan_state()
     configured_roots = set((configured.get("default_source") or []) + (configured.get("default_target") or []))
+    rules_info = fetch_rules_info(payload.get("rcjav_path", ""))
     if not cache_path.exists():
         remotes = [{
             "remote": remote,
@@ -1431,6 +1543,7 @@ def handle_cache_status(payload: dict) -> dict:
             "stale_hours": stale_hours,
             "scan_state": scan_state,
             "remotes": remotes,
+            **_cache_freshness_fields(None, rules_info),
         }
     try:
         data = json.loads(cache_path.read_text(encoding="utf-8"))
@@ -1525,12 +1638,12 @@ def handle_cache_status(payload: dict) -> dict:
         "ok": True,
         "cache_exists": True,
         "cache_path": str(cache_path),
-        "version": data.get("version"),
         "stale_hours": stale_hours,
         "configured": configured,
         "scan_state": scan_state,
         "remotes": remotes,
         "warnings": warnings,
+        **_cache_freshness_fields(data, rules_info),
     }
 
 
@@ -1980,6 +2093,7 @@ DISPATCH = {
     "host_status": handle_host_status,
     "host_repair": handle_host_repair,
     "cache_status": handle_cache_status,
+    "reextract_ids": handle_reextract_ids,
     "recent_deletes": handle_recent_deletes,
     "undo_delete": handle_undo_delete,
     "scan": handle_scan,
diff --git a/options-cache.js b/options-cache.js
index 8e2c5f2..fb8cc16 100644
--- a/options-cache.js
+++ b/options-cache.js
@@ -107,6 +107,35 @@ document.getElementById("setup-health-run").addEventListener("click", (event) =>
   })
 );
 
+// Three-state UX (docs/CACHE_CONTRACT.md): fresh / stale_by_rules / schema_mismatch.
+// Renders an inline banner above the per-remote list. Stale_by_rules adds a
+// "Re-extract IDs" button that triggers the fast rebuild without rclone.
+function renderCacheContractBanner(r) {
+  const state = r.cache_state;
+  if (r.rules_info_error) {
+    return `
⚠ rules lookup failed: ${escapeHtml(r.rules_info_error)}
`; + } + if (state === "fresh") { + return `
✓ Cache up to date with current ID rules.
`; + } + if (state === "stale_by_rules") { + const sigLine = r.id_rules_signature && r.id_rules_signature !== "legacy" + ? `
Cache signature: ${escapeHtml(String(r.id_rules_signature).slice(0, 22))}…
` + : `
Cache predates the two-tier contract (legacy header).
`; + return `
+ ! Cache is stale by rules. ID extraction rules have changed since this cache was built. Some jav_id values may be out of date. + ${sigLine} +
+
`; + } + if (state === "schema_mismatch") { + return `
+ ✗ Cache schema mismatch. The on-disk cache shape is incompatible (schema ${escapeHtml(r.cache_schema ?? "?")} vs expected ${escapeHtml(r.expected_cache_schema ?? "?")}). A full re-scan is required. +
`; + } + return ""; +} + document.getElementById("cache-status-run").addEventListener("click", async () => { const out = document.getElementById("cache-status-results"); out.textContent = "checking cache..."; @@ -136,6 +165,7 @@ document.getElementById("cache-status-run").addEventListener("click", async () = `
Configured target: ${escapeHtml((r.configured?.default_target || []).join(", ") || "(none)")}
`, `
Configured source: ${escapeHtml((r.configured?.default_source || []).join(", ") || "(none)")}
`,
     ];
+    rows.push(renderCacheContractBanner(r));
     for (const m of r.remotes || []) {
       const color = m.status === "never_scanned" || m.stale ? "#ffa" : "#afa";
       const state = m.status === "never_scanned" ? "never scanned" : `${m.status || (m.stale ? "stale" : "fresh")} · age ${fmtCacheAge(m.age_hours)}`;
diff --git a/options-library-issues.js b/options-library-issues.js
index 114de71..0d901e2 100644
--- a/options-library-issues.js
+++ b/options-library-issues.js
@@ -415,6 +415,38 @@ document.getElementById("library-issues-modal").addEventListener("click", (e) =>
     showSkipped.closest("div")?.after(panel);
     return;
   }
+  const reextract = event.target.closest(".cache-reextract");
+  if (reextract) {
+    const original = reextract.textContent;
+    reextract.disabled = true;
+    reextract.textContent = "Re-extracting…";
+    (async () => {
+      try {
+        const r = await chrome.runtime.sendMessage({ type: "reextract-ids" });
+        if (!r || !r.ok) {
+          reextract.textContent = original;
+          reextract.disabled = false;
+          const note = document.createElement("div");
+          note.style.cssText = "color:#faa;margin-top:6px;font-size:11px;";
+          note.textContent = `Re-extract failed: ${r?.error || "no response"}`;
+          reextract.after(note);
+          return;
+        }
+        const note = document.createElement("div");
+        note.style.cssText = "color:#afa;margin-top:6px;font-size:11px;";
+        note.textContent = `Re-extracted ${r.total ?? 0} IDs · ${r.changed ?? 0} changed · ${r.unchanged ?? 0} unchanged · ${r.dropped ?? 0} dropped. Re-run Check Cache to refresh this view.`;
+        reextract.replaceWith(note);
+      } catch (err) {
+        reextract.textContent = original;
+        reextract.disabled = false;
+        const note = document.createElement("div");
+        note.style.cssText = "color:#faa;margin-top:6px;font-size:11px;";
+        note.textContent = `Re-extract failed: ${err?.message || String(err)}`;
+        reextract.after(note);
+      }
+    })();
+    return;
+  }
   const refresh = event.target.closest(".cache-refresh-remote");
   if (refresh) {
     const remote = refresh.dataset.remote || "";