diff --git a/AGENTS.md b/AGENTS.md
index ddfb650..d4e3271 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -149,7 +149,13 @@ Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Window
 10. **`rc-jav.py` package split — done (sub-steps 10a–10i, shipped across two sessions).** Python repo at `D:\DEV\Project\rclone-jav\` is now git-tracked (baseline `e029e89`); `rc-jav.py` went from 2230 lines to a 25-line shim. New `rcjav/` package contains: `model.py` (24, FileEntry), `ids.py` (243, ID extraction + part detection + normalization + describe_id_match + expand_range), `cache.py` (76, cache.json I/O), `catalog.py` (178, WinCatalog CSV/XML), `dupes.py` (264, keep-ranking + find_dupes + variant alerts), `rclone_io.py` (298, subprocess wrappers + walk_remote + glob escaping), `library.py` (176, library-issues + safe rename), `output.py` (495, rich console + renderers + plain/CSV/JSON outputs), `cli.py` (845, main() + collectors + arg parsing). Pattern across all sub-steps: top-level mutable globals (`PART_RES`, `_KEEP_RANKING`, `BASIC`, `RCLONE_BIN`, `console`, `USE_ANSI`) are read/written only inside their owning module — callers go through setters (`configure_part_patterns`, `set_keep_ranking`, `set_basic`, `set_rclone_bin`, `set_console_no_color`, `set_use_ansi`) so no in-tree code ever sees a stale captured binding. `rc-jav.py` shim does `from rcjav import *` + `from rcjav.cli import main`, so `importlib.spec_from_file_location("rcjav_script", "rc-jav.py")` (used by tests/fixtures/native host) still finds every previously-top-level name. Each sub-step verified at commit time via `python rc-jav.py --help`, `python -m rcjav.cli --help`, `python fixtures/run.py` (17/17 cases), and `python -m unittest tests.test_rules` (5/5). (Step 10's cache-contract implementation is split off as step 10j below — design from step 9 is locked, implementation hasn't shipped.)
-- **Step 10j — Implement the cache contract from step 9.** Split today's single `CACHE_VERSION = 3` into `cache_schema` + `id_rules` + `id_rules_signature` inside `rcjav/cache.py`. Add the one-shot in-place migration for users on legacy `version: 3` (translates the header without forcing a rescan, marks the cache as "stale by rules"). Add a `rc-jav.py --reextract` flag that walks `cache["remotes"][r]["files"][:]` against the current rule set and updates `jav_id` in place — no rclone calls. Update `rcjav.cache.load_cache` to return the new header shape; update the `cache-status` message in the extension's `background.js` to surface `cache_schema_match` / `id_rules_match` / `id_rules_signature_match` flags. In `options-cache.js`, render the three-state UX from `docs/CACHE_CONTRACT.md` (fresh ✓ / stale-by-rules ! / schema-mismatch ✗) and wire the "Re-extract IDs" button to a new `reextract-ids` message that forwards to host. Update `fixtures/run.py` to also exercise the signature stability across rule changes (mutating a regex bumps the sha256 even when `id_rules` integer is forgotten).
+10j. **Cache contract implementation — done.** Two-tier contract from `docs/CACHE_CONTRACT.md` now live end-to-end across Python + host + extension.
+
+    Python (`rcjav/cache.py` + `rcjav/ids.py` + `rcjav/cli.py`): new constants `CACHE_SCHEMA_VERSION = 1` and `ID_RULES_VERSION = 1`. New `current_rules_signature()` in `rcjav.ids` produces a stable sha256 over the canonical text of every rule that influences a `jav_id` (PRIMARY_ID_RE, COMPOUND_ID_RE, FALLBACK_ID_RE, _NOHYPHEN_ID_RE, _BRACKET_ID_RE, _VARIANT_SUFFIX_RE, _XOFY_PRIORITY_RE, _RESOLUTION_TAG_RE, BUILTIN_PART_RES, PART_RES, FC2 handling toggle). `load_cache(signature)` translates legacy `version: 3` headers in place — no forced rescan; the cache is stamped `id_rules: 0` + signature `"legacy"` so it reads as "stale by rules". `cache_state(cache, sig)` classifies as `fresh` / `stale_by_rules` / `schema_mismatch`. `stamp_current_rules(cache, sig)` updates the header after a full scan or `--reextract`. New `rc-jav.py --reextract` walks `cache["remotes"][r]["files"]` against the live rule set and updates `jav_id` in place (no rclone). Full `--scan` (without `--scan-since`) stamps current rules; incremental `--scan --scan-since` deliberately does not. Verified on the live 7124-file cache.
+
+    Host (`rcjav-host.py`): new `--print-rules-info` flag on the Python side returns `{cache_schema, id_rules, id_rules_signature}` cheaply. Host memoizes the result per script path in `_RULES_INFO_CACHE` and augments `cache_status` responses with `cache_schema`, `id_rules`, `id_rules_signature`, the corresponding `expected_*` constants, three `*_match` booleans, and `cache_state` (`fresh` / `stale_by_rules` / `schema_mismatch` / `missing`). Legacy `version: 3` caches still on disk are reported as `stale_by_rules` with `cache_schema_match: true` (we'll migrate them at next `load_cache`). New `reextract_ids` action forwards to `rc-jav.py --reextract --format json` with a 5-minute timeout.
+
+    Extension (`background.js` + `options-cache.js` + `options-library-issues.js`): new `reextract-ids` message in `background.js` calls the host with a 300s timeout. `renderCacheContractBanner(r)` in `options-cache.js` paints the three-state inline banner above the per-remote list — green ✓ for fresh, amber ! for stale-by-rules (with a "Re-extract IDs (fast, no rescan)" chip button), red ✗ for schema mismatch. The delegated click handler in `options-library-issues.js` (which already owns the cache-status-results container) catches `.cache-reextract`, sends the message, shows a transient "Re-extracting…" state, and replaces the button with a per-remote summary line ("Re-extracted N IDs · X changed · Y unchanged · Z dropped"). `rules_info_error` from the host surfaces as a separate amber line above the banner.
 - **Step 11 — Host fast-path benchmark and decide.** Measure popup search latency under (a) idle Python and (b) Python actively scanning. If host fast path is the only thing keeping popup responsive under scan = narrow to dict lookup only and document. If not needed = delete entirely.
 **Architecture (locked — do not relitigate):**
diff --git a/background.js b/background.js
index 741288c..db6dfad 100644
--- a/background.js
+++ b/background.js
@@ -948,6 +948,19 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
     })();
     return true;
   }
+  if (msg.type === "reextract-ids") {
+    (async () => {
+      try {
+        const settings = await getSettings();
+        const r = await nativeCall({
+          action: "reextract_ids",
+          rcjav_path: settings.rcjavPath || "",
+        }, 300_000);
+        sendResponse(r);
+      } catch (e) { sendResponse({ ok: false, error: e.message, error_kind: classifyNativeError(e), extension_id: chrome.runtime.id }); }
+    })();
+    return true;
+  }
   if (msg.type === "host-status") {
     (async () => {
       try {
diff --git a/host/rcjav-host.py b/host/rcjav-host.py
index 75cb738..0632fc5 100644
--- a/host/rcjav-host.py
+++ b/host/rcjav-host.py
@@ -218,6 +218,40 @@ def run_rcjav(args: list[str], timeout: int = 120, extra_flags: list[str] | None
         return 1, "", f"spawn error: {e}"
+# Memoize rules-info per script path so handle_cache_status doesn't pay
+# the Python startup cost on every poll. Invalidate on rcjav_path change.
+_RULES_INFO_CACHE: dict[str, dict] = {}
+
+
+def fetch_rules_info(rcjav_path: str | None) -> dict:
+    """Get cache contract constants + current rules signature from rc-jav.py.
+
+    Returns {ok, cache_schema, id_rules, id_rules_signature} on success,
+    or {ok: False, error: ...} when the lookup fails. Memoized per script
+    path; the result is cached for the lifetime of this host process.
+    """
+    script = resolve_rcjav(rcjav_path)
+    key = str(script)
+    cached = _RULES_INFO_CACHE.get(key)
+    if cached is not None:
+        return cached
+    rc, stdout, stderr = run_rcjav(
+        ["--print-rules-info"],
+        extra_flags=[],  # NB: omit --basic / --no-color, --print-rules-info already JSON
+        rcjav_path=rcjav_path,
+        timeout=30,
+    )
+    if rc != 0:
+        info = {"ok": False, "error": (stderr or stdout or "unknown").strip()}
+    else:
+        try:
+            info = json.loads(stdout.strip())
+        except json.JSONDecodeError as e:
+            info = {"ok": False, "error": f"invalid rules-info JSON: {e}"}
+    _RULES_INFO_CACHE[key] = info
+    return info
+
+
 def part_pattern_args(payload: dict) -> list[str]:
     args: list[str] = []
     for pattern in payload.get("part_patterns") or []:
@@ -1398,6 +1432,83 @@ def _describe_skipped_id(path: str, remote: str = "") -> dict:
     return {"path": path, "name": name, "ext": ext, "reason": reason, "full_path": full_path}
+def _cache_freshness_fields(data: dict | None, rules_info: dict) -> dict:
+    """Build the cache-contract fields surfaced to the extension.
+
+    `data` is the parsed cache.json (or None when the file is missing).
+    `rules_info` is the dict returned by fetch_rules_info; when its
+    `ok` flag is False we report match-flags as None and let the UI
+    decide whether to show a "rules lookup failed" state.
+    """
+    out: dict = {
+        # Legacy field — preserved for any consumer that still reads it.
+        "version": (data or {}).get("version") if data else None,
+        # New two-tier contract:
+        "cache_schema": (data or {}).get("cache_schema") if data else None,
+        "id_rules": (data or {}).get("id_rules") if data else None,
+        "id_rules_signature": (data or {}).get("id_rules_signature") if data else None,
+        "expected_cache_schema": None,
+        "expected_id_rules": None,
+        "expected_id_rules_signature": None,
+        "cache_schema_match": None,
+        "id_rules_match": None,
+        "id_rules_signature_match": None,
+        "cache_state": None,  # 'fresh' | 'stale_by_rules' | 'schema_mismatch' | 'missing'
+        "rules_info_error": None,
+    }
+    if not rules_info.get("ok"):
+        out["rules_info_error"] = rules_info.get("error") or "rules lookup failed"
+        return out
+    out["expected_cache_schema"] = rules_info.get("cache_schema")
+    out["expected_id_rules"] = rules_info.get("id_rules")
+    out["expected_id_rules_signature"] = rules_info.get("id_rules_signature")
+    if data is None:
+        out["cache_state"] = "missing"
+        return out
+    # Legacy version:3 cache (pre-migration on disk): treat as stale_by_rules.
+    if "cache_schema" not in data and data.get("version") == 3:
+        out["cache_state"] = "stale_by_rules"
+        out["cache_schema_match"] = True  # we'll migrate at next load_cache
+        out["id_rules_match"] = False
+        out["id_rules_signature_match"] = False
+        return out
+    schema_match = data.get("cache_schema") == out["expected_cache_schema"]
+    rules_match = data.get("id_rules") == out["expected_id_rules"]
+    sig_match = data.get("id_rules_signature") == out["expected_id_rules_signature"]
+    out["cache_schema_match"] = schema_match
+    out["id_rules_match"] = rules_match
+    out["id_rules_signature_match"] = sig_match
+    if not schema_match:
+        out["cache_state"] = "schema_mismatch"
+    elif rules_match and sig_match:
+        out["cache_state"] = "fresh"
+    else:
+        out["cache_state"] = "stale_by_rules"
+    return out
+
+
+def handle_reextract_ids(payload: dict) -> dict:
+    """Trigger a fast re-extract of jav_ids against the current rule set.
+
+    No rclone calls — walks the on-disk cache.json. Stamps current rules +
+    signature into the cache on success so the next cache_status call
+    reports `cache_state: "fresh"`.
+    """
+    rc, stdout, stderr = run_rcjav(
+        ["--reextract", "--format", "json"],
+        extra_flags=[],  # --reextract already JSON; --basic would prepend noise
+        rcjav_path=payload.get("rcjav_path"),
+        timeout=300,
+    )
+    if rc != 0:
+        return {"ok": False, "error": (stderr or stdout or "unknown").strip()}
+    try:
+        result = json.loads(stdout.strip())
+    except json.JSONDecodeError as e:
+        return {"ok": False, "error": f"invalid reextract JSON: {e}"}
+    return result
+
+
 def handle_cache_status(payload: dict) -> dict:
     script = resolve_rcjav(payload.get("rcjav_path", ""))
     cache_path = script.parent / "cache.json"
@@ -1413,6 +1524,7 @@ def handle_cache_status(payload: dict) -> dict:
     stale_hours = _stale_hours(payload)
     scan_state = _read_scan_state()
     configured_roots = set((configured.get("default_source") or []) + (configured.get("default_target") or []))
+    rules_info = fetch_rules_info(payload.get("rcjav_path", ""))
     if not cache_path.exists():
         remotes = [{
             "remote": remote,
@@ -1431,6 +1543,7 @@ def handle_cache_status(payload: dict) -> dict:
             "stale_hours": stale_hours,
             "scan_state": scan_state,
             "remotes": remotes,
+            **_cache_freshness_fields(None, rules_info),
         }
     try:
         data = json.loads(cache_path.read_text(encoding="utf-8"))
@@ -1525,12 +1638,12 @@ def handle_cache_status(payload: dict) -> dict:
         "ok": True,
         "cache_exists": True,
         "cache_path": str(cache_path),
-        "version": data.get("version"),
         "stale_hours": stale_hours,
         "configured": configured,
         "scan_state": scan_state,
         "remotes": remotes,
         "warnings": warnings,
+        **_cache_freshness_fields(data, rules_info),
     }
@@ -1980,6 +2093,7 @@ DISPATCH = {
     "host_status": handle_host_status,
     "host_repair": handle_host_repair,
     "cache_status": handle_cache_status,
+    "reextract_ids": handle_reextract_ids,
     "recent_deletes": handle_recent_deletes,
     "undo_delete": handle_undo_delete,
     "scan": handle_scan,
diff --git a/options-cache.js b/options-cache.js
index 8e2c5f2..fb8cc16 100644
--- a/options-cache.js
+++ b/options-cache.js
@@ -107,6 +107,35 @@ document.getElementById("setup-health-run").addEventListener("click", (event) =>
   })
 );
+// Three-state UX (docs/CACHE_CONTRACT.md): fresh / stale_by_rules / schema_mismatch.
+// Renders an inline banner above the per-remote list. Stale_by_rules adds a
+// "Re-extract IDs" button that triggers the fast rebuild without rclone.
+function renderCacheContractBanner(r) {
+  const state = r.cache_state;
+  if (r.rules_info_error) {
+    return `
${escapeHtml(String(r.id_rules_signature).slice(0, 22))}…jav_id values may be out of date.
+ ${sigLine}
+
+