Step 10j (host + extension): cache contract three-state UX

Completes the two-tier cache contract from step 9 / docs/CACHE_CONTRACT.md
on the extension side. The Python side shipped in the Python repo at
33c495a.

Host (rcjav-host.py):
  - fetch_rules_info() memoizes per-script-path calls to
    `rc-jav.py --print-rules-info` so handle_cache_status doesn't pay
    the Python startup cost on every poll.
  - _cache_freshness_fields(data, rules_info) computes the new
    cache_schema / id_rules / id_rules_signature trio + their three
    *_match booleans + cache_state ('fresh' / 'stale_by_rules' /
    'schema_mismatch' / 'missing'). Legacy version:3 caches still on
    disk report as stale_by_rules with cache_schema_match=True (we'll
    migrate them at next load_cache).
  - New handle_reextract_ids() action forwards to
    `rc-jav.py --reextract --format json` with a 5-minute timeout.
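
The state decision described above can be restated as a small standalone function. This is a simplified sketch of the logic in `_cache_freshness_fields`, not the host code itself: `classify_cache` and `LEGACY_VERSION` are hypothetical names, and the `expected` dict stands in for what `fetch_rules_info()` returns.

```python
from typing import Optional

LEGACY_VERSION = 3  # pre-contract header value named in this commit


def classify_cache(data: Optional[dict], expected: dict) -> str:
    """Return 'missing', 'schema_mismatch', 'stale_by_rules' or 'fresh'."""
    if data is None:
        return "missing"
    # Legacy header: no cache_schema key and version == 3. Reported as stale
    # rather than mismatched, because load_cache can translate it in place.
    if "cache_schema" not in data and data.get("version") == LEGACY_VERSION:
        return "stale_by_rules"
    if data.get("cache_schema") != expected["cache_schema"]:
        return "schema_mismatch"
    rules_ok = data.get("id_rules") == expected["id_rules"]
    sig_ok = data.get("id_rules_signature") == expected["id_rules_signature"]
    return "fresh" if rules_ok and sig_ok else "stale_by_rules"


expected = {"cache_schema": 1, "id_rules": 1, "id_rules_signature": "abc"}
states = [
    classify_cache(None, expected),
    classify_cache({"version": 3}, expected),
    classify_cache({"cache_schema": 2, "id_rules": 1, "id_rules_signature": "abc"}, expected),
    classify_cache({"cache_schema": 1, "id_rules": 1, "id_rules_signature": "old"}, expected),
    classify_cache({"cache_schema": 1, "id_rules": 1, "id_rules_signature": "abc"}, expected),
]
print(states)
```

The schema check comes before the rules check on purpose: a schema mismatch needs a full rescan, so it must never be downgraded to the cheaper "stale by rules" state.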

background.js:
  - New `reextract-ids` message forwards to host with a 300s timeout.

options-cache.js + options-library-issues.js:
  - renderCacheContractBanner() paints a three-state banner above the
    per-remote list: green ✓ fresh / amber ! stale-by-rules (with
    "Re-extract IDs (fast, no rescan)" chip button) / red ✗ schema
    mismatch. Includes a snippet of the cache signature for diagnostics.
  - Delegated click handler in options-library-issues.js catches
    .cache-reextract, sends the message, shows a transient
    "Re-extracting…" state, and replaces the button with a one-line
    summary ("Re-extracted N IDs · X changed · Y unchanged · Z dropped").
  - rules_info_error from the host surfaces as its own amber line above
    the banner.
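
For orientation, the counters in that summary line could be produced by a walk like the one below. This is a minimal sketch under stated assumptions, not the shipped `--reextract` code: `extract_id` is a hypothetical stand-in for the real extractor in `rcjav.ids`, each file entry is assumed to carry a `path` key, `total` is assumed to count every entry visited, and `dropped` is assumed to mean "had an ID before, has none now".

```python
import re

# Hypothetical stand-in for the real extractor; the actual rules are richer.
_ID_RE = re.compile(r"([A-Za-z]{2,6})-(\d{2,5})")


def extract_id(name: str):
    m = _ID_RE.search(name)
    return f"{m.group(1).upper()}-{m.group(2)}" if m else None


def reextract(cache: dict) -> dict:
    """Recompute jav_id for every cached file in place; no rclone calls."""
    summary = {"ok": True, "total": 0, "changed": 0, "unchanged": 0, "dropped": 0}
    for remote in cache.get("remotes", {}).values():
        for entry in remote.get("files", []):
            old = entry.get("jav_id")
            new = extract_id(entry.get("path", ""))
            summary["total"] += 1
            if new == old:
                summary["unchanged"] += 1
            elif new is None:
                summary["dropped"] += 1
            else:
                summary["changed"] += 1
            entry["jav_id"] = new
    return summary


cache = {"remotes": {"r1": {"files": [
    {"path": "abc-123.mp4", "jav_id": "ABC-123"},       # same ID as before
    {"path": "xyz-045 part1.mkv", "jav_id": "XYZ-45"},  # rules now keep the zero
    {"path": "notes.txt", "jav_id": "NOTE-1"},          # no longer matches
]}}}
summary = reextract(cache)
print(summary)
```

The returned keys match the fields the click handler reads (`total`, `changed`, `unchanged`, `dropped`), which is why the sketch returns a flat dict.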

node --check passes on background.js, options-cache.js,
options-library-issues.js individually and on the concatenation of all
four script files. python -m py_compile passes on rcjav-host.py.
Behavioral verification requires reloading the unpacked extension and
running through:
  - Check Cache → banner shows "stale by rules" amber (legacy v3 cache)
  - Click "Re-extract IDs" → fast path runs, summary appears
  - Check Cache again → banner now shows "Cache up to date" green

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
admin
2026-05-23 11:07:26 +02:00
parent 1a57b6a4f9
commit c17ac9e1e7
5 changed files with 197 additions and 2 deletions
+7 -1
@@ -149,7 +149,13 @@ Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Window
10. **`rc-jav.py` package split — done (sub-steps 10a–10i, shipped across two sessions).** Python repo at `D:\DEV\Project\rclone-jav\` is now git-tracked (baseline `e029e89`); `rc-jav.py` went from 2230 lines to a 25-line shim. New `rcjav/` package contains: `model.py` (24, FileEntry), `ids.py` (243, ID extraction + part detection + normalization + describe_id_match + expand_range), `cache.py` (76, cache.json I/O), `catalog.py` (178, WinCatalog CSV/XML), `dupes.py` (264, keep-ranking + find_dupes + variant alerts), `rclone_io.py` (298, subprocess wrappers + walk_remote + glob escaping), `library.py` (176, library-issues + safe rename), `output.py` (495, rich console + renderers + plain/CSV/JSON outputs), `cli.py` (845, main() + collectors + arg parsing). Pattern across all sub-steps: top-level mutable globals (`PART_RES`, `_KEEP_RANKING`, `BASIC`, `RCLONE_BIN`, `console`, `USE_ANSI`) are read/written only inside their owning module — callers go through setters (`configure_part_patterns`, `set_keep_ranking`, `set_basic`, `set_rclone_bin`, `set_console_no_color`, `set_use_ansi`) so no in-tree code ever sees a stale captured binding. `rc-jav.py` shim does `from rcjav import *` + `from rcjav.cli import main`, so `importlib.spec_from_file_location("rcjav_script", "rc-jav.py")` (used by tests/fixtures/native host) still finds every previously-top-level name. Each sub-step verified at commit time via `python rc-jav.py --help`, `python -m rcjav.cli --help`, `python fixtures/run.py` (17/17 cases), and `python -m unittest tests.test_rules` (5/5).
(Step 10's cache-contract implementation is split off as step 10j below — design from step 9 is locked, implementation hasn't shipped.)
- **Step 10j — Implement the cache contract from step 9.** Split today's single `CACHE_VERSION = 3` into `cache_schema` + `id_rules` + `id_rules_signature` inside `rcjav/cache.py`. Add the one-shot in-place migration for users on legacy `version: 3` (translates the header without forcing a rescan, marks the cache as "stale by rules"). Add a `rc-jav.py --reextract` flag that walks `cache["remotes"][r]["files"][:]` against the current rule set and updates `jav_id` in place — no rclone calls. Update `rcjav.cache.load_cache` to return the new header shape; update the `cache-status` message in the extension's `background.js` to surface `cache_schema_match` / `id_rules_match` / `id_rules_signature_match` flags. In `options-cache.js`, render the three-state UX from `docs/CACHE_CONTRACT.md` (fresh ✓ / stale-by-rules ! / schema-mismatch ✗) and wire the "Re-extract IDs" button to a new `reextract-ids` message that forwards to host. Update `fixtures/run.py` to also exercise the signature stability across rule changes (mutating a regex bumps the sha256 even when `id_rules` integer is forgotten).
10j. **Cache contract implementation — done.** Two-tier contract from `docs/CACHE_CONTRACT.md` now live end-to-end across Python + host + extension.
Python (`rcjav/cache.py` + `rcjav/ids.py` + `rcjav/cli.py`): new constants `CACHE_SCHEMA_VERSION = 1` and `ID_RULES_VERSION = 1`. New `current_rules_signature()` in `rcjav.ids` produces a stable sha256 over the canonical text of every rule that influences a `jav_id` (PRIMARY_ID_RE, COMPOUND_ID_RE, FALLBACK_ID_RE, _NOHYPHEN_ID_RE, _BRACKET_ID_RE, _VARIANT_SUFFIX_RE, _XOFY_PRIORITY_RE, _RESOLUTION_TAG_RE, BUILTIN_PART_RES, PART_RES, FC2 handling toggle). `load_cache(signature)` translates legacy `version: 3` headers in place — no forced rescan; the cache is stamped `id_rules: 0` + signature `"legacy"` so it reads as "stale by rules". `cache_state(cache, sig)` classifies as `fresh` / `stale_by_rules` / `schema_mismatch`. `stamp_current_rules(cache, sig)` updates the header after a full scan or `--reextract`. New `rc-jav.py --reextract` walks `cache["remotes"][r]["files"]` against the live rule set and updates `jav_id` in place (no rclone). Full `--scan` (without `--scan-since`) stamps current rules; incremental `--scan --scan-since` deliberately does not. Verified on the live 7124-file cache.
Host (`rcjav-host.py`): new `--print-rules-info` flag on the Python side returns `{cache_schema, id_rules, id_rules_signature}` cheaply. Host memoizes the result per script path in `_RULES_INFO_CACHE` and augments `cache_status` responses with `cache_schema`, `id_rules`, `id_rules_signature`, the corresponding `expected_*` constants, three `*_match` booleans, and `cache_state` (`fresh` / `stale_by_rules` / `schema_mismatch` / `missing`). Legacy `version: 3` caches still on disk are reported as `stale_by_rules` with `cache_schema_match: true` (we'll migrate them at next `load_cache`). New `reextract_ids` action forwards to `rc-jav.py --reextract --format json` with a 5-minute timeout.
Extension (`background.js` + `options-cache.js` + `options-library-issues.js`): new `reextract-ids` message in `background.js` calls the host with a 300s timeout. `renderCacheContractBanner(r)` in `options-cache.js` paints the three-state inline banner above the per-remote list — green ✓ for fresh, amber ! for stale-by-rules (with a "Re-extract IDs (fast, no rescan)" chip button), red ✗ for schema mismatch. The delegated click handler in `options-library-issues.js` (which already owns the cache-status-results container) catches `.cache-reextract`, sends the message, shows a transient "Re-extracting…" state, and replaces the button with a per-remote summary line ("Re-extracted N IDs · X changed · Y unchanged · Z dropped"). `rules_info_error` from the host surfaces as a separate amber line above the banner.
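
The signature idea in the Python paragraph above can be illustrated with a short sketch. It is not `current_rules_signature()` itself: `rules_signature`, its parameters, and the choice of sorted compact JSON as the canonical form are assumptions made for illustration. The point it demonstrates is that editing a regex changes the hash even if nobody bumps the `id_rules` integer.

```python
import hashlib
import json
import re


def rules_signature(patterns: dict, fc2_enabled: bool) -> str:
    """Hash the canonical text of every rule that can influence an extracted ID."""
    canonical = json.dumps(
        {
            "fc2": fc2_enabled,
            # Pattern source and flags both affect matching, so both are hashed.
            "patterns": {name: [rx.pattern, rx.flags] for name, rx in patterns.items()},
        },
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # fixed separators keep the text canonical
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"PRIMARY_ID_RE": re.compile(r"([A-Z]{2,6})-(\d{3,5})", re.I)}
edited = {"PRIMARY_ID_RE": re.compile(r"([A-Z]{2,7})-(\d{3,5})", re.I)}

sig_base = rules_signature(base, fc2_enabled=True)
sig_same = rules_signature(dict(base), fc2_enabled=True)
sig_edited = rules_signature(edited, fc2_enabled=True)
print(sig_base == sig_same, sig_base == sig_edited)  # prints: True False
```

Hashing the pattern text rather than the compiled object keeps the digest stable across processes, which is what lets the host compare the on-disk signature with the one reported by `--print-rules-info`.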
- **Step 11 — Host fast-path benchmark and decide.** Measure popup search latency under (a) idle Python and (b) Python actively scanning. If host fast path is the only thing keeping popup responsive under scan = narrow to dict lookup only and document. If not needed = delete entirely.
**Architecture (locked — do not relitigate):**
+13
@@ -948,6 +948,19 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
    })();
    return true;
  }
  if (msg.type === "reextract-ids") {
    (async () => {
      try {
        const settings = await getSettings();
        const r = await nativeCall({
          action: "reextract_ids",
          rcjav_path: settings.rcjavPath || "",
        }, 300_000);
        sendResponse(r);
      } catch (e) { sendResponse({ ok: false, error: e.message, error_kind: classifyNativeError(e), extension_id: chrome.runtime.id }); }
    })();
    return true;
  }
  if (msg.type === "host-status") {
    (async () => {
      try {
+115 -1
@@ -218,6 +218,40 @@ def run_rcjav(args: list[str], timeout: int = 120, extra_flags: list[str] | None
        return 1, "", f"spawn error: {e}"


# Memoize rules-info per script path so handle_cache_status doesn't pay
# the Python startup cost on every poll. Invalidate on rcjav_path change.
_RULES_INFO_CACHE: dict[str, dict] = {}


def fetch_rules_info(rcjav_path: str | None) -> dict:
    """Get cache contract constants + current rules signature from rc-jav.py.

    Returns {ok, cache_schema, id_rules, id_rules_signature} on success,
    or {ok: False, error: ...} when the lookup fails. Memoized per script
    path; the result is cached for the lifetime of this host process.
    """
    script = resolve_rcjav(rcjav_path)
    key = str(script)
    cached = _RULES_INFO_CACHE.get(key)
    if cached is not None:
        return cached
    rc, stdout, stderr = run_rcjav(
        ["--print-rules-info"],
        extra_flags=[],  # NB: omit --basic / --no-color, --print-rules-info already JSON
        rcjav_path=rcjav_path,
        timeout=30,
    )
    if rc != 0:
        info = {"ok": False, "error": (stderr or stdout or "unknown").strip()}
    else:
        try:
            info = json.loads(stdout.strip())
        except json.JSONDecodeError as e:
            info = {"ok": False, "error": f"invalid rules-info JSON: {e}"}
    _RULES_INFO_CACHE[key] = info
    return info


def part_pattern_args(payload: dict) -> list[str]:
    args: list[str] = []
    for pattern in payload.get("part_patterns") or []:
@@ -1398,6 +1432,83 @@ def _describe_skipped_id(path: str, remote: str = "") -> dict:
    return {"path": path, "name": name, "ext": ext, "reason": reason, "full_path": full_path}


def _cache_freshness_fields(data: dict | None, rules_info: dict) -> dict:
    """Build the cache-contract fields surfaced to the extension.

    `data` is the parsed cache.json (or None when the file is missing).
    `rules_info` is the dict returned by fetch_rules_info; when its
    `ok` flag is False we report match-flags as None and let the UI
    decide whether to show a "rules lookup failed" state.
    """
    out: dict = {
        # Legacy field — preserved for any consumer that still reads it.
        "version": (data or {}).get("version") if data else None,
        # New two-tier contract:
        "cache_schema": (data or {}).get("cache_schema") if data else None,
        "id_rules": (data or {}).get("id_rules") if data else None,
        "id_rules_signature": (data or {}).get("id_rules_signature") if data else None,
        "expected_cache_schema": None,
        "expected_id_rules": None,
        "expected_id_rules_signature": None,
        "cache_schema_match": None,
        "id_rules_match": None,
        "id_rules_signature_match": None,
        "cache_state": None,  # 'fresh' | 'stale_by_rules' | 'schema_mismatch' | 'missing'
        "rules_info_error": None,
    }
    if not rules_info.get("ok"):
        out["rules_info_error"] = rules_info.get("error") or "rules lookup failed"
        return out
    out["expected_cache_schema"] = rules_info.get("cache_schema")
    out["expected_id_rules"] = rules_info.get("id_rules")
    out["expected_id_rules_signature"] = rules_info.get("id_rules_signature")
    if data is None:
        out["cache_state"] = "missing"
        return out
    # Legacy version:3 cache (pre-migration on disk): treat as stale_by_rules.
    if "cache_schema" not in data and data.get("version") == 3:
        out["cache_state"] = "stale_by_rules"
        out["cache_schema_match"] = True  # we'll migrate at next load_cache
        out["id_rules_match"] = False
        out["id_rules_signature_match"] = False
        return out
    schema_match = data.get("cache_schema") == out["expected_cache_schema"]
    rules_match = data.get("id_rules") == out["expected_id_rules"]
    sig_match = data.get("id_rules_signature") == out["expected_id_rules_signature"]
    out["cache_schema_match"] = schema_match
    out["id_rules_match"] = rules_match
    out["id_rules_signature_match"] = sig_match
    if not schema_match:
        out["cache_state"] = "schema_mismatch"
    elif rules_match and sig_match:
        out["cache_state"] = "fresh"
    else:
        out["cache_state"] = "stale_by_rules"
    return out


def handle_reextract_ids(payload: dict) -> dict:
    """Trigger a fast re-extract of jav_ids against the current rule set.

    No rclone calls — walks the on-disk cache.json. Stamps current rules
    + signature into the cache on success so the next cache_status call
    reports `cache_state: "fresh"`.
    """
    rc, stdout, stderr = run_rcjav(
        ["--reextract", "--format", "json"],
        extra_flags=[],  # --reextract already JSON; --basic would prepend noise
        rcjav_path=payload.get("rcjav_path"),
        timeout=300,
    )
    if rc != 0:
        return {"ok": False, "error": (stderr or stdout or "unknown").strip()}
    try:
        result = json.loads(stdout.strip())
    except json.JSONDecodeError as e:
        return {"ok": False, "error": f"invalid reextract JSON: {e}"}
    return result


def handle_cache_status(payload: dict) -> dict:
    script = resolve_rcjav(payload.get("rcjav_path", ""))
    cache_path = script.parent / "cache.json"
@@ -1413,6 +1524,7 @@ def handle_cache_status(payload: dict) -> dict:
    stale_hours = _stale_hours(payload)
    scan_state = _read_scan_state()
    configured_roots = set((configured.get("default_source") or []) + (configured.get("default_target") or []))
    rules_info = fetch_rules_info(payload.get("rcjav_path", ""))
    if not cache_path.exists():
        remotes = [{
            "remote": remote,
@@ -1431,6 +1543,7 @@ def handle_cache_status(payload: dict) -> dict:
            "stale_hours": stale_hours,
            "scan_state": scan_state,
            "remotes": remotes,
            **_cache_freshness_fields(None, rules_info),
        }
    try:
        data = json.loads(cache_path.read_text(encoding="utf-8"))
@@ -1525,12 +1638,12 @@ def handle_cache_status(payload: dict) -> dict:
        "ok": True,
        "cache_exists": True,
        "cache_path": str(cache_path),
        "version": data.get("version"),
        "stale_hours": stale_hours,
        "configured": configured,
        "scan_state": scan_state,
        "remotes": remotes,
        "warnings": warnings,
        **_cache_freshness_fields(data, rules_info),
    }
@@ -1980,6 +2093,7 @@ DISPATCH = {
    "host_status": handle_host_status,
    "host_repair": handle_host_repair,
    "cache_status": handle_cache_status,
    "reextract_ids": handle_reextract_ids,
    "recent_deletes": handle_recent_deletes,
    "undo_delete": handle_undo_delete,
    "scan": handle_scan,
+30
@@ -107,6 +107,35 @@ document.getElementById("setup-health-run").addEventListener("click", (event) =>
  })
);

// Three-state UX (docs/CACHE_CONTRACT.md): fresh / stale_by_rules / schema_mismatch.
// Renders an inline banner above the per-remote list. Stale_by_rules adds a
// "Re-extract IDs" button that triggers the fast rebuild without rclone.
function renderCacheContractBanner(r) {
  const state = r.cache_state;
  if (r.rules_info_error) {
    return `<div style="margin-top:10px;padding:6px 8px;background:rgba(255,200,50,.08);border:1px solid rgba(255,200,50,.25);border-radius:4px;color:#ffa;">⚠ rules lookup failed: ${escapeHtml(r.rules_info_error)}</div>`;
  }
  if (state === "fresh") {
    return `<div style="margin-top:10px;padding:6px 8px;background:rgba(120,200,120,.08);border:1px solid rgba(120,200,120,.25);border-radius:4px;color:#afa;">✓ Cache up to date with current ID rules.</div>`;
  }
  if (state === "stale_by_rules") {
    const sigLine = r.id_rules_signature && r.id_rules_signature !== "legacy"
      ? `<div style="color:#999;font-size:11px;margin-top:3px;">Cache signature: <code>${escapeHtml(String(r.id_rules_signature).slice(0, 22))}…</code></div>`
      : `<div style="color:#999;font-size:11px;margin-top:3px;">Cache predates the two-tier contract (legacy header).</div>`;
    return `<div style="margin-top:10px;padding:8px 10px;background:rgba(255,200,50,.08);border:1px solid rgba(255,200,50,.3);border-radius:4px;color:#ffa;">
      ! <strong>Cache is stale by rules.</strong> ID extraction rules have changed since this cache was built. Some <code>jav_id</code> values may be out of date.
      ${sigLine}
      <div style="margin-top:8px;"><button class="chip-btn cache-reextract" type="button" style="color:#ffd97a;background:rgba(255,200,50,.12);border-color:rgba(255,200,50,.35);font-weight:600;">Re-extract IDs (fast, no rescan)</button></div>
    </div>`;
  }
  if (state === "schema_mismatch") {
    return `<div style="margin-top:10px;padding:8px 10px;background:rgba(255,120,120,.08);border:1px solid rgba(255,120,120,.3);border-radius:4px;color:#faa;">
      <strong>Cache schema mismatch.</strong> The on-disk cache shape is incompatible (schema ${escapeHtml(r.cache_schema ?? "?")} vs expected ${escapeHtml(r.expected_cache_schema ?? "?")}). A full re-scan is required.
    </div>`;
  }
  return "";
}

document.getElementById("cache-status-run").addEventListener("click", async () => {
  const out = document.getElementById("cache-status-results");
  out.textContent = "checking cache...";
@@ -136,6 +165,7 @@ document.getElementById("cache-status-run").addEventListener("click", async () =
    `<div><span style="color:#777;">Configured target:</span> ${escapeHtml((r.configured?.default_target || []).join(", ") || "(none)")}</div>`,
    `<div><span style="color:#777;">Configured source:</span> ${escapeHtml((r.configured?.default_source || []).join(", ") || "(none)")}</div>`,
  ];
  rows.push(renderCacheContractBanner(r));
  for (const m of r.remotes || []) {
    const color = m.status === "never_scanned" || m.stale ? "#ffa" : "#afa";
    const state = m.status === "never_scanned" ? "never scanned" : `${m.status || (m.stale ? "stale" : "fresh")} · age ${fmtCacheAge(m.age_hours)}`;
+32
@@ -415,6 +415,38 @@ document.getElementById("library-issues-modal").addEventListener("click", (e) =>
    showSkipped.closest("div")?.after(panel);
    return;
  }
  const reextract = event.target.closest(".cache-reextract");
  if (reextract) {
    const original = reextract.textContent;
    reextract.disabled = true;
    reextract.textContent = "Re-extracting…";
    (async () => {
      try {
        const r = await chrome.runtime.sendMessage({ type: "reextract-ids" });
        if (!r || !r.ok) {
          reextract.textContent = original;
          reextract.disabled = false;
          const note = document.createElement("div");
          note.style.cssText = "color:#faa;margin-top:6px;font-size:11px;";
          note.textContent = `Re-extract failed: ${r?.error || "no response"}`;
          reextract.after(note);
          return;
        }
        const note = document.createElement("div");
        note.style.cssText = "color:#afa;margin-top:6px;font-size:11px;";
        note.textContent = `Re-extracted ${r.total ?? 0} IDs · ${r.changed ?? 0} changed · ${r.unchanged ?? 0} unchanged · ${r.dropped ?? 0} dropped. Re-run Check Cache to refresh this view.`;
        reextract.replaceWith(note);
      } catch (err) {
        reextract.textContent = original;
        reextract.disabled = false;
        const note = document.createElement("div");
        note.style.cssText = "color:#faa;margin-top:6px;font-size:11px;";
        note.textContent = `Re-extract failed: ${err?.message || String(err)}`;
        reextract.after(note);
      }
    })();
    return;
  }
  const refresh = event.target.closest(".cache-refresh-remote");
  if (refresh) {
    const remote = refresh.dataset.remote || "";