Step 9: cache contract design doc
Adds docs/CACHE_CONTRACT.md defining the two-tier replacement for
today's single CACHE_VERSION=3 constant:
- cache_schema: force rebuild on mismatch (today's semantics)
- id_rules: mark stale, allow lazy re-extract w/o rescan
- id_rules_signature: sha256 over canonical text of all extraction
  rule sources (regexes, normalizers, part detectors, FC2
  handling, user-config rules) as a belt-and-braces drift check
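A minimal Python sketch of how the three fields could combine. It assumes a canonical form that this message does not spell out: sources sorted by name, line endings normalized to LF, trailing whitespace stripped, and each source length-prefixed so that text moved between sources still changes the hash. `CACHE_SCHEMA`, `ID_RULES`, `id_rules_signature()`, `CacheState` and `classify()` are hypothetical names, and the numeric values are placeholders. Checking `cache_schema` first keeps today's force-rebuild behavior intact; the signature only matters once the schema already matches.

```python
import hashlib
from enum import Enum

# Placeholder values for illustration only; not the contract's real numbers.
CACHE_SCHEMA = 3
ID_RULES = 1


def id_rules_signature(rule_sources: dict[str, str]) -> str:
    """sha256 over an assumed canonical text form of every extraction-rule source."""
    h = hashlib.sha256()
    for name in sorted(rule_sources):  # order-independent: sort by source name
        text = rule_sources[name].replace("\r\n", "\n").replace("\r", "\n")
        data = "\n".join(line.rstrip() for line in text.split("\n")).encode("utf-8")
        # Length-prefixed framing: moving text from one source to another
        # still changes the digest.
        h.update(name.encode("utf-8") + b"\0" + str(len(data)).encode("ascii") + b"\0")
        h.update(data)
    return h.hexdigest()


class CacheState(Enum):
    FRESH = "fresh"                       # no badge
    STALE_BY_RULES = "stale-by-rules"     # amber: re-extract IDs, no rescan
    SCHEMA_MISMATCH = "schema-mismatch"   # red: full rebuild required


def classify(header: dict, current_signature: str) -> CacheState:
    # Tier 1: any schema difference forces a rebuild (today's CACHE_VERSION semantics).
    if header.get("cache_schema") != CACHE_SCHEMA:
        return CacheState.SCHEMA_MISMATCH
    # Tier 2: a rules bump, or drift the bump missed, only marks the cache stale.
    if (header.get("id_rules") != ID_RULES
            or header.get("id_rules_signature") != current_signature):
        return CacheState.STALE_BY_RULES
    return CacheState.FRESH
```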
Documents:
- new cache.json header shape
- one-shot in-place migration for legacy `version: 3` users (no
forced rescan)
- behavior matrix for the three resulting states
- extension UX: fresh / stale-by-rules amber / schema-mismatch red
- new "Re-extract IDs" action that walks files[] in place and
never touches rclone
- what counts as a rules change vs. unrelated code change
- open questions deferred to step 10 (per-remote tracking,
custom-rules signature handling, host wiring)
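A sketch of the two in-place operations, the legacy migration and the "Re-extract IDs" walk. It assumes cache.json is a JSON object whose `files[]` entries carry `name` and `id` keys (hypothetical field names). It also assumes a migration mapping that this message does not specify: the legacy layout is kept, `version` is dropped, and `id_rules` is left unknown so a migrated cache reads as stale-by-rules rather than schema-mismatch. `extract_id` is passed in, so nothing in the walk lists a remote or calls rclone. All function names and constant values are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Placeholder values for illustration only.
CACHE_SCHEMA = 3
ID_RULES = 1


def migrate_legacy_header(cache: dict) -> bool:
    """One-shot upgrade of a legacy {"version": 3} header; True if anything changed.

    Assumption: the file layout is unchanged, so only header keys move.
    id_rules is left unknown, which makes the cache stale-by-rules
    (re-extract) rather than schema-mismatch (rescan).
    """
    if "cache_schema" in cache or cache.get("version") != 3:
        return False
    del cache["version"]
    cache["cache_schema"] = CACHE_SCHEMA
    cache["id_rules"] = None
    cache["id_rules_signature"] = None
    return True


def reextract_ids(cache: dict, extract_id: Callable[[str], Optional[str]],
                  signature: str) -> int:
    """Re-run extract_id over files[] in place; lists no remote, calls no rclone.

    Returns the number of entries whose ID changed.
    """
    if cache.get("cache_schema") != CACHE_SCHEMA:
        raise ValueError("schema mismatch: a full rebuild is required, not a re-extract")
    changed = 0
    for entry in cache.get("files", []):
        new_id = extract_id(entry["name"])
        if new_id != entry.get("id"):
            entry["id"] = new_id
            changed += 1
    # Stamp the header so the cache reads as fresh again.
    cache["id_rules"] = ID_RULES
    cache["id_rules_signature"] = signature
    return changed


def reextract_file(path: Path, extract_id: Callable[[str], Optional[str]],
                   signature: str) -> int:
    cache = json.loads(path.read_text(encoding="utf-8"))
    migrate_legacy_header(cache)
    changed = reextract_ids(cache, extract_id, signature)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(cache), encoding="utf-8")
    tmp.replace(path)  # swap in the rewritten file with a single rename
    return changed
```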
No code changes — step 10 implements. This commit only locks the
contract so step 10 has a single source of truth for both the
Python and extension sides.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -139,14 +139,14 @@ Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Window
6b. **options.js split — Library Issues extraction.** `options.js` 2356 → 1903 lines. New file: `options-library-issues.js` (453 lines) — covers `lastLibraryIssues`, `_libraryIssuesDirty`, `renderLibraryIssues`, `_closeLibraryIssues`, and the bottom IIFE that wraps `_optScanTimer` / `_setOptScanningState` / `_pollOptProgress` for optimization-scan progress polling. Block was fully self-contained (no external callers of its identifiers). Reads `_configuredScanRoots` / `_cacheSkippedByRemote` / calls `rememberConfiguredScanRoots` from `options-cache.js` — same cross-file binding pattern proven in step 6. Script-tag order in `options.html`: cache → dupe-review → library-issues → options.js. `node --check` passes on each file and on concatenation; line count of concat (3133) matches pre-split total exactly.
7a. **Bulk Check standalone window.** New `bulk-check.{html,js,css}` opened as detached `chrome.windows.create({ type: 'popup', width: 640, height: 540 })`. Launcher = 📋 icon button in popup header next to ⚙ Options; click sends `open-bulk-check` message to background and closes the popup. Background owns the lifecycle: `openBulkCheckWindow()` reads `chrome.storage.session.bulkCheckWindowId`; existing id → `chrome.windows.update({ focused, drawAttention })`; failure or no id → create new window + stash id. `chrome.windows.onRemoved` clears the stale id on close. Last-paste persisted to `chrome.storage.local.bulkCheckLastPaste` (debounced 500ms), restored on window open. `quickMode` read from settings on each run (parity with old options behavior). Removed the Bulk ID Check fieldset from `options.html` (Library Review pane description updated to note the relocation) and its handlers from `options.js` (1903 → 1852 lines). No manifest permission changes needed.
8. **Shared fixture corpus.** Seeded `D:\DEV\Project\rclone-jav\fixtures\` (top-level in the Python repo, conceptually shared with this extension). Files: `filename-extraction.json` (12 cases, Python `extract_id` contract), `query-extraction.json` (10 cases, extension `content.js` `normalizeId` contract), `shared-normalization.json` (5 cases, both sides must agree), `README.md`, and a self-contained Python runner `run.py` (no third-party deps; imports `rc-jav.py` in place). All 17 Python-side cases pass against current `rc-jav.py`. The runner uses `|` and `->` instead of `·` and `→` so it works on Windows cp1252 consoles. Documented one intentional divergence: the extension normalizes the compact `FC2PPV1841460` form (page-title surface) while Python `extract_id` does not (filename surface — compact form doesn't appear on disk). No Node-side runner today — `content.js` lives in an injected IIFE and importing it would require duplicating regexes; the JSON corpus is the canonical spec until that lands.
9. **Cache contract design — shipped as a design doc, not code.** `docs/CACHE_CONTRACT.md` defines a two-tier model that splits today's single `CACHE_VERSION = 3` into `cache_schema` (force rebuild on mismatch) and `id_rules` (mark stale, allow lazy re-extract without re-scanning). Adds `id_rules_signature` (sha256 over canonical text of all extraction-rule sources, including user-added normalizers from config.json) as a belt-and-braces drift check. Specifies the new cache header shape, a one-shot in-place migration for users on legacy `version: 3` (no forced rescan), the behavior matrix for the three resulting states, and the extension's three-state UX (fresh / stale-by-rules amber / schema-mismatch red) with a new "Re-extract IDs" action that walks `files[]` in place and never touches rclone. Step 10 implements; step 9 only locks the contract.
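Item 8's runner imports `rc-jav.py` in place even though the hyphen rules out a plain `import`. One standard-library route is `importlib.util.spec_from_file_location`, used in this minimal sketch of such a runner. It assumes each fixture file is a JSON list of `{"input": ..., "expected": ...}` objects and that `extract_id` takes a filename and returns an ID or `None`; both the schema and the signature are assumptions, and `load_script`, `run_cases` and `main` are hypothetical names.

```python
import importlib.util
import json
from pathlib import Path
from types import ModuleType
from typing import Callable, Optional


def load_script(path: Path, name: str = "rc_jav") -> ModuleType:
    """Import a file whose name is not a valid module name (e.g. rc-jav.py)."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # __name__ is "rc_jav" here, so an `if __name__ == "__main__"` guard stays off.
    spec.loader.exec_module(module)
    return module


def run_cases(extract: Callable[[str], Optional[str]], cases: list[dict]) -> list[str]:
    """Return one failure line per case whose result differs from `expected`."""
    failures = []
    for case in cases:
        got = extract(case["input"])
        if got != case["expected"]:
            # '|' and '->' as separators so cp1252 consoles can print the line.
            failures.append(f"FAIL | {case['input']} -> {got!r} (expected {case['expected']!r})")
    return failures


def main(fixtures_dir: Path, script: Path) -> int:
    rcjav = load_script(script)
    failures: list[str] = []
    # The two fixture files the Python side is responsible for.
    for fixture in ("filename-extraction.json", "shared-normalization.json"):
        cases = json.loads((fixtures_dir / fixture).read_text(encoding="utf-8"))
        failures += run_cases(rcjav.extract_id, cases)
    print("\n".join(failures) or "all cases pass")
    return 1 if failures else 0
```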
(Step 4 in the plan is a paired-extraction sub-task of step 6; it was folded into step 6 when that shipped.)

**Pending (in execution order):**
- **Step 6c — finish options.js split (optional).** Remaining options.js (1852 lines) still holds: settings load/save, backup/restore, recent activity, search test bench, adapters, ID normalizers, part detectors, element picker, overlay previews, diagnostics, profiles, paths, and the bottom-entry IIFE. Candidates for extraction: Diagnostics (~250 lines), Profiles (~265 lines), Adapters + ID normalizers + Part detectors as a "rules editors" file (~330 lines combined). Diminishing returns past this point — bottom IIFE + load/save core should stay in `options.js` as the entry point.
- **Step 9 — Cache contract design.** CACHE_VERSION already exists (currently 3). Add ID_RULES_VERSION concept: schema bump = force rebuild, rules bump = warn-and-mark-stale.
- **Step 10 — `rc-jav.py` module split** into `rcjav/` package (ids, cache, dupes, catalog, rclone_io, output, cli). Keep `rc-jav.py` as thin entrypoint that imports from `rcjav.cli.main`. Step 10 is also where the cache-contract design from step 9 gets implemented: split `CACHE_VERSION` into `cache_schema` + `id_rules` + `id_rules_signature`, add the legacy-`version: 3` in-place migration, add a `--reextract` CLI flag that walks `files[]` without re-listing remotes, and update the extension's `cache-status` consumer (`options-cache.js`) to render the three-state UX from `docs/CACHE_CONTRACT.md`.
- **Step 11 — Host fast-path benchmark and decide.** Measure popup search latency under (a) idle Python and (b) Python actively scanning. If the host fast path is the only thing keeping the popup responsive during a scan, narrow it to a dict lookup only and document that. If it is not needed, delete it entirely.
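For step 11, a minimal timing harness: `measure()` samples a search callable repeatedly and reports p50, p95 and max latency, and `decide()` applies the rule above to the p95 measured without the fast path while Python is scanning. The callable, the function names and the 100 ms budget are hypothetical; the plan names no threshold and does not say how the popup search is driven.

```python
import statistics
import time
from typing import Callable


def measure(search: Callable[[], object], runs: int = 200) -> dict[str, float]:
    """Call `search` `runs` times and summarise wall-clock latency in ms."""
    if runs < 2:
        raise ValueError("need at least two samples for percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search()
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; index 94 is the 95th percentile. "inclusive" treats the
    # samples as the whole population, so no extrapolation past the extremes.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50_ms": statistics.median(samples), "p95_ms": cuts[94], "max_ms": max(samples)}


def decide(p95_without_fast_path_ms: float, budget_ms: float = 100.0) -> str:
    """Apply step 11's rule to a p95 measured while Python is scanning.

    budget_ms is a hypothetical target; the plan does not name one.
    """
    if p95_without_fast_path_ms > budget_ms:
        return "keep, narrowed to a dict lookup only, and document why"
    return "delete the host fast path"
```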
**Architecture (locked — do not relitigate):**