41e9a500d0
Adds docs/CACHE_CONTRACT.md defining the two-tier replacement for
today's single CACHE_VERSION=3 constant:

  cache_schema        force rebuild on mismatch (today's semantics)
  id_rules            mark stale, allow lazy re-extract w/o rescan
  id_rules_signature  sha256 over canonical text of all extraction
                      rule sources (regexes, normalizers, part
                      detectors, FC2 handling, user-config rules)
                      as a belt-and-braces drift check

Documents:
- new cache.json header shape
- one-shot in-place migration for legacy `version: 3` users (no
  forced rescan)
- behavior matrix for the three resulting states
- extension UX: fresh / stale-by-rules amber / schema-mismatch red
- new "Re-extract IDs" action that walks files[] in place and
  never touches rclone
- what counts as a rules change vs. unrelated code change
- open questions deferred to step 10 (per-remote tracking,
  custom-rules signature handling, host wiring)

No code changes; step 10 implements. This commit only locks the
contract so step 10 has a single source of truth for both the
Python and extension sides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
172 lines
16 KiB
Markdown
# rclone-jav (Brave extension + native messaging host)

Session memory for Claude. Read before making changes here.

## Architecture

```
Brave tab title -> content script extracts JAV ID
  -> background.js connectNative("com.rcjav.host")
  -> host/rcjav-host.bat (portable: py launcher or python on PATH)
  -> host/rcjav-host.py
  -> subprocess python rc-jav.py --search ID --basic --no-color --format json
  -> structured hits back through native port
  -> popup or in-page overlay
```

Two separate codebases:
- This repo: Brave extension + native messaging host.
- `D:\DEV\Project\rclone-jav\`: Python rc-jav CLI. The host shells out to `rc-jav.py` here.

## Folder layout (post-rename)

```
D:\DEV\Extensions\Production\rclone-jav\   (PC 1)
D:\DEV\Extensions\Staging\rclone-jav\      (PC 2)
├── manifest.json
├── background.js
├── content.js
├── popup.{html,js,css}
├── options.{html,js}
├── host\
│   ├── rcjav-host.py
│   ├── rcjav-host.bat       (portable: py launcher fallback)
│   ├── install-host.ps1     (self-elevates to HKLM)
│   ├── register-host.bat    (prompts for ID, calls install-host.ps1)
│   ├── com.rcjav.host.json  (generated; UTF-8 NO BOM)
│   └── (logs)
└── docs\
    ├── INSTALL.md           (gotcha table at the bottom)
    └── README.md
```

## Critical gotchas (learned the hard way)

| Symptom | Cause | Fix |
|---|---|---|
| "Specified native messaging host not found" | UTF-8 BOM in `com.rcjav.host.json` | `WriteAllText` with `UTF8Encoding($false)` |
| Same error after registering HKCU | Brave on Windows ignores HKCU on some installs | Register HKLM too. `install-host.ps1` does both. |
| Host launches then disconnects | Python text-mode stdio mangles 4-byte length prefix | `msvcrt.setmode(stdin/stdout, O_BINARY)` at host startup |
| Host log says "stdin closed, exiting" immediately | bat-side stderr leak corrupts protocol | `python -u` + redirect stderr to log file |
| `Missing closing '}'` in install-host.ps1 | Em-dashes in comments + LF endings + Windows PS 5.1 (cp1252 fallback) | Strip em-dashes from .ps1 files, or save with BOM, or use pwsh |
| Brave reload != Brave restart | NM cache survives extension reload | Kill all brave.exe processes then reopen |
| `IBW-902z` page title fails to parse | `\b` after `\d` blocked by following word char | Extension regex ends with `[a-zA-Z]?\b`; the trailing letter is matched but discarded |
| Delete safety too broad | Allowlist reduced `cq:JAV` to `cq:` | Match full configured prefixes, not remote roots |
| Overlay feels ~1.5s late on SPA pages | `SPA_SETTLE_MS` waits before auto-check | Current value is 800ms; tune carefully if detection gets flaky |

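
The "host launches then disconnects" row is easier to reason about with the wire format in view. The following is a minimal sketch, not the actual `rcjav-host.py`: the helper names `use_binary_stdio`, `read_message` and `write_message` are hypothetical, and the `{"action": ..., "id": ...}` payload is only an illustrative shape. It assumes the standard native messaging framing (a 4-byte native-endian length followed by UTF-8 JSON) and shows why stdio has to be in binary mode on Windows before any frame is read or written.

```python
import io
import json
import struct
import sys


def use_binary_stdio():
    """Put stdin/stdout into binary mode on Windows so newline translation
    cannot alter the 4-byte length prefix. Does nothing on other platforms."""
    if sys.platform == "win32":
        import msvcrt
        import os
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)


def read_message(stream):
    """Read one length-prefixed JSON message; return None on EOF or a short read."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    body = stream.read(length)
    if len(body) < length:
        return None
    return json.loads(body.decode("utf-8"))


def write_message(stream, payload):
    """Write one JSON message preceded by its byte length."""
    data = json.dumps(payload).encode("utf-8")
    stream.write(struct.pack("=I", len(data)))
    stream.write(data)
    stream.flush()


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of real stdio.
    # A real host would call use_binary_stdio() first and then use
    # sys.stdin.buffer / sys.stdout.buffer.
    buf = io.BytesIO()
    write_message(buf, {"action": "search", "id": "ABC-027"})
    buf.seek(0)
    print(read_message(buf))
```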
## Internal names: keep as-is

- Native messaging host: `com.rcjav.host` (NOT renamed despite extension rename)
- Window flag in content.js: `__rclonex_loaded__` (idempotency guard for content script re-injection)
- CSS IDs starting with `rclonex-` (overlay)
- Host logs: `host/logs/rcjav-host.log`, `host/logs/rcjav-host-events.log`, `host/logs/rcjav-host-stderr.log`, `host/logs/deletes.log`
- Host scan progress state: `host/state/scan-state.json`

Don't rename these unless there's a real reason. They're orthogonal to the user-facing extension name.

## Settings

Stored in `chrome.storage.sync` under key `settings`. Storage is namespaced per extension ID, so if the extension is reloaded under a different path (and therefore gets a different ID), settings are wiped.

**Backup/restore lives in Options → Setup** (the pane formerly named "Setup & Transfer"): JSON export/import to survive reloads or PC migrations. Use it before renaming or relocating the extension.

`DEFAULT_SETTINGS` lives in `background.js`. Keep it in sync with the `options.html` defaults.

## Decision log

### Deletion allowlist uses full prefixes (2026-05-20)

**Decision:** the host delete allowlist must use full configured path prefixes (`cq:JAV`, trash dir, etc.), not only remote roots like `cq:`.

**Reasoning:** Reducing `cq:JAV` to `cq:` lets any path on the same rclone remote pass the safety check. Deletion is opt-in but must be tightly scoped.

**Important:** extension delete calls must forward `rcjav_path`, or the host may read the wrong `config.json` and derive the wrong allowlist.

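
The full-prefix rule can be illustrated with a small check. This is a sketch under assumptions, not the host's actual code: `is_deletable` is a hypothetical name, `cq:trash` is a made-up second prefix, and rejecting `..` segments outright is an extra precaution added for the example. The point it shows is that a path must sit strictly inside a configured prefix such as `cq:JAV`, so sharing the remote root `cq:` is not enough.

```python
def is_deletable(path, allowed_prefixes):
    """Return True only if path lies strictly inside one of the full prefixes."""
    candidate = path.replace("\\", "/")
    # Refuse traversal segments rather than trying to resolve them.
    if ".." in candidate.split(":", 1)[-1].split("/"):
        return False
    for prefix in allowed_prefixes:
        root = prefix.replace("\\", "/").rstrip("/")
        # Require a separator after the prefix so "cq:JAV" does not also
        # admit "cq:JAVA/..." or the prefix directory itself.
        if candidate.startswith(root + "/") and len(candidate) > len(root) + 1:
            return True
    return False


if __name__ == "__main__":
    prefixes = ["cq:JAV", "cq:trash"]  # hypothetical configured prefixes
    for p in ("cq:JAV/ABC-027.mp4", "cq:Other/ABC-027.mp4", "cq:JAVA/x.mp4"):
        print(p, is_deletable(p, prefixes))
```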
### Toolbar popup setting gates auto-check (2026-05-20)

**Decision:** `triggers.toolbarClick` does not remove the MV3 popup, but it does gate whether the popup auto-runs `checkTab` on open. If disabled, the popup stays idle until the user clicks Re-Scan.

### Quick search and ID padding (2026-05-20)

**Decision:** rc-jav canonical JAV IDs use at least 3 digits (`ABC-027`) and preserve 4+ digit IDs (`ABCD-1294`). Quick search emits canonical uppercase globs only.

**Reasoning:** the user clarified that real JAV filenames are never `ABC-27` or `ABC-0027`; they are `ABC-027`. The user also never uses lowercase filenames, so quick search should not use rclone `--ignore-case`, which added noticeable delay.

**Operational note:** this changes cache keys. Run `python rc-jav.py --scan` in `D:\DEV\Project\rclone-jav` after this change.

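
The padding rule can be sketched in a few lines. This is an illustration only: `canonical_id` and its regex are hypothetical stand-ins, not rc-jav's actual normalizer, and the accepted input shape (letters, optional hyphen, digits) is an assumption. It pads short numbers to three digits, leaves longer numbers untouched, and uppercases the prefix.

```python
import re

# Assumed input shape for this sketch: letters, optional hyphen, digits.
_ID_RE = re.compile(r"^([A-Za-z]+)-?(\d+)$")


def canonical_id(raw):
    """Return PREFIX-NNN with at least three digits, or None if raw is not an ID."""
    m = _ID_RE.match(raw.strip())
    if not m:
        return None
    prefix, digits = m.groups()
    # zfill pads to a minimum width and never truncates, so 4+ digit
    # numbers are preserved as written.
    return f"{prefix.upper()}-{digits.zfill(3)}"


if __name__ == "__main__":
    for raw in ("abc-27", "ABC-027", "ABCD-1294"):
        print(raw, "->", canonical_id(raw))
```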
### No-match overlay metadata (2026-05-20)

**Decision:** the host search response includes `cache_meta` and `scanned_remotes` from the rc-jav JSON so no-match overlays can show what was scanned instead of falling back to "library".

### IBW-902z trailing letter (2026-05-20)

**Decision:** minimal regex fix in extension only. NOT a full variant-suffix rewrite of the index.

**Reasoning:** The user's library uses one ID per number (either `IBW-902` OR `IBW-902z`, not both). Page titles failing on `IBW-902z` is the real bug. The extension regex now matches an optional trailing letter and discards it. rc-jav's index continues to strip trailing letters at `extract_id` time. Net effect: the extension queries `IBW-902` for any title containing `IBW-902` or `IBW-902z`, and finds the file regardless of how it is named on rclone.

**Revisit if:** both `IBW-902.mp4` and `IBW-902z.mp4` ever coexist in the library, since they would collide on the same ID. Then implement a variant suffix (#var_Z) end-to-end.

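
The word-boundary behavior behind this fix is easy to reproduce. The sketch below uses Python's `re` rather than the extension's JavaScript, and both patterns are hypothetical simplifications (the letter and digit counts are assumptions, not the real `content.js` regex). `STRICT_RE` fails on `IBW-902z` because there is no `\b` between a digit and a following letter; `TITLE_ID_RE` allows one trailing letter outside the capture groups, so it is matched but dropped from the ID.

```python
import re

# Without the optional letter: \b cannot match between "2" and "z".
STRICT_RE = re.compile(r"\b([A-Za-z]{2,6})-(\d{2,5})\b")

# With the optional letter left outside the capture groups.
TITLE_ID_RE = re.compile(r"\b([A-Za-z]{2,6})-(\d{2,5})[a-zA-Z]?\b")


def id_from_title(title):
    """Return the ID without any trailing letter, or None if no ID is found."""
    m = TITLE_ID_RE.search(title)
    if not m:
        return None
    return f"{m.group(1).upper()}-{m.group(2)}"


if __name__ == "__main__":
    title = "IBW-902z Some Title"
    print(STRICT_RE.search(title))
    print(id_from_title(title))
```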
### Native messaging host name stayed `com.rcjav.host`

When the extension was renamed `rclonex` → `rclone-jav`, the NM host name was NOT renamed. Reason: the name has zero user impact (it's an internal identifier in the registry and manifest), while every rename costs registry rewrites + script churn. Not worth it.

### WinCatalog backslash normalization

Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Windows `\` to rclone-style `/` before the extension sees them.

## When making changes

- Extension settings schema change → update `DEFAULT_SETTINGS` in `background.js` AND defaults in `options.html` + `options.js` `load()`
- New native messaging action → handler in `rcjav-host.py` + `DISPATCH` map + extension code that sends it
- New options pane → sidebar item in `options.html` + new `.pane` div + load/save bindings in `options.js`
- Any `rc-jav.py` CLI change → host invocation in `rcjav-host.py` `handle_search` must keep pace
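
The "handler + `DISPATCH` map" pattern from the list above might look roughly like the sketch below. Only the names `DISPATCH` and `handle_search` come from this document; `handle_ping`, the `dispatch` function, the message fields and the response shape are hypothetical, and the handler bodies are stand-ins (the real `handle_search` shells out to `rc-jav.py`). The design point is that an unknown action returns an explicit error instead of being silently dropped, so a handler that was added on the extension side but never registered in the host shows up immediately.

```python
def handle_ping(message):
    # Hypothetical action used only for this illustration.
    return {"ok": True, "action": "ping"}


def handle_search(message):
    # Stand-in body; the real handler invokes rc-jav.py.
    return {"ok": True, "action": "search", "id": message.get("id")}


# One entry per native messaging action. Adding an action means adding a
# handler function and registering it here.
DISPATCH = {
    "ping": handle_ping,
    "search": handle_search,
}


def dispatch(message):
    """Route a decoded message to its handler, or report an unknown action."""
    action = message.get("action")
    handler = DISPATCH.get(action)
    if handler is None:
        return {"ok": False, "error": f"unknown action: {action!r}"}
    return handler(message)


if __name__ == "__main__":
    print(dispatch({"action": "search", "id": "ABC-027"}))
    print(dispatch({"action": "nope"}))
```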

---

## Console consolidation refactor: execution status

**Spec / blueprint:**
- `D:\DEV\Project\rclone-jav\mockups\console-consolidation-claude.html` (refactor spec: decision table, sequence, acceptance criteria)
- `D:\DEV\Project\rclone-jav\mockups\console-consolidation-options.html` (Codex's visual annotation variant)

**Shipped (in execution order):**

1. **Sim Dupe deleted from popup.** Button + click handler removed from `popup.html` / `popup.js`. Payload preserved in `samples/sim-dupe.js` for future layout work.
2. **CSS extracted from options.html.** Embedded `<style>` block moved to `options.css`, linked via `<link rel="stylesheet">`. options.html went 1179 → 794 lines. Inline `style="..."` attributes intentionally left for later (step 6 territory).
3. **Transfer Assistant wizard deleted.** "Setup & Transfer" pane renamed to "Setup". Replacement: Extension ID display + Copy button added to Diagnostics → Native host registration fieldset (always visible, not failure-gated). Sidebar entry, fieldset, modal, and ~107 lines of JS removed.
5. **Recent Activity + Search Troubleshooting moved to new Debug Tools pane.** Verified Recent Activity is search-trigger-only by reading `background.js` — `recordActivity()` is NOT called from `delete-file` handler. No audit-value split needed. New sidebar entry "Debug Tools" under System group; new `pane-debug` houses both fieldsets.
6. **options.js split — Cache & Scans + Duplicate Review paired extraction.** `options.js` 3133 → 2356 lines. New files: `options-cache.js` (161 lines, Cache & Scans block), `options-dupe-review.js` (616 lines, Dup Review + Keep Ranking incl. bottom `loadKeepRanking()` call). Script-tag order in `options.html`: cache → dupe-review → options.js (body bottom). Cross-script binding visibility (vanilla classic scripts share global declarative env): Library Issues code still in options.js reads `_configuredScanRoots` / `_cacheSkippedByRemote` / calls `rememberConfiguredScanRoots` from cache file by bare reference. Calls to `escapeHtml` / `openModal` / `closeModal` / `keepActionViewport` / `clearNativeRepairCard` / `renderNativeMessagingFailure` from extracted files all occur inside event handlers (resolved at call time, after options.js parses). Repo `git init`'d before this step; baseline commit `f8e781f` is the rollback point. Verified by `node --check` on each file and on concatenated script.
6b. **options.js split — Library Issues extraction.** `options.js` 2356 → 1903 lines. New file: `options-library-issues.js` (453 lines) — covers `lastLibraryIssues`, `_libraryIssuesDirty`, `renderLibraryIssues`, `_closeLibraryIssues`, and the bottom IIFE that wraps `_optScanTimer` / `_setOptScanningState` / `_pollOptProgress` for optimization-scan progress polling. Block was fully self-contained (no external callers of its identifiers). Reads `_configuredScanRoots` / `_cacheSkippedByRemote` / calls `rememberConfiguredScanRoots` from `options-cache.js` — same cross-file binding pattern proven in step 6. Script-tag order in `options.html`: cache → dupe-review → library-issues → options.js. `node --check` passes on each file and on concatenation; line count of concat (3133) matches pre-split total exactly.
7a. **Bulk Check standalone window.** New `bulk-check.{html,js,css}` opened as detached `chrome.windows.create({ type: 'popup', width: 640, height: 540 })`. Launcher = 📋 icon button in popup header next to ⚙ Options; click sends `open-bulk-check` message to background and closes the popup. Background owns the lifecycle: `openBulkCheckWindow()` reads `chrome.storage.session.bulkCheckWindowId`; existing id → `chrome.windows.update({ focused, drawAttention })`; failure or no id → create new window + stash id. `chrome.windows.onRemoved` clears the stale id on close. Last-paste persisted to `chrome.storage.local.bulkCheckLastPaste` (debounced 500ms), restored on window open. `quickMode` read from settings on each run (parity with old options behavior). Removed the Bulk ID Check fieldset from `options.html` (Library Review pane description updated to note the relocation) and its handlers from `options.js` (1903 → 1852 lines). No manifest permission changes needed.
8. **Shared fixture corpus.** Seeded `D:\DEV\Project\rclone-jav\fixtures\` (top-level in the Python repo, conceptually shared with this extension). Files: `filename-extraction.json` (12 cases, Python `extract_id` contract), `query-extraction.json` (10 cases, extension `content.js` `normalizeId` contract), `shared-normalization.json` (5 cases, both sides must agree), `README.md`, and a self-contained Python runner `run.py` (no third-party deps; imports `rc-jav.py` in place). All 17 Python-side cases pass against current `rc-jav.py`. The runner uses `|` and `->` instead of `·` and `→` so it works on Windows cp1252 consoles. Documented one intentional divergence: the extension normalizes the compact `FC2PPV1841460` form (page-title surface) while Python `extract_id` does not (filename surface — compact form doesn't appear on disk). No Node-side runner today — `content.js` lives in an injected IIFE and importing it would require duplicating regexes; the JSON corpus is the canonical spec until that lands.
9. **Cache contract design: shipped as a design doc, not code.** `docs/CACHE_CONTRACT.md` defines a two-tier model that splits today's single `CACHE_VERSION = 3` into `cache_schema` (force rebuild on mismatch) and `id_rules` (mark stale, allow lazy re-extract without re-scanning). It adds `id_rules_signature` (sha256 over canonical text of all extraction-rule sources, including user-added normalizers from config.json) as a belt-and-braces drift check. It specifies the new cache header shape, a one-shot in-place migration for users on legacy `version: 3` (no forced rescan), the behavior matrix for the three resulting states, and the extension's three-state UX (fresh / stale-by-rules amber / schema-mismatch red) with a new "Re-extract IDs" action that walks `files[]` in place and never touches rclone. Step 10 implements; step 9 only locks the contract.

(Step 4 in the plan is a paired-extraction sub-task of step 6; folded into step 6 ship.)

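
To make the three cache states from step 9 concrete, here is a minimal sketch. It is not taken from `docs/CACHE_CONTRACT.md`: the constants, the flat header layout, the `classify_cache` and `rules_signature` names, and the canonicalization (strip, sort, join with newlines) are all assumptions for illustration, and the legacy `version: 3` migration is not modeled. Only the field names `cache_schema`, `id_rules`, `id_rules_signature` and the three outcomes come from this document.

```python
import hashlib

CACHE_SCHEMA = 1  # hypothetical current values
ID_RULES = 1


def rules_signature(rule_sources):
    """sha256 over a canonical text form of the rule sources."""
    # Assumed canonical form: strip each source, sort, join with newlines,
    # so ordering and stray whitespace do not change the digest.
    canonical = "\n".join(sorted(s.strip() for s in rule_sources))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_cache(header, rule_sources):
    """Return 'fresh', 'stale-by-rules', or 'schema-mismatch'."""
    if header.get("cache_schema") != CACHE_SCHEMA:
        return "schema-mismatch"  # force rebuild
    if (header.get("id_rules") != ID_RULES
            or header.get("id_rules_signature") != rules_signature(rule_sources)):
        return "stale-by-rules"  # re-extract IDs in place, no rescan
    return "fresh"


if __name__ == "__main__":
    rules = [r"[A-Z]+-\d+", "fc2 handling"]  # placeholder rule texts
    header = {
        "cache_schema": CACHE_SCHEMA,
        "id_rules": ID_RULES,
        "id_rules_signature": rules_signature(rules),
    }
    print(classify_cache(header, rules))
    print(classify_cache(header, rules + ["new normalizer"]))
```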
**Pending (in execution order):**

- **Step 6c: finish options.js split (optional).** Remaining options.js (1852 lines) still holds: settings load/save, backup/restore, recent activity, search test bench, adapters, ID normalizers, part detectors, element picker, overlay previews, diagnostics, profiles, paths, and the bottom-entry IIFE. Candidates for extraction: Diagnostics (~250 lines), Profiles (~265 lines), Adapters + ID normalizers + Part detectors as a "rules editors" file (~330 lines combined). Diminishing returns past this point: the bottom IIFE + load/save core should stay in `options.js` as the entry point.
- **Step 10: `rc-jav.py` module split** into `rcjav/` package (ids, cache, dupes, catalog, rclone_io, output, cli). Keep `rc-jav.py` as thin entrypoint that imports from `rcjav.cli.main`. Step 10 is also where the cache-contract design from step 9 gets implemented: split `CACHE_VERSION` into `cache_schema` + `id_rules` + `id_rules_signature`, add the legacy-`version: 3` in-place migration, add a `--reextract` CLI flag that walks `files[]` without re-listing remotes, and update the extension's `cache-status` consumer (`options-cache.js`) to render the three-state UX from `docs/CACHE_CONTRACT.md`.
- **Step 11: host fast-path benchmark and decide.** Measure popup search latency under (a) idle Python and (b) Python actively scanning. If the host fast path is the only thing keeping the popup responsive under scan, narrow it to a dict lookup only and document that. If it is not needed, delete it entirely.

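
For step 11, a rough timing harness could look like the sketch below. It is an assumption-laden proxy, not the planned benchmark: it times only the CLI subprocess, not the full popup round trip through the native port, and the names `time_command`, `summarize` and `SEARCH_ARGV` are hypothetical. `SEARCH_ARGV` reuses the invocation shown in the Architecture section with a placeholder ID; running it once while Python is idle and once during a scan would give the two conditions to compare.

```python
import statistics
import subprocess
import sys
import time

# Invocation copied from the Architecture section; ABC-027 is a placeholder
# ID and rc-jav.py is assumed to be in the current working directory.
SEARCH_ARGV = [sys.executable, "rc-jav.py", "--search", "ABC-027",
               "--basic", "--no-color", "--format", "json"]


def time_command(argv, runs=5):
    """Run argv `runs` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is captured and discarded; a failing exit code is not an error here.
        subprocess.run(argv, capture_output=True, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce raw durations to the numbers worth comparing across conditions."""
    return {
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }

# Example use: print(summarize(time_command(SEARCH_ARGV)))
```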
**Architecture (locked; do not relitigate):**

- Sidebar = Console / Settings / Support tri-split. No dashboard pane. Status carried by badges on tab labels (`Duplicate Review [27]`, `Cache & Scans [28m]`, `Library Issues [4]`).
- Default landing = Duplicate Review.
- Bulk ID Check = detached `chrome.windows.create` popup, NOT a Console sidebar tab. Single canonical entry path = popup launcher button.
- Keep Ranking Rules nested INSIDE Duplicate Review as a sub-tab, NOT a separate Settings tab.
- Sim Dupe: deleted from extension. Repo HTML harness in `samples/` only.
- Transfer Assistant: deleted. Diagnostics' Native host registration fieldset is the replacement (Extension ID copy + Repair Registration + Verify Registration buttons).
- Vanilla JS + ordered `<script>` tags. No framework, no build system.
- Inline rule tests stay next to rule editors (Matching Rules, Site Extraction). Standalone benches go to Debug Tools.

**Notes:**

- The repo was git-initialized before step 6; baseline commit `f8e781f` is the rollback point from step 6 onward. Rollback for steps shipped before that baseline = manual restore from this conversation's diffs.
- Three pre-execution handoffs from the original plan have been resolved:
  - Recent Activity scope test → settled by code read (single role, all to Debug).
  - Diagnostics replacement for Transfer wizard → present (Extension ID, Repair, Verify all visible in one fieldset).
  - Popup launcher button label → settled in step 7a: 📋 icon button in the popup header next to ⚙ Options.

If a future session wants to continue: read this status block + open the mockup HTML files for the full spec. Resume with the Pending list (step 6c is optional; step 10 is the next required step).