# rclone-jav (Brave extension + native messaging host)

Session memory for Claude. Read before making changes here.
## Architecture

```
Brave tab title -> content script extracts JAV ID
-> background.js connectNative("com.rcjav.host")
-> host/rcjav-host.bat (portable: py launcher or python on PATH)
-> host/rcjav-host.py
-> subprocess python rc-jav.py --search ID --basic --no-color --format json
-> structured hits back through native port
-> popup or in-page overlay
```
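A minimal sketch of the subprocess hop in the diagram. It assumes the host runs `rc-jav.py` with its own interpreter and with `cwd` set to the Python repo. `build_search_argv`, `run_search` and the 30 s timeout are hypothetical, and the real `handle_search` in `rcjav-host.py` may differ. Capturing both child streams is the point: nothing from rc-jav may reach the host's own stdout, which is the native-messaging channel.

```python
import json
import subprocess
import sys
from typing import Any, List


def build_search_argv(jav_id: str, script: str = "rc-jav.py", python: str = sys.executable) -> List[str]:
    """Argument vector for the search call shown in the architecture diagram."""
    return [python, script, "--search", jav_id, "--basic", "--no-color", "--format", "json"]


def run_search(jav_id: str, rcjav_dir: str, timeout: float = 30.0) -> Any:
    """Hypothetical helper: run rc-jav.py and parse its JSON stdout."""
    proc = subprocess.run(
        build_search_argv(jav_id),
        cwd=rcjav_dir,          # rc-jav.py is resolved relative to the Python repo
        capture_output=True,    # capture stdout AND stderr so neither leaks into the host's stdout
        encoding="utf-8",
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"rc-jav.py exited {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```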
Two separate codebases:
- This repo: Brave extension + native messaging host.
- `D:\DEV\Project\rclone-jav\`: Python rc-jav CLI. The host shells out to `rc-jav.py` here.
## Folder layout (post-rename)

```
D:\DEV\Extensions\Production\rclone-jav\   (PC 1)
D:\DEV\Extensions\Staging\rclone-jav\      (PC 2)
├── manifest.json
├── background.js
├── content.js
├── popup.{html,js,css}
├── options.{html,js,css}
├── options-{cache,dupe-review,library-issues,diagnostics,profiles,rules-editors}.js
├── bulk-check.{html,js,css}
├── host\
│   ├── rcjav-host.py
│   ├── rcjav-host.bat        (portable: py launcher fallback)
│   ├── install-host.ps1      (self-elevates to HKLM)
│   ├── register-host.bat     (prompts for ID, calls install-host.ps1)
│   ├── com.rcjav.host.json   (generated; UTF-8 NO BOM)
│   ├── logs\
│   └── state\                (scan-state.json)
└── docs\
    ├── INSTALL.md            (gotcha table at the bottom)
    └── README.md
```
## Critical gotchas (learned the hard way)

| Symptom | Cause | Fix |
|---|---|---|
| "Specified native messaging host not found" | UTF-8 BOM in `com.rcjav.host.json` | Write it with `WriteAllText` + `UTF8Encoding($false)` (no BOM) |
| Same error after registering under HKCU | Brave on Windows ignores HKCU on some installs | Register under HKLM too; `install-host.ps1` does both |
| Host launches, then disconnects | Python text-mode stdio mangles the 4-byte length prefix | `msvcrt.setmode(stdin/stdout, O_BINARY)` at host startup |
| Host log says "stdin closed, exiting" immediately | A bat-side stderr leak corrupts the protocol | `python -u` + redirect stderr to a log file |
| `Missing closing '}'` in install-host.ps1 | Em-dashes in comments + LF endings + Windows PowerShell 5.1 (cp1252 fallback) | Strip em-dashes from .ps1 files, or save with BOM, or use pwsh |
| Host changes not picked up after an extension reload | NM cache survives extension reload (reload != Brave restart) | Kill all brave.exe processes, then reopen Brave |
| `IBW-902z` page title fails to parse | `\b` after `\d` fails because the next char (`z`) is also a word char | Extension regex allows an optional trailing letter (`[a-zA-Z]?\b`); it is matched, then discarded |
| Delete safety too broad | Allowlist reduced `cq:JAV` to `cq:` | Match full configured prefixes, not remote roots |
| Overlay feels ~1.5s late on SPA pages | `SPA_SETTLE_MS` delay before auto-check | Current value is 800ms; tune carefully if detection gets flaky |
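Three of the rows above come down to the native messaging wire format: every message is UTF-8 JSON preceded by a 32-bit length in native byte order, so any byte rewriting on stdin/stdout (CRLF translation, stray output) breaks framing. A minimal Python sketch of that framing plus the Windows binary-mode switch; the function names are illustrative and not taken from `rcjav-host.py`.

```python
import json
import os
import struct
import sys
from typing import Any, BinaryIO, Optional


def enable_binary_stdio() -> None:
    """On Windows, stop the C runtime from doing CRLF translation on stdin/stdout."""
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)


def read_message(stream: BinaryIO) -> Optional[Any]:
    """Read one framed message; return None when the browser closes the pipe."""
    header = stream.read(4)
    if len(header) < 4:
        return None  # EOF: the "stdin closed, exiting" case
    (length,) = struct.unpack("=I", header)  # "=": native byte order, 4-byte unsigned int
    body = stream.read(length)
    if len(body) < length:
        return None  # truncated message; treat as a disconnect
    return json.loads(body.decode("utf-8"))


def write_message(stream: BinaryIO, obj: Any) -> None:
    """Write one framed message and flush so the browser sees it immediately."""
    body = json.dumps(obj).encode("utf-8")
    stream.write(struct.pack("=I", len(body)))
    stream.write(body)
    stream.flush()


# Typical host loop (not executed here):
#   enable_binary_stdio()
#   while (msg := read_message(sys.stdin.buffer)) is not None:
#       write_message(sys.stdout.buffer, {"ok": True, "echo": msg})
```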
## Internal names: keep as-is

- Native messaging host: `com.rcjav.host` (NOT renamed despite the extension rename)
- Window flag in content.js: `__rclonex_loaded__` (idempotency guard for content script re-injection)
- CSS IDs starting with `rclonex-` (overlay)
- Host logs: `host/logs/rcjav-host.log`, `host/logs/rcjav-host-events.log`, `host/logs/rcjav-host-stderr.log`, `host/logs/deletes.log`
- Host scan progress state: `host/state/scan-state.json`

Don't rename these unless there's a real reason. They're orthogonal to the user-facing extension name.
## Settings

Stored in `chrome.storage.sync` under key `settings`. Storage is namespaced per extension ID, and an unpacked extension's ID is derived from its folder path, so loading the extension from a different path starts with empty settings.

**Backup/restore lives in Options → Setup** (the pane formerly named "Setup & Transfer"): JSON export/import to survive reloads or PC migrations. Use it before renaming or relocating the extension.

`DEFAULT_SETTINGS` lives in background.js. Keep it in sync with the options.html defaults.
## Decision log

### Deletion allowlist uses full prefixes (2026-05-20)

**Decision:** host delete allowlist must use full configured path prefixes (`cq:JAV`, trash dir, etc.), not only remote roots like `cq:`.

**Reasoning:** Reducing `cq:JAV` to `cq:` lets any path on the same rclone remote pass the safety check. Deletion is opt-in but must be tightly scoped.

**Important:** extension delete calls must forward `rcjav_path`, or the host may read the wrong `config.json` and derive the wrong allowlist.
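A minimal sketch of full-prefix matching, assuming rclone-style `remote:path` strings with `/` separators. `is_delete_allowed` and the `cq:_trash` prefix are hypothetical; the real host check may normalize paths differently. A path passes only if it lies strictly inside a configured prefix, which also rejects sibling folders (`cq:JAVA/...`) and `..` segments.

```python
from typing import Iterable


def is_delete_allowed(path: str, allowed_prefixes: Iterable[str]) -> bool:
    """True only if `path` lies strictly inside one of the configured prefixes."""
    if any(seg == ".." for seg in path.split("/")):
        return False  # never let a relative segment climb out of a prefix
    for prefix in allowed_prefixes:
        root = prefix.rstrip("/")
        # Require a "/" boundary so "cq:JAV" does not also match "cq:JAVA/...".
        if path.startswith(root + "/"):
            return True
    return False
```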
### Toolbar popup setting gates auto-check (2026-05-20)

**Decision:** `triggers.toolbarClick` does not remove the MV3 popup, but it does gate whether the popup auto-runs `checkTab` on open. If disabled, the popup stays idle until the user clicks Re-Scan.

### Quick search and ID padding (2026-05-20)

**Decision:** rc-jav canonical JAV IDs use at least 3 digits (`ABC-027`) and preserve 4+ digit IDs (`ABCD-1294`). Quick search emits canonical uppercase globs only.

**Reasoning:** User clarified that real JAV filenames are never `ABC-27` or `ABC-0027`; they are `ABC-027`. User also never uses lowercase filenames, so quick search should not use rclone `--ignore-case`, which added noticeable delay.

**Operational note:** this changes cache keys. Run `python rc-jav.py --scan` in `D:\DEV\Project\rclone-jav` after this change.
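A minimal sketch of the padding rule and the uppercase-only quick-search glob. The input pattern, the function names, and the `*ID*` glob shape are assumptions; rc-jav's real normalizer handles many more forms. `zfill(3)` never strips digits, so an input like `ABC-0027` keeps 4 digits here, which may not match what rc-jav does.

```python
import re
from typing import Optional

# Hypothetical input pattern: letters, optional hyphen, digits ("abc27", "ABC-027").
_RAW_ID_RE = re.compile(r"^([A-Za-z]+)-?(\d+)$")


def canonical_id(raw: str) -> Optional[str]:
    """Uppercase the prefix and left-pad the number to at least 3 digits."""
    m = _RAW_ID_RE.match(raw.strip())
    if m is None:
        return None
    prefix, digits = m.groups()
    # zfill(3) pads "27" -> "027" and leaves 4+ digit numbers ("1294") untouched.
    return f"{prefix.upper()}-{digits.zfill(3)}"


def quick_search_glob(raw: str) -> Optional[str]:
    """Case-sensitive glob built from the canonical ID, so no --ignore-case is needed."""
    cid = canonical_id(raw)
    return None if cid is None else f"*{cid}*"
```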
### No-match overlay metadata (2026-05-20)

**Decision:** host search response includes `cache_meta` and `scanned_remotes` from the rc-jav JSON so no-match overlays can show what was scanned instead of falling back to "library".

### IBW-902z trailing letter (2026-05-20)

**Decision:** minimal regex fix in the extension only. NOT a full variant-suffix rewrite of the index.

**Reasoning:** The user's library uses one ID per number (either `IBW-902` OR `IBW-902z`, not both). Page titles failing on `IBW-902z` is the real bug. The extension regex now matches an optional trailing letter and discards it. rc-jav's index continues to strip trailing letters at `extract_id` time. Net effect: the extension queries `IBW-902` for any title containing `IBW-902` or `IBW-902z`, and finds the file regardless of how it's named on the rclone remote.

**Revisit if:** both `IBW-902.mp4` and `IBW-902z.mp4` ever coexist in the library; they'd collide on the same ID. Then implement the variant suffix (`#var_Z`) end-to-end.
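The fix is easiest to see side by side. This sketch uses Python `re`, whose `\b` behaves like the extension's JavaScript regex for ASCII titles; both patterns (prefix length, digit count) are hypothetical stand-ins, not the exact `content.js` regex.

```python
import re
from typing import Optional

# Hypothetical title patterns; only the trailing "[a-zA-Z]?" differs.
OLD_TITLE_RE = re.compile(r"\b([A-Z]{2,6})-(\d{2,5})\b")
NEW_TITLE_RE = re.compile(r"\b([A-Z]{2,6})-(\d{2,5})[a-zA-Z]?\b")


def id_from_title(title: str, pattern: re.Pattern = NEW_TITLE_RE) -> Optional[str]:
    m = pattern.search(title)
    # The optional trailing letter is consumed by the match but not captured,
    # so "IBW-902z" and "IBW-902" both yield the query "IBW-902".
    return None if m is None else f"{m.group(1)}-{m.group(2)}"
```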
### Native messaging host name stayed `com.rcjav.host`

When the extension was renamed `rclonex` → `rclone-jav`, the NM host name was NOT renamed. Reason: zero user impact (it's an internal identifier in the registry/manifest), but every rename costs registry rewrites + script churn. Not worth it.

### WinCatalog backslash normalization

Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Windows `\` to rclone-style `/` before the extension sees them.
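The rc-jav catalog code isn't reproduced here. A minimal sketch with `pathlib.PureWindowsPath`, assuming catalog entries are plain Windows relative or drive paths; `normalize_catalog_path` is a hypothetical name, and how a remote prefix gets attached is not covered.

```python
from pathlib import PureWindowsPath


def normalize_catalog_path(win_path: str) -> str:
    """Convert a WinCatalog-style Windows path to rclone-style forward slashes."""
    # PureWindowsPath treats "\" (and "/") as separators; as_posix() re-joins with "/".
    return PureWindowsPath(win_path).as_posix()
```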
## When making changes

- Extension settings schema change → update `DEFAULT_SETTINGS` in background.js AND the defaults in options.html + options.js `load()`
- New native messaging action → handler in rcjav-host.py + `DISPATCH` map entry + extension code that sends it
- New options pane → sidebar item in options.html + new `.pane` div + load/save bindings in options.js
- Any rc-jav.py CLI change → the host invocation in rcjav-host.py `handle_search` must keep pace
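The handler-plus-`DISPATCH` rule above, as a minimal sketch. The message key `action`, the `ping` handler, and the error shape are assumptions for illustration, not the contents of `rcjav-host.py`.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


def handle_ping(msg: Message) -> Message:
    """Hypothetical trivial action, useful as a connectivity check."""
    return {"ok": True, "action": "ping"}


# One entry per native-messaging action; a new action needs a handler AND an entry here.
DISPATCH: Dict[str, Handler] = {
    "ping": handle_ping,
}


def dispatch(msg: Message) -> Message:
    action = msg.get("action")
    handler = DISPATCH.get(action) if isinstance(action, str) else None
    if handler is None:
        return {"ok": False, "error": f"unknown action: {action!r}"}
    try:
        return handler(msg)
    except Exception as exc:  # report failures to the extension instead of crashing the host
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```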
---

## Console consolidation refactor: execution status

**Spec / blueprint:**
- `D:\DEV\Project\rclone-jav\mockups\console-consolidation-claude.html` (refactor spec: decision table, sequence, acceptance criteria)
- `D:\DEV\Project\rclone-jav\mockups\console-consolidation-options.html` (Codex's visual annotation variant)

**Shipped (in execution order):**

- Step 1: **Sim Dupe deleted from popup.** Button + click handler removed from `popup.html` / `popup.js`. Payload preserved in `samples/sim-dupe.js` for future layout work.
- Step 2: **CSS extracted from options.html.** Embedded `<style>` block moved to `options.css`, linked via `<link rel="stylesheet">`. options.html went 1179 → 794 lines. Inline `style="..."` attributes intentionally left for later (step 6 territory).
- Step 3: **Transfer Assistant wizard deleted.** "Setup & Transfer" pane renamed to "Setup". Replacement: Extension ID display + Copy button added to the Diagnostics → Native host registration fieldset (always visible, not failure-gated). Sidebar entry, fieldset, modal, and ~107 lines of JS removed.
- Step 5: **Recent Activity + Search Troubleshooting moved to new Debug Tools pane.** Verified by reading `background.js` that Recent Activity is search-trigger-only: `recordActivity()` is NOT called from the `delete-file` handler, so no audit-value split is needed. New sidebar entry "Debug Tools" under the System group; new `pane-debug` houses both fieldsets.
- Step 6: **options.js split: Cache & Scans + Duplicate Review paired extraction.** `options.js` 3133 → 2356 lines. New files: `options-cache.js` (161 lines, Cache & Scans block), `options-dupe-review.js` (616 lines, Dup Review + Keep Ranking incl. the bottom `loadKeepRanking()` call). Script-tag order in `options.html`: cache → dupe-review → options.js (body bottom). Cross-script binding visibility: classic (non-module) scripts share the global declarative environment, so top-level declarations in one file are visible to scripts that run after it. Library Issues code still in options.js reads `_configuredScanRoots` / `_cacheSkippedByRemote` and calls `rememberConfiguredScanRoots` from the cache file by bare reference. Calls to `escapeHtml` / `openModal` / `closeModal` / `keepActionViewport` / `clearNativeRepairCard` / `renderNativeMessagingFailure` from the extracted files all occur inside event handlers (resolved at call time, after options.js has parsed). Repo `git init`'d before this step; baseline commit `f8e781f` is the rollback point. Verified by `node --check` on each file and on the concatenated script.
- Step 6b: **options.js split: Library Issues extraction.** `options.js` 2356 → 1903 lines. New file: `options-library-issues.js` (453 lines), covering `lastLibraryIssues`, `_libraryIssuesDirty`, `renderLibraryIssues`, `_closeLibraryIssues`, and the bottom IIFE that wraps `_optScanTimer` / `_setOptScanningState` / `_pollOptProgress` for optimization-scan progress polling. The block was fully self-contained (no external callers of its identifiers). It reads `_configuredScanRoots` / `_cacheSkippedByRemote` and calls `rememberConfiguredScanRoots` from `options-cache.js`, the same cross-file binding pattern proven in step 6. Script-tag order in `options.html`: cache → dupe-review → library-issues → options.js. `node --check` passes on each file and on the concatenation; the concat line count (3133) matches the pre-split total exactly.
- Step 7a: **Bulk Check standalone window.** New `bulk-check.{html,js,css}` opened as a detached `chrome.windows.create({ type: 'popup', width: 640, height: 540 })`. Launcher = 📋 icon button in the popup header next to ⚙ Options; clicking it sends an `open-bulk-check` message to background and closes the popup. Background owns the lifecycle: `openBulkCheckWindow()` reads `chrome.storage.session.bulkCheckWindowId`; existing id → `chrome.windows.update({ focused, drawAttention })`; failure or no id → create a new window + stash its id. `chrome.windows.onRemoved` clears the stale id on close. Last paste persisted to `chrome.storage.local.bulkCheckLastPaste` (debounced 500ms), restored on window open. `quickMode` is read from settings on each run (parity with the old options behavior). Removed the Bulk ID Check fieldset from `options.html` (Library Review pane description updated to note the relocation) and its handlers from `options.js` (1903 → 1852 lines). No manifest permission changes needed.
- Step 8: **Shared fixture corpus.** Seeded `D:\DEV\Project\rclone-jav\fixtures\` (top-level in the Python repo, conceptually shared with this extension). Files: `filename-extraction.json` (12 cases, Python `extract_id` contract), `query-extraction.json` (10 cases, extension `content.js` `normalizeId` contract), `shared-normalization.json` (5 cases, both sides must agree), `README.md`, and a self-contained Python runner `run.py` (no third-party deps; imports `rc-jav.py` in place). All 17 Python-side cases pass against the current `rc-jav.py`. The runner prints `|` and `->` instead of `·` and `→` so it works on Windows cp1252 consoles. Documented one intentional divergence: the extension normalizes the compact `FC2PPV1841460` form (page-title surface) while Python `extract_id` does not (filename surface; the compact form doesn't appear on disk). No Node-side runner shipped with this step: `content.js` lives in an injected IIFE and importing it would require duplicating regexes, so the JSON corpus was the canonical spec until one landed.
- Step 9: **Cache contract design: shipped as a design doc, not code.** `docs/CACHE_CONTRACT.md` defines a two-tier model that splits the single `CACHE_VERSION = 3` into `cache_schema` (force rebuild on mismatch) and `id_rules` (mark stale, allow lazy re-extract without re-scanning). Adds `id_rules_signature` (sha256 over the canonical text of all extraction-rule sources, including user-added normalizers from config.json) as a belt-and-braces drift check. Specifies the new cache header shape, a one-shot in-place migration for users on legacy `version: 3` (no forced rescan), the behavior matrix for the three resulting states, and the extension's three-state UX (fresh / stale-by-rules amber / schema-mismatch red) with a new "Re-extract IDs" action that walks `files[]` in place and never touches rclone. Step 10j implements the contract; step 9 only locks it.

(Step 4 in the plan is a paired-extraction sub-task of step 6; folded into the step 6 ship.)
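For the step 8 corpus, a minimal sketch of a case runner in the spirit of `fixtures/run.py`. The `input`/`expected` field names and the function names are hypothetical; `fixtures/README.md` holds the real schema. Output sticks to `|` and `->` for cp1252 consoles.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Case = Dict[str, Any]


def load_cases(path: str) -> List[Case]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def run_cases(cases: List[Case], extract: Callable[[str], Optional[str]]) -> List[str]:
    """Return one failure line per mismatching case (an empty list means all pass)."""
    failures = []
    for case in cases:
        got = extract(case["input"])
        if got != case["expected"]:
            # "|" and "->" instead of "·" / "→" so Windows cp1252 consoles can print it.
            failures.append(f"{case['input']!r} | expected {case['expected']!r} -> got {got!r}")
    return failures
```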
- Step 6c: **options.js split: Diagnostics + Profiles + Rules Editors extracted.** Final mechanical split. Three new files: `options-diagnostics.js` (245 lines: extension ID display, runDiagnostics, host status, host repair, native messaging failure renderer), `options-profiles.js` (265 lines: `_knownRemotes`, `_cfgDefaults`, fetchRemotes, buildRemotePicker, profile modal), `options-rules-editors.js` (328 lines: adapters + ID normalizers + custom part detectors with their feedback UI). `options.js` is now **1014 lines**: entry IIFE, settings load/save, backup/restore, recent activity, search test bench, element picker, overlay previews, no-match overlay, `escapeHtml`, and paths. The picker + overlay-preview code stays because it's tightly coupled across multiple settings panes, and the JS-DOM call graph would have to be untangled to extract it cleanly. Script-tag order in `options.html` now: cache → dupe-review → library-issues → diagnostics → profiles → rules-editors → options.js (entry). `node --check` is clean on each file individually and on the concatenated load-order stream. Concat = 3144 lines, matching the pre-6c sum exactly.

**Pending:** none. Original roadmap closed. Follow-ups are recorded inline in the step entries (e.g. step 11's `_load_host_cache` memoization is already shipped via the `_cache_mem` stamp dict). A Node-side fixture runner (`fixtures/run-node.mjs`) has been added, so `shared-normalization.json` now genuinely guards cross-side drift; the gap noted in the step 8 ship is closed.
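The memoization mentioned under **Pending**, as a minimal sketch. The exact stamp fields of the real `_cache_mem` dict aren't recorded here; `(st_mtime_ns, st_size)`, `_mem`, and `load_cache_memoized` are assumptions. The parsed JSON is reused until the file's stamp changes.

```python
import json
import os
from typing import Any, Dict, Tuple

# path -> ((mtime_ns, size), parsed JSON); a hypothetical stand-in for the host's stamp dict.
_mem: Dict[str, Tuple[Tuple[int, int], Any]] = {}


def load_cache_memoized(path: str) -> Any:
    """Parse cache.json once per on-disk version; reparse only when the stamp changes."""
    st = os.stat(path)
    stamp = (st.st_mtime_ns, st.st_size)
    hit = _mem.get(path)
    if hit is not None and hit[0] == stamp:
        return hit[1]
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # If the file changes between stat() and read, the stored stamp is older than the
    # data, so the next call simply reparses: harmless.
    _mem[path] = (stamp, data)
    return data
```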
- Step 10: **`rc-jav.py` package split: done (sub-steps 10a–10i, shipped across two sessions).** The Python repo at `D:\DEV\Project\rclone-jav\` is now git-tracked (baseline `e029e89`); `rc-jav.py` went from 2230 lines to a 25-line shim. The new `rcjav/` package contains (line counts in parentheses): `model.py` (24, FileEntry), `ids.py` (243, ID extraction + part detection + normalization + describe_id_match + expand_range), `cache.py` (76, cache.json I/O), `catalog.py` (178, WinCatalog CSV/XML), `dupes.py` (264, keep-ranking + find_dupes + variant alerts), `rclone_io.py` (298, subprocess wrappers + walk_remote + glob escaping), `library.py` (176, library-issues + safe rename), `output.py` (495, rich console + renderers + plain/CSV/JSON outputs), `cli.py` (845, main() + collectors + arg parsing). Pattern across all sub-steps: top-level mutable globals (`PART_RES`, `_KEEP_RANKING`, `BASIC`, `RCLONE_BIN`, `console`, `USE_ANSI`) are read/written only inside their owning module; callers go through setters (`configure_part_patterns`, `set_keep_ranking`, `set_basic`, `set_rclone_bin`, `set_console_no_color`, `set_use_ansi`) so no in-tree code ever sees a stale captured binding. The `rc-jav.py` shim does `from rcjav import *` + `from rcjav.cli import main`, so `importlib.spec_from_file_location("rcjav_script", "rc-jav.py")` (used by tests/fixtures/native host) still finds every previously-top-level name. Each sub-step was verified at commit time via `python rc-jav.py --help`, `python -m rcjav.cli --help`, `python fixtures/run.py` (17/17 cases), and `python -m unittest tests.test_rules` (5/5).

(Step 10's cache-contract implementation was split off as step 10j below; it implements the design locked in step 9.)
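Why step 10 routes globals through setters: `from module import NAME` copies the binding at import time, so a later reassignment inside the module is invisible to the importer. A self-contained demonstration; the module source is a hypothetical stand-in for `rcjav/output.py`, built with `types.ModuleType` so it runs in one file.

```python
import types

# Hypothetical stand-in for a package module such as rcjav/output.py.
_OUTPUT_SRC = '''
BASIC = False

def set_basic(value):
    global BASIC
    BASIC = bool(value)

def render(text):
    return text if BASIC else "[rich] " + text
'''

output = types.ModuleType("output_demo")
exec(_OUTPUT_SRC, output.__dict__)

# `from output_demo import BASIC` would do exactly this: copy the current binding.
BASIC = output.BASIC

output.set_basic(True)

# The owning module sees the update; the captured copy does not.
print(output.BASIC, BASIC)  # prints: True False
```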
- Step 10j: **Cache contract implementation: done.** The two-tier contract from `docs/CACHE_CONTRACT.md` is now live end-to-end across Python + host + extension.

Python (`rcjav/cache.py` + `rcjav/ids.py` + `rcjav/cli.py`): new constants `CACHE_SCHEMA_VERSION = 1` and `ID_RULES_VERSION = 1`. New `current_rules_signature()` in `rcjav.ids` produces a stable sha256 over the canonical text of every rule that influences a `jav_id` (`PRIMARY_ID_RE`, `COMPOUND_ID_RE`, `FALLBACK_ID_RE`, `_NOHYPHEN_ID_RE`, `_BRACKET_ID_RE`, `_VARIANT_SUFFIX_RE`, `_XOFY_PRIORITY_RE`, `_RESOLUTION_TAG_RE`, `BUILTIN_PART_RES`, `PART_RES`, FC2 handling toggle). `load_cache(signature)` translates legacy `version: 3` headers in place with no forced rescan; the cache is stamped `id_rules: 0` + signature `"legacy"` so it reads as "stale by rules". `cache_state(cache, sig)` classifies it as `fresh` / `stale_by_rules` / `schema_mismatch`. `stamp_current_rules(cache, sig)` updates the header after a full scan or `--reextract`. New `rc-jav.py --reextract` walks `cache["remotes"][r]["files"]` against the live rule set and updates `jav_id` in place (no rclone). A full `--scan` (without `--scan-since`) stamps the current rules; an incremental `--scan --scan-since` deliberately does not. Verified on the live 7124-file cache.

Host (`rcjav-host.py`): a new `--print-rules-info` flag on the Python side cheaply returns `{cache_schema, id_rules, id_rules_signature}`. The host memoizes the result per script path in `_RULES_INFO_CACHE` and augments `cache_status` responses with `cache_schema`, `id_rules`, `id_rules_signature`, the corresponding `expected_*` constants, three `*_match` booleans, and `cache_state` (`fresh` / `stale_by_rules` / `schema_mismatch` / `missing`). Legacy `version: 3` caches still on disk are reported as `stale_by_rules` with `cache_schema_match: true` (they're migrated at the next `load_cache`). A new `reextract_ids` action forwards to `rc-jav.py --reextract --format json` with a 5-minute timeout.

Extension (`background.js` + `options-cache.js` + `options-library-issues.js`): a new `reextract-ids` message in `background.js` calls the host with a 300s timeout. `renderCacheContractBanner(r)` in `options-cache.js` paints the three-state inline banner above the per-remote list: green ✓ for fresh, amber ! for stale-by-rules (with a "Re-extract IDs (fast, no rescan)" chip button), red ✗ for schema mismatch. The delegated click handler in `options-library-issues.js` (which already owns the cache-status-results container) catches `.cache-reextract`, sends the message, shows a transient "Re-extracting…" state, and replaces the button with a per-remote summary line ("Re-extracted N IDs · X changed · Y unchanged · Z dropped"). `rules_info_error` from the host surfaces as a separate amber line above the banner.
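A condensed sketch of the three-state classification and the in-place re-extract walk described in step 10j. Header field names follow the host's `cache_status` keys, but where they live inside cache.json, the per-file `name` key, the summary shape, and every function name here are assumptions, not the `rcjav.cache` API. A JSON-encoded list serves as the canonical text, which is one way to keep the signature stable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional

CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def rules_signature(rule_texts: List[str]) -> str:
    """sha256 over a canonical serialization of the rule sources (order matters)."""
    canonical = json.dumps(rule_texts, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def migrate_legacy_header(header: Dict[str, Any]) -> Dict[str, Any]:
    """Legacy `version: 3` -> current schema, marked stale by rules (no rescan)."""
    if header.get("version") == 3 and "cache_schema" not in header:
        return {"cache_schema": CACHE_SCHEMA_VERSION, "id_rules": 0, "id_rules_signature": "legacy"}
    return header


def cache_state(header: Optional[Dict[str, Any]], signature: str) -> str:
    if header is None:
        return "missing"  # host-side state: no cache on disk
    if header.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"  # force a full rebuild
    if header.get("id_rules") != ID_RULES_VERSION or header.get("id_rules_signature") != signature:
        return "stale_by_rules"  # re-extract is enough
    return "fresh"


def reextract(cache: Dict[str, Any], extract_id: Callable[[str], Optional[str]]) -> Dict[str, Dict[str, int]]:
    """Recompute jav_id in place for every cached file; never calls rclone."""
    summary: Dict[str, Dict[str, int]] = {}
    for remote, data in cache.get("remotes", {}).items():
        counts = {"total": 0, "changed": 0, "unchanged": 0, "dropped": 0}
        for entry in data.get("files", []):
            old, new = entry.get("jav_id"), extract_id(entry["name"])
            counts["total"] += 1
            if new == old:
                counts["unchanged"] += 1
            elif new is None:
                counts["dropped"] += 1  # had an ID before; current rules find none
            else:
                counts["changed"] += 1
            entry["jav_id"] = new
        summary[remote] = counts
    return summary
```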
- Step 11: **Host fast-path benchmark: done, decision = keep.** `benchmarks/host-fast-path.py` (Python repo) compares `handle_cached_search_fast` against `rc-jav.py --search ID --cache --format json` on the live 7124-file cache. Idle baseline (5 queries × 5 iterations): fast-path median 0.46ms / p95 0.61ms / max 0.72ms; subprocess median 919ms / p95 1233ms / max 1385ms; **~2000× median speedup**. The ~920ms subprocess cost is structural (Python interpreter startup + 1.3 MB cache.json parse), so it applies with Python idle too, not just while a scan is running. The "Python actively scanning" condition from the original framing doesn't change the verdict; it would only make the subprocess path slower while leaving the in-process fast path unaffected. The fast path is already correctly scoped (it bails for wildcards, ranges, name searches, `--quick`). Possible follow-up (out of scope for this step; since shipped, see **Pending**): memoize `_load_host_cache` with mtime-based invalidation so the fast path doesn't reparse cache.json on every call; the per-call median was already fast enough that this was optional. See `benchmarks/README.md` for the full write-up.

**Architecture (locked; do not relitigate):**

- Sidebar = Console / Settings / Support tri-split. No dashboard pane. Status is carried by badges on tab labels (`Duplicate Review [27]`, `Cache & Scans [28m]`, `Library Issues [4]`).
- Default landing = Duplicate Review.
- Bulk ID Check = detached `chrome.windows.create` popup, NOT a Console sidebar tab. Single canonical entry path = popup launcher button.
- Keep Ranking Rules nested INSIDE Duplicate Review as a sub-tab, NOT a separate Settings tab.
- Sim Dupe: deleted from extension. Repo HTML harness in `samples/` only.
- Transfer Assistant: deleted. Diagnostics' Native host registration fieldset is the replacement (Extension ID copy + Repair Registration + Verify Registration buttons).
- Vanilla JS + ordered `<script>` tags. No framework, no build system.
- Inline rule tests stay next to rule editors (Matching Rules, Site Extraction). Standalone benches go to Debug Tools.

**Notes:**

- Repo was `git init`'d before step 6; baseline commit `f8e781f` is the rollback point. Steps 1-5 predate it, so rolling those back still means manual restore from the original conversation's diffs.
- Three pre-execution handoffs from the original plan have been resolved:
  - Recent Activity scope test → settled by code read (single role, all to Debug).
  - Diagnostics replacement for Transfer wizard → present (Extension ID, Repair, Verify all visible in one fieldset).
  - Popup launcher button label → settled in step 7a (📋 icon button next to ⚙ Options).

If a future session wants to continue: read this status block + open the mockup HTML files for the full spec. The original roadmap is closed (see **Pending**), so there is no pending step to resume.