rclone-jav (Brave extension + native messaging host)
Session memory for Claude. Read before making changes here.
Architecture
Brave tab title -> content script extracts JAV ID
-> background.js connectNative("com.rcjav.host")
-> host/rcjav-host.bat (portable: py launcher or python on PATH)
-> host/rcjav-host.py
-> subprocess python rc-jav.py --search ID --basic --no-color --format json
-> structured hits back through native port
-> popup or in-page overlay
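The native-port hops in the flow above use the browser's native messaging framing: each message is a 32-bit length in native byte order followed by UTF-8 JSON. A minimal sketch of that framing follows. The helper names (set_binary_stdio, read_message, write_message) and the example message shape are assumptions for illustration, not necessarily what rcjav-host.py uses.

```python
import io
import json
import struct
import sys


def set_binary_stdio():
    """On Windows, put stdin/stdout in binary mode so the length prefix survives."""
    if sys.platform == "win32":
        import msvcrt
        import os
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)


def read_message(stream):
    """Read one framed message; return None when the stream is closed or truncated."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)  # native byte order, 32-bit unsigned
    body = stream.read(length)
    if len(body) < length:
        return None
    return json.loads(body.decode("utf-8"))


def write_message(stream, payload):
    """Write one framed message: 4-byte length prefix, then the UTF-8 JSON body."""
    body = json.dumps(payload).encode("utf-8")
    stream.write(struct.pack("=I", len(body)))
    stream.write(body)
    stream.flush()


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of real stdio.
    buf = io.BytesIO()
    write_message(buf, {"action": "search", "id": "ABC-027"})  # hypothetical shape
    buf.seek(0)
    print(read_message(buf))
```

In a real host the streams would be sys.stdin.buffer and sys.stdout.buffer, with set_binary_stdio() called once at startup.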
Two separate codebases:
- This repo: Brave extension + native messaging host.
- D:\DEV\Project\rclone-jav\: Python rc-jav CLI. The host shells out to rc-jav.py here.
Folder layout (post-rename)
D:\DEV\Extensions\Production\rclone-jav\ (PC 1)
D:\DEV\Extensions\Staging\rclone-jav\ (PC 2)
├── manifest.json
├── background.js
├── content.js
├── popup.{html,js,css}
├── options.{html,js}
├── host\
│ ├── rcjav-host.py
│ ├── rcjav-host.bat (portable: py launcher fallback)
│ ├── install-host.ps1 (self-elevates to HKLM)
│ ├── register-host.bat (prompts for ID, calls install-host.ps1)
│ ├── com.rcjav.host.json (generated; UTF-8 NO BOM)
│ └── (logs)
└── docs\
├── INSTALL.md (gotcha table at the bottom)
└── README.md
Critical gotchas (learned the hard way)
| Symptom | Cause | Fix |
|---|---|---|
| "Specified native messaging host not found" | UTF-8 BOM in com.rcjav.host.json | WriteAllText with UTF8Encoding($false) |
| Same error after registering HKCU | Brave on Windows ignores HKCU on some installs | Register HKLM too. install-host.ps1 does both. |
| Host launches then disconnects | Python text-mode stdio mangles 4-byte length prefix | msvcrt.setmode(stdin/stdout, O_BINARY) at host startup |
| Host log says "stdin closed, exiting" immediately | bat-side stderr leak corrupts protocol | python -u + redirect stderr to log file |
| Missing closing '}' in install-host.ps1 | Em-dashes in comments + LF endings + Windows PS 5.1 (cp1252 fallback) | Strip em-dashes from .ps1 files, or save with BOM, or use pwsh |
| Brave reload != Brave restart | NM cache survives extension reload | Kill all brave.exe processes then reopen |
| IBW-902z page title fails to parse | \b after \d blocked by following word char | Extension regex uses [a-zA-Z]?\b trailing (captured but discarded) |
| Delete safety too broad | Allowlist reduced cq:JAV to cq: | Match full configured prefixes, not remote roots |
| Overlay feels ~1.5s late on SPA pages | SPA_SETTLE_MS waits before auto-check | Current value is 800ms; tune carefully if detection gets flaky |
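The BOM gotcha in the table is easy to check mechanically. The sketch below is a Python illustration only (the real manifest is written by install-host.ps1); the helper names and the placeholder manifest fields are assumptions.

```python
import codecs
import json
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8


def write_manifest_no_bom(path, manifest):
    """Write JSON as plain UTF-8. encoding="utf-8" never emits a BOM; "utf-8-sig" would."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        json.dump(manifest, fh, indent=2)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "com.rcjav.host.json")
    # Placeholder fields for illustration, not the real registration content.
    write_manifest_no_bom(target, {"name": "com.rcjav.host", "type": "stdio"})
    print("BOM present:", has_utf8_bom(target))
```

Running has_utf8_bom against host\com.rcjav.host.json is a quick first check when the browser reports that the native messaging host was not found.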
Internal names — keep as-is
- Native messaging host: com.rcjav.host (NOT renamed despite extension rename)
- Window flag in content.js: __rclonex_loaded__ (idempotency guard for content script re-injection)
- CSS IDs starting with rclonex- (overlay)
- Host logs: host/logs/rcjav-host.log, host/logs/rcjav-host-events.log, host/logs/rcjav-host-stderr.log, host/logs/deletes.log
- Host scan progress state: host/state/scan-state.json
Don't rename these unless there's a real reason. They're orthogonal to the user-facing extension name.
Settings
Stored in chrome.storage.sync under key settings. Per-extension-ID namespacing → if extension is reloaded under a different path, settings are wiped.
Backup/restore lives in Options → Setup (formerly "Setup & Transfer"): JSON export/import to survive reloads or PC migrations. Use it before renaming or relocating the extension.
DEFAULT_SETTINGS lives in background.js. Keep in sync with options.html defaults.
Decision log
Deletion allowlist uses full prefixes (2026-05-20)
Decision: host delete allowlist must use full configured path prefixes (cq:JAV, trash dir, etc.), not only remote roots like cq:.
Reasoning: Reducing cq:JAV to cq: lets any path on the same rclone remote pass the safety check. Deletion is opt-in but must be tightly scoped.
Important: extension delete calls must forward rcjav_path, or the host may read the wrong config.json and derive the wrong allowlist.
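The full-prefix rule can be sketched as a small predicate. This is an illustration under stated assumptions, not the host's actual check: the function name is hypothetical, and the choices to reject bare remote roots, ".." segments, and the prefix directory itself are extra guards added here.

```python
def is_delete_allowed(path, allowed_prefixes):
    """Allow a delete only when path sits strictly under a full configured prefix."""
    normalized = path.replace("\\", "/").rstrip("/")
    if ".." in normalized.split("/"):
        # Assumption: refuse parent-directory segments rather than resolve them.
        return False
    for prefix in allowed_prefixes:
        base = prefix.replace("\\", "/").rstrip("/")
        if not base or base.endswith(":"):
            # A bare remote root such as "cq:" would cover the whole remote; skip it.
            continue
        # Require a "/" boundary so "cq:JAV" does not also match "cq:JAVX/...".
        if normalized.startswith(base + "/"):
            return True
    return False


if __name__ == "__main__":
    prefixes = ["cq:JAV"]
    for candidate in ("cq:JAV/ABC-027.mp4", "cq:Other/ABC-027.mp4", "cq:JAVX/a.mp4"):
        print(candidate, is_delete_allowed(candidate, prefixes))
```

The "/" boundary matters as much as the prefix itself: a plain startswith("cq:JAV") would accept sibling directories whose names merely begin with the same characters.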
Toolbar popup setting gates auto-check (2026-05-20)
Decision: triggers.toolbarClick does not remove the MV3 popup, but it does gate whether the popup auto-runs checkTab on open. If disabled, popup stays idle until user clicks Re-Scan.
Quick search and ID padding (2026-05-20)
Decision: rc-jav canonical JAV IDs use at least 3 digits (ABC-027) and preserve 4+ digit IDs (ABCD-1294). Quick search emits canonical uppercase globs only.
Reasoning: user clarified real JAV filenames are never ABC-27 or ABC-0027; they are ABC-027. User also never uses lowercase filenames, so quick search should not use rclone --ignore-case because it added noticeable delay.
Operational note: this changes cache keys. Run python rc-jav.py --scan in D:\DEV\Project\rclone-jav after this change.
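The padding rule above can be sketched as follows. The regex is a deliberately simplified, hypothetical shape (letters, optional hyphen, digits); the real extraction rules in rc-jav are broader.

```python
import re

# Hypothetical simplified ID shape, for illustration only.
_SIMPLE_ID_RE = re.compile(r"^([A-Za-z]+)-?(\d+)$")


def canonical_id(raw):
    """Uppercase the prefix and left-pad the number to at least 3 digits."""
    match = _SIMPLE_ID_RE.match(raw.strip())
    if match is None:
        return None
    prefix, digits = match.groups()
    # zfill(3) pads "27" to "027" and leaves 4+ digit numbers such as "1294" alone.
    return f"{prefix.upper()}-{digits.zfill(3)}"


if __name__ == "__main__":
    for sample in ("abc-27", "ABC-027", "ABCD-1294"):
        print(sample, "->", canonical_id(sample))
```

Because the canonical form feeds cache keys, any change to this rule implies a rescan, which is what the operational note above is about.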
No-match overlay metadata (2026-05-20)
Decision: host search response includes cache_meta and scanned_remotes from rc-jav JSON so no-match overlays can show what was scanned instead of falling back to "library".
IBW-902z trailing letter (2026-05-20)
Decision: minimal regex fix in extension only. NOT a full variant-suffix rewrite of the index.
Reasoning: User's library uses one ID per number (either IBW-902 OR IBW-902z, not both). Page titles failing on IBW-902z is the real bug. Extension regex now matches optional trailing letter and discards it. rc-jav's index continues to strip trailing letters at extract_id time. Effective: extension queries IBW-902 for any title IBW-902 or IBW-902z, finds the file regardless of how it's named on rclone.
Revisit if: both IBW-902.mp4 and IBW-902z.mp4 ever coexist in library — they'd collide on the same ID. Then implement variant suffix (#var_Z) end-to-end.
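The word-boundary failure is easy to reproduce. The sketch below uses Python's re module with two hypothetical, simplified patterns; the real content.js regex is JavaScript and covers more cases, but \b behaves the same way for ASCII input in both engines.

```python
import re

# Hypothetical simplified patterns, for illustration only.
STRICT = re.compile(r"\b([A-Za-z]{2,6})-(\d{2,5})\b")
LENIENT = re.compile(r"\b([A-Za-z]{2,6})-(\d{2,5})[a-zA-Z]?\b")


def extract(pattern, title):
    """Return PREFIX-NUMBER from a page title, dropping any trailing letter."""
    match = pattern.search(title)
    if match is None:
        return None
    return f"{match.group(1).upper()}-{match.group(2)}"


if __name__ == "__main__":
    title = "IBW-902z Some Title"
    # STRICT fails: there is no word boundary between "2" and "z".
    print("strict: ", extract(STRICT, title))
    # LENIENT consumes the optional letter, then finds the boundary before the space.
    print("lenient:", extract(LENIENT, title))
```

The trailing letter sits outside both capture groups, so the query sent to the host is the bare ID in either title form.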
Native messaging host name stayed com.rcjav.host
When extension was renamed rclonex → rclone-jav, the NM host name was NOT renamed. Reason: zero user impact (it's an internal identifier in registry/manifest), but every rename costs registry rewrites + script churn. Not worth it.
WinCatalog backslash normalization
Done in rc-jav catalog loading. Catalog CSV/XML paths are normalized from Windows \ to rclone-style / before the extension sees them.
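A minimal sketch of that normalization, assuming a CSV export with a hypothetical "Path" column (the real WinCatalog column names and the actual loader in rc-jav may differ):

```python
import csv
import io


def to_rclone_path(path):
    """Convert Windows backslash separators to the forward slashes rclone uses."""
    return path.replace("\\", "/")


def load_catalog_paths(csv_text, column="Path"):
    """Read catalog rows from CSV text and return normalized paths."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [to_rclone_path(row[column]) for row in reader]


if __name__ == "__main__":
    sample = "Name,Path\nIBW-902.mp4,JAV\\IBW\\IBW-902.mp4\n"
    print(load_catalog_paths(sample))
```

Normalizing once at load time means later path comparisons, including the delete allowlist, only ever see one separator style.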
When making changes
- Extension settings schema change → update DEFAULT_SETTINGS in background.js AND defaults in options.html + options.js load()
- New native messaging action → handler in rcjav-host.py + DISPATCH map + extension code that sends it
- New options pane → sidebar item in options.html + new .pane div + load/save bindings in options.js
- Any rc-jav.py CLI change → host invocation in rcjav-host.py handle_search must keep pace
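The second and fourth items of the checklist above can be sketched together: a dispatch map keyed by action name, plus a single place that builds the rc-jav.py command line. DISPATCH and handle_search are named in these notes; everything else here (message shape, handle_ping, build_search_argv, use of sys.executable) is an assumption for illustration.

```python
import sys


def build_search_argv(script_path, jav_id):
    """Build the CLI call from the architecture flow. Keep in step with rc-jav.py flags."""
    # Assumption: sys.executable stands in for whatever interpreter the .bat resolves.
    return [sys.executable, script_path, "--search", jav_id,
            "--basic", "--no-color", "--format", "json"]


def handle_ping(message):
    return {"ok": True, "action": "ping"}


def handle_search(message):
    # A real handler would run this argv with subprocess and parse the JSON output.
    argv = build_search_argv(message.get("rcjav_path", "rc-jav.py"), message["id"])
    return {"ok": True, "action": "search", "would_run": argv}


# Adding a new action means adding a handler AND registering it here.
DISPATCH = {"ping": handle_ping, "search": handle_search}


def dispatch(message):
    """Route a decoded message to its handler; unknown actions become error replies."""
    action = message.get("action")
    handler = DISPATCH.get(action)
    if handler is None:
        return {"ok": False, "error": f"unknown action: {action!r}"}
    try:
        return handler(message)
    except Exception as exc:  # never let a handler crash the message loop
        return {"ok": False, "error": str(exc)}


if __name__ == "__main__":
    print(dispatch({"action": "search", "id": "ABC-027"}))
    print(dispatch({"action": "nope"}))
```

Keeping the argv in one function gives a CLI flag change a single place to land on the host side.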
Console consolidation refactor — execution status
Spec / blueprint:
- D:\DEV\Project\rclone-jav\mockups\console-consolidation-claude.html (refactor spec: decision table, sequence, acceptance criteria)
- D:\DEV\Project\rclone-jav\mockups\console-consolidation-options.html (Codex's visual annotation variant)
Shipped (in execution order):
- Sim Dupe deleted from popup. Button + click handler removed from popup.html / popup.js. Payload preserved in samples/sim-dupe.js for future layout work.
- CSS extracted from options.html. Embedded <style> block moved to options.css, linked via <link rel="stylesheet">. options.html went 1179 → 794 lines. Inline style="..." attributes intentionally left for later (step 6 territory).
- Transfer Assistant wizard deleted. "Setup & Transfer" pane renamed to "Setup". Replacement: Extension ID display + Copy button added to Diagnostics → Native host registration fieldset (always visible, not failure-gated). Sidebar entry, fieldset, modal, and ~107 lines of JS removed.
- Recent Activity + Search Troubleshooting moved to new Debug Tools pane. Verified Recent Activity is search-trigger-only by reading background.js: recordActivity() is NOT called from the delete-file handler. No audit-value split needed. New sidebar entry "Debug Tools" under System group; new pane-debug houses both fieldsets.
- options.js split: Cache & Scans + Duplicate Review paired extraction. options.js 3133 → 2356 lines. New files: options-cache.js (161 lines, Cache & Scans block), options-dupe-review.js (616 lines, Dup Review + Keep Ranking incl. bottom loadKeepRanking() call). Script-tag order in options.html: cache → dupe-review → options.js (body bottom). Cross-script binding visibility (vanilla classic scripts share global declarative env): Library Issues code still in options.js reads _configuredScanRoots / _cacheSkippedByRemote / calls rememberConfiguredScanRoots from cache file by bare reference. Calls to escapeHtml / openModal / closeModal / keepActionViewport / clearNativeRepairCard / renderNativeMessagingFailure from extracted files all occur inside event handlers (resolved at call time, after options.js parses). Repo git init'd before this step; baseline commit f8e781f is the rollback point. Verified by node --check on each file and on concatenated script.

6b. options.js split: Library Issues extraction. options.js 2356 → 1903 lines. New file: options-library-issues.js (453 lines), which covers lastLibraryIssues, _libraryIssuesDirty, renderLibraryIssues, _closeLibraryIssues, and the bottom IIFE that wraps _optScanTimer / _setOptScanningState / _pollOptProgress for optimization-scan progress polling. Block was fully self-contained (no external callers of its identifiers). Reads _configuredScanRoots / _cacheSkippedByRemote / calls rememberConfiguredScanRoots from options-cache.js: same cross-file binding pattern proven in step 6. Script-tag order in options.html: cache → dupe-review → library-issues → options.js. node --check passes on each file and on concatenation; line count of concat (3133) matches pre-split total exactly.

7a. Bulk Check standalone window. New bulk-check.{html,js,css} opened as detached chrome.windows.create({ type: 'popup', width: 640, height: 540 }). Launcher = 📋 icon button in popup header next to ⚙ Options; click sends open-bulk-check message to background and closes the popup. Background owns the lifecycle: openBulkCheckWindow() reads chrome.storage.session.bulkCheckWindowId; existing id → chrome.windows.update({ focused, drawAttention }); failure or no id → create new window + stash id. chrome.windows.onRemoved clears the stale id on close. Last-paste persisted to chrome.storage.local.bulkCheckLastPaste (debounced 500ms), restored on window open. quickMode read from settings on each run (parity with old options behavior). Removed the Bulk ID Check fieldset from options.html (Library Review pane description updated to note the relocation) and its handlers from options.js (1903 → 1852 lines). No manifest permission changes needed.

- Shared fixture corpus. Seeded D:\DEV\Project\rclone-jav\fixtures\ (top-level in the Python repo, conceptually shared with this extension). Files: filename-extraction.json (12 cases, Python extract_id contract), query-extraction.json (10 cases, extension content.js normalizeId contract), shared-normalization.json (5 cases, both sides must agree), README.md, and a self-contained Python runner run.py (no third-party deps; imports rc-jav.py in place). All 17 Python-side cases pass against current rc-jav.py. The runner uses | and -> instead of · and → so it works on Windows cp1252 consoles. Documented one intentional divergence: the extension normalizes the compact FC2PPV1841460 form (page-title surface) while Python extract_id does not (filename surface; compact form doesn't appear on disk). No Node-side runner today: content.js lives in an injected IIFE and importing it would require duplicating regexes; the JSON corpus is the canonical spec until that lands.
- Cache contract design: shipped as a design doc, not code. docs/CACHE_CONTRACT.md defines a two-tier model that splits today's single CACHE_VERSION = 3 into cache_schema (force rebuild on mismatch) and id_rules (mark stale, allow lazy re-extract without re-scanning). Adds id_rules_signature (sha256 over canonical text of all extraction-rule sources, including user-added normalizers from config.json) as a belt-and-braces drift check. Specifies the new cache header shape, a one-shot in-place migration for users on legacy version: 3 (no forced rescan), the behavior matrix for the three resulting states, and the extension's three-state UX (fresh / stale-by-rules amber / schema-mismatch red) with a new "Re-extract IDs" action that walks files[] in place and never touches rclone. Step 10 implements; step 9 only locks the contract.
(Step 4 in the plan is a paired-extraction sub-task of step 6; folded into step 6 ship.)
6c. options.js split — Diagnostics + Profiles + Rules Editors extracted. Final mechanical split. Three new files: options-diagnostics.js (245 lines — extension ID display, runDiagnostics, host status, host repair, native messaging failure renderer), options-profiles.js (265 lines — _knownRemotes, _cfgDefaults, fetchRemotes, buildRemotePicker, profile modal), options-rules-editors.js (328 lines — adapters + ID normalizers + custom part detectors with their feedback UI). options.js is now 1014 lines — entry IIFE, settings load/save, backup/restore, recent activity, search test bench, element picker, overlay previews, no-match overlay, escapeHtml, and paths. The picker + overlay-preview code stays because it's tightly coupled across multiple settings panes and the JS-DOM call graph would have to be untangled to extract cleanly. Script-tag order in options.html now: cache → dupe-review → library-issues → diagnostics → profiles → rules-editors → options.js (entry). node --check clean on each file individually and on the concatenated load-order stream. Concat = 3144 lines, matching the pre-6c sum exactly.
Pending: none. Original roadmap closed. Follow-ups recorded inline above (e.g. step 11's _load_host_cache memoization is already shipped via the _cache_mem stamp dict). Node-side fixture runner (fixtures/run-node.mjs) added so shared-normalization.json now genuinely guards cross-side drift — the original step 8 ship noted the gap; it's closed.
10. rc-jav.py package split — done (sub-steps 10a–10i, shipped across two sessions). Python repo at D:\DEV\Project\rclone-jav\ is now git-tracked (baseline e029e89); rc-jav.py went from 2230 lines to a 25-line shim. New rcjav/ package contains: model.py (24, FileEntry), ids.py (243, ID extraction + part detection + normalization + describe_id_match + expand_range), cache.py (76, cache.json I/O), catalog.py (178, WinCatalog CSV/XML), dupes.py (264, keep-ranking + find_dupes + variant alerts), rclone_io.py (298, subprocess wrappers + walk_remote + glob escaping), library.py (176, library-issues + safe rename), output.py (495, rich console + renderers + plain/CSV/JSON outputs), cli.py (845, main() + collectors + arg parsing). Pattern across all sub-steps: top-level mutable globals (PART_RES, _KEEP_RANKING, BASIC, RCLONE_BIN, console, USE_ANSI) are read/written only inside their owning module — callers go through setters (configure_part_patterns, set_keep_ranking, set_basic, set_rclone_bin, set_console_no_color, set_use_ansi) so no in-tree code ever sees a stale captured binding. rc-jav.py shim does from rcjav import * + from rcjav.cli import main, so importlib.spec_from_file_location("rcjav_script", "rc-jav.py") (used by tests/fixtures/native host) still finds every previously-top-level name. Each sub-step verified at commit time via python rc-jav.py --help, python -m rcjav.cli --help, python fixtures/run.py (17/17 cases), and python -m unittest tests.test_rules (5/5).
(Step 10's cache-contract implementation is split off as step 10j below — design from step 9 is locked, implementation hasn't shipped.)
10j. Cache contract implementation — done. Two-tier contract from docs/CACHE_CONTRACT.md now live end-to-end across Python + host + extension.
Python (rcjav/cache.py + rcjav/ids.py + rcjav/cli.py): new constants CACHE_SCHEMA_VERSION = 1 and ID_RULES_VERSION = 1. New current_rules_signature() in rcjav.ids produces a stable sha256 over the canonical text of every rule that influences a jav_id (PRIMARY_ID_RE, COMPOUND_ID_RE, FALLBACK_ID_RE, _NOHYPHEN_ID_RE, _BRACKET_ID_RE, _VARIANT_SUFFIX_RE, _XOFY_PRIORITY_RE, _RESOLUTION_TAG_RE, BUILTIN_PART_RES, PART_RES, FC2 handling toggle). load_cache(signature) translates legacy version: 3 headers in place — no forced rescan; the cache is stamped id_rules: 0 + signature "legacy" so it reads as "stale by rules". cache_state(cache, sig) classifies as fresh / stale_by_rules / schema_mismatch. stamp_current_rules(cache, sig) updates the header after a full scan or --reextract. New rc-jav.py --reextract walks cache["remotes"][r]["files"] against the live rule set and updates jav_id in place (no rclone). Full --scan (without --scan-since) stamps current rules; incremental --scan --scan-since deliberately does not. Verified on the live 7124-file cache.
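The three-state classification described above can be sketched in a few lines. This is an illustration of the contract, not the code in rcjav/cache.py: the header key names mirror the fields named in these notes, while rules_signature, migrate_legacy, and the NUL separator are assumptions.

```python
import hashlib

CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def rules_signature(rule_sources):
    """sha256 over the rule texts in order; a separator keeps joins unambiguous."""
    digest = hashlib.sha256()
    for text in rule_sources:
        digest.update(text.encode("utf-8"))
        digest.update(b"\x00")
    return digest.hexdigest()


def migrate_legacy(cache):
    """Translate a legacy 'version: 3' header in place of a forced rescan."""
    if cache.get("version") == 3 and "cache_schema" not in cache:
        migrated = {k: v for k, v in cache.items() if k != "version"}
        migrated.update(cache_schema=CACHE_SCHEMA_VERSION, id_rules=0,
                        id_rules_signature="legacy")
        return migrated
    return cache


def cache_state(cache, signature):
    """Classify a cache header as fresh, stale_by_rules, or schema_mismatch."""
    if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"  # layout changed: rebuild required
    if (cache.get("id_rules") != ID_RULES_VERSION
            or cache.get("id_rules_signature") != signature):
        return "stale_by_rules"  # files are fine; IDs may need re-extraction
    return "fresh"


if __name__ == "__main__":
    sig = rules_signature([r"([A-Z]+)-(\d+)", r"FC2"])  # placeholder rule texts
    print(cache_state(migrate_legacy({"version": 3, "remotes": {}}), sig))
```

Checking the schema first matters: a schema mismatch means the file layout itself cannot be trusted, so rule staleness is only meaningful once the schema matches.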
Host (rcjav-host.py): new --print-rules-info flag on the Python side returns {cache_schema, id_rules, id_rules_signature} cheaply. Host memoizes the result per script path in _RULES_INFO_CACHE and augments cache_status responses with cache_schema, id_rules, id_rules_signature, the corresponding expected_* constants, three *_match booleans, and cache_state (fresh / stale_by_rules / schema_mismatch / missing). Legacy version: 3 caches still on disk are reported as stale_by_rules with cache_schema_match: true (we'll migrate them at next load_cache). New reextract_ids action forwards to rc-jav.py --reextract --format json with a 5-minute timeout.
Extension (background.js + options-cache.js + options-library-issues.js): new reextract-ids message in background.js calls the host with a 300s timeout. renderCacheContractBanner(r) in options-cache.js paints the three-state inline banner above the per-remote list — green ✓ for fresh, amber ! for stale-by-rules (with a "Re-extract IDs (fast, no rescan)" chip button), red ✗ for schema mismatch. The delegated click handler in options-library-issues.js (which already owns the cache-status-results container) catches .cache-reextract, sends the message, shows a transient "Re-extracting…" state, and replaces the button with a per-remote summary line ("Re-extracted N IDs · X changed · Y unchanged · Z dropped"). rules_info_error from the host surfaces as a separate amber line above the banner.
11. Host fast-path benchmark — done, decision = keep. benchmarks/host-fast-path.py (Python repo) compares handle_cached_search_fast against rc-jav.py --search ID --cache --format json on the live 7124-file cache. Idle baseline (5 queries × 5 iterations): fast-path median 0.46ms / p95 0.61ms / max 0.72ms; subprocess median 919ms / p95 1233ms / max 1385ms; 2000× median speedup. The ~920ms subprocess cost is structural — Python interpreter startup + 1.3 MB cache.json parse — so it applies under idle Python too, not just when a scan is running. The "Python actively scanning" condition from the original framing doesn't change the verdict; it would only make the subprocess path slower while leaving the in-process fast path unaffected. Fast path is already correctly scoped (bails for wildcards, ranges, name searches, --quick). Possible follow-up (not in scope): memoize _load_host_cache with mtime-based invalidation so the fast path doesn't reparse cache.json on every call — current per-call median is already fast enough that this is optional. See benchmarks/README.md for the full write-up.
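The mtime-based memoization mentioned above can be sketched as follows. _cache_mem is the name these notes use for the stamp dict; the function name and the choice of (mtime, size) as the stamp are assumptions, and the real _load_host_cache may differ.

```python
import json
import os
import tempfile

_cache_mem = {}  # path -> (stamp, parsed data)


def load_json_memoized(path):
    """Parse a JSON file once per on-disk version, keyed by (mtime_ns, size)."""
    st = os.stat(path)
    stamp = (st.st_mtime_ns, st.st_size)
    hit = _cache_mem.get(path)
    if hit is not None and hit[0] == stamp:
        return hit[1]  # file unchanged since last parse
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    _cache_mem[path] = (stamp, data)
    return data


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "cache.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"remotes": {}}, fh)
    first = load_json_memoized(path)
    second = load_json_memoized(path)
    print("reused parsed object:", first is second)
```

Callers get the same parsed object back on a hit, so they should treat it as read-only; a stat call per lookup is far cheaper than reparsing a 1.3 MB file.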
Architecture (locked — do not relitigate):
- Sidebar = Console / Settings / Support tri-split. No dashboard pane. Status carried by badges on tab labels (Duplicate Review [27], Cache & Scans [28m], Library Issues [4]).
- Default landing = Duplicate Review.
- Bulk ID Check = detached chrome.windows.create popup, NOT a Console sidebar tab. Single canonical entry path = popup launcher button.
- Keep Ranking Rules nested INSIDE Duplicate Review as a sub-tab, NOT a separate Settings tab.
- Sim Dupe: deleted from extension. Repo HTML harness in samples/ only.
- Transfer Assistant: deleted. Diagnostics' Native host registration fieldset is the replacement (Extension ID copy + Repair Registration + Verify Registration buttons).
- Vanilla JS + ordered <script> tags. No framework, no build system.
- Inline rule tests stay next to rule editors (Matching Rules, Site Extraction). Standalone benches go to Debug Tools.
Notes:
- Repo was NOT git-initialized when the early steps shipped. Rollback for those steps = manual restore from this conversation's diffs. git init was run before step 6 (the big one); baseline commit f8e781f is the rollback point from there on.
- Three pre-execution handoffs from the original plan have been resolved:
  - Recent Activity scope test → settled by code read (single role, all to Debug).
  - Diagnostics replacement for Transfer wizard → present (Extension ID, Repair, Verify all visible in one fieldset).
  - Popup launcher button label → deferred until step 7a; text + emoji were in the mockup at the time.
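If a future session wants to continue: read this status block + open the mockup HTML files for the full spec. Step 6 and everything after it have shipped (see Pending above), so start from the follow-ups recorded inline rather than resuming a roadmap step.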