# rc-jav (Python CLI)

Session memory for Codex. Read before making changes here.

## What this is

A read-only rclone library comparison + search CLI. Compares `cq:JAV` remote (rclone crypt) against itself (dupe detection) or against external WinCatalog CSV/XML exports. Powers the rclone-jav Brave extension via native messaging.

## Architecture

```
rc-jav.py
├── reads config.json (default_target etc.)
├── reads cache.json (per-remote file index, written by --scan)
├── shells out to: rclone lsf / rclone lsjson / rclone size --json
├── extract_id() per filename → normalized ID with optional #partN / variant suffix
├── two query modes: --quick (live rclone --include glob) and cached (uses cache.json)
└── output: rich tables (default) | --basic plain | --format json (for extension)
```

## Files

```
D:\DEV\Project\rclone-jav\
├── rc-jav.py       single-file CLI
├── config.json     default_source/target/catalog (user-editable via --save)
├── cache.json      scanned remote file index (written by --scan)
├── wincatalog\     drop WinCatalog CSV/XML exports here (auto-loaded)
├── TODO.md         deferred work
└── README.md
```

## Companion project

`D:\DEV\Extensions\Production\rclone-jav\` (PC 1) / `D:\DEV\Extensions\Staging\rclone-jav\` (PC 2): Brave extension + native messaging host that shells out to `rc-jav.py` for searches.

## ID normalization

- `extract_id()` chops trailing single letters (e.g. `IBW-902z.mp4` → `IBW-902`). This is intentional; see the extension's AGENTS.md "Decision log".
- JAV IDs are canonicalized with at least 3 digits (`ABC-27` → `ABC-027`); 4+ digit IDs keep their width (`ABCD-1294`). User expects real JAV IDs to be `ABC-027`, never `ABC-27` or `ABC-0027`.
- Part suffix detection: `_1`, `-pt1`, `(1)` → appended as `#partN` for distinctness.
- Compound prefixes (`FC2-PPV-123`) are handled via a secondary regex.
- Search matcher does prefix lookup so `IBW-902` finds both `IBW-902` and `IBW-902#part1` etc.
- Quick search must emit only canonical padded uppercase globs (`ABC-027*`, `ABCDE-1167*`). Do not add `--ignore-case`; user never uses lowercase filenames and it caused noticeable delay.

## Defaults from earlier sessions

- `cq:JAV` is the current remote root (after the rclone crypt config change moved it down a level).
- `default_target` in config.json = `["cq:JAV"]` (hardcoded fallback in cli.py matches).
- `human_size()` formats to 2 decimals (e.g. `6.94 GiB`).
- After the 3-digit ID canonicalization change, run `python rc-jav.py --scan` to rebuild `cache.json` under the new padded keys.
- Duplicate KEEP ranking uses configurable VIP folders before source/size/format ranking. Default VIP folder is `ClearJAV`; video files there are treated as the trusted direct-rip copy.
- Duplicate KEEP ranking treats `.ts` as the lowest-priority video container when any non-`.ts` duplicate is available.

## Recent decisions / bug fixes

- `--format json` should keep stdout as clean JSON. Status/progress text belongs on stderr in JSON mode.
- Catalog rows are informational. CSV exports mark them as `CATALOG`; JSON exports put them under `catalog`, not `delete_candidates`.
- Cache loading validates the top-level shape and falls back to an empty cache when `remotes` is missing or malformed.
- The old `--recursive/-R` flag was removed because scans are always recursive (`rclone lsf -R` / quick `lsjson -R`).

## TODO

See `TODO.md` for deferred work.

## When making changes

- Adding CLI flags: also update the host invocation in `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py` if the flag matters to the extension.
- Changing `extract_id()` semantics: forces a `--scan` to rebuild the cache under new keys, and may need a parallel change in the extension's `normalizeId()`.
- JSON output format changes: the extension's popup.js / overlay rendering reads the `structured` array; keep field names stable (`source`, `remote`, `path`, `full_path`, `size`, `size_human`, `mod_time`, `jav_id`).
- Config schema: update the `--save` writer and any defaults.
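
Before changing `extract_id()` semantics, it can help to pin down the expected outputs for the rules listed under "ID normalization". The sketch below is a minimal illustration of those rules, not the real `extract_id()` in `rc-jav.py`: the function name `sketch_extract_id`, the three regexes, and the 2 to 6 letter prefix limit are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; the real extract_id() may differ.
# Compound prefixes such as FC2-PPV-123 are tried first (the "secondary regex").
_COMPOUND = re.compile(r"(FC2-PPV)-(\d+)", re.IGNORECASE)
# Assumed shape of a plain ID: 2-6 letters, a hyphen, then digits.
_SIMPLE = re.compile(r"([A-Za-z]{2,6})-(\d+)")
# Part suffixes named in the notes: _1, -pt1, (1).
_PART = re.compile(r"^(?:_(\d+)|-pt(\d+)|\((\d+)\))", re.IGNORECASE)


def sketch_extract_id(filename):
    """Return a canonical ID such as 'ABC-027' or 'ABC-027#part1', or None."""
    # Drop the extension (everything after the last dot), if there is one.
    stem = filename.rsplit(".", 1)[0]

    match = _COMPOUND.search(stem) or _SIMPLE.search(stem)
    if match is None:
        return None

    prefix = match.group(1).upper()
    # Pad to at least 3 digits; 4+ digit IDs keep their width.
    digits = match.group(2).zfill(3)
    jav_id = f"{prefix}-{digits}"

    # Whatever follows the digits is either a part suffix or ignored.
    # A trailing single letter (the "z" in IBW-902z) is ignored here,
    # which has the same effect as chopping it.
    rest = stem[match.end():]
    part = _PART.match(rest)
    if part:
        number = next(group for group in part.groups() if group)
        jav_id += f"#part{int(number)}"
    return jav_id
```

Under these assumptions, `sketch_extract_id("ABC-27_1.mp4")` yields `ABC-027#part1`. If a proposed change makes any such output differ from what is already stored, that is the signal that `cache.json` needs a `--scan` rebuild and that the extension's `normalizeId()` may need the same change.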