# rc-jav (Python CLI)
Session memory for Codex. Read before making changes here.
## What this is
A read-only rclone library comparison + search CLI. Compares `cq:JAV` remote (rclone crypt) against itself (dupe detection) or against external WinCatalog CSV/XML exports. Powers the rclone-jav Brave extension via native messaging.
## Architecture
```
rc-jav.py
├── reads config.json (default_target etc.)
├── reads cache.json (per-remote file index, written by --scan)
├── shells out to: rclone lsf / rclone lsjson / rclone size --json
├── extract_id() per filename → normalized ID with optional #partN / variant suffix
├── two query modes: --quick (live rclone --include glob) and cached (uses cache.json)
└── output: rich tables (default) | --basic plain | --format json (for extension)
```
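A minimal sketch of how the `--quick` path could shell out to rclone, assuming a single canonical ID per lookup. `build_quick_cmd` and `quick_search` are hypothetical names, not the functions in `rc-jav.py`; the rclone flags used (`lsjson`, `-R`, `--files-only`, `--include`) are standard rclone options.

```python
import json
import subprocess


def build_quick_cmd(remote: str, canonical_id: str) -> list[str]:
    """Build the argv for a live lookup (hypothetical helper)."""
    # Canonical padded uppercase glob only; deliberately no --ignore-case.
    return [
        "rclone", "lsjson", "-R", "--files-only",
        "--include", f"{canonical_id}*",
        remote,
    ]


def quick_search(remote: str, canonical_id: str) -> list[dict]:
    """Run rclone and return its parsed JSON listing."""
    proc = subprocess.run(
        build_quick_cmd(remote, canonical_id),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Passing the command as a list avoids shell quoting problems with the `*` in the glob.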
## Files
```
D:\DEV\Project\rclone-jav\
├── rc-jav.py      single-file CLI
├── config.json    default_source/target/catalog (user-editable via --save)
├── cache.json     scanned remote file index (written by --scan)
├── wincatalog\    drop WinCatalog CSV/XML exports here (auto-loaded)
├── TODO.md        deferred work
└── README.md
```
## Companion project
`D:\DEV\Extensions\Production\rclone-jav\` (PC 1) / `D:\DEV\Extensions\Staging\rclone-jav\` (PC 2) — Brave extension + native messaging host that shells out to `rc-jav.py` for searches.
## ID normalization
- `extract_id()` chops trailing single letters (e.g. `IBW-902z.mp4` → `IBW-902`). This is intentional; see the extension's AGENTS.md "Decision log".
- JAV IDs are canonicalized to at least 3 digits (`ABC-27` → `ABC-027`); 4+ digit IDs keep their width (`ABCD-1294`). The user expects real JAV IDs to be `ABC-027`, never `ABC-27` or `ABC-0027`.
- Part suffix detection: `_1`, `-pt1`, `(1)` → appended as `#partN` for distinctness.
- Compound prefixes (`FC2-PPV-123`) handled via secondary regex.
- Search matcher does prefix lookup so `IBW-902` finds both `IBW-902` and `IBW-902#part1` etc.
- Quick search must emit only canonical padded uppercase globs (`ABC-027*`, `ABCDE-1167*`). Do not add `--ignore-case`; user never uses lowercase filenames and it caused noticeable delay.
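The rules above can be sketched roughly as follows. This is not the real `extract_id()`: the regexes, the prefix length limits, the 1-2 digit part number, and the `matches` helper are all assumptions made for illustration. Compound IDs are left unpadded here, and the matcher only accepts an exact ID or an ID followed by `#`, which is stricter than a bare `startswith`.

```python
import re
from pathlib import PurePath

# Assumed patterns; the real tool's regexes may differ.
_COMPOUND = re.compile(r"(FC2-PPV)-(\d+)")
_STANDARD = re.compile(r"([A-Z]{2,6})-(\d{2,5})")
# Optional trailing letter, then one of: _1, -PT1, (1)
_PART = re.compile(r"[A-Z]?(?:_(\d{1,2})|-PT(\d{1,2})|\s*\((\d{1,2})\))")


def extract_id(filename):
    """Return a canonical ID such as 'ABC-027' or 'IBW-902#part1', or None."""
    stem = PurePath(filename).stem.upper()
    m = _COMPOUND.search(stem)  # compound prefixes are checked first
    if m:
        base = f"{m.group(1)}-{m.group(2)}"
    else:
        m = _STANDARD.search(stem)
        if m is None:
            return None
        # Pad to at least 3 digits; 4+ digit numbers keep their width.
        base = f"{m.group(1)}-{int(m.group(2)):03d}"
    part = _PART.match(stem[m.end():])
    if part:
        n = next(g for g in part.groups() if g)
        return f"{base}#part{int(n)}"
    return base  # any trailing single letter is simply dropped


def matches(query_id, cached_id):
    """Prefix lookup: the bare ID also finds its #partN entries."""
    return cached_id == query_id or cached_id.startswith(query_id + "#")
```

Checking the compound pattern before the standard one matters: otherwise `FC2-PPV-123` would be read as `PPV-123`.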
## Defaults from earlier sessions
- `cq:JAV` is the current remote root (after the rclone crypt config change moved it down a level)
- `default_target` in config.json = `["cq:JAV"]`
- `human_size()` formats to 2 decimals (e.g. `6.94 GiB`)
- After the 3-digit ID canonicalization change, run `python rc-jav.py --scan` to rebuild `cache.json` under the new padded keys.
- Duplicate KEEP ranking uses configurable VIP folders before source/size/format ranking. Default VIP folder is `ClearJAV`; video files there are treated as the trusted direct-rip copy.
- Duplicate KEEP ranking treats `.ts` as the lowest-priority video container when any non-`.ts` duplicate is available.
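One way to express the KEEP ordering is a sort key, sketched below. The names `keep_sort_key` and `pick_keep` are hypothetical, and the real source/size/format ranking is reduced here to "larger file wins" as a stand-in. Whether a `.ts` file inside a VIP folder should beat a non-`.ts` file elsewhere is also an assumption of this sketch.

```python
from pathlib import PurePosixPath

VIP_FOLDERS = ("ClearJAV",)  # default from these notes; configurable in the real tool


def keep_sort_key(entry, vip_folders=VIP_FOLDERS):
    """Lower tuples rank higher: VIP folder, then non-.ts, then size."""
    path = PurePosixPath(entry["path"])
    in_vip = any(part in vip_folders for part in path.parts[:-1])
    is_ts = path.suffix.lower() == ".ts"
    # False sorts before True, so negate in_vip to put VIP copies first.
    return (not in_vip, is_ts, -entry["size"])


def pick_keep(duplicates):
    """Return the entry that should be marked KEEP."""
    return min(duplicates, key=keep_sort_key)
```

Because `.ts` is only a tie-breaker inside the key, a lone `.ts` copy is still kept when no other duplicate exists.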
## Recent decisions / bug fixes
- `--format json` should keep stdout as clean JSON. Status/progress text belongs on stderr in JSON mode.
- Catalog rows are informational. CSV exports mark them as `CATALOG`; JSON exports put them under `catalog`, not `delete_candidates`.
- Cache loading validates the top-level shape and falls back to an empty cache when `remotes` is missing or malformed.
- The old `--recursive/-R` flag was removed because scans are always recursive (`rclone lsf -R` / quick `lsjson -R`).
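The cache fallback and the stdout/stderr split might look like the sketch below. `load_cache` and `status` are hypothetical names; treating an unreadable or non-JSON file the same as a malformed `remotes` key is an assumption beyond what these notes state.

```python
import json
import sys
from pathlib import Path


def load_cache(path):
    """Load cache.json, falling back to an empty cache on a bad shape."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, invalid JSON
        return {"remotes": {}}
    if not isinstance(data, dict) or not isinstance(data.get("remotes"), dict):
        return {"remotes": {}}
    return data


def status(message):
    """Progress text goes to stderr so stdout stays clean JSON."""
    print(message, file=sys.stderr)
```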
## TODO
See `TODO.md` for deferred work.
## When making changes
- Adding CLI flags: also update host invocation in `D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py` if the flag matters to the extension
- Changing `extract_id()` semantics: forces a `--scan` to rebuild cache under new keys, and may need a parallel change in extension's `normalizeId()`
- JSON output format changes: extension's popup.js / overlay rendering reads `structured` array — keep field names stable (`source`, `remote`, `path`, `full_path`, `size`, `size_human`, `mod_time`, `jav_id`)
- Config schema: update `--save` writer and any defaults
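A small guard like the following could catch accidental renames of the `structured` fields before they reach the extension. The field list comes from the note above; `missing_fields` is a hypothetical helper, not something that exists in `rc-jav.py`.

```python
STRUCTURED_FIELDS = (
    "source", "remote", "path", "full_path",
    "size", "size_human", "mod_time", "jav_id",
)


def missing_fields(entry):
    """Return the expected field names that are absent from one entry."""
    return [name for name in STRUCTURED_FIELDS if name not in entry]
```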