admin f7fc15b17c Sync working tree before initial Gitea push
Includes:
- cli.py path fix (parents[1]) for config/catalog resolution
- Library cleanup feature design docs (TODO.md, mockup)
- Audit + bug-queue markdowns from May 2026 reliability pass
- .gitignore expanded for transient artifacts
2026-05-26 22:35:42 +02:00

rc-jav (Python CLI)

Session memory for Codex. Read before making changes here.

What this is

A read-only rclone library comparison and search CLI. It compares the cq:JAV remote (rclone crypt) against itself (dupe detection) or against external WinCatalog CSV/XML exports, and it powers the rclone-jav Brave extension via native messaging.

Architecture

rc-jav.py
  ├── reads config.json (default_target etc.)
  ├── reads cache.json (per-remote file index, written by --scan)
  ├── shells out to: rclone lsf / rclone lsjson / rclone size --json
  ├── extract_id() per filename → normalized ID with optional #partN / variant suffix
  ├── two query modes: --quick (live rclone --include glob) and cached (uses cache.json)
  └── output: rich tables (default) | --basic plain | --format json (for extension)
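
As a rough illustration of the "shells out to rclone" step in the tree above, the sketch below builds an rclone lsjson command and parses its output. The helper names build_lsjson_cmd and run_lsjson are hypothetical, and the exact flags rc-jav.py passes are not recorded in these notes; -R, --files-only and --include are standard rclone options.

```python
import json
import subprocess


def build_lsjson_cmd(remote, include_glob=None):
    """Build the rclone command as an argument list (no shell quoting needed)."""
    cmd = ["rclone", "lsjson", "-R", "--files-only", remote]
    if include_glob:
        # Quick mode narrows the live listing with an rclone filter glob.
        cmd += ["--include", include_glob]
    return cmd


def run_lsjson(remote, include_glob=None):
    """Run rclone and return the parsed JSON array of file entries.

    Raises subprocess.CalledProcessError if rclone exits non-zero.
    """
    proc = subprocess.run(
        build_lsjson_cmd(remote, include_glob),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return json.loads(proc.stdout)
```

Passing the command as a list avoids shell quoting problems with globs such as ABC-027* on Windows.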

Files

D:\DEV\Project\rclone-jav\
├── rc-jav.py             single-file CLI
├── config.json           default_source/target/catalog (user-editable via --save)
├── cache.json            scanned remote file index (written by --scan)
├── wincatalog\           drop WinCatalog CSV/XML exports here (auto-loaded)
├── TODO.md               deferred work
└── README.md

Companion project

D:\DEV\Extensions\Production\rclone-jav\ (PC 1) / D:\DEV\Extensions\Staging\rclone-jav\ (PC 2) — Brave extension + native messaging host that shells out to rc-jav.py for searches.

ID normalization

  • extract_id() chops trailing single letters (e.g. IBW-902z.mp4 → IBW-902). Decision is intentional; see extension's AGENTS.md "Decision log".
  • JAV IDs are canonicalized with at least 3 digits (ABC-27 → ABC-027); 4+ digit IDs keep their width (ABCD-1294). User expects real JAV IDs to be ABC-027, never ABC-27 or ABC-0027.
  • Part suffix detection: _1, -pt1, (1) → appended as #partN for distinctness.
  • Compound prefixes (FC2-PPV-123) handled via secondary regex.
  • Search matcher does prefix lookup so IBW-902 finds both IBW-902 and IBW-902#part1 etc.
  • Quick search must emit only canonical padded uppercase globs (ABC-027*, ABCDE-1167*). Do not add --ignore-case; user never uses lowercase filenames and it caused noticeable delay.
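
The rules above can be sketched roughly as follows. This is not the real extract_id(): the regexes, the 2 to 6 letter prefix range, and the helper names matches_query and quick_glob are assumptions made for illustration. The notes also do not say how over-padded input such as ABC-0027 is handled, so the sketch only pads short numbers and leaves that case alone.

```python
import re
from pathlib import PurePath

# Hypothetical patterns; the real regexes are not shown in these notes.
COMPOUND_RE = re.compile(r"(FC2-PPV)-(\d+)", re.IGNORECASE)   # compound prefix, tried first
STANDARD_RE = re.compile(r"([A-Za-z]{2,6})-(\d+)([A-Za-z]?)")  # optional trailing letter is dropped
PART_RE = re.compile(r"^(?:_|-pt|\()(\d+)\)?$", re.IGNORECASE)  # _1, -pt1, (1)


def extract_id(filename):
    """Return a canonical ID such as ABC-027 or ABC-027#part1, or None if nothing matches."""
    stem = PurePath(filename).stem
    m = COMPOUND_RE.search(stem) or STANDARD_RE.search(stem)
    if m is None:
        return None
    # Uppercase the prefix and pad the number to at least 3 digits; 4+ digits keep their width.
    base = f"{m.group(1).upper()}-{m.group(2).zfill(3)}"
    # Whatever follows the ID (and the dropped letter) may be a part marker.
    part = PART_RE.match(stem[m.end():].strip())
    return f"{base}#part{int(part.group(1))}" if part else base


def matches_query(query_id, cached_key):
    """Prefix lookup bounded at '#', so IBW-902 finds IBW-902#part1 but not IBW-9021."""
    return cached_key == query_id or cached_key.startswith(query_id + "#")


def quick_glob(query):
    """Canonical padded uppercase glob for --quick mode (no --ignore-case needed)."""
    canonical = extract_id(query)
    return None if canonical is None else canonical.split("#")[0] + "*"
```

For example, extract_id("ABC-27_1.mp4") gives ABC-027#part1 and quick_glob("abc-27") gives ABC-027*. Bounding the prefix match at "#" is one way to keep IBW-902 from also matching a longer number.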

Defaults from earlier sessions

  • cq:JAV is the current remote root (after the rclone crypt config change moved it down a level)
  • default_target in config.json = ["cq:JAV"] (hardcoded fallback in cli.py matches)
  • human_size() formats to 2 decimals (e.g. 6.94 GiB)
  • After the 3-digit ID canonicalization change, run python rc-jav.py --scan to rebuild cache.json under the new padded keys.
  • Duplicate KEEP ranking uses configurable VIP folders before source/size/format ranking. Default VIP folder is ClearJAV; video files there are treated as the trusted direct-rip copy.
  • Duplicate KEEP ranking treats .ts as the lowest-priority video container when any non-.ts duplicate is available.
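
Two of the defaults above, sketched under stated assumptions. human_size() here is a plain binary-unit formatter with 2 decimals. keep_sort_key and pick_keep are hypothetical names. The relative order of the VIP and .ts checks and the "larger file wins" tie-break are guesses, and the source/format ranking steps are left out because the notes do not describe them.

```python
from pathlib import PurePosixPath

VIP_FOLDERS = ("ClearJAV",)  # default VIP folder per the notes; configurable in the real tool


def human_size(num_bytes):
    """Format a byte count with 2 decimals using binary units, e.g. 1536 -> '1.50 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.2f} {unit}"
        size /= 1024


def keep_sort_key(entry, vip_folders=VIP_FOLDERS):
    """Lower tuples rank better: VIP folder first, then non-.ts, then larger size (assumed)."""
    path = PurePosixPath(entry["path"])
    in_vip = any(part in vip_folders for part in path.parts[:-1])
    is_ts = path.suffix.lower() == ".ts"
    return (not in_vip, is_ts, -entry["size"])


def pick_keep(duplicates, vip_folders=VIP_FOLDERS):
    """Return the duplicate to KEEP; a lone .ts file still wins when nothing else exists."""
    return min(duplicates, key=lambda e: keep_sort_key(e, vip_folders))
```

Because is_ts is only one element of the sort key, .ts files lose only when a non-.ts duplicate is present, which matches the rule as stated.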

Recent decisions / bug fixes

  • --format json should keep stdout as clean JSON. Status/progress text belongs on stderr in JSON mode.
  • Catalog rows are informational. CSV exports mark them as CATALOG; JSON exports put them under catalog, not delete_candidates.
  • Cache loading validates the top-level shape and falls back to an empty cache when remotes is missing or malformed.
  • The old --recursive/-R flag was removed because scans are always recursive (rclone lsf -R / quick lsjson -R).
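
A minimal sketch of the cache and JSON-mode decisions above. It assumes that remotes is a JSON object keyed by remote name and that unreadable or unparsable files also fall back to an empty cache; the notes only promise the top-level shape check. load_cache, status and emit_json are illustrative names.

```python
import json
import sys
from pathlib import Path


def empty_cache():
    return {"remotes": {}}


def load_cache(path):
    """Load cache.json, falling back to an empty cache when the shape is wrong."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # Missing or corrupt file: treated like a malformed cache (assumption).
        return empty_cache()
    if not isinstance(data, dict) or not isinstance(data.get("remotes"), dict):
        return empty_cache()
    return data


def status(message):
    """Progress and status text goes to stderr so stdout stays machine-readable."""
    print(message, file=sys.stderr)


def emit_json(rows):
    """Write only the JSON payload to stdout."""
    json.dump(rows, sys.stdout, ensure_ascii=False)
    sys.stdout.write("\n")
```

Keeping stdout to the payload alone lets the native messaging host parse it without stripping progress lines first.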

TODO

See TODO.md for deferred work.

When making changes

  • Adding CLI flags: also update host invocation in D:\DEV\Extensions\Production\rclone-jav\host\rcjav-host.py if the flag matters to the extension
  • Changing extract_id() semantics: forces a --scan to rebuild cache under new keys, and may need a parallel change in extension's normalizeId()
  • JSON output format changes: extension's popup.js / overlay rendering reads a structured array, so keep field names stable (source, remote, path, full_path, size, size_human, mod_time, jav_id)
  • Config schema: update --save writer and any defaults
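
One possible guard for the stable JSON field names listed above is a small check that can run in a test before output changes ship. STABLE_FIELDS copies the names from this file; missing_fields and the sample row values are hypothetical.

```python
# Field names the extension reads, as listed in the notes above.
STABLE_FIELDS = (
    "source", "remote", "path", "full_path",
    "size", "size_human", "mod_time", "jav_id",
)


def missing_fields(row):
    """Return the stable field names that a JSON output row lacks."""
    return [name for name in STABLE_FIELDS if name not in row]


# Hypothetical row, only to show the check; the values are made up.
sample_row = {
    "source": "remote",
    "remote": "cq:JAV",
    "path": "ClearJAV/ABC-027.mkv",
    "full_path": "cq:JAV/ClearJAV/ABC-027.mkv",
    "size": 3000,
    "size_human": "2.93 KiB",
    "mod_time": "2026-05-26T22:35:42+02:00",
    "jav_id": "ABC-027",
}
```

Extra fields pass this check, so new keys can be added without breaking it; only removals or renames are flagged.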