2 Commits

Author SHA1 Message Date
admin 33c495ad57 Step 10j (Python side): cache contract + --reextract command
Implements the two-tier contract from docs/CACHE_CONTRACT.md (extension
repo, locked at step 9):

  cache_schema       on-disk shape; mismatch -> force rebuild
  id_rules           bumps when extraction rules change
  id_rules_signature sha256 over canonical rule text; catches drift
                     when the integer bump is forgotten

New constants in rcjav/cache.py:

  CACHE_SCHEMA_VERSION = 1
  ID_RULES_VERSION = 1     (the legacy "version: 3" cache reads as
                            id_rules: 0 after in-place migration)

New helpers:

  rcjav.ids.current_rules_signature()
      SHA-256 over the canonical text of every rule that influences
      a jav_id: built-in regexes, BUILTIN_PART_RES, PART_RES (which
      captures user-added part patterns), FC2 handling.

  rcjav.cache.load_cache(signature=None)
      Reads cache.json. Legacy `version: 3` headers get an in-place
      header upgrade with no forced rescan; the cache is stamped with
      `id_rules: 0` and the signature "legacy", so cache_state reports
      it as "stale_by_rules". A schema mismatch on the new header
      still forces a rebuild.

  rcjav.cache.cache_state(cache, signature)
      Classifies a cache as "fresh" / "stale_by_rules" /
      "schema_mismatch". Drives the three-state extension UX.

  rcjav.cache.stamp_current_rules(cache, signature)
      Updates id_rules and id_rules_signature in place. Called after
      a successful full scan or --reextract.
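
A minimal sketch of how the helpers above could fit together. It is
not the code in rcjav/cache.py or rcjav/ids.py: rules_signature() and
migrate_legacy_header() are hypothetical names, sorting the rule texts
before hashing is an assumed canonicalisation, and dropping the old
`version` key during migration is a guess. Only the header field names
and the three state strings come from this commit message.

```python
import hashlib
import json

CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def rules_signature(rule_texts):
    """SHA-256 over a canonical rendering of the rule texts.

    Assumption: sorting plus compact JSON is "canonical enough" here.
    """
    canonical = json.dumps(sorted(rule_texts), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def migrate_legacy_header(cache):
    """Upgrade a legacy `version: 3` header in place, without a rescan."""
    if cache.get("version") == 3 and "cache_schema" not in cache:
        del cache["version"]  # assumption: the old key is removed
        cache["cache_schema"] = CACHE_SCHEMA_VERSION
        cache["id_rules"] = 0
        cache["id_rules_signature"] = "legacy"
    return cache


def cache_state(cache, signature):
    """Classify a cache as fresh, stale_by_rules, or schema_mismatch."""
    if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"
    if (cache.get("id_rules") != ID_RULES_VERSION
            or cache.get("id_rules_signature") != signature):
        return "stale_by_rules"
    return "fresh"


def stamp_current_rules(cache, signature):
    """Record the current rule version and signature in the header."""
    cache["id_rules"] = ID_RULES_VERSION
    cache["id_rules_signature"] = signature


# Illustrative rule texts only; the real rule set lives in rcjav.ids.
sig = rules_signature([r"[A-Z]{2,5}-\d{3,5}", r"FC2-PPV-\d+"])
cache = migrate_legacy_header({"version": 3, "remotes": {}})
print(cache_state(cache, sig))  # stale_by_rules
stamp_current_rules(cache, sig)
print(cache_state(cache, sig))  # fresh
```

Checking the signature in addition to the integer means a rule edit
still flips the state to "stale_by_rules" when the version bump is
forgotten.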

New CLI command:

  rc-jav.py --reextract

Walks `cache["remotes"][r]["files"]` against the live rule set and
updates `jav_id` in place. Makes no rclone calls, so it is a fast path
(seconds on a 7k-file cache). Reports changed/unchanged/dropped per
remote and stamps the current rules into the saved cache.
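
The walk described above might look roughly like the sketch below. It
rests on assumptions not stated in this commit: that each `files`
value is a list of dicts with `name` and `jav_id` keys, that "dropped"
means the live rules no longer yield an ID for an entry that had one,
and that ID_RE and extract_id() are hypothetical stand-ins for the
real rule set.

```python
import re

# Hypothetical stand-in for the live rule set in rcjav.ids.
ID_RE = re.compile(r"([A-Za-z]{2,5})-?(\d{3,5})")


def extract_id(name):
    """Return a normalised ID such as 'ABC-123', or None if no match."""
    m = ID_RE.search(name)
    return f"{m.group(1).upper()}-{m.group(2)}" if m else None


def reextract(cache, extract=extract_id):
    """Recompute jav_id for every cached file; no remote calls involved."""
    report = {}
    for remote, data in cache["remotes"].items():
        counts = {"changed": 0, "unchanged": 0, "dropped": 0}
        for entry in data["files"]:
            old_id = entry.get("jav_id")
            new_id = extract(entry["name"])
            if new_id == old_id:
                counts["unchanged"] += 1
            elif new_id is None:
                counts["dropped"] += 1
            else:
                counts["changed"] += 1
            entry["jav_id"] = new_id  # update in place
        report[remote] = counts
    return report


cache = {"remotes": {"r1": {"files": [
    {"name": "abc-123.mp4", "jav_id": "ABC-123"},
    {"name": "xyz00456.mkv", "jav_id": "XYZ-456"},
    {"name": "notes.txt", "jav_id": "OLD-001"},
]}}}
print(reextract(cache))
# {'r1': {'changed': 1, 'unchanged': 1, 'dropped': 1}}
```

Because the loop only touches data already in the cache, its cost
scales with the number of cached entries rather than with remote
listing time.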

--scan (full, no --scan-since) now also stamps current rules.
--scan --scan-since deliberately does NOT stamp: it only re-walks
recently modified files, so older entries may still carry jav_ids
from previous rules; the cache stays "stale_by_rules" until a full
scan or --reextract.

Verified:
  - python rc-jav.py --reextract --format json on the live 7124-file
    cache → 0 changes (existing IDs already canonical), cache.json
    rewritten with new header
  - cache_state on the post-migration cache → "fresh"
  - tests + fixtures + --help all pass

The extension side (host's cache_status response + options-cache.js
three-state UX + Re-extract IDs button) ships in a separate commit
in the extension repo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 22:07:13 +02:00
admin f03d032336 Step 10c: extract cache I/O into rcjav/cache.py
Pulls CACHE_PATH, CACHE_VERSION, CACHE_STALE_HOURS, load_cache,
save_cache, cache_age_hours, and fmt_age out of rc-jav.py and into a
new self-contained module. No behavior change.

rc-jav.py: 2019 → 1972 lines.

The new module's
`CACHE_PATH = Path(__file__).resolve().parents[1] / "cache.json"`
keeps the file at the repo root next to rc-jav.py (one directory above
the package), matching the legacy
`Path(__file__).resolve().parent / "cache.json"` location.
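
The path equivalence can be checked with a small sketch. The /repo
checkout location is hypothetical, and PurePosixPath is used so the
example runs without touching the filesystem (pure paths have no
resolve(), so that call is omitted here).

```python
from pathlib import PurePosixPath

# Hypothetical checkout location, for illustration only.
module_file = PurePosixPath("/repo/rcjav/cache.py")  # new home of CACHE_PATH
script_file = PurePosixPath("/repo/rc-jav.py")       # legacy home

# parents[0] is the package directory, parents[1] is the repo root.
new_path = module_file.parents[1] / "cache.json"
legacy_path = script_file.parent / "cache.json"

print(new_path)  # /repo/cache.json
assert new_path == legacy_path
```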

rcjav/__init__.py now re-exports the cache public surface alongside
the model and ids surface.

Verified:
  - python rc-jav.py --help              → ok
  - python fixtures/run.py               → 17/17 cases pass
  - python -m unittest tests.test_rules  → 5/5 OK

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 21:46:20 +02:00