Step 10j (Python side): cache contract + --reextract command
Implements the two-tier contract from docs/CACHE_CONTRACT.md (extension
repo, locked at step 9):
  cache_schema        on-disk shape; mismatch -> force rebuild
  id_rules            bumps when extraction rules change
  id_rules_signature  sha256 over canonical rule text; catches drift
                      when the integer bump is forgotten
New constants in rcjav/cache.py:
  CACHE_SCHEMA_VERSION = 1
  ID_RULES_VERSION = 1
    (the legacy "version: 3" cache reads as id_rules: 0 after
    in-place migration)
New helpers:
  rcjav.ids.current_rules_signature()
    Sha256 over the canonical text of every rule that influences
    a jav_id: built-in regexes, BUILTIN_PART_RES, PART_RES (which
    captures user-added part patterns), FC2 handling.
  rcjav.cache.load_cache(signature=None)
    Reads cache.json. Legacy `version: 3` headers get an in-place
    header upgrade with no forced rescan; the cache is stamped as
    `id_rules: 0` + signature "legacy" so it surfaces as
    "stale by rules" in cache_state. Schema mismatch on the new
    header still forces a rebuild.
  rcjav.cache.cache_state(cache, signature)
    Classifies a cache as "fresh" / "stale_by_rules" /
    "schema_mismatch". Drives the three-state extension UX.
  rcjav.cache.stamp_current_rules(cache, signature)
    Updates id_rules and id_rules_signature in place. Called after
    a successful full scan or --reextract.
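
The helper behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in rcjav/cache.py: it assumes the header fields live as top-level keys named cache_schema, id_rules and id_rules_signature (the real layout is defined in docs/CACHE_CONTRACT.md, which is not shown here), and the `_sketch` function names are hypothetical.

```python
CACHE_SCHEMA_VERSION = 1
ID_RULES_VERSION = 1


def migrate_legacy_header_sketch(cache: dict) -> dict:
    """Upgrade a legacy `version: 3` header in place; file entries are untouched."""
    if cache.get("version") == 3 and "cache_schema" not in cache:
        del cache["version"]
        cache["cache_schema"] = CACHE_SCHEMA_VERSION
        cache["id_rules"] = 0  # older than any numbered rule set
        cache["id_rules_signature"] = "legacy"
    return cache


def cache_state_sketch(cache: dict, signature: str) -> str:
    """Classify a cache; the key names are assumptions, see the note above."""
    if cache.get("cache_schema") != CACHE_SCHEMA_VERSION:
        return "schema_mismatch"
    if cache.get("id_rules") != ID_RULES_VERSION:
        return "stale_by_rules"
    # The signature catches rule edits made without bumping the integer.
    if cache.get("id_rules_signature") != signature:
        return "stale_by_rules"
    return "fresh"


def stamp_current_rules_sketch(cache: dict, signature: str) -> None:
    """Record the live rule version and signature in the cache header."""
    cache["id_rules"] = ID_RULES_VERSION
    cache["id_rules_signature"] = signature


# Usage: a migrated legacy cache is stale until it is stamped.
sig = "sha256:" + "0" * 64  # placeholder, not a real rule signature
legacy = {"version": 3, "remotes": {}}
migrate_legacy_header_sketch(legacy)
state_after_migration = cache_state_sketch(legacy, sig)
stamp_current_rules_sketch(legacy, sig)
state_after_stamp = cache_state_sketch(legacy, sig)
```

Checking the schema first means an unreadable on-disk shape is never reported as merely stale, which matches the "mismatch -> force rebuild" rule in the contract.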
New CLI command:
  rc-jav.py --reextract
    Walks `cache["remotes"][r]["files"]` against the live rule set and
    updates `jav_id` in place. Makes no rclone calls, so it is a fast
    path (seconds on a 7k-file cache). Reports changed/unchanged/dropped
    per remote. Stamps current rules into the saved cache.
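
A minimal sketch of the --reextract walk described above, under stated assumptions: each file entry is taken to be a dict with "name" and "jav_id" keys, "files" is taken to be a list, and "dropped" is read as "the live rules no longer yield an ID". None of these details are confirmed by this commit message. The extractor is passed in as a callable, and the toy regex below is not one of the rcjav rules.

```python
import re
from typing import Callable, Optional


def reextract_sketch(cache: dict, extract_id: Callable[[str], Optional[str]]) -> dict:
    """Re-derive jav_id for every cached file; no network or rclone access."""
    report = {}
    for remote, entry in cache.get("remotes", {}).items():
        counts = {"changed": 0, "unchanged": 0, "dropped": 0}
        for f in entry.get("files", []):
            old_id = f.get("jav_id")
            new_id = extract_id(f["name"])
            if new_id == old_id:
                counts["unchanged"] += 1
            elif new_id is None:
                counts["dropped"] += 1
            else:
                counts["changed"] += 1
            f["jav_id"] = new_id  # update in place
        report[remote] = counts
    return report


# Toy extractor for the demo only (hypothetical, not an rcjav regex).
_TOY_ID_RE = re.compile(r"([A-Za-z]{2,5})-?(\d{3})")


def toy_extract(name: str) -> Optional[str]:
    m = _TOY_ID_RE.search(name)
    return f"{m.group(1).upper()}-{m.group(2)}" if m else None


cache = {"remotes": {"demo": {"files": [
    {"name": "abc-123.mp4", "jav_id": "ABC-123"},
    {"name": "xyz456.mkv", "jav_id": "XYZ456"},
    {"name": "notes.txt", "jav_id": "NOTES"},
]}}}
report = reextract_sketch(cache, toy_extract)
```

Because the walk only reads names already stored in the cache, its cost scales with the number of cached entries rather than with remote listing time.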
--scan (full, no --scan-since) now also stamps current rules.
--scan --scan-since deliberately does NOT stamp: it only re-walks
recently modified files, so older entries may still carry jav_ids
from previous rules; the cache stays "stale by rules" until a full
scan or --reextract.
Verified:
  - python rc-jav.py --reextract --format json on the live 7124-file
    cache → 0 changes (existing IDs already canonical), cache.json
    rewritten with new header
  - cache_state on the post-migration cache → "fresh"
  - tests + fixtures + --help all pass
Extension-side (host's cache_status response + options-cache.js
three-state UX + Re-extract IDs button) ships in a separate commit
in the extension repo.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -231,6 +231,38 @@ def describe_id_match(display_query: str, matched_query: str, matched_id: str,
     }
 
 
+def current_rules_signature() -> str:
+    """Sha256 over the canonical text of every rule that influences a jav_id.
+
+    Includes built-in regex sources, BUILTIN_PART_RES sources, and PART_RES
+    (which captures user-added part patterns applied by
+    `configure_part_patterns`). Output prefixed with `sha256:` so callers can
+    sniff the algorithm without re-deriving it.
+
+    Stable across invocations: dict is dumped with sort_keys=True. Bumping a
+    regex changes the digest; reordering BUILTIN_PART_RES also changes it
+    (order is part of the contract because part-detection short-circuits).
+    """
+    import hashlib
+    import json as _json
+    data = {
+        "schema": 1,  # bump when this signature schema itself changes
+        "primary": PRIMARY_ID_RE.pattern,
+        "compound": COMPOUND_ID_RE.pattern,
+        "fallback": FALLBACK_ID_RE.pattern,
+        "nohyphen": _NOHYPHEN_ID_RE.pattern,
+        "bracket": _BRACKET_ID_RE.pattern,
+        "variant": _VARIANT_SUFFIX_RE.pattern,
+        "xofy": _XOFY_PRIORITY_RE.pattern,
+        "resolution_tag": _RESOLUTION_TAG_RE.pattern,
+        "builtin_part_res": [r.pattern for r in BUILTIN_PART_RES],
+        "part_res": [r.pattern for r in PART_RES],
+        "fc2_handling": "fc2_to_ppv",
+    }
+    text = _json.dumps(data, sort_keys=True, ensure_ascii=False)
+    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
 def expand_range(raw: str) -> list[str] | None:
     """Expand a bracket range like 'IPZZ-[820-860]' into individual ID strings.
     Returns None if no range marker present."""
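
To illustrate the ordering claim in the docstring above, here is a reduced, standalone sketch of the same hashing approach. It is not the rcjav function: `signature_sketch` and the two toy patterns are hypothetical, and only two of the rule inputs are modelled.

```python
import hashlib
import json
import re


def signature_sketch(primary: re.Pattern, part_res: list) -> str:
    """Hash the pattern sources with stable key order, prefixed by the algorithm."""
    data = {
        "schema": 1,
        "primary": primary.pattern,
        # A list keeps its order in JSON, so reordering changes the digest.
        "part_res": [r.pattern for r in part_res],
    }
    text = json.dumps(data, sort_keys=True, ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


primary = re.compile(r"[A-Z]+-\d+")  # toy pattern, not PRIMARY_ID_RE
parts = [re.compile(r"cd(\d)"), re.compile(r"part(\d)")]  # toy part patterns

base = signature_sketch(primary, parts)
same = signature_sketch(primary, list(parts))
reordered = signature_sketch(primary, parts[::-1])
```

Here `base` and `same` are equal, while `reordered` differs: sort_keys=True normalises dict key order only, so list order still feeds into the hash.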