# Shared JAV ID fixture corpus

JSON cases shared between the Python `rc-jav.py` CLI and the browser extension at `D:\DEV\Extensions\Production\rclone-jav\`. Each side reads the cases relevant to its own extraction surface.

## Files

| File                          | Domain   | Consumer                               | Notes |
|-------------------------------|----------|----------------------------------------|-------|
| `filename-extraction.json`    | filename | Python `extract_id(name)`              | Has `#partN` expectations for multipart files |
| `query-extraction.json`       | query    | Extension `content.js` `normalizeId`   | Looser context; extension never emits part suffix |
| `shared-normalization.json`   | shared   | BOTH                                   | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:

```json
{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": { … },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}
```

`expected: null` means "no ID should be detected".

## Running the Python side

```bash
python fixtures/run.py
```

The runner imports `rc-jav.py` in place, exercises `extract_id` against `filename-extraction.json`, and `normalize_id` against `shared-normalization.json`. Exit code is non-zero on any failure.

## Running the extension side

No automated runner today. `content.js` lives inside an IIFE that the browser injects into pages, so importing it from Node would require either an extraction refactor or a duplicated copy of the regex. Until that lands, treat `query-extraction.json` and `shared-normalization.json` as the canonical specification: if you touch `ID_RE_DASHED`, `ID_RE_UNDASHED`, or `BUILTIN_ID_NORMALIZERS` in `content.js`, eyeball this corpus and confirm the cases still describe expected behavior.

## Adding a case

1. Pick the file matching the surface you're testing.
2. Append a `{ "name", "input", "expected" }` entry. Keep `name` descriptive: it's the only label shown when the runner fails.
3. If the case exercises a guarantee both sides must honor, add it to `shared-normalization.json` as well.
4. Run `python fixtures/run.py` to confirm Python still passes.

## Known cross-side divergences (intentional)

These are NOT bugs; they reflect the different surfaces each side extracts from. Recorded here so future contributors don't try to "fix" them.

- **`FC2PPV1841460` compact form (no dashes).** The extension's `BUILTIN_ID_NORMALIZERS` in `content.js` rewrites this to `FC2-PPV-1841460` when seen in page titles. Python `extract_id` does NOT, because the compact form doesn't realistically appear in filenames on disk. Hence the case lives in `query-extraction.json` only, not in `filename-extraction.json` or `shared-normalization.json`.

If a case belongs to one side's contract but not the other's, file it under the specific domain (`filename-` or `query-`), not under `shared-`.

## Ownership

This directory lives in the Python repo only because the Python repo is the more stable root. Conceptually it's joint property of both codebases. Don't add anything Python-specific to the JSON files; keep them tool-neutral.
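
## Example: consuming a fixture file (sketch)

To make the shared shape from the Files section concrete, here is a minimal sketch of how a consumer might walk the `cases` array and compare each result against `expected`, treating `null` as "no ID detected". It is not the actual `fixtures/run.py`: `run_cases`, `toy_extract`, the toy regex, and the two inline cases are all hypothetical stand-ins made up for illustration, and the real `extract_id` in `rc-jav.py` may behave differently.

```python
import json
import re
from typing import Callable, List, Optional


def run_cases(fixture: dict, extract: Callable[[str], Optional[str]]) -> List[str]:
    """Return one failure message per case whose output differs from `expected`."""
    failures = []
    for case in fixture["cases"]:
        actual = extract(case["input"])
        # JSON null loads as None, so "no ID should be detected" is a plain equality check.
        if actual != case["expected"]:
            failures.append(
                f'{case["name"]}: input={case["input"]!r} '
                f'expected={case["expected"]!r} actual={actual!r}'
            )
    return failures


# Hypothetical stand-in extractor, NOT the real `extract_id` from rc-jav.py.
_TOY_ID_RE = re.compile(r"([A-Za-z]{2,5})-(\d{3,5})")


def toy_extract(name: str) -> Optional[str]:
    match = _TOY_ID_RE.search(name)
    if match is None:
        return None
    return f"{match.group(1).upper()}-{match.group(2)}"


# Inline fixture following the documented shape; the cases themselves are made up.
FIXTURE = json.loads("""
{
  "version": 1,
  "domain": "filename",
  "description": "illustrative cases only",
  "cases": [
    {"name": "lowercase prefix is uppercased", "input": "abc-123.mp4", "expected": "ABC-123"},
    {"name": "no id present", "input": "holiday-video.mp4", "expected": null}
  ]
}
""")

failures = run_cases(FIXTURE, toy_extract)
for message in failures:
    print("FAIL", message)
```

Because each failure message leads with the case's `name`, a vague name makes a failure hard to trace back to its fixture entry. A real runner would presumably also turn a non-empty failure list into a non-zero exit code, as described under "Running the Python side".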