Mirrors `content.js` normalizeId() in a self-contained `fixtures/run-node.mjs`.

Loads `query-extraction.json` and `shared-normalization.json` and asserts each case the same way the Python runner does. content.js can't be imported directly because it lives inside an injected IIFE in the extension, so the runner duplicates the regexes (ID_RE_DASHED, ID_RE_UNDASHED, BUILTIN_ID_NORMALIZERS). Inline comment + README update flag that they must be kept in sync.

Why this matters: `shared-normalization.json` now actually catches cross-side drift. A case that passes one side but fails the other is the canary; without a Node runner, the contract was aspirational.

Verified:
$ node fixtures/run-node.mjs
query-extraction.json -> normalizeId (10 cases): 10 passed
shared-normalization.json -> normalizeId (5 cases): 5 passed
OK: all 15 cases passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Shared JAV ID fixture corpus
JSON cases shared between the Python rc-jav.py CLI and the browser
extension at D:\DEV\Extensions\Production\rclone-jav\. Each side
reads the cases relevant to its own extraction surface.
Files
| File | Domain | Consumer | Notes |
|---|---|---|---|
| filename-extraction.json | filename | Python extract_id(name) | Has #partN expectations for multipart files |
| query-extraction.json | query | Extension content.js normalizeId | Looser context; extension never emits part suffix |
| shared-normalization.json | shared | BOTH | Contract: any mismatch here is a bug, not a fixture issue |
All files share the same shape:
{
"version": 1,
"domain": "…",
"description": "…",
"case_schema": { … },
"cases": [
{ "name": "…", "input": "…", "expected": "…" }
]
}
expected: null means "no ID should be detected".
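As a rough illustration of the shape above, the following sketch checks a parsed fixture document for the listed keys. The helper name validate_fixture, the sample values, and the rule that expected is either a string or null are assumptions made for this example; neither runner is known to perform this check.

```python
import json

REQUIRED_TOP_LEVEL = ("version", "domain", "description", "case_schema", "cases")
REQUIRED_CASE_KEYS = ("name", "input", "expected")


def validate_fixture(doc):
    """Return a list of problems found in a parsed fixture document.

    Hypothetical helper, not part of run.py or run-node.mjs.
    """
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for index, case in enumerate(doc.get("cases", [])):
        for key in REQUIRED_CASE_KEYS:
            if key not in case:
                problems.append(f"case {index}: missing key: {key}")
        # Assumption: expected is a string ID, or None (JSON null = no ID detected).
        expected = case.get("expected")
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {index}: expected must be a string or null")
    return problems


# Illustrative document only; the values are not taken from the real fixture files.
sample = json.loads("""
{
  "version": 1,
  "domain": "shared",
  "description": "illustrative only",
  "case_schema": {},
  "cases": [
    {"name": "no id present", "input": "holiday video", "expected": null}
  ]
}
""")
print(validate_fixture(sample))  # prints []
```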
Running the Python side
python fixtures/run.py
The runner imports rc-jav.py in place, exercises extract_id against
filename-extraction.json, and normalize_id against
shared-normalization.json. Exit code is non-zero on any failure.
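Because rc-jav.py contains a hyphen, a plain import statement cannot load it. One way a runner could import it in place is through importlib, sketched below. This is a guess at the approach, not a copy of fixtures/run.py; load_module_from_path, run_cases, and the stand-in extractor are hypothetical names.

```python
import importlib.util


def load_module_from_path(path, module_name="rc_jav"):
    """Load a Python file whose name is not a valid identifier (e.g. rc-jav.py)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_cases(func, cases):
    """Call func on each case input; return the names of the failing cases."""
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append(case["name"])
    return failures


def stand_in_extract_id(name):
    """Stand-in for extract_id, used only to keep this sketch self-contained."""
    return "ABC-123" if "abc-123" in name.lower() else None


cases = [
    {"name": "dashed id in filename", "input": "abc-123.mp4", "expected": "ABC-123"},
    {"name": "no id present", "input": "holiday.mp4", "expected": None},
]
failures = run_cases(stand_in_extract_id, cases)
print(failures)  # prints []
```

A real runner would pass the loaded module's extract_id in place of the stand-in and finish with sys.exit(1 if failures else 0) to produce the non-zero exit code on failure.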
Running the extension side
node fixtures/run-node.mjs
The Node runner exercises query-extraction.json and
shared-normalization.json against a hand-mirrored copy of
normalizeId from content.js. Because content.js lives inside an
injected IIFE in the extension repo, it can't be imported directly —
the runner duplicates the regexes (ID_RE_DASHED, ID_RE_UNDASHED,
BUILTIN_ID_NORMALIZERS).
If you change any of those in content.js, mirror the change at the
top of fixtures/run-node.mjs. shared-normalization.json catches
silent cross-side drift because both Python and Node exercise it; a
case that passes Python but fails Node (or vice versa) is the canary.
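The canary idea can be pictured as a comparison of per-case outcomes from the two sides. The sketch below assumes each runner's results have been collected into a mapping from case name to pass/fail; neither runner is known to emit such a mapping, and find_drift and the case names are hypothetical.

```python
def find_drift(python_results, node_results):
    """Return sorted case names whose pass/fail outcome differs between sides."""
    shared_names = python_results.keys() & node_results.keys()
    return sorted(
        name for name in shared_names
        if python_results[name] != node_results[name]
    )


# Illustrative outcomes only: True means the case passed on that side.
python_results = {"dashed id": True, "lowercase id": True}
node_results = {"dashed id": True, "lowercase id": False}
print(find_drift(python_results, node_results))  # prints ['lowercase id']
```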
Adding a case
- Pick the file matching the surface you're testing.
- Append a { "name", "input", "expected" } entry. Keep name descriptive; it's the only label shown when the runner fails.
- If the case exercises a guarantee both sides must honor, add it to shared-normalization.json as well.
- Run python fixtures/run.py to confirm Python still passes.
Known cross-side divergences (intentional)
These are NOT bugs — they reflect the different surfaces each side extracts from. Recorded here so future contributors don't try to "fix" them.
- FC2PPV1841460 compact form (no dashes). The extension's BUILTIN_ID_NORMALIZERS in content.js rewrites this to FC2-PPV-1841460 when seen in page titles. Python extract_id does NOT, because the compact form doesn't realistically appear in filenames on disk. Hence the case lives in query-extraction.json only, not in filename-extraction.json or shared-normalization.json.
If a case belongs to one side's contract but not the other's, file it
under the specific domain (filename- or query-) — not under
shared-.
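To make the compact-form divergence concrete, here is a minimal sketch of the kind of rewrite the extension applies, written in Python only to match the other examples here. The pattern is an assumption; the actual BUILTIN_ID_NORMALIZERS entry in content.js may differ, and Python's extract_id deliberately does nothing like this.

```python
import re

# Hypothetical pattern: the real normalizer in content.js may be written differently.
COMPACT_FC2 = re.compile(r"\bFC2PPV(\d+)\b", re.IGNORECASE)


def expand_compact_fc2(text):
    """Rewrite the compact FC2PPV<digits> form to the dashed FC2-PPV-<digits> form."""
    return COMPACT_FC2.sub(lambda m: f"FC2-PPV-{m.group(1)}", text)


print(expand_compact_fc2("FC2PPV1841460"))    # prints FC2-PPV-1841460
print(expand_compact_fc2("FC2-PPV-1841460"))  # already dashed, prints FC2-PPV-1841460
```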
Ownership
This directory lives in the Python repo only because the Python repo is the more stable root. Conceptually it's joint property of both codebases. Don't add anything Python-specific to the JSON files — keep them tool-neutral.