Shared JAV ID fixture corpus
JSON cases shared between the Python rc-jav.py CLI and the browser
extension at D:\DEV\Extensions\Production\rclone-jav\. Each side
reads the cases relevant to its own extraction surface.
Files
| File | Domain | Consumer | Notes |
|---|---|---|---|
| filename-extraction.json | filename | Python extract_id(name) | Has #partN expectations for multipart files |
| query-extraction.json | query | Extension content.js normalizeId | Looser context; extension never emits part suffix |
| shared-normalization.json | shared | BOTH | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:
    {
      "version": 1,
      "domain": "…",
      "description": "…",
      "case_schema": { … },
      "cases": [
        { "name": "…", "input": "…", "expected": "…" }
      ]
    }
expected: null means "no ID should be detected".
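
As a rough illustration of the shape above, the following sketch checks a parsed fixture document for the listed keys. The function name validate_fixture, the rule that expected must be a string or null, and the sample case are assumptions made for this example; the real case_schema contents are not reproduced here.

```python
import json

# Keys taken from the shared shape shown above.
REQUIRED_TOP_LEVEL = ("version", "domain", "description", "case_schema", "cases")
REQUIRED_CASE_KEYS = ("name", "input", "expected")


def validate_fixture(doc):
    """Return a list of problems found in a parsed fixture document."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for index, case in enumerate(doc.get("cases", [])):
        for key in REQUIRED_CASE_KEYS:
            if key not in case:
                problems.append(f"case {index}: missing key: {key}")
        # JSON null parses to None, meaning "no ID should be detected".
        # Assumption: any other expected value is a string.
        expected = case.get("expected")
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {index}: expected must be a string or null")
    return problems


# Illustrative sample only; not a case from the real corpus.
sample = json.loads("""
{
  "version": 1,
  "domain": "filename",
  "description": "illustrative sample",
  "case_schema": {},
  "cases": [
    {"name": "plain archive has no ID", "input": "holiday-photos.zip", "expected": null}
  ]
}
""")
print(validate_fixture(sample))
```

An empty list means the document matched the shape; each string in a non-empty list names one missing or malformed field.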
Running the Python side
python fixtures/run.py
The runner imports rc-jav.py in place, exercises extract_id against
filename-extraction.json, and normalize_id against
shared-normalization.json. Exit code is non-zero on any failure.
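
For readers who have not opened fixtures/run.py, here is a minimal sketch of how such a runner could be structured. It is not the actual runner. It assumes extract_id and normalize_id each take a single string argument, and the helper names load_module, run_cases, and main are hypothetical. Because rc-jav.py contains a hyphen, a plain import statement cannot load it, so the sketch goes through importlib.

```python
import importlib.util
import json
from pathlib import Path


def load_module(path, name="rc_jav"):
    """Load a Python file whose filename is not a valid module name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_cases(func, cases):
    """Call func on each case input; return (name, expected, actual) per mismatch."""
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append((case["name"], case["expected"], actual))
    return failures


def main(script_path, fixtures_dir):
    """Run both Python-side fixture files; return a process exit code."""
    module = load_module(script_path)
    # Assumption: both functions accept one string and return a string or None.
    plan = [
        ("filename-extraction.json", module.extract_id),
        ("shared-normalization.json", module.normalize_id),
    ]
    failed = 0
    for filename, func in plan:
        doc = json.loads(Path(fixtures_dir, filename).read_text(encoding="utf-8"))
        for name, expected, actual in run_cases(func, doc["cases"]):
            failed += 1
            print(f"FAIL {filename}: {name}: expected {expected!r}, got {actual!r}")
    return 1 if failed else 0
```

A script entry point would pass the result to sys.exit, for example sys.exit(main("rc-jav.py", "fixtures")), so that any failing case produces a non-zero exit code.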
Running the extension side
No automated runner today. content.js lives inside an IIFE that the
browser injects into pages, so importing it from Node would require
either an extraction refactor or a duplicated copy of the regex. Until
that lands, treat query-extraction.json and shared-normalization.json
as the canonical specification: if you touch ID_RE_DASHED,
ID_RE_UNDASHED, or BUILTIN_ID_NORMALIZERS in content.js, eyeball
this corpus and confirm the cases still describe expected behavior.
Adding a case
- Pick the file matching the surface you're testing.
- Append a { "name", "input", "expected" } entry. Keep the name value descriptive; it's the only label shown when the runner fails.
- If the case exercises a guarantee both sides must honor, add it to shared-normalization.json as well.
- Run python fixtures/run.py to confirm Python still passes.
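
Editing the JSON by hand works fine. For anyone who prefers to script it, the steps above could be sketched as a small helper. The function append_case and its duplicate-name check are assumptions for this example, not part of the repository. Note that rewriting the file through json.dumps may change its existing formatting.

```python
import json
from pathlib import Path


def append_case(fixture_path, name, input_text, expected):
    """Append one case to a fixture file and return the new case count."""
    path = Path(fixture_path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: names should be unique, since the name is the only
    # label shown when the runner reports a failure.
    if any(case["name"] == name for case in doc["cases"]):
        raise ValueError(f"duplicate case name: {name}")
    doc["cases"].append({"name": name, "input": input_text, "expected": expected})
    # ensure_ascii=False keeps non-ASCII inputs readable in the file.
    text = json.dumps(doc, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return len(doc["cases"])
```

Pass None as expected to record a case where no ID should be detected; it is written to the file as JSON null.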
Known cross-side divergences (intentional)
These are NOT bugs — they reflect the different surfaces each side extracts from. Recorded here so future contributors don't try to "fix" them.
- FC2PPV1841460 compact form (no dashes). The extension's BUILTIN_ID_NORMALIZERS in content.js rewrites this to FC2-PPV-1841460 when seen in page titles. Python extract_id does NOT, because the compact form doesn't realistically appear in filenames on disk. Hence the case lives in query-extraction.json only, not in filename-extraction.json or shared-normalization.json.

If a case belongs to one side's contract but not the other's, file it
under the specific domain (filename- or query-) — not under
shared-.
Ownership
This directory lives in the Python repo only because the Python repo is the more stable root. Conceptually it's joint property of both codebases. Don't add anything Python-specific to the JSON files — keep them tool-neutral.