admin 8d6bdb81af Add Node-side fixture runner — both sides now exercise the corpus

Mirrors `content.js` normalizeId() in a self-contained
`fixtures/run-node.mjs`. Loads `query-extraction.json` and
`shared-normalization.json` and asserts each case the same way the
Python runner does.

content.js can't be imported directly — it lives inside an injected
IIFE in the extension — so the runner duplicates the regexes
(ID_RE_DASHED, ID_RE_UNDASHED, BUILTIN_ID_NORMALIZERS). Inline
comment + README update flag that they must be kept in sync.

Why this matters: `shared-normalization.json` now actually catches
cross-side drift. A case that passes one side but fails the other is
the canary — without a Node runner, the contract was aspirational.

Verified:
  $ node fixtures/run-node.mjs
  query-extraction.json     -> normalizeId (10 cases): 10 passed
  shared-normalization.json -> normalizeId (5 cases):  5 passed
  OK: all 15 cases passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-23 11:18:52 +02:00

filename-extraction.json

Initial snapshot before step 10 package split

2026-05-22 21:39:09 +02:00

query-extraction.json

Initial snapshot before step 10 package split

2026-05-22 21:39:09 +02:00

README.md

Add Node-side fixture runner — both sides now exercise the corpus

2026-05-23 11:18:52 +02:00

run-node.mjs

Add Node-side fixture runner — both sides now exercise the corpus

2026-05-23 11:18:52 +02:00

run.py

Step 10a + 10b: scaffold rcjav/ package, extract ID rules

2026-05-22 21:43:57 +02:00

shared-normalization.json

Initial snapshot before step 10 package split

2026-05-22 21:39:09 +02:00

README.md

Shared JAV ID fixture corpus

JSON cases shared between the Python rc-jav.py CLI and the browser extension at D:\DEV\Extensions\Production\rclone-jav\. Each side reads the cases relevant to its own extraction surface.

Files

File	Domain	Consumer	Notes
`filename-extraction.json`	filename	Python `extract_id(name)`	Has `#partN` expectations for multipart files
`query-extraction.json`	query	Extension `content.js` `normalizeId`	Looser context; extension never emits part suffix
`shared-normalization.json`	shared	BOTH	Contract: any mismatch here is a bug, not a fixture issue

All files share the same shape:

{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": { … },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}

expected: null means "no ID should be detected".
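As a rough illustration of the shape above, the sketch below checks a parsed fixture document for the documented keys. The function name `validate_fixture`, the sample document, and the rule that `expected` is either a string or null are assumptions made for this example; the real `case_schema` contents are not shown here and may be stricter.

```python
import json

# Keys taken from the documented shape; everything else here is illustrative.
REQUIRED_TOP_LEVEL = ("version", "domain", "description", "case_schema", "cases")
REQUIRED_CASE_KEYS = ("name", "input", "expected")


def validate_fixture(doc):
    """Return a list of problems found in a parsed fixture document."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for index, case in enumerate(doc.get("cases", [])):
        for key in REQUIRED_CASE_KEYS:
            if key not in case:
                problems.append(f"case {index}: missing key: {key}")
        expected = case.get("expected")
        # Assumption: expected is a string, or null for "no ID should be detected".
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {index}: expected must be a string or null")
    return problems


# Hypothetical sample document, not one of the real fixture files.
sample = json.loads("""
{
  "version": 1,
  "domain": "shared",
  "description": "illustrative sample",
  "case_schema": {},
  "cases": [
    {"name": "no id present", "input": "holiday video", "expected": null}
  ]
}
""")

print(validate_fixture(sample))
```

JSON `null` becomes Python `None` after parsing, so a runner can compare it directly against a function that returns `None` when no ID is found.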

Running the Python side

python fixtures/run.py

The runner imports rc-jav.py in place, then exercises extract_id against filename-extraction.json and normalize_id against shared-normalization.json. The exit code is non-zero on any failure.
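The general pattern described above (apply a function to each case, compare against `expected`, fail the process on any mismatch) might look roughly like this. It is a sketch, not the contents of `fixtures/run.py`: `run_cases`, `toy_extract_id`, and the two cases are hypothetical stand-ins, and the real `extract_id` rules are certainly more involved.

```python
import re


def run_cases(cases, func):
    """Apply func to each case input; return (passed_count, failures)."""
    passed = 0
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual == case["expected"]:
            passed += 1
        else:
            failures.append((case["name"], case["expected"], actual))
    return passed, failures


def toy_extract_id(name):
    """Hypothetical stand-in, NOT the project's extract_id."""
    match = re.search(r"([A-Za-z]+)-(\d+)", name)
    if match is None:
        return None
    return f"{match.group(1).upper()}-{match.group(2)}"


# Made-up cases in the documented { name, input, expected } shape.
cases = [
    {"name": "dashed id in filename", "input": "abc-123.mp4", "expected": "ABC-123"},
    {"name": "no id present", "input": "notes.txt", "expected": None},
]

passed, failures = run_cases(cases, toy_extract_id)
for name, expected, actual in failures:
    # The case name is the only label identifying which case failed.
    print(f"FAIL {name}: expected {expected!r}, got {actual!r}")
exit_code = 1 if failures else 0
print(f"{passed} passed, {len(failures)} failed, exit code {exit_code}")
```

A real runner would pass `exit_code` to `sys.exit` so that CI or a shell script can detect the failure.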

Running the extension side

node fixtures/run-node.mjs

The Node runner exercises query-extraction.json and shared-normalization.json against a hand-mirrored copy of normalizeId from content.js. Because content.js lives inside an injected IIFE in the extension repo, it can't be imported directly — the runner duplicates the regexes (ID_RE_DASHED, ID_RE_UNDASHED, BUILTIN_ID_NORMALIZERS).

If you change any of those in content.js, mirror the change at the top of fixtures/run-node.mjs. shared-normalization.json catches silent cross-side drift because both Python and Node exercise it; a case that passes Python but fails Node (or vice versa) is the canary.
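To make the "canary" idea concrete, here is a small sketch that compares per-case pass/fail results from the two sides and reports any case whose status differs. The result dictionaries and the `find_drift` helper are hypothetical; neither runner is documented as emitting results in this form, so treat it as an illustration of the comparison rather than an existing tool.

```python
def find_drift(python_results, node_results):
    """Return names of cases whose pass/fail status differs between sides.

    Both arguments map a case name to True (passed) or False (failed).
    A case missing from one side is also reported, since .get() yields None.
    """
    names = sorted(set(python_results) | set(node_results))
    return [n for n in names if python_results.get(n) != node_results.get(n)]


# Hypothetical results for three shared cases.
python_results = {"case a": True, "case b": True, "case c": False}
node_results = {"case a": True, "case b": False, "case c": False}

print(find_drift(python_results, node_results))
```

Here "case c" fails on both sides, so it is an ordinary failure and not drift. Only "case b", which passes on one side and fails on the other, is flagged.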

Adding a case

1. Pick the file matching the surface you're testing.
2. Append a { "name", "input", "expected" } entry. Keep name descriptive: it's the only label shown when the runner fails.
3. If the case exercises a guarantee both sides must honor, add it to shared-normalization.json as well.
4. Run python fixtures/run.py and node fixtures/run-node.mjs to confirm both sides still pass.
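The steps above can also be scripted. The sketch below appends a case to an already-parsed fixture document and refuses duplicate names, on the assumption that names should be unique because they are the only label shown on failure. `add_case` is a hypothetical helper, not part of this repo, and the document here is a trimmed in-memory stand-in for `query-extraction.json`.

```python
def add_case(doc, name, input_text, expected):
    """Append a case to a parsed fixture document, refusing duplicate names."""
    if not name.strip():
        raise ValueError("case name must not be empty")
    if any(case["name"] == name for case in doc["cases"]):
        raise ValueError(f"duplicate case name: {name}")
    doc["cases"].append({"name": name, "input": input_text, "expected": expected})
    return doc


# Trimmed in-memory stand-in for a query-domain fixture file.
doc = {"version": 1, "domain": "query", "cases": []}

# Use expected=None for "no ID should be detected"; json.dump writes it as null.
add_case(doc, "compact FC2 form in page title", "FC2PPV1841460", "FC2-PPV-1841460")
print(len(doc["cases"]))
```

Editing the JSON by hand works just as well; the point of the helper is only the duplicate-name check.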

Known cross-side divergences (intentional)

These are NOT bugs — they reflect the different surfaces each side extracts from. Recorded here so future contributors don't try to "fix" them.

FC2PPV1841460 compact form (no dashes). The extension's BUILTIN_ID_NORMALIZERS in content.js rewrites this to FC2-PPV-1841460 when seen in page titles. Python extract_id does NOT — the compact form doesn't realistically appear in filenames on disk. Hence the case lives in query-extraction.json only, not in filename-extraction.json or shared-normalization.json.

If a case belongs to one side's contract but not the other's, file it under the specific domain (filename- or query-) — not under shared-.
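For readers unfamiliar with the FC2PPV1841460 divergence described above, this is roughly what the compact-form rewrite amounts to, expressed in Python for illustration. The regex and the function name `normalize_fc2_compact` are assumptions; the actual rule lives in BUILTIN_ID_NORMALIZERS in content.js and may differ in detail.

```python
import re

# Assumed pattern: "FC2PPV" followed only by digits, case-insensitive.
FC2_COMPACT_RE = re.compile(r"^FC2PPV(\d+)$", re.IGNORECASE)


def normalize_fc2_compact(text):
    """Rewrite a compact FC2PPV ID to dashed form; return other input unchanged."""
    match = FC2_COMPACT_RE.match(text.strip())
    if match is None:
        return text
    return f"FC2-PPV-{match.group(1)}"


print(normalize_fc2_compact("FC2PPV1841460"))
```

This behavior belongs to the query surface only, so a case exercising it goes in query-extraction.json and nowhere else.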

Ownership

This directory lives in the Python repo only because the Python repo is the more stable root. Conceptually it's joint property of both codebases. Don't add anything Python-specific to the JSON files — keep them tool-neutral.