8d6bdb81af
Mirrors `content.js` normalizeId() in a self-contained
`fixtures/run-node.mjs`. Loads `query-extraction.json` and
`shared-normalization.json` and asserts each case the same way the
Python runner does.

content.js can't be imported directly because it lives inside an injected
IIFE in the extension, so the runner duplicates the regexes (ID_RE_DASHED,
ID_RE_UNDASHED, BUILTIN_ID_NORMALIZERS). Inline comment + README update
flag that they must be kept in sync.

Why this matters: `shared-normalization.json` now actually catches
cross-side drift. A case that passes one side but fails the other is the
canary. Without a Node runner, the contract was aspirational.

Verified:

    $ node fixtures/run-node.mjs
    query-extraction.json -> normalizeId (10 cases): 10 passed
    shared-normalization.json -> normalizeId (5 cases): 5 passed
    OK: all 15 cases passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# Shared JAV ID fixture corpus

JSON cases shared between the Python `rc-jav.py` CLI and the browser
extension at `D:\DEV\Extensions\Production\rclone-jav\`. Each side
reads the cases relevant to its own extraction surface.

## Files

| File                        | Domain   | Consumer                             | Notes |
|-----------------------------|----------|--------------------------------------|-------|
| `filename-extraction.json`  | filename | Python `extract_id(name)`            | Has `#partN` expectations for multipart files |
| `query-extraction.json`     | query    | Extension `content.js` `normalizeId` | Looser context; extension never emits part suffix |
| `shared-normalization.json` | shared   | BOTH                                 | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:

```json
{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": { … },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}
```

`expected: null` means "no ID should be detected".

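As a rough illustration of the shape above, the following Python sketch parses a fixture document and checks that the top-level keys and per-case keys are present. The helper name `load_cases`, the `REQUIRED_KEYS` set, and the sample document are assumptions made for this example; the real runners may validate differently or not at all.

```python
import json

# Assumption: every top-level key shown in the shape above is required.
REQUIRED_KEYS = {"version", "domain", "description", "case_schema", "cases"}


def load_cases(text):
    """Parse a fixture document (as a JSON string) and return its cases."""
    doc = json.loads(text)
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"fixture is missing keys: {sorted(missing)}")
    for case in doc["cases"]:
        # Each case needs a label, an input string, and an expectation.
        for key in ("name", "input", "expected"):
            if key not in case:
                raise ValueError(f"case {case.get('name', '?')!r} lacks {key!r}")
    return doc["cases"]


# Illustrative sample only; it is not copied from a real fixture file.
SAMPLE = """
{
  "version": 1,
  "domain": "shared",
  "description": "illustrative sample",
  "case_schema": {},
  "cases": [
    {"name": "plain id", "input": "ABC-123", "expected": "ABC-123"},
    {"name": "no id present", "input": "holiday video", "expected": null}
  ]
}
"""

cases = load_cases(SAMPLE)
for case in cases:
    print(case["name"], "->", case["expected"])
```

JSON `null` arrives in Python as `None`, so a "no ID should be detected" case can be compared directly against a function that returns `None` on no match.
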
## Running the Python side

```bash
python fixtures/run.py
```

The runner imports `rc-jav.py` in place, runs `extract_id` against
`filename-extraction.json`, and runs `normalize_id` against
`shared-normalization.json`. The exit code is non-zero on any failure.

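A file named `rc-jav.py` cannot be loaded with a plain `import` statement because of the hyphen. How `fixtures/run.py` actually loads it is not shown here; one common approach, sketched below, is `importlib.util.spec_from_file_location`. The helper `load_module_from_path` and the throwaway `demo-tool.py` script are hypothetical stand-ins so the example runs on its own.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path, module_name):
    """Load a Python source file as a module, even if its filename has a hyphen."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a hyphenated script such as rc-jav.py.
    script = Path(tmp) / "demo-tool.py"
    script.write_text("def extract_id(name):\n    return name.upper()\n")

    mod = load_module_from_path(script, "demo_tool")
    result = mod.extract_id("abc-123")
    print(result)
```

Loading a script this way executes its top-level code, so the approach assumes the CLI entry point sits behind an `if __name__ == "__main__":` guard.
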
## Running the extension side

```bash
node fixtures/run-node.mjs
```

The Node runner exercises `query-extraction.json` and
`shared-normalization.json` against a hand-mirrored copy of
`normalizeId` from `content.js`. Because `content.js` lives inside an
injected IIFE in the extension repo, it can't be imported directly, so
the runner duplicates the regexes (`ID_RE_DASHED`, `ID_RE_UNDASHED`,
`BUILTIN_ID_NORMALIZERS`).

If you change any of those in `content.js`, mirror the change at the
top of `fixtures/run-node.mjs`. `shared-normalization.json` catches
silent cross-side drift because both Python and Node exercise it; a
case that passes Python but fails Node (or vice versa) is the canary.

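The canary idea can be sketched in a few lines of Python: run the same cases through two implementations and flag any case where exactly one of them passes. In the repo the two sides are checked by separate runners, so `find_drift`, `side_a`, and `side_b` are hypothetical, and the two toy normalizers are not the real regexes from either codebase.

```python
import re


def side_a(text):
    """Toy normalizer: accepts both 'abc-123' and 'abc123'."""
    match = re.fullmatch(r"([A-Za-z]+)-?(\d+)", text.strip())
    return f"{match.group(1).upper()}-{match.group(2)}" if match else None


def side_b(text):
    """Toy normalizer that has drifted: it requires the dash."""
    match = re.fullmatch(r"([A-Za-z]+)-(\d+)", text.strip())
    return f"{match.group(1).upper()}-{match.group(2)}" if match else None


def find_drift(cases, first, second):
    """Return the names of cases that pass on one side but fail on the other."""
    drift = []
    for case in cases:
        ok_first = first(case["input"]) == case["expected"]
        ok_second = second(case["input"]) == case["expected"]
        if ok_first != ok_second:
            drift.append(case["name"])
    return drift


# Illustrative cases only; they are not taken from shared-normalization.json.
cases = [
    {"name": "dashed", "input": "abc-123", "expected": "ABC-123"},
    {"name": "undashed", "input": "abc123", "expected": "ABC-123"},
    {"name": "no id", "input": "hello world", "expected": None},
]

drift = find_drift(cases, side_a, side_b)
print(drift)
```

Only the `undashed` case is reported, because the drifted side returns `None` for it while the other side still matches the expectation.
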
## Adding a case

1. Pick the file matching the surface you're testing.
2. Append a `{ "name", "input", "expected" }` entry. Keep `name`
   descriptive, since it's the only label shown when a case fails.
3. If the case exercises a guarantee both sides must honor, add it to
   `shared-normalization.json` as well.
4. Run `python fixtures/run.py` and `node fixtures/run-node.mjs` to
   confirm both sides still pass.

## Known cross-side divergences (intentional)

These are NOT bugs; they reflect the different surfaces each side
extracts from. They are recorded here so future contributors don't try
to "fix" them.

- **`FC2PPV1841460` compact form (no dashes).** The extension's
  `BUILTIN_ID_NORMALIZERS` in `content.js` rewrites this to
  `FC2-PPV-1841460` when seen in page titles. Python `extract_id`
  does NOT, because the compact form doesn't realistically appear in
  filenames on disk. Hence the case lives in
  `query-extraction.json` only, not in `filename-extraction.json` or
  `shared-normalization.json`.

If a case belongs to one side's contract but not the other's, file it
under the specific domain (`filename-` or `query-`), not under
`shared-`.

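To make the compact-form rewrite concrete, here is a minimal Python sketch of the kind of rule the extension side applies. It is written in Python only for consistency with the other examples; the pattern `FC2_COMPACT_RE` and the function `normalize_fc2` are assumptions, not the actual `BUILTIN_ID_NORMALIZERS` entry, and the Python CLI intentionally has no such rule.

```python
import re

# Hypothetical pattern: "FC2", optional dash, "PPV", optional dash, then digits.
FC2_COMPACT_RE = re.compile(r"FC2-?PPV-?(\d+)", re.IGNORECASE)


def normalize_fc2(text):
    """Rewrite compact or dashed FC2 IDs to the canonical dashed form."""
    match = FC2_COMPACT_RE.fullmatch(text.strip())
    if match is None:
        return None  # not an FC2-style ID
    return f"FC2-PPV-{match.group(1)}"


print(normalize_fc2("FC2PPV1841460"))
print(normalize_fc2("ABC-123"))
```

The first call prints `FC2-PPV-1841460` and the second prints `None`, which is the shape of expectation a `query-extraction.json` case for this divergence would encode.
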
## Ownership

This directory lives in the Python repo only because the Python repo
is the more stable root. Conceptually it's joint property of both
codebases. Don't add anything Python-specific to the JSON files; keep
them tool-neutral.