# Shared JAV ID fixture corpus

JSON cases shared between the Python `rc-jav.py` CLI and the browser
extension at `D:\DEV\Extensions\Production\rclone-jav\`. Each side
reads the cases relevant to its own extraction surface.

## Files

| File                          | Domain   | Consumer                               | Notes |
|-------------------------------|----------|----------------------------------------|-------|
| `filename-extraction.json`    | filename | Python `extract_id(name)`              | Has `#partN` expectations for multipart files |
| `query-extraction.json`       | query    | Extension `content.js` `normalizeId`   | Looser context; extension never emits part suffix |
| `shared-normalization.json`   | shared   | BOTH                                   | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:

```json
{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": { … },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}
```

`"expected": null` means "no ID should be detected".
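
As a rough illustration of that shape, the sketch below checks a fixture document for the documented keys. The sample cases, the empty `case_schema`, and the `validate_fixture` helper are hypothetical, and the check assumes `expected` is always a string or `null`.

```python
import json

# Hypothetical sample mirroring the documented shape. The real files carry
# their own cases, and case_schema is left empty here as a placeholder.
SAMPLE = """
{
  "version": 1,
  "domain": "shared",
  "description": "example only",
  "case_schema": {},
  "cases": [
    {"name": "dashed id passes through", "input": "ABC-123", "expected": "ABC-123"},
    {"name": "no id present", "input": "holiday video", "expected": null}
  ]
}
"""


def validate_fixture(doc):
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    for key in ("version", "domain", "description", "case_schema", "cases"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for i, case in enumerate(doc.get("cases", [])):
        for key in ("name", "input", "expected"):
            if key not in case:
                problems.append(f"case {i}: missing {key}")
        expected = case.get("expected")
        # Assumption: expected is either a string ID or null (None in Python).
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {i}: expected must be a string or null")
    return problems


doc = json.loads(SAMPLE)
problems = validate_fixture(doc)
print(problems or "shape OK")
```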

## Running the Python side

```bash
python fixtures/run.py
```

The runner imports `rc-jav.py` in place, runs `extract_id` against
`filename-extraction.json` and `normalize_id` against
`shared-normalization.json`, and exits with a non-zero code on any failure.
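
The flow described above could look roughly like the sketch below. It is not the contents of `fixtures/run.py`: the helper names (`load_module_from_path`, `run_cases`, `main`) are hypothetical, and it assumes the script can be imported without side effects, for example because its CLI entry point sits behind an `if __name__ == "__main__":` guard. Loading by file path is needed because `rc-jav.py` is not a valid module name for a plain `import`.

```python
import importlib.util
import json
from pathlib import Path


def load_module_from_path(path, module_name="rc_jav"):
    """Load a .py file whose filename is not a valid identifier (e.g. rc-jav.py)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_cases(func, cases):
    """Call func on each case input; return the names of the failing cases."""
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append(case["name"])
    return failures


def main(script_path, fixture_path, func_name):
    """Run one fixture file against one function; return a process exit code."""
    module = load_module_from_path(script_path)
    doc = json.loads(Path(fixture_path).read_text(encoding="utf-8"))
    failures = run_cases(getattr(module, func_name), doc["cases"])
    for name in failures:
        print(f"FAIL: {name}")
    return 1 if failures else 0
```

A caller would pass the result to `sys.exit`, so that any failing case produces a non-zero exit code.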

## Running the extension side

There is no automated runner today. `content.js` lives inside an IIFE that
the browser injects into pages, so importing it from Node would require
either a refactor of the extraction code or a duplicated copy of the
regexes. Until that lands, treat `query-extraction.json` and
`shared-normalization.json` as the canonical specification: if you touch
`ID_RE_DASHED`, `ID_RE_UNDASHED`, or `BUILTIN_ID_NORMALIZERS` in
`content.js`, review this corpus by hand and confirm the cases still
describe the expected behavior.

## Adding a case

1. Pick the file matching the surface you're testing.
2. Append a `{ "name", "input", "expected" }` entry. Keep `name`
   descriptive — it's the only label shown when the runner fails.
3. If the case exercises a guarantee both sides must honor, add it to
   `shared-normalization.json` as well.
4. Run `python fixtures/run.py` to confirm Python still passes.
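
Since `name` is the only label a failure shows, a duplicated name makes a failure ambiguous. The sketch below is one way to catch that before committing. The `duplicate_names` helper and the sample cases are hypothetical and not part of the runner.

```python
from collections import Counter


def duplicate_names(cases):
    """Return case names that occur more than once, in first-seen order."""
    counts = Counter(case["name"] for case in cases)
    return [name for name, n in counts.items() if n > 1]


# Hypothetical cases for illustration only.
cases = [
    {"name": "dashed id", "input": "ABC-123", "expected": "ABC-123"},
    {"name": "no id", "input": "holiday video", "expected": None},
    {"name": "dashed id", "input": "XYZ-456", "expected": "XYZ-456"},
]

dupes = duplicate_names(cases)
print(dupes)  # ['dashed id']
```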

## Known cross-side divergences (intentional)

These are NOT bugs — they reflect the different surfaces each side
extracts from. Recorded here so future contributors don't try to
"fix" them.

- **`FC2PPV1841460` compact form (no dashes).** The extension's
  `BUILTIN_ID_NORMALIZERS` in `content.js` rewrites this to
  `FC2-PPV-1841460` when seen in page titles. Python `extract_id`
  does NOT — the compact form doesn't realistically appear in
  filenames on disk. Hence the case lives in
  `query-extraction.json` only, not in `filename-extraction.json` or
  `shared-normalization.json`.

If a case belongs to one side's contract but not the other's, file it
under the specific domain (`filename-` or `query-`) — not under
`shared-`.
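
To make the compact-form rewrite concrete, here is a Python illustration of the transformation the extension performs. It is not the code in `content.js`, and Python `extract_id` deliberately does not do this. The `COMPACT_FC2` pattern and the `expand_compact_fc2` name are hypothetical.

```python
import re

# Hypothetical pattern: the literal prefix FC2PPV followed only by digits.
COMPACT_FC2 = re.compile(r"^FC2PPV(\d+)$")


def expand_compact_fc2(text):
    """Rewrite the compact form to the dashed form; leave other input unchanged."""
    return COMPACT_FC2.sub(r"FC2-PPV-\1", text)


print(expand_compact_fc2("FC2PPV1841460"))    # FC2-PPV-1841460
print(expand_compact_fc2("FC2-PPV-1841460"))  # FC2-PPV-1841460 (already dashed)
```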

## Ownership

This directory lives in the Python repo only because the Python repo
is the more stable root. Conceptually it's joint property of both
codebases. Don't add anything Python-specific to the JSON files — keep
them tool-neutral.