Initial snapshot before step 10 package split
@@ -0,0 +1,83 @@
# Shared JAV ID fixture corpus

JSON cases shared between the Python `rc-jav.py` CLI and the browser
extension at `D:\DEV\Extensions\Production\rclone-jav\`. Each side
reads the cases relevant to its own extraction surface.

## Files

| File                        | Domain   | Consumer                             | Notes                                                     |
|-----------------------------|----------|--------------------------------------|-----------------------------------------------------------|
| `filename-extraction.json`  | filename | Python `extract_id(name)`            | Has `#partN` expectations for multipart files             |
| `query-extraction.json`     | query    | Extension `content.js` `normalizeId` | Looser context; extension never emits part suffix         |
| `shared-normalization.json` | shared   | BOTH                                 | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:

```json
{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": { … },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}
```

`expected: null` means "no ID should be detected".
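As a rough illustration of that shape, the sketch below checks a parsed fixture document for the top-level keys and per-case keys shown above. It is not part of the repo: `validate_fixture` is a hypothetical helper, the rule that `expected` is either a string or `null` is an assumption drawn from the example shape, and the sample document is made up for the demonstration.

```python
import json

# Keys taken from the shape shown above.
REQUIRED_TOP_LEVEL = ("version", "domain", "description", "case_schema", "cases")
REQUIRED_CASE_KEYS = ("name", "input", "expected")


def validate_fixture(doc):
    """Return a list of human-readable problems found in a parsed fixture."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for index, case in enumerate(doc.get("cases", [])):
        for key in REQUIRED_CASE_KEYS:
            if key not in case:
                problems.append(f"case {index}: missing key: {key}")
        expected = case.get("expected")
        # Assumption: expected is a string ID, or null (None after parsing)
        # when no ID should be detected.
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {index}: expected must be a string or null")
    return problems


# Made-up sample document, for illustration only.
sample = json.loads("""
{
  "version": 1,
  "domain": "shared",
  "description": "illustrative only",
  "case_schema": {},
  "cases": [
    {"name": "no id present", "input": "holiday video", "expected": null}
  ]
}
""")
print(validate_fixture(sample))  # []
```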

## Running the Python side

```bash
python fixtures/run.py
```

The runner imports `rc-jav.py` in place, then exercises `extract_id`
against `filename-extraction.json` and `normalize_id` against
`shared-normalization.json`. The exit code is non-zero on any failure.
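For readers who have not opened `fixtures/run.py`, the following is a minimal sketch of how such a runner could work. It is not the actual runner. It assumes that `rc-jav.py` sits at the repo root, that the fixtures live in `fixtures/`, and that importing the script has no side effects; `load_module`, `run_cases`, and `main` are hypothetical names. Because `rc-jav` is not a valid module name, the sketch loads the file by path with `importlib`.

```python
import importlib.util
import json
import sys
from pathlib import Path


def load_module(path, module_name="rc_jav"):
    """Load a Python file by path; needed because 'rc-jav' is not importable by name."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register before executing, as importlib docs suggest
    spec.loader.exec_module(module)
    return module


def run_cases(func, cases):
    """Return (name, input, expected, actual) for every failing case."""
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append((case["name"], case["input"], case["expected"], actual))
    return failures


def main(repo_root):
    """Run both Python-side fixture files; return 0 on success, 1 on any failure."""
    root = Path(repo_root)
    module = load_module(root / "rc-jav.py")  # assumed location of the CLI script
    plan = [
        ("filename-extraction.json", module.extract_id),
        ("shared-normalization.json", module.normalize_id),
    ]
    failed = 0
    for filename, func in plan:
        text = (root / "fixtures" / filename).read_text(encoding="utf-8")
        for name, value, expected, actual in run_cases(func, json.loads(text)["cases"]):
            failed += 1
            print(f"FAIL {filename} :: {name}: {value!r} -> {actual!r}, expected {expected!r}")
    return 1 if failed else 0
```

An entry point built on this sketch would pass the return value of `main(...)` to `sys.exit`, which gives the non-zero exit code on failure.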

## Running the extension side

There is no automated runner today. The code in `content.js` lives
inside an IIFE that the browser injects into pages, so importing it from
Node would require either an extraction refactor or a duplicated copy of
the regex. Until that lands, treat `query-extraction.json` and
`shared-normalization.json` as the canonical specification: if you touch
`ID_RE_DASHED`, `ID_RE_UNDASHED`, or `BUILTIN_ID_NORMALIZERS` in
`content.js`, eyeball this corpus and confirm the cases still describe
expected behavior.

## Adding a case

1. Pick the file matching the surface you're testing.
2. Append a `{ "name", "input", "expected" }` entry. Keep `name`
   descriptive: it's the only label shown when a case fails.
3. If the case exercises a guarantee both sides must honor, add it to
   `shared-normalization.json` as well.
4. Run `python fixtures/run.py` to confirm Python still passes.
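Step 2 above can also be scripted. The sketch below is a hypothetical helper, not something the repo ships: `append_case` is an invented name, and rejecting duplicate `name` values is an assumption motivated by `name` being the only label shown on failure. Note that rewriting the file through `json.dumps` may change its existing formatting, so editing by hand can be preferable.

```python
import json
from pathlib import Path


def append_case(fixture_path, name, input_text, expected):
    """Append one case to a fixture file and return the new case count.

    expected is the ID string, or None when no ID should be detected.
    """
    path = Path(fixture_path)
    doc = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: names should be unique so a failure report is unambiguous.
    if any(case["name"] == name for case in doc["cases"]):
        raise ValueError(f"duplicate case name: {name}")
    doc["cases"].append({"name": name, "input": input_text, "expected": expected})
    # ensure_ascii=False keeps any non-ASCII text in inputs readable.
    path.write_text(json.dumps(doc, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return len(doc["cases"])
```

After adding a case this way, step 4 still applies: run `python fixtures/run.py` to confirm the Python side passes.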

## Known cross-side divergences (intentional)

These are NOT bugs. They reflect the different surfaces each side
extracts from, and are recorded here so future contributors don't try to
"fix" them.

- **`FC2PPV1841460` compact form (no dashes).** The extension's
  `BUILTIN_ID_NORMALIZERS` in `content.js` rewrites this to
  `FC2-PPV-1841460` when seen in page titles. Python `extract_id`
  does NOT, because the compact form doesn't realistically appear in
  filenames on disk. Hence the case lives in
  `query-extraction.json` only, not in `filename-extraction.json` or
  `shared-normalization.json`.

If a case belongs to one side's contract but not the other's, file it
under the specific domain (`filename-` or `query-`), not under
`shared-`.
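One way to catch a case filed in the wrong place is a small consistency check across files. The sketch below is an assumption about how such a check might look, not an existing tool: `find_conflicts` is a hypothetical helper that reports inputs appearing in both the shared list and a domain-specific list with different `expected` values. The sample cases are illustrative, and the real `input` strings in the corpus may look different.

```python
def find_conflicts(shared_cases, domain_cases):
    """Return (input, shared_expected, domain_expected) for conflicting cases."""
    shared_expected = {case["input"]: case["expected"] for case in shared_cases}
    conflicts = []
    for case in domain_cases:
        value = case["input"]
        if value in shared_expected and shared_expected[value] != case["expected"]:
            conflicts.append((value, shared_expected[value], case["expected"]))
    return conflicts


# Illustrative data only: the compact FC2 form is kept out of the shared list,
# so it cannot conflict with anything there.
shared = [{"name": "dashed id", "input": "ABC-123", "expected": "ABC-123"}]
query = [
    {"name": "dashed id", "input": "ABC-123", "expected": "ABC-123"},
    {"name": "compact FC2", "input": "FC2PPV1841460", "expected": "FC2-PPV-1841460"},
]
print(find_conflicts(shared, query))  # []
```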

## Ownership

This directory lives in the Python repo only because the Python repo
is the more stable root. Conceptually it's joint property of both
codebases. Don't add anything Python-specific to the JSON files; keep
them tool-neutral.