Shared JAV ID fixture corpus

JSON cases shared between the Python rc-jav.py CLI and the browser extension at D:\DEV\Extensions\Production\rclone-jav\. Each side reads the cases relevant to its own extraction surface.

Files

| File | Domain | Consumer | Notes |
| --- | --- | --- | --- |
| filename-extraction.json | filename | Python extract_id(name) | Has #partN expectations for multipart files |
| query-extraction.json | query | Extension content.js normalizeId | Looser context; extension never emits part suffix |
| shared-normalization.json | shared | BOTH | Contract: any mismatch here is a bug, not a fixture issue |

All files share the same shape:

{
  "version": 1,
  "domain": "…",
  "description": "…",
  "case_schema": {  },
  "cases": [
    { "name": "…", "input": "…", "expected": "…" }
  ]
}

expected: null means "no ID should be detected".
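The shape above can be checked mechanically before a file is committed. The following is a minimal sketch, not part of the existing runner: validate_fixture and load_fixture are hypothetical helper names, and the rule that expected is either a string or null is an assumption inferred from the description above.

```python
from __future__ import annotations

import json
from pathlib import Path

REQUIRED_TOP_LEVEL = ("version", "domain", "description", "case_schema", "cases")
REQUIRED_CASE_KEYS = ("name", "input", "expected")


def validate_fixture(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    problems = [
        f"missing top-level key: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc
    ]
    for index, case in enumerate(doc.get("cases", [])):
        label = case.get("name", f"#{index}")
        for key in REQUIRED_CASE_KEYS:
            if key not in case:
                problems.append(f"case {label}: missing {key}")
        expected = case.get("expected")
        # Assumption: expected is an ID string, or null (None in Python) for "no ID".
        if expected is not None and not isinstance(expected, str):
            problems.append(f"case {label}: expected must be a string or null")
    return problems


def load_fixture(path: Path) -> dict:
    """Read one fixture file as UTF-8 JSON."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling validate_fixture(load_fixture(Path("fixtures/shared-normalization.json"))) from the repository root would then return an empty list for a well-formed file.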

Running the Python side

python fixtures/run.py

The runner imports rc-jav.py in place, then runs extract_id against filename-extraction.json and normalize_id against shared-normalization.json. The exit code is non-zero on any failure.
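Because rc-jav.py has a hyphen in its name, a plain import statement cannot load it. The sketch below shows one way a runner of this kind could work; it is not a copy of fixtures/run.py. The helper names are hypothetical, and the location of rc-jav.py at the repository root (one level above fixtures/) is an assumption.

```python
import importlib.util
import json
from pathlib import Path


def load_module_from_path(path: Path, module_name: str = "rc_jav"):
    """Import a file whose name is not a valid identifier, such as rc-jav.py."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_cases(func, cases):
    """Call func on each case input and return the names of the failing cases."""
    failures = []
    for case in cases:
        if func(case["input"]) != case["expected"]:
            failures.append(case["name"])
    return failures


def main(repo_root: Path) -> int:
    # Assumption: rc-jav.py sits at the repository root, next to fixtures/.
    module = load_module_from_path(repo_root / "rc-jav.py")
    fixtures = repo_root / "fixtures"
    plan = [
        (module.extract_id, fixtures / "filename-extraction.json"),
        (module.normalize_id, fixtures / "shared-normalization.json"),
    ]
    failed = 0
    for func, path in plan:
        cases = json.loads(path.read_text(encoding="utf-8"))["cases"]
        for name in run_cases(func, cases):
            print(f"FAIL {path.name}: {name}")
            failed += 1
    # Non-zero on any failure, matching the behaviour described above.
    return 1 if failed else 0
```

A script would pass the result of main(...) to sys.exit so that CI sees the non-zero exit code.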

Running the extension side

There is no automated runner today. content.js lives inside an IIFE that the browser injects into pages, so importing it from Node would require either an extraction refactor or a duplicated copy of the regex. Until one of those lands, treat query-extraction.json and shared-normalization.json as the canonical specification: if you touch ID_RE_DASHED, ID_RE_UNDASHED, or BUILTIN_ID_NORMALIZERS in content.js, review this corpus by hand and confirm the cases still describe the expected behavior.

Adding a case

  1. Pick the file matching the surface you're testing.
  2. Append a { "name", "input", "expected" } entry. Keep name descriptive — it's the only label shown when the runner fails.
  3. If the case exercises a guarantee both sides must honor, add it to shared-normalization.json as well.
  4. Run python fixtures/run.py to confirm Python still passes.
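Step 2 can also be scripted. The helper below is a hypothetical sketch, not an existing tool in this directory. It appends one entry and rejects a duplicate name; uniqueness is an assumption here, motivated by name being the only label shown on failure.

```python
from __future__ import annotations


def append_case(doc: dict, name: str, input_text: str, expected: str | None) -> dict:
    """Return a copy of the fixture document with one more case appended."""
    existing = {case["name"] for case in doc["cases"]}
    if name in existing:
        # Assumption: names should be unique so a failure points at exactly one case.
        raise ValueError(f"duplicate case name: {name}")
    new_case = {"name": name, "input": input_text, "expected": expected}
    # Build a new document instead of mutating the caller's copy.
    return {**doc, "cases": [*doc["cases"], new_case]}
```

Pass None as expected to record a case where no ID should be detected; json.dumps writes it back out as null.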

Known cross-side divergences (intentional)

These are NOT bugs — they reflect the different surfaces each side extracts from. Recorded here so future contributors don't try to "fix" them.

  • FC2PPV1841460 compact form (no dashes). The extension's BUILTIN_ID_NORMALIZERS in content.js rewrites this to FC2-PPV-1841460 when seen in page titles. Python extract_id does NOT — the compact form doesn't realistically appear in filenames on disk. Hence the case lives in query-extraction.json only, not in filename-extraction.json or shared-normalization.json.
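To make the divergence concrete, here is a small illustration of the kind of rewrite the extension performs on the compact form. It is written in Python only for consistency with the runner; the pattern and the function name are hypothetical, and the real rule is whatever BUILTIN_ID_NORMALIZERS in content.js defines.

```python
import re

# Hypothetical pattern: the authoritative rule lives in content.js, not here.
COMPACT_FC2_RE = re.compile(r"\bFC2PPV(\d+)\b", re.IGNORECASE)


def expand_compact_fc2(text: str) -> str:
    """Rewrite a compact FC2PPV<digits> token into the dashed FC2-PPV-<digits> form."""
    return COMPACT_FC2_RE.sub(lambda match: f"FC2-PPV-{match.group(1)}", text)
```

With this sketch, expand_compact_fc2("FC2PPV1841460") returns "FC2-PPV-1841460", while an input that is already dashed passes through unchanged.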

If a case belongs to one side's contract but not the other's, file it under the specific domain (filename- or query-) — not under shared-.

Ownership

This directory lives in the Python repo only because the Python repo is the more stable root. Conceptually it's joint property of both codebases. Don't add anything Python-specific to the JSON files — keep them tool-neutral.