diff --git a/CHANGELOG.md b/CHANGELOG.md index 28a90a2..b3a2eaf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -164,3 +164,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `PluginError` enum with `thiserror`: `SerdeJson(#[from])` preserves error source chain; dedicated variants for parse errors, subprocess failures, host response errors, unsupported URLs, and quality mismatches - 77 native unit tests covering all pure-logic modules (url_matcher, metadata, quality_manager, extractor, ipc handlers) — runs on `x86_64-unknown-linux-gnu` without a WASM runtime - Release WASM binary: 1.2 MB stripped with LTO + `opt-level = "z"` +- SoundCloud, Vimeo, and Gallery WASM plugins (Task 25) + - New crates under `plugins/`: `vortex-mod-soundcloud`, `vortex-mod-vimeo`, `vortex-mod-gallery`, all targeting `wasm32-wasip1` via `extism-pdk` 1.4 and delegating network I/O to the host `http_request` capability + - **SoundCloud plugin**: `/resolve` API client (api-v2.soundcloud.com) with tagged enum `ResolveResponse` (Track / Playlist / User / Unknown), `classify_url` router covering `soundcloud.com`, `m.soundcloud.com`, `on.soundcloud.com` (single-segment short-links treated as Track), plus `sets/`, `likes`, `reposts`, `tracks`, `albums` paths. Fragment-safe path normalisation (`#recent` no longer misclassifies), artwork upgrade from `-large` to `-t500x500` (handles `.ext`, extensionless, and query-string variants), `client_id` forwarded via host `get_config`. Artist profiles are intentionally rejected by `can_handle` / `supports_playlist` / `ensure_soundcloud_url` until artist pagination is implemented, avoiding a false-positive capability claim. 51 native unit tests. + - **Vimeo plugin**: oEmbed JSON client (`vimeo.com/api/oembed.json`) for metadata + player config client (`player.vimeo.com/video//config`) for quality variants (progressive MP4 + HLS). 
Balanced-brace HTML fallback with single- and double-quoted string tracking, plus a word-boundary marker (`window.playerConfig` / `playerConfig =`) so similarly named variables like `window.playerConfigVersion` cannot derail extraction. Deterministic HLS CDN fallback (lexicographic key order when `default_cdn` is missing). `pick_variant_for_quality` with `2K → 1440` / `4K → 2160` mapping, `filter_audio_only` preserving HLS, plus `default_quality` config honoured by hoisting the matching variant to the head of the returned list. Private-share URLs (`vimeo.com//`) are preserved verbatim in the response so the auth token is not dropped. Showcase URLs are rejected by `can_handle` / `supports_playlist` / `extract_links` until token-gated showcase extraction lands. Anchored showcase/album regex rejects malformed trailing segments. 57 native unit tests. + - **Gallery plugin**: 3 provider backends with dedicated JSON shapes — Imgur album API v3 (Authorization: Client-ID), Reddit submission JSON (native `is_gallery` + single-image preview fallback) with `&amp;` unescaping and deterministic URL-sorted output (single-image fallback accepts `.jpg`/`.png`/… URLs with query strings and fragments). Flickr `flickr.photosets.getPhotos` handles both numeric and string `width_o`/`height_o`, and `{"stat":"fail"}` envelopes surface as a `PluginError::HttpStatus` with the Flickr error `code`/`message` instead of a JSON parse failure. Generic `<img>` HTML fallback behind a separate `extract_generic` export; relative URLs now resolve against the **page directory** (preserving `gallery/` context), protocol-relative URLs inherit the **page scheme** (no forced `https:`), and `UrlContext` strips `?`/`#` when computing the origin and base directory. `has_non_http_scheme` guard blocks `data:`/`javascript:`/`mailto:`/`blob:` from resolution. Fragment-stripping URL normaliser; `extract_reddit_permalink` no longer double-appends `.json` when the input already ends in `.json`. 
Post-processing pipeline: `dedupe_links` → `filter_by_min_resolution` (now drops images with a single known dimension below the threshold, not just both-known cases) → `auto_name` (zero-padded `<provider>_<album_id>_<index>.<ext>` with album-id sanitisation). Canonical `Provider` enum lives in `url_matcher.rs` and is re-exported from `link.rs`, eliminating the duplicated type surface. Runtime `min_resolution` fallback (`800x600`) now matches the manifest default. 49 native unit tests. + - Shared host-function envelope pattern: every plugin models `HttpRequest`/`HttpResponse` to mirror `src-tauri/src/adapters/driven/plugin/host_functions.rs`, with `HttpResponse::into_success_body()` mapping 401/403 → `PluginError::Private` and other non-2xx → `PluginError::HttpStatus` + - `PluginError` per crate via `thiserror` with `SerdeJson(#[from])`, no `.unwrap()` in production paths, no `#[allow(dead_code)]`, no `unsafe` outside documented `#[host_fn]` call sites + - Release WASM binaries: SoundCloud ~250 KB, Vimeo ~1.12 MB, Gallery ~1.14 MB (all stripped with LTO + `opt-level = "z"`) diff --git a/plugins/vortex-mod-gallery/.cargo/config.toml b/plugins/vortex-mod-gallery/.cargo/config.toml new file mode 100644 index 0000000..6b509f5 --- /dev/null +++ b/plugins/vortex-mod-gallery/.cargo/config.toml @@ -0,0 +1,2 @@ +[build] +target = "wasm32-wasip1" diff --git a/plugins/vortex-mod-gallery/Cargo.lock b/plugins/vortex-mod-gallery/Cargo.lock new file mode 100644 index 0000000..9448fd0 --- /dev/null +++ b/plugins/vortex-mod-gallery/Cargo.lock @@ -0,0 +1,558 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. 
+version = 4 + +[[package]] +name = "aho-corasick" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301" +dependencies = [ + "memchr", +] + +[[package]] +name = "anyhow" +version = "1.0.102" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" + +[[package]] +name = "autocfg" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" + +[[package]] +name = "base64" +version = "0.22.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6" + +[[package]] +name = "bytemuck" +version = "1.25.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec" + +[[package]] +name = "bytes" +version = "1.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33" + +[[package]] +name = "cfg-if" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" + +[[package]] +name = "either" +version = "1.15.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719" + +[[package]] +name = "equivalent" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" + +[[package]] +name = "extism-convert" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"ec1a8eac059a1730a21aa47f99a0c2075ba0ab88fd0c4e52e35027cf99cdf3e7" +dependencies = [ + "anyhow", + "base64", + "bytemuck", + "extism-convert-macros", + "prost", + "rmp-serde", + "serde", + "serde_json", +] + +[[package]] +name = "extism-convert-macros" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "848f105dd6e1af2ea4bb4a76447658e8587167df3c4e4658c4258e5b14a5b051" +dependencies = [ + "manyhow", + "proc-macro-crate", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "extism-manifest" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "953a22ad322939ae4567ec73a34913a3a43dcbdfa648b8307d38fe56bb3a0acd" +dependencies = [ + "base64", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "352fcb5a66eb74145a1c4a01f2bd15d59c62c85be73aac8471880c65b26b798f" +dependencies = [ + "anyhow", + "base64", + "extism-convert", + "extism-manifest", + "extism-pdk-derive", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk-derive" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d086daea5fd844e3c5ac69ddfe36df4a9a43e7218cf7d1f888182b089b09806c" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-core" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-timer" +version = "3.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-core", + "futures-macro", + "futures-task", + "pin-project-lite", + "slab", +] + +[[package]] +name = "glob" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280" + +[[package]] +name = "hashbrown" +version = "0.17.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4f467dd6dccf739c208452f8014c75c18bb8301b050ad1cfb27153803edb0f51" + +[[package]] +name = "indexmap" +version = "2.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9" +dependencies = [ + "equivalent", + "hashbrown", +] + +[[package]] +name = "itertools" +version = "0.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] +name = "manyhow" +version = "0.11.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b33efb3ca6d3b07393750d4030418d594ab1139cee518f0dc88db70fec873587" +dependencies = [ + "manyhow-macros", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "manyhow-macros" +version = "0.11.4" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "46fce34d199b78b6e6073abf984c9cf5fd3e9330145a93ee0738a7443e371495" +dependencies = [ + "proc-macro-utils", + "proc-macro2", + "quote", +] + +[[package]] +name = "memchr" +version = "2.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" + +[[package]] +name = "num-traits" +version = "0.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" +dependencies = [ + "autocfg", +] + +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + +[[package]] +name = "proc-macro-crate" +version = "3.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e67ba7e9b2b56446f1d419b1d807906278ffa1a658a8a5d8a39dcb1f5a78614f" +dependencies = [ + "toml_edit", +] + +[[package]] +name = "proc-macro-utils" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "eeaf08a13de400bc215877b5bdc088f241b12eb42f0a548d3390dc1c56bb7071" +dependencies = [ + "proc-macro2", + "quote", + "smallvec", +] + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "prost" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d2ea70524a2f82d518bce41317d0fae74151505651af45faf1ffbd6fd33f0568" +dependencies = [ + "bytes", + "prost-derive", +] + +[[package]] +name = "prost-derive" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"27c6023962132f4b30eb4c172c91ce92d933da334c59c23cddee82358ddafb0b" +dependencies = [ + "anyhow", + "itertools", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "regex" +version = "1.12.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276" +dependencies = [ + "aho-corasick", + "memchr", + "regex-automata", + "regex-syntax", +] + +[[package]] +name = "regex-automata" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.8.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a" + +[[package]] +name = "relative-path" +version = "1.9.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ba39f3699c378cd8970968dcbff9c43159ea4cfbd88d43c00b22f2ef10a435d2" + +[[package]] +name = "rmp" +version = "0.8.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4ba8be72d372b2c9b35542551678538b562e7cf86c3315773cae48dfbfe7790c" +dependencies = [ + "num-traits", +] + +[[package]] +name = "rmp-serde" +version = "1.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72f81bee8c8ef9b577d1681a70ebbc962c232461e397b22c208c43c04b67a155" +dependencies = [ + "rmp", + "serde", +] + +[[package]] +name = "rstest" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"03e905296805ab93e13c1ec3a03f4b6c4f35e9498a3d5fa96dc626d22c03cd89" +dependencies = [ + "futures-timer", + "futures-util", + "rstest_macros", + "rustc_version", +] + +[[package]] +name = "rstest_macros" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ef0053bbffce09062bee4bcc499b0fbe7a57b879f1efe088d6d8d4c7adcdef9b" +dependencies = [ + "cfg-if", + "glob", + "proc-macro-crate", + "proc-macro2", + "quote", + "regex", + "relative-path", + "rustc_version", + "syn", + "unicode-ident", +] + +[[package]] +name = "rustc_version" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92" +dependencies = [ + "semver", +] + +[[package]] +name = "semver" +version = "1.0.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8a7852d02fc848982e0c167ef163aaff9cd91dc640ba85e263cb1ce46fae51cd" + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.149" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86" +dependencies = [ + "itoa", + "memchr", + "serde", + "serde_core", + 
"zmij", +] + +[[package]] +name = "slab" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + +[[package]] +name = "smallvec" +version = "1.15.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "thiserror" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "toml_datetime" +version = "1.1.1+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3165f65f62e28e0115a00b2ebdd37eb6f3b641855f9d636d3cd4103767159ad7" +dependencies = [ + "serde_core", +] + +[[package]] +name = "toml_edit" +version = "0.25.11+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b59c4d22ed448339746c59b905d24568fcbb3ab65a500494f7b8c3e97739f2b" +dependencies = [ + "indexmap", + "toml_datetime", + "toml_parser", + "winnow", +] + +[[package]] +name = "toml_parser" +version = "1.1.2+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a2abe9b86193656635d2411dc43050282ca48aa31c2451210f4202550afb7526" +dependencies = [ + "winnow", +] + +[[package]] +name = 
"unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "vortex-mod-gallery" +version = "1.0.0" +dependencies = [ + "extism-pdk", + "regex", + "rstest", + "serde", + "serde_json", + "thiserror", +] + +[[package]] +name = "winnow" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09dac053f1cd375980747450bfc7250c264eaae0583872e845c0c7cd578872b5" +dependencies = [ + "memchr", +] + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/plugins/vortex-mod-gallery/Cargo.toml b/plugins/vortex-mod-gallery/Cargo.toml new file mode 100644 index 0000000..1883d76 --- /dev/null +++ b/plugins/vortex-mod-gallery/Cargo.toml @@ -0,0 +1,26 @@ +[package] +name = "vortex-mod-gallery" +version = "1.0.0" +edition = "2021" +description = "Gallery WASM plugin for Vortex — Imgur, Reddit, Flickr image galleries" +license = "GPL-3.0" +authors = ["vortex-community"] + +[lib] +crate-type = ["cdylib", "rlib"] + +[dependencies] +extism-pdk = "1.4" +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +regex = "1.11" +thiserror = "2.0" + +[dev-dependencies] +rstest = "0.24" + +[profile.release] +opt-level = "z" +lto = true +codegen-units = 1 +strip = true diff --git a/plugins/vortex-mod-gallery/README.md b/plugins/vortex-mod-gallery/README.md new file mode 100644 index 0000000..3952415 --- /dev/null +++ b/plugins/vortex-mod-gallery/README.md @@ -0,0 +1,54 @@ +# vortex-mod-gallery + +Gallery WASM plugin for [Vortex](https://github.com/mpiton/vortex): +extracts direct image URLs from Imgur albums, Reddit galleries, +Flickr photosets, and a generic `` fallback. 
+ +## Features + +- **Imgur**: `imgur.com/a/` and `imgur.com/gallery/` via the v3 + Album API (Authorization: Client-ID header) +- **Reddit**: native gallery (`is_gallery: true` + `media_metadata`) + and single-image submissions (preview source fallback) with + `&amp;` unescaping +- **Flickr**: `flickr.photosets.getPhotos` REST API with `extras=url_o,url_l` +- **Generic**: `<img>` scraping with scheme guard + (`data:` / `javascript:` / `mailto:` / `blob:` rejected) and + relative-URL resolution against the page origin +- Post-processing: dedupe, minimum-resolution filter, auto-naming + (`<provider>_<album_id>_<index>.<ext>`) + +## Requirements + +- Vortex plugin host ≥ 0.1.0 with `http_request` and `get_config` + host functions enabled. +- Imgur API `client_id` (config key `imgur_client_id`). +- Flickr API key (config key `flickr_api_key`). + +## Build + +```bash +rustup target add wasm32-wasip1 +cargo build --release +``` + +Resulting WASM: `target/wasm32-wasip1/release/vortex_mod_gallery.wasm`. + +## Install + +```bash +mkdir -p ~/.config/vortex/plugins/vortex-mod-gallery +cp plugin.toml ~/.config/vortex/plugins/vortex-mod-gallery/ +cp target/wasm32-wasip1/release/vortex_mod_gallery.wasm \ + ~/.config/vortex/plugins/vortex-mod-gallery/vortex-mod-gallery.wasm +``` + +## Tests + +```bash +cargo test --target x86_64-unknown-linux-gnu +``` + +Every provider parser is covered with hardcoded JSON fixtures. The +generic HTML scraper has dedicated tests for relative-URL resolution +and scheme filtering. 
diff --git a/plugins/vortex-mod-gallery/plugin.toml b/plugins/vortex-mod-gallery/plugin.toml new file mode 100644 index 0000000..f0919ed --- /dev/null +++ b/plugins/vortex-mod-gallery/plugin.toml @@ -0,0 +1,17 @@ +[plugin] +name = "vortex-mod-gallery" +version = "1.0.0" +category = "crawler" +author = "vortex-community" +description = "Image galleries: Imgur, Reddit, Flickr, generic scraping" +license = "GPL-3.0" +min_vortex_version = "0.1.0" + +[capabilities] +# Gallery providers use HTTP APIs (Imgur v3, Reddit .json, Flickr REST) plus +# HTML scraping for the generic fallback. +http = true + +[config] +min_resolution = { type = "string", default = "800x600", description = "Filter by minimum image size (WxH)" } +auto_name = { type = "boolean", default = true, description = "Auto-generate filenames from provider metadata" } diff --git a/plugins/vortex-mod-gallery/src/error.rs b/plugins/vortex-mod-gallery/src/error.rs new file mode 100644 index 0000000..892906a --- /dev/null +++ b/plugins/vortex-mod-gallery/src/error.rs @@ -0,0 +1,25 @@ +//! Plugin error type. + +use thiserror::Error; + +/// Errors raised by the Gallery plugin. +#[derive(Debug, Error)] +pub enum PluginError { + #[error("Gallery JSON parse error: {0}")] + ParseJson(String), + + #[error("JSON error: {0}")] + SerdeJson(#[from] serde_json::Error), + + #[error("Gallery API returned status {status}: {message}")] + HttpStatus { status: u16, message: String }, + + #[error("host function response invalid: {0}")] + HostResponse(String), + + #[error("URL is not a recognised gallery resource: {0}")] + UnsupportedUrl(String), + + #[error("invalid min_resolution '{0}' — expected WxH")] + InvalidMinResolution(String), +} diff --git a/plugins/vortex-mod-gallery/src/filter.rs b/plugins/vortex-mod-gallery/src/filter.rs new file mode 100644 index 0000000..57741bc --- /dev/null +++ b/plugins/vortex-mod-gallery/src/filter.rs @@ -0,0 +1,235 @@ +//! Post-extraction filtering and naming helpers. +//! +//! 
- [`parse_min_resolution`] turns a user-facing `"WxH"` string into a +//! `(width, height)` tuple. +//! - [`filter_by_min_resolution`] drops images that are known to be +//! below the threshold. Images with unknown dimensions are kept, +//! because the HTTP HEAD check to discover them is out of scope for +//! the plugin (the download engine may re-check later). +//! - [`dedupe_links`] removes duplicate URLs while preserving order. +//! - [`auto_name`] produces a stable filename from provider + id + index. + +use crate::error::PluginError; +use crate::link::{ImageLink, Provider}; + +/// Parse a `"WxH"` string into `(width, height)`. +pub fn parse_min_resolution(input: &str) -> Result<(u32, u32), PluginError> { + let trimmed = input.trim().to_ascii_lowercase(); + if trimmed.is_empty() || trimmed == "0x0" { + return Ok((0, 0)); + } + let (w, h) = trimmed + .split_once('x') + .ok_or_else(|| PluginError::InvalidMinResolution(input.to_string()))?; + let w: u32 = w + .trim() + .parse() + .map_err(|_| PluginError::InvalidMinResolution(input.to_string()))?; + let h: u32 = h + .trim() + .parse() + .map_err(|_| PluginError::InvalidMinResolution(input.to_string()))?; + Ok((w, h)) +} + +/// Drop images strictly smaller than `min_w × min_h`. +/// +/// Policy for partial information: +/// +/// - Both dimensions unknown → keep (benefit-of-the-doubt; downstream +/// HEAD check can re-verify). +/// - Both known → drop if either axis is below the minimum. +/// - Only one axis known → drop if *that* axis is below its minimum, +/// otherwise keep. A known small width is a firm "too small" signal +/// and should not leak through just because the height is missing. 
+pub fn filter_by_min_resolution(links: Vec<ImageLink>, min_w: u32, min_h: u32) -> Vec<ImageLink> { + if min_w == 0 && min_h == 0 { + return links; + } + links + .into_iter() + .filter(|l| match (l.width, l.height) { + (Some(w), Some(h)) => w >= min_w && h >= min_h, + (Some(w), None) => w >= min_w, + (None, Some(h)) => h >= min_h, + (None, None) => true, + }) + .collect() +} + +/// Remove duplicate URLs while preserving first-seen order. +pub fn dedupe_links(links: Vec<ImageLink>) -> Vec<ImageLink> { + let mut seen = std::collections::HashSet::<String>::new(); + let mut out = Vec::with_capacity(links.len()); + for link in links { + if seen.insert(link.url.clone()) { + out.push(link); + } + } + out +} + +/// Produce `<provider>_<album_id>_<index>.<ext>` as a stable auto-name when +/// `auto_name` is enabled. Index is zero-padded to 3 digits so files +/// sort lexicographically. +pub fn auto_name(provider: Provider, album_id: &str, index: usize, url: &str) -> String { + let provider = match provider { + Provider::Imgur => "imgur", + Provider::Reddit => "reddit", + Provider::Flickr => "flickr", + Provider::Generic => "web", + }; + let ext = guess_ext_from_url(url).unwrap_or("jpg"); + let safe_album = sanitize(album_id); + format!("{provider}_{safe_album}_{index:03}.{ext}") +} + +fn guess_ext_from_url(url: &str) -> Option<&str> { + // Strip the fragment first, then the query — an input like + // `image.avif#foo` would otherwise leave `avif#foo` in `ext` + // and fail the alphanumeric check. 
+ let without_frag = url.split('#').next().unwrap_or(url); + let path = without_frag.split('?').next().unwrap_or(without_frag); + let dot = path.rfind('.')?; + let ext = &path[dot + 1..]; + if (1..=5).contains(&ext.len()) && ext.chars().all(|c| c.is_ascii_alphanumeric()) { + Some(ext) + } else { + None + } +} + +fn sanitize(input: &str) -> String { + input + .chars() + .map(|c| { + if c.is_ascii_alphanumeric() || c == '-' { + c + } else { + '_' + } + }) + .collect() +} + +#[cfg(test)] +mod tests { + use super::*; + + fn link(url: &str, w: Option<u32>, h: Option<u32>) -> ImageLink { + ImageLink { + url: url.to_string(), + width: w, + height: h, + title: None, + filename: None, + } + } + + #[test] + fn parse_min_resolution_happy_path() { + assert_eq!(parse_min_resolution("800x600").unwrap(), (800, 600)); + assert_eq!(parse_min_resolution("1920X1080").unwrap(), (1920, 1080)); + } + + #[test] + fn parse_min_resolution_zero_returns_0x0() { + assert_eq!(parse_min_resolution("0x0").unwrap(), (0, 0)); + assert_eq!(parse_min_resolution("").unwrap(), (0, 0)); + } + + #[test] + fn parse_min_resolution_invalid() { + let err = parse_min_resolution("tall").unwrap_err(); + assert!(matches!(err, PluginError::InvalidMinResolution(_))); + } + + #[test] + fn filter_by_min_resolution_drops_small_keeps_unknown() { + let input = vec![ + link("a.jpg", Some(400), Some(300)), // drop + link("b.jpg", Some(1200), Some(800)), // keep + link("c.jpg", None, None), // keep (unknown) + link("d.jpg", Some(800), Some(600)), // keep (exact match) + ]; + let out = filter_by_min_resolution(input, 800, 600); + let urls: Vec<_> = out.iter().map(|l| l.url.as_str()).collect(); + assert_eq!(urls, vec!["b.jpg", "c.jpg", "d.jpg"]); + } + + #[test] + fn filter_by_min_resolution_drops_known_partial_below_threshold() { + // A firmly-known small width must drop even if height is + // unknown — the old policy leaked such images through because + // "any partial info" was treated as "benefit of the doubt". 
+ let input = vec![ + link("small-w.jpg", Some(400), None), // drop (width too small) + link("big-w.jpg", Some(1920), None), // keep + link("small-h.jpg", None, Some(300)), // drop (height too small) + link("big-h.jpg", None, Some(1080)), // keep + link("both-none.jpg", None, None), // keep (unknown) + ]; + let out = filter_by_min_resolution(input, 800, 600); + let urls: Vec<_> = out.iter().map(|l| l.url.as_str()).collect(); + assert_eq!(urls, vec!["big-w.jpg", "big-h.jpg", "both-none.jpg"]); + } + + #[test] + fn filter_by_min_resolution_zero_is_noop() { + let input = vec![link("a.jpg", Some(1), Some(1))]; + let out = filter_by_min_resolution(input, 0, 0); + assert_eq!(out.len(), 1); + } + + #[test] + fn dedupe_links_preserves_first_seen_order() { + let input = vec![ + link("a.jpg", None, None), + link("b.jpg", None, None), + link("a.jpg", None, None), + link("c.jpg", None, None), + ]; + let out = dedupe_links(input); + let urls: Vec<_> = out.iter().map(|l| l.url.as_str()).collect(); + assert_eq!(urls, vec!["a.jpg", "b.jpg", "c.jpg"]); + } + + #[test] + fn auto_name_pads_index_and_uses_extension() { + let name = auto_name(Provider::Imgur, "abcd123", 0, "https://i.imgur.com/xyz.png"); + assert_eq!(name, "imgur_abcd123_000.png"); + } + + #[test] + fn auto_name_defaults_extension_when_missing() { + let name = auto_name(Provider::Reddit, "1abc", 7, "https://preview.redd.it/noext"); + assert_eq!(name, "reddit_1abc_007.jpg"); + } + + #[test] + fn auto_name_sanitizes_album_id() { + let name = auto_name(Provider::Flickr, "72177/bad id", 12, "a.jpeg"); + assert_eq!(name, "flickr_72177_bad_id_012.jpeg"); + } + + #[test] + fn guess_ext_strips_query_string() { + assert_eq!( + guess_ext_from_url("https://x.com/a.webp?sig=abc"), + Some("webp") + ); + } + + #[test] + fn guess_ext_strips_fragment() { + assert_eq!(guess_ext_from_url("https://x.com/a.avif#foo"), Some("avif")); + } + + #[test] + fn guess_ext_strips_fragment_and_query() { + assert_eq!( + 
guess_ext_from_url("https://x.com/a.png?v=2#frag"), + Some("png") + ); + } +} diff --git a/plugins/vortex-mod-gallery/src/lib.rs b/plugins/vortex-mod-gallery/src/lib.rs new file mode 100644 index 0000000..40c5db2 --- /dev/null +++ b/plugins/vortex-mod-gallery/src/lib.rs @@ -0,0 +1,217 @@ +//! Vortex Gallery WASM plugin. +//! +//! Extracts direct image links from Imgur albums, Reddit galleries, +//! Flickr photosets, and a generic `<img>` fallback for any HTTP page. +//! +//! Implements the CrawlerModule contract: +//! - `can_handle(url)` → `"true"` / `"false"` (recognised providers only) +//! - `extract_links(url)` → JSON string with `ImageLink` entries +//! +//! All network I/O is delegated to the host via `http_request`. + +pub mod error; +pub mod filter; +pub mod link; +pub mod providers; +pub mod url_matcher; + +#[cfg(target_family = "wasm")] +mod plugin_api; + +use serde::Serialize; + +use crate::error::PluginError; +// `link::Provider` is the canonical `url_matcher::Provider` re-exported +// through `link.rs` — importing from either path resolves to the same +// type, so there is no `From` conversion needed here. +use crate::link::ImageLink; +use crate::url_matcher::Provider; + +// ── IPC DTOs ────────────────────────────────────────────────────────────────── + +#[derive(Debug, Serialize, PartialEq, Eq)] +pub struct ExtractLinksResponse { + pub kind: &'static str, + pub provider: Provider, + pub images: Vec<ImageLink>, +} + +// ── Routing helpers ────────────────────────────────────────────────────────── + +/// Returns `"true"` if the URL maps to a *recognised* provider. The +/// generic fallback is **not** reported as handleable — that would +/// amount to claiming ownership of every HTTP(S) page on the internet, +/// which would break other plugin ranking heuristics. 
+pub fn handle_can_handle(url: &str) -> String { + bool_to_string(url_matcher::is_recognised_provider(url)) +} + +pub fn handle_supports_playlist(url: &str) -> String { + // Every recognised provider is a multi-image collection → "true". + handle_can_handle(url) +} + +fn bool_to_string(b: bool) -> String { + if b { + "true".into() + } else { + "false".into() + } +} + +pub fn ensure_recognised_url(url: &str) -> Result<Provider, PluginError> { + match url_matcher::classify_url(url) { + Some(p @ (Provider::Imgur | Provider::Reddit | Provider::Flickr)) => Ok(p), + _ => Err(PluginError::UnsupportedUrl(url.to_string())), + } +} + +// ── Response building ──────────────────────────────────────────────────────── + +/// Post-process raw provider images: dedupe, filter by minimum +/// resolution, then optionally attach auto-generated filenames. +pub fn finalize_links( + provider: Provider, + album_id: &str, + images: Vec<ImageLink>, + min_resolution: &str, + auto_name_enabled: bool, +) -> Result<ExtractLinksResponse, PluginError> { + let (min_w, min_h) = filter::parse_min_resolution(min_resolution)?; + let deduped = filter::dedupe_links(images); + let filtered = filter::filter_by_min_resolution(deduped, min_w, min_h); + + let images: Vec<ImageLink> = if auto_name_enabled { + filtered + .into_iter() + .enumerate() + .map(|(idx, mut link)| { + link.filename = Some(filter::auto_name(provider, album_id, idx, &link.url)); + link + }) + .collect() + } else { + filtered + }; + + Ok(ExtractLinksResponse { + kind: "gallery", + provider, + images, + }) +} + +#[cfg(test)] +mod tests { + use super::*; + + fn sample_images() -> Vec<ImageLink> { + vec![ + ImageLink { + url: "https://i.imgur.com/a.jpg".into(), + width: Some(1920), + height: Some(1080), + title: Some("one".into()), + filename: None, + }, + ImageLink { + url: "https://i.imgur.com/a.jpg".into(), // duplicate + width: Some(1920), + height: Some(1080), + title: Some("one".into()), + filename: None, + }, + ImageLink { + url: 
"https://i.imgur.com/b.png".into(), + width: Some(400), + height: Some(300), + title: 
Some("small".into()), + filename: None, + }, + ImageLink { + url: "https://i.imgur.com/c.gif".into(), + width: None, + height: None, + title: None, + filename: None, + }, + ] + } + + #[test] + fn can_handle_true_for_imgur() { + assert_eq!(handle_can_handle("https://imgur.com/a/abcd"), "true"); + } + + #[test] + fn can_handle_true_for_reddit() { + assert_eq!( + handle_can_handle("https://www.reddit.com/r/pics/comments/1abc/title"), + "true" + ); + } + + #[test] + fn can_handle_true_for_flickr() { + assert_eq!( + handle_can_handle("https://www.flickr.com/photos/bob/albums/72177"), + "true" + ); + } + + #[test] + fn can_handle_false_for_unrelated() { + assert_eq!(handle_can_handle("https://example.com/"), "false"); + } + + #[test] + fn can_handle_false_for_ftp() { + assert_eq!(handle_can_handle("ftp://imgur.com/a/abcd"), "false"); + } + + #[test] + fn finalize_dedupes_filters_and_auto_names() { + let resp = + finalize_links(Provider::Imgur, "abcd", sample_images(), "800x600", true).unwrap(); + assert_eq!(resp.kind, "gallery"); + assert_eq!(resp.provider, Provider::Imgur); + // 4 input → dedupe to 3 → filter drops 400x300 → 2 kept + assert_eq!(resp.images.len(), 2); + assert_eq!( + resp.images[0].filename.as_deref(), + Some("imgur_abcd_000.jpg") + ); + assert_eq!( + resp.images[1].filename.as_deref(), + Some("imgur_abcd_001.gif") + ); + } + + #[test] + fn finalize_without_auto_name_preserves_filename_none() { + let resp = finalize_links(Provider::Imgur, "abcd", sample_images(), "0x0", false).unwrap(); + assert!(resp.images.iter().all(|img| img.filename.is_none())); + } + + #[test] + fn finalize_invalid_min_resolution_errors() { + let err = finalize_links(Provider::Imgur, "abcd", vec![], "bad", false).unwrap_err(); + assert!(matches!(err, PluginError::InvalidMinResolution(_))); + } + + #[test] + fn ensure_recognised_url_rejects_generic_fallback() { + let err = ensure_recognised_url("https://example.com/page").unwrap_err(); + assert!(matches!(err, 
PluginError::UnsupportedUrl(_))); + } + + #[test] + fn serialisation_of_extract_links_response() { + let resp = finalize_links(Provider::Flickr, "72177", sample_images(), "0x0", true).unwrap(); + let json = serde_json::to_string(&resp).unwrap(); + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap(); + assert_eq!(parsed["kind"], "gallery"); + assert_eq!(parsed["provider"], "flickr"); + assert_eq!(parsed["images"].as_array().unwrap().len(), 3); + } +} diff --git a/plugins/vortex-mod-gallery/src/link.rs b/plugins/vortex-mod-gallery/src/link.rs new file mode 100644 index 0000000..15bd781 --- /dev/null +++ b/plugins/vortex-mod-gallery/src/link.rs @@ -0,0 +1,27 @@ +//! Shared image-link type used across providers. +//! +//! The `Provider` enum lives in `url_matcher.rs` (it's the classifier +//! output) and is re-exported here as `link::Provider` so that any +//! module depending on `link.rs` — notably `filter.rs` — sees exactly +//! the same type the matcher produces. Keeping a single canonical +//! definition prevents the two-enum drift hazard flagged during PR +//! review. + +use serde::Serialize; + +pub use crate::url_matcher::Provider; + +/// A single image discovered by a provider. +#[derive(Debug, Clone, Serialize, PartialEq, Eq)] +pub struct ImageLink { + /// Direct HTTP(S) URL to the image file. + pub url: String, + /// Width in pixels, if known. + pub width: Option<u32>, + /// Height in pixels, if known. + pub height: Option<u32>, + /// Human title / caption, if known. + pub title: Option<String>, + /// Auto-generated filename, if [`crate::filter::auto_name`] ran. + pub filename: Option<String>, +} diff --git a/plugins/vortex-mod-gallery/src/plugin_api.rs b/plugins/vortex-mod-gallery/src/plugin_api.rs new file mode 100644 index 0000000..3dacf98 --- /dev/null +++ b/plugins/vortex-mod-gallery/src/plugin_api.rs @@ -0,0 +1,210 @@ +//! WASM-only module: `#[plugin_fn]` exports and `#[host_fn]` imports. 
+ +use extism_pdk::*; + +use crate::error::PluginError; +use crate::providers::{ + build_flickr_request, build_generic_request, build_imgur_album_request, build_reddit_request, + parse_flickr_photoset, parse_generic_html, parse_http_response, parse_imgur_album, + parse_reddit_submission, +}; +use crate::url_matcher::{ + classify_url, extract_flickr_album_id, extract_imgur_id, extract_reddit_permalink, Provider, +}; +use crate::{ + bool_to_string, ensure_recognised_url, finalize_links, handle_can_handle, + handle_supports_playlist, +}; + +#[host_fn] +extern "ExtismHost" { + fn http_request(req: String) -> String; + fn get_config(key: String) -> String; +} + +#[plugin_fn] +pub fn can_handle(url: String) -> FnResult<String> { + Ok(handle_can_handle(&url)) +} + +#[plugin_fn] +pub fn supports_playlist(url: String) -> FnResult<String> { + Ok(handle_supports_playlist(&url)) +} + +#[plugin_fn] +pub fn extract_links(url: String) -> FnResult<String> { + let provider = ensure_recognised_url(&url).map_err(error_to_fn_error)?; + + let images = match provider { + Provider::Imgur => { + let album = extract_imgur_id(&url) + .ok_or_else(|| error_to_fn_error(PluginError::UnsupportedUrl(url.clone())))?; + let client_id = read_config("imgur_client_id"); + let body = http_get(build_imgur_album_request(&album, &client_id))?; + parse_imgur_album(&body).map_err(error_to_fn_error)? + } + Provider::Reddit => { + let json_url = extract_reddit_permalink(&url) + .ok_or_else(|| error_to_fn_error(PluginError::UnsupportedUrl(url.clone())))?; + let body = http_get(build_reddit_request(&json_url))?; + parse_reddit_submission(&body).map_err(error_to_fn_error)? + } + Provider::Flickr => { + let (_, album) = extract_flickr_album_id(&url) + .ok_or_else(|| error_to_fn_error(PluginError::UnsupportedUrl(url.clone())))?; + let api_key = read_config("flickr_api_key"); + let body = http_get(build_flickr_request(&album, &api_key))?; + parse_flickr_photoset(&body).map_err(error_to_fn_error)? 
+ } + Provider::Generic => { + // `ensure_recognised_url` already rejected Generic — this + // arm is only reachable if the classifier changes. Surface + // a clear error so a future contributor gets an obvious + // nudge rather than silent wrong behaviour. + return Err(error_to_fn_error(PluginError::UnsupportedUrl( + "generic HTML fallback is not wired into extract_links".into(), + ))); + } + }; + + let album_id = album_id_for(provider, &url); + // Keep the runtime fallback in sync with the `min_resolution` + // default declared in `plugin.toml` — 800×600. If the manifest is + // ever edited, update this literal at the same time. + let min_res = read_config_or("min_resolution", "800x600"); + let auto_name = read_bool_config("auto_name", true); + + let response = finalize_links(provider, &album_id, images, &min_res, auto_name) + .map_err(error_to_fn_error)?; + Ok(serde_json::to_string(&response)?) +} + +/// Scrape `<img>` tags from an arbitrary HTML page. This entry point is +/// intentionally separate from `extract_links` because the generic +/// fallback must be explicit — the host calls it only when no +/// recognised crawler matched the URL. +#[plugin_fn] +pub fn extract_generic(url: String) -> FnResult<String> { + // Generic fallback still requires http(s) + if classify_url(&url).is_none() { + return Err(error_to_fn_error(PluginError::UnsupportedUrl(url))); + } + + let body = http_get(build_generic_request(&url))?; + let images = parse_generic_html(&body, &url); + + // Keep the runtime fallback in sync with the `min_resolution` + // default declared in `plugin.toml` — 800×600. If the manifest is + // ever edited, update this literal at the same time. 
+ let min_res = read_config_or("min_resolution", "800x600"); + let auto_name = read_bool_config("auto_name", true); + let album_id = "page"; + + let response = finalize_links(Provider::Generic, album_id, images, &min_res, auto_name) + .map_err(error_to_fn_error)?; + // `kind` is left as "gallery"; the `provider` field + // (`Provider::Generic`) is what marks this response as coming from + // the generic fallback. + let json = serde_json::to_string(&response)?; + Ok(json) +} + +#[plugin_fn] +pub fn is_http_url(url: String) -> FnResult<String> { + Ok(bool_to_string(classify_url(&url).is_some())) +} + +// ── Host function wiring ────────────────────────────────────────────────────── + +fn http_get(req_json: Result<String, PluginError>) -> FnResult<String> { + let req = req_json.map_err(error_to_fn_error)?; + // SAFETY: `http_request` is resolved by the Vortex plugin host at + // load time (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_http_request_function`). Invariants: + // 1. The host registers `http_request` in the `ExtismHost` + // namespace before any `#[plugin_fn]` export is callable — a + // missing symbol would abort `Plugin::new` in extism_loader.rs + // and prevent the plugin from being loaded. + // 2. The ABI is `(I64) -> I64` — a single u64 Extism memory + // handle in, a single u64 handle out. The `#[host_fn]` macro + // marshals `String` to/from the memory handle. + // 3. The host enforces the `http` capability from the manifest + // before invoking the implementation; rejections return an + // error which `?` propagates safely. + // 4. Inputs/outputs are owned, serialisable JSON strings — no + // aliasing or mutability concerns. + let raw = unsafe { http_request(req)? 
}; + let resp = parse_http_response(&raw).map_err(error_to_fn_error)?; + resp.into_success_body().map_err(error_to_fn_error) +} + +fn album_id_for(provider: Provider, url: &str) -> String { + match provider { + Provider::Imgur => extract_imgur_id(url).unwrap_or_default(), + Provider::Flickr => extract_flickr_album_id(url) + .map(|(_, a)| a) + .unwrap_or_default(), + Provider::Reddit => { + // Derive the album id from the *canonical* permalink + // returned by `extract_reddit_permalink`, not from the raw + // input URL. The raw URL can carry a `?utm_source=...` + // query string or `#comment` fragment, both of which would + // leak into the generated filename if we used the input + // verbatim. The permalink has already been normalised (no + // query, no fragment, trailing `.json` appended), so + // stripping the `.json` suffix and taking the final path + // segment yields a clean `t3_`-style album id. + extract_reddit_permalink(url) + .as_deref() + .map(|permalink| { + permalink + .trim_end_matches(".json") + .trim_end_matches('/') + .rsplit('/') + .next() + .unwrap_or("reddit") + .to_string() + }) + .unwrap_or_else(|| "reddit".to_string()) + } + Provider::Generic => "page".to_string(), + } +} + +fn read_config(key: &str) -> String { + // SAFETY: `get_config` is registered host-side before plugin exports + // run (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_get_config_function`). Invariants: + // 1. The host registers the symbol in the `ExtismHost` namespace + // before any `#[plugin_fn]` export is callable — a missing + // symbol would abort `Plugin::new` in extism_loader.rs. + // 2. The ABI is `(I64) -> I64`; the `#[host_fn]` macro marshals + // `String` in/out. + // 3. The host returns an empty string when the key is unknown or + // an error for transient failures; both are mapped to the + // empty default so the plugin can surface a clean + // `PluginError::HttpStatus` downstream. + // 4. 
Inputs/outputs are owned JSON strings — no aliasing concerns. + unsafe { get_config(key.to_string()) }.unwrap_or_default() +} + +fn read_config_or(key: &str, default: &str) -> String { + let v = read_config(key); + if v.is_empty() { + default.to_string() + } else { + v + } +} + +fn read_bool_config(key: &str, default: bool) -> bool { + let v = read_config(key); + match v.to_ascii_lowercase().as_str() { + "true" | "1" | "yes" => true, + "false" | "0" | "no" => false, + _ => default, + } +} + +fn error_to_fn_error(err: PluginError) -> WithReturnCode<extism_pdk::Error> { + extism_pdk::Error::msg(err.to_string()).into() +} diff --git a/plugins/vortex-mod-gallery/src/providers.rs b/plugins/vortex-mod-gallery/src/providers.rs new file mode 100644 index 0000000..17f71c1 --- /dev/null +++ b/plugins/vortex-mod-gallery/src/providers.rs @@ -0,0 +1,1125 @@ +//! Provider-specific parsers: Imgur, Reddit, Flickr, generic HTML. +//! +//! Each provider takes a raw API body (or HTML page for Generic) and +//! returns a list of [`ImageLink`]s. The providers are kept pure so they +//! can be unit-tested with hardcoded fixtures. 
+ +use std::collections::HashMap; +use std::sync::OnceLock; + +use regex::Regex; +use serde::{Deserialize, Serialize}; + +use crate::error::PluginError; +use crate::link::ImageLink; + +// ── Host function envelope ──────────────────────────────────────────────────── + +#[derive(Debug, Serialize)] +pub struct HttpRequest { + pub method: String, + pub url: String, + #[serde(skip_serializing_if = "HashMap::is_empty")] + pub headers: HashMap<String, String>, + #[serde(skip_serializing_if = "Option::is_none")] + pub body: Option<String>, +} + +#[derive(Debug, Deserialize)] +pub struct HttpResponse { + pub status: u16, + #[serde(default)] + pub headers: HashMap<String, String>, + #[serde(default)] + pub body: String, +} + +impl HttpResponse { + pub fn into_success_body(self) -> Result<String, PluginError> { + if (200..300).contains(&self.status) { + Ok(self.body) + } else { + Err(PluginError::HttpStatus { + status: self.status, + message: truncate(&self.body, 256), + }) + } + } +} + +fn truncate(s: &str, max: usize) -> String { + if s.len() <= max { + s.to_string() + } else { + let mut cut = max; + while !s.is_char_boundary(cut) && cut > 0 { + cut -= 1; + } + format!("{}…", &s[..cut]) + } +} + +pub fn parse_http_response(raw: &str) -> Result<HttpResponse, PluginError> { + serde_json::from_str(raw).map_err(|e| PluginError::HostResponse(e.to_string())) +} + +// ── Imgur ──────────────────────────────────────────────────────────────────── + +/// Matches Imgur API v3 `/album/{album_id}/images` JSON shape. 
+#[derive(Debug, Deserialize)] +struct ImgurAlbumResponse { + data: Vec<ImgurImage>, + status: u16, + #[serde(default)] + success: bool, +} + +#[derive(Debug, Deserialize)] +struct ImgurImage { + #[serde(default)] + id: Option<String>, + #[serde(default)] + title: Option<String>, + #[serde(default)] + link: Option<String>, + #[serde(default)] + width: Option<u32>, + #[serde(default)] + height: Option<u32>, +} + +pub fn build_imgur_album_request(album_id: &str, client_id: &str) -> Result<String, PluginError> { + let url = format!("https://api.imgur.com/3/album/{album_id}/images"); + let mut headers = HashMap::new(); + headers.insert("Authorization".into(), format!("Client-ID {client_id}")); + let req = HttpRequest { + method: "GET".into(), + url, + headers, + body: None, + }; + Ok(serde_json::to_string(&req)?) +} + +pub fn parse_imgur_album(raw: &str) -> Result<Vec<ImageLink>, PluginError> { + let parsed: ImgurAlbumResponse = + serde_json::from_str(raw).map_err(|e| PluginError::ParseJson(e.to_string()))?; + if !parsed.success || !(200..300).contains(&parsed.status) { + return Err(PluginError::HttpStatus { + status: parsed.status, + message: "Imgur API returned success=false".into(), + }); + } + Ok(parsed + .data + .into_iter() + .filter_map(|img| { + img.link.map(|url| ImageLink { + url, + width: img.width, + height: img.height, + title: img.title.or(img.id), + filename: None, + }) + }) + .collect()) +} + +// ── Reddit ─────────────────────────────────────────────────────────────────── + +/// Minimal subset of the Reddit listing JSON: the root is a 2-element +/// array where the first element contains the post and the second the +/// comment tree. 
+#[derive(Debug, Deserialize)] +struct RedditListing { + data: RedditListingData, +} + +#[derive(Debug, Deserialize)] +struct RedditListingData { + #[serde(default)] + children: Vec<RedditChild>, +} + +#[derive(Debug, Deserialize)] +struct RedditChild { + data: RedditPost, +} + +#[derive(Debug, Deserialize)] +struct RedditPost { + #[serde(default)] + title: Option<String>, + #[serde(default)] + url: Option<String>, + #[serde(default)] + is_gallery: Option<bool>, + #[serde(default)] + media_metadata: Option<HashMap<String, RedditMediaMeta>>, + /// Gallery ordering — a sibling of `media_metadata` that carries + /// the ordered `media_id` sequence. Present only on native Reddit + /// galleries; older scraped posts may be missing it, in which + /// case callers fall back to a URL-sorted enumeration of + /// `media_metadata`. + #[serde(default)] + gallery_data: Option<RedditGalleryData>, + #[serde(default)] + preview: Option<RedditPreview>, +} + +#[derive(Debug, Deserialize)] +struct RedditGalleryData { + #[serde(default)] + items: Vec<RedditGalleryItem>, +} + +#[derive(Debug, Deserialize)] +struct RedditGalleryItem { + #[serde(default)] + media_id: Option<String>, +} + +#[derive(Debug, Deserialize)] +struct RedditMediaMeta { + #[serde(default)] + s: Option<RedditMediaSource>, +} + +#[derive(Debug, Deserialize)] +struct RedditMediaSource { + #[serde(default)] + u: Option<String>, + #[serde(default)] + x: Option<u32>, + #[serde(default)] + y: Option<u32>, +} + +#[derive(Debug, Deserialize)] +struct RedditPreview { + #[serde(default)] + images: Vec<RedditPreviewImage>, +} + +#[derive(Debug, Deserialize)] +struct RedditPreviewImage { + #[serde(default)] + source: Option<RedditPreviewSource>, +} + +#[derive(Debug, Deserialize)] +struct RedditPreviewSource { + #[serde(default)] + url: Option<String>, + #[serde(default)] + width: Option<u32>, + #[serde(default)] + height: Option<u32>, +} + +pub fn build_reddit_request(permalink_json: &str) -> Result<String, PluginError> { + let mut 
headers = HashMap::new(); + headers.insert("User-Agent".into(), "vortex-gallery-plugin/1.0".into()); + let req = HttpRequest { + method: "GET".into(), + url: permalink_json.to_string(), + headers, + body: None, + }; + 
Ok(serde_json::to_string(&req)?) +} + +pub fn parse_reddit_submission(raw: &str) -> Result<Vec<ImageLink>, PluginError> { + let listings: Vec<RedditListing> = + serde_json::from_str(raw).map_err(|e| PluginError::ParseJson(e.to_string()))?; + let Some(root) = listings.first() else { + return Ok(Vec::new()); + }; + let Some(child) = root.data.children.first() else { + return Ok(Vec::new()); + }; + let post = &child.data; + let title = post.title.clone(); + + // Case 1: native Reddit gallery (`is_gallery=true` + `media_metadata`) + if post.is_gallery.unwrap_or(false) { + if let Some(meta) = &post.media_metadata { + // Prefer the post's explicit ordering via + // `gallery_data.items` — each item carries a `media_id` + // that indexes into `media_metadata`. This preserves the + // submission sequence the uploader chose. + if let Some(gallery) = &post.gallery_data { + if !gallery.items.is_empty() { + let ordered: Vec<ImageLink> = gallery + .items + .iter() + .filter_map(|item| item.media_id.as_ref()) + .filter_map(|id| { + let entry = meta.get(id)?; + let s = entry.s.as_ref()?; + s.u.as_ref().map(|u| ImageLink { + url: unescape_amp(u), + width: s.x, + height: s.y, + title: title.clone(), + filename: None, + }) + }) + .collect(); + if !ordered.is_empty() { + return Ok(ordered); + } + } + } + + // Fallback: `gallery_data` is missing (older posts, some + // scrapes) — enumerate `media_metadata` and sort by URL + // so the output is at least deterministic across runs. 
+ let mut links: Vec<ImageLink> = meta + .values() + .filter_map(|item| { + let s = item.s.as_ref()?; + s.u.as_ref().map(|u| ImageLink { + url: unescape_amp(u), + width: s.x, + height: s.y, + title: title.clone(), + filename: None, + }) + }) + .collect(); + links.sort_by(|a, b| a.url.cmp(&b.url)); + return Ok(links); + } + } + + // Case 2: single-image post — prefer preview (carries dimensions) + if let Some(preview) = &post.preview { + if let Some(img) = preview.images.first() { + if let Some(src) = &img.source { + if let Some(url) = &src.url { + return Ok(vec![ImageLink { + url: unescape_amp(url), + width: src.width, + height: src.height, + title, + filename: None, + }]); + } + } + } + } + + // Case 3: fallback — the submission URL points directly at an image + if let Some(url) = &post.url { + if looks_like_image_url(url) { + return Ok(vec![ImageLink { + url: url.clone(), + width: None, + height: None, + title, + filename: None, + }]); + } + } + + Ok(Vec::new()) +} + +fn unescape_amp(url: &str) -> String { + url.replace("&amp;", "&") +} + +fn looks_like_image_url(url: &str) -> bool { + // Strip the query and fragment before inspecting the extension so + // that `https://cdn/example.jpg?sig=xyz#frag` is still recognised. + let stripped = url + .split('#') + .next() + .unwrap_or("") + .split('?') + .next() + .unwrap_or(""); + let lower = stripped.to_ascii_lowercase(); + lower.ends_with(".jpg") + || lower.ends_with(".jpeg") + || lower.ends_with(".png") + || lower.ends_with(".gif") + || lower.ends_with(".webp") + || lower.ends_with(".avif") +} + +// ── Flickr ─────────────────────────────────────────────────────────────────── + +/// Matches Flickr REST `flickr.photosets.getPhotos` JSON response when +/// `format=json&nojsoncallback=1` is passed. 
+/// +/// `photoset` is optional because Flickr's `{"stat":"fail"}` error +/// envelopes (bad API key, private album, non-existent photoset) omit +/// the field entirely — a mandatory field would surface those as JSON +/// parse failures instead of clean provider errors. +#[derive(Debug, Deserialize)] +struct FlickrResponse { + #[serde(default)] + photoset: Option<FlickrPhotoset>, + #[serde(default)] + stat: Option<String>, + #[serde(default)] + code: Option<u16>, + #[serde(default)] + message: Option<String>, +} + +#[derive(Debug, Deserialize)] +struct FlickrPhotoset { + #[serde(default)] + photo: Vec<FlickrPhoto>, +} + +#[derive(Debug, Deserialize)] +struct FlickrPhoto { + #[serde(default)] + id: Option<String>, + #[serde(default)] + title: Option<String>, + // Original-size URL and its dimensions. Flickr returns dimensions + // as either a JSON number or a string depending on the extras + // requested — `extract_dim` normalises both shapes. + #[serde(default)] + url_o: Option<String>, + #[serde(default)] + width_o: Option<serde_json::Value>, + #[serde(default)] + height_o: Option<serde_json::Value>, + // Large-size URL and its dimensions — the fallback we emit when + // `url_o` is missing (not every photoset exposes original-size + // downloads). Without the matching `width_l`/`height_l`, the + // downstream `min_resolution` filter would see `None, None` and + // either keep every large-only image or drop them all, depending + // on the filter's partial-dimension policy. + #[serde(default)] + url_l: Option<String>, + #[serde(default)] + width_l: Option<serde_json::Value>, + #[serde(default)] + height_l: Option<serde_json::Value>, +} + +pub fn build_flickr_request(album_id: &str, api_key: &str) -> Result<String, PluginError> { + // URL-encode user-controlled config values so that a key or album + // id containing `&` or `=` cannot corrupt the query string. 
The + // `album_id` is matched by `(\d+)` in `url_matcher.rs` so it is + // safe by construction, but encoding it costs nothing and matches + // the hardening applied to SoundCloud and Vimeo. 
+ // + // Extras: request URL + dimensions for both the original (`url_o`, + // `width_o`, `height_o`) and large (`url_l`, `width_l`, `height_l`) + // sizes. The parser prefers `url_o` but falls back to `url_l` when + // the original is not published — and it must read the matching + // width/height fields so images aren't wrongly dropped by the + // `min_resolution` filter for missing dimensions. + let extras = urlencode_query("url_o,width_o,height_o,url_l,width_l,height_l"); + let url = format!( + "https://api.flickr.com/services/rest/?method=flickr.photosets.getPhotos&api_key={}&photoset_id={}&format=json&nojsoncallback=1&extras={}", + urlencode_query(api_key), + urlencode_query(album_id), + extras, + ); + let req = HttpRequest { + method: "GET".into(), + url, + headers: HashMap::new(), + body: None, + }; + Ok(serde_json::to_string(&req)?) +} + +/// Minimal percent-encoder for query-string values. Identical in spirit +/// to the one used by the SoundCloud and Vimeo plugins; duplicated here +/// intentionally because sharing would force a separate sdk crate. +fn urlencode_query(s: &str) -> String { + let mut out = String::with_capacity(s.len()); + for b in s.bytes() { + if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') { + out.push(b as char); + } else { + out.push_str(&format!("%{:02X}", b)); + } + } + out +} + +pub fn parse_flickr_photoset(raw: &str) -> Result<Vec<ImageLink>, PluginError> { + let parsed: FlickrResponse = + serde_json::from_str(raw).map_err(|e| PluginError::ParseJson(e.to_string()))?; + + // Check the API envelope status BEFORE touching `photoset`, so that + // `{"stat":"fail"}` responses surface as a provider error with the + // Flickr error code / message instead of an unwrap panic or a + // misleading JSON parse failure. 
+ if parsed.stat.as_deref() == Some("fail") { + return Err(PluginError::HttpStatus { + status: parsed.code.unwrap_or(400), + message: parsed + .message + .unwrap_or_else(|| "Flickr API returned stat=fail".into()), + }); + } + if !matches!(parsed.stat.as_deref(), Some("ok") | None) { + return Err(PluginError::HttpStatus { + status: 400, + message: format!("Flickr stat={:?}", parsed.stat), + }); + } + + // `photoset` is now only absent for malformed success envelopes — + // treat that as an empty album rather than an error. + let Some(photoset) = parsed.photoset else { + return Ok(Vec::new()); + }; + + Ok(photoset + .photo + .into_iter() + .filter_map(|p| { + // Prefer the original-size URL with its matching + // `width_o`/`height_o`; fall back to the large-size URL + // with `width_l`/`height_l` so the dimensions we emit + // always describe the URL we emit, not a different size. + // Reading `width_o` when `url_l` is used would populate + // the link with stale (or missing) dimensions and either + // under-filter or wrongly drop images in + // `filter_by_min_resolution`. + let (url, width, height) = if let Some(url_o) = p.url_o { + (url_o, extract_dim(&p.width_o), extract_dim(&p.height_o)) + } else if let Some(url_l) = p.url_l { + (url_l, extract_dim(&p.width_l), extract_dim(&p.height_l)) + } else { + return None; + }; + Some(ImageLink { + url, + width, + height, + title: p.title.or_else(|| p.id.clone()), + filename: None, + }) + }) + .collect()) +} + +/// Flickr returns image dimensions as either a JSON number or a string +/// depending on the extras requested — handle both. 
+fn extract_dim(value: &Option<serde_json::Value>) -> Option<u32> { + match value { + Some(serde_json::Value::Number(n)) => n.as_u64().map(|v| v as u32), + Some(serde_json::Value::String(s)) => s.parse().ok(), + _ => None, + } +} + +// ── Generic HTML fallback ──────────────────────────────────────────────────── + +fn img_src_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + // Match `<img ... src="...">` and capture the src value. + // Uses [^>]* rather than .*? to avoid backtracking on large pages. + // + // The pattern is a compile-time constant — `.expect` documents + // the invariant and honours the crate-wide rule that no + // production code path may `.unwrap()`. + Regex::new(r#"(?i)<img[^>]*\bsrc\s*=\s*["']([^"']+)["']"#) + .expect("img_src_regex: compile-time constant regex must compile") + }) +} + +pub fn build_generic_request(page_url: &str) -> Result<String, PluginError> { + let mut headers = HashMap::new(); + headers.insert("User-Agent".into(), "vortex-gallery-plugin/1.0".into()); + let req = HttpRequest { + method: "GET".into(), + url: page_url.to_string(), + headers, + body: None, + }; + Ok(serde_json::to_string(&req)?) +} + +/// Scrape `<img>` tags from an HTML page. Relative URLs are resolved +/// against `base_url`: +/// +/// - absolute URLs are passed through verbatim +/// - protocol-relative URLs (`//cdn.example.com/a.jpg`) inherit the +/// **scheme of the page URL** (not a hardcoded `https:`) +/// - root-relative paths (`/foo.png`) are resolved against the origin +/// - page-relative paths (`images/4.jpg`) are resolved against the +/// page's **directory** (everything up to and including the last +/// `/`) so `<img src="a.jpg">` on `https://example.com/gallery/p.html` +/// becomes `https://example.com/gallery/a.jpg`, not +/// `https://example.com/a.jpg` +/// +/// Non-http(s) URL schemes like `data:`, `blob:`, `javascript:`, +/// `mailto:` are dropped before resolution. 
+pub fn parse_generic_html(html: &str, base_url: &str) -> Vec<ImageLink> { + let ctx = UrlContext::from_page_url(base_url); + img_src_regex() + .captures_iter(html) + .filter_map(|c| c.get(1).map(|m| m.as_str().to_string())) + .filter(|raw| !has_non_http_scheme(raw)) + .map(|raw| ctx.resolve(&raw)) + .filter(|url| is_http_url(url)) + .map(|url| ImageLink { + url, + width: None, + height: None, + title: None, + filename: None, + }) + .collect() +} + +/// Parsed view of a page URL, split into the pieces the generic +/// resolver actually needs. +#[derive(Debug, Default)] +struct UrlContext { + /// `"http"` or `"https"`, lowercased. Empty if the input wasn't + /// a well-formed http(s) URL — the resolver then degrades to + /// leaving relative URLs untouched (and they get dropped by + /// `is_http_url`). + scheme: String, + /// `<scheme>://<host>` — no path, no query, no fragment. + origin: String, + /// `<scheme>://<host>/<dir>/` — the page directory, always ending + /// in `/`. Used for page-relative resolution. + base_dir: String, +} + +impl UrlContext { + fn from_page_url(url: &str) -> Self { + let (scheme, rest) = match url.split_once("://") { + Some((s, r)) => (s.to_ascii_lowercase(), r), + None => return Self::default(), + }; + if !matches!(scheme.as_str(), "http" | "https") { + return Self::default(); + } + // `rest` looks like `host/path?q#f` or just `host`. + let authority_end = rest.find(['/', '?', '#']).unwrap_or(rest.len()); + let host = &rest[..authority_end]; + if host.is_empty() { + return Self::default(); + } + let origin = format!("{scheme}://{host}"); + + // Extract the path (before `?` and `#`), then keep everything + // up to and including the last `/` as the base directory. 
+ let path_start = authority_end; + let after_authority = &rest[path_start..]; + let path_only = after_authority + .split('#') + .next() + .unwrap_or("") + .split('?') + .next() + .unwrap_or(""); + let dir = match path_only.rfind('/') { + Some(idx) => &path_only[..=idx], + None => "/", + }; + let base_dir = format!("{origin}{dir}"); + + Self { + scheme, + origin, + base_dir, + } + } + + fn resolve(&self, raw: &str) -> String { + // Scheme detection must be case-insensitive: URL schemes are + // defined as case-insensitive by RFC 3986, and HTML authors + // sometimes write `HTTP://` (especially hand-edited legacy + // pages). A `starts_with("http://")` check would miss those + // and wrongly treat them as relative paths, prepending the + // page directory and producing a malformed URL. + if is_absolute_http_url(raw) { + raw.to_string() + } else if let Some(tail) = raw.strip_prefix("//") { + // Protocol-relative: inherit the page scheme instead of + // hardcoding https so http-only pages keep working. + let scheme = if self.scheme.is_empty() { + "https" + } else { + &self.scheme + }; + format!("{scheme}://{tail}") + } else if raw.starts_with('/') { + // Root-relative: attach to the origin. + format!("{}{}", self.origin, raw) + } else if self.base_dir.is_empty() { + // No base directory to resolve against — return the raw + // path; `is_http_url` will drop it. + raw.to_string() + } else { + // Page-relative: attach to the page directory so nested + // pages keep their asset paths intact. + format!("{}{}", self.base_dir, raw) + } + } +} + +/// Return true if the raw href is a non-resolvable scheme such as +/// `data:`, `blob:`, `javascript:`, `mailto:`, `tel:`, `file:`. These +/// must never be prefixed with an origin during relative resolution. +fn has_non_http_scheme(raw: &str) -> bool { + // A scheme is `[A-Za-z][A-Za-z0-9+.-]*:` at the start of the URL. + // If it matches *and* it is not `http` or `https`, reject. 
+    let colon = match raw.find(':') {
+        Some(i) => i,
+        None => return false,
+    };
+    // Rule out `//` (protocol-relative) which has no scheme prefix.
+    if raw.starts_with("//") {
+        return false;
+    }
+    let scheme = &raw[..colon];
+    // Use `is_some_and` rather than `unwrap()` so that an empty
+    // scheme (input starting with `:`) returns `false` instead of
+    // panicking.
+    if !scheme
+        .chars()
+        .next()
+        .is_some_and(|c| c.is_ascii_alphabetic())
+    {
+        return false;
+    }
+    if !scheme
+        .chars()
+        .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
+    {
+        return false;
+    }
+    let lower = scheme.to_ascii_lowercase();
+    lower != "http" && lower != "https"
+}
+
+fn is_http_url(url: &str) -> bool {
+    is_absolute_http_url(url)
+}
+
+/// Return `true` if `url` starts with an absolute `http://` or
+/// `https://` scheme, **ignoring case**. RFC 3986 §3.1 defines URL
+/// schemes as case-insensitive; some hand-edited HTML in the wild
+/// uses uppercase schemes (`HTTP://example.com/a.jpg`), and a
+/// case-sensitive check would route those into the relative-URL
+/// branch and produce malformed output.
+fn is_absolute_http_url(url: &str) -> bool {
+    let lower = url.chars().take(8).collect::<String>().to_ascii_lowercase();
+    lower.starts_with("http://") || lower.starts_with("https://")
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // ── Imgur ──────────────────────────────────────────────────────────────
+    const IMGUR_ALBUM_JSON: &str = r#"{
+        "data": [
+            {
+                "id": "img1",
+                "title": "first",
+                "link": "https://i.imgur.com/img1.jpg",
+                "width": 1920,
+                "height": 1080
+            },
+            {
+                "id": "img2",
+                "title": null,
+                "link": "https://i.imgur.com/img2.png",
+                "width": 800,
+                "height": 600
+            }
+        ],
+        "status": 200,
+        "success": true
+    }"#;
+
+    const IMGUR_FAILED_JSON: &str = r#"{
+        "data": [],
+        "status": 404,
+        "success": false
+    }"#;
+
+    #[test]
+    fn imgur_album_extracts_all_images() {
+        let links = parse_imgur_album(IMGUR_ALBUM_JSON).unwrap();
+        assert_eq!(links.len(), 2);
+        assert_eq!(links[0].url, "https://i.imgur.com/img1.jpg");
+        assert_eq!(links[0].width, Some(1920));
+        assert_eq!(links[0].height, Some(1080));
+        assert_eq!(links[0].title.as_deref(), Some("first"));
+    }
+
+    #[test]
+    fn imgur_failed_response_maps_to_http_error() {
+        let err = parse_imgur_album(IMGUR_FAILED_JSON).unwrap_err();
+        assert!(matches!(err, PluginError::HttpStatus { status: 404, ..
+        }));
+    }
+
+    #[test]
+    fn build_imgur_request_sets_client_id_header() {
+        let req = build_imgur_album_request("abc123", "MY_CLIENT").unwrap();
+        assert!(req.contains("https://api.imgur.com/3/album/abc123/images"));
+        assert!(req.contains("Authorization"));
+        assert!(req.contains("Client-ID MY_CLIENT"));
+    }
+
+    // ── Reddit ─────────────────────────────────────────────────────────────
+    const REDDIT_GALLERY_JSON: &str = r#"[
+        {"data": {"children": [
+            {"data": {
+                "title": "cool pics",
+                "is_gallery": true,
+                "media_metadata": {
+                    "id1": {"s": {"u": "https://preview.redd.it/a.jpg?sig=1&amp;s=2", "x": 1200, "y": 800}, "m": "image/jpg"},
+                    "id2": {"s": {"u": "https://preview.redd.it/b.jpg", "x": 1920, "y": 1080}, "m": "image/jpg"}
+                }
+            }}
+        ]}},
+        {"data": {"children": []}}
+    ]"#;
+
+    const REDDIT_SINGLE_IMAGE_JSON: &str = r#"[
+        {"data": {"children": [
+            {"data": {
+                "title": "neato",
+                "url": "https://i.redd.it/example.png",
+                "preview": {
+                    "images": [
+                        {"source": {"url": "https://preview.redd.it/example.png?sig=xyz", "width": 800, "height": 600}}
+                    ]
+                }
+            }}
+        ]}},
+        {"data": {"children": []}}
+    ]"#;
+
+    #[test]
+    fn reddit_gallery_extracts_all_images_in_order() {
+        let links = parse_reddit_submission(REDDIT_GALLERY_JSON).unwrap();
+        assert_eq!(links.len(), 2);
+        // Unescaped ampersand roundtrip
+        assert_eq!(links[0].url, "https://preview.redd.it/a.jpg?sig=1&s=2");
+        assert_eq!(links[0].width, Some(1200));
+        assert_eq!(links[0].height, Some(800));
+    }
+
+    #[test]
+    fn reddit_single_image_uses_preview_source() {
+        let links = parse_reddit_submission(REDDIT_SINGLE_IMAGE_JSON).unwrap();
+        assert_eq!(links.len(), 1);
+        assert_eq!(links[0].width, Some(800));
+        assert_eq!(links[0].height, Some(600));
+    }
+
+    #[test]
+    fn reddit_gallery_preserves_post_ordering_via_gallery_data() {
+        // With `gallery_data.items` present, the parser must walk the
+        // items in order, looking up each `media_id` in
+        // `media_metadata`.
+        // This preserves the submission's image
+        // sequence regardless of HashMap iteration order.
+        //
+        // `id_z` is intentionally lexicographically *after* `id_a` so
+        // that a URL-sort fallback would produce the opposite order.
+        let raw = r#"[
+            {"data": {"children": [
+                {"data": {
+                    "title": "ordered post",
+                    "is_gallery": true,
+                    "gallery_data": {"items": [
+                        {"media_id": "id_z"},
+                        {"media_id": "id_a"}
+                    ]},
+                    "media_metadata": {
+                        "id_a": {"s": {"u": "https://preview.redd.it/a.jpg", "x": 100, "y": 100}},
+                        "id_z": {"s": {"u": "https://preview.redd.it/z.jpg", "x": 200, "y": 200}}
+                    }
+                }}
+            ]}},
+            {"data": {"children": []}}
+        ]"#;
+        let links = parse_reddit_submission(raw).unwrap();
+        assert_eq!(links.len(), 2);
+        assert_eq!(links[0].url, "https://preview.redd.it/z.jpg");
+        assert_eq!(links[1].url, "https://preview.redd.it/a.jpg");
+    }
+
+    #[test]
+    fn reddit_gallery_falls_back_to_url_sort_when_gallery_data_missing() {
+        // No `gallery_data` — the parser falls back to the
+        // deterministic URL-sorted enumeration of `media_metadata`.
+        let raw = r#"[
+            {"data": {"children": [
+                {"data": {
+                    "title": "legacy post",
+                    "is_gallery": true,
+                    "media_metadata": {
+                        "id_z": {"s": {"u": "https://preview.redd.it/z.jpg", "x": 200, "y": 200}},
+                        "id_a": {"s": {"u": "https://preview.redd.it/a.jpg", "x": 100, "y": 100}}
+                    }
+                }}
+            ]}},
+            {"data": {"children": []}}
+        ]"#;
+        let links = parse_reddit_submission(raw).unwrap();
+        assert_eq!(links.len(), 2);
+        // URL-sorted: a.jpg comes before z.jpg
+        assert_eq!(links[0].url, "https://preview.redd.it/a.jpg");
+        assert_eq!(links[1].url, "https://preview.redd.it/z.jpg");
+    }
+
+    #[test]
+    fn reddit_empty_listing_is_not_an_error() {
+        let raw = r#"[{"data": {"children": []}}, {"data": {"children": []}}]"#;
+        let links = parse_reddit_submission(raw).unwrap();
+        assert!(links.is_empty());
+    }
+
+    // ── Flickr ─────────────────────────────────────────────────────────────
+    const FLICKR_SET_JSON: &str = r#"{
+        "photoset": {
+            "id": "72177",
+            "photo": [
+                {
+                    "id": "1",
+                    "title": "pic1",
+                    "url_o": "https://live.staticflickr.com/1.jpg",
+                    "width_o": "4000",
+                    "height_o": "3000"
+                },
+                {
+                    "id": "2",
+                    "title": "pic2",
+                    "url_l": "https://live.staticflickr.com/2_l.jpg",
+                    "width_l": 2048,
+                    "height_l": 1365
+                }
+            ]
+        },
+        "stat": "ok"
+    }"#;
+
+    const FLICKR_ERROR_JSON: &str = r#"{
+        "photoset": {"photo": []},
+        "stat": "fail"
+    }"#;
+
+    #[test]
+    fn flickr_photoset_extracts_all_photos() {
+        let links = parse_flickr_photoset(FLICKR_SET_JSON).unwrap();
+        assert_eq!(links.len(), 2);
+        assert_eq!(links[0].width, Some(4000));
+        assert_eq!(links[0].height, Some(3000));
+        assert_eq!(links[1].url, "https://live.staticflickr.com/2_l.jpg");
+    }
+
+    #[test]
+    fn flickr_error_stat_is_rejected() {
+        let err = parse_flickr_photoset(FLICKR_ERROR_JSON).unwrap_err();
+        assert!(matches!(err, PluginError::HttpStatus { ..
+        }));
+    }
+
+    #[test]
+    fn build_flickr_request_encodes_extras() {
+        let req = build_flickr_request("72177", "KEY").unwrap();
+        assert!(req.contains("photoset_id=72177"));
+        assert!(req.contains("api_key=KEY"));
+        // Extras now includes both `url_o`/`width_o`/`height_o` and
+        // `url_l`/`width_l`/`height_l` so the parser can emit
+        // matching dimensions regardless of which URL size is used.
+        assert!(req.contains("url_o"));
+        assert!(req.contains("width_o"));
+        assert!(req.contains("height_o"));
+        assert!(req.contains("url_l"));
+        assert!(req.contains("width_l"));
+        assert!(req.contains("height_l"));
+    }
+
+    #[test]
+    fn flickr_url_l_reads_matching_large_dimensions() {
+        // When `url_l` is the emitted URL, the parser must read
+        // `width_l`/`height_l` (not `width_o`/`height_o`) so the
+        // dimensions describe the image we actually download.
+        let raw = r#"{
+            "photoset": {
+                "photo": [
+                    {
+                        "id": "1",
+                        "title": "large-only",
+                        "url_l": "https://live.staticflickr.com/1_l.jpg",
+                        "width_l": 1024,
+                        "height_l": 768
+                    }
+                ]
+            },
+            "stat": "ok"
+        }"#;
+        let links = parse_flickr_photoset(raw).unwrap();
+        assert_eq!(links.len(), 1);
+        assert_eq!(links[0].url, "https://live.staticflickr.com/1_l.jpg");
+        assert_eq!(links[0].width, Some(1024));
+        assert_eq!(links[0].height, Some(768));
+    }
+
+    #[test]
+    fn flickr_url_o_does_not_leak_large_dimensions() {
+        // Defensive: when `url_o` is available the parser uses
+        // `width_o`/`height_o`, not any `width_l`/`height_l` that may
+        // happen to coexist in the JSON.
+        let raw = r#"{
+            "photoset": {
+                "photo": [
+                    {
+                        "id": "1",
+                        "title": "both-sizes",
+                        "url_o": "https://live.staticflickr.com/1.jpg",
+                        "width_o": 4000,
+                        "height_o": 3000,
+                        "url_l": "https://live.staticflickr.com/1_l.jpg",
+                        "width_l": 1024,
+                        "height_l": 768
+                    }
+                ]
+            },
+            "stat": "ok"
+        }"#;
+        let links = parse_flickr_photoset(raw).unwrap();
+        assert_eq!(links.len(), 1);
+        assert_eq!(links[0].url, "https://live.staticflickr.com/1.jpg");
+        assert_eq!(links[0].width, Some(4000));
+        assert_eq!(links[0].height, Some(3000));
+    }
+
+    // ── Generic HTML ───────────────────────────────────────────────────────
+    #[test]
+    fn generic_html_scrapes_img_tags_page_relative() {
+        let html = r#"
+
+
+            one
+            two
+            three
+
+
+
+
+        "#;
+        // The page lives in `/gallery/` so `relative/4.gif` should
+        // resolve against that directory, not the origin root.
+        let links = parse_generic_html(html, "https://example.com/gallery/page.html");
+        let urls: Vec<_> = links.iter().map(|l| l.url.as_str()).collect();
+        assert_eq!(
+            urls,
+            vec![
+                "https://cdn.example.com/1.jpg",
+                "https://example.com/rel/2.png",
+                "https://cdn2.example.com/3.webp",
+                "https://example.com/gallery/relative/4.gif",
+            ]
+        );
+    }
+
+    #[test]
+    fn generic_html_protocol_relative_inherits_http_scheme() {
+        // Page served over plain HTTP must NOT upgrade protocol-relative
+        // images to https — that would break http-only assets.
+        let html = r#"<img src="//cdn.example.com/a.jpg">"#;
+        let links = parse_generic_html(html, "http://example.com/page");
+        assert_eq!(links[0].url, "http://cdn.example.com/a.jpg");
+    }
+
+    #[test]
+    fn generic_html_root_page_page_relative_uses_origin_root() {
+        // When the page has no directory segment, relative paths
+        // resolve against `/` directly.
+        let html = r#"<img src="foo.jpg">"#;
+        let links = parse_generic_html(html, "https://example.com");
+        assert_eq!(links[0].url, "https://example.com/foo.jpg");
+    }
+
+    #[test]
+    fn generic_html_accepts_uppercase_scheme() {
+        // Legacy pages sometimes ship `HTTP://` / `HTTPS://` — scheme
+        // detection must be case-insensitive so the URL is not routed
+        // through the relative-URL branch.
+        let html = r#"<img src="HTTPS://cdn.example.com/A.jpg">"#;
+        let links = parse_generic_html(html, "https://example.com/page");
+        assert_eq!(links.len(), 1);
+        assert_eq!(links[0].url, "HTTPS://cdn.example.com/A.jpg");
+    }
+
+    #[test]
+    fn url_context_ignores_query_and_fragment_on_base() {
+        let ctx = UrlContext::from_page_url("https://example.com/a/b?q=1#f");
+        assert_eq!(ctx.origin, "https://example.com");
+        assert_eq!(ctx.base_dir, "https://example.com/a/");
+    }
+
+    #[test]
+    fn has_non_http_scheme_rejects_data_javascript_mailto() {
+        assert!(has_non_http_scheme("data:image/png;base64,AAAA"));
+        assert!(has_non_http_scheme("javascript:alert(1)"));
+        assert!(has_non_http_scheme("mailto:a@b.com"));
+        assert!(has_non_http_scheme("blob:https://x"));
+        assert!(!has_non_http_scheme("http://x/y"));
+        assert!(!has_non_http_scheme("https://x/y"));
+        assert!(!has_non_http_scheme("//cdn.example.com/a.jpg"));
+        assert!(!has_non_http_scheme("/relative"));
+        assert!(!has_non_http_scheme("no-colon-here"));
+    }
+
+    #[test]
+    fn flickr_failure_envelope_surfaces_as_provider_error() {
+        // Bad API key / private album / missing set → `stat: "fail"`
+        // with no `photoset` field. Must map to PluginError::HttpStatus.
+        let raw = r#"{
+            "stat": "fail",
+            "code": 100,
+            "message": "Invalid API Key"
+        }"#;
+        let err = parse_flickr_photoset(raw).unwrap_err();
+        match err {
+            PluginError::HttpStatus { status, message } => {
+                assert_eq!(status, 100);
+                assert_eq!(message, "Invalid API Key");
+            }
+            other => panic!("unexpected {other:?}"),
+        }
+    }
+
+    #[test]
+    fn reddit_fallback_detects_image_with_query_string() {
+        // Reddit single-image submissions without a `preview` field
+        // use the submission URL directly. That URL may carry a CDN
+        // signing query string, so the extension check must ignore it.
+        let raw = r#"[
+            {"data": {"children": [
+                {"data": {
+                    "title": "shot",
+                    "url": "https://i.redd.it/example.png?sig=abc123"
+                }}
+            ]}},
+            {"data": {"children": []}}
+        ]"#;
+        let links = parse_reddit_submission(raw).unwrap();
+        assert_eq!(links.len(), 1);
+        assert_eq!(links[0].url, "https://i.redd.it/example.png?sig=abc123");
+    }
+
+    #[test]
+    fn http_response_parse_envelope() {
+        let raw = r#"{"status": 200, "headers": {}, "body": "ok"}"#;
+        let resp = parse_http_response(raw).unwrap();
+        assert_eq!(resp.status, 200);
+    }
+}
diff --git a/plugins/vortex-mod-gallery/src/url_matcher.rs b/plugins/vortex-mod-gallery/src/url_matcher.rs
new file mode 100644
index 0000000..4e0aa97
--- /dev/null
+++ b/plugins/vortex-mod-gallery/src/url_matcher.rs
@@ -0,0 +1,303 @@
+//! Gallery URL detection and provider routing.
+//!
+//! Each recognised URL is routed to exactly one [`Provider`] based on a
+//! host + path shape check. Unknown URLs fall through to
+//! [`Provider::Generic`] only if the URL scheme is http(s); everything
+//! else is rejected.
+
+use std::sync::OnceLock;
+
+use regex::Regex;
+use serde::Serialize;
+
+/// Gallery provider for a given URL.
+///
+/// This is the single canonical `Provider` type for the crate —
+/// `link.rs` re-exports it so that all modules (url_matcher, filter,
+/// providers, lib.rs) share the same definition and cannot drift.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize)]
+#[serde(rename_all = "snake_case")]
+pub enum Provider {
+    /// Imgur album or gallery: `imgur.com/a/<id>` or `imgur.com/gallery/<id>`
+    Imgur,
+    /// Reddit submission / gallery: `reddit.com/r/<sub>/comments/<id>/…`
+    Reddit,
+    /// Flickr photoset / album: `flickr.com/photos/<user>/albums/<id>`
+    Flickr,
+    /// Generic HTML page — parse `<img>` tags.
+    Generic,
+}
+
+pub fn classify_url(url: &str) -> Option<Provider> {
+    let (host_lower, path) = validate_and_split(url)?;
+    let path_only = normalize_path(path);
+
+    if is_imgur_host(&host_lower) && imgur_regex().is_match(path_only) {
+        return Some(Provider::Imgur);
+    }
+    if is_reddit_host(&host_lower) && reddit_regex().is_match(path_only) {
+        return Some(Provider::Reddit);
+    }
+    if is_flickr_host(&host_lower) && flickr_regex().is_match(path_only) {
+        return Some(Provider::Flickr);
+    }
+    // Generic fallback: any http(s) URL is eligible, callers decide
+    // whether to actually scrape it based on their own policy.
+    Some(Provider::Generic)
+}
+
+pub fn is_recognised_provider(url: &str) -> bool {
+    !matches!(classify_url(url), None | Some(Provider::Generic))
+}
+
+pub fn extract_imgur_id(url: &str) -> Option<String> {
+    let (_, path) = validate_and_split(url)?;
+    let path_only = normalize_path(path);
+    imgur_regex()
+        .captures(path_only)
+        .and_then(|c| c.get(1).map(|m| m.as_str().to_string()))
+}
+
+pub fn extract_reddit_permalink(url: &str) -> Option<String> {
+    // Reddit JSON endpoint = <permalink>.json
+    let (host, path) = validate_and_split(url)?;
+    if !is_reddit_host(&host) {
+        return None;
+    }
+    let path_only = normalize_path(path);
+    if !reddit_regex().is_match(path_only) {
+        return None;
+    }
+    // Reddit's JSON endpoint is `<permalink>.json`. Users sometimes
+    // paste the already-terminated `.json` URL; appending another
+    // `.json` would produce a `title.json.json` request that 404s, so
+    // we detect that case and pass the path through unchanged.
+    let already_json = path_only.ends_with(".json");
+    let suffix = if already_json { "" } else { ".json" };
+    Some(format!("https://www.reddit.com{path_only}{suffix}"))
+}
+
+pub fn extract_flickr_album_id(url: &str) -> Option<(String, String)> {
+    let (_, path) = validate_and_split(url)?;
+    let path_only = normalize_path(path);
+    let caps = flickr_regex().captures(path_only)?;
+    let user = caps.get(1)?.as_str().to_string();
+    let album = caps.get(2)?.as_str().to_string();
+    Some((user, album))
+}
+
+/// Strip `?query`, `#fragment`, and trailing `/` from the raw path.
+/// Fragments are split first so that `path?q#frag` still works.
+fn normalize_path(path: &str) -> &str {
+    let no_frag = path.split('#').next().unwrap_or("");
+    let no_query = no_frag.split('?').next().unwrap_or("");
+    no_query.trim_end_matches('/')
+}
+
+fn is_imgur_host(host: &str) -> bool {
+    matches!(
+        host,
+        "imgur.com" | "www.imgur.com" | "i.imgur.com" | "m.imgur.com"
+    )
+}
+
+fn is_reddit_host(host: &str) -> bool {
+    matches!(
+        host,
+        "reddit.com" | "www.reddit.com" | "old.reddit.com" | "m.reddit.com"
+    )
+}
+
+fn is_flickr_host(host: &str) -> bool {
+    matches!(host, "flickr.com" | "www.flickr.com" | "m.flickr.com")
+}
+
+// Each provider regex anchors the captured segment to a **segment
+// boundary** — either end-of-string or a `/` — so malformed paths
+// like `/gallery/abc-typo` (where `-typo` is supposed to be part of
+// the same segment) or `/albums/123junk` are rejected instead of
+// producing a partial match that the extractor would happily use.
+// Callers pre-normalise the path with `normalize_path`, so fragment
+// and query are already stripped by the time the regex runs.
+
+// All three provider regexes are compile-time constants: `.expect`
+// documents the invariant and honours the crate-wide policy that
+// production code paths must not `.unwrap()`.
+
+fn imgur_regex() -> &'static Regex {
+    static R: OnceLock<Regex> = OnceLock::new();
+    R.get_or_init(|| {
+        Regex::new(r"^/(?:a|gallery)/([A-Za-z0-9]+)(?:$|/)")
+            .expect("imgur_regex: compile-time constant regex must compile")
+    })
+}
+
+fn reddit_regex() -> &'static Regex {
+    static R: OnceLock<Regex> = OnceLock::new();
+    R.get_or_init(|| {
+        Regex::new(r"^/r/[A-Za-z0-9_]+/comments/[A-Za-z0-9]+(?:$|/)")
+            .expect("reddit_regex: compile-time constant regex must compile")
+    })
+}
+
+fn flickr_regex() -> &'static Regex {
+    static R: OnceLock<Regex> = OnceLock::new();
+    R.get_or_init(|| {
+        Regex::new(r"^/photos/([^/]+)/albums/(\d+)(?:$|/)")
+            .expect("flickr_regex: compile-time constant regex must compile")
+    })
+}
+
+fn validate_and_split(url: &str) -> Option<(String, &str)> {
+    let (scheme, rest) = url.split_once("://")?;
+    if !matches!(scheme.to_ascii_lowercase().as_str(), "http" | "https") {
+        return None;
+    }
+    let (authority, path_and_query) = match rest.find('/') {
+        Some(idx) => (&rest[..idx], &rest[idx..]),
+        None => (rest, ""),
+    };
+    let authority_no_user = authority.rsplit('@').next().unwrap_or(authority);
+    let host = extract_host(authority_no_user)?;
+    Some((host.to_ascii_lowercase(), path_and_query))
+}
+
+/// Extract the host portion (without port) from an authority string.
+///
+/// Handles both plain hostnames/IPv4 (`example.com:8080`, `1.2.3.4`)
+/// and IPv6 literals (`[::1]:8080`, `[2001:db8::1]`). For IPv6, the
+/// host is the substring between `[` and `]`, keeping the brackets
+/// so downstream host-allowlist matches still behave as expected.
+/// For plain hosts, the host is the substring before the first `:`.
+///
+/// Returns `None` when the authority is empty or malformed (e.g. a
+/// lone `[` with no closing `]`).
+fn extract_host(authority: &str) -> Option<&str> {
+    if authority.is_empty() {
+        return None;
+    }
+    if authority.starts_with('[') {
+        // IPv6 literal — host includes the brackets.
+        let close = authority.find(']')?;
+        Some(&authority[..=close])
+    } else {
+        let host = authority.split(':').next().unwrap_or(authority);
+        if host.is_empty() {
+            None
+        } else {
+            Some(host)
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use rstest::rstest;
+
+    #[rstest]
+    #[case("https://imgur.com/a/abcd123", Some(Provider::Imgur))]
+    #[case("https://imgur.com/gallery/XyZ99", Some(Provider::Imgur))]
+    #[case(
+        "https://www.reddit.com/r/pics/comments/1abc/title/",
+        Some(Provider::Reddit)
+    )]
+    #[case("https://old.reddit.com/r/pics/comments/1abc/", Some(Provider::Reddit))]
+    #[case(
+        "https://www.flickr.com/photos/bob/albums/72177720313121212",
+        Some(Provider::Flickr)
+    )]
+    #[case("https://example.com/page", Some(Provider::Generic))]
+    #[case("ftp://example.com/page", None)]
+    #[case("not a url", None)]
+    fn test_classify_url(#[case] url: &str, #[case] expected: Option<Provider>) {
+        assert_eq!(classify_url(url), expected);
+    }
+
+    #[test]
+    fn classify_url_rejects_malformed_imgur_with_junk_suffix() {
+        // The segment-boundary anchor rejects trailing junk: the
+        // `abc` is a valid id but `abc-typo` is not a separate segment.
+        assert_eq!(
+            classify_url("https://imgur.com/a/abc-typo"),
+            Some(Provider::Generic)
+        );
+    }
+
+    #[test]
+    fn classify_url_rejects_malformed_flickr_album_suffix() {
+        assert_eq!(
+            classify_url("https://www.flickr.com/photos/bob/albums/123junk"),
+            Some(Provider::Generic)
+        );
+    }
+
+    #[test]
+    fn classify_url_rejects_malformed_reddit_permalink_suffix() {
+        assert_eq!(
+            classify_url("https://www.reddit.com/r/pics/comments/1abcjunk-extra"),
+            Some(Provider::Generic)
+        );
+    }
+
+    #[test]
+    fn classify_url_accepts_imgur_with_trailing_slash() {
+        // The `(?:$|/)` anchor still permits a trailing slash after a
+        // valid id segment.
+        assert_eq!(
+            classify_url("https://imgur.com/a/abcd123/"),
+            Some(Provider::Imgur)
+        );
+    }
+
+    #[test]
+    fn is_recognised_provider_rejects_generic() {
+        assert!(is_recognised_provider("https://imgur.com/a/abc"));
+        assert!(!is_recognised_provider("https://example.com/page"));
+    }
+
+    #[test]
+    fn extract_host_handles_plain_and_ipv6() {
+        // Plain host: everything before the first `:` is the host.
+        assert_eq!(extract_host("example.com"), Some("example.com"));
+        assert_eq!(extract_host("example.com:8080"), Some("example.com"));
+        assert_eq!(extract_host("1.2.3.4:443"), Some("1.2.3.4"));
+        // IPv6 literal: host is the substring between `[` and `]`,
+        // brackets included.
+        assert_eq!(extract_host("[::1]"), Some("[::1]"));
+        assert_eq!(extract_host("[::1]:8080"), Some("[::1]"));
+        assert_eq!(extract_host("[2001:db8::1]:443"), Some("[2001:db8::1]"));
+        // Malformed: lone `[` with no `]` → None.
+        assert_eq!(extract_host("[::1"), None);
+        // Empty authority → None.
+        assert_eq!(extract_host(""), None);
+    }
+
+    #[test]
+    fn extract_imgur_id_works() {
+        assert_eq!(
+            extract_imgur_id("https://imgur.com/a/abcd123"),
+            Some("abcd123".into())
+        );
+        assert_eq!(
+            extract_imgur_id("https://imgur.com/gallery/XyZ99?foo=bar"),
+            Some("XyZ99".into())
+        );
+    }
+
+    #[test]
+    fn extract_reddit_permalink_adds_json_suffix() {
+        assert_eq!(
+            extract_reddit_permalink("https://www.reddit.com/r/pics/comments/1abc/title/"),
+            Some("https://www.reddit.com/r/pics/comments/1abc/title.json".into())
+        );
+    }
+
+    #[test]
+    fn extract_flickr_album_id_tuple() {
+        assert_eq!(
+            extract_flickr_album_id("https://www.flickr.com/photos/bob/albums/72177720313121212"),
+            Some(("bob".into(), "72177720313121212".into()))
+        );
+    }
+}
diff --git a/plugins/vortex-mod-soundcloud/.cargo/config.toml b/plugins/vortex-mod-soundcloud/.cargo/config.toml
new file mode 100644
index 0000000..6b509f5
--- /dev/null
+++ b/plugins/vortex-mod-soundcloud/.cargo/config.toml
@@ -0,0 +1,2 @@
+[build]
+target =
"wasm32-wasip1" diff --git a/plugins/vortex-mod-soundcloud/Cargo.lock b/plugins/vortex-mod-soundcloud/Cargo.lock new file mode 100644 index 0000000..b6c3c19 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/Cargo.lock @@ -0,0 +1,557 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. +version = 4 + +[[package]] +name = "aho-corasick" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301" +dependencies = [ + "memchr", +] + +[[package]] +name = "anyhow" +version = "1.0.102" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" + +[[package]] +name = "autocfg" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" + +[[package]] +name = "base64" +version = "0.22.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6" + +[[package]] +name = "bytemuck" +version = "1.25.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec" + +[[package]] +name = "bytes" +version = "1.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33" + +[[package]] +name = "cfg-if" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" + +[[package]] +name = "either" +version = "1.15.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719" + +[[package]] +name = "equivalent" 
+version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" + +[[package]] +name = "extism-convert" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ec1a8eac059a1730a21aa47f99a0c2075ba0ab88fd0c4e52e35027cf99cdf3e7" +dependencies = [ + "anyhow", + "base64", + "bytemuck", + "extism-convert-macros", + "prost", + "rmp-serde", + "serde", + "serde_json", +] + +[[package]] +name = "extism-convert-macros" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "848f105dd6e1af2ea4bb4a76447658e8587167df3c4e4658c4258e5b14a5b051" +dependencies = [ + "manyhow", + "proc-macro-crate", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "extism-manifest" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "953a22ad322939ae4567ec73a34913a3a43dcbdfa648b8307d38fe56bb3a0acd" +dependencies = [ + "base64", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "352fcb5a66eb74145a1c4a01f2bd15d59c62c85be73aac8471880c65b26b798f" +dependencies = [ + "anyhow", + "base64", + "extism-convert", + "extism-manifest", + "extism-pdk-derive", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk-derive" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d086daea5fd844e3c5ac69ddfe36df4a9a43e7218cf7d1f888182b089b09806c" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-core" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-timer" +version = "3.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-core", + "futures-macro", + "futures-task", + "pin-project-lite", + "slab", +] + +[[package]] +name = "glob" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280" + +[[package]] +name = "hashbrown" +version = "0.17.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4f467dd6dccf739c208452f8014c75c18bb8301b050ad1cfb27153803edb0f51" + +[[package]] +name = "indexmap" +version = "2.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9" +dependencies = [ + "equivalent", + "hashbrown", +] + +[[package]] +name = "itertools" +version = "0.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] 
+name = "manyhow" +version = "0.11.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b33efb3ca6d3b07393750d4030418d594ab1139cee518f0dc88db70fec873587" +dependencies = [ + "manyhow-macros", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "manyhow-macros" +version = "0.11.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "46fce34d199b78b6e6073abf984c9cf5fd3e9330145a93ee0738a7443e371495" +dependencies = [ + "proc-macro-utils", + "proc-macro2", + "quote", +] + +[[package]] +name = "memchr" +version = "2.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" + +[[package]] +name = "num-traits" +version = "0.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" +dependencies = [ + "autocfg", +] + +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + +[[package]] +name = "proc-macro-crate" +version = "3.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e67ba7e9b2b56446f1d419b1d807906278ffa1a658a8a5d8a39dcb1f5a78614f" +dependencies = [ + "toml_edit", +] + +[[package]] +name = "proc-macro-utils" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "eeaf08a13de400bc215877b5bdc088f241b12eb42f0a548d3390dc1c56bb7071" +dependencies = [ + "proc-macro2", + "quote", + "smallvec", +] + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "prost" +version = "0.14.3" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "d2ea70524a2f82d518bce41317d0fae74151505651af45faf1ffbd6fd33f0568" +dependencies = [ + "bytes", + "prost-derive", +] + +[[package]] +name = "prost-derive" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "27c6023962132f4b30eb4c172c91ce92d933da334c59c23cddee82358ddafb0b" +dependencies = [ + "anyhow", + "itertools", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "regex" +version = "1.12.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276" +dependencies = [ + "aho-corasick", + "memchr", + "regex-automata", + "regex-syntax", +] + +[[package]] +name = "regex-automata" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.8.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a" + +[[package]] +name = "relative-path" +version = "1.9.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ba39f3699c378cd8970968dcbff9c43159ea4cfbd88d43c00b22f2ef10a435d2" + +[[package]] +name = "rmp" +version = "0.8.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4ba8be72d372b2c9b35542551678538b562e7cf86c3315773cae48dfbfe7790c" +dependencies = [ + "num-traits", +] + +[[package]] +name = "rmp-serde" +version = "1.3.1" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "72f81bee8c8ef9b577d1681a70ebbc962c232461e397b22c208c43c04b67a155" +dependencies = [ + "rmp", + "serde", +] + +[[package]] +name = "rstest" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "03e905296805ab93e13c1ec3a03f4b6c4f35e9498a3d5fa96dc626d22c03cd89" +dependencies = [ + "futures-timer", + "futures-util", + "rstest_macros", + "rustc_version", +] + +[[package]] +name = "rstest_macros" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ef0053bbffce09062bee4bcc499b0fbe7a57b879f1efe088d6d8d4c7adcdef9b" +dependencies = [ + "cfg-if", + "glob", + "proc-macro-crate", + "proc-macro2", + "quote", + "regex", + "relative-path", + "rustc_version", + "syn", + "unicode-ident", +] + +[[package]] +name = "rustc_version" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92" +dependencies = [ + "semver", +] + +[[package]] +name = "semver" +version = "1.0.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8a7852d02fc848982e0c167ef163aaff9cd91dc640ba85e263cb1ce46fae51cd" + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + 
"proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.149" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86" +dependencies = [ + "itoa", + "memchr", + "serde", + "serde_core", + "zmij", +] + +[[package]] +name = "slab" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + +[[package]] +name = "smallvec" +version = "1.15.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "thiserror" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "toml_datetime" +version = "1.1.1+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3165f65f62e28e0115a00b2ebdd37eb6f3b641855f9d636d3cd4103767159ad7" +dependencies = [ + "serde_core", +] + +[[package]] +name = "toml_edit" +version = "0.25.11+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b59c4d22ed448339746c59b905d24568fcbb3ab65a500494f7b8c3e97739f2b" +dependencies = [ + "indexmap", + "toml_datetime", + 
"toml_parser", + "winnow", +] + +[[package]] +name = "toml_parser" +version = "1.1.2+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a2abe9b86193656635d2411dc43050282ca48aa31c2451210f4202550afb7526" +dependencies = [ + "winnow", +] + +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "vortex-mod-soundcloud" +version = "1.0.0" +dependencies = [ + "extism-pdk", + "rstest", + "serde", + "serde_json", + "thiserror", +] + +[[package]] +name = "winnow" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09dac053f1cd375980747450bfc7250c264eaae0583872e845c0c7cd578872b5" +dependencies = [ + "memchr", +] + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/plugins/vortex-mod-soundcloud/Cargo.toml b/plugins/vortex-mod-soundcloud/Cargo.toml new file mode 100644 index 0000000..6ef8c36 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/Cargo.toml @@ -0,0 +1,25 @@ +[package] +name = "vortex-mod-soundcloud" +version = "1.0.0" +edition = "2021" +description = "SoundCloud WASM plugin for Vortex — tracks, playlists, artist profiles" +license = "GPL-3.0" +authors = ["vortex-community"] + +[lib] +crate-type = ["cdylib", "rlib"] + +[dependencies] +extism-pdk = "1.4" +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +thiserror = "2.0" + +[dev-dependencies] +rstest = "0.24" + +[profile.release] +opt-level = "z" +lto = true +codegen-units = 1 +strip = true diff --git a/plugins/vortex-mod-soundcloud/README.md b/plugins/vortex-mod-soundcloud/README.md new file mode 100644 index 0000000..4c99ff6 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/README.md @@ -0,0 
+1,56 @@ +# vortex-mod-soundcloud + +SoundCloud WASM plugin for [Vortex](https://github.com/mpiton/vortex). + +## Features + +- Single track resolution with title, artist, duration, artwork +- Playlist / album extraction (`/sets/`), plus `/likes`, `/tracks`, `/albums` +- Artwork upgraded from the 100×100 default to the 500×500 variant when + the CDN URL follows the `-large` marker convention +- `client_id` is read from host config (`get_config` → `client_id`) so + that the user can supply their own without rebuilding the plugin + +## Requirements + +- Vortex plugin host ≥ 0.1.0 with `http_request` and `get_config` + host functions enabled. + +## Build + +```bash +rustup target add wasm32-wasip1 +cargo build --release --target wasm32-wasip1 +``` + +The resulting WASM binary is at +`target/wasm32-wasip1/release/vortex_mod_soundcloud.wasm`. + +> Note: the crate ships a `.cargo/config.toml` that sets +> `target = "wasm32-wasip1"`, so `cargo build --release` alone also +> works inside the crate directory. The explicit flag above is given +> so that the command works from any working directory. + +## Install + +The Vortex plugin loader enforces two rules: + +1. The plugin directory name must match the `name` field in `plugin.toml`. +2. The directory must contain exactly one `.wasm` file. + +```bash +mkdir -p ~/.config/vortex/plugins/vortex-mod-soundcloud +cp plugin.toml ~/.config/vortex/plugins/vortex-mod-soundcloud/ +cp target/wasm32-wasip1/release/vortex_mod_soundcloud.wasm \ + ~/.config/vortex/plugins/vortex-mod-soundcloud/vortex-mod-soundcloud.wasm +``` + +## Tests + +```bash +cargo test --target x86_64-unknown-linux-gnu +``` + +All URL classification, JSON parsing, and IPC helpers are +native-testable — tests use hardcoded JSON fixtures so they run without +a WASM runtime or a live SoundCloud account. 
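The two loader rules listed under Install can be checked locally before copying files. The following is a minimal sketch, not the Vortex loader itself: `check_plugin_dir` is a hypothetical helper, and it assumes the caller has already read the `name` field out of `plugin.toml` and passes it in as a plain string.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical pre-install check (not part of Vortex): verifies that the
/// directory name equals the manifest name and that the directory holds
/// exactly one `.wasm` file.
fn check_plugin_dir(dir: &Path, manifest_name: &str) -> Result<(), String> {
    // Rule 1: the directory name must match the `name` field in plugin.toml.
    let dir_name = dir
        .file_name()
        .and_then(|n| n.to_str())
        .ok_or_else(|| format!("{} has no UTF-8 directory name", dir.display()))?;
    if dir_name != manifest_name {
        return Err(format!(
            "directory `{dir_name}` does not match plugin name `{manifest_name}`"
        ));
    }

    // Rule 2: count regular files with a `.wasm` extension.
    let mut wasm_count = 0usize;
    let entries = fs::read_dir(dir).map_err(|e| format!("cannot read {}: {e}", dir.display()))?;
    for entry in entries {
        let path = entry.map_err(|e| e.to_string())?.path();
        if path.is_file() && path.extension().and_then(|e| e.to_str()) == Some("wasm") {
            wasm_count += 1;
        }
    }
    if wasm_count == 1 {
        Ok(())
    } else {
        Err(format!("expected exactly one .wasm file, found {wasm_count}"))
    }
}
```

Calling `check_plugin_dir(Path::new("…/plugins/vortex-mod-soundcloud"), "vortex-mod-soundcloud")` after the `cp` commands above should return `Ok(())` if both rules hold; the real loader may apply further checks that this sketch does not model.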
diff --git a/plugins/vortex-mod-soundcloud/plugin.toml b/plugins/vortex-mod-soundcloud/plugin.toml new file mode 100644 index 0000000..abee522 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/plugin.toml @@ -0,0 +1,23 @@ +[plugin] +name = "vortex-mod-soundcloud" +version = "1.0.0" +category = "crawler" +author = "vortex-community" +description = "SoundCloud tracks, playlists, and artist profiles" +license = "GPL-3.0" +min_vortex_version = "0.1.0" + +[capabilities] +# SoundCloud uses the public resolve API over HTTPS. The plugin delegates +# all network access to the host via `http_request`, so it never touches +# sockets directly — all SSRF/egress rules are enforced host-side. +http = true + +# NOTE: Forward-compatible config documentation. The current manifest +# parser only consumes `[plugin]` and `[capabilities]`. Values below +# describe the contract this plugin expects the settings UI to expose +# when manifest schema support lands. +[config] +preferred_format = { type = "string", default = "mp3", options = ["mp3", "ogg", "wav"] } +quality = { type = "string", default = "high", options = ["low", "medium", "high"] } +client_id = { type = "string", default = "", description = "SoundCloud API client_id — required; the plugin does not auto-discover it. Requests fail with 401/403 until a valid value is provided." } diff --git a/plugins/vortex-mod-soundcloud/src/api.rs b/plugins/vortex-mod-soundcloud/src/api.rs new file mode 100644 index 0000000..74f666f --- /dev/null +++ b/plugins/vortex-mod-soundcloud/src/api.rs @@ -0,0 +1,318 @@ +//! SoundCloud API request / response types. +//! +//! The SoundCloud `/resolve` endpoint returns JSON with very permissive shape +//! — many fields are optional depending on the resource kind and the +//! visibility (public/private, go+/regular). This module models only the +//! subset of fields we care about and tolerates unknown keys. +//! +//! ## HTTP host-function envelope +//! +//! 
The plugin wraps every outgoing request in an [`HttpRequest`] JSON and +//! expects an [`HttpResponse`] back from the host. The schemas mirror +//! `src-tauri/src/adapters/driven/plugin/host_functions.rs`. + +use std::collections::HashMap; + +use serde::{Deserialize, Serialize}; + +use crate::error::PluginError; + +// ── Host function envelope ──────────────────────────────────────────────────── + +/// Matches `HttpRequest` in `host_functions.rs`. +#[derive(Debug, Serialize)] +pub struct HttpRequest { + pub method: String, + pub url: String, + #[serde(skip_serializing_if = "HashMap::is_empty")] + pub headers: HashMap<String, String>, + #[serde(skip_serializing_if = "Option::is_none")] + pub body: Option<String>, +} + +/// Matches `HttpResponse` in `host_functions.rs`. +#[derive(Debug, Deserialize)] +pub struct HttpResponse { + pub status: u16, + #[serde(default)] + pub headers: HashMap<String, String>, + #[serde(default)] + pub body: String, +} + +impl HttpResponse { + /// Returns the body if the status is 2xx, else a typed error. + pub fn into_success_body(self) -> Result<String, PluginError> { + if (200..300).contains(&self.status) { + Ok(self.body) + } else if self.status == 401 || self.status == 403 { + Err(PluginError::Private(format!("status {}", self.status))) + } else { + Err(PluginError::HttpStatus { + status: self.status, + message: truncate(&self.body, 256), + }) + } + } +} + +fn truncate(s: &str, max: usize) -> String { + if s.len() <= max { + s.to_string() + } else { + let mut cut = max; + while !s.is_char_boundary(cut) && cut > 0 { + cut -= 1; + } + format!("{}…", &s[..cut]) + } +} + +pub fn build_resolve_request(original_url: &str, client_id: &str) -> Result<String, PluginError> { + let resolve_url = format!( + "https://api-v2.soundcloud.com/resolve?url={}&client_id={}", + urlencode(original_url), + urlencode(client_id), + ); + let req = HttpRequest { + method: "GET".into(), + url: resolve_url, + headers: HashMap::new(), + body: None, + }; + Ok(serde_json::to_string(&req)?)
+} + +pub fn parse_http_response(raw: &str) -> Result<HttpResponse, PluginError> { + serde_json::from_str(raw).map_err(|e| PluginError::HostResponse(e.to_string())) +} + +/// Minimal URL-encode for query parameters: RFC 3986 unreserved bytes pass +/// through unchanged and every other byte is percent-encoded. +/// +/// A full percent-encoder would pull in an extra dependency for just two +/// call sites; the byte check here covers every byte the resolve +/// endpoint accepts in a `url=` query string. +fn urlencode(s: &str) -> String { + let mut out = String::with_capacity(s.len()); + for b in s.bytes() { + if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') { + out.push(b as char); + } else { + out.push_str(&format!("%{:02X}", b)); + } + } + out +} + +// ── SoundCloud resource types ───────────────────────────────────────────────── + +/// `/resolve` response envelope discriminated by the `kind` field. +/// +/// Known kinds: `track`, `playlist`, `user`. Unknown kinds are mapped to +/// [`ResolveResponse::Unknown`] so the plugin can surface a clear error +/// instead of panicking.
+#[derive(Debug, Deserialize)] +#[serde(tag = "kind")] +pub enum ResolveResponse { + #[serde(rename = "track")] + Track(Track), + #[serde(rename = "playlist")] + Playlist(Playlist), + #[serde(rename = "user")] + User(User), + #[serde(other)] + Unknown, +} + +#[derive(Debug, Deserialize)] +pub struct Track { + pub id: u64, + pub title: String, + #[serde(default)] + pub duration: Option, + #[serde(default)] + pub permalink_url: Option, + #[serde(default)] + pub artwork_url: Option, + #[serde(default)] + pub user: Option, + #[serde(default)] + pub streamable: Option, +} + +#[derive(Debug, Deserialize)] +pub struct TrackUser { + pub username: String, +} + +#[derive(Debug, Deserialize)] +pub struct Playlist { + pub id: u64, + pub title: String, + #[serde(default)] + pub permalink_url: Option, + #[serde(default)] + pub artwork_url: Option, + #[serde(default)] + pub tracks: Vec, + #[serde(default)] + pub track_count: Option, +} + +#[derive(Debug, Deserialize)] +pub struct User { + pub id: u64, + pub username: String, + #[serde(default)] + pub permalink_url: Option, + #[serde(default)] + pub avatar_url: Option, +} + +pub fn parse_resolve_response(body: &str) -> Result { + serde_json::from_str(body).map_err(|e| PluginError::ParseJson(e.to_string())) +} + +#[cfg(test)] +mod tests { + use super::*; + + const TRACK_JSON: &str = r#"{ + "kind": "track", + "id": 12345, + "title": "Flickermood", + "duration": 225000, + "permalink_url": "https://soundcloud.com/forss/flickermood", + "artwork_url": "https://i1.sndcdn.com/artworks-12345.jpg", + "streamable": true, + "user": { "username": "Forss" } + }"#; + + const PLAYLIST_JSON: &str = r#"{ + "kind": "playlist", + "id": 99, + "title": "Soulhack", + "permalink_url": "https://soundcloud.com/forss/sets/soulhack", + "tracks": [ + {"kind": "track", "id": 1, "title": "Flickermood"}, + {"kind": "track", "id": 2, "title": "Journeyman"} + ], + "track_count": 2 + }"#; + + const USER_JSON: &str = r#"{ + "kind": "user", + "id": 42, + "username": 
"forss", + "permalink_url": "https://soundcloud.com/forss", + "avatar_url": "https://i1.sndcdn.com/avatars-42.jpg" + }"#; + + const UNKNOWN_KIND_JSON: &str = r#"{"kind": "system-playlist", "id": 1}"#; + + #[test] + fn parse_track_response() { + let resolved = parse_resolve_response(TRACK_JSON).unwrap(); + match resolved { + ResolveResponse::Track(t) => { + assert_eq!(t.id, 12345); + assert_eq!(t.title, "Flickermood"); + assert_eq!(t.duration, Some(225000)); + assert_eq!(t.user.unwrap().username, "Forss"); + assert!(t.artwork_url.is_some()); + } + other => panic!("expected Track, got {other:?}"), + } + } + + #[test] + fn parse_playlist_response() { + let resolved = parse_resolve_response(PLAYLIST_JSON).unwrap(); + match resolved { + ResolveResponse::Playlist(p) => { + assert_eq!(p.id, 99); + assert_eq!(p.title, "Soulhack"); + assert_eq!(p.tracks.len(), 2); + assert_eq!(p.track_count, Some(2)); + } + other => panic!("expected Playlist, got {other:?}"), + } + } + + #[test] + fn parse_user_response() { + let resolved = parse_resolve_response(USER_JSON).unwrap(); + match resolved { + ResolveResponse::User(u) => { + assert_eq!(u.username, "forss"); + } + other => panic!("expected User, got {other:?}"), + } + } + + #[test] + fn parse_unknown_kind_falls_through() { + let resolved = parse_resolve_response(UNKNOWN_KIND_JSON).unwrap(); + assert!(matches!(resolved, ResolveResponse::Unknown)); + } + + #[test] + fn parse_resolve_rejects_malformed_json() { + let err = parse_resolve_response("not json").unwrap_err(); + assert!(matches!(err, PluginError::ParseJson(_))); + } + + #[test] + fn http_response_2xx_returns_body() { + let resp = HttpResponse { + status: 200, + headers: HashMap::new(), + body: "ok".into(), + }; + assert_eq!(resp.into_success_body().unwrap(), "ok"); + } + + #[test] + fn http_response_401_is_private() { + let resp = HttpResponse { + status: 401, + headers: HashMap::new(), + body: "forbidden".into(), + }; + assert!(matches!( + 
resp.into_success_body().unwrap_err(), + PluginError::Private(_) + )); + } + + #[test] + fn http_response_500_is_http_status_error() { + let resp = HttpResponse { + status: 500, + headers: HashMap::new(), + body: "boom".into(), + }; + match resp.into_success_body().unwrap_err() { + PluginError::HttpStatus { status, .. } => assert_eq!(status, 500), + other => panic!("unexpected {other:?}"), + } + } + + #[test] + fn urlencode_roundtrips_safe_chars() { + assert_eq!(urlencode("abc-_.~"), "abc-_.~"); + assert_eq!( + urlencode("https://soundcloud.com/a/b"), + "https%3A%2F%2Fsoundcloud.com%2Fa%2Fb" + ); + } + + #[test] + fn build_resolve_request_encodes_target() { + let req_str = + build_resolve_request("https://soundcloud.com/forss/flickermood", "abc123").unwrap(); + assert!(req_str.contains("\"method\":\"GET\"")); + assert!(req_str.contains("client_id=abc123")); + assert!(req_str.contains("url=https%3A%2F%2Fsoundcloud.com%2Fforss%2Fflickermood")); + } +} diff --git a/plugins/vortex-mod-soundcloud/src/error.rs b/plugins/vortex-mod-soundcloud/src/error.rs new file mode 100644 index 0000000..a8d374a --- /dev/null +++ b/plugins/vortex-mod-soundcloud/src/error.rs @@ -0,0 +1,45 @@ +//! Plugin error type. + +use thiserror::Error; + +/// Errors raised by the SoundCloud plugin. +#[derive(Debug, Error)] +pub enum PluginError { + /// SoundCloud API JSON parsing failure with contextual message. + #[error("SoundCloud JSON parse error: {0}")] + ParseJson(String), + + /// Direct serde_json failure (no wrapping context needed). + #[error("JSON error: {0}")] + SerdeJson(#[from] serde_json::Error), + + /// `http_request` host function returned a non-2xx status. + #[error("SoundCloud API returned status {status}: {message}")] + HttpStatus { status: u16, message: String }, + + /// Host function returned an invalid response envelope. 
+ #[error("host function response invalid: {0}")] + HostResponse(String), + + /// URL could not be classified as a SoundCloud resource (host + /// not recognised, malformed path, not SoundCloud at all). + #[error("URL is not a recognised SoundCloud resource: {0}")] + UnsupportedUrl(String), + + /// URL was classified as a SoundCloud resource, but the kind is + /// not supported by the handler that was called — for example, + /// passing an artist-profile URL to `extract_playlist`, or a + /// playlist URL to `extract_track`. Carries the detected + /// [`crate::url_matcher::UrlKind`] so callers can distinguish + /// "not a SoundCloud URL at all" from "valid SoundCloud URL of + /// the wrong kind for this operation". + #[error("SoundCloud resource kind {kind:?} is not supported here: {url}")] + UnsupportedResourceKind { + kind: crate::url_matcher::UrlKind, + url: String, + }, + + /// SoundCloud returned access-denied for a private track. + #[error("SoundCloud resource is private: {0}")] + Private(String), +} diff --git a/plugins/vortex-mod-soundcloud/src/lib.rs b/plugins/vortex-mod-soundcloud/src/lib.rs new file mode 100644 index 0000000..fa5c372 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/src/lib.rs @@ -0,0 +1,545 @@ +//! Vortex SoundCloud WASM plugin. +//! +//! Implements the CrawlerModule contract expected by the Vortex plugin host: +//! - `can_handle(url)` → `"true"` / `"false"` +//! - `supports_playlist(url)` → `"true"` / `"false"` +//! - `extract_links(url)` → JSON string describing the resolved media +//! - `extract_playlist(url)` → JSON string with flat playlist entries +//! +//! The plugin delegates all network access to the host via `http_request`. +//! Pure parsing / URL-matching logic lives in sibling modules so that it +//! can be unit-tested natively. + +pub mod api; +pub mod error; +pub mod url_matcher; + +// The `plugin_api` module exports `#[plugin_fn]`-decorated functions and the +// host-function imports. 
It is only compiled when targeting WASM, because +// `extism-pdk`'s macros emit code that is not valid for native builds. +#[cfg(target_family = "wasm")] +mod plugin_api; + +use serde::Serialize; + +use crate::api::{Playlist as ApiPlaylist, ResolveResponse, Track}; +use crate::error::PluginError; +use crate::url_matcher::UrlKind; + +// ── IPC DTOs ────────────────────────────────────────────────────────────────── + +/// Returned by `extract_links` — describes the resolved media resource. +#[derive(Debug, Serialize, PartialEq, Eq)] +pub struct ExtractLinksResponse { + pub kind: &'static str, + pub tracks: Vec, +} + +/// A single resolved SoundCloud track entry. +#[derive(Debug, Serialize, PartialEq, Eq)] +pub struct MediaLink { + pub id: String, + pub title: String, + pub url: String, + pub artist: Option, + pub duration_ms: Option, + pub artwork_url: Option, +} + +// ── Pure business logic (native-testable) ──────────────────────────────────── + +/// Returns `"true"` if the URL is any form of recognised SoundCloud resource. +/// +/// Uses [`url_matcher::classify_url`] directly rather than +/// [`url_matcher::is_soundcloud_url`] so that the routing contract stays in +/// sync with the `extract_*` handlers: adding a new [`UrlKind`] variant +/// later will force an explicit decision here instead of silently +/// accepting it. +pub fn handle_can_handle(url: &str) -> String { + // Artist profiles are *not* reported as handleable yet because + // `extract_playlist` currently returns `UnsupportedUrl` for + // `ResolveResponse::User` — advertising support would produce a + // false-positive capability detection and a runtime failure. + // Re-enable `UrlKind::Artist` here once artist pagination is wired. + let kind = url_matcher::classify_url(url); + bool_to_string(matches!(kind, UrlKind::Track | UrlKind::Playlist)) +} + +/// Returns `"true"` only if the URL refers to an explicit playlist / +/// set / likes / tracks / albums collection. 
Artist profiles are +/// intentionally excluded until artist pagination ships. +pub fn handle_supports_playlist(url: &str) -> String { + let kind = url_matcher::classify_url(url); + bool_to_string(matches!(kind, UrlKind::Playlist)) +} + +fn bool_to_string(b: bool) -> String { + if b { + "true".into() + } else { + "false".into() + } +} + +/// Reject URLs that are not a supported SoundCloud resource. +/// +/// Artist profiles (`UrlKind::Artist`) are not accepted here until the +/// follow-up `/users//tracks` pagination is implemented — accepting +/// them would make `extract_links` fail with `UnsupportedUrl` *after* +/// the routing contract claimed to handle the URL, which is worse than +/// rejecting early. +pub fn ensure_soundcloud_url(url: &str) -> Result<UrlKind, PluginError> { + let kind = url_matcher::classify_url(url); + match kind { + UrlKind::Track | UrlKind::Playlist => Ok(kind), + // Artist is a recognised SoundCloud URL, but a kind we cannot + // service yet — surface the kind so callers can tell this + // apart from "not a SoundCloud URL at all".
+ UrlKind::Artist => Err(PluginError::UnsupportedResourceKind { + kind, + url: url.to_string(), + }), + UrlKind::Unknown => Err(PluginError::UnsupportedUrl(url.to_string())), + } +} + +pub fn ensure_track(url: &str) -> Result<(), PluginError> { + let kind = url_matcher::classify_url(url); + match kind { + UrlKind::Track => Ok(()), + UrlKind::Playlist | UrlKind::Artist => Err(PluginError::UnsupportedResourceKind { + kind, + url: url.to_string(), + }), + UrlKind::Unknown => Err(PluginError::UnsupportedUrl(url.to_string())), + } +} + +pub fn ensure_playlist(url: &str) -> Result<(), PluginError> { + let kind = url_matcher::classify_url(url); + match kind { + UrlKind::Playlist => Ok(()), + UrlKind::Track | UrlKind::Artist => Err(PluginError::UnsupportedResourceKind { + kind, + url: url.to_string(), + }), + UrlKind::Unknown => Err(PluginError::UnsupportedUrl(url.to_string())), + } +} + +/// Convert an API [`Track`] into a [`MediaLink`] with the artwork +/// upgraded from the default 100×100 thumbnail to `t500x500` if possible. +pub fn track_to_link(track: Track) -> MediaLink { + MediaLink { + id: track.id.to_string(), + title: track.title, + url: track.permalink_url.unwrap_or_default(), + artist: track.user.map(|u| u.username), + duration_ms: track.duration, + artwork_url: track.artwork_url.map(upgrade_artwork), + } +} + +/// SoundCloud returns small (100×100) artwork URLs by default. The CDN +/// serves higher resolutions when the `-large` marker is replaced with +/// `-t500x500`. Two known URL shapes must be handled: +/// +/// - `…/artworks-000-large.jpg` — standard, has a file extension +/// - `…/artworks-000-large` — animated / extensionless variant served +/// by some API responses +/// +/// A plain `url.replace("-large", "-t500x500")` would also trigger on +/// `-larger` or `-largest`, which SoundCloud does not use but a future +/// CDN shape might. Guard with a word-boundary check (end-of-string or +/// a `.`, `/`, `?`) so only true `-large` markers are upgraded. 
+fn upgrade_artwork(url: String) -> String { + // The `-large` marker is always inside the URL *path* — never in + // the query string or fragment — but user-supplied URLs can carry + // `?ref=-large-thing` or `#anchor-large` metadata that would + // otherwise fool an `rfind` scan run over the full URL. So split + // the URL into `(path, suffix)` first, run the rewrite only on + // the path, and reattach `suffix` unchanged. + // + // The path part also uses `rfind` (not `find`) because a single + // path can legitimately contain multiple `-large` occurrences — + // for example the track slug `/user/too-large-a-track/artworks- + // 000-large.jpg` — and only the trailing one identifies the + // artwork size suffix. + let (path, suffix) = split_url_suffix(&url); + if let Some(idx) = path.rfind("-large") { + let after = path + .as_bytes() + .get(idx + "-large".len()) + .copied() + .unwrap_or(0); + // End-of-path also counts as a boundary because the suffix + // (query/fragment) follows immediately after. + let boundary = matches!(after, 0 | b'.' | b'/'); + if boundary { + return format!( + "{}-t500x500{}{}", + &path[..idx], + &path[idx + "-large".len()..], + suffix + ); + } + } + url +} + +/// Split a URL into `(path_part, query_and_fragment_suffix)`. The +/// suffix includes the leading `?` or `#` so that reassembly is just +/// concatenation. If the URL has neither, `suffix` is an empty slice. 
+fn split_url_suffix(url: &str) -> (&str, &str) { + let query_pos = url.find('?'); + let fragment_pos = url.find('#'); + let split = match (query_pos, fragment_pos) { + (Some(q), Some(f)) => q.min(f), + (Some(q), None) => q, + (None, Some(f)) => f, + (None, None) => return (url, ""), + }; + url.split_at(split) +} + +pub fn build_single_track_response(track: Track) -> ExtractLinksResponse { + ExtractLinksResponse { + kind: "track", + tracks: vec![track_to_link(track)], + } +} + +pub fn build_playlist_response(playlist: ApiPlaylist) -> ExtractLinksResponse { + ExtractLinksResponse { + kind: "playlist", + tracks: playlist.tracks.into_iter().map(track_to_link).collect(), + } +} + +/// Map a resolved response to an [`ExtractLinksResponse`]. +/// +/// Returns an error for `User` responses because turning an artist +/// profile into a track list requires a second `/users//tracks` +/// pagination call that is not implemented yet. Both `extract_links` +/// and `extract_playlist` currently reject artist URLs outright — the +/// error message must *not* redirect the caller to `extract_playlist`, +/// because that handler would also return `UnsupportedUrl` for this +/// variant. `Unknown` kinds are rejected with `UnsupportedUrl` as well, +/// so that callers get a clear error.
+pub fn response_to_extract_links( + resolved: ResolveResponse, +) -> Result { + match resolved { + ResolveResponse::Track(t) => Ok(build_single_track_response(t)), + ResolveResponse::Playlist(p) => Ok(build_playlist_response(p)), + ResolveResponse::User(u) => Err(PluginError::UnsupportedUrl(format!( + "artist profile '{}' is not supported yet — artist pagination is not implemented", + u.username + ))), + ResolveResponse::Unknown => Err(PluginError::UnsupportedUrl( + "unknown SoundCloud resource kind".into(), + )), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::api::{Track, TrackUser}; + + fn sample_track() -> Track { + Track { + id: 1, + title: "Flickermood".into(), + duration: Some(225_000), + permalink_url: Some("https://soundcloud.com/forss/flickermood".into()), + artwork_url: Some("https://i1.sndcdn.com/artworks-12345-large.jpg".into()), + user: Some(TrackUser { + username: "Forss".into(), + }), + streamable: Some(true), + } + } + + #[test] + fn can_handle_recognises_track() { + assert_eq!( + handle_can_handle("https://soundcloud.com/forss/flickermood"), + "true" + ); + } + + #[test] + fn can_handle_rejects_unrelated_host() { + assert_eq!(handle_can_handle("https://example.com/"), "false"); + } + + #[test] + fn can_handle_rejects_artist_profile_until_pagination_lands() { + // Artist profiles are intentionally excluded — extracting them + // requires a second `/users//tracks` pagination call which + // is not implemented yet, so advertising support would produce + // a false-positive followed by a runtime error. 
+ assert_eq!(handle_can_handle("https://soundcloud.com/forss"), "false"); + } + + #[test] + fn can_handle_accepts_on_short_link() { + assert_eq!( + handle_can_handle("https://on.soundcloud.com/AbCdEfGhIj"), + "true" + ); + } + + #[test] + fn supports_playlist_true_for_sets() { + assert_eq!( + handle_supports_playlist("https://soundcloud.com/forss/sets/soulhack"), + "true" + ); + } + + #[test] + fn supports_playlist_false_for_single_track() { + assert_eq!( + handle_supports_playlist("https://soundcloud.com/forss/flickermood"), + "false" + ); + } + + #[test] + fn supports_playlist_false_for_artist_profile() { + assert_eq!( + handle_supports_playlist("https://soundcloud.com/forss"), + "false" + ); + } + + #[test] + fn ensure_soundcloud_url_rejects_artist_profile_as_unsupported_resource_kind() { + // Artist is a *recognised* SoundCloud URL but the handler + // cannot service it yet — callers should see + // `UnsupportedResourceKind`, not the "not a SoundCloud URL" + // `UnsupportedUrl` variant. + let err = ensure_soundcloud_url("https://soundcloud.com/forss").unwrap_err(); + assert!(matches!( + err, + PluginError::UnsupportedResourceKind { + kind: UrlKind::Artist, + .. + } + )); + } + + #[test] + fn ensure_soundcloud_url_rejects_non_soundcloud_as_unsupported_url() { + let err = ensure_soundcloud_url("https://example.com/").unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn ensure_track_rejects_playlist_as_kind_mismatch() { + let err = ensure_track("https://soundcloud.com/forss/sets/soulhack").unwrap_err(); + assert!(matches!( + err, + PluginError::UnsupportedResourceKind { + kind: UrlKind::Playlist, + .. + } + )); + } + + #[test] + fn ensure_playlist_rejects_track_as_kind_mismatch() { + let err = ensure_playlist("https://soundcloud.com/forss/flickermood").unwrap_err(); + assert!(matches!( + err, + PluginError::UnsupportedResourceKind { + kind: UrlKind::Track, + .. 
+ } + )); + } + + #[test] + fn track_to_link_upgrades_artwork() { + let link = track_to_link(sample_track()); + assert_eq!(link.id, "1"); + assert_eq!(link.title, "Flickermood"); + assert_eq!(link.artist.as_deref(), Some("Forss")); + assert_eq!(link.duration_ms, Some(225_000)); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg"), + "large artwork marker should be upgraded to t500x500" + ); + } + + #[test] + fn track_to_link_preserves_non_large_artwork() { + let mut t = sample_track(); + t.artwork_url = Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg") + ); + } + + #[test] + fn track_to_link_upgrades_artwork_without_extension() { + let mut t = sample_track(); + t.artwork_url = Some("https://i1.sndcdn.com/artworks-12345-large".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500"), + "extensionless -large should also be upgraded" + ); + } + + #[test] + fn track_to_link_upgrades_artwork_with_query_string() { + let mut t = sample_track(); + t.artwork_url = Some("https://i1.sndcdn.com/artworks-12345-large?v=2".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500?v=2"), + "query string boundary should still trigger upgrade" + ); + } + + #[test] + fn track_to_link_does_not_upgrade_large_in_query_string() { + // A `-large` token inside the query string is metadata, not an + // artwork suffix — the path itself has the modern `-t500x500` + // marker and must be left untouched. 
+ let mut t = sample_track(); + t.artwork_url = + Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg?ref=-large-thing".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg?ref=-large-thing"), + "query string `-large` must not be rewritten" + ); + } + + #[test] + fn track_to_link_upgrades_large_even_when_query_string_present() { + // A legitimate `-large` path suffix must still be upgraded + // when the URL also carries a query string. + let mut t = sample_track(); + t.artwork_url = Some("https://i1.sndcdn.com/artworks-12345-large.jpg?v=2".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-12345-t500x500.jpg?v=2"), + "path -large suffix must be rewritten while query string is preserved" + ); + } + + #[test] + fn track_to_link_upgrades_trailing_large_when_earlier_large_exists() { + // A URL that contains `-large` as part of an earlier slug must + // not cause the upgrade to rewrite the slug — `rfind` targets + // the trailing size suffix. 
+ let mut t = sample_track(); + t.artwork_url = + Some("https://i1.sndcdn.com/too-large-a-track/artworks-999-large.jpg".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/too-large-a-track/artworks-999-t500x500.jpg"), + "only the trailing -large suffix should be rewritten" + ); + } + + #[test] + fn track_to_link_does_not_upgrade_larger_or_largest() { + let mut t = sample_track(); + t.artwork_url = Some("https://i1.sndcdn.com/artworks-larger.jpg".into()); + let link = track_to_link(t); + assert_eq!( + link.artwork_url.as_deref(), + Some("https://i1.sndcdn.com/artworks-larger.jpg"), + "-larger must not trigger the word-boundary upgrade" + ); + } + + #[test] + fn build_single_track_response_shape() { + let r = build_single_track_response(sample_track()); + assert_eq!(r.kind, "track"); + assert_eq!(r.tracks.len(), 1); + } + + #[test] + fn build_playlist_response_shape() { + let playlist = ApiPlaylist { + id: 42, + title: "Soulhack".into(), + permalink_url: Some("https://soundcloud.com/forss/sets/soulhack".into()), + artwork_url: None, + tracks: vec![sample_track(), sample_track()], + track_count: Some(2), + }; + let r = build_playlist_response(playlist); + assert_eq!(r.kind, "playlist"); + assert_eq!(r.tracks.len(), 2); + } + + #[test] + fn ensure_soundcloud_url_rejects_unknown() { + let err = ensure_soundcloud_url("https://example.com/").unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn response_to_extract_links_track_ok() { + let resp = response_to_extract_links(ResolveResponse::Track(sample_track())).unwrap(); + assert_eq!(resp.kind, "track"); + } + + #[test] + fn response_to_extract_links_user_rejects_artist_profile_until_pagination() { + // Artist profiles are rejected by both `extract_links` and + // `extract_playlist` until artist pagination is implemented. 
+ // The error message must not redirect the caller to + // `extract_playlist` (which also rejects this kind). + let err = response_to_extract_links(ResolveResponse::User(crate::api::User { + id: 1, + username: "forss".into(), + permalink_url: None, + avatar_url: None, + })) + .unwrap_err(); + match err { + PluginError::UnsupportedUrl(msg) => { + assert!( + !msg.contains("extract_playlist"), + "error message must not suggest extract_playlist" + ); + assert!(msg.contains("not supported") || msg.contains("not implemented")); + } + other => panic!("expected UnsupportedUrl, got {other:?}"), + } + } + + #[test] + fn response_to_extract_links_unknown_rejected() { + let err = response_to_extract_links(ResolveResponse::Unknown).unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn json_serialisation_of_extract_links_response() { + let resp = build_single_track_response(sample_track()); + let json = serde_json::to_string(&resp).unwrap(); + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap(); + assert_eq!(parsed["kind"], "track"); + assert_eq!(parsed["tracks"][0]["title"], "Flickermood"); + assert_eq!(parsed["tracks"][0]["artist"], "Forss"); + } +} diff --git a/plugins/vortex-mod-soundcloud/src/plugin_api.rs b/plugins/vortex-mod-soundcloud/src/plugin_api.rs new file mode 100644 index 0000000..dcc611a --- /dev/null +++ b/plugins/vortex-mod-soundcloud/src/plugin_api.rs @@ -0,0 +1,142 @@ +//! WASM-only module: `#[plugin_fn]` exports and `#[host_fn]` imports. +//! +//! Gated behind `cfg(target_family = "wasm")` because the macros emit +//! code that only compiles for a WASM target. 
 + +use extism_pdk::*; + +use crate::api::{ + build_resolve_request, parse_http_response, parse_resolve_response, ResolveResponse, +}; +use crate::error::PluginError; +use crate::{ + build_playlist_response, build_single_track_response, ensure_playlist, ensure_soundcloud_url, + ensure_track, handle_can_handle, handle_supports_playlist, response_to_extract_links, +}; + +// ── Host function imports ───────────────────────────────────────────────────── + +#[host_fn] +extern "ExtismHost" { + /// JSON in → JSON out — see `HttpRequest` / `HttpResponse` envelopes. + fn http_request(req: String) -> String; + fn get_config(key: String) -> String; +} + +// ── Plugin function exports ─────────────────────────────────────────────────── + +#[plugin_fn] +pub fn can_handle(url: String) -> FnResult<String> { + Ok(handle_can_handle(&url)) +} + +#[plugin_fn] +pub fn supports_playlist(url: String) -> FnResult<String> { + Ok(handle_supports_playlist(&url)) +} + +#[plugin_fn] +pub fn extract_links(url: String) -> FnResult<String> { + ensure_soundcloud_url(&url).map_err(error_to_fn_error)?; + + let resolved = resolve(&url)?; + let response = response_to_extract_links(resolved).map_err(error_to_fn_error)?; + Ok(serde_json::to_string(&response)?) +} + +#[plugin_fn] +pub fn extract_playlist(url: String) -> FnResult<String> { + ensure_playlist(&url).map_err(error_to_fn_error)?; + + let resolved = resolve(&url)?; + let response = match resolved { + ResolveResponse::Playlist(p) => build_playlist_response(p), + // Artist profiles need a second call; for now we surface a clear + // error so the UI can paginate via a follow-up call when that + // endpoint support lands.
 + ResolveResponse::User(u) => { + return Err(error_to_fn_error(PluginError::UnsupportedUrl(format!( + "artist profile '{}' — artist pagination not yet implemented", + u.username + )))) + } + ResolveResponse::Track(_) => { + return Err(error_to_fn_error(PluginError::UnsupportedUrl( + "single track cannot be extracted as playlist".into(), + ))) + } + ResolveResponse::Unknown => { + return Err(error_to_fn_error(PluginError::UnsupportedUrl( + "unknown resource kind".into(), + ))) + } + }; + Ok(serde_json::to_string(&response)?) +} + +#[plugin_fn] +pub fn extract_track(url: String) -> FnResult<String> { + ensure_track(&url).map_err(error_to_fn_error)?; + + let resolved = resolve(&url)?; + let response = match resolved { + ResolveResponse::Track(t) => build_single_track_response(t), + _ => { + return Err(error_to_fn_error(PluginError::UnsupportedUrl( + "resolved resource is not a track".into(), + ))) + } + }; + Ok(serde_json::to_string(&response)?) +} + +// ── Host function wiring ────────────────────────────────────────────────────── + +/// Issue a `/resolve` call against api-v2.soundcloud.com via the host and +/// return the parsed envelope. +fn resolve(url: &str) -> FnResult<ResolveResponse> { + let client_id = read_client_id(); + let req_json = build_resolve_request(url, &client_id).map_err(error_to_fn_error)?; + // SAFETY: `http_request` is resolved by the Vortex plugin host at + // load time (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_http_request_function`). Invariants: + // 1. The host registers `http_request` in the `ExtismHost` namespace + // before any `#[plugin_fn]` export is callable — a missing + // symbol would abort `Plugin::new` in extism_loader.rs. + // 2. The ABI is `(I64) -> I64` — a single u64 Extism memory handle + // in, a single u64 handle out. The `#[host_fn]` macro marshals + // `String` to/from the memory handle. + // 3.
The host enforces capability `http=true` from the manifest + // before invoking the implementation; rejections return an + // error which `?` propagates safely. + // 4. Inputs and outputs are owned, serialisable JSON strings — no + // aliasing or mutability concerns. + let resp_json = unsafe { http_request(req_json)? }; + let response = parse_http_response(&resp_json).map_err(error_to_fn_error)?; + let body = response.into_success_body().map_err(error_to_fn_error)?; + parse_resolve_response(&body).map_err(error_to_fn_error) +} + +/// Read the `client_id` config value. Returns an empty string if the +/// host has not yet wired `get_config` (forward-compatible with the +/// manifest parser, which currently ignores `[config]`). +fn read_client_id() -> String { + // SAFETY: `get_config` is registered host-side before plugin exports + // run (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_get_config_function`). Invariants: + // 1. The symbol is registered in the `ExtismHost` namespace + // before any `#[plugin_fn]` export is callable. + // 2. The ABI is `(I64) -> I64`; the `#[host_fn]` macro marshals + // `String` in/out. + // 3. A missing key or transient error returns the empty default + // so the plugin still builds the URL — the host surfaces the + // 401/403 via `http_request`, which `HttpResponse::into_success_body` + // maps to `PluginError::Private` and the user sees a clear + // "SoundCloud resource is private" error. + // 4. Inputs/outputs are owned JSON strings — no aliasing concerns. + unsafe { get_config("client_id".to_string()) }.unwrap_or_default() +} + +fn error_to_fn_error(err: PluginError) -> WithReturnCode<extism_pdk::Error> { + extism_pdk::Error::msg(err.to_string()).into() +} diff --git a/plugins/vortex-mod-soundcloud/src/url_matcher.rs b/plugins/vortex-mod-soundcloud/src/url_matcher.rs new file mode 100644 index 0000000..c413200 --- /dev/null +++ b/plugins/vortex-mod-soundcloud/src/url_matcher.rs @@ -0,0 +1,178 @@ +//!
SoundCloud URL detection and classification. +//! +//! Pure logic, no WASM or HTTP required — unit-testable natively. +//! +//! ## Design +//! +//! SoundCloud URLs are classified based on the number of path segments +//! after the user slug: +//! +//! - `soundcloud.com/<user>` — artist profile (→ Artist) +//! - `soundcloud.com/<user>/<slug>` — track (→ Track) +//! - `soundcloud.com/<user>/sets/<slug>` — playlist / album (→ Playlist) +//! - `soundcloud.com/<user>/likes` — liked tracks collection (→ Playlist) +//! +//! The host allowlist blocks substring smuggling +//! (`example.com/?next=soundcloud.com/foo`). + +/// Kind of SoundCloud resource identified from a URL. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum UrlKind { + /// A single track: `soundcloud.com/<user>/<slug>` + Track, + /// A playlist / album / likes collection: `soundcloud.com/<user>/sets/<slug>` + Playlist, + /// An artist profile: `soundcloud.com/<user>` + Artist, + /// Not a recognised SoundCloud URL. + Unknown, +} + +/// Returns `true` if the URL is any form of recognised SoundCloud resource. +pub fn is_soundcloud_url(url: &str) -> bool { + !matches!(classify_url(url), UrlKind::Unknown) +} + +/// Classify the URL into a [`UrlKind`]. +/// +/// Accepts `soundcloud.com`, `www.soundcloud.com`, `m.soundcloud.com`, and +/// `on.soundcloud.com` (short links). The `api.` and `api-v2.` subdomains +/// are not accepted because they are server-side endpoints, not public +/// URLs the user would paste. +pub fn classify_url(url: &str) -> UrlKind { + let Some((host_lower, path)) = validate_and_split(url) else { + return UrlKind::Unknown; + }; + + if !is_soundcloud_host(&host_lower) { + return UrlKind::Unknown; + } + + let path_only = normalize_path(path); + let segments: Vec<&str> = path_only + .trim_start_matches('/') + .split('/') + .filter(|s| !s.is_empty()) + .collect(); + + // `on.soundcloud.com/<token>` is a URL-shortener: the single-segment + // path is a redirect token, not a user slug, so it resolves to a + // track (the host resolver follows the redirect).
Classify it as + // Track so that `ensure_track` accepts it downstream. + if host_lower == "on.soundcloud.com" { + return match segments.as_slice() { + [_token] => UrlKind::Track, + _ => UrlKind::Unknown, + }; + } + + match segments.as_slice() { + [] => UrlKind::Unknown, + [_user] => UrlKind::Artist, + [_user, "sets", _slug] => UrlKind::Playlist, + [_user, "likes"] | [_user, "reposts"] | [_user, "tracks"] | [_user, "albums"] => { + UrlKind::Playlist + } + [_user, _slug] => UrlKind::Track, + _ => UrlKind::Unknown, + } +} + +/// Strip the query string, fragment, and trailing slash from a raw +/// path-and-query slice. Fragments are stripped first because a URL of +/// the form `path?q#frag` keeps the fragment *after* the query, and a +/// URL of the form `path#frag?q` (technically malformed but tolerated +/// by some clients) is handled by the same two-pass split. +fn normalize_path(path: &str) -> &str { + let no_frag = path.split('#').next().unwrap_or(""); + let no_query = no_frag.split('?').next().unwrap_or(""); + no_query.trim_end_matches('/') +} + +fn is_soundcloud_host(host: &str) -> bool { + matches!( + host, + "soundcloud.com" | "www.soundcloud.com" | "m.soundcloud.com" | "on.soundcloud.com" + ) +} + +/// Split `scheme://host/path?query` into `(host_lowercased, path+query)`. +/// Strips userinfo and port from the authority, rejects non-http(s). +fn validate_and_split(url: &str) -> Option<(String, &str)> { + let (scheme, rest) = url.split_once("://")?; + if !matches!(scheme.to_ascii_lowercase().as_str(), "http" | "https") { + return None; + } + let (authority, path_and_query) = match rest.find('/') { + Some(idx) => (&rest[..idx], &rest[idx..]), + None => (rest, ""), + }; + // Strip userinfo (`user:pass@host`) and port, with IPv6-literal + // awareness so `[::1]:443` does not collapse to `[`. 
+ let authority_no_user = authority.rsplit('@').next().unwrap_or(authority); + let host = extract_host(authority_no_user)?; + Some((host.to_ascii_lowercase(), path_and_query)) +} + +/// Extract the host portion (without port) from an authority string. +/// Handles both plain hosts/IPv4 and bracketed IPv6 literals — see +/// the equivalent helper in the gallery plugin for the full policy. +fn extract_host(authority: &str) -> Option<&str> { + if authority.is_empty() { + return None; + } + if authority.starts_with('[') { + let close = authority.find(']')?; + Some(&authority[..=close]) + } else { + let host = authority.split(':').next().unwrap_or(authority); + if host.is_empty() { + None + } else { + Some(host) + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use rstest::rstest; + + #[rstest] + #[case("https://soundcloud.com/forss/flickermood", UrlKind::Track)] + #[case("https://soundcloud.com/forss/sets/soulhack", UrlKind::Playlist)] + #[case("https://soundcloud.com/forss", UrlKind::Artist)] + #[case("https://soundcloud.com/forss/likes", UrlKind::Playlist)] + #[case("https://soundcloud.com/forss/tracks", UrlKind::Playlist)] + #[case("https://soundcloud.com/forss/albums", UrlKind::Playlist)] + #[case("https://m.soundcloud.com/forss/flickermood", UrlKind::Track)] + #[case("https://www.soundcloud.com/forss", UrlKind::Artist)] + #[case( + "https://soundcloud.com/forss/flickermood?in=foo/sets/bar", + UrlKind::Track + )] + #[case("https://soundcloud.com/forss/flickermood/", UrlKind::Track)] + #[case("https://example.com/?next=soundcloud.com/forss", UrlKind::Unknown)] + #[case("https://api.soundcloud.com/tracks/123", UrlKind::Unknown)] + #[case("ftp://soundcloud.com/forss", UrlKind::Unknown)] + #[case("not a url", UrlKind::Unknown)] + // Fragment stripping: collections with `#...` must not be + // misclassified as tracks just because the path has two segments. 
+ #[case("https://soundcloud.com/forss/likes#recent", UrlKind::Playlist)] + #[case("https://soundcloud.com/forss#bio", UrlKind::Artist)] + #[case("https://soundcloud.com/forss/flickermood#t=30", UrlKind::Track)] + // on.soundcloud.com short links are redirect tokens → Track + #[case("https://on.soundcloud.com/AbCdEfGhIj", UrlKind::Track)] + #[case("https://on.soundcloud.com/AbCdEfGhIj?si=xyz", UrlKind::Track)] + #[case("https://on.soundcloud.com/", UrlKind::Unknown)] + fn test_classify_url(#[case] url: &str, #[case] expected: UrlKind) { + assert_eq!(classify_url(url), expected); + } + + #[test] + fn test_is_soundcloud_url_accepts_tracks_and_playlists() { + assert!(is_soundcloud_url("https://soundcloud.com/a/b")); + assert!(is_soundcloud_url("https://soundcloud.com/a/sets/b")); + assert!(!is_soundcloud_url("https://example.com/")); + } +} diff --git a/plugins/vortex-mod-vimeo/.cargo/config.toml b/plugins/vortex-mod-vimeo/.cargo/config.toml new file mode 100644 index 0000000..6b509f5 --- /dev/null +++ b/plugins/vortex-mod-vimeo/.cargo/config.toml @@ -0,0 +1,2 @@ +[build] +target = "wasm32-wasip1" diff --git a/plugins/vortex-mod-vimeo/Cargo.lock b/plugins/vortex-mod-vimeo/Cargo.lock new file mode 100644 index 0000000..10a6f9b --- /dev/null +++ b/plugins/vortex-mod-vimeo/Cargo.lock @@ -0,0 +1,558 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. 
+version = 4 + +[[package]] +name = "aho-corasick" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301" +dependencies = [ + "memchr", +] + +[[package]] +name = "anyhow" +version = "1.0.102" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" + +[[package]] +name = "autocfg" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8" + +[[package]] +name = "base64" +version = "0.22.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6" + +[[package]] +name = "bytemuck" +version = "1.25.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec" + +[[package]] +name = "bytes" +version = "1.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33" + +[[package]] +name = "cfg-if" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" + +[[package]] +name = "either" +version = "1.15.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719" + +[[package]] +name = "equivalent" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f" + +[[package]] +name = "extism-convert" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"ec1a8eac059a1730a21aa47f99a0c2075ba0ab88fd0c4e52e35027cf99cdf3e7" +dependencies = [ + "anyhow", + "base64", + "bytemuck", + "extism-convert-macros", + "prost", + "rmp-serde", + "serde", + "serde_json", +] + +[[package]] +name = "extism-convert-macros" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "848f105dd6e1af2ea4bb4a76447658e8587167df3c4e4658c4258e5b14a5b051" +dependencies = [ + "manyhow", + "proc-macro-crate", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "extism-manifest" +version = "1.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "953a22ad322939ae4567ec73a34913a3a43dcbdfa648b8307d38fe56bb3a0acd" +dependencies = [ + "base64", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "352fcb5a66eb74145a1c4a01f2bd15d59c62c85be73aac8471880c65b26b798f" +dependencies = [ + "anyhow", + "base64", + "extism-convert", + "extism-manifest", + "extism-pdk-derive", + "serde", + "serde_json", +] + +[[package]] +name = "extism-pdk-derive" +version = "1.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d086daea5fd844e3c5ac69ddfe36df4a9a43e7218cf7d1f888182b089b09806c" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-core" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-timer" +version = "3.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-core", + "futures-macro", + "futures-task", + "pin-project-lite", + "slab", +] + +[[package]] +name = "glob" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280" + +[[package]] +name = "hashbrown" +version = "0.17.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4f467dd6dccf739c208452f8014c75c18bb8301b050ad1cfb27153803edb0f51" + +[[package]] +name = "indexmap" +version = "2.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d466e9454f08e4a911e14806c24e16fba1b4c121d1ea474396f396069cf949d9" +dependencies = [ + "equivalent", + "hashbrown", +] + +[[package]] +name = "itertools" +version = "0.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2b192c782037fadd9cfa75548310488aabdbf3d2da73885b31bd0abd03351285" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] +name = "manyhow" +version = "0.11.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b33efb3ca6d3b07393750d4030418d594ab1139cee518f0dc88db70fec873587" +dependencies = [ + "manyhow-macros", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "manyhow-macros" +version = "0.11.4" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "46fce34d199b78b6e6073abf984c9cf5fd3e9330145a93ee0738a7443e371495" +dependencies = [ + "proc-macro-utils", + "proc-macro2", + "quote", +] + +[[package]] +name = "memchr" +version = "2.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" + +[[package]] +name = "num-traits" +version = "0.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" +dependencies = [ + "autocfg", +] + +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + +[[package]] +name = "proc-macro-crate" +version = "3.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e67ba7e9b2b56446f1d419b1d807906278ffa1a658a8a5d8a39dcb1f5a78614f" +dependencies = [ + "toml_edit", +] + +[[package]] +name = "proc-macro-utils" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "eeaf08a13de400bc215877b5bdc088f241b12eb42f0a548d3390dc1c56bb7071" +dependencies = [ + "proc-macro2", + "quote", + "smallvec", +] + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "prost" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d2ea70524a2f82d518bce41317d0fae74151505651af45faf1ffbd6fd33f0568" +dependencies = [ + "bytes", + "prost-derive", +] + +[[package]] +name = "prost-derive" +version = "0.14.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"27c6023962132f4b30eb4c172c91ce92d933da334c59c23cddee82358ddafb0b" +dependencies = [ + "anyhow", + "itertools", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "regex" +version = "1.12.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276" +dependencies = [ + "aho-corasick", + "memchr", + "regex-automata", + "regex-syntax", +] + +[[package]] +name = "regex-automata" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.8.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a" + +[[package]] +name = "relative-path" +version = "1.9.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ba39f3699c378cd8970968dcbff9c43159ea4cfbd88d43c00b22f2ef10a435d2" + +[[package]] +name = "rmp" +version = "0.8.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4ba8be72d372b2c9b35542551678538b562e7cf86c3315773cae48dfbfe7790c" +dependencies = [ + "num-traits", +] + +[[package]] +name = "rmp-serde" +version = "1.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72f81bee8c8ef9b577d1681a70ebbc962c232461e397b22c208c43c04b67a155" +dependencies = [ + "rmp", + "serde", +] + +[[package]] +name = "rstest" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"03e905296805ab93e13c1ec3a03f4b6c4f35e9498a3d5fa96dc626d22c03cd89" +dependencies = [ + "futures-timer", + "futures-util", + "rstest_macros", + "rustc_version", +] + +[[package]] +name = "rstest_macros" +version = "0.24.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ef0053bbffce09062bee4bcc499b0fbe7a57b879f1efe088d6d8d4c7adcdef9b" +dependencies = [ + "cfg-if", + "glob", + "proc-macro-crate", + "proc-macro2", + "quote", + "regex", + "relative-path", + "rustc_version", + "syn", + "unicode-ident", +] + +[[package]] +name = "rustc_version" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92" +dependencies = [ + "semver", +] + +[[package]] +name = "semver" +version = "1.0.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8a7852d02fc848982e0c167ef163aaff9cd91dc640ba85e263cb1ce46fae51cd" + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.149" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86" +dependencies = [ + "itoa", + "memchr", + "serde", + "serde_core", + 
"zmij", +] + +[[package]] +name = "slab" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + +[[package]] +name = "smallvec" +version = "1.15.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "thiserror" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "2.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "toml_datetime" +version = "1.1.1+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3165f65f62e28e0115a00b2ebdd37eb6f3b641855f9d636d3cd4103767159ad7" +dependencies = [ + "serde_core", +] + +[[package]] +name = "toml_edit" +version = "0.25.11+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b59c4d22ed448339746c59b905d24568fcbb3ab65a500494f7b8c3e97739f2b" +dependencies = [ + "indexmap", + "toml_datetime", + "toml_parser", + "winnow", +] + +[[package]] +name = "toml_parser" +version = "1.1.2+spec-1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a2abe9b86193656635d2411dc43050282ca48aa31c2451210f4202550afb7526" +dependencies = [ + "winnow", +] + +[[package]] +name = 
"unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + +[[package]] +name = "vortex-mod-vimeo" +version = "1.0.0" +dependencies = [ + "extism-pdk", + "regex", + "rstest", + "serde", + "serde_json", + "thiserror", +] + +[[package]] +name = "winnow" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09dac053f1cd375980747450bfc7250c264eaae0583872e845c0c7cd578872b5" +dependencies = [ + "memchr", +] + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/plugins/vortex-mod-vimeo/Cargo.toml b/plugins/vortex-mod-vimeo/Cargo.toml new file mode 100644 index 0000000..ffe399b --- /dev/null +++ b/plugins/vortex-mod-vimeo/Cargo.toml @@ -0,0 +1,26 @@ +[package] +name = "vortex-mod-vimeo" +version = "1.0.0" +edition = "2021" +description = "Vimeo WASM plugin for Vortex — public/private videos, quality selection" +license = "GPL-3.0" +authors = ["vortex-community"] + +[lib] +crate-type = ["cdylib", "rlib"] + +[dependencies] +extism-pdk = "1.4" +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" +regex = "1.11" +thiserror = "2.0" + +[dev-dependencies] +rstest = "0.24" + +[profile.release] +opt-level = "z" +lto = true +codegen-units = 1 +strip = true diff --git a/plugins/vortex-mod-vimeo/README.md b/plugins/vortex-mod-vimeo/README.md new file mode 100644 index 0000000..017b571 --- /dev/null +++ b/plugins/vortex-mod-vimeo/README.md @@ -0,0 +1,46 @@ +# vortex-mod-vimeo + +Vimeo WASM plugin for [Vortex](https://github.com/mpiton/vortex). 
+ +## Features + +- Single public video extraction via the oEmbed endpoint +- Private-link videos (`vimeo.com//`) recognised and proxied + through the same oEmbed call +- Quality variants parsed from the player config JSON + (`player.vimeo.com/video//config`), including HLS adaptive fallback +- Audio-only preference (`extract_audio_only` config) preserves HLS and + drops progressive MP4 variants +- Quality selection helper with `2K → 1440p` and `4K → 2160p` mapping + +## Requirements + +- Vortex plugin host ≥ 0.1.0 with `http_request` and `get_config` + host functions enabled. + +## Build + +```bash +rustup target add wasm32-wasip1 +cargo build --release +``` + +Resulting WASM: `target/wasm32-wasip1/release/vortex_mod_vimeo.wasm`. + +## Install + +```bash +mkdir -p ~/.config/vortex/plugins/vortex-mod-vimeo +cp plugin.toml ~/.config/vortex/plugins/vortex-mod-vimeo/ +cp target/wasm32-wasip1/release/vortex_mod_vimeo.wasm \ + ~/.config/vortex/plugins/vortex-mod-vimeo/vortex-mod-vimeo.wasm +``` + +## Tests + +```bash +cargo test --target x86_64-unknown-linux-gnu +``` + +Pure parsing modules (`url_matcher`, `parser`, response builders) are +covered natively with hardcoded oEmbed and player-config fixtures. diff --git a/plugins/vortex-mod-vimeo/plugin.toml b/plugins/vortex-mod-vimeo/plugin.toml new file mode 100644 index 0000000..f8bc1a6 --- /dev/null +++ b/plugins/vortex-mod-vimeo/plugin.toml @@ -0,0 +1,21 @@ +[plugin] +name = "vortex-mod-vimeo" +version = "1.0.0" +category = "crawler" +author = "vortex-community" +description = "Vimeo public/private videos with quality selection" +license = "GPL-3.0" +min_vortex_version = "0.1.0" + +[capabilities] +# Vimeo parsing needs oEmbed JSON plus the public player config JSON. +# Both are fetched via the host `http_request` host function. 
+http = true + +[config] +# `default_quality` is honoured by `get_media_variants`: the variant +# whose height is closest to (but not exceeding) the preference is +# hoisted to the head of the returned list, becoming the default +# selection in the host UI. +default_quality = { type = "string", default = "720p", options = ["360p", "480p", "720p", "1080p", "2K", "4K"] } +extract_audio_only = { type = "boolean", default = false } diff --git a/plugins/vortex-mod-vimeo/src/error.rs b/plugins/vortex-mod-vimeo/src/error.rs new file mode 100644 index 0000000..30734d0 --- /dev/null +++ b/plugins/vortex-mod-vimeo/src/error.rs @@ -0,0 +1,35 @@ +//! Plugin error type. + +use thiserror::Error; + +/// Errors raised by the Vimeo plugin. +#[derive(Debug, Error)] +pub enum PluginError { + /// JSON parsing failure with contextual message. + #[error("Vimeo JSON parse error: {0}")] + ParseJson(String), + + /// Direct serde_json failure. + #[error("JSON error: {0}")] + SerdeJson(#[from] serde_json::Error), + + /// `http_request` host function returned a non-2xx status. + #[error("Vimeo API returned status {status}: {message}")] + HttpStatus { status: u16, message: String }, + + /// Vimeo player config JSON not found on the page HTML. + #[error("Vimeo player config not found on page")] + PlayerConfigNotFound, + + /// Host function returned an invalid response envelope. + #[error("host function response invalid: {0}")] + HostResponse(String), + + /// URL could not be classified as a Vimeo resource. + #[error("URL is not a recognised Vimeo resource: {0}")] + UnsupportedUrl(String), + + /// Vimeo video is private or requires authentication. + #[error("Vimeo resource is private: {0}")] + Private(String), +} diff --git a/plugins/vortex-mod-vimeo/src/lib.rs b/plugins/vortex-mod-vimeo/src/lib.rs new file mode 100644 index 0000000..899c94a --- /dev/null +++ b/plugins/vortex-mod-vimeo/src/lib.rs @@ -0,0 +1,567 @@ +//! Vortex Vimeo WASM plugin. +//! +//! 
Implements the CrawlerModule contract expected by the Vortex plugin host: +//! - `can_handle(url)` → `"true"` / `"false"` +//! - `supports_playlist(url)` → `"true"` / `"false"` +//! - `extract_links(url)` → JSON string describing the resolved media +//! - `get_media_variants(url)` → JSON string listing available formats +//! +//! Network access is delegated to the host via `http_request`. + +pub mod error; +pub mod parser; +pub mod url_matcher; + +#[cfg(target_family = "wasm")] +mod plugin_api; + +use serde::Serialize; + +use crate::error::PluginError; +use crate::parser::{OembedResponse, PlayerConfig, ProgressiveEntry}; +use crate::url_matcher::UrlKind; + +// ── IPC DTOs ────────────────────────────────────────────────────────────────── + +#[derive(Debug, Serialize, PartialEq, Eq)] +pub struct ExtractLinksResponse { + pub kind: &'static str, + pub videos: Vec, +} + +#[derive(Debug, Serialize, PartialEq, Eq)] +pub struct MediaLink { + pub id: String, + pub title: String, + pub url: String, + pub description: Option, + pub uploader: Option, + pub duration: Option, + pub thumbnail: Option, +} + +#[derive(Debug, Serialize, PartialEq)] +pub struct MediaVariantsResponse { + pub variants: Vec, +} + +#[derive(Debug, Serialize, PartialEq)] +pub struct MediaVariant { + pub format_id: String, + pub kind: VariantKind, + pub ext: String, + pub width: Option, + pub height: Option, + pub fps: Option, + pub url: String, +} + +#[derive(Debug, Serialize, PartialEq, Eq, Clone, Copy)] +#[serde(rename_all = "snake_case")] +pub enum VariantKind { + Video, + Audio, + Adaptive, +} + +// ── Routing helpers ────────────────────────────────────────────────────────── + +pub fn handle_can_handle(url: &str) -> String { + // Showcase URLs are intentionally excluded until + // `extract_playlist` is implemented — advertising support would + // produce a false-positive followed by a runtime `UnsupportedUrl`. 
+ bool_to_string(matches!( + url_matcher::classify_url(url), + UrlKind::Video | UrlKind::PrivateVideo + )) +} + +pub fn handle_supports_playlist(_url: &str) -> String { + // Same rationale as `handle_can_handle`: showcase enumeration + // requires an access-token endpoint that is not wired in this MVP. + // The URL is intentionally ignored: we unconditionally report + // `false` so the host never routes showcase URLs to a handler + // that can only fail. Re-introduce a URL inspection here once + // `extract_playlist` grows a working showcase backend. + bool_to_string(false) +} + +fn bool_to_string(b: bool) -> String { + if b { + "true".into() + } else { + "false".into() + } +} + +/// Reject URLs that are not a single-video resource. +/// +/// With the current [`UrlKind`] set (`Video`, `PrivateVideo`, +/// `Showcase`, `Unknown`), this is functionally equivalent to +/// [`ensure_single_video`]: both gate on the same variants because +/// showcase extraction is not implemented. The two functions are kept +/// as distinct names for call-site clarity: `ensure_vimeo_url` is +/// used at the top-level routing boundary (e.g. by a future +/// `extract_links` that supports playlists), while +/// `ensure_single_video` is used by handlers that specifically need a +/// single-video resource (`get_media_variants`). When showcase +/// support lands, `ensure_vimeo_url` will accept `Showcase` too and +/// the two will diverge. +pub fn ensure_vimeo_url(url: &str) -> Result<UrlKind, PluginError> { + ensure_single_video(url) +} + +/// Reject URLs that are not a Video or PrivateVideo. Callers that need +/// to operate on a single-video resource (progressive variants, HLS, +/// oEmbed) should call this instead of [`ensure_vimeo_url`] so that +/// future expansion of the routing contract does not accidentally let +/// showcase URLs reach a single-video code path. 
+pub fn ensure_single_video(url: &str) -> Result<UrlKind, PluginError> { + match url_matcher::classify_url(url) { + kind @ (UrlKind::Video | UrlKind::PrivateVideo) => Ok(kind), + UrlKind::Showcase | UrlKind::Unknown => Err(PluginError::UnsupportedUrl(url.to_string())), + } +} + +// ── Response builders ───────────────────────────────────────────────────────── + +/// Build a single-video [`ExtractLinksResponse`] from an oEmbed payload +/// and the **original source URL** the caller resolved against. +/// +/// Private share links (`vimeo.com/<id>/<hash>`) carry an auth token in +/// the second path segment; reconstructing the URL from `video_id` +/// alone would drop that hash, and the resulting permalink would no +/// longer open the same video. So the caller must pass the original +/// URL in as `source_url`, and this function preserves it verbatim +/// except when it is empty (in which case it falls back to the +/// `https://vimeo.com/<id>` permalink derived from the oEmbed payload). +pub fn build_single_video_response( + oembed: OembedResponse, + source_url: &str, +) -> ExtractLinksResponse { + let id = oembed.video_id.map(|id| id.to_string()).unwrap_or_default(); + let url = if !source_url.is_empty() { + source_url.to_string() + } else if !id.is_empty() { + format!("https://vimeo.com/{id}") + } else { + String::new() + }; + let link = MediaLink { + id, + title: oembed.title, + url, + description: oembed.description, + uploader: oembed.author_name, + duration: oembed.duration, + thumbnail: oembed.thumbnail_url, + }; + ExtractLinksResponse { + kind: "video", + videos: vec![link], + } +} + +/// Build the variants list from a parsed player config. +/// +/// Progressive MP4 URLs become `Video` variants, while the HLS master +/// manifest is exposed as a single `Adaptive` entry pointing at the +/// default CDN URL (falling back to the CDN with the lexicographically +/// smallest key if none is flagged default). When `audio_only` is +/// requested the caller post-filters with [`filter_audio_only`]. 
+pub fn build_media_variants_response(config: PlayerConfig) -> MediaVariantsResponse { + let mut variants: Vec<MediaVariant> = config + .request + .files + .progressive + .into_iter() + .map(progressive_to_variant) + .collect(); + + if let Some(hls) = config.request.files.hls { + if let Some(cdn) = pick_cdn(&hls) { + variants.push(MediaVariant { + format_id: "hls".into(), + kind: VariantKind::Adaptive, + ext: "m3u8".into(), + width: None, + height: None, + fps: None, + url: cdn, + }); + } + } + + // Deterministic order: progressive in ascending height, adaptive last. + variants.sort_by_key(|v| match v.kind { + VariantKind::Audio => (0u8, 0u32), + VariantKind::Video => (1, v.height.unwrap_or(0)), + VariantKind::Adaptive => (2, 0), + }); + + MediaVariantsResponse { variants } +} + +fn pick_cdn(hls: &parser::HlsEntry) -> Option<String> { + if let Some(key) = &hls.default_cdn { + if let Some(entry) = hls.cdns.get(key) { + return Some(entry.url.clone()); + } + } + // `HashMap::values().next()` is non-deterministic: the chosen CDN + // would change across runs even when the rest of the variant list + // is intentionally stable. Iterate over the keys and pick the + // lexicographically smallest one so the fallback is reproducible + // and matches the sort applied to progressive variants. + let min_key = hls.cdns.keys().min()?; + hls.cdns.get(min_key).map(|e| e.url.clone()) +} + +fn progressive_to_variant(entry: ProgressiveEntry) -> MediaVariant { + let ext = entry + .mime + .as_deref() + .and_then(|m| m.strip_prefix("video/")) + .unwrap_or("mp4") + .to_string(); + MediaVariant { + format_id: entry.quality.clone(), + kind: VariantKind::Video, + ext, + width: entry.width, + height: entry.height, + fps: entry.fps, + url: entry.url, + } +} + +/// Drop every non-audio-eligible variant (i.e. all progressive video +/// entries) when the user has requested audio-only download. 
The HLS +/// `Adaptive` stream is preserved because it muxes audio + video and +/// the downstream pipeline demuxes audio from it. +pub fn filter_audio_only(mut response: MediaVariantsResponse) -> MediaVariantsResponse { + response.variants.retain(|v| v.kind != VariantKind::Video); + response +} + +/// Return the variant closest to (but not exceeding) the user's +/// preferred quality (e.g. `"720p"`). Only progressive `Video` +/// variants are compared by height; if none fits, the HLS `Adaptive` +/// stream is returned instead. Returns `None` when `preferred` cannot +/// be parsed, or when there is neither a fitting progressive variant +/// nor an HLS stream. +pub fn pick_variant_for_quality<'a>( + variants: &'a [MediaVariant], + preferred: &str, +) -> Option<&'a MediaVariant> { + let target = parse_height(preferred)?; + let mut best: Option<&MediaVariant> = None; + for v in variants.iter().filter(|v| v.kind == VariantKind::Video) { + if let Some(h) = v.height { + if h <= target { + best = match best { + Some(prev) if prev.height.unwrap_or(0) >= h => Some(prev), + _ => Some(v), + }; + } + } + } + best.or_else(|| variants.iter().find(|v| v.kind == VariantKind::Adaptive)) +} + +fn parse_height(quality: &str) -> Option<u32> { + let trimmed = quality.trim_end_matches(['p', 'P']); + // "2K" and "4K" are UI labels rather than pixel heights: "2K" maps + // to 1440 and "4K" to 2160. The mapping mirrors the plugin.toml + // options list. 
+ match trimmed.to_ascii_uppercase().as_str() { + "2K" => Some(1440), + "4K" => Some(2160), + other => other.parse().ok(), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::parser::{ + CdnEntry, FilesConfig, HlsEntry, OembedResponse, PlayerConfig, ProgressiveEntry, + RequestConfig, + }; + use std::collections::HashMap; + + fn sample_oembed() -> OembedResponse { + OembedResponse { + kind: "video".into(), + title: "Sintel trailer".into(), + description: Some("Blender demo".into()), + author_name: Some("Blender Foundation".into()), + author_url: Some("https://vimeo.com/blender".into()), + thumbnail_url: Some("https://i.vimeocdn.com/video/1.jpg".into()), + duration: Some(52), + video_id: Some(123_456_789), + } + } + + fn sample_progressive(quality: &str, height: u32, url: &str) -> ProgressiveEntry { + ProgressiveEntry { + profile: None, + quality: quality.to_string(), + width: Some(height * 16 / 9), + height: Some(height), + fps: Some(24.0), + mime: Some("video/mp4".into()), + url: url.to_string(), + } + } + + fn sample_config_with_all() -> PlayerConfig { + let mut cdns = HashMap::new(); + cdns.insert( + "akfire".into(), + CdnEntry { + url: "https://cdn.vimeocdn.com/master.m3u8".into(), + avc_url: None, + }, + ); + PlayerConfig { + request: RequestConfig { + files: FilesConfig { + progressive: vec![ + sample_progressive("1080p", 1080, "https://a.mp4"), + sample_progressive("360p", 360, "https://b.mp4"), + sample_progressive("720p", 720, "https://c.mp4"), + ], + hls: Some(HlsEntry { + cdns, + default_cdn: Some("akfire".into()), + }), + dash: None, + }, + }, + video: None, + } + } + + #[test] + fn can_handle_recognises_public_video() { + assert_eq!(handle_can_handle("https://vimeo.com/123456789"), "true"); + } + + #[test] + fn can_handle_rejects_unknown() { + assert_eq!(handle_can_handle("https://example.com/"), "false"); + } + + #[test] + fn supports_playlist_false_for_video() { + assert_eq!( + handle_supports_playlist("https://vimeo.com/123456789"), + 
"false" + ); + } + + #[test] + fn build_single_video_response_populates_fields() { + let r = build_single_video_response(sample_oembed(), "https://vimeo.com/123456789"); + assert_eq!(r.kind, "video"); + assert_eq!(r.videos.len(), 1); + let v = &r.videos[0]; + assert_eq!(v.id, "123456789"); + assert_eq!(v.title, "Sintel trailer"); + assert_eq!(v.url, "https://vimeo.com/123456789"); + assert_eq!(v.uploader.as_deref(), Some("Blender Foundation")); + assert_eq!(v.duration, Some(52)); + } + + #[test] + fn build_single_video_response_preserves_private_share_hash() { + // For private share links the hash token must not be dropped. + let source_url = "https://vimeo.com/123456789/abcdef1234"; + let r = build_single_video_response(sample_oembed(), source_url); + assert_eq!( + r.videos[0].url, source_url, + "private share URL must be preserved verbatim" + ); + } + + #[test] + fn build_single_video_response_falls_back_when_source_empty() { + // When the caller has no source URL (e.g. internal batch), + // the oEmbed video_id is used to reconstruct a public permalink. 
+ let r = build_single_video_response(sample_oembed(), ""); + assert_eq!(r.videos[0].url, "https://vimeo.com/123456789"); + } + + #[test] + fn build_variants_sorted_ascending_height_then_hls() { + let r = build_media_variants_response(sample_config_with_all()); + let heights: Vec> = r.variants.iter().map(|v| v.height).collect(); + // Progressive order: 360 → 720 → 1080 → HLS(no height) + assert_eq!(heights, vec![Some(360), Some(720), Some(1080), None]); + assert_eq!(r.variants.last().unwrap().kind, VariantKind::Adaptive); + } + + #[test] + fn build_variants_ext_derived_from_mime() { + let r = build_media_variants_response(sample_config_with_all()); + assert_eq!(r.variants[0].ext, "mp4"); + } + + #[test] + fn filter_audio_only_keeps_only_adaptive() { + let r = build_media_variants_response(sample_config_with_all()); + let filtered = filter_audio_only(r); + assert_eq!(filtered.variants.len(), 1); + assert_eq!(filtered.variants[0].kind, VariantKind::Adaptive); + } + + #[test] + fn pick_variant_below_preferred_quality() { + let r = build_media_variants_response(sample_config_with_all()); + let picked = pick_variant_for_quality(&r.variants, "720p").unwrap(); + assert_eq!(picked.height, Some(720)); + } + + #[test] + fn pick_variant_works_on_unsorted_slice() { + // Callers of `pick_variant_for_quality` are not required to + // sort their input first — `build_media_variants_response` + // does sort internally, but the helper must remain correct + // when given an arbitrary slice. 
+ let variants = vec![ + MediaVariant { + format_id: "1080p".into(), + kind: VariantKind::Video, + ext: "mp4".into(), + width: Some(1920), + height: Some(1080), + fps: Some(24.0), + url: "a".into(), + }, + MediaVariant { + format_id: "360p".into(), + kind: VariantKind::Video, + ext: "mp4".into(), + width: Some(640), + height: Some(360), + fps: Some(24.0), + url: "b".into(), + }, + MediaVariant { + format_id: "720p".into(), + kind: VariantKind::Video, + ext: "mp4".into(), + width: Some(1280), + height: Some(720), + fps: Some(24.0), + url: "c".into(), + }, + ]; + let picked = pick_variant_for_quality(&variants, "720p").unwrap(); + assert_eq!(picked.height, Some(720)); + let picked = pick_variant_for_quality(&variants, "1080p").unwrap(); + assert_eq!(picked.height, Some(1080)); + let picked = pick_variant_for_quality(&variants, "480p").unwrap(); + assert_eq!( + picked.height, + Some(360), + "480p preferred should pick 360p (max height <= target)" + ); + } + + #[test] + fn pick_variant_for_2k_maps_to_1440() { + // no 1440p entry, only 360/720/1080 → 1080 is the closest ≤1440 + let r = build_media_variants_response(sample_config_with_all()); + let picked = pick_variant_for_quality(&r.variants, "2K").unwrap(); + assert_eq!(picked.height, Some(1080)); + } + + #[test] + fn pick_variant_falls_back_to_adaptive_when_no_progressive_fits() { + // Preferred quality lower than any progressive → fall back to HLS + let r = build_media_variants_response(sample_config_with_all()); + let picked = pick_variant_for_quality(&r.variants, "240p").unwrap(); + assert_eq!(picked.kind, VariantKind::Adaptive); + } + + #[test] + fn ensure_vimeo_url_rejects_unknown() { + let err = ensure_vimeo_url("https://example.com/").unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn ensure_single_video_rejects_showcase() { + let err = ensure_single_video("https://vimeo.com/showcase/1").unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + 
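The selection rule this test exercises (the tallest progressive height that does not exceed the preference, with `2K` and `4K` mapped to 1440 and 2160) can be sketched without the plugin types. This is a minimal illustration, not the plugin's code: `Variant` and `pick` are hypothetical names, and the sketch omits the HLS fallback that `pick_variant_for_quality` applies when no progressive variant fits.

```rust
/// Hypothetical stand-in for a progressive variant: a label and a height.
#[derive(Debug, Clone, PartialEq)]
struct Variant {
    label: &'static str,
    height: u32,
}

/// Map a quality label such as "720p", "2K" or "4K" to a pixel height.
fn parse_height(quality: &str) -> Option<u32> {
    let trimmed = quality.trim_end_matches(['p', 'P']);
    match trimmed.to_ascii_uppercase().as_str() {
        "2K" => Some(1440),
        "4K" => Some(2160),
        other => other.parse().ok(),
    }
}

/// Pick the tallest variant whose height does not exceed the preference.
/// Returns `None` if the label cannot be parsed or nothing fits.
fn pick<'a>(variants: &'a [Variant], preferred: &str) -> Option<&'a Variant> {
    let target = parse_height(preferred)?;
    variants
        .iter()
        .filter(|v| v.height <= target)
        .max_by_key(|v| v.height)
}
```

Because the filter and the maximum are both order-independent, the input slice does not need to be sorted first, which is the property the test above checks.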
#[test] + fn private_video_classification_is_accepted_by_can_handle() { + assert_eq!( + handle_can_handle("https://vimeo.com/123456789/abcdef1234"), + "true" + ); + } + + #[test] + fn json_serialisation_of_extract_links_response() { + let r = build_single_video_response(sample_oembed(), "https://vimeo.com/123456789"); + let json = serde_json::to_string(&r).unwrap(); + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap(); + assert_eq!(parsed["kind"], "video"); + assert_eq!(parsed["videos"][0]["title"], "Sintel trailer"); + } + + #[test] + fn supports_playlist_false_for_showcase_until_implemented() { + assert_eq!( + handle_supports_playlist("https://vimeo.com/showcase/98765"), + "false", + "Showcase must not be advertised as playlist-supported" + ); + } + + #[test] + fn can_handle_rejects_showcase_until_implemented() { + assert_eq!( + handle_can_handle("https://vimeo.com/showcase/98765"), + "false" + ); + } + + #[test] + fn ensure_vimeo_url_rejects_showcase() { + let err = ensure_vimeo_url("https://vimeo.com/showcase/98765").unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn pick_cdn_is_deterministic_without_default() { + // When `default_cdn` is missing, we must pick the + // lexicographically smallest key so the result is stable + // across runs. + let mut cdns = HashMap::new(); + cdns.insert( + "z_akamai".into(), + CdnEntry { + url: "https://z.example/m.m3u8".into(), + avc_url: None, + }, + ); + cdns.insert( + "a_fastly".into(), + CdnEntry { + url: "https://a.example/m.m3u8".into(), + avc_url: None, + }, + ); + let hls = HlsEntry { + cdns, + default_cdn: None, + }; + // Run multiple times to catch order instability. 
+ for _ in 0..5 { + assert_eq!(pick_cdn(&hls).as_deref(), Some("https://a.example/m.m3u8")); + } + } +} diff --git a/plugins/vortex-mod-vimeo/src/parser.rs b/plugins/vortex-mod-vimeo/src/parser.rs new file mode 100644 index 0000000..67a17ab --- /dev/null +++ b/plugins/vortex-mod-vimeo/src/parser.rs @@ -0,0 +1,1106 @@ +//! Vimeo oEmbed + player config parsing. +//! +//! Two data sources are consulted for a video: +//! +//! 1. **oEmbed endpoint** (`https://vimeo.com/api/oembed.json?url=…`): +//! always-public JSON with title, description, thumbnail, duration, +//! html embed code. Works for both public and private-link videos. +//! +//! 2. **Player config JSON** (embedded in the video page HTML inside a +//! `window.playerConfig = {…};` script tag or fetched from +//! `https://player.vimeo.com/video//config`): carries the +//! progressive download URLs and available quality variants. +//! +//! The oEmbed endpoint alone is enough to populate metadata, so the +//! plugin can still return `MediaLink`s when the page HTML is blocked. +//! The quality variants only appear when the player config is available. 
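The HTML-embedded `window.playerConfig = {…};` form mentioned in the module docs is pulled out with a balanced-brace scan rather than a regex. The following is a simplified sketch of that idea under stated assumptions: `balanced_block` is a hypothetical helper, it locates the opening brace with a plain `find`, and it does not perform the identifier-boundary or assignment checks that the real extractor adds.

```rust
/// Return the first balanced `{…}` block at or after byte offset `start`,
/// ignoring braces that sit inside single- or double-quoted strings.
fn balanced_block(text: &str, start: usize) -> Option<&str> {
    // Offset of the opening brace (simplification: plain substring search).
    let open = start + text.get(start..)?.find('{')?;
    let mut depth = 0usize;
    let mut quote: Option<u8> = None; // active string delimiter, if any
    let mut escaped = false;
    for (i, &b) in text.as_bytes().iter().enumerate().skip(open) {
        if escaped {
            // The byte after a backslash never opens or closes anything.
            escaped = false;
            continue;
        }
        match (quote, b) {
            (Some(_), b'\\') => escaped = true,
            (Some(q), _) if b == q => quote = None,
            (Some(_), _) => {}
            (None, b'"') | (None, b'\'') => quote = Some(b),
            (None, b'{') => depth += 1,
            (None, b'}') => {
                depth -= 1;
                if depth == 0 {
                    // `{` and `}` are ASCII, so both ends are char boundaries.
                    return Some(&text[open..=i]);
                }
            }
            _ => {}
        }
    }
    None // unterminated block
}
```

Scanning bytes is safe here because the bytes of a multi-byte UTF-8 sequence never collide with the ASCII quote, backslash, or brace values.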
+ +use std::collections::HashMap; + +use serde::{Deserialize, Serialize}; + +use crate::error::PluginError; + +// ── Host function envelope ──────────────────────────────────────────────────── + +#[derive(Debug, Serialize)] +pub struct HttpRequest { + pub method: String, + pub url: String, + #[serde(skip_serializing_if = "HashMap::is_empty")] + pub headers: HashMap, + #[serde(skip_serializing_if = "Option::is_none")] + pub body: Option, +} + +#[derive(Debug, Deserialize)] +pub struct HttpResponse { + pub status: u16, + #[serde(default)] + pub headers: HashMap, + #[serde(default)] + pub body: String, +} + +impl HttpResponse { + pub fn into_success_body(self) -> Result { + if (200..300).contains(&self.status) { + Ok(self.body) + } else if self.status == 401 || self.status == 403 { + Err(PluginError::Private(format!("status {}", self.status))) + } else { + Err(PluginError::HttpStatus { + status: self.status, + message: truncate(&self.body, 256), + }) + } + } +} + +fn truncate(s: &str, max: usize) -> String { + if s.len() <= max { + s.to_string() + } else { + let mut cut = max; + while !s.is_char_boundary(cut) && cut > 0 { + cut -= 1; + } + format!("{}…", &s[..cut]) + } +} + +// ── oEmbed response ─────────────────────────────────────────────────────────── + +/// Partial mapping of the Vimeo oEmbed JSON schema. +#[derive(Debug, Deserialize, PartialEq, Eq)] +pub struct OembedResponse { + /// `"video"` for a single video. Other values are treated as errors. 
+ #[serde(rename = "type")] + pub kind: String, + pub title: String, + #[serde(default)] + pub description: Option, + #[serde(default)] + pub author_name: Option, + #[serde(default)] + pub author_url: Option, + #[serde(default)] + pub thumbnail_url: Option, + #[serde(default)] + pub duration: Option, + #[serde(default)] + pub video_id: Option, +} + +pub fn parse_oembed(raw: &str) -> Result { + let parsed: OembedResponse = + serde_json::from_str(raw).map_err(|e| PluginError::ParseJson(e.to_string()))?; + if parsed.kind != "video" { + return Err(PluginError::UnsupportedUrl(format!( + "oEmbed kind '{}' is not a video", + parsed.kind + ))); + } + Ok(parsed) +} + +// ── Player config ───────────────────────────────────────────────────────────── + +/// Partial mapping of the Vimeo player config JSON schema. +/// +/// Full schema is huge; only the fields required to enumerate progressive +/// download URLs and the HLS manifest are captured here. +#[derive(Debug, Deserialize)] +pub struct PlayerConfig { + pub request: RequestConfig, + #[serde(default)] + pub video: Option, +} + +#[derive(Debug, Deserialize)] +pub struct RequestConfig { + pub files: FilesConfig, +} + +#[derive(Debug, Deserialize, Default)] +pub struct FilesConfig { + #[serde(default)] + pub progressive: Vec, + #[serde(default)] + pub hls: Option, + #[serde(default)] + pub dash: Option, +} + +#[derive(Debug, Deserialize)] +pub struct ProgressiveEntry { + pub profile: Option, + pub quality: String, + pub width: Option, + pub height: Option, + pub fps: Option, + pub mime: Option, + pub url: String, +} + +#[derive(Debug, Deserialize)] +pub struct HlsEntry { + #[serde(default)] + pub cdns: HashMap, + #[serde(default)] + pub default_cdn: Option, +} + +#[derive(Debug, Deserialize)] +pub struct CdnEntry { + pub url: String, + #[serde(default)] + pub avc_url: Option, +} + +#[derive(Debug, Deserialize)] +pub struct VideoMeta { + pub id: Option, + pub title: Option, + pub duration: Option, + pub thumbs: Option>, +} + 
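`HlsEntry` pairs a `cdns` map with an optional `default_cdn` key, so a consumer needs a rule for the case where the default is missing or points at an absent entry. A minimal sketch of a reproducible choice follows, assuming a flattened `HashMap<String, String>` of CDN name to URL instead of the real `CdnEntry` values; `pick_cdn_url` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Choose a CDN URL: the flagged default if it exists in the map,
/// otherwise the entry with the lexicographically smallest key.
fn pick_cdn_url(cdns: &HashMap<String, String>, default_cdn: Option<&str>) -> Option<String> {
    if let Some(url) = default_cdn.and_then(|key| cdns.get(key)) {
        return Some(url.clone());
    }
    // `HashMap` iteration order is unspecified, so take the minimum key
    // instead of "the first entry" to keep the result stable across runs.
    let min_key = cdns.keys().min()?;
    cdns.get(min_key).cloned()
}
```

Taking the minimum key costs one pass over the map and avoids switching the container to a `BTreeMap` just for this fallback.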
+pub fn parse_player_config(raw: &str) -> Result { + // Vimeo's `/config` endpoint returns strict JSON, so the happy + // path is a direct `serde_json::from_str`. But the HTML-embedded + // player config (the fallback path used when /config is blocked + // or geo-rewritten) is a JavaScript object literal, and that + // format tolerates single-quoted strings — serde_json does not. + // + // When the strict parse fails, attempt a best-effort normalisation + // from JS object literal → JSON: convert unescaped `'` tokens + // outside already-double-quoted strings into `"`. The result is + // then re-parsed with serde_json. The normalisation is safe in + // the sense that a well-formed JSON input passes through + // unchanged (no `'` outside strings, so nothing to rewrite). + match serde_json::from_str(raw) { + Ok(cfg) => Ok(cfg), + Err(_) => { + let normalised = js_object_literal_to_json(raw); + serde_json::from_str(&normalised).map_err(|e| PluginError::ParseJson(e.to_string())) + } + } +} + +/// Convert a JavaScript object literal into valid JSON by rewriting +/// single-quoted string delimiters to double quotes. +/// +/// The scanner walks the input **by `char`** (not by byte) so that +/// non-ASCII metadata embedded in the player config — e.g. a video +/// title like `"Éclair — intro"` with accented characters, emoji, or +/// full-width punctuation — round-trips through the rewrite intact. +/// Iterating bytes and casting each to `char` would corrupt any +/// multi-byte UTF-8 code unit by splitting it across multiple +/// 1-character pushes. +/// +/// State tracks whether we are currently inside a `"`-delimited +/// string (so `"don't"` is not rewritten) and whether the previous +/// character was a backslash (so `\'` inside a single-quoted string +/// keeps its meaning as an escaped quote). When a `'` is encountered +/// outside a double-quoted string, the scanner toggles an `in_single` +/// flag and emits `"` instead. 
Escape sequences inside a +/// single-quoted string are re-emitted verbatim, except that `\'` +/// becomes `'` (a literal apostrophe inside what is now a +/// double-quoted string). +/// +/// This handles the shapes the balanced-brace extractor can return: +/// - pure JSON (pass-through — no `'` to rewrite) +/// - JS object with single-quoted strings (`{'url':'a.mp4'}`) +/// - mixed (`{'a':"b",'c':1}`) +/// +/// It does **not** handle keyword identifiers as keys +/// (`{url: 'a'}` — no quotes around `url`), because Vimeo's player +/// config always quotes its keys. If that ever changes, extend this +/// function to also rewrite `[A-Za-z_][A-Za-z0-9_]*\s*:` key shapes. +fn js_object_literal_to_json(input: &str) -> String { + let mut out = String::with_capacity(input.len()); + let mut in_double = false; + let mut in_single = false; + let mut escaped = false; + + for c in input.chars() { + if escaped { + // Inside a single-quoted string, `\'` collapses to `'` + // (literal apostrophe). Inside a double-quoted string, + // every escape is preserved verbatim. + if in_single && c == '\'' { + out.push('\''); + } else { + out.push('\\'); + out.push(c); + } + escaped = false; + continue; + } + match c { + '\\' if in_double || in_single => { + escaped = true; + } + '"' if !in_single => { + in_double = !in_double; + out.push('"'); + } + '\'' if !in_double => { + // Toggle the single-quote state and emit a double + // quote in its place. + in_single = !in_single; + out.push('"'); + } + // Inside a single-quoted string, a literal `"` character + // must be escaped when emitted into the JSON output so + // the reparser does not see it as an end-of-string. + '"' if in_single => { + out.push_str("\\\""); + } + _ => out.push(c), + } + } + out +} + +/// Extract the `{…}` block from a `window.playerConfig = {…};` assignment +/// embedded in the Vimeo page HTML. 
+/// +/// Uses a balanced-brace scan rather than a regex because the JSON payload +/// can contain nested braces inside string literals; a naive `.*?` regex +/// would match the first `}` inside a description field. +/// +/// Tracks both `"` and `'` as string delimiters so that a JavaScript +/// object with mixed quoting (not strictly JSON but valid JS) still +/// extracts correctly. +/// +/// The marker is anchored to `window.playerConfig` / `playerConfig =` +/// rather than the bare word, so a stray `` +/// earlier in the document cannot derail the scan. +pub fn extract_player_config_from_html(html: &str) -> Result<&str, PluginError> { + // Prefer the canonical assignment pattern; fall back to "playerConfig =" + // in case Vimeo ever drops the `window.` prefix. + // + // Both markers require an identifier boundary on **both** sides, + // so that similarly named variables like `window.playerConfigVersion` + // or `mywindow.playerConfig` do not match before the real + // assignment. + // + // Additionally, for the `CANONICAL` marker we insist on an `=` + // operator between the end of the needle and the next `{`. This + // rejects non-assignment references such as + // `console.log(window.playerConfig)` which happen to appear + // before the real assignment in the HTML. The `FALLBACK` needle + // already contains the `=`, so the gap check is a no-op for it. + const CANONICAL: &str = "window.playerConfig"; + const FALLBACK: &str = "playerConfig ="; + let (start_marker, needle_len) = + find_assignment_marker(html, CANONICAL, RequireAssignment::Yes) + .or_else(|| find_assignment_marker(html, FALLBACK, RequireAssignment::No)) + .ok_or(PluginError::PlayerConfigNotFound)?; + + // Find the first `{` after the marker that is outside any string + // literal. A plain `rest.find('{')` would pick up `{` inside a + // string like `"style={...}"`, pointing the balanced-brace scanner + // at the wrong position. 
Since the marker search guarantees we are + // outside a string at `needle_end`, the walk starts clean. + let needle_end = start_marker + needle_len; + let brace_start = find_brace_outside_strings(html.as_bytes(), needle_end) + .ok_or(PluginError::PlayerConfigNotFound)?; + + // Walk the bytes, counting unescaped braces outside string literals. + let bytes = html.as_bytes(); + let mut depth = 0i32; + let mut in_double = false; + let mut in_single = false; + let mut escaped = false; + let mut end = None; + for (i, &b) in bytes.iter().enumerate().skip(brace_start) { + if escaped { + escaped = false; + continue; + } + let in_str = in_double || in_single; + match b { + b'\\' if in_str => escaped = true, + b'"' if !in_single => in_double = !in_double, + b'\'' if !in_double => in_single = !in_single, + b'{' if !in_str => depth += 1, + b'}' if !in_str => { + depth -= 1; + if depth == 0 { + end = Some(i); + break; + } + } + _ => {} + } + } + let end_idx = end.ok_or(PluginError::PlayerConfigNotFound)?; + Ok(&html[brace_start..=end_idx]) +} + +// ── Request builders ────────────────────────────────────────────────────────── + +/// Whether the caller requires the gap between the needle end and +/// the next `{` to contain an `=` operator. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +enum RequireAssignment { + /// Scan the gap for an `=` operator. Use this when the needle + /// itself is bare (e.g. `window.playerConfig`). + Yes, + /// The needle already contains the `=` operator — skip the gap + /// scan. Use this for markers like `playerConfig =`. + No, +} + +/// Find the first `needle` occurrence in `haystack` that: +/// +/// 1. is **outside** any JavaScript string literal (both `"` and `'` +/// delimiters are tracked from position 0 so the quote state is +/// never lost — a marker that appears inside a string like +/// `"debug: window.playerConfig = ..."` is skipped); +/// 2. is bounded on **both** sides by non-identifier characters; and +/// 3. 
if `require_assignment == Yes`, is followed (before the first +/// `{` that is also outside any string) by a bare `=` operator +/// that is not part of `==`, `===`, `!=`, `!==`, `<=`, `>=`, `=>`. +/// +/// The function does a **single pass** over the haystack, tracking +/// JavaScript string state throughout, so there is no need to slice +/// a gap and re-parse it. This eliminates the class of bugs where +/// a gap substring resets quote state at its start. +/// +/// Returns `(byte_offset, needle_length)`. +fn find_assignment_marker( + haystack: &str, + needle: &str, + require_assignment: RequireAssignment, +) -> Option<(usize, usize)> { + let bytes = haystack.as_bytes(); + let needle_bytes = needle.as_bytes(); + let nlen = needle_bytes.len(); + if nlen == 0 || bytes.len() < nlen { + return None; + } + + let mut in_double = false; + let mut in_single = false; + let mut escaped = false; + let mut i = 0; + + while i < bytes.len() { + let b = bytes[i]; + + // Handle escape inside strings. + if escaped { + escaped = false; + i += 1; + continue; + } + let in_str = in_double || in_single; + if in_str && b == b'\\' { + escaped = true; + i += 1; + continue; + } + if b == b'"' && !in_single { + in_double = !in_double; + i += 1; + continue; + } + if b == b'\'' && !in_double { + in_single = !in_single; + i += 1; + continue; + } + // Skip all bytes inside strings — the needle, `=`, and `{` + // must all be outside strings to count. + if in_str { + i += 1; + continue; + } + + // Outside any string: check if needle starts here. + if i + nlen <= bytes.len() && bytes[i..i + nlen] == *needle_bytes { + let abs = i; + let after = abs + nlen; + + // Left boundary. + let left_ok = abs == 0 || !is_js_ident_continue(bytes[abs - 1]); + // Right boundary. 
+ let right_ok = bytes.get(after).is_none_or(|b| !is_js_ident_continue(*b)); + + if left_ok && right_ok { + let assignment_ok = match require_assignment { + RequireAssignment::No => true, + RequireAssignment::Yes => { + // Continue the *same* string-state walk from + // `after` (which is guaranteed outside any + // string at this point) to find the first `{` + // outside strings, checking for a bare `=` + // along the way. + gap_has_assignment_then_brace(bytes, after) + } + }; + if assignment_ok { + return Some((abs, nlen)); + } + } + // Skip past the needle so the outer loop resumes after it + // (prevents re-matching the same position). + i = after; + continue; + } + + i += 1; + } + None +} + +/// Starting from `start` (guaranteed outside any string by the caller), +/// walk `bytes` tracking JS string state. Return `true` if a bare `=` +/// (outside strings, not part of `==`/`===`/`!=`/`!==`/`<=`/`>=`/`=>`) +/// is found before the first `{` (also outside strings). Return `false` +/// if `{` arrives before `=`, or if there is no `{` at all. +fn gap_has_assignment_then_brace(bytes: &[u8], start: usize) -> bool { + let mut in_double = false; + let mut in_single = false; + let mut escaped = false; + let mut found_eq = false; + let mut i = start; + + while i < bytes.len() { + let b = bytes[i]; + if escaped { + escaped = false; + i += 1; + continue; + } + let in_str = in_double || in_single; + if in_str && b == b'\\' { + escaped = true; + i += 1; + continue; + } + if b == b'"' && !in_single { + in_double = !in_double; + i += 1; + continue; + } + if b == b'\'' && !in_double { + in_single = !in_single; + i += 1; + continue; + } + if in_str { + i += 1; + continue; + } + // Outside any string. + if b == b'{' { + return found_eq; + } + if b == b'=' { + let prev = if i == start { 0 } else { bytes[i - 1] }; + let next = bytes.get(i + 1).copied().unwrap_or(0); + if !matches!(prev, b'=' | b'!' 
| b'<' | b'>') && !matches!(next, b'=' | b'>') { + found_eq = true; + } + } + i += 1; + } + false +} + +/// Find the first `{` in `bytes` starting from `start` that is +/// outside any JS string literal. Returns `None` if there is no `{` +/// outside strings. The caller must ensure `start` is outside a +/// string (this invariant is upheld by `find_assignment_marker`, +/// which only yields needle positions that are outside strings). +fn find_brace_outside_strings(bytes: &[u8], start: usize) -> Option<usize> { + let mut in_double = false; + let mut in_single = false; + let mut escaped = false; + let mut i = start; + while i < bytes.len() { + let b = bytes[i]; + if escaped { + escaped = false; + i += 1; + continue; + } + let in_str = in_double || in_single; + if in_str && b == b'\\' { + escaped = true; + i += 1; + continue; + } + if b == b'"' && !in_single { + in_double = !in_double; + i += 1; + continue; + } + if b == b'\'' && !in_double { + in_single = !in_single; + i += 1; + continue; + } + if !in_str && b == b'{' { + return Some(i); + } + i += 1; + } + None +} + +/// Standalone wrapper around `gap_has_assignment_then_brace` for +/// direct unit testing. The gap is treated as starting outside any +/// string — the `find_assignment_marker` single-pass scan guarantees +/// this invariant for real call sites. A trailing `{` sentinel is +/// appended so the helper can terminate. +#[cfg(test)] +fn gap_contains_assignment(gap: &str) -> bool { + let with_brace = format!("{gap}{{"); + gap_has_assignment_then_brace(with_brace.as_bytes(), 0) +} + +/// JavaScript ASCII identifier-continuation check. +/// +/// Full Unicode identifiers are out of scope for the HTML-embedded +/// `playerConfig` marker scan — Vimeo's page always uses plain ASCII +/// for the assignment — but `$` must be included alongside the +/// standard `[A-Za-z0-9_]` class because it is a legal identifier +/// character in JavaScript and appears in minified bundles.
+fn is_js_ident_continue(b: u8) -> bool { + b.is_ascii_alphanumeric() || b == b'_' || b == b'$' +} + +pub fn build_oembed_request(video_url: &str) -> Result<String, PluginError> { + let url = format!( + "https://vimeo.com/api/oembed.json?url={}", + urlencode(video_url) + ); + let req = HttpRequest { + method: "GET".into(), + url, + headers: HashMap::new(), + body: None, + }; + Ok(serde_json::to_string(&req)?) +} + +pub fn build_player_config_request(video_id: &str) -> Result<String, PluginError> { + let url = format!("https://player.vimeo.com/video/{video_id}/config"); + let req = HttpRequest { + method: "GET".into(), + url, + headers: HashMap::new(), + body: None, + }; + Ok(serde_json::to_string(&req)?) +} + +pub fn parse_http_response(raw: &str) -> Result<HttpResponse, PluginError> { + serde_json::from_str(raw).map_err(|e| PluginError::HostResponse(e.to_string())) +} + +fn urlencode(s: &str) -> String { + let mut out = String::with_capacity(s.len()); + for b in s.bytes() { + if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') { + out.push(b as char); + } else { + out.push_str(&format!("%{:02X}", b)); + } + } + out +} + +#[cfg(test)] +mod tests { + use super::*; + + const OEMBED_JSON: &str = r#"{ + "type": "video", + "version": "1.0", + "title": "Sintel trailer", + "description": "Third open movie by the Blender Foundation.", + "author_name": "Blender Foundation", + "author_url": "https://vimeo.com/blender", + "thumbnail_url": "https://i.vimeocdn.com/video/1.jpg", + "duration": 52, + "video_id": 123456789 + }"#; + + const PLAYER_CONFIG_JSON: &str = r#"{ + "request": { + "files": { + "progressive": [ + { + "profile": 164, + "quality": "360p", + "width": 640, + "height": 360, + "fps": 24.0, + "mime": "video/mp4", + "url": "https://vod.vimeo.com/360.mp4" + }, + { + "profile": 165, + "quality": "720p", + "width": 1280, + "height": 720, + "fps": 24.0, + "mime": "video/mp4", + "url": "https://vod.vimeo.com/720.mp4" + }, + { + "profile": 174, + "quality": "1080p", + "width": 1920, + "height": 1080, + "fps": 24.0, +
"mime": "video/mp4", + "url": "https://vod.vimeo.com/1080.mp4" + } + ], + "hls": { + "cdns": { + "akfire": { + "url": "https://akamai.vimeo.com/master.m3u8", + "avc_url": "https://akamai.vimeo.com/avc.m3u8" + } + }, + "default_cdn": "akfire" + } + } + }, + "video": { "id": 123456789, "title": "Sintel trailer", "duration": 52 } + }"#; + + #[test] + fn parse_oembed_accepts_video_type() { + let r = parse_oembed(OEMBED_JSON).unwrap(); + assert_eq!(r.title, "Sintel trailer"); + assert_eq!(r.duration, Some(52)); + assert_eq!(r.video_id, Some(123456789)); + } + + #[test] + fn parse_oembed_rejects_non_video_type() { + let json = r#"{"type": "photo", "title": "x"}"#; + let err = parse_oembed(json).unwrap_err(); + assert!(matches!(err, PluginError::UnsupportedUrl(_))); + } + + #[test] + fn parse_player_config_accepts_single_quoted_js_literal() { + // Vimeo's HTML-embedded player config can be a JS object + // literal with single-quoted strings. `parse_player_config` + // must normalise this into JSON before handing it to serde. 
+ let raw = r#"{ + 'request': { + 'files': { + 'progressive': [ + { + 'profile': 164, + 'quality': '720p', + 'width': 1280, + 'height': 720, + 'fps': 24.0, + 'mime': 'video/mp4', + 'url': 'https://vod.vimeo.com/720.mp4' + } + ] + } + } + }"#; + let c = parse_player_config(raw).unwrap(); + assert_eq!(c.request.files.progressive.len(), 1); + assert_eq!(c.request.files.progressive[0].quality, "720p"); + assert_eq!( + c.request.files.progressive[0].url, + "https://vod.vimeo.com/720.mp4" + ); + } + + #[test] + fn parse_player_config_accepts_mixed_quoting() { + let raw = r#"{ + "request": { + "files": { + 'progressive': [ + {"profile": 1, "quality": "360p", "url": 'https://vod.vimeo.com/360.mp4'} + ] + } + } + }"#; + let c = parse_player_config(raw).unwrap(); + assert_eq!( + c.request.files.progressive[0].url, + "https://vod.vimeo.com/360.mp4" + ); + } + + #[test] + fn js_object_literal_preserves_double_quoted_apostrophe() { + let input = r#"{"title":"don't stop"}"#; + let out = js_object_literal_to_json(input); + // Strict JSON pass-through — no `'` outside strings, nothing rewritten. + assert_eq!(out, input); + } + + #[test] + fn js_object_literal_preserves_utf8_content() { + // A title with accented characters, em-dashes, and emoji must + // round-trip through the rewrite without corruption. Iterating + // bytes and casting each to char would split multi-byte UTF-8 + // sequences across multiple `push` calls. + let input = r#"{'title':'Éclair — intro 🎬','n':1}"#; + let out = js_object_literal_to_json(input); + assert_eq!(out, r#"{"title":"Éclair — intro 🎬","n":1}"#); + // And it should parse as valid JSON. + let v: serde_json::Value = serde_json::from_str(&out).unwrap(); + assert_eq!(v["title"], "Éclair — intro 🎬"); + } + + #[test] + fn js_object_literal_preserves_utf8_inside_double_quoted() { + // Double-quoted strings must also round-trip UTF-8 intact. 
+ let input = r#"{"title":"Élodie: «bonjour»"}"#; + let out = js_object_literal_to_json(input); + assert_eq!(out, r#"{"title":"Élodie: «bonjour»"}"#); + } + + #[test] + fn parse_player_config_accepts_js_literal_with_utf8() { + let raw = r#"{ + 'request': { + 'files': { + 'progressive': [ + { + 'quality': '720p', + 'url': 'https://vod.vimeo.com/720.mp4' + } + ] + } + }, + 'video': { + 'title': 'Éclair — intro 🎬' + } + }"#; + let c = parse_player_config(raw).unwrap(); + assert_eq!(c.request.files.progressive[0].quality, "720p"); + assert_eq!(c.video.unwrap().title.unwrap(), "Éclair — intro 🎬"); + } + + #[test] + fn js_object_literal_converts_escaped_single_quote() { + let input = r#"{'title':'it\'s fine'}"#; + let out = js_object_literal_to_json(input); + assert_eq!(out, r#"{"title":"it's fine"}"#); + } + + #[test] + fn parse_player_config_all_qualities() { + let c = parse_player_config(PLAYER_CONFIG_JSON).unwrap(); + let qualities: Vec<_> = c + .request + .files + .progressive + .iter() + .map(|e| e.quality.as_str()) + .collect(); + assert_eq!(qualities, vec!["360p", "720p", "1080p"]); + assert!(c.request.files.hls.is_some()); + } + + #[test] + fn player_config_heights_preserved() { + let c = parse_player_config(PLAYER_CONFIG_JSON).unwrap(); + let heights: Vec<_> = c + .request + .files + .progressive + .iter() + .map(|e| e.height) + .collect(); + assert_eq!(heights, vec![Some(360), Some(720), Some(1080)]); + } + + #[test] + fn extract_player_config_simple_brace_balanced() { + let html = r#""#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"a":1,"b":{"c":"}"}}"#); + } + + #[test] + fn extract_player_config_escaped_quote_in_string() { + let html = r#"playerConfig = {"title":"he said \"hi\"","n":1};"#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"title":"he said \"hi\"","n":1}"#); + } + + #[test] + fn extract_player_config_not_found() { + let html = "no config here"; + let err = 
extract_player_config_from_html(html).unwrap_err(); + assert!(matches!(err, PluginError::PlayerConfigNotFound)); + } + + #[test] + fn extract_player_config_handles_single_quoted_strings() { + let html = r#""#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{'url':'has}brace','n':1}"#); + } + + #[test] + fn extract_player_config_skips_meta_tag_mention() { + let html = r#""#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"n":1}"#); + } + + #[test] + fn extract_player_config_skips_similar_prefixes() { + // `window.playerConfigVersion` must NOT be mistaken for the + // real `window.playerConfig` assignment. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_rejects_left_boundary_violation() { + // `mywindow.playerConfig` must not match `window.playerConfig` + // because the byte before `window` is an identifier character. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_rejects_equality_comparison() { + // `window.playerConfig === null` is not an assignment, but + // the old `gap.contains('=')` check would accept it because + // `===` contains `=`. The new `gap_contains_assignment` + // helper rejects this. 
+ let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_rejects_loose_equality_and_inequality() { + let html_eq = r#" + + "#; + assert_eq!( + extract_player_config_from_html(html_eq).unwrap(), + r#"{"real": true}"# + ); + + let html_neq = r#" + + "#; + assert_eq!( + extract_player_config_from_html(html_neq).unwrap(), + r#"{"real": true}"# + ); + } + + #[test] + fn extract_player_config_rejects_arrow_function_reference() { + // `window.playerConfig => { ... }` is syntactically nonsense, + // but a gap with `=>` must still be rejected as non-assignment. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn gap_contains_assignment_accepts_bare_equal() { + assert!(gap_contains_assignment(" = ")); + assert!(gap_contains_assignment("=")); + assert!(gap_contains_assignment("\t= \n")); + } + + #[test] + fn gap_contains_assignment_ignores_equals_inside_string_literals() { + // `=` inside a double-quoted string literal is not an + // assignment — it is data. The scanner must ignore it so + // that a decoy `"window.playerConfig = ..."` string cannot + // fool the marker check. + assert!(!gap_contains_assignment( + r#" msg = "not = here" "#.split('=').next().unwrap() + )); + assert!(!gap_contains_assignment(r#" "has = inside" "#)); + assert!(!gap_contains_assignment(r#" 'single = quoted' "#)); + // A real `=` *after* the string must still be detected. + assert!(gap_contains_assignment(r#" "prefix" = "#)); + } + + #[test] + fn gap_contains_assignment_handles_escaped_quotes() { + // Escaped quote inside a string does not close the string, + // so the `=` that follows is still inside the literal. 
+ assert!(!gap_contains_assignment(r#" "it \"= inside\"" "#)); + } + + #[test] + fn extract_player_config_skips_marker_inside_already_open_string() { + // The marker `window.playerConfig` appears inside a string + // that was already open before the marker starts. The + // single-pass scanner must track string state from position 0 + // so it knows the marker is still inside the string, even + // though `gap_contains_assignment` starts clean. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_skips_decoy_inside_string_literal() { + // A JavaScript snippet that embeds the playerConfig marker + // inside a string literal must not be picked as the real + // assignment site. The decoy `=` inside the string is now + // ignored, so the balanced-brace scanner reaches the real + // assignment below. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn gap_contains_assignment_rejects_comparisons_and_arrows() { + assert!(!gap_contains_assignment(" == ")); + assert!(!gap_contains_assignment(" === ")); + assert!(!gap_contains_assignment(" != ")); + assert!(!gap_contains_assignment(" !== ")); + assert!(!gap_contains_assignment(" <= ")); + assert!(!gap_contains_assignment(" >= ")); + assert!(!gap_contains_assignment(" => ")); + assert!(!gap_contains_assignment("")); + assert!(!gap_contains_assignment("no equals here")); + } + + #[test] + fn extract_player_config_rejects_non_assignment_reference() { + // A reference like `console.log(window.playerConfig)` appears + // before the real assignment. The scanner must walk past it + // because the gap between the needle end and the next `{` + // does not contain an `=` operator. 
+ let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_fallback_marker_still_works() { + // The FALLBACK marker `playerConfig =` already contains `=` + // so the gap check is skipped — it must still find the + // assignment. + let html = r#""#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"fallback": true}"#); + } + + #[test] + fn extract_player_config_rejects_dollar_sign_identifier_continuation() { + // `$` is a legal JavaScript identifier character, so + // `window.playerConfig$legacy` must not be mistaken for + // `window.playerConfig`. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn extract_player_config_skips_similar_prefixes_for_fallback_marker() { + // Fallback `playerConfig =` must also observe the word boundary. + let html = r#" + + "#; + let json = extract_player_config_from_html(html).unwrap(); + assert_eq!(json, r#"{"real": true}"#); + } + + #[test] + fn build_oembed_request_url_encoded() { + let req = build_oembed_request("https://vimeo.com/123456789").unwrap(); + assert!(req.contains("\"method\":\"GET\"")); + assert!(req.contains("url=https%3A%2F%2Fvimeo.com%2F123456789")); + } + + #[test] + fn build_player_config_request_shape() { + let req = build_player_config_request("123456789").unwrap(); + assert!(req.contains("https://player.vimeo.com/video/123456789/config")); + } + + #[test] + fn http_response_private_when_401() { + let r = HttpResponse { + status: 401, + headers: HashMap::new(), + body: "x".into(), + }; + assert!(matches!( + r.into_success_body().unwrap_err(), + PluginError::Private(_) + )); + } +} diff --git a/plugins/vortex-mod-vimeo/src/plugin_api.rs b/plugins/vortex-mod-vimeo/src/plugin_api.rs new file mode 100644 index 0000000..3d91d28 --- /dev/null +++ 
b/plugins/vortex-mod-vimeo/src/plugin_api.rs @@ -0,0 +1,196 @@ +//! WASM-only module: `#[plugin_fn]` exports and `#[host_fn]` imports. + +use extism_pdk::*; + +use crate::error::PluginError; +use crate::parser::{ + build_oembed_request, build_player_config_request, extract_player_config_from_html, + parse_http_response, parse_oembed, parse_player_config, +}; +use crate::url_matcher::extract_video_id; +use crate::{ + build_media_variants_response, build_single_video_response, ensure_single_video, + filter_audio_only, handle_can_handle, handle_supports_playlist, pick_variant_for_quality, + MediaVariant, MediaVariantsResponse, +}; + +#[host_fn] +extern "ExtismHost" { + fn http_request(req: String) -> String; + fn get_config(key: String) -> String; +} + +#[plugin_fn] +pub fn can_handle(url: String) -> FnResult { + Ok(handle_can_handle(&url)) +} + +#[plugin_fn] +pub fn supports_playlist(url: String) -> FnResult { + Ok(handle_supports_playlist(&url)) +} + +#[plugin_fn] +pub fn extract_links(url: String) -> FnResult { + // Use `ensure_single_video` rather than `ensure_vimeo_url` so that + // showcase URLs are rejected at the entrypoint and never reach + // `build_single_video_response`. Showcase extraction is handled by + // `extract_playlist`, which currently returns a clear unsupported + // error until the token-gated showcase endpoint is wired up. + ensure_single_video(&url).map_err(error_to_fn_error)?; + + let oembed = fetch_oembed(&url)?; + // Pass the original URL through so private share links + // (`vimeo.com//`) retain their hash token. + let response = build_single_video_response(oembed, &url); + Ok(serde_json::to_string(&response)?) 
+} + +#[plugin_fn] +pub fn get_media_variants(url: String) -> FnResult { + ensure_single_video(&url).map_err(error_to_fn_error)?; + + let video_id = extract_video_id(&url) + .ok_or_else(|| error_to_fn_error(PluginError::UnsupportedUrl(url.clone())))?; + let config = fetch_player_config(&video_id)?; + let variants = build_media_variants_response(config); + let filtered = if audio_only_preference() { + filter_audio_only(variants) + } else { + variants + }; + // Honour the user-configured `default_quality` by hoisting the + // best matching variant to the head of the list. The host renders + // the first entry as the default selection in the UI, so a stable + // ordering plus a hoist gives us both deterministic output and + // respect for the configured preference. + let reordered = apply_quality_preference(filtered); + Ok(serde_json::to_string(&reordered)?) +} + +#[plugin_fn] +pub fn extract_playlist(_url: String) -> FnResult { + // Showcase / album extraction is not implemented in the MVP — the + // oEmbed endpoint does not enumerate showcase entries and the + // relevant API endpoint requires an access token. Return a clear + // error so the UI can surface an appropriate message. + Err(error_to_fn_error(PluginError::UnsupportedUrl( + "showcase extraction is not implemented yet".into(), + ))) +} + +// ── Host function wiring ────────────────────────────────────────────────────── + +fn fetch_oembed(video_url: &str) -> FnResult { + let req = build_oembed_request(video_url).map_err(error_to_fn_error)?; + // SAFETY: `http_request` is resolved by the Vortex plugin host at + // load time (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_http_request_function`). Invariants: + // 1. The host registers `http_request` in the `ExtismHost` + // namespace before any `#[plugin_fn]` export is callable. + // 2. The ABI is `(I64) -> I64`; the `#[host_fn]` macro marshals + // `String` in/out through Extism memory handles. + // 3. 
The host gates the call on the `http` capability from + // `plugin.toml`; rejections return an error which `?` surfaces. + // 4. Inputs/outputs are owned JSON strings — no aliasing. + let raw = unsafe { http_request(req)? }; + let resp = parse_http_response(&raw).map_err(error_to_fn_error)?; + let body = resp.into_success_body().map_err(error_to_fn_error)?; + parse_oembed(&body).map_err(error_to_fn_error) +} + +fn fetch_player_config(video_id: &str) -> FnResult { + let req = build_player_config_request(video_id).map_err(error_to_fn_error)?; + // SAFETY: identical host-function invariants to `fetch_oembed` + // above — the host-side symbol, ABI, capability gate, and owned + // JSON I/O all apply unchanged. See `fetch_oembed` for the full + // list. + let raw = unsafe { http_request(req)? }; + let resp = parse_http_response(&raw).map_err(error_to_fn_error)?; + let body = resp.into_success_body().map_err(error_to_fn_error)?; + + // Vimeo returns JSON directly for /config. If the body happens to be + // an HTML page (e.g. geo-blocked fallback) try to extract the config + // block before giving up. + match parse_player_config(&body) { + Ok(cfg) => Ok(cfg), + Err(_) => { + let json = extract_player_config_from_html(&body).map_err(error_to_fn_error)?; + parse_player_config(json).map_err(error_to_fn_error) + } + } +} + +/// Hoist the variant matching the user's `default_quality` preference +/// to the front of the list. The remaining entries keep their original +/// sort order from `build_media_variants_response`. If the config key +/// is missing, empty, or matches no progressive variant, the list is +/// returned unchanged. 
+fn apply_quality_preference(mut response: MediaVariantsResponse) -> MediaVariantsResponse { + let preferred = default_quality_preference(); + if preferred.is_empty() { + return response; + } + let Some(target_url) = + pick_variant_for_quality(&response.variants, &preferred).map(|v| v.url.clone()) + else { + return response; + }; + // Re-order in place: pull the match out, push it to the front. + if let Some(pos) = response + .variants + .iter() + .position(|v: &MediaVariant| v.url == target_url) + { + let picked = response.variants.remove(pos); + response.variants.insert(0, picked); + } + response +} + +fn default_quality_preference() -> String { + // SAFETY: identical host-function invariants to + // `audio_only_preference` below — the host symbol is registered, + // the ABI is `(I64) -> I64`, capability gating is manifest-driven, + // and the returned string is owned. + unsafe { get_config("default_quality".to_string()) }.unwrap_or_default() +} + +/// Accepted truthy string values for boolean config keys sourced via +/// `get_config("extract_audio_only")` and any future boolean host +/// setting. The comparison is case-insensitive (values are lowercased +/// before the match), and any value outside this list falls back to +/// the documented default of `false`. +/// +/// Keeping this list in one place makes the convention discoverable +/// and prevents drift if another config key later adopts the same +/// parser. +const TRUTHY_VALUES: &[&str] = &["true", "1", "yes"]; + +fn is_truthy(value: &str) -> bool { + let lower = value.to_ascii_lowercase(); + TRUTHY_VALUES.iter().any(|&v| v == lower) +} + +fn audio_only_preference() -> bool { + // Reads `get_config("extract_audio_only")` and interprets the + // returned string via [`is_truthy`] / [`TRUTHY_VALUES`]. + // + // SAFETY: `get_config` is registered host-side before plugin exports + // run (see src-tauri/src/adapters/driven/plugin/host_functions.rs: + // `make_get_config_function`). Invariants: + // 1. 
The symbol is registered in the `ExtismHost` namespace + // before any `#[plugin_fn]` export is callable. + // 2. The ABI is `(I64) -> I64`; the `#[host_fn]` macro marshals + // `String` in/out. + // 3. A missing key or transient error yields the empty default + // which falls through to `false` — the documented default for + // `extract_audio_only`. + // 4. Inputs/outputs are owned JSON strings — no aliasing concerns. + let value = unsafe { get_config("extract_audio_only".to_string()) }.unwrap_or_default(); + is_truthy(&value) +} + +fn error_to_fn_error(err: PluginError) -> WithReturnCode { + extism_pdk::Error::msg(err.to_string()).into() +} diff --git a/plugins/vortex-mod-vimeo/src/url_matcher.rs b/plugins/vortex-mod-vimeo/src/url_matcher.rs new file mode 100644 index 0000000..0ddf7ac --- /dev/null +++ b/plugins/vortex-mod-vimeo/src/url_matcher.rs @@ -0,0 +1,263 @@ +//! Vimeo URL detection and classification. +//! +//! ## Accepted URL shapes +//! +//! - `vimeo.com/` — public video +//! - `vimeo.com//` — private video share link (hash is a token) +//! - `vimeo.com/showcase/` or `vimeo.com/album/` — playlist +//! - `vimeo.com/ondemand/` — rejected (paid content, out of scope) +//! - `player.vimeo.com/video/` — embedded player URL +//! +//! `vimeo.com/` artist profiles are rejected here because Vimeo's +//! public HTML for profiles is inconsistent; the MVP focuses on video +//! and showcase extraction. + +use std::sync::OnceLock; + +use regex::Regex; + +/// Kind of Vimeo resource identified from a URL. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum UrlKind { + /// Single video: `vimeo.com/` or `player.vimeo.com/video/` + Video, + /// Private video share: `vimeo.com//` + PrivateVideo, + /// Showcase / album: `vimeo.com/showcase/` or `vimeo.com/album/` + Showcase, + /// Not a recognised Vimeo URL. + Unknown, +} + +/// Returns `true` if the URL is any form of recognised Vimeo resource. 
+pub fn is_vimeo_url(url: &str) -> bool { + !matches!(classify_url(url), UrlKind::Unknown) +} + +/// Classify the URL into a [`UrlKind`]. +pub fn classify_url(url: &str) -> UrlKind { + let Some((host_lower, path)) = validate_and_split(url) else { + return UrlKind::Unknown; + }; + + if !is_vimeo_host(&host_lower) { + return UrlKind::Unknown; + } + + let path_only = normalize_path(path); + + // player.vimeo.com/video/ + if host_lower == "player.vimeo.com" { + return if player_video_regex().is_match(path_only) { + UrlKind::Video + } else { + UrlKind::Unknown + }; + } + + // vimeo.com family + if showcase_or_album_regex().is_match(path_only) { + return UrlKind::Showcase; + } + if private_video_regex().is_match(path_only) { + return UrlKind::PrivateVideo; + } + if video_regex().is_match(path_only) { + return UrlKind::Video; + } + UrlKind::Unknown +} + +/// Strip query string, fragment, and trailing slash from a raw +/// path-and-query slice. `path#frag?q` (malformed but tolerated) is +/// handled by splitting on `#` first. +fn normalize_path(path: &str) -> &str { + let no_frag = path.split('#').next().unwrap_or(""); + let no_query = no_frag.split('?').next().unwrap_or(""); + no_query.trim_end_matches('/') +} + +fn is_vimeo_host(host: &str) -> bool { + matches!( + host, + "vimeo.com" | "www.vimeo.com" | "player.vimeo.com" | "m.vimeo.com" + ) +} + +// All four URL-classification regexes are compile-time constants: +// `.expect` documents the invariant and honours the crate-wide +// policy that production code paths must not `.unwrap()`. 
+ +fn video_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new(r"^/(\d{6,})$").expect("video_regex: compile-time constant regex must compile") + }) +} + +fn private_video_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new(r"^/(\d{6,})/([a-f0-9]{8,})$") + .expect("private_video_regex: compile-time constant regex must compile") + }) +} + +fn showcase_or_album_regex() -> &'static Regex { + // Fully anchored — trailing junk like `/foo/bar` after the numeric + // ID must not match. Callers normalise query/fragment/trailing + // slash before passing the path to this regex. + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new(r"^/(?:showcase|album)/(\d+)$") + .expect("showcase_or_album_regex: compile-time constant regex must compile") + }) +} + +fn player_video_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new(r"^/video/(\d{6,})$") + .expect("player_video_regex: compile-time constant regex must compile") + }) +} + +/// Extract the numeric video ID from a URL or return `None` if the URL is +/// not a video / private-video / player shape. Used to build the player +/// config request (the oEmbed request takes the full URL instead). +pub fn extract_video_id(url: &str) -> Option<String> { + let (_, path) = validate_and_split(url)?; + let path_only = normalize_path(path); + + if let Some(caps) = private_video_regex().captures(path_only) { + return caps.get(1).map(|m| m.as_str().to_string()); + } + if let Some(caps) = video_regex().captures(path_only) { + return caps.get(1).map(|m| m.as_str().to_string()); + } + if let Some(caps) = player_video_regex().captures(path_only) { + return caps.get(1).map(|m| m.as_str().to_string()); + } + None +} + +/// Extract the numeric showcase / album ID from a URL or return `None`.
+pub fn extract_showcase_id(url: &str) -> Option<String> {
+    let (_, path) = validate_and_split(url)?;
+    let path_only = normalize_path(path);
+    showcase_or_album_regex()
+        .captures(path_only)
+        .and_then(|c| c.get(1).map(|m| m.as_str().to_string()))
+}
+
+fn validate_and_split(url: &str) -> Option<(String, &str)> {
+    let (scheme, rest) = url.split_once("://")?;
+    if !matches!(scheme.to_ascii_lowercase().as_str(), "http" | "https") {
+        return None;
+    }
+    let (authority, path_and_query) = match rest.find('/') {
+        Some(idx) => (&rest[..idx], &rest[idx..]),
+        None => (rest, ""),
+    };
+    let authority_no_user = authority.rsplit('@').next().unwrap_or(authority);
+    let host = extract_host(authority_no_user)?;
+    Some((host.to_ascii_lowercase(), path_and_query))
+}
+
+/// Extract the host portion (without port) from an authority string.
+/// Handles both plain hosts/IPv4 and bracketed IPv6 literals; see
+/// the equivalent helper in the gallery plugin for the full policy.
+fn extract_host(authority: &str) -> Option<&str> {
+    if authority.is_empty() {
+        return None;
+    }
+    if authority.starts_with('[') {
+        let close = authority.find(']')?;
+        Some(&authority[..=close])
+    } else {
+        let host = authority.split(':').next().unwrap_or(authority);
+        if host.is_empty() {
+            None
+        } else {
+            Some(host)
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use rstest::rstest;
+
+    #[rstest]
+    #[case("https://vimeo.com/123456789", UrlKind::Video)]
+    #[case("https://www.vimeo.com/123456789", UrlKind::Video)]
+    #[case("https://vimeo.com/123456789/abcdef1234", UrlKind::PrivateVideo)]
+    #[case("https://vimeo.com/showcase/98765", UrlKind::Showcase)]
+    #[case("https://vimeo.com/album/54321", UrlKind::Showcase)]
+    #[case("https://player.vimeo.com/video/123456789", UrlKind::Video)]
+    #[case("https://vimeo.com/123456789?autoplay=1", UrlKind::Video)]
+    #[case("https://vimeo.com/123456789/", UrlKind::Video)]
+    #[case("https://vimeo.com/ondemand/example", UrlKind::Unknown)]
+    #[case("https://vimeo.com/user/foo", UrlKind::Unknown)]
+    #[case("https://example.com/?u=vimeo.com/123", UrlKind::Unknown)]
+    #[case("not a url", UrlKind::Unknown)]
+    // Fragment stripping: `#t=30s` timestamps must not reclassify the URL.
+    #[case("https://vimeo.com/123456789#t=30", UrlKind::Video)]
+    #[case(
+        "https://vimeo.com/123456789/abcdef1234#comment",
+        UrlKind::PrivateVideo
+    )]
+    #[case("https://vimeo.com/showcase/98765#intro", UrlKind::Showcase)]
+    // Showcase regex is anchored: junk after the numeric id is rejected.
+    #[case("https://vimeo.com/showcase/98765/extra/segments", UrlKind::Unknown)]
+    fn test_classify_url(#[case] url: &str, #[case] expected: UrlKind) {
+        assert_eq!(classify_url(url), expected);
+    }
+
+    #[test]
+    fn extract_video_id_public() {
+        assert_eq!(
+            extract_video_id("https://vimeo.com/123456789"),
+            Some("123456789".into())
+        );
+    }
+
+    #[test]
+    fn extract_video_id_private() {
+        assert_eq!(
+            extract_video_id("https://vimeo.com/123456789/abcdef1234"),
+            Some("123456789".into())
+        );
+    }
+
+    #[test]
+    fn extract_video_id_player() {
+        assert_eq!(
+            extract_video_id("https://player.vimeo.com/video/123456789"),
+            Some("123456789".into())
+        );
+    }
+
+    #[test]
+    fn extract_video_id_rejects_showcase() {
+        assert_eq!(extract_video_id("https://vimeo.com/showcase/1"), None);
+    }
+
+    #[test]
+    fn extract_showcase_id_matches_showcase_and_album() {
+        assert_eq!(
+            extract_showcase_id("https://vimeo.com/showcase/98765"),
+            Some("98765".into())
+        );
+        assert_eq!(
+            extract_showcase_id("https://vimeo.com/album/54321"),
+            Some("54321".into())
+        );
+    }
+
+    #[test]
+    fn is_vimeo_url_sanity() {
+        assert!(is_vimeo_url("https://vimeo.com/1234567"));
+        assert!(!is_vimeo_url("https://example.com/"));
+    }
+}
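
Reviewer sketch (not part of the diff): the normalise-then-match order used for the `vimeo.com` family can be tried without the `regex` crate. This is a minimal std-only approximation, assuming the path has already been separated from the host. `Kind`, `normalize`, `is_digits`, `is_lower_hex`, and `classify_path` are hypothetical names that stand in for `UrlKind`, `normalize_path`, and the anchored regexes; they are not the plugin's API. The `player.vimeo.com` branch is left out.

```rust
// Std-only sketch: hand-rolled matchers stand in for the anchored regexes
// in the diff. All names here are hypothetical.
#[derive(Debug, PartialEq)]
enum Kind {
    Video,
    PrivateVideo,
    Showcase,
    Unknown,
}

/// Drop the fragment, then the query, then trailing slashes
/// (the same order as `normalize_path` in the diff).
fn normalize(path: &str) -> &str {
    let no_frag = path.split('#').next().unwrap_or("");
    let no_query = no_frag.split('?').next().unwrap_or("");
    no_query.trim_end_matches('/')
}

/// Approximates `\d{min,}` for `min >= 1`. ASCII digits only, whereas the
/// regex crate's `\d` is Unicode-aware by default.
fn is_digits(s: &str, min: usize) -> bool {
    s.len() >= min && s.bytes().all(|b| b.is_ascii_digit())
}

/// Approximates `[a-f0-9]{8,}`.
fn is_lower_hex(s: &str) -> bool {
    s.len() >= 8 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Classify a raw path-and-query slice. Matching on the exact number of
/// segments plays the role of the `^...$` anchors.
fn classify_path(raw_path: &str) -> Kind {
    let path = normalize(raw_path);
    let Some(rest) = path.strip_prefix('/') else {
        return Kind::Unknown;
    };
    let segments: Vec<&str> = rest.split('/').collect();
    match segments.as_slice() {
        ["showcase" | "album", id] if is_digits(id, 1) => Kind::Showcase,
        [id, hash] if is_digits(id, 6) && is_lower_hex(hash) => Kind::PrivateVideo,
        [id] if is_digits(id, 6) => Kind::Video,
        _ => Kind::Unknown,
    }
}
```

Under these assumptions, `classify_path("/123456789/?autoplay=1#t=30")` yields `Kind::Video` and `classify_path("/showcase/98765/extra/segments")` yields `Kind::Unknown`, matching the corresponding `rstest` cases.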