feat(plugin): implement YouTube WASM plugin via yt-dlp#23
Conversation
Introduce vortex-mod-youtube, the first production crawler plugin, as an independent Cargo crate targeting wasm32-wasip1 with extism-pdk 1.4. The plugin resolves YouTube videos, shorts, playlists, and channels by delegating media extraction to yt-dlp through the Vortex run_subprocess host function — keeping the WASM sandbox boundary intact. Architecture split for testability: all business logic (URL matching, yt-dlp JSON parsing, quality selection, subprocess request/response helpers) lives in native-compilable modules, and a thin `plugin_api` module gated behind `cfg(target_family = "wasm")` holds the `#[plugin_fn]` exports and `#[host_fn]` imports. This lets 77 unit tests run under the host toolchain without a WASM runtime while still producing a 1.2 MB stripped release binary. Security hardening applied from code review: - `--` sentinel before every yt-dlp URL arg (option-injection defense in depth beyond host-side URL validation) - UTF-8-safe stderr truncation (char boundaries, never byte offsets) to avoid WASM aborts on non-ASCII yt-dlp output - `PluginError::SerdeJson` with `#[from]` preserves the underlying serde_json error source chain - Least-privilege manifest: only `subprocess = ["yt-dlp"]` declared, no `http` capability (all network I/O goes through yt-dlp itself) - Documented safety invariants on the sole `unsafe` block around the `run_subprocess` host-function call - Zero `.unwrap()`/`.expect()` in production paths per CLAUDE.md - Flat-playlist parser now accepts both JSONL and envelope formats (yt-dlp switches between them depending on channel URL shape) Covers specs/01-mvp/24-youtube-plugin.md success criteria.
📝 WalkthroughWalkthroughAdds a new Vortex YouTube WASM plugin crate ( Changes
Sequence DiagramsequenceDiagram
autonumber
actor Client as Client/Host
participant Plugin as Vortex YouTube<br/>WASM Plugin
participant HostSubproc as Host<br/>run_subprocess
participant YtDlp as yt-dlp
Client->>Plugin: extract_links(youtube_url)
activate Plugin
Plugin->>Plugin: ensure_youtube_url() / classify_url()
Plugin->>Plugin: build yt-dlp args (single/playlist)
Plugin->>HostSubproc: run_subprocess(JSON request)
activate HostSubproc
HostSubproc->>YtDlp: spawn yt-dlp with args
activate YtDlp
YtDlp-->>HostSubproc: stdout, stderr, exit_code
deactivate YtDlp
HostSubproc-->>Plugin: JSON response (exit_code, stdout, stderr)
deactivate HostSubproc
Plugin->>Plugin: parse stdout -> VideoInfo / Playlist
Plugin->>Plugin: build response DTO (ExtractLinksResponse / MediaVariants)
Plugin-->>Client: JSON result or mapped error
deactivate Plugin
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR introduces All findings are P2 — see inline comments for edge-case details. Confidence Score: 5/5Safe to merge — all findings are P2 edge cases with low real-world probability. No P0/P1 issues found. The four P2 findings (envelope heuristic false-positive, negative-duration silent truncation, empty-URL fallback, playlist URL accepted by get_media_variants) are all low-probability edge cases that don't affect the primary video/playlist extraction path. Security hardening (-- sentinel, char-boundary truncation, least-privilege manifest) is solid, and 77 unit tests cover the core logic thoroughly. plugins/vortex-mod-youtube/src/metadata.rs (envelope detection heuristic and parse_duration edge cases) Important Files Changed
Sequence DiagramsequenceDiagram
participant Host as Vortex Host
participant PA as plugin_api (WASM)
participant UM as url_matcher
participant EX as extractor
participant RS as run_subprocess (host fn)
participant MD as metadata
Host->>PA: extract_links(url)
PA->>UM: ensure_youtube_url(url)
PA->>UM: classify_url(url)
alt Video / Shorts
PA->>EX: yt_dlp_args_for_single_video(url)
else Playlist / Channel
PA->>EX: yt_dlp_args_for_playlist(url)
end
PA->>EX: build_subprocess_request(args)
PA->>RS: unsafe run_subprocess(req_json)
RS-->>PA: response_json
PA->>EX: parse_subprocess_response(response_json)
alt single video
PA->>MD: parse_single_video(stdout)
MD-->>PA: VideoInfo
PA->>PA: build_single_video_response(video)
else playlist
PA->>MD: parse_flat_playlist(stdout)
MD-->>PA: Playlist
PA->>PA: build_playlist_response(playlist)
end
PA-->>Host: JSON string (ExtractLinksResponse)
Reviews (1): Last reviewed commit: "feat(plugin): implement YouTube WASM plu..." | Re-trigger Greptile |
| if let Some(first_line) = trimmed.lines().next() { | ||
| let contains_entries_key = first_line.contains("\"entries\""); | ||
| let looks_single_object = trimmed.lines().filter(|l| !l.trim().is_empty()).count() == 1 | ||
| && trimmed.starts_with('{'); | ||
| if looks_single_object && contains_entries_key { | ||
| let env: RawPlaylistEnvelope = serde_json::from_str(trimmed)?; | ||
| return Ok(from_envelope(env)); | ||
| } | ||
| } |
There was a problem hiding this comment.
Envelope detection can false-positive on
"entries" in string values
first_line.contains("\"entries\"") scans raw JSON bytes and matches any field whose value is exactly the word entries — not just the object key. A single-entry JSONL output where, say, the playlist name is entries would produce a raw line containing "playlist":"entries", which embeds the substring "entries" verbatim. With looks_single_object also true, the code attempts deserialization as RawPlaylistEnvelope, fails (wrong type for the entries field), and the ? operator propagates the error — the JSONL fallback path never executes.
Anchoring the search to the key position (adding the colon separator) eliminates the false-positive: look for "entries": rather than just "entries".
| url: raw.webpage_url.or(raw.url).unwrap_or_default(), | ||
| duration: parse_duration(raw.duration), | ||
| thumbnail: raw.thumbnail, | ||
| } | ||
| } |
There was a problem hiding this comment.
When yt-dlp emits a flat-playlist entry with neither webpage_url nor url (e.g., a private/deleted video placeholder), unwrap_or_default() silently produces an empty string. Callers receive a PlaylistEntry with url: "" rather than an error, which will fail opaquely when the URL is later used. Consider returning an Option<PlaylistEntry> and filtering at the call site so empty-URL entries are simply dropped.
| fn parse_duration(value: Option<serde_json::Value>) -> Option<u64> { | ||
| value.and_then(|v| match v { | ||
| serde_json::Value::Number(n) => n.as_f64().map(|f| f.round() as u64), | ||
| serde_json::Value::String(s) => s.parse::<f64>().ok().map(|f| f.round() as u64), | ||
| _ => None, | ||
| }) | ||
| } |
There was a problem hiding this comment.
Saturating cast silently converts negative durations to
Some(0)
Since Rust 1.45 the as u64 cast is saturating, so (-1.0f64).round() as u64 == 0. Any yt-dlp output that emits a negative duration (malformed or live-stream sentinel) produces Some(0) rather than None, making it indistinguishable from a 0-second video. A guard before the cast makes the intent explicit:
| fn parse_duration(value: Option<serde_json::Value>) -> Option<u64> { | |
| value.and_then(|v| match v { | |
| serde_json::Value::Number(n) => n.as_f64().map(|f| f.round() as u64), | |
| serde_json::Value::String(s) => s.parse::<f64>().ok().map(|f| f.round() as u64), | |
| _ => None, | |
| }) | |
| } | |
| serde_json::Value::Number(n) => n.as_f64().and_then(|f| { | |
| if f < 0.0 { None } else { Some(f.round() as u64) } | |
| }), |
| pub fn get_media_variants(url: String) -> FnResult<String> { | ||
| ensure_youtube_url(&url).map_err(error_to_fn_error)?; | ||
|
|
||
| let stdout = call_yt_dlp(yt_dlp_args_for_single_video(&url))?; | ||
| let video = parse_single_video(&stdout).map_err(error_to_fn_error)?; | ||
| let variants = build_media_variants_response(video); | ||
| Ok(serde_json::to_string(&variants)?) | ||
| } |
There was a problem hiding this comment.
get_media_variants silently accepts playlist URLs
ensure_youtube_url accepts any recognised YouTube URL, including playlists and channels. When called with a playlist URL, yt_dlp_args_for_single_video (which includes --no-playlist) causes yt-dlp to extract the first video in the playlist, but the caller receives no indication that the returned variants belong to a different (first) video rather than the playlist itself. Guarding against non-single-video URLs would prevent this silent mismatch:
let kind = classify_url(&url);
if !matches!(kind, UrlKind::Video | UrlKind::Shorts) {
return Err(error_to_fn_error(PluginError::UnsupportedUrl(url)));
}There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (2)
plugins/vortex-mod-youtube/src/metadata.rs (1)
223-236: Try envelope deserialization directly instead of line heuristics.This only recognizes the envelope form when it is a single non-empty line and
"entries"appears on that first line. Any valid multi-line envelope falls through to JSONL parsing and errors. A directserde_json::from_str::<RawPlaylistEnvelope>attempt would make this parser much more robust.Suggested change
- if let Some(first_line) = trimmed.lines().next() { - let contains_entries_key = first_line.contains("\"entries\""); - let looks_single_object = trimmed.lines().filter(|l| !l.trim().is_empty()).count() == 1 - && trimmed.starts_with('{'); - if looks_single_object && contains_entries_key { - let env: RawPlaylistEnvelope = serde_json::from_str(trimmed)?; - return Ok(from_envelope(env)); - } + if trimmed.starts_with('{') { + if let Ok(env) = serde_json::from_str::<RawPlaylistEnvelope>(trimmed) { + return Ok(from_envelope(env)); + } } parse_jsonl_playlist(output)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/vortex-mod-youtube/src/metadata.rs` around lines 223 - 236, The current heuristic only treats the output as a RawPlaylistEnvelope when it is a single non-empty line containing "entries", which misses valid multi-line JSON envelopes; replace the heuristic with a direct attempt to deserialize into RawPlaylistEnvelope by calling serde_json::from_str::<RawPlaylistEnvelope>(trimmed) and, on Ok(env), return Ok(from_envelope(env)); only if that deserialization returns Err should you fall back to parse_jsonl_playlist(output). Update the block around trimmed, RawPlaylistEnvelope, from_envelope, and parse_jsonl_playlist to try serde deserialization first and then fallback to JSONL parsing.plugins/vortex-mod-youtube/src/quality_manager.rs (1)
163-181: Make same-height fallback deterministic.When
preferred_containermisses, the final.next()inherits yt-dlp’s input order. That can arbitrarily flip between muxed and video-only entries at the same height. Add an explicit secondary ranking in this bucket instead of relying on source order.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/vortex-mod-youtube/src/quality_manager.rs` around lines 163 - 181, The fallback picks the first entry from at_height which follows yt-dlp input order and can flip arbitrarily; make this deterministic by applying a stable secondary ranking on at_height before taking next(): sort the at_height bucket to prefer muxed entries (ones with audio) over video-only, then tie-break by a numeric quality metric such as bitrate or filesize (descending), and finally use a stable comparison on ext or format id to ensure total ordering; update the selection logic around at_height (the block using prefs.preferred_container, FormatEntry entries, chosen_height and the final ok_or(PluginError::NoMatchingFormat)) to use that sorted/stable iterator.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/vortex-mod-youtube/plugin.toml`:
- Line 16: Update the plugin configuration schema so the "best" preset is
selectable: modify the default_quality options in plugin.toml by adding "best"
to the options array for the default_quality entry (the existing line that
defines default_quality = { type = "string", default = "1080p", options = [...]
} ) so the list includes "best" alongside the other quality strings.
In `@plugins/vortex-mod-youtube/README.md`:
- Line 3: Replace the placeholder repository link "https://github.com/..." in
the README.md title line with the actual Vortex repository URL so readers can
navigate to the host project; update the text "YouTube WASM plugin for
[Vortex](https://github.com/...)." to reference the real repo (e.g.,
https://github.com/Nexus-Mods/Vortex) so the markdown link points to the correct
Vortex project.
- Around line 26-35: The README uses two different WASM filenames
(vortex_mod_youtube.wasm vs vortex-mod-youtube.wasm); pick one and make it
consistent across the build output line and the install tree. Update the
sentence that states the resulting WASM binary and the install example so both
reference the same filename (either vortex_mod_youtube.wasm or
vortex-mod-youtube.wasm) and ensure the plugin.toml/install path stays correct.
In `@plugins/vortex-mod-youtube/src/lib.rs`:
- Around line 76-79: handle_can_handle currently uses
url_matcher::is_youtube_url which is too broad; change it to call
classify_url(url) and return true only when classify_url returns one of the
specific URL kinds your handlers actually support (i.e., the same variants the
extractor/routing code accepts), so the routing contract matches extraction;
apply the same tightening to the other helper(s) mentioned (the
ensure_youtube_url/related helper in the 153-159 region) so both use
classify_url and check for supported kinds rather than the broad regex.
In `@plugins/vortex-mod-youtube/src/metadata.rs`:
- Around line 191-200: The code is converting missing media URLs into empty
strings by calling unwrap_or_default() on raw.webpage_url and raw.thumbnail when
constructing VideoInfo; instead, preserve the absence by assigning the Option
values directly (use raw.webpage_url and raw.thumbnail without
unwrap_or_default) so downstream MediaLink.url sees None instead of "". Update
any downstream construction that assumes non-empty String to handle
Option<String> (or explicitly derive a fallback like
format!("https://youtu.be/{}", raw.id) where appropriate). Apply the same fix to
the other URL fields built later (the block around lines 278-285) so no missing
URL is silently turned into "".
In `@plugins/vortex-mod-youtube/src/plugin_api.rs`:
- Around line 52-66: The match over UrlKind in the response builder uses arms
that don't exist; update the match in the block that calls classify_url(...) to
handle the enum variants defined in url_matcher.rs (Video, Playlist, Channel,
Other) instead of Shorts/Unknown: group Playlist and Channel to call
yt_dlp_args_for_playlist + parse_flat_playlist + build_playlist_response, handle
Video (which should also cover shorts if classify_url classifies them) with
yt_dlp_args_for_single_video + parse_single_video + build_single_video_response,
and treat Other by returning
error_to_fn_error(PluginError::UnsupportedUrl(url)); keep calls to call_yt_dlp,
parse_* and build_* as-is.
- Around line 72-91: After ensure_youtube_url(&url), proactively validate the
YouTube resource kind and return an UnsupportedUrl FnError if it doesn't match
the function's expectation: in get_media_variants ensure the URL is a single
video (reject playlist/channel), and in extract_playlist ensure the URL is a
playlist (reject single-video); do this by probing yt-dlp metadata (reuse
call_yt_dlp/yt_dlp_args_for_single_video or yt_dlp_args_for_playlist to get the
info JSON) and inspecting the parsed result before calling parse_single_video or
parse_flat_playlist so you return UnsupportedUrl instead of dispatching to the
wrong parser (refer to get_media_variants, extract_playlist, ensure_youtube_url,
call_yt_dlp, parse_single_video, parse_flat_playlist, and UnsupportedUrl).
In `@plugins/vortex-mod-youtube/src/url_matcher.rs`:
- Around line 68-90: classify_url() lowercases and trims input before matching
but extract_video_id() only trims and extract_playlist_id() doesn't trim or
lowercase, causing mismatches; update extract_video_id() to both trim and
lowercase its input (apply the same normalization used by classify_url()) before
running watch_id_regex(), short_host_id_regex(), and shorts_id_regex(), and
update extract_playlist_id() to normalize (trim and lowercase) the url before
calling playlist_id_regex() so extraction behavior matches classification;
reference the functions extract_video_id, extract_playlist_id, classify_url,
watch_id_regex, short_host_id_regex, shorts_id_regex, and playlist_id_regex when
making the change.
- Around line 95-103: has_youtube_host currently does a substring search causing
false positives; change it to parse the input as a URL (use the existing URL
parsing approach or a Url/parse equivalent) and then compare the parsed host
(Url::host_str()) against the canonical hostnames
("youtube.com","www.youtube.com","m.youtube.com","music.youtube.com","youtu.be");
for youtu.be you may also accept the host exactly "youtu.be"; ensure any parse
errors return false so non-URLs are rejected and avoid inspecting the entire
string for embedded YouTube links.
---
Nitpick comments:
In `@plugins/vortex-mod-youtube/src/metadata.rs`:
- Around line 223-236: The current heuristic only treats the output as a
RawPlaylistEnvelope when it is a single non-empty line containing "entries",
which misses valid multi-line JSON envelopes; replace the heuristic with a
direct attempt to deserialize into RawPlaylistEnvelope by calling
serde_json::from_str::<RawPlaylistEnvelope>(trimmed) and, on Ok(env), return
Ok(from_envelope(env)); only if that deserialization returns Err should you fall
back to parse_jsonl_playlist(output). Update the block around trimmed,
RawPlaylistEnvelope, from_envelope, and parse_jsonl_playlist to try serde
deserialization first and then fallback to JSONL parsing.
In `@plugins/vortex-mod-youtube/src/quality_manager.rs`:
- Around line 163-181: The fallback picks the first entry from at_height which
follows yt-dlp input order and can flip arbitrarily; make this deterministic by
applying a stable secondary ranking on at_height before taking next(): sort the
at_height bucket to prefer muxed entries (ones with audio) over video-only, then
tie-break by a numeric quality metric such as bitrate or filesize (descending),
and finally use a stable comparison on ext or format id to ensure total
ordering; update the selection logic around at_height (the block using
prefs.preferred_container, FormatEntry entries, chosen_height and the final
ok_or(PluginError::NoMatchingFormat)) to use that sorted/stable iterator.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc1cd2e4-c68e-49a0-b29f-84778abfcf82
⛔ Files ignored due to path filters (1)
plugins/vortex-mod-youtube/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
CHANGELOG.mdplugins/vortex-mod-youtube/.cargo/config.tomlplugins/vortex-mod-youtube/Cargo.tomlplugins/vortex-mod-youtube/README.mdplugins/vortex-mod-youtube/plugin.tomlplugins/vortex-mod-youtube/src/error.rsplugins/vortex-mod-youtube/src/extractor.rsplugins/vortex-mod-youtube/src/ipc_tests.rsplugins/vortex-mod-youtube/src/lib.rsplugins/vortex-mod-youtube/src/metadata.rsplugins/vortex-mod-youtube/src/plugin_api.rsplugins/vortex-mod-youtube/src/quality_manager.rsplugins/vortex-mod-youtube/src/url_matcher.rs
| let kind = classify_url(&url); | ||
| let response = match kind { | ||
| UrlKind::Playlist | UrlKind::Channel => { | ||
| let stdout = call_yt_dlp(yt_dlp_args_for_playlist(&url))?; | ||
| let playlist = parse_flat_playlist(&stdout).map_err(error_to_fn_error)?; | ||
| build_playlist_response(playlist) | ||
| } | ||
| UrlKind::Video | UrlKind::Shorts => { | ||
| let stdout = call_yt_dlp(yt_dlp_args_for_single_video(&url))?; | ||
| let video = parse_single_video(&stdout).map_err(error_to_fn_error)?; | ||
| build_single_video_response(video) | ||
| } | ||
| UrlKind::Unknown => { | ||
| return Err(error_to_fn_error(PluginError::UnsupportedUrl(url))); | ||
| } |
There was a problem hiding this comment.
Fix the UrlKind match arms.
This match does not line up with plugins/vortex-mod-youtube/src/url_matcher.rs:11-40, which defines UrlKind as Video | Playlist | Channel | Other. Shorts and Unknown will not compile against that enum.
Suggested change
let response = match kind {
UrlKind::Playlist | UrlKind::Channel => {
let stdout = call_yt_dlp(yt_dlp_args_for_playlist(&url))?;
let playlist = parse_flat_playlist(&stdout).map_err(error_to_fn_error)?;
build_playlist_response(playlist)
}
- UrlKind::Video | UrlKind::Shorts => {
+ UrlKind::Video => {
let stdout = call_yt_dlp(yt_dlp_args_for_single_video(&url))?;
let video = parse_single_video(&stdout).map_err(error_to_fn_error)?;
build_single_video_response(video)
}
- UrlKind::Unknown => {
+ UrlKind::Other => {
return Err(error_to_fn_error(PluginError::UnsupportedUrl(url)));
}
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/vortex-mod-youtube/src/plugin_api.rs` around lines 52 - 66, The match
over UrlKind in the response builder uses arms that don't exist; update the
match in the block that calls classify_url(...) to handle the enum variants
defined in url_matcher.rs (Video, Playlist, Channel, Other) instead of
Shorts/Unknown: group Playlist and Channel to call yt_dlp_args_for_playlist +
parse_flat_playlist + build_playlist_response, handle Video (which should also
cover shorts if classify_url classifies them) with yt_dlp_args_for_single_video
+ parse_single_video + build_single_video_response, and treat Other by returning
error_to_fn_error(PluginError::UnsupportedUrl(url)); keep calls to call_yt_dlp,
parse_* and build_* as-is.
There was a problem hiding this comment.
5 issues found across 14 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="plugins/vortex-mod-youtube/README.md">
<violation number="1" location="plugins/vortex-mod-youtube/README.md:35">
P2: The install filename is inconsistent with the documented build artifact name (`vortex_mod_youtube.wasm` vs `vortex-mod-youtube.wasm`), which can break manual installation.</violation>
</file>
<file name="plugins/vortex-mod-youtube/src/plugin_api.rs">
<violation number="1" location="plugins/vortex-mod-youtube/src/plugin_api.rs:74">
P2: `get_media_variants` does not enforce a video/short URL kind before running single-video extraction. Reject playlist/channel URLs explicitly to avoid incorrect or inconsistent results.</violation>
</file>
<file name="plugins/vortex-mod-youtube/src/url_matcher.rs">
<violation number="1" location="plugins/vortex-mod-youtube/src/url_matcher.rs:96">
P2: Host detection uses substring matching, so non-YouTube URLs that contain a YouTube URL in query/fragment can be misclassified as YouTube.</violation>
</file>
<file name="plugins/vortex-mod-youtube/src/metadata.rs">
<violation number="1" location="plugins/vortex-mod-youtube/src/metadata.rs:231">
P2: Envelope parsing should not return immediately on heuristic matches; deserialize defensively and fall back to JSONL when it is not actually an envelope.</violation>
</file>
<file name="plugins/vortex-mod-youtube/plugin.toml">
<violation number="1" location="plugins/vortex-mod-youtube/plugin.toml:16">
P2: The new `[config]` entries are currently a no-op: manifest parsing ignores this section, so these defaults/options are not applied at runtime.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| subprocess = ["yt-dlp"] | ||
|
|
||
| [config] | ||
| default_quality = { type = "string", default = "1080p", options = ["360p", "480p", "720p", "1080p", "1440p", "2160p", "4320p"] } |
There was a problem hiding this comment.
P2: The new [config] entries are currently a no-op: manifest parsing ignores this section, so these defaults/options are not applied at runtime.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At plugins/vortex-mod-youtube/plugin.toml, line 16:
<comment>The new `[config]` entries are currently a no-op: manifest parsing ignores this section, so these defaults/options are not applied at runtime.</comment>
<file context>
@@ -0,0 +1,22 @@
+subprocess = ["yt-dlp"]
+
+[config]
+default_quality = { type = "string", default = "1080p", options = ["360p", "480p", "720p", "1080p", "1440p", "2160p", "4320p"] }
+preferred_format = { type = "string", default = "mp4", options = ["mp4", "webm", "mkv"] }
+audio_only = { type = "boolean", default = false }
</file context>
Resolve 12 unique findings from PR #23 review (coderabbitai, greptile, cubic-dev-ai). Focus on correctness + security hardening of the YouTube plugin. No behaviour change to user-facing APIs beyond explicit rejection of previously-accepted-but-misrouted URLs. url_matcher.rs - Replace substring host matching with proper URL parsing so that `https://example.com/?next=https://youtube.com/watch?v=x` no longer smuggles through as a YouTube URL. - Add `normalize_for_matching` as a single chokepoint that validates the host against an explicit allowlist, strips userinfo + port, and lowercases path/query before regex matching. This also transparently fixes the previous failure on URLs like `www.youtube.com:443/...`. - Normalize `extract_video_id` and `extract_playlist_id` inputs with trim + host validation so that a valid classified URL no longer fails the subsequent id-extraction step. - Add `youtube-nocookie.com` (+ `www.`) to the recognised host set. - New tests: port authority, userinfo authority, query/fragment smuggling, evil.com prefix, trailing-whitespace extraction. metadata.rs - Replace the heuristic `first_line.contains("\"entries\"")` detector with a defensive try-parse: only attempt envelope deserialisation when the output is a single top-level JSON object, and silently fall back to JSONL parsing when the envelope shape does not fit. Removes the false-positive path where a JSONL line containing a playlist title of "entries" would commit to envelope parsing and fail the whole extraction. - Stop silently emitting empty strings when yt-dlp omits `webpage_url` / `url` for a single video or a playlist entry. Instead, derive a canonical `https://www.youtube.com/watch?v={id}` fallback so downstream MediaLink.url is always navigable. - Add `seconds_to_u64` guard that rejects NaN, infinity, and negative durations. Previously `(-1.0_f64).round() as u64` was saturating to `Some(0)`, indistinguishable from a legitimate 0-second video and hiding malformed or live-stream sentinel values. - New tests: single-line-entries-as-value, playlist-missing-url fallback, single-video-missing-webpage_url fallback, negative-duration-as-None, nan-duration-as-None. lib.rs + plugin_api.rs - `handle_can_handle` now matches classify_url against the exact supported kinds (Video | Shorts | Playlist | Channel) instead of delegating to `is_youtube_url`. Adding a new UrlKind later will force an explicit routing decision here. - `ensure_youtube_url` now returns the classified kind for callers to reuse, and `ensure_single_video` / `ensure_playlist_or_channel` guards enforce kind-specific preconditions. - `get_media_variants` rejects playlist/channel URLs so it no longer silently returns the first video's variants when handed a playlist. - `extract_playlist` rejects single-video URLs for the same reason. - New tests: ensure_single_video rejects playlist, accepts video + shorts; ensure_playlist_or_channel rejects video, accepts playlist + channel; can_handle rejects /embed/ (unsupported kind). README.md - Replace placeholder `https://github.com/...` with the actual repo URL. - Fix install instructions: the cargo build artifact is named `vortex_mod_youtube.wasm` (underscores), but the Vortex loader expects the wasm file to match the plugin directory name (`vortex-mod-youtube.wasm`, hyphens). Install steps now cp+rename explicitly and explain why. plugin.toml - Add "best" to `default_quality.options` to match the documented Quality::Best enum variant. - Document that the `[config]` section is forward-compatible only — the current manifest parser at src-tauri/src/adapters/driven/plugin/ manifest.rs ignores it, so defaults/options are not yet wired at runtime. Tests: 96 passing (77 → +19), clippy clean on both native and wasm32-wasip1 targets, release WASM still 1.2 MB.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/vortex-mod-youtube/README.md`:
- Around line 44-48: The fenced code block in README.md should include a
language identifier to satisfy MD040; update the block around the plugin path
listing by changing the opening backticks from ``` to ```text so the snippet
(the tree showing ~/.config/vortex/plugins/vortex-mod-youtube/ with plugin.toml
and vortex-mod-youtube.wasm) is fenced as text.
- Around line 30-33: The README incorrectly states that the Vortex plugin loader
requires the wasm binary filename to match the plugin name; instead update the
wording to reflect actual behavior in
src-tauri/src/adapters/driven/plugin/manifest.rs: it enforces that the plugin
directory name matches the manifest plugin name and that there is exactly one
.wasm file in that directory, not a strict filename match. Edit the paragraph
referencing filename matching to say that you must ensure the directory name
matches the plugin name in plugin.toml and that the build outputs a single .wasm
file in that directory (rename/copy the artifact into the directory if
necessary), and optionally add a brief note pointing readers to manifest.rs for
exact validation rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e8a0578-9cd9-4ec5-af31-0ed65925b1ed
📒 Files selected for processing (7)
plugins/vortex-mod-youtube/README.mdplugins/vortex-mod-youtube/plugin.tomlplugins/vortex-mod-youtube/src/ipc_tests.rsplugins/vortex-mod-youtube/src/lib.rsplugins/vortex-mod-youtube/src/metadata.rsplugins/vortex-mod-youtube/src/plugin_api.rsplugins/vortex-mod-youtube/src/url_matcher.rs
✅ Files skipped from review due to trivial changes (1)
- plugins/vortex-mod-youtube/plugin.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/vortex-mod-youtube/src/url_matcher.rs
The `specs/` directory is listed in `.gitignore` and the project rule in CLAUDE.md is explicit: "Ne jamais commit le dossier specs". One file (`03-domain-models.md`) slipped through before the gitignore rule was added and has been tracked ever since. Remove it from the git index with `git rm --cached` — the local file remains on disk (it is now covered by the existing `specs/` ignore entry) so no planning content is lost.
There was a problem hiding this comment.
2 issues found across 8 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="plugins/vortex-mod-youtube/src/url_matcher.rs">
<violation number="1" location="plugins/vortex-mod-youtube/src/url_matcher.rs:117">
P2: ID extraction is inconsistent with URL normalization: URLs with userinfo/port can be accepted as YouTube URLs but still fail extraction.</violation>
<violation number="2" location="plugins/vortex-mod-youtube/src/url_matcher.rs:203">
P2: `youtube-nocookie.com` is allowlisted but not actually supported by the regex matchers, so those URLs are still rejected.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Resolve 4 findings from the second review pass (coderabbitai, cubic-dev-ai) on PR #23. Two bugs in url_matcher share a root cause — regex patterns hard-coded `youtube\.com` into full-URL matches — so the fix is a path-based regex refactor that eliminates both. url_matcher.rs — path-based regex refactor - `youtube-nocookie.com` was on the host allowlist but still failed regex matching: the patterns looked for literal `youtube\.com` in the URL, which is not a substring of `youtube-nocookie.com`. Refactor the regex set to operate on the *path+query* portion of the URL instead of the full URL — the path is host-agnostic so all allowlisted authorities (youtube.com, www., m., music., youtube-nocookie.com, www.youtube-nocookie.com, youtu.be) share a single regex set. - `extract_video_id` / `extract_playlist_id` now operate on the same path+query slice as `classify_url`. Previously the extract helpers validated the host but ran the id regex against the original (un-normalised) URL, so URLs with explicit ports (`:443`) or userinfo (`user:pass@`) passed host validation and then failed at id extraction. With path-based matching the authority is stripped upstream and the regex sees a clean `/watch?v=…` fragment. - Introduce `validate_and_split(url) -> Option<(host_lower, path)>` as the single host-validation chokepoint. `classify_url`, `extract_video_id`, and `extract_playlist_id` all route through it. - Extract `extract_youtu_be_id` helper for the short-link host — youtu.be paths are the id itself, which does not fit a regex that assumes `/watch?v=…`. - Path-based regexes use `(?i)` only on the keyword portion (`watch`, `shorts`, `playlist`, `channel`, `user`, `c`) so that the captured id preserves its original case. YouTube ids are case-sensitive; lowercasing them produces a different video. README.md - Fix install docs: the Vortex plugin loader enforces *directory name* == manifest `name` and *exactly one .wasm file* in the directory — not a strict filename match. The previous wording implied filename was pinned. Document both rules explicitly and note that the cargo build artifact (`vortex_mod_youtube.wasm`, underscores) and the renamed install target (`vortex-mod-youtube.wasm`, hyphens) are both acceptable. - Add `text` language id to the install layout fenced code block (MD040 lint). New tests (9): - detects_valid_youtube_urls: add youtube-nocookie.com variants - classifies_youtube_nocookie_watch_as_video - extracts_video_id_from_youtube_nocookie - extracts_video_id_from_url_with_port - extracts_video_id_from_url_with_userinfo - extracts_playlist_id_from_url_with_port - accepts_mixed_case_watch_keyword - preserves_case_of_video_id Tests: 105 passing (96 → +9), clippy clean on both native and wasm32-wasip1 targets, release WASM still 1.2 MB.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugins/vortex-mod-youtube/README.md (1)
60-62: Consider adding a cross-platform test command note.
cargo test --target x86_64-unknown-linux-gnuis explicit but Linux-target-specific. A short note (or fallbackcargo test) would reduce confusion for macOS/Windows contributors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/vortex-mod-youtube/README.md` around lines 60 - 62, The README currently shows a Linux-specific test command `cargo test --target x86_64-unknown-linux-gnu`; update the test instructions to mention a cross-platform fallback and alternatives — e.g., keep the explicit Linux target line but add a sentence recommending contributors on macOS/Windows to run plain `cargo test` or specify their own target triple, and note how to run cross-platform CI tests if applicable; reference the README section where the current `cargo test --target x86_64-unknown-linux-gnu` example appears to place the new note adjacent to that command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugins/vortex-mod-youtube/README.md`:
- Around line 60-62: The README currently shows a Linux-specific test command
`cargo test --target x86_64-unknown-linux-gnu`; update the test instructions to
mention a cross-platform fallback and alternatives — e.g., keep the explicit
Linux target line but add a sentence recommending contributors on macOS/Windows
to run plain `cargo test` or specify their own target triple, and note how to
run cross-platform CI tests if applicable; reference the README section where
the current `cargo test --target x86_64-unknown-linux-gnu` example appears to
place the new note adjacent to that command.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9c559edf-4772-48ae-a18a-1a9fb6ba6f16
📒 Files selected for processing (2)
plugins/vortex-mod-youtube/README.mdplugins/vortex-mod-youtube/src/url_matcher.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/vortex-mod-youtube/src/url_matcher.rs
Summary
plugins/vortex-mod-youtube/, the first production crawler plugin for Vortex, targetingwasm32-wasip1viaextism-pdk1.4.yt-dlpthrough the hostrun_subprocessfunction — no direct network I/O from the plugin sandbox.specs/01-mvp/README.md.Architecture
Native-testable core / WASM-thin-wrapper split. The plugin keeps business logic in plain Rust modules that compile for the host toolchain, and gates the
#[plugin_fn]exports /#[host_fn]imports behindcfg(target_family = "wasm"). Result: 77 unit tests run undercargo test --target x86_64-unknown-linux-gnuwithout spinning up a WASM runtime, while the release build still produces a 1.2 MB stripped.wasmbinary.url_matcheryoutube.com/youtu.be//shorts///playlist//@handle//channel///user///c/URLs; video + playlist ID extraction.metadatayt-dlp --dump-json(single video) and--flat-playlist(accepts both JSONL and envelope shapes). Classifies formats asmuxed/video_only/audio_only/unknownviavcodec/acodecinspection.quality_managerBest) with height-bucket fallback and user container preference. Audio-only path picks highestabr.extractorplugin_api(WASM-only)#[plugin_fn]exports:can_handle,supports_playlist,extract_links,get_media_variants,extract_playlist.#[host_fn] extern \"ExtismHost\"import ofrun_subprocess.errorthiserrorenum withSerdeJson(#[from])preserving the error source chain.Security hardening (applied from code review)
--sentinel inserted before every URL argument passed toyt-dlp, so a URL that accidentally starts with-can never be interpreted as an option. Defense in depth beyond host-side URL validation.subprocess = [\"yt-dlp\"]is declared; thehttpcapability is omitted because all network access is delegated to yt-dlp.unsafeblock (therun_subprocesshost-function call) documents four named invariants: host symbol resolution, ABI shape, capability check, and ownership..unwrap()/.expect()in production paths (CLAUDE.md rule).build_subprocess_requestreturnsResulteven though the failure branch is practically unreachable.Test plan
cargo test --target x86_64-unknown-linux-gnu— 77/77 pass (url_matcher: 22, metadata: 16, quality_manager: 13, extractor: 10, ipc_tests: 10, error: 0 new)cargo clippy --target x86_64-unknown-linux-gnu --all-targets -- -D warnings— cleancargo clippy --all-targets -- -D warnings(WASM target) — cleancargo fmt -- --check— cleancargo build --release— producestarget/wasm32-wasip1/release/vortex_mod_youtube.wasm(1.2 MB)pre-push(src-tauriworkspace cargo test + vitest + tsc) — cleanplugin.tomland the release.wasminto~/.config/vortex/plugins/vortex-mod-youtube/, ensureyt-dlpis onPATH, then verifyplugin_loader.resolve_url("https://youtu.be/...")routes to the plugin in a running Vortex build.extract_linksandget_media_variantsend-to-end once a driving adapter wires them in.Follow-ups (not in this PR)
extract_links/get_media_variants/extract_playlistcalls from the Vortex application layer — currently onlycan_handleis consumed byresolve_url.watch_path_regexandwatch_id_regexinto a single compiled pattern.Summary by cubic
Adds a hardened YouTube WASM plugin via
yt-dlpthat handles videos, Shorts, playlists, and channels with strict URL validation and robust parsing. Completes MVP task 24.New Features
plugins/vortex-mod-youtubecrate forwasm32-wasip1usingextism-pdk1.4; exportscan_handle,supports_playlist,extract_links,get_media_variants,extract_playlist.yt-dlp(--dump-json); supports quality/container choice and audio-only; variants listed from metadata.youtube-nocookie.com,youtu.be); path-based regex matching; trims userinfo/ports; rejects query/fragment smuggling; preserves case of IDs; improved ID extraction for ports/userinfo.get_media_variantsrejects playlists;extract_playlistrejects single videos;can_handlerejects unsupported paths like/embed/.--sentinel before URLs; least-privilege manifest (subprocess = ["yt-dlp"], no HTTP); UTF-8-safe stderr truncation; 105 native tests; release.wasm~1.2 MB.Migration
yt-dlpis onPATHand host allowssubprocessforyt-dlp.wasm32-wasip1, then install to~/.config/vortex/plugins/vortex-mod-youtube/withplugin.tomland exactly one.wasmfile (rename tovortex-mod-youtube.wasmif needed).plugin.tomlconfig:default_quality(incl.best),preferred_format,audio_only, subtitles,cookie_file.Written for commit 5750b8e. Summary will update on new commits.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores