feat: support first+last frame multi-conditioning for I2V #23

zhaopengme wants to merge 1 commit into Blaizzy:main
Conversation
- Fix negative frame_idx bug in apply_conditioning (e.g. -1 for last frame)
- Add end_image and end_image_strength params to generate_video()
- Add _build_i2v_conditionings() helper to construct the conditioning list
- Update all 4 pipeline branches (DISTILLED, DEV, DEV_TWO_STAGE, DEV_TWO_STAGE_HQ) to encode and apply both first-frame and last-frame conditioning
- Add --end-image and --end-image-strength CLI arguments

When both image and end_image are provided, the video is conditioned to start from the first image and end at the last image, creating a smooth transition between the two frames.

Made-with: Cursor
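The commit message mentions a `_build_i2v_conditionings()` helper, but the PR diff for it is not shown here. The following is only a hypothetical sketch of what such a helper could look like; the `Conditioning` dataclass and the argument names are assumptions for illustration, not the repo's actual API:

```python
from dataclasses import dataclass


@dataclass
class Conditioning:
    """Illustrative stand-in for a latent conditioning entry."""
    latent: object       # encoded image latent (an mx.array in the real pipeline)
    frame_idx: int       # 0 = first frame, -1 = last frame
    strength: float


def build_i2v_conditionings(image_latent, image_strength,
                            end_latent=None, end_strength=None):
    """Build the list of latent conditionings for I2V generation.

    The first image is pinned to frame 0; when an end image is given,
    it is pinned to frame -1 (the last latent frame), and its strength
    defaults to the first image's strength.
    """
    conds = [Conditioning(image_latent, 0, image_strength)]
    if end_latent is not None:
        if end_strength is None:
            end_strength = image_strength  # reuse first-image strength by default
        conds.append(Conditioning(end_latent, -1, end_strength))
    return conds
```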
Pull request overview
This PR extends the LTX-2 I2V generation path to support dual conditioning (first frame + last frame) while fixing negative frame index handling in latent conditioning.
Changes:

- Fix `apply_conditioning()` to correctly handle negative `frame_idx` values (e.g., `-1` for the last frame).
- Add `end_image`/`end_image_strength` to `generate_video()` and wire them through all pipeline branches to apply first+last frame conditioning.
- Expose `--end-image` and `--end-image-strength` in the CLI and add a helper to build the conditioning list.
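As a minimal sketch of how the two new flags could be declared, assuming an argparse-style CLI (the parser below is illustrative; the real option definitions live in `generate.py` and are not shown in this thread):

```python
import argparse

# Hypothetical subset of the CLI: only the flags touched by this PR.
parser = argparse.ArgumentParser(description="I2V generation (illustrative)")
parser.add_argument("--image", help="first-frame conditioning image")
parser.add_argument("--image-strength", type=float, default=1.0)
parser.add_argument("--end-image", help="last-frame conditioning image")
parser.add_argument(
    "--end-image-strength",
    type=float,
    default=None,
    help="defaults to --image-strength when omitted",
)

args = parser.parse_args(["--image", "a.png", "--end-image", "b.png"])
```

Leaving `--end-image-strength` with a `None` default lets the pipeline fall back to `image_strength`, matching the `if end_image_strength is None` branch quoted later in the review.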
Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| `mlx_video/models/ltx_2/generate.py` | Adds end-frame conditioning support and applies first+last conditioning across all pipelines + CLI args. |
| `mlx_video/models/ltx_2/conditioning/latent.py` | Normalizes negative conditioning indices so `-1` can target the last latent frame. |
```python
console.print(
    f"[dim]First image: {image} (strength={image_strength}, frame={image_frame_idx})[/]"
)
```
When --end-image is set, image_frame_idx is ignored and the first image is always conditioned at frame 0 (see _build_i2v_conditionings). The log line still prints frame={image_frame_idx}, which can be misleading (e.g., user passes --image-frame-idx 8 and sees it echoed even though it won’t be used). Consider printing frame=0 (or explicitly indicating it’s ignored) whenever end_image is provided.
Suggested change:

```diff
-console.print(
-    f"[dim]First image: {image} (strength={image_strength}, frame={image_frame_idx})[/]"
-)
+if has_end_image:
+    console.print(
+        f"[dim]First image: {image} (strength={image_strength}, frame=0; image_frame_idx ignored when --end-image is set)[/]"
+    )
+else:
+    console.print(
+        f"[dim]First image: {image} (strength={image_strength}, frame={image_frame_idx})[/]"
+    )
```
```python
is_i2v = image is not None or end_image is not None
has_end_image = end_image is not None
if end_image_strength is None:
    end_image_strength = image_strength
```
If both image and end_image are provided, a 1-frame video (or any case where latent_frames == 1) can’t satisfy “first+last” conditioning: frame_idx=-1 normalizes to the same frame as the first conditioning and will overwrite it. Consider validating upfront (e.g., require num_frames > 1 when both images are set, or raise a clear error) to avoid silently dropping one conditioning.
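The upfront check suggested here could look like the following sketch. Note the `temporal_downsample=8` factor and the `latent_frames` formula are assumptions about the VAE's temporal compression for illustration; they are not taken from this PR:

```python
def validate_dual_conditioning(num_frames: int, has_image: bool,
                               has_end_image: bool,
                               temporal_downsample: int = 8) -> int:
    """Reject configurations where first+last conditioning would collide.

    Assumes the common VAE layout: one latent frame for the first pixel
    frame, then one per `temporal_downsample` pixel frames. Returns the
    latent frame count when the configuration is valid.
    """
    latent_frames = 1 + (num_frames - 1) // temporal_downsample
    if has_image and has_end_image and latent_frames < 2:
        raise ValueError(
            f"num_frames={num_frames} yields only {latent_frames} latent "
            "frame(s); first- and last-frame conditioning would target the "
            "same frame. Use a longer video or drop --end-image."
        )
    return latent_frames
```

Failing fast here surfaces the conflict as a clear error instead of silently letting the `-1` conditioning overwrite the frame-0 one.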
```python
input_image = load_image(image, height=s1_h, width=s1_w, dtype=model_dtype)
stage1_image_latent = vae_encoder(prepare_image_for_encoding(input_image, s1_h, s1_w, dtype=model_dtype))
mx.eval(stage1_image_latent)
input_image = load_image(image, height=s2_h, width=s2_w, dtype=model_dtype)
stage2_image_latent = vae_encoder(prepare_image_for_encoding(input_image, s2_h, s2_w, dtype=model_dtype))
mx.eval(stage2_image_latent)

if has_end_image:
    end_input = load_image(end_image, height=s1_h, width=s1_w, dtype=model_dtype)
    stage1_end_image_latent = vae_encoder(prepare_image_for_encoding(end_input, s1_h, s1_w, dtype=model_dtype))
    mx.eval(stage1_end_image_latent)
    end_input = load_image(end_image, height=s2_h, width=s2_w, dtype=model_dtype)
    stage2_end_image_latent = vae_encoder(prepare_image_for_encoding(end_input, s2_h, s2_w, dtype=model_dtype))
```
Several newly added lines here are far beyond Black’s default line length (e.g., nested vae_encoder(prepare_image_for_encoding(...)) calls on a single line). Since the repo enforces Black via pre-commit, this will likely be reformatted (or fail CI if not run). Please run Black / reformat these calls into the standard multi-line style used elsewhere in this file for readability and consistency.
Suggested change:

```diff
-input_image = load_image(image, height=s1_h, width=s1_w, dtype=model_dtype)
-stage1_image_latent = vae_encoder(prepare_image_for_encoding(input_image, s1_h, s1_w, dtype=model_dtype))
-mx.eval(stage1_image_latent)
-input_image = load_image(image, height=s2_h, width=s2_w, dtype=model_dtype)
-stage2_image_latent = vae_encoder(prepare_image_for_encoding(input_image, s2_h, s2_w, dtype=model_dtype))
-mx.eval(stage2_image_latent)
-if has_end_image:
-    end_input = load_image(end_image, height=s1_h, width=s1_w, dtype=model_dtype)
-    stage1_end_image_latent = vae_encoder(prepare_image_for_encoding(end_input, s1_h, s1_w, dtype=model_dtype))
-    mx.eval(stage1_end_image_latent)
-    end_input = load_image(end_image, height=s2_h, width=s2_w, dtype=model_dtype)
-    stage2_end_image_latent = vae_encoder(prepare_image_for_encoding(end_input, s2_h, s2_w, dtype=model_dtype))
+input_image = load_image(
+    image,
+    height=s1_h,
+    width=s1_w,
+    dtype=model_dtype,
+)
+stage1_image_latent = vae_encoder(
+    prepare_image_for_encoding(
+        input_image,
+        s1_h,
+        s1_w,
+        dtype=model_dtype,
+    )
+)
+mx.eval(stage1_image_latent)
+input_image = load_image(
+    image,
+    height=s2_h,
+    width=s2_w,
+    dtype=model_dtype,
+)
+stage2_image_latent = vae_encoder(
+    prepare_image_for_encoding(
+        input_image,
+        s2_h,
+        s2_w,
+        dtype=model_dtype,
+    )
+)
+mx.eval(stage2_image_latent)
+if has_end_image:
+    end_input = load_image(
+        end_image,
+        height=s1_h,
+        width=s1_w,
+        dtype=model_dtype,
+    )
+    stage1_end_image_latent = vae_encoder(
+        prepare_image_for_encoding(
+            end_input,
+            s1_h,
+            s1_w,
+            dtype=model_dtype,
+        )
+    )
+    mx.eval(stage1_end_image_latent)
+    end_input = load_image(
+        end_image,
+        height=s2_h,
+        width=s2_w,
+        dtype=model_dtype,
+    )
+    stage2_end_image_latent = vae_encoder(
+        prepare_image_for_encoding(
+            end_input,
+            s2_h,
+            s2_w,
+            dtype=model_dtype,
+        )
+    )
```
```python
# Normalize negative indices (e.g. -1 -> last frame)
if frame_idx < 0:
    frame_idx = frame_idx % f
```
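The normalization above relies on Python's modulo semantics, where the result of `%` takes the sign of the divisor, so negative indices wrap around just like negative list indexing. A standalone sketch (the function name is illustrative, not from the PR):

```python
def normalize_frame_idx(frame_idx: int, num_latent_frames: int) -> int:
    """Mirror the snippet's normalization: negative indices wrap around,
    so -1 targets the last latent frame."""
    if frame_idx < 0:
        frame_idx = frame_idx % num_latent_frames
    return frame_idx
```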
apply_conditioning now supports negative frame_idx values via modulo normalization, but VideoConditionByLatentIndex’s docstring still describes frame_idx only as “0 = first frame”. Consider updating the public-facing documentation (class docstring and/or apply_conditioning doc) to explicitly state that negative indices are accepted (e.g., -1 = last frame).
Could you share a before and after video? |

