Summary of Changes

Hello @kkimmk, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the MAX framework's capabilities by introducing a complete image-generation workflow through the Flux1 diffusion pipeline. It integrates several deep learning models (VAE, CLIP, T5, and a custom Flux Transformer) and a specialized diffusion scheduler. The changes enable users to generate high-quality images from textual prompts with fine-grained control over various parameters, while also enhancing core utilities for model compilation and image handling. This lays a robust foundation for future generative AI applications within the MAX ecosystem.
Code Review
This is a large but well-structured pull request that adds support for the FLUX.1 diffusion model. The changes span new model architecture implementations, a new diffusion pipeline, CLI extensions, and various supporting utilities. The code is generally of high quality. My review focuses on improving robustness, correctness, and maintainability in a few key areas. I've identified a potential bug in the CLIP text encoder's pooling logic, suggested improvements for command-line argument parsing and exception handling, and recommended using pathlib for more robust path manipulation.
```python
if self.eos_token_id == 2:
    eos_token_indices = ops.argmax(input_ids, axis=-1).cast(DType.int32)
else:
    eos_token_indices = ops.argmax(
        ops.equal(input_ids, self.eos_token_id).cast(DType.int32),
        axis=-1,
    ).cast(DType.int32)
```
The special handling for self.eos_token_id == 2 using ops.argmax(input_ids, ...) is brittle and likely incorrect. It assumes that the token with the highest ID in the sequence is the EOS token, which is not a safe assumption. This can lead to incorrect pooling by selecting the wrong token's hidden state.
The logic in the else block, which explicitly finds the eos_token_id, is much more robust. I recommend using this robust logic for all cases to prevent potential bugs.
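A quick pure-Python sketch (a stand-in for the graph `ops` API, with hypothetical token IDs) shows how the two approaches diverge whenever some non-EOS token has a larger ID than the EOS token:

```python
def argmax(values: list[int]) -> int:
    """Index of the first maximum, mirroring ops.argmax semantics."""
    return max(range(len(values)), key=values.__getitem__)

EOS_TOKEN_ID = 2

# Hypothetical sequence: EOS (id 2) sits at index 3, but token id 7
# (index 2) is the numerically largest ID in the sequence.
input_ids = [5, 6, 7, 2, 0, 0]

# Brittle: argmax over raw IDs selects the highest-ID token, index 2.
brittle_index = argmax(input_ids)

# Robust: argmax over an equality mask selects the first EOS, index 3.
robust_index = argmax([int(t == EOS_TOKEN_ID) for t in input_ids])

print(brittle_index, robust_index)  # 2 3
```

The brittle variant only happens to work when the EOS token's ID is the largest value in the sequence, which the vocabulary does not guarantee.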
```python
eos_token_indices = ops.argmax(
    ops.equal(input_ids, self.eos_token_id).cast(DType.int32),
    axis=-1,
).cast(DType.int32)
```

```python
parser.add_argument(
    "--model-path", type=str, default="black-forest-labs/FLUX.1-dev"
)
parser.add_argument("--use-torch-randn", type=bool, default=True)
```
Using type=bool with argparse can lead to unexpected behavior. For instance, python script.py --use-torch-randn False would result in True because bool('False') evaluates to True.
For boolean flags, it's better to use action=argparse.BooleanOptionalAction (for Python 3.9+) which automatically creates --use-torch-randn and --no-use-torch-randn flags. This makes the CLI behavior explicit and less error-prone.
```diff
- parser.add_argument("--use-torch-randn", type=bool, default=True)
+ parser.add_argument("--use-torch-randn", action=argparse.BooleanOptionalAction, default=True)
```
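A minimal self-contained sketch (flag name borrowed from the diff) demonstrates the pitfall and the fix:

```python
import argparse

# Pitfall: type=bool applies bool() to the raw string, so any non-empty
# value, including the string "False", parses as True.
buggy = argparse.ArgumentParser()
buggy.add_argument("--use-torch-randn", type=bool, default=True)
args = buggy.parse_args(["--use-torch-randn", "False"])
print(args.use_torch_randn)  # True — not what the user intended

# Fix: BooleanOptionalAction (Python 3.9+) creates paired
# --use-torch-randn / --no-use-torch-randn flags with explicit semantics.
fixed = argparse.ArgumentParser()
fixed.add_argument(
    "--use-torch-randn", action=argparse.BooleanOptionalAction, default=True
)
print(fixed.parse_args(["--no-use-torch-randn"]).use_torch_randn)  # False
```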
```python
except Exception as e:
    raise ValueError(
        f"Failed to load configuration from {config_path}: {e}"
    ) from e
```
Catching a generic Exception is too broad and can hide unexpected errors. It's better to catch more specific exceptions that you expect to handle, such as json.JSONDecodeError for parsing issues and IOError for file reading problems.
```diff
- except Exception as e:
-     raise ValueError(
-         f"Failed to load configuration from {config_path}: {e}"
-     ) from e
+ except (json.JSONDecodeError, IOError) as e:
+     raise ValueError(
+         f"Failed to load configuration from {config_path}: {e}"
+     ) from e
```
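A self-contained sketch of the narrowed handler (the `load_config` helper and the path are hypothetical, not from the PR):

```python
import json

def load_config(config_path: str) -> dict:
    """Load a JSON config, wrapping only expected failures in ValueError."""
    try:
        with open(config_path) as f:
            return json.load(f)
    except (json.JSONDecodeError, IOError) as e:
        # Only expected failure modes (bad JSON, unreadable file) are
        # wrapped; programming errors still propagate for easier debugging.
        raise ValueError(
            f"Failed to load configuration from {config_path}: {e}"
        ) from e

try:
    load_config("/nonexistent/config.json")
except ValueError as e:
    print(e)
```

Since `IOError` is an alias of `OSError`, this also covers `FileNotFoundError` and `PermissionError`.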
```python
weight_paths = [
    Path(pretrained_model_name_or_path) / weight_path
    for weight_path in self.pipeline_config.model.weight_path
    if weight_path.split("/")[0] == name
]
```
Using weight_path.split("/")[0] for path manipulation is not platform-agnostic and can fail on operating systems that use a different path separator (like Windows). It's more robust to use pathlib for this, as it handles path components correctly across different OS environments.
```diff
- if weight_path.split("/")[0] == name
+ if Path(weight_path).parts[0] == name
```
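A short sketch of the difference, using a hypothetical weight path (`PureWindowsPath` stands in for running the same code on Windows):

```python
from pathlib import Path, PureWindowsPath

weight_path = "transformer/diffusion_model.safetensors"

# String splitting is tied to one separator: a backslash-separated path
# does not split at all, so the "first component" is the whole string.
print(weight_path.split("/")[0])                # transformer
print("transformer\\model.bin".split("/")[0])   # transformer\model.bin

# pathlib decomposes the path into components for the relevant platform.
print(Path(weight_path).parts[0])               # transformer
print(PureWindowsPath("transformer\\model.bin").parts[0])  # transformer
```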
The `InlineArray` copy/move constructors' previous implementation simply
looped through each element and moved/copied it. This produced poor codegen,
as it did not take advantage of loop unrolling or check whether the
element types are trivial, in which case the underlying MLIR storage can
simply be copied.
Previously, this simple function:
```mojo
fn return_array() -> InlineArray[Int32, 4]:
var arr = InlineArray[Int32, 4](fill=0)
return arr^
```
produces this codegen:
```
define dso_local void @"test_inline_array::return_array"(ptr noalias noundef nonnull writeonly captures(none) %0) #0 !dbg !5 {
tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %0, i8 0, i64 16, i1 false), !dbg !90
tail call void asm sideeffect "nop", ""() #2, !dbg !91
ret void, !dbg !32
}
```
but now, we get this improved LLVM IR:
```
define dso_local void @"test_inline_array::_return_array"(ptr noalias noundef nonnull writeonly captures(none) initializes((0, 16)) %0) #0 !dbg !5 {
tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %0, i8 0, i64 16, i1 false), !dbg !51
ret void, !dbg !32
}
```
Notably, there is no `asm sideeffect "nop"`, and the `ptr` argument now
carries `initializes((0, 16))`, showing that the values do get initialized.
MODULAR_ORIG_COMMIT_REV_ID: 14efb85038ec4697b7e52a8b7beb76e2b48b09bc
token when context is reset
When a preemption occurs while using the overlap scheduler, MAX Serve
crashes in the subsequent iteration.
For example:
```
Requesting API: 51%|█████ | 674/1319 [04:05<02:01, 5.31it/s]01:00:23.055 INFO: Executed TG batch with 64 reqs | Terminated: 0 reqs, Pending: 0 reqs | Input Tokens: 64/128 toks | Context Tokens: 91369/10880 toks | Prompt Tput: 1.5K tok/s, Generation Tput: 1.5K tok/s | Batch creation: 474.32us, Execution: 42.34ms | KVCache usage: 51.3% of 680 blocks, Cache hit rate: 0.0% | All Preemptions: 0 reqs
Requesting API: 51%|█████ | 675/1319 [04:05<03:02, 3.53it/s]01:00:24.328 INFO: Preempted a request due to lack of KV pages. This can affect the end-to-end performance. Consider increasing device-memory-utilization via `--device-memory-utilization` to provide more KV cache memory. Total Preemption Count: 1
...
File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/profiler/tracing.py", line 111, in wrapper
return func(*args, **kwargs)
File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/pipelines/lib/pipeline_variants/overlap_text_generation.py", line 810, in execute
context.update_with_future_token()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/pipelines/core/context.py", line 315, in update_with_future_token
raise ValueError("Cannot have multiple future tokens.")
ValueError: Cannot have multiple future tokens.
```
This is what happens:
```
iteration 1: [100, 101, ..., 116] (generated_length = 0)
iteration 2: [100, 101, ..., 116, FUTURE_TOKEN] (generated_length = 1)
iteration 3: [100, 101, ..., 116, 117, FUTURE_TOKEN] (generated_length = 2)
iteration 4: [100, 101, ..., 116, 117, 118, FUTURE_TOKEN] (generated_length = 0)
**PREEMPT!!!**
iteration 5: [100, 101, ..., 116, 117, 118, 119, FUTURE_TOKEN, FUTURE_TOKEN] (generated_length = 1)
^^^^ Cannot have multiple future tokens.
```
Notice that we have this logic:
```python
# If generated_length is still 0, then there is no placeholder
# future token. This is possible due to chunked prefill.
if context.tokens.generated_length:
    context.realize_future_token(
        new_token=next_token, log_probabilities=log_probs
    )
```
So when a preemption occurs, the `generated_length` is reset to 0. Thus
the `context.realize_future_token` is not run after the preemption. This
causes us to have multiple future tokens in a row.
There are two options to fix this:
1. When a preemption occurs, keep the future token. Then realize the future token in the next iteration even if `generated_length` is 0. This means that we will not lose the progress from that iteration.
2. When a preemption occurs, delete the future token. This means that we will lose the progress from that iteration.

I picked option 2. This loses an iteration of progress when a preemption occurs relative to option 1. However, it is simpler: in option 1, the placeholder future token would become part of the prompt despite still needing to be associated with log_probs and still needing to be streamed to the user in `consume_recently_generated_tokens`.
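A minimal sketch of option 2, with a hypothetical toy `Context` class standing in for MAX's real context object (the real logic lives in `max/python/max/pipelines/core/context.py`; names and the placeholder value are illustrative):

```python
FUTURE_TOKEN = -1  # hypothetical placeholder value

class Context:
    """Toy generation context with a single-slot placeholder future token."""

    def __init__(self, prompt: list[int]) -> None:
        self.tokens = list(prompt)
        self.generated_length = 0

    def update_with_future_token(self) -> None:
        # The crash from the traceback: two placeholders in a row.
        if self.tokens and self.tokens[-1] == FUTURE_TOKEN:
            raise ValueError("Cannot have multiple future tokens.")
        self.tokens.append(FUTURE_TOKEN)
        self.generated_length += 1

    def realize_future_token(self, new_token: int) -> None:
        assert self.tokens[-1] == FUTURE_TOKEN
        self.tokens[-1] = new_token

    def reset(self) -> None:
        """Preemption path: option 2 drops the unrealized future token."""
        if self.tokens and self.tokens[-1] == FUTURE_TOKEN:
            self.tokens.pop()  # lose that iteration's progress, stay consistent
        self.generated_length = 0

ctx = Context([100, 101])
ctx.update_with_future_token()   # [100, 101, FUTURE_TOKEN]
ctx.reset()                      # preempted: placeholder removed
ctx.update_with_future_token()   # no ValueError after the fix
print(ctx.tokens)                # [100, 101, -1]
```

Without the pop in `reset`, the second `update_with_future_token` would append a second placeholder on top of the stale one, reproducing the `ValueError` above.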
MODULAR_ORIG_COMMIT_REV_ID: 0a21e63721148e2c2c85f9037f0d68b27eaeb405