
Add flux1 pipeline #2

Draft
kkimmk wants to merge 14 commits into main from add-flux1-pipeline

Conversation

@kkimmk (Collaborator) commented Jan 16, 2026

No description provided.

@gemini-code-assist

Summary of Changes

Hello @kkimmk, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the MAX framework's capabilities by introducing a complete image generation workflow through the Flux1 diffusion pipeline. It integrates several advanced deep learning models (VAE, CLIP, T5, and a custom Flux Transformer) and a specialized diffusion scheduler. The changes enable users to generate high-quality images from textual prompts with fine-grained control over various parameters, while also enhancing core utilities for model compilation and image handling. This lays a robust foundation for future generative AI applications within the MAX ecosystem.

Highlights

  • New Flux1 Image Generation Pipeline: A comprehensive image generation pipeline, FluxPipeline, has been integrated, leveraging various sub-models including VAE, CLIP, T5, and a custom Flux Transformer model for text-to-image synthesis.
  • Dedicated Diffusion Entrypoint and CLI: A new DiffusionPipeline entrypoint is introduced for image generation tasks, complemented by new CLI commands under max.entrypoints.pipelines to facilitate image generation with configurable parameters like prompts, dimensions, and inference steps.
  • Core Model Architectures Implemented: Detailed implementations for AutoencoderKL (VAE), CLIPTextModel, and T5EncoderModel are added, complete with their respective sub-modules such as ResnetBlock2D, VAEAttention, CLIPAttention, CLIPMLP, T5LayerNorm, and T5Attention.
  • Custom Diffusion Scheduler: A FlowMatchEulerDiscreteScheduler has been added, providing a native Modular implementation of the Euler scheduler, crucial for controlling the denoising process in diffusion models.
  • Image Processing and Utility Enhancements: A VaeImageProcessor is introduced for robust image preprocessing and postprocessing, alongside experimental CompileWrapper and max_compile utilities for optimizing graph compilation, and a finfo class for DType numerical properties, including bfloat16 support.
  • Configuration System Updates: The configuration system (max.config and max.pipelines.lib.config) has been updated to support the new image generation pipeline, including handling of model_index.json for Diffusers-style repositories and optimizing memory estimation by skipping KV cache profiling for image generation tasks.
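The FlowMatchEulerDiscreteScheduler highlighted above reduces, at its core, to a simple Euler update along a sigma schedule. As a rough orientation, here is a minimal pure-Python sketch of that update; the schedule values and shapes are illustrative assumptions, not the MAX implementation:

```python
def flow_match_euler_step(sample, model_output, sigma, sigma_next):
    # One Euler step of a flow-matching sampler: move each element of
    # the sample along the predicted velocity by the sigma decrement.
    return [x + (sigma_next - sigma) * v for x, v in zip(sample, model_output)]

# Denoise from sigma=1.0 down to sigma=0.0 over a simple linear schedule.
sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
sample = [1.0, 1.0, 1.0, 1.0]
velocity = [2.0, 2.0, 2.0, 2.0]  # stand-in for the transformer's output
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    sample = flow_match_euler_step(sample, velocity, sigma, sigma_next)
# Total sigma change is -1.0, so each element ends at 1.0 + (-1.0) * 2.0 = -1.0.
```

In the real pipeline, `velocity` would be the Flux Transformer's predicted output at each step rather than a constant.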




@gemini-code-assist bot left a comment


Code Review

This is a large but well-structured pull request that adds support for the FLUX.1 diffusion model. The changes span new model architecture implementations, a new diffusion pipeline, CLI extensions, and various supporting utilities. The code is generally of high quality. My review focuses on improving robustness, correctness, and maintainability in a few key areas. I've identified a potential bug in the CLIP text encoder's pooling logic, suggested improvements for command-line argument parsing and exception handling, and recommended using pathlib for more robust path manipulation.

Comment on lines +453 to +459

```python
if self.eos_token_id == 2:
    eos_token_indices = ops.argmax(input_ids, axis=-1).cast(DType.int32)
else:
    eos_token_indices = ops.argmax(
        ops.equal(input_ids, self.eos_token_id).cast(DType.int32),
        axis=-1,
    ).cast(DType.int32)
```


high

The special handling for self.eos_token_id == 2 using ops.argmax(input_ids, ...) is brittle and likely incorrect. It assumes that the token with the highest ID in the sequence is the EOS token, which is not a safe assumption. This can lead to incorrect pooling by selecting the wrong token's hidden state.

The logic in the else block, which explicitly finds the eos_token_id, is much more robust. I recommend using this robust logic for all cases to prevent potential bugs.

```python
eos_token_indices = ops.argmax(
    ops.equal(input_ids, self.eos_token_id).cast(DType.int32),
    axis=-1,
).cast(DType.int32)
```
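To see why the argmax-over-IDs shortcut breaks, here is a small pure-Python sketch of both strategies (the token IDs are hypothetical, and the real code operates on MAX graph ops rather than Python lists):

```python
def first_argmax(xs):
    # Index of the first maximum, matching typical argmax semantics.
    return xs.index(max(xs))

eos_token_id = 2  # mirrors the special-cased branch in the review
input_ids = [0, 4891, 318, 257, 3797, 2, 1, 1]  # BOS, prompt tokens, EOS=2, padding

# Brittle shortcut: argmax over raw IDs picks the largest token ID,
# which is a prompt token here, not the EOS token.
brittle = first_argmax(input_ids)  # index 1 (token 4891) -- wrong

# Robust version: argmax over an equality mask finds the first EOS position.
robust = first_argmax([int(t == eos_token_id) for t in input_ids])  # index 5
```

The mask-based form is also correct when the EOS token happens to be the largest ID in the sequence, which is why the reviewer suggests using it unconditionally.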

```python
parser.add_argument(
    "--model-path", type=str, default="black-forest-labs/FLUX.1-dev"
)
parser.add_argument("--use-torch-randn", type=bool, default=True)
```


medium

Using type=bool with argparse can lead to unexpected behavior. For instance, python script.py --use-torch-randn False would result in True because bool('False') evaluates to True.

For boolean flags, it's better to use action=argparse.BooleanOptionalAction (for Python 3.9+) which automatically creates --use-torch-randn and --no-use-torch-randn flags. This makes the CLI behavior explicit and less error-prone.

Suggested change

```diff
- parser.add_argument("--use-torch-randn", type=bool, default=True)
+ parser.add_argument("--use-torch-randn", action=argparse.BooleanOptionalAction, default=True)
```
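A quick self-contained demonstration of the pitfall and the fix (the flag name is taken from the diff; the parser here is standalone for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
# BooleanOptionalAction (Python 3.9+) generates paired
# --use-torch-randn / --no-use-torch-randn flags with an explicit default.
parser.add_argument(
    "--use-torch-randn", action=argparse.BooleanOptionalAction, default=True
)

on = parser.parse_args([]).use_torch_randn                      # True (default)
off = parser.parse_args(["--no-use-torch-randn"]).use_torch_randn  # False

# For contrast, type=bool treats any non-empty string as truthy,
# so `--use-torch-randn False` would silently parse as True.
surprising = bool("False")  # True
```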

Comment on lines +1083 to +1086

```python
except Exception as e:
    raise ValueError(
        f"Failed to load configuration from {config_path}: {e}"
    ) from e
```


medium

Catching a generic Exception is too broad and can hide unexpected errors. It's better to catch more specific exceptions that you expect to handle, such as json.JSONDecodeError for parsing issues and IOError for file reading problems.

Suggested change

```diff
- except Exception as e:
+ except (json.JSONDecodeError, IOError) as e:
      raise ValueError(
          f"Failed to load configuration from {config_path}: {e}"
      ) from e
```
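A minimal sketch of the narrowed handler in context (the `load_config` name is made up for illustration; note that since Python 3.3, `IOError` is an alias of `OSError`, so catching `OSError` is equivalent and slightly more idiomatic):

```python
import json
from pathlib import Path

def load_config(config_path: Path) -> dict:
    # Only translate the failure modes we expect -- unreadable files
    # and malformed JSON -- into a ValueError. Anything unexpected
    # (e.g. a programming bug) propagates unchanged.
    try:
        return json.loads(config_path.read_text())
    except (OSError, json.JSONDecodeError) as e:
        raise ValueError(
            f"Failed to load configuration from {config_path}: {e}"
        ) from e
```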

```python
weight_paths = [
    Path(pretrained_model_name_or_path) / weight_path
    for weight_path in self.pipeline_config.model.weight_path
    if weight_path.split("/")[0] == name
```


medium

Using weight_path.split("/")[0] for path manipulation is not platform-agnostic and can fail on operating systems that use a different path separator (like Windows). It's more robust to use pathlib for this, as it handles path components correctly across different OS environments.

Suggested change

```diff
- if weight_path.split("/")[0] == name
+ if Path(weight_path).parts[0] == name
```
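The difference can be shown with `pathlib`'s platform-independent pure-path classes (the weight-path entries below are hypothetical examples, not taken from the PR):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical weight-path entries as they might appear on each platform.
posix_entry = "vae/diffusion_pytorch_model.safetensors"
windows_entry = "vae\\diffusion_pytorch_model.safetensors"

# str.split("/") only understands the POSIX separator, so the Windows-style
# entry yields the entire string as its "first component".
first_posix = posix_entry.split("/")[0]      # "vae"
first_windows = windows_entry.split("/")[0]  # whole string -- no match

# Pure*Path.parts decomposes each form with the right separator rules.
parts_posix = PurePosixPath(posix_entry).parts[0]
parts_windows = PureWindowsPath(windows_entry).parts[0]
```

In the actual code, `Path(weight_path).parts` uses the running platform's rules, which is exactly the behavior the reviewer is after.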

@kkimmk force-pushed the add-flux1-pipeline branch from 072e77c to 6252c71 on January 19, 2026 07:04
@kkimmk force-pushed the add-flux1-pipeline branch from a149668 to c16451a on February 3, 2026 13:46
tae-su-kim pushed a commit that referenced this pull request Feb 13, 2026
The `InlineArray` copy/move constructors' current implementation simply looped through each element and moved/copied it. This produced poor codegen, as it did not take advantage of loop unrolling or check whether the element types are trivial so that the underlying MLIR storage could simply be copied.

The previous codegen for a simple function:
```mojo
fn return_array() -> InlineArray[Int32, 4]:
    var arr = InlineArray[Int32, 4](fill=0)
    return arr^
```

produces this codegen:
```
define dso_local void @"test_inline_array::return_array"(ptr noalias noundef nonnull writeonly captures(none) %0) #0 !dbg !5 {
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %0, i8 0, i64 16, i1 false), !dbg !90
  tail call void asm sideeffect "nop", ""() #2, !dbg !91
  ret void, !dbg !32
}
```

but now, we get this improved LLVM IR:
```
define dso_local void @"test_inline_array::_return_array"(ptr noalias noundef nonnull writeonly captures(none) initializes((0, 16)) %0) #0 !dbg !5 {
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 4 dereferenceable(16) %0, i8 0, i64 16, i1 false), !dbg !51
  ret void, !dbg !32
}
```

Notably, there is no `asm sideeffect "nop"`, and an `initializes((0, 16))` attribute is on the `ptr` argument, showing that the values do get initialized.

MODULAR_ORIG_COMMIT_REV_ID: 14efb85038ec4697b7e52a8b7beb76e2b48b09bc
pei0033 pushed a commit that referenced this pull request Mar 10, 2026
token when context is reset

When a preemption occurs while using the overlap scheduler, MAX Serve crashes in the subsequent iteration.

For example:
```
Requesting API:  51%|█████     | 674/1319 [04:05<02:01,  5.31it/s]01:00:23.055 INFO: Executed TG batch with 64 reqs | Terminated: 0 reqs, Pending: 0 reqs | Input Tokens: 64/128 toks | Context Tokens: 91369/10880 toks | Prompt Tput: 1.5K tok/s, Generation Tput: 1.5K tok/s | Batch creation: 474.32us, Execution: 42.34ms | KVCache usage: 51.3% of 680 blocks, Cache hit rate: 0.0% | All Preemptions: 0 reqs

Requesting API:  51%|█████     | 675/1319 [04:05<03:02,  3.53it/s]01:00:24.328 INFO: Preempted a request due to lack of KV pages. This can affect the end-to-end performance. Consider increasing device-memory-utilization via `--device-memory-utilization` to provide more KV cache memory. Total Preemption Count: 1
...
  File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/profiler/tracing.py", line 111, in wrapper
    return func(*args, **kwargs)
  File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/pipelines/lib/pipeline_variants/overlap_text_generation.py", line 810, in execute
    context.update_with_future_token()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/runner/.cache/bazel/_bazel_runner/e5185c6d580a8eeb67820192d4a9baf7/execroot/_main/bazel-out/k8-opt-ci-build/bin/max/tests/integration/accuracy/pipelines-lm-eval.runfiles/_main/max/python/max/pipelines/core/context.py", line 315, in update_with_future_token
    raise ValueError("Cannot have multiple future tokens.")
ValueError: Cannot have multiple future tokens.
```

This is what happens:
```
iteration 1: [100, 101, ..., 116] (generated_length = 0)
iteration 2: [100, 101, ..., 116, FUTURE_TOKEN] (generated_length = 1)
iteration 3: [100, 101, ..., 116, 117, FUTURE_TOKEN] (generated_length = 2)
iteration 4: [100, 101, ..., 116, 117, 118, FUTURE_TOKEN] (generated_length = 0)

**PREEMPT!!!**

iteration 5: [100, 101, ..., 116, 117, 118, 119, FUTURE_TOKEN, FUTURE_TOKEN] (generated_length = 1)
                                                     ^^^^ Cannot have multiple future tokens.
```

Notice that we have this logic:
```
# If generated_length is still 0, then there is no placeholder
# future token. This is possible due to chunked prefill.
if context.tokens.generated_length:
    context.realize_future_token(
        new_token=next_token, log_probabilities=log_probs
    )
```

So when a preemption occurs, the `generated_length` is reset to 0. Thus
the `context.realize_future_token` is not run after the preemption. This
causes us to have multiple future tokens in a row.

There are two options to fix this:
1. When a preemption occurs, keep the future token, then realize it in the next iteration even if `generated_length` is 0. This means we will not lose the progress from that iteration.
2. When a preemption occurs, delete the future token. This means we will lose the progress from that iteration.

I picked #2. This loses an iteration of progress on preemption relative to option #1, but it is simpler: under #1, the placeholder future token would become part of the prompt even though it still needs to be associated with log_probs and streamed to the user in `consume_recently_generated_tokens`.
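The chosen fix can be modeled with a toy context (the class, method names, and sentinel below are illustrative only, not the MAX `Context` API):

```python
class ToyContext:
    """Minimal model of a request context holding a placeholder future token."""

    FUTURE = object()  # sentinel standing in for the placeholder token

    def __init__(self, prompt):
        self.tokens = list(prompt)
        self.generated_length = 0

    def append_future_token(self):
        # Mirrors the invariant from the traceback: at most one placeholder.
        if self.tokens and self.tokens[-1] is self.FUTURE:
            raise ValueError("Cannot have multiple future tokens.")
        self.tokens.append(self.FUTURE)
        self.generated_length += 1

    def realize_future_token(self, new_token):
        assert self.tokens[-1] is self.FUTURE
        self.tokens[-1] = new_token

    def reset(self):
        # Option 2 from the commit message: on preemption, drop the
        # placeholder so the next iteration starts from a clean state,
        # at the cost of losing that iteration's progress.
        if self.tokens and self.tokens[-1] is self.FUTURE:
            self.tokens.pop()
        self.generated_length = 0

ctx = ToyContext([100, 101, 102])
ctx.append_future_token()  # an overlap iteration adds a placeholder
ctx.reset()                # preemption: placeholder is discarded
ctx.append_future_token()  # next iteration no longer raises
```

Without the `pop` in `reset()`, the second `append_future_token()` would raise "Cannot have multiple future tokens.", reproducing the crash described above.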

MODULAR_ORIG_COMMIT_REV_ID: 0a21e63721148e2c2c85f9037f0d68b27eaeb405