
Add lazy-loading for extras packages#641

Draft
sjmonson wants to merge 4 commits into `main` from `feat/lazy_loader`

Conversation

@sjmonson
Collaborator

TODO

Summary

Lazy-loads extras submodules to defer import errors until the time of use.

Details

TODO

Test Plan

Without vLLM:

  1. Run `guidellm benchmark run --help` and observe no errors
  2. Run `uv run guidellm benchmark run --backend vllm_python --model test` and observe an error with a helpful message

With vLLM:

  1. Run `guidellm benchmark run --help` and observe no errors
  2. Run `uv run guidellm benchmark run --backend vllm_python --model test ...` and observe a successful benchmark
  3. Run `tox -re test-e2e` and observe that tests pass (previously they would fail when vLLM was installed, due to load times)

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@mergify

mergify bot commented Mar 17, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sjmonson.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 17, 2026
@sjmonson sjmonson changed the base branch from main to fix/split_utils March 18, 2026 18:40
Base automatically changed from fix/split_utils to main March 18, 2026 18:41
@mergify mergify bot removed the needs-rebase label Mar 18, 2026
@sjmonson sjmonson mentioned this pull request Mar 19, 2026
Signed-off-by: Samuel Monson <smonson@redhat.com>
`torch` seems to break when it encounters lazy ImportErrors

@sjmonson sjmonson changed the base branch from main to fix/init_cleanup March 19, 2026 18:21
sjmonson added a commit that referenced this pull request Mar 19, 2026
## Summary

Clean up some `__init__` package code and move uvloop config to
`__init__`.

## Details

This cleanup was originally part of #641, but since that PR is blocked I
decided to split it out. Removing the transformers logging config does
not seem to have any real effect; I do not get logs either way.
Importing any huggingface libraries incurs a significant time cost, so
this is a prerequisite to improving CLI responsiveness. Additionally,
uvloop should be configured as early as possible, so its setup was moved
to `__init__`.
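Configuring uvloop at package import time might look like the following sketch (illustrative, not the actual guidellm `__init__` code); installing the policy this early ensures every subsequently created event loop uses it:

```python
# Sketch: install uvloop's event-loop policy at package import time,
# so it is in place before any asyncio loop is created.
import asyncio

try:
    import uvloop  # optional dependency; name taken from the PR text
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    # uvloop not installed: keep the default asyncio event loop.
    pass
```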

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Base automatically changed from fix/init_cleanup to main March 19, 2026 20:00
@sjmonson sjmonson mentioned this pull request Mar 20, 2026
4 tasks
@sjmonson
Collaborator Author

This PR is currently blocked on "Lazy load in `__main__.py`", as our click definitions depend on heavy imports. CLI refactoring will be the main focus of v0.7.0.
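The pattern the refactor would move toward can be sketched with stdlib `argparse` as a stand-in for the project's click-based CLI (all names below are illustrative): heavy backend modules are imported inside the command body, so building the parser, and thus `--help`, never pays their import cost.

```python
# Sketch of deferring heavy imports out of CLI definition time.
import argparse
import importlib

def build_parser():
    # Cheap to build: no heavy imports happen here, so --help is fast.
    parser = argparse.ArgumentParser(prog="guidellm")
    parser.add_argument("--backend", default="openai_http")
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    if args.backend == "vllm_python":
        # Deferred heavy import: a missing optional dependency now
        # fails at use time with a clear error, not at CLI startup.
        importlib.import_module("vllm")
    return args.backend
```

The same shape works with click by moving imports from module scope into the decorated command function bodies.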

@dbutenhof dbutenhof added this to the v0.7.0 milestone Mar 20, 2026
sjmonson added a commit that referenced this pull request Mar 20, 2026
## Summary

Fixes spawn and forkserver multi-process contexts.

## Details

I was hoping that after #647 we could switch to `forkserver` by default.
However it turns out that `forkserver` and `spawn` will import the
calling processes entrypoint (E.g. `__main__.py`) so we run into the
same blocker as #641. However, I was able to confirm that striping every
heavy import out of `__main__.py` solves the issue. So we should be good
to switch in v0.7.0.
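The context selection under discussion can be sketched as follows (the setting name and default are assumptions based on this PR's test plan, not guidellm's actual config code):

```python
# Sketch of multiprocessing start-method selection. With "spawn" or
# "forkserver", child processes re-import the parent's __main__ module,
# which is why __main__ must stay free of heavy imports.
import multiprocessing as mp
import os

def _square(x):
    return x * x

def run_pool(values, context_type=None):
    # Mirror a GUIDELLM__MP_CONTEXT_TYPE-style environment override.
    context_type = context_type or os.environ.get(
        "GUIDELLM__MP_CONTEXT_TYPE", "fork"
    )
    ctx = mp.get_context(context_type)
    with ctx.Pool(processes=2) as pool:
        return pool.map(_square, values)
```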

On my machine there is roughly 10s of overhead for `forkserver` and
slightly more for `spawn`, which is not the worst for a default.
However, the overhead may be larger on other systems:

### `time guidellm benchmark run --profile poisson --rate 5 --data prompt_tokens=128,output_tokens=128 --max-seconds 30 --outputs json`

| Context    | real      | user      | sys      |
| ---------- | --------- | --------- | -------- |
| Fork       | 0m37.874s | 0m17.356s | 0m1.883s |
| Forkserver | 0m47.344s | 0m14.862s | 0m0.860s |
| Spawn      | 0m49.515s | 1m51.230s | 0m8.915s |

### `time guidellm benchmark run --profile concurrent --rate 400 --data prompt_tokens=128,output_tokens=128 --max-seconds 30 --outputs json`

| Context    | real      | user      | sys       |
| ---------- | --------- | --------- | --------- |
| Fork       | 0m39.324s | 0m37.602s | 0m5.623s  |
| Forkserver | 0m49.609s | 0m19.710s | 0m1.311s  |
| Spawn      | 0m50.399s | 2m9.724s  | 0m11.374s |

### `time guidellm benchmark run --profile concurrent --rate 400 --data prompt_tokens=128,output_tokens=128 --max-seconds 120 --outputs json`

| Context    | real      | user      | sys       |
| ---------- | --------- | --------- | --------- |
| Fork       | 2m15.309s | 1m42.911s | 0m15.957s |
| Forkserver | 2m25.964s | 0m38.891s | 0m2.802s  |
| Spawn      | 2m27.454s | 3m24.325s | 0m22.531s |

## Test Plan

Set `GUIDELLM__MP_CONTEXT_TYPE=forkserver` and confirm benchmarks run.

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)