This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from e1e9203 to 5e4ee86.
Signed-off-by: Samuel Monson <smonson@redhat.com>
`torch` seems to break when it encounters lazy ImportErrors Signed-off-by: Samuel Monson <smonson@redhat.com>
Force-pushed from 876cc30 to 8156271.
sjmonson added a commit that referenced this pull request on Mar 19, 2026:
## Summary

Clean up some `__init__` package code and move uvloop config to `__init__`.

## Details

This cleanup was originally part of #641, but as that PR is blocked I decided to split it out. Removing the transformers logging config does not seem to have any real effect; I do not get logs either way. Importing any huggingface libraries incurs a significant time cost, so this is a prerequisite to improving CLI responsiveness. Additionally, uvloop should be configured as early as possible, so the setup was moved to `__init__`.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
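The "configure uvloop as early as possible" idea above can be sketched as it might look in a package `__init__.py`. This is a minimal illustration under stated assumptions, not guidellm's actual code: the graceful fallback when uvloop is missing and the `ACTIVE_POLICY` name are both hypothetical.

```python
# Hypothetical sketch: install the uvloop event loop policy at package import
# time, falling back to stock asyncio when uvloop is not installed.
import asyncio

try:
    import uvloop  # optional dependency
except ImportError:
    uvloop = None


def _configure_event_loop() -> str:
    """Set the uvloop policy if available; return the name of the active policy."""
    if uvloop is not None:
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        return "uvloop"
    return "asyncio"


# Runs once, on first import of the package, before any event loop is created.
ACTIVE_POLICY = _configure_event_loop()
```

Doing this in `__init__` (rather than in the CLI entrypoint) matters because the policy must be set before any code creates an event loop; a policy change after loop creation has no effect on already-running loops.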
sjmonson (Collaborator, Author) commented:

> This PR is currently blocked on "Lazy load in …"
sjmonson added a commit that referenced this pull request on Mar 20, 2026:
## Summary

Fixes spawn and forkserver multi-process contexts.

## Details

I was hoping that after #647 we could switch to `forkserver` by default. However, it turns out that `forkserver` and `spawn` will import the calling process's entrypoint (e.g. `__main__.py`), so we run into the same blocker as #641. I was able to confirm that stripping every heavy import out of `__main__.py` solves the issue, so we should be good to switch in v0.7.0. On my machine there is about a ~10s overhead for `forkserver` and slightly more for `spawn`, which is not the worst for a default. However, the overhead may be higher on other systems:

### `time guidellm benchmark run --profile poisson --rate 5 --data prompt_tokens=128,output_tokens=128 --max-seconds 30 --outputs json`

| Context    | real      | user      | sys      |
| ---------- | --------- | --------- | -------- |
| Fork       | 0m37.874s | 0m17.356s | 0m1.883s |
| Forkserver | 0m47.344s | 0m14.862s | 0m0.860s |
| Spawn      | 0m49.515s | 1m51.230s | 0m8.915s |

### `time guidellm benchmark run --profile concurrent --rate 400 --data prompt_tokens=128,output_tokens=128 --max-seconds 30 --outputs json`

| Context    | real      | user      | sys       |
| ---------- | --------- | --------- | --------- |
| Fork       | 0m39.324s | 0m37.602s | 0m5.623s  |
| Forkserver | 0m49.609s | 0m19.710s | 0m1.311s  |
| Spawn      | 0m50.399s | 2m9.724s  | 0m11.374s |

### `time guidellm benchmark run --profile concurrent --rate 400 --data prompt_tokens=128,output_tokens=128 --max-seconds 120 --outputs json`

| Context    | real      | user      | sys       |
| ---------- | --------- | --------- | --------- |
| Fork       | 2m15.309s | 1m42.911s | 0m15.957s |
| Forkserver | 2m25.964s | 0m38.891s | 0m2.802s  |
| Spawn      | 2m27.454s | 3m24.325s | 0m22.531s |

## Test Plan

Set `GUIDELLM__MP_CONTEXT_TYPE=forkserver` and confirm benchmarks run.

---

- [x] "I certify that all code in this PR is my own, except as noted below."
## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
TODO: strip heavy imports from `__main__.py` to improve CLI responsiveness (see scientific-python/lazy-loader#168).

## Summary

Lazy loads extras submodules in order to defer import errors to the time of use.
## Details

TODO
## Test Plan

Without vLLM:

- `guidellm benchmark run --help` and observe no errors
- `uv run guidellm benchmark run --backend vllm_python --model test` and observe error with helpful message

With vLLM:

- `guidellm benchmark run --help` and observe no errors
- `uv run guidellm benchmark run --backend vllm_python --model test ...` and observe successful benchmark
- `tox -re test-e2e` and observe that tests pass (previously they would fail when vLLM was installed due to load times)

## Related Issues
## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)