inference-optmization

Here is 1 public repository matching this topic...

karun2328 / qwen2.5-7b-vllm-prefill-benchmarks

Prefill performance study on Qwen2.5-7B using vLLM. Compares static vs mixed (bucketed) prefill under eager execution and CUDA Graphs, with controlled concurrency and real-world latency/throughput metrics.

gpu vllm llm-inference qwen2-5 cuda-graphs prefill-benchmarking inference-optmization

Updated Feb 10, 2026
Python


