continuous-batching

Star

Here are 6 public repositories matching this topic...

lumia431 / photon_infer

Star

A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching

modern-cpp inference-engine ai-infra vllm llm-inference paged-attention continuous-batching

Updated Jan 2, 2026
C++

gty111 / gLLM

Star

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

pipeline-parallelism tensor-parallelism llm-serving llm-inference pagedattention continuous-batching qwen3 token-throttling chunked-prefill

Updated Jan 12, 2026
Python

maxime-dlabai / mlx-continuous-batching

Star

OpenAI-compatible server with continuous batching for MLX on Apple Silicon

macos inference text-generation mlx apple-silicon openai-api llm continuous-batching

Updated Dec 4, 2025
Python

Fork of OpenAI and Anthropic compatible server for Apple Silicon. Native MLX backend, 500+ tok/s. Run LLMs and vision-language models with continuous batching, MCP tool calling, and multimodal support.

inference-server mlx multimodal apple-silicon llm vllm local-ai continuous-batching tool-calling openai-compatible

Updated Mar 9, 2026
Python

LessUp / hetero-paged-infer

Star

PagedAttention + Continuous Batching Inference Engine Prototype (Rust): Paged KV Cache & Dynamic Scheduling | PagedAttention + Continuous Batching 推理引擎原型（Rust），KV Cache 分页管理与动态调度

rust gpu-computing llm llm-inference paged-attention continuous-batching

Updated Mar 9, 2026
Rust

nagababumo / Efficiently-Serving-LLMs

Star

batching lora quantization lorax low-rank-adaptation continuous-batching multi-lora

Updated Jun 19, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the continuous-batching topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the continuous-batching topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

continuous-batching

Here are 6 public repositories matching this topic...

lumia431 / photon_infer

gty111 / gLLM

maxime-dlabai / mlx-continuous-batching

swaylenhayes / vllm-mlx

LessUp / hetero-paged-infer

nagababumo / Efficiently-Serving-LLMs

Improve this page

Add this topic to your repo