A simplified recreation of the batching logic behind vLLM and SGLang. Implements:
- Rolling dynamic batches
- Concurrent generation
- Deadlines (soft/hard)
- Per-token scheduling
- Mid-generation request admission
- Variable sequence lengths
- Maximum batch size and maximum latency constraints
- Fair queueing
- Throughput benchmarks
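
To make the rolling-batch and per-token scheduling ideas concrete, here is a minimal sketch of the core loop. It is not the code from this repository, nor from vLLM or SGLang: `Request`, `fake_decode_step`, and `run_rolling_batch` are hypothetical names, the "model" is a stub that emits one placeholder token per active request, and time is counted in decode steps rather than seconds. Deadlines, fairness beyond plain FIFO order, and latency limits are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: str
    max_new_tokens: int       # hypothetical per-request generation budget
    arrival_step: int = 0     # decode step at which the request shows up
    tokens: list = field(default_factory=list)
    finished_step: int = -1   # filled in when the request completes


def fake_decode_step(batch):
    """Stand-in for one forward pass: one new token per active request."""
    return [f"{r.rid}:{len(r.tokens)}" for r in batch]


def run_rolling_batch(requests, max_batch=2):
    """Admit, decode one token, retire finished requests, repeat."""
    pending = deque(sorted(requests, key=lambda r: r.arrival_step))
    waiting = deque()  # arrived but not yet admitted, served in FIFO order
    active = []        # requests currently occupying a batch slot
    done = []
    step = 0
    while pending or waiting or active:
        # Requests that have arrived by this step join the waiting queue.
        while pending and pending[0].arrival_step <= step:
            waiting.append(pending.popleft())
        # Fill free slots, so new requests can join while others are mid-generation.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One token per active request, whatever each sequence's current length.
        for req, tok in zip(active, fake_decode_step(active)):
            req.tokens.append(tok)
        # Retire finished sequences right away so their slots free up.
        still_active = []
        for req in active:
            if len(req.tokens) >= req.max_new_tokens:
                req.finished_step = step
                done.append(req)
            else:
                still_active.append(req)
        active = still_active
        step += 1
    return done, step
```

As a usage example, take `max_batch=2` with requests A (3 tokens) and B (1 token) arriving at step 0, and C (2 tokens) arriving at step 1. B finishes after the first step, C takes over its slot while A is still generating, and the loop completes in 3 decode steps. A static scheduler that waits for the whole A+B batch before starting C would need 5.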