18 changes: 18 additions & 0 deletions docker/Dockerfile.lmcache
@@ -0,0 +1,18 @@
ARG LMCACHE_VERSION=latest
FROM lmcache/vllm-openai:${LMCACHE_VERSION}

COPY . /opt/contextpilot
WORKDIR /opt/contextpilot

RUN /opt/venv/bin/python3 -m ensurepip && \
/opt/venv/bin/python3 -m pip install --no-cache-dir . && \
/opt/venv/bin/python3 contextpilot/install_hook.py

ENV CONTEXTPILOT_INDEX_URL=http://localhost:8765
EXPOSE 8000 8765

COPY docker/entrypoint-vllm.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
CMD ["Qwen/Qwen2.5-7B-Instruct", "--enable-prefix-caching", "--kv-transfer-config", "{\"kv_connector\":\"LMCacheConnectorV1\",\"kv_role\":\"kv_both\"}"]
39 changes: 37 additions & 2 deletions docs/getting_started/docker.md
@@ -40,8 +40,9 @@ Single container with both the engine and ContextPilot server.
### Build

```bash
docker build -t contextpilot-sglang -f docker/Dockerfile.sglang .
docker build -t contextpilot-vllm -f docker/Dockerfile.vllm .
docker build -t contextpilot-lmcache -f docker/Dockerfile.lmcache .
```

Pin a specific engine version:
@@ -75,6 +76,40 @@ docker run --gpus all --ipc=host \

Everything after the image name is passed to the engine. The default model for both images is `Qwen/Qwen3.5-2B`.

**vLLM + LMCache (KV cache CPU offloading):**

[LMCache](https://github.com/LMCache/LMCache) offloads KV cache to CPU/disk so evicted prefixes can be restored without recomputation. ContextPilot works with LMCache out of the box — the `BlockPool` hook is unaffected.
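The offload/restore behavior can be pictured with a toy two-tier cache (an illustration only, not LMCache's actual implementation): a small "GPU" tier evicts least-recently-used prefixes into a "CPU" tier, and a later lookup restores them instead of recomputing.

```python
from collections import OrderedDict

# Toy model of KV-cache offloading: evicted entries move to a CPU tier
# instead of being discarded, so a later hit restores them for free.
class TwoTierCache:
    def __init__(self, gpu_slots):
        self.gpu = OrderedDict()
        self.cpu = {}
        self.gpu_slots = gpu_slots
        self.recomputed = 0

    def get(self, prefix):
        if prefix in self.gpu:
            self.gpu.move_to_end(prefix)
            return self.gpu[prefix]
        if prefix in self.cpu:            # restored from the offload tier
            kv = self.cpu.pop(prefix)
        else:
            kv = f"kv({prefix})"          # stands in for a full prefill pass
            self.recomputed += 1
        self._put(prefix, kv)
        return kv

    def _put(self, prefix, kv):
        self.gpu[prefix] = kv
        if len(self.gpu) > self.gpu_slots:
            victim, victim_kv = self.gpu.popitem(last=False)
            self.cpu[victim] = victim_kv  # offload instead of discard

cache = TwoTierCache(gpu_slots=1)
cache.get("doc-A")   # prefill: computed
cache.get("doc-B")   # prefill: computed; doc-A offloaded to CPU
cache.get("doc-A")   # restored from CPU, not recomputed
print(cache.recomputed)  # 2
```

Without the CPU tier, the third call would be a third prefill; with it, only the first two prompts pay the compute cost.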

```bash
docker build -t contextpilot-lmcache -f docker/Dockerfile.lmcache .
```

Pin a specific LMCache version:

```bash
docker build -t contextpilot-lmcache -f docker/Dockerfile.lmcache --build-arg LMCACHE_VERSION=<tag> .
```

Run:

```bash
docker run --gpus all --ipc=host \
-p 8000:8000 -p 8765:8765 \
-e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
contextpilot-lmcache
```

Override the model or LMCache config:

```bash
docker run --gpus all --ipc=host \
-p 8000:8000 -p 8765:8765 \
-e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
contextpilot-lmcache \
Qwen/Qwen3-4B --enable-prefix-caching \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```
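The `--kv-transfer-config` value must be valid JSON, which is easy to mangle with shell quoting. One way to generate the exact flag value reliably (a sketch, not part of ContextPilot) is to build it in Python and paste the printed string:

```python
import json

# Build the kv-transfer-config payload as a dict so quoting is always
# valid JSON, regardless of the shell's escaping rules.
kv_config = {
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_both",
}
flag = json.dumps(kv_config, separators=(",", ":"))
print(flag)  # {"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}
```

The printed string is what goes after `--kv-transfer-config`, wrapped in single quotes as in the command above.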

## GPU Selection

```bash
13 changes: 13 additions & 0 deletions docs/getting_started/quickstart.md
@@ -175,6 +175,19 @@ python -m vllm.entrypoints.openai.api_server \
--enable-prefix-caching
```

**vLLM + LMCache (optional KV cache CPU offloading):**

[LMCache](https://github.com/LMCache/LMCache) offloads evicted KV cache to CPU/disk so prefixes can be restored without recomputation. Just install it and add the `--kv-transfer-config` flag — ContextPilot works with LMCache out of the box.

```bash
pip install lmcache
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-4B \
--port 30000 \
--enable-prefix-caching \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```

> **Note:** For eviction sync, prefix with `CONTEXTPILOT_INDEX_URL=http://localhost:8765`. This lets the inference engine notify ContextPilot when KV cache entries are evicted.
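The note above amounts to one extra environment variable on the launch command. A minimal launcher sketch (it assumes the same command and ports as above; the actual `Popen` call is left commented):

```python
import os
import shlex

# The only change versus the plain command is the CONTEXTPILOT_INDEX_URL
# environment variable, which enables eviction notifications.
env = dict(os.environ, CONTEXTPILOT_INDEX_URL="http://localhost:8765")

cmd = shlex.split(
    "python -m vllm.entrypoints.openai.api_server"
    " --model Qwen/Qwen3-4B --port 30000 --enable-prefix-caching"
)

# subprocess.Popen(cmd, env=env) would start vLLM so it can notify
# ContextPilot when KV cache entries are evicted.
```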

## Step 2: Start ContextPilot