This repository utilizes Docker to package large language models and multimodal models optimized for Rockchip platforms. It provides a unified calling interface that is compatible with the OpenAI API, making it easy for users to integrate and use these models.
Supported devices: reComputer RK3588 and reComputer RK3576.
| Device | Model images |
|---|---|
| RK3588 | `rk3588-qwen2-vl:7b-w8a8-latest`, `rk3588-qwen2-vl:2b-w8a8-latest` |
| RK3576 | `rk3576-qwen2.5-vl:3b-w4a16-latest` |
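Since the containers expose an OpenAI-compatible interface, they can be queried with an ordinary HTTP client. The sketch below assumes a hypothetical endpoint at `http://localhost:8080/v1/chat/completions` — adjust the host, port, and model tag to match how you published the container's port.

```python
# Minimal sketch of calling the OpenAI-compatible endpoint.
# The URL and port are assumptions; check your container's run command.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt, model="rk3588-qwen2-vl:7b-w8a8-latest"):
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

if __name__ == "__main__":
    import requests  # pip install requests

    payload = build_request("Describe this image in one sentence.")
    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

The payload shape follows the standard OpenAI chat-completions schema, so existing OpenAI client libraries can usually be pointed at the local endpoint by overriding the base URL.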
Note: The reported inference speed is a rough estimate that covers both TTFT (time to first token) and TPOT (time per output token). Run `python test_inference_speed.py --help` to view the available options.
```shell
python -m venv .env && source .env/bin/activate
pip install requests
python llm_speed_test.py
```

Reference: rknn-llm