Draft
241 commits
fe5720e
Add ggml-openvino base files
YangleiZouIntel Oct 29, 2024
5294402
add openvino as optional backend for Llama.cpp ggml
zhanmyz Nov 13, 2024
9b9d51d
* Configure the device(default CPU) that uses OpenVINO to compile th…
zhanmyz Nov 19, 2024
faa4a7d
Solve the issue of abnormal model output caused by using OpenVINO ADD…
zhanmyz Nov 21, 2024
adc2c70
Add OpenVINO MUL operator to GGML of Llama.cpp.
zhanmyz Dec 2, 2024
0a81aa1
Add compile options
zhanmyz Dec 2, 2024
77d6814
add OpenVINO frontend convert process steps
zhanmyz Dec 4, 2024
ee31dc1
add get openvino available ops function
zhanmyz Dec 5, 2024
171c468
Add PoC of integration of openvino frontend. Main changes: ggml-ov-fr…
yumengbo Nov 16, 2024
34e826a
Implement GgmlOvDecoder. Add dump functions.
yumengbo Nov 19, 2024
9b7b63d
Convert subgraph with add, sub, mul, div op to ov model and do infer …
yumengbo Nov 22, 2024
31bd816
Add GGML_OV_FRONTEND option. Add readme.
yumengbo Nov 22, 2024
5b46dc2
Change output for infer request to set output tensor. Support scale, …
yumengbo Dec 5, 2024
49804f4
add GET_ROWS operator of OpenVINO to GGML of llama.cpp
zhanmyz Dec 9, 2024
80c330a
Update build.md and add operation mapping(GGML to OpenVINO)
zhanmyz Dec 10, 2024
8c5a609
add the rms_norm operator implemented using OpenVINO to the GGML back…
zhanmyz Dec 16, 2024
e95f29c
Fix issue for output memory copy of infer request
yumengbo Dec 12, 2024
b100f89
Change to implementation following pytorch frontend
yumengbo Dec 12, 2024
590f587
Add support for UNARY SILU op . Fix pytorch impl bugs.
yumengbo Dec 17, 2024
d218c61
Support Softmax op
yumengbo Dec 18, 2024
8aba03b
Support Softmax op
yumengbo Dec 18, 2024
2353c73
Support ROPE op.
yumengbo Dec 21, 2024
0f7d07d
Add support for RMS_NORM OP
zhanmyz Dec 19, 2024
2b04bd4
Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML ba…
zhanmyz Jan 14, 2025
cb2729b
Move CPY from GGML OV Backend to OV Frontend
zhanmyz Jan 22, 2025
8484769
add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops
zhanmyz Feb 18, 2025
57582fd
add implementation of CPY when the output tensor is non-contiguous
zhanmyz Feb 19, 2025
afb8594
add tmp source code files
zhanmyz Feb 25, 2025
081b526
Execute single CONT operator is OK
zhanmyz Feb 25, 2025
901f734
Execute CONT & VIEW operators in OV Frontend is OK
zhanmyz Mar 1, 2025
95ae982
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion o…
zhanmyz Mar 3, 2025
9a7b7d8
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX…
zhanmyz Mar 5, 2025
f98d215
Change the input parameter shape of CONT operator
zhanmyz Mar 5, 2025
f37fa21
Change the input and output node shape of MUL_MAT operator
zhanmyz Mar 5, 2025
246a2d1
Change the input and output node shape of MUL_MAT operator
zhanmyz Mar 5, 2025
d05c458
change CONT and MULMAT input node shape
zhanmyz Mar 6, 2025
e08a7fd
All adjacent ops can be converted but calculation result is wrong and n…
zhanmyz Mar 6, 2025
cff473a
1. All operators implemented using OpenVINO can be successfully execu…
zhanmyz Mar 9, 2025
467a5dd
1. Update the implementation of CPY node when it's non-contiguous
zhanmyz Mar 11, 2025
b14b49d
Minor Update
zhanmyz Mar 11, 2025
19ec9b6
Try to add VIEW node to OV Frontend and have some issues that need to…
zhanmyz Mar 12, 2025
b02265a
1. In the Prompt process and predict first token stage, the PERMUTE n…
zhanmyz Mar 15, 2025
8020138
add debug info
zhanmyz Mar 17, 2025
8ae700a
Process Prompt and predict first token is OK
zhanmyz Mar 26, 2025
eac9a99
1. Solve the AC issue of Permute+VIEW and MULMAT issue in the phase o…
zhanmyz Mar 31, 2025
84be5c6
1. Delete some comments
zhanmyz Mar 31, 2025
651b2c0
* Use find_package in CMake to configure OpenVINO
wine99 Apr 14, 2025
91d2a19
change op mappings to list in openvino_supports_op
wine99 Apr 15, 2025
8d263bd
2nd+ token correct by fix CPY in OV, remove single op backend compute…
wine99 Apr 15, 2025
8b40886
Arbitrary token len (>32) work; Fix bug in mulmat
wine99 Apr 17, 2025
6ed44a3
FEAT: do PERMUTE eagerly
wine99 Apr 21, 2025
0c7b026
FEAT: Add interleaved mode for ROPE
wine99 Apr 22, 2025
c04966c
REFACTOR: support weights as constant
wine99 Apr 28, 2025
96ba47d
STYLE: minor refactor
wine99 Apr 28, 2025
d3bdca2
PERF: share const nodes for weights for diff infer
wine99 Apr 28, 2025
0a8cc9a
BUILD: update build doc, add cmake preset, add CACHE_DIR env var
wine99 Apr 29, 2025
7d5e234
FEAT: improve debug capability
wine99 Apr 30, 2025
a8e5efa
PERF: compile once (dynamic graph + cache)
wine99 May 8, 2025
ffabe95
Rebase - Bring up to date and fix build process
virajwad May 9, 2025
4c905b2
fix build error
wine99 May 13, 2025
a0b3052
FIX: backend buffer type issue
wine99 May 13, 2025
f15a2cc
STYLE: clang-format
wine99 May 9, 2025
0d009fe
FEAT: Add all conversion code from ov side
wine99 May 9, 2025
cdf5370
PERF: favor low precision matmul
wine99 May 13, 2025
0d505b4
STYLE and minor REFACTOR
wine99 May 13, 2025
041d220
FIX: Re-add tensor names in cgraph, Add another case for RESHAPE
wine99 May 14, 2025
c57f614
FIX: input shape of KQ_mask
wine99 May 14, 2025
a30dc6e
PERF: add weight constant in parallel
wine99 May 14, 2025
8ac5c22
FIX: set_max_token_len
wine99 May 16, 2025
d7cc802
PERF: use Slice+Concat in writing cache_v
wine99 May 16, 2025
fd32436
Update build doc
wine99 May 20, 2025
8ce5cc5
Add cgraph tensor output name to OV op name
wine99 May 22, 2025
3051d5a
Update openvino build instructions
ravi9 May 29, 2025
7fec223
Add initial NPU support
wine99 May 27, 2025
34531ab
draft NPU support version 2: prefill + kvcache
wine99 May 29, 2025
d9ca8f5
NPU support version 2: prefill + kvcache
wine99 Jun 3, 2025
f7ad779
Change due to ggml cgraph changes, not correct yet
wine99 Jun 4, 2025
592d7f8
Change due to ggml cgraph changes, llama-3.2 CPU work
wine99 Jun 16, 2025
e27738a
Add AMD64 to CMakeLists
wine99 Jun 16, 2025
42d4240
Change due to ggml cgraph changes, all device work
wine99 Jun 16, 2025
593484c
Refactor: clean, fix warning
wine99 Jun 20, 2025
8afee79
Update clang-format
wine99 Jun 23, 2025
4c582ac
Stateful transformation for CPU GPU
wine99 Jun 26, 2025
73ee84f
Add SwiGLU
wine99 Jul 3, 2025
ebc4fc9
Fuse to SDPA
wine99 Jul 3, 2025
bf5414c
Replace Concat with Broadcast in MulMat for GQA
wine99 Jul 4, 2025
acf358d
Pull out indices creation for kv cache update
wine99 Jul 6, 2025
0fa7a5e
Refactor: remove past_token_len from extra_inputs
wine99 Jul 9, 2025
3533c14
Fix Phi3 SwiGLU and SoftMax
wine99 Jul 9, 2025
a80da69
Pull out sin cos from rope
wine99 Jul 9, 2025
f3c0519
Reduce memory: free ov weights node after graph conversion
wine99 Jul 11, 2025
d61f83c
Fix CPY due to cgraph change
wine99 Jul 17, 2025
ea75772
Added OpenVINO CI/CD. Updated docs
ravi9 Jul 18, 2025
1ed49bb
Fix llama-cli
wine99 Jul 23, 2025
44f4cf3
Fix Phi3 ROPE; Add test-backend-ops
wine99 Jul 21, 2025
6dc4b90
Fix NPU
wine99 Jul 23, 2025
75eec62
Fix llama-bench; Clang-format
wine99 Jul 24, 2025
4e7f04a
Fix llama-perplexity
wine99 Jul 24, 2025
9cf56d6
temp. changes for mark decomp
cavusmustafa Jul 29, 2025
01cdf4a
matmul in fp32
wine99 Jul 29, 2025
e2fdc1b
mulmat input conversion fix
cavusmustafa Jul 30, 2025
93b2d09
mulmat type conversion update
cavusmustafa Jul 30, 2025
1a19566
add mark decomp pass
cavusmustafa Jul 30, 2025
43489bb
Revert changes in fuse_to_sdpa
wine99 Jul 30, 2025
2f99135
Update build.md
ravi9 Jul 31, 2025
fc86534
Fix test-backend-ops
wine99 Jul 31, 2025
1141350
Skip test-thread-safety; Run ctest only in ci/run.sh
wine99 Jul 31, 2025
37ff226
Use CiD for NPU
wine99 Aug 1, 2025
9a91ca6
Optimize tensor conversion, improve TTFT
wine99 Aug 4, 2025
63d000b
Support op SET_ROWS
wine99 Aug 13, 2025
7bda502
Fix NPU
wine99 Aug 14, 2025
839f8c6
Remove CPY
wine99 Aug 14, 2025
f4123be
Fix test-backend-ops
wine99 Aug 14, 2025
a7b611b
Minor updates for raising PR
wine99 Aug 14, 2025
14c8a85
Perf: RMS fused to OV internal RMS op
wine99 Aug 27, 2025
65e1b1a
Fix after rebasing
wine99 Sep 4, 2025
56d5967
Change openvino device_type to GPU; Enable flash_attn
wine99 Sep 5, 2025
3e897df
Update supports_buft and supports_op for quantized models
wine99 Aug 5, 2025
d4ca760
Add quant weight conversion functions from genai gguf reader
wine99 Aug 5, 2025
663a0b8
Quant models run with accuracy issue
wine99 Aug 6, 2025
6ab76ed
Fix accuracy: disable cpu_repack
wine99 Aug 7, 2025
dd80b04
Fix CI; Disable test-backend-ops
wine99 Aug 7, 2025
a1ce428
Fix Q4_1
wine99 Aug 8, 2025
9900245
Fix test-backend-ops: Treat quantized tensors as weights
wine99 Aug 12, 2025
9ca53c7
Add NPU Q4_0 support
wine99 Aug 19, 2025
82c9833
NPU perf: eliminate zp
wine99 Aug 22, 2025
b593428
Dequantize q4_1 q4_k q6_k for NPU
wine99 Aug 29, 2025
6926655
Add custom quant type: q8_1_c, q4_0_128
wine99 Sep 2, 2025
c5231a2
Set m_is_static=false as default in decoder
wine99 Sep 2, 2025
810eb48
Simplify translation of get_rows
wine99 Sep 2, 2025
0f7b253
Fix after rebasing
wine99 Sep 8, 2025
2ad1147
Improve debug util; Eliminate nop ReshapeReshape
wine99 Sep 10, 2025
dc77cbb
STYLE: make get_types_to_requant a function
wine99 Sep 10, 2025
bcc343a
Support BF16 model
wine99 Sep 11, 2025
434059a
Fix NPU compile
wine99 Sep 12, 2025
da2cc99
WA for npu 1st token acc issue
wine99 Sep 12, 2025
be07073
Apply EliminateZP only for npu
wine99 Sep 12, 2025
5975612
Add GeGLU
wine99 Sep 15, 2025
7d81861
Fix Hunyuan
wine99 Sep 15, 2025
9de874c
Support iSWA
wine99 Sep 16, 2025
602f9ca
Fix NPU accuracy
wine99 Sep 17, 2025
1a38339
Fix ROPE accuracy when freq_scale != 1
wine99 Sep 17, 2025
67e178a
Minor: not add attention_size_swa for non-swa model
wine99 Sep 17, 2025
2f1d50f
Minor refactor
wine99 Sep 19, 2025
e4bfe5a
Add Q5_K to support phi-3-q4_k_m
wine99 Sep 23, 2025
f3afa7b
Requantize Q6_K (gs16) to gs32 on GPU
wine99 Sep 26, 2025
fdadca1
Fix after rebasing
wine99 Sep 28, 2025
973a80f
Always apply Eliminate_ZP to fix GPU compile issue on some platforms
wine99 Sep 28, 2025
c112bc4
kvcachefusion support
cavusmustafa Oct 1, 2025
e725292
env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added
cavusmustafa Oct 1, 2025
05d7aba
Fix for Phi3
cavusmustafa Oct 2, 2025
a9371ea
Fix llama-cli (need to run with --no-warmup)
wine99 Oct 9, 2025
8b82d11
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_s…
wine99 Oct 10, 2025
299f492
fix after rebasing
wine99 Oct 11, 2025
2d2f00a
Fix llama-3-8b and phi3-mini q4_0 NPU
wine99 Oct 14, 2025
841d673
Update to OV-2025.3 and CMakeLists.txt
ravi9 Oct 15, 2025
4c8406e
Add OV CI cache
wine99 Oct 15, 2025
38e8a19
Apply CISC review and update CI to OV2025.3
ravi9 Oct 15, 2025
45af912
Update CI to run OV dep install before build
ravi9 Oct 15, 2025
3a1129e
Update OV dockerfile to use OV2025.3 and update build docs
ravi9 Oct 15, 2025
bd3093f
Style: use switch in supports_ops
wine99 Oct 21, 2025
eba8113
Style: middle ptr and ref align, omit optional struct keyword
wine99 Oct 21, 2025
b8690bc
NPU Unify PD (#14)
wine99 Nov 4, 2025
303923a
Clean placeholders in ggml-openvino.cpp
wine99 Oct 21, 2025
ea2c99b
NPU unify PD (handled internally)
wine99 Nov 5, 2025
072dde0
change graph to 4d, support multi sequences
wine99 Nov 20, 2025
ae404f7
Fix llama-bench
wine99 Nov 20, 2025
531941b
Fix NPU
wine99 Nov 24, 2025
047bfb5
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
11b4cc5
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
bed4952
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
4a57b37
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
98396b2
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
4400b5c
Update ggml-decoder.cpp
I-N-T-E-L Nov 20, 2025
ae93651
Remove the second decoder for node. Moving the function into the mode…
zhaixuejun1993 Nov 26, 2025
992dea7
Fix error for naive
zhaixuejun1993 Nov 26, 2025
38254cf
NPU prefill chunking
wine99 Dec 1, 2025
59e7e7c
NPU fix llama-bench
wine99 Dec 3, 2025
65348b5
fallback naive run with accuracy issue
wine99 Nov 27, 2025
808619e
NPU support llama-perplexity -b 512 --no-warmup
wine99 Dec 3, 2025
2a9d4ca
Refactor: split ov_graph_compute for dynamic and static
wine99 Dec 4, 2025
0ea8238
remove unused API GgmlOvDecoder::get_output_stride(const std::string …
zhaixuejun1993 Dec 4, 2025
8f4ee4e
minor update due to ov 2025.4
wine99 Dec 4, 2025
497964a
remove unused API GgmlOvDecoder::get_output_names()
zhaixuejun1993 Dec 4, 2025
f516db1
remove unused API get_output_shape(const std::string & name)
zhaixuejun1993 Dec 4, 2025
6d7a0d6
Modified API GgmlOvDecoder::get_output_type(const std::string & name)
zhaixuejun1993 Dec 4, 2025
ba852f2
Removed API GgmlOvDecoder::get_output_op_params(const std::string & n…
zhaixuejun1993 Dec 4, 2025
111c96c
Removed API get_output_ggml_tensor(const std::string & name)
zhaixuejun1993 Dec 4, 2025
8ff73e5
Removed API m_outputs
zhaixuejun1993 Dec 4, 2025
197ed99
Removed m_output_names
zhaixuejun1993 Dec 4, 2025
95c3071
Removed API GgmlOvDecoder::get_input_names()
zhaixuejun1993 Dec 4, 2025
cd61178
Removed API GgmlOvDecoder::get_input_stride(const std::string& name)
zhaixuejun1993 Dec 4, 2025
891a3be
Removed API get_input_type
zhaixuejun1993 Dec 4, 2025
42ca27f
Removed API get_input_type
zhaixuejun1993 Dec 4, 2025
acb8a01
Removed API GgmlOvDecoder::get_input_shape(const std::string & name)
zhaixuejun1993 Dec 4, 2025
47c91db
Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)
zhaixuejun1993 Dec 4, 2025
91a1b20
Fix error for decoder cache
zhaixuejun1993 Dec 5, 2025
28da9a9
Reuse cached decoder
wine99 Dec 5, 2025
469325c
GPU remove Q6_K requantization
wine99 Dec 8, 2025
ae01322
NPU fix wrong model output shape
wine99 Dec 8, 2025
c9234b4
NPU fix q4 perf regression
wine99 Dec 8, 2025
9e3163e
Remove unused variable nodes
zhaixuejun1993 Dec 10, 2025
0ef2e5e
Fix decoder can_reuse for llama-bench
wine99 Dec 11, 2025
ae53363
Update build.md for Windows
I-N-T-E-L Dec 26, 2025
22d9c17
backend buffer: allocate on host
wine99 Dec 18, 2025
72bba82
Use shared_buffer for GPU NPU; Refactor
wine99 Dec 18, 2025
3fdcb6a
Add ov_backend_host_buffer; Use cached remote context
wine99 Dec 19, 2025
d757849
Put kvcache on GPU
wine99 Dec 22, 2025
8273a7c
Use ggml_aligned_malloc
wine99 Dec 24, 2025
88d1d17
only use remote tensor for kvcache
wine99 Dec 25, 2025
a356b44
only use remote tensor for kvcache for GPU
wine99 Dec 25, 2025
cfc4713
FIX: use remote tensor from singleton
wine99 Dec 26, 2025
52a4401
Update build.md to include OpenCL
wine99 Dec 26, 2025
c1142dd
NPU always requant to q4_0_128
wine99 Dec 26, 2025
67c9720
Optimize symmetric quant weight extraction: use single zp
wine99 Dec 29, 2025
4e45177
Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant
wine99 Dec 29, 2025
f5c71e3
Update build.md
wine99 Dec 30, 2025
0d6f253
Support -ctk f32
wine99 Jan 7, 2026
5f30eac
Initial stateful graph support
cavusmustafa Jan 8, 2026
d2fc152
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa Jan 9, 2026
981ec65
code cleanup
cavusmustafa Jan 9, 2026
a40a5df
npu perf fix
cavusmustafa Jan 9, 2026
a81b202
requant to f16 for Q6 embed on NPU
cavusmustafa Jan 12, 2026
a92ecee
Update ggml/src/ggml-openvino/ggml-decoder.cpp
cavusmustafa Jan 13, 2026
599335c
Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp
cavusmustafa Jan 13, 2026
416556a
Create OPENVINO.md in llama.cpp backend docs
ynimmaga Jan 13, 2026
25e6525
Update OPENVINO.md
ynimmaga Jan 13, 2026
9ba3247
Update OPENVINO.md
ynimmaga Jan 13, 2026
61552e4
Update OPENVINO.md
ynimmaga Jan 13, 2026
63eed0d
Update build.md
ynimmaga Jan 13, 2026
f44c60e
Update OPENVINO.md
ynimmaga Jan 13, 2026
e9ed5c4
Update OPENVINO.md
ynimmaga Jan 13, 2026
d3649c1
Update OPENVINO.md
ynimmaga Jan 13, 2026
d7dccf8
kq_mask naming fix
cavusmustafa Jan 15, 2026
aa4bc90
Syntax correction for workflows build file
cavusmustafa Jan 16, 2026
9a15c8b
Change ov backend buffer is_host to false
wine99 Jan 21, 2026
be2d4b6
Merge pull request #34 from ravi9/fix-backend-buffer-is-host
cavusmustafa Jan 21, 2026
e0c377f
Fix llama-bench -p -n where p<=256
wine99 Jan 22, 2026
ff9bb1a
Fix --direct-io 0
wine99 Jan 22, 2026
a6eafbc
Stateful fix for shape errors after rebase
cavusmustafa Jan 21, 2026
e4c1c5b
Simplification for stateful and update output shape processing
cavusmustafa Jan 26, 2026
134 changes: 134 additions & 0 deletions .devops/openvino.Dockerfile
@@ -0,0 +1,134 @@
ARG OPENVINO_VERSION_MAJOR=2025.3
ARG OPENVINO_VERSION_FULL=2025.3.0.19807.44526285f24
ARG UBUNTU_VERSION=24.04

# Optional proxy build arguments - empty by default
ARG http_proxy=
ARG https_proxy=

## Build Image
FROM ubuntu:${UBUNTU_VERSION} AS build

# Pass proxy args to build stage
ARG http_proxy
ARG https_proxy

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates \
        gnupg \
        wget \
        git \
        cmake \
        ninja-build \
        build-essential \
        libtbb12 \
        libcurl4-openssl-dev && \
    rm -rf /var/lib/apt/lists/*

# Install OpenVINO for Ubuntu 24.04
ARG OPENVINO_VERSION_MAJOR
ARG OPENVINO_VERSION_FULL
RUN mkdir -p /opt/intel && \
    wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
    tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
    mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
    cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
    echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \
    cd - && \
    ln -s /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} /opt/intel/openvino

ENV OpenVINO_DIR=/opt/intel/openvino

WORKDIR /app

COPY . .

# Build Stage
RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \
    cmake -B build/ReleaseOV -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_OPENVINO=ON && \
    cmake --build build/ReleaseOV -j$(nproc)"

# Copy all necessary libraries
RUN mkdir -p /app/lib && \
    find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \
    find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \
    find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \;

# Create runtime directories and copy binaries
RUN mkdir -p /app/full \
    && cp build/ReleaseOV/bin/* /app/full/ \
    && cp *.py /app/full \
    && cp -r gguf-py /app/full \
    && cp -r requirements /app/full \
    && cp requirements.txt /app/full \
    && cp .devops/tools.sh /app/full/tools.sh

## Base Runtime Image
FROM ubuntu:${UBUNTU_VERSION} AS base

# Pass proxy args to runtime stage
ARG http_proxy
ARG https_proxy

RUN apt-get update \
    && apt-get install -y libgomp1 libtbb12 curl \
    && apt autoremove -y \
    && apt clean -y \
    && rm -rf /tmp/* /var/tmp/* \
    && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
    && find /var/cache -type f -delete

COPY --from=build /app/lib/ /app/

### Full (all binaries)
FROM base AS full

ARG http_proxy
ARG https_proxy

COPY --from=build /app/full /app/

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        git \
        python3 \
        python3-venv \
        python3-pip && \
    python3 -m venv /ov-venv && \
    /ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
    /ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /tmp/* /var/tmp/* && \
    find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
    find /var/cache -type f -delete

ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"]


### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app/

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app/

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
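The `full` stage's ENTRYPOINT relies on the `bash -c` idiom: bash assigns the first argument after the command string to `$0`, so the trailing `"--"` acts as a placeholder and the arguments given to `docker run` reach `tools.sh` unchanged through `"$@"`. A minimal sketch of that behavior, runnable wherever bash is installed; the function name `inner_args` is hypothetical and not part of the PR:

```shell
# Hypothetical helper mimicking
#   ENTRYPOINT ["/bin/bash", "-c", "... exec /app/tools.sh \"$@\"", "--"]
# The inner bash receives "--" as $0 and the caller's arguments as $1, $2, ...
inner_args() {
    bash -c 'echo "0=$0"; for a in "$@"; do echo "arg=$a"; done' -- "$@"
}
```

Calling `inner_args --run "-m model.gguf"` prints `0=--` followed by one `arg=` line per argument, with the quoted argument kept as a single word. Without the placeholder, the first user argument would be swallowed as `$0`.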
25 changes: 25 additions & 0 deletions .github/actions/linux-setup-openvino/action.yml
@@ -0,0 +1,25 @@
name: "Linux - Setup OpenVINO Toolkit"
description: "Setup OpenVINO Toolkit for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version_major:
    description: "OpenVINO major version (e.g., 2025.3)"
    required: true
  version_full:
    description: "OpenVINO full version (e.g., 2025.3.0.19807.44526285f24)"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup OpenVINO Toolkit
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://storage.openvinotoolkit.org/repositories/openvino/packages/${{ inputs.version_major }}/linux/openvino_toolkit_ubuntu24_${{ inputs.version_full }}_x86_64.tgz
        path: ${{ inputs.path }}
        type: z
        strip: 1
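The action builds its download URL from two separately passed inputs, `version_major` and `version_full`. A minimal POSIX sh sketch of that URL template (mirroring the `url:` line of the action and the Dockerfile's `wget` line), plus a hypothetical sanity check that the full version actually belongs to the major version; both function names are illustrative, not part of the PR:

```shell
# Hypothetical helper: assemble the archive URL the same way the action's
# `url:` input does (Ubuntu 24.04, x86_64 package only).
ov_package_url() {
    printf 'https://storage.openvinotoolkit.org/repositories/openvino/packages/%s/linux/openvino_toolkit_ubuntu24_%s_x86_64.tgz\n' \
        "$1" "$2"
}

# Hypothetical check: the full version string should start with "<major>."
ov_versions_consistent() {
    case "$2" in
        "$1".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `ov_versions_consistent 2025.3 2025.3.0.19807.44526285f24` succeeds, while pairing `2025.3` with a 2025.4 build string fails, which catches a half-updated version bump early.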

28 changes: 28 additions & 0 deletions .github/workflows/build-cache.yml
@@ -63,6 +63,34 @@ jobs:
          path: ./spacemit_toolchain
          version: ${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}

  ubuntu-24-openvino-cache:
    runs-on: ubuntu-24.04

    env:
      # Make sure this is in sync with build.yml
      OPENVINO_VERSION_MAJOR: "2025.3"
      OPENVINO_VERSION_FULL: "2025.3.0.19807.44526285f24"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-openvino
        with:
          path: ./openvino_toolkit
          key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

      - name: Setup OpenVINO Toolkit
        if: steps.cache-openvino.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-openvino
        with:
          path: ./openvino_toolkit
          version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
          version_full: ${{ env.OPENVINO_VERSION_FULL }}

  windows-2022-rocm-cache:
    runs-on: windows-2022

55 changes: 55 additions & 0 deletions .github/workflows/build.yml
@@ -737,6 +737,61 @@ jobs:
            -DGGML_SYCL_F16=ON
          cmake --build build --config Release -j $(nproc)

  ubuntu-24-cmake-openvino:
    runs-on: ubuntu-24.04

    env:
      # Make sure this is in sync with build-cache.yml
      OPENVINO_VERSION_MAJOR: "2025.3"
      OPENVINO_VERSION_FULL: "2025.3.0.19807.44526285f24"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: ccache
        uses: ggml-org/ccache-action@v1.2.16
        with:
          key: ubuntu-24-cmake-openvino-no-preset-v1
          evict-old-files: 1d

      - name: Dependencies
        id: depends
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential libcurl4-openssl-dev libtbb12 cmake ninja-build python3-pip

      - name: Use OpenVINO Toolkit Cache
        uses: actions/cache@v4
        id: cache-openvino
        with:
          path: ./openvino_toolkit
          key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

      - name: Setup OpenVINO Toolkit
        if: steps.cache-openvino.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-openvino
        with:
          path: ./openvino_toolkit
          version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
          version_full: ${{ env.OPENVINO_VERSION_FULL }}

      - name: Install OpenVINO dependencies
        run: |
          cd ./openvino_toolkit
          chmod +x ./install_dependencies/install_openvino_dependencies.sh
          echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh

      - name: Build
        id: cmake_build
        run: |
          source ./openvino_toolkit/setupvars.sh
          cmake -B build/ReleaseOV -G Ninja \
            -DCMAKE_BUILD_TYPE=Release \
            -DGGML_OPENVINO=ON
          cmake --build build/ReleaseOV --config Release -j $(nproc)

  build-linux-cross:
    uses: ./.github/workflows/build-linux-cross.yml

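To reproduce the `Build` step outside CI, the same commands can be wrapped in a small script. This is a sketch under two assumptions: the OpenVINO archive is unpacked at `./openvino_toolkit` as in the workflow, and `DRY_RUN=1` (a hypothetical switch, not something the PR defines) prints the commands instead of running them. Run it from bash, as the workflow does, since it sources OpenVINO's `setupvars.sh`:

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
ov_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical mirror of the CI "Build" step; the build dir defaults to build/ReleaseOV.
ov_local_build() {
    if [ "${DRY_RUN:-0}" != 1 ]; then
        [ -f ./openvino_toolkit/setupvars.sh ] || return 1
        # setupvars.sh prepares the OpenVINO environment for this shell
        . ./openvino_toolkit/setupvars.sh || return 1
    fi
    ov_run cmake -B "${1:-build/ReleaseOV}" -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DGGML_OPENVINO=ON &&
    ov_run cmake --build "${1:-build/ReleaseOV}" -j "$(nproc)"
}
```

`DRY_RUN=1 ov_local_build` is a cheap way to compare the local command line with the workflow before spending time on a full build.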
1 change: 1 addition & 0 deletions .github/workflows/docker.yml
@@ -47,6 +47,7 @@ jobs:
          - { tag: "vulkan", dockerfile: ".devops/vulkan.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04" }
          - { tag: "s390x", dockerfile: ".devops/s390x.Dockerfile", platforms: "linux/s390x", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04-s390x" }
          - { tag: "rocm", dockerfile: ".devops/rocm.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: true, runs_on: "ubuntu-22.04" }
          - { tag: "openvino", dockerfile: ".devops/openvino.Dockerfile", platforms: "linux/amd64", full: true, light: true, server: true, free_disk_space: false, runs_on: "ubuntu-22.04" }
    steps:
      - name: Check out the repo
        uses: actions/checkout@v4
72 changes: 72 additions & 0 deletions .github/workflows/release.yml
@@ -231,6 +231,78 @@ jobs:
          path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.tar.gz
          name: llama-bin-ubuntu-vulkan-x64.tar.gz

  ubuntu-24-openvino:
    runs-on: ubuntu-24.04

    env:
      # Make sure this is in sync with build.yml
      OPENVINO_VERSION_MAJOR: "2025.3"
      OPENVINO_VERSION_FULL: "2025.3.0.19807.44526285f24"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: ccache
        uses: ggml-org/ccache-action@v1.2.16
        with:
          key: ubuntu-24-cmake-openvino-release-no-preset-v1
          evict-old-files: 1d

      - name: Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential libcurl4-openssl-dev libtbb12 cmake ninja-build python3-pip

      - name: Use OpenVINO Toolkit Cache
        uses: actions/cache@v4
        id: cache-openvino
        with:
          path: ./openvino_toolkit
          key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

      - name: Setup OpenVINO Toolkit
        if: steps.cache-openvino.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-openvino
        with:
          path: ./openvino_toolkit
          version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
          version_full: ${{ env.OPENVINO_VERSION_FULL }}

      - name: Install OpenVINO dependencies
        run: |
          cd ./openvino_toolkit
          chmod +x ./install_dependencies/install_openvino_dependencies.sh
          echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh

      - name: Build
        id: cmake_build
        run: |
          source ./openvino_toolkit/setupvars.sh
          cmake -B build/ReleaseOV -G Ninja \
            -DCMAKE_BUILD_TYPE=Release \
            -DGGML_OPENVINO=ON
          cmake --build build/ReleaseOV --config Release -j $(nproc)

      - name: Determine tag name
        id: tag
        uses: ./.github/actions/get-tag-name

      - name: Pack artifacts
        id: pack_artifacts
        run: |
          cp LICENSE ./build/ReleaseOV/bin/
          zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-openvino-x64.zip ./build/ReleaseOV/bin/*

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-openvino-x64.zip
          name: llama-bin-ubuntu-openvino-x64.zip

  windows-cpu:
    runs-on: windows-2025

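`build.yml`, `build-cache.yml` and `release.yml` each pin `OPENVINO_VERSION_MAJOR` and `OPENVINO_VERSION_FULL` separately, with comments asking that they be kept in sync, and the shared cache key is derived from `OPENVINO_VERSION_FULL`. A hypothetical drift check (not part of the PR) is sketched below. It assumes each value appears on one line as `KEY: "value"`, so it does not see the Dockerfile's `ARG` defaults:

```shell
# Hypothetical drift check: succeed only if every file sets KEY to one and the
# same non-empty value.  Usage: ov_env_in_sync KEY file...
ov_env_in_sync() {
    _ov_key="$1"
    shift
    _ov_vals="$(grep -h "${_ov_key}:" "$@" \
        | sed -E "s/.*${_ov_key}:[[:space:]]*\"?([^\"]*)\"?.*/\1/" \
        | sort -u)"
    # Exactly one distinct value means the files agree
    [ -n "$_ov_vals" ] && [ "$(printf '%s\n' "$_ov_vals" | wc -l | tr -d ' ')" = 1 ]
}
```

For example, `ov_env_in_sync OPENVINO_VERSION_FULL .github/workflows/build.yml .github/workflows/build-cache.yml .github/workflows/release.yml` fails as soon as one workflow is bumped without the others.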
12 changes: 12 additions & 0 deletions ci/run.sh
@@ -25,6 +25,9 @@
# # with KLEIDIAI support
# GG_BUILD_KLEIDIAI=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
#
# # with OPENVINO support
# GG_BUILD_OPENVINO=1 GG_BUILD_LOW_PERF=1 GGML_OPENVINO_DEVICE=CPU bash ./ci/run.sh ./tmp/results ./tmp/mnt
#

if [ -z "$2" ]; then
echo "usage: $0 <output-dir> <mnt-dir>"
@@ -165,6 +168,15 @@ if [ -n "${GG_BUILD_KLEIDIAI}" ]; then
-DBUILD_SHARED_LIBS=OFF"
fi

if [ -n "${GG_BUILD_OPENVINO}" ]; then
    if [ -z "${OpenVINO_DIR}" ]; then
        echo "OpenVINO_DIR is not set. Install OpenVINO from the archive and enable it with:"
        echo "source /opt/intel/openvino/setupvars.sh"
        exit 1
    fi
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_OPENVINO=ON -DGGML_CPU_REPACK=OFF"
fi

## helpers

# download a file if it does not exist or if it is outdated
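The `GG_BUILD_OPENVINO` block in `ci/run.sh` refuses to configure unless the OpenVINO environment has been set up, then appends the OpenVINO flags and turns off CPU weight repacking (the commit history also records disabling `cpu_repack` to fix accuracy). A minimal sketch of the same guard as a reusable, testable function; the name `ov_cmake_extra` and returning the flags on stdout are illustrative assumptions, not how the script is written:

```shell
# Hypothetical helper: print the extra CMake flags for an OpenVINO build, or
# fail if the OpenVINO environment (OpenVINO_DIR) has not been set up.
ov_cmake_extra() {
    if [ -z "${OpenVINO_DIR:-}" ]; then
        echo "OpenVINO_DIR is not set; run: source /opt/intel/openvino/setupvars.sh" >&2
        return 1
    fi
    # Same flags that ci/run.sh appends to CMAKE_EXTRA
    printf '%s\n' "${1:-} -DGGML_OPENVINO=ON -DGGML_CPU_REPACK=OFF"
}
```

A caller would use it as `CMAKE_EXTRA="$(ov_cmake_extra "$CMAKE_EXTRA")" || exit 1`, keeping the error message on stderr and the flags on stdout.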