Add OV fallback-to-CPU mechanism & verified with OV without ROPE support by zhaixuejun1993 · Pull Request #42 · ravi9/llama.cpp

zhaixuejun1993 · 2026-02-09T05:21:36Z

Enables a mechanism that lets the OpenVINO backend fall back to the llama.cpp CPU backend.
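To picture what a fallback means for a single graph, the sketch below groups consecutive nodes by the backend that would execute them. It is a minimal illustration under stated assumptions, not the PR's implementation: `Node`, `Backend`, `Segment`, `ov_supports_op()` and `partition_for_fallback()` are hypothetical names, and ROPE is hard-coded as unsupported to mirror the "OV without ROPE support" verification named in the title.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a ggml graph node and the two backends.
enum class Backend { OpenVINO, CPU };

struct Node {
    std::string op;  // op name, e.g. "MUL_MAT" or "ROPE"
};

// A run of consecutive nodes that execute on the same backend.
struct Segment {
    Backend backend;
    std::vector<std::size_t> nodes;  // node indices, in execution order
};

// Hypothetical support check: ROPE is treated as unsupported by OpenVINO,
// mirroring the "OV without ROPE support" verification setup.
bool ov_supports_op(const Node& n) {
    return n.op != "ROPE";
}

// Assign each node to OpenVINO if supported, otherwise to the CPU fallback,
// and merge neighbouring nodes that share a backend into one segment.
std::vector<Segment> partition_for_fallback(const std::vector<Node>& graph) {
    std::vector<Segment> segments;
    for (std::size_t i = 0; i < graph.size(); ++i) {
        const Backend b = ov_supports_op(graph[i]) ? Backend::OpenVINO : Backend::CPU;
        if (segments.empty() || segments.back().backend != b) {
            segments.push_back(Segment{b, {}});
        }
        segments.back().nodes.push_back(i);
    }
    return segments;
}
```

With a ROPE node in the middle of a graph, this yields three segments (OpenVINO, CPU, OpenVINO). Each boundary is a point where intermediate tensors must cross between backends, which is the kind of boundary the extra model inputs and outputs in the summary are meant to serve.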

Below is a summary of the main process:

Dynamic Dimension Computation
- Function: compute_cgraph_dynamic_dims()
- Purpose: Determines the dynamic dimensions of each node in the computation graph, which is required to handle nodes whose shapes vary at runtime.
- Process:
  - Traverses the computation graph.
  - Assigns dynamic dimension indices to nodes based on their operation type and dependencies.
  - Handles specific operations such as [GGML_OP_VIEW] and [GGML_OP_RESHAPE], among others, to propagate dynamic dimensions.
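A simplified sketch of dynamic-dimension propagation follows. It assumes nodes are stored in topological order, that graph inputs arrive with their dynamic axis already marked (for example the token axis), and that a VIEW or RESHAPE output carries the dynamic axis on the output axis whose extent matches the source's dynamic extent. `DNode` and `compute_dynamic_dims_sketch()` are hypothetical; the per-op rules in the actual compute_cgraph_dynamic_dims() may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified node: ggml-style 4-D extents plus source indices.
struct DNode {
    std::string op;              // "INPUT", "VIEW", "RESHAPE", "ADD", ...
    std::array<int64_t, 4> ne;   // extents, ne[0] innermost (ggml convention)
    std::vector<int> src;        // indices of source nodes (earlier in the list)
    int dyn_dim = -1;            // axis that varies at runtime, -1 if static
};

// Nodes are assumed to be in topological order; inputs have dyn_dim preset.
void compute_dynamic_dims_sketch(std::vector<DNode>& nodes) {
    for (auto& n : nodes) {
        if (n.src.empty()) continue;  // inputs/weights keep their preset value

        // Find the first source that carries a dynamic axis.
        const DNode* dyn_src = nullptr;
        for (int s : n.src) {
            if (nodes[s].dyn_dim >= 0) { dyn_src = &nodes[s]; break; }
        }
        if (!dyn_src) { n.dyn_dim = -1; continue; }

        if (n.op == "VIEW" || n.op == "RESHAPE") {
            // The dynamic axis may move: assume it lands on the output axis
            // whose extent equals the source's dynamic extent.
            const int64_t extent = dyn_src->ne[dyn_src->dyn_dim];
            n.dyn_dim = -1;
            for (int a = 0; a < 4; ++a) {
                if (n.ne[a] == extent) { n.dyn_dim = a; break; }
            }
        } else {
            // Other ops in this sketch simply keep the source's dynamic axis.
            n.dyn_dim = dyn_src->dyn_dim;
        }
    }
}
```

The extent-matching rule is only a heuristic: it becomes ambiguous when several output axes have the same size as the dynamic one, which is one reason a real implementation needs dedicated handling for ops like VIEW and RESHAPE.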
Adding Extra Model Outputs
- Function: add_extra_model_outputs_for_fallback()
- Purpose: Ensures that all relevant nodes in the computation graph are exposed as model outputs in fallback scenarios.
- Process:
  - Maps tensor data addresses to their corresponding nodes, excluding [GGML_OP_VIEW] nodes.
  - Adds each node to the [m_model_outputs] map if it is not already present.
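The address-based bookkeeping can be sketched as follows. The assumption here is that when part of the graph runs on the CPU, any tensor it reads from the OpenVINO part must be materialized as a model output. `ONode` and `add_extra_outputs_sketch()` are hypothetical, keying the outputs map by tensor name is an assumption, and letting the last non-VIEW node at a shared address win is also an assumption (the description does not say how shared addresses are resolved).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified node: a name, an op name, and its data buffer address.
struct ONode {
    std::string name;
    std::string op;
    const void* data;
};

// Sketch: expose non-VIEW nodes as extra model outputs so that a CPU fallback
// op can read results produced inside the OpenVINO graph.
void add_extra_outputs_sketch(const std::vector<ONode>& nodes,
                              std::map<std::string, const ONode*>& model_outputs) {
    // VIEW nodes alias another tensor's memory, so they are skipped here.
    std::map<const void*, const ONode*> addr_to_node;
    for (const auto& n : nodes) {
        if (n.op == "VIEW") continue;
        addr_to_node[n.data] = &n;  // assumption: the last node at an address wins
    }
    for (const auto& entry : addr_to_node) {
        const ONode* node = entry.second;
        model_outputs.emplace(node->name, node);  // no-op if the name is already present
    }
}
```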
Adding Extra Model Inputs
- Function: add_extra_model_inputs_for_fallback()
- Purpose: Ensures that all necessary input nodes are included as model inputs in fallback scenarios.
- Process:
  - Iterates through the source nodes of each computation graph node.
  - Skips source nodes already present in [m_model_weights] or [m_model_inputs].
  - Excludes intermediate nodes listed in [m_node_info_list].
  - Creates OpenVINO Parameter nodes for the remaining eligible source nodes and updates the [m_inputs] and [m_model_inputs] maps.
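A minimal sketch of these input-selection rules, with tensor names standing in for tensors. `INode`, `Param` and `add_extra_inputs_sketch()` are hypothetical; `Param` is only a placeholder for an OpenVINO Parameter node, and the sketch updates a single map where the PR updates both [m_inputs] and [m_model_inputs]. "Intermediate" is assumed to mean any tensor produced by a node of the same graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified node: its output name and the names of its sources.
struct INode {
    std::string name;
    std::vector<std::string> src;
};

// Hypothetical placeholder for an OpenVINO Parameter node.
struct Param {
    std::string name;
};

void add_extra_inputs_sketch(const std::vector<INode>& graph,
                             const std::set<std::string>& weights,
                             std::map<std::string, Param>& model_inputs) {
    // Tensors produced inside this graph are intermediates, not model inputs.
    std::set<std::string> produced;
    for (const auto& n : graph) produced.insert(n.name);

    for (const auto& n : graph) {
        for (const auto& s : n.src) {
            if (weights.count(s) || model_inputs.count(s)) continue;  // already handled
            if (produced.count(s)) continue;                           // intermediate result
            model_inputs.emplace(s, Param{s});  // e.g. a tensor computed by a CPU fallback op
        }
    }
}
```

In this sketch, a source tensor that is neither a weight, an existing input, nor produced inside the OpenVINO graph (for example the output of a ROPE node executed on the CPU) becomes a new model input.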

…ted individually. 2. The VIEW op output tensor shape is not the same as the CONT (non-contiguous) input tensor shape. 3. CPY (non-contiguous) can't be implemented with the original input/output tensor shape and data (the original shape needs to be changed when creating the input/output tensor). Currently, the VIEW op is executed in the ggml backend and the others are executed in the OpenVINO Frontend.

zhanmyz and others added 30 commits January 15, 2026 10:05

Update build.md and add operation mapping (GGML to OpenVINO)

80c330a

add the rms_norm operator implemented using OpenVINO to the GGML backend of llama.cpp

8c5a609

Fix issue for output memory copy of infer request

e95f29c

Change to implementation following pytorch frontend

b100f89

Add support for UNARY SILU op. Fix pytorch impl bugs.

590f587

Support Softmax op

d218c61

Support Softmax op

8aba03b

Support ROPE op.

2353c73

Add support for RMS_NORM OP

0f7d07d

Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend

2b04bd4

Move CPY from GGML OV Backend to OV Frontend

cb2729b

add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops

8484769

add implementation of CPY when the output tensor is non-contiguous

57582fd

add tmp source code files

afb8594

Execute single CONT operator is OK

081b526

Execute CONT & VIEW operators in OV Frontend is OK

901f734

OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs

95ae982

OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion

9a7b7d8

Change the input parameter shape of CONT operator

f98d215

Change the input and output node shape of MUL_MAT operator

f37fa21

Change the input and output node shape of MUL_MAT operator

246a2d1

change CONT and MULMAT input node shape

d05c458

All adjacent ops can be converted, but the calculation result is wrong and needs debugging

e08a7fd

1. Update the implementation of CPY node when it's non-contiguous

467a5dd

2. Remove duplicate get node operation function

Minor Update

b14b49d

Try to add VIEW node to OV Frontend and have some issues that need to be dealt with

19ec9b6

1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend
2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.

b02265a

add debug info

8020138

Process Prompt and predict first token is OK

8ae700a

wine99 and others added 8 commits January 27, 2026 10:21

Remove hardcode names

e480d5b

Fix stateful shapes

4f51bc8

Simplification for stateful and update output shape processing

4a8fd24

Remove hardcode names

750a04a

Avoid re-compilation in llama-bench

47346d0

Extract zp directly instead of bias

907d832

Refactor weight tensor processing

6d71ded

Add OV fallback-to-CPU mechanism & verified with OV without ROPE support

f60ee79

github-actions bot added the ggml label Feb 9, 2026

wine99 and others added 10 commits February 11, 2026 10:15

Fix llama-bench -p -n where p<=256

8fb20b2

Fix --direct-io 0

1c0a47a

Don't put kvcache on GPU in stateful mode

c840210

Remove hardcode names

d398214

Fix stateful shapes

26328fe

Simplification for stateful and update output shape processing

3259921

Remove hardcode names

18ab0f5

Avoid re-compilation in llama-bench

b6c0697

Extract zp directly instead of bias

0ee7e05

Refactor weight tensor processing

900dd76

wine99 force-pushed the dev_backend_openvino branch from 6d71ded to 900dd76 February 11, 2026 02:16

wine99 and others added 6 commits February 11, 2026 10:26

Merge branch 'master' into dev_backend_openvino

7b3b65b

create_weight_node accept non-ov backend buffer

1d4ec1b

remove changes in llama-graph.cpp

e059015

stateful masking fix (ravi9#38)

0d74aba

Fix for stateful accuracy issues and cl_out_of_resources error in stateful GPU with larger context sizes.

Fix test-backend-ops crash glu, get_rows, scale, rms_norm, add

d5d673c

Merge branch 'dev_backend_openvino' into xuejun/fallback-ov-to-cpu

eb1d091

cavusmustafa requested review from cavusmustafa and wine99 as code owners March 10, 2026 20:45

ggerganov force-pushed the dev_backend_openvino branch from 76e4057 to e73b4d4 March 13, 2026 10:44

wine99 force-pushed the dev_backend_openvino branch from 996b739 to b6c83aa March 17, 2026 02:25

zhaixuejun1993 wants to merge 264 commits into ravi9:dev_backend_openvino from zhaixuejun1993:xuejun/fallback-ov-to-cpu


9 participants
