Merged
82 commits
b7ad48e
llama: add custom newline split for Gemma 4 (#21406)
am17an Apr 4, 2026
650bf14
llama-model: read final_logit_softcapping for Gemma 4 (#21390)
ssam18 Apr 4, 2026
d01f627
common : respect specified tag, only fallback when tag is empty (#21413)
angt Apr 4, 2026
9c69907
server: Fix undefined timing measurement errors in server context (#2…
thedanhoffman Apr 4, 2026
b863507
common : add gemma 4 specialized parser (#21418)
aldehir Apr 4, 2026
661e9ac
ci: fix vulkan workflow referencing non-existent action (#21442)
nisparks Apr 5, 2026
c08d28d
ci: lower cuda12 floor to 12.8.1 for broader host compatibility (#21438)
M1DNYT3 Apr 5, 2026
5d3a4a7
server : fix logging of build + system info (#21460)
ddh0 Apr 5, 2026
761797f
ci : use default RISE RISC-V Runners (#21263)
luhenry Apr 5, 2026
af76639
model : add HunyuanOCR support (#21395)
richarddd Apr 5, 2026
58190cc
llama : correct platform-independent loading of BOOL metadata (#21428)
anchortense Apr 5, 2026
25eec6f
hexagon: slight optimization for argsort output init (#21463)
YardenTal44 Apr 6, 2026
f51fd36
sycl : handle other FA case (#21377)
arthw Apr 6, 2026
400ac8e
convert : set "add bos" == True for Gemma 4 (#21500)
ggerganov Apr 6, 2026
3979f2b
docs: add hunyuan-ocr gguf, also add test [no ci] (#21490)
ngxson Apr 6, 2026
482d862
server : handle unsuccessful sink.write in chunked stream provider (#…
lainon1 Apr 6, 2026
941146b
convert : fix block_ff_dim retrieval for lfm2 (#21508)
CISC Apr 6, 2026
4aa962e
vocab : add byte token handling to BPE detokenizer for Gemma4 (#21488)
aldehir Apr 6, 2026
94ca829
llama-bench: add `-fitc` and `-fitt` to arguments (#21304)
am17an Apr 6, 2026
15f786e
[CUDA] Write an optimized flash_attn_stream_k_fixup kernel (#21159)
gaugarg-nv Apr 6, 2026
506200c
cli: fix stripping of \n in multiline input (#21485)
bipinyadav3175 Apr 6, 2026
2e1f0a8
ggml: add Q1_0 1-bit quantization support (CPU) (#21273)
khosravipasha Apr 6, 2026
d0a6dfe
ggml-webgpu: Add the support of `MUL_MAT_ID` (#21147)
yomaytk Apr 6, 2026
0033f53
docs: fix typo in build.md (emdawbwebgpu -> emdawnwebgpu) (#21518)
CastelDazur Apr 7, 2026
0988acc
[SYCL] Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (#…
PMZFX Apr 7, 2026
d1f82e3
Fix RTL text rendering (#21382)
Kabir08 Apr 7, 2026
ecce008
fix: Detect streaming state in reasoning content blocks (#21549)
allozaur Apr 7, 2026
71a81f6
ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210) …
aviallon Apr 7, 2026
482192f
webui : store reasoning_content so it is sent back in subsequent requ…
aldehir Apr 7, 2026
edd4d9b
vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029)
mkoker Apr 7, 2026
2a619f6
ggml: Vulkan build, Linux -- output error string for errno on fork fa…
tomoverlund Apr 7, 2026
22fc791
ggml : deprecate GGML_OP_ADD1 (#21363)
ggerganov Apr 7, 2026
e8f5082
server : fix restore for checkpoints with pos_min == 0 (#21510)
ggerganov Apr 7, 2026
a8ec0df
llama: remove per-arch tensor name lists (#21531)
JohannesGaessler Apr 7, 2026
0d049d6
unicode : add custom Qwen2 regex handler to fix segfault on long inpu…
nhs000 Apr 7, 2026
69c28f1
llama-server: fix model params not propagated (#21509)
taronaeo Apr 7, 2026
de1aa6f
CUDA: check for buffer overlap before fusing (#21566)
am17an Apr 7, 2026
957d717
ggml-webgpu: parameterize submission size and add iOS specific limits…
reeselevine Apr 7, 2026
4eb1951
kv-cache : support attention rotation for heterogeneous iSWA (#21513)
ggerganov Apr 7, 2026
93bdc61
gguf-py : fix missing comma after bad merge in tensor-mapping (#21558)
danbev Apr 7, 2026
66c4f9d
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (#21168)
iacopPBK Apr 7, 2026
c5ce4bc
CUDA: make cuda graphs props check faster (#21472)
am17an Apr 8, 2026
5c4aae6
devops: kleidiai: provide KleidiAI-Enabled ARM Release Artifact (#21259)
martin-klacer-arm Apr 8, 2026
97508ac
webui: fix syntax highlighting lost after streaming for non-common la…
hmblair Apr 8, 2026
09343c0
model : support step3-vl-10b (#21287)
forforever73 Apr 8, 2026
ece522f
chore: Remove legacy files (#21606)
allozaur Apr 8, 2026
3bd9aa1
chore: Update labeler to have separate labels for `server/webui` and …
allozaur Apr 8, 2026
ae65fbd
tests : remove obsolete .mjs script (#21615)
ggerganov Apr 8, 2026
85d482e
parser: fix MiniMax handling (#21573)
pwilkin Apr 8, 2026
87f4744
examples : disable cb_eval callback for --save-logits (#21553)
danbev Apr 8, 2026
5764d7c
gemma : perform per-layer projections in the first layer (#21612)
ggerganov Apr 8, 2026
dcdcbad
metal: Q1_0 backend (#21528)
khosravipasha Apr 8, 2026
5473949
webgpu : Query for adapter support when registering WebGPU backend (#…
reeselevine Apr 8, 2026
3ba12fe
kv-cache : extend cache quantization checks (#21586)
Green-Sky Apr 8, 2026
e9fd962
Propose fix a couple of typos (#21581)
jeis4wpi Apr 8, 2026
4a05e0c
webui : send both backend_sampling == false/true (#18781)
ggerganov Apr 8, 2026
d9a12c8
vocab : remove </s> eog token if gemma4 (#21492)
aldehir Apr 8, 2026
6606000
server: respect the ignore eos flag (#21203)
ykhrustalev Apr 8, 2026
2dcb7f7
fix: free ctx_copy in ggml_opt_free to plug per-training-session leak…
RealOrko Apr 8, 2026
d12cc3d
CUDA: also store `node->src->data` ptrs for equality check (#21635)
am17an Apr 8, 2026
4293919
common : skip non-primary GGUF split files when selecting model (#21633)
angt Apr 9, 2026
8a132fa
vulkan: unify type macros to use Vx instead of _VECx (#21605)
0cc4m Apr 9, 2026
8a65a7a
ci: drop v5 `all:` composition from labeler.yml (#21627)
Marxist-Leninist Apr 9, 2026
b54cb2e
sycl : add flash-attn support for head size 512 (#21654)
qnixsynapse Apr 9, 2026
75511a8
webui: Add option to pre-encode conversation for faster next turns (#…
allozaur Apr 9, 2026
3ee9da0
server : fix grammar commandline args (#21543)
AUTOMATIC1111 Apr 9, 2026
9949ad0
fix: Model Selector choice sync (#21628)
allozaur Apr 9, 2026
5e9c635
metal : add missing mm-id specializations for q1_0 (#21662)
ggerganov Apr 9, 2026
243532e
jinja : support ensure_ascii=true, string repetition and int/float se…
kwajiehao Apr 9, 2026
0ec191e
vocab: add gemma4 tokenizer tests, fix edge case (#21534)
pwilkin Apr 9, 2026
501aeed
mtmd: support dots.ocr (#17575)
ngxson Apr 9, 2026
057dba3
model: fix multimodal padding token for gemma3n/gemma4 (#21625)
ngxson Apr 9, 2026
2622975
common : simplify autoparser tagged parser rules (#21216)
aldehir Apr 9, 2026
ddf03c6
common : fix ambiguous grammar rule in gemma4 (#21661)
aldehir Apr 9, 2026
4ef9301
webui: add "Send message on Enter" setting (#21577)
mourix Apr 9, 2026
c8ac02f
requirements : update transformers to 5.5.1 (#21617)
danbev Apr 9, 2026
009a113
ggml : check return value of CUB calls used in argsort and top-k (the…
fairydreaming Apr 9, 2026
d6f3030
ggml: backend-agnostic tensor parallelism (experimental) (#19378)
JohannesGaessler Apr 9, 2026
d132f22
HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570)
andyluo7 Apr 9, 2026
e34f042
CUDA: fuse muls (#21665)
am17an Apr 10, 2026
e095a48
common : add fluidity to the progress bar (#21671)
angt Apr 10, 2026
7b69125
vulkan: Support Q1_0 (#21539)
jeffbolznv Apr 10, 2026
8 changes: 8 additions & 0 deletions .github/labeler.yml
@@ -73,10 +73,18 @@ android:
- changed-files:
- any-glob-to-any-file:
- examples/llama.android/**
server/webui:
- changed-files:
- any-glob-to-any-file:
- tools/server/webui/**
- tools/server/public/**
server:
- changed-files:
- any-glob-to-any-file:
- tools/server/**



ggml:
- changed-files:
- any-glob-to-any-file:
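With the new rules, a file under `tools/server/webui/` matches both the `server/webui` and the `server` labels, since their globs overlap. A rough Python sketch of the any-glob-to-any-file matching (the real labeler uses minimatch semantics, which `fnmatch` only approximates — `*` here also crosses path separators):

```python
from fnmatch import fnmatch

# Label -> glob patterns, mirroring the labeler.yml hunk above.
# fnmatch's "*" crosses "/" boundaries, loosely approximating "**".
LABEL_GLOBS = {
    "server/webui": ["tools/server/webui/*", "tools/server/public/*"],
    "server": ["tools/server/*"],
}

def labels_for(changed_files):
    """Return every label for which any glob matches any changed file."""
    return {
        label
        for label, globs in LABEL_GLOBS.items()
        for f in changed_files
        for g in globs
        if fnmatch(f, g)
    }

print(sorted(labels_for(["tools/server/webui/src/App.svelte"])))
# ['server', 'server/webui']
```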
38 changes: 14 additions & 24 deletions .github/workflows/build-riscv.yml
@@ -35,7 +35,7 @@ env:

jobs:
ubuntu-riscv64-native-sanitizer:
runs-on: RISCV64
runs-on: ubuntu-24.04-riscv

continue-on-error: true

@@ -50,17 +50,18 @@
sudo apt-get update

# Install necessary packages
sudo apt-get install -y libatomic1 libtsan2 gcc-14 g++-14 rustup cmake build-essential wget ccache git-lfs
sudo apt-get install -y libatomic1 libtsan2 gcc-14 g++-14 cmake build-essential wget git-lfs

# Set gcc-14 and g++-14 as the default compilers
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-14 100
sudo ln -sf /usr/bin/gcc-14 /usr/bin/gcc
sudo ln -sf /usr/bin/g++-14 /usr/bin/g++

# Install Rust stable version
rustup install stable
rustup default stable
if ! which rustc; then
# Install Rust stable version
sudo apt-get install -y rustup
rustup install stable
rustup default stable
fi

git lfs install

@@ -73,23 +74,12 @@
id: checkout
uses: actions/checkout@v6

- name: Setup ccache
run: |
# Unique cache directory per matrix combination
export CCACHE_DIR="$HOME/.ccache/sanitizer-${{ matrix.sanitizer }}-${{ matrix.build_type }}"
mkdir -p "$CCACHE_DIR"

# Configure ccache
ccache --set-config=max_size=5G
ccache --set-config=compression=true
ccache --set-config=compression_level=6
ccache --set-config=cache_dir="$CCACHE_DIR"
ccache --set-config=sloppiness=file_macro,time_macros,include_file_mtime,include_file_ctime
ccache --set-config=hash_dir=false

# Export for subsequent steps
echo "CCACHE_DIR=$CCACHE_DIR" >> $GITHUB_ENV
echo "PATH=/usr/lib/ccache:$PATH" >> $GITHUB_ENV
# FIXME: Enable when ggml-org/ccache-action works on riscv64
# - name: ccache
# uses: ggml-org/ccache-action@v1.2.21
# with:
# key: ubuntu-riscv64-native-sanitizer-${{ matrix.sanitizer }}-${{ matrix.build_type }}
# save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}

- name: Build
id: cmake_build
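The workflow now guards the Rust install so it is skipped when the runner image already ships a toolchain. The same idempotent pattern, generalized as a sketch (the installer command is an assumption and varies by distro):

```shell
# Run an installer only when the tool is not already on PATH.
# Mirrors the `if ! which rustc` guard in the workflow; `command -v`
# is the more portable spelling of `which`.
ensure_tool() {
    tool=$1
    shift
    if ! command -v "$tool" >/dev/null 2>&1; then
        "$@"  # e.g. sudo apt-get install -y rustup && rustup default stable
    fi
}

# 'sh' is always present, so this installer never runs:
ensure_tool sh echo "installing sh"
```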
2 changes: 1 addition & 1 deletion .github/workflows/build-vulkan.yml
@@ -72,7 +72,7 @@ jobs:

- name: Setup Vulkan SDK
if: steps.cache-sdk.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-vulkan-llvmpipe
uses: ./.github/actions/linux-setup-vulkan
with:
path: ./vulkan_sdk
version: ${{ env.VULKAN_SDK_VERSION }}
47 changes: 18 additions & 29 deletions .github/workflows/build.yml
@@ -996,32 +996,29 @@ jobs:
cmake --build build -j ${env:NUMBER_OF_PROCESSORS}

ubuntu-cpu-riscv64-native:
runs-on: RISCV64
runs-on: ubuntu-24.04-riscv

steps:
- name: Install dependencies
run: |
sudo apt-get update

# Install necessary packages
sudo apt-get install -y libatomic1 libtsan2 gcc-14 g++-14 rustup cmake build-essential libssl-dev wget ccache git-lfs
sudo apt-get install -y libatomic1 libtsan2 gcc-14 g++-14 cmake build-essential libssl-dev wget git-lfs

# Set gcc-14 and g++-14 as the default compilers
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-14 100
sudo ln -sf /usr/bin/gcc-14 /usr/bin/gcc
sudo ln -sf /usr/bin/g++-14 /usr/bin/g++

# Install Rust stable version
rustup install stable
rustup default stable
if ! which rustc; then
# Install Rust stable version
sudo apt-get install -y rustup
rustup install stable
rustup default stable
fi

git lfs install

- name: Clone
id: checkout
uses: actions/checkout@v6

- name: Check environment
run: |
uname -a
@@ -1031,25 +1028,17 @@
cmake --version
rustc --version

- name: Setup ccache
run: |
# Set unique cache directory for this job
export CCACHE_DIR="$HOME/.ccache/cpu-cmake-rv64-native"
mkdir -p "$CCACHE_DIR"

# Configure ccache for optimal performance
ccache --set-config=max_size=5G
ccache --set-config=compression=true
ccache --set-config=compression_level=6
ccache --set-config=cache_dir="$CCACHE_DIR"

# Enable more aggressive caching
ccache --set-config=sloppiness=file_macro,time_macros,include_file_mtime,include_file_ctime
ccache --set-config=hash_dir=false
- name: Clone
id: checkout
uses: actions/checkout@v6

# Export for subsequent steps
echo "CCACHE_DIR=$CCACHE_DIR" >> $GITHUB_ENV
echo "PATH=/usr/lib/ccache:$PATH" >> $GITHUB_ENV
# FIXME: Enable when ggml-org/ccache-action works on riscv64
# - name: ccache
# uses: ggml-org/ccache-action@v1.2.21
# with:
# key: ubuntu-cpu-riscv64-native
# evict-old-files: 1d
# save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}

- name: Build
id: cmake_build
4 changes: 2 additions & 2 deletions .github/workflows/docker.yml
@@ -73,8 +73,8 @@ jobs:
{ "tag": "cpu", "dockerfile": ".devops/cpu.Dockerfile", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": false, "runs_on": "ubuntu-24.04" },
{ "tag": "cpu", "dockerfile": ".devops/cpu.Dockerfile", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": false, "runs_on": "ubuntu-24.04-arm" },
{ "tag": "cpu", "dockerfile": ".devops/s390x.Dockerfile", "platforms": "linux/s390x", "full": true, "light": true, "server": true, "free_disk_space": false, "runs_on": "ubuntu-24.04-s390x" },
{ "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.9.1", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
{ "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.9.1", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
{ "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.8.1", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
{ "tag": "cuda cuda12", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "12.8.1", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.1.1", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
{ "tag": "cuda13", "dockerfile": ".devops/cuda.Dockerfile", "cuda_version": "13.1.1", "platforms": "linux/arm64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04-arm" },
{ "tag": "musa", "dockerfile": ".devops/musa.Dockerfile", "platforms": "linux/amd64", "full": true, "light": true, "server": true, "free_disk_space": true, "runs_on": "ubuntu-24.04" },
86 changes: 27 additions & 59 deletions .github/workflows/release.yml
@@ -36,55 +36,26 @@ env:
CMAKE_ARGS: "-DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON"

jobs:
macOS-arm64:
runs-on: macos-14

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
with:
fetch-depth: 0

- name: ccache
uses: ggml-org/ccache-action@v1.2.21
with:
key: macOS-latest-arm64
evict-old-files: 1d

- name: Build
id: cmake_build
run: |
sysctl -a
cmake -B build \
-DCMAKE_INSTALL_RPATH='@loader_path' \
-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_BUILD_BORINGSSL=ON \
-DGGML_METAL_USE_BF16=ON \
-DGGML_METAL_EMBED_LIBRARY=ON \
-DGGML_RPC=ON \
${{ env.CMAKE_ARGS }}
cmake --build build --config Release -j $(sysctl -n hw.logicalcpu)

- name: Determine tag name
id: tag
uses: ./.github/actions/get-tag-name

- name: Pack artifacts
id: pack_artifacts
run: |
cp LICENSE ./build/bin/
tar -czvf llama-${{ steps.tag.outputs.name }}-bin-macos-arm64.tar.gz -s ",./,llama-${{ steps.tag.outputs.name }}/," -C ./build/bin .

- name: Upload artifacts
uses: actions/upload-artifact@v6
with:
path: llama-${{ steps.tag.outputs.name }}-bin-macos-arm64.tar.gz
name: llama-bin-macos-arm64.tar.gz
macOS-cpu:
strategy:
matrix:
include:
- build: 'arm64'
arch: 'arm64'
os: macos-14
defines: "-DGGML_METAL_USE_BF16=ON -DGGML_METAL_EMBED_LIBRARY=ON"
- build: 'arm64-kleidiai'
arch: 'arm64'
os: macos-14
defines: "-DGGML_METAL_USE_BF16=ON -DGGML_METAL_EMBED_LIBRARY=ON -DGGML_CPU_KLEIDIAI=ON"
- build: 'x64'
arch: 'x64'
os: macos-15-intel
# Metal is disabled on x64 due to intermittent failures with Github runners not having a GPU:
# https://github.com/ggml-org/llama.cpp/actions/runs/8635935781/job/23674807267#step:5:2313
defines: "-DGGML_METAL=OFF -DCMAKE_OSX_DEPLOYMENT_TARGET=13.3"

macOS-x64:
runs-on: macos-15-intel
runs-on: ${{ matrix.os }}

steps:
- name: Clone
@@ -96,23 +67,20 @@
- name: ccache
uses: ggml-org/ccache-action@v1.2.21
with:
key: macOS-latest-x64
key: macOS-latest-${{ matrix.arch }}
evict-old-files: 1d

- name: Build
id: cmake_build
run: |
sysctl -a
# Metal is disabled due to intermittent failures with Github runners not having a GPU:
# https://github.com/ggml-org/llama.cpp/actions/runs/8635935781/job/23674807267#step:5:2313
cmake -B build \
${{ matrix.defines }} \
-DCMAKE_INSTALL_RPATH='@loader_path' \
-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
-DLLAMA_FATAL_WARNINGS=ON \
-DLLAMA_BUILD_BORINGSSL=ON \
-DGGML_METAL=OFF \
-DGGML_RPC=ON \
-DCMAKE_OSX_DEPLOYMENT_TARGET=13.3
${{ env.CMAKE_ARGS }}
cmake --build build --config Release -j $(sysctl -n hw.logicalcpu)

- name: Determine tag name
@@ -123,13 +91,13 @@
id: pack_artifacts
run: |
cp LICENSE ./build/bin/
tar -czvf llama-${{ steps.tag.outputs.name }}-bin-macos-x64.tar.gz -s ",./,llama-${{ steps.tag.outputs.name }}/," -C ./build/bin .
tar -czvf llama-${{ steps.tag.outputs.name }}-bin-macos-${{ matrix.build }}.tar.gz -s ",./,llama-${{ steps.tag.outputs.name }}/," -C ./build/bin .

- name: Upload artifacts
uses: actions/upload-artifact@v6
with:
path: llama-${{ steps.tag.outputs.name }}-bin-macos-x64.tar.gz
name: llama-bin-macos-x64.tar.gz
path: llama-${{ steps.tag.outputs.name }}-bin-macos-${{ matrix.build }}.tar.gz
name: llama-bin-macos-${{ matrix.build }}.tar.gz

ubuntu-cpu:
strategy:
Expand Down Expand Up @@ -1003,8 +971,7 @@ jobs:
- ubuntu-cpu
- ubuntu-vulkan
- ubuntu-24-openvino
- macOS-arm64
- macOS-x64
- macOS-cpu
- ios-xcode-build
- openEuler-cann

@@ -1079,6 +1046,7 @@

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-macos-arm64.tar.gz)
- [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github.com/ggml-org/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-macos-arm64-kleidiai.tar.gz)
- [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-bin-macos-x64.tar.gz)
- [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/${{ steps.tag.outputs.name }}/llama-${{ steps.tag.outputs.name }}-xcframework.zip)

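The macOS pack step relies on BSD tar's `-s ,./,llama-<tag>/,` substitution to place every archived file under a versioned top-level directory. On Linux, GNU tar expresses the same rename with `--transform`; a minimal sketch (the directory and tag names are illustrative):

```shell
# Stage a fake build output, then archive it under a versioned prefix,
# as the release workflow does with bsdtar's -s flag on macOS.
mkdir -p bin && echo demo > bin/llama-cli
tar --transform 's,^\./,llama-b1234/,' -czf llama-b1234-bin.tar.gz -C bin .
tar -tzf llama-b1234-bin.tar.gz   # entries are prefixed with llama-b1234/
```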
16 changes: 9 additions & 7 deletions common/arg.cpp
@@ -2348,19 +2348,21 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
}
).set_env("LLAMA_ARG_N_GPU_LAYERS"));
add_opt(common_arg(
{"-sm", "--split-mode"}, "{none,layer,row}",
{"-sm", "--split-mode"}, "{none,layer,row,tensor}",
"how to split the model across multiple GPUs, one of:\n"
"- none: use one GPU only\n"
"- layer (default): split layers and KV across GPUs\n"
"- row: split rows across GPUs",
"- layer (default): split layers and KV across GPUs (pipelined)\n"
"- row: split weight across GPUs by rows (parallelized)\n"
"- tensor: split weights and KV across GPUs (parallelized)",
[](common_params & params, const std::string & value) {
std::string arg_next = value;
if (arg_next == "none") {
if (value == "none") {
params.split_mode = LLAMA_SPLIT_MODE_NONE;
} else if (arg_next == "layer") {
} else if (value == "layer") {
params.split_mode = LLAMA_SPLIT_MODE_LAYER;
} else if (arg_next == "row") {
} else if (value == "row") {
params.split_mode = LLAMA_SPLIT_MODE_ROW;
} else if (value == "tensor") {
params.split_mode = LLAMA_SPLIT_MODE_TENSOR;
} else {
throw std::invalid_argument("invalid value");
}
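The hunk above adds `tensor` to the accepted `--split-mode` values and drops the redundant `arg_next` copy. A standalone C++ sketch of the resulting mapping (the enum here is a local stand-in for llama.cpp's `LLAMA_SPLIT_MODE_*` constants, not the library's actual type):

```cpp
#include <stdexcept>
#include <string>

// Local stand-in for llama.cpp's split-mode constants.
enum class split_mode { none, layer, row, tensor };

// Map a --split-mode argument to its enum value, throwing on anything
// outside the four accepted strings, as the parser in the hunk does.
split_mode parse_split_mode(const std::string & value) {
    if (value == "none")   return split_mode::none;
    if (value == "layer")  return split_mode::layer;
    if (value == "row")    return split_mode::row;
    if (value == "tensor") return split_mode::tensor;
    throw std::invalid_argument("invalid --split-mode value: " + value);
}
```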