
Add Agent PTQ skill for model quantization#1107

Open
mxinO wants to merge 23 commits into main from mxin/agent-ptq

Conversation

@mxinO
Contributor

@mxinO mxinO commented Mar 24, 2026

What does this PR do?

Type of change: New feature

Add a PTQ skill that supports:

Non-launcher path:

  1. Supported and unsupported transformers models: for a supported model, it runs the PTQ script; for an unsupported model (one that needs source-code changes), it patches the relevant modules and writes a custom script.
  2. Remote execution: configure your cluster in clusters.yaml (or give the info to the agent directly), then say "quantize Qwen3-8B on xxx". The agent uses SSH ControlMaster to keep a persistent SSH session, so the remote server isn't flooded with new connections and connection overhead is reduced significantly.
  3. SLURM support: the skill automatically detects whether the current machine (or a remote machine) is a SLURM login node and, if so, writes a SLURM script for the PTQ run.
  4. FP8-to-BF16 dequantization when an FP8 model such as DeepSeek is given.
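For the persistent-session point, a minimal sketch of the SSH ControlMaster options that implement connection reuse. The host, user, and socket path here are illustrative placeholders, not values from this PR, and the script only echoes the composed command rather than connecting:

```shell
#!/usr/bin/env bash
set -euo pipefail

# One control socket per user/host/port triple (placeholder path).
CTRL_SOCKET="$HOME/.ssh/cm-%r@%h-%p"

ssh_opts=(
    -o ControlMaster=auto          # reuse an existing master connection if one exists
    -o "ControlPath=$CTRL_SOCKET"
    -o ControlPersist=600          # keep the master alive 10 minutes after last use
    -o BatchMode=yes               # fail instead of prompting for a password
)

# The first invocation opens the master; subsequent ones multiplex over the
# same TCP connection, so per-command handshake overhead nearly disappears.
echo ssh "${ssh_opts[@]}" user@cluster "nvidia-smi -L"
```

With these options, every later `ssh`/`rsync` invocation that uses the same ControlPath rides the existing connection instead of performing a fresh handshake and authentication.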

Launcher path:
We currently use the launcher only for the (supported model + non-bare-metal) case.

What is not supported yet:

  1. Models that are not in transformers.

Usage

  1. Quantize Qwen3-0.6B to NVFP4
  2. Quantize Qwen3-0.6B using cluster xxx

Testing

| Test | Model | Path | Result |
|---|---|---|---|
| 1 | Qwen3-0.6B | 4B | Launcher + SLURM |
| 2 | SmolLM-135M | 4C | hf_ptq.py + SLURM |
| 3 | InternVL3.5-20B | 4C | patched + SLURM |
| 4 | InternVL3.5-30B | 4C | hf_ptq.py + SLURM |
| 5 | FakeUnsupported-0.6B (local) | 4C | custom script, local GPU |
| 6 | FakeUnsupported-0.6B (HF) | 4C | custom script, remote SLURM |

The fake model is available at https://huggingface.co/supermmx/FakeUnsupported-0.6B; it was modified to contain a module that needs a quantization patch.

What is not tested:

  1. Multi-node PTQ.
  2. More complex unsupported models.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ❌
    Tests were added, but they do not run automatically yet.
  • Did you update Changelog?: ✅

Additional Information

Summary by CodeRabbit

  • Documentation
    • New comprehensive guides for remote GPU execution, SLURM workflows, workspace management, PTQ workflow, launcher usage, unsupported-model investigation, and PTQ SLURM/container guidance.
  • Chores
    • Added example cluster configuration and a sourced remote-execution helper script to manage remote runs, sync, and job lifecycle.
  • Tests
    • Added PTQ skill evaluation specification covering expected flows and artifact verification.

@copy-pr-bot

copy-pr-bot bot commented Mar 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1107/

Built to branch gh-pages at 2026-04-03 06:10 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Adds remote-execution tooling and documentation plus a new PTQ skill and supporting guides: example cluster config, a sourced Bash remote-exec library, SLURM/workspace guidance, PTQ skill docs, references, and tests to enable remote/local PTQ workflows and job orchestration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Cluster config example**<br>`.claude/clusters.yaml.example` | New example clusters YAML introducing a `clusters` map, `default_cluster`, SSH/login/workspace fields, optional `ssh_proxy`, `gpu_type`, and commented SLURM defaults. |
| **Remote execution tooling**<br>`.claude/skills/common/remote_exec.sh`, `.claude/skills/common/remote-execution.md` | Adds a sourced Bash library exporting remote helpers (session lifecycle, SSH ControlMaster, `remote_run`, rsync sync, SLURM submit/poll, docker-run helpers, env detection) plus docs describing usage, commands, and troubleshooting. |
| **Environment & workspace docs**<br>`.claude/skills/common/environment-setup.md`, `.claude/skills/common/workspace-management.md`, `.claude/skills/common/slurm-setup.md` | New guides for locating ModelOpt sources, choosing local vs remote execution, detecting compute (CUDA/Docker/SLURM), workspace layout/reuse, and SLURM job templates/patterns. |
| **PTQ skill and tests**<br>`.claude/skills/ptq/SKILL.md`, `.claude/skills/ptq/tests.json` | Adds PTQ skill spec covering end-to-end PTQ control flow, execution paths (manual, launcher, unlisted-model), expected outputs, and JSON test cases asserting behavior and outputs. |
| **PTQ reference guides**<br>`.claude/skills/ptq/references/launcher-guide.md`, `.claude/skills/ptq/references/unsupported-models.md`, `.claude/skills/ptq/references/slurm-setup-ptq.md` | New reference docs: launcher integration mapping to cluster fields, detailed unsupported-model quantization patterns/patches, and PTQ-specific SLURM/container guidance and smoke-test procedures. |
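Based on the field list above, a `clusters.yaml` entry presumably looks roughly like the following. All values are illustrative placeholders; the authoritative template is `.claude/clusters.yaml.example` in this PR:

```yaml
default_cluster: my-cluster

clusters:
  my-cluster:
    host: login.example.com            # SSH login node
    user: jdoe
    workspace: /scratch/jdoe/modelopt  # remote working directory
    gpu_type: H100
    # ssh_proxy: ssh -W %h:%p jump.example.com   # optional jump host
    # SLURM defaults (uncomment to override):
    # partition: gpu
    # account: my-account
```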

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Local as Local Skill / Shell
    participant Lib as remote_exec.sh (sourced)
    participant SSH as SSH (ControlMaster)
    participant Remote as Remote Host
    participant Scheduler as SLURM / Docker runtime

    Local->>Lib: source remote_exec.sh & remote_load_cluster
    Lib->>SSH: remote_start_session (persistent ControlMaster)
    SSH->>Remote: establish SSH control socket
    Local->>Lib: remote_detect_env
    Lib->>Remote: probe tooling (srun/docker/nvidia-smi)
    Remote->>Lib: REMOTE_ENV_TYPE (slurm/docker/bare)
    alt REMOTE_ENV_TYPE == slurm
        Local->>Lib: remote_sync_to (rsync scripts)
        Lib->>Remote: upload wrapper & submit via sbatch
        Lib->>Scheduler: remote_submit_job
        Scheduler->>Lib: job_id
        Lib->>Lib: remote_wait_job -> remote_sync_from
    else REMOTE_ENV_TYPE == docker
        Local->>Lib: remote_docker_run (run or exec container)
        Lib->>Remote: docker run/exec with workspace bind
        Remote->>Lib: command output
        Lib->>Local: remote_sync_from results
    else REMOTE_ENV_TYPE == bare
        Local->>Lib: remote_run (base64-encoded command)
        Lib->>Remote: execute command in workspace
        Remote->>Lib: command output
        Lib->>Local: remote_sync_from results
    end
```
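The SLURM branch of the diagram above can be sketched as a plain Bash flow. The function names mirror the library API as described in this walkthrough; the bodies below are stand-in stubs (not the PR's implementation) so the sketch is self-contained and runnable without a real cluster:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stubbed versions of the remote_exec.sh helpers, for illustration only.
remote_load_cluster()  { echo "load config for cluster: $1"; }
remote_start_session() { echo "start persistent ControlMaster session"; }
remote_detect_env()    { REMOTE_ENV_TYPE=slurm; }        # probe srun/docker/nvidia-smi
remote_sync_to()       { echo "rsync $1 -> remote workspace"; }
remote_submit_job()    { echo "12345"; }                 # sbatch would print the job ID
remote_wait_job()      { echo "poll SLURM until job $1 finishes"; }
remote_sync_from()     { echo "rsync results back to local workspace"; }

remote_load_cluster my-cluster
remote_start_session
remote_detect_env

if [[ "$REMOTE_ENV_TYPE" == slurm ]]; then
    remote_sync_to scripts/
    JOBID=$(remote_submit_job)
    remote_wait_job "$JOBID"
    remote_sync_from
fi
```

The docker and bare branches replace the middle three calls with `remote_docker_run` or a single `remote_run`, but the load / start-session / detect / sync-back skeleton stays the same.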

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new Agent PTQ skill for model quantization, which aligns with the PR's core objective of introducing this new feature. |
| Security Anti-Patterns | ✅ Passed | PR adds documentation and bash utilities without Python code; no hardcoded security parameters, no eval/exec on untrusted input, no # nosec comments found. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |



Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.94%. Comparing base (7cd7d18) to head (f612c66).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
- Coverage   74.28%   73.94%   -0.34%     
==========================================
  Files         349      349              
  Lines       39838    40035     +197     
==========================================
+ Hits        29592    29603      +11     
- Misses      10246    10432     +186     
| Flag | Coverage Δ |
|---|---|
| unit | 54.51% <ø> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


Contributor

Copilot AI left a comment


Pull request overview

Adds a new Claude skill (ptq) to guide/execute NVIDIA ModelOpt post-training quantization workflows, including remote execution and SLURM-oriented operational guidance.

Changes:

  • Introduces the ptq skill playbook and reference guides for unsupported models, SLURM setup, and remote execution.
  • Adds a reusable remote_exec.sh library for SSH ControlMaster sessions, remote command execution, rsync syncing, and SLURM helpers.
  • Documents multi-user workspace management and provides a sample clusters.yaml configuration.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
|---|---|
| .claude/skills/ptq/SKILL.md | PTQ decision flow and key rules for supported vs unsupported models, plus environment selection (local/remote/SLURM). |
| .claude/skills/ptq/references/unsupported-models.md | Patterns and investigation steps for quantizing unsupported architectures. |
| .claude/skills/ptq/references/slurm-setup.md | SLURM container/job-script templates and operational guidance (smoke tests, monitoring). |
| .claude/skills/ptq/references/remote-execution.md | Remote execution workflow and cluster configuration guidance. |
| .claude/skills/common/remote_exec.sh | Bash utility library for persistent SSH, remote run/sync, environment detection, and SLURM helpers. |
| .claude/skills/common/workspace-management.md | Workspace creation/reuse guidance for multi-user (Slack bot) environments. |
| .claude/clusters.yaml.example | Example cluster configuration template. |


mxinO added 5 commits March 25, 2026 06:11
Add a Claude Code skill that guides the agent through post-training
quantization (PTQ) of LLMs/VLMs using ModelOpt. Includes:

- PTQ skill with decision tree for supported/unsupported models
- Launcher integration for SLURM/Docker execution
- Common environment setup (shared across future skills)
- Remote execution and SLURM setup references (manual fallback)
- Unsupported model monkey-patching guide
- Workspace management for multi-user environments

Signed-off-by: Meng Xin <mxin@nvidia.com>
Workspace-by-model-name is useful for single users too, not just
multi-user Slack bot. Rewrite to cover both: ./workspaces/ for
single-user, $MODELOPT_WORKSPACE_ROOT for multi-user.

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
mxinO added 4 commits March 25, 2026 08:55
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@Edwardf0t1 Edwardf0t1 left a comment


Thanks @mxinO for refactoring the ptq skills based on our discussions, and I think the PR is in good shape.

  • Let's consider adding a tests.json with some basic cases (e.g., supported model on SLURM, unsupported VLM, format selection on Hopper vs Blackwell, smoke test failure handling)
  • We can also add negative triggers to the frontmatter description, especially when deployment and evaluation skills are added. cc @kaix-nv

Contributor


I think we can consider adding a references/supported-models.md as well. Currently it's inline in SKILL.md, and the hf_ptq.py workflow and flag reference are just a few lines in SKILL.md rather than a separate reference. Acceptable for now since it's short, but if it grows, we can extract it.


# ModelOpt Post-Training Quantization

Produce a quantized checkpoint from a pretrained model. **Read `examples/llm_ptq/README.md` first** — it has the support matrix, CLI flags, and accuracy guidance.
Contributor


I think this works but means the agent loads the entire README rather than a focused GPU → format matrix. Consider extracting later, e.g., adding a references/format-selection.md.

Contributor Author


I think it's better to refer to the README; otherwise, every time new models are added we would need to update the skill. We can talk about this later, but for now let's keep it referring to the README.

Contributor


SG - it would be better to point to a specific section if we know exactly what we want Claude to read.

mxinO added 5 commits March 26, 2026 03:58
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
@mxinO mxinO marked this pull request as ready for review March 26, 2026 08:56
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
.claude/skills/common/remote-execution.md (1)

112-112: Use sbatch --parsable for deterministic job ID extraction.

The grep-based approach can misparse output if SLURM includes multiple numbers. Use --parsable with cut -d';' -f1 for reliable, machine-readable extraction that handles both single-cluster and federated SLURM environments.

Proposed doc fix
-JOBID=$(remote_run "sbatch /remote/path/scripts/job_slurm.sh" | grep -o '[0-9]\+' | tail -1)
+JOBID=$(remote_run "sbatch --parsable /remote/path/scripts/job_slurm.sh" | cut -d';' -f1)
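The suggested parsing can be sanity-checked without a SLURM cluster by faking the output; with `--parsable`, real `sbatch` prints either `<jobid>` or `<jobid>;<cluster>` on federated installs:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulated `sbatch --parsable` output from a federated cluster.
fake_output="4242;cluster-a"

# Taking the first ';'-delimited field yields just the numeric job ID,
# and is a no-op for the common single-cluster "<jobid>" form.
JOBID=$(printf '%s\n' "$fake_output" | cut -d';' -f1)
echo "$JOBID"
```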
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote-execution.md at line 112, The current JOBID
extraction using remote_run "sbatch /remote/path/scripts/job_slurm.sh" | grep
... is brittle; change the sbatch invocation to use --parsable and extract the
first field to get a deterministic job id (e.g., run remote_run "sbatch
--parsable /remote/path/scripts/job_slurm.sh" and pipe to cut -d';' -f1) so
JOBID reliably contains only the numeric job id even in federated SLURM setups;
update the assignment of JOBID and any references to JOBID accordingly.
.claude/skills/common/slurm-setup.md (1)

58-59: Use sbatch --parsable for stable job ID capture.

Parsing the 4th whitespace token is brittle and depends on human-readable output format, which can break with localized SLURM installations or site-specific wrappers. The --parsable flag is the SLURM-recommended approach for scripting, outputting structured format (job ID or job ID;cluster name).

Proposed doc fix
-JOBID=$(sbatch <script>.sh | awk '{print $4}')
+JOBID=$(sbatch --parsable <script>.sh | cut -d';' -f1)
 echo "Submitted job $JOBID"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/slurm-setup.md around lines 58 - 59, The job
submission snippet uses brittle whitespace parsing to extract JOBID; replace the
sbatch invocation with the parsable mode and assign its output directly to JOBID
(e.g., use sbatch --parsable) and, if you need only the numeric ID, split on ';'
or take the part before any cluster suffix; update the lines that set and echo
the JOBID variable (JOBID=$(sbatch ...) and echo "Submitted job $JOBID") to use
this robust approach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/common/environment-setup.md:
- Around line 18-31: The doc assumes .claude is in $PWD and mis-specifies SLURM
detection; update the recipe to instruct callers to resolve the
repository/project root (the same ancestor-walk used by your implementation)
before referencing .claude/clusters.yaml and invoking remote_load_cluster and
remote_check_ssh, and clarify that remote_detect_env should detect SLURM by
checking for sbatch (not both srun and sbatch) so PATH/partial-login-node cases
don't pick the wrong branch; also state that remote_load_cluster must be passed
the resolved cluster config path (not a relative .claude path) so subdirectory
invocation behaves identically to the implementation.

In @.claude/skills/common/remote_exec.sh:
- Around line 217-245: The code currently exports REMOTE_WORKSPACE without
validating it, which can result in accidental root (/) or empty paths used by
remote_sync_to/remote_sync_from/remote_ensure_workspace/remote_docker_run; after
parsing REMOTE_WORKSPACE (the REMOTE_WORKSPACE assignment) add a validation: if
REMOTE_WORKSPACE is empty or equals "/" (or otherwise looks invalid) then fail
fast (echo a clear error referencing the cluster name and return 1) so the
config is rejected rather than exported; ensure this validation happens before
the export of REMOTE_WORKSPACE and before printing "Loaded cluster..." so
consumers like remote_sync_to, remote_sync_from, remote_ensure_workspace and
remote_docker_run never receive an invalid workspace.
- Around line 66-82: The _parse_yaml_value function currently injects the
dot.path into an inlined Python string (keys = '$path'), which enables code
injection; change it to accept the file and path via Python argv (use sys.argv
for file and lookup keys) and print the resolved scalar, avoiding any string
interpolation of YAML inputs. Likewise remove uses of eval that build commands
from cluster_name, REMOTE_HOST, REMOTE_SSH_KEY and sync path variables: replace
all eval-based command construction with explicit arrays/argument lists for ssh
and rsync (construct subprocess/exec argument arrays or shell-safe arrays in
bash) and pass variables as quoted argv elements rather than interpolating them
into command strings so no untrusted config can execute locally. Ensure
references to the functions/places using eval (the sections building SSH
connection strings and rsync invocations) are updated to use safe argument
arrays and direct variable passing.

In @.claude/skills/common/workspace-management.md:
- Line 45: Replace the awkward phrase "Task needs output from a previous step
(e.g., eval needs the PTQ checkpoint)" with clearer wording: "Task requires
output from a previous step (e.g., eval requires the PTQ checkpoint)"; update
the string in the document where that exact text appears (search for "Task needs
output from a previous step") so both instances in the example use "requires"
for readability and consistency.

In @.claude/skills/ptq/references/launcher-guide.md:
- Around line 46-50: Clarify that the find command targeting
tools/launcher/local_experiments and matching */exported_model/*/config.json is
specific to local Docker runs (local-launcher) by adding a short note that this
lookup only applies to local Docker mode and to scope the host-path lookup
accordingly; also add a separate brief note describing where artifacts live for
remote SLURM runs (e.g., central experiment/artifact storage or node-specific
scratch paths) and suggest the user consult SLURM job logs or the remote
experiment directory rather than the local tools/launcher/local_experiments
path.

In @.claude/skills/ptq/SKILL.md:
- Line 12: The reference paths in the sentence "Read
`skills/common/environment-setup.md` and
`skills/common/workspace-management.md`." should be normalized to match
companion docs by prefixing with ".claude/"; update those two references to
".claude/skills/common/environment-setup.md" and
".claude/skills/common/workspace-management.md" (and scan the rest of
.claude/skills/ptq/SKILL.md for any other `skills/...` references and change
them to the ".claude/skills/..." style to keep consistency).

---

Nitpick comments:
In @.claude/skills/common/remote-execution.md:
- Line 112: The current JOBID extraction using remote_run "sbatch
/remote/path/scripts/job_slurm.sh" | grep ... is brittle; change the sbatch
invocation to use --parsable and extract the first field to get a deterministic
job id (e.g., run remote_run "sbatch --parsable
/remote/path/scripts/job_slurm.sh" and pipe to cut -d';' -f1) so JOBID reliably
contains only the numeric job id even in federated SLURM setups; update the
assignment of JOBID and any references to JOBID accordingly.

In @.claude/skills/common/slurm-setup.md:
- Around line 58-59: The job submission snippet uses brittle whitespace parsing
to extract JOBID; replace the sbatch invocation with the parsable mode and
assign its output directly to JOBID (e.g., use sbatch --parsable) and, if you
need only the numeric ID, split on ';' or take the part before any cluster
suffix; update the lines that set and echo the JOBID variable (JOBID=$(sbatch
...) and echo "Submitted job $JOBID") to use this robust approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60724936-1b1c-4a85-a31d-a30327a23eed

📥 Commits

Reviewing files that changed from the base of the PR and between b1f9f01 and 87988e8.

📒 Files selected for processing (10)
  • .claude/clusters.yaml.example
  • .claude/skills/common/environment-setup.md
  • .claude/skills/common/remote-execution.md
  • .claude/skills/common/remote_exec.sh
  • .claude/skills/common/slurm-setup.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/launcher-guide.md
  • .claude/skills/ptq/references/slurm-setup.md
  • .claude/skills/ptq/references/unsupported-models.md

Contributor

@Edwardf0t1 Edwardf0t1 left a comment


LGTM overall 👍 Left minor comments in lines.

Additional comments:

  • We can add a tests.json for the ptq skills in a follow-up PR.
  • I would consider further extracting the execution and monitoring parts into a separate run-and-monitor skills when we land deployment and evaluation skills to avoid duplicating logics.

@Edwardf0t1 Edwardf0t1 requested review from jingyu-ml and juhim98 March 30, 2026 18:45
mxinO added 2 commits April 1, 2026 07:09
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
.claude/skills/common/workspace-management.md (1)

45-45: ⚠️ Potential issue | 🟡 Minor

Use clearer wording for the reuse criterion.

Line 45 reads awkwardly; use “Task requires output from a previous step (e.g., eval requires the PTQ checkpoint)” for clarity and consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/workspace-management.md at line 45, Replace the
awkward phrase "Task needs output from a previous step (e.g., eval needs the PTQ
checkpoint)" with the clearer wording "Task requires output from a previous step
(e.g., eval requires the PTQ checkpoint)" in the workspace-management text; find
the sentence that mentions "Task needs output" and update both occurrences of
"needs" to "requires" to ensure consistency and clearer phrasing.
🧹 Nitpick comments (2)
.claude/skills/ptq/references/unsupported-models.md (2)

78-84: Make the parameter-name inspection snippet robust for non-sharded checkpoints.

The current snippet assumes model.safetensors.index.json always exists. That breaks for single-file checkpoints.

Proposed doc snippet update
-import json
-idx = json.load(open('<ckpt_path>/model.safetensors.index.json'))
-import re
-names = set(re.sub(r'\.\d+\.', '.N.', k) for k in idx['weight_map'])
-for n in sorted(names): print(n)
+import json, re, os
+from safetensors import safe_open
+
+index_path = "<ckpt_path>/model.safetensors.index.json"
+names = set()
+if os.path.exists(index_path):
+    idx = json.load(open(index_path))
+    names = set(idx["weight_map"].keys())
+else:
+    # fallback for single-file safetensors
+    with safe_open("<ckpt_path>/model.safetensors", framework="pt") as f:
+        names = set(f.keys())
+
+normalized = set(re.sub(r"\.\d+\.", ".N.", k) for k in names)
+for n in sorted(normalized):
+    print(n)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 78 - 84,
The snippet assumes model.safetensors.index.json always exists; change it to
first check for that file (use os.path.exists) and if present load idx =
json.load(... ) and use idx['weight_map'], otherwise open the single-file
checkpoint and extract parameter names (e.g., via safetensors.safe_open or
safetensors.torch.load_file to list keys) and then apply the same regex
normalization (re.sub(r'\.\d+\.', '.N.', k)) to produce names; update variable
usage (idx, names) so both branches yield the same set for printing.

9-9: Fix the internal doc path to avoid a broken reference.

Line 9 points to skills/common/workspace-management.md, but this doc tree is under .claude/skills/.... Please align the path format used elsewhere in this PR so this reference resolves reliably.

Proposed doc fix
-**Download first.** Follow `skills/common/workspace-management.md` to set up local and remote workspaces, sync ModelOpt source, and download the model on the target machine. This avoids downloading twice and gives access to README, custom modeling code, and tokenizer config.
+**Download first.** Follow `.claude/skills/common/workspace-management.md` to set up local and remote workspaces, sync ModelOpt source, and download the model on the target machine. This avoids downloading twice and provides access to README, custom modeling code, and tokenizer config.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md at line 9, Update the
broken internal doc link in the sentence that currently references
"skills/common/workspace-management.md" so it uses the same
.claude/skills-relative path format used elsewhere in this PR; replace that
target with ".claude/skills/common/workspace-management.md" (the same canonical
path style as other references in this doc tree) to ensure the link resolves
reliably.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/ptq/references/unsupported-models.md:
- Around line 235-243: Add explicit shape guards before the FP8 block dequant
reshape: verify that w.shape[-2] is divisible by s.shape[-2] and w.shape[-1] is
divisible by s.shape[-1] (the block_m/block_n computation for variables w, s,
block_m, block_n) and raise a clear ValueError (or AssertionError) that includes
the offending shapes if not divisible so we fail fast instead of performing an
invalid reshape; apply this check in the code path where scale.dim() == 3 before
computing block_m/block_n and reshaping/scaling param.data.

---

Duplicate comments:
In @.claude/skills/common/workspace-management.md:
- Line 45: Replace the awkward phrase "Task needs output from a previous step
(e.g., eval needs the PTQ checkpoint)" with the clearer wording "Task requires
output from a previous step (e.g., eval requires the PTQ checkpoint)" in the
workspace-management text; find the sentence that mentions "Task needs output"
and update both occurrences of "needs" to "requires" to ensure consistency and
clearer phrasing.

---

Nitpick comments:
In @.claude/skills/ptq/references/unsupported-models.md:
- Around line 78-84: The snippet assumes model.safetensors.index.json always
exists; change it to first check for that file (use os.path.exists) and if
present load idx = json.load(... ) and use idx['weight_map'], otherwise open the
single-file checkpoint and extract parameter names (e.g., via
safetensors.safe_open or safetensors.torch.load_file to list keys) and then
apply the same regex normalization (re.sub(r'\.\d+\.', '.N.', k)) to produce
names; update variable usage (idx, names) so both branches yield the same set
for printing.
- Line 9: Update the broken internal doc link in the sentence that currently
references "skills/common/workspace-management.md" so it uses the same
.claude/skills-relative path format used elsewhere in this PR; replace that
target with ".claude/skills/common/workspace-management.md" (the same canonical
path style as other references in this doc tree) to ensure the link resolves
reliably.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9107923f-5183-44df-8d92-439acfcc8e31

📥 Commits

Reviewing files that changed from the base of the PR and between 87988e8 and fe8516c.

📒 Files selected for processing (5)
  • .claude/skills/common/environment-setup.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/unsupported-models.md
  • .claude/skills/ptq/tests.json
✅ Files skipped from review due to trivial changes (3)
  • .claude/skills/ptq/tests.json
  • .claude/skills/common/environment-setup.md
  • .claude/skills/ptq/SKILL.md

Signed-off-by: Meng Xin <mxin@nvidia.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
.claude/skills/common/remote_exec.sh (1)

105-127: ⚠️ Potential issue | 🔴 Critical

Replace remaining eval command construction (still injectable).

Line 155, Line 261, Line 348, Line 381, and Line 393 still execute interpolated command strings via eval. Because host/key/proxy/path values come from config or function args, this keeps a local command-injection path open.

🔧 Safer pattern (argv arrays, no eval)
-_ssh_base_opts() {
-    local opts="-o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new"
-    local ctl_path
-    ctl_path="$(_ssh_control_path)"
-    opts+=" -o ControlPath='${ctl_path}'"
-    opts+=" -o ControlMaster=auto"
-    if [[ -n "${REMOTE_SSH_KEY:-}" ]]; then
-        opts+=" -i $REMOTE_SSH_KEY"
-    fi
-    if [[ -n "${REMOTE_SSH_PROXY:-}" ]]; then
-        opts+=" -o ProxyCommand='${REMOTE_SSH_PROXY}'"
-    fi
-    echo "$opts"
-}
-
-_ssh_base_cmd() {
-    echo "ssh $(_ssh_base_opts) ${REMOTE_USER}@${REMOTE_HOST}"
-}
+_ssh_base_argv() {
+    SSH_BASE_ARGV=(
+        -o BatchMode=yes
+        -o ConnectTimeout=15
+        -o StrictHostKeyChecking=accept-new
+        -o "ControlPath=$(_ssh_control_path)"
+        -o ControlMaster=auto
+    )
+    [[ -n "${REMOTE_SSH_KEY:-}" ]] && SSH_BASE_ARGV+=(-i "$REMOTE_SSH_KEY")
+    [[ -n "${REMOTE_SSH_PROXY:-}" ]] && SSH_BASE_ARGV+=(-o "ProxyCommand=${REMOTE_SSH_PROXY}")
+    SSH_BASE_ARGV+=("${REMOTE_USER}@${REMOTE_HOST}")
+}
@@
-    eval "ssh $opts -f -N ${REMOTE_USER}@${REMOTE_HOST}" 2>&1
+    ssh "${start_argv[@]}" -f -N 2>&1
@@
-    if result=$(eval "$(_ssh_base_cmd)" '"echo SSH_OK"' 2>&1); then
+    _ssh_base_argv
+    if result=$(ssh "${SSH_BASE_ARGV[@]}" "echo SSH_OK" 2>&1); then
@@
-        result=$(eval "$(_ssh_base_cmd)" "'echo $encoded | base64 -d | bash'" 2>&1) && rc=$? || rc=$?
+        _ssh_base_argv
+        result=$(ssh "${SSH_BASE_ARGV[@]}" "echo $encoded | base64 -d | bash" 2>&1) && rc=$? || rc=$?
@@
-    eval "$rsync_cmd"
+    rsync "${rsync_argv[@]}"
@@
-    eval "rsync -avz --progress -e \"ssh $(_ssh_base_opts)\" '${remote_src}/' '${local_path}/'"
+    rsync "${rsync_argv[@]}"
#!/bin/bash
# Verify no eval-based execution remains in this script.
rg -n '\beval\b|_ssh_base_cmd\b|rsync_cmd=' .claude/skills/common/remote_exec.sh

Expected: no eval usage for ssh/rsync execution paths.

Also applies to: 155-155, 261-261, 348-348, 371-381, 393-393
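The argv-array fix above can be demonstrated in isolation (the variable names here are illustrative, not from the script):

```bash
#!/usr/bin/env bash
# Why eval on an interpolated string is injectable, and why an argv array is not.
hostile='host; echo INJECTED'        # e.g. a malicious clusters.yaml value

# String + eval: the shell re-parses the value, so ';' starts a second command.
eval "echo connecting to $hostile"   # prints "connecting to host", then "INJECTED"

# Argv array: the value stays one argument and is never re-parsed.
args=(echo connecting to "$hostile")
"${args[@]}"                         # prints "connecting to host; echo INJECTED"
```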

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote_exec.sh around lines 105 - 127, The script
still builds and executes interpolated command strings via eval (notably around
_ssh_base_cmd/_ssh_base_opts and places that set rsync_cmd), leaving
command-injection risk; fix by eliminating eval and switching to POSIX/Bash argv
arrays: change callers that currently do eval "$(_ssh_base_cmd) ...", instead
build an array like local -a ssh_cmd=(ssh -o BatchMode=yes -o ConnectTimeout=15
-o StrictHostKeyChecking=accept-new) then conditionally append args with quoted
variables (e.g. [[ -n $REMOTE_SSH_KEY ]] && ssh_cmd+=(-i "$REMOTE_SSH_KEY"); [[
-n $REMOTE_SSH_PROXY ]] && ssh_cmd+=(-o "ProxyCommand=$REMOTE_SSH_PROXY");
ssh_cmd+=("${REMOTE_USER}@${REMOTE_HOST}")), and execute with "${ssh_cmd[@]}"
(similarly convert any rsync_cmd string to an array and call "${rsync_cmd[@]}");
update or add an array-returning helper (or rename _ssh_base_opts to
produce/apply options as array pieces) and remove all eval usages so no
interpolated strings are executed.
🧹 Nitpick comments (1)
.claude/skills/ptq/references/unsupported-models.md (1)

52-53: Pin transformers install by commit SHA for reproducibility.

The current doc line installs from a moving main checkout, which makes troubleshooting non-reproducible. Recommend documenting install by commit SHA:

-    - **Found** → install from that clone: `pip install /tmp/transformers-main --quiet`, then re-run `AutoConfig.from_pretrained()`.
+    - **Found** → install from that clone at a pinned commit, then re-run `AutoConfig.from_pretrained()`:
+      `cd /tmp/transformers-main && git rev-parse HEAD`
+      `pip install /tmp/transformers-main --quiet`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 52 - 53,
The doc currently tells users to install from a moving main checkout ("pip
install /tmp/transformers-main --quiet") which harms reproducibility; update the
instruction to pin the installed transformers to a specific commit SHA by
checking out the repo to that SHA before installing, and update the surrounding
guidance (the sentence referencing AutoConfig.from_pretrained() and the
follow-up question about <ArchName>) to instruct using the pinned-commit install
for reproducible debugging.
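For the pinned-commit install the prompt describes, pip's VCS syntax can express the pin directly. A sketch (pure string construction so it runs offline; the SHA is a placeholder):

```bash
#!/usr/bin/env bash
# Build a pip target pinned to an exact transformers commit.
pinned_pip_target() {
    local sha="$1"
    printf 'git+https://github.com/huggingface/transformers@%s' "$sha"
}

target="$(pinned_pip_target 0123abc)"
# pip install "$target" --quiet   # run this in the actual environment
echo "$target"
```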
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/common/remote_exec.sh:
- Around line 217-239: REMOTE_WORKSPACE may contain a literal "~/" which is not
expanded and breaks remote_run(), remote_sync_to(), rsync and mounts; add the
same tilde handling used for REMOTE_SSH_KEY to expand "~/" to "${HOME}/..."
right after REMOTE_WORKSPACE is assigned (or alternatively reject values
beginning with "~/" with an error), and ensure references in remote_run and
remote_sync_to continue to use the expanded REMOTE_WORKSPACE value; update
validation that currently checks REMOTE_WORKSPACE to run after the expansion so
the non-root and non-empty checks operate on the expanded path.

In @.claude/skills/ptq/references/unsupported-models.md:
- Around line 14-15: Update the guidance around using --trust_remote_code
(referenced in the sentence about modeling_*.py and tokenization_*.py and the
tools hf_ptq.py, AutoConfig, AutoTokenizer, AutoModel and
trust_remote_code=True) to include a security warning: state that
--trust_remote_code executes arbitrary Python from the model repo, recommend
running it only in isolated or ephemeral environments (e.g., containers or VMs),
advise pinning to a specific trusted model revision/commit and verifying the
remote code before use, and suggest running with least privileges and network
restrictions where possible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 251ab2f2-b085-471c-93b7-5fd5c7045659

📥 Commits

Reviewing files that changed from the base of the PR and between fe8516c and d8f1764.

📒 Files selected for processing (7)
  • .claude/skills/common/remote-execution.md
  • .claude/skills/common/remote_exec.sh
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/ptq/references/launcher-guide.md
  • .claude/skills/ptq/references/slurm-setup-ptq.md
  • .claude/skills/ptq/references/unsupported-models.md
✅ Files skipped from review due to trivial changes (5)
  • .claude/skills/ptq/references/slurm-setup-ptq.md
  • .claude/skills/common/workspace-management.md
  • .claude/skills/ptq/SKILL.md
  • .claude/skills/common/remote-execution.md
  • .claude/skills/ptq/references/launcher-guide.md

Comment on lines +217 to +239
    REMOTE_WORKSPACE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.workspace")"
    REMOTE_GPU_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.gpu_type")"
    REMOTE_CONTAINER_IMAGE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.container_image")"
    REMOTE_ENV_TYPE="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.env_type")"

    # SLURM-specific
    REMOTE_SLURM_ACCOUNT="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_account")"
    REMOTE_SLURM_PARTITION="$(_parse_yaml_value "$config_file" "clusters.${cluster_name}.slurm.default_partition")"

    # Expand ~ in ssh_key
    if [[ "${REMOTE_SSH_KEY:-}" == "~/"* ]]; then
        REMOTE_SSH_KEY="${HOME}/${REMOTE_SSH_KEY#\~/}"
    fi

    # Validate required fields
    if [[ -z "$REMOTE_HOST" ]]; then
        echo "ERROR: Cluster '$cluster_name' has no login_node defined." >&2
        return 1
    fi
    if [[ -z "${REMOTE_WORKSPACE:-}" || "$REMOTE_WORKSPACE" == "/" ]]; then
        echo "ERROR: Cluster '$cluster_name' must define a non-root workspace." >&2
        return 1
    fi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find workspace values in example/docs that may still use '~'
rg -n 'workspace:\s*~/?' .claude README.md docs 2>/dev/null

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# First, get file size and structure
wc -l .claude/skills/common/remote_exec.sh

Repository: NVIDIA/Model-Optimizer

Length of output: 106


🏁 Script executed:

# Read the section in question (lines 217-239) and surrounding context
sed -n '215,245p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 1816


🏁 Script executed:

# Read lines around 341-343 to see how REMOTE_WORKSPACE is used
sed -n '335,350p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 881


🏁 Script executed:

# Read lines around 369-370 to see how REMOTE_WORKSPACE is used
sed -n '360,380p' .claude/skills/common/remote_exec.sh | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 932


Add validation to reject ~ in workspace values, or expand them like ssh_key.

REMOTE_WORKSPACE parsed from config can contain ~/..., but unlike REMOTE_SSH_KEY (lines 227–228), it has no tilde expansion. When used in remote_run() (line 348, double-quoted) and remote_sync_to() (line 379, then wrapped in single quotes at line 388), the tilde is passed literally and won't expand remotely, causing cd, rsync, and mounts to fail.

Either expand ~ in workspace values to $HOME form (matching SSH_KEY pattern), or reject them with validation:

Suggested validation
     if [[ -z "${REMOTE_WORKSPACE:-}" || "$REMOTE_WORKSPACE" == "/" ]]; then
         echo "ERROR: Cluster '$cluster_name' must define a non-root workspace." >&2
         return 1
     fi
+    if [[ "$REMOTE_WORKSPACE" == "~" || "$REMOTE_WORKSPACE" == "~/"* ]]; then
+        echo "ERROR: Cluster '$cluster_name' workspace must not use '~'. Use an absolute path or \$HOME/... form." >&2
+        return 1
+    fi
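If expansion is preferred over rejection, the pattern already applied to REMOTE_SSH_KEY extends naturally to the workspace value (a sketch; the example path is illustrative):

```bash
#!/usr/bin/env bash
# Expand a leading "~" in the workspace value locally, since a quoted "~"
# is passed to the remote shell literally and never expands there.
REMOTE_WORKSPACE="~/scratch/ptq"

if [[ "$REMOTE_WORKSPACE" == "~" || "$REMOTE_WORKSPACE" == "~/"* ]]; then
    REMOTE_WORKSPACE="${HOME}${REMOTE_WORKSPACE#\~}"
fi

echo "$REMOTE_WORKSPACE"   # e.g. /home/user/scratch/ptq
```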
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 227-227: Tilde does not expand in quotes. Use $HOME.

(SC2088)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/remote_exec.sh around lines 217 - 239,
REMOTE_WORKSPACE may contain a literal "~/" which is not expanded and breaks
remote_run(), remote_sync_to(), rsync and mounts; add the same tilde handling
used for REMOTE_SSH_KEY to expand "~/" to "${HOME}/..." right after
REMOTE_WORKSPACE is assigned (or alternatively reject values beginning with "~/"
with an error), and ensure references in remote_run and remote_sync_to continue
to use the expanded REMOTE_WORKSPACE value; update validation that currently
checks REMOTE_WORKSPACE to run after the expansion so the non-root and non-empty
checks operate on the expanded path.

Comment on lines +14 to +15
2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .claude/skills/ptq/references/unsupported-models.md | head -40

Repository: NVIDIA/Model-Optimizer

Length of output: 2380


🏁 Script executed:

wc -l .claude/skills/ptq/references/unsupported-models.md

Repository: NVIDIA/Model-Optimizer

Length of output: 121


🏁 Script executed:

grep -n -i "security\|isolation\|revision\|trust_remote_code\|sandbox" .claude/skills/ptq/references/unsupported-models.md

Repository: NVIDIA/Model-Optimizer

Length of output: 662


🏁 Script executed:

cat -n .claude/skills/ptq/references/unsupported-models.md | sed -n '1,30p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1946


🏁 Script executed:

# Check the workspace-management.md reference mentioned in line 9 to see if security guidance is there
cat -n .claude/skills/common/workspace-management.md 2>/dev/null | head -50

Repository: NVIDIA/Model-Optimizer

Length of output: 1515


🏁 Script executed:

# Search for security-related content in the broader context
grep -r -n -i "trust_remote_code\|sandbox\|isolation" .claude/skills/ 2>/dev/null | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 1431


Add security guidance for --trust_remote_code usage.

Line 14 mandates --trust_remote_code but omits any warning that this executes arbitrary Python from model repositories. Add guidance to run only in isolated environments and pin a trusted model revision/commit.

Suggested patch
2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.
+   **Security note:** `trust_remote_code` executes model-repo Python. Use only in isolated environments and pin a trusted `revision`/commit.
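One way to pair the flag with a pinned revision in a custom script (a sketch; the helper name and SHA are placeholders, not from the PR):

```python
# kwargs shared by AutoConfig/AutoTokenizer/AutoModel.from_pretrained() calls
# in a custom PTQ script; pinning `revision` means later pushes to the model
# repo cannot change the code that trust_remote_code will execute.
def pinned_load_kwargs(revision: str) -> dict:
    return {
        "trust_remote_code": True,  # required when the repo ships modeling_*.py
        "revision": revision,       # audited commit SHA or tag
    }

kwargs = pinned_load_kwargs("0123abc")
# AutoConfig.from_pretrained("org/model", **kwargs)  # run in a container/VM
```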
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 14 - 15,
Update the guidance around using --trust_remote_code (referenced in the sentence
about modeling_*.py and tokenization_*.py and the tools hf_ptq.py, AutoConfig,
AutoTokenizer, AutoModel and trust_remote_code=True) to include a security
warning: state that --trust_remote_code executes arbitrary Python from the model
repo, recommend running it only in isolated or ephemeral environments (e.g.,
containers or VMs), advise pinning to a specific trusted model revision/commit
and verifying the remote code before use, and suggest running with least
privileges and network restrictions where possible.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
.claude/skills/ptq/references/unsupported-models.md (1)

14-15: ⚠️ Potential issue | 🟠 Major

Add a security warning next to mandatory --trust_remote_code guidance.

Line 14 currently prescribes --trust_remote_code but does not warn that this executes model-repo Python. Please add a short warning about isolation, pinning a trusted revision/commit, and using least privileges/network restrictions.

Suggested doc patch
-2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.
+2. **Check for `modeling_*.py` or `tokenization_*.py`** — custom code shipped with the model. If found, **always use `--trust_remote_code`** with `hf_ptq.py`, and `trust_remote_code=True` in any custom scripts. Without it, `AutoConfig`, `AutoTokenizer`, and `AutoModel` will fail to resolve custom classes.  
+   **Security note:** `trust_remote_code` executes Python from the model repository. Run only in isolated/ephemeral environments (container/VM), pin a trusted `revision`/commit, review remote code before use, and apply least-privilege + network restrictions where possible.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/ptq/references/unsupported-models.md around lines 14 - 15,
Add a short security warning alongside the mandatory --trust_remote_code
guidance: state that enabling --trust_remote_code (and trust_remote_code=True in
custom scripts, e.g., hf_ptq.py) executes Python from the model repo, so
recommend running it in an isolated environment (container/VM), pinning a
trusted revision/commit, and applying least-privilege and network restrictions;
place this warning immediately after the sentence mentioning modeling_*.py /
tokenization_*.py and the use of --trust_remote_code so readers see the risk and
mitigations together.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: daa72d22-6939-4f35-b0da-95f0c0aa5c9a

📥 Commits

Reviewing files that changed from the base of the PR and between 448091b and bf3f574.

📒 Files selected for processing (1)
  • .claude/skills/ptq/references/unsupported-models.md

mxinO added 3 commits April 2, 2026 06:31
Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Meng Xin <mxin@nvidia.com>