A powerful Gradio web application for downloading, resharding, and re-uploading large Hugging Face models, with built-in optimizations for large Vision-Language (VL) models such as the Qwen family. This intuitive tool is designed to help engineers and researchers easily manage unwieldy model weights by breaking them into smaller, more manageable shards and pushing them directly to either a private or public Hugging Face repository—all from a clean UI.
It also helps migrate checkpoints: files saved with older Transformers versions can be converted into the formats expected by newer releases.
- Interactive UI: Fully built with Gradio, providing an easy-to-use visual interface to configure your source model, target repository, and architecture.
- Automated Resharding: Dynamically specify your desired `shard_size` (e.g., `4.4GB` or `2GB`) directly in the UI to optimize for different hardware constraints or storage limitations.
- Hardware Acceleration: Automatically detects CUDA availability and uses GPU acceleration for faster model loading and processing.
- Supported Architectures: Built-in support for Qwen3.5, Qwen3-VL, Qwen2.5-VL, and Qwen2-VL architectures.
- Dependency Isolation: Fully compatible with the `uv` package manager.
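Shard-size strings such as `4.4GB` follow the same convention Transformers uses for its `max_shard_size` argument: a decimal number followed by an SI unit. As a rough illustration of how such a string maps to a byte budget, here is a hypothetical helper (not code from `src/app.py`):

```python
# Hypothetical helper: convert a human-readable shard size such as "4.4GB"
# into a byte count, mirroring the string format Transformers accepts for
# max_shard_size (SI units: 1 GB = 10**9 bytes).
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_shard_size(size: str) -> int:
    """Parse strings like '2GB' or '4.4GB' into a number of bytes."""
    size = size.strip().upper()
    for unit, factor in UNITS.items():
        if size.endswith(unit):
            return int(float(size[: -len(unit)]) * factor)
    # No unit suffix: treat the value as raw bytes.
    return int(size)

print(parse_shard_size("4.4GB"))  # → 4400000000
print(parse_shard_size("2GB"))    # → 2000000000
```

Smaller shard sizes produce more files but make each download/upload unit easier to handle on constrained disks and flaky connections.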
The application currently supports the following Hugging Face model architectures (extensible in src/app.py):
- `Qwen3_5ForConditionalGeneration` (Qwen 3.5)
- `Qwen3VLForConditionalGeneration` (Qwen 3 VL)
- `Qwen2_5_VLForConditionalGeneration` (Qwen 2.5 VL)
- `Qwen2VLForConditionalGeneration` (Qwen 2 VL)
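One way such an extensible list can be kept is a simple label-to-class-name mapping that is resolved against the installed `transformers` at call time, so missing architectures fail with a clear message rather than an `ImportError`. This is a hypothetical sketch of what a registry in `src/app.py` could look like, not the actual implementation:

```python
# Hypothetical registry: UI dropdown label -> Transformers class name.
ARCHITECTURES = {
    "Qwen 3.5": "Qwen3_5ForConditionalGeneration",
    "Qwen 3 VL": "Qwen3VLForConditionalGeneration",
    "Qwen 2.5 VL": "Qwen2_5_VLForConditionalGeneration",
    "Qwen 2 VL": "Qwen2VLForConditionalGeneration",
}

def resolve_architecture(label: str):
    """Look up the model class lazily from the installed transformers."""
    import transformers  # imported here so the mapping itself has no heavy deps

    cls_name = ARCHITECTURES[label]
    cls = getattr(transformers, cls_name, None)
    if cls is None:
        raise ValueError(
            f"{cls_name} is not available in transformers "
            f"{transformers.__version__}; upgrade the library to use it."
        )
    return cls
```

Adding support for a new architecture would then amount to adding one dictionary entry, provided the installed Transformers version ships the class.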
- Python: Version 3.10 or higher.
- uv: Recommended for fast, reliable dependency management.
- Hugging Face Token: A valid token with write access (`hf_...`) is required to create repositories and upload the sharded models.
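Before kicking off a long upload, it can be worth a cheap sanity check that the supplied string at least looks like a Hugging Face token; actually confirming write access requires an authenticated call such as `HfApi(token=token).whoami()` from `huggingface_hub`. A minimal, hypothetical format check:

```python
# Hypothetical pre-flight check: user access tokens issued by Hugging Face
# start with the "hf_" prefix. This validates only the format; it does not
# prove the token is live or has write access.
def looks_like_hf_token(token: str) -> bool:
    token = token.strip()
    return token.startswith("hf_") and len(token) > len("hf_")

print(looks_like_hf_token("hf_abc123"))    # True: plausible token format
print(looks_like_hf_token("my-password"))  # False: rejected before any upload starts
```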
- Clone the repository:

  ```bash
  git clone https://github.com/PRITHIVSAKTHIUR/model.resharder-transformers.git
  cd model.resharder-transformers
  ```

- Install dependencies. If using `uv` (recommended):

  ```bash
  uv sync
  ```
You can start the Gradio server directly using `uv` or standard Python:

```bash
uv run python src/app.py
```

The application will launch on your local network (typically `http://127.0.0.1:7860`).
- Open the UI in your browser.
- In the configuration panel, enter the Original Model Name (e.g., `Qwen/Qwen3-VL-2B-Instruct`).
- Enter the New Repository ID where you want the resharded model saved (e.g., `your-username/Qwen3-VL-2B-Sharded`).
- Enter your Hugging Face Write Token.
- Set the Max Shard Size to your preferred split limit (e.g., `4.4GB`).
- Select the correct Model Architecture from the dropdown menu.
- Click Shard & Upload Model.
- Watch the Process Logs panel for the final output once the operation completes.
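The workflow behind the Shard & Upload Model button can be sketched roughly as follows. All names here (`reshard_and_upload`, its parameters, the local folder, the concrete model class) are illustrative assumptions, not the actual code in `src/app.py`:

```python
# Hypothetical sketch of the end-to-end shard-and-upload flow.
def reshard_and_upload(source_repo: str, new_repo_id: str, token: str,
                       max_shard_size: str = "4.4GB",
                       work_dir: str = "resharded") -> None:
    # Heavy imports kept local so the sketch can be read without
    # torch/transformers installed.
    import torch
    from huggingface_hub import HfApi
    from transformers import Qwen2VLForConditionalGeneration  # the UI substitutes the selected class

    # 1. Download and load the original checkpoint, on GPU when available.
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        source_repo,
        torch_dtype="auto",
        device_map="auto" if torch.cuda.is_available() else None,
    )

    # 2. Re-save locally; save_pretrained splits the weights into files of
    #    at most max_shard_size and writes the matching index file.
    model.save_pretrained(work_dir, max_shard_size=max_shard_size)

    # 3. Create the destination repo (a no-op if it already exists) and
    #    push the sharded folder.
    api = HfApi(token=token)
    api.create_repo(repo_id=new_repo_id, exist_ok=True)
    api.upload_folder(folder_path=work_dir, repo_id=new_repo_id)
```

A complete implementation would also save and upload the tokenizer/processor files alongside the weights so the new repository is directly loadable.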
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.