中文 | English
Break free from static formats. Our platform transforms fixed content into fully editable assets. Powered by SAM 3 and multimodal large models, it delivers high-fidelity reconstruction that preserves the original diagram's details and logical relationships.
Click above or visit https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.
Warning
Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.
You are welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:
Scan to join the Edit Banana community
Tip
If the QR code has expired, please submit an Issue to request an updated one.
- Effect Demonstration
- Key Features
- Architecture Pipeline
- Project Structure
- Installation & Setup
- Usage
- Configuration
- Development Roadmap
- Join WeChat Group
- Contribution Guidelines
- Contributors
- License
- Star History
To demonstrate the high-fidelity conversion, we provide one-to-one comparisons across four scenarios between the original static formats and the editable reconstruction results. All elements can be individually dragged, styled, and modified.
Note
Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram.
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
- Accurate text recognition, supporting direct subsequent editing and format adjustment.
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.
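For reference, each reconstructed element becomes an ordinary mxGraph node in the generated XML, so DrawIO treats it like any hand-drawn shape. A minimal illustrative cell (the style values here are examples, not the pipeline's exact output):

```xml
<mxCell id="node1" value="Start"
        style="rounded=1;fillColor=#dae8fc;strokeColor=#6c8ebf;"
        vertex="1" parent="1">
  <mxGeometry x="40" y="40" width="120" height="40" as="geometry"/>
</mxCell>
```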
- Advanced Segmentation: Uses our fine-tuned SAM 3 (Segment Anything Model 3) to segment diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by multimodal LLMs.
- Text Recognition:
  - Local OCR for text localization; easy to install, runs offline.
  - Pix2Text for mathematical formula recognition and LaTeX conversion.
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-resolution crops to the formula engine.
- User System:
  - Registration: New users receive 10 free credits.
  - Credit System: A pay-per-use model prevents resource abuse.
  - Multi-User Concurrency: Built-in support for concurrent user sessions, using a global lock for thread-safe GPU access and an LRU (Least Recently Used) cache to persist image embeddings across requests, ensuring high performance and stability.
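The locking-and-caching scheme described above can be sketched as follows (the function and variable names, and the cache capacity, are illustrative, not the project's actual API):

```python
import threading
from collections import OrderedDict

_gpu_lock = threading.Lock()   # serializes access to the single shared GPU model
_embed_cache = OrderedDict()   # image key -> embedding, kept in LRU order
_CACHE_CAPACITY = 8            # illustrative limit

def get_embedding(image_key, compute_embedding):
    """Return a cached embedding, computing it under the GPU lock on a miss."""
    with _gpu_lock:
        if image_key in _embed_cache:
            _embed_cache.move_to_end(image_key)   # mark as most recently used
            return _embed_cache[image_key]
        embedding = compute_embedding(image_key)  # the expensive GPU call
        _embed_cache[image_key] = embedding
        if len(_embed_cache) > _CACHE_CAPACITY:
            _embed_cache.popitem(last=False)      # evict least recently used
        return embedding
```

Holding one lock around both the cache lookup and the GPU call keeps a single model instance safe under concurrent requests, while repeated requests for the same image skip the GPU entirely.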
1. Input: Image (PNG/JPG/BMP/TIFF/WebP).
2. Segmentation (SAM3): Our fine-tuned SAM3 mask decoder segments the diagram.
3. Text Extraction (parallel):
   - Local OCR (Tesseract) detects text bounding boxes.
   - High-resolution crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
4. DrawIO XML Generation: Spatial data from SAM3 is merged with the OCR results.
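The final merge step can be sketched as follows; the input formats for segmentation boxes and OCR results are assumptions for illustration, not the pipeline's real data structures:

```python
import xml.etree.ElementTree as ET

def boxes_to_drawio(shapes, texts):
    """Merge segmentation boxes and OCR text into a minimal DrawIO XML string.

    shapes: list of (x, y, w, h) boxes (illustrative format);
    texts:  list of (x, y, string) OCR results. Each text is attached to the
    first shape whose box contains its anchor point.
    """
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")                 # required root cells
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for i, (x, y, w, h) in enumerate(shapes, start=2):
        label = next((t for tx, ty, t in texts
                      if x <= tx <= x + w and y <= ty <= y + h), "")
        cell = ET.SubElement(root, "mxCell", id=str(i), value=label,
                             style="rounded=0;whiteSpace=wrap;",
                             vertex="1", parent="1")
        geometry = ET.SubElement(cell, "mxGeometry", x=str(x), y=str(y),
                                 width=str(w), height=str(h))
        geometry.set("as", "geometry")                    # "as" is a keyword
    return ET.tostring(model, encoding="unicode")
```

Because every box becomes its own `mxCell`, each element remains independently selectable and styleable once the file is opened in DrawIO.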
Click to expand project structure
```
Edit-Banana/
├── config/             # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/     # OCR & text extraction module (standalone entry)
│   ├── src/
│   └── main.py         # OCR-only entry point
├── input/              # [Manual] Input images directory
├── models/             # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/             # [Manual] Results directory
├── sam3/               # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/       # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/            # Setup and utility scripts
│   ├── setup_sam3.sh   # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py   # Download RMBG model from ModelScope
│   └── merge_xml.py    # XML merge utilities
├── main.py             # CLI entry (modular pipeline)
├── server_pa.py        # FastAPI backend server
└── requirements.txt    # Python dependencies
```
Follow these core phases to set up the project locally.
Configure your base environment and directory structure.
- **Python 3.10+** & a CUDA-capable GPU (highly recommended)
- Install PyTorch with CUDA support (e.g., for CUDA 11.8):

  ```bash
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
  ```
```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output
```

Next, install the required packages and download the necessary model weights (place them in models/; do not commit them).
```bash
pip install -r requirements.txt
```

- **SAM3 Library & BPE**: Run `bash scripts/setup_sam3.sh` to install the library and copy the BPE vocab to `models/`. Verify with:

  ```bash
  python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
  ```

- **SAM3 Weights**: Download sam3.pt from ModelScope or Hugging Face and place it under `models/sam3_ms`.
- **Local OCR (Tesseract)**:

  ```bash
  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
  ```
Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand
- **PaddleOCR** (alternative; better for mixed text): Use paddlepaddle==3.2.2 (avoiding a bug in 3.3.0).

  ```bash
  pip install paddlepaddle==3.2.2 paddleocr
  ```

- **Formula (Pix2Text)**:

  ```bash
  pip install pix2text onnxruntime-gpu
  ```

- **Background Removal (RMBG)**:

  ```bash
  pip install onnxruntime modelscope
  python scripts/setup_rmbg.py
  ```
Copy the example config and adjust the asset paths:

```bash
cp config/config.yaml.example config/config.yaml
```

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
Before First Run: Checklist & Troubleshooting - Click to expand
Checklist:
- Config files copied and model paths set in `config.yaml`
- SAM3 weights (`sam3.pt`) and BPE vocab placed under `models/`
- SAM3 library installed via `scripts/setup_sam3.sh`
- Tesseract or PaddleOCR installed

Common Issues:

- "no kernel image is available...": GPU architecture mismatch. Upgrade PyTorch or set `sam3.device: "cpu"`.
- "Model file not found at ...rmbg/...": RMBG is optional. Enable it by downloading the model via the setup script.
- "PaddleOCR inference failed...": Use `paddlepaddle==3.2.2` or fall back to Tesseract.
Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

```bash
python main.py -i input/test_diagram.png
```

The output XML is saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.
- **One-time setup**

  ```bash
  git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
  python3 -m venv .venv && source .venv/bin/activate  # Linux/macOS; Windows: .venv\Scripts\activate
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118  # or CPU build
  pip install -r requirements.txt
  sudo apt install tesseract-ocr tesseract-ocr-chi-sim  # OCR (or equivalent on your OS)
  ```

  Install the SAM3 library and download the model weights + BPE. Then:

  ```bash
  mkdir -p input output
  cp config/config.yaml.example config/config.yaml
  # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
  ```

- **Test with the CLI**

  ```bash
  # Put a diagram image in input/, e.g. input/test.png
  python main.py -i input/test.png
  # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
  ```

- **Optional: test the web API**

  ```bash
  python server_pa.py
  # In another terminal:
  curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
  # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
  ```
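The curl upload above can also be reproduced programmatically with only the Python standard library; the endpoint and port come from the example, while `build_multipart` and `convert_image` are illustrative helpers:

```python
import mimetypes
import urllib.request
import uuid

def build_multipart(field_name, filename, data):
    """Encode a single file as a multipart/form-data request body."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8") + data + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"

def convert_image(path, url="http://localhost:8000/convert"):
    """POST an image to a running server and return the response body."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", path, f.read())
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```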
Customize the pipeline behavior in config/config.yaml:
- `sam3`: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
- `paths`: Set input/output directories.
- `dominant_color`: Fine-tune color-extraction sensitivity.
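As an illustration, a filled-in config.yaml might look like the following. Only `sam3.checkpoint_path`, `sam3.bpe_path`, `sam3.device`, `paths`, and `dominant_color` appear in this README; the remaining key names and all values are guesses, and config.yaml.example is authoritative:

```yaml
# Illustrative values only -- see config/config.yaml.example for real keys.
sam3:
  checkpoint_path: models/sam3_ms/sam3.pt
  bpe_path: models/bpe_vocab_file   # placeholder: use the file setup_sam3.sh copied
  device: cuda                      # or "cpu" if no compatible GPU
  score_threshold: 0.5              # illustrative key name and value
  nms_threshold: 0.7                # illustrative key name and value
paths:
  input_dir: input
  output_dir: output
dominant_color:
  sensitivity: 0.8                  # illustrative key name and value
```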
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | Completed | Full pipeline of segmentation, reconstruction, and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | Planned | Support custom template import |
| Batch Export Optimization | Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | Planned | Support local VLM deployment, independent of external APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/xxx`)
3. Commit your changes (`git commit -m 'feat: add xxx'`)
4. Push to the branch (`git push origin feature/xxx`)
5. Open a Pull Request

Bug Reports: Issues · Feature Suggestions: Discussions
Thanks to all developers who have contributed to the project and promoted its iteration!
View Contributors List
| Name/ID | Email |
|---|---|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| An Minghao | |
| Yu Mingjie | |
| Yu Xinjiang | |
| Chen Zhuofan | |
| Li Xiangkun |
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
If this project helps you, please star it to show your support!










