Edit Banana Logo

🍌 Edit Banana

中文 | English

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform transforms fixed content into fully editable assets. Powered by SAM 3 and multimodal large models, it delivers high-fidelity reconstruction that preserves the original diagram's details and logical relationships.

Python License CUDA WeChat GitHub stars


Try It Now!

Try Online Demo

👆 Click above or https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.

Warning

Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.


💬 Join WeChat Group

Welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

Tip

If the QR code has expired, please submit an Issue to request an updated one.




📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate the high-fidelity conversion, we provide one-to-one comparisons between the original static format and the editable reconstruction result across 4 scenarios. All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO

🔒 Original Static Diagram (Input · Non-editable) 🔓 DrawIO Reconstruction Result (Output · Fully Editable)

Example 1: Basic Flowchart

Original Diagram 1

✨ Editable Flowchart

Reconstruction Result 1

Example 2: Multi-level Architecture

Original Diagram 2

✨ Editable Architecture

Reconstruction Result 2

Example 3: Technical Schematic

Original Diagram 3

✨ Editable Schematic

Reconstruction Result 3

Example 4: Scientific Formula

Original Diagram 4

✨ Editable Formula

Reconstruction Result 4

Scenario 2: Human-in-the-Loop Modification

✨ Manual repair

✨ Save locally

Note

✨ Conversion Highlights:

  1. Preserves the layout logic, color matching, and element hierarchy of the original diagram.
  2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
  3. Accurate text recognition, supporting direct subsequent editing and format adjustment.
  4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.

🚀 Key Features

  • Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.

  • Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs.

  • Text Recognition:

    • Local OCR for text localization; easy to install, runs offline.
    • Pix2Text for mathematical formula recognition and LaTeX conversion.
    • Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to the formula engine.
  • User System:

    • Registration: New users receive 10 free credits.
    • Credit System: Pay-per-use model prevents resource abuse.
    • Multi-User Concurrency: Concurrent user sessions are supported via a Global Lock mechanism for thread-safe GPU access and an LRU (Least Recently Used) cache that persists image embeddings across requests, ensuring high performance and stability.
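The concurrency bullet above can be sketched in a few lines of Python. This is an illustrative sketch only, not the repository's actual implementation: the class and method names (`EmbeddingCache`, `get_embedding`) are hypothetical.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Sketch of an LRU embedding cache guarded by a global GPU lock.

    Hypothetical names; the real Edit-Banana implementation may differ.
    """

    def __init__(self, max_entries=8):
        self._cache = OrderedDict()        # image_hash -> embedding
        self._max_entries = max_entries
        self._gpu_lock = threading.Lock()  # serializes all GPU encoder work

    def get_embedding(self, image_hash, compute_fn):
        # Fast path: reuse a cached embedding across requests.
        if image_hash in self._cache:
            self._cache.move_to_end(image_hash)  # mark as most recently used
            return self._cache[image_hash]
        # Slow path: only one thread may run the GPU encoder at a time.
        with self._gpu_lock:
            if image_hash in self._cache:        # re-check after waiting
                self._cache.move_to_end(image_hash)
                return self._cache[image_hash]
            embedding = compute_fn(image_hash)
            self._cache[image_hash] = embedding
            if len(self._cache) > self._max_entries:
                self._cache.popitem(last=False)  # evict least recently used
            return embedding
```

The double-checked lookup inside the lock avoids recomputing an embedding that another thread finished while this one was waiting.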

πŸ› οΈ Architecture Pipeline

  1. Input: Image (PNG/JPG/BMP/TIFF/WebP).
  2. Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
  3. Text Extraction (Parallel):
    • Local OCR (Tesseract) detects text bounding boxes.
    • High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
  4. DrawIO XML Generation: Merging spatial data from SAM3 and text OCR results.
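Step 4 can be illustrated with a minimal sketch of DrawIO XML generation. The `boxes_to_drawio` function and the element-dictionary shape are hypothetical stand-ins for the real merged SAM3/OCR output; only the `mxfile`/`mxGraphModel`/`mxCell`/`mxGeometry` nesting follows the standard DrawIO file format.

```python
import xml.etree.ElementTree as ET


def boxes_to_drawio(elements):
    """Merge boxes (x, y, w, h) and recognized labels into minimal DrawIO XML.

    Illustrative sketch; not the repository's actual generator.
    """
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # required default cells
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for i, el in enumerate(elements, start=2):
        cell = ET.SubElement(
            root, "mxCell", id=str(i),
            value=el.get("label", ""),
            style=el.get("style", "rounded=1;whiteSpace=wrap;"),
            vertex="1", parent="1",
        )
        ET.SubElement(
            cell, "mxGeometry",
            x=str(el["x"]), y=str(el["y"]),
            width=str(el["w"]), height=str(el["h"]),
            **{"as": "geometry"},                      # "as" is a reserved word
        )
    return ET.tostring(mxfile, encoding="unicode")
```

Each segmented shape becomes an individually selectable `mxCell`, which is what makes the reconstructed diagram editable in DrawIO.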

📂 Project Structure

Click to expand project structure
Edit-Banana/
├── config/               # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/       # OCR & Text Extraction Module (standalone entry)
│   ├── src/
│   └── main.py             # OCR-only entry point
├── input/                # [Manual] Input images directory
├── models/               # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/               # [Manual] Results directory
├── sam3/                 # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/         # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/              # Setup and utility scripts
│   ├── setup_sam3.sh       # Install SAM3 lib and copy BPE to models/
│   ├── setup_rmbg.py       # Download RMBG model from ModelScope
│   └── merge_xml.py        # XML merge utilities
├── main.py               # CLI entry (modular pipeline)
├── server_pa.py          # FastAPI backend server
└── requirements.txt      # Python dependencies

📦 Installation & Setup

Follow these core phases to set up the project locally.

Phase 1: Environment & Base Setup

Configure your base environment and directory structure.

1. Prerequisites & Environment

  • Python 3.10+ & a CUDA-capable GPU (highly recommended)

  • Install PyTorch with CUDA support (e.g., for CUDA 11.8):

    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

2. Clone Repository & Init Directories

git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output

Phase 2: Models & Core Dependencies

Next, install the required packages and download necessary model weights (which should be placed in models/ and not committed).

1. Base Dependencies

pip install -r requirements.txt

2. SAM3 & Model Assets

  • SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the lib and copy the BPE vocab to models/. Verify with:

    python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
  • SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.

  • Text Local OCR (Tesseract):

    sudo apt install tesseract-ocr tesseract-ocr-chi-sim
🧩 Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand
  • PaddleOCR (alternative; better for mixed text): Use paddlepaddle==3.2.2 (avoids a known bug in 3.3.0).

    pip install paddlepaddle==3.2.2 paddleocr
  • Formula (Pix2Text):

    pip install pix2text onnxruntime-gpu
  • Background Removal (RMBG): pip install onnxruntime modelscope then run python scripts/setup_rmbg.py.

Phase 3: Configuration & Troubleshooting

1. Final Configuration

Copy the example config and adjust the asset paths:

cp config/config.yaml.example config/config.yaml

Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.

πŸ› οΈ Before First Run Checklist & Troubleshooting - Click to expand

Checklist:

  • Config files copied and model paths set in config.yaml
  • SAM3 weights (sam3.pt) and BPE vocab placed under models/
  • SAM3 library installed via scripts/setup_sam3.sh
  • Tesseract or PaddleOCR installed

Common Issues:

  • "no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set sam3.device: "cpu".
  • "Model file not found at ...rmbg/...": RMBG is optional. Enable by downloading via script.
  • "PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fallback to Tesseract.

🔀 Usage

Command Line Interface (CLI)

Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.
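Batch mode's input discovery might look like the following sketch. The helper `find_images` is hypothetical; the actual discovery logic lives in `main.py`.

```python
from pathlib import Path

# Formats listed in the README as supported by the CLI.
SUPPORTED_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".webp"}


def find_images(input_dir):
    """Return supported image files in input_dir, sorted for a stable order."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTS
    )
```

Unsupported files (notes, configs, etc.) left in input/ are simply skipped.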

Run and test locally

  1. One-time setup

    git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
    python3 -m venv .venv && source .venv/bin/activate   # Linux/macOS; Windows: .venv\Scripts\activate
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118   # or CPU build
    pip install -r requirements.txt
    sudo apt install tesseract-ocr tesseract-ocr-chi-sim   # OCR (or equivalent on your OS)

    Install the SAM3 library and download model weights + BPE. Then:

    mkdir -p input output
    cp config/config.yaml.example config/config.yaml
    # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
  2. Test with CLI

    # Put a diagram image in input/, e.g. input/test.png
    python main.py -i input/test.png
    # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
  3. Optional: test the web API

    python server_pa.py
    # In another terminal:
    curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
    # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload

βš™οΈ Configuration

Customize the pipeline behavior in config/config.yaml:

  • sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.

  • paths: Set input/output directories.

  • dominant_color: Fine-tune color extraction sensitivity.
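A hypothetical sketch of what config/config.yaml might look like after setup. Apart from sam3.checkpoint_path, sam3.bpe_path, and sam3.device (which appear elsewhere in this README), the key names and values below are illustrative; config/config.yaml.example in the repository is the authoritative reference.

```yaml
# Illustrative only -- check config/config.yaml.example for real keys/defaults.
sam3:
  checkpoint_path: models/sam3_ms/sam3.pt  # downloaded SAM3 weights
  bpe_path: models/bpe_vocab               # BPE vocab copied by setup_sam3.sh (illustrative name)
  device: "cuda"                           # set "cpu" if your GPU arch is unsupported
  score_threshold: 0.5                     # illustrative key/value
  nms_threshold: 0.7                       # illustrative key/value
  max_iterations: 3                        # illustrative key/value
paths:
  input_dir: input/
  output_dir: output/
dominant_color:
  sensitivity: 0.8                         # illustrative key/value
```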


📌 Development Roadmap

| Feature Module | Status | Description |
| --- | --- | --- |
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction, and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📝 Planned | Support custom template import |
| Batch Export Optimization | 📝 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📝 Planned | Support local VLM deployment, independent of APIs |

🀝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/xxx)
  3. Commit your changes (git commit -m 'feat: add xxx')
  4. Push to the branch (git push origin feature/xxx)
  5. Open a Pull Request

Bug Reports: Issues · Feature Suggestions: Discussions


🀩 Contributors

Thanks to all developers who have contributed to the project and promoted its iteration!

View Contributors List
| Name/ID | Email |
| --- | --- |
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| An Minghao | |
| Yu Mingjie | |
| Yu Xinjiang | |
| Chen Zhuofan | |
| Li Xiangkun | |

📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).


🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star History Chart
