A modular chatbot framework designed for easy experimentation with LLMs, VLMs, multimodal inputs, and customizable pipelines.
This project is the evolution of the original chatbot repository and introduces GPU‑accelerated vision capabilities, attachment handling, and a cleaner architecture for future development.
- Vision-Language Model Integration (VLM):
  Added support for Qwen V3 2B running on GPU for fast and accurate multimodal reasoning: image understanding, OCR-like extraction, captioning, and more.
- Attachment Input Handling:
  You can now upload images or documents directly; the system processes them and routes them through the appropriate model pipeline.
- Improved Modular Architecture:
  Cleaner separation between components (UI, inference backend, runtime logic).
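The attachment routing described above might look something like this sketch. The extension sets and the `route_attachment` helper are illustrative assumptions, not the repository's actual API:

```python
from pathlib import Path

# Hypothetical file-type groups; the real pipeline may support more formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
DOC_EXTS = {".pdf", ".txt", ".md"}

def route_attachment(path: str) -> str:
    """Pick a model pipeline for an uploaded file based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "vlm"  # images go to the vision-language model
    if ext in DOC_EXTS:
        return "llm"  # extracted document text goes to the language model
    raise ValueError(f"Unsupported attachment type: {ext}")
```

A dispatcher like this keeps the UI layer ignorant of model details, which matches the modular-architecture goal above.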
- Text‑only and multimodal chat
- GPU‑accelerated inference
- Plug‑and‑play model configuration
- Support for attachments (images and videos)
- Local and cloud‑ready deployment
- Fully open‑source and easy to customize
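Plug-and-play model configuration could be expressed as a small registry, as in this sketch. The keys and config fields here are illustrative assumptions, not the project's real schema:

```python
# Hypothetical model registry; swapping a model means editing one entry.
MODELS = {
    "llm": {"name": "Mistral 7B", "format": "gguf", "device": "cpu"},
    "vlm": {"name": "Qwen V3 2B", "format": "hf", "device": "cuda"},
}

def get_model_config(kind: str) -> dict:
    """Look up the config for a model type so backends stay swappable."""
    try:
        return MODELS[kind]
    except KeyError:
        raise ValueError(f"Unknown model type: {kind}") from None
```

Keeping model choice in data rather than code is one way to realize the plug-and-play goal listed above.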
```bash
git clone https://github.com/yaghmo/chatbot.git
cd chatbot
conda create -n chatbot python=3.10 -y
conda activate chatbot
pip install -r requirements.txt
python launch.py
```

| Type | Model | Notes |
|---|---|---|
| LLM | Mistral 7B (gguf) | Local or API-based |
| VLM | Qwen V3 2B | GPU‑accelerated multimodal model |
