An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
Updated Feb 9, 2026 - TypeScript
🌌 Orion AI Workspace – A free intelligent workspace platform that combines advanced AI models with real-time collaboration tools, designed with privacy-first principles and user-controlled API keys.
AI StoryTeller is a multimodal AI application that converts images into creative short stories by combining computer vision and natural language generation. The system uses a pretrained image captioning model to understand visual content and Google Gemini to generate context-aware narratives grounded in the image.
A modular academic project exploring multimodal intrusion detection using RGB video, thermal input, tracking, and future audio/RF signals. Work-in-progress learning project with a clean architecture and 70-task roadmap.
Build a Machine Learning model that predicts whether a mushroom is poisonous or edible based on its physical and environmental attributes. The goal is to help identify potentially harmful mushrooms early so safer decisions can be made while handling or consuming them.
Visual Question Answering (VQA) system. A multimodal AI model that combines computer vision (CNN) and natural language processing (LSTM) to answer questions based on image content.
A Streamlit-based Multimodal AI Generator using Google's Gemini API for text and image generation.
RAG MCP Frontend: a lightweight React/TypeScript frontend for interacting with Retrieval-Augmented Generation (RAG) services and the MCP (Multi-Channel Processing) backend. This project offers a clean UI for document ingestion, query/response flows, and conversation history.
A real-time image captioning and visual question answering (VQA) system. This project uses computer vision and NLP to generate descriptive captions for images and answer user questions about them.
A GenAI pipeline that turns waste (peels, grounds) into candidate drug structures in under 60 seconds. Pipeline: upload image or text → fragments → structures → ADMET/EcoScore → RAG validation → PDF. Built with GPT-4o, Llama-3, LangChain, and RDKit. Guided by Dr. Hammad Majeed (UMT Lahore). Hackathon 2025.
MindTrack is an AI-powered multimodal emotion detection system using both text and images to monitor emotional well-being in real time.
Production-grade semantic video search engine - search across video content using natural language. Powered by Whisper, GPT-4o Vision, vector embeddings, and Pinecone.