An interactive Visual Retrieval-Augmented Generation (RAG) system that combines:
🔎 Cohere Embed-4 for multimodal embeddings
⚡ Google Gemini 2.5 Flash for visual question answering
Built with Streamlit, this app allows you to upload images and PDFs, then query them with natural language to extract insights from charts, diagrams, and document pages.
✨ Features
📂 Upload PDFs & Images
PDFs are automatically converted into page images.
Images are automatically enhanced to improve embedding quality and question answering.
🔎 Multimodal Retrieval
Uses Cohere Embed-4 to compute embeddings for each image/page.
Finds the most semantically relevant page/image for a given query.
🤖 AI-Powered Answers
Google Gemini 2.5 Flash analyzes the retrieved visual content.
Generates clear, context-aware answers to your question.
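The retrieval step under "Multimodal Retrieval" can be illustrated with a minimal sketch. It assumes the query and every page/image have already been turned into embedding vectors, and that relevance is ranked by cosine similarity; this README does not state which metric the app actually uses. The names `cosine_similarity` and `retrieve_best_page` are hypothetical, and the toy 3-dimensional vectors stand in for real Cohere Embed-4 output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_best_page(
    query_embedding: Sequence[float],
    page_embeddings: List[Sequence[float]],
) -> Tuple[int, float]:
    """Return (index, score) of the page most similar to the query."""
    if not page_embeddings:
        raise ValueError("no pages have been embedded yet")
    scores = [cosine_similarity(query_embedding, p) for p in page_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors standing in for real embeddings (assumption, not real data).
pages = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
query = [0.0, 0.9, 0.1]
index, score = retrieve_best_page(query, pages)
print(f"best page: {index}, similarity: {score:.3f}")
```

In a pipeline like the one described here, the image at the returned index would then be passed to the vision model together with the user's question.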