An AI-powered multimodal video-analysis system that allows users to chat with YouTube videos, navigate through timestamped sections, and run visual content searches. This system helps users quickly find and reference specific parts of long videos.
- Video Upload & Chat: Upload a YouTube link and chat naturally with the video.
- Timestamped Section Breakdown: Automatically generates a structured video outline with hyperlinked timestamps.
- Contextual Citations: Chat responses include timestamp hyperlinks that jump to the exact moment being referenced.
- Visual Search: Accepts natural language queries about visual frames or content and returns matching video clips.
- Time-Saving Navigation: Helps users skip directly to relevant parts of long-form video content.
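The timestamp hyperlinks above reduce to ordinary YouTube URLs with a `t` offset. A minimal sketch of how the outline could be rendered, assuming section data arrives as `(seconds, title)` pairs (the function names are illustrative, not part of this repo):

```python
def timestamp_link(video_id: str, seconds: int) -> str:
    """Build a YouTube URL that starts playback at the given second offset."""
    return f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"

def format_outline(video_id: str, sections: list[tuple[int, str]]) -> list[str]:
    """Render (seconds, title) pairs as Markdown links with mm:ss labels."""
    lines = []
    for seconds, title in sections:
        mm, ss = divmod(seconds, 60)
        label = f"{mm:02d}:{ss:02d}"
        lines.append(f"[{label}]({timestamp_link(video_id, seconds)}) {title}")
    return lines
```

For example, `format_outline("abc123", [(0, "Intro"), (95, "Demo")])` yields Markdown links labeled `00:00` and `01:35`.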
- Frontend: React, TypeScript
- AI Backend: Gemini API (multimodal video + text analysis)
- Data Source: YouTube video links
- UI Enhancements: TailwindCSS, shadcn/ui components (optional)
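The Gemini API can take a YouTube URL directly as multimodal input alongside a text prompt. A sketch of the `generateContent` request body a backend might assemble (the model name and prompt are assumptions, and the payload shape follows the public REST API rather than this repo's actual code):

```python
# Model choice is an assumption; swap in whichever Gemini model the project uses.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash:generateContent"
)

def build_video_request(youtube_url: str, question: str) -> dict:
    """Pair a YouTube video URI with a text question in one multimodal request."""
    return {
        "contents": [{
            "parts": [
                {"file_data": {"file_uri": youtube_url}},
                {"text": question},
            ]
        }]
    }
```

The body would then be POSTed to `GEMINI_ENDPOINT` with the `GEMINI_API_KEY` from `.env` attached as a query parameter or header.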
```bash
git clone https://github.com/YoshaM09/MultimodalVideoAnalysis.git
cd MultimodalVideoAnalysis
pip install -r requirements.txt
```
- Create a `.env` file in the project root with your API key:
```
GEMINI_API_KEY=your_gemini_api_key
```
- Start the development server:
```bash
npm run dev
```
- Open the web app in your browser.
- Enter a YouTube video link.
- Explore:
- View the section breakdown with timestamp hyperlinks.
- Chat with the video to ask questions about its content.
- Get timestamped answers pointing to the exact moment.
- Run a visual content search with natural language queries to retrieve relevant clips.
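Timestamped answers like the ones above are only clickable if the model's citations can be pulled back out of its text. A small sketch, assuming the model is prompted to emit `[mm:ss]` markers inline (that convention is an assumption, not something this repo documents):

```python
import re

# Matches inline citations such as [01:35] or [12:04].
TIMESTAMP = re.compile(r"\[(\d{1,2}):(\d{2})\]")

def extract_citations(answer: str) -> list[int]:
    """Return the second offset of every [mm:ss] marker in a chat answer."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(answer)]
```

Each offset can then be turned into a `watch?v=<id>&t=<seconds>s` link so the citation jumps straight to that moment in the player.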
- Contributions are welcome! Please submit a pull request or open an issue for suggestions.
- This project is licensed under the MIT License.
