topic-voice is an AI-powered service that takes a user-provided topic, enhances it using Google Gemini's generative AI, and then generates a spoken audio file using Azure's Speech Service. The resulting audio file is saved locally and its path is returned to the user via an API endpoint.
- POST a topic to `/process-topic`.
- The service enhances the topic description using Google Gemini ("gemini-1.5-flash").
- The enhanced text is converted to speech using Azure's neural voices.
- The generated audio file is saved in the `temp` directory, and the API returns the file path and enhanced text.
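
The steps above can be sketched as a single function. This is a minimal sketch, not the project's actual code: `TopicVoiceDeps`, `enhance`, `synthesize`, and `processTopic` are hypothetical names, and the two dependencies are stand-ins for the Gemini and Azure Speech calls so the flow can be read (and tested) without either SDK.

```typescript
import { randomUUID } from "node:crypto";
import { join, resolve } from "node:path";

// Hypothetical interface: stand-ins for the Gemini and Azure Speech calls.
interface TopicVoiceDeps {
  // Assumed to return the enhanced description for a topic.
  enhance(topic: string): Promise<string>;
  // Assumed to synthesize speech for `text` and write it to `outputPath`.
  synthesize(text: string, outputPath: string): Promise<void>;
}

// Mirrors the fields of the JSON response returned by /process-topic.
interface ProcessTopicResult {
  originalTopic: string;
  enhancedText: string;
  audioFilePath: string;
}

async function processTopic(
  topic: string,
  deps: TopicVoiceDeps,
  tempDir: string = resolve("temp"),
): Promise<ProcessTopicResult> {
  // Reject empty input before calling any external service.
  if (topic.trim().length === 0) {
    throw new Error("topic must be a non-empty string");
  }

  // Step 1: enhance the topic text.
  const enhancedText = await deps.enhance(topic);

  // Step 2: pick a unique .wav path inside the temp directory.
  const audioFilePath = join(tempDir, `${randomUUID()}.wav`);

  // Step 3: convert the enhanced text to speech at that path.
  await deps.synthesize(enhancedText, audioFilePath);

  return { originalTopic: topic, enhancedText, audioFilePath };
}
```

Passing the two services in as arguments is only a choice made for this sketch; it keeps the example runnable with stub functions in place of real API keys.
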

Install dependencies:

    bun install

Start the server:

    bun run index.ts

Send a topic to the API:

    curl -X POST http://localhost:3000/process-topic \
      -H 'Content-Type: application/json' \
      -d '{"topic": "The future of AI in education"}'

Example response:

    {
      "originalTopic": "The future of AI in education",
      "enhancedText": "...AI-enhanced version...",
      "audioFilePath": "/path/to/project/temp/uuid.wav"
    }

The goal of this project is to provide a simple, extensible backend for generating rich, AI-enhanced spoken content from user topics. It demonstrates the integration of generative AI and speech synthesis, and serves as a foundation for more advanced content and media workflows.
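
For callers that prefer TypeScript over curl, the request and response shown earlier could be wrapped roughly as follows. This is a sketch under assumptions: `ProcessTopicResponse`, `isProcessTopicResponse`, and `requestAudio` are hypothetical names, the base URL is taken from the curl example, and the response is assumed to contain exactly the three string fields from the sample JSON.

```typescript
// Shape of the sample response; field names come from the example JSON.
interface ProcessTopicResponse {
  originalTopic: string;
  enhancedText: string;
  audioFilePath: string;
}

// Runtime check that an unknown JSON value has the expected string fields.
function isProcessTopicResponse(value: unknown): value is ProcessTopicResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.originalTopic === "string" &&
    typeof v.enhancedText === "string" &&
    typeof v.audioFilePath === "string"
  );
}

// Hypothetical client helper: POSTs a topic and returns the parsed response.
async function requestAudio(
  topic: string,
  baseUrl: string = "http://localhost:3000",
): Promise<ProcessTopicResponse> {
  const res = await fetch(`${baseUrl}/process-topic`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ topic }),
  });
  if (!res.ok) {
    throw new Error(`process-topic failed with status ${res.status}`);
  }
  const body: unknown = await res.json();
  if (!isProcessTopicResponse(body)) {
    throw new Error("unexpected response shape from /process-topic");
  }
  return body;
}
```

Note that `audioFilePath` is a path on the server's disk, so a client on another machine can read the enhanced text but cannot open the audio file directly.
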

Possible future enhancements:

- Database Integration: Save topic, enhanced text, and file metadata to a database for tracking and analytics.
- Cloud Storage: Store generated audio files in a cloud bucket (e.g., AWS S3, Google Cloud Storage) for scalable access and sharing.
- User Management & UI: Build a web UI for users to submit topics, manage their generated files, and listen to/download audio.
- Authentication: Add user authentication and authorization for secure access.
- History & Search: Allow users to view and search their history of generated content.
- Multi-language Support: Enable speech synthesis in multiple languages and voices.
- Batch Processing: Allow bulk topic submission and processing.
- Notifications: Notify users when their audio is ready (email, webhooks, etc.).