theebruv/topic-voice

topic-voice

Overview

topic-voice is an AI-powered service that takes a user-provided topic, enhances it using Google Gemini's generative AI, and then generates a spoken audio file using Azure's Speech Service. The resulting audio file is saved locally and its path is returned to the user via an API endpoint.

How it works

  1. POST a JSON body containing a topic field to /process-topic.
  2. The service enhances the topic description using Google Gemini (model "gemini-1.5-flash").
  3. The enhanced text is converted to speech using one of Azure's neural voices.
  4. The generated audio file is saved as a .wav file in the project's temp directory, and the API returns the file path together with the enhanced text.
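
The four steps above can be sketched as a single function. This is a minimal illustration, not the code in index.ts: processTopic, PipelineDeps, enhance, synthesize, and tempDir are hypothetical names, and the Gemini and Azure calls are passed in as plain functions so that no SDK signature is assumed here.

```typescript
import { randomUUID } from "node:crypto";
import { join } from "node:path";

// Response shape taken from the README's example response.
interface ProcessTopicResult {
  originalTopic: string;
  enhancedText: string;
  audioFilePath: string;
}

// Hypothetical seams for the two external services. In the real project
// these would wrap the Gemini and Azure Speech clients.
interface PipelineDeps {
  enhance: (topic: string) => Promise<string>;
  synthesize: (text: string, outPath: string) => Promise<void>;
  tempDir: string;
}

async function processTopic(
  topic: string,
  deps: PipelineDeps,
): Promise<ProcessTopicResult> {
  // Step 1: reject empty input before calling any paid API (an assumption;
  // the README does not describe validation).
  if (topic.trim().length === 0) {
    throw new Error("topic must be a non-empty string");
  }

  // Step 2: expand the topic into fuller text.
  const enhancedText = await deps.enhance(topic);

  // Step 3: synthesize speech into a uniquely named .wav file.
  const audioFilePath = join(deps.tempDir, `${randomUUID()}.wav`);
  await deps.synthesize(enhancedText, audioFilePath);

  // Step 4: return the same fields as the example response.
  return { originalTopic: topic, enhancedText, audioFilePath };
}
```

Injecting the two service calls keeps the sketch testable with stubs and leaves the choice of Gemini model or Azure voice to the caller.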

Usage

Install dependencies

bun install

Run the server

bun run index.ts

Example API Request

curl -X POST http://localhost:3000/process-topic \
  -H 'Content-Type: application/json' \
  -d '{"topic": "The future of AI in education"}'

Example Response

{
  "originalTopic": "The future of AI in education",
  "enhancedText": "...AI-enhanced version...",
  "audioFilePath": "/path/to/project/temp/uuid.wav"
}
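
For callers who prefer TypeScript over curl, the request and response above might be wrapped as follows. This is a sketch under assumptions: requestProcessTopic and isProcessTopicResponse are hypothetical helper names, the base URL is the one from the curl example, and the error handling is a guess because the README does not document failure responses.

```typescript
// Field names copied from the example response.
interface ProcessTopicResponse {
  originalTopic: string;
  enhancedText: string;
  audioFilePath: string;
}

// Runtime check so that a malformed body fails loudly instead of
// silently flowing through the program as the wrong type.
function isProcessTopicResponse(value: unknown): value is ProcessTopicResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.originalTopic === "string" &&
    typeof v.enhancedText === "string" &&
    typeof v.audioFilePath === "string"
  );
}

// Equivalent of the curl command, using the global fetch available in Bun.
async function requestProcessTopic(
  topic: string,
  baseUrl = "http://localhost:3000",
): Promise<ProcessTopicResponse> {
  const res = await fetch(`${baseUrl}/process-topic`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ topic }),
  });
  // Assumption: any non-2xx status is treated as a failure.
  if (!res.ok) {
    throw new Error(`process-topic failed with HTTP ${res.status}`);
  }
  const body: unknown = await res.json();
  if (!isProcessTopicResponse(body)) {
    throw new Error("unexpected response shape from /process-topic");
  }
  return body;
}
```

Note that audioFilePath is a path on the server's filesystem, so a client on another machine cannot open it directly.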

Intention

The goal of this project is to provide a simple, extensible backend for generating rich, AI-enhanced spoken content from user topics. It demonstrates the integration of generative AI and speech synthesis, and serves as a foundation for more advanced content and media workflows.

Future Plans

  • Database Integration: Save topic, enhanced text, and file metadata to a database for tracking and analytics.
  • Cloud Storage: Store generated audio files in a cloud bucket (e.g., AWS S3, Google Cloud Storage) for scalable access and sharing.
  • User Management & UI: Build a web UI for users to submit topics, manage their generated files, and listen to/download audio.
  • Authentication: Add user authentication and authorization for secure access.
  • History & Search: Allow users to view and search their history of generated content.
  • Multi-language Support: Enable speech synthesis in multiple languages and voices.
  • Batch Processing: Allow bulk topic submission and processing.
  • Notifications: Notify users when their audio is ready (email, webhooks, etc.).
