Text to audio generation

Generate audio from text

Project Description

Developed an AI-powered Text-to-Audio Generation System that converts written text into natural-sounding speech using a pre-trained deep learning model.

The project integrates the suno/bark model from Hugging Face for high-quality audio synthesis. The backend is implemented using Python Flask, which hosts the model and exposes REST APIs. GPU acceleration is enabled using PyTorch to optimize audio generation performance.

The frontend is developed using HTML and JavaScript, allowing users to input text and receive dynamically generated audio output through the web interface.

Features

Converts user input text into realistic audio output.
Utilizes suno/bark for high-quality voice synthesis.
Uses PyTorch with CUDA support for faster audio generation.
Frontend and backend communicate via HTTP APIs.
Simple and user-friendly UI built with HTML and JavaScript.
Generated audio can be played directly in the browser.

Tech Stack

Python 3.x
Flask
Transformers (Hugging Face library)
PyTorch
HTML5
CSS3
JavaScript
HTML, JavaScript (Frontend)

System Requirements

Python 3.8+
Anaconda
Libraries: numpy, pandas, scikit-learn, matplotlib
Good to have GPU

Deliverables

Project Code
Setup over video call
WhatsApp support

WhatsApp For Project

Page updated

Google Sites

Report abuse