Generate Images from Text and Captions from Images
Project Description
A bidirectional AI system that generates images from text prompts and descriptive captions from uploaded images.
The project integrates two powerful pre-trained deep learning models:
stabilityai/stable-diffusion-2 for converting text prompts into high-quality AI-generated images.
nlpconnect/vit-gpt2-image-captioning for generating natural language descriptions from images.
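As a rough illustration of how the two models listed above might be loaded and called, here is a minimal sketch. It assumes the Hugging Face diffusers package is installed alongside Transformers and PyTorch (diffusers is not named in this description), and the function names, max_length and num_beams values are illustrative choices rather than the project's actual code.

```python
import io

# Model identifiers taken from the project description.
TEXT_TO_IMAGE_MODEL = "stabilityai/stable-diffusion-2"
CAPTION_MODEL = "nlpconnect/vit-gpt2-image-captioning"


def load_text_to_image(device: str = "cpu"):
    """Load the Stable Diffusion pipeline (weights download on first use)."""
    # Heavy libraries are imported lazily so this module loads quickly.
    import torch
    from diffusers import StableDiffusionPipeline  # assumed dependency

    # Half precision is only used on GPU; CPU stays in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(
        TEXT_TO_IMAGE_MODEL, torch_dtype=dtype
    )
    return pipe.to(device)


def load_captioner(device: str = "cpu"):
    """Load the ViT-GPT2 captioning model, image processor and tokenizer."""
    from transformers import (
        AutoTokenizer,
        VisionEncoderDecoderModel,
        ViTImageProcessor,
    )

    model = VisionEncoderDecoderModel.from_pretrained(CAPTION_MODEL).to(device)
    processor = ViTImageProcessor.from_pretrained(CAPTION_MODEL)
    tokenizer = AutoTokenizer.from_pretrained(CAPTION_MODEL)
    return model, processor, tokenizer


def generate_image(pipe, prompt: str) -> bytes:
    """Run the pipeline on a prompt and return the first image as PNG bytes."""
    image = pipe(prompt).images[0]  # a PIL image
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def caption_image(captioner, image_bytes: bytes) -> str:
    """Produce a caption for raw image bytes (e.g. an uploaded file)."""
    from PIL import Image

    model, processor, tokenizer = captioner
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(model.device)
    # Illustrative decoding settings; tune for caption length and quality.
    output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Loading each model once at startup and reusing it across requests avoids paying the load cost on every call.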
The backend is built using Python Flask, which hosts the models and exposes REST APIs. The frontend is developed using HTML and JavaScript, enabling users to input text prompts or upload images and view results dynamically.
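The backend described above could be wired up roughly as follows. This is a sketch under stated assumptions: the route names /generate and /caption, the JSON field names, and the two callables passed to create_app are all hypothetical, since the description does not specify the actual API.

```python
import base64
from typing import Callable


def encode_png(png_bytes: bytes) -> str:
    """Base64-encode PNG bytes so they can travel inside a JSON response."""
    return base64.b64encode(png_bytes).decode("ascii")


def create_app(
    generate_image: Callable[[str], bytes],
    caption_image: Callable[[bytes], str],
):
    """Build a Flask app around two model callables (hypothetical names)."""
    # Imported inside the factory so encode_png works without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/generate", methods=["POST"])  # hypothetical route
    def generate():
        data = request.get_json(silent=True)
        if not isinstance(data, dict):
            data = {}
        prompt = str(data.get("prompt", "")).strip()
        if not prompt:
            return jsonify(error="prompt is required"), 400
        # The image is returned as a base64 string the frontend can display.
        return jsonify(image=encode_png(generate_image(prompt)))

    @app.route("/caption", methods=["POST"])  # hypothetical route
    def caption():
        upload = request.files.get("image")
        if upload is None:
            return jsonify(error="image file is required"), 400
        return jsonify(caption=caption_image(upload.read()))

    return app
```

Passing the model functions into the factory keeps the HTTP layer separate from the models, so the routes can be exercised with simple stand-in functions before the real weights are loaded.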
GPU acceleration is recommended for faster processing, although the system can operate on CPU with slower inference time.
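One way to implement the GPU-with-CPU-fallback behaviour is sketched below. The "auto"/"cuda"/"cpu" setting and both function names are assumptions made for illustration; only torch.cuda.is_available() is an actual PyTorch call.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a requested device ("auto", "cuda" or "cpu") to a usable one."""
    requested = requested.lower()
    if requested not in {"auto", "cuda", "cpu"}:
        raise ValueError(f"unknown device setting: {requested!r}")
    # Fall back to CPU when it is asked for explicitly or no GPU is visible.
    if requested == "cpu" or not cuda_available:
        return "cpu"
    return "cuda"


def detect_device(requested: str = "auto") -> str:
    """Probe PyTorch for a CUDA GPU and resolve the device to run on."""
    import torch  # imported lazily; only needed when probing hardware

    return resolve_device(requested, torch.cuda.is_available())
```

Keeping the decision in a pure function makes the fallback rule easy to check without a GPU present.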
Features
Generate realistic images from natural language prompts using Stable Diffusion.
Generate meaningful captions from uploaded images.
Smooth frontend-backend communication via HTTP requests.
Optimized for GPU execution; supports CPU fallback.
Simple and dynamic UI using HTML and JavaScript.
Utilizes state-of-the-art diffusion and vision-language models.
Tech Stack
Python 3.8+
Flask
Transformers (Hugging Face library)
PyTorch
HTML5
CSS3
JavaScript
System Requirements
Python 3.8+
Anaconda
Libraries: numpy, pandas, scikit-learn, matplotlib
GPU (recommended, not required)
Deliverables
Project Code
Setup assistance over video call
WhatsApp support