Text to image and Image to text

Generate images from text, and text captions from images

Project Description

This project is a bidirectional AI system that generates images from text prompts and descriptive captions from images.

The project integrates two pre-trained deep learning models from the Hugging Face Hub:

stabilityai/stable-diffusion-2 for converting text prompts into high-quality AI-generated images.

nlpconnect/vit-gpt2-image-captioning for generating natural language descriptions from images.
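A minimal sketch of how the two models might be loaded and called from Python. It assumes the Hugging Face `diffusers` library for Stable Diffusion (this page lists only Transformers and PyTorch) and the Transformers `pipeline("image-to-text", ...)` helper for captioning. The function names are hypothetical, not taken from the project code. Heavy imports are deferred into the functions so the module can be imported without downloading either model.

```python
import io


def load_text_to_image(device: str = "cuda"):
    """Load stabilityai/stable-diffusion-2 (assumes the diffusers library is installed)."""
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision saves GPU memory; CPU inference is usually done in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2", torch_dtype=dtype
    )
    return pipe.to(device)


def load_image_to_text():
    """Load nlpconnect/vit-gpt2-image-captioning through the Transformers pipeline API."""
    from transformers import pipeline

    return pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")


def generate_png(pipe, prompt: str) -> bytes:
    """Run the diffusion pipeline on one prompt and return the first image as PNG bytes."""
    image = pipe(prompt).images[0]  # a PIL.Image when using diffusers
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return buf.getvalue()


def caption_image(captioner, image_bytes: bytes) -> str:
    """Decode uploaded bytes with Pillow and return the generated caption."""
    from PIL import Image

    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    # The image-to-text pipeline returns a list like [{"generated_text": "..."}].
    return captioner(image)[0]["generated_text"]
```

Keeping loading separate from inference lets a server load each model once at startup and reuse it for every request.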

The backend is built using Python Flask, which hosts the models and exposes REST APIs. The frontend is developed using HTML and JavaScript, enabling users to input text prompts or upload images and view results dynamically.

GPU acceleration is recommended for faster processing, although the system can operate on CPU with slower inference time.
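The GPU-first, CPU-fallback behaviour can be sketched as a small device check. This assumes an NVIDIA GPU exposed through CUDA and detected with PyTorch's `torch.cuda.is_available()`; other accelerators are not covered, and the helper names are hypothetical.

```python
from typing import Tuple


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name): half precision on a CUDA GPU, full precision on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def detect_device() -> Tuple[str, str]:
    """Ask PyTorch whether a CUDA GPU is usable and choose accordingly."""
    import torch  # imported lazily so pick_device() has no heavy dependency

    return pick_device(torch.cuda.is_available())
```

A caller could then use `device, name = detect_device()` and pass `getattr(torch, name)` as the `torch_dtype` when loading a model, followed by `.to(device)`.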

Features

Generate realistic images from natural language prompts using Stable Diffusion.
Generate meaningful captions from uploaded images.
Frontend-backend communication over HTTP requests to the Flask REST APIs.
Optimized for GPU execution; supports CPU fallback.
Simple and dynamic UI using HTML and JavaScript.
Utilizes state-of-the-art diffusion and vision-language models.
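The HTTP exchange that the browser frontend performs can be illustrated from Python with only the standard library. The endpoint path, the JSON shapes (`{"prompt": ...}` sent, `{"image": "<base64 PNG>"}` returned), and the local address (Flask's default development port) are assumptions; the real frontend is written in JavaScript and its requests may differ.

```python
import base64
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumed: Flask's default development address


def build_prompt_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying the prompt as JSON (endpoint path is hypothetical)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/text-to-image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_image_response(raw: bytes) -> bytes:
    """Turn a JSON reply like {"image": "<base64>"} back into PNG bytes."""
    return base64.b64decode(json.loads(raw)["image"])


def request_image(prompt: str, base_url: str = BASE_URL) -> bytes:
    """Send the prompt to a running backend and return the generated PNG bytes."""
    # Image generation can take minutes on CPU, so the timeout is generous.
    with urllib.request.urlopen(build_prompt_request(prompt, base_url), timeout=300) as resp:
        return decode_image_response(resp.read())
```

Splitting request building and response decoding from the network call keeps both halves checkable without a running server.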

Tech Stack

Python 3.x
Flask
Transformers (Hugging Face library)
PyTorch
HTML5
CSS3
JavaScript

System Requirements

Python 3.8+
Anaconda
Libraries: Flask, Transformers, PyTorch, numpy, pandas, scikit-learn, matplotlib
GPU recommended (optional; the system also runs on CPU with slower inference)
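Before setup, the Python version and installed libraries can be checked with a short standard-library script. The module list below is an assumption built from the libraries named on this page (scikit-learn is imported as `sklearn`); it does not check for a GPU.

```python
import importlib.util
import sys
from typing import Iterable, List, Optional, Sequence

# Import names for the libraries named on this page (assumed list).
REQUIRED_MODULES = ["flask", "transformers", "torch",
                    "numpy", "pandas", "sklearn", "matplotlib"]


def python_ok(minimum: Sequence[int] = (3, 8),
              current: Optional[Sequence[int]] = None) -> bool:
    """True if the (major, minor) Python version meets the minimum."""
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    return current >= tuple(minimum)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK" if python_ok() else "Python 3.8+ required")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates each package without importing it, so the check stays fast even for large libraries such as PyTorch.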

Deliverables

Project Code
Setup over video call
WhatsApp support

