Image Captioning: Generate a Text Description for Any Image
Project Description
This project is an NLP-based image captioning system that automatically generates textual descriptions for uploaded images. The application runs a pre-trained deep learning model (nlpconnect/vit-gpt2-image-captioning) behind a Python Flask backend. The frontend is built with HTML, CSS, and JavaScript and communicates with the backend through a REST API, sending each image for processing and displaying the generated caption as soon as it is returned.
The system integrates computer vision and natural language processing techniques to convert visual content into human-readable text.
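As a rough illustration of the backend described above, the following is a minimal sketch of a Flask endpoint that accepts an uploaded image and returns a caption as JSON. The route name /caption, the form field name image, the accepted extensions, and the helper names allowed_image and create_app are all assumptions made for this example; the project's actual code may differ. The captioning function is passed in by the caller, so the sketch does not depend on how the model is loaded.

```python
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}  # assumption: the project does not list accepted formats


def allowed_image(filename):
    """Return True if the filename ends in one of the accepted image extensions."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def create_app(caption_fn):
    """Build a Flask app; caption_fn(image_bytes) -> str is supplied by the caller."""
    # Imported here so the helper above stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/caption", methods=["POST"])  # hypothetical route name
    def caption():
        upload = request.files.get("image")  # hypothetical form field name
        if upload is None or not allowed_image(upload.filename or ""):
            return jsonify(error="Upload a PNG or JPEG file in the 'image' field."), 400
        # Hand the raw bytes to the captioning function and return its text.
        return jsonify(caption=caption_fn(upload.read()))

    return app
```

For a quick local check, create_app(lambda data: "placeholder caption").run(port=5000) starts the server with a stand-in captioner; the frontend would then POST the selected file to /caption and display the caption field of the JSON response.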
Features
Users can upload images through a web interface.
Uses the pre-trained vit-gpt2-image-captioning model to generate captions.
Frontend and backend communicate through HTTP requests.
Captions are displayed as soon as the backend returns them.
Clean and user-friendly UI built with HTML, CSS, and JavaScript.
Combines Vision Transformer (ViT) and GPT-2 for image understanding and text generation.
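The caption generation step listed above might look roughly like the sketch below, which loads the nlpconnect/vit-gpt2-image-captioning checkpoint with the Hugging Face Transformers classes VisionEncoderDecoderModel, ViTImageProcessor, and AutoTokenizer. It assumes Pillow is installed for image decoding (Pillow is not listed in the Tech Stack), runs on CPU, and uses max_length=16 and num_beams=4 as illustrative generation settings. The names load_captioner and tidy_caption are hypothetical and not taken from the project.

```python
import io

MODEL_NAME = "nlpconnect/vit-gpt2-image-captioning"


def tidy_caption(text):
    """Collapse repeated whitespace and capitalise the first letter of a caption."""
    text = " ".join(text.split())
    return text[:1].upper() + text[1:]


def load_captioner(max_length=16, num_beams=4):
    """Return caption_fn(image_bytes) -> str backed by the pre-trained model."""
    # Heavy imports are deferred so importing this module stays cheap.
    import torch
    from PIL import Image  # assumption: Pillow is available
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    # Weights are downloaded on first use; the model is kept on CPU.
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME).to("cpu").eval()
    processor = ViTImageProcessor.from_pretrained(MODEL_NAME)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    def caption_fn(image_bytes):
        # ViT encodes the image; GPT-2 decodes the caption tokens.
        image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        pixel_values = processor(images=image, return_tensors="pt").pixel_values
        with torch.no_grad():
            output_ids = model.generate(
                pixel_values, max_length=max_length, num_beams=num_beams
            )
        return tidy_caption(tokenizer.decode(output_ids[0], skip_special_tokens=True))

    return caption_fn
```

Calling load_captioner() once at startup and reusing the returned function for every request avoids reloading the model each time, which matters most on CPU-only machines.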
Tech Stack
Python 3.8+
Flask
Transformers (Hugging Face library)
PyTorch
HTML5
CSS3
JavaScript
System Requirements
Python 3.8+
Anaconda
Libraries: Flask, Transformers, PyTorch, numpy, pandas, scikit-learn, matplotlib
No GPU required
Deliverables
Project Code
Setup over video call
WhatsApp support