Image Captioner

Upload any image and get a generated text caption

Project Description

An NLP-based image captioning system that automatically generates meaningful textual descriptions for uploaded images. The application runs a pre-trained deep learning model (nlpconnect/vit-gpt2-image-captioning) on a Python Flask backend. The frontend is built with HTML, CSS, and JavaScript and communicates with the backend through a REST API: it sends the uploaded image to the backend and displays the generated caption as soon as the response arrives.

The system integrates computer vision and natural language processing techniques to convert visual content into human-readable text.
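
The upload-and-respond flow described above can be sketched as a small Flask app. This is a minimal illustration under stated assumptions, not the project's actual code: the `/caption` route, the `image` form field, the `ALLOWED_EXTENSIONS` set, the JSON response shape, and the `create_app` and `caption_fn` names are all hypothetical. The captioning step is passed in as a function so the web layer stays independent of the model.

```python
# Hypothetical whitelist of upload types; the project may accept others.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def is_allowed(filename: str) -> bool:
    """Return True if the filename has an extension in ALLOWED_EXTENSIONS."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def create_app(caption_fn):
    """Build a Flask app; caption_fn maps raw image bytes to a caption string."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/caption", methods=["POST"])  # assumed route name
    def caption():
        # The frontend is assumed to send the file as multipart form field "image".
        file = request.files.get("image")
        if file is None or not is_allowed(file.filename or ""):
            return jsonify({"error": "upload a PNG or JPEG in the 'image' field"}), 400
        return jsonify({"caption": caption_fn(file.read())})

    return app
```

With a real captioning function in hand, `create_app(my_caption_fn).run()` would start Flask's development server, and the frontend could POST the selected file to `/caption` and show the `caption` field of the JSON reply.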

Features

Image upload through a web interface.
Caption generation with the pre-trained vit-gpt2-image-captioning model.
Frontend-backend communication over HTTP requests.
Captions displayed as soon as the backend returns them.
Clean, user-friendly UI built with HTML, CSS, and JavaScript.
Vision Transformer (ViT) encoder for image understanding combined with a GPT-2 decoder for text generation.
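
As a rough sketch of how the ViT and GPT-2 parts fit together, the model can be loaded and called through the Hugging Face Transformers library roughly as follows. The function names `load_captioner` and `generate_caption` are hypothetical, and the `max_length=16` and `num_beams=4` defaults are assumed values rather than settings confirmed by this project.

```python
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def load_captioner(model_id: str = MODEL_ID):
    """Load the model, image processor, and tokenizer (downloads on first use)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()  # inference only
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer


def generate_caption(image, model, processor, tokenizer, max_length=16, num_beams=4):
    """Return one caption for a single RGB PIL image."""
    # The processor resizes and normalizes the image into the tensor ViT expects.
    pixel_values = processor(images=[image], return_tensors="pt").pixel_values
    # The GPT-2 decoder generates token ids conditioned on the ViT encoding.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return captions[0].strip()
```

A caller would typically open the file with Pillow, convert it to RGB with `Image.open(path).convert("RGB")`, and pass the result to `generate_caption` together with the three objects returned by `load_captioner`. Loading once at startup and reusing the objects for every request avoids reloading the weights each time.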

Tech Stack

Python 3.8+
Flask
Transformers (Hugging Face library)
PyTorch
HTML5
CSS3
JavaScript

System Requirements

Python 3.8+
Anaconda
Libraries: numpy, pandas, scikit-learn, matplotlib
No GPU required

Deliverables

Project Code
Setup over video call
WhatsApp support
