CLIP Video Investigator is a Flask-based web application designed to compare text and image embeddings using the CLIP model. The application integrates OpenCV for video processing and Plotly for data visualization to accomplish the following:
- Play a video in a web browser.
- Pause and resume video playback.
- Compare CLIP embeddings of video frames with text embeddings.
- Visualize the similarity between text and image embeddings in real time using a Plotly plot.
- Jump to specific frames by clicking on the Plotly plot.
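The comparison between a frame embedding and each text embedding is commonly expressed as a cosine similarity. The sketch below is not taken from the application's source: it uses only the standard library, the function names are hypothetical, and the three-dimensional vectors are toy stand-ins for real CLIP embeddings, which are much longer.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def score_frame(
    frame_embedding: Sequence[float],
    text_embeddings: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score one frame embedding against every text prompt (hypothetical helper)."""
    return {
        label: cosine_similarity(frame_embedding, embedding)
        for label, embedding in text_embeddings.items()
    }


# Toy vectors standing in for real embeddings (assumption, for illustration only).
frame = [1.0, 0.0, 0.0]
texts = {"wall": [2.0, 0.0, 0.0], "tile wall": [0.0, 3.0, 0.0]}
scores = score_frame(frame, texts)
```

Plotting one such score per prompt for every frame gives the kind of similarity-over-time curve described above.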
YouTube video: https://www.youtube.com/watch?v=XllrtZnPL6M
Understanding the relationship between text and image embeddings can provide insights into how well a model generalizes across modalities. By plotting the similarity between them in real time, researchers and engineers can:
- Identify key frames where text and image embeddings are highly aligned or misaligned.
- Debug and fine-tune the performance of multimodal models.
- Gain insights into the temporal evolution of embeddings in video data.
- Enable more effective search and retrieval tasks for video content.
- Video Playback: Uses OpenCV to read video frames and displays them in the web browser.
- Play/Pause: Allows the user to start and stop video playback.
- Data Visualization: Uses Plotly to plot data related to the video frames.
- Interactive Plot: Allows the user to click on the plot to jump to specific frames in the video.
- Reset Functionality: Resets the application to its initial state.
- Embedding Caching: Pickle files of the text and image frame embeddings are saved for each video in the /embeddings folder. This allows for quicker subsequent analysis by avoiding the need to regenerate these embeddings.
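The caching idea can be sketched as a small load-or-compute helper. This is an illustration under assumptions rather than the application's actual code: the function name `load_or_compute`, the one-pickle-per-video naming scheme, the `data/example.mp4` path, and the shape of the cached dictionary are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_compute(video_path: str, compute: Callable[[], Any], cache_dir: Path) -> Any:
    """Return cached embeddings for a video, computing and saving them on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One pickle per video, named after the video file (hypothetical naming scheme).
    cache_file = cache_dir / (Path(video_path).stem + ".pkl")
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def fake_embed() -> dict:
    """Stand-in for the expensive embedding step; records each invocation."""
    calls.append(1)
    return {"frames": [[0.1, 0.2]], "text": {"wall": [0.3, 0.4]}}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "embeddings"
    first = load_or_compute("data/example.mp4", fake_embed, cache)
    second = load_or_compute("data/example.mp4", fake_embed, cache)  # served from the pickle
```

Because unpickling can execute arbitrary code, a cache like this should only load files that the application itself wrote.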
A config.yaml file is used to specify various settings for the application:
roboflow_api_key: "" # Roboflow API key
video_path: "" # Path to video file
CLIP:
- wall
- tile wall
- large tile wall
- roboflow_api_key: Your API key for Roboflow.
- video_path: The path to the video file you want to analyze.
- CLIP: A list of text inputs for which you want to generate CLIP embeddings.
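A quick sanity check on these three settings can catch an empty key or path before the application starts. The sketch below is an assumption about how such a check might look, not part of the project: `validate_config` is a hypothetical helper, and the dictionary is written inline so the example stays dependency-free, whereas in practice it would come from parsing config.yaml (for instance with PyYAML's `yaml.safe_load`).

```python
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the config looks usable."""
    problems = []
    for key in ("roboflow_api_key", "video_path"):
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} must be a non-empty string")
    prompts = config.get("CLIP")
    if not isinstance(prompts, list) or not prompts:
        problems.append("CLIP must be a non-empty list of text prompts")
    elif not all(isinstance(p, str) and p.strip() for p in prompts):
        problems.append("every CLIP entry must be a non-empty string")
    return problems


# The unfilled template from above, as it would look after parsing.
template = {
    "roboflow_api_key": "",
    "video_path": "",
    "CLIP": ["wall", "tile wall", "large tile wall"],
}
# Placeholder values for illustration only; substitute your own key and path.
filled = dict(template, roboflow_api_key="YOUR_KEY", video_path="data/example.mp4")

template_problems = validate_config(template)
filled_problems = validate_config(filled)
```

Here the unfilled template reports the two empty settings, while the filled copy passes.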
clip_investigator/
├── config.yaml
├── scripts/
│   ├── clip_app.py
│   └── clip_functions.py
├── embeddings/
│   └── example.pkl
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── main.js
└── templates/
    └── index.html
- clip_app.py: The main Flask application file.
- config.yaml: Configuration file for specifying settings.
- embeddings/: Folder where pickle files of text and image embeddings are stored.
- static/: Contains static files like CSS and JavaScript.
- templates/: Contains HTML templates.
- Python 3.x
- Virtualenv (optional but recommended)
- Clone the repository.
  git clone https://github.com/roboflow/clip_video_app.git
- Navigate to the project directory.
  cd clip_video_app
- (Optional) Create a virtual environment.
- Install the dependencies.
  pip install -r requirements.txt
You must also have the Roboflow inference server running locally.
- Update the config.yaml file with your Roboflow API key and the path to your video file (or use the sample file in the /data folder).
- Start the Flask application.
  python scripts/clip_app.py
- Open a web browser and navigate to http://localhost:5000.
- Use the "Start" and "Stop" buttons to control video playback.
- View real-time data related to the video in the Plotly plot below the video.
- Click on the Plotly plot to jump to specific video frames.
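Jumping to a frame from a plot click comes down to turning the clicked x-coordinate into a valid frame index. The helper below is a hypothetical sketch that assumes the plot's x-axis is the frame number; it is not taken from the application's code.

```python
def clicked_x_to_frame(x: float, total_frames: int) -> int:
    """Round a clicked x-coordinate to the nearest frame and clamp it to the video's range."""
    if total_frames <= 0:
        raise ValueError("video has no frames")
    index = int(round(x))
    return max(0, min(index, total_frames - 1))


# Clicks before the start, between frames, on the last frame, and past the end
# of a 300-frame video.
examples = [clicked_x_to_frame(x, 300) for x in (-5, 41.6, 299, 1000)]
# With OpenCV, the resulting index could then be passed to
# cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index) on a cv2.VideoCapture
# before reading the next frame.
```

Clamping keeps a click slightly outside the plotted range from requesting a frame that does not exist.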
- WebSocket Errors: If you encounter WebSocket errors, check the browser console for specific error messages. The application has built-in error handling to attempt reconnections.
- Plotly Click Events: If click events are not detected on the Plotly plot after a reset, reload the page.